Training Manual

Broadband Filtering
Contents
Overview of Web Filtering on Worcestershire Broadband
SmartFilter configuration at County Hall
    Changing SmartFilter Categorisation
    SmartFilter Configuration
    Policy groups
    Global restrictions
    Site exemptions
    Search engine keyword checking
    Exercises
Squid Access Controls
    How Access Controls work
        Basic Source and Destination ACLs
        Regular Expression ACLs
        Other ACL Types
    An Example
    Squid Proxy Exercises
How the use of Squid and SmartFilter is enforced
Using Log Files
    Calamaris
    Exercise

Overview of Web Filtering on Worcestershire Broadband
Decisions over filtering take place in three locations:

1. SecureComputing, the company that supplies SmartFilter, the County’s main web filtering
product.
2. County Hall – where SmartFilter is configured and adjusted according to the requirements
of Worcestershire schools.
3. Your schools – where sensible configuration of the Squid proxy server can considerably
enhance the effectiveness of filtering, and allow you to manage users’ access to websites in
a variety of ways.

The operation of filtering and management may be visualised with the help of the following
diagram:

[Diagram: the three locations and how they interact.]

• SecureComputing maintains the control list.
• County Hall (broadband team) maintains config.txt, search.txt and site.txt. The control list is
downloaded from SecureComputing twice-weekly.
• The school maintains its Squid ACLs, configuring Squid to restrict access.
• Schools can request exemptions or additions to site.txt, or request re-categorisation of a site
on the SecureComputing website.

SmartFilter configuration at County Hall
The SmartFilter software runs on County Hall’s Web cache servers. It uses a list of “allowed” or
“denied” rules to determine which sites can be accessed by users. These rules relate to the
categories which SecureComputing have applied to each site, and are as follows:

No.  Category (code)                         Permission
1    Anonymizers/Translators (an)            denied
2    Art and Culture (ac)                    allowed
3    Chat (ch)                               denied
4    Criminal Skills (cs)                    denied
5    Cults/Occult (oc)                       denied
6    Dating (mm)                             denied
7    Drugs (dr)                              denied
8    Entertainment (et)                      allowed
9    Extreme/Obscene/Violence (ex)           denied
10   Gambling (gb)                           denied
11   Games (gm)                              denied
12   General News (nw)                       allowed
13   Hate Speech (hs)                        denied
14   Humor (hm)                              allowed
15   Investing (in)                          allowed
16   Job Search (js)                         allowed
17   Lifestyle (lf)                          denied
18   Mature (mt)                             denied
19   MP3 Sites (mp)                          denied
20   Nudity (nd)                             allowed
21   On-line Sales (os)                      allowed
22   Personal Pages (pp)                     allowed
23   Politics, Opinion, and Religion (po)    allowed
24   Portal Sites (ps)                       allowed
25   Self-Help/Health (sh)                   allowed
26   Sex (sx)                                denied
27   Sports (sp)                             allowed
28   Travel (tr)                             allowed
29   Usenet News (na)                        denied
30   Webmail (wm)                            allowed

Changing SmartFilter Categorisation

In some cases you may find that sites you would expect to be banned under these rules are not.
This is likely to be because either the site is “incorrectly” categorised, or it is not categorised at all.
The first thing you should do when you encounter a site that doesn’t appear to be categorised the
way you would expect is to check the SecureComputing Website.

You can view the categorisation of any website by visiting the SecureComputing
“SmartFilterWhere” page at the following URL:

http://www.securecomputing.com/cgi-bin/filter_whereV3.cgi

The page looks like this:

[Screenshot: the SmartFilterWhere page, with the callout “Enter URLs here.”]

Use one or more of the URL fields to request categorisation information. A typical result might look
like this:

[Screenshot: a typical SmartFilterWhere result.]

Use the “Suggest a Change” drop-down box where you believe the site is incorrectly categorised.

SmartFilter Configuration

The installation of SmartFilter at County Hall provides three main configuration files that control
the operation of the filter: config.txt, site.txt and search.txt.

Policy groups

The config.txt file at County Hall allows the creation of “policy groups”. Each of these groups can
have separate filtering regimes, according to the 30 categories.

Existing groups are:

• Schools
• Libraries

It is possible that these could be extended in future, for example, to take account of the needs of
different types of school. The allow/deny scheme for schools has been listed above. In the case
of libraries (at the time of writing), only the Sex and Gambling categories are denied.

The policy groups are also responsible for defining any denied file types. Currently, school users
accessing the Web via the proxy are not permitted to download .exe or .zip files.
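
The idea of policy groups can be modelled in a few lines of Python. This is only an illustrative sketch: the manual does not show the syntax of config.txt, so the data structures and the function name `is_allowed` are assumptions, and the empty file-type list for libraries is a guess (the manual states the restriction for school users only). The category codes are those in the table above.

```python
# Category codes denied for each policy group, taken from the table and text above.
DENIED_CATEGORIES = {
    "schools": {"an", "ch", "cs", "oc", "mm", "dr", "ex", "gb", "gm",
                "hs", "lf", "mt", "mp", "sx", "na"},
    "libraries": {"sx", "gb"},
}

# File types denied per group. The libraries entry is an assumption:
# the manual only states the restriction for school users.
DENIED_EXTENSIONS = {
    "schools": {".exe", ".zip"},
    "libraries": set(),
}


def is_allowed(group, category_code, url_path):
    """Return True if a request passes both the category and file-type checks."""
    if category_code in DENIED_CATEGORIES[group]:
        return False
    lowered = url_path.lower()
    return not any(lowered.endswith(ext) for ext in DENIED_EXTENSIONS[group])
```

With this model, a Chat (ch) site is refused for the schools group but permitted for libraries, and a school user is refused a .exe download even from a site in an allowed category.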

Global restrictions

The config.txt file also decides on a number of global restrictions, the most important of which is
probably access to sites via IP address. This is currently permitted, as library users must have
access to Hotmail accounts, which make extensive use of IP address links.

Site exemptions

The site.txt file on the County Hall cache servers allows us to add sites that have not been
categorised by SmartFilter, or to override the control list. For example, the site:

http://www.hardcorevideos.org

is included in the list to allow it to be categorised as Sex.

On the other hand,

http://www.thinkquest.org

is exempted in the same file, as it would otherwise be banned in schools under the Chat category.

Search engine keyword checking

The search.txt file allows checking to take place on users’ entries into certain search engines. The
list of search engines currently in force is as follows:

*google.com
*google.de
*google.fr
*google.co.uk
*google.it
*google.ca
*google.co.jp
*google.co.kr
*go.com
*infoseek.com
*altavista.digital.com
*altavista.com
*altavista.senet.com.au
*altavistacanada.com
*altavista.magallanes.net
*altavista.skali.com.my
*altavista.yellowpages.com.au
*austronaut.ims.at
*lycos.com
*yahoo.com
*yahoo.dk
*yahoo.fr
*yahoo.de
*yahoo.it
*yahoo.no
*yahoo.es
*yahoo.se
*yahoo.com.au
*yahoo.co.uk
*yahoo.co.jp
*yahoo.co.kr
*yahoo.com.sg
*excite.com
*excite.de
*excite.co.jp
*excite.co.uk
*mckinley.com
*webcrawler.com
*hotbot.com
*dejanews.com
*nlightn.com
*snap.com
*whoizzy.com

Entries into any of these search engines are matched against a list of proscribed keywords.
Positive matches will result in an “Access Denied by SmartFilter” message, with an indication as to
the category.
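
The check described above might be pictured roughly as follows. This is a sketch under stated assumptions, not SmartFilter's actual algorithm: the wildcard matching uses Python's `fnmatch`, the keyword list is hypothetical (the real list is not published in this manual), and the function names are invented for illustration.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit, parse_qsl

# A few of the search engine patterns listed above.
SEARCH_ENGINES = ["*google.com", "*google.co.uk", "*yahoo.co.uk", "*altavista.com"]

# Hypothetical keyword list: the manual does not publish the real one.
KEYWORDS = {"hardcore", "cocaine"}


def is_search_engine(host):
    """True if the host matches one of the wildcard patterns."""
    return any(fnmatch(host.lower(), pattern) for pattern in SEARCH_ENGINES)


def search_is_denied(url):
    """True if the URL is a search on a listed engine containing a proscribed word."""
    parts = urlsplit(url)
    if not parts.hostname or not is_search_engine(parts.hostname):
        return False
    words = set()
    for _name, value in parse_qsl(parts.query):
        words.update(value.lower().split())
    return bool(words & KEYWORDS)
```

In this model a search is only examined when the host matches one of the listed patterns, which is why a search engine missing from the list would escape keyword checking.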

Exercises
1. Use SmartFilterWhere to check the categorisation of two or three websites you know. If
you disagree with the categorisation, request a suitable change.
2. Change the proxy settings on the browser to allow .exe files to be downloaded. Confirm
that your changes are working. Change the setting back afterwards. See the section
“How the use of Squid and SmartFilter is enforced” for further information.
3. Check the operation of keyword checking on various search engines using the word
“hardcore” or “cocaine”.

Squid Access Controls
Squid provides a system of access controls to enable you to decide who gets access to what, and
when. You can manage Squid access control lists (ACLs) using Webmin. To log into Webmin, go
to https://10.<IPID>.1.1:10000 where <IPID> is your school’s IP identifier number. At the Finstall
Centre, for example, we would type in:

https://10.11.1.1:10000 or https://finstall.networcs.net:10000

Choose the Servers tab, then Squid Proxy Server, then Access Control.

The Access Control screen looks like this:

[Screenshot: the Access Control screen, annotated as follows.]

• Always be sure to use the “Apply Changes” link to ensure that your ACLs are correctly enforced.
• Access control lists: the most important are the blacklist and whitelist, but you can also add
your own for special purposes.
• Proxy restrictions: an ordered list that defines how ACLs are applied. Access checking starts
from the top.

How Access Controls work

There are two main parts to the Access Control page. On the left are Access Control lists (ACLs).
These are named definitions. For example, “Blacklist” is the name that has been given to a list of
Web Server Hostnames, and this list can be edited to suit your purposes. As such, the Blacklist
ACL is just a definition and does not determine any particular action on the part of the proxy server.

On the right is a list of “Proxy restrictions”. This is essentially an ordered list of ACLs, with each
one allowing or denying traffic according to the content of the ACL. It is important to recognise the
following facts:
• ACLs do not have any effect on the operation of the proxy server until they are incorporated
into the proxy restrictions list.
• The position in the list is critical, as processing starts at the top of the list and proceeds
down the list until a matching rule is found. This means that if you place your rule under the
“Allow all” rule, it will have no effect.
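
The effect of rule order can be sketched in Python. This is a simplified model rather than Squid's own code: each restriction here carries a single ACL name, the function name `check_access` is invented for illustration, and the fallback when nothing matches is an assumption made to keep the sketch short.

```python
def check_access(restrictions, matching_acls):
    """Return the action of the first restriction whose ACL matches.

    restrictions:  ordered list of (action, acl_name) pairs, top first
    matching_acls: set of ACL names that match the current request
    """
    for action, acl_name in restrictions:
        if acl_name in matching_acls:
            return action
    return "deny"  # assumption for this sketch: nothing matched


# A request for a blacklisted site matches both "blacklist" and "all".
request_matches = {"blacklist", "all"}

good_order = [("deny", "blacklist"), ("allow", "all")]
bad_order = [("allow", "all"), ("deny", "blacklist")]

print(check_access(good_order, request_matches))  # deny
print(check_access(bad_order, request_matches))   # allow
```

In the second ordering the “Allow all” rule is reached first, so the blacklist rule beneath it is never consulted.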

In addition to editing the blacklist, there is a range of other types of ACL that can be created. Click
on the drop-down list next to the “Create new ACL” button to see a list of the types available to you.

[Screenshot: the ACL type drop-down list.] The most commonly used type is “Web Server
Hostname”, as in the blacklist.

Before we go any further, some explanation of these ACL types might be helpful. The following
definitions were taken from the Squid website (which will explain the occasionally quirky use of
English), to which we have added our own notes, with examples from the NGfL server. Some of
the more exotic ACLs have been omitted, as you are unlikely to need to use them (if necessary,
please refer to the documentation at www.squid-cache.org). Each one corresponds with one of
the types listed in the drop-down box shown above; the drop-down name is given alongside each
Squid type name below. The syntax relates to the
configuration file squid.conf (the configuration file used by Squid to record all preferences).

Basic Source and Destination ACLs

ACL type:     src
Webmin name:  Client Address
Description:  Matches the client IP address.
Usage:        acl aclname src ip-address/netmask
Examples:     1. The whole network with address 172.16.1.0:
                 acl aclname src 172.16.1.0/24
              2. A specific single IP address:
                 acl aclname src 172.16.1.25/32
              3. The range of IP addresses from 172.16.1.25 to 172.16.1.35:
                 acl aclname src 172.16.1.25/255.255.255.255-172.16.1.35/255.255.255.255
Note:         You will need to take care with the netmask, to ensure that you are
              addressing the correct group of machines. To be sure that you are
              using the correct netmask, you might want to acquire a subnet
              calculator, such as that found in Cisco’s free “ConfigMaker” software.
              If you wish to address a single machine, use the subnet mask
              255.255.255.255.
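
If no subnet calculator is to hand, Python's standard `ipaddress` module can stand in for one. The sketch below shows which addresses a given address/netmask pair covers; the helper names `describe` and `matches` are invented for illustration, and the module only models the arithmetic, not Squid itself.

```python
import ipaddress


def describe(address, netmask):
    """Return (first address, last address, address count) for address/netmask."""
    # strict=False tolerates host bits in the address, e.g. 172.16.1.25/24.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return str(network[0]), str(network[-1]), network.num_addresses


def matches(client_ip, address, netmask):
    """True if client_ip falls inside address/netmask."""
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return ipaddress.ip_address(client_ip) in network


print(describe("172.16.1.0", "255.255.255.0"))     # ('172.16.1.0', '172.16.1.255', 256)
print(describe("172.16.1.25", "255.255.255.255"))  # ('172.16.1.25', '172.16.1.25', 1)
```

The netmask may be given either in dotted form (255.255.255.0) or as a prefix length (24); both describe the same group of machines.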

ACL type:     dst
Webmin name:  Webserver address
Description:  The same as src, except that it refers to the server IP address.
              Squid first carries out a DNS lookup for the IP address of the
              domain name in the request header, and then interprets this ACL.

ACL type:     srcdomain
Webmin name:  Client Hostname
Description:  This can be used to control requests from other domains, perhaps to
              prevent other schools from using your Squid proxy!
              Squid must carry out a reverse DNS lookup (from client IP address to
              client domain name) before this ACL is interpreted, which adds some
              delay to the request.
Usage:        acl aclname srcdomain domain-name
Example:      acl aclname srcdomain .kovaiteam.com
Note:         The leading “.” is important (see the note on Webserver Hostname below).

ACL type:     dstdomain
Webmin name:  Webserver Hostname
Description:  This is the ACL type used by the blacklist to control access to sites
              that have not been filtered by SmartFilter.
Usage:        acl aclname dstdomain domain-name
Example:      acl aclname dstdomain .kovaiteam.com
              This looks for *.kovaiteam.com in the URL.
Note:         The leading “.” is important.

              You can force the proxy to match multiple servers on the
              same domain by prefixing it with a dot and omitting the hostname. For
              example,

              .bbc.co.uk

              would match both of these:

              www.bbc.co.uk
              news.bbc.co.uk

              Also, do not enter the protocol part of the URL into this list. In other
              words, this is correct:

              www.ibsed.networcs.net

              but not:

              http://www.ibsed.networcs.net
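
The leading-dot rule can be approximated in Python as follows. This is a sketch of the behaviour described in the note, not Squid's implementation, and the function name `dstdomain_matches` is invented for illustration.

```python
def dstdomain_matches(entry, host):
    """Approximate the dstdomain test for one list entry.

    An entry beginning with "." matches the domain itself and any host
    beneath it; any other entry must equal the host exactly.
    """
    entry = entry.lower()
    host = host.lower()
    if entry.startswith("."):
        return host == entry[1:] or host.endswith(entry)
    return host == entry


print(dstdomain_matches(".bbc.co.uk", "www.bbc.co.uk"))      # True
print(dstdomain_matches(".bbc.co.uk", "news.bbc.co.uk"))     # True
print(dstdomain_matches("www.bbc.co.uk", "news.bbc.co.uk"))  # False
```

Because the comparison is made against the hostname alone, an entry that still carries “http://” can never match, which is why the protocol must be left out of the list.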

Regular Expression ACLs

“Regex”-type ACLs use pattern matching to control access. Typically, you will only need to enter a
single word to match against a domain name to achieve a result.

ACL type:     srcdom_regex
Webmin name:  Client Regexp
Description:  This could be used to control accesses from stations with names
              containing a particular character string, although you are probably
              more likely to want to achieve this using IP addresses.
              Squid must carry out a reverse DNS lookup (from client IP address to
              client domain name) before this ACL is interpreted, which adds some
              delay to the request and may slow the display of pages.
Usage:        acl aclname srcdom_regex pattern
Example:      acl aclname srcdom_regex kovai
              This looks for the word “kovai” in the client domain name.

ACL type:     dstdom_regex
Webmin name:  Webserver Regexp
Description:  This can be used to find website requests containing a specific
              character string.
Usage:        acl aclname dstdom_regex pattern
Example:      acl aclname dstdom_regex kovai
              This looks for the word “kovai” in the destination domain name.

              Similarly, the pattern “cannabis” will find www.cannabis.com.

ACL type:     url_regex
Webmin name:  URL Regexp
Description:  Searches the entire URL for the regular expression you specify.
              Note that these regular expressions are case-sensitive.
Usage:        acl aclname url_regex pattern
Example:      acl ACLREG url_regex cooking
              ACLREG matches URLs containing “cooking”, but not “Cooking”.

              Similarly, the pattern “radio” will find the word anywhere in the URL; it will
              find:
              www.radio.com
              www.bbc.co.uk/radio3
              etc.

ACL type:     urlpath_regex
Webmin name:  URL Path Regexp
Description:  Matches a regular expression against the URL with the protocol and
              hostname removed. Note that these regular expressions are
              case-sensitive.
Usage:        acl aclname urlpath_regex pattern
Example:      acl ACLPATHREG urlpath_regex cooking
              ACLPATHREG matches only paths containing “cooking”, not “Cooking”;
              the protocol and hostname are not examined.
              If the URL is http://www.visolve.com/folder/subdir/cooking/first.html then
              this ACL type looks only at the part after http://www.visolve.com/ .

              Similarly, the pattern “news” will find:
              www.bbc.co.uk/news
              but not:
              www.news.com
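
The difference between the three URL-based regex types comes down to which part of the URL the pattern is tested against. The Python sketch below illustrates this under some assumptions: it uses Python's `re` module rather than Squid's own regex library (the two agree for simple words like these), the function name `regex_acl_matches` is invented, and www.example.org is a placeholder host.

```python
import re
from urllib.parse import urlsplit


def regex_acl_matches(acl_type, pattern, url):
    """Apply a case-sensitive pattern to the part of the URL each type examines."""
    parts = urlsplit(url)
    if acl_type == "dstdom_regex":
        target = parts.hostname or ""      # hostname only
    elif acl_type == "url_regex":
        target = url                       # the entire URL
    elif acl_type == "urlpath_regex":
        # everything after the protocol and hostname
        target = parts.path + ("?" + parts.query if parts.query else "")
    else:
        raise ValueError(f"unsupported ACL type: {acl_type}")
    return re.search(pattern, target) is not None


print(regex_acl_matches("url_regex", "radio", "http://www.bbc.co.uk/radio3"))   # True
print(regex_acl_matches("urlpath_regex", "news", "http://www.bbc.co.uk/news"))  # True
print(regex_acl_matches("urlpath_regex", "news", "http://www.news.com/"))       # False
```

The last call returns False because “news” appears only in the hostname, which urlpath_regex does not examine.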

ACL type:     browser
Webmin name:  Browser Regexp
Description:  Regular expression pattern matching on the request's user-agent
              header.
Usage:        acl aclname browser pattern
Example:      acl aclname browser MOZILLA
              This matches requests from browsers that have the keyword “MOZILLA”
              in the user-agent header.

              This example could be used to prevent users from using the Mozilla
              browser.

Other ACL Types

ACL type:     time
Webmin name:  Date and Time
Description:  Time of day, and day of week.
Usage:        acl aclname time [day-abbrevs] [h1:m1-h2:m2]
              day-abbrevs:
              S - Sunday
              M - Monday
              T - Tuesday
              W - Wednesday
              H - Thursday
              F - Friday
              A - Saturday
              h1:m1 must be less than h2:m2
Example:      acl ACLTIME time M 9:00-17:00
              ACLTIME refers to Mondays from 9:00 to 17:00.

              An ACL of this type listing all five weekdays (MTWHF) with a lunchtime
              range would find accesses occurring during weekday lunchtimes.
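
The way a time ACL is tested against the clock can be sketched in Python. This is an illustration rather than Squid's code: the function name `time_acl_matches` is invented, and treating the end time as inclusive is an assumption.

```python
from datetime import datetime

# Squid day letters in the order used by datetime.weekday() (Monday is 0).
DAY_LETTERS = "MTWHFAS"


def time_acl_matches(days, time_range, when):
    """True if `when` falls on one of `days` and inside `time_range`.

    days:       Squid day abbreviations, e.g. "MTWHF"
    time_range: "h1:m1-h2:m2", with the start earlier than the end
    """
    if DAY_LETTERS[when.weekday()] not in days:
        return False
    start_text, end_text = time_range.split("-")
    start = tuple(int(part) for part in start_text.split(":"))
    end = tuple(int(part) for part in end_text.split(":"))
    # Assumption: both ends of the range are treated as inclusive.
    return start <= (when.hour, when.minute) <= end


# 13 May 2002 was a Monday.
print(time_acl_matches("M", "9:00-17:00", datetime(2002, 5, 13, 10, 30)))  # True
print(time_acl_matches("M", "9:00-17:00", datetime(2002, 5, 14, 10, 30)))  # False
```

Note that Thursday and Saturday use the letters H and A, because T and S are already taken by Tuesday and Sunday.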

ACL type:     port
Webmin name:  URL Port
Description:  Access can be controlled by destination (server) port address.
Usage:        acl aclname port port-no
Example:      This example allows http_access only to the destination
              172.16.1.115:80 from network 172.16.1.0:
              acl acceleratedhost dst 172.16.1.115/255.255.255.255
              acl acceleratedport port 80
              acl mynet src 172.16.1.0/255.255.255.0
              http_access allow acceleratedhost acceleratedport mynet
              http_access deny all

              The default NGfL server Squid proxy setup uses an ACL of this type to
              identify connections over secure sockets.

ACL type:     proto
Webmin name:  URL Protocol
Description:  This specifies the transfer protocol.
Usage:        acl aclname proto protocol
Example:      acl aclname proto HTTP FTP
              This refers to the protocols HTTP and FTP.

              An ACL of this type appears in the default NGfL server Squid proxy setup.

ACL type:     method
Webmin name:  Request method
Description:  This specifies the method of the request.
Usage:        acl aclname method method-type
Example:      acl aclname method GET POST
              This refers to the GET and POST methods only.

              The default Squid proxy setup contains only one ACL of this type.
              You are unlikely to need to use it.

An Example

Let’s put together an ACL to prevent access to the Internet from a particular machine (or group of
machines) at a particular time of day. In this case, the time of day will be lunchtime every
weekday.

To start with, we create a Date and Time ACL:

Choose Date and Time from the drop-down list, then click on “Create new ACL”.

Now define the days and times of day you want your ACL to refer to:

• Click on the “Selected” radio button, then select the days you want the list to apply to.
• Click on the radio button to the left of the first time box, then enter the start and end times.

After you have clicked on the “Save” button, your list will look something like this:

[Screenshot: the ACL list with the new ACL in place.]

Next, we will create an ACL to define the station (or group of stations) to which we want the ACL to
apply. Note that this step would be unnecessary if we wanted the rule to apply to all stations on the
network.

Create a new “Client Address” ACL. To do this, you must give the first address in the range, plus a
subnet mask. If there is only to be one machine in the range (as in this example), you can leave
the “To IP” box empty. If you want to create a list of several ad-hoc addresses, this can be done
by saving single addresses one by one.

Finally, you need to join these two ACLs together in a single proxy restriction, and move it into the
appropriate place in the list to create the desired effect. Click on the “Add proxy restriction” link at
the bottom of the Proxy restrictions list.

Select both the “lunchtimes” list and the station56 list, and ensure that the “Deny” radio button is
selected, then click on Save.

[Screenshot: both of the new ACLs have been selected. The rule will be looking for requests from
“station56” happening during “lunchtimes”.]

Now this list must be moved into position, using the arrows on the right:

[Screenshot: the new list is positioned above “Allow all” to ensure that it has the desired effect.]
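
The finished rule can be modelled in Python to check that it behaves as intended. This is a sketch only: the IP address given to station56 and the lunchtime hours are hypothetical values chosen for illustration, and the function names are invented. The point it shows is that a restriction listing two ACLs applies only when both match.

```python
from datetime import datetime

STATION56 = "10.11.1.56"                   # hypothetical address for the station56 ACL
LUNCH_START, LUNCH_END = (12, 0), (13, 0)  # hypothetical lunchtime range


def acl_matches(name, client_ip, when):
    """Evaluate one of the three ACLs used in this example."""
    if name == "all":
        return True
    if name == "station56":
        return client_ip == STATION56
    if name == "lunchtimes":
        is_weekday = when.weekday() < 5  # Monday to Friday
        return is_weekday and LUNCH_START <= (when.hour, when.minute) <= LUNCH_END
    raise ValueError(f"unknown ACL: {name}")


# Ordered proxy restrictions: the new rule sits above "Allow all".
RESTRICTIONS = [
    ("deny", ["lunchtimes", "station56"]),
    ("allow", ["all"]),
]


def decide(client_ip, when):
    """Return the action of the first restriction whose ACLs all match."""
    for action, acl_names in RESTRICTIONS:
        if all(acl_matches(name, client_ip, when) for name in acl_names):
            return action
    return "deny"  # assumption for this sketch: nothing matched


print(decide("10.11.1.56", datetime(2002, 5, 13, 12, 30)))  # deny
print(decide("10.11.1.56", datetime(2002, 5, 13, 14, 0)))   # allow
print(decide("10.11.1.57", datetime(2002, 5, 13, 12, 30)))  # allow
```

Only the first request is refused: it is the only one that comes from station56 and falls within a weekday lunchtime.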

Squid Proxy Exercises

1. Set up a new blacklist, and test it to ensure that it works correctly.
2. Set up a whitelist and test it to ensure that it works correctly.
3. Set up a list to filter out sites with a specific word in the domain name.
4. Set up a list to filter out sites with a specific word in the path.
5. Set up a list to stop a group of computers in the room from accessing the Internet.
6. Configure Squid to prevent any web access between set hours.
7. Create a whitelist that only applies to a group of workstations, and make three websites
available to users of those stations. Can you make this list apply on Monday mornings
only?
8. Implement a failure URL.

How the use of Squid and SmartFilter is enforced
SmartFilter and Squid proxy Web access restrictions are only in force if the browser is using the
local NGfL server as a proxy. In Internet Explorer 6, the settings can be found in Tools – Internet
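Options – Connections – LAN Settings. The setting must refer to port 3128 as follows:

[Screenshot: the LAN Settings dialog, with the proxy server set to the local NGfL server on port 3128.]
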
If this setting is not present, the user will have unfiltered access to the Web, unless firewall
restrictions on the NGfL server have been implemented to prevent this. On many school networks,
the administrator has the ability to enforce proxy settings, so firewall settings of this type are not
required.
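
The same principle applies to any program that fetches web pages, not just the browser: it is filtered only if it is told to use the proxy. As an illustration, the Python sketch below configures the standard `urllib` library to send requests through port 3128. The proxy address is an assumption: it reuses the Finstall Centre address given earlier for Webmin, on the supposition that Squid runs on the same NGfL server.

```python
import urllib.request

# Assumption: the NGfL server answers on 10.11.1.1, the Finstall Centre
# address used earlier for Webmin. Substitute your own school's address.
PROXY_URL = "http://10.11.1.1:3128"

proxy_handler = urllib.request.ProxyHandler({"http": PROXY_URL})
opener = urllib.request.build_opener(proxy_handler)

# opener.open("http://www.worcestershire.gov.uk/") would now be sent through
# the proxy, and so be subject to the Squid and SmartFilter rules.
```

A program that built its opener without the proxy handler would bypass filtering in the same way as a browser with the setting removed, unless firewall restrictions prevent it.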

Using Log Files
Your NGfL server keeps a continuous log of Website accesses. A typical portion of the log might
look something like this:
1021296336.929 3857 10.11.1.56 TCP_MISS/200 3807 GET http://services.postcodeanywhere.co.uk/form.asp? - ROUNDROBIN_PARENT/cache1.networcs.net text/html
1021296337.032 103 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_house_top.jpg - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296337.108 102 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_boat.jpg - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296337.166 134 10.11.1.56 TCP_CLIENT_REFRESH_MISS/304 230 GET http://www.worcestershire.gov.uk/home/content_tag.gif - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296337.285 83 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_house_bottom.jpg - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296338.580 1605 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_middle_text.gif - ROUNDROBIN_PARENT/cache1.networcs.net text/html
1021296338.641 1632 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_house_middle.jpg - ROUNDROBIN_PARENT/cache1.networcs.net text/html
1021296338.643 1476 10.11.1.56 TCP_CLIENT_REFRESH_MISS/304 230 GET http://www.worcestershire.gov.uk/home/spacer_line.gif -
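
Each log entry is a single line of space-separated fields. The Python sketch below splits an entry into named fields and counts requests per destination host. It is an illustration only: the field names are descriptive labels chosen here, the function names are invented, and the sample entries are the first two from the extract above.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlsplit

SAMPLE_LINES = [
    "1021296336.929 3857 10.11.1.56 TCP_MISS/200 3807 GET "
    "http://services.postcodeanywhere.co.uk/form.asp? - "
    "ROUNDROBIN_PARENT/cache1.networcs.net text/html",
    "1021296337.032 103 10.11.1.56 TCP_MISS/304 230 GET "
    "http://www.worcestershire.gov.uk/home/jo_house_top.jpg - "
    "ROUNDROBIN_PARENT/cache2.networcs.net text/html",
]


def parse_log_line(line):
    """Split one access-log entry into named fields."""
    fields = line.split()
    return {
        "time": datetime.fromtimestamp(float(fields[0]), tz=timezone.utc),
        "elapsed_ms": int(fields[1]),
        "client": fields[2],
        "result": fields[3],        # cache result / HTTP status
        "bytes": int(fields[4]),
        "method": fields[5],
        "url": fields[6],
        "user": fields[7],          # "-" when no user name was recorded
        "hierarchy": fields[8],     # how and where the request was forwarded
        "content_type": fields[9],
    }


def requests_per_host(lines):
    """Count requests per destination hostname."""
    return Counter(urlsplit(parse_log_line(line)["url"]).hostname for line in lines)


entry = parse_log_line(SAMPLE_LINES[0])
print(entry["client"], entry["result"], entry["time"].date())
print(requests_per_host(SAMPLE_LINES).most_common())
```

The first field is the time of the request in seconds since 1 January 1970, which is why it needs converting before it can be read as a date.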

Calamaris

In its raw form, the log is of little use. However, you are provided with a log analysis tool known as Calamaris
(website at http://calamaris.cord.de/ ). This provides the following:

• Summary
• Incoming requests by method
• Incoming UDP-requests by status
• Incoming TCP-requests by status
• Outgoing requests by status
• Outgoing requests by destination
• Request-destinations by 2ndlevel-domain
• Request-destinations by toplevel-domain
• TCP-Request-protocol
• Requested content-type
• Requested extensions
• Incoming UDP-requests by host
• Incoming TCP-requests by host
• Performance in 60 minute steps

Here is an example of the 2nd level domain report:

[Screenshot: a Calamaris “Request-destinations by 2ndlevel-domain” report.]

To gain access to Calamaris, choose the “Servers” tab in Webmin, then choose “Squid Proxy”, and
finally “Calamaris Log Analysis”:

This should give you a good idea of how your proxy is being used, and the type of sites that are
being most frequently accessed. However, should you wish to have direct access to the log files
themselves, this can be arranged on a weekly basis (beware – a compressed archive from a
week’s log file can easily be 10MB…). Please contact the Broadband Support Team if you wish to
pursue this.

Exercise

Browse the Calamaris Log Analysis page for your school and determine the most popular sites.
