Manual
Broadband Filtering
Contents
Overview of Web Filtering on Worcestershire Broadband
SmartFilter configuration at County Hall
Changing SmartFilter Categorisation
SmartFilter Configuration
Policy groups
Global restrictions
Site exemptions
Search engine keyword checking
Exercises
Squid Access Controls
How Access Controls work
Basic Source and Destination ACLs
Regular Expression ACLs
Other ACL Types
An Example
Squid Proxy Exercises
How the use of Squid and SmartFilter is enforced
Using Log Files
Calamaris
Exercise
Overview of Web Filtering on Worcestershire Broadband
Decisions over filtering take place in three locations:
1. SecureComputing, the company that supplies SmartFilter, the County’s main web filtering
product.
2. County Hall – where SmartFilter is configured and adjusted according to the requirements
of Worcestershire schools.
3. Your schools – where sensible configuration of the Squid proxy server can considerably
enhance the effectiveness of filtering, and allow you to manage users’ access to websites in
a variety of ways.
The operation of filtering and management may be visualised with the help of the following
diagram:
Diagram: how filtering decisions flow
• Secure Computing: maintains the Control List, which is downloaded to County Hall twice-weekly.
• County Hall broadband team: maintains config.txt, search.txt and site.txt.
• School: configures Squid ACLs to restrict access. Schools can ask County Hall for exemptions or additions to site.txt, and can request re-categorisation of a site on the SecureComputing website.
SmartFilter configuration at County Hall
The SmartFilter software runs on County Hall’s Web cache servers. It uses a list of “allowed” or
“denied” rules to determine which sites can be accessed by users. These rules relate to the
categories which SecureComputing have applied to each site, and are as follows:
Category Permission
1 Anonymizers/Translators (an) denied
2 Art and Culture (ac) allowed
3 Chat (ch) denied
4 Criminal Skills (cs) denied
5 Cults/Occult (oc) denied
6 Dating (mm) denied
7 Drugs (dr) denied
8 Entertainment (et) allowed
9 Extreme/Obscene/Violence (ex) denied
10 Gambling (gb) denied
11 Games (gm) denied
12 General News (nw) allowed
13 Hate Speech (hs) denied
14 Humor (hm) allowed
15 Investing (in) allowed
16 Job Search (js) allowed
17 Lifestyle (lf) denied
18 Mature (mt) denied
19 MP3 Sites (mp) denied
20 Nudity (nd) allowed
21 On-line Sales (os) allowed
22 Personal Pages (pp) allowed
23 Politics, Opinion, and Religion (po) allowed
24 Portal Sites (ps) allowed
25 Self-Help/Health (sh) allowed
26 Sex (sx) denied
27 Sports (sp) allowed
28 Travel (tr) allowed
29 Usenet News (na) denied
30 Webmail (wm) allowed
Changing SmartFilter Categorisation
You can view the categorisation of any website by visiting the SecureComputing
“SmartFilterWhere” page at the following URL:
http://www.securecomputing.com/cgi-bin/filter_whereV3.cgi
The page looks like this:
Use one or more of the URL fields to request categorisation information. A typical result might look
like this:
SmartFilter Configuration
The SmartFilter installation at County Hall uses three main configuration files to control the
operation of the filter: config.txt, site.txt and search.txt.
Policy groups
The config.txt file at County Hall allows the creation of “policy groups”. Each group can have
its own filtering regime, based on the 30 categories.
It is possible that these could be extended in future, for example, to take account of the needs of
different types of school. The allow/deny scheme for schools has been listed above. In the case
of libraries (at the time of writing), only the Sex and Gambling categories are denied.
The policy groups are also responsible for defining any denied file types. Currently, school users
accessing the Web via the proxy are not permitted to download .exe or .zip files.
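The policy-group idea can be modelled in a few lines of Python. This is a hypothetical sketch, not SmartFilter’s actual config.txt syntax: the category codes come from the category table, the library settings follow the text above, and the rule that a site is blocked if any one of its categories is denied is an assumption.

```python
# Hypothetical model of SmartFilter policy groups; NOT the real config.txt format.
# Category codes are the two-letter codes from the category table.

SCHOOL_DENIED = {
    "an", "ch", "cs", "oc", "mm", "dr", "ex", "gb",
    "gm", "hs", "lf", "mt", "mp", "sx", "na",
}

POLICY_GROUPS = {
    "schools": {
        "denied_categories": SCHOOL_DENIED,
        "denied_extensions": {".exe", ".zip"},
    },
    # Libraries: only Sex and Gambling are denied. No denied file types are
    # assumed here, because the text only mentions them for schools.
    "libraries": {
        "denied_categories": {"sx", "gb"},
        "denied_extensions": set(),
    },
}


def is_allowed(group: str, categories: set, path: str) -> bool:
    """Decide whether a request is allowed for a given policy group.

    Assumption: a site is blocked if ANY of its categories is denied.
    """
    policy = POLICY_GROUPS[group]
    if categories & policy["denied_categories"]:
        return False
    # File-type check on the requested path, ignoring case.
    lowered = path.lower()
    return not any(lowered.endswith(ext) for ext in policy["denied_extensions"])


print(is_allowed("schools", {"ch"}, "/index.html"))    # False: Chat is denied for schools
print(is_allowed("libraries", {"ch"}, "/index.html"))  # True: Chat is allowed for libraries
```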
Global restrictions
The config.txt file also sets a number of global restrictions, the most important of which is
probably whether sites may be accessed by IP address rather than by name. This is currently
permitted, as library users must have access to Hotmail accounts, which make extensive use of IP
address links.
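The IP-address restriction depends on telling a URL such as http://192.0.2.10/ apart from one that uses a hostname. The following minimal Python sketch shows that test; it assumes nothing about how SmartFilter implements the check internally, and the example address is a reserved documentation address.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def uses_ip_address(url: str) -> bool:
    """Return True if the URL names its server by IP address rather than hostname."""
    host = urlsplit(url).hostname  # lower-cased host, without port or brackets
    if host is None:
        return False
    try:
        ip_address(host)  # raises ValueError for ordinary hostnames
    except ValueError:
        return False
    return True


print(uses_ip_address("http://192.0.2.10/mail/inbox"))  # True
print(uses_ip_address("http://www.hotmail.com/"))       # False
```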
Site exemptions
The site.txt file on the County Hall cache servers allows us to add sites that have not been
categorised by SmartFilter, or to override the control list. For example, the site:
http://www.hardcorevideos.org
http://www.thinkquest.org
is exempted in the same file, as it would otherwise be banned in schools under the Chat category.
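The precedence described above, where a site.txt entry wins over the SecureComputing categorisation, can be sketched as follows. Apart from www.thinkquest.org, all hostnames are hypothetical, the denied-category list is abbreviated, and treating uncategorised sites as allowed is an assumption.

```python
# Hypothetical model of site.txt overriding the SecureComputing control list.

# site.txt: explicit decisions that take precedence over categorisation.
SITE_OVERRIDES = {
    "www.thinkquest.org": "allow",        # exempted; otherwise blocked as Chat
    "uncategorised.example.org": "deny",  # hypothetical uncategorised site added
}

# Control list: hostname -> set of category codes (illustrative entries only).
CONTROL_LIST = {
    "www.thinkquest.org": {"ch"},
    "chat.example.com": {"ch"},
}

SCHOOL_DENIED = {"ch", "gb", "sx"}  # abbreviated subset of the schools' denied codes


def decide(host: str) -> str:
    # 1. An entry in site.txt overrides any categorisation.
    if host in SITE_OVERRIDES:
        return SITE_OVERRIDES[host]
    # 2. Otherwise deny if any of the site's categories is denied.
    #    Uncategorised sites are allowed here (an assumption).
    categories = CONTROL_LIST.get(host, set())
    return "deny" if categories & SCHOOL_DENIED else "allow"
```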
Search engine keyword checking
The search.txt file allows checking to take place on users’ entries into certain search engines. The
list of search engines currently in force is as follows:
*google.com
*google.de
*google.fr
*google.co.uk
*google.it
*google.ca
*google.co.jp
*google.co.kr
*go.com
*infoseek.com
*altavista.digital.com
*altavista.com
*altavista.senet.com.au
*altavistacanada.com
*altavista.magallanes.net
*altavista.skali.com.my
*altavista.yellowpages.com.au
*austronaut.ims.at
*lycos.com
*yahoo.com
*yahoo.dk
*yahoo.fr
*yahoo.de
*yahoo.it
*yahoo.no
*yahoo.es
*yahoo.se
*yahoo.com.au
*yahoo.co.uk
*yahoo.co.jp
*yahoo.co.kr
*yahoo.com.sg
*excite.com
*excite.de
*excite.co.jp
*excite.co.uk
*mckinley.com
*webcrawler.com
*hotbot.com
*dejanews.com
*nlightn.com
*snap.com
*whoizzy.com
Entries into any of these search engines are matched against a list of proscribed keywords.
Positive matches will result in an “Access Denied by SmartFilter” message, with an indication as to
the category.
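The two-stage check (is this a listed search engine, and does the query contain a proscribed word?) can be sketched in Python. The search-engine patterns are copied from the list above; the keyword set is a hypothetical stand-in for the real list, and checking whole words in every query-string value is an assumption about how SmartFilter reads the query.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, urlsplit

# A few patterns copied from the search.txt list above.
SEARCH_ENGINES = ["*google.co.uk", "*yahoo.co.uk", "*altavista.com"]

# Hypothetical stand-in for the real keyword list.
BANNED_KEYWORDS = {"hardcore", "cocaine"}


def search_is_blocked(url: str) -> bool:
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Only requests to a listed search engine are keyword-checked.
    if not any(fnmatchcase(host, pattern) for pattern in SEARCH_ENGINES):
        return False
    # Check every query-string value, word by word, ignoring case.
    for values in parse_qs(parts.query).values():
        for value in values:
            if BANNED_KEYWORDS & set(value.lower().split()):
                return True
    return False
```

Note that a request to an unlisted site is never keyword-checked, however its query string reads.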
Exercises
1. Use SmartFilterWhere to check the categorisation of two or three websites you know. If
you disagree with the categorisation, request a suitable change.
2. Change the proxy settings on the browser so that .exe files can be downloaded. Confirm
that your changes are working, then change the setting back afterwards. See the section
“How the use of Squid and SmartFilter is enforced” for further information.
3. Check the operation of the keyword checking on various search engines using the word
“hardcore” or “cocaine”.
Squid Access Controls
Squid provides a system of access controls to enable you to decide who gets access to what, and
when. You can manage Squid access control lists (ACLs) using Webmin. To log into Webmin, go
to https://10.<IPID>.1.1:10000, where <IPID> is your school’s IP identifier number. At the Finstall
Centre, for example, we would type in:
https://10.11.1.1:10000 or https://finstall.networcs.net:10000
Then choose Access Control:
The Access Control screen looks like this:
Always be sure to use the “Apply Changes” link to ensure that your ACLs are correctly enforced.
How Access Controls work
There are two main parts to the Access Control page. On the left are Access Control lists (ACLs).
These are named definitions. For example, “Blacklist” is the name that has been given to a list of
Web Server Hostnames, and this list can be edited to suit your purposes. As such, the Blacklist
ACL is just a definition and does not determine any particular action on the part of the proxy server.
On the right is a list of “Proxy restrictions”. This is essentially an ordered list of ACLs, with each
one allowing or denying traffic according to the content of the ACL. It is important to recognise the
following facts:
• ACLs do not have any effect on the operation of the proxy server until they are incorporated
into the proxy restrictions list.
• The position in the list is critical, as processing starts at the top of the list and proceeds
down the list until a matching rule is found. This means that if you place your rule under the
“Allow all” rule, it will have no effect.
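The first-match behaviour described above can be modelled in a short Python sketch. Each proxy restriction is an action plus a list of ACLs that must all match; the blacklist ACL and hostnames are hypothetical. The fallback when nothing matches follows Squid’s documented http_access rule (no rules means deny; otherwise the opposite of the last rule’s action).

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]
Acl = Callable[[Request], bool]
Rule = Tuple[str, List[Acl]]  # ("allow" or "deny", ACLs that must all match)


def check_access(rules: List[Rule], request: Request) -> str:
    """Walk the proxy restrictions from the top; the first full match wins."""
    for action, acls in rules:
        if all(acl(request) for acl in acls):
            return action
    # Squid's documented fallback: with no rules, deny; otherwise use the
    # opposite of the last rule's action.
    if not rules:
        return "deny"
    return "deny" if rules[-1][0] == "allow" else "allow"


# Hypothetical ACLs for illustration.
def blacklist(request: Request) -> bool:
    return request["host"].endswith(".badsite.example")


def all_requests(request: Request) -> bool:
    return True


correct_order = [("deny", [blacklist]), ("allow", [all_requests])]
wrong_order = [("allow", [all_requests]), ("deny", [blacklist])]  # deny never reached
```

With wrong_order, a request for www.badsite.example is allowed, because “allow all” matches first.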
In addition to editing the blacklist, you can create a range of other types of ACL. Click on the
drop-down list next to the “Create new ACL” button to see the types available to you.
Before we go any further, some explanation of these ACL types might be helpful. The following
definitions were taken from the Squid website (which will explain the occasionally quirky use of
English), to which we have added our own notes, with examples from the NGfL server. Some of
the more exotic ACLs have been omitted, as you are unlikely to need to use them (if necessary,
please refer to the documentation at www.squid-cache.org). Each one corresponds to one of
the types listed in the drop-down box shown above (the Webmin name is given alongside each
ACL type below). The syntax relates to squid.conf, the configuration file in which Squid records
all its preferences.
Basic Source and Destination ACLs
Src (Client Address)
This matches the client IP address.
Usage: acl aclname src ip-address/netmask
Example:
1. This refers to the whole network with address 172.16.1.0:
acl aclname src 172.16.1.0/24
2. This refers to a single specific IP address:
acl aclname src 172.16.1.25/32
3. This refers to a range of IP addresses, from 172.16.1.25 to 172.16.1.35:
acl aclname src 172.16.1.25-172.16.1.35/255.255.255.255
Note: You will need to take care with the netmask, to ensure that you are addressing the correct
group of machines. To be sure that you are using the correct netmask, you might want to acquire
a subnet calculator, such as that found on Cisco’s free “ConfigMaker” software. If you wish to
address a single machine, use the subnet mask 255.255.255.255.
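As a stand-in for a subnet calculator, Python’s standard ipaddress module can show exactly which addresses a src specification covers. This is only a checking aid, not part of Squid or Webmin; the range test mimics the address-range form in example 3.

```python
from ipaddress import IPv4Address, IPv4Network


def describe(spec: str):
    """Return (first address, last address, number of addresses) for a
    network given as address/prefix or address/netmask."""
    net = IPv4Network(spec)  # strict: raises ValueError if host bits are set
    return str(net.network_address), str(net.broadcast_address), net.num_addresses


def in_range(addr: str, first: str, last: str) -> bool:
    """True if addr lies in an inclusive range such as 172.16.1.25-172.16.1.35."""
    return IPv4Address(first) <= IPv4Address(addr) <= IPv4Address(last)


print(describe("172.16.1.0/24"))                # ('172.16.1.0', '172.16.1.255', 256)
print(describe("172.16.1.0/255.255.255.0"))     # same network, dotted netmask form
print(describe("172.16.1.25/255.255.255.255"))  # ('172.16.1.25', '172.16.1.25', 1)
```

Calling describe("172.16.1.25/24") raises a ValueError because host bits are set, which is a quick way to catch a mistyped netmask.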
Dst (Webserver Address)
This is the same as src, except that it refers to the server’s IP address. Squid first carries out a
DNS lookup to find the IP address of the domain name in the request header, and then interprets
this ACL.
Srcdomain (Client Hostname)
This can be used to control requests from other domains, perhaps to prevent other schools from
using your Squid proxy! Squid has to carry out a reverse DNS lookup (from the client IP address
to the client domain name) before this ACL is interpreted, and this lookup adds some delay to each
request.
Usage: acl aclname srcdomain domain-name
Example: acl aclname srcdomain .kovaiteam.com
Note: Here the leading “.” is important (see the note on Webserver Hostname below).
Dstdomain (Webserver Hostname)
This is the ACL type used by the blacklist to control access to sites that have not been filtered by
SmartFilter.
Usage: acl aclname dstdomain domain-name
Example: acl aclname dstdomain .kovaiteam.com
This matches *.kovaiteam.com in the URL.
Note: Here the leading “.” is important. You can make the proxy match multiple servers in the
same domain by prefixing the domain with a dot and omitting the hostname. For example,
.bbc.co.uk will match both
www.bbc.co.uk
news.bbc.co.uk
Also, do not enter the protocol part of the URL into this list. In other words, this is correct:
www.ibsed.networcs.net
but this is not:
http://www.ibsed.networcs.net
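The effect of the leading dot can be sketched in Python. This is a simplified model of dstdomain matching for a single list entry, assuming (as Squid’s documentation describes) that a leading dot matches the domain itself and every host beneath it, while an entry without the dot matches only that exact host.

```python
def dstdomain_matches(entry: str, host: str) -> bool:
    """Sketch of dstdomain matching for one list entry against a request host.

    A leading dot matches the domain itself and every host under it;
    an entry without a leading dot matches that exact host only.
    """
    host = host.lower().rstrip(".")
    entry = entry.lower()
    if entry.startswith("."):
        return host == entry[1:] or host.endswith(entry)
    return host == entry
```

For example, dstdomain_matches(".bbc.co.uk", "news.bbc.co.uk") is True, while an entry typed as http://www.ibsed.networcs.net never matches the host www.ibsed.networcs.net, which is why the protocol must be left out.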
Regular Expression ACLs
“Regex”-type ACLs use pattern matching to control access. Typically, you will only need to enter a
single word to match against a domain name to achieve a result.
srcdom_regex (Client Regexp)
This could be used to control access from stations whose names contain a particular character
string, although you are probably more likely to want to achieve this using IP addresses. Squid
has to carry out a reverse DNS lookup (from the client IP address to the client domain name)
before this ACL is interpreted, so this type of ACL may introduce delays into the display of pages.
Usage: acl aclname srcdom_regex pattern
Example: acl aclname srcdom_regex kovai
This matches any client whose domain name contains the string “kovai”.
url_regex (URL Regexp)
The url_regex type searches the entire URL for the regular expression you specify. Note that these
regular expressions are case-sensitive.
Usage: acl aclname url_regex pattern
Example: acl ACLREG url_regex cooking
ACLREG matches URLs containing “cooking”, but not “Cooking”.
In the same way, acl aclname url_regex radio will find the word “radio” anywhere in the URL; it
will match, for example:
www.radio.com
www.bbc.co.uk/radio3
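Case sensitivity is the usual surprise with url_regex. The sketch below uses Python’s re module as a stand-in for Squid’s regular-expression library; for simple word patterns like these the behaviour is the same. Squid’s regex ACL types also accept a -i option for case-insensitive matching (for example, acl aclname url_regex -i cooking), and the ignore_case flag here mimics that.

```python
import re


def url_regex_matches(pattern: str, url: str, ignore_case: bool = False) -> bool:
    """Search the whole URL for the pattern, case-sensitively by default."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, url, flags) is not None


print(url_regex_matches("cooking", "http://www.example.com/Cooking/"))        # False
print(url_regex_matches("cooking", "http://www.example.com/Cooking/", True))  # True
```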
urlpath_regex (URL Path Regexp)
The urlpath_regex type applies regular expression pattern matching to the URL, but without the
protocol and hostname. Note that these regular expressions are case-sensitive.
Usage: acl aclname urlpath_regex pattern
Example: acl ACLPATHREG urlpath_regex cooking
ACLPATHREG matches only URL paths containing “cooking” (not “Cooking”), ignoring the
protocol and hostname.
If the URL is http://www.visolve.com/folder/subdir/cooking/first.html, this ACL type ignores
http://www.visolve.com and only examines /folder/subdir/cooking/first.html, so the example above
matches it.
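The difference between url_regex and urlpath_regex comes down to which part of the URL is searched. In this Python sketch, the URL path is taken to be everything after the hostname, including any query string; the www.cooking.com address is a hypothetical example.

```python
import re
from urllib.parse import urlsplit


def urlpath_part(url: str) -> str:
    """Return the part of a URL after the protocol and hostname (path plus query)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def urlpath_regex_matches(pattern: str, url: str) -> bool:
    # Like urlpath_regex: search only the path part.
    return re.search(pattern, urlpath_part(url)) is not None


def url_regex_matches(pattern: str, url: str) -> bool:
    # Like url_regex: search the whole URL, hostname included.
    return re.search(pattern, url) is not None
```

A site called www.cooking.com is caught by url_regex cooking but not by urlpath_regex cooking, because the word appears only in the hostname.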
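Browser (Browser Regexp)
This applies regular expression pattern matching to the request’s User-Agent header.
Usage: acl aclname browser pattern
Example: acl aclname browser MOZILLA
This matches requests from browsers that have the string “MOZILLA” in the User-Agent header.
An ACL of this kind could be used to prevent users from using the Mozilla browser. Bear in mind,
though, that the pattern is case-sensitive, and that most browsers, including Internet Explorer,
begin their User-Agent header with “Mozilla”, so the pattern needs to be chosen with care.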
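To see why the pattern needs care, the sketch below tests three patterns against two illustrative User-Agent strings of the kind sent by Internet Explorer 6 and the Mozilla suite. The strings are examples only; check what your own browsers actually send before relying on a pattern such as Gecko.

```python
import re

# Illustrative User-Agent strings of the kind sent by browsers of this era.
USER_AGENTS = {
    "Internet Explorer 6": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla 1.7": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616",
}


def browser_matches(pattern: str, user_agent: str) -> bool:
    """Case-sensitive search of the User-Agent header, like the browser ACL."""
    return re.search(pattern, user_agent) is not None


for name, ua in USER_AGENTS.items():
    # MOZILLA matches neither; Mozilla matches both; Gecko matches only Mozilla 1.7.
    print(name, browser_matches("MOZILLA", ua), browser_matches("Mozilla", ua),
          browser_matches("Gecko", ua))
```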
Port (URL Port)
Access can be controlled by destination (server) port number.
Usage: acl aclname port port-no
Example: This example allows http_access only to the destination 172.16.1.115:80 from network
172.16.1.0:
acl acceleratedhost dst 172.16.1.115/255.255.255.255
acl acceleratedport port 80
acl mynet src 172.16.1.0/255.255.255.0
http_access allow acceleratedhost acceleratedport mynet
http_access deny all
Port ACLs are also used in the default NGfL server Squid proxy setup, for example to identify
connections over secure sockets.
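The port example combines three ACLs on one http_access line, and all of them must match for the line to apply. This Python sketch models just that example; it is a simplified illustration, not how Squid evaluates rules internally.

```python
from ipaddress import IPv4Address, IPv4Network

ACCELERATED_HOST = IPv4Address("172.16.1.115")   # acl acceleratedhost dst 172.16.1.115/255.255.255.255
ACCELERATED_PORT = 80                             # acl acceleratedport port 80
MYNET = IPv4Network("172.16.1.0/255.255.255.0")   # acl mynet src 172.16.1.0/255.255.255.0


def http_access(src: str, dst: str, port: int) -> str:
    # Line 1: http_access allow acceleratedhost acceleratedport mynet
    # (all three ACLs on the line must match: a logical AND)
    if (IPv4Address(dst) == ACCELERATED_HOST and port == ACCELERATED_PORT
            and IPv4Address(src) in MYNET):
        return "allow"
    # Line 2: http_access deny all
    return "deny"
```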
Method (Request method)
This matches the method of the request.
Usage: acl aclname method method-type
Example: acl aclname method GET POST
This matches GET and POST requests only.
This is the only example of this ACL in the default Squid proxy setup.
You are probably unlikely to need to use this ACL.
An Example
Let’s put together an ACL to prevent access to the Internet from a particular machine (or group of
machines) at a particular time of day. In this case, the time of day will be lunchtime every
weekday.
Now define the days and times of day you want your ACL to refer to:
Click on the “Selected” radio button, then select the days you want the list to apply to. Click on the
radio button to the left of the first time box, then enter the start and end times.
After you have clicked on the “Save” button, your list will look something like this:
The new ACL in place.
Next, we will create an ACL to define the station (or group of stations) to which we want the ACL to
apply. Note that this step would be unnecessary if we wanted the rule to apply to all stations on the
network.
Create a new “Client Address” ACL. To do this, you must give the first address in the range, plus a
subnet mask. If there is only to be one machine in the range (as in this example), you can leave
the “To IP” box empty. If you want to create a list of several ad-hoc addresses, this can be done
by saving single addresses one by one.
Finally, you need to join these two ACLs together in a single proxy restriction, and move it into the
appropriate place in the list to create the desired effect. Click on the “Add proxy restriction” link at
the bottom of the Proxy restrictions list.
Select both the “lunchtimes” list and the station56 list, and ensure that the “Deny” radio button is
selected, then click on Save.
Now this list must be moved into position, using the arrows on the right, as shown below:
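Once the restriction sits above “Allow all”, the finished rule behaves like the Python sketch below. The lunchtime hours (12:00 to 13:30) and the station address (10.11.1.56) are hypothetical, since the manual does not state them; the squid.conf lines in the comments are one possible equivalent, not necessarily what Webmin writes.

```python
from datetime import datetime, time
from ipaddress import IPv4Address, IPv4Network

# Hypothetical values: the manual does not state the exact times or address.
# A squid.conf equivalent might read:
#   acl lunchtimes time MTWHF 12:00-13:30
#   acl station56 src 10.11.1.56/255.255.255.255
#   http_access deny lunchtimes station56
LUNCH_DAYS = {0, 1, 2, 3, 4}  # Monday to Friday, as numbered by datetime.weekday()
LUNCH_START, LUNCH_END = time(12, 0), time(13, 30)
STATION56 = IPv4Network("10.11.1.56/32")


def lunchtimes(now: datetime) -> bool:
    return now.weekday() in LUNCH_DAYS and LUNCH_START <= now.time() < LUNCH_END


def station56(src: str) -> bool:
    return IPv4Address(src) in STATION56


def decide(src: str, now: datetime) -> str:
    # The deny restriction sits above "allow all", so it is checked first;
    # it matches only when BOTH ACLs match.
    if lunchtimes(now) and station56(src):
        return "deny"
    return "allow"  # falls through to the "allow all" restriction
```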
Squid Proxy Exercises
How the use of Squid and SmartFilter is enforced
SmartFilter and Squid proxy Web access restrictions are only in force if the browser is using the
local NGfL server as a proxy. In Internet Explorer 6, the settings can be found in Tools – Internet
Options – Connections – LAN Settings. The setting must refer to port 3128 as follows:
If this setting is not present, the user will have unfiltered access to the Web, unless firewall
restrictions on the NGfL server have been implemented to prevent this. On many school networks,
the administrator has the ability to enforce proxy settings, so firewall settings of this type are not
required.
Using Log Files
Your NGfL server keeps a continuous log of Website accesses. A typical portion of the log might
look something like this:
1021296336.929 3857 10.11.1.56 TCP_MISS/200 3807 GET http://services.postcodeanywhere.co.uk/form.asp? - ROUNDROBIN_PARENT/cache1.networcs.net text/html
1021296337.032 103 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_house_top.jpg - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296337.108 102 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_boat.jpg - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296337.166 134 10.11.1.56 TCP_CLIENT_REFRESH_MISS/304 230 GET http://www.worcestershire.gov.uk/home/content_tag.gif - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296337.285 83 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_house_bottom.jpg - ROUNDROBIN_PARENT/cache2.networcs.net text/html
1021296338.580 1605 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_middle_text.gif - ROUNDROBIN_PARENT/cache1.networcs.net text/html
1021296338.641 1632 10.11.1.56 TCP_MISS/304 230 GET http://www.worcestershire.gov.uk/home/jo_house_middle.jpg - ROUNDROBIN_PARENT/cache1.networcs.net text/html
1021296338.643 1476 10.11.1.56 TCP_CLIENT_REFRESH_MISS/304 230 GET http://www.worcestershire.gov.uk/home/spacer_line.gif -
In this raw form, the log is of little use.
Calamaris
However, you are provided with a log analysis tool known as Calamaris (website at
http://calamaris.cord.de/). This provides the following:
• Summary
• Incoming requests by method
• Incoming UDP-requests by status
• Incoming TCP-requests by status
• Outgoing requests by status
• Outgoing requests by destination
• Request-destinations by 2ndlevel-domain
• Request-destinations by toplevel-domain
• TCP-Request-protocol
• Requested content-type
• Requested extensions
• Incoming UDP-requests by host
• Incoming TCP-requests by host
• Performance in 60 minute steps
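Calamaris does far more than this, but the hypothetical Python sketch below shows the basic idea behind a destination report. It splits each line of Squid’s native access.log format into fields (time in Unix seconds, elapsed milliseconds, client address, result code/HTTP status, bytes, method, URL, and so on) and counts requests per host. It is not how Calamaris itself is implemented; the sample records are taken from the log extract above.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Three records from the log extract above, one record per line.
SAMPLE = [
    "1021296336.929 3857 10.11.1.56 TCP_MISS/200 3807 GET "
    "http://services.postcodeanywhere.co.uk/form.asp? - "
    "ROUNDROBIN_PARENT/cache1.networcs.net text/html",
    "1021296337.032 103 10.11.1.56 TCP_MISS/304 230 GET "
    "http://www.worcestershire.gov.uk/home/jo_house_top.jpg - "
    "ROUNDROBIN_PARENT/cache2.networcs.net text/html",
    "1021296337.108 102 10.11.1.56 TCP_MISS/304 230 GET "
    "http://www.worcestershire.gov.uk/home/jo_boat.jpg - "
    "ROUNDROBIN_PARENT/cache2.networcs.net text/html",
]


def parse_line(line: str) -> dict:
    """Split one native-format access.log line into its first seven fields."""
    timestamp, elapsed, client, action_code, size, method, url = line.split()[:7]
    action, status = action_code.split("/", 1)
    return {"timestamp": float(timestamp), "elapsed_ms": int(elapsed),
            "client": client, "action": action, "status": int(status),
            "bytes": int(size), "method": method, "url": url}


def when(record: dict) -> datetime:
    """Convert the Unix timestamp to a UTC date and time."""
    return datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)


def top_sites(lines, n=10):
    """Count requests per destination host, most popular first."""
    hosts = (urlsplit(parse_line(line)["url"]).hostname for line in lines if line.strip())
    return Counter(hosts).most_common(n)
```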
To gain access to Calamaris, choose the “Servers” tab in Webmin, then choose “Squid Proxy”, and
finally “Calamaris Log Analysis”:
This should give you a good idea of how your proxy is being used, and the type of sites that are
being most frequently accessed. However, should you wish to have direct access to the log files
themselves, this can be arranged on a weekly basis (beware – a compressed archive from a
week’s log file can easily be 10MB…). Please contact the Broadband Support Team if you wish to
pursue this.
Exercise
Browse the Calamaris Log Analysis page for your school and determine the most popular sites.