“DATA NETWORK” FOR JTOs PH-II : Proxy Server

Proxy Server
Introduction
Although the volume of Web traffic on the Internet is staggering, a
large percentage of that traffic is redundant---multiple users at any given site
request much of the same content. This means that a significant percentage
of the WAN infrastructure carries the identical content (and identical requests
for it) day after day. Eliminating a significant amount of recurring
telecommunications charges offers an enormous savings opportunity for
enterprise and service provider customers.

Data networking is growing at a dizzying rate. More than 80% of Fortune 500
companies have Web sites. More than half of these companies have
implemented intranets and are putting graphically rich data onto the
corporate WANs. The number of Web users is expected to increase by a
factor of five in the next three years. The resulting uncontrolled growth of
Web access requirements is straining all attempts to meet the bandwidth
demand.

Caching
Caching is the technique of keeping frequently accessed information in
a location close to the requester. A Web cache stores Web pages and
content on a storage device that is physically or logically closer to the
user, so retrieving it is faster than fetching it from the origin Web server.
By reducing the amount of traffic on WAN links and on overburdened Web
servers, caching provides significant benefits to ISPs, enterprise networks,
and end users. There are two key benefits :

• Cost savings due to WAN bandwidth reduction: ISPs can place cache
engines at strategic points on their networks to improve response times
and lower the bandwidth demand on their backbones. ISPs can station
cache engines at strategic WAN access points to serve Web requests
from a local disk rather than from distant or overrun Web servers.

In enterprise networks, the dramatic reduction in bandwidth usage due to
Web caching allows a lower-bandwidth (lower-cost) WAN link to serve the
same user base. Alternatively, the organisation can add users or add more
services that use the freed bandwidth on the existing WAN link.

• Improved productivity for end users: The response of a local Web cache
is often three times faster than the download time for the same content
over the WAN. End users see dramatic improvements in response times,
and the implementation is completely transparent to them.

Other benefits include the following :

BRBRAITT Nov-2006 2
• Secure access control and monitoring: The cache engine provides
network administrators with a simple, secure method to enforce a sitewide
access policy through URL filtering.

• Operational logging: Network administrators can learn which URLs
receive hits, how many requests per second the cache is serving, what
percentage of URLs are served from the cache, and other related
operational statistics.
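
As a rough illustration of the operational statistics listed above, the following sketch derives per-URL hits, the cache hit ratio and the request rate from a handful of log records. The record layout (timestamp, URL, served-from-cache flag) and the name `summarise` are assumptions made for this example; they are not the log format of any particular cache engine.

```python
from collections import Counter

# Hypothetical log records: (timestamp in seconds, URL, served_from_cache)
LOG = [
    (0, "http://example.com/", False),
    (1, "http://example.com/", True),
    (1, "http://example.com/logo.gif", False),
    (2, "http://example.com/", True),
]

def summarise(log):
    """Return per-URL hit counts, cache hit ratio and requests per second."""
    hits_per_url = Counter(url for _, url, _ in log)
    cache_hits = sum(1 for _, _, hit in log if hit)
    # Treat the log as covering whole seconds from first to last timestamp.
    duration = max(t for t, _, _ in log) - min(t for t, _, _ in log) + 1
    return {
        "hits_per_url": dict(hits_per_url),
        "hit_ratio": cache_hits / len(log),
        "requests_per_second": len(log) / duration,
    }

stats = summarise(LOG)
```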

Web Caching : How it works


Web caching works as follows :

1) A user accesses a Web page.

2) While the page is being transmitted to the user, the caching system saves
the page and all its associated graphics on a local storage device. That
content is now cached.

3) Another user (or the original user) accesses that Web page later in the
day.

4) Instead of sending the request over the Internet, the Web cache system
delivers the Web page from local storage. This process speeds download
time for the user, and reduces bandwidth demand on the WAN link.

5) The important task of ensuring that data is up-to-date is addressed in a
variety of ways, depending on the design of the system.
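
The steps above can be sketched as a small in-memory cache. This is only an illustration: the class name `WebCache`, the `fetch_from_origin` callback and the counter are assumptions for the example, and the freshness checking mentioned in step 5 is deliberately left out.

```python
class WebCache:
    """Toy in-memory Web cache; fetch_from_origin is supplied by the caller."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.store = {}            # URL -> cached content
        self.origin_requests = 0   # number of requests that crossed the WAN

    def get(self, url):
        if url in self.store:
            # Step 4: deliver the page from local storage.
            return self.store[url]
        # Steps 1-2: fetch the page and save a copy while delivering it.
        self.origin_requests += 1
        content = self.fetch_from_origin(url)
        self.store[url] = content
        return content

cache = WebCache(lambda url: f"<html>content of {url}</html>")
first = cache.get("http://example.com/")
second = cache.get("http://example.com/")   # step 3: a later request
```

After the two calls, only one request has gone to the origin; the second was answered locally.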

Browser-Based Client Caching

Internet browser applications allow an individual user to cache Web pages
(that is, images and HTML text) on his or her local hard disk. A user can
configure the amount of disk space devoted to caching.

This set-up is useful in cases where a user accesses a site more than
once. The first time the user views a Web site, that content is saved as files
in a subdirectory on that computer's hard disk. The next time the user points
to this Web site, the browser gets the content from the cache without
accessing the network. The user notices that the elements of the page---
especially larger Web graphics such as buttons, icons, and images---appear
much more quickly than they did the first time the page was opened.

Figure 1 demonstrates the benefits gained by a single node using
browser caching.

[Figure 1: Web clients A, B and C reach a Web server across the Internet;
client A caches to its local disk.]

Fig. 1 Benefits gained by a single node using browser caching

This method serves this user well, but does not benefit other users on
the same network who might access the same Web sites. In Figure 1, the
fact that User A has cached a popular page has no effect on the download
time of this page for Users B and C.

Caching Solution on the Network Level : The Proxy Server & Network Cache Concept

To limit bandwidth demand caused by the uncontrolled growth of Internet use,
vendors have developed applications that extend local caching to the network
level. The two current types of network-level caching products are proxy
servers and network caches :

• Proxy servers are software applications that run on general-purpose
hardware and operating systems. A proxy server is placed on hardware
that is physically between a client application, such as a Web browser,
and a Web server. The proxy acts as a gatekeeper that receives all
packets destined for the Web server and examines each packet to
determine whether it can fulfil the request itself; if it can't, it forwards the
request to the Web server. Proxy servers can also be used to filter
requests, for example, to prevent employees from accessing a specific set
of Web sites.

Unfortunately, proxy servers are not optimised for caching, and they fail under
heavy network loads. In addition, because the proxy is in the path of all user
traffic (it's a "bump in the cable"), two problems arise: All traffic is slowed to
allow the proxy to examine each packet, and failure of the proxy software or
hardware causes all users to lose network access. Further, proxies require
configuration of each user's browser---an unacceptable option for service
providers and large enterprises. Expensive hardware is required to
compensate for low software performance and the lack of scalability of proxy
servers.

• In response to these shortcomings, some vendors have created network
caches. These caching-focused software applications are designed to
improve performance by enhancing the caching software and eliminating
the other slow aspects of proxy server implementations. However,
because network caches run under general-purpose operating systems
(such as UNIX or Windows NT) that involve very high per-process context
overhead, they cannot scale gracefully to large numbers of simultaneous
processes. This is especially true for network caching systems, which can
have many thousands of simultaneous and short-lived transactions.

Example : Cisco's Network-Based Shared Caching

The cache engine solution comprises the Web Cache Control Protocol (a
standard feature of Cisco IOS software) and one or more Cisco cache
engines that store the data in the local network.

The Web Cache Control Protocol defines the communication between the
cache engine and the router. Using the Web Cache Control Protocol, the
router directs only Web requests to the cache engine (rather than to the
intended server). The router also determines cache engine availability, and
redirects requests to new cache engines as they are added to an installation.

The Cisco cache engine is a single-purpose network appliance that stores
and retrieves content using highly optimised caching and retrieval algorithms.
(See Figure 2)

[Figure 2: a Cisco IOS router sits between the local network and the
Internet, with the cache engine attached to the router.]

Fig. 2 Cisco cache engine connected to a Cisco IOS Router

Cache Engine Operation

Using the Web Cache Control Protocol, the Cisco IOS router routes requests
for TCP port 80 (HTTP traffic) over a local subnet to the cache engine. The
cache engine is dedicated solely to content management and delivery.
Because only Web requests are routed to the cache engine, no other user
traffic is affected by the caching process---Web caching is done "off to the
side." For non-Web traffic, the router functions entirely in its traditional role.

The cache engine works as follows :

1) A client requests Web content in the normal fashion.

2) The router, running the Web Cache Control Protocol, intercepts TCP port
80 Web traffic and routes it to the cache engine. The client is not involved
in this transaction, and no changes to the client or browser are required.

3) If the cache engine does not have the requested content, it sends the
request to the Internet or intranet in the normal fashion. The content
returns to, and is stored at, the cache engine.

4) The cache engine returns the content to the client. Upon subsequent
requests for the same content, the cache engine fulfils the request from
local storage.
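
The redirection decision in step 2 can be sketched as a simple classification on protocol and destination port. This is not Cisco's implementation: the `Packet` type, the `next_hop` function and the fallback to normal routing when the cache engine is unavailable are assumptions made for illustration.

```python
from dataclasses import dataclass

HTTP_PORT = 80

@dataclass
class Packet:
    protocol: str   # "tcp" or "udp"
    dst_port: int

def next_hop(packet, cache_engine_up=True):
    """Decide where the router sends a packet (illustrative only)."""
    is_web = packet.protocol == "tcp" and packet.dst_port == HTTP_PORT
    if is_web and cache_engine_up:
        return "cache-engine"
    # Non-Web traffic, or no cache engine available: route as usual.
    return "normal-routing"
```

Only TCP port 80 traffic is diverted, so all other traffic follows its usual path and is untouched by caching.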

Transparency

Because the router redirects packets destined for Web servers to the cache
engine, the cache engine operates transparently to clients. Clients do not
need to configure their browsers to be in proxy server mode. This is a
compelling feature for ISPs and large enterprises, for whom uniform client
configuration is extremely expensive and difficult to implement. In addition,
the operation of the cache engine is transparent to the network---the router
operates entirely in its normal role for non-Web traffic. This transparent
design is a requirement for a system to offer networkwide scalability, fault
tolerance, and fail-safe operation.

Hierarchical Use

Because the Cisco cache engine is transparent to the user and to network
operation, customers can place cache engines in several network locations in
a hierarchical fashion. For example, if an ISP places a large cache farm at its
main point of access to the Internet, then all its points of presence (POPs)
benefit.

Proxy servers are widely used to give the users of an Internet node
faster response times when fetching Web pages, while reducing congestion
on the line between the user (an individual, company, or university) and the
upstream Internet service provider (ISP).

It is highly recommended to enforce proxy usage in an ISP network
in order to preserve the valuable bandwidth of the node. Each institution or
organisation should also run a proxy service and promote its use to
all individual users.

The main reason for using a proxy server is to give access to the
Internet from within a firewall. An application-level proxy makes a firewall
safely permeable for users in an organisation, without creating a potential
security hole through which one might get into the subnet. An important
property of proxies is that even a client without DNS can use the Web : it
needs only the IP address of the proxy. An application-level proxy also
facilitates caching. Usually, one proxy server is used by all clients
connected to a subnet, which is why the proxy can cache documents
efficiently when they are requested by more than one client. This efficient
caching makes proxies useful even when no firewall machine is in place.
Configuring a group to use a caching proxy server is easy (most popular
Web client programs already have proxy support built in) and can decrease
network traffic costs significantly, because once the first request has been
made for a certain document, later requests are served from a local cache.

Proxying is a standard method for getting through firewalls, as opposed to
customising each client to support a special firewall product or method.
That is, you don't need to change the source code of clients, which is
impossible in some cases; an existing client only has to be configured as a
proxy client. It is also possible to write clients that understand only HTTP,
because other protocols are handled by the proxy in a transparent way. Using
proxies also allows high-level logging of client transactions (date and time,
URL, and some other fields of an HTTP transaction), which is not possible at
the IP or TCP level.

Web caching stores Web content locally so that redundant user requests can
be served more quickly, without sending the requests and the resulting
content over the WAN. A caching proxy stores requested Internet objects
(i.e., data available via the HTTP, FTP, and gopher protocols) on a
system closer to the requesting site than to the source.

Web browsers can then use the local cache as a proxy HTTP server,
reducing access time as well as bandwidth consumption.

Workstations on the local network often have no direct physical connection
to the Internet and therefore cannot communicate with it directly. The
proxy server, placed between the physical connection point to the Internet
and the connection point to the local network, delivers requests from the
local network to the Internet as if it were the original requester.

The proxy is a special HTTP server that typically runs on a firewall machine.
It waits for requests coming from inside the firewall, and then sends them to
the remote server, gets the response and sends it back to the client.

For transactions of a client with the proxy server, the client only uses HTTP,
even when accessing a resource served by a remote server using another
protocol, like FTP. When sending a request to a proxy, the client specifies
the full URL, and not just the pathname with optional search keywords as
in a regular HTTP request.

At this point the proxy starts to function as a client to retrieve the document,
using the suitable protocol module.
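
The difference between the two request forms can be shown with a short sketch. The helper name `request_line` and the example URLs are hypothetical; the point is only that a direct request carries the path and search keywords, while a request sent to a proxy carries the full URL, even for a non-HTTP resource.

```python
from urllib.parse import urlsplit

def request_line(url, via_proxy):
    """Build an HTTP/1.0 GET request line for a direct or a proxied request."""
    if via_proxy:
        target = url                        # the proxy needs the full URL
    else:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query     # optional search keywords
    return f"GET {target} HTTP/1.0"

direct = request_line("http://example.com/docs/index.html?q=proxy", via_proxy=False)
proxied = request_line("ftp://ftp.example.com/pub/README", via_proxy=True)
```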

Caching is done by saving copies of retrieved pages and objects
(such as common graphic and voice files) in local files on disk, to serve
further requests from users. The cache can survive restarts of the proxy
process and also restarts of the server machine. When an up-to-date copy of
a document is needed, the remote server must be contacted with a GET
request. The header information of a document (obtained, for example, with
a HEAD request) is a good way to check whether it has been modified.

For more efficiency in these situations, the If-Modified-Since request header
was added to HTTP. The header carries the last modification date and time
of the object currently held by the client. If the object hasn't changed
since then, only a new expiry date is sent instead of the object itself;
otherwise the request is served as a regular HTTP request.
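
A minimal sketch of this exchange follows, assuming the standard HTTP behaviour in which an unchanged object is answered with status 304 (Not Modified) and no body. The function names and the placeholder body string are hypothetical, and real servers also handle malformed dates and other validators.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_headers(cached_last_modified):
    """Headers a cache adds when revalidating its copy (UTC datetime expected)."""
    return {"If-Modified-Since": format_datetime(cached_last_modified, usegmt=True)}

def origin_response(request_headers, last_modified):
    """Origin-side decision: 304 without a body, or 200 with the document."""
    since = request_headers.get("If-Modified-Since")
    if since and last_modified <= parsedate_to_datetime(since):
        return 304, None                  # cached copy is still good
    return 200, "full document body"      # regular HTTP response

cached_at = datetime(2006, 11, 6, 10, 0, tzinfo=timezone.utc)
headers = conditional_headers(cached_at)
unchanged = origin_response(headers, cached_at)
changed = origin_response(headers, datetime(2006, 11, 7, tzinfo=timezone.utc))
```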

A good proxy system provides suitable tools for managing and controlling the
data flow. For instance : authorising users to access sites, blocking
"strangers" trying to get into the local network, tracing users' operations,
and storing common information for the benefit of all users of the network
so that it need not be fetched again from outside.

Implementation

Cache Hierarchy

A cache hierarchy connects one proxy server, called the child, to
another proxy server, called the parent. A proxy server can usually act as
both child and parent. A cache hierarchy makes caching more efficient.
For example, an ISP may provide parent caches called cache1.nectec.or.th and
cache2.nectec.or.th, which you (your proxy server) can point to as a child.

Caching Topologies

There are two main topologies available for configuring caches.

• Distributed Caching
Here the contents of the cache are distributed over a number of proxy
servers, all connected together in a network. The group of servers forms an
array whose members co-operate, creating a strong caching system that can
share the load. This also helps when one of the proxies fails : the data
flowing to the users does not stop. This arrangement also makes finding
information in the cache quite fast.

• Hierarchical Caching (or mesh)
Here the proxy servers are connected in a hierarchy. A user's request is
first processed by a local server. If that server cannot find the requested
information in its local cache, it turns to another proxy server according
to the configured hierarchy rules. If there is still no hit, the request is
sent to the original server.
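
The hierarchical lookup described above can be sketched as a chain of caches, each falling back to the next level and finally to the original server. The `Cache` class, the `upstream` link and the `origin` callback are hypothetical names for this illustration; real proxies add expiry, peer selection and error handling.

```python
class Cache:
    """A proxy cache that may have an upstream cache (hypothetical model)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream
        self.store = {}

    def get(self, url, origin):
        """Return (content, source), where source names who supplied the object."""
        if url in self.store:
            return self.store[url], self.name
        if self.upstream is not None:
            content, source = self.upstream.get(url, origin)
        else:
            content, source = origin(url), "origin"
        self.store[url] = content      # keep a copy for later requests
        return content, source

def origin(url):
    return f"body of {url}"

parent = Cache("parent")
local = Cache("local", upstream=parent)
other = Cache("other", upstream=parent)
```

In this model, the first request through `local` reaches the origin, a repeat is served by `local` itself, and a request for the same URL through `other` is served by `parent`.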

In a cache hierarchy one cache establishes peering relationships with its
neighbour caches : either parent or sibling. A parent is one level up, and a
sibling cache is on the same level. The general flow of requests is up the
hierarchy. A cache that doesn't hold the requested object asks its
neighbour caches whether they have the object. If one of the neighbours has
it, the cache requests it from that neighbour. (There are ways to decide
which neighbour should be preferred when several report a hit.) Otherwise,
the cache forwards the request either to a parent or directly to the origin
server. A "neighbour hit" can be fetched from either a parent or a sibling,
while a "neighbour miss" can be fetched only through a parent, not from a
sibling.

Client / Server Implementation

Many WWW clients are built on top of libwww (the WWW Common
Library), which handles communication protocols such as HTTP, FTP and the
others used on the Web. All the proxy support these clients need is handled
transparently by libwww. This is done with environment variables, one per
protocol, each set to the URL of the proxy that serves requests for that
protocol (there can be a separate proxy for every protocol, but this is not
very likely).

Whenever the environment variable for a certain protocol is set, libwww
connects to the proxy rather than to the remote server. Usually an exception
list of hosts that the client should not reach through a proxy is available
too, which is useful for connecting to local servers. Since proxy support on
the Web is very simple, libwww is not a necessity for clients. In fact, proxy
support is implemented only for HTTP version 1.0 on the server side, which
most clients these days are able to use.
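
The environment-variable convention can be sketched as follows. The variable names `http_proxy`, `ftp_proxy` and `no_proxy` are the commonly used ones and are not spelled out in the text above; the function `pick_proxy`, the host names and the suffix-matching rule for the exception list are assumptions for this example.

```python
from urllib.parse import urlsplit

def pick_proxy(url, env):
    """Return the proxy URL to use for `url`, or None for a direct connection."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    exceptions = [h.strip() for h in env.get("no_proxy", "").split(",") if h.strip()]
    # Hosts on the exception list (or inside a listed domain) bypass the proxy.
    if any(host == e or host.endswith("." + e.lstrip(".")) for e in exceptions):
        return None
    # One variable per protocol, e.g. http_proxy, ftp_proxy.
    return env.get(f"{parts.scheme}_proxy")

env = {
    "http_proxy": "http://proxy.example.org:8080/",
    "ftp_proxy": "http://proxy.example.org:8080/",
    "no_proxy": "localhost,intranet.example.org",
}
```

The environment is passed in as a plain dictionary so the sketch can be tried without touching the real process environment.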

The proxy acts both as a server and a client. When accepting requests from
clients it functions as server, but acts as a client when connected to the
remote server. The proxy forwards header fields it got from the client to the
remote server. However, it doesn't pass the full URL it got, but only the path
and keyword portion in the URL. A full proxy server should be able to speak
all Web protocols.
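
The proxy's change of role can be sketched as a rewrite of the incoming request: the full URL is split into the server to contact and the path-and-keyword portion to send, while the header fields are passed along. The function name `to_origin_request` and the default port of 80 are assumptions for this HTTP-only illustration.

```python
from urllib.parse import urlsplit

def to_origin_request(proxy_request_line, headers):
    """Turn a request received by the proxy into the one sent to the origin."""
    method, url, version = proxy_request_line.split()
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname
    port = parts.port or 80
    # Header fields are forwarded unchanged in this sketch.
    return host, port, f"{method} {path} {version}", dict(headers)

result = to_origin_request(
    "GET http://example.com:8080/a/b?x=1 HTTP/1.0",
    {"Accept": "text/html"},
)
```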

In recent years, implementations have added support for passing access
authorisation information to the remote server, which is necessary when
accessing protected documents.

Hierarchical Cache Relations

Clients A-D have two levels of hierarchy, while client E has only one,
which means that an HTTP request from client E goes directly to proxy A, and
proxies B and C are not involved at all. If one of clients A-D sends an HTTP
request, it goes through the relevant proxy, and if there is a "cache miss"
that proxy sends an ICP request to its sibling proxy and to the parent
proxy.

[Figure: hierarchical cache relations. Proxy A is the parent; Proxies B and
C are siblings below it. Clients A-D connect through Proxies B and C, and
Client E connects directly to Proxy A.]

Cache Hit

Client A sends an HTTP request, which reaches proxy B. Proxy B has the
relevant document cached, so it reports a "cache hit" and returns the
document to the client.

Cache Miss

Client A sends an HTTP request, which reaches proxy B. Proxy B does not have
the relevant document cached, so it sends an ICP request to its sibling
(proxy C) and to its parent (proxy A). Proxies A and C do not have the
document either, and they do not send an ICP reply. After a set period of
time, proxy B times out and asks its parent (proxy A) to retrieve the
document. Proxy A sends an HTTP request to the destination server, receives
an HTTP reply, and passes the document back to proxy B, which then
sends it to the client.
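
To make the ICP exchange more concrete, the sketch below builds a query message and parses a reply. The field layout (a 20-byte header of opcode, version, length, request number, options, option data and sender address, followed by the URL) and the opcode values follow RFC 2186, listed in the References; treat it as an illustration to be checked against the RFC rather than a complete implementation. The function names are hypothetical, and no network traffic is sent.

```python
import struct

ICP_VERSION = 2
ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3
HEADER = struct.Struct("!BBHIIII")   # 20-byte ICP header, network byte order

def build_query(request_number, url):
    """Build an ICP query message asking a neighbour about `url`."""
    # Query payload: 4-byte requester address (zero here) + null-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length, request_number, 0, 0, 0)
    return header + payload

def parse_reply(message):
    """Return (opcode, request_number, url) from a HIT or MISS reply."""
    opcode, _version, length, request_number, _o, _d, _s = HEADER.unpack_from(message)
    url = message[HEADER.size:length].rstrip(b"\x00").decode("ascii")
    return opcode, request_number, url

query = build_query(7, "http://example.com/")
```

A proxy such as proxy B would send this message over UDP to each neighbour and wait for replies carrying the same request number until its time-out expires.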

[Figure: cache miss. Proxy B queries its sibling (Proxy C) and its parent
(Proxy A); Proxy A retrieves the document from the destination server, and
the HTTP response travels back through Proxy B to Client A.]

Advantages / Disadvantages

Security Issues

Many current firewall designs rely on a combination of packet
filtering and proxy technology (especially "transparent proxying").
Today, proxy systems can manage the authorisations that users have when
surfing (for example, who is allowed to use which protocol), block unwanted
visitors from outside the local network, and keep a log file of users'
operations. All of this comes in addition to filtering on the basis of IP
address.

However, the caching ability that makes the Web run faster has its
disadvantages. It can be bad for businesses that advertise on Web sites, and
it might even violate copyright law.

Advertisers behind a site have a problem with caching proxy servers : they
have no way of knowing the number of readers behind a hit. It could be
one or a hundred thousand, and they can't tell without looking at the log
files of the proxies. Furthermore, every copyrighted document sitting in a
proxy's cache is, in fact, an unauthorised copy.

The wrong solution would be to disable caching. That would hurt
performance and lead to fewer visitors at the advertisers' sites. A better
solution would be to let a caching proxy keep a copy of a Web page if the
proxy promises, in return, to tell the Web server the number of hits it
received for that page over a reasonable time period. Undoubtedly,
advertisers would prefer more specific information about the readers, but
that is open to debate.

Other problems arise when using the Internet Cache Protocol (ICP), a
lightweight message format used for communication among Web proxy
caches and implemented on top of UDP. ICP is used for object location and
can be used for cache selection. Because it is connectionless, it is
vulnerable to some methods of attack. Checking the source IP address of
an ICP message gives a certain degree of protection : ICP queries
should be processed only if the querying address is allowed to access the
cache, and ICP replies should be accepted only from known neighbours and
otherwise ignored. However, trusting the validity of addresses at the IP
level makes ICP susceptible to IP address spoofing, which has many
problematic consequences (for example, inserting bogus ICP queries, or
inserting bogus ICP replies that prevent a certain neighbour from being used
or force a certain neighbour to be used). In practice, only routers are able
to detect spoofed addresses; hosts cannot. Still, the IP Authentication
Header can be used to provide cryptographic authentication for the IP packet
that carries the ICP message.
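
The source-address checks described above can be sketched with Python's `ipaddress` module. The address ranges and the two function names are made up for this example, and, as the paragraph notes, a check of this kind offers no protection against spoofed source addresses.

```python
import ipaddress

# Hypothetical configuration: who may query this cache, and its known neighbours.
ALLOWED_CLIENTS = [ipaddress.ip_network("192.0.2.0/24")]
KNOWN_NEIGHBOURS = {ipaddress.ip_address("198.51.100.10")}

def accept_query(source_ip):
    """Process an ICP query only if the sender may access the cache."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWED_CLIENTS)

def accept_reply(source_ip):
    """Accept an ICP reply only from a configured parent or sibling."""
    return ipaddress.ip_address(source_ip) in KNOWN_NEIGHBOURS
```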

In general, caching can cut duplicate requests by up to 30%. However, to
investigate the overall effects of different caching strategies on the
network as a whole, a mathematical model should be used.

Examples of Proxy Servers


1) Microsoft Proxy Server 2.0: recently developed for use with Windows
NT 4.0. Has firewall functionality, and caching ability as well.

2) Netscape Proxy Server 2.5: for use with UNIX and Windows NT.
Caches Web pages and scans for viruses at the same time.

Summary and Analysis


Proxy technology, which has developed greatly in recent years,
offers a well-suited solution for organisations on a closed subnet behind
a firewall that want to give their employees controlled access to
the Internet. A proxy is, in fact, an HTTP server that sits on a firewall
machine and usually has a caching ability that makes surfing much faster.

This ability makes it attractive even where there is no firewall. It also
allows one to read cached documents while disconnected from the Internet. A
caching proxy functions as a server when a client connects to it, and as a
client when it contacts the original server.

Today's proxies are very sophisticated, so that the security "holes" are
minimal.

The size of the proxy server depends on the operational target of your site.
If you have large communication lines (e.g. 512 kbps / 2 Mbps or more) and
thousands of users, you may set up a moderate-size proxy server with some
4 GB or more of disk. Make sure that the machine has plenty of memory to
provide fast response times and that the system is protected against power
outages. It is advisable to use a high-reliability disk file system if you
can afford it. Several SCSI disk drives work faster than a few
larger-capacity disks.

When running the proxy server, make sure that you promote the use of the
proxy and monitor system performance, such as the number of hits and the
savings per day on outside-line access.

References

• RFC 1919
• RFC 2186
• RFC 2187
