Table Of Contents
Proxy Caching
Introduction
Where to Cache
Controlling the Cache
Cache Replacement Algorithms
Cache Hierarchies
Proxy Caching
Battling the "World Wide Wait"
Introduction
Definition of Proxy Caching
"Internet Object Caching is a way to store requested Internet Objects (i.e., data available
through http, ftp and gopher protocols) on a system closer to the requesting site than the
source. Web browsers can then use the local cache as a proxy HTTP server, reducing
access time as well as reducing bandwidth consumption."
http://squid-cache.org/Doc/FAQ/FAQ-1.html
Proxy servers work by intercepting requests for documents or files and checking whether they hold a local copy of the requested object. If a current copy exists, it is returned to the client. If the object is not in the cache, or the cached copy is judged to be out of date, a new copy of the file is obtained via the web. That object is then forwarded on to the client, and a copy is kept locally so that the next computer to request the same object can obtain the document more swiftly. Caching can take place at the browser, or at the server.
The diagram above shows the relationship between a client (i.e. a web browser) and a cache. The cache intercepts web requests and either returns an object it already has stored, or passes the request on to the server named in the original request.
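The intercept-check-fetch cycle just described can be sketched in a few lines of Python. The in-memory dictionary, the fixed one-hour lifetime, and the `fetch_from_origin` stand-in are illustrative assumptions, not how a production proxy stores objects or decides freshness:

```python
import time

# Toy cache: URL -> (object, expiry timestamp). A real proxy stores
# objects on disk and derives lifetimes from HTTP headers.
cache = {}

def fetch_from_origin(url):
    # Stand-in for a full HTTP request to the origin server.
    return "contents of " + url

def handle_request(url, ttl=3600):
    entry = cache.get(url)
    if entry is not None:
        obj, expires = entry
        if time.time() < expires:
            return obj                 # fresh local copy: serve from cache
    # Missing or stale: fetch a new copy and keep it for the next client.
    obj = fetch_from_origin(url)
    cache[url] = (obj, time.time() + ttl)
    return obj
```

A second request for the same URL within the hour is answered from the dictionary without touching the origin at all.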
The technology used within a cache is very similar to that used in any web server, but there are some subtle differences. Proxy caches do not work with the same efficiency as a web server, and all requests to a cache must be made with a complete (absolute) URL, rather than the relative path that suffices when talking directly to a web server.
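For example, a request sent directly to a web server can use a path-only request line, whereas the same request sent through a proxy must carry the full URL (the hostname here is illustrative):

GET /index.html HTTP/1.1
Host: www.example.com

GET http://www.example.com/index.html HTTP/1.1

Request line as sent directly to the origin server (top) versus the absolute form required by a proxy cache (bottom).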
Why?
Security
Beyond the obvious savings in download time and bandwidth costs, proxy servers can also provide a valuable service as part of the security policy of a company. For instance, a network can be configured so that the proxy server is the only device permitted to make HTTP requests: every other computer must route its HTTP requests through the proxy.
This would serve to
1. Reduce the risk of attack to individual PCs and the network as a whole, because only a single machine makes external requests, which is far simpler and more robust to administer
2. Allow filtering of the sites that users of the proxy can access - for example, requests for documents that are not already cached might be rejected.
The clients (browsers) in the above diagram are behind a firewall - only the cache can make requests through the firewall; the individual clients do not have this privilege.
Speed
A number of factors reduce the efficiency with which your browser can retrieve a web page. DNS lookups for URLs, slow response times of web servers, the size of the object being retrieved, and general network congestion all make the network leg of any data transfer the slowest of all. Web servers that speak only HTTP/1.0 will also behave more slowly than those that support HTTP/1.1, which can reuse a single persistent connection for several requests. The notion of sidestepping all of these issues has immediate appeal: keeping a copy of the objects you want closer to the client avoids many of these pitfalls.
Bandwidth
Caching saves money as well as time. Many organizations pay for their Internet connection based on data volume rather than length of connection, so objects retrieved from within the organization, rather than from the wider Internet, ultimately save the organization money and increase the efficiency with which it conducts its online affairs.
Checkpoint Questions
1. What are the three main benefits of using a cache?
Copyright 2000 RMIT Computer Science
All Rights Reserved
C0SC1300
Where to cache?
There are two approaches to caching - browser caching and server caching (or proxy
caching). Both use the same approach of intercepting requests for Internet objects, and
checking to see if they have a valid local copy already stored.
Browser Caching
Browsers can be configured to keep local copies of the files you browse on your own hard disk. They use simple algorithms and offer only minor configuration options.
Internet Explorer lets you specify when cached copies are checked for freshness, but provides no control over whether caching uses the disk cache or the memory cache.
IE Settings
Internet Explorer allows users to choose how often cached pages are checked against the server, from the following options.
Every Time You Start IE - If you log on to the Internet and access a page you've previously visited, your browser will check only once during that session to see if the page has been updated. At all other times, it will take the page from the cache. For most types of web activity this setting is sufficient, and it is the recommended one.
Every Visit to the Page - Every time you access a web page, the browser checks whether it has changed. If the page has not been modified since the copy was cached, the page is served from the cache; if it has been modified more recently, it is retrieved directly from its source. This setting is harmless but usually unnecessary.
Never - Your browser will never check if a page has been updated and will always use a
cached page. This setting is not recommended.
Netscape Settings
Netscape goes a step further in that it allows you to determine how much RAM and disk space are available for use as a cache. Storing objects in memory gives better retrieval performance than storing them on disk, although, as always, there is a limit to what can be achieved within system constraints. This is as valid on a server as it is on a browser.
The main benefit of caching locally rather than on a server is that local caching eliminates any network hops. Server caches, on the other hand, can be part of what are known as cache farms - networks of server caches working together to capture the greatest variety and quantity of objects possible. Companies like Bigpond offer the services of their web farms at a price.
Cache farms are discussed in more detail later.
Checkpoint Questions
1. What benefits does browser caching provide?
2. What benefits does networked caching provide?
C0SC1300 - Lecture Notes
Web Servers and Web Technology
Controlling the Cache
Your browser settings will also impact on whether the document is refreshed or not, and
quite commonly you may find yourself specifically telling the browser to get a fresh
copy of the document from the server.
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
Example of HTTP response headers for a web page that contains caching instructions.
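Headers like those above are what a cache uses to decide whether a stored copy is still fresh. The sketch below checks only the max-age directive and the Expires date; it ignores must-revalidate handling and clock skew, and the header names are the standard HTTP ones:

```python
import time
from email.utils import parsedate_to_datetime

def max_age(cache_control):
    # Pull the max-age value (in seconds) out of a Cache-Control header.
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name == "max-age":
            return int(value)
    return None

def is_fresh(headers, fetched_at, now=None):
    """True if a copy fetched at `fetched_at` (a Unix timestamp) can be
    served without revalidating. Per HTTP/1.1, max-age takes precedence
    over the Expires header when both are present."""
    now = time.time() if now is None else now
    age = max_age(headers.get("Cache-Control", ""))
    if age is not None:
        return (now - fetched_at) < age
    expires = headers.get("Expires")
    if expires is not None:
        return now < parsedate_to_datetime(expires).timestamp()
    return False  # no explicit lifetime: play safe and revalidate
```

With the example headers above, a copy would be served from the cache for up to an hour after it was fetched, and revalidated after that.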
Checkpoint Questions
1. Why do we need to control how our web pages are cached, and how do we as web
developers manage it?
Cache Replacement Algorithms
Second-Chance
Second-chance works just like FIFO, except that each item of data is marked (flagged) whenever it is used, and the data is managed as a queue.
When evicting, we inspect the oldest item in the cache:
if the item is unmarked, we evict it
if it is marked, we unmark it, make it the newest item in the queue, and move on to the next-oldest item
The basic principle is to look for old data that has not been referenced for a period of time.
Second-chance is inefficient because we need to maintain a queue of items ordered by age and reorganise it frequently. Another approach - usually called the clock algorithm - is to arrange the items in a circular queue and move a "hand" around it; the hand points at the oldest item.
If the hand points to an unmarked item, that item is evicted. If it points to a marked item, the item is unmarked and the hand moves on.
The only difference from second-chance is the implementation.
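The queue-based version of second-chance can be sketched as follows. The class name, the dictionary-backed store, and the capacity parameter are illustrative choices, not part of any particular proxy's implementation:

```python
from collections import deque

class SecondChanceCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()   # keys in age order, oldest at the left
        self.marked = {}       # key -> referenced-since-last-sweep flag
        self.store = {}        # key -> cached object

    def get(self, key):
        if key in self.store:
            self.marked[key] = True    # mark on use: earns a second chance
            return self.store[key]
        return None

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            self._evict()
        self.store[key] = value
        self.marked.setdefault(key, False)
        if key not in self.queue:
            self.queue.append(key)

    def _evict(self):
        # Sweep from the oldest entry: an unmarked item is evicted;
        # a marked item is unmarked and requeued as the newest item.
        while True:
            key = self.queue.popleft()
            if self.marked[key]:
                self.marked[key] = False
                self.queue.append(key)
            else:
                del self.store[key]
                del self.marked[key]
                return
```

For example, in a two-item cache holding a and b, touching a and then inserting c evicts b: a's mark saves it on the sweep, so the unmarked b goes first.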
Checkpoint Questions
1. Using the FIFO page, determine the state of the cache after the following values are accessed: 0 1 2 3 4 5 6 3 4 5 6 2 3 4.
2. Using the LRU page, determine the state of the cache after the following values are accessed: 0 1 2 3 4 5 6 3 4 5 6 2 3 4.
3. Using the Second Chance page, determine the state of the cache after the following values are accessed: 0 1 2 3 4 5 6 3 4 5 6 2 3 4.
Links
1. FIFO page
2. LRU page
3. Second Chance page
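Answers to the FIFO and LRU exercises above can be checked with a short simulation. The capacity parameter is an assumption here - use whatever cache size the linked pages specify:

```python
def simulate(policy, capacity, accesses):
    """Return the final cache contents (oldest first) after running the
    access sequence under "FIFO" or "LRU" replacement."""
    cache = []
    for value in accesses:
        if value in cache:
            if policy == "LRU":
                cache.remove(value)   # a hit refreshes recency under LRU
                cache.append(value)
            continue                  # FIFO ignores hits entirely
        if len(cache) >= capacity:
            cache.pop(0)              # evict the oldest entry
        cache.append(value)
    return cache
```

The two policies differ exactly where the text says they do: FIFO ages an item from the moment it enters the cache regardless of use, while LRU resets an item's age on every hit.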
Cache Hierarchies
This situation can be improved if we increase the intelligence and cooperation of our network of caches. The paradigm moves away from a pyramid, where information flows one way, to a more parallel scheme, with many caches on similar levels and much greater communication between them. The Harvest project developed a communications protocol called ICP (the Internet Cache Protocol), which uses UDP to pass queries between caches. A cache asks its peers, over UDP, whether they hold a given object; when a positive response is received, the requesting cache makes a full HTTP request for the document to the cache that answered, and otherwise it goes out to the Internet and requests the document from its original location. The net result of this more informed conversation is that redundancy of stored objects is decreased, and load sharing between the different components of the cache farm becomes much more viable.
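The decision logic can be sketched as follows. This is a toy model only: real ICP has its own binary packet format and queries siblings in parallel over UDP, whereas here the peer dictionaries stand in for those round trips and are checked in turn:

```python
def resolve(url, peers, fetch_origin):
    """Ask each sibling cache whether it holds the object (the cheap
    'UDP query' step); on the first hit, retrieve it from that peer
    (the full 'HTTP request' step). If every peer misses, fall back
    to fetching from the origin server."""
    for name, contents in peers.items():
        if url in contents:             # a positive ICP-style response
            return name, contents[url]  # full request to that sibling
    return "origin", fetch_origin(url)  # all misses: go to the source
```

With two siblings where only the second holds the object, the request is satisfied by that sibling; an object held by neither falls through to the origin.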
Checkpoint Questions
1. What benefits does a cache farm have over a cache hierarchy?