
An Architecture for On-Demand Active Web Content Distribution

Chengdu Huang, Seejo Sebastine, and Tarek Abdelzaher Department of Computer Science University of Virginia Charlottesville, VA 22904

{chuang, seejo, zaher}@cs.virginia.edu

ABSTRACT The phenomenal growth of the world-wide web has made it the most popular Internet application today. Web caching and content delivery services have been recognized as valuable techniques to mitigate the explosion of web traffic. An increasing fraction of web traffic today is dynamically generated and therefore uncacheable using present static approaches. Scalable delivery of such active content poses a myriad of challenges, including content replication, update propagation, and consistency management. This paper describes a scalable architecture for transparent demand-driven distribution of active content. Our approach involves migrating the scripts which generate the dynamic web traffic, and their data, from the server to an active content distribution proxy nearest to the client. Our architecture is extendible in that it can incorporate a variety of data consistency management policies to ensure that the client sees an up-to-date copy of the dynamically generated web page. We show that significant improvements are observed in effective throughput and client response time.

1 Introduction
The dramatic explosion of the Internet as an information source generates significant demand for efficient content delivery architectures. The World Wide Web is the dominant Internet application today, motivating a web-based content distribution service. Web caching and content delivery networks are currently the key performance accelerators in the web infrastructure. Such services have traditionally operated on static content. An increasing fraction of the traffic on the web today is dynamically generated as many web sites evolve to provide sophisticated e-commerce and personalized services. Such dynamically generated content is uncacheable using present static approaches. There has been considerable interest in caching dynamic content recently [7, 13, 15, 10, 24, 8]. Most approaches investigate caching the dynamic pages resulting from script execution on the server. However, since many dynamically-generated pages are personalized, the hit ratio on them is very low. Hence, in many cases, caching them does not aid much in reducing server load. Instead, we propose a content-distribution service in which any requested content-generating script (and thus the need for computing power) can be migrated transparently from its origin server to one of a network of active cache proxies that is nearest to the requesting client. In effect the service replicates active web content on demand. Active web content refers to the scripts themselves that generate dynamic web pages, rather than the content they generate.

The work reported in this paper was supported in part by the National Science Foundation under grants ANI-0105873 and CCR0093144.

When dynamically-generated content is requested from a proxy, the proxy requests the script which generates that content from the origin server (unless it already has it), installs it locally, and executes it to serve the request. The script is thus said to be cached. File accesses made by cached scripts are handled by the proxy's I/O subsystem, which is instrumented to fetch, upon the first reference, a copy of the accessed data objects from the origin server. This copy is subsequently kept consistent with its counterpart on the origin server using a QoS-aware consistency management protocol. The protocol allows one to specify per-content-class temporal consistency requirements that should be enforced by the run-time system. For example, one may specify that all fetched premium objects should be no more than 10 minutes out of date, while all fetched basic objects should be no more than 30 minutes out of date.

Our architecture is useful for sites like my.yahoo.com where a very large number of personalized pages are dynamically generated from a much more limited base of underlying, fairly static data objects. Such pages are often called pseudo dynamic content. Caching such pages is not useful since they are not shared by multiple clients. Caching the underlying data objects and their scripts, on the other hand, can significantly reduce the load on the origin server. Such an architecture provides better scalability and improves client-observed access latencies.

In our architecture, active caches and providers must mutually authenticate themselves before active content can be migrated such that this content (i) is shared with trusted caches only, and (ii) is itself trusted by the cache.
While we do not develop authentication services ourselves, schemes proposed recently for security-aware web caches [18] can be applied directly to our architecture as described in related work. In this paper, we focus on the demand-driven active content caching protocol. Content providers can subscribe to our content-distribution service (presumably at a price) to benefit from lower origin server load, improved service scalability, and lower latency to their clients. The service behaves as a load-balancer for the original content web servers in the sense that the request load for a popular server gets distributed among multiple proxies. The only requests that are filtered through to the origin servers are preliminary page accesses and active proxy requests to maintain the data consistency. From a local ISP's perspective, our service reduces backbone traffic for which the ISP is responsible to the backbone provider, as more requests for active content are satisfied locally. This reduces the costs paid by the ISP. Clients also observe an improvement in access latencies since more pages are served to the client from a closer location on the Internet. The rest of this paper is organized as follows. Section 2 describes related work. Section 3 describes the active caching architecture and its various components. Section 4 describes the implementation details. Section 5 presents the experimental evaluation of our architecture. Finally, Section 6 concludes the paper and discusses avenues for future work.

2 Related Work
Proxy caching of static content [6, 9, 29] has been discussed at length in prior literature. Caching active content at web proxies has received less consideration and has generally been confined to Java code [7, 4]. The benefits of static web content caching, however, are limited considering the significantly increasing fraction of uncacheable objects under the current web caching schemes [28].

Web server workload can be reduced by server-side caching of dynamically generated content [13, 15, 8, 21, 31]. Smith et al. [13] proposed a solution for cooperative caching of dynamic web content on a distributed web server. In their system, clustered nodes collaborate with each other to cache and maintain result consistency using Time-To-Live methods. Iyengar et al. [8, 15] propose that cache servers export an API that allows individual applications to explicitly cache and invalidate application-specific content. These techniques are developed for caching identical requests at origin content providers and are not very feasible for proxy caches. An efficient consistency management protocol is proposed in [31] for the dynamic content cached at the front-end of web servers. Dynamic pages are classified based on the URL patterns and can be invalidated by an application. This is a complementary approach to the fine-grained invalidation/update scheme based on dependency graphs [8]. Beck et al. [3] discussed the issues of replicating dynamic content due to platform heterogeneity and proposed using a portable channel representation. Candan et al. [5] addressed the problem of reflecting changes in database content in cached dynamically generated web pages by intelligently invalidating dynamic content in caches based on the state changes in the database. Dynamic content caching can have inherent limitations in terms of performance improvement when deployed at web servers. For example, the user-perceived latency due to the backbone delay cannot be reduced. In contrast, proxy caching can reduce the backbone traffic and the user response time due to the backbone delay by caching dynamic content near the clients. In this paper, we focus on the proxy caching of the scripts that generate dynamic content. Cao et al. [7] propose an elegant architecture similar to ours, in which a piece of Java code (called a cache applet) is attached to each dynamic document and cached on the proxy.
This cache applet has the task of deciding whether a cached version of the dynamic document is to be returned, whether a new version is to be created on the proxy, or whether the request is to be forwarded to the origin server for regeneration. The cache applet is run on the proxy whenever a request for a cached document is received. This approach is extremely flexible and can be used to maintain consistency in an application-specific manner as well as dynamically modify existing documents. However, this flexibility is achieved at a price. It requires starting up a new Java process (virtual machine or compiled code) for every request in order to execute the applet. In addition, keeping track of result equivalence requires creation and maintenance of a pattern-matching network. Since the cache applet for each document is independent (for security reasons), cache applets used to implement result-equivalence-based caching would need to save and restore their pattern-matching network to persistent storage, which can be a significant performance constraint. Besides, the cache applet cannot handle the situations in which the dynamic content is not written in Java. Proxy caches that are capable of delivering dynamic content are both more vulnerable to bad content and may have more control in that in some applications they can change state at the origin server (e.g., by selling products on behalf of an e-commerce site). Traditional end-to-end security mechanisms are no longer sufficient to ensure integrity. Gemini [18] is a security-aware publisher-centric web cache. With the help of a public key infrastructure to provide key distribution, clients, caches and publishers can check each other's digital signatures. Each origin server (publisher) has a certificate identifying itself and a public key. The origin server decides which proxy caches are trustworthy.
Each content object from the origin server is therefore associated with an access control list specifying which caches are allowed to store this object. A cache receives both the content object and the access control list and generates the dynamic content. Along with its response to the client, the cache may attach the access control list from the origin server so that the client can verify whether the cache is authorized to generate the requested content. This scheme can be adopted within our architecture to authenticate caches to servers and vice versa via their digital signatures, and ensures that active content is replicated and executed only in the presence of mutual trust.

Akamai uses Edge Side Include (ESI) [11] technology to handle dynamic content in their content distribution network service. ESI is a markup language that developers can use to identify content fragments for dynamic assembly at the network edge servers. Therefore, only those non-cacheable or expired fragments need to be fetched from origin servers, thereby lowering the need to retrieve complete web pages and decreasing the workload of origin servers. In this scheme, dynamic content still cannot be cached in proxy cache servers.

Recent research on peer-to-peer networks [27, 22, 23, 30] investigates efficient and scalable distributed content lookup services. These efforts generally focus on content-based application-layer routing, i.e., how to find the content location of a given key in a network with tens of thousands or more nodes. Efficiency, scalability and fault-tolerance are the major design concerns. An example of applying these concepts specifically to web caching is Squirrel [17], which is a decentralized peer-to-peer web cache. Instead of setting up a centralized proxy-cache server for a group of client machines, the client machines themselves cooperate in a peer-to-peer fashion to provide the functionality of the web cache. Every client machine allocates a certain portion of its disk for caching web content and exports this local cache to other machines. All local caches in a group of machines synthesize a large shared peer-to-peer virtual web cache. The target environment of Squirrel is corporate networks with 100 to 100,000 client desktops.

Calo et al. [4] proposed an architecture for active content caching very similar to ours.
In their architecture, when proxy caches receive dynamic content requests, they determine the execution parameters that may be needed for this invocation and download the program and configuration information from the origin server. The application logic is executed on the proxy server and administrative information or error messages are logged and sent to the origin server. One difference between our architecture and theirs is that in their implementation all the downloaded programs must be Java based, while in our implementation we have fewer restrictions on application programming languages. We also explicitly consider the QoS requirements associated with content classes.

3 Architecture
In this section we present the architecture for our content-distribution service. We consider the network model shown in Figure 1. In this architecture, some web cache proxies are capable of replication and execution of active objects, in addition to the traditional caching of static objects. We call them active cache proxies. These proxies are connected to servers that export active content for replication. We call them origin servers. We call caches and servers that support our extensions compliant. The active caches take care of keeping the scripts and data on the cache consistent with those on the origin server. In the following three subsections we discuss, respectively, protocol and architectural issues regarding origin servers, active proxies, and the consistency management between the two.

3.1 The Origin Server


Figure 1: Network Model for Active Caching (labels: server; network core; network node; cache with storage and computing power; storage-capable QoS-aware rim; clients)

An origin server is one which contains the original copy of a particular site. In this paper, we associate each site with a single origin server. Any updates to the content of the site originate at that server. To support on-demand active content replication, the server software needs to be slightly modified. However, the extensions should be integrated in a way that does not violate current standard protocols when a compliant server communicates with a non-compliant (i.e., legacy) cache or vice-versa. The current version of HTTP allows for clients and/or proxies to negotiate the type of requested content with the servers. For example, a client or proxy can specify that it accepts only documents written in French. Our departure point lies in introducing content type definitions that refer to active content (to be used in requests to fetch the content-generating scripts themselves). We also introduce an HTTP header option that alerts the receiver that active content is requested. In our architecture, active content types are classified by the language of the script (for which the cache should have a compiler or interpreter) and/or the operating system for which the script is designed. Thus, one type of scripts, for example, would be Perl scripts on UNIX. A non-compliant server that receives a request for a URL such as http://www.server.com/cgi-bin/prog.cgi would execute the named script locally by default, sending the resulting page back to the client or proxy. When a compliant server receives such a request, it looks for the new header option. If it is not found it serves the result of script execution, as an ordinary server would. Otherwise, it identifies the request as coming from a compliant sender (an active cache) and examines the types of scripts the cache says it supports. If the list does not include the requested script type, the server executes the script locally. If the particular script type is supported by the cache, the replication protocol is invoked and the script is replicated. Our extensions are therefore fully interoperable with the current HTTP standards.
The software of the origin server requires only a few changes in order to recognize requests from active caches. A dynamic content module is implemented that determines whether a requested script is to be executed locally or sent to the requester. In the latter case, the server may send additional information pertaining to consistency management of the script replica as will be discussed shortly. This information is called a ticket. The active caching protocol is illustrated in Figure 2. Steps 1, 2, and 7 of the protocol (shown in Figure 2) are identical to standard HTTP. Steps 3, 4, 5, and 6 are specific to active content replication.
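The decision made by the dynamic content module can be illustrated with a minimal sketch. It is not the authors' implementation: the header name X-Active-Content, the type label "perl/unix", the Reply record, and the function name are all hypothetical and serve only to make the three outcomes described above concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical header name; the text only says "an HTTP header option".
ACTIVE_HEADER = "X-Active-Content"


@dataclass
class Reply:
    kind: str                     # "result" (executed page) or "script" (replica)
    body: str
    ticket: Optional[int] = None  # consistency ticket, sent only with a script


def handle_script_request(headers: Dict[str, str], script_type: str,
                          script_source: str, run_script: Callable[[], str],
                          ticket: Optional[int] = None) -> Reply:
    """Decide whether to execute the script locally or ship it to the cache."""
    advertised = headers.get(ACTIVE_HEADER)
    if advertised is None:
        # No active-content option: behave exactly like an ordinary server.
        return Reply("result", run_script())
    supported = {t.strip() for t in advertised.split(",") if t.strip()}
    if script_type not in supported:
        # Compliant cache, but it cannot run this kind of script.
        return Reply("result", run_script())
    # Compliant cache that supports the script type: replicate the script.
    return Reply("script", script_source, ticket)
```

A legacy request (no header) and a request from a cache lacking the right interpreter both receive the executed page, so only the third branch departs from standard HTTP behavior.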

3.2 The Active Cache


Figure 2: The Active Caching Protocol. (1) Client issues request for dynamic content. (2) Cache requests script from server. (3) Server decides whether to execute script, or send it to cache. (4) Server replies with either the script execution results, or the script, plus a ticket. (5) Cache fetches dependent objects, if needed. (6) Script is executed (if applicable) and ticket is marked. (7) Cache sends results to client.

The active cache is the architectural component responsible for locally replicating scripts from remote origin servers. Each active cache is composed of two primary components: a proxy cache and an active server, as shown in Figure 3. The proxy cache handles the job of caching static content, as well as the dynamic content generated by executing server scripts if needed. The active server handles the job of executing one or more types of related dynamic content, for example cgi, fast-cgi, etc. Hence the active cache can have multiple active servers to handle different types of dynamic content. For example, one server can handle Perl scripts, while another can handle ASP content. We have used a modified version of the freely available Squid proxy cache [25] as our proxy cache component and the Apache web server [1] as our active server. The Squid proxy cache includes a redirector component. Clients contact the Squid cache component with content requests. If the requests are for static content, Squid directly contacts the origin server, retrieves the results, and caches them for future accesses. However, if the requests are for dynamic content, Squid uses its redirector to modify the client requested URLs. Squid redirects such requests for dynamic content to the Apache active server component, which then fetches the script from the origin server, executes it and returns the results to Squid. Since both proxy and active server components are either running on the same physical machine or on machines on the same local network, the overhead involved is far less than going over the Internet backbone to the origin server. In addition, the Squid proxy cache can cache the results of execution of the script on the active server if so configured. Hence future requests for such scripts, with the same inputs, may be served from the Squid cache. The other components of the architecture as shown in Figure 3 are a URL rewriting engine, a timer module and a script cache, all of which are part of the active server component. The URL rewriting engine receives only those requests (for active content) which are forwarded from the proxy cache component. This is because the active server is not visible to the outside world, and it services only requests from the Squid proxy cache component.
The URL rewriting engine then rewrites these URLs as proxy requests to the server which the client had originally requested. When the active server component in the active cache receives a reply from the origin server, it executes the returned script with the client-requested parameters. The fetched script is also saved in its local script cache. Any data that the script requires in order to execute is also fetched from the server and cached as will be detailed in the next section. In this paper, we deal only with read-only scripts, i.e., those that do not modify their data. This significantly simplifies consistency management.

3.3 Consistency Management


Figure 3: Components of an Active Cache (labels: active cache comprising a proxy cache with redirector and an active server with URL rewriter, timer, script cache, and file cache; content servers with dynamic content modules; an unsubscribed server; the Internet; clients)

The consistency management protocol ensures that each script and its data objects are kept (weakly) consistent with the server. Each active server has a Consistency Manager module which handles the job of maintaining consistency. Consistency management is based on periodic polling, which is similar to the common time-to-live (TTL) approach. Each cached script is associated with a TTL. The scripts can be re-executed as long as the TTL has not expired. We believe that in practice the scripts will not change very often. Hence, weak consistency is sufficient. The data objects used by the scripts are associated with a polling period at which they should be retrieved from the origin server. They are considered consistent with the origin server as long as their period has not expired. During that time, these cached data objects can be accessed by the scripts upon user requests. A new copy of the data objects is requested from the origin server upon period expiration. To reduce periodic polling, the server may associate each such data object with a ticket. The ticket specifies the number of accesses that should occur during one polling period for periodic polling to continue. If the required number of accesses does not occur during some polling period, the cache does not contact the server upon period expiration. Instead, it contacts it upon the arrival of the next request. Hence, if the request rate is low no extra polling overhead is observed. The polling period can be dynamically adapted in a QoS-sensitive manner in response to network load conditions. The maximum allowable polling period can be specified on a per content-class basis.

The consistency manager maintains the mapping between the scripts and their data files. It contains a timer thread which is invoked at constant intervals of time. At each timeout, the consistency manager checks which data files need to be re-fetched from origin servers. These files are fetched in parallel by concurrent threads. The current bandwidth available and the CPU workload are monitored every time a new thread is spawned. If the available bandwidth falls below a threshold or CPU workload exceeds a certain threshold, the consistency manager extends the polling period of some data files; these data files will be fetched at the next timeout.

A main objective of our dynamic content caching scheme is maximizing timely hits under resource and consistency constraints. In our service model, all content is divided into classes.
Each class $class_i$ has an associated deadline, minimum consistency, and nominal consistency requirements. The deadline, $D_i$, specifies the desired response time for this class. It sets the order in which polling requests are made to the origin servers. The minimum and nominal consistency requirements specify the maximum and nominal tolerable staleness of data objects in that class, denoted $S_{max}$ and $S_{nom}$ respectively. For example, stock data may have a nominal tolerable staleness $S_{nom} = 3$ minutes, and a maximum tolerable staleness $S_{max} = 15$ minutes. Our objective is to maximize the weighted sum of timely hits. A hit is timely if it meets the deadline constraint. It is weighted by its utility $w$, which is a function of the content freshness and consistency constraints. In our implementation, we use $w = \mathrm{avg}\left(\frac{S - S_{max}}{S_{nom} - S_{max}}\right)$, averaged over all data items touched by the script serving the request, where $S$ is the time elapsed since the given data item was fetched from the server. To ensure fairness, the consistency manager adjusts the polling periods of all objects such that their utility weights are equal and are maximized for the current load conditions.
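The ticket rule and the utility weight from this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the weight is assumed to be the average of (S - S_max) / (S_nom - S_max) over the data items a script touched, which gives 1 at nominal staleness and 0 at maximum staleness.

```python
from typing import Sequence


def poll_on_expiry(accesses_in_period: int, ticket: int) -> bool:
    """Ticket rule: keep polling periodically only if the object was accessed
    at least `ticket` times during the period that just expired. Otherwise
    the cache waits for the next request before contacting the server."""
    return accesses_in_period >= ticket


def utility_weight(staleness: Sequence[float], s_nom: float, s_max: float) -> float:
    """Average utility over the data items touched by one request.

    `staleness` holds, per data item, the time elapsed since it was fetched.
    Assumed formula: (S - S_max) / (S_nom - S_max) per item.
    """
    if not staleness:
        raise ValueError("a request must touch at least one data item")
    if s_nom == s_max:
        raise ValueError("nominal and maximum staleness must differ")
    return sum((s - s_max) / (s_nom - s_max) for s in staleness) / len(staleness)
```

With the stock-data values from the text (nominal 3 minutes, maximum 15 minutes), an item that is 3 minutes old contributes a weight of 1, one that is 9 minutes old contributes 0.5, and one that is 15 minutes old contributes 0.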

4 Implementation Details
In this section we discuss the implementation details of our architecture and protocol. In our active cache, we chose to use the freely available Squid proxy cache as the proxy cache component and the Apache web server [1] as the active server component, as mentioned in Section 3.

4.1 Implementation
We describe our implementation by following the journey of a request arriving from a client to an active cache. Upon arrival of the request, we use a redirector component to selectively rewrite URLs requested by the client. In our case, we redirect client URLs requesting dynamic content to our active server rather than the requested origin server. Client requests for static content are not redirected and are directly forwarded to the requested origin server. For example, if the original request was to http://www.xyz.com/cgi-bin/hello.cgi, and if our active server was at http://t2.cs.virginia.edu:8080/, after redirection the request would be http://t2.cs.virginia.edu:8080/cgi-bin/hello.cgi. We use the redirection capabilities available in Squid for the redirector component. Squid has a very useful feature that allows a third-party redirector to be used. Hence, we used jesred version 1.2 [16], a tool available under the GNU General Public License, as our redirector. We chose jesred for this purpose, over other more popular redirectors such as Squirm [26], because it is about 2-3 times faster and is highly configurable. In order for URL redirection to work, Squid is configured to use jesred as its redirector. By default, Squid rewrites any HTTP Host: header in redirected requests to point to the new host to which the URL has been redirected. However, we turn this option off, because we would like to save the name of the host for which the request was originally meant, to eventually forward the request to that host. The configuration of jesred has two parts: configuring the Access Control List and configuring the Redirector Rules. Jesred is configured only to accept the URLs that Squid accepts. As part of jesred's Redirection rules, it is configured to ignore URLs that end in .html, .htm, .shtml, .java, .jpg etc. Hence requests for static content are not redirected by jesred, and are handled by Squid just like any other static content.
In contrast, URLs which contain strings like cgi-bin or .asp are redirected by jesred to the active server. This is done by adding a rule, similar to the one given here:
regex http://[^/]*/cgi-bin/(.*$) http://t2.cs.virginia.edu/cgi-bin/\1 /cgi-bin/
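The effect of the redirector rules can be mimicked with a small sketch. It does not use jesred or reproduce its rule syntax; the regular expression, the constant names, and the function are hypothetical, and only the host names and file suffixes are taken from the text.

```python
import re
from urllib.parse import urlsplit

# Active server address from the running example in the text.
ACTIVE_SERVER = "http://t2.cs.virginia.edu:8080"
# Suffixes the text lists as static content that must not be redirected.
STATIC_SUFFIXES = (".html", ".htm", ".shtml", ".java", ".jpg")
# Assumed pattern: any host, followed by a path under /cgi-bin/.
DYNAMIC_PATTERN = re.compile(r"^http://[^/]*(/cgi-bin/.*)$")


def redirect(url: str) -> str:
    """Return the rewritten URL for dynamic content, or the URL unchanged."""
    path = urlsplit(url).path
    if path.lower().endswith(STATIC_SUFFIXES):
        return url  # static content: Squid handles it as usual
    match = DYNAMIC_PATTERN.match(url)
    if match:
        # Keep path and query string, swap the host for the active server.
        return ACTIVE_SERVER + match.group(1)
    return url
```

Because the original Host: header is left untouched by Squid, the active server can still recover the origin host after this rewrite.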

When the Squid proxy cache receives a request for dynamic content, it first checks to see if it already has a cached version of that content. After performing checks to determine if the cached content is still valid, it either returns the cached page, or redirects the request to the appropriate active server to obtain the most up-to-date version of that content. When an active server receives such a redirected request, it first checks to see if the script that has been requested is already cached locally. If so, it executes the script with the current inputs and returns the results to the Squid proxy cache. In case a cached version of the script is not available locally, the active server initiates a URL rewrite operation. The URL is rewritten to fetch the script from the server specified in the HTTP Host: header, which in this case, holds the name of the server for which the request was originally meant. We have used the functionality provided by Apache's mod_rewrite module to perform the URL rewriting. A sample rewrite-rule that can be used is given below:
RewriteRule /cgi-bin/(.*$) http://%{HTTP_HOST}/cgi-bin/$1 [P]

This rule states that all URLs which are meant to be served from the cgi-bin directory are to be rewritten to be served from the same directory, but on the host specified by the HTTP_HOST environment variable. Since we wish to cache the script that is returned, we force this request to be a proxy request (indicated by [P]). This proxy request is handled by Apache's mod_proxy module. For example, if a request meant for http://www.xyz.com/cgi-bin/hello.cgi was redirected to the active server residing at http://t2.cs.virginia.edu:8080/, then the active server, on determining that the URL was for a script called hello.cgi which is not cached locally, rewrites the URL again to http://www.xyz.com/cgi-bin/hello.cgi. When the origin server receives such a proxy request for dynamic content, it has to be able to distinguish that the request originated from an active cache. In order to facilitate this, the active server modifies the HTTP request header before forwarding the request. It modifies the HTTP [12] Accept header, by adding the string
x-script/x-activeScripts

to the header. When the dynamic content module on the destination server recognizes this special extension, it comprehends that the script is not to be executed locally, but is to be returned to the requester as is. While we have not implemented a security mechanism in our experimental prototype, the server would also have to look at the digital signature of the requesting proxy to decide whether it can be trusted with the script. If the cache is not trusted, the script is executed locally. When a trusted active server receives a reply from the origin server, it checks the origin server's signature against those supplied at content-provider subscription time. If the server is trusted, it caches the reply (i.e., the script) in its local script cache. We have used Apache's proxy caching capabilities (provided by the mod_proxy module) for this purpose. Finally, it executes the script with the parameters supplied by the client and returns the results to the Squid proxy cache.
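The request rewriting on the active server and the matching check on the origin server can be summarized in a short sketch. It stands in for the mod_rewrite and mod_proxy configuration rather than reproducing it; the function names are hypothetical, and the only values taken from the text are the Accept token and the example URLs.

```python
from typing import Dict, Tuple

# Token the text says is added to the Accept header.
ACTIVE_TOKEN = "x-script/x-activeScripts"


def build_origin_request(path: str, headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Turn a redirected request into a proxy request for the original host.

    The Host header still names the origin server because Squid's Host
    rewriting is switched off for redirected requests.
    """
    host = headers["Host"]
    accept = headers.get("Accept", "")
    new_accept = f"{accept}, {ACTIVE_TOKEN}" if accept else ACTIVE_TOKEN
    new_headers = dict(headers, Accept=new_accept)
    return f"http://{host}{path}", new_headers


def wants_script(headers: Dict[str, str]) -> bool:
    """Origin-side check: did an active cache ask for the script itself?"""
    tokens = [t.strip() for t in headers.get("Accept", "").split(",")]
    return ACTIVE_TOKEN in tokens
```

For the running example, a request for /cgi-bin/hello.cgi with Host www.xyz.com is rewritten to http://www.xyz.com/cgi-bin/hello.cgi and carries the extra Accept token, which the origin server uses to decide that the script should be returned rather than executed.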

5 Script API
In order for the active server to be able to fetch the data for the script from the origin server, we require that the script I/O be performed using our read/write primitives. We also require that the script executable be linked with our library. This allows us to implement location transparency such that a script always finds its needed data objects locally regardless of where it executes, even when our consistency manager has to fetch them first from the origin server. The requirement to use a particular middleware API is arguably a limitation. This limitation, however, is shared by all technologies that depart from legacy code. For example, while Java is recognized as an important vehicle for software portability, at the time of its inception an obvious limitation of having that portability was that a user had to learn a new programming language and translate into it all current code. Similar arguments can be made regarding CORBA, RPC and other languages or middleware. In this paper, we simply present our tools and their potential, without engaging in the argument over the commercial viability of their interfaces in their current form. In the future, it may be worthwhile to investigate hiding our API calls beneath a standard I/O library interface making them transparent to the programmer. Our API buys location-independence of server scripts and could be implemented in languages such as Java, although the present proof-of-concept implementation is in C. The API consists of the following calls:
FD = ASopen (file, mode, class, TTL)

The ASopen() call is used to open an existing file, and in some cases to create it first. The file parameter is the name and path of the file to be opened. The mode parameter indicates the mode in which the file is opened, e.g., r, w, r+, etc. The class parameter can have one of three different values:
e - The file expires in TTL amount of time, after which it must be automatically refetched.
d - The file must be fetched on-demand, i.e., it is fetched on every request.
t - The file is a temporary file, i.e., it does not have consistency requirements. It is created locally, if it does not exist, and is removed when closed.
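The three class values imply a simple rule for when a cached data file must be fetched again from the origin server. The sketch below is an illustration only, assuming a cached copy already exists; the names as_class and as_needs_fetch are hypothetical and are not part of the API described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical encoding of the three consistency classes. */
typedef enum {
    CLASS_EXPIRE    = 'e', /* valid for TTL seconds after it was fetched */
    CLASS_ON_DEMAND = 'd', /* fetched on every request */
    CLASS_TEMPORARY = 't'  /* local scratch file, no consistency requirement */
} as_class;

/* Returns true when the cached copy must be refetched from the origin. */
bool as_needs_fetch(as_class cls, time_t fetched_at, long ttl_seconds, time_t now)
{
    assert(ttl_seconds >= 0);
    switch (cls) {
    case CLASS_ON_DEMAND:
        return true;
    case CLASS_EXPIRE:
        /* Expired once at least TTL seconds have elapsed since the fetch. */
        return difftime(now, fetched_at) >= (double)ttl_seconds;
    case CLASS_TEMPORARY:
        return false;
    }
    return true; /* unknown class: be conservative and refetch */
}
```

Treating an unknown class as on-demand is a design choice of this sketch: it mirrors the default class described in the text and errs on the side of freshness.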

The third and fourth parameters are optional. If they are not specified, the consistency mechanism is assumed to be d, i.e., on-demand, and the TTL is zero. The next call is the ASread() call:
Status = ASread (FD, Buffer, Count)

This call attempts to read Count bytes from the file descriptor FD into the buffer Buffer. We provide an ASwrite() call too. However, this call is currently used only to write to files whose consistency class marks them as temporary. The next call is the ASclose() call, which closes the file associated with the file descriptor FD. The last call in our API is ASContentServer(). This call tells the script the IP address or host name of the origin server for which it was written. This is required in order to distinguish whether the script is currently cached on the active server proxy or executing on the origin server.
ASContentServer (IP Address)
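To illustrate how a script might use these calls, the following sketch reads a data file through the API. The C signatures, return conventions, and stub bodies are assumptions made so that the example compiles on its own: the stubs simply wrap local stdio files, whereas the real library would consult the consistency manager and fetch data from the origin server. The host name origin.example.org and the function script_read_all are likewise hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define AS_MAX_FILES 16
static FILE *as_table[AS_MAX_FILES]; /* stub descriptor table */
static char as_origin[64];           /* origin server recorded by the script */

/* Stub: remembers the origin server name. */
void ASContentServer(const char *host)
{
    strncpy(as_origin, host, sizeof as_origin - 1);
    as_origin[sizeof as_origin - 1] = '\0';
}

/* Stub: opens a local file; cls and ttl are accepted but ignored here. */
int ASopen(const char *file, const char *mode, char cls, int ttl)
{
    (void)cls;
    (void)ttl;
    for (int fd = 0; fd < AS_MAX_FILES; fd++) {
        if (as_table[fd] == NULL) {
            as_table[fd] = fopen(file, mode);
            return as_table[fd] ? fd : -1;
        }
    }
    return -1; /* descriptor table full */
}

/* Stub: returns the number of bytes read, 0 at end of file, -1 on a bad FD. */
int ASread(int fd, void *buffer, size_t count)
{
    if (fd < 0 || fd >= AS_MAX_FILES || as_table[fd] == NULL)
        return -1;
    return (int)fread(buffer, 1, count, as_table[fd]);
}

/* Stub: closes the file and frees the descriptor slot. */
int ASclose(int fd)
{
    if (fd < 0 || fd >= AS_MAX_FILES || as_table[fd] == NULL)
        return -1;
    int rc = fclose(as_table[fd]);
    as_table[fd] = NULL;
    return rc == 0 ? 0 : -1;
}

/* Script body: read a data file that may be up to 30 seconds stale
 * (class 'e', TTL 30) into out as a NUL-terminated string.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
int script_read_all(const char *path, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    ASContentServer("origin.example.org");
    int fd = ASopen(path, "r", 'e', 30);
    if (fd < 0)
        return -1;
    size_t total = 0;
    while (total < out_size - 1) {
        int n = ASread(fd, out + total, out_size - 1 - total);
        if (n <= 0)
            break;
        total += (size_t)n;
    }
    out[total] = '\0';
    ASclose(fd);
    return (int)total;
}
```

The script itself never names a network location for its data; it refers only to a path and a consistency class, which is the location transparency the API is meant to provide.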

6 Evaluation
In this section we evaluate the performance of our system with experiments based on both an emulation environment and PlanetLab [20], a real-world Internet testbed. Our experimental results demonstrate that our system can lead to significant improvement in average client-perceived latency and better service scalability. We also present mechanisms that provide real-time guarantees for client requests by leveraging the trade-off between client request processing latency and content staleness. Analysis and performance of these mechanisms are provided.

Our experimental testbed consists of a full implementation of an active cache and one origin server using instrumented Squid and Apache servers as discussed in Sections 3 and 4. To emulate a large number of clients sending requests, we use SURGE (Scalable URL Reference Generator) [2], a tool that generates references matching empirical measurements of web request size distribution, relative file popularity, embedded file references, idle periods of individual clients, and locality of reference. The advantage of SURGE is that it closely mimics a set of real clients and generates realistic web workloads. We made a slight extension to SURGE to generate URLs for dynamic content provided by content publishers.

To show the general performance gain in client-observed response time, two sets of experiments were conducted, with and without the active cache. Over 50,000 client requests for more than 500 scripts were generated and sent to the server within approximately 15 minutes. In the first set of experiments, these requests were sent directly to the origin server. In the second set, the requests were sent via an active cache. We conducted these experiments on PlanetLab. The origin server and active cache server were deployed in Europe (Universita di Bologna) and North Carolina, USA (Duke University), respectively. Clients running SURGE were in Virginia, USA (University of Virginia). This setup, in which clients are relatively close to the active cache while the origin server is far away, is based on the vision that while the origin server can be virtually anywhere in the world, once our active cache server is widely deployed as a CDN system, every client would be able to connect to a nearby CDN server that runs our service. The end-to-end delay consists of the Internet backbone delay from clients to the active cache server, the delay from the active cache server to the origin server, the server and/or proxy processing time, and the client delay.

Our experimental results (Figure 4) show a 200% reduction in latency on average. Figure 5 plots the observed response time of a community of users accessing the same script with and without an active cache. The client-observed latency stays below 80 milliseconds most of the time when the active cache is present, because the script is usually executed on the active server and the results are sent back to the clients from the active cache. A few spikes occur when the active server goes across the wide area network to re-fetch the data files from the origin server according to the consistency requirement of the script. Without the active cache, the latency is not only much higher but also highly variable, which can be attributed to the instability of the Internet backbone.

In the following, we illustrate and evaluate our system's performance under high workload. Since PlanetLab is a shared platform, we were explicitly discouraged from running extensive overload experiments on its servers. Instead, we used NIST Net [19] as a WAN emulator to emulate the environment of the PlanetLab servers we used. NIST Net is a Linux kernel module that can add configured latencies to incoming packets. We used two servers in our LAN and configured the module according to the network latency data we gathered from PlanetLab. To demonstrate the fidelity of this emulated WAN environment, we repeated the experiments above in our LAN with NIST Net. Figure 4 shows that the results from the emulated WAN match those from PlanetLab fairly well. The experiments from this point on were therefore conducted on the emulated WAN.

[Plot omitted: average client-perceived delay (msec) per script; series: w/o active cache, w/ active cache, w/ active cache (emulation).]

Figure 4: Average Latency Improvement

[Plot omitted: client-perceived delay (msec) per observation; series: w/o active cache, w/ active cache.]

Figure 5: Client perceived latency for a single script

[Plots omitted: CPU utilization (%) per observation. Panels (i), (ii), and (iii) correspond to polling periods of 15s, 30s, and 60s, each with series Active Cache, Origin Server (w/ active cache), and Origin Server (w/o active cache); panel (iv) compares active cache utilization for the 15s, 30s, and 60s polling periods.]

Figure 6: CPU utilization and impact of polling period

Our architecture reduces the workload on the origin server and the client-perceived latency at the cost of adding workload to the active cache. Figure 6 depicts one important workload metric, CPU utilization, for both the origin server and the active cache. The active cache considerably reduces the CPU utilization of the origin server, effectively shifting the computational demand from the remote server to the local proxy. Figures 6-i, 6-ii, and 6-iii show the CPU utilization of the active cache and the origin server (with and without the cache) for different polling periods at the active cache. As might be expected, higher polling periods correspond to higher utilization, which quantifies the overhead of the scheme. Active cache utilizations for different polling periods are compared in Figure 6-iv.

One important advantage of our architecture is a better ability to provide real-time guarantees on content access latency, because replicating content to the network edge, and hence closer to clients, not only reduces clients' average content access latencies but also dampens the impact of Internet variability. Our previous work [14] studied the feasibility of a CDN service that replicates content on the Internet so that a global delay bound is guaranteed on access requests from anywhere in the CDN. The focus of [14] was on dealing with network latencies. However, when providing real-time guarantees on content access, end-system latency can dominate the end-to-end client-perceived delay. Providing mechanisms for the system to adapt to workload changes, and hence avoid overloading the server, is crucial to the success of provisioning real-time guarantees.

We argue that for many applications, getting a prompt response from the server is more important than always being served the freshest content. Hence, we can sacrifice content freshness in favor of better conformance to the latency bound. Specifically, when a large number of scripts are registered with an active cache, the responsiveness of the system can suffer from the consistency manager's heavy workload of maintaining the consistency of these scripts. To protect the system from being overloaded by consistency maintenance, we can reduce the frequency at which the consistency manager polls origin servers, i.e., relax the consistency of content. As mentioned in Section 3.3, in our active cache system the latency bound on content access and the consistency requirements (nominal and maximum tolerable staleness) are specified in a QoS contract with content providers. When the timely hit ratio of requests (defined as the ratio of client requests served within the latency bound) drops to a certain value (called the low watermark), we begin to relax the consistency of content gradually. Since all content is categorized into classes, consistency degradation starts with the lowest-priority class. Relaxation continues until the timely hit ratio rises to a certain level or all content classes are at their maximum tolerable staleness. When the timely hit ratio reaches a certain point (called the high watermark), the system tries to tighten consistency again, starting from the most important class, until the timely hit ratio drops to the low watermark or all classes are at their nominal staleness.

We implemented two mechanisms for adaptive consistency adjustment according to the timely hit ratio of requests. In either mechanism, the consistency manager periodically receives reports from the proxy cache on the current timely hit ratio, based on which it decides how to adjust the consistency of the content classes.
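The watermark rule described above can be sketched as a small decision function. This is a hedged illustration rather than the actual implementation: the names timely_hit_ratio, watermark_action, and adj_action are hypothetical, and holding the current setting while the ratio lies between the two watermarks is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical actions the consistency manager can take on a report. */
typedef enum { ADJ_HOLD, ADJ_RELAX, ADJ_TIGHTEN } adj_action;

/* Fraction of client requests served within the latency bound.
 * With no requests in the reporting period, nothing missed its bound. */
double timely_hit_ratio(unsigned long served_in_bound, unsigned long total)
{
    assert(served_in_bound <= total);
    if (total == 0)
        return 1.0;
    return (double)served_in_bound / (double)total;
}

/* Relax consistency at or below the low watermark, tighten at or above
 * the high watermark, and otherwise leave the classes unchanged. */
adj_action watermark_action(double ratio, double low, double high)
{
    assert(low <= high);
    if (ratio <= low)
        return ADJ_RELAX;
    if (ratio >= high)
        return ADJ_TIGHTEN;
    return ADJ_HOLD;
}
```

Keeping a gap between the two watermarks gives the controller hysteresis, so a ratio hovering near a single threshold does not cause the system to alternate between relaxing and tightening on every report.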
The period of adjusting consistency should be comparable to the consistency of the content classes. We call the minimal adjustment to a consistency value the adjustment granularity. The two mechanisms are:

Linear Adjustment: Every time the consistency manager receives a report of the timely hit ratio, it makes a change of at most one adjustment granularity, for both relaxing and tightening consistency.

Proportional Adjustment: The change the consistency manager makes is determined by $(HR_{curr} - \text{low watermark})/c$, where $HR_{curr}$ is the current timely hit ratio reported by the proxy cache and the constant $c$ is a configurable system parameter. This mechanism is proportional in that the adjustment is proportional to the difference between the current and targeted timely hit ratios. Compared to linear adjustment, proportional adjustment can be faster in bringing the timely hit ratio back to the low watermark, but it also has the potential to overreact, because the last adjustment may not yet have taken full effect when the consistency manager is about to make another adjustment.

We now present an evaluation of these two mechanisms. In the experiments, we have three content classes, all of which have 10 seconds as nominal staleness and 60 seconds as maximum tolerable staleness. We chose the low watermark and high watermark to be 0.95 and 0.98, respectively. The proxy cache reports the current timely hit ratio to the consistency manager every 20 seconds. Figure 7 illustrates the workload used in our experiments. At the beginning, only a small number of clients request scripts from a small set (labelled working set). At time units 3 and 4, we introduce many more clients to the system, whose requests span a large set of scripts. The change in the number of clients and the gradual increase of the working set over time are plotted in Figure 7.
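The linear and proportional adjustment mechanisms described above differ only in how many granularity steps they apply per report. The sketch below makes that concrete under several assumptions that are not stated in the text: the proportional result is interpreted as a number of adjustment granularities rounded to the nearest integer, negative values mean relaxing, and a step that cannot be applied to one class spills over to the next class in priority order. All names (content_class, apply_steps, proportional_steps) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class state; classes[0] is the most important class. */
typedef struct {
    int staleness; /* currently allowed staleness, in seconds */
    int nominal;   /* nominal staleness from the QoS contract */
    int max;       /* maximum tolerable staleness */
} content_class;

/* Apply a number of granularity steps: steps < 0 relaxes consistency,
 * lowest-priority class first; steps > 0 tightens it, most important
 * class first. Linear adjustment corresponds to steps of -1 or +1. */
void apply_steps(content_class *classes, size_t n, int steps, int granularity)
{
    assert(classes != NULL && granularity > 0);
    while (steps < 0) {
        size_t i = n;
        while (i > 0 && classes[i - 1].staleness >= classes[i - 1].max)
            i--;
        if (i == 0)
            return; /* every class is already at its maximum staleness */
        content_class *cls = &classes[i - 1];
        cls->staleness += granularity;
        if (cls->staleness > cls->max)
            cls->staleness = cls->max;
        steps++;
    }
    while (steps > 0) {
        size_t i = 0;
        while (i < n && classes[i].staleness <= classes[i].nominal)
            i++;
        if (i == n)
            return; /* every class is already at its nominal staleness */
        content_class *cls = &classes[i];
        cls->staleness -= granularity;
        if (cls->staleness < cls->nominal)
            cls->staleness = cls->nominal;
        steps--;
    }
}

/* Proportional rule: (HR_curr - low_watermark) / c, rounded to the
 * nearest whole number of granularity steps (an assumption here). */
int proportional_steps(double hr_curr, double low_watermark, double c)
{
    assert(c > 0.0);
    double x = (hr_curr - low_watermark) / c;
    return (int)(x >= 0.0 ? x + 0.5 : x - 0.5);
}
```

For example, with a low watermark of 0.95 and an assumed c of 0.05, a reported ratio of 0.80 yields three relaxation steps in a single report, whereas linear adjustment would need three reporting periods to make the same change.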

[Plot omitted: working set size and number of clients over time (units 1 to 22).]

Figure 7: Workload used

In the two sets of experiments, we choose 300ms and 500ms as the latency bound for all three content classes. As we can see in Figures 8, 9, 10, and 11, the client requests introduced at time units 3 and 4 significantly impact the timely hit ratio at those times. The consistency manager immediately acts to relax consistency, and the timely hit ratio gradually recovers. Figures 8 and 9 depict the performance of linear adjustment and proportional adjustment for a tight latency bound (300ms). From the figures we can see that linear adjustment is conservative in relaxing consistency: it makes a minimal change every time unit and finally relaxes all three content classes to their maximum tolerable staleness. In contrast, proportional adjustment makes abrupt changes to content consistency when a large latency bound miss ratio is seen. Hence, all the content classes quickly reach their tolerance bound. Note that the aggressiveness of proportional adjustment does help in pulling up the timely hit ratio faster. Also note that in these experiments no class has a chance to regain better consistency because the latency bound is tight.

Figures 10 and 11 show the results with a less stringent latency bound (500ms). Unlike the previous case, not all content classes have to be pushed to maximum tolerable staleness, because of the larger latency bound. The process of first relaxing and then tightening the consistency of content was seen in both the linear and proportional adjustment cases. The proportional adjustment mechanism still enjoys a slightly faster recovery of the timely hit ratio than linear adjustment. On the other hand, linear adjustment achieves better consistency for content. In Figure 10 we can see that class 0, the most important class, enjoyed the nominal staleness all the time, and at time unit 19 class 1 also returned to the nominal staleness, while Figure 11 shows that class 0 suffered a long period of being degraded to the maximum staleness and the other two classes did not get consistency as good as in the linear adjustment case.

Two additional observations can be made from the figures. First, in all cases the three classes had roughly the same timely hit ratios over the experiment time, which indicates fairness in terms of delay guarantee. Second, it is clear that the consistency of the lowest-priority class gets relaxed first and tightened last, as it is supposed to be.

[Plots omitted: (a) Consistency (sec) over time for Class 0, Class 1, and Class 2; (b) Timely Hit Ratio over time for Overall, Class 0, Class 1, and Class 2.]

Figure 8: Linear Adjustment (latency bound = 300ms)

[Plots omitted: (a) Consistency (sec) over time for Class 0, Class 1, and Class 2; (b) Timely Hit Ratio over time for Overall, Class 0, Class 1, and Class 2.]

Figure 9: Proportional Adjustment (latency bound = 300ms)

[Plots omitted: (a) Consistency (sec) over time for Class 0, Class 1, and Class 2; (b) Timely Hit Ratio over time for Overall, Class 0, Class 1, and Class 2.]

Figure 10: Linear Adjustment (latency bound = 500ms)

[Plots omitted: (a) Consistency (sec) over time for Class 0, Class 1, and Class 2; (b) Timely Hit Ratio over time for Overall, Class 0, Class 1, and Class 2.]

Figure 11: Proportional Adjustment (latency bound = 500ms)

7 Conclusion and Future Work


In this paper, we proposed and implemented a novel and scalable architecture for dynamic web content replication. Our solution involves replicating the scripts that generate the active web content, and their data, from the server to active caches that are nearer to the clients. Our experimental results showed significant improvement in effective throughput and client response time. Reducing the workload on origin servers also improves service scalability. In addition, our architecture provides customizable consistency semantics. Currently our implementation supports read-only scripts. Subscribers (i.e., clients) cannot modify the data on the publisher (i.e., origin server). We are investigating methods to allow replicated scripts to modify shared objects. Our current implementation simply downloads the executables from the publishers to the active server, so we potentially have a platform heterogeneity issue. Origin servers could also be more active instead of just passively serving the requests from active caches: they could adopt proactive strategies to feed data to active caches on the Internet backbone. How to make the origin server and active server cooperate effectively is an interesting topic. Last, our implementation requires all scripts to use the ActiveServer API. We are trying to remove this limitation so that programmers can write scripts in their favorite languages without having to adopt a new interface.

References
[1] Apache. The Apache Software Foundation. http://www.apache.org/.
[2] P. Barford and M. Crovella. Generating representative web workloads for network and server performance evaluation. In ACM SIGMETRICS 98, 1998.
[3] M. Beck, T. Moore, L. Abrahamsson, C. Achouiantz, and P. Johansson. Enabling full service surrogates using the portable channel representation. In International World Wide Web Conference, 2001.
[4] S. Calo and D. Verma. An architecture for acceleration of large scale distributed web applications, 2002.
[5] K. S. Candan, W.-S. Li, Q. Luo, W.-P. Hsiung, and D. Agrawal. Enabling dynamic content caching for database-driven web sites. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 2001.
[6] P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In USENIX Symposium on Internet Technologies and Systems, 1997.
[7] P. Cao, J. Zhang, and K. Beach. Active cache: Caching dynamic contents on the web. In Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware 98), 1998.
[8] J. Challenger, P. Dantzig, and A. Iyengar. A scalable system for consistently caching dynamic web data. In Proceedings of the 18th Annual Joint Conference of the IEEE Computer and Communications Societies, New York, New York, 1999.
[9] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell. A hierarchical internet object cache. In USENIX Annual Technical Conference, pages 153-164, 1996.
[10] F. Douglis, A. Haro, and M. Rabinovich. HPP: HTML macro-preprocessing to support dynamic document caching. In USENIX Symposium on Internet Technologies and Systems, 1997.
[11] Edge side include. http://www.esi.org.
[12] J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext transfer protocol - HTTP/1.1, 1999.
[13] V. Holmedahl, B. Smith, and T. Yang. Cooperative caching of dynamic content on a distributed web server. In Proceedings of Seventh IEEE International Symposium on High Performance Distributed Computing (HPDC-7), pages 243-250, 1998.
[14] C. Huang and T. Abdelzaher. A performance evaluation of content distribution networks with real-time guarantees, 2003.
[15] A. Iyengar and J. Challenger. Improving web server performance by caching dynamic data. In USENIX Symposium on Internet Technologies and Systems, 1997.
[16] Jesred. http://www-ivs.cs.uni-magdeburg.de/~elkner/webtools/jesred/.
[17] S. Iyer, A. Rowstron, and P. Druschel. Squirrel: A decentralized peer-to-peer web cache. In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC), 2002.
[18] A. Myers, J. Chuang, U. Hengartner, Y. Xie, W. Zhuang, and H. Zhang. A secure, publisher-centric web caching infrastructure. In IEEE INFOCOM, 2001.
[19] NIST Net. http://snad.ncsl.nist.gov/itg/nistnet/.
[20] Planet-Lab. http://www.planet-lab.org.
[21] K. Rajamani and A. Cox. A simple and effective caching scheme for dynamic content. Technical Report TR-00371, Computer Science Dept., Rice University, 2000.
[22] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content addressable network. In SIGCOMM, 2001.
[23] A. Rowstron and P. Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, Banff, Canada, Oct 2001.
[24] B. Smith, A. Acharya, T. Yang, and H. Zhu. Exploiting result equivalence in caching dynamic web content. In USENIX Symposium on Internet Technologies and Systems, 1999.
[25] Squid. Squid web proxy cache. http://www.squid-cache.org/.
[26] Squirm. Squirm: A redirector for Squid. http://squirm.foote.com.au/.
[27] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM, 2001.
[28] A. Wolman, G. M. Voelker, N. Sharma, N. Cardwell, A. R. Karlin, and H. M. Levy. On the scale and performance of cooperative web proxy caching. In Symposium on Operating Systems Principles, pages 16-31, 1999.
[29] H. Yu, L. Breslau, and S. Shenker. A scalable web cache consistency architecture. In SIGCOMM, pages 163-174, 1999.
[30] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB//CSD-01-1139, U. C. Berkeley, 2001.
[31] H. Zhu and T. Yang. Class-based cache management for dynamic web contents. In IEEE INFOCOM, 2001.