Abstract

Content networking is an emerging technology, where the requests for content are steered by content routers that examine not only the destinations but also content descriptors such as URLs and cookies. In the current deployments of content networking, content routing is mostly confined to selecting the most appropriate back-end server in virtualized web server clusters. In this paper, we present an architecture for wide-area content routing. The architecture is based on tagging the requests at ingress points. The tags are designed to incorporate several different attributes of the content in the routing process. Simulations are carried out to compare the performance of the proposed scheme with a DNS-based content access scheme.

1. Introduction

As business critical applications are deployed over the Internet, it is becoming necessary for the service or content authors to ensure that the clients are receiving their service or content at a satisfactory level to preserve brand equity and customer loyalty. This issue has prompted a flurry of activities in the area of server-initiated caching and replication.

One emerging technology in this area is content-aware networking, where a new generation of routers specifically designed to address the unique requirements of Web traffic is used. These content-based routers have the ability to route traffic flows based on some attribute of the content being requested such as URL or cookie values. The previous generation of routers that routed traffic flows based on parameters such as destination IP, protocol ID, and transport port number cannot differentiate, for example, between a CGI script request and a streaming audio request. However, these two requests have very different quality of service (QoS) requirements. Content-based routers, on the other hand, provide flexibility in defining policies for prioritizing traffic and for balancing load and content among servers so that service level agreements can be met.

One way of deploying content networking technologies is to interpose a content router between a client and the web server or server cluster. In this approach [18], the content router is used as a load balancer for the back-end servers and the content router should be all knowing to make optimal decisions. This approach can be generalized to arrive at the second approach [20], where name servers may be used to select the most appropriate server site depending on the geographical locations of the clients, servers, and network and server load status. The first approach has the disadvantage of lack of scalability and a single point of failure and the second approach is inefficient in handling portions of a site.

In this paper, we present a wide-area content-based routing architecture. The routers perform content-based routing by examining the content carried by the traffic flow. In the proposed architecture, we describe a novel content derived tagging scheme where content is examined only once and the subsequent forwarding operations are based on the tag prepended at the ingress router. We perform simulation studies that use several metrics to compare the performance of the proposed architecture with a DNS-based content access scheme, which is the most popular content access scheme on the Internet [8].

The rest of the paper is organized as follows: Section 2 presents the overall architecture for our content-based routing technique and the content characterization and classification schemes we use to create the tags. The simulation results are presented and discussed in Section 3 and Section 4 presents related work in the field.

2. Overall Architecture

This section describes our content-based routing architecture called the protocol independent content switching (PICS). In PICS, client and server sites are interconnected through an overlay network called virtual content network
Figure 2. Content tagging process in a CER. (Recovered step labels: 2. Check binding information; 3. Select content tag for request; 4. Select the nexthop.)

Figure 4. (a) IP resolution scheme (b) Tag resolution scheme. (Recovered labels: Server selection; Server set selection; Path selection.)
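The tagging steps recovered from Figure 2 (check binding information, select a content tag for the request, select the nexthop) can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `IngressTagger`, the use of URL prefixes as content descriptors, the longest-prefix rule, and the dictionary packet layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IngressTagger:
    """Hypothetical stand-in for a CER; names and data layout are assumptions."""
    # Content-to-tag binding information: URL prefix -> tag (assumed representation).
    bindings: dict = field(default_factory=dict)
    # Content-based routing table: tag -> next hop.
    routing_table: dict = field(default_factory=dict)

    def select_tag(self, url):
        # Check binding information; the longest matching prefix wins (assumption).
        matches = [prefix for prefix in self.bindings if url.startswith(prefix)]
        if not matches:
            return None
        return self.bindings[max(matches, key=len)]

    def tag_request(self, url, payload):
        # The content is examined once, here at the ingress point.
        tag = self.select_tag(url)
        if tag is None:
            raise LookupError(f"no content-to-tag binding for {url!r}")
        # Select the nexthop for the chosen tag.
        nexthop = self.routing_table[tag]
        # Prepend the tag so that later hops can forward on the tag alone.
        packet = {"tag": tag, "payload": payload}
        return nexthop, packet


cer = IngressTagger(
    bindings={"/": 1, "/video/": 7},
    routing_table={1: "csr-b", 7: "csr-a"},
)
nexthop, packet = cer.tag_request("/video/clip.mpg", b"GET /video/clip.mpg")
```

In this sketch a request for `/video/clip.mpg` matches both prefixes, and the more specific `/video/` binding determines the tag and therefore the next hop.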
Similar to the CERs, the CSRs also maintain content-based routing tables. Unlike CERs, the CSRs do not have any content-to-tag binding information. The CSRs assume that the appropriate tags are already created by the CERs. The CSRs use the content-based routing tables to steer the requests toward the appropriate server side CER. Fig. 3

(Figure residue, caption not recovered. CSR forwarding of an encapsulated packet: 1. Read Content Header; 2. Select Next Hop from the Content Routing Table, with columns Tag and Next Hop; 3. Forward Packet.)

(Figure residue, caption not recovered. Flowchart labels include: client request; CER examines the request; find the tag from FIB and encapsulate content request; find the nexthop from routing table; forward encapsulated packet to nexthop; is nexthop a CSR; is nexthop a CER; nexthop is the egress CER; decapsulate the request packet; forward to hosting server; hosting server deliver content to client along the same path traversed by the request.)
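The CSR behaviour described above (read the tag from the content header, look it up in the content routing table, forward to the next hop, with no content-to-tag binding consulted) can be sketched as follows. The function names, the dictionary packet layout, the node names, and the hop limit are illustrative assumptions, not details from the paper.

```python
def csr_next_hop(packet, routing_table):
    """One CSR forwarding decision: the tag is read, the content is never re-examined."""
    tag = packet["tag"]          # 1. read content header
    return routing_table[tag]    # 2. select next hop from the tag -> next hop table


def route(packet, start, tables, egress_cers, max_hops=16):
    """Walk the overlay from `start` until a server-side CER is reached.

    `tables` maps each CSR name to its own content routing table.
    `max_hops` is an assumed guard against loops in a misconfigured overlay.
    """
    path = [start]
    node = start
    while node not in egress_cers:
        if len(path) > max_hops:
            raise RuntimeError("hop limit exceeded; routing tables may contain a loop")
        node = csr_next_hop(packet, tables[node])  # 3. forward packet
        path.append(node)
    return path


# Hypothetical two-CSR overlay leading to one server-side CER.
tables = {
    "csr1": {7: "csr2"},
    "csr2": {7: "cer-out"},
}
path = route({"tag": 7, "payload": b"GET /video/clip.mpg"}, "csr1", tables, {"cer-out"})
```

Because every hop keys on the tag alone, the per-hop work is a single table lookup, which is the motivation for examining the content only once at the ingress.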
A content accessing scheme essentially implements two major functions: (a) server selection: selecting a server site and a server within the site that can serve the requested document and (b) path selection: selecting the path along which the selected server delivers the content to the clients. Most of the existing content accessing approaches execute these functions in two different phases. In the first phase,

2.5. Content Characterization and Content Classification for Content-based Routing

Content characterization is the process that identifies the key attributes of content which can be used to generate an accurate description of the content and its resource requirements from the perspective of content-based routing and
Figure 6. Resolution time with 25% replication and 100,000 requests. (x-axis: Number of Nodes, at 50, 150, 250, and 500; series: DNS Site-based, PICS Site-based, PICS Set-based.)

Figure 7. Content delivery time with 25% replication and 100,000 requests. (x-axis: Number of Nodes, at 50, 150, 250, and 500; series: DNS Site-based, PICS Site-based, PICS Set-based.)

(Figure residue, caption not recovered. x-axis: Number of Nodes, at 50, 150, 250, and 500; series: DNS Site-based, PICS Site-based, PICS Set-based.)

Currently, the CSRs use the routing fractions to load balance among the different candidate server replicas. One of the side effects of this process that is observed through experiments is the increased link utilization. Any large-scale system such as the PICS that uses status information has to handle the staleness in the available information. We are currently investigating mechanisms for interpreting the staleness in the information. Further, the basic PICS framework we present here can be extended to: (a) load balance only resource intensive requests (large objects or streaming media) and concentrate requests for small objects and (b) prior-

they are able to exploit the multiple paths or path segments more effectively than a DNS based access scheme. Moreover, in PICS, the availability of multiple paths is considered in conjunction with the server placements, which further increases the performance.

Experiments were conducted to examine the impact of streaming content. We model streaming content by injecting persistent requests. These requests are routed using the PICS mechanisms but the route is “pinned” to the initial configuration for the life of the connection. Fig. 9 shows the variation of the overall network throughput with the percent of streaming requests in the request pool. It can be observed that the throughput increases by about 15% to 20% with the increase in streaming content. We also observed that about 11% of the streaming content requests were rejected due to the unavailability of capacity for routing the requests.

Figure 9. Variation of the overall throughput with percentage of streaming requests for 25% set-based replication and 100,000 requests. (x-axis: Request for Streaming Content (%), at 0, 1, 10, and 25; series: 50 Nodes, 150 Nodes, 250 Nodes, 500 Nodes.)

4. Literature Review

Content routing involves both locating and accessing content. Locating content may include content discovery on the network. Accessing content typically requires identifying a network path with the desired quality of service parameters and setting up sufficient resources along the path. Existing approaches (e.g., domain name based routing) have devised separate schemes for locating and accessing content. In this section, we examine a representative set of projects from the related literature on content-based routing approaches. These approaches can be divided into two classes. The first class represents highly distributed networked naming systems. These systems use some performance-based metrics to resolve a high-level content-based name to a machine name that will lead to the location of the content. The second class represents systems that “route” actual content requests considering network and server status. Currently, there are only a few systems that fall into this category. These include scalable Web server clusters that are formed by interposing content switches between a cluster of high-performance servers and the clients.

4.1. Class I: Name-based Routing Models

The Intentional Naming System (INS) [19] presents a resource discovery and routing system based on the description of the services. Applications create and advertise intentional names for each service they provide through an overlay network, of spanning tree topology, comprising Intentional Name Resolvers (INRs). The INRs locally cache the advertisements comprising the intentional names of the services and the IP addresses of the corresponding service providers. Any client requiring service will probe the INR using an intentional name of the service. The INR returns the IP addresses that correspond to the intentional name (called early binding) or tunnels the data received from the client to the location with the least routing metric (e.g., server status), called intentional anycast, depending on the service option required by the client. The intentional names [...] value pairs are used to identify different services and a unique identifier is used to identify the node advertising the service.

The Name Based Routing Protocol (NBRP) [13] implements a highly scalable Internet-wide name resolver that takes into consideration the current “performance” level of the target servers in performing the name resolution. This technique is similar to the current two phased data access paradigm in the Internet. The NBRP aggregates content names based on how the domains are connected to the Internet, i.e., an ISP content router may represent the content originating from the servers that are being served by the ISP. Depending on the replication patterns of a document, it could be part of several aggregates. An Internet name resolution protocol (INRP) is used to translate a client request to an INRP request which is then used to route the request to the best performing content server through a network of content routers. When a request reaches a content router that is nearest to the best performing content server, the router returns the address of the server back to the requesting client using the same path as the INRP request.

4.2. Class II: Content Routing Models

The Content Addressable Network (CAN) [17] presents a highly distributed hash table architecture that can be implemented on the Internet. The main idea is to create a virtual coordinate space of d dimensions and hash a key onto this space. The virtual coordinate space is divided into subspaces and one or more CAN nodes are responsible for maintaining the information for their associated subspaces. Collectively, the CAN nodes form an overlay network that spans the coordinate space. A key is hashed to find the destination coordinate position in the virtual space. Using this position, the subspace and the corresponding CAN nodes that hold the relevant “value” are identified. Request packets are then sent towards the destination CAN nodes. A binning scheme described in [16] can be used to create the coordinate space. The binning technique is a scalable and distributed method which can also be used to perform server selection in the Internet.

PAST [7] is a persistent P2P storage utility that uses a hypercube based routing scheme, called Pastry, to route contents. The Pastry system [2] is an overlay network of nodes where each node is assigned a randomly generated 128-bit identifier to denote the node’s position in a circular nodeid space ranging from 0 to 2^128 - 1. Given a message with a key, a Pastry node routes the message to a node whose identification number is numerically closest to the key in O(log N) steps, where N is the number of Pastry nodes in the net-