Neutrino – Database Replication in Mobile Network

Literature Survey by,


• Bala Krishnan.L.M.
• Karthik C
• Kumaran V
• Senthil Kumar V



Neutrino – Replication for Mobile Networks


Replication:
Replication is a technique for improving the quality of distributed services. In the
past few years, it has been increasingly applied to Web services, notably for hosting Web sites.
In such cases, replication involves creating copies of a site’s Web documents, and placing these
document copies at well-chosen locations. In addition, various measures are taken to ensure
(possibly different levels of) consistency when a replicated document is updated. Finally, effort
is put into redirecting a client to a server hosting a document copy, such that the client is
optimally served. Replication can lead to reduced client latency and network traffic by
redirecting client requests to a replica closest to that client. It can also improve the availability of
the system, as the failure of one replica does not result in an entire service outage.

Parameters of Replication:
In metric determination, we address the question of how to find and estimate the metrics
required by different components of the system. Metric determination is the problem of
estimating the value of the objective function parameters. We discuss two important issues
related to metric estimation that need to be addressed to build a good replica hosting system. The
first issue is metric identification: the process of identifying the metrics that constitute the
objective function the system aims to optimize. For example, a system might want to minimize
client latency to attract more customers, or might want to minimize the cost of replication. The
other important issue is the process of metric estimation. This involves the design of mechanisms
and services related to estimation or measurement of metrics in a scalable manner. As a concrete
example, measuring client latency to every client is generally not scalable. In this case, we need
to group clients into clusters and measure client-related metrics on a per-cluster basis instead of
on a per-client basis (this process of grouping clients is called client clustering). In general, the
metric estimation component measures various metrics needed by other components of the
replica hosting system.
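
As a concrete illustration of per-cluster estimation, the following minimal Java sketch (class and method names are invented for this example) groups clients by IPv4 network prefix and aggregates latency samples per cluster rather than per client:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical helper: latency samples are aggregated per client cluster,
    // where a cluster is approximated by the client's /24 IPv4 prefix.
    public class ClusterLatencyEstimator {
        private final Map<String, long[]> stats = new HashMap<>(); // prefix -> {count, totalMs}

        private static String clusterOf(String clientIp) {
            // "192.168.1.7" -> "192.168.1" (assumes IPv4 dotted notation)
            return clientIp.substring(0, clientIp.lastIndexOf('.'));
        }

        public void record(String clientIp, long latencyMs) {
            long[] s = stats.computeIfAbsent(clusterOf(clientIp), k -> new long[2]);
            s[0]++;
            s[1] += latencyMs;
        }

        // Estimated latency for any client in the same cluster; -1 if unknown.
        public double estimate(String clientIp) {
            long[] s = stats.get(clusterOf(clientIp));
            return (s == null || s[0] == 0) ? -1 : (double) s[1] / s[0];
        }
    }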
Adaptation triggering addresses the question of when to adjust or adapt the system
configuration. Consider a flash crowd causing poor client latency. The system must identify such
a situation and react, for example, by increasing the number of replicas to handle the increase in
the number of requests. Similarly, congestion in a network where a replica is hosted can result in
poor accessibility of that replica. The system must identify such a situation and possibly move
that replica to another server. The adaptation-triggering mechanisms do not form an input
parameter of the objective function. Instead, they form the heart of the feedback element, and
thus indirectly control and maintain the system in an acceptable state.
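
One plausible realization of such a trigger is a simple threshold check over the measured metrics; the limits and the addReplica() action below are assumptions made for illustration, not part of any particular system:

    // Sketch of threshold-based adaptation triggering: when the observed request
    // rate or average client latency exceeds its limit, the system reacts by
    // provisioning an extra replica (stubbed here).
    public class AdaptationTrigger {
        private final double maxRequestsPerSec;
        private final double maxLatencyMs;

        public AdaptationTrigger(double maxRequestsPerSec, double maxLatencyMs) {
            this.maxRequestsPerSec = maxRequestsPerSec;
            this.maxLatencyMs = maxLatencyMs;
        }

        // Called periodically with fresh measurements from the metric service.
        public void evaluate(double requestsPerSec, double avgLatencyMs) {
            if (requestsPerSec > maxRequestsPerSec || avgLatencyMs > maxLatencyMs) {
                addReplica(); // flash crowd or congestion detected
            }
        }

        private void addReplica() {
            System.out.println("Adaptation triggered: provisioning an extra replica");
        }
    }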
With replica placement we address the question of where to place replicas. This issue
mainly concerns two problems: selection of locations to install replica servers that can host
replicas (replica server placement) and selection of replica servers to host replicas of a given
object (replica content placement). The server placement problem must be addressed during the
initial infrastructure installation and during upgrades to the hosting infrastructure. The replica
content placement algorithms are executed to ensure that content placement results in an
acceptable value of the objective function, given a set of replica servers. Replica placement components use metric
estimation services to get the value of metrics required by their placement algorithms. Both
replica server placement and replica content placement form controllable input parameters of the
objective function.
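
As a sketch of one well-known content-placement heuristic (a greedy algorithm, not necessarily the one any given system uses), the code below repeatedly picks the candidate server that most reduces the total client-to-replica distance; the distance matrix would come from the metric estimation services:

    import java.util.ArrayList;
    import java.util.List;

    // Greedy placement sketch: dist[c][s] is the distance from client cluster c
    // to candidate server s (measured values are assumed to be supplied).
    public class GreedyPlacement {
        public static List<Integer> place(double[][] dist, int replicas) {
            int servers = dist[0].length;
            List<Integer> chosen = new ArrayList<>();
            double[] best = new double[dist.length]; // best distance per cluster so far
            java.util.Arrays.fill(best, Double.MAX_VALUE);
            for (int r = 0; r < replicas; r++) {
                int argmin = -1;
                double lowest = Double.MAX_VALUE;
                for (int s = 0; s < servers; s++) {
                    if (chosen.contains(s)) continue;
                    double total = 0;
                    for (int c = 0; c < dist.length; c++)
                        total += Math.min(best[c], dist[c][s]); // cost if s is added
                    if (total < lowest) { lowest = total; argmin = s; }
                }
                chosen.add(argmin);
                for (int c = 0; c < dist.length; c++)
                    best[c] = Math.min(best[c], dist[c][argmin]);
            }
            return chosen;
        }
    }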
With consistency enforcement we consider how to keep the replicas of a given object
consistent. Maintaining consistency among replicas adds overhead to the system, particularly
when the application requires strong consistency (meaning clients are intolerant to stale data) and
the number of replicas is large. The problem of consistency enforcement is defined as follows.
Given certain application consistency requirements, we must decide what consistency models,
consistency policies and content distribution mechanisms can meet these requirements. A
consistency model dictates the consistency-related properties of content delivered by the system
to its clients. These models define consistency properties of objects based on time, value, or the
order of transactions executed on the object. A consistency model is usually adopted by
consistency policies, which define how, when, and which content distribution mechanisms must
be applied. The content distribution mechanisms specify the protocols by which replica servers
exchange updates. For example, a system can adopt a time-based consistency model and employ
a policy where it guarantees its clients that it will never serve a replica that is more than an hour
older than the most recent state of the object. This policy can be enforced by different
mechanisms.
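
A minimal sketch of such a time-based policy follows; the class, the one-hour bound and the stubbed refresh from the origin server are illustrative assumptions:

    import java.time.Duration;
    import java.time.Instant;

    // A replica may be served only while it is less than maxAge older than its
    // last refresh; otherwise the latest state is pulled from the origin first.
    public class TimeBasedReplica {
        private final Duration maxAge = Duration.ofHours(1);
        private Instant lastRefreshed = Instant.now();
        private String content = "";

        public String read() {
            if (Duration.between(lastRefreshed, Instant.now()).compareTo(maxAge) > 0) {
                refresh(); // stale beyond the guarantee
            }
            return content;
        }

        private void refresh() {
            // fetch the authoritative state from the origin server (stubbed here)
            content = "latest state";
            lastRefreshed = Instant.now();
        }
    }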
Request routing is about deciding how to direct clients to the replicas they need. We
choose from a variety of redirection policies and redirection mechanisms. Whereas the
mechanisms provide a method for informing clients about replica locations, the policies are
responsible for determining which replica must serve a client. The request routing problem is
complementary to the placement problem, as the assumptions made when solving the latter are
implemented by the former. For example, we can place replica servers close to our clients,
assuming that the redirection policy directs the clients to their nearby replica servers. However,
deliberately drifting away from these assumptions can sometimes help in optimizing the
objective function. For example, we may decide to direct some client requests to more distant
replica servers to offload the client-closest one. Therefore, we treat request routing as one of the
(controllable) objective function parameters.
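
The sketch below illustrates such a redirection policy: prefer the client-closest replica, but deliberately redirect to the next-closest one when the closest is overloaded. All names here are hypothetical:

    import java.util.List;

    // Redirection policy sketch: choose the closest replica whose load is
    // acceptable; if all are overloaded, fall back to the closest overall.
    public class RedirectionPolicy {
        public record Replica(String host, double distance, double load) {}

        public static Replica select(List<Replica> replicas, double maxLoad) {
            Replica best = null, fallback = null;
            for (Replica r : replicas) {
                if (fallback == null || r.distance() < fallback.distance()) fallback = r;
                if (r.load() <= maxLoad && (best == null || r.distance() < best.distance())) best = r;
            }
            return best != null ? best : fallback; // offload the closest if needed
        }
    }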

Wireless Network Technology:


Wi-Fi, also unofficially known as Wireless Fidelity, is a wireless technology brand
owned by the Wi-Fi Alliance intended to improve the interoperability of wireless local area
network products based on the IEEE 802.11 standards. Common applications for Wi-Fi include
Internet and VoIP phone access, gaming, and network connectivity for consumer electronics
such as televisions, DVD players, and digital cameras.
Definition:
Wi-Fi Alliance is a consortium of separate and independent companies agreeing to a set
of common interoperable products based on the family of IEEE 802.11 standards. Wi-Fi certifies
products via a set of established test procedures to establish interoperability. Those
manufacturers that are members of Wi-Fi Alliance whose products pass these interoperability
tests can mark their products and product packaging with the Wi-Fi logo.
Wi-Fi Technical Information
According to the brand style guide of the Wi-Fi Alliance (the owner of the Wi-Fi brand):
Products which successfully pass the Wi-Fi Alliance testing may use the Wi-Fi CERTIFIED
brand. The Alliance tests and certifies the interoperability of wireless LAN products based on the
IEEE 802.11 standards. Studies show that 88% of consumers prefer products that have been
tested by an independent organization. Wi-Fi technologies have gone through several generations
since their inception in 1997. Wi-Fi is supported to different extents under Microsoft Windows,
Apple Macintosh and open source Unix and Linux operating systems. Contrary to popular belief,
Wi-Fi is not an abbreviation for "Wireless Fidelity".
Uses:
A Wi-Fi enabled device such as a PC, game console, cell phone, MP3 player or PDA can
connect to the Internet when within range of a wireless network connected to the Internet. The
area covered by one or more interconnected access points is called a hotspot. Hotspots can cover
as little as a single room with wireless-opaque walls or as much as many square miles covered by
overlapping access points. Wi-Fi can also be used to create a mesh network. Both architectures
are used in community networks.
Wi-Fi also allows connectivity in peer-to-peer (wireless ad-hoc network) mode, which
enables devices to connect directly with each other. This connectivity mode is useful in
consumer electronics and gaming applications.
When the technology was first commercialized there were many problems because
consumers could not be sure that products from different vendors would work together. The Wi-
Fi Alliance began as a community to solve this issue so as to address the needs of the end user
and allow the technology to mature. The Alliance created the branding Wi-Fi CERTIFIED to
show consumers that products are interoperable with other products displaying the same
branding.
Many consumer devices use Wi-Fi. Amongst others, personal computers can network to
each other and connect to the Internet, mobile computers can connect to the Internet from any
Wi-Fi hotspot, and digital cameras can transfer images wirelessly.
Routers which incorporate a DSL or cable modem and a Wi-Fi access point are often used in
homes and other premises, and provide Internet access and internetworking to all devices
connected wirelessly or by cable into them. Devices supporting Wi-Fi can also be connected in
ad-hoc mode for client-to-client connections without a router.
Business and industrial Wi-Fi is widespread as of 2007. In business environments,
increasing the number of Wi-Fi access points provides redundancy, support for fast roaming and
increased overall network capacity by using more channels or creating smaller cells. Wi-Fi
enables wireless voice applications (VoWLAN or WVOIP). Over the years, Wi-Fi
implementations have moved toward 'thin' access points, with more of the network intelligence
housed in a centralized network appliance, relegating individual Access Points to be simply
'dumb' radios. Outdoor applications may utilize true mesh topologies. As of 2007 Wi-Fi
installations can provide a secure computer networking gateway, firewall, DHCP server,
intrusion detection system, and other functions.
In addition to restricted use in homes and offices, Wi-Fi is publicly available at Wi-Fi
hotspots provided either free of charge or to subscribers to various providers. Free hotspots are
often provided by businesses such as hotels, restaurants, and airports who offer the service to
attract or assist clients. Sometimes free Wi-Fi is provided by enthusiasts, or by organizations or
authorities who wish to promote business in their area. Metropolitan-wide Wi-Fi (Mu-Fi) already
has more than 300 projects in process.



Advantages of Wi-Fi:
Wi-Fi allows LANs to be deployed without cabling for client devices, typically reducing
the costs of network deployment and expansion. Spaces where cables cannot be run, such as
outdoor areas and historical buildings, can host wireless LANs.
As of 2007 wireless network adapters are built into most modern laptops. The price of chipsets
for Wi-Fi continues to drop, making it an economical networking option included in ever more
devices. Wi-Fi has become widespread in corporate infrastructures, which also helps with the
deployment of RFID technology that can piggyback on Wi-Fi.
Different competitive brands of access points and client network interfaces are inter-
operable at a basic level of service. Products designated as "Wi-Fi Certified" by the Wi-Fi
Alliance are backwards inter-operable. Wi-Fi is a global set of standards. Unlike mobile
telephones, any standard Wi-Fi device will work anywhere in the world.
Wi-Fi is widely available in more than 250,000 public hotspots and tens of millions of
homes and corporate and university campuses worldwide. WPA is not easily cracked if strong
passwords are used and WPA2 encryption has no known weaknesses. New protocols for Quality
of Service (WMM) make Wi-Fi more suitable for latency-sensitive applications (such as voice
and video), and power saving mechanisms (WMM Power Save) improve battery operation.

Disadvantages of Wi-Fi:
Spectrum assignments and operational limitations are not consistent worldwide. Most of
Europe allows two additional channels beyond those permitted in the U.S. for the 2.4 GHz band
(1-13 vs. 1-11); Japan has one more on top of that (1-14). Europe, as of 2007, is now
essentially homogeneous in this respect. A very confusing aspect is the fact that a Wi-Fi signal
actually occupies five channels in the 2.4 GHz band, resulting in only three non-overlapping
channels in the US (1, 6, 11) and four in Europe (1, 5, 9, 13).
Some countries, such as Italy, formerly required a 'general authorization' for any Wi-Fi
used outside an operator's own premises, or required something akin to an operator registration.
Equivalent isotropically radiated power (EIRP) in the EU is limited to 20 dBm (0.1 W).
Power consumption is fairly high compared to some other low-bandwidth standards, such as
Zigbee and Bluetooth, making battery life a concern.
The most common wireless encryption standard, Wired Equivalent Privacy or WEP, has
been shown to be easily breakable even when correctly configured. Wi-Fi Protected Access
(WPA and WPA2), which began shipping in 2003, aims to solve this problem and is now
available on most products. Wi-Fi access points typically default to an open (encryption-free)
mode. A novice user benefits from a zero-configuration device that works out of the box, but this
default provides open wireless access to the LAN with no security enabled. To turn security
on requires the user to configure the device, usually via a software graphical user interface
(GUI). Wi-Fi networks that are open (unencrypted) can be monitored and used to read and copy
data (including personal information) transmitted over the network, unless another security
method is used to secure the data, such as a VPN or a secure web page. (See HTTPS/Secure
Socket Layer.)
Many 2.4 GHz 802.11b and 802.11g Access points default to the same channel on initial
startup, contributing to congestion on certain channels. To change the channel of operation for an
access point requires the user to configure the device.
Wi-Fi networks have limited range. A typical Wi-Fi home router using 802.11b or
802.11g with a stock antenna might have a range of 32 m (120 ft) indoors and 95 m (300 ft)
outdoors. Range also varies with frequency band. Wi-Fi in the 2.4 GHz frequency block has
slightly better range than Wi-Fi in the 5 GHz frequency block. Outdoor range with improved
(directional) antennas can be several kilometers or more with line-of-sight.
Wi-Fi pollution, that is, an excessive number of access points in an area, especially on the same
or a neighboring channel, can prevent access and interfere with the use of other access points,
owing to overlapping channels in the 802.11b/g spectrum and a decreased signal-to-noise ratio
(SNR) between access points. This can be a problem in high-density areas,
such as large apartment complexes or office buildings with many Wi-Fi access points.
Additionally, other devices use the 2.4 GHz band: microwave ovens, cordless phones, baby
monitors, security cameras, and Bluetooth devices can cause significant additional interference.
General guidance to those who suffer these forms of interference or network crowding is to
migrate to a 5 GHz Wi-Fi product (802.11a), usually a dual-band product, as the 5 GHz band is
relatively unused and many more channels are available. This also requires users to set
the 5 GHz band as the preferred network in the client and to give each network band a
different name (SSID).
It is also an issue when municipalities, or other large entities such as universities, seek to
provide large-area coverage. Without 802.11e/WMM, every user of the band is treated equally
under the base standard. This openness is also important to the success and widespread use of
2.4 GHz Wi-Fi, but it makes the band unsuitable for "must-have" public service functions or
situations where reliability is required.
Interoperability issues between brands, or proprietary deviations from the standard, can disrupt
connections or lower throughput speeds on other users' devices within range.
Additionally, as of 2007, Wi-Fi devices do not automatically pick channels to avoid interference.
Standard devices
Wireless access points connect a group of wireless devices to an adjacent wired LAN. An
access point is similar to an Ethernet hub, relaying data between connected wireless devices in
addition to a (usually) single connected wired device, most often an Ethernet hub or switch,
allowing wireless devices to communicate with other wired devices.
Wireless adapters allow devices to connect to a wireless network. These adapters connect
to devices using various external or internal interconnects such as PCI, Mini PCI, USB,
ExpressCard, CardBus and PC Card. Most new laptop computers are equipped with internal adapters.
Internal cards are generally more difficult to install.
Wireless routers integrate a wireless access point, an Ethernet switch, and internal router
firmware that provides IP routing, NAT, and DNS forwarding through an integrated WAN
interface. A wireless router allows wired and wireless Ethernet LAN devices to connect to a
(usually) single WAN device such as a cable modem or DSL modem. A wireless router allows all
three components to be configured through one central utility.
This utility is most usually an integrated web server which serves web pages to wired and
wireless LAN clients and often optionally to WAN clients. This utility may also be an
application that is run on a desktop computer such as Apple's AirPort.
Wireless Ethernet bridges connect a wired network to a wireless network. This is
different from an access point in the sense that an access point connects wireless devices to a
wired network at the data-link layer. Two wireless bridges may be used to connect two wired
networks over a wireless link, useful in situations where a wired connection may be unavailable,
such as between two separate homes.



Wireless range extenders or wireless repeaters can extend the range of an existing
wireless network. Range extenders can be strategically placed to elongate a signal area or allow
for the signal area to reach around barriers such as those created in L-shaped corridors. Wireless
devices connected through repeaters will suffer from an increased latency for each hop.
Additionally, a wireless device at the end of chain of wireless repeaters will have a throughput
that is limited by the weakest link within the repeater chain.
Most commercial devices (routers, access points, bridges, repeaters) designed for home
or business environments use either RP-SMA or RP-TNC antenna connectors. PCI wireless
adapters also mainly use RP-SMA connectors. Most PC card and USB wireless only have
internal antennas etched on their printed circuit board while some have MMCX connector or
MC-Card external connections in addition to an internal antenna. A few USB cards have an RP-
SMA connector. Most Mini PCI wireless cards utilize Hirose U.FL connectors, but cards found
in various wireless appliances contain all of the connectors listed. Many high-gain (and
homebuilt) antennas utilize the Type N connector more commonly used by other radio
communications methods.
Non-standard devices
USB Wi-Fi adapters, food-container "cantennas", parabolic reflectors, and many other
types of self-built antennas are increasingly made by do-it-yourselfers. For minimal budgets, as
low as a few dollars, signal strength and range can be improved dramatically.
As of 2007, Long Range Wi-Fi kits have begun to enter the market. Companies like
BroadbandXpress offer long range, inexpensive kits that can be set up with limited knowledge.
These kits utilize specialized antennas which increase the range of Wi-Fi dramatically, up to the
world record 137.2 miles (220 km). These kits are commonly used to get broadband internet to a
place without direct broadband access.
The longest link ever achieved was by the Swedish Space Agency. They attained 310 km,
but used 6-watt amplifiers to reach an overhead stratospheric balloon. The longest link without
amplification was 279 km, in Venezuela in 2006.

Wireless LAN:
A wireless LAN or WLAN is a wireless local area network, which is the linking of two or
more computers without using wires. WLAN utilizes spread-spectrum or OFDM (802.11a)
modulation technology based on radio waves to enable communication between devices in a
limited area, also known as the basic service set. This gives users the mobility to move around
within a broad coverage area and still be connected to the network.
For the home user, wireless has become popular due to ease of installation and location freedom,
given the growing popularity of laptops. Public businesses such as coffee shops or malls have
begun to offer wireless access to their customers; some are even provided as a free service. Large
wireless network projects are being put up in many major cities. Google is even providing a free
service to Mountain View, California and has entered a bid to do the same for San Francisco.
New York City has also begun a pilot program to cover all five boroughs of the city with
wireless Internet access.
In 1970, the University of Hawaii, under the leadership of Norman Abramson, developed the
world’s first computer communication network, named ALOHAnet, using low-cost ham-like radios. The bi-
directional star topology of the system included seven computers deployed over four islands to
communicate with the central computer on Oahu Island without using phone lines. "In 1979,
F.R. Gfeller and U. Bapst published a paper in the IEEE Proceedings reporting an experimental
wireless local area network using diffused infrared communications. Shortly thereafter, in 1980,
P. Ferrert reported on an experimental application of a single code spread spectrum radio for
wireless terminal communications in the IEEE National Telecommunications Conference. In
1984, a comparison between Infrared and CDMA spread spectrum communications for wireless
office information networks was published by Kaveh Pahlavan in IEEE Computer Networking
Symposium which appeared later in the IEEE Communication Society Magazine. In May 1985,
the efforts of Marcus led the FCC to announce experimental ISM bands for commercial
application of spread spectrum technology. Later on, M. Kavehrad reported on an experimental
wireless PBX system using code division multiple access. These efforts prompted significant
industrial activities in the development of a new generation of wireless local area networks and it
updated several old discussions in the portable and mobile radio industry."
The first generation of wireless data modems was developed in the early 1980s by
amateur communication groups. They added a voice-band data communication modem, with data
rates below 9600 bit/s, to an existing short-distance radio system such as a walkie-talkie. The
second generation of wireless modems was developed immediately after the FCC announcement
in the experimental bands for non-military use of the spread spectrum technology. These
modems provided data rates on the order of hundreds of Kbit/s. The third generation of wireless
modem then aimed at compatibility with the existing LANs with data rates on the order of
Mbit/s. Several companies developed the third generation products with data rates above 1
Mbit/s and a couple of products had already been announced.
"The first of the IEEE Workshops on Wireless LAN was held in 1991. At that time early
wireless LAN products had just appeared in the market and the IEEE 802.11 committee had just
started its activities to develop a standard for wireless LANs. The focus of that first workshop
was evaluation of the alternative technologies. The technology was relatively mature, a variety of
applications had been identified and addressed and technologies that enable these applications
were well understood. Chip sets aimed at wireless LAN implementations and applications, a key
enabling technology for rapid market growth, were emerging in the market. Wireless LANs were
being used in hospitals, stock exchanges, and other in building and campus settings for nomadic
access, point-to-point LAN bridges, ad-hoc networking, and even larger applications through
internetworking. The IEEE 802.11 standard and variants and alternatives, such as the wireless
LAN interoperability forum and the European HIPERLAN specification had made rapid
progress, and the unlicensed PCS and the proposed SUPERNet bands also presented new
opportunities."
On July 21, 1999, AirPort debuted at the Macworld Expo in New York City with Steve
Jobs picking up an iBook supposedly to give the cameraman a better shot as he surfed the Web.
Applause quickly built as people realized there were no wires. This was the first time Wireless
LAN became publicly available at consumer pricing and easily available for home use. Before
the release of AirPort, wireless LAN was too expensive for consumer use and was used
exclusively in large corporate settings.
Originally WLAN hardware was so expensive that it was only used as an alternative to
cabled LAN in places where cabling was difficult or impossible. Early development included
industry-specific solutions and proprietary protocols, but at the end of the 1990s these were
replaced by standards, primarily the various versions of IEEE 802.11 (Wi-Fi). An alternative
ATM-like 5 GHz standardized technology, HIPERLAN, has so far not succeeded in the market,
and with the release of the faster 54 Mbit/s 802.11a (5 GHz) and 802.11g (2.4 GHz) standards,
almost certainly never will.



In November 2006, the Australian Commonwealth Scientific and Industrial Research
Organization (CSIRO) won a legal battle in the US federal court of Texas against Buffalo
Technology which found the US manufacturer had failed to pay royalties on a US WLAN patent
CSIRO had filed in 1996. CSIRO are currently engaged in legal cases with computer companies
including Microsoft, Intel, Dell, Hewlett-Packard and Netgear which argue that the patent is
invalid and should negate any royalties paid to CSIRO for WLAN-based products.
Benefits:
The popularity of wireless LANs is a testament primarily to their convenience, cost
efficiency, and ease of integration with other networks and network components. The majority of
computers sold to consumers today come pre-equipped with all necessary wireless LAN
technology.
The benefits of wireless LANs include:
• Convenience: The wireless nature of such networks allows users to access network
resources from nearly any convenient location within their primary networking
environment (home or office). With the increasing saturation of laptop-style computers,
this is particularly relevant.
• Mobility: With the emergence of public wireless networks, users can access the internet
even outside their normal work environment. Most chain coffee shops, for example, offer
their customers a wireless connection to the internet at little or no cost.
• Productivity: Users connected to a wireless network can maintain a nearly constant
affiliation with their desired network as they move from place to place. For a business,
this implies that an employee can potentially be more productive as his or her work can
be accomplished from any convenient location.
• Deployment: Initial setup of an infrastructure-based wireless network requires little more
than a single access point. Wired networks, on the other hand, have the additional cost
and complexity of actual physical cables being run to numerous locations (which can
even be impossible for hard-to-reach locations within a building).
• Expandability: Wireless networks can serve a suddenly increased number of clients with
the existing equipment. In a wired network, additional clients would require additional
wiring.
• Cost: Wireless networking hardware is at worst a modest increase from wired
counterparts. This potentially increased cost is almost always more than outweighed by
the savings in cost and labor associated with running physical cables.
Disadvantages:
Wireless LAN technology, while replete with the conveniences and advantages described
above, has its share of drawbacks. For a given networking situation, wireless LANs may not be
desirable for a number of reasons. Most of these have to do with the inherent limitations of the
technology.
• Security: Wireless LAN transceivers are designed to serve computers throughout a
structure with uninterrupted service using radio frequencies. Because of space and cost,
the antennas typically present on wireless networking cards in the end computers are
generally relatively poor. In order to properly receive signals using such limited antennas
throughout even a modest area, the wireless LAN transceiver utilizes a fairly
considerable amount of power. What this means is that not only can the wireless packets
be intercepted by a nearby adversary's poorly-equipped computer, but more importantly,
a user willing to spend a small amount of money on a good-quality antenna can pick up
packets at a remarkable distance, perhaps hundreds of times the radius of the typical user.
In fact, there are even computer users dedicated to locating and sometimes even cracking
into wireless networks, known as wardrivers. On a wired network, any adversary would
first have to overcome the physical limitation of tapping into the actual wires, but this is
not an issue with wireless packets. To combat this threat, users of wireless networks
usually choose to utilize various encryption technologies available, such as Wi-Fi
Protected Access (WPA). Some of the older encryption methods, such as WEP, are
known to have weaknesses that a dedicated adversary can compromise. (See main article:
Wireless security.)
• Range: The typical range of a common 802.11g network with standard equipment is on
the order of tens of meters. While sufficient for a typical home, it will be insufficient in a
larger structure. To obtain additional range, repeaters or additional access points will
have to be purchased. Costs for these items can add up quickly. Other technologies are in
the development phase, however, which feature increased range, hoping to render this
disadvantage irrelevant.
• Reliability: Like any radio frequency transmission, wireless networking signals are
subject to a wide variety of interference, as well as complex propagation effects (such as
multipath, or especially in this case Rician fading) that are beyond the control of the
network administrator. In the case of typical networks, modulation is achieved by
complicated forms of phase-shift keying (PSK) or quadrature amplitude modulation
(QAM), making interference and propagation effects all the more disturbing. As a result,
important network resources such as servers are rarely connected wirelessly.
• Speed: The speed on most wireless networks (typically 1-108 Mbit/s) is reasonably slow
compared to the slowest common wired networks (100 Mbit/s up to several Gbit/s).
There are also performance issues caused by TCP and its built-in congestion avoidance.
For most users, however, this observation is irrelevant since the speed bottleneck is not in
the wireless routing but rather in the outside network connectivity itself. For example, the
maximum ADSL throughput (usually 8 Mbit/s or less) offered by telecommunications
companies to general-purpose customers is already far slower than the slowest wireless
network to which it is typically connected. That is to say, in most environments, a
wireless network running at its slowest speed is still faster than the internet connection
serving it in the first place. However, in specialized environments, the throughput of a
wired network might be necessary. Newer standards such as 802.11n are addressing this
limitation and will support peak throughputs in the range of 100-200 Mbit/s.

Architecture
Stations
All components that can connect into a wireless medium in a network are referred to as
stations. All stations are equipped with wireless network interface cards (WNICs). Wireless
stations fall into one of two categories: access points and clients.
Access points
Access points (APs) are base stations for the wireless network. They transmit and receive
radio frequencies for wireless enabled devices to communicate with.


Clients
Wireless clients can be mobile devices such as laptops, personal digital assistants, IP
phones, or fixed devices such as desktops and workstations that are equipped with a wireless
network interface.
Basic service set
The basic service set (BSS) is a set of all stations that can communicate with each other.
There are two types of BSS: independent BSS and infrastructure BSS. Every BSS has an
identification (ID) called the BSSID, which is the MAC address of the access point servicing the
BSS.
Independent basic service set
An independent BSS is an ad-hoc network that contains no access points, which means its
stations cannot connect to any other basic service set.
Infrastructure basic service set
An infrastructure BSS can communicate with other stations not in the same basic service
set by communicating through access points.
Extended service set
An extended service set (ESS) is a set of connected BSSes. Access points in an ESS are
connected by a distribution system. Each ESS has an ID called the SSID which is a 32-byte
(maximum) character string. For example, "linksys" is the default SSID for Linksys routers.
Distribution system
A distribution system connects access points in an extended service set.
Types of wireless LANs
Peer-to-Peer or ad-hoc wireless LAN
A peer-to-peer (P2P) network allows wireless devices to directly communicate with each other.
Wireless devices within range of each other can discover and communicate directly without
involving central access points. This method is typically used by two computers so that they can
connect to each other to form a network.
If a signal strength meter is used in this situation, it may not read the strength accurately and can
be misleading, because it registers the strength of the strongest signal, which may be the closest
computer.
802.11 specs define the physical layer (PHY) and MAC (Media Access Control) layers.
However, unlike most other IEEE specs, 802.11 includes three alternative PHY standards:
diffuse infrared operating at 1 Mbit/s; frequency-hopping spread spectrum operating at 1
Mbit/s or 2 Mbit/s; and direct-sequence spread spectrum operating at 1 Mbit/s or 2 Mbit/s. A
single 802.11 MAC standard is based on CSMA/CA (Carrier Sense Multiple Access with
Collision Avoidance). The 802.11 specification includes provisions designed to minimize
collisions, because two mobile units may both be in range of a common access point but not in
range of each other. 802.11 has two basic modes of operation. Ad-hoc mode enables peer-to-
peer transmission between mobile units. Infrastructure mode, in which mobile units communicate
through an access point that serves as a bridge to a wired network infrastructure, is the more
common wireless LAN application and the one covered here. Since wireless communication uses a
more open medium in comparison to wired LANs, the 802.11 designers also
included shared-key encryption mechanisms, Wired Equivalent Privacy (WEP) and Wi-Fi
Protected Access (WPA, WPA2), to secure wireless computer networks.


Bridge
A bridge can be used to connect networks, typically of different types. A wireless
Ethernet bridge allows the connection of devices on a wired Ethernet network to a wireless
network. The bridge acts as the connection point to the Wireless LAN.
Wireless Distribution System:
A Wireless Distribution System is a system that enables the interconnection of access
points wirelessly. As described in IEEE 802.11, it allows a wireless network to be expanded
using multiple access points without the need for a wired backbone to link them, as is
traditionally required.
An access point can be either a main, relay or remote base station. A main base station is
typically connected to the wired Ethernet. A relay base station relays data between remote base
stations, wireless clients or other relay stations to either a main or another relay base station. A
remote base station accepts connections from wireless clients and passes them to relay or main
stations. Connections between "clients" are made using MAC addresses rather than by specifying
IP assignments.
All base stations in a Wireless Distribution System must be configured to use the same radio
channel and to share WEP keys if they are used. They can, however, be configured with different
service set identifiers.
WDS may also be referred to as repeater mode because it appears to bridge and accept wireless
clients at the same time (unlike traditional bridging). It should be noted, however, that
throughput in this method is inversely proportional to two raised to the power of the number of
"hops", as all traffic uses the same channel. For example, client traffic going through one relay
station before it reaches the main access point will see at most half the maximum throughput that
a directly connected AP would experience and a client two hops from the directly connected AP
will see at most one quarter of the maximum throughput seen at the directly connected AP.

Content Addressable Storage:

Content-addressable storage, also referred to as associative storage or abbreviated CAS,
is a mechanism for storing information that can be retrieved based on its content, not its storage
location. It is typically used for high-speed storage and retrieval of fixed content, such as
documents stored for compliance with government regulations. Roughly speaking, content-
addressable storage is the permanent-storage analogue to content-addressable memory.
Content-addressed vs. Location-addressable
When being contrasted with content-addressed storage, a typical local or networked
storage device is referred to as location-addressable. In a location-addressable storage device,
each element of data is stored onto the physical medium, and its location recorded for later use.
The storage device often keeps a list, or directory, of these locations. When a future request is
made for a particular item, the request includes only the location (for example, path and file
names) of the data. The storage device can then use this information to locate the data on the
physical medium, and retrieve it. When new information is written into a location-addressed
device, it is simply stored in some available free space, without regard to its content. The
information at a given location can usually be altered or completely overwritten without any
special action on the part of the storage device.
In contrast, when information is stored into a CAS system, the system will record a content
address, which is an identifier uniquely and permanently linked to the information content itself.


A request to retrieve information from a CAS system must provide the content identifier, from
which the system can determine the physical location of the data and retrieve it. Because the
identifiers are based on content, any change to a data element will necessarily change its content
address. In nearly all cases, a CAS device will not permit editing information once it has been
stored. Whether it can be deleted is often controlled by a policy.
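
As a minimal illustration of the principle (not any vendor's actual interface), the Java sketch below derives a blob's content address from a SHA-256 hash of its bytes, so identical content is stored only once and any change to the data yields a new address:

    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.HexFormat;
    import java.util.Map;

    // Toy content-addressable store: the address of a blob is the hex form of a
    // cryptographic hash of its bytes.
    public class ContentStore {
        private final Map<String, byte[]> blobs = new HashMap<>();

        public String put(byte[] data) throws Exception {
            String address = HexFormat.of().formatHex(
                    MessageDigest.getInstance("SHA-256").digest(data));
            blobs.putIfAbsent(address, data.clone()); // duplicates share one copy
            return address;
        }

        public byte[] get(String address) {
            return blobs.get(address); // null if the content was never stored
        }
    }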
While the idea of content-addressed storage is not new, production-quality systems were not
readily available until roughly 2003. In mid-2004, the industry group SNIA began working with
a number of CAS providers to create standard behavior and interoperability guidelines for CAS
systems.
Pros and Cons
CAS storage works most efficiently on data that does not change often. It is of particular interest
to large organizations that must comply with document-retention laws, such as Sarbanes-Oxley.
In these corporations a large volume of documents will be stored for as much as a decade, with
no changes and infrequent access. CAS is designed to make searching for given document
content very quick, and it provides assurance that the retrieved document is identical to the
one originally stored. (If the documents were different, their content addresses would differ.) In
addition, since data is stored into a CAS system by what it contains, there is never a situation
where more than one copy of an identical document exists in storage. By definition, two identical
documents have the same content address and so point to the same storage location.
For data that changes frequently, CAS is not as efficient as location-based addressing. In these
cases, the CAS device would need to continually recompute the address of data as it was
changed, and the client systems would be forced to continually update information regarding
where a given document exists. For random access systems, a CAS would also need to handle
the possibility of two initially identical documents diverging, requiring a copy of one document
to be created on demand.
Typical Implementation
The first commercially available CAS system, EMC's Centera platform, is typical of a CAS
implementation. The system consists of a series of networked nodes, divided between storage
nodes and access nodes. The access nodes maintain a synchronized directory of content
addresses and the corresponding storage node where each address can be found. When a new
data element, or blob (Binary large object), is added, the device calculates a hash of the content
and returns this hash as the blob's content address. As mentioned above, the hash is searched for
to verify that identical content is not already present. If the content already exists, the device
does not need to perform any additional steps; the content address already points to the proper
content. Otherwise, the data is passed off to a storage node and written to the physical media.
When a content address is provided to the device, it first queries the directory for the physical
location of the specified content address. The information is then retrieved from a storage node,
and the actual hash of the data recomputed and verified. Once this is complete, the device can
supply the requested data to the client. Within the Centera system, each content address actually
represents a number of distinct data blobs, as well as optional metadata. Whenever a client adds
an additional blob to an existing content block, the system recomputes the content address.
To provide additional data security, the Centera access nodes, when no read or write operation is
in progress, constantly communicate with the storage nodes, checking the presence of at least
two copies of each blob as well as their integrity. Additionally, they can be configured to
exchange data with a different, e.g. off-site, Centera system, thereby strengthening the
precautions against accidental data loss.


IBM has another flavor of CAS, which can be software based (Tivoli Storage Manager 5.3) or
hardware based (the IBM DR550). The architecture is different in that it is based on a
hierarchical storage management (HSM) design, which provides some additional flexibility, such
as being able to support not only WORM disk but also WORM tape, and the migration of data from
WORM disk to WORM tape and vice versa. This provides additional flexibility in disaster
recovery situations, as well as the ability to reduce storage costs by moving data off disk to tape.
Another typical implementation is from iTernity. The iTernity concept is based on containers,
each addressed by its hash value. A container holds a number of fixed-content
documents; a container is not changeable, and its hash value is fixed after the write process.

Use of CAS in distributed systems:


The exploding interest in distributed hash tables suggests that Content Addressable
Storage (CAS) will be a basic facility in future computing environments. In this paper we show
how CAS can be used to improve the performance of a conventional distributed file system built on
the client-server model. NFS, AFS and Coda are examples of distributed file systems that are now
well entrenched in many computing environments. Our goal is to improve client performance in
situations where a distant file server is accessed across a slow WAN, but one or more CAS providers
that export a standardized CAS interface are located nearby on a LAN.
The concept of a recipe is central to our approach. The recipe for a file is a synopsis that
contains a list of data block identifiers; each block identifier is a cryptographic hash over the
contents of the block. Once the data blocks identified in a recipe have been obtained, they can be
combined as prescribed in the recipe to reconstruct the file. On a cache miss over a low-bandwidth
network, a client may request a file's recipe rather than its data. Often, the client may be able to
reconstruct the file from its recipe by contacting nearby CAS providers (including CAS services
available on the client) to which it has LAN access. In this usage scenario, a recipe helps
transform WAN accesses into LAN (or local client) accesses. Since a recipe is a first-class
entity, it can be used as a substitute for the file in many situations. For example, if space is
tight in a cache, files may be replaced by the corresponding recipes, which are typically much
smaller. This is preferable to evicting the file entirely from the cache because the recipe
reduces the cost of the cache miss resulting from a future reference to the file. If the client is
disconnected, the presence of the recipe may make even more of a difference, replacing an
unserviceable cache miss by reconstruction of the file.
It is important to note that our approach is opportunistic: we are not dependent on CAS for the
correct operation of the distributed system. The use of recipes does not in any way compromise
attributes such as naming, consistency, or write-sharing semantics. Indeed, the use of CAS is
completely transparent to users and applications. When reconstructing a file, some blocks may not be
available from CAS providers. In that case, those blocks must be fetched from the distant server.
Even in this situation, there is no loss of consistency or correctness. CAS providers can be
organized into peer-to-peer networks such as Chord and Pastry, but may also be impromptu systems.
For example, each desktop on a LAN could be enhanced with a daemon that provides a CAS
interface to its local disk. The CAS interface can be extremely simple; in our work, we use just
four calls Query, MultiQuery, Fetch, and MultiFetch (explained in more detail in Section 5.3).
With such an arrangement, system administrators could offer CAS access without being required
to abide by any peer-to-peer protocol or provide additional storage space. In particular, CAS
providers need not make any guarantees regarding content persistence or availability. As a proof
of concept, we have implemented a distributed file system called CASPER that employs recipes. We
have evaluated CASPER at different bandwidths using a variety of benchmarks. Our results
indicate that substantial performance improvement is possible through the use of recipes.
Because CASPER imposes no requirements regarding the availability of CAS providers, they are
free to terminate their service at any time. This encourages users to offer their desktops as CAS
providers with the full confidence that they can withdraw at any time. In some environments, a
system administrator may prefer to dedicate one or more machines as CAS providers. Such a
dedicated CAS provider is referred to as a jukebox.
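
To make the recipe idea concrete, here is a hedged sketch of file reconstruction; CasProvider and OriginServer are hypothetical interfaces standing in for the nearby CAS providers and the distant file server, not CASPER's actual API:

    import java.io.ByteArrayOutputStream;
    import java.util.List;

    // A recipe is a list of block hashes. Each block is first requested from
    // nearby CAS providers (cheap LAN access) and fetched from the distant
    // server only on a miss, so correctness never depends on CAS availability.
    public class RecipeReconstructor {
        interface CasProvider { byte[] fetch(String blockHash); }   // null on miss
        interface OriginServer { byte[] fetchBlock(String blockHash); }

        public static byte[] reconstruct(List<String> recipe,
                                         List<CasProvider> providers,
                                         OriginServer origin) throws Exception {
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            for (String hash : recipe) {
                byte[] block = null;
                for (CasProvider p : providers) { // try LAN providers first
                    block = p.fetch(hash);
                    if (block != null) break;
                }
                if (block == null) block = origin.fetchBlock(hash); // WAN fallback
                file.write(block);
            }
            return file.toByteArray();
        }
    }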

Template based replication:


Over the past few years the World-Wide Web has taken on significant importance in our
lives, and many businesses and public services now rely on it as their primary communication
medium. This drives the need for scalable hosting architectures capable of supporting arbitrary
levels of load with acceptable performance. However, while this problem is now well understood
for static content, providing scalable infrastructures for hosting dynamically generated Web
content still remains a challenge. Dynamic Web content allows Web sites to personalize the
delivered contents to individual clients, and to take action upon certain requests such as
processing an order in an ecommerce site. Content is dynamically generated upon each client
request by application-specific business logic, which typically issues one or more queries to an
underlying database. Numerous systems for scalable hosting of Web applications have been
proposed. These systems typically cache (fragments of) the generated pages, distribute the
computation across multiple application servers or cache the results of database queries.
However, although these techniques can be very effective depending on the application, in many
cases their ultimate scalability bottleneck resides in the throughput of the origin database where
the authoritative version of the application state is stored. Database replication techniques can of
course help here, but the generic replication algorithms used by most databases do not scale
linearly, as they require applying all update, deletion and insertion (UDI) queries to every
database replica. The system throughput is therefore limited to the point where the quantity of
UDI queries alone is sufficient to overload one server, regardless of the number of machines
employed. The only solutions to this problem are to increase the throughput of each individual
server or to use partial replication so that UDI queries can be executed at only a subset of all
servers. However, partially replicating a database is in turn difficult because queries can
potentially span data items which are stored at different servers. Current partially replicated
solutions rely on either active participation of the application programmer or on one special
server holding the full database to execute complex queries (and thereby becoming the new
throughput bottleneck).
We present a database replication system that exploits the fact that the database queries issued
by typical Web applications belong to a relatively small number of query templates. A query
template is a parametrized SQL query whose parameter values are passed to the system at
runtime. Prior knowledge of these templates allows one to select database table placements such
that each query template can be treated locally by at least one server. We demonstrate that
careful table placements based on the data span and the relative execution costs of different
templates can provide major scalability gains in terms of sustained throughput. We further show
that this technique can easily be used in combination with any existing template-based database
query caching system, thereby obtaining reduced access latency and yet some more throughput
scalability.
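
To illustrate what a query template looks like in practice, the sketch below uses a JDBC PreparedStatement whose SQL shape is fixed while the parameter values are bound at runtime; the in-memory H2 connection URL and the student table are assumptions made for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // The template's SQL shape is known in advance; only the parameter varies.
    public class TemplateExample {
        private static final String TEMPLATE =
                "SELECT name, total FROM student WHERE roll_no = ?";

        public static void main(String[] args) throws Exception {
            // Assumed URL; any JDBC driver on the classpath would do.
            try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo")) {
                try (Statement st = con.createStatement()) {
                    st.execute("CREATE TABLE student(roll_no INT, name VARCHAR(10), total INT)");
                    st.execute("INSERT INTO student VALUES (1234, 'Xyz', 176)");
                }
                try (PreparedStatement ps = con.prepareStatement(TEMPLATE)) {
                    ps.setInt(1, 1234); // bind this request's parameter value
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next())
                            System.out.println(rs.getString("name") + " " + rs.getInt("total"));
                    }
                }
            }
        }
    }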


Distributed Database:
A distributed database is a database that is under the control of a central database
management system (DBMS) in which storage devices are not all attached to a common CPU. It
may be stored in multiple computers located in the same physical location, or may be dispersed
over a network of interconnected computers. Collections of data (e.g., in a database) can be
distributed across multiple physical locations. A distributed database is distributed into separate
partitions/fragments. Each partition/fragment of a distributed database may be replicated (i.e.,
redundant fail-overs, RAID-like). Besides distributed database replication and fragmentation,
there are many other distributed database design technologies. For example, local autonomy,
synchronous and asynchronous distributed database technologies. These technologies'
implementation can and does depend on the needs of the business and the
sensitivity/confidentiality of the data to be stored in the database, and hence the price the
business is willing to spend on ensuring data security, consistency and integrity.
Basic architecture
A database server is the software managing a database, and a client is an application that
requests information from a server. Each computer in a system is a node. A node in a distributed
database system acts as a client, a server, or both, depending on the situation.
Horizontal fragments
subsets of tuples (rows) from a relation (table).
Vertical fragments
subsets of attributes (columns) from a relation (table).
Mixed fragment
a fragment which is both horizontally and vertically fragmented, or a logical collection of objects
in an ODBMS.
Important considerations
Care must be taken with a distributed database to ensure that:
• The distribution is transparent — users must be able to interact with the system as if it
were one logical system. This applies to the system's performance, and methods of access
amongst other things.
• Transactions are transparent — each transaction must maintain database integrity across
multiple databases. Transactions must also be divided into subtransactions, each
subtransaction affecting one database system.
Advantages of distributed databases
• Reflects organizational structure — database fragments are located in the departments
they relate to.
• Local autonomy — a department can control its own data (as its staff are the ones
familiar with it).
• Improved availability — a fault in one database system will only affect one fragment,
instead of the entire database.
• Improved performance — data is located near the site of greatest demand, and the
database systems themselves are parallelized, allowing load on the databases to be
balanced among servers. (A high load on one module of the database won't affect other
modules of the database in a distributed database.)
• Economics — it costs less to create a network of smaller computers with the power of a
single large computer.


• Modularity — systems can be modified, added and removed from the distributed
database without affecting other modules (systems).
Disadvantages of distributed databases
• Complexity — extra work must be done by the DBAs to ensure that the distributed nature
of the system is transparent. Extra work must also be done to maintain multiple disparate
systems, instead of one big one. Extra database design work must also be done to account
for the disconnected nature of the database — for example, joins become prohibitively
expensive when performed across multiple systems.
• Economics — increased complexity and a more extensive infrastructure means extra
labor costs.
• Security — remote database fragments must be secured, and they are not centralized so
the remote sites must be secured as well. The infrastructure must also be secured (e.g., by
encrypting the network links between remote sites).
• Difficult to maintain integrity — in a distributed database, enforcing integrity over a
network may require too much of the network's resources to be feasible.
• Inexperience — distributed databases are difficult to work with, and as a young field
there is not much readily available experience on proper practice.

Partial Replication:

By replicating the whole database, there is reduced complexity in the design of the
distributed system. But the size of the data present on the mobile device is high. In laptops and
PDAs, huge storage cannot be expected. To avoid such storage problems, a partial replication
scheme can be employed.

Partial replication is possible by partitioning the database logically. The partitioning
can be done by two different schemes:

1) Horizontal fragmentation
2) Vertical fragmentation

Sample Student Data:

Roll No.   Name   Mark1   Mark2   Total
1234       Xyz    98      78      176
1235       Yyy    89      78      167
1534       Abc    78      67      148
2342       Xxx    67      56      123

Horizontal Fragmentation:

Replication1:

Roll No.   Name   Mark1   Mark2   Total
1234       Xyz    98      78      176
1235       Yyy    89      78      167

Replication2:

Roll No.   Name   Mark1   Mark2   Total
1534       Abc    78      67      148
2342       Xxx    67      56      123

Vertical Fragmentation:

Fragment 1:

Roll No.   Name   Total
1234       Xyz    176
1235       Yyy    167
1534       Abc    148
2342       Xxx    123

Fragment 2:

Roll No.   Mark1   Mark2
1234       98      78
1235       89      78
1534       78      67
2342       67      56
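
The fragments shown above can be expressed as queries over a hypothetical student(roll_no, name, mark1, mark2, total) table; note that each vertical fragment keeps the key column so the original rows can be rebuilt by a join on roll_no:

    // Sketch of the fragmentation rules behind the tables above (assumed schema).
    public class FragmentQueries {
        // Horizontal fragmentation: split rows by a predicate on the key.
        static final String HORIZONTAL_1 =
                "SELECT * FROM student WHERE roll_no < 1500";   // Replication1
        static final String HORIZONTAL_2 =
                "SELECT * FROM student WHERE roll_no >= 1500";  // Replication2

        // Vertical fragmentation: split columns, repeating the key in each fragment.
        static final String VERTICAL_1 =
                "SELECT roll_no, name, total FROM student";     // Fragment 1
        static final String VERTICAL_2 =
                "SELECT roll_no, mark1, mark2 FROM student";    // Fragment 2
    }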

JDBC Overview:
JDBC has been part of the Java Standard Edition since the release of JDK 1.1. The JDBC classes
are contained in the Java package java.sql. Starting with version 3.0, JDBC has been developed
under the Java Community Process. JSR 54 specifies JDBC 3.0 (included in J2SE 1.4), JSR 114
specifies the JDBC Rowset additions, and JSR 221 is the specification of JDBC 4.0 (included in
Java SE 6). JDBC allows multiple implementations to exist and be used by the same application.
The API provides a mechanism for dynamically loading the correct Java packages and
registering them with the JDBC Driver Manager. The Driver Manager is used as a connection
factory for creating JDBC connections. JDBC connections support creating and executing
statements. These statements may be update statements such as SQL CREATE, INSERT,
UPDATE and DELETE or they may be query statements using the SELECT statement.
Additionally, stored procedures may be invoked through a statement. Statements are one of the
following types:
• Statement — the statement is sent to the database server each and every time.
• PreparedStatement — the statement is cached and its execution path is predetermined on
the database server, allowing it to be executed multiple times in an efficient manner.
• CallableStatement — used for executing stored procedures on the database.
Update statements such as INSERT, UPDATE and DELETE return an update count that
indicates how many rows were affected in the database; they do not return any other
information. Query statements return a JDBC row result set, which is used to walk over the
returned rows. Individual columns in a row are retrieved either by name or by column
number, and there may be any number of rows in the result set. The result set also has
metadata that describes the names of the columns and their types. An extension to the basic
JDBC API in the javax.sql package allows for scrollable result sets and cursor support,
among other things.
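
A minimal sketch tying these pieces together — an update returning a row count, and a query
walked through a ResultSet with columns fetched by name. The connection string and the
student table are placeholders carried over from the fragmentation example above.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JdbcBasics {
        public static void main(String[] args) throws Exception {
            // Placeholder connection string; substitute your own database.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:neutrino", "sa", "")) {

                // An update statement returns only the number of affected rows.
                try (PreparedStatement up = con.prepareStatement(
                        "UPDATE student SET total = mark1 + mark2 WHERE roll_no = ?")) {
                    up.setInt(1, 1234);
                    System.out.println(up.executeUpdate() + " row(s) updated");
                }

                // A query statement returns a result set to walk over.
                try (PreparedStatement query = con.prepareStatement(
                        "SELECT roll_no, name, total FROM student WHERE total > ?")) {
                    query.setInt(1, 150);
                    try (ResultSet rs = query.executeQuery()) {
                        while (rs.next()) {
                            // Columns retrieved by name, as described above.
                            System.out.println(rs.getInt("roll_no") + " "
                                    + rs.getString("name") + " " + rs.getInt("total"));
                        }
                    }
                }
            }
        }
    }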

JDBC Drivers
JDBC Drivers are client-side adaptors (they are installed on the client machine, not on the server)
that convert requests from Java programs to a protocol that the DBMS can understand.
Types
There are commercial and free drivers available for most relational database servers. These
drivers fall into one of the following types:
• Type 1, the JDBC-ODBC bridge
• Type 2, the native-API driver
• Type 3, the network-protocol driver
• Type 4, the native-protocol driver
There is also an internal JDBC driver, embedded in the JRE of Java-enabled SQL databases,
which is used for Java stored procedures. A JDBC URL (the database connection string) tells
the Driver Manager which driver, host and database to use.
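
As an illustration, registering a Type 4 (native-protocol) driver and connecting through a
JDBC URL might look as follows; the MySQL driver class name and the host/database in the URL
are placeholders, not part of Neutrino.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class DriverExample {
        public static void main(String[] args) throws Exception {
            // Explicit registration of a Type 4 driver. Since JDBC 4.0 this
            // step is optional: drivers on the classpath are discovered
            // automatically via the service-provider mechanism.
            Class.forName("com.mysql.jdbc.Driver"); // class name varies per vendor

            // A JDBC URL encodes the subprotocol, host, port and database name.
            String url = "jdbc:mysql://dbhost:3306/neutrino"; // placeholder values
            try (Connection con = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected: " + !con.isClosed());
            }
        }
    }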

Terminologies in Neutrino:
Template-based replication:

This is a database replication system that exploits the fact that the database queries issued
by typical Web applications belong to a relatively small number of query templates. A query
template is a parameterized SQL query whose parameter values are passed to the system at
runtime. Prior knowledge of these templates allows one to select database table placements
such that each query template can be answered locally by at least one server. Careful table
placements based on the data span and the relative execution costs of the different templates
can provide major scalability gains in terms of sustained throughput. This technique can also
be combined with any existing template-based database query caching system, thereby reducing
access latency and gaining yet more throughput scalability.
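
A hedged sketch of what two such templates might look like against the sample student table;
the SQL strings and the method name are purely illustrative. Because both templates touch
only the student table, any server holding a copy of that table can answer them locally.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class QueryTemplates {
        // Templates known before the application runs; only the '?'
        // parameter values vary at runtime.
        static final String BY_ROLL  =
                "SELECT name, total FROM student WHERE roll_no = ?";
        static final String BY_TOTAL =
                "SELECT roll_no, name FROM student WHERE total >= ?";

        /** Executes the BY_ROLL template with a concrete parameter value. */
        static String lookupName(Connection con, int rollNo) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(BY_ROLL)) {
                ps.setInt(1, rollNo);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("name") : null;
                }
            }
        }
    }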

Weak consistency with mobile clients:

Weak consistency models are based on the fact that replicas can tolerate inconsistencies
for at least some time period. The replicas perform a process of reconciliation to make their
copies consistent, and conflicts are resolved as and when they arise. The user is allowed to
perform optimistic updates on the data. Conflict resolution is, however, application
dependent, and some work is needed on the part of application developers to handle conflicts.
Such models scale very well because background update propagation demands little
coordination among sites.
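
As one concrete (and hypothetical) reconciliation policy, the sketch below applies
last-writer-wins merging per key; real conflict resolution is, as noted above, application
dependent, and all names here are illustrative.

    import java.util.Map;

    public class Reconciler {
        /** A replicated value stamped with the time of its last write. */
        static class Versioned {
            final String value;
            final long timestamp;

            Versioned(String value, long timestamp) {
                this.value = value;
                this.timestamp = timestamp;
            }
        }

        /** Merge a remote replica's copy into the local one: for each key,
         *  keep whichever write carries the newer timestamp. */
        static void reconcile(Map<String, Versioned> local,
                              Map<String, Versioned> remote) {
            for (Map.Entry<String, Versioned> e : remote.entrySet()) {
                Versioned mine = local.get(e.getKey());
                if (mine == null || e.getValue().timestamp > mine.timestamp) {
                    local.put(e.getKey(), e.getValue());
                }
            }
        }
    }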
The internet has played a big role in the development and popularity of distributed
applications. Some of the most popular applications, such as Napster, Gnutella and Freenet,
have created tremendous interest in the user community. Deploying these distributed and
peer-to-peer applications over the internet has posed several challenges. With so many users,
distributed systems deployed across the internet today need to exhibit several important
properties: high availability, scalability across millions of users, and the ability to
tolerate network failures, network partitioning, higher rates of packet loss, and
varied-bandwidth connections to the internet. Despite these harsh conditions, a distributed
application should still perform well enough to provide satisfactory user response times.
The number of mobile devices in use is exploding with advances in mobile computing, and
mobile users are becoming more and more interconnected. Since mobile users increasingly want
to share data and demand access to collaborative applications, the weak consistency model
has emerged as a natural way to allow mobile users to share data among themselves. Since
mobile clients have limited processing power and memory, and are often in a disconnected
state, maintaining strong consistency for mobile replicas is not feasible. Hence, many weak
consistency solutions for mobile applications have been suggested.

Scalability in Large-Scale Systems:


Weakly consistent replication protocols normally disseminate information through
methods such as anti-entropy or gossip. These are examples of epidemic protocols in which
updates are propagated lazily. Such protocols make probabilistic guarantees that all replicas
will eventually become consistent. This kind of approach has been around for quite a while
and has the advantage that, in a large-scale loosely coupled environment, it scales and
performs better than other approaches. Lazy replication is used in peer-to-peer systems like
Gnutella, and OceanStore, a highly scalable storage system designed to scale to billions of
users, provides an API supporting weak consistency for increased availability and
performance.
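
A minimal sketch of one push-pull anti-entropy round, reusing the Versioned and reconcile
helpers from the reconciliation sketch above; the peer selection and state layout are
illustrative assumptions, not a specific protocol from the literature.

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    public class AntiEntropy {
        private static final Random RANDOM = new Random();

        /** One gossip round: pick a random peer and exchange newer writes
         *  in both directions. Repeated rounds spread updates epidemically
         *  until all replicas converge (eventual consistency). */
        static void gossipRound(Map<String, Reconciler.Versioned> myState,
                                List<Map<String, Reconciler.Versioned>> peers) {
            Map<String, Reconciler.Versioned> peer =
                    peers.get(RANDOM.nextInt(peers.size()));
            Reconciler.reconcile(myState, peer); // pull newer writes from the peer
            Reconciler.reconcile(peer, myState); // push our newer writes to it
        }
    }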

Master Copy & Server Proxy:


The master copy of the database is present somewhere in the network and is connected to
the server proxy through a JDBC driver. The master database is usually a sophisticated
database management system such as Oracle. The server proxy maintains the list of clients
connected to the network: whenever a client contacts the server, it has to go through the
server proxy. The proxy also maintains transparency by hiding the replicas from one another,
and it keeps the list of updates. The server proxy may reside on the master itself or
elsewhere in the network.
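
A hypothetical sketch of the server proxy's two bookkeeping duties described above — the
list of connected clients and the list of updates. All names are illustrative, not
Neutrino's actual interfaces.

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class ServerProxy {
        private final List<String> connectedClients = new ArrayList<>();
        // Update id -> SQL text, kept in arrival order for later propagation.
        private final Map<Long, String> updateLog = new LinkedHashMap<>();
        private long nextUpdateId = 0;

        /** Records a client joining the network. */
        public synchronized void register(String clientId) {
            connectedClients.add(clientId);
        }

        /** Appends an update to the log on behalf of the master copy. */
        public synchronized void recordUpdate(String sql) {
            updateLog.put(nextUpdateId++, sql);
        }

        /** Updates that a reconnecting client has not yet seen. */
        public synchronized List<String> updatesSince(long lastSeenId) {
            List<String> pending = new ArrayList<>();
            for (Map.Entry<Long, String> e : updateLog.entrySet()) {
                if (e.getKey() > lastSeenId) {
                    pending.add(e.getValue());
                }
            }
            return pending;
        }
    }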

Edge Servers:
These are the endpoints of the system. Whenever clients need data from this content
distribution network, they contact the edge servers, which in turn query the replicas for
data. Requests to the replicas are queued. Clients contacting the edge servers are unaware
of the technology being used to distribute the data.
Query Router:
The query router is a major module of Neutrino that helps in achieving partial replication.
It can be thought of as something very similar to a directory service that holds information
about the placement of the replicas and their load. The router then finds a feasible replica
that can meet the client request and serves the content to the client.
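
A hedged sketch of one plausible routing policy: among the replicas known to hold the
fragment a query needs, pick the least-loaded one. The Replica fields and the policy itself
are assumptions for illustration, not Neutrino's documented algorithm.

    import java.util.List;

    public class QueryRouter {
        static class Replica {
            final String host;
            final List<String> fragments; // fragments this replica stores
            int queuedRequests;           // current load

            Replica(String host, List<String> fragments) {
                this.host = host;
                this.fragments = fragments;
            }
        }

        /** Returns the least-loaded replica holding the fragment, or null
         *  if no known replica can serve it. */
        static Replica route(String fragment, List<Replica> replicas) {
            Replica best = null;
            for (Replica r : replicas) {
                if (r.fragments.contains(fragment)
                        && (best == null || r.queuedRequests < best.queuedRequests)) {
                    best = r;
                }
            }
            return best;
        }
    }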

Neutrino JDBC Driver:


This module is responsible for contacting the server proxy and for regularly checking the
signal strength. Whenever there is connectivity to the network, the server proxy is
contacted. The application program always talks to the JDBC driver.
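
A hypothetical sketch of such a periodic connectivity check, using a plain TCP connect to
the server proxy as a stand-in for measuring signal strength; the host, port and probe
interval are illustrative only.

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ConnectivityProbe implements Runnable {
        private final String proxyHost;
        private final int proxyPort;
        private volatile boolean connected;

        ConnectivityProbe(String proxyHost, int proxyPort) {
            this.proxyHost = proxyHost;
            this.proxyPort = proxyPort;
        }

        public boolean isConnected() {
            return connected;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try (Socket socket = new Socket()) {
                    // A successful TCP connect stands in for "signal available".
                    socket.connect(new InetSocketAddress(proxyHost, proxyPort), 2000);
                    connected = true;
                    // Here the real driver would contact the server proxy and
                    // exchange any pending updates.
                } catch (Exception e) {
                    connected = false; // no connectivity; keep queuing locally
                }
                try {
                    Thread.sleep(5000); // probe interval, illustrative only
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }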