ScientificAmerican.com
THE FUTURE OF THE WEB
special online issue no. 2
The dotcom bubble may have finally burst but there can be no doubt that the Internet has forever changed the way
we communicate, do business and find information of all kinds. Scientific American has regularly covered the
advances making this transformation possible. And during the past five years alone, many leading researchers
and computer scientists have aired their views on the Web in our pages.
In this collection, expert authors discuss a range of topics, from XML and hypersearching the Web to filtering
information and preserving the Internet in one vast archive. Other articles cover more recent ideas, including
ways to make Web content more meaningful to machines and plans to create an operating system that would
span the Internet as a whole. --the Editors
TABLE OF CONTENTS
Filtering Information on the Internet
BY PAUL RESNICK; SCIENTIFIC AMERICAN, MARCH 1997
Look for the labels to decide if unknown software and World Wide Web sites are safe and interesting.
by Paul Resnick
The Internet is often called a global village, suggesting a huge but close-knit community that shares common values and experiences. The metaphor is misleading. Many cultures coexist on the Internet and at times clash. In its public spaces, people interact commercially and socially with strangers as well as with acquaintances and friends. The city is a more apt metaphor, with its suggestion of unlimited opportunities and myriad dangers.
To steer clear of the most obviously offensive, dangerous or just boring neighborhoods, users can employ some mechanical filtering techniques that identify easily definable risks. One technique is to analyze the contents of on-line material. Thus, virus-detection software searches for code fragments that it knows are common in virus programs. Services such as AltaVista and Lycos can either highlight or exclude World Wide Web documents containing particular words. My colleagues and I have been at work on another filtering technique based on electronic labels that can be added to Web sites to describe digital works. These labels can convey characteristics that require human judgment (whether the Web page is funny or offensive) as well as information not readily apparent from the words and graphics, such as the Web site's policies about the use or resale of personal data.
The Massachusetts Institute of Technology's World Wide Web Consortium has developed a set of technical standards called PICS (Platform for Internet Content Selection) so that people can electronically distribute descriptions of digital works in a simple, computer-readable form. Computers can process these labels in the background, automatically shielding users from undesirable material or directing their attention to sites of particular interest. The original impetus for PICS was to allow parents and teachers to screen materials they felt were inappropriate for children using the Net. Rather than censoring what is distributed, as the Communications Decency Act and other legislative initiatives have tried to do, PICS enables users to control what they receive.
What's in a Label?
PICS labels can describe any aspect of a document or a Web site. The first labels identified items that might run afoul of local indecency laws. For example, the Recreational Software Advisory Council (RSAC) adapted its computer-game rating system for the Internet. Each RSACi (the "i" stands for "Internet") label has four numbers, indicating levels of violence, nudity, sex and potentially offensive language. Another organization, SafeSurf, has developed a vocabulary with nine separate scales. Labels can reflect other concerns beyond indecency, however. A privacy vocabulary, for example, could describe Web sites' information practices, such as what personal information they collect and whether they resell it. Similarly, an intellectual-property vocabulary could describe the conditions under which an item could be viewed or reproduced [see "Trusted Systems," by Mark Stefik, page 78]. And various Web-indexing organizations could develop labels that indicate the subject categories or the reliability of information from a site.
Labels could even help protect computers from exposure to viruses. It has become increasingly popular to download small fragments of computer code, bug fixes and even entire applications from Internet sites. People generally trust
FILTERING SYSTEM for the World Wide Web allows individuals to decide for themselves what they want to see. Users specify safety and content requirements (a), which label-processing software (b) then consults to determine whether to block access to certain pages (marked with a stop sign). Labels can be affixed by the Web site's author (c), or a rating agency can store its labels in a separate database (d).
BRYAN CHRISTIE
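As a rough illustration of the label-processing step shown in the figure, the sketch below checks a four-number, RSACi-style label against limits chosen by the user. It is not the PICS label syntax: the `Label` class, its field names, the `lookup` and `allowed` functions, the example URLs, and the choice to prefer an independent rating over an author's own label are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

CATEGORIES = ("violence", "nudity", "sex", "language")


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for a four-number label; higher means more."""
    violence: int
    nudity: int
    sex: int
    language: int


def lookup(url, author_labels, bureau_labels) -> Optional[Label]:
    """Find a label for a page.

    Assumption: a rating agency's database (d) is consulted first,
    then the label affixed by the site's author (c).
    """
    return bureau_labels.get(url, author_labels.get(url))


def allowed(label: Optional[Label], limits: dict, block_unlabeled: bool = True) -> bool:
    """Apply the user's requirements (a) to one label (b)."""
    if label is None:
        # Unlabeled pages are blocked or shown according to the user's choice.
        return not block_unlabeled
    return all(getattr(label, c) <= limits[c] for c in CATEGORIES)


# Example: a user who tolerates mild language and nothing else.
limits = {"violence": 0, "nudity": 0, "sex": 0, "language": 1}
author_labels = {"http://games.example/": Label(3, 0, 0, 1)}
bureau_labels = {"http://news.example/": Label(0, 0, 0, 1)}

for url in ("http://news.example/", "http://games.example/", "http://unknown.example/"):
    label = lookup(url, author_labels, bureau_labels)
    print(url, "shown" if allowed(label, limits) else "blocked")
```

The `block_unlabeled` switch makes explicit a decision every such filter has to take: what to do with pages that carry no label at all.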
the activities of neo-Nazi groups, could publish PICS labels that identify Web pages containing neo-Nazi propaganda.
literary quality, a search engine might be able to suggest links to items labeled that way. Or if the user prefers that per-
reputation is at stake. Or she might trust an auditing organization of some kind to vouch for Bill.
[Figure: filtering system diagram with callouts (a), (b) LABEL-PROCESSING SOFTWARE, (c) AUTHOR LABELS and (d) DATABASE OF INDEPENDENT LABELS; sample labels read FREE OF VIRUSES and CONTAINS VIOLENCE.]
Of course, some labels address matters of personal taste rather than points of fact. Users may find themselves not trusting certain labels, simply because they disagree with the opinions behind them. To get around this problem, systems such as GroupLens and Firefly recommend books, articles, videos or musical selections based on the ratings of like-minded people. People rate items with which they are familiar, and the software compares those ratings with opinions registered by other users. In making recommendations, the software assigns the highest priority to items approved by people who agreed with the user's evaluations of other materials. People need not know who agreed with them; they can participate anonymously, preserving the privacy of their evaluations and reading habits.
Widespread reliance on labeling raises a number of social concerns. The most obvious are the questions of who decides how to label sites and what labels are acceptable. Ideally, anyone could label a site, and everyone could establish individual filtering rules. But there is a concern that authorities could assign labels to sites or dictate criteria for sites to label themselves. In an example from a different medium, the television industry, under pressure from the U.S. government, has begun to rate its shows for age appropriateness.
Mandatory self-labeling need not lead to censorship, so long as individuals can decide which labels to ignore. But people may not always have this power. Improved individual control removes one rationale for central control but does not prevent its imposition. Singapore and China, for instance, are experimenting with national "firewalls": combinations of software and hardware that block their citizens' access to certain newsgroups and Web sites.
Another concern is that even without central censorship, any widely adopted vocabulary will encourage people to make lazy decisions that do not reflect their values. Today many parents who may not agree with the criteria used to assign movie ratings still forbid their children to see movies rated PG-13 or R; it is too hard for them to weigh the merits of each movie by themselves.
Labeling organizations must choose vocabularies carefully to match the criteria that most people care about, but even so, no single vocabulary can serve everyone's needs. Labels concerned only with rating the level of sexual content at a site will be of no use to someone concerned about hate speech. And no labeling system is a full substitute for a thorough and thoughtful evaluation: movie reviews in a newspaper can be far more enlightening than any set of predefined codes.
Perhaps most troubling is the suggestion that any labeling system, no matter how well conceived and executed, will tend to stifle noncommercial communication. Labeling requires human time and energy; many sites of limited interest will probably go unlabeled. Because of safety concerns, some people will block access to materials that are unlabeled or whose labels are untrusted. For such people, the Internet will function more like broadcasting, providing access only to sites with sufficient mass-market appeal to merit the cost of labeling.
While lamentable, this problem is an inherent one that is not caused by labeling. In any medium, people tend to avoid the unknown when there are risks involved, and it is far easier to get information about material that is of wide interest than about items that appeal to a small audience.
Although the Net nearly eliminates the technical barriers to communication with strangers, it does not remove the social costs. Labels can reduce those costs, by letting us control when we extend trust to potentially boring or dangerous software or Web sites. The challenge will be to let labels guide our exploration of the global city of the Internet and not limit our travels.
The Author
PAUL RESNICK joined AT&T Labs Research in 1995 as the founding member of the Public Policy Research group. He is also chairman of the PICS working group of the World Wide Web Consortium. Resnick received his Ph.D. in computer science in 1992 from the Massachusetts Institute of Technology and was an assistant professor at the M.I.T. Sloan School of Management before moving to AT&T.
Further Reading
Rating the Net. Jonathan Weinberg in Hastings Communications and Entertainment Law Journal, Vol. 19; March 1997 (in press). Available on the World Wide Web at http://www.msen.com/~weinberg/rating.htm
Recommender Systems. Special section in Communications of the ACM, Vol. 40, No. 3; March 1997 (in press).
The Platform for Internet Content Selection home page is available on the World Wide Web at http://www.w3.org/PICS
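The recommendation idea this article attributes to systems such as GroupLens and Firefly (weight other people's ratings by how closely they agreed with you on items you have both rated) can be sketched in a few lines. This is a minimal illustration and not either system's actual algorithm: the 1-to-5 rating scale, the `agreement` measure and the function names are assumptions made for the example.

```python
def agreement(mine, theirs):
    """Similarity in [0, 1] over the items both people have rated.

    Assumes ratings on a 1-5 scale, so two ratings differ by at most 4.
    """
    shared = set(mine) & set(theirs)
    if not shared:
        return 0.0
    gap = sum(abs(mine[i] - theirs[i]) for i in shared)
    return 1.0 - gap / (4 * len(shared))


def recommend(mine, others):
    """Rank unseen items by the agreement-weighted average of others' ratings.

    `others` is a plain list of rating dictionaries with no names attached,
    echoing the point that raters can stay anonymous.
    """
    weighted_sum, weight_total = {}, {}
    for theirs in others:
        w = agreement(mine, theirs)
        if w == 0.0:
            continue
        for item, rating in theirs.items():
            if item in mine:
                continue  # only recommend items the user has not rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + w * rating
            weight_total[item] = weight_total.get(item, 0.0) + w
    return sorted(weighted_sum,
                  key=lambda i: weighted_sum[i] / weight_total[i],
                  reverse=True)


me = {"a": 5, "b": 1}
others = [
    {"a": 5, "b": 1, "x": 5, "y": 1},  # agrees with me completely
    {"a": 2, "b": 4, "x": 1, "y": 5},  # mostly disagrees
]
print(recommend(me, others))
```

Here the like-minded rater dominates, so item `x` is ranked ahead of item `y` even though the second rater preferred `y`.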
by Brewster Kahle
Manuscripts from the library of Alexandria in ancient Egypt disappeared in a fire. The early printed books decayed into unrecognizable shreds. Many of the oldest cinematic films were recycled for their silver content. Unfortunately, history may repeat itself in the evolution of the Internet and its World Wide Web.
No one has tried to capture a comprehensive record of the text and images contained in the documents that appear on the Web. The history of print and film is a story of loss and partial reconstruction. But this scenario need not be repeated for the Web, which has increasingly evolved into a storehouse of valuable scientific, cultural and historical information.
The dropping costs of digital storage mean that a permanent record of the Web and the rest of the Internet can be preserved by a small group of technical professionals equipped with a modest complement of computer workstations and data storage devices. A year ago I and a few others set out to realize this vision as part of a venture known as the Internet Archive.
By the time this article is published, we will have taken a snapshot of all parts of the Web freely and technically accessible to us. This collection of data will measure perhaps as much as two trillion bytes (two terabytes) of data, ranging from text to video to audio recording. In comparison, the Library of Congress contains about 20 terabytes of text information. In the coming months, our computers and storage media will make records of other areas of the Internet, including the Gopher information system and the Usenet bulletin boards. The material gathered so far has already proved a useful resource to historians. In the future, it may provide the raw material for a carefully indexed, searchable library.
The logistics of taking a snapshot of the Web are relatively simple. Our Internet Archive operates with a staff of 10 people from offices located in a converted military base (the Presidio) in downtown San Francisco; it also runs an information-gathering computer in the San Diego Supercomputer Center at the University of California at San Diego.
The software on our computers crawls the Net, downloading documents, called pages, from one site after another. Once a page is captured, the software looks for cross references, or links, to other pages. It uses the Web's hyperlinks (addresses embedded within a document page) to move to other pages. The software then makes copies again and seeks additional links contained in the new pages. The crawler avoids downloading duplicate copies of pages by checking the identification names, called uniform resource locators (URLs), against a database. Programs such as Digital Equipment Corporation's AltaVista also employ crawler software for indexing Web sites.
What makes this experiment possible is the dropping cost of data storage. The price of a gigabyte (a billion bytes) of hard-disk space is $200, whereas tape storage using an automated mounting device costs $20 a gigabyte. We chose hard-disk storage for a small amount of data that users of the archive are likely to access frequently and a robotic device that mounts and reads tapes automatically for less used information. A disk drive accesses data in an average of 15 milliseconds, whereas tapes require four minutes. Frequently accessed information might be historical documents or a set of URLs no longer in use.
We plan to update the information gathered at least every few months. The first full record required nearly a year to compile. In future passes through the Web, we will be able to update only the information that has changed since our last perusal.
The text, graphics, audio clips and other data collected from the Web will never be comprehensive, because the crawler software cannot gain access to many of the hundreds of thousands of sites. Publishers restrict access to data or store documents in a format inaccessible to simple crawler programs. Still, the archive gives a feel of what the Web looks like during a given period of time even though it does not constitute a full record.
After gathering and storing the public contents of the Internet, what services will the archive provide? We possess the capability of supplying documents that are no longer available from the origi-
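The crawl loop described in this article (capture a page, extract its links, follow only URLs not already recorded) can be sketched as follows. This is a minimal illustration, not the Internet Archive's software: the `crawl` and `LinkParser` names, the in-memory `seen` set standing in for the URL database, and the caller-supplied `fetch` function are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed, fetch, limit=1000):
    """Breadth-first crawl from `seed`.

    `fetch(url)` is supplied by the caller and returns the page text,
    or None when the page cannot be retrieved.
    """
    seen = {seed}            # stands in for the database of known URLs
    queue = deque([seed])
    archive = {}             # url -> captured page text
    while queue and len(archive) < limit:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue         # inaccessible pages are simply skipped
        archive[url] = page
        parser = LinkParser(url)
        parser.feed(page)
        for link in parser.links:
            if link not in seen:   # avoid downloading duplicates
                seen.add(link)
                queue.append(link)
    return archive


# A two-page toy "Web" held in memory, so the sketch runs without a network.
web = {
    "http://a.example/": '<a href="/b">b</a> <a href="http://a.example/">self</a>',
    "http://a.example/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}
print(sorted(crawl("http://a.example/", web.get)))
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the question of how pages are actually retrieved, which is where access restrictions and unreadable formats would come into play.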
by Clifford Lynch
One sometimes hears the Internet characterized as the world's library for the digital age. This description does not stand up under even casual examination. The Internet (and particularly its collection of multimedia resources known as the World Wide Web) was not designed to support the organized publication and retrieval of information, as libraries are. It has evolved into what might be thought of as a chaotic repository for the collective output of the world's digital printing presses. This storehouse of information contains not only books and papers but raw scientific data, menus, meeting minutes, advertisements, video and audio recordings, and transcripts of inter-
mixes everywhere with works of lasting importance.
In short, the Net is not a digital library. But if it is to continue to grow and thrive as a new means of communication, something very much like traditional library services will be needed to organize, access and preserve networked information. Even then, the Net will not resemble a traditional library, because its contents are more widely dispersed than a standard collection. Consequently, the librarian's classification and selection skills must be complemented by the computer scientist's ability to automate the task of indexing and storing information. Only a synthesis of the differing perspectives brought by both professions will allow this new medium to remain viable.
bears most of the responsibility for organizing information on the Internet. In theory, software that automatically classifies and indexes collections of digital data can address the glut of information on the Net, and the inability of human indexers and bibliographers to cope with it. Automating information access has the advantage of directly exploiting the rapidly dropping costs of computers and avoiding the high expense and delays of human indexing.
But, as anyone who has ever sought information on the Web knows, these automated tools categorize information differently than people do. In one sense, the job performed by the various indexing and cataloguing tools known as search engines is highly democratic. Machine-based approaches provide uniform
BRYAN CHRISTIE
The Internet came into its own a few years ago, when the World Wide Web arrived with its dazzling array of photography, animation, graphics, sound and video that ranged in subject matter from high art to the patently lewd. Despite the multimedia barrage, finding things on the hundreds of thousands of Web sites still mostly requires searching indexes for words and numbers. Someone who types the words "French flag" into the popular search engine AltaVista might retrieve the requested graphic, as long as it were captioned by those two identifying words. But what if someone could visualize a blue, white and red banner but did not know its country of origin?
Ideally, a search engine should allow the user to draw or scan in a rectangle with vertical thirds that are colored blue, white and red, and then find any matching images stored on myriad Web sites. In the past few years, techniques that combine key-word indexing with image analysis have begun to pave the way for the first image search engines.
Although these prototypes suggest possibilities for the indexing of visual information, they also demonstrate the crudeness of existing tools and the continuing reliance on text to track down imagery. One project, called WebSEEk, based at Columbia University, illustrates the workings of an image search engine. WebSEEk begins by downloading files found by trolling the Web. It then attempts to locate file names containing acronyms, such as GIF or MPEG, that designate graphics or video content. It also looks for words in the names that might identify the subject of the files. When the software finds an image, it analyzes the prevalence of different colors and where they are located. Using this information, it can distinguish among photographs, graphics and black-and-white or gray images. The software also compresses each picture so that it can be represented as an icon, a miniature image for display alongside other icons. For a video, it will extract key frames from different scenes.
A user begins a search by selecting a category from a menu: "cats," for example. WebSEEk provides a sampling of icons for the "cats" category. To narrow the search, the user can click on any icons that show black cats. Using its previously generated color analysis, the search engine looks for matches of images that have a similar color profile. The presentation of the next set of icons may show black cats, but also some marmalade cats sitting on black cushions. A visitor to WebSEEk can refine a search by adding or excluding certain colors from an image when initiating subsequent queries. Leaving out yellows or oranges might get rid of the odd marmalade. More simply, when presented with a series of icons, the user can also specify those images that do not contain black cats in order to guide the program away from mistaken choices. So far WebSEEk has downloaded and indexed more than 650,000 pictures from tens of thousands of Web sites.
Other image-searching projects include efforts at the University of Chicago, the University of California at San Diego, Carnegie Mellon University, the Massachusetts Institute of Technology's Media Lab and the University of California at Berkeley. A number of commercial companies, including IBM and Virage, have crafted software that can be used for searching corporate networks or databases. And two companies, Excalibur Technologies and Interpix Software, have collaborated to supply software to the Web-based indexing concerns Yahoo and Infoseek.
One of the oldest image searchers, IBM's Query by Image Content (QBIC), produces more sophisticated matching of image features than, say, WebSEEk can. It is able not only to pick out the colors in an image but also to gauge texture by several measures: contrast (the black and white of zebra stripes), coarseness (stones versus pebbles) and directionality (linear fence posts versus omnidirectional flower petals). QBIC also has a limited ability to search for shapes within an image. Specifying a pink dot on a green background turns up flowers and other photographs with similar shapes and colors, as shown above. Possible applications range from the selection of wallpaper patterns to enabling police to identify gang members by clothing type.
All these programs do nothing more than match one visual feature with another. They still require a human observer (or accompanying text) to confirm whether an object is a cat or a cushion. For more than a decade, the artificial-intelligence community has labored, with mixed success, on nudging computers to ascertain directly the identity of objects within an image, whether they are cats or national flags. This approach correlates the shapes in a picture with geometric models of real-world objects. The program can then deduce that a pink or brown cylinder, say, is a human arm.
One example is software that looks for naked people, a program that is the work of David A. Forsyth of Berkeley and Margaret M. Fleck of the University of Iowa. The software begins by analyzing the color and texture of a photograph. When it finds matches for flesh colors, it runs an algorithm that looks for cylindrical areas that might correspond to an arm or leg. It then seeks other flesh-colored cylinders, positioned at certain angles, which might confirm the presence of limbs. In a test last fall, the program picked out 43 percent of the 565 naked people among a group of 4,854 images, a high percentage for this type of complex image analysis. It registered, moreover, only a 4 percent false positive rate among the 4,289 images that did not contain naked bodies. The nudes were downloaded from the Web; the other photographs came primarily from commercial databases.
The challenges of computer vision will most likely remain for a decade or so to come. Searches capable of distinguishing clearly among nudes, marmalades and national flags are still an unrealized dream. As time goes on, though, researchers would like to give the programs that collect information from the Internet the ability to understand what they see.
IBM CORPORATION/ROMTECH/COREL
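Matching images by "color profile," as described for WebSEEk, can be illustrated with a coarse color histogram and a histogram-intersection score. This is a generic sketch, not WebSEEk's or QBIC's actual method: the function names, the four-levels-per-channel quantization and the toy pixel lists are assumptions made for the example.

```python
from collections import Counter


def color_profile(pixels, levels=4):
    """Fraction of pixels falling in each coarse RGB bin.

    `pixels` is a non-empty list of (r, g, b) tuples with values 0-255.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def similarity(p, q):
    """Histogram intersection: 1.0 for identical profiles, 0.0 for disjoint ones."""
    return sum(min(share, q.get(bin_, 0.0)) for bin_, share in p.items())


def rank(query_pixels, library):
    """Order the names in `library` (name -> pixels) by similarity to the query."""
    query = color_profile(query_pixels)
    return sorted(library,
                  key=lambda name: similarity(query, color_profile(library[name])),
                  reverse=True)


# Toy "images": a mostly black query, another dark picture, and an orange one
# that contains a little black (the marmalade cat on a black cushion).
black_cat = [(10, 10, 10)] * 80 + [(200, 200, 200)] * 20
library = {
    "other_black_cat": [(20, 20, 20)] * 70 + [(220, 220, 220)] * 30,
    "marmalade_cat": [(230, 140, 30)] * 90 + [(10, 10, 10)] * 10,
}
print(rank(black_cat, library))
```

Because the score only compares color proportions, the orange picture still receives a small nonzero score from its black region, which is the kind of mistaken match the article describes.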
BRYAN CHRISTIE
the gatherers (dark blue arrows) for a file of key words (red arrows) that could be processed into an index (tan page) for querying by a user.
and his colleagues at the University of Colorado at Boulder developed software, called Harvest, that lets a Web site compile indexing data for the pages it holds and to ship the information on request to the Web sites for the various search engines. In so doing, Harvest's automated indexing program, or gatherer, can avoid having a Web crawler export the entire contents of a given site across the network.
Crawler programs bring a copy of each page back to their home sites to extract the terms that make up an index, a process that consumes a great deal of network capacity (bandwidth). The gatherer, instead, sends only a file of indexing terms. Moreover, it exports only information about those pages that have been altered since they were last accessed, thus alleviating the load on the network and the computers tied to it.
Gatherers might also serve a different function. They may give publishers a framework to restrict the information that gets exported from their Web sites. This degree of control is needed because the Web has begun to evolve beyond a distribution medium for free information. Increasingly, it facilitates access to proprietary information that is furnished for a fee. This material may not be open for the perusal of Web crawlers. Gatherers, though, could distribute only the information that publishers wish to make available, such as links to summaries or samples of the information stored at a site.
As the Net matures, the decision to opt for a given information collection method will depend mostly on users. For which users will it then come to resemble a library, with a structured approach to building collections? And for whom will it remain anarchic, with access supplied by automated systems? Users willing to pay a fee to underwrite the work of authors, publishers, indexers and reviewers can sustain the tradition of the library. In cases where information is furnished without charge or is advertiser supported, low-cost computer-based indexing will most likely dominate: the same unstructured environment that characterizes much of the contemporary Internet. Thus, social and economic issues, rather than technological ones, will exert the greatest influence in shaping the future of information retrieval on the Internet.
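The gatherer idea described in this article (index at the site, ship only terms, and only for pages changed since the last request) can be sketched as below. This does not reproduce Harvest's actual data format or interface: the `gather` and `index_terms` names, the `pages` dictionary layout and the integer timestamps are assumptions made for the example.

```python
import re


def index_terms(text):
    """Reduce a page to a sorted list of distinct lowercase words."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


def gather(pages, since):
    """Build the export a site would send to a search engine.

    `pages` maps a URL to a (last_modified, text) pair. Only pages modified
    after `since` are included, and only their index terms are sent, never
    the full text.
    """
    return {
        url: index_terms(text)
        for url, (modified, text) in pages.items()
        if modified > since
    }


site = {
    "/cats.html": (10, "Cats and cats"),
    "/cushions.html": (20, "Black cushions"),
}

print(gather(site, since=0))    # first request: terms for every page
print(gather(site, since=15))   # later request: only the page changed after time 15
```

A publisher who wanted to expose only summaries or samples could filter `pages` before calling `gather`, which is the kind of control over exported information the article mentions.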
and the
G
WEB
The combination of hypertext and a global Internet started a revolution. A new ingredient, XML, is poised to finish the job.
HTML (shorthand for Hypertext Markup Language). Although HTML is the most successful electronic-publishing language ever invented, it is superficial: in essence, it describes how a Web browser should arrange text, images and push-buttons on a page. HTML's concern with appearances makes it relatively easy to learn, but it also has its costs.
One is the difficulty in creating a Web site that functions as more than just a fancy fax machine that sends documents to anyone who asks. People and companies want Web sites that take orders from customers, transmit medical records, even run factories and scientific instruments from half a world away. HTML was never designed for such tasks.
So although your doctor may be able to pull up your drug reaction history on his Web browser, he cannot then e-mail it to a specialist and expect her to be able to paste the records directly into her hospital's database. Her computer would not know what to make of the information, which to its eyes would be no more intelligible than <H1>blah blah</H1> <BOLD>blah blah blah</BOLD>. As programming legend Brian Kernighan once noted, the problem with "What You See Is What You Get" is that what you see is all you've got.
Those angle-bracketed labels in the example just above are called tags. HTML has no tag for a drug reaction, which highlights another of its limitations: it is inflexible. Adding a new tag involves a bureaucratic process that can take so long that few attempt it. And yet every application, not just the interchange of medical records, needs its own tags.
Thus the slow pace of today's on-line bookstores, mail-order catalogues and other interactive Web sites. Change the quantity or shipping method of your order, and to see the handful of digits that have changed in the total, you must ask a distant, overburdened server to send you an entirely new page, graphics and all. Meanwhile your own high-powered machine sits waiting idly, because it has only been told about <H1>s and <BOLD>s, not about prices and shipping options.
Thus also the dissatisfying quality of Web searches. Because there is no way to mark something as a price, it is effectively impossible to use price information in your searches.
Something Old, Something New
The solution, in theory, is very simple: use tags that say what the information is, not what it looks like. For example, label the parts of an order for a shirt not as boldface, paragraph, row and column (what HTML offers) but as price, size, quantity and color. A program can then recognize this document as a customer order and do whatever it needs to do: display it one way or display it a different way or put it through a bookkeeping system or make a new shirt show up on your doorstep tomorrow.
We, as members of a dozen-strong W3C working group, began crafting such a solution in 1996. Our idea was powerful but not entirely original. For generations, printers scribbled notes on manuscripts to instruct the typesetters.
Named Standard Generalized Markup Language, or SGML, this language for describing languages (a metalanguage) has since proved useful in many large publishing applications. Indeed, HTML was defined using SGML. The only problem with SGML is that it is too general: full of clever features designed to minimize keystrokes in an era when every byte had to be accounted for. It is more complex than Web browsers can cope with.
Our team created XML by removing frills from SGML to arrive at a more streamlined, digestible metalanguage. XML consists of rules that anyone can follow to create a markup language from scratch. The rules ensure that a single compact program, often called a parser, can process all these new languages. Consider again the doctor who wants to e-mail your medical record to a spe-
MARKED UP WITH XML TAGS, one file containing, say, movie listings for an entire city can be displayed on a wide variety of devices. Stylesheets can filter, reorder and render the
[Figure: the XML file below rendered through three stylesheets: AUDIBLE SPEECH STYLESHEET, CONVENTIONAL SCREEN STYLESHEET and HANDHELD DISPLAY STYLESHEET. Illustration: LAURIE GRACE]
<movie>
  <title>Star Trek: Insurrection</title>
  <star>Patrick Stewart</star>
  <star>Brent Spiner</star>
  <theatre>
    <theatre-name>MondoPlex 2000</theatre-name>
    <showtime>1415</showtime>
    <showtime>1630</showtime>
    <showtime>1845</showtime>
    <showtime>2100</showtime>
    <showtime>2315</showtime>
    <price>
      <adult-price>8.50</adult-price>
      <child-price>5.00</child-price>
    </price>
  </theatre>
  <theatre>
    <theatre-name>Bigscreen 1</theatre-name>
    <showtime>1930</showtime>
    <price>
      <adult-price>6.00</adult-price>
    </price>
  </theatre>
</movie>
...
<movie>
  <title>Shakespeare in Love</title>
  <star>Gwyneth
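To make the movie-listing figure concrete, the sketch below feeds a cut-down version of that listing to a general-purpose XML parser (Python's standard `xml.etree.ElementTree`) and then uses the same parsed data two ways: a compact rendering of the sort a handheld display might show, and a search on price. The `<listings>` wrapper element, the function names and the output format are assumptions made for the example, and the functions are plain Python stand-ins for stylesheets rather than any stylesheet language.

```python
import xml.etree.ElementTree as ET

# Cut-down listing modeled on the figure; <listings> is a hypothetical root.
LISTING = """
<listings>
  <movie>
    <title>Star Trek: Insurrection</title>
    <star>Patrick Stewart</star>
    <theatre>
      <theatre-name>MondoPlex 2000</theatre-name>
      <showtime>1415</showtime>
      <showtime>1630</showtime>
      <price><adult-price>8.50</adult-price><child-price>5.00</child-price></price>
    </theatre>
    <theatre>
      <theatre-name>Bigscreen 1</theatre-name>
      <showtime>1930</showtime>
      <price><adult-price>6.00</adult-price></price>
    </theatre>
  </movie>
</listings>
"""


def screenings(root):
    """Yield one record per (movie, theatre) pair found in the document."""
    for movie in root.iter("movie"):
        title = movie.findtext("title")
        for theatre in movie.iter("theatre"):
            yield {
                "title": title,
                "theatre": theatre.findtext("theatre-name"),
                "showtimes": [s.text for s in theatre.iter("showtime")],
                "adult": float(theatre.findtext("price/adult-price")),
            }


def handheld(root):
    """One short line per screening, as a small display might show it."""
    lines = []
    for s in screenings(root):
        times = " ".join(f"{t[:2]}:{t[2:]}" for t in s["showtimes"])
        lines.append(f'{s["title"]} @ {s["theatre"]}: {times}')
    return lines


def cheapest(root):
    """Because prices are tagged as prices, they can be searched directly."""
    return min(screenings(root), key=lambda s: s["adult"])


root = ET.fromstring(LISTING)
print("\n".join(handheld(root)))
print("Cheapest adult ticket:", cheapest(root)["theatre"])
```

The parser itself knows nothing about movies; the tag names carry the meaning, so one generic program can serve this vocabulary and any other.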
[Figure: "XML Browser" windows for Softland Airlines showing scheduled London (LHR) to New York (JFK) flights, a flight confirmation dialog, fare restrictions and a fare finder. Illustration: LAURIE GRACE]
XML HYPERLINK can open a menu of several options. One option might insert an image, such as a plane seating chart, into the current page (red arrow). Others could run a small program to book a flight (yellow arrow) or reveal hidden text (green arrow). The links can also connect to other pages (blue arrow).
frequently retrieve tens of thousands of pages, many of them useless. How can people
quickly locate only the information they need and trust that it is authentic and reliable?

We have developed a new kind of search engine that exploits one of the Web's most valuable
resources: its myriad hyperlinks. By analyzing these interconnections, our system
automatically locates two types of pages: authorities and hubs. The former are deemed to
be the best sources of information on a particular topic; the latter are collec-

precise measure of "best"; indeed, it lies in the eye of the beholder.

Search engines such as AltaVista, Infoseek, HotBot, Lycos and Excite use heuristics to
determine the way in which to order and thereby prioritize pages. These rules of thumb
are collectively

ors and fonts that are invisible to human viewers. This practice, called spamming, has
become one of the main reasons why it is currently so difficult to maintain an effective
search engine.

Spamming aside, even the basic assumptions of conventional text searches

[Figure caption] WEB PAGES (white dots) are scattered over the Internet with little
structure, making it difficult for a person in the center of this electronic clutter to
find only the information desired. Although this diagram shows just hundreds of pages, the
World Wide Web currently contains more than 300 million of them. Nevertheless, an analysis
of the way in which certain pages are linked to one another can reveal a hidden order.
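The text names authorities and hubs but does not spell out the computation at this point. One well-known way to score both from link structure alone is the iterative hubs-and-authorities scheme (Kleinberg's HITS), sketched here as an assumption about the kind of link analysis meant. The function name `hits`, the toy page names and the iteration count are all illustrative.

```python
from math import sqrt


def hits(links, iterations=50):
    """Score pages as hubs and authorities from a dict {page: [pages it links to]}.

    A page's authority score is the sum of the hub scores of pages linking to it;
    a page's hub score is the sum of the authority scores of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: collect hub weight over incoming links.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update: collect authority weight over outgoing links.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize so the scores stay bounded from one round to the next.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


if __name__ == "__main__":
    # Toy graph: three "guide" pages pointing at two content pages.
    toy = {"guide1": ["a", "b"], "guide2": ["a", "b"], "guide3": ["a"], "a": [], "b": []}
    hub, auth = hits(toy)
    print(sorted(auth, key=auth.get, reverse=True)[:2])
    print(sorted(hub, key=hub.get, reverse=True)[:2])
```

In the toy graph, page `a` collects the most incoming links from good hubs and so ranks as the top authority, while the guide pages that point to both content pages rank as the best hubs.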
by TIM BERNERS-LEE, JAMES HENDLER and ORA LASSILA
PHOTOILLUSTRATIONS BY MIGUEL SALMERON

TIM BERNERS-LEE, JAMES HENDLER and ORA LASSILA are individually and collectively obsessed
with the potential of Semantic Web technology. Berners-Lee is director of the World Wide
Web Consortium (W3C) and a researcher at the Laboratory for Computer Science at the
Massachusetts Institute of Technology. When he invented the Web in 1989, he intended it to
carry more semantics than became common practice. Hendler is professor of computer science
at the University of Maryland at College Park, where he has been doing research on
knowledge representation in a Web context for a number of years. He and his graduate
research group developed SHOE, the first Web-based knowledge representation language to
demonstrate many of the agent capabilities described in this article. Hendler is also
responsible for agent-based computing research at the Defense Advanced Research Projects
Agency (DARPA) in Arlington, Va. Lassila is a research fellow at the Nokia Research Center
in Boston, chief scientist of Nokia Venture Partners and a member of the W3C Advisory
Board. Frustrated with the difficulty of building agents and automating tasks on the Web,
he co-authored W3C's RDF specification, which serves as the foundation for many current
Semantic Web efforts.

using the current preliminary versions of the unifying language.

Another vital feature will be digital signatures, which are encrypted blocks of data that
computers and agents can use to verify that the attached information has been provided by
a specific trusted source. You want to be quite sure that a statement sent to your
accounting program that you owe money to an online retailer is not a forgery generated by the
and the framework to make such technologies more feasible.

vice, et cetera. These activities formed chains in which a large amount of data

tise their functionality, what they can do and how they are controlled, much like
software agents. Being much more flexible than low-level schemes such as Universal Plug
and Play, such a semantic approach opens up a world of exciting possibilities.

For instance, what today is called home automation requires careful configuration for
appliances to work together. Semantic descriptions of device capabilities and
functionality will let us achieve such automation with minimal human intervention. A
trivial example occurs when Pete answers his phone and the stereo sound is turned down.
Instead of having to program each specific appliance, he could program such a function
once and for all to cover every local device that advertises having a volume control: the
TV, the DVD player and even the media players on the laptop that he brought home from work
this one evening.

The first concrete steps have already been taken in this area, with work on developing a
standard for describing functional capabilities of devices (such as screen sizes) and user
preferences. Built on RDF, this standard is called Composite Capability/Preference Profile
(CC/PP). Initially it will let cell phones and other nonstandard Web clients describe
their characteristics so that Web content can be tailored for them on the fly. Later, when
we add the full versatility of languages for handling ontologies and logic, devices could
automatically seek out and employ services and other devices for added information or
functionality. It is not hard to imagine your Web-enabled microwave oven consulting the
frozen-food manufacturer's Web site for optimal cooking parameters.

Evolution of Knowledge

The Semantic Web is not merely the tool for conducting individual tasks that we have
discussed so far. In addition, if properly designed, the Semantic Web can assist the
evolution of human knowledge as a whole.

Human endeavor is caught in an eternal tension between the effectiveness of small groups
acting independently and the need to mesh with the wider community. A small group can
innovate rapidly and efficiently, but this produces a subculture whose concepts are not
understood by others. Coordinating actions across a large group, however, is painfully
slow and takes an enormous amount of communication. The world works across the spectrum
between these extremes, with a tendency to start small, from the personal idea, and move
toward a wider understanding over time.

An essential process is the joining together of subcultures when a wider common language
is needed. Often two groups independently develop very similar concepts, and describing
the relation between them brings great benefits. Like a Finnish-English dictionary, or a
weights-and-measures conversion table, the relations allow communication and collaboration
even when the commonality of concept has not (yet) led to a commonality of terms.

The Semantic Web, in naming every concept simply by a URI, lets anyone express new
concepts that they invent with minimal effort. Its unifying logical language will enable
these concepts to be progressively linked into a universal Web. This structure will open
up the knowledge and workings of humankind to meaningful analysis by software agents,
providing a new class of tools by which we can live, work and learn together.

MORE TO EXPLORE
Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its
Inventor. Tim Berners-Lee, with Mark Fischetti. Harper San Francisco, 1999.
World Wide Web Consortium (W3C): www.w3.org/
W3C Semantic Web Activity: www.w3.org/2001/sw/
An introduction to ontologies: www.SemanticWeb.org/knowmarkup.html
Simple HTML Ontology Extensions Frequently Asked Questions (SHOE FAQ):
www.cs.umd.edu/projects/plus/SHOE/faq.html
DARPA Agent Markup Language (DAML) home page: www.daml.org/
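The home-automation scenario in this article, in which one rule covers every device that advertises a volume control, can be sketched in a few lines. This is only an illustration of the idea under stated assumptions: the `Device` class, the capability string `"volume-control"` and the `on_phone_answered` rule are hypothetical, and plain Python sets stand in for the RDF-based capability descriptions (such as CC/PP) that the article has in mind.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A device that advertises its capabilities (hypothetical model)."""
    name: str
    capabilities: set = field(default_factory=set)
    volume: int = 50


def devices_with(devices, capability):
    """Select devices by what they advertise, not by what kind of device they are."""
    return [d for d in devices if capability in d.capabilities]


def on_phone_answered(devices, quiet_level=10):
    """One rule, written once: turn down everything that has a volume control."""
    lowered = []
    for device in devices_with(devices, "volume-control"):
        device.volume = min(device.volume, quiet_level)
        lowered.append(device.name)
    return lowered


if __name__ == "__main__":
    home = [
        Device("stereo", {"volume-control"}, volume=70),
        Device("TV", {"volume-control", "screen"}, volume=40),
        Device("microwave", {"heating"}),
        # A device that arrived today is covered without any reprogramming.
        Device("laptop media player", {"volume-control"}, volume=5),
    ]
    print(on_phone_answered(home))
```

Because the rule matches on an advertised capability rather than on a list of known appliances, a newly arrived device is handled without further configuration, which is the benefit the article describes.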
movie is assembled on the fly from fragments on several hundred computers belonging to
people like her.

Mary's computers are moonlighting for other people. But

ready produced a number of Internet-scale, or peer-to-peer, applications that attempt to
tap the vast array of underutilized machines available through the Internet [see box on
page 42].

computers, network-attached storage devices and maybe handheld devices.

The Internet resource pool differs from private resource pools in several important ways.
More than 150 million hosts are connected to the Internet, and

ifornia, Berkeley. Anderson was on the faculty of the computer science department from
1985 to 1991. He is now director of the SETI@home project and chief science officer of
United Devices, a provider of distributed computing software that is allied with the
distributed.net project. Kubiatowicz is an assistant professor of computer science at
Berkeley and is chief architect of OceanStore, a distributed storage system under
development with many of the properties required for an ISOS.

[Figure: how a movie service runs on an ISOS. Labels: Movie Service, Movie Service Agents,
Movie File Fragments. Only the odd-numbered steps survive. Credit: XPLANE]
1. Acme Movie Service wants to distribute movies to viewers. Acme requests hosts from the
ISOS server, the system's traffic cop. The ISOS server sends a host list, for which Acme
pays.
3. Acme sends its movie service agent program to the hosts. Acme splits its movie into
fragments and also sends them to hosts.
5. Movie service instructs its agents to send Mary the movie. ISOS pays the hosts for
their work.
7. The movie is assembled, and Mary is free to enjoy her Acme movie.
a computer breaks down, its owner eventually fixes or replaces it.

Extraordinary parallel data transmission is possible with the Internet resource pool.
Consider Mary's movie, being uploaded in fragments from perhaps 200 hosts. Each host may
be a PC connected to the Internet by an antiquated 56k modem, far too slow to show a
high-quality video, but combined they could deliver 10 megabits a second, better than a
cable modem. Data stored in a distributed system are available from any location (with
appropriate security safeguards) and can survive disasters that knock out sections of the
resource pool. Great security is also possible, with systems that could not be compromised
without breaking into, say, 10,000 computers.

In this way, the Internet-resource paradigm can increase the bounds of what is possible
(such as higher speeds or larger data sets) for some applications, whereas for others it
can lower the cost. For certain applications it may do neither: it's a paradigm, not a
panacea. And designing an ISOS also presents a number of obstacles.

Some characteristics of the resource pool create difficulties that an ISOS must deal
with. The resource pool is heterogeneous: Hosts have different processor types and
operating systems. They have varying amounts of memory and disk space and a wide range of
Internet connection speeds. Some hosts are behind firewalls or other similar layers of
software that prohibit or hinder incoming connections. Many hosts in the pool are
available only sporadically; desktop PCs are turned off at night, and laptops and systems
using modems are frequently not connected. Hosts disappear unpredictably, sometimes
permanently, and new hosts appear.

The ISOS must also take care not to antagonize the owners of hosts. It must have a
minimal impact on the non-ISOS uses of the hosts, and it must respect limitations that
owners may impose, such as allowing a host to be used only at night or only for specific
types of applications. Yet the ISOS cannot trust every host to play by the rules in return
for its own good behavior. Owners can inspect and modify the activities of their hosts.
Curious and malicious users may attempt to disrupt, cheat or spoof the system. All these
problems have a major influence on the design of an ISOS.

Who Gets What?

AN INTERNET-SCALE operating system must address two fundamental issues: how to allocate
resources and how to compensate resource suppliers. A model based on economic principles
in which suppliers lease resources to consumers can deal with both issues at once. In the
1980s researchers at Xerox PARC proposed and analyzed economic approaches to apportioning
computer resources. More recently, Mojo Nation developed a file-sharing system in which
users are paid in a virtual currency ("mojo") for use of their resources and they in turn
must pay mojo to use the system. Such economic models encourage owners to allow their
resources to be used by other organizations, and theory shows that they lead to optimal
allocation of resources.

Even with 150 million hosts at its disposal, the ISOS will be dealing in scarce

[Figure: applications served by a world computer network: financial modeling; SETI
(analysis of celestial radio signals); 3-D rendering and animation; matching gene
sequences; streaming media pay-per-view service; file backup and archiving for hundreds of
years. Credit: XPLANE]
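The parallel-transfer figure quoted for Mary's movie (about 200 hosts, each on a 56k modem, together delivering roughly 10 megabits a second) can be checked with simple arithmetic. The sketch below assumes each host contributes its full nominal rate unless an `efficiency` factor says otherwise; that factor and both function names are assumptions added for illustration, since real links rarely sustain their nominal speed.

```python
from math import ceil


def aggregate_mbps(hosts, kbps_per_host, efficiency=1.0):
    """Combined throughput in megabits per second when hosts upload in parallel."""
    return hosts * kbps_per_host * efficiency / 1000.0


def hosts_needed(target_mbps, kbps_per_host, efficiency=1.0):
    """Smallest number of hosts whose combined rate reaches the target."""
    return ceil(target_mbps * 1000.0 / (kbps_per_host * efficiency))


if __name__ == "__main__":
    # 200 hosts at a nominal 56 kilobits per second each.
    print(aggregate_mbps(200, 56))
    # Hosts required to reach 10 megabits per second at the nominal rate.
    print(hosts_needed(10, 56))
```

At the nominal rate, 200 modems add up to a little over 11 megabits a second, so the article's round figure of 10 leaves some margin for links that fall short of 56k.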
A computer operating system that provides only core functions is called a microkernel.
Higher-level functions are built on top of it as user programs, allowing them to be
debugged and replaced more easily. This approach was pioneered in academic research
systems and has influenced some commercial systems, such as Windows NT. Most well-known
operating systems, however, are not microkernels.

The core facilities of an ISOS include resource allocation (long-term assignment of hosts'
processing power and storage), scheduling (putting jobs into queues, both across the
system and within individual hosts), accounting of resource usage, and the basic
mechanisms for distributing and executing application programs. The ISOS should not
duplicate features of local operating systems running on hosts.

The system should be coordinated by servers operated by the ISOS provider, which could be
a government-funded organization or a consortium of companies that are major resource
sellers and buyers. (One can imagine competing ISOS providers, but we will keep things
simple and assume a unique provider.) Centralization runs against the egalitarian approach
popular in some peer-to-peer systems, but central servers are needed to ensure privacy of
sensitive data, such as accounting data and other information about the resource hosts.
Centralization might seem to require a control system that will become excessively large
and unwieldy as the number of ISOS-connected hosts increases, and it appears to introduce
a bottleneck that will choke the system anytime it is unavailable. These fears are
unfounded: a reasonable number of servers can easily store information about every
Internet-connected host and communicate with them regularly. Napster, for example, handled
almost 60 million clients using a central server. Redundancy can be built into the server
complex, and most ISOS online services can continue operating even with the servers
temporarily unavailable.

The ISOS server complex would maintain databases of resource descriptions, usage policies
and task descriptions. The resource descriptions include, for example, the host's
operating system, processor type and speed, total and free disk space, memory space,
performance statistics of its network connections, and statistical descriptions of when it
is powered on and connected to the network. Usage policies spell out the rules an owner
has dictated for using her resources. Task descriptions include the resources assigned to
an online service and the queued jobs of a data-processing task.

To make their computers available to the network, resource sellers contact the server
complex (for instance, through a Web site) to download and install an ISOS agent program,
to link resources to their ISOS account, and so on. The ISOS agent manages the host's
resource usage. Periodically it obtains from the ISOS server complex a list of tasks to
perform.

Resource buyers send the servers task requests and application agent programs (to be run
on hosts). An online service provider can ask the ISOS for a set of hosts on which to run,
specifying its resource requirements (for example, a distributed backup service could use
sporadically connected resource hosts, such as Mary's laptop, which would cost less than
constantly connected hosts). The ISOS supplies the service with addresses and descriptions
of the granted hosts and allows the application agent program to communicate directly
between hosts on which it is running. The service can re-
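As a rough illustration of how a server complex might match a buyer's request against its databases of resource descriptions and usage policies, consider the sketch below. Every name in it (`Host`, `Request`, `grant_hosts`, the fields and the prices) is a hypothetical stand-in; the article describes the kinds of data kept, not a concrete schema or allocation rule. The sketch assumes the cheapest eligible hosts are granted first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A resource description plus the owner's usage policy (hypothetical fields)."""
    address: str
    free_disk_gb: float
    uptime_fraction: float      # share of time powered on and connected
    allowed_apps: frozenset     # usage policy: application types the owner permits
    price: float                # assumed price per unit of use


@dataclass(frozen=True)
class Request:
    """A buyer's resource requirements (hypothetical fields)."""
    app_type: str
    min_disk_gb: float
    min_uptime: float
    hosts_wanted: int


def grant_hosts(hosts, request):
    """Return the cheapest hosts that meet the requirements and the owners' policies."""
    eligible = [
        h for h in hosts
        if request.app_type in h.allowed_apps
        and h.free_disk_gb >= request.min_disk_gb
        and h.uptime_fraction >= request.min_uptime
    ]
    return sorted(eligible, key=lambda h: h.price)[: request.hosts_wanted]


if __name__ == "__main__":
    pool = [
        Host("marys-laptop", 20, 0.3, frozenset({"backup"}), price=1.0),
        Host("office-pc", 80, 0.6, frozenset({"backup", "render"}), price=2.0),
        Host("server-1", 500, 0.99, frozenset({"backup", "render"}), price=5.0),
        Host("night-only-pc", 40, 0.4, frozenset({"render"}), price=0.5),
    ]
    # A backup service tolerates sporadic hosts, so it can ask for low uptime.
    backup = Request("backup", min_disk_gb=10, min_uptime=0.2, hosts_wanted=2)
    print([h.address for h in grant_hosts(pool, backup)])
```

A service that tolerates sporadic connectivity, like the backup example in the text, ends up with the cheaper, intermittently available machines, while a request demanding high uptime would be steered to the costlier always-on hosts.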