
What are Primary Sources?

Primary sources provide first-hand testimony or direct evidence concerning a topic under
investigation. They are created by witnesses or recorders who experienced the events or
conditions being documented. Often these sources are created at the time when the events
or conditions are occurring, but primary sources can also include autobiographies,
memoirs, and oral histories recorded later. Primary sources are characterized by their
content, regardless of whether they are available in original format, in
microfilm/microfiche, in digital format, or in published format.
Printed or Published Texts
Books & Pamphlets
Determining what is a primary source can be tricky, and in no case is this more apparent
than with books and pamphlets. From one vantage point, books are the quintessential
secondary source: scholars use primary source materials such as letters and diaries to
write books, which are in turn secondary sources. However, books can also be a rich
source of primary source material. In some instances, as in the case of published
memoirs, autobiographies, and published documents, it is easy to determine when a book
functions as a primary source.
Serials (newspapers, periodicals, magazines)
A serial is a publication, such as a magazine, newspaper, or scholarly journal, that is
published in ongoing installments. Like books, serials can function both as primary
sources and secondary sources depending on how one approaches them. Age is an
important factor in determining whether a serial publication functions chiefly as a primary
or a secondary source.
Government Documents
A government’s documents are direct evidence of its activities, functions, and policies.
For any research that relates to the workings of government, government documents are
indispensable primary sources.

A wide range of primary sources are found in government documents: the hearings and
debates of legislative bodies; the official text of laws, regulations and treaties; records of
government expenditures and finances; statistical compilations such as census data;
investigative reports; scientific data; and many other sources that touch virtually all
aspects of society and human endeavor. This information comes in a similarly wide
variety of formats, including books, periodicals, maps, CD-ROMs, microfiche, and online

What makes all these sources “government documents”? What all these sources have in
common is that they are published or otherwise made available to the general public by a
government, at government expense or as required by law. They
are a government's official “voice.” Government documents are usually housed in
separate sections of libraries, and have their own specialized arrangement and finding
Note that government document collections typically do not include primary legal
sources such as court decisions and law codes, which are often published by for-profit
publishers and are found either in the main library collection or in separate law libraries.

For decades the U.S. government has been the largest publisher in the world, but
government documents are also produced by regional, state, and local governments, and
by international bodies such as the United Nations and the European Union.
Manuscript and archival material
Manuscript and archival materials are unique resources that can be found in only one
library or institution (though digital copies or copies on microfilm or microfiche may be
available elsewhere). They are valuable primary source material for researchers in many
fields of study, including history, political science, sociology, literature, journalism,
cultural anthropology, health sciences, law, and education. Manuscripts and archival
materials are distinct from other library materials in the ways they are described,
accessed, handled and evaluated.

Manuscripts and archives are unpublished primary sources. The term archives, when it
refers to documents, as opposed to a place where documents are held, refers to the
records made or received and maintained by an institution or organization in pursuance of
its legal obligations or in the transaction of its business. The term manuscripts, which
originally referred to handwritten items, refers now to a body of papers of an individual
or a family. Both terms can encompass a broad array of documents and records of
numerous formats and types. Archival records or manuscripts may include business and
personal correspondence, diaries and journals, legal and financial documents,
photographs, maps, architectural drawings, objects, oral histories, computer tape, video
and audio cassettes.
Maps are primary sources because they are created in particular cultural contexts.
Mapmakers may have hidden agendas or be influenced by political or social factors.
Maps may reveal misperceptions or deliberate misrepresentations.
Realia / Artifacts
Once functional objects used by people, realia and artifacts convey important information
about the lives and histories of peoples. Realia and artifacts are three-dimensional and,
unlike two-dimensional objects such as books and manuscripts, can be either man-made
or naturally occurring. While all collected realia and artifacts are deemed to have
documentary value, some are valued for their intrinsic worth, others for their artistic
merit, and others for their historical significance or scientific value. Realia and artifacts
commonly used for research include:

> War memorabilia such as canteens, mess kits, and uniforms
> Emblems and badges
> Cards and board games
> Jewelry, clothing, and textiles
> Leather goods
> Needlework
> Hair, wool, and silk
Tablets contain commemorative inscriptions, scholarly treatises, letters and business
documents, administrative accounts, and literature in poetry and prose, epic narratives,
recipes, magic spells, and many other documents created in the ancient world.
Visual Materials

The term “visual material” refers to any primary source in which images, instead of or in
conjunction with words and/or sounds, are used to convey meaning. Some common and
useful types of visual materials are as follows:

> Original art, including but not limited to paintings, drawings, sculpture, architectural
drawings and plans, and monoprints.
> Prints, which are works produced in multiple but limited numbers such as woodcuts,
engravings, etchings, and lithographs
> Graphic arts, including materials such as posters, trade cards, and computer generated
> Photographs
> Film and video

Any of these materials can provide valuable information to a researcher. Factual
information can often be extracted from visual materials; however, the best information
imparted by these materials is often of a subjective nature, providing insight into how
people see themselves and the world in which they exist.
Primary sources reveal information about the production and performance of music, aural
traditions, histories of musical composition, notation, and technique, information about
music theory and about individuals’ and cultures’ technological advancement, economy,
education, cognition, and more. The types of resources used in research include:

> Manuscript music scores
> Musical instruments
> Sheet music
> Historical and contemporary sound recordings on LP and disc
> Books, periodicals, photographs, and archives related to music and musicians

Sound Recordings
Sound recordings include not only music but also the spoken word - poetry, plays,
speeches, etc.
Oral History
Oral history interviews and video memoirs provide important perspectives for historians.
Since tape recorders became widely available in the mid-twentieth century, oral history projects of many kinds
have proliferated, ranging from the “man-on-the-street” type of interview to the more
formal Presidential archives. Oral history projects usually are centered on a theme, such
as Yale’s Oral History of American Music project, which is dedicated to the collection and
preservation of oral and video memoirs in the voices of American musicians.
Dissertations are book-length studies based on original research and written in partial
fulfillment of requirements for the doctoral degree. Although usually secondary sources,
dissertations can themselves be primary sources or can be extremely helpful in
identifying and locating primary sources.

Dissertations that can be primary sources might be edited versions of texts or could be
used to analyze the influence of a professor on a generation of graduate students and, by
extension, on the teaching and writing in a discipline over a period of time. Because a
dissertation is based on original research, its bibliography will contain references to
primary sources used by the author and can often lead to manuscripts, diaries,
newspapers and other primary material of interest.

If we tracked nothing but urban legends on this site, the task of assigning truth values
to all the entries would be much easier. As Jan Harold Brunvand wrote in The Vanishing
Hitchhiker:

[U]rban legends must be considered false, at least in the sense that the same rather
bizarre events could not actually have happened in so many localities to so many aunts,
cousins, neighbors, in-laws, and classmates of the hundreds and thousands of individual
tellers of the tales.

However, since this site also chronicles many items that do not fit the traditional
definition of "urban legend" (e.g., trivia, rumors, hoaxes, common misconceptions, odd
facts), our single rating system must be able to accommodate disparate types of entries.
As a result, the colored bullets we use to classify items have slightly different meanings
depending upon the nature of the entries being rated. Below are expansive definitions of
the colors' meanings:

White bullets are the ones most commonly associated with "pure" urban legends —
entries that describe plausible events so general that they could have happened to
someone, somewhere, at some time, and are therefore essentially unprovable. Some
legends that describe events known to have occurred in real life are also put into this
category if there is no evidence that the events occurred before the origination of the
legend.

Green bullets are used for two similar but distinct types of entries: claims that are
demonstrably true, and urban legends that are based on real events. For the former,
"demonstrably true" means that the claim has been established by a preponderance of
(reliable) evidence; for the latter, a green bullet indicates that the legend described is
based on an actual occurrence. (The word "based" is key here: many legends describe
events that have taken place in real life, but those events did not occur until the related
legend was already in circulation.)

Yellow bullets generally describe disputed claims — factual items which the available
evidence is too contradictory or insufficient to establish as either true or false. This
category also includes claims that have a kernel of truth to them but are not literally true
as stated. (For example, an entry that read "Soupy Sales was fired for asking children to
send him 'little green pieces of paper' on his TV show" would fit this classification
because even though Soupy Sales did make such a request, he was not fired for doing so.)
Some legends also fall into this classification when it cannot be determined whether the
legends preceded similar real life events, or vice-versa.

Red bullets mark claims which cannot be established as true by a preponderance of
(reliable) evidence. Some urban legends are also placed into this category because they
describe events too implausible to have actually occurred, or too fantastic to have escaped
mention in the media of the day.

Multi-colored bullets identify claims which are a mixture of truth and falsehood.

Uniform Behavior
Claim: A large number of "missing" UPS uniforms have been acquired by terrorists.


Example: [Collected via e-mail, 2003]

A large number of UPS uniforms have gone missing.

Please inform all properties to check ID's and be alert to "replacement" delivery

Please forward accordingly

SECURITY ALERT: $32,000 worth of UPS uniforms have been purchased over the last
30 days by person(s) unknown. Law enforcement is working on the case however no
suspect(s) have been indentified. Subjects may try to gain access by wearing one of these
uniforms. If anyone has suspicions about a UPS delivery (i.e., no truck but driver, no
UPS identification, etc., contact UPS to verify employment).

If you see or have seen a UPS delivery from an unknown driver please ask for proper ID
and be alert to any suspicious packages or deliveries. Please notify building security or
appropriate law enforcement.

There has been a huge purchase, $32,000.00 worth, of United Parcel Service (UPS)
uniforms on eBay over the last (30) thirty days. This could represent a serious threat as
bogus drivers can drop off anything to anyone with deadly consequences.

If you have any questions when a UPS driver appears at your door, they should be able to
furnish valid I.D. Additionally, if someone in a UPS uniform comes to make a drop off or
pick up, make absolutely sure they are driving a UPS truck. UPS does not make
deliveries or pick ups in anything except company vehicles. If you have a problem,
immediately call local law enforcement.

On 3/31/03 an alert was issued to UPS drivers. Drivers were asked to keep track of their
uniforms and to dispose of same according to UPS guidelines.

Some of you may have already heard the above information, but I will keep sending out
new alerts as I get them.

Origins: The potential for further terrorist attacks in the U.S. loomed large in the minds
of many in the wake of the September 11 terrorist attacks, with the perception of
impending danger at times working to color how some people saw and reacted to what
might otherwise have been regarded as fairly unremarkable occurrences.

In February 2003 a number of security alerts regarding UPS uniforms were distributed by
both private and law enforcement sources. They seemed to come from every direction,
with many of them stating their information originated with a warning issued by United
Parcel Service (UPS) regarding missing delivery personnel uniforms. Those who
encountered these warnings immediately linked them to the threat of
terrorism, at once grasping the potential for harm if al Qaeda members took to
impersonating office couriers. The warning about missing uniforms echoes another
terrorist-related rumor, one asserting that in the days immediately following the
September 11 attacks, thirty Ryder, Verizon, and U-Haul trucks had gone missing,
presumably swiped by terrorists intent upon using them as camouflage for further

The rumor that a large number of uniforms were "missing" (implying they had been
stolen or hijacked and were now in the hands of persons unknown for use in nefarious
schemes, presumably terrorism-related activity) seems to have sprung from speculation at
the beginning of 2003 about the intentions of a small cadre of buyers who bid what
seemed like outrageously high sums for UPS uniforms on the on-line auction site eBay.
(Despite eBay's later claims to the contrary, UPS uniforms were being offered and sold
on their site as late as January 2003.) Because our new terrorist-aware mode of thinking
affects how we perceive events, many people skipped over other potentially less
terrifying explanations (e.g., uniform collectors adding to their stock, former UPS
employees acquiring old uniforms out of nostalgia, run-of-the-mill thieves needing cover
for their endeavors, uniform fetishists looking to spice up their sex lives with some 'home
delivery') and went straight to the assumption that UPS uniforms were being snapped up
by terrorists. That several different people (or at least someone with several different eBay
IDs) were simultaneously bidding high prices for UPS uniforms did work against the
more mundane explanations, but terrorists' spending thousands of dollars on a public
auction site to buy up easily-duplicated brown uniforms wasn't much more plausible.
(Generally only someone with a strong emotional attachment to an inherently
non-valuable common object will insist upon owning an original and be willing to pay an
exorbitant fee to acquire it; others are content with buying or making replicas.)

Many explanations for this rumor have been bruited about since its inception. Some of
the people who sold UPS uniforms (often acquired by purchasing them through thrift
shops) on eBay before the auction site clamped down on the practice early in 2003 said
they were contacted by "cyber crime" units who only wanted to verify that the uniforms
were not stolen and who told them that UPS was buying up their uniforms to keep them
off the street. Other people claim that a private firm hired by UPS has been buying up the
uniforms on their behalf, or even that due to national security concerns, the FBI has
arranged to be the top bidder for any UPS uniforms sold on-line. If there's anything to
these stories, nobody connected with them has been forthcoming about it yet. The
response we finally received from UPS via e-mail disclaimed any notion of "missing"
uniforms but reinforced the notion that UPS and law enforcement agencies are concerned
about recent sales of used UPS uniforms:

A number of security alerts regarding UPS uniforms recently have been distributed by
both private and law enforcement sources. There are two primary versions of these alerts:

1) Misleading reports of a missing shipment of UPS uniforms.

2) Alerts regarding a large number of uniforms being purchased by an individual.

Reports that a shipment of UPS uniforms is missing are simply not true. There is no
missing shipment of uniforms.

As for alerts regarding uniforms being purchased by an individual, this matter has been
investigated by law enforcement with UPS' involvement and cooperation and resolved to
the satisfaction of all parties.

UPS does not condone the sale or unauthorized use of its uniforms. UPS investigates
reports of such unauthorized use but due to security concerns, we are not at the liberty to
discuss such matters in any further detail.

As the Washington Post reported, law enforcement agencies, eBay, and UPS were all
eager to deny any claims of missing or stolen uniforms:

The FBI has debunked several similar UPS stories since the Sept. 11, 2001, terrorist

UPS spokeswoman Susan Rosenberg in Atlanta says the e-mail has been "thoroughly
investigated" by the FBI and local law enforcement. "It is the urban legend of missing
uniforms," she says.

EBay spokesman Kevin Pursglove also says the UPS story "comes up empty."

Our best estimate of how this story played out is that after UPS were alerted to online
sales of their discarded uniforms, they realized the potential public relations disaster that
would follow any unfortunate incident involving the use of a UPS uniform (terrorist-
related or not) and decided to work behind the scenes to convince on-line auction sites to
drop such listings, perhaps even quietly spending money themselves to buy up some of
the available uniforms. After all, you can't remain one of the world's top package delivery
services if people are afraid to open the door for your deliverymen.

Last updated: 31 March 2011


Oldenburg, Don. "UPS Rumors Are Uniformly Wrong."
The Washington Post. 8 April 2003 (p. C10).

Varian, Bill. "Delivery Togs Threat Dressed Down As Myth."
St. Petersburg Times. 15 July 2003.

The [Louisville] Courier-Journal. "Person in Brown Works for UPS."
2 March 2003.

Triangle of Life
Claim: Rescue expert's 'Triangle of Life' article gives good advice about earthquake


Example: [Collected via e-mail, December 2009]

Critical Earthquake Safety Information

Interesting perspective.
Please read this and pass it on — it could save your life!!


Edited by Larry Linn for MAA Safety Committee brief on 4/13/04.

My name is Doug Copp. I am the Rescue Chief and Disaster Manager of the American
Rescue Team International (ARTI), the world's most experienced rescue team. The
information in this article will save lives in an earthquake.

I have crawled inside 875 collapsed buildings, worked with rescue teams from 60
countries, founded rescue teams in several countries, and I am a member of many rescue
teams from many countries. I was the United Nations expert in Disaster Mitigation
(UNX051 - UNIENET) for two years. I have worked at every major disaster in the world
since 1985, except for simultaneous disasters.

In 1996 we made a film which proved my survival methodology to be correct. The
Turkish Federal Government, City of Istanbul, University of Istanbul, Case Productions
and ARTI cooperated to film this practical, scientific test. We collapsed a school and a
home with 20 mannequins inside. Ten mannequins did "duck and cover," and ten
mannequins I used in my "triangle of life" survival method. After the simulated
earthquake collapse we crawled through the rubble and entered the building to film and
document the results. The film, in which I practiced my survival techniques under
directly observable, scientific conditions, relevant to building collapse, showed there
would have been zero percent survival for those doing duck and cover. There would
likely have been 100 percent survivability for people using my method of the "triangle of
life." This film has been seen by millions of viewers on television in Turkey and the rest
of Europe, and it was seen in the USA, Canada and Latin America on the TV program
Real TV.

The first building I ever crawled inside of was a school in Mexico City during the 1985
earthquake. Every child was under their desk. Every child was crushed to the thickness of
their bones. They could have survived by lying down next to their desks in the aisles. It
was obscene, unnecessary and I wondered why the children were not in the aisles. I didn't
at the time know that the children were told to hide under something.

Simply stated, when buildings collapse, the weight of the ceilings falling upon the objects
or furniture inside crushes these objects, leaving a space or void next to them. This space
is what I call the "triangle of life". The larger the object, the stronger, the less it will
compact. The less the object compacts, the larger the void, the greater the probability that
the person who is using this void for safety will not be injured. The next time you watch
collapsed buildings, on television, count the "triangles" you see formed. They are
everywhere. It is the most common shape, you will see, in a collapsed building. They are
everywhere. I trained the Fire Department of Trujillo (population 750,000) in how to
survive, take care of their families, and to rescue others in earthquakes.
The chief of rescue in the Trujillo Fire Department is a professor at Trujillo University.
He accompanied me everywhere. He gave personal testimony: "My name is Roberto
Rosales. I am Chief of Rescue in Trujillo. When I was 11 years old, I was trapped inside
of a collapsed building. My entrapment occurred during the earthquake of 1972 that
killed 70,000 people. I survived in the "triangle of life" that existed next to my brother's
motorcycle. My friends who got under the bed and under desks were crushed to death [he
gives more details, names, addresses etc.]...I am the living example of the "triangle of
life". My dead friends are the example of "duck and cover".


1) Everyone who simply "ducks and covers" WHEN BUILDINGS COLLAPSE is
crushed to death — Every time, without exception. People who get under objects, like
desks or cars, are always crushed.

2) Cats, dogs and babies all naturally often curl up in the fetal position. You should too in
an earthquake. It is a natural safety/survival instinct. You can survive in a smaller void.
Get next to an object, next to a sofa, next to a large bulky object that will compress
slightly but leave a void next to it.

3) Wooden buildings are the safest type of construction to be in during an earthquake.

The reason is simple: the wood is flexible and moves with the force of the earthquake. If
the wooden building does collapse, large survival voids are created. Also, the wooden
building has less concentrated, crushing weight. Brick buildings will break into
individual bricks. Bricks will cause many injuries but less squashed bodies than concrete

4) If you are in bed during the night and an earthquake occurs, simply roll off the bed. A
safe void will exist around the bed. Hotels can achieve a much greater survival rate in
earthquakes, simply by posting a sign on the back of the door of every room, telling
occupants to lie down on the floor, next to the bottom of the bed during an earthquake.

5) If an earthquake happens while you are watching television and you cannot easily
escape by getting out the door or window, then lie down and curl up in the fetal position
next to a sofa, or large chair.

6) Everybody who gets under a doorway when buildings collapse is killed. How? If you
stand under a doorway and the doorjamb falls forward or backward you will be crushed
by the ceiling above. If the door jam falls sideways you will be cut in half by the
doorway. In either case, you will be killed!

7) Never go to the stairs. The stairs have a different "moment of frequency" (they swing
separately from the main part of the building). The stairs and remainder of the building
continuously bump into each other until structural failure of the stairs takes place. The
people who get on stairs before they fail are chopped up by the stair treads. They are
horribly mutilated. Even if the building doesn't collapse, stay away from the stairs. The
stairs are a likely part of the building to be damaged. Even if the stairs are not collapsed
by the earthquake, they may collapse later when overloaded by screaming, fleeing
people. They should always be checked for safety, even when the rest of the building is
not damaged.

8) Get Near the Outer Walls Of Buildings Or Outside Of Them If Possible - It is much
better to be near the outside of the building rather than the interior. The farther inside you
are from the outside perimeter of the building the greater the probability that your escape
route will be blocked;

9) People inside of their vehicles are crushed when the road above falls in an earthquake
and crushes their vehicles; which is exactly what happened with the slabs between the
decks of the Nimitz Freeway. The victims of the San Francisco earthquake all stayed
inside of their vehicles. They were all killed. They could have easily survived by getting
out and sitting or lying next to their vehicles, says the author. Everyone killed would have
survived if they had been able to get out of their cars and sit or lie next to them. All the
crushed cars had voids 3 feet high next to them, except for the cars that had columns fall
directly across them.

10) I discovered, while crawling inside of collapsed newspaper offices and other offices
with a lot of paper, that paper does not compact. Large voids are found surrounding
stacks of paper.

Origins: We can't say that every single point mentioned in the above article about
earthquake safety by controversial "rescue expert" Doug Copp is wrong or bad advice,
but there are some pretty substantial reasons why readers might want to take the article
(particularly its advice that everyone who uses the "duck and cover" technique in an
earthquake ends up crushed to death) with some very large grains of salt.

1) Disaster preparedness experts with the American Red Cross have disputed that
findings based on earthquake experiences in other countries (e.g., Turkey) are applicable
to earthquake situations that might occur in the United States, where building codes are
substantially different:

We at the American Red Cross have studied the research on the topic of earthquake
safety for many years. We have benefited from extensive research done by the California
Office of Emergency Services, California Seismic Safety Commission, professional and
academic research organizations, and emergency management agencies, who have also
studied the recommendation to "drop, cover, and hold on!" during the shaking of an
earthquake. Personally, I have also benefited from those who preceded me in doing
earthquake education in California since the Field Act was passed in 1933.

What the claims made by Mr. Copp of ARTI, Inc., does not seem to distinguish is that the
recommendation to "drop, cover, and hold on!" is a U.S.-based recommendation based on
U.S. Building Codes and construction standards. Much research in the United States has
confirmed that "Drop, Cover, and Hold On!" has saved lives in the United States.
Engineering researchers have demonstrated that very few buildings collapse or "pancake"
in the U.S. as they might do in other countries. Using a web site to show one picture of
one U.S. building that had a partial collapse after a major quake in an area with thousands
of buildings that did not collapse during the same quake is inappropriate and misleading.

2) The validity of the research methodology and conclusions expressed in the article
quoted above has been criticized by other disaster preparedness experts:

Copp likes to base his evidence on the Turkish "experiment" that he was involved with.
Unfortunately, unbeknownst to all involved, this was not an experiment at all, but rather a
voluntary organization's search and rescue exercise. My colleagues in Turkey corroborate
that a building scheduled for demolition was used as a search and rescue training
opportunity. They did decide to put mannequins in different spots to see what would
happen. And indeed they reported finding mannequins unharmed next to large and heavy

What is the problem with this? Simply this: To collapse the building, they rammed the
columns, causing the building to pancake. They did NOT simulate an earthquake.
Earthquakes come in waves. They cause lateral shaking. They cause a variety of different
kinds of damage. Since this experiment didn't produce anything resembling shaking it
really doesn't tell us anything at all about what would happen during an earthquake.

3) Doug Copp's claim that he performed rescue work at the World Trade Center (for
which he was paid $650,000 in compensation for injuries he supposedly sustained there)
has been challenged in a series of articles published in the Albuquerque Journal
describing him as a self-serving opportunist rather than a true rescue expert:

Self-proclaimed rescue guru Doug Copp's mission to ground zero was considered so
important that he had clearance to be flown to New York even though all civilian air
traffic in the United States had been grounded. Once there, he says he assumed a pivotal
role and sustained devastating injuries while wading through the "toxic soup" in search of
survivors and victims, and was awarded nearly $650,000 for his injuries. But there is
little evidence Copp performed real rescue work, and it is doubtful that he deserves

Doug Copp was awarded $649,000, tax free, from the fund set up to compensate victims
of 9/11. He says it's not enough. But it's doubtful he deserves anything. A Journal
investigation found little evidence that Copp did real rescue work in New York. His
forays into the rubble were to shoot video, some of which he tried to sell. His claim of
seeking medical care within the time frame appears false. All typical of Copp's years as a
self-proclaimed rescue guru.

(Other entries in the Albuquerque Journal's series of articles relayed complaints from
numerous people who dealt with Mr. Copp and noted that he was under investigation by a
U.S. Department of Justice fraud unit.)

We'd recommend sticking with safety information prepared by established earthquake
safety experts, such as the American Red Cross, the Federal Emergency Management
Agency, and Earthquake Country Alliance.
Last updated: 24 April 2010


Linthicum, Leslie. "New Mexican's Claims of Ground Zero Rescue Work Called Into
Albuquerque Journal. 11 July 2004.

Linthicum, Leslie. "'Bombero' Arrives at Ground Zero."

Albuquerque Journal. 12 July 2004.

Linthicum, Leslie. "Striking Out at Ground Zero."

Albuquerque Journal. 13 July 2004.

Linthicum, Leslie. "'Knucklehead' or Hero?"

Albuquerque Journal. 14 July 2004.

Linthicum, Leslie. "Widow Tells of Copp Ordeal."

Albuquerque Journal. 18 July 2004.

Linthicum, Leslie. "Feds Investigate 9/11 Injury Claim."

Albuquerque Journal. 18 July 2004.

Printed from, a service of the Albuquerque Journal


Sunday, July 18, 2004

Feds Investigate 9/11 Injury Claim
By Leslie Linthicum
Journal Staff Writer
A U.S. Department of Justice fraud unit is investigating former New Mexico
resident Doug Copp's claim to the September 11th Victim Compensation Fund.
An investigator for the Fraud Detection Office of the Justice Department's Office of
Inspector General has interviewed several of the people featured in the Journal's
investigation of Copp, "A 9/11 Phony."
The Journal published the four-part series last week.
Copp has defended his work at ground zero and has said the people who dispute his
claim are out to get him.
A Justice Department spokesman would not confirm the investigation, but the
department has told Rep. Tom Udall, D-N.M., that it is investigating.
Udall, who lobbied on Copp's behalf to the victim's fund, asked for the Justice
Department inquiry after Copp's work at ground zero was challenged.
"The Department of Justice's Inspector General has launched an investigation based
on our request coupled with the questions raised by the Journal's series," Udall
spokesman Glen Loveland said Friday. "We are pleased the department has responded to
our request to immediately launch an investigation to determine if Mr. Copp's claim to
the 9/11 Victim Compensation Fund is unfounded and assess whether there is enough
evidence to refer the matter for criminal prosecution."
The Journal's series examined Copp's claims that he played a key role in the World
Trade Center rescue and recovery operation and was seriously injured in the process.
Just hung around

No one who worked with Copp said he did real rescue work in the aftermath of the
terrorist attacks. Instead, they said, he took videotape at the site, tried to get on TV, hung
around a hotel and promoted himself.
In the Journal series:

Everyone who went to New York with Copp disputed his claims about what he did

Doctors questioned Copp's claims that he is seriously and terminally ill;

The doctor Copp says he sought care from within the fund's time requirement said the
encounter never happened;

And Copp's body-finding machine, which he said he invented, turned out to be a
commercially available gas detector.
In January, Copp received $649,000 from the fund, which was set up by Congress to
compensate those injured in the Sept. 11, 2001, terrorist attacks or the families of the
several thousand people who were killed.
A Justice Department spokesman in Washington, D.C., said the department has
investigated about a dozen fund fraud claims.
The fund processed about 2,900 death claims and 4,400 injury claims before it shut
down last month. Some of those claims were withdrawn or denied. The fund paid out
about $6.5 billion.
Several people have been charged with fraud in connection with false claims.
Some recent cases involving the fund, in which fraudulent claims were identified
before payment was made, resulted in sentences of between 12 and 18 months in prison
for mail fraud and making false statements. Mail fraud carries a maximum prison
sentence of five years and a $250,000 fine. The statute also requires the convicted person
to pay back the money taken by fraud.
Norm Cairns, spokesman for U.S. Attorney David Iglesias in Albuquerque, said if
Copp were to be prosecuted for making a false claim to the fund, the criminal
proceedings would likely occur in U.S. District Court in New Mexico.
'Bald-faced liar'

Copp flew to New York aboard the Journal Publishing Company's corporate jet on
Sept. 13. The flight had special clearance from the Federal Aviation Administration when
all civilian aircraft in the U.S. were grounded.
Copp got to the World Trade Center site 2 1/2 days after the terrorist attacks. The
compensation fund covered rescue workers who were injured there in the first four days.
Congress created the fund on Sept. 22, 2001, while Copp was still in New York. He
returned to his home in the East Mountains on Sept. 26 or Sept. 27.
Copp has since left New Mexico and is living in Nova Scotia, Canada, where he was
He did not respond to a phone call from the Journal about the Justice Department
investigation except to say, "Wow, I'm amazed you're calling me" and to refer the call to
a law firm he said was representing him.
John Norman, chief of special operations for the New York Fire Department, said a
special agent from the Justice Department interviewed him about Copp's claims to have
played a crucial role in World Trade Center rescue operations.
Norman said he told the investigator that Copp had no authority to be at the World
Trade Center site and was not in charge of clearing the underground cavities.
Norman also told the Journal Copp was a "bald-faced liar."
The Justice Department has also contacted Ray Lynch, who was the deputy
commissioner in the New York Mayor's Office of Emergency Services and who refused
to give Copp credentials when he arrived in New York.
Lynch told the Journal that Copp's claim to have high-level White House
authorization was fake.
Copp exposed

Andrew Hubert of the Virginia FEMA Task Force also said he has been contacted
by a Justice Department investigator.
Hubert contacted the Journal after reading the series last week and said he was the
first person Copp approached at the Jacob Javits Center in New York, where rescue
teams were getting credentials and assignments. Hubert said he has become familiar with
Copp from 15 years of traveling to foreign disasters.
"It dawned on me right away who he was and I exposed him to the rest of our
command and Ray Lynch," Hubert said.
Elliot Pierce, the Albuquerque doctor who Copp said he sought care from when he
was having coughing fits at the World Trade Center, was also contacted by the Justice
Department.
Pierce was never contacted by anyone with the September 11th Victim
Compensation Fund, even though Copp used an encounter he said he had with Pierce to
prove he met the fund's requirement for seeking medical help.
Pierce said he would have told anyone who called that he never spoke to Copp about
his health and wasn't even in New York until late on Sept. 18— after the time frame had


Semantic Web takes big step forward

By Paul Krill
Created 2008-01-15 09:00AM

The Semantic Web, a concept tossed around for years as a Web extension to make it
easier to find and group information, is getting a critical boost Tuesday from the World
Wide Web Consortium (W3C).

W3C will announce publication of SPARQL (pronounced "sparkle") query
technology, a Semantic Web component enabling people to focus on what they want to
know rather than on the database technology or data format used to store data, W3C said.

The potential of the Semantic Web cannot be underestimated. By scanning the Web on
behalf of users, even Google's ad-based business model could be impacted, an analyst

SPARQL queries express high-level goals and are easier to extend to unanticipated data
sources. The technology overcomes limitations of local searches and single formats,
according to W3C.

"[SPARQL is] the query language and protocol for the Semantic Web," said Lee
Feigenbaum, chair of the RDF (Resource Description Framework) Data Access Working
Group at W3C, which is responsible for SPARQL.

Already available in 14 known implementations, SPARQL is designed to be used at the
scale of the Web to allow queries over distributed data sources independent of format. It
also can be used for mashing up Web 2.0 data.
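
To make the idea of querying data independent of how it is stored more concrete, here is a minimal Python sketch of triple-pattern matching, the basic operation underneath a SPARQL query. It is not a SPARQL engine and does not use any W3C API; the sample triples, the `match` and `query` helpers, and the convention that names starting with `?` are variables are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy in-memory triple store and pattern matcher.
# The data and helper names are hypothetical, not part of any W3C specification.

TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "boston"),
]


def match(pattern, triples, binding=None):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    binding = binding or {}
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must agree with any value it was bound to earlier.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield candidate


def query(patterns, triples):
    """Join several patterns, loosely like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


# "Who works for an organization located in Boston?"
results = query(
    [("?person", "worksFor", "?org"), ("?org", "locatedIn", "boston")],
    TRIPLES,
)
people = sorted(r["?person"] for r in results)
print(people)
```

The caller states what it wants to know (the two patterns) and never mentions how the triples are stored, which is the property the W3C statement emphasizes.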

The Semantic Web, the W3C said, is intended to enable sharing, merging, and reusing of
data globally. "The basic idea of the Semantic Web is take the idea of the Web, which is
effectively a linked set of documents around the world, and apply it to data," Feigenbaum

"One way to think about the Semantic Web is the Web as one big database," said Ian
Jacobs, W3C representative. A database, he said, enables querying and manipulation of
data. More database-like Web sites are emerging, he said.
Comparing the Semantic Web to search sites such as Google, Jacobs said Google allows
for searching through document text, essentially. The Semantic Web, meanwhile, allows
for automation and combining of data, he said.

While the Semantic Web concept has been talked about for several years, Feigenbaum
believes momentum is building. He cited DBpedia, which extracts structured
information from Wikipedia, as an example of a Web site based on the Semantic Web.

With the Semantic Web's ability to hone in on just the information a user needs,
companies based on a Web search advertising model such as Google may have to
reconsider their plans, said analyst Jonas Lamis, executive director of SciVestor.

"They may need to rethink their business model because if I have an agent that acts on
my behalf and finds things that are interesting for me, it's not necessarily going to be
reading Google ads to do that," Lamis said.

The goal of the Semantic Web is to serve as a giant set of databases that can be
integrated, Jacobs said. The Semantic Web has seen a lot of uptake in the health care and
life sciences, he said. The drug discovery and pharmaceutical fields can use it to take
clinical results and learn from data, according to Jacobs.

At pharmaceuticals company Eli Lilly, Semantic Web technology is being used for

"We're using it for our targeted assessment tools, which helps us to find out as much
information or find out lots of information about drug targets of interest," said Susie
Stephens, principal research scientist at Eli Lilly and chair of the W3C Semantic Web
Education and Outreach Working Group. A drug target is a protein in the body that is to
be modified with a particular drug.

"We use Semantic Web technologies to help us link to lots of information about the drug
targets," she said.

The SPARQL specification works with other W3C Semantic Web technologies. These
include: RDF, for representing data; RDF Schema; Web Ontology Language (OWL) for
building vocabularies; and Gleaning Resource Descriptions from Dialects of Languages
(GRDDL) for automatic extraction of Semantic Web data from documents.

SPARQL also can use other W3C standards such as WSDL.

The W3C RDF Data Access Working Group has produced three SPARQL
recommendations being issued Tuesday: the SPARQL Query Language for RDF;
SPARQL Protocol for RDF; and SPARQL Query Results for XML Format.
Participants in the working group include persons from companies such as Agfa-Gevaert,
HP, IBM, Matsushita, and Oracle. W3C released statements of support from numerous
parties, including HP and Oracle.

"SPARQL is a key element for integrated information access across information silos and
across business boundaries. HP customers can benefit from better information utilization
by employing semantic Web technologies," said Jean-Luc Chatelain, CTO of HP
Software Information management, in the company's statement.

"HP's Jena Semantic Web framework has a complete implementation of query language,
protocol, and result set processing," Chatelain said.

"As an active participant in this working group, Oracle believes the standardization of
SPARQL will play an instrumental role in achieving the vision of the Semantic Web,"
said Don Deutsch, Oracle vice president of Standards Strategy and Oracle, in Oracle's



Semantic Web set for critical mass
Paul Krill


June 17, 2009 (InfoWorld)

The Semantic Web, the long-ballyhooed concept to make it easier to find pertinent
information and link varying types of data on the Web, is finally closing in on critical
mass, W3C (World Wide Web Consortium) officials contended Tuesday at a technical

The idea of a Semantic Web has been bandied about for what seems like a decade now,
but it has never gained much dominance on the IT stage. But all that may be changing.
[ Last year, the Semantic Web took a big step forward with the publication of the
SPARQL query technology. ]

"We're very close to [critical mass]," said Ralph Swick, technology and society technical
director for the W3C, in an interview at the Semantic Technology Conference in San
Jose, Calif. The conference features vendors pitching wares to commercialize semantic

"We've been working on it for 10 years, and we're starting to see the commercial pickup,"
Swick said. Critical Semantic Web technologies are under the jurisdiction of the W3C,
including RDF (Resource Description Framework) for representing information on the
Web; OWL (Web Ontology Language), enabling information in documents to be
processed by applications; and SPARQL for querying RDF data.

The Semantic Web can be used for applications, such as building better mashups or
publishing Facebook data, said Ivan Herman, Semantic Web activity lead at W3C. He
cited "mashups on steroids" as one use. Using Semantic Web technologies, information
could be accessed and linked, such as data in medical databases, geographical
information, and government data, rather than just documents, Herman said.

A presenter at the conference, Thomas Tague, a platform strategy official at Thomson
Reuters, advised technology developers to devise tools that use the Semantic Web and
semantics. Thomson Reuters uses semantic concepts in its OpenCalais service for data-

"Go build a tool. Don't start a user experience company," Tague said. Examples of such
tools could be a database to deal with semantic metadata or tools for modeling that data.

"Semantic technologies have been available for a long time," Tague said. "Frankly, it's
time to talk about where the money is. It's just time to have that conversation."

Monetization could come from perhaps adding semantic capabilities to social sites and
improving opportunities for advertising performance. Semantic search is another
possibility for the Semantic Web. Domain-specific search also presents possibilities in
such areas as real estate, music, or pharmaceuticals, Tague said. Semantic gaming also
presents an opportunity, with games being extraordinarily interactive, he said.

"Not everybody needs to be the next Google," Tague stressed. "You can build incredibly
strong, successful businesses by developing solutions that add high value to a small

The Web has progressed from Web 1.0, which was about destinations, to Web 2.0, which
added social activities, to Web 3.0, Tague said. "Web 3.0 is about cleaning up the mess
and harvesting the value you created in Web 2.0 and if we can make that happen, we'll
have a great Web," he said.
Google, with its Rich Snippets technology, and Yahoo, with SearchMonkey, are backing
Semantic Web technologies, leveraging RDF, W3C officials said. "The idea behind both
of these technologies is that the document author can provide additional data to the user
when Yahoo and Google present the search results," said Swick. Microsoft's Bing search
engine, which organizes answers, was highlighted in the showroom at the conference.

Also appearing at the conference was Tom Gruber, CTO, of Siri, which is devising its
"Siri Virtual Personal Assistant," which leverages a mobile phone and natural language
capabilities to help users accomplish tasks via the Internet. The technology features
location awareness and a conversational interface to help users, for example, find a local
business, see area maps, or perhaps to book tickets for a baseball game.

The technology does not leverage artificial intelligence but is "task-oriented," Gruber

W3C officials cited attendance at the event as evidence of the newfound staying power of
the Semantic Web. Conference presenters said attendance was in excess of 1,200 persons
as of Tuesday; last year's event attracted at least 1,000 attendees. Attendees include
technologists, researchers, venture capitalists and others.

Other semantic technologies touted at the conference included:

Cycorp TextPrism, offering personalized information feeds.

Saltlux Storm, a semantic business platform providing an infrastructure application
framework and development methodologies to extract and manage semantic metadata
from enterprise information.
SemanticV StingRay, a dynamic semantic information tool that discovers concepts in
documents and associates them with phrases, documents, blogs and other sources.
TextWise Semantic Signatures, to mine content to uncover deeper meanings in text
and create a "signature" for each document.
TopBraid Suite, a Semantic Web solutions platform from TopQuadrant featuring tools
to discover and visualize relevant data without programming.

The semantic Web gets down to business

It's still early going, but e-commerce and other sites are finding the investment well worth
their time, money and effort.
Elisabeth Horwitt

February 22, 2011 (Computerworld)

Despite the recession, luggage retailer Ebags enjoyed phenomenal 2010 holiday
sales -- some 33% higher than the previous year. (The online retail sector as a whole
reported a 15% gain this past holiday season.) Both Black Friday and Cyber Monday
sales set all-time records, according to Ebags Inc. co-founder Peter Cobb.

Cobb credits much of these gains to his company's deployment of Endeca Technologies
Inc.'s online retail platform, which uses semantic technology to analyze shoppers'
keyword choices and clicks, and then winnows down results from categories to
subcategories and microcategories. The end result? "Guiding the shopper to the perfect
bag very quickly," Cobb says.
Chris Cummings
Ebags CTO Chris Cummings says that the online retailer's use of semantic-based
software has played a major role in increasing sales. "Since it was deployed, our
conversion rates have doubled," he reports.

Endeca's Web site navigation software allows shoppers to use type, brand, price and size
filters to get to relevant choices, Cobb explains. "With over 500 brands and 40,000 bags,
we recognized a few years ago how important semantic search and guidance was to the
shopping experience."
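
The type, brand, price and size filters described above amount to faceted navigation. The following Python sketch shows the general idea under stated assumptions: the `Bag` record, the three-item catalog, and the `filter_bags` and `facet_counts` helpers are hypothetical and are not Endeca's software or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bag:
    """Hypothetical product record with the facets named in the article."""
    name: str
    type: str
    brand: str
    price: float
    size: str


# Made-up catalog for illustration only.
CATALOG = [
    Bag("Weekender", "duffel", "Acme", 79.0, "medium"),
    Bag("Commuter", "backpack", "Acme", 59.0, "small"),
    Bag("Trekker", "backpack", "Summit", 129.0, "large"),
]


def filter_bags(catalog, max_price=None, **facets):
    """Return the bags that satisfy every selected facet and the price cap."""
    selected = []
    for bag in catalog:
        if max_price is not None and bag.price > max_price:
            continue
        if all(getattr(bag, facet) == value for facet, value in facets.items()):
            selected.append(bag)
    return selected


def facet_counts(bags, facet):
    """Count remaining bags per facet value, e.g. to label the next filter."""
    counts = {}
    for bag in bags:
        key = getattr(bag, facet)
        counts[key] = counts.get(key, 0) + 1
    return counts


backpacks = filter_bags(CATALOG, type="backpack")
cheap_backpacks = filter_bags(CATALOG, max_price=100, type="backpack")
print([bag.name for bag in cheap_backpacks])
print(facet_counts(backpacks, "brand"))
```

Each added filter narrows the result set, and the per-facet counts tell the shopper how many items remain in each subcategory.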

By providing highly detailed descriptions of products and their attributes, and linkages
between categories, the semantic technology has also enabled Ebags to attain higher
placement on Web search engine results pages, according to the e-retailer's chief
technology officer, Chris Cummings.

In the late 1990s, Tim Berners-Lee, now widely known as the father of the World Wide
Web, announced his vision of a "semantic Web" that would help people find exactly the
information, answer or product they were looking for. This would happen, he hoped,
without users having to design complex queries or try dozens of different keyword
combinations or sort through thousands of irrelevant URLs.

To help make this happen, the World Wide Web Consortium (W3C), under Berners-Lee's
direction, has developed standards that allow computer platforms and software agents to
identify, access and integrate information from disparate Web sites and domains, as well
as from various information silos within an enterprise.

Using the W3C standard Resource Description Framework (RDF), for example, retailers
and manufacturers could pass detailed product information back and forth, says Jay
Myers, lead Web development engineer at Best Buy. "Right now, a lot of our
vendors provide product information in spreadsheets, which makes it hard to distill."

Best Buy isn't currently taking full advantage of the W3C RDF's capabilities; that's
still a future goal, according to Myers. Indeed, Berners-Lee's dream is still a long way
from reality, although it's getting closer. Many business decision-makers remain skeptical
that the paybacks of adopting semantic technology will make up for the costs and risks.
What's needed is a killer app that will persuade a critical mass of business users to invest
in semantic Web software, says Phil Simon, a consultant and the author of The Next
Wave of Technologies.
About this story

This is the first part of a two-part series about the semantic Web. This installment explains
various semantic Web technologies, including search. It explores their potential uses and
paybacks, illustrated with real business cases, including ones involving the use of
sentiment analysis. It also provides some best practices and tips from the trenches for
anyone planning, or at least considering, a deployment.

Part 2 provides an overview of commercial and open-source products, frameworks and
services that support semantic technology and discusses how they can be used as building
blocks to develop a successful semantic Web infrastructure. It also delves into the
implications of growing industry support for W3C semantic standards.

Slowly but surely, however, semantic Web technology is catching on. Business users in
industry sectors ranging from e-commerce, e-publishing and healthcare to marketing and
financial services are reaping its benefits, even if they don't always understand how it
works and even though hard ROI numbers have been hard to come by. An established
practice like sentiment analysis -- the art of figuring out what customers and others really
think of your company and product -- is getting a boost from semantic technology. (See
related story.)

Moreover, enterprise software vendors like IBM, Oracle, SAS and Microsoft have started
to incorporate semantic search and W3C standards into their platforms, as have Web
search engines like Google, Microsoft's Bing and Yahoo.

Best Buy's Myers can attest to this: Soon after his team began adding semantic
metadata to product pages on store blogs, he reports, they saw an increase of about 30%
in "organic" search traffic -- meaning traffic that results from user searches rather than
clicks on Web ads.
What it's all about

Semantic software uses a variety of techniques to analyze and describe the meaning of
data objects and their inter-relationships. These include a dictionary of generic and, often,
industry-specific definitions of terms, as well as analysis of grammar and context to
resolve language ambiguities such as words with multiple meanings.

For example, the phrase "there are 40 rows in the table" uses rows as a noun, whereas
"she rows five times a week" uses rows as a verb. Likewise, the word stock has one
meaning in the phrase "I used beef bones for my soup stock," another in "the supermarket
keeps a lot of stock on hand" and yet another in "analysts are bearish on the stock."
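
One simple way to picture context-based disambiguation is to compare the words around an ambiguous term with cue words for each of its senses. The Python sketch below does this for "stock", using the three example sentences above. The `SENSES` table and its cue words are assumptions chosen for this illustration; commercial semantic software relies on far larger dictionaries and on grammatical analysis.

```python
# Hypothetical sense inventory: each sense of "stock" lists a few cue words.
SENSES = {
    "stock": {
        "broth": {"soup", "bones", "beef", "simmer"},
        "inventory": {"supermarket", "store", "warehouse", "hand", "shelves"},
        "share": {"analysts", "bearish", "bullish", "market", "investors"},
    }
}


def disambiguate(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = {token.strip('.,"').lower() for token in sentence.split()}
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


examples = [
    "I used beef bones for my soup stock",
    "the supermarket keeps a lot of stock on hand",
    "analysts are bearish on the stock",
]
labels = [disambiguate("stock", sentence) for sentence in examples]
print(labels)
```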
Advice for going semantic

• Remember that collaboration between subject matter experts and IT staff is crucial
when developing a semantic ontology.
• Make sure you have a specific business mission before you build an ontology;
otherwise it will wind up being a useless exercise.
• Resist the urge to jump in with both feet right away; it's better to go slowly, implement
projects that solve real problems and win converts along the way.

Resolving language ambiguities ensures that a shopper who does a search using the
phrase "used red cars" will also get results from Web sites that use slightly different terms
with similar meanings, such as "pre-owned red automobiles," for example.
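
A minimal sketch of how such matching might work is to map each term to a canonical form before comparing the query with a listing. The `CANONICAL` table, `normalize` and `matches` below are hypothetical names invented for this example; a real system would draw its synonyms from a dictionary or ontology, not a hand-written table.

```python
# Hypothetical synonym table mapping variant terms to one canonical term.
CANONICAL = {
    "pre-owned": "used",
    "secondhand": "used",
    "automobile": "cars",
    "automobiles": "cars",
    "car": "cars",
}


def normalize(text):
    """Lower-case the text and replace each term with its canonical form."""
    return frozenset(CANONICAL.get(token, token) for token in text.lower().split())


def matches(query, listing):
    """A listing matches when it contains every canonical query term."""
    return normalize(query) <= normalize(listing)


print(matches("used red cars", "pre-owned red automobiles"))
print(matches("used red cars", "new red automobiles"))
```

The first comparison succeeds because both phrases reduce to the same three canonical terms; the second fails because "new" is not a synonym of "used".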

It also makes it possible for a user to, say, type in a complex query like "progressive rock
songs from the 1970s with odd time signatures and atmospheric feels" at a music Web
site like iTunes or and get back Pink Floyd, says Simon.

Once defined, content is tagged with descriptive metadata or "markups" and is mapped
into an ontology. (See diagram.) Ontologies are schema that describe data objects and
their relationships. Developing them is typically a collaborative effort involving
technicians who understand semantic schema and subject matter experts who understand
business language.
Semantic Web
A semantic network is a complex map of associations and meanings of words. It includes
all definitions of all words, as well as relationships among all words. Source: Expert
System SPA, Modena, Italy.

Semantic Web technology refers to products and architectures that support semantic
searches, queries, publishing and retrieval based on W3C standards. These include Web
Ontology Language (otherwise known as OWL), the Resource Description Framework
(RDF) and SPARQL Protocol and RDF Query Language (SPARQL), as well as existing
Web protocols like XML and HTTP.
The hidden helper

Ebags' Cummings admits that he's not all that familiar with semantic technology.
However, he is very aware that Endeca's semantic-based online retail platform has played
a major role in increasing Ebags' sales. "Since it was deployed, our conversion rates have
doubled," he reports. (Conversion is the term used to describe what happens when a
shopper who clicks on a link to an e-commerce site actually buys something.)

Indeed, business users, and even some IT executives, don't always realize that their e-
commerce or enterprise software platforms are using semantic technology. However, they
definitely appreciate the paybacks.

In addition to stronger sales numbers, other benefits of semantic technology can include
more clicks from Web search engines, higher customer satisfaction rankings and,
internally, more timely and effective decision-making and faster responses to competitors
and market changes.

One of the earliest applications of semantic technology has been to help business users
more easily find and access the information they need, no matter where the data is located
and no matter who owns it.
Michael Lang, CEO of Revelytix, a maker of ontology-management tools in Sparks,
Maryland, is betting that semantic platforms will supplant traditional business
intelligence systems. The main reason he's expecting this to happen, he says, is because
semantic technology eliminates the need to extract, transform and load all relevant data
from disparate information silos into data warehouses and marts that need to be
constantly updated.

With semantic technology, all of that happens on the fly and in the background.

According to Lynda Moulton, an analyst at Gilbane Group, a Cambridge, Mass.-based
research arm of Outsell Inc., semantic technology can provide significant benefits for
enterprises that are confronted with data that has some combination of the following

• It's voluminous, with millions of unstructured documents.

• It's complex in scope and depth.

• It's valuable to end users, but in small, disparate pieces.

• It's needed by highly paid and highly skilled professionals for use in their areas of

• It's undifferentiated for e-discovery and research purposes. That means, for example,
that the information lacks metadata and is not available in a structured format that
supports intelligent searches.

• It's likely to have an impact on the bottom line, indirectly or directly, when discovered.

Semantic technology can process such information so that it can be "aggregated,
federated, pinpointed or analyzed to reveal concepts or meanings" that are logistically
impossible for human beings to obtain manually, Moulton says. Early adopters of
semantic technology included companies in the publishing and life sciences industries;
they're now being followed by enterprises "whose content has grown to proportions
unmanageable by humans," says Moulton.
Competing for clicks

Semantic technologies can "make search engines better or more precise in finding
relevant content," says Moulton. So if your company operated a retail Web site, that
would mean that semantically-enabled searches would do a better job of leading shoppers
to your site and then helping them find products they want to buy.

Best Buy, for example, realized "high ROI in terms of increased store and product
visibility on the Web," Myers says. While adding semantic metadata to product pages on
some 1,100 store blogs was no small task, Myers' team saved a great deal of technical
grunt work by using GoodRelations, an ontology that German university professor Martin
Hepp developed specifically for e-commerce.
Jay Myers
Jay Myers, lead Web development engineer at Best Buy, says that soon after his team
began adding semantic metadata to product pages in store blogs, they saw an increase of
about 30% in "organic" search traffic.

GoodRelations provides a standardized vocabulary -- the semantic Web term for
ontology -- for product, price and company data. This information can be embedded into
existing Web pages, then processed by other computers, applications and search engines
that support W3C protocols. As mentioned above, this makes richer product information
available to search engines that support W3C standards. It also provides the potential for
cross-domain semantic querying across e-commerce sites -- as long as other e-commerce
companies incorporate the vocabulary into their data, too. So far, only a handful of
retailers have done so, including and, more recently,

While Myers could give no hard numbers on time savings, he said that, in contrast with
most deployments of new methodologies and technologies, "we spent very little overhead
time implementing GoodRelations in our markup." After an "initial introduction,"
developers typically found working with GoodRelations as easy as coding standard
HTML, Myers says.

Best Buy is exploiting the power and precision of semantic search not only to help
shoppers find what they want but also to bring their attention to specific types of
products, such as "long-tail" items that don't generate huge sales, Myers explains. And
early last year, his team developed a program, based on semantic Web standards, that
makes it easy for store managers to publish information about "open box" or returned
products on the store's WordPress blog. Because these products are slightly cheaper, they
are much in demand among customers with budget restrictions, Myers points out.

Semantic Web platforms from vendors such as Expert System, Cambridge Semantics,
Sinequa and Lexalytics allow users to query both internal enterprise data, and Web
sources, including blogs, social networks like Facebook, and other Web 2.0 media.
Answering employees' questions

Bouygues Construction is using Sinequa's Context Engine to put employees in touch with
in-house experts who can answer their questions in a broad range of areas, says Eric Juin,
the worldwide construction firm's e-services and knowledge management director. "It
could be a lawyer, an engineer or an executive, anywhere in the world." The semantic
platform identifies and categorizes all experience within the company, worldwide, by
analyzing vast quantities of unstructured information, including training materials,
project documentation and other internal sources, as well as Web-based newspapers and
scientific publications, Juin says.
Eric Juin
Eric Juin of Bouygues Construction says there's plenty of anecdotal evidence that
semantic software has helped employees avoid mistakes on construction sites.
The platform is also being used to help knowledge workers quickly find information that
resides either on internal systems or on the Web, Juin says. The semantic engine pores
through documents, as well as comments from internal experts, and scores the material in
terms of relevance to the user's query, he adds.
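
Scoring material by relevance to a query can be pictured with a very simple measure: the share of query terms that appear in each document. The Python sketch below uses that measure purely as an illustration. The `relevance` and `rank` helpers and the sample documents are hypothetical, and this is not a description of how Sinequa's engine computes its scores.

```python
def relevance(query, document):
    """Fraction of distinct query terms that occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rank(query, documents):
    """Return titles of matching documents, most relevant first."""
    scored = [(relevance(query, text), title) for title, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored if score > 0]


# Made-up internal documents for illustration only.
DOCUMENTS = {
    "crane-manual": "safety checklist for tower crane operation",
    "hr-policy": "vacation and leave policy",
    "site-notes": "crane inspection notes",
}

ranking = rank("tower crane safety", DOCUMENTS)
print(ranking)
```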

Juin says that while no hard ROI numbers are available, there's plenty of anecdotal
evidence that the platform has helped Bouygues employees to avoid mistakes on
construction sites by allowing them to rapidly contact people who can answer their
questions. These anecdotes helped his staff cost-justify the deployment to management,
he adds. Not that the project was expensive. It cost "just a [small percentage]" of the cost
of Bouygues' ERP project, Juin says.
Tips from the trenches

Data housekeeping is a critical preliminary step, experts agree. "The extent to which
content is enriched with good metadata [means] you can start to build applications that
deliver on the promise of semantic Web," says Geoffrey Bock, an analyst at Gilbane

Consultant Simon says he's worked on a number of projects implementing "breakthrough"
information technologies, and he has learned that if you don't do housekeeping chores
like cleansing and deduplicating data, "you just have better access to bad data."

Lee Feigenbaum, vice president of technology at Cambridge Semantics, advises IT and
business people to work together to determine a project where semantic technology
would yield "differential value." Will it speed up the development cycle? Enable end
users to infer new data? Improve experiences for customers or partners?

Take things slow, at least at first, Simon advises. The project will reach critical mass as
people get used to it and start to realize the benefits, he says.

Best Buy is doing just that. Its semantic Web deployment, which is about a year old now,
is very much a work in progress, Myers says -- as is the semantic Web itself.

"There are lots of semantic tools and open-source projects out there, plus SPARQL is a
really powerful query language," Myers says. "This gives me hope that semantic
technology is at least one of the answers to the problem of Big Data. We have this large
mass of data under our noses that we're not utilizing. If we can find a way to gain insight
from it and pass that on to customers and business partners, that's a big competitive

Semantic Web: Tools you can use

Want to get started with semantic technology? Here are some products and services to
check out.
Elisabeth Horwitt

March 23, 2011 (Computerworld)

Vince Fioramonti had an epiphany back in 2001. He realized that valuable investment
information was becoming increasingly available on the Web, and that a growing number
of vendors were offering software to capture and interpret that information in terms of its
importance and relevance.

"I already had a team of analysts reading and trying to digest financial news on
companies," says Fioramonti, a partner and senior international portfolio analyst at
Hartford, Conn.-based investment firm Alpha Equity Management. But the process was
too slow and results tended to be subjective and inconsistent.

The following year, Fioramonti licensed Autonomy Corp.'s semantic platform, Intelligent
Data Operating Layer (IDOL), to process various forms of digital information
automatically. Deployment ran into a snag, however: IDOL provided only general
semantic algorithms. Alpha Equity would have had to assign a team of programmers and
financial analysts to develop finance-specific algorithms and metadata, Fioramonti says.
Management scrapped the project because it was too expensive.

(For more information about semantic technologies, including search, see Part 1 of this
story, "The semantic Web gets down to business.")

The breakthrough for Alpha Equity came in 2008, when the firm signed up for Thomson
Reuters' Machine Readable News. The service collects and analyzes online news from
3,000 Reuters reporters, and from third-party sources such as online newspapers and
blogs. It then analyzes and scores the material for sentiment (how the public feels about a
company or product), relevance and novelty.

The results are streamed to customers, who include public relations and marketing
professionals, stock traders performing automated black box trading and portfolio
managers who aggregate and incorporate such data into longer-term investment

A monthly subscription to the service isn't cheap, Fioramonti says. According to one
estimate -- which Thomson Reuters would not comment on -- the cost of real-time data
updates is between $15,000 and $50,000 per month. But Fioramonti says the service's
value more than justifies the price Alpha Equity pays for it. He says the information has
helped boost the performance of the firm's portfolio and it has enabled Alpha Equity to
get a jump on competitors. "Thomson Reuters gives us the news and the analysis, so we
can continue to grow as a quantitative practitioner," he says.

Alpha Equity's experience is hardly unique. Whether a business decides to build in-house
or hire a service provider, it often pays a hefty price to fully exploit semantic Web
technology. This is particularly true if the information being searched and analyzed
contains jargon, concepts and acronyms that are specific to a particular business domain.

Here's an overview of what's available to help businesses deploy and exploit semantic
Web infrastructures, along with a look at what's still needed for the technology to achieve
its potential.
The key standards

At the core of Tim Berners-Lee's as-yet-unrealized vision of a semantic Web is federated
search. This would enable a search engine, automated agent or application to query
hundreds or thousands of information sources on the Web, discover and semantically
analyze relevant content, and retrieve exactly the product, answer or information the user
was seeking.

Although federated search is catching on -- most notably in Windows 7, which supports it
as a feature -- it's a long way from a Webwide phenomenon.

To help federated search gain traction, the World Wide Web Consortium (W3C) has
developed several key standards that define a basic semantic infrastructure. They include
the following:

• SPARQL Protocol and RDF Query Language (SPARQL), which defines a standard
language for querying and accessing data.

• Resource Description Framework (RDF) and RDF Schema (RDFS), which describe
how information is represented and structured in a semantic ontology (also called a

• Web Ontology Language (or OWL), which provides a richer description of the ontology
and also includes some RDFS elements.
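
To make these standards a little more concrete, here is a minimal sketch in plain Python of the idea behind RDF and SPARQL: facts stored as subject-predicate-object triples, and a query expressed as a pattern with variables. It is not a W3C API or a real SPARQL engine. The `ex:` names, the sample data and the `match` helper are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts. RDF represents information as
# (subject, predicate, object) triples like these.
TRIPLES: List[Triple] = [
    ("ex:tv1", "rdf:type", "ex:Television"),
    ("ex:tv1", "ex:screenSizeInches", "46"),
    ("ex:tv2", "rdf:type", "ex:Television"),
    ("ex:tv2", "ex:screenSizeInches", "60"),
    # An RDFS-style statement about the vocabulary itself.
    ("ex:Television", "rdfs:subClassOf", "ex:Product"),
]


def match(pattern: Triple, triples: Sequence[Triple]) -> List[Dict[str, str]]:
    """Match one triple pattern; terms starting with '?' are variables.

    Returns one dictionary of variable bindings per matching triple,
    loosely mirroring a single-pattern SPARQL SELECT.
    """
    results: List[Dict[str, str]] = []
    for triple in triples:
        binding: Dict[str, str] = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?item WHERE { ?item rdf:type ex:Television }
televisions = match(("?item", "rdf:type", "ex:Television"), TRIPLES)
```

Real deployments would use an RDF store and a SPARQL engine such as the ones discussed later in the article. The point here is only that a shared triple format lets one query shape work against anyone's data.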

The final versions of these standards are supported by leading semantic Web platform
vendors such as Cambridge Semantics, Expert System, Revelytix, Endeca, Lexalytics,
Autonomy and TopQuadrant.

Major Web search engines, including Google, Yahoo and Microsoft Bing, are starting to
use semantic metadata to prioritize searches and to support W3C standards like RDF.

And enterprise software vendors like Oracle, SAS Institute and IBM are jumping on
board, too. Their offerings include Oracle Database 11g Semantic Technologies, SAS
Ontology Management and IBM's InfoSphere BigInsights.
Semantic basics

Semantic software uses a variety of techniques to analyze and describe the meaning of
data objects and their inter-relationships. These include a dictionary of generic and, often,
industry-specific definitions of terms, as well as analysis of grammar and context to
resolve language ambiguities such as words with multiple meanings.

The purpose of resolving language ambiguities is to help ensure, for example, that a
shopper who does a search using a phrase like "used red cars" will also get results from
Web sites that use slightly different terms with similar meanings, such as "pre-owned"
instead of "used" and "automobile" instead of "car."
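
One simple way to picture this is query expansion from a dictionary of equivalent terms. The sketch below is an assumption-laden toy, not how any vendor's engine works: the `SYNONYMS` table and the `expand_query` function are hypothetical, and real systems also use grammar and context rather than word-for-word lookup.

```python
from itertools import product
from typing import Dict, List

# Hypothetical dictionary mapping a term to its accepted equivalents.
SYNONYMS: Dict[str, List[str]] = {
    "used": ["used", "pre-owned"],
    "cars": ["cars", "automobiles"],
}


def expand_query(query: str) -> List[str]:
    """Return every variant of the query obtained by swapping in synonyms."""
    options = [SYNONYMS.get(word, [word]) for word in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


variants = expand_query("used red cars")
```

A search front end could then run all four variants and merge the results, so that a site describing "pre-owned red automobiles" still matches.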

For more information about semantic technologies, including search, see Part 1 of this
story, "The semantic Web gets down to business." It explores the technology's potential
uses and paybacks, illustrated with real business cases, including ones involving the use
of sentiment analysis. It also provides some best practices and tips from the trenches for
anyone planning, or at least considering, a deployment.

W3C standards are designed to resolve inconsistencies in the way various organizations
organize, describe, present and structure information, and thereby pave the way for cross-
domain semantic querying and federated search.

To illustrate the advantage of using such standards, Michael Lang, CEO of Revelytix, a
Sparks, Md.-based maker of ontology-management tools, offers the following scenario: If
200 online consumer electronics retailers used semantic Web standards such as RDF to
develop ontologies that describe their product catalogs, Revelytix's software could make
that information accessible via a SPARQL query point. Then, says Lang, online shoppers
could use W3C-compliant browser tools to search for products across those sites, using
queries such as: "Show all flat-screen TVs that are 42-52 inches, and rank the results by

Search engines and some third-party Web shopping sites offer product comparisons, but
those comparisons tend to be limited in terms of the range of attributes covered by a
given search. Moreover, shoppers will often find that the data provided by third-party
shopping sources is out of date or otherwise incorrect or misleading -- it may not, for
example, have accurate information about the availability of a particular size or color.
Standards-based querying across the merchants' own Web sites would enable shoppers to
compare richer, more up-to-date information provided by the merchants themselves.
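
The scenario can be sketched in a few lines of plain Python. This is only an illustration of the federated pattern (ask every merchant, filter, merge, rank), not Revelytix's software or a SPARQL client. The merchants, products and prices are invented, and ranking by price is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Offer:
    merchant: str
    name: str
    screen_inches: int
    price_usd: float
    in_stock: bool


# Hypothetical merchant "endpoints": each returns its own current catalog.
def store_a() -> List[Offer]:
    return [
        Offer("Store A", "A-42", 42, 499.0, True),
        Offer("Store A", "A-60", 60, 899.0, True),
    ]


def store_b() -> List[Offer]:
    return [
        Offer("Store B", "B-50", 50, 449.0, True),
        Offer("Store B", "B-46", 46, 399.0, False),  # sold out
    ]


def federated_search(
    endpoints: Iterable[Callable[[], List[Offer]]],
    min_inches: int,
    max_inches: int,
) -> List[Offer]:
    """Query every endpoint, keep in-stock TVs in the size range, rank by price."""
    hits: List[Offer] = []
    for fetch in endpoints:
        for offer in fetch():
            if min_inches <= offer.screen_inches <= max_inches and offer.in_stock:
                hits.append(offer)
    return sorted(hits, key=lambda offer: offer.price_usd)


results = federated_search([store_a, store_b], 42, 52)
```

Because each merchant answers from its own data, the sold-out 46-inch model drops out of the results, which is the freshness advantage described above.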

The W3C SPARQL Working Group is currently developing a SPARQL Service
Description designed to standardize how SPARQL "endpoints," or information sources,
present their data, with specific standards for how they describe the types and amount of
data they have, says Lee Feigenbaum, vice president of technology at Cambridge
Semantics and co-chair of the W3C SPARQL Working Group.
Building blocks and software tools
Tools, platforms, prewritten components and services are available to help make
semantic deployments less time-consuming, less technically complex and (somewhat)
less costly. Here's a brief look at some options.

Jena is an open-source Java framework for building semantic Web applications. It
includes APIs for RDF, RDFS and OWL, a SPARQL query engine and a rule-based
inference engine. Another platform, Sesame, is an open-source framework for storing,
inferencing and querying RDF data.

Most leading semantic Web platforms come with knowledge repositories that describe
general terms, concepts and acronyms, giving users a running start in creating ontologies.
"Customers have conflicting demands: to have the platform be able to come back with
accurate answers out of the box, and to have it tailored to their business area," says Seth
Redmore, vice president of product management at Lexalytics.

To address that quandary, Lexalytics sells its semantic platform primarily to service
provider partners, who then fine-tune it for specific business domains and applications.
Thomson Reuters' Machine Readable News service is one example.

Other platform vendors have been rolling out business-specific solutions. Endeca, for
example, provides application development toolkits for e-business and enterprise
semantic applications, including specific offerings for e-commerce and e-publishing.

There are also tools to automatically incorporate semantic metadata, and W3C standards,
into existing bodies of information. For example, Revelytix's Spyder utility automatically
transforms both structured and unstructured data to RDF, according to Lang. It then
presents, or "advertises," the information on the Web as a SPARQL endpoint that can be
accessed by SPARQL-compliant browsers, he adds.

An open-source tool called D2RQ can map selected database content to RDF and OWL
ontologies, making the data accessible to SPARQL-compliant applications.
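
The general idea of mapping relational data to RDF can be sketched with Python's built-in `sqlite3` module. This is not D2RQ's mapping language or API. The `rows_to_triples` helper, the `ex:` prefix and the sample table are hypothetical, and the naming scheme (one subject per row, one predicate per column) is just one plausible convention.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(
    conn: sqlite3.Connection, table: str, key_column: str, base: str = "ex:"
) -> List[Triple]:
    """Turn each row into triples: the key names the subject, columns become predicates.

    The table name is interpolated into SQL, so pass trusted names only.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples: List[Triple] = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base}{table}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject; skip empty cells
            triples.append((subject, f"{base}{column}", str(value)))
    return triples


# Hypothetical in-memory product table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES ('tv1', 'Example TV', 499.0)")
product_triples = rows_to_triples(conn, "product", "sku")
```

Once the rows are in triple form, any application that understands the shared vocabulary can query them without knowing the original table layout.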

Revelytix sells a W3C-compliant knowledge-modeling tool called, a wiki-based
framework designed to help everyone from technical specialists and subject matter
experts to business users collaboratively develop a semantic vocabulary that describes
and maps domain-specific information residing on multiple Web sites. Communities of
interest can then use to access, share and refine that knowledge, according to

For example, consultancy Dachis Group has developed what it calls a Social Business
Design architecture whose purpose is to help users collaborate, share ideas and then
narrow down and "expose and make sense of" data within a business organization or
other community of relevant individuals, such as customers or partners, says Lee Bryant,
managing director of the firm's European operations.

Such offerings can significantly ease the task of developing a semantic infrastructure. For
instance, Bouygues Construction used Sinequa's semantic platform, Context Engine, and
needed only about six months to do an initial implementation of a semantic system for
locating in-house expertise, according to Eric Juin, director of e-services and knowledge
management at Bouygues.

Bouygues has since developed a semantic search application that helps knowledge
workers quickly find information that resides either on internal systems or on the Web,
Juin says.

Context Engine indexed and calculated the relevance of people and concepts in a half-
million documents, including meeting minutes, product fact sheets, training materials and
project documentation, he says. The platform includes a "generic semantic dictionary" of
common words and terms, which it can translate between various languages, according to
Juin. For example, a French employee could semantically search a document written in

Certain business-specific acronyms and terms have to be added manually -- that's an
ongoing process that requires semantic experts to collaborate with business users, Juin
says. Over time, however, his group has been adding fewer keyword definitions, because
the semantic engine can use other, related words to determine a term's relevance to a
specific subject, he says.
The SaaS option

Companies that lack the internal resources to build their own semantic Web infrastructure
can follow Alpha Equity's lead and go with a semantic service provided by a third party.

One such provider is Thomson Reuters, which, in addition to its Machine Readable News
service, offers a service called OpenCalais through which it creates semantic metadata for
customers' submitted content. Customers can deploy that tagged content for search, news
aggregation, blogs, catalogs and other applications, according to Thomas Tague, a vice
president at Thomson Reuters.
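
What "semantic metadata" for a piece of content can look like is easy to show with a toy tagger. The sketch below is not the OpenCalais API or its output format: the `GAZETTEER` lookup table, the `tag_entities` function and the tag fields are hypothetical, and a real service would use far richer linguistic analysis than exact string matching.

```python
import re
from typing import Dict, List

# Hypothetical lookup table of known names and their entity types.
GAZETTEER: Dict[str, str] = {
    "Thomson Reuters": "Company",
    "Hartford": "City",
}


def tag_entities(text: str) -> List[Dict[str, object]]:
    """Return one metadata record per known name found in the text, in reading order."""
    tags: List[Dict[str, object]] = []
    for name, kind in GAZETTEER.items():
        for found in re.finditer(re.escape(name), text):
            tags.append({"text": name, "type": kind, "offset": found.start()})
    return sorted(tags, key=lambda tag: tag["offset"])


tags = tag_entities("Alpha Equity is based in Hartford and uses Thomson Reuters.")
```

Records like these can be stored alongside the content and used later for search, aggregation or catalog building.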

OpenCalais also includes a free toolkit that customers can use to create their own
semantic infrastructures and metadata, and to set up links to other Web providers. The
service now processes more than 5 million documents per day, according to Tague.

DNA13 (now part of the CNW Group), Lithium Technologies (now the owner of Scout
Labs) and Cymfony are among the semantic service providers that query, collect and
analyze Web-based news and social media, with an eye toward helping customers in
areas such as brand and reputation management, customer relationship management and
When will the semantic Web really matter?

In a 2010 Pew Research survey of about 895 semantic technology experts and
stakeholders, 47% of the respondents agreed that Berners-Lee's vision of a semantic Web
won't be realized or make a significant difference to end users by the year 2020. On the
other hand, 41% of those polled predicted that it would. The remainder did not answer
that query.

The basic W3C standards are finalized and gaining support, and there's an increasing
number of platforms and software tools. Still, semantic Web technology -- and standards
-- are far from achieving that critical mass of support needed to fully exploit their
benefits, experts agree.

It's important at this point to make a clear distinction between semantic technologies in
general and semantic Web technologies that make use of W3C standards and that
specifically apply to Web information sources.

Semantic technologies are definitely catching on, particularly for enterprise knowledge
management and business intelligence, experts agree. The market for semantic-based text
analysis tools that help users "find what they want in unstructured information" is
growing at about 20% per year, says Susan Feldman, an analyst at research firm IDC.
Moreover, most enterprise search platforms now include semantic technology, she says.

Compared with more traditional BI tools, one of semantic technology's main benefits is
that it gives subject matter experts the ability to build their own query structures without
IT needing to go through the rigorous and time-consuming tasks of building and then
rebuilding data warehouses and data marts. For example, "an expert in, say, compliance
and regulations can build a semantic structure in two weeks, not nine months," and then
change it quickly and easily, says Mills Davis, managing director at semantic research
firm Project10X.

Other benefits of semantic technology -- again compared to traditional BI tools -- include
the ability to perform more complex and broader queries and analysis of unstructured
data, and the ability to start small with targeted queries and then grow and evolve in small

On the Web, semantic technology has established a firm foothold in a growing number of
niche business markets. One is e-publishing, where online news services DBpedia,
Geonames, RealTravel and MetaWeb (Freebase) were early adopters. Another is the
online financial information services business, where companies such as Thomson
Reuters and Dow Jones have jumped on board. Some of the prominent users of Thomson
Reuters' OpenCalais offering include news media organizations like CBS Interactive and
its CNET unit, Slate, the Huffington Post, and e-news aggregator Moreover
Technologies. Furthermore, over 9,000 online publishing sites now use OpenPublish, a
package that integrates OpenCalais with Drupal, an open-source content management

More recently, online retailers have started deploying semantic Web platforms to help
optimize product and brand placements in search engine results, and to provide
consumers with richer and more efficient shopping experiences.
Obstacles to overcome

Still missing, however, is widespread support of W3C standards and common
vocabularies that will facilitate semantic queries across different Web and business
domains. Right now, the majority of semantic Web schemas are developed by, and
proprietary to, individual companies both on and off the Web, and different groups within
a business enterprise. Such frameworks often contain business- and function-specific
terms, jargon and acronyms that don't translate well to other knowledge domains. As a
result, to do cross-domain querying, semantic applications and services must interface
with each information source's ontology individually, according to industry sources.

Take the case of Eni. The global energy company's technical and subject matter experts
have spent 12 years developing and fine-tuning a semantic-based BI platform based on
Expert System's Cogito, according to Daniele Montanari, Eni's practice leader for
semantic technologies. The platform supports oil-, gas- and power-related trading,
production and logistics processes, Montanari adds.

Cogito allows Eni's end users to go to a preselected and often presubscribed information
source on the Web, locate key information on a particular topic and generate a "corpus"
that can then be downloaded, automatically updated and semantically queried, Montanari

Semantic schemas tend to be specific to a particular business area, Montanari says. For
example, the company's refining division has developed semantic frameworks and
classifications to quickly locate information within a vast corpus of articles. Many of
those articles were written by Eni's R&D group, while others come from Web sources to
which the group subscribes, he notes.

However, generalized Web searches -- say, for the latest technical developments in the
oil industry -- are problematic because each site has its own largely proprietary ontology,
Montanari says. "To cover multiple sources within an information domain, you have to
define a common semantic model," he notes.

The same issues apply to internal semantic queries, Montanari says. His group had once
hoped to create an enterprisewide semantic schema that would "model and map
correspondences for everything in our databases and data sets, with no ambiguity
anymore," but the company was unable to resolve differences among business domains
including oil, gas, R&D, marketing and others.

"Even at the linguistics level, there are issues," he notes. As a result, internal queries tend
to remain within a particular business group or specialty.
Moving things along

Standardized ontologies are starting to crop up in industries feeling regulatory and/or
customer pressure, such as healthcare and pharmaceuticals. Whether e-commerce
companies will rally around a common schema remains to be seen.

One such effort is the GoodRelations e-commerce semantic vocabulary. As of now, only
a handful of companies, including and, have signed up for
it. Google recently announced that it also supports the vocabulary, according to Hepp
Research, which markets and publishes GoodRelations.

"Like telephones and the Internet... the technology becomes more valuable as more
people use it," says Phil Simon, a consultant and the author of The Next Wave of
Technologies. What's still missing, for many businesses, is a clear payback that will
justify the often major cost of deployment, he adds. A company that wants to make a
large body of unstructured information accessible, either internally or on the Web, "can
spend years and years setting up a semantic Web infrastructure ... before it sees a payoff,"
says Simon, noting that such efforts can involve huge investments in cleaning up and
tagging of data, plus investments in new technologies.

Indeed, the semantic Web, like many other groundbreaking information technologies
before it, may be stuck in a classic Catch-22: A critical mass of users is needed before the
benefits kick in, but businesses, particularly e-commerce companies, won't jump in until
that magic number is reached.

In his blog, Random Musings From Jay Myers, Best Buy's lead Web development
engineer, Jay Myers, says: "Product categories can be unique to a retailer/manufacturer,
and with billions of consumer products and endless numbers of product categories,
universal product categorization seems to be an unreachable goal. I have seen a few
attempts at mass product categorization, but I haven't seen a ton of progress (who would
want to manage a massive global product taxonomy?!). Furthermore, getting consensus
on category definitions seems like a futile effort that should really be avoided."

More optimistically, he goes on, "just because there aren't any universal standards out
there doesn't mean we can't start giving machines a shot at some semblance of product
categorization" using available W3C standards and ontologies like GoodRelations.
"That's a win-win," he adds, "because the business gets satisfied customers, and the
customer makes optimal buying decisions based on relevant product data."
Indeed, many other members of the semantic Web community remain hopeful that
semantic technology will revolutionize the Web -- eventually.

"Increasing user and data mobility, and the expansion of Internet services and digital data
information into everyday life, are pushing us in the semantic direction," says
Project10X's Davis. With the rapid proliferation of Web information sources whose
provenance is questionable, he continues, "you're not just looking for a needle in a
haystack -- you're looking for the right haystack. Semantics provides a critical means of
separating the wheat from the chaff."

"When I mention semantic Web in tech circles, nine out of 10 don't know what the hell
I'm talking about," Next Wave author Simon notes. "But do I believe in its power, and
that it will be a game changer down the road? Absolutely."

Horwitt, a freelance reporter and former Computerworld senior editor, is based in Waban,
Mass. Contact her at

Emerging Semantic Web Technology Could Help Intelligence Analysts Spot New Terror
February 11, 2011

In light of 9/11, the attempted Christmas Day bombing in 2009 and even last year’s
WikiLeaks incident, it’s clear that the search and information-sharing process across
government intelligence databases is flawed and missing an element that would
potentially enable analysts to see threats and prevent future incidents.

Semantic technology is used increasingly to help organizations manage, integrate and
gain intelligence from multiple streams of unstructured data and information. Semantic
technology is unique in its ability to exceed the limits of other technologies and approach
the automatic understanding of a text. While Semantic Web technology (a web of
understood word meanings and connections) is quickly eclipsing first-generation,
keyword-based index search systems and second-generation social media interaction,
the transition is far from complete. Nowhere is this technology more useful than in the
national intelligence space.

As a semantic technology professional, I think about how semantic technology could
have aided in connecting the dots between the available information in the government
intelligence community in the 2009 Christmas bomber case, and most recently, the highly
publicized leak of classified government information about the war in Afghanistan.

As a former intelligence analyst, I know the frustration of lacking both complete
information and computer systems capable of aiding the analysis process. Almost a
decade after 9/11 and untold dollars later, the nation still struggles with effective
intelligence sharing. An often mentioned issue is the lack of collaboration among
intelligence teams on the analysis of incoming information from the multitude of existing

The Los Angeles Times points to others:

“Lawmakers have been pushing for a capability to search across the government’s vast
library of terrorism information, but intelligence officials say there are serious technical
and policy hurdles. The databases are written in myriad computer languages; different
legal standards are employed on how collected information can be used; and there is
reluctance within some agencies to share data.”

The newspaper then makes the connection to 2009’s Christmas bomber threat:
“That makes it harder to connect disparate pieces of threat information, which is exactly
what went wrong in the case of Umar Farouk Abdulmutallab, a Nigerian who on
Christmas Day tried to blow up an airplane using explosives sewn into his underwear.
The bomb failed to detonate, and a passenger jumped on him.”

Analysts must have a reason to collaborate. They must foresee or imagine how one or
more evidence streams, often with many missing elements, overlap or fold into one
another to form a complete picture. The reality is, even really good human analysts
cannot juggle more than 50 to 60 data points — events, names, places, times, dates and
the connections between them — at once.

But good technology that mimics the same approach has no such limitation. Allowing
such a system to build the larger picture — to connect the dots — through trial and error,
quickly and repeatedly with an analyst reviewing that picture for plausibility, internal
consistency and impact, would be a more effective approach than adding a small army of
new analysts to the problem.

A system that proactively and constantly builds and tests all the available evidence on a
person, action, event, etc., is the current architecture of a Semantic Web. This approach is
becoming prevalent in the private sector, and governments also are now taking advantage
of the Semantic Web rather than a simple web of keywords.

To test this proposition, I used the timeline of known facts about the 2009 Christmas
bomber as reported by The New York Times. Although this is a retrospective view, I
wanted to know what I would have concluded over time, if I were an analyst and had
good information sharing and robust analytical support, such as current Semantic Web
technology can provide.

To begin, I took all the known facts and began to process them semantically. I used a
semantic search and analysis system to analyze the content for people, places, things,
facts, time and geography, but most importantly, for events. Such analysis answers: Who
did what to whom when and where? Based on our established event timeline, in summer
2009, Abdulmutallab would have hit intelligence databases when Britain placed him on a
watch/no-fly list after his student visa was rejected.
We can see right away that Abdulmutallab was known to have studied in 2004 and 2005
in Sanaa, Yemen; he has a direct connection to the radical Yemen cleric Anwar al-
Awlaki; he’s loosely connected to al-Qaida because of his presence in Yemen; and he
disappeared in September 2009. But the most important thread is that he was already on
Britain’s watch/no entry list.

On the whole, perhaps this picture doesn’t portray a person who has planned a terrorist
attack. But more connections come to light when we continue to build the picture of
Abdulmutallab into fall 2009.

Through November 2009, several things become apparent. First, the number of
connection points has risen significantly between Abdulmutallab, Yemen and al-Qaida
relative to the previous summer. Second, the number of evidentiary warning signs around
Abdulmutallab has grown to include his father, the United Nations and several U.S.
agencies (e.g., the National Security Agency and National Counterterrorism Center).
Third, there’s a lack of communication or information sharing among U.S. agencies.

Nonetheless, Abdulmutallab was placed on a terrorist watch list but not on the more
restrictive no-fly list. This may not have been the case if analysts had a diagram that
visualized the increased strength between him and al-Qaida, as well as the increase in
additional connections of concern at this stage of this analysis.
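
The kind of diagram described here rests on a simple data structure: events that mention entities, and a running count of how often each pair of entities appears together. The sketch below shows that idea in plain Python under loud assumptions. It is not the system used for this analysis, the `Event` record and `connection_strength` function are hypothetical, and the entities and dates are neutral placeholders rather than facts from the case.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Event:
    when: date
    what: str
    entities: Tuple[str, ...]  # the people, places and organizations involved


def connection_strength(events: Iterable[Event], cutoff: date) -> Counter:
    """Count how often each pair of entities co-occurs in events up to the cutoff."""
    strength: Counter = Counter()
    for event in events:
        if event.when <= cutoff:
            for pair in combinations(sorted(set(event.entities)), 2):
                strength[pair] += 1
    return strength


# Placeholder events; every name and date below is invented for illustration.
EVENTS = [
    Event(date(2009, 6, 1), "placed on a watch list", ("Person A", "Agency X")),
    Event(date(2009, 9, 1), "reported travelling", ("Person A", "Region Y")),
    Event(date(2009, 11, 1), "warning filed", ("Person A", "Agency X", "Region Y")),
]

summer = connection_strength(EVENTS, date(2009, 8, 31))
november = connection_strength(EVENTS, date(2009, 11, 30))
```

Comparing the two snapshots shows both new links appearing and existing links getting stronger, which is the change an analyst would want a visualization to surface.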

In retrospect, we know that there was still more time. Adding the events from December
2009 in the examination makes the graph richer still:

Once again, as the connection between Abdulmutallab, Yemen and al-Qaida increases,
more U.S. agencies take note, and now he has purchased airline tickets with a U.S.
destination and didn’t check any baggage (a Transportation Security Administration
warning signal since 9/11). As with most intelligence analysis, the strongest indicators
come too late, so understanding how to fit them into the overall picture quickly is
essential — in this case, the time it took Abdulmutallab to fly from Africa to the
Netherlands and then to the United States. Semantic technology that can visualize the
new input can speed up analysts’ understanding.

Semantic Web technology can provide a window into how people, places, things and
events come together into threats and opportunities. It’s impossible to expect analysts to
manually “see” how anomalous and imperfect evidence streams fit together. And there is
always more than one way that they fit together.

Let machines do what they’re good at: namely, when coupled with semantic
understanding, measuring endless clues and hints, and fitting, testing, removing and adding
various puzzle pieces to see if the picture starts to make sense. Past a certain threshold,
analysts can take over and do the work computers never will be able to do: apply human
judgment and reasoning. Otherwise judgment and decision never arrive, connections are
never made, and red flags are never raised.

The timeline of these past and recent events (9/11, the Christmas bomber and recent data
leakage in Washington, D.C.) shows a serious need to address the gaps in our country’s
intelligence procedures and sharing processes. And this is where Semantic Web comes in.

Brooke Aker is CEO of Expert System USA. He writes and speaks on topics such as
competitive intelligence, knowledge management and predictive analytics.


Semantic Search Gets On The Map

By Jennifer Zaino on February 4, 2011 1:00 PM

It’s as important for users to be able to visualize the content through which they must
navigate – whether the data hails from the open source web or internal repositories – as it
is to be able to discover semantically relevant information in the first place. A new
partnership between semantic vendor Expert System and geo-analysis vendor Esri Italia
aims at making both happen.

“We believe that … traditional search is actually a commodity,” says Luca Scagliarini,
Expert VP, Strategic and Business Development. What isn’t a commodity is search that
takes advantage of semantics for classifying, categorizing and indexing entities, finding
relationships among them, and then displaying relevant discoveries as points on a map.
“The result of our semantic engine enables us already to have a very rich way to navigate
content by its attributes,” he says. “But you don’t just want the granularity of the analysis,
but also a layer of visualization that enables the user to navigate content in different

The two companies working together should lead to the next version of Expert’s Cogito
platform software letting users see their search results represented on a map, thanks to the
ability to semantically understand and coordinate geographic mentions in text. Part of the
upcoming release is a search and analysis engine capability to use the map as the search
box, selecting an area and retrieving documents relevant to it.
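
The idea of using the map as the search box can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of how Cogito or Esri software works: it assumes each document has already been tagged with the coordinates of a place it mentions, and the `Doc` class, the `docs_in_box` function and the sample documents are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    lat: float  # latitude of a place mentioned in the text (assumed already extracted)
    lon: float  # longitude of that place


def docs_in_box(docs, south, west, north, east):
    """Return the documents whose tagged location lies inside the selected rectangle.

    Simplification: the rectangle must not cross the 180-degree meridian.
    """
    return [d for d in docs
            if south <= d.lat <= north and west <= d.lon <= east]


# Invented sample documents with approximate, illustrative coordinates.
docs = [
    Doc("Port delays reported", 30.0, 32.5),
    Doc("Factory opens new line", 45.5, 9.2),
    Doc("Local clinic reports flu cases", 29.9, 32.6),
]

# The user draws a rectangle on the map; only documents located inside it come back.
selected = docs_in_box(docs, south=29.0, west=31.0, north=31.0, east=34.0)
titles = [d.title for d in selected]
print(titles)
```

The hard part in practice is the step this sketch assumes away: recognizing geographic mentions in free text and resolving them to the right coordinates.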

Scagliarini gives an example of how such capabilities can play out for enterprise use.
Consider the situation of a VP of a company with supply chains stretching across the
globe – it’s important for that person to have a real-time view of what’s going on in those
locales for issues that could impact the supply chain. Now, if that supply chain happens
to involve moving oil across the Suez Canal, the VP probably didn’t need to do much
digging across open web data to figure out that it’s time to think about contingency
shipment plans.

But the situation is different when the signals are weaker – for instance, an outbreak of
illness in an off-the-radar part of the world where a factory exists. The only sources of
up-to-the-minute information about that community may be a local newspaper, or
perhaps word coming through on social media forums. Being able to discover such data
that is semantically relevant to the location from these sources, and plot it out for quick
insight, can be invaluable for that VP.

And, if you have to monitor a lot of information from many sites where factories are
located, “having a layer on top that lets you turn on the web browser and look at a map
and see some red dots automatically telling you something of interest is happening in
those geo-locations is a way to be more effective,” Scagliarini says. “The idea is a map in
this case but there are different visualization layers to make the work of analysts more
effective and efficient at the end of the day.”

Expert expects to soon announce partnerships with other vendors to deliver additional
presentation layers from which users can explore other visual navigations through

Myths About Classification: Computers do not Think, They Compute
By Bryan Bell on March 8, 2011 11:29 AM

Guest post by Bryan Bell, VP of Enterprise Solutions, Expert System

As companies around the world strive to find better options to manage the
inflow of unstructured information, they often turn to classification systems to organize
the chaos.

Classification systems give structure to massive volumes of information, with the
overriding goal of increasing discoverability. Organizations are working to manage large
sets of data by classifying them into hierarchical trees (or taxonomies) based on the
commonality of the content. You could say that classification is much like a multi-level
sort, grouping similar information across multiple category classes.

Classification systems make it easier to understand and process large volumes of data by
assigning sets of rules to a node within a classification tree. Various classification
methods are being used to apply knowledge to the nodes via a set of specific rules. The
challenge is building and organizing a system in a logical order that covers a multitude of
user perspectives–building the proper categories, assigning the proper classification to
those categories and describing what belongs in each category.

The development of classification systems and the management of data have quickly
become a science. Generally speaking, a classification system will contain several parts:
1) The collection itself, 2) A classification hierarchy (tree) that categorizes documents by
topic, 3) Sample documents describing the type of content to be classified within each
category/node of the hierarchy and 4) An information platform that drives collection of
content from the appropriate data sources and then places the content in the correct

What is the right method to implement an automatic categorizer?

A variety of technologies have been developed in an attempt to automate the process of
classification. But there are essentially just three main approaches that have been
used over the past decade: the “bag of keywords” approach, statistical systems and
rules-based systems. How do you know which approach is best for you? Wading through
the marketing clutter can be difficult, but it’s important to choose the best solution for
you based on facts (and not claims):

1. Manual approach. The “bag of keywords” approach requires compiling a list of “key
terms” that describes the type of content in question, and using them to develop a
metadata list to classify content to a specific node. One problem with this approach is that
identifying and organizing a list of terms (preferred, alternate, etc.) is quite labor
intensive, is not scalable, and in the end, doesn’t address the ambiguity in language. In
addition, such a system must be continuously updated as content within the information
being gathered is ever changing. This is the oldest method and time after time, it has
proven to be inefficient, inaccurate and simply not scalable.
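
A toy version of the keyword approach makes the ambiguity problem concrete. The keyword lists, category names and the `classify_by_keywords` function below are hypothetical, invented only for this illustration. The sketch counts how many listed terms appear in a text and returns the best-scoring categories.

```python
import re

# Hypothetical hand-compiled keyword lists, one per category node.
KEYWORDS = {
    "Finance": {"bank", "loan", "interest"},
    "Geography": {"river", "bank", "delta"},
}


def classify_by_keywords(text):
    """Return the categories sharing the highest keyword-overlap count (empty if none match)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {node: len(words & terms) for node, terms in KEYWORDS.items()}
    best = max(scores.values())
    return sorted(node for node, score in scores.items() if score == best and score > 0)


clear = classify_by_keywords("The bank approved the loan")
ambiguous = classify_by_keywords("We walked along the bank")
missed = classify_by_keywords("Banks raised rates")
print(clear, ambiguous, missed)
```

The second sentence ties between both categories because the list cannot tell which sense of "bank" is meant, and the third matches nothing because the plural "banks" is not on the list. Patching such cases means adding ever more terms by hand.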

2. Statistical approach. Statistical systems use a “training set” of documents that talk
about the same topic and apply different algorithms (Bayesian, LSA or many others) to
extract sets of key elements of the documents to build implicit rules that will then be used
to classify content. These systems have no understanding of words, nor can the system
administrator pinpoint why specific terms are selected by the algorithm or why they are
being weighted. In the event the classification is incorrect, there is no accurate way to
modify a rule for better results. The only option is to select new training documents and
start again. As content changes, rules must be recreated to keep up with changes in
the content being ingested. Finally, let’s not forget that in reality, most organizations do
not have a training set of documents for each node of the category, which inevitably
causes accuracy and scalability issues. This approach sounds appealing to many because
it seems fully automatic (although training requires a significant amount of time and
manual work) and sounds wonderful due to the savings in time and manpower. In reality,
the idea that a computer system can magically categorize content has created exaggerated
expectations which have not only proven to be unrealistic and unreliable, but also
detrimental to the industry.
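
For readers who want to see what a Bayesian classifier of this kind looks like, here is a minimal sketch of multinomial naive Bayes with add-one smoothing, written with the Python standard library only. It is a textbook technique, not any vendor's implementation, and the training sentences, category names and function names are all invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return (best category, log-score per category)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            count = word_counts[category][word]
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[category] = score
    return max(scores, key=scores.get), scores


# Invented training set: two sample documents per category.
training_set = [
    ("oil tanker shipment delayed at the canal", "Logistics"),
    ("cargo shipment rerouted around the port", "Logistics"),
    ("clinic reports outbreak of illness", "Health"),
    ("hospital treats illness after outbreak", "Health"),
]
model = train(training_set)
label, scores = classify("shipment delayed at port", model)
print(label, scores)
```

The result is a category plus a set of log-probability scores. Nothing in the model is a rule that can be read or edited directly, so changing its behavior means changing the training documents and retraining, which is the limitation described above.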

3. Rules-based approach. In this approach, rules are written to capture the elements of a
document that assign it to a category; rules can be written manually or generated with an
automatic analysis and only then validated manually (a time savings of up to 90%). Rules
are deterministic (you know why a document is assigned to a category), flexible,
powerful (much more than a bag of keywords) and easy to express. The best option is to
use a true semantic technology to implement semantic rules (rules can be written at the
keyword or linguistic level), which makes it possible to leverage many elements to work
faster and obtain better results:

A. A semantic network provides a true representation of the language and how
meaningful words are used in the language in their proper context.

B. With semantics, rules are simpler to write and much more powerful, making it
possible to work with a higher level of abstraction.

C. Once rules are written, the classification system provides superior precision and
recall because semantic technology allows words to be understood in their proper
context.

D. Once the system is deployed, documents that do not “fit” into a specific category
are identified and automatically separated, and the system administrator is able to fully
understand why they were not classified. The administrator can then make an informed
decision on whether to modify an existing rule for the future, or create a new class for
content that has not been previously identified.
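
The deterministic behavior described above, including the handling of documents that fit no category, can be sketched as follows. This is a keyword-level illustration only; it does not attempt the semantic (linguistic-level) rules the text recommends, and the rule names, categories and the `classify_with_rules` function are hypothetical.

```python
import re

# Hypothetical rules: (rule name, category, words that must all appear in the text).
RULES = [
    ("shipping-delay", "Logistics", {"shipment", "delayed"}),
    ("disease-outbreak", "Health", {"outbreak", "illness"}),
]


def classify_with_rules(text):
    """Return (category, rule name) for every rule that fires.

    Texts that satisfy no rule are set aside as UNCLASSIFIED for review.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = [(category, name) for name, category, required in RULES
               if required <= words]
    return matches or [("UNCLASSIFIED", None)]


hit = classify_with_rules("Shipment delayed at the canal")
miss = classify_with_rules("Quarterly earnings rose")
print(hit, miss)
```

Because each result names the rule that fired, an administrator can see why a document landed where it did, and the unclassified pile shows where a rule needs editing or a new category is missing.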

Why Semantic Intelligence is the best option.

Semantic technology is a way of processing and interpreting content that relies on a
variety of linguistic techniques/processing, including text mining, entity extraction,
concept analysis, natural language processing, categorization, normalizing, federating and
sentiment analysis. Semantic technology allows for the automatic comprehension of
words, sentences, paragraphs, and entire documents, and is able to understand the
meanings of words expressed in their proper context, no matter the number (singular or
plural), gender, verb tense or mode (indicative or imperative).

As opposed to keyword and statistical technologies that process content as data, semantic
technology is based on not just data, but the relationships between and the meaning of the
data. This ability to understand words in context is what makes automatic classification
possible, and what allows it to not only manage the chaos of our data, but optimize it for
even further analysis and intelligence.