
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 10, OCTOBER 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 71

Lexeme: An Ontology-Based Semantic Advertising Networks
Lilac A. Al-Safadi, Aseel Al-Dawood, Nadeen Al-Abdullatif

Abstract— Lexeme is a prototype of an advertising network that connects Web sites that want to host advertisements with advertisers who want to run advertisements. Lexeme aims to implement better approaches for reaching and attracting target customers by integrating semantic Web technology, which enables computers to know what particular ads mean, to know what particular Web sites are about, and to understand the relationships between them all. Advertising networks' reliance on only the keywords in the content results in displaying irrelevant and unappealing ads on the Web page. Semantic-based advertising networks move beyond simple keywords by understanding all the words on a page and how they relate to one another. In Lexeme, the description of ads and Web site content relies on an ontology that represents the conceptualization of the knowledge domain. The advertiser defines the concept that corresponds to the product or service to sell in the ad, along with properties specifying the characteristics of the product and its relationships with other concepts. The paper proposes a novel approach for matching ads with Web site content using semantic Web technology, illustrated by the Lexeme prototype.

Index Terms— Intelligent Web Services and Semantic Web, Ontology, Electronic commerce, Context Analysis and Indexing.


1. INTRODUCTION

THE escalation of the Internet as a consumer medium in the 1990s is unprecedented in the history of media. The Internet made its way into our homes and offices faster than any other medium (or appliance), reaching fifty-million users in only five years (it took radio thirty-six years to get to that point) [6]. The power of the Internet is especially evident in the business industry as marketers and advertisers realize the significant financial rewards the Web can offer. One of the goals of marketing is contacting potential customers directly, without investing in a big ad or research budget. This is possible with the Internet. Rapid growth in the number of online venues has led advertisers to market directly, as marketing and advertising campaigns are using the Web as an efficient pathway into our homes and lives. While banner ads on Web sites sometimes get no more than one percent response (compared to five to seven percent for direct mail), the cost is dramatically less. A Web site may cost $5 for each 1,000 people who view a marketing message, while direct mail could average $50 to reach the same 1,000 sets of views [2]. Nevertheless, the basics of marketing continue to hold true.

1.1 Advertising Networks
Advertising networks or ad networks are companies that connect Web sites that want to host advertisements with advertisers who want to run advertisements. Nevertheless, it is a mistake for these Ad networks to think that users are willing to sift through Web pages before they stumble on appealing ads. Advertising networks can only be effective when they are able to bring resources and products that are of interest to the Web user. AdSense (www.google.com/adsense) is one of the most famous ad networks. It presents ads based on tracing the content around the page and matching keywords with advertisements. However, advertising networks' reliance on only the keywords in the content without an accurate interpretation of the context of the page, results in displaying irrelevant and unappealing ads on the Web page [14].

1.2 Semantic Web
Semantics is a field that studies the meaning of words, phrases, sentences, and larger units of discourse [15]. Semantic technologies enable semantic interoperability for IT systems with different data structures, formats, and vocabularies, without changing the core systems themselves [17]. The introduction of semantics into the advertising networks will solve some of the failed attempts at reaching target audiences and increase the potential to leverage existing information on the Web for far greater benefits to advertisers.

Unlike humans, computers do not possess a range of vocabulary understanding. In order to understand what words mean and what the relationships between words are, a computer has to have documents that describe all the words and logic to make the necessary connections [13]. In the semantic Web, these documents come from ontologies and schemata. In terms of the Internet, an ontology is a file that defines the relationships among a group of terms [8]. A schema is a method for organizing information.

————————————————
• L. A. Al-Safadi, A. Al-Dawood, N. Al-Abdullatif are with the Department of Information Technology, College of Computer & Information Sciences, King Saud University.
1.3 The Relationship between the Semantic Web and Advertising
In the early days of online advertising the targeting of advertisements to different types of content pages was crude. Targeting was first done on a "run of network" (RON) basis [7]. In RON, the advertiser purchases banner inventory across an ad network's entire range of sites. RON is a type of non-targeted advertising.

Target advertising is reaching the right person with the right message at the right time, which is the ultimate goal of advertising. The ability to deliver relevant messages to specific targets is among the significant and distinct attributes of online advertising. The range of available online targeting options is vast and becoming increasingly complex [18]. To apply target advertising, ad networks often aggregate sites into specific categories or demographic groups, then sell ad inventory to advertisers on specific sites within those categories or demographics.

Many online advertisers focus on contextual targeting. Contextual advertising analyzes the content of Web sites and displays relevant ads accordingly. By using contextual targeting, advertisers increase the probability that their ads will reach people who are in the market for their products. Contextual keyword-targeted advertising lets advertisers select individual pages or articles based on keywords appearing in those pages or articles. Today, Google is the largest contextually targeted advertising network in the world, with billions of pages served daily across thousands of Web sites.

Some companies are beginning to understand the advantages of semantic technology for determining the context and placement of ads. PEER39 [3] takes into account the meaning of the entire Web page instead of portions of it. It references a virtual database of potential meanings and literal connections for the keyword. PEER39 does not deal with RDF or ontologies. Instead, it relies on Natural Language Processing (NLP) and Machine Learning, building algorithms that simulate the human mind by allowing computers to process and understand human languages, in order to achieve the desired semantics. iSense [1] distinguished itself from natural language, algorithmic-based semantic classification systems by having a team of 40 lexicographers and linguists assign words from a dictionary to a framework of knowledge categories.

This paper presents Lexeme, a Semantic Advertising Network prototype specializing in educational ads. Lexeme is a context-based ad network that incorporates Semantic Web technology, enabling computers to know what particular ads mean, to know what particular Web pages are about, and to understand the relationships between them all. This is achieved by: 1) making disparate ad and Web site content semantically interoperable so that data can be retrieved and aligned automatically and on demand, and 2) providing techniques and tools to enable machines to intelligently search and match related ads with publishers over their content.

The proposed prototype aims at overcoming the limitations of traditional keyword-based advertising networks, which spend energy, time, and money pursuing the wrong prospects and marketing to the wrong channels.

The remainder of the paper is organized as follows. In the next section we present the general model of our proposed semantic advertising network framework. In section 3, we discuss how ad content is entered into the system and conceptualized based on the ontology of the products' domain, as well as its analysis on the server side. In section 4, we discuss the execution of the ad-publisher matching process and how ads are placed on the resulting publishers. The experimental results and conclusions are given in sections 5 and 6 respectively.

2. THE PROPOSED MODEL
Figure 1 below illustrates how Lexeme works. Lexeme begins with a publisher registering his/her Web site in Lexeme. Lexeme analyses the publisher's Web pages, semantically indexes their content, and stores it in its database. When an advertiser registers in Lexeme and uploads ad content, Lexeme selects appropriate publishers based on the semantic content of the uploaded ad and displays the ad on the relevant publishers' Web sites. It then tracks the number of clicks received from each visitor of the publisher's Web site and directs them to the advertiser's Web site.

Fig. 1. An abstract description of Lexeme ad matching process

2.1 Lexeme Architecture
Semantic advertising networks provide the capability to match ads with relevant publishers on the basis of their meaning, or semantics. The core idea of Lexeme is to use logical languages to make the structure and meaning of content explicit, and to attach this information directly to the content, so that at run-time, automated procedures can determine whether and how to align the content of both ads and publishers' Web sites.

Figure 2 below illustrates Lexeme's proposed architecture, which consists of two main phases: the conceptualization of the content of the ads and Web sites, and the match of the content of the ads and Web sites.

Conceptualization refers to the set of objects, concepts and other entities that exist in a document in some area of interest, together with the relationships that hold among them. A conceptualization is like a group of concepts related to a particular slice of reality (domain) [5].
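As a rough illustration of the two phases described for the architecture (conceptualization, then match), the following minimal sketch ranks publishers by how many concepts they share with an ad. It is not the authors' matcher: the paper relies on RDF graphs and a semantic framework, whereas this sketch assumes each ad and Web site has already been reduced to a plain set of concept labels and uses Jaccard overlap as a stand-in similarity. The site names, the concept sets, and the function `rank_publishers` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical publisher index: site name -> concepts found on the site.
# Names and concept sets are illustrative only, not data from the paper.
PUBLISHER_INDEX: Dict[str, Set[str]] = {
    "campus-bookstore.example": {"Notebook", "Stationary", "Textbook"},
    "student-tech.example": {"Notebook", "Computer", "Software"},
    "travel-blog.example": {"Flight", "Hotel"},
}


def rank_publishers(
    ad_concepts: Set[str], index: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank sites by Jaccard overlap with the ad's concepts (assumed measure)."""
    scored = []
    for site, concepts in index.items():
        union = ad_concepts | concepts
        score = len(ad_concepts & concepts) / len(union) if union else 0.0
        if score > 0:  # drop sites that share no concept with the ad
            scored.append((site, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranked = rank_publishers({"Notebook", "Computer"}, PUBLISHER_INDEX)
for site, score in ranked:
    print(f"{site}: {score:.2f}")
```

A set-overlap score is only a placeholder for semantic matching; it ignores the class hierarchy and property values that an ontology-based matcher would take into account.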
Fig. 2. A detailed description of the Architecture of Lexeme

Tagging data with descriptive metadata is a key strategy for using semantic technology. In the conceptualization phase, the publisher tries to describe the context of the Web site as a sequence of terms through the Web site Registration module. A knowledge domain ontology [4, 10] is used to express the ad as a set of concepts associated with their corresponding properties. Lexeme was developed and tested using the education advertising domain ontology (EAO) [9].

Before storing the Web site content, the RDF Extractor Engine extracts the RDF model from the Web site and converts it into an RDF graph. An RDF graph is a set of RDF triples (subject, predicate, and object). Triples are the basis of information representation. The graph and the Web site's data are stored in Lexeme's database.

At the advertiser's site, the advertiser creates his/her ad through the Ad Registration module. The Term Extractor is a semi-automatic tool used to assist advertisers in selecting the concepts that best describe their ad. The selected concepts are transformed to RDF syntax (model) through the Concept Mapper. The RDF model is converted into an RDF graph through the Ad Graph Constructor module.

In the match phase, the Semantic Matcher matches the ad's RDF against the publishers' RDF repository to find the most relevant Web sites. The Ad Placer associates and places the ad in the most relevant Web sites.

3. CONTENT CONCEPTUALIZATION
In this section we explain in detail the content conceptualization process of both the ad and the Web site. In our proposed model, the advertiser describes the content of the ad, relying on the EAO ontology, as a concept followed by a set of attribute-value pairs, which is then interpreted and matched against different Web sites (publishers). When the advertiser registers, the system suggests a number of terms that may describe the ad, depending on the content of the advertiser's Web site and the ontology. In addition, the advertiser may contribute by choosing a number of terms that describe the content of the ad. For example, in our case, the term "Notebook" was supplied by the advertiser.

Lexeme checks the EAO ontology content and identifies the sub-tree to which these concepts belong. Below is the sub-tree representation of the term "Notebook" that appeared in the advertiser's Web site.

Fig. 3 (a). "Notebook" as a subclass of "Computer" in EAO

Fig. 3 (b). "Notebook" as a subclass of "Stationary" in EAO

Figure 3 shows that "Notebook" belongs to two different parent nodes, "Stationary" and "Computer", which share the same ancestor "Product". Lexeme interactively presents a list of options, including one that consists of confirming the sub-tree, as shown in figure 4.

Fig. 4. List of terms suggested by Lexeme for the advertiser

For refinement, after the parent concept is chosen, the system suggests a list of properties, expressed in natural language, which correspond, in our case, to the "notebook" as a "computer" product, namely: "is colored", "is priced for", "is dimensioned", and "is manufactured by", as shown in figure 5. The advertiser selects a property and then assigns a specific value to it. In our example the price of the product was set to "2000" and the manufacturer to "Asus". These are the values assigned to the selected properties "is priced for" and "is manufactured by".
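The disambiguation and refinement steps of the conceptualization process can be sketched as follows. This is a minimal illustration, not Lexeme's implementation: the class hierarchy is a toy fragment holding only the four EAO classes named in the text, the helper names `subtrees` and `describe` are hypothetical, and the property names `hasPrice` and `hasManufacturer` are borrowed from the RDF example in Fig. 6. The sketch lists every path from a term up to its root class (the options the advertiser would confirm) and turns the chosen property values into (subject, predicate, object) triples.

```python
from typing import Dict, List, Tuple

# Toy fragment of the EAO class hierarchy, stored as child -> parents.
# Only classes named in the text are included; the real ontology is larger.
PARENTS: Dict[str, List[str]] = {
    "Notebook": ["Computer", "Stationary"],
    "Computer": ["Product"],
    "Stationary": ["Product"],
    "Product": [],
}


def subtrees(term: str) -> List[List[str]]:
    """Return every path from `term` up to a root class."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for parent in parents for path in subtrees(parent)]


def describe(concept: str, properties: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Turn property/value pairs into (subject, predicate, object) triples."""
    return [(concept, prop, value) for prop, value in properties.items()]


# The advertiser is shown both readings of "Notebook" and confirms one.
options = subtrees("Notebook")
chosen = options[0]  # the "Computer" reading, as in the paper's example

# Values entered by the advertiser in the paper's example.
triples = describe("Notebook", {"hasPrice": "2000", "hasManufacturer": "Asus"})

print(options)
print(triples)
```

Keeping the hierarchy as a child-to-parents map makes multiple inheritance explicit, which is what forces the confirmation step when a term such as "Notebook" has more than one parent.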
Fig. 5. Properties of the selected concept

The formulated conceptualization of the publisher's Web site is sent to the Lexeme server in charge of analyzing it and performing an HTTP request to the Lexeme application server. The server analyzes the RDF-based response returned by the application server, extracts the information corresponding to the matching ad, and directs it to the ad spaces placed on the publisher's Web site.

3.1 The Conceptual Representation
RDF and OWL are the core technologies of the semantic Web and are recommended standards maintained by the W3C. The Lexeme ontology (EAO) is written in the OWL language and its inputs are described using the RDF language. OWL stands for Web Ontology Language. It is an ontology language that represents one of the core elements in the semantic Web [5]. RDF stands for Resource Description Framework. It is a language for describing relationships between resources using specific vocabularies. Figure 6 shows the RDF code generated by Lexeme for the matching process.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:eao="http://www.lexeme-ads.com/EAO.owl/">
  <rdf:Description
      rdf:about="http://www.lexeme-ads.com/#Notebook">
    <eao:hasPrice xml:lang="en">2000$</eao:hasPrice>
    <eao:hasManufacturer xml:lang="en">Asus</eao:hasManufacturer>
  </rdf:Description>
</rdf:RDF>

Fig. 6. The RDF Representation of the previous example

4. ADS AND WEB SITES MATCH
According to [16], every Web page that wants to be part of the Web 3.0 era must include a semantic mark (RDF labels) which is described in an OWL document on the Internet. This condition stipulates that, in order for Lexeme to semantically match ad content with Web page content, the publisher's Web page must have semantic markup embedded in its HTML or XHTML code, for example as RDFa. Therefore, Lexeme asks its publishers to mark up their pages with RDFa labels, since RDFa is a W3C recommendation. If the publisher is not familiar with how to mark up his/her page, he/she can simply select the concepts that represent his/her Web page.

Lexeme uses Jena Semantic Web Framework 2.6.2 as its semantic framework. The prototype that links advertisers to publishers and vice versa was developed using the Java 1.6 Software Development Kit (SDK) and the NetBeans IDE 6.8 code-editing tool. In addition, a few experts were asked to gather some samples that they considered to be related to the educational domain. We then made sure that they were 100% strict XHTML pages and manually marked them up with RDFa. Java-RDFa 0.3 was used to extract RDF from Web sites. Jena comes with a number of built-in functions.

Among the set of publishers returned by Lexeme on which it displayed the ad, we selected the result shown in the figure below.

Fig. 7. Displaying the result of Lexeme match

5. EXPERIMENT RESULTS
Experiments were carried out using ten educational Web sites. Table 1 shows the precision of matching ads with relevant Web sites using both keywords and semantics. Precision was computed using the standard precision measure:

Precision = Correct System Answers / System Answers

TABLE 1. EXPERIMENTAL RESULTS FOR KEYWORD AND SEMANTIC ADVERTISING NETWORKS PROTOTYPES

Publisher Sample    Keyword-based    Semantic-based
Website 1           36 %             67 %
Website 2           60 %             25 %
Website 3           48 %             67 %
Website 4           37 %             75 %
Website 5           26 %             50 %
Website 6           68 %             100 %
Website 7           29 %             75 %
Website 8           32 %             50 %
Website 9           0 %              100 %
Website 10          0 %              50 %

The precision of textual description ontology retrieval is higher than that of keyword-based search for nine of the ten Web sites in Table 1, Website 2 being the exception. However, we find that ontology retrieval is still hampered by the lack of information within the Web page, ad and ontology. For example, if no related concept and relationship is extracted from the surrounding text, the generated class of the ad is void. In addition, keyword-based matching runs faster than ontology-based matching.

In addition, there are gaps between the ontology results and the optimal results in most cases. We presume this could be due to one or more of the following reasons: first, the performance could be affected by the accuracy of the ontology class classification or the completeness and size of the ontology; second, the case sensitivity of Protégé entries; third, the density and comprehensiveness of information within the Web page and the ad description; fourth, the accuracy of the rule-based engines and matcher we used. Further study is needed to identify reasoners that provide the best matchmaking results.

6. CONCLUSION
In this paper we presented a prototype of a semantic advertising network. It matches ad and Web site content using semantic technology and an ontology. The match guidance is based on the ontology conceptualizing the commercial products in the education knowledge domain. Indeed, an ad is represented as a set of concepts naming the product and a series of property/value pairs that precisely characterize the matching Web site's content. This feature remedies the limitations related to the low precision of keyword-based advertising networks. Based on the conducted experiments, the semantic advertising network shows an overall increase in the precision of the ad-publisher content-based match compared to keyword match in the specific knowledge domain. An increase in precision is a very important guideline for measuring the success of an advertising campaign, because marketing to the right customer means spending less energy, less time, and less money compared to pursuing the wrong prospects and marketing to the wrong channels. However, the performance is much slower.

ACKNOWLEDGMENT
The authors wish to thank Arwa Al-Tameem, Ghada Abuguyan, May Abu Melah, Nora Al-Zaid, and Nouf Al-Najran for developing the prototype.

REFERENCES
[1] "isense" [Online]. Available: http://www.isense.net/. [Accessed Nov. 19, 2009].
[2] "new ways to target your customer" 2006. [Online]. Available: http://www.emarketer.com. [Accessed Nov. 3, 2009].
[3] "online ad targeting: engaging the audience" 2006. [Online]. Available: http://www.emarketer.com. [Accessed Nov. 14, 2009].
[4] "Over 875 million consumers have shopped online—the number of Internet shoppers up to 40% in two years", Jan. 28, 2008. [Online]. Available: http://en-us.neilsen.com/main/news/news_releases/2008/jan/over_875_million_consumers. [Accessed Nov. 16, 2009].
[5] D. Allemang and J. Hendler, "Semantic web for the working ontologist: effective modeling in rdfs and owl". Burlington, Massachusetts: Morgan Kaufmann, 2008.
[6] G. Schneider and J. Evans, "new perspectives on the internet", 7th ed. Boston: Course Technology, 2008.
[7] J. Plummer, S. Rappaport, T. Hall, and R. Barocci, "The Online Advertising Playbook". Hoboken, New Jersey: Wiley, 2007.
[8] J. Strickland, "How Web 3.0 will work," [Online]. Available: http://computer.howstuffworks.com. [Accessed Oct. 31, 2009].
[9] L. Al-Safadi, N. Abdulateef, "Educational Advertising Ontology: A Domain-Dependent Ontology for Semantic Advertising Networks". Journal of Computer Sciences 6(9), 2010.
[10] L. Bielski, "The Art and science of Web marketing", ABA Banking Journal, vol. 100, no. 9, 2008.
[11] M. Marshall, "The semantic Web & its implications on search marketing," [Online]. Available: http://www.searchenginejournal.com/. [Accessed Oct. 31, 2009].
[12] S. Mostefai and A. Bouras, "Merging Domain Ontologies for PLM", unpublished.
[13] T. Wilson, "How semantic Web works," [Online]. Available: http://computer.howstuffworks.com. [Accessed Oct. 31, 2009].
[14] W. D. Wells, S. Moriarty, and J. Burnett, "Advertising: Principles and practice", 7th ed. New York: Prentice Hall, 2005.
[15] Wikipedia. [Online]. Available: http://www.wikipedia.com. [Accessed Oct. 31, 2009].
[16] Y. Li, Y. Wang, and X. Huang, "A Relation-Based search engine in semantic web," IEEE Transactions On Knowledge And Data Engineering, vol. 19, no. 2, pp. 273-282, Feb. 2007.
[17] A. Cregan, "Overview of Semantic Technologies", Handbook of Ontologies for Business Interaction, pp. 1-20, 2008, ISBN10: 1599046601.
[18] M. Brown, "Targeting Online Ads: Aim for the Bulls-eye or Focus on Hitting the Target?" [Online]. Available: http://www.dynamiclogic.com/na/research/whitepapers/docs/POV_Targeting_Mar09.pdf. [Accessed July 30, 2010].
