
Bachelor's Thesis

Integrating Semantic MediaWiki with Exhibit to Accelerate the Adoption of a Semantic Web
Fabian Howahl

Contents

1 Introduction
1.1 Motivation
1.1.1 Crossing the Boundaries
1.1.2 Increasing the Availability of Semantic Data
1.2 Related Work
1.2.1 Data Exchange
1.2.2 Attraction of Casual Users
1.3 Structure of the Thesis

2 An Overview of the Two Components
2.1 Semantic MediaWiki
2.1.1 Basic Concepts
2.1.2 Leveraging Semantics
2.2 Exhibit
2.2.1 Feeding Exhibit with Data
2.2.2 Presentation
2.2.3 Wibbit

3 Integration of Exhibit and Semantic MediaWiki
3.1 Integrating Exhibit as Result Printer
3.1.1 Challenges and Goals
3.1.2 Structure of Result Printers
3.1.3 Wibbit as Basis of an Exhibit Result Printer
3.1.4 Query Design
3.1.5 Implementation
3.1.6 Advanced Features
3.2 Enabling Exhibit to Fetch Data from Remote Wikis
3.2.1 Architecture
3.2.2 Data Format
3.2.3 Data Model
3.2.4 Design of a JSON Exporter
3.2.5 Implementation
3.3 Using Exhibit to Enable Data Exchange Between Semantic MediaWikis
3.3.1 Design of Remote Queries
3.3.2 Implementation
3.3.3 Advanced Features

4 Discussion and Outlook
4.1 Visualization of Query Results
4.2 Use of Wiki Data in Other Web Applications
4.3 Data Exchange
4.4 Outlook

Abstract
This thesis promotes the integration of Semantic MediaWiki, a wiki engine for the collaborative management of structured data, and Exhibit, a rich visualization framework. It argues that this integration helps mitigate two known problems that delay the adoption of a semantic web: the limited availability of semantic data on the web, and the fact that data is rarely shared and reused. The integration addresses these issues by attracting more casual users to engage with semantic technologies and by enabling a convenient method of data exchange between two wikis. The integration is organized into three steps. The first step establishes Exhibit as a new interface for viewing query results in Semantic MediaWiki. Exhibit's visualization widgets (e.g. maps and timelines) make wiki content more meaningful and relieve users of analytic tasks. With features such as faceted browsing, sorting and aggregation of wiki content, Exhibit helps to tame the high information loads that wikis tend to show. Next, a JSON exporter for Semantic MediaWiki is designed and implemented to make the data of a Semantic MediaWiki available throughout the web. By this means, wiki data finds its way into other web applications such as blogs or content management systems. Finally, the previous outcomes are used to enable data exchange among Semantic MediaWikis.

INTRODUCTION

1 Introduction
1.1 Motivation
The vision of a semantic web continues to keep researchers and practitioners busy. As a cornerstone of this vision, Tim Berners-Lee, who is renowned for the invention of the World Wide Web, speculated in a journal article [1] on what a future semantic web could look like. Even though such ideas date back to the beginning of this century, essential elements of the vision remain unimplemented [2]. The definition given by the World Wide Web Consortium (W3C) [3] is a suitable point of reference for outlining Berners-Lee's ideas and the related construction sites:

"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."

An analysis of this short definition and of recent publications such as "Six Challenges for the Semantic Web" [4] reveals two urgent reasons for the delayed adoption of the semantic web:

- The amount of available semantic data on the web is still limited
- Data is not shared and reused enough, particularly across boundaries

This thesis considers the integration of Semantic MediaWiki (SMW) and Exhibit to be a part of the solution to these shortcomings. Semantic MediaWiki is a wiki engine that enables users to create, manage and leverage structured content, whereas Exhibit's expertise lies in the visualization of structured content. The integration of these components yields the following two contributions that help mitigate the described shortcomings:

- It enables convenient data exchange among wikis to push reuse and sharing of information
- It advances the attraction of casual users, who are encouraged to create semantic content and thereby increase the availability of semantic data on the web

The following two sections explain the shortcomings and the approaches pursued to fix them in more detail.

1.1.1 Crossing the Boundaries
Although the idea of a semantic web places much emphasis on data exchange and coalescence across boundaries, small semantic islands have emerged. These islands are populated by rather small communities sharing a special domain of interest. Interaction between these islands rarely happens, let alone data exchange.

The Status Quo of Data Exchange among Wikis
This thesis focuses on specific islands: semantic wikis, more precisely semantic wikis powered by the Semantic MediaWiki software. Each installation of a semantic wiki is operated by a different community that maintains an individual, semantically enriched knowledge base. Wiki software does attempt to stimulate data exchange among wikis by offering certain features, but the procedure for exchanging data turns out to be fairly tedious. The exchange relies on file imports and exports. The user has to trigger data imports and exports in the involved wikis manually and has to buffer the resulting files on a local computer before they reach the target wiki. It is therefore difficult to keep the data of multiple wikis synchronized, as imports and exports have to be initiated again and again.

Exhibit's Contributions
This thesis regards Exhibit, a lightweight framework for rich visualizations, as an enabling technology for a more sophisticated way of exchanging data among wikis. Exhibit can equip Semantic MediaWiki with new capabilities for obtaining and processing data. It enables SMW to query and fetch data from remote wikis on the fly. In addition, Exhibit enables data exchange between SMWs and other web applications such as content management systems or blogs.


1.1.2 Increasing the Availability of Semantic Data
Another challenge lies in the fact that the semantic islands introduced above are not yet populated by a wide range of web users. Casual users in particular stay away from these islands and therefore deserve special attention. Casual users use the web for recreational rather than professional activities. They take a proactive interest in exploiting the web for their purposes, but they are limited in their actions by a lack of technical understanding. They have been a critical factor for the advancement of the World Wide Web in its past development cycles: in the early stages of the web they provided data through personal websites [5], and today they spread their knowledge through wikis and blogs even more frequently. The entire Web 2.0 relies on the activity of casual users, since it is driven by user-generated content [6].

The success of the semantic web also depends on user contribution. Most of the data on the web has to be semantically enriched, and thus machine-processable, to form a semantic web with all its possible advantages. Providing such content is not a task that only big publishers are expected to fulfill. To create large amounts of semantic data, casual users have to assist as well. That is why it is necessary to get casual users on board and encourage them to create such data. Two basic requirements for attracting casual users can be identified [7]:

- Participation has to be simple
- Results have to be visible and tangible immediately

The Status Quo of Wiki Usability
Semantic wikis, especially Semantic MediaWiki, already do a good job of simplifying participation in the semantic web. They promote the use of wikis as a foundation for the storage of semantic data. As this approach links the advantages of social software to the semantic web, a semantic wiki enables users to create structured data easily in a community-oriented process. The community effect does its part in speeding up the availability of new data.
Although end users have to be familiar with basic concepts of structuring data, this model can be considered successful, since it hides most of the complicated technologies of the semantic web that tend to overburden users [5] and even researchers and practitioners [8]. The fast-growing number of personal, non-commercial wikis on the web as well as a user study [9] suggest that wikitext, the markup language for wiki articles, is easy to learn. Wikitext is plain text with certain markup insertions. It aims at simplifying HTML [10], but the language is still evolving and continuously gains new functions that exceed the capabilities of HTML. SMW is thus well prepared for casual users. However, it is hard to convince a casual user of the benefit of providing semantically enriched data instead of ordinary textual data. In comparison, the creation of an HTML file takes a few minutes and the results can be reviewed in a web browser promptly. The creator has control over the design of the page and can give free rein to his creativity. But what does the creator of structured data get? In the case of Semantic MediaWiki, a couple of ways to leverage semantics exist [11]:

- Querying data
- Browsing data
- Data export (RDF, RSS ...)

It is questionable, though, which of these features appeal to casual users. This thesis therefore aims at increasing the usability of these features for casual users. The subsequent section depicts how Exhibit and its ecosystem contribute here.

Exhibit's Contributions
Undoubtedly, the possibility of querying stored knowledge is one of the most valuable functionalities a semantic wiki has to offer. As it is debatable whether casual users can handle queries [12], SMW relies on an easy query language based on wikitext to give even inexperienced users an understanding of querying. Hence, this feature has the potential to appeal more strongly to casual users. Exhibit can assist here. It provides an incentive to create and query structured data for building customizable visualizations that users otherwise only know from big commercial websites such as news or shopping sites. To put the user in a position to create and consume these visualizations, this thesis pursues the integration of Exhibit as an interface for viewing query results.
At the same time, Exhibit is able to improve the browsing experience for such query results. As semantic data is prepared for consumption by machines, it is hardly consumable by human beings. As long as there are no agents on the web acting as proxies for human business, there have to be other ways to consume the information. Although Semantic MediaWiki does its best to guide the user through the data volumes, there is room for improvement. Exhibit's features such as faceted browsing, sorting and aggregation of content enable a clearly arranged presentation of structured data that puts the user in control of large data volumes.

SMW's current export capabilities only prove useful if there are applications that process the exported data (e.g. XML/RDF). At the moment casual users tend to refrain from using such applications (e.g. ontology editors), but there are additional formats for structured data that are more valuable to casual users. One of these formats is JSON, which has become successful because various web services (e.g. Google and Flickr) adopted it as an input or output format. It therefore makes sense to add JSON to the formats SMW can handle and export. Casual users can then feed remote web pages that rely on Exhibit with data. By this means they can incorporate data from an SMW into blog entries or personal websites.

1.2 Related Work


This section reviews existing projects that pursue similar goals and sets the contributions of this thesis apart from existing solutions.

1.2.1 Data Exchange
In terms of data exchange among wikis, a couple of solutions already exist. Most of them pursue a file-based approach, as the export and import functions of SMW do. The data of a wiki is exported as a file in a specific format and is then imported into the target wiki. The author of [13] proposes what a unified file format for wiki data exports should look like. Nevertheless, manual file transfers make data exchange tedious and slow. The goal of this thesis is to allow a convenient exchange of data on the fly. Popular application programming interfaces such as the Amazon¹ or Google Maps² API already allow data to be queried and reused easily within other web applications without requiring the user to intervene. This thesis attempts to make fetching data from SMWs as easy as getting data from Google Maps or Amazon.

1.2.2 Attraction of Casual Users
This section introduces other semantic repositories and elaborates on their appeal to casual users.

Wiki Systems
Semantic MediaWiki competes with other semantic wiki systems. The unique selling point of these systems is the way structured data is authored. ACEWiki [14] is able to interpret a subset of English; the semantics are extracted from simple English sentences. The authors of IkeWiki [15] promote a WYSIWYG (What You See Is What You Get) editor to create and edit content. The user study "Are wikis usable?" [9] suggests that wikitext can be learned quickly. Therefore this thesis sticks with wikitext and even extends it to control Exhibit's visualizations and data exchange.

Some extensions to SMW also attempt to improve usability. Halo³ is one example. With features like auto-completion and an ontology browser, it makes the gardening and browsing of data more user-friendly. However, the extension is not designed for casual users but intended for commercial use in a business environment. The Semantic Forms⁴ extension introduces a form-based way to edit wiki articles and add semantics to them. Additionally, many other MediaWiki extensions tackle the simplification of creating and editing data by applying the WYSIWYG paradigm.

Other Repositories
In 2007 Metaweb launched an online database called Freebase⁵. The database contains structured data, which can be contributed by any user. Freebase can be thought of as a large Semantic MediaWiki that holds data of every kind. Interestingly, the operators of Freebase recently tweaked the browsing interface: with Parallax⁶ it is possible to explore Freebase with the aid of Exhibit. Another repository of user-authored structured data is Many Eyes [16]. It allows users to upload data tables and visualize this data right on the website. The visualizations that Parallax and Many Eyes create are tied to one repository and displayed on one single website. The solution proposed in this thesis enables the visualization of data from different SMWs in different web applications (e.g. wikis, blogs and websites).

¹ http://aws.amazon.com/
² http://code.google.com/intl/en-en/apis/maps/
³ http://semanticweb.org/wiki/Project_Halo
⁴ http://www.mediawiki.org/wiki/Extension:Semantic_Forms
⁵ http://freebase.com

1.3 Structure of the Thesis


The present thesis is structured along the essential steps required to integrate Exhibit and Semantic MediaWiki. Chapter 2 introduces both components before the integration is tackled. The integration itself, like the third chapter that describes it, is organized into three parts. The first part describes how Exhibit can be used to visualize query results in SMW. The second part presents how to enable Exhibit to fetch data from remote wikis. Finally, ways of exchanging data between two SMWs are investigated. The thesis closes with an outlook summarizing the effects of the integration and expectations for the future.

2 An Overview of the Two Components


This chapter introduces Semantic MediaWiki and Exhibit. It also gives an impression of how value can be created with these applications.

2.1 Semantic MediaWiki


For many internet users Wikipedia is the one-stop resource for meeting their information needs. People browse Wikipedia articles for useful information that is pertinent to various daily situations. In most cases the retrieval of one article is sufficient to receive a suitable answer. For instance, to find the population figure of Boston, reading the first few lines of the article about Boston leads to success. More complicated information needs may require a more ambitious search for information that may be scattered across several articles. To find all American cities with a population greater than 500,000 people, we have to check every article dealing with an American city for its population figure. Since such a task is time-consuming, it is advisable to seek the assistance of computers here: processing large quantities of data is a task computers are naturally good at. However, computers have trouble scanning human-authored texts for information, as they are not able to process natural language on a semantic level.

One approach to giving computers an understanding of the information a wiki article contains is the attachment of semantic annotations. Annotations enhance wiki articles with structured knowledge, so that annotated articles hold cues the computer is able to exploit. Semantic MediaWiki adopts this approach.

Semantic MediaWiki is an extension of the popular MediaWiki⁷ software, which is famous for powering Wikipedia. It enables the enrichment of articles by adding semantic annotations. Whereas MediaWiki already offers a few concepts such as namespaces and categories to describe and classify wiki pages and make them meaningful to computers, SMW advances this approach and gives the wiki content a new significance. It considers wiki pages as entities. Semantic annotations set up relationships between these entities by using properties and the existing category system. Semantic MediaWiki is licensed under the GPL and therefore freely distributable. The home of the Semantic MediaWiki software is http://semantic-mediawiki.org. It hosts the main documentation, which is, along with "Semantic Wikipedia" [11], the source of the following information.

⁶ http://mqlx.com/david/parallax/

2.1.1 Basic Concepts
Before we can take advantage of semantically enriched content, we have to become acquainted with SMW's concepts and the annotation procedure. The following sections introduce the syntax and the semantics of SMW's basic constructs.

⁷ http://www.mediawiki.org/


Semantic Annotations
Instead of applying RDF/XML⁸ syntax, which tends to overburden most end users, Semantic MediaWiki attempts to hide complicated knowledge representation from users as much as possible. A simple syntax is beneficial, because a complicated syntax tends to impair the quality of annotations [18]. The syntax used for annotations in an SMW seamlessly fits the structure of wikitext. Thus a wiki author can rely on his existing wikitext knowledge, and the required adjustment is reduced to a minimum. The following example shows a wiki page about Boston as it could be found on Wikipedia, except that two semantic annotations are embedded in the running text:

Boston is located in the [[located in::USA]]. It was settled in [[settled::1630]].

The expression [[located in::USA]], which resembles a usual wikitext link, is such an annotation. It assigns the value USA to the Property located in. Whereas a normal link in wikitext (e.g. [[USA]]) only establishes a relationship between the articles Boston and USA, the typed link [[located in::USA]] describes this relationship in more detail. As [[settled::1630]] indicates, the value of a Property is not necessarily a wiki article. Property values can also be fixed values that are not represented by wiki pages. The concept of setting up and describing a relationship between two entities is often referred to as semantic glue [19]. While reading the text on the wiki page, the user will not notice the difference between a normal wiki link and a typed link. The semantic enhancements are only revealed to the user when viewing the wikitext of the page.

This annotation syntax offers an additional advantage. In most cases the annotation of textual information causes redundant data and additional overhead [20]. For instance, an HTML file that is enriched with RDF contains the same information twice. In the HTML part of the file the information is formatted for being consumed by human beings, whereas the RDF part makes the same information available to machines. Besides, certain markup is required for specifying RDF. In contrast, SMW reduces the necessary overhead as well as redundant information to a minimum. Links are an essential part of wiki pages for navigation purposes, and adding a short description to a link causes only minimal overhead. The traditional two-step process of entering data and annotating information (see [21]) is reduced to one step: authoring and annotating an article take place at the same time. As a result, less time is spent overall on creating annotations than with traditional approaches. This uncomplicated and less time-consuming way of annotating encourages even inexperienced (casual) users to refine the information they enter. Next we take a closer look at the description that is added to common wiki links. SMW calls this description a Property, which is the subject of the next section.

⁸ RDF (Resource Description Framework) is a common language to represent knowledge on the web. RDF is often specified by using an XML (Extensible Markup Language) syntax. For further information, please consult [17].

Properties
We learned that Properties are the appropriate instrument to set up and describe a relationship between two articles or between an article and a fixed value. Properties, like all core concepts of SMW, are represented by ordinary wiki pages. These wiki pages reside in the namespace Property. Property pages include textual explanations along with special semantic annotations to characterize the Property. The Property name can be chosen arbitrarily. Note, however, that SMW reserves certain Property names for special purposes; these are introduced where needed throughout this thesis. The first special Property is introduced in the next section.

Datatypes
We can add the special Property has type to Property pages to define the datatype of a Property value. The default datatype is wiki page. The type wiki page indicates that the value is another wiki article.
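To make the annotation syntax concrete, the following Python sketch extracts typed links of the form [[property::value]] from wikitext and shows the text a reader would see. It is only an illustration and not part of SMW: the regular expression and the function names extract_annotations and render are assumptions made for this example, and syntax variants beyond the simple form used on the Boston page are ignored.

```python
import re

# Hypothetical pattern for typed links such as [[located in::USA]].
# Plain links such as [[USA]] contain no "::" and are therefore not matched.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


def extract_annotations(wikitext):
    """Return a list of (property, value) pairs found in the wikitext."""
    return [(prop.strip(), value.strip())
            for prop, value in ANNOTATION.findall(wikitext)]


def render(wikitext):
    """Approximate the displayed text: each annotation is shown as its value."""
    return ANNOTATION.sub(lambda match: match.group(2).strip(), wikitext)


text = ("Boston is located in the [[located in::USA]]. "
        "It was settled in [[settled::1630]].")
annotations = extract_annotations(text)
print(annotations)
print(render(text))
```

The sketch treats every value as a plain string. Whether a value such as 1630 denotes a wiki page or a fixed value is not visible in the annotation itself; in SMW this is decided by the datatype of the Property.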
Since the value USA in our example is represented by a wiki article, the article Property:located in is not expected to hold a has type annotation, as wiki page is the default type. To declare a value as fixed, SMW provides various built-in datatypes such as numbers, dates or strings. For instance, it makes sense to add [[has type::date]] to the article Property:settled, since settled seems to describe a point in time. Information about datatypes is valuable to SMW, since datatypes affect operations on values (e.g. sorting) and the appearance of values.

2.1.2 Leveraging Semantics
After having introduced the methods for creating structured data, it is fair to ask how we can benefit from the time invested in adding annotations (little as it was). Whereas the introduction only listed the important features for exploiting structured content, the following sections describe these features more explicitly.

Browsing & Searching
If the user sets a certain flag in the SMW configuration file, a so-called Factbox (see Figure 1) appears at the bottom of every wiki article. The Factbox summarizes the semantic annotations a page incorporates. Additionally, it features numerous functions for browsing the wiki content. When we click on a Property name, we are directed to the corresponding page in the Property namespace. A click on one of the magnifying glasses enables us to search for certain Property values. Besides the Factbox, we can take advantage of various special pages that facilitate browsing of stored data. For instance, the Special:Browse page holds an interface for traversing the wiki articles by means of their relationships. For data retrieval purposes the two pages Special:SearchByProperty and Special:PageProperty can be considered helpful.

Figure 1: A Factbox as it would appear on a page about Boston

Queries
After having maintained a great number of semantic annotations, a remaining issue is the retrieval of the information we added to the wiki articles. As lists play an important part here, the next paragraph illustrates the relevance of lists to the wiki world.

Lists can answer a frequently occurring type of question. This type of question is characterized by the need to compile information from different wiki pages and to compare it. This includes questions like: What are the five biggest cities in the United States with a female mayor? In exceptional cases the Wikipedia authors already maintain a list holding the answer to our question. Indeed, Wikipedia contains a list of cities in the United States ordered by population, but the requirement of having a female mayor is not taken into account. One has to create a further list to meet the demand of this unusual question. As such lists have to be created manually, their compilation costs time. Therefore, only a few lists answering generic questions are available on Wikipedia. Semantic MediaWiki can speed up the creation of lists: it features the automatic generation of lists based on the data collected from annotations. In short, the automated compilation of lists yields the following advantages:

- Lists gather information scattered across the wiki
- Automatically created lists ensure that the gathered data never gets outdated
- The user can influence how the information is arranged and displayed

The instrument for generating such lists, and thus exploiting the listed advantages, is a query. SMW distinguishes between normal queries entered on the Special:Ask page and inline queries that embed the query results in a wiki article. The Special:Ask page enables a form-based specification of a query. Within this thesis the concept of inline querying is preferred in most cases.
So it is important to get acquainted with the inline query syntax in advance. The following inline query gathers, on one wiki page, all pages describing an entity (presumably a city) that is located in the United States. It yields the list shown in Figure 2 (which is more a table than a list).

{{#ask: [[located in::USA]] | ?state | ?population | ?settled | mainlabel=City}}

Figure 2: Table of query results
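The parts of the inline query above can be made explicit with a small Python sketch that splits the query string at its pipes and separates the selection condition, the entries prefixed with a question mark, and the name=value pairs. It is not SMW's parser; the function name parse_ask and the layout of the returned dictionary are assumptions for illustration, and the naive split would fail for pipes nested inside links or templates.

```python
def parse_ask(query):
    """Split an inline #ask query into condition, print requests and parameters."""
    text = query.strip()
    prefix, suffix = "{{#ask:", "}}"
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("not an inline #ask query")
    # Naive split at pipes; pipes nested inside links are not handled.
    parts = [part.strip() for part in text[len(prefix):-len(suffix)].split("|")]
    condition, rest = parts[0], parts[1:]
    print_requests = []
    parameters = {}
    for part in rest:
        if part.startswith("?"):
            # Entries such as ?state name additional Property values to show.
            print_requests.append(part[1:].strip())
        elif "=" in part:
            # Entries such as mainlabel=City are name=value pairs.
            name, value = part.split("=", 1)
            parameters[name.strip()] = value.strip()
    return {
        "condition": condition,
        "print_requests": print_requests,
        "parameters": parameters,
    }


query = ("{{#ask: [[located in::USA]] | ?state | ?population "
         "| ?settled | mainlabel=City}}")
parsed = parse_ask(query)
print(parsed["condition"])
print(parsed["print_requests"])
print(parsed["parameters"])
```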

As with every function call in wikitext, the query string starts with the name of the function (#ask:). The function name is followed by the parameters we want to pass to the function. Multiple parameters are separated by pipes. The first parameter indicates the wiki pages to be selected. Basically, we ask for wiki pages with the given annotation. To narrow or widen the result set, comparators (e.g. [[population>=500,000]]) as well as disjunctions (OR operator) are permitted. We can refine the query with the help of further parameters to influence the output format. Parameters with a prepended question mark such as ?state are called print requests and tell the engine which additional Property values of the selected wiki pages are to be part of the result set. In the example the Properties state, population and settled are selected and therefore appear as columns in Figure 2. Additionally, we can define parameter-value pairs. For instance, mainlabel=City defines the name of the first column of the table in Figure 2. Many of the accepted parameter-value pairs (e.g. limit, order or sort) have counterparts in other query languages such as SQL or SPARQL. One parameter of particular interest is format, which defines the result printer that is used for output generation. The next paragraph explains the purpose of result printers.

Semantic MediaWiki lets users influence the presentation of the query result output to a certain extent. By adding a parameter to a query, we can define the result format of the selected data:

{{#ask: [[located in::USA]] | ... | format=table}}

By default SMW already supports basic formats such as tables (see Figure 2) and lists. The extension Semantic Result Formats⁹ brings even more sophisticated formats (e.g. timelines and maps) to SMW. Result printers will play a dominant part in the integration of SMW and Exhibit and are therefore explained once more in Section 3.1.2.

Data Import and Export
Semantic MediaWiki is able to export the wiki content in different formats. The most prominent format is OWL/RDF. Each Factbox holds a link that triggers the RDF export of the semantic annotations on the page. Furthermore, we can use the form on the Special:ExportRDF wiki page to export a collection of wiki pages. With the help of a console-based dump script we can even export all annotations in the wiki. In the other direction, the import of foreign vocabularies like FOAF¹⁰ is feasible.
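How a query turns annotations into an automatically generated list, as described in the Queries part of this section, can be sketched in a few lines of Python. The sketch is a simplified model under stated assumptions and not SMW's implementation: the function ask, its parameters and the in-memory dictionary PAGES are hypothetical, and all page names and figures except Boston's settlement year are placeholders invented for the example.

```python
# Hypothetical annotation store: page title -> {property: value}.
# Population figures and all pages other than Boston are placeholders.
PAGES = {
    "Boston": {"located in": "USA", "population": 600000, "settled": 1630},
    "Smalltown": {"located in": "USA", "population": 20000, "settled": 1800},
    "Bigcity": {"located in": "USA", "population": 900000, "settled": 1700},
    "Elsewhere": {"located in": "Canada", "population": 700000, "settled": 1750},
}


def ask(pages, conditions, print_requests, sort=None, limit=None):
    """Select pages that satisfy all conditions and return one row per page."""
    selected = [
        (title, props) for title, props in pages.items()
        if all(name in props and test(props[name]) for name, test in conditions)
    ]
    if sort is not None:
        # Assumes every selected page carries the sort property.
        selected.sort(key=lambda item: item[1][sort])
    if limit is not None:
        selected = selected[:limit]
    # First column is the page title, followed by the requested values.
    return [[title] + [props.get(name) for name in print_requests]
            for title, props in selected]


rows = ask(
    PAGES,
    conditions=[
        ("located in", lambda value: value == "USA"),
        ("population", lambda value: value >= 500000),
    ],
    print_requests=["population", "settled"],
    sort="population",
)
for row in rows:
    print(row)
```

Because the rows are computed from the stored annotations each time the function runs, a changed annotation is reflected in the next result, which mirrors the advantage that automatically created lists do not become outdated.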

2.2 Exhibit
In its early stages, the World Wide Web was only a repository of plain HTML pages. Today, most of the data on the web is still encoded in HTML, but with the aid of technologies like PHP, CGI and JavaScript, websites can adjust to the user's preferences and recreate the look and feel of desktop applications. As a result, the content of the web is more dynamic and interactive today. To exploit these advantages, one has to spend a great deal of time learning to handle these rapidly changing languages and frameworks. In the early years of the web, the ability to run sophisticated, eye-catching websites was the exclusive province of big publishers, who could afford to employ highly skilled web experts. Today, many frameworks enable less experienced users to leverage the possibilities the latest technologies offer. The entry hurdles have not completely disappeared yet, but Exhibit lowers the entry barrier further.

⁹ http://semantic-mediawiki.org/wiki/SRF
¹⁰ http://www.foaf.org

Exhibit is a JavaScript framework. It is part of the SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments)¹¹ tool suite and was developed in the course of a doctoral thesis [7]. Most of the Exhibit-related documentation on which the following introduction is based is compiled in the SIMILE wiki¹². Within this thesis, a web page using Exhibit is referred to as an exhibit in lower case. Exhibit creates rich data visualizations from structured data sources without demanding deep knowledge of advanced web technologies like PHP or JavaScript. A little HTML experience is sufficient. Neither a complicated installation nor a setup routine is necessary, and no additional software has to be purchased or used (a simple text editor is adequate). Exhibit, unlike HTML, separates the management of data from its presentation. This section first describes how Exhibit obtains input data for visualizations. Thereafter the presentation features are outlined.

2.2.1 Feeding Exhibit with Data
Natively, Exhibit gets the data it is supposed to visualize from JSON files. Besides, Exhibit is able to extract data from many other sources such as Google Spreadsheets¹³ or BibTeX¹⁴ files.
A web service15 for converting formats such as RDF
11 http://simile.mit.edu
12 http://simile.mit.edu/exhibit
13 http://docs.google.com/
14 http://www.bibtex.org
15 http://simile.mit.edu/babel/


or CSV (Comma-Separated Values) to JSON is available as well. The variety of options for feeding Exhibit with data has the advantage that Exhibit does not force the user to manage the data in a certain way or with a specific application. The user is free to use a rudimentary text editor for creating a JSON file or a more sophisticated application like a spreadsheet engine. This work focuses on JSON, as Exhibit uses it internally and as the basis of the mentioned ways to import data. Furthermore, it plays an important part in the integration of SMW and Exhibit. To comprehend Exhibit's own JSON format, it is helpful to get a general idea of JSON first.

JavaScript Object Notation (JSON)

The JavaScript Object Notation (JSON) is a lightweight data exchange format. It is based on a subset of the JavaScript language and serializes structured data [22]. The detailed specification of JSON can be found in [23]. The fact that the acronym JSON includes JavaScript does not mean that it only works in JavaScript environments. A large number of programming languages evaluate JSON by mapping the JSON structures to corresponding concepts such as arrays or objects. Let us take a closer look at the JSON structures and their equivalents in other programming languages. JSON only knows two types of structures:

- ordered lists of values
- collections of name-value pairs

An ordered list is identical to an array. In JSON, the values of an array are separated by commas and enclosed in square brackets:
    [value1,value2,...]

A collection can be regarded as an object. It is characterized by name-value pairs. In many programming languages an object is represented by an associative array in which the name fills the role of an array key. A collection is enclosed in curly brackets. Names are always quoted; values are quoted if they are strings. Commas separate multiple name-value pairs:


    {
      "name1": "value1",
      "name2": "value2",
      ...
    }

Throughout the thesis we shall refer to these two JSON structures as arrays and objects. Of course, these structures can be nested. For instance, an array can have objects as values.

Exhibit's own JSON format

Exhibit expects a data source file to specify certain nested JSON arrays and objects. By means of Listing 1 we explore the required JSON structures. The JSON file defines an object that consists of three name-value pairs. The first two pairs describe the schema of the data that is passed to Exhibit, whereas the last pair holds the data itself. As the first two names are not mandatory, the following explanation starts with the last one. The value of the name items is an array (lines 14-25). The array consists of objects that represent Items. Each Item is comparable to a single database record or one row of a database table. An Item is represented by a JSON object and is therefore characterized by name-value pairs. In Exhibit's terms, the names of an Item are called Properties. A Property can have multiple values, which are then enumerated in an array (cf. the Property Borders in line 23 of Listing 1). The Property label (line 16) is mandatory and can be considered the name or title of an Item. To identify an Item unambiguously, it has a unique ID (line 17). If the ID is not set, the label is regarded as the unique ID. Although the two names properties and types are optional, specifying them is highly recommended to let Exhibit know what exactly it is about to visualize. For example, some of Exhibit's features rely on information about datatypes: for sorting, it makes a difference whether a Property holds a numeric or a textual value. The value of types is an object that declares the types an Item can have. These types are assigned by adding the special Property type (see line 18) to an Item. For instance, the Item Boston in the sample file is of the type City.
Since Exhibit displays the number of Items currently shown, it has to be aware of the


 1  {
 2    "types": {
 3      "City": {
 4        "pluralLabel": "Cities"
 5      }
 6    },
 7    "properties": {
 8      "City" : { "valueType": "item" },
 9      "Population" : { "valueType": "number" },
10      "State" : { "valueType": "item" },
11      "Settled" : { "valueType": "number" },
12      "Geo" : { "valueType": "item" }
13    },
14    "items": [
15      {
16        "label": "Boston",
17        "ID": "Boston",
18        "type": "City",
19        "Population": "590763",
20        "State": "Massachusetts",
21        "Settled": "1630",
22        "Geo": "1221,2121",
23        "Borders": ["Cambridge","Somerville"]
24      }
25    ]
26  }

Listing 1: An Exhibit JSON File


plural of the type (line 4). The object held by properties defines the Properties that may be used for describing Items. The sample file describes a city by its population figure (Population), the state it is located in (State), the year it was settled (Settled) and a geographic coordinate (Geo). Similar to Semantic MediaWiki, each of these Properties has a valueType (see line 8). Exhibit supports the types number, url, text and date. Moreover, the value can be a further Item, which is also the default type. Declaring a Property with the valueType item (see line 8) sets up a relationship between two Items.

Navigation in the Data Model

As the examined JSON file indicates, Exhibit relies on a graph-based data model. It therefore provides expressions for navigating through the nodes and edges of the graph. These expressions are used for addressing Property values. Throughout the thesis we shall refer to them as Exhibit expressions. We use Exhibit expressions to configure the presentation of the data in the next section. As shown in Figure 3, Items as well as values fill the role of nodes in the graph, whereas Properties describe the edges. To tell Exhibit to use the values of a certain Property, the name of the Property has to be preceded by a dot. To select the values of number-of-copies-sold of an Item, we use the expression .number-of-copies-sold. To traverse the graph in the opposite direction, an exclamation mark has to stand in front of the Property. Expressions can be concatenated by simply stringing them together (e.g. !number-of-copies-sold.author.label).
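The forward and backward traversal described above can be made concrete with a small sketch. The following code is not Exhibit's own expression parser; it is a toy evaluator written for illustration. The function name evaluateExpression and the sample data are assumptions, and Items are identified by their label only.

```javascript
// Toy evaluator for Exhibit-style expressions such as ".State" or "!State.Borders".
// Assumption of this sketch: Items are identified by their label, and a Property
// value that equals another Item's label is an edge to that Item.
function evaluateExpression(expression, startLabels, items) {
  // Split the expression into steps: "." walks forward, "!" walks backward.
  const steps = [...expression.matchAll(/([.!])([^.!]+)/g)]
    .map(match => ({ direction: match[1], property: match[2] }));
  let current = new Set(startLabels);
  for (const { direction, property } of steps) {
    const next = new Set();
    for (const item of items) {
      if (item[property] === undefined) continue;
      const values = [].concat(item[property]); // single value or array
      if (direction === "." && current.has(item.label)) {
        // Forward: from the current Items to the values of the Property.
        values.forEach(value => next.add(value));
      } else if (direction === "!" && values.some(value => current.has(value))) {
        // Backward: from the current values to the Items that hold them.
        next.add(item.label);
      }
    }
    current = next;
  }
  return [...current];
}

const cities = [
  { label: "Boston", State: "Massachusetts", Borders: ["Cambridge", "Somerville"] },
  { label: "Cambridge", State: "Massachusetts" }
];

const stateOfBoston = evaluateExpression(".State", ["Boston"], cities);
const citiesInState = evaluateExpression("!State", ["Massachusetts"], cities);
const neighbours = evaluateExpression("!State.Borders", ["Massachusetts"], cities);
```

Here .State leads from Boston to Massachusetts, !State leads from Massachusetts back to both cities, and the concatenated expression !State.Borders first collects the cities of the state and then their neighbours.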

2.2.2 Presentation

Besides the JSON file introduced above, an additional HTML file is required to configure the presentation of the JSON data. This section gives a short review of the HTML constructs we can use to configure an exhibit. The list of configuration options is not complete; the reader is encouraged to consult the SIMILE wiki for a complete list of parameters. Figure 4 shows most of the HTML constructs in action.


Figure 3: Exhibit's graph-based data model

Figure 4: Exhibit visualizes the former US presidents by means of a map and a timeline


Basics

To take advantage of Exhibit's features we have to include the Exhibit Application Programming Interface (API) in the web page we are about to create. The following code has to be added to the HTML header of the page:

    <script src="http://static.simile.mit.edu/exhibit/api-2.0/exhibit-api.js?autoCreate=true"
            type="text/javascript"></script>

If we want to use special views or features such as maps and timelines, the inclusion of further script tags may be necessary. In some cases we want to append parameters to the URL pointing to the Exhibit API. For example, adding ?autoCreate=false to the URL deactivates the default start-up procedure Exhibit performs. Since Exhibit is a JavaScript framework, rendering is done on the client side. No server-side processing is required; the server is only expected to deliver the referenced JavaScript files. To supply Exhibit with data, we need to reference a JSON source by adding a link tag to the HTML header:
    <link href="cities.js" type="application/json" rel="exhibit/data" />

It is possible to include more than one data source. In this case Exhibit tries to merge the data. A use case harnessing this functionality is illustrated in Section 3.3.3.

Views

We can visualize data with the aid of views. The default view is the tiles view, which is basically a list (see Figure 5). To make data more meaningful, we can use advanced views such as timelines and maps. Who would realize that all presidents of the United States except one were born in cities to the east of New


Figure 5: Exhibit's default view

Mexico, Wyoming and Montana by just analyzing the listed data? A view showing the birth places on a map conveys this information at first sight16.
1  <div
2    ex:role="view"
3    ex:viewClass="Tabular"
4    ex:columns=".label,.party,.religion"
5    ex:sortColumn=".label"></div>

Listing 2: Tag adding a Table View

To activate a view we embed a marker (see Listing 2) in the section of the HTML body where we want the view to appear. This marker is a div element with certain attributes. The ex:role attribute indicates the purpose of the marker: it is a placeholder for a view. The ex:viewClass attribute holds the name of the desired view (e.g. Tabular, Timeline, Map). The remaining attributes are specific to the individual views. For instance, the tabular view expects Exhibit expressions that determine the arrangement of the columns (ex:columns in line 4) and the
16 For a live experience visit http://simile.mit.edu/exhibit/examples/presidents/presidents-2.html or see Figure 4 for a screenshot.


order of the table entries (ex:sortColumn in line 5). We can place different views in various sections of the page. Furthermore, we can insert a view panel to gather multiple views in a single spot. The view panel embeds controls to switch between the views. The exhibit in Figure 4 enables the user to switch between different map views: one shows the presidents' birth places, whereas the other shows the places where they died. Independent of the selected view, Items are clickable links. After clicking on an Item, a bubble with further information such as Properties and values appears (see Figure 6).

Facets

Facets aggregate values by their occurrence [24]. The facets in Figure 4 filter the presidents by their religion or political party. When the user clicks on one or multiple values, only the presidents to whom the selected values apply are displayed, whereas the remaining presidents vanish. Reducing the number of displayed Items lessens the information load. Thus facets provide an effective way to filter out information that is unimportant or out of scope. Casual users know faceted browsing from commercial sites. For example, Amazon employs facets for structuring product categories to make them easier to survey. To stick with the example of the presidents, the following markup reserves space for a facet of the party Property:

    <div ex:role="facet" ex:expression=".party"></div>

To select which values a facet is supposed to aggregate, we assign an Exhibit expression to the ex:expression attribute. Special facets for numeric values are available as well. They group the values into customized intervals.

Lenses

The user can gain control over the output beyond merely choosing a view. We can influence the rendition of single Items with the aid of a lens. How a single Item is rendered depends on the view. For instance, the tiles view (see Figure 5) creates a uniform table of Property-value pairs for each Item. In the timeline and map views, a bubble (see Figure 6), which shows up when the user clicks on the


Figure 6: This bubble shows up when a president's head is clicked

Item, holds the rendition. A lens is an HTML template that replaces the default rendition. We can specify a lens as follows:

1  <div ex:role="lens" ex:itemTypes="Presidents">
2    <div ex:content=".label"></div>
3    <img ex:src-content=".imageURL" />
4    <a ex:href-subcontent="http://presidentswebsites.gov/{{.label}}">link</a>
5  </div>

A div element encloses the HTML snippet. To tell Exhibit where it is supposed to fill in the Property values, we embed markers in the HTML template. The ex:content attribute in line 2 causes Exhibit to insert the values of the given Property into the designated div element. In this way the rendition of each Item includes the name of the president. Exhibit can also set the values of HTML attributes. The ex:src-content attribute sets an image source to display an image of the president (the Property imageURL holds a URL to an image). The ex:href-subcontent attribute joins hard-coded text with Exhibit's content. By this means the tag in line 4 yields a dynamic link target to a website about the respective president. With customized lenses, Exhibit's default rendition can be altered easily without touching the code of the API.
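To illustrate how a subcontent attribute combines hard-coded text with Property values, the following sketch re-implements the substitution idea in a few lines. It is not taken from the Exhibit source; the function name fillSubcontent is hypothetical, only expressions of the form {{.property}} are handled, and the URL encoding of the inserted value is an assumption of this sketch.

```javascript
// Hypothetical re-implementation of the substitution behind ex:*-subcontent.
// Every {{.property}} placeholder is replaced by the Item's value for that Property.
function fillSubcontent(template, item) {
  return template.replace(/\{\{\.([^}]+)\}\}/g, (placeholder, property) => {
    const value = item[property];
    // Missing Properties yield an empty string; values are URL-encoded
    // because the result is used as a link target in this example.
    return value === undefined ? "" : encodeURIComponent(String(value));
  });
}

const president = { label: "George Washington", imageURL: "washington.png" };
const linkTarget = fillSubcontent("http://presidentswebsites.gov/{{.label}}", president);
```

For the sample Item the link target becomes http://presidentswebsites.gov/George%20Washington, which corresponds to what the tag in line 4 of the lens is meant to achieve.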


INTEGRATION OF EXHIBIT AND SEMANTIC MEDIAWIKI

Collections

We can assign Items of a certain type to a collection. The collection then incorporates all Items of the types specified by the ex:itemTypes attribute. To constrain a view to show only the Items of a certain collection, we add the ex:collectionID attribute to the view placeholder.

    <div ex:role="collection" ex:itemTypes="typeA"></div>
    <div ex:role="view" ex:viewClass="Timeline"
         ex:collectionID="typeA-things"></div>

2.2.3 Wibbit

Wibbit is an extension of MediaWiki that allows exhibits to be embedded in wiki articles. It extracts input data from tables within wiki articles instead of from JSON sources. As an advantage, we can edit the information source conveniently by using a wiki table editor17. However, a simple table is not as expressive as a JSON file. For instance, it does not contain metadata about values such as information about datatypes. In addition to the table, Wibbit therefore requires HTML code that specifies the views, facets and lenses Exhibit is supposed to use.
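The loss of datatype information in a table can be seen in a small sketch that turns table rows into Exhibit-like Items. The code below is not Wibbit's implementation. The function tableToItems, the convention that the first column supplies the label, and the separate columnTypes argument are assumptions made for illustration.

```javascript
// Sketch: convert a table (array of rows, first row = header) into Items.
// Assumption: the first column holds the label of each Item.
// The table itself carries no datatypes, so they are passed in separately.
function tableToItems(rows, columnTypes) {
  const [header, ...body] = rows;
  return body.map(cells => {
    const item = { label: cells[0] };
    header.forEach((name, index) => {
      if (index === 0) return;                            // label column, handled above
      const text = cells[index];
      if (text === undefined || text === "") return;      // skip empty cells
      item[name] = columnTypes[index] === "number" ? Number(text) : text;
    });
    return item;
  });
}

const wikiTable = [
  ["City", "Population", "State"],
  ["Boston", "590763", "Massachusetts"],
  ["Cambridge", "", "Massachusetts"]
];
const tableItems = tableToItems(wikiTable, ["item", "number", "item"]);
```

Without the second argument every cell would remain a string, which is exactly the information gap that the additional markup has to close.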

3 Integration of Exhibit and Semantic MediaWiki


The integration of Exhibit and Semantic MediaWiki is organized into three stages. As a first step, we design and implement a query result printer that uses Exhibit to format query results. Next, we enable Exhibit to access the content of SMWs as a new type of data source. Finally, we use this work to establish data exchange among multiple wikis.
17 For example, the Simile Wiki Table Editor at http://simile.mit.edu/wiki/Wiki_Table_Editor


Figure 7: Query results visualized by the Exhibit result printer

3.1 Integrating Exhibit as Result Printer

This section deals with the design and implementation of a new result printer that presents query results with the aid of Exhibit. Figure 7 shows the desired output of such a result printer. To achieve this output, we have to find a way to pass SMW content to Exhibit. To this end, we explore result printers from a technical point of view in Section 3.1.2. In Section 3.1.3 we combine the existing table result printer and Wibbit to supply Exhibit's visualizations with input data. Since Exhibit needs information regarding the configuration of views, facets and lenses, we introduce new parameters for inline queries in Section 3.1.4. Section 3.1.5 describes the implementation of the new result printer. The implemented result printer does deliver the output shown in Figure 7, but many of Exhibit's features remain unused. Therefore Section 3.1.6 discusses modifications of the result printer with the objective of leveraging more advanced features of Exhibit.


3.1.1 Challenges and Goals

Before designing a new result printer it is necessary to define the goals the implementation is aiming at. Besides, it is useful to be aware of the accompanying challenges.

Keeping querying simple

In comparison to other semantic repositories, querying in SMW is rather simple. Thus one goal is to retain this simplicity as far as possible. The query syntax has to be extended in a logical way so that users who are already familiar with the syntax can adjust to the new query extensions easily. Although Exhibit needs information to configure visualizations, the number of additional mandatory parameters has to be kept to a minimum so that query strings stay as short and clearly arranged as possible. The need for statements outside the query should be avoided completely.

Reusing information SMW already contains

SMW already contains information that is meaningful to the presentation of wiki content. It is advisable to pass this information to Exhibit to avoid redundant data specification. For instance, information concerning datatypes is valuable, as Exhibit is able to adjust controls (e.g. facets) to different datatypes. The more existing information can be harnessed, the smaller the need for additional query parameters becomes.

Aligning Exhibits to the traditional wiki user interface

Exhibits must fit seamlessly into the wiki interface to create a comfortable look and feel. They are not supposed to break the wiki layout, for instance by requiring horizontal scrolling. On the other hand, they have to cope with the limited space wikis naturally offer due to fixed navigation bars, headers and footers.

3.1.2 Structure of Result Printers

Since the purpose of result printers has already been outlined in Section 2.1.2, this section conducts a technical analysis of result printers. The analysis encompasses descriptions of the query workflow, the composition of query results and details regarding the implementation of result printers.


Figure 8: Simplified querying procedure

SMW processes queries in a certain order. Figure 8 sketches this workflow. The query mechanism of SMW is based on a MediaWiki parser function. Whenever MediaWiki encounters an ask function call, a query parser splits the function parameters into different parts. The parser hands the various query elements to the query processor. The query processor collects the requested data and stores it in a result object of the type SMWQueryResult. From the content of the SMWQueryResult object, result printers generate outputs such as tables, timelines or maps.

As the power of result printers depends on the accessible data, we now investigate the SMWQueryResult class. The SMWQueryResult class defines the structure of the query results. Basically, this structure can be considered a table (see Figure 9). Each row stands for one page that has been selected during query evaluation. For each print request the table contains a column, which can be addressed through an SMWPrintRequest object. This SMWPrintRequest object holds, among other things, the Property name and its datatype. In contrast to usual tables, the cells of the table are designed to contain multiple Property values. These values are stored in an array-like data structure called SMWResultArray. Additionally, result printers can access the query string as part of the SMWQueryResult object.

Figure 9: Structure of the SMWQueryResult object

Each result printer is implemented as a subclass of SMWResultPrinter in a separate PHP file. The most important part of the subclass is its implementation of the function getResultText(). This method creates a string that is either embedded in the body of the output page or delivered as a detached file18. Even though MediaWiki predetermines parts of the HTML output (navigation bars, headers and footers), result printers are able to manipulate parts other than the body of the output page. For example, result printers influence the HTML head element by using the SMWOutput::RequireHeadItem() function to refer to scripts or data sources.

To give an impression of how result printers work, the next paragraph illustrates the procedure of SMW's built-in table result printer. Like many other result printers, the table result printer traverses the SMWQueryResult object row by row and adds format-dependent markup. The table printer adds HTML tags such as table, th and td to the result data. First, the
18 A detached file contains only the created string, without navigation bars, headers and footers.


printer goes through the SMWPrintRequest objects to create the header row of the table. It puts each Property name in th tags, whereas the entire header row is put in tr tags. The result rows of the SMWQueryResult object are enclosed in tr tags as well. The individual values are enclosed in td tags. Besides, the result printer references a special JavaScript file in the header. The included JavaScript code enhances the resulting HTML table with new functionality such as sorting (see Figure 2).

3.1.3 Wibbit as Basis of an Exhibit Result Printer

This section shows that the table result printer can prepare data that is consumed by Wibbit. As we can already embed exhibits in wiki pages with Wibbit (see Section 2.2.3), it seems obvious to employ Wibbit as the foundation for an Exhibit result printer. Wibbit consists of JavaScript code that parses tables on wiki pages and passes the table contents to Exhibit. Wibbit is able to process tables created by the table result printer without restriction of any kind. The choice of Wibbit has two advantages:

- If the user's browser does not support JavaScript, a fallback takes effect, so that the user sees a normal table view.
- There is no need to create a detached JSON file (which would require the query to be processed a second time).

Of course, a table is not as expressive as a complete JSON file. Information, particularly about datatypes, gets lost in a table scheme, but Wibbit provides workarounds here. However, these workarounds conflict with the goal of keeping querying simple (see Section 3.1.1). The next section shows how to adjust these workarounds so that high usability is retained.

3.1.4 Query Design

Exhibit requires additional HTML markup for controlling the output (see Section 2.2.2). Wibbit requires users to place this HTML code, in a modified form, in the edit box of wiki articles. The mix of HTML and wikitext tends to confuse ordinary wiki users, since they are usually not expected to use HTML for editing articles.


SMW's query syntax, in contrast, can be assumed to be known by users. Furthermore, the query syntax is geared to the wikitext syntax and can therefore be adopted easily. To avoid confusing users, we provide the information concerning Exhibit's output within the query string. Hence, new query parameters are introduced to eliminate the need for further HTML code:

    {{#ask: [[located in::USA]]
     | ...
     | format=exhibit
     | views=tiles,tabular,timeline,map
     | facets=state,population
     | start=settled
     | latlng=geo
    }}

The views parameter specifies the views the exhibit is going to show. We can select multiple views by using a comma as delimiter. As all views are defined at the same position in the wikitext, a view panel gathers all views in this spot (see Figure 7). At the moment the possible parameter values are tiles, tabular, timeline and map. To add facets to the resulting exhibit, we assign the Properties whose values should be subject to a facet to the new facets parameter (please note that the user is not expected to provide Exhibit expressions but the plain Property names). To add more than one facet, we separate the Properties by commas. Furthermore, depending on the selected views, additional parameters might be mandatory. In particular, the timeline and the map view demand more information on how to deal with the input data they receive. The map view requires the name of the Property that contains the geographic coordinate to be displayed. We can assign the name to the new latlng parameter. The start parameter tells the timeline which Property contains the start date. In Section 3.1.6 we shall see that the specification of these parameters can be bypassed. Exhibit processes further parameters, which can be set through the query string. Since these parameters are specified analogously, a full explanation is omitted here.
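How the new parameters could be read from a query string is sketched below. In SMW this work is done by the query parser in PHP; the JavaScript function parseExhibitParams is a hypothetical stand-in that only illustrates the comma-separated lists and the single-valued parameters. The default of a single tiles view is an assumption of this sketch, and pipes inside conditions are not handled.

```javascript
// Sketch: extract the Exhibit-related parameters from an inline query.
// Conditions and plain print requests carry no "=" and are skipped.
function parseExhibitParams(query) {
  const inner = query.trim().replace(/^\{\{#ask:/, "").replace(/\}\}$/, "");
  const config = { views: ["tiles"], facets: [] }; // assumed defaults
  for (const part of inner.split("|")) {
    const separator = part.indexOf("=");
    if (separator === -1) continue;
    const key = part.slice(0, separator).trim();
    const value = part.slice(separator + 1).trim();
    if (key === "views" || key === "facets") {
      // Comma-separated lists of view names or plain Property names.
      config[key] = value.split(",").map(entry => entry.trim()).filter(Boolean);
    } else if (key === "format" || key === "start" || key === "latlng") {
      config[key] = value;
    }
  }
  return config;
}

const exampleConfig = parseExhibitParams(
  "{{#ask: [[located in::USA]] | ?population | format=exhibit" +
  " | views=tiles,tabular,timeline,map | facets=state,population" +
  " | start=settled | latlng=geo}}"
);
```

For the example query the sketch yields four views, two facets, settled as the start Property and geo as the coordinate Property.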


3.1.5 Implementation

The implementation mainly consists of the creation of a new result printer. We learned that result printers can influence the header as well as the body of the HTML file that is delivered to the user's web browser when the user views a wiki article containing a query. The next two sections present the exact changes to the head and the body that the result printer has to produce to let Exhibit take over the visualization.

Head

To gain access to Exhibit's functionality, the result printer adds a reference to the Exhibit API. In doing so, it disables Exhibit's default start-up procedure by appending autoCreate=false to the URL that points to the Exhibit API. Moreover, the printer references a further JavaScript file called exhibit.js that provides a replacement for the disabled start-up procedure. We shall examine the code of exhibit.js below. Additionally, the result printer declares three JavaScript variables. These variables hold information we would normally expect to be part of the HTML body. MediaWiki has a special component, called Sanitizer, that prevents the injection of malicious code. The Sanitizer relies on a whitelist of the HTML tags and attributes an author is permitted to use within wikitext. However, the majority of the tag attributes Exhibit reads to gain information about the desired presentation format are not part of this list. That is why the result printer stores information that is normally carried by HTML tag attributes in JavaScript variables in JSON notation. The following lines give an impression of what information is assigned to these variables and where the information comes from.
    var sources = {
      source0: {
        id: "querytable1",
        columns: ["item", "number"]
      }
    };


The sources variable holds a JSON object. This object contains further objects, which describe the tables that serve as data sources for the exhibit. Each of these objects has an ID. This ID equals the HTML ID of the result table that the result printer will create in the body (see next section). The name columns defines the datatypes of the values the table columns hold. Since the SMWPrintRequest objects contain the datatypes of the Property values, the result printer reuses this information here.
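The reuse of datatype information can be sketched as a simple mapping from the datatype of each print request to an Exhibit value type. The actual result printer is written in PHP; the JavaScript below only mirrors the logic. The datatype names in VALUE_TYPES, the shape of the printRequests objects and the fallback to text are assumptions of this sketch.

```javascript
// Hypothetical mapping from SMW datatype names to Exhibit value types.
const VALUE_TYPES = new Map([
  ["Page", "item"],
  ["Number", "number"],
  ["Date", "date"],
  ["URL", "url"],
  ["String", "text"]
]);

// Build the content of the sources variable for one result table.
function buildSources(tableId, printRequests) {
  return {
    source0: {
      id: tableId,
      // Unknown datatypes fall back to plain text (an assumption).
      columns: printRequests.map(request => VALUE_TYPES.get(request.datatype) || "text")
    }
  };
}

const exampleSources = buildSources("querytable1", [
  { name: "City", datatype: "Page" },
  { name: "Population", datatype: "Number" }
]);
```

For these two print requests the sketch produces the same column types as the sources variable shown above, namely item and number.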
    var facets = {
      facet0: {
        "ex:expression": ".state",
        innerHTML: 'ex:role="facet"'
      }
    };

The facets variable stores information about the facets the resulting exhibit is expected to set up. Each facet is described by a JSON object. The name ex:expression defines the Property the facet will operate on. The name of this Property is derived from the new query parameter facets. The name innerHTML holds further HTML attributes describing the facet. These HTML attributes are extracted from the query string as well.

    var views = "ex:role=view ex:viewClass=Tabular ";

The views variable only contains HTML attributes. The value of the ex:viewClass attribute is taken from the new views query parameter. Before we find out about the exact purpose of these variables, let us take a look at what the result printer embeds in the body of the output page.

Body

The result printer creates an HTML table from the SMWQueryResult object in the body. Since Section 3.1.2 described how the table result printer works, we omit a further explanation. The resulting table is tagged with an ID corresponding to the ID the sources variable holds. Besides, the result printer adds a div tag with a special ID to the body:

    <div id="exhibitLocation"></div>

Listing 3: Placeholder

This new div tag can be regarded as a marker. Its function becomes obvious when we turn to the referenced JavaScript file exhibit.js in the next section.

The exhibit.js JavaScript file

The JavaScript code we are about to examine originates from Wibbit to a great extent. The code implements a new start-up function for Exhibit. This function can be divided into two parts. The first part is dedicated to the procurement of input data. Instead of loading JSON data, it uses Exhibit's Exhibit.HtmlTableImporter.loadTable() function to scrape input data from an HTML table. Since the code can access the JavaScript variables declared in the head, it obtains the ID of the table from the sources variable. The second part deals with passing information regarding the presentation to Exhibit. The variables sources, facets and views hold this data in JSON notation. From these variables, the code creates an HTML snippet containing all the usual tags Exhibit processes. The resulting HTML snippet is inserted into the div element with the ID exhibitLocation. Since the code creates and injects the HTML on the client side, the Sanitizer cannot interfere here.
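The second part of the start-up function can be pictured as plain string assembly. The sketch below is not the code of exhibit.js; the function buildExhibitMarkup and the exact shape of its arguments are assumptions. It leaves out the DOM access, which in a browser would amount to assigning the result to the innerHTML of the element with the ID exhibitLocation, and it does not escape attribute values.

```javascript
// Sketch: turn the facets and views variables into the markup Exhibit expects.
// facets: object of { "ex:expression": ..., innerHTML: ... } entries (assumed shape)
// views:  string of HTML attributes for the view placeholder
function buildExhibitMarkup(facets, views) {
  const facetDivs = Object.values(facets).map(facet =>
    `<div ex:expression="${facet["ex:expression"]}" ${facet.innerHTML}></div>`);
  const viewDiv = `<div ${views.trim()}></div>`;
  return facetDivs.concat(viewDiv).join("\n");
}

const exampleMarkup = buildExhibitMarkup(
  { facet0: { "ex:expression": ".state", innerHTML: 'ex:role="facet"' } },
  'ex:role="view" ex:viewClass="Tabular" '
);
// In the browser (not executed here):
// document.getElementById("exhibitLocation").innerHTML = exampleMarkup;
```

Because the attributes only come into existence in the browser, the wikitext stored on the server never contains markup the Sanitizer would reject.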

3.1.6 Advanced Features

The previous section covered the rudimentary implementation of an Exhibit result printer. The implementation exploited Exhibit's basic features, leaving much potential unused. The features discussed in this section make use of more sophisticated functionality of Exhibit. The discussion points out further capabilities as well as limits of Exhibit. Not every feature we consider in this section can and will be implemented. The last section sums up which features find their way into the final Exhibit printer by outlining their implementation.


Mechanisms to improve Usability

To accomplish high usability, the obligation to provide additional query parameters must be reduced as far as possible (see Section 3.1.1). The result printer can apply simple algorithms to the query results to keep query strings short and well arranged. These algorithms attempt to anticipate what the user expects the exhibit to visualize. This section presents two algorithms with the following aims:

- Making parameters obsolete by reusing information about datatypes
- Calculating suitable numeric values for parameters by analyzing Property values

As SMWs already contain information about datatypes, the Exhibit result printer can exploit this information to predict what the user plans to visualize. To visualize data on a map, we have to tell Exhibit which Property holds the geographic coordinates. The usual way is to assign the Property name to the latlng parameter in the ask query. To make this parameter dispensable, the result printer can look for Properties holding geographic coordinates once the map view is selected. We can do the same for the timeline view. A data set on the timeline is characterized by a start date and, optionally, an end date. We can tell Exhibit which Property contains the start date and which one the end date by adding a start and an end parameter to the query. To make these declarations obsolete, a small function can pick the first selected Properties holding date values.

So far, the result printer only harnesses the metadata of Properties. It is possible to go further and analyze the Property values in a result set to derive suitable configurations for facets. Section 2.2.2 mentioned numeric facets. In a numeric facet Exhibit groups values into several intervals. However, so far we have to set the span of these intervals manually. By analyzing the set of numbers, a simple algorithm can determine an appropriate interval span.
A description of the implementation of these features can be found at the end of this section.
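Both ideas can be sketched in a few lines. The code below is not the thesis implementation: the function names, the datatype strings and the choice of "nice" interval widths (1, 2 or 5 times a power of ten, aiming at roughly ten intervals) are assumptions made for illustration.

```javascript
// Pick the first print request with the given datatype, e.g. to fill in
// the start or latlng parameter automatically (datatype names are assumed).
function firstPropertyOfType(printRequests, datatype) {
  const hit = printRequests.find(request => request.datatype === datatype);
  return hit ? hit.name : null;
}

// Derive an interval span for a numeric facet from the values themselves.
// The span is the smallest of 1, 2, 5 or 10 times a power of ten that
// splits the value range into at most maxIntervals parts.
function intervalSpan(values, maxIntervals = 10) {
  const range = Math.max(...values) - Math.min(...values);
  if (range === 0) return 1; // all values equal: any span will do
  const rough = range / maxIntervals;
  const magnitude = Math.pow(10, Math.floor(Math.log10(rough)));
  for (const factor of [1, 2, 5, 10]) {
    if (factor * magnitude >= rough) return factor * magnitude;
  }
  return 10 * magnitude;
}

const requests = [
  { name: "geo", datatype: "Geographic coordinate" },
  { name: "settled", datatype: "Date" }
];
const startProperty = firstPropertyOfType(requests, "Date");
const span = intervalSpan([1630, 1636, 1776, 1850]);
```

For the four sample years the range is 220, so a span of 50 results, which gives five intervals instead of one interval per distinct value.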


Browsing the data instead of just displaying it

Exhibit can be regarded as a rich browsing interface for data. It competes with projects like Tabulator [25] or mSpace [26]. At the moment the result printer does not take advantage of Exhibit's browsing capabilities, since all resulting data is presented in a flat list without any links. The lack of links gives the impression that the wiki does not contain any related data. In most cases this is not true. To overcome this impression, two approaches are conceivable:

- The result printer creates links to related wiki pages
- The result printer gathers data in advance and passes it to Exhibit

Data shown by Exhibit can be linked to the related wiki pages (if there are any). We can use lenses (see Section 2.2.2) to alter the rendition of each result entry and thus embed links. The customized lens design this thesis proposes does not differ much from the standard design (see Figure 10). In contrast to default lenses, all text items are rendered as links if SMW holds a page about them. This applies to every Property name, since Properties are represented by articles within SMW. Obviously, it makes sense to render values of the type wiki page as links, too. For each ask query using the Exhibit result printer a new lens has to be designed, because each query yields a different number and different kinds of Properties. As the result printer already creates HTML markup (for views, facets etc.), an obvious approach is to let it create the code for the particular lens as well. This feature is part of the implementation of the result printer (see Section Implementation for details).

Exhibit and SMW share a common data structure: they both rely on graph-based data models. So far, this thesis has pursued a query-oriented approach. As a consequence, only data gathered by queries is subject to the operations described above.
However, constraining the data volume is a legitimate course of action, since Exhibit is not designed to deal with large data volumes, as David Huynh states in his doctoral thesis [7]. Despite this fact, it is advisable to make at least some related data available to Exhibit. Usual result printers only focus on small excerpts of the graph structure, since the data provided by the SMWQueryResult object is limited. The result set is


Figure 10: Customized lens

just a flat list of pages (which correspond to Exhibit Items) and their Property values. Whether a Property value is represented by a further page is not taken into account. Hence, data from other articles is not part of the result set. Consequently, the bubbles Exhibit shows when an Item is clicked (see Section 2.2.2) stay blank. As a workaround, the solution proposed above embeds links to the related pages and therefore prevents the bubbles from popping up. As a consequence, we have to leave the exhibit if we want to access further information. When we return to the exhibit, it is back in its default state and former changes (e.g. view or facet selections) are lost. That is why it is worthwhile to take full advantage of Exhibit's built-in browsing features. The next paragraphs expand on how to exploit Exhibit's browsing capabilities.

Exhibit copes with multiple data sources. It is possible to combine data from tables with data from JSON sources. So far, the result printer only feeds Exhibit through tables created from the result set. To open up information that resides on other wiki pages, Exhibit can tap other sources: it can get the missing information from JSON files, once it is available as JSON. One way to make the semantic data of articles available as JSON is to establish a JSON exporter, which takes the name of a wiki page as a parameter in the URL and returns the content of the Factbox on this page as JSON (SMW already offers this feature for RDF output). To retrieve JSON, a URL of the following scheme can be called:

    http://urltowiki/index.php/Special:JSONExport/pagename
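The URL scheme above can be illustrated with a short sketch. It is not the thesis implementation: the helper names jsonExportUrl and dataLinks are hypothetical, and the link markup simply follows the form Exhibit uses for JSON data sources.

```javascript
// Sketch only: "Special:JSONExport" is the exporter page proposed in the text;
// the helper names below are hypothetical.

// Build the export URL for one wiki page, following the scheme
// http://urltowiki/index.php/Special:JSONExport/pagename
function jsonExportUrl(wikiBase, pageName) {
  // In URLs, MediaWiki page titles carry underscores instead of spaces.
  const title = pageName.trim().replace(/ /g, "_");
  const base = wikiBase.replace(/\/+$/, "");
  return base + "/index.php/Special:JSONExport/" + encodeURIComponent(title);
}

// Emit one Exhibit data link per distinct wiki-page value of a result table.
function dataLinks(wikiBase, pageNames) {
  const unique = [...new Set(pageNames)];
  return unique.map(
    (name) =>
      `<link rel="exhibit/data" type="application/json" href="${jsonExportUrl(wikiBase, name)}" />`
  );
}

console.log(dataLinks("http://urltowiki", ["Boston", "New York", "Boston"]).join("\n"));
```

Duplicate page names are dropped so that each related article is requested only once, which matters because every additional source lengthens Exhibit's start-up.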


For each wiki page value of the result table, the result printer can create a link to the corresponding URL in the header of the output page. Thus Exhibit gains access to the data of related articles. In principle, Exhibit is able to traverse the whole data graph of an SMW, as soon as it is available as JSON. Even though the export of the entire graph as JSON is feasible, we run into performance issues here: Exhibit is not designed for processing large amounts of data. Moreover, Exhibit only fetches data during its start-up procedure. Once the start-up function has been executed, we cannot ask Exhibit to fetch additional data. The lack of an option to reload information on demand forms an obstacle, since all data that might be requested by the user has to be available in advance. Large parts of the data graph therefore have to be prefetched. The fetching procedure delays the construction phase of an exhibit tremendously. It is much to be hoped that future releases of Exhibit provide an option to fetch data on demand. As long as the construction phase of an exhibit is delayed significantly, it is not advisable to add this feature to the Exhibit result printer. Therefore we refrain from an implementation.

Multiple Exhibits as Obstacle and Opportunity

Articles in an SMW sometimes contain more than one query. We have not yet dealt with the case of multiple queries using the Exhibit result printer on one page. There are two feasible approaches to tackling multiple queries:

• Merging the query results
• Separating different data sources by introducing collections

The query results can be mashed up. Exhibit merges data on the basis of matching labels or IDs. Whereas it is not appropriate to merge data describing different types of entities (e.g. cars vs. animals), it makes sense to combine data sets that both contain entities of the same type. For instance, we can let Exhibit unite cities located in the US and cities located in Germany.
However, the fact that we have to set parameters such as views and facets for each query again and again leads to redundant or differing query parameters. This is not the only shortcoming of the current query-oriented syntax. We shall face an additional shortcoming in the next paragraph.
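To make the merging option more tangible, the following sketch unites two item lists on the basis of matching labels. It only illustrates the idea under simplifying assumptions: the function mergeItems and the sample values are hypothetical, and the way Exhibit itself resolves conflicting values is not modeled.

```javascript
// Sketch only: merges Exhibit-style item arrays by their "label".
// Items with the same label are folded into one item; differing values
// of the same property are collected in an array.
function mergeItems(...itemLists) {
  const byLabel = new Map();
  for (const item of itemLists.flat()) {
    const merged = byLabel.get(item.label) || {};
    for (const [property, value] of Object.entries(item)) {
      const seen = [].concat(merged[property] ?? [], value);
      const distinct = [...new Set(seen)];
      merged[property] = distinct.length === 1 ? distinct[0] : distinct;
    }
    byLabel.set(item.label, merged);
  }
  return [...byLabel.values()];
}

// Two result sets describing entities of the same type (cities).
const usCities = [{ label: "Boston", "located in": "USA" }];
const moreCities = [
  { label: "Berlin", "located in": "Germany" },
  { label: "Boston", state: "Massachusetts" },
];
console.log(mergeItems(usCities, moreCities));
```

Merging in this way is only meaningful when both sets describe the same kind of entity, which mirrors the cars vs. animals caveat above.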


The second option is to populate the existing exhibit with a new view for each data set. We can distinguish the various query results by using collections (see Section 2.2.2). With the help of the view panel, one can switch between the different views. However, an exhibit that is embedded in a normal wiki page and visualizes different data sets in one view panel tends to confuse users. On usual wiki pages, the information is structured by sections. Graphics are embedded at the position where they are referred to or where they match the context. The content of an article is browsed by scrolling down the page. With different data sets, different information that is not necessarily related is presented in one single spot (remember the cars vs. animals example). It becomes difficult to identify which section of the text the exhibit refers to. The missing possibility to scatter different views across a wiki article reveals another shortcoming of the current query syntax.

Within this thesis, data mash-up is considered to be the most advantageous option for dealing with multiple queries. We shall see in Section 3.3 that data mash-up is useful for remote querying. Information regarding Exhibit's presentation can only be passed to Exhibit through query parameters right now. As long as there is no way to place views in different spots on a page, it is not advisable to implement the second option. We have to come up with a new syntax for conveying information to Exhibit that is independent of query strings.

Customizing Exhibits

In Section 3.1.6, Exhibit's generic views were aligned to SMW's infrastructure by introducing a new lens template. However, this generic template is predetermined, so the user cannot influence the appearance of an exhibit beyond selecting views and facets. As users might want to give exhibits a personal touch, the result printer has to provide a way to define a lens.
Assigning the HTML snippet of a lens to a query parameter is not practical, because an HTML snippet bloats the query string unnecessarily. To avoid the need for special markup outside the query string, other ways have to be investigated. A closer look at MediaWiki's template system pays off: in MediaWiki we can define wikitext templates that are stored on certain pages in the Template namespace. Let us consider a template called City. To define the template we


have to create the article Template:City that has the following content:

    This is the city of {{{Name}}}. It is located in {{{State}}}.

The template contains two markers (Name, State). If we want to paste this snippet into an article, we have to place the following expression in the wikitext of the corresponding article:

    {{City | Name=Boston | State=Massachusetts}}

The parameters after the template name assign values to the markers in the template. MediaWiki's template engine substitutes the markers with the values. The resulting replacement for the expression is:

    This is the city of Boston. It is located in Massachusetts.

For instance, the info boxes that many Wikipedia articles come with are based on templates. This concept is not very different from lenses. MediaWiki templates as well as lenses describe renditions. So it seems obvious to employ MediaWiki's template framework to define lenses for an exhibit. To pass the specification of a lens to Exhibit, we inform the result printer of the lens location by setting a new query parameter called lens. The result printer fetches the lens code from the corresponding page and converts the received wikitext to HTML. The next paragraphs provide further details on the implementation.

Implementation

The preceding paragraphs discussed numerous features. This part describes the implementation of these features.

Leveraging information to guess the values of query parameters

To guess a latlng parameter, the result printer loops through the SMWPrintRequest objects. As soon as it finds a Property holding a geographic coordinate, the loop breaks and the Property name is passed to Exhibit. To anticipate start and end values for timeline views, the result printer initiates the same loop, but it looks for two Properties holding date values. The first hit is assumed to be the start date,


whereas the second hit is considered to be the end date. If no second hit occurs, the result printer only sets the start date.

Replacing Exhibit's default lens to link related articles

A new JavaScript variable lens supplements the existing variables (sources, views, facets). The variable holds HTML markup that describes the look of the lens. To create the HTML, the printer sifts through the print requests and uses the option to request an HTML link to Property pages instead of getting only the plain title.

Enabling the user to define customized lenses

The result printer derives the name of the template from the lens parameter. Next it fetches the content from the corresponding template page with the aid of basic MediaWiki library functions (see footnote 19). As the fetched content is formatted in wikitext, it must be converted to HTML to serve as a lens for an exhibit. Calling MediaWiki's wikitext parser to get HTML turns out to be fairly simple. Since we cannot replace the markers with the code Exhibit expects on the server side (the Sanitizer would filter certain tags), we have to replace them with a further marker that passes the Sanitizer. Finally, new code in the exhibit.js file replaces these markers with Exhibit's way of addressing property values. Please note that the names of the markers in the template have to match the names of the Properties whose values Exhibit is supposed to fill in.
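The guessing heuristic for latlng, start and end can be sketched as follows. The actual result printer runs in PHP on SMWPrintRequest objects; here the print requests are modeled as plain objects, and the type names "Geographic coordinate" and "Date" as well as the function name are assumptions made for illustration.

```javascript
// Sketch only: print requests are modeled as plain {label, type} objects.
function guessViewParameters(printRequests) {
  const guessed = {};

  // latlng: the first Property holding a geographic coordinate wins.
  const coordinate = printRequests.find((pr) => pr.type === "Geographic coordinate");
  if (coordinate) guessed.latlng = coordinate.label;

  // Timeline: the first date Property is taken as start, the second as end.
  const dates = printRequests.filter((pr) => pr.type === "Date");
  if (dates.length > 0) guessed.start = dates[0].label;
  if (dates.length > 1) guessed.end = dates[1].label;

  return guessed;
}

console.log(
  guessViewParameters([
    { label: "state", type: "Wiki page" },
    { label: "geo", type: "Geographic coordinate" },
    { label: "founded", type: "Date" },
  ])
);
```

With only one date Property present, as in the usage example, the end value stays unset, which matches the behavior described above.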

3.2 Enabling Exhibit to Fetch Data from Remote Wikis


Exhibit accesses various data sources and thus offers numerous options to create and edit input data. A wide range of users employ Google Spreadsheets (see footnote 20) to manage input data. However, using tables as a data source has limitations. For certain information, particularly metadata, a table scheme does not reserve room. This section aims at establishing SMW as a further alternative for creating and managing input data. In contrast to Google Spreadsheets, SMW encourages users to provide metadata that is valuable to Exhibit. Section 3.2.1 outlines the architecture of the data exchange between SMW and Exhibit. Next a decision
19 http://www.organicdesign.co.nz/MediaWiki_code_snippets lists the most useful MediaWiki functions a contributor can resort to
20 http://docs.google.com/


for the format of the exchanged data is made in Section 3.2.2. It turns out that JSON is the most appropriate format. Consequently, we design (see Section 3.2.4) and implement (see Section 3.2.5) a JSON exporter for SMW. The output of this exporter can feed exhibits throughout the web.

3.2.1 Architecture

A review of the collaboration of Exhibit and Google Spreadsheets reveals that data management and presentation are performed on separate servers. Whereas the user creates and stores the input data on Google's servers, Exhibit's presentation capabilities are leveraged on a different server. The use of SMW instead of Google Spreadsheets does not change this scenario. The server that hosts SMW and the server the exhibit resides on do not necessarily have to be the same. Therefore communication between a remote SMW and Exhibit has to take place across server boundaries. This section outlines the main elements of such a communication. As it might not be desirable in all cases to export the entire wiki content to Exhibit, we have to investigate ways to select certain parts of the data. Queries are a helpful instrument here, since they select a subset of the entire data volume. Thus a request for data that Exhibit sends to SMW needs to include a query. After having received the request, SMW delivers the results in a certain format. The choice of a format is the subject of the next section.

3.2.2 Data Format

This section argues that creating a JSON exporter for SMW is the best method to feed an exhibit with data across server boundaries. Both Semantic MediaWiki and Exhibit offer an open infrastructure to build customized importers and exporters. SMW's existing exporters serialize data in XML (Extensible Markup Language), whereas Exhibit prefers JSON as input format (see footnote 21). XML has taken hold as a format for structured data on the web and is thus well-known by web developers. 
In contrast, JSON is an emerging format which gains in importance and has not achieved such a wide adoption yet. JSON has a higher space
21 Formats other than JSON have to run through a converter first

efficiency than XML, but the lean syntax of JSON affects the readability [27]. As XML and JSON are not compatible, a conversion between these formats has to occur at some point. It is debatable where this transformation process should be located. Two options are conceivable:

• Enabling Exhibit to import XML data
• Enabling Semantic MediaWiki to export JSON

Since Exhibit is a JavaScript framework, it internally handles data in JSON. With the first option, data is transformed to an arbitrary format by SMW and then to JSON by Exhibit. The conversion to JSON happens on the client side and slows down Exhibit's rendering process. To save this transformation step, SMW can directly deliver JSON. Additionally, SMW benefits from a JSON exporter in other respects. Many web services support JSON as input or output format (e.g. Freebase and many Google services). A JSON exporter enables SMW to interact with these web services. Since wikis tend to contain large data volumes, JSON's space efficiency helps to minimize the overhead of data transfers. Furthermore, an easy way to exchange JSON across server boundaries exists (see Section 3.2.4). JSON is thus identified as the format of choice, but nothing has been said yet about how SMW's data model is encoded in JSON. We develop a data model in the next section.

3.2.3 Data Model

This section argues for sticking with Exhibit's JSON data representation (Exhibit's way of representing data in JSON will be referred to as Exhibit JSON from now on). A format is not only characterized by its syntax but also by the underlying semantic model and the resulting expressivity of this model. For example, it is possible to serialize RDF in JSON. Proposals on how to represent RDF in JSON can be found in [28]. Inventing a new way to represent SMW's data in JSON is also conceivable, but we have to keep in mind that Exhibit is going to be the main consumer of exported JSON data (for now).


Most web services have their own JSON format. The majority of these formats do not significantly differ from Exhibit JSON and therefore only need to be slightly adjusted. If we want Exhibit to extract data from such a JSON source, we can pass the name of a JavaScript function that takes care of the conversion to Exhibit JSON. Listing 4 illustrates how to inform Exhibit of the JavaScript function implementing a converter.

    <link rel="exhibit/data" type="application/json" href="..."
          ex:converter="JSONConverter" />

    Listing 4: Specifying a Converter Function

However, this conversion, like the transformation from XML to JSON, requires extra processor time. This extra processor time must not be underestimated, since client-side rendering gets slower the larger the data volume is. To avoid this conversion, the JSON exporter can output Exhibit JSON directly. Exhibit JSON is highly readable. It accommodates the user's eagerness to experiment with data. These experiments include actions such as copying and pasting data manually and reusing it for other exhibits. To be able to do that, the user must have an understanding of the format. For instance, it is doubtful that a casual user can cope with formats like the ones proposed in [28]. The next paragraphs show how SMW's data structure can be encoded in Exhibit JSON.

Mapping to Exhibit JSON

Section 2.2.1 explained the structure of an Exhibit JSON file. Now we go over the objects of an Exhibit JSON file once more to identify the corresponding information SMW contains.

Item Types

Exhibit allows the user to specify the type of Items. When creating a query in SMW, it is not clear what type of entities will be returned. For example, when asking for pages with the Property located in and the value United States, the selected pages could deal with any entity that is located in the United States, such as a city, a state or a building. Optionally, we can assign articles to categories. Categories can be construed


as classes. In certain cases the class matches the type of an entity (consider a category City that contains articles about cities). In other cases categories and types do not coincide (see footnote 22). Moreover, articles can belong to multiple categories or to none. Due to these ambiguities, the output of the JSON exporter we implement in Section 3.2.5 does not include the types object.

Datatypes

Both SMW and Exhibit rely on datatypes to characterize Property values. Since these datatypes are not entirely identical, we have to align them. The following table proposes a mapping of the different datatypes.

    Semantic MediaWiki    Exhibit
    Wiki Page             Item
    String                Text
    Number                Number
    Date                  Date
    URL                   URL

The core piece of the two graph-based data models is a wiki page or an Item, respectively. String and text as well as number and URL are a perfect match, whereas SMW's date datatype is more expressive and covers a wider range. SMW's remaining built-in datatypes such as geographic coordinate do not have counterparts in Exhibit, but these values can be treated as ordinary textual values.

Items

Each row of a query result set corresponds to one object of the Items array. Both SMW and Exhibit rely on property-value pairs to describe an entity. The property label is defined by the title of a selected article. We saw that some of SMW's valuable information (e.g. certain datatypes) does not find its way into Exhibit JSON. We can neglect this issue, since most applications capable of processing JSON would not leverage this information anyway. A review of the JSON formats of different web services shows that most of these formats do not include metadata.
22 For instance, a category USA could contain entities of different types (cities, states)

On the other hand, SMW does not provide information Exhibit JSON frequently embodies (e.g. itemtypes). This information is not mandatory. Hence, we do not run into problems here.
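The datatype alignment proposed in the table can be expressed as a simple lookup. This is a minimal sketch: the lowercase Exhibit type names and the function name toExhibitType are assumptions, and the fallback to text reflects the statement that datatypes without a counterpart are treated as textual values.

```javascript
// Sketch only: lookup table mirroring the proposed datatype mapping.
const SMW_TO_EXHIBIT = new Map([
  ["Wiki page", "item"],
  ["String", "text"],
  ["Number", "number"],
  ["Date", "date"],
  ["URL", "url"],
]);

// Datatypes without a counterpart (e.g. geographic coordinate)
// fall back to plain text.
function toExhibitType(smwType) {
  return SMW_TO_EXHIBIT.get(smwType) ?? "text";
}

console.log(toExhibitType("Wiki page"));
console.log(toExhibitType("Geographic coordinate"));
```

Keeping the mapping in one table makes it easy to extend once further SMW datatypes gain a counterpart in Exhibit.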

3.2.4 Design of a JSON Exporter

In Section 2.1.2 the Special:Ask page was introduced. The page hosts a form we can use to define the different components of a query. Instead of being embedded in an article, the query results are then displayed on the special page itself. Furthermore, we can request plain results without the usual HTML markup for the wiki navigation. For instance, such plain results can be formatted as an RSS file. To receive such a file, we pass the query via GET parameters to the PHP script that implements the Special:Ask page. SMW already features an RSS export. We have to call the following URL to get a detached RSS file consisting of entities that are located in the United States:

    http://wikiurl/Special:Ask?title=Special%3AAsk&q=[[located+in%3A%3AUSA]]&po=&sort0=&order0=DESC&sc=1&eq=yes
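How such a request URL is assembled can be sketched with the standard URLSearchParams class. The function name buildAskUrl is hypothetical, and the parameter names (po, sort0, order0, sc, eq) are simply taken over from the URL above without claiming anything about their meaning.

```javascript
// Sketch only: assembles a Special:Ask request URL from a query string.
function buildAskUrl(wikiUrl, query, extra = {}) {
  // URLSearchParams percent-encodes reserved characters such as ":" and
  // brackets and turns spaces into "+".
  const params = new URLSearchParams({ title: "Special:Ask", q: query, ...extra });
  return `${wikiUrl}/Special:Ask?${params.toString()}`;
}

const url = buildAskUrl("http://wikiurl", "[[located in::USA]]", {
  po: "",
  sort0: "",
  order0: "DESC",
  sc: "1",
  eq: "yes",
});
console.log(url);
```

Unlike the hand-written URL, this sketch also percent-encodes the square brackets; the decoded query is the same either way.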

To format these plain results and thus create the RSS file, Special:Ask relies on result printers (see Section 2.1.2). Result printers detect whether they are expected to deliver query results to be embedded in an article (if they are invoked by an inline query) or a plain file (if they are called by Special:Ask). Hence, it is essential to create a new result printer for formatting an (Exhibit) JSON file. This result printer can be used for results of inline queries, too. Instead of returning a plain file, the result printer then embeds a link to the JSON file in the article. The link points to the URL of Special:Ask and contains the encoded query string. Users can benefit from such a link, as they are able to copy and paste the link location into an exhibit. There is only one obstacle: such an exhibit only works properly if it is hosted on the same server the new result printer is running on. The next paragraphs expand on the reasons for this obstacle and on workarounds.


JSON with Padding

Web browsers prevent cross-domain data fetching for security reasons (this policy is often referred to as the Same Origin Policy). Thus data from foreign resources outside of the current domain is not fetched. At first sight, it is not simple to feed exhibits with JSON files that are located on remote servers, since a simple link tag to the resource on the remote server does not have any effect. Basically, two kinds of workarounds exist (apart from proprietary technologies like Flash) to retrieve JSON from remote servers anyway:

• One popular way to trick the browser is to set up a PHP script on the server that hosts the exhibit. The script requests the JSON file from the remote server and passes it to the browser. The browser is not aware of the fact that the JSON file originated from a remote server, as it gets the JSON from the PHP script. Such PHP scripts are called parrot scripts, since they just repeat the content of the remote JSON file and twitter it to the browser. However, parrot scripts do not work on every server. The PHP and firewall configurations determine whether data fetching from remote servers is permitted. Additionally, an Exhibit user cannot be expected to set up a PHP script.

• JSON with Padding (JSONP) was introduced as a workaround that bypasses the browser's security arrangements. An initial blog post [29] triggered the circulation of this idea in the blogosphere. In response to JSONP, Yahoo published a similar approach on its corporate website [30]. It is hard to come up with a universal description of how JSONP works, since implementations vary. The following paragraph covers the essential actions that all implementations perform.

In any case, local JavaScript code dynamically generates a script tag that points to a remote script. The remote script gets invoked, since script tags, unlike link tags, are not subject to the browser's security limitations. 
In one of his blog entries [31], Rick Strahl provides a sample of what such JavaScript code could look like. The trick is that the invoked remote script contains more than a plain JSON string in order to make the JSON accessible to the local JavaScript. By putting the JSON string in parentheses and prepending a function name, a function call with the JSON string as parameter emerges:

callbackfunc({"foo":42});


The local JavaScript code implements the function that is called in the remote script. Within the code of the function implementation the JSON string is accessible via the passed parameter:

    function callbackfunc(result) {
        alert(result.foo);
    }

JSONP is not free of security issues. Using JSONP can lead to critical security risks. The tapped data source has to be trustworthy and reliable, since it is rather easy to inject malicious code by replying with such code instead of the requested JSONP. Furthermore, error handling on the fetching side is nearly impossible, as the incoming data comes out of the dark [32]. An encryption of the exchanged data cannot take place either. Currently, JSONP is offered as an output format by many web services, including Google Spreadsheets [33]. Exhibit uses the JSON export of Google Spreadsheets to get data from spreadsheets. Similar to a normal link tag pointing to a JSON data source, the following link tag taps a JSONP data source:


    <link rel="exhibit/data"
          type="application/jsonp"
          href="http://urltojsonp/exhibit.jsonp"
          ex:jsonp-callback="editgridCallback" />

In addition, we can set the name of a particular callback function (ex:jsonp-callback) in case the name of the function differs from the default name callback.

Query Adjustments

Certain applications require JSON (because they are using a parrot script), whereas others prefer JSONP. To stay flexible here, we have to signal to SMW what it is expected to deliver. This is done through a new query parameter called callback. Without this parameter in the query, regular JSON is exported. As soon as a callback parameter is part of the query, SMW adds the obligatory callback function to the JSON export. The name of this function is derived from the value of the callback parameter. A sample JSONP query could look like the following:


    {{#ask: [[located in::Germany]] | ?state | format=JSON | callback=callback}}

3.2.5 Implementation

As the creation of a result printer has already been part of the first integration step (see Section 3.1.2), the implementation of a JSON result printer is summarized only briefly. To format the properties object of the JSON output, the result printer sifts through the print requests. For each print request, the type of the Property is read out and mapped to the corresponding Exhibit type. The mapping is hardcoded in an array. The items object is generated by going through the result object row by row. The titles of the selected pages are assigned to the label property. Furthermore, for each Item a URI property is created that holds a URI of the wiki article that represents the Item. If a callback value is present, the result printer puts the output in parentheses and prepends the callback function name.
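The steps of the JSON result printer can be outlined in a few lines. The real printer is written in PHP inside SMW; the sketch below only mirrors its logic, and the input shapes (printRequests, rows), the property name uri and the article URL layout are assumptions made for illustration.

```javascript
// Sketch only: assembles Exhibit JSON from simplified query results and
// optionally pads it with a callback name (JSONP).
function exportExhibitJson(printRequests, rows, wikiBase, callback) {
  // properties object: one entry per print request with its Exhibit type.
  const properties = {};
  for (const pr of printRequests) {
    properties[pr.label] = { valueType: pr.exhibitType };
  }

  // items array: one Item per result row, labeled with the page title.
  const items = rows.map((row) => ({
    ...row.values,
    label: row.title,
    uri: `${wikiBase}/index.php/${encodeURIComponent(row.title.replace(/ /g, "_"))}`,
  }));

  const json = JSON.stringify({ properties, items });
  // With a callback the output becomes a function call: name(json)
  return callback ? `${callback}(${json})` : json;
}

const printRequests = [{ label: "state", exhibitType: "item" }];
const rows = [{ title: "Munich", values: { state: "Bavaria" } }];
console.log(exportExhibitJson(printRequests, rows, "http://urltowiki"));
console.log(exportExhibitJson(printRequests, rows, "http://urltowiki", "callback"));
```

The first call yields plain JSON for consumers behind a parrot script, the second the padded variant for JSONP consumers.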

3.3 Using Exhibit to Enable Data Exchange Between Semantic MediaWikis


We encounter wikis in everyday situations. Whereas we might use one at work, personal wikis are becoming more and more popular [34]. In addition, we use Wikipedia to look up knowledge. As noted in the introduction, all these wikis are separate islands without a stable connection. The lack of a convenient method to exchange data among wikis fosters the generation of redundant data. Textual data is copied and pasted to transfer data from one wiki to another. The copied snippets get out of date quickly or evolve independently of the wiki they originate from. MediaWiki already supports a more sophisticated, file-based way of exchanging data. After the names of the pages to be exported have been entered on the Special:Export page, the wiki system bundles the selected pages in an XML file. In the target wiki, we must import these files with the aid of another special page. This practice has the following shortcomings:


• Completing several web forms and juggling data files make the procedure rather tedious
• The problem regarding outdated information remains
• The selection of data to be exported is imprecise, since the selection process is only page-based

In this section remote queries are introduced to help mitigate these shortcomings. Section 3.3.1 describes how remote queries work. The implementation of remote queries is tackled in Section 3.3.2. Since the implementation leaves several questions unanswered, Section 3.3.3 discusses possible improvements.

3.3.1 Design of Remote Queries

This section illustrates the procedure of remote queries. To request data from remote wikis, we want to create an inline query containing a remote parameter that indicates the wiki the query is meant for. The following query is supposed to be executed on the data of http://semanticweb.org by the query processor of the wiki running on this domain:

    {{#ask: [[located in::USA]] | ... | format = exhibit | remote=http://semanticweb.org}}

The query is passed to the remote wiki without being executed in the local wiki. The remote wiki processes the query and delivers the query results as JSONP. We use an exhibit in the local wiki to display the incoming JSONP data.

3.3.2 Implementation

The two components needed for the implementation of remote queries have already been designed and implemented:

1. JSON(P) Exporter
2. Exhibit result printer


These two components need to be combined to accomplish data exchange. To add support for remote queries, we enhance the code of the Exhibit result printer. Instead of creating a table from the result data to form a data source for Exhibit, the result printer creates a link pointing to a remote JSONP source. The link location is defined by the remote query parameter. We have to extend this link with a string that encodes the query we want to submit to the target wiki. The SMWQueryResult object offers helpful methods to create this string.

3.3.3 Advanced Features

Although we have enabled on-the-fly data exchange, the current implementation shows shortcomings that can only be resolved by implementing additional features. Therefore the implementation can merely be regarded as a proof of concept. The following paragraphs point out where improvement is needed.

Data Integration

One open issue is data integration. Even though data is requested and retrieved from remote wikis, this data is only displayed in the importing wiki. The data is not merged with the existing local data and thus does not become part of the database of the local wiki. Of course, Exhibit cannot help to transfer data into wiki databases, but it can merge multiple data sources in its views. The combination of the following two queries yields a data set which contains US cities as well as German cities, although the set of German cities is retrieved from a remote wiki. The user is not able to distinguish between data from the local and the remote wiki while viewing the exhibit.

    {{#ask: [[located in::USA]] | ?geo | format=exhibit | views=map}}
    {{#ask: [[located in::Germany]] | ?geo | format=exhibit | views=map


    | remote=http://localhost/fh/germanywiki}}

Different SMWs run on different ontologies. In the example, we assume that properties and categories have the same name and semantics in both wikis. In reality, a match of properties and categories is unlikely. It is necessary to come up with a way to align properties originating from different wikis. For instance, we can introduce new syntax constructs such as "as" statements to express that properties with different names have the same semantics.

Multiple Requests

To make thriving data exchange among wikis possible, the involved wikis have to deal with multiple data requests at the same time. To handle data effectively, wikis need to reduce the workload by caching results, so that frequently repeated queries are not executed again and again. We have to think about caching strategies that speed up availability but affect the currentness of the data as little as possible.

Confidential Data

Of course, not every wiki administrator wants to share the entire wiki data with the public. We have to investigate mechanisms that protect confidential information from being queried by remote wikis. The usual page-based restrictions are not sufficient, as they are not fine-grained enough.

Security Issues

Risks that are caused by technologies such as JSONP have to be averted. For instance, we have to make sure users cannot query arbitrary resources. To prevent that, we can restrict querying to trustworthy wikis that are part of a white list.
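A white list check for the remote parameter could look like the following sketch. It is an assumption about how such a check might be written, not part of the implementation described above; the function name isTrustedWiki and the host names are hypothetical, and URL parsing relies on the standard URL class.

```javascript
// Sketch only: accepts a remote wiki URL if its host is on a white list.
function isTrustedWiki(remoteUrl, whitelist) {
  let url;
  try {
    url = new URL(remoteUrl);
  } catch {
    return false; // not a parsable absolute URL
  }
  // Only plain web URLs are acceptable as JSONP sources.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // Compare the host only, so that paths cannot smuggle in a trusted name.
  return whitelist.includes(url.host);
}

const trusted = ["semanticweb.org"];
console.log(isTrustedWiki("http://semanticweb.org/index.php", trusted));
console.log(isTrustedWiki("http://untrusted.example/semanticweb.org", trusted));
```

Comparing the parsed host instead of searching the raw string avoids accepting URLs that merely mention a trusted name somewhere in their path.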

4 Discussion and Outlook


The integration of Semantic MediaWiki and Exhibit covers the entire value chain from creating to presenting structured data. SMW is in charge of creating and managing structured data, whereas Exhibit displays the data. The following sections summarize the contributions of the integration and discuss their impact.


4.1 Visualization of Query Results


Thanks to Exhibit, SMW gains new visualization features and therefore provides further incentives for the annotation of wiki articles. Wiki data can be consumed more easily. The user is relieved of analytic tasks, since new relationships are promptly revealed through map and timeline views. The high information loads that wikis tend to have are tamed by features such as faceted browsing, sorting and aggregation of queried content.

4.2 Use of Wiki Data in Other Web Applications


By exploiting the JSON Exporter, Exhibit can visualize SMW data anywhere on the World Wide Web. Wiki data thus finds its way into personal websites and blogs. Many web applications, including content management systems such as Drupal^23 and blog engines, allow the user to embed exhibits. The WordPress plugin datapress^24 supports users in adding exhibits to blog entries. The user is guided by a wizard and does not have to write a single line of HTML to set up an exhibit. Data sources, views and facets can be configured with a few mouse clicks. The barrier to building rich data-centric web applications is lowered once more. Tools such as Semantic Forms and datapress facilitate data management and presentation. The user needs only a little knowledge of wikitext to create rich web applications. In comparison, popular stacks for building web applications such as LAMP^25 require knowledge of a language for the application logic (e.g. PHP), a language for presentation (e.g. HTML) and another language for data querying and management (e.g. SQL).

4.3 Data Exchange


This thesis established remote queries as a method for on-the-fly data exchange among semantic wikis. The fact that data exchange can be carried out rather conveniently and that incoming data is mashed with the local data in Exhibit's views yields new opportunities for improving wiki data. Wiki authors can fetch data from remote wikis without taking over the responsibility of maintaining the imported information. The domain expert who maintains the data stays in charge of keeping it up to date. As soon as changes are applied, the data in the importing wiki is updated at the same time^26. In this way, not only information is shared and reused but also the related expertise. As a future step, other repositories can be involved in data exchange. For instance, Freebase contains a lot of generic and useful information that is valuable to wiki articles.

23 The following blog entry describes how to use Exhibit with Drupal without requiring a special plugin: http://joshhuckabee.com/getting-started-exhibit-using-drupal-and-views-part-1
24 http://projects.csail.mit.edu/datapress/
25 LAMP is a software bundle that consists of Linux, Apache, MySQL and PHP.
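The trade-off between workload and currentness that caching introduces can be made concrete with a small time-to-live cache. This is only a sketch in Python under stated assumptions, not part of the implementation described in this thesis: the class name QueryCache, the run_query callable and the default lifetime of 300 seconds are hypothetical.

```python
import time


class QueryCache:
    """Cache query results for a fixed time to live (TTL).

    `run_query` is a hypothetical callable that executes a query string
    against the wiki and returns its result.
    """

    def __init__(self, run_query, ttl_seconds=300.0, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query string -> (expiry time, result)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            # Still fresh: answer from the cache without running the query.
            return entry[1]
        # Missing or expired: run the query and remember the new result.
        result = self._run_query(query)
        self._entries[query] = (now + self._ttl, result)
        return result
```

With such a cache, a change in the exporting wiki becomes visible in the importing wiki after at most ttl_seconds. A shorter lifetime improves currentness, while a longer one saves more query executions.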

4.4 Outlook
It is to be hoped that SMW takes further root among casual users as a tool for personal and corporate knowledge management. With the aid of Exhibit, SMW users who conduct personal information management can publish this information through blog entries or personal web pages, whereas users dealing with SMW in a business context can use wiki data as the basis of reports or presentations. Even if all this information flows into less structured media such as blog entries, the semantic data remains in the wiki. As data exchange becomes possible, the various wikis grow closer together, and a large pool of structured data emerges. This data is subject to all the benefits of a semantic web, such as semantic search and semantic browsing, and thus helps to form a semantic web.

26 In the implementation shown, the data is updated immediately; caching procedures would delay an update.

References

References
[1] Tim Berners-Lee, James Hendler, and Ora Lassila. Semantic web. Scientic American, May, 2001. [2] Nigel Shadbolt, Tim Berners-Lee, and Wendy Hall. The semantic web revisited. IEEE Intelligent Systems, 21(3):96101, May/Jun 2006. [3] World Wide Web Consortium (W3C). W3C semantic web activities. http: //www.w3.org/2001/sw/, last checked: 02/15/09. [4] Richard Benjamins, Jesus Contreras, Oscar Corcho, and Asuncion GomezPerez. Six challenges for the semantic web. In Proc. KR2002 Workshop on Formal Ontology, Knowledge Representation and Intelligent Systems for the Web, 2002. [5] Anupriya Ankolekar, Markus Krtzsch, Thanh Tran, and Denny Vrandecic. o The two cultures - mashing up web 2.0 and the semantic web. Journal of Web Semantics, 6, 2008. [6] Tim OReilly. What is web 2.0? http://www.oreillynet.com/pub/ a/oreilly/tim/news/2005/09/30/what-is-web-20.html, last checked: 01/18/09. [7] David F. Huynh. User Interfaces Supporting Casual Data-Centric Interactions on the Web. PhD thesis, Massachusetts Institute of Technology, 2007. [8] Tom Heath, John Domingue, and Paul Shabajee. User interaction and uptake challenges to successfully deploying semantic web technologies. In Proc. 3rd International Semantic Web User Interaction, 2006. [9] A. Desilets, S. Paquet, and N. G. Vinson. Are wikis usable. In Proc. 1st ACM Wiki Symposium, 2005. [10] Christoph Sauer. What you see is wiki - questioning wysiwyg in the internet age. In Proc. Wikimania, 2006.

58

References

[11] Markus Krtzsch, Denny Vrandecic, Rudi Studer, Max Vlkel, and Heiko o o Haller. Semantic Wikipedia. Journal of Web Semantics, 5, 2007. [12] Abraham Bernstein, Esther Kaufmann, Anne Ghring, and Christoph Kiefer. o Querying ontologies: A controlled english interface for end-users. In Proc. 4th International Semantic Web Conference (ISWC05), pages 112126, 2005. [13] Max Vlkel and Eyal Oren. Towards a wiki interchange format (wif)opening o semantic wiki content and metadata. In Proc. First Workshop on Semantic Wikis - From Wiki To Semantics, 2006. [14] Tobias Kuhn. AceWiki: A natural and expressive semantic wiki. In Proc. 3rd Semantic Wiki Workshop, 2008. [15] Sebastian Schaert, Rupert Westenthaler, and Andreas Gruber. Ikewiki: A user-friendly semantic wiki. In Proc. 3rd European Semantic Web Conference, 2006. [16] Fernanda B. Vigas, Martin Wattenberg, Frank van Ham, Jesse Kriss, and e Matt McKeon. Many eyes: A site for visualization at internet scale. IEEE Transactions on Visualizations and Computer Graphics, 13(6):11211129, 11/12 2007. [17] Dave Beckett. Rdf/xml syntax specication. W3C Specication. Available at: http://www.w3.org/TR/rdf-syntax-grammar/, last checked: 02/12/09, 2 2004. [18] Petra S. Bayerl, Harald Lungen, Ulrike Gut, and Karsten I. Paul. Methodology for reliable schema development and evaluation of manual annotations. In Proc. Workshop on Knowledge Markup and Semantic Annotation at the Second International Conference on Knowledge Capture (KCAP), 2003. [19] Malte Kiesel and Leo Sauermann. Towards semantic desktop wikis. UPGRADE, 6:30, 2005. [20] Stefan Haustein and Joerg Pleumann. Is participation in the semantic web too dicult? In Lecture Notes in Computer Science. Springer, 2002.

59

References

[21] Aditya Kalyanpur, James Hendler, Bijan Parsia, and Jennifer Goldbeck. Smore - semantic markup, ontology, and rdf editor. Available at: http: //www.mindswap.org/papers/SMORE.pdf. [22] Douglas Crockford. RFC 4627 application/json. Available at: http://www. ietf.org/rfc/rfc4627.txt?number=4627. [23] JSON: The fat-free alternative to XML. http://www.json.org/xml. html, last checked: 02/02/09. [24] Marti A. Hearst. UIs for faceted navigation: Recent advances and remaining open problems. In Proc. 2nd Workshop on Human-Computer Interaction and Information Retrieval, 2008. [25] Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly, Ruth Dhanaraj, James Hollenbach, Adam Lerer, and David Sheets. Tabulator: Exploring and analyzing linked data on the semantic web. In Proc. 3rd International Semantic Web User Interaction Workshop, 2006. [26] m.c. schraefel, Max Wilson, Alistair Russell, and Daniel A. Smith. mspace: improving information access to multimedia domains with multimodal exploratory search. Communications of the ACM, 49(4):4749, April 2006. [27] Ramon Lawrence. The space eciency of xml. Information and Software Technology, 46(11):753759, September 2004. [28] Keith Alexander. RDF/JSON: A specication for serialising RDF in JSON. In Proc. 4th Workshop on Scripting for the Semantic Web, 2008. [29] Remote json - jsonp. http://bob.pythonmac.org/archives/2005/ 12/05/remote-json-jsonp/, last checked: 02/01/09. [30] Using JSON (JavaScript Object Notation) with Yahoo! Web Services. http: //developer.yahoo.com/common/json.html, last checked 01/26/09. [31] Rick Strahl. JSONP for cross-site callbacks. http://www.west-wind. com/Weblog/posts/107136.aspx, last checked 01/10/09.

60

References

[32] Remy Sharp. What is JSONP? http://remysharp.com/2007/10/08/ what-is-jsonp/, last checked 01/17/09. [33] Google. Simple example of retrieving json feeds from spreadsheets data api. http://code.google.com/apis/gdata/samples/ spreadsheet_sample.html, last checked 01/15/09. [34] A. P. McAfee. Enterprise 2.0: The dawn of emergent collaboration. MIT Sloan Management Review, 47(3):2128, 2006.

61

References

I hereby truthfully declare that, apart from the assistance already known to the examiner, I have written this thesis independently, that I have listed all aids used completely and accurately, and that I have marked everything that was taken from the work of others, whether unchanged or with modifications.

(Place, Date)

(Signature)