
The Semantic Web


The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims to convert the current web of unstructured documents into a "web of data". It builds on the W3C's Resource Description Framework (RDF).

A Vision Of Possibilities

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler and Ora Lassila, The Semantic Web, Scientific American, May 2001

According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries." The term was coined by Tim Berners-Lee, the inventor of the World Wide Web and director of the World Wide Web Consortium ("W3C"), which oversees the development of proposed Semantic Web standards. He defines the Semantic Web as "a web of data that can be processed directly and indirectly by machines."

Purpose

The main purpose of the Semantic Web is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily. The Semantic Web is a vision of information that can be readily interpreted by machines, so that machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web.

The Semantic Web, as originally envisioned, is a system that enables machines to "understand" and respond to complex human requests based on their meaning. Such an "understanding" requires that the relevant information sources be semantically structured. The Semantic Web is regarded as an integrator across different content, information applications and systems. It has applications in publishing, blogging, and many other areas. Often the terms "semantics", "metadata", "ontologies" and "Semantic Web" are used inconsistently. In particular, these terms are used as everyday terminology by researchers and practitioners, spanning a vast landscape of different fields, technologies, concepts and application areas. Furthermore, there is confusion with regard to the current status of the enabling technologies envisioned to realize the Semantic Web. In a paper presented by Gerber, Barnard and Van der Merwe, the Semantic Web landscape is charted and a brief summary of related terms and enabling technologies is presented. The architectural model proposed by Tim Berners-Lee is used as the basis to present a status model that reflects current and emerging technologies.

Limitations of HTML

Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. Metadata tags provide a method by which computers can categorise the content of web pages, for example:
<metaname="keywords"content="computing, computer studies, computer"/> <metaname="description"content="Cheap widgets for sale"/> <metaname="author"content="John Doe"/>

With HTML and a tool to render it (perhaps web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as "this document's title is 'Widget Superstore'", but there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of 199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be positioned near "Acme Gizmo" and "199", etc. There is no way to say "this is a catalog" or even to establish that "Acme Gizmo" is a kind of title or that "199" is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.

Semantic Web solutions

The Semantic Web takes the solution further. It involves publishing in languages specifically designed for data: Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible Markup Language (XML). HTML describes documents and the links between them. RDF, OWL, and XML, by contrast, can describe arbitrary things such as people, meetings, or airplane parts. These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest itself as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML or, more often, purely in XML, with layout or rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e., to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and helping computers to perform automated information gathering and research. An example of a tag that would be used in a non-semantic web page:
<item>cat</item>

Encoding similar information in a semantic web page might look like this:
<item rdf:about="http://dbpedia.org/resource/Cat">Cat</item>
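To make this concrete, here is a minimal sketch, using the Python rdflib library, of how the catalog assertions from the earlier example could be expressed as machine-readable RDF triples. The example.org URIs and the ex:price and ex:ConsumerProduct terms are illustrative assumptions, not part of any standard vocabulary:

# A minimal sketch of machine-readable assertions as RDF triples,
# using the Python rdflib library. The "ex:" namespace, the property
# names, and the item URI are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/catalog#")  # hypothetical vocabulary

g = Graph()
g.bind("ex", EX)

item = URIRef("http://example.org/catalog/X586172")  # hypothetical item URI

# Unambiguous assertions that plain HTML cannot make:
g.add((item, RDF.type, EX.ConsumerProduct))       # "this is a consumer product"
g.add((item, RDFS.label, Literal("Acme Gizmo")))  # "its title is Acme Gizmo"
g.add((item, EX.price, Literal(199)))             # "its price is 199"

for subject, predicate, obj in g:
    print(subject, predicate, obj)

Each triple binds the item, a named relationship, and a value together, so a machine can tell that "199" is the price of this particular item rather than a nearby span of text.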

Tim Berners-Lee calls the resulting network of Linked Data the Giant Global Graph, in contrast to the HTML-based World Wide Web. Berners-Lee posits that if the past was document sharing, the future is data sharing. His answer to the question of "how" provides three points of instruction. One, a URL should point to the data. Two, anyone accessing the URL should get data back. Three, relationships in the data should point to additional URLs with data.

Components

The term "Semantic Web" is often used more specifically to refer to the formats and technologies that enable it. The collection, structuring and recovery of linked data are enabled by technologies that provide a formal description of concepts, terms, and relationships within a given knowledge domain. These technologies include:

Resource Description Framework (RDF), a general method for describing information
RDF Schema (RDFS)
Simple Knowledge Organization System (SKOS)
SPARQL, an RDF query language
Notation3 (N3), designed with human-readability in mind
N-Triples, a format for storing and transmitting data
Turtle (Terse RDF Triple Language)
Web Ontology Language (OWL), a family of knowledge representation languages

The Semantic Web Stack illustrates the architecture of the Semantic Web. The functions and relationships of the components can be summarized as follows:

XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within. XML is not at present a necessary component of Semantic Web technologies in most cases, as alternative syntaxes exist, such as Turtle. Turtle is a de facto standard, but has not been through a formal standardization process.

XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.

RDF is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in a variety of syntaxes, e.g., RDF/XML, N3, Turtle, and RDFa. RDF is a fundamental standard of the Semantic Web.
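As an illustration of this variety of syntaxes, the following sketch (again using Python's rdflib, with the same hypothetical catalog data as above) serializes one and the same graph as Turtle, RDF/XML, and Notation3:

# A sketch showing one RDF graph written in several concrete syntaxes.
# The example.org data is the hypothetical catalog item used earlier.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/catalog#")
g = Graph()
g.bind("ex", EX)
g.add((URIRef("http://example.org/catalog/X586172"), EX.price, Literal(199)))

print(g.serialize(format="turtle"))  # Turtle (Terse RDF Triple Language)
print(g.serialize(format="xml"))     # RDF/XML
print(g.serialize(format="n3"))      # Notation3

The triples are the model; Turtle, RDF/XML and N3 are interchangeable ways of writing them down.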

RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes.

OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
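A small sketch of what the RDFS and OWL vocabularies look like as triples, again with rdflib; the ex: class names are made up for illustration:

# A sketch of RDFS and OWL vocabulary in use. The ex: classes are
# hypothetical; RDFS.subClassOf and OWL.disjointWith are standard terms.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/catalog#")
g = Graph()
g.bind("ex", EX)

# RDFS: a class hierarchy ("every Gizmo is a Product").
g.add((EX.Gizmo, RDF.type, OWL.Class))
g.add((EX.Product, RDF.type, OWL.Class))
g.add((EX.Gizmo, RDFS.subClassOf, EX.Product))

# OWL: richer class relations, e.g. disjointness ("nothing is both").
g.add((EX.Gizmo, OWL.disjointWith, EX.Service))

print(g.serialize(format="turtle"))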

SPARQL is a protocol and query language for semantic web data sources.
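A brief sketch of what querying such data looks like in practice, using rdflib's built-in SPARQL engine and the hypothetical catalog graph from above:

# A sketch of a SPARQL query run against an in-memory rdflib graph.
# The ex: vocabulary and the data are the hypothetical catalog example.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/catalog#")
g = Graph()
g.add((URIRef("http://example.org/catalog/X586172"), EX.price, Literal(199)))

query = """
PREFIX ex: <http://example.org/catalog#>
SELECT ?item ?price
WHERE { ?item ex:price ?price . }
"""

for row in g.query(query):
    print(row.item, row.price)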

Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error correction and, more recently, also for network coding. Codes are studied by various scientific disciplines such as information theory, electrical engineering, mathematics, and computer science for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction (or detection) of errors in the transmitted data.

There are essentially two aspects to coding theory:
1. Data compression (or, source coding)
2. Error correction (or, channel coding).

These two aspects may be studied in combination. Source encoding attempts to compress the data from a source in order to transmit it more efficiently. This practice is found every day on the Internet, where the common Zip data compression is used to reduce the network load and make files smaller. The second, channel encoding, adds extra data bits to make the transmission of data more robust to disturbances present on the transmission channel. The ordinary user may not be aware of many applications using channel coding.

Source Coding

Principle

Entropy of a source is the measure of information. Basically, source codes try to reduce the redundancy present in the source, and represent the source with fewer bits that carry more information. Data compression which explicitly tries to minimize the average length of messages according to a particular assumed probability model is called entropy encoding. Various techniques used by source coding schemes try to achieve the limit of the entropy of the source: C(x) >= H(x), where H(x) is the entropy of the source (bitrate) and C(x) is the bitrate after compression. In particular, no source coding scheme can be better than the entropy of the source.
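The bound C(x) >= H(x) can be observed empirically. The following sketch computes the empirical per-symbol entropy of a randomly generated memoryless source and compares it with the rate achieved by the standard zlib compressor; the alphabet and probabilities are made-up assumptions:

# A sketch of the source-coding bound using only the standard library:
# the compressed rate of a memoryless source cannot beat its entropy.
import math
import random
import zlib
from collections import Counter

random.seed(0)

# A hypothetical memoryless source over 4 symbols with skewed probabilities.
symbols, weights = b"abcd", [0.7, 0.15, 0.1, 0.05]
message = bytes(random.choices(symbols, weights=weights, k=100_000))

# Empirical entropy in bits per symbol: H(x) = -sum p_i * log2(p_i)
counts = Counter(message)
n = len(message)
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

compressed_bits = 8 * len(zlib.compress(message, 9))
print(f"entropy H(x)   : {entropy:.3f} bits/symbol")
print(f"zlib rate C(x) : {compressed_bits / n:.3f} bits/symbol")
# The compressed rate lands slightly above the entropy, never below it.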

Channel Coding

The aim of channel coding theory is to find codes which transmit quickly, contain many valid code words and can correct or at least detect many errors. While not mutually exclusive, performance in these areas is a trade-off, so different codes are optimal for different applications. The needed properties of a code mainly depend on the probability of errors occurring during transmission. In a typical CD, the impairment is mainly dust or scratches, so codes are used in an interleaved manner. Other codes are more appropriate for different applications: deep-space communications are limited by the thermal noise of the receiver, which is more of a continuous nature than a bursty nature; likewise, narrowband modems are limited by the noise present in the telephone network, which is also modeled better as a continuous disturbance.

Linear codes

The term algebraic coding theory denotes the sub-field of coding theory where the properties of codes are expressed in algebraic terms and then further researched. Algebraic coding theory is basically divided into two major types of codes:
1. Linear block codes
2. Convolutional codes.

It mainly analyzes the following three properties of a code:

1. the code word length
2. the total number of valid code words
3. the minimum distance between two valid code words, using mainly the Hamming distance, and sometimes also other distances like the Lee distance.
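The Hamming distance is straightforward to compute. Here is a small sketch that finds the minimum distance of a toy binary code; the four code words are made up for illustration:

# A sketch computing Hamming distances and the minimum distance of a
# small, hypothetical binary code.
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# A toy code: four valid 5-bit code words (illustrative, not a named code).
code = ["00000", "01101", "10110", "11011"]

d_min = min(hamming_distance(a, b) for a, b in combinations(code, 2))
print("minimum distance:", d_min)  # 3 for this toy code

# A code with minimum distance d can detect d-1 errors and
# correct floor((d-1)/2) errors.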

Linear block codes

Linear block codes have the property of linearity, i.e. the sum of any two code words is also a code word, and they are applied to the source bits in blocks, hence the name linear block codes. There are block codes that are not linear, but it is difficult to prove that a code is a good one without this property.
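As a minimal illustration, here is a sketch of a 3-fold repetition code, one of the simplest linear block codes; with the parameters defined below it has n=3, m=1, dmin=3, so it corrects any single bit error per block by majority vote:

# A sketch of the simplest linear block code: a 3-fold repetition code.

def encode(bits):
    """Repeat every source bit three times."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(code_bits):
    """Majority vote over each block of three received bits."""
    out = []
    for i in range(0, len(code_bits), 3):
        block = code_bits[i:i + 3]
        out.append(1 if sum(block) >= 2 else 0)
    return out

message = [1, 0, 1, 1]
sent = encode(message)
received = sent.copy()
received[4] ^= 1            # the channel flips one bit
assert decode(received) == message
print("corrected:", decode(received))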

Linear block codes are summarized by their symbol alphabets (e.g., binary or ternary) and parameters (n, m, dmin), where
1. n is the length of the code word, in symbols,
2. m is the number of source symbols that will be used for encoding at once,
3. dmin is the minimum Hamming distance for the code.

There are many types of linear block codes, such as
1. Cyclic codes (e.g., Hamming codes)
2. Repetition codes
3. Parity codes
4. Polynomial codes (e.g., BCH codes)
5. Reed-Solomon codes
6. Algebraic geometric codes
7. Reed-Muller codes
8. Perfect codes.

Convolutional codes

The idea behind a convolutional code is to make every code word symbol the weighted sum of various input message symbols. This is like the convolution used in LTI systems to find the output of a system when you know the input and the impulse response. So we generally find the output of the convolutional encoder as the convolution of the input bits against the states of the encoder's registers. Fundamentally, convolutional codes do not offer more protection against noise than an equivalent block code. In many cases, however, they offer greater simplicity of implementation than a block code of equal power. The encoder is usually a simple circuit which has state memory and some feedback logic, normally XOR gates. The decoder can be implemented in software or firmware. The Viterbi algorithm is the optimum algorithm used to decode convolutional codes. There are simplifications that reduce the computational load; they rely on searching only the most likely paths. Although not optimum, they have generally been found to give good results in lower-noise environments.

Other applications of coding theory

Another concern of coding theory is designing codes that help synchronization. A code may be designed so that a phase shift can be easily detected and corrected, and so that multiple signals can be sent on the same channel. Another application of codes, used in some mobile phone systems, is code-division multiple access (CDMA). Each phone is assigned a code sequence that is approximately uncorrelated with the codes of other phones. When transmitting, the code word is used to modulate the data bits representing the voice message. At the receiver, a demodulation process is performed to recover the data. The properties of this class of codes allow many users (with different codes) to use the same radio channel at the same time. To the receiver, the signals of other users appear to the demodulator only as low-level noise.

Group Testing

Group testing uses codes in a different way. Consider a large group of items in which very few are different in a particular way (e.g., defective products or infected test subjects). The idea of group testing is to determine which items are "different" by using as few tests as possible. The origin of the problem has its roots in the Second World War, when the United States Army Air Forces needed to test its soldiers for syphilis. It originated from a ground-breaking paper by Robert Dorfman.

Analog coding

Information is encoded analogously in the neural networks of brains, in analog signal processing, and in analog electronics. Aspects of analog coding include analog error correction, analog data compression, and analog encryption.

Neural coding

Neural coding is a neuroscience-related field concerned with how sensory and other information is represented in the brain by networks of neurons. The main goal of studying neural coding is to characterize the relationship between the stimulus and the individual or ensemble neuronal responses, and the relationship among the electrical activities of the neurons in the ensemble. It is thought that neurons can encode both digital and analog information, and that neurons follow the principles of information theory and compress information, and detect and correct errors in the signals that are sent throughout the brain and wider nervous system.

Programming languages

A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely. A programming language is a notation for writing programs, which are specifications of a computation or algorithm. Some, but not all, authors restrict the term "programming language" to those languages that can express all possible algorithms. Programming languages are languages: a means of expressing computations in a form comprehensible to both people and machines. The syntax of a language specifies the means by which various sorts of phrases (expressions, commands, declarations, and so forth) may be combined to form programs.

The term computer language is sometimes used interchangeably with programming language. However, the usage of both terms varies among authors, including the exact scope of each. One usage describes programming languages as a subset of computer languages. In this vein, languages used in computing that have a different goal than expressing computer programs are generically designated computer languages.

Elements

All programming languages have some primitive building blocks for the description of data and the processes or transformations applied to them (like the addition of two numbers or the selection of an item from a collection). These primitives are defined by syntactic and semantic rules which describe their structure and meaning respectively.

Syntax

A programming language's surface form is known as its syntax. Most programming languages are purely textual; they use sequences of text including words, numbers, and punctuation, much like written natural languages. On the other hand, there are some programming languages which are more graphical in nature, using visual relationships between symbols to specify a program.

The syntax of a language describes the possible combinations of symbols that form a syntactically correct program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Since most languages are textual, this article discusses textual syntax.

Semantics

The term semantics refers to the meaning of languages, as opposed to their form (syntax).

Static semantics

The static semantics defines restrictions on the structure of valid texts that are hard or impossible to express in standard syntactic formalisms. For compiled languages, static semantics essentially include those semantic rules that can be checked at compile time. Examples include checking that every identifier is declared before it is used (in languages that require such declarations) or that the labels on the arms of a case statement are distinct.

Dynamic semantics

Once data has been specified, the machine must be instructed to perform operations on the data. For example, the semantics may define the strategy by which expressions are evaluated to values, or the manner in which control structures conditionally execute statements. The dynamic semantics (also known as execution semantics) of a language defines how and when the various constructs of a language should produce a program behaviour.

Type system

A type system defines how a programming language classifies values and expressions into types, how it can manipulate those types and how they interact. The goal of a type system is to verify and usually enforce a certain level of correctness in programs written in that language by detecting certain incorrect operations. Any decidable type system involves a trade-off: while it rejects many incorrect programs, it can also prohibit some correct, albeit unusual, programs.

Standard library and run-time system

Most programming languages have an associated core library (sometimes known as the 'standard library', especially if it is included as part of the published language standard), which is conventionally made available by all implementations of the language. Core libraries typically include definitions for commonly used algorithms, data structures, and mechanisms for input and output.
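To make the syntax/semantics distinction above concrete, the following sketch uses Python's standard ast module: parsing yields the syntactic structure of an arithmetic expression, and a small hand-written evaluator then supplies one possible dynamic semantics for it. The evaluator covers only a few constructs and is illustrative, not a complete interpreter:

# A sketch separating syntax from semantics using Python's ast module.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def evaluate(node):
    """Assign a meaning (a value) to each syntactic construct."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise SyntaxError(f"construct not covered: {type(node).__name__}")

tree = ast.parse("1 + 2 * 3", mode="eval")  # syntax: build the parse tree
print(ast.dump(tree))                       # the form of the program
print(evaluate(tree))                       # its meaning under these rules: 7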

Design and Implementation

Programming languages share properties with natural languages related to their purpose as vehicles for communication, having a syntactic form separate from their semantics, and showing language families of related languages branching one from another. But as artificial constructs, they also differ in fundamental ways from languages that have evolved through usage. A significant difference is that a programming language can be fully described and studied in its entirety, since it has a precise and finite definition.

Specification

The specification of a programming language is intended to provide a definition that the language users and the implementors can use to determine whether the behavior of a program is correct, given its source code. A programming language specification can take several forms, including the following:

An explicit definition of the syntax, static semantics, and execution semantics of the language. While syntax is commonly specified using a formal grammar, semantic definitions may be written in natural language (e.g., as in the C language) or a formal semantics (e.g., as in the Standard ML and Scheme specifications).

A description of the behavior of a translator for the language (e.g., the C++ and Fortran specifications). The syntax and semantics of the language have to be inferred from this description, which may be written in natural or a formal language.

A reference or model implementation,

sometimes written

in

the

language

being

specified (e.g., Prolog or ANSI REXX). The syntax and semantics of the language are explicit in the behavior of the reference implementation. Implementation An implementation of a programming language provides a way to execute that program on one or more configurations of hardware and software. There are, broadly, two approaches to programming language implementation: compilation and interpretation. It is generally possible to implement a language using either technique. The output of a compiler may be executed by hardware or a program called an interpreter. In some implementations that make use of the interpreter approach there is no distinct boundary
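Python itself illustrates this blurred boundary: the built-in compile() function turns source text into bytecode, which the interpreter then executes. A brief sketch:

# A sketch of the blurred compile/interpret boundary in Python: source
# is first compiled to bytecode, which the interpreter then executes.
source = """
def greet(name):
    return "Hello, " + name

print(greet("world"))
"""

bytecode = compile(source, "<example>", "exec")  # compilation step
exec(bytecode)                                   # interpretation step

import dis
dis.dis(bytecode)  # inspect the compiled bytecode instructions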

Taxonomy

There is no overarching classification scheme for programming languages. A given programming language does not usually have a single ancestor language. Languages commonly arise by combining the elements of several predecessor languages with new ideas in circulation at the time. Ideas that originate in one language will diffuse throughout a family of related languages, and then leap suddenly across familial gaps to appear in an entirely different family. The task is further complicated by the fact that languages can be classified along multiple axes. For example, Java is both an object-oriented language (because it encourages object-oriented organization) and a concurrent language (because it contains built-in constructs for running multiple threads in parallel). Python is an object-oriented scripting language.
