Objectives
Document structure is the logical organisation of the information, i.e., the contents. Traditional
documents, such as books, may contain only text and still images. They can be organised linearly.
The logical structure will then be, for example, chapters, sections, subsections and paragraphs.
When documents are exchanged, everything about the document has to be transferred. This
includes:
the contents
the structure
the presentation
Exchanging a document therefore requires some way of describing its structure and presentation
along with its contents.
In contrast to traditional documents, hypertext and hypermedia have as their major property
non-linear links between pieces of information.
SCI2600 Multimedia Systems
12. Hypertext, Hypermedia, WWW
Department of Computer Science
(200001) Slide: 3
A hypertext structure is a graph, consisting of
nodes and edges.
The nodes are the actual information units.
The edges provide links to other information
units.
One can follow the edges (arrows or links) to
navigate through the document. The roots of the
arrows are known as anchors.
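The graph view of hypertext can be sketched in a few lines of Python. This is only an illustration: the node names and the links between them are hypothetical, and the breadth-first walk is one of several reasonable ways to model a reader following links.

```python
from collections import deque

# Hypothetical information units (nodes) and their outgoing links (edges).
links = {
    "home": ["chapter1", "chapter2"],
    "chapter1": ["chapter2", "glossary"],
    "chapter2": ["home"],
    "glossary": [],
}

def reachable(start, links):
    """Return the nodes reachable from `start` by following links, breadth first."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in links[node]:
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

print(reachable("home", links))
```

Because the structure is a graph rather than a sequence, the same node can be reached along several paths, and some nodes (here the glossary) lead nowhere further.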
Hypertext refers to a document containing purely text, or sometimes some images but no
continuous media, with non-linear links, while hypermedia refers to multimedia systems that
include a non-linear structure of information units.
[Figure: diagram with the labels Multimedia, Hypertext and Hypermedia.]
There were a number of hypertext systems before the recent boom of the World Wide Web, e.g.,
Apple’s HyperCard.
When exchanging a document, we need to transfer the contents as well as the structure and the
presentation of the document. To specify the document structure and how it is presented, we need
to put commands into the document. These are known as markups. There are in general two
kinds of markups: a) logical and b) visual.
Logical markups mark the document elements according to their functions and their relations with
other elements, e.g., chapter, section, paragraph. They do not say how the elements look.
The advantages of logical markups are:
The document structure is explicit, thus the organisation of information is clear.
It is easy to maintain a consistent look of the document.
The disadvantage is that it requires a more powerful process to render the document.
Visual markups define how the elements are rendered, for example, a chapter title is formatted in
Helvetica Bold 24 point, while a section heading is formatted in Times Roman Bold 20 point.
With visual markup:
The logical structure is lost.
The visual effects of the elements are explicit.
It is easier to keep the fidelity.
It is easier to render the document.
It is hard to maintain a consistent look of the document.
Examples of logical and visual markups
[Figure: a logical markup example beside a visual markup example; the diagram shows a chapter with the labels title, overview, contents, section1, section2 and section3.]
The two kinds of markups are often used together, mixed inside a single document.
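The difference between the two kinds of markup can be illustrated with a small Python sketch. It is not taken from any real formatter: the element names, the sample text and the paragraph font are assumptions, and only the two heading fonts follow the example given above.

```python
# Logical markup: each element is tagged by its function only.
document = [
    ("chapter-title", "Hypertext"),
    ("section-heading", "Document structure"),
    ("paragraph", "A hypertext structure is a graph."),
]

# Visual rules kept apart from the content. The paragraph font is an assumption.
style = {
    "chapter-title": "Helvetica Bold 24pt",
    "section-heading": "Times Roman Bold 20pt",
    "paragraph": "Times Roman 12pt",
}

def render(document, style):
    """Attach the visual format to every logical element."""
    return [f"[{style[kind]}] {text}" for kind, text in document]

for line in render(document, style):
    print(line)
```

Changing one entry of `style` changes every element of that kind at once, which is why a consistent look is easy to keep with logical markup, at the price of an extra rendering step.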
Since the early 1980s, the need to publish in electronic form has been growing
rapidly. In order to facilitate document exchange, ANSI and then ISO developed standards for
document markups. The ISO standard (ISO 8879:1986) specifies the Standard Generalised Markup
Language (SGML).
Since there are numerous kinds of documents, SGML does not specify a single markup language.
Instead, it specifies how to define markup languages; SGML is therefore a meta-language. As a
result, SGML is a very complex language.
[Figure: diagram with the labels "presentation program" and "formatted document".]
Probably the most popular document markup language nowadays is the HyperText Markup
Language (HTML). It is the primary language of the World Wide Web (WWW). HTML is an
application of SGML.
Tim Berners-Lee and Robert Cailliau both worked at CERN, an international high energy physics
research centre near Geneva. In 1989 they collaborated on ideas for a linked information system that would
be accessible across the wide range of different computer systems in use at CERN. At that time many people
were using TeX and Postscript for their documents. A few were using SGML. Tim realized that something
simpler was needed that would cope with dumb terminals through high end graphical X Windows
workstations. HTML was conceived as a very simple solution, and matched with a very simple network
protocol HTTP.
CERN launched the Web in 1991 along with a mailing list called www-talk. Other people thinking
along the same lines soon joined and helped to grow the web by setting up Web sites and implementing
browsers, such as Cello, Viola, and MidasWWW. The breakthrough came when the National Center for
Supercomputing Applications (NCSA) at Urbana-Champaign encouraged Marc Andreessen and Eric Bina to
develop the X Windows Mosaic browser. It was later ported to PCs and Macs and became a run-away success
story. The Web grew exponentially, eclipsing other Internet based information systems such as WAIS,
Hytelnet, Gopher, and UseNet.
Excerpt from http://www.w3c.org/MarkUp/#historical
An HTML document is enclosed by a pair of tags <HTML> and </HTML>. It should contain at
least two parts: the head and the body.
<HTML>
<HEAD>
<META NAME="Author"
CONTENT="Wai Wong">
<TITLE>simple document</TITLE>
</HEAD>
<BODY>
<H1>A sample HTML document</H1>
<HR WIDTH="100%">
<P>Some text.
</P>
<HR WIDTH="50%">
<ADDRESS>
Copyright 2000 by Wai Wong</ADDRESS>
</BODY>
</HTML>
The elements in the <HEAD> part are used for defining information about the document.
They will not be displayed by a browser. They may be used by the server to search
for information, such as keywords and descriptions.
The elements in the <BODY> part will be shown by a browser. The elements are marked
by tags, <XXXX>. There are two kinds of tags:
opening tags that require a matching closing tag, e.g., <H1> and </H1>.
tags without matching closing tags, e.g., <BR>.
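The two kinds of tags can be observed by running the sample document through a parser. The sketch below uses `html.parser` from the Python standard library; the `TagCollector` class and the variable names are made up for this illustration, and it simply records which tags were opened and which were explicitly closed.

```python
from html.parser import HTMLParser

SAMPLE = """<HTML>
<HEAD>
<META NAME="Author" CONTENT="Wai Wong">
<TITLE>simple document</TITLE>
</HEAD>
<BODY>
<H1>A sample HTML document</H1>
<HR WIDTH="100%">
<P>Some text.
</P>
<HR WIDTH="50%">
<ADDRESS>
Copyright 2000 by Wai Wong</ADDRESS>
</BODY>
</HTML>"""

class TagCollector(HTMLParser):
    """Record which tags were opened and which were explicitly closed."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)  # html.parser reports tag names in lower case

    def handle_endtag(self, tag):
        self.closed.append(tag)

collector = TagCollector()
collector.feed(SAMPLE)
collector.close()

# Tags that appear in the sample without any matching closing tag.
unclosed = sorted(set(collector.opened) - set(collector.closed))
print(unclosed)
```

In this sample the tags left without a closing tag are the ones that carry no content of their own, which matches the second kind of tag described above.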
There are many elements that can appear in the <BODY> section of an HTML document. The DTD
of HTML 4.01 defines the block-level content of the BODY element through the following entity:
<!ENTITY % block
"P | %heading; | %list; | %preformatted; | DL | DIV | NOSCRIPT |
BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">
The above fragment of the DTD defines the elements that can appear directly in the <BODY> part, i.e.,
they are not enclosed by other elements. A brief description of these elements is given in the
list below.
<P> Paragraph. Although the end tag is optional, the standard encourages authors to use the
proper end tag.
<H1> Heading. There are 6 levels, i.e., <H1> to <H6>.
<UL> Unordered list. A list of items, each starting with a bullet. Each item is marked by
<LI> ... </LI>.
<OL> Ordered list. A list of numbered items. Each list must have one or more items.
<DL> Definition list. Each item of a definition list consists of two parts: a term, which is
marked by <DT>, and a description, which is marked by <DD>.
<PRE> Pre-formatted text. In handling pre-formatted text, visual user agents may leave
white space intact, may render text with a fixed-pitch font, and may disable automatic
word-wrap.
<DIV> Grouping element. For marking text as a block-level element without any
formatting information defined. It is expected that this element is used with the id
and class attributes, together with style sheets, to define formatting information.
This gives authors a means of extending the HTML language.
<BLOCKQUOTE> Block quotation. Usually, visual user agents will format this element as an indented
block.
<FORM> Form. It acts as a container for form controls. In addition to form controls, it can
contain other text and markup.
<HR> Horizontal rule.
<TABLE> Table. Tables are divided into rows of data cells. Each row is marked by <TR> and
each data cell is marked by <TD>.
<ADDRESS> Address. For authors to supply contact information for a document.
<SCRIPT> Script. To place a piece of script in a document.
<NOSCRIPT> This element allows authors to provide alternate content when a script is not executed.
Font style <TT> <I> <B> Change the font style to typewriter, italic and bold.
Font size <BIG> <SMALL> Change the font size to large or small.
There are more elements that are allowed in the <BODY> of an HTML document. See the HTML
4.01 specification for more details.
How do web pages arrive at your desktop? The browsers and the servers communicate using the
HTTP protocol.
the screen.
4. The browser will send out more requests, probably to a different server, if the
received document includes linked elements.
5.1 The Hypertext Transfer Protocol (HTTP)
HTTP is the basis for the World Wide Web. It is a very simple protocol. There are two types of
messages in HTTP:
Requests: the client (user agent) sends this type of message to the server. The format
of a request message is:
request-line
headers (0 or more)
<blank line>
body (only for a POST request)
The request-line has three parts: request, request-URI and HTTP-version. In HTTP/1.0 there are
only three different requests: GET, HEAD and POST. For example, a simple request is:
GET / HTTP/1.0
Responses: the server responds to the client’s request by sending back this type of
message. The format of a response message is:
status-line
headers (0 or more)
<blank line>
body
The status-line also has three parts: HTTP-version, response code and response
phrase. This tells whether the request was successful. The body contains the data
requested. A sample status line is:
HTTP/1.0 200 OK
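The message formats above are simple enough to handle with plain string operations. The following Python sketch is an illustration only: the function names are hypothetical, no network connection is made, and error handling is kept to a minimum.

```python
def parse_request_line(line):
    """Split a request-line into request, request-URI and HTTP-version."""
    method, uri, version = line.split()
    if method not in ("GET", "HEAD", "POST"):
        raise ValueError(f"unsupported request: {method}")
    return method, uri, version

def parse_status_line(line):
    """Split a status-line into HTTP-version, response code and response phrase."""
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase

def build_request(method, uri, headers=(), body=""):
    """Assemble request-line, headers, blank line and optional body."""
    lines = [f"{method} {uri} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n" + body

print(parse_request_line("GET / HTTP/1.0"))
print(parse_status_line("HTTP/1.0 200 OK"))
print(repr(build_request("GET", "/")))
```

The response phrase may itself contain spaces, which is why the status line is split at most twice.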
It provides a uniform, graphical user interface to many Internet services, such as mail, news,
ftp, archie, and so on.
It provides a universal way of finding information.
It provides a flexible way to organise and link information.
The Uniform Resource Locator (URL) is the common address for finding information on the
Internet. It is specified in RFC 1738, Dec. 1994.
http://www.comp.hkbu.edu.hk/~comp3600/index.html
scheme: http
host: www.comp.hkbu.edu.hk
path/data name: ~comp3600/index.html
The general syntax is:
<scheme>:<scheme-specific-part>
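The parts of the example URL can be extracted with `urlsplit` from the Python standard library. This is a small illustration rather than a full treatment of RFC 1738; note that the library reports the path with its leading slash.

```python
from urllib.parse import urlsplit

# The example URL from the text, split into its components.
parts = urlsplit("http://www.comp.hkbu.edu.hk/~comp3600/index.html")

print(parts.scheme)  # the scheme
print(parts.netloc)  # the host
print(parts.path)    # the path and data name
```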
URLs only specify the location of a resource. Since resources may often be moved and changed, a
name can be associated with a resource. Such names are known as Uniform Resource Names (URNs),
which are intended to serve as persistent, location-independent resource identifiers. All URNs
have the following syntax:
<URN> ::= urn:<NID>:<NSS>
where <NID> is the Namespace Identifier, and <NSS> is the Namespace Specific String. The
leading “urn:” sequence is case-insensitive. The Namespace ID determines the syntactic
interpretation of the Namespace Specific String.
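The URN syntax can be taken apart in the same spirit. The sketch below is a simplification: `parse_urn` is a hypothetical helper, the sample URNs are for illustration only, and no namespace-specific checking of the NSS is attempted.

```python
def parse_urn(urn):
    """Split a URN into (NID, NSS); the leading "urn:" is matched case-insensitively."""
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn":
        raise ValueError(f"not a URN: {urn}")
    return nid, nss

print(parse_urn("urn:isbn:0451450523"))
print(parse_urn("URN:example:a:b"))
```

Splitting at most twice keeps any further colons inside the Namespace Specific String, whose interpretation is left to the namespace.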
URLs and URNs are collectively known as Uniform Resource Identifiers (URIs). Therefore, in the
specification of HTML, the links and anchors are called URIs.
An HTML document may include or link to many different kinds of information, e.g., images,
sounds, animations, ... Browsers recognise them using a scheme called MIME types. MIME
stands for Multipurpose Internet Mail Extensions.
1. Each document may belong to a type specified by the MIME types scheme.
2. The type is recognised by the browser based on the file name extension.
3. The types are specified in a file .mime.types (in UNIX and similar systems).
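The mapping from file name extension to MIME type can be tried out with the `mimetypes` module of the Python standard library, which keeps a table of this kind. This is only an illustration of the idea, not of how any particular browser works; the file names other than images/pattern.gif are hypothetical.

```python
import mimetypes

# Guess the MIME type of each file from its name extension.
for name in ("images/pattern.gif", "index.html", "photo.jpeg"):
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)
```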
Of the numerous types of media, popular browsers can handle only a small number internally;
for example, usually only GIF and JPEG images are handled by the browser itself.
<IMG SRC="images/pattern.gif"
ALT="COMP3600 home" HSPACE="10" VSPACE="10"
HEIGHT="400" WIDTH="600">
Plug-ins are extensions of the browsers. The media handled by the plug-in will be shown within
the browser window. When a document whose type is handled by a plug-in is received, the
plug-in is automatically started by the browser.
Helper applications are separate programs, each executed as a separate process. A helper
application usually creates and displays the media in a separate window, but some can
display the media within the browser window. The browser uses a file .mailcap to
configure which helper handles which type of media.
Since the web is becoming more and more popular, more and more hypermedia documents are
delivered on the web. Because many hypermedia documents are very large, downloading the
whole document before playing it takes a long time.
Streaming is a method of showing multimedia elements on the web without a long initial delay.
Shockwave comes in three flavours: Shockwave for Authorware, Shockwave for Director, and
Shockwave for Flash.
1. Use the application, e.g., Authorware, to create the piece, paying special attention to the
requirements of the Web. (Refer to the Multimedia Applications Handbook for more details on
creating shocked pieces.)
2. Use the tool provided by the application, e.g., Authorware Afterburner, to create the
‘shocked’ piece.
3. Embed the piece into an HTML document using the following tag (in Netscape).
XML stands for Extensible Markup Language. It is a universal format for structured documents
and data on the Web. XML 1.0 became a W3C Recommendation in February 1998. It is still
in active development. Since the release of XML 1.0, a number of related specifications have been
released and more are being developed.
XML is a restricted form of SGML. The major goal of XML is to allow authors to put structured
information into a document. The visual information for rendering the document is usually
specified in a separate object. Cascading Style Sheets (CSS) or the Extensible Stylesheet
Language (XSL) are often used for this purpose.
XML is more powerful because it is extensible. It predefines no tags of its own; instead, it allows
authors or developers to define their own tags. However, when defining new tags, one has to
follow certain rules. A document following these rules is known as a well-formed document. The
rules are:
6. Attribute values must be quoted, for example, it is wrong to write <hr width=50%/>.
7. The characters < and & may only be used to start tags and entity references, respectively.
8. The only entity references which appear are &amp;, &lt;, &gt;, &apos; and &quot;.
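Well-formedness can be checked mechanically, because an XML parser must reject a document that breaks the rules. The sketch below relies on `xml.etree.ElementTree` from the Python standard library; `is_well_formed` is a hypothetical helper, and the test strings are small examples in the spirit of rules 6 and 7.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed('<hr width="50%"/>'))  # quoted attribute value
print(is_well_formed('<hr width=50%/>'))    # unquoted attribute value
print(is_well_formed('<p>AT&amp;T</p>'))    # & written as an entity reference
print(is_well_formed('<p>AT&T</p>'))        # bare & in text
```

A check of this kind says nothing about validity, which additionally requires the document to satisfy a DTD.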
The way in which one extends XML is to create a Document Type Definition (DTD). The DTD
defines a list of elements, attributes, notations and entities that are allowed in an XML document.
The DTD can be internal, which means it is embedded within the XML document, or external,
which means it is in a separate file.
An XML document is valid if it satisfies all the constraints specified by the DTD.
SMIL stands for Synchronised Multimedia Integration Language. It is a new markup language that
became a W3C recommendation on 15 June 1998.
There are a few players that can play SMIL documents. RealPlayer G2 is one of the best.
Capable of handling many types of media, such as sound, image, video, text, and so on.
The author can specify where the elements are presented on the SMIL presentation pane.
The author can specify when the elements are presented.
It allows choice from alternative elements to be made according to runtime system status.
The <layout> element is the only mandatory element in the <head> part. It determines how
the elements in the document’s body are positioned on an abstract rendering surface. It may
contain the <root-layout> and the <region> elements.
The <root-layout> element specifies the root visual presentation space for the entire SMIL
file. The <region> element specifies an abstract presentation space that may have a name and
become the target of some media elements, i.e., you can specify that a certain element is to be
displayed in a certain region.
<layout>
<root-layout height="425" width="450" background-color="black"/>
<region id="title" left="50" top="150" width="350" height="200"/>
<region id="full" left="0" top="0" height="425" width="450"
background-color="#602030"/>
<region id="video" left="200" top="200" height="180" width="240"
z-index="1"/>
</layout>
[Figure: the resulting layout, with the labels root, full, title and video.]
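Since a SMIL layout is ordinary XML, it can be read with a standard XML parser. The Python sketch below parses the layout example from above and checks that every region lies inside the root layout. The check itself is an assumption made for illustration (the text does not say that regions must fit), and the helper name `fits` is hypothetical.

```python
import xml.etree.ElementTree as ET

LAYOUT = """<layout>
  <root-layout height="425" width="450" background-color="black"/>
  <region id="title" left="50" top="150" width="350" height="200"/>
  <region id="full" left="0" top="0" height="425" width="450"
          background-color="#602030"/>
  <region id="video" left="200" top="200" height="180" width="240"
          z-index="1"/>
</layout>"""

layout = ET.fromstring(LAYOUT)
root = layout.find("root-layout")
root_w, root_h = int(root.get("width")), int(root.get("height"))

def fits(region):
    """True if the region's right and bottom edges stay within the root layout."""
    right = int(region.get("left")) + int(region.get("width"))
    bottom = int(region.get("top")) + int(region.get("height"))
    return right <= root_w and bottom <= root_h

result = {r.get("id"): fits(r) for r in layout.findall("region")}
print(result)
```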
<seq> The children of this element form a temporal sequence, i.e., they will be presented
one-by-one.
<seq>
<video src="slide_narration_video1.rm" region="video"/>
<audio src="slide_narration_audio1.ra"/>
<video src="slide_narration_video2.rm" region="video"/>
</seq>
<par> The children of this element can overlap in time. The textual order of appearance of
children in a par has no significance for the timing of their presentation.
<par>
<audio src="map_narration.ra"/>
<img src="map.rp" region="full" fill="freeze" dur="20s"/>
</par>
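The difference between <seq> and <par> can be made concrete by computing when each child would start and end. This Python sketch is a simplified model, not a SMIL player: the function names are hypothetical, the durations (in seconds) are invented, and attributes such as begin, end and fill are ignored.

```python
def schedule_seq(durations, start=0):
    """Children play one after another: each starts when the previous one ends."""
    times = []
    t = start
    for d in durations:
        times.append((t, t + d))
        t += d
    return times

def schedule_par(durations, start=0):
    """Children all start together and may overlap in time."""
    return [(start, start + d) for d in durations]

durations = [30, 10, 20]  # hypothetical durations of three media objects
print(schedule_seq(durations))
print(schedule_par(durations))
```

In the sequential case the order of the children matters; in the parallel case reordering the list changes nothing about the timing.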
The other elements that are allowed in the <body> part are the media object elements. They allow
the inclusion of media objects into a SMIL presentation. Media objects are included by reference
(using a URI).
Summary
Document model, document structure
Hypertext and hypermedia
Document markups
HTML documents
Multimedia on the World Wide Web
New markup languages: XML, SMIL