
The University of Queensland

INFS1300 The Web from the Inside Out - from Geeks to
Google & Facebook
Compiled Notes
Semester 2, 2014
(Part 3)

22: HTML

Hypertext Markup Language (HTML) is the main language by
which information is displayed in browsers. Development began
in 1990; HTML 4 was standardized in 1997.

The purpose of a browser is to read HTML documents and turn

them into visible/audible web pages

HTML is written using elements called tags, usually written
in angle brackets (e.g. <p> and </p>) and in pairs (though not
always)

HTML is all about connecting things. In the early days most

things were other text documents.

Problems with HTML:
o No easily shareable user-defined tags
o Describes only data format, and limited content types
o Lack of compatibility with popular browsers
o Too many incorrect HTML files (wrong HTML grammar)
XHTML reformulates HTML 4.01 as an XML application, bringing
XML's extensibility and strict grammar to HTML
CSS: Cascading Style Sheets
o Offer web designers two key advantages in managing
complex websites
Separation of content and design. CSS gives site
developers the best of both worlds:
Content markup that reflects the logical

structure of the information, and

The freedom to specify exactly how each

HTML tag will look

Efficient Control over large document sets. Allows
site designers to control the graphic look and
feel of thousands of pages by modifying a single

master style sheet document

o Style sheets provide greater typographic control with

less code
Postel's Law
o The Robustness Principle: be conservative in what you do; be
liberal in what you accept from others
The ability to connect things on the internet is still the
primary feature of HTML5

23: XML

eXtensible Markup Language

Data format and data content
User-defined tags
Has its own grammar
Describe structured and unstructured data
o Structured: database, table
o Unstructured: webpage, e-commerce document, etc.
Format for sharing data
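
To make the points above concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The <order>, <customer> and <item> tags are made up for this example (they are user-defined, which is the point), and the helper names are hypothetical. The second function shows XML's strict grammar: a document with mismatched tags is rejected rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document: the tag and attribute names are invented
# for illustration and do not come from any standard vocabulary.
ORDER_XML = """
<order id="42">
  <customer>Alice</customer>
  <item sku="B-17" quantity="2">Notebook</item>
  <item sku="P-03" quantity="5">Pencil</item>
</order>
"""


def summarize_order(xml_text):
    """Read the data out of the document by tag name."""
    root = ET.fromstring(xml_text.strip())
    items = [(item.text, int(item.get("quantity"))) for item in root.findall("item")]
    return {
        "order_id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": items,
    }


def is_well_formed(xml_text):
    """Return True if the text obeys XML grammar (every tag properly closed)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


summary = summarize_order(ORDER_XML)
```

Note that nothing here says how the order should look on screen; the document only states what the data is.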

24: XML and HTML Different Goals

XML was designed to transport and store data, with focus on

what data is
HTML was designed to display data, with initial focus on how
data looks

25: What is METADATA?

Metadata by definition:

"A set of data that describes and gives information about other
data" (Oxford English Dictionary)

26: What are Networks?

Networks by definition:
1. A collection of individual or atomic entities and their links
2. A concrete (measurable) pattern of relationships among
entities in a given space
Network components:

Atomic entities: nodes or vertices (the dots or points)

Collection of links or edges between vertices (the lines)
o Links can represent any pairwise relationship
o Links can be directed or undirected
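
One common way to store nodes and links in a program is an adjacency map. The sketch below is only an illustration: the function name build_network and the example names are assumptions, not something from the lecture.

```python
from collections import defaultdict


def build_network(edges, directed=False):
    """Return an adjacency map: node -> set of nodes it links to."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b]  # touch b so it exists as a node even without outgoing links
        if not directed:
            adjacency[b].add(a)  # an undirected link works in both directions
    return dict(adjacency)


# Hypothetical example data: the same two links read in two different ways.
links = [("Ann", "Bob"), ("Bob", "Cat")]
friendship = build_network(links)              # undirected: "is a friend of"
follows = build_network(links, directed=True)  # directed: "follows"
```

In the undirected version Bob is linked to both Ann and Cat; in the directed version Cat has no outgoing links at all.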

Networks Study:
In principle, we are interested in properties that are invariant:

Structural properties
Statistical properties of families of networks

Some Network claims:

Networks create social capital for individuals and communities

Networks create status and category difference in markets
Network forms of organization are an alternative to markets

and hierarchies
Networks are the defining feature of innovative regions such

as Silicon Valley
Networks create trust and increase tolerance
Networks inspire conformity in thought and action
Networks shape the diffusion of technologies and

organizational practices
Networks create individual tastes and preferences
Social networks among individuals: friendship, advice-seeking,

romantic connections, acquaintanceship

Formal, contractual relationships among organizations:
strategic alliances, buyer-supplier contracts, joint ventures,

Informal, inter-organizational relationships flow through
people: director interlocks, employee mobility, social
networks that cross organizational boundaries

Network key mechanisms:

1. Resource and information channels (Network pipes)
2. Status signaling and certification (Network prisms)
3. Social influence (Network peeps)
27: Undirected vs. Directed Networks
Undirected Networks

Edges have no direction

Paths: A sequence of nodes with the property that each
consecutive pair is joined by an edge
o Cycle: a path that starts and ends in the same node
o Cyclic vs. Acyclic

Directed Networks

Edges have a direction

Paths or cycles must respect the directionality of the edges

28: Connectivity (re. networks)

An undirected network is connected if there is a path between

every pair of nodes in the network

o That is, the network consists of a single component
o Otherwise the network has multiple components
A directed graph is strongly connected if for every two nodes
A and B, there is a path from A to B and a path from B to A
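
Both definitions can be checked with a breadth-first search. This is a minimal sketch that assumes the network is given as a dictionary mapping every node to the set of nodes it links to (every node must appear as a key); the function names are hypothetical.

```python
from collections import deque


def reachable(adjacency, start):
    """Return all nodes that can be reached from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def components(adjacency):
    """Components of an undirected network (links stored in both directions)."""
    remaining = set(adjacency)
    found = []
    while remaining:
        group = reachable(adjacency, next(iter(remaining)))
        found.append(group)
        remaining -= group
    return found


def is_strongly_connected(adjacency):
    """Directed network: every node must reach every other node."""
    nodes = set(adjacency)
    return all(reachable(adjacency, node) == nodes for node in nodes)


# Undirected example with an isolated node C: two components, so not connected.
undirected = {"A": {"B"}, "B": {"A"}, "C": set()}
component_count = len(components(undirected))

# Directed examples: a one-way link is not enough for strong connectivity.
one_way = {"A": {"B"}, "B": set()}
both_ways = {"A": {"B"}, "B": {"A"}}
```

An undirected network is connected exactly when components returns a single group. The strong-connectivity check repeats the search from every node, which is simple but slow for large networks.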

29: Social Network Analysis

Social Networks
o A finite set of social entities and the relations defined on
them
o Focus on relational information rather than attributes of
the social entities

o If entity A likes B and B likes C, then also A likes C
o If entity A and B like each other, they should be similar
in the evaluation of some C
o If entity A and B dislike each other, they should evaluate

C differently
Social ties
o Example: evaluation of one person by another,
association, behavioral interaction, formal relations


Emphasizes a tie between two social entities

An inherent property between two social entities
Analysis focuses on dyadic properties
Example: reciprocity, trust
A subgroup of three social entities and possible ties

between them
o Always in undirected networks, i.e. considering by
default mutual relations (e.g. colleagues)
o Centrality: The core of a group, an authoritative figure,

o Always in directed networks, i.e. considering the
particular direction of the relations (e.g. likes, dislikes,

o Prestige: The cool guy in the group, the most respected

member of a society
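
The notes do not say which measures the course used for centrality, prestige or reciprocity. As a rough sketch, the simplest choices are assumed here: number of ties for centrality in an undirected network, number of incoming ties for prestige in a directed network, and the share of ties that are returned for reciprocity. Function names and example data are hypothetical.

```python
def degree_centrality(ties):
    """Undirected ties (a, b): count how many ties each entity has."""
    counts = {}
    for a, b in ties:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


def indegree_prestige(nominations):
    """Directed ties (source, target): count how often each entity is chosen."""
    counts = {}
    for source, target in nominations:
        counts.setdefault(source, 0)  # entities nobody chooses still get a score
        counts[target] = counts.get(target, 0) + 1
    return counts


def reciprocity(nominations):
    """Share of directed ties that are returned by the other entity."""
    ties = set(nominations)
    if not ties:
        return 0.0
    mutual = sum(1 for a, b in ties if (b, a) in ties)
    return mutual / len(ties)


# Hypothetical data: mutual "colleague" relations and directed "likes" relations.
colleagues = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan")]
likes = [("Bob", "Ann"), ("Cat", "Ann"), ("Ann", "Bob")]

centrality = degree_centrality(colleagues)
prestige = indegree_prestige(likes)
returned_share = reciprocity(likes)
```

Ann is the most central colleague (three ties) and also the most prestigious entity in the "likes" network (chosen twice), while two of the three "likes" ties are reciprocated.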
30: P2P Networks
Two types of network architectures
1. Client/Server
a. Advantages:
i. Well-known
ii. Powerful
iii. Easy to maintain
iv. Fast routing: client sends to server and vice versa
b. Limitations:
i. Scalability (more connections require more
ii. Single point of failure
iii. Administration
2. Peer to Peer (P2P)
a. Sharing resources by direct exchange
b. Advantages:
i. No single point of failure
ii. Efficient use of resources
iii. Scalability
1. Give-and-take model
iv. Reliability
1. Information duplication (replication)
2. Geographic distribution
v. Ease of administration
1. Self-organizing nodes
2. Built-in fault tolerance and load balancing
c. Characteristics
i. Clients are also servers
1. Each node may contribute physical
resources (memory, disk space)
ii. Nodes are autonomous
1. No administrative authority YAY!
iii. Network is dynamic
1. Nodes join and leave at their discretion
iv. Direct collaboration: P2P
d. P2P applications/examples
i. File sharing
1. BitTorrent
ii. Multiplayer Games
1. CounterStrike
iii. Collaborative applications
1. Skype
iv. Distributed computation

1. SETI@home, BOINC project list

e. P2P file sharing
i. Typical workflow:
1. A makes files on his/her computer available
to others
2. B connects to the network, searches for files
and downloads them directly from A
ii. You can clearly see where the huge (illegal) issue
1. E.g. Napster was a free music downloading
program that was shut down over copyright
infringement; this was possible because it was a
hybrid of a central server and P2P
31: Small World Problem

Starting with any two people in the world, what is the

likelihood that they will know each other?
o Studied extensively by Stanley Milgram, a social
psychologist at Yale and Harvard University

How do Small Worlds form?

Preferential attachment
o The rich get richer
o A new individual has a high probability of connecting to an
individual that already has a large number of connections
E.g. a new individual linking to well-known
individuals in a social network, or a new website
linking to more established ones
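
The "rich get richer" effect can be seen in a small simulation. This is a simplified sketch, not a model taken from the lecture: it assumes each new node creates exactly one link, and that the chance of picking an existing node is proportional to the number of links it already has. The function name grow_network is hypothetical.

```python
import random


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time (n_nodes must be at least 2).

    Returns a dict mapping each node to its number of links (its degree).
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}  # start with two nodes joined by one link
    endpoints = [0, 1]     # every node appears once per link it has
    for new_node in range(2, n_nodes):
        # Picking a random link endpoint selects a node with probability
        # proportional to its degree: well-connected nodes are picked more often.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree[new_node] = 1
        endpoints.extend([target, new_node])
    return degree


degree = grow_network(1000)
best_connected = max(degree.values())
typical = sorted(degree.values())[len(degree) // 2]  # median degree
```

Comparing best_connected with typical shows how unevenly the links end up distributed; the exact numbers depend on the random seed.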

Assortative mixing (or Homophily)
o Selective linking of individuals to other individuals who
share some common interests/properties
o E.g. degree correlation: high-degree nodes in a network
associate preferentially with other high-degree nodes
o E.g. social networks: nodes of a certain type tend to

associate with the same type of nodes (e.g. by race)

Disassortative mixing
o Selective linking of individuals to other individuals who
are different in some interests/properties

o E.g. the Web: low-degree nodes tend to associate with

high-degree nodes
32: Censorship and Privacy

o Search engines censor results in:

o 2006 AOL search data
20 million search keywords
650,000 users
3-month period
o Released publicly for research purposes
o Outcome: resignation of AOL's CTO a month after their
o WHY?
Data in principle was anonymised; however, the
New York Times showed that the data was
enough to actually locate individuals!

33: Advertising on the Web

Web 1.0 Advertising

Advertisers paid per impression and ads were not targeted

In 1998 ads became targeted

The charging model also changed:

Past: CPI (cost per impression)

o Typically very low
Now: CPC (cost per click)
o Much higher!

Google AdWords

Google launched AdWords in 2000, on a cost per 1000

impression basis
o No keyword bidding
o Standard banner ads
o Unobtrusive text ads based on search terms displayed
next to the search results

In 2002: cost-per-click model

o Combined with a unique relevance score to prevent
advertisers with larger budgets but irrelevant ads from
hijacking all attention
o Google's decision to factor click-through rate into an
advertiser's ranking forced an economy of relevance
into the pay-per-click model
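
The notes do not give the actual relevance formula, so the following sketch assumes the simplest possible version: ads are ordered by bid multiplied by click-through rate. The function name, field names and numbers are all made up for illustration.

```python
def rank_ads(ads):
    """Order ads by bid x click-through rate (an assumed, simplified score)."""
    return sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)


# Hypothetical advertisers: one bids high but is rarely clicked,
# the other bids low but is clicked eight times as often.
ads = [
    {"name": "big_budget", "bid": 4.00, "ctr": 0.01},  # score 0.04
    {"name": "relevant", "bid": 1.00, "ctr": 0.08},    # score 0.08
]
ranking = [ad["name"] for ad in rank_ads(ads)]
```

Under this assumption the relevant ad ranks first even though its bid is only a quarter of the other one, which is the "economy of relevance" idea described above.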

How search engine advertising works

User enters a search term

Advertiser determines a number of keywords and associates

an advertisement
Google matches the search terms with the keywords and

displays the ads

When a user clicks the ad, the advertiser pays a certain
amount based on their bid

How does a search engine decide how much to charge?

Generalized second price auction
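
In a generalized second price auction, the advertiser in each ad slot pays the bid of the advertiser ranked directly below, not their own bid. The sketch below is a simplified illustration that ranks by bid alone; the function name, the reserve_price parameter and the example bids are assumptions, and real systems may weight bids by relevance as well.

```python
def second_price_auction(bids, slots, reserve_price=0.0):
    """Assign ad slots by bid; each winner pays the next-highest bid per click.

    bids: dict mapping advertiser name -> bid per click
    Returns a list of (advertiser, price_per_click), best slot first.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    results = []
    for position in range(min(slots, len(ranked))):
        advertiser = ranked[position][0]
        if position + 1 < len(ranked):
            price = ranked[position + 1][1]  # pay the bid of the next advertiser
        else:
            price = reserve_price            # nobody below: pay the minimum price
        results.append((advertiser, price))
    return results


# Hypothetical bids in dollars per click, with two ad slots on the page.
outcome = second_price_auction({"A": 3.0, "B": 2.0, "C": 1.0}, slots=2)
```

Here A wins the top slot but pays B's bid of 2.0 per click, and B wins the second slot and pays C's bid of 1.0. C gets no slot and pays nothing.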

34: Organic Search: Search Engine Optimization

Insert picture/pyramid from slides here!
Building accessible sites

Link Architecture
o An ideal site architecture makes content access simple
for both search engines and users
o URL friendliness (single domain, shallow folder structure
with relevant words, keywords in page name, separated
by hyphen)
o Tag recommendations
o Anchor text
o Image alt descriptions (obtain keywords to get more

Keyword research and targeting

Keywords are everything! Include them in all you can

Link building

Manual link submissions/requests

o Very Low scalability (1/5) as requesting links is a long
tedious process
o Very high link quality (5/5) as if carefully chosen, source

quality can be exceptional

Competitive link research/acquisition
o Low scalability (2/5) as researching and mimicking
another's link profile is tedious (depending on the
strategies employed)
o Moderate link quality (3/5) as there's no reliable way to
know which links are worth chasing vs. not passing

Partnerships, exchanges and trades
o Moderate scalability (3/5) as depending on the focus,
the ability to incent more linkers using the same tactics
can be reasonably simple
o Moderate link quality (3/5) as the quality of the links is
highly variable depending on the tactic (reciprocal links
tend to be easy to spot and discount) and the effort

employed by the link builders

Paid links
o High scalability (4/5) as link buying is highly scalable,
limited only by the available funds and costs
o Very low link quality (1/5) as quality can be high, but
even in these cases it's often temporary and many paid
link sources are devalued or even penalize/harm the
sites they link to

35: Folksonomies
36: Four Demands for a Smarter Search
1. Find information faster
a. Provide Search assistants
2. Reveal hidden information
a. Enrich the search index with background knowledge
3. Find more specific information
a. Query the Semantic Web
4. Find linked information

a. Integrate data sources (Linked Data Web)

37: Big Data vs. Smart Data
Big Data is properly defined as data sets whose size is beyond the
ability of commonly used software tools to capture, manage, and
process the data within a tolerable elapsed time
38: Performance Goals vs. Learning Challenges
Performance Goals

E.g. Getting an A in French

Learning Challenges

E.g. Learning to speak French