
What is a web browser?

2003-06-30: When you sit down and look at web pages, you are using a web browser.
This is the piece of software that communicates with web servers for you via the HTTP
protocol, translates HTML pages and image data into a nicely formatted on-screen
display, and presents this information to your eyeballs -- or to your other senses, in the
case of browsers for the vision-impaired and other alternative interface technologies. Web
browsers also appear in simpler devices such as Internet-connected cell phones, like
many Nokia models, and PDAs (Personal Digital Assistants) such as the Palm Pilot.
The most common web browser, by a large margin, is Microsoft Internet Explorer,
followed by the open-source Firefox browser and its relatives, including Netscape 6.0 and
later. Apple's Safari browser is now the standard on Macintoshes (although Firefox is also
a fine choice on the Mac), and the Opera shareware browser has a loyal following among
those who are willing to pay for the fastest browser possible, especially on older
computers. The Lynx browser is the most frequently used text-only browser and has been
adapted to serve the needs of the vision-impaired.
What is a web server?

2003-09-04: web servers are the computers that actually run websites. The term "web
server" also refers to the piece of software that runs on those computers, accepting HTTP
connections from web browsers and delivering web pages and other files to them, as well
as processing form submissions. The most common web server software is Apache,
followed by Microsoft Internet Information Server. Many, many other web server
programs also exist. For more information about web servers and how to arrange hosting
for your own web pages, see the creating websites section.
What is a home page?

2003-01-18: the "home page" of a website is the page that is displayed if you simply type
in the fully qualified domain name of the site in the address bar of your browser and press
enter. For instance, when you type in www.cnn.com and press enter in the address bar,
you go to CNN's home page. "Home page" can also refer to a page that serves as the table
of contents and logical starting point for any collection of web pages, such as the personal
web pages of an individual, even if it is not actually the top-level home page for the
domain name. The term is also sometimes written as one word: "homepage."
What are HTML and XHTML?

2003-09-04: XHTML, which stands for Extensible HyperText Markup Language, is a
simple markup language used to make web pages.
Although all modern word processors and many specialized tools can be used to make
web pages without learning XHTML at all, learning XHTML itself is a useful way to
learn more about the web and provides more control over the results. Luckily, XHTML is
very simple and quite easy to learn.
What's this XHTML stuff? What happened to HTML?
XHTML is the latest generation of HTML. HTML was originally intended to be an
instance of SGML, a general-purpose markup language. But many HTML pages do not
comply with the requirements of SGML, which makes HTML tougher for computers to
work with in useful ways.
In more recent years, the World Wide Web Consortium has taken steps to correct the
problem. SGML has been largely replaced by XML (Extensible Markup Language), a
new general-purpose markup language that is easier to work with than SGML. And
XHTML, which replaces HTML, is a newer standard which complies fully with the
requirements of XML but remains compatible with older web browsers.
A Simple Example
Here is a simple example of a valid XHTML document. To try this out for yourself,
simply create a new file called mypage.html with any text editor, such as Windows
notepad. Paste in the HTML below, make any changes that please you, and save the
document. Then pick "open" from the File menu of your web browser, locate the file you
have just made, and open it. If you make further changes, you will need to "save" again
and then click "reload" or "refresh" in your browser to see the results.
Of course, this is just a simple example. XHTML can do far, far more than this. A
complete tutorial can be found at Dave's HTML Guide.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title of My Page Goes Here</title>
</head>
<body>
<h1>Heading Of My Page Goes Here</h1>
<p>
<a href="http://news.google.com/">Follow this link to Google News</a>
</p>
<p>
Here is a picture of my cat:
</p>
<p>
<img src="cat.jpg" alt="Photograph of my cat"/>
</p>
</body>
</html>

What's That DOCTYPE About?


Good grief! Most of this looks friendly enough, but what's that scary "DOCTYPE" line
all about?
The DOCTYPE tells the web browser what version of XHTML we're using. In this case
I've specified XHTML 1.0 Strict, because this code is 100% compliant with the rules of
XHTML. You don't need to understand this line in detail - just know that you should
include it if you plan to write standards-compliant web pages. And you should.
Those who must use HTML elements that aren't included in strict XHTML can use the
"transitional DTD" (Document Type Definition) instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Understanding XHTML: A Basic Introduction


The XHTML elements in the page above are nearly self-explanatory. All elements that
describe the page but are not actually part of the content appear inside the head element.
All of the elements that actually make up the visible page itself are part of the body
element. Everything between the opening <head> "tag" and the closing </head> "tag" is
considered a part of the head element. The same goes for body. And everything should be
contained within a single html element.

The text between <h1> and </h1> is displayed as a "level one heading," which is
typically a very large, bold font.
The p element encloses a paragraph. In strict XHTML, most elements such as images and
links must be enclosed in a paragraph or another "block-level" element.
The text between the opening and closing <a> and </a> "tags" becomes a link to another
web page; the URL of the web page to be linked to is found in the href attribute of the
<a> element as shown in the example above.
The <img> element includes an image in the page; the image is displayed at that point in
the page, as long as the image file specified by the URL in the src attribute actually exists.
Since the src attribute I used here contains a simple filename, the cat picture will be
shown as long as the file cat.jpg is in the same directory as the page. The same trick can
be used in href attributes in <a> elements, to conveniently link to pages in the same
directory. For more information about images and how to create them in formats
appropriate for the web, see the image file formats entry.
The alt attribute of the img element contains text to be presented to blind users. XHTML
requires it, and since this text is also read by search engines like Google, it's important to
include it - Google probably won't know your page is about cats if there is no text about
cats on the page!
The "alt text" should describe the image in a useful way for those (including both
computers and people) who cannot otherwise see it.
The <img> element has a / before the > to signify that it is not a container and that no
closing </img> is expected.
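To see why well-formed markup and alt text matter to programs as well as people, here is a minimal sketch in Python of how a site-indexing program might read the example page. The PageScanner class is a hypothetical name of my own; it relies only on the html.parser module from Python's standard library, and real search engines are of course far more elaborate.

```python
from html.parser import HTMLParser

# The body of the example page from this entry (DOCTYPE omitted for brevity).
PAGE = """<html>
<head>
<title>Title of My Page Goes Here</title>
</head>
<body>
<h1>Heading Of My Page Goes Here</h1>
<p>
<a href="http://news.google.com/">Follow this link to Google News</a>
</p>
<p>
<img src="cat.jpg" alt="Photograph of my cat"/>
</p>
</body>
</html>"""


class PageScanner(HTMLParser):
    """Collects link targets and image descriptions from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img":
            self.images.append((attributes.get("src"), attributes.get("alt")))


scanner = PageScanner()
scanner.feed(PAGE)
print("Links:", scanner.links)
print("Images:", scanner.images)
```

A program like this never sees the cat. The alt text is the only description of the picture it gets.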
What To Do With Your XHTML Page
Of course, a web page sitting in a file on your own computer is not yet visible to anyone
in the outside world. See the setting up websites entry to learn more about how to create
websites that others can see.
What are Cascading Style Sheets (CSS)?

2004-09-29: cascading style sheets are the recommended method of expressing the
precise "look and feel" of a web page. By associating a CSS file with a web page, the
exact appearance and behavior of every HTML element can be specified.
For instance, if the file main.css contains the following:
a {
text-decoration: none;
}
And the web page page.html contains the following <link> element within its <head>
element:
<link href="/main.css" rel="stylesheet" type="text/css"/>
Then any <a> elements (ordinary links) in the page will not be underlined as they
otherwise would be in most browsers.
You can also attach a style to a specific HTML element in the page itself, like so:
<a href="something.html" style="text-decoration: none;">
All modern web browsers support CSS, though CSS support is rarely perfect and testing
with a variety of browsers is recommended.
For a complete reference guide to CSS, see the W3 Schools site.
Why do style sheets exist? Why are they separate from HTML?
HTML was always intended to express the content and structure of web pages,
rather than their appearance. The original HTML specification offered elements
like <cite> (for citations), <a> (anchor, for links), and <p> (paragraph). These
express the structure and meaning of a document's parts, but not the way those
parts are expected to appear. The look and feel was left up to the programmers of
the web browser and the preferences of the reader.
Of course, designers clamored for better control over the appearance of web
pages. Various "quick fixes" were introduced, like the <font> and <center>
elements, which are now discouraged. The difficulty with elements like these is
that they do not express anything about the meaning of the page. What if the user
is blind? What if the browser is actually a site-indexing program? Knowing that
the text should be "red, and 24 pixels tall" doesn't convey as much to these users
as <cite> or <h1>.
A better solution is to let HTML elements express the structure of the document in
a way that all users and programs can understand, and let cascading style sheets
express the exact appearance the designer prefers for each element -- when the
web browser is actually capable of displaying such things.
What is a website?

2007-02-23: A website is a collection of web pages maintained by a single person or
organization. In most cases, a website has a distinct fully qualified domain name, such as
www.boutell.com. Everything on www.boutell.com, such as this page, is considered to be
part of the www.boutell.com website.
Less often, a set of pages beginning "lower down" than the home page of the site will
also be referred to as a website. These pages will almost always have URLs that begin
with a common "stem," such as:

http://example.edu/~professorsname/
Legal Note: yes, you may use sample HTML, Javascript, PHP and other code presented
above in your own projects. You may not reproduce large portions of the text of the
article without our express permission.
What are web pages?

2003-09-04: every website is made up of one or more web pages -- like the one you are
looking at right now! This text is part of a web page, and is written in the HyperText
Markup Language (HTML). In addition to text with hyperlinks, tables, and other
formatting, web pages can also contain images. Less commonly, web pages may contain
Flash animations, Java applets, or MPEG video files. For more information and an
example, see the HTML entry.
What is a URL?

2003-09-04: look up at the top of this web page. Above the page you will see the
"location bar" of your web browser, which should contain something very like this:
http://www.boutell.com/newfaq/definitions/url.html

This is the Uniform Resource Locator (URL) of the web page you are looking at right
now. A URL can be thought of as the "address" of a web page and is sometimes referred
to informally as a "web address."
URLs are used to write links linking one page to another; for an example, see the HTML
entry.
A URL is made up of several parts. The first part is the protocol, which tells the web
browser what sort of server it will be talking to in order to fetch the URL. In this
example, the protocol is http.
The remaining parts vary depending on the protocol, but the vast majority of URLs you
will encounter use the http protocol; exceptions include file URLs, which link to local
files on your own hard drive, ftp URLs, which work just like http URLs but link to
things on FTP servers rather than web servers, and mailto URLs, which can be used to
invite a user to write an email message to a particular email address.
The second part of the example URL above is the fully qualified domain name of the
website to connect to. In this case, the fully qualified domain name is www.boutell.com.
This name identifies the web site containing the page. The term "fully qualified domain
name" refers to a complete website or other computer's name on the Internet. The term
"domain name" usually refers only to the last part of the name, in this case boutell.com,
which has been registered for that particular company's exclusive use. For more
information about registering domain names, see the setting up websites entry.
The third part of the example URL is the path at which this particular web page is
located on the web server. In this case, the path is /newfaq/definitions/url.html. Similar to
a filename, a path usually indicates where the web page is located within the web space
of the website; in this case it is located in the definitions sub-folder of the newfaq folder,
which is located in the top-level web page directory of our website.
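A program can pull the same three parts out of a URL. This is a small sketch using urlsplit from Python's standard urllib.parse module; describe_url is a hypothetical helper name chosen for this illustration, and someone@example.com is a made-up address.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return the protocol, host name and path of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # the part before the colon
        "host": parts.hostname,     # fully qualified domain name, if any
        "path": parts.path,         # location on the server
    }


info = describe_url("http://www.boutell.com/newfaq/definitions/url.html")
print(info)

# Not every protocol has a host name: a mailto URL has only an address.
print(describe_url("mailto:someone@example.com"))
```

For the mailto URL the host comes back as None, which reflects the point above that the parts after the protocol vary from one protocol to another.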
For more information, see a beginner's guide to URLs, as well as my article what is my
URL?
What is a domain name?

2004-06-01: The term "domain name" usually refers to a particular organization's
registered name on the Internet, such as example.com, boutell.com or udel.edu. There
may be many distinct computers within a single domain, or there may be only one. The
term "fully qualified domain name" refers to a complete website or other computer's
name on the Internet, such as www.boutell.com or ip2039.cleveland.myisp.com. The
holder of a domain name may delegate almost any number of names within that domain,
such as www1.example.com, www2.example.com, whimsical.example.com, and so on.

Registered domain names are themselves part of a "top-level domain." See the top-level
domains entry for more information about top-level domains such as .com, .edu, .mx,
.fr and so on.
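The relationship between these terms can be sketched in a few lines of Python. split_hostname is a hypothetical helper, and its guess at the registered domain name simply takes the last two labels. That is an assumption that works for the examples in this entry but is not a general rule for every top-level domain.

```python
def split_hostname(fqdn):
    """Break a fully qualified domain name into its dot-separated labels."""
    labels = fqdn.rstrip(".").lower().split(".")
    return {
        "labels": labels,
        "top_level_domain": "." + labels[-1],
        # Assumption: the registered name is the last two labels. This holds
        # for names like boutell.com but not for every top-level domain.
        "registered_domain_guess": ".".join(labels[-2:]),
    }


for name in ("www.boutell.com", "ip2039.cleveland.myisp.com"):
    print(name, "->", split_hostname(name))
```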

For more information about registering domain names, see how do I register a domain
name, how do I register a .edu domain name, and how do I set up a website.
What is a hyperlink?
2003-06-30: Every time you click on a link on a web page, such as the link you may have
clicked on to reach this page, you are following a hyperlink.
A hyperlink is a link you can click on or activate with the keyboard or other device in
order to go somewhere else. A hyperlink is defined by its function, not by its appearance.
What it looks or sounds or smells like is completely irrelevant except as a way of
recognizing it. Visually impaired people follow hyperlinks with speech-based browsers
and never see text at all. A hyperlink without a blue underline is still a hyperlink if your
browser allows you to click on it or otherwise activate it to go somewhere else on the
World Wide Web, or in another hypertext system.
See also the URL entry.
What is hypertext?

2006-04-04: hypertext is text that contains hyperlinks. The HTML and XHTML
documents we see on the World Wide Web are the best-known example of a hypertext
system, but it is not the only one. Hypertext doesn't necessarily have to include links to
documents in other places; a simple hypertext system can live on a single computer, as in
the case of Apple's once-common HyperCard application.
What does WWW stand for?

2003-08-23: WWW is an acronym which stands for World Wide Web. See World Wide
Web for more information.
What is the World Wide Web?

2003-06-30: The term "World Wide Web" refers to all of the publicly accessible websites
in the world, in addition to other information sources that web browsers can access.
These other sources include FTP sites (an older way of transferring files), UseNet
newsgroups (once the most popular way to post messages to public forums on the
Internet), and a few surviving Gopher sites.
See also: What is the Internet, Who invented the World Wide Web, and Who invented the
Internet.
What is the Internet?

2003-06-30: "The Internet" refers to the worldwide network of interconnected computers,
all of which use a common protocol known as TCP/IP to communicate with each other.
Every publicly accessible website is hosted by a web server computer, which is a part of
the Internet. Every personal computer, cell phone or other device that people use to look
at websites is also a part of the Internet. The Internet also makes possible email, games
and other applications unrelated to the World Wide Web.
See also: What is the World Wide Web, Who invented the Internet, and Who invented
the World Wide Web.
What is an Intranet?

2004-02-23: any network of interconnected computers belonging to one organization,
similar to but separate from or insulated from the Internet. Intranets use the same
protocols and software that are used on the Internet. For instance, many organizations
have special intranet websites that can only be viewed from a desktop in their offices, or
when connected to their Virtual Private Network (VPN).
What are FTP and SFTP?

2006-10-15: FTP (File Transfer Protocol) is an older protocol for moving files back and
forth over the Internet and other networks. All modern web browsers still speak FTP,
which was sometimes used as a substitute for HTTP in the early days of the web. FTP is
still used often as a means of downloading large files.
Many web hosts still offer FTP as the preferred way of uploading new web pages to a
website. However, because there is no encryption of your password, FTP is not the best
choice for this purpose. And since there is no encryption of the files being moved, FTP is
a poor choice indeed for more sensitive information.
SFTP (the SSH File Transfer Protocol, often called "Secure FTP") is a popular replacement.
Built on SSH rather than SSL, SFTP encrypts both your password and your files, much as
HTTPS does. And most modern FTP clients, such as the free, high-quality FileZilla program
for Windows, support both FTP and SFTP. SFTP offers a set of features quite similar to
FTP and will be immediately familiar to FTP users, although it works quite differently
"under the hood."
Every Windows, MacOS X and Linux system comes standard with a simple command
line FTP client program. And MacOS X and Linux also have command line SFTP clients
as standard equipment. In addition, MacOS X supports connections to FTP servers in a
user-friendly way, right out of the box (you can find a great tutorial on creativemac.com).
Binary Mode and ASCII Mode in FTP
"Classic," non-secure FTP can move files in two major ways: "binary mode" and "ASCII
mode." Binary mode just moves the file down the wire without modifying anything... and
this is, almost always, what we want today.
"ASCII mode" is sometimes used for plain-text (usually, .txt) files. ASCII mode, named
for the American Standard Code for Information Interchange which determines what byte
stands for each letter, number or other character in text, corrects for differences in the
way line endings are stored in text files. Windows traditionally uses a carriage return
(represented by an ASCII value of 13) followed by a line feed (represented by 10). Unix
typically expects just the line feed. And MacOS, at least prior to MacOS X, preferred a
carriage return only.
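The three conventions are easy to see in code. The following Python sketch is my own illustration of the kind of translation ASCII mode performs, not the code of any actual FTP program; convert_line_endings is a hypothetical name.

```python
CR, LF = b"\r", b"\n"  # ASCII 13 (carriage return) and ASCII 10 (line feed)

LINE_ENDINGS = {"windows": CR + LF, "unix": LF, "classic_mac": CR}


def convert_line_endings(text: bytes, target: str) -> bytes:
    """Rewrite text that uses any of the three conventions into the target one."""
    # First reduce everything to a bare line feed, then expand to the target.
    normalized = text.replace(CR + LF, LF).replace(CR, LF)
    return normalized.replace(LF, LINE_ENDINGS[target])


windows_text = b"first line\r\nsecond line\r\n"
print(convert_line_endings(windows_text, "unix"))
print(convert_line_endings(windows_text, "classic_mac"))
```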
Today, though, most text-editing and viewing programs can view text files that follow
any of these three conventions (including Microsoft Word, and the free WordPad program
that comes with Windows, but excluding a few annoying holdovers like Windows
Notepad). So ASCII mode doesn't do us much good. These days ASCII mode is mostly
an annoyance, something people accidentally leave on in a very old-fashioned
command-line FTP program, or accidentally turn on in a newer one. And when you're moving a
program, an image or anything else with an exact file format that must not be modified,
that means you get garbage instead of the file you wanted.
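Here is a small Python sketch of how that damage happens. The eight-byte signature that begins every PNG image deliberately contains a carriage return and line feed pair so that this kind of mangling is easy to detect. ascii_mode_download is a hypothetical, simplified stand-in for an ASCII-mode transfer to a Unix machine.

```python
# The fixed eight bytes that begin every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ascii_mode_download(data: bytes) -> bytes:
    """Toy model of an ASCII-mode transfer to Unix: CR LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


original = PNG_SIGNATURE + b"...image data would follow..."
received = ascii_mode_download(original)

print("original is a PNG:", original.startswith(PNG_SIGNATURE))
print("received is a PNG:", received.startswith(PNG_SIGNATURE))
print("bytes lost:", len(original) - len(received))
```

A single missing byte is enough: the received file no longer begins with a valid signature, so an image viewer will refuse it.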
The norm today is for FTP to simply ship files unchanged in binary mode.
Once upon a time there was also something called "TENEX mode," to move files
between computers that didn't even use the same number of bits to represent a byte. Yes,
FTP has been around that long! But TENEX mode doesn't come up as an issue these
days, and we're all happier for it.
What is HTTPS?

2006-09-11: HTTPS is HTTP over SSL. Now, let me explain that in English!
HTTP, the HyperText Transfer Protocol, is the language or "protocol" that all web
browsers speak when talking to web servers. And SSL, which stands for Secure Sockets
Layer, is a protocol that provides secure communication. When two programs talk to each
other using HTTP, but do it using SSL's secure communications instead of talking "in the
clear," they are speaking HTTPS.
When two programs communicate via HTTPS, they need a way to verify each other's
identity and agree on a method of encryption. They do this via SSL certificates. See what
is an SSL certificate? for more information.
HTTPS URLs can be recognized by the additional s after http. By default, HTTPS
communication happens on TCP/IP port number 443 instead of port 80.
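As a rough sketch of what "HTTP over SSL" means in practice, the Python below sends an ordinary HTTP request through an encrypted socket on port 443. It uses the standard socket and ssl modules (which today negotiate TLS, the successor to SSL); build_tls_context and fetch_https_status_line are hypothetical names of my own, and the function needs a working Internet connection when you call it.

```python
import socket
import ssl


def build_tls_context():
    """Create a context that checks the server's certificate and host name."""
    return ssl.create_default_context()


def fetch_https_status_line(host, port=443):
    """Send a plain HTTP request over an encrypted socket; return the first reply line."""
    context = build_tls_context()
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as raw_socket:
        # Wrapping the TCP socket is the only difference from plain HTTP.
        with context.wrap_socket(raw_socket, server_hostname=host) as secure_socket:
            secure_socket.sendall(request)
            reply = b""
            while b"\r\n" not in reply:
                chunk = secure_socket.recv(4096)
                if not chunk:
                    break
                reply += chunk
    return reply.split(b"\r\n", 1)[0].decode("ascii", "replace")


# Example call (requires network access):
# print(fetch_https_status_line("www.example.com"))
```

The request text is the same as it would be for plain HTTP. Only the socket it travels through has changed.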
What is SSL?

2006-09-11: SSL (Secure Sockets Layer), also known as TLS (Transport Layer Security),
is a protocol that allows two programs to communicate with each other in a secure way.
Like TCP/IP, SSL allows programs to create "sockets," endpoints for communication,
and make connections between those sockets. But SSL, which is built on top of TCP,
adds the additional capability of encryption. The HTTPS protocol spoken by web
browsers when communicating with secure sites is simply the usual World Wide Web
HTTP protocol, "spoken" over SSL instead of directly over TCP.
In addition to providing privacy, SSL encryption also allows us to verify the identity of
the party we are talking to. This can be very important if we don't trust the Internet. While
it is unlikely in practice that the root DNS servers of the Internet will be subverted, a
"man in the middle" attack elsewhere on the network could substitute the address of one
Internet site for another. SSL prevents this scenario by providing a mathematically sound
way to verify the other program's identity. When you log on to your bank's website, you
want to be very, very sure you are talking to your bank!
How SSL Works
SSL provides both privacy and security using a technique called "public/private key
encryption" (often called "asymmetric encryption" or simply "public key encryption").
A "public key" is a string of letters and numbers that can be used to encrypt a message so
that only the owner of the public key can read it. This is possible because every public
key has a corresponding private key that is kept secret by the owner of the public key.
How exactly are the public and private key related? That depends on the algorithm
(mathematical method) used. SSL allows several algorithms, of which the most famous is
the RSA algorithm invented by Ron Rivest, Adi Shamir and Len Adleman of MIT in
1977.
Several algorithms, including RSA, depend on properties of very large prime numbers.
For instance, it is very difficult to factor a number that is a product of two
large primes, unless you already know one of the primes.
Public and private keys can also be used in the opposite way: a message encrypted with
the private key can only be decrypted (read) with the public key. This comes in handy at
the beginning of the conversation, as a way of verifying the other program's identity.
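To make the relationship between the two keys concrete, here is a toy RSA example in Python. The primes 61 and 53 are tiny numbers chosen purely for illustration (real keys use primes hundreds of digits long, plus padding schemes that I have left out), and apply_key is a hypothetical helper name. It requires Python 3.8 or later for the modular inverse.

```python
# Two secret primes. Anyone who can factor n back into p and q can
# recover the private key, which is why real primes must be enormous.
p, q = 61, 53
n = p * q                      # 3233, shared by both keys
phi = (p - 1) * (q - 1)        # 3120

e = 17                         # public exponent: (n, e) is the public key
d = pow(e, -1, phi)            # private exponent: (n, d) is the private key


def apply_key(number, exponent, modulus=n):
    """Both encryption and decryption are the same modular exponentiation."""
    return pow(number, exponent, modulus)


message = 65

# Privacy: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# Identity: "encrypt" with the private key, check with the public key.
signature = apply_key(message, d)
verified = apply_key(signature, e)

print("private exponent:", d)
print("recovered:", recovered, "verified:", verified)
```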
The SSL Handshake: Identity and Privacy
Let's suppose Jane wants to log into www.examplebank.com. When Jane's web browser
makes an HTTPS connection to www.examplebank.com, her browser sends the bank's
server a string of randomly generated data, which we'll call the "greeting."
The web server responds with two things: its own public key encoded in an SSL
certificate, which we'll examine more closely later, and the "greeting" encrypted with its
private key.
Jane's web browser then decrypts the greeting with the bank's public key. If the decrypted
greeting matches the original greeting sent by the browser, then Jane's browser can be
sure it is really talking to the owner of the private key - because only the holder of the
private key can encrypt a message in such a way that the corresponding public key will
decrypt it.
Now, let's suppose Bob is monitoring this traffic on the Internet. He has the bank's public
key, and Jane's greeting. But he doesn't have the bank's private key. So he can't encrypt
the greeting and send it back. That means Jane can't be fooled by Bob.
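The exchange above can be sketched with toy numbers. This Python example follows the simplified story told in this article, not the full SSL protocol. The key pairs are built from small, well-known Mersenne primes as stand-ins for real secret primes, and make_toy_keypair, server_response and browser_accepts are all hypothetical names. It requires Python 3.8 or later.

```python
import secrets


def make_toy_keypair(p, q, e=65537):
    """Build an insecure demonstration RSA key pair from two primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)          # (public key, private key)


# Assumption: well-known Mersenne primes stand in for secret primes.
BANK_PUBLIC, BANK_PRIVATE = make_toy_keypair(2**31 - 1, 2**61 - 1)
BOB_PUBLIC, BOB_PRIVATE = make_toy_keypair(2**13 - 1, 2**17 - 1)


def server_response(greeting, private_key):
    """The server encrypts the browser's greeting with its private key."""
    n, d = private_key
    return pow(greeting, d, n)


def browser_accepts(greeting, response, public_key):
    """The browser decrypts with the public key and compares."""
    n, e = public_key
    return pow(response, e, n) == greeting


# Jane's browser picks a random greeting (0 and 1 are avoided because
# they encrypt to themselves under any RSA key).
greeting = secrets.randbelow(2**30) + 2

print("bank accepted:", browser_accepts(
    greeting, server_response(greeting, BANK_PRIVATE), BANK_PUBLIC))
print("Bob accepted:", browser_accepts(
    greeting, server_response(greeting, BOB_PRIVATE), BANK_PUBLIC))
```

Bob can produce a response, but without the bank's private key it does not decrypt to Jane's greeting.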
The Identity Problem
But what if Bob inserts himself into the picture even before Jane's browser connects to
the bank? What if Jane's browser is actually talking to Bob's server from the very
beginning? Then Bob can substitute his own public and private keys, encrypt the greeting
successfully, and convince Jane's browser that his computer is the bank's. Not good!
That's why the complete SSL handshake includes more than just the bank's public key.
The public key is part of an SSL certificate issued by a certificate authority that Jane's
browser already trusts.
How does this work? When web browser software is installed on a computer, it already
contains the public keys of several certificate authorities, such as GoDaddy, VeriSign and
Thawte. Companies that want their secure sites to be "trusted" by web browsers must
purchase an SSL certificate from one of these authorities.
But what is the certificate, exactly? The SSL certificate consists essentially of the bank's
public key and a statement identifying the bank, encrypted with the certificate authority's
private key.
When the bank's web server sends its certificate to Jane's browser, Jane's browser
decrypts it with the public key of the certificate authority. If the certificate is fake, the
decryption results in garbage. If the certificate is valid, out pops the bank's public key,
along with the identifying statement. And if that statement doesn't include, among other
information, the same hostname that Jane connected to, Jane receives an appropriate
warning message and decides not to continue the connection.
Now, let's return to Bob. Can he substitute himself convincingly for the bank? No, he
can't, because he doesn't have the certificate authority's private key. That means he can't
sign a certificate claiming that he is the bank.
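Here is a toy version of that check in Python. One difference from the simplified description above: like real certificate authorities, this sketch signs a short digest (hash) of the certificate's contents rather than encrypting the whole thing. The keys, the certificate layout, and the names make_toy_keypair, issue_certificate and browser_trusts are all assumptions made for illustration. It requires Python 3.8 or later.

```python
import hashlib


def make_toy_keypair(p, q, e=65537):
    """Build an insecure demonstration RSA key pair from two primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)          # (public key, private key)


# Assumption: well-known Mersenne primes stand in for the CA's secret primes.
CA_PUBLIC, CA_PRIVATE = make_toy_keypair(2**31 - 1, 2**61 - 1)


def digest(hostname, public_key, modulus):
    """Reduce the certificate's contents to a number smaller than the modulus."""
    text = f"{hostname}|{public_key}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(text).digest(), "big") % modulus


def issue_certificate(hostname, public_key, ca_private):
    """The authority signs the identifying statement with its private key."""
    n, d = ca_private
    signature = pow(digest(hostname, public_key, n), d, n)
    return {"hostname": hostname, "public_key": public_key, "signature": signature}


def browser_trusts(certificate, connected_hostname, ca_public):
    """Check the authority's signature, then check the host name."""
    n, e = ca_public
    expected = digest(certificate["hostname"], certificate["public_key"], n)
    signed_by_ca = pow(certificate["signature"], e, n) == expected
    return signed_by_ca and certificate["hostname"] == connected_hostname


bank_key = (3233, 17)              # hypothetical toy public key for the bank
genuine = issue_certificate("www.examplebank.com", bank_key, CA_PRIVATE)

# Bob swaps in his own (hypothetical) public key but cannot re-sign.
forged = dict(genuine, public_key=(8633, 17))

print(browser_trusts(genuine, "www.examplebank.com", CA_PUBLIC))
print(browser_trusts(forged, "www.examplebank.com", CA_PUBLIC))
print(browser_trusts(genuine, "www.some-other-site.com", CA_PUBLIC))
```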
Now that Jane's browser is thoroughly convinced that the bank is what it appears to be,
the conversation can continue.
After the Handshake: Symmetric Key Encryption
Jane's browser and the bank could continue to communicate with public key encryption.
But public key encryption is very processor-intensive - it makes both computers work
hard. And that slows down both systems. Jane's browser might not matter, since Jane's
computer is probably only talking to one site at a time. But the bank's server is
communicating with hundreds of customers and can't afford to do the math!
Fortunately, now that Jane's browser trusts the bank's server, there's an easier way. Jane's
browser simply tells the bank's server that the rest of the conversation should be carried
out using a "symmetrical" cipher - a method of encryption that is simpler than
public/private key, or "asymmetrical," encryption. "Symmetric" ciphers use a single key
that is shared by both sides. Jane's browser picks a cipher (an "algorithm," or
mathematical method, of encryption, such as the AES Advanced Encryption Standard)
and randomly generates the key to be used. Finally, Jane's browser tells the bank's server
what the cipher and key will be, encrypting this information with the bank's public key,
and the conversation continues using symmetric encryption.
But what if Bob is still listening? Bob might receive the symmetric key from Jane, but
that information is itself encrypted with the bank's public key... and can only be decrypted
with the bank's private key, which Bob doesn't have. So Jane and the bank now share a
symmetric key, also known as a "master secret," that no
one else can know. And this allows them to continue communicating secretly.
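The hand-off from public key encryption to a shared key can be sketched as follows. Python's standard library does not include AES, so this example substitutes a toy symmetric cipher of my own (XOR with a SHA-256 keystream) that must never be used for real secrets. The key pair, the message and all function names are likewise assumptions for illustration. It requires Python 3.8 or later.

```python
import hashlib
import secrets


def make_toy_keypair(p, q, e=65537):
    """Build an insecure demonstration RSA key pair from two primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)          # (public key, private key)


BANK_PUBLIC, BANK_PRIVATE = make_toy_keypair(2**31 - 1, 2**61 - 1)


def toy_symmetric_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from the key.

    The same call both encrypts and decrypts, because one shared key
    does both jobs. This is a stand-in for a real cipher such as AES.
    """
    output = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        output.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(output)


# Jane's browser: pick a random key and wrap it with the bank's public key.
session_key = secrets.token_bytes(8)
n, e = BANK_PUBLIC
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)

# The bank: unwrap the key with its private key.
n, d = BANK_PRIVATE
unwrapped_key = pow(wrapped_key, d, n).to_bytes(8, "big")

# From here on, both sides use only the fast symmetric cipher.
message = b"Balance: $1,234.56"
encrypted = toy_symmetric_cipher(session_key, message)
decrypted = toy_symmetric_cipher(unwrapped_key, encrypted)
print(decrypted)
```

The slow public key arithmetic happens once, to move eight bytes. Everything after that uses the cheap shared-key cipher.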
Additional Reading
Here I've discussed what a typical SSL conversation looks like and addressed the
essential features of public key cryptography. I've tried to cover the important features
while keeping things understandable. But for simplicity's sake, I've glossed over quite a
bit.
If you're interested in understanding the mathematical details and the many encryption
algorithms that can be employed, you can find a more technical discussion on Wikipedia.
What is TLS?

2006-09-11: TLS, or Transport Layer Security, is the name given to the SSL encryption
protocol in the versions that followed SSL 3.0. See what is SSL? for more information.
What is HTTP?

2003-09-23: In order to fetch a web page for you, your web browser must "talk" to a web
server somewhere else. When web browsers talk to web servers, they speak a language
known as HTTP, which stands for HyperText Transfer Protocol. This language is actually
very simple and understandable and is not difficult for the human eye to follow.
A Simple HTTP Example
The browser says:
GET / HTTP/1.0
Host: www.boutell.com

And the server replies:
HTTP/1.0 200 OK
Content-Type: text/html

<head>
<title>Welcome to Boutell.Com, Inc.!</title>
</head>
<body>
The rest of Boutell.Com's home page appears here
</body>
The first line of the browser's request, GET / HTTP/1.0, indicates that the browser wants
to see the home page of the site, and that the browser is using version 1.0 of the HTTP
protocol. The second line, Host: www.boutell.com, indicates the website that the
browser is asking for. This is required because many websites may share the same IP
address on the Internet and be hosted by a single computer. The Host: line was added a
few years after the original release of HTTP 1.0 in order to accommodate this.
The first line of the server's reply, HTTP/1.0 200 OK, indicates that the server is also
speaking version 1.0 of the HTTP protocol, and that the request was successful. If the
page the browser asked for did not exist, the response would read HTTP/1.0 404 Not
Found. The second line of the server's reply, Content-Type: text/html, tells the
browser that the object it is about to receive is a web page. This is how the browser
knows what to do with the response from the server. If this line were Content-Type:
image/png, the browser would know to expect a PNG image file rather than a web page,
and would display it accordingly.
A modern web browser would say a bit more using the HTTP 1.1 protocol, and a modern
web server would respond with a bit more information, but the differences are not
dramatic and the above transaction is still perfectly valid; if a browser made a request
exactly like the one above today, it would still be accepted by any web server, and the
response above would still be accepted by any browser. This simplicity is typical of most
of the protocols that grew up around the Internet.
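Because the format is so simple, a few lines of code are enough to take a reply apart. This Python sketch parses a shortened copy of the reply shown above; parse_response is a hypothetical helper of my own, not part of any library, and it skips many details a real browser must handle.

```python
# A shortened copy of the server's reply from the example above.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<head>\n<title>Welcome to Boutell.Com, Inc.!</title>\n</head>\n"
)


def parse_response(raw: bytes):
    """Split a reply into version, status code, reason, headers and body."""
    # A blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(RESPONSE)
print(version, code, reason)
print("The body is a", headers["content-type"], "object")

# A missing page produces a different status line.
print(parse_response(b"HTTP/1.0 404 Not Found\r\n\r\n")[1:3])
```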
Human Beings Can Speak HTTP
In fact, you can try being a web browser yourself, if you are a patient typist. If you are
using Windows, click the Start menu, select "Run," and type "telnet
www.mywebsitename.com 80" in the dialog that appears. Then click OK. Users of other
operating systems can do the same thing; just start your own telnet program and
connect to your website as the host and 80 as the port number. When the connection is
made, type:
GET / HTTP/1.0
Host: www.mywebsitename.com

Make sure you press ENTER twice after the Host: line to end your HTTP headers.
Your telnet program probably will not show you what you are typing, but after you press
ENTER the second time, you should receive your website's home page in HTML after a
short pause. Congratulations, you have carried out your very own simple HTTP
transaction.
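If you would rather let a program do the typing, the same transaction can be sketched in a few lines of Python. This is only an illustration: the function names build_request and fetch are my own, and www.mywebsitename.com is the same placeholder used above, so substitute a real host before trying it.

```python
import socket


def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        "GET " + path + " HTTP/1.0",
        "Host: " + host,
        "",  # the blank line is the equivalent of pressing ENTER a second time
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80):
    """Send the request over a plain TCP connection and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break  # the server closed the connection: end of reply
            chunks.append(data)
    return b"".join(chunks)


# Show exactly what would be sent, without touching the network.
print(build_request("www.mywebsitename.com").decode("ascii"))
```

Calling fetch("www.mywebsitename.com") with a real host name in place of the placeholder should return the headers followed by the HTML of the home page, provided the machine has Internet access.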
HTTP 1.1 Differences
Originally, web browsers made a separate HTTP request like this for each and every
page, and for each and every image or other component of the page. While this is still
often the case, most web servers and browsers now support HTTP 1.1 and can negotiate
to keep the connection open and transfer all of the page components without hanging up
and opening new connections. For the complete HTTP 1.1 specification, see the W3C's
HTTP-related pages.
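To see a kept-open connection in action without depending on any outside website, the following Python sketch starts a throwaway server on the local machine and fetches two items over a single connection. Everything here is an assumption made for the demonstration: the handler class, the paths, and the trick of recording the client's port number to show that only one connection was used.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

client_ports = []  # source port of each request, as seen by the server


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = ("you asked for " + self.path).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # On a kept-open connection the client needs the length to know
        # where one reply ends and the next begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_all(paths):
    """Fetch every path over one TCP connection to a throwaway local server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())
        return bodies
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


bodies = fetch_all(["/page.html", "/logo.png"])
print(bodies)
print("connections used:", len(set(client_ports)))
```

Both requests arrive from the same client port, which is a simple way to confirm that the connection was reused rather than reopened.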
HTTP itself is "layered" on top of another protocol, TCP. For more information, see the
article what is TCP/IP?
What is a firewall?

2003-06-30: a firewall sits between your computer and the rest of the Internet, filtering
out unwanted traffic and foiling attempts to interfere with or take over your computer.
Firewalls can be separate devices, which is very common today, or simply pieces of
software for your own computer, which is also fairly common. Separate firewall devices
are often preferable because their very simplicity makes them less likely to have
unknown security problems; however, it is still important to keep up with "firmware
updates" released by the manufacturer, otherwise your firewall may be vulnerable to
attack. "Cable/DSL routers" and similar devices sold by companies like Linksys provide
simple firewall capabilities which are adequate for most home users.
What are browser plug-ins?
2004-03-25: Web browser plug-ins (sometimes just spelled plugins) are additional pieces
of software that add extra capabilities to your web browser, such as the ability to view
movies, run Java applets, or see Flash animations. Unfortunately, since plug-ins run with
all the privileges of real applications, they can do absolutely anything to your computer.
That means you should never, ever agree to install a plug-in unless you have very
good reason to trust the source. Keep in mind that the Flash plug-in comes with your
computer, and most systems also come with a Java plug-in. Other mainstream plug-ins
include RealPlayer. You will almost never have a good reason to install a plug-in that isn't
one of these, so say "no" when your browser asks you to install one, unless you have an
excellent reason to do otherwise. See what is ActiveX? for more information on this
subject.
What are spyware and adware?

2004-08-17: Programs that cause your computer to display ads even when you are not
using the program in question for its intended purpose, as well as those that report
information about your web browsing activities to an advertising firm, are commonly
known as "spyware." Typical examples are programs like "WeatherBug" and
"MemoryMeter." These claim to serve a useful purpose and, in some cases, actually
provide some service, but their main goal is to present annoying and unwanted
advertising throughout your web browsing experience. They are very difficult to remove
manually. Fortunately, there are excellent free tools available to do the job correctly. For
more information, see why is my web browser broken?
Adware programs, strictly speaking, are well-behaved applications that happen to display
some advertising in that program, while you are using that program. Usually this is
offered as an alternative to paying for the software. This is a perfectly legitimate practice,
but with the exception of a few well-known programs like the Opera web browser, true
adware has become quite rare, crowded out by aggressive spyware.
What is ActiveX?

2004-08-17: ActiveX is Microsoft's technology for signing plug-ins that add additional
software to your computer when a web page is accessed. If all goes well, you will be
asked whether you want to trust a plug-in from that particular company and you will have
the option of saying no. In principle, this is a useful way to allow the installation of
worthwhile add-ons, such as Adobe Acrobat Reader, Macromedia Flash Player and
RealPlayer. However, if you do not run Windows Update regularly, all will not go well --
there have been security flaws in Internet Explorer in the past that have resulted in
software being able to install itself without permission.
If you do not have a specific, clear reason to want and trust the software you are
being asked to install -- that is, if it is not the Macromedia Flash Player or the Adobe
Acrobat PDF Reader or something similarly crucial that you really need -- SAY NO!
Many nasty pieces of awful spyware are properly signed and will ask permission to
install, knowing that some people will naively give it. You do NOT, for instance, want to
say yes to installing things like "WeatherBug" or "MemoryMeter," among many others.
For more information about removing such programs you may have installed by mistake,
see why is my web browser broken?
What is DNS?

2003-09-04: every time you follow a link or type in the name of a website, such as
www.boutell.com, that name must be translated into an IP address on the Internet. This
translation is done by the domain name system. A DNS server is a program that
participates in the task of providing this service. Some DNS servers respond to queries
from web browsers and other programs, make further inquiries, and return IP addresses,
such as 208.27.35.236, which is the current IP address of www.boutell.com. Other
DNS servers have primary responsibility for answering DNS inquiries about names
within a particular domain, such as the boutell.com domain. Every time a new domain
is registered, a DNS server must be configured to give out address information for that
domain, so that users can actually find websites in that domain. In most cases, web
hosting companies provide this service for the domains that they host; it is rare for
webmasters to run their own DNS servers. For more information, see setting up websites.
How DNS Usually Works
Let's say you want to visit www.google.com. Your computer hasn't already looked up
www.google.com since it was turned on. Or it has kept that information for long enough
that it considers it appropriate to check again. So your computer asks the DNS server of
your ISP (Internet Service Provider - the people who sell you an Internet connection,
companies such as Comcast and Earthlink).
The DNS server of your ISP first talks to one of thirteen "root" DNS servers. The root
DNS servers answer questions at the highest level possible: the top-level domain. For
instance, "who is in charge of DNS for the com domain?"
In practice, your ISP's DNS server caches (remembers) this information for a significant
period of time, and does not constantly harass the root servers just in case responsibility
for com has changed in the last five seconds. Similarly, your ISP's DNS server remembers
other information for appropriate lengths of time as well to avoid extra queries. But let's
assume, just for fun, that no one has ever asked your ISP for the IP address of
www.google.com before! Now your ISP's DNS server knows which DNS servers are
responsible for the com top-level domain. So your ISP's DNS server reaches out and
contacts one of those servers and asks the next question: who is responsible for DNS in
the google.com domain?

The response will list two or more DNS servers that have authority over the google.com
domain.
Finally, your ISP's DNS server contacts one of those DNS servers and asks for the
address of www.google.com, and hands the response back to your computer.

As mentioned above, in real life your ISP's DNS server will remember all of this
information. That means that a typical user will get an immediate response when asking
for the address of a frequently-visited site like Google.
But how long is it safe to remember that information? After all, the IP addresses of
servers do change, though usually not often. Fortunately, your ISP's DNS server doesn't
have to guess! The DNS records that come back from the "upstream" DNS servers
include a "time to live" (TTL) value that indicates how long the information can be kept
before the authoritative server should be asked again.
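The whole walk, from the root servers down to the final answer, can be imitated with a toy model in Python. This is a sketch under heavy assumptions: the "servers" are just dictionaries, the class name IspResolver is invented, 192.0.2.10 is a reserved documentation address rather than Google's real one, and the 300-second lifetime is made up. Unlike a real ISP's server, it caches only the final answer and not the intermediate referrals.

```python
import time

# Hypothetical stand-ins for real DNS servers: each one is a dictionary of
# the questions it can answer.
GOOGLE_SERVER = {"www.google.com": ("192.0.2.10", 300)}  # (address, lifetime in seconds)
COM_SERVER = {"google.com": GOOGLE_SERVER}
ROOT_SERVER = {"com": COM_SERVER}


class IspResolver:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}  # hostname -> (address, time at which the answer goes stale)
        self.upstream_queries = 0

    def _ask(self, server, question):
        self.upstream_queries += 1
        return server[question]

    def resolve(self, hostname):
        now = self.clock()
        cached = self.cache.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: no upstream traffic at all
        labels = hostname.split(".")
        # Who is in charge of DNS for the com domain?
        tld_server = self._ask(ROOT_SERVER, labels[-1])
        # Who is responsible for DNS in the google.com domain?
        domain_server = self._ask(tld_server, ".".join(labels[-2:]))
        # What is the address of www.google.com, and how long may it be kept?
        address, lifetime = self._ask(domain_server, hostname)
        self.cache[hostname] = (address, now + lifetime)
        return address


resolver = IspResolver()
print(resolver.resolve("www.google.com"), resolver.upstream_queries)
print(resolver.resolve("www.google.com"), resolver.upstream_queries)
```

The first lookup costs three upstream questions; the second is answered from the cache and costs none, until the stored lifetime runs out.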
What is an IP address?

2004-12-22: an IP address (Internet Protocol address) is a unique identifier that
distinguishes one device from any other on a TCP/IP-based computer network, such as
the Internet. The IP address provides enough information to route data to that specific
computer from any other computer on the network. In the case of the Internet, this
enables you to communicate with web servers, instant messaging servers and other
computers all over the world.
IP addresses are usually not entered directly by end users. Instead, DNS servers are used
to map permanent and user-friendly names like boutell.com to unfriendly and
impermanent IP addresses, such as 64.246.52.10.

An IP address is made up of four numbers, each between 0 and 255. For instance, as of
this writing, the IP address of boutell.com is:
64.246.52.10

The most general information is conveyed by the first number, and the specific
identification of a single computer within a single network is usually made by the last
number. In general, delegation of responsibility for various portions of the IP address
space is carried out by the Asia Pacific Network Information Centre (APNIC), the
American Registry for Internet Numbers (ARIN), the Latin-American And Caribbean
Internet Addresses Registry (LACNIC), and the RIPE Network Coordination Centre
(RIPE NCC).
The above description applies to IPv4, the most commonly used version of the IP
protocol that underlies the Internet and similar networks. A newer system, IPv6,
addresses the fact that the number of IPv4 addresses is limited to approximately four
billion (256 to the fourth power), with the practical maximum considerably lower than
that due to the ways in which addresses are assigned. When much of the Earth's
population begins to use the Internet from a variety of devices, this limitation becomes a
serious problem. IPv6 addresses have a vastly greater range, inexhaustible for all
practical purposes.
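The arithmetic behind the "four billion" figure is easy to check. In this Python sketch, the helper dotted_quad_to_int is my own illustrative name; it treats the four numbers of an IPv4 address as digits in base 256, which is one common way to picture how they combine into a single value.

```python
import ipaddress


def dotted_quad_to_int(address):
    """Combine the four 0-255 numbers of an IPv4 address into one integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly four numbers")
    value = 0
    for part in parts:
        number = int(part)
        if not 0 <= number <= 255:
            raise ValueError("each number must be between 0 and 255")
        value = value * 256 + number
    return value


print(256 ** 4)  # size of the IPv4 address space
print(dotted_quad_to_int("64.246.52.10"))
# The standard library arrives at the same integer:
print(int(ipaddress.IPv4Address("64.246.52.10")))
```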
You will not always have the same IP address, unless you have specifically arranged for a
fixed IP address, typically from a cable modem, DSL or other high-speed provider.
Therefore, your IP address usually does not uniquely identify you as an individual. When
you dial into your Internet service provider with your modem, an IP address is
temporarily assigned to your computer for the duration of the call. Even web servers such
as boutell.com will typically change their IP address when they move from one hosting
facility to another; DNS servers make this transparent for the end user by automatically
translating domain names to IP addresses. With the exception of the "root" DNS servers,
which are used to resolve the IP addresses of all other DNS servers, all IP addresses are
subject to potential change.
Those who use the Internet at work, or who have a connection-sharing router at home, do
not truly have an Internet IP address for their individual computer. Instead, the
connection-sharing router holds the Internet IP address, carries out the requests made by
the various personal computers "behind" the router, and appears to the rest of the Internet
to be a single, very busy computer. The personal computers "behind" the router have IP
addresses on an intranet. Such IP addresses typically resemble 192.168.2.2 or
10.1.1.7, because the prefixes 192.168. and 10. are universally reserved for such
private networks and are guaranteed never to be assigned to systems on the Internet.
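Python's standard ipaddress module already knows about these reserved ranges, so a quick check of the example addresses might look like the sketch below. The helper name describe is invented for the illustration.

```python
import ipaddress


def describe(text):
    """Say whether an address belongs to a reserved private range."""
    address = ipaddress.ip_address(text)
    return "private network" if address.is_private else "public Internet"


for text in ("192.168.2.2", "10.1.1.7", "64.246.52.10"):
    print(text, "->", describe(text))
```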
This lack of a true Internet IP address for each personal computer can be a very good
thing, because it prevents incoming connections to individual PCs, providing some
protection from certain types of attacks. Unfortunately, there are many other ways for
computers to become infected by viruses, spyware and similar software. For more
information, see can my computer catch a virus from a web page? and why is my web
browser broken?
Such setups can also have a downside: if you wish to run a server on one of the
computers behind the connection-sharing router, you must explicitly configure your
router to forward connections on certain ports to that particular computer.
What is a static IP address?

2006-11-26: A static IP address is a TCP/IP protocol address that does not change.
If your ISP provides static IP address service, you can expect your IP address to stay the
same even if you disconnect from the Internet and reconnect to it later.
Contrary to popular belief, you do not need a static IP address in order to run a server on
your own computer. Dynamic DNS services provide a way to give your computer an
Internet hostname that does not change, such as example.is-a-geek.com. These services
automatically update the IP address that corresponds to your hostname every time your IP
address changes. For more information about dynamic DNS services, see my article how
do I get a hostname for my own computer?
Static IP addresses are required only for those who intend to run their own DNS servers.
In this case, at least two static IP addresses are required. You will still probably want to
pay for another company with a more reliable connection to run your DNS, as this is very
inexpensive. See the article how do I host a real domain name at home? for more
information.
See also what is TCP/IP, what is a protocol, what is a static IP address, what is an IP
address, should I host my own web server at home, how do I host my own web server at
What is Flash?

2006-04-19: similar to Java, Macromedia Flash is another technology that allows
animations, interactive forms, games and other jazzed-up features to be embedded in web
pages. Macromedia Flash Player is a well-known and trustworthy plug-in that users
should feel comfortable installing. In fact, Flash is the most commonly installed plug-in
on the web, more common than QuickTime, RealPlayer or Java. The Flash plug-in can be
found on Macromedia's website.
What is JavaScript?

2006-04-19: JavaScript is a simple programming language used to make web pages more
interactive. Once known as LiveScript, JavaScript's name was changed as part of a
marketing deal between Netscape and Sun. People talk about Java and JavaScript as if
they were interchangeable, but they are completely different things. You do not need a
Java runtime environment in order to use a JavaScript-enabled web page. See What is
Java?
JavaScript code was invented to validate form fields before a form is submitted, saving
the user the trouble of waiting to hear back from the web server if the problem is a simple
one, like a missing digit in a 10-digit phone number.
Today JavaScript can also be used together with the Document Object Model (DOM) to
create powerful web applications like Google's GMail service. GMail lets the user browse
through their email without constantly waiting for the web server every time they click.
JavaScript is now able to do many jobs that formerly required Java or Flash. Since
JavaScript doesn't require plugins, I recommend JavaScript instead of Flash or Java
wherever possible, with a fallback plain-HTML option for users with older browsers or
handheld devices.
JavaScript code is inserted into a page using the <script> element, like this:

<script>
alert("This displays a message box to the user");
</script>
JavaScript code also appears in "event handlers," special attributes of other HTML
elements that trigger JavaScript code. One example of an event: when the mouse moves
over an image, any JavaScript code provided in an onMouseOver attribute is executed.
To learn more, see the w3schools JavaScript tutorial.
What is Java?

2007-08-08: Java is a technology that allows software designed and written just once for
an idealized "virtual machine" to run on a variety of real computers, including Windows
PCs, Macintoshes, and Unix computers. On the web, Java is quite popular on web
servers, used "under the hood" by many of the largest interactive websites. Here it serves
the same role that PHP, ASP or Perl might, although traditionally Java has been used for
larger-scale projects.
"I just need to know: should I update Java? Can I get rid of Java?"
You need to do one of the two: keep it updated or get rid of it. Either is fine for most
users. Yes, it is safe to update Java. See my article should I update Java? for complete
answers to these questions.
Java can also be used to create small programs, known as "applets," to be embedded in
web pages. For instance, a web page using Java could contain an interactive weather
map, a live display of subway trains, or a video game, without the need for the web server
to do all of the work. Unlike normal software such as .EXE files, these "applets" cannot
access or delete your personal files unless they ask for and are given express permission
to do so. In the real world, users hardly ever give permission for this, so applets generally
don't ask.
As of this writing, Java is usually (though not always) included as standard equipment on
Windows PCs. If you choose to use Java applets on your site, you can invite your users to
download the Java plug-in from Sun's website, using the "Get It Now" button on that site.
If you're running Java on your server, browser users don't need to have Java runtime
environments, just as users don't need to have PHP or ASP on their home computers to
access websites that use them. At the end of the day, what is delivered to the web browser
is plain old HTML! You only need to worry about the user's Java runtime environment if
you choose to use Java Applets in your pages. This is the only time when Java code must
run on the user's computer. Java is not the only way to embed applet-like capabilities in a
web page these days. It's not the most popular or widely supported, either. As an
alternative to applets, see the JavaScript and Flash entries.
JavaScript and Java are completely different things. JavaScript used to be called
LiveScript. The similar names were a marketing decision made by Sun and Netscape
many years ago. You do not need a Java runtime environment simply to include
JavaScript in a web page.
Java Examples
Java can be used on PCs for both applets (interactive features inside web pages) and
stand-alone applications (non-web programs like Notepad or Excel; these are not
written in Java, they are just examples of applications).
Java applets have been almost completely replaced by Flash, but there are occasional jobs
for which Flash is ill-suited. An example is my own Fracster Mandelbrot set explorer,
which lets the user explore an interesting mathematical function in a graphical way.
While not impossible in Flash, this sort of mathematically intense, pixel-by-pixel display
is better done in Java.
There are also many older applet games, such as Atari's official Asteroids applet, that
simply haven't been rewritten in Flash.
The Azureus file-sharing application is a good example of a popular stand-alone
application written in Java.
What is a secure site?

2006-09-11: a site that uses the HTTPS protocol to ensure that your information cannot
be stolen by a third party between the sender and the receiver. For a detailed discussion of
how HTTPS works, see what is HTTPS? and what is an SSL certificate? See also is it
safe to shop online?
What is "caching?"

2004-04-29: Caching refers to the strategy of keeping a copy of a page or image you have
already seen; web browsers typically cache files that they display for you, and simply ask
the server if the page has actually changed rather than always downloading the entire
thing. This speeds up your next visit to the page.
Since caching everything forever would take up too much space, web browsers typically
delete the least recently used file in the cache when a certain total cache size is reached.
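That "least recently used" policy can be sketched in Python with an ordered dictionary. This is a minimal illustration, not how any particular browser does it; the class name LruCache, the tiny size limit and the file names are all invented.

```python
from collections import OrderedDict


class LruCache:
    """Keeps cached files until their total size passes a limit, then
    throws out the least recently used ones first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self.files = OrderedDict()  # URL -> content, least recently used first

    def get(self, url):
        if url not in self.files:
            return None
        self.files.move_to_end(url)  # mark as most recently used
        return self.files[url]

    def put(self, url, content):
        if url in self.files:
            self.total_bytes -= len(self.files.pop(url))
        self.files[url] = content
        self.total_bytes += len(content)
        while self.total_bytes > self.max_bytes and self.files:
            _, evicted = self.files.popitem(last=False)  # oldest entry
            self.total_bytes -= len(evicted)


cache = LruCache(max_bytes=10)
cache.put("/a.html", b"aaaa")
cache.put("/b.png", b"bbbb")
cache.get("/a.html")           # /a.html is now more recently used than /b.png
cache.put("/c.css", b"cccc")   # 12 bytes is over the limit, so /b.png goes
print(list(cache.files))
```

After the last line, only /a.html and /c.css remain in the cache.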
Caching also occurs in other places. You may be using a proxy server, in which case the
proxy server is probably caching pages on behalf of you and other users to save trips to
the real Internet.
Users typically become aware of caching when things don't work as expected. For
instance, you might make a change to your own web page, open up your web page in
your web browser, and not see the change until you click the "reload" button, telling your
browser to discard the cached copy of that page.
Of course, some things, such as credit card transactions, should not be cached.
Fortunately, the HTTP protocol that web browsers and servers use to communicate
includes ways for the web server to specify how long a page may be safely cached, if at
all. But sometimes browsers do not perfectly obey such directives. The problem is
made worse by the tendency of websites built in PHP, ASP or other dynamic web
programming languages to tell the web browser not to cache anything. This problem is
not inherent to those languages, but it is a common result of poorly-thought-out site
design.
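One of the ways a server can say how long a page may be cached is the Cache-Control response header. The Python sketch below shows roughly how a cache might read it. The function name cache_lifetime is invented, the headers are assumed to arrive as a plain dictionary, and only three directives are handled; a real cache follows many more rules.

```python
def cache_lifetime(headers):
    """Return how many seconds a response may be cached: 0 means "do not
    reuse a stored copy", None means the server gave no explicit instruction."""
    value = headers.get("Cache-Control", "")
    directives = [part.strip().lower() for part in value.split(",") if part.strip()]
    # Simplification: "no-cache" strictly means "check with the server before
    # reusing a copy"; it is treated here as a lifetime of zero.
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):
            return int(directive[len("max-age="):])
    return None


print(cache_lifetime({"Cache-Control": "max-age=3600"}))
print(cache_lifetime({"Cache-Control": "no-store"}))
print(cache_lifetime({}))
```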
Caching can potentially be a privacy issue for those who share their computers; cached
copies of pages on your hard drive can reveal information about your browsing habits.
What is a proxy server?

2004-04-29: proxy servers are specialized web servers that allow web browsers to receive
web pages from web servers without communicating with them directly. Proxy servers
are often used to provide more secure web access in organizations; the proxy server is
allowed to connect to the Internet, but the individual web browsers are only allowed to
"talk" to the proxy server. When there are many users sharing a single proxy server, the
proxy server can also speed up web browsing by caching popular pages.
The HTTP protocol used by web browsers and web servers contains provisions for proxy
servers. In addition, most major Internet Service Providers (ISPs) now run "transparent"
proxy servers without your browser being directly aware of it. This is done by
intercepting Internet packets that are recognized to be part of the HTTP protocol and
silently redirecting them to the proxy server rather than sending them directly to the
intended web server. When an ISP such as America Online has many customers, this can
result in a significant speed increase, because pages can be cached "closer" to the users. It
also provides an opportunity to work around slow modem speeds; the proxy server can
convert large image files to a more compact format, at a considerable cost in quality, and
send those lower-quality images on much more quickly to web browsers that request the
original images.
What are the top-level domains?

2004-05-15: "top-level domains" (TLDs) are the last part of every domain name. In other
words, the top-level domains are .com, .org, .edu, .uk, .net, and so on.
There are two types of top-level domains: two-letter country domains, such as .uk
(United Kingdom), and generic domains of three or more letters, such as .com, .org,
and .net. National domain names follow the ISO 3166 standard two-letter codes for each
country. The International Organization for Standardization (ISO) adds new two-letter
codes to the ISO 3166 list when the United Nations publishes an updated bulletin of
country and region codes. You can learn more about this on the ISO website.
Once a two-letter code has been assigned, the Internet Assigned Numbers Authority then
identifies the responsible authority within that country that should be permitted to register
subdomains within that country's domain. Some national domains, such as .tv (Tuvalu, a
small island in the Pacific), have become available for commercial registration.
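Picking the top-level domain out of a name is simple enough to show in a couple of lines of Python. The function names here are my own, and the two-letter test relies only on the rule described above, that country domains are the two-letter ones.

```python
def top_level_domain(hostname):
    """Return the last dot-separated part of a domain name, without the dot."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def is_country_code(tld):
    # Two-letter top-level domains are the country domains.
    return len(tld) == 2 and tld.isalpha()


for name in ("www.boutell.com", "example.co.uk"):
    tld = top_level_domain(name)
    print(name, "->", "." + tld, "(country)" if is_country_code(tld) else "(generic)")
```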
"Generic" domains, such as .com, .org, .edu and .net, are created and overseen by the
ICANN (Internet Corporation for Assigned Names and Numbers). The original generic
domains were .com, .edu, .gov, .int, .mil, .net, and .org. Additional top-level domains
added in recent years are .biz, .info, .name, .pro, .aero, .coop, and .museum.
Any entity can register a domain in .com, .net, .biz, .info, and .org, although .org is
typically used by nonprofit organizations and .net is typically used by Internet Service
Providers. .com is what most people assume when they can't remember the name of your
site, so it is the preferred choice for businesses of all kinds. The .edu domain is reserved
for accredited four-year academic institutions, and registration is handled solely by
EDUCAUSE. .aero is reserved for the international aviation community, .coop is reserved for
cooperative businesses, .museum is reserved for museums, .name is reserved for
individuals, and .pro is reserved for "licensed professionals," such as lawyers, doctors and
accountants.
For more information about each of the generic domains and an extensive list of
registrars available, see the ICANN accredited registrars page.
What is a search engine?

2004-08-02: since no one is in charge of the Web as a whole, there is a business
opportunity for anyone to create an index of its contents and an interface for searching
that index. Such interfaces are known as search engines. Typically the user will type in a
few words that relate to what he or she is looking for and click a search button, at which
point the search engine will present links to web pages which are, hopefully, relevant to
that search.
While some early indexes of the web were created by hand, modern search engines rely
on automated exploring, or "spidering," of the web by specialized programs that behave
somewhat like web browsers but do not require a human operator.
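The heart of such an index can be sketched in a few lines of Python. This is a toy: the pages are invented stand-ins for what a spider might have fetched, the function names are mine, and a real search engine adds ranking, word stemming and a great deal more.

```python
import re
from collections import defaultdict

# Hypothetical pages that a spider might have fetched; URLs and text are invented.
PAGES = {
    "http://example.com/cats": "Cats are small domestic animals",
    "http://example.com/dogs": "Dogs are loyal domestic animals",
    "http://example.com/cars": "Cars are fast",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(PAGES)
print(search(index, "domestic animals"))
```

Searching for "domestic animals" finds the cats and dogs pages but not the cars page, because only the first two contain both words.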
As of this writing, Google remains the most popular search engine by a large margin.
MSN Search is also a significant player.
What is World Wide Web accessibility?

2004-08-26: a site that is easily used by individuals with disabilities, especially blindness
and vision impairment, is known as an "accessible" site. There are at least four good
reasons to design your pages to accommodate such users:
• Because it's the right thing to do.

• Because you will reach more customers.

• Because search engines experience the web much as blind users do: through text.
If it's not there in easily accessible text, it's not helping anyone search for your
site.

• Because users of new web browsing devices, such as wireless handhelds and cell
phones, experience the web the way blind users do. (They may be able to see
some images, but only with difficulty and frustration.)
For tips on how to achieve accessibility, see how do I make my website more accessible?
What are HTML templates?

2004-09-02: HTML templates are web pages, or portions of web pages, which can be
reused many times to create numerous pages with similar content. See how can I create a
template for all of my pages?
What is a blog?

2004-09-14: a web page that presents short journal entries in chronological order, newest
first, is typically referred to as a "blog" or "weblog." Most blogs emphasize links to other
pages and sites, and most entries are short commentaries or even simple one-sentence
links to an interesting page somewhere else. Many blogs are concerned with current news
events and often provide unabashedly partisan commentary. "Blogging," of course, is the
act of writing a blog; those who write blogs are sometimes referred to as "bloggers."
The term "weblog" was apparently coined by Jorn Barger in 1997. The term was
contracted to "blog" in 1999 by Peter Merholz.
For more information about blogging, see "how
What is XML?

2004-09-14: XML, the Extensible Markup Language, is a general-purpose markup
language for all applications that manipulate text. XML is derived from an older standard
known as SGML. XHTML (which supersedes HTML) is one example of a specific
markup language which complies with the rules of XML. RSS is another such
example. While XHTML is the best-known example, XML can be used to represent
almost any kind of information. The existence of a standard markup language makes it
possible for anyone to write software that can successfully extract specific information
from any valid XML document. See the W3C website for further information
and the complete XML specification.
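As a small illustration of extracting specific information from an XML document, here is a Python sketch using the standard library's ElementTree parser. The document, its element names and the function titles_after are all made up for the example.

```python
import xml.etree.ElementTree as ET

# A made-up document; the element and attribute names are purely illustrative.
DOCUMENT = """<?xml version="1.0"?>
<library>
  <book year="1999"><title>First Book</title></book>
  <book year="2004"><title>Second Book</title></book>
</library>"""


def titles_after(xml_text, year):
    """Return the titles of all books published after the given year."""
    root = ET.fromstring(xml_text)
    return [
        book.findtext("title")
        for book in root.findall("book")
        if int(book.get("year")) > year
    ]


print(titles_after(DOCUMENT, 2000))
```

The same parser would work on any other well-formed XML document; only the element names being looked for would change.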
What is RSS?
2004-09-14: RSS is an HTML-like, XML-compliant format for blogs. RSS is usually said
to stand for "Really Simple Syndication." RSS was originally invented by Netscape
Communications Corporation as a format for "channels," a feature of the My Netscape
web browser. While channels did not take the world by storm, the format became the
basis for a good idea: by publishing an RSS "feed" and giving the world permission to
reproduce it, anyone can contribute to a virtual "newswire" service. RSS aggregators can
then bring the latest stories from many blogs together in chronological order.
While blogging appears to take control of formatting and presentation away from the
author, the reality is that blog entries (or "items") are typically short summaries or
"teasers" associated with a link to the author's website or another site relevant to the story
in question. In this way, RSS feeds help to bring new readers to many websites.
Despite the fundamental simplicity of the idea, the RSS "industry" is crowded with
competing standards and conflicting histories. Important RSS "standards" in use today
include:
1. RSS 0.91. The original Netscape channels specification. Generated by blosxom and
other tools. Very simple and direct. Entries can contain HTML elements for formatting
and additional links.
2. RSS 1.0. Standardized by the RSS-DEV working group. A very complete standard,
including namespaces, extension mechanisms, and various things perhaps lacking in RSS
0.91. Despite the name, this is NOT related to RSS 2.0, and indeed it is not a superset of
RSS 0.91.
3. RSS 2.0. Published by the Berkman Center at Harvard Law. A much simpler standard
completely unrelated to RSS 1.0, RSS 2.0 attempts to maintain the spirit of RSS 0.91
while filling in gaps.
Perhaps at some future date a single RSS standard will emerge as the preferred format. In
the meantime, however, the major syndication services accept well-formed and not-so-
well-formed blogs in all of the above formats, and more. You may choose any of the
above, with good results. I presently use both RSS 0.91, for Innards, and RSS 1.0, for the
RSS feed of the WWW FAQ.
For a particularly thorough effort to make sense of the history of RSS, see Ronan Waide's
RSS presentation notes.
What is an RSS aggregator?

2004-09-14: an RSS aggregator, or reader, is a program which fetches many RSS feeds
and intermixes stories, typically in chronological order. Aggregators can be implemented
as part of a website, like the O'Reilly Developer Weblogs page or the syndicated accounts
feature of LiveJournal, and this is probably the approach that brings RSS content to the
largest audience. There are also excellent RSS reader applications for individuals;
hebig.org offers an impressive, if slightly dated, list of these.
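The core of an aggregator fits in a short Python sketch. It assumes feeds laid out in the RSS 2.0 style, with items inside a channel element and a pubDate on each item; RSS 1.0 feeds use XML namespaces and would need different lookups. The two feeds, their URLs and the function names are invented for the illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two invented feeds in the RSS 2.0 style.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>A1</title><link>http://a.example/1</link>
<pubDate>Tue, 14 Sep 2004 09:00:00 GMT</pubDate></item>
<item><title>A2</title><link>http://a.example/2</link>
<pubDate>Tue, 14 Sep 2004 12:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>B1</title><link>http://b.example/1</link>
<pubDate>Tue, 14 Sep 2004 10:30:00 GMT</pubDate></item>
</channel></rss>"""


def read_items(feed_xml):
    """Yield one dictionary per item in a feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    for item in channel.findall("item"):
        yield {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }


def aggregate(feeds):
    """Intermix the items of several feeds, newest first."""
    items = [item for feed in feeds for item in read_items(feed)]
    return sorted(items, key=lambda item: item["published"], reverse=True)


for item in aggregate([FEED_A, FEED_B]):
    print(item["published"], item["source"], item["title"])
```

The stories come out as A2, B1, A1: the items of the two blogs are interleaved by publication time rather than grouped by source.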
What is Gopher?
2004-10-12: Gopher is an older distributed information retrieval system, similar to but
much simpler than the World Wide Web as we know it. Gopher did not offer a way to
create free-form hypertext documents similar to HTML, and its growth was also stunted
by attempts to limit the technology to paying customers only. Gopher did offer a very
structured and useful approach to retrieving information and searching across many
Gopher sites.
Technically, the World Wide Web includes Gopher. Part of Tim Berners-Lee and Robert
Cailliau's vision for the Web was to incorporate existing technologies for sharing
information via the Internet by allowing links to Gopher sites, via gopher:// URLs.

Web browsers supported the Gopher protocol for several years. However, support for
Gopher in Microsoft Internet Explorer ended in 2002 and support in other browsers is
moribund. Very few Gopher servers survive today. For more information, see the
Wikipedia entry on Gopher.
What are MIME types?

2004-10-19: similar to file extensions but more universally accepted, "MIME types" are
used to identify the type of information that a file contains. While the file extension
.html is informally understood to mean that the file is an HTML page, there is no
requirement that it mean this, and many HTML pages have different file extensions.
In the HTTP protocol used by web browsers to talk to web servers, the "file extension" of
the URL is not used to determine the type of information that the server will return.
Indeed, there may be no file extension at all at the end of the URL.
Instead, the web server specifies the correct MIME type using a Content-type: header
when it responds to the web browser's HTTP request.
Here are some examples of common MIME types seen on the web:

Type                      Common File Extension  Purpose
text/html                 .html                  Web Page
image/png                 .png                   PNG-format image
image/jpeg                .jpeg                  JPEG-format image
audio/mpeg                .mp3                   MPEG Audio File
application/octet-stream  .exe                   Best for downloads that should just be saved to disk
The Internet Assigned Numbers Authority website offers a complete listing of the official
IANA-registered MIME types.
MIME stands for "Multipurpose Internet Mail Extensions." MIME was originally invented
to solve a similar problem for email attachments.
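
To make the idea concrete, here is a small Python sketch of how a web server might pick the Content-Type header for a file it is about to send. The lookup table simply repeats the examples listed above, and the function name content_type_header is invented for this illustration; real servers use much longer tables and are often configured by hand.

```python
import os

# Assumption: a tiny table copied from the examples above, not a complete list.
MIME_BY_EXTENSION = {
    ".html": "text/html",
    ".png": "image/png",
    ".jpeg": "image/jpeg",
    ".mp3": "audio/mpeg",
    ".exe": "application/octet-stream",
}


def content_type_header(path):
    """Build the Content-Type header line a server could send for this file."""
    extension = os.path.splitext(path)[1].lower()
    # Unknown or missing extensions fall back to the "just save it" type.
    mime_type = MIME_BY_EXTENSION.get(extension, "application/octet-stream")
    return "Content-Type: " + mime_type


print(content_type_header("index.html"))
print(content_type_header("README"))
```

Note that the browser never sees this table. It only sees the header, which is why the URL itself does not need a file extension.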

What is phishing?

2004-12-09: "phishing" is the act of sending out email messages that are more or less
exact copies of legitimate HTML emails that well-known companies such as Amazon
send out. Exactly the same in every way... except that the actual site to be reached by
clicking on the link in the email is the site of the criminals doing the "phishing." That site
then makes every effort to look an awful lot like Amazon, and the uninformed fish will
bite, typing in their Amazon username and password, credit card number or other
requested information when asked to do so.
The best way to avoid phishing: don't click on links in email messages! Go to the site in
question yourself, by using one of your favorites or bookmarks or by typing in the site
name in the location bar at the top of your browser window. Also be sure to heed any
warnings about specific phishing scams in progress that may be mentioned on the home
pages of your bank, Amazon, eBay, and other frequent phishing targets.
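
For the technically curious, the trick comes down to the host name inside the link. The Python sketch below checks whether a link really leads to the site it claims to represent. The function name link_goes_to and the sample addresses are assumptions made up for this example, and a check like this is only an illustration, not a complete defense against phishing.

```python
from urllib.parse import urlparse


def link_goes_to(url, expected_domain):
    """Return True if the link's host is the expected domain or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    expected = expected_domain.lower()
    return host == expected or host.endswith("." + expected)


# A genuine-looking link and a look-alike whose real host is example.net.
print(link_goes_to("https://www.amazon.com/gp/sign-in", "amazon.com"))
print(link_goes_to("http://amazon.com.example.net/login", "amazon.com"))
```

The second address begins with a familiar name, but the part that counts is the end of the host name, which belongs to someone else.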
What are bookmarks and favorites?

2005-07-25: "Bookmarks" and "favorites" are two words for the same thing: web
addresses that users have asked the web browser to remember for convenient access in
the future. Microsoft Internet Explorer uses the term "favorite." Most other products,
including the very first web browsers and the modern Firefox browser, use the term
"bookmark." In Microsoft Internet Explorer, you can add a favorite by visiting the page
you want the browser to remember, pulling down the "Favorites" menu and selecting
"Add to Favorites." In Firefox, you can do the same thing by pulling down the
"Bookmarks" menu and selecting "Bookmark This Page."
What is the difference between the World Wide Web and the Internet?

2005-07-28: All of the websites in the world, taken together, make up the World Wide
Web. The Internet is the worldwide network of interconnected computers, including both
web servers and computers like the one on your desk that run web browser software. The
Internet also carries other kinds of network traffic unrelated to the web.
Let's put it even more simply:
The Internet is the actual network. The World Wide Web is something you can do with it.
You can do other things with it, too. Playing Quake or sending email both use the Internet
but are not the World Wide Web.
What is AJAX?
2005-11-21: AJAX, or Asynchronous JavaScript and XML, is a new way for web pages
to send and receive data to and from a web site without forcing the user to wait for a new
page to load.
True AJAX applications, like Google's Gmail or my live support example, have certain
things in common:

• They can use a standard web browser, such as Firefox, Internet Explorer or Safari,
as their only user interface.

• They don't force the user to wait for the web server every time the user clicks a
button. This is what "asynchronous" means. For instance, Gmail fetches new email
messages in the background ("asynchronously") without forcing the user to wait.
This makes an AJAX application respond much more like a "real" application on
the user's computer, such as Microsoft Outlook.

• They use standard JavaScript features (including the unofficial
XMLHttpRequest standard, pioneered by Microsoft and adopted by Firefox and
other browsers) to fetch data in the background and display different email
messages or other data "on the fly" when the user clicks on appropriate parts of
the interface.

• Usually they manipulate data in XML format. This allows AJAX applications to
interact easily with server-side code written in a variety of languages, such as
PHP, Perl/CGI, Python and ASP.NET. Using XML isn't absolutely necessary, and
in fact many "AJAX" applications don't -- they use the XMLHttpRequest object
to send and receive data "on the fly," but they don't actually bother packaging that
data as XML.
For a complete introduction to AJAX programming, see How do I fetch data from the
server without loading a new page? Most AJAX applications use XML only for data
coming from the server to the browser, and sometimes not even then. XMLHttpRequest
can accommodate other data formats just as easily, and parsing is often faster when XML
is not used. For communications from the browser to the server, submitting the data like a
conventional POST form submission is the most common method. Built-in features of
server-side languages like ASP and PHP can easily understand data that arrives that way
and respond with XML, or another format that the JavaScript client has been designed to
understand.
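
The server side of that exchange can be sketched in a few lines of Python, one of the server-side languages mentioned above. This is only an illustration under stated assumptions: the function name handle_ajax_post, the form field "name" and the reply and greeting elements are all invented for the example, and a real application would run inside a web server or framework rather than being called directly.

```python
from urllib.parse import parse_qs
import xml.etree.ElementTree as ET


def handle_ajax_post(body):
    """Parse a form-style POST body and answer with a small XML document."""
    # The browser sends data exactly like a normal form: key=value&key=value
    fields = parse_qs(body)
    name = fields.get("name", ["stranger"])[0]

    # Build the XML reply that the JavaScript in the page would then read.
    reply = ET.Element("reply")
    ET.SubElement(reply, "greeting").text = "Hello, " + name
    return ET.tostring(reply, encoding="unicode")


print(handle_ajax_post("name=Ada&topic=rss"))
```

The same function could just as easily return plain text or another format, as long as the JavaScript in the page expects it.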
What does .com stand for?

2006-02-12: The .com TLD (top-level domain name) stands for COMmercial. Domains
in .com are typically registered by for-profit companies, but anyone may register a
domain in .com.

See what are the top level domains? and what is a domain name?
What is the difference between a web browser and a web server?

2006-06-01: A web browser is what you're probably looking at right now: a program on
your computer that shows you stuff that's on the web. A web server is a program on a
server computer, somewhere out on the Internet, that delivers web pages to web
browsers.
The term web server also refers to an actual, physical computer that is running web server
software.
For more information, see what is a web browser? and what is a web server?
What is a protocol?

2006-08-07: On the Internet, the word "protocol" refers to a set of rules for
communicating. Two programs or computers that follow the same rules are able to
exchange information, even if they don't run the same operating system and are not made
by the same company.
Sometimes protocols are "layered" on top of other protocols, taking advantage of what's
already there and adding additional capabilities.
Examples of Internet protocols include the HTTP protocol spoken by web browsers and
web servers, the FTP protocol for transferring files, and the TCP/IP protocols on which
both of these are based.
What does IP stand for?

2007-01-02: "IP" stands for Internet Protocol. This is why we refer to a computer's
numeric address on the Internet as an "IP address."
For a more complete explanation, see my articles what is TCP/IP? and what is an IP
address?
"IP" also stands for "Intellectual Property": a book, a web page, an image, a movie, an
idea, or anything else that might conceivably be covered by copyright or patent law.
What is TCP/IP?

2006-08-07: TCP/IP (Transmission Control Protocol / Internet Protocol) is the protocol -
the set of rules for communicating - that underlies all communications on the Internet.
The HTTP protocol spoken by web browsers and web servers is layered on top of TCP/IP.
There are several sub-protocols within TCP/IP:
1. Internet Protocol (IP), which covers fundamentals like IP addresses and routing of
packets of data from one place to another, but doesn't address issues like reliability and
delivery in the correct order.
2. Transmission Control Protocol (TCP), which adds the idea of a reliable connection that
always delivers a stream of data in the correct order. Telephone modems, Ethernet
networks and other physical connections used on the Internet might not be 100% reliable,
and some types of connections don't guarantee that the second packet won't arrive before
the first one. TCP provides rules for checking the order of the data and for resending
anything that is not received. This is the protocol that HTTP, FTP and most other Internet
protocols you are familiar with are built on top of.
3. User Datagram Protocol (UDP) is a simple wrapper around the basic features of
Internet Protocol (IP). UDP is useful when you don't care about reliability or in-order
delivery, and you can't afford the extra time that TCP takes to ensure them. When you
browse the World Wide Web, you are using the DNS protocol to look up the names of
websites. DNS is layered on top of UDP. Online gaming is another popular application of
UDP.
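
The ordering and resending rules described for TCP can be pictured with a toy Python model. This is not how real TCP is implemented (it numbers bytes rather than whole packets, and uses acknowledgements and timers); the function name reassemble and the idea of numbering packets from zero are simplifying assumptions made for this sketch.

```python
def reassemble(packets, total):
    """Put numbered packets back in order and report any that never arrived.

    packets: list of (sequence_number, payload_bytes), in arrival order
    total:   how many packets the sender said it would send
    """
    received = {}
    for sequence_number, payload in packets:
        received[sequence_number] = payload  # a duplicate simply overwrites

    missing = [n for n in range(total) if n not in received]
    if missing:
        # A real TCP stack would arrange for these to be sent again.
        return None, missing

    data = b"".join(received[n] for n in range(total))
    return data, []


# The second packet arrives before the first, as the text warns it might.
print(reassemble([(1, b"lo"), (0, b"hel")], 2))
# Packet 1 is lost entirely, so the receiver knows what to ask for again.
print(reassemble([(2, b"c"), (0, b"a")], 3))
```

UDP skips all of this bookkeeping, which is exactly why it is faster and why the application has to tolerate gaps.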
What is an SSL certificate?

2006-09-11: An SSL certificate is a means by which web servers prove their identity to
web browsers, allowing a secure site to communicate privately with the web browser via
the HTTPS protocol.
An SSL certificate is digitally "signed" by a certificate authority, such as GoDaddy or
Thawte, that web browsers already trust. This allows the web browser to verify the
identity of a secure site before sending private personal information, such as bank
account or credit card numbers. Webmasters can purchase certificates from the certificate
authorities, which verify the webmaster's identity to varying degrees.
For a detailed explanation of certificates and how they actually work, see what is SSL?
For help deciding which SSL certificate to buy, see my articles which SSL certificate
should I buy? and can I set up a secure site without buying an SSL certificate?
What does 404 Not Found mean?

2006-11-06: 404 Not Found is the HTTP status code produced by a web server when the
page or file you are trying to access does not exist. If you try to access, for instance,
http://www.example.com/xyzabc, you will get a 404 Not Found error, unless the
webmaster has deliberately set up the web server to redirect you to another page instead.
For a complete list of standard HTTP status codes, see the W3 Consortium's website.
What does 401 Unauthorized mean?

2007-05-16: 401 Unauthorized (sometimes mislabeled as 401 Forbidden) is the HTTP
status code produced by a web server when you don't have the right credentials to access
the page or file you have asked for. The web server sends your browser the 401
Unauthorized response when you access a password-protected page without presenting a
password. Normally the web browser automatically recognizes this situation and displays
a password prompt at this point. However, if you don't know the correct username and
password and click "Cancel" rather than trying again, the browser may show you the 401
Unauthorized error message directly.
For a complete list of standard HTTP status codes, see the W3 Consortium's website.
What does 403 Forbidden mean?

2007-05-16: 403 Forbidden is the HTTP status code produced by a web server when you
are not permitted to access a particular URL. Usually a 403 Forbidden error means that
the page in question does exist but cannot be accessed by you.
Some websites are locked down so that only those on the local company or school
network can access parts of the site. You will often see 403 Forbidden errors when
browsing such sites from "off-campus."
Sometimes webmasters try to set up dynamic web programming features like PHP or
Perl/CGI but fail to do so correctly. This can also result in 403 Forbidden errors until the
web server is correctly configured.
403 Forbidden can appear in other situations at the discretion of the webmaster, so you
may see it in scenarios other than these.
For a complete list of standard HTTP status codes, see the W3 Consortium's website.
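
The difference between 404, 401 and 403 can be summed up in a short Python sketch. The decision order and the function names choose_status and status_line are assumptions made for this illustration; real web servers make these decisions through their configuration, and webmasters can override them. The status numbers and phrases themselves come from Python's standard http module.

```python
from http import HTTPStatus


def choose_status(exists, needs_password, has_password, allowed):
    """Pick a status code the way a simple, hypothetical server might."""
    if not exists:
        return HTTPStatus.NOT_FOUND      # 404: no such page or file
    if needs_password and not has_password:
        return HTTPStatus.UNAUTHORIZED   # 401: browser should prompt for a password
    if not allowed:
        return HTTPStatus.FORBIDDEN      # 403: the page exists, but not for you
    return HTTPStatus.OK                 # 200: here is the page


def status_line(status):
    """Format the first line of the server's HTTP response."""
    return "HTTP/1.1 %d %s" % (status.value, status.phrase)


print(status_line(choose_status(False, False, False, True)))
print(status_line(choose_status(True, True, False, True)))
print(status_line(choose_status(True, False, False, False)))
```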
What is the difference between a web browser and a search engine?

2007-05-17: Web browsers and search engines both talk to web servers in order to
retrieve web pages. But while a web browser then shows that page directly to a human
being, a search engine does not. Instead, the search engine analyzes the page, looking for
uncommon words and indexing the content so that users can search for the pages they
want.
For more information, see what is a web browser, what is a search engine and what is a
web server.
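
The "indexing" step can be sketched as a toy Python program. It assumes the pages have already been fetched and stripped of HTML, and the names build_index and search, along with the sample pages, are made up for this example. Real search engines also rank results and weigh how uncommon each word is, which this sketch does not attempt.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, word):
    """Return the addresses of all pages containing the word, sorted."""
    return sorted(index.get(word.lower(), set()))


# Made-up pages standing in for text a search engine has already retrieved.
pages = {
    "http://www.example.com/gopher": "Gopher is an older retrieval system",
    "http://www.example.com/rss": "An aggregator fetches many RSS feeds",
    "http://www.example.com/mime": "MIME types identify what a file is",
}
index = build_index(pages)
print(search(index, "gopher"))
print(search(index, "an"))
```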
What is PHP?

2007-06-26: PHP is a popular programming language for extending web pages with
dynamic features. While plain-vanilla HTML can lay out an attractive page and perhaps
present forms for users to enter information, HTML can't actually do anything with the
data that the user enters in the form. This is where web server extension languages like
PHP come in, providing a way to handle form submissions and other user requests by
accessing databases, sending email, generating images on the fly and performing other
actions.
PHP is currently the most popular web server extension language, used by many websites
both large and small. Its popularity is partly due to its free, open-source nature and partly
due to its friendliness and convenience. Tasks such as reading an entire file and
outputting it to the web browser can be accomplished with a single line of PHP code. And
PHP programmers can begin by sprinkling a small amount of code into a page otherwise
made up entirely of HTML— a convenience also available in Microsoft's ASP.NET and
other extension languages.
For more information, visit www.php.net, PHP's home on the web. See also my article
how can I receive form submissions, which provides a quick overview of PHP
programming.
"But what does PHP stand for?"
PHP stands for "PHP Hypertext Preprocessor." Yes, you read that right— "PHP" does
appear in its own expansion. Recursive acronyms like this one are a popular inside joke
in the open source community.
What are ASP and ASP.NET?

2007-12-05: ASP (Active Server Pages) and ASP.NET are server-side dynamic web
programming languages. Webmasters use them to extend their web sites by
communicating with databases, collecting form submissions from users, and generating
content on the fly. ASP and ASP.NET offer capabilities similar to PHP. Unlike PHP, ASP
and ASP.NET are products of Microsoft. You can find Microsoft's official "portal site" for
ASP.NET programmers at www.asp.net.
"How are ASP and ASP.NET different?"
ASP was Microsoft's original server-side web programming language, based on their
earlier Visual BASIC language. ASP.NET is part of Microsoft's new family of ".NET"
programming languages, which are thoroughly object-oriented and substantially different
from what went before. Since Microsoft strongly recommends ASP.NET over ASP, I do
not recommend starting new projects in ASP.
Bear in mind that all server-side dynamic web programming languages are the same as
far as the end user is concerned. That's because what ultimately reaches the web browser
is just HTML anyway. That means you can build your site in PHP (which is available for
free server operating systems like Linux) and reach just as many people as you would
with an ASP.NET site. So use the language that works best for you and your client.
What is an ISP?

2007-12-06: An ISP (Internet Service Provider) is an organization that provides Internet
access to others. Most of the time the term "ISP" refers to companies that provide Internet
access to home users, although a company that provides Internet access to other
companies can also be called an ISP, and not all ISPs are for-profit corporations.
Earthlink, AOL and Comcast are popular ISPs in the United States.
"But who is my ISP?"
Your ISP is the company you pay for Internet access. If you have high-speed access such
as cable modem or DSL, then your ISP is probably your cable company or phone
company, although there are other companies that offer these services too (notably
Earthlink). If you have old-fashioned, slow "dialup" access, you could be with AOL,
Earthlink or any number of smaller ISPs.
If you are the person who pays the bills in your house, you'll already know. If not, just
ask that person.
"But I'm not paying anybody!"
If you have a computer with wireless Internet access (WiFi), there's a possibility that you
are "borrowing" Internet access from a neighbor with an unsecured WiFi access point. In
that case, you are stealing Internet service from someone else, who is acting as your ISP
without their permission. Be aware that this is a criminal act in the United States and
individuals who do it have recently been charged with data trespassing, a federal crime. If
you are poor and wish to obtain cheap Internet access, I suggest that you talk to your
neighbor and offer to pay a portion of their Internet bill in exchange for permission to use
their WiFi connection. While you're at it, teach them how to secure their WiFi correctly
with WPA so that only the two of you have access. A WiFi network that you can sneak
onto is also a WiFi network that others can monitor. That means that someone could be
listening to all of your Internet activities on an unsecured WiFi network (except for SSL-
encrypted connections to secure sites and other encrypted traffic).
