Workbook 1. The Apache Web Server

Table of Contents

1. Webserver Basics
    Discussion
        Web Servers
        Installing the Apache Web Server
        Web Server Layout
        The Document Root: /var/www/html/
        Content Types
        Directories
        Web Server Logging: /var/log/httpd/{access,error}_log
        The Anatomy of a Web Request: the HTTP Protocol (Optional, but Interesting)
        The Hyper Text Markup Language (HTML) (Optional)
    Exercises
        Specification
        Deliverables
        Clean Up
    Questions

2. Apache Configuration
    Discussion
        Apache Configuration: /etc/httpd/conf/httpd.conf
        The Global Section
        The Main Section
        The Answer Book: http://localhost/manual
    Exercises
        Specification
        Deliverables
    Questions

3. Apache Configuration: Containers
    Discussion
        Tailoring Customization to Particular Content: Containers
        Common Container Configuration
        Red Hat Enterprise Linux Default Configuration
        Location Containers: server-status and server-info
    Exercises
        Specification
        Deliverables
    Questions

4. Virtual Hosts
    Discussion
        Virtual Hosts
        IP Based Virtual Hosting
        Name Based Virtual Hosts
    Exercises
        Specification
        Deliverables
    Questions

5. The Squid Proxy Server
    Discussion
        Proxy Servers
        The squid Proxy Server
        Squid Configuration: /etc/squid/squid.conf
        The server’s identity: http_port
        Squid Access Control Lists: acl and http_access
        Configuring Proxies for Web Clients
        Squid Logging: /var/log/squid/access.log
        Finding Out More
    Exercises
        Specification
        Deliverables
        Challenge Exercises
    Questions

Copyright rha230-5.0-1-en-2008-01-21T07:12:18-0500 (c) 2003-2007 Red Hat, Inc. All rights reserved. For use only by a student enrolled in a Red Hat Academy course taught at a Red Hat Academy. Any other use is a violation of U.S. and international copyrights. No part of this publication may be photocopied, duplicated, stored in a retrieval system, or otherwise duplicated whether in electronic or print format without prior written consent of Red Hat, Inc. If you believe Red Hat course materials are being used, copied, or otherwise improperly distributed please email training@redhat.com or phone toll-free (USA) +1 866 626 2994 or +1 (919) 754 3700.

Chapter 1. Webserver Basics

Key Concepts

The web server that ships with Red Hat Enterprise Linux is the Apache webserver.

In general terms, web servers map URL requests onto files within the local filesystem, using the Document Root (/var/www/html/) as the base of the translation.

The web server associates meta-data with requested files, such as content types.

When a client requests a directory instead of a file, Apache serves the file index.html (if it exists), returns a dynamically generated directory listing (if it’s allowed to), or returns an access denied error.

Web servers and web clients communicate using the HTTP protocol.

Often, the information served from a web server is structured using the HTML markup language.

Table 1-1. The Apache Web Server

Packages       httpd (with apr and httpd-suexec dependencies), plus other modules (usually starting mod_), and httpd-manual.
Service        httpd
Daemon         /usr/sbin/httpd
Config Files   /etc/httpd/conf/httpd.conf, /etc/httpd/conf.d/*
Logging        /var/log/httpd/{access,error}_log
Ports          80/tcp (http), 443/tcp (https)

Discussion

Web Servers

This lesson focuses on installing and starting the Apache web server, and publishing information using the default configuration. We also introduce some of the basics of the HTTP protocol and the HTML markup language, for those who are interested.

Installing the Apache Web Server

In Red Hat Enterprise Linux, the Apache web server is easy to install and start in its default configuration, using the conventional trio of commands to install the httpd package and start the httpd service: yum install ...; service ... start; chkconfig ... on.

[root@station ~]# yum install httpd
Dependencies Resolved

=============================================================================
 Package          Arch       Version           Repository       Size
=============================================================================
Installing:
 httpd            i386       2.2.3-6.el5       rha-rhel         1.1 M

Installed: httpd.i386 0:2.2.3-6.el5
Complete!

The httpd service can now be started and "chkconfiged on".

[root@station ~]$ service httpd start
Starting httpd:                                            [  OK  ]
[root@station ~]$ chkconfig httpd on

The availability of the Web Server can be confirmed by using any Web browser to reference http://localhost. The following example uses elinks, but the firefox browser could have been used just as easily.

[root@station ~]$ elinks -dump http://localhost

Red Hat Enterprise Linux Test Page

This page is used to test the proper operation of the Apache HTTP server after it has been installed. If you can read this page, it means that the Apache HTTP server installed at this site is working properly.

Web Server Layout

Once a package is installed, an rpm query listing its files (rpm -ql) always serves as a good introduction to the layout of a new product.

[root@station ~]$ rpm -ql httpd

/etc/httpd

/etc/httpd/conf

/etc/httpd/conf.d

/etc/httpd/conf.d/README

Skimming the output, the following relevant files and directories could be seen.

Table 1-2. Web Server Filesystem Layout

Directory                  Purpose
/etc/httpd/                Configuration files, including /etc/httpd/conf/httpd.conf.
/usr/lib/httpd/modules/    Dynamically loaded modules.
/var/log/httpd/            Log files, including access_log and error_log.
/var/www/html/             The Web Server Document Root (more on this in a moment).

The Document Root: /var/www/html/

The purpose of the Web Server is to serve information. Usually, this involves reading a file from the file system and transferring it to a web browser, which then displays or renders the file.

As an arbitrary example, the file /etc/sysctl.conf can be copied to the document root (/var/www/html) directory. Any web browser referencing http://localhost/sysctl.conf should display the contents of the file just as could be done with the cat command. (Some web browsers may mangle the whitespace within the file, essentially placing the entire contents of the file on one line. This issue arises because of misguided "Content Type" negotiations. More on this later.)

[root@station ~]$ cp /etc/sysctl.conf /var/www/html/
[root@station ~]$ elinks http://localhost/sysctl.conf
[root@station ~]$ elinks -source http://localhost/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

Instead of a single file, entire directory trees can be copied into the /var/www/html directory.

[root@station ~]$ cp -a /etc/sysconfig /var/www/html/

Now, by accessing http://localhost/sysconfig with a web browser, the contents of the directory should be visible, with "clickable" file and subdirectory links.

Figure 1-1. Browsing the sysconfig Directory

Notice the shift in perspective. What we would call the directory /var/www/html/sysconfig, the web server refers to as just /sysconfig. This translation is the essence of the term "Document Root".

Web browsers request information using "Uniform Resource Locators", or more commonly just "URLs". Web-related URLs are usually composed of a hostname and a file path.

http://hostname/dir1/dir2/filename

The hostname is simply the hostname or IP address of the host running the server, while the dir1/dir2/filename is thought of as being a path to a particular file on the server. When locating the file, the web server assumes that the root of the "URL Namespace" is the document root directory (/var/www/html).
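
The translation from URL to file can be sketched as a small shell function. This is only an illustration of the idea, not how Apache implements it: the function name url_to_path and the DOCROOT variable are invented for this example, and it assumes a plain http://hostname/path URL with no port number, query string, or encoded characters.

```shell
# Hypothetical helper: map a URL onto a file path, using the Document
# Root as the base of the translation.
DOCROOT=/var/www/html        # the default document root named in the text

url_to_path() {
    url=$1
    rest=${url#http://}      # drop the protocol: "hostname/dir1/dir2/filename"
    case "$rest" in
        */*) path=/${rest#*/} ;;   # everything after the hostname
        *)   path=/ ;;             # a bare "http://hostname" means the root
    esac
    printf '%s%s\n' "$DOCROOT" "$path"
}

url_to_path http://localhost/sysconfig
url_to_path http://hostname/dir1/dir2/filename
```

The hostname plays no part in the result, which matches the shift in perspective noted earlier: the server only ever sees the path, and hangs it underneath the document root.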

The http portion of the URL is the protocol, which tells the web browser both which port to connect to, and what "language" to expect to speak to whomever is listening on that port. For web servers, the port is 80, and the language is known as the Hypertext Transfer Protocol, or HTTP.

Of course, it’s not a machine’s configuration files that one usually chooses to publish to the world. We’ll move on to more interesting content.

Content Types

The purpose of the web server is to serve the content of files, but web clients seem to learn not just the content of the file, but how to interpret the content, as well. As an example, consider a text file such as /etc/hosts, an HTML file such as /usr/share/doc/samba-version/htmldocs/manpages/net.8.html, and an image file, such as /usr/share/backgrounds/tiles/neurons.png, each of which is copied to a web server’s document root.

[root@station ~]# mkdir /var/www/html/example
[root@station ~]# cd /var/www/html/example
[root@station example]# cp /etc/hosts .
[root@station example]# cp /usr/share/doc/samba-*/htmldocs/manpages/net.8.html .
[root@station example]# cp /usr/share/backgrounds/tiles/neurons.png .
[root@station example]# ls
hosts  net.8.html  neurons.png

How does a web client handle each of these? If you’re sitting at a student workstation, try for yourself. (Of course, you will first need to perform the above commands to put the files in place.)

http://localhost/example/hosts

http://localhost/example/net.8.html

http://localhost/example/neurons.png

Note: Make sure to create or copy files underneath the /var/www/html directory as the root user. Do not move already existing files into the directory. If you’re having trouble, give it a pass for now, until you read the section "But What Could Go Wrong?" below.

All of the files should have been treated reasonably by the client: the hosts file as a simple text file, the net.8.html file as a marked up man page, complete with bolded titles, italics, and hyperlinks, and neurons.png as a picture of blue blobs.

Now let’s shake things up a bit.

[root@station example]# cp hosts hosts.html
[root@station example]# cp net.8.html net.8.txt
[root@station example]# cp hosts hosts.png
[root@station example]# cp neurons.png neurons.txt

Again, if at a student workstation, try the following.

http://localhost/example/hosts.html

http://localhost/example/net.8.txt

http://localhost/example/hosts.png

http://localhost/example/neurons.txt

For those not able to follow along, hosts.html lost all of its formatting, net.8.txt dumped what you would see if you catted the file directly, hosts.png caused the browser to complain about a malformed image, and neurons.txt showed a bunch of glyphs representing binary data.

There are obviously some expectations on the part of the browser about how to interpret the data it receives: text to dump, marked up text (html) to format, or an image to render. The expectation about what type of data the client is receiving is known as the data’s content type.

Apparently, the content type is determined by the file’s filename extension. We still don’t know if the extension is being interpreted into a content type by the server (before the file’s content is transmitted) or by the client (after the content is received). The answer is the server, and the server communicates that content type, as well as a lot of other meta-data about the transfer, using the HTTP protocol.
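
One way to picture what the server does is as a lookup from filename extension to content type. The sketch below is illustrative only: the function name guess_content_type is invented, the table covers just the three kinds of file used in this example, and treating every other name as text/plain is an assumption chosen to match the behavior observed above for the hosts file.

```shell
# Hypothetical model of an extension-to-content-type lookup.
guess_content_type() {
    case "$1" in
        *.html) echo text/html ;;
        *.png)  echo image/png ;;
        *.txt)  echo text/plain ;;
        *)      echo text/plain ;;   # assumed fallback for anything else
    esac
}

for f in hosts hosts.html hosts.png neurons.txt; do
    printf '%-12s %s\n' "$f" "$(guess_content_type "$f")"
done
```

The function never looks inside the file, which is why, in this model, hosts.png is announced as an image even though it contains nothing but text.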

Directories

We’ve seen how the web server responds when a web client requests a file: it returns the contents of the file to the client. How does the web server handle directories? In general, a webserver responds in one of three ways.

First, the web server checks to see if an index file (a file named index.html) exists in the directory. If so, the webserver returns the contents of the file, as if the request for http://localhost/example were for http://localhost/example/index.html.

Secondly, if no index file exists, the web server checks to see if the Indexes option is enabled. If so, the web server returns a dynamically generated directory listing. Otherwise, the webserver returns an error to the client. (How the Indexes option is set or not set will be covered in a following lesson. In Red Hat Enterprise Linux, the option is set by default.)

Table 1-3. Web Server Responses to Directory Requests

Configuration                      Response
index.html exists                  Return the contents of index.html
no index.html, Indexes enabled     Return a dynamically generated directory listing
no index.html, Indexes disabled    Return error 403 ("Access Denied")
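
The three rows of Table 1-3 amount to a short decision chain, which can be modeled as follows. This is a sketch of the logic only, not Apache’s code: the function name directory_response and its "on"/"off" argument are invented for the example, and it works on a scratch directory rather than the real document root.

```shell
# Hypothetical model of the three-way decision in Table 1-3.
# $1 = directory on disk, $2 = "on" if the Indexes option is enabled
directory_response() {
    dir=$1
    indexes=$2
    if [ -f "$dir/index.html" ]; then
        echo "serve $dir/index.html"
    elif [ "$indexes" = on ]; then
        echo "generate directory listing"
    else
        echo "403 Access Denied"
    fi
}

# Walk through all three cases using a temporary directory.
demo=$(mktemp -d)
directory_response "$demo" on
directory_response "$demo" off
echo '<h1>Examples</h1>' > "$demo/index.html"
directory_response "$demo" off
rm -rf "$demo"
```

The order of the tests matters: once an index file exists, the Indexes setting is never consulted.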

Assuming you followed along above, create the file /var/www/html/example/index.html with the following content (you should be able to cut and paste directly from the browser).

<h1>Examples</h1>

[<a href="hosts">hosts</a>] [<a href="net.8.html">net man page</a>] [<a href="neurons.png">picture of neurons</a>]

What happens when you now view http://localhost/example? You should see the marked up contents of the index file. Is the effect any different if you view http://localhost/example/index.html directly? (It shouldn’t be.)

Figure 1-2. Contents of http://localhost/example

What about the file /var/www/html/example/hosts.html? Is it still available? You should be able to access it by manually entering the URL http://localhost/example/hosts.html, but there is no way to click to it directly (except from this page, of course). Content behind an index file, which is not referenced directly, is obscured, but still available if someone knows it’s there.

Web Server Logging: /var/log/httpd/{access,error}_log

The Apache web server logs information about every request it handles to the file /var/log/httpd/access_log. A sample of the log file’s contents follows.

[root@station ~]# tail -3 /var/log/httpd/access_log

127.0.0.1 - - [13/Jul/2005:06:34:24 -0400] "GET /example/net.8.html HTTP/1.1" 200 26196 "http://localhost/rhasb/curr/rha230/html-instructor-classroom/rha230_httpd_http.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6"
127.0.0.1 - - [13/Jul/2005:06:34:24 -0400] "GET /example/samba.css HTTP/1.1" 404 290 "http://localhost/example/net.8.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6"
127.0.0.1 - - [13/Jul/2005:06:34:25 -0400] "GET /favicon.ico HTTP/1.1" 404 284 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6"

On each line, we find the following information.

The IP address of the client who made the request.

A timestamp of when the request occurred.

The response code associated with the request. A response code of 200 implies success; anything else is usually some type of failure.

The length of the content returned, not to be confused with the response code which precedes it.
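
As a rough illustration of picking these fields out of a log line, the following sketch feeds a single line (taken from the sample above, with the referer and user agent shortened to "-") through awk. The function name summarize_access_line is invented for this example, and the field positions assume the log format shown in the sample; a differently configured log format would need different positions.

```shell
# Hypothetical helper: reduce access_log lines (on stdin) to
# client, timestamp, requested path, response code, and content length.
summarize_access_line() {
    # Whitespace-separated fields: $1 client, $4 timestamp (with a leading
    # "["), $7 requested path, $9 response code, $10 content length.
    awk '{ sub(/^\[/, "", $4); print $1, $4, $7, $9, $10 }'
}

line='127.0.0.1 - - [13/Jul/2005:06:34:24 -0400] "GET /example/samba.css HTTP/1.1" 404 290 "-" "-"'
printf '%s\n' "$line" | summarize_access_line
```

On a live system the same function could be fed from the log itself, for example with tail -3 /var/log/httpd/access_log | summarize_access_line.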

Any request that does not complete successfully (i.e., whose response code is not 200) also generates information in the error_log.

[root@station ~]# tail -3 /var/log/httpd/error_log

[Tue Jul 13 06:34:24 2005] [error] [client 127.0.0.1] File does not exist: /var/www/html/example/samba.css, referer: http://localhost/example/net.8.html
[Tue Jul 13 06:34:25 2005] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico

The access_log and the error_log are among the first places an administrator should look when trying to figure out why something doesn’t seem to be working. The following table itemizes some of the return codes associated with various errors (or successes).

Table 1-4. HTTP return codes

Code    Meaning
200     Success
401     Authorization Required
403     Access Denied
404     File Not Found
500     Internal Server Error

There are many others, but these tend to be the most common. (In general, the HTTP protocol follows a response code convention used by many network services: partial successes are in the 100s, successes in the 200s, incomplete transactions in the 300s, client errors in the 400s, and server errors in the 500s. Watch closely the output the next time you use the simple ftp client, for example.)
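
The numbering convention can be captured in a few lines of shell, which may be handy when skimming a log. The function name status_class is made up for this sketch; the category labels are the ones used in the paragraph above.

```shell
# Hypothetical helper: classify a response code by its leading digit.
status_class() {
    case "$1" in
        1??) echo "partial success" ;;
        2??) echo "success" ;;
        3??) echo "incomplete transaction" ;;
        4??) echo "client error" ;;
        5??) echo "server error" ;;
        *)   echo "unknown" ;;
    esac
}

for code in 200 403 404; do
    printf '%s %s\n' "$code" "$(status_class "$code")"
done
```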

But What Could Go Wrong?

In its default configuration, there are really only two things that could cause problems: permissions, and SELinux.

First, files must be readable by the system user apache. The httpd process, like any other process, must have the right permissions to access a file. For security reasons, the web server runs as the user apache. Therefore, any file served by the web server must be readable by the user apache.

Secondly, the Apache web server is one of the services constrained by the Red Hat Enterprise Linux SELinux targeted policy. Therefore, any file serviced by the Apache web server must have an appropriate SELinux context. For now, the context of the /var/www/html directory (httpd_sys_content_t) will suffice. Any file created in this directory (including subdirectories) should inherit this context, and be fine. The problem occurs when files are created somewhere else, and moved to this directory - they then retain their original (inappropriate) SELinux context.

At any rate, whenever the web server complains in its log file that it cannot access a file you think it should be able to, try the following commands to set appropriate permissions and SELinux context.

[root@station ~]# chmod a+r filename

[root@station ~]# chcon --reference /var/www/html filename

or

[root@station ~]# restorecon /var/www/html/filename

The Anatomy of a Web Request: the HTTP Protocol (Optional, but Interesting)

This section introduces the HTTP protocol. The intent is not to be thorough, but instead to give students an impression of what is meant when people use terms such as HTTP headers, GET, and Response Code. For those who don’t get enough, all of the details can be found at the World Wide Web Consortium’s (http://www.w3.org) website (http://www.w3.org/Protocols).

In order to introduce the HTTP protocol, it’s easiest to start with an example. The entire conversation between a web client and a web server can be captured using the wireshark network analyzer. If not already installed, yum install wireshark-gnome should do the trick. A capture is started by opening wireshark, choosing Capture:Start from the menu, specifying a capture filter of (in this case) port 80, and "OK"ing. (Enabling "Update list of packets in real time" and "Automatic scroll in live capture" tends to make things more interesting for small captures, as well.)

Figure 1-3. Specifying a Wireshark Capture filter

Once Wireshark is capturing packets, any conversations between a web client and a web server which occur on the local machine should be captured. For example, the following displays a conversation between a web client requesting http://station53.rosemont.wlan/example/hosts and a web server providing the answer. Once wireshark has been stopped, the individual IP packets can be browsed from a list.

Figure 1-4. A Wireshark Capture Packet List

More interestingly for our purposes, wireshark can easily assemble the payload from each of the individual packets which compose a TCP/IP conversation by right clicking on any packet, and choosing Follow TCP Stream.

Figure 1-5. Viewing a TCP Conversation with Wireshark

The web client, in red, is making a request of the web server, in blue. The "language" the client and server use is the HTTP protocol.

The HTTP Protocol: the Request (Client to Server)

A web request is composed of three parts: a request line, a series of HTTP headers, and the "body" (or content).

Note: In the following, some portions of the text have been replaced with "..." for readability. The same convention is used many places in the text.

GET /example/hosts HTTP/1.1
Host: station53.rosemont.wlan
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Geck...
Accept: text/xml,...text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

The entire first line is known as the Request-Line, and contains exactly three pieces of information in a specified order.

The request method, which for our purposes can be thought of as being either a GET or a POST. With a GET, the client is requesting information. With a POST, the client is submitting information.

The URI, or "Uniform Resource Identifier". Think of this as the path portion of a URL. (The server portion has already been used to open the TCP/IP connection.)

The exact protocol that the client is speaking. Only two protocols are generally considered, HTTP/1.0 and HTTP/1.1, and any modern client should be using the latter.

The next series of lines, which all have the form header: data, are known as the HTTP headers. These are used to associate any metadata with the request. Some HTTP request headers relevant to our discussion are the following.

Host: The content of the host portion of the URL requested by the client.

User-Agent: The User Agent is the client software. In this case, the client is the Firefox web browser, which identifies itself as a variant of Mozilla.

Accept: A list of the content types that the browser is willing to accept. This browser prefers to receive text/xml or text/html, but will also handle text/plain. For images, the browser prefers image/png, but in the end, the browser will accept */*, or anything the server will throw at it.

After a blank line, indicating the end of the HTTP headers, the content of the request would follow. For GET requests, such as this one, there is no content.
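
To make the structure of a request concrete, the sketch below takes one apart: it splits the Request-Line into its three fields and then reads headers until the blank line. The function name describe_request is invented for illustration, and the request fed to it is a shortened version of the one captured above. HTTP lines end in a carriage return plus newline, so the carriage returns are stripped first.

```shell
# Hypothetical helper: read an HTTP request on stdin and report the three
# parts of the Request-Line plus the Host header.
describe_request() {
    tr -d '\r' | {
        IFS=' ' read -r method uri protocol
        echo "method=$method"
        echo "uri=$uri"
        echo "protocol=$protocol"
        # Headers run until the first blank line.
        while IFS= read -r header && [ -n "$header" ]; do
            case "$header" in
                Host:*) echo "host=${header#Host: }" ;;
            esac
        done
    }
}

printf 'GET /example/hosts HTTP/1.1\r\nHost: station53.rosemont.wlan\r\nConnection: keep-alive\r\n\r\n' | describe_request
```

Note that the hostname appears only in the Host header, never in the Request-Line itself.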

The HTTP Protocol: the Response (Server to Client)

The server responds with the following, which is again composed of three parts: a response line, a set of response HTTP headers, and the response "body" (or content).

HTTP/1.1 200 OK
Date: Sat, 13 Aug 2005 11:09:51 GMT
Server: Apache/2.0.54 (Fedora)
Last-Modified: Sat, 13 Aug 2005 10:26:31 GMT
ETag: "406ee-104-105723c0"
Accept-Ranges: bytes
Content-Length: 260
Connection: close
Content-Type: text/plain; charset=UTF-8

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1.localhost.localdomain.localhost
192.168.218.254.rosemont.
#192.168.218.254.s.
#192.168.218.53.w.
192.168.0.5.s.
192.168.0.6.w.
192.168.201.254 rw

The Response-Line, like the Request-Line, is composed of three ordered parts. In the case of the response, however, the latter two fields are redundant.

The exact protocol the server is using.

The response code of the transaction, which is used to imply success, or qualify a type of failure. In this case, the response code 200 implies success. (More on these later.)

A text representation of the response code. This is supplied only for diagnostic (debugging) purposes, as the response code is what’s really important. The text OK is associated with the response code of 200.

Again, the next series of lines, which all have the form header: data, are known as the HTTP headers. We will only focus on one of the HTTP response headers.

Content-Type: The server is providing the client with the type of the content, so the browser can render the data appropriately. For this response, the content type is text/plain, so the browser will display the content "as is", preserving whitespace. Other content types could include image/png, text/html, or application/msword.

After a blank line, indicating the end of the HTTP headers, the content of the response follows.

For this response, the content is a simple text file. (In the output above, tabs have been replaced with periods, an artifact of how wireshark displays non-printing characters.)
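
The three parts of a response can be separated with standard text tools, as the following sketch shows. The helper names (sample_response, response_code, response_header, response_body) are invented for this example, and the sample response is a cut-down stand-in for the one above rather than real server output.

```shell
# Hypothetical helpers; each reads a complete HTTP response on stdin.
response_code() {
    # Second field of the first line, e.g. "200" from "HTTP/1.1 200 OK".
    tr -d '\r' | sed -n '1p' | cut -d' ' -f2
}
response_header() {     # $1 = header name, e.g. Content-Type
    # Look only at the lines up to the blank line that ends the headers.
    tr -d '\r' | sed '/^$/q' | sed -n "s/^$1: //p"
}
response_body() {
    # Delete everything through the first blank line; the rest is content.
    tr -d '\r' | sed '1,/^$/d'
}

# A made-up response, with the CRLF line endings HTTP uses for its headers.
sample_response() {
    printf 'HTTP/1.1 200 OK\r\nContent-Length: 6\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nhello\n'
}

sample_response | response_code
sample_response | response_header Content-Type
sample_response | response_body
```

The blank line is the only thing separating metadata from content, so every helper keys off it.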

The Hyper Text Markup Language (HTML) (Optional)

This workbook is about managing the Apache webserver as a system administrator, not about designing web content. However, during this workbook you will encounter files which use HTML to markup their content, so a brief introduction will be useful. Again, those who do not get enough can find more at the World Wide Web Consortium’s (http://www.w3.org) website (http://www.w3.org/MarkUp).

Fundamentally, HTML provides three things.

1. Structure: HTML allows text to be identified as titles or inlined quotes, or organized into lists and tables.

2. Embedded Media: HTML allows authors to embed media into their text, usually in the form of images, but also as videos and sound.

3. Links: HTML allows authors to easily reference other information, so that anyone reading the text can locate the other information with the click of a mouse.

All three of the above capabilities rely on embedding HTML tags into the text, where a tag is any text embedded between brackets, such as <table>, <img>, or <a>.

Because the brackets are now considered syntax, there needs to be some way to represent the bracket. This is done using HTML entities, which begin with an ampersand (&) and end with a semicolon. For example, the entity for a left bracket is &lt; (for "less than"), and the entity for a right bracket is &gt; (for "greater than"). Entities are also used for glyphs not often found on keyboards, such as the copyright symbol. Since the ampersand starts entities, there must also be some way of representing it, and the answer is itself an entity: &amp;.
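As a small illustration of entities, Python's standard html module can convert between raw characters and their entity forms. The sample strings below are made up for this sketch.

```python
import html

# Brackets and ampersands in plain text must be written as entities in HTML.
text = "if a < b && b > c"
escaped = html.escape(text, quote=False)
print(escaped)

# Going the other way: entities are turned back into the characters they name.
restored = html.unescape("&copy; 2007 &amp; later")
print(restored)
```

The ampersand is escaped as well as the brackets, since otherwise a browser could not tell a literal ampersand from the start of an entity.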


Rather than provide a full introduction to HTML in the text, a sample document is provided at http://rha-server/pub/rha/rha230/sample.html. Students are encouraged to examine this document, both as it is rendered by a web browser and the underlying text (which can usually be viewed in a browser by right clicking and choosing view page source).

Exercises

Lab Exercise

Objective: Install, start, and contribute content to an Apache web site. Estimated Time: 45 mins.

This exercise has you download and install material for your web server, using the web server’s default configuration. The material consists of three texts which are not optimally organized for the Apache web server. The lab has you perform some simple renamings and repositioning of the material so that it is more naturally viewed using a web browser.

Specification

1. If the httpd package is not already installed on your machine, install it now.

2. Start the httpd service (if it is not already started), and configure the service to be started by default upon reboots.

3. Download a copy of the file http://rha-server/pub/rha/rha230/readings.tgz, and extract its contents into your web server’s document root directory (/var/www/html/). Properly extracting the contents should result in a new /var/www/html/readings directory. 1

4. Using a web browser, browse the http://localhost/readings directory. You should be able to view the HTML files the_god_of_mars.html and war_of_the_worlds appropriately.

5. Correct a misnamed index file.

a. Again using a web browser, examine the contents of the http://localhost/readings/relat10h/ subdirectory. You should discover the file index.htm. Try examining this file through the web browser: http://localhost/readings/relat10h/index.htm.

b. Apparently, the intent of the author was that this page should serve as an index page, but the file is named incorrectly for Apache’s default configuration. In the /var/www/html/readings/relat10h/ directory, create a link to index.htm named index.html (either hard or soft).

c. Using a browser, again view the URL http://localhost/readings/relat10h/. You should now see the contents of the index page.

d. To make life a little easier for anyone browsing your site, in the /var/www/html/readings directory, create a symlink to the relat10h directory called relativity.


e. Confirm that you may now access the content of the file index.htm using http://localhost/readings/relativity/.

6. Correct a misnamed directory.

a. If you can stomach the physics (and, in fact, even if you cannot), skim the first appendix to Einstein’s theory of relativity, either by following the link from the main page, or by referencing http://localhost/readings/relativity/ap01.htm directly.

b. You might notice that many of the equations, such as equation 29, equation 30, etc., are missing. Examine the end of /var/log/httpd/access_log, and note the many requested image files which received a 404 ("File Not Found") response code.

c. Examine the end of the file /var/log/httpd/error_log, and you will discover more helpful messages.

[root@station ~]# tail /var/log/httpd/error_log

[Tue Jul 20 16:53:14 2005] [error] [client 127.0.0.1] File does not exist: /var/www/html/readings/relat10h/pics, referer: http://localhost/readings/relat10h/ap01.htm

d. Examining the log messages closely, you may discover the problem. All of the web pages are expecting images to be in a directory named pics, but this directory does not exist.

e. Through a simple directory renaming, or perhaps another symlink, solve the problem, so that all of the images of equations are properly displayed.

7. Now that you have completed the hard work, relax a little, by deriving the equation for the Lorentz transformation, following the steps in chapter 11. Place your results in a file titled that_was_easy in your academy user’s home directory. (Just kidding.)
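The inspection of the access log in step 6b can also be scripted. The following is a minimal sketch, assuming the access log uses the common quoted-request layout ("METHOD path protocol" followed by the response code). The two sample lines, including the file name eq29.gif, are hypothetical and are not taken from a real log.

```python
import re

# Matches the quoted request and the response code that follows it.
REQUEST_RE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')


def not_found_paths(lines):
    """Return the requested paths which received a 404 response code."""
    paths = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "404":
            paths.append(match.group("path"))
    return paths


# Hypothetical log lines for illustration only.
sample_log = [
    '127.0.0.1 - - [20/Jul/2005:16:53:14 -0400] "GET /readings/relat10h/ap01.htm HTTP/1.1" 200 15320',
    '127.0.0.1 - - [20/Jul/2005:16:53:14 -0400] "GET /readings/relat10h/pics/eq29.gif HTTP/1.1" 404 309',
]
missing = not_found_paths(sample_log)
print(missing)
```

On a real machine the lines would be read from /var/log/httpd/access_log, which normally requires root privileges.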

Deliverables

1. An installed and running httpd service, configured to start by default on bootup.

2. The text of three books, browsable from the URL http://localhost/readings.

3. The table of contents of Einstein’s theory of relativity at http://localhost/readings/relat10h.

4. The table of contents of Einstein’s theory of relativity, also at http://localhost/readings/relativity.

5. The images of equations in appendix 1 (found at http://localhost/readings/relativity/ap01.htm) are displayed properly.


Clean Up

You will want to leave the /var/www/html/readings directory in place, as you will need it in the next section.

Questions

1. In Red Hat Enterprise Linux 5, which of the following packages provides the Apache web server?

( ) a. httpd
( ) b. apache
( ) c. webserver
( ) d. apr
( ) e. None of the above

2. Which of the following directories serves as the web server’s document root?

( ) a. /opt/docroot
( ) b. /var/pub/
( ) c. /var/www/html/
( ) d. /etc/httpd
( ) e. None of the above

After migrating the contents of a web site from one operating system to another, web clients viewing the URL http://localhost/zsh.txt display raw HTML instead of a formatted page.


3. What is the simplest solution to the problem?

( ) a. Install the mod_html package.
( ) b. Create an index.html file to reference this page.
( ) c. Use the txt2html utility to assign the file the HTML file type.
( ) d. Rename the file zsh.html.
( ) e. Use chcon to assign the file the appropriate SELinux context.

Use the output of the following command to answer the next question, assuming the default Red Hat Enterprise Linux configuration of the Apache web server.

[root@station1 ~]# ls /usr/share/backgrounds/*
/usr/share/backgrounds/images:
default.png      ladybugs.jpg   riverstreet_rail.jpg
dewdop_leaf.jpg  leafdrops.jpg  sneaking_branch.jpg

/usr/share/backgrounds/tiles:
3dgreen.png            dunes.png   Planning-And-Probing-1.jpg
All-Good-People-1.jpg  fibers.png  plasma.png

[root@station ~]# cp -a /usr/share/backgrounds/ /var/www/html/

4. What would you expect to see if you pointed the Firefox web browser to the URL http://localhost/backgrounds/images/?

( ) a. A dynamically generated web page which displays the images as pictures.
( ) b. A "404: File not Found" error page.
( ) c. A "403: Forbidden" error page.
( ) d. A page containing binary data, because the web server tries to interpret the directory as if it were a file.


( ) e. A dynamically generated web page which lists the contents of the directory by filename.

5. If, when the directory above is referenced, you would prefer web clients to see the contents of a file, what should the relevant file be named?

( ) a. README.html
( ) b. HEADER.html
( ) c. index.htm
( ) d. DIR.htm
( ) e. None of the above

6. In what file are all web requests from clients ("hits") logged?

( ) a. /var/log/secure
( ) b. /var/log/httpd/error_log
( ) c. /var/log/messages
( ) d. /var/log/httpd/access_log
( ) e. Both C and D

7. If, when running service httpd start, the webserver fails to start, what file might contain helpful debugging messages?

( ) a. /var/log/secure
( ) b. /var/log/xferlog
( ) c. /var/log/httpd/error_log
( ) d. /var/log/httpd/access_log
( ) e. Both B and D

8. In what file are web requests that generate errors logged?

( ) a. /var/log/secure
( ) b. /var/log/httpd/error_log
( ) c. /var/log/messages
( ) d. /var/log/httpd/access_log
( ) e. Both B and D


9. Which is the web server’s "well known" port?

( ) a. 8080
( ) b. 22
( ) c. 25
( ) d. 80


10. Apache’s dynamically loaded modules are conventionally found in what directory?

( ) a. /usr/lib/httpd/modules
( ) b. /usr/lib/apache
( ) c. /usr/libexec/apache
( ) d. /usr/share/httpd/modules
( ) e. None of the above

Notes

1. An excellent source for public domain texts is the Gutenberg project (http://www.gutenberg.org).


Chapter 2. Apache Configuration

Key Concepts

The Apache server is configured using the /etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/*.conf configuration files.

The configuration file is informally divided into the Global, Main, and Virtual Server sections.

The Global section defines aspects which pertain to the server as a whole, including client connection dynamics, server pool parameters, binding address, and which modules to load.

The Main section defines aspects which may be redefined by any virtual server, such as the document root, logging behavior, and URL namespace remappings.

Comprehensive documentation is provided by the httpd-manual package, which, when installed, can be accessed at http://localhost/manual.

Discussion

Apache Configuration: /etc/httpd/conf/httpd.conf

The Apache web server is configured with text configuration files which are read upon startup. The primary configuration file is /etc/httpd/conf/httpd.conf, but the files /etc/httpd/conf.d/*.conf are "slurped up" into the configuration, as well.

[root@station ~]# ls /etc/httpd/conf /etc/httpd/conf.d/

/etc/httpd/conf:

httpd.conf magic

/etc/httpd/conf.d/:

README welcome.conf

The Apache configuration file syntax is straightforward, and tends to be well documented (both as comments in the default configuration file, and in a separate manual to be discussed later). A sample of the configuration file’s syntax follows.

#
# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/var/www/html"

#
# Each directory to which Apache has access can be configured with respect
# to which services and features are allowed and/or disabled in that
# directory (and its subdirectories).
#
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>


Any empty line, or line which begins with a hash ("#"), is considered a comment.

Any line which is not a comment generally starts with a keyword referred to as a directive. Directives are not case sensitive, but of course spelling is important. The syntax of the remainder of the line depends on the directive, but all of a directive’s arguments must occur on a single line (unless the line is continued by ending it with a backslash).

The only other way a line can begin is with an XML-like tag, which begins a container. Containers end with an XML-like closing tag. Generally, all directives found within a container only take effect within the scope of the container. We will discuss the effects of different types of containers in a later lesson.
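The three kinds of lines just described (comments, directives, and container tags) can be told apart by looking only at how a line begins. The sketch below is a deliberately simplified illustration of that rule, not a model of Apache's real parser; the function name classify and the label strings are assumptions made for this example.

```python
def classify(line):
    """Label a configuration line by the way it begins."""
    stripped = line.strip()
    # Empty lines and lines starting with a hash are treated as comments.
    if not stripped or stripped.startswith("#"):
        return "comment"
    # XML-like tags open or close a container.
    if stripped.startswith("</"):
        return "container-close"
    if stripped.startswith("<"):
        return "container-open"
    # Anything else starts with a directive keyword.
    return "directive"


sample = [
    "# DocumentRoot: The directory out of which you will serve your",
    'DocumentRoot "/var/www/html"',
    "",
    "<Directory />",
    "    Options FollowSymLinks",
    "</Directory>",
]
kinds = [classify(line) for line in sample]
print(kinds)
```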

The file is thought of as occurring in three sections, although the syntax does not formally enforce them.

1. The Global Section: This section contains configuration which applies to the web server as a whole, including any virtual servers.

2. The Main Section: Configuration which applies to the main server (as opposed to any virtual servers) belongs in this section. Any configuration in this section can be overridden by a virtual server.

3. Virtual Servers: The Apache web server can take on the appearance of being multiple distinct servers. Virtual servers will be discussed in more detail in the next lesson.

We begin by examining configuration relevant to the server as a whole. You might want to open the file /etc/httpd/conf/httpd.conf in a pager or text editor and follow along as you read the following sections. (You should consider setting the editor into a "read only" mode, or making a backup of the file and browsing it).

The Global Section

The Global section of the configuration file includes configuration that affects the server as a whole.

Figure 2-1. /etc/httpd/conf/httpd.conf

### Section 1: Global Environment
#
# The directives in this section affect the overall operation of Apache,
# such as the number of concurrent requests it can handle or where it
# can find its configuration files.

Configuration Context: ServerRoot

The ServerRoot directive establishes a home base for all of the remaining server context, while the second directive is a simple example of making use of this home base.


Figure 2-2. /etc/httpd/conf/httpd.conf

#
# ServerRoot: The top of the directory tree under which the server’s
# configuration, error, and log files are kept.
#
# Do NOT add a slash at the end of the directory path.
#
ServerRoot "/etc/httpd"

#
# PidFile: The file in which the server should record its process
# identification number when it starts.
#
PidFile run/httpd.pid

The ServerRoot directive establishes context for future file references within the configuration file. Any relative file reference (one that does not begin with a "/") will be relative to the ServerRoot, which in Red Hat Enterprise Linux is /etc/httpd.

In Unix, daemons traditionally record the fact that they are running by creating a file in the filesystem which contains their process id, called a pid file. The PidFile directive specifies where this file should be located.

Examining the /etc/httpd directory, we find it’s populated with several symbolic links.

[root@station ~]$ ls -l /etc/httpd

total 28
drwxr-xr-x 4 root root 4096 Jul 25 06:33 conf
drwxr-xr-x 2 root root 4096 Jul 25 06:33 conf.d
lrwxrwxrwx 1 root root   19 Jul 25 06:33 logs -> ../../var/log/httpd
lrwxrwxrwx 1 root root   27 Jul 25 06:33 modules -> ../../usr/lib/httpd/modules
lrwxrwxrwx 1 root root   13 Jul 25 06:33 run -> ../../var/run

In the httpd.conf configuration file, file references that begin with logs/, modules/, or run/ are mapped to the relevant directories. Can you convince yourself that the daemon’s pid file would be found at /var/run/httpd.pid?

It’s important to understand the role of the ServerRoot directive, and the use of the symbolic links in the /etc/httpd directory, but there’s seldom any reason to change these values.
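The path arithmetic behind that question can be worked through in a short sketch. It assumes the ServerRoot of /etc/httpd and the three symbolic links listed above; the relative link targets written in the SYMLINKS table are an assumption inferred from the link sizes in that listing, and the function name resolve is hypothetical.

```python
import posixpath

SERVER_ROOT = "/etc/httpd"

# Assumed targets of the symbolic links found in /etc/httpd.
SYMLINKS = {
    "logs": "../../var/log/httpd",
    "modules": "../../usr/lib/httpd/modules",
    "run": "../../var/run",
}


def resolve(reference):
    """Map a file reference from httpd.conf to an absolute path."""
    # Absolute references (starting with "/") are used as they are.
    if reference.startswith("/"):
        return reference
    first, _, rest = reference.partition("/")
    if first in SYMLINKS:
        # Follow the link relative to the ServerRoot, then tidy up the "..".
        target = posixpath.normpath(posixpath.join(SERVER_ROOT, SYMLINKS[first]))
        return posixpath.join(target, rest) if rest else target
    # Any other relative reference is simply relative to the ServerRoot.
    return posixpath.join(SERVER_ROOT, reference)


print(resolve("run/httpd.pid"))
print(resolve("modules/mod_foo.so"))
print(resolve("conf.d/welcome.conf"))
```

A real system would resolve the links through the filesystem itself (for example with os.path.realpath); the table is used here only so the sketch runs anywhere.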

Client Connection Dynamics: Timeout and KeepAlive

The following directives control how long the server will wait on badly behaved clients.

Figure 2-3. /etc/httpd/conf/httpd.conf

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 120


#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive Off

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15

A particular httpd process can only communicate with one client at a time. A badly behaved client, which opens a TCP/IP connection but never uses it, could therefore tie up a server indefinitely. The Timeout directive specifies how long, in seconds, before a server terminates a connection with a badly behaved client.

➋➌➍ These directives decide if the server honors "Keep Alive" requests from a client, how many requests can be made over a "Keep Alive" connection, and how long before an inactive connection should time out.

The HTTP protocol is termed a "stateless" protocol, meaning that the server doesn’t record any information about the client between one request and the next. In the original HTTP/1.0 protocol, clients are required to open a new socket for every request. Downloading a web page with 10 images, therefore, would require the client to open 11 sockets (one for the page, and one for each referenced image).

The HTTP/1.1 protocol tried to improve efficiency by allowing a client to leave a single socket open for "follow up" requests. Such a persistent socket is called a "Keep Alive" socket. Clients are more likely to abuse such persistent connections, however, by leaving them open but not making any followup requests, so stricter timeout values are usually assigned to such connections.
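The effect of persistent connections on the number of sockets can be estimated with simple arithmetic. The sketch below assumes a single client that issues its requests one after another and never hits a timeout; real browsers often open several connections in parallel, so treat the numbers as illustrative. The function name connections_needed is hypothetical.

```python
import math


def connections_needed(requests, keepalive, max_keepalive_requests=100):
    """Estimate how many connections a client opens to make the given requests."""
    if not keepalive:
        # Without persistent connections, every request needs its own socket.
        return requests
    if max_keepalive_requests == 0:
        # 0 means an unlimited number of requests per connection.
        return 1 if requests else 0
    # Otherwise a new connection is needed each time the limit is reached.
    return math.ceil(requests / max_keepalive_requests)


# A page with 10 images: one request for the page plus one per image.
print(connections_needed(11, keepalive=False))
print(connections_needed(11, keepalive=True))
print(connections_needed(11, keepalive=True, max_keepalive_requests=5))
```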

Managing the Server Pool: StartServers, {Min,Max}SpareServers, MaxClients, and MaxRequestsPerChild

Recall that most Unix daemons use a forking model. Upon receiving a new client connection, the server process forks (duplicates itself), dedicating the new child to the newly connected client, while the parent returns to listening for new connections.

In order to gain efficiency, the Apache web server takes the uncommon approach of "pre-forking" child daemons to handle client connections, before the clients ever arrive. Even on an unused web server, several httpd processes exist. The parent daemon is generally run as the user root, and the pre-forked child daemons as the user apache. The collection of httpd processes is often referred to as the "server pool".


[root@station ~]# ps aux | grep httpd

root      2334  0.0  2.0  19504 10488 ?        Ss   05:57   0:00 /usr/sbin/httpd
apache    2359  0.0  2.0  19504 10624 ?        S    05:57   0:00 /usr/sbin/httpd
apache    2360  0.0  2.0  19504 10624 ?        S    05:57   0:00 /usr/sbin/httpd
apache    5248  0.0  2.0  19504 10628 ?        S    07:04   0:00 /usr/sbin/httpd
root      7636  0.0  0.1   3768   716 pts/5    S+   08:56   0:00 grep httpd

The following directives manage the dynamics of the server pool.

Figure 2-4. /etc/httpd/conf/httpd.conf

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# ServerLimit: maximum value for MaxClients for the lifetime of the server
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule prefork.c>
StartServers          8
MinSpareServers       5
MaxSpareServers      20
ServerLimit         256
MaxClients          256
MaxRequestsPerChild 4000
</IfModule>

StartServers: The initial size of the server pool (in number of processes).

➋➌ {Min,Max}SpareServers: The server pool scales dynamically. If a web server gets blitzed with many requests, more child daemons will be started. If things go quiet, unused child daemons will be killed. These directives place bounds on the number of idle (spare) child daemons kept in the pool.

➍➎ ServerLimit, MaxClients: The number of concurrent requests can be limited. Connection requests above this limit will be greeted with a quick "I’m busy... come back later", rather than actually handled. The distinction between the ServerLimit and MaxClients directives is subtle, and in practice they are set together to the same value.

MaxRequestsPerChild: In order to improve stability, a given child daemon will only serve so many requests until it kills itself, and a new daemon must be started. (This suicide helps curtail memory leaks in poorly written libraries and CGI executables.)
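The way the spare-server bounds and MaxClients interact can be sketched as a single decision: start children when too few are idle, kill children when too many are idle, and never exceed the overall limit. This is a simplified model for illustration, not Apache's actual algorithm; the function name pool_adjustment is hypothetical, and its default values are taken from the figure above.

```python
def pool_adjustment(busy, idle, min_spare=5, max_spare=20, max_clients=256):
    """Return how many children to start (positive) or kill (negative)."""
    if idle < min_spare:
        # Too few idle children: start more, but never exceed max_clients.
        wanted = min_spare - idle
        room = max_clients - (busy + idle)
        return max(0, min(wanted, room))
    if idle > max_spare:
        # Too many idle children: kill the surplus.
        return -(idle - max_spare)
    # The number of idle children is within bounds: leave the pool alone.
    return 0


print(pool_adjustment(busy=0, idle=8))     # freshly started, quiet server
print(pool_adjustment(busy=6, idle=2))     # a burst of requests arrives
print(pool_adjustment(busy=0, idle=25))    # the burst is over
print(pool_adjustment(busy=254, idle=1))   # close to the MaxClients limit
```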

Controlling the Server Address: Listen

Figure 2-5. /etc/httpd/conf/httpd.conf

#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, in addition to the default. See also the <VirtualHost>
# directive.
#


# Change this to Listen on specific IP addresses as shown below to
# prevent Apache from glomming onto all bound IP addresses (0.0.0.0)
#
#Listen 12.34.56.78:80
Listen 80

The Listen directive controls which address and port the server binds to. In the default configuration (above), the server binds to the special address 0.0.0.0 (implying every active interface), port 80. Multiple Listen lines can be used to specify that the daemon should bind to multiple ports and/or addresses.
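The two forms of the Listen argument shown in the figure (a bare port, or an address and port separated by a colon) can be taken apart as follows. This is a minimal sketch that only covers those two IPv4 forms; the function name parse_listen is hypothetical, and bracketed IPv6 addresses are deliberately not handled.

```python
def parse_listen(argument):
    """Split a Listen argument into an (address, port) pair."""
    address, separator, port = argument.rpartition(":")
    if not separator:
        # A bare port means "every active interface", written 0.0.0.0.
        return "0.0.0.0", int(argument)
    return address, int(port)


print(parse_listen("80"))
print(parse_listen("12.34.56.78:80"))
```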

Extending the Web Server: LoadModule

The Apache web server is modular by design. The core web server is actually fairly minimal, with various modules providing much of the interesting behavior. Modules may either be "static", meaning that they’re part of the core executable and can never be removed, or "dynamic", meaning that an administrator can control if the module is loaded or not during startup.

Apache dynamic modules are located in the /usr/lib/httpd/modules directory, and are loaded using the LoadModule directive.

Figure 2-6. /etc/httpd/conf/httpd.conf

#
# Dynamic Shared Object (DSO) Support
#
# To be able to use the functionality of a module which was built as a DSO you
# have to place corresponding ‘LoadModule’ lines at this location so the
# directives contained in it are actually available _before_ they are used.
# Statically compiled modules (those listed by ‘httpd -l’) do not need
# to be loaded here.
#
# Example:
# LoadModule foo_module modules/mod_foo.so
#
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule auth_digest_module modules/mod_auth_digest.so
LoadModule authn_file_module modules/mod_authn_file.so
LoadModule authn_alias_module modules/mod_authn_alias.so
LoadModule authn_anon_module modules/mod_authn_anon.so

LoadModule include_module modules/mod_include.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule logio_module modules/mod_logio.so
LoadModule env_module modules/mod_env.so
LoadModule ext_filter_module modules/mod_ext_filter.so
LoadModule mime_magic_module modules/mod_mime_magic.so

#
# Load config files from the config directory "/etc/httpd/conf.d".
#
Include conf.d/*.conf


The various modules tend to introduce new configuration directives to modify their behavior. For example, the log_config_module provides the LogFormat directive, which we will encounter later. In the configuration file, the module must be loaded (with LoadModule) before any directives it provides are encountered.

In order to ease the distribution of modules using a package managed system (such as RPM), the Include directive specifies external configuration files to include, either directly or by using pathname expansion (file globbing).
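The pathname expansion performed for an Include line can be imitated with Python's fnmatch module. The sketch below works on a plain list of file names rather than a real directory so that it runs anywhere; the function name expand_include is hypothetical, and ssl.conf is included only as an example of a file that a module package might drop into conf.d.

```python
import fnmatch


def expand_include(pattern, available):
    """Return the file names matched by an Include pattern, in sorted order."""
    return sorted(name for name in available if fnmatch.fnmatchcase(name, pattern))


# Hypothetical directory contents, relative to the ServerRoot.
files = ["conf.d/README", "conf.d/welcome.conf", "conf.d/ssl.conf"]
included = expand_include("conf.d/*.conf", files)
print(included)
```

Because only names ending in .conf match, a package can ship a README alongside its configuration without affecting the server.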

The Main Section

The Main section of the configuration file includes configuration that affects the primary server, but directives in this section can be overridden by any virtual server.

Figure 2-7. /etc/httpd/conf/httpd.conf

### Section 2: ’Main’ server configuration
#
# The directives in this section set up the values used by the ’main’
# server, which responds to any requests that aren’t handled by a
# <VirtualHost> definition. These values also provide defaults for
# any <VirtualHost> containers you may define later in the file.
#
# All of these directives may appear inside <VirtualHost> containers,
# in which case these default settings will be overridden for the
# virtual host being defined.

Server Identity: ServerName and ServerAdmin

The first two directives in the main section help establish the identity of the server.

Figure 2-8. /etc/httpd/conf/httpd.conf

#
# ServerAdmin: Your address, where problems with the server should be
# e-mailed. This address appears on some server-generated pages, such
# as error documents. e.g. admin@your-domain.com
#
ServerAdmin root@localhost

#
# ServerName gives the name and port that the server uses to identify itself.
# This can often be determined automatically, but we recommend you specify
# it explicitly to prevent problems during startup.
#ServerName www.example.com:80


The ServerAdmin directive is mainly cosmetic. The email address is listed in the footer of the default error pages.

For simple hosts, with a single external interface and therefore a clear concept of a hostname, the ServerName can be automatically determined. If in doubt, however, it should be specified manually. (For example, if the server is bound to multiple interfaces, the preferred name should be configured explicitly.)

Server Content: the DocumentRoot

The DocumentRoot directive, one of the most fundamentally important, identifies where in the filesystem the information to be served is found. Recall that when the file portion of a URL is translated to a file in the filesystem, the document root provides the base of that translation. This directive is probably the one most often overridden by a Virtual Host.

The following default specifies the Red Hat Enterprise Linux document root as /var/www/html.

Figure 2-9. /etc/httpd/conf/httpd.conf

#
# DocumentRoot: The directory out of which you will serve your
# documents. By default, all requests are taken from this directory, but
# symbolic links and aliases may be used to point to other locations.
#
DocumentRoot "/var/www/html"

Specifying the Directory Index File: DirectoryIndex

In a previous lesson, we discussed the role of an index file, called index.html. We now see that the name of the file is configurable.

Figure 2-10. /etc/httpd/conf/httpd.conf

#
# DirectoryIndex: sets the file that Apache will serve if a directory
# is requested.
#
# The index.html.var file (a type-map) is used to deliver content-
# negotiated documents. The MultiViews Option can be used for the
# same purpose, but it is much slower.
#
DirectoryIndex index.html index.html.var

Notice that if multiple file names are specified, each will be searched for in sequence. Specifying too many alternatives, however, could lead to poor performance.

For example, if migrating content from a Microsoft-based server, setting DirectoryIndex to the following would be easier than renaming every file named index.htm to index.html.

DirectoryIndex index.html index.htm
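The search order described above can be pictured with a short Python sketch. It is only an illustration of the lookup, not Apache's implementation; the function name pick_index_file and the temporary directory are hypothetical.

```python
import os
import tempfile

def pick_index_file(directory, directory_index):
    """Return the first name from directory_index that exists in directory.

    Mirrors the idea of 'DirectoryIndex index.html index.htm': candidates
    are tried in the order listed, and the first match wins.
    """
    for name in directory_index:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return name
    return None  # no index file found in this directory

# Hypothetical content directory holding only a migrated index.htm
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "index.htm"), "w") as f:
    f.write("<html>migrated page</html>")

chosen = pick_index_file(demo_dir, ["index.html", "index.htm"])
print(chosen)
```

Each extra name in the list costs one more filesystem check per directory request when earlier names are absent, which is the performance concern mentioned above.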


Tip: Index files can even be specified as an absolute reference. What do you think would be the effect of a configuration such as the following?

DirectoryIndex index.html /cgi-bin/index.cgi

Collecting Client Identities: HostnameLookups

Buried deep within the configuration file is an important directive called HostnameLookups.

Figure 2-11. /etc/httpd/conf/httpd.conf

#
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 204.62.129.132 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
# nameserver.
#
HostnameLookups Off

The web server can easily determine the IP address of any client which is making a web request: it’s part of the request’s IP protocol header. In order to determine the hostname of the client, however, the web server must work harder: it must perform a reverse DNS lookup on the client’s IP address. This reverse lookup increases both time and network traffic on the part of the server, so by default, it’s disabled. As a result, all logging and access control lists are implemented by IP address, not by hostname.

If you desire logs and access control lists to use client hostnames instead of IP addresses, and are willing to pay the price in performance, HostnameLookups can be set to On.
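The trade-off can be sketched in Python. This is a rough model, not Apache's code: the function client_label and the stand-in resolver fake_resolver are hypothetical, and falling back to the IP address when the reverse lookup fails is an assumption of the sketch.

```python
import socket

def client_label(ip_address, hostname_lookups=False, resolver=socket.gethostbyaddr):
    """Return what would be logged for a client: its IP, or its hostname if lookups are on."""
    if not hostname_lookups:
        return ip_address              # no DNS traffic at all
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
        return hostname                # one reverse lookup per request
    except OSError:
        return ip_address              # assumed fallback when the lookup fails

# A stand-in resolver so the example does not depend on a real name server
def fake_resolver(ip_address):
    table = {"192.0.2.10": "client.example.com"}
    if ip_address not in table:
        raise OSError("no reverse record")
    return (table[ip_address], [], [ip_address])

print(client_label("192.0.2.10"))
print(client_label("192.0.2.10", hostname_lookups=True, resolver=fake_resolver))
print(client_label("192.0.2.99", hostname_lookups=True, resolver=fake_resolver))
```

With lookups off the function never touches the resolver, which is why the default is the cheaper choice.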

Logging: ErrorLog, LogLevel, LogFormat, and CustomLog

The Apache web server maintains two types of logs: transaction logs and error logs. Transaction logging occurs with every web request ("hit"), and is highly configurable, potentially logging to multiple files. In contrast, there is only one error log, and only two questions associated with it: where, and how much. We start with the simpler of the two.

Error Logging: ErrorLog and LogLevel

Figure 2-12. /etc/httpd/conf/httpd.conf

#
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here. If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog logs/error_log

#
# LogLevel: Control the number of messages logged to the error_log.
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
#
LogLevel warn

By default, the web server logs to the file /var/log/httpd/error_log (recall the role of the ServerRoot directive, and the /etc/httpd/logs symlink). For the main server, it’s hard to think of a reason to ever change it, though virtual hosts often override it.

More interesting is the LogLevel, which determines how much information is logged. The vocabulary draws directly from the syslog service. When troubleshooting, an administrator often ratchets up the logging by setting the LogLevel to debug, for example. Of course, more copious logging slows down overall performance, so once a problem has been resolved, logging is returned to a more suitable default.
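The effect of the LogLevel threshold can be modeled in a few lines of Python. This is a simplified sketch of the syslog-style ordering only; the list LEVELS and the function is_logged are hypothetical names, and real servers may treat some levels specially.

```python
# Syslog-style priorities, most severe first (the httpd.conf comment lists them in reverse)
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]

def is_logged(message_level, configured_level):
    """True if a message is at least as severe as the configured LogLevel."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)

print(is_logged("error", "warn"))   # more severe than the threshold
print(is_logged("info", "warn"))    # less severe than the threshold
print(is_logged("info", "debug"))   # debug admits every level
```

Setting the level to debug therefore admits every message, which explains both its usefulness when troubleshooting and its cost.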

Transaction Logging: LogFormat and CustomLog

For every web request, there is a large amount of information that an administrator can choose to log (or not). Such transaction logs are often referred to as "access logs". The LogFormat directive allows administrators to assign names to collections of information, so that they are easy to refer to later. This is all LogFormat does, however. To be used, a format must be associated with a CustomLog.

Figure 2-13. /etc/httpd/conf/httpd.conf

#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
#
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

# "combinedio" includes actual counts of actual bytes received (%I) and sent (%O); this
# requires the mod_logio module to be loaded.
#LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedi

The following table illustrates some of the parameters most commonly used in access logs.

Table 2-1. Apache Log Parameters

Parameter    References                               Example
%h           Remote host (IP or hostname)             127.0.0.1
%u           Remote user (for HTTP authentication)    elvis
%t           Timestamp                                [15/Jul/2005:06:55:44 -0400]
%r           Request line (from HTTP protocol)        GET /icons/compressed.gif HTTP/1.1
%s           HTTP response status code                200
%b           Response size (in bytes)                 1079
%{name}i     HTTP header name                         (depends on name)

Many more exist as well. As usual, with all of this flexibility comes the need for convention. Two commonly used conventions are the common format and the combined format, which are the first two formats defined above. The common format records IP address, username (if any), timestamp, request line, response status, and number of bytes transferred. 1

The combined format adds the identity of the client application, and the referring page (if any). While the combined format is used by default in Red Hat Enterprise Linux, administrators could well choose to drop back to the common format to save space and improve performance.

Many external log analysis utilities (such as webalizer) rely on logs being in a standard format, so an administrator should consider the consequences before changing the log format arbitrarily.
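To see why a predictable format matters to analysis tools, consider this minimal Python sketch that splits one combined-format line into named fields. The regular expression and the function parse_combined are illustrative assumptions, not part of any Apache tool, and the sample line is assembled from the example values in the table above with a made-up referer and user agent. It does not handle quotes escaped inside the request or header fields.

```python
import re

# Mirrors: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields for a combined-format line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b records "-" when no response body bytes were sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('127.0.0.1 - elvis [15/Jul/2005:06:55:44 -0400] '
          '"GET /icons/compressed.gif HTTP/1.1" 200 1079 '
          '"http://localhost/icons/" "Mozilla/5.0"')
entry = parse_combined(sample)
print(entry["host"], entry["status"], entry["size"])
```

A parser like this breaks as soon as the fields are reordered or dropped, which is the practical reason to think twice before inventing a custom format.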

Finally, once a format has been decided, it can be associated with a log file using the CustomLog directive.

Figure 2-14. /etc/httpd/conf/httpd.conf

#
# The location and format of the access logfile (Common Logfile Format).
# If you do not define any access logfiles within a <VirtualHost>
# container, they will be logged here. Contrariwise, if you *do*
# define per-<VirtualHost> access logfiles, transactions will be
# logged therein and *not* in this file.
#
#CustomLog logs/access_log common

#
# If you would like to have separate agent and referer logfiles, uncomment
# the following directives.
#
#CustomLog logs/referer_log referer
#CustomLog logs/agent_log agent

#
# For a single logfile with access, agent, and referer information
# (Combined Logfile Format), use the following directive:
#
CustomLog logs/access_log combined

As the above configuration suggests, multiple log files, each containing different information, could be updated with each hit, though of course performance is a consideration. By default, Red Hat Enterprise Linux only updates the single file /var/log/httpd/access_log, using the combined format.


Remapping the URL Namespace: Alias

Up until now, we have had a very clean concept of the URL namespace: the file portion of a URL maps directly to a file which exists underneath the document root directory. The Alias directive allows administrators to make arbitrary mappings from a portion of the URL namespace to any directory in the filesystem.

Figure 2-15. /etc/httpd/conf/httpd.conf

# Aliases: Add here as many aliases as you need (with no limit). The format is
# Alias fakename realname
#
# Note that if you include a trailing / on fakename then the server will
# require it to be present in the URL. So "/icons" isn't aliased in this
# example, only "/icons/". If the fakename is slash-terminated, then the
# realname must also be slash terminated, and if the fakename omits the
# trailing slash, the realname must also omit it.
#
# We include the /icons/ alias for FancyIndexed directory listings. If you
# do not use FancyIndexing, you may comment this out.
#
Alias /icons/ "/var/www/icons/"

As an example, the default Red Hat Enterprise Linux configuration aliases http://localhost/icons/ to the directory /var/www/icons/, which is not underneath the document root, but a sibling of it. The remapping should be easy enough to confirm by following the above link and comparing the result with an ls of the /var/www/icons directory.

For better or for worse, we now have a way to expose portions of our filesystem which are not under the document root. Another option is the use of symbolic links, which will be discussed in more detail shortly.

Also, notice the comments about trailing slashes, which have often been a source of confusion. The Apache webserver automatically redirects clients which refer to directories without the trailing slash to an equivalent URL which includes it (watch closely as you access http://localhost/example, and note that the browser ends up showing the omitted trailing slash). As a result, directory-related configuration which doesn’t specify the trailing slash can end up being interpreted twice, with surprising results.
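The prefix substitution performed by Alias, including the trailing slash behavior described in the configuration comments, can be modeled with a small Python sketch. It is a simplification under stated assumptions: the names DOCUMENT_ROOT, ALIASES, and url_to_path are hypothetical, and the real server does considerably more (redirects, access checks, and so on).

```python
DOCUMENT_ROOT = "/var/www/html"
ALIASES = [("/icons/", "/var/www/icons/")]   # (fakename, realname) pairs

def url_to_path(url_path):
    """Translate the path part of a URL to a filesystem path (simplified model)."""
    for fakename, realname in ALIASES:
        if url_path.startswith(fakename):
            # Swap the aliased prefix for the real directory
            return realname + url_path[len(fakename):]
    # No alias matched: fall back to the document root
    return DOCUMENT_ROOT + url_path

print(url_to_path("/icons/compressed.gif"))  # /var/www/icons/compressed.gif
print(url_to_path("/icons"))                 # /var/www/html/icons (no trailing slash, no alias)
print(url_to_path("/index.html"))            # /var/www/html/index.html
```

The second call shows why "/icons" without the trailing slash is not aliased when the fakename is written as "/icons/".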

The Answer Book: http://localhost/manual

By now you could well be bewildered by the many different configuration directives, and in many ways we’ve just touched the tip of the iceberg. This seems a good time to introduce the manual, which in Red Hat Enterprise Linux ships as the separate httpd-manual package. Once installed, the manual can be accessed at http://localhost/manual.

[root@station ~]# yum install httpd-manual
=============================================================================
 Package              Arch       Version          Repository        Size
=============================================================================
Installing:
 httpd-manual         i386       2.2.3-6.el5      rha-rhel         831 k

Installed: httpd-manual.i386 0:2.2.3-6.el5
Complete!
[root@station ~]# service httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]

The manual provides comprehensive documentation, organized by directive name, module name, or by topic (such as "Log Files" or "Virtual Hosts"). Anyone wishing to quickly refresh memories, or learn more about Apache configuration, should definitely load the manual as well.

Exercises

Lab Exercise

Objective: Configure the Apache web server.

Estimated Time: 45 mins.

Specification

You will probably want to make a backup of the main Apache configuration file (/etc/httpd/conf/httpd.conf) before starting this exercise, so that you can later restore the default configuration. If you have not already downloaded http://rha-server/pub/rha/rha230/readings.tgz and extracted its contents into the /var/www/html directory (as specified in the previous exercise), do so now.

Edit your Apache configuration so that the server meets the following specifications. The suggested technique is to duplicate the relevant lines of your configuration file, comment out the original configuration, and edit the new line to make your changes. You will probably want to make incremental changes, checking your configuration as you go.

1. Configure the Apache webserver so that it accepts HTTP/1.1 KeepAlive requests, but will only wait 3 seconds for a followup request before closing the connection.

Hint: you can confirm this configuration by capturing a transaction between the Firefox browser and your webserver with ethereal, and examining the HTTP headers of both the request and response.

2. Manage the bounds of the server pool, such that there are always between 2 and 4 (inclusive) child daemons present.

3. The Apache server should be bound to port 8888 (of at least the loopback address), in addition to port 80 (on all interfaces). (Note: you will need to drop SELinux into permissive mode in order to allow Apache to bind to a port other than 80 and 443).

4. Configure the web server such that index.htm is recognized as an index file, as well as index.html. Confirm your configuration by removing the file /var/www/html/readings/relat10h/index.html that you created in the previous exercise, if it exists, leaving the original /var/www/html/readings/relat10h/index.htm, and referencing http://localhost/readings/relat10h/.

5. Configure the server so that clients are logged by hostname (when available) as opposed to IP address. (Hint: You are not expected to need to edit any LogFormat directives).

6. Set the log level for the error log to debug.

7. In addition to the default logging, have every web request logged to the file /var/log/httpd/common_log, using what is commonly referred to as the common format.

8. In the separate configuration file /etc/httpd/conf.d/rha.conf, establish an Alias, so that the URL http://localhost/images/ refers to the directory /var/www/html/readings/relat10h/pics. (If the relevant directory is still named picts, rename it or symlink it to pics).

Deliverables

1. A running Apache webserver, that accepts Keep-Alive requests, but will close connections after 3 seconds of inactivity.

2. The server should maintain a server pool of between 2 and 4 pre-forked child daemons.

3. The server should be bound to the loopback address’s port 8888, in addition to the normal port 80.

4. The server should treat files named index.htm as index files, in addition to the standard index.html.

5. Transaction logging should log clients by hostname, if available.

6. The error log should log all messages with debug and higher priority.

7. In addition to the standard access_log, a transaction log named /var/log/httpd/common_log should be kept, logging in the common format.

8. The URL http://localhost/images/ should resolve to /var/www/html/readings/relat10h/pics, due to an alias established in the /etc/httpd/conf.d/rha.conf configuration file.

Questions

For all of the following questions, assume the default Red Hat Enterprise Linux configuration of the Apache webserver, unless the question states otherwise.

1. Which directory serves as the ServerRoot directory (i.e., the directory used as the base for all relative file references in the configuration file) ?

( ) a. /var/www/html

( ) b. /var/log/httpd

( ) c. /etc/httpd

( ) d. /etc

( ) e. None of the above

2. Which file(s) is(are) used to configure the Apache web server upon startup?

( ) a. /etc/httpd/conf/httpd.conf

( ) b. /etc/apache.conf

( ) c. /etc/httpd/conf.d/*.conf

( ) d. /etc/sysconfig/apache

( ) e. Both A and C

3. Which of the following directives could be used to improve the performance of a heavily loaded web server?

( ) a. KeepAlive

( ) b. MaxClients

( ) c. MaxSpareServers

( ) d. Timeout

( ) e. All of the above

4. Which of the following directives can be used to defend against memory leaks and other instabilities in poorly written libraries and CGI scripts?

( ) a. MaxClients

( ) b. MaxRequestsPerChild

( ) c. ServerLimit

( ) d. KeepAlive

( ) e. Listen

5. Which of the following best describes the default Apache server model?

( ) a. The server uses a traditional Unix forking model, where a new daemon is forked to handle connections for a particular client.

( ) b. The server uses a pre-forking model, whereby clients are distributed amongst a dynamic pool of pre-existing daemons.

( ) c. The server uses a multi-threaded model, whereby a single process clones multiple threads, each handling a distinct client.


( ) d. The server uses a single process polling model, whereby the single process polls a collection of active connections for activity.

6. Which of the following lines would cause the web server to bind to port 8080 on the loopback address?

( ) a. Bind 127.0.0.1:8080

( ) b. Bind 127.0.0.1 8080

( ) c. Listen 127.0.0.1:8080

( ) d. Listen 127.0.0.1 8080

( ) e. None of the above

7. The Apache manual states that %h is used to log the remote hostname or IP address. Yet, even using this parameter, an administrator finds that a log file records IP addresses instead. Which of the following configurations would allow client hostnames to be logged?

( ) a. DNS /etc/resolv.conf

( ) b. HostnameLookups On

( ) c. LogNames On

( ) d. LogLevel info

( ) e. None of the above

8. Which of the following directives would have the same end effect as cd /var/www/html/data; ln -s /images images ?

( ) a. Alias /data/images/ /var/www/html/images/

( ) b. Symlink /images/ /data/images/

( ) c. Alias /images/ /data/images/

( ) d. View /var/www/html/images/ /data/images/

( ) e. None of the above

9. Assuming the httpd-manual package is installed, where can Apache documentation be found?

( ) a. http://localhost/help

( ) b. http://localhost/guide

( ) c. http://localhost/apache

( ) d. http://localhost/man

( ) e. None of the above


10. After editing an Apache configuration file, what should be done for changes to take effect?

( ) a. chkconfig httpd on

( ) b. service httpd restart

( ) c. chkconfig httpd reload

( ) d. service httpd status

( ) e. No action is required, because the apache daemon actively monitors its configuration file.

Notes

1. The observant might notice the omission of the second field, inevitably a hyphen ("-"). This field used to refer to the username as returned by the legacy identd service, which is seldom implemented today.


Chapter 3. Apache Configuration: Containers

Key Concepts

The Apache web server allows context dependent configuration through the use of Directory, Location, Files, and VirtualHost containers.

Often, the Options directive is used within containers to allow or disallow symbolic link resolution (with FollowSymLinks) and dynamic directory generation (with Indexes), among other parameters.

Often, the Order, allow from, and deny from directives are used within containers to implement access control based on the client’s IP address or hostname.

The default Red Hat Enterprise Linux configuration allows the resolution of symbolic links almost everywhere, but limits the generation of dynamic indexes to the intended document root directory.

Dynamic information about the Apache webserver can be obtained using custom handlers which are conventionally associated with the /server-status and /server-info locations.

Discussion

Tailoring Customization to Particular Content: Containers

The Apache webserver allows configuration to be customized to particular files or directories using containers. Containers start with an XMLish opening tag, such as <Directory>, and end with an XMLish closing tag, such as </Directory>. Directives found within the container only affect files which fall under the container’s scope.

There are essentially four types of scoping containers, which are exemplified below and itemized in the following table.

Figure 3-1. Sample Apache Containers

<Directory "/var/www/icons">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from .example.com
</Location>

<Files ~ "*.hide">
    Order allow,deny
    Deny from all
</Files>

<VirtualHost *:80>
    ServerAdmin webmaster@dummy-host.example.com
    DocumentRoot /www/docs/dummy-host.example.com
    ServerName dummy-host.example.com
    ErrorLog logs/dummy-host.example.com-error_log
    CustomLog logs/dummy-host.example.com-access_log common
</VirtualHost>

Table 3-1. Apache Scoping Containers

Directive      Scope
Directory      All files which exist in or underneath the specified directory in the filesystem,
               after URL to filename translation occurs.
Location       All files which exist in or underneath the specified location in the URL
               namespace, before URL to filename translation occurs.
Files          All files which match the specified pattern, no matter where they exist in the
               filesystem or URL namespace.
VirtualHost    All files served by a particular virtual server. Virtual hosts will be covered in
               detail in a later lesson.

The argument to the opening tag specifies the relevant file or directory (or, in the case of VirtualHost, IP address). The filename may either be explicit, or shell-like pathname expansion (file globbing) can be used.

Common Container Configuration

Skimming the containers exemplified above, one finds that container configuration often involves the following three concepts.

1. Options: Various capabilities of the web server are grouped under a general Options directive.

2. ACLs: The web server allows access control lists (or ACLs, informally pronounced "Ack-uls") to specify which clients are allowed to access information, using the Order, Allow, and Deny directives. (Access control can also be based on authenticated users, unfortunately a topic beyond the scope of the current course).

3. Overrides: If allowed with the AllowOverride directive, local configuration files intermixed with webserver content can dynamically override the startup configuration.

We look at each of these syntaxes in turn.


General Options: Options

The Apache server supports the following options, which are specified as arguments to the Options directive, usually within a scoping container. Of these, the first two are most commonly used.

Table 3-2. Apache Options

Option                    Effect
Indexes                   When a URL references a directory (as opposed to a regular file), no index.html file is present (more on this in a bit), and this option is enabled, the web server will return an automatically generated directory listing. If Indexes is disabled, a 403 error page (Access Forbidden) will be returned to the client.
FollowSymLinks            This option must be enabled in order for the web server to resolve (follow) a symbolic link.
SymLinksIfOwnerMatch      A qualification of the FollowSymLinks option, where the symlink will only be followed if the owner of the resulting file is the same as the owner of the link itself.
ExecCGI                   Allows CGI executables to be executed from within this scope. (More on these later.)
Includes, IncludesNOEXEC  Server side includes are allowed (or, in the latter case, mostly allowed) from within this scope. Server side includes are beyond the scope of this course.
MultiViews                If enabled, content negotiation between the client and the server is supported. This allows a server to serve a document in the most appropriate of multiple languages, for example. Further discussion of MultiViews is beyond the scope of this course.
All                       This option refers to all of the previous options collectively, with the exception of MultiViews. Unless otherwise specified, this is the default configuration. (Recall that in Red Hat Enterprise Linux, however, a different policy applies to the root directory, effectively establishing a different default.)

Why not Indexes?

The decision whether to allow the web server to automatically generate indexes is really a matter of control. If indexes are automatically generated, then merely placing a file underneath the document root allows anyone to view or copy it (often with automated command line clients such as wget), unless an index.html file is created to hide the files within a particular directory. In contrast, if indexes are not allowed, files must be explicitly linked from other files (index.html or otherwise) to be easily discovered.

Many low-maintenance public web sites, such as the official Linux kernel repository (http://www.kernel.org/pub/linux), leave indexes on. Other web sites, hoping for a more professional look or more refined control of information, do not.
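
A site that prefers the second approach might use something like the sketch below. The directory name is hypothetical. Because an Options line written without "+" or "-" lists the complete set of enabled options, leaving Indexes out is enough to turn listings off for this scope.

```apache
# Hypothetical directory: files here are still served when requested by name,
# but a request for the directory itself gets a 403 if no index.html exists.
<Directory /var/www/html/private>
    Options FollowSymLinks
</Directory>
```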

Why not resolve Symbolic Links?

Again, the decision to allow symlink resolution is basically one of control. If symlinks are not allowed, an administrator has a clear concept of what portions of the file system are exposed through the web server (only files underneath the document root). If symlinks are resolved, however, a symlink underneath the document root could expose any other part of the filesystem.

More subtly, the decision not to resolve symlinks can degrade performance. When resolving a path to reference a file, the kernel automatically resolves symlinks. (If you were to cat the file /foo/biz/baz/buzz, you would not need to worry whether the directory biz or baz is actually a symlink.) If symlink resolution is disabled, however, the web server must make a system call on each of the nodes within a file path, asking "is it a symlink? is it a symlink? is it a symlink?" This degradation is one of the reasons why the default Red Hat Enterprise Linux configuration leaves FollowSymLinks enabled.
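
Where FollowSymLinks feels too permissive, the SymLinksIfOwnerMatch option from Table 3-2 is one possible middle ground. The sketch below assumes a hypothetical directory of user-maintained content. Note that the ownership comparison still requires the server to examine each node of the path, so this choice should not be expected to recover the performance cost described above.

```apache
# Hypothetical directory holding user-maintained content.
<Directory /var/www/html/users>
    # Follow a symlink only if the link and its target have the same owner.
    Options SymLinksIfOwnerMatch
</Directory>
```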

Options Syntax

The Options directive takes effect for the scope specified by its enclosing container. For example, the following container would enable indexes and symlink resolution for all files underneath the directory /var/www/html.

<Directory /var/www/html>
    Options FollowSymLinks Indexes
</Directory>

The following container, however, would enable indexes and server side includes underneath /var/www/html/widgets.

<Directory /var/www/html/widgets>
    Options Indexes Includes
</Directory>

The directory /var/www/html/widgets does not inherit its options from /var/www/html, but instead gets its configuration entirely from the new Options line. Because FollowSymLinks is not mentioned, symlinks underneath /var/www/html/widgets will not be resolved.

In contrast, options can be preceded by a "+" or "-", implying that options should be inherited from the enclosing scope, with the simple addition or stripping of a particular option. Consider rewriting the above container as follows.

<Directory /var/www/html/widgets>
    Options +Includes
</Directory>

In this case, the /var/www/html/widgets directory would have Includes, Indexes, and FollowSymLinks enabled (the latter two inherited from /var/www/html).

Similarly, the following container would leave /var/www/html/widgets with only the FollowSymLinks option enabled.

<Directory /var/www/html/widgets>
    Options -Indexes
</Directory>
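
The "+" and "-" forms can also appear together on one line. The sketch below repeats the parent container so that the inheritance is visible. The choice of ExecCGI is arbitrary and only for illustration.

```apache
<Directory /var/www/html>
    Options FollowSymLinks Indexes
</Directory>

# Inherits FollowSymLinks and Indexes from the container above,
# then removes Indexes and adds ExecCGI.
<Directory /var/www/html/widgets>
    Options -Indexes +ExecCGI
</Directory>
# Effective options for /var/www/html/widgets: FollowSymLinks ExecCGI
```

It is probably wise to give every option on such a line a sign. Mixing signed and unsigned options in a single Options directive tends to produce surprising results.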

Client Access Control: Order, Allow, Deny

The Apache web server allows an administrator to impose access control restrictions on a directory-by-directory (or even file-by-file) basis using access control lists. These ACLs are composed of the following directives.

The Allow Directive

The Allow directive uses the following syntax to specify which clients are allowed to connect to a given resource.

Allow from client_specification

The client_specification is composed of a whitespace separated list of any of the following elements.

Table 3-3. Apache ACL client specification

Syntax                         Example                        Meaning
ALL                            ALL                            All clients
Full IP addresses              192.168.0.3                    The specified client
Partial IP addresses           172.63.                        All clients whose IP address begins as specified
Network/Netmask notation       192.168.1.64/255.255.255.192   All clients who belong to the specified subnet
CIDR notation                  192.168.1.64/26                All clients who belong to the specified subnet (this example is completely equivalent to the preceding example)
A full or partial domain name  .example.com                   All clients whose reverse lookup domain name ends as specified (Apache performs a double reverse DNS lookup on the client address for this check, regardless of the HostnameLookups setting)
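
To show how several of these elements combine in a single whitespace-separated list, here is a minimal sketch. The directory name is hypothetical and the addresses simply reuse the examples from Table 3-3. The Order and Deny lines are explained in the sections that follow.

```apache
# Hypothetical directory; addresses are the examples from Table 3-3.
<Directory /var/www/html/intranet>
    Order Deny,Allow
    Deny from all
    # One Allow line may carry several whitespace-separated elements.
    # 192.168.1.64/26 covers 192.168.1.64 through 192.168.1.127,
    # the same range as 192.168.1.64/255.255.255.192.
    Allow from 192.168.0.3 192.168.1.64/26 .example.com
</Directory>
```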

The Deny Directive

The Deny directive uses an identical syntax to specify which clients are not allowed to connect to a given resource.

Deny from client_specification

The client_specification is composed of the same elements as for the Allow directive.

The Order directive

Here’s where things get interesting. Whenever client ACLs are specified with the Allow and Deny directives, the order of precedence must be specified with the Order directive.

The Order directive usually comes in one of two forms.

Order Allow,Deny

In this case, any clients which are unspecified (not matching any rule) or over specified (matching both an Allow and a Deny rule) are denied.

Order Deny,Allow

In this case, any clients which are unspecified or over specified are allowed.
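
The two forms can be compared with a pair of sketches. All directory names, addresses, and host names below are assumptions chosen for illustration.

```apache
# Order Allow,Deny: unspecified clients are denied.
<Directory /var/www/html/staff>
    Order Allow,Deny
    Allow from 192.168.0.
    # 192.168.0.13 matches both rules (over specified), so it is denied.
    Deny from 192.168.0.13
</Directory>

# Order Deny,Allow: unspecified clients are allowed.
<Directory /var/www/html/public>
    Order Deny,Allow
    Deny from .example.com
    # This host matches both rules (over specified), so it is allowed.
    Allow from trusted.example.com
</Directory>
```

In the first container, only clients in 192.168.0.* other than 192.168.0.13 get through. In the second, everyone gets through except hosts in example.com, with trusted.example.com as the exception to the exception.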