In 2003, Nick Kew released a new module that complements Apache's mod_proxy and is essential
for reverse-proxying. Since then he has received regular questions and requests for help on
proxying with Apache. In this article he attempts to give a comprehensive overview of proxying
and mod_proxy_html.
This article was originally published at ApacheWeek in January 2004, and moved to ApacheTutor
with minor updates in October 2006. The current revision was made in October 2009 and
incorporates updates in Apache 2.2 and mod_proxy_html 3.1.
A proxy server is a gateway for users to the Web at large. Users configure
the proxy in their browser settings, and all HTTP requests are routed via
the proxy. Proxies are typically operated by ISPs and network
administrators, and serve several purposes, for example improving
performance by caching.
A reverse proxy is a gateway for servers, and enables one web server to
provide content from another transparently. As with a standard proxy, a
reverse proxy may serve to improve performance of the web by caching;
this is a simple way to mirror a website. Load balancing a heavy-duty
application, or protecting a vulnerable one, are other common uses. But
the most common reason to run a reverse proxy is to enable controlled
access from the Web at large to servers behind a firewall.
1 de 11 11/5/17 11:24
Running a Reverse Proxy with Apache: http://www.apachetutor.org/admin/reverseproxies
Apache 2.2 brings major improvements over Apache 2.0 in both proxying
and caching, and is also the first version to support load-balancing as
standard. If you are using an older Apache version, it is strongly
recommended you upgrade.
Having mentioned the modules, I'm going to ignore caching for the
remainder of this article. You may want to add it if you are concerned
about the load on your network or origin servers, but the details are
outside the scope of this article. I'm also going to ignore all non-HTTP
protocols, and load balancing.
Note: if you are installing Apache from a package, you will just need to
install packages for Apache, libxml2 and third-party modules according to
your distributor's conventions, which may differ from what is described
here.
Most of the above modules are included in the core Apache distribution.
They can easily be enabled in the Apache build process, for example by
passing configure the --enable-so option together with an
--enable-mods-shared list naming the proxy modules.
Of course, you may want other build options too, and you could just as
well build the modules as static.
If you are adding proxying to an existing installation, you should use apxs
instead:
# apxs -c -i [module-name].c
noting that mod_proxy itself is in two source files
(mod_proxy.c and proxy_util.c).
The company also has a couple of application servers which have private
IP addresses and unregistered DNS entries, and are inside the firewall.
The application servers are visible within the network, including to the
webserver, as "internal1.example.com" and "internal2.example.com". But
because they have no public DNS entries, anyone looking at
internal1.example.com from outside the company network will get a "no
such host" error.
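The proxy needs the relevant modules loaded in httpd.conf before any of this can be configured. The following is a minimal sketch of such a section, assuming a Unix-like layout: the modules/ directory and the /usr/lib/libxml2.so path are assumptions that will vary between installations, and the exact set of modules depends on your needs.

```apache
# Core proxy modules. proxy_ftp and proxy_connect are not needed for a
# plain HTTP reverse proxy, so they are left commented out.
LoadModule  proxy_module          modules/mod_proxy.so
LoadModule  proxy_http_module     modules/mod_proxy_http.so
#LoadModule proxy_ftp_module      modules/mod_proxy_ftp.so
#LoadModule proxy_connect_module  modules/mod_proxy_connect.so

# Used for request header manipulation and compression handling.
LoadModule  headers_module        modules/mod_headers.so
LoadModule  deflate_module        modules/mod_deflate.so

# mod_proxy_html depends on libxml2, which must be loaded first.
# (Path is an assumption; adjust to where your system installs it.)
LoadFile    /usr/lib/libxml2.so
LoadModule  xml2enc_module        modules/mod_xml2enc.so
LoadModule  proxy_html_module     modules/mod_proxy_html.so
```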
For Windows users this is slightly different: you'll need to load libxml2.dll
rather than libxml2.so, and you'll probably need to load iconv.dll and
the zlib DLL as prerequisites to libxml2 (you can download them from
zlatkovic.com, the same site that maintains Windows binaries of libxml2).
The LoadFile directive is the same.
Of course, you may not need all the modules. Two that are not required in
our typical scenario are shown commented out above.
Having loaded the modules, we can now configure the Proxy. But before
doing so, we have an important security warning:
Of course, you may also want to run a forward proxy with appropriate
security measures, but that lies outside the scope of this article. The author
runs both forward and reverse proxies on the same server (but under
different Virtual Hosts).
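For the scenario in this article, the basic reverse proxy might be sketched as follows. The directives and hostnames are the ones used in the complete configuration later in the article; ProxyRequests Off keeps forward proxying disabled.

```apache
# Do not act as a forward proxy.
ProxyRequests Off

# Map each public path onto the corresponding internal server.
ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/
```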
However, this is not the whole story. ProxyPass just sends traffic straight
through. So when the application servers generate references to
themselves (or to other internal addresses), they will be passed straight
through to the outside world, where they won't work.
For example, an HTTP redirection often takes place when a user (or
author) forgets a trailing slash in a URL. So a request for
http://www.example.com/app1/foo proxies to
http://internal1.example.com/foo, which generates a redirect response
whose Location header is http://internal1.example.com/foo/.
But from the outside world, the net effect of this is a "No such host" error.
The proxy needs to re-map the Location header to its own address space
and return a valid URL.
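mod_proxy's ProxyPassReverse directive performs this rewriting of response headers. As a sketch, the two-argument form, using this article's paths and hostnames, looks like this:

```apache
# Rewrite response headers such as Location that point at the backend,
# so that they point into the proxy's own address space instead.
ProxyPassReverse /app1/ http://internal1.example.com/
ProxyPassReverse /app2/ http://internal2.example.com/
```

A slightly more involved alternative, and the one recommended here, places a one-argument ProxyPassReverse inside a <Location> container for each application: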
<Location /app1/>
ProxyPassReverse /
</Location>
<Location /app2/>
ProxyPassReverse /
</Location>
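The reason for recommending this form is that a problem arises with some
application servers. Suppose, for example, a backend issues a redirect
whose Location header contains only a path rather than a full URL. A rule
that matches only the backend's full URL passes this through unchanged, so
the browser resolves it against the proxy's own address space and it won't
work. The form above fixes this by mapping the path into /app1/ or /app2/
as appropriate.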
If your backend server uses cookies, you may also need the
ProxyPassReverseCookiePath and ProxyPassReverseCookieDomain
directives. These are similar to ProxyPassReverse, but deal with the
different form of cookie headers. These require mod_proxy from Apache
2.2 (recommended), or a patched version of 2.0.
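As an illustration, cookie rewriting for the first application might be sketched like this. The public hostname www.example.com is taken from the example URL earlier in the article; whether you need either directive depends on the path and domain attributes your backend actually sets on its cookies.

```apache
<Location /app1/>
    # Backend cookies scoped to path "/" become scoped to "/app1/".
    ProxyPassReverseCookiePath / /app1/
    # Backend cookies scoped to the internal host become scoped to the
    # public one.
    ProxyPassReverseCookieDomain internal1.example.com www.example.com
</Location>
```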
Links within the pages themselves are a separate problem: fixing them
requires us to parse the HTML and rewrite the links. This is the
purpose of mod_proxy_html. It works as an output filter, parsing the
HTML and rewriting links as it is served. Two basic configuration
directives are required to set it up:
ProxyHTMLEnable On
    This activates mod_proxy_html (and mod_xml2enc if available) for
    the request, and enables ProxyHTMLURLMap and other directives.
ProxyHTMLURLMap from-pattern to-pattern [flags] [cond]
    In its basic form, this has a similar purpose and semantics to
    ProxyPassReverse. Additionally, an extended form is available to
    enable search-and-replace rewriting of URLs within Scripts and
    Stylesheets.
How it works
mod_proxy_html has full knowledge of all URI attributes that can occur in
HTML 4 and XHTML 1. Whenever a URL is encountered, it is matched against
applicable ProxyHTMLURLMap directives. If it starts with any
from-pattern, that will be rewritten to the to-pattern. Rules are applied in
the reverse order to their appearance in httpd.conf, and matching stops as
soon as a match is found.
Here's how we set up a reverse proxy for HTML. Firstly, full links to the
internal servers should be rewritten regardless of where they arise, so we
have:
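As they appear in the complete configuration later in this article, the global rules for the two internal servers are:

```apache
# Rewrite any full link to an internal server into the proxy's address space.
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2
```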
Note that in this instance we omitted the "trailing" slash. Since the
matching logic is starts-with, we use the minimal matching pattern. We
have now globally fixed case 3 above.
Case 2 above requires a little more care. Because the link doesn't include
the hostname, the rewrite rule must be context-sensitive. As with
ProxyPassReverse above, we deal with that using <Location> containers:
<Location /app1/>
ProxyHTMLURLMap / /app1/
</Location>
<Location /app2/>
ProxyHTMLURLMap / /app2/
</Location>
ProxyHTMLLogVerbose On
LogLevel Info (or LogLevel Debug)
Now run your test cases through your rulesets, and examine the Apache
error log for details of exactly how each was processed.
The previous section sets up remapping of HTML URLs, but leaves any
URL encountered in a Stylesheet or Script untouched. mod_proxy_html
doesn't parse Javascript or CSS, so dealing with URLs in them requires
text-based search-and-replace. This is enabled by the directive
ProxyHTMLExtended On.
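A minimal sketch of enabling it for one application, reusing the full-URL rule from this article. Only that rule is shown because a very short pattern such as "/" is a poor fit for text-based search-and-replace; test the result carefully with ProxyHTMLLogVerbose before relying on it.

```apache
<Location /app1/>
    ProxyHTMLEnable On
    # Also apply URL maps to embedded scripts, stylesheets and
    # scripting events, not just to HTML link attributes.
    ProxyHTMLExtended On
    ProxyHTMLURLMap http://internal1.example.com /app1
</Location>
```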
We just set up a proxy to parse and where necessary correct HTML. But
of course, the web isn't just HTML. Surely feeding non-HTML content
through an HTML parser is at best inefficient, if not totally broken?
Content-Type: text/html
Content-Encoding: gzip
There are two solutions to this. One is to uncompress the incoming data
with mod_deflate. Uncompressing and compressing content radically
reduces network traffic, but increases the processor load on the proxy. It is
worthwhile if and only if bandwidth between the proxy and the backend is
at a premium: this is common on the 'net at large, but unlikely to be the
case on a company internal network.
SetOutputFilter INFLATE;DEFLATE
This should only apply to the Proxy, so we put it inside our <Location>
containers.
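Placed inside a container, the two approaches might be sketched as below. The SetOutputFilter line is the one shown above. RequestHeader unset Accept-Encoding is the directive used in this article's complete configuration, and it is assumed here to be the alternative to decompressing on the proxy.

```apache
<Location /app1/>
    # Option 1: decompress backend responses so they can be parsed,
    # then recompress them for the client.
    SetOutputFilter INFLATE;DEFLATE

    # Option 2 (instead of option 1): strip Accept-Encoding from the
    # request so the backend does not compress in the first place.
    #RequestHeader unset Accept-Encoding
</Location>
```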
Supporting HTTPS on the proxy requires mod_ssl and a certificate on the
proxy, so that the actual secure session is between the browser and the
proxy, not the origin server.
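A rough sketch of such a setup follows, assuming mod_ssl is loaded and the server listens on port 443. The certificate and key paths are placeholders, not real locations, and www.example.com is the public hostname used in this article's examples.

```apache
<VirtualHost *:443>
    ServerName www.example.com

    # The browser's secure session ends here, at the proxy.
    SSLEngine On
    SSLCertificateFile    /path/to/www.example.com.crt
    SSLCertificateKeyFile /path/to/www.example.com.key

    # Traffic from the proxy to the internal server is plain HTTP.
    ProxyPass        /app1/ http://internal1.example.com/
    ProxyPassReverse /app1/ http://internal1.example.com/
</VirtualHost>
```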
ProxyRequests off
ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2
<Location /app1/>
ProxyPassReverse /
ProxyHTMLEnable On
ProxyHTMLURLMap / /app1/
RequestHeader unset Accept-Encoding
</Location>
<Location /app2/>
ProxyPassReverse /
ProxyHTMLEnable On
ProxyHTMLURLMap / /app2/
RequestHeader unset Accept-Encoding
</Location>
Of course, there's more than one way to do it. Our configuration would
actually have been simpler if we'd used Virtual Hosts for each application
server. But that takes you beyond the realm of Apache configuration and
into DNS. If you don't fully understand that (or if you think "why can't I
see my domain" is a webserver question), then please don't try using
virtual hosts for this.
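For comparison, a virtual host per application might be sketched as follows. The name app1.example.com is hypothetical and would need its own public DNS entry, and on Apache 2.2 name-based virtual hosts also need a matching NameVirtualHost directive. Because the application is served from the root of its own host, no per-path remapping is needed.

```apache
<VirtualHost *:80>
    # Hypothetical public name for the first application.
    ServerName app1.example.com

    ProxyPass        / http://internal1.example.com/
    ProxyPassReverse / http://internal1.example.com/

    ProxyHTMLEnable On
    # Turn full internal links into links relative to this host's root.
    ProxyHTMLURLMap http://internal1.example.com/ /
    RequestHeader unset Accept-Encoding
</VirtualHost>
```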
NOTE
Caching
Load Balancing
A reverse proxy is not the natural place for a "family filter", but is ideal
for defining access controls and imposing security restrictions. We could,
for example, configure the proxy to recognise a custom header from an
origin server and block content based on it. This delegates control to the
application servers.
1. Changing the FPI (the <!DOCTYPE ...> line) may affect some
browsers. FIX: set the doctype explicitly if this bothers you.
<body>
<p>Hello, World!
</body>
will be transformed to
<body>
<p>Hello, World!</p>
</body>
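On the doctype point in the list above, mod_proxy_html's ProxyHTMLDoctype directive is one way to set the declaration explicitly. A minimal sketch, to be adapted to whatever your content actually is:

```apache
# Declare output as XHTML 1.0 and use XHTML syntax for it...
ProxyHTMLDoctype XHTML

# ...or, alternatively, declare HTML 4.01 in its Transitional variant:
#ProxyHTMLDoctype HTML Legacy
```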