Beruflich Dokumente
Kultur Dokumente
Cookies
-------
Are you sure the server is sending you any cookies in the first place? Maybe
the server is keeping track of state in some other way (`HIDDEN` HTML form
entries (possibly in a separate page referenced by a frame), URL-encoded
session keys, IP address, HTTP `Referer` headers)? Perhaps some embedded
script in the HTML is setting cookies (see below)? Turn on
[logging](#logging).
~~~~{.python}
import mechanize
cj = mechanize.LWPCookieJar()
opener = mechanize.build_opener(mechanize.HTTPCookieProcessor(cj))
mechanize.install_opener(opener)
r = mechanize.urlopen("http://foobar.com/")
cj.save("/some/file", ignore_discard=True, ignore_expires=True)
~~~~
JavaScript code can set cookies; mechanize does not support this. See [the
FAQ](faq.html#script).
General
-------
Enable [logging](#logging).
Sometimes, a server wants particular HTTP headers set to the values it expects.
For example, the `User-Agent` header may need to be [set](./doc.html#headers)
to a value like that of a popular browser.
Check that the browser is able to do manually what you're trying to achieve
programatically. Make sure that what you do manually is *exactly* the same as
what you're trying to do from Python -- you may simply be hitting a server bug
that only gets revealed if you view pages in a particular order, for example.
Try comparing the headers and data that your program sends with those that a
browser sends. Often this will give you the clue you need. There are [browser
addons](faq.html#sniffing) available that allow you to see what the browser
sends and receives even if HTTPS is in use.
If nothing is obviously wrong with the requests your program is sending and
you're out of ideas, you can reliably locate the problem by copying the headers
that a browser sends, and then changing headers until your program stops
working again. Temporarily switch to explicitly sending individual HTTP
headers (by calling `.add_header()`, or by using `httplib` directly). Start by
sending exactly the headers that Firefox or IE send. You may need to make sure
that a valid session ID is sent -- the one you got from your browser may no
longer be valid. If that works, you can begin the tedious process of changing
your headers and data until they match what your original code was sending.
You should end up with a minimal set of changes. If you think that reveals a
bug in mechanize, please [report it](support.html).
Logging
-------
~~~~{.python}
import sys, logging
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.DEBUG)
~~~~
You can reduce the amount of information shown by setting the level to
`logging.INFO` instead of `logging.DEBUG`, or by only enabling logging for one
of the following logger names instead of `"mechanize"`:
* `"mechanize"`: Everything.
HTTP headers
------------
~~~~{.python}
import sys, logging
import mechanize
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.DEBUG)
browser = mechanize.Browser()
browser.set_debug_http(True)
browser.set_debug_responses(True)
browser.set_debug_redirects(True)
response = browser.open("http://python.org/")
~~~~
Alternatively, you can examine request and response objects to see what's going
on. Note that requests may involve "sub-requests" in cases such as
redirection, in which case you will not see everything that's going on just by
examining the original request and final response. It's often useful to [use
the `.get_data()` method](./doc.html#seekable-responses) on responses during
debugging.
~~~~{.python}
import mechanize
hh = mechanize.HTTPHandler() # you might want HTTPSHandler, too
hh.set_http_debuglevel(1)
opener = mechanize.build_opener(hh)
response = opener.open(url)
~~~~