httrack

11/5/13, 4:08 PM

httrack
NAME SYNOPSIS DESCRIPTION EXAMPLES OPTIONS FILES ENVIRONMENT DIAGNOSTICS LIMITS BUGS COPYRIGHT AVAILABILITY AUTHOR SEE ALSO

NAME
httrack - offline browser : copy websites to a local directory

SYNOPSIS
httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ] [ -%O, --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ] [ -g, --get-files ] [ -i, --continue ] [ -Y, --mirrorlinks ] [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ] [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ] [ -mN, --max-files[=N] ] [ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ] [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ] [ -%mN, --max-mms-time[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout ] [ -RN, --retries[=N] ] [ -JN, --min-rate[=N] ] [ -HN, --host-control[=N] ] [ -%P, --extended-parsing[=N] ] [ -n, --near ] [ -t, --test ] [ -%L, --list ] [ -%S, --urllist ] [ -NN, --structure[=N] ] [ -%D, --cached-delayed-type-check ] [ -%M, --mime-html ] [ -LN, --long-names[=N] ] [ -KN, --keep-links[=N] ] [ -x, --replace-external ] [ -%x, --disable-passwords ] [ -%q, --include-query-string ] [ -o, --generate-errors ] [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ] [ -j, --parse-java[=N] ] [ -sN, --robots[=N] ] [ -%h, --http-10 ] [ -%k, --keep-alive ] [ -%B, --tolerant ] [ -%s, --updatehack ] [ -%u, --urlhack ] [ -%A, --assume ] [ -@iN, --protocol[=N] ] [ -%w, --disable-module ] [ -F, --user-agent ] [ -%R, --referer ] [ -%E, --from ] [ -%F, --footer ] [ -%l, --language ] [ -C, --cache[=N] ] [ -k, --store-all-in-cache ] [ -%n, --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ] [ -q, --quiet ] [ -z, --extra-log ] [ -Z, --debug-log ] [ -v, --verbose ] [ -f, --file-log ] [ -f2, --single-log ] [ -I, --index ] [ -%i, --build-top-index ] [ -%I, --search-index ] [ -pN, --priority[=N] ] [ -S, --stay-on-same-dir ] [ -D, --can-go-down ] [ -U, --can-go-up ] [ -B, --can-go-up-and-down ] [ -a, --stay-on-same-address ] [ -d, --stay-on-same-domain ] [ -l, --stay-on-same-tld ] [ -e, --go-everywhere ] [ -%H, --debug-headers ] [ -%!, --disable-security-limits ] [ -V, --userdef-cmd ] [ -%U, --user ] [ -%W, --callback ] [ -K, --keep-links[=N] ]

http://www.httrack.com/html/httrack.man.html Page 1 of 10

DESCRIPTION
httrack allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads.

EXAMPLES
httrack www.someweb.com/bob/
    mirror site www.someweb.com/bob/ and only this site

httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*
    mirror the two sites together (with shared links) and accept any .jpg files on .com sites

httrack www.someweb.com/bob/bobby.html +* -r6
    means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web

httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
    runs the spider on www.someweb.com/bob/bobby.html using a proxy

httrack --update
    updates a mirror in the current folder

httrack
    will bring you to the interactive mode

httrack --continue
    continues a mirror in the current folder

OPTIONS
General options:
  -O      path for mirror/logfiles+cache (-O path mirror[,path cache and logfiles]) (--path <param>)
  -%O     chroot path to, must be r00t (-%O root path) (--chroot <param>)

Action options:
  -w      *mirror web sites (--mirror)
  -W      mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
  -g      just get files (saved in the current directory) (--get-files)
  -i      continue an interrupted mirror using the cache (--continue)
  -Y      mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)

Proxy options:
  -P      proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
  -%f     *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
  -%b     use this local hostname to make/send requests (-%b hostname) (--bind <param>)

Limits options:
  -rN     set the mirror depth to N (* r9999) (--depth[=N])
  -%eN    set the external links depth to N (* %e0) (--ext-depth[=N])
  -mN     maximum file length for a non-html file (--max-files[=N])
  -mN,N2  maximum file length for non html (N) and html (N2)
  -MN     maximum overall size that can be uploaded/scanned (--max-size[=N])
  -EN     maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
  -AN     maximum transfer rate in bytes/seconds (1000=1KB/s max) (--max-rate[=N])
  -%cN    maximum number of connections/seconds (*%c10) (--connection-per-second[=N])
  -GN     pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
  -%mN    maximum mms stream download time in seconds (60=1 minute, 3600=1 hour) (--max-mms-time[=N])
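
Several limit options can be combined on one command line. The following is a minimal sketch rather than a recommended configuration: the function name polite_mirror_cmd, the ./mirror output path and the chosen values are assumptions made for illustration, and the command is only printed, not executed.

```shell
#!/bin/sh
# Print (do not run) an httrack command that applies several limit options.
# The URL and the ./mirror output path are placeholders for illustration.
polite_mirror_cmd() {
    url=$1
    out=$2
    # -r3      mirror depth of 3
    # -E3600   stop after 3600 seconds (1 hour)
    # -A25000  cap the transfer rate at 25000 bytes/second
    # -%c2     at most 2 connections per second
    printf 'httrack %s -O %s -r3 -E3600 -A25000 -%%c2\n' "$url" "$out"
}

polite_mirror_cmd www.someweb.com/bob/ ./mirror
```

Printing the command first makes it easy to review the limits before starting a long mirror.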

Flow control:
  -cN     number of multiple connections (*c8) (--sockets[=N])
  -TN     timeout, number of seconds after a non-responding link is shutdown (--timeout)
  -RN     number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
  -JN     traffic jam control, minimum transfer rate (bytes/seconds) tolerated for a link (--min-rate[=N])
  -HN     host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])

Links options:
  -%P     *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
  -n      get non-html files near an html file (ex: an image located outside) (--near)
  -t      test all URLs (even forbidden ones) (--test)
  -%L     <file> add all URL located in this text file (one URL per line) (--list <param>)
  -%S     <file> add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
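
The -%L option reads its URLs from a text file, one per line. A small sketch of preparing such a file follows; the file name httrack-urls.txt, its location and the ./mirror output path are assumptions for this example, and the httrack command is printed instead of run.

```shell
#!/bin/sh
# Write a URL list (one URL per line) and print the matching httrack call.
# The list file name and ./mirror are placeholder names for this example.
list_file=${TMPDIR:-/tmp}/httrack-urls.txt

cat > "$list_file" <<'EOF'
www.someweb.com/bob/
www.anothertest.com/mike/
EOF

# -%L <file> adds every URL found in the text file.
# The command is only printed; drop the "echo" to run it.
echo httrack -%L "$list_file" -O ./mirror
```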

Build options:
  -NN     structure type (0 *original structure, 1+: see below) (--structure[=N])
  or      user defined structure (-N "%h%p/%n%q.%t")
  -%N     delayed type check, don't make any link test but wait for files download to start instead (experimental) (%N0 don't use, %N1 use for unknown extensions, * %N2 always use)
  -%D     cached delayed type check, don't wait for remote type during updates, to speedup them (%D0 wait, * %D1 don't wait) (--cached-delayed-type-check)
  -%M     generate a RFC MIME-encapsulated full-archive (.mht) (--mime-html)
  -LN     long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])
  -KN     keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links, K5 transparent proxy link) (--keep-links[=N])
  -x      replace external html links by error pages (--replace-external)
  -%x     do not include any password for external password protected websites (%x0 include) (--disable-passwords)
  -%q     *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)
  -o      *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)
  -X      *purge old files after update (X0 keep delete) (--purge-old[=N])
  -%p     preserve html files as is (identical to -K4 -%F "" ) (--preserve)

Spider options:
  -bN     accept cookies in cookies.txt (0=do not accept,* 1=accept) (--cookies[=N])
  -u      check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])
  -j      *parse Java Classes (j0 don't parse, bitmask: |1 parse default, |2 don't parse .class |4 don't parse .js |8 don't be aggressive) (--parse-java[=N])
  -sN     follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always, 3=always (even strict rules)) (--robots[=N])
  -%h     force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
  -%k     use keep-alive if possible, greatly reducing latency for small files and test requests (%k0 don't use) (--keep-alive)
  -%B     tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
  -%s     update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
  -%u     url hacks: various hacks to limit duplicate URLs (strip //, www.foo.com==foo.com..) (--urlhack)
  -%A     assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)
          can also be used to force a specific file type: --assume foo.cgi=text/html
  -@iN    internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])
  -%w     disable a specific external mime module (-%w htsswf -%w htsjava) (--disable-module <param>)

Browser ID:
  -F      user-agent field sent in HTTP headers (-F "user-agent name") (--user-agent <param>)
  -%R     default referer field sent in HTTP headers (--referer <param>)
  -%E     from email address sent in HTTP headers (--from <param>)
  -%F     footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]") (--footer <param>)
  -%l     preferred language (-%l "fr, en, jp, *") (--language <param>)

Log, index, cache
  -C      create/use a cache for updates and retries (C0 no cache, C1 cache has priority, * C2 test update before) (--cache[=N])
  -k      store all files in cache (not useful if files on disk) (--store-all-in-cache)
  -%n     do not re-download locally erased files (--do-not-recatch)
  -%v     display on screen filenames downloaded (in realtime) - * %v1 short version - %v2 full animation (--display)
  -Q      no log - quiet mode (--do-not-log)
  -q      no questions - quiet mode (--quiet)
  -z      log - extra infos (--extra-log)
  -Z      log - debug (--debug-log)
  -v      log on screen (--verbose)
  -f      *log in files (--file-log)
  -f2     one single log file (--single-log)
  -I      *make an index (I0 don't make) (--index)
  -%i     make a top index for a project folder (* %i0 don't make) (--build-top-index)
  -%I     make a searchable index for this mirror (* %I0 don't make) (--search-index)

Expert options:
  -pN     priority mode: (* p3) (--priority[=N])
    -p0     just scan, don't save anything (for checking links)
    -p1     save only html files
    -p2     save only non html files
    -*p3    save all files
    -p7     get html files before, then treat other files
  -S      stay on the same directory (--stay-on-same-dir)
  -D      *can only go down into subdirs (--can-go-down)
  -U      can only go to upper directories (--can-go-up)
  -B      can both go up&down into the directory structure (--can-go-up-and-down)
  -a      *stay on the same address (--stay-on-same-address)
  -d      stay on the same principal domain (--stay-on-same-domain)
  -l      stay on the same TLD (eg: .com) (--stay-on-same-tld)
  -e      go everywhere on the web (--go-everywhere)
  -%H     debug HTTP headers in logfile (--debug-headers)

Guru options: (do NOT use if possible)
  -#X     *use optimized engine (limited memory boundary checks) (--fast-engine)
  -#0     filter test (-#0 *.gif www.bar.com/foo.gif ) (--debug-testfilters <param>)
  -#1     simplify test (-#1 ./foo/bar/../foobar)
  -#2     type test (-#2 /foo/bar.php)
  -#C     cache list (-#C *.com/spider*.gif) (--debug-cache <param>)
  -#R     cache repair (damaged cache) (--repair-cache)
  -#d     debug parser (--debug-parsing)
  -#E     extract new.zip cache meta-data in meta.zip
  -#f     always flush log files (--advanced-flushlogs)
  -#FN    maximum number of filters (--advanced-maxfilters[=N])
  -#h     version info (--version)
  -#K     scan stdin (debug) (--debug-scanstdin)
  -#L     maximum number of links (-#L1000000) (--advanced-maxlinks)
  -#p     display ugly progress information (--advanced-progressinfo)
  -#P     catch URL (--catch-url)
  -#R     old FTP routines (debug) (--repair-cache)
  -#T     generate transfer ops. log every minutes (--debug-xfrstats)
  -#u     wait time (--advanced-wait)
  -#Z     generate transfer rate statistics every minutes (--debug-ratestats)
  -#!     execute a shell command (-#! "echo hello") (--exec <param>)

Dangerous options: (do NOT use unless you exactly know what you are doing)
  -%!     bypass built-in security limits aimed to avoid bandwidth abuses (bandwidth, simultaneous connections) (--disable-security-limits)
          IMPORTANT NOTE: DANGEROUS OPTION, ONLY SUITABLE FOR EXPERTS
          USE IT WITH EXTREME CARE

Command-line specific options:
  -V      execute system command after each files ($0 is the filename: -V "rm ") (--userdef-cmd <param>)
  -%U     run the engine with another id when called as root (-%U smith) (--user <param>)
  -%W     use an external library function as a wrapper (-%W myfoo.so[,myparameters]) (--callback <param>)

Details: Option N
  -N0     Site-structure (default)
  -N1     HTML in web/, images/other files in web/images/
  -N2     HTML in web/HTML, images/other in web/images
  -N3     HTML in web/, images/other in web/
  -N4     HTML in web/, images/other in web/xxx, where xxx is the file extension (all gif will be placed onto web/gif, for example)
  -N5     Images/other in web/xxx and HTML in web/HTML
  -N99    All files in web/, with random names (gadget !)
  -N100   Site-structure, without www.domain.xxx/
  -N101   Identical to N1 except that "web" is replaced by the site's name
  -N102   Identical to N2 except that "web" is replaced by the site's name
  -N103   Identical to N3 except that "web" is replaced by the site's name
  -N104   Identical to N4 except that "web" is replaced by the site's name
  -N105   Identical to N5 except that "web" is replaced by the site's name
  -N199   Identical to N99 except that "web" is replaced by the site's name
  -N1001  Identical to N1 except that there is no "web" directory
  -N1002  Identical to N2 except that there is no "web" directory
  -N1003  Identical to N3 except that there is no "web" directory (option set for g option)
  -N1004  Identical to N4 except that there is no "web" directory
  -N1005  Identical to N5 except that there is no "web" directory
  -N1099  Identical to N99 except that there is no "web" directory

Details: User-defined option N
  %n      Name of file without file type (ex: image)
  %N      Name of file, including file type (ex: image.gif)
  %t      File type (ex: gif)
  %p      Path [without ending /] (ex: /someimages)
  %h      Host name (ex: www.someweb.com)
  %M      URL MD5 (128 bits, 32 ascii bytes)
  %Q      query string MD5 (128 bits, 32 ascii bytes)
  %k      full query string
  %r      protocol name (ex: http)
  %q      small query string MD5 (16 bits, 4 ascii bytes)
  %s?     Short name version (ex: %sN)
  %[param]  param variable in query string
  %[param:before:after:empty:notfound]  advanced variable extraction

Details: User-defined option N and advanced variable extraction
  %[param:before:after:empty:notfound]
  param     : parameter name
  before    : string to prepend if the parameter was found
  after     : string to append if the parameter was found
  notfound  : string replacement if the parameter could not be found
  empty     : string replacement if the parameter was empty
  all fields, except the first one (the parameter name), can be empty
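
To make the placeholders concrete, the sketch below approximates how a structure string such as "%h%p/%n.%t" would map a URL to a local path, using the example values listed above. It is an illustration written with plain shell string operations, not httrack's own code; the function name expand_structure is hypothetical, and URLs with a query string or without a file name are not handled.

```shell
#!/bin/sh
# Illustrative only: approximate the result of -N "%h%p/%n.%t" for one URL.
# This is not httrack's implementation; query strings are ignored.
expand_structure() {
    url=${1#*://}        # drop the protocol (%r) and "://"
    host=${url%%/*}      # %h  host name
    rest=/${url#*/}      # path plus file name, with a leading /
    file=${rest##*/}     # %N  file name including type
    path=${rest%/*}      # %p  path without ending /
    name=${file%.*}      # %n  file name without type
    type=${file##*.}     # %t  file type
    printf '%s%s/%s.%s\n' "$host" "$path" "$name" "$type"
}

expand_structure http://www.someweb.com/someimages/image.gif
```

For the sample URL this prints www.someweb.com/someimages/image.gif, which matches the per-placeholder examples in the table (www.someweb.com, /someimages, image, gif).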

Details: Option K
  -K0     foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
  -K      -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
  -K3     -> /folder/foo.cgi?q=45 (absolute URI)
  -K4     -> foo.cgi?q=45 (original URL)
  -K5     -> http://www.foobar.com/folder/foo4B54.html?q=45 (transparent proxy URL)
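
The five link forms differ only in which parts of the URL are kept. As a rough sketch, the function below (keep_links_demo, a hypothetical name) rebuilds each form from the host, folder, original link and saved file name used in the Option K examples; it only reproduces those examples and does not call httrack.

```shell
#!/bin/sh
# Rebuild the Option K example links for the sample link foo.cgi?q=45.
# Host, folder and saved-file name are the ones used in the examples.
keep_links_demo() {
    host=http://www.foobar.com
    dir=/folder
    original='foo.cgi?q=45'      # link as written in the page
    saved='foo4B54.html?q=45'    # local name given in the examples
    case $1 in
        K0) printf '%s\n' "$saved" ;;                # relative URI (default)
        K)  printf '%s\n' "$host$dir/$original" ;;   # absolute URL
        K3) printf '%s\n' "$dir/$original" ;;        # absolute URI
        K4) printf '%s\n' "$original" ;;             # original URL
        K5) printf '%s\n' "$host$dir/$saved" ;;      # transparent proxy URL
        *)  echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

for mode in K0 K K3 K4 K5; do
    printf '%-3s -> %s\n' "$mode" "$(keep_links_demo "$mode")"
done
```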

Shortcuts:
  --mirror <URLs>       *make a mirror of site(s) (default)
  --get <URLs>          get the files indicated, do not seek other URLs (-qg)
  --list <text file>    add all URL located in this text file (-%L)
  --mirrorlinks <URLs>  mirror all links in 1st level pages (-Y)
  --testlinks <URLs>    test links in pages (-r1p0C0I0t)
  --spider <URLs>       spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
  --testsite <URLs>     identical to --spider
  --skeleton <URLs>     make a mirror, but gets only html files (-p1)
  --update              update a mirror, without confirmation (-iC2)
  --continue            continue a mirror, without confirmation (-iC1)
  --catchurl            create a temporary proxy to capture an URL or a form post URL
  --clean               erase cache & log files
  --http10              force http/1.0 requests (-%h)

Details: Option %W: External callbacks prototypes
  see htsdefines.h
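
Each shortcut in the Shortcuts list stands for the short options given in parentheses. The lookup below is a small sketch of that correspondence; the function name shortcut_to_options is hypothetical, and only shortcuts whose equivalent is stated in the list are included.

```shell
#!/bin/sh
# Look up the short-option form listed for a shortcut.
# Shortcuts without a stated equivalent make the function return 1.
shortcut_to_options() {
    case $1 in
        --get)         printf '%s\n' "-qg" ;;
        --list)        printf '%s\n' "-%L" ;;
        --mirrorlinks) printf '%s\n' "-Y" ;;
        --testlinks)   printf '%s\n' "-r1p0C0I0t" ;;
        --spider)      printf '%s\n' "-p0C0I0t" ;;
        --skeleton)    printf '%s\n' "-p1" ;;
        --update)      printf '%s\n' "-iC2" ;;
        --continue)    printf '%s\n' "-iC1" ;;
        --http10)      printf '%s\n' "-%h" ;;
        *)             return 1 ;;
    esac
}

shortcut_to_options --update
```

Reading the expansions this way shows, for example, that --update and --continue differ only in the cache mode (C2 versus C1).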

FILES
/etc/httrack.conf
    The system wide configuration file.

ENVIRONMENT
HOME
    Is used if the line path ~/websites/# is defined in /etc/httrack.conf

DIAGNOSTICS
Errors/Warnings are reported to hts-log.txt by default, or to stderr if the -v option was specified.
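
A quick way to review a finished run is to count the reported problems in the log. The sketch below assumes that such lines contain the words "Error" or "Warning"; the exact log layout is not described in this page, so treat the patterns and the function name summarize_log as assumptions.

```shell
#!/bin/sh
# Count error and warning lines in an httrack log file.
# Assumption: those lines contain the words "Error" or "Warning".
summarize_log() {
    log=${1:-hts-log.txt}
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the count without aborting under "set -e".
    errors=$(grep -c 'Error' "$log" || true)
    warnings=$(grep -c 'Warning' "$log" || true)
    echo "errors=$errors warnings=$warnings"
}

# Run only when a log exists in the current directory.
if [ -f hts-log.txt ]; then
    summarize_log hts-log.txt
fi
```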

LIMITS
These are the principal limits of HTTrack at the moment. Note that we have not heard of any other utility that would have solved them.
- Several scripts generating complex filenames may not find them (ex: img.src=image+a+Mobj.dst+.gif)
- Some java classes may not find some files on them (class included)
- Cgi-bin links may not work properly in some cases (parameters needed). To avoid them: use filters like *cgi-bin*

BUGS
Please report bugs to <bugs@httrack.com>. Include a complete, self-contained example that will allow the bug to be reproduced, and say which version of httrack you are using. Do not forget to detail options used, OS version, and any other information you deem necessary.

COPYRIGHT
Copyright (C) Xavier Roche and other contributors

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

AVAILABILITY
The most recent released version of httrack can be found at: http://www.httrack.com

AUTHOR
Xavier Roche <roche@httrack.com>

SEE ALSO
The HTML documentation (available online at http://www.httrack.com/html/ ) contains more detailed information. Please also refer to the httrack FAQ (available online at http://www.httrack.com/html/faq.html )
