
Re: [Fun With Perl] index.html




On Fri, 11 Jun 1999, John Porter wrote:

> > you can use Hrvoje Niksic's utility "wget" and perl?
> > (wget is available as a Debian GNU/Linux package)
> 
> Debian-specific/only?  Pretty useless, in that case.

It's not Debian-specific, but there's a nice easy-to-install package for
Debian (like with most programs). He's just advocating, 's all.

> I confess I'm not familiar with the workings of wget;
> please enlighten as to how it differs from GET, which comes with LWP.

I don't think GET grabs sites recursively. wget has a lot more features,
in general. Here's the output of wget --help:

GNU Wget 1.5.3, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc' command.

Logging and input file:
  -o,  --output-file=FILE     log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debug                print debug output.
  -q,  --quiet                quiet (no output).
  -v,  --verbose              be verbose (this is the default).
  -nv, --non-verbose          turn off verboseness, without being quiet.
  -i,  --input-file=FILE      read URL-s from file.
  -F,  --force-html           treat input file as HTML.

Download:
  -t,  --tries=NUMBER           set number of retries to NUMBER (0 unlimits).
  -O   --output-document=FILE   write documents to FILE.
  -nc, --no-clobber             don't clobber existing files.
  -c,  --continue               restart getting an existing file.
       --dot-style=STYLE        set retrieval display style.
  -N,  --timestamping           don't retrieve files if older than local.
  -S,  --server-response        print server response.
       --spider                 don't download anything.
  -T,  --timeout=SECONDS        set the read timeout to SECONDS.
  -w,  --wait=SECONDS           wait SECONDS between retrievals.
  -Y,  --proxy=on/off           turn proxy on or off.
  -Q,  --quota=NUMBER           set retrieval quota to NUMBER.

Directories:
  -nd  --no-directories            don't create directories.
  -x,  --force-directories         force creation of directories.
  -nH, --no-host-directories       don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER      set http user to USER.
       --http-passwd=PASS    set http password to PASS.
  -C,  --cache=on/off        (dis)allow server-cached data (normally allowed).
       --ignore-length       ignore `Content-Length' header field.
       --header=STRING       insert STRING among the headers.
       --proxy-user=USER     set USER as proxy username.
       --proxy-passwd=PASS   set PASS as proxy password.
  -s,  --save-headers        save the HTTP headers to file.
  -U,  --user-agent=AGENT    identify as AGENT instead of Wget/VERSION.

FTP options:
       --retr-symlinks   retrieve FTP symbolic links.
  -g,  --glob=on/off     turn file name globbing on or off.
       --passive-ftp     use the "passive" transfer mode.

Recursive retrieval:
  -r,  --recursive             recursive web-suck -- use with care!.
  -l,  --level=NUMBER          maximum recursion depth (0 to unlimit).
       --delete-after          delete downloaded files.
  -k,  --convert-links         convert non-relative links to relative.
  -m,  --mirror                turn on options suitable for mirroring.
  -nr, --dont-remove-listing   don't remove `.listing' files.

Recursive accept/reject:
  -A,  --accept=LIST                list of accepted extensions.
  -R,  --reject=LIST                list of rejected extensions.
  -D,  --domains=LIST               list of accepted domains.
       --exclude-domains=LIST       comma-separated list of rejected domains.
  -L,  --relative                   follow relative links only.
       --follow-ftp                 follow FTP links from HTML documents.
  -H,  --span-hosts                 go to foreign hosts when recursive.
  -I,  --include-directories=LIST   list of allowed directories.
  -X,  --exclude-directories=LIST   list of excluded directories.
  -nh, --no-host-lookup             don't DNS-lookup hosts.
  -np, --no-parent                  don't ascend to the parent directory.

Mail bug reports and suggestions to <bug-wget@gnu.org>.
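
To make the recursive part concrete, here is a rough sketch of how a few of
the options listed above might be combined. The URL is a placeholder I made
up, and the depth and wait values are arbitrary choices for illustration.
GET from LWP, by contrast, fetches the single URL it is given and prints
the result. The script only prints the command unless RUN=1 is set, so
nothing gets fetched by accident.

```shell
#!/bin/sh
# Sketch only: the URL is a placeholder, not a site from this thread.
url="http://www.example.com/docs/"

# Options taken from the help text above:
#   -r    recurse through links
#   -l 2  stop at recursion depth 2 (arbitrary choice)
#   -np   don't ascend to the parent directory
#   -w 1  wait one second between retrievals (arbitrary choice)
#   -k    convert non-relative links to relative
opts="-r -l 2 -np -w 1 -k"

# Print the command by default; set RUN=1 in the environment to execute it.
if [ "${RUN:-0}" = "1" ]; then
    # $opts is left unquoted on purpose so it splits into separate arguments.
    wget $opts "$url"
else
    echo "wget $opts $url"
fi
```

Run as-is, this just echoes the assembled wget command line. With RUN=1 it
would start the recursive fetch, so heed the "use with care" warning on -r.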


