Use name as the default file name when it isn’t known (i.e., for URLs that end in a slash), instead of index.html.
If a file of type ‘application/xhtml+xml’ or ‘text/html’ is downloaded and the URL does not end with the regexp ‘\.[Hh][Tt][Mm][Ll]?’, this option will cause the suffix ‘.html’ to be appended to the local filename. This is useful, for instance, when you’re mirroring a remote site that uses ‘.asp’ pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you’re downloading CGI-generated materials. A URL like ‘http://site.com/article.cgi?25’ will be saved as article.cgi?25.html.
Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can’t tell that the local X.html file corresponds to remote URL ‘X’ (since it doesn’t yet know that the URL produces output of type ‘text/html’ or ‘application/xhtml+xml’).
As of version 1.12, Wget will also ensure that any downloaded files of type ‘text/css’ end in the suffix ‘.css’, and the option was renamed from ‘--html-extension’, to better reflect its new behavior. The old option name is still acceptable, but should now be considered deprecated.
At some point in the future, this option may well be expanded to include suffixes for other types of content, including content types that are not parsed by Wget.
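The naming rule for HTML content described above can be sketched in shell. This is only an illustration of the rule, not Wget's own code: the function name `adjusted_name` is hypothetical, and the sketch assumes the server has already reported a content type of ‘text/html’ or ‘application/xhtml+xml’.

```shell
# Hypothetical helper: derive the local file name for an HTML document.
adjusted_name() {
  name=${1##*/}                     # last component of the URL
  case $name in
    *.[Hh][Tt][Mm] | *.[Hh][Tt][Mm][Ll])
      printf '%s\n' "$name" ;;      # already ends in .htm or .html
    *)
      printf '%s\n' "$name.html" ;; # suffix is appended
  esac
}

adjusted_name 'http://site.com/article.cgi?25'
adjusted_name 'http://site.com/page.HTM'
```

The first call prints the name with ‘.html’ appended; the second leaves the name unchanged because it already matches the suffix pattern.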
Specify the username user and password password on an HTTP server. According to the type of the challenge, Wget will encode them using either the basic (insecure), the digest, or the Windows NTLM authentication scheme.
Another way to specify username and password is in the URL itself (see URL Format). Either method reveals your password to anyone who bothers to run ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files from other users with chmod. If the passwords are really important, do not leave them lying in those files either: edit the files and delete them after Wget has started the download.
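A minimal sketch of the advice above, assuming the standard .netrc entry layout (`machine`, `login`, `password`). It writes to a scratch directory rather than your real home directory; the host and credentials are placeholders.

```shell
# Work in a scratch directory so no real configuration file is touched.
demo_home=$(mktemp -d)
netrc="$demo_home/.netrc"

# Create the file with restrictive permissions before the secret is written.
( umask 077; : > "$netrc" )
printf 'machine %s login %s password %s\n' 'server.com' 'foo' 'bar' >> "$netrc"

# Make the protection explicit: readable and writable by the owner only.
chmod 600 "$netrc"
ls -l "$netrc"
```

Creating the file under a restrictive umask avoids a window in which the password is readable by other users before chmod runs.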
Turn off the “keep-alive” feature for HTTP downloads. Normally, Wget asks the server to keep the connection open so that, when you download more than one document from the same server, they get transferred over the same TCP connection. This saves time and at the same time reduces the load on the server.
This option is useful when, for some reason, persistent (keep-alive) connections don’t work for you, for example due to a server bug or due to the inability of server-side scripts to cope with the connections.
Disable server-side cache. In this case, Wget will send the remote server an appropriate directive (‘Pragma: no-cache’) to get the file from the remote service, rather than returning the cached version. This is especially useful for retrieving and flushing out-of-date documents on proxy servers.
Caching is allowed by default.
Disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the Set-Cookie header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.
Load cookies from file before the first HTTP retrieval. file is a textual file in the format originally used by Netscape’s cookies.txt file.
You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.
Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by ‘--load-cookies’—simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:
Netscape 4.x: the cookies are in ~/.netscape/cookies.txt.
Mozilla’s cookie file is also named cookies.txt, located somewhere under ~/.mozilla, in the directory of your profile. The full path usually ends up looking somewhat like ~/.mozilla/default/some-weird-string/cookies.txt.
Internet Explorer: you can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies. This has been tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.
If you are using a different browser to create your cookies, ‘--load-cookies’ will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.
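If no browser export is available, a cookie file can be written by hand. The sketch below assumes the commonly described Netscape layout of seven tab-separated fields per line (domain, include-subdomains flag, path, secure flag, expiry as Unix time, name, value); the cookie name and value are made up for illustration.

```shell
cookie_file=$(mktemp)

# One cookie per line, fields separated by single tab characters.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  'server.com' 'FALSE' '/' 'FALSE' '2147483647' 'session_id' 'abc123' \
  > "$cookie_file"

# Sanity check: report the field count and the name=value pair.
awk -F '\t' '{ print NF " fields: " $6 "=" $7 }' "$cookie_file"
```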
If you cannot use ‘--load-cookies’, there might still be an alternative. If your browser supports a “cookie manager”, you can use it to view the cookies used when accessing the site you’re mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the “official” cookie support:
wget --no-cookies --header "Cookie: name=value"
Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called “session cookies”), but also see ‘--keep-session-cookies’.
When specified, causes ‘--save-cookies’ to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and forgotten when you exit the browser. Saving them is useful on sites that require you to log in or to visit the home page before you can access some pages. With this option, multiple Wget runs are considered a single browser session as far as the site is concerned.
Since the cookie file format does not normally carry session cookies, Wget marks them with an expiry timestamp of 0. Wget’s ‘--load-cookies’ recognizes those as session cookies, but it might confuse other browsers. Also note that cookies so loaded will be treated as other session cookies, which means that if you want ‘--save-cookies’ to preserve them again, you must use ‘--keep-session-cookies’ again.
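To see which entries in a saved cookie file are session cookies, you can look for the expiry timestamp of 0 mentioned above. This sketch assumes the expiry is the fifth tab-separated field of each line; the sample file and the function name `list_session_cookies` are hypothetical.

```shell
# Print name=value for every entry whose expiry field is 0.
list_session_cookies() {
  awk -F '\t' '!/^#/ && NF >= 7 && $5 == 0 { print $6 "=" $7 }' "$1"
}

# Sample file: one persistent cookie and one session cookie.
cookie_file=$(mktemp)
{
  printf 'server.com\tFALSE\t/\tFALSE\t2147483647\tprefs\tdark\n'
  printf 'server.com\tFALSE\t/\tFALSE\t0\tsid\txyz\n'
} > "$cookie_file"

list_session_cookies "$cookie_file"
```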
Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus Content-Length headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.
With this option, Wget will ignore the Content-Length header, as if it never existed.
Send header-line along with the rest of the headers in each HTTP request. The supplied header is sent as-is, which means it must contain name and value separated by colon, and must not contain newlines.
You may define more than one additional header by specifying ‘--header’ more than once.
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.
As of Wget 1.10, this option can be used to override headers otherwise generated automatically. This example instructs Wget to connect to localhost, but to specify ‘foo.bar’ in the Host header:
wget --header="Host: foo.bar" http://localhost/
In versions of Wget prior to 1.10 such use of ‘--header’ caused sending of duplicate headers.
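Because the header is sent as-is, a script that builds header lines may want to check them first. The following is a rough pre-flight check under the constraints stated above (a name, a colon, and no newlines); `valid_header_line` is a hypothetical helper, and it deliberately rejects the empty string even though Wget accepts that as "clear all user-defined headers".

```shell
# Succeeds only for a single-line "Name: value" string with a non-empty name.
valid_header_line() {
  nl='
'
  case $1 in
    *"$nl"*) return 1 ;;   # embedded newline
    *:*) ;;                # contains a colon: keep checking
    *) return 1 ;;         # no colon at all
  esac
  name=${1%%:*}
  [ -n "$name" ]
}

for h in 'Accept-Charset: iso-8859-2' 'Accept-Language hr'; do
  if valid_header_line "$h"; then
    echo "ok: $h"
  else
    echo "rejected: $h"
  fi
done
```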
Specifies the maximum number of redirections to follow for a resource. The default is 20, which is usually far more than necessary. However, on those occasions where you want to allow more (or fewer), this is the option to use.
Specify the username user and password password for
authentication on a proxy server. Wget will encode them using the
basic authentication scheme.
Security considerations similar to those with ‘--http-password’ pertain here as well.
Include ‘Referer: url’ header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.
Save the headers sent by the HTTP server to the file, preceding the actual contents, with an empty line as the separator.
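A file saved this way can be split back into its two parts at the first empty line. The sketch below uses a synthetic sample file and tolerates either CRLF or LF line endings in the header block, since which of the two ends up on disk is an assumption here; `print_headers` and `print_body` are hypothetical helper names.

```shell
# Synthetic stand-in for a document saved together with its headers.
saved=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>\n' > "$saved"

# Everything before the first empty line, with trailing CRs removed.
print_headers() {
  awk '/^\r?$/ { exit } { sub(/\r$/, ""); print }' "$1"
}

# Everything after the first empty line, unchanged.
print_body() {
  awk 'in_body { print; next } /^\r?$/ { in_body = 1 }' "$1"
}

print_headers "$saved"
print_body "$saved"
```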
Identify as agent-string to the HTTP server.
The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as ‘Wget/version’, version being the current version number of Wget.
However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
Specifying empty user agent with ‘--user-agent=""’ instructs Wget not to send the User-Agent header in HTTP requests.
Use POST as the method for all HTTP requests and send the specified data in the request body. ‘--post-data’ sends string as data, whereas ‘--post-file’ sends the contents of file. Other than that, they work in exactly the same way. In particular, they both expect content of the form key1=value1&key2=value2, with percent-encoding for special characters; the only difference is that one expects its content as a command-line parameter and the other accepts its content from a file. In particular, ‘--post-file’ is not for transmitting files as form attachments: those must appear as key=value data (with appropriate percent-encoding) just like everything else. Wget does not currently support multipart/form-data for transmitting POST data; only application/x-www-form-urlencoded. Only one of ‘--post-data’ and ‘--post-file’ should be specified.
Please note that wget does not require the content to be of the form
key1=value1&key2=value2, and neither does it test for it. Wget will
simply transmit whatever data is provided to it. Most servers however expect
the POST data to be in the above format when processing HTML Forms.
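Since Wget transmits the data unchanged, any percent-encoding has to be done before the data is handed over. The sketch below is one way to do that in portable shell for ASCII input; `urlencode` is a hypothetical helper, not part of Wget, and it encodes every character outside the unreserved set (letters, digits, `.`, `_`, `~`, `-`).

```shell
# Percent-encode an ASCII string; runs in a subshell so variables stay local.
urlencode() (
  LC_ALL=C
  s=$1 out=
  while [ -n "$s" ]; do
    rest=${s#?}            # string without its first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

user='foo'
password='p&ss word'
post_data="user=$(urlencode "$user")&password=$(urlencode "$password")"
printf '%s\n' "$post_data"

# The result could then be passed on, for example:
#   wget --post-data "$post_data" http://server.com/auth.php
```

Encoding each value separately keeps the literal `&` and `=` that separate the pairs intact while escaping the same characters inside the values.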
Please be aware that Wget needs to know the size of the POST data in
advance. Therefore the argument to
--post-file must be a regular
file; specifying a FIFO or something like /dev/stdin won’t work.
It’s not quite clear how to work around this limitation inherent in
HTTP/1.0. Although HTTP/1.1 introduces chunked transfer that
doesn’t require knowing the request length in advance, a client can’t
use chunked unless it knows it’s talking to an HTTP/1.1 server. And it
can’t know that until it receives a response, which in turn requires the
request to have been completed – a chicken-and-egg problem.
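A script can guard against this limitation by checking the file type before invoking Wget. This is a small sketch; the helper name `usable_post_file` and the file names are hypothetical.

```shell
# True only for a readable regular file; FIFOs and devices fail the -f test.
usable_post_file() {
  [ -f "$1" ] && [ -r "$1" ]
}

workdir=$(mktemp -d)
printf 'key1=value1&key2=value2' > "$workdir/form.txt"
mkfifo "$workdir/pipe"

usable_post_file "$workdir/form.txt" && echo "form.txt: regular file, size known"
usable_post_file "$workdir/pipe" || echo "pipe: not a regular file"
```

When the data comes from another program, writing its output to a temporary regular file first gives Wget a size it can determine in advance.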
Note: As of version 1.15 if Wget is redirected after the POST request is completed, its behaviour will depend on the response code returned by the server. In case of a 301 Moved Permanently, 302 Moved Temporarily or 307 Temporary Redirect, Wget will, in accordance with RFC2616, continue to send a POST request. In case a server wants the client to change the Request method upon redirection, it should send a 303 See Other response code.
This example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:
# Log in to the server. This can be done only once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
If the server is using session cookies to track user authentication, the above will not work because ‘--save-cookies’ will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case use ‘--keep-session-cookies’ along with ‘--save-cookies’ to force saving of session cookies.
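The session-cookie variant of the login example might look like the sketch below. The commands are wrapped in a hypothetical function, `login_and_fetch`, so that nothing touches the network until you call it; the URLs and credentials are the same placeholders used in the example above.

```shell
login_and_fetch() {
  # Log in once, forcing session cookies into cookies.txt as well.
  wget --save-cookies cookies.txt --keep-session-cookies \
       --post-data 'user=foo&password=bar' \
       http://server.com/auth.php || return

  # Reuse the saved cookies for the pages that require the login.
  wget --load-cookies cookies.txt \
       -p http://server.com/interesting/article.php
}
```

Returning early when the login request fails avoids fetching pages with an empty or stale cookie file.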
For the purpose of RESTful scripting, Wget allows sending of other HTTP Methods without the need to explicitly set them using ‘--header=Header-Line’. Wget will use whatever string is passed to it after ‘--method’ as the HTTP Method to the server.
Must be set when additional data needs to be sent to the server along with the Method specified using ‘--method’. ‘--body-data’ sends string as data, whereas ‘--body-file’ sends the contents of file. Other than that, they work in exactly the same way.
Currently, ‘--body-file’ is not for transmitting files as a whole. Wget does not currently support multipart/form-data for transmitting data; only application/x-www-form-urlencoded. In the future, this may be changed so that wget sends the ‘--body-file’ as a complete file instead of sending its contents to the server. Please be aware that Wget needs to know the contents of the body data in advance, and hence the argument to ‘--body-file’ should be a regular file. See ‘--post-file’ for a more detailed explanation.
Only one of ‘--body-data’ and ‘--body-file’ should be specified.
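As a sketch of how ‘--method’ combines with the body options, the functions below wrap a PUT with inline data, a PUT with data read from a file, and a DELETE with no body. The function names and the URL passed to them are hypothetical, and nothing is sent until a function is called.

```shell
# PUT with form-encoded data given on the command line.
update_resource() {
  wget --method=PUT --body-data='key1=value1&key2=value2' "$1"
}

# PUT with the same kind of data read from a regular file ($2).
update_resource_from_file() {
  wget --method=PUT --body-file="$2" "$1"
}

# DELETE needs no body, so neither body option is given.
delete_resource() {
  wget --method=DELETE "$1"
}
```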
If Wget is redirected after the request is completed, Wget will
suspend the current method and send a GET request till the redirection
is completed. This is true for all redirection response codes except
307 Temporary Redirect which is used to explicitly specify that the
request method should not change. Another exception is when
the method is set to
POST, in which case the redirection rules
specified under ‘--post-data’ are followed.
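The redirection rules described here and under ‘--post-data’ can be summarized as a small decision function. This is a reading of the documented behaviour, not Wget's code: `method_after_redirect` is a hypothetical name, and mapping 303 after a POST to GET is an assumption based on the usual meaning of 303 See Other.

```shell
# Usage: method_after_redirect METHOD STATUS
# Prints the method expected for the follow-up request.
method_after_redirect() {
  method=$1 code=$2
  if [ "$method" = POST ]; then
    case $code in
      301 | 302 | 307) echo POST ;;   # POST is repeated
      303) echo GET ;;                # server asked for a method change
      *) echo unspecified; return 1 ;;
    esac
  else
    case $code in
      307) echo "$method" ;;          # method must not change
      3??) echo GET ;;                # any other redirect falls back to GET
      *) echo "not a redirect"; return 1 ;;
    esac
  fi
}

method_after_redirect PUT 302
method_after_redirect PUT 307
method_after_redirect POST 301
```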
If this is set to on, experimental (not fully-functional) support for
Content-Disposition headers is enabled. This can currently result in
extra round-trips to the server for a
HEAD request, and is known
to suffer from a few bugs, which is why it is not currently enabled by default.
This option is useful for some file-downloading CGI programs that use
Content-Disposition headers to describe what the name of a
downloaded file should be.
If this is set to on, Wget will not skip the content when the server responds with an HTTP status code that indicates an error.
If this is set to on, on a redirect the last component of the redirection URL will be used as the local file name. By default, the last component of the original URL is used.
If this option is given, Wget will send Basic HTTP authentication information (plaintext username and password) for all requests, just like Wget 1.10.2 and prior did by default.
Use of this option is not recommended, and is intended only to support a few obscure servers that never send HTTP authentication challenges but accept unsolicited auth info, say, in addition to form-based authentication.