Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local X.html file corresponds to remote URL ‘X’ (since it doesn't yet know that the URL produces output of type ‘text/html’ or ‘application/xhtml+xml’).
As of version 1.12, Wget will also ensure that any downloaded files of type ‘text/css’ end in the suffix ‘.css’, and the option was renamed from ‘--html-extension’, to better reflect its new behavior. The old option name is still acceptable, but should now be considered deprecated.
At some point in the future, this option may well be expanded to include suffixes for other types of content, including content types that are not parsed by Wget.
digest, or the Windows
Another way to specify username and password is in the URL itself (see URL Format). Either method reveals your password to anyone who bothers to run ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files from other users with chmod. If the passwords are really important, do not leave them lying in those files either; edit the files and delete them after Wget has started the download.
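As a rough illustration of the advice above, the following sketch writes one credential entry in the usual netrc layout (machine, login, password) and restricts the file to its owner. The host name, login and password are placeholders, and a scratch directory stands in for the home directory so that an existing ~/.netrc is not overwritten.

```shell
# Scratch directory standing in for $HOME (assumption for this sketch only).
demo_home=$(mktemp -d)
netrc="$demo_home/.netrc"

# Create the file under a restrictive umask so it is never readable by
# other users, not even briefly. Host and credentials are placeholders.
( umask 077 && printf 'machine %s login %s password %s\n' \
    server.example.com foo bar > "$netrc" )

# Set the mode explicitly as well, in case the file already existed.
chmod 600 "$netrc"

# Show the resulting permissions.
ls -l "$netrc"
```

For real use the same entry would go into ~/.netrc; the chmod step is the part that keeps other local users from reading it.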
This option is useful when, for some reason, persistent (keep-alive) connections don't work for you, for example due to a server bug or due to the inability of server-side scripts to cope with the connections.
Caching is allowed by default.
You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.
Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by ‘--load-cookies’—simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:
If you cannot use ‘--load-cookies’, there might still be an alternative. If your browser supports a “cookie manager”, you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the “official” cookie support:
wget --no-cookies --header "Cookie: name=value"
Since the cookie file format does not normally carry session cookies, Wget marks them with an expiry timestamp of 0. Wget's ‘--load-cookies’ recognizes those as session cookies, but it might confuse other browsers. Also note that cookies so loaded will be treated as other session cookies, which means that if you want ‘--save-cookies’ to preserve them again, you must use ‘--keep-session-cookies’ again.
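To see which entries in a saved cookie file are session cookies, one can look for the expiry value 0. The sketch below assumes the tab-separated Netscape cookies.txt layout (domain, subdomain flag, path, secure flag, expiry, name, value) and uses a made-up sample file; `list_session_cookies` is a hypothetical helper name.

```shell
# Sample cookie file; all hosts, names and values are invented for illustration.
cookie_file=$(mktemp)
{
  printf '# HTTP cookie file.\n'
  printf 'server.com\tFALSE\t/\tFALSE\t0\tsessionid\tabc123\n'
  printf 'server.com\tFALSE\t/\tFALSE\t1893456000\tlang\ten\n'
} > "$cookie_file"

# Print the names of cookies whose expiry field (column 5) is 0,
# skipping comment lines and anything with fewer than 7 fields.
list_session_cookies() {
  awk -F '\t' '!/^#/ && NF >= 7 && $5 == 0 { print $6 }' "$1"
}

list_session_cookies "$cookie_file"
```

Only the first sample entry has an expiry of 0, so it is the only name listed.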
Content-Length headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.
With this option, Wget will ignore the Content-Length header, as if it never existed.
You may define more than one additional header by specifying ‘--header’ more than once.
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.
As of Wget 1.10, this option can be used to override headers otherwise generated automatically. This example instructs Wget to connect to localhost, but to specify ‘foo.bar’ in the Host header:
wget --header="Host: foo.bar" http://localhost/
In versions of Wget prior to 1.10 such use of ‘--header’ caused sending of duplicate headers.
Security considerations similar to those with ‘--http-password’ pertain here as well.
The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as ‘Wget/version’, version being the current version number of Wget.
However, some sites have been known to impose the policy of tailoring the output according to the User-Agent header. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget.
Use of this option is discouraged, unless you really know what you are doing.
Specifying an empty user agent with ‘--user-agent=""’ instructs Wget not to send the User-Agent header in HTTP requests.
key1=value1&key2=value2, with percent-encoding for special characters; the only difference is that one expects its content as a command-line parameter and the other accepts its content from a file. In particular, ‘--post-file’ is not for transmitting files as form attachments: those must appear as key=value data (with appropriate percent-coding) just like everything else. Wget does not currently support multipart/form-data for transmitting POST data; only application/x-www-form-urlencoded. Only one of ‘--post-data’ and ‘--post-file’ should be specified.
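Since values must be percent-encoded before they are passed to ‘--post-data’, a small helper can build the string. The following is only a sketch in POSIX shell: `urlencode` is a hypothetical function, the credentials are placeholders, and only ASCII input is exercised here.

```shell
# Percent-encode one form value: every byte outside the RFC 3986
# "unreserved" set (letters, digits, '-', '.', '_', '~') becomes %XX.
# The body runs in a subshell so LC_ALL=C and the helper variables stay local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Encode only the values; the '=' and '&' separators stay literal.
data="user=$(urlencode 'foo bar')&password=$(urlencode 'p&ss=w%rd')"
printf '%s\n' "$data"
```

The resulting string could then be passed as `wget --post-data "$data" URL`, or written to a regular file for ‘--post-file’.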
Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to --post-file must be a regular file; specifying a FIFO or something like /dev/stdin won't work.
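A script that accepts a user-supplied path can check for this condition before calling Wget. The sketch below uses the shell's `-f` test, which is true only for regular files; `check_post_file` and the scratch paths are hypothetical names introduced for the example.

```shell
# Accept a path only if it is a regular file, whose size is known up front.
check_post_file() {
  if [ -f "$1" ]; then
    # wc -c reports the byte count; tr strips any padding some wc versions add.
    printf 'ok: %s bytes\n' "$(wc -c < "$1" | tr -d ' ')"
  else
    printf 'not a regular file: %s\n' "$1" >&2
    return 1
  fi
}

# Scratch data: one regular file and one FIFO.
tmpdir=$(mktemp -d)
printf 'user=foo&password=bar' > "$tmpdir/form.txt"
mkfifo "$tmpdir/pipe"

check_post_file "$tmpdir/form.txt"        # regular file: accepted
check_post_file "$tmpdir/pipe" || true    # FIFO: rejected with a message
```

When the data only exists as a stream, writing it to a temporary file first and passing that file to ‘--post-file’ avoids the problem.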
It's not quite clear how to work around this limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces chunked transfer that doesn't require knowing the request length in advance, a client can't use chunked unless it knows it's talking to an HTTP/1.1 server. And it can't know that until it receives a response, which in turn requires the request to have been completed: a chicken-and-egg problem.
Note: if Wget is redirected after the POST request is completed, it will not send the POST data to the redirected URL. This is because URLs that process POST often respond with a redirection to a regular page, which does not desire or accept POST. It is not completely clear that this behavior is optimal; if it doesn't work out, it might be changed in the future.
This example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:
# Log in to the server.  This can be done only once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
If the server is using session cookies to track user authentication, the above will not work because ‘--save-cookies’ will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case use ‘--keep-session-cookies’ along with ‘--save-cookies’ to force saving of session cookies.
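For the session-cookie case, the login example might be adapted along these lines. This is a sketch that reuses the placeholder URLs and credentials from the example; `login_and_fetch` is a hypothetical wrapper, and the block only defines it without contacting any server.

```shell
# Log in and fetch in one step; $1 is the cookie file to use.
login_and_fetch() {
  jar=$1
  # Save cookies, including session cookies, after the POST login.
  wget --save-cookies "$jar" --keep-session-cookies \
       --post-data 'user=foo&password=bar' \
       http://server.com/auth.php &&
  # Reuse the saved cookies; runs only if the login request succeeded.
  wget --load-cookies "$jar" \
       -p http://server.com/interesting/article.php
}
```

Calling `login_and_fetch cookies.txt` would run both steps in order.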
Content-Disposition headers is enabled. This can currently result in extra round-trips to the server for a HEAD request, and is known to suffer from a few bugs, which is why it is not currently enabled by default.
This option is useful for some file-downloading CGI programs that use Content-Disposition headers to describe what the name of a downloaded file should be.
Use of this option is not recommended; it is intended only to support a few obscure servers that never send HTTP authentication challenges but accept unsolicited auth info, say, in addition to form-based authentication.