8.1 Proxies

Proxies are special-purpose HTTP servers designed to transfer data from remote servers to local clients. One typical use of proxies is lightening the network load for users behind a slow connection. This is achieved by channeling all HTTP and FTP requests through the proxy, which caches the transferred data. When a cached resource is requested again, the proxy returns the data from its cache. Another use for proxies is in companies that, for security reasons, separate their internal networks from the rest of the Internet. To obtain information from the Web, their users connect and retrieve remote data through an authorized proxy.

Wget supports proxies for both HTTP and FTP retrievals. The standard way to specify the proxy location, which Wget recognizes, is through the following environment variables:

http_proxy
https_proxy

If set, the http_proxy and https_proxy variables should contain the URLs of the proxies for HTTP and HTTPS connections respectively.

ftp_proxy

This variable should contain the URL of the proxy for FTP connections. It is quite common that http_proxy and ftp_proxy are set to the same URL.

no_proxy

This variable should contain a comma-separated list of domain extensions for which the proxy should not be used. For instance, if the value of no_proxy is ‘.mit.edu’, the proxy will not be used to retrieve documents from MIT.
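As a minimal sketch, the environment variables described above could be set in a POSIX shell as follows. The host ‘proxy.example.com’ and port 8001 are placeholders chosen for illustration, not values taken from any real setup.

```shell
# Hypothetical proxy address; substitute the one used on your network.
export http_proxy="http://proxy.example.com:8001/"
export https_proxy="http://proxy.example.com:8001/"

# FTP retrievals commonly go through the same proxy URL.
export ftp_proxy="$http_proxy"

# Comma-separated domain extensions that should be reached directly.
export no_proxy=".mit.edu"
```

Wget invocations started from the same shell afterwards should pick these settings up without further options.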

In addition to the environment variables, proxy location and settings may be specified from within Wget itself.

--no-proxy
proxy = on/off

This option and the corresponding command may be used to suppress the use of a proxy, even if the appropriate environment variables are set.
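For example, a small shell wrapper can bypass the proxy for selected downloads while the environment variables stay set. The function name fetch_direct is hypothetical and used only for illustration.

```shell
# Hypothetical helper: retrieve the given URLs without going through a proxy,
# regardless of http_proxy, https_proxy, or ftp_proxy in the environment.
fetch_direct() {
    wget --no-proxy "$@"
}
```

It would be called like Wget itself, for instance ‘fetch_direct http://www.gnu.org/’.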

http_proxy = URL
https_proxy = URL
ftp_proxy = URL
no_proxy = string

These startup file variables allow you to override the proxy settings specified by the environment.
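The following sketch writes a sample startup file containing these variables to a scratch location rather than to ~/.wgetrc, and points Wget at it through the WGETRC environment variable. The file path and the proxy address are assumptions made for the example.

```shell
# Hypothetical scratch path, so that an existing ~/.wgetrc is left untouched.
sample_wgetrc="${TMPDIR:-/tmp}/wgetrc.proxy-example"

cat > "$sample_wgetrc" <<'EOF'
# Route HTTP, HTTPS, and FTP retrievals through one (hypothetical) proxy.
http_proxy = http://proxy.example.com:8001/
https_proxy = http://proxy.example.com:8001/
ftp_proxy = http://proxy.example.com:8001/
# Connect directly to hosts in this domain.
no_proxy = .mit.edu
EOF

# Later Wget invocations in this shell read the sample file at startup.
export WGETRC="$sample_wgetrc"
```

Keeping the settings in a separate file makes it easy to switch between proxy configurations by changing WGETRC.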

Some proxy servers require authorization before you can use them. The authorization consists of a username and password, which Wget must send. As with HTTP authorization, several authentication schemes exist. For proxy authorization, only the Basic authentication scheme is currently implemented.

You may specify your username and password either through the proxy URL or through command-line options. Assuming that the company’s proxy is located at ‘proxy.company.com’ on port 8001, a proxy URL containing authorization data might look like this:

http://hniksic:mypassword@proxy.company.com:8001/

Alternatively, you may use the ‘--proxy-user’ and ‘--proxy-password’ options, or the equivalent .wgetrc settings proxy_user and proxy_password, to set the proxy username and password.
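The two approaches can be sketched side by side in a shell session. The credentials and proxy address reuse the example values above; the variable names and the function fetch_via_proxy are hypothetical and serve only as illustration.

```shell
# Example credentials from the proxy URL shown above.
PROXY_USER="hniksic"
PROXY_PASSWORD="mypassword"

# Approach 1: embed the credentials in the proxy URL itself.
proxy_url_with_auth="http://${PROXY_USER}:${PROXY_PASSWORD}@proxy.company.com:8001/"

# Approach 2: keep the proxy URL free of credentials...
export http_proxy="http://proxy.company.com:8001/"

# ...and pass them as command-line options on each invocation.
fetch_via_proxy() {
    wget --proxy-user="$PROXY_USER" --proxy-password="$PROXY_PASSWORD" "$@"
}
```

With the second approach, ‘fetch_via_proxy http://www.gnu.org/’ would send the credentials to the proxy named in http_proxy. Keep in mind that options given on the command line may be visible to other users of the system through process listings, so the .wgetrc settings may be preferable on shared machines.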