2.5 Charset Translation

During translation from MML to MIME, for each MIME part which has been composed inside Emacs, an appropriate charset has to be chosen.

If you are running a non-MULE Emacs, this process is simple: If the part contains any non-ASCII (8-bit) characters, the MIME charset given by mail-parse-charset (a symbol) is used. (Never set this variable directly, though. If you want to change the default charset, please consult the documentation of the package which you use to process MIME messages. See Various Message Variables in Message Manual, for example.) If there are only ASCII characters, the MIME charset US-ASCII is used, of course.
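The non-MULE rule can be pictured with a small sketch. This is a Python illustration of the decision only, not code from Emacs: the function name `choose_charset_non_mule` and the parameter `default_charset` (standing in for the value of mail-parse-charset) are hypothetical, and `iso-8859-1` is an arbitrary example default.

```python
def choose_charset_non_mule(text: str, default_charset: str = "iso-8859-1") -> str:
    """Pick a MIME charset the way the non-MULE rule describes.

    `default_charset` is a hypothetical stand-in for the value of
    mail-parse-charset; it is only consulted for non-ASCII text.
    """
    # Pure ASCII parts are always labelled US-ASCII.
    if all(ord(ch) < 128 for ch in text):
        return "us-ascii"
    # Any non-ASCII (8-bit) character selects the configured default.
    return default_charset
```

For instance, `choose_charset_non_mule("hello")` yields `"us-ascii"`, while a part containing an accented letter yields whatever default was passed in.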

Things are slightly more complicated when running Emacs with MULE support. In this case, a list of the MULE charsets used in the part is obtained, and the MULE charsets are translated to MIME charsets by consulting the table provided by Emacs itself. If this results in a single MIME charset, that charset is used to encode the part. If the resulting list contains more than one MIME charset, one of two things happens. If it is possible to encode the part via UTF-8, this charset is used. (For this, Emacs must support the utf-8 coding system, and the part must consist entirely of characters which have Unicode counterparts.) If UTF-8 is not available for some reason, the part is split into several parts, so that each one can be encoded with a single MIME charset. The part can only be split at line boundaries, though: if more than one MIME charset is required to encode a single line, it is not possible to encode the part.
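The decision procedure for the MULE case can be sketched as follows. This is a simplified Python model under stated assumptions, not the Emacs implementation: the `CANDIDATES` list is a tiny hypothetical stand-in for the table Emacs consults, all function names are invented for illustration, and because every Python string character is already Unicode, the "has a Unicode counterpart" condition is reduced to a single `utf8_available` flag.

```python
# Hypothetical, deliberately tiny stand-in for the charset table.
# Order matters: the first charset able to encode a character wins.
CANDIDATES = ["us-ascii", "iso-8859-1", "iso-8859-7"]


def charset_for_char(ch):
    """Return the first candidate charset that can encode `ch`."""
    for cs in CANDIDATES:
        try:
            ch.encode(cs)
            return cs
        except UnicodeEncodeError:
            continue
    raise ValueError(f"no candidate charset for {ch!r}")


def charsets_needed(text):
    """List the non-ASCII charsets `text` requires (or just us-ascii)."""
    needed = []
    for ch in text:
        cs = charset_for_char(ch)
        if cs != "us-ascii" and cs not in needed:
            needed.append(cs)
    return needed or ["us-ascii"]


def plan_encoding(part, utf8_available=True):
    """Return a list of (charset, text) pieces for one composed part."""
    needed = charsets_needed(part)
    if len(needed) == 1:
        return [(needed[0], part)]      # a single charset suffices
    if utf8_available:
        return [("utf-8", part)]        # several charsets: prefer UTF-8
    # No UTF-8: split at line boundaries, one charset per piece.
    pieces = []
    for line in part.splitlines(keepends=True):
        line_sets = charsets_needed(line)
        if len(line_sets) > 1:
            raise ValueError("a single line needs more than one MIME charset")
        cs = line_sets[0]
        if pieces and pieces[-1][0] == cs:
            # Adjacent lines with the same charset stay in one piece.
            pieces[-1] = (cs, pieces[-1][1] + line)
        else:
            pieces.append((cs, line))
    return pieces
```

With this toy table, a part holding one Latin-1 line and one Greek line becomes a single UTF-8 part when `utf8_available` is true, and two parts otherwise; a line mixing both scripts raises an error in the second case, mirroring the "not possible to encode" outcome described above.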

When running Emacs with MULE support, the preferences for which coding system to use are inherited from Emacs itself. This means that if Emacs is set up to prefer UTF-8, it will be used when encoding messages. You can change this by setting the mm-coding-system-priorities variable (see Encoding Customization).
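One way to picture a priority list is as an ordered sequence of charsets that is tried in turn until one can represent the whole part. The Python sketch below illustrates that idea only; it is an assumption about the general shape of such a preference mechanism, not a description of how Emacs evaluates mm-coding-system-priorities, and the function name `pick_by_priority` and its `fallback` parameter are hypothetical.

```python
def pick_by_priority(text, priorities, fallback="utf-8"):
    """Return the first charset in `priorities` that can encode `text`.

    `priorities` is an ordered list of charset names; `fallback` is a
    hypothetical last resort used when none of them fits.
    """
    for cs in priorities:
        try:
            text.encode(cs)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text; try the next
        return cs
    return fallback
```

Under this model, putting `iso-8859-1` ahead of `utf-8` keeps Latin-1 text in Latin-1, while text that does not fit falls through to the next entry.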

The charset to be used can be overridden by setting the charset MML tag (see MML Definition) when composing the message.

The content transfer encoding (quoted-printable, 8bit, etc.) is orthogonal to the charset choice discussed here, and is controlled by the variables mm-body-charset-encoding-alist and mm-content-transfer-encoding-defaults (see Encoding Customization).
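The independence of the two choices can be shown with a short Python sketch using only the standard library. The charset determines which bytes represent the text; the transfer encoding then determines how those bytes are made safe for transport. The helper names `transfer_encode` and `transfer_decode` are hypothetical and exist only for this illustration.

```python
import base64
import quopri


def transfer_encode(raw: bytes, encoding: str) -> bytes:
    """Apply a content transfer encoding to already charset-encoded bytes."""
    if encoding == "quoted-printable":
        return quopri.encodestring(raw)
    if encoding == "base64":
        return base64.b64encode(raw)
    if encoding == "8bit":
        return raw  # bytes are sent unchanged
    raise ValueError(f"unsupported transfer encoding: {encoding}")


def transfer_decode(data: bytes, encoding: str) -> bytes:
    """Undo transfer_encode, recovering the charset-encoded bytes."""
    if encoding == "quoted-printable":
        return quopri.decodestring(data)
    if encoding == "base64":
        return base64.b64decode(data)
    if encoding == "8bit":
        return data
    raise ValueError(f"unsupported transfer encoding: {encoding}")


# Step 1: the charset choice turns text into bytes.
raw = "Gr\u00fc\u00dfe".encode("iso-8859-1")

# Step 2: any transfer encoding can carry those same bytes.
for enc in ("quoted-printable", "base64", "8bit"):
    assert transfer_decode(transfer_encode(raw, enc), enc) == raw
```

Whichever transfer encoding is applied, decoding it returns the same charset-encoded bytes, which is why the two settings can be customized separately.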