

26.2 Enabling Multibyte Characters

By default, Emacs starts in multibyte mode: it stores the contents of buffers and strings using an internal encoding that represents non-ASCII characters using multi-byte sequences. Multibyte mode allows you to use all the supported languages and scripts without limitations.

Under very special circumstances, you may want to disable multibyte character support, either for Emacs as a whole, or for a single buffer. When multibyte characters are disabled in a buffer, we call that unibyte mode. In unibyte mode, each character in the buffer has a character code ranging from 0 through 255 (0377 octal); 0 through 127 (0177 octal) represent ASCII characters, and 128 (0200 octal) through 255 (0377 octal) represent non-ASCII characters.

To edit a particular file in unibyte representation, visit it using find-file-literally. See Visiting. You can convert a multibyte buffer to unibyte by saving it to a file, killing the buffer, and visiting the file again with find-file-literally. Alternatively, you can use C-x <RET> c (universal-coding-system-argument) and specify ‘raw-text’ as the coding system with which to visit or save a file. See Text Coding. Unlike find-file-literally, finding a file as ‘raw-text’ doesn't disable format conversion, uncompression, or auto mode selection.
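Both ways of visiting a file can also be written in Emacs Lisp, for example in an init file or with M-x eval-expression. This is a minimal sketch. The file names are hypothetical. Binding coding-system-for-read around find-file is offered here as the Lisp counterpart of typing C-x <RET> c raw-text <RET> before C-x C-f; treat it as an assumption rather than the exact mechanism the command uses.

```elisp
;; Visit a file in unibyte representation.  find-file-literally also
;; disables format conversion, uncompression, and auto mode selection.
;; "~/data.bin" is a hypothetical file name.
(find-file-literally "~/data.bin")

;; Visit a file using the raw-text coding system.  Format conversion,
;; uncompression, and auto mode selection still take place.
;; "~/notes.txt" is a hypothetical file name.
(let ((coding-system-for-read 'raw-text))
  (find-file "~/notes.txt"))
```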

To turn off multibyte character support by default, start Emacs with the ‘--unibyte’ option (see Initial Options), or set the environment variable EMACS_UNIBYTE. You can also customize enable-multibyte-characters or, equivalently, set the variable default-enable-multibyte-characters to nil directly in your init file. This has essentially the same effect as ‘--unibyte’. The difference is that with ‘--unibyte’, multibyte strings are not created during initialization from the values of environment variables, entries in /etc/passwd, and so on, even if those contain non-ASCII characters.
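For example, the init-file setting described above might look like this. It is a minimal sketch. The setq-default form in the comment is an assumption about newer Emacs versions, where the default-enable-multibyte-characters variable is obsolete.

```elisp
;; In your init file (.emacs): make buffers unibyte by default.
;; This has essentially the same effect as starting Emacs with --unibyte.
(setq default-enable-multibyte-characters nil)

;; Assumed equivalent in newer Emacs versions, which set the
;; default value of the buffer-local variable directly:
;; (setq-default enable-multibyte-characters nil)
```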

Emacs normally loads Lisp files as multibyte, regardless of whether you used ‘--unibyte’. This includes the Emacs initialization file, .emacs, and the initialization files of Emacs packages such as Gnus. However, you can specify unibyte loading for a particular Lisp file by putting ‘-*-unibyte: t;-*-’ in a comment on the first line (see File Variables). Then that file is always loaded as unibyte text. The motivation for these conventions is that it is more reliable to always load any particular Lisp file in the same way. Nevertheless, you can load a Lisp file as unibyte on any one occasion by typing C-x <RET> c raw-text <RET> immediately before loading it.
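A Lisp file that requests unibyte loading might begin as follows. This is a sketch: the library name my-bytes is hypothetical, and the cookie is placed after the conventional library header line. As the paragraph above says, it only needs to appear in a comment on the first line.

```elisp
;;; my-bytes.el --- a hypothetical library  -*-unibyte: t;-*-

;; Because of the cookie on the first line, Emacs always loads this
;; file as unibyte text, whether or not it was started with --unibyte.

(provide 'my-bytes)
```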

The mode line indicates whether multibyte character support is enabled in the current buffer. If it is, there are two or more characters (most often two dashes) near the beginning of the mode line, before the indication of the visited file's end-of-line convention (colon, backslash, etc.). When multibyte characters are not enabled, nothing precedes the colon except a single dash. See Mode Line, for more details about this.
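Besides reading the mode line, you can query the same state from Lisp through the buffer-local variable enable-multibyte-characters, which is non-nil in a multibyte buffer. The snippet below is a minimal sketch, meant to be evaluated in the buffer you want to check (for example with M-:).

```elisp
;; enable-multibyte-characters is buffer-local, so this reports on the
;; current buffer only.
(message "This buffer is %s"
         (if enable-multibyte-characters "multibyte" "unibyte"))
```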

To convert a unibyte session to a multibyte session, set default-enable-multibyte-characters to t. Buffers which were created in the unibyte session before you turn on multibyte support will stay unibyte. You can turn on multibyte support in a specific buffer by invoking the command toggle-enable-multibyte-characters in that buffer.
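The conversion can be sketched in Emacs Lisp as follows. The buffer name "*old-buffer*" is hypothetical. The call passes a positive argument to toggle-enable-multibyte-characters, which turns multibyte support on instead of toggling it.

```elisp
;; Make buffers created from now on multibyte by default.
(setq default-enable-multibyte-characters t)

;; Buffers created earlier in the unibyte session stay unibyte.
;; Turn on multibyte support in one of them.
;; "*old-buffer*" is a hypothetical buffer name.
(with-current-buffer "*old-buffer*"
  ;; A positive argument means "enable", not "toggle".
  (toggle-enable-multibyte-characters 1))
```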