23.2 Language Environments

Emacs buffers can hold characters from every supported character set whenever multibyte characters are enabled; there is no need to select a particular language in order to display its characters. However, it is important to select a language environment in order to set various defaults. Roughly speaking, the language environment represents a choice of preferred script rather than a choice of language.

The language environment controls which coding systems to recognize when reading text (see Recognizing Coding Systems). This applies to files, incoming mail, and any other text you read into Emacs. It may also specify the default coding system to use when you create a file. Each language environment also specifies a default input method.

To select a language environment, customize current-language-environment or use the command M-x set-language-environment. It makes no difference which buffer is current when you use this command, because the effects apply globally to the Emacs session. See the variable language-info-alist for the list of supported language environments, and use the command C-h L lang-env RET (describe-language-environment) for more information about the language environment lang-env. Supported language environments include:

ASCII, Arabic, Belarusian, Bengali, Brazilian Portuguese, Bulgarian, Burmese, Cham, Chinese-BIG5, Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Chinese-GB18030, Chinese-GBK, Croatian, Cyrillic-ALT, Cyrillic-ISO, Cyrillic-KOI8, Czech, Devanagari, Dutch, English, Esperanto, Ethiopic, French, Georgian, German, Greek, Gujarati, Hebrew, IPA, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latin-1, Latin-2, Latin-3, Latin-4, Latin-5, Latin-6, Latin-7, Latin-8, Latin-9, Latvian, Lithuanian, Malayalam, Oriya, Persian, Polish, Punjabi, Romanian, Russian, Sinhala, Slovak, Slovenian, Spanish, Swedish, TaiViet, Tajik, Tamil, Telugu, Thai, Tibetan, Turkish, UTF-8, Ukrainian, Vietnamese, Welsh, and Windows-1255.
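The names above are the keys of language-info-alist, so the list can also be inspected from Lisp. The following is a minimal sketch for an init file or the *scratch* buffer; the choice of "Japanese" is only an illustration.

```elisp
;; Names of all language environments known to this Emacs session.
;; Each element of `language-info-alist' starts with the name as a string.
(sort (mapcar #'car language-info-alist) #'string<)

;; Select an environment only if this Emacs actually knows about it.
;; "Japanese" is an arbitrary example name.
(when (assoc "Japanese" language-info-alist)
  (set-language-environment "Japanese"))
```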

To display the script(s) used by your language environment on a graphical display, you need to have suitable fonts. See Fontsets, for more details about setting up your fonts.

Some operating systems let you specify the character-set locale you are using by setting the locale environment variables LC_ALL, LC_CTYPE, or LANG. (If more than one of these is set, the first one that is nonempty specifies your locale for this purpose.) During startup, Emacs looks up your character-set locale’s name in the system locale alias table, matches its canonical name against entries in the value of the variables locale-charset-language-names and locale-language-names (the former overrides the latter), and selects the corresponding language environment if a match is found. It also adjusts the display table and terminal coding system, the locale coding system, and the preferred coding system as needed for the locale. Last but not least, it adjusts the way Emacs decodes non-ASCII characters sent by your keyboard.
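The precedence among the three environment variables can be sketched in Lisp as follows. The function my-locale-from-environment is a hypothetical helper written for this illustration, not something Emacs provides, and it only mirrors the "first nonempty variable wins" rule described above; it does not perform the alias-table lookup.

```elisp
;; Hypothetical helper (not part of Emacs): return the value of the
;; first nonempty variable among LC_ALL, LC_CTYPE and LANG, or nil.
(defun my-locale-from-environment ()
  "Return the locale name taken from the first nonempty locale variable."
  (let (result)
    (dolist (var '("LC_ALL" "LC_CTYPE" "LANG") result)
      (let ((value (getenv var)))
        (when (and (null result) value (not (string= value "")))
          (setq result value))))))

;; Compare the locale name with what Emacs selected at startup.
(list (my-locale-from-environment)
      current-language-environment
      locale-coding-system)
```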

If you modify the LC_ALL, LC_CTYPE, or LANG environment variables while running Emacs (by using M-x setenv), you may want to invoke the set-locale-environment command afterwards to readjust the language environment from the new locale.
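The same sequence can be written in Lisp, for example to experiment in the *scratch* buffer. This is a sketch only: the locale name "ja_JP.UTF-8" is an arbitrary example, and the change to LANG has no effect on the result if LC_ALL or LC_CTYPE is set to a nonempty value, since those take precedence.

```elisp
;; Change the locale variable inside the running Emacs process.
;; "ja_JP.UTF-8" is an example value, not a recommendation.
(setenv "LANG" "ja_JP.UTF-8")

;; Re-derive the language environment from the locale variables.
;; Called with no arguments, it consults LC_ALL, LC_CTYPE and LANG.
(set-locale-environment)

;; Inspect the result.
current-language-environment
```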

The set-locale-environment function normally uses the preferred coding system established by the language environment to decode system messages. But if your locale matches an entry in the variable locale-preferred-coding-systems, Emacs uses the corresponding coding system instead. For example, if the locale ‘ja_JP.PCK’ matches japanese-shift-jis in locale-preferred-coding-systems, Emacs uses that encoding even though it might normally use utf-8.
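To see which entry, if any, applies to a given locale name, the alist can be searched by hand. The sketch below assumes that each element of locale-preferred-coding-systems has the form (REGEXP . CODING-SYSTEM); the helper name is hypothetical, and the search only approximates what Emacs does internally (for instance, it does not anchor the regexp at the start of the name).

```elisp
;; Hypothetical helper (not part of Emacs): find the coding system whose
;; regexp matches LOCALE in `locale-preferred-coding-systems'.
(defun my-preferred-coding-system-for-locale (locale)
  "Return the coding system associated with LOCALE, or nil if none matches."
  (let ((case-fold-search t)) ; assumption: match locale names case-insensitively
    ;; `assoc-default' calls the test with (REGEXP LOCALE) for each element.
    (assoc-default locale locale-preferred-coding-systems #'string-match-p)))

;; Example query using the locale name from the text above.
(my-preferred-coding-system-for-locale "ja_JP.PCK")
```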

You can override the language environment chosen at startup with explicit use of the command set-language-environment, or with customization of current-language-environment in your init file.
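In an init file, the customization route might look like the sketch below. It assumes that setting current-language-environment through the Customize machinery invokes set-language-environment, which is how the option is defined in current Emacs releases; "UTF-8" is an example value.

```elisp
;; Override the environment derived from the locale at startup.
;; `customize-set-variable' runs the option's setter function, so the
;; change takes effect for the whole session rather than just setting
;; the variable's value.
(customize-set-variable 'current-language-environment "UTF-8")
```

A plain setq on current-language-environment would change only the variable and is therefore not a substitute for either form.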

To display information about the effects of a certain language environment lang-env, use the command C-h L lang-env RET (describe-language-environment). This tells you which languages this language environment is useful for, and lists the character sets, coding systems, and input methods that go with it. It also shows some sample text to illustrate scripts used in this language environment. If you give an empty input for lang-env, this command describes the currently selected language environment.
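The information shown by C-h L can also be queried piecemeal from Lisp with get-language-info. The following sketch uses "Greek" as an arbitrary example; the keys shown are among those stored in language-info-alist, and the values returned depend on the Emacs version.

```elisp
;; Same as typing C-h L Greek RET.
(describe-language-environment "Greek")

;; Individual properties of the environment:
(get-language-info "Greek" 'charset)          ; character sets it uses
(get-language-info "Greek" 'coding-priority)  ; coding systems, in priority order
(get-language-info "Greek" 'input-method)     ; default input method name
```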

You can customize any language environment with the normal hook set-language-environment-hook. The command set-language-environment runs that hook after setting up the new language environment. The hook functions can test for a specific language environment by checking the variable current-language-environment. This hook is where you should put non-default settings for specific language environments, such as coding systems for keyboard input and terminal output, the default input method, etc.
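A hook function of this kind could look like the following sketch. The function name my-russian-setup is hypothetical, and both the "Russian" environment and the "cyrillic-translit" input method are example choices, not recommendations.

```elisp
;; Hypothetical hook function: prefer a different input method whenever
;; the "Russian" language environment is selected.
(defun my-russian-setup ()
  "Adjust defaults after the Russian language environment is set up."
  (when (equal current-language-environment "Russian")
    (setq default-input-method "cyrillic-translit")))

;; Run it each time `set-language-environment' finishes its setup.
(add-hook 'set-language-environment-hook #'my-russian-setup)
```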

Before it starts to set up the new language environment, set-language-environment first runs the hook exit-language-environment-hook. This hook is useful for undoing customizations that were made with set-language-environment-hook. For instance, if you set up a special key binding in a specific language environment using set-language-environment-hook, you should set up exit-language-environment-hook to restore the normal binding for that key.
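The pairing of the two hooks can be sketched as follows. All my-... names are hypothetical, and the key (F9) and the "Greek" environment are arbitrary examples. The exit function relies on a flag rather than on the value of current-language-environment, so it restores the binding only if the setup function actually changed it.

```elisp
;; Hypothetical state: the binding F9 had before we changed it, and a
;; flag recording whether we changed it at all.
(defvar my-saved-f9-binding nil)
(defvar my-f9-rebound nil)

(defun my-greek-setup ()
  "Bind F9 to `toggle-input-method' in the Greek language environment."
  (when (equal current-language-environment "Greek")
    (setq my-saved-f9-binding (lookup-key global-map (kbd "<f9>"))
          my-f9-rebound t)
    (define-key global-map (kbd "<f9>") #'toggle-input-method)))

(defun my-greek-teardown ()
  "Restore the F9 binding saved by `my-greek-setup', if any."
  (when my-f9-rebound
    ;; A saved value of nil leaves the key unbound, as it was before.
    (define-key global-map (kbd "<f9>") my-saved-f9-binding)
    (setq my-f9-rebound nil)))

(add-hook 'set-language-environment-hook #'my-greek-setup)
(add-hook 'exit-language-environment-hook #'my-greek-teardown)
```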