Emacs supports editing text written in scripts, such as Arabic, Farsi, and Hebrew, whose natural ordering of horizontal text for display is from right to left. However, digits and Latin text embedded in these scripts are still displayed left to right. It is also not uncommon to have small portions of text in Arabic or Hebrew embedded in an otherwise Latin document; e.g., as comments and strings in a program source file. For these reasons, text that uses these scripts is actually bidirectional: a mixture of runs of left-to-right and right-to-left characters.
This section describes the facilities and options provided by Emacs for editing bidirectional text.
Emacs stores right-to-left and bidirectional text in the so-called logical (or reading) order: the buffer or string position of the first character you read precedes that of the next character. Reordering of bidirectional text into the visual order happens at display time. As a result, character positions no longer increase monotonically with their positions on display. Emacs implements the Unicode Bidirectional Algorithm (UBA) described in the Unicode Standard Annex #9, for reordering of bidirectional text for display. It deviates from the UBA only in how continuation lines are displayed when text direction is opposite to the base paragraph direction, e.g., when a long line of English text appears in a right-to-left paragraph.
The buffer-local variable bidi-display-reordering controls
whether text in the buffer is reordered for display. If its value is
non-nil, Emacs reorders characters that have right-to-left
directionality when they are displayed. The default value is t.
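To check whether reordering is active in a particular buffer, you can
evaluate bidi-display-reordering there. The following is a minimal
sketch that only reads the value; the command name
my-bidi-reordering-status is made up for illustration and is not part
of Emacs.

```elisp
;; Hypothetical helper: report the buffer-local reordering setting.
(defun my-bidi-reordering-status ()
  "Report whether bidirectional reordering is enabled in the current buffer."
  (interactive)
  (message "Bidirectional reordering is %s in %s"
           (if bidi-display-reordering "on" "off")
           (buffer-name)))
```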
Each paragraph of bidirectional text can have its own base
direction, either right-to-left or left-to-right. Text in
left-to-right paragraphs begins on the screen at the left margin of
the window and is truncated or continued when it reaches the right
margin. By contrast, text in right-to-left paragraphs is displayed
starting at the right margin and is continued or truncated at the left
margin. By default, paragraph boundaries are empty lines, i.e., lines
consisting entirely of whitespace characters. To change that, you can
customize the two variables bidi-paragraph-start-re and
bidi-paragraph-separate-re, whose values should be regular
expressions (strings); e.g., to have a single newline start a new
paragraph, set both of these variables to "^". These two
variables are buffer-local (see Local Variables).
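As a sketch of the single-newline case described above, the following
sets both regular expressions to "^" in the current buffer. The
function name my-bidi-newline-paragraphs and the choice of
text-mode-hook are assumptions made for the example; adapt them to the
modes you actually use.

```elisp
;; Hypothetical helper: treat every line as its own bidi paragraph.
(defun my-bidi-newline-paragraphs ()
  "Make each newline start a new paragraph for bidirectional display."
  (setq-local bidi-paragraph-start-re "^")
  (setq-local bidi-paragraph-separate-re "^"))

;; Assumption: this behavior is wanted in all Text mode buffers.
(add-hook 'text-mode-hook #'my-bidi-newline-paragraphs)
```

With this in effect, the base direction is determined separately for
each line rather than once per block of text between empty lines.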
Emacs determines the base direction of each paragraph dynamically,
based on the text at the beginning of the paragraph. However,
sometimes a buffer may need to force a certain base direction for its
paragraphs. The variable bidi-paragraph-direction, if
non-nil, disables the dynamic determination of the base
direction, and instead forces all paragraphs in the buffer to have the
direction specified by its buffer-local value. The value can be either
right-to-left or left-to-right. Any other value is interpreted as nil.
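A forced base direction could be set up along the following lines.
This is a sketch, not a recommended configuration: the function name
my-force-rtl-paragraphs is hypothetical, and hooking it into
text-mode-hook is only one possible choice. The values are the Lisp
symbols right-to-left and left-to-right.

```elisp
;; Hypothetical helper: force a right-to-left base direction.
(defun my-force-rtl-paragraphs ()
  "Force right-to-left base direction for every paragraph in this buffer."
  (setq-local bidi-paragraph-direction 'right-to-left))

;; Assumption: applying this to all Text mode buffers is desired.
(add-hook 'text-mode-hook #'my-force-rtl-paragraphs)

;; To return to dynamic detection in the current buffer, evaluate:
;; (setq-local bidi-paragraph-direction nil)
```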
Alternatively, you can control the base direction of a paragraph by
inserting special formatting characters in front of the paragraph.
The special character
RIGHT-TO-LEFT MARK, or RLM, forces
the right-to-left direction on the following paragraph, while
LEFT-TO-RIGHT MARK, or LRM, forces the left-to-right
direction. (You can use C-x 8 RET to insert these characters.)
In a GUI session, the LRM and RLM characters display as very
thin blank characters; on text terminals they display as blanks.
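If you insert these marks often, small commands can save typing the
character name each time. The sketch below inserts the marks by code
point (RLM is U+200F, LRM is U+200E); the command names my-insert-rlm
and my-insert-lrm are invented for the example.

```elisp
;; Hypothetical commands: insert the directional marks at point.
(defun my-insert-rlm ()
  "Insert RIGHT-TO-LEFT MARK (U+200F) at point."
  (interactive)
  (insert #x200F))

(defun my-insert-lrm ()
  "Insert LEFT-TO-RIGHT MARK (U+200E) at point."
  (interactive)
  (insert #x200E))
```

After evaluating these definitions, M-x my-insert-rlm at the start of
a paragraph has the same effect as typing C-x 8 RET 200f RET there.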
Because characters are reordered for display, Emacs commands that operate in the logical order or on stretches of buffer positions may produce unusual effects. For example, the commands C-f and C-b move point in the logical order, so the cursor will sometimes jump when point traverses reordered bidirectional text. Similarly, a highlighted region covering a contiguous range of character positions may look discontinuous if the region spans reordered text. This is normal and similar to the behavior of other programs that support bidirectional text.
Cursor motion commands bound to arrow keys, such as LEFT and C-RIGHT, are sensitive to the base direction of the current paragraph. In a left-to-right paragraph, commands bound to RIGHT with or without modifiers move forward through buffer text, but in a right-to-left paragraph they move backward instead. This reflects the fact that in a right-to-left paragraph buffer positions predominantly increase when moving to the left on display.
When you move out of a paragraph, the meaning of the arrow keys might change if the base direction of the preceding or the following paragraph is different from the paragraph out of which you moved. When that happens, you need to adjust the arrow key you press to the new base direction.
By default, LEFT and RIGHT move in the logical order,
but if visual-order-cursor-movement is non-nil, these
commands move to the character that is, correspondingly, to the left
or right of the current screen position, moving to the next or
previous screen line as appropriate. Note that this might potentially
move point many buffer positions away, depending on the surrounding
bidirectional context.
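To try visual-order movement, the option can be set in your init
file, for example as sketched here. Whether this suits you depends on
how much reordered text you edit; nothing in this fragment is required
by Emacs.

```elisp
;; Make LEFT and RIGHT follow the order of characters on the screen
;; instead of the logical (buffer) order.
(setq visual-order-cursor-movement t)
```

Setting the value back to nil restores the default logical-order
movement.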
Bidirectional text sometimes uses special formatting characters to
affect the reordering of text for display. The LRM and RLM
characters, mentioned above, are two such characters, but there are
more of them. They are by default displayed as thin space glyphs on
GUI frames, and as simple spaces on text-mode frames. If you want to
be aware of these special control characters, so that their effect on
display does not come as a surprise, you can turn on the
glyphless-display-mode (see How Text Is Displayed). This minor mode
will cause these formatting characters to be displayed as acronyms
inside a small box, so that they stand out on display, and make their
effect easier to understand.
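A minimal sketch of enabling the mode from Lisp follows, assuming your
Emacs version provides glyphless-display-mode. The use of
text-mode-hook is an assumption for illustration only.

```elisp
;; Turn the mode on in the current buffer.
(glyphless-display-mode 1)

;; Assumption: enable it automatically in all Text mode buffers.
(add-hook 'text-mode-hook #'glyphless-display-mode)
```

Calling the mode with a negative argument, as in
(glyphless-display-mode -1), turns it off again in the current buffer.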