

11.5 Translating characters

Character translation is done with translit:

Builtin: translit (string, chars, [replacement])

Expands to string, with each character that occurs in chars translated into the character from replacement with the same index.

If replacement is shorter than chars, the excess characters of chars are deleted from the expansion; if chars is shorter, the excess characters in replacement are silently ignored. If replacement is omitted, all characters in string that are present in chars are deleted from the expansion. If a character appears more than once in chars, only the first instance is used in making the translation. Only a single translation pass is made, even if characters in replacement also appear in chars.
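The rules above can be modelled in a few lines of Python. This is only a sketch of the documented behavior, not the code m4 uses: the Python function name translit and the use of None to mark deletion are choices made for this illustration, and character ranges are deliberately not handled here.

```python
def translit(string: str, chars: str, replacement: str = "") -> str:
    """Rough model of the documented translit rules (no range support)."""
    table = {}
    for index, ch in enumerate(chars):
        if ch in table:
            continue  # only the first occurrence of a character in chars counts
        # A character with no counterpart in replacement is marked for deletion.
        table[ch] = replacement[index] if index < len(replacement) else None

    out = []
    for ch in string:  # a single pass: translated characters are never re-examined
        if ch not in table:
            out.append(ch)          # not mentioned in chars: copied unchanged
        elif table[ch] is not None:
            out.append(table[ch])   # translated
        # otherwise the character is deleted
    return "".join(out)


print(translit("hello world", "lo", "0"))   # he00 wr0d
print(translit("ab", "ab", "ba"))           # ba
```

In the first call, ‘l’ maps to ‘0’ while ‘o’ has no counterpart and is deleted. In the second, ‘a’ and ‘b’ are swapped rather than both ending up as the same character, because each input character is translated exactly once.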

As a GNU extension, both chars and replacement can contain character-ranges, e.g., ‘a-z’ (meaning all lowercase letters) or ‘0-9’ (meaning all digits). To include a dash ‘-’ in chars or replacement, place it first or last in the entire string, or as the last character of a range. Back-to-back ranges can share a common endpoint. It is not an error for the last character in the range to be ‘smaller’ than the first. In that case, the range runs backwards, i.e., ‘9-0’ means the string ‘9876543210’. The expansion of a range is dependent on the underlying encoding of characters, so using ranges is not always portable between machines.
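One way to picture range expansion is the Python sketch below. It is an approximation under stated assumptions, not m4's implementation: the name expand_ranges is hypothetical, and the sketch compares Unicode code points, whereas m4 works with the machine's native character encoding.

```python
def expand_ranges(spec: str) -> str:
    """Expand ranges such as 'a-z' or '9-0' into the characters they denote."""
    out = []
    prev = None  # most recent literal character or range endpoint
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "-" and prev is not None:
            if i + 1 == len(spec):
                out.append("-")  # a trailing dash is taken literally
                break
            end = spec[i + 1]
            step = 1 if prev <= end else -1  # descending endpoints run backwards
            # prev has already been emitted, so start one step past it
            out.extend(chr(c) for c in range(ord(prev) + step, ord(end) + step, step))
            prev = end  # the endpoint may also start the next range
            i += 2
        else:
            out.append(ch)  # ordinary character, or a dash in first position
            prev = ch
            i += 1
    return "".join(out)


print(expand_ranges("9-0"))     # 9876543210
print(expand_ranges("a-c-a"))   # abcba
print(expand_ranges("+--1-5"))  # +,-12345
```

The last call shows a dash used as the final character of a range: under ASCII ordering, ‘+--’ covers ‘+’, ‘,’ and ‘-’.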

The macro translit is recognized only with parameters.

translit(`GNUs not Unix', `A-Z')
⇒s not nix
translit(`GNUs not Unix', `a-z', `A-Z')
⇒GNUS NOT UNIX
translit(`GNUs not Unix', `A-Z', `z-a')
⇒tmfs not fnix
translit(`+,-12345', `+--1-5', `<;>a-c-a')
⇒<;>abcba
translit(`abcdef', `aabdef', `bcged')
⇒bgced

In the ASCII encoding, the first example deletes all uppercase letters, the second converts lowercase to uppercase, and the third ‘mirrors’ all uppercase letters while converting them to lowercase. The first two cases are by far the most common, even though they are not portable to EBCDIC or other encodings. The fourth example shows a range ending in ‘-’, as well as back-to-back ranges. The final example shows that ‘a’ is mapped to ‘b’, not ‘c’; the resulting ‘b’ is not further remapped to ‘g’; the ‘d’ and ‘e’ are swapped; and the ‘f’ is discarded.

Omitting chars evokes a warning, but still produces output.

translit(`abc')
error→m4:stdin:1: Warning: too few arguments to builtin `translit'
⇒abc
