

5.10 Manipulating Hyphenation

When filling, GNU troff hyphenates words as needed at user-specified and automatically determined hyphenation points. The machine-driven determination of hyphenation points in words requires algorithms and data, and is susceptible to conventions and preferences. Before tackling such automatic hyphenation, let us consider how hyphenation points can be set explicitly.

Explicitly hyphenated words such as “mother-in-law” are eligible for breaking after each of their hyphens. Relatively few words in a language offer such obvious break points, however, and automatic detection of syllabic (or phonetic) boundaries for hyphenation is not perfect,(56) particularly for unusual words found in technical literature. We can instruct GNU troff how to hyphenate specific words if the need arises.

Request: .hw word …

Define each hyphenation exception word with each hyphen ‘-’ in the word indicating a hyphenation point. For example, the request

.hw in-sa-lub-rious alpha

marks potential hyphenation points in “insalubrious”, and prevents “alpha” from being hyphenated at all.

Besides the space character, any character whose hyphenation code is zero can be used to separate the arguments of hw (see the hcode request below). In addition, this request can be used more than once.

Hyphenation points specified with hw are not subject to the within-word placement restrictions imposed by the hy request (see below).

Hyphenation exceptions specified with the hw request are associated with the hyphenation language (see the hla request below) and environment (see Environments); invoking the hw request in the absence of a hyphenation language is an error.

The request is ignored if there are no parameters.

Words defined with hw are known as hyphenation exceptions in the expectation that most users will avail themselves of automatic hyphenation; these exceptions override any rules that would normally apply to a word matching a hyphenation exception defined with hw.

Situations also arise when only a specific occurrence of a word needs its hyphenation altered or suppressed, or when a URL or similar string needs to be breakable in sensible places without hyphenation.

Escape sequence: \%
Escape sequence: \:

To tell GNU troff how to hyphenate words as they occur in input, use the \% escape sequence; it is the default hyphenation character. Each instance within a word indicates to GNU troff that the word may be hyphenated at that point, while prefixing a word with this escape sequence prevents it from being otherwise hyphenated. This mechanism affects only that occurrence of the word; to change the hyphenation of a word for the remainder of input processing, use the hw request.

GNU troff regards the escape sequences \X and \Y as starting a word; that is, the \% escape sequence in, say, ‘\X'...'\%foobar’ or ‘\Y'...'\%foobar’ no longer prevents hyphenation of ‘foobar’ but inserts a hyphenation point just prior to it; most likely this isn’t what you want. See Postprocessor Access.

\: inserts a non-printing break point; that is, a word can break there, but the soft hyphen glyph (see below) is not written to the output if it does. This escape sequence is an input word boundary, so the remainder of the word is subject to hyphenation as normal.

You can combine \: and \% to control breaking of a file name or URL, or to permit hyphenation only after certain explicit hyphens within a word.

The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce
was, in retrospect, inevitable once the contents of
\%/var/log/\:\%httpd/\:\%access_log on the family web
server came to light, revealing visitors from Hogwarts.

Request: .hc [char]

Change the hyphenation character to char. This character then works as the \% escape sequence normally does, and thus no longer appears in the output.(57) Without an argument, hc resets the hyphenation character to \% (the default). The hyphenation character is associated with the environment (see Environments).
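
As a brief illustration, the following sketch makes ‘@’ the hyphenation character for one sentence and then restores the default. The choice of ‘@’ and the sample words are arbitrary assumptions for demonstration, not recommendations.

```roff
.\" Use @ instead of \% to mark hyphenation points.
.hc @
An ex@tra@or@di@nary claim about an @unbreakable word.
.\" Restore the default hyphenation character, \%.
.hc
```

Within ‘ex@tra@or@di@nary’ each ‘@’ marks a permissible break, while the leading ‘@’ on ‘@unbreakable’ suppresses hyphenation of that word, just as \% would.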

Request: .shc [c]

Set the soft hyphen character, inserted when a word is hyphenated automatically or at a hyphenation character, to the ordinary or special character c.(58) If the argument is omitted, the soft hyphen character is set to the default, \[hy]. If no glyph for c exists in the font in use at a potential hyphenation point, then the line is not broken there. Neither character definitions (specified with the char and similar requests) nor translations (specified with the tr request) are applied to c.

Several requests influence automatic hyphenation. Because conventions vary, a variety of hyphenation modes is available to the hy request; these determine whether hyphenation will apply to a word prior to breaking a line at the end of a page (more or less; see below for details), and at which positions within that word automatically determined hyphenation points are permissible. The places within a word that are eligible for hyphenation are determined by language-specific data and lettercase relationships. Furthermore, hyphenation of a word might be suppressed due to a limit on consecutive hyphenated lines (hlm), a minimum line length threshold (hym), or because the line can instead be adjusted with additional inter-word space (hys).

Request: .hy [mode]
Register: \n[.hy]

Set automatic hyphenation mode to mode, an integer encoding conditions for hyphenation; if omitted, ‘1’ is implied. The hyphenation mode is available in the read-only register ‘.hy’; it is associated with the environment (see Environments). The default hyphenation mode depends on the localization file loaded when GNU troff starts up; see the hpf request below.

Typesetting practice generally does not avail itself of every opportunity for hyphenation, but the details differ by language and site mandates. The hyphenation modes of AT&T troff were implemented with English-language publishing practices of the 1970s in mind, not a scrupulous enumeration of conceivable parameters. GNU troff extends those modes such that finer-grained control is possible, favoring compatibility with older implementations over a more intuitive arrangement. The means of hyphenation mode control is a set of numbers that can be added up to encode the behavior sought.(59) The entries in the following table are termed values; the sum of the desired values is the mode.

0

disables hyphenation.

1

enables hyphenation except after the first and before the last character of a word.

The remaining values “imply” 1; that is, they enable hyphenation under the same conditions as ‘.hy 1’, and then apply or lift restrictions relative to that basis.

2

disables hyphenation of the last word on a page,(60) even for explicitly hyphenated words.

4

disables hyphenation before the last two characters of a word.

8

disables hyphenation after the first two characters of a word.

16

enables hyphenation before the last character of a word.

32

enables hyphenation after the first character of a word.

Apart from value 2, restrictions imposed by the hyphenation mode are not respected for words whose hyphenations have been specified with the hyphenation character (‘\%’ by default) or the hw request.

Nonzero values in the previous table are additive. For example, mode 12 causes GNU troff to hyphenate neither the last two nor the first two characters of a word. Some values cannot be used together because they contradict; for instance, values 4 and 16, and values 8 and 32. As noted, it is superfluous to add 1 to any non-zero even mode.
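
The addition described above can be sketched as follows. The diagnostic text written with tm is an illustrative assumption; only the hy request and the ‘.hy’ register come from this section.

```roff
.\" 4: no break before the last two characters of a word
.\" 8: no break after the first two characters of a word
.\" 4 + 8 = 12
.hy 12
.\" Report the mode now in effect on the standard error stream.
.tm hyphenation mode: \n[.hy]
```

Because 12 is a nonzero even mode, value 1 is already implied and need not be added.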

The automatic placement of hyphens in words is determined by pattern files, which are derived from TeX and available for several languages. The number of characters at the beginning of a word after which the first hyphenation point should be inserted is determined by the patterns themselves; it can’t be reduced further without introducing additional, invalid hyphenation points (unfortunately, this information is not part of a pattern file—you have to know it in advance). The same is true for the number of characters at the end of a word before the last hyphenation point should be inserted. For example, you can supply the following input to ‘echo $(nroff)’.

.ll 1
.hy 48
splitting

You will get

s- plit- t- in- g

instead of the correct ‘split- ting’. English patterns as distributed with GNU troff need two characters at the beginning and three characters at the end; this means that value 4 of hy is mandatory. Value 8 is possible as an additional restriction, but values 16 and 32 should be avoided, as should mode 1. Modes 4 and 6 are typical.

A table of left and right minimum character counts for hyphenation as needed by the patterns distributed with GNU troff follows; see the groff_tmac(5) man page for more information on GNU troff’s language macro files.

language              pattern name   left min   right min
Czech                 cs             2          2
English               en             2          3
French                fr             2          3
German traditional    det            2          2
German reformed       den            2          2
Italian               it             2          2
Swedish               sv             1          2

Hyphenation exceptions within pattern files (i.e., the words within a TeX \hyphenation group) obey the hyphenation restrictions given by hy.

Request: .nh

Disable automatic hyphenation; i.e., set the hyphenation mode to 0 (see above). The hyphenation mode of the last call to hy is not remembered.
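
Since nh does not remember the previous mode, a document that wishes to suspend hyphenation only temporarily must save the mode itself. A minimal sketch follows; the register name ‘savedhy’ is a hypothetical choice, not something GNU troff defines.

```roff
.\" Remember the current mode; nh will not do this for us.
.nr savedhy \n[.hy]
.nh
This paragraph is filled without automatic hyphenation.
.\" Flush the pending output line before changing the mode again.
.br
.\" Reinstate the mode that was in effect before nh.
.hy \n[savedhy]
```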

Request: .hpf pattern-file
Request: .hpfa pattern-file
Request: .hpfcode a b [c d] …

Read hyphenation patterns from pattern-file, which is sought in the same way that macro files are with the mso request or the -mname command-line option to groff. The pattern-file should have the same format as (simple) TeX pattern files. More specifically, the following scanning rules are implemented.

The hpfa request appends a file of patterns to the current list.

The hpfcode request defines mapping values for character codes in pattern files. It is an older mechanism no longer used by GNU troff’s own macro files; for its successor, see hcode below. The hpf and hpfa requests apply the mapping after reading the patterns but before replacing or appending to the active list of patterns. The arguments to hpfcode are pairs of character codes, each an integer from 0 to 255. The request maps character code a to code b, code c to code d, and so on. Character codes that would otherwise be invalid in GNU troff can be used. By default, every code maps to itself except those for letters ‘A’ to ‘Z’, which map to those for ‘a’ to ‘z’.

The set of hyphenation patterns is associated with the language set by the hla request (see below). The hpf request is usually invoked by a localization file loaded by the troffrc file.(61)

A second call to hpf (for the same language) replaces the hyphenation patterns with the new ones. Invoking hpf or hpfa causes an error if there is no hyphenation language. If no hpf request is specified (either in the document, in a file loaded at startup, or in a macro package), GNU troff won’t automatically hyphenate at all.
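
The ordering constraints above suggest a sequence like the following sketch. The file names ‘my-patterns.tex’ and ‘my-extras.tex’ are hypothetical placeholders; substitute pattern files that actually exist in the macro search path.

```roff
.\" Patterns are stored per language, so select one first;
.\" hpf and hpfa fail if no hyphenation language is set.
.hla en
.\" Replace any patterns already loaded for this language.
.hpf my-patterns.tex
.\" Append further patterns without discarding those just read.
.hpfa my-extras.tex
```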

Request: .hcode c1 code1 [c2 code2] …

Set the hyphenation code of character c1 to code1, that of c2 to code2, and so on. A hyphenation code must be an ordinary character (not a special character escape sequence) other than a digit or a space. The request is ignored if given no arguments.

For hyphenation to work, hyphenation codes must be set up. At startup, GNU troff assigns hyphenation codes to the letters ‘a’–‘z’ (mapped to themselves), to the letters ‘A’–‘Z’ (mapped to ‘a’–‘z’), and zero to all other characters. Normally, hyphenation patterns contain only lowercase letters which should be applied regardless of case. In other words, they assume that the words ‘FOO’ and ‘Foo’ should be hyphenated exactly as ‘foo’ is. The hcode request extends this principle to letters outside the Unicode basic Latin alphabet; without it, words containing such letters won’t be hyphenated properly even if the corresponding hyphenation patterns contain them.

For example, the following hcode requests are necessary to assign hyphenation codes to the letters ‘ÄäÖöÜüß’, needed for German.

.hcode ä ä  Ä ä
.hcode ö ö  Ö ö
.hcode ü ü  Ü ü
.hcode ß ß

Without these assignments, GNU troff treats the German word ‘Kindergärten’ (the plural form of ‘kindergarten’) as two words ‘kinderg’ and ‘rten’ because the hyphenation code of the umlaut a is zero by default, just like a space. There is a German hyphenation pattern that covers ‘kinder’, so GNU troff finds the hyphenation ‘kin-der’. The other two hyphenation points (‘kin-der-gär-ten’) are missed.

Request: .hla lang
Register: \n[.hla]

Set the hyphenation language to lang. Hyphenation exceptions specified with the hw request and hyphenation patterns and exceptions specified with the hpf and hpfa requests are associated with the hyphenation language. The hla request is usually invoked by a localization file, which is in turn loaded by the troffrc or troffrc-end file; see the hpf request above.

The hyphenation language is available in the read-only string-valued register ‘.hla’; it is associated with the environment (see Environments).
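
As a small sketch of how the language, its exceptions, and the register fit together, consider the following. The exception word and the diagnostic text are illustrative assumptions.

```roff
.\" Exceptions are filed under the language in effect when hw runs.
.hla en
.hw data-base
.\" The register interpolates a string, not a number.
.tm hyphenation language: \n[.hla]
```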

Request: .hlm [n]
Register: \n[.hlm]
Register: \n[.hlc]

Set the maximum quantity of consecutive hyphenated lines to n. If n is negative, there is no maximum. If omitted, n is -1. This value is associated with the environment (see Environments). Only lines output from a given environment count toward the maximum associated with that environment. Hyphens resulting from \% are counted; explicit hyphens are not.

The .hlm read-only register stores this maximum. The count of immediately preceding consecutive hyphenated lines is available in the read-only register .hlc.
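
For instance, a document might cap runs of hyphenated lines and later lift the cap again. The limit of 2 below is an arbitrary assumption chosen for illustration.

```roff
.\" Permit at most two consecutive hyphenated lines.
.hlm 2
.tm consecutive hyphenated line limit: \n[.hlm]
.\" Remove the limit again; with no argument, n is -1.
.hlm
```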

Request: .hym [length]
Register: \n[.hym]

Set the (right) hyphenation margin to length. If the adjustment mode is not ‘b’ or ‘n’, a line is not hyphenated if it falls short of the line length by no more than length. Without an argument, the hyphenation margin is reset to its default value, 0. The default scaling unit is ‘m’. The hyphenation margin is associated with the environment (see Environments).

A negative argument resets the hyphenation margin to zero, emitting a warning in category ‘range’.

The hyphenation margin is available in the .hym read-only register.
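
A sketch of typical use with ragged-right text follows. The margin of 2 ems is an assumed value for illustration only.

```roff
.\" Adjustment mode l is neither b nor n, so hym applies.
.ad l
.\" Set a hyphenation margin of 2 ems.
.hym 2m
.\" The register reports the margin in basic units.
.tm hyphenation margin: \n[.hym]u
```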

Request: .hys [hyphenation-space]
Register: \n[.hys]

Suppress hyphenation of the line in adjustment modes ‘b’ or ‘n’ if it can be justified by adding no more than hyphenation-space extra space to each inter-word space. Without an argument, the hyphenation space adjustment threshold is set to its default value, 0. The default scaling unit is ‘m’. The hyphenation space adjustment threshold is associated with the environment (see Environments).

A negative argument resets the hyphenation space adjustment threshold to zero, emitting a warning in category ‘range’.

The hyphenation space adjustment threshold is available in the .hys read-only register.
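
The counterpart for adjusted text might look like the following sketch. The threshold of half an em is an assumed value, not a recommended setting.

```roff
.\" Adjust to both margins; hys applies in modes b and n.
.ad b
.\" Prefer stretching each inter-word space by up to 0.5 em
.\" over hyphenating the line.
.hys 0.5m
.tm hyphenation space threshold: \n[.hys]u
.\" Reset the threshold to its default, 0.
.hys
```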

