2.3.3 String Data

Strings => SourceMaps[maps] Labels

SourceMaps => int32[n-maps] SourceMap*[n-maps]

SourceMap => string[source-name] int32[n-variables] VariableMap*[n-variables]
VariableMap => string[variable-name] int32[n-data] DatumMap*[n-data]
DatumMap => int32[value-idx] int32[label-idx]

Labels => int32[n-labels] Label*[n-labels]
Label => int32[frequency] string[label]

Each variable may include a mix of numeric and string data values. If a legacy binary member contains any string data, Strings is present; otherwise, it ends just after the last Data element.

The string data overlays the numeric data. When a variable includes any string data, its Variable represents each string value with a SYSMIS or NaN placeholder. (Not every SYSMIS or NaN value is necessarily a placeholder.)

Each SourceMap provides a mapping between SYSMIS or NaN values in the source named source-name and the string data that they represent. n-variables is nominally the number of variables in the source that include string data. More precisely, it is the 1-based index of the last variable in the source that includes any string data; thus, it would be 4 if there are 5 variables and only the fourth one includes string data.
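The rule for n-variables can be illustrated with a minimal sketch. The function name n_variables_for and its list-of-booleans input are hypothetical, introduced only for this illustration; they are not part of the format.

```python
def n_variables_for(has_strings):
    """Return the n-variables value implied by the rule above.

    has_strings holds one bool per variable, in source order
    (a hypothetical input shape chosen for this sketch).
    """
    last = 0
    # enumerate from 1 because the rule uses a 1-based index.
    for index, flag in enumerate(has_strings, start=1):
        if flag:
            last = index
    return last


# Five variables, only the fourth has string data.
print(n_variables_for([False, False, False, True, False]))  # prints 4
```

The sketch returns 0 when no variable has string data; whether a SourceMap would be written at all in that case is not stated here.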

A VariableMap repeats its variable’s name. Nevertheless, variables always appear in the same order as in the source, starting from the first variable and without skipping any, even those that have no string values. Each VariableMap contains DatumMap nonterminals, each of which maps from a 0-based index within its variable’s data to a 0-based label index. For example, the pair value-idx = 2, label-idx = 3 means that the third data value (which must be SYSMIS or NaN) is to be replaced by the string of the fourth Label.
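The replacement that a DatumMap describes might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function overlay_strings is hypothetical, and SYSMIS is modeled as the most negative finite double, which this section does not state.

```python
import math
import sys

# Assumption: SYSMIS is the most negative finite double.
SYSMIS = -sys.float_info.max


def overlay_strings(values, datum_maps, labels):
    """Return a copy of values with placeholders replaced by strings.

    values:     numeric data for one variable (list of floats)
    datum_maps: (value_idx, label_idx) pairs, both 0-based
    labels:     label strings in the order they appear in Labels
    """
    result = list(values)
    for value_idx, label_idx in datum_maps:
        placeholder = result[value_idx]
        # A mapped value must be SYSMIS or NaN.
        if placeholder != SYSMIS and not math.isnan(placeholder):
            raise ValueError(f"value {value_idx} is not a placeholder")
        result[value_idx] = labels[label_idx]
    return result


# The pair value-idx = 2, label-idx = 3 from the text above.
values = [1.0, 2.5, float("nan"), 4.0]
labels = ["a", "b", "c", "d"]
print(overlay_strings(values, [(2, 3)], labels))  # prints [1.0, 2.5, 'd', 4.0]
```

Values not named by any DatumMap are left untouched, so a SYSMIS or NaN that is not a placeholder stays numeric.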

The labels themselves follow the pairs. The valuable part of each label is the string label. Each label also includes a frequency that reports the number of DatumMaps that reference it (although this is not useful).
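A reader for the Strings nonterminal could look roughly like the sketch below. It rests on assumptions that this section does not spell out: int32 is taken to be little-endian, string is taken to be an int32 byte count followed by that many bytes, and UTF-8 is used as a stand-in encoding. All function names are hypothetical.

```python
import io
import struct


def read_int32(f):
    # Assumption: 32-bit little-endian signed integer.
    data = f.read(4)
    if len(data) != 4:
        raise EOFError("truncated int32")
    return struct.unpack("<i", data)[0]


def read_string(f, encoding="utf-8"):
    # Assumption: int32 byte count, then that many bytes.
    length = read_int32(f)
    data = f.read(length)
    if len(data) != length:
        raise EOFError("truncated string")
    return data.decode(encoding)


def read_strings(f):
    """Parse Strings => SourceMaps Labels from a binary stream."""
    source_maps = {}
    for _ in range(read_int32(f)):              # n-maps
        source_name = read_string(f)
        variable_maps = []
        for _ in range(read_int32(f)):          # n-variables
            variable_name = read_string(f)
            datum_maps = []
            for _ in range(read_int32(f)):      # n-data
                value_idx = read_int32(f)
                label_idx = read_int32(f)
                datum_maps.append((value_idx, label_idx))
            variable_maps.append((variable_name, datum_maps))
        source_maps[source_name] = variable_maps

    labels = []
    for _ in range(read_int32(f)):              # n-labels
        frequency = read_int32(f)
        labels.append((frequency, read_string(f)))
    return source_maps, labels
```

The function takes any binary file-like object positioned at the start of Strings, for example an io.BytesIO wrapping the tail of the member. It returns the variable maps in file order, so a variable with no string values simply appears with an empty DatumMap list.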