Next: Other Formats, Previous: dynapi, Up: Functions [Contents][Index]
Internally, a DWG uses several different string formats; see Types.
The most important are BITCODE_TV (i.e. char*) encoded according to dwg->header.codepage,
and BITCODE_TU (i.e. wchar_t on Windows, UCS-2).
Externally most functions get and set strings as UTF-8, as in DXF or JSON.
In DWGs before r2007 the TV and T strings are encoded in the drawing’s codepage. They are not yet converted between that codepage and UTF-8, but eventually will be; not via libiconv, just via the locale-specific libc btowc(). Unicode characters are encoded with special \U+XXXX sequences, and with Japanese Shift-JIS, Katakana and Hiragana are encoded with \M+1XXXX sequences.
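As an illustration of the \U+XXXX convention, the following is a minimal sketch that expands such sequences into UTF-8. The function expand_u_escapes and its helpers are hypothetical and not part of the LibreDWG API. The sketch assumes exactly four hex digits per sequence, copies all other bytes unchanged, and handles neither \M+1XXXX sequences nor any codepage conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not LibreDWG functions. */
static int hex_val (char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Write a 16-bit code point as 1..3 UTF-8 bytes, return the byte count.
   Surrogate values are not treated specially in this sketch. */
static size_t put_utf8 (uint32_t cp, char *dst)
{
  if (cp < 0x80)
    {
      dst[0] = (char)cp;
      return 1;
    }
  if (cp < 0x800)
    {
      dst[0] = (char)(0xC0 | (cp >> 6));
      dst[1] = (char)(0x80 | (cp & 0x3F));
      return 2;
    }
  dst[0] = (char)(0xE0 | (cp >> 12));
  dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
  dst[2] = (char)(0x80 | (cp & 0x3F));
  return 3;
}

/* Expand \U+XXXX sequences in src into UTF-8 in dst.
   Returns the number of bytes written (without the NUL),
   or (size_t)-1 if dst might be too small. */
size_t expand_u_escapes (const char *src, char *dst, size_t dstlen)
{
  size_t o = 0;
  if (dstlen == 0)
    return (size_t)-1;
  while (*src)
    {
      uint32_t cp = 0;
      int is_escape = 0;
      if (src[0] == '\\' && src[1] == 'U' && src[2] == '+')
        {
          int i;
          is_escape = 1;
          for (i = 0; i < 4; i++)
            {
              int h = hex_val (src[3 + i]);
              if (h < 0) /* also stops at the terminating NUL */
                {
                  is_escape = 0;
                  break;
                }
              cp = (cp << 4) | (uint32_t)h;
            }
        }
      if (is_escape)
        {
          if (o + 3 >= dstlen) /* conservative: up to 3 bytes plus NUL */
            return (size_t)-1;
          o += put_utf8 (cp, dst + o);
          src += 7; /* length of \U+XXXX */
        }
      else
        {
          if (o + 1 >= dstlen)
            return (size_t)-1;
          dst[o++] = *src++;
        }
    }
  dst[o] = '\0';
  return o;
}
```

Incomplete sequences such as \U+12 are copied through unchanged rather than rejected; that is a choice made for this sketch only.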
In DWGs r2007 and later, most strings (T and TU) are encoded in the Microsoft-specific two-byte UCS-2 Unicode encoding, without proper support for surrogate pairs and the upper planes (e.g. emojis).
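To make the UCS-2 limitation concrete, here is a minimal sketch of a UCS-2 to UTF-8 conversion. The function ucs2_to_utf8 is hypothetical and is not the library's converter. It assumes a zero-terminated array of 16-bit units and treats every unit as one code point. Replacing surrogate halves with U+FFFD is an assumption of this sketch, not documented LibreDWG behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not a LibreDWG function.
   Converts zero-terminated UCS-2 to UTF-8. Each 16-bit unit is one
   code point; surrogate halves (0xD800..0xDFFF) become U+FFFD.
   Returns the number of bytes written (without the NUL),
   or (size_t)-1 if dst might be too small. */
size_t ucs2_to_utf8 (const uint16_t *src, char *dst, size_t dstlen)
{
  size_t o = 0;
  if (dstlen == 0)
    return (size_t)-1;
  for (; *src; src++)
    {
      uint32_t cp = *src;
      if (cp >= 0xD800 && cp <= 0xDFFF)
        cp = 0xFFFD; /* sketch-only policy for unsupported surrogates */
      if (o + 3 >= dstlen) /* conservative: up to 3 bytes plus NUL */
        return (size_t)-1;
      if (cp < 0x80)
        {
          dst[o++] = (char)cp;
        }
      else if (cp < 0x800)
        {
          dst[o++] = (char)(0xC0 | (cp >> 6));
          dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
      else
        {
          dst[o++] = (char)(0xE0 | (cp >> 12));
          dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
          dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
  dst[o] = '\0';
  return o;
}
```

Because a UCS-2 unit never exceeds 0xFFFF, the output needs at most three UTF-8 bytes per unit, which is why the four-byte form does not appear here.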
Fixed TF strings are not encoded and are stored with an explicit length. Normal strings are all zero-terminated. EED and XDATA strings are also stored with a length, but are subject to length limits.
Strings in DXF and JSON also have quoting rules for special characters, like \r, \n, \" and so on.
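The quoting of special characters can be sketched for the JSON case as follows. The function json_quote is hypothetical and not the routine the exporters use. It covers only generic JSON escaping of \r, \n, \t, \" and the backslash itself. It does not address DXF-specific rules or how \U+XXXX sequences are treated.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; not a LibreDWG function.
   Copies src to dst, escaping a few special characters JSON-style.
   Returns the number of bytes written (without the NUL),
   or (size_t)-1 if dst is too small. */
size_t json_quote (const char *src, char *dst, size_t dstlen)
{
  size_t o = 0;
  if (dstlen == 0)
    return (size_t)-1;
  for (; *src; src++)
    {
      char esc = 0;
      switch (*src)
        {
        case '\r': esc = 'r'; break;
        case '\n': esc = 'n'; break;
        case '\t': esc = 't'; break;
        case '"':  esc = '"'; break;
        case '\\': esc = '\\'; break;
        default: break;
        }
      if (esc)
        {
          if (o + 2 >= dstlen) /* two bytes plus NUL */
            return (size_t)-1;
          dst[o++] = '\\';
          dst[o++] = esc;
        }
      else
        {
          if (o + 1 >= dstlen) /* one byte plus NUL */
            return (size_t)-1;
          dst[o++] = *src;
        }
    }
  dst[o] = '\0';
  return o;
}
```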
Transformations:
DWG to DWG: decode reads the T and TU strings in their native format into the field. encode translates them to TV or TU. For this, encode needs header.from_version and how the data was read: from a DWG, from an importer (in_dxf or in_json), or via the add API (DWG_OPTS_IN).
DXF/JSON to DWG: in_dxf and in_json keep the T and TU strings as TV. encode to versions before r2007 keeps them as TV; for r2007 and later it translates them to TU. The importer sets DWG_OPTS_IN.
DWG to DXF/JSON: decode keeps the T and TU strings as TV or TU. out_dxf and out_json translate them to TV or UTF-8 and quote them via \U+XXXX.
add API to DWG/DXF: add reads strings as UTF-8 and encodes them to TV or TU (TU not yet, as we don’t encode R2004+ yet). add sets DWG_OPTS_IN.