4.3 Meta formats

enum EXTRACTOR_MetaFormat is a C enum which defines, at a high level, how the extracted meta data is represented. Currently, the library uses three formats: UTF-8 strings, C strings and binary data. A fourth value, EXTRACTOR_METAFORMAT_UNKNOWN, is defined but not used.

UTF-8 strings are 0-terminated strings that have been converted to UTF-8. The format code is EXTRACTOR_METAFORMAT_UTF8. Ideally, most text meta data will be of this format. Some file formats fail to specify the encoding used for the text, in which case the text cannot be converted to UTF-8. However, the meta data is still known to be 0-terminated and presumably human-readable. For such text, the format code used is EXTRACTOR_METAFORMAT_C_STRING; this should not be understood to mean that the encoding is the same as that used by the C compiler. Finally, for binary data (mostly images), the format EXTRACTOR_METAFORMAT_BINARY is used.

Naturally, this is not a precise description of the meta format. Plugins can provide a more precise description (if known) by providing the respective mime type of the meta data. For example, binary image meta data could also be tagged as “image/png” and normal text would typically be tagged as “text/plain”.
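
As an illustration of how an application might act on the format code and the mime type, the following is a minimal sketch rather than code taken from the library. The helper describe_meta and its parameters are hypothetical. The enum is a local stand-in whose order of values is an assumption, written out only so that the sketch compiles without the library; real code would include <extractor.h> instead.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the declaration in <extractor.h> (assumption: the
   order of the values here need not match the library's header). */
enum EXTRACTOR_MetaFormat
{
  EXTRACTOR_METAFORMAT_UNKNOWN,
  EXTRACTOR_METAFORMAT_UTF8,
  EXTRACTOR_METAFORMAT_BINARY,
  EXTRACTOR_METAFORMAT_C_STRING
};

/* Hypothetical helper: write a printable summary of one meta data
   item into buf.  Returns the value returned by snprintf. */
static int
describe_meta (enum EXTRACTOR_MetaFormat format,
               const char *data_mime_type,
               const char *data,
               size_t data_len,
               char *buf,
               size_t buf_size)
{
  /* The mime type is optional, so fall back to a placeholder. */
  const char *mime = (NULL != data_mime_type) ? data_mime_type : "unknown";

  switch (format)
  {
  case EXTRACTOR_METAFORMAT_UTF8:
    /* 0-terminated and known to be UTF-8: safe to print as text. */
    return snprintf (buf, buf_size, "%s (UTF-8, %s)", data, mime);
  case EXTRACTOR_METAFORMAT_C_STRING:
    /* 0-terminated, but the encoding is not known. */
    return snprintf (buf, buf_size, "%s (unknown encoding, %s)", data, mime);
  case EXTRACTOR_METAFORMAT_BINARY:
    /* Not text: report only the size and the mime type. */
    return snprintf (buf, buf_size, "<%zu bytes of binary data, %s>",
                     data_len, mime);
  default:
    /* EXTRACTOR_METAFORMAT_UNKNOWN or an unexpected value. */
    return snprintf (buf, buf_size, "<meta data in unknown format>");
  }
}
```

A caller would pass the format, mime type, data pointer and length it received for one meta data item, together with an output buffer, and print the buffer afterwards. Only the two string formats are ever printed with %s; binary data is summarized by its length instead, since it is not guaranteed to be 0-terminated.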