
9 Internal utility functions

Some plugins link against the libextractor_common library, which provides common abstractions needed by many plugins. This section documents this internal API for plugin developers. Note that the headers for this library are (intentionally) not installed: we do not consider this API stable, and it should hence only be used by plugins that are built and shipped with GNU libextractor. Third-party plugins should not use it.

convert_numeric.h defines various conversion functions for numbers (in particular, byte-order conversion for floating point numbers).

unzip.h defines an API for accessing compressed files.

pack.h provides an interpreter for unpacking structs of integer numbers from streams and converting from big or little endian to host byte order at the same time.

convert.h provides a function for character set conversion described below.

— Function: char * EXTRACTOR_common_convert_to_utf8 (const char *input, size_t len, const char *charset)

Various GNU libextractor plugins make use of the internal convert.h header, which defines the function EXTRACTOR_common_convert_to_utf8 for converting text from any supported character set to UTF-8. This conversion is important because the linked list of keywords returned by GNU libextractor is expected to contain only UTF-8 strings. Proper conversion is not always possible, since some file formats fail to specify the character set. In that case, it is often better not to convert at all.

The arguments to EXTRACTOR_common_convert_to_utf8 are the input string (which does not have to be zero-terminated), the length of the input string, and the name of the character set (which must be zero-terminated). Which character sets are supported depends on the platform; a list can generally be obtained using the iconv -l command. The return value from EXTRACTOR_common_convert_to_utf8 is a zero-terminated string in UTF-8 format. The caller is responsible for freeing the string, so storing the string in the keyword list is acceptable.
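
To illustrate the calling convention described above, here is a minimal sketch of a function with the same argument list, built on the POSIX iconv interface. It is not the code from convert.h: the name sketch_convert_to_utf8, the output buffer bound of four bytes per input byte, and the choice to return NULL on any failure are all assumptions made for this example.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical stand-in with the same signature as
   EXTRACTOR_common_convert_to_utf8; NOT the libextractor implementation. */
char *
sketch_convert_to_utf8 (const char *input, size_t len, const char *charset)
{
  iconv_t cd;
  char *out;
  char *in_ptr;
  char *out_ptr;
  size_t in_left;
  size_t out_left;

  /* iconv_open takes the target encoding first, then the source encoding */
  cd = iconv_open ("UTF-8", charset);
  if (cd == (iconv_t) -1)
    return NULL;                /* charset not supported on this platform */

  /* Assumption: four output bytes per input byte is enough room,
     plus one byte for the terminating zero. */
  out = malloc (4 * len + 1);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  /* iconv advances the pointers and decrements the byte counts;
     the input need not be zero-terminated since its length is explicit */
  in_ptr = (char *) input;
  in_left = len;
  out_ptr = out;
  out_left = 4 * len;
  if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;              /* invalid input or conversion failure */
    }

  *out_ptr = '\0';              /* result is always zero-terminated */
  iconv_close (cd);
  return out;
}
```

A caller would pass, for instance, the four bytes of an ISO-8859-1 string together with the charset name "ISO-8859-1", and later release the returned buffer with free.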