8 Writing new Plugins

Writing a new plugin for libextractor usually requires writing, or interfacing with, an actual parser for a specific format. How this can be accomplished depends on the format and cannot be specified in general. However, care should be taken that the code is reentrant and highly fault-tolerant, especially with respect to malformed inputs.

Plugins should start by verifying that the header of the data matches the specific format and immediately return if that is not the case. Even if the header matches the expected file format, plugins must not assume that the remainder of the file is well formed.
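
The two rules above (reject a mismatching header at once, then treat every later length field as untrusted) can be sketched as follows. This is only an illustration: the "DEMO" format, the macro names and the function demo_count_records are hypothetical and are not part of libextractor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format, invented for this sketch:
   4-byte magic "DEMO", followed by records of the form
   [1-byte payload length n][n bytes of payload]. */
#define DEMO_MAGIC "DEMO"
#define DEMO_MAGIC_LEN 4

/* Returns -1 if the header does not match; otherwise returns the
   number of complete records found before the end of the input or
   before the first record that claims more bytes than remain. */
static int
demo_count_records (const unsigned char *buf, size_t len)
{
  size_t off;
  int count = 0;

  if ( (len < DEMO_MAGIC_LEN) ||
       (0 != memcmp (buf, DEMO_MAGIC, DEMO_MAGIC_LEN)) )
    return -1; /* not our format: return immediately */
  off = DEMO_MAGIC_LEN;
  while (off < len)
    {
      size_t n = buf[off]; /* declared payload length, untrusted */

      off++;
      if (n > len - off)
        break; /* malformed or truncated record: stop without reading it */
      off += n;
      count++;
    }
  return count;
}
```

Note that the length check is written as n > len - off rather than off + n > len, so that it cannot overflow for large declared lengths in formats with wider length fields.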

The plugin library must be called libextractor_XXX.so, where XXX denotes the file format of the plugin. The library must export a function EXTRACTOR_XXX_extract_method with the following signature:

void
EXTRACTOR_XXX_extract_method (struct EXTRACTOR_ExtractContext *ec);

‘ec’ contains various information the plugin may need for its execution. Most importantly, it contains functions for reading (“read”) and seeking (“seek”) the input data and for returning extracted data (“proc”). The “config” member can contain additional configuration options. “proc” should be called on each meta data item found. If “proc” returns non-zero, the plugin should abort processing (if possible).
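
The interplay of “read” and “proc” described above can be tried out without libextractor by using a small stand-in context. The sketch below makes several assumptions: struct demo_context is a simplified, hypothetical substitute for struct EXTRACTOR_ExtractContext (it omits “seek” and “config” and uses a shortened “proc” signature), and demo_state, demo_read, demo_proc, demo_extract and demo_run are names invented for illustration.

```c
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical, simplified stand-in for the real extract context. */
struct demo_context
{
  void *cls;
  ssize_t (*read) (void *cls, void **data, size_t size);
  int (*proc) (void *cls, const char *plugin_name,
               const char *data, size_t data_len);
};

/* State shared by the stand-in callbacks. */
struct demo_state
{
  const char *buf;    /* input data */
  size_t size;        /* total size of the input */
  size_t pos;         /* current read position */
  unsigned int items; /* number of items reported so far */
  unsigned int limit; /* proc returns non-zero once this is reached */
};

/* Hands out up to 'size' bytes from the in-memory buffer. */
static ssize_t
demo_read (void *cls, void **data, size_t size)
{
  struct demo_state *s = cls;
  size_t avail = s->size - s->pos;

  if (size > avail)
    size = avail;
  *data = (void *) (s->buf + s->pos);
  s->pos += size;
  return (ssize_t) size;
}

/* Counts items; asks the plugin to stop once the limit is reached. */
static int
demo_proc (void *cls, const char *plugin_name,
           const char *data, size_t data_len)
{
  struct demo_state *s = cls;

  (void) plugin_name;
  (void) data;
  (void) data_len;
  s->items++;
  return (s->items >= s->limit) ? 1 : 0;
}

/* Toy plugin: reports the input in chunks of up to 4 bytes and
   aborts as soon as 'proc' returns non-zero. */
static void
demo_extract (struct demo_context *ec)
{
  void *data;
  ssize_t n;

  while (0 < (n = ec->read (ec->cls, &data, 4)))
    if (0 != ec->proc (ec->cls, "demo", data, (size_t) n))
      return; /* consumer asked us to stop */
}

/* Runs the toy plugin over a buffer; returns the number of items reported. */
static unsigned int
demo_run (const char *buf, size_t size, unsigned int limit)
{
  struct demo_state s = { buf, size, 0, 0, limit };
  struct demo_context ec = { &s, &demo_read, &demo_proc };

  demo_extract (&ec);
  return s.items;
}
```

A harness of this kind makes it easy to check that a plugin stops promptly when “proc” returns non-zero, which is otherwise hard to observe from the command line.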

In order to test new plugins, the extract command can be run with the options “-ni” and “-l XXX”. This will run the plugin in-process (making it easier to debug) and without any of the other plugins.

8.1 Example for a minimal extract method

The following example shows how a plugin can return the MIME type of a file. It recognizes ELF files by their 4-byte header and reports a single meta data item.

     #include <string.h>
     #include <extractor.h>

     void
     EXTRACTOR_mymime_extract_method (struct EXTRACTOR_ExtractContext *ec)
     {
       void *data;
       ssize_t data_size;

       if (-1 == (data_size = ec->read (ec->cls, &data, 4)))
         return; /* read error */
       if (data_size < 4)
         return; /* file too small */
       if (0 != memcmp (data, "\177ELF", 4))
         return; /* not ELF */
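       if (0 != ec->proc (ec->cls,
                          "mymime",                     /* plugin name */
                          EXTRACTOR_METATYPE_MIMETYPE,  /* kind of meta data */
                          EXTRACTOR_METAFORMAT_UTF8,    /* format of the value */
                          "text/plain",                 /* MIME type of the value */
                          "application/x-executable",   /* the value itself */
                          1 + strlen("application/x-executable")))
         return; /* 'proc' asked us to abort */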
       /* more calls to 'proc' here as needed */
     }