GNU libextractor - a simple library for keyword extraction

GNU libextractor

GNU libextractor is a library used to extract meta data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. libextractor is a GNU package. Our official GNU website can be found at http://www.gnu.org/software/libextractor/. libextractor can be downloaded from this site or the GNU mirrors.

The goal is to provide developers of file-sharing networks, browsers or WWW-indexing bots with a universal library for obtaining simple keywords and meta data, which can be matched against queries and shown to users rather than relying only on filenames. libextractor includes a shell command, extract, that, similar to the well-known file command, can extract meta data from a file and print the results to stdout.
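
The extensible design described above can be pictured with a small sketch. This is not libextractor's code or API: the registry, the function names and the two example extractors are hypothetical and exist only to show how one extractor per file type can feed a single keyword list. The ID3v1 layout (a 128-byte block at the end of the file starting with "TAG") and the PNG header layout are the only format details assumed.

```python
from typing import Callable, Iterator, List, Tuple

Keyword = Tuple[str, str]                      # (keyword type, value)
Extractor = Callable[[bytes], Iterator[Keyword]]

EXTRACTORS: List[Extractor] = []               # hypothetical plugin registry


def register(func: Extractor) -> Extractor:
    """Add an extractor to the registry; usable as a decorator."""
    EXTRACTORS.append(func)
    return func


@register
def id3v1_extractor(data: bytes) -> Iterator[Keyword]:
    """Read the fixed-layout ID3v1 tag stored in the last 128 bytes of an MP3."""
    tag = data[-128:]
    if len(tag) != 128 or tag[:3] != b"TAG":
        return
    fields = (("title", 3, 33), ("artist", 33, 63),
              ("album", 63, 93), ("year", 93, 97))
    for name, start, end in fields:
        # Fields are padded with NUL bytes (or spaces) up to their fixed width.
        value = tag[start:end].split(b"\x00")[0].decode("latin-1").strip()
        if value:
            yield name, value


@register
def png_extractor(data: bytes) -> Iterator[Keyword]:
    """Report the image size from the IHDR chunk that follows the PNG signature."""
    if data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        return
    width = int.from_bytes(data[16:20], "big")
    height = int.from_bytes(data[20:24], "big")
    yield "size", f"{width}x{height}"


def extract_keywords(data: bytes) -> List[Keyword]:
    """Run every registered extractor and collect whatever each one recognizes."""
    results: List[Keyword] = []
    for extractor in EXTRACTORS:
        results.extend(extractor(data))
    return results
```

In this sketch, supporting another file type only means registering one more function; extractors that do not recognize the data simply yield nothing.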

Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF.
Also, various additional MIME types are detected.
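
As an illustration of how a MIME type can be detected from file contents, the following sketch checks a file's leading bytes against a few well-known signatures. It is an assumption-laden example, not the detection logic libextractor uses: the table, the function name and the choice of formats are hypothetical, and only a handful of the formats listed above are covered.

```python
from typing import Optional

# Well-known leading byte signatures; a small illustrative subset.
MAGIC_NUMBERS = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"fLaC", "audio/flac"),
    (b"OggS", "application/ogg"),
)


def detect_mime(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the start of data, if any."""
    for signature, mime in MAGIC_NUMBERS:
        if data.startswith(signature):
            return mime
    return None
```

Reading only the first few bytes of a file is enough for this kind of check, which keeps detection cheap even for large files.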

libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Recent News

Sun Jun 13 13:20:37 CEST 2010 | libextractor v0.6.2 released.
This release fixes various minor bugs and in particular adds handling for failures of malloc and the like.
Wed Jan 13 16:22:09 CET 2010 | libextractor v0.6.0 released.
This release breaks backwards compatibility in terms of the APIs. It adds support for out-of-process execution of plugins, improves binary meta data extraction, reduces the footprint (in terms of linking) of the main libextractor library, improves the quality and quantity of the meta data extracted by most of the plugins, drops some of the less-useful plugins (printable, hashing) and finally includes a complete and extensive user and developer manual. The Java binding was also updated to work with the new API; bindings for other languages are still pending.
Sat Oct 24 21:09:18 CEST 2009 | libextractor binding for Mono updated.
You can find the updated binding for Mono in the download section.
Sat Jul 4 11:45:08 CET 2009 | libextractor v0.5.23 released.
This release makes the RPM extractor work with the latest librpm library and links against an external version of libexiv2 (instead of using an internal, outdated version of the code).
Fri Feb 20 11:24:50 MST 2009 | libextractor v0.5.22 released.
This release fixes minor bugs in several plugins and in the build system. We now use libtool 2.x, which helps fix some issues with multiple threads loading and unloading certain plugins concurrently.

Contact

GNU libextractor is developed by Christian Grothoff and Vids Samanta. For questions about libextractor, send email to libextractor@gnu.org.

Please send general FSF & GNU inquiries to <gnu@gnu.org>. There are also other ways to contact the FSF.
Please send broken links and other corrections or suggestions to <libextractor@gnu.org>.

Please see the Translations README for information on coordinating and submitting translations of this article.

Copyright © 2009, 2010 Free Software Foundation, Inc.

Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved.

