Python package

The core of this package is the textract.parsers submodule, which is organized by file extension. For example, the .docx parser is located in textract.parsers.docx. Every parser submodule must define a function called extract that performs the default text extraction for that file type.
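
The extension-based layout described above can be illustrated with a minimal, self-contained sketch. It does not import textract and is not the package's actual implementation: `extract_txt`, `PARSERS`, and `process` are hypothetical names chosen for illustration, and the dictionary stands in for looking up a parser submodule by file extension.

```python
import os


def extract_txt(filename, **kwargs):
    """Hypothetical stand-in for the extract function of a .txt parser."""
    with open(filename, encoding="utf-8") as stream:
        return stream.read()


# Hypothetical registry mapping a file extension to its extract function.
# In the layout described above, each entry would correspond to the
# extract function of one parser submodule.
PARSERS = {
    ".txt": extract_txt,
}


def process(filename, **kwargs):
    """Pick a parser from the file extension and run its extract function."""
    _, ext = os.path.splitext(filename)
    ext = ext.lower()
    if ext not in PARSERS:
        raise ValueError("no parser registered for extension %r" % ext)
    return PARSERS[ext](filename, **kwargs)
```

Under these assumptions, supporting a new file type means adding one extract function and registering it under its extension; the dispatch code itself does not change.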

textract.parsers

textract.parsers.doc

textract.parsers.docx

textract.parsers.pdf

textract.parsers.pptx

textract.parsers.txt