Command line interface

textract

Command line tool for extracting text from any document.

usage: textract [-h] [-o OUTPUT] [-m METHOD] [-v] filename

Positional arguments:
  filename                     file to extract text from

Options:
  -o OUTPUT, --output OUTPUT   write the raw text to this file (default: -)
  -m METHOD, --method METHOD   specify a method of extraction for formats that support it (default: empty)
  -v, --version                show program’s version number and exit
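
To show how the arguments listed above fit together, here is a minimal sketch of an argparse parser with the same shape. It is not textract's actual source: the function name build_parser, the version string, and the file names are placeholders chosen for illustration, and the output option is kept as a plain string rather than an opened file.

```python
import argparse


def build_parser():
    """Build a parser shaped like the usage line above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="textract",
        description="Command line tool for extracting text from any document.",
    )
    # The single positional argument: the document to read.
    parser.add_argument("filename", help="file to extract text from")
    # Default "-" mirrors the default shown in the option listing.
    parser.add_argument("-o", "--output", default="-",
                        help="write the raw text to this file")
    # An empty default means no method was requested explicitly.
    parser.add_argument("-m", "--method", default="",
                        help="method of extraction, for formats that support it")
    # Placeholder version string, not textract's real version number.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.0.0-sketch")
    return parser


if __name__ == "__main__":
    # Hypothetical invocation: textract report.pdf -o report.txt
    args = build_parser().parse_args(["report.pdf", "-o", "report.txt"])
    print(args.filename, args.output, repr(args.method))
```

Options may appear before or after the filename, so `textract -o report.txt report.pdf` parses to the same values in this sketch.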

Note

To make the command line interface as usable as possible, autocompletion of the available textract options is enabled through @kislyuk’s amazing argcomplete package. Follow the argcomplete instructions to enable global autocompletion and you should be all set. For a working example, see the virtual machine provisioning for this project, which configures autocompletion as well.