Doctor

Doctor is a microservice for converting and extracting documents and audio files.
Active
In Oakland, California
Open-source

As part of building CourtListener, we have spent years optimizing our document extraction and audio conversion pipelines. Doctor is the culmination of this work. Its functionality includes:

Extracting text from documents, including WPD, PDF, DOC, DOCX, RTF, and more.
Running optimized OCR extraction on image-based PDFs.
Getting page counts from different document types.
Converting audio files from WMA, OGG, WAV, and others to MP3.
Making a PDF from images.
Creating thumbnails from PDFs.

Doctor is designed to scale while providing fast, high-quality results. It can be scaled horizontally via a multi-worker or orchestrated single-worker model.
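
Because Doctor runs as a microservice, a client would typically send it a file over HTTP and read back the result. The sketch below is illustrative only, not the project's documented client: the base URL, the port, the route in EXTRACT_PATH, and the form field name "file" are all assumptions, and the helper names are hypothetical. Check the project's own documentation before relying on any of them.

```python
import urllib.request
import uuid

# Assumptions (not confirmed by this page): where the service listens,
# which route extracts text, and that it expects a form field named "file".
DOCTOR_URL = "http://localhost:5050"
EXTRACT_PATH = "/extract/doc/text/"


def build_upload_request(base_url, path, filename, payload):
    """Build a multipart/form-data POST carrying a single file field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            'Content-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\n'
        ).encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def extract_text(filename, payload, base_url=DOCTOR_URL, timeout=60):
    """Send a document to the service and return the response body as text."""
    request = build_upload_request(base_url, EXTRACT_PATH, filename, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a running instance, a call might look like extract_text("brief.pdf", open("brief.pdf", "rb").read()). Keeping the request builder separate from the network call makes it easy to point the same client at several workers when scaling horizontally.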

The code in Doctor has processed tens of millions of documents and over 2.5 million minutes of audio.

Parent organization:

Free Law Project

Doctor
Org. type: Non-profit / charity / foundation
Project type: Tool or platform
Last modified: Nov 12, 2025
Added: Nov 19, 2024