Joe Barrow software

Software

Document understanding

CommonForms — automatically detect and insert form fields into PDFs with a family of object detectors. As easy as:

pip install commonforms
commonforms input.pdf fillable.pdf

FormalPDF, a pypdfium2 wrapper for form operations in PDFs.

OmniOCR — use any open-source OCR model behind a unified API.

Datasets

LOCUS-v1 — the Local Ordinance Corpus for the United States, a dataset of 2.2 million municipal laws from across the country.

CommonForms — a dataset of ~500k prepared form pages, used to train object detectors for form field detection.

Other Things

tinyhnsw — the littlest vector database. A full, readable HNSW implementation in Python, for pedagogy.
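For readers new to HNSW, here is a minimal sketch of the core idea: a stack of proximity graphs where each node appears on a random number of layers, a greedy descent through the sparse upper layers, and a beam search of width `ef` on the dense bottom layer. This is not tinyhnsw's code or API. The `HNSW` class, its `add`/`search` methods, and the parameters `m`, `ef_construction`, and `ef` are illustrative assumptions, and neighbor selection uses plain "keep the closest" pruning rather than the heuristic from the original HNSW paper.

```python
import heapq
import math
import random


class HNSW:
    """Minimal HNSW-style index over Euclidean vectors (illustrative sketch)."""

    def __init__(self, m=8, ef_construction=64, seed=0):
        self.m = m                        # links per node on upper layers (m >= 2)
        self.m0 = 2 * m                   # link cap on the bottom layer
        self.ef_construction = ef_construction
        self.level_mult = 1 / math.log(m) # controls how quickly layers thin out
        self.rng = random.Random(seed)
        self.vectors = []
        self.graphs = []                  # graphs[layer][node_id] -> neighbor ids
        self.entry = None                 # entry point on the top layer
        self.entry_level = -1

    def _dist(self, i, q):
        return math.dist(self.vectors[i], q)

    def _search_layer(self, q, entries, ef, layer):
        """Beam search on one layer; returns up to ef (dist, id) pairs, nearest first."""
        graph = self.graphs[layer]
        visited = set(entries)
        candidates = [(self._dist(e, q), e) for e in entries]  # min-heap
        heapq.heapify(candidates)
        best = [(-d, e) for d, e in candidates]                 # max-heap via negation
        heapq.heapify(best)
        while candidates:
            d, c = heapq.heappop(candidates)
            if d > -best[0][0]:           # closest candidate is worse than our worst result
                break
            for n in graph.get(c, []):
                if n in visited:
                    continue
                visited.add(n)
                dn = self._dist(n, q)
                if len(best) < ef or dn < -best[0][0]:
                    heapq.heappush(candidates, (dn, n))
                    heapq.heappush(best, (-dn, n))
                    if len(best) > ef:
                        heapq.heappop(best)   # drop the current farthest result
        return sorted((-d, e) for d, e in best)

    def add(self, vec):
        idx = len(self.vectors)
        self.vectors.append(list(vec))
        # Exponentially distributed level: most nodes live only on layer 0.
        level = int(-math.log(1.0 - self.rng.random()) * self.level_mult)
        while len(self.graphs) <= level:
            self.graphs.append({})
        for layer in range(level + 1):
            self.graphs[layer][idx] = []
        if self.entry is None:
            self.entry, self.entry_level = idx, level
            return idx
        ep = [self.entry]
        # Greedy descent (ef=1) through layers above the new node's top layer.
        for layer in range(self.entry_level, level, -1):
            ep = [self._search_layer(vec, ep, 1, layer)[0][1]]
        # Link the new node on every layer it shares with the existing graph.
        for layer in range(min(level, self.entry_level), -1, -1):
            found = self._search_layer(vec, ep, self.ef_construction, layer)
            cap = self.m0 if layer == 0 else self.m
            neighbors = [n for _, n in found[:self.m]]
            self.graphs[layer][idx] = neighbors
            for n in neighbors:
                links = self.graphs[layer][n]
                links.append(idx)
                if len(links) > cap:
                    # Simple pruning: keep the `cap` closest links to n.
                    links.sort(key=lambda j: self._dist(j, self.vectors[n]))
                    del links[cap:]
            ep = [n for _, n in found]
        if level > self.entry_level:
            self.entry, self.entry_level = idx, level
        return idx

    def search(self, q, k=1, ef=50):
        """Return ids of (approximately) the k nearest stored vectors to q."""
        if self.entry is None:
            return []
        ep = [self.entry]
        for layer in range(self.entry_level, 0, -1):
            ep = [self._search_layer(q, ep, 1, layer)[0][1]]
        found = self._search_layer(q, ep, max(ef, k), 0)
        return [n for _, n in found[:k]]
```

Usage: build with `index = HNSW(m=8)`, call `index.add(point)` for each vector, then `index.search(query, k=5)`. Raising `ef` trades speed for recall; because results are approximate, they should be checked against brute force when tuning.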

LambdaNet — the most popular Haskell deep learning framework, built in 2015.

AllenNLP: The Hard Way — a tutorial series for the (now defunct 😭) AllenNLP library.