Joe Barrow field_notes

Field Notes

Paper Notes: dots.mOCR

last updated 2026-05-12

An extension to dots.OCR. They claim to introduce a new task, Multimodal OCR, that parses ~all parts of a document into a structured representation (SVG for figures, LaTeX for equations, HTML for tables, etc.). Note that formula and table derendering is already a pretty standard part of many OCR models, e.g., OlmOCR 2, which has a custom RL reward for formulas, and LightOnOCR 2, which outputs HTML tables. CharXiv is used to derender chart figures.
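To make the "one representation per element type" idea concrete, here is a minimal sketch. It assumes a page has already been segmented into layout elements in reading order. The category names, the `TARGET_FORMAT` table, and the tag-wrapping output are all hypothetical illustrations, not dots.mOCR's actual output schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical category -> target-format mapping, following the examples above
# (SVG for figures, LaTeX for equations, HTML for tables). NOT the paper's schema.
TARGET_FORMAT = {
    "text": "plain",
    "formula": "latex",
    "table": "html",
    "figure": "svg",
}


@dataclass
class Element:
    category: str  # hypothetical layout label, e.g. "table"
    content: str   # derendered output, already in the target format


def target_format(category: str) -> str:
    """Return the representation an element of this category is parsed into."""
    if category not in TARGET_FORMAT:
        raise ValueError(f"unknown category: {category!r}")
    return TARGET_FORMAT[category]


def render_page(elements: List[Element]) -> str:
    """Join elements in reading order, wrapping each in a tag naming its format."""
    parts = []
    for el in elements:
        fmt = target_format(el.category)
        parts.append(f"<{fmt}>{el.content}</{fmt}>")
    return "\n".join(parts)


if __name__ == "__main__":
    page = [
        Element("text", "Revenue by year"),
        Element("formula", r"E = mc^2"),
        Element("table", "<table><tr><td>1</td></tr></table>"),
    ]
    print(render_page(page))
```

The point of the sketch is that "Multimodal OCR" mostly widens the set of categories that get a non-plain-text target; formulas and tables were already covered by models like OlmOCR 2 and LightOnOCR 2.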

Paper also points to StarVector, a model/dataset for generating SVGs from images.