Working notes — things I'm reading, thinking about, and trying to figure out. Less polished than the long-form posts, sometimes revised in place.
A diffusion-based OCR model.
Mounting a cache directory into the vLLM Docker container can cut start times to ~11s.
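A minimal sketch of what "adding a cache directory" can look like, under stated assumptions: the note doesn't say which cache it means, so this mounts both the Hugging Face cache (downloaded weights) and what is assumed to be vLLM's own cache root (`~/.cache/vllm`, e.g. compile artifacts) from the host. The model name is a placeholder, and the helper `vllm_docker_cmd` is hypothetical; it only builds the `docker run` argument list.

```python
import os
import shlex


def vllm_docker_cmd(model: str, port: int = 8000) -> list[str]:
    """Hypothetical helper: build a `docker run` command that reuses host-side caches."""
    # Downloaded weights live here, so restarts skip the download.
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    # Assumption: vLLM's own cache root; reusing it may skip some warm-up work.
    vllm_cache = os.path.expanduser("~/.cache/vllm")
    return [
        "docker", "run", "--gpus", "all", "--ipc=host",
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "-v", f"{vllm_cache}:/root/.cache/vllm",
        "-p", f"{port}:8000",
        "vllm/vllm-openai:latest",
        "--model", model,
    ]


if __name__ == "__main__":
    # Placeholder model name; substitute whatever you actually serve.
    print(shlex.join(vllm_docker_cmd("Qwen/Qwen3-0.6B")))
```

Mounting host directories (rather than baking files into the image) keeps the caches warm across container restarts and image upgrades.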
A 1B VLM, stitching together Qwen3-0.6B and SigLIP-400M.
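The "1B" is roughly the sum of the two parts. A back-of-envelope sketch, assuming nominal sizes taken from the model names, a SigLIP-400M hidden size of 1152, a Qwen3-0.6B hidden size of 1024, and a hypothetical two-layer MLP projector between them (a common LLaVA-style connector; the note doesn't say which connector is actually used):

```python
def mlp_projector_params(d_vision: int, d_text: int) -> int:
    """Hypothetical connector: Linear(d_vision -> d_text), GELU, Linear(d_text -> d_text), with biases."""
    first = d_vision * d_text + d_text
    second = d_text * d_text + d_text
    return first + second


VISION_PARAMS = 0.4e9  # nominal "400M" from the SigLIP name
LM_PARAMS = 0.6e9      # nominal "0.6B" from the Qwen3 name
projector = mlp_projector_params(d_vision=1152, d_text=1024)

total = VISION_PARAMS + LM_PARAMS + projector
print(projector)              # 2230272
print(f"{total / 1e9:.2f}B")  # 1.00B
```

Under these assumptions the connector adds only about 2M parameters, so the total is dominated by the two pretrained components.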
A benchmark at the current frontier of retrievers: answers are possible to verify with reasoning models but difficult to retrieve.
A 3B OCR model that also derenders charts, tables, and other graphical elements.
Can you perform tasks over documents purely using the document image?
In which DeepSeek argues that document images can be a denser input representation than text tokens, with nearly lossless reconstruction.
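The density argument comes down to one ratio: how many text tokens a page would take versus how many vision tokens encode its image, with fidelity degrading as the ratio grows. A minimal sketch of that arithmetic; the token counts are hypothetical, and `compression_ratio` is an illustrative helper, not part of any DeepSeek release.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (higher = denser image encoding)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 1,000 text tokens rendered and encoded as 100 vision tokens.
print(compression_ratio(1000, 100))  # 10.0
```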
A tiny, two-stage, Donut-based OCR model.
TODO
Evolving rubrics for training a small, powerful deep research model.
TODO
TODO
TODO
TODO
TODO
An update to LightOnOCR, plus a bbox model for figures.
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
Novel RL techniques for training a surprisingly powerful small prover.
TODO
TODO
TODO
TODO
TODO
TODO
TODO