Field Notes

Working notes — things I'm reading, thinking about, and trying to figure out. Less polished than the long-form posts, sometimes revised in place.

Paper Notes: FastContext 2026-06-16
A small sub-agent for finding context, released by Microsoft.
Paper Notes: EAGLE-3 2026-06-15
A lightweight speculative decoding method from NVIDIA that can learn from more data than its predecessor.
NVIDIA MIG 2026-06-11
How do you split a single (datacenter) GPU into multiple instances?
Paper Notes: LocateAnything 2026-06-08
TODO
Training a Custom EAGLE-3 Head 2026-06-08
Working through the Baseten EAGLE-3 tutorial.
Paper Notes: MinerU-Popo 2026-05-30
Paper Notes: Slow Search 2026-05-29
Looking at a 2010 paper with fresh eyes: what does search look like on Mars?
Paper Notes: Nemotron Parse 1.1 2026-05-27
What happens when you pair a huge vision encoder (600M params) with a tiny text decoder (250M params)? Let's find out!
OmniDocBench 2026-05-27
1,000 difficult-to-OCR pages, used as a canonical torture test.
Paper Notes: QED-Nano 2026-05-27
RL techniques for training a surprisingly powerful small prover.
Notes: Surya OCR 2 2026-05-27
A 650M param OCR model that's ~on par with LightOnOCR-2, and outputs boxes as well.
Paper Notes: olmOCR 2 2026-05-27
An RLVR approach for training OCR.
olmOCR Bench 2026-05-27
A 1,400-PDF benchmark that uses unit-test rewards to compute accuracy.
Paper Notes: HunyuanOCR 2026-05-23
A performant 1B parameter OCR model, built on Hunyuan Large 0.5B.
Paper Notes: Kosmos-2.5 2026-05-23
A 1.3B VLM trained on over 350M pages to output text block coordinates and their text.
Paper Notes: Hierarchical Speculative Decoding 2026-05-22
A training-free speedup for document parsing that relies on getting a good speculator.
Paper Notes: dots.mOCR 2026-05-22
A 3B OCR model that also derenders charts, tables, and other graphical elements.
Paper Notes: EAGLE-2 2026-05-21
A lightweight speculative decoding method from NVIDIA that decodes multiple drafts in parallel.
Paper Notes: DODO Diffusion OCR 2026-05-18
A diffusion-based OCR model.
Paper Notes: MinerU-Diffusion 2026-05-18
Speeding up vLLM Start Times 2026-05-18
Adding a cache directory for vLLM docker can reduce start times to ~11s.
Paper Notes: LightOnOCR 2026-05-12
A 1B VLM, stitching together Qwen3-0.6B and SigLIP-400M.
Paper Notes: OBLIQ-Bench 2026-05-12
A benchmark for the current frontier of retrievers: possible to verify with reasoning models, difficult to retrieve.
Paper Notes: DONUT 2026-05-11
Can you perform tasks over documents purely using the document image?
Paper Notes: DeepSeek-OCR 2026-05-11
In which DeepSeek argues that document images can be a denser, lossless input representation.
Paper Notes: Dolphin OCR 2026-05-11
A tiny, two-stage, DONUT-based OCR model.
Paper Notes: Chandra 2026-05-10
TODO
Paper Notes: DR Tulu 2026-05-10
Evolving rubrics for training a small, powerful deep research model.
Paper Notes: Fox 2026-05-10
TODO
Paper Notes: GOT 2.0 2026-05-10
TODO
Paper Notes: Granite Docling 2026-05-10
TODO
Paper Notes: LightOnOCR 2 2026-05-10
An update to LightOnOCR, plus a bbox model for figures.
Paper Notes: MinerU2.5 2026-05-10
TODO
Paper Notes: MonkeyOCR-v1.5 2026-05-10
TODO
Paper Notes: MonkeyOCR 2026-05-10
TODO
Paper Notes: NanoNets OCR 2026-05-10
TODO
Paper Notes: NanoNets OCR2 2026-05-10
TODO
Paper Notes: Nougat 2026-05-10
TODO
Paper Notes: OCRFlux 2026-05-10
TODO
Paper Notes: PaddleOCR-VL 2026-05-10
TODO
Paper Notes: Pix2Struct 2026-05-10
TODO
Paper Notes: Qwen2-VL 2026-05-10
TODO
Paper Notes: Qwen2.5-VL 2026-05-10
TODO
Paper Notes: Qwen3-VL 2026-05-10
TODO
Paper Notes: RolmOCR 2026-05-10
A finetune of Qwen2.5-VL-7B for OCR, from Reducto.
Paper Notes: TrOCR 2026-05-10
TODO
Paper Notes: Vary 2026-05-10
TODO
Paper Notes: dots.OCR 2026-05-10
TODO
Paper Notes: olmOCR 2026-05-10
A truly open OCR model (dataset, model, code) based on Qwen2-VL-7B.