Joe Barrow field_notes

Field Notes

Paper Notes: Nougat

last updated 2026-05-10

Dataset: focused on scientific documents and textbooks. Pages are rendered at 96 dpi and fed into a Swin Transformer at (896, 672) resolution, an aspect ratio between US Letter and A4; pages are resized and padded to fit. TeX is compiled to HTML via LaTeXML, then converted to Markdown. 1,748,201 arXiv articles, plus PMC (PubMed Central) and IDL articles (IDL = OCR only).
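Sketch of the resize+pad geometry: a minimal, dependency-free illustration, not the paper's preprocessing code. It assumes (896, 672) means (height, width), that a page is scaled to fit the canvas without cropping, and that the remainder is padding. The names `page_pixels` and `fit_to_canvas` are made up for this note.

```python
from dataclasses import dataclass

# Assumption: the (896, 672) in the notes is (height, width).
TARGET_H, TARGET_W = 896, 672
DPI = 96


@dataclass
class Fit:
    new_w: int  # page width after scaling
    new_h: int  # page height after scaling
    pad_w: int  # total horizontal padding needed
    pad_h: int  # total vertical padding needed


def page_pixels(width_in: float, height_in: float, dpi: int = DPI) -> tuple[int, int]:
    """Pixel size (width, height) of a page rendered at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def fit_to_canvas(w: int, h: int, target_w: int = TARGET_W, target_h: int = TARGET_H) -> Fit:
    """Scale (w, h) to fit inside the target canvas, preserving aspect ratio."""
    scale = min(target_w / w, target_h / h)
    new_w = min(target_w, round(w * scale))
    new_h = min(target_h, round(h * scale))
    return Fit(new_w, new_h, target_w - new_w, target_h - new_h)


if __name__ == "__main__":
    # Height/width ratios: US Letter ~1.294, canvas ~1.333, A4 ~1.414.
    print(11 / 8.5, TARGET_H / TARGET_W, 297 / 210)
    # US Letter is wider than the canvas ratio, so it gets vertical padding.
    print(fit_to_canvas(*page_pixels(8.5, 11)))
    # A4 is taller than the canvas ratio, so it gets horizontal padding.
    print(fit_to_canvas(*page_pixels(210 / 25.4, 297 / 25.4)))
```

Because the canvas ratio sits between the two paper sizes, neither format needs much padding: a Letter page is padded only vertically, an A4 page only horizontally.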

Dataset size: 8.2M pages total; 7.5M from arXiv, 536k from PMC, and 447k from IDL

Augmentations: Bitmap, Erosion, Dilation, Affine transformation, Shift scale rotate, Grid distortion, Elastic transform, Random brightness contrast, Image compression, Gaussian noise, Gaussian blur; all transformations performed via albumentations
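To make the first three augmentations concrete, here is a toy, pure-Python sketch of what erosion, dilation, and a bitmap threshold do to a grayscale page. The real pipeline goes through albumentations per the note above; this is not that code. The 3x3 window, the threshold of 128, and the function names are all assumptions, and "bitmap" is taken here to mean thresholding to pure black and white.

```python
def _morph(img, pick):
    """Apply `pick` (min or max) over each pixel's 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def erode(img):
    """Local minimum: dark strokes on a light page get thicker."""
    return _morph(img, min)


def dilate(img):
    """Local maximum: dark strokes on a light page get thinner."""
    return _morph(img, max)


def bitmap(img, threshold=128):
    """Assumed meaning of 'Bitmap': snap every pixel to black (0) or white (255)."""
    return [[0 if v < threshold else 255 for v in row] for row in img]


if __name__ == "__main__":
    # A white 5x5 "page" with one dark pixel in the middle.
    page = [[255] * 5 for _ in range(5)]
    page[2][2] = 0
    for name, fn in (("erode", erode), ("dilate", dilate)):
        dark = sum(v == 0 for row in fn(page) for v in row)
        print(name, "dark pixels:", dark)
```

On the toy page, erosion grows the single dark pixel into a 3x3 blob and dilation erases it, which roughly mimics heavier or fainter print in scanned documents.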

Model sizes: 350M parameters

Tasks: image to markdown

Note: trained for 3 epochs, so roughly 24 million pages seen in total (3 × 8.2M ≈ 24.6M)