Paper Notes: NanoNets OCR2

last updated 2026-05-10

Outputs a caption for the <img> tag: it uses the image's own caption if one is available; otherwise the model generates one. Handles checkboxes by converting them to [ ] and [x]. Supports complex tables using <table>, <thead>, <tbody>, <tr>, <th>, and <td> tags. Converts flowcharts to text (Graphviz, it looks like?). The model is trained to support multilingual docs and VQA (visual question answering).
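A rough sketch of how output in this shape could be post-processed, to make the conventions above concrete. Everything here is an assumption for illustration: the SAMPLE string is made up (not real model output), the caption is assumed to sit between <img> and </img>, checkboxes are assumed to start a line, and the helper names (image_captions, checkboxes, table_rows) are hypothetical. Only the Python standard library is used.

```python
import re
from html.parser import HTMLParser

# Made-up example text in the shape the notes describe (NOT real model output).
SAMPLE = """<img>Bar chart of quarterly revenue</img>
[x] I agree to the terms
[ ] Subscribe to newsletter
<table><thead><tr><th>Item</th><th>Qty</th></tr></thead><tbody><tr><td>Pen</td><td>2</td></tr></tbody></table>"""


def image_captions(text):
    """Return captions, assuming they are wrapped as <img>caption</img>."""
    return [c.strip() for c in re.findall(r"<img>(.*?)</img>", text, flags=re.S)]


def checkboxes(text):
    """Return (checked, label) pairs for lines starting with [ ] or [x]."""
    pattern = re.compile(r"^\[( |x)\]\s*(.*)$", flags=re.M)
    return [(m.group(1) == "x", m.group(2).strip()) for m in pattern.finditer(text)]


class _TableRows(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(text):
    """Return every table row in the text as a list of cell strings."""
    parser = _TableRows()
    parser.feed(text)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    print(image_captions(SAMPLE))
    print(checkboxes(SAMPLE))
    print(table_rows(SAMPLE))
```

The table helper flattens header and body rows into one list; splitting <thead> from <tbody> would need one more state flag.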

Trained on 3 million pages, following the pattern of NanoNets OCR.