Paper Notes: Kosmos-2.5
last updated 2026-05-10
Dataset: 357.4MM document images, split into OCR and Markup:
- OCR: IIT-CDIP (27.6MM); PDFs crawled from the web + SEC files (187.4MM); arXiv papers (20.9MM); web pages (100.5MM); PowerPoints, posters, and MARIO-10M (6.2MM); synthetic handwritten data (0.2MM); and aggregated math image benchmarks (0.6MM)
- Markup: 1.1MM docx files from the SEC, 3.7MM academic papers, 2.8MM README.md files taken from GitHub, and 6.3MM web pages
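Quick sanity check on the split: a minimal sketch that sums the per-source counts listed above and compares them with the 357.4MM headline. The dictionary keys and variable names are my own labels, not names from the paper.

```python
# Per-source counts in millions of images, copied from the two bullets above.
# Key names are informal labels for these notes, not identifiers from the paper.
ocr_sources = {
    "IIT-CDIP": 27.6,
    "web PDFs + SEC files": 187.4,
    "arXiv papers": 20.9,
    "web pages": 100.5,
    "PowerPoints, posters, MARIO-10M": 6.2,
    "synthetic handwritten": 0.2,
    "math image benchmarks": 0.6,
}
markup_sources = {
    "SEC docx": 1.1,
    "academic papers": 3.7,
    "GitHub READMEs": 2.8,
    "web pages": 6.3,
}
REPORTED_TOTAL = 357.4  # headline figure from the notes

# Round to one decimal to avoid floating-point noise in the sums.
ocr_total = round(sum(ocr_sources.values()), 1)
markup_total = round(sum(markup_sources.values()), 1)
grand_total = round(ocr_total + markup_total, 1)
markup_share = markup_total / grand_total
gap = round(REPORTED_TOTAL - grand_total, 1)

print(f"OCR: {ocr_total}MM, Markup: {markup_total}MM, sum: {grand_total}MM")
print(f"Markup share: {markup_share:.1%}, gap vs reported: {gap}MM")
```

The listed counts sum to 357.3MM (343.4MM OCR + 13.9MM Markup), 0.1MM short of the headline, which is presumably rounding in the per-source figures. Markup is only about 3.9% of the data.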
Model size: 1.3B params
Tasks: OCR (image to text lines with bboxes), image to markdown (<md> prompt), and DocVQA