olmOCR
BTR ACTIVE BREAKOUT
Toolkit for linearizing PDFs for LLM datasets/training
stars 18k
last activity 3mo ago
open issues 55
language Python
license Apache-2.0
latest release v0.4.27
momentum (per month since first covered): +642/mo (+8%/mo) · +10k total since PR#3
metrics as of today
star history
Exact curve on star-history.com
- PR#3: 8k★ (2025-03-05)
- now: 18k★ (+10k since first covered)
The curve is sampled from GitHub's star history. The dashed stretch is before we first covered the project; the solid line is since then. Figures at coverage are the numbers we printed at the time (approximate); the current count is live.
covered in
- Toolkit for linearizing PDFs for LLM datasets/training