DeepSeek-OCR: Context Compression Through Optical 2D Mapping

Posted by Federico Ulfo on November 6, 2025

DeepSeek AI has released DeepSeek-OCR, an approach to compressing long contexts via optical 2D mapping: the text is represented as an image and encoded into a small number of vision tokens. The system shows that vision-based compression can handle text-heavy documents efficiently, which could change how large language models (LLMs) process long textual inputs.

The DeepSeek-OCR system consists of two components: DeepEncoder as the encoder and DeepSeek3B-MoE-A570M as the decoder. Together, they reach 97% OCR precision when the compression ratio is below 10× (that is, fewer than 10 text tokens per vision token). Even at an aggressive 20× compression ratio, the system retains approximately 60% accuracy.
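To make the ratio concrete, here is a minimal Python sketch of the arithmetic. The function names, the 1,000-token page, and the vision-token budgets are hypothetical, chosen only for illustration; the 10× and 20× thresholds and the approximate 97% and 60% figures are the ones quoted above, and the sketch says nothing about accuracy between those two points.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token."""
    if text_tokens < 0 or vision_tokens <= 0:
        raise ValueError("text_tokens must be >= 0 and vision_tokens must be > 0")
    return text_tokens / vision_tokens


def reported_regime(ratio: float) -> str:
    """Map a ratio onto the two data points quoted in this post (hypothetical helper)."""
    if ratio < 10:
        return "below 10x: about 97% OCR precision reported"
    if ratio <= 20:
        return "10x to 20x: accuracy degrades, about 60% reported at 20x"
    return "above 20x: outside the figures quoted here"


# Hypothetical page containing 1,000 text tokens, encoded with different
# vision-token budgets (illustrative numbers, not taken from the paper).
page_text_tokens = 1000
for vision_tokens in (128, 64, 50):
    ratio = compression_ratio(page_text_tokens, vision_tokens)
    print(f"{vision_tokens} vision tokens -> {ratio:.1f}x, {reported_regime(ratio)}")
```

With these illustrative numbers, a budget of 128 vision tokens stays under the 10× threshold, while 50 vision tokens lands exactly on the 20× point.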

Karpathy asks whether all LLM input should actually be images. The advantages he lists are:

more information compression (see paper) => shorter context windows, more efficiency
significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images
input can now be processed with bidirectional attention easily and by default, instead of autoregressive attention; bidirectional attention is a lot more powerful.
the tokenizer must go. It imports all the ugliness of Unicode and byte encodings, along with a lot of historical baggage and security/jailbreak risks.
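Two of these points can be illustrated with a short Python sketch. It is not taken from DeepSeek-OCR or from Karpathy's post, and the helper names are hypothetical. The first part contrasts a causal (autoregressive) attention mask with a bidirectional one. The second shows one small example of the Unicode and byte-encoding issues a tokenizer inherits: the same visible character can arrive as different byte sequences.

```python
import unicodedata


def causal_mask(n: int) -> list[list[int]]:
    """Autoregressive mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[int]]:
    """Bidirectional mask: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


def utf8_bytes(s: str) -> list[int]:
    """UTF-8 byte values of a string, as a byte-level tokenizer would see them."""
    return list(s.encode("utf-8"))


for row in causal_mask(4):
    print(row)          # lower-triangular pattern
for row in bidirectional_mask(4):
    print(row)          # all ones

# The same visible "é" in two Unicode forms.
composed = "\u00e9"     # single precomposed code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent
print(utf8_bytes(composed))    # 2 bytes
print(utf8_bytes(decomposed))  # 3 bytes
# NFC normalization maps the decomposed form back to the composed one.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Rendered as pixels, the two forms of "é" look identical, which is the kind of ambiguity an image-based input sidesteps and a byte-level or subword tokenizer has to deal with explicitly.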

Links