Do LLMs Benefit From their own Words?

March 3, 2026Posted by Federico Ulfo

MIT researchers found that LLMs often get worse in long conversations because of "context pollution": models treat their own previous responses as factual truth, causing errors, hallucinations, and stylistic quirks to snowball and reinforce themselves.Key findings from real user chats:For many open models (e.g. Qwen3-4B, DeepSeek-R1-8B), removing all prior AI responses from context gives the same or better quality. This slashes cumulative context length by up to 10× — huge efficiency win. ~36% of follow-up prompts are fully self-contained; most turns don't actually need the model's earlier output.

Stronger models like GPT-5.2 still benefit from full history, so the ideal isn't "always strip" — it's selective: use a classifier to decide turn-by-turn whether keeping assistant history helps or hurts.Bottom line: We've been blindly stuffing AI's own words into context windows for years, but often they're the least helpful (and sometimes most harmful) part. The paper flips the default assumption — minimum necessary context beats maximum context

Sources: Paper, Tweet