I Read Through 10 Claude Code Token Optimizers. Here's What I Found.
If you've been using Claude Code for more than a week, you've noticed the problem. Your context window fills up with garbage: verbose git status output, 200 lines of passing test results, ANSI color codes nobody asked for. Sessions compact too early. Rate limits hit too fast. And now there are ten different open source tools all claiming 60-99% token reduction.
I went through all of them. Read the READMEs, read the Reddit threads, traced the actual data flow. Most of the marketing copy makes these sound interchangeable. They're not. They operate at completely different layers, and if you don't understand the layers first, you'll end up stacking three tools that all do the same thing.
The Five Layers
Layer 1: Shell Output Compression. Intercepts bash commands, compresses the output before it enters context.
Layer 2: Output Token Reduction. Changes how the LLM responds to you. Shorter prose, same technical content.
Layer 3: Context Window Sandboxing. Prevents raw data from entering context entirely. Stores it externally, lets the LLM query on demand.
Layer 4: Message-Level Compression. A proxy that compresses the entire conversation before it hits the API.
Layer 5: Codebase Context Targeting. Helps the LLM find the right files instead of reading everything.
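To make the separation concrete, here's a toy sketch of where layers 1, 3, 4, and 5 would intercept a single tool call. Every name in it is an illustrative stand-in, not any real tool's API.

    # Illustrative stand-ins only; no real tool exposes these functions.
    EXTERNAL_STORE: dict[str, str] = {}

    def layer1_compress(output: str) -> str:
        """Layer 1: shrink shell output before it enters the context window."""
        lines = output.splitlines()
        if len(lines) > 20:
            lines = lines[:20] + [f"... ({len(lines) - 20} more lines)"]
        return "\n".join(lines)

    def layer3_sandbox(output: str) -> str:
        """Layer 3: keep the data outside the context window, return a handle."""
        EXTERNAL_STORE["run-1"] = output
        return "stored as run-1; query it on demand"

    def layer4_compress(messages: list[str]) -> list[str]:
        """Layer 4: a proxy trims the whole conversation before the API call."""
        return messages[-10:]

    def layer5_target(question: str) -> list[str]:
        """Layer 5: return the relevant files instead of reading everything."""
        return ["src/lib.rs"]  # placeholder for an actual relevance ranking

    # Layer 2 is the odd one out: it's a prompt, not code. It changes how the
    # model writes its reply, not what goes into the request.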
The Two Worth Understanding
RTK (Rust Token Killer) — Layer 1
RTK installs a PreToolUse hook in Claude Code. When Claude runs git status, the hook silently rewrites it to rtk git status, applies pattern-based filtering (strip ANSI codes, deduplicate repeated lines, truncate boilerplate), and returns a compressed version. Claude never knows the rewrite happened. A full cargo test run that dumps hundreds of lines gets compressed to just failures and a summary. git status loses the hints and formatting. ls collapses into a compact tree.
It has a "tee" feature: when a command fails, it saves the full unfiltered output to disk and points the agent to the log file. This matters because the obvious failure mode of any compression tool is stripping a stack trace line the agent actually needs.
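I haven't read RTK's Rust source, so treat this as a sketch of the pattern rather than the implementation, but the filter-plus-tee behavior looks roughly like this in Python:

    import re
    import subprocess
    import tempfile

    ANSI = re.compile(r"\x1b\[[0-9;]*m")

    def filtered_run(cmd: list[str], max_lines: int = 40) -> str:
        """RTK-style filtering sketch; not RTK's actual logic."""
        proc = subprocess.run(cmd, capture_output=True, text=True)
        raw = proc.stdout + proc.stderr

        lines: list[str] = []
        prev = None
        for line in ANSI.sub("", raw).splitlines():  # strip ANSI color codes
            if line != prev:                         # drop consecutive duplicates
                lines.append(line)
            prev = line

        if len(lines) > max_lines:                   # truncate anything still too long
            lines = lines[:max_lines] + [f"... ({len(lines) - max_lines} lines elided)"]

        if proc.returncode != 0:
            # the "tee" part: on failure, keep the full output on disk
            log = tempfile.NamedTemporaryFile("w", suffix=".log", delete=False)
            log.write(raw)
            log.close()
            lines.append(f"[command failed; full output: {log.name}]")

        return "\n".join(lines)

The part RTK actually automates is the wiring: its hook rewrites the Bash tool's command line so every shell call passes through a filter like this without Claude knowing.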
Single Rust binary, zero dependencies, less than 10ms overhead. You run rtk gain to see your actual savings.
What the Reddit thread says. The biggest criticism is the "strangeness tax": there's research showing that compressing tool output into unfamiliar formats can cause LLMs to spend more tokens trying to understand the new structure. The RTK author's counter is fair (CLI output isn't a rigid format like JSON, and Claude's built-in tools bypass the hook anyway), but at least one user hit the failure mode in practice. Claude started complaining about cryptic error messages because RTK had filtered out the diagnostic output. They uninstalled it.
The savings numbers also need context. Claude Code sometimes truncates large outputs on its own, and RTK's dashboard compares against the raw unfiltered output, not against what Claude would actually have consumed. The real savings are meaningful, but the headline number overstates them.
Simplest alternative from the thread: just put minimal-verbosity flags in a Makefile (--verbosity minimal, --quiet, --short, whatever each tool offers). Less elegant, zero dependencies, gets you most of the way there.
Caveman — Layer 2
Caveman is a Claude Code skill that instructs the model to respond in compressed prose: drop articles, filler, pleasantries. Instead of "I've gone ahead and updated the function to handle the empty-list case," you get "Updated function. Handles empty list." It only affects output tokens; thinking/reasoning tokens are untouched, so the model reasons at full fidelity and just writes shorter. It ships six intensity levels, compression to keep subagent output leaner, and a caveman-compress tool that rewrites your CLAUDE.md into compressed format.
What the Reddit thread says. The math correction everyone arrives at: output tokens are a small fraction of your total session, because most tokens go to input and extended thinking. (Illustrative numbers: in a session that burns 500k tokens with only 25k of output, halving the output prose saves about 2.5% of the total.) So Caveman's compression translates to much less total savings than the headline. The real value people report is speed (shorter responses stream faster) and focus (the model stops over-explaining things you already understand).
Community consensus is that "caveman lite" is the right mode. Someone described it as "the perfect straitjacket for Sonnet/Opus." Full and ultra are too aggressive.
Known weakness: drift. Multiple people report it reverting to verbose output after a few turns. And the biggest blind spot: thinking tokens, which can be the majority of your usage in complex sessions, are completely untouched.
The Rest, Briefly
Lean-ctx (Layer 1): RTK expanded into a much larger system. Shell hook plus MCP server with dozens of tools, file read modes, cached re-reads, code graph, cross-session memory. Superset of RTK, but ongoing growing pains with hook conflicts.
Distill (Layer 1): Same idea as RTK, less mature. Pass.
Context-mode (Layer 3): Sandboxes tool output into a local SQLite database. Agent searches the index instead of reading full files. Also tracks session state across compactions. Tradeoff: "search an index" can miss things "read the file" wouldn't.
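Not Context-mode's actual schema, just the shape of the layer-3 idea, assuming an SQLite build with FTS5 (Python's bundled one usually has it):

    import sqlite3

    con = sqlite3.connect("sandbox.db")
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS outputs USING fts5(tool, body)")

    def sandbox(tool: str, body: str) -> str:
        """Full output goes into the store; only this short stub enters context."""
        con.execute("INSERT INTO outputs VALUES (?, ?)", (tool, body))
        con.commit()
        return f"[{tool}: {len(body)} chars stored; search instead of reading]"

    def search(term: str, limit: int = 5) -> list[str]:
        """What the agent calls on demand; only the matches enter context."""
        rows = con.execute(
            "SELECT snippet(outputs, 1, '', '', ' ... ', 12) "
            "FROM outputs WHERE outputs MATCH ? LIMIT ?",
            (term, limit),
        )
        return [r[0] for r in rows]

The tradeoff falls straight out of the sketch: if the agent never searches for the right term, the stored data may as well not exist.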
Headroom (Layer 4): Local proxy compressing the full message array. Bundles RTK, so using both is redundant. Optional ML compression. No meaningful accuracy loss in benchmarks.
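Again a toy rather than Headroom's pipeline: a layer-4 proxy sees the whole message array, so it can shrink stale tool results while leaving recent turns intact.

    def compress_messages(messages: list[dict], keep_recent: int = 6) -> list[dict]:
        """Toy layer-4 pass: truncate tool results older than the last few turns."""
        out = []
        for i, msg in enumerate(messages):
            stale = i < len(messages) - keep_recent
            if stale and msg.get("role") == "tool" and len(msg.get("content", "")) > 300:
                msg = {**msg, "content": msg["content"][:300] + " ...[truncated by proxy]"}
            out.append(msg)
        return out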
Repomix (Layer 5): Packs your whole repo into one file for chat-based LLMs. Claude Code already has filesystem access. Category mismatch.
SigMap (Layer 5): Signature extraction, ranks files by query relevance. Useful for navigating unfamiliar codebases, not for ones you already know.
Graphify (Layer 5): Knowledge graph via tree-sitter. More ambitious than SigMap, handles non-code files. But a user reported it increased token usage because the agent loads the graph report on every question.
Codebase-Memory-MCP (Layer 5): Persistent knowledge graph. Similar to Graphify. Value scales with codebase size.
Overlaps
RTK + Caveman works: different layers, completely complementary. Headroom bundles RTK, so don't run both. Stacking multiple hook-based tools causes conflicts; one Reddit user had to remove two other hooks before RTK worked. Headroom and Context-mode solve the same problem in different ways; pick one.
What I'd Try First
Start with RTK. It targets the biggest source of wasted tokens with the lowest risk and the least setup: one binary, rtk init -g, restart Claude Code.
Run rtk gain after a week. If context is still filling up too fast, you'll know where the remaining waste is. Adding multiple optimization layers before you know which one solves your problem is premature abstraction applied to tooling.