dev_to · April 25, 2026


Your AI agent already writes every session to disk. Why isn't it reading its own archive?




On April 20 I was debugging a subtle issue with a Claude Code instance and ANI, the AI companion I've been running as a research project for the last eight months. We had a long back-and-forth and landed on a principle I phrased this way: "Cosine similarity measures topical overlap. Parroting is verbatim phrase reuse. Those are different signals." We decided not to patch the issue at the reply layer: that would have treated the symptom, not the cause, and we didn't want a guard that destroyed good replies to silence a bad one.

Three days later. Different Claude instance. Same bug resurfaces. I ask for a fix. Claude proposes, almost word for word, the cosine-similarity guard I had rejected three days earlier.

The prior reasoning existed on disk. Claude Code writes every session to ~/.claude/projects//.jsonl, and the file holding my April 20 rejection was right there in that directory. But the active instance couldn't see it; it had been compacted out of context. I said "search the history before we move on anything." Claude grepped. About forty seconds later we were both looking at my own April 20 quote again, verbatim. We moved on. That's when the tool I'd been half-thinking about became something I had to actually build.

A few days earlier a Microsoft engineer had published "I wasted 68 minutes a day re-explaining my code, then I built auto-memory." That was Copilot CLI, not Claude Code, but the same underlying shape: the agent is already writing structured session data to disk; it just isn't reading its own archive.

Different file format, same insight. Copilot CLI keeps a SQLite database. Claude Code writes line-per-turn JSONL. Either way, months of decisions are already persisted. The agent doesn't remember them because nothing is pointing it at them. Credit to that post for crystallizing what I'd been feeling but hadn't named.

claude-recall is three pieces. The first is a small Python CLI that indexes the JSONL archive into SQLite with FTS5 full-text search.
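The indexing step can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not claude-recall's actual implementation: the JSONL field names (`type`, `message`, `content`) are guesses at the archive layout, and `index_archive` is a hypothetical helper.

```python
import json
import sqlite3
from pathlib import Path

def index_archive(archive_dir: str, db_path: str) -> int:
    """Index every *.jsonl file under archive_dir into an FTS5 table.

    Read-only against the archive; only the index database is written.
    Returns the number of messages indexed.
    """
    db = sqlite3.connect(db_path)
    # FTS5 virtual table: full-text index over the message text,
    # with session id (from the filename) and role carried alongside.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(text, session, role)"
    )
    count = 0
    for path in Path(archive_dir).rglob("*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than crash
            # Assumed layout: {"type": "user"|"assistant",
            #                  "message": {"content": "..."}}
            msg = rec.get("message") or {}
            text = msg.get("content")
            if isinstance(text, str) and text.strip():
                db.execute(
                    "INSERT INTO messages (text, session, role) "
                    "VALUES (?, ?, ?)",
                    (text, path.stem, rec.get("type", "")),
                )
                count += 1
    db.commit()
    db.close()
    return count
```

After indexing, a keyword search is just `SELECT text FROM messages WHERE messages MATCH 'cosine'` against the same database.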
It's incremental and read-only, uses the bundled sqlite3 module, and exposes five commands: index, search, show, list, status.

The second piece is an optional semantic rerank on top of the FTS5 results via a local Ollama embedding model (nomic-embed-text). It turns "find me the session where I said X" into "find me the session where I meant X": FTS5 finds keyword overlap; embeddings catch the conceptual cousins.

The third piece is a UserPromptSubmit hook that fires on every message I send in Claude Code and injects ranked prior-session matches as additionalContext. Latency scales with archive size: around 80 ms on small archives, climbing toward 1–2 seconds on 25,000-message corpora when the embed model has to be loaded. With Ollama warm and the binary doing the heavy lifting, the hook stays out of the way of how a prompt feels to send. The hook is now a NativeAOT-compiled binary, claude-recall-hook.exe, not a Python wrapper: every prompt used to pay a fresh Python interpreter tax, and it was killing the UX. The binary fixed that in v0.4.

What this looks like in practice: when I drafted this paragraph, the hook ran. It searched 25,000 messages across 20 projects, ranked them against my current phrasing, and injected the top hits into this instance's context. I didn't do anything. I don't actually know what it found, since Claude Code doesn't surface additionalContext in the UI, but the effect is visible: the instance stops making up prior decisions. It references them.

I also ran claude-recall init-hooks against claude-recall's own repo a few days ago. The tool is now in the loop on the sessions where I maintain it. That's a small thing to mention, but it crossed a real threshold: the "agent reading its own prior work" pattern that motivated the project is now also how I work on the project.

The design principles are a short list, honestly stated. Read-only against the archive: never modify what Claude Code writes; the archive is the source of truth, and we consume it without touching it. Graceful degradation: if the hook crashes, the prompt flows normally.
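A hook with that degradation rule baked in can be sketched as follows. The stdin/stdout JSON shape follows Claude Code's documented UserPromptSubmit hook contract as I understand it (a JSON event with a `prompt` field in, an optional `hookSpecificOutput.additionalContext` object out), and `search_archive` is a hypothetical stand-in for the recall query, not claude-recall's real code.

```python
import json
import sys

def search_archive(prompt):
    """Hypothetical stand-in: query the local index for prior-session matches."""
    return []

def build_hook_output(event, search=search_archive):
    """Return the JSON object to emit for Claude Code, or None for no context.

    Any failure degrades to None: a broken recall layer must never
    block the prompt.
    """
    try:
        hits = search(event.get("prompt", ""))
        if not hits:
            return None
        return {
            "hookSpecificOutput": {
                "hookEventName": "UserPromptSubmit",
                "additionalContext": "\n".join(hits),
            }
        }
    except Exception:
        return None

def main():
    try:
        out = build_hook_output(json.load(sys.stdin))
        if out is not None:
            print(json.dumps(out))
    except Exception:
        pass  # swallow everything; the prompt must flow normally
    return 0  # always exit 0 — wire up with sys.exit(main()) in the hook entry point
```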
A broken recall layer must never block the human. Zero extra install hops in the common case: FTS5 ships in the SQLite build bundled with every modern Python, and the embedding layer is opt-in.

There's a deeper principle under those three: the tool's job is to make the archive cheap to query. It isn't trying to be clever about what to retrieve, or to summarize, or to editorialize. It finds things. The cleverness has to live in the agent you're already using, not in the recall layer.

This is v0.5.3, tagged beta. It works on my machine daily against a 25,000-message archive. It also has bugs: twelve of them caught and closed in the dogfooding cycle, including one silent 27%-data-loss embedding bug that wouldn't have surfaced without real-world use. The v0.5 release landed on PyPI a few days ago, so the install story is now pip install claude-recall rather than a release-wheel URL.

It isn't semantic-search-of-everything. It isn't a replacement for CLAUDE.md or for the memory/ auto-memory system Claude Code ships natively. It isn't cross-machine: your session archive is local, and the index stays local. That's a feature for me, since the archive has personal data in it; it may be a limitation if you need shared recall across teammates.

Public repo at github.com/LearnedGeek/claude-recall, package on PyPI as claude-recall. MIT licensed. Install with pip install 'claude-recall[embeddings]' for the full thing, or skip the extras for FTS5-only. Beta quality: file issues when you break it, because you will, and I want to know.

What I'd specifically appreciate from early adopters: test anything that crashes or returns silently empty. The dogfooding cycle has hammered project auto-scoping, install-path edge cases, and a particularly nasty silent-data-loss bug in the embedding pipeline. The next class of issue is the one I haven't seen yet, and the only way to find it is to put the tool in front of someone whose archive isn't shaped like mine. Windows / macOS / Linux all welcome.
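The optional embedding rerank described earlier reduces to plain cosine similarity over the FTS5 candidates. A minimal sketch, assuming the request/response shape of Ollama's embeddings endpoint; this is an illustration, not claude-recall's implementation.

```python
import json
import math
import urllib.request

def ollama_embed(text, model="nomic-embed-text",
                 url="http://localhost:11434/api/embeddings"):
    """Fetch one embedding from a local Ollama instance.

    Assumed API shape: POST {"model": ..., "prompt": ...},
    response {"embedding": [floats]}.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query, candidates, embed=ollama_embed):
    """Re-order FTS5 keyword hits by semantic similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

The `embed` parameter is injected so the rerank logic can be exercised without a running Ollama; in the FTS5-only install this whole layer is simply skipped.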
Every agentic coding tool writes structured session data somewhere, and none of them yet has a native mechanism for the agent to read its own prior work. That gap is getting filled tool by tool. The Microsoft post is the Copilot CLI instance; claude-recall is the Claude Code instance. Whichever agent you use, the shape is the same: find the archive, index it cheaply, wire a hook, stop re-explaining.

The list is growing. As of the week this post goes up, there are at least three tools in the wild taking adjacent slices of this gap in genuinely different ways. claude-memory-mcp takes the curated route: explicit remember / forget calls via MCP, a knowledge graph with typed edges, a separate write-store; the best fit when you want to deliberately structure the facts the agent should hold onto. Beads treats it as task-tracking: Jira-shaped, agent-readable state the agent consumes as part of doing work; the best fit when the durable artifact is what's to do rather than what's been said. claude-recall is read-only over what Claude Code already writes: passive hook injection, no curation, no separate store; the best fit when you want prior reasoning surfaced automatically without remembering to save it. Three different theories of what "memory" means for an AI agent, with plenty of room for all three to coexist in someone's workflow.

It's also a small demonstration of something I keep running into across projects: when you deploy a system for long enough, the interesting problems show up in the seams. ANI taught me that memory isn't just storage, it's an amplifier. claude-recall is what happens when the same observation comes at me from the other direction: when the agent I'm using needs memory, not the agent I'm building.

claude-recall is MIT-licensed and actively maintained. If you try it, I want to know what breaks.
Related reading: "Why Your AI Assistant Keeps Guessing Wrong" for the broader context pattern, or "ANI: The Architecture Behind the Companion" for the companion project that triggered this one.