dev.to, March 21, 2026

Introducing Recursive Memory Harness: RLM For Agentic Memory

recursive-memory, agentic-ai, llm-architecture, knowledge-graph, persistent-memory

Applying recursive architecture to persistent AI memory. Based on Recursive Language Models (MIT CSAIL, 2025).

https://arxiv.org/abs/2512.24601
https://github.com/aayoawoyemi/Ori-Mnemos

By empowering LLMs to decompose, navigate, and reassemble information instead of brute-forcing it into memory, Recursive Language Models allow AI to process inputs up to a hundred times larger than a single context window can hold. We applied the same recursive architecture to persistent memory and found that retrieval quality compounds: it matches memory systems built on Redis and Qdrant cloud infrastructure with zero databases and zero cloud. Just local markdown files.

Recursive Memory Harness brings us one step closer to active memory recall instead of sequential retrieval.

In December 2025, researchers at MIT CSAIL posited a simple inversion. Instead of cramming everything into a sequential, linear context window that is loaded fresh for every single query, you treat the data as an environment the model can actually navigate.

Think of the context window as a desk. Everything the AI knows during a query (every document, every retrieved passage, every instruction) has to fit on that desk.

RAG's answer to this problem is a librarian. You ask a question, and the librarian runs into the stacks, grabs books that seem relevant based only on their titles, without reading, let alone understanding, their contents, and drops them on your desk. As a result, they often bring the wrong books. And every book they bring takes up space on your desk, leaving less room for actual thinking.

With recursion, you (the model) walk into the library yourself:

- Read the catalog.
- Pull a book off the shelf and skim it.
- Make a note to yourself.
- Go back to the catalog with a sharper question based on what you just learned.
- Pull another book.
- Repeat.

The desk stays clean during the entire process, because you bring only your final results back to it.
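The library walk above maps onto a simple recursive pattern. The Python below is a toy sketch of that pattern, not the RLM authors' code: `keyword_reader` is a hypothetical stand-in for an LLM sub-call (it just keeps lines that mention a query word), and always splitting oversized text in half is an assumption. In RLM the model itself writes the code that decides how to chunk the document and when to spawn a sub-call.

```python
from typing import Callable


def keyword_reader(question: str, text: str) -> str:
    """Hypothetical stand-in for an LLM sub-call: keep only the lines that
    mention a word from the question. A real system would ask a model."""
    terms = question.lower().split()
    return "\n".join(line for line in text.splitlines()
                     if any(term in line.lower() for term in terms))


def recursive_read(question: str, document: str, budget: int = 200,
                   reader: Callable[[str, str], str] = keyword_reader) -> str:
    """Never hand the reader more than `budget` characters at once.

    Oversized text is split and each part goes to a recursive sub-call;
    only the condensed notes travel back up to the caller."""
    if budget < 1:
        raise ValueError("budget must be at least 1 character")
    if len(document) <= budget:
        # Small enough: one "junior researcher" reads it directly.
        return reader(question, document)
    # Split near the middle, preferring a line boundary so lines stay intact.
    cut = document.rfind("\n", 1, len(document) // 2)
    if cut == -1:
        cut = len(document) // 2
    notes = [recursive_read(question, part, budget, reader)
             for part in (document[:cut], document[cut:])]
    # Drop empty reports; only non-empty findings reach the caller's "desk".
    return "\n".join(note for note in notes if note)
```

The caller only ever receives the joined notes, which is the "clean desk" property: raw chunks are read inside sub-calls and discarded. Replacing `keyword_reader` with a real model call is where the actual work lies; the recursion skeleton stays the same.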
Mechanically, the document is stored as a variable in a programming environment outside the context window. The model writes small programs to explore it in chunks. When a section is still too large, the model spawns a sub-call on that section alone: a junior researcher who reads one piece and reports back. The recursion nests as deeply as it needs to.

All of the current memory and retrieval systems iterate on the same broken process: consume and load as much information into the context window as possible. Build bigger desks. Spend millions creating more specialized systems for finding a needle in the haystack (RAG). Mechanically, the current big players and labs (Mem0, Letta, Supermemory) all continue to let the LLM operate sequentially: store, search, retrieve, inject, forget. Then they optimize around that loop.

But, as I wrote about here, we should be building harnesses that force the model to relate to pieces of information comprehensively and relatively. We should be building solutions that model human cognition the same way a plane models a bird.

RMH is a framework that posits persistent memory should be an environment, not a database: a harness that forces the model to relate to pieces of information comprehensively rather than as isolated nodes. The database becomes a knowledge graph. Notes become nodes, and each piece of information has relativity to other pieces of information, similar to neurons in the brain. Activating one activates others to varying degrees.

1. Retrieval must follow the graph. When a note is retrieved, activation propagates along its edges to connected notes. The system cannot return isolated results. It is forced to return clusters of related knowledge, because the graph encodes how each piece of information relates to every other piece it touches.

2. Unresolved queries must recurse. When a retrieval pass does not fully resolve the query, the system generates sub-queries targeting what is missing.
Each sub-query enters the graph from new entry points and runs its own pass. Results accumulate. After each pass, the system measures whether new relevant information is still being found. When it isn't, the system stops.

3. Every retrieval must reshape the graph. When a note is accessed, connected notes within two hops receive a vitality boost that decays with distance (spreading activation). When new knowledge references a previously retrieved note, that note gets an additional boost. Notes that are never retrieved or cited decay on a power-law curve consistent with the Ebbinghaus forgetting curve. The graph is not allowed to be static. It strengthens with use and prunes from neglect.

Any framework or system that retrieves isolated notes, answers in a single pass, or treats memory as static storage is not RMH.

github.com/aayoawoyemi/Ori-Mnemos

Ori Mnemos is the first implementation of Recursive Memory Harness.

The AI world is converging on memory and context efficiency as the bottleneck. Your agent's memory, your LLM's memory, is rapidly increasing in value. We are already seeing implementations where agents share knowledge and learn from each other across platforms, decentralized or not.

The default architecture in AI memory is cloud-hosted and API-gated: your agent's knowledge is stored on someone else's servers, billed monthly, with no performance advantage over local alternatives.

The AI landscape is the wild west. Norms are being created and systems are being entrenched. Ori started as a philosophical endeavor to nip this in the bud before it becomes the standard. The question: how can you build a memory architecture that performs at the level of cloud-dependent systems while keeping every piece of data yours?

The answer was a folder of markdown files connected by wiki-links, versioned with git, and readable with your own eyes. No database. No cloud. No vendor between you and your memory.
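To make the three rules concrete, here is a minimal Python sketch of them applied to wiki-linked markdown notes. It is not Ori Mnemos's implementation. The two-hop limit comes from rule 3, but the 0.5 per-hop decay, the 0.5 power-law exponent, the keyword-based entry points, and reusing newly found note titles as sub-queries are all assumptions standing in for what a tuned system and an LLM would do. The extra boost for notes cited by new knowledge is not modeled. Notes are passed as a dict of title to markdown body instead of being read from disk, to keep the example self-contained.

```python
import re
from collections import deque

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures "Target".
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def build_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Undirected adjacency between notes, built from their [[wiki-links]]."""
    graph: dict[str, set[str]] = {name: set() for name in notes}
    for name, body in notes.items():
        for raw in WIKI_LINK.findall(body):
            target = raw.strip()
            if target in graph and target != name:
                graph[name].add(target)
                graph[target].add(name)
    return graph


def spread(graph: dict[str, set[str]], seeds: list[str],
           max_hops: int = 2, decay: float = 0.5) -> dict[str, float]:
    """Rules 1 and 3: activation flows from each seed along edges and is
    worth decay**hop at each hop (assumed decay), up to max_hops.
    Keeps the strongest activation seen for each note."""
    activation: dict[str, float] = {}
    for seed in seeds:
        frontier, seen = deque([(seed, 0)]), {seed}
        while frontier:
            node, hop = frontier.popleft()
            activation[node] = max(activation.get(node, 0.0), decay ** hop)
            if hop < max_hops:
                for nxt in graph[node] - seen:
                    seen.add(nxt)
                    frontier.append((nxt, hop + 1))
    return activation


def vitality(days_idle: float, exponent: float = 0.5) -> float:
    """Rule 3: power-law forgetting (assumed exponent).
    1.0 right after access, then (1 + t) ** -exponent."""
    return (1.0 + days_idle) ** -exponent


def recursive_retrieve(notes: dict[str, str], graph: dict[str, set[str]],
                       query: str, max_passes: int = 5) -> set[str]:
    """Rule 2: keep issuing sub-queries until a pass surfaces nothing new
    (or max_passes is reached)."""
    found: set[str] = set()
    queries = [query]
    for _ in range(max_passes):
        new: set[str] = set()
        for q in queries:
            terms = q.lower().split()
            # Hypothetical entry points: notes whose text mentions a query term.
            seeds = [name for name, body in notes.items()
                     if any(t in body.lower() for t in terms)]
            new |= set(spread(graph, seeds)) - found
        if not new:
            break  # convergence: this pass found nothing new, so stop
        found |= new
        # Hypothetical stand-in for an LLM writing sub-queries about the gaps.
        queries = sorted(new)
    return found
```

With notes such as `"Ori": "See [[RMH]] and [[Markdown]]."` and `"RMH": "Built on [[RLM]]."`, a query that first lands on `Markdown` reaches `RLM` only on a later pass, and an unlinked note never shows up: retrieval returns a cluster, not a list of isolated hits.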
That experiment produced a system that compresses the entire memory stack (Redis, Qdrant, cloud infrastructure) into local files that perform comparably on standard benchmarks with zero dependencies.

The system runs through MCP (Model Context Protocol) and a CLI, with the CLI being finalized in the coming days. Any model can connect to Ori and get recursive graph navigation without writing code or managing state. Open source under Apache 2.0: `npm install ori-memory`.

I use it in my personal workflow: over 500 notes spanning six project domains. Over time, my AI agent has become a personalized, specialized assistant that knows my work across all of them. It finds connections between projects that I didn't draw myself. It surfaces decisions I made months ago when they become relevant again. All of it is saved on my computer. All of it is mine.

HotpotQA tests multi-hop question answering: each question requires finding and combining information from exactly two different documents to produce an answer. Head-to-head on the same 50 questions, with the same scoring:

| Metric | Ori (RMH) | Mem0 |
|---|---|---|
| Recall@5 (R@5) | 90.0% | 29.0% |
| F1 | 52.3% | 25.7% |
| LLM-F1 (answer quality) | 41.0% | 18.8% |
| Run time | 142 s | 1347 s |
| API calls for ingestion | None (local) | ~500 LLM calls |
| Cost to run | Free | API costs per query |
| Infrastructure | Zero | Redis + Qdrant |

Recursive Language Models showed that recursion over context is worth more than a bigger context window. Recursive Memory Harness shows that relativity in memory outperforms isolated retrieval. Ori Mnemos is its first implementation: open source, zero infrastructure, and performing at the level of cloud-dependent systems on standard benchmarks.

Star the repo. Run the benchmarks yourself. Tell us what breaks, and build on top of and with RMH.
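For anyone rerunning the comparison, here is how R@5 and answer F1 in the table above are conventionally computed in SQuAD/HotpotQA-style evaluation. The article does not include its scoring script, so treat this as an assumed reference point, not the exact harness behind those numbers. The official HotpotQA script also special-cases yes/no answers, which is omitted here. The LLM-F1 column depends on a model-based judgment whose definition isn't given, so it is not reproduced.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """SQuAD/HotpotQA-style normalization: lowercase, strip punctuation,
    drop the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(retrieved: list[str], gold: set[str], k: int = 5) -> float:
    """Fraction of the gold supporting documents found in the top k results.
    For HotpotQA, `gold` holds the two supporting documents."""
    return len(set(retrieved[:k]) & gold) / len(gold)
```

Table-level scores are then per-question averages, e.g. the mean of `recall_at_k` over all 50 questions. Under these definitions a 90.0% R@5 means that, on average, 1.8 of the 2 supporting documents appeared in the top five results.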