dev_to · March 15, 2026


Async compaction: the race conditions nobody talks about

async-compaction · concurrency-race-conditions · llm-context-management · framework-comparison · memory-synchronization

Claude Code blocks the agent while compacting. LangGraph runs compaction in the background and silently drops messages. Aider spawns a background thread and hopes for the best. Async compaction sounds like the obvious optimization — until you try to build it.

We surveyed how major frameworks handle context compaction timing — synchronous, asynchronous, or not at all — and catalogued the concurrency hazards that emerge when you move compaction off the critical path. Here's what we found.

Most frameworks run compaction synchronously. The agent stops, the LLM summarizes, the agent continues with a shorter context. It's slow but safe.

| Framework | Approach | Agent blocked | Race risk |
| :--- | :--- | :--- | :--- |
| Claude Code | Sync at 95% capacity | Yes | None |
| LangChain | Sync after turn | Yes | None |
| AutoGen | Sync between chats | Yes | None |
| Cursor | None (manual reset) | N/A | N/A |
| ChatGPT | None (manual) | N/A | N/A |
| Aider | Background thread | No | Medium |
| Google ADK | Async event-based | No | Medium |
| LangGraph | Async background | No | High |

Six of eight frameworks either block or don't compact at all. The industry has voted with its implementations: synchronous compaction is the safe default.

The cost is real. LLM summarization takes 2–10 seconds depending on context size and model. During that window, the agent can't respond. For interactive use cases (coding assistants, chatbots), that's a noticeable hang. For background automation, it barely matters.

Moving compaction to a background task introduces five categories of concurrency bugs. We found evidence of all five in production frameworks.

Compaction reads the current message history, sends it to an LLM for summarization, and waits for the result. During that wait, new messages arrive. The compacted summary doesn't include them. When the summary replaces the original history, the new messages are silently lost.

LangGraph's documented race: history is rebuilt from a stale snapshot then fully replaced, dropping items recorded during the compaction window.
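The stale-snapshot hazard reduces to a few lines. This is a deliberately naive, single-threaded sketch with hypothetical types, not any framework's real API: the summary is built from a snapshot, and wholesale replacement drops anything appended since.

```rust
// Hypothetical types; not walrus's or LangGraph's actual API.
#[derive(Clone, Debug)]
struct Message(String);

/// Naive async-style compaction: the summary was produced from an
/// earlier snapshot, and replacing the whole history drops anything
/// appended after the snapshot was taken.
fn naive_compact(history: &mut Vec<Message>, snapshot: &[Message]) {
    let summary = Message(format!("[summary of {} messages]", snapshot.len()));
    *history = vec![summary];
}

fn main() {
    let mut history = vec![Message("msg 1".into()), Message("msg 2".into())];
    let snapshot = history.clone(); // compaction starts here

    // While the LLM summarizes, a correction arrives.
    history.push(Message("actually, use pnpm instead of yarn".into()));

    naive_compact(&mut history, &snapshot); // summary replaces history

    // The user's correction is silently gone: no error, no log entry.
    assert_eq!(history.len(), 1);
    assert!(!history.iter().any(|m| m.0.contains("pnpm")));
    println!("correction lost: history = {:?}", history);
}
```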
The proposed fix — version counters and generation IDs — is not yet implemented.

This is the consequence of stale snapshots, but it deserves its own category because of how it manifests: the agent simply "forgets" recent context with no error, no warning, no log entry. The user says "actually, use pnpm instead of yarn." Compaction starts. The compacted summary captures the pre-change state. The user's correction vanishes.

LangGraph's three-step async operation (snapshot → summarize → replace) can fail mid-way, leaving memory and disk out of sync. A partial failure means the summary was written but the old history wasn't fully removed — or vice versa.

If multiple WHS services or agents compact in parallel, results arrive out of order. Service A compacts messages 1–50 while Service B compacts messages 30–60 with overlapping coverage. Which result wins? How do you merge overlapping compactions?

In single-service systems this is less likely. But in walrus — where memory, search, and channels are all WHS services that may declare the Compact capability — parallel compaction is a real scenario.

Compaction produces a bad summary — it drops a critical fact, mischaracterizes a decision, or generalizes away an edge case. In synchronous compaction, you can validate before continuing. In async compaction, the agent has already acted on the pre-compaction context. By the time you detect the bad summary, the damage is done. No framework we surveyed implements compaction rollback. The summary is treated as authoritative the moment it's produced.

Token threshold crossed → compaction starts in background → more messages arrive → threshold crossed again → second compaction starts. Two concurrent compactions now race on the same history. LangGraph has no max_compact_attempts counter — infinite compaction retries are theoretically possible. The proposed fix includes a maximum attempt limit, but it's unimplemented.
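Both the double-compact and unbounded-retry hazards yield to the same small guard. A sketch using std atomics; the names (`CompactGuard`, `MAX_COMPACT_ATTEMPTS`) are hypothetical, not LangGraph's or walrus's actual internals:

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

// Hypothetical guard: at most one compaction in flight per session,
// with a hard cap on attempts so retries cannot loop forever.
const MAX_COMPACT_ATTEMPTS: u32 = 3;

struct CompactGuard {
    in_flight: AtomicBool,
    attempts: AtomicU32,
}

impl CompactGuard {
    fn new() -> Self {
        Self {
            in_flight: AtomicBool::new(false),
            attempts: AtomicU32::new(0),
        }
    }

    /// Try to start a compaction. Returns false if one is already
    /// running or the attempt budget is exhausted.
    fn try_start(&self) -> bool {
        if self.attempts.load(Ordering::SeqCst) >= MAX_COMPACT_ATTEMPTS {
            return false; // retry budget spent
        }
        // compare_exchange succeeds for exactly one caller; a second
        // threshold crossing gets `false` instead of a second task.
        if self
            .in_flight
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_err()
        {
            return false;
        }
        self.attempts.fetch_add(1, Ordering::SeqCst);
        true
    }

    fn finish(&self) {
        self.in_flight.store(false, Ordering::SeqCst);
    }
}

fn main() {
    let g = CompactGuard::new();
    assert!(g.try_start()); // first compaction starts
    assert!(!g.try_start()); // threshold crossed again: rejected
    g.finish();
    assert!(g.try_start()); // second attempt allowed
    g.finish();
    assert!(g.try_start()); // third attempt allowed
    g.finish();
    assert!(!g.try_start()); // attempt budget (3) exhausted
    println!("double compaction prevented, retries bounded");
}
```

A real implementation would also set a "compact pending" flag on rejection rather than dropping the request, as the walrus design below suggests for threshold re-crossings.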
[Interactive chart — see original post]

Aider runs recursive summarization in a background thread using a cheaper "weak model" — a smaller, faster LLM that handles compression while the main model continues reasoning. What works: the main agent is never blocked. Compaction cost is reduced by using a cheaper model. Recursive summarization (summary of summaries) keeps context compact over long sessions. What's missing: no documented handling of what happens when the agent queries content that's currently being compacted. If the background thread hasn't finished and the agent needs the old context, it reads stale data or waits — defeating the purpose of async.

Google ADK triggers compaction via events and runs summarization asynchronously. The result is written back as a new event. A sliding window with overlap preserves the most recent messages. What works: the event-based architecture means compaction is just another event in the stream. The overlap window (keeping the last N messages uncompacted) prevents the worst stale-snapshot problems — recent context always survives. What's missing: ordering guarantees when events arrive during compaction are not documented. If the compaction event completes after several new user events, the insertion point matters. Google ADK doesn't specify whether the summary event is inserted at the position where compaction started or at the current head.

LangGraph attempts true async compaction but has documented concurrency bugs:

- Silent drop: items recorded during the compaction window are lost when history is fully replaced
- Partial failure: memory and disk can get out of sync if the three-step operation (snapshot → summarize → replace) fails mid-way
- Unbounded retries: no maximum compaction attempt counter

The proposed fixes are sound — version counters, atomic replacement, max attempts — but none are implemented as of March 2026. LangGraph is the clearest evidence that async compaction is harder than it looks.
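The overlap-window idea is simple to state in code. A sketch assuming a plain message list, not Google ADK's actual implementation:

```rust
/// Split history so the last `keep` messages are never compacted.
/// Sketch of the overlap-window idea; hypothetical, not ADK's code.
fn split_for_compaction(history: &[String], keep: usize) -> (&[String], &[String]) {
    let cut = history.len().saturating_sub(keep);
    // (compactable prefix, protected recent suffix)
    history.split_at(cut)
}

fn main() {
    let history: Vec<String> = (1..=10).map(|i| format!("msg {i}")).collect();
    let (to_compact, protected) = split_for_compaction(&history, 3);
    assert_eq!(to_compact.len(), 7);
    assert_eq!(protected, ["msg 8", "msg 9", "msg 10"]);
    // Even if the summary of `to_compact` turns out stale or bad, the
    // most recent context always survives verbatim.
    println!("compacting {} msgs, protecting {}", to_compact.len(), protected.len());
}
```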
[Interactive chart — see original post]

MemGPT (now Letta) takes a radically different approach: the agent controls its own memory tiers, like an operating system managing physical and virtual memory. The LLM context window is "physical memory." External storage is "virtual memory." The agent explicitly moves information between tiers via function calls. No background compaction. No race conditions. The agent decides what to archive and what to recall. This is the only framework we surveyed with zero concurrency hazards. The trade is cognitive overhead: the agent spends tokens reasoning about memory management instead of the actual task. MemGPT's approach is elegant but expensive in a different currency — model attention rather than infrastructure complexity.

Walrus currently compacts synchronously. The on_compact() hook blocks the agent loop while WHS services return compacted context — tokio::task::block_in_place() bridges the async/sync gap. Each service has a 10-second timeout. Safe, but the agent hangs.

Moving to async compaction would look like this:

1. Agent loop detects context threshold → fires CompactSession event
2. Background tokio task dispatches to all Compact-capable WHS services
3. Services return compacted prompt additions
4. Results stored in session as "pending compaction"
5. Next on_before_run() injects pending compaction into the prompt
6. Agent continues immediately after step 1

This design uses walrus's existing event infrastructure — DaemonEvent variants, tokio::spawn(), the task watcher pattern in the task registry. No Hook trait changes required. But all five hazards apply:

Stale snapshot: messages arrive between event fire (step 1) and result injection (step 5). The compacted summary doesn't include them. Fix: keep a generation counter on the session history. Reject compaction results if the generation has advanced beyond a threshold.

Silent drop: if pending compaction replaces history naively, messages from steps 2–4 vanish. Fix: merge, don't replace.
Append the compaction summary alongside messages received during the compaction window, not instead of them.

Ordering: multiple WHS services may compact in parallel. Their results must be serialized. Fix: the existing RPC mutex on ServiceRegistry (already used for tool dispatch) can serialize compaction results. Alternatively, sequence compaction responses by service priority.

Failed rollback: a bad summary from a WHS service corrupts context. Fix: store a pre-compaction history snapshot. If the agent detects degraded quality (a heuristic, not foolproof), restore from the snapshot.

Double compact: threshold crossed again before the first compaction completes. Fix: at most one compaction in flight per session. New threshold crossings set a "compact pending" flag but don't spawn another task.

[Interactive chart — see original post]

From surveying frameworks and Anthropic's context engineering guide, four patterns emerge:

1. Version counters — Track a generation ID on session history. When compaction starts, record the current generation. When results arrive, check if the generation has advanced. If it has, either reject the compaction or merge it with the new messages. Proposed for LangGraph but not yet implemented.

2. Overlapping windows — Never compact the last N messages. Google ADK uses this with its sliding window. Anthropic recommends raw context over compaction over summarization — keep as much original context as possible, especially recent messages.

3. Optimistic apply with validation — Apply the async compaction result, then run a quick validation: are key facts preserved? Does the summary mention the current task? If validation fails, roll back to pre-compaction history. This adds one more LLM call but catches the worst failures.

4. Throttled compaction — At most one compaction in flight per session. New threshold crossings queue, don't spawn. This prevents double compaction entirely and simplifies the state machine.
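The version-counter pattern composes naturally with merge-don't-replace. A sketch with hypothetical types, assuming a generation counter bumped on every append; the drift threshold and field names are illustrative only:

```rust
// Hypothetical session types; not walrus's actual data model.
struct SessionHistory {
    messages: Vec<String>,
    generation: u64, // bumped on every append
}

struct CompactionResult {
    summary: String,
    started_at_generation: u64, // generation when the snapshot was taken
    compacted_len: usize,       // how many messages the snapshot covered
}

impl SessionHistory {
    fn append(&mut self, msg: String) {
        self.messages.push(msg);
        self.generation += 1;
    }

    /// Apply a compaction result. If too many messages arrived since the
    /// snapshot, reject it; otherwise keep the late messages after the
    /// summary instead of replacing the whole history.
    fn apply_compaction(&mut self, result: CompactionResult, max_drift: u64) -> bool {
        let drift = self.generation - result.started_at_generation;
        if drift > max_drift {
            return false; // too stale: caller can retry compaction
        }
        let late = self.messages.split_off(result.compacted_len);
        self.messages = vec![result.summary];
        self.messages.extend(late); // merge, don't replace
        self.generation += 1;
        true
    }
}

fn main() {
    let mut h = SessionHistory { messages: vec![], generation: 0 };
    h.append("msg 1".into());
    h.append("msg 2".into());
    h.append("msg 3".into());

    // Compaction snapshots the history at generation 3.
    let result = CompactionResult {
        summary: "[summary of 3 messages]".into(),
        started_at_generation: h.generation,
        compacted_len: h.messages.len(),
    };

    // A correction arrives during summarization.
    h.append("actually, use pnpm instead of yarn".into());

    assert!(h.apply_compaction(result, 2)); // drift 1 <= 2: accepted
    assert_eq!(h.messages.len(), 2); // summary + late correction
    assert!(h.messages[1].contains("pnpm")); // nothing silently dropped
    println!("history after compaction: {:?}", h.messages);
}
```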
Walrus's task registry already implements similar concurrency control with its queue-and-promote pattern.

Are the latency savings worth the complexity? Sync compaction blocks for 2–10 seconds. For interactive agents, that's annoying. For background automation, it's irrelevant. How often does compaction actually happen in practice — once per session? Once per hundred turns? If it's rare, the engineering cost of async may not pay off.

Should results be applied immediately or at a natural break? Injecting compaction results mid-turn could confuse the agent. Waiting for a natural break (tool response, user message) is safer but adds latency. Where's the right insertion point?

Can you validate a compaction summary without another LLM call? Embedding similarity between pre- and post-compaction context could catch gross information loss. String matching for key entities could catch fact drops. Neither is as reliable as LLM-based validation, but both are cheaper.

How should async compaction appear in the task registry? Walrus's task registry tracks agent tasks as a live tree visible via walrus ps. Should background compaction appear as a task? A session annotation? Invisible infrastructure? Observability matters for debugging.

Does MemGPT's approach eliminate the need for async compaction entirely? If the agent controls its own memory paging, there's nothing to run in the background. The trade is cognitive overhead — but with capable models, that overhead shrinks. Is agent-controlled paging the endgame, making async compaction a transitional pattern?
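The key-entity string-matching check costs no LLM call. A heuristic sketch (hypothetical function; entity extraction and the rollback decision are left to the caller):

```rust
/// Cheap, LLM-free validation: require that key entities from the
/// pre-compaction context still appear in the summary. A heuristic
/// sketch only; it cannot catch mischaracterized decisions, just
/// outright fact drops.
fn dropped_entities(summary: &str, key_entities: &[&str]) -> Vec<String> {
    let lowered = summary.to_lowercase();
    key_entities
        .iter()
        .copied()
        .filter(|e| !lowered.contains(&e.to_lowercase()))
        .map(|e| e.to_string())
        .collect()
}

fn main() {
    let summary = "User is migrating the build to pnpm; tests pass on CI.";
    let entities = ["pnpm", "yarn", "CI"];
    let dropped = dropped_entities(summary, &entities);
    // "yarn" never made it into the summary: a candidate for rollback
    // to the pre-compaction snapshot.
    assert_eq!(dropped, ["yarn"]);
    println!("dropped entities: {:?}", dropped);
}
```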
Further reading

- Anthropic: Effective context engineering for AI agents
- MemGPT: Towards LLMs as Operating Systems — virtual context management
- LangGraph race conditions — documented concurrency bugs
- LangChain async memory issue — the original feature request
- ACON: Optimizing Context Compression — failure-driven compression for long-horizon agents
- Claude Code compaction docs — sync approach with automatic trigger
- Aider repository map — background summarization architecture

Our context compaction survey covers the eight frameworks at an architectural level. This post goes deeper on the async-specific challenges. The persistent agent memory survey covers the broader memory architecture that compaction interacts with. Mem0's extraction pipeline faces similar async challenges. Hermes's FTS5 layer must also handle concurrent writes.

Originally published at OpenWalrus.