arxiv_cs_cv April 24, 2026

StreamMeCo: Efficient Agent Memory Compression for Real-Time Video Understanding

StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding

Translated: 2026/4/24 19:51:51

streaming, video-understanding, agent-memory, compression, neural-research

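Translation (from Japanese)

arXiv:2604.09000v2, Announce Type: replace

Abstract: Vision agent memory has proven remarkably effective for streaming video understanding, but recording such memory incurs substantial overhead in both storage and computation, which drives up cost. To address this, we propose StreamMeCo, an efficient streaming agent memory compression framework. Specifically, based on the connectivity of the memory graph, StreamMeCo applies edge-free minmax sampling to isolated nodes and edge-aware weight pruning to connected nodes, evicting redundant memory nodes while maintaining accuracy. In addition, to further suppress the performance degradation caused by memory compression, we introduce a time-decay memory retrieval mechanism. Extensive experiments on three challenging benchmark datasets (M3-Bench-robot, M3-Bench-web, and Video-MME-Long) show that, under 70% memory graph compression, StreamMeCo speeds up memory retrieval by 1.87× while improving average accuracy by 1.0%. Our code is available at https://github.com/Celina-love-sweet/StreamMeCo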

Original Content

arXiv:2604.09000v2, Announce Type: replace

Abstract: Vision agent memory has shown remarkable effectiveness in streaming video understanding. However, storing such memory for videos incurs substantial overhead, leading to high costs in both storage and computation. To address this issue, we propose StreamMeCo, an efficient Stream Agent Memory Compression framework. Specifically, based on the connectivity of the memory graph, StreamMeCo introduces edge-free minmax sampling for isolated nodes and edge-aware weight pruning for connected nodes, evicting redundant memory nodes while maintaining accuracy. In addition, we introduce a time-decay memory retrieval mechanism to further eliminate the performance degradation caused by memory compression. Extensive experiments on three challenging benchmark datasets (M3-Bench-robot, M3-Bench-web, and Video-MME-Long) demonstrate that, under 70% memory graph compression, StreamMeCo achieves a 1.87× speedup in memory retrieval while delivering an average accuracy improvement of 1.0%. Our code is available at https://github.com/Celina-love-sweet/StreamMeCo.
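The abstract names three mechanisms without defining them, so the following is only a rough Python sketch of the general ideas, not the authors' implementation. Everything in it is an assumption made for illustration: the `(u, v, weight)` edge layout, ranking connected nodes by summed incident edge weight, the exponential decay form, and all function and parameter names. The minmax sampling rule for isolated nodes is not specified in the abstract and is left out.

```python
import math
from collections import defaultdict


def split_by_connectivity(node_ids, edges):
    """Split nodes into (isolated, connected) by whether any edge touches them.

    `edges` is a list of (u, v, weight) tuples; this layout is an assumption.
    """
    touched = set()
    for u, v, _ in edges:
        touched.add(u)
        touched.add(v)
    isolated = [n for n in node_ids if n not in touched]
    connected = [n for n in node_ids if n in touched]
    return isolated, connected


def prune_by_edge_weight(connected, edges, keep):
    """Keep the `keep` connected nodes with the largest summed edge weight.

    Hypothetical criterion: the paper's actual pruning rule may differ.
    """
    strength = defaultdict(float)
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
    # Sort by descending strength, breaking ties by node id for determinism.
    ranked = sorted(connected, key=lambda n: (-strength[n], n))
    return sorted(ranked[:keep])


def retrieve_with_time_decay(similarity, timestamp, now, decay_rate, top_k):
    """Rank nodes by similarity * exp(-decay_rate * age) and return the top_k ids.

    The exponential form, and favouring recent memories, are assumptions.
    """
    scores = {
        n: sim * math.exp(-decay_rate * (now - timestamp[n]))
        for n, sim in similarity.items()
    }
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_k]
```

With nodes `[0, 1, 2, 3, 4]` and edges `[(0, 1, 0.9), (1, 2, 0.2)]`, nodes 3 and 4 come out as isolated, and pruning the connected set down to two nodes keeps 0 and 1, the two with the heaviest incident edges. In the retrieval step, two memories with equal similarity are separated by age once `decay_rate` is positive.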