arxiv_cs_cv April 24, 2026

Efficient Logic Gate Networks for Video Copy Detection

logic-gate-networks, video-copy-detection, neural-networks, computer-vision, deep-learning

Original Content

arXiv:2604.21694v1
Announce Type: new

Abstract: Video copy detection requires robust similarity estimation under diverse visual distortions while operating at very large scale. Although deep neural networks achieve strong performance, their computational cost and descriptor size limit practical deployment in high-throughput systems. In this work, we propose a video copy detection framework based on differentiable Logic Gate Networks (LGNs), which replace conventional floating-point feature extractors with compact, logic-based representations. Our approach combines aggressive frame miniaturization, binary preprocessing, and a trainable LGN embedding model that learns both logical operations and interconnections. After training, the model can be discretized into a purely Boolean circuit, enabling extremely fast and memory-efficient inference. We systematically evaluate different similarity strategies, binarization schemes, and LGN architectures across multiple dataset folds and difficulty levels. Experimental results demonstrate that LGN-based models achieve competitive or superior accuracy and ranking performance compared to prior models, while producing descriptors several orders of magnitude smaller and delivering inference speeds exceeding 11k samples per second. These findings indicate that logic-based models offer a promising alternative for scalable and resource-efficient video copy detection.
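
The abstract does not spell out how a differentiable logic gate is relaxed or discretized, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes each gate holds 16 logits, one per two-input Boolean function, mixes the gates with a softmax during training, and keeps only the most probable gate at inference. The wiring is fixed by hand here, whereas the paper states that interconnections are learned. All names (`soft_gate`, `hard_gate`, `discretized_layer`, `one_hot_logits`, `hamming_similarity`) and the toy 4-bit "frames" are hypothetical, and Hamming similarity is just one plausible choice among the similarity strategies the abstract mentions.

```python
import math

# Real-valued relaxations of the 16 two-input Boolean functions.
# For a, b in {0, 1} each expression returns the exact gate output.
GATES = [
    lambda a, b: 0.0,                      # 0: FALSE
    lambda a, b: a * b,                    # 1: AND
    lambda a, b: a - a * b,                # 2: a AND NOT b
    lambda a, b: a,                        # 3: a
    lambda a, b: b - a * b,                # 4: NOT a AND b
    lambda a, b: b,                        # 5: b
    lambda a, b: a + b - 2 * a * b,        # 6: XOR
    lambda a, b: a + b - a * b,            # 7: OR
    lambda a, b: 1 - (a + b - a * b),      # 8: NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # 9: XNOR
    lambda a, b: 1 - b,                    # 10: NOT b
    lambda a, b: 1 - b + a * b,            # 11: a OR NOT b
    lambda a, b: 1 - a,                    # 12: NOT a
    lambda a, b: 1 - a + a * b,            # 13: NOT a OR b
    lambda a, b: 1 - a * b,                # 14: NAND
    lambda a, b: 1.0,                      # 15: TRUE
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_gate(logits, a, b):
    """Training-time output: softmax-weighted mixture of all 16 gates."""
    return sum(p * g(a, b) for p, g in zip(softmax(logits), GATES))


def hard_gate(logits, a, b):
    """Inference-time output: only the most probable gate, on 0/1 inputs."""
    k = max(range(len(GATES)), key=lambda i: logits[i])
    return int(GATES[k](a, b))


def discretized_layer(layer, bits):
    """Apply a layer of (input_i, input_j, logits) gates to a bit vector."""
    return [hard_gate(logits, bits[i], bits[j]) for i, j, logits in layer]


def hamming_similarity(x, y):
    """Fraction of matching bits between two equal-length binary descriptors."""
    return sum(int(p == q) for p, q in zip(x, y)) / len(x)


def one_hot_logits(k):
    """Hypothetical stand-in for trained logits that strongly prefer gate k."""
    return [5.0 if i == k else 0.0 for i in range(len(GATES))]


# Toy layer with hand-picked wiring (an assumption; the paper learns it).
layer = [
    (0, 1, one_hot_logits(1)),  # AND of bits 0 and 1
    (2, 3, one_hot_logits(6)),  # XOR of bits 2 and 3
    (0, 3, one_hot_logits(7)),  # OR of bits 0 and 3
]

frame_a = [1, 1, 0, 1]  # stand-ins for binarized, miniaturized frames
frame_b = [1, 1, 0, 0]
desc_a = discretized_layer(layer, frame_a)
desc_b = discretized_layer(layer, frame_b)
similarity = hamming_similarity(desc_a, desc_b)
```

Once discretized, every gate reduces to a single Boolean operation on bits, which is consistent with the abstract's claim of fast, memory-efficient inference and very small descriptors: the descriptor is simply the bit vector produced by the final layer.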