arxiv_cs_lg February 10, 2026

BitLogic: Training Framework for Gradient-Based FPGA-Native Neural Networks

Translated: 2026/3/15 14:06:10

fpga, neural-networks, machine-learning, pytorch, hardware-acceleration

Original Content

arXiv:2602.07400v1
Announce Type: new

Abstract: The energy and latency costs of deep neural network inference are increasingly driven by deployment rather than training, motivating hardware-specialized alternatives to arithmetic-heavy models. Field-Programmable Gate Arrays (FPGAs) provide an attractive substrate for such specialization, yet existing FPGA-based neural approaches are fragmented and difficult to compare. We present BitLogic, a fully gradient-based, end-to-end trainable framework for FPGA-native neural networks built around Lookup Table (LUT) computation. BitLogic replaces multiply-accumulate operations with differentiable LUT nodes that map directly to FPGA primitives, enabling native binary computation, sparse connectivity, and efficient hardware realization. The framework offers a modular functional API supporting diverse architectures, along with learned encoders, hardware-aware heads, and multiple boundary-consistent LUT relaxations. An automated Register Transfer Level (RTL) export pipeline translates trained PyTorch models into synthesizable HDL, ensuring equivalence between software and hardware inference. Experiments across standard vision benchmarks and heterogeneous hardware platforms demonstrate competitive accuracy and substantial gains in FPGA efficiency, including 72.3% test accuracy on CIFAR-10 achieved with fewer than 0.3M logic gates, while attaining sub-20 ns single-sample inference using only LUT resources.
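
To make the idea of a differentiable LUT node with a boundary-consistent relaxation more concrete, here is a minimal sketch of one possible relaxation: the multilinear extension of a truth table. It is not taken from BitLogic, and the abstract does not say which relaxations the framework uses. The names `soft_lut`, `harden`, and `hard_lut` are hypothetical, and the sketch uses plain Python rather than PyTorch so that it stays self-contained. The property it illustrates is that the relaxed node agrees exactly with the discrete lookup whenever all inputs are 0 or 1, which is what would let a trained node be exported as an ordinary FPGA LUT.

```python
from itertools import product


def soft_lut(table, x):
    """Multilinear relaxation of a k-input LUT (illustrative assumption).

    table: list of 2**k values in [0, 1]; entry `addr` is the output for the
           input bits read as a binary number, with x[0] as the most
           significant bit.
    x:     list of k inputs, each in [0, 1].
    """
    k = len(x)
    if len(table) != 2 ** k:
        raise ValueError("table must have 2**k entries")
    out = 0.0
    # product((0, 1), repeat=k) enumerates addresses in ascending binary order.
    for addr, bits in enumerate(product((0, 1), repeat=k)):
        weight = 1.0
        for xi, b in zip(x, bits):
            # Each input contributes xi if the address bit is 1, else (1 - xi).
            weight *= xi if b else 1.0 - xi
        out += weight * table[addr]
    return out


def harden(table, threshold=0.5):
    """Round the learned table entries to bits for hardware export."""
    return [1 if t > threshold else 0 for t in table]


def hard_lut(bit_table, x_bits):
    """Discrete lookup, as a physical LUT would perform it."""
    addr = 0
    for b in x_bits:
        addr = (addr << 1) | b
    return bit_table[addr]


# A 2-input table that has converged to XOR.
xor_table = [0.0, 1.0, 1.0, 0.0]
midpoint = soft_lut(xor_table, [0.5, 0.5])   # smooth value between corners
corner = soft_lut(xor_table, [1.0, 0.0])     # equals the table entry at address 2
```

Because the output is linear in each table entry, the gradient with respect to `table[addr]` is just the corresponding `weight`, so the entries can be trained with ordinary gradient descent. In a PyTorch setting the same expression could presumably be written with tensor operations and left to autograd.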