arxiv_cs_cv February 10, 2026

Robust and Real-Time Bangladeshi Currency Recognition: A Dual-Stream MobileNet and EfficientNet Approach

Original Content

arXiv:2602.07015v1 Announce Type: new

Abstract: Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at risk of fraud and exploitation. To address these challenges, we first build a new Bangladeshi banknote dataset that includes both controlled and real-world scenarios, ensuring a more comprehensive and diverse representation. Next, to enhance the dataset's robustness, we incorporate four additional datasets, including public benchmarks, to cover various complexities and improve the model's generalization. To overcome the limitations of current recognition models, we propose a novel hybrid CNN architecture that combines MobileNetV3-Large and EfficientNetB0 for efficient feature extraction. The extracted features are passed to a multilayer perceptron (MLP) classifier, which improves performance while keeping computational costs low and makes the system suitable for resource-constrained devices. The experimental results show that the proposed model achieves 97.95% accuracy on controlled datasets, 92.84% on complex backgrounds, and 94.98% when all datasets are combined. The model's performance is evaluated using five-fold cross-validation and seven metrics: accuracy, precision, recall, F1-score, Cohen's Kappa, Matthews correlation coefficient (MCC), and AUC. Additionally, explainable AI methods such as LIME and SHAP are incorporated to enhance transparency and interpretability.
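The abstract names its evaluation metrics but does not give formulas. As a minimal sketch, the following pure-Python code computes six of the seven (accuracy, precision, recall, F1-score, Cohen's Kappa, and MCC) from a multiclass confusion matrix, using their standard textbook definitions. Macro averaging for precision, recall, and F1 is an assumption, since the abstract does not say which averaging the authors used. The denomination labels and predictions are hypothetical toy data, not results from the paper. AUC is omitted because it needs per-class scores rather than hard predictions.

```python
import math


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def macro_precision_recall_f1(cm):
    """Unweighted mean of per-class precision, recall, and F1 (assumed averaging)."""
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def cohen_kappa(cm):
    """Agreement between truth and prediction, corrected for chance agreement."""
    k = len(cm)
    n = sum(map(sum, cm))
    observed = accuracy(cm)
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / n ** 2
    # Kappa is undefined when expected agreement is 1; return 0.0 in that case.
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def matthews_corrcoef(cm):
    """Multiclass generalization of the Matthews correlation coefficient."""
    k = len(cm)
    s = sum(map(sum, cm))                      # total samples
    c = sum(cm[i][i] for i in range(k))        # correctly classified samples
    t = [sum(cm[i]) for i in range(k)]         # true count per class
    p = [sum(cm[r][i] for r in range(k)) for i in range(k)]  # predicted count
    numerator = c * s - sum(ti * pi for ti, pi in zip(t, p))
    denominator = math.sqrt(
        (s ** 2 - sum(pi ** 2 for pi in p)) * (s ** 2 - sum(ti ** 2 for ti in t))
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical toy data: three denominations, six test images.
labels = ["10", "50", "100"]
y_true = ["10", "10", "50", "50", "100", "100"]
y_pred = ["10", "50", "50", "50", "100", "100"]

cm = confusion_matrix(y_true, y_pred, labels)
precision, recall, f1 = macro_precision_recall_f1(cm)
print("accuracy:", round(accuracy(cm), 4))
print("macro precision / recall / F1:", round(precision, 4), round(recall, 4), round(f1, 4))
print("Cohen's kappa:", round(cohen_kappa(cm), 4))
print("MCC:", round(matthews_corrcoef(cm), 4))
```

Under five-fold cross-validation, functions like these would be applied to each held-out fold and the five values averaged. That aggregation step is also an assumption about the protocol, since the abstract states only that five folds were used.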