arxiv_cs_cv April 24, 2026

Semantic-Fast-SAM: An Efficient Semantic Segmenter That Achieves Real-Time Performance Without Sacrificing Accuracy

Semantic-Fast-SAM: Efficient Semantic Segmenter

Translated: 2026/4/24 19:52:22

segment-anything, semantic-segmentation, fastsam, real-time-vision, foundational-models

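Translation (from Japanese)

arXiv:2604.20169v2 Announce Type: replace Abstract: We propose Semantic-Fast-SAM (SFS), a semantic segmentation framework that pairs the Fast Segment Anything model (FastSAM) with a semantic labeling pipeline to deliver real-time performance without sacrificing accuracy. FastSAM is an efficient CNN-based re-implementation of the original transformer-based SAM, built to run much faster. Building on FastSAM's fast mask generation, we adopt the Semantic-Segment-Anything (SSA) labeling strategy to assign a meaningful category to each mask. As a result, the SFS model produces high-quality semantic segmentation maps at a fraction of the computational cost and memory footprint of the original SAM-based approach. In experiments on the Cityscapes and ADE20K benchmarks, SFS matches the accuracy of prior SAM-based methods (mIoU of about 70.33 on Cityscapes and about 48.01 on ADE20K) while running inference roughly 20 times faster than SSA in the closed-set setting. We also show that, by using CLIP-based semantic heads, SFS handles open-vocabulary segmentation effectively and outperforms recent open-vocabulary models on broad category labeling. This work enables practical real-time semantic segmentation with "segment-anything" capability and broadens the applicability of foundation segmentation models in robotics scenarios. The implementation is available at https://github.com/KBH00/Semantic-Fast-SAM.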

Original Content

arXiv:2604.20169v2 Announce Type: replace Abstract: We propose Semantic-Fast-SAM (SFS), a semantic segmentation framework that combines the Fast Segment Anything model with a semantic labeling pipeline to achieve real-time performance without sacrificing accuracy. FastSAM is an efficient CNN-based re-implementation of the Segment Anything Model (SAM) that runs much faster than the original transformer-based SAM. Building upon FastSAM's rapid mask generation, we integrate a Semantic-Segment-Anything (SSA) labeling strategy to assign meaningful categories to each mask. The resulting SFS model produces high-quality semantic segmentation maps at a fraction of the computational cost and memory footprint of the original SAM-based approach. Experiments on Cityscapes and ADE20K benchmarks demonstrate that SFS matches the accuracy of prior SAM-based methods (mIoU ~ 70.33 on Cityscapes and 48.01 on ADE20K) while achieving approximately 20x faster inference than SSA in the closed-set setting. We also show that SFS effectively handles open-vocabulary segmentation by leveraging CLIP-based semantic heads, outperforming recent open-vocabulary models on broad class labeling. This work enables practical real-time semantic segmentation with the "segment-anything" capability, broadening the applicability of foundation segmentation models in robotics scenarios. The implementation is available at https://github.com/KBH00/Semantic-Fast-SAM.
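
The abstract describes assigning a category to each class-agnostic mask and scoring the result with mIoU, but it does not say how the assignment is made. The sketch below is an illustration only, not the authors' implementation: it assumes a simple majority vote over a per-pixel class map inside each mask, and it uses plain Python lists as stand-ins for image arrays. The names `label_masks`, `mean_iou`, and `ignore_label` are hypothetical. Note that `mean_iou` returns a fraction in [0, 1], whereas the abstract reports mIoU on a 0 to 100 scale.

```python
from collections import Counter


def label_masks(masks, class_map, ignore_label=255):
    """Give every pixel of each binary mask the most frequent class id under it.

    masks: list of H x W grids of 0/1 (stand-in for class-agnostic masks).
    class_map: H x W grid of class ids (stand-in for a per-pixel classifier).
    Assumption: majority vote; the source does not confirm this rule.
    Pixels covered by no mask keep ignore_label.
    """
    height, width = len(class_map), len(class_map[0])
    semantic = [[ignore_label] * width for _ in range(height)]
    # Paint larger masks first so smaller masks overwrite them on overlap.
    ordered = sorted(masks, key=lambda m: sum(map(sum, m)), reverse=True)
    for mask in ordered:
        votes = Counter(
            class_map[y][x]
            for y in range(height)
            for x in range(width)
            if mask[y][x]
        )
        if not votes:
            continue  # empty mask: nothing to label
        winner = votes.most_common(1)[0][0]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = winner
    return semantic


def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if t == ignore_label:
                    continue  # skip unlabeled ground-truth pixels
                inter += (p == c) and (t == c)
                union += (p == c) or (t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2 x 4 example: two non-overlapping masks and a noisy class map.
masks = [
    [[1, 1, 0, 0], [1, 1, 0, 0]],
    [[0, 0, 1, 1], [0, 0, 1, 1]],
]
class_map = [[0, 0, 1, 1], [0, 2, 1, 1]]
semantic = label_masks(masks, class_map)
score = mean_iou(semantic, class_map, num_classes=3)
```

In the toy example the stray class-2 pixel inside the first mask is outvoted and relabeled as class 0, which shows how mask-level voting smooths per-pixel noise while also erasing small regions that have no mask of their own.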