arxiv_cs_cv April 20, 2026

SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification

ssmamba, vision-transformer, self-supervised-learning, pathological-image-analysis, roi-classification

Original Content

arXiv:2604.15711v1 Announce Type: new

Abstract: Pathological diagnosis is highly reliant on image analysis, where Regions of Interest (ROIs) serve as the primary basis for diagnostic evidence, while whole-slide image (WSI)-level tasks primarily capture aggregated patterns. To extract these critical morphological features, ROI-level Foundation Models (FMs) based on Vision Transformers (ViTs) and large-scale self-supervised learning (SSL) have been widely adopted. However, three core limitations remain in their application to ROI analysis: (1) cross-magnification domain shift, as fixed-scale pretraining hinders adaptation to diverse clinical settings; (2) inadequate local-global relationship modeling, wherein the ViT backbone of FMs suffers from high computational overhead and imprecise local characterization; (3) insufficient fine-grained sensitivity, as traditional self-attention mechanisms tend to overlook subtle diagnostic cues. To address these challenges, we propose SSMamba, a hybrid SSL framework that enables effective fine-grained feature learning without relying on large external datasets. This framework incorporates three domain-adaptive components: Mamba Masked Image Modeling (MAMIM) for mitigating domain shift, a Directional Multi-scale (DMS) module for balanced local-global modeling, and a Local Perception Residual (LPR) module for enhanced fine-grained sensitivity. Employing a two-stage pipeline (SSL pretraining on target ROI datasets, followed by supervised fine-tuning (SFT)), SSMamba outperforms 11 state-of-the-art (SOTA) pathological FMs on 10 public ROI datasets and surpasses 8 SOTA methods on 6 public WSI datasets. These results validate the superiority of task-specific architectural designs for pathological image analysis.
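
The abstract names masked image modeling as the SSL pretext task but does not say how MAMIM masks or reconstructs patches. The sketch below only illustrates the generic idea (hide a random subset of patches and score the reconstruction on the hidden ones); it is not the authors' method. The function names, the 2x2 patch size, the 0.75 mask ratio, and the all-zeros stand-in for a model are illustrative assumptions.

```python
import random


def patchify(image, patch):
    """Split a 2D grid (list of rows) into non-overlapping patch x patch tiles, row-major."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def random_mask(num_patches, mask_ratio, seed=0):
    """Pick which patch indices are hidden; return (masked, visible) index lists."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return sorted(masked), visible


def masked_mse(pred_tiles, target_tiles, masked_idx):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for pred_row, target_row in zip(pred_tiles[i], target_tiles[i]):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 8x8 "ROI" with 2x2 patches (assumed sizes, chosen only for illustration).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tiles = patchify(image, patch=2)
masked_idx, visible_idx = random_mask(len(tiles), mask_ratio=0.75)

# Stand-in for an encoder/decoder: predicts zeros for every patch (hypothetical).
pred = [[[0.0, 0.0], [0.0, 0.0]] for _ in tiles]
loss = masked_mse(pred, tiles, masked_idx)
```

In a real pretraining loop, an encoder would see only the patches in `visible_idx` and a decoder would produce `pred`; the loss is restricted to `masked_idx` so the model is scored on content it could not see.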