arxiv_cs_cv April 20, 2026

TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models

Translated: 2026/4/20 10:48:03

vision-language-models, ood-detection, test-time-adaptation, clip, prompt-learning

Original Content

arXiv:2604.15756v1
Announce Type: cross

Abstract: Vision-language models (VLMs) such as CLIP exhibit strong out-of-distribution (OOD) detection capabilities by aligning visual and textual representations. Recent CLIP-based test-time adaptation methods further improve detection performance by incorporating external OOD labels. However, such labels are finite and fixed, while the real OOD semantic space is inherently open-ended. Consequently, fixed labels fail to represent the diverse and evolving OOD semantics encountered in test streams. To address this limitation, we introduce Test-time Textual Learning (TTL), a framework that dynamically learns OOD textual semantics from unlabeled test streams, without relying on external OOD labels. TTL updates learnable prompts using pseudo-labeled test samples to capture emerging OOD knowledge. To limit the noise introduced by pseudo-labels, we introduce an OOD knowledge purification strategy that selects only reliable OOD samples for adaptation. In addition, TTL maintains an OOD Textual Knowledge Bank that stores high-quality textual features, providing stable score calibration across batches. Extensive experiments on two standard benchmarks with nine OOD datasets demonstrate that TTL consistently achieves state-of-the-art performance, highlighting the value of textual adaptation for robust test-time OOD detection. Our code is available at https://github.com/figec/TTL.
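The abstract names three moving parts (pseudo-labeling of test samples, purification of those pseudo-labels, and a bank of OOD textual features used for scoring) but gives no formulas for any of them. The Python sketch below is only a toy illustration of how such parts could fit together, not the authors' method. Everything in it is an assumption: the softmax-mass scoring rule in `id_score`, the fixed threshold in `select_reliable_ood`, the keep-the-best eviction policy of `TextualKnowledgeBank`, and all names and default values. Prompt learning itself is omitted; a plain vector stands in for the text feature that an updated prompt would produce.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def id_score(image_feat, id_text_feats, ood_text_feats, temperature=0.1):
    """Assumed scoring rule: share of softmax mass on ID text features.

    Values near 1 suggest in-distribution, values near 0 suggest OOD.
    """
    id_logits = [cosine(image_feat, t) / temperature for t in id_text_feats]
    ood_logits = [cosine(image_feat, t) / temperature for t in ood_text_feats]
    m = max(id_logits + ood_logits)  # subtract the max for numerical stability
    id_mass = sum(math.exp(x - m) for x in id_logits)
    ood_mass = sum(math.exp(x - m) for x in ood_logits)
    return id_mass / (id_mass + ood_mass)


def select_reliable_ood(scores, threshold=0.2):
    """Assumed purification rule: keep indices of confidently low-scoring samples."""
    return [i for i, s in enumerate(scores) if s < threshold]


class TextualKnowledgeBank:
    """Fixed-capacity store that keeps the highest-quality OOD text features."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._heap = []  # min-heap of (quality, insertion_index, feature)
        self._count = 0  # unique tie-breaker so features are never compared

    def add(self, feature, quality):
        item = (quality, self._count, list(feature))
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        else:
            # Push the new item, then drop whichever entry has the lowest quality.
            heapq.heappushpop(self._heap, item)

    def features(self):
        return [f for _, _, f in self._heap]


# Toy usage with 3-d vectors standing in for CLIP features.
id_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bank = TextualKnowledgeBank(capacity=2)
bank.add([0.0, 0.0, 1.0], quality=0.5)  # hypothetical seed OOD text feature

batch = [[0.9, 0.1, 0.0], [0.0, 0.1, 1.0]]
scores = [id_score(x, id_text, bank.features()) for x in batch]
for i in select_reliable_ood(scores):
    # Stand-in only: a real system would store the text feature of the
    # updated prompt here, not the image feature itself.
    bank.add(batch[i], quality=1.0 - scores[i])
```

In this toy setup the bank must be seeded before scoring, because with no OOD text features every sample receives a score of 1 and nothing is ever selected. How the actual method bootstraps its first pseudo-labels is not stated in the abstract.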