arxiv_cs_cv February 10, 2026

A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches

object-counting, few-shot-learning, open-vocabulary, vision-language-models, computer-vision

Original Content

arXiv:2501.19184v4 Announce Type: replace

Abstract: Visual object counting has recently shifted towards class-agnostic counting (CAC), which addresses the challenge of counting objects across arbitrary categories, a crucial capability for flexible and generalizable counting systems. Unlike humans, who effortlessly identify and count objects from diverse categories without prior knowledge, most existing counting methods are restricted to enumerating instances of known classes, requiring extensive labeled datasets for training and struggling in open-vocabulary settings. In contrast, CAC aims to count objects belonging to classes never seen during training, operating in a few-shot setting. In this paper, we present the first comprehensive review of CAC methodologies. We propose a taxonomy to categorize CAC approaches into three paradigms based on how target object classes can be specified: reference-based, reference-less, and open-world text-guided. Reference-based approaches achieve state-of-the-art performance by relying on exemplar-guided mechanisms. Reference-less methods eliminate exemplar dependency by leveraging inherent image patterns. Finally, open-world text-guided methods use vision-language models, enabling object class descriptions via textual prompts, offering a flexible and promising solution. Based on this taxonomy, we provide an overview of 30 CAC architectures and report their performance on gold-standard benchmarks, discussing key strengths and limitations. Specifically, we present results on the FSC-147 dataset, setting a leaderboard using gold-standard metrics, and on the CARPK dataset to assess generalization capabilities. Finally, we offer a critical discussion of persistent challenges, such as annotation dependency and generalization, alongside future directions.
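The abstract does not name the "gold-standard metrics" behind its leaderboard. Counting methods on FSC-147 are commonly compared by mean absolute error (MAE) and root mean squared error (RMSE) over per-image counts, so a minimal sketch under that assumption might look like the following. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the survey.

```python
import math
from typing import Sequence


def counting_errors(
    predicted: Sequence[float], ground_truth: Sequence[float]
) -> tuple[float, float]:
    """Return (MAE, RMSE) over per-image object counts.

    Assumption: these are the metrics meant by "gold-standard metrics";
    the abstract itself does not say so.
    """
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed per-image error: predicted count minus annotated count.
    errors = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Made-up counts for four images, for illustration only.
mae, rmse = counting_errors([12, 30, 7, 101], [10, 30, 9, 105])
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

RMSE weights large per-image misses more heavily than MAE, which is one reason the two are usually reported together for images with very different object densities.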