arxiv_cs_cv February 10, 2026


Fine-Grained Cat Breed Recognition with Global Context Vision Transformer

Translated: 2026/3/15 18:03:50

deep-learning, cat-breed-recognition, vision-transformer, image-classification, data-augmentation


Original Content

arXiv:2602.07534v1

Abstract: Accurate identification of cat breeds from images is a challenging task because breeds differ only subtly in fur pattern, facial structure, and color. In this paper, we present a deep learning-based approach for classifying cat breeds using a subset of the Oxford-IIIT Pet Dataset, which contains high-resolution images of various domestic breeds. We employ the tiny variant of the Global Context Vision Transformer (GCViT-Tiny) for cat breed recognition. To improve model generalization, we use extensive data augmentation, including rotation, horizontal flipping, and brightness adjustment. Experimental results show that the GCViT-Tiny model achieved a test accuracy of 92.00% and a validation accuracy of 94.54%. These findings highlight the effectiveness of transformer-based architectures for fine-grained image classification tasks. Potential applications include veterinary diagnostics, animal shelter management, and mobile-based breed recognition systems. We also provide a Hugging Face demo at https://huggingface.co/spaces/bfarhad/cat-breed-classifier.
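The three augmentations named in the abstract (rotation, horizontal flipping, brightness adjustment) can be sketched as follows. This is a minimal illustration in pure Python on a tiny grayscale grid, not the authors' pipeline. The function names, the fixed 90-degree rotation (a stand-in for whatever rotation angles the paper used), the 0.5 application probabilities, and the 0.8 to 1.2 brightness range are all assumptions made for the example.

```python
import random

# An image here is a list of rows; each row is a list of grayscale
# pixel values in the range 0-255. Real pipelines operate on RGB tensors.


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate 90 degrees clockwise (assumed stand-in for small-angle rotation)."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping the result to 0-255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]


def augment(image, rng):
    """Randomly apply the three transforms.

    The probabilities and the brightness range are hypothetical values,
    chosen only for illustration.
    """
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = rotate_90(image)
    factor = rng.uniform(0.8, 1.2)
    return adjust_brightness(image, factor)


# Usage: a seeded generator makes the augmentation reproducible.
sample = [[0, 50, 100], [150, 200, 250]]
augmented = augment(sample, random.Random(0))
```

Each call to `augment` with a fresh random state yields a different variant of the same image, which is how augmentation enlarges the effective training set without collecting new photographs.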