arxiv_cs_cv 2026/4/20

Learning Affine-Equivariant Proximal Operators

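arXiv:2604.15556v1 Announce Type: cross Abstract: Proximal operators play a fundamental role across a wide range of applications in signal processing and machine learning, including solving ill-posed inverse problems. Recent work introduced learned proximal networks (LPNs), parametric functions that compute exact proximal operators for data-driven or nonconvex regularizers. However, in many scenarios, these regularizers...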

Original: arXiv:2604.15556v1 Announce Type: cross Abstract: Proximal operators are fundamental across many applications in signal processing and machine learning, including solving ill-posed inverse problems. ...
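
Some terminology for context. The proximal operator of a function g is prox_{λg}(v) = argmin_x g(x) + ||x − v||²/(2λ). An operator f is equivariant to a transform T when f(T x) = T f(x). The sketch below is not the paper's LPN construction. It assumes the L1 norm as the regularizer, because its proximal operator has a closed form (soft-thresholding). It then uses a hypothetical helper, `equivariance_gap`, to test numerically which transforms that operator commutes with. Both transforms are illustrative choices. The truncated abstract does not say which affine family the paper targets.

```python
import numpy as np


def prox_l1(v: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||x||_1, i.e. soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def equivariance_gap(f, transform, x: np.ndarray) -> float:
    """Largest entrywise gap between f(T(x)) and T(f(x)); 0.0 means f commutes with T at x."""
    return float(np.max(np.abs(f(transform(x)) - transform(f(x)))))


rng = np.random.default_rng(0)
x = rng.normal(size=8)
f = lambda v: prox_l1(v, lam=0.5)

perm = rng.permutation(8)
signed_perm = lambda v: -v[perm]         # signed permutation (an orthogonal map)
scale_shift = lambda v: 2.0 * v + 1.0    # affine intensity map (illustrative)

print(equivariance_gap(f, signed_perm, x))   # 0.0
print(equivariance_gap(f, scale_shift, x))   # strictly positive
```

Soft-thresholding commutes with signed permutations but not with the scale-and-shift map. This shows that even a classical proximal operator is equivariant only to a restricted set of transforms.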

arxiv_cs_cv 2026/4/20

GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology

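arXiv:2604.15495v1 Announce Type: cross Abstract: Navigating complex, densely packed environments such as retail stores, warehouses, and hospitals poses a significant spatial grounding challenge for both humans and embodied AI. In these spaces, dense visual features quickly become stale given the nature of the items they contain, and long-tailed semantic distributions make conventional computer vision difficult. ...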

Original: arXiv:2604.15495v1 Announce Type: cross Abstract: Navigating complex, densely packed environments like retail stores, warehouses, and hospitals poses a significant spatial grounding challenge for hum...

arxiv_cs_cv 2026/4/20

ProtoTTA: Prototype-Guided Test-Time Adaptation

arXiv:2604.15494v1 Announce Type: cross Abstract: Deep networks that rely on prototypes (interpretable representations that can be related to the model input) have gained significant attention for balancing high...

Original: arXiv:2604.15494v1 Announce Type: cross Abstract: Deep networks that rely on prototypes-interpretable representations that can be related to the model input-have gained significant attention for bala...

arxiv_cs_cv 2026/4/20

RelativeFlow: Taming Medical Image Denoising Learning with Noisy Reference

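arXiv:2604.15459v1 Announce Type: cross Abstract: Medical image denoising (MID) lacks absolutely clean images for supervision, which gives rise to a noisy reference problem that fundamentally limits denoising performance. Existing pseudo-supervised discriminative learning (SimSDL) and generative learning (SimSGL) approaches treat the noisy reference as a clean target. This leads either to suboptimal optimization or to reference bias, while self-supervised learning...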

Original: arXiv:2604.15459v1 Announce Type: cross Abstract: Medical image denoising (MID) lacks absolutely clean images for supervision, leading to a noisy reference problem that fundamentally limits denoising...

arxiv_cs_cv 2026/4/20

M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention

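arXiv:2604.15377v1 Announce Type: cross Abstract: Accurate and timely rainfall nowcasting is crucial for disaster mitigation and water resource management. Despite recent advances in deep learning, precipitation forecasting remains highly challenging because of limits in how effectively diverse multimedia data sources are leveraged. We integrate visual NEXRAD radar imagery and numerical Personal Weather Station (PWS) measurements as complementary sources of heterogeneous meteorological data, whose temporal...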

Original: arXiv:2604.15377v1 Announce Type: cross Abstract: Accurate and timely rainfall nowcasting is crucial for disaster mitigation and water resource management. Despite recent advances in deep learning, p...
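
The abstract describes fusing two heterogeneous inputs, gridded radar imagery and point measurements from weather stations, with multimodal attention. The truncated abstract does not describe M3R's actual architecture, so what follows is a generic sketch only. It uses scaled dot-product cross-attention, which lets each radar patch embedding attend over the station embeddings. The shapes, the random embeddings, and the choice of radar tokens as queries are all assumptions made for illustration.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Scaled dot-product attention: queries (n_q, d); keys and values (n_k, d)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)   # (n_q, n_k), rows sum to 1
    return weights @ values, weights


rng = np.random.default_rng(0)
radar_tokens = rng.normal(size=(16, 32))     # hypothetical: 16 radar patch embeddings
station_tokens = rng.normal(size=(5, 32))    # hypothetical: 5 PWS station embeddings

fused, attn = cross_attention(radar_tokens, station_tokens, station_tokens)
print(fused.shape, attn.shape)   # (16, 32) (16, 5)
```

In a real model, the embeddings would come from learned projections of each modality. Here they are random, so the example shows only the mechanics: one weight per radar token and station pair, normalized per query.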

arxiv_cs_cv 2026/4/20

Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts

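arXiv:2604.15332v1 Announce Type: cross Abstract: Crash diagrams are essential tools in transportation safety analysis, yet their manual preparation remains time-consuming and prone to human variability. This study investigates the use of Vision-Language Models (VLMs) to generate crash diagrams from police crash reports, focusing on multi-lane roundabouts as a challenging test case. To interpret the models' reasoning...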

Original: arXiv:2604.15332v1 Announce Type: cross Abstract: Crash diagrams are essential tools in transportation safety analysis, yet their manual preparation remains time-consuming and prone to human variabil...

arxiv_cs_cv 2026/4/20

Repurposing 3D Generative Model for Autoregressive Layout Generation

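arXiv:2604.16299v1 Announce Type: new Abstract: We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in native 3D space. It uses an autoregressive process that explicitly models geometric relations and physical constraints among objects, and...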

Original: arXiv:2604.16299v1 Announce Type: new Abstract: We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from...

arxiv_cs_cv 2026/4/20

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

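arXiv:2604.16298v1 Announce Type: new Abstract: UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods rely on large foundation models, use generic prompts, and assemble loosely coordinated modules...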

Original: arXiv:2604.16298v1 Announce Type: new Abstract: UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous mul...

arxiv_cs_cv 2026/4/20

Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan

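arXiv:2604.16284v1 Announce Type: new Abstract: Atmospheric haze significantly degrades wildlife imagery, impeding computer vision applications critical for conservation, such as animal detection, tracking, and behavior analysis. To address this challenge, we introduce a synthetic dataset containing 3,477 hazy images generated from 1,159 clear wildlife photographs with a physics-based pipeline...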

Original: arXiv:2604.16284v1 Announce Type: new Abstract: Atmospheric haze significantly degrades wildlife imagery, impeding computer vision applications critical for conservation, such as animal detection, tr...
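
The abstract mentions a physics-based pipeline for synthesizing haze but does not say which model it uses. This sketch assumes the common atmospheric scattering model I = J·t + A·(1 − t), with transmission t = exp(−β·d). Here J is the clear image, d the scene depth, β the scattering coefficient, and A the airlight. Note that 3,477 / 1,159 = 3, which works out to three hazy images per clear photo. The three β values below are hypothetical stand-ins for whatever haze levels the authors actually used.

```python
import numpy as np


def synthesize_haze(clear: np.ndarray, depth: np.ndarray,
                    beta: float, airlight: float = 0.8) -> np.ndarray:
    """Atmospheric scattering model: I = J * t + A * (1 - t), with t = exp(-beta * d).

    clear: H x W x 3 image with values in [0, 1]; depth: H x W scene depth.
    """
    t = np.exp(-beta * depth)[..., np.newaxis]   # transmission map, broadcast over RGB
    return clear * t + airlight * (1.0 - t)


rng = np.random.default_rng(0)
clear = rng.uniform(size=(4, 6, 3))                   # toy "clear" image
depth = np.tile(np.linspace(0.0, 3.0, 6), (4, 1))     # depth grows left to right

betas = [0.3, 0.6, 1.2]                               # hypothetical haze levels
hazy = [synthesize_haze(clear, depth, b) for b in betas]
print(len(hazy), hazy[0].shape)                       # 3 (4, 6, 3)
```

Pixels at zero depth are unchanged. Distant pixels are pulled toward the airlight A, and a larger β strengthens that effect.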

arxiv_cs_cv 2026/4/20

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

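arXiv:2604.16272v1 Announce Type: new Abstract: As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. However, the field still lacks large-scale human-annotated datasets with complete editing examples, as well as standardized ... for comparing editing systems...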

Original: arXiv:2604.16272v1 Announce Type: new Abstract: As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured ...

arxiv_cs_cv 2026/4/20

Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement

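arXiv:2604.16266v1 Announce Type: new Abstract: Underwater images often suffer from severe degradation, such as color distortion, low contrast, and blurred details, due to light absorption and scattering. Learning-based methods such as CNNs and Transformers are promising but face key limitations: CNNs struggle to model the long-range dependencies required for non-uniform degradation, while Transformers...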

Original: arXiv:2604.16266v1 Announce Type: new Abstract: Underwater images often suffer from severe degradation, such as color distortion, low contrast, and blurred details, due to light absorption and scatte...

arxiv_cs_cv 2026/4/20

Information Router for Mitigating Modality Dominance in Vision-Language Models

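arXiv:2604.16264v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, where predictions rely excessively on a single modality. Prior approaches mainly address this problem by adjusting the model's attention allocation, under the assumption that each modality provides sufficient information...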

Original: arXiv:2604.16264v1 Announce Type: new Abstract: Vision Language models (VLMs) have demonstrated strong performance across a wide range of benchmarks, yet they often suffer from modality dominance, wh...

arxiv_cs_cv 2026/4/20

Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap

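arXiv:2604.16256v1 Announce Type: new Abstract: Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks. However, it remains unclear whether the strong performance of VLMs stems from genuinely vision-grounded reasoning or depends on the reasoning capabilities of the text backbone. To systematically...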

Original: arXiv:2604.16256v1 Announce Type: new Abstract: Reasoning in vision-language models (VLMs) has recently attracted significant attention due to its broad applicability across diverse downstream tasks....

arxiv_cs_cv 2026/4/20

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization

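arXiv:2604.16248v1 Announce Type: new Abstract: Image geolocalization has traditionally been addressed through retrieval-based place recognition or geometry-based visual localization pipelines. Recent advances in Vision-Language Models (VLMs) have demonstrated strong zero-shot reasoning capabilities on multimodal tasks, but their performance in geographic reasoning remains underexplored...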

Original: arXiv:2604.16248v1 Announce Type: new Abstract: Image geolocalization has traditionally been addressed through retrieval-based place recognition or geometry-based visual localization pipelines. Recen...
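
Geolocalization benchmarks typically score each prediction by the great-circle distance between the predicted and true coordinates. They then report the share of images that fall within fixed radii. The abstract does not state this paper's metric, so two parts of this sketch are assumptions. The first is the haversine distance. The second is the 1/25/200/750/2500 km thresholds (street, city, region, country, continent), which are commonly used in Im2GPS-style evaluations. `accuracy_at` is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at(preds, truths, thresholds_km=(1, 25, 200, 750, 2500)):
    """Fraction of predictions whose error is within each distance threshold."""
    errors = [haversine_km(*p, *t) for p, t in zip(preds, truths)]
    return {km: sum(e <= km for e in errors) / len(errors) for km in thresholds_km}


truths = [(48.8566, 2.3522), (40.7128, -74.0060)]   # Paris, New York
preds = [(51.5074, -0.1278), (40.7128, -74.0060)]   # London (wrong city), New York (exact)
print(accuracy_at(preds, truths))
# {1: 0.5, 25: 0.5, 200: 0.5, 750: 1.0, 2500: 1.0}
```

Predicting London for a Paris photo is an error of roughly 340 km. That misses the street, city, and region thresholds but counts as correct at country and continent scale.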

arxiv_cs_cv 2026/4/20

Find, Fix, Reason: Context Repair for Video Reasoning

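arXiv:2604.16243v1 Announce Type: new Abstract: Reinforcement learning has advanced video reasoning in large multi-modal models. Yet dominant pipelines either rely on on-policy self-exploration, which plateaus at the boundary of the model's knowledge, or on hybrid replay, which mixes on-policy and off-policy data and requires careful normalization. Dynamic-context methods zoom in on focused evidence, but many...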

Original: arXiv:2604.16243v1 Announce Type: new Abstract: Reinforcement learning has advanced video reasoning in large multi-modal models, yet dominant pipelines either rely on on-policy self-exploration, whic...

arxiv_cs_cv 2026/4/20

CollideNet: Hierarchical Multi-scale Video Representation Learning with Disentanglement for Time-To-Collision Forecasting

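arXiv:2604.16240v1 Announce Type: new Abstract: Time-to-Collision (TTC) forecasting is a critical task in collision prevention. It requires precise temporal prediction and an understanding of both local and global patterns, spatial and temporal, in video. To handle the multi-scale nature of video, we propose CollideNet, a novel spatio-temporal hierarchical transformer-based architecture tailored for effective TTC forecasting...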

Original: arXiv:2604.16240v1 Announce Type: new Abstract: Time-to-Collision (TTC) forecasting is a critical task in collision prevention, requiring precise temporal prediction and comprehending both local and ...
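
Under a constant-velocity assumption, TTC is simply the remaining distance divided by the closing speed. From monocular video, a classical estimate uses image expansion instead. An object's apparent width scales as 1/Z, so the width ratio s between consecutive frames equals Z_prev/Z_curr, and TTC = Δt/(s − 1). The sketch below shows both as simple baselines for intuition. It is not CollideNet, which, per the abstract, uses a hierarchical spatio-temporal architecture.

```python
import math


def ttc_from_range(distance_m: float, closing_speed_mps: float) -> float:
    """Constant-velocity TTC: remaining distance divided by closing speed."""
    if closing_speed_mps <= 0.0:
        return math.inf   # not approaching: no collision under this model
    return distance_m / closing_speed_mps


def ttc_from_scale(width_prev_px: float, width_curr_px: float, dt_s: float) -> float:
    """Monocular TTC from image expansion.

    Apparent width scales as 1/depth, so s = w_curr / w_prev = Z_prev / Z_curr,
    and under constant closing speed TTC = dt / (s - 1).
    """
    s = width_curr_px / width_prev_px
    if s <= 1.0:
        return math.inf   # object not growing in the image: not approaching
    return dt_s / (s - 1.0)


# An object 11 m away closes at 10 m/s; 0.1 s later it is 10 m away and looks 1.1x wider.
print(ttc_from_range(10.0, 10.0))                  # 1.0
print(round(ttc_from_scale(50.0, 55.0, 0.1), 6))   # 1.0
```

Both functions agree on about 1 s for the same scenario. In practice, the scale-based estimate is sensitive to noise in the measured widths, which is one motivation for learned forecasters.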

arxiv_cs_cv 2026/4/20

A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection

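arXiv:2604.16234v1 Announce Type: new Abstract: Academic integrity continues to face the persistent challenge of examination cheating. Traditional invigilation relies on human observation, which is inefficient, costly, and error-prone at scale. Although some existing AI-powered proctoring systems have been deployed and gained trust, many lack transparency and require multi-tier architectures to achieve the desired performance...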

Original: arXiv:2604.16234v1 Announce Type: new Abstract: Academic integrity continues to face the persistent challenge of examination cheating. Traditional invigilation relies on human observation, which is i...

arxiv_cs_cv 2026/4/20

Dental Panoramic Radiograph Analysis Using YOLO26 From Tooth Detection to Disease Diagnosis

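arXiv:2604.16231v1 Announce Type: new Abstract: Panoramic radiography is a fundamental diagnostic tool in dentistry, offering a comprehensive view of the entire dentition with minimal radiation exposure. However, manual interpretation is time-consuming and error-prone, especially in high-volume clinical settings. This creates a strong demand for efficient automated solutions. In this study, we apply YOLO26 for the first time to panoramic radiographs...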

Original: arXiv:2604.16231v1 Announce Type: new Abstract: Panoramic radiography is a fundamental diagnostic tool in dentistry, offering a comprehensive view of the entire dentition with minimal radiation expos...

arxiv_cs_cv 2026/4/20

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

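arXiv:2604.16214v1 Announce Type: new Abstract: Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intricately intertwined human interactions, contextual influences, and behavioral signals, and modeling it quantitatively is a challenging problem in computational social science. However, shaped by contextual and behavioral variability...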

Original: arXiv:2604.16214v1 Announce Type: new Abstract: Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments...

arxiv_cs_cv 2026/4/20

AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection

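arXiv:2604.16207v1 Announce Type: new Abstract: As new forgery types continue to emerge, Incremental Face Forgery Detection (IFFD) has become a crucial paradigm. However, existing methods typically rely on data replay or coarse-grained binary supervision. This cannot explicitly constrain the feature space and thus causes severe feature drift and catastrophic forgetting. To address this, AI...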

Original: arXiv:2604.16207v1 Announce Type: new Abstract: As forgery types continue to emerge consistently, Incremental Face Forgery Detection (IFFD) has become a crucial paradigm. However, existing methods ty...