arxiv_cs_cv 2026/2/10

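3D Transport-based Morphometry (3D-TBM) for medical image analysis

arXiv:2602.07260v1 Announce Type: new Abstract: Transport-Based Morphometry (TBM) has emerged as a new framework for 3D medical image analysis. By embedding images into a transport domain via invertible transformations, TBM lets tasks such as classification and regression make efficient use of transport-domain features. Notably, the inverse mapping allows analysis results to be projected back into the original image space, so that researchers...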

Original: arXiv:2602.07260v1 Announce Type: new Abstract: Transport-Based Morphometry (TBM) has emerged as a new framework for 3D medical image analysis. By embedding images into a transport domain via inverti...

arxiv_cs_cv 2026/2/10

TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition

arXiv:2602.07262v1 Announce Type: new Abstract: Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices capture global channel correlations but collapse spatial structure, while self-attention handles spatial context through weighted aggregation rather than explicit pairwise feature interactions. In this paper, ...

Original: arXiv:2602.07262v1 Announce Type: new Abstract: Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices ...

arxiv_cs_cv 2026/2/10

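VideoNeuMat: Neural Material Extraction from Generative Video Models

arXiv:2602.07272v1 Announce Type: new Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality training data. Recent video generative models readily produce realistic material appearance, but that knowledge is tightly entangled with geometry and lighting...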

Original: arXiv:2602.07272v1 Announce Type: new Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently l...

arxiv_cs_cv 2026/2/10

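Cross-View World Models

arXiv:2602.07277v1 Announce Type: new Abstract: World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when other viewpoints would make planning easier. Navigation, for example, benefits greatly from an overhead view. Our Cross-View World Models, trained with a cross-view prediction objective (XV...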

Original: arXiv:2602.07277v1 Announce Type: new Abstract: World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when...

arxiv_cs_cv 2026/2/10

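Diabetic Retinopathy Lesion Segmentation through Attention Mechanisms

arXiv:2602.07301v1 Announce Type: new Abstract: Diabetic Retinopathy (DR) is an eye disease that arises from diabetes mellitus and can cause vision loss and blindness. To prevent irreversible vision loss, early detection through screening is important. Researchers have developed many deep-learning-based automated algorithms for DR screening, but their clinical applicability remains limited, particularly for lesion segmentation...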

Original: arXiv:2602.07301v1 Announce Type: new Abstract: Diabetic Retinopathy (DR) is an eye disease which arises due to diabetes mellitus. It might cause vision loss and blindness. To prevent irreversible vi...

arxiv_cs_cv 2026/2/10

Optimization of Precipitate Segmentation Through Linear Genetic Programming of Image Processing

arXiv:2602.07310v1 Announce Type: new Abstract: Current analysis of additive manufactured niobium-based copper alloys relies on hand annotation due to varying contrast, noise, and image artifacts present in micrographs, which slows iteration in alloy development. Using linear genetic programming (LGP) optimized to account for the various image artifacts, we ... in FIB cross-section micrographs...

Original: arXiv:2602.07310v1 Announce Type: new Abstract: Current analysis of additive manufactured niobium-based copper alloys relies on hand annotation due to varying contrast, noise, and image artifacts pre...

arxiv_cs_cv 2026/2/10

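LUCID-SAE: Learning Unified Vision-Language Sparse Codes for Interpretable Concept Discovery

arXiv:2602.07311v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) offer a natural path toward comparable explanations across different representation spaces. However, current SAEs are trained per modality, so their dictionary features cannot be directly understood and explanations do not transfer across domains. In this work, while learning a shared latent dictionary for image-patch and text-token representations, ... for modality-specific details...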

Original: arXiv:2602.07311v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) offer a natural path toward comparable explanations across different representation spaces. However, current SAEs are traine...

arxiv_cs_cv 2026/2/10

Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation

arXiv:2602.07343v1 Announce Type: new Abstract: Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remains a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply a uniform, static fusion strategy across all conditions, under which modality-specific noise...

Original: arXiv:2602.07343v1 Announce Type: new Abstract: Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remain a core challenge for autonomous driving ...

arxiv_cs_cv 2026/2/10

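Optimizing Few-Step Generation with Adaptive Matching Distillation

arXiv:2602.07345v1 Announce Type: new Abstract: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in the "Forbidden Zone": regions where the real teacher provides unreliable guidance and the fake teacher exerts insufficient repulsive force. In this paper, as implicit strategies for avoiding these corrupted regions, existing...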

Original: arXiv:2602.07345v1 Announce Type: new Abstract: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zone, regions where t...

arxiv_cs_cv 2026/2/10

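Row-Column Separated Attention Based Low-Light Image/Video Enhancement

arXiv:2602.07428v1 Announce Type: new Abstract: The U-Net structure is widely used for low-light image and video enhancement. Without proper guidance from global information, the enhanced images show substantial local noise and loss of detail. Attention mechanisms can attend to and exploit global information more effectively, but applying attention to images can significantly increase the parameter count and computational cost. We...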

Original: arXiv:2602.07428v1 Announce Type: new Abstract: U-Net structure is widely used for low-light image/video enhancement. The enhanced images result in areas with large local noise and loss of more detai...

arxiv_cs_cv 2026/2/10

Perspective-aware fusion of incomplete depth maps and surface normals for accurate 3D reconstruction

arXiv:2602.07444v1 Announce Type: new Abstract: We address the problem of reconstructing 3D surfaces from depth and surface normal maps acquired by a sensor system based on a single perspective camera. Depth and normal maps can be obtained by techniques such as structured-light scanning and photometric stereo, respectively. We extend existing orthographic gradient-based depth-normal fusion methods, ...

Original: arXiv:2602.07444v1 Announce Type: new Abstract: We address the problem of reconstructing 3D surfaces from depth and surface normal maps acquired by a sensor system based on a single perspective camer...

arxiv_cs_cv 2026/2/10

PTB-XL-Image-17K: A Large-Scale Synthetic ECG Image Dataset with Comprehensive Ground Truth for Deep Learning-Based Digitization

arXiv:2602.07446v1 Announce Type: new Abstract: Electrocardiogram (ECG) digitization (converting paper-based or scanned ECG images back into time-series signals) is critical for leveraging decades of legacy clinical data in modern deep learning applications. However, progress has been hindered by the lack of large-scale datasets that provide both ECG images and their corresponding ground-truth signals with comprehensive annotations...

Original: arXiv:2602.07446v1 Announce Type: new Abstract: Electrocardiogram (ECG) digitization-converting paper-based or scanned ECG images back into time-series signals-is critical for leveraging decades of l...

arxiv_cs_cv 2026/2/10

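SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads

arXiv:2602.07449v1 Announce Type: new Abstract: Achieving a balance between high-fidelity visual quality and low-latency streaming remains a formidable challenge in audio-driven portrait generation. Existing large-scale models incur very high computational costs, while lightweight alternatives sacrifice holistic facial representation and temporal stability. In this paper, ... designed for real-time, infinite-length, high-fidelity streaming video generation ...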

Original: arXiv:2602.07449v1 Announce Type: new Abstract: Achieving a balance between high-fidelity visual quality and low-latency streaming remains a formidable challenge in audio-driven portrait generation. ...

arxiv_cs_cv 2026/2/10

SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning

arXiv:2602.07458v1 Announce Type: new Abstract: Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators face a critical perception gap that we term "Attention Collapse", in which cross-image comparisons are neglected and fine-grained details are not captured...

Original: arXiv:2602.07458v1 Announce Type: new Abstract: Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fin...

arxiv_cs_cv 2026/2/10

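GlobalWasteData: A Large-Scale, Integrated Dataset for Robust Waste Classification and Environmental Monitoring

arXiv:2602.07463v1 Announce Type: new Abstract: The growing amount of waste is an environmental problem that requires efficient sorting techniques for many kinds of waste. Automated waste classification systems are used for this purpose. The effectiveness of these Artificial Intelligence (AI) models depends on the quality and accessibility of the public datasets that underpin the training and analysis of classification algorithms...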

Original: arXiv:2602.07463v1 Announce Type: new Abstract: The growing amount of waste is a problem for the environment that requires efficient sorting techniques for various kinds of waste. An automated waste ...

arxiv_cs_cv 2026/2/10

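Thermal odometry and dense mapping using learned odometry and Gaussian splatting

arXiv:2602.07493v1 Announce Type: new Abstract: Thermal infrared sensors, with wavelengths longer than smoke particles, can capture imagery independent of darkness, dust, and smoke. This robustness has made them increasingly valuable for motion estimation and environment perception in robotics, particularly in adverse conditions. However, existing thermal odometry and mapping approaches are predominantly geometric and, across diverse datasets, ...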

Original: arXiv:2602.07493v1 Announce Type: new Abstract: Thermal infrared sensors, with wavelengths longer than smoke particles, can capture imagery independent of darkness, dust, and smoke. This robustness h...

arxiv_cs_cv 2026/2/10

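Learning Brain Representation with Hierarchical Visual Embeddings

arXiv:2602.07495v1 Announce Type: new Abstract: Decoding visual representations from brain signals has attracted significant attention in both neuroscience and artificial intelligence. However, the degree to which brain signals truly encode visual information remains unclear. Current visual decoding approaches explore a variety of brain-image alignment strategies, but most focus on high-level semantic features, and pixel-level details...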

Original: arXiv:2602.07495v1 Announce Type: new Abstract: Decoding visual representations from brain signals has attracted significant attention in both neuroscience and artificial intelligence. However, the d...

arxiv_cs_cv 2026/2/10

IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation

arXiv:2602.07498v1 Announce Type: new Abstract: Recent progress in video diffusion models has markedly advanced character animation, which synthesizes moving videos by animating a static identity image according to a driving video. Explicit methods represent motion using skeletons, DWPose, or other explicit structured signals, but struggle to handle spatial mismatches and varying body scales...

Original: arXiv:2602.07498v1 Announce Type: new Abstract: Recent progress in video diffusion models has markedly advanced character animation, which synthesizes motioned videos by animating a static identity i...

arxiv_cs_cv 2026/2/10

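Adaptive Image Zoom-in with Bounding Box Transformation for UAV Object Detection

arXiv:2602.07512v1 Announce Type: new Abstract: Detecting objects in UAV-captured images is challenging because the objects are small. In this work, a simple and efficient adaptive zoom-in framework is explored for object detection on UAV images. The main motivation is that, compared with common scene images, foreground objects are smaller and more sparsely distributed, which hinders the optimization of effective object detectors. Therefore, we...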

Original: arXiv:2602.07512v1 Announce Type: new Abstract: Detecting objects from UAV-captured images is challenging due to the small object size. In this work, a simple and efficient adaptive zoom-in framework...

arxiv_cs_cv 2026/2/10

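CA-YOLO: Cross Attention Empowered YOLO for Biomimetic Localization

arXiv:2602.07523v1 Announce Type: new Abstract: In modern complex environments, achieving accurate and efficient target localization is essential in numerous fields. However, existing systems are limited in both accuracy and the ability to recognize small targets. In this study, we propose a biomimetic stabilized localization system based on CA-YOLO that improves both target localization accuracy and small-target recognition. The "brain" of the system...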

Original: arXiv:2602.07523v1 Announce Type: new Abstract: In modern complex environments, achieving accurate and efficient target localization is essential in numerous fields. However, existing systems often f...