12673 articles

arxiv_cs_cv 2026/2/10

FADE: Selective Forgetting via Sparse LoRA and Self-Distillation

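arXiv:2602.07058v1 Announce Type: new Abstract: Machine Unlearning aims to remove the influence of specific data or concepts from trained models while preserving overall performance, a capability increasingly required by data protection regulations and responsible AI practices. Despite recent progress, unlearning in text-to-image diffusion models remains challenging, with high computational cost and effective forgetting...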

Original: arXiv:2602.07058v1 Announce Type: new Abstract: Machine Unlearning aims to remove the influence of specific data or concepts from trained models while preserving overall performance, a capability inc...

arxiv_cs_cv 2026/2/10

From Images to Decisions: Assistive Computer Vision for Non-Metallic Content Estimation in Scrap Metal

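arXiv:2602.07062v1 Announce Type: new Abstract: Scrap quality directly affects energy use, emissions, and safety in steelmaking. Today, the share of non-metallic inclusions (contamination) is judged visually by inspectors, an approach that is highly subjective and hazardous because of dust and moving machinery. We estimate contamination (as a percentage) from images taken during railcar handling and classify the scrap type with an assistive...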

Original: arXiv:2602.07062v1 Announce Type: new Abstract: Scrap quality directly affects energy use, emissions, and safety in steelmaking. Today, the share of non-metallic inclusions (contamination) is judged ...

arxiv_cs_cv 2026/2/10

Exploring Physical Intelligence Emergence via Omni-Modal Architecture and Physical Data Engine

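arXiv:2602.07064v1 Announce Type: new Abstract: Physical understanding remains brittle in omni-modal models because key physical attributes are visually ambiguous and sparsely represented in web-scale data. OmniFysics, our compact omni-modal model that jointly understands images, audio, video, and text with built-in speech and image generation...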

Original: arXiv:2602.07064v1 Announce Type: new Abstract: Physical understanding remains brittle in omni-modal models because key physical attributes are visually ambiguous and sparsely represented in web-scal...

arxiv_cs_cv 2026/2/10

Contactless estimation of continuum displacement and mechanical compressibility from image series using a deep learning based framework

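arXiv:2602.07065v1 Announce Type: new Abstract: Contactless and non-invasive estimation of mechanical properties of physical media from optical observations is of interest for manifold engineering and biomedical applications where direct physical measurement is not possible. Conventional approaches to image displacement evaluation and contactless material probing rely on time-consuming iterative algorithms for non-rigid image registration and constitutive modeling, ...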

Original: arXiv:2602.07065v1 Announce Type: new Abstract: Contactless and non-invasive estimation of mechanical properties of physical media from optical observations is of interest for manifold engineering an...

arxiv_cs_cv 2026/2/10

Bird-SR: Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution

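arXiv:2602.07069v1 Announce Type: new Abstract: Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world low-resolution (LR) images due to distribution shifts. We propose Bird-SR, a bidirectional reward-guided diffusion framework that jointly leverages synthetic LR-HR pairs and real-world LR images. Super-resolution...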

Original: arXiv:2602.07069v1 Announce Type: new Abstract: Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world LR images due to dis...

arxiv_cs_cv 2026/2/10

MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation

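arXiv:2602.07082v1 Announce Type: new Abstract: As embodied AI expands from traditional object detection and recognition to more advanced tasks of robot manipulation and actuation planning, visual spatial reasoning from video inputs becomes essential for perceiving the spatial relationships of objects and guiding device actions. However, existing vision-language models (VLMs) have very weak spatial reasoning capabilities because they lack knowledge of 3D spatial information, ...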

Original: arXiv:2602.07082v1 Announce Type: new Abstract: When embodied AI is expanding from traditional object detection and recognition to more advanced tasks of robot manipulation and actuation planning, vi...

arxiv_cs_cv 2026/2/10

WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark

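arXiv:2602.07095v1 Announce Type: new Abstract: Recent advances in image editing models have demonstrated remarkable capabilities in executing explicit instructions, such as attribute manipulation, style transfer, and pose synthesis. However, these models often struggle with implicit editing instructions, which describe the cause of a visual change without explicitly detailing the outcome. These limitations relate to the complex world knowledge and reasoning that implicit instructions demand of existing models...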

Original: arXiv:2602.07095v1 Announce Type: new Abstract: Recent advances in image editing models have demonstrated remarkable capabilities in executing explicit instructions, such as attribute manipulation, s...

arxiv_cs_cv 2026/2/10

TLC-Plan: A Two-Level Codebook Based Network for End-to-End Vector Floorplan Generation

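arXiv:2602.07100v1 Announce Type: new Abstract: Automated floorplan generation aims to improve design quality, architectural efficiency, and sustainability by jointly modeling global spatial organization and precise geometric detail. However, existing approaches operate in raster space and rely on post-hoc vectorization, which introduces structural inconsistencies, and end-to-end learning...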

Original: arXiv:2602.07100v1 Announce Type: new Abstract: Automated floorplan generation aims to improve design quality, architectural efficiency, and sustainability by jointly modeling global spatial organiza...

arxiv_cs_cv 2026/2/10

Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting

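arXiv:2602.07101v1 Announce Type: new Abstract: UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splatting enables photorealistic scene reconstruction from real-world data, existing methods...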

Original: arXiv:2602.07101v1 Announce Type: new Abstract: UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation ...

arxiv_cs_cv 2026/2/10

Extended to Reality: Prompt Injection in 3D Environments

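arXiv:2602.07104v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have advanced the capabilities to interpret and act on visual input in 3D environments, empowering diverse applications such as robotics and situated conversational agents. When MLLMs reason over camera footage of the physical world, a new attack surface emerges: by placing text-bearing physical objects in the environment, an attacker can...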

Original: arXiv:2602.07104v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have advanced the capabilities to interpret and act on visual input in 3D environments, empowering diverse app...

arxiv_cs_cv 2026/2/10

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

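arXiv:2602.07106v1 Announce Type: new Abstract: Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet incorporating speech with 3D facial animation remains largely unexplored despite its importance for natural interaction. The challenge involves the discrete, token-level semantic reasoning in LLMs and the dense, fine-grained temporal dynamics required for 3D facial motion...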

Original: arXiv:2602.07106v1 Announce Type: new Abstract: Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet incorporating speech with 3D facial animation remain...

arxiv_cs_cv 2026/2/10

Privacy in Image Datasets: A Case Study on Pregnancy Ultrasounds

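arXiv:2602.07149v1 Announce Type: new Abstract: The rise of generative models has led to increased use of large-scale datasets collected from the internet, often with minimal or no data curation. This raises concerns about the inclusion of sensitive or private information. In this work, we explore the presence of pregnancy ultrasound images, which tend to contain private information and are often shared online. LAIO...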

Original: arXiv:2602.07149v1 Announce Type: new Abstract: The rise of generative models has led to increased use of large-scale datasets collected from the internet, often with minimal or no data curation. Thi...

arxiv_cs_cv 2026/2/10

DuMeta++: Spatiotemporal Dual Meta-Learning for Generalizable Few-Shot Brain Tissue Segmentation Across Diverse Ages

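arXiv:2602.07174v1 Announce Type: new Abstract: Accurate segmentation of brain tissues from MRI scans is critical for neuroscience and clinical applications, but achieving consistent performance across the human lifespan remains challenging due to dynamic, age-related changes in brain appearance and morphology. Prior work has sought to mitigate these shifts by applying self-supervised regularization to paired longitudinal data, but...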

Original: arXiv:2602.07174v1 Announce Type: new Abstract: Accurate segmentation of brain tissues from MRI scans is critical for neuroscience and clinical applications, but achieving consistent performance acro...

arxiv_cs_cv 2026/2/10

Condition Matters in Full-head 3D GANs

arXiv:2602.07198v1 Announce Type: new Abstract: Conditioning is crucial for the stable training of full-head 3D GANs. Without any conditioning signal, the model suffers from s...

Original: arXiv:2602.07198v1 Announce Type: new Abstract: Conditioning is crucial for stable training of full-head 3D GANs. Without any conditioning signal, the model suffers from severe mode collapse, making ...

arxiv_cs_cv 2026/2/10

Understanding Real-World Traffic Safety through RoadSafe365 Benchmark

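arXiv:2602.07212v1 Announce Type: new Abstract: Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety standards. To fill this gap, we introduce RoadSafe365, a large-scale vision-language benchmark that supports fine-grained analysis of traffic safety, built on a large and diverse collection of real-world videos...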

Original: arXiv:2602.07212v1 Announce Type: new Abstract: Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety stand...

arxiv_cs_cv 2026/2/10

The Double-Edged Sword of Data-Driven Super-Resolution: Adversarial Super-Resolution Models

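arXiv:2602.07251v1 Announce Type: new Abstract: Data-driven super-resolution (SR) methods are often integrated into imaging pipelines as preprocessing steps to improve downstream tasks such as classification and detection. However, these SR models introduce a previously unknown attack surface. In this paper, adversarial behavior embedded directly into the weights of an SR model during training, requiring no access to inputs at inference time, ...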

Original: arXiv:2602.07251v1 Announce Type: new Abstract: Data-driven super-resolution (SR) methods are often integrated into imaging pipelines as preprocessing steps to improve downstream tasks such as classi...

arxiv_cs_cv 2026/2/10

3D Transport-based Morphometry (3D-TBM) for medical image analysis

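arXiv:2602.07260v1 Announce Type: new Abstract: Transport-Based Morphometry (TBM) has emerged as a new framework for 3D medical image analysis. By embedding images into a transport domain via invertible transformations, TBM makes efficient use of transport-domain features for tasks such as classification and regression. Notably, the inverse mapping allows analysis results to be projected back into the original image space, so researchers...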

Original: arXiv:2602.07260v1 Announce Type: new Abstract: Transport-Based Morphometry (TBM) has emerged as a new framework for 3D medical image analysis. By embedding images into a transport domain via inverti...

arxiv_cs_cv 2026/2/10

TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition

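arXiv:2602.07262v1 Announce Type: new Abstract: Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices capture global channel correlations but collapse spatial structure, while self-attention models spatial context through weighted aggregation rather than explicit pairwise feature interactions. In this paper, ...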

Original: arXiv:2602.07262v1 Announce Type: new Abstract: Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices ...

arxiv_cs_cv 2026/2/10

VideoNeuMat: Neural Material Extraction from Generative Video Models

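arXiv:2602.07272v1 Announce Type: new Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality training data. Recent video generative models readily produce realistic material appearances, but this knowledge is closely entangled with geometry and lighting, and separating...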

Original: arXiv:2602.07272v1 Announce Type: new Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently l...

arxiv_cs_cv 2026/2/10

Cross-View World Models

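arXiv:2602.07277v1 Announce Type: new Abstract: World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when other viewpoints would make planning easier. Navigation, for example, benefits greatly from an overhead view. Trained with a cross-view prediction objective, Cross-View World Models (XV...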

Original: arXiv:2602.07277v1 Announce Type: new Abstract: World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when...