arxiv_cs_lg February 10, 2026

An Explainable Multi-Task Similarity Measure: Integrating Accumulated Local Effects and Weighted Fréchet Distance

Translated: 2026/3/15 14:49:42

multi-task-learning, explainable-ai, similarities, fréchet-distance, accumulated-local-effects

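Translation

arXiv:2602.07966v1 Announce Type: new Abstract: In many machine learning settings, tasks are often treated as interrelated components, with the aim of transferring knowledge between them; this is the central goal of multi-task learning (MTL). A multi-task scenario therefore has to answer key questions: which tasks are similar, and how and why are they similar? This paper proposes a multi-task similarity measure built on explainable artificial intelligence (XAI) techniques, specifically accumulated local effects (ALE) curves. The ALE curves are compared using a Fréchet distance weighted by the data distribution, and the resulting similarity measure is designed to take the importance of each feature into account. The measure applies both to single-task learning scenarios, where each task is trained separately, and to multi-task learning scenarios, where all tasks are learned simultaneously. It is also model-agnostic, so different machine learning models can be used for different tasks. A scaling factor is introduced to account for variation in predictive performance across tasks, and several recommendations are given for applying the measure in complex scenarios. The measure is validated on four datasets: one synthetic and three real-world. The real-world datasets are the well-known Parkinson's dataset and a bike-sharing usage dataset, both in tabular format, together with the CelebA dataset, which is used to evaluate concept bottleneck encoders in a multi-task learning setting. The results agree with intuitive expectations of task similarity on both tabular and non-tabular data, which makes the measure a useful tool for exploring relationships between tasks and for supporting informed decision-making.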

Original Content

arXiv:2602.07966v1 Announce Type: new Abstract: In many machine learning contexts, tasks are often treated as interconnected components with the goal of leveraging knowledge transfer between them, which is the central aim of Multi-Task Learning (MTL). Consequently, this multi-task scenario requires addressing critical questions: which tasks are similar, and how and why do they exhibit similarity? In this work, we propose a multi-task similarity measure based on Explainable Artificial Intelligence (XAI) techniques, specifically Accumulated Local Effects (ALE) curves. ALE curves are compared using the Fréchet distance, weighted by the data distribution, and the resulting similarity measure incorporates the importance of each feature. The measure is applicable in both single-task learning scenarios, where each task is trained separately, and multi-task learning scenarios, where all tasks are learned simultaneously. The measure is model-agnostic, allowing the use of different machine learning models across tasks. A scaling factor is introduced to account for differences in predictive performance across tasks, and several recommendations are provided for applying the measure in complex scenarios. We validate this measure using four datasets: one synthetic and three real-world. The real-world datasets include a well-known Parkinson's dataset and a bike-sharing usage dataset, both structured in tabular format, as well as the CelebA dataset, which is used to evaluate the application of concept bottleneck encoders in a multi-task learning setting. The results demonstrate that the measure aligns with intuitive expectations of task similarity across both tabular and non-tabular data, making it a valuable tool for exploring relationships between tasks and supporting informed decision-making.
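
To make the two building blocks named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It computes a first-order ALE curve for one feature and compares two such curves with the standard (unweighted) discrete Fréchet distance. The function names `ale_curve` and `discrete_frechet`, the quantile binning, and the toy tasks are assumptions made for illustration. The abstract does not specify the data-distribution weighting, the feature-importance aggregation, or the scaling factor, so all three are left out.

```python
import numpy as np


def ale_curve(predict, X, feature, n_bins=10):
    """First-order ALE curve of `predict` for one numeric, non-constant feature.

    Returns an array of shape (n_points, 2): grid values and centred ALE values.
    """
    x = X[:, feature]
    # Quantile-based bin edges so that bins hold roughly equal numbers of samples.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n = len(edges) - 1
    # Bin k covers (edges[k], edges[k + 1]]; the minimum falls into bin 0.
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n - 1)
    effects = np.zeros(n)
    counts = np.zeros(n)
    for k in range(n):
        mask = idx == k
        if not mask.any():
            continue
        lower, upper = X[mask].copy(), X[mask].copy()
        lower[:, feature] = edges[k]
        upper[:, feature] = edges[k + 1]
        # Average local effect of moving the feature across the bin.
        effects[k] = np.mean(predict(upper) - predict(lower))
        counts[k] = mask.sum()
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    # Centre the curve so that its count-weighted mean effect is zero.
    ale -= np.sum(0.5 * (ale[:-1] + ale[1:]) * counts) / counts.sum()
    return np.column_stack([edges, ale])


def discrete_frechet(P, Q):
    """Unweighted discrete Fréchet distance between two polylines (rows = points)."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return ca[-1, -1]


# Hypothetical toy tasks standing in for trained models.
def task_a(Z):
    return 2.0 * Z[:, 0] + Z[:, 1]


def task_b(Z):
    return 2.0 * Z[:, 0] - Z[:, 1] + 5.0  # same effect of feature 0 as task_a


def task_c(Z):
    return -2.0 * Z[:, 0] + Z[:, 1]  # opposite effect of feature 0


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))

curve_a = ale_curve(task_a, X, feature=0)
curve_b = ale_curve(task_b, X, feature=0)
curve_c = ale_curve(task_c, X, feature=0)

dist_ab = discrete_frechet(curve_a, curve_b)  # tasks agree on feature 0
dist_ac = discrete_frechet(curve_a, curve_c)  # tasks disagree on feature 0
print(dist_ab, dist_ac)
```

Because ALE curves are centred, the constant offset in `task_b` does not affect the comparison, so `task_a` and `task_b` look alike on feature 0 while `task_c` does not. Any callable that maps a sample array to predictions can be passed as `predict`, which is one way to read the abstract's model-agnostic claim.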