arxiv_cs_lg 2026年2月10日

Spherical Steering: Geometry-Aware Activation Rotation for Language Models

Translated: 2026/3/15 15:02:42

language-modelsinference-steeringactivation-rotationgeometric-consistencyneural-architecture-search

Japanese Translation

arXiv:2602.08169v1 発表タイプ：new 要約：推論時のステアリングは、リトレーニングのコストをかけずに言語モデル（LM）を制御するための有望なパラダイムとして登場しました。しかし、標準的なアプローチは通常、非代数的な操作であるアクティベーションの加算に依存しており、これは必須的にハIDDEN レプリケーションの大きさを変化させる幾何学的な操作です。これは、レプリケーションの崩壊やオープンエンド生成能力の劣化の懸念を生み出します。本稿では、このトレードオフを解決するアクティベーション回転を用いたトレーニングフリーなプリマティブである「Spherical Steering」を探求します。固定されたベクトルでアクティベーションをシフトさせるのではなく、我々の方法はターゲット方向に沿って法線曲線上にアクティベーションを回転させ、アクティベーションをターゲットコンセプトへと導きつつ、シグナルの整合性を保つように導きます。さらに適応性を高めるために、入力の不確実性に基づいてステアリングの強さを動的に調節するコンフィデンスゲートを導入しました。複数の選択ベンチマークにわたる広範な実験は、Spherical Steering が加算ベースのベースラインを著しく凌駕することを示しています（特に、TruthfulQA, COPA, Storycloze で +10% の改善）。同時に、モデルの一般開Ended 生成品質を維持しました。この研究は、幾何学的整合性の価値を浮き彫りにし、ノルム保持回転が精密な推論時制御のための堅牢で効果的なプリマティブであることを示唆しています。

Original Content

arXiv:2602.08169v1 Announce Type: new Abstract: Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. However, standard approaches typically rely on activation addition, a geometric operation that inevitably alters the magnitude of hidden representations. This raises concerns about representation collapse and degradation of open-ended generation capabilities. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (notably by +10% on TruthfulQA, COPA, and Storycloze), while simultaneously maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control.