arxiv_cs_ai February 10, 2026

Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling

Translated: 2026/3/7 13:26:08

music-synthesis, multi-voice-generation, timbral-control

Japanese Translation

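Existing singing voice synthesis systems deliver high-fidelity solo performances, but they are limited to global timbre control and cannot capture dynamic multi-singer arrangement or vocal texture within a single song. To address this, we propose Tutti, a unified framework for structured multi-singer generation. Specifically, we introduce a Structure-Aware Singer Prompt that enables flexible singer scheduling that follows the musical structure, and we use a Condition-Guided VAE to capture implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that complement the explicit controls. Experiments show that Tutti schedules multiple singers precisely and substantially improves the acoustic realism of choral generation. The approach offers a new paradigm for complex multi-singer arrangement. Audio samples are available at https://annoauth123-ctrl.github.io/Tutii_Demo/.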
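As a rough illustration of what singer scheduling that follows the musical structure could look like, the sketch below assigns a set of active singers to each song section and turns that schedule into a per-frame singer prompt by averaging singer embeddings. This is not the Tutti implementation: the `Section` type, the `build_singer_prompt` function, the toy three-dimensional embeddings in `SINGER_EMB`, and the choice of mean pooling are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy timbre embeddings; a real system would use learned vectors.
SINGER_EMB = {
    "soprano": [1.0, 0.0, 0.0],
    "alto": [0.0, 1.0, 0.0],
    "tenor": [0.0, 0.0, 1.0],
}


@dataclass
class Section:
    label: str           # structural label, e.g. "verse" or "chorus"
    start: int           # first frame of the section (inclusive)
    end: int             # end frame of the section (exclusive)
    singers: list[str]   # IDs of the singers active in this section


def build_singer_prompt(sections, num_frames, emb=SINGER_EMB):
    """Return one prompt vector per frame (hypothetical scheme).

    Each frame gets the mean embedding of the singers active in its section;
    frames not covered by any section stay all-zero. Where sections overlap,
    the later section in the list overwrites the earlier one.
    """
    dim = len(next(iter(emb.values())))
    prompt = [[0.0] * dim for _ in range(num_frames)]
    for sec in sections:
        if not sec.singers:
            continue  # e.g. an instrumental break: leave these frames at zero
        vecs = [emb[name] for name in sec.singers]
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        for t in range(max(sec.start, 0), min(sec.end, num_frames)):
            prompt[t] = list(mean)
    return prompt


if __name__ == "__main__":
    song = [
        Section("verse", 0, 4, ["soprano"]),
        Section("chorus", 4, 8, ["soprano", "alto", "tenor"]),
    ]
    for t, vec in enumerate(build_singer_prompt(song, 8)):
        print(t, [round(x, 2) for x in vec])
```

Averaging is only one way to combine simultaneous singers; a learned model might keep per-singer embeddings separate or weight them, so treat the pooling step as a placeholder design choice.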

Original Content

arXiv:2602.08233v1 (Announce Type: cross)

Abstract: While existing Singing Voice Synthesis systems achieve high-fidelity solo performances, they are constrained by global timbre control, failing to address dynamic multi-singer arrangement and vocal texture within a single song. To address this, we propose Tutti, a unified framework designed for structured multi-singer generation. Specifically, we introduce a Structure-Aware Singer Prompt to enable flexible singer scheduling evolving with musical structure, and propose Complementary Texture Learning via Condition-Guided VAE to capture implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that are complementary to explicit controls. Experiments demonstrate that Tutti excels in precise multi-singer scheduling and significantly enhances the acoustic realism of choral generation, offering a novel paradigm for complex multi-singer arrangement. Audio samples are available at https://annoauth123-ctrl.github.io/Tutii_Demo/.
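The abstract does not state the training objective of the Condition-Guided VAE. A common starting point for this kind of model is the generic conditional VAE objective (reconstruction error plus a KL penalty), where both the encoder and the decoder see the explicit conditions, so the latent only needs to encode what those conditions leave unexplained. The sketch below shows that generic objective in plain Python. The toy linear encoder and decoder (`toy_encoder`, `toy_decoder`), the standard-normal prior, the squared-error reconstruction term, and the `beta` weight are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).

    In an autodiff framework this keeps z differentiable w.r.t. mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def toy_encoder(x, c):
    """Hypothetical q(z | x, c): sees the target x and the explicit conditions c,
    so its mean only has to encode the residual that c does not explain."""
    mu = [xi - ci for xi, ci in zip(x, c)]
    logvar = [-4.0] * len(x)  # fixed small variance (toy choice)
    return mu, logvar


def toy_decoder(z, c):
    """Hypothetical p(x | z, c): explicit conditions plus a latent 'texture' term."""
    return [ci + zi for ci, zi in zip(c, z)]


def cvae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Negative ELBO up to constants and scaling: squared error + beta * KL."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, 1.2, -0.3]  # toy acoustic feature frame
    c = [0.4, 1.0, -0.3]  # toy prediction from explicit controls
    mu, logvar = toy_encoder(x, c)
    z = reparameterize(mu, logvar, rng)
    x_hat = toy_decoder(z, c)
    print(round(cvae_loss(x, x_hat, mu, logvar), 3))
```

In this toy setup the encoder mean is exactly the residual `x - c`, which mirrors the idea of a latent that is complementary to explicit controls. In a real system the encoder and decoder would be neural networks trained with an autodiff framework.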