arxiv_cs_ai February 10, 2026

Playing 20 Question Game with Policy-Based Reinforcement Learning

Translated: 2026/2/14 7:08:28

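Translated Summary

The 20 Questions (Q20) game is a well-known game that encourages deductive reasoning and creativity. The answerer first thinks of an object, such as a famous person or a kind of animal. The questioner then tries to identify that object within 20 questions. In a Q20 system, the user plays the answerer and the system plays the questioner, so the system needs a good question-selection strategy to identify the correct object and win. Deriving the optimal policy is difficult, however, because the game environment is complex and volatile. This paper proposes a policy-based reinforcement learning (RL) method in which the questioner agent learns the question-selection policy through continuous interaction with users. To facilitate training, a reward network is also proposed to estimate a more informative reward. Compared with previous methods, the RL method is robust to noisy answers and does not rely on a knowledge base of objects. Experiments show that it clearly outperforms an entropy-based engineering system and performs competitively in a noise-free simulation environment.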
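To make the idea of learning a question-selection policy from interaction concrete, here is a minimal REINFORCE sketch on a toy game. It is not the paper's architecture. The answer table `ANSWERS`, the two-question `BUDGET`, the tabular softmax policy keyed by answer history, the reward rule, and the running-average baseline (standing in for the learned reward network) are all assumptions made for illustration. Note that the agent never reads `ANSWERS` directly; only the simulated user does.

```python
import math
import random

# Hypothetical simulator data: ANSWERS[q][o] is object o's yes/no answer to question q.
ANSWERS = [
    [1, 1, 0, 0],  # question 0
    [1, 0, 1, 0],  # question 1
    [1, 1, 1, 1],  # question 2 is uninformative
]
N_QUESTIONS, N_OBJECTS, BUDGET = 3, 4, 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def play_episode(theta, rng):
    """Sample one game; return the visited (state, action) pairs and the final reward."""
    target = rng.randrange(N_OBJECTS)
    state, steps = (), []
    for _ in range(BUDGET):
        probs = softmax(theta.setdefault(state, [0.0] * N_QUESTIONS))
        action = rng.choices(range(N_QUESTIONS), weights=probs)[0]
        steps.append((state, action))
        # The state is the history of (question, answer) pairs seen so far.
        state += ((action, ANSWERS[action][target]),)
    # Reward 1 only if the collected answers are consistent with exactly one object.
    consistent = [o for o in range(N_OBJECTS)
                  if all(ANSWERS[q][o] == a for q, a in state)]
    return steps, float(len(consistent) == 1)


def train(episodes=3000, lr=0.5, seed=0):
    """REINFORCE with a running-average baseline on the toy game."""
    rng = random.Random(seed)
    theta, baseline = {}, 0.0
    for _ in range(episodes):
        steps, reward = play_episode(theta, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for state, action in steps:
            probs = softmax(theta[state])
            for q in range(N_QUESTIONS):
                # Gradient of log pi(action | state) with respect to logit q.
                grad = (1.0 if q == action else 0.0) - probs[q]
                theta[state][q] += lr * advantage * grad
    return theta


def success_rate(theta, games=500, seed=1):
    """Fraction of sampled games in which the policy pins down the object."""
    rng = random.Random(seed)
    return sum(play_episode(theta, rng)[1] for _ in range(games)) / games
```

Calling `success_rate(train())` and comparing it with `success_rate({})` (an untrained, uniform policy) shows how much the policy improves from interaction alone. In this toy setup the reward arrives only at the end of the game, which is the kind of sparse signal that motivates estimating a more informative reward.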

Original Content

arXiv:1808.07645v4 Announce Type: replace-cross Abstract: The 20 Questions (Q20) game is a well-known game which encourages deductive reasoning and creativity. In the game, the answerer first thinks of an object such as a famous person or a kind of animal. Then the questioner tries to guess the object by asking 20 questions. In a Q20 game system, the user is considered the answerer while the system itself acts as the questioner, which requires a good strategy of question selection to figure out the correct object and win the game. However, the optimal policy of question selection is hard to derive due to the complexity and volatility of the game environment. In this paper, we propose a novel policy-based Reinforcement Learning (RL) method, which enables the questioner agent to learn the optimal policy of question selection through continuous interactions with users. To facilitate training, we also propose to use a reward network to estimate a more informative reward. Compared to previous methods, our RL method is robust to noisy answers and does not rely on the Knowledge Base of objects. Experimental results show that our RL method clearly outperforms an entropy-based engineering system and has competitive performance in a noise-free simulation environment.
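For contrast, the entropy-based baseline mentioned in the abstract can be sketched as greedy question selection by expected posterior entropy. This is a generic illustration, not the specific engineering system the authors compare against. The toy `ANSWER_TABLE`, the uniform `PRIOR`, and the assumption of noise-free yes/no answers are hypothetical. Unlike the RL approach, this method needs a knowledge base of object answers up front.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_posterior_entropy(prior, answers):
    """Expected entropy over objects after hearing a yes/no answer.

    prior[i] is the belief in object i; answers[i] is True if object i
    would answer "yes" to the question (noise-free assumption).
    """
    total = 0.0
    for outcome in (True, False):
        mass = sum(p for p, a in zip(prior, answers) if a == outcome)
        if mass == 0:
            continue  # this answer can never occur under the current belief
        posterior = [p / mass for p, a in zip(prior, answers) if a == outcome]
        total += mass * entropy(posterior)
    return total


def pick_question(prior, answer_table):
    """Return the index of the question with the lowest expected posterior entropy."""
    scores = [expected_posterior_entropy(prior, row) for row in answer_table]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy knowledge base: 4 objects, 3 questions.
ANSWER_TABLE = [
    [True, False, False, False],  # splits the objects 1 vs 3
    [True, True, False, False],   # splits the objects 2 vs 2
    [True, True, True, True],     # splits nothing
]
PRIOR = [0.25, 0.25, 0.25, 0.25]

best = pick_question(PRIOR, ANSWER_TABLE)
```

With a uniform prior the even 2 vs 2 split leaves one bit of expected uncertainty, the lowest of the three, so `best` is question index 1. Because the posterior update here trusts every answer completely, a single wrong answer would eliminate the true object, which is the sensitivity to noise that the abstract contrasts with the RL method.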