arxiv_cs_ai April 20, 2026

AISysRev -- LLM-based Tool for Title-abstract Screening

systematic-reviews, large-language-models, screening, arxiv-tools, ai-assisted-research

Original Content

arXiv:2510.06708v3 Announce Type: replace-cross

Abstract: Conducting systematic reviews is laborious. In the screening or study selection phase, the number of papers can be overwhelming. Recent research has demonstrated that large language models (LLMs) can perform title-abstract screening and support humans in the task. To this end, we developed AISysRev, an LLM-based screening tool implemented as a containerized web application. The tool accepts CSV files containing paper titles and abstracts. Users specify inclusion and exclusion criteria. Multiple LLMs can be used, such as Gemini, Claude, Mistral, or ChatGPT via OpenRouter. We also support locally hosted models and any model compatible with the OpenAI SDK. AISysRev implements both zero-shot and few-shot prompting, and also allows for manual screening through interfaces that display LLM results as guidance for human reviewers. LLM calls are parallelized, so screening speed is typically between 100 and 300 papers per minute, depending on the model and the host. To demonstrate the tool's use in practice, we conducted a qualitative trial study with 137 papers. Our findings indicate that papers can be classified into four categories: Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes. The Boundary cases, where LLMs are prone to errors, highlight the need for human intervention. While LLMs do not replace human judgment in systematic reviews, they can reduce the burden of assessing large volumes of scientific literature.

Video: https://www.youtube.com/watch?v=HeblemlgnAQ
Tool: https://github.com/EvoTestOps/AISysRev
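
The workflow the abstract describes (CSV input, inclusion and exclusion criteria, zero-shot or few-shot prompts, parallel LLM calls) can be sketched roughly as below. This is an illustrative Python sketch, not AISysRev's actual code. The column names `title` and `abstract`, the example criteria, the prompt wording, and the function names are all assumptions. `fake_llm` is a keyword rule standing in for a real model call so that the example runs offline.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical criteria; in the real tool these are supplied by the user.
INCLUSION = ["Evaluates LLMs for systematic review screening"]
EXCLUSION = ["Not written in English"]


def build_prompt(title, abstract, examples=()):
    """Build a zero-shot prompt, or a few-shot one when `examples` is given.

    `examples` is a sequence of (title, abstract, label) tuples.
    """
    lines = [
        "Decide whether the paper should be included in the review.",
        "Inclusion criteria:",
    ]
    lines += [f"- {c}" for c in INCLUSION]
    lines.append("Exclusion criteria:")
    lines += [f"- {c}" for c in EXCLUSION]
    for ex_title, ex_abstract, ex_label in examples:
        lines.append(f"Example: {ex_title} | {ex_abstract} -> {ex_label}")
    lines.append(f"Paper: {title} | {abstract}")
    lines.append("Answer with INCLUDE or EXCLUDE.")
    return "\n".join(lines)


def fake_llm(prompt):
    """Stand-in for a real model call: a keyword rule, not an LLM."""
    paper_line = next(
        line for line in prompt.splitlines() if line.startswith("Paper: ")
    )
    return "INCLUDE" if "screening" in paper_line.lower() else "EXCLUDE"


def screen(csv_text, llm=fake_llm, examples=(), max_workers=8):
    """Screen every CSV row, issuing the model calls in parallel."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    prompts = [build_prompt(r["title"], r["abstract"], examples) for r in rows]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        decisions = list(pool.map(llm, prompts))
    return [(r["title"], d) for r, d in zip(rows, decisions)]


SAMPLE_CSV = """title,abstract
LLM screening for reviews,We study title-abstract screening with LLMs.
Compiler optimizations,We speed up loop unrolling.
"""

if __name__ == "__main__":
    for title, decision in screen(SAMPLE_CSV):
        print(f"{decision}\t{title}")
```

Swapping `fake_llm` for a function that calls a hosted or local model would leave the rest of the sketch unchanged. A thread pool suits this kind of workload because each call spends most of its time waiting on the network. In the spirit of the abstract's conclusion, the returned decisions would serve as guidance for a human reviewer, particularly on boundary cases, rather than as final verdicts.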