arxiv_cs_ai April 24, 2026


mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

Translated: 2026/4/24 20:26:15

semEval, code-detection, llm-finetuning, machine-generated-code, arxiv


Original Content

arXiv:2604.21365v1 Announce Type: cross Abstract: Multi-domain detection of machine-generated code snippets across various programming languages is a challenging task. SemEval-2026 Task 13 approaches this challenge from several angles, treating it both as a binary detection problem and as a source-attribution problem. Specifically, its subtasks also cover detection of the generator's LLM family, hybrid code co-generated by humans and machines, and adversarially modified code that hides its origin. Our submitted systems adapted the existing mdok approach (originally focused on machine-generated text detection) to these specific kinds of problems by exploring various base models better suited to code understanding. The results indicate that the submitted systems are competitive in all three subtasks. However, the margins to the top-performing systems are significant, so further improvements are possible.
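The abstract frames the task both as binary detection and as source attribution. One common way to serve all of these framings with a fine-tuned model is sequence classification, where only the classification head's label set changes between subtasks. The sketch below illustrates just that final step, under that assumption; it is not the mdok implementation. The label lists (`BINARY_LABELS`, `FAMILY_LABELS`, `HYBRID_LABELS`), the helper names, and the logits are hypothetical placeholders, and the fine-tuned base model itself is omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label inventories: the abstract names the problem types
# (binary detection, generator-family attribution, hybrid/adversarial code)
# but not the exact class lists, so these names are placeholders.
BINARY_LABELS = ["human", "machine"]
FAMILY_LABELS = ["human", "family_a", "family_b", "family_c"]
HYBRID_LABELS = ["human", "machine", "hybrid", "adversarial"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Map a classification head's logits to (label, probability).

    The head must have exactly one output per label; in this framing that
    output size is the only thing that differs between subtasks.
    """
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Made-up logits standing in for a fine-tuned model's output on one snippet.
    print(predict([0.2, 1.5], BINARY_LABELS))
    print(predict([0.1, 0.3, 2.0, -1.0], HYBRID_LABELS))
```

Keeping the decoding step independent of the label list means the same backbone and training loop can, in principle, be reused across subtasks by resizing the output head, which is consistent with the abstract's emphasis on swapping base models rather than redesigning the pipeline.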