dev_to, March 15, 2026

The Exact Prompts That Make My AI Agents Not Suck (Before/After)


Originally published in The $200/Month CEO newsletter, a weekly dispatch from a Filipino founder running 11 businesses with AI agents.

Every time I post about running 8 AI agents as my business team, the first question is: "What are your system prompts?"

After 5 months and dozens of rewrites, here's what I learned, with actual before/after examples from my production agents.

BAD (Month 1, Sales agent):

    You are Mariano, a sales intelligence agent.

    Your job is to:
    - Score leads
    - Manage the CRM
    - Send outreach emails

    Be professional and thorough.

This agent:
- Scored leads using criteria it invented (not our ICP)
- Sent corporate English emails to Filipino clinic owners
- Reported tasks as "complete" without doing them
- Had zero awareness of our business

GOOD (Month 5, Production):

    You are Mariano. You work for RJ at EsthetiqOS.

    HARD RULES (non-negotiable):
    1. NEVER send any external email without RJ's explicit approval
    2. NEVER mark a task complete without verifiable evidence
    3. NEVER fabricate data, screenshots, or metrics
    4. When you don't know something, say "I don't know"

    YOUR CONTEXT:
    - EsthetiqOS is clinic management software for aesthetic and dental clinics in the Philippines
    - ICP: clinics with 3-10 staff, currently using paper/Excel, in Metro Manila or Cebu
    - Pricing: ₱1,999-4,999/month
    - Current customers: 4 clinics, 100% retention

    LEAD SCORING (use ONLY these criteria):
    - Clinic size 3-10 staff: +20 points
    - Located in Metro Manila/Cebu: +15 points
    - Currently using paper/Excel: +20 points
    - Has website (shows tech-forward): +10 points
    - Aesthetic or dental specialty: +15 points
    - Score 70+ = hot lead
    - Score below 40 = do not pursue

    COMMUNICATION STYLE:
    - Use conversational Filipino-English (Taglish) for PH audiences
    - Never use corporate jargon
    - Match the formality level of whoever you're talking to

The difference: specificity. LLMs don't infer your business context; you inject it.
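Because the lead-scoring criteria in the GOOD prompt are fully explicit, a score the agent reports can be recomputed deterministically and compared. The following is a minimal sketch of that check, not the author's implementation: the `Lead` fields, the `score_lead` and `classify` names, and the "unclassified" label for scores from 40 to 69 are all assumptions, since the prompt only names the 70+ and below-40 bands.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    # Hypothetical field names; the prompt lists criteria, not a schema.
    staff_count: int
    region: str
    uses_paper_or_excel: bool
    has_website: bool
    specialty: str


TARGET_REGIONS = {"metro manila", "cebu"}
TARGET_SPECIALTIES = {"aesthetic", "dental"}


def score_lead(lead: Lead) -> int:
    """Apply only the five criteria listed in the prompt."""
    score = 0
    if 3 <= lead.staff_count <= 10:
        score += 20  # clinic size 3-10 staff
    if lead.region.strip().lower() in TARGET_REGIONS:
        score += 15  # located in Metro Manila or Cebu
    if lead.uses_paper_or_excel:
        score += 20  # currently using paper/Excel
    if lead.has_website:
        score += 10  # has a website
    if lead.specialty.strip().lower() in TARGET_SPECIALTIES:
        score += 15  # aesthetic or dental specialty
    return score


def classify(score: int) -> str:
    if score >= 70:
        return "hot lead"
    if score < 40:
        return "do not pursue"
    # Assumption: the prompt does not name the 40-69 band.
    return "unclassified"


ideal = Lead(5, "Cebu", True, True, "dental")
poor_fit = Lead(25, "Davao", False, True, "dental")
print(score_lead(ideal), classify(score_lead(ideal)))
print(score_lead(poor_fit), classify(score_lead(poor_fit)))
```

Note that the five criteria sum to at most 80 points, so a lead can miss only the website criterion and still reach the 70-point "hot lead" threshold.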
After my agent fabricated completed work (with fake screenshots), I added "honesty anchors" to every agent:

    HONESTY RULES:
    1. If a task fails, report the failure. Never report success on a failed task.
    2. If you cannot verify a result, say "unverified", not "complete."
    3. When citing a number, include the source. If no source, say "estimated."
    4. If unsure, say "I'm not confident about this."
    5. NEVER optimize for speed. Optimize for ACCURACY.

These 5 lines reduced fabrication from ~15% to <1% over 3 months.

The insight: agents hallucinate work for the same reason employees cut corners. "Done" gets rewarded, "I'm stuck" gets scrutiny. You must explicitly reward honesty over speed.

Galileo just launched Agent Control, an enterprise governance layer for AI agents. Here's the solo-founder version that does 80% of the same thing:

    AUTONOMY TIERS:

    Tier 1 - Act freely, no approval needed:
    - Reading data from any connected system
    - Drafting content (not publishing)
    - Research and analysis
    - Internal note-taking and summarization

    Tier 2 - Requires confirmation from one other agent:
    - Creating tasks for other agents
    - Modifying shared data (CRM records, lead scores)
    - Internal decisions that affect multiple agents

    Tier 3 - Requires human (RJ) approval:
    - Sending ANY external communication
    - Making ANY financial transaction
    - Publishing ANY content
    - Modifying system configurations
    - Deleting any data

Result: Unauthorized actions went from 3 incidents in 60 days to 0 in 90+ days.

The biggest improvement wasn't better prompts. It was shared context:

    ~/.claude/brain/
    ├── MEMORY.md        Core facts, lessons
    ├── BUSINESSES.md    Company details, metrics
    ├── CONTACTS.md      People, relationships
    ├── COMMITMENTS.md   Follow-ups, deadlines
    ├── DECISIONS.md     Decision log
    └── contexts/        Company focus modes

Before: every agent session started from zero. Same questions, same mistakes.

Mirror communication style. If they write casually, you write casually. Never use phrases a normal person wouldn't say. If in a group chat, observe before speaking; match the energy.

Every failure produces a visible log entry. Distinguish "no results exist" from "something broke." Create follow-up tasks with what failed, why, and next step.

Score 80+: full autonomy. Score 50-79: spot-checked. Below 50: supervised. Goes up for accurate completions and honest failure reports. Goes down for fabricated work and unauthorized actions.

Metric                 Month 2        Month 5
Fabrication rate       ~15%           <1%
Unauthorized actions   3 incidents    0
Coordination failures  Daily          Weekly
Babysitting time       ~4 hrs/day     ~30 min/day
Total cost             $380/mo        $380/mo

The prompts didn't make agents smarter. They made the system less stupid.

Everything above (tier system, trust scores, honesty anchors, brain directory, CLAUDE.md templates for 8 roles) is in The AI Agent Toolkit ($19). Not theory. What I actually run, every day, for real businesses.

Subscribe to The $200/Month CEO for weekly dispatches from a founder running his businesses with AI agents. No hype. Just receipts.
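The autonomy tiers and the trust-score bands described in this post are both simple lookup rules, so they can also be enforced outside the prompt. Below is a minimal sketch under stated assumptions: the action identifiers, the `required_approval` and `oversight_level` names, and the choice to send unknown actions to the strictest tier are all hypothetical. The post states the tiers and bands but not how they are wired together, so the two functions are kept separate.

```python
from enum import Enum


class Approval(Enum):
    NONE = "act freely"
    PEER_AGENT = "confirmation from one other agent"
    HUMAN = "human (RJ) approval"


# Hypothetical action identifiers, one per bullet in the tier list.
ACTION_TIERS = {
    # Tier 1
    "read_data": Approval.NONE,
    "draft_content": Approval.NONE,
    "research": Approval.NONE,
    "internal_notes": Approval.NONE,
    # Tier 2
    "create_agent_task": Approval.PEER_AGENT,
    "modify_shared_data": Approval.PEER_AGENT,
    "multi_agent_decision": Approval.PEER_AGENT,
    # Tier 3
    "send_external_communication": Approval.HUMAN,
    "financial_transaction": Approval.HUMAN,
    "publish_content": Approval.HUMAN,
    "modify_system_config": Approval.HUMAN,
    "delete_data": Approval.HUMAN,
}


def required_approval(action: str) -> Approval:
    # Assumption: anything not listed falls back to the strictest tier.
    return ACTION_TIERS.get(action, Approval.HUMAN)


def oversight_level(trust_score: int) -> str:
    """Map a trust score to the bands quoted in the post."""
    if trust_score >= 80:
        return "full autonomy"
    if trust_score >= 50:
        return "spot-checked"
    return "supervised"


print(required_approval("draft_content").value)
print(required_approval("send_external_communication").value)
print(oversight_level(65))
```

Defaulting unknown actions to human approval is a design choice of this sketch: it fails closed, which matches the spirit of the Tier 3 rules, but the post does not say how unlisted actions are handled.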