dev_to March 14, 2026

The New Bottleneck - When AI Writes Code Faster Than Humans Can Review It


Tags: artificial-intelligence, software-development, ci-cd, code-review, third-party-libraries

The 10x Productivity Paradox

We're living through an incredible shift in software development. AI agents have made it absurdly easy to write code - close tech debt gaps, create multiple POC implementations, kill bugs, dramatically increase testing coverage, and ship new features at lightning speed. The promise of a 10x velocity increase is real, and it's here.

But here's the uncomfortable truth: the bottleneck has shifted. In the traditional software delivery cycle, writing code was often the slowest phase. Now? It's the fastest. The new bottleneck is code review. Humans simply can't review the enormous volume of code that AI agents can generate.

The obvious solution seems simple: stick to the good old standard of "keep pull requests small." Set limits on lines of code or number of files. Problem solved, right?

Not quite. Sure, we can enforce small PRs and technically "solve" the review bottleneck. But let's be honest - we're not solving the problem; we're ignoring innovation and pushing back on evolution.

If we can achieve a 10x increase in code generation, why not strive for a 10x increase in code review capacity? If we keep delivering at the same pace while the world evolves around us, our competitors won't wait - and neither will our customers.

So what's the solution? Well, the obvious suspect is the AI agent itself, right? If an AI agent helped us produce 10x more code, it should help us review it at the same rate and volume.

But then comes that feeling. That awkward, uncomfortable feeling. This code isn't really mine. I didn't write it. I barely reviewed it. How can I have confidence shipping it to production?

Lately, a strange thought has been popping into my mind: What if AI agents are dependencies? Think about it. What if AI-generated code is just another kind of third-party code? I'm positive that most of us don't review third-party code - the dependencies we install and use. If you think about it, most of the code we ship to production isn't ours.
It's a third-party library we use, a dependency library, which also has dependencies, which have dependencies, and so on. You know that tree.

But here's the part we rarely stop to think about: the trust is recursive. The author of that library you trust? They also trust third-party code they didn't review. And the authors of those nested dependencies trust their own dependencies. Every node in that tree is trusting code that someone else wrote, someone else reviewed - or maybe no one reviewed at all. And we ship all of it to production.

So what's the difference between that and AI-generated code? In both cases, someone - or something - we don't know wrote the code, someone we don't know reviewed it, and in the case of open source, someone we don't know even starred it on GitHub, and we base our trust on that.

How do we reconcile our minds with third-party code integrated into our codebase? We actually do a lot more than we realize.

We vet before we adopt. We check how active the project is - when was the last commit? How fast do maintainers respond to issues? We look at community signals - stars, downloads, who else is using it. We check the license. We read the docs. We might even skim the source code of the critical parts.

We test. Unit tests, integration tests, end-to-end tests, manual QA. We write contract tests that verify the dependency behaves the way we expect it to.

We secure our supply chain. We run vulnerability scanners and dependency audits. We pin versions and use lock files so nothing changes underneath us without our knowledge. We monitor for known vulnerabilities in our dependency tree.

We set boundaries. We wrap third-party code behind interfaces and abstractions. We define contracts - what goes in, what comes out. We make sure we can swap a dependency out without rewriting half the codebase.

We monitor in production. We have observability - logs, metrics, alerts. We use staged rollouts and feature flags. We can roll back fast if something breaks.
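The contract tests mentioned above can be sketched in a few lines. This is a minimal, hypothetical illustration in Python - here the standard library's json module stands in for any third-party dependency, and the test names and expectations are my own, not from any real project:

```python
import json

# A contract test pins down the behavior we *rely on* from a dependency,
# so an upgrade (or an AI-generated replacement) that breaks the contract
# fails fast in CI. The stdlib json module is a stand-in for any
# third-party library we ship but didn't write.

def test_round_trip_preserves_data():
    # Contract: serializing then parsing gives back the same structure.
    payload = {"id": 7, "tags": ["a", "b"], "active": True}
    assert json.loads(json.dumps(payload)) == payload

def test_malformed_input_raises():
    # Contract: bad input raises an error instead of silently succeeding.
    try:
        json.loads("{not valid json")
    except json.JSONDecodeError:
        return
    raise AssertionError("expected a parse error for malformed input")

test_round_trip_preserves_data()
test_malformed_input_raises()
```

The same pattern applies unchanged whether the code behind the interface came from a package registry or from an AI agent: the tests encode what we depend on, not how it's implemented.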
What if AI-generated code is just another third-party dependency? What if we need the same trust mechanisms - vetting, testing, boundaries, security scanning, and monitoring?

The longer I think about this idea, the harder it is for me to argue against it. My mind starts drifting to more questions - and the uncomfortable part is that none of them feel unanswerable.

Should there be a way to distinguish AI-generated code from human-written code in the codebase? How do I mark it, track it, set boundaries around it?

Do I need to see and read the code myself, or is it enough to define clear review rules and let AI agents review it for me?

How many AI agent reviewers do I need before I feel confident? Is one enough? Three? If multiple AI agents review the same code and agree, is that more trustworthy than a single human reviewer who skimmed it?

Should we have different AI agents with different review focuses - one for security, one for performance, one for correctness? What's the equivalent of "2 approvals required" in an AI-review world?

I don't have all the answers yet. But the thing that keeps pulling me deeper into this rabbit hole is that these questions feel like they have answers. They feel like problems we can solve - with tooling, with process, with experimentation. And if that's the case, if these concerns can be addressed with real solutions, then are we actually looking at a legitimate shift in how we think about code ownership and review?

Here's what keeps me up at night:

Trust without understanding: We trust libraries with millions of downloads that we've never read. Why is AI-generated code different? Is it because it's generated specifically for our codebase, making us feel more responsible for it?

The illusion of control: When we install a dependency, we accept that we don't control it. When AI generates code in our repository, we feel like we should control it. But should we?
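The "2 approvals required" question above can be made concrete with a toy model. The policy below is entirely hypothetical - no real tool works this way out of the box - but it shows one possible shape: treat each AI reviewer as an independent verdict tagged with a focus area, and merge only when enough approvals exist and every mandatory focus (say, security) has signed off:

```python
from dataclasses import dataclass

# Toy model of an AI-review merge gate (a sketch, not a real tool):
# each agent returns an approve/reject verdict tagged with its focus.

@dataclass
class Verdict:
    reviewer: str
    focus: str          # e.g. "security", "performance", "correctness"
    approved: bool

def may_merge(verdicts, min_approvals=2, required_focuses=("security",)):
    """AI-world analogue of '2 approvals required': enough approvals
    overall, and every mandatory focus area approved at least once."""
    approvals = [v for v in verdicts if v.approved]
    covered = {v.focus for v in approvals}
    return len(approvals) >= min_approvals and all(
        f in covered for f in required_focuses
    )

verdicts = [
    Verdict("agent-a", "security", True),
    Verdict("agent-b", "correctness", True),
    Verdict("agent-c", "performance", False),
]
print(may_merge(verdicts))  # two approvals, including security: True
```

The interesting design question isn't the boolean logic - it's what the thresholds should be, and whether agreement between agents trained on similar data actually counts as independent review.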
And yet, the more I sit with these questions, the more I keep arriving at the same place: ownership doesn't mean authorship.

Here's the reality: we own the code we ship to production, including third-party libraries. When there's a bug, the customer doesn't care whether the faulty code is a third-party library or code written by me. Nor should I care about the distinction when fixing it. I should fix the bug - either through an open-source contribution or in "user-land" - and make sure I catch it next time before the customer does.

The same goes for AI-generated code. I didn't write it, and I might not have reviewed every line, but if I use it, it's mine. Ownership is about responsibility, not authorship.

If we accept this mental model - that AI-generated code is a dependency - it could change how we think about a lot of things. None of this is a recipe. It's a direction worth exploring.

Code review doesn't disappear - it evolves. Just as AI became our multiplier for writing code, it could become our multiplier for reviewing it. We'd still review. But the nature of human review might shift from line-by-line inspection to higher-level concerns, while AI agents handle the detailed implementation review.

Human review could shift toward contract review: Instead of reviewing every line ourselves, we'd focus on the contract. What should this code do? What are the edge cases? What are the performance requirements? AI reviewers could handle the implementation details - style, correctness, edge cases in the code itself.

Trust mechanisms could add layers of confidence: Tests, security scanning, pinned versions, and observability wouldn't replace review - they'd reinforce it. Together with AI-assisted review, they could form a safety net broader than any single human reviewer could provide.

Documentation might matter more than implementation: Understanding what the code does could become more important than understanding how it does it - just like we read API docs for libraries instead of reading their source code.

Monitoring could close the loop: Instead of assuming correctness at merge time, we'd verify it continuously in production. Observability, alerts, and fast rollbacks would become first-class citizens in the delivery process, not afterthoughts.

If you take this idea far enough, it hints at something even more fundamental: we might be moving to a higher layer of abstraction altogether. English becomes the programming language. We write declarative specifications (the "what") instead of imperative code (the "how"). "Create a user authentication system with JWT tokens and rate limiting" becomes the new code. The AI figures out the implementation. This would require new forms of verification - tests that validate behavior rather than implementation, security scanning that runs automatically, monitoring that confirms correctness in production, and pipelines that validate that the "what" was correctly translated into the "how."

Here's another brain teaser: What happens when both code generation and code review grow 10x in volume? Do we have enough product requests to keep us busy? Do we have enough problems to solve? Does the bottleneck shift to product discovery and requirement gathering? But that's a topic for another post.

I don't have all the answers. This is a thought experiment, not a playbook. But if this idea resonates, here are some directions worth exploring:

What does trust infrastructure look like? Tests, security scanning, monitoring - the full chain, not just one link. If these are going to support our quality gate, how comprehensive, fast, and reliable do they need to be?

How should code review processes change? Maybe we need different review levels - contract review vs. implementation review. Maybe AI-generated code gets a different review process than human-written code. What would that look like for your team?

Where do boundaries make sense?
Wrapping AI-generated code behind clear interfaces and abstractions could keep the blast radius small if you need to replace or rewrite it later.

What can we learn from AI reviewing AI? What happens when multiple AI agents review each other's code? What do they catch that humans miss - and what do they miss that humans catch?

What should we measure? How often does AI-generated code cause production issues compared to human-written code? We won't know unless we track it. Let data inform the conversation.

Is the discomfort useful? This feeling of unease might be healthy. It keeps us questioning and improving our processes rather than blindly adopting a new paradigm.

Final Thoughts

The AI era is forcing us to question fundamental assumptions about software development. The idea that AI-generated code might be just another dependency is uncomfortable, controversial, and possibly wrong. But it's worth exploring.

And if we do explore it, we don't have to go all-in immediately. In fact, we probably shouldn't. Some areas of our codebase are critical and dangerous. Payment processing, authentication, security features, data privacy controls - maybe it's fine not to be 10x there. Maybe slower is safer, and safer is better.

We can experiment with AI-generated and AI-reviewed code in less critical areas first. Internal tools, admin dashboards, test utilities, documentation. Build confidence, gather data, and learn what works and what doesn't. Then gradually expand as we develop better trust mechanisms, review processes, and confidence in the approach.

Not every part of your system needs to move at 10x speed. Sometimes the right speed for critical infrastructure is the speed at which humans can thoroughly understand and validate every change.

But if we're going to unlock the true potential of AI in software development, we need to rethink not just how we write code, but how we review it, trust it, and build confidence around it. The bottleneck has shifted. The question is: are we ready for a world where the code we ship isn't the code we wrote?

What do you think? Is AI-generated code a dependency? How are you handling code review in the age of AI? I'd love to hear your thoughts - find me at @sag1v.

Originally published on debuggr.io. I write about software engineering, AI, and the things that keep bugging me about our industry. If this resonated with you, come visit debuggr.io for more.