OpenAI Just Released GPT-5.5. Here's What It Actually Does (and What It Costs You)
GPT-5.4 shipped on March 5. Seven weeks later, on April 23, 2026, OpenAI released GPT-5.5 — and the pace alone tells you something about where this race is headed. This isn't iteration for iteration's sake. GPT-5.5 is a genuinely different model from the ground up, and if you're building on top of OpenAI's stack, the changes matter in ways that go beyond the benchmark table.
Here's everything developers need to know.
The core complaint with every prior GPT-5.x model was the same: impressive on individual tasks, but brittle on anything that required sustained, multi-step reasoning. You'd hand it a complex task, get a decent first pass, and then spend the next hour managing every subsequent step yourself.
GPT-5.5 is designed to handle messy, multi-part tasks where you can trust it to plan, use tools, check its own work, navigate ambiguity, and keep going without babysitting. OpenAI That's the stated goal, and unlike most model launch claims, there's enough third-party benchmark data to take it seriously.
The first thing to understand about GPT-5.5 is architectural. Every GPT model since GPT-5 — versions 5.1 through 5.4 — was built on the same base architecture. GPT-5.5 breaks that pattern entirely. It's a model trained from scratch. LushBinary That's not a minor detail. Fresh base training means the model reasons differently at a fundamental level, particularly in how it maintains context across long, multi-file, multi-step tasks.
GPT-5.5 ships in two variants: the standard model (GPT-5.5 Thinking) and a higher-compute version called GPT-5.5 Pro. The model supports text and image input and has a context window of approximately 920K tokens. Artificial Analysis In Codex specifically, GPT-5.5 can be accessed with a 400,000-token context window across Plus, Pro, Business, Enterprise, Edu, and Go plans. gHacks Tech News
GPT-5.5 matches GPT-5.4's per-token latency in real-world serving while performing at a significantly higher level of intelligence. It also uses fewer tokens to complete the same Codex tasks. OpenAI That last point matters for your cost model, which we'll get to.
On the research side, OpenAI has a concrete example worth noting. An internal version of GPT-5.5 with a custom harness helped discover a new proof about Ramsey numbers in combinatorics, later verified in Lean — a concrete case of GPT-5.5 contributing not just code or explanation, but a mathematically novel argument in a core research area. OpenAI
Agentic coding in Codex is the headline use case. The model is designed to handle engineering work such as implementation, refactoring, debugging, testing, and validation as a continuous loop. Developer Tech News
Real-world signals from early testers are notably specific. Dan Shipper, CEO of Every, said GPT-5.5 reproduced the type of system rewrite one of his engineers had eventually chosen for a post-launch issue, while GPT-5.4 could not. Pietro Schirano, CEO of MagicPath, said the model merged a branch with hundreds of frontend and refactor changes into a main codebase that had also diverged, resolving the work in about 20 minutes. Cursor co-founder Michael Truell noted GPT-5.5 stayed on task longer and showed more reliable tool use than GPT-5.4. Developer Tech News
Computer use is meaningfully better. On OSWorld-Verified, which assesses a model's ability to operate in real-world computer environments autonomously, GPT-5.5 achieves 78.7%, up from GPT-5.4's 75.0%. gHacks Tech News
Knowledge work across 44 occupations is tracked via GDPval. GPT-5.5 scores 84.9% on GDPval and 98.0% on Tau2-bench Telecom, which tests complex customer-service workflows, without prompt tuning. OpenAI
OpenAI also shared internal use cases: the Finance team used Codex to review 24,771 K-1 tax forms across 71,637 pages, helping accelerate the task by two weeks compared to the prior year. A Go-to-Market employee automated weekly business reporting, saving 5–10 hours per week. OpenAI
Codex + browser expansion is also new. With GPT-5.5, Codex can interact with web apps, test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the task — expanding well beyond the terminal. 9to5Mac
OpenAI moved away from SWE-bench Verified as a primary eval, citing plateau concerns. The benchmarks now favored are more demanding and more representative of real work.
On Terminal-Bench 2.0, GPT-5.5 achieves 82.7%, up from GPT-5.4's 75.1%. Claude Opus 4.7 sits at 69.4%. gHacks Tech News Terminal-Bench tests real command-line workflows: multi-step shell scripting, package management, build configuration, container orchestration. A single wrong flag breaks the chain. This is the benchmark where GPT-5.5's lead is most decisive.
On SWE-Bench Pro, GPT-5.5 scores 58.6%. Claude Opus 4.7 scores higher at 64.3%. gHacks Tech News That's an honest trade-off OpenAI included in its own launch materials, a rare admission that suggests confidence in the rest of the benchmark story.
On CyberGym, GPT-5.5 scores 81.8%, versus GPT-5.4's 79.0% and Claude Opus 4.7's 73.1%. gHacks Tech News
On FrontierMath Tier 1–3, GPT-5.5 scores 51.7%, up from GPT-5.4's 47.6%. Skypage
One important caveat from third-party testing: in many benchmarks, GPT-5.4 Pro still outperforms the default GPT-5.5. The New Stack The Pro tier of the older model remains competitive unless you're specifically targeting the areas where the new architecture shines.
Two things make this release significant beyond the spec sheet.
First, the architecture break. Every GPT-5.x model up to 5.4 was a refinement of the same base. GPT-5.5 is not. GPT-5.5 (codenamed "Spud") is the first fully retrained base model since GPT-4.5. LushBinary That changes what's possible downstream. The previous models delivered steady improvements to Codex, but each was constrained by the original GPT-5 architecture. GPT-5.5 doesn't have that ceiling.
Second, the super app strategy. Greg Brockman said GPT-5.5 is another step toward a "super app" — a unified service combining ChatGPT, Codex, and an AI browser — that Brockman and Sam Altman envision as the primary interface for enterprise work. TechCrunch GPT-5.5 is both a model release and an infrastructure move. The cadence — GPT-5.4 on March 5, GPT-5.5 on April 23 — is deliberate. OpenAI is trying to establish category lock-in before enterprise procurement cycles close.
The NVIDIA integration is also notable. GPT-5.5 was co-designed, trained, and served on NVIDIA GB200 and GB300 NVL72 systems. Codex analyzed weeks of production traffic data and wrote custom heuristic algorithms for load balancing and partitioning, resulting in more than 20% faster token generation speeds. Developer Tech News The model helped optimize its own serving stack. That feedback loop between the model and the infrastructure it runs on is new.
This is where the release gets complicated for independent developers and smaller teams.
GPT-5.5 API pricing: $5.00 per million input tokens, $30.00 per million output tokens. Apidog That's double GPT-5.4's input price of $2.50. GPT-5 launched in August 2025 at $0.63 per million input tokens. GPT-5.4 increased that to $2.50 in March 2026. GPT-5.5 doubles it again to $5.00 — nearly an 8x increase in under a year. Skypage
GPT-5.5 Pro pricing: $30 per million input tokens and $180 per million output tokens, with Priority processing at 2.5 times the standard rate. EdTech Innovation Hub
OpenAI's defense of this is token efficiency — the model reaches the same output with fewer tokens, so your actual bill may not double even if the rate does. At 10 million output tokens per month, GPT-5.5 standard comes to $300 versus Claude Opus 4.7's $250. If GPT-5.5's agentic performance means 25% fewer task iterations, you break even. Build Fast with AI The math works — if the efficiency gains hold for your specific workload. Benchmark your actual tasks before assuming the sticker price reflects your real cost.
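The break-even claim can be sanity-checked with back-of-envelope arithmetic. A minimal sketch using only the figures quoted above (GPT-5.5 at $30/M output tokens; Claude Opus 4.7's $25/M is implied by the $250-per-10M comparison):

```python
# Back-of-envelope output-cost comparison using the figures quoted in the text.
# Rates are USD per million output tokens.
GPT55_OUT = 30.00
OPUS_OUT = 25.00   # implied by the $250-per-10M-tokens figure above

def monthly_cost(rate_per_m: float, tokens_m: float) -> float:
    """Output-token cost for a month, in dollars."""
    return rate_per_m * tokens_m

baseline = monthly_cost(GPT55_OUT, 10)    # $300 at 10M output tokens
opus = monthly_cost(OPUS_OUT, 10)         # $250 at 10M output tokens

# Fraction of output tokens GPT-5.5 must save to match Opus on raw cost:
break_even_savings = 1 - opus / baseline  # ~0.167, i.e. roughly 17% fewer tokens

# With the 25% efficiency figure cited above, GPT-5.5 comes in under Opus:
effective = monthly_cost(GPT55_OUT, 10 * 0.75)   # $225
```

On these numbers, roughly 17% fewer output tokens is the actual break-even point, so the 25% figure cited above would leave GPT-5.5 cheaper, not merely even. Input costs and your real token mix will shift this, which is why benchmarking your own workload matters.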
One concrete optimization to implement immediately: cached input tokens on GPT-5.5 drop to $0.50 per million — a tenth of the standard rate. Cache system prompts, tool schemas, and repo context on anything reused across requests. Skypage
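How much that cached rate saves depends entirely on your hit rate. A quick sketch of the blended input cost, using the rates above (the 80% hit rate is an illustrative assumption, not a quoted figure):

```python
# Blended input cost per million tokens as a function of cache hit rate.
# $5.00/M standard input, $0.50/M cached input (rates quoted above).
STANDARD_IN = 5.00
CACHED_IN = 0.50

def blended_input_rate(cache_hit_rate: float) -> float:
    """Effective $/M input when `cache_hit_rate` of input tokens are cache hits."""
    return cache_hit_rate * CACHED_IN + (1 - cache_hit_rate) * STANDARD_IN

# An agent loop that resends a large system prompt + tool schemas every turn
# can plausibly see 80%+ of its input tokens served from cache:
rate_at_80 = blended_input_rate(0.80)   # $1.40/M, a 72% cut vs. the $5.00/M sticker rate
```

Since prompt caching generally matches on a stable prefix, keep the static content (system prompt, tool schemas, repo context) at the front of the request and the variable content at the end.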
GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in ChatGPT. As of April 24, 2026, both GPT-5.5 and GPT-5.5 Pro are available in the API. OpenAI
For API access, the model IDs are gpt-5.5 for standard and gpt-5.5-pro for the Pro tier. Both are available through the Chat Completions and Responses APIs.
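A minimal request sketch using those model IDs. The endpoint and payload shape follow the Responses API as commonly documented; treat the exact parameter names as assumptions to verify against the current API reference:

```python
import json
import os
import urllib.request

# Request body for the Responses API, using the model ID named in the text.
body = {
    "model": "gpt-5.5",   # or "gpt-5.5-pro" for the Pro tier
    "input": "Summarize the open TODOs in this repo and propose a plan.",
}

def call_responses_api(payload: dict) -> bytes:
    """POST the payload to the Responses API endpoint (assumed URL/shape)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/responses",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Swapping the `model` field is the only change needed to move a request between the standard and Pro tiers.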
On the safety side, OpenAI has classified GPT-5.5's cybersecurity and biological capabilities as High under its Preparedness Framework, though below the Critical threshold. The company is also running a Trusted Access for Cyber program through Codex, allowing verified users expanded access to advanced security capabilities. EdTech Innovation Hub
Quick cost controls worth building in on day one: route premium, long-horizon tasks to GPT-5.5 and standard queries to GPT-5.4 or GPT-5.4-mini. The per-token price jump makes tiered routing a budget necessity, not an optimization.
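One way to make that routing concrete: a tiny dispatcher that picks a model ID from a rough task classification. The thresholds are illustrative, and the lower-tier IDs (`gpt-5.4`, `gpt-5.4-mini`) are assumed from the plan names above, so check them against the API before relying on them:

```python
# Tiered model routing: send long-horizon agentic work to GPT-5.5,
# everything else to a cheaper tier. Thresholds are illustrative and the
# lower-tier model IDs should be verified against the live API.
def pick_model(estimated_steps: int, needs_tools: bool) -> str:
    if estimated_steps >= 5 or needs_tools:
        return "gpt-5.5"       # premium, long-horizon tasks
    if estimated_steps > 1:
        return "gpt-5.4"       # routine multi-step queries
    return "gpt-5.4-mini"      # simple one-shot queries

model = pick_model(estimated_steps=12, needs_tools=True)
```

Even a crude classifier like this caps exposure to the new per-token rates, and the routing function gives you one place to adjust tiers as pricing moves.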
The real story here isn't a single model release — it's the seven-week cadence that produced it. OpenAI is shipping at a pace that forces enterprise decisions before anyone has time to fully evaluate. Whether that serves developers or just locks them in faster is a question the next few months will answer.
Follow for more coverage on MCP, agentic AI, and AI infrastructure.