dev_to, 20 April 2026

FinOps at design time: I found $3,840/month in avoidable spend before writing a line of Terraform

finops, aws-cost-optimization, infrastructure-as-code, lambda, dynamodb

FinOps is almost entirely retrospective. AWS Cost Explorer tells you what happened last billing cycle. Trusted Advisor tells you which resources are underutilised right now. Cost anomaly alerts fire after the anomaly has already run for hours.

Every tool in the standard FinOps stack analyses infrastructure that already exists. Which means by the time any of them are useful, the structural decisions that determine 80% of your architecture's lifetime cost have already been made, deployed, and are now expensive to reverse.

I have been an AWS solutions architect for nine years. The pattern is consistent, and I have been complicit in it: design the architecture, write the IaC, deploy, and then discover the cost. The Pricing Calculator gives you a static estimate that assumes steady-state traffic and correct configuration. Neither assumption holds under a real workload.

This post is about a session where I broke that pattern - and caught $3,840 per month in avoidable spend before a single resource was provisioned.

The architecture

Event processing pipeline for a Series B SaaS product. Customer activity events ingested via API, processed asynchronously, stored for downstream analytics. Expected baseline: 1,200 RPS, with a 6× spike on campaign days.

Route 53 → API Gateway → Lambda (ingest) → SQS → Lambda (processor) → DynamoDB

Lambda configured at 512 MB, reserved concurrency 200. DynamoDB in on-demand capacity mode. The AWS Pricing Calculator estimate at steady-state baseline: ~$4,100/month.

Under a Constant simulation at 1,200 RPS, everything looked healthy. Cost settled at $4,230/month - close to the Pricing Calculator number, which felt like a good sign.

Old workflow would have stopped there. Steady state is fine, cost is in range, proceed to deploy.

pinpole's workflow does not stop there.

Finding 1: DynamoDB on-demand at spike load

I ran a Spike pattern at 7,200 RPS - the 6× campaign day load. The AI recommendations panel updated within seconds.
The finding: DynamoDB on-demand at 7,200 RPS ingest, with 1.4× write amplification to a secondary index, was going to produce approximately $2,890/month in DynamoDB write costs alone on campaign days. Provisioned capacity with auto-scaling - minimum 1,500 WCU, maximum 12,000 WCU, target utilisation 70% - would bring that to approximately $740/month.

Finding 2: Lambda memory allocation

The AI recommendation engine flagged that both Lambda functions at 512 MB were likely operating in a region of the memory/cost curve where increasing memory allocation reduces total compute cost despite the higher per-GB-second rate. The reason: execution duration drops non-linearly when CPU increases, because Lambda allocates CPU proportionally to memory.

Finding 3: No distribution layer in front of API Gateway

Under spike load, API Gateway was absorbing the full request volume directly. Adding CloudFront to the canvas and rerunning showed that cacheable responses no longer hit the origin - API Gateway RPS at the ingest layer dropped meaningfully at peak, and the monthly API Gateway cost reduction offset the CloudFront cost.

The result

| | Before | After |
---|---|---
| DynamoDB (campaign day) | $2,890/mo | $740/mo |
| Lambda (both functions) | Baseline | Reduced |
| API Gateway + CloudFront | $X | $X − delta |
| Total identified saving | $3,840/mo | |

All three findings identified before a deployment pipeline was touched. The post-deployment validation on the optimised configuration came in at $30 under the simulation projection.

The broader point

The dollar figure matters less than the mechanism. These are not obscure optimisations. DynamoDB capacity mode, Lambda memory right-sizing, and distribution layer decisions exist in almost every event-driven AWS architecture. They are routinely not caught until the first billing cycle - not because engineers are negligent, but because the tools required to catch them have historically required deployed infrastructure.

That constraint is removable.
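The mechanisms behind Findings 1 and 2 come down to arithmetic that can be run before anything is deployed. The sketch below is a rough model only: it is not pinpole's engine and it does not try to reproduce the dollar figures above. The function names are hypothetical, the unit prices are illustrative assumptions (check current AWS pricing for your region), each write is assumed to be 1 KB or smaller so that one write consumes one write unit, and the provisioned model assumes capacity is held at a single level for the whole window rather than tracking auto-scaling lag.

```python
import math

# Illustrative unit prices. These are assumptions for this sketch, not values
# quoted in the article; check the AWS pricing pages for your region.
ON_DEMAND_PER_MILLION_WRITES = 0.625   # USD per million write request units
PROVISIONED_PER_WCU_HOUR = 0.00065     # USD per provisioned WCU-hour
LAMBDA_PER_GB_SECOND = 0.0000166667    # USD per GB-second of execution


def on_demand_write_cost(rps, amplification, hours):
    """Pay-per-write cost for `hours` of sustained traffic at `rps`."""
    writes = rps * amplification * 3600 * hours
    return writes / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES


def provisioned_write_cost(rps, amplification, hours,
                           target_utilisation=0.70,
                           min_wcu=1_500, max_wcu=12_000):
    """Cost of holding provisioned capacity for `hours`.

    Capacity is sized so the load sits at the target utilisation,
    then clamped to the auto-scaling minimum and maximum.
    """
    needed = math.ceil(rps * amplification / target_utilisation)
    wcu = max(min_wcu, min(max_wcu, needed))
    return wcu * hours * PROVISIONED_PER_WCU_HOUR


def lambda_compute_cost(memory_mb, duration_ms, invocations):
    """Duration charge only; the per-request charge does not depend on memory."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * LAMBDA_PER_GB_SECOND


if __name__ == "__main__":
    peak_rps, amplification = 7_200, 1.4  # figures from Finding 1
    print("on-demand, 1 h at peak:  ", on_demand_write_cost(peak_rps, amplification, 1))
    print("provisioned, 1 h at peak:", provisioned_write_cost(peak_rps, amplification, 1))

    # Hypothetical durations: doubling memory only pays off if the
    # duration falls by more than half.
    print("512 MB @ 120 ms: ", lambda_compute_cost(512, 120, 1_000_000))
    print("1024 MB @ 50 ms: ", lambda_compute_cost(1024, 50, 1_000_000))
```

Keeping the prices and durations as explicit parameters is deliberate: the point is the shape of the comparison at spike load, and the inputs should come from your own measurements or simulation rather than from this sketch.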
The feedback loop that FinOps typically operates in - deploy, observe, optimise, redeploy - now has a step zero.

14-day Pro trial, no credit card. Free tier available at app.pinpole.cloud