dev.to · March 21, 2026

Building a High-Performance Cache Layer in Go

Tags: go, cache-layer, redis, performance, system-design

Your service is slow. You add Redis. It gets faster. Then Redis becomes the bottleneck -- every request still makes a network round-trip, serialization costs add up, and under load you start seeing latency spikes from connection pool contention. Sound familiar?

In this article, we'll build a two-tier cache layer in Go that combines a local in-memory cache with Redis, prevent cache stampedes using singleflight, and discuss the production considerations that separate a toy cache from a battle-tested one.

Redis is excellent. But it's still a network hop away. For a typical service:

| Operation | Latency |
|---|---|
| Local memory read | ~50ns |
| Redis GET (same AZ) | ~0.5-1ms |
| PostgreSQL query | ~2-10ms |

That's a 10,000x difference between local memory and Redis. For hot keys that get read thousands of times per second, this matters.

A local cache also gives you:

- **Zero network overhead** -- no serialization, no TCP, no connection pools
- **Resilience** -- your service still responds if Redis goes down briefly
- **Reduced Redis load** -- fewer commands mean lower Redis CPU and network usage

The tradeoff? Local caches are per-instance and can serve stale data. We'll address both.

Let's start with a simple but effective local cache. We'll use sync.Map for concurrent access and a background goroutine for TTL eviction.

```go
package cache

import (
	"sync"
	"time"
)

type entry struct {
	value     any
	expiresAt time.Time
}

type LocalCache struct {
	data    sync.Map
	maxSize int
	size    int64
	mu      sync.Mutex // guards size
}

func NewLocalCache(maxSize int, evictInterval time.Duration) *LocalCache {
	c := &LocalCache{maxSize: maxSize}
	go c.evictLoop(evictInterval)
	return c
}

// Get returns the cached value, lazily deleting it if its TTL has passed.
func (c *LocalCache) Get(key string) (any, bool) {
	raw, ok := c.data.Load(key)
	if !ok {
		return nil, false
	}
	e := raw.(*entry)
	if time.Now().After(e.expiresAt) {
		c.data.Delete(key)
		c.decrSize()
		return nil, false
	}
	return e.value, true
}

// Set inserts or overwrites a key. LoadOrStore reports whether the key
// already existed, so the size counter only grows for genuinely new keys.
func (c *LocalCache) Set(key string, value any, ttl time.Duration) {
	_, loaded := c.data.LoadOrStore(key, &entry{
		value:     value,
		expiresAt: time.Now().Add(ttl),
	})
	if !loaded {
		c.incrSize()
	} else {
		c.data.Store(key, &entry{
			value:     value,
			expiresAt: time.Now().Add(ttl),
		})
	}
}

func (c *LocalCache) Delete(key string) {
	if _, loaded := c.data.LoadAndDelete(key); loaded {
		c.decrSize()
	}
}

// evictLoop periodically sweeps the map and removes expired entries,
// complementing the lazy deletion in Get.
func (c *LocalCache) evictLoop(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		now := time.Now()
		c.data.Range(func(key, value any) bool {
			if now.After(value.(*entry).expiresAt) {
				c.data.Delete(key)
				c.decrSize()
			}
			return true
		})
	}
}

func (c *LocalCache) incrSize() { c.mu.Lock(); c.size++; c.mu.Unlock() }
func (c *LocalCache) decrSize() { c.mu.Lock(); c.size--; c.mu.Unlock() }
```

This gives us O(1) reads and writes with lazy + periodic expiration. sync.Map is optimized for the read-heavy, write-light pattern that caches typically exhibit.

Why not a regular map with sync.RWMutex? For read-dominated workloads with many goroutines, sync.Map avoids lock contention on the read path entirely. Under write-heavy loads, a sharded map with RWMutex can outperform it -- but caches are almost always read-heavy.
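For readers who haven't seen one, here's what that sharded alternative might look like. This is a minimal sketch, not code from the post: the shard count of 16, the FNV-1a hash, and the name ShardedMap are all illustrative choices.

```go
package cache

import (
	"hash/fnv"
	"sync"
)

const shardCount = 16 // arbitrary; tune for your core count and write contention

type shard struct {
	mu sync.RWMutex
	m  map[string]any
}

// ShardedMap splits the key space across independently locked maps,
// so writers contend only within one shard rather than on a single lock.
type ShardedMap struct {
	shards [shardCount]*shard
}

func NewShardedMap() *ShardedMap {
	sm := &ShardedMap{}
	for i := range sm.shards {
		sm.shards[i] = &shard{m: make(map[string]any)}
	}
	return sm
}

func (sm *ShardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return sm.shards[h.Sum32()%shardCount]
}

func (sm *ShardedMap) Get(key string) (any, bool) {
	s := sm.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (sm *ShardedMap) Set(key string, value any) {
	s := sm.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}
```

If your write rate is high enough to matter, benchmark both against your real read/write mix before committing to either.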
Now let's compose the local cache with Redis into a two-tier system. The lookup flow:

1. Check local cache -> hit? Return immediately.
2. Check Redis -> hit? Backfill local cache, return.
3. Call the loader (DB, API, etc.) -> populate both caches, return.

```go
package cache

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type TieredCache struct {
	local    *LocalCache
	redis    *redis.Client
	localTTL time.Duration
	redisTTL time.Duration
}

func NewTieredCache(rc *redis.Client, localTTL, redisTTL time.Duration) *TieredCache {
	return &TieredCache{
		local:    NewLocalCache(10000, 30*time.Second),
		redis:    rc,
		localTTL: localTTL,
		redisTTL: redisTTL,
	}
}

func (tc *TieredCache) Get(ctx context.Context, key string) ([]byte, bool) {
	// Tier 1: local memory
	if val, ok := tc.local.Get(key); ok {
		return val.([]byte), true
	}
	// Tier 2: Redis
	val, err := tc.redis.Get(ctx, key).Bytes()
	if err == nil {
		tc.local.Set(key, val, tc.localTTL) // backfill L1
		return val, true
	}
	return nil, false
}

func (tc *TieredCache) Set(ctx context.Context, key string, value []byte) error {
	tc.local.Set(key, value, tc.localTTL)
	return tc.redis.Set(ctx, key, value, tc.redisTTL).Err()
}

// GetOrLoad implements the full cache-aside pattern.
func (tc *TieredCache) GetOrLoad(
	ctx context.Context,
	key string,
	loader func(ctx context.Context) ([]byte, error),
) ([]byte, error) {
	if val, ok := tc.Get(ctx, key); ok {
		return val, nil
	}
	val, err := loader(ctx)
	if err != nil {
		return nil, fmt.Errorf("loader for key %s: %w", key, err)
	}
	_ = tc.Set(ctx, key, val) // best-effort cache write
	return val, nil
}
```

Usage is clean:

```go
data, err := cache.GetOrLoad(ctx, "user:1234", func(ctx context.Context) ([]byte, error) {
	u, err := db.GetUser(ctx, 1234)
	if err != nil {
		return nil, err
	}
	return json.Marshal(u)
})
```

Important: keep the local TTL shorter than the Redis TTL. A good starting point is local 10-30s, Redis 5-15 minutes. This bounds cross-instance staleness while still absorbing the vast majority of reads locally.

There's a critical problem with GetOrLoad. When a popular key expires, hundreds of goroutines simultaneously discover the miss and all call the loader. This is a cache stampede -- it can flatten your database.

Go's golang.org/x/sync/singleflight deduplicates concurrent calls for the same key so only one goroutine does the actual work:

```go
import "golang.org/x/sync/singleflight"

type TieredCache struct {
	local    *LocalCache
	redis    *redis.Client
	localTTL time.Duration
	redisTTL time.Duration
	sf       singleflight.Group // deduplicates concurrent loads per key
}

func (tc *TieredCache) GetOrLoad(
	ctx context.Context,
	key string,
	loader func(ctx context.Context) ([]byte, error),
) ([]byte, error) {
	if val, ok := tc.Get(ctx, key); ok {
		return val, nil
	}
	// Only one goroutine executes per key; others wait and share the result.
	result, err, shared := tc.sf.Do(key, func() (any, error) {
		// Double-check: another goroutine may have filled the cache
		// while we waited for the singleflight slot.
		if val, ok := tc.Get(ctx, key); ok {
			return val, nil
		}
		val, err := loader(ctx)
		if err != nil {
			return nil, err
		}
		_ = tc.Set(ctx, key, val)
		return val, nil
	})
	if err != nil {
		return nil, err
	}
	_ = shared // useful for metrics: high share rate = stampede prevention working
	return result.([]byte), nil
}
```

The double-check inside Do matters. Between the initial miss and acquiring the singleflight slot, another goroutine may have already populated the cache. Without this, you'd still make one redundant database call per stampede event.

Test setup: 8-core machine, 100 concurrent goroutines, 10K unique keys with Zipfian distribution (some keys much hotter than others, like real traffic).
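The benchmark below calls a zipfKey() helper the post never shows. A minimal sketch using math/rand's Zipf generator might look like this; the exponent 1.1, the seed, and the key format are assumptions:

```go
package cache

import (
	"fmt"
	"math/rand"
	"sync"
)

var (
	zipfMu sync.Mutex
	// s=1.1 skews traffic heavily toward low-numbered keys; imax=9999
	// yields the 10K unique keys from the test setup.
	zipf = rand.NewZipf(rand.New(rand.NewSource(42)), 1.1, 1, 9999)
)

// zipfKey returns "key:N" with N drawn from a Zipfian distribution, so a
// few keys are far hotter than the rest. rand.Zipf is not safe for
// concurrent use, so a mutex guards it across RunParallel goroutines.
func zipfKey() string {
	zipfMu.Lock()
	n := zipf.Uint64()
	zipfMu.Unlock()
	return fmt.Sprintf("key:%d", n)
}
```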
```go
func BenchmarkCacheTiers(b *testing.B) {
	b.Run("redis-only", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				rdb.Get(ctx, zipfKey())
			}
		})
	})
	b.Run("local-only", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				local.Get(zipfKey())
			}
		})
	})
	b.Run("tiered", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				tiered.Get(ctx, zipfKey())
			}
		})
	})
}
```

Results:

| Approach | ops/sec | p50 | p99 |
|---|---|---|---|
| Redis only | 85,000 | 0.6ms | 2.1ms |
| Local only | 12,000,000 | 48ns | 210ns |
| Tiered (warm) | 10,500,000 | 52ns | 380ns |
| Tiered (cold start) | 78,000 | 0.7ms | 2.4ms |

At steady state the tiered cache runs at near-local-only speed because hot keys live in L1. The extra local-miss check on cold paths adds only ~4ns of overhead before falling through to Redis.

Stampede test: 1000 goroutines hitting the same expired key simultaneously:

| Without singleflight | With singleflight |
|---|---|
| 1000 DB calls | 1 DB call |
| p99: 850ms | p99: 12ms |

The difference is dramatic and gets worse under real load.

An unbounded local cache will OOM your process. Two approaches:

- **Max entry count** -- simple and predictable. Evict oldest entries when full. Add a size check in Set and use an LRU library like hashicorp/golang-lru/v2 when you need eviction ordering.
- **Max memory bytes** -- more precise but harder. For []byte values you can sum lengths directly; for arbitrary types, estimation gets complex.

Start with max entry count + a short TTL. Monitor via runtime.MemStats and adjust.

TTL-based eviction is often sufficient. When you also need to cap size:

- **LRU** -- the default choice. Well understood, works for most access patterns.
- **LFU** -- better for heavily skewed workloads. More complex to implement correctly.
- **Random** -- surprisingly effective and nearly free. Consider it for unpredictable access patterns.

For most services, LRU + TTL hits the sweet spot.

Options for multi-instance consistency:

- **Short local TTLs** -- accept bounded staleness (10-30s). The simplest approach, often sufficient.
- **Redis Pub/Sub** -- publish invalidation events on write; instances subscribe and evict locally:

```go
func (tc *TieredCache) Invalidate(ctx context.Context, key string) error {
	tc.local.Delete(key)
	tc.redis.Del(ctx, key)
	return tc.redis.Publish(ctx, "cache:invalidate", key).Err()
}

// Each instance subscribes on startup:
func (tc *TieredCache) SubscribeInvalidations(ctx context.Context) {
	sub := tc.redis.Subscribe(ctx, "cache:invalidate")
	go func() {
		for msg := range sub.Channel() {
			tc.local.Delete(msg.Payload)
		}
	}()
}
```

Cache the misses too. If a key doesn't exist in your database, store a sentinel to prevent repeated lookups:

```go
var sentinel = []byte("__MISS__")

// In the loader:
if errors.Is(err, ErrNotFound) {
	_ = cache.Set(ctx, key, sentinel) // short TTL
	return nil, ErrNotFound
}
```

Without this, a nonexistent key generates a database query on every request -- a pattern attackers can exploit.

Track these metrics (export to Prometheus, Datadog, etc.):

- **Hit rate per tier** -- local should be 80%+ for hot paths
- **Singleflight share rate** -- high = stampede prevention working
- **Cache size** -- entry count and estimated memory
- **Loader latency** -- what you're protecting the system from

```go
type Metrics struct {
	LocalHits   atomic.Int64
	LocalMisses atomic.Int64
	RedisHits   atomic.Int64
	RedisMisses atomic.Int64
	SFShared    atomic.Int64
}
```

A dashboard showing per-tier hit rates will immediately tell you whether your cache is earning its complexity.
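As a sketch of how those counters might be wired in -- GetInstrumented and LocalHitRate are hypothetical names, not from the post, and this assumes the Metrics struct lives in the same package as TieredCache:

```go
// GetInstrumented is tc.Get with per-tier hit/miss counters added.
func (tc *TieredCache) GetInstrumented(ctx context.Context, key string, m *Metrics) ([]byte, bool) {
	if val, ok := tc.local.Get(key); ok {
		m.LocalHits.Add(1)
		return val.([]byte), true
	}
	m.LocalMisses.Add(1)

	val, err := tc.redis.Get(ctx, key).Bytes()
	if err == nil {
		m.RedisHits.Add(1)
		tc.local.Set(key, val, tc.localTTL) // backfill L1, as in Get
		return val, true
	}
	m.RedisMisses.Add(1)
	return nil, false
}

// LocalHitRate reports the fraction of lookups served from L1.
func (m *Metrics) LocalHitRate() float64 {
	hits := m.LocalHits.Load()
	total := hits + m.LocalMisses.Load()
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}
```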
```
Request
   |
Local Cache (L1, ~50ns)
   | miss
Redis (L2, ~0.5ms)
   | miss
singleflight dedup
   |
Database (~5ms)
   |
Populate L1 + L2
```

Key takeaways:

- **Two tiers beat one** -- local absorbs hot reads, Redis handles the long tail and cross-instance sharing.
- **singleflight is non-negotiable** -- without it, cache expiration under load becomes a database stampede.
- **Short local TTLs** -- 10-30s balances freshness against hit rate.
- **Monitor everything** -- hit rates, sizes, loader latency. Caches fail silently.

Start with the simple version. Measure. Then add complexity only where the numbers justify it.

*This is part of the **Production Backend Patterns** series, where we tackle real infrastructure problems with practical Go code. Follow for the next post on rate limiting and backpressure.*

If this article helped you, consider buying me a coffee on Ko-fi! Follow me for more production backend patterns.