dev_to · April 20, 2026


Data Privacy in Regulated Applications: What Developers Need to Know





Regulated apps are different from regular software in one uncomfortable way: you're legally required to collect data you'd rather not touch. Government IDs. Social security numbers. Real-time location. The regulatory mandate forces you to gather sensitive material — then separate laws demand you protect it. That tension doesn't get resolved in a compliance meeting. It gets resolved in your architecture, or it doesn't get resolved at all.

The KYC Pipeline Problem

Most teams make the same mistake with KYC: they treat it as a feature rather than an isolated subsystem. The result is government ID scans sitting in the same database as user preferences, accessible to the same application services, shipped to the same logging aggregator.

The first structural question worth asking early: should you store raw identity documents at all? In many cases, delegating to a KYC provider — Persona, Jumio, Onfido — and storing only the verification reference and outcome is the cleaner path. Your database holds kyc_status: verified, provider_ref: "abc123", and a timestamp. Nothing else.

However — and this matters — some gaming regulators explicitly require independent retention of identity documents, not just a third-party reference. Michigan's MGCB technical standards, for example, have specific data retention obligations that may require you to hold copies directly. Check the jurisdiction requirements before assuming delegation is sufficient. Your compliance and legal team needs to sign off on the storage model, not just your architect.

When you genuinely need to retain KYC data (some jurisdictions require it), keep it isolated:

users → user_id, email, created_at
kyc_profiles → kyc_id, user_id, status, verified_at, provider_ref
kyc_vault → encrypted blob, strict ACL, separate credentials

The vault should be unreachable from your application layer by default. Only a dedicated compliance service touches it, and every read gets logged. Not application logs — a separate audit trail.
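The isolation above can be sketched in a few lines of SQL. This is a minimal illustration, not a production schema: two in-memory SQLite connections stand in for what would really be physically separate stores with separate credentials, and the table and column names simply mirror the layout described above.

```python
import sqlite3

# Two separate connections stand in for physically separate stores with
# separate credentials; in production the vault would live in its own
# database, reachable only by a dedicated compliance service.
app_db = sqlite3.connect(":memory:")
vault_db = sqlite3.connect(":memory:")

app_db.executescript("""
CREATE TABLE users (
    user_id    TEXT PRIMARY KEY,
    email      TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE kyc_profiles (
    kyc_id       TEXT PRIMARY KEY,
    user_id      TEXT NOT NULL REFERENCES users(user_id),
    status       TEXT NOT NULL,        -- e.g. 'verified'
    verified_at  TEXT,
    provider_ref TEXT                  -- e.g. 'abc123' from the KYC provider
);
""")

# The vault holds only encrypted blobs keyed by kyc_id: no email, no join
# path back to users without going through the compliance service.
vault_db.executescript("""
CREATE TABLE kyc_vault (
    kyc_id         TEXT PRIMARY KEY,
    encrypted_blob BLOB NOT NULL
);
""")

app_db.execute("INSERT INTO users VALUES ('usr_1', 'a@example.com', '2026-01-01')")
app_db.execute(
    "INSERT INTO kyc_profiles VALUES ('kyc_1', 'usr_1', 'verified', '2026-01-02', 'abc123')"
)

# The application layer only ever reads status and provider reference.
status, ref = app_db.execute(
    "SELECT status, provider_ref FROM kyc_profiles WHERE user_id = 'usr_1'"
).fetchone()
print(status, ref)  # verified abc123
```

The point of the split is that an application-layer compromise or a careless query against `app_db` can never surface raw identity documents, because they simply are not there.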
Field-level encryption matters here more than disk encryption. Encrypting at rest protects you if someone walks out with a hard drive. Field-level encryption protects you from your own engineers, your own queries, and your own misconfigured storage buckets. Use your KMS to encrypt SSN, DOB, and document hashes individually. Decryption should require explicit, logged justification. For a concrete implementation pattern using KMS-backed field encryption in a serverless context, this production walkthrough is worth reading — the key policy scoping discussion alone saves most teams a painful mistake.

Verification drift is consistently underestimated. A user's KYC is valid today. Eighteen months later, their document has expired or your provider's risk model has shifted. Build re-verification flows before you need them. Stale KYC is both a compliance liability and unnecessary data exposure — you're holding sensitive material past its useful life with no corresponding obligation.

Geofencing Without Storing Location

Jurisdiction enforcement creates a specific constraint: you can't just verify where a user lives. Confirming where they physically are at the moment of a transaction is a fundamentally different problem.

The instinct is to log GPS coordinates with timestamps. Avoid it. That's a detailed record of someone's movement patterns, and regulations typically require proof of the check result — not retention of the raw coordinates themselves. Minimize what you collect to what the obligation actually demands.

A cleaner pattern:

Client → sends coordinates to internal GeoValidation service
GeoValidation → checks against jurisdiction polygon → returns: { permitted: true, jurisdiction: "MI", checked_at: timestamp }
Main app → stores result only, coordinates discarded

Short TTL caching on geo results is reasonable — re-checking on every request creates more location data than required and adds latency. But keep the TTL short on financial transactions. Minutes, not hours.
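The GeoValidation step above can be sketched with a standard ray-casting point-in-polygon test. Everything here is illustrative: `MI_POLYGON` is a rough rectangle around Michigan, not a surveyed boundary, and a real service would load actual jurisdiction geometry. The structural point is that only the check result leaves the function.

```python
from datetime import datetime, timezone

# Illustrative polygon only: a crude (lat, lon) rectangle standing in for
# real, surveyed jurisdiction boundaries.
MI_POLYGON = [(41.7, -90.4), (41.7, -82.1), (48.3, -82.1), (48.3, -90.4)]

def point_in_polygon(lat: float, lon: float, polygon: list) -> bool:
    """Ray-casting test: count how many polygon edges a westward ray crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def geo_validate(lat: float, lon: float) -> dict:
    permitted = point_in_polygon(lat, lon, MI_POLYGON)
    # Only the result leaves this function; the raw coordinates are never
    # stored or logged, matching the "stores result only" step above.
    return {
        "permitted": permitted,
        "jurisdiction": "MI" if permitted else None,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

result = geo_validate(42.33, -83.05)  # roughly Detroit
print(result["permitted"], result["jurisdiction"])  # True MI
```

A cached copy of `result` with a short TTL is all the main application ever needs to hold.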
Users can cross state lines. On mobile, request whenInUse authorization, not always. Background location collection is rarely justified by the actual regulatory requirement. If your legal team pushes for it, ask them to point to the specific obligation. Usually they can't.

IP-based geolocation is a useful secondary fraud signal — but under GDPR, IP addresses (including hashed ones) can still constitute personal data if re-identification remains possible. Treat IPs as PII by default, minimize retention, and don't treat IP geolocation as primary jurisdiction evidence. It's a corroborating signal, not proof.

Logging Is Where Privacy Goes to Die

Application logs are the most overlooked PII risk in most systems. Engineers treat them as ephemeral debugging tools. In practice, they're shipped to third-party aggregators, retained for months, searchable by broad engineering teams, and occasionally exported during incident investigations.

A log line containing a user's email, IP, and a behavioral event is personal data under GDPR, regardless of your intent when writing it. The solution isn't better scrubbing — it's not writing it in the first place. If you want a solid grounding on exactly what GDPR classifies as personal data and how the storage limitation principle translates into engineering constraints, this dev.to breakdown covers it without the legal padding.

Pseudonymize at write time, not after. Your log pipeline should never receive a raw email address. It receives an internal pseudonymous ID instead. The OWASP Privacy Cheat Sheet lays out data classification and pseudonymization requirements clearly — worth keeping open when defining your log field taxonomy.
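Write-time pseudonymization can be as simple as a keyed derivation in front of the logger. This is a minimal sketch: `PEPPER` is a placeholder secret (a real deployment would hold it in a KMS and rotate it out of band), and `log_event` is a hypothetical helper, not a real library API.

```python
import hashlib
import hmac
import logging

# Placeholder secret for illustration; in production this would come from
# a KMS, and the mapping might live in a dedicated tokenization service.
PEPPER = b"rotate-me-out-of-band"

def pseudonym(email: str) -> str:
    """Derive a stable pseudonymous ID; the raw email never reaches the log pipeline."""
    digest = hmac.new(PEPPER, email.lower().encode(), hashlib.sha256).hexdigest()
    return f"usr_{digest[:8]}"

logging.basicConfig(format="%(message)s", level=logging.INFO)
logger = logging.getLogger("app")

def log_event(action: str, email: str) -> None:
    # The translation happens at write time: the logger only ever sees the
    # pseudonymous ID, so downstream aggregators never hold the raw email.
    logger.info("user_id=%s action=%s", pseudonym(email), action)

log_event("kyc_check_passed", "user@email.com")
```

Because the derivation is keyed, the same user always maps to the same ID (useful for debugging and correlation), but the mapping back to an email requires the secret, which the log pipeline never has.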
Classify every log field explicitly:

✅ user_id: "usr_8f3k2" — pseudonymous internal ID
✅ action: "kyc_check_passed" — behavioral event, no direct PII
❌ email: "user@email.com" — never in logs
❌ ip_address: raw — treat as PII, minimize or drop
❌ ssn_last4 + user_id — linkable combination, avoid

Note on IP addresses specifically: hashing doesn't automatically resolve the GDPR question. If the original IP can be reconstructed or re-identified through other means, a hashed value may still be personal data. The safer default is not retaining them in logs at all unless there's a documented, necessary purpose.

Audit logs operate under entirely different rules. Compliance requires an immutable record of access and actions — append-only, write-once, accessible only to your compliance function. Engineers debugging a production incident should not share an access tier with your financial audit trail. These are separate systems with separate purposes, and treating them as one creates both security and compliance exposure.

Scrubbing middleware on outbound log streams catches mistakes. It's not a design strategy — it's a fallback for when the actual design fails somewhere.

Retention Is an Engineering Problem, Not a Policy Document

Every regulated company has a retention policy. Far fewer have the technical enforcement of it. The policy says "delete KYC documents per regulatory schedule." The data sits in production indefinitely because no one built the deletion job. The NIST Privacy Framework treats data lifecycle management — including retention and disposal — as a core engineering outcome, not an afterthought. It's a useful structural reference when you're defining what "done" actually looks like for a retention program.

Retention windows vary significantly by jurisdiction, data type, and applicable regulation — the figures below are illustrative starting points, not legal requirements.
Validate specifics with counsel for your target jurisdictions:

Data Type                 | Retention Window                            | Enforcement Mechanism
KYC raw vault             | Account lifetime + jurisdiction requirement | Compliance workflow, manual review gate
Geofence results          | 90 days                                     | TTL field, rolling purge job
Session/auth logs         | 90–180 days                                 | Log store TTL config
Financial records         | 5–7 years                                   | Legal hold check before purge
Behavioral/marketing data | 12 months                                   | Scheduled deletion, user-scoped

The right-to-erasure problem in distributed systems is harder than it first appears. Deleting a user means accounting for: primary database, read replicas, analytics warehouse, event streams, backups, third-party KYC provider, email service, and any other downstream system you pushed their data into. Build a data subject request (DSR) workflow that fans out across all stores. Retrofitting this is significantly more painful than building it early.

For records you're legally required to keep, anonymize the user linkage rather than deleting the record. The transaction happened, the financial record stays — but the user_id foreign key gets replaced with a non-reversible hash. Compliant retention, no live PII.

Soft deletes (deleted_at column) are not privacy-compliant deletion. They're a UX convenience that leaves data fully intact. For regulated data, you need either hard deletion or cryptographic erasure — delete the encryption key, and the data becomes permanently unreadable without requiring a physical delete.

What Regulated Betting Applications Actually Look Like

Online sports betting is one of the most instructive domains for this kind of engineering. The compliance surface is unusually wide: state gaming authority requirements, AML obligations, age and identity verification mandates, and consumer privacy law all apply simultaneously — and they don't always point in the same direction.
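The "anonymize the linkage" idea above can be sketched as a keyed hash with a discarded key. This is a conceptual sketch, not a hardened implementation: the record shape is invented for illustration, and Python cannot guarantee key material is wiped from memory (a real system would keep the key in a KMS and destroy it there).

```python
import hashlib
import hmac
import secrets

# Hypothetical financial record that must be retained; user_id is the only
# live PII linkage in it.
record = {"txn_id": "txn_991", "amount_cents": 2500, "user_id": "usr_8f3k2"}

def anonymize_linkage(rec: dict) -> dict:
    """Replace the user_id foreign key with a non-reversible token.

    The HMAC key is generated, used once, and discarded. Once no copy of
    the key exists, recovering the original user_id from the token is
    computationally infeasible: cryptographic erasure of the linkage,
    while the financial record itself survives its retention window.
    """
    ephemeral_key = secrets.token_bytes(32)
    token = hmac.new(ephemeral_key, rec["user_id"].encode(), hashlib.sha256).hexdigest()
    del ephemeral_key  # drop the only reference; nothing retains the key
    return {**rec, "user_id": f"anon_{token[:16]}"}

erased = anonymize_linkage(record)
print(erased["txn_id"], erased["user_id"].startswith("anon_"))  # txn_991 True
```

Using a fresh key per record also means two anonymized records for the same user are unlinkable to each other, which is usually what you want for retained financial history.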
Applications supporting sports betting in Michigan operate under Michigan Gaming Control Board oversight, which imposes specific technical requirements around identity verification, geolocation, and record retention. CCPA adds a further layer for any California residents using the platform — note that CCPA scoping depends on user residency and operator thresholds, not just the state of operation. These obligations coexist with platform-level privacy commitments and create a genuinely complex compliance matrix.

In practice, this means a KYC gate at account creation that hard-blocks product access until verification is confirmed and stored per the applicable retention requirement. Geo-check middleware injected at the transaction layer — not just login, but every wager. An audit pipeline physically isolated from the observability stack, with access controls that don't overlap with general engineering. A compliance data store with credentials that most engineers never hold.

The betting domain is worth studying even if you're building healthcare software or fintech tooling. Regulatory pressure is intense enough that architectural shortcuts are genuinely costly — teams in this space have had to solve these problems for real, under audit, rather than deferring them to a future sprint.
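"Geo-check middleware at the transaction layer" can be sketched as a decorator guarding every wager handler. All names here are hypothetical: `geo_check` is a stub standing in for a call to the internal GeoValidation service, and `place_wager` is an invented handler.

```python
from functools import wraps

def geo_check(user_id: str) -> dict:
    # Stub for illustration: a real implementation would call the internal
    # GeoValidation service (with a short TTL cache) for this user's session.
    return {"permitted": True, "jurisdiction": "MI"}

class JurisdictionError(Exception):
    pass

def requires_geo_check(handler):
    """Middleware applied per wager, not per login: every transaction
    re-confirms location before the wrapped handler runs."""
    @wraps(handler)
    def wrapper(user_id: str, *args, **kwargs):
        result = geo_check(user_id)
        if not result["permitted"]:
            raise JurisdictionError(f"user {user_id} outside permitted jurisdiction")
        return handler(user_id, *args, **kwargs)
    return wrapper

@requires_geo_check
def place_wager(user_id: str, amount_cents: int) -> str:
    return f"wager accepted for {user_id}: {amount_cents}"

print(place_wager("usr_8f3k2", 500))  # wager accepted for usr_8f3k2: 500
```

Putting the check in middleware rather than inside each handler makes it structurally hard to ship a new wager endpoint that forgets the jurisdiction check.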
Practical Checklist

- KYC storage model has been validated against jurisdiction-specific regulatory requirements — not just assumed
- Identity data lives in an isolated schema with field-level encryption and separate credentials
- No PII written to application logs — pseudonymization at the source, not post-hoc scrubbing
- IP addresses treated as PII by default — not retained in logs without documented necessity
- Geolocation data is ephemeral — check result stored, raw coordinates discarded
- Audit logs are append-only, on a separate pipeline, inaccessible to general engineering access
- Retention windows are technically enforced — TTLs and scheduled purge jobs, not just documented policy
- A tested DSR workflow exists that fans out deletion across every data store, including third parties
- Third-party KYC and data vendors have signed DPAs
- Soft deletes are not used as a substitute for compliant deletion of regulated personal data
- Erasure approach (hard delete vs. cryptographic erasure vs. anonymization) has been reviewed against applicable DPA guidance
- Geo-check middleware runs at the transaction layer, not only at authentication