dev_to, March 7, 2026


Building a French Address Validation API with 26M Addresses

Tags: api, validation, french, data-management


The French government's Base Adresse Nationale (BAN) contains 26 million addresses: every street, every house number, every hamlet across mainland France and the overseas territories. We built GEOREFER to make this data accessible through a single REST API, combined with company lookup from the SIRENE database. This is the technical story of how we did it.

If you're building a FinTech product in France, you need to validate customer addresses for KYC compliance. Sounds simple, right? Here's what the landscape looks like in 2026:

- API Adresse (BAN): free, but no SLA, rate-limited to 50 req/s, and no company data
- La Poste RNVP: the gold standard for postal validation, but no public REST API
- Google Address Validation: global coverage, but $0.005/request adds up fast, and no SIRENE integration
- INSEE API SIRENE: company data, but separate authentication, slow responses (~500 ms), and no address validation

To do proper KYC, you need at least two of these APIs, with different auth mechanisms, different response formats, and different rate limits. We decided to build one API that does it all.

GEOREFER is built on a straightforward Java stack:

- Java 11 + Spring Boot 2.7.5
- PostgreSQL 16 (42M+ rows across 12 tables)
- Redis 7 (API key cache, TTL 5 min)
- Elasticsearch 7.17 (city autocomplete, fuzzy search)

The architecture follows a clean layered approach, from top to bottom:

- REST Controllers (17 controllers, 39 endpoints)
- Business Services (12 interfaces, 16 implementations)
- Repositories (JPA + Elasticsearch)
- PostgreSQL + Redis + Elasticsearch

The BAN publishes its data as CSV files, updated monthly. The full dataset is around 3.5 GB compressed.
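Because the export is several gigabytes, it has to be processed as a stream rather than loaded whole. The following is a minimal sketch of that idea, not GEOREFER's importer: the class name `BanCsvBatcher`, the semicolon delimiter used in the usage note, and the assumption that no field contains a quoted delimiter are all illustrative, and a production importer would use a proper CSV parser. It reads one line at a time and hands rows to a consumer in fixed-size batches, so memory use is bounded by the batch size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

/** Hypothetical streaming batcher for a large delimited export (not GEOREFER's importer). */
final class BanCsvBatcher {

    private BanCsvBatcher() {}

    /**
     * Reads one line at a time, skips the header row, and hands rows to {@code sink}
     * in batches of at most {@code batchSize}. Returns the number of data rows read.
     * Assumption: no field contains the delimiter inside quotes.
     */
    static long streamBatches(Reader source, char delimiter, int batchSize,
                              Consumer<List<String[]>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String separator = Pattern.quote(String.valueOf(delimiter));
        long rows = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // limit -1 keeps trailing empty fields, so column positions stay stable
                batch.add(line.split(separator, -1));
                rows++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);                 // e.g. bind rows, addBatch(), executeBatch()
                    batch = new ArrayList<>(batchSize); // start a fresh batch; the reader holds only this one
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final, partial batch
        }
        return rows;
    }
}
```

For the real export you would wrap the compressed file, assuming gzip, as `new InputStreamReader(new GZIPInputStream(Files.newInputStream(path)), StandardCharsets.UTF_8)`, pass a batch size of 5000, and have the consumer bind each row to a `PreparedStatement` with `addBatch()` before calling `executeBatch()`.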
Our import strategy:

1. Download the latest BAN CSV export.
2. Parse it with a streaming CSV reader (the full file is never held in memory).
3. Batch-insert rows using JDBC batch operations (batch size = 5000).
4. Index city data into Elasticsearch for autocomplete.

The key challenge was handling the French administrative hierarchy:

Region (18) → Department (101) → Commune (35,000+) → Address (26M)

Each commune has a 5-character INSEE code (digits only, except in Corsica, where codes begin with 2A or 2B), one or more postal codes, and belongs to exactly one department. Paris, Lyon, and Marseille have arrondissements that function as sub-communes with their own INSEE codes. We store communes in a french_town_desc table with the full hierarchy:

```sql
SELECT f.name,
       f.insee_code,
       f.postal_code,
       d.name AS department,
       r.name AS region
FROM georefer.french_town_desc f
JOIN georefer.department d ON f.department_code = d.code
JOIN georefer.region r ON d.region_code = r.code
WHERE f.name ILIKE 'paris%';
```

The core feature is POST /addresses/validate. You send a French address, and we return:

- Confidence score (0-100): how sure we are that the address exists
- GeoTrust Score (0-100): a composite reliability score for KYC
- Validated address: normalized, corrected, with GPS coordinates
- AFNOR format: formatted to the NF Z 10-011 postal standard

The GeoTrust Score is a weighted composite:

| Component | Weight | What it measures |
|---|---|---|
| Confidence | 35% | Street-level address matching |
| Geo Consistency | 25% | Cross-validation: postal code vs commune vs department |
| Postal Match | 20% | Postal code precision (exact, partial, invalid) |
| Country Risk | 20% | FATF/GAFI country risk rating |

```bash
curl -X POST 'https://georefer.io/geographical_repository/v1/addresses/validate' \
  -H 'Content-Type: application/json' \
  -H 'X-Georefer-API-Key: YOUR_API_KEY' \
  -d '{
    "street_line": "15 Rue de la Paix",
    "postal_code": "75002",
    "city": "Paris",
    "country_code": "FR"
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "validated_address": {
      "street_line": "15 Rue de la Paix",
      "postal_code": "75002",
      "city": "PARIS",
      "country": "France"
    },
    "confidence_score": 95,
    "geotrust_score": {
      "overall": 92,
      "level": "LOW",
      "components": {
        "confidence": 95,
        "geo_consistency": 100,
        "postal_match": 100,
        "country_risk": 0
      }
    }
  }
}
```

City autocomplete needs to be fast: under 50 ms for a good UX. We use Elasticsearch's Completion Suggester with custom analyzers:

- city_analyzer: edge_ngram (min=2, max=15) + ascii_folding
- city_search_analyzer: standard + ascii_folding

ASCII folding is critical for French city names. Users type "Beziers", but the official name is "Béziers"; our analyzers handle both. The GET /cities/autocomplete?q=marseil&limit=5 endpoint returns results in under 50 ms, even with 35,000+ communes indexed.

We also support fuzzy search with GET /cities/search?q=Monplier. Using Elasticsearch's fuzziness AUTO parameter, this correctly returns "Montpellier" despite the typos.

GEOREFER is a SaaS with five subscription plans:

| Plan | Daily limit | Rate (req/min) | Price |
|---|---|---|---|
| DEMO | 50 | 10 | Free |
| FREE | 100 | 10 | Free |
| STARTER | 5,000 | 30 | 49 EUR/mo |
| PRO | 50,000 | 60 | 199 EUR/mo |
| ENTERPRISE | Unlimited | 200 | Custom |

Each API key gets its own token bucket (Bucket4j) for rate limiting. Authentication goes through a Spring filter chain:

Request → API key validation (Redis cache) → Quota check → Rate limit → Feature gate → Controller

The feature gate controls which endpoints each plan can access. For example, company search (/companies) requires PRO or higher, while city search is available on all plans.

We currently have 16.8 million SIRENE establishments imported and 35,000+ communes indexed. The API exposes 39 endpoints spanning geographic data, address validation, company search, and admin/billing.
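The Geo Consistency component cross-validates the postal code against the commune and department. Here is a minimal sketch of one such check, under stated assumptions: the department is read from the commune's INSEE code (first two characters, or three for overseas codes beginning with 97), and a postal code is expected to start with that department number, with Corsica's 2A and 2B mapping to the postal prefix 20. The class and method names are hypothetical, and the prefix rule is only a heuristic, because postal codes do not always follow department boundaries; a production check would compare against the commune's known postal codes, for example from french_town_desc.

```java
import java.util.regex.Pattern;

/** Hypothetical helpers for cross-checking a postal code against an INSEE commune code. */
final class GeoConsistency {

    // Five digits, or 2A/2B followed by three digits (Corsica).
    private static final Pattern INSEE = Pattern.compile("\\d{5}|2[AB]\\d{3}");
    private static final Pattern POSTAL = Pattern.compile("\\d{5}");

    private GeoConsistency() {}

    /** Department implied by an INSEE code: "75102" -> "75", "2A004" -> "2A", "97411" -> "974". */
    static String departmentOf(String inseeCode) {
        if (inseeCode == null || !INSEE.matcher(inseeCode).matches()) {
            throw new IllegalArgumentException("Not an INSEE commune code: " + inseeCode);
        }
        // Overseas codes starting with 97 use a three-character department number.
        return inseeCode.startsWith("97") ? inseeCode.substring(0, 3) : inseeCode.substring(0, 2);
    }

    /** Prefix heuristic: does the postal code start with the commune's department number? */
    static boolean postalMatchesDepartment(String postalCode, String inseeCode) {
        if (postalCode == null || !POSTAL.matcher(postalCode).matches()) {
            return false;
        }
        String department = departmentOf(inseeCode);
        // Corsican postal codes start with 20, while its INSEE codes use 2A/2B.
        boolean corsica = department.equals("2A") || department.equals("2B");
        return postalCode.startsWith(corsica ? "20" : department);
    }
}
```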
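The GeoTrust weights described above can be read as a weighted sum. This sketch assumes every component is already on a 0-100 scale and that a higher FATF/GAFI country-risk value should lower the overall score, so that component contributes `100 - countryRisk`. Both are assumptions, and the production score may apply adjustments the weighting table does not describe; `GeoTrustScore.compute` is a hypothetical name. Integer arithmetic with round-half-up keeps the result exact.

```java
/** Hypothetical GeoTrust-style composite using the published weights (35/25/20/20). */
final class GeoTrustScore {

    private GeoTrustScore() {}

    /**
     * All inputs are 0-100. countryRisk is treated as a risk (higher = worse), so it is
     * inverted before weighting. Returns the weighted score, rounded half up, in 0-100.
     */
    static int compute(int confidence, int geoConsistency, int postalMatch, int countryRisk) {
        requirePercent("confidence", confidence);
        requirePercent("geoConsistency", geoConsistency);
        requirePercent("postalMatch", postalMatch);
        requirePercent("countryRisk", countryRisk);
        int weightedSum = 35 * confidence
                + 25 * geoConsistency
                + 20 * postalMatch
                + 20 * (100 - countryRisk); // weights sum to 100, so the maximum is 10000
        return (weightedSum + 50) / 100;    // integer round half up back to 0-100
    }

    private static void requirePercent(String name, int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(name + " must be in [0, 100], got " + value);
        }
    }
}
```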
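To see why ASCII folding plus edge n-grams makes prefix queries like q=marseil work, here is a plain-Java approximation of the terms city_analyzer would emit at index time. It is not Elasticsearch: `java.text.Normalizer` strips combining accents (so Béziers folds to beziers), but unlike Elasticsearch's asciifolding filter it does not expand characters such as œ. Lowercasing and splitting on non-alphanumeric characters are assumptions about the rest of the analyzer chain, and the class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical approximation of an edge_ngram + ascii folding analyzer (not Elasticsearch itself). */
final class CityAnalyzerSketch {

    private CityAnalyzerSketch() {}

    /** Decompose accented letters, drop the combining marks, then lowercase. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Prefixes of length min..max, capped at the token length: "bez" -> ["be", "bez"]. */
    static List<String> edgeNgrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int len = min; len <= Math.min(max, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /** Index-time terms for a city name, using the article's min=2, max=15 settings. */
    static List<String> indexTerms(String cityName) {
        List<String> terms = new ArrayList<>();
        for (String token : fold(cityName).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.addAll(edgeNgrams(token, 2, 15));
            }
        }
        return terms;
    }

    /** Search side: fold the query too, so accented and unaccented input hit the same term. */
    static boolean matchesPrefix(String cityName, String query) {
        return indexTerms(cityName).contains(fold(query).trim());
    }
}
```

Prefixes shorter than two characters are never indexed, so a one-letter query matches nothing in this sketch, which mirrors the min=2 setting.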
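GEOREFER uses Bucket4j for the per-key buckets. To show the mechanics without depending on a particular Bucket4j version, this sketch swaps in a hand-written token bucket; it is not Bucket4j's API. Each key gets a bucket whose capacity equals its plan's per-minute rate from the plan table, refilled one token at a time at evenly spaced intervals (one every 6 seconds for 10 requests/min). The class names, the injectable nanosecond clock, and the choice of gradual rather than once-per-minute refill are assumptions.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Minimal token bucket with gradual refill (illustrative stand-in for Bucket4j). */
final class TokenBucket {
    private final long capacity;
    private final long nanosPerToken;
    private final LongSupplier clock;
    private long tokens;
    private long lastRefill;

    TokenBucket(long capacity, Duration period, LongSupplier nanoClock) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.nanosPerToken = period.toNanos() / capacity; // e.g. 60 s / 10 = one token every 6 s
        if (nanosPerToken == 0) throw new IllegalArgumentException("period too short");
        this.clock = nanoClock;
        this.tokens = capacity; // start full
        this.lastRefill = nanoClock.getAsLong();
    }

    synchronized boolean tryConsume() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = clock.getAsLong();
        long newTokens = (now - lastRefill) / nanosPerToken;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefill += newTokens * nanosPerToken; // keep the unused fraction of an interval
        }
    }
}

/** One bucket per API key, created lazily on first use (hypothetical registry). */
final class PerKeyRateLimiter {
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    PerKeyRateLimiter(LongSupplier nanoClock) {
        this.clock = nanoClock;
    }

    boolean tryAcquire(String apiKey, int requestsPerMinute) {
        TokenBucket bucket = buckets.computeIfAbsent(apiKey,
                key -> new TokenBucket(requestsPerMinute, Duration.ofMinutes(1), clock));
        return bucket.tryConsume();
    }
}
```

In production the clock would be `System::nanoTime`. Because the bucket is created on first use, a key whose plan changes would need its bucket evicted to pick up the new rate.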
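The filter order and the feature gate described above can be condensed into one decision function. Everything here is hypothetical: the `Plan` enum mirrors the plan table, the `/companies` prefix is taken relative to the API base path, the status codes (401 for an unknown key, 429 for an exhausted quota or rate limit, 403 for a gated endpoint) are assumed conventions, and the key lookup and rate-limit result are passed in as arguments instead of coming from Redis and the token bucket.

```java
/** Subscription plans in ascending order; limits mirror the plan table. */
enum Plan {
    DEMO(50, 10), FREE(100, 10), STARTER(5_000, 30), PRO(50_000, 60), ENTERPRISE(Long.MAX_VALUE, 200);

    final long dailyLimit;       // Long.MAX_VALUE stands in for "Unlimited"
    final int requestsPerMinute; // would size the per-key token bucket

    Plan(long dailyLimit, int requestsPerMinute) {
        this.dailyLimit = dailyLimit;
        this.requestsPerMinute = requestsPerMinute;
    }

    boolean atLeast(Plan other) {
        return compareTo(other) >= 0; // enum order is DEMO < FREE < STARTER < PRO < ENTERPRISE
    }
}

/** Hypothetical decision function following the filter order in the article. */
final class AccessPolicy {

    private AccessPolicy() {}

    /** Company search needs PRO; in this sketch every other path is open to all plans (assumption). */
    static Plan minimumPlanFor(String path) {
        return path.startsWith("/companies") ? Plan.PRO : Plan.DEMO;
    }

    /** Returns the HTTP status the chain would produce; plan == null means the key was not found. */
    static int decide(Plan plan, long usedToday, boolean withinRateLimit, String path) {
        if (plan == null) return 401;                        // API key validation
        if (usedToday >= plan.dailyLimit) return 429;        // quota check
        if (!withinRateLimit) return 429;                    // rate limit
        if (!plan.atLeast(minimumPlanFor(path))) return 403; // feature gate
        return 200;                                          // hand off to the controller
    }
}
```

Keeping the decision pure makes the ordering easy to unit-test; in a Spring filter chain each check would usually live in its own filter.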
If you're building anything that touches French addresses or company data, give it a try:

- Free tier: 100 requests/day, no credit card required
- Docs: https://georefer.io/docs
- Sign up: https://georefer.io/#signup
- Examples: https://github.com/azmoris-group/georefer-examples

In the next article, we'll take a deep dive into how we query 16.8M SIRENE establishments in 66 ms using PostgreSQL trigram indexes.

AZMORIS Engineering: "Software that Endures"