Building a French Address Validation API with 26M Addresses
The French government's Base Adresse Nationale (BAN) contains 26 million addresses — every street, every house number, every hamlet across mainland France and overseas territories. We built GEOREFER to make this data accessible through a single REST API, combined with company lookup from the SIRENE database.
This is the technical story of how we did it.
If you're building a FinTech product in France, you need to validate customer addresses for KYC compliance. Sounds simple, right?
Here's what the landscape looks like in 2026:
API Adresse (BAN) — Free, but no SLA, rate-limited to 50 req/s, and no company data
La Poste RNVP — The gold standard for postal validation, but no public REST API
Google Address Validation — Global coverage but $0.005/request adds up fast, and no SIRENE integration
INSEE API SIRENE — Company data, but separate authentication, slow responses (~500ms), and no address validation
To do proper KYC, you need at least two of these APIs, with different auth mechanisms, different response formats, and different rate limits.
We decided to build one API that does it all.
GEOREFER is built on a straightforward Java stack:
Java 11 + Spring Boot 2.7.5
PostgreSQL 16 (42M+ rows across 12 tables)
Redis 7 (API key cache, TTL 5min)
Elasticsearch 7.17 (city autocomplete, fuzzy search)
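The Redis entry above describes a cache-aside lookup: check the cache, fall back to the database on a miss, and keep the result for five minutes. The sketch below shows that pattern in plain Java under stated assumptions. `ApiKeyCache` is a hypothetical name, an in-memory `ConcurrentHashMap` stands in for Redis, the `loader` function stands in for the PostgreSQL query, and the clock is injected so expiry can be tested. It illustrates the pattern and is not GEOREFER's implementation.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache-aside lookup with a fixed TTL (in-memory stand-in for the Redis API-key cache). */
final class ApiKeyCache<V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<String, Optional<V>> loader; // assumption: the database lookup by API key
    private final long ttlMillis;
    private final LongSupplier clockMillis;             // System::currentTimeMillis in production

    ApiKeyCache(Function<String, Optional<V>> loader, Duration ttl, LongSupplier clockMillis) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    Optional<V> lookup(String apiKey) {
        long now = clockMillis.getAsLong();
        Entry<V> cached = entries.get(apiKey);
        if (cached != null && now < cached.expiresAtMillis) {
            return Optional.of(cached.value);        // cache hit, still fresh
        }
        Optional<V> loaded = loader.apply(apiKey);  // miss or expired: ask the source of truth
        if (loaded.isPresent()) {
            entries.put(apiKey, new Entry<>(loaded.get(), now + ttlMillis));
        } else {
            entries.remove(apiKey);                 // unknown keys are not cached in this sketch
        }
        return loaded;
    }
}
```

With a five-minute TTL, a revoked key keeps working for at most five minutes after revocation unless the entry is evicted explicitly; that is the usual trade-off of this pattern.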
The architecture follows a clean layered approach:
REST Controllers (17 controllers, 39 endpoints)
|
Business Services (12 interfaces, 16 implementations)
|
Repositories (JPA + Elasticsearch)
|
PostgreSQL + Redis + Elasticsearch
The BAN publishes its data as CSV files, updated monthly. The full dataset is around 3.5 GB compressed.
Our import strategy:
Download the latest BAN CSV export
Parse with streaming CSV reader (no full file in memory)
Batch insert using JDBC batch operations (batch size = 5000)
Index city data into Elasticsearch for autocomplete
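Steps 2 and 3 above can be sketched with the standard library alone: read the file line by line and hand rows to a sink in fixed-size batches, so memory use is bounded by the batch size rather than by the 26M-row file. This is a minimal sketch under assumptions. `BanCsvBatcher` is a hypothetical name, the file is taken to be `;`-separated with a single header line and no quoted fields containing `;`, and the sink is where a real importer would bind each row to a JDBC `PreparedStatement`, call `addBatch()` per row and `executeBatch()` once per batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical streaming CSV batcher: never holds more than one batch of rows in memory. */
final class BanCsvBatcher {
    private BanCsvBatcher() {}

    /** Streams rows to the sink in batches of batchSize; returns the row count. Closes the reader. */
    static long stream(Reader source, int batchSize, Consumer<List<String[]>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long rows = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            if (reader.readLine() == null) {   // skip the column header; empty file means no rows
                return 0;
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                batch.add(line.split(";", -1)); // -1 keeps trailing empty columns
                rows++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);         // e.g. addBatch() per row, then executeBatch()
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);                 // final, partial batch
        }
        return rows;
    }
}
```

For the compressed export, the `Reader` would be an `InputStreamReader` wrapped around a `GZIPInputStream`, and the batch size from the article would be passed as `5000`.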
The key challenge was handling the French administrative hierarchy:
Region (18) → Department (101) → Commune (35,000+) → Address (26M)
Each commune has a five-character INSEE code (all digits, except in Corsica where the department part is 2A or 2B), one or more postal codes, and belongs to exactly one department. Paris, Lyon, and Marseille have arrondissements that function as sub-communes with their own INSEE codes.
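Because the department is encoded in the first characters of the INSEE code, the Commune → Department link can be checked without a lookup table. A minimal sketch, assuming the usual layout of the code officiel géographique: two characters for metropolitan departments (01 to 95, plus 2A and 2B for Corsica) followed by three digits, and a three-digit prefix for overseas codes starting with 97 or 98. `InseeCodes` is a hypothetical helper; the arrondissement ranges are those of Paris (75101 to 75120), Lyon (69381 to 69389) and Marseille (13201 to 13216).

```java
import java.util.regex.Pattern;

/** Hypothetical helpers for commune INSEE codes. */
final class InseeCodes {
    // Two-character department prefix (digits, or 2A/2B for Corsica) followed by three digits.
    private static final Pattern COMMUNE_CODE = Pattern.compile("(\\d{2}|2[AB])\\d{3}");

    private InseeCodes() {}

    /** Department code: first two characters, or first three for overseas codes (97x, 98x). */
    static String departmentCode(String inseeCode) {
        if (inseeCode == null || !COMMUNE_CODE.matcher(inseeCode).matches()) {
            throw new IllegalArgumentException("Not a commune INSEE code: " + inseeCode);
        }
        boolean overseas = inseeCode.startsWith("97") || inseeCode.startsWith("98");
        return inseeCode.substring(0, overseas ? 3 : 2);
    }

    /** True for the municipal arrondissements of Paris, Lyon and Marseille. */
    static boolean isMunicipalArrondissement(String inseeCode) {
        if (inseeCode == null || !inseeCode.matches("\\d{5}")) {
            return false; // Corsican codes (2A/2B) are never arrondissements
        }
        int code = Integer.parseInt(inseeCode);
        return (code >= 75101 && code <= 75120)   // Paris, 1st to 20th
            || (code >= 69381 && code <= 69389)   // Lyon, 1st to 9th
            || (code >= 13201 && code <= 13216);  // Marseille, 1st to 16th
    }
}
```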
We store communes in a french_town_desc table with full hierarchy:
-- Communes whose name starts with "paris" (case-insensitive), with their department and region
SELECT f.name, f.insee_code, f.postal_code,
d.name AS department, r.name AS region
FROM georefer.french_town_desc f
JOIN georefer.department d ON f.department_code = d.code
JOIN georefer.region r ON d.region_code = r.code
WHERE f.name ILIKE 'paris%';
The core feature is POST /addresses/validate. You send a French address, and we return:
Confidence score (0-100) — how sure we are the address exists
GeoTrust Score (0-100) — composite reliability score for KYC
Validated address — normalized, corrected, with GPS coordinates
AFNOR format — postal-standard NF Z 10-011 formatting
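The AFNOR output follows NF Z 10-011, whose layout rules are commonly summarized as at most six lines of at most 38 characters, written in uppercase without accents or punctuation. The sketch below applies only those line-level rules. It is an approximation under that assumption, not GEOREFER's formatter: it keeps lines in the order given and ignores La Poste's abbreviation rules, and `AfnorFormatter` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical line-level formatter for the NF Z 10-011 layout limits. */
final class AfnorFormatter {
    static final int MAX_LINES = 6;
    static final int MAX_LINE_LENGTH = 38;

    private AfnorFormatter() {}

    /** Uppercase, ligatures expanded, diacritics removed, punctuation turned into single spaces. */
    static String normalizeLine(String line) {
        String expanded = line.replace("\u0153", "oe").replace("\u0152", "OE")  // oe ligature
                              .replace("\u00e6", "ae").replace("\u00c6", "AE"); // ae ligature
        String noAccents = Normalizer.normalize(expanded, Normalizer.Form.NFD)
                                     .replaceAll("\\p{M}+", "");
        return noAccents.toUpperCase(Locale.ROOT)
                        .replaceAll("[^A-Z0-9 ]", " ")
                        .replaceAll(" +", " ")
                        .trim();
    }

    /** Normalizes each non-empty line and enforces the 6-line, 38-character limits. */
    static List<String> format(List<String> rawLines) {
        List<String> lines = new ArrayList<>();
        for (String raw : rawLines) {
            String line = normalizeLine(raw);
            if (line.isEmpty()) {
                continue;
            }
            if (line.length() > MAX_LINE_LENGTH) {
                throw new IllegalArgumentException(
                    "Line longer than " + MAX_LINE_LENGTH + " characters: " + line);
            }
            lines.add(line);
        }
        if (lines.size() > MAX_LINES) {
            throw new IllegalArgumentException("Address has more than " + MAX_LINES + " lines");
        }
        return lines;
    }
}
```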
The GeoTrust Score is a weighted composite:
Component       | Weight | What it measures
Confidence      | 35%    | Street-level address matching
Geo Consistency | 25%    | Cross-validation: postal code vs commune vs department
Postal Match    | 20%    | Postal code precision (exact, partial, invalid)
Country Risk    | 20%    | FATF/GAFI country risk rating
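As a worked example of the weighting, the composite can be computed as a weighted sum of the four components. The sketch assumes each component is on a 0 to 100 scale and that `country_risk` is a risk value (0 means lowest risk), so it is inverted before weighting. The article does not publish the exact formula, and `GeoTrustScore` is a hypothetical name.

```java
/** Hypothetical weighted GeoTrust composite using the weights from the table above. */
final class GeoTrustScore {
    // Weights in percent; they sum to 100.
    static final int W_CONFIDENCE = 35;
    static final int W_GEO_CONSISTENCY = 25;
    static final int W_POSTAL_MATCH = 20;
    static final int W_COUNTRY_RISK = 20;

    private GeoTrustScore() {}

    static int overall(int confidence, int geoConsistency, int postalMatch, int countryRisk) {
        for (int component : new int[] {confidence, geoConsistency, postalMatch, countryRisk}) {
            if (component < 0 || component > 100) {
                throw new IllegalArgumentException("Component out of range 0-100: " + component);
            }
        }
        int weightedSum = W_CONFIDENCE * confidence
                + W_GEO_CONSISTENCY * geoConsistency
                + W_POSTAL_MATCH * postalMatch
                + W_COUNTRY_RISK * (100 - countryRisk); // assumption: low country risk raises the score
        return (int) Math.round(weightedSum / 100.0);   // integer sum avoids floating-point drift
    }
}
```

With the components from the sample response further down (95, 100, 100, 0), this sketch returns 98; using `country_risk` without inversion would give 78. Neither matches the reported overall of 92, so the production formula presumably applies adjustments beyond these four weights.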
Request:
curl -X POST 'https://georefer.io/geographical_repository/v1/addresses/validate' \
-H 'Content-Type: application/json' \
-H 'X-Georefer-API-Key: YOUR_API_KEY' \
-d '{
"street_line": "15 Rue de la Paix",
"postal_code": "75002",
"city": "Paris",
"country_code": "FR"
}'
Response:
{
"success": true,
"data": {
"validated_address": {
"street_line": "15 Rue de la Paix",
"postal_code": "75002",
"city": "PARIS",
"country": "France"
},
"confidence_score": 95,
"geotrust_score": {
"overall": 92,
"level": "LOW",
"components": {
"confidence": 95,
"geo_consistency": 100,
"postal_match": 100,
"country_risk": 0
}
}
}
}
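The same call from Java needs nothing beyond the JDK's `java.net.http` client (Java 11+, matching the stack above). This is a minimal sketch: `GeoreferClient` and its methods are hypothetical names, the JSON body is assembled by hand to stay dependency-free (a real client would typically use a JSON library), and only the endpoint, header name and field names are taken from the curl example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical minimal client for POST /addresses/validate. */
final class GeoreferClient {
    static final URI VALIDATE_URI =
        URI.create("https://georefer.io/geographical_repository/v1/addresses/validate");

    // One shared client: HttpClient is immutable and meant to be reused.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();

    private GeoreferClient() {}

    /** Builds the same request as the curl example. */
    static HttpRequest validateRequest(String apiKey, String streetLine, String postalCode,
                                       String city, String countryCode) {
        String body = "{"
            + "\"street_line\":" + jsonString(streetLine) + ","
            + "\"postal_code\":" + jsonString(postalCode) + ","
            + "\"city\":" + jsonString(city) + ","
            + "\"country_code\":" + jsonString(countryCode)
            + "}";
        return HttpRequest.newBuilder(VALIDATE_URI)
            .timeout(Duration.ofSeconds(10))
            .header("Content-Type", "application/json")
            .header("X-Georefer-API-Key", apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    /** Sends the request and returns the raw JSON body; non-2xx statuses become an IOException. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /** Minimal JSON string literal: escapes quotes, backslashes and control characters. */
    static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Usage: `GeoreferClient.send(GeoreferClient.validateRequest("YOUR_API_KEY", "15 Rue de la Paix", "75002", "Paris", "FR"))` returns the JSON document shown above as a string.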
City autocomplete needs to be fast — under 50ms for a good UX. We use Elasticsearch's Completion Suggester with a custom analyzer:
city_analyzer: edge_ngram (min=2, max=15) + asciifolding
city_search_analyzer: standard + asciifolding
The ASCII folding is critical for French cities. Users type "Beziers" but the official name is "Béziers". Our analyzer handles both.
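The effect of the two analyzers can be approximated in plain Java: at index time each city name is folded to ASCII and expanded into edge n-grams of 2 to 15 characters, while at query time the input is only folded, so a typed prefix matches one of the stored n-grams. This sketches the idea and is not Elasticsearch's code. Unicode NFD decomposition plus removal of combining marks stands in for the `asciifolding` filter (unlike `asciifolding`, it does not expand ligatures such as œ), splitting on spaces, hyphens and apostrophes stands in for the tokenizer, requiring every query token to match is an assumption, and `CityAnalyzerSketch` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical approximation of city_analyzer (index side) and city_search_analyzer (query side). */
final class CityAnalyzerSketch {
    static final int MIN_GRAM = 2;
    static final int MAX_GRAM = 15;

    private CityAnalyzerSketch() {}

    /** Approximates lowercase + asciifolding: decompose accented letters, then drop the accents. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Splits on whitespace, hyphens and apostrophes (stand-in for the tokenizer). */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[\\s\\-']+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    /** Index side: every edge n-gram (MIN_GRAM..MAX_GRAM characters) of every folded token. */
    static List<String> indexTerms(String cityName) {
        List<String> terms = new ArrayList<>();
        for (String token : tokens(fold(cityName))) {
            for (int len = MIN_GRAM; len <= Math.min(MAX_GRAM, token.length()); len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    /** Query side: fold only, no n-grams; every query token must equal a stored n-gram. */
    static boolean matches(String cityName, String userInput) {
        Set<String> terms = new HashSet<>(indexTerms(cityName));
        List<String> queryTokens = tokens(fold(userInput));
        return !queryTokens.isEmpty() && terms.containsAll(queryTokens);
    }
}
```

Because the n-grams stop at 15 characters, a query token longer than that finds no stored term, which mirrors the `max=15` setting.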
The GET /cities/autocomplete?q=marseil&limit=5 endpoint returns results in under 50ms, even with 35,000+ communes indexed.
We also support fuzzy search with GET /cities/search?q=Monplier — using Elasticsearch's fuzziness AUTO parameter, this correctly returns "Montpellier" despite the typos.
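Fuzziness AUTO chooses an edit budget from the length of each query term: per the Elasticsearch documentation, terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. A minimal sketch of that rule with a plain Levenshtein distance follows. `FuzzyAuto` is a hypothetical name, and Elasticsearch by default also counts a transposition of two adjacent characters as a single edit (`fuzzy_transpositions`), which this plain version does not.

```java
/** Hypothetical illustration of fuzziness AUTO with a plain Levenshtein distance. */
final class FuzzyAuto {
    private FuzzyAuto() {}

    /** Edits allowed by AUTO for a query term of the given length. */
    static int autoMaxEdits(int termLength) {
        if (termLength < 3) {
            return 0;
        }
        return termLength < 6 ? 1 : 2;
    }

    /** Levenshtein distance (insert, delete, substitute, each costing 1) with two DP rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                      // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                      // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** True if indexedTerm is within the AUTO budget of queryTerm. */
    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return levenshtein(queryTerm, indexedTerm) <= autoMaxEdits(queryTerm.length());
    }
}
```

The budget applies per term: "monplier" (8 characters) is allowed two edits, while its distance to "montpellier" is three insertions, so that particular match depends on more of the query setup than fuzziness alone.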
GEOREFER is a SaaS with 5 subscription plans:
Plan       | Daily limit (requests) | Rate limit (requests/min) | Price
DEMO       | 50                     | 10                        | Free
FREE       | 100                    | 10                        | Free
STARTER    | 5,000                  | 30                        | 49 EUR/mo
PRO        | 50,000                 | 60                        | 199 EUR/mo
ENTERPRISE | Unlimited              | 200                       | Custom
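The plan table translates directly into an enum that the quota check can consult. In this sketch `Plan` and its methods are hypothetical names, -1 encodes the ENTERPRISE "Unlimited" daily limit, and declaring the constants from lowest to highest tier lets `isAtLeast` express tier rules such as the company-search requirement described below.

```java
/** Hypothetical enum mirroring the plan table; a daily limit of -1 means unlimited. */
enum Plan {
    // Declared from lowest to highest tier, so ordinal order is tier order.
    DEMO(50, 10),
    FREE(100, 10),
    STARTER(5_000, 30),
    PRO(50_000, 60),
    ENTERPRISE(-1, 200);

    private final long dailyLimit;
    private final int requestsPerMinute;

    Plan(long dailyLimit, int requestsPerMinute) {
        this.dailyLimit = dailyLimit;
        this.requestsPerMinute = requestsPerMinute;
    }

    long dailyLimit() {
        return dailyLimit;
    }

    int requestsPerMinute() {
        return requestsPerMinute;
    }

    /** Quota check: may a key that has already made usedToday requests today make one more? */
    boolean allowsAnotherRequest(long usedToday) {
        return dailyLimit < 0 || usedToday < dailyLimit;
    }

    /** Tier comparison, e.g. plan.isAtLeast(Plan.PRO) for PRO-or-higher features. */
    boolean isAtLeast(Plan minimum) {
        return compareTo(minimum) >= 0;
    }
}
```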
Each API key gets its own token bucket (Bucket4j) for rate limiting. Authentication goes through a Spring filter chain:
Request → API Key validation (Redis cache) → Quota check → Rate limit → Feature gate → Controller
The Feature Gate controls which endpoints each plan can access. For example, company search (/companies) requires PRO or higher, while city search is available on all plans.
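The rate-limit step gives each API key its own token bucket. GEOREFER uses Bucket4j for this; the sketch below is a plain-Java stand-in (not Bucket4j's API) that shows the mechanism under assumptions. Bucket capacity equals the plan's per-minute rate, tokens refill one at a time every 60 s divided by that rate, the clock is injected for testing, and `RateLimiter` is a hypothetical name. A `false` from `tryAcquire` is where the filter would reject the request, typically with HTTP 429.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-API-key token-bucket limiter (plain-Java stand-in for Bucket4j). */
final class RateLimiter {
    private static final class TokenBucket {
        private final long capacity;
        private final long nanosPerToken;
        private long tokens;
        private long lastRefillNanos;

        TokenBucket(int requestsPerMinute, long nowNanos) {
            this.capacity = requestsPerMinute;
            this.nanosPerToken = 60_000_000_000L / requestsPerMinute;
            this.tokens = requestsPerMinute;   // a new key starts with a full bucket
            this.lastRefillNanos = nowNanos;
        }

        synchronized boolean tryConsume(long nowNanos) {
            long newTokens = Math.max(0, nowNanos - lastRefillNanos) / nanosPerToken;
            if (newTokens > 0) {
                tokens = Math.min(capacity, tokens + newTokens);
                lastRefillNanos += newTokens * nanosPerToken; // keep the unfinished interval
            }
            if (tokens > 0) {
                tokens--;
                return true;
            }
            return false;
        }
    }

    private final ConcurrentHashMap<String, TokenBucket> buckets = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock; // System::nanoTime in production

    RateLimiter(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Returns false when the key has used up its per-minute budget. */
    boolean tryAcquire(String apiKey, int requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be positive");
        }
        long now = nanoClock.getAsLong();
        TokenBucket bucket = buckets.computeIfAbsent(apiKey, key -> new TokenBucket(requestsPerMinute, now));
        return bucket.tryConsume(now);
    }
}
```

Whole-token integer arithmetic keeps the refill exact (for 30 requests/min, one token every 2 s) instead of accumulating floating-point fractions.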
We're currently at 16.8 million SIRENE establishments imported and 35,000+ communes indexed. The API handles 39 endpoints across geographic data, address validation, company search, and admin/billing.
If you're building anything that touches French addresses or company data, give it a try:
Free tier: 100 requests/day, no credit card required
Docs: https://georefer.io/docs
Sign up: https://georefer.io/#signup
Examples: https://github.com/azmoris-group/georefer-examples
In the next article, we'll deep-dive into how we query 16.8M SIRENE establishments in 66ms using PostgreSQL trigram indexes.
AZMORIS Engineering — "Software that Endures"