MCP data.gouv.fr: querying French open data from Claude Code
French open data is great in theory
data.gouv.fr. France's national open data platform. Thousands of datasets: elected officials, budgets, geography, transportation, healthcare, education. A goldmine on paper. In practice? You spend 20 minutes clicking through the interface, download a CSV, open it in a spreadsheet, realize it's the wrong file - or that the format changed since the last update, because of course nobody tells you when that happens. Start over.
Then I found out data.gouv.fr ships an official MCP server. MCP is the Model Context Protocol - an open standard that plugs external data sources directly into your AI assistant. So basically: you ask a question in plain text from your terminal, and the AI goes digging through government datasets on your behalf. I was skeptical. Tried it anyway. Works surprisingly well.
Setup: honestly pretty painless
No repo to clone. No Docker. No API key (that part surprised me). The server runs on a free public instance, no signup required. One command:
claude mcp add --transport http datagouv https://mcp.data.gouv.fr/mcp
That drops the config into ~/.claude.json. Restart Claude Code, you're good.
Quick sanity check:
claude mcp list
If datagouv shows up with http transport, you're set. If it doesn't show up, restart Claude Code fully - I got caught by this the first time, a simple /mcp reset wasn't enough.
For Claude Desktop (or Cursor, Windsurf, VS Code)
Here you need npx mcp-remote as a wrapper in the JSON config file:
{
  "mcpServers": {
    "datagouv": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.data.gouv.fr/mcp"]
    }
  }
}
Same idea for other editors.
The toolbox
Three categories. I won't write a novel about each tool - the tables speak for themselves.
Datasets (the files)
| Tool | What it does |
|---|---|
| search_datasets | Keyword search |
| get_dataset_info | Dataset metadata (title, license, dates...) |
| list_dataset_resources | List available files (CSV, JSON, XLS...) |
| query_resource_data | Query a CSV/XLSX directly via the Tabular API |
| get_resource_info | Technical details (format, size, URL) |
| download_and_parse_resource | Download and parse a JSON/JSONL file |
query_resource_data is the star here, by far. You can query a 50,000-row CSV without downloading it, with filters (exact, contains, less, greater), sorting, pagination. Kills the whole download-open-filter-close cycle.
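Under the hood this goes through the Tabular API, which you can also hit directly. A rough sketch - the resource ID below is a placeholder (grab a real one from list_dataset_resources), and the column__operator filter convention is how I understand the Tabular API's query syntax:

```shell
# Sketch of the kind of request query_resource_data makes against the Tabular API.
# RESOURCE_ID is a placeholder -- use a real ID from list_dataset_resources.
RESOURCE_ID="00000000-0000-0000-0000-000000000000"
BASE="https://tabular-api.data.gouv.fr/api/resources/${RESOURCE_ID}/data/"
# Filters follow a column__operator=value convention (exact, contains, less,
# greater), plus sorting and pagination. Preview with a small page first.
QUERY="page_size=20&nom__contains=Dupont"
echo "${BASE}?${QUERY}"
# curl -s "${BASE}?${QUERY}"   # uncomment to actually query -- no auth needed
```

Swap the commented curl in once you've checked the URL looks right; the response is paginated JSON, so you only ever pull the rows you asked for.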
Dataservices (third-party APIs)
| Tool | What it does |
|---|---|
| search_dataservices | Find registered APIs |
| get_dataservice_info | API metadata (base URL, docs) |
| get_dataservice_openapi_spec | Grab the OpenAPI spec to see available endpoints |
You can stumble upon the Address API, the Sirene API (French company registry), and directly read their OpenAPI spec to figure out how to call them. No need to go hunt for docs on some third-party site.
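Once you've read a spec, calling the API is trivial. A quick sketch with the Address API - the /search/ endpoint and its q/limit parameters are taken from its public documentation, and the address here is just an example:

```shell
# Sketch: geocoding one address via the national Address API
# (api-adresse.data.gouv.fr). Endpoint and parameters (/search/, q, limit)
# come from its published spec; the address is an arbitrary example.
Q="20 avenue de Segur Paris"
URL="https://api-adresse.data.gouv.fr/search/?q=$(echo "$Q" | tr ' ' '+')&limit=1"
echo "$URL"
# curl -s "$URL"   # returns GeoJSON with coordinates and a match score
```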
Metrics
One tool: get_metrics. Visit and download stats. Niche, but useful for checking whether a dataset is actively maintained or has been abandoned since 2019.
What does this actually look like? Validating elected officials data
Here's something I ran into this week. I had a JSON file with French deputies data - names, departments, political groups. 581 entries. The thing is, I had no idea if my data was still accurate. Deputies resign, substitutes take over, last names change after marriages... things move constantly at the National Assembly.
Instead of spending an afternoon cross-referencing everything by hand on the National Assembly website (spoiler: their search interface is... let's say "character-building"), I gave the MCP a shot.
"Repertoire National des Elus" into search_datasets. Boom. Ministry of the Interior dataset, ID 5c34c4d1634f4173183a64f1. The official source, the one that counts.
list_dataset_resources gives me the file list: one for deputies, one for senators, one for mayors, regional councilors... Each file with its ID, format, size. Standard stuff so far.
And here's where it gets interesting. query_resource_data on the deputies file. 575 rows. My local file: 581. Six too many. Problem identified in 30 seconds, without opening anything.
From there, I had it compare the names one by one. The results weren't pretty: 4 entries that weren't even deputies (former ministers lingering in the file - no clue how they got there), 13 deputies no longer in office, 21 completely missing. And around sixty names with accent or hyphen differences from the official source. Oh, and 5 departments spelled "Reunion" instead of "La Reunion". The kind of thing that goes unnoticed for months.
The kind of cleanup that eats a solid half-day by hand, between back-and-forth on the Assembly website and copy-pasting in spreadsheets. Took me 20 minutes in conversation.
The usual workflow
Same pattern every time:
search_datasets -> get_dataset_info -> list_dataset_resources -> query_resource_data
Search with short, precise keywords (the API does AND matching, so "France elected officials open data" returns nothing - "Repertoire elus" gets what you need). Identify the right dataset by reading the metadata. List the files to find the right one. Query directly with filters and pagination. That's it.
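The same chain maps onto data.gouv.fr's regular web API if you ever want to script it outside the MCP. A sketch assuming the v1 endpoints (/api/1/datasets/ with a q parameter, /api/1/datasets/{id}/ for metadata and resources), reusing the dataset ID found above:

```shell
# Sketch of the search -> info -> resources chain via data.gouv.fr's web API
# (v1 endpoints assumed from its public docs); the MCP tools expose the same data.
API="https://www.data.gouv.fr/api/1"
DATASET_ID="5c34c4d1634f4173183a64f1"   # Repertoire National des Elus
echo "${API}/datasets/?q=repertoire+elus&page_size=5"   # search_datasets
echo "${API}/datasets/${DATASET_ID}/"                   # dataset info + resource list
# curl -s either URL -- no key, no account
```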
For third-party APIs, same logic but three steps:
search_dataservices -> get_dataservice_info -> get_dataservice_openapi_spec
Discover the API, grab the spec, call it.
Things I learned the hard way
Keywords are an art form. "Assemblee nationale deputes" works. "French parliament members open data list" returns zero results. The API is strict about matching - get to the point. I burned a solid 15 minutes rephrasing queries before I understood that fewer words = more results.
Start small. page_size=20 to preview the data structure. Then scale up. I made the mistake of requesting 500 rows upfront on a dataset I didn't know yet - turned out to be zip codes, 36,000 rows, obviously didn't need any of that.
Big files aren't its strong suit. Beyond 1000 rows, you're better off with download_and_parse_resource instead of paginating 50 times. Hard cap at 50 MB per file.
Zero auth required. The public instance asks for nothing. No key, no account. Just make sure your MCP client handles HTTP streamable transport (Claude Code does natively, for others check the docs).
Metrics don't work in demo mode. Production environment only. Not a big deal, but good to know if you're wondering why get_metrics keeps throwing errors.
After a few days with it
For anyone working with French public data, this is a real shortcut. I can't see myself going back to the old way - browsing data.gouv.fr, clicking "download", opening LibreOffice Calc, squinting at columns. No thanks.
It's a standard MCP server, so it plugs into Claude Code, Claude Desktop, Cursor, VS Code, Gemini CLI. If you already have an MCP setup, 30 seconds to integrate. If you don't, solid excuse to get started.
Project link: datagouv/datagouv-mcp on GitHub
Public instance: https://mcp.data.gouv.fr/mcp