
Building a Claude Code skill to fact-check scientific claims

Published on March 5, 2026 · 11 min read
François GUERLEZ
Disclaimer

This skill relies on an LLM (Claude) to orchestrate searches and synthesize results. Despite cross-referencing multiple sources (PubMed, Semantic Scholar, web) and built-in verification steps, an LLM can still produce factual errors, shortcuts, or omissions. The generated reports are not a substitute for professional medical advice or an expert-led literature review. Use them as a starting point, not as absolute truth.

Health claims are everywhere and nobody checks them

"Spirulina is a superfood." "Coffee causes cancer." "Intermittent fasting cures everything."

We all have that friend who drops these at dinner. And honestly, fact-checking any of them is real work. You need PubMed, meta-analyses, regulatory positions from the FDA or EFSA, checking whether studies are industry-funded... That's 2 hours per claim. At least.

So I automated it. Type /science-check spirulina is a superfood in Claude Code, wait about 2 minutes, get a full report. Verdicts, evidence levels, risks, verified sources. The works.

I burned a weekend iterating on this (and an entire evening on an allowed-tools bug I'll get to later). Here's how I built it and what I learned.

What it does

A slash command /science-check that:

  1. Launches 3 research agents in parallel (meta-analyses, risks, critical analysis)
  2. Searches PubMed, Semantic Scholar, Google Scholar via MCP servers
  3. Validates key studies (retractions, funding, sample sizes)
  4. Produces a structured report with per-claim verdicts

Here's the full terminal output from /science-check is spirulina a "superfood"?:

- [x] Claim recovered and translated to English
- [x] ORIENTATION phase: 3 parallel searches launched
- [x] DEEPENING phase: reference sources fetched
- [x] VALIDATION phase: key studies verified
- [x] SELF-CHECK phase: quality checklist validated
- [x] SYNTHESIS phase: report generated

══════════════════════════════════════════════════════════════════════
 SCIENCE CHECK
══════════════════════════════════════════════════════════════════════

Claim: Spirulina is a "superfood"
Date: March 5, 2026
Overall verdict: PARTIALLY CONFIRMED

── Scientific consensus ───────────────────────────────────────────

Spirulina has documented, real benefits — primarily on lipid profiles
and blood pressure — supported by multiple meta-analyses of RCTs.
However, the term "superfood" has NO official scientific definition
— it's a marketing term. Confirmed benefits are modest and targeted,
far from the miracle food image pushed by marketing.

── Per-claim breakdown ────────────────────────────────────────────

Claim                             │ Verdict              │ Evidence level
──────────────────────────────────┼──────────────────────┼──────────────────────────────────
Lowers cholesterol/LDL/TG         │ CONFIRMED            │ Meta-analysis 20 RCTs (n=1076)
Lowers blood pressure             │ CONFIRMED            │ Meta-analysis of RCTs
Helps with weight loss            │ Partially confirmed  │ Meta-analysis 17 RCTs: −1.07 kg
Glycemic control (T2 diabetes)    │ Partially confirmed  │ Meta-analysis 8 studies
Reduces inflammation (CRP, IL-6)  │ Partially confirmed  │ Meta-analyses: modest effect
Rich in protein ("60%")           │ OVERSTATED           │ 3 g serving = 2 g protein
Vitamin B12 source                │ DEBUNKED             │ Inactive pseudo-B12
Liver detox                       │ UNPROVEN             │ No proven mechanism
General immune boost              │ UNPROVEN             │ Zero quality RCTs
Anti-cancer                       │ PREMATURE            │ In vitro only
"Superfood" (term)                │ NOT APPLICABLE       │ Purely a marketing term

── Risks and side effects ─────────────────────────────────────────

Risk                                  │ Severity         │ Population
──────────────────────────────────────┼──────────────────┼───────────────────────────────
Heavy metal contamination             │ Moderate-high    │ All (94% products contaminated)
Microcystins (cyanotoxins)            │ High             │ Chronic use ≥ 4 g/day
Autoimmune diseases                   │ Moderate-high    │ Lupus, MS, vitiligo, RA
Anticoagulant interactions            │ Moderate         │ On warfarin/aspirin
Immunosuppressant interactions        │ Moderate         │ On azathioprine/cyclosporine
Phenylketonuria (PKU)                 │ Very high        │ Absolute contraindication
Allergic reactions                    │ Variable         │ Iodine/seafood allergy
Pseudo-B12 masking deficiency         │ Low              │ Vegans

── Official positions ─────────────────────────────────────────────

ANSES (France) : Safe at moderate doses. CI: PKU, allergies.
                 Heavy metal contamination alert. (2017)
FDA (USA)      : GRAS status. Approved as food coloring.
                 Minimal supplement regulation.
EFSA (EU)      : REJECTED diabetes health claims (2013).
                 No full assessment completed.
WHO            : No specific position.

── Red flags ──────────────────────────────────────────────────────

⚠ "Superfood" = no official scientific definition
⚠ Claims-to-evidence ratio ~10:1 (50+ claims, <5 proven)
⚠ $630M → $1.4B market (likely publication bias)
⚠ 1 retracted study: "Spirulina Unleashed" (MDPI, 2024)
⚠ Weasel language: 36x "may/might/suggest", 0x "proven"
⚠ 94% of samples positive for microcystins

── Sources (14 consulted) ─────────────────────────────────────────

 [1] Spirulina & lipid profile — Meta-analysis 20 RCTs (2023)
 [2] Spirulina & cardiometabolic health — Meta-analysis (2025)
 [3] Spirulina & blood pressure — Meta-analysis RCTs (2021)
 [4] Spirulina & body composition — Meta-analysis 17 RCTs (2025)
 [5] Spirulina & type 2 diabetes — Meta-analysis (2021)
 [6] Spirulina & CRP — GRADE meta-analysis (2025)
 [7] Spirulina & inflammation — Meta-analysis RCTs (2025)
 [8] Examine.com — Evidence-based review (2025)
 [9] ANSES — Regulatory position (2017)
[10] Rubio et al. — Heavy metals (2021)
[11] Autoimmune reactions — Case reports (2025)
[12] Umbrella review — Meta-analyses (2026)
[13] EFSA — Rejected claims (2013)
[14] "Spirulina Unleashed" — Retracted (2024/2025)

══════════════════════════════════════════════════════════════════════

Bottom line: real but modest benefits (cholesterol, blood pressure)
confirmed by 15+ meta-analyses. "Superfood" = pure marketing hype.
Contamination risks are non-trivial.

Not bad for 2 minutes of waiting.

Setting up the MCP servers

The skill uses two MCP servers to hit scientific databases directly. Without them it falls back to WebSearch - still works, just less precise.

PubMed MCP (mcp-simple-pubmed)

Direct access to NCBI's Entrez API. Free, just needs an email.

# Test it works
uvx mcp-simple-pubmed --help

Paper Search MCP (paper-search-mcp)

Multi-source search: PubMed, arXiv, bioRxiv, medRxiv, Semantic Scholar, Google Scholar. The Swiss army knife for academic search.

# Test it
uvx --from paper-search-mcp python -m paper_search_mcp.server --help

The ~/.claude/mcp.json

Drop both servers into your global MCP config:

{
  "mcpServers": {
    "pubmed": {
      "command": "uvx",
      "args": ["mcp-simple-pubmed"],
      "env": {
        "PUBMED_EMAIL": "your@email.com"
      }
    },
    "paper-search": {
      "command": "uvx",
      "args": ["--from", "paper-search-mcp", "python", "-m", "paper_search_mcp.server"],
      "env": {
        "SEMANTIC_SCHOLAR_API_KEY": ""
      }
    }
  }
}

A few things to know:

  • PUBMED_EMAIL: NCBI's Entrez API needs an email. No API key, just an email to identify requests. Use yours.
  • SEMANTIC_SCHOLAR_API_KEY: optional. Works without one but with lower rate limits. Grab a free key at semanticscholar.org/product/api.
  • Both use uvx (the uv runner). Don't have uv? curl -LsSf https://astral.sh/uv/install.sh | sh.

You need to restart Claude Code after editing mcp.json: MCP servers load at startup, they don't hot-reload.
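
To confirm both servers registered after the restart, the Claude Code CLI can list them; if the config was picked up, you should see pubmed and paper-search in the output:

# List configured MCP servers and their connection status
claude mcp list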

4 files, not one big blob

The skill lives in ~/.claude/skills/science-check/ (browse on GitHub):

science-check/
├── SKILL.md              # Main instructions (105 lines)
├── REPORT_TEMPLATE.md    # Report template
├── TRUSTED_SOURCES.md    # Trusted sources by tier
└── EVIDENCE_HIERARCHY.md # Evidence levels grid

My first version? One 250-line file. Claude kept losing track, mixing up workflow phases, forgetting sections of the report. Frustrating.

Here's the thing: Claude Code loads the entire SKILL.md when the skill triggers. Every token competes with conversation history. Moving references to separate files is "progressive disclosure" - Claude only loads them when it actually needs them. That one change fixed most of my issues.
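
In practice that just means SKILL.md points at the reference files instead of inlining them; Claude reads each one with the Read tool only when it reaches the phase that needs it. A minimal sketch of the pattern (hypothetical wording, not the actual file):

## Synthesis
Read REPORT_TEMPLATE.md and follow its structure exactly.
Assign each verdict using the grid in EVIDENCE_HIERARCHY.md.
Prioritize sources according to TRUSTED_SOURCES.md.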

The SKILL.md: where it all happens

Here's the full file. The YAML frontmatter tells Claude when to trigger the skill, and the markdown body defines how to execute it:

name: science-check
description: 'Verifies a scientific or health claim by cross-referencing
  PubMed, Semantic Scholar, and the web...'
user-invocable: true
argument-hint: '[claim to verify]'
allowed-tools:
  - Agent
  - Bash
  - Read
  - WebSearch
  - WebFetch
  - AskUserQuestion
  - Write
  - mcp__pubmed__search_pubmed
  - mcp__pubmed__get_paper_fulltext
  - mcp__paper-search__search_pubmed
  - mcp__paper-search__search_arxiv
  - mcp__paper-search__search_google_scholar
  - mcp__paper-search__search_biorxiv
  - mcp__paper-search__search_medrxiv
  - mcp__paper-search__read_pubmed_paper
  - mcp__paper-search__read_biorxiv_paper
  - mcp__paper-search__read_medrxiv_paper

The full 6-phase workflow and all rules are in the SKILL.md on GitHub.

What I learned about this frontmatter (the hard way)

Make the description "pushy". Claude tends to under-trigger skills. You ask "does magnesium help with sleep?" and Claude just answers instead of using the skill. By explicitly listing "nutrition, dietary supplements, medication, therapy" in the description, you nudge the triggering. Took me 4 or 5 rewrites before it triggered consistently.
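
Roughly the difference in spirit, with hypothetical wording on both sides (not my real iterations):

# Under-triggers: too abstract
description: 'Fact-checks scientific claims.'

# Triggers reliably: names the domains people actually ask about
description: 'Verifies a scientific or health claim by cross-referencing
  PubMed, Semantic Scholar, and the web. Use for claims about nutrition,
  dietary supplements, medication, or therapy.'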

allowed-tools needs the full MCP names. This is where I lost an evening. First test: I run /science-check, agents launch, everything looks fine... except they never use PubMed. No error in the logs. Just... silence. Turns out the format is mcp__<server_name>__<tool_name> and I hadn't declared the MCP tools in allowed-tools. Claude simply didn't have permission to use them inside the skill context. No error, no warning. Nothing. Maddening.

Agent in the list. That's what enables 3 parallel searches instead of sequential. ~1 minute instead of ~3. Worth it.

The 6-phase workflow

Phase 1: Translate the claim

Nothing fancy. Scientific databases are in English. "Spirulina is good for health" becomes "spirulina health benefits evidence".

Phase 2: Orientation - 3 parallel agents

Three sub-agents are launched simultaneously, each with a different research angle:

  • Agent A searches for meta-analyses and systematic reviews (highest evidence level)
  • Agent B searches for risks and side effects (the counterpart often missing from benefit-oriented searches)
  • Agent C searches for critical analyses and debunking (confirmation bias reduction)

Each agent has access to WebSearch and the PubMed/Paper Search MCPs, provided they are declared in allowed-tools.
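
In the SKILL.md this is a single instruction block; something in this spirit (hypothetical wording, not the actual file):

## Orientation
Launch three agents IN PARALLEL (one message, three Agent tool calls):
- Agent A: meta-analyses and systematic reviews on the claim
- Agent B: risks, side effects, contraindications, drug interactions
- Agent C: critical analyses, debunkings, retractions

The "one message" part matters: Agent calls issued in separate turns run one after another, not concurrently.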

Phase 3: Deep dive

Claude fetches the best sources found, following a reliability ranking defined in TRUSTED_SOURCES.md:

  • Tier 1: Cochrane Library, PubMed, Examine.com
  • Tier 2: EFSA, FDA, ANSES, WHO
  • Tier 3: Harvard Health, Mayo Clinic, McGill OSS, NHS
  • Tier 4: Retraction Watch, Semantic Scholar (citation counts)

The paper-search MCP can pull citation counts directly from Semantic Scholar, providing a signal on a study's real-world impact.

Phase 4: Cross-validation

For each key study, Claude checks (see the sketch after this list):

  • Sample size (n=?)
  • Study type (RCT, observational, animal, in vitro)
  • Funding source (industry = potential bias)
  • Replication of results
  • Retraction status via Retraction Watch
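
As SKILL.md instructions, those checks might read like this (hypothetical wording, not the actual file):

## Validation
For each study cited in support of a verdict:
- Extract n, study type, and funding source from the abstract
- WebSearch "<first author> <year> retraction" and check Retraction Watch
- Flag industry funding and small samples in the report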

Phase 5: Self-verification

A quality checklist is evaluated before writing the report:

  • At least 3 independent sources consulted
  • At least 1 meta-analysis or systematic review found (otherwise flagged in the report)
  • No conclusion based on a single study
  • Risks and side effects identified

If any criterion fails, Claude relaunches targeted searches before proceeding to synthesis. Without this phase, I found Claude would sometimes conclude "CONFIRMED" from a single RCT with 30 participants.
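
Written out, the gate might look something like this (hypothetical wording):

## Self-check
Before synthesis, verify ALL of the following; if any check fails,
run targeted searches to fix it before continuing:
- [ ] At least 3 independent sources consulted
- [ ] At least 1 meta-analysis or systematic review (else flag it in the report)
- [ ] No verdict rests on a single study
- [ ] Risks and side effects identified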

Phase 6: Synthesis with ultrathink

The word ultrathink in the SKILL.md activates Claude's extended thinking. Synthesis requires weighing contradictory evidence (positive vs negative meta-analyses, divergent opinions between EFSA and FDA, etc.) and producing a weighted overall verdict. The report is generated following the template in REPORT_TEMPLATE.md, using the evidence grid from EVIDENCE_HIERARCHY.md.

The reference files

EVIDENCE_HIERARCHY.md

Evidence grid used to assign verdicts:

See the full grid on GitHub.
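
To give a flavor, the grid maps each verdict to the evidence needed to earn it. An illustrative excerpt, paraphrased from the verdicts used in the report above rather than quoted from the file:

CONFIRMED            : concordant meta-analyses of RCTs
Partially confirmed  : meta-analyses with modest or inconsistent effects
OVERSTATED           : the effect exists but is far smaller than claimed
UNPROVEN             : no quality RCTs, no established mechanism
PREMATURE            : in vitro or animal data only
DEBUNKED             : quality evidence contradicts the claim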

TRUSTED_SOURCES.md

Trusted sources ranked by consultation priority:

See the full ranking on GitHub.

REPORT_TEMPLATE.md

The template Claude follows to generate the final report:

See the full template on GitHub.

Try it

Restart Claude Code, then:

/science-check intermittent fasting helps with weight loss

Claude shows a progress checklist, launches 3 background agents (you see notifications as they complete), runs cross-validations, and produces the full report. About 1-2 minutes depending on topic complexity.

What I learned

The description is 80% of the work. I spent more time tweaking those 3 lines of YAML than writing the entire workflow. If Claude doesn't trigger the skill, nothing else matters.

Sub-agents are a game changer. Sequential searches -> 3 parallel agents = 3x faster, with better quality because each agent has its own dedicated angle. Catch: without Agent in allowed-tools, it silently falls back to sequential. No warning.

Self-verification isn't optional. I almost removed it to save tokens. Bad idea. It's the phase that stops Claude from concluding "CONFIRMED" based on an in vitro study with 12 mice.

The silent MCP debugging trap. When an MCP tool is missing from allowed-tools, there's no error. Claude just... doesn't use it. Burned an evening on this. Check your declared tools.

Limits

This doesn't replace a doctor. The report is only as good as what's available online, and Claude can misread a study. But for a first filter - "is this worth bringing up with my doctor?" - it's become a reflex.

Next up: formal evals with Anthropic's skill-creator framework, a cache to avoid hammering PubMed with duplicate queries, and maybe a web version via the Agent SDK for non-devs.

If you work in a field where you need to verify claims - health, nutrition, but also finance, law, tech - the pattern transfers: parallel multi-angle search, cross-validation, self-verification, structured report. The skill changes, the skeleton stays.

