Methodology

This page is the authoritative statement of how the ME/CFS Atlas is built, what its versions are, what its limits are, what data it publishes, and how to cite a specific snapshot. It is the page a researcher, journalist, or developer should land on first.

Methodology policy version v0.1 · Last reviewed 2026-04-19

1. What the atlas is (and is not)

The atlas is a structured, machine-generated index of published ME/CFS research. Every per-study summary is drafted by a program from the paper’s title and abstract. Classifications — evidence level, research approach, research paradigm, PEM criterion, case-definition quality — are assigned by the same automated pipeline. No study has been reviewed by a clinician or peer reviewer.

It is not a systematic review, a clinical resource, a guideline, or medical advice. See /editorial-policy for the full editorial contract.

2. Pipeline overview

PubMed is queried daily. Matching records enter a nine-stage automated pipeline: fetch, normalise, deduplicate, lineage check, classify, summarise, draft, moderate (lint-gate enforce), and auto-publish. Retracted and withdrawn studies are flagged at or after publication via PubMed title-prefix detection and Crossref updated-by metadata, then surfaced with a red banner on their study page and indexed on the /retractions page.

Step-by-step narrative: /process.
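
The title-prefix half of the retraction check can be sketched as follows. The exact prefix strings the pipeline matches are not published on this page, so the pattern below is an illustrative approximation of common PubMed title conventions, not the atlas's actual rule:

```python
import re

# Common retraction/withdrawal prefixes seen on PubMed titles. The list the
# atlas pipeline actually matches is not published here; treat this as a sketch.
PREFIX_RE = re.compile(r"^\s*(retracted(\s+article)?|withdrawn)\s*[:.]", re.IGNORECASE)

def looks_retracted(title: str) -> bool:
    """Heuristic: does this title carry a retraction/withdrawal prefix?"""
    return bool(PREFIX_RE.match(title))

print(looks_retracted("RETRACTED: Mitochondrial findings in ME/CFS"))
print(looks_retracted("Mitochondrial findings in ME/CFS"))
```

A prefix check alone misses retractions the journal never stamped into the title, which is why the pipeline pairs it with Crossref updated-by metadata.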

3. Versions

Active at the time of this render:

  • Generator v1 — summariser prompt (gen.v1.2026-04-16).
  • Scanner v1.4 — deterministic wording-risk scanner (pass-b.v1.4.2026-04-17).
  • Classifier — Claude Haiku 4.5 (claude-haiku-4-5-20251001), applied to title + abstract for evidence-level / publication-type / PEM / paradigm assignment.
  • Lint gate — enforce. Only the literal env value off disables automatic blocking of CRITICAL-flagged drafts.
  • Editorial policy — version v0.1. See /editorial-policy.
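
The lint-gate rule above is strict by design: anything other than the literal value off keeps blocking on. A minimal sketch of that decision, with illustrative flag names (the real gate's internals are not published here):

```python
import os

def lint_gate_blocks(flags: list[str]) -> bool:
    """Return True if the auto-publish gate should block this draft.

    Per the policy above, only the literal env value "off" disables
    blocking; any other value, including unset, keeps the gate enforced.
    The flag names are illustrative, not the pipeline's actual schema.
    """
    if os.environ.get("LINT_GATE_MODE") == "off":
        return False
    return "CRITICAL" in flags
```

Note that a misspelled value such as Off or false still enforces the gate, which fails safe.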

4. PubMed query

The atlas seeds its corpus from a single, fixed, version-controlled PubMed search term. Anyone with access to PubMed can reproduce the seed:

("myalgic encephalomyelitis"[Title/Abstract] OR "chronic fatigue syndrome"[Title/Abstract] OR "ME/CFS"[Title/Abstract] OR "post-exertional malaise"[Title/Abstract] OR "systemic exertion intolerance"[Title/Abstract])

The query captures the five canonical title/abstract phrases. Papers that use only a non-listed synonym (e.g. “SEID” without any of the listed phrases) are not indexed; this is a known coverage gap.
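Anyone can also reproduce the seed programmatically against NCBI's public E-utilities ESearch endpoint. A minimal sketch that builds the request URL (the atlas's own fetch stage may differ; only the query term itself is taken verbatim from this page):

```python
from urllib.parse import urlencode

# The atlas's fixed seed query, verbatim from this page.
SEED_QUERY = (
    '("myalgic encephalomyelitis"[Title/Abstract] '
    'OR "chronic fatigue syndrome"[Title/Abstract] '
    'OR "ME/CFS"[Title/Abstract] '
    'OR "post-exertional malaise"[Title/Abstract] '
    'OR "systemic exertion intolerance"[Title/Abstract])'
)

def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an NCBI E-utilities ESearch URL for a PubMed term."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    )

print(esearch_url(SEED_QUERY))
```

Fetching that URL returns matching PMIDs as JSON; the deduplication, classification, and summarisation stages described above sit downstream of this seed.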

5. Key limitations

  • Every per-study summary is machine-drafted from the abstract. It may miss material from the full text.
  • Classifier labels (evidence level, paradigm, approach) are LLM output. They may be wrong on specific studies. See the “Report a mistake” path on each study page.
  • Evidence level reflects study type, not methodological quality — a weak review still sits at E0.
  • The “biomedical” paradigm bucket is narrower than “all biologically relevant research”; many biologically relevant observational cohorts land in the “neutral” bucket because the paper itself takes no explicit stance.
  • Retraction detection covers the PubMed title prefix + Crossref updated-by field. Retractions not recorded in either source will be missed until they are.
  • The atlas does not track longitudinal corpus snapshots. A saved JSON export is the snapshot.

6. Open data

The full public atlas is available as structured JSON. The shape is documented inline in the response, and fields match this methodology page. Retracted studies are included with the isRetracted flag set and a short reason; it is up to the consumer to decide whether to filter them.

Public atlas total on this snapshot: 6,119 non-retracted public studies (6,129 including 10 flagged retracted).
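
Filtering the export is the consumer's job. A minimal sketch of dropping retracted entries, assuming only what this page documents: the export is a JSON collection of study records carrying an isRetracted flag and a short reason. All other field names below are placeholders:

```python
import json

# Illustrative records: only the isRetracted flag (and a reason on retracted
# entries) is documented on this page; the other fields are placeholders.
snapshot = json.loads("""
[
  {"id": "study-1", "title": "Example cohort study", "isRetracted": false},
  {"id": "study-2", "title": "RETRACTED: Example trial", "isRetracted": true,
   "retractionReason": "data integrity"}
]
""")

# Keep only non-retracted studies; a different consumer might instead keep
# retracted entries and surface the reason.
active = [s for s in snapshot if not s.get("isRetracted")]
print(len(snapshot), len(active))
```

Using `.get("isRetracted")` tolerates records where the flag is absent, treating them as non-retracted.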

7. How to cite

Cite the atlas with a specific snapshot date so the version stamp is reproducible:

ME/CFS Atlas. Generator v1 / Scanner v1.4 / policy v0.1. Accessed 2026-04-19. https://www.mecfsatlas.com/

For a specific study, cite the underlying paper (the DOI / PubMed link on that study page). The atlas is a reading aid that sits on top of the published record, not a substitute for it.

8. Methodology changelog

Every substantive methodology change is recorded below. Version strings are the canonical identifiers, recorded in each subsystem's source file and in exported data.

  • 2026-04-18

    Retraction & Lineage v1.0 (Phase 1A–1D)

    Added retraction detection via PubMed title prefix + Crossref updated-by. Populated RETRACTION_OF lineage edges between atlas studies. Public /retractions index. Retraction banner + noindex on retracted study pages. Self-maintaining cron via /api/ingest/lineage post-publish scan. Two new invariant checks (retraction log provenance + retraction-edge consistency).

  • 2026-04-17

    Scanner v1.4 (pass-b.v1.4.2026-04-17)

    Added noun-form suppression for 'causes' and 'triggers' patterns. Reduces CAUS_LEAK / efficacy-leak false positives without losing true-positive coverage.

  • 2026-04-16

    Generator v1 (gen.v1.2026-04-16)

    Tighter summariser prompt. Ten non-negotiable discipline rules, including: prefer association language over mechanism claims; hedge causal language; explicitly bridge non-ME/CFS populations back to ME/CFS with a qualifier; no invented numbers; no stance labels on individual studies.

  • 2026-04-16

    Lint gate: enforce

    Auto-moderation gate now blocks any draft with a CRITICAL wording flag from auto-publish. Env escape hatch: set LINT_GATE_MODE=off.

  • 2026-04-15

    Editorial policy v0.1

    First versioned editorial policy. Anonymous operator language. Retired reviewedStatus enum from all public trust surfaces. 'What is human-reviewed' section replaced with 'What is hand-corrected (and what is not)'.

9. Reporting a methodology issue

If a methodology claim on this page, a published version string, an export field semantic, or an individual study’s classification looks wrong, please contact the operator. Confirmed issues land in this changelog with a dated entry.