Clinical Trials Landscape

Map the trial landscape by mechanism, phase and sponsor.

Overview

Problem. What's in development for a disease; what's the competition?

Use when: Surveying active clinical trials
Avoid when: Browsing the website by hand

Learning goals

Figures

Trial Landscape Overview
Script Workflow
Mechanism Classification
Output Deliverables
Reading the Landscape
Interpretation Caveats

Tutorial

When to Use This Skill

Do NOT use for: - Detailed single-trial protocol analysis

  • Efficacy/safety comparisons (requires literature review skill)

Installation

Software Version License Commercial Use Installation
pandas ≥1.3 BSD-3 ✅ Permitted pip install pandas
requests ≥2.25 Apache-2.0 ✅ Permitted pip install requests
numpy ≥1.20 BSD-3 ✅ Permitted pip install numpy
plotnine ≥0.10 MIT ✅ Permitted pip install plotnine
plotnine-prism ≥0.3 MIT ✅ Permitted pip install plotnine-prism
seaborn ≥0.11 BSD-3 ✅ Permitted pip install seaborn
matplotlib ≥3.4 PSF ✅ Permitted pip install matplotlib
reportlab ≥3.6 BSD ✅ Permitted pip install reportlab
pyyaml ≥5.0 MIT ✅ Permitted pip install pyyaml
pip install pandas requests numpy plotnine plotnine-prism seaborn matplotlib reportlab pyyaml

System requirements: Internet connection for ClinicalTrials.gov API calls.


Inputs

Required:

  • Disease / condition terms — list of conditions to search ClinicalTrials.gov

Optional:

  • Disease config — pre-built config ID (e.g., "ibd") for mechanism taxonomy, or None for generic
  • Mechanism filter — e.g., "Anti-IL-23 (p19)", "Anti-TL1A", "JAK Inhibitor"
  • Sponsor filter — e.g., "Takeda", "AbbVie"
  • Status filter — Default: all active (Recruiting + Active not recruiting + Not yet recruiting)
  • Phase filter — Phase 1, 2, 3, 4

Outputs

Visualizations (PNG + SVG):

  • landscape_overview.png/.svg — 6-panel landscape figure (300 DPI)
  • Mechanism × Phase heatmap, top sponsors, phase stacked bars, mechanism counts, timeline, sponsor type
  • landscape_supplementary.png/.svg — 4-panel supplementary figure
  • Top 15 countries, study design by phase, enrollment distribution, phase transition funnel

Results (CSV):

  • trials_all.csv — All trials with 46 columns (mechanism, phase, sponsor, geography, study design, arms, endpoints, eligibility, regulatory)
  • trials_by_mechanism.csv — Mechanism × phase cross-tabulation
  • trials_by_sponsor.csv — Sponsor summary with trial counts
  • trials_filtered.csv — Filtered subset (if mechanism/sponsor filter applied)

Reports:

  • landscape_report.pdf — Publication-quality PDF with 24 sections: executive summary, mechanism deep-dives, geographic landscape, study design, phase transition funnel, endpoint comparison, combination therapies, biosimilar assessment, whitespace analysis, and more
  • landscape_report.md — Markdown version with identical 24-section structure

Analysis objects (Pickle):

  • analysis_object.pkl — Complete landscape for downstream use
  • Load with: import pickle; obj = pickle.load(open('analysis_object.pkl', 'rb'))
  • Contains: trials_df (46 columns), mechanism/phase/sponsor distributions, geographic stats, design stats, parameters

Clarification Questions

  1. Data Source (ASK THIS FIRST):

    • This skill queries the ClinicalTrials.gov API v2 directly (free, no key needed).
    • Use live API data? (recommended, ~30 seconds)
    • Or use cached demo data? Pre-loaded IBD landscape snapshot for quick demo
  2. Disease Area:

    • Which disease area to analyze?
    • a) IBD (Inflammatory Bowel Disease) — pre-built config with 14 mechanism classes
    • b) Oncology (generic intervention-type classification)
    • c) Autoimmune / Rheumatology (generic classification)
    • d) Other (specify disease and condition terms)
  3. Scope (if IBD selected): - Which conditions?

    • a) All IBD (Crohn's, UC, and IBD unspecified) — recommended
    • b) Crohn's Disease only
    • c) Ulcerative Colitis only
    • (If other disease) — Provide list of condition search terms
  4. Focus:

    • Any mechanism or sponsor to highlight?
    • (IBD) a) Anti-IL-23 — recommended for demo | b) Anti-TL1A | c) All mechanisms
    • (Other) Specify or skip highlighting

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN - DO NOT WRITE INLINE CODE 🚨

Step 1 — Load config and query ClinicalTrials.gov:

import sys; sys.path.insert(0, ".")
from scripts.disease_config import load_disease_config, get_default_conditions
from scripts.query_clinicaltrials import query_trials

# Load disease config (use "ibd" for IBD, or None for generic)
config = load_disease_config("ibd")

# Get conditions from config or specify manually
conditions = get_default_conditions(config) or ["Crohn's Disease", "Ulcerative Colitis", "Inflammatory Bowel Disease"]

raw_trials = query_trials(
    conditions=conditions,
    statuses=["RECRUITING", "ACTIVE_NOT_RECRUITING", "ENROLLING_BY_INVITATION", "NOT_YET_RECRUITING"],
)

✅ VERIFICATION: "✓ Retrieved {N} trials from ClinicalTrials.gov"

Step 2 — Classify and compile:

from scripts.classify_mechanisms import classify_all
from scripts.compile_trials import compile_trials

classified = classify_all(raw_trials, config=config)
trials_df = compile_trials(classified, output_dir="landscape_results")

DO NOT write inline classification code. The script loads mechanism taxonomy from config.

✅ VERIFICATION: "✓ Trial data compiled successfully!"

Step 3 — Generate visualizations:

from scripts.generate_landscape_plots import generate_landscape_plots

generate_landscape_plots(
    trials_df,
    output_dir="landscape_results",
    highlight_mechanism="Anti-IL-23 (p19)",  # or None for no highlight
    highlight_sponsor=None,                   # or "Takeda" to highlight
    config=config,
)

DO NOT write inline plotting code. The script handles all 6 panels + PNG/SVG export.

✅ VERIFICATION: "✓ All landscape visualizations generated successfully!"

Step 4 — Export results:

from scripts.export_all import export_all

export_all(
    trials_df,
    parameters={
        "conditions": conditions,
        "statuses": ["RECRUITING", "ACTIVE_NOT_RECRUITING", "ENROLLING_BY_INVITATION", "NOT_YET_RECRUITING"],
        "highlight_mechanism": "Anti-IL-23 (p19)",
    },
    output_dir="landscape_results",
    config=config,
)

DO NOT write custom export code. Use export_all().

✅ VERIFICATION: "=== Export Complete ==="


⚠️ CRITICAL — DO NOT:


⚠️ IF SCRIPTS FAIL — Script Failure Hierarchy:

  1. Fix and Retry (90%) — Install missing package, re-run script
  2. Modify Script (5%) — Edit the script file itself, document changes
  3. Use as Reference (4%) — Read script, adapt approach, cite source
  4. Write from Scratch (1%) — Only if genuinely impossible, explain why

NEVER skip directly to writing inline code without trying the script first.


Common Issues

Error Cause Solution
ConnectionError / Timeout ClinicalTrials.gov unreachable Check internet connection; retry after 30 seconds
HTTP 429 Too Many Requests Rate limit exceeded Increase RATE_LIMIT_DELAY in query_clinicaltrials.py
ModuleNotFoundError: plotnine Missing visualization package pip install plotnine plotnine-prism
Empty results (0 trials) Overly restrictive filters Broaden condition/status/phase filters
Many "Unclassified" mechanisms No disease config or new drugs Use a disease config (e.g., "ibd") or update disease_configs/*.yaml
SVG export failed Missing SVG backend Normal — PNG is always generated as fallback
Sponsor name variants Same company, different names Update SPONSOR_NORMALIZATION in compile_trials.py
ModuleNotFoundError: yaml Missing pyyaml pip install pyyaml

Interpretation Guidelines


Suggested Next Steps

  1. Deep-dive a mechanism — Use literature-preclinical to review mechanism biology
  2. Track a sponsor's full pipeline — Use development-landscape for broader pipeline view
  3. Biomarker analysis — Use lasso-biomarker-panel to identify response biomarkers from trial data
  4. Export to presentation — Use landscape_report.md and plots for stakeholder review

Related Skills


References

Code preview

scripts/__init__.py

# clinicaltrials-landscape scripts package

scripts/classify_mechanisms.py

"""
Classify clinical trial interventions by mechanism of action.

Supports config-driven taxonomy (disease-specific patterns loaded from
disease_configs/*.yaml) and generic fallback classification by
intervention type when no config is available.
"""

import re

try:
    from disease_config import get_mechanism_patterns, get_drug_normalization
except ImportError:
    from scripts.disease_config import get_mechanism_patterns, get_drug_normalization


# ============================================================
# MODULE-LEVEL STATE (set by configure())
# ============================================================
# Taxonomy and drug normalization loaded from disease config.
# When empty, classify_mechanism() uses generic intervention-type fallback.

_mechanism_patterns = []
_drug_normalization = {}
_configured = False


def configure(config):
    """
    Load mechanism taxonomy and drug normalization from disease config.

    Parameters
    ----------
    config : dict or None
        Parsed disease config from load_disease_config().
        If None, clears taxonomy (uses generic fallback).
    """
    global _mechanism_patterns, _drug_normalization, _configured
    _mechanism_patterns = get_mechanism_patterns(config)
    _drug_normalization = get_drug_normalization(config)
    _configured = True


# Phase normalization mapping
PHASE_MAP = {
    "EARLY_PHASE1": "Phase 1",
    "PHASE1": "Phase 1",
    "PHASE2": "Phase 2",
    "PHASE3": "Phase 3",
    "PHASE4": "Phase 4",
    "NA": "Not Applicable",
}


def _normalize_drug_name(name):
    """Normalize drug name to canonical form using loaded config."""
    if not name:
        return name
    for pattern, canonical in _drug_normalization.items():
        if re.match(pattern, name.strip()):
            return canonical
    return name.strip()


def _is_biosimilar(interventions, brief_title="", official_title=""):
    """Detect if a trial involves a biosimilar product."""
    corpus = " ".join([
        *[intv.get("name", "") + " " + intv.get("description", "") for intv in interventions],
        brief_title, official_title
    ]).lower()
    biosimilar_patterns = [
        "biosimilar", "ct-p13", "sb2", "sb5", "abp 501", "gp2017",
        "remsima", "inflectra", "renflexis", "avsola", "ixifi",
        "hadlima", "hyrimoz", "cyltezo", "amjevita", "idacio",
        "similar biologic", "proposed biosimilar",
    ]
    return any(p in corpus for p in biosimilar_patterns)


def classify_mechanism(interventions, brief_title="", official_title=""):

scripts/compile_trials.py

"""
Compile, deduplicate, and structure clinical trial data (Step 2b).

Produces a clean DataFrame with normalized phases, mechanisms,
sponsor names, enrollment cleaning, and enriched fields from
the API (geography, study design, arms, endpoints, eligibility).
Deduplicates by NCT ID.
"""

import os
import re
import pandas as pd
import numpy as np


# Sponsor normalization: maps lowercase substrings to canonical names
SPONSOR_NORMALIZATION = {
    "takeda": "Takeda",
    "takeda pharmaceutical": "Takeda",
    "takeda development center": "Takeda",
    "millennium pharmaceuticals": "Takeda",
    "abbvie": "AbbVie",
    "johnson & johnson": "J&J / Janssen",
    "janssen": "J&J / Janssen",
    "janssen research": "J&J / Janssen",
    "janssen-cilag": "J&J / Janssen",
    "janssen biotech": "J&J / Janssen",
    "eli lilly": "Eli Lilly",
    "lilly": "Eli Lilly",
    "pfizer": "Pfizer",
    "bristol-myers squibb": "Bristol-Myers Squibb",
    "bristol myers squibb": "Bristol-Myers Squibb",
    "gilead": "Gilead Sciences",
    "gilead sciences": "Gilead Sciences",
    "roche": "Roche / Genentech",
    "genentech": "Roche / Genentech",
    "astrazeneca": "AstraZeneca",
    "novartis": "Novartis",
    "amgen": "Amgen",
    "merck sharp": "Merck / MSD",
    "merck & co": "Merck / MSD",
    "msd": "Merck / MSD",
    "sanofi": "Sanofi",
    "regeneron": "Regeneron",
    "boehringer ingelheim": "Boehringer Ingelheim",
    "arena pharmaceuticals": "Arena / Pfizer",
    "celgene": "Bristol-Myers Squibb",
    "galapagos": "Galapagos",
    "prometheus biosciences": "Merck / MSD",
    "teva": "Teva",
    "teva branded": "Teva",
    "teva pharmaceutical": "Teva",
}

# Phase numeric ordering
PHASE_NUMERIC = {
    "Phase 1": 1,
    "Phase 1/2": 1.5,
    "Phase 2": 2,
    "Phase 2/3": 2.5,
    "Phase 3": 3,
    "Phase 3/4": 3.5,
    "Phase 4": 4,
    "Not Applicable": 0,
}

# Region mapping for geographic analysis
COUNTRY_TO_REGION = {
    # North America
    "United States": "North America", "Canada": "North America", "Mexico": "North America",
    # Western Europe
    "United Kingdom": "Western Europe", "Germany": "Western Europe", "France": "Western Europe",
    "Italy": "Western Europe", "Spain": "Western Europe", "Netherlands": "Western Europe",
    "Belgium": "Western Europe", "Switzerland": "Western Europe", "Austria": "Western Europe",
    "Ireland": "Western Europe", "Sweden": "Western Europe", "Denmark": "Western Europe",
    "Norway": "Western Europe", "Finland": "Western Europe", "Portugal": "Western Europe",
    "Greece": "Western Europe", "Luxembourg": "Western Europe",
    # Eastern Europe
    "Poland": "Eastern Europe", "Czech Republic": "Eastern Europe", "Czechia": "Eastern Europe",
    "Hungary": "Eastern Europe", "Romania": "Eastern Europe", "Bulgaria": "Eastern Europe",

Companion files

TypePathBytes
Markdownreferences/api-parameters.md5,684
Markdownreferences/mechanisms.md5,047
Markdownreferences/output-schema.md7,461
Pythonscripts/__init__.py43
Pythonscripts/classify_mechanisms.py11,773
Pythonscripts/compile_trials.py22,885
Pythonscripts/disease_config.py4,799
Pythonscripts/export_all.py9,705
Pythonscripts/generate_landscape_plots.py26,198
Pythonscripts/generate_pdf_report.py63,886
Pythonscripts/generate_report.py83,351
Pythonscripts/query_clinicaltrials.py10,621
MarkdownSKILL.md10,441
JSONskill.meta.json2,618