Clinical Trials Landscape

Map the trial landscape by mechanism, phase and sponsor.

Overview

Problem. What is in development for a disease, and who is the competition?

Use when: Surveying active clinical trials

Avoid when: Browsing the website by hand

Learning goals

Place findings in a clinical and industry context
Run programmatic queries that are reproducible and updatable

Tutorial

When to Use This Skill

Map competitive landscape across therapeutic mechanisms for any disease
Track specific mechanism classes (e.g., anti-IL23, anti-TL1A, JAK inhibitors)
Identify sponsors and their pipeline positions by phase
Phase distribution analysis for business development diligence
Pipeline monitoring for a specific sponsor's disease portfolio
Pre-built disease configs available (IBD with 14 mechanism classes); generic mode for any other disease

Do NOT use for:

Detailed single-trial protocol analysis
Efficacy/safety comparisons (requires literature review skill)

Installation

Software	Version	License	Commercial Use	Installation
pandas	≥1.3	BSD-3	✅ Permitted	`pip install pandas`
requests	≥2.25	Apache-2.0	✅ Permitted	`pip install requests`
numpy	≥1.20	BSD-3	✅ Permitted	`pip install numpy`
plotnine	≥0.10	MIT	✅ Permitted	`pip install plotnine`
plotnine-prism	≥0.3	MIT	✅ Permitted	`pip install plotnine-prism`
seaborn	≥0.11	BSD-3	✅ Permitted	`pip install seaborn`
matplotlib	≥3.4	PSF	✅ Permitted	`pip install matplotlib`
reportlab	≥3.6	BSD	✅ Permitted	`pip install reportlab`
pyyaml	≥5.0	MIT	✅ Permitted	`pip install pyyaml`

pip install pandas requests numpy plotnine plotnine-prism seaborn matplotlib reportlab pyyaml

System requirements: Internet connection for ClinicalTrials.gov API calls.

Inputs

Required:

Disease / condition terms — list of conditions to search ClinicalTrials.gov

Optional:

Disease config — pre-built config ID (e.g., "ibd") for mechanism taxonomy, or None for generic
Mechanism filter — e.g., "Anti-IL-23 (p19)", "Anti-TL1A", "JAK Inhibitor"
Sponsor filter — e.g., "Takeda", "AbbVie"
Status filter — Default: all active (Recruiting + Active, not recruiting + Enrolling by invitation + Not yet recruiting)
Phase filter — Phase 1, 2, 3, 4

Outputs

Visualizations (PNG + SVG):

landscape_overview.png/.svg — 6-panel landscape figure (300 DPI)
Mechanism × Phase heatmap, top sponsors, phase stacked bars, mechanism counts, timeline, sponsor type
landscape_supplementary.png/.svg — 4-panel supplementary figure
Top 15 countries, study design by phase, enrollment distribution, phase transition funnel

Results (CSV):

trials_all.csv — All trials with 46 columns (mechanism, phase, sponsor, geography, study design, arms, endpoints, eligibility, regulatory)
trials_by_mechanism.csv — Mechanism × phase cross-tabulation
trials_by_sponsor.csv — Sponsor summary with trial counts
trials_filtered.csv — Filtered subset (if mechanism/sponsor filter applied)
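
To illustrate what a mechanism × phase cross-tabulation contains, here is a minimal standard-library sketch. It is not the code used by the skill's scripts; the column names `mechanism` and `phase` and the sample rows are assumptions made for the example.

```python
from collections import Counter


def cross_tabulate(rows, row_key="mechanism", col_key="phase"):
    """Count rows per (row_key, col_key) pair and return a nested dict."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    table = {}
    for (row_val, col_val), n in counts.items():
        table.setdefault(row_val, {})[col_val] = n
    return table


# Hypothetical rows; real ones could come from csv.DictReader over trials_all.csv
rows = [
    {"mechanism": "Anti-IL-23 (p19)", "phase": "Phase 3"},
    {"mechanism": "Anti-IL-23 (p19)", "phase": "Phase 3"},
    {"mechanism": "Anti-IL-23 (p19)", "phase": "Phase 2"},
    {"mechanism": "JAK Inhibitor", "phase": "Phase 2"},
]
table = cross_tabulate(rows)
```

Each outer key is a mechanism class and each inner key a phase, so `table["Anti-IL-23 (p19)"]["Phase 3"]` gives the number of trials in that cell.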

Reports:

landscape_report.pdf — Publication-quality PDF with 24 sections: executive summary, mechanism deep-dives, geographic landscape, study design, phase transition funnel, endpoint comparison, combination therapies, biosimilar assessment, whitespace analysis, and more
landscape_report.md — Markdown version with identical 24-section structure

Analysis objects (Pickle):

analysis_object.pkl — Complete landscape for downstream use
Load with: import pickle; obj = pickle.load(open('analysis_object.pkl', 'rb'))
Contains: trials_df (46 columns), mechanism/phase/sponsor distributions, geographic stats, design stats, parameters
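
The one-line load shown above leaves the file handle open until it is garbage-collected. A slightly more careful sketch uses a context manager. The dictionary keys in the stand-in object (`trials_df`, `parameters`) are assumptions based on the description above, and the demo file is written to a temporary directory so the example runs without the real export. Only unpickle files you produced yourself, since `pickle.load` can execute code embedded in the file.

```python
import os
import pickle
import tempfile


def load_analysis_object(path):
    """Load the pickled landscape object and close the file afterwards."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in object for demonstration only; the real file is written by export_all().
demo_path = os.path.join(tempfile.mkdtemp(), "analysis_object.pkl")
with open(demo_path, "wb") as fh:
    pickle.dump(
        {"trials_df": [], "parameters": {"conditions": ["Crohn's Disease"]}},
        fh,
    )

obj = load_analysis_object(demo_path)
print(sorted(obj))  # lists the top-level keys of the loaded object
```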

Clarification Questions

Data Source (ASK THIS FIRST):
- This skill queries the ClinicalTrials.gov API v2 directly (free, no key needed).
- Use live API data? (recommended, ~30 seconds)
- Or use cached demo data? Pre-loaded IBD landscape snapshot for quick demo
Disease Area:
- Which disease area to analyze?
- a) IBD (Inflammatory Bowel Disease) — pre-built config with 14 mechanism classes
- b) Oncology (generic intervention-type classification)
- c) Autoimmune / Rheumatology (generic classification)
- d) Other (specify disease and condition terms)
Scope (if IBD selected):
- Which conditions?
- a) All IBD (Crohn's, UC, and IBD unspecified) — recommended
- b) Crohn's Disease only
- c) Ulcerative Colitis only
- (If other disease) — Provide list of condition search terms
Focus:
- Any mechanism or sponsor to highlight?
- (IBD) a) Anti-IL-23 — recommended for demo | b) Anti-TL1A | c) All mechanisms
- (Other) Specify or skip highlighting

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN - DO NOT WRITE INLINE CODE 🚨

Step 1 — Load config and query ClinicalTrials.gov:

import sys; sys.path.insert(0, ".")
from scripts.disease_config import load_disease_config, get_default_conditions
from scripts.query_clinicaltrials import query_trials

# Load disease config (use "ibd" for IBD, or None for generic)
config = load_disease_config("ibd")

# Get conditions from config or specify manually
conditions = get_default_conditions(config) or ["Crohn's Disease", "Ulcerative Colitis", "Inflammatory Bowel Disease"]

raw_trials = query_trials(
    conditions=conditions,
    statuses=["RECRUITING", "ACTIVE_NOT_RECRUITING", "ENROLLING_BY_INVITATION", "NOT_YET_RECRUITING"],
)

✅ VERIFICATION: "✓ Retrieved {N} trials from ClinicalTrials.gov"
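
For orientation only, the following sketch shows roughly what a paginated query against the ClinicalTrials.gov API v2 `studies` endpoint looks like. It is not the implementation of `query_trials()`, and the scripts remain the supported path. The helper names (`build_params`, `iter_studies`, `fetch_with_requests`) are hypothetical, and the way conditions are joined into a single `query.cond` expression is an assumption. The page fetcher is passed in as a function so the paging logic can be exercised offline.

```python
API_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_params(conditions, statuses, page_size=100, page_token=None):
    """Build query parameters for one page of results."""
    params = {
        # Assumption: quoted condition terms are combined with OR
        "query.cond": " OR ".join(f'"{c}"' for c in conditions),
        "filter.overallStatus": ",".join(statuses),
        "pageSize": page_size,
    }
    if page_token:
        params["pageToken"] = page_token
    return params


def iter_studies(fetch_page, conditions, statuses, page_size=100):
    """Yield studies across all pages; fetch_page(params) returns parsed JSON."""
    token = None
    while True:
        payload = fetch_page(build_params(conditions, statuses, page_size, token))
        yield from payload.get("studies", [])
        token = payload.get("nextPageToken")
        if not token:
            break


def fetch_with_requests(params):
    """Live fetcher; requests is imported lazily so the demo below runs offline."""
    import requests

    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()


# Offline demonstration with two fake pages keyed by page token
_pages = {
    None: {"studies": [{"id": 1}, {"id": 2}], "nextPageToken": "t2"},
    "t2": {"studies": [{"id": 3}]},
}


def fake_fetch(params):
    return _pages[params.get("pageToken")]


demo = list(iter_studies(fake_fetch, ["Crohn's Disease"], ["RECRUITING"]))
```

Swapping `fake_fetch` for `fetch_with_requests` would issue live requests; the loop stops when a response no longer carries a `nextPageToken`.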

Step 2 — Classify and compile:

from scripts.classify_mechanisms import classify_all
from scripts.compile_trials import compile_trials

classified = classify_all(raw_trials, config=config)
trials_df = compile_trials(classified, output_dir="landscape_results")

DO NOT write inline classification code. The script loads mechanism taxonomy from config.

✅ VERIFICATION: "✓ Trial data compiled successfully!"

Step 3 — Generate visualizations:

from scripts.generate_landscape_plots import generate_landscape_plots

generate_landscape_plots(
    trials_df,
    output_dir="landscape_results",
    highlight_mechanism="Anti-IL-23 (p19)",  # or None for no highlight
    highlight_sponsor=None,                   # or "Takeda" to highlight
    config=config,
)

DO NOT write inline plotting code. The script handles all 6 panels + PNG/SVG export.

✅ VERIFICATION: "✓ All landscape visualizations generated successfully!"

Step 4 — Export results:

from scripts.export_all import export_all

export_all(
    trials_df,
    parameters={
        "conditions": conditions,
        "statuses": ["RECRUITING", "ACTIVE_NOT_RECRUITING", "ENROLLING_BY_INVITATION", "NOT_YET_RECRUITING"],
        "highlight_mechanism": "Anti-IL-23 (p19)",
    },
    output_dir="landscape_results",
    config=config,
)

DO NOT write custom export code. Use export_all().

✅ VERIFICATION: "=== Export Complete ==="

⚠️ CRITICAL — DO NOT:

❌ Write inline classification code → STOP: Use classify_all() from scripts
❌ Write inline plotting code (ggplot, plt, sns) → STOP: Use generate_landscape_plots()
❌ Write custom export code → STOP: Use export_all()
❌ Try to scrape ClinicalTrials.gov HTML → Use the API via query_trials()

⚠️ IF SCRIPTS FAIL — Script Failure Hierarchy:

Fix and Retry (90%) — Install missing package, re-run script
Modify Script (5%) — Edit the script file itself, document changes
Use as Reference (4%) — Read script, adapt approach, cite source
Write from Scratch (1%) — Only if genuinely impossible, explain why

NEVER skip directly to writing inline code without trying the script first.

Common Issues

Error	Cause	Solution
ConnectionError / Timeout	ClinicalTrials.gov unreachable	Check internet connection; retry after 30 seconds
HTTP 429 Too Many Requests	Rate limit exceeded	Increase `RATE_LIMIT_DELAY` in query_clinicaltrials.py
ModuleNotFoundError: plotnine	Missing visualization package	`pip install plotnine plotnine-prism`
Empty results (0 trials)	Overly restrictive filters	Broaden condition/status/phase filters
Many "Unclassified" mechanisms	No disease config or new drugs	Use a disease config (e.g., `"ibd"`) or update `disease_configs/*.yaml`
SVG export failed	Missing SVG backend	Normal — PNG is always generated as fallback
Sponsor name variants	Same company, different names	Update `SPONSOR_NORMALIZATION` in compile_trials.py
ModuleNotFoundError: yaml	Missing pyyaml	`pip install pyyaml`
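
As a complement to raising `RATE_LIMIT_DELAY`, a caller can also retry with a growing delay when the server reports a rate limit. The sketch below is a generic exponential-backoff helper, not code taken from `query_clinicaltrials.py`; `RateLimited` and `call_with_backoff` are hypothetical names, and the caller is assumed to raise `RateLimited` when it sees HTTP 429.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the server answers HTTP 429."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)


# Demonstration: a function that is rate-limited twice, then succeeds.
# The sleep function is replaced by list.append so no real waiting happens.
delays = []
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
```

Injecting `sleep` keeps the helper testable; in real use the default `time.sleep` applies the waits.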

Interpretation Guidelines

Mechanism classification is based on intervention names and descriptions — some trials with vague descriptions (e.g., "Study Drug") will be classified as "Other Biologic" or "Unclassified"
Phase 2/3 indicates a combined Phase 2/3 study design
Sponsor normalization groups subsidiaries under parent company (e.g., Millennium → Takeda)
Industry vs. Academic classification is based on the ClinicalTrials.gov leadSponsor.class field
The landscape reflects registered trials, not all pipeline programs (pre-IND programs won't appear)
Disease configs provide curated mechanism taxonomies; without config, classification uses generic intervention types
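
Two of the normalizations described above can be sketched as follows. This is an illustration under stated assumptions rather than the logic in `compile_trials.py` or `classify_mechanisms.py`: the mapping excerpts are copied from the code preview, while `normalize_sponsor`, `phase_label`, and the longest-key-first matching order are hypothetical choices.

```python
# Excerpt of the sponsor mapping (lowercase substring -> canonical name)
SPONSOR_NORMALIZATION = {
    "takeda": "Takeda",
    "millennium pharmaceuticals": "Takeda",
    "janssen": "J&J / Janssen",
    "genentech": "Roche / Genentech",
}

PHASE_MAP = {
    "EARLY_PHASE1": "Phase 1",
    "PHASE1": "Phase 1",
    "PHASE2": "Phase 2",
    "PHASE3": "Phase 3",
    "PHASE4": "Phase 4",
    "NA": "Not Applicable",
}


def normalize_sponsor(name, mapping=SPONSOR_NORMALIZATION):
    """Return the canonical sponsor name, or the stripped input if no key matches."""
    lowered = name.lower()
    # Assumption: try longer keys first so the most specific substring wins
    for key in sorted(mapping, key=len, reverse=True):
        if key in lowered:
            return mapping[key]
    return name.strip()


def phase_label(phases):
    """Turn a list of API phase codes into one label, e.g. 'Phase 2/3'."""
    labels = []
    for code in phases:
        label = PHASE_MAP.get(code, code)
        if label not in labels:
            labels.append(label)
    if not labels:
        return "Not Applicable"
    if len(labels) == 1:
        return labels[0]
    # Combined designs: join the phase numbers with a slash
    return "Phase " + "/".join(l.replace("Phase ", "") for l in labels)


print(normalize_sponsor("Millennium Pharmaceuticals, Inc."))
print(phase_label(["PHASE2", "PHASE3"]))
```

Substring matching is convenient but can over-match short keys, so any real mapping should be checked against the sponsor names that actually appear in the data.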

Suggested Next Steps

Deep-dive a mechanism — Use literature-preclinical to review mechanism biology
Track a sponsor's full pipeline — Use development-landscape for broader pipeline view
Biomarker analysis — Use lasso-biomarker-panel to identify response biomarkers from trial data
Export to presentation — Use landscape_report.md and plots for stakeholder review

Related Skills

development-landscape — Broader, multi-source pipeline landscape for any target
literature-preclinical — Literature review for mechanism biology
lasso-biomarker-panel — Biomarker discovery from expression data

References

ClinicalTrials.gov API v2: https://clinicaltrials.gov/data-api/api
ClinicalTrials.gov: https://clinicaltrials.gov/
See references/api-parameters.md for full API parameter reference
See references/mechanisms.md for mechanism taxonomy details
See references/output-schema.md for output column definitions

Code preview

scripts/__init__.py

# clinicaltrials-landscape scripts package

scripts/classify_mechanisms.py

"""
Classify clinical trial interventions by mechanism of action.

Supports config-driven taxonomy (disease-specific patterns loaded from
disease_configs/*.yaml) and generic fallback classification by
intervention type when no config is available.
"""

import re

try:
    from disease_config import get_mechanism_patterns, get_drug_normalization
except ImportError:
    from scripts.disease_config import get_mechanism_patterns, get_drug_normalization


# ============================================================
# MODULE-LEVEL STATE (set by configure())
# ============================================================
# Taxonomy and drug normalization loaded from disease config.
# When empty, classify_mechanism() uses generic intervention-type fallback.

_mechanism_patterns = []
_drug_normalization = {}
_configured = False


def configure(config):
    """
    Load mechanism taxonomy and drug normalization from disease config.

    Parameters
    ----------
    config : dict or None
        Parsed disease config from load_disease_config().
        If None, clears taxonomy (uses generic fallback).
    """
    global _mechanism_patterns, _drug_normalization, _configured
    _mechanism_patterns = get_mechanism_patterns(config)
    _drug_normalization = get_drug_normalization(config)
    _configured = True


# Phase normalization mapping
PHASE_MAP = {
    "EARLY_PHASE1": "Phase 1",
    "PHASE1": "Phase 1",
    "PHASE2": "Phase 2",
    "PHASE3": "Phase 3",
    "PHASE4": "Phase 4",
    "NA": "Not Applicable",
}


def _normalize_drug_name(name):
    """Normalize drug name to canonical form using loaded config."""
    if not name:
        return name
    for pattern, canonical in _drug_normalization.items():
        if re.match(pattern, name.strip()):
            return canonical
    return name.strip()


def _is_biosimilar(interventions, brief_title="", official_title=""):
    """Detect if a trial involves a biosimilar product."""
    corpus = " ".join([
        *[intv.get("name", "") + " " + intv.get("description", "") for intv in interventions],
        brief_title, official_title
    ]).lower()
    biosimilar_patterns = [
        "biosimilar", "ct-p13", "sb2", "sb5", "abp 501", "gp2017",
        "remsima", "inflectra", "renflexis", "avsola", "ixifi",
        "hadlima", "hyrimoz", "cyltezo", "amjevita", "idacio",
        "similar biologic", "proposed biosimilar",
    ]
    return any(p in corpus for p in biosimilar_patterns)


def classify_mechanism(interventions, brief_title="", official_title=""):

scripts/compile_trials.py

"""
Compile, deduplicate, and structure clinical trial data (Step 2b).

Produces a clean DataFrame with normalized phases, mechanisms,
sponsor names, enrollment cleaning, and enriched fields from
the API (geography, study design, arms, endpoints, eligibility).
Deduplicates by NCT ID.
"""

import os
import re
import pandas as pd
import numpy as np


# Sponsor normalization: maps lowercase substrings to canonical names
SPONSOR_NORMALIZATION = {
    "takeda": "Takeda",
    "takeda pharmaceutical": "Takeda",
    "takeda development center": "Takeda",
    "millennium pharmaceuticals": "Takeda",
    "abbvie": "AbbVie",
    "johnson & johnson": "J&J / Janssen",
    "janssen": "J&J / Janssen",
    "janssen research": "J&J / Janssen",
    "janssen-cilag": "J&J / Janssen",
    "janssen biotech": "J&J / Janssen",
    "eli lilly": "Eli Lilly",
    "lilly": "Eli Lilly",
    "pfizer": "Pfizer",
    "bristol-myers squibb": "Bristol-Myers Squibb",
    "bristol myers squibb": "Bristol-Myers Squibb",
    "gilead": "Gilead Sciences",
    "gilead sciences": "Gilead Sciences",
    "roche": "Roche / Genentech",
    "genentech": "Roche / Genentech",
    "astrazeneca": "AstraZeneca",
    "novartis": "Novartis",
    "amgen": "Amgen",
    "merck sharp": "Merck / MSD",
    "merck & co": "Merck / MSD",
    "msd": "Merck / MSD",
    "sanofi": "Sanofi",
    "regeneron": "Regeneron",
    "boehringer ingelheim": "Boehringer Ingelheim",
    "arena pharmaceuticals": "Arena / Pfizer",
    "celgene": "Bristol-Myers Squibb",
    "galapagos": "Galapagos",
    "prometheus biosciences": "Merck / MSD",
    "teva": "Teva",
    "teva branded": "Teva",
    "teva pharmaceutical": "Teva",
}

# Phase numeric ordering
PHASE_NUMERIC = {
    "Phase 1": 1,
    "Phase 1/2": 1.5,
    "Phase 2": 2,
    "Phase 2/3": 2.5,
    "Phase 3": 3,
    "Phase 3/4": 3.5,
    "Phase 4": 4,
    "Not Applicable": 0,
}

# Region mapping for geographic analysis
COUNTRY_TO_REGION = {
    # North America
    "United States": "North America", "Canada": "North America", "Mexico": "North America",
    # Western Europe
    "United Kingdom": "Western Europe", "Germany": "Western Europe", "France": "Western Europe",
    "Italy": "Western Europe", "Spain": "Western Europe", "Netherlands": "Western Europe",
    "Belgium": "Western Europe", "Switzerland": "Western Europe", "Austria": "Western Europe",
    "Ireland": "Western Europe", "Sweden": "Western Europe", "Denmark": "Western Europe",
    "Norway": "Western Europe", "Finland": "Western Europe", "Portugal": "Western Europe",
    "Greece": "Western Europe", "Luxembourg": "Western Europe",
    # Eastern Europe
    "Poland": "Eastern Europe", "Czech Republic": "Eastern Europe", "Czechia": "Eastern Europe",
    "Hungary": "Eastern Europe", "Romania": "Eastern Europe", "Bulgaria": "Eastern Europe",

Companion files

Type	Path	Bytes
Markdown	references/api-parameters.md	5,684
Markdown	references/mechanisms.md	5,047
Markdown	references/output-schema.md	7,461
Python	scripts/__init__.py	43
Python	scripts/classify_mechanisms.py	11,773
Python	scripts/compile_trials.py	22,885
Python	scripts/disease_config.py	4,799
Python	scripts/export_all.py	9,705
Python	scripts/generate_landscape_plots.py	26,198
Python	scripts/generate_pdf_report.py	63,886
Python	scripts/generate_report.py	83,351
Python	scripts/query_clinicaltrials.py	10,621
Markdown	SKILL.md	10,441
JSON	skill.meta.json	2,618
