⌂ / Overview / Population Genetics & Genomics / Mendelian Randomization

View companion source

Mendelian Randomization

Genetic variants as instruments to infer causal effects.

Overview

Problem. MR approximates a randomized trial via randomly-assigned genes.

Use when: Testing X→Y, with GWAS for both

Avoid when: When instrument assumptions are violated

Learning goals

Instruments: genes are assigned at conception
Multiple estimators cross-check each other

Figures

Tutorial

When to Use This Skill

You have GWAS summary statistics for an exposure and outcome trait
You want to test causal direction between two traits (not just correlation)
You need to assess whether an observed association is likely causal or confounded
You want to use genetic variants as instrumental variables (natural experiment)
You have OpenGWAS trait IDs or your own GWAS summary statistics files

Not suitable for: One-sample MR (individual-level data), non-linear MR, multivariable MR with >2 exposures

Installation

install.packages(c("remotes", "ggplot2", "ggprism", "dplyr", "rmarkdown"))
remotes::install_github("MRCIEU/TwoSampleMR")
# For PDF report generation (optional but recommended):
# install.packages("tinytex"); tinytex::install_tinytex()
# For MR-PRESSO outlier detection (optional but recommended):
# remotes::install_github("rondolab/MR-PRESSO")

Software	Version	License	Commercial Use
TwoSampleMR	≥0.5.6	GPL-3	✅ Permitted
ieugwasr	≥0.2.1	MIT	✅ Permitted
ggplot2	≥3.4.0	MIT	✅ Permitted
ggprism	≥1.0.3	GPL (≥3)	✅ Permitted
dplyr	≥1.1.0	MIT	✅ Permitted
rmarkdown	≥2.20	GPL-3	✅ Permitted

Inputs

Option A — OpenGWAS IDs (recommended):

Exposure ID (e.g., "ieu-a-300" for LDL cholesterol)
Outcome ID (e.g., "ieu-a-7" for coronary heart disease)
Browse available traits at: https://gwas.mrcieu.ac.uk/

Option B — User-provided files (CSV/TSV):

Exposure GWAS summary statistics
Outcome GWAS summary statistics

Required Column	Description	Example
SNP	rsID	rs1234567
beta	Effect estimate	0.05
se	Standard error	0.01
pval	P-value	5e-10
effect_allele	Effect allele	A
other_allele	Other allele	G
eaf	Effect allele frequency (optional)	0.3

Outputs

Results (CSV):

mr_results.csv — MR estimates from all 4 methods (beta, SE, p-value, nSNP, F-statistics)
heterogeneity_results.csv — Cochran's Q test for instrument heterogeneity
pleiotropy_results.csv — MR-Egger intercept test for directional pleiotropy
directionality_results.csv — Steiger test confirming causal direction
harmonized_data.csv — SNP-level harmonized exposure-outcome data
single_snp_results.csv — Per-SNP Wald ratio estimates
leaveoneout_results.csv — Leave-one-out robustness estimates
MR-PRESSO outlier results (if heterogeneity significant and MRPRESSO installed)

Plots (PNG + SVG):

mr_scatter_plot — SNP-exposure vs SNP-outcome with method regression lines
mr_forest_plot — Individual + combined SNP effect estimates
mr_funnel_plot — Precision vs effect size (asymmetry = pleiotropy)
mr_leaveoneout_plot — Effect stability when removing each SNP

Report:

mr_report.pdf — Structured analysis report (Introduction, Methods, Results, Figures, Conclusions)

Analysis objects (RDS):

mr_object.rds — Complete analysis (results, sensitivity, harmonized data)
Load with: mr_obj <- readRDS("mr_results/mr_object.rds")

Clarification Questions

Input Data (ASK THIS FIRST):
- Do you have specific GWAS summary statistics or OpenGWAS trait IDs?
- If files uploaded: Are these the exposure and outcome GWAS files?
- Expected formats: CSV/TSV with SNP, beta, se, pval, effect_allele, other_allele
- Or use example data? LDL Cholesterol → Coronary Heart Disease demo (drug-target validation example)
Exposure and Outcome:
- (If using example data) Pre-set: Exposure = LDL Cholesterol, Outcome = Coronary Heart Disease — no need to specify
- (If using your own data) What is the exposure (potential cause)? What is the outcome (potential effect)?
Parameters (defaults usually fine):
- P-value threshold for instruments? (default: 5×10⁻⁸)
- LD clumping r²? (default: 0.001)

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN — DO NOT WRITE INLINE CODE 🚨

Step 1 — Load and harmonize data:

source("scripts/load_data.R")
dat <- load_example_data()
# OR: dat <- load_from_opengwas("ieu-a-300", "ieu-a-7")
# OR: dat <- load_from_files("exposure.csv", "outcome.csv")

DO NOT write inline data loading or harmonization code. Use the functions above.

✅ ✅ VERIFICATION: You MUST see "✓ Data loaded and harmonized successfully!"

Step 2 — Run MR analysis:

source("scripts/run_mr_analysis.R")
mr_results <- run_mr(dat)
sensitivity <- run_sensitivity(dat, mr_results)

DO NOT write inline MR code. Just source the script and call the functions.

✅ ✅ VERIFICATION: You MUST see "✓ MR analysis completed successfully!" AND "✓ Sensitivity analyses completed successfully!"

Step 3 — Generate visualizations:

source("scripts/mr_plots.R")
generate_all_plots(mr_results, dat, sensitivity$singlesnp, sensitivity$leaveoneout, output_dir = "mr_results")

DO NOT write inline plotting code (ggsave, ggplot, etc.). Just use the function.

✅ ✅ VERIFICATION: You MUST see "✓ All MR plots generated successfully!"

Step 4 — Export results and generate report:

source("scripts/export_results.R")
export_all(mr_results, sensitivity, dat, output_dir = "mr_results")

DO NOT write custom export code. Use export_all(). It automatically generates the PDF report.

✅ ✅ VERIFICATION: You MUST see "✓ Report generated successfully!" AND "=== Export Complete ==="

❌ IF YOU DON'T SEE VERIFICATION MESSAGES: You wrote inline code. Stop and use the scripts.

⚠️ CRITICAL — DO NOT:

❌ Write inline MR analysis code → STOP: Use run_mr() and run_sensitivity()
❌ Write inline plotting code → STOP: Use generate_all_plots()
❌ Write custom export code → STOP: Use export_all()
❌ Write custom report code → STOP: Use generate_report()
❌ Try to install system dependencies → Scripts handle package installation

⚠️ IF SCRIPTS FAIL — Script Failure Hierarchy:

Fix and Retry (90%) — Install missing R package, re-run script
Modify Script (5%) — Edit the script file, document changes
Use as Reference (4%) — Read script, adapt approach, cite source
Write from Scratch (1%) — Only if genuinely impossible, explain why

Common Issues

Issue	Cause	Solution
"No instruments found"	No SNPs below p-value threshold	Try a less stringent threshold or check trait ID
LD clumping API fails	OpenGWAS/IEU API temporarily down	Script falls back to no clumping with warning; results may be affected by LD
"Only N SNPs retained"	Allele harmonization removed most SNPs	Check if exposure/outcome are from same genome build
Steiger test fails	Sample sizes unavailable in metadata	Normal for some datasets; other sensitivity tests still valid
SVG export error	Missing optional dependency	Normal — `generate_all_plots()` falls back to base R svg() automatically
OpenGWAS rate limiting	Too many API requests	Wait a few minutes and retry
PDF report fails	LaTeX/tinytex not installed	Install with `tinytex::install_tinytex()` — report auto-falls back to HTML or base R PDF
Steiger R² warning for binary outcome	Outcome is case-control, not quantitative	Use `get_r_from_lor()` with prevalence to compute liability-scale R² before directionality test
MR-PRESSO not available	MRPRESSO package not installed	`remotes::install_github('rondolab/MR-PRESSO')` — optional but recommended when heterogeneity is significant
"Cannot find function"	Script not sourced	Run `source("scripts/load_data.R")` before calling functions

Interpreting Results

See references/interpretation-guide.md for detailed guidance.

Quick interpretation:

Concordant methods (IVW, Egger, WM, WMode agree on direction + significance) → stronger evidence
Any method non-significant or discordant → must be discussed explicitly, not dismissed
IVW significant + no heterogeneity + no pleiotropy → strongest evidence
Egger intercept p < 0.05 → directional pleiotropy may bias IVW
High heterogeneity (Q p < 0.05) → run MR-PRESSO, flag outlier instruments
Steiger direction incorrect → reverse causation concern (check binary outcome R² correction)
F-statistic < 10 → weak instrument bias toward the null

Suggested Next Steps

Multiple exposures? → Run bidirectional MR (swap exposure/outcome)
Pleiotropy detected? → Consider MR-PRESSO or multivariable MR
Significant result? → Replicate with independent GWAS datasets
Drug target validation? → Use cis-MR with variants near gene of interest
Pathway analysis? → Combine with functional enrichment skills

Related Skills

polygenic-risk-score — Polygenic risk score computation (LDpred2)
polygenic-risk-score-prs-catalog — PRS from pre-computed PGS Catalog weights
eqtl-colocalization-coloc — eQTL colocalization analysis (MR follow-up)

References

Sanderson E, et al. (2022). Mendelian randomization. Nat Rev Methods Primers. PMC7384151
Hemani G, et al. (2018). The MR-Base platform supports systematic causal inference across the human phenome. eLife. DOI: 10.7554/eLife.34408
TwoSampleMR package: https://github.com/MRCIEU/TwoSampleMR
OpenGWAS database: https://gwas.mrcieu.ac.uk/
STROBE-MR guidelines: https://doi.org/10.1001/jama.2023.1788

Code preview

scripts/export_results.R

# =============================================================================
# Mendelian Randomization - Export Results
# =============================================================================
# Functions:
#   export_all()  - Export all MR results, sensitivity analyses, RDS object, and PDF report
# =============================================================================

#' Export all MR analysis results
#'
#' Saves:
#'   1. mr_results.csv - Primary MR estimates (all methods)
#'   2. heterogeneity_results.csv - Cochran's Q test results
#'   3. pleiotropy_results.csv - MR-Egger intercept test
#'   4. directionality_results.csv - Steiger directionality test
#'   5. harmonized_data.csv - SNP-level harmonized data
#'   6. single_snp_results.csv - Per-SNP Wald ratio estimates
#'   7. leaveoneout_results.csv - Leave-one-out estimates
#'   8. mr_object.rds - Complete analysis object for downstream use
#'   9. mr_report.pdf - Structured analysis report (auto-generated)
#'
#' @param mr_results MR results from run_mr()
#' @param sensitivity_results Sensitivity list from run_sensitivity()
#' @param dat Harmonized data from load_data.R
#' @param output_dir Directory for saving results (default: "mr_results")
export_all <- function(mr_results, sensitivity_results, dat,
                        output_dir = "mr_results") {
    cat("\n=== Exporting MR Results ===\n\n")

    if (!dir.exists(output_dir)) {
        dir.create(output_dir, recursive = TRUE)
    }

    file_count <- 0

    # 1. Primary MR results
    cat("1. MR results (all methods)...\n")
    write.csv(mr_results, file.path(output_dir, "mr_results.csv"), row.names = FALSE)
    cat("   Saved: mr_results.csv\n")
    file_count <- file_count + 1

    # 2. Heterogeneity results
    cat("2. Heterogeneity test results...\n")
    write.csv(sensitivity_results$heterogeneity,
              file.path(output_dir, "heterogeneity_results.csv"), row.names = FALSE)
    cat("   Saved: heterogeneity_results.csv\n")
    file_count <- file_count + 1

    # 3. Pleiotropy results
    cat("3. Pleiotropy test results...\n")
    write.csv(sensitivity_results$pleiotropy,
              file.path(output_dir, "pleiotropy_results.csv"), row.names = FALSE)
    cat("   Saved: pleiotropy_results.csv\n")
    file_count <- file_count + 1

    # 4. Directionality results
    cat("4. Directionality test results...\n")
    if (!is.null(sensitivity_results$directionality)) {
        write.csv(sensitivity_results$directionality,
                  file.path(output_dir, "directionality_results.csv"), row.names = FALSE)
        cat("   Saved: directionality_results.csv\n")
        file_count <- file_count + 1
    } else {
        cat("   Skipped (directionality test was not available)\n")
    }

    # 5. Harmonized data
    cat("5. Harmonized SNP-level data...\n")
    write.csv(dat, file.path(output_dir, "harmonized_data.csv"), row.names = FALSE)
    cat("   Saved: harmonized_data.csv\n")
    file_count <- file_count + 1

    # 6. Single-SNP results
    cat("6. Single-SNP Wald ratio results...\n")
    write.csv(sensitivity_results$singlesnp,
              file.path(output_dir, "single_snp_results.csv"), row.names = FALSE)
    cat("   Saved: single_snp_results.csv\n")
    file_count <- file_count + 1

    # 7. Leave-one-out results
    cat("7. Leave-one-out results...\n")

scripts/generate_report.R

# =============================================================================
# Mendelian Randomization - Report Generation
# =============================================================================
# Functions:
#   generate_report()  - Generate PDF analysis report (intro/methods/results/conclusions)
# =============================================================================

#' Generate a structured MR analysis report
#'
#' Creates a PDF report with Introduction, Methods, Results, Figures, and Conclusions.
#' Requires rmarkdown + LaTeX (tinytex) for PDF. Falls back to HTML or base R PDF.
#'
#' @param mr_results MR results from run_mr()
#' @param sensitivity Sensitivity results from run_sensitivity()
#' @param dat Harmonized data from load_data.R
#' @param output_dir Directory for saving report (default: "mr_results")
generate_report <- function(mr_results, sensitivity, dat, output_dir = "mr_results") {
    cat("\n=== Generating MR Analysis Report ===\n\n")

    if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)

    has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

    if (has_rmarkdown) {
        rmd_content <- .build_rmd_content(mr_results, sensitivity, dat, output_dir)
        rmd_path <- file.path(output_dir, "mr_report.Rmd")
        writeLines(rmd_content, rmd_path)

        # Try PDF first, then HTML
        pdf_ok <- tryCatch({
            rmarkdown::render(rmd_path,
                output_format = rmarkdown::pdf_document(
                    toc = FALSE, latex_engine = "xelatex"),
                output_file = "mr_report.pdf",
                output_dir = output_dir, quiet = TRUE)
            cat("   Saved:", file.path(output_dir, "mr_report.pdf"), "\n")
            TRUE
        }, error = function(e) {
            cat("   PDF rendering failed (", conditionMessage(e), ")\n")
            FALSE
        })

        if (!pdf_ok) {
            html_ok <- tryCatch({
                rmarkdown::render(rmd_path,
                    output_format = "html_document",
                    output_file = "mr_report.html",
                    output_dir = output_dir, quiet = TRUE)
                cat("   Saved:", file.path(output_dir, "mr_report.html"),
                    "(HTML fallback)\n")
                TRUE
            }, error = function(e) {
                cat("   HTML also failed. Using base R PDF...\n")
                FALSE
            })

            if (!html_ok) {
                .generate_base_pdf(mr_results, sensitivity, dat, output_dir)
            }
        }

        unlink(rmd_path)
    } else {
        cat("   rmarkdown not available. Using base R PDF...\n")
        .generate_base_pdf(mr_results, sensitivity, dat, output_dir)
    }

    cat("\n\u2713 Report generated successfully!\n")
    return(invisible(NULL))
}

# --- Rmd content builder -----------------------------------------------------

.build_rmd_content <- function(mr_results, sensitivity, dat, output_dir) {
    exposure <- unique(dat$exposure)[1]
    outcome <- unique(dat$outcome)[1]
    n_instruments <- nrow(dat)
    date_str <- format(Sys.Date(), "%B %d, %Y")

    ivw <- mr_results[mr_results$method == "Inverse variance weighted", ]

scripts/load_data.R

# =============================================================================
# Mendelian Randomization - Data Loading & Harmonization
# =============================================================================
# Three entry points:
#   load_example_data()       - LDL Cholesterol → CHD demo (OpenGWAS)
#   load_from_opengwas()      - Any exposure/outcome by OpenGWAS ID
#   load_from_files()         - User-provided GWAS summary statistics
# =============================================================================

# --- Package setup -----------------------------------------------------------

.ensure_packages <- function() {
    options(repos = c(CRAN = "https://cloud.r-project.org"))

    if (!requireNamespace("remotes", quietly = TRUE)) {
        install.packages("remotes")
    }

    if (!requireNamespace("TwoSampleMR", quietly = TRUE)) {
        cat("Installing TwoSampleMR from GitHub (~2 min)...\n")
        remotes::install_github("MRCIEU/TwoSampleMR", upgrade = "never")
    }

    if (!requireNamespace("ieugwasr", quietly = TRUE)) {
        install.packages("ieugwasr")
    }

    suppressPackageStartupMessages({
        library(TwoSampleMR)
        library(ieugwasr)
        library(dplyr)
    })
}

# --- OpenGWAS data loading ---------------------------------------------------

#' Load exposure and outcome data from OpenGWAS and harmonize
#'
#' @param exposure_id OpenGWAS ID for exposure (e.g., "ieu-a-300" for LDL cholesterol)
#' @param outcome_id OpenGWAS ID for outcome (e.g., "ieu-a-7" for CHD)
#' @param p_threshold P-value threshold for instrument selection (default: 5e-8)
#' @param clump_r2 LD clumping r-squared threshold (default: 0.001)
#' @param clump_kb LD clumping window in kb (default: 10000)
#' @return Harmonized data frame ready for MR analysis
load_from_opengwas <- function(exposure_id, outcome_id,
                                p_threshold = 5e-8,
                                clump_r2 = 0.001,
                                clump_kb = 10000) {
    .ensure_packages()

    cat("\n=== Loading MR Data from OpenGWAS ===\n")
    cat("  Exposure:", exposure_id, "\n")
    cat("  Outcome: ", outcome_id, "\n\n")

    # Step 1: Extract instruments for exposure
    cat("Step 1/4: Extracting genome-wide significant instruments (p <", p_threshold, ")...\n")
    exposure_dat <- extract_instruments(
        outcomes = exposure_id,
        p1 = p_threshold,
        clump = TRUE,
        r2 = clump_r2,
        kb = clump_kb
    )

    if (is.null(exposure_dat) || nrow(exposure_dat) == 0) {
        stop("No instruments found for exposure '", exposure_id,
             "'. Check the ID or try a less stringent p-value threshold.")
    }
    cat("  Found", nrow(exposure_dat), "instruments after LD clumping\n\n")

    # Step 2: Extract outcome data for these SNPs
    cat("Step 2/4: Extracting outcome data for instruments...\n")
    outcome_dat <- extract_outcome_data(
        snps = exposure_dat$SNP,
        outcomes = outcome_id
    )

    if (is.null(outcome_dat) || nrow(outcome_dat) == 0) {
        stop("No outcome data found for '", outcome_id,
             "'. Check the ID or ensure SNPs are available in the outcome GWAS.")

Companion files

Type	Path	Bytes
Markdown	references/interpretation-guide.md	8,831
Markdown	references/method-reference.md	6,972
R	scripts/export_results.R	4,939
R	scripts/generate_report.R	18,109
R	scripts/load_data.R	10,932
R	scripts/mr_plots.R	6,785
R	scripts/run_mr_analysis.R	14,838
Markdown	SKILL.md	9,812
JSON	skill.meta.json	1,841