Mendelian Randomization

Genetic variants as instruments to infer causal effects.

Overview

Problem. MR approximates a randomized trial via randomly-assigned genes.

Use when: Testing X→Y, with GWAS for both
Avoid when: When instrument assumptions are violated

Learning goals

Figures

Two-Sample MR Overview
Three IV Assumptions
4-Step MR Workflow
Four MR Methods
Sensitivity & Reading Plots
Common Pitfalls

Tutorial

When to Use This Skill

Not suitable for: One-sample MR (individual-level data), non-linear MR, multivariable MR with >2 exposures

Installation

install.packages(c("remotes", "ggplot2", "ggprism", "dplyr", "rmarkdown"))
remotes::install_github("MRCIEU/TwoSampleMR")
# For PDF report generation (optional but recommended):
# install.packages("tinytex"); tinytex::install_tinytex()
# For MR-PRESSO outlier detection (optional but recommended):
# remotes::install_github("rondolab/MR-PRESSO")
Software Version License Commercial Use
TwoSampleMR ≥0.5.6 GPL-3 ✅ Permitted
ieugwasr ≥0.2.1 MIT ✅ Permitted
ggplot2 ≥3.4.0 MIT ✅ Permitted
ggprism ≥1.0.3 GPL (≥3) ✅ Permitted
dplyr ≥1.1.0 MIT ✅ Permitted
rmarkdown ≥2.20 GPL-3 ✅ Permitted

Inputs

Option A — OpenGWAS IDs (recommended):

  • Exposure ID (e.g., "ieu-a-300" for LDL cholesterol)
  • Outcome ID (e.g., "ieu-a-7" for coronary heart disease)
  • Browse available traits at: https://gwas.mrcieu.ac.uk/

Option B — User-provided files (CSV/TSV): - Exposure GWAS summary statistics

  • Outcome GWAS summary statistics
Required Column Description Example
SNP rsID rs1234567
beta Effect estimate 0.05
se Standard error 0.01
pval P-value 5e-10
effect_allele Effect allele A
other_allele Other allele G
eaf Effect allele frequency (optional) 0.3

Outputs

Results (CSV):

  • mr_results.csv — MR estimates from all 4 methods (beta, SE, p-value, nSNP, F-statistics)
  • heterogeneity_results.csv — Cochran's Q test for instrument heterogeneity
  • pleiotropy_results.csv — MR-Egger intercept test for directional pleiotropy
  • directionality_results.csv — Steiger test confirming causal direction
  • harmonized_data.csv — SNP-level harmonized exposure-outcome data
  • single_snp_results.csv — Per-SNP Wald ratio estimates
  • leaveoneout_results.csv — Leave-one-out robustness estimates
  • MR-PRESSO outlier results (if heterogeneity significant and MRPRESSO installed)

Plots (PNG + SVG):

  • mr_scatter_plot — SNP-exposure vs SNP-outcome with method regression lines
  • mr_forest_plot — Individual + combined SNP effect estimates
  • mr_funnel_plot — Precision vs effect size (asymmetry = pleiotropy)
  • mr_leaveoneout_plot — Effect stability when removing each SNP

Report:

  • mr_report.pdf — Structured analysis report (Introduction, Methods, Results, Figures, Conclusions)

Analysis objects (RDS):

  • mr_object.rds — Complete analysis (results, sensitivity, harmonized data)
  • Load with: mr_obj <- readRDS("mr_results/mr_object.rds")

Clarification Questions

  1. Input Data (ASK THIS FIRST):

    • Do you have specific GWAS summary statistics or OpenGWAS trait IDs?
    • If files uploaded: Are these the exposure and outcome GWAS files?
    • Expected formats: CSV/TSV with SNP, beta, se, pval, effect_allele, other_allele
    • Or use example data? LDL Cholesterol → Coronary Heart Disease demo (drug-target validation example)
  2. Exposure and Outcome:

    • (If using example data) Pre-set: Exposure = LDL Cholesterol, Outcome = Coronary Heart Disease — no need to specify
    • (If using your own data) What is the exposure (potential cause)? What is the outcome (potential effect)?
  3. Parameters (defaults usually fine):

    • P-value threshold for instruments? (default: 5×10⁻⁸)
    • LD clumping r²? (default: 0.001)

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN — DO NOT WRITE INLINE CODE 🚨

Step 1 — Load and harmonize data:

source("scripts/load_data.R")
dat <- load_example_data()
# OR: dat <- load_from_opengwas("ieu-a-300", "ieu-a-7")
# OR: dat <- load_from_files("exposure.csv", "outcome.csv")

DO NOT write inline data loading or harmonization code. Use the functions above.

✅ VERIFICATION: You MUST see "✓ Data loaded and harmonized successfully!"

Step 2 — Run MR analysis:

source("scripts/run_mr_analysis.R")
mr_results <- run_mr(dat)
sensitivity <- run_sensitivity(dat, mr_results)

DO NOT write inline MR code. Just source the script and call the functions.

✅ VERIFICATION: You MUST see "✓ MR analysis completed successfully!" AND "✓ Sensitivity analyses completed successfully!"

Step 3 — Generate visualizations:

source("scripts/mr_plots.R")
generate_all_plots(mr_results, dat, sensitivity$singlesnp, sensitivity$leaveoneout, output_dir = "mr_results")

DO NOT write inline plotting code (ggsave, ggplot, etc.). Just use the function.

✅ VERIFICATION: You MUST see "✓ All MR plots generated successfully!"

Step 4 — Export results and generate report:

source("scripts/export_results.R")
export_all(mr_results, sensitivity, dat, output_dir = "mr_results")

DO NOT write custom export code. Use export_all(). It automatically generates the PDF report.

✅ VERIFICATION: You MUST see "✓ Report generated successfully!" AND "=== Export Complete ==="

IF YOU DON'T SEE VERIFICATION MESSAGES: You wrote inline code. Stop and use the scripts.

⚠️ CRITICAL — DO NOT:

  • Write inline MR analysis codeSTOP: Use run_mr() and run_sensitivity()
  • Write inline plotting codeSTOP: Use generate_all_plots()
  • Write custom export codeSTOP: Use export_all()
  • Write custom report codeSTOP: Use generate_report()
  • Try to install system dependencies → Scripts handle package installation

⚠️ IF SCRIPTS FAIL — Script Failure Hierarchy:

  1. Fix and Retry (90%) — Install missing R package, re-run script
  2. Modify Script (5%) — Edit the script file, document changes
  3. Use as Reference (4%) — Read script, adapt approach, cite source
  4. Write from Scratch (1%) — Only if genuinely impossible, explain why

Common Issues

Issue Cause Solution
"No instruments found" No SNPs below p-value threshold Try a less stringent threshold or check trait ID
LD clumping API fails OpenGWAS/IEU API temporarily down Script falls back to no clumping with warning; results may be affected by LD
"Only N SNPs retained" Allele harmonization removed most SNPs Check if exposure/outcome are from same genome build
Steiger test fails Sample sizes unavailable in metadata Normal for some datasets; other sensitivity tests still valid
SVG export error Missing optional dependency Normal — generate_all_plots() falls back to base R svg() automatically
OpenGWAS rate limiting Too many API requests Wait a few minutes and retry
PDF report fails LaTeX/tinytex not installed Install with tinytex::install_tinytex() — report auto-falls back to HTML or base R PDF
Steiger R² warning for binary outcome Outcome is case-control, not quantitative Use get_r_from_lor() with prevalence to compute liability-scale R² before directionality test
MR-PRESSO not available MRPRESSO package not installed remotes::install_github('rondolab/MR-PRESSO') — optional but recommended when heterogeneity is significant
"Cannot find function" Script not sourced Run source("scripts/load_data.R") before calling functions

Interpreting Results

See references/interpretation-guide.md for detailed guidance.

Quick interpretation:

  • Concordant methods (IVW, Egger, WM, WMode agree on direction + significance) → stronger evidence
  • Any method non-significant or discordant → must be discussed explicitly, not dismissed
  • IVW significant + no heterogeneity + no pleiotropy → strongest evidence
  • Egger intercept p < 0.05 → directional pleiotropy may bias IVW
  • High heterogeneity (Q p < 0.05) → run MR-PRESSO, flag outlier instruments
  • Steiger direction incorrect → reverse causation concern (check binary outcome R² correction)
  • F-statistic < 10 → weak instrument bias toward the null

Suggested Next Steps

Related Skills

References

Code preview

scripts/export_results.R

# =============================================================================
# Mendelian Randomization - Export Results
# =============================================================================
# Functions:
#   export_all()  - Export all MR results, sensitivity analyses, RDS object, and PDF report
# =============================================================================

#' Export all MR analysis results
#'
#' Saves:
#'   1. mr_results.csv - Primary MR estimates (all methods)
#'   2. heterogeneity_results.csv - Cochran's Q test results
#'   3. pleiotropy_results.csv - MR-Egger intercept test
#'   4. directionality_results.csv - Steiger directionality test
#'   5. harmonized_data.csv - SNP-level harmonized data
#'   6. single_snp_results.csv - Per-SNP Wald ratio estimates
#'   7. leaveoneout_results.csv - Leave-one-out estimates
#'   8. mr_object.rds - Complete analysis object for downstream use
#'   9. mr_report.pdf - Structured analysis report (auto-generated)
#'
#' @param mr_results MR results from run_mr()
#' @param sensitivity_results Sensitivity list from run_sensitivity()
#' @param dat Harmonized data from load_data.R
#' @param output_dir Directory for saving results (default: "mr_results")
export_all <- function(mr_results, sensitivity_results, dat,
                        output_dir = "mr_results") {
    cat("\n=== Exporting MR Results ===\n\n")

    if (!dir.exists(output_dir)) {
        dir.create(output_dir, recursive = TRUE)
    }

    file_count <- 0

    # 1. Primary MR results
    cat("1. MR results (all methods)...\n")
    write.csv(mr_results, file.path(output_dir, "mr_results.csv"), row.names = FALSE)
    cat("   Saved: mr_results.csv\n")
    file_count <- file_count + 1

    # 2. Heterogeneity results
    cat("2. Heterogeneity test results...\n")
    write.csv(sensitivity_results$heterogeneity,
              file.path(output_dir, "heterogeneity_results.csv"), row.names = FALSE)
    cat("   Saved: heterogeneity_results.csv\n")
    file_count <- file_count + 1

    # 3. Pleiotropy results
    cat("3. Pleiotropy test results...\n")
    write.csv(sensitivity_results$pleiotropy,
              file.path(output_dir, "pleiotropy_results.csv"), row.names = FALSE)
    cat("   Saved: pleiotropy_results.csv\n")
    file_count <- file_count + 1

    # 4. Directionality results
    cat("4. Directionality test results...\n")
    if (!is.null(sensitivity_results$directionality)) {
        write.csv(sensitivity_results$directionality,
                  file.path(output_dir, "directionality_results.csv"), row.names = FALSE)
        cat("   Saved: directionality_results.csv\n")
        file_count <- file_count + 1
    } else {
        cat("   Skipped (directionality test was not available)\n")
    }

    # 5. Harmonized data
    cat("5. Harmonized SNP-level data...\n")
    write.csv(dat, file.path(output_dir, "harmonized_data.csv"), row.names = FALSE)
    cat("   Saved: harmonized_data.csv\n")
    file_count <- file_count + 1

    # 6. Single-SNP results
    cat("6. Single-SNP Wald ratio results...\n")
    write.csv(sensitivity_results$singlesnp,
              file.path(output_dir, "single_snp_results.csv"), row.names = FALSE)
    cat("   Saved: single_snp_results.csv\n")
    file_count <- file_count + 1

    # 7. Leave-one-out results
    cat("7. Leave-one-out results...\n")

scripts/generate_report.R

# =============================================================================
# Mendelian Randomization - Report Generation
# =============================================================================
# Functions:
#   generate_report()  - Generate PDF analysis report (intro/methods/results/conclusions)
# =============================================================================

#' Generate a structured MR analysis report
#'
#' Creates a PDF report with Introduction, Methods, Results, Figures, and Conclusions.
#' Requires rmarkdown + LaTeX (tinytex) for PDF. Falls back to HTML or base R PDF.
#'
#' @param mr_results MR results from run_mr()
#' @param sensitivity Sensitivity results from run_sensitivity()
#' @param dat Harmonized data from load_data.R
#' @param output_dir Directory for saving report (default: "mr_results")
generate_report <- function(mr_results, sensitivity, dat, output_dir = "mr_results") {
    cat("\n=== Generating MR Analysis Report ===\n\n")

    if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)

    has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

    if (has_rmarkdown) {
        rmd_content <- .build_rmd_content(mr_results, sensitivity, dat, output_dir)
        rmd_path <- file.path(output_dir, "mr_report.Rmd")
        writeLines(rmd_content, rmd_path)

        # Try PDF first, then HTML
        pdf_ok <- tryCatch({
            rmarkdown::render(rmd_path,
                output_format = rmarkdown::pdf_document(
                    toc = FALSE, latex_engine = "xelatex"),
                output_file = "mr_report.pdf",
                output_dir = output_dir, quiet = TRUE)
            cat("   Saved:", file.path(output_dir, "mr_report.pdf"), "\n")
            TRUE
        }, error = function(e) {
            cat("   PDF rendering failed (", conditionMessage(e), ")\n")
            FALSE
        })

        if (!pdf_ok) {
            html_ok <- tryCatch({
                rmarkdown::render(rmd_path,
                    output_format = "html_document",
                    output_file = "mr_report.html",
                    output_dir = output_dir, quiet = TRUE)
                cat("   Saved:", file.path(output_dir, "mr_report.html"),
                    "(HTML fallback)\n")
                TRUE
            }, error = function(e) {
                cat("   HTML also failed. Using base R PDF...\n")
                FALSE
            })

            if (!html_ok) {
                .generate_base_pdf(mr_results, sensitivity, dat, output_dir)
            }
        }

        unlink(rmd_path)
    } else {
        cat("   rmarkdown not available. Using base R PDF...\n")
        .generate_base_pdf(mr_results, sensitivity, dat, output_dir)
    }

    cat("\n\u2713 Report generated successfully!\n")
    return(invisible(NULL))
}

# --- Rmd content builder -----------------------------------------------------

.build_rmd_content <- function(mr_results, sensitivity, dat, output_dir) {
    exposure <- unique(dat$exposure)[1]
    outcome <- unique(dat$outcome)[1]
    n_instruments <- nrow(dat)
    date_str <- format(Sys.Date(), "%B %d, %Y")

    ivw <- mr_results[mr_results$method == "Inverse variance weighted", ]

scripts/load_data.R

# =============================================================================
# Mendelian Randomization - Data Loading & Harmonization
# =============================================================================
# Three entry points:
#   load_example_data()       - LDL Cholesterol → CHD demo (OpenGWAS)
#   load_from_opengwas()      - Any exposure/outcome by OpenGWAS ID
#   load_from_files()         - User-provided GWAS summary statistics
# =============================================================================

# --- Package setup -----------------------------------------------------------

.ensure_packages <- function() {
    options(repos = c(CRAN = "https://cloud.r-project.org"))

    if (!requireNamespace("remotes", quietly = TRUE)) {
        install.packages("remotes")
    }

    if (!requireNamespace("TwoSampleMR", quietly = TRUE)) {
        cat("Installing TwoSampleMR from GitHub (~2 min)...\n")
        remotes::install_github("MRCIEU/TwoSampleMR", upgrade = "never")
    }

    if (!requireNamespace("ieugwasr", quietly = TRUE)) {
        install.packages("ieugwasr")
    }

    suppressPackageStartupMessages({
        library(TwoSampleMR)
        library(ieugwasr)
        library(dplyr)
    })
}

# --- OpenGWAS data loading ---------------------------------------------------

#' Load exposure and outcome data from OpenGWAS and harmonize
#'
#' @param exposure_id OpenGWAS ID for exposure (e.g., "ieu-a-300" for LDL cholesterol)
#' @param outcome_id OpenGWAS ID for outcome (e.g., "ieu-a-7" for CHD)
#' @param p_threshold P-value threshold for instrument selection (default: 5e-8)
#' @param clump_r2 LD clumping r-squared threshold (default: 0.001)
#' @param clump_kb LD clumping window in kb (default: 10000)
#' @return Harmonized data frame ready for MR analysis
load_from_opengwas <- function(exposure_id, outcome_id,
                                p_threshold = 5e-8,
                                clump_r2 = 0.001,
                                clump_kb = 10000) {
    .ensure_packages()

    cat("\n=== Loading MR Data from OpenGWAS ===\n")
    cat("  Exposure:", exposure_id, "\n")
    cat("  Outcome: ", outcome_id, "\n\n")

    # Step 1: Extract instruments for exposure
    cat("Step 1/4: Extracting genome-wide significant instruments (p <", p_threshold, ")...\n")
    exposure_dat <- extract_instruments(
        outcomes = exposure_id,
        p1 = p_threshold,
        clump = TRUE,
        r2 = clump_r2,
        kb = clump_kb
    )

    if (is.null(exposure_dat) || nrow(exposure_dat) == 0) {
        stop("No instruments found for exposure '", exposure_id,
             "'. Check the ID or try a less stringent p-value threshold.")
    }
    cat("  Found", nrow(exposure_dat), "instruments after LD clumping\n\n")

    # Step 2: Extract outcome data for these SNPs
    cat("Step 2/4: Extracting outcome data for instruments...\n")
    outcome_dat <- extract_outcome_data(
        snps = exposure_dat$SNP,
        outcomes = outcome_id
    )

    if (is.null(outcome_dat) || nrow(outcome_dat) == 0) {
        stop("No outcome data found for '", outcome_id,
             "'. Check the ID or ensure SNPs are available in the outcome GWAS.")

Companion files

TypePathBytes
Markdownreferences/interpretation-guide.md8,831
Markdownreferences/method-reference.md6,972
Rscripts/export_results.R4,939
Rscripts/generate_report.R18,109
Rscripts/load_data.R10,932
Rscripts/mr_plots.R6,785
Rscripts/run_mr_analysis.R14,838
MarkdownSKILL.md9,812
JSONskill.meta.json1,841