View companion source

Multi-omics Integration

Integrate 2+ omics into interpretable latent factors.

Overview

Problem. Each layer has its own scale and noise; naive merging distorts.

Use when: 2+ layers with shared samples
Avoid when: Non-matched samples

Learning goals

Figures

Multi-Omics Integration Overview
MOFA+ Workflow
Variance Decomposition
Interpreting Factors
When to Use MOFA+

Tutorial

Identify latent factors driving variation across 2+ omics layers using MOFA+ (Multi-Omics Factor Analysis). Decomposes multi-omics data into interpretable factors, each capturing shared or view-specific biological signal. Handles missing data across views natively.

When to Use This Skill

Use when you: - ✅ Have 2+ omics layers measured on overlapping samples (RNA-seq + proteomics, methylation + mutations, etc.)

  • ✅ Want to find shared sources of variation across omics (not just per-omics analysis)
  • ✅ Need to identify which omics layers contribute to each source of variation
  • ✅ Have incomplete data (not all samples measured in all views) — MOFA handles this
  • ✅ Want factor scores for downstream patient stratification or survival analysis

Don't use for:

  • ❌ Single omics data (use bulk-rnaseq-counts-to-de-deseq2 or bulk-omics-clustering)
  • ❌ Supervised prediction (use lasso-biomarker-panel instead)
  • ❌ Single-cell multi-modal (MOFA2 supports it, but consider scrna-trajectory-inference)
  • ❌ Fewer than 10 samples per view

Runtime: ~5-8 minutes total (CLL example). First run adds ~1-3 min for Python environment setup.

Installation

# Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("MOFA2", "MOFAdata", "ComplexHeatmap"))

# CRAN packages
install.packages(c("ggprism", "circlize", "reshape2", "RColorBrewer"))
Package Version License Commercial Use Installation
MOFA2 ≥1.12.0 LGPL (≥3) ✅ Permitted BiocManager::install("MOFA2")
MOFAdata ≥1.8.0 Artistic-2.0 ✅ Permitted BiocManager::install("MOFAdata") (example data)
ComplexHeatmap ≥2.18.0 MIT ✅ Permitted BiocManager::install("ComplexHeatmap")
ggprism ≥1.0.3 GPL (≥3) ✅ Permitted install.packages("ggprism")
circlize ≥0.4.15 MIT ✅ Permitted install.packages("circlize")
reshape2 ≥1.4.4 MIT ✅ Permitted install.packages("reshape2")
RColorBrewer ≥1.1 Apache-2.0 ✅ Permitted install.packages("RColorBrewer")
rmarkdown ≥2.25 GPL-3 ✅ Permitted install.packages("rmarkdown") (optional, PDF)

Inputs

Outputs

Analysis objects (RDS):

  • mofa_model.rds — Complete trained MOFA model for downstream use
  • Load with: model <- readRDS('mofa_results/mofa_model.rds')
  • Required for: bulk-omics-clustering (factor-based clustering), lasso-biomarker-panel (feature selection)

CSV results:

  • factor_values.csv — Sample factor scores (samples × factors)
  • weights_*.csv — Feature weights per view (features × factors)
  • variance_explained_per_factor.csv — R² per factor per view
  • variance_explained_total.csv — Total R² per view
  • top_features_per_factor.csv — Top 20 features per factor per view

Visualizations (PNG + SVG):

  • mofa_variance_per_factor — Heatmap: R² per factor per view (signature MOFA plot)
  • mofa_total_variance — Bar chart: total R² per view
  • mofa_factor_scatter — Scatter: Factor 1 vs 2 colored by clinical variable
  • mofa_factor_correlation — Tile: factor-factor correlations
  • mofa_top_weights — Faceted bar: top feature weights per factor
  • mofa_factor_heatmap — ComplexHeatmap: factors × samples with annotations
  • mofa_factor_clinical — Box plots: factor values by clinical groups

Reports:

  • analysis_report.md — Markdown summary with methods, results, references
  • analysis_report.pdf — PDF report with embedded figures (requires rmarkdown + LaTeX)

Clarification Questions

  1. Input Files (ASK THIS FIRST): - Do you have multi-omics data matrices to integrate? - Expected: Named list of matrices (features × samples), or CSV files per omics view - Or use example data? CLL blood cancer dataset (200 patients: mRNA, methylation, mutations, drug response)

🚨 IF EXAMPLE DATA SELECTED: Skip questions 3-4. Proceed directly to Step 1.

  1. Analysis Options:

    • (If using example data) Number of factors:
    • a) 15 factors — standard analysis (recommended)
    • b) 5 factors — quick demo (~2 min faster)
    • (If using own data) Number of factors:
    • a) 15 (recommended starting point)
    • b) Custom number
  2. (Own data only) Data types per view: - Which omics types? (RNA-seq, proteomics, methylation, mutations, metabolomics, drug response, other) - Are any views binary (0/1)? MOFA uses Bernoulli likelihood for binary data.

  3. (Own data only) Sample metadata: - Do you have a sample metadata file (CSV/TSV) with clinical variables? - Variables for factor-trait associations (e.g., disease status, treatment, subtype)?

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN - DO NOT WRITE INLINE CODE 🚨

Step 1 - Load data:

# For CLL example data:
source("scripts/load_example_data.R")
cll <- load_cll_data()

# For user data:
# source("scripts/load_example_data.R")
# cll <- load_user_data(
#   file_paths = list(RNA = "rna.csv", Protein = "protein.csv"),
#   metadata_path = "metadata.csv"
# )

✅ VERIFICATION: "✓ Data loaded successfully!" with per-view dimensions


Step 2 - Run MOFA analysis:

source("scripts/mofa_workflow.R")
model <- run_mofa_analysis(
    data_list = cll$data,
    metadata = cll$metadata,
    n_factors = 15,
    output_dir = "mofa_results"
)

DO NOT write inline MOFA code. Just call run_mofa_analysis().

⏱️ Takes ~2-5 min (+ ~1-3 min extra on first run for Python environment setup via basilisk).

✅ VERIFICATION: "✓ MOFA model trained successfully!" with variance explained summary


Step 3 - Generate visualizations:

source("scripts/mofa_plots.R")
generate_all_plots(model, output_dir = "mofa_results")

DO NOT write inline plotting code (ggsave, ggplot, Heatmap, etc.). Just use the script.

The script handles PNG + SVG export with graceful fallback for SVG dependencies.

✅ VERIFICATION: "✓ All plots generated successfully!" with file count


Step 4 - Export results:

source("scripts/export_results.R")
export_all(model, output_dir = "mofa_results")

DO NOT write custom export code. Use export_all().

✅ VERIFICATION: "=== Export Complete ===" with file list


⚠️ CRITICAL - DO NOT:

  • Write inline MOFA codeSTOP: Use run_mofa_analysis()
  • Write inline plotting code (ggsave, ggplot, Heatmap, etc.)STOP: Use generate_all_plots()
  • Write custom export codeSTOP: Use export_all()
  • Try to install basilisk/reticulate manually → MOFA2 handles Python automatically

⚠️ IF SCRIPTS FAIL - Script Failure Hierarchy:

  1. Fix and Retry (90%) — Install missing package, re-run script
  2. Modify Script (5%) — Edit the script file itself, document changes
  3. Use as Reference (4%) — Read script, adapt approach, cite source
  4. Write from Scratch (1%) — Only if genuinely impossible, explain why

NEVER skip directly to writing inline code without trying the script first.

Common Issues

Error Cause Solution
basilisk Python env setup slow First-time setup of Python backend Normal — wait 1-3 minutes. Only happens once per R installation.
run_mofa hangs at "Training model..." Model training in progress Normal — wait 2-5 min. Training is compute-intensive.
Error in py_call_impl: Python error basilisk environment issue Restart R session, retry. If persistent: BiocManager::install("MOFA2", force = TRUE)
Metadata download failed EBI FTP blocked or offline Normal fallback. Analysis runs without trait plots. Metadata is optional.
"No convergence" Too many factors or too few samples Reduce n_factors (try 5-10). Ensure ≥10 samples.
SVG export failed Missing svglite/cairo Normal. PNG always generated. generate_all_plots() handles fallback automatically.
Memory error Dataset too large Filter features to top 5,000 most variable per view before MOFA.

Interpretation Guide

Variance Decomposition (Key MOFA Output)

Factor Interpretation

Pattern Meaning
Factor active in mRNA + methylation Epigenetic regulation of transcription
Factor active in mutations + drug response Genetic determinants of drug sensitivity
Factor correlates with clinical subtype Biologically meaningful patient stratification
Factor active in only one view View-specific technical or biological variation

See: references/mofa-interpretation-guide.md for detailed downstream analysis.

Suggested Next Steps

After running MOFA: - Patient stratification: Use bulk-omics-clustering on factor scores to define molecular subtypes - Biomarker discovery: Use lasso-biomarker-panel on top-weighted features per factor - Pathway enrichment: Use functional-enrichment-from-degs on top mRNA features per factor - Network analysis: Use coexpression-network on factor-associated genes - Survival analysis: Use survival-analysis-clinical with factor scores as covariates

Related Skills

Skill Relationship
bulk-omics-clustering Downstream: cluster on MOFA factor scores
lasso-biomarker-panel Downstream: select biomarkers from top factor features
disease-progression-longitudinal Complementary: trajectory analysis on factor scores
coexpression-network Downstream: network analysis on factor-associated genes
functional-enrichment-from-degs Downstream: pathway enrichment on top factor features
bulk-rnaseq-counts-to-de-deseq2 Upstream: generate DE results as one omics view

References

Code preview

scripts/export_results.R

# =============================================================================
# Export MOFA+ Analysis Results
# =============================================================================
# Exports factor values, feature weights, variance explained, and analysis
# objects. Generates markdown summary report and optional PDF report.
# =============================================================================

options(repos = c(CRAN = "https://cloud.r-project.org"))

.install_if_missing <- function(pkg, bioc = FALSE) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        cat("Installing", pkg, "...\n")
        if (bioc) {
            if (!requireNamespace("BiocManager", quietly = TRUE))
                install.packages("BiocManager")
            BiocManager::install(pkg, ask = FALSE, update = FALSE)
        } else {
            install.packages(pkg)
        }
    }
}

#' Export all MOFA analysis results
#'
#' @param model Trained MOFA model object
#' @param output_dir Directory for output files
export_all <- function(model, output_dir = "mofa_results") {

    cat("\n=== Exporting MOFA Analysis Results ===\n\n")

    .install_if_missing("MOFA2", bioc = TRUE)
    .install_if_missing("reshape2")
    library(MOFA2)

    if (!dir.exists(output_dir)) {
        dir.create(output_dir, recursive = TRUE)
    }

    # -------------------------------------------------------------------------
    # 1. Factor values (samples x factors)
    # -------------------------------------------------------------------------
    cat("1. Exporting factor values...\n")
    factors_df <- get_factors(model, as.data.frame = TRUE)
    factors_wide <- reshape2::dcast(factors_df, sample ~ factor, value.var = "value")
    write.csv(factors_wide, file.path(output_dir, "factor_values.csv"), row.names = FALSE)
    cat(sprintf("   %d samples x %d factors\n", nrow(factors_wide), ncol(factors_wide) - 1))
    cat(sprintf("   Saved: %s\n\n", file.path(output_dir, "factor_values.csv")))

    # -------------------------------------------------------------------------
    # 2. Feature weights per view
    # -------------------------------------------------------------------------
    cat("2. Exporting feature weights per view...\n")
    weights_df <- get_weights(model, as.data.frame = TRUE)
    views <- unique(weights_df$view)
    for (v in views) {
        w_sub <- weights_df[weights_df$view == v, ]
        w_wide <- reshape2::dcast(w_sub, feature ~ factor, value.var = "value")
        fname <- sprintf("weights_%s.csv", gsub("[^a-zA-Z0-9]", "_", tolower(v)))
        write.csv(w_wide, file.path(output_dir, fname), row.names = FALSE)
        cat(sprintf("   %s: %d features x %d factors -> %s\n",
                    v, nrow(w_wide), ncol(w_wide) - 1, fname))
    }
    cat("\n")

    # -------------------------------------------------------------------------
    # 3. Variance explained
    # -------------------------------------------------------------------------
    cat("3. Exporting variance explained...\n")
    r2 <- get_variance_explained(model)
    r2_per_factor <- r2$r2_per_factor[[1]]
    r2_total <- r2$r2_total[[1]]

    # Per-factor per-view
    r2_df <- as.data.frame(r2_per_factor)
    r2_df$Factor <- rownames(r2_df)
    write.csv(r2_df, file.path(output_dir, "variance_explained_per_factor.csv"),
              row.names = FALSE)

    # Total per view
    total_df <- data.frame(View = names(r2_total), Total_R2 = as.numeric(r2_total))

scripts/load_example_data.R

# =============================================================================
# Load Example Data for MOFA+ Multi-Omics Integration
# =============================================================================
# Loads the CLL (Chronic Lymphocytic Leukemia) multi-omics dataset
# from the MOFAdata package. 200 patients, 4 omics layers:
#   - Drugs: drug response (310 features)
#   - Methylation: DNA methylation (4248 features)
#   - mRNA: gene expression (5000 features)
#   - Mutations: somatic mutations (69 features, binary)
#
# Also downloads sample metadata from EBI for clinical annotations.
# =============================================================================

options(repos = c(CRAN = "https://cloud.r-project.org"))

.install_if_missing <- function(pkg, bioc = FALSE) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        cat("Installing", pkg, "...\n")
        if (bioc) {
            if (!requireNamespace("BiocManager", quietly = TRUE))
                install.packages("BiocManager")
            BiocManager::install(pkg, ask = FALSE, update = FALSE)
        } else {
            install.packages(pkg)
        }
    }
}

#' Load CLL multi-omics example data
#'
#' @return list with components:
#'   - data: named list of 4 matrices (Drugs, Methylation, mRNA, Mutations)
#'   - metadata: data.frame of sample annotations (or NULL if download fails)
load_cll_data <- function() {
    cat("\n=== Loading CLL Multi-Omics Example Data ===\n\n")

    # --- Install/load MOFAdata ---
    .install_if_missing("MOFAdata", bioc = TRUE)
    library(MOFAdata)

    # --- Load CLL dataset ---
    cat("Loading CLL multi-omics data (200 patients, 4 omics layers)...\n")
    utils::data("CLL_data", package = "MOFAdata", envir = environment())

    # Validate structure
    stopifnot(is.list(CLL_data))
    stopifnot(length(CLL_data) >= 4)

    # --- Print per-view summary ---
    cat("\nDataset overview:\n")
    all_samples <- unique(unlist(lapply(CLL_data, colnames)))
    cat(sprintf("  Total unique samples: %d\n", length(all_samples)))
    cat("\n  View summaries:\n")
    for (view_name in names(CLL_data)) {
        mat <- CLL_data[[view_name]]
        n_feat <- nrow(mat)
        n_samp <- ncol(mat)
        pct_missing <- round((1 - n_samp / length(all_samples)) * 100, 1)
        cat(sprintf("    %-15s %5d features x %3d samples (%4.1f%% samples missing)\n",
                    view_name, n_feat, n_samp, pct_missing))
    }

    # --- Sample overlap ---
    sample_lists <- lapply(CLL_data, colnames)
    shared <- Reduce(intersect, sample_lists)
    cat(sprintf("\n  Samples with ALL 4 views: %d / %d (%.0f%%)\n",
                length(shared), length(all_samples),
                100 * length(shared) / length(all_samples)))

    # --- Download sample metadata ---
    metadata <- .download_cll_metadata()

    cat("\n✓ Data loaded successfully!\n\n")
    return(list(data = CLL_data, metadata = metadata))
}

#' Download CLL sample metadata from EBI
#' @return data.frame or NULL if download fails
.download_cll_metadata <- function() {
    cat("\nDownloading sample metadata...\n")

scripts/mofa_plots.R

# =============================================================================
# MOFA+ Visualization: Publication-Quality Plots
# =============================================================================
# Generates 7 plots using ggprism::theme_prism() and ComplexHeatmap.
# All plots saved as PNG + SVG with graceful fallback.
# =============================================================================

options(repos = c(CRAN = "https://cloud.r-project.org"))

.install_if_missing <- function(pkg, bioc = FALSE) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        cat("Installing", pkg, "...\n")
        if (bioc) {
            if (!requireNamespace("BiocManager", quietly = TRUE))
                install.packages("BiocManager")
            BiocManager::install(pkg, ask = FALSE, update = FALSE)
        } else {
            install.packages(pkg)
        }
    }
}

# --- Load required packages ---
.load_plot_packages <- function() {
    .install_if_missing("ggplot2")
    .install_if_missing("ggprism")
    .install_if_missing("reshape2")
    .install_if_missing("RColorBrewer")
    .install_if_missing("ComplexHeatmap", bioc = TRUE)
    .install_if_missing("circlize")
    .install_if_missing("MOFA2", bioc = TRUE)

    library(ggplot2)
    library(ggprism)
    library(reshape2)
    library(RColorBrewer)
    library(ComplexHeatmap)
    library(circlize)
    library(MOFA2)
    library(grid)
}

# --- PNG + SVG save helper ---
.save_plot <- function(plot_obj, base_name, output_dir, width = 8, height = 6, dpi = 300) {
    png_path <- file.path(output_dir, paste0(base_name, ".png"))
    svg_path <- file.path(output_dir, paste0(base_name, ".svg"))

    # Always save PNG
    ggsave(png_path, plot = plot_obj, width = width, height = height, dpi = dpi, device = "png")
    cat(sprintf("   Saved: %s\n", png_path))

    # Try SVG with fallback
    tryCatch({
        ggsave(svg_path, plot = plot_obj, width = width, height = height, device = "svg")
        cat(sprintf("   Saved: %s\n", svg_path))
    }, error = function(e) {
        tryCatch({
            svg(svg_path, width = width, height = height)
            print(plot_obj)
            dev.off()
            cat(sprintf("   Saved: %s\n", svg_path))
        }, error = function(e2) {
            cat("   (SVG export failed)\n")
        })
    })
}

# --- ComplexHeatmap save helper ---
.save_heatmap <- function(ht, base_name, output_dir, width = 10, height = 8) {
    png_path <- file.path(output_dir, paste0(base_name, ".png"))
    svg_path <- file.path(output_dir, paste0(base_name, ".svg"))

    # PNG
    png(png_path, width = width, height = height, units = "in", res = 300)
    draw(ht)
    dev.off()
    cat(sprintf("   Saved: %s\n", png_path))

    # SVG
    tryCatch({

Companion files

TypePathBytes
Markdownreferences/mofa-interpretation-guide.md4,597
Rscripts/export_results.R13,207
Rscripts/load_example_data.R6,447
Rscripts/mofa_plots.R18,418
Rscripts/mofa_workflow.R6,349
MarkdownSKILL.md11,142
JSONskill.meta.json1,540