⌂ / Overview / Bulk Transcriptomics / Co-expression Network

View companion source

Co-expression Network

Build a co-expression network; find modules and hub genes.

Overview

Problem. Move from single genes to co-varying modules.

Use when: Seeking modules or hub genes

Avoid when: With very small sample sizes

Learning goals

From gene lists to network thinking
High connectivity ≠ causation

Figures

Tutorial

Overview

Build weighted gene co-expression networks to identify modules of coordinately expressed genes and discover hub genes that may be key regulators. This workflow uses WGCNA (Weighted Gene Co-expression Network Analysis) to group genes into modules based on their expression patterns across samples, then correlates these modules with experimental conditions or traits.

Key Concept: Unlike single-gene analysis, WGCNA identifies groups of genes that behave similarly across samples, revealing biological pathways and potential regulatory relationships.

Use Cases:

Identify gene modules associated with experimental conditions
Discover hub genes (highly connected genes within modules)
Find genes with similar expression patterns to known genes of interest
Reduce dimensionality of gene expression data for downstream analysis
Generate hypotheses about gene function based on co-expression

Default Prompt: "Build a co-expression network to identify gene modules and hub genes from my RNA-seq data"

When to Use This Skill

Use WGCNA when you want to:

Identify gene modules associated with experimental conditions or phenotypes
Discover hub genes that are highly connected within modules and may be key regulators
Find co-expressed genes with similar expression patterns to known genes of interest
Reduce dimensionality of large gene expression datasets for downstream analysis
Generate hypotheses about gene function based on co-expression patterns

Requirements:

≥15 samples (20+ recommended for robust results)
Normalized expression data (VST, rlog, TPM, or FPKM - NOT raw counts)
5,000-15,000 most variable genes
Batch effects removed or corrected

Not suitable for:

Small sample sizes (<15 samples) - consider alternative approaches
Raw count data - normalize first using DESeq2 or similar
Data with uncorrected batch effects - correct before WGCNA

Installation

Core WGCNA packages:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("WGCNA")

Visualization packages:

install.packages(c("ggplot2", "ggprism"))
BiocManager::install("ComplexHeatmap")

Enrichment analysis (optional):

BiocManager::install(c("clusterProfiler", "org.Hs.eg.db"))  # Human
# BiocManager::install("org.Mm.eg.db")  # Mouse
# BiocManager::install("org.Rn.eg.db")  # Rat

Software	Version	License	Commercial Use	Installation
WGCNA	≥1.70	GPL-2+	✅ Permitted	`BiocManager::install('WGCNA')`
ggplot2	≥3.3.0	MIT	✅ Permitted	`install.packages('ggplot2')`
ComplexHeatmap	≥2.10.0	MIT	✅ Permitted	`BiocManager::install('ComplexHeatmap')`
clusterProfiler	≥4.0.0	Artistic-2.0	✅ Permitted	`BiocManager::install('clusterProfiler')`

Inputs

Required:

Normalized expression matrix (CSV/TSV): - Rows: Genes, Columns: Samples - Values: VST, rlog, TPM, or FPKM (NOT raw counts) - 5,000-15,000 most variable genes recommended

Sample metadata (CSV/TSV): - Sample IDs matching expression matrix columns - Traits/conditions for module-trait correlation

Optional:

Differential expression results (to highlight DEGs)
Gene annotations for enrichment analysis

Data Requirements:

≥15 samples (20+ recommended)
Batch effects removed or corrected
No missing values in expression matrix

Outputs

CSV Files:

wgcna_gene_modules.csv - Gene-module assignments with connectivity metrics
wgcna_hub_genes.csv - Top hub genes per module
wgcna_module_trait_cor.csv - Module-trait correlations with p-values
wgcna_eigengenes.csv - Module eigengene values per sample
wgcna_report.md - Summary report with interpretation

Plots (PNG + SVG):

soft_power_selection.png/.svg - Power selection diagnostic plot
module_dendrogram.png/.svg - Gene dendrogram with module colors
module_trait_correlation.png/.svg - Module-trait heatmap
eigengene_heatmap.png/.svg - Module eigengene expression patterns
hub_genes_barplot.png/.svg - Hub genes by connectivity

Analysis Objects (RDS):

wgcna_network.rds - Complete network object from blockwiseModules - Load with: net <- readRDS('wgcna_network.rds') - Required for: module preservation analysis, advanced network visualization
wgcna_module_colors.rds - Module color assignments per gene - Load with: colors <- readRDS('wgcna_module_colors.rds') - Required for: downstream module-specific analyses
wgcna_expression_matrix.rds - Filtered expression matrix used for analysis - Load with: expr <- readRDS('wgcna_expression_matrix.rds') - Required for: reanalysis, module preservation testing
wgcna_full_results.rds - Complete results object with all components - Load with: results <- readRDS('wgcna_full_results.rds') - Required for: replotting, additional analyses

Key Metrics:

module: Module color assignment (grey = unassigned)
kWithin: Intramodular connectivity (higher = more connected)
MM: Module membership (correlation with eigengene)
hub_score: Combined connectivity metric (MM × kWithin)

Clarification Questions

Input Files (ASK THIS FIRST):
- Do you have specific normalized expression data and sample metadata files to analyze?
- If uploaded: Are these the expression matrix and metadata you'd like to analyze?
- Expected formats: CSV or TSV with genes as rows, samples as columns
- Or use example data? Female mouse liver dataset (135 samples, liver tissue, multiple traits)
What is your normalized expression data format?
- VST (variance stabilizing transformation) from DESeq2
- rlog (regularized log) from DESeq2
- TPM (transcripts per million)
- FPKM/RPKM
- If unsure or raw counts: normalize first using DESeq2
How many samples do you have?
- 15-30 samples (minimum for WGCNA, results may be less robust)
- 30-50 samples (good power for network detection)
- 50+ samples (excellent power, most reliable results)
What traits/conditions do you want to correlate with modules?
- Treatment vs control (binary)
- Disease status or phenotype
- Continuous variables (age, dose, time, weight)
- Multiple traits (all will be tested)
Gene filtering strategy?
- Top 5,000 most variable genes (default, recommended)
- Top 10,000-15,000 genes (for larger datasets)
- All genes passing expression threshold
- Pre-filtered gene list (e.g., from DE analysis)

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN - DO NOT WRITE INLINE CODE 🚨

Step 1 - Load data:

library(WGCNA)
allowWGCNAThreads()

source("scripts/load_example_data.R")
wgcna_data <- load_example_wgcna_data()
datExpr <- wgcna_data$datExpr
meta <- wgcna_data$meta

# For your own data:
# source("scripts/prepare_wgcna_data.R")
# data <- prepare_wgcna_data("expression.csv", "metadata.csv", top_n_genes = 5000)
# datExpr <- data$datExpr
# meta <- data$meta

Step 2 - Run WGCNA analysis:

source("scripts/wgcna_workflow.R")
results <- run_wgcna_analysis(
  datExpr,
  meta,
  traits = c("weight_g", "Glucose_mg_dl"),  # Adjust to your traits
  organism = "mouse"  # or "human", "rat", or NULL to skip enrichment
)

DO NOT write inline WGCNA code. Just source the script.

Step 3 - Generate visualizations:

source("scripts/plot_all_wgcna.R")
plot_all_wgcna(results, output_dir = "wgcna_results")

DO NOT write inline plotting code (png, svg, plotDendroAndColors, etc.). Just use the script.

The script handles PNG + SVG export with graceful fallback for SVG dependencies.

Step 4 - Export results:

source("scripts/export_wgcna_results.R")
export_all(results, output_dir = "wgcna_results")

DO NOT write custom export code. Use export_all().

✅ VERIFICATION - You should see:

After Step 1: "✓ Successfully loaded female mouse liver dataset"
After Step 2: "✓ WGCNA analysis completed successfully!"
After Step 3: "✓ All WGCNA plots generated successfully!"
After Step 4: "=== Export Complete ==="

⚠️ CRITICAL - DO NOT:

❌ Write inline WGCNA code → STOP: Use source("scripts/wgcna_workflow.R")
❌ Write inline plotting code (png, svg, plotDendroAndColors, etc.) → STOP: Use plot_all_wgcna()
❌ Write custom export code → STOP: Use export_all()
❌ Try to install svglite → script handles SVG fallback automatically
❌ Use absolute paths like /mnt/knowhow/ → use relative paths scripts/
❌ Skip soft power selection → required for scale-free topology
❌ Use raw counts → normalize first with DESeq2 VST or rlog

⚠️ IF SCRIPTS FAIL - Script Failure Hierarchy:

Fix and Retry (90%) - Install missing package, re-run script
Modify Script (5%) - Edit the script file itself, document changes
Use as Reference (4%) - Read script, adapt approach, cite source
Write from Scratch (1%) - Only if genuinely impossible, explain why

NEVER skip directly to writing inline code without trying the script first.

What the scripts provide:

scripts/load_example_data.R - Auto-fetch tutorial data (~135 samples)
scripts/prepare_wgcna_data.R - Load and filter your data
scripts/wgcna_workflow.R - Complete WGCNA analysis (power selection, network building, module-trait correlation, hub genes, enrichment)
scripts/plot_all_wgcna.R - All publication-quality plots (PNG + SVG)
scripts/plotting_helpers.R - Plot saving functions with automatic SVG fallback handling
scripts/export_wgcna_results.R - Export results and analysis objects

Parameter Customization

When customization is needed:

Soft power selection: Read references/parameter-tuning-guide.md to understand how to choose appropriate power values for your data
Module detection parameters: See references/parameter-tuning-guide.md#module-detection for guidance on min_module_size and merge_cut_height
Complete custom workflow: Read references/wgcna-reference.md for detailed code examples with explanations (only if you need full control)

Common Issues

Issue	Cause	Solution
Too few samples error	<15 samples	WGCNA requires ≥15 samples; combine replicates or use alternative methods
Scale-free R² never exceeds 0.75	Batch effects or poor data quality	Check for batch effects; try different normalization; inspect PCA
All genes assigned to grey module	minModuleSize too large or poor gene filtering	Lower minModuleSize to 20-30; increase top_n_genes to 10,000-15,000
No significant module-trait correlations	Weak biological signal or incorrect traits	Check trait coding (numeric for continuous, 0/1 for binary); try more samples
Soft power recommended is very high (>20)	Data not suitable for scale-free network	Check normalization; consider signed vs unsigned network
Hub gene identification fails	Module colors not provided correctly	Ensure module_colors matches gene order in datExpr
Enrichment analysis returns no results	Wrong organism or gene ID format	Verify organism parameter matches data; convert gene IDs if needed
Memory errors during network construction	Too many genes	Reduce to 5,000-10,000 most variable genes; increase RAM

Interpretation Guidelines

Module colors:

Each color = distinct co-expression module
Grey = genes not assigned to any module
Larger modules may represent broader biological processes

Hub genes:

High kWithin = highly connected within module
High MM = strong correlation with module eigengene
Hub genes are candidates for experimental validation

Module-trait correlations:

|r| > 0.5 and p < 0.05 = significant association
Positive correlation = module genes increase with trait
Negative correlation = module genes decrease with trait
Focus on modules with strongest associations

Suggested Next Steps

After identifying modules and hub genes:

Functional validation - Validate hub genes experimentally (qPCR, knockdown, overexpression)
Enrichment analysis - Test modules for GO/KEGG enrichment to understand biological processes
Compare with DE results - Overlay DE genes on network to see which modules are enriched
Network visualization - Export to Cytoscape for detailed network visualization
Cross-dataset validation - Test module preservation in independent datasets

Related Skills

bulk-rnaseq-counts-to-de-deseq2 - Normalize counts and perform differential expression analysis (run before WGCNA)
de-results-to-gene-lists - Extract gene lists from DE results to overlay on network
functional-enrichment-from-degs - Perform GO/KEGG enrichment on modules

References

Documentation:

WGCNA Best Practices Guide - Comprehensive guide on data preparation, QC, and troubleshooting
Parameter Tuning Guide - Detailed parameter selection guidance
WGCNA Reference - Complete code examples with explanations
Troubleshooting Guide - Common errors and solutions

Example Data:

Example Datasets - Public datasets for WGCNA analysis

Key Papers:

Key WGCNA Papers - Essential publications
Langfelder & Horvath (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. doi:10.1186/1471-2105-9-559
Zhang & Horvath (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology. doi:10.2202/1544-6115.1128

Code preview

scripts/build_network.R

# Build co-expression network and detect modules

library(WGCNA)

#' Build co-expression network and detect modules
#'
#' @param datExpr Expression matrix (samples x genes)
#' @param power Soft-thresholding power
#' @param min_module_size Minimum genes per module
#' @param merge_cut_height Height for merging similar modules
#' @return List containing network object, module colors, and module labels
build_network <- function(datExpr, power, min_module_size = 30, merge_cut_height = 0.25) {

  cat("Building network with power =", power, "\n")

  # One-step network construction and module detection
  net <- blockwiseModules(
    datExpr,
    power = power,
    TOMType = "signed",
    networkType = "signed",
    minModuleSize = min_module_size,
    reassignThreshold = 0,
    mergeCutHeight = merge_cut_height,
    numericLabels = TRUE,
    pamRespectsDendro = FALSE,
    saveTOMs = FALSE,
    verbose = 3
  )

  # Convert numeric labels to colors
  module_colors <- labels2colors(net$colors)

  cat("\nModule detection complete:\n")
  cat("Number of modules:", length(unique(module_colors)) - 1, "(excluding grey/unassigned)\n")
  print(table(module_colors))

  return(list(
    net = net,
    module_colors = module_colors,
    module_labels = net$colors
  ))
}

scripts/correlate_modules_traits.R

# Correlate module eigengenes with sample traits

library(WGCNA)
library(ComplexHeatmap)
library(circlize)

#' Correlate module eigengenes with sample traits
#'
#' @param datExpr Expression matrix
#' @param module_colors Module color assignments
#' @param traits Data frame of sample traits (numeric)
#' @param output_file Path to save heatmap
#' @return List with module eigengenes, correlations, and p-values
correlate_modules_traits <- function(datExpr, module_colors, traits,
                                      output_file = "module_trait_correlation.svg") {

  # Calculate module eigengenes
  MEs <- moduleEigengenes(datExpr, colors = module_colors)$eigengenes
  MEs <- orderMEs(MEs)

  # Convert traits to numeric if needed
  traits_numeric <- data.frame(lapply(traits, function(x) {
    if (is.factor(x) || is.character(x)) {
      as.numeric(as.factor(x))
    } else {
      as.numeric(x)
    }
  }))
  rownames(traits_numeric) <- rownames(traits)

  # Calculate correlations
  module_trait_cor <- cor(MEs, traits_numeric, use = "pairwise.complete.obs")
  module_trait_pval <- corPvalueStudent(module_trait_cor, nrow(datExpr))

  # Create text matrix for heatmap (correlation and p-value)
  text_matrix <- matrix(
    paste0(signif(module_trait_cor, 2), "\n(", signif(module_trait_pval, 1), ")"),
    nrow = nrow(module_trait_cor),
    ncol = ncol(module_trait_cor)
  )

  # Create color function
  col_fun <- colorRamp2(c(-1, 0, 1), c("blue", "white", "red"))

  # Create heatmap with ComplexHeatmap
  ht <- Heatmap(
    module_trait_cor,
    name = "Correlation",
    col = col_fun,
    cluster_rows = FALSE,
    cluster_columns = FALSE,
    show_row_names = TRUE,
    show_column_names = TRUE,
    row_names_side = "left",
    column_names_side = "bottom",
    column_title = "Module-Trait Relationships",
    cell_fun = function(j, i, x, y, width, height, fill) {
      grid.text(text_matrix[i, j], x, y, gp = gpar(fontsize = 8))
    },
    heatmap_legend_param = list(
      title = "Correlation",
      at = c(-1, -0.5, 0, 0.5, 1)
    ),
    width = unit(max(8, ncol(traits) * 1.5), "cm"),
    height = unit(max(8, nrow(module_trait_cor) * 0.5), "cm")
  )

  # Save to file
  svg(output_file, width = max(8, ncol(traits) * 1.5), height = max(8, nrow(module_trait_cor) * 0.4))
  draw(ht)
  dev.off()

  cat("Saved:", output_file, "\n")

  # Return results
  results <- data.frame(
    module = rownames(module_trait_cor),
    stringsAsFactors = FALSE
  )

scripts/export_wgcna_results.R

# Export all WGCNA results

library(WGCNA)

#' Export all WGCNA results (including RDS objects for downstream skills)
#'
#' @param results Results object from run_wgcna_analysis()
#' @param output_dir Directory to save results (default: "wgcna_results")
#' @param output_prefix Prefix for output files (default: "wgcna")
export_all <- function(results, output_dir = "wgcna_results", output_prefix = "wgcna") {

  cat("\n=== Exporting WGCNA Results ===\n\n")

  # Create output directory if needed
  if (!dir.exists(output_dir)) {
    dir.create(output_dir, recursive = TRUE)
  }

  # Extract components from results
  gene_info <- results$hub_results$gene_info
  hub_genes <- results$hub_results$hub_genes
  trait_results <- results$trait_results

  # 1. Export gene-module assignments
  csv_path <- file.path(output_dir, paste0(output_prefix, "_gene_modules.csv"))
  write.csv(gene_info, csv_path, row.names = FALSE)
  cat("  Saved:", csv_path, "\n")

  # 2. Export hub genes
  hub_df <- do.call(rbind, lapply(names(hub_genes), function(mod) {
    df <- hub_genes[[mod]]
    df$module <- mod
    df
  }))
  csv_path <- file.path(output_dir, paste0(output_prefix, "_hub_genes.csv"))
  write.csv(hub_df, csv_path, row.names = FALSE)
  cat("  Saved:", csv_path, "\n")

  # 3. Export module-trait correlations
  csv_path <- file.path(output_dir, paste0(output_prefix, "_module_trait_cor.csv"))
  write.csv(trait_results$results, csv_path, row.names = FALSE)
  cat("  Saved:", csv_path, "\n")

  # 4. Export module eigengenes
  csv_path <- file.path(output_dir, paste0(output_prefix, "_eigengenes.csv"))
  write.csv(trait_results$MEs, csv_path)
  cat("  Saved:", csv_path, "\n")

  # 5. Save analysis objects as RDS for downstream skills (CRITICAL)
  cat("\n  Saving analysis objects for downstream use:\n")

  rds_path <- file.path(output_dir, paste0(output_prefix, "_network.rds"))
  saveRDS(results$network, rds_path)
  cat("    • wgcna_network.rds\n")
  cat("      (Load with: net <- readRDS('", basename(rds_path), "'))\n", sep = "")

  rds_path <- file.path(output_dir, paste0(output_prefix, "_module_colors.rds"))
  saveRDS(results$module_colors, rds_path)
  cat("    • wgcna_module_colors.rds\n")
  cat("      (Load with: colors <- readRDS('", basename(rds_path), "'))\n", sep = "")

  rds_path <- file.path(output_dir, paste0(output_prefix, "_expression_matrix.rds"))
  saveRDS(results$datExpr, rds_path)
  cat("    • wgcna_expression_matrix.rds\n")
  cat("      (Load with: expr <- readRDS('", basename(rds_path), "'))\n", sep = "")

  rds_path <- file.path(output_dir, paste0(output_prefix, "_full_results.rds"))
  saveRDS(results, rds_path)
  cat("    • wgcna_full_results.rds (complete results object)\n")
  cat("      (Load with: results <- readRDS('", basename(rds_path), "'))\n", sep = "")

  # 6. Create summary report
  cat("\n  Creating summary report...\n")
  summary_lines <- c(
    "# WGCNA Co-expression Network Analysis Summary\n",
    paste("**Total genes analyzed:**", nrow(gene_info)),
    paste("**Total samples:**", nrow(trait_results$MEs)),
    paste("**Number of modules:**", length(unique(gene_info$module)) - 1, "(excluding grey)\n"),
    "## Module Sizes"
  )

Companion files

Type	Path	Bytes
Text	references/2008_WGCNA an R package for weighted correlation network analysis.pdf.UNAVAILABLE.txt	391
Text	references/2021_WGCNA Gene Correlation Network Analysis - Bioinformatics Workbook.pdf.UNAVAILABLE.txt	401
Markdown	references/parameter-tuning-guide.md	15,005
Markdown	references/troubleshooting.md	10,604
Markdown	references/wgcna-best-practices.md	15,201
Markdown	references/wgcna-reference.md	14,644
R	scripts/build_network.R	1,269
R	scripts/correlate_modules_traits.R	2,786
R	scripts/export_wgcna_results.R	4,480
R	scripts/identify_hub_genes.R	1,958
R	scripts/load_example_data.R	7,444
R	scripts/module_enrichment.R	5,033
R	scripts/pick_soft_power.R	2,452
R	scripts/plot_all_wgcna.R	4,065
R	scripts/plot_eigengene_heatmap.R	1,801
R	scripts/plot_hub_genes.R	2,325
R	scripts/plot_module_dendrogram.R	1,408
R	scripts/plotting_helpers.R	2,930
R	scripts/prepare_wgcna_data.R	2,548
R	scripts/wgcna_workflow.R	3,586
Markdown	SKILL.md	13,975
JSON	skill.meta.json	4,083