⌂ / Overview / Bulk Transcriptomics / DE — Auto Method Selection

View companion source

DE — Auto Method Selection

Unsure which DE method? It picks based on your data.

Overview

Problem. Data-driven choice among the three mainstream DE methods.

Use when: A count matrix but unsure which method

Avoid when: A method is already mandated

Learning goals

Small n→edgeR, normalised/large→limma-voom, medium-large→DESeq2
Inspect the data before choosing a tool

Figures

Tutorial

A method-aware differential expression skill for bulk RNA-seq count data. It does not lock you into one engine. Instead it looks at your data, recommends DESeq2, edgeR, or limma-voom with a plain-language rationale, lets you confirm or override, then runs a single unified workflow and (optionally) cross-checks gene calls across methods.

Reach for this whenever someone has a bulk RNA-seq count matrix and wants differentially expressed genes — especially when the question is "which method should I use?"

When to Use This Skill

Use it when the user has a bulk RNA-seq count matrix and wants differentially expressed genes, and any of the following is true:

They don't specify a method, or explicitly aren't sure whether to use DESeq2, edgeR, or limma-voom.
The experiment has small n, large n, or a non-trivial design (batch, paired, multi-group).
They want a robustness/sensitivity check across methods (concordance).
They uploaded quant.sf (Salmon), Kallisto, featureCounts, or a SummarizedExperiment.

Not for: single-cell RNA-seq (use a scRNA-seq skill), normalized-only data where raw counts are unavailable (handled only as a flagged last-resort fallback — see Caveats), or upstream alignment/quantification (counts are the entry point).

Inputs

Input	Format	Notes
Count matrix	genes x samples, raw integer counts	CSV/TSV/RDS, tximport (Salmon/Kallisto), featureCounts, or `SummarizedExperiment`. Not TPM/FPKM/log-normalized.
Sample metadata	data frame	Rownames (or a sample-id column) match count columns. Must contain a grouping column (default name `condition`). Optional: `batch`, `individual`/`subject`, other covariates.
User options (optional)	—	Method override, reference level, design formula, which contrast/coefficient to test, significance thresholds, whether to run concordance.

The mechanics of loading, validating, and orienting the matrix live in scripts/load_data.R — use it rather than reimplementing ingestion.

Outputs (saved to the results directory)

de_results_<method>.csv — full standardized results table.
de_significant_<method>.csv — significant genes (default padj < 0.05) plus the |log2FC| ≥ 1 subset.
QC plots (PNG and SVG): PCA, MA, volcano, and the method-appropriate mean-variance / dispersion plot.
normalized_counts.csv and the fitted model object (.rds).
On request: concordance_table.csv + consensus_degs.csv.
run_log.txt — method chosen and why, design formula, contrast tested, filter summary, thresholds, full-rank check result, per-engine baseMean_equiv_scale, and package versions.

Standardized results schema (identical across all three engines)

gene_id, baseMean_equiv, log2FoldChange, pvalue, padj, method

This common schema is what makes downstream tooling and concordance method-agnostic. Two columns require care:

baseMean_equiv is a method-native expression-strength metric and is NOT comparable across methods. DESeq2 reports it on the linear count scale (mean normalized count); edgeR (logCPM) and limma (AveExpr) report it on the log2 scale. The same gene's value can differ by orders of magnitude purely because of which engine ran. It is fine for an engine's own MA plot and within-method ranking, but never use it in cross-method logic. See references/comparison-and-caveats.md.
There is deliberately no merged stat column. Wald z (DESeq2), QL F (edgeR), and moderated t (limma) are not comparable, so each engine keeps its native statistic only in its own raw output (labeled with a stat_type); it is never cross-merged.

Workflow

The hard mechanics are in the scripts; your job is to drive them in the right order and explain the method choice. Run from the skill directory so relative scripts/... paths resolve.

Step 0 — Clarify inputs (ask first).

Step 1 — Load and validate.

source("scripts/load_data.R")
loaded <- load_de_data(counts = "<path-or-matrix>", metadata = "<path-or-df>",
                       condition_col = "condition")

This checks sample-ID concordance, matrix orientation, and integer-ness, and returns a clean counts matrix + coldata data frame.

Step 2 — Inspect and recommend (advisory).

source("scripts/inspect_and_recommend.R")
rec <- inspect_and_recommend(loaded$counts, loaded$coldata,
                             condition_col = "condition",
                             design = ~ condition)   # or your covariate formula
print(rec)

This reports replicates per group, integer-ness, library-size spread, PCA clustering by condition/batch, and a programmatic full-rank/confounding check (model-matrix rank vs its number of columns). It returns a recommended method and a proposed design formula.

Translate rec into a short prose recommendation for the user — prefer the recommended method, explain why, name the alternatives, and ask them to confirm or override. The heuristics are advisory, not hard rules:

Default to DESeq2 for typical experiments — robust and widely validated.
Lean toward edgeR when replicates per group are very small — its quasi-likelihood F-test handles tiny-n dispersion estimation well.
Lean toward limma-voom when sample size is large or the design is complex/multi-factor — it is fast, well-calibrated, and flexible with model formulas.
If more than one method is reasonable, recommend the default and offer the concordance check rather than forcing a single choice.

If inspect_and_recommend reports the design is not full rank (e.g. batch perfectly aliased with condition), stop and resolve the confounding with the user before running — do not silently proceed.

Step 3 — Confirm.

Step 4 — Shared pre-filter (always, before any engine).

source("scripts/filter_counts.R")
filt <- filter_counts(loaded$counts, loaded$coldata, condition_col = "condition")

This applies one common filterByExpr() low-expression filter on the raw counts. The same filtered gene set is reused by whichever engine(s) run — this is what makes a later concordance comparison reflect method differences rather than filtering differences.

Step 5 — Run the chosen engine

source("scripts/run_deseq2.R")      # or run_edger.R / run_limma_voom.R
fit <- run_deseq2(filt$counts, filt$coldata,
                  design = ~ condition,
                  contrast = c("condition", "treated", "control"),  # or coef name
                  ref_level = "control")
res <- fit$results   # standardized data frame

For DESeq2, apeglm LFC shrinkage is applied for the engine's own ranking/visualization; the standardized log2FoldChange reported for cross-method use is the unshrunk value.

Step 6 — QC plots and export.

source("scripts/qc_plots.R");      run_all_qc(fit, output_dir = "results")
source("scripts/export_results.R"); export_de(fit, output_dir = "results")

Step 7 — (On request) cross-method concordance.

source("scripts/concordance.R")
concordance(list(deseq2 = res_deseq2, edger = res_edger, limma = res_limma),
            padj_cutoff = 0.05, output_dir = "results")

This produces an overlap-count table and a consensus DEG list (genes significant in ≥2 methods). It compares using only padj and unshrunk log2FoldChange — never baseMean_equiv. Concordance is a robustness check, not a significance booster.

Step 8 — Report.

Method Selection at a Glance (advisory)

Situation	Lean toward	Why
Typical experiment, raw counts, simple design	DESeq2 (default)	Robust, validated, good default shrinkage
Very small replicates per group	edgeR (QL F-test)	Quasi-likelihood handles tiny-n dispersion
Large n and/or complex multi-factor design	limma-voom	Fast, well-calibrated, flexible formulas
User wants a robustness check	run ≥2 + concordance	Reports gene-call overlap across methods

These are starting points to explain and confirm with the user, not thresholds to apply mechanically. Full narrative: references/method-selection-guide.md.

Scientific Caveats

Use adjusted p-values (BH-FDR padj), never raw p-values, to call significance — thousands of genes are tested simultaneously.
Count-based methods (DESeq2, edgeR) and limma-voom all require raw integer counts, never TPM/FPKM. If only normalized values exist, log2(TPM) → limma-trend is a documented last-resort fallback with strong caveats: TPM's transcript-length normalization violates the logCPM mean–variance assumption, so results are approximate and must be flagged as such.
The design must be full rank / not confounded. This is checked programmatically in inspect_and_recommend.R; resolve any confounding before running.
baseMean_equiv is not comparable across methods (see schema note above).
Cross-method LFCs are only compared on unshrunk terms in concordance output.

Self-Test / Eval

scripts/selftest.R runs all three engines on a bundled example dataset (pasilla) and asserts: every engine emits the standardized schema; significance uses padj; the shared filter yields one common gene universe; the recommender behaves sensibly across n; the full-rank check flags a confounded design; concordance uses only padj/LFC (never baseMean_equiv); and a non-integer matrix triggers the raw-counts guardrail. Run it after installing the engine packages to validate the skill end-to-end.

References

references/method-selection-guide.md — when and why to choose each engine (narrative).
references/comparison-and-caveats.md — assumptions, FDR, design/contrast handling, LFC- and baseMean-comparability.
references/troubleshooting.md — common errors across all three engines.

Code preview

scripts/concordance.R

# =============================================================================
# concordance.R  —  on-demand cross-method DEG concordance (TABLE ONLY)
#
# Compares significant-gene calls across 2+ engines that were run on the SAME
# shared-filtered gene universe (see filter_counts.R). This is a ROBUSTNESS check,
# NOT a way to gain significance.
#
# Comparability rules (enforced here):
#   - Significance is decided ONLY by `padj` (BH-FDR) at `padj_cutoff`
#     (optionally combined with |log2FoldChange| >= lfc_cutoff if lfc_cutoff > 0).
#   - The consensus list reports the UNSHRUNK `log2FoldChange` carried in each
#     engine's standardized table (the engines already report unshrunk LFC).
#   - `baseMean_equiv` is NEVER used here (its scale differs by engine).
#
# Inputs:
#   results_list : named list of standardized data.frames from run_*() ($results),
#                  e.g. list(DESeq2 = res1, edgeR = res2, `limma-voom` = res3)
#   padj_cutoff  : significance threshold (default 0.05)
#   lfc_cutoff   : optional |log2FC| filter applied to significance (default 0 = none)
#   min_methods  : a gene is "consensus" if significant in >= this many methods (default 2)
#
# Writes (to output_dir):
#   concordance_table.csv : per-method DEG counts + pairwise overlaps + Jaccard
#   consensus_degs.csv    : genes significant in >= min_methods, with each method's
#                           unshrunk log2FC and padj, and n_methods_significant
# Returns these as a list (invisibly).
# =============================================================================

concordance <- function(results_list, padj_cutoff = 0.05, lfc_cutoff = 0,
                        min_methods = 2L, output_dir = "results") {
  stopifnot(length(results_list) >= 2L)
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  if (is.null(names(results_list)) || any(names(results_list) == "")) {
    names(results_list) <- vapply(results_list, function(r) unique(r$method)[1], character(1))
  }
  methods <- names(results_list)

  # --- guardrail: refuse to touch baseMean_equiv in any cross-method computation ---
  # (defensive: ensure callers didn't pass a column we must not compare)
  sig_set <- function(df) {
    keep <- !is.na(df$padj) & df$padj < padj_cutoff
    if (lfc_cutoff > 0) keep <- keep & is.finite(df$log2FoldChange) &
      abs(df$log2FoldChange) >= lfc_cutoff
    df$gene_id[keep]
  }
  sig_lists <- lapply(results_list, sig_set)
  names(sig_lists) <- methods

  # universe = union of all gene_ids tested (should be identical if shared-filtered)
  universes <- lapply(results_list, function(d) d$gene_id)
  shared_ok <- length(unique(lapply(universes, function(u) sort(unique(u))))) == 1L

  # ---------- summary / overlap table ----------
  rows <- list()
  for (mi in methods) {
    rows[[length(rows) + 1]] <- data.frame(
      comparison = mi, type = "n_significant",
      value = length(sig_lists[[mi]]), stringsAsFactors = FALSE)
  }
  # all pairwise overlaps + Jaccard
  if (length(methods) >= 2L) {
    cb <- utils::combn(methods, 2L)
    for (k in seq_len(ncol(cb))) {
      a <- cb[1, k]; b <- cb[2, k]
      inter <- length(intersect(sig_lists[[a]], sig_lists[[b]]))
      uni <- length(union(sig_lists[[a]], sig_lists[[b]]))
      rows[[length(rows) + 1]] <- data.frame(
        comparison = paste(a, b, sep = " & "), type = "overlap",
        value = inter, stringsAsFactors = FALSE)
      rows[[length(rows) + 1]] <- data.frame(
        comparison = paste(a, b, sep = " & "), type = "jaccard",
        value = if (uni > 0) round(inter / uni, 4) else NA_real_,
        stringsAsFactors = FALSE)
    }
  }
  # genes significant in ALL methods
  all_inter <- Reduce(intersect, sig_lists)
  rows[[length(rows) + 1]] <- data.frame(
    comparison = paste(methods, collapse = " & "), type = "overlap_all",
    value = length(all_inter), stringsAsFactors = FALSE)

scripts/export_results.R

# =============================================================================
# export_results.R  —  standardized exports + run log
#
# Works on the `fit` object returned by any run_*() engine. All engines return
# the same structure:
#   fit$results        : standardized data.frame
#                        (gene_id, baseMean_equiv, log2FoldChange, pvalue, padj, method)
#   fit$method         : "DESeq2" | "edgeR" | "limma-voom" | "limma-trend"
#   fit$baseMean_scale : "linear" | "log2"   (scale of baseMean_equiv for THIS engine)
#   fit$stat_type      : native statistic label (Wald z / QL F / moderated t)
#   fit$normalized     : normalized expression matrix (for export)
#   fit$object         : the underlying fitted object (for saveRDS)
#   fit$meta           : list with design, contrast, ref_level, filter_summary,
#                        full_rank, package_version, thresholds, ...
#
# Writes (to output_dir):
#   de_results_<method>.csv
#   de_significant_<method>.csv      (padj < cutoff)
#   de_significant_fc_<method>.csv   (padj < cutoff & |log2FC| >= lfc_cutoff)
#   normalized_counts_<method>.csv
#   de_fit_<method>.rds
#   run_log.txt                      (appended; one block per engine run)
# =============================================================================

# Null/NA-coalescing helper (defined first so it is available throughout).
`%||%` <- function(a, b) if (is.null(a) || length(a) == 0L ||
                             (length(a) == 1L && is.na(a))) b else a

# saveRDS to S3-backed mounts (/mnt/results, /mnt/shared-workspace) fails or
# yields 0-byte files because .rds uses random-access writes. Write to local
# scratch first, then shell-copy into place.
.safe_saveRDS <- function(object, path) {
  on_s3 <- grepl("^/mnt/(results|shared-workspace)", normalizePath(dirname(path),
                                                                   mustWork = FALSE))
  if (on_s3) {
    tmp <- file.path(tempdir(), basename(path))
    saveRDS(object, tmp)
    ok <- file.copy(tmp, path, overwrite = TRUE)
    # R's file.copy can also 0-byte on FUSE; fall back to shell cp if so.
    if (!isTRUE(ok) || file.info(path)$size %in% c(0, NA)) {
      system2("cp", c(shQuote(tmp), shQuote(path)))
    }
    unlink(tmp)
  } else {
    saveRDS(object, path)
  }
  invisible(path)
}

export_de <- function(fit, output_dir = "results",
                      padj_cutoff = 0.05, lfc_cutoff = 1) {
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  m <- fit$method
  res <- fit$results

  # ---- standardized full table ----
  f_all <- file.path(output_dir, sprintf("de_results_%s.csv", m))
  utils::write.csv(res, f_all, row.names = FALSE)

  # ---- significant (padj only; the headline definition) ----
  sig <- res[!is.na(res$padj) & res$padj < padj_cutoff, , drop = FALSE]
  sig <- sig[order(sig$padj), , drop = FALSE]
  f_sig <- file.path(output_dir, sprintf("de_significant_%s.csv", m))
  utils::write.csv(sig, f_sig, row.names = FALSE)

  # ---- significant with fold-change subset ----
  sig_fc <- sig[abs(sig$log2FoldChange) >= lfc_cutoff, , drop = FALSE]
  f_sigfc <- file.path(output_dir, sprintf("de_significant_fc_%s.csv", m))
  utils::write.csv(sig_fc, f_sigfc, row.names = FALSE)

  # ---- normalized counts ----
  f_norm <- NA_character_
  if (!is.null(fit$normalized)) {
    f_norm <- file.path(output_dir, sprintf("normalized_counts_%s.csv", m))
    utils::write.csv(as.data.frame(fit$normalized), f_norm)
  }

  # ---- fitted object ----
  # saveRDS uses random-access writes, which break on S3-backed mounts; the
  # helper writes to local scratch then copies into place.

scripts/filter_counts.R

# =============================================================================
# filter_counts.R  —  shared low-expression pre-filter for ALL engines
#
# Applies ONE common filter (edgeR::filterByExpr) to the raw count matrix using
# the grouping variable, so the SAME filtered gene set is handed to whichever
# engine(s) run. This is essential for fair cross-method concordance: otherwise
# the overlap would partly measure differences in each engine's own internal
# filtering rather than differences in the statistical methods.
#
# Returns: list(counts = filtered matrix, coldata = unchanged, keep = logical,
#               n_before, n_after, filter_summary = character)
# =============================================================================

source_local <- function(f) {
  # source a sibling script regardless of caller's working dir, if available
  here <- tryCatch(dirname(sys.frame(1)$ofile), error = function(e) NA)
  cand <- if (!is.na(here)) file.path(here, f) else f
  if (file.exists(cand)) source(cand)
}

filter_counts <- function(counts, coldata, condition_col = "condition",
                          group = NULL, design = NULL, ...) {
  options(repos = c(CRAN = "https://cloud.r-project.org"))
  if (!requireNamespace("edgeR", quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install("edgeR", update = FALSE, ask = FALSE)
  }

  counts <- as.matrix(counts)
  storage.mode(counts) <- "double"

  if (is.null(group)) {
    if (!condition_col %in% colnames(coldata)) {
      stop("filter_counts: grouping column '", condition_col, "' not in coldata.")
    }
    group <- factor(coldata[[condition_col]])
  }

  n_before <- nrow(counts)

  # filterByExpr keeps genes with a worthwhile count in a minimum number of
  # samples, sized to the smallest group. Use the design if supplied (better for
  # multi-factor models); otherwise use the group factor.
  keep <- if (!is.null(design)) {
    mm <- stats::model.matrix(design, data = coldata)
    edgeR::filterByExpr(counts, design = mm)
  } else {
    edgeR::filterByExpr(counts, group = group)
  }

  counts_f <- counts[keep, , drop = FALSE]
  n_after <- nrow(counts_f)

  summary_txt <- sprintf(
    "Shared filterByExpr pre-filter: %d -> %d genes kept (%d removed); applied identically before all engines.",
    n_before, n_after, n_before - n_after)
  message(summary_txt)

  list(
    counts = counts_f,
    coldata = coldata,
    keep = keep,
    n_before = n_before,
    n_after = n_after,
    filter_summary = summary_txt
  )
}

Companion files

Type	Path	Bytes
Markdown	references/comparison-and-caveats.md	5,386
Markdown	references/method-selection-guide.md	4,721
Markdown	references/troubleshooting.md	6,138
R	scripts/concordance.R	6,092
R	scripts/export_results.R	5,548
R	scripts/filter_counts.R	2,590
R	scripts/inspect_and_recommend.R	9,315
R	scripts/load_data.R	6,980
R	scripts/qc_plots.R	5,752
R	scripts/run_deseq2.R	4,938
R	scripts/run_edger.R	5,397
R	scripts/run_limma_voom.R	5,988
R	scripts/selftest.R	16,427
Markdown	SKILL.md	10,939
JSON	skill.meta.json	3,020