Polygenic Risk Score

Compute PRS from PGS Catalog weights — no GWAS needed.

Overview

Problem. Sum thousands of small-effect variants into one risk score.

Use when: Have genotypes; scoring or stratification
Avoid when: Porting European weights across ancestries

Learning goals

Figures

PRS Catalog Overview
How PRS Is Computed
4-Step PRS Workflow
Multi-Trait Dashboard
Common Pitfalls

Tutorial

When to Use This Skill

For de novo PRS from raw GWAS summary statistics, use the polygenic-risk-score skill (LDpred2-auto) instead.

Installation

install.packages(c("data.table", "ggplot2", "ggprism", "dplyr", "R.utils", "jsonlite", "remotes"))
remotes::install_github("privefl/bigsnpr")
Software Version License Commercial Use Install
bigsnpr ≥1.12 GPL-3 ✅ Permitted remotes::install_github("privefl/bigsnpr")
data.table ≥1.14 MPL-2.0 ✅ Permitted install.packages('data.table')
ggplot2 ≥3.4 MIT ✅ Permitted install.packages('ggplot2')
ggprism ≥1.0.3 GPL (≥3) ✅ Permitted install.packages('ggprism')
dplyr ≥1.1 MIT ✅ Permitted install.packages('dplyr')
jsonlite ≥1.8 MIT ✅ Permitted install.packages('jsonlite')
R.utils ≥2.12 LGPL (≥2.1) ✅ Permitted install.packages('R.utils')

Inputs

Outputs

Per-trait files:

  • prs_scores_<trait>.csv — Individual PRS (z-scores, percentiles, population labels)
  • distribution_<trait>.png/svg — PRS distribution histogram
  • population_<trait>.png/svg — PRS by super-population boxplot

Combined files:

  • combined_prs_scores.csv — All individuals x all traits (wide format) + composite risk
  • prs_correlation_matrix.csv — Trait-trait PRS correlation matrix
  • population_summary.csv — Mean PRS by super-population per trait
  • match_reports.csv — Variant matching summary per trait

Dashboard plots:

  • dashboard_correlation_matrix.png/svg — Heatmap of trait PRS correlations
  • dashboard_composite_risk.png/svg — Composite risk distribution by population
  • dashboard_population_heatmap.png/svg — Mean PRS by trait x super-population

Analysis objects (RDS):

  • prs_analysis.rds — Complete analysis object for downstream use
  • Load with: obj <- readRDS('prs_analysis.rds')
  • Contains: combined_scores, per_trait, cor_matrix, match_reports, snp_weights, trait_info

Clarification Questions

  1. Input Data (ASK THIS FIRST):

    • Do you have specific genotype files (.bed/.bim/.fam) to score?
    • Or use 1000 Genomes Phase 3 example data? (2,490 individuals, 26 populations, 5 super-populations)
  2. Traits to Score:

    • (If using example data) The demo scores 5 cardiometabolic traits (CAD, T2D, LDL, BMI, SBP). Choose analysis mode:
    • a) Full cardiometabolic panel — all 5 traits (recommended)
    • b) Select specific traits from the panel
    • (If using your own data) What traits do you want to score? Use search_pgs_catalog("trait name") to find PGS IDs.
  3. Analysis Options: - a) Standard analysis with dashboard (recommended)

    • b) Individual trait scoring only (no dashboard)

Standard Workflow

🚨 MANDATORY: USE SCRIPTS EXACTLY AS SHOWN - DO NOT WRITE INLINE CODE 🚨

Step 1 - Load reference genotypes and PGS weights:

source("scripts/load_reference_data.R")
ref_data <- load_reference_data()

source("scripts/load_pgs_weights.R")
trait_weights <- load_demo_weights()

DO NOT write custom download or parsing code. Use the scripts.

Step 2 - Score all traits:

source("scripts/score_traits.R")

DO NOT write inline scoring code (big_prodVec, allele matching, etc.). Just source the script.

Step 3 - Generate visualizations:

source("scripts/generate_plots.R")
generate_all_plots(all_results, output_dir = "results")

DO NOT write inline plotting code (ggsave, ggplot, geom_tile, etc.). Just use the script.

The script handles PNG + SVG export with graceful fallback for SVG dependencies.

Step 4 - Export results:

source("scripts/export_results.R")
export_all(all_results, output_dir = "results")

DO NOT write custom export code. Use export_all().

✅ VERIFICATION - You should see:

  • After Step 1: "✓ Reference data loaded successfully" and "✓ PGS Catalog weights loaded: 5/5 traits"
  • After Step 2: "✓ Multi-trait PRS scoring completed successfully! (5 traits, 2490 individuals)"
  • After Step 3: "✓ All plots generated successfully!"
  • After Step 4: "=== Export Complete ==="

❌ IF YOU DON'T SEE THESE: You wrote inline code. Stop and use source().

⚠️ CRITICAL - DO NOT:

  • Write inline scoring codeSTOP: Use source("scripts/score_traits.R")
  • Write inline plotting code (ggsave, ggplot, etc.)STOP: Use generate_all_plots()
  • Write custom export codeSTOP: Use export_all()
  • Try to install svglite → script handles SVG fallback automatically

⚠️ IF SCRIPTS FAIL - Script Failure Hierarchy:

  1. Fix and Retry (90%) — Install missing package, re-run script
  2. Modify Script (5%) — Edit the script file itself, document changes
  3. Use as Reference (4%) — Read script, adapt approach, cite source
  4. Write from Scratch (1%) — Only if genuinely impossible, explain why

NEVER skip directly to writing inline code without trying the script first.

Scoring Custom Traits

To score a single custom trait instead of the demo panel:

source("scripts/load_reference_data.R")
ref_data <- load_reference_data()

source("scripts/load_pgs_weights.R")
# Search for available scores
scores <- search_pgs_catalog("your trait name")
# Download specific score
trait_weights <- list()
tw <- download_pgs_weights("PGS_ID_HERE")
trait_weights[["TRAIT"]] <- list(
    weights = tw$weights, pgs_id = tw$pgs_id, score_meta = tw$score_meta,
    trait_name = "Your Trait", short_name = "TRAIT"
)

# Then continue with Steps 2-4 as above
source("scripts/score_traits.R")

Common Issues

Error Cause Solution
"bigsnpr not found" Missing core dependency remotes::install_github("privefl/bigsnpr")
Download timeout Large scoring file or slow connection Set options(timeout = 900) before running Step 1
Low match rate (<50%) Genome build mismatch Ensure PGS weights and genotypes use same build (GRCh37 for 1000G)
PGS ID not found Wrong or deprecated PGS ID Use search_pgs_catalog("trait") to find valid IDs
SVG export error Missing optional dependency generate_all_plots() handles fallback automatically. DO NOT install svglite manually.
"catalog_data not found" Wrong script for this skill Use score_traits.R (not pgs_catalog_scoring.R from the LDpred2 skill)
Memory error during scoring Very large scoring file Normal for genome-wide scores. Ensure ≥8GB RAM available.

Suggested Next Steps

After completing multi-trait PRS:

  1. Downstream analysis — Load prs_analysis.rds for custom analyses
  2. Additional traits — Add more PGS scores to expand the risk panel
  3. De novo PRS — Use polygenic-risk-score skill for traits without PGS Catalog scores
  4. GWAS interpretation — Pair with functional annotation skills

Related Skills

References

  1. Lambert SA, et al. (2021). The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics, 53(4), 420-425.
  2. 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74.
  3. Privé F, et al. (2022). Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. AJHG, 109(1), 12-23.
  4. Khera AV, et al. (2018). Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics, 50(9), 1219-1224.
  5. PGS Catalog: https://www.pgscatalog.org/

Code preview

scripts/export_results.R

###############################################################################
# export_results.R — Export multi-trait PRS results and RDS for downstream use
#
# Function:
#   export_all(all_results, output_dir)
###############################################################################

library(data.table)
library(dplyr)

# --- Helper: population summary -----------------------------------------------

.make_population_summary <- function(combined_scores) {
    if (!"super_population" %in% names(combined_scores)) return(NULL)

    prs_cols <- grep("^prs_", names(combined_scores), value = TRUE)
    trait_labels <- sub("^prs_", "", prs_cols)

    rows <- list()
    for (i in seq_along(prs_cols)) {
        s <- combined_scores %>%
            filter(!is.na(super_population)) %>%
            group_by(super_population) %>%
            summarise(
                n = n(),
                mean_z = round(mean(.data[[prs_cols[i]]], na.rm = TRUE), 4),
                sd_z = round(sd(.data[[prs_cols[i]]], na.rm = TRUE), 4),
                min_z = round(min(.data[[prs_cols[i]]], na.rm = TRUE), 4),
                max_z = round(max(.data[[prs_cols[i]]], na.rm = TRUE), 4),
                .groups = "drop"
            ) %>%
            mutate(trait = trait_labels[i])
        rows[[i]] <- s
    }

    # Add composite risk summary
    comp_s <- combined_scores %>%
        filter(!is.na(super_population)) %>%
        group_by(super_population) %>%
        summarise(
            n = n(),
            mean_z = round(mean(composite_risk, na.rm = TRUE), 4),
            sd_z = round(sd(composite_risk, na.rm = TRUE), 4),
            min_z = round(min(composite_risk, na.rm = TRUE), 4),
            max_z = round(max(composite_risk, na.rm = TRUE), 4),
            .groups = "drop"
        ) %>%
        mutate(trait = "COMPOSITE")
    rows[[length(rows) + 1]] <- comp_s

    do.call(rbind, rows)
}

# --- Main export function -----------------------------------------------------

#' Export all multi-trait PRS results
#' @param all_results list from score_traits.R
#' @param output_dir Directory to save files (default: "results")
export_all <- function(all_results, output_dir = "results") {
    cat("\n=== Exporting Multi-Trait PRS Results ===\n\n")

    dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

    # 1. Combined scores (wide format: one row per individual, one column per trait)
    cat("[1/7] Combined PRS scores (wide format)\n")
    fwrite(all_results$combined_scores, file.path(output_dir, "combined_prs_scores.csv"))
    cat("  Saved:", file.path(output_dir, "combined_prs_scores.csv"), "\n")
    cat("  (", nrow(all_results$combined_scores), " individuals x ",
        all_results$n_traits, " traits)\n\n", sep = "")

    # 2. Per-trait scores (individual CSVs)
    cat("[2/7] Per-trait PRS scores\n")
    for (trait_name in all_results$trait_names) {
        fname <- paste0("prs_scores_", tolower(trait_name), ".csv")
        fwrite(all_results$per_trait[[trait_name]], file.path(output_dir, fname))
        cat("  Saved:", file.path(output_dir, fname), "\n")
    }
    cat("\n")

    # 3. Correlation matrix

scripts/generate_plots.R

###############################################################################
# generate_plots.R — Multi-trait PRS visualizations using ggprism
#
# Per-trait plots:
#   plot_trait_distribution()      — Histogram for one trait
#   plot_trait_population()        — Boxplot by super-population for one trait
#
# Dashboard plots:
#   plot_correlation_heatmap()     — Trait-trait PRS correlation matrix
#   plot_composite_risk()          — Composite risk distribution by population
#   plot_population_heatmap()      — Mean PRS by trait x super-population
#
# Orchestrator:
#   generate_all_plots(all_results, output_dir)
###############################################################################

library(ggplot2)
library(ggprism)
library(dplyr)

# --- SVG support (optional, graceful fallback) --------------------------------

.has_svglite <- requireNamespace("svglite", quietly = TRUE)
if (.has_svglite) library(svglite)

# --- Save helper (PNG + SVG always) -------------------------------------------

.save_plot <- function(plot, base_path, width = 8, height = 6, dpi = 300) {
    png_path <- sub("\\.(svg|png)$", ".png", base_path)
    ggsave(png_path, plot = plot, width = width, height = height, dpi = dpi, device = "png")
    cat("   Saved:", png_path, "\n")

    svg_path <- sub("\\.(svg|png)$", ".svg", base_path)
    tryCatch({
        ggsave(svg_path, plot = plot, width = width, height = height, device = "svg")
        cat("   Saved:", svg_path, "\n")
    }, error = function(e) {
        tryCatch({
            svg(svg_path, width = width, height = height)
            print(plot)
            dev.off()
            cat("   Saved:", svg_path, "\n")
        }, error = function(e2) {
            cat("   (SVG export failed - PNG available)\n")
        })
    })
}

# Super-population color palette (consistent across all plots)
.pop_colors <- c(
    "AFR" = "#E74C3C", "AMR" = "#F39C12", "EAS" = "#2ECC71",
    "EUR" = "#3498DB", "SAS" = "#9B59B6"
)

# --- Per-trait: PRS Distribution ----------------------------------------------

plot_trait_distribution <- function(scores, trait_name, output_dir = "results") {
    p <- ggplot(scores, aes(x = prs_zscore)) +
        geom_histogram(aes(y = after_stat(density)),
                       bins = 30, fill = "#4A90D9", color = "white", alpha = 0.8) +
        geom_density(color = "#2C3E50", linewidth = 0.8) +
        geom_vline(xintercept = 0, linetype = "dashed", color = "#E67E22", linewidth = 0.8) +
        labs(
            title = paste(trait_name, "- PRS Distribution"),
            subtitle = paste(nrow(scores), "individuals from 1000 Genomes Phase 3"),
            x = "PRS (Z-score)",
            y = "Density"
        ) +
        theme_prism(base_size = 12) +
        theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
              plot.subtitle = element_text(hjust = 0.5, size = 10, color = "gray40"))

    fname <- paste0("distribution_", gsub(" ", "_", tolower(trait_name)), ".png")
    .save_plot(p, file.path(output_dir, fname), width = 8, height = 6)
    return(invisible(p))
}

# --- Per-trait: Population Comparison -----------------------------------------

plot_trait_population <- function(scores, trait_name, output_dir = "results") {

scripts/load_pgs_weights.R

###############################################################################
# load_pgs_weights.R — Search PGS Catalog and download scoring weights
#
# Functions:
#   search_pgs_catalog(trait)      — Search for available PGS scores by trait
#   download_pgs_weights(pgs_id)   — Download a single harmonized scoring file
#   get_demo_traits()              — Get cardiometabolic demo trait configuration
#   load_demo_weights(data_dir)    — Download all 5 cardiometabolic demo weights
###############################################################################

library(data.table)

if (!requireNamespace("jsonlite", quietly = TRUE)) {
    install.packages("jsonlite")
}

`%||%` <- function(a, b) if (!is.null(a)) a else b

# --- Search PGS Catalog ------------------------------------------------------

#' Search PGS Catalog for available scores by trait name
#' @param trait Character string (e.g. "coronary artery disease", "BMI", "LDL cholesterol")
#' @return data.frame of matching scores with id, name, trait, variants, method, publication
search_pgs_catalog <- function(trait) {
    cat("Searching PGS Catalog for '", trait, "'...\n", sep = "")

    trait_url <- paste0("https://www.pgscatalog.org/rest/trait/search?term=",
                        utils::URLencode(trait))
    trait_resp <- jsonlite::fromJSON(trait_url, flatten = TRUE)

    if (length(trait_resp$results) == 0) {
        cat("  No traits found matching '", trait, "'\n", sep = "")
        return(data.frame())
    }

    all_pgs_ids <- unique(unlist(trait_resp$results$associated_pgs_ids))
    if (length(all_pgs_ids) == 0) {
        cat("  Traits found but no associated PGS scores\n")
        return(data.frame())
    }

    cat("  Found", length(all_pgs_ids), "scores across",
        nrow(trait_resp$results), "matching traits\n\n")

    # Fetch details for top scores (limit to first 20 to avoid rate limiting)
    fetch_ids <- head(all_pgs_ids, 20)
    scores <- do.call(rbind, lapply(fetch_ids, function(pgs_id) {
        score_url <- paste0("https://www.pgscatalog.org/rest/score/", pgs_id)
        tryCatch({
            s <- jsonlite::fromJSON(score_url, flatten = TRUE)
            data.frame(
                pgs_id = s$id,
                name = s$name %||% NA_character_,
                trait = s$trait_reported %||% NA_character_,
                variants = s$variants_number %||% NA_integer_,
                method = s$method_name %||% NA_character_,
                genome_build = s$variants_genomebuild %||% NA_character_,
                publication = paste0(s$publication$firstauthor %||% "Unknown",
                                     " (", substr(s$publication$date_publication %||% "", 1, 4), ")"),
                stringsAsFactors = FALSE
            )
        }, error = function(e) NULL)
    }))

    if (is.null(scores) || nrow(scores) == 0) {
        cat("  Could not fetch score details\n")
        return(data.frame())
    }

    cat("  Available PGS scores:\n")
    cat("  ", paste(rep("-", 80), collapse = ""), "\n")
    for (i in seq_len(min(nrow(scores), 15))) {
        cat(sprintf("  %s | %s variants | %s | %s\n",
                    scores$pgs_id[i],
                    format(scores$variants[i], big.mark = ","),
                    scores$method[i] %||% "N/A",
                    scores$publication[i]))
    }
    if (nrow(scores) > 15) cat("  ... and", nrow(scores) - 15, "more\n")
    cat("  ", paste(rep("-", 80), collapse = ""), "\n\n")

Companion files

TypePathBytes
Markdownreferences/interpretation-guide.md2,911
Markdownreferences/pgs-catalog-guide.md3,481
Rscripts/export_results.R8,008
Rscripts/generate_plots.R12,115
Rscripts/load_pgs_weights.R9,155
Rscripts/load_reference_data.R7,122
Rscripts/score_traits.R7,703
MarkdownSKILL.md8,809
JSONskill.meta.json1,892