# Polygenic Risk Score (PGS Catalog) ## When to Use This Skill - **You have a trait of interest** and want to calculate PRS using published, peer-reviewed weights - **Multi-trait risk profiling** (e.g., cardiometabolic panel: CAD, T2D, LDL, BMI, blood pressure) - **Population comparisons** of genetic risk across ancestry groups - **No GWAS summary statistics needed** — uses pre-computed weights from PGS Catalog - **Quick PRS** — minutes per trait (download + score), no LD computation required **For de novo PRS from raw GWAS summary statistics**, use the `polygenic-risk-score` skill (LDpred2-auto) instead. ## Installation ``` install.packages(c("data.table", "ggplot2", "ggprism", "dplyr", "R.utils", "jsonlite", "remotes")) remotes::install_github("privefl/bigsnpr") ``` | Software | Version | License | Commercial Use | Install | | --- | --- | --- | --- | --- | | bigsnpr | ≥1.12 | GPL-3 | ✅ Permitted | `remotes::install_github("privefl/bigsnpr")` | | data.table | ≥1.14 | MPL-2.0 | ✅ Permitted | `install.packages('data.table')` | | ggplot2 | ≥3.4 | MIT | ✅ Permitted | `install.packages('ggplot2')` | | ggprism | ≥1.0.3 | GPL (≥3) | ✅ Permitted | `install.packages('ggprism')` | | dplyr | ≥1.1 | MIT | ✅ Permitted | `install.packages('dplyr')` | | jsonlite | ≥1.8 | MIT | ✅ Permitted | `install.packages('jsonlite')` | | R.utils | ≥2.12 | LGPL (≥2.1) | ✅ Permitted | `install.packages('R.utils')` | ## Inputs - **Target genotypes:** PLINK binary format (.bed/.bim/.fam) — or use 1000 Genomes Phase 3 example data (2,490 individuals, 5 super-populations) - **PGS Catalog score IDs:** One or more PGS IDs (e.g., `PGS000018` for CAD) — use `search_pgs_catalog()` to discover available scores - **Genome build:** GRCh37 (default, matches 1000 Genomes) or GRCh38 ## Outputs **Per-trait files:** - `prs_scores_.csv` — Individual PRS (z-scores, percentiles, population labels) - `distribution_.png/svg` — PRS distribution histogram - `population_.png/svg` — PRS by super-population boxplot **Combined files:** - `combined_prs_scores.csv` — All individuals x all traits (wide format) + composite risk - `prs_correlation_matrix.csv` — Trait-trait PRS correlation matrix - `population_summary.csv` — Mean PRS by super-population per trait - `match_reports.csv` — Variant matching summary per trait **Dashboard plots:** - `dashboard_correlation_matrix.png/svg` — Heatmap of trait PRS correlations - `dashboard_composite_risk.png/svg` — Composite risk distribution by population - `dashboard_population_heatmap.png/svg` — Mean PRS by trait x super-population **Analysis objects (RDS):** - `prs_analysis.rds` — Complete analysis object for downstream use - Load with: `obj <- readRDS('prs_analysis.rds')` - Contains: combined\_scores, per\_trait, cor\_matrix, match\_reports, snp\_weights, trait\_info ## Clarification Questions 1. **Input Data** (ASK THIS FIRST): - Do you have specific genotype files (.bed/.bim/.fam) to score? - **Or use 1000 Genomes Phase 3 example data?** (2,490 individuals, 26 populations, 5 super-populations) 2. **Traits to Score:** - *(If using example data)* The demo scores 5 cardiometabolic traits (CAD, T2D, LDL, BMI, SBP). Choose analysis mode: - a) Full cardiometabolic panel — all 5 traits (recommended) - b) Select specific traits from the panel - *(If using your own data)* What traits do you want to score? Use `search_pgs_catalog("trait name")` to find PGS IDs. 3. **Analysis Options:** - a) Standard analysis with dashboard (recommended) - b) Individual trait scoring only (no dashboard) ## Standard Workflow 🚨 **MANDATORY: USE SCRIPTS EXACTLY AS SHOWN - DO NOT WRITE INLINE CODE** 🚨 **Step 1 - Load reference genotypes and PGS weights:** ``` source("scripts/load_reference_data.R") ref_data <- load_reference_data() source("scripts/load_pgs_weights.R") trait_weights <- load_demo_weights() ``` **DO NOT write custom download or parsing code. Use the scripts.** **Step 2 - Score all traits:** ``` source("scripts/score_traits.R") ``` **DO NOT write inline scoring code (big\_prodVec, allele matching, etc.). Just source the script.** **Step 3 - Generate visualizations:** ``` source("scripts/generate_plots.R") generate_all_plots(all_results, output_dir = "results") ``` 🚨 **DO NOT write inline plotting code (ggsave, ggplot, geom\_tile, etc.). Just use the script.** 🚨 **The script handles PNG + SVG export with graceful fallback for SVG dependencies.** **Step 4 - Export results:** ``` source("scripts/export_results.R") export_all(all_results, output_dir = "results") ``` **DO NOT write custom export code. Use export\_all().** **✅ VERIFICATION - You should see:** - After Step 1: `"✓ Reference data loaded successfully"` and `"✓ PGS Catalog weights loaded: 5/5 traits"` - After Step 2: `"✓ Multi-trait PRS scoring completed successfully! (5 traits, 2490 individuals)"` - After Step 3: `"✓ All plots generated successfully!"` - After Step 4: `"=== Export Complete ==="` **❌ IF YOU DON'T SEE THESE:** You wrote inline code. Stop and use source(). ⚠️ **CRITICAL - DO NOT:** - ❌ **Write inline scoring code** → **STOP: Use `source("scripts/score_traits.R")`** - ❌ **Write inline plotting code (ggsave, ggplot, etc.)** → **STOP: Use `generate_all_plots()`** - ❌ **Write custom export code** → **STOP: Use `export_all()`** - ❌ **Try to install svglite** → script handles SVG fallback automatically **⚠️ IF SCRIPTS FAIL - Script Failure Hierarchy:** 1. **Fix and Retry (90%)** — Install missing package, re-run script 2. **Modify Script (5%)** — Edit the script file itself, document changes 3. **Use as Reference (4%)** — Read script, adapt approach, cite source 4. **Write from Scratch (1%)** — Only if genuinely impossible, explain why **NEVER skip directly to writing inline code without trying the script first.** ## Scoring Custom Traits To score a single custom trait instead of the demo panel: ``` source("scripts/load_reference_data.R") ref_data <- load_reference_data() source("scripts/load_pgs_weights.R") # Search for available scores scores <- search_pgs_catalog("your trait name") # Download specific score trait_weights <- list() tw <- download_pgs_weights("PGS_ID_HERE") trait_weights[["TRAIT"]] <- list( weights = tw$weights, pgs_id = tw$pgs_id, score_meta = tw$score_meta, trait_name = "Your Trait", short_name = "TRAIT" ) # Then continue with Steps 2-4 as above source("scripts/score_traits.R") ``` ## Common Issues | Error | Cause | Solution | | --- | --- | --- | | **"bigsnpr not found"** | **Missing core dependency** | **`remotes::install_github("privefl/bigsnpr")`** | | **Download timeout** | **Large scoring file or slow connection** | **Set `options(timeout = 900)` before running Step 1** | | **Low match rate (<50%)** | **Genome build mismatch** | **Ensure PGS weights and genotypes use same build (GRCh37 for 1000G)** | | **PGS ID not found** | **Wrong or deprecated PGS ID** | **Use `search_pgs_catalog("trait")` to find valid IDs** | | **SVG export error** | **Missing optional dependency** | **`generate_all_plots()` handles fallback automatically. DO NOT install svglite manually.** | | **"catalog\_data not found"** | **Wrong script for this skill** | **Use `score_traits.R` (not `pgs_catalog_scoring.R` from the LDpred2 skill)** | | **Memory error during scoring** | **Very large scoring file** | **Normal for genome-wide scores. Ensure ≥8GB RAM available.** | ## Suggested Next Steps After completing multi-trait PRS: 1. **Downstream analysis** — Load `prs_analysis.rds` for custom analyses 2. **Additional traits** — Add more PGS scores to expand the risk panel 3. **De novo PRS** — Use `polygenic-risk-score` skill for traits without PGS Catalog scores 4. **GWAS interpretation** — Pair with functional annotation skills ## Related Skills - `polygenic-risk-score` — De novo PRS using LDpred2-auto (requires GWAS summary statistics) - `eqtl-colocalization-coloc` — Colocalization of GWAS signals with eQTLs ## References 1. Lambert SA, et al. (2021). The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. *Nature Genetics*, 53(4), 420-425. 2. 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. *Nature*, 526(7571), 68-74. 3. Privé F, et al. (2022). Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. *AJHG*, 109(1), 12-23. 4. Khera AV, et al. (2018). Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. *Nature Genetics*, 50(9), 1219-1224. 5. PGS Catalog: https://www.pgscatalog.org/