
gwas-database

Skill from ovachiever/droid-tings

What it does

Queries the NHGRI-EBI GWAS Catalog to retrieve genetic variant associations, trait-specific SNPs, and summary statistics for genomic research and risk prediction.

Part of ovachiever/droid-tings (370 items)


Installation

Clone repository:

git clone https://github.com/ovachiever/droid-tings.git
Added Feb 4, 2026

Skill Details

SKILL.md

"Query the NHGRI-EBI GWAS Catalog for SNP-trait associations. Search variants by rs ID, disease/trait, or gene, and retrieve p-values and summary statistics for genetic epidemiology and polygenic risk scores."


# GWAS Catalog Database

Overview

The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.

When to Use This Skill

This skill should be used when queries involve:

  • Genetic variant associations: Finding SNPs associated with diseases or traits
  • SNP lookups: Retrieving information about specific genetic variants (rs IDs)
  • Trait/disease searches: Discovering genetic associations for phenotypes
  • Gene associations: Finding variants in or near specific genes
  • GWAS summary statistics: Accessing complete genome-wide association data
  • Study metadata: Retrieving publication and cohort information
  • Population genetics: Exploring ancestry-specific associations
  • Polygenic risk scores: Identifying variants for risk prediction models
  • Functional genomics: Understanding variant effects and genomic context
  • Systematic reviews: Comprehensive literature synthesis of genetic associations

Core Capabilities

1. Understanding GWAS Catalog Data Structure

The GWAS Catalog is organized around four core entities:

  • Studies: GWAS publications with metadata (PMID, author, cohort details)
  • Associations: SNP-trait associations with statistical evidence. The Catalog includes associations with p < 1×10⁻⁵; genome-wide significance is conventionally p < 5×10⁻⁸.
  • Variants: Genetic markers (SNPs) with genomic coordinates and alleles
  • Traits: Phenotypes and diseases (mapped to EFO ontology terms)

Key Identifiers:

  • Study accessions: GCST IDs (e.g., GCST001234)
  • Variant IDs: rs numbers (e.g., rs7903146) or variant_id format
  • Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
  • Gene symbols: HGNC approved names (e.g., TCF7L2)
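Before sending identifiers to the API, it can help to check which kind of identifier you have. The sketch below does that. The regular expressions are assumptions inferred from the example formats above, not an official specification. The Catalog also maps traits to other ontologies (for example MONDO or HP), so a trait ID that fails the EFO pattern is not necessarily invalid. Gene symbols are free text and are deliberately left unclassified.

```python
import re
from typing import Optional

# Identifier patterns inferred from the examples above (an assumption, not an official specification)
ID_PATTERNS = {
    "study": re.compile(r"GCST\d+"),
    "variant": re.compile(r"rs\d+"),
    "trait": re.compile(r"EFO_\d+"),
}


def classify_identifier(identifier: str) -> Optional[str]:
    """Return 'study', 'variant', or 'trait' if the identifier matches a pattern, else None."""
    cleaned = identifier.strip()
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(cleaned):
            return kind
    return None


for example in ["GCST001234", "rs7903146", "EFO_0001360", "TCF7L2"]:
    print(example, "->", classify_identifier(example))
```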

2. Web Interface Searches

The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:

By Variant (rs ID):

```
rs7903146
```

Returns all trait associations for this SNP.

By Disease/Trait:

```
type 2 diabetes
Parkinson disease
body mass index
```

Returns all associated genetic variants.

By Gene:

```
APOE
TCF7L2
```

Returns variants in or near the gene region.

By Chromosomal Region:

```
10:114000000-115000000
```

Returns variants in the specified genomic interval.
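Region strings like the one above can be validated client-side before you build a query. The parsed values map onto the chrom/bpStart/bpEnd parameters used in the REST region query. This is a sketch: the accepted chromosome names (one- or two-digit numbers, X, Y, MT, with an optional "chr" prefix) and the tolerance for thousands separators are assumptions chosen for convenience.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# "chrom:start-end"; an optional "chr" prefix and commas inside the numbers are tolerated
_REGION_RE = re.compile(r"(?:chr)?([0-9]{1,2}|X|Y|MT):([\d,]+)-([\d,]+)", re.IGNORECASE)


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start-end' string into a Region, checking that start <= end."""
    match = _REGION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {text!r}")
    chrom, start, end = match.groups()
    start_bp, end_bp = int(start.replace(",", "")), int(end.replace(",", ""))
    if start_bp > end_bp:
        raise ValueError(f"start {start_bp} is after end {end_bp}")
    return Region(chrom.upper(), start_bp, end_bp)


print(parse_region("10:114000000-115000000"))
```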

By Publication:

```
PMID:20581827
Author: McCarthy MI
GCST001234
```

Returns study details and all reported associations.

3. REST API Access

The GWAS Catalog provides two REST APIs for programmatic access:

Base URLs:

  • GWAS Catalog API: https://www.ebi.ac.uk/gwas/rest/api
  • Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api

API Documentation:

  • Main API docs: https://www.ebi.ac.uk/gwas/rest/docs/api
  • Summary stats docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/

Core Endpoints:

  1. Studies endpoint - /studies/{accessionID}

```python
import requests

# Get a specific study by its GCST accession
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
study = response.json()
```

  2. Associations endpoint - /associations (per variant: /singleNucleotidePolymorphisms/{rsID}/associations)

```python
import requests

# Find all associations for one variant
variant = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
associations = response.json()
```

  3. Variants endpoint - /singleNucleotidePolymorphisms/{rsID}

```python
import requests

# Get variant details
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
variant_info = response.json()
```

  4. Traits endpoint - /efoTraits/{efoID}

```python
import requests

# Get trait information
url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
trait_info = response.json()
```
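The four endpoints differ only in their path, so a small helper can keep URL construction in one place. This is a sketch: the `endpoint_url` helper and its `kind` names are hypothetical, while the paths are the ones shown above.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"

# Resource paths for the core endpoints described above; the keys are illustrative names
_PATHS = {
    "study": "studies/{id}",
    "variant": "singleNucleotidePolymorphisms/{id}",
    "variant_associations": "singleNucleotidePolymorphisms/{id}/associations",
    "trait": "efoTraits/{id}",
}


def endpoint_url(kind: str, identifier: str) -> str:
    """Build a GWAS Catalog REST URL for one of the core endpoints."""
    if kind not in _PATHS:
        raise KeyError(f"unknown endpoint kind {kind!r}; expected one of {sorted(_PATHS)}")
    # Percent-encode the identifier so stray spaces or slashes cannot alter the path
    safe_id = quote(identifier.strip(), safe="")
    return f"{BASE_URL}/{_PATHS[kind].format(id=safe_id)}"


print(endpoint_url("variant_associations", "rs7903146"))
```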

4. Query Examples and Patterns

Example 1: Find all associations for a disease

```python
import requests

trait = "EFO_0001360"  # type 2 diabetes
base_url = "https://www.ebi.ac.uk/gwas/rest/api"

# Query associations for this trait
url = f"{base_url}/efoTraits/{trait}/associations"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
associations = response.json()

# Process results (first page only; see Pagination below)
for assoc in associations.get("_embedded", {}).get("associations", []):
    variant = assoc.get("rsId")
    pvalue = assoc.get("pvalue")
    risk_allele = assoc.get("strongestAllele")
    print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
```

Example 2: Get variant information and all trait associations

```python
import requests

variant = "rs7903146"
base_url = "https://www.ebi.ac.uk/gwas/rest/api"
headers = {"Accept": "application/json"}

# Get variant details
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
variant_data = response.json()

# Get all associations for this variant
url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
params = {"projection": "associationBySnp"}
response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()
associations = response.json()

# Extract trait names and p-values
for assoc in associations.get("_embedded", {}).get("associations", []):
    trait = assoc.get("efoTrait")
    pvalue = assoc.get("pvalue")
    print(f"Trait: {trait}, p-value: {pvalue}")
```

Example 3: Access summary statistics

```python
import requests

# Query the summary statistics API
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"

# Find associations for a trait below a p-value threshold
trait = "EFO_0001360"  # type 2 diabetes
url = f"{base_url}/traits/{trait}/associations"
params = {
    "p_upper": "0.000000001",  # upper p-value bound (1e-9)
    "size": 100,  # number of results
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
results = response.json()

# Records may arrive as a list or as a dict keyed by index; handle both
hits = results.get("_embedded", {}).get("associations", [])
if isinstance(hits, dict):
    hits = list(hits.values())

# Process hits below the threshold
for hit in hits:
    variant_id = hit.get("variant_id")
    chromosome = hit.get("chromosome")
    position = hit.get("base_pair_location")
    pvalue = hit.get("p_value")
    print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
```
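Response shapes differ between endpoints. A defensive accessor avoids accidentally iterating over dictionary keys instead of records. This is a sketch: it assumes that some summary statistics API responses hold a dict keyed by index strings rather than a list, so check a live response before relying on it. `embedded_items` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def embedded_items(payload: Dict[str, Any], key: str = "associations") -> List[Dict[str, Any]]:
    """Return the records under payload['_embedded'][key] as a list.

    HAL responses normally hold a list here; if a dict keyed by index is found
    instead, its values are returned in their original (JSON) order.
    """
    items = payload.get("_embedded", {}).get(key, [])
    if isinstance(items, dict):
        return list(items.values())
    return list(items)


list_shape = {"_embedded": {"associations": [{"p_value": 1e-9}]}}
dict_shape = {"_embedded": {"associations": {"0": {"p_value": 1e-9}, "1": {"p_value": 2e-10}}}}
print(embedded_items(list_shape), embedded_items(dict_shape))
```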

Example 4: Query by chromosomal region

```python
import requests

# Find variants in a specific genomic region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/rest/api"
url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
params = {
    "chrom": chromosome,
    "bpStart": start_pos,
    "bpEnd": end_pos,
}
response = requests.get(url, params=params, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
variants_in_region = response.json()
```

5. Working with Summary Statistics

The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).

Access Methods:

  1. FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
  2. REST API: Query-based access to summary statistics
  3. Web interface: Browse and download via the website

Summary Statistics API Features:

  • Filter by chromosome, position, p-value
  • Query specific variants across studies
  • Retrieve effect sizes and allele frequencies
  • Access harmonized and standardized data
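For genome-wide work, downloading a summary statistics file and filtering it locally avoids many small API calls. The sketch below assumes a gzipped, tab-separated file with a `p_value` column. Column names differ between raw and harmonised files, so check the header first. `significant_rows` is a hypothetical helper name.

```python
import csv
import gzip
import io
from typing import IO, Dict, Iterator


def significant_rows(handle: IO[bytes], p_threshold: float = 5e-8,
                     p_column: str = "p_value") -> Iterator[Dict[str, str]]:
    """Stream rows of a gzipped TSV whose p-value is below p_threshold."""
    with gzip.open(handle, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text, delimiter="\t"):
            try:
                if float(row.get(p_column, "")) < p_threshold:
                    yield row
            except ValueError:
                continue  # skip missing or non-numeric p-values such as "NA"


# Usage with a tiny in-memory file; for a downloaded file pass open(path, "rb") instead
sample = "variant_id\tp_value\nrs1\t1e-9\nrs2\t0.5\n"
rows = significant_rows(io.BytesIO(gzip.compress(sample.encode("utf-8"))))
print([r["variant_id"] for r in rows])
```

Because rows are streamed, memory use stays flat even for files with millions of variants.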

Example: Download summary statistics for a study

```python
import requests

# Look up summary statistics metadata for a study
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/studies/GCST001234"
response = requests.get(url, timeout=30)
response.raise_for_status()
study_info = response.json()

# Inspect the response (including its _links) for related resources.
# Alternatively, browse the FTP directory listing for the study
# (the exact subdirectory layout may differ from this pattern):
# ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/
```

6. Data Integration and Cross-referencing

The GWAS Catalog provides links to external resources:

Genomic Databases:

  • Ensembl: Gene annotations and variant consequences
  • dbSNP: Variant identifiers and population frequencies
  • gnomAD: Population allele frequencies

Functional Resources:

  • Open Targets: Target-disease associations
  • PGS Catalog: Polygenic risk scores
  • UCSC Genome Browser: Genomic context

Phenotype Resources:

  • EFO (Experimental Factor Ontology): Standardized trait terms
  • OMIM: Disease gene relationships
  • Disease Ontology: Disease hierarchies

Following Links in API Responses:

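```python
import requests

headers = {"Accept": "application/json"}

# API responses include HAL _links pointing to related resources
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234", headers=headers, timeout=30)
response.raise_for_status()
study = response.json()

# Follow the link to this study's associations
associations_url = study["_links"]["associations"]["href"]
associations_response = requests.get(associations_url, headers=headers, timeout=30)
associations_response.raise_for_status()
```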
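Following `next` links extends this pattern to multi-page results. This is a sketch that assumes the HAL convention of `_links.next.href`. The `fetch` callable is injected so the logic can be tested offline. `follow_pages` and its `max_pages` safety cap are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def follow_pages(first_url: str, fetch: Callable[[str], Dict[str, Any]],
                 key: str = "associations", max_pages: int = 1000) -> Iterator[Any]:
    """Yield embedded records, following _links.next.href until it is absent."""
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        payload = fetch(url)
        yield from payload.get("_embedded", {}).get(key, [])
        url = payload.get("_links", {}).get("next", {}).get("href")
        pages += 1  # the cap guards against a link cycle


# Offline demonstration with two fake pages
fake = {
    "p1": {"_embedded": {"associations": [{"id": 1}]}, "_links": {"next": {"href": "p2"}}},
    "p2": {"_embedded": {"associations": [{"id": 2}]}, "_links": {}},
}
print([a["id"] for a in follow_pages("p1", fake.__getitem__)])
```

With real data, pass a function that calls `requests.get(url, headers={"Accept": "application/json"}, timeout=30)`, checks the status, and returns `.json()`.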

Query Workflows

Workflow 1: Exploring Genetic Associations for a Disease

  1. Identify the trait using EFO terms or free text:
     - Search the web interface for the disease name
     - Note the EFO ID (e.g., EFO_0001360 for type 2 diabetes)
  2. Query associations via API:

```python
efo_id = "EFO_0001360"  # type 2 diabetes
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"
```

  3. Filter by significance and population:
     - Check p-values (genome-wide significance: p < 5×10⁻⁸)
     - Review ancestry information in study metadata
     - Filter by sample size or discovery/replication status
  4. Extract variant details:
     - rs IDs for each association
     - Effect alleles and directions
     - Effect sizes (odds ratios, beta coefficients)
     - Population allele frequencies
  5. Cross-reference with other databases:
     - Look up variant consequences in Ensembl
     - Check population frequencies in gnomAD
     - Explore gene function and pathways
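The filtering and extraction steps above can be sketched as a function that keeps significant records and the strongest one per variant. This assumes records shaped like the association fields listed under Response Formats (`rsId`, `pvalue`). Ancestry and sample-size filters need study metadata and are left out. `lead_hits` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List

GENOME_WIDE = 5e-8


def lead_hits(associations: Iterable[Dict[str, Any]],
              p_threshold: float = GENOME_WIDE) -> List[Dict[str, Any]]:
    """Keep records with p < p_threshold; return the strongest per variant, sorted by p."""
    best: Dict[str, Dict[str, Any]] = {}
    for assoc in associations:
        variant, pvalue = assoc.get("rsId"), assoc.get("pvalue")
        if variant is None or pvalue is None or float(pvalue) >= p_threshold:
            continue
        if variant not in best or float(pvalue) < float(best[variant]["pvalue"]):
            best[variant] = assoc
    return sorted(best.values(), key=lambda a: float(a["pvalue"]))
```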

Workflow 2: Investigating a Specific Genetic Variant

  1. Query the variant:

```python
rs_id = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
```

  2. Retrieve all trait associations:

```python
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"
```

  3. Analyze pleiotropy:
     - Identify all traits associated with this variant
     - Review effect directions across traits
     - Look for shared biological pathways
  4. Check genomic context:
     - Determine nearby genes
     - Identify whether the variant lies in a coding or regulatory region
     - Review linkage disequilibrium with other variants
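The pleiotropy review can start from a per-trait summary of effect directions for one variant. This is a sketch that assumes `betaNum` is signed and that `orPerCopyNum` is an odds ratio. Directions are relative to each record's reported risk allele, so compare alleles before concluding that two traits have opposite effects. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def _direction(assoc: Dict[str, Any]) -> str:
    """'+' or '-' relative to the record's reported allele; '?' if unknown or null."""
    beta, odds_ratio = assoc.get("betaNum"), assoc.get("orPerCopyNum")
    if beta is not None:
        value, null = float(beta), 0.0
    elif odds_ratio is not None:
        value, null = float(odds_ratio), 1.0
    else:
        return "?"
    if value > null:
        return "+"
    if value < null:
        return "-"
    return "?"


def pleiotropy_summary(associations: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each trait to the sorted set of effect directions reported for one variant."""
    directions: Dict[str, Set[str]] = defaultdict(set)
    for assoc in associations:
        trait = assoc.get("efoTrait")
        if trait is not None:
            directions[trait].add(_direction(assoc))
    return {trait: sorted(signs) for trait, signs in directions.items()}
```

A trait with both "+" and "-" signals either genuinely discordant reports or records using different reference alleles.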

Workflow 3: Gene-Centric Association Analysis

  1. Search by gene symbol in the web interface, or via the API:

```python
gene_symbol = "TCF7L2"
url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
params = {"geneName": gene_symbol}
```

  2. Retrieve variants in the gene region:
     - Get chromosomal coordinates for the gene
     - Query variants in the region
     - Include promoter and regulatory regions (extend the boundaries)
  3. Analyze association patterns:
     - Identify traits associated with variants in this gene
     - Look for consistent associations across studies
     - Review effect sizes and directions
  4. Functional interpretation:
     - Determine variant consequences (missense, regulatory, etc.)
     - Check expression QTL (eQTL) data
     - Review pathway and network context
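Extending the gene boundaries to capture promoter and nearby regulatory variants amounts to padding the interval before a region query. This is a sketch. The 50 kb default flank is an arbitrary illustrative choice, not a Catalog convention. Gene coordinates must come from an annotation source such as Ensembl on the same genome build as the Catalog. The coordinates in the usage line are hypothetical.

```python
from typing import Dict, Tuple, Union


def extended_gene_window(start: int, end: int, flank_bp: int = 50_000) -> Tuple[int, int]:
    """Pad [start, end] by flank_bp on each side, clamping the start at position 1."""
    if start > end:
        raise ValueError("gene start must not exceed gene end")
    if flank_bp < 0:
        raise ValueError("flank must be non-negative")
    return max(1, start - flank_bp), end + flank_bp


def region_query_params(chrom: str, start: int, end: int,
                        flank_bp: int = 50_000) -> Dict[str, Union[str, int]]:
    """chrom/bpStart/bpEnd parameters for findByChromBpLocationRange over the padded window."""
    lo, hi = extended_gene_window(start, end, flank_bp)
    return {"chrom": chrom, "bpStart": lo, "bpEnd": hi}


# Hypothetical gene coordinates, for illustration only
print(region_query_params("10", 200_000, 250_000, flank_bp=10_000))
```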

Workflow 4: Systematic Review of Genetic Evidence

  1. Define the research question:
     - Specific trait or disease of interest
     - Population considerations
     - Study design requirements
  2. Comprehensive variant extraction:
     - Query all associations for the trait
     - Set a significance threshold
     - Note discovery and replication studies
  3. Quality assessment:
     - Review study sample sizes
     - Check for population diversity
     - Assess heterogeneity across studies
     - Identify potential biases
  4. Data synthesis:
     - Aggregate associations across studies
     - Perform meta-analysis if applicable
     - Create summary tables
     - Generate Manhattan or forest plots
  5. Export and documentation:
     - Download full association data
     - Export summary statistics if needed
     - Document search strategy and date
     - Create reproducible analysis scripts
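When aggregating across studies, a simple first check is how many distinct publications report each variant. This sketch assumes records carry `rsId` and `pubmedId` as listed under Response Formats. Distinct publications are not necessarily independent samples, since cohorts often overlap, so treat the count as an upper bound on replication. `replicated_variants` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def replicated_variants(associations: Iterable[Dict[str, Any]],
                        min_studies: int = 2) -> Dict[str, List[str]]:
    """Return variants reported by at least min_studies distinct PubMed IDs."""
    pubs: Dict[str, Set[str]] = defaultdict(set)
    for assoc in associations:
        variant, pmid = assoc.get("rsId"), assoc.get("pubmedId")
        if variant and pmid:
            pubs[variant].add(str(pmid))
    return {v: sorted(p) for v, p in pubs.items() if len(p) >= min_studies}
```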

Workflow 5: Accessing and Analyzing Summary Statistics

  1. Identify studies with summary statistics:
     - Browse the summary statistics portal
     - Check FTP directory listings
     - Query the API for available studies
  2. Download summary statistics:

```bash
# Via FTP (placeholder accession; directory layout and file names vary by study, so check the listing first)
wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz
```

  3. Query via API for variants in a region:

```python
chrom, start_pos, end_pos = "10", 114000000, 115000000
url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
# bp_lower/bp_upper bound the base-pair window; "start" is a pagination offset in this API
params = {"bp_lower": start_pos, "bp_upper": end_pos}
```

  4. Process and analyze:
     - Filter by p-value thresholds
     - Extract effect sizes and confidence intervals
     - Perform downstream analyses (fine-mapping, colocalization, etc.)
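Summary statistics usually give an effect size and its standard error rather than a confidence interval, so the interval has to be derived. The sketch below computes the standard Wald interval, β ± z·SE, assuming the file provides a beta (the log odds ratio for binary traits) and its standard error. Column names vary by study.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def beta_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float]:
    """Wald confidence interval for a regression coefficient: beta ± z·SE."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return beta - z * se, beta + z * se


def odds_ratio_ci(log_or: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Odds ratio and its Wald CI, computed on the log scale and then exponentiated."""
    lower, upper = beta_ci(log_or, se, z)
    return math.exp(log_or), math.exp(lower), math.exp(upper)
```

Working on the log scale keeps the odds-ratio interval asymmetric around the point estimate, as it should be.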

Response Formats and Data Fields

Key Fields in Association Records:

  • rsId: Variant identifier (rs number)
  • strongestAllele: Risk allele for the association
  • pvalue: Association p-value
  • pvalueText: P-value as text (may include inequality)
  • orPerCopyNum: Odds ratio per copy of the risk allele (binary traits)
  • betaNum: Effect size as a beta coefficient (quantitative traits)
  • betaUnit: Unit of measurement for beta
  • range: Confidence interval
  • efoTrait: Associated trait name
  • mappedLabel: EFO-mapped trait term
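Extremely small p-values can underflow double precision, which matters when ranking hits or drawing Manhattan plots. This sketch assumes the p-value is available as a mantissa and exponent, for example via `pvalueMantissa` and `pvalueExponent`. Those field names are an assumption and do not appear in the list above. `neg_log10_p` is a hypothetical name.

```python
import math


def neg_log10_p(mantissa: float, exponent: int) -> float:
    """Compute -log10(p) for p = mantissa × 10**exponent without forming p itself.

    Forming p directly underflows to 0.0 for values like 2e-400, which would make
    -log10(p) infinite.
    """
    if mantissa <= 0:
        raise ValueError("mantissa must be positive")
    return -(math.log10(mantissa) + exponent)
```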

Study Metadata Fields:

  • accessionId: GCST study identifier
  • pubmedId: PubMed ID
  • author: First author
  • publicationDate: Publication date
  • ancestryInitial: Discovery population ancestry
  • ancestryReplication: Replication population ancestry
  • sampleSize: Total sample size

Pagination:

Results are paginated (default 20 items per page). Navigate using:

  • size parameter: Number of results per page
  • page parameter: Page number (0-indexed)
  • _links in response: URLs for next/previous pages
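The `page` and `size` parameters can be wrapped in a generator so callers never handle page arithmetic. This is a sketch. The `fetch_page` callable is injected; with real data it would call `requests.get(url, params={"page": page, "size": size})`. The `page.totalPages` metadata is an assumption based on the usual Spring Data REST response shape. The loop also stops on an empty page in case that block is absent.

```python
from typing import Any, Callable, Dict, Iterator


def iterate_pages(fetch_page: Callable[[int, int], Dict[str, Any]],
                  key: str = "associations", size: int = 100) -> Iterator[Any]:
    """Yield records page by page using 0-indexed page/size parameters."""
    page = 0
    while True:
        payload = fetch_page(page, size)
        records = payload.get("_embedded", {}).get(key, [])
        if not records:
            return  # empty page: nothing left
        yield from records
        total_pages = payload.get("page", {}).get("totalPages")
        if total_pages is not None and page + 1 >= total_pages:
            return  # last page according to the response metadata
        page += 1
```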

Best Practices

Query Strategy

  • Start with web interface to identify relevant EFO terms and study accessions
  • Use API for bulk data extraction and automated analyses
  • Implement pagination handling for large result sets
  • Cache API responses to minimize redundant requests

Data Interpretation

  • Always check p-value thresholds (genome-wide significance: p < 5×10⁻⁸; the Catalog also includes sub-threshold associations with p < 1×10⁻⁵)
  • Review ancestry information for population applicability
  • Consider sample size when assessing evidence strength
  • Check for replication across independent studies
  • Be aware of the winner's curse: effect sizes from discovery studies tend to be overestimated

Rate Limiting and Ethics

  • Respect API usage guidelines (no excessive requests)
  • Use summary statistics downloads for genome-wide analyses
  • Implement appropriate delays between API calls
  • Cache results locally when performing iterative analyses
  • Cite the GWAS Catalog in publications
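Caching and spacing out requests can be combined in a small wrapper. This is a sketch: the 0.2 s default interval is an arbitrary illustrative value, not a published EBI limit. The cache is in-memory only, so persist it to disk for analyses that span sessions. `PoliteFetcher` is a hypothetical name.

```python
import time
from typing import Any, Callable, Dict


class PoliteFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum gap between real requests."""

    def __init__(self, fetch: Callable[[str], Any], min_interval_s: float = 0.2) -> None:
        self._fetch = fetch
        self._min_interval = min_interval_s
        self._cache: Dict[str, Any] = {}
        self._last_call = float("-inf")

    def get(self, url: str) -> Any:
        if url in self._cache:
            return self._cache[url]  # cache hit: no request and no delay
        wait = self._min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()
        result = self._fetch(url)
        self._cache[url] = result
        return result
```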

Data Quality Considerations

  • GWAS Catalog curates published associations (may contain inconsistencies)
  • Effect sizes reported as published (may need harmonization)
  • Some studies report conditional or joint associations
  • Check for study overlap when combining results
  • Be aware of ascertainment and selection biases
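Harmonizing effect sizes often starts with aligning every record to the same effect allele. This is a minimal sketch for biallelic single-nucleotide variants. It assumes you know both alleles for each record and for the reference. It flips the sign of beta when the alleles are swapped (for odds ratios, apply it to the log odds ratio). It declines palindromic A/T and C/G SNPs, which need allele frequencies to resolve strand. `align_beta` is a hypothetical name.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_PALINDROMIC = ({"A", "T"}, {"C", "G"})


def align_beta(beta: float, effect: str, other: str,
               ref_effect: str, ref_other: str) -> Optional[float]:
    """Express beta relative to ref_effect for a biallelic single-nucleotide variant.

    Returns beta if the alleles match (directly or on the opposite strand), -beta if
    effect and other are swapped, and None if they cannot be reconciled or the SNP is
    palindromic.
    """
    effect, other = effect.upper(), other.upper()
    ref = (ref_effect.upper(), ref_other.upper())
    if {effect, other} in _PALINDROMIC:
        return None  # strand cannot be inferred from the alleles alone
    strand_flipped = (effect.translate(_COMPLEMENT), other.translate(_COMPLEMENT))
    for alleles in ((effect, other), strand_flipped):
        if alleles == ref:
            return beta
        if alleles == ref[::-1]:
            return -beta
    return None
```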

Python Integration Example

Complete workflow for querying and analyzing GWAS data:

```python
import requests
import pandas as pd
from time import sleep


def query_gwas_catalog(trait_id, p_threshold=5e-8):
    """
    Query the GWAS Catalog for trait associations.

    Args:
        trait_id: EFO trait identifier (e.g., 'EFO_0001360')
        p_threshold: P-value threshold for filtering

    Returns:
        pandas DataFrame with association results
    """
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    headers = {"Accept": "application/json"}
    columns = ["variant", "pvalue", "risk_allele", "or_beta", "trait", "pubmed_id"]
    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers=headers, timeout=30)
        response.raise_for_status()  # fail loudly instead of silently returning partial results
        data = response.json()

        associations = data.get("_embedded", {}).get("associations", [])
        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get("pvalue")
            # Compare against None so that a p-value of 0 is not skipped
            if pvalue is not None and float(pvalue) <= p_threshold:
                or_beta = assoc.get("orPerCopyNum")
                if or_beta is None:
                    or_beta = assoc.get("betaNum")
                results.append({
                    "variant": assoc.get("rsId"),
                    "pvalue": pvalue,
                    "risk_allele": assoc.get("strongestAllele"),
                    "or_beta": or_beta,
                    "trait": assoc.get("efoTrait"),
                    "pubmed_id": assoc.get("pubmedId"),
                })

        page += 1
        sleep(0.1)  # brief pause between requests

    # Passing columns keeps the DataFrame usable even when nothing matched
    return pd.DataFrame(results, columns=columns)


# Example usage
df = query_gwas_catalog("EFO_0001360")  # type 2 diabetes
print(df.head())
print(f"\nTotal associations: {len(df)}")
print(f"Unique variants: {df['variant'].nunique()}")
```

Resources

references/api_reference.md

Comprehensive API documentation including:

  • Detailed endpoint specifications for both APIs
  • Complete list of query parameters and filters
  • Response format specifications and field descriptions
  • Advanced query examples and patterns
  • Error handling and troubleshooting
  • Integration with external databases

Consult this reference when:

  • Constructing complex API queries
  • Understanding response structures
  • Implementing pagination or batch operations
  • Troubleshooting API errors
  • Exploring advanced filtering options

Training Materials

The GWAS Catalog team provides workshop materials:

  • GitHub repository: https://github.com/EBISPOT/GWAS_Catalog-workshop
  • Jupyter notebooks with example queries
  • Google Colab integration for cloud execution

Important Notes

Data Updates

  • The GWAS Catalog is updated regularly with new publications
  • Re-run queries periodically for comprehensive coverage
  • Summary statistics are added as studies release data
  • EFO mappings may be updated over time

Citation Requirements

When using GWAS Catalog data, cite:

  • Sollis E, et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research. PMID: 37953337
  • Include access date and version when available
  • Cite original studies when discussing specific findings

Limitations

  • Not all GWAS publications are included (curation criteria apply)
  • Full summary statistics available for subset of studies
  • Effect sizes may require harmonization across studies
  • Population diversity is growing but historically limited
  • Some associations represent conditional or joint effects

Data Access

  • Web interface: Free, no registration required
  • REST APIs: Free, no API key needed
  • FTP downloads: Open access
  • Rate limiting applies to API (be respectful)

Additional Resources

  • GWAS Catalog website: https://www.ebi.ac.uk/gwas/
  • Documentation: https://www.ebi.ac.uk/gwas/docs
  • API documentation: https://www.ebi.ac.uk/gwas/rest/docs/api
  • Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
  • FTP site: http://ftp.ebi.ac.uk/pub/databases/gwas/
  • Training materials: https://github.com/EBISPOT/GWAS_Catalog-workshop
  • PGS Catalog (polygenic scores): https://www.pgscatalog.org/
  • Help and support: gwas-info@ebi.ac.uk