tooluniverse-expression-data-retrieval

from mims-harvard/tooluniverse

What it does

Retrieves gene expression and multi-omics datasets from ArrayExpress and BioStudies, resolving ambiguous gene names and rating the quality of each experiment.

Installation

Quick install (npx):
npx skills add mims-harvard/ToolUniverse
Python package (pip):
pip install tooluniverse
MCP server via Claude CLI:
claude mcp add --transport stdio tooluniverse -- tooluniverse-smcp-stdio --compact-mode
Python package, minimal installation (pip):
pip install tooluniverse[client] # Minimal installation
MCP server configuration block:
{ "mcpServers": { "tooluniverse": { "command": "uvx", "args": ...
Added: Feb 4, 2026

Skill Details

SKILL.md

Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.

Overview

# Gene Expression & Omics Data Retrieval

Retrieve gene expression experiments and multi-omics datasets with proper disambiguation and quality assessment.

Workflow Overview

```
Phase 0: Clarify Query (if ambiguous)
↓
Phase 1: Disambiguate Gene/Condition
↓
Phase 2: Search & Retrieve (Internal)
↓
Phase 3: Report Dataset Profile
```

---

Phase 0: Clarification (When Needed)

Ask the user ONLY if:

  • Gene name is ambiguous (e.g., "p53" → TP53 or MDM2 studies?)
  • Tissue/condition unclear for comparative studies
  • Organism not specified for non-human research

Skip clarification for:

  • Specific accession numbers (E-MTAB-*, E-GEOD-*, S-BSST*)
  • Clear disease/tissue + organism combinations
  • Explicit platform requests (RNA-seq, microarray)
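
The first skip rule can be checked mechanically. The sketch below is a minimal illustration, assuming accessions follow the prefixes named above with a numeric suffix; the regular expression and both helper names are hypothetical and not part of ToolUniverse. It covers only the accession rule, not the disease/tissue or platform rules.

```python
import re
from typing import List

# Assumed accession shapes: E-MTAB-<digits>, E-GEOD-<digits>, S-BSST<digits>.
ACCESSION_RE = re.compile(r"\b(E-MTAB-\d+|E-GEOD-\d+|S-BSST\d+)\b", re.IGNORECASE)


def find_accessions(query: str) -> List[str]:
    """Return every accession-like token in the query, upper-cased."""
    return [match.upper() for match in ACCESSION_RE.findall(query)]


def can_skip_clarification(query: str) -> bool:
    """Hypothetical helper: a query naming an accession needs no follow-up question."""
    return bool(find_accessions(query))
```

For example, `find_accessions("Get details for E-MTAB-5214")` yields a one-element list containing that accession, so the lookup can proceed directly.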

---

Phase 1: Query Disambiguation

1.1 Gene Name Resolution

If searching by gene, first resolve official identifiers:

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# For gene-focused searches, resolve the official symbol first.
# This helps construct better search queries.
# Example: "p53" → "TP53" (official HGNC symbol)
```

Gene Disambiguation Checklist:

  • [ ] Official gene symbol identified (HGNC for human, MGI for mouse)
  • [ ] Common aliases noted for search expansion
  • [ ] Species confirmed
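
As a rough illustration of the checklist, the sketch below resolves an alias against a tiny in-memory table. The table, `resolve_symbol`, and `search_terms` are hypothetical stand-ins; a real workflow would look symbols up in HGNC or MGI rather than hard-code them.

```python
from typing import List, Optional

# Illustrative alias table only. Keys are (lower-cased alias, species).
ALIASES = {
    ("p53", "Homo sapiens"): "TP53",
    ("tp53", "Homo sapiens"): "TP53",
    ("p53", "Mus musculus"): "Trp53",
    ("trp53", "Mus musculus"): "Trp53",
}


def resolve_symbol(name: str, species: str) -> Optional[str]:
    """Return the official symbol for an alias, or None if it is not in the table."""
    return ALIASES.get((name.strip().lower(), species))


def search_terms(name: str, species: str) -> List[str]:
    """Official symbol first, then the user's spelling if it differs (search expansion)."""
    cleaned = name.strip()
    official = resolve_symbol(cleaned, species)
    if official is None:
        return [cleaned]
    if official.lower() == cleaned.lower():
        return [official]
    return [official, cleaned]
```

Keeping species in the lookup key reflects the checklist: the same alias maps to different official symbols in human and mouse.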

1.2 Construct Search Strategy

| User Query Type | Search Strategy |
|-----------------|-----------------|
| Specific accession | Direct retrieval |
| Gene + condition | "[gene] [condition]" + species filter |
| Disease only | "[disease]" + species filter |
| Technology-specific | Add platform keywords (RNA-seq, microarray) |
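
One way to turn the table into code is a small planner that returns either a direct-retrieval plan or a keyword search plan. This is a sketch under assumptions: `build_search` and the shape of the returned dictionary are hypothetical, chosen only to mirror the four rows above.

```python
from typing import Dict, Optional


def build_search(
    gene: Optional[str] = None,
    condition: Optional[str] = None,
    disease: Optional[str] = None,
    species: Optional[str] = None,
    platform: Optional[str] = None,
    accession: Optional[str] = None,
) -> Dict[str, str]:
    """Map a parsed user query onto one of the strategies in the table."""
    if accession:
        # Specific accession: skip searching entirely.
        return {"mode": "direct", "accession": accession}
    # Gene, condition, disease, and platform keywords are joined into free text.
    terms = [t for t in (gene, condition, disease, platform) if t]
    if not terms:
        raise ValueError("need at least one search term or an accession")
    plan = {"mode": "search", "keywords": " ".join(terms)}
    if species:
        plan["species"] = species  # species is applied as a filter, not a keyword
    return plan
```

The `keywords` and `species` entries line up with the parameters of the ArrayExpress search call used in Phase 2.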

---

Phase 2: Data Retrieval (Internal)

Search silently. Do NOT narrate the process.

2.1 Search Experiments

```python
# ArrayExpress search
result = tu.tools.arrayexpress_search_experiments(
    keywords="[gene/disease] [condition]",
    species="[species]",
    limit=20
)

# BioStudies for multi-omics
biostudies_result = tu.tools.biostudies_search_studies(
    query="[keywords]",
    limit=10
)
```

2.2 Get Experiment Details

For top results, retrieve full metadata:

```python
# `accession` is an accession string taken from the search results

# Get details for each relevant experiment
details = tu.tools.arrayexpress_get_experiment_details(
    accession=accession
)

# Get sample information
samples = tu.tools.arrayexpress_get_experiment_samples(
    accession=accession
)

# Get available files
files = tu.tools.arrayexpress_get_experiment_files(
    accession=accession
)
```

2.3 BioStudies Retrieval

```python
# Multi-omics study details
study_details = tu.tools.biostudies_get_study_details(
    accession=study_accession
)

# Study structure
sections = tu.tools.biostudies_get_study_sections(
    accession=study_accession
)

# Available files
files = tu.tools.biostudies_get_study_files(
    accession=study_accession
)
```

Fallback Chains

| Primary | Fallback | Notes |
|---------|----------|-------|
| ArrayExpress search | BioStudies search | Use when ArrayExpress returns no results |
| arrayexpress_get_experiment_details | biostudies_get_study_details | E-GEOD may have BioStudies mirror |
| arrayexpress_get_experiment_files | Note "Files unavailable" | Some studies restrict downloads |
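
The fallback chains share one pattern: try the primary source, and move on when it fails or comes back empty. A minimal sketch of that pattern follows; `first_nonempty` is a hypothetical helper, and the lambdas are stand-ins for the real tool calls, which are not invoked here.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def first_nonempty(
    steps: Sequence[Tuple[str, Callable[[], Any]]]
) -> Tuple[Optional[str], Any]:
    """Run each (label, call) in order and return the first non-empty result."""
    for label, call in steps:
        try:
            result = call()
        except Exception:
            continue  # a failed call is treated like an empty result
        if result:
            return label, result
    return None, None


# Stand-in callables; in practice these would wrap the search tools.
source, hits = first_nonempty([
    ("ArrayExpress", lambda: []),
    ("BioStudies", lambda: ["S-BSSTXXXX"]),
])
```

Returning the label alongside the result lets the report state which database actually supplied the data.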

---

Phase 3: Report Dataset Profile

Output Structure

Present as a Dataset Search Report. Hide search process.

```markdown

# Expression Data: [Query Topic]

Search Summary

  • Query: [gene/disease] in [species]
  • Databases: ArrayExpress, BioStudies
  • Results: [N] relevant experiments found

Data Quality Overview: [assessment based on criteria below]

---

Top Experiments

1. [E-MTAB-XXXX]: [Title]

| Attribute | Value |
|-----------|-------|
| Accession | [accession with link] |
| Organism | [species] |
| Experiment Type | RNA-seq / Microarray |
| Platform | [specific platform] |
| Samples | [N] samples |
| Release Date | [date] |

Description: [Brief description from metadata]

Experimental Design:

  • Conditions: [treatment vs control, etc.]
  • Replicates: [N biological, M technical]
  • Tissue/Cell type: [if specified]

Sample Groups:

| Group | Samples | Description |
|-------|---------|-------------|
| Control | [N] | [description] |
| Treatment | [N] | [description] |

Data Files Available:

| File | Type | Size |
|------|------|------|
| [filename] | Processed data | [size] |
| [filename] | Raw data | [size] |
| [filename] | Sample metadata | [size] |

Quality Assessment: ●●● High / ●●○ Medium / ●○○ Low

  • Sample size: [adequate/limited]
  • Replication: [yes/no]
  • Metadata completeness: [complete/partial]

---

2. [E-GEOD-XXXXX]: [Title]

[Same structure as above]

---

Multi-Omics Studies (from BioStudies)

[S-BSSTXXXX]: [Title]

| Attribute | Value |
|-----------|-------|
| Accession | [accession] |
| Study Type | [proteomics/metabolomics/integrated] |
| Organism | [species] |
| Samples | [N] |

Data Types Included:

  • [ ] Transcriptomics
  • [ ] Proteomics
  • [ ] Metabolomics
  • [ ] Other: [specify]

---

Summary Table

| Accession | Type | Samples | Platform | Quality |
|-----------|------|---------|----------|---------|
| [E-MTAB-X] | RNA-seq | [N] | Illumina | ●●● |
| [E-GEOD-X] | Microarray | [N] | Affymetrix | ●●○ |

---

Recommendations

For [specific analysis type]:

  • Best experiment: [accession] - [reason]
  • Alternative: [accession] - [reason]

Data Integration Notes:

  • Platform compatibility: [notes on combining datasets]
  • Batch considerations: [if applicable]

---

Data Access

Direct Download Links

  • [E-MTAB-XXXX processed data](link)
  • [E-MTAB-XXXX raw data](link)

Database Links

  • ArrayExpress: https://www.ebi.ac.uk/arrayexpress/experiments/[accession]
  • BioStudies: https://www.ebi.ac.uk/biostudies/studies/[accession]

Retrieved: [date]

```
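
The Summary Table in the template can be generated from the collected experiment records. The sketch below assumes each record is a dictionary with the keys `accession`, `type`, `samples`, `platform`, and `quality`; that record shape and the function name are hypothetical, not something the tools are known to return.

```python
from typing import Any, Dict, List


def summary_table(experiments: List[Dict[str, Any]]) -> str:
    """Render experiment records as the markdown Summary Table from the template."""
    header = "| Accession | Type | Samples | Platform | Quality |"
    rule = "|-----------|------|---------|----------|---------|"
    rows = [
        "| {accession} | {type} | {samples} | {platform} | {quality} |".format(**exp)
        for exp in experiments
    ]
    return "\n".join([header, rule] + rows)
```

A record missing any of the five keys raises `KeyError`, which surfaces incomplete metadata before the report is written.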

---

Data Quality Tiers (Aligned with Evidence Grading)

Experiment Quality Assessment

| Tier | Symbol | Criteria | Evidence Equivalent |
|------|--------|----------|---------------------|
| High Quality | ●●● | ≥3 bio replicates, complete metadata, processed data | ★★★ |
| Medium Quality | ●●○ | 2-3 replicates OR some metadata gaps, accessible | ★★☆ |
| Low Quality | ●○○ | No replicates, sparse metadata, data access issues | ★☆☆ |
| Use with Caution | ○○○ | Single sample, no replication, outdated platform | ☆☆☆ |
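
The tiers can be approximated with a simple rule cascade. This is one possible reading of the table, not a definitive scoring method: the function name, its parameters, and the use of a single `data_accessible` flag for both "processed data" and "accessible" are assumptions. Because three replicates satisfy both the High and Medium rows, the sketch checks High first. Platform age is not modelled.

```python
def quality_tier(
    bio_replicates: int,
    metadata_complete: bool,
    data_accessible: bool,
    n_samples: int,
) -> str:
    """Return a tier symbol following one reading of the quality table."""
    if n_samples <= 1:
        return "○○○"  # single sample: use with caution
    if bio_replicates >= 3 and metadata_complete and data_accessible:
        return "●●●"  # high: replicated, fully annotated, data available
    if bio_replicates >= 2 and data_accessible:
        return "●●○"  # medium: some replication or metadata gaps, still accessible
    return "●○○"      # low: no replicates or data access problems
```

An experiment with three replicates but incomplete metadata lands in the medium tier under this reading.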

Data Reliability by Source

| Data Source | Reliability | Notes |
|-------------|-------------|-------|
| GTEx | ★★★ | Large-scale, well-curated, standardized |
| HPA | ★★★ | Validated, multiple antibodies |
| ArrayExpress (curated) | ★★☆-★★★ | Depends on individual study |
| GEO/ArrayExpress (direct) | ★☆☆-★★☆ | Submitter-provided, verify |
| Single-cell (CELLxGENE) | ★★☆ | High resolution but technical variation |
| Microarray (legacy) | ★★☆ | Platform-specific, may need normalization |

Using Expression Evidence in Research

When citing expression data in research reports, include reliability:

```markdown
Tissue Expression:

EGFR shows highest expression in skin (156 TPM) [★★★: GTEx], consistent with
HPA immunohistochemistry [★★★: HPA, strong staining]. A smaller study
found elevated expression in tumors [★★☆: E-MTAB-1234, N=30 samples].
```

Include assessment rationale:

```markdown
Quality: ●●● High (★★★)

  • ✓ 4 biological replicates per condition
  • ✓ Complete sample annotations
  • ✓ Processed and raw data available
  • ✓ Recent RNA-seq platform (Illumina NovaSeq)

Reliability for Use:

  • Differential expression calls: ★★★ (well-powered)
  • Absolute expression values: ★★☆ (compare within study)
  • Cross-study comparison: ★☆☆ (requires batch correction)
```

---

Completeness Checklist

Every dataset report MUST include:

Per Experiment (Required)

  • [ ] Accession number with database link
  • [ ] Organism
  • [ ] Experiment type (RNA-seq/microarray/etc.)
  • [ ] Sample count
  • [ ] Brief description
  • [ ] Quality assessment

Search Summary (Required)

  • [ ] Query parameters stated
  • [ ] Number of results
  • [ ] Databases searched

Recommendations (Required)

  • [ ] Best dataset for user's purpose (or "No suitable data found")
  • [ ] Data access notes

Include Even If Empty

  • [ ] Multi-omics studies section (or "No multi-omics studies found")
  • [ ] Data integration notes (or "Single-platform data, no integration needed")
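
The per-experiment part of the checklist lends itself to an automated check before a report is emitted. The sketch below is illustrative only; the field names in `REQUIRED_FIELDS` and the dictionary-based record are hypothetical choices that mirror the "Per Experiment (Required)" items.

```python
from typing import Any, Dict, List

# One key per required checklist item (accession and its database link are split).
REQUIRED_FIELDS = (
    "accession",
    "link",
    "organism",
    "experiment_type",
    "sample_count",
    "description",
    "quality",
)


def missing_fields(experiment: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or empty in an experiment record."""
    return [name for name in REQUIRED_FIELDS if experiment.get(name) in (None, "")]
```

A sample count of 0 is treated as present, since only `None` and the empty string count as missing.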

---

Common Use Cases

Disease Gene Expression

User: "Find breast cancer RNA-seq data"

```python
result = tu.tools.arrayexpress_search_experiments(
    keywords="breast cancer RNA-seq",
    species="Homo sapiens",
    limit=20
)
```

→ Report top experiments with quality assessment

Gene-Specific Studies

User: "Find TP53 expression experiments in mouse"

```python
result = tu.tools.arrayexpress_search_experiments(
    keywords="Trp53 TP53 p53",  # Official mouse symbol (MGI) plus common aliases
    species="Mus musculus",
    limit=15
)
```

→ Report experiments studying this gene

Specific Accession Lookup

User: "Get details for E-MTAB-5214"

→ Single experiment profile with all details and files

Multi-Omics Integration

User: "Find proteomics and transcriptomics studies for liver disease"

→ Search both ArrayExpress and BioStudies, note integration potential

---

Error Handling

| Error | Response |
|-------|----------|
| "No experiments found" | Broaden keywords, remove species filter, try synonyms |
| "Accession not found" | Verify format (E-MTAB-*, E-GEOD-*, S-BSST*), check if withdrawn |
| "Files not available" | Note in report: "Data files restricted by submitter" |
| "API timeout" | Retry once, then note: "(metadata retrieval incomplete)" |
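
The timeout row describes a retry-once policy, which can be sketched as a small wrapper. This assumes a timeout surfaces as Python's built-in `TimeoutError`; how the tools actually signal a timeout is not stated in this document, and `call_with_one_retry` is a hypothetical helper.

```python
from typing import Any, Callable, Tuple

INCOMPLETE_NOTE = "(metadata retrieval incomplete)"


def call_with_one_retry(call: Callable[[], Any]) -> Tuple[Any, str]:
    """Try the call, retry once on TimeoutError, then give up with a report note."""
    for _ in range(2):  # first attempt plus exactly one retry
        try:
            return call(), ""
        except TimeoutError:
            continue
    return None, INCOMPLETE_NOTE
```

The second element of the returned pair is the note to append to the report, empty when the call succeeded.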

---

Tool Reference

ArrayExpress (Gene Expression)

| Tool | Purpose |
|------|---------|
| arrayexpress_search_experiments | Keyword/species search |
| arrayexpress_get_experiment_details | Full metadata |
| arrayexpress_get_experiment_files | Download links |
| arrayexpress_get_experiment_samples | Sample annotations |

BioStudies (Multi-Omics)

| Tool | Purpose |
|------|---------|
| biostudies_search_studies | Multi-omics search |
| biostudies_get_study_details | Study metadata |
| biostudies_get_study_files | Data files |
| biostudies_get_study_sections | Study structure |

---

Search Parameters Reference

ArrayExpress

| Parameter | Description | Example |
|-----------|-------------|---------|
| keywords | Free text search | "breast cancer RNA-seq" |
| species | Scientific name | "Homo sapiens" |
| array | Platform filter | "Illumina" |
| limit | Max results | 20 |

BioStudies

| Parameter | Description | Example |
|-----------|-------------|---------|
| query | Free text | "proteomics liver" |
| limit | Max results | 10 |
