bioservices

Skill from ovachiever/droid-tings

What it does

Retrieves and integrates biological data across 40+ bioinformatics databases, enabling complex multi-database queries and identifier mapping in Python.

Added Feb 4, 2026

"Primary Python tool for 40+ bioinformatics services. Preferred for multi-database workflows: UniProt, KEGG, ChEMBL, PubChem, Reactome, QuickGO. Unified API for queries, ID mapping, pathway analysis. For direct REST control, use individual database skills (uniprot-database, kegg-database)."

# BioServices

Overview

BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.

When to Use This Skill

This skill should be used when:

  • Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
  • Analyzing metabolic pathways and gene functions via KEGG or Reactome
  • Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
  • Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
  • Running sequence similarity searches (BLAST, MUSCLE alignment)
  • Querying gene ontology terms (QuickGO, GO annotations)
  • Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
  • Mining genomic data (BioMart, ArrayExpress, ENA)
  • Integrating data from multiple bioinformatics resources in a single workflow

Core Capabilities

1. Protein Analysis

Retrieve protein information, sequences, and functional annotations:

```python
from bioservices import UniProt

u = UniProt(verbose=False)

# Search for a protein by entry name (current UniProt REST API: "tsv" output,
# field names such as accession, id, gene_names, organism_name)
results = u.search("ZAP70_HUMAN", frmt="tsv", columns="accession,id,gene_names,organism_name")

# Retrieve the FASTA sequence
sequence = u.retrieve("P43403", "fasta")

# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
```

Key methods:

  • search(): Query UniProt with flexible search terms
  • retrieve(): Get protein entries in various formats (FASTA, XML, TSV)
  • mapping(): Convert identifiers between databases

Reference: references/services_reference.md for complete UniProt API details.
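retrieve(..., "fasta") hands back the record as text. The following is a minimal, dependency-free sketch of splitting FASTA text into header and sequence. It assumes the result is a plain string. parse_fasta and the toy record are illustrative and are not part of BioServices. For heavier use, Biopython's Bio.SeqIO is the more robust option.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) pairs.

    Header lines start with '>'; the sequence lines that follow are
    concatenated. Lines before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record for illustration only (not a real UniProt entry)
example = """>sp|P00000|EXAMPLE_HUMAN Hypothetical protein
MPDPAAHLPF
FYGSISRAEA
"""
records = parse_fasta(example)
print(records[0][0])       # sp|P00000|EXAMPLE_HUMAN Hypothetical protein
print(len(records[0][1]))  # 20
```

With real data, records = parse_fasta(u.retrieve("P43403", "fasta")) should yield a single pair.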

2. Pathway Discovery and Analysis

Access KEGG pathway information for genes and organisms:

```python
from bioservices import KEGG

k = KEGG()
k.organism = "hsa"  # Set the default organism to human

# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species

# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs

# Get pathways containing a specific gene (KEGG human gene IDs are NCBI Gene IDs)
pathways = k.get_pathway_by_gene("7535", "hsa")  # 7535 = ZAP70

# Retrieve and parse pathway data (hsa04660: T cell receptor signaling pathway)
data = k.get("hsa04660")
parsed = k.parse(data)

# Extract pathway interactions from the KGML file
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions["relations"]  # Relations between pathway entries, e.g. protein-protein

# Convert to Simple Interaction Format (SIF)
sif_data = k.pathway2sif("hsa04660")
```

Key methods:

  • lookfor_organism(), lookfor_pathway(): Search by name
  • get_pathway_by_gene(): Find pathways containing genes
  • parse_kgml_pathway(): Extract structured pathway data
  • pathway2sif(): Get protein interaction networks

Reference: references/workflow_patterns.md for complete pathway analysis workflows.
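A common follow-up is to find the pathways that several genes have in common. The sketch below assumes get_pathway_by_gene() returns a dict keyed by KEGG pathway ID; check this against your BioServices version. shared_pathways is a hypothetical helper, and the example dicts are stand-ins for real KEGG responses.

```python
def shared_pathways(*pathway_maps):
    """Return the pathway IDs present in every {pathway_id: name} mapping, sorted."""
    if not pathway_maps:
        return []
    common = set(pathway_maps[0])
    for mapping in pathway_maps[1:]:
        common &= set(mapping)
    return sorted(common)


# Hypothetical results for two genes; real values would come from
# k.get_pathway_by_gene(gene_id, "hsa"). The placeholder keys are not KEGG IDs.
gene_a = {"hsa04660": "T cell receptor signaling pathway", "placeholder_1": "Pathway X"}
gene_b = {"hsa04660": "T cell receptor signaling pathway", "placeholder_2": "Pathway Y"}
print(shared_pathways(gene_a, gene_b))  # ['hsa04660']
```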

3. Compound Database Searches

Search and cross-reference compounds across multiple databases:

```python
from bioservices import KEGG, UniChem

k = KEGG()

# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222

# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links

# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315
```

Common workflow:

  1. Search compound by name in KEGG
  2. Extract KEGG compound ID
  3. Use UniChem for KEGG → ChEMBL mapping
  4. ChEBI IDs are often provided in KEGG entries

Reference: references/identifier_mapping.md for complete cross-database mapping guide.
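Steps 1 and 2 of the workflow come down to reading KEGG's find output. The KEGG REST find operation returns one hit per line, with an identifier and a description separated by a tab. This minimal parser assumes k.find() passes that text through unchanged. The sample string mirrors the Geldanamycin result quoted above; it is not a captured response.

```python
def parse_kegg_find(text):
    """Parse KEGG 'find' output into a {kegg_id: [names]} dict.

    Each non-empty line is expected to look like 'cpd:C11222<TAB>Name1; Name2'.
    """
    hits = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kegg_id, _, description = line.partition("\t")
        names = [n.strip() for n in description.split(";") if n.strip()]
        hits[kegg_id.strip()] = names
    return hits


def strip_prefix(kegg_id):
    """Drop the database prefix, e.g. 'cpd:C11222' -> 'C11222'."""
    return kegg_id.split(":", 1)[-1]


sample = "cpd:C11222\tGeldanamycin\n"  # shaped like the result quoted above
hits = parse_kegg_find(sample)
print(hits)                             # {'cpd:C11222': ['Geldanamycin']}
print([strip_prefix(i) for i in hits])  # ['C11222']
```

The prefix-free ID is the form passed to get_compound_id_from_kegg() in the example above.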

4. Sequence Analysis

Run BLAST searches and sequence alignments:

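```python
from bioservices import NCBIblast, UniProt

# Query sequence: the P43403 (ZAP70) FASTA record with its header line removed
fasta = UniProt(verbose=False).retrieve("P43403", "fasta")
protein_sequence = "".join(fasta.splitlines()[1:])

s = NCBIblast(verbose=False)

# Run BLASTP against UniProtKB (NCBIblast wraps the EMBL-EBI BLAST service)
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com",  # Required by the EBI service
)

# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")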
```

Note: BLAST jobs are asynchronous. Check status before retrieving results.
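Because the job runs asynchronously, results should only be fetched once the status has left its pending states. The following is a generic polling sketch. The status names QUEUED, RUNNING and FINISHED are assumed from EMBL-EBI Job Dispatcher conventions, and wait_for_job is a hypothetical helper, not a BioServices method.

```python
import time


def wait_for_job(get_status, jobid, interval=5.0, timeout=600.0,
                 pending=("QUEUED", "RUNNING")):
    """Poll get_status(jobid) until it leaves a pending state or the timeout expires.

    Returns the final status string; raises TimeoutError if still pending.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(jobid)
        if status not in pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {jobid} still {status} after {timeout} s")
        time.sleep(interval)


# Usage with BioServices (assumes getStatus returns a status string):
#     status = wait_for_job(s.getStatus, jobid)
#     if status == "FINISHED":
#         results = s.getResult(jobid, "out")

# Offline demo with a stub that finishes on the third poll
statuses = iter(["QUEUED", "RUNNING", "FINISHED"])
print(wait_for_job(lambda _: next(statuses), "job-1", interval=0.01))  # FINISHED
```

Any non-pending status is returned rather than raised, so the caller decides how to treat failure states such as ERROR.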

5. Identifier Mapping

Convert identifiers between different biological databases:

```python
from bioservices import UniChem, UniProt

# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",             # Target database
    query="P43403",        # Identifier(s) to convert
)

# KEGG gene ID → UniProt
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535")

# For compounds, use UniChem
uc = UniChem()
chembl_from_kegg = uc.get_compound_id_from_kegg("C11222")
```

Supported mappings (UniProt):

  • UniProtKB ↔ KEGG
  • UniProtKB ↔ Ensembl
  • UniProtKB ↔ PDB
  • UniProtKB ↔ RefSeq
  • And many more (see references/identifier_mapping.md)
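UniProt mapping() calls return the service's response rather than a ready-made lookup table. This hypothetical helper collapses such a response into a dict. It assumes the response follows UniProt's ID-mapping JSON layout: a "results" list of from/to pairs plus a "failedIds" list. P00000 in the sample is a placeholder for an ID that failed to map.

```python
from collections import defaultdict


def mapping_to_dict(response):
    """Collapse an ID-mapping response into ({source_id: [target_ids]}, failed_ids).

    Assumes the UniProt ID-mapping JSON layout:
    {"results": [{"from": ..., "to": ...}, ...], "failedIds": [...]}.
    When 'to' is a full UniProtKB entry dict, its 'primaryAccession' is used.
    """
    mapped = defaultdict(list)
    for hit in response.get("results", []):
        target = hit["to"]
        if isinstance(target, dict):
            target = target.get("primaryAccession", target)
        mapped[hit["from"]].append(target)
    return dict(mapped), list(response.get("failedIds", []))


# Sample shaped like a UniProtKB -> KEGG response (illustrative, not captured)
response = {"results": [{"from": "P43403", "to": "hsa:7535"}], "failedIds": ["P00000"]}
mapped, failed = mapping_to_dict(response)
print(mapped)  # {'P43403': ['hsa:7535']}
print(failed)  # ['P00000']
```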

6. Gene Ontology Queries

Access GO terms and annotations:

```python
from bioservices import QuickGO

g = QuickGO(verbose=False)

# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")

# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")
```
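GO term identifiers take a fixed form: "GO:" followed by seven digits, as in GO:0003824 above. Normalizing user input before a query helps avoid silent misses. This helper is a hypothetical convenience and is not part of QuickGO or BioServices.

```python
import re

# Optional "GO:" prefix (any case) followed by up to seven digits
_GO_RE = re.compile(r"^(?:GO:)?(\d{1,7})$", re.IGNORECASE)


def normalize_go_id(value):
    """Return a canonical 'GO:nnnnnnn' identifier or raise ValueError.

    Accepts forms such as 'GO:0003824', 'go:0003824', '0003824' or '3824'.
    """
    match = _GO_RE.match(value.strip())
    if not match:
        raise ValueError(f"not a GO identifier: {value!r}")
    return "GO:" + match.group(1).zfill(7)


print(normalize_go_id("3824"))        # GO:0003824
print(normalize_go_id("go:0003824"))  # GO:0003824
```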

7. Protein-Protein Interactions

Query interaction databases via PSICQUIC:

```python
from bioservices import PSICQUIC

s = PSICQUIC(verbose=False)

# Query a specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")

# List available interaction databases
databases = s.activeDBs
```

Available databases: MINT, IntAct, BioGRID, DIP, and 30+ others.
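PSICQUIC services exchange interactions as PSI-MI TAB (MITAB) rows, one tab-separated line per interaction. The minimal extractor below assumes you have the raw MITAB 2.5 lines; how your BioServices version returns query() results may differ, so check it first. The function and the synthetic row are illustrative, and uniprotkb:P00000 is a placeholder partner.

```python
def parse_mitab25(lines):
    """Extract a few fields from PSI-MI TAB 2.5 rows.

    Column positions (0-based) follow the 15-column MITAB 2.5 layout:
    0/1 = interactor A/B IDs, 9/10 = taxids A/B, 11 = interaction types.
    Rows with fewer than 15 columns are skipped.
    """
    records = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue
        records.append({
            "id_a": cols[0],
            "id_b": cols[1],
            "taxid_a": cols[9],
            "taxid_b": cols[10],
            "interaction_type": cols[11],
        })
    return records


# Synthetic 15-column row; "-" marks empty columns as in MITAB
row = "\t".join(["uniprotkb:P43403", "uniprotkb:P00000"] + ["-"] * 7
                + ["taxid:9606", "taxid:9606", "-", "-", "-", "-"])
print(parse_mitab25([row])[0]["id_b"])  # uniprotkb:P00000
```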

Multi-Service Integration Workflows

BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:

Complete Protein Analysis Pipeline

Execute a full protein characterization workflow:

```bash
python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
```

This script demonstrates:

  1. UniProt search for protein entry
  2. FASTA sequence retrieval
  3. BLAST similarity search
  4. KEGG pathway discovery
  5. PSICQUIC interaction mapping

Pathway Network Analysis

Analyze all pathways for an organism:

```bash
python scripts/pathway_analysis.py hsa output_directory/
```

Extracts and analyzes:

  • All pathway IDs for organism
  • Protein-protein interactions per pathway
  • Interaction type distributions
  • Exports to CSV/SIF formats
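Once interactions are held as (source, interaction_type, target) triples, the interaction-type tally and the SIF/CSV export listed above take only a few lines. This sketch does not reproduce pathway_analysis.py. It assumes you have already converted pathway2sif() output or KGML relations into such triples, and the gene names are placeholders.

```python
import csv
import io
from collections import Counter


def interaction_type_counts(triples):
    """Count interaction types in (source, interaction_type, target) triples."""
    return Counter(kind for _, kind, _ in triples)


def to_sif_lines(triples):
    """Format triples as SIF lines: source<TAB>interaction_type<TAB>target."""
    return [f"{source}\t{kind}\t{target}" for source, kind, target in triples]


def write_csv(triples, handle):
    """Write triples to an open text handle as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["source", "interaction", "target"])
    writer.writerows(triples)


# Hypothetical edges; real ones would come from pathway2sif() or KGML relations
edges = [("geneA", "activation", "geneB"),
         ("geneB", "inhibition", "geneC"),
         ("geneA", "activation", "geneC")]

counts = interaction_type_counts(edges)
sif_lines = to_sif_lines(edges)
buffer = io.StringIO()
write_csv(edges, buffer)  # for a real file: open(path, "w", newline="")
csv_text = buffer.getvalue()
print(counts.most_common())  # [('activation', 2), ('inhibition', 1)]
```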

Cross-Database Compound Search

Map compound identifiers across databases:

```bash
python scripts/compound_cross_reference.py Geldanamycin
```

Retrieves:

  • KEGG compound ID
  • ChEBI identifier
  • ChEMBL identifier
  • Basic compound properties

Batch Identifier Conversion

Convert multiple identifiers at once:

```bash
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
```
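The front end of a batch converter mostly reads identifiers and submits them in manageable groups. The sketch below covers only that part; the script's actual internals may differ. The batch size is arbitrary, and passing a comma-separated string as the query to u.mapping() is an assumption to verify. P00000 and P11111 are placeholder IDs.

```python
def read_ids(lines):
    """Yield identifiers from lines, skipping blanks and '#' comments."""
    for line in lines:
        token = line.strip()
        if token and not token.startswith("#"):
            yield token


def batched(items, size):
    """Group an iterable into lists of at most `size` items (works on Python 3.9+)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# With a real file:  with open("input_ids.txt") as fh: ids = list(read_ids(fh))
ids = list(read_ids(["P43403", "", "# comment", "P00000", "P11111"]))
for chunk in batched(ids, 2):
    query = ",".join(chunk)  # one comma-separated batch per mapping call
    print(query)
```

On Python 3.12 and later, itertools.batched offers the same grouping, though it yields tuples.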

Best Practices

Output Format Handling

Different services return data in various formats:

  • XML: Parse using BeautifulSoup (most SOAP services)
  • Tab-separated (TSV): Pandas DataFrames for tabular data
  • Dictionary/JSON: Direct Python manipulation
  • FASTA: BioPython integration for sequence analysis
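For TSV output, the standard library is enough when pandas is not installed. With pandas, pandas.read_csv(io.StringIO(text), sep="\t") returns a DataFrame directly. The header names in the sample are shaped like UniProt's TSV columns for illustration and do not come from a captured response.

```python
import csv
import io


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Illustrative text shaped like a UniProt TSV search result
tsv = "Entry\tEntry Name\tOrganism\nP43403\tZAP70_HUMAN\tHomo sapiens (Human)\n"
rows = parse_tsv(tsv)
print(rows[0]["Entry Name"])  # ZAP70_HUMAN
```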

Rate Limiting and Verbosity

Control API request behavior:

```python
from bioservices import KEGG

k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections
```
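The settings above control logging and timeouts, not request rate. If many calls need to be spaced out, for example in a loop over pathway IDs, a small client-side throttle is one option. This decorator is a generic sketch rather than a BioServices feature, and the 0.5 s interval in the usage comment is an arbitrary example.

```python
import functools
import time


def throttled(min_interval):
    """Decorator that spaces calls at least `min_interval` seconds apart."""
    def decorator(func):
        last_call = [float("-inf")]  # time of the previous call (mutable cell)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = last_call[0] + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical usage with a service method:  get_entry = throttled(0.5)(k.get)
@throttled(0.05)
def fake_request(i):
    return i * 2


start = time.monotonic()
results = [fake_request(i) for i in range(3)]
elapsed = time.monotonic() - start
print(results)  # [0, 2, 4]
```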

Error Handling

Wrap service calls in try-except blocks:

```python
from bioservices import UniProt

u = UniProt(verbose=False)

try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
```
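A bare except Exception does not show whether a failure was transient. For flaky network calls, a common pattern is to retry a few times with growing pauses. The helper below is a generic sketch and is not part of BioServices. Which exception types count as transient is left to the caller.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0,
                      exceptions=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Waits base_delay, 2*base_delay, ... between attempts and re-raises
    the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical usage: results = call_with_retries(u.search, "ZAP70_HUMAN", attempts=4)
calls = {"n": 0}


def flaky():
    """Stub that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky, base_delay=0.01))  # ok
```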

Organism Codes

Use standard organism abbreviations:

  • hsa: Homo sapiens (human)
  • mmu: Mus musculus (mouse)
  • dme: Drosophila melanogaster
  • sce: Saccharomyces cerevisiae (yeast)

List all organisms: k.list("organism") or k.organismIds
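KEGG pathway IDs combine an organism code with a five-digit map number, so hsa04660 is map 04660 for human. The same map can therefore be requested for another organism by swapping the prefix. The small helper below assumes the common 3- or 4-letter organism codes (plus map for reference pathways). Prefixes such as ko or ec are deliberately not matched, and a swapped ID is not guaranteed to exist in KEGG.

```python
import re

# Optional 'path:' prefix, a 3-4 letter organism code (or 'map'), then 5 digits
_PATHWAY_RE = re.compile(r"^(?:path:)?([a-z]{3,4})(\d{5})$")


def split_pathway_id(pathway_id):
    """Return (organism_code, map_number) for IDs like 'hsa04660' or 'path:mmu04660'."""
    match = _PATHWAY_RE.match(pathway_id.strip())
    if not match:
        raise ValueError(f"unrecognized KEGG pathway ID: {pathway_id!r}")
    return match.group(1), match.group(2)


def same_map_for(pathway_id, organism):
    """Build the equivalent pathway ID for another organism code (may not exist)."""
    _, number = split_pathway_id(pathway_id)
    return organism + number


print(split_pathway_id("hsa04660"))     # ('hsa', '04660')
print(same_map_for("hsa04660", "mmu"))  # mmu04660
```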

Integration with Other Tools

BioServices works well with:

  • BioPython: Sequence analysis on retrieved FASTA data
  • Pandas: Tabular data manipulation
  • PyMOL: 3D structure visualization (retrieve PDB IDs)
  • NetworkX: Network analysis of pathway interactions
  • Galaxy: Custom tool wrappers for workflow platforms

Resources

scripts/

Executable Python scripts demonstrating complete workflows:

  • protein_analysis_workflow.py: End-to-end protein characterization
  • pathway_analysis.py: KEGG pathway discovery and network extraction
  • compound_cross_reference.py: Multi-database compound searching
  • batch_id_converter.py: Bulk identifier mapping utility

Scripts can be executed directly or adapted for specific use cases.

references/

Detailed documentation loaded as needed:

  • services_reference.md: Comprehensive list of all 40+ services with methods
  • workflow_patterns.md: Detailed multi-step analysis workflows
  • identifier_mapping.md: Complete guide to cross-database ID conversion

Load references when working with specific services or complex integration tasks.

Installation

```bash

uv pip install bioservices

```

Dependencies are installed automatically. The package is tested on Python 3.9-3.12.

Additional Information

For detailed API documentation and advanced features, refer to:

  • Official documentation: https://bioservices.readthedocs.io/
  • Source code: https://github.com/cokelaer/bioservices
  • Service-specific references in references/services_reference.md