🎯

dnanexus-integration

🎯Skill

from ovachiever/droid-tings

VibeIndex|
What it does

Enables seamless genomics pipeline development by interacting with DNAnexus cloud platform, managing data, building apps, and executing bioinformatics workflows using dxpy SDK.

πŸ“¦

Part of

ovachiever/droid-tings(370 items)

dnanexus-integration

Installation

git cloneClone repository
git clone https://github.com/ovachiever/droid-tings.git
πŸ“– Extracted from docs: ovachiever/droid-tings
16Installs
-
AddedFeb 4, 2026

Skill Details

SKILL.md

"DNAnexus cloud genomics platform. Build apps/applets, manage data (upload/download), dxpy Python SDK, run workflows, FASTQ/BAM/VCF, for genomics pipeline development and execution."

Overview

# DNAnexus Integration

Overview

DNAnexus is a cloud platform for biomedical data analysis and genomics. Build and deploy apps/applets, manage data objects, run workflows, and use the dxpy Python SDK for genomics pipeline development and execution.

When to Use This Skill

This skill should be used when:

  • Creating, building, or modifying DNAnexus apps/applets
  • Uploading, downloading, searching, or organizing files and records
  • Running analyses, monitoring jobs, creating workflows
  • Writing scripts using dxpy to interact with the platform
  • Setting up dxapp.json, managing dependencies, using Docker
  • Processing FASTQ, BAM, VCF, or other bioinformatics files
  • Managing projects, permissions, or platform resources

Core Capabilities

The skill is organized into five main areas, each with detailed reference documentation:

1. App Development

Purpose: Create executable programs (apps/applets) that run on the DNAnexus platform.

Key Operations:

  • Generate app skeleton with dx-app-wizard
  • Write Python or Bash apps with proper entry points
  • Handle input/output data objects
  • Deploy with dx build or dx build --app
  • Test apps on the platform

Common Use Cases:

  • Bioinformatics pipelines (alignment, variant calling)
  • Data processing workflows
  • Quality control and filtering
  • Format conversion tools

Reference: See references/app-development.md for:

  • Complete app structure and patterns
  • Python entry point decorators
  • Input/output handling with dxpy
  • Development best practices
  • Common issues and solutions

2. Data Operations

Purpose: Manage files, records, and other data objects on the platform.

Key Operations:

  • Upload/download files with dxpy.upload_local_file() and dxpy.download_dxfile()
  • Create and manage records with metadata
  • Search for data objects by name, properties, or type
  • Clone data between projects
  • Manage project folders and permissions

Common Use Cases:

  • Uploading sequencing data (FASTQ files)
  • Organizing analysis results
  • Searching for specific samples or experiments
  • Backing up data across projects
  • Managing reference genomes and annotations

Reference: See references/data-operations.md for:

  • Complete file and record operations
  • Data object lifecycle (open/closed states)
  • Search and discovery patterns
  • Project management
  • Batch operations

3. Job Execution

Purpose: Run analyses, monitor execution, and orchestrate workflows.

Key Operations:

  • Launch jobs with applet.run() or app.run()
  • Monitor job status and logs
  • Create subjobs for parallel processing
  • Build and run multi-step workflows
  • Chain jobs with output references

Common Use Cases:

  • Running genomics analyses on sequencing data
  • Parallel processing of multiple samples
  • Multi-step analysis pipelines
  • Monitoring long-running computations
  • Debugging failed jobs

Reference: See references/job-execution.md for:

  • Complete job lifecycle and states
  • Workflow creation and orchestration
  • Parallel execution patterns
  • Job monitoring and debugging
  • Resource management

4. Python SDK (dxpy)

Purpose: Programmatic access to DNAnexus platform through Python.

Key Operations:

  • Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
  • Use high-level functions for common tasks
  • Make direct API calls for advanced operations
  • Create links and references between objects
  • Search and discover platform resources

Common Use Cases:

  • Automation scripts for data management
  • Custom analysis pipelines
  • Batch processing workflows
  • Integration with external tools
  • Data migration and organization

Reference: See references/python-sdk.md for:

  • Complete dxpy class reference
  • High-level utility functions
  • API method documentation
  • Error handling patterns
  • Common code patterns

5. Configuration and Dependencies

Purpose: Configure app metadata and manage dependencies.

Key Operations:

  • Write dxapp.json with inputs, outputs, and run specs
  • Install system packages (execDepends)
  • Bundle custom tools and resources
  • Use assets for shared dependencies
  • Integrate Docker containers
  • Configure instance types and timeouts

Common Use Cases:

  • Defining app input/output specifications
  • Installing bioinformatics tools (samtools, bwa, etc.)
  • Managing Python package dependencies
  • Using Docker images for complex environments
  • Selecting computational resources

Reference: See references/configuration.md for:

  • Complete dxapp.json specification
  • Dependency management strategies
  • Docker integration patterns
  • Regional and resource configuration
  • Example configurations

Quick Start Examples

Upload and Analyze Data

```python

import dxpy

# Upload input file

input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis

job = dxpy.DXApplet("applet-xxxx").run({

"reads": dxpy.dxlink(input_file.get_id())

})

# Wait for completion

job.wait_on_done()

# Download results

output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]

dxpy.download_dxfile(output_id, "aligned.bam")

```

Search and Download Files

```python

import dxpy

# Find BAM files from a specific experiment

files = dxpy.find_data_objects(

classname="file",

name="*.bam",

properties={"experiment": "exp001"},

project="project-xxxx"

)

# Download each file

for file_result in files:

file_obj = dxpy.DXFile(file_result["id"])

filename = file_obj.describe()["name"]

dxpy.download_dxfile(file_result["id"], filename)

```

Create Simple App

```python

# src/my-app.py

import dxpy

import subprocess

@dxpy.entry_point('main')

def main(input_file, quality_threshold=30):

# Download input

dxpy.download_dxfile(input_file["$dnanexus_link"], "input.fastq")

# Process

subprocess.check_call([

"quality_filter",

"--input", "input.fastq",

"--output", "filtered.fastq",

"--threshold", str(quality_threshold)

])

# Upload output

output_file = dxpy.upload_local_file("filtered.fastq")

return {

"filtered_reads": dxpy.dxlink(output_file)

}

dxpy.run()

```

Workflow Decision Tree

When working with DNAnexus, follow this decision tree:

  1. Need to create a new executable?

- Yes β†’ Use App Development (references/app-development.md)

- No β†’ Continue to step 2

  1. Need to manage files or data?

- Yes β†’ Use Data Operations (references/data-operations.md)

- No β†’ Continue to step 3

  1. Need to run an analysis or workflow?

- Yes β†’ Use Job Execution (references/job-execution.md)

- No β†’ Continue to step 4

  1. Writing Python scripts for automation?

- Yes β†’ Use Python SDK (references/python-sdk.md)

- No β†’ Continue to step 5

  1. Configuring app settings or dependencies?

- Yes β†’ Use Configuration (references/configuration.md)

Often you'll need multiple capabilities together (e.g., app development + configuration, or data operations + job execution).

Installation and Authentication

Install dxpy

```bash

uv pip install dxpy

```

Login to DNAnexus

```bash

dx login

```

This authenticates your session and sets up access to projects and data.

Verify Installation

```bash

dx --version

dx whoami

```

Common Patterns

Pattern 1: Batch Processing

Process multiple files with the same analysis:

```python

# Find all FASTQ files

files = dxpy.find_data_objects(

classname="file",

name="*.fastq",

project="project-xxxx"

)

# Launch parallel jobs

jobs = []

for file_result in files:

job = dxpy.DXApplet("applet-xxxx").run({

"input": dxpy.dxlink(file_result["id"])

})

jobs.append(job)

# Wait for all completions

for job in jobs:

job.wait_on_done()

```

Pattern 2: Multi-Step Pipeline

Chain multiple analyses together:

```python

# Step 1: Quality control

qc_job = qc_applet.run({"reads": input_file})

# Step 2: Alignment (uses QC output)

align_job = align_applet.run({

"reads": qc_job.get_output_ref("filtered_reads")

})

# Step 3: Variant calling (uses alignment output)

variant_job = variant_applet.run({

"bam": align_job.get_output_ref("aligned_bam")

})

```

Pattern 3: Data Organization

Organize analysis results systematically:

```python

# Create organized folder structure

dxpy.api.project_new_folder(

"project-xxxx",

{"folder": "/experiments/exp001/results", "parents": True}

)

# Upload with metadata

result_file = dxpy.upload_local_file(

"results.txt",

project="project-xxxx",

folder="/experiments/exp001/results",

properties={

"experiment": "exp001",

"sample": "sample1",

"analysis_date": "2025-10-20"

},

tags=["validated", "published"]

)

```

Best Practices

  1. Error Handling: Always wrap API calls in try-except blocks
  2. Resource Management: Choose appropriate instance types for workloads
  3. Data Organization: Use consistent folder structures and metadata
  4. Cost Optimization: Archive old data, use appropriate storage classes
  5. Documentation: Include clear descriptions in dxapp.json
  6. Testing: Test apps with various input types before production use
  7. Version Control: Use semantic versioning for apps
  8. Security: Never hardcode credentials in source code
  9. Logging: Include informative log messages for debugging
  10. Cleanup: Remove temporary files and failed jobs

Resources

This skill includes detailed reference documentation:

references/

  • app-development.md - Complete guide to building and deploying apps/applets
  • data-operations.md - File management, records, search, and project operations
  • job-execution.md - Running jobs, workflows, monitoring, and parallel processing
  • python-sdk.md - Comprehensive dxpy library reference with all classes and functions
  • configuration.md - dxapp.json specification and dependency management

Load these references when you need detailed information about specific operations or when working on complex tasks.

Getting Help

  • Official documentation: https://documentation.dnanexus.com/
  • API reference: http://autodoc.dnanexus.com/
  • GitHub repository: https://github.com/dnanexus/dx-toolkit
  • Support: support@dnanexus.com