
🎯 Skills (21)
Manages and troubleshoots Apache Airflow workflows by listing DAGs, testing pipelines, running tasks, and monitoring system health.
Guides developers in creating robust Apache Airflow DAGs using best practices and MCP tools.
Traces data origins by investigating DAGs, source tables, and external systems to map the complete upstream lineage of a data asset.
Systematically diagnoses Airflow DAG failures by performing deep root cause analysis, identifying error sources, and providing structured prevention recommendations.
Queries data warehouses to answer business questions by executing SQL, finding tables, and retrieving precise metrics and trends.
Traces downstream data dependencies to reveal potential impacts and risks when modifying tables or data pipelines.
Triggers and monitors Airflow DAG runs, automatically waiting for completion and providing immediate feedback on success or failure.
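Triggering a run and blocking until it finishes reduces to one POST and a polling loop against the stable Airflow REST API. A minimal sketch, assuming the API's basic-auth backend is enabled; the base URL, credentials, and DAG ID are placeholders:

```python
import time

import requests

BASE = "http://localhost:8080/api/v1"  # placeholder webserver URL
AUTH = ("admin", "admin")              # placeholder credentials


def trigger_and_wait(dag_id: str, poll_seconds: int = 10) -> str:
    """Trigger a DAG run and block until it reaches a terminal state."""
    # An empty body lets the API default the logical date to now.
    run = requests.post(f"{BASE}/dags/{dag_id}/dagRuns", json={}, auth=AUTH)
    run.raise_for_status()
    run_id = run.json()["dag_run_id"]
    while True:
        state = requests.get(
            f"{BASE}/dags/{dag_id}/dagRuns/{run_id}", auth=AUTH
        ).json()["state"]
        if state in ("success", "failed"):  # terminal states
            return state
        time.sleep(poll_seconds)


print(trigger_and_wait("example_dag"))  # placeholder DAG ID
```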
Guides users through migrating Apache Airflow 2.x projects to Airflow 3.x, addressing code changes, imports, operators, and compatibility issues.
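Two of the most common mechanical changes in a 2.x to 3.x migration are import paths and the DAG schedule argument. A sketch of the 3.x form, with the 2.x equivalents noted in comments (version-specific details may vary by release):

```python
from datetime import datetime

from airflow import DAG
# Airflow 3.x: core operators moved into the standard provider
# (in 2.x this import was airflow.operators.bash).
from airflow.providers.standard.operators.bash import BashOperator

with DAG(
    dag_id="etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # 2.x's schedule_interval= was consolidated into schedule=
):
    BashOperator(task_id="extract", bash_command="echo extract")
```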
Verifies data freshness by identifying timestamp columns, checking last update times, and assessing data currency across tables.
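Such a freshness check usually reduces to taking the maximum of a timestamp column and comparing it to now. A rough sketch, assuming a PostgreSQL-compatible warehouse, a timezone-aware timestamp column, and a hypothetical `updated_at` name:

```python
from datetime import datetime, timezone

import psycopg2  # assumption: a PostgreSQL-compatible warehouse


def hours_since_last_update(conn, table: str, ts_column: str = "updated_at") -> float:
    """Return how many hours ago the table was last written to."""
    # Identifiers are interpolated into SQL, so they must come from a
    # trusted source (e.g. information_schema), not user input.
    with conn.cursor() as cur:
        cur.execute(f"SELECT MAX({ts_column}) FROM {table}")
        last_update = cur.fetchone()[0]
    if last_update is None:
        return float("inf")  # empty table: treat as maximally stale
    return (datetime.now(timezone.utc) - last_update).total_seconds() / 3600


conn = psycopg2.connect("dbname=analytics")  # placeholder DSN
print(f"orders last updated {hours_since_last_update(conn, 'orders'):.1f}h ago")
```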
Manages a local Airflow environment using the Astro CLI, enabling start/stop, log viewing, container troubleshooting, and environment control.
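The underlying commands are `astro dev start`, `astro dev ps`, `astro dev logs`, and `astro dev stop`. A sketch of driving them programmatically (output formats and flags may vary by CLI version):

```python
import subprocess


def astro(*args: str) -> str:
    """Run an Astro CLI command and return its stdout."""
    result = subprocess.run(
        ["astro", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


astro("dev", "start")        # spin up the local Airflow containers
print(astro("dev", "ps"))    # list the running containers
print(astro("dev", "logs"))  # dump recent logs from the components
astro("dev", "stop")         # tear the environment down
```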
Profiles a database table by extracting comprehensive metadata, statistical insights, column characteristics, cardinality analysis, and sample data for quick data understanding.
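Profiling of this sort boils down to a handful of aggregate queries per column. A sketch of the cardinality piece against any DB-API connection (table and column names are illustrative and must come from a trusted source, since they are interpolated into SQL):

```python
def profile_column(conn, table: str, column: str) -> dict:
    """Row count, distinct count, and null fraction for one column."""
    with conn.cursor() as cur:
        cur.execute(
            f"SELECT COUNT(*), COUNT(DISTINCT {column}), "
            f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
            f"FROM {table}"
        )
        rows, distinct, nulls = cur.fetchone()
    return {
        "rows": rows,
        "distinct": distinct,
        "null_fraction": nulls / rows if rows else 0.0,
        "is_unique": distinct == rows,  # candidate-key check
    }
```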
Initializes and configures Astro/Airflow projects with CLI commands, creating project structure, managing dependencies, and setting up connections.
Initializes warehouse schema discovery by generating a comprehensive .astro/warehouse.md file with table metadata for instant data lookups.
Annotates Airflow tasks with data lineage by specifying input and output datasets using inlets and outlets for operators without built-in OpenLineage extraction.
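In Airflow 2.4+ the annotation uses Dataset objects, which the OpenLineage provider reads when it has no extractor for the operator itself (in Airflow 3.x, Dataset was renamed Asset). A minimal sketch with placeholder URIs:

```python
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.bash import BashOperator

with DAG(dag_id="orders_etl", start_date=datetime(2024, 1, 1), schedule=None):
    # Manual lineage: the OpenLineage provider falls back to inlets/outlets
    # when it cannot introspect the operator's work itself.
    BashOperator(
        task_id="transform_orders",
        bash_command="python /scripts/transform_orders.py",  # placeholder
        inlets=[Dataset("s3://raw-bucket/orders/")],    # upstream dataset
        outlets=[Dataset("s3://clean-bucket/orders/")],  # downstream dataset
    )
```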
Captures lineage details for unsupported or third-party Airflow operators by creating custom OpenLineage extractors.
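A rough sketch of the shape such an extractor takes; the operator class, its attributes, and the dataset namespace are illustrative, and exact import paths may vary by provider and client version:

```python
from airflow.providers.openlineage.extractors import BaseExtractor, OperatorLineage
from openlineage.client.run import Dataset  # import path may differ by client version


class MyVendorExtractor(BaseExtractor):
    """Illustrative extractor for a hypothetical third-party operator."""

    @classmethod
    def get_operator_classnames(cls) -> list[str]:
        # Operator class names this extractor handles.
        return ["MyVendorCopyOperator"]

    def extract(self) -> OperatorLineage:
        # self.operator is the task's operator instance; pull source and
        # target from its (hypothetical) attributes.
        return OperatorLineage(
            inputs=[Dataset(namespace="vendor://prod", name=self.operator.source_table)],
            outputs=[Dataset(namespace="vendor://prod", name=self.operator.target_table)],
        )


# Registered via Airflow config, e.g.
#   AIRFLOW__OPENLINEAGE__EXTRACTORS=my_pkg.extractors.MyVendorExtractor
```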
Enables human-in-the-loop validation and intervention for Airflow DAG runs through interactive approval workflows.
Executes dbt Core commands and manages dbt project workflows within the Astronomer Cosmos framework for data transformation tasks.
Integrates dbt transformations with Airflow orchestration via Cosmos to streamline data pipeline workflows.
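With astronomer-cosmos, a dbt project renders directly as an Airflow DAG. A minimal sketch; the project path, profile names, and profiles.yml location are placeholders, and config classes may differ across Cosmos versions:

```python
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

jaffle_shop = DbtDag(
    dag_id="jaffle_shop",
    # Path to the dbt project inside the Airflow image (placeholder).
    project_config=ProjectConfig("/usr/local/airflow/dbt/jaffle_shop"),
    profile_config=ProfileConfig(
        profile_name="jaffle_shop",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dbt/profiles.yml",  # placeholder
    ),
    schedule=None,
    start_date=datetime(2024, 1, 1),
)
```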
Initializes and configures data warehouse infrastructure with best practices for schema design, access controls, and performance optimization.
Sets up initial database connection parameters and credentials for connecting an Airflow project to a data warehouse such as Snowflake, BigQuery, or Redshift.
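Airflow resolves connections from `AIRFLOW_CONN_<CONN_ID>` environment variables holding a connection URI, which in an Astro project usually lives in `.env`. A sketch of the mechanism; the account, user, password, and query parameters below are placeholders, and the exact URI fields expected depend on the provider's connection docs:

```python
import os
from urllib.parse import quote

# Airflow reads connections from AIRFLOW_CONN_<CONN_ID> env vars as URIs.
password = quote("s3cr3t!", safe="")  # URL-encode special characters
os.environ["AIRFLOW_CONN_SNOWFLAKE_DEFAULT"] = (
    f"snowflake://analyst:{password}@/"
    "?account=my_account&warehouse=compute_wh&database=analytics&role=reporter"
)
```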
Helps data engineers discover and explore data sources, schemas, and metadata across different warehouses and databases by automatically scanning and profiling available data resources.