gcp-bq-data-export

Skill from funnelenvy/agents_webinar_demos

What it does

Exports BigQuery data to Cloud Storage in various formats like CSV, JSON, Avro, and Parquet with flexible compression and export options.

Installation

Install skill:
npx skills add https://github.com/funnelenvy/agents_webinar_demos --skill gcp-bq-data-export

Last Updated: Jan 16, 2026

Skill Details

SKILL.md

Use when exporting BigQuery data to Cloud Storage, extracting tables to CSV, JSON, Avro, or Parquet formats, or using EXPORT DATA statements. Covers the bq extract command, format options, compression, and wildcard exports.

Overview

# BigQuery Data Export

Use this skill when exporting data from BigQuery to Cloud Storage or local files.

Basic Extract Command

```bash
bq extract \
  --location=LOCATION \
  --destination_format=FORMAT \
  --compression=COMPRESSION \
  PROJECT:DATASET.TABLE \
  gs://bucket/file.ext
```
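
A filled-in example may help; the project, dataset, and bucket names below are placeholders:

```bash
# Export one table to a gzipped CSV in Cloud Storage
bq extract \
  --location=US \
  --destination_format=CSV \
  --compression=GZIP \
  my-project:sales.orders \
  gs://my-bucket/orders.csv.gz
```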

Extract to CSV

Basic CSV Export

```bash
bq extract \
  --destination_format=CSV \
  --print_header=true \
  dataset.table \
  gs://bucket/export.csv
```

CSV with Options

```bash
bq extract \
  --destination_format=CSV \
  --compression=GZIP \
  --field_delimiter=',' \
  --print_header=true \
  dataset.table \
  gs://bucket/export.csv.gz
```

CSV flags:

  • --field_delimiter=',' - Column separator
  • --print_header=true/false - Include header row
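
For machine-to-machine pipelines you often want no header row and a non-comma separator; a minimal variant of the command above:

```bash
# Pipe-delimited export without a header row
bq extract \
  --destination_format=CSV \
  --field_delimiter='|' \
  --print_header=false \
  dataset.table \
  gs://bucket/export.psv
```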

Tab-Delimited Export

```bash
bq extract \
  --destination_format=CSV \
  --field_delimiter=$'\t' \
  dataset.table \
  gs://bucket/export.tsv
```

Extract to JSON

Newline-Delimited JSON

```bash
bq extract \
  --destination_format=NEWLINE_DELIMITED_JSON \
  --compression=GZIP \
  dataset.table \
  gs://bucket/export.json.gz
```

Output format:

```json
{"id": 1, "name": "Alice", "amount": 100.50}
{"id": 2, "name": "Bob", "amount": 250.75}
```

Extract to Avro

```bash
bq extract \
  --destination_format=AVRO \
  --compression=SNAPPY \
  dataset.table \
  gs://bucket/export.avro
```

Benefits:

  • Preserves schema
  • Efficient binary format
  • Fast re-import to BigQuery
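
Because Avro files carry their schema, re-import needs no schema definition; a sketch of the round trip (table and bucket names assumed):

```bash
# Recreate a table from an Avro export; the schema travels with the file
bq load \
  --source_format=AVRO \
  dataset.restored_table \
  gs://bucket/export.avro
```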

Extract to Parquet

```bash
bq extract \
  --destination_format=PARQUET \
  dataset.table \
  gs://bucket/export.parquet
```

Benefits:

  • Columnar format
  • Good compression
  • Compatible with many analytics tools

Compression Options

Available compression:

  • GZIP - Good compression, slower (CSV, JSON, Avro)
  • SNAPPY - Fast, moderate compression (Avro, Parquet)
  • DEFLATE - Similar to GZIP (Avro)
  • NONE - No compression (fastest)

Example:

```bash
bq extract \
  --destination_format=CSV \
  --compression=GZIP \
  dataset.table \
  gs://bucket/export.csv.gz
```

Large Table Exports

Using Wildcards (>1 GB)

BigQuery limitation: 1 GB per file

Solution: Use a wildcard in the destination URI

```bash
bq extract \
  --destination_format=CSV \
  dataset.large_table \
  'gs://bucket/export-*.csv'
```

Output:

```
gs://bucket/export-000000000000.csv
gs://bucket/export-000000000001.csv
gs://bucket/export-000000000002.csv
...
```

Shard Pattern

```bash
# Create sharded exports
bq extract \
  --destination_format=AVRO \
  dataset.large_table \
  'gs://bucket/shard/data-*.avro'
```

Note: the number of files depends on data size and is not directly configurable.
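
If a downstream consumer needs a single object, shards can be merged after the export; a sketch using gsutil compose (limited to 32 source objects per call, and only sensible for formats that concatenate cleanly, such as headerless CSV):

```bash
# Merge CSV shards into one object (max 32 components per compose call)
gsutil compose 'gs://bucket/shard/data-*.csv' gs://bucket/merged/data.csv
```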

Export Specific Partitions

Single Partition

```bash
# Export 2024-01-15 partition only
bq extract \
  --destination_format=CSV \
  dataset.partitioned_table\$20240115 \
  gs://bucket/export_20240115.csv
```

Date Range (use WHERE in EXPORT DATA)

Filter with a WHERE clause in an EXPORT DATA statement; the full option set is covered in the EXPORT DATA section below.
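
For reference, a date-range export run through the CLI; the table, bucket, and dates are illustrative:

```bash
# EXPORT DATA with a WHERE filter covering a two-week range
bq query --use_legacy_sql=false '
EXPORT DATA OPTIONS(
  uri="gs://bucket/range-*.csv",
  format="CSV",
  overwrite=true
) AS
SELECT * FROM dataset.partitioned_table
WHERE date BETWEEN "2024-01-01" AND "2024-01-14"'
```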

EXPORT DATA SQL Statement

Basic EXPORT DATA

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/export-*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=','
) AS
SELECT * FROM project.dataset.table
WHERE date >= '2024-01-01';
```

Export Query Results

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/aggregated-*.parquet',
  format='PARQUET',
  overwrite=true
) AS
SELECT
  customer_id,
  DATE(order_timestamp) AS order_date,
  SUM(amount) AS total_amount,
  COUNT(*) AS order_count
FROM project.dataset.orders
WHERE DATE(order_timestamp) >= '2024-01-01'
GROUP BY customer_id, order_date;
```

Format Options

CSV:

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/*.csv',
  format='CSV',
  header=true,
  field_delimiter=',',
  compression='GZIP'
) AS SELECT ...;
```

JSON:

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/*.json',
  format='JSON',
  compression='GZIP'
) AS SELECT ...;
```

Avro:

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/*.avro',
  format='AVRO',
  compression='SNAPPY'
) AS SELECT ...;
```

Parquet:

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/*.parquet',
  format='PARQUET'
) AS SELECT ...;
```

Export to Local Files (Not Recommended)

Small Results via Query

```bash
# For small datasets only
bq query \
  --format=csv \
  --max_rows=10000 \
  --use_legacy_sql=false \
  'SELECT * FROM project.dataset.table LIMIT 10000' \
  > local_export.csv
```

Limitation: Not suitable for large datasets. Use GCS for production.
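
For anything larger, export to GCS and copy the shards down afterwards; a sketch (bucket and paths assumed):

```bash
# Export to GCS, then pull the shards to the local machine in parallel
bq extract --destination_format=CSV dataset.table 'gs://bucket/export-*.csv'
mkdir -p exports
gsutil -m cp 'gs://bucket/export-*.csv' ./exports/
```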

Scheduled Exports (Automation)

Using Scheduled Queries + EXPORT DATA

```bash
# Create scheduled query
bq mk --transfer_config \
  --target_dataset=dataset \
  --display_name='Daily Export' \
  --schedule='every 24 hours' \
  --params='{"query":"EXPORT DATA OPTIONS(uri='\''gs://bucket/daily-*.csv'\'', format='\''CSV'\'') AS SELECT * FROM dataset.table WHERE date = CURRENT_DATE()"}' \
  --data_source=scheduled_query
```

Using Cloud Composer (Airflow)

```python
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

export_task = BigQueryInsertJobOperator(
    task_id='export_to_gcs',
    configuration={
        'extract': {
            'sourceTable': {
                'projectId': 'project',
                'datasetId': 'dataset',
                'tableId': 'table'
            },
            'destinationUris': ['gs://bucket/export-*.csv'],
            'destinationFormat': 'CSV'
        }
    }
)
```

Monitoring Exports

Check Extract Jobs

```bash
bq ls --jobs --max_results=10
```

Job Details

```bash
bq show -j JOB_ID
```
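
In scripts, an export can be launched without blocking and waited on later; a sketch using the global --nosync flag and bq wait (JOB_ID is the ID bq prints when the job is launched):

```bash
# Start the export without blocking; bq prints the job ID
bq --nosync extract --destination_format=CSV dataset.table 'gs://bucket/export-*.csv'

# Block until that job completes; --fail_on_error turns job failure into a non-zero exit
bq wait --fail_on_error JOB_ID
```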

Failed Exports

```sql
SELECT
  job_id,
  user_email,
  error_result.message AS error_message,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = 'EXTRACT'
  AND state = 'DONE'
  AND error_result IS NOT NULL
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY creation_time DESC;
```

Export Best Practices

Format Selection

CSV:

  • βœ… Human-readable
  • βœ… Universal compatibility
  • ❌ Larger file size
  • ❌ No schema preservation

JSON:

  • βœ… Human-readable
  • βœ… Preserves nested structures
  • ❌ Larger file size

Avro:

  • βœ… Preserves schema
  • βœ… Efficient binary format
  • βœ… Fast BigQuery re-import
  • ❌ Not human-readable

Parquet:

  • βœ… Columnar format
  • βœ… Good compression
  • βœ… Analytics tool compatible
  • ❌ Not human-readable

Compression Recommendations

  • For long-term storage: GZIP (best compression)
  • For processing pipelines: SNAPPY (fast)
  • For network transfer: GZIP (smaller size)
  • For speed: NONE (no compression overhead)

Wildcards for Large Exports

Always use wildcards for:

  • Tables >500 MB
  • Unknown data size
  • Distributed processing

Example:

```bash
bq extract --destination_format=AVRO dataset.large_table 'gs://bucket/export-*.avro'
```

Cost Considerations

Export Costs

  • BigQuery extract: FREE
  • GCS storage: Standard GCS pricing
  • Network egress: Free within same region

Optimization

Reduce costs:

  • Export only needed columns (use EXPORT DATA with SELECT)
  • Filter rows before export (WHERE clause)
  • Use compression (smaller files)
  • Export to GCS in same region as BigQuery

Example - filtered export:

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/*.parquet',
  format='PARQUET'
) AS
SELECT customer_id, order_date, amount  -- only the needed columns
FROM project.dataset.orders
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)  -- last 30 days only
  AND amount > 0;  -- filter out zero amounts
```

Common Patterns

Daily Export

```bash
#!/bin/bash
DATE=$(date +%Y%m%d)

bq extract \
  --destination_format=CSV \
  --compression=GZIP \
  dataset.table\$$DATE \
  gs://bucket/exports/daily_export_$DATE.csv.gz
```

Incremental Export

```sql
-- Create temp table with new data
CREATE TEMP TABLE new_data AS
SELECT * FROM project.dataset.table
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

-- Export only new data
EXPORT DATA OPTIONS(
  uri='gs://bucket/incremental/data-*.parquet',
  format='PARQUET'
) AS
SELECT * FROM new_data;
```

Export with Transformation

```sql
EXPORT DATA OPTIONS(
  uri='gs://bucket/transformed-*.csv',
  format='CSV'
) AS
SELECT
  customer_id,
  UPPER(customer_name) AS customer_name,
  ROUND(amount, 2) AS amount,
  FORMAT_DATE('%Y-%m-%d', order_date) AS order_date
FROM project.dataset.orders
WHERE order_date >= '2024-01-01';
```

Troubleshooting

"Permission denied"

Problem: No write access to GCS bucket

Solution: Grant the principal running the export (user or service account) the Storage Object Creator role on the bucket
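
A sketch of that grant with gcloud; the bucket and member are placeholders (for bq extract it is the principal running the job, for scheduled exports the relevant service account):

```bash
# Allow the exporting principal to create objects in the bucket
gcloud storage buckets add-iam-policy-binding gs://bucket \
  --member='user:analyst@example.com' \
  --role='roles/storage.objectCreator'
```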

"Table too large"

Problem: Export exceeds 1 GB per file without a wildcard URI

Solution: Use wildcard pattern gs://bucket/export-*.csv

"Invalid URI"

Problem: Incorrect GCS path format

Solution: Use gs://bucket/path/file format, not https://

"Quota exceeded"

Problem: Too many extract jobs

Solution: Batch exports or increase quota

Quick Reference

Format recommendations:

  • Re-import to BigQuery β†’ Avro
  • Analytics tools β†’ Parquet
  • Data exchange β†’ CSV
  • API consumption β†’ JSON

Compression guide:

  • Best ratio β†’ GZIP
  • Fastest β†’ SNAPPY or NONE
  • Balance β†’ SNAPPY

Size limits:

  • 1 GB per file (use wildcards)
  • 10 TB per extract job
  • 50,000 URIs per export
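
To decide in advance whether a wildcard is needed, check the table size first; a sketch parsing bq show's JSON output (assumes python3 on the PATH):

```bash
# Print the table size in bytes; near or above 1 GB means use a wildcard URI
bq show --format=prettyjson dataset.table \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["numBytes"])'
```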

Syntax patterns:

```bash
# Single file
gs://bucket/file.csv

# Wildcard (recommended)
'gs://bucket/prefix-*.csv'

# Sharded with path
'gs://bucket/path/to/shard-*.parquet'
```