# gcp-bq-data-export

Skill from funnelenvy/agents_webinar_demos

Exports BigQuery data to Cloud Storage in various formats like CSV, JSON, Avro, and Parquet with flexible compression and export options.

## Installation

npx skills add https://github.com/funnelenvy/agents_webinar_demos --skill gcp-bq-data-export

## Skill Details

Use when exporting BigQuery data to Cloud Storage, extracting tables to CSV, JSON, Avro, or Parquet formats, or using EXPORT DATA statements. Covers the bq extract command, format options, compression, and wildcard exports.

## Overview
# BigQuery Data Export
Use this skill when exporting data from BigQuery to Cloud Storage or local files.
## Basic Extract Command
```bash
bq extract \
--location=LOCATION \
--destination_format=FORMAT \
--compression=COMPRESSION \
PROJECT:DATASET.TABLE \
gs://bucket/file.ext
```
## Extract to CSV

### Basic CSV Export
```bash
bq extract \
--destination_format=CSV \
--print_header=true \
dataset.table \
gs://bucket/export.csv
```
### CSV with Options
```bash
bq extract \
--destination_format=CSV \
--compression=GZIP \
--field_delimiter=',' \
--print_header=true \
dataset.table \
gs://bucket/export.csv.gz
```
CSV flags:

- `--field_delimiter=','` - Column separator
- `--print_header=true/false` - Include header row

### Tab-Delimited Export
```bash
bq extract \
--destination_format=CSV \
--field_delimiter=$'\t' \
dataset.table \
gs://bucket/export.tsv
```
## Extract to JSON

### Newline-Delimited JSON
```bash
bq extract \
--destination_format=NEWLINE_DELIMITED_JSON \
--compression=GZIP \
dataset.table \
gs://bucket/export.json.gz
```
Output format:
```json
{"id": 1, "name": "Alice", "amount": 100.50}
{"id": 2, "name": "Bob", "amount": 250.75}
```
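Because each line is a standalone JSON object, the export can be streamed through line-oriented tools. A minimal sketch, assuming `jq` is installed; the bucket, file, and field names are placeholders:

```bash
# Stream the compressed NDJSON export and pull one field per record.
gsutil cat gs://bucket/export.json.gz | gunzip | jq -r '.name'
```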
## Extract to Avro
```bash
bq extract \
--destination_format=AVRO \
--compression=SNAPPY \
dataset.table \
gs://bucket/export.avro
```
Benefits:
- Preserves schema
- Efficient binary format
- Fast re-import to BigQuery
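Because the schema travels with the Avro files, they can be loaded back without supplying a schema. A minimal sketch; the table and bucket names are placeholders:

```bash
# Re-import the Avro export into a (new) table; the schema is read from the files.
bq load \
  --source_format=AVRO \
  dataset.restored_table \
  'gs://bucket/export.avro'
```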
## Extract to Parquet
```bash
bq extract \
--destination_format=PARQUET \
dataset.table \
gs://bucket/export.parquet
```
Benefits:
- Columnar format
- Good compression
- Compatible with many analytics tools
## Compression Options

Available compression:

- `GZIP` - Good compression, slower (CSV, JSON, Avro)
- `SNAPPY` - Fast, moderate compression (Avro, Parquet)
- `DEFLATE` - Similar to GZIP (Avro)
- `NONE` - No compression (fastest)

Example:
```bash
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table \
gs://bucket/export.csv.gz
```
## Large Table Exports

### Using Wildcards (>1 GB)
BigQuery limitation: 1 GB per file
Solution: Use wildcard in destination
```bash
bq extract \
--destination_format=CSV \
dataset.large_table \
'gs://bucket/export-*.csv'
```
Output:
```
gs://bucket/export-000000000000.csv
gs://bucket/export-000000000001.csv
gs://bucket/export-000000000002.csv
...
```
### Shard Pattern
```bash
# Create sharded exports
bq extract \
--destination_format=AVRO \
dataset.large_table \
'gs://bucket/shard/data-*.avro'
```
Note: The number of output files depends on the data size and is not configurable.
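To inspect or collect the shards afterwards, gsutil wildcards work directly against the same pattern. A sketch with placeholder paths:

```bash
# List the shards, then copy them down in parallel.
gsutil ls 'gs://bucket/shard/data-*.avro'
mkdir -p ./shards
gsutil -m cp 'gs://bucket/shard/data-*.avro' ./shards/
```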
## Export Specific Partitions

### Single Partition
```bash
# Export 2024-01-15 partition only
bq extract \
--destination_format=CSV \
dataset.partitioned_table\$20240115 \
gs://bucket/export_20240115.csv
```
### Date Range (use WHERE in EXPORT DATA)
See EXPORT DATA section below.
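Alternatively, for a small known set of days, you can loop over partition decorators with bq extract. A sketch; the table, bucket, and dates are placeholders:

```bash
# Export a few daily partitions one by one using partition decorators.
for d in 20240115 20240116 20240117; do
  bq extract \
    --destination_format=CSV \
    "dataset.partitioned_table\$${d}" \
    "gs://bucket/export_${d}.csv"
done
```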
## EXPORT DATA SQL Statement

### Basic EXPORT DATA
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/export-*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=','
) AS
SELECT * FROM project.dataset.table
WHERE date >= '2024-01-01';
```
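EXPORT DATA is ordinary GoogleSQL, so the same statement can also be submitted from the command line with bq query. A sketch with placeholder project, dataset, and bucket names:

```bash
# Submit the EXPORT DATA statement as a standard-SQL query job.
bq query --use_legacy_sql=false "
EXPORT DATA OPTIONS(
  uri='gs://bucket/export-*.csv',
  format='CSV',
  overwrite=true,
  header=true
) AS
SELECT * FROM project.dataset.table
WHERE date >= '2024-01-01'"
```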
### Export Query Results
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/aggregated-*.parquet',
format='PARQUET',
overwrite=true
) AS
SELECT
customer_id,
DATE(order_timestamp) as order_date,
SUM(amount) as total_amount,
COUNT(*) as order_count
FROM project.dataset.orders
WHERE DATE(order_timestamp) >= '2024-01-01'
GROUP BY customer_id, order_date;
```
### Format Options
CSV:
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/*.csv',
format='CSV',
header=true,
field_delimiter=',',
compression='GZIP'
) AS SELECT ...;
```
JSON:
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/*.json',
format='JSON',
compression='GZIP'
) AS SELECT ...;
```
Avro:
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/*.avro',
format='AVRO',
compression='SNAPPY'
) AS SELECT ...;
```
Parquet:
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/*.parquet',
format='PARQUET'
) AS SELECT ...;
```
## Export to Local Files (Not Recommended)

### Small Results via Query
```bash
# For small datasets only
bq query \
--format=csv \
--max_rows=10000 \
--use_legacy_sql=false \
'SELECT * FROM project.dataset.table LIMIT 10000' \
> local_export.csv
```
Limitation: Not suitable for large datasets. Use GCS for production.
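For anything larger, the usual pattern is to extract to GCS first and then copy the files down. A sketch with placeholder names:

```bash
# Export to GCS, then pull the shards down with gsutil.
bq extract --destination_format=CSV dataset.table 'gs://bucket/export-*.csv'
mkdir -p ./exports
gsutil -m cp 'gs://bucket/export-*.csv' ./exports/
```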
## Scheduled Exports (Automation)

### Using Scheduled Queries + EXPORT DATA
```bash
# Create scheduled query
bq mk --transfer_config \
--target_dataset=dataset \
--display_name='Daily Export' \
--schedule='every 24 hours' \
--params='{"query":"EXPORT DATA OPTIONS(uri='\''gs://bucket/daily-*.csv'\'', format='\''CSV'\'') AS SELECT * FROM dataset.table WHERE date = CURRENT_DATE()"}' \
--data_source=scheduled_query
```
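To confirm the scheduled query was created, list the transfer configs (scheduled queries are transfer configs); the `us` location here is an assumption and must match where the config lives:

```bash
# List transfer configs, which include scheduled queries.
bq ls --transfer_config --transfer_location=us
```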
### Using Cloud Composer (Airflow)
```python
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

export_task = BigQueryInsertJobOperator(
    task_id='export_to_gcs',
    configuration={
        'extract': {
            'sourceTable': {
                'projectId': 'project',
                'datasetId': 'dataset',
                'tableId': 'table'
            },
            'destinationUris': ['gs://bucket/export-*.csv'],
            'destinationFormat': 'CSV'
        }
    }
)
```
## Monitoring Exports

### Check Extract Jobs
```bash
bq ls --jobs --max_results=10
```
### Job Details
```bash
bq show -j JOB_ID
```
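To block until an extract job finishes (for example in a shell script), `bq wait` polls the job; JOB_ID is a placeholder:

```bash
# Wait up to 600 seconds for the job to reach the DONE state.
bq wait JOB_ID 600
```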
### Failed Exports
```sql
SELECT
job_id,
user_email,
error_result.message as error_message,
creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
job_type = 'EXTRACT'
AND state = 'DONE'
AND error_result IS NOT NULL
AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY creation_time DESC;
```
## Export Best Practices

### Format Selection

CSV:
- ✅ Human-readable
- ✅ Universal compatibility
- ❌ Larger file size
- ❌ No schema preservation

JSON:
- ✅ Human-readable
- ✅ Preserves nested structures
- ❌ Larger file size

Avro:
- ✅ Preserves schema
- ✅ Efficient binary format
- ✅ Fast BigQuery re-import
- ❌ Not human-readable

Parquet:
- ✅ Columnar format
- ✅ Good compression
- ✅ Analytics tool compatible
- ❌ Not human-readable
### Compression Recommendations

- For long-term storage: GZIP (best compression)
- For processing pipelines: SNAPPY (fast)
- For network transfer: GZIP (smaller size)
- For speed: NONE (no compression overhead)
### Wildcards for Large Exports
Always use wildcards for:
- Tables >500 MB
- Unknown data size
- Distributed processing
Example:
```bash
bq extract --destination_format=AVRO dataset.large_table 'gs://bucket/export-*.avro'
```
## Cost Considerations

### Export Costs
- BigQuery extract: FREE
- GCS storage: Standard GCS pricing
- Network egress: Free within same region
### Optimization
Reduce costs:
- Export only needed columns (use EXPORT DATA with SELECT)
- Filter rows before export (WHERE clause)
- Use compression (smaller files)
- Export to GCS in same region as BigQuery
Example - filtered export:
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/*.parquet',
format='PARQUET'
) AS
SELECT customer_id, order_date, amount -- Only needed columns
FROM project.dataset.orders
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Last 30 days only
AND amount > 0; -- Filter out zero amounts
```
## Common Patterns

### Daily Export
```bash
#!/bin/bash
DATE=$(date +%Y%m%d)
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table\$$DATE \
gs://bucket/exports/daily_export_$DATE.csv.gz
```
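One way to run this on a schedule is a plain cron entry on a VM or workstation; the script path and log path below are placeholders:

```bash
# crontab entry: run the daily export at 02:00 and append output to a log.
0 2 * * * /opt/scripts/daily_bq_export.sh >> /var/log/bq_export.log 2>&1
```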
### Incremental Export
```sql
-- Create temp table with new data
CREATE TEMP TABLE new_data AS
SELECT * FROM project.dataset.table
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
-- Export only new data
EXPORT DATA OPTIONS(
uri='gs://bucket/incremental/data-*.parquet',
format='PARQUET'
) AS
SELECT * FROM new_data;
```
### Export with Transformation
```sql
EXPORT DATA OPTIONS(
uri='gs://bucket/transformed-*.csv',
format='CSV'
) AS
SELECT
customer_id,
UPPER(customer_name) as customer_name,
ROUND(amount, 2) as amount,
FORMAT_DATE('%Y-%m-%d', order_date) as order_date
FROM project.dataset.orders
WHERE order_date >= '2024-01-01';
```
## Troubleshooting

### "Permission denied"
Problem: No write access to GCS bucket
Solution: Grant BigQuery service account Storage Object Creator role
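One way to grant that role at the bucket level; SA_EMAIL and the bucket name are placeholders, and the member should be whichever identity actually runs the export job:

```bash
# Grant Storage Object Creator on the destination bucket.
gsutil iam ch \
  serviceAccount:SA_EMAIL:roles/storage.objectCreator \
  gs://bucket
```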
"Table too large"
Problem: Export exceeds 1GB without wildcard
Solution: Use wildcard pattern gs://bucket/export-*.csv
"Invalid URI"
Problem: Incorrect GCS path format
Solution: Use gs://bucket/path/file format, not https://
"Quota exceeded"
Problem: Too many extract jobs
Solution: Batch exports or increase quota
## Quick Reference

Format recommendations:

- Re-import to BigQuery → Avro
- Analytics tools → Parquet
- Data exchange → CSV
- API consumption → JSON

Compression guide:

- Best ratio → GZIP
- Fastest → SNAPPY or NONE
- Balance → SNAPPY
Size limits:
- 1 GB per file (use wildcards)
- 10 TB per extract job
- 50,000 URIs per export
Syntax patterns:
```bash
# Single file
gs://bucket/file.csv
# Wildcard (recommended)
'gs://bucket/prefix-*.csv'
# Sharded with path
'gs://bucket/path/to/shard-*.parquet'
```