bright-data

A skill from vm0-ai/vm0-skills

What it does

Enables web scraping of social media platforms such as Twitter/X, Reddit, YouTube, and Instagram through Bright Data's Web Scraper API.

Installation

Add the marketplace to Claude Code:

/plugin marketplace add vm0-ai/vm0-skills

Install the plugin from the marketplace:

/plugin install bright-data@vm0-skills

Or clone the repository:

git clone https://github.com/vm0-ai/vm0-skills.git

Skill Details

SKILL.md

Bright Data Web Scraper API via curl. Use this skill for scraping social media (Twitter/X, Reddit, YouTube, Instagram, TikTok), account management, and usage monitoring.

Overview

# Bright Data Web Scraper API

Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.

> Official docs: https://docs.brightdata.com/

---

When to Use

Use this skill when you need to:

  • Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
  • Extract web data - Posts, profiles, comments, engagement metrics
  • Monitor usage - Track bandwidth and request usage
  • Manage account - Check status and zones

---

Prerequisites

  1. Sign up at [Bright Data](https://brightdata.com/)
  2. Get your API key from [Settings > Users](https://brightdata.com/cp/setting/users)
  3. Create a Web Scraper dataset in the [Control Panel](https://brightdata.com/cp/datasets) to get your dataset_id

Export your API key so the examples below can use it:

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```

Base URL

```
https://api.brightdata.com
```

---

> Important: When using $VAR in a command that pipes to another command, wrap the entire pipeline in bash -c '...'. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
>
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```

---

Social Media Scraping

Bright Data supports scraping these social media platforms:

| Platform  | Profiles | Posts | Comments | Reels/Videos |
|-----------|----------|-------|----------|--------------|
| Twitter/X | ✅       | ✅    | -        | -            |
| Reddit    | -        | ✅    | ✅       | -            |
| YouTube   | ✅       | ✅    | ✅       | -            |
| Instagram | ✅       | ✅    | ✅       | ✅           |
| TikTok    | ✅       | ✅    | ✅       | -            |
| LinkedIn  | ✅       | ✅    | -        | -            |

---

How to Use

1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a snapshot_id for later retrieval.

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
```

---

2. Trigger Scraping (Synchronous)

Get results immediately in the response (for small requests).

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

---

3. Monitor Progress

Check the status of a scraping job (replace YOUR_SNAPSHOT_ID with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/YOUR_SNAPSHOT_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: running, ready, failed
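
To block until the job completes, here is a minimal polling sketch (it assumes SNAPSHOT_ID has been exported with the id returned by /trigger; the 10-second interval is an arbitrary choice):

```bash
# Poll /progress until the snapshot is ready; exit non-zero if the job fails.
# Assumes BRIGHTDATA_API_KEY and SNAPSHOT_ID are exported.
bash -c 'while true; do
  STATUS=$(curl -s "https://api.brightdata.com/datasets/v3/progress/${SNAPSHOT_ID}" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" | jq -r ".status")
  echo "status: ${STATUS}"
  [ "${STATUS}" = "ready" ] && break
  [ "${STATUS}" = "failed" ] && exit 1
  sleep 10
done'
```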

---

4. Download Results

Once status is ready, download the collected data (replace YOUR_SNAPSHOT_ID with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/YOUR_SNAPSHOT_ID?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```
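
For larger snapshots, a sketch that saves the output to a file and previews the record count (the path and the jq length check are illustrative; format=csv also works, per the parameter table below):

```bash
# Save the snapshot to disk, then count records (assumes a JSON array response).
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/YOUR_SNAPSHOT_ID?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -o /tmp/brightdata_results.json && jq "length" /tmp/brightdata_results.json'
```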

---

5. List Snapshots

Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  | jq ".[] | {snapshot_id, dataset_id, status}"'
```

---

6. Cancel Snapshot

Cancel a running job (replace YOUR_SNAPSHOT_ID with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=YOUR_SNAPSHOT_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

---

Platform-Specific Examples

Twitter/X - Scrape Profile

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://twitter.com/elonmusk"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: x_id, profile_name, biography, is_verified, followers, following, profile_image_link

Twitter/X - Scrape Posts

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://twitter.com/username/status/123456789"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: post_id, text, replies, likes, retweets, views, hashtags, media

---

Reddit - Scrape Subreddit Posts

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: url, sort_by (new/top/hot)

Returns: post_id, title, description, num_comments, upvotes, date_posted, community

Reddit - Scrape Comments

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: comment_id, user_posted, comment_text, upvotes, replies

---

YouTube - Scrape Video Info

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: title, views, likes, num_comments, video_length, transcript, channel_name

YouTube - Search by Keyword

Write to /tmp/brightdata_request.json:

```json
[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

YouTube - Scrape Comments

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: comment_text, likes, replies, username, date

---

Instagram - Scrape Profile

Write to /tmp/brightdata_request.json:

```json
[
  {"url": "https://www.instagram.com/username"}
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: followers, post_count, profile_name, is_verified, biography

Instagram - Scrape Posts

Write to /tmp/brightdata_request.json:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace YOUR_DATASET_ID with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

---

Account Management

Check Account Status

```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```

Get Active Zones

```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  | jq ".[] | {name, type}"'
```

Get Bandwidth Usage

```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

---

Getting Dataset IDs

To use the scraping features, you need a dataset_id:

  1. Go to [Bright Data Control Panel](https://brightdata.com/cp/datasets)
  2. Create a new Web Scraper dataset or select an existing one
  3. Choose the platform (Twitter, Reddit, YouTube, etc.)
  4. Copy the dataset_id from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the data field keys (e.g., v__ds_api_gd_xxxxx where gd_xxxxx is your dataset ID).
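
A rough sketch of extracting those IDs from the bandwidth response with standard tools (the key pattern follows the v__ds_api_ naming described above; the exact response layout may vary by account):

```bash
# List dataset IDs embedded in the bandwidth-usage response keys.
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  | grep -oE "v__ds_api_gd_[a-z0-9]+" | sed "s/v__ds_api_//" | sort -u'
```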

---

Common Parameters

| Parameter    | Description                 | Example                   |
|--------------|-----------------------------|---------------------------|
| url          | Target URL to scrape        | https://twitter.com/user  |
| keyword      | Search keyword              | "artificial intelligence" |
| num_of_posts | Limit number of results     | 50                        |
| start_date   | Filter by date (MM-DD-YYYY) | "01-01-2024"              |
| end_date     | Filter by date (MM-DD-YYYY) | "12-31-2024"              |
| sort_by      | Sort order (Reddit)         | new, top, hot             |
| format       | Response format             | json, csv                 |

---

Rate Limits

  • Batch mode: up to 100 concurrent requests
  • Maximum input size: 1GB per batch
  • Exceeding these limits returns a 429 error
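
If an input list is large, one way to stay under these limits is to split it into several request files and trigger them separately. Here is a sketch assuming /tmp/all_inputs.json holds one big JSON array of request objects (file names and chunk size are illustrative):

```bash
# Split a large input array into request files of 1,000 records each.
jq -c '.[]' /tmp/all_inputs.json | split -l 1000 - /tmp/brightdata_chunk_
for f in /tmp/brightdata_chunk_*; do
  jq -s '.' "$f" > "${f}.json" && rm "$f"   # re-wrap each chunk as a JSON array
done
```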

---

Guidelines

  1. Create datasets first: Use the Control Panel to create scraper datasets
  2. Use async for large jobs: Use /trigger for discovery and batch operations
  3. Use sync for small jobs: Use /scrape for single URL quick lookups
  4. Check status before download: Poll /progress until status is ready
  5. Respect rate limits: Don't exceed 100 concurrent requests
  6. Date format: Use MM-DD-YYYY for date parameters
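
Putting the pieces together, here is a sketch of a full trigger, poll, and download run as a standalone script (the dataset id, poll interval, and file paths are placeholders; since it runs as a script file rather than an inline piped command, it does not need the bash -c wrapper described above):

```bash
#!/usr/bin/env bash
# End-to-end sketch: trigger a scrape, poll until ready, download the results.
# Assumes BRIGHTDATA_API_KEY is exported and /tmp/brightdata_request.json exists.
set -euo pipefail

API="https://api.brightdata.com/datasets/v3"
DATASET_ID="gd_xxxxx"   # replace with your dataset id

# 1. Trigger the job and capture the snapshot id.
SNAPSHOT_ID=$(curl -s -X POST "${API}/trigger?dataset_id=${DATASET_ID}" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json | jq -r '.snapshot_id')
echo "snapshot: ${SNAPSHOT_ID}"

# 2. Poll until the snapshot is ready (or the job fails).
while :; do
  STATUS=$(curl -s "${API}/progress/${SNAPSHOT_ID}" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" | jq -r '.status')
  echo "status: ${STATUS}"
  case "${STATUS}" in
    ready)  break ;;
    failed) echo "scrape failed" >&2; exit 1 ;;
  esac
  sleep 10
done

# 3. Download the collected data.
curl -s "${API}/snapshot/${SNAPSHOT_ID}?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -o /tmp/brightdata_results.json
echo "saved /tmp/brightdata_results.json"
```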
