scrapeninja

Skill from vm0-ai/vm0-skills

What it does

Scrapes websites with advanced anti-bot protection using Chrome TLS fingerprint, rotating proxies, and optional JavaScript rendering.

Part of vm0-ai/vm0-skills (138 items)

Installation

Add the marketplace to Claude Code:

/plugin marketplace add vm0-ai/vm0-skills

Install the plugin from the marketplace:

/plugin install scrapeninja@vm0-skills

Or clone the repository:

git clone https://github.com/vm0-ai/vm0-skills.git
Extracted from docs: vm0-ai/vm0-skills
13 installs · Added Feb 4, 2026

Skill Details

SKILL.md

High-performance web scraping API with Chrome TLS fingerprint and JS rendering

Overview

# ScrapeNinja

High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.

> Official docs: https://scrapeninja.net/docs/

---

When to Use

Use this skill when you need to:

  • Scrape websites with anti-bot protection (Cloudflare, Datadome)
  • Extract data without running a full browser (fast /scrape endpoint)
  • Render JavaScript-heavy pages (/scrape-js endpoint)
  • Use rotating proxies with geo selection (US, EU, Brazil, etc.)
  • Extract structured data with Cheerio extractors
  • Intercept AJAX requests
  • Take screenshots of pages

---

Prerequisites

  1. Get an API key from RapidAPI or APIRoad:
     • RapidAPI: https://rapidapi.com/restyler/api/scrapeninja
     • APIRoad: https://apiroad.net/marketplace/apis/scrapeninja
  2. Set the environment variable:

```bash
# For RapidAPI
export SCRAPENINJA_API_KEY="your-rapidapi-key"

# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_API_KEY="your-apiroad-key"
```

---

> Important: When using $VAR in a command that pipes to another command, wrap the command containing $VAR in bash -c '...'. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```
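The pattern can be verified locally without any API call. In this hypothetical snippet (`DEMO_KEY` is illustrative, not a real key), the variable survives the pipe because the command that reads it runs inside `bash -c`:

```shell
# Local demonstration only, no network involved.
export DEMO_KEY="abc123"
bash -c 'printf "key=%s\n" "$DEMO_KEY"' | tr 'a-z' 'A-Z'
# prints KEY=ABC123
```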

How to Use

1. Basic Scrape (Non-JS, Fast)

High-performance scraping with Chrome TLS fingerprint, no JavaScript:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
```

With custom headers and retries:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "headers": ["Accept-Language: en-US"],
  "retryNum": 3,
  "timeout": 15
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

2. Scrape with JavaScript Rendering

For JavaScript-heavy sites (React, Vue, etc.):

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "waitForSelector": "h1",
  "timeout": 20
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
```

With screenshot:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "screenshot": true
}
```

Then run:

```bash
# Extract the screenshot field from the response
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot'
```
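If the field contains base64-encoded PNG data, as shown in the Response Format section, it can be decoded straight to a file. The response below is hypothetical; a real one comes from the curl call above:

```shell
# Hypothetical response; "iVBORw0KGgo=" is the base64 of the 8-byte PNG file signature.
RESPONSE='{"info":{"screenshot":"iVBORw0KGgo="}}'
echo "$RESPONSE" | jq -r '.info.screenshot' | base64 -d > /tmp/screenshot.png
```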

3. Geo-Based Proxy Selection

Use proxies from specific regions:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "geo": "eu"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .info
```

Available geos: us, eu, br (Brazil), fr (France), de (Germany), 4g-eu

4. Smart Retries

Retry on specific HTTP status codes or text patterns:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "retryNum": 3,
  "statusNotExpected": [403, 429, 503],
  "textNotExpected": ["captcha", "Access Denied"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

5. Extract Data with Cheerio

Extract structured JSON using Cheerio extractor functions:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://news.ycombinator.com",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.extractor'
```
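Escaping the extractor's quotes inside the JSON by hand is error-prone. As a sketch, `jq --arg` can build the request file safely from a readable multi-line function (the variable names here are illustrative):

```shell
# Build the request JSON with jq so the extractor's quotes are escaped for us.
EXTRACTOR='function(input, cheerio) {
  let $ = cheerio.load(input);
  return $(".titleline > a").slice(0, 5)
    .map((i, el) => ({ title: $(el).text(), url: $(el).attr("href") }))
    .get();
}'
jq -n --arg url "https://news.ycombinator.com" --arg ex "$EXTRACTOR" \
  '{url: $url, extractor: $ex}' > /tmp/scrapeninja_request.json
```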

6. Intercept AJAX Requests

Capture XHR/fetch responses:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "catchAjaxHeadersUrlMask": "api/data"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.info.catchedAjax'
```

7. Block Resources for Speed

Speed up JS rendering by blocking images and media:

Write to /tmp/scrapeninja_request.json:

```json
{
  "url": "https://example.com",
  "blockImages": true,
  "blockMedia": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```

---

API Endpoints

| Endpoint | Description |
|----------|-------------|
| /scrape | Fast non-JS scraping with Chrome TLS fingerprint |
| /scrape-js | Full Chrome browser with JS rendering |
| /v2/scrape-js | Enhanced JS rendering for protected sites (APIRoad only) |

---

Request Parameters

Common Parameters (all endpoints)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| url | string | required | URL to scrape |
| headers | string[] | - | Custom HTTP headers |
| retryNum | int | 1 | Number of retry attempts |
| geo | string | us | Proxy geo: us, eu, br, fr, de, 4g-eu |
| proxy | string | - | Custom proxy URL (overrides geo) |
| timeout | int | 10/16 | Timeout per attempt in seconds |
| textNotExpected | string[] | - | Text patterns that trigger retry |
| statusNotExpected | int[] | [403, 502] | HTTP status codes that trigger retry |
| extractor | string | - | Cheerio extractor function |

JS Rendering Parameters (`/scrape-js`, `/v2/scrape-js`)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| waitForSelector | string | - | CSS selector to wait for |
| postWaitTime | int | - | Extra wait time after load (1-12s) |
| screenshot | bool | true | Take page screenshot |
| blockImages | bool | false | Block image loading |
| blockMedia | bool | false | Block CSS/fonts loading |
| catchAjaxHeadersUrlMask | string | - | URL pattern to intercept AJAX |
| viewport | object | 1920x1080 | Custom viewport size |
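A request combining several of the JS-rendering parameters might look like the sketch below. The `{width, height}` shape of the viewport object is an assumption here, not confirmed by this document; check the official docs before relying on it:

```shell
# Write a combined JS-rendering request; viewport shape is assumed, verify in the docs.
cat > /tmp/scrapeninja_request.json <<'EOF'
{
  "url": "https://example.com",
  "waitForSelector": ".content",
  "postWaitTime": 2,
  "screenshot": false,
  "blockImages": true,
  "viewport": { "width": 1366, "height": 768 }
}
EOF
# Validate the JSON before sending it
jq -e . /tmp/scrapeninja_request.json > /dev/null && echo "valid"
```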

---

Response Format

```json
{
  "info": {
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "headers": ["content-type: text/html"],
    "screenshot": "base64-encoded-png",
    "catchedAjax": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "body": "...",
      "status": 200
    }
  },
  "body": "...",
  "extractor": { "extracted": "data" }
}
```
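These fields can be picked apart with jq, the same pattern used in the commands throughout this document. The response below is a hypothetical sample for illustration; a real one comes from the API:

```shell
# Hypothetical response, trimmed to the fields being demonstrated.
RESPONSE='{"info":{"statusCode":200,"finalUrl":"https://example.com"},"body":"<html></html>"}'
echo "$RESPONSE" | jq -c '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
# prints {"status":200,"url":"https://example.com","bodyLength":13}
```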

---

Guidelines

  1. Start with /scrape: use the fast non-JS endpoint first; switch to /scrape-js only if needed
  2. Retries: Set retryNum to 2-3 for unreliable sites
  3. Geo Selection: Use eu for European sites, us for American sites
  4. Extractors: Test extractors at https://scrapeninja.net/cheerio-sandbox/
  5. Blocked Sites: For Cloudflare/Datadome protected sites, use /v2/scrape-js via APIRoad
  6. Screenshots: Set screenshot: false to speed up JS rendering
  7. Rate Limits: Check your plan limits on RapidAPI/APIRoad dashboard

---

Tools

  • Playground: https://scrapeninja.net/scraper-sandbox
  • Cheerio Sandbox: https://scrapeninja.net/cheerio-sandbox
  • cURL Converter: https://scrapeninja.net/curl-to-scraper
