dev-browser

from yuanxiao0115/agent-skills

What it does

Automates browser tasks by maintaining page state, enabling navigation, form filling, screenshots, and web data extraction across script executions.

Part of yuanxiao0115/agent-skills (22 items)

Installation

Install with npx:

npx skills add yuanxiao0115/agent-skills/skills --all

npx skills add yuanxiao0115/agent-skills/skills/read-github

npx skills add yuanxiao0115/agent-skills/skills --skill read-github planner

npx skills add yuanxiao0115/agent-skills/skills /path/to/your/project

Install the npm package:

npm install -g skills

4 installs

Added Feb 4, 2026

Skill Details

SKILL.md

Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.

Overview

# Dev Browser Skill

Browser automation that maintains page state across script executions. Write small, focused scripts to accomplish tasks incrementally. Once you've proven out part of a workflow and there is repeated work to be done, you can write a script to do the repeated work in a single execution.

Choosing Your Approach

Local/source-available sites: Read the source code first to write selectors directly
Unknown page layouts: Use getAISnapshot() to discover elements and selectSnapshotRef() to interact with them
Visual feedback: Take screenshots to see what the user sees

Setup

Two modes are available. If it is unclear which one to use, ask the user.

Standalone Mode (Default)

Launches a new Chromium browser for fresh automation sessions.

```bash
cd ~/.codex/skills/browser/dev-browser && ./server.sh &
```

Add the --headless flag if the user requests it. Wait for the Ready message before running scripts.

Extension Mode

Connects to the user's existing Chrome browser. Use this when:

The user is already logged into sites and wants you to work inside an authenticated session that is not local development.
The user asks you to use the extension.

Important: The core flow is still the same. You create named pages inside the user's browser.

Start the relay server:

```bash
cd ~/.codex/skills/browser/dev-browser && npm i && npm run start-extension &
```

Wait for the message "Waiting for extension to connect..." before running scripts.

Workflow:

Scripts call client.page("name"), just as in standalone mode, to create new pages or connect to existing ones.
Automation runs in the user's actual browser session.

If the extension hasn't connected yet, tell the user to launch and activate it. Download link: https://github.com/SawyerHood/dev-browser/releases

Writing Scripts

> Run all scripts from the ~/.codex/skills/browser/dev-browser/ directory. The @/ import alias requires this directory's config.

Execute scripts inline using heredocs:

```bash
cd ~/.codex/skills/browser/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";

const client = await connect();
const page = await client.page("example"); // descriptive name like "cnn-homepage"
await page.setViewportSize({ width: 1280, height: 800 });

await page.goto("https://example.com");
await waitForPageLoad(page);

console.log({ title: await page.title(), url: page.url() });
await client.disconnect();
EOF
```

Write scripts to tmp/ files only when the script needs to be reused, is complex, or the user explicitly requests it.

Key Principles

Small scripts: Each script does ONE thing (navigate, click, fill, check)
Evaluate state: Log/return state at the end to decide next steps
Descriptive page names: Use "checkout", "login", not "main"
Disconnect to exit: await client.disconnect() - pages persist on server
Plain JS in evaluate: page.evaluate() runs in browser - no TypeScript syntax

Workflow Loop

Follow this pattern for complex tasks:

Write a script to perform one action
Run it and observe the output
Evaluate - did it work? What's the current state?
Decide - is the task complete or do we need another script?
Repeat until task is done
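
The loop above can be sketched as a small driver. This is only an illustration under stated assumptions: `StepResult`, `Step`, and `runWorkflow` are hypothetical names, not part of the dev-browser client, and in practice each step is usually a separate script run whose logged output you inspect before writing the next one.

```typescript
// Hypothetical types for illustration; not part of the dev-browser API.
interface StepResult {
  done: boolean; // true when the overall task is complete
  state: Record<string, unknown>; // whatever the step logged at its end
}

type Step = (previous: StepResult | null) => Promise<StepResult>;

// Runs one small step at a time, logs its reported state, and stops as soon
// as a step reports completion or the step budget is used up.
async function runWorkflow(steps: Step[], maxSteps = 20): Promise<StepResult[]> {
  const history: StepResult[] = [];
  let previous: StepResult | null = null;
  for (const step of steps.slice(0, maxSteps)) {
    const result = await step(previous); // perform one action
    console.log(result.state); // observe the output
    history.push(result);
    if (result.done) break; // decide: is the task complete?
    previous = result; // otherwise feed the state into the next step
  }
  return history;
}
```

Passing the previous result into each step mirrors the "evaluate, then decide" part of the loop: a step can inspect the last known state before acting.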

No TypeScript in Browser Context

Code passed to page.evaluate() runs in the browser, which doesn't understand TypeScript:

```typescript
// ✅ Correct: plain JavaScript
const text = await page.evaluate(() => {
  return document.body.innerText;
});

// ❌ Wrong: TypeScript syntax will fail at runtime
const text = await page.evaluate(() => {
  const el: HTMLElement = document.body; // Type annotation breaks in browser!
  return el.innerText;
});
```

Scraping Data

For scraping large datasets, intercept and replay network requests rather than scrolling the DOM. See [references/scraping.md](references/scraping.md) for the complete guide covering request capture, schema discovery, and paginated API replay.
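
The referenced guide is not reproduced here, so the following is only a general sketch of paginated replay, assuming the captured API is cursor-based and returns JSON. `PageResult`, `replayPaginated`, and the `fetchPage` callback are hypothetical names introduced for illustration; the real response shape has to come from the requests you capture.

```typescript
// Hypothetical response shape: assumes each response carries items plus a cursor.
interface PageResult<T> {
  items: T[];
  nextCursor: string | null; // null when there are no more pages
}

// Follows cursors until the API reports no further page.
// `fetchPage` stands in for a replay of whatever request was captured.
async function replayPaginated<T>(
  fetchPage: (cursor: string | null) => Promise<PageResult<T>>,
  maxPages = 100, // safety cap so a bad cursor cannot loop forever
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const result: PageResult<T> = await fetchPage(cursor);
    all.push(...result.items);
    if (result.nextCursor === null) break;
    cursor = result.nextCursor;
  }
  return all;
}
```

Keeping the request itself behind a callback means the same loop works whether the replay uses fetch from Node or runs inside the page.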

Client API

```typescript
const client = await connect();
const page = await client.page("name"); // Get or create named page
const pages = await client.list(); // List all page names
await client.close("name"); // Close a page
await client.disconnect(); // Disconnect (pages persist)

// ARIA Snapshot methods
const snapshot = await client.getAISnapshot("name"); // Get accessibility tree
const element = await client.selectSnapshotRef("name", "e5"); // Get element by ref
```

The page object is a standard Playwright Page.

Waiting

```typescript
import { waitForPageLoad } from "@/client.js";

await waitForPageLoad(page); // After navigation
await page.waitForSelector(".results"); // For specific elements
await page.waitForURL("**/success"); // For specific URL
```

Inspecting Page State

Screenshots

```typescript
await page.screenshot({ path: "tmp/screenshot.png" });
await page.screenshot({ path: "tmp/full.png", fullPage: true });
```

ARIA Snapshot (Element Discovery)

Use getAISnapshot() to discover page elements. It returns a YAML-formatted accessibility tree:

```yaml
banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
main:
  - list:
    - listitem:
      - link "Article Title" [ref=e8]
      - link "328 comments" [ref=e9]
contentinfo:
  - textbox [ref=e10]
    - /placeholder: "Search"
```

Interpreting refs:

[ref=eN] - Element reference for interaction (visible, clickable elements only)
[checked], [disabled], [expanded] - Element states
[level=N] - Heading level
/url:, /placeholder: - Element properties

Interacting with refs:

```typescript
const snapshot = await client.getAISnapshot("hackernews");
console.log(snapshot); // Find the ref you need

const element = await client.selectSnapshotRef("hackernews", "e2");
await element.click();
```
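
On large pages, reading the whole snapshot by eye can be slow. A small filter like the one below may help narrow it down. It is a sketch that assumes the snapshot is a plain string with `[ref=eN]` markers in the format shown earlier; `RefMatch` and `findRefs` are hypothetical helpers, not part of the dev-browser client.

```typescript
// Hypothetical helper types; not part of the dev-browser client.
interface RefMatch {
  ref: string; // e.g. "e9"
  line: string; // the trimmed snapshot line the ref appeared on
}

// Returns every [ref=eN] whose snapshot line matches `pattern`.
function findRefs(snapshot: string, pattern: RegExp): RefMatch[] {
  const matches: RefMatch[] = [];
  for (const rawLine of snapshot.split("\n")) {
    const line = rawLine.trim();
    const refMatch = /\[ref=(e\d+)\]/.exec(line);
    pattern.lastIndex = 0; // keep global or sticky regexes from carrying state across lines
    if (refMatch && pattern.test(line)) {
      matches.push({ ref: refMatch[1], line });
    }
  }
  return matches;
}

// Sample lines taken from the snapshot format shown earlier.
const sample = [
  '- link "Article Title" [ref=e8]',
  '- link "328 comments" [ref=e9]',
].join("\n");

console.log(findRefs(sample, /comments/));
```

The returned `ref` value can then be passed to client.selectSnapshotRef() as in the example above.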

Error Recovery

Page state persists after failures. Debug with:

```bash
cd ~/.codex/skills/browser/dev-browser && npx tsx <<'EOF'
import { connect } from "@/client.js";

const client = await connect();
const page = await client.page("hackernews");

await page.screenshot({ path: "tmp/debug.png" });
console.log({
  url: page.url(),
  title: await page.title(),
  bodyText: await page.textContent("body").then((t) => t?.slice(0, 200)),
});

await client.disconnect();
EOF
```
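
The same debug capture can be attached to an action so it runs automatically on failure. The sketch below is an assumption-laden illustration: `DebuggablePage` is a minimal structural type covering only the Playwright Page methods it uses, and `withDebugCapture` is a hypothetical helper, not part of the dev-browser client.

```typescript
// Minimal structural type: a Playwright Page satisfies it, and so does a test double.
interface DebuggablePage {
  url(): string;
  title(): Promise<string>;
  screenshot(options: { path: string }): Promise<unknown>;
}

// Hypothetical helper: runs `action`; on failure it saves a screenshot and
// logs the URL and title, then rethrows the original error.
async function withDebugCapture<T>(
  page: DebuggablePage,
  action: () => Promise<T>,
  screenshotPath = "tmp/debug.png",
): Promise<T> {
  try {
    return await action();
  } catch (error) {
    await page.screenshot({ path: screenshotPath });
    console.error({
      url: page.url(),
      title: await page.title(),
      error: String(error),
    });
    throw error;
  }
}
```

A call might look like `await withDebugCapture(page, () => page.click("#submit"))`, where `#submit` is a placeholder selector. Because page state persists on the server, the next script can pick up from the state captured in the screenshot.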

More from this repository (10)

read-github: Retrieves and searches GitHub repository documentation and code via gitmcp.io, enabling easy exploration of project contents and details.
context7: Retrieves up-to-date library documentation via Context7 API, ensuring access to current technical references beyond training data limitations.
agent browser: Enables AI-powered browser automation using a fast Rust-based CLI with Node.js fallback, allowing programmatic web interaction and control.
frontend-design: Crafts distinctive, production-grade frontend interfaces with exceptional design quality, avoiding generic AI aesthetics.
plan-harder: Generates comprehensive, phased implementation plans with atomic tasks, sprints, and detailed requirements analysis for complex development requests.
planner: Generates comprehensive, phased implementation plans with detailed sprints, atomic tasks, and clear requirements clarification across multiple project stages.
parallel-task: Launches parallel subagents to simultaneously execute tasks from a markdown plan file, triggered by "/parallel-task" command.
cli-design-guidelines: Provides comprehensive design guidelines for creating user-friendly, robust, and intuitive command-line interfaces with best practices and human-first UX principles.
web-design-guidelines: Provides comprehensive web design guidelines and best practices for creating user-friendly, accessible, and visually appealing websites.
vue-best-practices: Enforces Vue 3 TypeScript best practices by providing comprehensive type-safe rules for component props, templates, modules, and performance optimization.