gemini-computer-use
🎯 Skill from am-will/codex-skills
Collection of agent skills for planning, documentation access, frontend development, and browser automation, featuring multi-agent orchestration with planner, parallel tasks, and LLM council capabilities.
Overview
Gemini Computer Use is a skill from the CodexSkills collection that enables building and running Gemini 2.5 Computer Use browser-control agents with Playwright. It implements a full agent loop of screenshot capture, function call parsing, action execution, and response handling, with a built-in safety confirmation mechanism for risky UI actions.
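The loop described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: `FunctionCall`, `run_agent_loop`, and the three callables passed into it are hypothetical names, and the model and browser are replaced by stand-ins so the example runs without an API key or a Playwright install.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FunctionCall:
    """Hypothetical parsed model action, e.g. click_at(x=100, y=200)."""
    name: str
    args: dict = field(default_factory=dict)


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes, list], Optional[FunctionCall]],
    execute: Callable[[FunctionCall], str],
    max_turns: int = 10,
) -> list:
    """Repeat screenshot -> model -> action -> response until the model stops."""
    history: list = []
    for _ in range(max_turns):
        screenshot = take_screenshot()         # 1. capture the current page state
        call = ask_model(screenshot, history)  # 2. parse the model's function call
        if call is None:                       # no action requested: task is done
            break
        result = execute(call)                 # 3. perform the action in the browser
        history.append((call.name, result))    # 4. report the outcome on the next turn
    return history


# Stand-ins (assumptions) that replace the real model and browser.
scripted = [
    FunctionCall("navigate", {"url": "https://example.com"}),
    FunctionCall("click_at", {"x": 100, "y": 200}),
]


def fake_model(screenshot: bytes, history: list) -> Optional[FunctionCall]:
    # Return the next scripted action, or None once the script is exhausted.
    return scripted[len(history)] if len(history) < len(scripted) else None


demo_history = run_agent_loop(
    take_screenshot=lambda: b"png-bytes",
    ask_model=fake_model,
    execute=lambda call: f"ok:{call.name}",
)
```

In a real agent, `take_screenshot` and `execute` would presumably wrap Playwright page operations and `ask_model` would wrap a Gemini API request. Passing them in as callables keeps the loop itself testable without either dependency.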
Key Features
- Browser Control Agent Loop - Implements the full screenshot-to-action-to-response cycle using the Gemini 2.5 Computer Use model
- Playwright Integration - Uses Playwright for browser automation with support for Chromium, Chrome, Edge, and custom browsers like Brave
- Safety Confirmation System - Prompts users for confirmation before executing potentially risky browser actions flagged by the model
- Configurable Turn Limits - Set maximum interaction turns and exclude specific risky actions from execution
- Sandboxed Execution - Designed to run in sandboxed browser profiles or containers for safe automated browsing
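As a rough illustration of how the safety confirmation and action exclusion features might combine, the sketch below gates each proposed action before it runs. All names here (`should_execute`, `requires_confirmation`, `excluded_actions`, `ask_user`) are hypothetical and chosen for the example. They are not taken from the skill or from the Gemini API.

```python
from typing import Callable, Set


def should_execute(
    action_name: str,
    requires_confirmation: bool,
    excluded_actions: Set[str],
    confirm: Callable[[str], bool],
) -> bool:
    """Decide whether a model-proposed browser action may run."""
    # Excluded actions are never executed, regardless of user input.
    if action_name in excluded_actions:
        return False
    # Actions the model flagged as risky need an explicit yes from the user.
    if requires_confirmation:
        return confirm(f"Allow risky action '{action_name}'?")
    # Everything else proceeds without prompting.
    return True


def ask_user(prompt: str) -> bool:
    """Interactive confirmation; only an explicit 'y' approves the action."""
    return input(f"{prompt} [y/N] ").strip().lower() == "y"
```

A caller would pass `ask_user` as `confirm` in an interactive session, or a function that always returns `False` in unattended runs so that flagged actions are skipped rather than silently approved.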
Who is this for?
This skill is for developers who need to automate web browser tasks using AI vision models, such as testing web applications, scraping dynamic content, or building browser-based automation workflows. It is particularly useful for those who want Gemini-powered browser control with built-in safety guardrails.
Same repository
am-will/codex-skills (19 items)
Installation
npx vibeindex add am-will/codex-skills --skill gemini-computer-use
npx skills add am-will/codex-skills --skill gemini-computer-use
~/.claude/skills/gemini-computer-use/SKILL.md
More from this repository
Frontend design skill for creating distinctive, production-grade web interfaces with high design quality, avoiding generic AI aesthetics through bold creative choices and exceptional attention to detail
Context7 documentation fetcher skill for retrieving current library documentation via Context7 API, proactively looking up APIs for React, Next.js, Supabase, and other libraries instead of relying on outdated knowledge
Creates comprehensive, phased implementation plans with sprints and atomic tasks for planning features, breaking down work, and building structured roadmaps.
Reads and searches GitHub repository documentation via the gitmcp.io MCP service, converting GitHub URLs to documentation endpoints.
Detailed implementation planning skill that creates phased plans with sprints and atomic tasks, covering codebase research, requirements clarification, and structured implementation phases for bugs, features, or tasks
Queries the OpenAI developer documentation MCP server via CLI (curl/jq) to search, browse, and fetch authoritative docs for the OpenAI API, SDKs, ChatGPT Apps SDK, Codex, and MCP integrations with up-to-date official guidance.
A collection of Codex/agent skills for planning, documentation access, frontend development, and browser automation, including parallel task execution, LLM council multi-agent orchestration, Context7 doc fetching, and Gemini Computer Use browser control.