gemini-computer-use

Skill from am-will/codex-skills

What it does

Collection of agent skills for planning, documentation access, frontend development, and browser automation, featuring multi-agent orchestration with planner, parallel tasks, and LLM council capabilities.

Overview

Gemini Computer Use is a skill from the codex-skills collection for building and running Gemini 2.5 Computer Use browser-control agents with Playwright. It implements the full agent loop (screenshot capture, function call parsing, action execution, and response handling) and includes a safety confirmation mechanism for risky UI actions.
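The loop described above can be sketched in a few lines. This is a minimal illustration only, not the skill's actual code: `FunctionCall`, `run_agent_loop`, and the three injected callables are hypothetical names, and the model and browser are passed in as plain functions so the sketch runs without Gemini or Playwright installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FunctionCall:
    """Hypothetical stand-in for one action the model asks the browser to perform."""
    name: str
    args: dict = field(default_factory=dict)


def run_agent_loop(
    next_call: Callable[[bytes, list], Optional[FunctionCall]],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[FunctionCall], str],
    max_turns: int = 10,
) -> list:
    """Repeat screenshot -> function call -> action -> response until done or out of turns."""
    history = []  # (action name, result) pairs handed back to the model each turn
    for _ in range(max_turns):
        screenshot = take_screenshot()         # 1. capture the current page state
        call = next_call(screenshot, history)  # 2. ask the model which action it wants
        if call is None:                       # no action requested: treat the task as finished
            break
        result = execute(call)                 # 3. perform the action in the browser
        history.append((call.name, result))    # 4. record the outcome as the response
    return history
```

In a real agent, `take_screenshot` would presumably wrap a Playwright page screenshot and `next_call` would wrap a request to the model; both are left abstract here because their exact wiring is not described on this page.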

Key Features

  • Browser Control Agent Loop - Implements the full screenshot-to-action-to-response cycle using the Gemini 2.5 Computer Use model
  • Playwright Integration - Uses Playwright for browser automation with support for Chromium, Chrome, Edge, and custom browsers like Brave
  • Safety Confirmation System - Prompts users for confirmation before executing potentially risky browser actions flagged by the model
  • Configurable Turn Limits - Set maximum interaction turns and exclude specific risky actions from execution
  • Sandboxed Execution - Designed to run in sandboxed browser profiles or containers for safe automated browsing
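The safety confirmation and action exclusion features can be pictured as a gate that runs before each action is executed. The sketch below is an assumption-laden illustration, not the skill's implementation: `gate_action` is a hypothetical helper, and the shape of the `safety_decision` entry (a dict with `decision` and `explanation` keys inside the call's arguments) is assumed for the example rather than taken from this page.

```python
from typing import Callable


def gate_action(
    name: str,
    args: dict,
    excluded: set,
    confirm: Callable[[str], bool],
) -> str:
    """Return "skip", "deny", or "run" for one proposed browser action."""
    if name in excluded:
        return "skip"  # action is on the local exclusion list, so never execute it
    # Assumed shape: the model flags risky calls via a "safety_decision" argument.
    decision = args.get("safety_decision") or {}
    if decision.get("decision") == "require_confirmation":
        prompt = decision.get("explanation", f"Allow {name}?")
        return "run" if confirm(prompt) else "deny"  # let the user decide
    return "run"  # unflagged actions proceed without a prompt
```

For interactive use, `confirm` could be something like `lambda p: input(p + " [y/N] ").strip().lower() == "y"`; keeping it injectable makes the gate easy to exercise without a terminal.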

Who is this for?

This skill is for developers who need to automate web browser tasks using AI vision models, such as testing web applications, scraping dynamic content, or building browser-based automation workflows. It is particularly useful for those who want Gemini-powered browser control with built-in safety guardrails.

Installation

Vibe Index Install: installs to .claude/skills/ (auto-recognized by Claude Code)
npx vibeindex add am-will/codex-skills --skill gemini-computer-use

skills.sh Install: installs to .agents/skills/ (⚠ may not be auto-recognized by Claude Code)
npx skills add am-will/codex-skills --skill gemini-computer-use

Manual Install: copy the SKILL.md content and save it to the path below
~/.claude/skills/gemini-computer-use/SKILL.md

1,159 Installs
322
Last Updated: Jan 29, 2026

More from this repository (10)

frontend-design (Skill)

Frontend design skill for creating distinctive, production-grade web interfaces with high design quality, avoiding generic AI aesthetics through bold creative choices and exceptional attention to detail

frontend responsive design standards (Skill)

context7 (Skill)

Context7 documentation fetcher skill for retrieving current library documentation via Context7 API, proactively looking up APIs for React, Next.js, Supabase, and other libraries instead of relying on outdated knowledge

planner (Skill)

Creates comprehensive, phased implementation plans with sprints and atomic tasks for planning features, breaking down work, and building structured roadmaps.

read-github (Skill)

Reads and searches GitHub repository documentation via the gitmcp.io MCP service, converting GitHub URLs to documentation endpoints.

parallel-task (Skill)

plan-harder (Skill)

Detailed implementation planning skill that creates phased plans with sprints and atomic tasks, covering codebase research, requirements clarification, and structured implementation phases for bugs, features, or tasks

markdown-url (Skill)

openai-docs-skill (Skill)

Queries the OpenAI developer documentation MCP server via CLI (curl/jq) to search, browse, and fetch authoritative docs for the OpenAI API, SDKs, ChatGPT Apps SDK, Codex, and MCP integrations with up-to-date official guidance.

agent browser (Skill)

A collection of Codex/agent skills for planning, documentation access, frontend development, and browser automation, including parallel task execution, LLM council multi-agent orchestration, Context7 doc fetching, and Gemini Computer Use browser control.