🎯

image-generation

🎯Skill

from xiangyu-cas/vision-skills

What it does

Generates, edits, and composes images using Google's Gemini AI with text prompts, multi-reference inputs, and search grounding.

📦

Part of

xiangyu-cas/vision-skills(3 items)

image-generation

Installation

git cloneClone repository

git clone https://github.com/Xiangyu-CAS/Vision-Skills.git

📖 Extracted from docs: xiangyu-cas/vision-skills

Need more details? View full documentation on GitHub →

3Installs

AddedFeb 4, 2026

View on GitHub Back to Skills

Skill Details

SKILL.md

Gemini image generation and editing skill for text-to-image, image-to-image edits, multi-reference composition, and Google Search grounding. Use when creating or modifying images via Gemini (default model gemini-3-pro-image-preview) with the Python SDK.

Overview

# Image Generation with Gemini

Use this skill when the user asks to generate or edit images with Gemini using the Python SDK. Default to gemini-3-pro-image-preview, and mention gemini-2.5-flash-image only as an optional faster/cheaper alternative.

Workflow

1) Identify task type (text-to-image, edit, or multi-reference).

2) Ensure GEMINI_API_KEY is available (env or stored in .env), then use the Python SDK. This will make network requests to the Gemini API

3) Choose model + output (response_modalities=["IMAGE"] if image-only) and run. Generation can take ~30 seconds; allow 30–60 seconds before retrying.

4) Save returned images with part.as_image(); if none, report a clear error.

Use these references

references/python.md for Python SDK usage

Response handling (Python SDK)

Use part.as_image() to access image outputs and save them. If no image parts are returned, surface a clear error and suggest checking the API key, model name, and response modalities.

Timing note

Image generation may take around 30 seconds. When running commands via the shell tool, set a longer timeout (e.g., 60–120 seconds) to avoid premature timeouts.

More from this repository2

🎯

video-generation🎯Skill

Generates high-quality videos from text or images using Gemini's Veo 3.1, with customizable parameters like resolution, duration, and interpolation.

🎯

bbdown-cli🎯Skill

Downloads Bilibili videos via CLI, supporting 720p preference, authentication methods, and flexible output configuration.