image-generate

Skill from agntswrm/agent-media

What it does

Generates images from text prompts using AI image generation services like Fal.ai, Replicate, or Runpod.

Installation

Install skill:

npx skills add https://github.com/agntswrm/agent-media --skill image-generate

Added Jan 25, 2026

Skill Details

SKILL.md

Overview

# agent-media

Media processing CLI for AI agents.

- Image: generate, edit, remove-background, resize, convert, extend, crop
- Video: generate (text-to-video and image-to-video)
- Audio: extract from video, transcribe (with speaker identification)

Installation

Global

```bash
npm install -g agent-media@latest
```

From Source

```bash
git clone https://github.com/agntswrm/agent-media
cd agent-media
pnpm install && pnpm build && pnpm link --global
```

Via bunx / npx

Run directly without installing:

```bash
bunx agent-media@latest --help
npx agent-media@latest --help
```

Skills for AI Agents

Install the agent-media skills into your coding agent (Claude Code, Cursor, Codex, etc.):

```bash
npx skills add agntswrm/agent-media
```

This adds media processing skills that your AI agent can use automatically. Available skills:

- agent-media - Overview of all capabilities
- image-generate - Generate images from text
- image-edit - Edit images with text prompts
- image-resize - Resize images
- image-convert - Convert image formats
- image-extend - Extend image canvas with padding
- image-remove-background - Remove backgrounds
- image-crop - Crop images to specified dimensions
- audio-extract - Extract audio from video
- audio-transcribe - Transcribe audio to text
- video-generate - Generate videos from text or images

Quick Start

```bash
# generate an image
agent-media image generate --prompt "a robot" --out rob.png

# remove background
agent-media image remove-background --in rob.png --out rob_nobg.png

# edit the image
agent-media image edit --in rob_nobg.png --prompt "the robot is sitting on a bench next to a cat, in the background you can see the Eiffel Tower in Paris" --out rob_cat_paris.png

# generate a video with audio (cat meows, robot speaks!)
agent-media video generate --in rob_cat_paris.png --prompt "the cat meows and the robot says: \"Yes, me too.\"" --audio --out rob_cat_video.mp4

# extract audio from video
agent-media audio extract --in rob_cat_video.mp4 --out rob_cat_audio.mp3

# transcribe the audio
agent-media audio transcribe --in rob_cat_audio.mp3
```

Requirements

- Node.js >= 18.0.0
- API key from [fal.ai](https://fal.ai/dashboard/keys), [Replicate](https://replicate.com/account/api-tokens), [Runpod](https://www.runpod.io/console/user/settings), or [AI Gateway](https://vercel.com/ai-gateway) for AI features
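
The Node.js requirement can be checked before running the CLI. The following is a minimal sketch, not part of agent-media: `node_version_ok` is a hypothetical helper name, and it assumes the version string has the `v18.19.0` form printed by `node --version`. Because the minimum is 18.0.0, comparing the major version is sufficient.

```bash
# Hypothetical helper (not part of agent-media): succeeds when the given
# Node.js version string is at least 18.0.0.
node_version_ok() {
  local version major
  version="${1#v}"        # strip the leading "v" if present
  major="${version%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 2 ;;  # not a number: treat as an unusable version
  esac
  [ "$major" -ge 18 ]
}

# Example use (requires node on PATH):
#   node_version_ok "$(node --version)" || echo "Node.js 18 or newer is required" >&2
```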

Local processing (no API key): resize, convert, extend, crop, audio extract, remove-background, transcribe

Cloud processing (API key required): image generate, image edit, video generate, remove-background, transcribe

> Note: You may see a "mutex lock failed" error when using local remove-background or transcribe. It is safe to ignore: the output is correct if the JSON result shows "ok": true.
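
A script can apply the same rule by looking for `"ok": true` in the command output instead of relying on error messages. This is a sketch under assumptions: `result_ok` is a hypothetical helper, only the `"ok"` field is taken from the note above, and it is assumed that the JSON result is written to standard output.

```bash
# Hypothetical helper (not part of agent-media): reads text on stdin and
# succeeds when it contains "ok": true, allowing optional spaces around ":".
result_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Example use (assumes the JSON result is printed to stdout):
#   agent-media image remove-background --in rob.png --out rob_nobg.png | result_ok \
#     && echo "background removed"
```

A plain text match keeps the check free of extra dependencies. A JSON parser would be more robust if the result ever nests another `"ok"` field.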

---

image

```bash
agent-media image resize --in <path> [options]
agent-media image convert --in <path> --format <format>
agent-media image extend --in <path> --padding <pixels> --color <color>
agent-media image crop --in <path> --width <pixels> --height <pixels>
agent-media image generate --prompt <text>
agent-media image edit --in <path> --prompt <text>
agent-media image remove-background --in <path>
```

resize

local

```bash
agent-media image resize --in sunset-mountains.jpg --width 800
agent-media image resize --in sunset-mountains.jpg --height 600
agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
```

| Option | Description |
|--------|-------------|
| --in | Input file path or URL (required) |
| --width | Target width in pixels |
| --height | Target height in pixels |
| --out | Output path, filename or directory (default: ./) |
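
Since `--out` accepts a directory, the options above can be combined to resize several files in one loop. This is a minimal sketch: `resize_all` and the `AGENT_MEDIA_BIN` variable are hypothetical names introduced for illustration, and only the `--in`, `--width`, and `--out` flags from the table are used.

```bash
# Hypothetical wrapper (not part of agent-media). AGENT_MEDIA_BIN is an
# illustrative variable that lets the command be swapped out for a dry run.
AGENT_MEDIA_BIN="${AGENT_MEDIA_BIN:-agent-media}"

# resize_all WIDTH OUT_DIR FILE...
# Resizes each FILE to WIDTH pixels wide and writes the results to OUT_DIR.
# Stops at the first failing file.
resize_all() {
  local width="$1" out_dir="$2" file
  shift 2
  for file in "$@"; do
    "$AGENT_MEDIA_BIN" image resize --in "$file" --width "$width" --out "$out_dir" || return 1
  done
}

# Example use:
#   resize_all 800 ./resized sunset-mountains.jpg another-photo.jpg
```

Setting `AGENT_MEDIA_BIN=echo` prints the commands instead of running them, which is a cheap way to review a batch before spending time on it.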

convert

local

```bash
agent-media image convert --in sunset-mountains.png --format webp
agent-media image convert --in sunset-mountains.jpg --format png
agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-stor
```

More from this repository (8)

- image-remove-background: Removes the background from an input image, creating a transparent or clean-cut image with the main subject isolated.
- video-generate: Generates videos from text descriptions or input images, optionally adding audio narration or dialogue.
- agent-media: Provides AI-powered media processing capabilities for coding agents, enabling automatic generation, editing, conversion, and analysis of images, videos, and audio through a CLI tool with skills lik...
- image-convert: Converts image files between different formats (e.g., PNG to JPEG, WEBP to PNG) while preserving image quality and metadata.
- image-extend: image-extend skill from agntswrm/agent-media
- image-edit: Edits images by applying text-based modifications, allowing users to transform existing images through descriptive prompts that can change content, add elements, or modify scenes.
- image-resize: Resizes images to specified dimensions, allowing users to change image width and height without losing quality.
- image-crop: Crops images to specified dimensions, allowing precise control over image size and framing.