image-remove-background
Skill from agntswrm/agent-media
Removes the background from an input image, creating a transparent or clean-cut image with the main subject isolated.
Installation
npx skills add https://github.com/agntswrm/agent-media --skill image-remove-background
Overview
# agent-media
Media processing CLI for AI agents.
- Image: generate, edit, remove-background, resize, convert, extend, crop
- Video: generate (text-to-video and image-to-video)
- Audio: extract from video, transcribe (with speaker identification)
Installation
Global
```bash
npm install -g agent-media@latest
```
From Source
```bash
git clone https://github.com/agntswrm/agent-media
cd agent-media
pnpm install && pnpm build && pnpm link --global
```
Via bunx / npx
Run directly without installing:
```bash
bunx agent-media@latest --help
npx agent-media@latest --help
```
Skills for AI Agents
Install agent-media skills to your coding agent (Claude Code, Cursor, Codex, etc.):
```bash
npx skills add agntswrm/agent-media
```
This adds media processing skills that your AI agent can use automatically. Available skills:
- agent-media - Overview of all capabilities
- image-generate - Generate images from text
- image-edit - Edit images with text prompts
- image-resize - Resize images
- image-convert - Convert image formats
- image-extend - Extend image canvas with padding
- image-remove-background - Remove backgrounds
- image-crop - Crop images to specified dimensions
- audio-extract - Extract audio from video
- audio-transcribe - Transcribe audio to text
- video-generate - Generate videos from text or images
Quick Start
```bash
# generate an image
agent-media image generate --prompt "a robot" --out rob.png
# remove background
agent-media image remove-background --in rob.png --out rob_nobg.png
# edit the image
agent-media image edit --in rob_nobg.png --prompt "the robot is sitting on a bench next to a cat, in the background you can see the Eiffel Tower in Paris" --out rob_cat_paris.png
# generate a video with audio (cat meows, robot speaks!)
agent-media video generate --in rob_cat_paris.png --prompt "the cat meows and the robot says: \"Yes, me too.\"" --audio --out rob_cat_video.mp4
# extract audio from video
agent-media audio extract --in rob_cat_video.mp4 --out rob_cat_audio.mp3
# transcribe the audio
agent-media audio transcribe --in rob_cat_audio.mp3
```
Requirements
- Node.js >= 18.0.0
- API key from [fal.ai](https://fal.ai/dashboard/keys), [Replicate](https://replicate.com/account/api-tokens), [Runpod](https://www.runpod.io/console/user/settings), or [AI Gateway](https://vercel.com/ai-gateway) for AI features
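The Node.js requirement can be checked before installing. This is a minimal sketch: the function name `node_version_ok` is hypothetical, and it assumes the usual `node --version` output format such as `v18.19.0`.

```bash
# Hypothetical helper: succeeds if a Node.js version string such as "v18.19.0"
# satisfies the ">= 18.0.0" requirement listed above.
node_version_ok() {
  local major=${1#v}     # drop the leading "v"
  major=${major%%.*}     # keep only the major component
  [ "$major" -ge 18 ]
}

# Example use (left commented out so the block has no side effects):
# node_version_ok "$(node --version)" || echo "Node.js 18 or newer is required" >&2
```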
Local processing (no API key): resize, convert, extend, crop, audio extract, remove-background, transcribe
Cloud processing (API key required): image generate, image edit, video generate, remove-background, transcribe
> Note: You may see a "mutex lock failed" error when using local remove-background or transcribe. It can be ignored: the output is correct if the JSON result shows "ok": true.
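The check described in the note can be scripted. This is a minimal sketch, assuming the CLI prints its JSON result to stdout and that the result carries an `ok` field as the note states. The helper name `result_ok` is hypothetical, and nothing else about the JSON shape is assumed.

```bash
# Hypothetical helper: succeeds if the text on stdin contains "ok": true.
# It uses grep instead of a JSON parser to avoid extra dependencies, so it
# is a plain text match and not a full JSON validation.
result_ok() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Assumed usage (commented out; requires agent-media to be installed):
# agent-media image remove-background --in rob.png --out rob_nobg.png | result_ok \
#   && echo "background removed" || echo "background removal failed" >&2
```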
---
image
```bash
agent-media image resize --in
agent-media image convert --in
agent-media image extend --in
agent-media image crop --in
agent-media image generate --prompt
agent-media image edit --in
agent-media image remove-background --in
```
resize
local
```bash
agent-media image resize --in sunset-mountains.jpg --width 800
agent-media image resize --in sunset-mountains.jpg --height 600
agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
```
| Option | Description |
|--------|-------------|
| --in | Input file path or URL (required) |
| --width | Target width in pixels |
| --height | Target height in pixels |
| --out | Output path, filename or directory (default: ./) |
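The options above can be combined in a batch job. This is a minimal sketch, assuming `--out` accepts a directory as the table states. The function name `resize_all` and the `AGENT_MEDIA` override variable are hypothetical, and the override exists only to make a dry run easy.

```bash
# Hypothetical helper: resize every .jpg in a directory to a given width.
# Usage: resize_all <input-dir> <width> <output-dir>
resize_all() {
  local dir=$1 width=$2 outdir=$3
  local f
  mkdir -p "$outdir"                # make sure the output directory exists
  for f in "$dir"/*.jpg; do
    [ -e "$f" ] || continue         # skip the literal pattern when nothing matches
    # AGENT_MEDIA is an illustrative override, e.g. AGENT_MEDIA=echo for a dry run
    "${AGENT_MEDIA:-agent-media}" image resize --in "$f" --width "$width" --out "$outdir"
  done
}
```

For example, `resize_all ./photos 800 ./resized` would resize each JPEG in `./photos` to 800 pixels wide. Running it with `AGENT_MEDIA=echo` prints the commands without executing them.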
convert
local
```bash
agent-media image convert --in sunset-mountains.png --format webp
agent-media image convert --in sunset-mountains.jpg --format png
agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-stor
```