marswaveai/skills


13 resources in this repository


Skills (13)


listenhub

Provides AI skills for ListenHub to explain content from videos, podcasts, and other media formats.


tts

Text-to-speech via the `listenhub` CLI with two modes: Quick (single voice, low-latency, synchronous) for snippets and casual reading, and Script (multi-speaker, per-segment voice assignment) for dialogue and audiobooks. Enforces AskUserQuestion-based parameter collection and follows shared CLI auth, config, and speaker-selection patterns.

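The difference between the two modes comes down to how many voices the text needs. The sketch below illustrates that under stated assumptions: the `Segment` type and the rule "one distinct speaker means Quick" are hypothetical, and nothing here calls the `listenhub` CLI. The actual skill collects the mode through AskUserQuestion rather than inferring it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str  # hypothetical voice identifier chosen during speaker selection
    text: str


def choose_tts_mode(segments: List[Segment]) -> str:
    """Return "quick" when one voice reads everything, "script" otherwise."""
    if not segments:
        raise ValueError("nothing to synthesize")
    if any(not seg.speaker for seg in segments):
        # Script mode assigns a voice per segment, so none may be left blank.
        raise ValueError("every segment needs an assigned speaker")
    distinct = {seg.speaker for seg in segments}
    return "quick" if len(distinct) == 1 else "script"


if __name__ == "__main__":
    dialogue = [
        Segment("host", "Welcome back to the show."),
        Segment("guest", "Thanks for having me."),
    ]
    print(choose_tts_mode(dialogue))  # script
    print(choose_tts_mode([Segment("narrator", "Chapter one.")]))  # quick
```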

asr

Transcribe audio files to text fully offline via the `coli asr` CLI, using local speech-recognition models — `sensevoice` for Chinese/English/Japanese/Korean/Cantonese (with language, emotion, and audio-event detection) or `whisper-tiny.en` for English only. Optionally polishes transcripts (punctuation cleanup, filler removal) and can export to Markdown with front-matter metadata.

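The model choice and the optional clean-up pass can be sketched in plain Python. This is not the `coli asr` interface: the language codes, the `prefer_whisper_for_english` switch, the filler list, and the front-matter keys are all assumptions, and the "polish" step here covers only filler removal and whitespace, not full punctuation repair.

```python
from typing import Dict

# Language codes are assumptions; the model coverage follows the description above.
SENSEVOICE_LANGS = {"zh", "en", "ja", "ko", "yue"}  # Chinese, English, Japanese, Korean, Cantonese
FILLERS = {"um", "uh", "er", "erm"}  # hypothetical English filler list


def pick_asr_model(lang: str, prefer_whisper_for_english: bool = False) -> str:
    """Choose a local model name for a language code (hypothetical policy)."""
    if lang == "en" and prefer_whisper_for_english:
        return "whisper-tiny.en"  # English-only model
    if lang in SENSEVOICE_LANGS:
        return "sensevoice"
    raise ValueError(f"no listed local model covers {lang!r}")


def remove_fillers(transcript: str) -> str:
    """Drop standalone filler words and collapse runs of whitespace."""
    kept = [w for w in transcript.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(kept)


def to_markdown(transcript: str, meta: Dict[str, str]) -> str:
    """Prefix the transcript with a simple YAML-style front-matter block."""
    front = "\n".join(f"{key}: {value}" for key, value in meta.items())
    return f"---\n{front}\n---\n\n{transcript}\n"


if __name__ == "__main__":
    text = remove_fillers("um so, uh, this is   the plan")
    print(text)  # so, this is the plan
    print(to_markdown(text, {"model": pick_asr_model("en"), "language": "en"}))
```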

image-gen

Generates AI images via the `listenhub image create` CLI with `gemini-3-pro-image-preview` (pro) or `gemini-3.1-flash-image-preview` (flash) models, supporting 1K/2K/4K resolutions, multiple aspect ratios (16:9, 1:1, 9:16, plus 1:4/4:1/1:8/8:1 on flash), and up to 5 reference images via `--reference`. Saves outputs to `.listenhub/image-gen/YYYY-MM-DD-{jobId}/` with inline/download/both display modes.

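The documented limits can be checked locally before any generation request is made. A minimal sketch: the option names (`tier`, `resolution`, `ratio`) are hypothetical and are not the CLI's flags, the ratio sets are copied from the list above (which may not be exhaustive), and it assumes every resolution is available on both models.

```python
from datetime import date
from pathlib import Path
from typing import Sequence

MODELS = {
    "pro": "gemini-3-pro-image-preview",
    "flash": "gemini-3.1-flash-image-preview",
}
RESOLUTIONS = {"1K", "2K", "4K"}
COMMON_RATIOS = {"16:9", "1:1", "9:16"}
FLASH_EXTRA_RATIOS = {"1:4", "4:1", "1:8", "8:1"}  # flash only
MAX_REFERENCES = 5


def validate_request(tier: str, resolution: str, ratio: str,
                     references: Sequence[str] = ()) -> str:
    """Check options against the limits listed above; return the model id."""
    if tier not in MODELS:
        raise ValueError(f"unknown tier {tier!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution {resolution!r}")
    allowed = COMMON_RATIOS | (FLASH_EXTRA_RATIOS if tier == "flash" else set())
    if ratio not in allowed:
        raise ValueError(f"ratio {ratio} is not available on {tier}")
    if len(references) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} reference images")
    return MODELS[tier]


def output_dir(job_id: str, day: date,
               root: Path = Path(".listenhub/image-gen")) -> Path:
    """Build the .listenhub/image-gen/YYYY-MM-DD-{jobId}/ directory path."""
    return root / f"{day.isoformat()}-{job_id}"


if __name__ == "__main__":
    print(validate_request("flash", "2K", "1:8", ["ref.png"]))
    print(output_dir("job42", date(2025, 3, 1)).as_posix())  # .listenhub/image-gen/2025-03-01-job42
```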

podcast

Generate podcast episodes (1–2 AI speakers) from a topic, URL, or text via the `listenhub` CLI in three modes — Quick (brief overview), Deep (in-depth analysis), and Debate (two-speaker argument). Triggers on Chinese/English keywords (`podcast`, `播客`, `录一期节目`, `debate`, etc.), uses AskUserQuestion for every choice, defaults to 2 speakers, auto-detects language, and only calls generation APIs after the user confirms the summary.

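The trigger, default, and confirmation rules can be sketched as a small planning object. Everything here is hypothetical: the `PodcastPlan` type, the CJK-based language guess, and the `generate` stub, which only returns a description instead of calling the `listenhub` CLI.

```python
import re
from dataclasses import dataclass

# Partial trigger list taken from the description; the real skill recognizes more.
TRIGGERS = ("podcast", "播客", "录一期节目", "debate")
MODES = {"quick", "deep", "debate"}


def is_podcast_request(message: str) -> bool:
    lowered = message.lower()
    return any(trigger in lowered for trigger in TRIGGERS)


def detect_language(text: str) -> str:
    """Crude heuristic: any CJK ideograph means Chinese, otherwise English."""
    return "zh" if re.search(r"[\u4e00-\u9fff]", text) else "en"


@dataclass
class PodcastPlan:
    source: str          # topic, URL, or pasted text
    mode: str            # "quick", "deep", or "debate"
    speakers: int = 2    # the skill defaults to two speakers
    confirmed: bool = False

    def validate(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"unknown mode {self.mode!r}")
        if self.speakers not in (1, 2):
            raise ValueError("episodes use one or two speakers")
        if self.mode == "debate" and self.speakers != 2:
            raise ValueError("debate needs two speakers")


def generate(plan: PodcastPlan) -> str:
    plan.validate()
    if not plan.confirmed:
        # Generation is gated on the user confirming the summary.
        raise RuntimeError("show the summary and wait for confirmation first")
    return f"would generate a {plan.mode} episode with {plan.speakers} speaker(s)"
```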

explainer

A skill that creates explainer videos combining narrator voiceover with AI-generated visuals. Supports text-only script generation or full text + video output for product introductions and tutorials.


content-parser

A skill that extracts and parses content from URLs, returning structured data including body, metadata, and references. Useful as a preprocessing step for content generation via the ListenHub API.


creator

A `listenhub`-driven creator workflow that turns a topic, URL, text, or audio into a platform-ready content package (WeChat article, Xiaohongshu post, or narration script) with article text, images, and metadata. It enforces one-question-at-a-time AskUserQuestion prompts, explicit confirmation gates, and UI-language mirroring, and checks `listenhub auth`/`LISTENHUB_API_KEY` before making remote calls.

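The credential check and confirmation gate can be combined into one guard around every remote call. A minimal sketch, assuming the only credential test is whether `LISTENHUB_API_KEY` is non-empty; it does not model the login state that `listenhub auth` manages, and `run_remote_step` is a hypothetical helper.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def has_api_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when LISTENHUB_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("LISTENHUB_API_KEY"))


def run_remote_step(action: Callable[[], T], confirmed: bool,
                    env: Optional[Mapping[str, str]] = None) -> T:
    """Run a remote call only after the credential check and the user's confirmation."""
    if not has_api_key(env):
        raise PermissionError("set LISTENHUB_API_KEY or run `listenhub auth` first")
    if not confirmed:
        raise RuntimeError("confirmation gate not passed")
    return action()
```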

music

Generates original AI music from text prompts or creates cover versions from reference audio, supporting custom styles, titles, and instrumental-only options through the ListenHub CLI.

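The two entry points (a prompt-based original and a reference-based cover) can be modeled as one request type. This is only a sketch: the field names mirror the options listed above but are hypothetical, not the ListenHub CLI's flags, and the rule that only originals require a prompt is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MusicRequest:
    prompt: str = ""
    style: Optional[str] = None
    title: Optional[str] = None
    instrumental: bool = False
    reference_audio: Optional[str] = None  # a reference track turns the request into a cover

    @property
    def kind(self) -> str:
        return "cover" if self.reference_audio else "original"

    def validate(self) -> None:
        # Assumption: an original needs a prompt; a cover can rely on its reference audio.
        if self.kind == "original" and not self.prompt.strip():
            raise ValueError("an original track needs a text prompt")
```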

listenhub-cli

A router skill that identifies user intent and delegates to specialized ListenHub skills for podcasts, explainer videos, slides, text-to-speech, image generation, music creation, content extraction, and audio transcription.

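Intent routing of this kind can be approximated with an ordered keyword table. This is a deliberately naive sketch: the keywords are invented for illustration, plain substring matching will misfire on some inputs, and the real router may classify intent quite differently.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical keyword table; the first match wins, so order encodes priority.
ROUTES: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("podcast", ("podcast", "播客", "debate")),
    ("explainer", ("explainer",)),
    ("slides", ("slide", "presentation")),
    ("tts", ("read aloud", "text-to-speech", "text to speech")),
    ("image-gen", ("image", "picture", "illustration")),
    ("music", ("song", "music")),
    ("content-parser", ("extract", "parse")),
    ("asr", ("transcribe", "transcript")),
)


def route(message: str) -> Optional[str]:
    """Return the name of the specialized skill to delegate to, or None."""
    lowered = message.lower()
    for skill, keywords in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return skill
    return None
```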

slides

Generates slide decks with AI-created visuals from topics, URLs, or text input, with optional audio narration support — ideal for presentations, summaries, and visual storytelling.


cola-avatar-pack

Generates pixel-art self-portraits, profile cards, emoji GIFs, and meme stickers for the Cola AI companion, with automatic mood-based expression selection and language-adaptive interactions in Chinese and English.


video-gen

An AI video generation skill that supports text-to-video, image animation (first-frame/last-frame), and reference-guided video creation, using models such as Seedance to produce short-form video content.

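The three input modes differ only in which images accompany the prompt. A minimal sketch of that distinction, assuming a hypothetical `VideoRequest` type; the rules that a last frame requires a first frame and that frames and reference images are not mixed are assumptions, and no Seedance or ListenHub API is called.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoRequest:
    prompt: str
    first_frame: Optional[str] = None   # image to start from
    last_frame: Optional[str] = None    # optional image to end on
    references: List[str] = field(default_factory=list)

    def mode(self) -> str:
        """Infer which of the three generation modes the inputs describe."""
        if self.last_frame and not self.first_frame:
            raise ValueError("a last frame needs a first frame")  # assumption
        if self.first_frame:
            if self.references:
                raise ValueError("pick frames or references, not both")  # assumption
            return "image-animation"
        if self.references:
            return "reference-guided"
        return "text-to-video"
```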
