16 results for tag "ai-multimodal"
Processes and generates multimedia content using the Google Gemini API, including audio transcription, image analysis, video processing, and document extraction across multiple formats.
Processes and generates multimedia content using the Google Gemini API, including audio analysis, image understanding, video processing, and document extraction with enhanced vision capabilities.
A skill for multimodal AI processing via the Google Gemini API with a 2M-token context window, supporting audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, and image generation.