I Built an Open Source MCP Server to Unlock YouTube for AI Agents

An open-source MCP server that lets Claude and other AI agents analyze YouTube videos using Google's Gemini API: summaries, Q&A, transcripts, and frame extraction, no downloading required.

February 18, 2026 | Aditya Bawankule

Tags: MCP, TypeScript, Gemini, Open Source, AI Agents

I just built and open-sourced an MCP server that lets Claude and other AI agents analyze YouTube videos using Google's Gemini API. YouTube holds an enormous amount of knowledge (tutorials, talks, interviews, walkthroughs), and most of it is inaccessible to AI agents that can only read text. This server bridges that gap.

What It Does

Pass any YouTube URL and you can:

Summarize videos: brief, medium, or detailed with timestamps
Ask specific questions about the video content
Extract screenshots and frames at specific moments
Get a full transcript for any video
Search within a video for specific topics or moments

It works with Claude Code, Claude Desktop, and any other client that supports MCP.
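
To make the "pass any YouTube URL" step concrete, here is a minimal sketch of the kind of input handling a server like this might do before calling Gemini: pull the video ID out of the common URL shapes and pick a prompt for the requested summary level. The names extractVideoId and buildSummaryPrompt, the accepted URL shapes, and the prompt wording are all assumptions for illustration, not code from the project.

```typescript
// Hypothetical helpers: names, URL shapes, and prompt wording are illustrative only.
type SummaryDetail = "brief" | "medium" | "detailed";

/** Return the 11-character video ID from a YouTube URL, or null if none is found. */
function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable URL
  }
  // Treat www.youtube.com and m.youtube.com like youtube.com.
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let id: string | null = null;
  if (host === "youtu.be") {
    // Short links carry the ID as the first path segment.
    id = url.pathname.slice(1).split("/")[0] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      id = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(shorts|embed|live)\/([^/]+)/);
      id = match ? match[2] ?? null : null;
    }
  }
  return id !== null && /^[A-Za-z0-9_-]{11}$/.test(id) ? id : null;
}

/** Map a summary level to the instruction sent along with the video. */
function buildSummaryPrompt(detail: SummaryDetail): string {
  const instructions: Record<SummaryDetail, string> = {
    brief: "Summarize this video in two or three sentences.",
    medium: "Summarize this video in a few short paragraphs covering the main points.",
    detailed:
      "Write a detailed, section-by-section summary of this video with a MM:SS timestamp for each section.",
  };
  return instructions[detail];
}
```

Rejecting anything that does not look like a YouTube link up front means a bad input fails fast with a clear error instead of costing a model call.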

The Interesting Technical Detail

Gemini can analyze YouTube URLs directly. No downloading. No transcription APIs. No chunking video into audio files and piping them through Whisper. You just pass the URL and Gemini handles the rest.
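
As a rough illustration of what "just pass the URL" looks like, the sketch below builds a generateContent request for the Gemini REST API with the YouTube link supplied as a file_data part next to a text prompt. The helper names (buildVideoRequest, askGeminiAboutVideo), the FetchLike type, and the default model name are assumptions for this example; the project's own code may be structured differently or use Google's SDK.

```typescript
// A sketch under stated assumptions: helper names and the default model are
// illustrative and not taken from the project.
interface GeminiPart {
  text?: string;
  file_data?: { file_uri: string };
}
interface GeminiRequest {
  contents: { parts: GeminiPart[] }[];
}

/** Build the request body: the video URL goes in as a file part, followed by the prompt. */
function buildVideoRequest(youtubeUrl: string, prompt: string): GeminiRequest {
  return {
    contents: [{ parts: [{ file_data: { file_uri: youtubeUrl } }, { text: prompt }] }],
  };
}

// Minimal shape of fetch used here, so the call can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function askGeminiAboutVideo(
  fetchImpl: FetchLike,
  apiKey: string,
  youtubeUrl: string,
  prompt: string,
  model = "gemini-2.5-flash", // assumed model name; use whichever video-capable model you have access to
): Promise<string> {
  const endpoint = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
  const response = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildVideoRequest(youtubeUrl, prompt)),
  });
  if (!response.ok) {
    throw new Error(`Gemini request failed with HTTP ${response.status}`);
  }
  const data = (await response.json()) as {
    candidates?: { content?: { parts?: { text?: string }[] } }[];
  };
  // Join the text parts of the first candidate into one answer string.
  const parts = data.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}
```

In Node 18 or later the global fetch can be passed as fetchImpl, for example askGeminiAboutVideo(fetch, apiKey, url, "Summarize this video"). Nothing is downloaded or transcribed on the client side; the request only carries the link and the prompt.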

Most other AI models can't do this. Many are text-only or accept only images, and they have no native way to take a YouTube link and consume the video behind it. This MCP server proxies that capability, making it available to Claude and any other MCP-compatible agent.

The result is that Claude Code can now answer questions like "what does this conference talk say about distributed systems?" or "summarize this tutorial and give me the key steps" without you having to copy-paste a transcript first.
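
The proxying idea can be sketched without any dependencies. MCP is built on JSON-RPC 2.0: a client asks the server which tools it offers (tools/list) and then invokes one (tools/call), and the server answers with text content. The handler below forwards a tool call to an injected video analyzer, which in practice would wrap the Gemini request. The tool name ask_video, its input schema, and the handleRequest and VideoAnalyzer names are hypothetical; a real server would more likely use an MCP SDK and a stdio transport rather than hand-rolled message handling.

```typescript
// Dependency-free sketch of the message shapes; tool name and schema are hypothetical.
type VideoAnalyzer = (youtubeUrl: string, prompt: string) => Promise<string>;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: { tools?: unknown[]; content?: { type: "text"; text: string }[]; isError?: boolean };
  error?: { code: number; message: string };
}

const ASK_VIDEO_TOOL = {
  name: "ask_video",
  description: "Answer a question about a YouTube video.",
  inputSchema: {
    type: "object",
    properties: {
      url: { type: "string", description: "YouTube video URL" },
      question: { type: "string", description: "What to ask about the video" },
    },
    required: ["url", "question"],
  },
};

async function handleRequest(request: JsonRpcRequest, analyze: VideoAnalyzer): Promise<JsonRpcResponse> {
  const { id, method, params } = request;
  if (method === "tools/list") {
    return { jsonrpc: "2.0", id, result: { tools: [ASK_VIDEO_TOOL] } };
  }
  if (method !== "tools/call") {
    return { jsonrpc: "2.0", id, error: { code: -32601, message: `Method not found: ${method}` } };
  }
  if (params?.name !== ASK_VIDEO_TOOL.name) {
    return { jsonrpc: "2.0", id, error: { code: -32602, message: "Unknown tool" } };
  }
  const args = (params?.arguments ?? {}) as { url?: unknown; question?: unknown };
  if (typeof args.url !== "string" || typeof args.question !== "string") {
    return { jsonrpc: "2.0", id, error: { code: -32602, message: "url and question must be strings" } };
  }
  try {
    // This is the proxy step: the agent's question is forwarded to the video model.
    const answer = await analyze(args.url, args.question);
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text: answer }] } };
  } catch (err) {
    // Tool failures are reported inside the result so the calling agent can read them.
    const message = err instanceof Error ? err.message : String(err);
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text: message }], isError: true } };
  }
}
```

Injecting the analyzer keeps the protocol handling separate from the Gemini call, so the same handler can be tested with a stub and the video backend can be swapped without touching the tool definition.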

Tech Stack

TypeScript: the whole server is typed end to end
Anthropic's Model Context Protocol (MCP): the standard for giving AI agents access to external tools
Google Gemini API: handles the multimodal video understanding

Get It

The code is on GitHub. PRs and issues are welcome. If you find it useful or have ideas for what else it should support, I'd love to hear from you. I've also built AGINEAR, an MCP server for markdown-native project management. It follows the same pattern: giving AI agents structured tool access instead of raw file parsing.