I Built an Open Source MCP Server to Unlock YouTube for AI Agents

SYS.BLOG

An open-source MCP server that lets Claude and other AI agents analyze YouTube videos using Google's Gemini API: summaries, Q&A, transcripts, and frame extraction, no downloading required.

Aditya Bawankule
MCP, TypeScript, Gemini, Open Source, AI Agents

I just built and open-sourced an MCP server that lets Claude and other AI agents analyze YouTube videos using Google's Gemini API. YouTube holds an enormous amount of knowledge (tutorials, talks, interviews, walkthroughs), most of it inaccessible to AI agents that can only read text. This server bridges that gap.


What It Does

Pass any YouTube URL and you can:

  • Summarize videos: brief, medium, or detailed with timestamps
  • Ask specific questions about the video content
  • Extract screenshots and frames at specific moments
  • Get a full transcript for any video
  • Search within a video for specific topics or moments

It works with both Claude Code and Claude Desktop, anywhere that supports MCP.
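To make the list above concrete, here is a rough sketch of how those five capabilities could be advertised as MCP tools. An MCP tool is described by a `name`, a `description`, and a JSON Schema `inputSchema`; everything else here is an assumption for illustration. The tool names, parameter names, and the grouping into exactly five tools are hypothetical and may not match the actual server.

```typescript
// One property inside a tool's input schema (small JSON Schema subset).
interface SchemaProp {
  type: string;
  description: string;
  enum?: string[];
}

// The fields an MCP tool listing exposes for each tool.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, SchemaProp>;
    required: string[];
  };
}

// Every tool takes the video URL, so the property is defined once.
const urlProp: SchemaProp = { type: "string", description: "Full YouTube video URL" };

// Hypothetical tool names and parameters, one per capability in the list.
const tools: ToolDescriptor[] = [
  {
    name: "summarize_video",
    description: "Summarize a video at a chosen level of detail",
    inputSchema: {
      type: "object",
      properties: {
        url: urlProp,
        detail: {
          type: "string",
          description: "How much detail to include",
          enum: ["brief", "medium", "detailed"],
        },
      },
      required: ["url"],
    },
  },
  {
    name: "ask_video",
    description: "Answer a specific question about the video content",
    inputSchema: {
      type: "object",
      properties: { url: urlProp, question: { type: "string", description: "Question to answer" } },
      required: ["url", "question"],
    },
  },
  {
    name: "extract_frames",
    description: "Extract a frame at a specific moment",
    inputSchema: {
      type: "object",
      properties: { url: urlProp, timestamp: { type: "string", description: "Moment in MM:SS form" } },
      required: ["url", "timestamp"],
    },
  },
  {
    name: "get_transcript",
    description: "Return a full transcript of the video",
    inputSchema: { type: "object", properties: { url: urlProp }, required: ["url"] },
  },
  {
    name: "search_video",
    description: "Find moments in the video that match a topic",
    inputSchema: {
      type: "object",
      properties: { url: urlProp, query: { type: "string", description: "Topic or phrase to find" } },
      required: ["url", "query"],
    },
  },
];
```

An MCP client such as Claude Code reads descriptors like these when it lists the server's tools, which is how the agent knows what it can ask for and which arguments each call needs.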


The Interesting Technical Detail

Gemini can analyze YouTube URLs directly. No downloading. No transcription APIs. No chunking video into audio files and piping them through Whisper. You just pass the URL and Gemini handles the rest.
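As a minimal sketch of what "just pass the URL" looks like, the function below builds a Gemini `generateContent` REST request where the YouTube URL goes into a `file_data.file_uri` part next to the text prompt. This is an illustration under assumptions, not the server's actual code: the helper name `buildVideoRequest`, the `HttpRequest` shape, and the default model name are all placeholders I picked for the example.

```typescript
// One part of a Gemini message: either a text prompt or a file reference.
interface GeminiPart {
  text?: string;
  file_data?: { file_uri: string };
}

interface GeminiRequestBody {
  contents: { parts: GeminiPart[] }[];
}

// Plain description of the HTTP call, so it can be inspected before sending.
interface HttpRequest {
  endpoint: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: placeholder model name; substitute any Gemini model that accepts video input.
const DEFAULT_MODEL = "gemini-2.5-flash";

// Hypothetical helper: pairs a prompt with a YouTube URL in a single request.
function buildVideoRequest(
  youtubeUrl: string,
  prompt: string,
  apiKey: string,
  model: string = DEFAULT_MODEL,
): HttpRequest {
  const body: GeminiRequestBody = {
    contents: [
      {
        parts: [
          { text: prompt },
          // The video is referenced by URL; nothing is downloaded locally.
          { file_data: { file_uri: youtubeUrl } },
        ],
      },
    ],
  };
  return {
    endpoint: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(body),
  };
}
```

The returned object can be handed to `fetch(req.endpoint, { method: req.method, headers: req.headers, body: req.body })`. Keeping request construction separate from the network call makes the payload easy to log and test.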

Most other AI models can't do this: they have no native way to take a video URL as input and consume the content behind it. This MCP server proxies that capability, making it available to Claude and any other MCP-compatible agent.
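The proxying idea can be sketched as a small dispatcher: an incoming tool call is turned into a prompt, the prompt and URL are forwarded to Gemini, and the reply comes back as an MCP text result. This is a simplified sketch rather than the server's real handler. The tool names, the `AnalyzeVideo` function type, and the prompt wording are hypothetical; only the result shape (`content` items with `type: "text"` and an optional `isError` flag) follows the MCP tool result format.

```typescript
// MCP tool results carry a list of content items; only text is used here.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical dependency: sends a prompt plus a video URL to Gemini
// and resolves to the model's text reply.
type AnalyzeVideo = (youtubeUrl: string, prompt: string) => Promise<string>;

interface ToolCall {
  name: string;
  arguments: Record<string, string>;
}

// Map a tool call to a prompt; undefined means unknown tool or missing argument.
function promptFor(call: ToolCall): string | undefined {
  switch (call.name) {
    case "summarize_video": {
      const detail = call.arguments["detail"] ?? "medium";
      return `Summarize this video at a ${detail} level of detail, with timestamps.`;
    }
    case "ask_video": {
      const question = call.arguments["question"];
      return question ? `Answer this question about the video: ${question}` : undefined;
    }
    default:
      return undefined;
  }
}

// Forward the call to Gemini and wrap the reply (or the failure) as an MCP result.
async function handleToolCall(call: ToolCall, analyze: AnalyzeVideo): Promise<ToolResult> {
  const prompt = promptFor(call);
  const url = call.arguments["url"];
  if (prompt === undefined || !url) {
    return {
      content: [{ type: "text", text: `Unsupported or incomplete call: ${call.name}` }],
      isError: true,
    };
  }
  try {
    const text = await analyze(url, prompt);
    return { content: [{ type: "text", text }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: "text", text: `Gemini request failed: ${message}` }], isError: true };
  }
}
```

Passing `analyze` in as a parameter keeps the dispatcher independent of any particular Gemini client, so it can be exercised with a stub in tests and wired to the real API in production.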

The result is that Claude Code can now answer questions like "what does this conference talk say about distributed systems?" or "summarize this tutorial and give me the key steps" without you having to copy-paste a transcript first.


Tech Stack

  • TypeScript: the whole server is typed end to end
  • Anthropic's Model Context Protocol (MCP): the standard for giving AI agents access to external tools
  • Google Gemini API: handles the multimodal video understanding

Get It

The code is on GitHub. PRs and issues are welcome. If you find it useful or have ideas for what else it should support, I'd love to hear from you. I've also built AGINEAR, an MCP server for markdown-native project management. It follows the same pattern: giving AI agents structured tool access instead of raw file parsing.

FREQUENTLY ASKED QUESTIONS

Is there an AI that can analyze YouTube videos?

Yes, the YouTube MCP server lets Claude analyze any YouTube video by extracting transcripts, frames, and metadata. You can ask questions about video content, search within videos, and get AI-powered summaries without watching the entire video yourself.

Can ChatGPT or Claude analyze a YouTube video?

Claude can analyze YouTube videos when connected to a YouTube MCP server. The server passes the video to Gemini for transcripts, frames, and multimodal analysis, so Claude can answer questions about both spoken content and what appears on screen. Any other MCP-compatible agent can use the server the same way.

What is the best AI summarizer for YouTube?

An MCP server approach is more flexible than dedicated YouTube summarizer tools because it lets you ask follow-up questions and search within videos. Claude with the YouTube MCP server can summarize, extract key points, and answer specific questions about any video.
