Gobbler: Extending AI Agent Reach | Work

The Problem

I can access a lot of information—YouTube videos, PDFs, web pages behind logins, audio recordings, my NotebookLM research. But my AI agents can’t. They’re limited to what fits in a context window or what I manually copy-paste.

The gap between what I can access and what my agents can access is a bottleneck. Every time I need Claude to reason about a video transcript, a research paper, or a web page, I’m the slow middleware doing format conversion by hand.

Gobbler closes that gap. It makes all the information I have access to accessible to my AI agents.

The Approach

Gobbler gives AI agents the same content access I have. When I can watch a YouTube video, my agent can get its transcript. When I can read a PDF, my agent can too. When I’m logged into a web app, my agent can extract content from that authenticated session.

gobbler youtube "https://youtube.com/watch?v=..." -o transcript.md
gobbler document report.pdf -o report.md
gobbler audio meeting.mp3 -o meeting.md
gobbler webpage "https://docs.example.com" -o docs.md
gobbler browser extract  # From my authenticated browser session

Everything converts to markdown with YAML frontmatter—the format AI agents work best with:

---
source: https://youtube.com/watch?v=VIDEO_ID
type: youtube_transcript
title: "Video Title"
duration: 847
word_count: 2341
---

# Video Title

Content here, ready for AI reasoning...

One output format. Consistent metadata. Preserved provenance.

How Agents Access Gobbler

Gobbler exposes capabilities through three patterns, depending on how the agent prefers to work:

MCP Protocol — For Claude Code and Claude Desktop. The agent calls Gobbler tools directly:

claude mcp add gobbler-mcp -- uv --directory /path/to/gobbler run gobbler-mcp

Skills — Markdown instruction files that teach Claude how to use the CLI. Progressive disclosure keeps context windows efficient—Claude only loads full instructions when triggered:

skills/
├── gobbler-youtube/     # YouTube transcription
├── gobbler-audio/       # Audio/video transcription  
├── gobbler-document/    # Document conversion
├── gobbler-webpage/     # Web scraping
├── gobbler-browser/     # Browser automation + AI chat integrations
└── gobbler-setup/       # Installation and troubleshooting

CLI — Direct command-line usage. Agents with shell access can call Gobbler directly:

gobbler youtube URL              # YouTube transcripts
gobbler audio FILE               # Audio/video transcription
gobbler document FILE            # PDF, DOCX, PPTX, XLSX
gobbler webpage URL              # Web pages (JS-rendered)

The Browser Bridge

The hardest content to access is behind authentication—my NotebookLM research, internal documentation, web apps I’m logged into. Gobbler’s browser extension bridges this gap.

The extension creates a WebSocket connection between my browser and Gobbler. My agents can extract content from pages I’m logged into, query my NotebookLM notebooks, even interact with other AI interfaces:

gobbler browser extract          # Extract current page
gobbler notebooklm query "..."   # Query my NotebookLM research
gobbler chatgpt query "..."      # Send to ChatGPT
gobbler claude query "..."       # Send to Claude.ai

Security model: Only tabs I explicitly add to a “Gobbler” tab group are accessible. No accidental access to banking, email, or anything I haven’t opted in. The agent can only reach what I’ve explicitly shared.

Pluggable Backends

Different situations call for different tradeoffs. Gobbler’s provider system lets me swap backends without changing how I (or my agents) use it:

Category	Provider	Tradeoff
Transcription	`whisper-local`	Free, private, runs locally
Transcription	`openai-whisper`	Faster, costs money
Document	`docling`	Local Docker service
Webpage	`crawl4ai`	JavaScript rendering via Docker

# Privacy-first: local transcription
gobbler audio meeting.mp3

# Speed-first: API transcription  
gobbler audio meeting.mp3 --provider openai-whisper

Why This Matters

The value of AI agents scales with what they can access. An agent that can only see what I paste into a chat window is limited. An agent that can reach into my YouTube research, my PDFs, my authenticated browser sessions—that’s an agent that can actually help with my real work.

Gobbler is infrastructure for agent capability. The more content types it can convert, the more useful my agents become.

Open Source

MIT licensed. Built on Crawl4AI, Docling, faster-whisper, and youtube-transcript-api.

GitHub →