Intelligently fetch web page content using a browser engine.
Built for AI agents to consume web content efficiently.
Modern AI agents need web content in clean, token-efficient formats. snag solves this by:
- Markdown output - AI models work better with markdown than HTML (70% fewer tokens)
- Real browser rendering - Handles JavaScript, dynamic content, lazy loading automatically
- Authentication support - Access private/authenticated pages through persistent browser sessions
- Tab management - List, select, and reuse existing browser tabs without creating new ones
- Content archival - Build reference libraries of web content for future AI agent use
- Simple CLI interface - One command, clean output, no complex automation scripts
Perfect for:
- AI coding assistants fetching documentation
- Building knowledge bases from authenticated sites
- Capturing dynamic web content for analysis
- Piping web content into AI processing pipelines
- Taking page screenshots for CSS/style analysis
# Install via Homebrew
brew tap grantcarthew/tap
brew install grantcarthew/tap/snag
# Fetch a page as Markdown (default format)
snag example.com
# Save to file
snag docs.example.com > docs.md
That's it! snag auto-detects your Chromium-based browser and handles everything else.
snag requires a Chromium-based browser such as Chrome or Chromium:
Linux:
# Ubuntu/Debian
sudo apt update && sudo apt install chromium-browser
# Fedora
sudo dnf install chromium
# Arch Linux
sudo pacman -Syu chromium
# Homebrew
brew install chromium
macOS:
# Chromium (recommended) via Homebrew
brew install chromium
# Or Chrome: download from https://www.google.com/chrome/
Supported browsers: Chrome, Chromium, Microsoft Edge, Brave, and other Chromium-based browsers
Homebrew (Linux/macOS):
Note: There's a name conflict with an older deprecated tool. Use the full tap name:
brew tap grantcarthew/tap
brew install grantcarthew/tap/snag
Go Install:
go install github.com/grantcarthew/snag@latest
Build from Source:
git clone https://github.com/grantcarthew/snag.git
cd snag
go build
./snag --version
Usage:
# Fetch page as Markdown (default)
snag example.com
snag https://example.com
# Save to file
snag -o output.md https://example.com
snag example.com > output.md
# Get raw HTML instead
snag --format html https://example.com
# Get plain text only (strips all HTML)
snag --format text https://example.com
# Quiet mode (content only, no logs)
snag --quiet https://example.com
# Get page metadata as JSON
snag --info https://example.com
# Wait for dynamic content to load
snag --wait-for ".content-loaded" https://dynamic-site.com
# Increase timeout for slow sites
snag --timeout 60 https://slow-site.com
# Verbose logging for debugging
snag --verbose https://example.com
snag supports 5 output formats for different use cases. Format names are case-insensitive and support aliases for convenience.
Markdown (default):
Clean, readable text format optimized for AI agents and documentation. Uses 70% fewer tokens than HTML.
# Default format (no flag needed)
snag https://example.com
# Explicit format
snag --format md https://example.com
# Alias also works (backward compatibility)
snag --format markdown https://example.com
# Case-insensitive
snag --format MD https://example.com
snag --format Markdown https://example.com
HTML:
Raw HTML output, preserving original page structure.
# Get raw HTML
snag --format html https://example.com
# Case-insensitive
snag --format HTML https://example.com
Text:
Plain text only, strips all HTML tags and formatting.
# Extract plain text
snag --format text https://example.com
# Alias also works
snag --format txt https://example.com
# Case-insensitive
snag --format TEXT https://example.com
For binary formats, snag automatically generates filenames so that binary data never reaches the terminal. Files are saved to the current directory unless you specify a location.
PDF:
Visual rendering as a PDF document using Chrome's native rendering engine.
# Auto-generates filename in current directory
snag --format pdf https://example.com
# Creates: 2025-10-22-142033-example-domain.pdf
# Specify custom filename
snag --format pdf -o report.pdf https://example.com
# Save to specific directory with auto-generated name
snag --format pdf -d ~/Downloads https://example.com
# Creates: ~/Downloads/2025-10-22-142033-example-domain.pdf
# Case-insensitive
snag --format PDF https://example.com
PNG:
Full-page screenshot as a PNG image.
# Auto-generates filename in current directory
snag --format png https://example.com
# Creates: 2025-10-22-142033-example-domain.png
# Specify custom filename
snag --format png -o screenshot.png https://example.com
# Save to specific directory with auto-generated name
snag --format png -d ~/screenshots https://example.com
# Creates: ~/screenshots/2025-10-22-142033-example-domain.png
# Case-insensitive
snag --format PNG https://example.com
Why auto-generate filenames?
Binary formats (PDF, PNG) are not written to stdout because raw binary data corrupts the terminal display. When you don't specify -o or -d, snag automatically generates a timestamped filename in the current directory.
Auto-generated filename format:
yyyy-mm-dd-hhmmss-{page-title-slug}.{ext}
Example: 2025-10-22-142033-github-snag-repo.png
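To predict the output path in a script, the documented `yyyy-mm-dd-hhmmss-{page-title-slug}.{ext}` pattern can be approximated in POSIX shell. This is a sketch, not snag's code: the slug rules (lowercase, runs of non-alphanumeric characters collapsed to `-`, leading and trailing dashes trimmed) are an assumption that happens to reproduce the `example-domain` slug shown for `Example Domain`, and `slugify` and `snag_filename` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helpers approximating snag's auto-generated filenames.
# The slug rules are an assumption; snag's own implementation may differ.

# Lowercase the title, collapse runs of non-alphanumerics to "-",
# then trim a leading or trailing dash.
slugify() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# Build yyyy-mm-dd-hhmmss-{slug}.{ext}. An optional third argument
# fixes the timestamp, which makes the output reproducible.
snag_filename() {
  local title ext stamp
  title="$1"
  ext="$2"
  stamp="${3:-$(date +%Y-%m-%d-%H%M%S)}"
  printf '%s-%s.%s\n' "$stamp" "$(slugify "$title")" "$ext"
}

# Usage:
# snag_filename "Example Domain" pdf 2025-10-22-142033
# prints: 2025-10-22-142033-example-domain.pdf
```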
Get page metadata as JSON for automation scripts. Useful for extracting page titles, generating directory names, or building indexes.
# Get page metadata as JSON
snag --info https://example.com
snag -i https://example.com
# Output:
# {
#   "title": "Example Domain",
#   "url": "https://example.com/",
#   "domain": "example.com",
#   "slug": "example-domain",
#   "timestamp": "2025-02-04T14:30:22+10:00"
# }
# Save info to file
snag --info -o page-info.json https://example.com
# Get info from existing tab
snag --info --tab 1
snag -i -t "github"
# Use with jq for scripting
info=$(snag -i example.com)   # fetch once, reuse for several fields
title=$(printf '%s\n' "$info" | jq -r '.title')
slug=$(printf '%s\n' "$info" | jq -r '.slug')
JSON fields:
| Field | Description |
|---|---|
| title | Page title from `<title>` tag |
| url | Final URL after redirects |
| domain | Domain name (without www. prefix) |
| slug | URL-safe slug from title (for filenames) |
| timestamp | ISO 8601 timestamp of fetch |
Notes:
- `--info` is mutually exclusive with `--format` (always outputs JSON)
- Output is quiet by default (no log messages, only JSON)
- Use `--verbose` to see log messages alongside JSON output
- Only supports a single URL or `--tab` (not multiple URLs or `--all-tabs`)
# Fetch API documentation for AI context
snag https://api.example.com/docs > api-reference.md
# Pipe directly to AI assistant
snag --quiet https://docs.python.org/3/library/os.html | your-ai-tool
# Save multiple pages to a reference directory
snag -o reference/golang-basics.md https://go.dev/doc/tutorial/getting-started
snag -o reference/golang-concurrency.md https://go.dev/doc/effective_go#concurrency
snag -o reference/golang-errors.md https://go.dev/blog/error-handling-and-go
# Wait for JavaScript to render content
snag --wait-for "#main-content" https://single-page-app.com
# Give slow sites more time
snag --timeout 90 --wait-for ".loaded" https://heavy-site.com
# Step 1: Open browser and log in to your sites
snag --open-browser
# (Manually log in to your private sites in the browser window)
# Step 2: List tabs to see what's available
snag --list-tabs
# Example output:
# Available tabs in browser (4 tabs, sorted by URL):
# [1] about:blank (New Tab)
# [2] https://app.example.com/dashboard (Dashboard)
# [3] https://github.com/myorg/private-repo (My Private Repo)
# [4] https://internal.company.com/docs (Internal Documentation)
# Step 3: Fetch from authenticated tabs without re-logging in
snag -t 2 -o private-repo.md
snag -t "dashboard" -o dashboard.md
snag -t "internal" -o internal-docs.md
# All fetches reuse the existing authenticated session!
# Collect documentation from tabs you already have open
snag -t "python" > python-docs.md
snag -t "golang" > golang-docs.md
snag -t "rust" > rust-docs.md
# Use patterns to match specific tabs
snag -t "github.com/.*" > github-content.md
snag -t ".*/dashboard" > dashboard.md
# Fetch by index if you know the tab position
for i in 1 2 3 4; do
snag -t $i -o "tab-$i.md"
done
# Process all open tabs at once
snag --all-tabs --output-dir ~/my-tabs
snag -a -d ~/reference
# Combine --all-tabs with format options
snag --all-tabs --format pdf -d ~/pdfs
snag --all-tabs --format png -d ~/screenshots
# Process multiple URLs inline
snag -d output/ https://example.com https://github.com https://go.dev
# Process URLs from a file
snag --url-file urls.txt -d output/
# Pipe URLs from stdin
cat urls.txt | snag --url-file - -d output/
# Pipe filtered URLs
grep "^https://docs" urls.txt | snag --url-file - -d ./documentation/
# Using heredoc
snag --url-file - -d pages/ <<EOF
# My URLs
example.com
github.com/grantcarthew/snag
go.dev
EOF
# Process URLs from a file (shell loop alternative)
while IFS= read -r url; do
filename=$(echo "$url" | sed 's/[^a-zA-Z0-9]/_/g').md
snag --quiet -o "$filename" "$url"
done < urls.txt
# Combine multiple pages
for url in https://example.com/page1 https://example.com/page2; do
snag --quiet "$url" >> combined.md
printf '\n---\n\n' >> combined.md
done
# Fetch documentation in CI pipeline
snag --force-headless --timeout 30 https://docs.example.com > docs.md
# Quiet mode for clean logs
snag --quiet --force-headless https://example.com > output.md
snag makes it easy to fetch content from authenticated/private sites using persistent browser sessions.
Open a browser, authenticate manually, then snag connects to it:
# Step 1: Open browser in visible mode and log in manually
# Note: --open-browser (-b) launches the browser with the DevTools protocol (remote debugging) enabled, which snag needs to connect
snag --open-browser
# Step 2: In the browser window, navigate to your site and log in
# (Leave the browser open)
# Step 3: Fetch authenticated content - snag reuses your session
snag https://private.example.com
# Step 4: Fetch more pages with the same session
snag https://private.example.com/dashboard
snag https://private.example.com/settings
Let snag launch the browser for you:
# Open browser and navigate to page for authentication
snag --open-browser https://private.example.com
# Authenticate in the browser window that opens
# Then leave it running
# Subsequent calls reuse the session
snag https://private.example.com/other-page
Keep one browser session for multiple snag calls:
# Terminal 1: Start Chromium with remote debugging
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-profile
# Log in to your sites manually in this browser
# Terminal 2: Use snag with the existing session
snag https://authenticated-site1.com
snag https://authenticated-site2.com
snag https://authenticated-site3.com
All three commands share authentication state - no repeated logins required!
You can use your existing Chrome profile with all its saved logins and cookies:
Option A: Daily workflow - Use snag as your Chrome launcher
If you use snag regularly, you can make it your primary way to launch Chrome:
# Close your regular Chrome first, then launch via snag:
snag --open-browser --user-data-dir ~/.config/google-chrome # Linux
snag --open-browser --user-data-dir ~/.config/chromium # Linux Chromium
snag --open-browser --user-data-dir ~/Library/Application\ Support/Google/Chrome # macOS
# Now browse normally AND use snag for tab fetching:
snag --list-tabs
snag -t 1 # Fetch from any tab
snag https://example.com # Open new tabs
This gives you your full Chrome experience (bookmarks, extensions, history, passwords) PLUS snag's tab management capabilities!
Note: Google Chrome 136 and later no longer honor `--remote-debugging-port` when Chrome runs with its default data directory, so Option A may not work with recent Google Chrome releases. A non-default directory, as in Option C, is not affected.
Option B: One-off fetches with your profile
# Must close Chrome first!
snag --user-data-dir ~/.config/google-chrome \
https://private.example.com
Important caveats:
- Chrome must be closed - You cannot run both Chrome and snag with the same profile simultaneously. Chrome locks profile directories to prevent corruption.
- Risk of corruption - If something goes wrong, you could corrupt your primary profile data. Consider using a separate profile for automation.
- Profile structure - Chrome's `--user-data-dir` points to the parent directory containing multiple profiles (Default, Profile 1, etc.). Chrome uses the Default profile unless you specify otherwise.
Option C: Safer alternative - Use a dedicated profile for snag
# Create and use a dedicated profile for snag
snag --user-data-dir ~/.config/google-chrome/snag-profile \
--open-browser
# Authenticate once in the browser window
# Profile persists between runs - no need to re-authenticate!
# Subsequent fetches reuse the same profile
snag --user-data-dir ~/.config/google-chrome/snag-profile \
https://private.example.com
The dedicated profile approach gives you persistence without risking your main Chrome profile.
Bypass headless detection or mimic specific browsers:
# Linux Firefox user agent
snag --user-agent "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0" \
https://example.com
# Custom bot identifier
snag --user-agent "MyBot/1.0 (+https://example.com/bot)" \
https://api-docs.example.com
# See what's happening during fetch
snag --verbose https://problematic-site.com
# Full debug output including browser messages
snag --debug https://problematic-site.com 2> debug.log
# Open browser to see what snag sees
snag --open-browser https://problematic-site.com
snag can list and fetch content from existing browser tabs, making it easy to reuse authenticated sessions and reduce tab clutter.
List all open tabs:
# See what tabs are currently open
snag --list-tabs
snag -l
# Example output:
# Available tabs in browser (3 tabs, sorted by URL):
# [1] https://app.example.com/dashboard (Dashboard (authenticated))
# [2] https://docs.python.org/3/ (3.13.1 Documentation)
# [3] https://github.com/grantcarthew/snag (grantcarthew/snag: Intelligent web content fetcher)
Fetch from specific tab by index:
# Fetch from first tab
snag --tab 1
snag -t 1
# Fetch from third tab and save to file
snag -t 3 -o docs.md
# Get HTML instead of Markdown
snag -t 2 --format html
# Get as PDF or PNG
snag -t 3 --format pdf -o docs.pdf
snag -t 1 --format png -o screenshot.png
Fetch from tab by URL pattern:
# Exact URL match (case-insensitive)
snag -t "https://docs.python.org/3/"
snag -t "GITHUB.COM/grantcarthew/snag"
# Contains/substring match (processes ALL matching tabs if multiple)
snag -t "dashboard" # Outputs to stdout if 1 match, auto-saves all if multiple
snag -t "python" # Fetches all tabs containing "python"
snag -t "github" -d ./ # Saves all github tabs to current directory
# Regex pattern match (processes ALL matching tabs if multiple)
snag -t "https://.*\.com" # All .com URLs
snag -t ".*/dashboard" # All dashboard URLs
snag -t "(github|gitlab)\.com" # All github or gitlab tabs
Pattern matching behavior:
- Tries in order: exact URL match → contains match → regex match
- Single match: outputs to stdout (or to a file with `-o`)
- Multiple matches: auto-saves all tabs with generated filenames (use `-d` for a custom directory)
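The matching order above can be illustrated with `grep`. This POSIX shell sketch is a hypothetical emulation, not snag's implementation: `match_tabs` reads one URL per line on stdin, tries a case-insensitive exact match, then a substring match, then an extended regular expression, and prints every URL from the first stage that matches anything. Tab indexes (`-t 3`) are not handled, and because snag is written in Go its regex dialect may differ from POSIX ERE in some details.

```sh
#!/bin/sh
# Hypothetical emulation of the documented tab-pattern order:
# exact URL -> substring -> regex, all case-insensitive.
# Reads candidate URLs on stdin; prints all URLs from the first stage
# that matches anything. Returns 1 if no stage matches.
match_tabs() {
  local pattern="$1" urls hits mode
  urls=$(cat)
  # -xF: whole-line literal match; -F: literal substring; -E: extended regex.
  # $mode is intentionally unquoted so "-xF" reaches grep as options.
  for mode in -xF -F -E; do
    hits=$(printf '%s\n' "$urls" | grep -i $mode -- "$pattern") || true
    if [ -n "$hits" ]; then
      printf '%s\n' "$hits"
      return 0
    fi
  done
  return 1
}

# Usage:
# printf '%s\n' https://github.com/grantcarthew/snag https://go.dev |
#   match_tabs '(github|gitlab)\.com'
```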
Why use tabs?
- Reuse authenticated sessions without re-logging in
- Fetch from multiple pages without creating new tabs
- Quick access to content you already have open
Tab closing behavior:
# Close tab after fetching (default in headless mode)
snag --close-tab https://example.com
# Keep tab open (default in visible mode)
snag https://example.com
# Use different port if 9222 is busy
snag --port 9223 https://example.com
# Connect to Chromium running on custom port
chromium --remote-debugging-port=9223 &
snag --port 9223 https://example.com
Options:
<url> URL to fetch (required, unless using --list-tabs or --tab)
--url-file <file> Read URLs from a file, one per line (use - for stdin)
-v, --version Display version information
-h, --help Show help message and exit
-l, --list-tabs List all open tabs in the browser
-t, --tab <PATTERN> Fetch from existing tab by index (1, 2, 3...) or URL pattern
Patterns can be:
- Index number: 1, 2, 3 (tab position)
- Exact URL: https://example.com (case-insensitive)
- Substring: dashboard, github, docs (contains match)
- Regex: https://.*\.com, .*/dashboard, (github|gitlab)\.com
-a, --all-tabs Process all open browser tabs (saves with auto-generated filenames)
Requires --output-dir or saves to current directory
Note: Tabs are sorted alphabetically by URL (primary), then Title (secondary), then ID (tertiary) for predictable ordering. Chrome DevTools Protocol doesn't guarantee visual left-to-right tab order, so snag sorts tabs to ensure consistent, reproducible results. Tab [1] = first tab alphabetically by URL, not the first visual tab in your browser.
-o, --output <file> Save output to file instead of stdout
-d, --output-dir <dir> Save files with auto-generated names to directory
-f, --format <FORMAT> Output format: md (default) | html | text | pdf | png
Format aliases: markdown→md, txt→text
Case-insensitive: MD, MARKDOWN, Html, PDF, etc.
-i, --info Output page metadata as JSON (title, URL, domain, slug, timestamp)
Mutually exclusive with --format (always outputs JSON)
Output is quiet by default (no log messages)
--timeout <seconds> Page load timeout in seconds (default: 30)
-w, --wait-for <selector> Wait for CSS selector before extracting content
-p, --port <port> Chromium remote debugging port (default: 9222)
-c, --close-tab Close the browser tab after fetching content
--force-headless Force headless mode even if Chromium is running
-b, --open-browser Open Chromium browser in visible state (no URL required)
-k, --kill-browser Kill browser processes with remote debugging enabled
--verbose Enable verbose logging output
-q, --quiet Suppress all output except errors and content
--debug Enable debug output with CDP messages
--user-data-dir <dir> Use a custom browser profile (user data) directory
--doctor Display diagnostic information about the snag environment
--user-agent <string> Custom user agent string (bypass headless detection)
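The tab-ordering note in the options above (URL first, then title, then ID) can be reproduced with `sort` to predict which index a tab will receive. This is a hypothetical sketch assuming plain byte-wise string comparison (`LC_ALL=C`); snag's exact comparison rules (for example, case handling) may differ, and the tab-separated `URL<TAB>TITLE<TAB>ID` input format is invented for illustration.

```sh
#!/bin/sh
# Hypothetical helper: number tabs the way the --list-tabs ordering note describes.
# Input: one "URL<TAB>TITLE<TAB>ID" line per tab on stdin.
# Output: "[N] URL (TITLE)" lines, 1-based, sorted by URL, then title, then ID.
number_tabs() {
  local tab
  tab=$(printf '\t')
  LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k3,3 |
    awk -F '\t' '{ printf "[%d] %s (%s)\n", NR, $1, $2 }'
}

# Usage:
# printf 'https://go.dev/\tGo\tA1\nabout:blank\tNew Tab\tB2\n' | number_tabs
# prints:
# [1] about:blank (New Tab)
# [2] https://go.dev/ (Go)
```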
"Browser not found" error
snag cannot locate Chrome/Chromium on your system.
Solutions:
- Install Chromium: `brew install chromium`
- Install Chrome from https://www.google.com/chrome/
- Ensure Chromium/Chrome is in your system PATH
"Failed to connect to existing browser"
Cannot connect to running browser instance.
Solutions:
- Ensure Chromium/Chrome is launched with `--remote-debugging-port=9222`
- Try a different port: `snag --port 9223 https://example.com`
- Kill existing Chromium/Chrome processes and let snag launch a new instance
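Before retrying, it can help to check whether anything is actually serving the DevTools protocol on the port. Chrome and Chromium started with `--remote-debugging-port` answer HTTP requests at `/json/version` (browser details) and `/json/list` (open targets, unsorted, unlike snag's `--list-tabs` output). The sketch below wraps those endpoints with `curl`; `cdp_alive` and `cdp_tabs` are hypothetical helper names, and `127.0.0.1` assumes the browser runs on the local machine.

```sh
#!/bin/sh
# Hypothetical helpers for probing a DevTools endpoint (default port 9222).

# Exit status 0 if something answers /json/version on the port.
cdp_alive() {
  local port="${1:-9222}"
  curl --silent --fail --max-time 2 "http://127.0.0.1:$port/json/version" >/dev/null
}

# Print the raw JSON target list (pages, workers, etc.) from the browser.
cdp_tabs() {
  local port="${1:-9222}"
  curl --silent --fail --max-time 2 "http://127.0.0.1:$port/json/list"
}

# Usage:
# if cdp_alive 9222; then cdp_tabs 9222; else echo "no DevTools endpoint on 9222" >&2; fi
```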
"Stuck or lingering browser processes"
Browser processes with remote debugging enabled remain after snag exits.
Solutions:
- Kill all debugging browsers: `snag --kill-browser` or `snag -k`
- Kill a specific port only: `snag --kill-browser --port 9223`
- Note: only kills browsers with `--remote-debugging-port` enabled (development browsers), never regular browsing sessions
- Safe for scripting: exits with code 0 even if no browsers are found (idempotent)
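To preview which processes fall into that category before running `snag -k`, the process list can be filtered for the `--remote-debugging-port` flag. This is a sketch, not what snag runs internally: `debug_browsers` is a hypothetical helper that reads `ps -eo pid=,args=` output and prints the PID and port of each matching process.

```sh
#!/bin/sh
# Hypothetical helper: list processes started with --remote-debugging-port.
# Input: "PID ARGS..." lines, e.g. from `ps -eo pid=,args=`.
# Output: "PID PORT" for each matching process.
debug_browsers() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^--remote-debugging-port=[0-9]+$/) {
        split($i, kv, "=")
        print $1, kv[2]
      }
    }
  }'
}

# Usage:
# ps -eo pid=,args= | debug_browsers
```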
Get comprehensive diagnostic information about your snag environment:
# Run diagnostics
snag --doctor
# Check specific port
snag --doctor --port 9223
This displays:
- snag and Go versions (with update check)
- Detected browser and version
- Browser connection status and tab counts
- Profile locations for all common browsers
- Environment variables
- Working directory
Use this when:
- Troubleshooting issues
- Reporting bugs (include doctor output)
- Checking if browser is running
- Finding profile paths
- Verifying snag installation
"Authentication required" error
Page requires login but snag cannot authenticate.
Solutions:
- Open the browser with `snag --open-browser`, log in, then run snag again
- Use `--list-tabs` to find authenticated tabs, then `--tab` to fetch from them
- The browser session persists authentication across snag calls
"No Chrome instance running" when using --list-tabs or --tab
Tab features require an existing browser with remote debugging enabled.
Solutions:
- Open the browser first: `snag --open-browser`
- Or start Chrome/Chromium manually: `chromium --remote-debugging-port=9222`
- Then run `snag --list-tabs` to verify the connection
"Tab index out of range" or "No tab matches pattern"
Cannot find the specified tab.
Solutions:
- Run `snag --list-tabs` to see available tabs and their indexes
- Tab indexes are 1-based (the first tab is 1, not 0)
- Tabs are sorted by URL, not visual browser order; tab [1] is first alphabetically by URL
- For patterns, try simpler matches: `snag -t "example"` instead of a complex regex
- Remember: pattern matching is case-insensitive
Pattern not matching expected tab
Your pattern matches a different tab than expected.
Solutions:
- Use `--list-tabs` to see the exact URLs of open tabs
- Be more specific with your pattern: use the full URL instead of a substring
- Remember: all matching tabs are processed and auto-saved, not just the first match
- To target a single tab, use an exact URL pattern or a tab index: `snag -t 3`
"Page load timeout" error
Page takes too long to load.
Solutions:
- Increase the timeout: `snag --timeout 60 https://example.com`
- Use `--wait-for` for a specific element: `snag --wait-for ".content" https://example.com`
- Check network connectivity
- Try `--verbose` to see what's happening
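When a site is only intermittently slow, a wrapper can retry with progressively longer timeouts instead of always using the longest one. This hypothetical POSIX shell helper uses only the documented `--quiet`, `--timeout`, and `-o` flags; it assumes snag exits with a non-zero status when a fetch fails, and the 30/60/120-second schedule is arbitrary.

```sh
#!/bin/sh
# Hypothetical wrapper: retry a fetch with increasing --timeout values.
# Assumes snag returns a non-zero exit status on failure.
fetch_with_retry() {
  local url="$1" out="$2" t
  for t in 30 60 120; do
    if snag --quiet --timeout "$t" -o "$out" "$url"; then
      return 0
    fi
    echo "fetch with --timeout $t failed for $url" >&2
  done
  return 1
}

# Usage:
# fetch_with_retry https://slow-site.com slow-site.md
```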
Page loads but content is missing
Dynamic content hasn't appeared yet.
Solutions:
- Use `--wait-for` with a selector: `snag --wait-for "#main-content" https://example.com`
- Increase the timeout to allow for slow loading
- Inspect the page with `--format html` to see the raw output
Output is empty or incomplete
Fetched page but content is missing.
Solutions:
- Try `--format html` to see the raw HTML
- Try `--format text` to see the plain text extraction
- Use `--verbose` to check whether the page loaded correctly
- The page may require authentication (see the authentication section)
- Content may be loaded dynamically (use `--wait-for`)
Markdown formatting looks wrong
Converted Markdown has formatting issues.
Solutions:
- Use `--format html` to get raw HTML instead
- Use `--format text` for plain text only (no formatting)
- Some complex HTML structures may not convert perfectly to Markdown
- Report specific issues at https://github.com/grantcarthew/snag/issues
Linux: "No DISPLAY environment variable"
Running in headless environment without display.
Solutions:
- Headless mode should work automatically
- Ensure Xvfb is installed: `sudo apt install xvfb`
- Use `--force-headless` explicitly
macOS: "Chromium.app cannot be opened"
macOS security blocking Chromium/Chrome launch.
Solutions:
- Open Chromium manually first: `open -a Chromium` or `open -a "Google Chrome"`
- Check System Settings > Privacy & Security (System Preferences > Security & Privacy on older macOS versions)
- Allow the browser in privacy settings
macOS: Browser processes remain after closing window
On macOS, closing a Chrome/Chromium window doesn't quit the application - processes continue running in the background.
This is normal macOS behavior. To fully quit:
- Press Cmd+Q in the browser window
- Right-click Chrome icon in Dock → Quit
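- Or: `pkill -f "Chrome.*remote-debugging-port"` (matches Google Chrome; use a pattern such as `Chromium.*remote-debugging-port` for Chromium), or `snag --kill-browser`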
Still having issues?
- Run with the `--debug` flag for detailed logs
- Check existing issues: https://github.com/grantcarthew/snag/issues
- Create a new issue with:
  - snag version: `snag --version`
  - Operating system and version
  - Full command you ran
  - Complete error message
  - Output from the `--debug` flag
- Session Detection: Auto-detects an existing Chromium-based browser instance with remote debugging enabled
- Mode Selection:
  - If a Chromium-based browser with remote debugging is running → connect to the existing session (preserves auth/cookies)
  - If no browser is found → launch in headless mode
  - Use `--open-browser` to open a visible browser for authentication
- Tab Management:
  - List tabs with `--list-tabs` to see what's currently open
  - Fetch from specific tabs using `--tab` (by index or URL pattern)
  - Tabs stay open in visible mode and close in headless mode (or use `--close-tab`)
  - Reuse authenticated sessions without creating new tabs
- stdout: Content only (Markdown, HTML, text, or JSON) - enables piping to other tools
- stderr: All logs, warnings, errors, progress indicators
This design makes snag perfect for shell pipelines and AI agent integration.
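Because content and diagnostics use separate streams, a script can keep both without one polluting the other. The helper below is a hypothetical sketch: it writes the page to one file and snag's `--verbose` log lines to another, and it assumes snag exits non-zero on failure.

```sh
#!/bin/sh
# Hypothetical helper: content (stdout) and logs (stderr) go to separate files.
fetch_split() {
  local url="$1" content="$2" log="$3"
  if ! snag --verbose "$url" >"$content" 2>"$log"; then
    echo "snag failed for $url (see $log)" >&2
    return 1
  fi
}

# Usage:
# fetch_split https://example.com page.md snag.log
```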
- Language: Go 1.25.3
- Browser Control: Chrome DevTools Protocol via go-rod/rod
- HTML Conversion: html-to-markdown/v2
- CLI Framework: cobra
Contributions welcome! Please:
- Check existing issues: https://github.com/grantcarthew/snag/issues
- Create issue for bugs or feature requests
- Submit pull requests against the `main` branch
When reporting a bug, include:
- snag version: `snag --version`
- Operating system and version
- Full command and error message
- Output from the `--debug` flag
snag is licensed under the Mozilla Public License 2.0.
This project uses the following open-source libraries:
- go-rod/rod - MIT License
- cobra - Apache 2.0 License
- html-to-markdown - MIT License
See the LICENSES directory for full license texts.
Grant Carthew grant@carthew.net