name: browser-use
description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.

allowed-tools: Bash(browser-use:*)

Browser Automation with browser-use CLI

The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.

Quick Start

browser-use open https://example.com           # Navigate to URL
browser-use state                              # Get page elements with indices
browser-use click 5                            # Click element by index
browser-use type "Hello World"                 # Type text
browser-use screenshot                         # Take screenshot
browser-use close                              # Close browser

Core Workflow

Navigate: browser-use open <url> - Opens URL (starts browser if needed)
Inspect: browser-use state - Returns clickable elements with indices
Interact: Use indices from state to interact (browser-use click 5, browser-use input 3 "text")
Verify: browser-use state or browser-use screenshot to confirm actions
Repeat: Browser stays open between commands
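
The workflow above can also be driven from a script. The following is a minimal Python sketch, assuming the `browser-use` executable is on PATH; the helper names `bu` and `fill_field` are hypothetical and not part of the CLI.

```python
import subprocess
from typing import Callable, Sequence, Tuple

Runner = Callable[[Sequence[str]], str]


def bu(args: Sequence[str]) -> str:
    """Run one browser-use command and return its stdout (assumes the CLI is on PATH)."""
    result = subprocess.run(
        ["browser-use", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def fill_field(url: str, index: int, text: str, run: Runner = bu) -> Tuple[str, str]:
    """Navigate, inspect, interact, verify; returns the state before and after."""
    run(["open", url])                # Navigate (starts the browser if needed)
    before = run(["state"])           # Inspect: indices come from this output
    run(["input", str(index), text])  # Interact: click the element, then type
    after = run(["state"])            # Verify
    return before, after
```

Passing the runner as a parameter keeps the step order testable without a browser; in real use the index should be read from the first `state` output rather than hardcoded.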

Browser Modes

browser-use --browser chromium open <url>      # Default: headless Chromium
browser-use --browser chromium --headed open <url>  # Visible Chromium window
browser-use --browser real open <url>          # User's Chrome with login sessions
browser-use --browser remote open <url>        # Cloud browser (requires API key)

chromium: Fast, isolated, headless by default
real: Uses your Chrome with cookies, extensions, logged-in sessions
remote: Cloud-hosted browser with proxy support (requires BROWSER_USE_API_KEY)

Commands

Navigation

browser-use open <url>                    # Navigate to URL
browser-use back                          # Go back in history
browser-use scroll down                   # Scroll down
browser-use scroll up                     # Scroll up

Page State

browser-use state                         # Get URL, title, and clickable elements
browser-use screenshot                    # Take screenshot (outputs base64)
browser-use screenshot path.png           # Save screenshot to file
browser-use screenshot --full path.png    # Full page screenshot
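
When `browser-use screenshot` is run without a path, the base64 output has to be decoded before it can be viewed. A minimal sketch, assuming stdout contains only the base64 text and that the image is a PNG (both are assumptions; `save_base64_image` is a hypothetical helper):

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_base64_image(encoded: str, path: str) -> int:
    """Decode base64 text and write the bytes to path; returns the byte count."""
    data = base64.b64decode(encoded.strip())
    # Guard against decoding something that is not an image (assumes PNG output).
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    Path(path).write_bytes(data)
    return len(data)
```

If a file is the goal, `browser-use screenshot path.png` avoids the decoding step entirely.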

Interactions (use indices from `browser-use state`)

browser-use click <index>                 # Click element
browser-use type "text"                   # Type text into focused element
browser-use input <index> "text"          # Click element, then type text
browser-use keys "Enter"                  # Send keyboard keys
browser-use keys "Control+a"              # Send key combination
browser-use select <index> "option"       # Select dropdown option

Tab Management

browser-use switch <tab>                  # Switch to tab by index
browser-use close-tab                     # Close current tab
browser-use close-tab <tab>               # Close specific tab

JavaScript & Data

browser-use eval "document.title"         # Execute JavaScript, return result
browser-use extract "all product prices"  # Extract data using LLM (requires API key)

Python Execution (Persistent Session)

browser-use python "x = 42"               # Set variable
browser-use python "print(x)"             # Access variable (outputs: 42)
browser-use python "print(browser.url)"   # Access browser object
browser-use python --vars                 # Show defined variables
browser-use python --reset                # Clear Python namespace
browser-use python --file script.py       # Execute Python file

The Python session maintains state across commands. The browser object provides:

browser.url - Current page URL
browser.title - Page title
browser.goto(url) - Navigate
browser.click(index) - Click element
browser.type(text) - Type text
browser.screenshot(path) - Take screenshot
browser.scroll() - Scroll page (a direction can be passed, as in browser.scroll('down'))
browser.html - Get page HTML
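
A file passed to `browser-use python --file script.py` might look like the sketch below. It assumes the session exposes `browser` as a global (as the `print(browser.url)` example suggests) and that the members behave as listed above; `snapshot` is a hypothetical helper.

```python
def snapshot(browser, url, path):
    """Navigate to url, save a screenshot, and return basic page info."""
    browser.goto(url)
    browser.screenshot(path)
    return {"url": browser.url, "title": browser.title}


# Inside the browser-use Python session a `browser` global is assumed to exist;
# the guard lets the file be imported elsewhere without raising NameError.
if "browser" in globals():
    print(snapshot(globals()["browser"], "https://example.com", "example.png"))
```

Taking `browser` as a parameter keeps the function testable with a stand-in object outside the session.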

Agent Tasks (Requires API Key)

browser-use run "Fill the contact form with test data"    # Run AI agent
browser-use run "Extract all product prices" --max-steps 50

Agent tasks use an LLM to autonomously complete complex browser tasks. They require BROWSER_USE_API_KEY or a configured LLM API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
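
Before calling `browser-use run` from a script, it can help to check that a key is configured. A small sketch, assuming only the three variable names mentioned above (the list is not exhaustive, and `first_configured_key` is a hypothetical helper):

```python
import os
from typing import Mapping, Optional

# Variable names taken from the text above; other providers may use other names.
KNOWN_KEY_VARS = ("BROWSER_USE_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def first_configured_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the first non-empty key variable, or None if none is set."""
    env = os.environ if env is None else env
    for name in KNOWN_KEY_VARS:
        if env.get(name, "").strip():
            return name
    return None
```

A script could skip or fail early with a clear message when this returns `None`, rather than discovering the missing key partway through an agent task.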

Session Management

browser-use sessions                      # List active sessions
browser-use close                         # Close current session
browser-use close --all                   # Close all sessions

Server Control

browser-use server status                 # Check if server is running
browser-use server stop                   # Stop server
browser-use server logs                   # View server logs

Global Options

| Option | Description |
|--------|-------------|
| `--session NAME` | Use named session (default: "default") |
| `--browser MODE` | Browser mode: chromium, real, remote |
| `--headed` | Show browser window (chromium mode) |
| `--profile NAME` | Chrome profile (real mode only) |
| `--json` | Output as JSON |
| `--api-key KEY` | Override API key |

Session behavior: All commands without --session use the same "default" session. The browser stays open and is reused across commands. Use --session NAME to run multiple browsers in parallel.
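
In the examples in this document, global options come before the subcommand (`browser-use --session work open ...`). The sketch below builds such a command line in Python and rejects combinations the table describes as mode-specific. Treating those combinations as errors is an assumption of this sketch, not documented CLI behavior, and `build_argv` is a hypothetical helper.

```python
from typing import List, Optional, Sequence

MODES = ("chromium", "real", "remote")


def build_argv(
    command: Sequence[str],
    session: Optional[str] = None,
    browser: Optional[str] = None,
    headed: bool = False,
    profile: Optional[str] = None,
    json_output: bool = False,
) -> List[str]:
    """Return an argv list with global options placed before the subcommand."""
    if browser is not None and browser not in MODES:
        raise ValueError(f"unknown browser mode: {browser}")
    # chromium is the documented default, so an unset mode counts as chromium.
    if headed and browser not in (None, "chromium"):
        raise ValueError("--headed is documented for chromium mode")
    if profile is not None and browser != "real":
        raise ValueError("--profile is documented for real mode only")

    argv = ["browser-use"]
    if session is not None:
        argv += ["--session", session]
    if browser is not None:
        argv += ["--browser", browser]
    if headed:
        argv.append("--headed")
    if profile is not None:
        argv += ["--profile", profile]
    if json_output:
        argv.append("--json")
    return argv + list(command)
```

For example, `build_argv(["state"], session="work", json_output=True)` yields `["browser-use", "--session", "work", "--json", "state"]`, which can be passed to `subprocess.run`.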

Examples

Form Submission

browser-use open https://example.com/contact
browser-use state
# Shows: [0] input "Name", [1] input "Email", [2] textarea "Message", [3] button "Submit"
browser-use input 0 "John Doe"
browser-use input 1 "john@example.com"
browser-use input 2 "Hello, this is a test message."
browser-use click 3
browser-use state  # Verify success
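
Hardcoded indices break when the page changes, so a script may prefer to look elements up by label. The sketch below parses a listing shaped like the `# Shows:` comment above. The real `browser-use state` output may be formatted differently, so the pattern is an assumption and `index_by_label` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches entries such as: [3] button "Submit"
ELEMENT = re.compile(r'\[(\d+)\]\s+(\w+)\s+"([^"]*)"')


def index_by_label(listing: str) -> Dict[str, int]:
    """Map each quoted label in the listing to its element index."""
    return {label: int(idx) for idx, _tag, label in ELEMENT.findall(listing)}
```

With such a mapping, the form steps become `input <index of "Name">` and so on. If two elements share a label, the later one wins in this sketch.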

Multi-Session Workflows

browser-use --session work open https://work.example.com
browser-use --session personal open https://personal.example.com
browser-use --session work state    # Check work session
browser-use --session personal state  # Check personal session
browser-use close --all             # Close both sessions

Data Extraction with Python

browser-use open https://example.com/products
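browser-use python "
products = []  # persists across python commands; fill it from browser.html
for i in range(20):
    browser.scroll('down')  # scroll so that more of the page is loaded
browser.screenshot('products.png')
"
browser-use python "print(f'Captured {len(products)} products')"  # reports 0 until products is filled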
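
The example above leaves `products` empty. One way to fill it is to parse the page HTML with the standard library. The sketch below collects the text of elements whose class list contains `price`; that class name is a hypothetical stand-in for whatever the target page actually uses.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PriceCollector(HTMLParser):
    """Collects text inside elements whose class list contains 'price'."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: List[str] = []
        self._depth = 0  # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or "price" in classes:
            self._depth += 1
            if self._depth == 1:
                self.prices.append("")  # start a new price entry

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.prices[-1] += data


def extract_prices(html: str) -> List[str]:
    """Return the stripped, non-empty price strings found in html."""
    collector = PriceCollector()
    collector.feed(html)
    collector.close()
    return [p.strip() for p in collector.prices if p.strip()]
```

Inside the session this could be used as `products = extract_prices(browser.html)`, assuming `browser.html` returns the page source as a string.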

Using Real Browser (Logged-In Sessions)

browser-use --browser real open https://gmail.com
# Uses your actual Chrome with existing login sessions
browser-use state  # Already logged in!

Tips

Always run browser-use state first to see available elements and their indices
Use --headed for debugging to see what the browser is doing
Sessions persist - the browser stays open between commands
Use --json for parsing output programmatically
Python variables persist across browser-use python commands within a session
Real browser mode preserves your login sessions and extensions

Troubleshooting

Browser won't start?

browser-use server stop               # Stop any stuck server
browser-use --headed open <url>       # Try with visible window

Element not found?

browser-use state                     # Check current elements
browser-use scroll down               # Element might be below fold
browser-use state                     # Check again
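
The check, scroll, check-again sequence can be wrapped in a bounded loop. A minimal sketch: `run` is any callable that executes a browser-use subcommand and returns its output (for example a thin `subprocess` wrapper), and `find_after_scrolling` is a hypothetical helper that does a plain substring match on the state output.

```python
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def find_after_scrolling(run: Runner, needle: str, max_scrolls: int = 5) -> Optional[str]:
    """Return the first state output containing needle, scrolling down between checks."""
    for attempt in range(max_scrolls + 1):
        state = run(["state"])
        if needle in state:
            return state
        if attempt < max_scrolls:
            run(["scroll", "down"])  # the element might be below the fold
    return None  # not found within the scroll budget
```

The `max_scrolls` bound keeps the loop from running forever on pages that never show the element.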

Session issues?

browser-use sessions                  # Check active sessions
browser-use close --all               # Clean slate
browser-use open <url>                # Fresh start

Cleanup

Always close the browser when done. Run this after completing browser automation:

browser-use close
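
In a script, a context manager can guarantee that `browser-use close` runs even when a step fails. A minimal sketch, assuming the CLI is on PATH; `bu` and `browser_session` are hypothetical helper names.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], str]


def bu(args: Sequence[str]) -> str:
    """Run one browser-use command and return its stdout (assumes the CLI is on PATH)."""
    return subprocess.run(
        ["browser-use", *args], capture_output=True, text=True, check=True
    ).stdout


@contextmanager
def browser_session(run: Runner = bu) -> Iterator[Runner]:
    """Yield the runner and always issue `close` afterwards, even on error."""
    try:
        yield run
    finally:
        run(["close"])
```

Typical use would be `with browser_session() as run: run(["open", url])`, with the close happening automatically when the block exits.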
