Oct 2, 2025
Research
Introducing the Hybrid Browser Toolkit: Faster, Smarter Web Automation for MCP
All you need to know about the Hybrid browser toolkit
If you've been working with our original Camel BrowserToolkit, you might have noticed a pattern: it got the job done but had some limitations. It operated mainly by taking screenshots and injecting custom IDs into pages to find elements. It was a single-mode, monolithic Python setup that worked for basic tasks, but we knew we could do better. The screenshot-based approach meant you were essentially teaching an AI to click on pictures, which worked but felt a bit like using a smartphone with thick gloves on. Plus, the quality of those visual snapshots wasn't always great, and you couldn't easily access the underlying page structure when you needed it.
That's where our new Hybrid Browser Toolkit comes in. We've rebuilt everything from the ground up using a TypeScript-Python architecture that gives you the best of both worlds. Why TypeScript? It's not just about Playwright's native AI-friendly snapshot features – TypeScript is fundamentally better suited for efficient browser operations with its event-driven nature, native async/await support, and direct access to browser APIs without the overhead of language bridges. The TypeScript server handles all the heavy lifting of browser control while Python provides the familiar interface you love. But we didn't stop there. We've added support for CDP (Chrome DevTools Protocol) mode to connect to existing browser instances, and MCP (Model Context Protocol) integration for enhanced AI agent capabilities. You're not just limited to visual operations anymore – now you can seamlessly switch between visual and text-based interactions, access detailed DOM information, and enjoy snapshots that are crisp, accurate, and actually make sense. We've obsessed over the details too, from better element detection to smarter action handling, making the whole experience feel more natural and reliable. It's like upgrading from that smartphone-with-gloves setup to having direct, precise control over everything you need to do in a browser.
This blog is organized into four main chapters:
This chapter provides a comprehensive comparison between the legacy BrowserToolkit and the new HybridBrowserToolkit, highlighting the architectural improvements, new features, and enhanced capabilities.
The name "Hybrid" hints at the key change: we combined Python and TypeScript to get the best of both worlds. Instead of one heavy Python process doing everything, we now have a layered architecture where Python and a Node.js (TypeScript) server work together.
In the hybrid_browser_toolkit architecture, when you issue a command, it goes through a WebSocket to a TypeScript server that is tightly integrated with Playwright's Node.js API. This server manages a pool of browser instances and routes commands asynchronously. Python remains the interface (so you still write Python code to use the toolkit), but the heavy lifting happens in Node.js. Why is this good news? Because direct Node.js calls to Playwright are faster, and Playwright's latest features (like new selectors or the _snapshotForAI function) are fully available to us.
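To make the asynchronous routing idea concrete, here is a minimal sketch of how a Python client could pair each outgoing command with its reply. The `CommandRouter` class, the message fields (`id`, `command`, `params`, `result`), and the fake transport are all assumptions made for illustration; they are not the toolkit's actual wire format or class names.

```python
import asyncio
import itertools
import json


class CommandRouter:
    """Hypothetical sketch: matches replies to pending commands by message id."""

    def __init__(self, send):
        self._send = send                  # async callable that transmits a JSON string
        self._ids = itertools.count(1)     # monotonically increasing message ids
        self._pending = {}                 # message id -> Future awaiting the reply

    async def call(self, command, **params):
        msg_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        payload = {"id": msg_id, "command": command, "params": params}
        await self._send(json.dumps(payload))
        return await future

    def on_message(self, raw):
        """Resolve the pending future whose id matches the incoming reply."""
        reply = json.loads(raw)
        future = self._pending.pop(reply["id"], None)
        if future is not None and not future.done():
            future.set_result(reply.get("result"))


async def demo():
    # Fake transport standing in for the WebSocket: it answers every
    # command with "ok:<command>" on the next event-loop iteration.
    async def fake_send(raw):
        msg = json.loads(raw)
        reply = json.dumps({"id": msg["id"], "result": f"ok:{msg['command']}"})
        asyncio.get_running_loop().call_soon(router.on_message, reply)

    router = CommandRouter(fake_send)
    return await asyncio.gather(
        router.call("click", ref="2"),
        router.call("visit_page", url="https://example.com"),
    )
```

Because every command carries its own id, several commands can be in flight at once and replies may arrive in any order, which is the property the layered design relies on.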
This layered design also makes the system modular. We have:
In simpler terms, Python is the brain giving high-level instructions, and TypeScript is the brawn executing them efficiently. By splitting responsibilities this way, the toolkit can do more in parallel and handle complicated tasks without getting stuck.
The HybridBrowserToolkit introduces a modular, multi-layer architecture:
The HybridBrowserToolkit supports three distinct operating modes:
# Custom ID injection
page.evaluate("__elementId = '123'")
target = page.locator("[__elementId='123']")
// Native Playwright ARIA selectors
await page.locator('[aria-label="Submit"]').click()
await page.getByRole('button', { name: 'Submit' }).click()
// _snapshotForAI integration
const snapshot = await page._snapshotForAI(); // Returns structured element data with ref mappings
Pipeline:
Key Stealth Enhancements:
Key Differences from Legacy:
New Features:
This chapter provides a comprehensive reference for all tools available in the HybridBrowserToolkit. Each tool is designed for specific browser automation tasks, from basic navigation to complex interactions.
Opens a new browser session. This must be the first browser action before any other operations.
Parameters:
Returns:
Example:
# Basic browser opening
toolkit = HybridBrowserToolkit(headless=False)
result = await toolkit.browser_open()
print(f"Browser opened: {result['result']}")
print(f"Initial page snapshot: {result['snapshot']}")
print(f"Total tabs: {result['total_tabs']}")
# With default URL configuration
toolkit = HybridBrowserToolkit(
default_start_url="https://www.google.com"
)
result = await toolkit.browser_open()
# Browser opens directly to Google
Closes the browser session and releases all resources. Should be called at the end of automation tasks.
Parameters:
Returns:
Example:
# Always close the browser when done
try:
    await toolkit.browser_open()
    # ... perform automation tasks ...
finally:
    result = await toolkit.browser_close()
    print(result) # "Browser session closed."
Opens a URL in a new browser tab and switches to it. Creates a new tab each time it’s called.
Parameters:
Returns:
Example:
# Visit a single page
result = await toolkit.browser_visit_page("https://example.com")
print(f"Navigated to: {result['result']}")
print(f"Page elements: {result['snapshot']}")
# Visit multiple pages (creates multiple tabs)
sites = ["https://github.com", "https://google.com", "https://stackoverflow.com"]
for site in sites:
    result = await toolkit.browser_visit_page(site)
    print(f"Tab {result['current_tab']}: {site}")
print(f"Total tabs open: {result['total_tabs']}")
Navigates back to the previous page in browser history for the current tab.
Parameters:
Returns:
Example:
# Navigate through history
await toolkit.browser_visit_page("https://example.com")
await toolkit.browser_visit_page("https://example.com/about")
# Go back
result = await toolkit.browser_back()
print(f"Navigated back to: {result['result']}")
Navigates forward to the next page in browser history for the current tab.
Parameters:
Returns:
Example:
# Navigate forward after going back
await toolkit.browser_visit_page("https://example.com")
await toolkit.browser_back() # Back to homepage
# Go forward again
result = await toolkit.browser_forward()
print(f"Navigated forward to: {result['result']}")
Note: This is a passive tool that must be explicitly called to retrieve page information. It does not trigger any page actions.
Gets a textual snapshot of all interactive elements on the current page. Each element is assigned a unique ref ID for interaction.
Parameters:
Returns:
Example:
# Get full page snapshot
snapshot = await toolkit.browser_get_page_snapshot()
print(snapshot)
# Output:
# - link "Home" [ref=1]
# - button "Sign In" [ref=2]
# - textbox "Search" [ref=3]
# - link "Products" [ref=4]
# With viewport limiting
toolkit_limited = HybridBrowserToolkit(viewport_limit=True)
visible_snapshot = await toolkit_limited.browser_get_page_snapshot()
# Only returns elements currently visible in viewport
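If you want to work with the snapshot programmatically rather than reading it, a small parser can turn each line into a record. This is a minimal sketch that assumes the flat `- role "name" [ref=N]` line shape shown in the example output above; real snapshots may be nested or carry extra attributes, and `parse_snapshot` and `find_refs` are hypothetical helpers, not part of the toolkit.

```python
import re

# Matches lines such as: - button "Sign In" [ref=2]
_SNAPSHOT_LINE = re.compile(
    r'^\s*-\s*(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>[^\]]+)\]'
)


def parse_snapshot(snapshot: str) -> list[dict]:
    """Return one {'role', 'name', 'ref'} dict per recognizable snapshot line."""
    elements = []
    for line in snapshot.splitlines():
        match = _SNAPSHOT_LINE.match(line)
        if match:
            elements.append(match.groupdict())
    return elements


def find_refs(snapshot: str, role: str, name_contains: str = "") -> list[str]:
    """Return refs of elements with the given role whose name contains a substring."""
    needle = name_contains.lower()
    return [
        el["ref"]
        for el in parse_snapshot(snapshot)
        if el["role"] == role and needle in el["name"].lower()
    ]
```

A call such as `find_refs(snapshot, "button", "sign in")` then gives you the ref to pass to `browser_click` without hard-coding it.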
Captures a screenshot with interactive elements highlighted and marked with ref IDs (Set of Marks). This tool uses an advanced injection-based approach with browser-side optimizations for accurate element detection.
Technical Features:
1. Injection-based Implementation: The SoM (Set of Marks) functionality is injected directly into the browser context, ensuring accurate element detection and positioning
2. Efficient Occlusion Detection: Browser-side algorithms detect when elements are hidden behind other elements, preventing false positives
3. Parent-Child Element Fusion: Intelligently merges parent and child elements when they represent the same interactive component (e.g., a button containing an icon and text)
4. Smart Label Positioning: Automatically finds optimal positions for ref ID labels to avoid overlapping with page content
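The parent-child fusion idea can be illustrated with a toy tree walk. This is only a sketch under a simplifying assumption (an interactive parent absorbs everything beneath it); the toolkit's browser-side heuristics are likely more nuanced, and the `Node` type and `fuse` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    ref: str
    role: str
    interactive: bool
    children: list["Node"] = field(default_factory=list)


def fuse(node: Node) -> list[str]:
    """Return the refs that would receive a mark.

    An interactive node is marked once and its descendants are skipped,
    so a button containing an icon and a text span yields a single mark.
    """
    if node.interactive:
        return [node.ref]
    refs: list[str] = []
    for child in node.children:
        refs.extend(fuse(child))
    return refs
```

Applied to a page with one button (wrapping an icon and a label) and one link, the walk returns two refs rather than four.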
Parameters:
Returns:
Example:
# Basic screenshot capture
result = await toolkit.browser_get_som_screenshot(read_image=False)
print(result)
# "Screenshot captured with 42 interactive elements marked (saved to: ./screenshots/page_123456_som.png)"
# With AI analysis
result = await toolkit.browser_get_som_screenshot(
read_image=True,
instruction="Find all form input fields"
)
# "Screenshot captured... Agent analysis: Found 5 form fields: username [ref=3], password [ref=4], email [ref=5], phone [ref=6], submit button [ref=7]"
# For visual verification
result = await toolkit.browser_get_som_screenshot(
read_image=True,
instruction="Verify the login button is visible and properly styled"
)
# Complex UI with overlapping elements
result = await toolkit.browser_get_som_screenshot(read_image=False)
# The tool automatically handles:
# - Dropdown menus that overlay other content
# - Modal dialogs
# - Nested interactive elements
# - Elements with transparency
# Parent-child fusion example
# A button containing an icon and text will be marked as one element, not three
# <button [ref=5]>
# <i class="icon"></i>
# <span>Submit</span>
# </button>
# Will appear as single "button Submit [ref=5]" instead of separate elements
Note: This is a passive information retrieval tool that provides current tab state without modifying anything.
Gets information about all open browser tabs including titles, URLs, and which tab is active.
Parameters:
Returns:
Example:
# Check all open tabs
tab_info = await toolkit.browser_get_tab_info()
print(f"Total tabs: {tab_info['total_tabs']}")
print(f"Active tab index: {tab_info['current_tab']}")
for i, tab in enumerate(tab_info['tabs']):
    status = "ACTIVE" if tab['is_current'] else ""
    print(f"Tab {i}: {tab['title']} - {tab['url']} {status}")
# Find a specific tab
github_tab = next(
    (tab for tab in tab_info['tabs'] if 'github.com' in tab['url']),
    None
)
if github_tab:
    await toolkit.browser_switch_tab(tab_id=github_tab['id'])
Performs a click action on an element identified by its ref ID.
Parameters:
Returns:
Example:
# Simple click
result = await toolkit.browser_click(ref="2")
print(f"Clicked: {result['result']}")
# Click that opens new tab
result = await toolkit.browser_click(ref="external-link")
if 'newTabId' in result:
    print(f"New tab opened with ID: {result['newTabId']}")
    # Switch to the new tab
    await toolkit.browser_switch_tab(tab_id=result['newTabId'])
# Click with error handling
try:
    result = await toolkit.browser_click(ref="submit-button")
except Exception as e:
    print(f"Click failed: {e}")
Types text into input elements. Supports both single and multiple inputs with intelligent dropdown detection and automatic child element discovery.
Special Features:
Parameters (Single Input):
Parameters (Multiple Inputs):
Returns:
Example:
# Single input
result = await toolkit.browser_type(ref="3", text="john.doe@example.com")
# Handle dropdown/autocomplete with intelligent detection
result = await toolkit.browser_type(ref="search", text="laptop")
if 'diffSnapshot' in result:
    print("Dropdown options appeared:")
    print(result['diffSnapshot'])
    # Example output:
    # - option "Laptop Computers" [ref=45]
    # - option "Laptop Bags" [ref=46]
    # - option "Laptop Accessories" [ref=47]
    # Click on one of the options
    await toolkit.browser_click(ref="45")
else:
    # No dropdown appeared, continue with regular snapshot
    print("Page snapshot:", result['snapshot'])
# Autocomplete example with diff detection
result = await toolkit.browser_type(ref="city-input", text="San")
if 'diffSnapshot' in result:
    # Only shows newly appeared suggestions
    print("City suggestions:")
    print(result['diffSnapshot'])
    # - option "San Francisco" [ref=23]
    # - option "San Diego" [ref=24]
    # - option "San Antonio" [ref=25]
# Multiple inputs at once
inputs = [
{'ref': '3', 'text': 'username123'},
{'ref': '4', 'text': 'SecurePass123!'},
{'ref': '5', 'text': 'john.doe@example.com'}
]
result = await toolkit.browser_type(inputs=inputs)
print(result['details']) # Success/failure for each input
# Clear and type
await toolkit.browser_click(ref="3") # Focus
await toolkit.browser_press_key(keys=["Control+a"]) # Select all
await toolkit.browser_type(ref="3", text="new_value") # Replaces content
# Working with combobox elements
async def handle_searchable_dropdown():
    # Type to search/filter options
    result = await toolkit.browser_type(ref="country-select", text="United")
    if 'diffSnapshot' in result:
        # Shows only countries containing "United"
        print("Filtered countries:", result['diffSnapshot'])
        # - option "United States" [ref=87]
        # - option "United Kingdom" [ref=88]
        # - option "United Arab Emirates" [ref=89]
        # Select one of the filtered options
        await toolkit.browser_click(ref="87")
# Automatic child element discovery
# When the ref points to a container, browser_type finds the input child
result = await toolkit.browser_type(ref="search-container", text="product name")
# Even though ref="search-container" might be a <div>, the tool will find
# and type into the actual <input> element inside it
# Complex UI component example
# The visible element might be a styled wrapper
result = await toolkit.browser_type(ref="styled-date-picker", text="2024-03-15")
# Tool automatically finds the actual input field within the date picker component
Selects an option in a dropdown (<select>) element.
Parameters:
Returns:
Example:
# Select by value attribute
result = await toolkit.browser_select(ref="country-select", value="US")
# Common pattern: type to filter, then select
await toolkit.browser_type(ref="5", text="Uni") # Type to filter
# Snapshot shows filtered options
result = await toolkit.browser_select(ref="5", value="united-states")
Simulates pressing the Enter key on the currently focused element. Useful for form submission.
Parameters:
Returns:
Example:
# Submit search form
await toolkit.browser_type(ref="search-box", text="Python tutorials")
result = await toolkit.browser_enter()
# Page navigates to search results
# Submit login form
await toolkit.browser_type(ref="username", text="user123")
await toolkit.browser_type(ref="password", text="pass123")
await toolkit.browser_enter() # Submits the form
Scrolls the current page window in the specified direction.
Parameters:
Returns:
Example:
# Basic scrolling
await toolkit.browser_scroll(direction="down", amount=500)
await toolkit.browser_scroll(direction="up", amount=300)
# Scroll to load more content
async def scroll_to_bottom():
    """Scroll until no new content loads"""
    previous_snapshot = ""
    while True:
        result = await toolkit.browser_scroll(direction="down", amount=1000)
        if result['snapshot'] == previous_snapshot:
            break # No new content loaded
        previous_snapshot = result['snapshot']
        await asyncio.sleep(1) # Wait for content to load
# Paginated scrolling
for i in range(5):
    await toolkit.browser_scroll(direction="down", amount=800)
    snapshot = await toolkit.browser_get_page_snapshot()
    print(f"Page {i+1} content loaded")
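The scroll-until-stable loop above has no upper bound, so an infinite feed would keep it running forever. A bounded variant might look like the sketch below. `scroll_until_stable` is a hypothetical helper; it takes any async callable that returns a dict with a `snapshot` key, which is the result shape the examples above assume for `browser_scroll`.

```python
import asyncio


async def scroll_until_stable(scroll_once, max_rounds=20, pause_sec=0.0):
    """Call scroll_once() until two consecutive snapshots match.

    Returns the number of scroll calls made. Stops after max_rounds
    even if the page keeps changing, so infinite feeds cannot hang it.
    """
    previous = None
    for rounds in range(1, max_rounds + 1):
        result = await scroll_once()
        if result["snapshot"] == previous:
            return rounds              # nothing new loaded on this scroll
        previous = result["snapshot"]
        await asyncio.sleep(pause_sec)  # give lazy-loaded content time to arrive
    return max_rounds
```

With the toolkit, the callable could be `lambda: toolkit.browser_scroll(direction="down", amount=1000)` and `pause_sec=1` to mirror the original loop.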
Controls mouse actions at specific coordinates.
Parameters:
Returns:
Example:
# Click at specific coordinates
await toolkit.browser_mouse_control(control="click", x=350.5, y=200)
# Right-click for context menu
await toolkit.browser_mouse_control(control="right_click", x=400, y=300)
# Double-click to select text
await toolkit.browser_mouse_control(control="dblclick", x=250, y=150)
# Click on canvas or image maps
canvas_result = await toolkit.browser_mouse_control(
control="click",
x=523.5,
y=412.3
)
Performs drag and drop operations between elements.
Parameters:
Returns:
Example:
# Drag item to trash
await toolkit.browser_mouse_drag(from_ref="item-5", to_ref="trash-bin")
# Reorder list items
await toolkit.browser_mouse_drag(from_ref="task-3", to_ref="task-1")
# Move file to folder
result = await toolkit.browser_mouse_drag(
from_ref="file-report.pdf",
to_ref="folder-documents"
)
print(f"Drag result: {result['result']}")
Presses keyboard keys or key combinations.
Parameters:
Returns:
Example:
# Single key press
await toolkit.browser_press_key(keys=["Tab"])
await toolkit.browser_press_key(keys=["Escape"])
# Key combinations
await toolkit.browser_press_key(keys=["Control+a"]) # Select all
await toolkit.browser_press_key(keys=["Control+c"]) # Copy
await toolkit.browser_press_key(keys=["Control+v"]) # Paste
# Navigation shortcuts
await toolkit.browser_press_key(keys=["Control+t"]) # New tab
await toolkit.browser_press_key(keys=["Control+w"]) # Close tab
await toolkit.browser_press_key(keys=["Alt+Left"]) # Back
# Function keys
await toolkit.browser_press_key(keys=["F5"]) # Refresh
await toolkit.browser_press_key(keys=["F11"]) # Fullscreen
Switches to a different browser tab using its ID.
Parameters:
Returns:
Example:
# Get current tabs and switch
tab_info = await toolkit.browser_get_tab_info()
tabs = tab_info['tabs']
# Switch to the second tab
if len(tabs) > 1:
    await toolkit.browser_switch_tab(tab_id=tabs[1]['id'])
# Switch back to first tab
await toolkit.browser_switch_tab(tab_id=tabs[0]['id'])
# Pattern: Open link in new tab and switch
result = await toolkit.browser_click(ref="external-link")
if 'newTabId' in result:
    await toolkit.browser_switch_tab(tab_id=result['newTabId'])
Closes a specific browser tab.
Parameters:
Returns:
Example:
# Close current tab
tab_info = await toolkit.browser_get_tab_info()
current_tab_id = None
for tab in tab_info['tabs']:
    if tab['is_current']:
        current_tab_id = tab['id']
        break
if current_tab_id:
    await toolkit.browser_close_tab(tab_id=current_tab_id)
# Close all tabs except the first
tab_info = await toolkit.browser_get_tab_info()
for i, tab in enumerate(tab_info['tabs']):
    if i > 0: # Keep first tab
        await toolkit.browser_close_tab(tab_id=tab['id'])
Views console logs from the current page.
Parameters:
Returns:
Example:
# Check for JavaScript errors
console_info = await toolkit.browser_console_view()
errors = [
    msg for msg in console_info['console_messages']
    if msg['type'] == 'error'
]
if errors:
    print("Page has JavaScript errors:")
    for error in errors:
        print(f"- {error['text']}")
# Monitor console during interaction
await toolkit.browser_click(ref="dynamic-button")
console_info = await toolkit.browser_console_view()
print(f"Console messages after click: {len(console_info['console_messages'])}")
Executes JavaScript code in the browser console.
Parameters:
Returns:
Example:
# Get page information
result = await toolkit.browser_console_exec(
"document.title + ' - ' + window.location.href"
)
# Modify page elements
await toolkit.browser_console_exec("""
document.querySelector('#message').innerText = 'Updated by automation';
document.querySelector('#message').style.color = 'red';
""")
# Extract data
result = await toolkit.browser_console_exec("""
Array.from(document.querySelectorAll('.product')).map(p => ({
name: p.querySelector('.name').textContent,
price: p.querySelector('.price').textContent
}))
""")
# Trigger custom events
await toolkit.browser_console_exec("""
const event = new CustomEvent('customAction', { detail: { action: 'refresh' } });
document.dispatchEvent(event);
""")
# Check element states
is_visible = await toolkit.browser_console_exec("""
const elem = document.querySelector('#submit-button');
elem && !elem.disabled && elem.offsetParent !== null
""")
Pauses execution and waits for human intervention. Useful for manual steps like CAPTCHA solving.
Parameters:
Returns:
Example:
# Wait for CAPTCHA solving
print("Please solve the CAPTCHA in the browser window...")
result = await toolkit.browser_wait_user(timeout_sec=120)
if "Timeout" in result['result']:
    print("User didn't complete CAPTCHA in time")
else:
    print("User completed the action, continuing...")
    await toolkit.browser_click(ref="submit")
# Indefinite wait for complex manual steps
print("Please complete the payment process manually.")
print("Press Enter when done...")
await toolkit.browser_wait_user() # No timeout
# Wait with instructions
async def handle_manual_verification():
    await toolkit.browser_get_som_screenshot() # Show current state
    print("\nManual steps required:")
    print("1. Complete the identity verification")
    print("2. Upload required documents")
    print("3. Press Enter when finished")
    result = await toolkit.browser_wait_user(timeout_sec=300)
    return "User resumed" in result['result']
async def complete_web_automation():
    """Example combining multiple tools for a complete workflow"""
    toolkit = HybridBrowserToolkit(
        headless=False,
        viewport_limit=True
    )
    try:
        # Start browser
        await toolkit.browser_open()
        # Navigate to site
        await toolkit.browser_visit_page("https://example-shop.com")
        # Check page loaded
        snapshot = await toolkit.browser_get_page_snapshot()
        # Search for product
        await toolkit.browser_type(ref="search", text="laptop")
        await toolkit.browser_enter()
        # Scroll through results
        await toolkit.browser_scroll(direction="down", amount=800)
        # Take screenshot for verification
        await toolkit.browser_get_som_screenshot(
            read_image=True,
            instruction="Find laptops under $1000"
        )
        # Click on product
        await toolkit.browser_click(ref="product-1")
        # Check multiple tabs
        tab_info = await toolkit.browser_get_tab_info()
        print(f"Tabs open: {tab_info['total_tabs']}")
        # Add to cart and checkout
        await toolkit.browser_click(ref="add-to-cart")
        await toolkit.browser_click(ref="checkout")
        # Fill checkout form
        inputs = [
            {'ref': 'name', 'text': 'John Doe'},
            {'ref': 'email', 'text': 'john@example.com'},
            {'ref': 'address', 'text': '123 Main St'}
        ]
        await toolkit.browser_type(inputs=inputs)
        # Select shipping
        await toolkit.browser_select(ref="shipping", value="standard")
        # Execute custom validation
        await toolkit.browser_console_exec(
            "document.querySelector('form').checkValidity()"
        )
        # Submit order
        await toolkit.browser_click(ref="place-order")
    finally:
        # Always close browser
        await toolkit.browser_close()
This comprehensive reference covers all available tools in the HybridBrowserToolkit, providing you with the complete set of browser automation capabilities.
The HybridBrowserToolkit offers two primary operating modes for web automation, each optimized for different use cases and interaction patterns. This chapter provides a comprehensive guide to understanding and using both modes effectively.
The HybridBrowserToolkit combines non-visual DOM-based automation with visual screenshot-based capabilities, providing flexibility for various web automation scenarios. Understanding when and how to use each mode is crucial for building efficient automation solutions.
Text mode provides a DOM-based, non-visual approach to browser automation. This is the default operating mode where the toolkit returns textual snapshots of page elements.
from camel.toolkits import HybridBrowserToolkit
# Initialize toolkit in default text mode
toolkit = HybridBrowserToolkit(
headless=True, # Can run without display
viewport_limit=False, # Include all elements, not just visible ones
default_timeout=30000, # 30 seconds default timeout
navigation_timeout=60000 # 60 seconds for page loads
)
# Open browser and get initial snapshot
result = await toolkit.browser_open()
print(result['snapshot']) # Shows all interactive elements with ref IDs
print(f"Total tabs: {result['total_tabs']}")
# Navigate to a page - automatically returns new snapshot
result = await toolkit.browser_visit_page("https://example.com")
print(result['snapshot'])
# Output example:
# - link "Home" [ref=1]
# - button "Login" [ref=2]
# - textbox "Username" [ref=3]
# - textbox "Password" [ref=4]
# - link "Register" [ref=5]
# Interact with elements
result = await toolkit.browser_click(ref="2")
print(f"Action result: {result['result']}")
print(f"Updated snapshot: {result['snapshot']}")
# Type into input fields
result = await toolkit.browser_type(ref="3", text="user@example.com")
result = await toolkit.browser_type(ref="4", text="password123")
# Submit form
result = await toolkit.browser_enter() # Simulates pressing Enter
# Get snapshot without performing actions
snapshot = await toolkit.browser_get_page_snapshot()
print(snapshot)
# Viewport-limited snapshot (only visible elements)
toolkit_limited = HybridBrowserToolkit(viewport_limit=True)
visible_snapshot = await toolkit_limited.browser_get_page_snapshot()
# Initialize with full_visual_mode to disable automatic snapshots
toolkit = HybridBrowserToolkit(
full_visual_mode=True # Actions won't return snapshots
)
# Actions now return minimal information
result = await toolkit.browser_click(ref="1")
print(result) # {'result': 'Clicked on link "Home"', 'tabs': [...]}
# Must explicitly request snapshots when needed
snapshot = await toolkit.browser_get_page_snapshot()
# Useful for performance-critical operations
async def bulk_operations():
    # Perform multiple actions without snapshot overhead
    await toolkit.browser_click(ref="menu")
    await toolkit.browser_click(ref="submenu-1")
    await toolkit.browser_click(ref="option-3")
    # Get snapshot only at the end
    final_snapshot = await toolkit.browser_get_page_snapshot()
    return final_snapshot
The intelligent diff detection feature for dropdowns is one of the most powerful features in text mode. When typing into a combobox or search field, the toolkit automatically detects if new options appear and returns only the newly appeared options via diffSnapshot instead of the full page snapshot. This optimization reduces noise and makes it easier to interact with dynamic dropdowns.
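The idea behind a diff snapshot can be sketched with a simple line-level comparison. This is an illustration only, assuming snapshots are newline-separated element lines; it is not the toolkit's own diff logic, and `diff_snapshot` is a hypothetical helper name.

```python
def diff_snapshot(before: str, after: str) -> str:
    """Return only the lines of `after` that were not present in `before`."""
    seen = {line.strip() for line in before.splitlines()}
    added = [
        line
        for line in after.splitlines()
        if line.strip() and line.strip() not in seen
    ]
    return "\n".join(added)
```

Comparing the snapshot taken before typing with the one taken after leaves just the newly appeared `option` lines, which is the noise reduction the feature aims for.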
Visual mode enables screenshot-based interaction with visual element recognition. This mode is essential when you need to “see” the page as a human would.
# Initialize toolkit for visual operations
toolkit = HybridBrowserToolkit(
headless=False, # Often used with display for debugging
cache_dir="./screenshots", # Directory for saving screenshots
screenshot_timeout=10000 # 10 seconds timeout for screenshots
)
# Basic screenshot capture
result = await toolkit.browser_get_som_screenshot()
print(result)
# Output: "Screenshot captured with 23 interactive elements marked
# (saved to: ./screenshots/example_com_home_123456_som.png)"
# Screenshot with custom analysis
result = await toolkit.browser_get_som_screenshot(
read_image=True,
instruction="Identify all buttons related to shopping cart"
)
# print(result)
# ref e4 is shopping cart
Both modes are designed to work seamlessly together, allowing you to build sophisticated automation solutions that can handle any web interaction scenario.
This chapter describes the different connection modes available in HybridBrowserToolkit, including standard Playwright connection and Chrome DevTools Protocol (CDP) connection.
HybridBrowserToolkit supports two primary connection modes:
Each mode serves different purposes and offers unique advantages for various automation scenarios.
The standard mode creates a new browser instance managed entirely by the toolkit. This is the default and most common usage pattern.
from camel.toolkits import HybridBrowserToolkit
# Basic initialization - creates new browser instance
toolkit = HybridBrowserToolkit()
# Open browser
await toolkit.browser_open()
# Perform actions
await toolkit.browser_visit_page("https://example.com")
await toolkit.browser_click(ref="1")
# Close browser when done
await toolkit.browser_close()
# Headless mode (default) - no visible browser window
toolkit_headless = HybridBrowserToolkit(
headless=True
)
# Headed mode - visible browser window
toolkit_headed = HybridBrowserToolkit(
headless=False
)
# Headed mode with specific window size
toolkit_sized = HybridBrowserToolkit(
headless=False,
# Additional viewport configuration can be set via browser launch args
)
# Persist browser data (cookies, localStorage, etc.)
toolkit_persistent = HybridBrowserToolkit(
user_data_dir="/path/to/user/data"
)
# Example: Login once, reuse session
async def persistent_session_example():
    # First run - perform login
    toolkit = HybridBrowserToolkit(
        user_data_dir="./browser_sessions/user1",
        headless=False
    )
    await toolkit.browser_open()
    await toolkit.browser_visit_page("https://example.com/login")
    # ... perform login ...
    await toolkit.browser_close()
    # Subsequent runs - already logged in
    toolkit_reuse = HybridBrowserToolkit(
        user_data_dir="./browser_sessions/user1"
    )
    await toolkit_reuse.browser_open()
    await toolkit_reuse.browser_visit_page("https://example.com/dashboard")
    # Already authenticated!
The comprehensive stealth mode helps avoid bot detection by applying multiple anti-detection techniques. This includes browser fingerprint masking, WebDriver property hiding, and normalized navigator properties.
# Comprehensive timeout configuration
toolkit_custom_timeouts = HybridBrowserToolkit(
default_timeout=30000, # 30 seconds default
navigation_timeout=60000, # 60 seconds for page loads
network_idle_timeout=5000, # 5 seconds for network idle
screenshot_timeout=10000, # 10 seconds for screenshots
page_stability_timeout=2000, # 2 seconds for page stability
dom_content_loaded_timeout=30000 # 30 seconds for DOM ready
)
# Development setup with logging
toolkit_dev = HybridBrowserToolkit(
headless=False,
browser_log_to_file=True,
log_dir="./test_logs",
session_id="test_run_001"
)
# All browser actions will be logged
await toolkit_dev.browser_open()
await toolkit_dev.browser_visit_page("http://localhost:3000")
# Logs include timing, inputs, outputs, and errors
# Create multiple independent browser sessions
async def multi_session_example():
    # Create base toolkit
    base_toolkit = HybridBrowserToolkit(
        headless=True,
        cache_dir="./base_cache"
    )
    # Clone for parallel sessions
    session1 = base_toolkit.clone_for_new_session("user_1")
    session2 = base_toolkit.clone_for_new_session("user_2")
    # Run parallel automations
    await asyncio.gather(
        automate_user_flow(session1),
        automate_user_flow(session2)
    )
# Configuration for long-running tasks
toolkit_long = HybridBrowserToolkit(
headless=True,
user_data_dir="./long_running_session",
default_timeout=60000, # Longer timeouts
navigation_timeout=120000
)
async def monitor_website():
    await toolkit_long.browser_open()
    while True:
        try:
            await toolkit_long.browser_visit_page("https://status.example.com")
            snapshot = await toolkit_long.browser_get_page_snapshot()
            # Check status
            if "All Systems Operational" not in snapshot:
                send_alert()
            await asyncio.sleep(300) # Check every 5 minutes
        except Exception as e:
            logger.error(f"Monitoring error: {e}")
            # Reconnect on error
            await toolkit_long.browser_close()
            await toolkit_long.browser_open()
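The monitoring loop above reconnects immediately after a failure, so a site that stays unreachable would cause rapid repeated reconnects. One common mitigation is exponential backoff between attempts. The generator below is a small, toolkit-independent sketch; `backoff_delays` and its parameters are hypothetical names chosen for illustration.

```python
def backoff_delays(base=1.0, factor=2.0, cap=300.0):
    """Yield an endless sequence of wait times: base, base*factor, ... capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor
```

In the `except` branch you could create the generator once, `await asyncio.sleep(next(delays))` before reconnecting, and recreate it after a successful check to reset the wait time.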
Chrome DevTools Protocol (CDP) connection allows the toolkit to connect to an already running browser instance. This is particularly useful for debugging, connecting to remote browsers, or integrating with existing browser sessions.
CDP (Chrome DevTools Protocol) is a protocol that allows tools to instrument, inspect, debug, and profile Chrome/Chromium browsers. The toolkit can connect to any browser that exposes a CDP endpoint.
# Launch Chrome with remote debugging port
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-debug
# Windows (Command Prompt uses ^ for line continuation)
"C:\Program Files\Google\Chrome\Application\chrome.exe" ^
--remote-debugging-port=9222 ^
--user-data-dir=C:\temp\chrome-debug
# Linux
google-chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-debug
import requests
# Get browser WebSocket endpoint
response = requests.get('http://localhost:9222/json/version')
ws_endpoint = response.json()['webSocketDebuggerUrl']
print(f"WebSocket endpoint: {ws_endpoint}")
# Example: ws://localhost:9222/devtools/browser/abc123...
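If you would rather not depend on `requests`, the same lookup works with the standard library, and it is worth validating the payload before handing the URL to the toolkit. This is a minimal sketch: `extract_ws_endpoint` and `discover_ws_endpoint` are hypothetical helper names, and the host and port defaults simply mirror the launch commands above.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def extract_ws_endpoint(version_payload: dict) -> str:
    """Pull webSocketDebuggerUrl out of a /json/version payload and sanity-check it."""
    url = version_payload.get("webSocketDebuggerUrl", "")
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss") or not parsed.netloc:
        raise ValueError(f"No usable webSocketDebuggerUrl in payload: {version_payload!r}")
    return url


def discover_ws_endpoint(host="localhost", port=9222, timeout=5.0) -> str:
    """Query a running browser's debugging port for its WebSocket endpoint."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as response:
        return extract_ws_endpoint(json.load(response))
```

Failing early with a clear error is friendlier than passing an empty or malformed URL as `cdp_url` and debugging the connection failure later.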
# Connect to existing browser
toolkit_cdp = HybridBrowserToolkit(
cdp_url="ws://localhost:9222/devtools/browser/abc123..."
)
# The browser is already running, so we don't call browser_open()
# We can immediately start interacting with it
# Get current tab info
tab_info = await toolkit_cdp.browser_get_tab_info()
print(f"Connected to {tab_info['total_tabs']} tabs")
# Work with existing tabs
await toolkit_cdp.browser_switch_tab(tab_info['tabs'][0]['id'])
snapshot = await toolkit_cdp.browser_get_page_snapshot()
CDP Configuration
In this mode, the HybridBrowserToolkit does not open a new page in the browser; it works directly with the current page.
# Connect without creating new tabs
toolkit_keep_page = HybridBrowserToolkit(
cdp_url=ws_endpoint,
cdp_keep_current_page=True # Don't create new tabs
)
# Work with the existing page
current_snapshot = await toolkit_keep_page.browser_get_page_snapshot()
await toolkit_keep_page.browser_click(ref="1")
# CDP connection with full configuration
toolkit_cdp_custom = HybridBrowserToolkit(
cdp_url=ws_endpoint,
cdp_keep_current_page=False, # Create new tab on connection
headless=False, # Ignored in CDP mode
default_timeout=30000, # Still applies to operations
viewport_limit=True, # Limit to viewport elements
full_visual_mode=False # Return snapshots normally
)
The HybridBrowserToolkit can be accessed through MCP (Model Context Protocol), allowing AI assistants like Claude to control browsers directly.
git clone https://github.com/camel-ai/browser_agent.git
cd browser_agent
pip install -e .
Add to your Claude configuration file:
{
"mcpServers": {
"hybrid-browser": {
"command": "python",
"args": ["-m", "hybrid_browser_mcp.server"]
}
}
}
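If your Claude configuration already lists other MCP servers, you will want to add this entry without overwriting them. The sketch below shows one way to merge it in memory; `add_mcp_server` is a hypothetical helper, and reading or writing the actual configuration file is left out because its location depends on your setup.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one more entry under "mcpServers"."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    merged.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    return merged
```

Calling `add_mcp_server(existing, "hybrid-browser", "python", ["-m", "hybrid_browser_mcp.server"])` produces the entry shown above while leaving any other servers in place.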
Configuration Success Example:
After adding the configuration, completely restart Claude Desktop. The browser tools will appear when you click the 🔌 icon in the chat interface.
Browser Tools in Action:
Once connected, you'll have access to:
# Claude can now control browsers with simple commands:
await browser_open()
await browser_visit_page("https://example.com")
await browser_type(ref="search", text="AI automation")
await browser_click(ref="submit-button")
await browser_get_som_screenshot()
await browser_close()
Modify browser behavior in browser_agent/config.py:
BROWSER_CONFIG = {
"headless": False, # Show browser window
"stealth": True, # Avoid bot detection
"enabled_tools": [...] # Specify which tools to enable
}
If the server doesn't appear in Claude:
For more details, visit the hybrid-browser-mcp repository.
This journey reflects a fundamental rethinking of how browser automation should work. Instead of treating the browser as a black box that we interact with through screenshots, the new HybridBrowserToolkit speaks the browser's native language through TypeScript while maintaining Python's friendly interface for developers.
The toolkit now operates in multiple modes to suit different needs. When speed matters, text mode provides a lightweight DOM-based view of the page. When you need to verify layouts or find elements visually, visual mode captures annotated screenshots. The hybrid mode intelligently switches between these approaches based on the task at hand. This flexibility extends to how you connect to browsers too – you can spin up fresh instances, attach to existing browsers through Chrome DevTools Protocol, or integrate with AI systems via Model Context Protocol.
What really makes the new toolkit shine are the thoughtful details scattered throughout. It knows when you're typing in a search box that might trigger a dropdown and shows you just the new suggestions that appear. It can fill out entire forms in one go, automatically finding the right input fields even if you click on their labels. The screenshot tool has gotten smarter too, handling overlapping elements gracefully and finding clear spots to place element markers. All these improvements come together to create a tool that feels natural to use, whether you're building test suites, scraping data, or creating AI agents that can navigate the web.