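baseClassTools.py User Guide

📋 Overview

baseClassTools.py is a browser automation toolkit built on browser-use's native classes. Compared with the earlier browserUseTools.py, which was built on Playwright, it offers the following key advantages:

✅ Key Advantages

Feature	browserUseTools.py (old)	baseClassTools.py (new)
Underlying protocol	Playwright → CDP	CDP directly
Browser session	New instance on every call	Global persistent session
Startup time	1-3 seconds on every call	Starts only once
Login state	❌ Lost on every call	✅ Kept automatically
Cookie/Storage	❌ Not saved	✅ Saved automatically
DOM state cache	❌ None	✅ Yes
Element highlighting	❌ None	✅ Automatic
Multi-tab management	❌ Difficult	✅ Natively supported
Performance	🐢 Slow (frequent launches)	🚀 Fast (persistent)
Memory usage	💾 High (frequent creation)	💾 Low (single instance)
Code duplication	⚠️ High (every tool initializes its own browser)	✅ None (managed globally)

🚀 Quick Start

1. Basic usage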

import asyncio
from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    get_page_html,
    cleanup_browser_session
)

async def main():
    # 1. Initialize the browser (only once)
    await init_browser_session(headless=False)

    # 2. Use the tools
    await navigate_to_url("https://www.baidu.com")
    result = await get_page_html()
    print(result.output)

    # 3. Clean up
    await cleanup_browser_session()

asyncio.run(main())

2. Integrating into an Agent

from tools.baseClassTools import init_browser_session, navigate_to_url

class MyAgent:
    async def start(self):
        # Initialize the browser session
        await init_browser_session(
            headless=False,
            profile_name="my_agent_profile"  # persistent profile
        )

    async def do_task(self):
        # Use the tools directly; browser state is preserved
        await navigate_to_url("https://example.com")
        # ... other operations

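🔧 Core Features

1. Session management

`init_browser_session()` - initialize the browser session

await init_browser_session(
    headless=False,              # whether to run headless
    user_data_dir=None,          # user data directory (persisted automatically)
    profile_name="default",      # profile name
    browser_profile=None,        # BrowserProfile object (preset cookies)
    **kwargs                     # other BrowserSession parameters
)

Features:

✅ Automatic persistence: login state, cookies, LocalStorage, etc. are saved automatically
✅ Singleton: there is only one browser instance globally
✅ Profile isolation: each profile_name uses its own profile

Examples:

# Option 1: use the default configuration
await init_browser_session()

# Option 2: specify a profile name (recommended)
await init_browser_session(profile_name="xiaohongshu")

# Option 3: specify a full path
await init_browser_session(user_data_dir="/path/to/profile")

# Option 4: preset cookies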
from browser_use import BrowserProfile

profile = BrowserProfile(
    cookies=[{
        'name': 'session_id',
        'value': 'abc123',
        'domain': 'example.com',
        'path': '/'
    }]
)
await init_browser_session(browser_profile=profile)
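
To make the profile isolation idea concrete, here is a minimal sketch of how a profile name could be mapped to its own data directory. The function `resolve_user_data_dir` and the base directory `~/.browser_profiles` are hypothetical names chosen for illustration; how baseClassTools.py actually lays out its profiles is not described in this guide.

```python
from pathlib import Path
from typing import Optional

# Hypothetical base directory (an assumption, not taken from baseClassTools.py).
DEFAULT_BASE_DIR = Path.home() / ".browser_profiles"


def resolve_user_data_dir(
    profile_name: str = "default",
    user_data_dir: Optional[str] = None,
    base_dir: Path = DEFAULT_BASE_DIR,
) -> Path:
    """Return the directory that would hold one profile's persisted state."""
    if user_data_dir is not None:
        # An explicit path takes precedence over the profile name.
        return Path(user_data_dir)
    if profile_name == ".." or Path(profile_name).name != profile_name:
        # Reject empty names, "." / "..", and names containing path separators.
        raise ValueError(f"invalid profile name: {profile_name!r}")
    return base_dir / profile_name
```

Under this assumption, "xiaohongshu" and "taobao" resolve to different directories, so cookies saved for one site never leak into the other.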

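`cleanup_browser_session()` - graceful stop

await cleanup_browser_session()

Saves the browser state (logins, cookies, etc.)
Stops the session without closing the browser process
The saved state can be reused on the next run

`kill_browser_session()` - force termination

await kill_browser_session()

Closes the browser process completely
Use it when a thorough cleanup is required

2. Navigation tools

`navigate_to_url(url, new_tab=False)` - page navigation

# Open in the current tab
await navigate_to_url("https://www.baidu.com")

# Open in a new tab
await navigate_to_url("https://www.google.com", new_tab=True)

`search_web(query, engine='google')` - web search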

await search_web("Python async programming", engine="google")
await search_web("Machine Learning", engine="duckduckgo")

Supported search engines: google, duckduckgo, bing

`go_back()` - go back to the previous page

await go_back()

`wait(seconds)` - wait

await wait(seconds=5)  # wait 5 seconds

3. Element interaction tools

`click_element(index)` - click an element

# Click by index (get the selector_map first)
await click_element(index=5)

`input_text(index, text, clear=True)` - type text

# Clear the field, then type
await input_text(index=3, text="Hello World", clear=True)

# Append to the existing value
await input_text(index=3, text=" More text", clear=False)

`send_keys(keys)` - send key presses

await send_keys("Enter")          # Enter
await send_keys("Control+A")      # select all
await send_keys("Escape")         # Esc

`upload_file(index, path)` - upload a file

await upload_file(index=7, path="/path/to/file.pdf")

4. Scrolling and view tools

`scroll_page(down=True, pages=1.0)` - scroll the page

# Scroll down 2 pages
await scroll_page(down=True, pages=2.0)

# Scroll up 1 page
await scroll_page(down=False, pages=1.0)

# Scroll toward the bottom (10 pages down; longer pages may need more)
await scroll_page(down=True, pages=10.0)

`find_text(text)` - find text

await find_text("Contact Us")

`screenshot()` - take a screenshot

await screenshot()

5. Content extraction tools

`get_page_html()` - get the page HTML

result = await get_page_html()

# Access the data
html = result.metadata['html']
url = result.metadata['url']
title = result.metadata['title']

`get_selector_map()` - get the element map

result = await get_selector_map()

# Inspect the interactive elements
print(result.output)

# Access the element map (used by later click, input, and similar operations)
selector_map = result.metadata['selector_map']
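
Since later clicks and inputs take an index, it can help to look the index up by text rather than hard-coding it. The sketch below assumes only that the selector map behaves like a mapping from integer index to some element object; the exact element type stored in `result.metadata['selector_map']` is not documented here, so the helper `find_index_by_text` (a hypothetical name) accepts a `describe` callable to turn an element into searchable text.

```python
from typing import Any, Callable, Mapping, Optional


def find_index_by_text(
    selector_map: Mapping[int, Any],
    needle: str,
    describe: Callable[[Any], str] = str,
) -> Optional[int]:
    """Return the lowest index whose description contains `needle` (case-insensitive)."""
    needle = needle.lower()
    for index in sorted(selector_map):
        if needle in describe(selector_map[index]).lower():
            return index
    return None


# Illustrative data only; a real selector map holds element objects, not strings.
fake_map = {
    3: "<input name='q' placeholder='Search'>",
    1: "<a href='/'>Home</a>",
    5: "<button>Search</button>",
}
search_index = find_index_by_text(fake_map, "search")
```

With a real map, `describe` would typically read whatever text or attribute fields the element objects expose.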

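`extract_content(query)` - AI-based content extraction

# Note: requires a configured LLM
result = await extract_content(
    query="Extract the names and prices of all products on the page",
    extract_links=True
)

6. JavaScript execution

`evaluate(code)` - execute JavaScript

# Get page information
result = await evaluate("document.title")

# Run a more complex operation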
js_code = """
(function(){
    try {
        return {
            title: document.title,
            links: document.querySelectorAll('a').length
        };
    } catch(e) {
        return 'Error: ' + e.message;
    }
})()
"""
result = await evaluate(code=js_code)

7. Waiting for user action

`wait_for_user_action(message, timeout=300)` - wait for user action

# Check whether login is required
html_result = await get_page_html()
html = html_result.metadata['html']

if "登录" in html:  # "登录" is the Chinese text for "log in"
    # Wait for the user to log in manually
    await wait_for_user_action(
        message="Please log in in the browser",
        timeout=180  # 3 minutes
    )

Console output:

============================================================
⏸️  WAITING FOR USER ACTION
============================================================
📝 Please log in in the browser
⏱️  Timeout: 180 seconds

👉 Please complete the action in the browser window
👉 Press ENTER when done, or wait for timeout
============================================================
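
The "press ENTER or wait for the timeout" behaviour boils down to racing a confirmation signal against a deadline. How `wait_for_user_action` implements this internally is not shown in this guide; the following is only a sketch of the pattern, with an `asyncio.Event` standing in for the ENTER key press and the name `wait_for_confirmation` being hypothetical.

```python
import asyncio
from typing import Tuple


async def wait_for_confirmation(done: asyncio.Event, timeout: float) -> bool:
    """Wait until `done` is set or `timeout` seconds elapse.

    Returns True if the confirmation arrived, False on timeout.
    """
    try:
        await asyncio.wait_for(done.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> Tuple[bool, bool]:
    confirmed = asyncio.Event()
    # Simulate the user confirming shortly after the prompt appears.
    asyncio.get_running_loop().call_later(0.01, confirmed.set)
    first = await wait_for_confirmation(confirmed, timeout=1.0)

    # Nobody ever confirms here, so the wait ends at the deadline.
    never = asyncio.Event()
    second = await wait_for_confirmation(never, timeout=0.05)
    return first, second
```

Returning a boolean lets the caller decide whether a timeout should abort the task or simply continue with a re-check of the page.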

💡 Practical Scenarios

Scenario 1: sites that require login (recommended approach)

import asyncio
from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    wait,
    get_page_html,
    wait_for_user_action,
    cleanup_browser_session
)

async def xiaohongshu_task():
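    # 1. Use a dedicated profile (keeps the login state)
    await init_browser_session(
        headless=False,
        profile_name="xiaohongshu_profile"  # this is the key part
    )

    # 2. Navigate to Xiaohongshu
    await navigate_to_url("https://www.xiaohongshu.com")
    await wait(seconds=2)

    # 3. Check whether login is required
    html_result = await get_page_html()
    html = html_result.metadata['html']

    # "登录" is the Chinese text for "log in"
    if "登录" in html or "login" in html.lower():
        print("⚠️ Login required")
        # Wait for the user to log in
        await wait_for_user_action("Please log in to Xiaohongshu", timeout=180)
    else:
        print("✅ Already logged in (using the saved state)")

    # 4. Run the follow-up tasks...
    # ...

    # 5. Clean up (saves the state)
    await cleanup_browser_session()

# First run: manual login required
# Second run: the saved login state is used automatically ✅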
asyncio.run(xiaohongshu_task())
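
The login check in this scenario is a plain substring test on the page HTML. A small helper makes that test reusable and easy to unit-test; `needs_login` and `LOGIN_MARKERS` are hypothetical names, and the marker list is just the two strings used in the scenario above.

```python
from typing import Iterable

# "登录" is the Chinese text for "log in".
LOGIN_MARKERS = ("登录", "login")


def needs_login(html: str, markers: Iterable[str] = LOGIN_MARKERS) -> bool:
    """Return True if any login marker appears in the HTML (case-insensitive)."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in markers)


logged_out = needs_login("<button>登录</button>")
logged_in = needs_login("<div>Welcome back</div>")
```

A substring test like this is crude: a page that mentions "login" in a script URL or footer link would be flagged even when the user is already signed in, so a site-specific marker is usually more reliable.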

Scenario 2: working with multiple tabs

from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    switch_tab,
    close_tab,
    get_page_html
)

async def multi_tab_task():
    await init_browser_session()

    # Open several tabs
    await navigate_to_url("https://www.baidu.com", new_tab=False)
    await navigate_to_url("https://www.google.com", new_tab=True)
    await navigate_to_url("https://www.bing.com", new_tab=True)

    # Switch to the second tab
    await switch_tab(tab_id="xxxx")  # placeholder; get the real tab_id from the browser

    # Get the content of the current tab
    result = await get_page_html()
    print(result.metadata['url'])

    # Close the tab
    await close_tab(tab_id="xxxx")

Scenario 3: integrating into your own Agent

import asyncio

from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    get_selector_map,
    click_element,
    input_text,
    send_keys,
    wait,
    cleanup_browser_session
)

class MySearchAgent:
    """Example search Agent"""

    def __init__(self):
        self.initialized = False

    async def start(self):
        """Start the Agent"""
        if not self.initialized:
            await init_browser_session(
                headless=False,
                profile_name="search_agent"
            )
            self.initialized = True

    async def search(self, keyword: str):
        """Run a search task"""
        await self.start()

        # 1. Navigate to the search engine
        await navigate_to_url("https://www.baidu.com")
        await wait(seconds=2)

        # 2. Get the element map
        selector_result = await get_selector_map()
        print(selector_result.output)

        # 3. Type the search keyword (assuming the input box is index 1)
        await input_text(index=1, text=keyword, clear=True)

        # 4. Submit the search
        await send_keys(keys="Enter")
        await wait(seconds=3)

        # 5. Return the result
        html_result = await get_page_html()
        return html_result.metadata['html']

    async def stop(self):
        """Stop the Agent"""
        if self.initialized:
            await cleanup_browser_session()
            self.initialized = False


# Use the Agent
async def main():
    agent = MySearchAgent()
    try:
        results = await agent.search("Python async")
        print(f"Search result length: {len(results)}")
    finally:
        await agent.stop()

asyncio.run(main())

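🆚 Comparison with browserUseTools.py

browserUseTools.py (old implementation)

# ❌ Problem example
@tool()
async def navigate_to_url(url: str, new_tab: bool = False, uid: str = "") -> ToolResult:
    try:
        from playwright.async_api import async_playwright
        # A new browser instance is created on every call!
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=False)  # 1-3 seconds to launch
            context = await browser.new_context()
            page = await context.new_page()
            await page.goto(url)
            title = await page.title()
            # The browser closes when the function returns, and all state is lost!
            return ToolResult(...)
    except Exception:
        ...  # error handling omitted in this excerpt

Problems:

❌ A new browser is launched on every call (1-3 seconds)
❌ Login state is lost
❌ Cookies are not saved
❌ Page navigation history cannot be maintained
❌ State cannot be shared between tool calls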

baseClassTools.py (new implementation)

# ✅ Improved example
# Global browser session
_browser_session: Optional[BrowserSession] = None
_browser_tools: Optional[Tools] = None

async def init_browser_session(...):
    """Starts only once"""
    global _browser_session, _browser_tools
    if _browser_session is None:
        _browser_session = BrowserSession(...)
        await _browser_session.start()  # started only once!
        _browser_tools = Tools()

@tool()
async def navigate_to_url(url: str, new_tab: bool = False, uid: str = "") -> ToolResult:
    try:
        # Reuse the global session
        browser, tools = await get_browser_session()

        # Use browser-use's native tools
        result = await tools.navigate(
            url=url,
            new_tab=new_tab,
            browser_session=browser
        )

        return action_result_to_tool_result(result, f"Navigated to {url}")
    except Exception:
        ...  # error handling omitted in this excerpt

Advantages:

✅ The browser starts only once (fast!)
✅ Login state is kept automatically
✅ Cookies are saved automatically
✅ Navigation history is maintained
✅ All tools share the same browser session
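
The "start only once" behaviour is a lazily initialized singleton. One detail worth illustrating: if several tools are awaited concurrently before the session exists, a bare `is None` check can let more than one of them start a browser. The sketch below shows the pattern with an `asyncio.Lock` guarding the check; `FakeSession` and `get_session` are stand-ins invented for this example, and whether baseClassTools.py uses a lock is not stated in this guide.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Stand-in for a browser session; counts how many times one was started."""

    starts = 0

    async def start(self) -> None:
        await asyncio.sleep(0.01)  # simulate a slow browser launch
        FakeSession.starts += 1


_session: Optional[FakeSession] = None
_lock: Optional[asyncio.Lock] = None


async def get_session() -> FakeSession:
    """Return the shared session, starting it on first use."""
    global _session, _lock
    if _lock is None:
        _lock = asyncio.Lock()  # created lazily, inside the running event loop
    async with _lock:
        if _session is None:
            session = FakeSession()
            await session.start()
            _session = session
    return _session


async def demo() -> bool:
    # Three concurrent callers race for the session; all must get the same one.
    a, b, c = await asyncio.gather(get_session(), get_session(), get_session())
    return a is b is c


all_same = asyncio.run(demo())
```

Only the first caller pays the launch cost; the others wait on the lock and then receive the already started session.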

⚙️ Advanced Configuration

1. Using preset cookies

from browser_use import BrowserProfile

# Preset cookies
profile = BrowserProfile(
    cookies=[
        {
            'name': 'session_id',
            'value': 'your_session_id',
            'domain': 'example.com',
            'path': '/',
            'secure': True,
            'httpOnly': True
        }
    ]
)

await init_browser_session(browser_profile=profile)

2. Using a proxy

await init_browser_session(
    proxy={
        'server': 'http://proxy.example.com:8080',
        'username': 'user',
        'password': 'pass'
    }
)

3. Custom viewport (window) size

await init_browser_session(
    extra_chromium_args=[
        '--window-size=1920,1080',
        '--start-maximized'
    ]
)

4. Multiple independent browser profiles

# Profile 1: dedicated to Xiaohongshu
await init_browser_session(profile_name="xiaohongshu")
# ... use the tools
await cleanup_browser_session()

# Profile 2: dedicated to Taobao
await init_browser_session(profile_name="taobao")
# ... use the tools
await cleanup_browser_session()

📝 Notes

1. Initialize first

# ❌ Not recommended
await navigate_to_url("https://example.com")  # not initialized, so a default session is created automatically

# ✅ Recommended
await init_browser_session(profile_name="my_profile")
await navigate_to_url("https://example.com")

2. Element indices can change

# Get the element map
selector_result = await get_selector_map()

# Use an index
await click_element(index=5)  # ✅

# After the page changes, the index may be stale
await click_element(index=5)  # ❌ may fail

# Fix: fetch the element map again and use the index from the new map
selector_result = await get_selector_map()
await click_element(index=5)  # ✅
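
One way to avoid stale indices is to resolve the index immediately before every click instead of caching it. The sketch below shows that pattern with injected callables so it runs on its own; `click_fresh` is a hypothetical helper, and in real use `locate` would call `get_selector_map()` and search its result while `click` would call `click_element(index=...)`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def click_fresh(
    locate: Callable[[], Awaitable[int]],
    click: Callable[[int], Awaitable[None]],
) -> int:
    """Resolve the element's current index right before clicking it."""
    index = await locate()
    await click(index)
    return index


async def demo() -> List[str]:
    page: Dict[int, str] = {1: "Search box", 5: "Submit"}
    clicked: List[str] = []

    async def locate_submit() -> int:
        # Stand-in for fetching the selector map and searching it.
        return next(i for i, label in page.items() if label == "Submit")

    async def click(index: int) -> None:
        # Stand-in for click_element(index=...).
        clicked.append(f"{index}:{page[index]}")

    await click_fresh(locate_submit, click)

    # The page re-renders: a banner appears and the indices shift.
    page.clear()
    page.update({1: "Banner", 2: "Search box", 6: "Submit"})
    await click_fresh(locate_submit, click)
    return clicked
```

In the demo the "Submit" button moves from index 5 to index 6 after the re-render, and both clicks still land on it.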

3. Asynchronous operations

All tools are async functions and must be called with await:

# ❌ Wrong
result = navigate_to_url("https://example.com")

# ✅ Correct
result = await navigate_to_url("https://example.com")
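
To see why the missing `await` is a bug rather than a style issue, the following self-contained sketch uses a stand-in async function (`fake_navigate`, not part of baseClassTools.py). Calling it without `await` only creates a coroutine object; the function body does not run.

```python
import asyncio
import inspect


async def fake_navigate(url: str) -> str:
    """Stand-in for an async tool such as navigate_to_url."""
    await asyncio.sleep(0)
    return f"navigated to {url}"


def call_without_await() -> bool:
    pending = fake_navigate("https://example.com")  # nothing has run yet
    is_coroutine = inspect.iscoroutine(pending)
    pending.close()  # avoid a "coroutine was never awaited" RuntimeWarning
    return is_coroutine


async def call_with_await() -> str:
    return await fake_navigate("https://example.com")
```

Python normally emits a "coroutine ... was never awaited" RuntimeWarning when such an object is discarded, which is a useful hint when a tool call appears to do nothing.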

4. Clean up resources

Remember to clean up when you are done:

try:
    await init_browser_session()
    # ... use the tools
finally:
    await cleanup_browser_session()  # saves the state
    # or, to close the browser completely:
    # await kill_browser_session()
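
The same guarantee can be packaged as an async context manager so callers cannot forget the cleanup step. This is a sketch under the assumption that init and cleanup are plain awaitable callables; `browser_scope` is a hypothetical helper, and the demo uses fakes in place of `init_browser_session` and `cleanup_browser_session` so that it runs standalone.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, List


@asynccontextmanager
async def browser_scope(
    init: Callable[[], Awaitable[None]],
    cleanup: Callable[[], Awaitable[None]],
) -> AsyncIterator[None]:
    """Run `init` on entry and always run `cleanup` on exit."""
    await init()
    try:
        yield
    finally:
        await cleanup()


async def demo() -> List[str]:
    events: List[str] = []

    async def fake_init() -> None:  # stand-in for init_browser_session
        events.append("init")

    async def fake_cleanup() -> None:  # stand-in for cleanup_browser_session
        events.append("cleanup")

    try:
        async with browser_scope(fake_init, fake_cleanup):
            events.append("work")
            raise RuntimeError("task failed")
    except RuntimeError:
        events.append("error seen")
    return events
```

Cleanup runs before the exception reaches the caller, so the saved state is written even when the task fails halfway through.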

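🔍 Debugging Tips

1. Inspect page elements

# Get the element map
result = await get_selector_map()
print(result.output)  # lists all interactive elements

# Get the page HTML
html_result = await get_page_html()
print(html_result.metadata['html'][:500])  # show the first 500 characters

2. Use screenshots

# Request a screenshot
await screenshot()

# The screenshot is included in the next observation
# and can be used to debug the page state

3. Debug with JavaScript

# Check the page state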
result = await evaluate("""
(function(){
    return {
        url: window.location.href,
        title: document.title,
        body_height: document.body.scrollHeight,
        viewport_height: window.innerHeight
    };
})()
""")
print(result.output)

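📚 Complete Examples

See tools/baseClassTools_examples.py, which includes:

✅ Basic usage example
✅ Search and data extraction example
✅ Form filling example
✅ Login scenario example
✅ JavaScript execution example
✅ Agent integration example

Run the examples: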

python tools/baseClassTools_examples.py

🎯 Recommended Usage

Option 1: direct use (simple tasks)

from tools.baseClassTools import *

async def task():
    await init_browser_session()
    await navigate_to_url("https://example.com")
    # ... other operations
    await cleanup_browser_session()

Option 2: wrap in an Agent (complex tasks)

from tools.baseClassTools import *

class MyAgent:
    async def start(self):
        await init_browser_session(profile_name="my_agent")

    async def task(self):
        # Use the various tools
        pass

    async def stop(self):
        await cleanup_browser_session()

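Option 3: integrating into an existing system

# When your Agent is initialized
from tools import baseClassTools
from tools.baseClassTools import init_browser_session

class ExistingAgent:
    @classmethod
    async def create(cls):
        # __init__ cannot be async, so the async setup lives in a factory method
        self = cls()

        # Initialize the browser
        await init_browser_session(
            headless=False,
            profile_name="existing_agent"
        )

        # Register the tools with your system
        self.register_tools(baseClassTools)
        return self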

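🚀 Next Steps

✅ Run the example code to learn the basics
✅ Integrate the tools into your own Agent
✅ Create a separate profile for each site
✅ Enjoy the convenience of persistent sessions!

Document version: 1.0 Created: 2026-01-29 Applies to: Agent browser automation tools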
