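# baseClassTools.py User Guide

## 📋 Overview

`baseClassTools.py` is a browser automation toolkit built on **browser-use native classes**. Compared with the earlier `browserUseTools.py` (built on Playwright), it offers the following core advantages:

### ✅ Core Advantages

| Feature | browserUseTools.py (old) | baseClassTools.py (new) |
|------|------------------------|------------------------|
| **Underlying protocol** | Playwright → CDP | CDP directly |
| **Browser session** | New instance on every call | Global persistent session |
| **Startup time** | 1-3 seconds per call | Launched only once |
| **Login state** | ❌ Lost on every call | ✅ Kept automatically |
| **Cookie/Storage** | ❌ Not saved | ✅ Saved automatically |
| **DOM state caching** | ❌ None | ✅ Yes |
| **Element highlighting** | ❌ None | ✅ Automatic |
| **Multi-tab management** | ❌ Difficult | ✅ Native support |
| **Performance** | 🐢 Slow (frequent launches) | 🚀 Fast (persistent session) |
| **Memory usage** | 💾 High (frequent creation) | 💾 Low (single instance) |
| **Code duplication** | ⚠️ High (every tool initializes a browser) | ✅ None (managed globally) |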

---

## 🚀 Quick Start

### 1. Basic Usage

```python
import asyncio
from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    get_page_html,
    cleanup_browser_session
)

async def main():
    # 1. Initialize the browser (only needed once)
    await init_browser_session(headless=False)

    # 2. Use the tools
    await navigate_to_url("https://www.baidu.com")
    result = await get_page_html()
    print(result.output)

    # 3. Clean up
    await cleanup_browser_session()

asyncio.run(main())
```

### 2. Integrating into an Agent

```python
from tools.baseClassTools import init_browser_session, navigate_to_url

class MyAgent:
    async def start(self):
        # Initialize the browser session
        await init_browser_session(
            headless=False,
            profile_name="my_agent_profile"  # persistent profile
        )

    async def do_task(self):
        # Use the tools directly; the browser state is kept between calls
        await navigate_to_url("https://example.com")
        # ... other operations
```

---

## 🔧 Core Features

### 1. Session Management

#### `init_browser_session()` - Initialize the browser session

```python
await init_browser_session(
    headless=False,              # Whether to run headless
    user_data_dir=None,          # User data directory (persisted automatically)
    profile_name="default",      # Profile name
    browser_profile=None,        # BrowserProfile object (e.g. preset cookies)
    **kwargs                     # Other BrowserSession parameters
)
```

**Features**:
- ✅ **Automatic persistence**: login state, cookies, LocalStorage, etc. are saved automatically
- ✅ **Singleton**: only one browser instance exists globally
- ✅ **Profile isolation**: each `profile_name` uses its own profile

**Examples**:
```python
# Option 1: use the default configuration
await init_browser_session()

# Option 2: specify a profile name (recommended)
await init_browser_session(profile_name="xiaohongshu")

# Option 3: specify a full path
await init_browser_session(user_data_dir="/path/to/profile")

# Option 4: preset cookies
from browser_use import BrowserProfile

profile = BrowserProfile(
    cookies=[{
        'name': 'session_id',
        'value': 'abc123',
        'domain': 'example.com',
        'path': '/'
    }]
)
await init_browser_session(browser_profile=profile)
```

#### `cleanup_browser_session()` - Graceful stop

```python
await cleanup_browser_session()
```

- Saves the browser state (login, cookies, etc.)
- Stops the session without closing the browser process
- The saved state can be reused on the next run

#### `kill_browser_session()` - Forced termination

```python
await kill_browser_session()
```

- Closes the browser process completely
- Suited to cases that need a thorough cleanup

### 2. Navigation Tools

#### `navigate_to_url(url, new_tab=False)` - Page navigation

```python
# Open in the current tab
await navigate_to_url("https://www.baidu.com")

# Open in a new tab
await navigate_to_url("https://www.google.com", new_tab=True)
```

#### `search_web(query, engine='google')` - Web search

```python
await search_web("Python async programming", engine="google")
await search_web("Machine Learning", engine="duckduckgo")
```

Supported search engines: `google`, `duckduckgo`, `bing`
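
Conceptually, a search tool like this maps an engine name to a results-page URL. The sketch below shows one way to do that mapping with the standard library. `build_search_url` and `SEARCH_URLS` are hypothetical names for illustration, and how `search_web` builds its URL internally is not shown in this guide.

```python
from urllib.parse import quote_plus

# Query-URL templates for the three engines listed above.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={query}",
    "duckduckgo": "https://duckduckgo.com/?q={query}",
    "bing": "https://www.bing.com/search?q={query}",
}


def build_search_url(query: str, engine: str = "google") -> str:
    """Return the results-page URL for `query` on the given engine."""
    try:
        template = SEARCH_URLS[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine!r}") from None
    # quote_plus escapes spaces and reserved characters for a query string.
    return template.format(query=quote_plus(query))
```

The resulting URL could then be passed to `navigate_to_url()` when an engine-specific tool is not available.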

#### `go_back()` - Go back to the previous page

```python
await go_back()
```

#### `wait(seconds)` - Wait

```python
await wait(seconds=5)  # Wait 5 seconds
```

### 3. Element Interaction Tools

#### `click_element(index)` - Click an element

```python
# Click by index (call get_selector_map() first to obtain the indices)
await click_element(index=5)
```

#### `input_text(index, text, clear=True)` - Enter text

```python
# Clear the field, then type
await input_text(index=3, text="Hello World", clear=True)

# Append to the existing text
await input_text(index=3, text=" More text", clear=False)
```

#### `send_keys(keys)` - Send key presses

```python
await send_keys("Enter")          # Enter
await send_keys("Control+A")      # Select all
await send_keys("Escape")         # ESC
```

#### `upload_file(index, path)` - Upload a file

```python
await upload_file(index=7, path="/path/to/file.pdf")
```

### 4. Scrolling and View Tools

#### `scroll_page(down=True, pages=1.0)` - Scroll the page

```python
# Scroll down 2 pages
await scroll_page(down=True, pages=2.0)

# Scroll up 1 page
await scroll_page(down=False, pages=1.0)

# Scroll down 10 pages, which reaches the bottom unless the page is longer than that
await scroll_page(down=True, pages=10.0)
```
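
Instead of guessing a large `pages` value, the distance to the bottom can be computed from page metrics. The sketch below assumes that one "page" equals one viewport height, which this guide does not confirm, and that the inputs come from values such as `document.body.scrollHeight`, `window.innerHeight`, and `window.scrollY` read via `evaluate()`. `pages_to_bottom` is a hypothetical helper.

```python
def pages_to_bottom(
    scroll_height: float,
    viewport_height: float,
    scroll_y: float = 0.0,
) -> float:
    """Number of viewport-sized pages between the current position and the bottom."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    # Pixels still hidden below the bottom edge of the viewport.
    remaining = scroll_height - viewport_height - scroll_y
    return max(remaining, 0.0) / viewport_height
```

Pages that load more content while scrolling will change `scroll_height`, so the value would need to be recomputed after each scroll.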

#### `find_text(text)` - Find text

```python
await find_text("Contact Us")
```

#### `screenshot()` - Take a screenshot

```python
await screenshot()
```

### 5. Content Extraction Tools

#### `get_page_html()` - Get the page HTML

```python
result = await get_page_html()

# Access the data
html = result.metadata['html']
url = result.metadata['url']
title = result.metadata['title']
```

#### `get_selector_map()` - Get the element map

```python
result = await get_selector_map()

# Inspect the interactive elements
print(result.output)

# Access the element map (used for later clicks, text input, etc.)
selector_map = result.metadata['selector_map']
```

#### `extract_content(query)` - AI-based content extraction

```python
# Note: requires an LLM to be configured
result = await extract_content(
    query="Extract the name and price of every product on the page",
    extract_links=True
)
```

### 6. JavaScript Execution

#### `evaluate(code)` - Execute JavaScript

```python
# Get page information
result = await evaluate("document.title")

# Run a more complex operation
js_code = """
(function(){
    try {
        return {
            title: document.title,
            links: document.querySelectorAll('a').length
        };
    } catch(e) {
        return 'Error: ' + e.message;
    }
})()
"""
result = await evaluate(code=js_code)
```

### 7. Waiting for User Action

#### `wait_for_user_action(message, timeout=300)` - Wait for the user

```python
# Detect whether login is required
html_result = await get_page_html()
html = html_result.metadata['html']

if "登录" in html:  # "登录" is the Chinese text for "log in" on the page
    # Wait for the user to log in manually
    await wait_for_user_action(
        message="Please log in via the browser window",
        timeout=180  # 3 minutes
    )
```

**Console output**:
```
============================================================
⏸️  WAITING FOR USER ACTION
============================================================
📝 Please log in via the browser window
⏱️  Timeout: 180 seconds

👉 Please complete the action in the browser window
👉 Press ENTER when done, or wait for timeout
============================================================
```
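
The "press ENTER or wait for timeout" behaviour shown in the console output can be approximated with the standard library. This is a sketch of the general technique, not the implementation of `wait_for_user_action`. `wait_for_enter` and its `read_line` parameter are hypothetical, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from typing import Callable


async def wait_for_enter(timeout: float, read_line: Callable[[], str] = input) -> bool:
    """Return True if a line was read within `timeout` seconds, else False."""
    try:
        # input() blocks, so run it in a worker thread to keep the event loop free.
        await asyncio.wait_for(asyncio.to_thread(read_line), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

One limitation of this approach: after a timeout the worker thread is still blocked in `read_line` until a line arrives, because a blocking read cannot be cancelled from the event loop.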

---

## 💡 Practical Scenarios

### Scenario 1: Sites that require login (recommended approach)

```python
import asyncio
from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    wait,
    get_page_html,
    wait_for_user_action,
    cleanup_browser_session
)

async def xiaohongshu_task():
    # 1. Use a dedicated profile (it stores the login state)
    await init_browser_session(
        headless=False,
        profile_name="xiaohongshu_profile"  # Key step!
    )

    # 2. Navigate to Xiaohongshu
    await navigate_to_url("https://www.xiaohongshu.com")
    await wait(seconds=2)

    # 3. Check whether login is required
    html_result = await get_page_html()
    html = html_result.metadata['html']

    # "登录" is the Chinese text for "log in" on the page
    if "登录" in html or "login" in html.lower():
        print("⚠️ Login required")
        # Wait for the user to log in
        await wait_for_user_action("Please log in to Xiaohongshu", timeout=180)
    else:
        print("✅ Already logged in (using the saved state)")

    # 4. Carry out the remaining tasks...
    # ...

    # 5. Clean up (saves the state)
    await cleanup_browser_session()

# First run: log in manually
# Second run: the saved login state is reused automatically ✅
asyncio.run(xiaohongshu_task())
```

### Scenario 2: Working with multiple tabs

```python
from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    switch_tab,
    close_tab,
    get_page_html
)

async def multi_tab_task():
    await init_browser_session()

    # Open several tabs
    await navigate_to_url("https://www.baidu.com", new_tab=False)
    await navigate_to_url("https://www.google.com", new_tab=True)
    await navigate_to_url("https://www.bing.com", new_tab=True)

    # Switch to the second tab
    await switch_tab(tab_id="xxxx")  # Replace with the actual tab_id obtained from the browser

    # Get the content of the current tab
    result = await get_page_html()
    print(result.metadata['url'])

    # Close the tab
    await close_tab(tab_id="xxxx")
```

### Scenario 3: Integrating into your own Agent

```python
import asyncio

from tools.baseClassTools import (
    init_browser_session,
    navigate_to_url,
    get_selector_map,
    click_element,
    input_text,
    send_keys,
    wait,
    cleanup_browser_session
)

class MySearchAgent:
    """Example search Agent"""

    def __init__(self):
        self.initialized = False

    async def start(self):
        """Start the Agent"""
        if not self.initialized:
            await init_browser_session(
                headless=False,
                profile_name="search_agent"
            )
            self.initialized = True

    async def search(self, keyword: str):
        """Run a search task"""
        await self.start()

        # 1. Navigate to the search engine
        await navigate_to_url("https://www.baidu.com")
        await wait(seconds=2)

        # 2. Get the element map
        selector_result = await get_selector_map()
        print(selector_result.output)

        # 3. Type the search keyword (assuming the input box is index 1)
        await input_text(index=1, text=keyword, clear=True)

        # 4. Submit the search
        await send_keys(keys="Enter")
        await wait(seconds=3)

        # 5. Return the result
        html_result = await get_page_html()
        return html_result.metadata['html']

    async def stop(self):
        """Stop the Agent"""
        if self.initialized:
            await cleanup_browser_session()
            self.initialized = False


# Use the Agent
async def main():
    agent = MySearchAgent()
    try:
        results = await agent.search("Python async")
        print(f"Result HTML length: {len(results)}")
    finally:
        await agent.stop()

asyncio.run(main())
```

---

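## 🆚 Comparison with browserUseTools.py

### browserUseTools.py (old implementation)

```python
# ❌ Problematic example (excerpt)
@tool()
async def navigate_to_url(url: str, new_tab: bool = False, uid: str = "") -> ToolResult:
    try:
        from playwright.async_api import async_playwright
        # A new browser instance is created on every call!
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=False)  # 1-3 seconds to launch
            context = await browser.new_context()
            page = await context.new_page()
            await page.goto(url)
            title = await page.title()
            # The browser closes when the function returns, so all state is lost!
            return ToolResult(...)
    except Exception:
        ...  # error handling omitted in this excerpt
```

**Problems**:
1. ❌ Every call launches a new browser (1-3 seconds)
2. ❌ Login state is lost
3. ❌ Cookies are not saved
4. ❌ Page navigation history cannot be maintained
5. ❌ State cannot be shared between tool calls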

### baseClassTools.py (new implementation)

```python
# ✅ Improved example (excerpt)
# Global browser session
_browser_session: Optional[BrowserSession] = None
_browser_tools: Optional[Tools] = None

async def init_browser_session(...):
    """Launch only once"""
    global _browser_session, _browser_tools
    if _browser_session is None:
        _browser_session = BrowserSession(...)
        await _browser_session.start()  # Launched only once!
        _browser_tools = Tools()

@tool()
async def navigate_to_url(url: str, new_tab: bool = False, uid: str = "") -> ToolResult:
    try:
        # Reuse the global session
        browser, tools = await get_browser_session()

        # Use the browser-use native tools
        result = await tools.navigate(
            url=url,
            new_tab=new_tab,
            browser_session=browser
        )

        return action_result_to_tool_result(result, f"Navigated to {url}")
    except Exception:
        ...  # error handling omitted in this excerpt
```

**Advantages**:
1. ✅ The browser is launched only once (fast!)
2. ✅ Login state is kept automatically
3. ✅ Cookies are saved automatically
4. ✅ Navigation history is maintained
5. ✅ All tools share the same browser session

---

## ⚙️ Advanced Configuration

### 1. Using Preset Cookies

```python
from browser_use import BrowserProfile

# Preset cookies
profile = BrowserProfile(
    cookies=[
        {
            'name': 'session_id',
            'value': 'your_session_id',
            'domain': 'example.com',
            'path': '/',
            'secure': True,
            'httpOnly': True
        }
    ]
)

await init_browser_session(browser_profile=profile)
```
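
Cookies copied from a browser's request headers usually arrive as a single `name=value; name2=value2` string. The sketch below converts such a string into the list-of-dicts shape used in the example above. `cookies_from_header` is a hypothetical helper, and it only fills the `name`, `value`, `domain`, and `path` keys.

```python
from typing import Dict, List


def cookies_from_header(header: str, domain: str, path: str = "/") -> List[Dict[str, str]]:
    """Convert 'a=1; b=2' into cookie dicts with name/value/domain/path keys."""
    cookies: List[Dict[str, str]] = []
    for part in header.split(";"):
        # partition splits on the first '=', so values may contain '=' themselves.
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed fragments
        cookies.append({"name": name, "value": value, "domain": domain, "path": path})
    return cookies
```

Flags such as `secure` and `httpOnly` are not part of a `Cookie` request header, so they would have to be added separately if needed.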

### 2. Using a Proxy

```python
await init_browser_session(
    proxy={
        'server': 'http://proxy.example.com:8080',
        'username': 'user',
        'password': 'pass'
    }
)
```
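
Proxy settings are often stored as a single URL such as `http://user:pass@host:port`. The sketch below splits such a URL into the `server`, `username`, and `password` keys used in the example above. `proxy_from_url` is a hypothetical helper, and it does not handle IPv6 literals.

```python
from typing import Dict
from urllib.parse import unquote, urlsplit


def proxy_from_url(proxy_url: str) -> Dict[str, str]:
    """Split 'http://user:pass@host:port' into server/username/password keys."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a proxy URL: {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    proxy = {"server": server}
    # Credentials are optional; percent-encoded characters are decoded.
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy
```

The returned dict could then be passed as `proxy=proxy_from_url(...)`, assuming `init_browser_session` accepts the same shape as shown above.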

### 3. Custom Window Size

```python
await init_browser_session(
    extra_chromium_args=[
        '--window-size=1920,1080',
        '--start-maximized'
    ]
)
```

### 4. Multiple Independent Browser Profiles

```python
# Profile 1: dedicated to Xiaohongshu
await init_browser_session(profile_name="xiaohongshu")
# ... use the tools
await cleanup_browser_session()

# Profile 2: dedicated to Taobao
await init_browser_session(profile_name="taobao")
# ... use the tools
await cleanup_browser_session()
```
```
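
One common way to isolate profiles is to give each profile name its own directory. How `baseClassTools.py` actually resolves `profile_name` is not shown in this guide, so the following is only an illustrative sketch: `profile_dir` and the `base_dir` layout are assumptions.

```python
import re
from pathlib import Path


def profile_dir(profile_name: str, base_dir: Path) -> Path:
    """Map a profile name to its own directory under `base_dir`."""
    # Restrict names to safe characters so a name cannot escape base_dir.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", profile_name):
        raise ValueError(f"invalid profile name: {profile_name!r}")
    return base_dir / profile_name
```

A path built this way could be passed as `user_data_dir` when explicit control over the storage location is wanted.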

---

## 📝 Notes

### 1. Initialize First

```python
# ❌ Not recommended: without an explicit init, a default session is created
# automatically, so your named profile is not used
await navigate_to_url("https://example.com")

# ✅ Correct
await init_browser_session(profile_name="my_profile")
await navigate_to_url("https://example.com")
```

### 2. Element Indices Can Change

```python
# Get the element map
selector_result = await get_selector_map()

# Use an index
await click_element(index=5)  # ✅

# After the page changes, the index may no longer be valid
await click_element(index=5)  # ❌ May fail

# Solution: fetch the element map again
selector_result = await get_selector_map()
await click_element(index=5)  # ✅
```
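
Because indices shift when the page changes, looking an element up by its properties after each refresh is more robust than hard-coding a number. The sketch below assumes the selector map is a dict from index to a dict with `tag` and `text` keys. That shape is an assumption made for illustration, the real `selector_map` structure may differ, and `find_index` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def find_index(
    selector_map: Dict[int, Dict[str, Any]],
    *,
    tag: Optional[str] = None,
    text: Optional[str] = None,
) -> Optional[int]:
    """Return the lowest index whose element matches every given criterion."""
    for index in sorted(selector_map):
        element = selector_map[index]
        if tag is not None and element.get("tag") != tag:
            continue
        if text is not None and text not in element.get("text", ""):
            continue
        return index
    return None  # nothing matched


# Hypothetical maps before and after a banner is inserted into the page.
before = {1: {"tag": "input", "text": ""}, 5: {"tag": "button", "text": "Search"}}
after = {1: {"tag": "div", "text": "Banner"}, 2: {"tag": "input", "text": ""},
         6: {"tag": "button", "text": "Search"}}
```

Here the same lookup resolves to a different index in each map, which is why the index should be recomputed after every `get_selector_map()` call.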

### 3. Asynchronous Operations

All tools are asynchronous functions and must be called with `await`:

```python
# ❌ Wrong
result = navigate_to_url("https://example.com")

# ✅ Correct
result = await navigate_to_url("https://example.com")
```
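
To see why the missing `await` matters: calling an async function only creates a coroutine object, and nothing runs until it is awaited. The sketch below demonstrates this with `navigate_stub`, a hypothetical stand-in that needs no browser.

```python
import asyncio
from typing import Tuple


async def navigate_stub(url: str) -> str:
    """Stand-in for an async tool; returns a fake status string."""
    await asyncio.sleep(0)
    return f"navigated to {url}"


async def demo() -> Tuple[bool, str]:
    pending = navigate_stub("https://example.com")  # no await: nothing has run yet
    was_coroutine = asyncio.iscoroutine(pending)
    result = await pending  # awaiting actually runs the call
    return was_coroutine, result
```

If the coroutine were never awaited, Python would emit a "coroutine ... was never awaited" `RuntimeWarning` and the navigation would not happen.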

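### 4. Clean Up Resources

Remember to clean up when you are done:

```python
try:
    await init_browser_session()
    # ... use the tools
finally:
    await cleanup_browser_session()  # Graceful stop: saves the state
    # Alternatively, to close the browser process completely:
    # await kill_browser_session()
```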
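
The same try/finally pattern can be packaged as an async context manager so that cleanup cannot be forgotten. The sketch below is one possible wrapper: `session_scope` is a hypothetical helper, and the nested `fake_init` and `fake_cleanup` functions stand in for `init_browser_session()` and `cleanup_browser_session()` so the example runs without a browser.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, List


@asynccontextmanager
async def session_scope(
    start: Callable[[], Awaitable[None]],
    stop: Callable[[], Awaitable[None]],
) -> AsyncIterator[None]:
    """Run `start` on entry and always run `stop` on exit, even after an error."""
    await start()
    try:
        yield
    finally:
        await stop()


async def demo() -> List[str]:
    events: List[str] = []

    async def fake_init() -> None:  # stands in for init_browser_session()
        events.append("init")

    async def fake_cleanup() -> None:  # stands in for cleanup_browser_session()
        events.append("cleanup")

    try:
        async with session_scope(fake_init, fake_cleanup):
            events.append("work")
            raise RuntimeError("tool failed")  # simulate a failing tool call
    except RuntimeError:
        events.append("handled")
    return events
```

With the real functions this would read `async with session_scope(init_browser_session, cleanup_browser_session): ...`, assuming both can be called without arguments.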

---

## 🔍 Debugging Tips

### 1. Inspect Page Elements

```python
# Get the element map
result = await get_selector_map()
print(result.output)  # Shows all interactive elements

# Get the page HTML
html_result = await get_page_html()
print(html_result.metadata['html'][:500])  # Shows the first 500 characters
```

### 2. Use Screenshots

```python
# Request a screenshot
await screenshot()

# The screenshot is included in the next observation
# It can be used to debug the page state
```

### 3. Debug with JavaScript

```python
# Check the page state
result = await evaluate("""
(function(){
    return {
        url: window.location.href,
        title: document.title,
        body_height: document.body.scrollHeight,
        viewport_height: window.innerHeight
    };
})()
""")
print(result.output)
```

---

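## 📚 Complete Examples

See `tools/baseClassTools_examples.py`, which includes:

1. ✅ Basic usage example
2. ✅ Search and data extraction example
3. ✅ Form filling example
4. ✅ Login scenario example
5. ✅ JavaScript execution example
6. ✅ Agent integration example

To run the examples: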

```bash
python tools/baseClassTools_examples.py
```

---

## 🎯 Recommended Usage

### Option 1: Direct use (simple tasks)

```python
from tools.baseClassTools import *

async def task():
    await init_browser_session()
    await navigate_to_url("https://example.com")
    # ... other operations
    await cleanup_browser_session()
```

### Option 2: Wrap in an Agent (complex tasks)

```python
from tools.baseClassTools import *

class MyAgent:
    async def start(self):
        await init_browser_session(profile_name="my_agent")

    async def task(self):
        # Use the various tools
        pass

    async def stop(self):
        await cleanup_browser_session()
```

### Option 3: Integrate into an existing system

```python
# When initializing your Agent
from tools import baseClassTools
from tools.baseClassTools import init_browser_session

class ExistingAgent:
    @classmethod
    async def create(cls):
        # __init__ cannot be async, so use an async factory method instead
        agent = cls()

        # Initialize the browser
        await init_browser_session(
            headless=False,
            profile_name="existing_agent"
        )

        # Register the tools with your system
        agent.register_tools(baseClassTools)
        return agent
```

---

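## 🚀 Next Steps

1. ✅ Run the example code to learn the basics
2. ✅ Integrate the tools into your own Agent
3. ✅ Create a separate profile for each website
4. ✅ Enjoy the convenience of persistent sessions!

---

**Document version**: 1.0
**Created**: 2026-01-29
**Applies to**: Agent browser automation tools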