# Agent Skill: Autonomous ComfyUI Workflow Builder (RunComfy Ecosystem)

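## 📌 Skill Objective
This skill replaces the traditional low-level mode in which a human drags nodes around in the UI, exports a script, and the Agent forwards it blindly. It gives the Agent the ability to **design and execute ComfyUI's underlying compute graph (API JSON) directly from the goal**, closing the image-generation loop from zero to one with full automation.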

---

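## 🛠️ Mental Model

1. **Drop the dependency on UI graphs**: A ComfyUI workflow is essentially a JSON dictionary that describes nodes and the links between them. The Agent can assemble any `class_type` and its `inputs` in memory as a Python dict and submit the result to the API.
2. **A blank machine is already an arsenal**: RunComfy's default cloud image is not empty. It ships with **209+ checkpoints** (e.g. `FLUX1/flux1-dev-fp8`, `Juggernaut_X`) and **150+ mainstream LoRAs**. As long as a hand-written workflow sticks to these standard assets, it should pass validation without missing-model errors.
3. **Simplify the control flow**: When building complex control (e.g. a depth or pose map plus a character reference image at the same time), if the cloud lacks a heavy or niche custom node (such as `ExpressionEditor` or a complex `IPAdapter` node chain), **switch immediately to a composite LoRA substitute.** For example, combine `refcontrol_pose.safetensors` with the `ImageConcanate` node so that a single LoRA handles the merged two-image control input.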
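
To make the first point concrete, here is a minimal sketch of the API-format dictionary. Each key is a node ID string, and each input is either a literal value or a `[source_node_id, output_index]` link. The checkpoint filename and prompt text are placeholders chosen for illustration, not confirmed names from the RunComfy image.

```python
import json

# A tiny two-node graph in ComfyUI API format.
# The checkpoint filename is a placeholder; use one that exists on the instance.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "example_checkpoint.safetensors"},
    },
    "2": {
        "class_type": "CLIPTextEncode",
        # ["1", 1] is a link: output slot 1 (the CLIP output) of node "1".
        "inputs": {"text": "a lighthouse at dusk", "clip": ["1", 1]},
    },
}

# The graph is plain data, so it serializes directly into a request body.
payload = json.dumps({"prompt": workflow})
```

Because the graph is ordinary data, the Agent can build, inspect, and modify it with normal dictionary operations before anything is sent.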

---

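## 🧠 Standard Workflow Assembly Procedure (SOP)

To implement a new, complex image-generation task, the Agent should follow these steps in order:

### Step 1: Analyze intent and choose the base architecture
Work out what the user needs. For top-tier photorealism or control that combines several conditions, prefer the **FLUX.1 architecture**.
*   Three core loaders must be initialized:
    1.  `UNETLoader` -> loads the main model (`flux1-dev-kontext_fp8_scaled.safetensors`)
    2.  `VAELoader` -> loads the VAE (`ae.safetensors`)
    3.  `DualCLIPLoader` -> loads the text encoders (`clip_l.safetensors` & `t5xxl_fp8_e4m3fn_scaled.safetensors`)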

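### Step 2: Inject the control flow (Control / Reference)
Instead of guessing whether the cloud has some obscure custom node, assemble everything from the most efficient standard components:
*   **Image input**: declare a `LoadImage` node that reads the image by its filename on the remote server.
*   **Image merging**: when several images act together as one joint control input, use `ImageConcanate` (direction `down` / `right`).
*   **Load the specialized control LoRA**: declare a `LoraLoader` that loads `refcontrol_pose.safetensors`, so the model stream coming from the UNET passes through the LoRA-patched path.
*   **Inject into latent space**: after constraining the image size with `FluxKontextImageScale`, encode the image with `VAEEncode` and feed the result to `ReferenceLatent`.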
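
The chain above can be sketched as follows. This is an illustration under assumptions, not a verified recipe: the input names (`image1`, `image2`, `match_image_size`, `pixels`, `latent`, and so on) reflect my understanding of these nodes and should be checked against the instance's node definitions, and `build_reference_chain`, the image filenames, and the prompt text are hypothetical.

```python
def build_reference_chain(nodes, vae_id, conditioning_id, pose_image, ref_image):
    """Append reference/control nodes to `nodes`; return the ReferenceLatent node ID."""
    def add(class_type, inputs):
        # Assumes all existing node IDs are numeric strings.
        node_id = str(max((int(k) for k in nodes), default=0) + 1)
        nodes[node_id] = {"class_type": class_type, "inputs": inputs}
        return node_id

    pose = add("LoadImage", {"image": pose_image})
    ref = add("LoadImage", {"image": ref_image})
    # Place the two images side by side so one LoRA sees both.
    merged = add("ImageConcanate", {
        "image1": [pose, 0], "image2": [ref, 0],
        "direction": "right", "match_image_size": True,
    })
    scaled = add("FluxKontextImageScale", {"image": [merged, 0]})
    latent = add("VAEEncode", {"pixels": [scaled, 0], "vae": [vae_id, 0]})
    return add("ReferenceLatent", {
        "conditioning": [conditioning_id, 0], "latent": [latent, 0],
    })


# Usage with a minimal pre-existing graph (loader and prompt nodes).
nodes = {
    "1": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "2": {"class_type": "DualCLIPLoader", "inputs": {
        "clip_name1": "clip_l.safetensors",
        "clip_name2": "t5xxl_fp8_e4m3fn_scaled.safetensors", "type": "flux"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a dancer on stage", "clip": ["2", 0]}},
}
ref_id = build_reference_chain(nodes, "1", "3", "pose.png", "person.png")
```

The returned node ID would then take the place of the plain prompt node as the sampler's positive conditioning.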

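### Step 3: Python template for hand-writing the workflow
Rebuild the compute graph with plain object-oriented Python instead of editing string templates. This keeps node IDs unique and ensures every link refers to an ID that the builder actually returned.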

```python
class AgentWorkflowBuilder:
    """Builds a ComfyUI API-format graph with auto-incrementing node IDs."""

    def __init__(self):
        self.nodes = {}
        self.node_counter = 1

    def add(self, class_type, inputs):
        """Register a node and return its ID for use in later links."""
        node_id = str(self.node_counter)
        self.nodes[node_id] = {"class_type": class_type, "inputs": inputs}
        self.node_counter += 1
        return node_id

def auto_generate():
    wf = AgentWorkflowBuilder()

    # Loaders
    unet = wf.add("UNETLoader", {"unet_name": "flux1-dev-kontext_fp8_scaled.safetensors", "weight_dtype": "default"})
    vae = wf.add("VAELoader", {"vae_name": "ae.safetensors"})
    clip = wf.add("DualCLIPLoader", {"clip_name1": "clip_l.safetensors", "clip_name2": "t5xxl_fp8_e4m3fn_scaled.safetensors", "type": "flux", "device": "default"})

    # Control LoRA: output 0 is the patched model, output 1 the patched CLIP
    lora = wf.add("LoraLoader", {"lora_name": "refcontrol_pose.safetensors", "strength_model": 1, "strength_clip": 1, "model": [unet, 0], "clip": [clip, 0]})

    # Prompt conditioning; the negative branch is a zeroed copy of the positive one
    pos_prompt = wf.add("CLIPTextEncode", {"text": "A beautiful cinematic shot, ultra detailed", "clip": [lora, 1]})
    neg_prompt = wf.add("ConditioningZeroOut", {"conditioning": [pos_prompt, 0]})

    # Starting latent. The size is an example value; swap in an encoded
    # image latent instead if the task calls for one.
    latent = wf.add("EmptySD3LatentImage", {"width": 1024, "height": 1024, "batch_size": 1})

    # Sampling
    sampler = wf.add("KSampler", {
        "seed": 1234567, "steps": 25, "cfg": 1, "sampler_name": "euler",
        "scheduler": "simple", "denoise": 1,
        "model": [lora, 0], "positive": [pos_prompt, 0], "negative": [neg_prompt, 0],
        "latent_image": [latent, 0]
    })

    # Decode and save
    decoded = wf.add("VAEDecode", {"samples": [sampler, 0], "vae": [vae, 0]})
    final_output = wf.add("SaveImage", {"filename_prefix": "AgentGen", "images": [decoded, 0]})

    return wf.nodes
```
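
Before submitting a hand-built graph, a structural check can catch links that point at a node ID that was never added, such as a leftover placeholder string. The helper below is a hypothetical sketch: `find_dangling_links` is not part of ComfyUI or RunComfy, and it assumes a link is always a two-item list of `[str, int]`.

```python
def find_dangling_links(nodes):
    """Return (node_id, input_name, target_id) for each link to a missing node."""
    problems = []
    for node_id, node in nodes.items():
        for name, value in node["inputs"].items():
            # Assumption: a 2-item [str, int] list is a link; anything else is a literal.
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in nodes:
                problems.append((node_id, name, value[0]))
    return problems


# Example: node "2" links to "1" (exists) and to "99" (does not exist).
example = {
    "1": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "2": {"class_type": "VAEDecode", "inputs": {"samples": ["99", 0], "vae": ["1", 0]}},
}
dangling = find_dangling_links(example)
```

An empty result only says the wiring is internally consistent. Whether each node type and model exists on the server still has to be checked separately.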

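### Step 4: Deploy to RunComfy
With the assembled `wf.nodes` plain dictionary in hand:
1. **Upload assets**: first call `runcomfy_workflow_executor.upload_file_from_base64()` to push the user's images to the cloud server.
2. **Submit**: assign the dictionary to the `prompt` key and send `POST /prompt` to your machine instance (e.g. `deddf65f...`).
3. **Poll**: the Agent must call `wait_for_completion()`, then download the image on success and return it to the user.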
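
The submission step can be sketched with the standard library alone. The base URL is a placeholder, `build_prompt_request` and `submit` are hypothetical helper names, and a real RunComfy instance may need a different host or authentication headers that are not shown here.

```python
import json
import urllib.request


def build_prompt_request(base_url, nodes, client_id="agent"):
    """Wrap the node dict in the {"prompt": ...} envelope as a POST request."""
    body = json.dumps({"prompt": nodes, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(base_url, nodes):
    """Send the request and return the parsed JSON response."""
    request = build_prompt_request(base_url, nodes)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


# Building the request performs no network I/O, so it is safe to inspect.
request = build_prompt_request(
    "http://localhost:8188/",  # placeholder address
    {"1": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}}},
)
```

Keeping request construction separate from sending makes it possible to log or validate the exact payload before the call goes out.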

---

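## 🚫 Critical Warnings
- **Never touch UI JSON**: do not try to regex-edit the long contents of a UI-exported `.json` file inside the Agent; it carries a large number of hidden IDs and layout coordinates that get in the way. **The only supported approach is to build the graph from scratch as Python objects.**
- **Handling 400 missing-node errors**: if a submission is rejected with a `missing_node_type` error, first decide whether that node is essential. If it is a nonessential custom node, **replace it right away with an equivalent path built from base models and LoRAs**.
- **Look up uncertain model names first**: before hard-coding a model name, probe the `ckpt_name` / `lora_name` entries in `object_info.json` to confirm that the blank machine actually ships that weight.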
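
The last two warnings can be combined into one pre-flight check. The sketch below assumes `object_info.json` maps each node class to an `input` section whose `required` and `optional` fields list the allowed names as the first element. That layout is an assumption to verify against the actual file, and `preflight` is a hypothetical helper.

```python
def preflight(nodes, object_info,
              model_fields=("ckpt_name", "lora_name", "unet_name", "vae_name")):
    """Return human-readable problems: unknown node types and unavailable models."""
    problems = []
    for node_id, node in nodes.items():
        spec = object_info.get(node["class_type"])
        if spec is None:
            problems.append(f"node {node_id}: unknown class_type {node['class_type']}")
            continue
        # Merge required and optional input declarations for this node class.
        declared = {}
        for group in ("required", "optional"):
            declared.update(spec.get("input", {}).get(group) or {})
        for field in model_fields:
            if field not in node["inputs"] or field not in declared:
                continue
            choices = declared[field][0]  # assumed: list of allowed file names
            if isinstance(choices, list) and node["inputs"][field] not in choices:
                problems.append(
                    f"node {node_id}: {field} {node['inputs'][field]!r} not available"
                )
    return problems


# Fabricated object_info fragment, for illustration only.
fake_info = {
    "CheckpointLoaderSimple": {
        "input": {"required": {"ckpt_name": [["a.safetensors"], {}]}}
    },
}
graph = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "b.safetensors"}},
    "2": {"class_type": "ExpressionEditor", "inputs": {}},
}
issues = preflight(graph, fake_info)
```

Running a check like this before submission lets the Agent pick a substitute path up front instead of reacting to a rejected request.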