Agent Skill: Autonomous ComfyUI Workflow Builder (RunComfy Ecosystem)

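📌 Skill Objective

Replace the old low-level pattern in which a human drags nodes around the UI, exports the workflow, and the Agent blindly forwards it. This skill equips the Agent to design and execute ComfyUI's underlying compute graph (API JSON) directly from the goal, closing the image-generation loop end to end with no manual steps.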

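🛠️ Core Mental Model

Drop the dependency on UI workflow files: a ComfyUI workflow is essentially a JSON dictionary whose entries carry the graph's wiring. The Agent can assemble any class_type and its inputs in memory as a Python dict and then submit it to the API.
A blank machine is already an arsenal: RunComfy's default cloud image is not empty. It ships with 209+ checkpoints (such as FLUX1/flux1-dev-fp8 and Juggernaut_X) and 150+ mainstream LoRAs. As long as a hand-written workflow sticks to these standard node classes and preinstalled models, it should run without missing-model or missing-node errors.
Fall back to a simpler control path: when building complex control (for example, a depth or pose map together with a character reference image), if the cloud image happens to lack a heavyweight, niche custom node (such as ExpressionEditor or elaborate IPAdapter node chains), switch immediately to a composite-LoRA substitute. For example, pair refcontrol_pose.safetensors with the ImageConcanate node so that a single LoRA handles the combined two-image control input, which stays compatible with the stock environment.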
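
To make the first point concrete, here is a minimal sketch of the API-format dictionary. Node IDs are string keys, and a link to another node is written as [source_node_id, output_index]. The node IDs and the file name input.png are illustrative, not taken from a real workflow.

```python
import json

# API-format workflow: node ID -> {"class_type", "inputs"}.
# A link to another node is [source_node_id, output_index].
workflow = {
    "1": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "input.png"}},  # illustrative file name
    "3": {"class_type": "VAEEncode", "inputs": {"pixels": ["2", 0], "vae": ["1", 0]}},
}

def links_of(node):
    """Return the (input_name, source_id, output_index) links of one node."""
    return [
        (name, value[0], value[1])
        for name, value in node["inputs"].items()
        if isinstance(value, list) and len(value) == 2
    ]

# The workflow is wrapped under the "prompt" key before submission.
payload = json.dumps({"prompt": workflow})
```

Because the whole graph is plain data, the Agent can build, inspect, and serialize it without ever opening the UI.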

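🧠 Standard Workflow Assembly Procedure (SOP: Standard Operating Procedure)

To implement a new, complex image-generation task, the Agent should follow these steps strictly:

Step 1: Analyze the intent and pick the base architecture

Determine what the user needs. For top-tier photorealism or multi-condition composite control, prefer the FLUX.1 architecture.

Three core loaders need to be initialized:
1. UNETLoader -> loads the main model (flux1-dev-kontext_fp8_scaled.safetensors)
2. VAELoader -> loads the VAE (ae.safetensors)
3. DualCLIPLoader -> loads the text encoders (clip_l.safetensors & t5xxl_fp8_e4m3fn_scaled.safetensors)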

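Step 2: Inject the control flow (Control / Reference)

Instead of guessing whether the cloud image has some exotic custom node, assemble everything from the most efficient standard components:

Image input: declare a LoadImage node that reads the image by its file name on the remote server.
Image merging: when several images are combined into one joint control input, use ImageConcanate (direction down / right).
Specialized control LoRA: declare a LoraLoader that loads refcontrol_pose.safetensors, routing the model stream from the UNET through the LoRA.
Latent injection: after FluxKontextImageScale sets the image size, use VAEEncode to encode the image and feed it to ReferenceLatent.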
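
The reference chain described above could be wired roughly as follows. This is a sketch under assumptions: the input names for ImageConcanate (image1, image2, direction, match_image_size) and ReferenceLatent (conditioning, latent) should be confirmed against the instance's node catalog, the image file names are illustrative, and the helper functions add and build_reference_chain are hypothetical.

```python
def add(nodes, class_type, inputs):
    """Append a node and return its ID (IDs are allocated sequentially)."""
    node_id = str(len(nodes) + 1)
    nodes[node_id] = {"class_type": class_type, "inputs": inputs}
    return node_id

def build_reference_chain(nodes, vae, conditioning, pose_image, ref_image):
    """LoadImage x2 -> ImageConcanate -> FluxKontextImageScale -> VAEEncode -> ReferenceLatent."""
    pose = add(nodes, "LoadImage", {"image": pose_image})
    ref = add(nodes, "LoadImage", {"image": ref_image})
    # Input names below are assumptions; verify them before relying on this.
    merged = add(nodes, "ImageConcanate", {
        "image1": [pose, 0], "image2": [ref, 0],
        "direction": "right", "match_image_size": True,
    })
    scaled = add(nodes, "FluxKontextImageScale", {"image": [merged, 0]})
    latent = add(nodes, "VAEEncode", {"pixels": [scaled, 0], "vae": [vae, 0]})
    return add(nodes, "ReferenceLatent", {
        "conditioning": [conditioning, 0], "latent": [latent, 0],
    })

nodes = {}
vae = add(nodes, "VAELoader", {"vae_name": "ae.safetensors"})
clip = add(nodes, "DualCLIPLoader", {
    "clip_name1": "clip_l.safetensors",
    "clip_name2": "t5xxl_fp8_e4m3fn_scaled.safetensors",
    "type": "flux",
})
text = add(nodes, "CLIPTextEncode", {"text": "a dancer on stage", "clip": [clip, 0]})
ref_cond = build_reference_chain(nodes, vae, text, "pose.png", "person.png")
```

The returned ID would presumably replace the plain prompt conditioning on the sampler's positive input, so that the reference latent travels with the prompt.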

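Step 3: Python template for hand-writing the workflow

Rebuild the compute graph with plain object-oriented Python instead of editing string templates. Because the builder allocates every node ID and each link is written as a reference to a returned ID, IDs cannot collide and the wiring stays explicit.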

class AgentWorkflowBuilder:
    def __init__(self):
        self.nodes = {}
        self.node_counter = 1
        
    def add(self, class_type, inputs):
        node_id = str(self.node_counter)
        self.nodes[node_id] = {"class_type": class_type, "inputs": inputs}
        self.node_counter += 1
        return node_id

def auto_generate():
    wf = AgentWorkflowBuilder()
    
    # Loaders
    unet = wf.add("UNETLoader", {"unet_name": "flux1-dev-kontext_fp8_scaled.safetensors", "weight_dtype": "default"})
    vae = wf.add("VAELoader", {"vae_name": "ae.safetensors"})
    clip = wf.add("DualCLIPLoader", {"clip_name1": "clip_l.safetensors", "clip_name2": "t5xxl_fp8_e4m3fn_scaled.safetensors", "type": "flux", "device": "default"})
    
    # Control LoRA
    lora = wf.add("LoraLoader", {"lora_name": "refcontrol_pose.safetensors", "strength_model": 1, "strength_clip": 1, "model": [unet, 0], "clip": [clip, 0]})
    
    # Prompt conditioning (the negative is a zeroed-out copy of the positive)
    pos_prompt = wf.add("CLIPTextEncode", {"text": "A beautiful cinematic shot, ultra detailed", "clip": [lora, 1]})
    neg_prompt = wf.add("ConditioningZeroOut", {"conditioning": [pos_prompt, 0]})
    
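    # Sampling
    # Empty starting latent; the 1024x1024 size is an assumed placeholder, adjust or rewire as needed
    latent = wf.add("EmptySD3LatentImage", {"width": 1024, "height": 1024, "batch_size": 1})
    sampler = wf.add("KSampler", {
        "seed": 1234567, "steps": 25, "cfg": 1, "sampler_name": "euler",
        "scheduler": "simple", "denoise": 1,
        "model": [lora, 0], "positive": [pos_prompt, 0], "negative": [neg_prompt, 0],
        "latent_image": [latent, 0]
    })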
    
    # Decode and save
    decoded = wf.add("VAEDecode", {"samples": [sampler, 0], "vae": [vae, 0]})
    final_output = wf.add("SaveImage", {"filename_prefix": "AgentGen", "images": [decoded, 0]})
    
    return wf.nodes
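
Before submitting a hand-built graph, it can help to verify that every link points at a node that actually exists. The check below is a minimal sketch; find_dangling_links is a hypothetical helper, and it assumes that any two-element list whose first element is a string is a link.

```python
def find_dangling_links(nodes):
    """Return (node_id, input_name, missing_source_id) for links to unknown nodes."""
    problems = []
    for node_id, node in nodes.items():
        for name, value in node["inputs"].items():
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
            )
            if is_link and value[0] not in nodes:
                problems.append((node_id, name, value[0]))
    return problems

# Illustrative graph: node "2" references a node ID that was never created.
example = {
    "1": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "2": {"class_type": "VAEDecode", "inputs": {"samples": ["99", 0], "vae": ["1", 0]}},
}
dangling = find_dangling_links(example)
```

An empty result means every link resolves; anything else should be fixed locally rather than discovered as a rejected submission.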

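Step 4: Deploy to RunComfy

With the assembled wf.nodes dictionary in hand:

Upload assets: first call runcomfy_workflow_executor.upload_file_from_base64() to push the user's images to the cloud server.
Submit the job: assign the dictionary to the prompt key and send POST /prompt to your machine instance (e.g. deddf65f...).
Poll for the result: the Agent must call wait_for_completion() and, on success, download the image and return it to the user.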
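
The submit-and-poll steps could look roughly like this using only the standard library. This is a sketch, not the runcomfy_workflow_executor implementation: build_prompt_request, submit, and poll_until_done are hypothetical names, the base URL of the machine instance must be supplied by the caller, and the polling logic assumes the stock ComfyUI behavior in which a finished job appears under its prompt_id in the history response.

```python
import json
import time
import urllib.request

def build_prompt_request(base_url, nodes, client_id="agent"):
    """Build (but do not send) the POST /prompt request for a workflow dict."""
    body = json.dumps({"prompt": nodes, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def submit(base_url, nodes):
    """Send the workflow and return the prompt_id from the JSON response."""
    with urllib.request.urlopen(build_prompt_request(base_url, nodes)) as resp:
        return json.load(resp)["prompt_id"]

def poll_until_done(fetch_history, prompt_id, interval=2.0, timeout=600.0):
    """Call fetch_history(prompt_id) until the job shows up, then return its entry.

    fetch_history is injected so the transport (and any auth) stays swappable.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        history = fetch_history(prompt_id)
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(interval)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything reaches the server.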

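🚫 Critical Warnings

Never touch the UI JSON: do not have the Agent regex-edit the long contents of a UI-exported .json file (it is full of hidden IDs and layout coordinates that get in the way). The only reliable approach is to build the workflow from scratch as Python objects.
Handling 400 missing-node errors: if a submission is rejected with missing_node_type, first check whether that node is truly essential. If it is a nonessential custom node, replace it right away with an equivalent built from base models and LoRAs.
Look up uncertain model names first: before hard-coding a model name, probe the ckpt_name / lora_name entries in object_info.json to confirm that the blank machine actually ships that preinstalled weight.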
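
Both of the last two checks can be run locally before submitting anything. The sketch below assumes object_info has already been loaded into a dict keyed by class_type, with dropdown choices stored under input -> required -> name; the exact layout may differ between ComfyUI versions, so two common shapes are handled. input_choices and missing_node_types are hypothetical helper names.

```python
def input_choices(object_info, class_type, input_name):
    """Return the allowed values for a dropdown input, or [] if unknown."""
    spec = (
        object_info.get(class_type, {})
        .get("input", {})
        .get("required", {})
        .get(input_name)
    )
    if not spec:
        return []
    first = spec[0]
    if isinstance(first, list):  # assumed classic layout: [[choices...], ...]
        return first
    if first == "COMBO" and len(spec) > 1:  # assumed alternative layout
        return spec[1].get("options", [])
    return []

def missing_node_types(nodes, object_info):
    """Return class_types used by the workflow but absent from object_info."""
    return sorted({n["class_type"] for n in nodes.values()} - set(object_info))

# Illustrative data only; a real catalog comes from the instance.
catalog = {
    "LoraLoader": {"input": {"required": {"lora_name": [["refcontrol_pose.safetensors"]]}}},
    "LoadImage": {"input": {"required": {"image": ["COMBO", {"options": ["a.png"]}]}}},
}
draft = {
    "1": {"class_type": "LoraLoader", "inputs": {}},
    "2": {"class_type": "ExpressionEditor", "inputs": {}},
}
has_lora = "refcontrol_pose.safetensors" in input_choices(catalog, "LoraLoader", "lora_name")
unsupported = missing_node_types(draft, catalog)
```

If unsupported is non-empty, the Agent can swap in the LoRA-based substitute before the first submission instead of reacting to a 400 afterwards.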
