OpenClaw Multi-Agent, Multi-Skill Distributed Deployment: A Complete Guide

This guide covers the move from a single agent to a collaborating multi-agent system, with production-grade detail at each stage.


🏗️ Architecture Overview

System Layers

┌──────────────────────────────────────────────────────────────────┐
│              User Interface Layer (Web/Mobile/API)               │
└────────────────┬────────────────────────────────────┬────────────┘
                 │                                    │
       ┌─────────▼──────────┐            ┌────────────▼──────────┐
       │  Gateway Service   │            │  WebSocket Server     │
       │  (request routing) │            │  (real-time push)     │
       └─────────┬──────────┘            └────────────┬──────────┘
                 │                                    │
       ┌─────────▼────────────────────────────────────▼──────────┐
       │          Agent Management Layer (Orchestration)         │
       │  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
       │  │ Agent-A1 │  │ Agent-B2 │  │ Agent-C3 │  ...          │
       │  │ AI Eng.  │  │ Prod Mgr │  │ Architect│               │
       │  └─────┬────┘  └─────┬────┘  └─────┬────┘               │
       └────────┼─────────────┼─────────────┼────────────────────┘
                │             │             │
       ┌────────▼─────────────▼─────────────▼──────────────┐
       │        Skill Execution Layer (Worker Pool)        │
       │  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐     │
       │  │Skill │ │Skill │ │Skill │ │Skill │ │Skill │     │
       │  │ 1    │ │ 2    │ │ 3    │ │ 4    │ │ N    │     │
       │  └──────┘ └──────┘ └──────┘ └──────┘ └──────┘     │
       └────────────┬──────────────────────────────────────┘
                    │
       ┌────────────▼──────────────────────┐
       │   Data & Service Layer            │
       │  ┌──────┐ ┌──────┐ ┌──────┐       │
       │  │Redis │ │Kafka │ │MySQL │ ...   │
       │  └──────┘ └──────┘ └──────┘       │
       └───────────────────────────────────┘

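Core Components

| Component | Responsibility | Deployment location | Tech stack |
|---|---|---|---|
| Gateway | HTTP request routing, rate limiting, authentication | Primary node | Nginx / Envoy |
| Agent Orchestrator | Agent lifecycle management, task assignment | Primary node | OpenClaw Core |
| Skill Worker | Runs individual Skills with resource isolation | Worker nodes (N) | Container / Sandbox |
| Message Broker | Asynchronous tasks, event-driven messaging | Dedicated node | Redis / Kafka |
| State Store | Agent state persistence | Dedicated node | TimescaleDB / PostgreSQL |
| Monitoring | System observability | Dedicated node | Prometheus + Grafana |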

📁 Multi-Agent Workspace Structure

Directory Layout

openclaw-workspace/
├── agents/                          # Agent definitions
│   ├── ai-engineer/                 # Agent 1: AI engineer
│   │   ├── IDENTITY.md
│   │   ├── SOUL.md
│   │   ├── config.yaml
│   │   └── tools-config.json
│   │
│   ├── product-manager/             # Agent 2: product manager
│   │   ├── IDENTITY.md
│   │   ├── SOUL.md
│   │   ├── config.yaml
│   │   └── tools-config.json
│   │
│   └── architect/                   # Agent 3: architect
│       ├── IDENTITY.md
│       ├── SOUL.md
│       ├── config.yaml
│       └── tools-config.json
│
├── skills/                          # Shared Skill library
│   ├── ai-engineer/
│   │   ├── SKILL.md
│   │   ├── handlers/
│   │   │   ├── model_training.py
│   │   │   ├── model_deployment.py
│   │   │   └── model_evaluation.py
│   │   └── config.yaml
│   │
│   ├── product-design/
│   │   ├── SKILL.md
│   │   ├── handlers/
│   │   └── config.yaml
│   │
│   ├── system-design/
│   │   ├── SKILL.md
│   │   ├── handlers/
│   │   └── config.yaml
│   │
│   └── shared/                      # General-purpose Skills (available to all agents)
│       ├── web-search/
│       ├── file-management/
│       ├── data-analysis/
│       └── communication/
│
├── gateway/                         # Gateway configuration
│   ├── nginx.conf
│   ├── envoy.yaml
│   └── rate-limiter.yaml
│
├── orchestrator/                    # Agent orchestration configuration
│   ├── agent-registry.yaml          # Agent registry
│   ├── skill-registry.yaml          # Skill registry
│   ├── routing-rules.yaml           # Routing rules
│   └── load-balancer.yaml           # Load balancer configuration
│
├── infrastructure/                  # Infrastructure as code
│   ├── docker-compose.yml           # Local development
│   ├── kubernetes/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── configmap.yaml
│   │   └── persistent-volumes.yaml
│   └── terraform/                   # IaC definitions
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
│
├── monitoring/                      # Monitoring configuration
│   ├── prometheus.yml
│   ├── grafana/
│   │   ├── dashboards/
│   │   └── datasources/
│   └── alerting-rules.yaml
│
├── logging/                         # Log aggregation
│   ├── fluent-bit.conf
│   ├── elasticsearch/
│   └── kibana-dashboards/
│
└── docs/                            # Documentation
    ├── ARCHITECTURE.md
    ├── DEPLOYMENT.md
    ├── OPERATIONAL-GUIDE.md
    └── TROUBLESHOOTING.md
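The layout above implies that every agent directory carries the same four files. A minimal sketch of a pre-deployment check for that convention follows; the function name `missing_agent_files` is hypothetical and not part of OpenClaw, and the required file list is simply read off the tree.

```python
from pathlib import Path

# File names taken from the directory layout above.
REQUIRED_AGENT_FILES = ("IDENTITY.md", "SOUL.md", "config.yaml", "tools-config.json")


def missing_agent_files(workspace: Path) -> dict[str, list[str]]:
    """Map each agent directory name to the required files it lacks.

    Agents with a complete set of files are omitted from the result.
    """
    problems: dict[str, list[str]] = {}
    agents_dir = workspace / "agents"
    for agent_dir in sorted(p for p in agents_dir.iterdir() if p.is_dir()):
        missing = [name for name in REQUIRED_AGENT_FILES
                   if not (agent_dir / name).is_file()]
        if missing:
            problems[agent_dir.name] = missing
    return problems
```

Calling `missing_agent_files(Path("openclaw-workspace"))` returns an empty dict when every agent directory is complete, which makes it easy to wire into a CI step.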

🤖 Agent Definition Examples

1. AI Engineer Agent Configuration

agents/ai-engineer/IDENTITY.md

# AI Engineer Agent

- Name: AI Engineer
- ID: agent-ai-001
- Tier: Premium
- Version: 2.1.0
- Avatar: https://...
- Emoji: 🚀

agents/ai-engineer/config.yaml

agent:
  id: agent-ai-001
  name: AI Engineer
  description: Specializes in ML model development and deployment

  # Model configuration
  model:
    provider: anthropic
    name: claude-opus
    version: latest
    temperature: 0.3
    max_tokens: 8000
    thinking: moderate
  
  # Permissions
  permissions:
    - gpu_access: true
      max_allocation: 2
    - file_access: true
      paths:
        - /workspace/models
        - /workspace/datasets
    - network_access: true
      allowed_endpoints:
        - "*.huggingface.co"
        - "*.nvidia.com"
  
  # Skill bindings
  skills:
    - skill_id: ai-engineer
      version: latest
      timeout: 3600
      retry_policy:
        max_retries: 3
        backoff: exponential
    
    - skill_id: shared/web-search
      version: latest
      timeout: 30
    
    - skill_id: shared/data-analysis
      version: latest
      timeout: 300
  
  # Memory
  memory:
    type: persistent
    ttl: 2592000  # 30 days
    storage:
      backend: postgresql
      connection_string: "postgresql://user:pass@db:5432/agent_ai_001"
  
  # Rate limits
  rate_limits:
    requests_per_minute: 100
    tokens_per_day: 1000000
    concurrent_sessions: 10
  
  # Monitoring
  monitoring:
    enabled: true
    metrics_enabled: true
    tracing_enabled: true
    log_level: INFO
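The `retry_policy` in this config (`max_retries: 3`, `backoff: exponential`) does not say what the delays are. The sketch below shows one common reading, assuming a 1-second base delay that doubles on each retry; `base_seconds`, `factor`, and both function names are illustrative assumptions, not documented OpenClaw settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base_seconds: float = 1.0,
                   factor: float = 2.0) -> list[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(max_retries)]


def call_with_retries(fn: Callable[[], T], max_retries: int = 3,
                      base_seconds: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn once, then retry up to max_retries times on any exception."""
    delays = backoff_delays(max_retries, base_seconds)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With `max_retries: 3` a Skill call is attempted at most four times. Passing `sleep` as a parameter keeps the helper testable without real waiting.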

agents/ai-engineer/SOUL.md

# Who I Am

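I'm an AI engineer, a hands-on practitioner who bridges model development and production engineering.
From paper to production, I keep every step under control.

## How I Talk

I let the data speak. I don't chase the latest paper; I pick the approach that best fits the business.
When I see a risk, I flag it directly instead of hiding it.

## Boundaries

- No experiment without a baseline
- No model goes live without offline evaluation
- Every inference service must have a fallback strategy
- GPU resources are requested on demand and released promptly after use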

2. Product Manager Agent Configuration

agents/product-manager/config.yaml

agent:
  id: agent-pm-002
  name: Product Manager
  description: Specializes in product planning and requirements analysis
  
  model:
    provider: anthropic
    name: claude-sonnet
    version: latest
    temperature: 0.5
    max_tokens: 4000
  
  permissions:
    - file_access: true
      paths:
        - /workspace/product-specs
        - /workspace/user-research
    - network_access: true
      allowed_endpoints:
        - "*.analytics.google.com"
        - "api.mixpanel.com"
  
  skills:
    - skill_id: product-design
      version: latest
      timeout: 1800
    
    - skill_id: shared/data-analysis
      version: latest
    
    - skill_id: shared/web-search
      version: latest
  
  rate_limits:
    requests_per_minute: 60
    tokens_per_day: 500000
    concurrent_sessions: 5

3. Architect Agent Configuration

agents/architect/config.yaml

agent:
  id: agent-arch-003
  name: Architect
  description: Specializes in system architecture design and technical solutions
  
  model:
    provider: anthropic
    name: claude-opus
    version: latest
    temperature: 0.2
    max_tokens: 6000
  
  permissions:
    - file_access: true
      paths:
        - /workspace/architecture
        - /workspace/infrastructure
    - network_access: true
      allowed_endpoints:
        - "*.docker.com"
        - "*.kubernetes.io"
  
  skills:
    - skill_id: system-design
      version: latest
      timeout: 2400
    
    - skill_id: shared/web-search
      version: latest
    
    - skill_id: shared/file-management
      version: latest
  
  rate_limits:
    requests_per_minute: 80
    tokens_per_day: 800000
    concurrent_sessions: 8
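All three agent configs list `allowed_endpoints` with wildcard entries such as `*.docker.com`. How OpenClaw evaluates them is not specified; the sketch below assumes shell-style wildcard matching on the hostname, using Python's standard `fnmatch` module. The function name `endpoint_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase


def endpoint_allowed(host: str, allowed_endpoints: list[str]) -> bool:
    """Return True if host matches any allowlist entry (shell-style wildcards)."""
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern.lower())
               for pattern in allowed_endpoints)
```

Under this reading `*.docker.com` covers `hub.docker.com` but not the bare `docker.com`, so the apex domain needs its own entry if it should be reachable.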

🛠️ Skill Definitions and Sharing

Skill Registry

orchestrator/skill-registry.yaml

skills:
  # Agent-specific Skills
  - id: ai-engineer
    name: AI Engineer Skill Pack
    version: 2.1.0
    location: /workspace/skills/ai-engineer
    owner: ai-engineer
    visibility: private  # usable only by the owning agent
    resource_requirements:
      cpu: 2
      memory: 4Gi
      gpu: optional
      timeout: 3600
    handlers:
      - model_training
      - model_deployment
      - model_evaluation
  
  - id: product-design
    name: Product Design Skill Pack
    version: 1.5.0
    location: /workspace/skills/product-design
    owner: product-manager
    visibility: private
    resource_requirements:
      cpu: 1
      memory: 2Gi
      gpu: none
      timeout: 1800
  
  - id: system-design
    name: System Architecture Skill Pack
    version: 1.2.0
    location: /workspace/skills/system-design
    owner: architect
    visibility: private
    resource_requirements:
      cpu: 1
      memory: 2Gi
      timeout: 2400
  
  # Shared Skills (available to all agents)
  - id: shared/web-search
    name: Web search
    version: 3.0.0
    location: /workspace/skills/shared/web-search
    owner: system
    visibility: public
    resource_requirements:
      cpu: 1
      memory: 1Gi
      timeout: 30
    rate_limit: 100/minute
  
  - id: shared/data-analysis
    name: Data analysis
    version: 2.1.0
    location: /workspace/skills/shared/data-analysis
    owner: system
    visibility: public
    resource_requirements:
      cpu: 2
      memory: 4Gi
      gpu: optional
      timeout: 300
  
  - id: shared/file-management
    name: File management
    version: 1.8.0
    location: /workspace/skills/shared/file-management
    owner: system
    visibility: public
    resource_requirements:
      cpu: 1
      memory: 1Gi
      timeout: 60
  
  - id: shared/communication
    name: Communication and collaboration
    version: 1.3.0
    location: /workspace/skills/shared/communication
    owner: system
    visibility: public
    resource_requirements:
      cpu: 1
      memory: 1Gi
      timeout: 30
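The `owner` and `visibility` fields in the registry suggest a simple access rule: public Skills are open to every agent, private Skills only to their owner. A minimal sketch of that rule follows, using a subset of the entries above; `SkillEntry` and `can_use` are illustrative names, and the exact check the orchestrator performs is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillEntry:
    id: str
    owner: str
    visibility: str  # "private" or "public", as in skill-registry.yaml


# A subset of the registry above, keyed by Skill id.
REGISTRY = {
    entry.id: entry
    for entry in (
        SkillEntry("ai-engineer", owner="ai-engineer", visibility="private"),
        SkillEntry("product-design", owner="product-manager", visibility="private"),
        SkillEntry("system-design", owner="architect", visibility="private"),
        SkillEntry("shared/web-search", owner="system", visibility="public"),
    )
}


def can_use(agent_name: str, skill_id: str,
            registry: dict[str, SkillEntry] = REGISTRY) -> bool:
    """Public Skills are open to all agents; private ones only to their owner."""
    skill = registry.get(skill_id)
    if skill is None:
        return False  # unregistered Skills are never usable
    return skill.visibility == "public" or skill.owner == agent_name
```

Running this check when an agent's `skills` list is loaded would catch a config that binds another agent's private Skill before any request is routed.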

🔀 Agent Orchestration and Routing

Agent Registry

orchestrator/agent-registry.yaml

agents:
  - id: agent-ai-001
    name: AI Engineer
    status: active
    version: 2.1.0
    replicas: 3  # 3 concurrent replicas

    # Areas of expertise
    domains:
      - machine_learning
      - model_deployment
      - data_engineering
    
    # Service-level objectives
    slo:
      availability: 99.9%
      response_time_p99: 5000ms
      throughput: 100req/min
    
    # Resource allocation
    resources:
      requests:
        cpu: 2
        memory: 4Gi
      limits:
        cpu: 4
        memory: 8Gi
    
    # Autoscaling policy
    autoscaling:
      enabled: true
      min_replicas: 1
      max_replicas: 10
      target_cpu_utilization: 70%
      target_memory_utilization: 75%
  
  - id: agent-pm-002
    name: Product Manager
    status: active
    version: 1.2.0
    replicas: 2
    domains:
      - product_strategy
      - user_research
      - requirements_analysis
    
    slo:
      availability: 99.5%
      response_time_p99: 3000ms
      throughput: 60req/min
    
    resources:
      requests:
        cpu: 1
        memory: 2Gi
      limits:
        cpu: 2
        memory: 4Gi
    
    autoscaling:
      enabled: true
      min_replicas: 1
      max_replicas: 5
  
  - id: agent-arch-003
    name: Architect
    status: active
    version: 1.1.0
    replicas: 2
    domains:
      - system_architecture
      - infrastructure_design
      - technology_selection
    
    slo:
      availability: 99.5%
      response_time_p99: 4000ms
      throughput: 50req/min
    
    resources:
      requests:
        cpu: 1
        memory: 2Gi
      limits:
        cpu: 2
        memory: 4Gi
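The `autoscaling` fields (`min_replicas`, `max_replicas`, `target_cpu_utilization`) mirror those of a Kubernetes Horizontal Pod Autoscaler. Assuming the registry is translated into an HPA, or that the orchestrator applies the same proportional rule, the target replica count can be sketched as below. The function name is hypothetical, and the real HPA additionally applies a tolerance band and stabilization windows that this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Proportional scaling: ceil(current * observed / target), then clamp."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

For agent-ai-001 (3 replicas, 70% CPU target, bounds 1 to 10), an observed average of 90% CPU gives `desired_replicas(3, 90, 70, 1, 10) == 4`.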

Routing Rules

orchestrator/routing-rules.yaml

routing_rules:
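  # Intent-based routing (patterns stay in Chinese because they match Chinese user input)
  - name: ml_model_training
    pattern: "(训练|模型|深度学习|神经网络)"  # training | model | deep learning | neural network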
    agents:
      - agent-ai-001
    priority: 10
    timeout: 3600
  
  - name: product_strategy
    pattern: "(产品|需求|用户|功能)"  # product | requirement | user | feature
    agents:
      - agent-pm-002
    priority: 8
    timeout: 1800
  
  - name: system_architecture
    pattern: "(架构|设计|系统|基础设施)"  # architecture | design | system | infrastructure
    agents:
      - agent-arch-003
    priority: 8
    timeout: 2400
  
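  # Multi-agent collaboration workflow
  - name: end_to_end_feature_delivery
    pattern: "(端到端|完整方案|从需求到上线)"  # end-to-end | complete solution | from requirements to launch
    workflow:
      - agent: agent-pm-002
        step: 1
        task: Analyze requirements and user research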
        timeout: 1800
      
      - agent: agent-arch-003
        step: 2
        task: Define the technical architecture
        depends_on: step-1
        timeout: 2400
      
      - agent: agent-ai-001
        step: 3
        task: Implement and deploy the model
        depends_on: step-2
        timeout: 3600
    
    priority: 15
    timeout: 7800
  
  # Default route (general-purpose Skills)
  - name: default_shared_skills
    pattern: ".*"
    agents: null  # no agent specified; run the shared Skill directly
    priority: 1
    timeout: 300
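Several of these patterns can match the same request (for example, a message containing both 需求 and 完整方案), so `priority` has to decide. The sketch below assumes the orchestrator picks the matching rule with the highest priority and breaks ties by file order; that tie-breaking rule and the names `Rule` and `route` are assumptions, since the source does not describe the selection logic.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str
    priority: int


# Names, patterns, and priorities copied from routing-rules.yaml above.
RULES = [
    Rule("ml_model_training", "(训练|模型|深度学习|神经网络)", 10),
    Rule("product_strategy", "(产品|需求|用户|功能)", 8),
    Rule("system_architecture", "(架构|设计|系统|基础设施)", 8),
    Rule("end_to_end_feature_delivery", "(端到端|完整方案|从需求到上线)", 15),
    Rule("default_shared_skills", ".*", 1),
]


def route(text: str, rules: list[Rule] = RULES) -> Optional[str]:
    """Return the name of the highest-priority rule whose pattern matches text."""
    matches = [rule for rule in rules if re.search(rule.pattern, text)]
    if not matches:
        return None
    # max() keeps the first of several equal-priority rules, i.e. file order.
    return max(matches, key=lambda rule: rule.priority).name
```

Because the default rule matches everything at priority 1, `route` never returns `None` with this rule set; it only acts as a fallback when no specific intent is detected.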

🐳 Deployment Configuration

Docker Compose (Local Development)

infrastructure/docker-compose.yml

version: '3.9'

services:
  # Gateway
  gateway:
    image: nginx:latest
    ports:
      - "8080:80"
    volumes:
      - ./gateway/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - orchestrator
    networks:
      - openclaw
  
  # Agent orchestrator
  orchestrator:
    build:
      context: .
      dockerfile: Dockerfile.orchestrator
    environment:
      - AGENT_REGISTRY_PATH=/workspace/orchestrator/agent-registry.yaml
      - SKILL_REGISTRY_PATH=/workspace/orchestrator/skill-registry.yaml
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://user:pass@postgres:5432/openclaw
      - LOG_LEVEL=INFO
    ports:
      - "8888:8888"
    volumes:
      - ./orchestrator:/workspace/orchestrator:ro
      - ./agents:/workspace/agents:ro
      - ./skills:/workspace/skills:ro
    depends_on:
      - redis
      - postgres
      - kafka
    networks:
      - openclaw
  
  # Agent Worker 1
  agent-ai-worker-1:
    build:
      context: .
      dockerfile: Dockerfile.agent
    environment:
      - AGENT_ID=agent-ai-001
      - WORKER_ID=worker-ai-1
      - ORCHESTRATOR_URL=http://orchestrator:8888
      - REDIS_URL=redis://redis:6379
      - SKILL_PATHS=/workspace/skills
    volumes:
      - ./agents:/workspace/agents:ro
      - ./skills:/workspace/skills:ro
      - agent-ai-1-cache:/cache
    depends_on:
      - orchestrator
    networks:
      - openclaw
  
  # Agent Worker 2
  agent-pm-worker-1:
    build:
      context: .
      dockerfile: Dockerfile.agent
    environment:
      - AGENT_ID=agent-pm-002
      - WORKER_ID=worker-pm-1
      - ORCHESTRATOR_URL=http://orchestrator:8888
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./agents:/workspace/agents:ro
      - ./skills:/workspace/skills:ro
      - agent-pm-1-cache:/cache
    depends_on:
      - orchestrator
    networks:
      - openclaw
  
  # Agent Worker 3
  agent-arch-worker-1:
    build:
      context: .
      dockerfile: Dockerfile.agent
    environment:
      - AGENT_ID=agent-arch-003
      - WORKER_ID=worker-arch-1
      - ORCHESTRATOR_URL=http://orchestrator:8888
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./agents:/workspace/agents:ro
      - ./skills:/workspace/skills:ro
      - agent-arch-1-cache:/cache
    depends_on:
      - orchestrator
    networks:
      - openclaw
  
  # Message queue
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    networks:
      - openclaw
  
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    depends_on:
      - zookeeper
    networks:
      - openclaw
  
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    networks:
      - openclaw
  
  # Database
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: openclaw
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - openclaw
  
  # Monitoring
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - openclaw
  
  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
    networks:
      - openclaw

volumes:
  redis-data:
  postgres-data:
  prometheus-data:
  grafana-data:
  agent-ai-1-cache:
  agent-pm-1-cache:
  agent-arch-1-cache:

networks:
  openclaw:
    driver: bridge

Kubernetes Deployment

infrastructure/kubernetes/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
      - name: gateway
        image: openclaw/gateway:v1.0.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-orchestrator
  namespace: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app: orchestrator
  template:
    metadata:
      labels:
        app: orchestrator
    spec:
      containers:
      - name: orchestrator
        image: openclaw/orchestrator:v1.0.0
        ports:
        - containerPort: 8888
        env:
        - name: AGENT_REGISTRY_PATH
          value: /etc/openclaw/agent-registry.yaml
        - name: SKILL_REGISTRY_PATH
          value: /etc/openclaw/skill-registry.yaml
        - name: REDIS_URL
          value: redis://redis-service:6379
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: openclaw-secrets
              key: database-url
        volumeMounts:
        - name: agent-registry
          mountPath: /etc/openclaw/agent-registry.yaml
          subPath: agent-registry.yaml
        - name: skill-registry
          mountPath: /etc/openclaw/skill-registry.yaml
          subPath: skill-registry.yaml
        resources:
          requests:
            cpu: 1
            memory: 2Gi
          limits:
            cpu: 2
            memory: 4Gi
      volumes:
      - name: agent-registry
        configMap:
          name: agent-registry
      - name: skill-registry
        configMap:
          name: skill-registry
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: openclaw-agent-ai
  namespace: openclaw
spec:
  serviceName: agent-ai-service
  replicas: 3
  selector:
    matchLabels:
      agent: ai-engineer
  template:
    metadata:
      labels:
        agent: ai-engineer
    spec:
      containers:
      - name: agent-worker
        image: openclaw/agent-worker:v1.0.0
        env:
        - name: AGENT_ID
          value: agent-ai-001
        - name: WORKER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ORCHESTRATOR_URL
          value: http://orchestrator-service:8888
        - name: REDIS_URL
          value: redis://redis-service:6379
        resources:
          requests:
            cpu: 2
            memory: 4Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 4
            memory: 8Gi
            nvidia.com/gpu: 1
        volumeMounts:
        - name: cache
          mountPath: /cache
        - name: models
          mountPath: /models
  volumeClaimTemplates:
  - metadata:
      name: cache
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
  - metadata:
      name: models
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: nfs
      resources:
        requests:
          storage: <size>  # value truncated in the source
---
apiVersion: v1
kind: Service
metadata:
  name: gateway-service
  namespace: openclaw
spec:
  type: LoadBalancer
  selector:
    app: gateway
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: orchestrator-service
  namespace: openclaw
spec:
  selector:
    app: orchestrator
  ports:
  - protocol: TCP
    port: 8888
    targetPort: 8888

📊 Monitoring and Observability

Prometheus Configuration

monitoring/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Agent metrics
  - job_name: 'agents'
    static_configs:
      - targets:
        - 'orchestrator:8888'
        - 'agent-ai-001:9100'
        - 'agent-pm-002:9100'
        - 'agent-arch-003:9100'
    metrics_path: '/metrics'
  
  # Infrastructure metrics
  - job_name: 'infrastructure'
    static_configs:
      - targets: ['node-exporter:9100']
  
  # Redis metrics
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - '/etc/prometheus/rules/*.yml'

Key Metrics

# Monitoring metric definitions
metrics = {
    "agent_request_latency_seconds": {
        "type": "histogram",
        "help": "Agent request latency",
        "labels": ["agent_id", "skill_id"],
        "buckets": [0.1, 0.5, 1, 2, 5, 10]
    },
    "agent_requests_total": {
        "type": "counter",
        "help": "Total number of requests",
        "labels": ["agent_id", "status"]
    },
    "agent_tokens_consumed": {
        "type": "counter",
        "help": "Number of tokens consumed",
        "labels": ["agent_id"]
    },
    "skill_execution_duration_seconds": {
        "type": "histogram",
        "help": "Skill execution time",
        "labels": ["skill_id", "status"]
    },
    "agent_active_sessions": {
        "type": "gauge",
        "help": "Current number of active sessions",
        "labels": ["agent_id"]
    },
    "message_queue_depth": {
        "type": "gauge",
        "help": "Message queue depth",
        "labels": ["queue_name"]
    }
}
}
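To make the `buckets` setting concrete, the following sketch counts latency observations the way a Prometheus histogram does: each bucket is cumulative and includes every observation less than or equal to its upper bound, with an implicit `+Inf` bucket at the end. This is a standard-library illustration only, not the Prometheus client; the function name is hypothetical.

```python
from bisect import bisect_left

# Upper bounds in seconds, from agent_request_latency_seconds above.
BUCKETS = [0.1, 0.5, 1, 2, 5, 10]


def cumulative_bucket_counts(observations: list[float],
                             buckets: list[float] = BUCKETS) -> dict[float, int]:
    """Return cumulative counts keyed by bucket upper bound (the `le` label)."""
    per_bucket = [0] * (len(buckets) + 1)  # last slot is the +Inf bucket
    for value in observations:
        # Index of the first bound >= value, so bounds are inclusive.
        per_bucket[bisect_left(buckets, value)] += 1
    counts: dict[float, int] = {}
    running = 0
    for bound, n in zip([*buckets, float("inf")], per_bucket):
        running += n
        counts[bound] = running
    return counts
```

Every request slower than 10 seconds lands only in the `+Inf` bucket. Since Skill timeouts in this setup go up to 3600 seconds, long-running Skills may need larger bounds to get useful percentiles.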

🔄 Multi-Agent Collaboration Example

Scenario: End-to-End Product Delivery

workflow: end-to-end-delivery
steps:
  - id: step-1
    name: Requirements analysis
    agent: agent-pm-002
    inputs:
      - feedback  # identifier truncated in the source (only "edback" survives)
      - market_research
    skills:
      - product-design
      - shared/data-analysis
    expected_output: product_spec
    timeout: 1800
  
  - id: step-2
    name: Technical design
    agent: agent-arch-003
    inputs:
      - product_spec  # from step-1
    skills:
      - system-design
      - shared/web-search
    expected_output: tech_design_doc
    timeout: 2400
    depends_on: step-1
  
  - id: step-3
    name: Model training and deployment
    agent: agent-ai-001
    inputs:
      - tech_design_doc  # from step-2
      - training_data
    skills:
      - ai-engineer
      - shared/data-analysis
    expected_output: deployed_model
    timeout: 3600
    depends_on: step-2
  
  - id: step-4
    name: Cross-review
    agents:
      - agent-pm-002
      - agent-arch-003
      - agent-ai-001
    inputs:
      - deployed_model
      - product_spec
      - tech_design_doc
    skills:
      - shared/communication
    expected_output: review_report
    timeout: 900
    depends_on: step-3

total_timeout: 8700  # about 2.4 hours
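The `depends_on` fields form a dependency graph, and `total_timeout` equals the sum of the four step timeouts because the steps run strictly in sequence. A minimal sketch of how an orchestrator could derive both is shown below, using the standard-library `graphlib` module (Python 3.9+); the `STEPS` dict and function names are illustrative, not OpenClaw's internal representation.

```python
from graphlib import TopologicalSorter

# Step ids, dependencies, and timeouts copied from the workflow above.
STEPS = {
    "step-1": {"depends_on": [], "timeout": 1800},
    "step-2": {"depends_on": ["step-1"], "timeout": 2400},
    "step-3": {"depends_on": ["step-2"], "timeout": 3600},
    "step-4": {"depends_on": ["step-3"], "timeout": 900},
}


def execution_order(steps: dict[str, dict]) -> list[str]:
    """Order steps so each runs after its dependencies (CycleError on a cycle)."""
    graph = {step_id: set(spec["depends_on"]) for step_id, spec in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def sequential_timeout(steps: dict[str, dict]) -> int:
    """Worst-case duration in seconds when the steps run one after another."""
    return sum(spec["timeout"] for spec in steps.values())
```

Validating the graph this way before execution rejects a workflow whose `depends_on` entries form a cycle, rather than letting it hang at run time.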

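⚙️ Operations

Quick Start

# Local development environment. Run from the workspace root: --project-directory makes
# the relative paths in the compose file (./agents, ./gateway, ...) resolve against it.
docker-compose -f infrastructure/docker-compose.yml --project-directory . up -d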

# Kubernetes environment
kubectl create namespace openclaw
kubectl apply -f infrastructure/kubernetes/

# Check the status of all agent pods (the agent StatefulSet labels its pods with `agent`, not `app`)
kubectl get pods -n openclaw -l agent

# Follow the orchestrator logs
kubectl logs -n openclaw deployment/openclaw-orchestrator -f

Adding a New Agent

# 1. Create the agent configuration
mkdir -p agents/new-agent
cp agents/template/* agents/new-agent/
# Edit IDENTITY.md, SOUL.md, config.yaml

# 2. Update the agent registry
# Edit orchestrator/agent-registry.yaml

# 3. Bind Skills
# Edit the skills section of agents/new-agent/config.yaml

# 4. Redeploy
docker-compose -f infrastructure/docker-compose.yml --project-directory . restart orchestrator
# or
kubectl rollout restart deployment/openclaw-orchestrator -n openclaw

Scaling Up and Down

# Kubernetes autoscaling
kubectl autoscale statefulset openclaw-agent-ai \
  --min=1 --max=10 --cpu-percent=70 -n openclaw

# Manually scale to N replicas
kubectl scale statefulset openclaw-agent-ai --replicas=5 -n openclaw

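✅ Deployment Checklist

  • [ ] All agent configuration files created (IDENTITY.md, SOUL.md, config.yaml)
  • [ ] All Skills registered in skill-registry.yaml
  • [ ] Routing rules configured in routing-rules.yaml
  • [ ] Gateway configured (Nginx/Envoy)
  • [ ] Message queue (Redis/Kafka) deployed
  • [ ] Database migrations applied
  • [ ] Monitoring and alerting configured
  • [ ] Log aggregation enabled
  • [ ] Local docker-compose test passed
  • [ ] Kubernetes cluster ready
  • [ ] Multi-agent collaboration workflow tested
  • [ ] Load balancing verified
  • [ ] Failover verified
  • [ ] Documentation updated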

📚 Key File Quick Reference
