From d1a746a8cb6da9227b8d2e78c654ac1b1f0e4739 Mon Sep 17 00:00:00 2001 From: Zer4tul Date: Tue, 12 May 2026 14:57:05 +0800 Subject: [PATCH] docs: add agent API reference, onboarding guide, and universal skill - docs/agent-api-reference.md (473 lines): complete HTTP API reference for all 12 endpoints - docs/agent-onboarding-guide.md (272 lines): ssh_cli and http_pull workflows, Forgejo integration - skill/SKILL.md (281 lines): universal agent skill, platform-agnostic, curl-based examples All content in English. No code changes. --- docs/agent-api-reference.md | 473 ++++++++++++++++++ docs/agent-onboarding-guide.md | 272 ++++++++++ .../agent-onboarding-docs/.openspec.yaml | 2 + .../changes/agent-onboarding-docs/design.md | 65 +++ .../changes/agent-onboarding-docs/proposal.md | 37 ++ .../specs/agent-api-reference/spec.md | 43 ++ .../specs/agent-skill/spec.md | 41 ++ .../changes/agent-onboarding-docs/tasks.md | 36 ++ skill/SKILL.md | 281 +++++++++++ 9 files changed, 1250 insertions(+) create mode 100644 docs/agent-api-reference.md create mode 100644 docs/agent-onboarding-guide.md create mode 100644 openspec/changes/agent-onboarding-docs/.openspec.yaml create mode 100644 openspec/changes/agent-onboarding-docs/design.md create mode 100644 openspec/changes/agent-onboarding-docs/proposal.md create mode 100644 openspec/changes/agent-onboarding-docs/specs/agent-api-reference/spec.md create mode 100644 openspec/changes/agent-onboarding-docs/specs/agent-skill/spec.md create mode 100644 openspec/changes/agent-onboarding-docs/tasks.md create mode 100644 skill/SKILL.md diff --git a/docs/agent-api-reference.md b/docs/agent-api-reference.md new file mode 100644 index 0000000..3c07b8c --- /dev/null +++ b/docs/agent-api-reference.md @@ -0,0 +1,473 @@ +# Agent Fleet — HTTP API Reference + +Base URL: `http://<host>:9090` +Content-Type: `application/json` for all request/response bodies unless noted. +All timestamps are ISO 8601 (RFC 3339). 
+ +--- + +## Authentication + +### http_pull Bearer Token + +Endpoints specific to `http_pull` agents require a Bearer token in the `Authorization` header. The token is configured in `config.toml` as `orchestrator.http_pull_token`. If no token is configured, authentication is skipped (open mode). + +``` +Authorization: Bearer <token> +``` + +Affected endpoints: `POST /api/v1/tasks/dequeue`, `POST /api/v1/tasks/{task_id}/status`. + +### Webhook HMAC-SHA256 + +The `POST /api/v1/webhooks/forgejo` endpoint requires an `X-Hub-Signature-256` (or `X-Gitea-Signature` / `X-Forgejo-Signature`) header containing `sha256=<hex digest>`, where the digest is the HMAC-SHA256 of the request body keyed with the configured `webhook_secret`. + +``` +X-Hub-Signature-256: sha256=abcdef... +``` + +--- + +## Error Responses + +All errors return JSON: + +```json +{ "error": "<message>" } +``` + +| Status | Meaning | Trigger | +|--------|---------|---------| +| 400 | Bad Request | Invalid state transition, wrong execution_mode, malformed input | +| 401 | Unauthorized | Missing or invalid Bearer token for http_pull endpoints, or invalid webhook signature | +| 404 | Not Found | Task or agent does not exist | +| 500 | Internal Server Error | Database failure, lock poisoning, unexpected errors | + +--- + +## Endpoints + +### Health Check + +``` +GET /healthz +``` + +**Response:** `200 OK` — body: `ok` + +```bash +curl http://localhost:9090/healthz +``` + +--- + +### Register Agent + +``` +POST /api/v1/agents/register +``` + +Register a new agent or update an existing one (upsert by `agent_id`). + +**Request:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| agent_id | string | yes | Unique identifier | +| agent_type | string | yes | `openclaw`, `claude-code`, `codex-cli`, `hermes`, `acp`, `shell`, or custom | +| hostname | string | yes | Machine hostname | +| capabilities | string[] | yes | e.g. 
`["code:rust", "review"]` | +| max_concurrency | u32 | yes | Max parallel tasks | +| metadata | object | no | Arbitrary key-value pairs | + +**Response:** `200 OK` + +```json +{ + "agent_id": "worker-01", + "registry_token": "registry_a1b2c3d4..." +} +``` + +```bash +curl -X POST http://localhost:9090/api/v1/agents/register \ + -H 'Content-Type: application/json' \ + -d '{ + "agent_id": "worker-01", + "agent_type": "codex-cli", + "hostname": "host-worker-01", + "capabilities": ["code:rust"], + "max_concurrency": 2 + }' +``` + +--- + +### Heartbeat + +``` +POST /api/v1/agents/heartbeat +``` + +**Request:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| agent_id | string | yes | Agent to update | + +**Response:** `200 OK` + +```json +{ + "agent_id": "worker-01", + "status": "online", + "last_heartbeat_at": "2025-01-15T10:30:00Z" +} +``` + +**Errors:** `404` if agent not found. + +```bash +curl -X POST http://localhost:9090/api/v1/agents/heartbeat \ + -H 'Content-Type: application/json' \ + -d '{"agent_id": "worker-01"}' +``` + +--- + +### Deregister Agent + +``` +POST /api/v1/agents/deregister +``` + +Sets agent offline and requeues all its active tasks back to `created`. + +**Request:** + +| Field | Type | Required | +|-------|------|----------| +| agent_id | string | yes | + +**Response:** `200 OK` + +```json +{ + "agent_id": "worker-01", + "status": "offline", + "requeued_tasks": 3 +} +``` + +```bash +curl -X POST http://localhost:9090/api/v1/agents/deregister \ + -H 'Content-Type: application/json' \ + -d '{"agent_id": "worker-01"}' +``` + +--- + +### List Agents + +``` +GET /api/v1/agents +``` + +**Query Parameters:** + +| Param | Type | Description | +|-------|------|-------------| +| capability | string | Filter by capability (e.g. `code:rust`) | +| status | string | Filter: `online`, `offline`, `draining` | + +**Response:** `200 OK` — JSON array of [Agent](#agent-object) objects. 
+ +```bash +curl 'http://localhost:9090/api/v1/agents?status=online' +``` + +--- + +### List Tasks + +``` +GET /api/v1/tasks +``` + +**Query Parameters:** + +| Param | Type | Description | +|-------|------|-------------| +| status | string | Filter by status (e.g. `created`, `running`, `failed`) | +| agent_id | string | Filter by assigned agent | + +**Response:** `200 OK` — JSON array of [Task](#task-object) objects. Ordered by `created_at` descending. + +```bash +curl 'http://localhost:9090/api/v1/tasks?status=running' +``` + +--- + +### Get Task + +``` +GET /api/v1/tasks/{task_id} +``` + +**Response:** `200 OK` — single [Task](#task-object) object. + +**Errors:** `404` if task not found. + +```bash +curl http://localhost:9090/api/v1/tasks/org%2Frepo%2342 +``` + +--- + +### Dequeue Task (http_pull only) + +``` +POST /api/v1/tasks/dequeue +``` + +Requires Bearer token if `http_pull_token` is configured. Only returns tasks with `execution_mode = http_pull`. + +**Request:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| agent_id | string | yes | Agent claiming the task | +| capabilities | string[] | no | Capabilities to match against task labels | + +**Response:** `200 OK` with [Task](#task-object) object, or `204 No Content` if no matching task. + +**Errors:** `401` if token required and missing/invalid. + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/dequeue \ + -H 'Content-Type: application/json' \ + -H 'Authorization: Bearer my-token' \ + -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}' +``` + +--- + +### Update Task Status (http_pull only) + +``` +POST /api/v1/tasks/{task_id}/status +``` + +Requires Bearer token. Only works for tasks with `execution_mode = http_pull`. + +**Request:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| status | string | yes | Target status: `running`, `review_pending`, etc. 
| + +**Response:** `200 OK` — updated [Task](#task-object). + +**Errors:** `400` if task is not `http_pull` mode or transition is invalid. `404` if task not found. + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \ + -H 'Content-Type: application/json' \ + -H 'Authorization: Bearer my-token' \ + -d '{"status": "running"}' +``` + +--- + +### Complete Task + +``` +POST /api/v1/tasks/{task_id}/complete +``` + +Works for both `ssh_cli` and `http_pull` tasks. Submit a receipt to mark the task done. + +**Request:** A [Receipt](#receipt-object) object. + +**Response:** `200 OK` + +```json +{ + "task_id": "org/repo#42", + "status": "completed" +} +``` + +**Errors:** `404` if task not found. `400` if task is not in a completable state. + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \ + -H 'Content-Type: application/json' \ + -d '{ + "task_id": "org/repo#42", + "agent_id": "worker-01", + "status": "completed", + "duration_seconds": 120, + "summary": "Implemented feature X", + "artifacts": [ + {"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/7"} + ], + "error": null + }' +``` + +--- + +### Retry Task + +``` +POST /api/v1/tasks/{task_id}/retry +``` + +Retry a `failed` or `agent_lost` task. Transitions it back to `assigned`. + +**Response:** `200 OK` — updated [Task](#task-object). + +**Errors:** `400` if task status is not `failed` or `agent_lost`. `404` if task not found. + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/retry +``` + +--- + +### Submit Receipt + +``` +POST /api/v1/receipts +``` + +Submit a receipt for a task. Validates artifacts (e.g. checks PR exists via Forgejo API). + +**Request:** A [Receipt](#receipt-object) object. + +**Response:** `200 OK` + +**Errors:** `404` if task not found. `400` if validation fails. 
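A receipt body for this endpoint could be assembled along the lines of the following Python sketch. `build_receipt` is a hypothetical helper, not part of Agent Fleet. The field names and the allowed `status` and `artifact_type` values are taken from the Receipt Object schema in this reference; the client-side checks are only an assumption about what is worth catching before the request is sent, and the server performs its own validation.

```python
import json

# Allowed values as listed under "Receipt Object" in this reference.
RECEIPT_STATUSES = {"completed", "failed", "partial"}
ARTIFACT_TYPES = {"pr", "commit", "file", "comment", "url"}


def build_receipt(task_id, agent_id, status, duration_seconds, summary,
                  artifacts=None, error=None):
    """Return a receipt dict; raise ValueError for values the schema does not list.

    Hypothetical client-side helper, shown for illustration only.
    """
    if status not in RECEIPT_STATUSES:
        raise ValueError(f"unknown receipt status: {status!r}")
    artifacts = list(artifacts or [])
    for artifact in artifacts:
        if artifact.get("artifact_type") not in ARTIFACT_TYPES:
            raise ValueError(f"unknown artifact_type: {artifact!r}")
    return {
        "task_id": task_id,
        "agent_id": agent_id,
        "status": status,
        "duration_seconds": duration_seconds,
        "summary": summary,
        "artifacts": artifacts,
        "error": error,
    }


receipt = build_receipt("org/repo#42", "worker-01", "completed", 95, "Fixed the bug")
body = json.dumps(receipt)  # JSON string to send as the POST body
```

The resulting `body` string has the same shape as the payload in the curl example for this endpoint.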
+ +```bash +curl -X POST http://localhost:9090/api/v1/receipts \ + -H 'Content-Type: application/json' \ + -d '{ + "task_id": "org/repo#42", + "agent_id": "worker-01", + "status": "completed", + "duration_seconds": 95, + "summary": "Fixed the bug", + "artifacts": [], + "error": null + }' +``` + +--- + +### Forgejo Webhook + +``` +POST /api/v1/webhooks/forgejo +``` + +Receives Forgejo webhook events. Requires HMAC-SHA256 signature header. + +**Headers:** `X-Forgejo-Event` or `X-Gitea-Event` determines the event type. + +**Supported events:** + +| Event | Action | +|-------|--------| +| `issues` (opened) | Creates a task from the Issue (requires `agent:*` label) | +| `pull_request` (opened) | Sets task to `review_pending` (branch name → task_id) | +| `pull_request` (merged/closed with `merged: true`) | Sets task to `completed`, auto-generates receipt | +| `push` (to `task/*` branch) | Updates `last_activity_at` on the task | + +**Response:** `200 OK` + +```json +{ + "accepted": true, + "task_id": "org/repo#42" +} +``` + +**Errors:** `401` if signature invalid. `400` if payload unparseable. 
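The signature check described for this endpoint can be sketched in Python with the standard library alone. The function names are hypothetical, and the header format (`sha256=` followed by a lowercase hex digest of the raw request body) is an assumption based on the `sha256=abcdef...` example in the Authentication section; the Orchestrator's actual verification code may differ.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Return a header value of the form 'sha256=<hex digest>' for the raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Recompute the signature and compare it with the header in constant time."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, header_value)


# Example with made-up values: sign a payload, then verify it.
payload = b'{"action": "opened"}'
header = sign_body("example-secret", payload)
```

The signature has to be computed over the exact bytes that are sent; re-serializing the JSON before signing or verifying can change the digest.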
+ +--- + +## Object Schemas + +### Agent Object + +```json +{ + "agent_id": "worker-01", + "agent_type": "codex-cli", + "hostname": "host-worker-01", + "capabilities": ["code:rust"], + "max_concurrency": 2, + "current_tasks": 1, + "status": "online", + "last_heartbeat_at": "2025-01-15T10:30:00Z", + "registered_at": "2025-01-15T09:00:00Z", + "metadata": {} +} +``` + +### Task Object + +```json +{ + "task_id": "org/repo#42", + "source": "forgejo:org/repo#42", + "task_type": "code", + "priority": "normal", + "status": "created", + "execution_mode": "ssh_cli", + "assigned_agent_id": null, + "assigned_host": null, + "requirements": "Implement the feature described in the issue body", + "labels": ["agent:code", "code:rust"], + "branch_name": "task/org%2Frepo%2342", + "pr_title": "feat: Implement feature (#42)", + "created_at": "2025-01-15T10:00:00Z", + "assigned_at": null, + "started_at": null, + "completed_at": null, + "last_activity_at": null, + "retry_count": 0, + "max_retries": 2, + "review_count": 0, + "timeout_seconds": 1800 +} +``` + +**Status values:** `created`, `assigned`, `running`, `review_pending`, `completed`, `failed`, `agent_lost`, `cancelled` + +**Priority values:** `low`, `normal`, `high`, `urgent` + +**Execution mode values:** `ssh_cli`, `http_pull` + +### Receipt Object + +```json +{ + "task_id": "org/repo#42", + "agent_id": "worker-01", + "status": "completed", + "duration_seconds": 120, + "summary": "Implemented the feature", + "artifacts": [ + {"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/7", "path": null, "description": null} + ], + "error": null +} +``` + +**Receipt status values:** `completed`, `failed`, `partial` + +**Artifact type values:** `pr`, `commit`, `file`, `comment`, `url` diff --git a/docs/agent-onboarding-guide.md b/docs/agent-onboarding-guide.md new file mode 100644 index 0000000..e9e99ec --- /dev/null +++ b/docs/agent-onboarding-guide.md @@ -0,0 +1,272 @@ +# Agent Fleet — Agent Onboarding Guide + +This guide 
explains how to integrate an agent with the Agent Fleet Orchestrator. + +--- + +## Execution Modes + +Agent Fleet supports two execution modes. The mode is set per-task at creation time (defaults to `ssh_cli`). + +| Aspect | `ssh_cli` | `http_pull` | +|--------|-----------|-------------| +| Who initiates? | Orchestrator (via SSH or local subprocess) | Agent (via HTTP API) | +| Control flow | Orchestrator builds prompt, runs CLI, collects output | Agent decides when to dequeue and execute | +| Agent requirements | CLI binary on a configured host | HTTP client, can call REST API | +| Auth needed? | No (Orchestrator manages) | Yes (Bearer token) | +| Best for | Codex CLI, Claude Code, OpenCode — agents with CLIs | OpenClaw/Jeeves, Hermes — agents with their own schedulers | +| Task creation trigger | Forgejo Issue webhook (default) | Same, or API call | + +--- + +## ssh_cli Workflow + +### 1. Configure a Host + +Add a `[[hosts]]` section to `config.toml` on the Orchestrator: + +```toml +[[hosts]] +host_id = "host-worker-01" +hostname = "192.168.1.100" +ssh_user = "deploy" +ssh_port = 22 +ssh_key_path = "/home/deploy/.ssh/id_ed25519" +work_dir = "/opt/agent-workspace" +agents = [ + { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] }, +] +``` + +For local execution (same machine as Orchestrator), use `hostname = "localhost"` — the Orchestrator uses a local subprocess instead of SSH. + +### 2. Install the Agent CLI + +The CLI binary must be available on the target host in `$PATH`. The Orchestrator checks availability with `which `. + +Built-in CLI templates: + +| Agent Type | CLI Command | +|------------|-------------| +| `codex-cli` | `codex exec --json '{prompt}'` | +| `claude-code` | `claude -p '{prompt}' --output-format json --dangerously-skip-permissions` | + +Custom templates can be defined in `config.toml` under `[adapters]`. + +### 3. 
Orchestrator Handles Everything + +When a Forgejo Issue with an `agent:*` label arrives: + +1. Orchestrator creates a task (`execution_mode = ssh_cli`) +2. Dispatch loop picks the task, selects a host by capability + load +3. SSH (or local subprocess) executes the CLI with a structured prompt +4. Output is parsed (Codex JSON or Claude JSON format) +5. Task status updates: `created` → `assigned` → `running` → `completed` (or `failed`) + +### 4. What the Agent Receives (Structured Prompt) + +The Orchestrator constructs this prompt and passes it as the `{prompt}` variable: + +``` +Task ID: org/repo#42 +Type: code +Goal: +Implement the feature described in the issue body + +Constraints: +- Execution mode: ssh_cli +- Labels: code:rust +- Branch: task/org%2Frepo%2342 +- Expected output: JSON receipt + +Validation: +- Run relevant tests if code changed +- Summarize changes and artifacts +``` + +### 5. Expected CLI Output + +The CLI must output JSON to stdout. The format depends on the parser: + +**Codex JSON:** +```json +{"status": "completed", "summary": "done", "duration_seconds": 120, "artifacts": [{"artifact_type": "pr", "url": "https://..."}]} +``` + +**Claude JSON:** +```json +{"status": "completed", "summary": "done", "duration_seconds": 95, "error": null} +``` + +If output is not valid JSON, the task is marked `failed`. + +--- + +## http_pull Workflow + +### 1. Register + +```bash +curl -X POST http://localhost:9090/api/v1/agents/register \ + -H 'Content-Type: application/json' \ + -d '{"agent_id": "worker-03", "agent_type": "openclaw", "hostname": "arm0", "capabilities": ["code:rust"], "max_concurrency": 2}' +``` + +Response contains a `registry_token`. Keep it for subsequent API calls (if `http_pull_token` is configured, use that shared token instead). + +### 2. Heartbeat (periodic) + +Send a heartbeat every N seconds (default interval: 60s). 
If the Orchestrator doesn't receive one within `heartbeat_interval_secs × heartbeat_timeout_threshold` seconds, the agent is marked offline and its tasks are requeued. + +```bash +curl -X POST http://localhost:9090/api/v1/agents/heartbeat \ + -H 'Content-Type: application/json' \ + -d '{"agent_id": "worker-03"}' +``` + +### 3. Dequeue a Task + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/dequeue \ + -H 'Content-Type: application/json' \ + -H 'Authorization: Bearer <token>' \ + -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}' +``` + +Returns `200 OK` with a Task object, or `204 No Content` if nothing is available. + +Only tasks with `execution_mode = http_pull` are returned. + +### 4. Update Status While Working + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \ + -H 'Content-Type: application/json' \ + -H 'Authorization: Bearer <token>' \ + -d '{"status": "running"}' +``` + +### 5. Complete the Task + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \ + -H 'Content-Type: application/json' \ + -d '{ + "task_id": "org/repo#42", + "agent_id": "worker-03", + "status": "completed", + "duration_seconds": 180, + "summary": "Fixed the issue", + "artifacts": [{"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}], + "error": null + }' +``` + +Or use the receipts endpoint: + +```bash +curl -X POST http://localhost:9090/api/v1/receipts \ + -H 'Content-Type: application/json' \ + -d '<receipt JSON>' +``` + +### 6. Deregister When Done + +```bash +curl -X POST http://localhost:9090/api/v1/agents/deregister \ + -H 'Content-Type: application/json' \ + -d '{"agent_id": "worker-03"}' +``` + +--- + +## Forgejo Integration + +### How Issues Become Tasks + +1. A Forgejo Issue is opened with a label matching `agent:*` (e.g. `agent:code`) +2. Forgejo sends an `issues` webhook to `POST /api/v1/webhooks/forgejo` +3. The part of the label after `agent:` becomes `task_type` (e.g. `code`) +4. 
Priority is inferred from labels: `priority:urgent`, `priority:high`, `priority:low` (default: `normal`) +5. A task is created with: + - `task_id` = `{repo_full_name}#{issue_number}` (e.g. `org/repo#42`) + - `execution_mode` = `ssh_cli` (default for Forgejo-originated tasks) + - `branch_name` = `task/{url_encoded_task_id}` (e.g. `task/org%2Frepo%2342`) + - `pr_title` = `feat: {issue_title} (#{issue_number})` + +### Branch Naming Convention + +- Branch: `task/{url_encoded_task_id}` +- Example: task `org/repo#42` → branch `task/org%2Frepo%2342` + +### PR Lifecycle + +| Event | Effect | +|-------|--------| +| PR opened (branch = `task/*`) | Task → `review_pending` | +| PR merged | Task → `completed`, receipt auto-generated | +| Push to `task/*` branch | Task `last_activity_at` updated | + +### Task Status Flow + +``` +created → assigned → running → review_pending → completed + ↘ failed + ↘ agent_lost + ↘ cancelled +``` + +Any `failed` or `agent_lost` task can be retried via `POST /api/v1/tasks/{task_id}/retry` (transitions to `assigned`). Retry is limited by `max_retries` (default: 2). + +--- + +## Structured Prompt Format (ssh_cli) + +When the Orchestrator executes an agent via SSH or a local subprocess, it constructs a structured prompt: + +``` +Task ID: <task_id> +Type: <task_type> +Goal: +<requirements> + +Constraints: +- Execution mode: ssh_cli +- Labels: <labels> +- Branch: <branch_name> +- Expected output: JSON receipt + +Validation: +- Run relevant tests if code changed +- Summarize changes and artifacts +``` + +The prompt is injected into the CLI template as the `{prompt}` variable. Other available variables: `{work_dir}`, `{task_id}`, `{branch}`. + +--- + +## FAQ + +**Q: How do I know which execution mode to use?** +A: If you have a CLI binary and run on a configured host → `ssh_cli`. If you have your own scheduler or run outside configured hosts → `http_pull`. + +**Q: Do I need to register for ssh_cli mode?** +A: No. The Orchestrator manages ssh_cli tasks entirely. Registration is only for `http_pull` agents. 
+ +**Q: What happens if my agent crashes during ssh_cli execution?** +A: The task is marked `failed`. If `retry_count < max_retries`, the dispatch loop will retry automatically. + +**Q: What happens if my http_pull agent stops sending heartbeats?** +A: After `heartbeat_interval_secs × heartbeat_timeout_threshold` seconds, the agent is marked offline and all its tasks are requeued with status `created`. + +**Q: Can a task switch between execution modes?** +A: No. The `execution_mode` is set at creation time and cannot be changed. + +**Q: How do I create a task manually?** +A: Use the Forgejo webhook flow (open an Issue with an `agent:*` label), or insert the task directly into the database. There is no public "create task" API endpoint. + +**Q: What label format triggers task creation?** +A: Issues must have a label starting with `agent:` (e.g. `agent:code`, `agent:review`). The value after `agent:` becomes the task type. Issues without such a label are ignored. + +**Q: How does the review loop work?** +A: When a PR is opened (not merged), the task goes to `review_pending`. If the PR is not merged and the review cycle count exceeds `max_retries`, the task is marked `failed`. For `ssh_cli`, the Orchestrator re-dispatches automatically. diff --git a/openspec/changes/agent-onboarding-docs/.openspec.yaml b/openspec/changes/agent-onboarding-docs/.openspec.yaml new file mode 100644 index 0000000..40cc12f --- /dev/null +++ b/openspec/changes/agent-onboarding-docs/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-05-12 diff --git a/openspec/changes/agent-onboarding-docs/design.md b/openspec/changes/agent-onboarding-docs/design.md new file mode 100644 index 0000000..d213442 --- /dev/null +++ b/openspec/changes/agent-onboarding-docs/design.md @@ -0,0 +1,65 @@ +## Context + +The core features of agent-fleet are implemented and deployed on arm0, but no Agent knows how to use them. The project's usability depends entirely on whether Agents can integrate correctly. + +Two deliverables are needed: +1. **API reference**: an HTTP API manual written for Agents +2. 
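**Universal Skill**: a capability description that follows the standard skill specification and is not tied to any particular platform + +Key constraint: the Skill must be platform-agnostic. The Team Leader role is not necessarily filled by OpenClaw; Codex, Claude Code, OpenCode, or Hermes Agent may each act as the dispatcher. + +## Goals / Non-Goals + +**Goals:** +- Provide a complete, accurate, directly usable API reference +- Provide a universal Skill so that any Agent that loads it knows how to interact with agent-fleet +- Cover the full workflow of both execution modes (ssh_cli + http_pull) +- Cover the Git workflow of the Forgejo integration + +**Non-Goals:** +- No operations documentation for humans (deployment, configuration, troubleshooting) → that belongs to a separate change +- No platform-specific integration scripts (such as an install script for an OpenClaw skill) +- No SDK or client library + +## Decisions + +### Decision 1: Universal Skill specification, not tied to a platform + +**Choice**: The Skill uses the standard YAML frontmatter + Markdown body format + +**Rationale**: +- All mainstream Agent platforms support this format (OpenClaw, Claude Code, Codex CLI, OpenCode) +- It contains no platform-specific syntax; each Agent converts it as needed +- curl is a lingua franca that every Agent can understand + +**Alternatives**: +- An OpenClaw-only skill: limits the audience +- A separate skill per platform: duplicated effort, prone to inconsistency + +### Decision 2: Keep the documentation inside the repo + +**Choice**: The `docs/` directory holds the API reference and onboarding guide; the `skill/` directory holds SKILL.md + +**Rationale**: +- Same repository as the code, so versions stay aligned +- Agents can read the documentation directly through Forgejo +- The Skill can be forked or symlinked by each platform + +### Decision 3: Documentation generated from code + manual additions + +**Choice**: Maintain the API endpoint list by hand (Phase 1); consider generating it from code comments later + +**Rationale**: +- Phase 1 has a limited number of endpoints (~12), so manual maintenance is cheap +- Automatic generation needs an extra toolchain (such as `utoipa`), which is not worth the investment in Phase 1 + +## Risks / Trade-offs + +- **[Stale documentation] Docs may drift after code changes** → Docs live in the same repository as the code and are checked during PR review +- **[Limits of a universal Skill] Being universal means platform features cannot be exploited** → Universality is the right choice; platform-specific optimization is left to each Agent + +## Open Questions + +_(resolved)_ + +- ~~Should the Skill include multiple language versions (Chinese/English)?~~ → English only. Reason: LLM training corpora are predominantly English, and English is more token-efficient with less semantic ambiguity. The Skill's audience is Agents, not humans. diff --git a/openspec/changes/agent-onboarding-docs/proposal.md b/openspec/changes/agent-onboarding-docs/proposal.md new file mode 100644 index 0000000..e77e4f5 --- /dev/null +++ b/openspec/changes/agent-onboarding-docs/proposal.md @@ -0,0 +1,37 @@ +## Why + +All core features of agent-fleet (dual execution model, Forgejo integration, Receipt validation) are implemented and working on arm0, but no Agent knows how to use it. + +Current state: +- The API endpoints are implemented (register, heartbeat, dequeue, status, receipt, webhook, and so on) +- Both execution modes (ssh_cli + http_pull) are implemented +- But no documentation tells an Agent "how to onboard, how to call the API, how to fit into the workflow" + +The project's usability depends entirely on whether Agents can integrate correctly. Without documentation and a skill, agent-fleet is an API that nobody knows how to use. + +In addition, what is needed is a **universal skill** (not tied to OpenClaw), because: +- The Team Leader role is not necessarily filled by OpenClaw 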
+- Codex, Claude Code, OpenCode, Hermes Agent, and others all need to understand and use agent-fleet +- A Skill is a general description of Agent capabilities that follows a common specification + +## What Changes + +- Add `docs/agent-api-reference.md`: the complete HTTP API reference, readable by any Agent +- Add `docs/agent-onboarding-guide.md`: the Agent onboarding guide, with the full workflow for both modes +- Add the `skill/` directory: the universal Agent Skill definition (SKILL.md), following the common skill specification +- Skill content: API usage, authentication, task lifecycle, Forgejo workflow, error handling + +## Capabilities + +### New Capabilities +- `agent-api-reference`: complete HTTP API reference (endpoints, request/response formats, error codes, examples) +- `agent-skill`: universal Agent Skill definition describing how an Agent interacts with agent-fleet + +### Modified Capabilities +_(none)_ + +## Impact + +- **Documentation**: 2 new Markdown documents + 1 Skill definition +- **Code**: no code changes +- **Project**: the Skill directory is a new structure; its location in the repo may need further thought diff --git a/openspec/changes/agent-onboarding-docs/specs/agent-api-reference/spec.md b/openspec/changes/agent-onboarding-docs/specs/agent-api-reference/spec.md new file mode 100644 index 0000000..547e446 --- /dev/null +++ b/openspec/changes/agent-onboarding-docs/specs/agent-api-reference/spec.md @@ -0,0 +1,43 @@ +## ADDED Requirements + +### Requirement: Complete HTTP API reference documentation +The project SHALL provide a complete HTTP API reference (`docs/agent-api-reference.md`) readable by any Agent. The document SHALL cover all public endpoints, including request/response formats, error codes, and examples. + +#### Scenario: Agent reads API reference to understand available endpoints +- **WHEN** an Agent reads `docs/agent-api-reference.md` +- **THEN** the document SHALL list all endpoints: healthz, agents/register, agents/heartbeat, agents/deregister, agents (GET), tasks (GET), tasks/{id} (GET), tasks/dequeue, tasks/{id}/status, tasks/{id}/retry, tasks/{id}/complete, receipts, webhooks/forgejo +- **AND** each endpoint SHALL include: HTTP method, URL, request body format, response format, error codes, curl example + +#### Scenario: Agent checks authentication requirements +- **WHEN** an Agent consults the authentication section of the API reference +- **THEN** the document SHALL state: http_pull mode requires a Bearer token (obtained at registration), ssh_cli mode requires no Agent authentication, and the webhook endpoint requires an HMAC-SHA256 signature + +#### Scenario: Agent understands error responses +- **WHEN** an Agent receives an error response +- **THEN** the document SHALL list all error codes: 401 Unauthorized, 403 Forbidden, 404 Not Found, 400 Bad Request, 500 Internal Server Error 
+- **AND** each error code SHALL include a description of the scenario that triggers it + +### Requirement: Agent onboarding guide +The project SHALL provide an Agent onboarding guide (`docs/agent-onboarding-guide.md`) describing the full workflow of both execution modes. + +#### Scenario: New agent team leader reads onboarding guide +- **WHEN** a new Team Leader Agent (such as Jeeves) reads the onboarding guide +- **THEN** the document SHALL describe the differences between the two execution modes and when to use each: + - ssh_cli: the Orchestrator dispatches actively; suited to Agents with a CLI such as Codex, Claude Code, OpenCode + - http_pull: the Agent pulls on its own; suited to Agents with their own scheduler such as OpenClaw/Jeeves, Hermes + +#### Scenario: Agent follows ssh_cli workflow +- **WHEN** an Agent onboards in ssh_cli mode +- **THEN** the document SHALL describe the full flow: configure host → Agent installs CLI → Orchestrator discovers it automatically → tasks are assigned and executed automatically → PR is created → webhook callback + +#### Scenario: Agent follows http_pull workflow +- **WHEN** an Agent onboards in http_pull mode +- **THEN** the document SHALL describe the full flow: call the register API → obtain token → periodic heartbeat → call dequeue to pull a task → execute → call the complete/receipt API + +#### Scenario: Agent understands Forgejo integration +- **WHEN** an Agent reads the Forgejo integration section +- **THEN** the document SHALL describe: how an Issue becomes a task (webhook → label parsing), how a task is linked to a Git branch (`task/{task_id}`), and how the PR lifecycle drives status updates (opened → review_pending, merged → completed) + +#### Scenario: Agent understands structured prompt format +- **WHEN** an Agent in ssh_cli mode needs to understand the prompt it is given +- **THEN** the document SHALL describe the structured prompt format: Task ID, Type, Goal, Constraints, Branch, Expected output, Validation diff --git a/openspec/changes/agent-onboarding-docs/specs/agent-skill/spec.md b/openspec/changes/agent-onboarding-docs/specs/agent-skill/spec.md new file mode 100644 index 0000000..5503348 --- /dev/null +++ b/openspec/changes/agent-onboarding-docs/specs/agent-skill/spec.md @@ -0,0 +1,41 @@ +## ADDED Requirements + +### Requirement: Universal Agent Skill definition +The project SHALL provide a universal Agent Skill (`skill/SKILL.md`) that follows the standard skill specification (YAML frontmatter + Markdown body). The Skill SHALL NOT be tied to any specific Agent platform (OpenClaw, Claude Code, Codex, OpenCode, Hermes, and others can all use it). + +#### Scenario: Any agent discovers and loads the skill +- **WHEN** any Agent (Codex, Claude Code, OpenCode, Hermes, etc.) loads skill/SKILL.md +- **THEN** the Skill SHALL contain 
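YAML frontmatter: `name: agent-fleet-integration`, and a `description` stating its purpose and trigger conditions +- **AND** the Skill body SHALL use standard Markdown (headings, code blocks, examples) + +#### Scenario: Skill teaches agent how to interact with agent-fleet +- **WHEN** an Agent reads the Skill +- **THEN** the Skill SHALL contain a Quick Start section (the simplest onboarding example, at most 3 steps) +- **AND** an Instructions section (the detailed API call flow) +- **AND** an Examples section (a curl example for each operation) +- **AND** a Guidelines section (error handling, retry strategy, authentication rules) + +#### Scenario: Skill covers both execution modes +- **WHEN** an Agent needs to choose an execution mode +- **THEN** the Skill SHALL clearly explain the difference between ssh_cli and http_pull +- **AND** guide the Agent in deciding which mode to use: + - If it has a CLI and runs on a configured host → ssh_cli (dispatched by the Orchestrator) + - If it has its own scheduler or runs outside configured hosts → http_pull (pulls on its own) + +#### Scenario: Skill includes Forgejo workflow +- **WHEN** an Agent needs to understand the Git workflow +- **THEN** the Skill SHALL describe the branch naming convention (`task/{task_id}`), the PR creation flow, and the webhook trigger mechanism + +#### Scenario: Skill includes error recovery guidance +- **WHEN** an Agent encounters an API error +- **THEN** the Skill SHALL give handling guidance for common errors: + - 401 → check the token; re-register if necessary + - 404 → the task may already be completed or may not exist + - 409/400 → check whether the task status allows the operation + - Network error → retry (exponential backoff) + +#### Scenario: Skill is portable across agent platforms +- **WHEN** the Skill is used by Agents on different platforms +- **THEN** the Skill SHALL NOT contain any platform-specific syntax or directives (such as OpenClaw's `sessions_send` or Claude Code's `hooks`) +- **AND** all interactions are described as standard HTTP requests (curl format) +- **AND** an Agent may translate curl into its own HTTP call mechanism according to its capabilities diff --git a/openspec/changes/agent-onboarding-docs/tasks.md b/openspec/changes/agent-onboarding-docs/tasks.md new file mode 100644 index 0000000..4c7eeee --- /dev/null +++ b/openspec/changes/agent-onboarding-docs/tasks.md @@ -0,0 +1,36 @@ +## 1. API reference + +- [ ] 1.1 Create `docs/agent-api-reference.md` +- [ ] 1.2 List all public endpoints (~12), each with: HTTP method, URL, request body, response body, error codes, curl example +- [ ] 1.3 Authentication section: http_pull token, webhook HMAC-SHA256 signature +- [ ] 1.4 Error code summary: 401/403/404/400/500, each with its trigger scenario +- [ ] 1.5 General notes: base_url, Content-Type, character encoding, pagination (if any) + +## 2. 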
Agent onboarding guide + +- [ ] 2.1 Create `docs/agent-onboarding-guide.md` +- [ ] 2.2 Comparison table of the two execution modes (ssh_cli vs http_pull) +- [ ] 2.3 Full ssh_cli workflow: configure host → install CLI → automatic dispatch → PR workflow +- [ ] 2.4 Full http_pull workflow: register → heartbeat → dequeue → execute → complete/receipt +- [ ] 2.5 Forgejo integration: Issue → Task, branch naming, PR lifecycle +- [ ] 2.6 Structured prompt format (the structure of the prompt an Agent receives in ssh_cli mode) +- [ ] 2.7 FAQ + +## 3. Universal Agent Skill + +- [ ] 3.1 Create `skill/SKILL.md` (YAML frontmatter + Markdown body) +- [ ] 3.2 Quick Start: minimal onboarding example (at most 3 steps) +- [ ] 3.3 Instructions: detailed API call flow (register → heartbeat → dequeue → execute → complete) +- [ ] 3.4 Examples: a curl example for each operation +- [ ] 3.5 Guidelines: error handling, retry strategy, authentication rules +- [ ] 3.6 Execution mode selection guide: how an Agent decides between ssh_cli and http_pull +- [ ] 3.7 Forgejo workflow (branch naming, PR creation, webhook triggers) +- [ ] 3.8 Verify: Skill content matches the API reference, and the curl examples are runnable + +## 4. Verification + +- [ ] 4.1 The API reference covers all implemented endpoints +- [ ] 4.2 The curl examples run against the arm0 instance +- [ ] 4.3 The Skill format follows the standard specification (YAML frontmatter + Markdown body) +- [ ] 4.4 The Skill contains no platform-specific syntax +- [ ] 4.5 The onboarding guide matches the current code implementation diff --git a/skill/SKILL.md b/skill/SKILL.md new file mode 100644 index 0000000..fe57564 --- /dev/null +++ b/skill/SKILL.md @@ -0,0 +1,281 @@ +--- +name: agent-fleet-integration +description: | + Interact with the Agent Fleet Orchestrator. Use this skill when you need to: + - Register as an agent and pull tasks for execution + - Query task status or list tasks + - Submit completion receipts + - Retry failed tasks + - Integrate with the Forgejo Issue → Task → PR workflow + + Applies when the agent is acting as a worker in an Agent Fleet cluster, + or when managing tasks on behalf of the fleet. 
+--- + +# Agent Fleet Integration Skill + +## Quick Start (http_pull mode) + +**Step 1.** Register your agent: +```bash +curl -X POST http://localhost:9090/api/v1/agents/register \ + -H 'Content-Type: application/json' \ + -d '{"agent_id":"my-agent","agent_type":"openclaw","hostname":"myhost","capabilities":["code:rust"],"max_concurrency":2}' +``` + +**Step 2.** Pull a task to execute: +```bash +curl -X POST http://localhost:9090/api/v1/tasks/dequeue \ + -H 'Content-Type: application/json' \ + -H 'Authorization: Bearer <token>' \ + -d '{"agent_id":"my-agent","capabilities":["code:rust"]}' +``` + +**Step 3.** Submit your result: +```bash +curl -X POST http://localhost:9090/api/v1/tasks/<url-encoded task_id>/complete \ + -H 'Content-Type: application/json' \ + -d '{"task_id":"<task_id>","agent_id":"my-agent","status":"completed","duration_seconds":60,"summary":"done","artifacts":[],"error":null}' +``` + +--- + +## Choosing Your Execution Mode + +| If you... | Use this mode | +|-----------|---------------| +| Have a CLI binary installed on a configured host | `ssh_cli` — Orchestrator calls you | +| Have your own scheduler or run outside configured hosts | `http_pull` — You call the API | + +- `ssh_cli` agents do **not** need to call any API. The Orchestrator handles everything via SSH or local subprocess. +- `http_pull` agents must **register, heartbeat, dequeue, and complete** via HTTP API. + +--- + +## Instructions + +### http_pull Agent Lifecycle + +``` +Register → Heartbeat (loop) → Dequeue → Execute → Complete → Deregister +``` + +1. **Register** once at startup via `POST /api/v1/agents/register`. +2. **Heartbeat** periodically (every 60s recommended) via `POST /api/v1/agents/heartbeat`. Without heartbeats, you will be marked offline and your tasks requeued. +3. **Dequeue** when ready for work via `POST /api/v1/tasks/dequeue`. Returns a Task or 204 No Content. +4. **Update status** to `running` via `POST /api/v1/tasks/{task_id}/status`. +5. 
**Complete** the task via `POST /api/v1/tasks/{task_id}/complete` with a Receipt. +6. **Deregister** when shutting down via `POST /api/v1/agents/deregister`. + +### ssh_cli Agent Notes + +No API interaction required. Ensure: +- Your CLI binary is in `$PATH` on the configured host. +- Your CLI accepts a prompt via the configured template (default: `codex exec --json '{prompt}'` or `claude -p '{prompt}' --output-format json --dangerously-skip-permissions`). +- Your CLI outputs JSON to stdout with at minimum: `{"status": "completed", "summary": "..."}`. + +--- + +## Examples + +### Register + +```bash +curl -X POST http://localhost:9090/api/v1/agents/register \ + -H 'Content-Type: application/json' \ + -d '{ + "agent_id": "worker-03", + "agent_type": "openclaw", + "hostname": "arm0", + "capabilities": ["code:rust", "review"], + "max_concurrency": 2, + "metadata": {"version": "1.0"} + }' +``` + +### Heartbeat + +```bash +curl -X POST http://localhost:9090/api/v1/agents/heartbeat \ + -H 'Content-Type: application/json' \ + -d '{"agent_id": "worker-03"}' +``` + +### List Available Tasks + +```bash +curl 'http://localhost:9090/api/v1/tasks?status=created' +``` + +### Dequeue + +```bash +curl -X POST http://localhost:9090/api/v1/tasks/dequeue \ + -H 'Content-Type: application/json' \ + -H 'Authorization: Bearer my-token' \ + -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}' +``` + +Returns 200 with Task JSON, or 204 if no matching task. 
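Because dequeue answers with either 200 or 204, a worker has to branch on the HTTP status rather than on the body. The following is a minimal sketch of that branching, not the project's own client: the environment variables `FLEET_URL` and `FLEET_TOKEN`, the function names, and the action labels are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Assumed environment variables (not defined by the source): FLEET_URL, FLEET_TOKEN.
FLEET_URL="${FLEET_URL:-http://localhost:9090}"

# Map the HTTP status of a dequeue call to the worker's next step.
dequeue_action() {
  case "$1" in
    200) echo "execute" ;;      # body holds the Task JSON
    204) echo "wait" ;;         # no matching task; poll again later
    400) echo "fix-request" ;;  # do not retry unchanged
    401) echo "check-token" ;;
    *)   echo "backoff" ;;      # 5xx, or 000 when curl could not connect
  esac
}

# One dequeue attempt. $1 = agent_id, $2 = file that receives the response body.
# Prints the chosen action on stdout.
dequeue_once() {
  dq_code=$(curl -s -o "$2" -w '%{http_code}' \
    -X POST "$FLEET_URL/api/v1/tasks/dequeue" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${FLEET_TOKEN:-}" \
    -d "{\"agent_id\":\"$1\",\"capabilities\":[\"code:rust\"]}") || true
  dequeue_action "$dq_code"
}
```

A worker loop might call `dequeue_once worker-03 /tmp/task.json` and sleep before the next attempt whenever the result is `wait` or `backoff`.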
+
+### Get Task Detail
+
+```bash
+curl 'http://localhost:9090/api/v1/tasks/org%2Frepo%2342'
+```
+
+### Update Task Status
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer my-token' \
+  -d '{"status": "running"}'
+```
+
+### Complete Task with Receipt
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-03",
+    "status": "completed",
+    "duration_seconds": 180,
+    "summary": "Implemented the feature as described",
+    "artifacts": [
+      {"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}
+    ],
+    "error": null
+  }'
+```
+
+### Submit Receipt
+
+```bash
+curl -X POST http://localhost:9090/api/v1/receipts \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-03",
+    "status": "completed",
+    "duration_seconds": 180,
+    "summary": "Done",
+    "artifacts": [],
+    "error": null
+  }'
+```
+
+### Retry a Failed Task
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/retry
+```
+
+Only works for tasks in `failed` or `agent_lost` status.
+
+### List Agents
+
+```bash
+curl 'http://localhost:9090/api/v1/agents?status=online&capability=code:rust'
+```
+
+### Deregister
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/deregister \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-03"}'
+```
+
+### Health Check
+
+```bash
+curl http://localhost:9090/healthz
+```
+
+---
+
+## Guidelines
+
+### Authentication
+
+- **http_pull endpoints** (`dequeue`, `status update`): require `Authorization: Bearer <token>` if `http_pull_token` is configured. If not configured, no auth is needed.
+- **All other endpoints**: no authentication required.
+- **Webhook endpoint**: requires an HMAC-SHA256 signature header.
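For testing the webhook endpoint by hand, the signature header value can be produced with `openssl`. This is a sketch under assumptions: the function name `sign_body` is hypothetical, and it assumes the Orchestrator expects the lowercase hex digest of the exact request bytes, prefixed with `sha256=`.

```shell
#!/bin/sh
# Print a webhook signature header value: "sha256=<hex HMAC-SHA256 digest>".
# $1 = webhook secret, $2 = exact request body (byte-for-byte what is sent)
sign_body() {
  # openssl prints "<label>(stdin)= <hex>"; awk keeps only the last field.
  printf 'sha256=%s\n' \
    "$(printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}')"
}
```

The result can be passed to curl as `-H "X-Hub-Signature-256: $(sign_body "$SECRET" "$BODY")"` together with `-d "$BODY"`. The body must be identical in both places, since any difference in whitespace changes the digest.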
+
+### Error Handling
+
+| Code | Meaning | Action |
+|------|---------|--------|
+| 401 | Unauthorized | Check that your Bearer token matches the configured `http_pull_token`. |
+| 404 | Not Found | Task may have been completed or never existed. Move on. |
+| 400 | Bad Request | Check the task status: the operation may not be valid for the current state (e.g. retrying a `running` task). |
+| 204 | No Content (dequeue) | No matching tasks available. Wait and retry. |
+| 500 | Server Error | Retry with exponential backoff. Report if persistent. |
+
+### Retry Strategy
+
+- Use exponential backoff for transient errors (network, 500s): 1s, 2s, 4s, 8s, max 30s.
+- Do not retry 400 errors. Fix your request instead.
+- For 204 on dequeue: poll again after a reasonable interval (e.g. 10–30 seconds).
+- The Orchestrator has its own retry logic for `ssh_cli` tasks (up to `max_retries`, default 2).
+
+### Task Status Flow
+
+```
+created → assigned → running → review_pending → completed
+                             ↘ failed
+                             ↘ agent_lost
+                             ↘ cancelled
+```
+
+- `failed` and `agent_lost` tasks can be retried via the retry endpoint.
+- `review_pending` means a PR was opened and is awaiting merge/review.
+- `completed` and `cancelled` are terminal states.
+
+### Heartbeat Requirements
+
+- Send heartbeats at least every `heartbeat_interval_secs` (default: 60s).
+- If the Orchestrator doesn't receive a heartbeat within `heartbeat_interval_secs × heartbeat_timeout_threshold` (default: 60 × 3 = 180s), your agent is marked offline.
+- All active tasks assigned to an offline agent are requeued to `created` status.
+
+---
+
+## Forgejo Workflow
+
+### Task Creation (Issue → Task)
+
+1. Open a Forgejo Issue with a label `agent:` (e.g. `agent:code`).
+2. The webhook creates a task with `task_id = {repo}#{issue_number}`.
+3. Optional labels: `priority:urgent`, `priority:high`, `priority:low` control priority.
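Since a task ID such as `org/repo#42` contains `/` and `#`, it has to be percent-encoded before it can appear in an API path or a branch name. The sketch below shows one way to do that in plain POSIX shell. The function names `make_task_id`, `urlencode`, and `task_branch` are hypothetical helpers, not part of the Orchestrator, and the encoder only handles ASCII input.

```shell
#!/bin/sh
# Build a task ID from a repository path and an issue number.
make_task_id() {
  printf '%s#%s\n' "$1" "$2"
}

# Percent-encode every character outside the RFC 3986 unreserved set.
urlencode() {
  ue_in="$1"
  ue_out=""
  while [ -n "$ue_in" ]; do
    ue_rest="${ue_in#?}"            # everything after the first character
    ue_char="${ue_in%"$ue_rest"}"   # the first character itself
    case "$ue_char" in
      [A-Za-z0-9._~-]) ue_out="$ue_out$ue_char" ;;
      *) ue_out="$ue_out$(printf '%%%02X' "'$ue_char")" ;;
    esac
    ue_in="$ue_rest"
  done
  printf '%s\n' "$ue_out"
}

# Derive the branch name for a task ID.
task_branch() {
  printf 'task/%s\n' "$(urlencode "$1")"
}
```

With these helpers, `urlencode "$(make_task_id org/repo 42)"` yields the path segment used in the curl examples, and `task_branch 'org/repo#42'` yields the matching branch name.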
+
+### Branch Naming
+
+- Branch: `task/{url_encoded_task_id}`
+- Example: `org/repo#42` → branch `task/org%2Frepo%2342`
+
+### PR Workflow
+
+1. Work on the `task/*` branch.
+2. Open a PR from that branch.
+3. Orchestrator receives `pull_request.opened` webhook → task goes to `review_pending`.
+4. Pushes to the branch update `last_activity_at`.
+5. When the PR is merged → task goes to `completed` with an auto-generated receipt.
+
+### For http_pull Agents
+
+After dequeuing a task, create the branch and PR yourself:
+
+```bash
+git checkout -b task/org%2Frepo%2342
+# ... do the work ...
+git push origin task/org%2Frepo%2342
+# Create PR via Forgejo API
+# The webhook will update the task automatically
+```
+
+### For ssh_cli Agents
+
+The Orchestrator passes the branch name in the structured prompt. Create the branch, push, and open the PR as part of your CLI execution. The webhooks handle status updates.
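The "Create PR via Forgejo API" step can be sketched with Forgejo's pull request creation endpoint (`POST /api/v1/repos/{owner}/{repo}/pulls`). This is an illustration under assumptions, not the project's tooling: `FORGEJO_URL`, `FORGEJO_TOKEN`, and both function names are hypothetical, the base branch must be supplied by the caller, and the payload builder does no JSON escaping, so the title must not contain quotes or backslashes.

```shell
#!/bin/sh
# Assumed environment variables (not defined by the source): FORGEJO_URL, FORGEJO_TOKEN.
FORGEJO_URL="${FORGEJO_URL:-https://git.example}"

# Build the JSON body for a pull request.
# $1 = head branch, $2 = base branch, $3 = title (no quotes or backslashes)
pr_payload() {
  printf '{"head":"%s","base":"%s","title":"%s"}\n' "$1" "$2" "$3"
}

# Open the pull request. $1 = owner/repo, $2 = head, $3 = base, $4 = title
create_pr() {
  curl -sS -X POST "$FORGEJO_URL/api/v1/repos/$1/pulls" \
    -H 'Content-Type: application/json' \
    -H "Authorization: token ${FORGEJO_TOKEN:-}" \
    -d "$(pr_payload "$2" "$3" "$4")"
}
```

For the example task, a call might look like `create_pr org/repo 'task/org%2Frepo%2342' main 'Resolve issue 42'`, after which the `pull_request.opened` webhook moves the task to `review_pending`.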