docs: add agent API reference, onboarding guide, and universal skill

- docs/agent-api-reference.md (473 lines): complete HTTP API reference for all 12 endpoints
- docs/agent-onboarding-guide.md (272 lines): ssh_cli and http_pull workflows, Forgejo integration
- skill/SKILL.md (281 lines): universal agent skill, platform-agnostic, curl-based examples

All content in English. No code changes.
2026-05-12 14:57:05 +08:00
commit d1a746a8cb
parent e39a16498c
9 changed files with 1250 additions and 0 deletions
--- a/docs/agent-api-reference.md
+++ b/docs/agent-api-reference.md
@@ -0,0 +1,473 @@
+# Agent Fleet — HTTP API Reference
+
+Base URL: `http://<host>:9090`
+Content-Type: `application/json` for all request/response bodies unless noted.
+All timestamps are ISO 8601 (RFC 3339).
+
+---
+
+## Authentication
+
+### http_pull Bearer Token
+
+Endpoints specific to `http_pull` agents require a Bearer token in the `Authorization` header. The token is set in `config.toml` as `orchestrator.http_pull_token`. If no token is configured, authentication is skipped (open mode).
+
+```
+Authorization: Bearer <http_pull_token>
+```
+
+Affected endpoints: `POST /api/v1/tasks/dequeue`, `POST /api/v1/tasks/{task_id}/status`.
+
+### Webhook HMAC-SHA256
+
+The `POST /api/v1/webhooks/forgejo` endpoint requires an `X-Hub-Signature-256` (or `X-Gitea-Signature` / `X-Forgejo-Signature`) header containing `sha256=<hex_hmac>` of the request body using the configured `webhook_secret`.
+
+```
+X-Hub-Signature-256: sha256=abcdef...
+```
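As a rough sketch of how a sender could produce that header value for a manual test: the helper below assumes `openssl` is installed and that the HMAC is computed over the exact bytes of the request body. The function name `webhook_signature` is hypothetical and not part of Agent Fleet.

```shell
# Hypothetical helper: build the signature header value for a webhook body.
# Assumption: the HMAC covers the raw request body exactly as it is sent.
# $1 = webhook_secret, $2 = raw request body
webhook_signature() {
  # openssl prints "<label>= <hex>"; the hex digest is the last field.
  digest=$(printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}')
  printf 'sha256=%s\n' "$digest"
}
```

The printed value goes into the `X-Hub-Signature-256` header, and the request must carry the identical body, since any change to the bytes changes the digest.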
+
+---
+
+## Error Responses
+
+All errors return JSON:
+
+```json
+{ "error": "<human-readable message>" }
+```
+
+| Status | Meaning | Trigger |
+|--------|---------|---------|
+| 400 | Bad Request | Invalid state transition, wrong execution_mode, malformed input |
+| 401 | Unauthorized | Missing or invalid Bearer token for http_pull endpoints |
+| 404 | Not Found | Task or agent does not exist |
+| 500 | Internal Server Error | Database failure, lock poisoning, unexpected errors |
+
+---
+
+## Endpoints
+
+### Health Check
+
+```
+GET /healthz
+```
+
+**Response:** `200 OK` — body: `ok`
+
+```bash
+curl http://localhost:9090/healthz
+```
+
+---
+
+### Register Agent
+
+```
+POST /api/v1/agents/register
+```
+
+Register a new agent or update an existing one (upsert by `agent_id`).
+
+**Request:**
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| agent_id | string | yes | Unique identifier |
+| agent_type | string | yes | `openclaw`, `claude-code`, `codex-cli`, `hermes`, `acp`, `shell`, or custom |
+| hostname | string | yes | Machine hostname |
+| capabilities | string[] | yes | e.g. `["code:rust", "review"]` |
+| max_concurrency | u32 | yes | Max parallel tasks |
+| metadata | object | no | Arbitrary key-value pairs |
+
+**Response:** `200 OK`
+
+```json
+{
+  "agent_id": "worker-01",
+  "registry_token": "registry_a1b2c3d4..."
+}
+```
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/register \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "agent_id": "worker-01",
+    "agent_type": "codex-cli",
+    "hostname": "host-worker-01",
+    "capabilities": ["code:rust"],
+    "max_concurrency": 2
+  }'
+```
+
+---
+
+### Heartbeat
+
+```
+POST /api/v1/agents/heartbeat
+```
+
+**Request:**
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| agent_id | string | yes | Agent to update |
+
+**Response:** `200 OK`
+
+```json
+{
+  "agent_id": "worker-01",
+  "status": "online",
+  "last_heartbeat_at": "2025-01-15T10:30:00Z"
+}
+```
+
+**Errors:** `404` if agent not found.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/heartbeat \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-01"}'
+```
+
+---
+
+### Deregister Agent
+
+```
+POST /api/v1/agents/deregister
+```
+
+Sets agent offline and requeues all its active tasks back to `created`.
+
+**Request:**
+
+| Field | Type | Required |
+|-------|------|----------|
+| agent_id | string | yes |
+
+**Response:** `200 OK`
+
+```json
+{
+  "agent_id": "worker-01",
+  "status": "offline",
+  "requeued_tasks": 3
+}
+```
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/deregister \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-01"}'
+```
+
+---
+
+### List Agents
+
+```
+GET /api/v1/agents
+```
+
+**Query Parameters:**
+
+| Param | Type | Description |
+|-------|------|-------------|
+| capability | string | Filter by capability (e.g. `code:rust`) |
+| status | string | Filter: `online`, `offline`, `draining` |
+
+**Response:** `200 OK` — JSON array of [Agent](#agent-object) objects.
+
+```bash
+curl 'http://localhost:9090/api/v1/agents?status=online'
+```
+
+---
+
+### List Tasks
+
+```
+GET /api/v1/tasks
+```
+
+**Query Parameters:**
+
+| Param | Type | Description |
+|-------|------|-------------|
+| status | string | Filter by status (e.g. `created`, `running`, `failed`) |
+| agent_id | string | Filter by assigned agent |
+
+**Response:** `200 OK` — JSON array of [Task](#task-object) objects. Ordered by `created_at` descending.
+
+```bash
+curl 'http://localhost:9090/api/v1/tasks?status=running'
+```
+
+---
+
+### Get Task
+
+```
+GET /api/v1/tasks/{task_id}
+```
+
+**Response:** `200 OK` — single [Task](#task-object) object.
+
+**Errors:** `404` if task not found.
+
+```bash
+curl http://localhost:9090/api/v1/tasks/org%2Frepo%2342
+```
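The path in the example is the task ID `org/repo#42` with `/` and `#` percent-encoded. A minimal POSIX shell sketch of that encoding follows; the function name `urlencode_task_id` is hypothetical, and encoding every byte outside the RFC 3986 unreserved set is an assumption inferred from the examples rather than a documented server requirement.

```shell
# Hypothetical helper: percent-encode a task ID for use in a URL path.
# Every byte outside A-Z a-z 0-9 - . _ ~ is written as %XX.
urlencode_task_id() {
  out=''
  # od emits one lowercase two-digit hex value per input byte.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$hex" in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # Unreserved byte: convert the hex value back to its character.
        out="$out$(printf "\\$(printf '%03o' "0x$hex")")" ;;
      *)
        # Anything else: keep the hex value, upper-cased, behind a %.
        out="$out%$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

Called as `urlencode_task_id 'org/repo#42'`, it yields the path segment used in the curl examples.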
+
+---
+
+### Dequeue Task (http_pull only)
+
+```
+POST /api/v1/tasks/dequeue
+```
+
+Requires Bearer token if `http_pull_token` is configured. Only returns tasks with `execution_mode = http_pull`.
+
+**Request:**
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| agent_id | string | yes | Agent claiming the task |
+| capabilities | string[] | no | Capabilities to match against task labels |
+
+**Response:** `200 OK` with [Task](#task-object) object, or `204 No Content` if no matching task.
+
+**Errors:** `401` if token required and missing/invalid.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/dequeue \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer my-token' \
+  -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}'
+```
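Because this endpoint answers with either `200` plus a Task or `204` with no body, a polling agent has to branch on the status code. The sketch below shows one way to do that with curl's `-w '%{http_code}'`. The function name, its argument order, and its return-code convention are assumptions made for illustration, and the `agent_id` is assumed to need no JSON escaping.

```shell
# Hypothetical helper: one dequeue attempt.
# $1 = base URL, $2 = agent_id, $3 = Bearer token (empty string in open mode)
# Returns 0 and prints the Task JSON on 200, returns 1 on 204 (no matching
# task), returns 2 on any other status. These return codes are this
# sketch's own convention, not part of the API.
dequeue_once() {
  base=$1 agent=$2 token=${3:-}
  body_file=$(mktemp)
  # Rebuild the positional parameters as the curl argument list.
  set -- -s -o "$body_file" -w '%{http_code}' -X POST \
    -H 'Content-Type: application/json' \
    -d "{\"agent_id\": \"$agent\"}" \
    "$base/api/v1/tasks/dequeue"
  # Only send the Authorization header when a token was supplied.
  if [ -n "$token" ]; then
    set -- "$@" -H "Authorization: Bearer $token"
  fi
  code=$(curl "$@")
  case "$code" in
    200) cat "$body_file"; rc=0 ;;
    204) rc=1 ;;
    *)   echo "dequeue failed: HTTP $code" >&2; rc=2 ;;
  esac
  rm -f "$body_file"
  return "$rc"
}
```

A caller might sleep and try again on return code 1, and treat return code 2 as a reason to check the token or the Orchestrator's health.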
+
+---
+
+### Update Task Status (http_pull only)
+
+```
+POST /api/v1/tasks/{task_id}/status
+```
+
+Requires Bearer token if `http_pull_token` is configured. Only works for tasks with `execution_mode = http_pull`.
+
+**Request:**
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| status | string | yes | Target status: `running`, `review_pending`, etc. |
+
+**Response:** `200 OK` — updated [Task](#task-object).
+
+**Errors:** `400` if task is not `http_pull` mode or transition is invalid. `404` if task not found.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer my-token' \
+  -d '{"status": "running"}'
+```
+
+---
+
+### Complete Task
+
+```
+POST /api/v1/tasks/{task_id}/complete
+```
+
+Works for both `ssh_cli` and `http_pull` tasks. Submit a receipt to mark the task done.
+
+**Request:** A [Receipt](#receipt-object) object.
+
+**Response:** `200 OK`
+
+```json
+{
+  "task_id": "org/repo#42",
+  "status": "completed"
+}
+```
+
+**Errors:** `404` if task not found. `400` if task is not in a completable state.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-01",
+    "status": "completed",
+    "duration_seconds": 120,
+    "summary": "Implemented feature X",
+    "artifacts": [
+      {"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/7"}
+    ],
+    "error": null
+  }'
+```
+
+---
+
+### Retry Task
+
+```
+POST /api/v1/tasks/{task_id}/retry
+```
+
+Retry a `failed` or `agent_lost` task. Transitions it back to `assigned`.
+
+**Response:** `200 OK` — updated [Task](#task-object).
+
+**Errors:** `400` if task status is not `failed` or `agent_lost`. `404` if task not found.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/retry
+```
+
+---
+
+### Submit Receipt
+
+```
+POST /api/v1/receipts
+```
+
+Submit a receipt for a task. Validates artifacts (e.g. checks PR exists via Forgejo API).
+
+**Request:** A [Receipt](#receipt-object) object.
+
+**Response:** `200 OK`
+
+**Errors:** `404` if task not found. `400` if validation fails.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/receipts \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-01",
+    "status": "completed",
+    "duration_seconds": 95,
+    "summary": "Fixed the bug",
+    "artifacts": [],
+    "error": null
+  }'
+```
+
+---
+
+### Forgejo Webhook
+
+```
+POST /api/v1/webhooks/forgejo
+```
+
+Receives Forgejo webhook events. Requires HMAC-SHA256 signature header.
+
+**Headers:** `X-Forgejo-Event` or `X-Gitea-Event` determines the event type.
+
+**Supported events:**
+
+| Event | Action |
+|-------|--------|
+| `issues` (opened) | Creates a task from the Issue (requires `agent:*` label) |
+| `pull_request` (opened) | Sets task to `review_pending` (branch name → task_id) |
+| `pull_request` (merged/closed with `merged: true`) | Sets task to `completed`, auto-generates receipt |
+| `push` (to `task/*` branch) | Updates `last_activity_at` on the task |
+
+**Response:** `200 OK`
+
+```json
+{
+  "accepted": true,
+  "task_id": "org/repo#42"
+}
+```
+
+**Errors:** `401` if signature invalid. `400` if payload unparseable.
+
+---
+
+## Object Schemas
+
+### Agent Object
+
+```json
+{
+  "agent_id": "worker-01",
+  "agent_type": "codex-cli",
+  "hostname": "host-worker-01",
+  "capabilities": ["code:rust"],
+  "max_concurrency": 2,
+  "current_tasks": 1,
+  "status": "online",
+  "last_heartbeat_at": "2025-01-15T10:30:00Z",
+  "registered_at": "2025-01-15T09:00:00Z",
+  "metadata": {}
+}
+```
+
+### Task Object
+
+```json
+{
+  "task_id": "org/repo#42",
+  "source": "forgejo:org/repo#42",
+  "task_type": "code",
+  "priority": "normal",
+  "status": "created",
+  "execution_mode": "ssh_cli",
+  "assigned_agent_id": null,
+  "assigned_host": null,
+  "requirements": "Implement the feature described in the issue body",
+  "labels": ["agent:code", "code:rust"],
+  "branch_name": "task/org%2Frepo%2342",
+  "pr_title": "feat: Implement feature (#42)",
+  "created_at": "2025-01-15T10:00:00Z",
+  "assigned_at": null,
+  "started_at": null,
+  "completed_at": null,
+  "last_activity_at": null,
+  "retry_count": 0,
+  "max_retries": 2,
+  "review_count": 0,
+  "timeout_seconds": 1800
+}
+```
+
+**Status values:** `created`, `assigned`, `running`, `review_pending`, `completed`, `failed`, `agent_lost`, `cancelled`
+
+**Priority values:** `low`, `normal`, `high`, `urgent`
+
+**Execution mode values:** `ssh_cli`, `http_pull`
+
+### Receipt Object
+
+```json
+{
+  "task_id": "org/repo#42",
+  "agent_id": "worker-01",
+  "status": "completed",
+  "duration_seconds": 120,
+  "summary": "Implemented the feature",
+  "artifacts": [
+    {"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/7", "path": null, "description": null}
+  ],
+  "error": null
+}
+```
+
+**Receipt status values:** `completed`, `failed`, `partial`
+
+**Artifact type values:** `pr`, `commit`, `file`, `comment`, `url`
--- a/docs/agent-onboarding-guide.md
+++ b/docs/agent-onboarding-guide.md
@@ -0,0 +1,272 @@
+# Agent Fleet — Agent Onboarding Guide
+
+This guide explains how to integrate an agent with the Agent Fleet Orchestrator.
+
+---
+
+## Execution Modes
+
+Agent Fleet supports two execution modes. The mode is set per-task at creation time (defaults to `ssh_cli`).
+
+| Aspect | `ssh_cli` | `http_pull` |
+|--------|-----------|-------------|
+| Who initiates? | Orchestrator (via SSH or local subprocess) | Agent (via HTTP API) |
+| Control flow | Orchestrator builds prompt, runs CLI, collects output | Agent decides when to dequeue and execute |
+| Agent requirements | CLI binary on a configured host | HTTP client, can call REST API |
+| Auth needed? | No (Orchestrator manages) | Yes (Bearer token) |
+| Best for | Codex CLI, Claude Code, OpenCode — agents with CLIs | OpenClaw/Jeeves, Hermes — agents with their own schedulers |
+| Task creation trigger | Forgejo Issue webhook (default) | Same, or API call |
+
+---
+
+## ssh_cli Workflow
+
+### 1. Configure a Host
+
+Add a `[[hosts]]` section to `config.toml` on the Orchestrator:
+
+```toml
+[[hosts]]
+host_id = "host-worker-01"
+hostname = "192.168.1.100"
+ssh_user = "deploy"
+ssh_port = 22
+ssh_key_path = "/home/deploy/.ssh/id_ed25519"
+work_dir = "/opt/agent-workspace"
+agents = [
+  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
+]
+```
+
+For local execution (same machine as Orchestrator), use `hostname = "localhost"` — the Orchestrator uses a local subprocess instead of SSH.
+
+### 2. Install the Agent CLI
+
+The CLI binary must be available on the target host in `$PATH`. The Orchestrator checks availability with `which <binary>`.
+
+Built-in CLI templates:
+
+| Agent Type | CLI Command |
+|------------|-------------|
+| `codex-cli` | `codex exec --json '{prompt}'` |
+| `claude-code` | `claude -p '{prompt}' --output-format json --dangerously-skip-permissions` |
+
+Custom templates can be defined in `config.toml` under `[adapters]`.
+
+### 3. Orchestrator Handles Everything
+
+When a Forgejo Issue with an `agent:*` label arrives:
+
+1. Orchestrator creates a task (`execution_mode = ssh_cli`)
+2. Dispatch loop picks the task, selects a host by capability + load
+3. SSH (or local subprocess) executes the CLI with a structured prompt
+4. Output is parsed (Codex JSON or Claude JSON format)
+5. Task status updates: `created` → `assigned` → `running` → `completed` (or `failed`)
+
+### 4. What the Agent Receives (Structured Prompt)
+
+The Orchestrator constructs this prompt and passes it as the `{prompt}` variable:
+
+```
+Task ID: org/repo#42
+Type: code
+Goal:
+Implement the feature described in the issue body
+
+Constraints:
+- Execution mode: ssh_cli
+- Labels: code:rust
+- Branch: task/org%2Frepo%2342
+- Expected output: JSON receipt
+
+Validation:
+- Run relevant tests if code changed
+- Summarize changes and artifacts
+```
+
+### 5. Expected CLI Output
+
+The CLI must output JSON to stdout. The format depends on the parser:
+
+**Codex JSON:**
+```json
+{"status": "completed", "summary": "done", "duration_seconds": 120, "artifacts": [{"artifact_type": "pr", "url": "https://..."}]}
+```
+
+**Claude JSON:**
+```json
+{"status": "completed", "summary": "done", "duration_seconds": 95, "error": null}
+```
+
+If output is not valid JSON, the task is marked `failed`.
+
+---
+
+## http_pull Workflow
+
+### 1. Register
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/register \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-03", "agent_type": "openclaw", "hostname": "arm0", "capabilities": ["code:rust"], "max_concurrency": 2}'
+```
+
+The response contains a `registry_token`; keep it for subsequent API calls. If `http_pull_token` is configured, authenticate the `http_pull` endpoints (dequeue and status update) with that shared token instead.
+
+### 2. Heartbeat (periodic)
+
+Send a heartbeat every N seconds (default interval: 60s). If the Orchestrator doesn't receive one within `heartbeat_interval_secs × heartbeat_timeout_threshold`, the agent is marked offline and its tasks are requeued.
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/heartbeat \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-03"}'
+```
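The offline deadline described above is a plain product of the two settings, and the heartbeat itself is a loop around the curl call. A small sketch of both follows. The function names are hypothetical, and the threshold value of 3 in the usage note is made up for illustration because the guide does not state a default.

```shell
# Seconds of silence after which the agent is marked offline:
# heartbeat_interval_secs x heartbeat_timeout_threshold.
# $1 = heartbeat_interval_secs, $2 = heartbeat_timeout_threshold
offline_deadline_secs() {
  echo $(( $1 * $2 ))
}

# Hypothetical loop: send a heartbeat every $3 seconds and stop as soon as a
# request fails (curl -f turns HTTP errors such as 404 into a failure).
# $1 = base URL, $2 = agent_id, $3 = interval in seconds
heartbeat_loop() {
  while curl -fsS -X POST "$1/api/v1/agents/heartbeat" \
      -H 'Content-Type: application/json' \
      -d "{\"agent_id\": \"$2\"}" >/dev/null; do
    sleep "$3"
  done
}
```

With the 60-second interval and an assumed threshold of 3, `offline_deadline_secs 60 3` gives the number of seconds an agent can stay silent before its tasks are requeued. When `heartbeat_loop` returns, re-registering is a reasonable next step, since a `404` means the Orchestrator no longer knows the agent.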
+
+### 3. Dequeue a Task
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/dequeue \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer <token>' \
+  -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}'
+```
+
+Returns `200 OK` with a Task object, or `204 No Content` if nothing available.
+
+Only tasks with `execution_mode = http_pull` are returned.
+
+### 4. Update Status While Working
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer <token>' \
+  -d '{"status": "running"}'
+```
+
+### 5. Complete the Task
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-03",
+    "status": "completed",
+    "duration_seconds": 180,
+    "summary": "Fixed the issue",
+    "artifacts": [{"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}],
+    "error": null
+  }'
+```
+
+Or use the receipts endpoint:
+
+```bash
+curl -X POST http://localhost:9090/api/v1/receipts \
+  -H 'Content-Type: application/json' \
+  -d '<same receipt body>'
+```
+
+### 6. Deregister When Done
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/deregister \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-03"}'
+```
+
+---
+
+## Forgejo Integration
+
+### How Issues Become Tasks
+
+1. A Forgejo Issue is opened with a label matching `agent:*` (e.g. `agent:code`)
+2. Forgejo sends an `issues` webhook to `POST /api/v1/webhooks/forgejo`
+3. The `agent:*` label value becomes `task_type` (e.g. `code`)
+4. Priority is inferred from labels: `priority:urgent`, `priority:high`, `priority:low` (default: `normal`)
+5. A task is created with:
+   - `task_id` = `{repo_full_name}#{issue_number}` (e.g. `org/repo#42`)
+   - `execution_mode` = `ssh_cli` (default for Forgejo-originated tasks)
+   - `branch_name` = `task/{url_encoded_task_id}` (e.g. `task/org%2Frepo%2342`)
+   - `pr_title` = `feat: {issue_title} (#{issue_number})`
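The mapping above can be sketched as a small shell function. This illustrates the documented rules and is not the Orchestrator's actual code: the function name `task_from_issue` is hypothetical, and the tie-breaking (first `agent:*` label wins, last `priority:*` label wins) is an assumption, since the guide does not say how multiple matching labels are resolved. Branch name and PR title are left out to keep the sketch short.

```shell
# Hypothetical helper mirroring the Issue -> Task mapping.
# $1 = repo full name, $2 = issue number, remaining args = issue labels
# Prints key=value lines; returns 1 when no agent:* label is present,
# which corresponds to the issue being ignored.
task_from_issue() {
  repo=$1 number=$2
  shift 2
  task_type=''
  priority=normal
  for label in "$@"; do
    case "$label" in
      agent:*)
        # Assumption: the first agent:* label decides the task type.
        if [ -z "$task_type" ]; then task_type=${label#agent:}; fi ;;
      priority:urgent|priority:high|priority:low)
        # Assumption: the last priority label wins.
        priority=${label#priority:} ;;
    esac
  done
  [ -n "$task_type" ] || return 1
  printf 'task_id=%s#%s\n' "$repo" "$number"
  printf 'task_type=%s\n' "$task_type"
  printf 'priority=%s\n' "$priority"
  printf 'execution_mode=ssh_cli\n'
}
```

For example, `task_from_issue org/repo 42 bug agent:code priority:high` describes a `code` task with ID `org/repo#42` and priority `high`.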
+
+### Branch Naming Convention
+
+- Branch: `task/{url_encoded_task_id}`
+- Example: task `org/repo#42` → branch `task/org%2Frepo%2342`
+
+### PR Lifecycle
+
+| Event | Effect |
+|-------|--------|
+| PR opened (branch = `task/*`) | Task → `review_pending` |
+| PR merged | Task → `completed`, auto receipt generated |
+| Push to `task/*` branch | Task `last_activity_at` updated |
+
+### Task Status Flow
+
+```
+created → assigned → running → review_pending → completed
+                               ↘ failed
+                  ↘ agent_lost
+         ↘ cancelled
+```
+
+Any `failed` or `agent_lost` task can be retried via `POST /api/v1/tasks/{task_id}/retry` (transitions to `assigned`). Retry is limited by `max_retries` (default: 2).
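A client can check the retry rule locally before calling the endpoint. The sketch below assumes the limit is enforced as `retry_count < max_retries`, which is how the FAQ describes automatic retries; whether the retry endpoint applies exactly the same comparison is not confirmed here. The function name `can_retry` is hypothetical.

```shell
# Hypothetical client-side pre-check before POST /api/v1/tasks/{task_id}/retry.
# $1 = task status, $2 = retry_count, $3 = max_retries
# Succeeds (exit 0) only for failed/agent_lost tasks with retries left.
can_retry() {
  case "$1" in
    failed|agent_lost) [ "$2" -lt "$3" ] ;;
    *) return 1 ;;
  esac
}
```

With the default `max_retries` of 2, `can_retry failed 1 2` succeeds while `can_retry failed 2 2` and `can_retry running 0 2` do not.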
+
+---
+
+## Structured Prompt Format (ssh_cli)
+
+When the Orchestrator executes an agent via SSH, it constructs a structured prompt:
+
+```
+Task ID: <task_id>
+Type: <task_type>
+Goal:
+<requirements>
+
+Constraints:
+- Execution mode: ssh_cli
+- Labels: <comma-separated labels or <none>>
+- Branch: <branch_name>
+- Expected output: JSON receipt
+
+Validation:
+- Run relevant tests if code changed
+- Summarize changes and artifacts
+```
+
+The prompt is injected into the CLI template as the `{prompt}` variable. Other available variables: `{work_dir}`, `{task_id}`, `{branch}`.
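To make the field-to-line mapping concrete, here is a sketch that renders the layout above from task fields. It is an illustration only: the function name `render_prompt` and its argument order are assumptions, and the Orchestrator's real renderer may differ in details such as which labels it lists.

```shell
# Hypothetical renderer for the structured prompt layout.
# $1 = task_id, $2 = task_type, $3 = requirements,
# $4 = comma-separated labels (may be empty), $5 = branch_name
render_prompt() {
  labels=$4
  # An empty label list is shown as the literal <none>.
  [ -n "$labels" ] || labels='<none>'
  cat <<EOF
Task ID: $1
Type: $2
Goal:
$3

Constraints:
- Execution mode: ssh_cli
- Labels: $labels
- Branch: $5
- Expected output: JSON receipt

Validation:
- Run relevant tests if code changed
- Summarize changes and artifacts
EOF
}
```

Note that the rendered text spans several lines and may contain quotes, so a custom CLI template has to quote `{prompt}` carefully.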
+
+---
+
+## FAQ
+
+**Q: How do I know which execution mode to use?**
+A: If you have a CLI binary and run on a configured host → `ssh_cli`. If you have your own scheduler or run outside configured hosts → `http_pull`.
+
+**Q: Do I need to register for ssh_cli mode?**
+A: No. The Orchestrator manages ssh_cli tasks entirely. Registration is only for `http_pull` agents.
+
+**Q: What happens if my agent crashes during ssh_cli execution?**
+A: The task is marked `failed`. If `retry_count < max_retries`, the dispatch loop will retry automatically.
+
+**Q: What happens if my http_pull agent stops sending heartbeats?**
+A: After `heartbeat_interval_secs × heartbeat_timeout_threshold` seconds, the agent is marked offline and all its tasks are requeued with status `created`.
+
+**Q: Can a task switch between execution modes?**
+A: No. The `execution_mode` is set at creation time and cannot be changed.
+
+**Q: How do I create a task manually?**
+A: Use the Forgejo webhook flow (open an Issue with `agent:*` label), or directly insert into the database. There is no public "create task" API endpoint.
+
+**Q: What label format triggers task creation?**
+A: Issues must have a label starting with `agent:` (e.g. `agent:code`, `agent:review`). The value after `agent:` becomes the task type. Issues without such a label are ignored.
+
+**Q: How does the review loop work?**
+A: When a PR is opened (not merged), the task goes to `review_pending`. If the PR is not merged and the review cycle count exceeds `max_retries`, the task is marked `failed`. For `ssh_cli`, the Orchestrator re-dispatches automatically.
--- a/openspec/changes/agent-onboarding-docs/.openspec.yaml
+++ b/openspec/changes/agent-onboarding-docs/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-05-12
--- a/openspec/changes/agent-onboarding-docs/design.md
+++ b/openspec/changes/agent-onboarding-docs/design.md
@@ -0,0 +1,65 @@
+## Context
+
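+The core functionality of agent-fleet is implemented and running on arm0, but no Agent knows how to use it yet. The project's usability depends entirely on whether Agents can integrate with it correctly.
+
+Two deliverables are needed:
+1. **API reference**: an HTTP API manual written for Agents
+2. **Universal Skill**: a capability description that follows the standard skill specification and is not tied to any particular platform
+
+Key constraint: the Skill must be platform-agnostic. The Team Leader role is not necessarily filled by OpenClaw; Codex, Claude Code, OpenCode, or Hermes Agent may each act as the scheduler.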
+
+## Goals / Non-Goals
+
+**Goals:**
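+- Provide a complete, accurate, ready-to-use API reference
+- Provide a universal Skill so that any Agent that loads it knows how to interact with agent-fleet
+- Cover the complete workflows for both execution modes (ssh_cli + http_pull)
+- Cover the Git workflow of the Forgejo integration
+
+**Non-Goals:**
+- No operations documentation for humans (deployment, configuration, troubleshooting) → that belongs to a separate change
+- No platform-specific integration scripts (e.g. an install script for an OpenClaw skill)
+- No SDK or client library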
+
+## Decisions
+
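+### Decision 1: Universal Skill specification, not tied to a platform
+
+**Choice**: The Skill uses the standard YAML frontmatter + Markdown body format
+
+**Rationale**:
+- All mainstream Agent platforms support this format (OpenClaw, Claude Code, Codex CLI, OpenCode)
+- It contains no platform-specific syntax; each Agent converts it as needed
+- curl is a lingua franca that every Agent can understand
+
+**Alternatives**:
+- An OpenClaw-specific skill: limits the scope of use
+- A separate skill per platform: duplicated effort and prone to inconsistency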
+
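+### Decision 2: Documentation lives in the repo
+
+**Choice**: The `docs/` directory holds the API reference and onboarding guide; the `skill/` directory holds SKILL.md
+
+**Rationale**:
+- Same repository as the code, so versions stay in sync
+- Agents can read the documentation directly through Forgejo
+- The Skill can be forked or symlinked by each platform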
+
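+### Decision 3: Documentation auto-generated from code + manual additions
+
+**Choice**: The API endpoint list is maintained by hand (Phase 1); auto-generation from code comments will be considered later
+
+**Rationale**:
+- Phase 1 has a limited number of endpoints (~12), so manual maintenance is cheap
+- Auto-generation needs an extra toolchain (e.g. `utoipa`), which is not worth the investment in Phase 1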
+
+## Risks / Trade-offs
+
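+- **[Stale documentation] Docs may drift out of sync after code changes** → docs live in the same repo as the code and are checked during PR review
+- **[Limits of a universal Skill] Being universal means platform features cannot be exploited** → universal is the right choice; platform-specific optimizations are left to each Agent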
+
+## Open Questions
+
+_(resolved)_
+
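+- ~~Does the Skill need multiple language versions (Chinese/English)?~~ → English only. Reason: LLM training corpora are predominantly English, and English is more token-efficient with less semantic ambiguity. The Skill's audience is Agents, not humans.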
--- a/openspec/changes/agent-onboarding-docs/proposal.md
+++ b/openspec/changes/agent-onboarding-docs/proposal.md
@@ -0,0 +1,37 @@
+## Why
+
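+All core functionality of agent-fleet (dual execution model, Forgejo integration, Receipt validation) is implemented and working end to end on arm0, but no Agent knows how to use it.
+
+Current state:
+- The API endpoints are implemented (register, heartbeat, dequeue, status, receipt, webhook, etc.)
+- Both execution modes (ssh_cli + http_pull) are implemented
+- But no documentation tells Agents how to onboard, how to call the API, or how to fit into the workflow
+
+The project's usability depends entirely on whether Agents can integrate correctly. Without documentation and a skill, agent-fleet is just an API that nobody knows how to use.
+
+What is needed, moreover, is a **universal skill** (not tied to OpenClaw), because:
+- The Team Leader role is not necessarily filled by OpenClaw
+- Codex, Claude Code, OpenCode, Hermes Agent, and others all need to be able to understand and use agent-fleet
+- A Skill is a general-purpose description of Agent capabilities that follows a common specification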
+
+## What Changes
+
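+- Add `docs/agent-api-reference.md`: a complete HTTP API reference that any Agent can read
+- Add `docs/agent-onboarding-guide.md`: an Agent onboarding guide covering the complete workflows for both modes
+- Add the `skill/` directory: a universal Agent Skill definition (SKILL.md) that follows the common skill specification
+- Skill contents: how to call the API, authentication, task lifecycle, Forgejo workflow, error handling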
+
+## Capabilities
+
+### New Capabilities
+- `agent-api-reference`: complete HTTP API reference (endpoints, request/response formats, error codes, examples)
+- `agent-skill`: universal Agent Skill definition describing how an Agent interacts with agent-fleet
+
+### Modified Capabilities
+_(none)_
+
+## Impact
+
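+- **Documentation**: 2 new Markdown documents + 1 Skill definition
+- **Code**: no code changes
+- **Project**: the Skill directory is a new structure; where it should live in the repo may need consideration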
--- a/openspec/changes/agent-onboarding-docs/specs/agent-api-reference/spec.md
+++ b/openspec/changes/agent-onboarding-docs/specs/agent-api-reference/spec.md
@@ -0,0 +1,43 @@
+## ADDED Requirements
+
+### Requirement: Complete HTTP API reference documentation
+The project SHALL provide a complete HTTP API reference (`docs/agent-api-reference.md`) that any Agent can read. The document SHALL cover all public endpoints, including request/response formats, error codes, and examples.
+
+#### Scenario: Agent reads API reference to understand available endpoints
+- **WHEN** an Agent reads `docs/agent-api-reference.md`
+- **THEN** the document SHALL list all endpoints: healthz, agents/register, agents/heartbeat, agents/deregister, agents (GET), tasks (GET), tasks/{id} (GET), tasks/dequeue, tasks/{id}/status, tasks/{id}/retry, tasks/{id}/complete, receipts, webhooks/forgejo
+- **AND** each endpoint SHALL include: HTTP method, URL, request body format, response format, error codes, and a curl example
+
+#### Scenario: Agent checks authentication requirements
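+- **WHEN** an Agent consults the authentication section of the API reference
+- **THEN** the document SHALL state that http_pull mode requires a Bearer token (obtained at registration), ssh_cli mode requires no Agent authentication, and the webhook endpoint requires an HMAC-SHA256 signature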
+
+#### Scenario: Agent understands error responses
+- **WHEN** an Agent receives an error response
+- **THEN** the document SHALL list all error codes: 401 Unauthorized, 403 Forbidden, 404 Not Found, 400 Bad Request, 500 Internal Server Error
+- **AND** each error code SHALL include a description of what triggers it
+
+### Requirement: Agent onboarding guide
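+The project SHALL provide an Agent onboarding guide (`docs/agent-onboarding-guide.md`) that describes the complete workflows for both execution modes.
+
+#### Scenario: New agent team leader reads onboarding guide
+- **WHEN** a new Team Leader Agent (e.g. Jeeves) reads the onboarding guide
+- **THEN** the document SHALL describe the differences between the two execution modes and when to use each:
+  - ssh_cli: the Orchestrator dispatches work; suited to Agents that have a CLI, such as Codex, Claude Code, and OpenCode
+  - http_pull: the Agent pulls work itself; suited to Agents that have their own scheduler, such as OpenClaw/Jeeves and Hermes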
+
+#### Scenario: Agent follows ssh_cli workflow
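+- **WHEN** an Agent onboards in ssh_cli mode
+- **THEN** the document SHALL describe the full flow: configure the host → install the Agent CLI → Orchestrator auto-discovery → automatic task assignment and execution → PR creation → webhook callback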
+
+#### Scenario: Agent follows http_pull workflow
+- **WHEN** an Agent onboards in http_pull mode
+- **THEN** the document SHALL describe the full flow: call the register API → obtain a token → send periodic heartbeats → call dequeue to pull a task → execute → call the complete/receipt API
+
+#### Scenario: Agent understands Forgejo integration
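+- **WHEN** an Agent reads the Forgejo integration section
+- **THEN** the document SHALL describe how an Issue becomes a task (webhook → label parsing), how a task is linked to a Git branch (`task/{task_id}`), and how the PR lifecycle drives status updates (opened → review_pending, merged → completed)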
+
+#### Scenario: Agent understands structured prompt format
+- **WHEN** an Agent in ssh_cli mode needs to understand the prompt it receives
+- **THEN** the document SHALL describe the structured prompt format: Task ID, Type, Goal, Constraints, Branch, Expected output, Validation
--- a/openspec/changes/agent-onboarding-docs/specs/agent-skill/spec.md
+++ b/openspec/changes/agent-onboarding-docs/specs/agent-skill/spec.md
@@ -0,0 +1,41 @@
+## ADDED Requirements
+
+### Requirement: Universal Agent Skill definition
+The project SHALL provide a universal Agent Skill (`skill/SKILL.md`) that follows the standard skill specification (YAML frontmatter + Markdown body). The Skill SHALL NOT be tied to any specific Agent platform (OpenClaw, Claude Code, Codex, OpenCode, Hermes, and others can all use it).
+
+#### Scenario: Any agent discovers and loads the skill
+- **WHEN** any Agent (Codex, Claude Code, OpenCode, Hermes, etc.) loads skill/SKILL.md
+- **THEN** the Skill SHALL contain YAML frontmatter: `name: agent-fleet-integration`, with a `description` stating its purpose and trigger conditions
+- **AND** the Skill body SHALL use standard Markdown (headings, code blocks, examples)
+
+#### Scenario: Skill teaches agent how to interact with agent-fleet
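+- **WHEN** an Agent reads the Skill
+- **THEN** the Skill SHALL contain a Quick Start section (the simplest onboarding example, in at most 3 steps)
+- **AND** an Instructions section (the detailed API call flow)
+- **AND** an Examples section (a curl example for each operation)
+- **AND** a Guidelines section (error handling, retry strategy, authentication rules)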
+
+#### Scenario: Skill covers both execution modes
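+- **WHEN** an Agent needs to choose an execution mode
+- **THEN** the Skill SHALL clearly explain the difference between ssh_cli and http_pull
+- **AND** guide the Agent in deciding which mode it should use:
+  - Has a CLI and runs on a configured host → ssh_cli (dispatched by the Orchestrator)
+  - Has its own scheduler or runs outside the configured hosts → http_pull (pulls work itself)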
+
+#### Scenario: Skill includes Forgejo workflow
+- **WHEN** an Agent needs to understand the Git workflow
+- **THEN** the Skill SHALL describe the branch naming convention (`task/{task_id}`), the PR creation flow, and the webhook trigger mechanism
+
+#### Scenario: Skill includes error recovery guidance
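+- **WHEN** an Agent encounters an API error
+- **THEN** the Skill SHALL provide handling guidance for common errors:
+  - 401 → check the token; re-register if necessary
+  - 404 → the task may already be completed or may not exist
+  - 409/400 → check whether the task's status allows the operation
+  - Network error → retry (exponential backoff)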
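One possible shape for the exponential-backoff retry named in the last item is sketched below. The function name, its arguments, and the doubling factor are assumptions for illustration; the spec does not prescribe delays or attempt counts.

```shell
# Hypothetical retry wrapper with exponential backoff.
# $1 = max attempts, $2 = initial delay in seconds, remaining args = command
# Runs the command until it succeeds, doubling the delay after each failure;
# returns 1 once the attempt budget is used up.
retry_with_backoff() {
  attempts=$1 delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then return 1; fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
}
```

When the wrapped command is curl without `-f`, HTTP error responses such as 401 or 404 still count as success at the process level, so only transport-level failures are retried, which matches the "network error" case.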
+
+#### Scenario: Skill is portable across agent platforms
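+- **WHEN** the Skill is used by Agents on different platforms
+- **THEN** the Skill SHALL NOT contain any platform-specific syntax or directives (e.g. OpenClaw's `sessions_send`, Claude Code's `hooks`)
+- **AND** all interactions SHALL be described as standard HTTP requests (in curl form)
+- **AND** an Agent can translate the curl commands into whatever HTTP call mechanism it supports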
--- a/openspec/changes/agent-onboarding-docs/tasks.md
+++ b/openspec/changes/agent-onboarding-docs/tasks.md
@@ -0,0 +1,36 @@
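+## 1. API Reference
+
+- [ ] 1.1 Create `docs/agent-api-reference.md`
+- [ ] 1.2 List all public endpoints (~12), each with: HTTP method, URL, request body, response body, error codes, curl example
+- [ ] 1.3 Authentication section: http_pull token, webhook HMAC-SHA256 signature
+- [ ] 1.4 Error code summary: 401/403/404/400/500, each with its trigger scenario
+- [ ] 1.5 General notes: base_url, Content-Type, character encoding, pagination (if any)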
+
+## 2. Agent Onboarding Guide
+
+- [ ] 2.1 Create `docs/agent-onboarding-guide.md`
+- [ ] 2.2 Comparison table of the two execution modes (ssh_cli vs http_pull)
+- [ ] 2.3 Complete ssh_cli workflow: configure host → install CLI → automatic dispatch → PR workflow
+- [ ] 2.4 Complete http_pull workflow: register → heartbeat → dequeue → execute → complete/receipt
+- [ ] 2.5 Forgejo integration: Issue → Task, branch naming, PR lifecycle
+- [ ] 2.6 Structured prompt format (the structure of the prompt an Agent receives in ssh_cli mode)
+- [ ] 2.7 FAQ
+
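+## 3. Universal Agent Skill
+
+- [ ] 3.1 Create `skill/SKILL.md` (YAML frontmatter + Markdown body)
+- [ ] 3.2 Quick Start: minimal onboarding example (at most 3 steps)
+- [ ] 3.3 Instructions: detailed API call flow (register → heartbeat → dequeue → execute → complete)
+- [ ] 3.4 Examples: a curl example for each operation
+- [ ] 3.5 Guidelines: error handling, retry strategy, authentication rules
+- [ ] 3.6 Execution mode selection guide: how an Agent decides between ssh_cli and http_pull
+- [ ] 3.7 Forgejo workflow (branch naming, PR creation, webhook triggers)
+- [ ] 3.8 Verification: Skill content is consistent with the API reference; curl examples are executable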
+
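+## 4. Verification
+
+- [ ] 4.1 API reference covers all implemented endpoints
+- [ ] 4.2 curl examples are executable against the arm0 instance
+- [ ] 4.3 Skill format conforms to the standard specification (YAML frontmatter + Markdown body)
+- [ ] 4.4 Skill contains no platform-specific syntax
+- [ ] 4.5 Onboarding guide is consistent with the current code implementation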
--- a/skill/SKILL.md
+++ b/skill/SKILL.md
@@ -0,0 +1,281 @@
+---
+name: agent-fleet-integration
+description: |
+  Interact with the Agent Fleet Orchestrator. Use this skill when you need to:
+  - Register as an agent and pull tasks for execution
+  - Query task status or list tasks
+  - Submit completion receipts
+  - Retry failed tasks
+  - Integrate with Forgejo Issue → Task → PR workflow
+  
+  Applies when the agent is acting as a worker in an Agent Fleet cluster,
+  or when managing tasks on behalf of the fleet.
+---
+
+# Agent Fleet Integration Skill
+
+## Quick Start (http_pull mode)
+
+**Step 1.** Register your agent:
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/register \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id":"my-agent","agent_type":"openclaw","hostname":"myhost","capabilities":["code:rust"],"max_concurrency":2}'
+```
+
+**Step 2.** Pull and execute a task:
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/dequeue \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer <token>' \
+  -d '{"agent_id":"my-agent","capabilities":["code:rust"]}'
+```
+
+**Step 3.** Submit your result:
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/<task_id>/complete \
+  -H 'Content-Type: application/json' \
+  -d '{"task_id":"<task_id>","agent_id":"my-agent","status":"completed","duration_seconds":60,"summary":"done","artifacts":[],"error":null}'
+```
+
+---
+
+## Choosing Your Execution Mode
+
+| If you... | Use this mode |
+|-----------|---------------|
+| Have a CLI binary installed on a configured host | `ssh_cli` — Orchestrator calls you |
+| Have your own scheduler or run outside configured hosts | `http_pull` — You call the API |
+
+- `ssh_cli` agents do **not** need to call any API. The Orchestrator handles everything via SSH or local subprocess.
+- `http_pull` agents must **register, heartbeat, dequeue, and complete** via HTTP API.
+
+---
+
+## Instructions
+
+### http_pull Agent Lifecycle
+
+```
+Register → Heartbeat (loop) → Dequeue → Execute → Complete → Deregister
+```
+
+1. **Register** once at startup via `POST /api/v1/agents/register`.
+2. **Heartbeat** periodically (every 60s recommended) via `POST /api/v1/agents/heartbeat`. Without heartbeats, you will be marked offline and your tasks requeued.
+3. **Dequeue** when ready for work via `POST /api/v1/tasks/dequeue`. Returns a Task or 204 No Content.
+4. **Update status** to `running` via `POST /api/v1/tasks/{task_id}/status`.
+5. **Complete** the task via `POST /api/v1/tasks/{task_id}/complete` with a Receipt.
+6. **Deregister** when shutting down via `POST /api/v1/agents/deregister`.
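+
+The heartbeat and deregister steps above can be wrapped in small shell helpers. This is a minimal sketch, not part of Agent Fleet: the function names and the `FLEET_URL`, `AGENT_ID`, and `HEARTBEAT_SECS` variables are hypothetical, and the defaults simply mirror the values used elsewhere in this document.
+
+```bash
+# Assumed settings (hypothetical names); override via environment variables.
+FLEET_URL="${FLEET_URL:-http://localhost:9090}"
+AGENT_ID="${AGENT_ID:-my-agent}"
+
+# POST a JSON body ($2) to an API path ($1) and print the response.
+fleet_post() {
+  curl -sS -X POST "$FLEET_URL$1" \
+    -H 'Content-Type: application/json' \
+    -d "$2"
+}
+
+heartbeat()  { fleet_post /api/v1/agents/heartbeat  "{\"agent_id\":\"$AGENT_ID\"}"; }
+deregister() { fleet_post /api/v1/agents/deregister "{\"agent_id\":\"$AGENT_ID\"}"; }
+
+# Send one heartbeat per interval until the process is stopped.
+heartbeat_loop() {
+  while :; do
+    heartbeat
+    sleep "${HEARTBEAT_SECS:-60}"
+  done
+}
+```
+
+A worker might start `heartbeat_loop &` right after registering and set `trap deregister EXIT` so that it deregisters on shutdown; stopping the background loop is left to the caller.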
+
+### ssh_cli Agent Notes
+
+No API interaction required. Ensure:
+- Your CLI binary is in `$PATH` on the configured host.
+- Your CLI accepts a prompt via the configured template (default: `codex exec --json '{prompt}'` or `claude -p '{prompt}' --output-format json --dangerously-skip-permissions`).
+- Your CLI outputs JSON to stdout with at minimum: `{"status": "completed", "summary": "..."}`.
+
+---
+
+## Examples
+
+### Register
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/register \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "agent_id": "worker-03",
+    "agent_type": "openclaw",
+    "hostname": "arm0",
+    "capabilities": ["code:rust", "review"],
+    "max_concurrency": 2,
+    "metadata": {"version": "1.0"}
+  }'
+```
+
+### Heartbeat
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/heartbeat \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-03"}'
+```
+
+### List Available Tasks
+
+```bash
+curl 'http://localhost:9090/api/v1/tasks?status=created'
+```
+
+### Dequeue
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/dequeue \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer my-token' \
+  -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}'
+```
+
+Returns 200 with Task JSON, or 204 if no matching task.
+
+### Get Task Detail
+
+```bash
+curl 'http://localhost:9090/api/v1/tasks/org%2Frepo%2342'
+```
+
+### Update Task Status
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer my-token' \
+  -d '{"status": "running"}'
+```
+
+### Complete Task with Receipt
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-03",
+    "status": "completed",
+    "duration_seconds": 180,
+    "summary": "Implemented the feature as described",
+    "artifacts": [
+      {"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}
+    ],
+    "error": null
+  }'
+```
+
+### Submit Receipt
+
+```bash
+curl -X POST http://localhost:9090/api/v1/receipts \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "task_id": "org/repo#42",
+    "agent_id": "worker-03",
+    "status": "completed",
+    "duration_seconds": 180,
+    "summary": "Done",
+    "artifacts": [],
+    "error": null
+  }'
+```
+
+### Retry a Failed Task
+
+```bash
+curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/retry
+```
+
+Only works for tasks in `failed` or `agent_lost` status.
+
+### List Agents
+
+```bash
+curl 'http://localhost:9090/api/v1/agents?status=online&capability=code:rust'
+```
+
+### Deregister
+
+```bash
+curl -X POST http://localhost:9090/api/v1/agents/deregister \
+  -H 'Content-Type: application/json' \
+  -d '{"agent_id": "worker-03"}'
+```
+
+### Health Check
+
+```bash
+curl http://localhost:9090/healthz
+```
+
+---
+
+## Guidelines
+
+### Authentication
+
+- **http_pull endpoints** (`POST /api/v1/tasks/dequeue`, `POST /api/v1/tasks/{task_id}/status`): require `Authorization: Bearer <token>` if `http_pull_token` is configured. If it is not configured, no auth is needed.
+- **Webhook endpoint**: requires an HMAC-SHA256 signature header.
+- **All other endpoints**: no authentication required.
+
+### Error Handling
+
+| Code | Meaning | Action |
+|------|---------|--------|
+| 401 | Unauthorized | Check that your Bearer token is present and matches the `http_pull_token` configured on the Orchestrator. |
+| 404 | Not Found | The task or agent does not exist. Check the ID and its URL encoding, then move on. |
+| 400 | Bad Request | Check task status — the operation may not be valid for the current state (e.g. retrying a `running` task). |
+| 204 | No Content (dequeue) | No matching tasks available. Wait and retry. |
+| 500 | Server Error | Retry with exponential backoff. Report if persistent. |
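+
+The table above can be turned into a small dispatch helper. This is a sketch only: `next_action` and the action names it prints are hypothetical, and treating every 5xx code like 500 is an assumption.
+
+```bash
+# Hypothetical helper: map an HTTP status code ($1) to the action
+# suggested in the error-handling table.
+next_action() {
+  case "$1" in
+    200) echo "proceed" ;;
+    204) echo "poll_later" ;;    # dequeue: no matching task
+    400) echo "check_state" ;;   # do not retry unchanged
+    401) echo "check_token" ;;
+    404) echo "skip" ;;
+    5??) echo "backoff" ;;       # assumption: all 5xx handled like 500
+    *)   echo "unknown" ;;
+  esac
+}
+
+next_action 204   # prints poll_later
+```
+
+With curl, the status code of a request can be captured by adding `-o /dev/null -w '%{http_code}'` and passing the result to the helper.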
+
+### Retry Strategy
+
+- Use exponential backoff for transient errors (network, 500s): 1s, 2s, 4s, 8s, max 30s.
+- Do not retry 400 errors — fix your request.
+- For 204 on dequeue (no matching task): poll again after a reasonable interval (e.g. 10–30 seconds).
+- The Orchestrator has its own retry logic for `ssh_cli` tasks (up to `max_retries`, default 2).
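+
+The backoff schedule for transient errors can be computed with a short helper. This is a sketch under stated assumptions: `backoff_delay` is a hypothetical name, and continuing the doubling past 8s (to 16s) before hitting the 30s cap is an interpretation of the schedule above.
+
+```bash
+# Hypothetical helper: delay in seconds before retry number $1
+# (0-based), doubling from 1s and capped at 30s.
+backoff_delay() {
+  local attempt="$1" delay=1 i=0
+  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 30 ]; do
+    delay=$((delay * 2))
+    i=$((i + 1))
+  done
+  if [ "$delay" -gt 30 ]; then
+    delay=30
+  fi
+  echo "$delay"
+}
+
+# Prints waits of 1, 2, 4, 8, 16, and 30 seconds.
+for attempt in 0 1 2 3 4 5; do
+  echo "attempt $attempt: wait $(backoff_delay "$attempt")s"
+done
+```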
+
+### Task Status Flow
+
+```
+created → assigned → running → review_pending → completed
+                               ↘ failed
+                  ↘ agent_lost
+         ↘ cancelled
+```
+
+- `failed` and `agent_lost` tasks can be retried via the retry endpoint.
+- `review_pending` means a PR was opened and is awaiting merge/review.
+- `completed` and `cancelled` are terminal states.
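+
+An agent script can encode these status rules as simple predicates before calling the retry endpoint. The helper names `is_retryable` and `is_terminal` are hypothetical; the status values are the ones listed above.
+
+```bash
+# Hypothetical helper: succeeds if the retry endpoint accepts this status.
+is_retryable() {
+  case "$1" in
+    failed|agent_lost) return 0 ;;
+    *) return 1 ;;
+  esac
+}
+
+# Hypothetical helper: succeeds if no further transitions are expected.
+is_terminal() {
+  case "$1" in
+    completed|cancelled) return 0 ;;
+    *) return 1 ;;
+  esac
+}
+
+status="agent_lost"
+if is_retryable "$status"; then
+  echo "$status: can be retried"
+fi
+```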
+
+### Heartbeat Requirements
+
+- Send heartbeats at least every `heartbeat_interval_secs` (default: 60s).
+- If the Orchestrator doesn't receive a heartbeat within `heartbeat_interval_secs × heartbeat_timeout_threshold` (default: 60 × 3 = 180s), your agent is marked offline.
+- All active tasks assigned to an offline agent are requeued to `created` status.
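+
+The timeout arithmetic can be checked with a one-line helper. This is a sketch: `heartbeat_expired` is a hypothetical name, the defaults are the documented 60 and 3, and treating exactly 180s as still online is an assumption about the boundary.
+
+```bash
+# Hypothetical helper: succeeds if a heartbeat that is $1 seconds old
+# exceeds interval ($2, default 60) x threshold ($3, default 3).
+heartbeat_expired() {
+  local age="$1" interval="${2:-60}" threshold="${3:-3}"
+  [ "$age" -gt $((interval * threshold)) ]
+}
+
+if heartbeat_expired 200; then
+  echo "agent would be marked offline"
+fi
+```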
+
+---
+
+## Forgejo Workflow
+
+### Task Creation (Issue → Task)
+
+1. Open a Forgejo Issue with a label `agent:<type>` (e.g. `agent:code`).
+2. The webhook creates a task with `task_id = {repo}#{issue_number}`.
+3. Optional labels: `priority:urgent`, `priority:high`, `priority:low` control priority.
+
+### Branch Naming
+
+- Branch: `task/{url_encoded_task_id}`
+- Example: `org/repo#42` → branch `task/org%2Frepo%2342`
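+
+The encoding in the example above can be reproduced in plain shell. This is a minimal sketch assuming ASCII task IDs; `urlencode_task_id` is a hypothetical helper name, not part of Agent Fleet.
+
+```bash
+# Hypothetical helper: percent-encode every character outside the
+# RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~). Assumes ASCII input.
+urlencode_task_id() {
+  local rest="$1" out="" c
+  while [ -n "$rest" ]; do
+    c="${rest%"${rest#?}"}"   # first character of the remainder
+    rest="${rest#?}"          # drop that character
+    case "$c" in
+      [A-Za-z0-9._~-]) out="$out$c" ;;
+      *) out="$out$(printf '%%%02X' "'$c")" ;;
+    esac
+  done
+  printf '%s\n' "$out"
+}
+
+urlencode_task_id 'org/repo#42'   # prints org%2Frepo%2342
+```
+
+The same value works both as the path segment in the task endpoints and as the suffix of the `task/` branch name.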
+
+### PR Workflow
+
+1. Work on the `task/*` branch.
+2. Open a PR from that branch.
+3. Orchestrator receives `pull_request.opened` webhook → task goes to `review_pending`.
+4. Pushes to the branch update `last_activity_at`.
+5. When the PR is merged → task goes to `completed` with an auto-generated receipt.
+
+### For http_pull Agents
+
+After dequeuing a task, create the branch and PR yourself:
+
+```bash
+git checkout -b task/org%2Frepo%2342
+# ... do the work ...
+git push origin task/org%2Frepo%2342
+# Create PR via Forgejo API
+# The webhook will update the task automatically
+```
+
+### For ssh_cli Agents
+
+The Orchestrator passes the branch name in the structured prompt. Create the branch, push, and open the PR as part of your CLI execution. The webhooks handle status updates.