agent-fleet/docs/agent-onboarding-guide.md
Zer4tul d1a746a8cb docs: add agent API reference, onboarding guide, and universal skill
- docs/agent-api-reference.md (473 lines): complete HTTP API reference for all 12 endpoints
- docs/agent-onboarding-guide.md (272 lines): ssh_cli and http_pull workflows, Forgejo integration
- skill/SKILL.md (281 lines): universal agent skill, platform-agnostic, curl-based examples

All content in English. No code changes.
2026-05-12 14:57:05 +08:00


# Agent Fleet — Agent Onboarding Guide
This guide explains how to integrate an agent with the Agent Fleet Orchestrator.
---
## Execution Modes
Agent Fleet supports two execution modes. The mode is set per-task at creation time (defaults to `ssh_cli`).
| Aspect | `ssh_cli` | `http_pull` |
|--------|-----------|-------------|
| Who initiates? | Orchestrator (via SSH or local subprocess) | Agent (via HTTP API) |
| Control flow | Orchestrator builds prompt, runs CLI, collects output | Agent decides when to dequeue and execute |
| Agent requirements | CLI binary on a configured host | HTTP client, can call REST API |
| Auth needed? | No (Orchestrator manages) | Yes (Bearer token) |
| Best for | Codex CLI, Claude Code, OpenCode — agents with CLIs | OpenClaw/Jeeves, Hermes — agents with their own schedulers |
| Task creation trigger | Forgejo Issue webhook (default) | Same, or API call |
---
## ssh_cli Workflow
### 1. Configure a Host
Add a `[[hosts]]` section to `config.toml` on the Orchestrator:
```toml
[[hosts]]
host_id = "host-worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/deploy/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
]
```
For local execution (same machine as Orchestrator), use `hostname = "localhost"` — the Orchestrator uses a local subprocess instead of SSH.
### 2. Install the Agent CLI
The CLI binary must be on the target host's `$PATH`. The Orchestrator checks for it with `which <binary>`.
Built-in CLI templates:
| Agent Type | CLI Command |
|------------|-------------|
| `codex-cli` | `codex exec --json '{prompt}'` |
| `claude-code` | `claude -p '{prompt}' --output-format json --dangerously-skip-permissions` |
Custom templates can be defined in `config.toml` under `[adapters]`.
### 3. Orchestrator Handles Everything
When a Forgejo Issue with an `agent:*` label arrives:
1. Orchestrator creates a task (`execution_mode = ssh_cli`)
2. Dispatch loop picks the task, selects a host by capability + load
3. SSH (or local subprocess) executes the CLI with a structured prompt
4. Output is parsed (Codex JSON or Claude JSON format)
5. Task status updates: `created` → `assigned` → `running` → `completed` (or `failed`)
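This guide does not spell out how "capability + load" is weighed in step 2. As a rough sketch, assuming the dispatcher first filters by capability and then prefers the slot with the lowest running-to-capacity ratio, the selection might look like this (`AgentSlot`, `running`, and `select_slot` are illustrative names, not Orchestrator identifiers):

```python
from dataclasses import dataclass


@dataclass
class AgentSlot:
    host_id: str
    agent_type: str
    capabilities: set[str]
    max_concurrency: int
    running: int = 0  # assumed counter of tasks currently executing


def select_slot(slots, required):
    """Return the least-loaded slot offering every required capability, or None."""
    candidates = [
        slot for slot in slots
        if required <= slot.capabilities and slot.running < slot.max_concurrency
    ]
    if not candidates:
        return None
    # Lower running/max_concurrency ratio means more spare capacity.
    return min(candidates, key=lambda slot: slot.running / slot.max_concurrency)
```

A slot that is already at `max_concurrency` is skipped entirely, so a task with no eligible slot simply stays queued in this model.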
### 4. What the Agent Receives (Structured Prompt)
The Orchestrator constructs this prompt and passes it as the `{prompt}` variable:
```
Task ID: org/repo#42
Type: code
Goal:
Implement the feature described in the issue body
Constraints:
- Execution mode: ssh_cli
- Labels: code:rust
- Branch: task/org%2Frepo%2342
- Expected output: JSON receipt
Validation:
- Run relevant tests if code changed
- Summarize changes and artifacts
```
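For testing an agent locally, it can help to generate a prompt of the same shape. The following is a minimal sketch that reproduces the example above line for line; `build_prompt` is a hypothetical function, and the `", "` label separator is an assumption (the guide only says labels are comma-separated).

```python
def build_prompt(task_id, task_type, requirements, labels, branch_name):
    """Assemble a structured prompt in the layout shown in this guide."""
    # Assumption: labels are joined with ", "; an empty list renders as "<none>".
    label_text = ", ".join(labels) if labels else "<none>"
    lines = [
        f"Task ID: {task_id}",
        f"Type: {task_type}",
        "Goal:",
        requirements,
        "Constraints:",
        "- Execution mode: ssh_cli",
        f"- Labels: {label_text}",
        f"- Branch: {branch_name}",
        "- Expected output: JSON receipt",
        "Validation:",
        "- Run relevant tests if code changed",
        "- Summarize changes and artifacts",
    ]
    return "\n".join(lines)


print(build_prompt(
    "org/repo#42",
    "code",
    "Implement the feature described in the issue body",
    ["code:rust"],
    "task/org%2Frepo%2342",
))
```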
### 5. Expected CLI Output
The CLI must output JSON to stdout. The format depends on the parser:
**Codex JSON:**
```json
{"status": "completed", "summary": "done", "duration_seconds": 120, "artifacts": [{"artifact_type": "pr", "url": "https://..."}]}
```
**Claude JSON:**
```json
{"status": "completed", "summary": "done", "duration_seconds": 95, "error": null}
```
If output is not valid JSON, the task is marked `failed`.
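The failure rule above can be mirrored in a wrapper script to catch malformed output before the Orchestrator does. This is only a sketch: `parse_cli_output` is a hypothetical name, and treating a non-object JSON value (such as a bare list) as a failure is an assumption.

```python
import json


def parse_cli_output(stdout: str) -> dict:
    """Return the receipt dict, or a synthetic failed receipt for bad output."""
    try:
        receipt = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return {"status": "failed", "summary": "", "error": f"invalid JSON: {exc}"}
    if not isinstance(receipt, dict):
        # Assumption: only a JSON object counts as a usable receipt.
        return {"status": "failed", "summary": "", "error": "expected a JSON object"}
    return receipt


print(parse_cli_output('{"status": "completed", "summary": "done"}')["status"])
print(parse_cli_output("Segmentation fault")["status"])
```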
---
## http_pull Workflow
### 1. Register
```bash
curl -X POST http://localhost:9090/api/v1/agents/register \
-H 'Content-Type: application/json' \
-d '{"agent_id": "worker-03", "agent_type": "openclaw", "hostname": "arm0", "capabilities": ["code:rust"], "max_concurrency": 2}'
```
The response contains a `registry_token`. Keep it and send it as the Bearer token on subsequent API calls. If `http_pull_token` is configured, use that shared token instead.
### 2. Heartbeat (periodic)
Send a heartbeat every `heartbeat_interval_secs` seconds (default: 60). If the Orchestrator receives none within `heartbeat_interval_secs × heartbeat_timeout_threshold` seconds, the agent is marked offline and its tasks are requeued.
```bash
curl -X POST http://localhost:9090/api/v1/agents/heartbeat \
-H 'Content-Type: application/json' \
-d '{"agent_id": "worker-03"}'
```
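The offline window is a simple product, which is worth working through once when choosing a heartbeat cadence. The sketch below assumes a threshold of 3 purely for illustration (this guide does not state the default), and `is_offline` is a hypothetical helper.

```python
def offline_window_secs(heartbeat_interval_secs: int, heartbeat_timeout_threshold: int) -> int:
    """Seconds of silence after which the agent is considered offline."""
    return heartbeat_interval_secs * heartbeat_timeout_threshold


def is_offline(last_heartbeat_at: float, now: float,
               heartbeat_interval_secs: int = 60,
               heartbeat_timeout_threshold: int = 3) -> bool:
    """True once the silence exceeds the offline window (threshold 3 is assumed)."""
    window = offline_window_secs(heartbeat_interval_secs, heartbeat_timeout_threshold)
    return now - last_heartbeat_at > window


# With a 60 s interval and an assumed threshold of 3, the window is 180 s.
print(offline_window_secs(60, 3))
```

Under these assumed values an agent can miss two consecutive heartbeats and still be considered online.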
### 3. Dequeue a Task
```bash
curl -X POST http://localhost:9090/api/v1/tasks/dequeue \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}'
```
Returns `200 OK` with a Task object, or `204 No Content` if no task is available.
Only tasks with `execution_mode = http_pull` are returned.
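An agent's polling code has to distinguish the two success responses. A minimal sketch using only the standard library follows; the function names are hypothetical, and the base URL is the one from the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9090/api/v1"  # taken from the curl examples


def build_dequeue_request(agent_id, capabilities, token):
    """Build (but do not send) the POST request for the dequeue endpoint."""
    body = json.dumps({"agent_id": agent_id, "capabilities": capabilities}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tasks/dequeue",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def interpret_dequeue_response(status: int, body: bytes):
    """Map 204 to None and 200 to the decoded Task object."""
    if status == 204:
        return None
    if status == 200:
        return json.loads(body)
    raise RuntimeError(f"unexpected dequeue status: {status}")
```

To send the request, pass it to `urllib.request.urlopen` and feed the response's `status` and `read()` result into `interpret_dequeue_response`. A `None` result means the agent should wait before polling again.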
### 4. Update Status While Working
```bash
curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/status \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{"status": "running"}'
```
### 5. Complete the Task
```bash
curl -X POST http://localhost:9090/api/v1/tasks/org%2Frepo%2342/complete \
-H 'Content-Type: application/json' \
-d '{
  "task_id": "org/repo#42",
  "agent_id": "worker-03",
  "status": "completed",
  "duration_seconds": 180,
  "summary": "Fixed the issue",
  "artifacts": [{"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}],
  "error": null
}'
```
Or use the receipts endpoint:
```bash
curl -X POST http://localhost:9090/api/v1/receipts \
-H 'Content-Type: application/json' \
-d '<same receipt body>'
```
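Because task IDs contain `/` and `#`, they must be percent-encoded when placed in a URL path, while the receipt body carries the raw ID. The sketch below shows both; `complete_url` and `build_receipt` are hypothetical helpers, and the field names are copied from the receipt example above.

```python
import json
from urllib.parse import quote


def complete_url(base_url: str, task_id: str) -> str:
    """Build the /complete URL with the task ID fully percent-encoded."""
    # safe="" forces "/" to be encoded as %2F as well.
    return f"{base_url}/tasks/{quote(task_id, safe='')}/complete"


def build_receipt(task_id, agent_id, status, duration_seconds, summary,
                  artifacts=None, error=None) -> str:
    """Serialize a receipt body; the task ID stays unencoded inside JSON."""
    return json.dumps({
        "task_id": task_id,
        "agent_id": agent_id,
        "status": status,
        "duration_seconds": duration_seconds,
        "summary": summary,
        "artifacts": artifacts or [],
        "error": error,
    })


print(complete_url("http://localhost:9090/api/v1", "org/repo#42"))
```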
### 6. Deregister When Done
```bash
curl -X POST http://localhost:9090/api/v1/agents/deregister \
-H 'Content-Type: application/json' \
-d '{"agent_id": "worker-03"}'
```
---
## Forgejo Integration
### How Issues Become Tasks
1. A Forgejo Issue is opened with a label matching `agent:*` (e.g. `agent:code`)
2. Forgejo sends an `issues` webhook to `POST /api/v1/webhooks/forgejo`
3. The `agent:*` label value becomes `task_type` (e.g. `code`)
4. Priority is inferred from labels: `priority:urgent`, `priority:high`, `priority:low` (default: `normal`)
5. A task is created with:
- `task_id` = `{repo_full_name}#{issue_number}` (e.g. `org/repo#42`)
- `execution_mode` = `ssh_cli` (default for Forgejo-originated tasks)
- `branch_name` = `task/{url_encoded_task_id}` (e.g. `task/org%2Frepo%2342`)
- `pr_title` = `feat: {issue_title} (#{issue_number})`
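The mapping from issue to task fields can be sketched as follows. This is an illustration of the rules listed above, not the Orchestrator's code: `task_from_issue` and the `priority` key are hypothetical, and "first matching label wins" is an assumption.

```python
from urllib.parse import quote

KNOWN_PRIORITIES = ("urgent", "high", "low")


def task_from_issue(repo_full_name, issue_number, issue_title, labels):
    """Derive task fields from an issue, or return None if no agent:* label exists."""
    task_type = next(
        (label.split(":", 1)[1] for label in labels if label.startswith("agent:")),
        None,
    )
    if task_type is None:
        return None  # issues without an agent:* label are ignored
    priority = next(
        (label.split(":", 1)[1] for label in labels
         if label.startswith("priority:") and label.split(":", 1)[1] in KNOWN_PRIORITIES),
        "normal",
    )
    task_id = f"{repo_full_name}#{issue_number}"
    return {
        "task_id": task_id,
        "task_type": task_type,
        "priority": priority,
        "execution_mode": "ssh_cli",
        "branch_name": "task/" + quote(task_id, safe=""),
        "pr_title": f"feat: {issue_title} (#{issue_number})",
    }
```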
### Branch Naming Convention
- Branch: `task/{url_encoded_task_id}`
- Example: task `org/repo#42` → branch `task/org%2Frepo%2342`
### PR Lifecycle
| Event | Effect |
|-------|--------|
| PR opened (branch = `task/*`) | Task → `review_pending` |
| PR merged | Task → `completed`, auto receipt generated |
| Push to `task/*` branch | Task `last_activity_at` updated |
### Task Status Flow
```
created → assigned → running → review_pending → completed
↘ failed
↘ agent_lost
↘ cancelled
```
Any `failed` or `agent_lost` task can be retried via `POST /api/v1/tasks/{task_id}/retry` (transitions to `assigned`). Retry is limited by `max_retries` (default: 2).
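The retry rule can be modeled in a few lines. The sketch below assumes each retry increments `retry_count` and is refused once the count reaches `max_retries`; the `retry` function and the dict-based task are illustrative only.

```python
RETRYABLE_STATUSES = {"failed", "agent_lost"}


def retry(task: dict, max_retries: int = 2) -> dict:
    """Return a copy of the task moved to `assigned`, or raise if not allowed."""
    if task["status"] not in RETRYABLE_STATUSES:
        raise ValueError(f"cannot retry a task in status {task['status']!r}")
    if task["retry_count"] >= max_retries:
        raise ValueError("retry limit reached")
    # Assumption: a successful retry bumps retry_count by one.
    return {**task, "status": "assigned", "retry_count": task["retry_count"] + 1}


task = {"task_id": "org/repo#42", "status": "failed", "retry_count": 0}
print(retry(task)["status"])
```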
---
## Structured Prompt Format (ssh_cli)
When the Orchestrator executes an agent via SSH, it constructs a structured prompt:
```
Task ID: <task_id>
Type: <task_type>
Goal:
<requirements>
Constraints:
- Execution mode: ssh_cli
- Labels: <comma-separated labels or <none>>
- Branch: <branch_name>
- Expected output: JSON receipt
Validation:
- Run relevant tests if code changed
- Summarize changes and artifacts
```
The prompt is injected into the CLI template as the `{prompt}` variable. Other available variables: `{work_dir}`, `{task_id}`, `{branch}`.
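When writing a custom template, keep in mind that the built-in templates wrap `{prompt}` in single quotes, so a prompt containing `'` could break the command line. How the Orchestrator escapes values is not documented here; the sketch below shows one way substitution could be done safely, assuming every placeholder sits inside single quotes. `render_command` is a hypothetical helper.

```python
import re
import shlex


def escape_single_quoted(value: str) -> str:
    """Escape a value for use inside a single-quoted shell string."""
    # Close the quote, emit an escaped quote, reopen the quote.
    return value.replace("'", "'\\''")


def render_command(template: str, variables: dict) -> str:
    """Substitute {name} placeholders in one pass; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        return escape_single_quoted(variables[name])

    return re.sub(r"\{(\w+)\}", substitute, template)


command = render_command("codex exec --json '{prompt}'", {"prompt": "it's done"})
# shlex.split parses the command the way a POSIX shell would tokenize it.
print(shlex.split(command))
```

Doing the substitution in a single regular-expression pass means a placeholder-like string inside the prompt text is not expanded a second time.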
---
## FAQ
**Q: How do I know which execution mode to use?**
A: If you have a CLI binary and run on a configured host → `ssh_cli`. If you have your own scheduler or run outside configured hosts → `http_pull`.
**Q: Do I need to register for ssh_cli mode?**
A: No. The Orchestrator manages ssh_cli tasks entirely. Registration is only for `http_pull` agents.
**Q: What happens if my agent crashes during ssh_cli execution?**
A: The task is marked `failed`. If `retry_count < max_retries`, the dispatch loop will retry automatically.
**Q: What happens if my http_pull agent stops sending heartbeats?**
A: After `heartbeat_interval_secs × heartbeat_timeout_threshold` seconds, the agent is marked offline and all its tasks are requeued with status `created`.
**Q: Can a task switch between execution modes?**
A: No. The `execution_mode` is set at creation time and cannot be changed.
**Q: How do I create a task manually?**
A: Use the Forgejo webhook flow (open an Issue with an `agent:*` label), or insert a row directly into the database. There is no public "create task" API endpoint.
**Q: What label format triggers task creation?**
A: Issues must have a label starting with `agent:` (e.g. `agent:code`, `agent:review`). The value after `agent:` becomes the task type. Issues without such a label are ignored.
**Q: How does the review loop work?**
A: When a PR is opened, the task moves to `review_pending`. If the PR remains unmerged and the review cycle count exceeds `max_retries`, the task is marked `failed`. For `ssh_cli` tasks, the Orchestrator re-dispatches automatically.