Zer4tul 1f351a1734 docs: replace localhost with FLEET_API_URL, add persistence and heartbeat guidance

- Replace all localhost:9090 with FLEET_API_URL:PORT placeholder
- Add Step 0: persist Fleet API URL to agent memory
- Clarify heartbeat must be periodic loop (60s interval)
- Add execution mode self-selection decision flow
- Add persisting configuration section (URL, agent_id, token)

2026-05-12 18:34:23 +08:00


Agent Fleet — Agent Onboarding Guide

This guide explains how to integrate an agent with the Agent Fleet Orchestrator.

In the examples below, FLEET_API_URL:PORT is a placeholder for the address of your Agent Fleet Orchestrator (for example, 100.102.101.43:9090). If you do not know the address, ask your user for it before using the examples.

Execution Modes

Agent Fleet supports two execution modes. The mode is set per task at creation time and defaults to ssh_cli.

Aspect	`ssh_cli`	`http_pull`
Who initiates?	Orchestrator (via SSH or local subprocess)	Agent (via HTTP API)
Control flow	Orchestrator builds prompt, runs CLI, collects output	Agent decides when to dequeue and execute
Agent requirements	CLI binary on a configured host	HTTP client, can call REST API
Auth needed?	No (Orchestrator manages)	Yes (Bearer token)
Best for	Codex CLI, Claude Code, OpenCode — agents with CLIs	OpenClaw/Jeeves, Hermes — agents with their own schedulers
Task creation trigger	Forgejo Issue webhook (default)	Same, or API call

How to Choose an Execution Mode

First, find out the address of the Agent Fleet Orchestrator. If you do not know it, ask your user.

Then decide based on connectivity and control model:

Use ssh_cli if:
- You have a CLI binary that can run non-interactively
- The Fleet orchestrator can SSH into your machine or server
- You want the Orchestrator to initiate execution for each task
Use http_pull if:
- The Fleet orchestrator cannot SSH into your machine
- You are behind NAT, a firewall, or on a network the Orchestrator cannot reach
- You already have your own scheduler, worker loop, or bot runtime
- You want your agent to decide when to pull work
If unsure, ask your user: "Can the Fleet Orchestrator SSH into this machine?"
- If yes → ssh_cli
- If no → http_pull

Self-Determination Decision Flow

As an agent, you can determine your own mode by answering these questions in order:

1. Do I have a CLI binary that can execute tasks non-interactively?
   └─ NO → Use http_pull
   └─ YES → Continue

2. Can the Fleet orchestrator SSH into my machine?
   └─ NO → Use http_pull
   └─ YES → Continue

3. Do I have my own scheduler, worker loop, or bot runtime?
   └─ YES → Use http_pull (you prefer to control your own task flow)
   └─ NO → Continue

4. Do I want the Orchestrator to initiate execution remotely?
   └─ YES → Use ssh_cli
   └─ NO → Use http_pull

Quick reference: If you can be reached via SSH and don't have a custom scheduler, use ssh_cli. Otherwise, use http_pull.
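The four questions above map directly onto a small function. An agent could use it to make and record its choice programmatically. This is a sketch: the function and parameter names are hypothetical and are not part of the Agent Fleet API.

```python
from typing import Literal

Mode = Literal["ssh_cli", "http_pull"]


def choose_execution_mode(
    has_noninteractive_cli: bool,
    orchestrator_can_ssh: bool,
    has_own_scheduler: bool,
    wants_remote_launch: bool,
) -> Mode:
    """Walk the four questions in order and return the first decisive answer."""
    # 1. Without a non-interactive CLI there is nothing for ssh_cli to run.
    if not has_noninteractive_cli:
        return "http_pull"
    # 2. If the Orchestrator cannot reach this machine over SSH, the agent must pull.
    if not orchestrator_can_ssh:
        return "http_pull"
    # 3. An agent with its own scheduler or worker loop controls its own task flow.
    if has_own_scheduler:
        return "http_pull"
    # 4. Otherwise, use ssh_cli only if remote initiation is actually wanted.
    return "ssh_cli" if wants_remote_launch else "http_pull"


# A Codex CLI on a reachable server with no custom scheduler:
print(choose_execution_mode(True, True, False, True))  # ssh_cli
```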

Decision Table

Question	If Yes	If No
Do you have a CLI binary that can execute tasks non-interactively?	Continue evaluating for `ssh_cli`	Use `http_pull`
Can the Fleet orchestrator SSH into this machine?	Continue evaluating for `ssh_cli`	Use `http_pull`
Is this agent behind NAT, a firewall, or otherwise unreachable from the Orchestrator?	Use `http_pull`	Continue evaluating
Does the agent already run its own scheduler or task loop?	Use `http_pull`	Either mode may fit
Do you want the Orchestrator to launch the agent process remotely?	Use `ssh_cli`	Use `http_pull`

Common Scenarios

Scenario	Recommended Mode	Why
Codex / Claude Code / OpenCode on a reachable server	`ssh_cli`	Fleet can SSH in and run the CLI directly
OpenClaw / Hermes Agent / bot framework	`http_pull`	The agent already has a runtime and should pull work itself
Agent running on a laptop behind NAT	`http_pull`	Fleet cannot reach it reliably over SSH
Shared VM with a well-known SSH host and installed CLI	`ssh_cli`	Centralized orchestration is simpler

Simple Rule of Thumb

If the Fleet server can reach you, ssh_cli is usually simpler.
If you must reach the Fleet server, use http_pull.

ssh_cli Workflow

1. Configure a Host

Add a [[hosts]] section to config.toml on the Orchestrator:

[[hosts]]
host_id = "host-worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/deploy/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
]

For local execution (same machine as Orchestrator), use hostname = "localhost" — the Orchestrator uses a local subprocess instead of SSH.

2. Install the Agent CLI

The CLI binary must be installed on the target host and available on $PATH. The Orchestrator checks availability by running which <binary>.
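You may want a quick local check before adding a host. Python's shutil.which performs the same $PATH lookup as which. The sketch below checks only the machine it runs on; for a remote host you would run which <binary> over SSH yourself. The binary names are those used by the built-in templates, and the helper name is hypothetical.

```python
import shutil


def missing_binaries(binaries: list[str]) -> list[str]:
    """Return the names in binaries that cannot be found on this machine's $PATH."""
    # shutil.which returns the executable's full path, or None if it is not found.
    return [name for name in binaries if shutil.which(name) is None]


missing = missing_binaries(["codex", "claude"])
print("Missing from $PATH:", ", ".join(missing) if missing else "none")
```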

Built-in CLI templates:

Agent Type	CLI Command
`codex-cli`	`codex exec --json '{prompt}'`
`claude-code`	`claude -p '{prompt}' --output-format json --dangerously-skip-permissions`

Custom templates can be defined in config.toml under [adapters].
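The built-in templates wrap {prompt} in single quotes. A prompt that itself contains a single quote (for example "the user's bug") would therefore break naive string substitution. The sketch below avoids that problem. It assumes a template is a single shell-style command line, which it splits into arguments before filling placeholders in each argument. shlex.join then re-quotes the result when a command string is needed, for example to send over SSH. The function name is hypothetical; this is not the Orchestrator's actual implementation.

```python
import re
import shlex

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_command(template: str, variables: dict[str, str]) -> list[str]:
    """Split a CLI template into argv and fill {name} placeholders in each argument."""
    argv = []
    for arg in shlex.split(template):
        # Single-pass substitution: inserted values are never re-scanned, and
        # unknown placeholders are left untouched.
        argv.append(PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), arg))
    return argv


argv = render_command(
    "claude -p '{prompt}' --output-format json --dangerously-skip-permissions",
    {"prompt": "Fix the user's login bug"},
)
print(argv[:3])          # ['claude', '-p', "Fix the user's login bug"]
print(shlex.join(argv))  # safely quoted command string for ssh or a shell
```

This only works for templates that are a single command. A template that relies on shell operators such as && would still need to be run through a shell.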

3. Orchestrator Handles Everything

When a Forgejo Issue with an agent:* label arrives:

- The Orchestrator creates a task (execution_mode = ssh_cli).
- The dispatch loop picks up the task and selects a host by capability and load.
- The CLI is executed over SSH (or as a local subprocess) with a structured prompt.
- The output is parsed (Codex JSON or Claude JSON format).
- The task status updates: created → assigned → running → completed (or failed).
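This guide does not spell out the exact host-selection policy. The sketch below shows one plausible reading of "capability + load", under stated assumptions:

- Each host's agents entry is treated as one candidate.
- A candidate qualifies if it lists every capability the task needs and is below max_concurrency.
- The candidate with the lowest load relative to its capacity wins.

The Host class and its running counter are hypothetical bookkeeping, not the Orchestrator's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    host_id: str
    capabilities: frozenset[str]  # from the host's agents entry in config.toml
    max_concurrency: int
    running: int = 0  # tasks currently executing on this host (hypothetical)


def select_host(hosts: list[Host], required: set[str]) -> Optional[Host]:
    """Return the least-loaded host that has every required capability and a free slot."""
    candidates = [
        h for h in hosts
        if required <= h.capabilities and h.running < h.max_concurrency
    ]
    if not candidates:
        return None  # no capable host is free; the task stays queued
    # Compare load as a fraction of capacity so larger hosts are not penalised.
    return min(candidates, key=lambda h: h.running / h.max_concurrency)


hosts = [
    Host("host-worker-01", frozenset({"code:rust", "code:python"}), 2, running=1),
    Host("host-worker-02", frozenset({"code:rust"}), 4, running=1),
]
print(select_host(hosts, {"code:rust"}).host_id)  # host-worker-02
```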

4. What the Agent Receives (Structured Prompt)

The Orchestrator constructs this prompt and passes it as the {prompt} variable:

Task ID: org/repo#42
Type: code
Goal:
Implement the feature described in the issue body

Constraints:
- Execution mode: ssh_cli
- Labels: code:rust
- Branch: task/org%2Frepo%2342
- Expected output: JSON receipt

Validation:
- Run relevant tests if code changed
- Summarize changes and artifacts

5. Expected CLI Output

The CLI must print JSON to stdout. The expected shape depends on the parser used for the agent type:

Codex JSON:

{"status": "completed", "summary": "done", "duration_seconds": 120, "artifacts": [{"artifact_type": "pr", "url": "https://..."}]}

Claude JSON:

{"status": "completed", "summary": "done", "duration_seconds": 95, "error": null}

If output is not valid JSON, the task is marked failed.
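The rule above can be applied as in the sketch below. It assumes the whole of stdout is a single JSON object with a status field. The shape of the failure record it returns is an assumption, not the Orchestrator's internal format.

```python
import json


def parse_cli_output(stdout: str) -> dict:
    """Parse the CLI's stdout as a JSON receipt; anything unparseable becomes a failure."""
    try:
        receipt = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return {"status": "failed", "summary": None, "error": f"invalid JSON output: {exc}"}
    if not isinstance(receipt, dict) or "status" not in receipt:
        # Valid JSON that is not a receipt object is treated as a failure too (assumption).
        return {"status": "failed", "summary": None, "error": "output is not a JSON receipt object"}
    return receipt


print(parse_cli_output('{"status": "completed", "summary": "done", "duration_seconds": 95, "error": null}')["status"])  # completed
print(parse_cli_output("Done! I fixed the bug.")["status"])  # failed
```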

http_pull Workflow

1. Register

curl -X POST http://FLEET_API_URL:PORT/api/v1/agents/register \
  -H 'Content-Type: application/json' \
  -d '{"agent_id": "worker-03", "agent_type": "openclaw", "hostname": "arm0", "capabilities": ["code:rust"], "max_concurrency": 2}'

The response contains a registry_token. Store it and send it as the Bearer token in subsequent API calls. If http_pull_token is configured on the Orchestrator, use that shared token instead.

Recommended immediately after registration:
- Persist FLEET_API_URL, your agent_id, and the returned registry_token
- Start the heartbeat loop before your first dequeue request
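Registration and persistence can be done with the Python standard library alone, as sketched below. The request body mirrors the curl example. The JSON config file, its keys and the helper names are hypothetical choices for illustration.

```python
import json
import urllib.request
from pathlib import Path


def build_register_request(base_url: str, agent: dict) -> urllib.request.Request:
    """POST /api/v1/agents/register with the agent description as JSON."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/agents/register",
        data=json.dumps(agent).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(base_url: str, agent: dict) -> str:
    """Send the registration and return the registry_token from the response."""
    with urllib.request.urlopen(build_register_request(base_url, agent), timeout=10) as resp:
        return json.load(resp)["registry_token"]


def save_config(path: Path, base_url: str, agent_id: str, token: str) -> None:
    """Persist the three values the agent needs after a restart."""
    path.write_text(json.dumps(
        {"fleet_api_url": base_url, "agent_id": agent_id, "registry_token": token}))
    path.chmod(0o600)  # the token is a credential; keep the file private to this user


def load_config(path: Path) -> dict:
    return json.loads(path.read_text())
```

Call register once with your Orchestrator address and agent description, then pass the returned token to save_config. On restart, load_config restores the URL, agent_id and token without registering again.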

2. Heartbeat (periodic)

Heartbeat must be a background loop, not a one-shot call.

- Default heartbeat interval: every 60 seconds
- Recommended behavior: start the loop immediately after registration, before the first dequeue
- If the Orchestrator does not receive a heartbeat within heartbeat_interval_secs × heartbeat_timeout_threshold seconds (default: 180 seconds), the agent is marked offline
- When an agent is marked offline, its assigned tasks are requeued
- The heartbeat loop should run for the entire lifetime of the agent

curl -X POST http://FLEET_API_URL:PORT/api/v1/agents/heartbeat \
  -H 'Content-Type: application/json' \
  -d '{"agent_id": "worker-03"}'
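The heartbeat must run for the agent's whole lifetime. One straightforward structure is a daemon thread driven by a threading.Event, sketched below. send_heartbeat is a hypothetical callable that performs the POST shown above. The 60-second default matches the interval in this guide.

```python
import logging
import threading
from typing import Callable


def start_heartbeat(send_heartbeat: Callable[[], None], stop: threading.Event,
                    interval_secs: float = 60.0) -> threading.Thread:
    """Call send_heartbeat now, then every interval_secs, until stop is set."""

    def loop() -> None:
        while not stop.is_set():
            try:
                send_heartbeat()
            except Exception:
                # One failed request must not end the loop; log it and try again later.
                logging.exception("heartbeat failed")
            # Event.wait doubles as an interruptible sleep: it returns early once stop is set.
            stop.wait(interval_secs)

    thread = threading.Thread(target=loop, name="fleet-heartbeat", daemon=True)
    thread.start()
    return thread
```

Give the underlying HTTP request a short timeout. Otherwise a single hung request could hold the loop past the 180-second offline threshold.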

3. Dequeue a Task

curl -X POST http://FLEET_API_URL:PORT/api/v1/tasks/dequeue \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{"agent_id": "worker-03", "capabilities": ["code:rust"]}'

Returns 200 OK with a Task object, or 204 No Content if no task is available.

Only tasks with execution_mode = http_pull are returned.
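A dequeue client has to treat 204 as "no work yet" rather than as an error. The sketch below uses urllib and assumes the Bearer token and capabilities from registration. The helper names and the 10-second timeout are illustrative. urlopen raises HTTPError for 4xx/5xx responses, so only 2xx codes reach the status check.

```python
import json
import urllib.request
from typing import Optional


def parse_dequeue_response(status: int, body: bytes) -> Optional[dict]:
    """Map a dequeue response to a task dict, or None when nothing is available."""
    if status == 204:
        return None  # No Content: the queue is empty for this agent right now
    if status == 200:
        return json.loads(body)
    raise RuntimeError(f"unexpected dequeue status {status}")


def dequeue(base_url: str, token: str, agent_id: str, capabilities: list[str]) -> Optional[dict]:
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/tasks/dequeue",
        data=json.dumps({"agent_id": agent_id, "capabilities": capabilities}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_dequeue_response(resp.status, resp.read())
```

When dequeue returns None, wait before polling again. This guide does not prescribe a polling interval.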

4. Update Status While Working

curl -X POST http://FLEET_API_URL:PORT/api/v1/tasks/org%2Frepo%2342/status \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{"status": "running"}'
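In the example above, the task ID appears URL-encoded in the path (org/repo#42 becomes org%2Frepo%2342), because both / and # have special meaning in URLs. In Python, urllib.parse.quote with safe="" produces exactly that encoding. The helpers below are a hypothetical sketch for building the status request.

```python
import json
import urllib.parse
import urllib.request


def task_url(base_url: str, task_id: str, action: str) -> str:
    """Build /api/v1/tasks/{encoded_task_id}/{action}; safe="" also encodes '/'."""
    encoded = urllib.parse.quote(task_id, safe="")
    return f"{base_url.rstrip('/')}/api/v1/tasks/{encoded}/{action}"


def build_status_request(base_url: str, token: str, task_id: str,
                         status: str) -> urllib.request.Request:
    return urllib.request.Request(
        task_url(base_url, task_id, "status"),
        data=json.dumps({"status": status}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


print(task_url("http://FLEET_API_URL:PORT", "org/repo#42", "status"))
# http://FLEET_API_URL:PORT/api/v1/tasks/org%2Frepo%2342/status
```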

5. Complete the Task

curl -X POST http://FLEET_API_URL:PORT/api/v1/tasks/org%2Frepo%2342/complete \
  -H 'Content-Type: application/json' \
  -d '{
    "task_id": "org/repo#42",
    "agent_id": "worker-03",
    "status": "completed",
    "duration_seconds": 180,
    "summary": "Fixed the issue",
    "artifacts": [{"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}],
    "error": null
  }'

Or use the receipts endpoint:

curl -X POST http://FLEET_API_URL:PORT/api/v1/receipts \
  -H 'Content-Type: application/json' \
  -d '<same receipt body>'
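The /complete and /api/v1/receipts endpoints take the same receipt body, so an agent can build it once and send it to either. The sketch below assumes the field set shown in the example. Restricting status to completed or failed is an assumption, not documented server behavior.

```python
import json
from typing import Optional


def build_receipt(task_id: str, agent_id: str, status: str, duration_seconds: int,
                  summary: str, artifacts: Optional[list] = None,
                  error: Optional[str] = None) -> str:
    """Serialize a completion receipt with the fields used in the example above."""
    if status not in ("completed", "failed"):  # assumed set of terminal statuses
        raise ValueError(f"unexpected receipt status: {status}")
    return json.dumps({
        "task_id": task_id,  # the unencoded ID, e.g. "org/repo#42"
        "agent_id": agent_id,
        "status": status,
        "duration_seconds": duration_seconds,
        "summary": summary,
        "artifacts": artifacts or [],
        "error": error,  # serialized as null when there was no error
    })


body = build_receipt("org/repo#42", "worker-03", "completed", 180, "Fixed the issue",
                     [{"artifact_type": "pr", "url": "https://git.example/org/repo/pulls/15"}])
```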

6. Deregister When Done

curl -X POST http://FLEET_API_URL:PORT/api/v1/agents/deregister \
  -H 'Content-Type: application/json' \
  -d '{"agent_id": "worker-03"}'
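The steps above can be combined into a worker main loop, sketched below. The dequeue, handle and deregister callables are hypothetical wrappers around the endpoints in this section. The try/finally deregisters the agent even if the loop exits with an error. The 15-second idle delay is an arbitrary choice.

```python
import threading
from typing import Callable, Optional


def run_worker(dequeue: Callable[[], Optional[dict]],
               handle: Callable[[dict], None],
               deregister: Callable[[], None],
               stop: threading.Event,
               idle_secs: float = 15.0) -> int:
    """Pull and handle tasks until stop is set; always deregister on the way out."""
    handled = 0
    try:
        while not stop.is_set():
            task = dequeue()
            if task is None:
                stop.wait(idle_secs)  # 204 No Content: back off before polling again
                continue
            handle(task)  # mark running, do the work, then POST the receipt
            handled += 1
    finally:
        deregister()
    return handled
```

The heartbeat loop should keep running until the agent has deregistered.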

Forgejo Integration

How Issues Become Tasks

1. A Forgejo Issue is opened with a label matching agent:* (e.g. agent:code).
2. Forgejo sends an issues webhook to POST /api/v1/webhooks/forgejo.
3. The agent:* label value becomes the task_type (e.g. code).
4. Priority is inferred from labels: priority:urgent, priority:high, priority:low (default: normal).
5. A task is created with:
   - task_id = {repo_full_name}#{issue_number} (e.g. org/repo#42)
   - execution_mode = ssh_cli (default for Forgejo-originated tasks)
   - branch_name = task/{url_encoded_task_id} (e.g. task/org%2Frepo%2342)
   - pr_title = feat: {issue_title} (#{issue_number})
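The mapping above can be sketched as a pure function. This guide leaves two details open, and the sketch fills them with assumptions:

- If an issue has several agent:* labels, the first one wins.
- If an issue has conflicting priority labels, the most urgent one wins.

The function name and the returned dict are hypothetical.

```python
import urllib.parse
from typing import Optional

PRIORITY_ORDER = ["urgent", "high", "low"]  # checked in this order; default is "normal"


def issue_to_task(repo_full_name: str, issue_number: int, issue_title: str,
                  labels: list[str]) -> Optional[dict]:
    """Derive task fields from a Forgejo issue, or None if it has no agent:* label."""
    agent_labels = [label for label in labels if label.startswith("agent:")]
    if not agent_labels:
        return None  # issues without an agent:* label are ignored
    task_id = f"{repo_full_name}#{issue_number}"
    return {
        "task_id": task_id,
        "task_type": agent_labels[0][len("agent:"):],
        "priority": next((p for p in PRIORITY_ORDER if f"priority:{p}" in labels), "normal"),
        "execution_mode": "ssh_cli",
        "branch_name": "task/" + urllib.parse.quote(task_id, safe=""),
        "pr_title": f"feat: {issue_title} (#{issue_number})",
    }


task = issue_to_task("org/repo", 42, "Add retry backoff", ["agent:code", "priority:high"])
print(task["branch_name"], task["priority"])  # task/org%2Frepo%2342 high
```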

Branch Naming Convention

Branch: task/{url_encoded_task_id}
Example: task org/repo#42 → branch task/org%2Frepo%2342

PR Lifecycle

Event	Effect
PR opened (branch = `task/*`)	Task → `review_pending`
PR merged	Task → `completed`, auto receipt generated
Push to `task/*` branch	Task `last_activity_at` updated

Task Status Flow

created → assigned → running → review_pending → completed
                               ↘ failed
                  ↘ agent_lost
         ↘ cancelled

Any failed or agent_lost task can be retried via POST /api/v1/tasks/{task_id}/retry (transitions to assigned). Retry is limited by max_retries (default: 2).
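The diagram and the retry rule can be encoded as a transition table, which is handy for validating status changes on the agent side. The table below is assembled only from this guide:

- the diagram;
- the running → completed path listed for ssh_cli;
- review_pending → failed, from the FAQ.

The Orchestrator's actual rules may include transitions not shown here, such as requeueing on heartbeat timeout.

```python
ALLOWED = {
    "created": {"assigned", "cancelled"},
    "assigned": {"running", "agent_lost"},
    "running": {"review_pending", "completed", "failed"},
    "review_pending": {"completed", "failed"},
    "failed": {"assigned"},      # via POST /api/v1/tasks/{task_id}/retry
    "agent_lost": {"assigned"},  # via POST /api/v1/tasks/{task_id}/retry
    "completed": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    return new in ALLOWED.get(current, set())


def can_retry(status: str, retry_count: int, max_retries: int = 2) -> bool:
    """Retry only from failed/agent_lost, and only while retry_count < max_retries."""
    return status in ("failed", "agent_lost") and retry_count < max_retries
```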

Structured Prompt Format (ssh_cli)

When the Orchestrator executes an agent via SSH, it constructs a structured prompt:

Task ID: <task_id>
Type: <task_type>
Goal:
<requirements>

Constraints:
- Execution mode: ssh_cli
- Labels: <comma-separated labels or <none>>
- Branch: <branch_name>
- Expected output: JSON receipt

Validation:
- Run relevant tests if code changed
- Summarize changes and artifacts

The prompt is injected into the CLI template as the {prompt} variable. Other available variables: {work_dir}, {task_id}, {branch}.
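The function below produces this prompt from task fields, following the layout above line for line. The function name and parameters are hypothetical. Joining labels with ", " and falling back to the literal <none> are one reading of the template.

```python
def build_prompt(task_id: str, task_type: str, requirements: str,
                 labels: list[str], branch_name: str) -> str:
    """Render the structured ssh_cli prompt in the layout shown above."""
    label_text = ", ".join(labels) if labels else "<none>"
    return "\n".join([
        f"Task ID: {task_id}",
        f"Type: {task_type}",
        "Goal:",
        requirements,
        "",
        "Constraints:",
        "- Execution mode: ssh_cli",
        f"- Labels: {label_text}",
        f"- Branch: {branch_name}",
        "- Expected output: JSON receipt",
        "",
        "Validation:",
        "- Run relevant tests if code changed",
        "- Summarize changes and artifacts",
    ])


print(build_prompt("org/repo#42", "code",
                   "Implement the feature described in the issue body",
                   ["code:rust"], "task/org%2Frepo%2342"))
```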

FAQ

Q: How do I know which execution mode to use?
A: First determine the Fleet API address, then ask whether the Orchestrator can SSH into the machine. If it can, ssh_cli is usually the best fit. If it cannot, use http_pull.

Q: Do I need to register for ssh_cli mode?
A: No. The Orchestrator manages ssh_cli tasks entirely. Registration is only for http_pull agents.

Q: What happens if my agent crashes during ssh_cli execution?
A: The task is marked failed. If retry_count < max_retries, the dispatch loop retries it automatically.

Q: What happens if my http_pull agent stops sending heartbeats?
A: After heartbeat_interval_secs × heartbeat_timeout_threshold seconds, the agent is marked offline and all its tasks are requeued with status created.

Q: Can a task switch between execution modes?
A: No. The execution_mode is set at creation time and cannot be changed.

Q: How do I create a task manually?
A: Use the Forgejo webhook flow (open an Issue with an agent:* label), or insert the task directly into the database. There is no public "create task" API endpoint.

Q: What label format triggers task creation?
A: Issues must have a label starting with agent: (e.g. agent:code, agent:review). The value after agent: becomes the task type. Issues without such a label are ignored.

Q: How does the review loop work?
A: When a PR is opened (but not yet merged), the task moves to review_pending. If the PR is not merged and the review cycle count exceeds max_retries, the task is marked failed. For ssh_cli tasks, the Orchestrator re-dispatches automatically.
