Agent Fleet Platform
Agent Fleet is a multi-agent orchestration system built with Rust, designed to coordinate AI agents for task execution across distributed environments. It integrates with Forgejo for task management and supports dual execution modes (SSH/CLI and HTTP pull).
Overview
Agent Fleet acts as the central orchestrator that:
- Receives tasks from Forgejo Issues via webhooks
- Dispatches tasks to agents based on capabilities and load
- Tracks task lifecycle through a state machine
- Validates receipts and artifacts (e.g., PRs)
- Manages agent heartbeats and health
Key Features
- Dual Execution Modes: ssh_cli (orchestrator-initiated) and http_pull (agent-initiated)
- Event-Sourced State: All task state transitions are recorded as events
- Capability-Based Dispatch: Tasks are routed to agents based on label matching
- Auto-Retry: Failed tasks can be retried up to max_retries times
- Timeout Enforcement: Tasks are marked failed if they exceed task_timeout_secs
- Forgejo Integration: Automatic task creation from labeled issues, PR lifecycle tracking
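The auto-retry and timeout rules above can be sketched as two small decision functions. This is a minimal illustration, not the project's implementation: the `Action` enum and both function names are hypothetical, and it assumes "exceed" means strictly greater than the limit.

```rust
/// Possible outcomes of a check. These names are illustrative only.
#[derive(Debug, PartialEq)]
enum Action {
    KeepRunning,
    MarkFailed,
    Retry,
    GiveUp,
}

/// Timeout rule: a running task that has exceeded `task_timeout_secs`
/// is marked failed (assumption: "exceed" is a strict comparison).
fn check_timeout(elapsed_secs: u64, task_timeout_secs: u64) -> Action {
    if elapsed_secs > task_timeout_secs {
        Action::MarkFailed
    } else {
        Action::KeepRunning
    }
}

/// Retry rule: a failed task may be retried while the number of retries
/// already used is below `max_retries`.
fn check_retry(retries_so_far: u32, max_retries: u32) -> Action {
    if retries_so_far < max_retries {
        Action::Retry
    } else {
        Action::GiveUp
    }
}
```

With the defaults shown later in the configuration (a 1800-second timeout and 2 retries), `check_timeout(1801, 1800)` yields `MarkFailed`, and a task that has already been retried twice yields `GiveUp`.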
Architecture
┌─────────────┐                  ┌─────────────────┐
│   Forgejo   │◄──webhook────────┤   Agent Fleet   │
│  (Issues)   │                  │  Orchestrator   │
└─────────────┘                  └───────┬─────────┘
                                         │
              ┌──────────────────────────┼──────────────────────────┐
              │                          │                          │
              ▼                          ▼                          ▼
      ┌───────────────┐          ┌───────────────┐          ┌───────────────┐
      │ ssh_cli Hosts │          │ http_pull     │          │ Dispatcher    │
      │ (SSH/Local)   │          │ Agents        │          │ Loop          │
      └───────────────┘          └───────────────┘          └───────────────┘
              │                                                     │
              ▼                                                     ▼
      ┌───────────────┐                                     ┌───────────────┐
      │ Agent CLIs    │                                     │ Event Store   │
      │ (codex, etc)  │                                     │ (SQLite)      │
      └───────────────┘                                     └───────────────┘
Components
- Event Store (src/core/event_store.rs): SQLite-backed persistent event store
- State Machine (src/core/state_machine.rs): Validates and executes state transitions
- Task Queue (src/core/task_queue.rs): HTTP pull task queue with capability matching
- Dispatcher (src/dispatch.rs): Periodic dispatch loop for ssh_cli tasks
- SshExecutor (src/execution/mod.rs): Executes agent CLIs via SSH or local subprocess
- Forgejo Client (src/integrations/forgejo.rs): Forgejo API integration and webhook handling
- API Handlers (src/api.rs): REST API for agents and task management
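To illustrate how the Event Store and State Machine fit together, the sketch below derives a task's current state by replaying its recorded events through a transition function. The state and event names are assumptions made for this example (only the failed state is named in this README), and the real types in src/core/ may differ.

```rust
/// Hypothetical task states; only "failed" is named in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TaskState {
    Queued,
    Running,
    Completed,
    Failed,
}

/// Hypothetical events recorded for a task.
#[derive(Debug, Clone, Copy)]
enum TaskEvent {
    Started,
    Succeeded,
    Failed,
    Retried,
}

/// Validate a single transition; any pair not listed is rejected.
fn transition(state: TaskState, event: TaskEvent) -> Result<TaskState, String> {
    match (state, event) {
        (TaskState::Queued, TaskEvent::Started) => Ok(TaskState::Running),
        (TaskState::Running, TaskEvent::Succeeded) => Ok(TaskState::Completed),
        (TaskState::Running, TaskEvent::Failed) => Ok(TaskState::Failed),
        (TaskState::Failed, TaskEvent::Retried) => Ok(TaskState::Queued),
        (s, e) => Err(format!("invalid transition: {:?} on {:?}", s, e)),
    }
}

/// Event sourcing: the current state is a fold over the event history,
/// starting from the initial state and stopping at the first invalid event.
fn replay(events: &[TaskEvent]) -> Result<TaskState, String> {
    events
        .iter()
        .try_fold(TaskState::Queued, |state, &event| transition(state, event))
}
```

Because the state is never stored directly, replaying the same history always produces the same result, and an invalid event is rejected before it is appended.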
Quick Start
Prerequisites
- Rust toolchain with 2024 edition support (Rust 1.85 or later)
- cargo-zigbuild (for cross-compilation)
- Forgejo instance (or compatible forge)
Development Setup
# Clone the repository
git clone https://git.0x08.org/zer4tul/agent-fleet.git
cd agent-fleet
# Copy example config
cp config.example.toml config.toml
# Edit config.toml with your settings
# - Forgejo URL and token
# - Webhook secret
# - Host configurations for ssh_cli mode
Local Development
# Run tests
cargo test
# Run the server
cargo run
# Or with custom bind/port
cargo run -- --bind 127.0.0.1 --port 9090
Building for aarch64
# Install cargo-zigbuild if not already installed
cargo install cargo-zigbuild
# Cross-compile for aarch64-unknown-linux-gnu
cargo zigbuild --target aarch64-unknown-linux-gnu --release
# Binary will be at: target/aarch64-unknown-linux-gnu/release/agent-fleet
Configuration
Configuration is read from a TOML file. See config.example.toml for a complete example.
Server Settings
[server]
bind = "0.0.0.0" # Listen address
port = 9090 # HTTP port
Forgejo Integration
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token" # Forgejo API token
webhook_secret = "your-webhook-secret" # Shared secret for webhook validation
Orchestrator Settings
[orchestrator]
db_path = "data/agent-fleet.db" # SQLite database path
heartbeat_interval_secs = 60 # Agent heartbeat interval
heartbeat_timeout_threshold = 3 # Missed heartbeats before offline
task_timeout_secs = 1800 # Default task timeout (30 min)
default_max_retries = 2 # Max retry attempts
dispatch_interval_secs = 10 # Dispatch loop interval
# http_pull_token = "optional-bearer-token" # Auth for http_pull agents
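As a worked example of how the two heartbeat settings combine, the sketch below assumes an agent is considered offline once more than heartbeat_interval_secs × heartbeat_timeout_threshold seconds have passed since its last heartbeat. That interpretation, and the `HeartbeatPolicy` type, are assumptions for illustration rather than the orchestrator's confirmed behavior.

```rust
use std::time::Duration;

/// Hypothetical holder for the two heartbeat settings from [orchestrator].
struct HeartbeatPolicy {
    interval_secs: u64,
    timeout_threshold: u32,
}

impl HeartbeatPolicy {
    /// Silence longer than this marks the agent offline
    /// (assumed to be interval multiplied by the missed-heartbeat threshold).
    fn offline_after(&self) -> Duration {
        Duration::from_secs(self.interval_secs * u64::from(self.timeout_threshold))
    }

    /// True when the time since the last heartbeat exceeds the allowed window.
    fn is_offline(&self, since_last_heartbeat: Duration) -> bool {
        since_last_heartbeat > self.offline_after()
    }
}
```

With the values above (60-second interval, threshold of 3), the window is 180 seconds under this assumption.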
SSH CLI Hosts
Configure remote hosts for ssh_cli execution:
[[hosts]]
host_id = "host-worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/deploy/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
{ agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
{ agent_type = "claude-code", max_concurrency = 1, capabilities = ["code:rust"] },
]
# For local execution (same machine as orchestrator)
[[hosts]]
host_id = "local"
hostname = "localhost"
ssh_user = "runner"
work_dir = "/tmp/agent-workspace"
agents = [
{ agent_type = "codex-cli", max_concurrency = 1, capabilities = ["code:rust"] },
]
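The host entries above give each agent a capability list and a concurrency limit. One plausible way to dispatch "based on capabilities and load" is sketched below: skip agents that are full, keep those that have every required capability, and pick the one with the fewest running tasks. The `AgentSlot` struct, the `running` counter, and `pick_agent` are hypothetical; the actual dispatcher may weigh load differently.

```rust
/// Hypothetical runtime view of one agent entry from a [[hosts]] block.
struct AgentSlot {
    host_id: &'static str,
    agent_type: &'static str,
    max_concurrency: u32,
    running: u32, // tasks currently assigned (assumed to be tracked by the orchestrator)
    capabilities: Vec<&'static str>,
}

/// Pick the least-loaded agent that has free capacity and all required capabilities.
/// On ties, the first matching agent in the slice is returned.
fn pick_agent<'a>(agents: &'a [AgentSlot], required: &[&str]) -> Option<&'a AgentSlot> {
    agents
        .iter()
        .filter(|a| a.running < a.max_concurrency)
        .filter(|a| {
            required
                .iter()
                .all(|r| a.capabilities.iter().any(|c| *c == *r))
        })
        .min_by_key(|a| a.running)
}
```

For the configuration above, a task requiring code:python can only go to codex-cli on host-worker-01, while a task requiring code:rust goes to whichever Rust-capable agent is least busy.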
API Summary
Agent Fleet exposes a REST API for agent registration, task management, and webhooks.
Agent Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/agents/register | POST | Register or update an agent |
| /api/v1/agents/heartbeat | POST | Update agent heartbeat |
| /api/v1/agents/deregister | POST | Deregister an agent |
| /api/v1/agents | GET | List agents with filters |
Task Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/tasks | GET | List tasks |
| /api/v1/tasks/{task_id} | GET | Get task details |
| /api/v1/tasks/dequeue | POST | Dequeue task (http_pull only) |
| /api/v1/tasks/{task_id}/status | POST | Update task status (http_pull only) |
| /api/v1/tasks/{task_id}/complete | POST | Complete task with receipt |
| /api/v1/tasks/{task_id}/retry | POST | Retry failed task |
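For an http_pull agent, the task endpoints are typically called in sequence: dequeue a task, report status while working, then complete it. The sketch below only builds the method and path for each call from the table; the `TaskCall` enum and the ordering in `pull_cycle` are assumptions, and request bodies are deliberately omitted because they are not specified here (see docs/agent-api-reference.md).

```rust
/// Hypothetical names for the calls an http_pull agent makes.
enum TaskCall<'a> {
    Dequeue,
    UpdateStatus(&'a str),
    Complete(&'a str),
}

/// Map a call to its HTTP method and path, as listed in the task endpoint table.
fn route(call: &TaskCall<'_>) -> (&'static str, String) {
    match call {
        TaskCall::Dequeue => ("POST", "/api/v1/tasks/dequeue".to_string()),
        TaskCall::UpdateStatus(task_id) => ("POST", format!("/api/v1/tasks/{task_id}/status")),
        TaskCall::Complete(task_id) => ("POST", format!("/api/v1/tasks/{task_id}/complete")),
    }
}

/// The assumed order of calls for one task handled by an http_pull agent.
fn pull_cycle(task_id: &str) -> Vec<(&'static str, String)> {
    [
        TaskCall::Dequeue,
        TaskCall::UpdateStatus(task_id),
        TaskCall::Complete(task_id),
    ]
    .iter()
    .map(|call| route(call))
    .collect()
}
```

In practice the task ID comes from the dequeue response, so the agent cannot build the later paths until the first call has returned.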
Other Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /healthz | GET | Health check |
| /api/v1/webhooks/forgejo | POST | Forgejo webhook handler |
| /api/v1/receipts | POST | Submit task receipt |
For detailed API documentation, see docs/agent-api-reference.md.
Deployment
See docs/deployment.md for detailed deployment instructions including:
- Cross-compilation with cargo-zigbuild
- Systemd service configuration
- Caddy reverse proxy setup
Architecture Details
For in-depth architectural information, see docs/architecture.md covering:
- Dual execution model comparison
- Dispatch loop internals
- Task lifecycle and state machine
- Forgejo integration flow
Agent Integration
See docs/agent-onboarding-guide.md for:
- Choosing between ssh_cli and http_pull modes
- Agent registration and heartbeat
- Task dequeue and completion workflows
Development
Running Tests
cargo test
Code Style
- Rust 2024 edition
- thiserror for error types
- serde for serialization
- All DB operations go through EventStore
- Arc<Mutex<EventStore>> for shared state
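The last two conventions can be illustrated with a small standard-library sketch: every writer holds a clone of the same `Arc<Mutex<EventStore>>` and appends through it. The `EventStore` here is an in-memory stand-in with hypothetical `append` and `len` methods, and plain OS threads are used for simplicity; the real store is SQLite-backed and its API may differ.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// In-memory stand-in for the SQLite-backed store (illustrative only).
#[derive(Default)]
struct EventStore {
    events: Vec<String>,
}

impl EventStore {
    /// Hypothetical append method; all writes go through the store.
    fn append(&mut self, event: String) {
        self.events.push(event);
    }

    fn len(&self) -> usize {
        self.events.len()
    }
}

/// Spawn `workers` threads that each record one event through the shared store,
/// then return how many events were stored.
fn record_concurrently(workers: usize) -> usize {
    let store = Arc::new(Mutex::new(EventStore::default()));

    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // The lock is held only for the duration of the append.
                store
                    .lock()
                    .expect("event store mutex poisoned")
                    .append(format!("task-{i}: started"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let count = store.lock().expect("event store mutex poisoned").len();
    count
}
```

Keeping the lock scope to a single append keeps contention low, which matters when both the dispatch loop and the API handlers write through the same store.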
Project Structure
src/
├── main.rs # Entry point, server setup
├── config.rs # TOML configuration
├── api.rs # HTTP API handlers
├── dispatch.rs # Task dispatch loop
├── execution/ # SSH execution
├── integrations/ # Forgejo client
├── adapters/ # Agent adapter interface
└── core/ # Business logic
├── models.rs # Data models
├── event_store.rs # Event sourcing
├── state_machine.rs # State transitions
├── task_queue.rs # HTTP pull queue
├── timeout.rs # Timeout checker
└── retry.rs # Retry policy
License
MIT