# Agent Fleet Platform

Agent Fleet is a multi-agent orchestration system written in Rust that coordinates AI agents executing tasks across distributed environments. It integrates with [Forgejo](https://forgejo.org/) for task management and supports two execution modes: SSH/CLI (`ssh_cli`) and HTTP pull (`http_pull`).

## Overview

Agent Fleet acts as the central orchestrator. It:

- Receives tasks from Forgejo issues via webhooks
- Dispatches tasks to agents based on capabilities and load
- Tracks the task lifecycle through a state machine
- Validates receipts and artifacts (e.g., PRs)
- Manages agent heartbeats and health

### Key Features

- **Dual Execution Modes**: `ssh_cli` (orchestrator-initiated) and `http_pull` (agent-initiated)
- **Event-Sourced State**: All task state transitions are recorded as events
- **Capability-Based Dispatch**: Tasks are routed to agents by matching task labels against agent capabilities
- **Auto-Retry**: Failed tasks can be retried up to `max_retries` times
- **Timeout Enforcement**: Tasks that exceed `task_timeout_secs` are marked `failed`
- **Forgejo Integration**: Automatic task creation from labeled issues and PR lifecycle tracking
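
The lifecycle, retry, and event-sourcing features above could fit together roughly as in the following sketch. It is not the code in `src/core/state_machine.rs` or `src/core/retry.rs`: the state names other than `failed`, the transition table, and the `Task` and `TransitionEvent` types are assumptions made for illustration, and events are kept in a `Vec` instead of SQLite.

```rust
/// Hypothetical task states; only `Failed` is named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Pending,
    Dispatched,
    Running,
    Completed,
    Failed,
}

/// One accepted transition, appended to the task's event log.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TransitionEvent {
    pub from: TaskState,
    pub to: TaskState,
}

/// Hypothetical task record: current state, retry count, and event history.
#[derive(Debug)]
pub struct Task {
    pub state: TaskState,
    pub attempts: u32,
    pub max_retries: u32,
    pub events: Vec<TransitionEvent>,
}

impl Task {
    pub fn new(max_retries: u32) -> Self {
        Task { state: TaskState::Pending, attempts: 0, max_retries, events: Vec::new() }
    }

    /// Apply a transition if the (assumed) state machine allows it,
    /// recording an event for every accepted change.
    pub fn transition(&mut self, to: TaskState) -> Result<(), String> {
        use TaskState::*;
        let allowed = match (self.state, to) {
            (Pending, Dispatched) | (Dispatched, Running) | (Running, Completed) => true,
            // A task can fail while dispatched or running (e.g. on timeout).
            (Dispatched, Failed) | (Running, Failed) => true,
            // Retry: a failed task returns to Pending only while retries remain.
            (Failed, Pending) => self.attempts < self.max_retries,
            _ => false,
        };
        if !allowed {
            return Err(format!("invalid transition {:?} -> {:?}", self.state, to));
        }
        if (self.state, to) == (Failed, Pending) {
            self.attempts += 1;
        }
        self.events.push(TransitionEvent { from: self.state, to });
        self.state = to;
        Ok(())
    }
}
```

Rejected transitions leave both the state and the event log untouched, so replaying `events` always reproduces the current state.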

## Architecture

```
┌─────────────┐             ┌─────────────────┐
│   Forgejo   │◄──webhook───┤   Agent Fleet   │
│  (Issues)   │             │  Orchestrator   │
└─────────────┘             └────────┬────────┘
                                     │
        ┌────────────────────────────┼────────────────────────────┐
        │                            │                            │
        ▼                            ▼                            ▼
┌───────────────┐            ┌───────────────┐            ┌───────────────┐
│ ssh_cli Hosts │            │   http_pull   │            │  Dispatcher   │
│  (SSH/Local)  │            │    Agents     │            │     Loop      │
└───────────────┘            └───────────────┘            └───────────────┘
        │                                                         │
        ▼                                                         ▼
┌───────────────┐                                         ┌───────────────┐
│  Agent CLIs   │                                         │  Event Store  │
│ (codex, etc)  │                                         │   (SQLite)    │
└───────────────┘                                         └───────────────┘
```

### Components

- **Event Store** (`src/core/event_store.rs`): SQLite-backed persistent event store
- **State Machine** (`src/core/state_machine.rs`): Validates and executes state transitions
- **Task Queue** (`src/core/task_queue.rs`): HTTP pull task queue with capability matching
- **Dispatcher** (`src/dispatch.rs`): Periodic dispatch loop for `ssh_cli` tasks
- **SshExecutor** (`src/execution/mod.rs`): Executes agent CLIs via SSH or a local subprocess
- **Forgejo Client** (`src/integrations/forgejo.rs`): Forgejo API integration and webhook handling
- **API Handlers** (`src/api.rs`): REST API for agents and task management
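
To make capability matching in the HTTP pull queue concrete, here is a minimal in-memory sketch. It assumes a task is eligible for an agent when every label it requires appears in that agent's capability list, and that the oldest eligible task is handed out first. The `QueuedTask` type and the FIFO policy are hypothetical, and which issue labels the real `TaskQueue` treats as requirements may differ.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical queued task: an id plus the labels an agent must support.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QueuedTask {
    pub id: u64,
    pub required: Vec<String>,
}

/// In-memory stand-in for the HTTP pull queue, oldest task first.
#[derive(Default)]
pub struct TaskQueue {
    pending: VecDeque<QueuedTask>,
}

impl TaskQueue {
    pub fn enqueue(&mut self, task: QueuedTask) {
        self.pending.push_back(task);
    }

    /// Remove and return the oldest task whose required labels are all
    /// present in the caller's capabilities; tasks that don't match stay queued.
    pub fn dequeue(&mut self, capabilities: &[&str]) -> Option<QueuedTask> {
        let caps: HashSet<&str> = capabilities.iter().copied().collect();
        let pos = self
            .pending
            .iter()
            .position(|t| t.required.iter().all(|l| caps.contains(l.as_str())))?;
        self.pending.remove(pos)
    }
}
```

Skipping over non-matching tasks, rather than only inspecting the head of the queue, keeps one unservable task from blocking every agent behind it.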

## Quick Start

### Prerequisites

- Rust toolchain with 2024 edition support (Rust 1.85 or later)
- cargo-zigbuild and Zig (for cross-compilation)
- A Forgejo instance (or a compatible forge)

### Development Setup

```bash
# Clone the repository
git clone https://git.0x08.org/zer4tul/agent-fleet.git
cd agent-fleet

# Copy the example config
cp config.example.toml config.toml

# Edit config.toml with your settings:
# - Forgejo URL and token
# - Webhook secret
# - Host configurations for ssh_cli mode
```

### Local Development

```bash
# Run tests
cargo test

# Run the server
cargo run

# Or with a custom bind address and port
cargo run -- --bind 127.0.0.1 --port 9090
```

### Building for aarch64

```bash
# Install cargo-zigbuild if not already installed
cargo install cargo-zigbuild

# Install the Rust standard library for the target
rustup target add aarch64-unknown-linux-gnu

# Cross-compile for aarch64-unknown-linux-gnu
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# Binary will be at: target/aarch64-unknown-linux-gnu/release/agent-fleet
```

## Configuration

Agent Fleet is configured via a TOML file. See `config.example.toml` for a complete example.

### Server Settings

```toml
[server]
bind = "0.0.0.0"  # Listen address
port = 9090       # HTTP port
```

### Forgejo Integration

```toml
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"                # Forgejo API token
webhook_secret = "your-webhook-secret"  # Shared secret for webhook validation
```

### Orchestrator Settings

```toml
[orchestrator]
db_path = "data/agent-fleet.db" # SQLite database path
heartbeat_interval_secs = 60    # Agent heartbeat interval
heartbeat_timeout_threshold = 3 # Missed heartbeats before an agent is marked offline
task_timeout_secs = 1800        # Default task timeout (30 min)
default_max_retries = 2         # Max retry attempts
dispatch_interval_secs = 10     # Dispatch loop interval
# http_pull_token = "optional-bearer-token" # Auth for http_pull agents
```
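
The heartbeat and timeout settings combine into simple liveness checks. The sketch below assumes an agent counts as offline after `heartbeat_timeout_threshold` consecutive intervals without a heartbeat (60 s × 3 = 180 s with the values above) and that a task times out once its runtime strictly exceeds `task_timeout_secs`. The `OrchestratorConfig` struct mirrors only these three keys, and the exact comparisons in `src/core/timeout.rs` may differ.

```rust
use std::time::Duration;

/// Subset of the `[orchestrator]` settings (hypothetical struct).
pub struct OrchestratorConfig {
    pub heartbeat_interval_secs: u64,
    pub heartbeat_timeout_threshold: u64,
    pub task_timeout_secs: u64,
}

impl OrchestratorConfig {
    /// Assumed rule: offline once `heartbeat_timeout_threshold` full
    /// heartbeat intervals have passed since the last heartbeat.
    pub fn agent_is_offline(&self, since_last_heartbeat: Duration) -> bool {
        let limit = Duration::from_secs(
            self.heartbeat_interval_secs * self.heartbeat_timeout_threshold,
        );
        since_last_heartbeat >= limit
    }

    /// A running task times out once it runs longer than `task_timeout_secs`.
    pub fn task_timed_out(&self, running_for: Duration) -> bool {
        running_for > Duration::from_secs(self.task_timeout_secs)
    }
}
```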

### SSH CLI Hosts

Configure remote hosts for `ssh_cli` execution:

```toml
[[hosts]]
host_id = "host-worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/deploy/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
  { agent_type = "claude-code", max_concurrency = 1, capabilities = ["code:rust"] },
]

# For local execution (same machine as orchestrator)
[[hosts]]
host_id = "local"
hostname = "localhost"
ssh_user = "runner"
work_dir = "/tmp/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 1, capabilities = ["code:rust"] },
]
```
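
Given host entries like these, the dispatcher has to find an agent with the right capabilities and a free concurrency slot. This hypothetical sketch flattens each `agents` entry into an `AgentSlot` with a `running` counter and picks the least-loaded match. The `running` bookkeeping and the tie-breaking rule (first configured agent wins) are assumptions, not the behavior of `src/dispatch.rs`.

```rust
/// One `agents = [...]` entry from a `[[hosts]]` table, plus a hypothetical
/// `running` counter the dispatcher would maintain.
#[derive(Debug)]
pub struct AgentSlot {
    pub host_id: String,
    pub agent_type: String,
    pub max_concurrency: u32,
    pub capabilities: Vec<String>,
    pub running: u32,
}

/// Pick the least-loaded agent that has every required capability and a
/// free slot; on ties, the earliest entry in `agents` is chosen.
pub fn pick_agent<'a>(agents: &'a mut [AgentSlot], required: &[&str]) -> Option<&'a mut AgentSlot> {
    agents
        .iter_mut()
        .filter(|a| a.running < a.max_concurrency)
        .filter(|a| {
            required
                .iter()
                .all(|r| a.capabilities.iter().any(|c| c.as_str() == *r))
        })
        .min_by_key(|a| a.running)
}
```

With the example config above, a `code:python` task can only go to `codex-cli` on `host-worker-01`, since no other agent lists that capability.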

## API Summary

Agent Fleet exposes a REST API for agent registration, task management, and webhooks.

### Agent Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/agents/register` | POST | Register or update an agent |
| `/api/v1/agents/heartbeat` | POST | Update agent heartbeat |
| `/api/v1/agents/deregister` | POST | Deregister an agent |
| `/api/v1/agents` | GET | List agents with filters |

### Task Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/tasks` | GET | List tasks |
| `/api/v1/tasks/{task_id}` | GET | Get task details |
| `/api/v1/tasks/dequeue` | POST | Dequeue a task (`http_pull` only) |
| `/api/v1/tasks/{task_id}/status` | POST | Update task status (`http_pull` only) |
| `/api/v1/tasks/{task_id}/complete` | POST | Complete a task with a receipt |
| `/api/v1/tasks/{task_id}/retry` | POST | Retry a failed task |

### Other Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/healthz` | GET | Health check |
| `/api/v1/webhooks/forgejo` | POST | Forgejo webhook handler |
| `/api/v1/receipts` | POST | Submit a task receipt |

For detailed API documentation, see [docs/agent-api-reference.md](docs/agent-api-reference.md).
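
An `http_pull` agent talks to these endpoints over plain HTTP. The sketch below formats and sends a raw HTTP/1.1 POST using only the standard library. It assumes JSON request bodies and that the optional `http_pull_token` is sent as an `Authorization: Bearer` header; the `{}` body used in the examples is a placeholder, since the actual request schemas are documented in docs/agent-api-reference.md.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Format a raw HTTP/1.1 POST with a JSON body.
/// Assumption: a configured `http_pull_token` is sent as a Bearer token.
pub fn build_post(host: &str, path: &str, token: Option<&str>, json_body: &str) -> String {
    let mut req = format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\n",
        path, host
    );
    if let Some(t) = token {
        req.push_str(&format!("Authorization: Bearer {}\r\n", t));
    }
    // Content-Length counts bytes, which is what `str::len` returns.
    req.push_str(&format!(
        "Content-Length: {}\r\nConnection: close\r\n\r\n{}",
        json_body.len(),
        json_body
    ));
    req
}

/// Send a request and read the whole response. `Connection: close` makes the
/// server end the stream, so `read_to_string` returns once the reply is complete.
pub fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

For example, `send("127.0.0.1:9090", &build_post("127.0.0.1:9090", "/api/v1/tasks/dequeue", None, "{}"))` would POST to a server started locally with the default port. A real agent would more likely use an HTTP client crate; the raw form just shows what goes over the wire.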

## Deployment

See [docs/deployment.md](docs/deployment.md) for detailed deployment instructions, including:

- Cross-compilation with cargo-zigbuild
- systemd service configuration
- Caddy reverse proxy setup

## Architecture Details

For in-depth architectural information, see [docs/architecture.md](docs/architecture.md), which covers:

- Dual execution model comparison
- Dispatch loop internals
- Task lifecycle and state machine
- Forgejo integration flow

## Agent Integration

See [docs/agent-onboarding-guide.md](docs/agent-onboarding-guide.md) for:

- Choosing between `ssh_cli` and `http_pull` modes
- Agent registration and heartbeat
- Task dequeue and completion workflows

## Development

### Running Tests

```bash
cargo test
```

### Code Style

- Rust 2024 edition
- `thiserror` for error types
- `serde` for serialization
- All DB operations go through `EventStore`
- `Arc<Mutex<EventStore>>` for shared state
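
The `Arc<Mutex<EventStore>>` convention lets several concurrent parts of the server share one store. The sketch below illustrates the pattern with a hypothetical in-memory `EventStore` and plain OS threads; the real store is SQLite-backed, and its actual API is not shown here.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// In-memory stand-in for the SQLite-backed `EventStore` (hypothetical API).
#[derive(Default)]
pub struct EventStore {
    events: Vec<(u64, String)>, // (task_id, event name), in append order
}

impl EventStore {
    pub fn append(&mut self, task_id: u64, event: &str) {
        self.events.push((task_id, event.to_string()));
    }

    pub fn events_for(&self, task_id: u64) -> Vec<String> {
        self.events
            .iter()
            .filter(|(id, _)| *id == task_id)
            .map(|(_, e)| e.clone())
            .collect()
    }
}

/// Spawn one worker per task id; each appends through its own clone of the
/// shared handle, then all workers are joined.
pub fn record_concurrently(store: Arc<Mutex<EventStore>>, task_ids: &[u64]) {
    let handles: Vec<_> = task_ids
        .iter()
        .map(|&id| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // The guard is a temporary, so the lock is released at the
                // end of this statement.
                store.lock().unwrap().append(id, "dispatched");
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

Keeping each critical section to a single append keeps lock contention low; holding the guard across slow work (such as an SSH call) would serialize every other user of the store.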

### Project Structure

```
src/
├── main.rs               # Entry point, server setup
├── config.rs             # TOML configuration
├── api.rs                # HTTP API handlers
├── dispatch.rs           # Task dispatch loop
├── execution/            # SSH execution
├── integrations/         # Forgejo client
├── adapters/             # Agent adapter interface
└── core/                 # Business logic
    ├── models.rs         # Data models
    ├── event_store.rs    # Event sourcing
    ├── state_machine.rs  # State transitions
    ├── task_queue.rs     # HTTP pull queue
    ├── timeout.rs        # Timeout checker
    └── retry.rs          # Retry policy
```

## License

MIT