# Agent Fleet Platform

Agent Fleet is a multi-agent orchestration system built with Rust, designed to coordinate AI agents for task execution across distributed environments. It integrates with [Forgejo](https://forgejo.org/) for task management and supports dual execution modes (SSH/CLI and HTTP pull).

## Overview

Agent Fleet acts as the central orchestrator that:

- Receives tasks from Forgejo Issues via webhooks
- Dispatches tasks to agents based on capabilities and load
- Tracks task lifecycle through a state machine
- Validates receipts and artifacts (e.g., PRs)
- Manages agent heartbeats and health

### Key Features

- **Dual Execution Modes**: `ssh_cli` (orchestrator-initiated) and `http_pull` (agent-initiated)
- **Event-Sourced State**: All task state transitions are recorded as events
- **Capability-Based Dispatch**: Tasks are routed to agents based on label matching
- **Auto-Retry**: Failed tasks can be retried up to `max_retries` times
- **Timeout Enforcement**: Tasks are marked `failed` if they exceed `task_timeout_secs`
- **Forgejo Integration**: Automatic task creation from labeled issues, PR lifecycle tracking

## Architecture

```
┌─────────────┐                  ┌─────────────────┐
│   Forgejo   │◄──webhook────────┤   Agent Fleet   │
│  (Issues)   │                  │  Orchestrator   │
└─────────────┘                  └───────┬─────────┘
                                         │
              ┌──────────────────────────┼──────────────────────────┐
              │                          │                          │
              ▼                          ▼                          ▼
      ┌───────────────┐          ┌───────────────┐          ┌───────────────┐
      │ ssh_cli Hosts │          │  http_pull    │          │  Dispatcher   │
      │ (SSH/Local)   │          │    Agents     │          │     Loop      │
      └───────────────┘          └───────────────┘          └───────────────┘
              │                                                     │
              ▼                                                     ▼
      ┌───────────────┐                                     ┌───────────────┐
      │  Agent CLIs   │                                     │  Event Store  │
      │ (codex, etc)  │                                     │   (SQLite)    │
      └───────────────┘                                     └───────────────┘
```

### Components

- **Event Store** (`src/core/event_store.rs`): SQLite-backed persistent event store
- **State Machine** (`src/core/state_machine.rs`): Validates and executes state transitions
- **Task Queue** (`src/core/task_queue.rs`): HTTP pull task queue with capability matching
- **Dispatcher** (`src/dispatch.rs`): Periodic dispatch loop for `ssh_cli` tasks
- **SshExecutor** (`src/execution/mod.rs`): Executes agent CLIs via SSH or local subprocess
- **Forgejo Client** (`src/integrations/forgejo.rs`): Forgejo API integration and webhook handling
- **API Handlers** (`src/api.rs`): REST API for agents and task management

## Quick Start

### Prerequisites

- Rust 2024 edition
- cargo-zigbuild (for cross-compilation)
- Forgejo instance (or compatible forge)

### Development Setup

```bash
# Clone the repository
git clone https://git.0x08.org/zer4tul/agent-fleet.git
cd agent-fleet

# Copy example config
cp config.example.toml config.toml

# Edit config.toml with your settings
# - Forgejo URL and token
# - Webhook secret
# - Host configurations for ssh_cli mode
```

### Local Development

```bash
# Run tests
cargo test

# Run the server
cargo run

# Or with custom bind/port
cargo run -- --bind 127.0.0.1 --port 9090
```

### Building for aarch64

```bash
# Install cargo-zigbuild if not already installed
cargo install cargo-zigbuild

# Cross-compile for aarch64-unknown-linux-gnu
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# Binary will be at: target/aarch64-unknown-linux-gnu/release/agent-fleet
```

## Configuration

Agent Fleet is configured through a TOML file. See `config.example.toml` for a complete example.

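
To make the timing-related keys of the `[orchestrator]` section concrete, the following is a minimal sketch of how they could combine. The `Timing` struct and its methods are hypothetical and not taken from the project's source. It also assumes that an agent counts as offline once `heartbeat_timeout_threshold` full intervals pass without a heartbeat, and that a task times out only after strictly exceeding `task_timeout_secs`.

```rust
use std::time::Duration;

/// Hypothetical holder for the timing keys of `[orchestrator]`.
/// The project's real configuration type may look different.
#[derive(Debug, Clone, Copy)]
pub struct Timing {
    pub heartbeat_interval_secs: u64,
    pub heartbeat_timeout_threshold: u32,
    pub task_timeout_secs: u64,
    pub default_max_retries: u32,
}

impl Timing {
    /// Values copied from the example configuration in this README.
    pub fn example() -> Self {
        Timing {
            heartbeat_interval_secs: 60,
            heartbeat_timeout_threshold: 3,
            task_timeout_secs: 1800,
            default_max_retries: 2,
        }
    }

    /// Assumption: offline after `threshold` consecutive missed intervals.
    pub fn offline_after(&self) -> Duration {
        Duration::from_secs(
            self.heartbeat_interval_secs * u64::from(self.heartbeat_timeout_threshold),
        )
    }

    /// True once the silence since the last heartbeat reaches the offline window.
    pub fn is_offline(&self, since_last_heartbeat: Duration) -> bool {
        since_last_heartbeat >= self.offline_after()
    }

    /// Assumption: a task is timed out only when it strictly exceeds the limit.
    pub fn is_timed_out(&self, running_for: Duration) -> bool {
        running_for > Duration::from_secs(self.task_timeout_secs)
    }

    /// A failed task may be retried while it has used fewer than the maximum retries.
    pub fn may_retry(&self, retries_so_far: u32) -> bool {
        retries_so_far < self.default_max_retries
    }
}
```

With the example values, `Timing::example().offline_after()` is 180 seconds (60 s × 3), and a task that has already been retried twice is no longer eligible for another retry.
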
### Server Settings

```toml
[server]
bind = "0.0.0.0"  # Listen address
port = 9090       # HTTP port
```

### Forgejo Integration

```toml
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"                # Forgejo API token
webhook_secret = "your-webhook-secret"  # Shared secret for webhook validation
```

### Orchestrator Settings

```toml
[orchestrator]
db_path = "data/agent-fleet.db"   # SQLite database path
heartbeat_interval_secs = 60      # Agent heartbeat interval
heartbeat_timeout_threshold = 3   # Missed heartbeats before offline
task_timeout_secs = 1800          # Default task timeout (30 min)
default_max_retries = 2           # Max retry attempts
dispatch_interval_secs = 10       # Dispatch loop interval
# http_pull_token = "optional-bearer-token"  # Auth for http_pull agents
```

### SSH CLI Hosts

Configure remote hosts for `ssh_cli` execution:

```toml
[[hosts]]
host_id = "host-worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/deploy/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
  { agent_type = "claude-code", max_concurrency = 1, capabilities = ["code:rust"] },
]

# For local execution (same machine as orchestrator)
[[hosts]]
host_id = "local"
hostname = "localhost"
ssh_user = "runner"
work_dir = "/tmp/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 1, capabilities = ["code:rust"] },
]
```

## API Summary

Agent Fleet exposes a REST API for agent registration, task management, and webhooks.

### Agent Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/agents/register` | POST | Register or update an agent |
| `/api/v1/agents/heartbeat` | POST | Update agent heartbeat |
| `/api/v1/agents/deregister` | POST | Deregister an agent |
| `/api/v1/agents` | GET | List agents with filters |

### Task Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/tasks` | GET | List tasks |
| `/api/v1/tasks/{task_id}` | GET | Get task details |
| `/api/v1/tasks/dequeue` | POST | Dequeue task (http_pull only) |
| `/api/v1/tasks/{task_id}/status` | POST | Update task status (http_pull only) |
| `/api/v1/tasks/{task_id}/complete` | POST | Complete task with receipt |
| `/api/v1/tasks/{task_id}/retry` | POST | Retry failed task |

### Other Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/healthz` | GET | Health check |
| `/api/v1/webhooks/forgejo` | POST | Forgejo webhook handler |
| `/api/v1/receipts` | POST | Submit task receipt |

For detailed API documentation, see [docs/agent-api-reference.md](docs/agent-api-reference.md).

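
Both the dispatch loop and the dequeue endpoint depend on capability matching, so a small sketch may help illustrate the idea. Everything here is an assumption made for illustration: the `AgentSlot` type, the `pick_agent` function, the rule that an agent must hold every required capability, and the least-loaded tie-break are not taken from the project's source. Only the field names `agent_type`, `capabilities`, and `max_concurrency` come from the host configuration in this README.

```rust
use std::collections::HashSet;

/// Hypothetical view of one configured agent and its current load.
#[derive(Debug, Clone)]
pub struct AgentSlot {
    pub agent_type: String,
    pub capabilities: HashSet<String>,
    pub max_concurrency: usize,
    pub running: usize,
}

impl AgentSlot {
    pub fn new(agent_type: &str, caps: &[&str], max_concurrency: usize, running: usize) -> Self {
        AgentSlot {
            agent_type: agent_type.to_string(),
            capabilities: caps.iter().map(|c| c.to_string()).collect(),
            max_concurrency,
            running,
        }
    }

    /// An agent can take work while it runs fewer tasks than its limit.
    fn has_capacity(&self) -> bool {
        self.running < self.max_concurrency
    }

    /// Assumption: the agent must hold every capability the task asks for.
    fn covers(&self, required: &[&str]) -> bool {
        required.iter().all(|c| self.capabilities.contains(*c))
    }
}

/// Picks the least-loaded agent that has free capacity and covers all
/// required capabilities. Ties go to the agent listed first.
pub fn pick_agent<'a>(agents: &'a [AgentSlot], required: &[&str]) -> Option<&'a AgentSlot> {
    agents
        .iter()
        .filter(|a| a.has_capacity() && a.covers(required))
        .min_by_key(|a| a.running)
}
```

Given a `codex-cli` slot with capabilities `code:rust` and `code:python` that is running one of two allowed tasks, and an idle `claude-code` slot with `code:rust`, a task requiring `code:rust` would go to `claude-code` under this sketch, while a task requiring `code:python` would go to `codex-cli`.
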
## Deployment

See [docs/deployment.md](docs/deployment.md) for detailed deployment instructions including:

- Cross-compilation with cargo-zigbuild
- Systemd service configuration
- Caddy reverse proxy setup

## Architecture Details

For in-depth architectural information, see [docs/architecture.md](docs/architecture.md) covering:

- Dual execution model comparison
- Dispatch loop internals
- Task lifecycle and state machine
- Forgejo integration flow

## Agent Integration

See [docs/agent-onboarding-guide.md](docs/agent-onboarding-guide.md) for:

- Choosing between `ssh_cli` and `http_pull` modes
- Agent registration and heartbeat
- Task dequeue and completion workflows

## Development

### Running Tests

```bash
cargo test
```

### Code Style

- Rust 2024 edition
- `thiserror` for error types
- `serde` for serialization
- All DB operations go through `EventStore`
- `Arc<…>` for shared state

### Project Structure

```
src/
├── main.rs              # Entry point, server setup
├── config.rs            # TOML configuration
├── api.rs               # HTTP API handlers
├── dispatch.rs          # Task dispatch loop
├── execution/           # SSH execution
├── integrations/        # Forgejo client
├── adapters/            # Agent adapter interface
└── core/                # Business logic
    ├── models.rs        # Data models
    ├── event_store.rs   # Event sourcing
    ├── state_machine.rs # State transitions
    ├── task_queue.rs    # HTTP pull queue
    ├── timeout.rs       # Timeout checker
    └── retry.rs         # Retry policy
```

## License

MIT