fix: agent capability matching in dispatch — only agent: labels are requirements
Previous bug: only code:* and review labels were checked, so agent:document, agent:tests etc. were never filtered. Any agent could pick up any task. Now: labels with agent: prefix are matched against agent capabilities. Other labels are treated as metadata. Includes regression test.
parent 1f351a1734
commit a18cb2824e
6 changed files with 1271 additions and 8 deletions
261
README.md
Normal file
@@ -0,0 +1,261 @@
# Agent Fleet Platform

Agent Fleet is a multi-agent orchestration system built with Rust, designed to coordinate AI agents for task execution across distributed environments. It integrates with [Forgejo](https://forgejo.org/) for task management and supports dual execution modes (SSH/CLI and HTTP pull).

## Overview

Agent Fleet acts as the central orchestrator that:
- Receives tasks from Forgejo Issues via webhooks
- Dispatches tasks to agents based on capabilities and load
- Tracks task lifecycle through a state machine
- Validates receipts and artifacts (e.g., PRs)
- Manages agent heartbeats and health

### Key Features

- **Dual Execution Modes**: `ssh_cli` (orchestrator-initiated) and `http_pull` (agent-initiated)
- **Event-Sourced State**: All task state transitions are recorded as events
- **Capability-Based Dispatch**: Tasks are routed to agents based on label matching
- **Auto-Retry**: Failed tasks can be retried up to `max_retries` times
- **Timeout Enforcement**: Tasks are marked `failed` if they exceed `task_timeout_secs`
- **Forgejo Integration**: Automatic task creation from labeled issues, PR lifecycle tracking
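
The label-matching rule described in the commit message (only `agent:`-prefixed labels are requirements, everything else is metadata) could be sketched roughly as follows. This is an illustration only, not the code in `src/dispatch.rs`: the function name `agent_matches_labels` and the constant `REQUIREMENT_PREFIX` are hypothetical, and the sketch assumes capabilities are compared as exact, full label strings.

```rust
/// Prefix that marks a label as a hard requirement (taken from the commit message).
const REQUIREMENT_PREFIX: &str = "agent:";

/// Hypothetical helper: returns true when every `agent:`-prefixed label on the
/// task appears in the agent's capability list.
/// Labels without the prefix are treated as metadata and never block a match.
/// Assumption: capabilities are compared as exact, full label strings.
pub fn agent_matches_labels(task_labels: &[&str], capabilities: &[&str]) -> bool {
    task_labels
        .iter()
        .filter(|label| label.starts_with(REQUIREMENT_PREFIX))
        .all(|required| capabilities.contains(required))
}
```

Under these assumptions, a task labeled `agent:tests` and `priority:high` matches an agent whose capabilities include `agent:tests`, while a task labeled `agent:document` does not match that agent. A task with no `agent:` labels matches any agent.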

## Architecture

```
┌─────────────┐                  ┌─────────────────┐
│   Forgejo   │◄──webhook────────┤  Agent Fleet    │
│  (Issues)   │                  │  Orchestrator   │
└─────────────┘                  └───────┬─────────┘
                                         │
              ┌──────────────────────────┼──────────────────────────┐
              │                          │                          │
              ▼                          ▼                          ▼
      ┌───────────────┐          ┌───────────────┐          ┌───────────────┐
      │ ssh_cli Hosts │          │   http_pull   │          │  Dispatcher   │
      │  (SSH/Local)  │          │    Agents     │          │     Loop      │
      └───────────────┘          └───────────────┘          └───────────────┘
              │                                                     │
              ▼                                                     ▼
      ┌───────────────┐                                     ┌───────────────┐
      │  Agent CLIs   │                                     │  Event Store  │
      │ (codex, etc)  │                                     │   (SQLite)    │
      └───────────────┘                                     └───────────────┘
```

### Components

- **Event Store** (`src/core/event_store.rs`): SQLite-backed persistent event store
- **State Machine** (`src/core/state_machine.rs`): Validates and executes state transitions
- **Task Queue** (`src/core/task_queue.rs`): HTTP pull task queue with capability matching
- **Dispatcher** (`src/dispatch.rs`): Periodic dispatch loop for `ssh_cli` tasks
- **SshExecutor** (`src/execution/mod.rs`): Executes agent CLIs via SSH or local subprocess
- **Forgejo Client** (`src/integrations/forgejo.rs`): Forgejo API integration and webhook handling
- **API Handlers** (`src/api.rs`): REST API for agents and task management
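
To make the interplay between the State Machine and the Event Store concrete, here is a minimal sketch of validating a transition and recording it as an event. Everything in it is an assumption for illustration: the state names (other than `failed`, which the README mentions), the allowed transitions, the `TransitionEvent` type, and the in-memory `Vec` that stands in for the SQLite-backed store.

```rust
/// Hypothetical task states; the real set lives in `src/core/models.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Queued,
    Running,
    Completed,
    Failed,
}

/// One recorded state change (a stand-in for a row in the event store).
#[derive(Debug, PartialEq, Eq)]
pub struct TransitionEvent {
    pub task_id: u64,
    pub from: TaskState,
    pub to: TaskState,
}

/// Assumed transition table: queued -> running -> completed/failed,
/// plus failed -> queued to model a retry.
pub fn is_valid_transition(from: TaskState, to: TaskState) -> bool {
    use TaskState::*;
    matches!(
        (from, to),
        (Queued, Running) | (Running, Completed) | (Running, Failed) | (Failed, Queued)
    )
}

/// Validates the transition and, only if it is allowed, appends an event to the log.
pub fn apply_transition(
    log: &mut Vec<TransitionEvent>,
    task_id: u64,
    from: TaskState,
    to: TaskState,
) -> Result<TaskState, String> {
    if !is_valid_transition(from, to) {
        return Err(format!("invalid transition {from:?} -> {to:?}"));
    }
    log.push(TransitionEvent { task_id, from, to });
    Ok(to)
}
```

Because rejected transitions never reach the log, the current state of a task can be rebuilt by replaying its events in order, which is the property an event-sourced design relies on.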

## Quick Start

### Prerequisites

- Rust 2024 edition
- cargo-zigbuild (for cross-compilation)
- Forgejo instance (or compatible forge)

### Development Setup

```bash
# Clone the repository
git clone https://git.0x08.org/zer4tul/agent-fleet.git
cd agent-fleet

# Copy example config
cp config.example.toml config.toml

# Edit config.toml with your settings
# - Forgejo URL and token
# - Webhook secret
# - Host configurations for ssh_cli mode
```

### Local Development

```bash
# Run tests
cargo test

# Run the server
cargo run

# Or with custom bind/port
cargo run -- --bind 127.0.0.1 --port 9090
```

### Building for aarch64

```bash
# Install cargo-zigbuild if not already installed
cargo install cargo-zigbuild

# Cross-compile for aarch64-unknown-linux-gnu
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# Binary will be at: target/aarch64-unknown-linux-gnu/release/agent-fleet
```

## Configuration

Agent Fleet is configured through a TOML file. See `config.example.toml` for a complete example.

### Server Settings

```toml
[server]
bind = "0.0.0.0"  # Listen address
port = 9090       # HTTP port
```

### Forgejo Integration

```toml
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"                # Forgejo API token
webhook_secret = "your-webhook-secret"  # Shared secret for webhook validation
```

### Orchestrator Settings

```toml
[orchestrator]
db_path = "data/agent-fleet.db"  # SQLite database path
heartbeat_interval_secs = 60     # Agent heartbeat interval
heartbeat_timeout_threshold = 3  # Missed heartbeats before offline
task_timeout_secs = 1800         # Default task timeout (30 min)
default_max_retries = 2          # Max retry attempts
dispatch_interval_secs = 10      # Dispatch loop interval
# http_pull_token = "optional-bearer-token"  # Auth for http_pull agents
```
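
As a rough illustration of how these settings might combine, the sketch below derives the offline, timeout, and retry decisions from the values above. The struct and method names are hypothetical, and the arithmetic is an assumption read off the comments: an agent is treated as offline once more than `heartbeat_interval_secs * heartbeat_timeout_threshold` seconds have passed without a heartbeat. The actual logic in `src/core/timeout.rs` and `src/core/retry.rs` may differ.

```rust
/// Hypothetical mirror of the `[orchestrator]` fields used in this sketch.
pub struct OrchestratorSettings {
    pub heartbeat_interval_secs: u64,
    pub heartbeat_timeout_threshold: u64,
    pub task_timeout_secs: u64,
    pub default_max_retries: u32,
}

impl OrchestratorSettings {
    /// Assumption: offline means more than `threshold` whole intervals were missed.
    pub fn agent_is_offline(&self, secs_since_last_heartbeat: u64) -> bool {
        secs_since_last_heartbeat > self.heartbeat_interval_secs * self.heartbeat_timeout_threshold
    }

    /// A task that has run longer than the timeout would be marked `failed`.
    pub fn task_timed_out(&self, secs_running: u64) -> bool {
        secs_running > self.task_timeout_secs
    }

    /// A failed task may be retried while it has used fewer than the maximum retries.
    pub fn may_retry(&self, retries_so_far: u32) -> bool {
        retries_so_far < self.default_max_retries
    }
}
```

With the example values, this reading puts the offline cutoff at 60 × 3 = 180 seconds of silence and allows a failed task two retries before it stays failed.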

### SSH CLI Hosts

Configure remote hosts for `ssh_cli` execution:

```toml
[[hosts]]
host_id = "host-worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/deploy/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
    { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust", "code:python"] },
    { agent_type = "claude-code", max_concurrency = 1, capabilities = ["code:rust"] },
]

# For local execution (same machine as orchestrator)
[[hosts]]
host_id = "local"
hostname = "localhost"
ssh_user = "runner"
work_dir = "/tmp/agent-workspace"
agents = [
    { agent_type = "codex-cli", max_concurrency = 1, capabilities = ["code:rust"] },
]
```
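
The Overview says tasks are dispatched "based on capabilities and load". One plausible reading, sketched below, is to skip agents that are already at `max_concurrency`, keep those that hold every required capability, and pick the one with the fewest running tasks. The `AgentSlot` type, the `running` counter, and the `pick_agent` function are hypothetical; the dispatcher's real selection policy is not shown in this README.

```rust
/// Hypothetical runtime view of one `agents` entry from a `[[hosts]]` block.
pub struct AgentSlot {
    pub host_id: String,
    pub agent_type: String,
    pub max_concurrency: usize,
    /// Number of tasks currently running on this agent (assumed to be tracked elsewhere).
    pub running: usize,
    pub capabilities: Vec<String>,
}

/// Picks the least-loaded agent that has spare capacity and every required capability.
/// Ties are broken in favor of the earliest slot in the list.
pub fn pick_agent<'a>(slots: &'a [AgentSlot], required: &[&str]) -> Option<&'a AgentSlot> {
    slots
        .iter()
        // Skip agents that are already at their concurrency limit.
        .filter(|slot| slot.running < slot.max_concurrency)
        // Keep only agents that advertise all required capabilities.
        .filter(|slot| {
            required
                .iter()
                .all(|req| slot.capabilities.iter().any(|cap| cap.as_str() == *req))
        })
        .min_by_key(|slot| slot.running)
}
```

Returning `None` when no agent qualifies lets the dispatch loop leave the task queued and try again on the next `dispatch_interval_secs` tick.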

## API Summary

Agent Fleet exposes a REST API for agent registration, task management, and webhooks.

### Agent Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/agents/register` | POST | Register or update an agent |
| `/api/v1/agents/heartbeat` | POST | Update agent heartbeat |
| `/api/v1/agents/deregister` | POST | Deregister an agent |
| `/api/v1/agents` | GET | List agents with filters |

### Task Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/tasks` | GET | List tasks |
| `/api/v1/tasks/{task_id}` | GET | Get task details |
| `/api/v1/tasks/dequeue` | POST | Dequeue task (http_pull only) |
| `/api/v1/tasks/{task_id}/status` | POST | Update task status (http_pull only) |
| `/api/v1/tasks/{task_id}/complete` | POST | Complete task with receipt |
| `/api/v1/tasks/{task_id}/retry` | POST | Retry failed task |

### Other Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/healthz` | GET | Health check |
| `/api/v1/webhooks/forgejo` | POST | Forgejo webhook handler |
| `/api/v1/receipts` | POST | Submit task receipt |

For detailed API documentation, see [docs/agent-api-reference.md](docs/agent-api-reference.md).

## Deployment

See [docs/deployment.md](docs/deployment.md) for detailed deployment instructions including:
- Cross-compilation with cargo-zigbuild
- Systemd service configuration
- Caddy reverse proxy setup

## Architecture Details

For in-depth architectural information, see [docs/architecture.md](docs/architecture.md) covering:
- Dual execution model comparison
- Dispatch loop internals
- Task lifecycle and state machine
- Forgejo integration flow

## Agent Integration

See [docs/agent-onboarding-guide.md](docs/agent-onboarding-guide.md) for:
- Choosing between `ssh_cli` and `http_pull` modes
- Agent registration and heartbeat
- Task dequeue and completion workflows

## Development

### Running Tests

```bash
cargo test
```

### Code Style

- Rust 2024 edition
- `thiserror` for error types
- `serde` for serialization
- All DB operations go through `EventStore`
- `Arc<Mutex<EventStore>>` for shared state

### Project Structure

```
src/
├── main.rs              # Entry point, server setup
├── config.rs            # TOML configuration
├── api.rs               # HTTP API handlers
├── dispatch.rs          # Task dispatch loop
├── execution/           # SSH execution
├── integrations/        # Forgejo client
├── adapters/            # Agent adapter interface
└── core/                # Business logic
    ├── models.rs        # Data models
    ├── event_store.rs   # Event sourcing
    ├── state_machine.rs # State transitions
    ├── task_queue.rs    # HTTP pull queue
    ├── timeout.rs       # Timeout checker
    └── retry.rs         # Retry policy
```

## License

MIT