fix: agent capability matching in dispatch — only agent: labels are requirements

Previous bug: only code:* and review labels were checked, so agent:document,
agent:tests etc. were never filtered. Any agent could pick up any task.

Now: labels with agent: prefix are matched against agent capabilities.
Other labels are treated as metadata. Includes regression test.
Zer4tul 2026-05-12 23:51:08 +08:00
parent 1f351a1734
commit a18cb2824e
6 changed files with 1271 additions and 8 deletions

docs/deployment.md Normal file

@ -0,0 +1,465 @@
# Agent Fleet Deployment Guide
This guide covers deploying Agent Fleet Orchestrator to production, including cross-compilation, systemd service setup, and reverse proxy configuration with Caddy.
## Prerequisites
- Development machine with Rust and cargo
- Target server (e.g., aarch64 Linux)
- cargo-zigbuild and a Zig installation for cross-compilation
- Caddy web server (optional, for reverse proxy)
## Building with cargo-zigbuild
cargo-zigbuild enables cross-compilation by using Zig as the C toolchain, avoiding the need for target-specific cross-compilation toolchains.
### Installing cargo-zigbuild
```bash
cargo install cargo-zigbuild
```
### Cross-Compiling for aarch64-unknown-linux-gnu
```bash
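# Add the Rust standard library for the target (once, if Rust is managed by rustup)
rustup target add aarch64-unknown-linux-gnu
# Build the release binary
cargo zigbuild --target aarch64-unknown-linux-gnu --release
# The binary will be at:
# target/aarch64-unknown-linux-gnu/release/agent-fleet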
```
### Other Target Architectures
```bash
# For x86_64 Linux (native build on an x86_64 Linux host)
cargo build --release
# For aarch64 (ARM64 servers)
cargo zigbuild --target aarch64-unknown-linux-gnu --release
# For musl (static linking)
cargo zigbuild --target x86_64-unknown-linux-musl --release
```
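The target triple can also be derived from what `uname -m` reports on the target server. The helper below is a minimal sketch; the name `target_for_arch` is hypothetical (not part of Agent Fleet) and it only covers the two glibc targets used in this guide:

```bash
# Hypothetical helper: map `uname -m` output to a Rust target triple.
target_for_arch() {
    case "$1" in
        aarch64|arm64) echo "aarch64-unknown-linux-gnu" ;;
        x86_64|amd64)  echo "x86_64-unknown-linux-gnu" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Example: build for whatever the target server reports.
# arch=$(ssh user@target-host uname -m)
# cargo zigbuild --target "$(target_for_arch "$arch")" --release
```

Unknown architectures return a non-zero status, so a build script can stop before invoking cargo with a wrong target.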
### Building for Local Testing
For local development on the same architecture:
```bash
cargo build --release
```
## Deployment to aarch64
### 1. Transfer the Binary
After building, transfer the binary to your target server:
```bash
# Using scp (staging in /tmp, since /opt/agent-fleet is only created in the next step)
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/
# Or using rsync
rsync -avz target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/
```
### 2. Set Up Directory Structure
```bash
ssh user@target-host
# Create directory and user
sudo useradd -r -s /bin/false agent-fleet
sudo mkdir -p /opt/agent-fleet/{bin,data,config}
sudo chown -R agent-fleet:agent-fleet /opt/agent-fleet
# Copy binary
sudo cp /path/to/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
```
### 3. Create Configuration File
Create `/opt/agent-fleet/config/config.toml`:
```toml
[server]
bind = "127.0.0.1"
port = 9090
[forgejo]
url = "https://git.0x08.org"
token = "your-forgejo-api-token"
webhook_secret = "your-webhook-secret"
[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10
# Configure remote hosts for ssh_cli execution
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
{ agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
```
### 4. Create Environment Variables (Optional)
Sensitive values could be kept in a `.env.local` file instead of the config, but referencing them from the config is not currently supported:
```bash
# /opt/agent-fleet/.env.local
FORGEJO_TOKEN="your-token"
WEBHOOK_SECRET="your-secret"
```
Until it is, set `token` and `webhook_secret` directly in `config.toml`.
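Because the token and webhook secret currently live in `config.toml`, it is worth checking that the file is accessible only by its owner. A minimal sketch, assuming GNU coreutils `stat`; the helper name `is_owner_only` is hypothetical:

```bash
# Hypothetical helper: succeed only if the file mode is 600 or 400,
# i.e. no group or world access.
is_owner_only() {
    local mode
    mode=$(stat -c '%a' "$1") || return 1
    [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

# Example on the server:
# sudo chown agent-fleet:agent-fleet /opt/agent-fleet/config/config.toml
# sudo chmod 600 /opt/agent-fleet/config/config.toml
# is_owner_only /opt/agent-fleet/config/config.toml && echo "config permissions ok"
```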
## Systemd Service
### Create Systemd Service File
Create `/etc/systemd/system/agent-fleet.service`:
```ini
[Unit]
Description=Agent Fleet Orchestrator
After=network.target
Documentation=https://git.0x08.org/zer4tul/agent-fleet
[Service]
Type=simple
User=agent-fleet
Group=agent-fleet
WorkingDirectory=/opt/agent-fleet
ExecStart=/opt/agent-fleet/bin/agent-fleet --config /opt/agent-fleet/config/config.toml
Restart=always
RestartSec=10
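# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
# ProtectHome=true hides /home from the service. If ssh_key_path points
# into /home, use ProtectHome=read-only or keep the key under /opt/agent-fleet.
ProtectHome=true
ReadWritePaths=/opt/agent-fleet/data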
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=agent-fleet
[Install]
WantedBy=multi-user.target
```
### Enable and Start the Service
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable to start on boot
sudo systemctl enable agent-fleet
# Start the service
sudo systemctl start agent-fleet
# Check status
sudo systemctl status agent-fleet
# View logs
sudo journalctl -u agent-fleet -f
```
### Management Commands
```bash
# Restart
sudo systemctl restart agent-fleet
# Stop
sudo systemctl stop agent-fleet
# Disable
sudo systemctl disable agent-fleet
```
## Caddy Reverse Proxy
Using Caddy as a reverse proxy provides:
- Automatic HTTPS with Let's Encrypt
- Path-based routing
- Basic auth (optional)
- Request logging
### Install Caddy
```bash
# Ubuntu/Debian
sudo apt install caddy
# Or using the official package
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
```
### Configure Caddy
Create or edit `/etc/caddy/Caddyfile`:
```caddyfile
your-domain.example.com {
    reverse_proxy 127.0.0.1:9090

    # Optional: Basic authentication
    # (generate the hash with `caddy hash-password`)
    # basicauth {
    #     admin <bcrypt-hash>
    # }

    # Optional: Log requests
    log {
        output file /var/log/caddy/agent-fleet.log
        format json
    }
}

# Alternative: serve under a path prefix instead of the site root.
# handle_path strips the /agent-fleet prefix before proxying.
# your-domain.example.com {
#     handle_path /agent-fleet/* {
#         reverse_proxy 127.0.0.1:9090
#     }
# }
### Test and Reload Caddy
```bash
# Test configuration
sudo caddy validate --config /etc/caddy/Caddyfile
# Reload Caddy
sudo systemctl reload caddy
# Or restart
sudo systemctl restart caddy
```
### Verify HTTPS
After Caddy starts, it automatically provisions a TLS certificate from Let's Encrypt, provided the domain's DNS points at this server and ports 80 and 443 are reachable from the internet. Verify:
```bash
curl https://your-domain.example.com/healthz
```
## Configuration Walkthrough
### Server Configuration
```toml
[server]
bind = "127.0.0.1" # Bind to localhost (Caddy handles external traffic)
port = 9090 # Internal port
```
**Notes:**
- Use `127.0.0.1` when behind a reverse proxy
- Use `0.0.0.0` if direct access is needed
### Forgejo Configuration
```toml
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"
webhook_secret = "your-webhook-secret"
```
**Setup Steps:**
1. Generate a Forgejo API token: User Settings → Applications → Generate New Token
2. Configure a webhook in the Forgejo repo settings:
   - URL: `https://your-domain.example.com/api/v1/webhooks/forgejo`
   - Secret: same as `webhook_secret`
   - Events: Issues, Pull Requests, Push
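The shared secret can be generated with `openssl`, and the same tool can reproduce the HMAC-SHA256 signature that Forgejo attaches to each delivery, which helps when debugging a mismatch. This is a sketch under assumptions: the helper names are hypothetical, and how Agent Fleet validates the signature (including which header it reads) is not documented here.

```bash
# Generate a random 32-byte secret, hex-encoded, for webhook_secret.
new_webhook_secret() {
    openssl rand -hex 32
}

# Compute the hex HMAC-SHA256 of a payload using the shared secret.
sign_payload() {
    local secret="$1" payload="$2"
    # openssl prints "...(stdin)= <hex>"; keep only the last field.
    printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}

# Example: send a signed test request (the header name is an assumption):
# body='{"test":true}'
# curl -X POST -H "Content-Type: application/json" \
#      -H "X-Forgejo-Signature: $(sign_payload "$SECRET" "$body")" \
#      -d "$body" https://your-domain.example.com/api/v1/webhooks/forgejo
```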
### Orchestrator Configuration
```toml
[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10
```
**Explanation:**
- `heartbeat_interval_secs`: How often agents should send heartbeats
- `heartbeat_timeout_threshold`: How many missed heartbeats before marking agent offline (3 × 60 = 180 seconds)
- `task_timeout_secs`: Default timeout for tasks (1800 seconds = 30 minutes)
- `default_max_retries`: How many times to retry failed tasks
- `dispatch_interval_secs`: How often the dispatch loop checks for new `ssh_cli` tasks
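The derived timings can be checked with plain shell arithmetic. The variable names below mirror the config keys; the script itself is only an illustration and reads nothing from `config.toml`:

```bash
heartbeat_interval_secs=60
heartbeat_timeout_threshold=3
task_timeout_secs=1800

# An agent is marked offline after this many seconds without a heartbeat.
offline_after_secs=$((heartbeat_interval_secs * heartbeat_timeout_threshold))
# Task timeout expressed in minutes.
task_timeout_mins=$((task_timeout_secs / 60))

echo "agent marked offline after ${offline_after_secs}s"
echo "task timeout: ${task_timeout_mins} min"
```

With the values shown this prints 180 seconds and 30 minutes, matching the explanation above.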
### Host Configuration for SSH CLI
```toml
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
{ agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
```
**SSH Key Setup:**
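1. Generate an SSH key on the orchestrator server (a system user created with `useradd -r` has no home directory yet, so create it first):
```bash
sudo install -d -m 700 -o agent-fleet -g agent-fleet /home/agent-fleet /home/agent-fleet/.ssh
sudo -u agent-fleet ssh-keygen -t ed25519 -f /home/agent-fleet/.ssh/id_ed25519
```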
2. Add public key to remote host's `~/.ssh/authorized_keys`
3. Test SSH connection:
```bash
sudo -u agent-fleet ssh -p 22 deploy@192.168.1.100
```
**Agent CLI Setup on Remote Host:**
1. Ensure agent CLI is in `$PATH`
2. Verify: `ssh deploy@host "which codex"`
## Troubleshooting
### Service Won't Start
```bash
# Check service status
sudo systemctl status agent-fleet
# View logs
sudo journalctl -u agent-fleet -n 100
# Check file permissions
ls -la /opt/agent-fleet/
```
### Webhook Not Received
```bash
# Check Caddy logs
sudo journalctl -u caddy -f
# Check agent-fleet logs for webhook errors
sudo journalctl -u agent-fleet | grep webhook
# Verify webhook secret matches
# The secret in config.toml must match Forgejo webhook secret
```
### SSH Connection Fails
```bash
# Test SSH as the agent-fleet user
sudo -u agent-fleet ssh -v deploy@host
# Check SSH key path exists
sudo -u agent-fleet ls -la /home/agent-fleet/.ssh/
# Verify key permissions
sudo -u agent-fleet chmod 600 /home/agent-fleet/.ssh/id_ed25519
```
### Database Lock Issues
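If the database becomes locked (e.g., after a crash), move it aside and let the service create a fresh one. This discards the state stored in the database; the old data remains available in the backup file.
```bash
# Stop the service
sudo systemctl stop agent-fleet
# Back up the old database by moving it aside
sudo mv /opt/agent-fleet/data/agent-fleet.db /opt/agent-fleet/data/agent-fleet.db.backup
# Restart (will create new database)
sudo systemctl start agent-fleet
```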
## Monitoring
### Health Check
```bash
curl http://localhost:9090/healthz
# Expected output: "ok"
```
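After a start or restart the endpoint may take a moment to come up, so a polling loop is more reliable than a single request. A minimal sketch; the helper name `wait_for_health` and its defaults are hypothetical:

```bash
# Hypothetical helper: poll a health URL until it answers or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECS]
wait_for_health() {
    local url="$1" attempts="${2:-10}" delay="${3:-2}" i
    for i in $(seq 1 "$attempts"); do
        # -f: treat HTTP errors as failure; --max-time: do not hang forever
        if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example:
# wait_for_health http://localhost:9090/healthz 15 2 && echo "agent-fleet is up"
```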
### Check Task Queue
```bash
curl 'http://localhost:9090/api/v1/tasks?status=running'
```
### Check Agents
```bash
curl 'http://localhost:9090/api/v1/agents?status=online'
```
### Log Monitoring
```bash
# Follow logs
sudo journalctl -u agent-fleet -f
# Search for errors
sudo journalctl -u agent-fleet | grep -i error
```
## Updates
### Updating the Binary
```bash
# Build new version locally
cargo zigbuild --target aarch64-unknown-linux-gnu --release
# Transfer to server
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/tmp/
# On the server, stop and replace
sudo systemctl stop agent-fleet
sudo cp /tmp/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
sudo systemctl start agent-fleet
```
### Zero-Downtime Updates
For production deployments with minimal downtime:
```bash
# Upload new binary alongside old one
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/opt/agent-fleet/bin/agent-fleet.new
# On server
sudo mv /opt/agent-fleet/bin/agent-fleet.new /opt/agent-fleet/bin/agent-fleet
sudo systemctl restart agent-fleet
```
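`systemctl restart` stops the old process and starts the new one, so expect a brief interruption rather than true zero downtime. The advantage over the previous procedure is that the binary is swapped with a single rename and the service is down only for the restart itself.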
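To make a rollback possible, the swap can keep a copy of the previous binary. The helper below is a sketch, not part of Agent Fleet; `swap_binary` and the `.prev` suffix are hypothetical names:

```bash
# Hypothetical helper: install NEW over DEST, keeping the old binary as DEST.prev.
# mv within one filesystem is a rename, so DEST is never left half-written.
swap_binary() {
    local new="$1" dest="$2"
    chmod +x "$new" || return 1
    if [ -e "$dest" ]; then
        cp -p "$dest" "$dest.prev" || return 1
    fi
    mv -f "$new" "$dest"
}

# On the server (in a root shell):
# swap_binary /opt/agent-fleet/bin/agent-fleet.new /opt/agent-fleet/bin/agent-fleet
# systemctl restart agent-fleet
# Roll back if the new version misbehaves:
# mv -f /opt/agent-fleet/bin/agent-fleet.prev /opt/agent-fleet/bin/agent-fleet
# systemctl restart agent-fleet
```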