Agent Fleet Deployment Guide
This guide covers deploying Agent Fleet Orchestrator to production, including cross-compilation, systemd service setup, and reverse proxy configuration with Caddy.
Prerequisites
- Development machine with Rust and cargo
- Target server (e.g., aarch64 Linux)
- cargo-zigbuild for cross-compilation
- Caddy web server (optional, for reverse proxy)
Building with cargo-zigbuild
cargo-zigbuild enables cross-compilation by using Zig as the C toolchain, avoiding the need for target-specific cross-compilation toolchains.
Installing cargo-zigbuild
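cargo install cargo-zigbuild
# cargo-zigbuild also needs Zig itself on PATH (install it with your package manager, or: pip install ziglang)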
Cross-Compiling for aarch64-unknown-linux-gnu
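# Install the Rust target once (assuming a rustup-managed toolchain)
rustup target add aarch64-unknown-linux-gnu
# Build the release binary
cargo zigbuild --target aarch64-unknown-linux-gnu --release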
# The binary will be at:
# target/aarch64-unknown-linux-gnu/release/agent-fleet
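Before copying the binary anywhere, it is worth confirming that it really targets aarch64. The file utility reports this directly; the sketch below does the same check with od alone by reading the ELF header. The function names are illustrative and not part of agent-fleet, and the check assumes a little-endian ELF file, which is what aarch64-unknown-linux-gnu produces.

```shell
# Print the first four bytes of a file as hex ("7f454c46" for any ELF file).
elf_magic_hex() {
    od -An -tx1 -N4 "$1" | tr -d ' \n'
}

# Print the ELF e_machine field (header bytes 18-19) as hex.
# Little-endian values: "b700" = aarch64 (EM_AARCH64, 183), "3e00" = x86_64.
elf_machine_hex() {
    od -An -tx1 -j18 -N2 "$1" | tr -d ' \n'
}

# Succeed only if the file is an ELF binary built for aarch64.
is_aarch64_elf() {
    [ "$(elf_magic_hex "$1")" = "7f454c46" ] &&
        [ "$(elf_machine_hex "$1")" = "b700" ]
}

# Usage (path taken from the build output above):
#   is_aarch64_elf target/aarch64-unknown-linux-gnu/release/agent-fleet \
#       && echo "aarch64 binary" || echo "wrong architecture"
```

A native cargo build --release on an x86_64 machine would fail this check, which catches the common mistake of uploading the wrong artifact.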
Other Target Architectures
# For x86_64 Linux (native build on an x86_64 Linux host)
cargo build --release
# For aarch64 (ARM64 servers)
cargo zigbuild --target aarch64-unknown-linux-gnu --release
# For x86_64 musl (statically linked binary)
cargo zigbuild --target x86_64-unknown-linux-musl --release
Building for Local Testing
For local development on the same architecture:
cargo build --release
Deployment to aarch64
1. Transfer the Binary
After building, transfer the binary to a staging path on your target server (the /opt/agent-fleet directory is only created in the next step):
# Using scp
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/
# Or using rsync
rsync -avz target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/
2. Set Up Directory Structure
ssh user@target-host
# Create directory and user
sudo useradd -r -s /bin/false agent-fleet
sudo mkdir -p /opt/agent-fleet/{bin,data,config}
sudo chown -R agent-fleet:agent-fleet /opt/agent-fleet
# Copy binary
sudo cp /path/to/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
3. Create Configuration File
Create /opt/agent-fleet/config/config.toml:
[server]
bind = "127.0.0.1"
port = 9090
[forgejo]
url = "https://git.0x08.org"
token = "your-forgejo-api-token"
webhook_secret = "your-webhook-secret"
[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10
# Configure remote hosts for ssh_cli execution
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
{ agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
4. Create Environment Variables (Optional)
Sensitive values are meant to live in a .env.local file rather than in the config:
# /opt/agent-fleet/.env.local
FORGEJO_TOKEN="your-token"
WEBHOOK_SECRET="your-secret"
Note: referencing these variables from config.toml is not supported yet. Until it is, set the values directly in config.toml.
Systemd Service
Create Systemd Service File
Create /etc/systemd/system/agent-fleet.service:
[Unit]
Description=Agent Fleet Orchestrator
After=network.target
Documentation=https://git.0x08.org/zer4tul/agent-fleet
[Service]
Type=simple
User=agent-fleet
Group=agent-fleet
WorkingDirectory=/opt/agent-fleet
ExecStart=/opt/agent-fleet/bin/agent-fleet --config /opt/agent-fleet/config/config.toml
Restart=always
RestartSec=10
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
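ProtectHome=true
# ProtectHome=true hides /home from the service. If ssh_key_path points under /home,
# keep the key elsewhere (e.g. under /opt/agent-fleet) or use ProtectHome=read-only.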
ReadWritePaths=/opt/agent-fleet/data
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=agent-fleet
[Install]
WantedBy=multi-user.target
Enable and Start the Service
# Reload systemd
sudo systemctl daemon-reload
# Enable to start on boot
sudo systemctl enable agent-fleet
# Start the service
sudo systemctl start agent-fleet
# Check status
sudo systemctl status agent-fleet
# View logs
sudo journalctl -u agent-fleet -f
Management Commands
# Restart
sudo systemctl restart agent-fleet
# Stop
sudo systemctl stop agent-fleet
# Disable
sudo systemctl disable agent-fleet
Caddy Reverse Proxy
Using Caddy as a reverse proxy provides:
- Automatic HTTPS with Let's Encrypt
- Path-based routing
- Basic auth (optional)
- Request logging
Install Caddy
# Ubuntu/Debian
sudo apt install caddy
# Or using the official package
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
Configure Caddy
Create or edit /etc/caddy/Caddyfile:
your-domain.example.com {
    reverse_proxy 127.0.0.1:9090
    # Optional: Basic authentication (placeholder hash; generate a real one with: caddy hash-password)
    # basicauth {
    #     admin $2a$14$Zkx19YLhRnJ8O6l0ZPd.OqG9vXK4wQ6Y5wZQH5Y5x5x5x5x5x5x
    # }
    # Optional: Log requests
    log {
        output file /var/log/caddy/agent-fleet.log
        format json
    }
}
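# Or with a path prefix (use this instead of the block above, not alongside it).
# handle_path strips /agent-fleet from the request before proxying.
your-domain.example.com {
    handle_path /agent-fleet/* {
        reverse_proxy 127.0.0.1:9090
    }
}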
Test and Reload Caddy
# Test configuration
sudo caddy validate --config /etc/caddy/Caddyfile
# Reload Caddy
sudo systemctl reload caddy
# Or restart
sudo systemctl restart caddy
Verify HTTPS
After Caddy starts, it automatically provisions a TLS certificate from Let's Encrypt, provided the domain's DNS points at this server and ports 80 and 443 are reachable. Verify:
curl https://your-domain.example.com/healthz
Configuration Walkthrough
Server Configuration
[server]
bind = "127.0.0.1" # Bind to localhost (Caddy handles external traffic)
port = 9090 # Internal port
Notes:
- Use 127.0.0.1 when behind a reverse proxy
- Use 0.0.0.0 if direct access is needed
Forgejo Configuration
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"
webhook_secret = "your-webhook-secret"
Setup Steps:
- Generate a Forgejo API token: User Settings → Applications → Generate New Token
- Configure webhook in Forgejo repo settings:
  - URL: https://your-domain.com/api/v1/webhooks/forgejo
  - Secret: same as webhook_secret
  - Events: Issues, Pull Requests, Push
Orchestrator Configuration
[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10
Explanation:
- heartbeat_interval_secs: How often agents should send heartbeats
- heartbeat_timeout_threshold: How many missed heartbeats before marking agent offline (3 × 60 = 180 seconds)
- task_timeout_secs: Default timeout for tasks (1800 seconds = 30 minutes)
- default_max_retries: How many times to retry failed tasks
- dispatch_interval_secs: How often the dispatch loop checks for new ssh_cli tasks
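When tuning the heartbeat settings, the number that matters operationally is how long a dead agent can go unnoticed. Following the explanation above, that window is the interval multiplied by the threshold. The helper below is a small sketch of that arithmetic; the function name is illustrative and not part of agent-fleet.

```shell
# Seconds until an agent is marked offline:
#   heartbeat_interval_secs * heartbeat_timeout_threshold
# $1 = heartbeat_interval_secs, $2 = heartbeat_timeout_threshold
offline_after_secs() {
    echo $(( $1 * $2 ))
}

# Usage with the values from the config above:
#   offline_after_secs 60 3    # prints 180
```

Halving the interval to 30 while keeping the threshold at 3 brings detection down to 90 seconds, at the cost of twice as many heartbeat requests.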
Host Configuration for SSH CLI
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
{ agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
SSH Key Setup:
- Generate SSH key on the orchestrator server:
  sudo -u agent-fleet ssh-keygen -t ed25519 -f /home/agent-fleet/.ssh/id_ed25519
- Add public key to remote host's ~/.ssh/authorized_keys
- Test SSH connection:
  sudo -u agent-fleet ssh -p 22 deploy@192.168.1.100
Agent CLI Setup on Remote Host:
- Ensure agent CLI is in $PATH
- Verify: ssh deploy@host "which codex"
Troubleshooting
Service Won't Start
# Check service status
sudo systemctl status agent-fleet
# View logs
sudo journalctl -u agent-fleet -n 100
# Check file permissions
ls -la /opt/agent-fleet/
Webhook Not Received
# Check Caddy logs
sudo journalctl -u caddy -f
# Check agent-fleet logs for webhook errors
sudo journalctl -u agent-fleet | grep webhook
# Verify webhook secret matches
# The secret in config.toml must match Forgejo webhook secret
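If deliveries reach agent-fleet but are rejected, it can help to recompute the signature by hand. Forgejo signs the raw request body with HMAC-SHA256 using the webhook secret and sends the hex digest in a signature header (X-Forgejo-Signature on recent versions; treat the header name as an assumption for your version). This guide does not show how agent-fleet validates the signature, so the sketch below only illustrates the general scheme. The function names are hypothetical.

```shell
# Print the hex HMAC-SHA256 of stdin, keyed with the secret in $1.
# Note: the secret is visible in the process list while openssl runs,
# so use this for debugging only.
hmac_sha256_hex() {
    openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Succeed if the signature matches the payload.
# $1 = webhook secret, $2 = file holding the raw request body, $3 = hex signature
signature_matches() {
    expected="$(hmac_sha256_hex "$1" < "$2")"
    [ "$expected" = "$3" ]
}

# Usage: save a delivery body from the Forgejo webhook history to payload.json, then
#   signature_matches "your-webhook-secret" payload.json "<signature from the header>" \
#       && echo "secret matches" || echo "secret mismatch"
```

The payload must be byte-for-byte identical to what Forgejo sent; reformatting the JSON or adding a trailing newline changes the digest.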
SSH Connection Fails
# Test SSH as the agent-fleet user
sudo -u agent-fleet ssh -v deploy@host
# Check SSH key path exists
sudo -u agent-fleet ls -la /home/agent-fleet/.ssh/
# Fix key permissions (ssh refuses private keys that are readable by group or others)
sudo -u agent-fleet chmod 600 /home/agent-fleet/.ssh/id_ed25519
Database Lock Issues
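If the database stays locked (e.g., after a crash), move it aside and let the service start with a fresh one. This discards whatever state the old database held unless you restore the backup:
# Stop the service
sudo systemctl stop agent-fleet
# Move the old database aside as a backup (sudo is needed because the data directory belongs to agent-fleet)
sudo mv /opt/agent-fleet/data/agent-fleet.db /opt/agent-fleet/data/agent-fleet.db.backup
# Restart (will create new database)
sudo systemctl start agent-fleet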
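Moving the database to a fixed .backup name overwrites any earlier backup. A timestamped name avoids that. The helper below is a minimal sketch of the idea; the function name and the name format are illustrative choices, not something agent-fleet provides.

```shell
# Move a database file aside under a timestamped name and print the new path.
# $1 = path to the database file
backup_db() {
    db="$1"
    backup="${db}.backup-$(date +%Y%m%d-%H%M%S)"
    mv -- "$db" "$backup" && printf '%s\n' "$backup"
}

# Usage (run with sufficient privileges, and only while the service is stopped):
#   backup_db /opt/agent-fleet/data/agent-fleet.db
```

To roll back, stop the service and move the printed path back to the original db_path.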
Monitoring
Health Check
curl http://localhost:9090/healthz
# Expected output: "ok"
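Right after a start or restart, the service may need a moment before /healthz answers, so a single curl can fail spuriously. The sketch below polls the endpoint until it returns a successful HTTP status or a retry budget runs out. The function name and default values are illustrative.

```shell
# Poll a health endpoint until it answers with a successful HTTP status.
# $1 = URL, $2 = max attempts (default 30), $3 = delay between attempts in seconds (default 1)
wait_for_healthy() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-1}"
    n=0
    while [ "$n" -lt "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP errors (status >= 400)
        if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   sudo systemctl restart agent-fleet
#   wait_for_healthy http://localhost:9090/healthz 30 1 || echo "service did not become healthy"
```

The check looks only at the HTTP status, not at the response body, so it keeps working if the health payload changes.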
Check Task Queue
# Quote the URL so the shell does not treat "?" as a glob character
curl 'http://localhost:9090/api/v1/tasks?status=running'
Check Agents
# Quote the URL so the shell does not treat "?" as a glob character
curl 'http://localhost:9090/api/v1/agents?status=online'
Log Monitoring
# Follow logs
sudo journalctl -u agent-fleet -f
# Search for errors
sudo journalctl -u agent-fleet | grep -i error
Updates
Updating the Binary
# Build new version locally
cargo zigbuild --target aarch64-unknown-linux-gnu --release
# Transfer to server
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/tmp/
# On the server, stop and replace
sudo systemctl stop agent-fleet
sudo cp /tmp/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
sudo systemctl start agent-fleet
Zero-Downtime Updates
For production deployments with minimal downtime:
# Upload new binary alongside old one
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/opt/agent-fleet/bin/agent-fleet.new
# On server
sudo mv /opt/agent-fleet/bin/agent-fleet.new /opt/agent-fleet/bin/agent-fleet
sudo systemctl restart agent-fleet
The mv replaces the binary in a single step, so the only downtime is the moment systemd takes to stop the old process and start the new one.
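The upload-then-rename pattern works because a rename within one filesystem is atomic, and because Linux allows renaming over an executable that is still running, whereas copying over it can fail with "Text file busy". The sketch below wraps the pattern in a helper; the function name and the .new suffix handling are illustrative.

```shell
# Replace an installed binary by staging the new file next to it and renaming.
# Staging in the same directory keeps the rename on one filesystem.
# $1 = path to the new binary, $2 = path of the installed binary
swap_binary() {
    new="$1"
    dest="$2"
    staged="${dest}.new"
    cp -- "$new" "$staged" &&
        chmod 755 "$staged" &&
        mv -f -- "$staged" "$dest"
}

# Usage on the server (with sufficient privileges):
#   swap_binary /tmp/agent-fleet /opt/agent-fleet/bin/agent-fleet
#   sudo systemctl restart agent-fleet
```

The running process keeps using the old file until the restart, so the swap can be done ahead of time.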