Agent Fleet Deployment Guide

This guide covers deploying the Agent Fleet Orchestrator to production: cross-compilation, systemd service setup, and reverse proxy configuration with Caddy.

Prerequisites

  • Development machine with Rust and cargo
  • Target server (e.g., aarch64 Linux)
  • cargo-zigbuild and Zig (e.g., pip install ziglang) for cross-compilation
  • Caddy web server (optional, for reverse proxy)

Building with cargo-zigbuild

cargo-zigbuild enables cross-compilation by using Zig as the C toolchain, avoiding the need for target-specific cross-compilation toolchains.

Installing cargo-zigbuild

cargo install cargo-zigbuild
rustup target add aarch64-unknown-linux-gnu  # if Rust was installed via rustup

Cross-Compiling for aarch64-unknown-linux-gnu

# Build the release binary
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# The binary will be at:
# target/aarch64-unknown-linux-gnu/release/agent-fleet
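
Before shipping the binary it is worth confirming that it really targets aarch64. The sketch below assumes the `file` utility is available on the build machine; `is_aarch64_elf` is a hypothetical helper, not part of agent-fleet.

```shell
# Succeed if a `file -b` description looks like an aarch64 ELF binary.
# is_aarch64_elf is a hypothetical helper name used for illustration.
is_aarch64_elf() {
    case "$1" in
        ELF*aarch64*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage:
#   BIN=target/aarch64-unknown-linux-gnu/release/agent-fleet
#   is_aarch64_elf "$(file -b "$BIN")" && echo "aarch64 binary" || echo "wrong architecture"
```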

Other Target Architectures

# For x86_64 Linux (native build on an x86_64 host)
cargo build --release

# For aarch64 (ARM64 servers)
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# For musl (static linking)
cargo zigbuild --target x86_64-unknown-linux-musl --release
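
If you are unsure which target a server needs, its `uname -m` output can be mapped to a target triple. This is a minimal sketch that only covers the glibc targets mentioned above; `rust_target_for` is a hypothetical helper.

```shell
# Map `uname -m` output to a Rust target triple (glibc targets only).
# rust_target_for is a hypothetical helper name used for illustration.
rust_target_for() {
    case "$1" in
        x86_64)        echo "x86_64-unknown-linux-gnu" ;;
        aarch64|arm64) echo "aarch64-unknown-linux-gnu" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   arch=$(ssh user@target-host uname -m)
#   cargo zigbuild --target "$(rust_target_for "$arch")" --release
```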

Building for Local Testing

For local development on the same architecture:

cargo build --release

Deployment to aarch64

1. Transfer the Binary

After building, transfer the binary to a staging location on the target server (the install directory is created in the next step):

# Using scp
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/

# Or using rsync
rsync -avz target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/

2. Set Up Directory Structure

ssh user@target-host

# Create a system user (with a home directory for its SSH key) and the install directories
sudo useradd -r -m -s /bin/false agent-fleet
sudo mkdir -p /opt/agent-fleet/{bin,data,config}
sudo chown -R agent-fleet:agent-fleet /opt/agent-fleet

# Install the binary from the staging location
sudo cp /tmp/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet

3. Create Configuration File

Create /opt/agent-fleet/config/config.toml:

[server]
bind = "127.0.0.1"
port = 9090

[forgejo]
url = "https://git.0x08.org"
token = "your-forgejo-api-token"
webhook_secret = "your-webhook-secret"

[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10

# Configure remote hosts for ssh_cli execution
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
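
Because the API token and webhook secret are stored directly in config.toml, it is sensible to make the file readable only by its owner. The sketch below is one way to do that and confirm the result; `lock_down` is a hypothetical helper, and it assumes the file is already owned by the agent-fleet user.

```shell
# Restrict a secrets-bearing file to its owner and confirm the resulting mode.
# lock_down is a hypothetical helper name used for illustration.
lock_down() {
    chmod 600 "$1" || return 1
    # The first ten characters of `ls -l` are the file type and permission bits.
    [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}

# Usage (as root or as the file's owner):
#   lock_down /opt/agent-fleet/config/config.toml && echo "config.toml is owner-only"
```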

4. Create Environment Variables (Optional)

Sensitive values would ideally live in a .env.local file rather than in config.toml:

# /opt/agent-fleet/.env.local
FORGEJO_TOKEN="your-token"
WEBHOOK_SECRET="your-secret"

However, referencing these variables from config.toml is not currently supported. Until it is, set the values directly in config.toml.

Systemd Service

Create Systemd Service File

Create /etc/systemd/system/agent-fleet.service:

[Unit]
Description=Agent Fleet Orchestrator
After=network.target
Documentation=https://git.0x08.org/zer4tul/agent-fleet

[Service]
Type=simple
User=agent-fleet
Group=agent-fleet
WorkingDirectory=/opt/agent-fleet
ExecStart=/opt/agent-fleet/bin/agent-fleet --config /opt/agent-fleet/config/config.toml
Restart=always
RestartSec=10

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
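# read-only (not true) so the service can still read the SSH key that ssh_key_path points to under /home
ProtectHome=read-only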
ReadWritePaths=/opt/agent-fleet/data

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=agent-fleet

[Install]
WantedBy=multi-user.target
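
Before enabling the unit, it can help to confirm that the paths it refers to exist, since systemd refuses to start a service whose ReadWritePaths entry is missing. The following is a minimal sketch rather than part of agent-fleet; `preflight` is a hypothetical helper and the directory layout is the one used in this guide.

```shell
# Check the files and directories the unit file depends on.
# preflight is a hypothetical helper; ROOT defaults to /opt/agent-fleet.
preflight() (
    root="${1:-/opt/agent-fleet}"
    rc=0
    [ -x "$root/bin/agent-fleet" ] \
        || { echo "missing or not executable: $root/bin/agent-fleet" >&2; rc=1; }
    [ -r "$root/config/config.toml" ] \
        || { echo "missing or unreadable: $root/config/config.toml" >&2; rc=1; }
    [ -d "$root/data" ] && [ -w "$root/data" ] \
        || { echo "missing or not writable: $root/data" >&2; rc=1; }
    exit "$rc"
)

# Usage: run as the agent-fleet user so the permission checks are meaningful:
#   preflight && echo "layout looks usable"
```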

Enable and Start the Service

# Reload systemd
sudo systemctl daemon-reload

# Enable to start on boot
sudo systemctl enable agent-fleet

# Start the service
sudo systemctl start agent-fleet

# Check status
sudo systemctl status agent-fleet

# View logs
sudo journalctl -u agent-fleet -f

Management Commands

# Restart
sudo systemctl restart agent-fleet

# Stop
sudo systemctl stop agent-fleet

# Disable
sudo systemctl disable agent-fleet

Caddy Reverse Proxy

Using Caddy as a reverse proxy provides:

  • Automatic HTTPS with Let's Encrypt
  • Path-based routing
  • Basic auth (optional)
  • Request logging

Install Caddy

# Ubuntu/Debian
sudo apt install caddy

# Or using the official package
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy

Configure Caddy

Create or edit /etc/caddy/Caddyfile:

your-domain.example.com {
    reverse_proxy 127.0.0.1:9090

    # Optional: Basic authentication (generate the hash with `caddy hash-password`;
    # the directive is named basic_auth in Caddy 2.8 and later)
    # basicauth {
    #     admin <bcrypt-hash>
    # }

    # Optional: Log requests
    log {
        output file /var/log/caddy/agent-fleet.log
        format json
    }
}

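# Alternative to the block above: serve under a path prefix.
# handle_path strips /agent-fleet from the request before proxying.
your-domain.example.com {
    handle_path /agent-fleet/* {
        reverse_proxy 127.0.0.1:9090
    }
}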

Test and Reload Caddy

# Test configuration
sudo caddy validate --config /etc/caddy/Caddyfile

# Reload Caddy
sudo systemctl reload caddy

# Or restart
sudo systemctl restart caddy

Verify HTTPS

After Caddy starts, it automatically provisions a TLS certificate from Let's Encrypt, provided the domain's DNS record points at this server and ports 80 and 443 are reachable. Verify:

curl https://your-domain.example.com/healthz

Configuration Walkthrough

Server Configuration

[server]
bind = "127.0.0.1"  # Bind to localhost (Caddy handles external traffic)
port = 9090         # Internal port

Notes:

  • Use 127.0.0.1 when behind a reverse proxy
  • Use 0.0.0.0 if direct access is needed

Forgejo Configuration

[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"
webhook_secret = "your-webhook-secret"

Setup Steps:

  1. Generate a Forgejo API token: User Settings → Applications → Generate New Token
  2. Configure webhook in Forgejo repo settings:
    • URL: https://your-domain.example.com/api/v1/webhooks/forgejo
    • Secret: same as webhook_secret
    • Events: Issues, Pull Requests, Push

Orchestrator Configuration

[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10

Explanation:

  • heartbeat_interval_secs: How often, in seconds, agents should send heartbeats
  • heartbeat_timeout_threshold: Number of missed heartbeats before an agent is marked offline (3 × 60 = 180 seconds with the values above)
  • task_timeout_secs: Default timeout for tasks (1800 seconds = 30 minutes)
  • default_max_retries: How many times a failed task is retried
  • dispatch_interval_secs: How often, in seconds, the dispatch loop checks for new ssh_cli tasks
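
The arithmetic behind these values can be checked quickly when tuning them. This is only a worked example of the figures quoted above; `offline_after_secs` is a hypothetical helper.

```shell
# Seconds of silence before an agent is marked offline:
# heartbeat_interval_secs multiplied by heartbeat_timeout_threshold.
offline_after_secs() {
    echo $(( $1 * $2 ))
}

offline_after_secs 60 3    # prints 180
echo $(( 1800 / 60 ))      # task_timeout_secs in minutes: prints 30
```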

Host Configuration for SSH CLI

[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]

SSH Key Setup:

  1. Generate SSH key on the orchestrator server:

    sudo -u agent-fleet ssh-keygen -t ed25519 -f /home/agent-fleet/.ssh/id_ed25519
    
  2. Add public key to remote host's ~/.ssh/authorized_keys

  3. Test SSH connection:

    sudo -u agent-fleet ssh -p 22 deploy@192.168.1.100
    

Agent CLI Setup on Remote Host:

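  1. Ensure the agent CLI is on the $PATH seen by non-interactive SSH sessions (login profiles such as ~/.profile are not read for these)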
  2. Verify: ssh deploy@host "which codex"
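
To see how the fields of a [[hosts]] entry map onto an SSH invocation, the sketch below prints a non-interactive probe command. It is an illustration only: `ssh_probe_cmd` is a hypothetical helper, and the exact options the orchestrator passes to SSH are not documented here. BatchMode=yes makes the probe fail instead of prompting for a password.

```shell
# Print a non-interactive ssh command for one [[hosts]] entry.
# Arguments: ssh_user hostname ssh_port ssh_key_path remote-command
# ssh_probe_cmd is a hypothetical helper name used for illustration.
ssh_probe_cmd() {
    printf 'ssh -o BatchMode=yes -o ConnectTimeout=5 -i %s -p %s %s@%s %s\n' \
        "$4" "$3" "$1" "$2" "'$5'"
}

ssh_probe_cmd deploy 192.168.1.100 22 /home/agent-fleet/.ssh/id_ed25519 'command -v codex'
```

Run the printed command as the agent-fleet user (for example by prefixing it with sudo -u agent-fleet) to check the key, the port, and the remote $PATH in one step.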

Troubleshooting

Service Won't Start

# Check service status
sudo systemctl status agent-fleet

# View logs
sudo journalctl -u agent-fleet -n 100

# Check file permissions
ls -la /opt/agent-fleet/

Webhook Not Received

# Check Caddy logs
sudo journalctl -u caddy -f

# Check agent-fleet logs for webhook errors
sudo journalctl -u agent-fleet | grep webhook

# Verify webhook secret matches
# The secret in config.toml must match Forgejo webhook secret
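
To rule out a secret mismatch, you can sign a test payload yourself and post it straight to the orchestrator. The sketch below assumes the openssl CLI is installed and that deliveries are signed with a hex-encoded HMAC-SHA256 of the request body, as Gitea-compatible webhooks are. The header name in the usage comment and the way agent-fleet validates it are assumptions, and `hmac_sha256_hex` is a hypothetical helper.

```shell
# Print the hex HMAC-SHA256 of stdin, keyed with the webhook secret ($1).
# hmac_sha256_hex is a hypothetical helper; requires the openssl CLI.
hmac_sha256_hex() {
    # openssl prints "<label>= <hex>"; the digest is the last field.
    openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Usage sketch (header name and endpoint behaviour are assumptions):
#   body='{"action":"opened"}'
#   sig=$(printf '%s' "$body" | hmac_sha256_hex "your-webhook-secret")
#   curl -i -X POST -H "Content-Type: application/json" \
#        -H "X-Gitea-Signature: $sig" -d "$body" \
#        http://127.0.0.1:9090/api/v1/webhooks/forgejo
```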

SSH Connection Fails

# Test SSH as the agent-fleet user
sudo -u agent-fleet ssh -v deploy@host

# Check SSH key path exists
sudo -u agent-fleet ls -la /home/agent-fleet/.ssh/

# Verify key permissions
sudo -u agent-fleet chmod 600 /home/agent-fleet/.ssh/id_ed25519

Database Lock Issues

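If the database becomes locked (e.g., after a crash), stop the service and move the database file aside. The service creates a new, empty database on the next start, so the previous state survives only in the backup file: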

# Stop the service
sudo systemctl stop agent-fleet

# Back up the old database by moving it aside
sudo mv /opt/agent-fleet/data/agent-fleet.db /opt/agent-fleet/data/agent-fleet.db.backup

# Restart (will create new database)
sudo systemctl start agent-fleet
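
A fixed backup name is overwritten the next time the procedure is run. A timestamped name avoids that; the following is a minimal sketch, with `backup_db` as a hypothetical helper. Stop the service before calling it.

```shell
# Move a database file aside under a timestamped name and print the new path.
# backup_db is a hypothetical helper name used for illustration.
backup_db() (
    db="$1"
    [ -f "$db" ] || { echo "no such file: $db" >&2; exit 1; }
    dest="$db.$(date +%Y%m%d-%H%M%S).backup"
    mv "$db" "$dest" && echo "$dest"
)

# Usage (as root, with the service stopped):
#   backup_db /opt/agent-fleet/data/agent-fleet.db
```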

Monitoring

Health Check

curl http://localhost:9090/healthz
# Expected output: "ok"
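
Right after a start or restart the endpoint may not answer immediately, so a scripted check usually needs a few retries. The sketch below is a generic retry loop; `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Retry a command until it succeeds: wait_until ATTEMPTS DELAY_SECS CMD [ARGS...]
# wait_until is a hypothetical helper name used for illustration.
wait_until() {
    wu_attempts="$1"; wu_delay="$2"; shift 2
    while [ "$wu_attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        wu_attempts=$((wu_attempts - 1))
        # Sleep only if another attempt is still to come.
        [ "$wu_attempts" -gt 0 ] && sleep "$wu_delay"
    done
    return 1
}

# Usage: up to 10 attempts, 2 seconds apart
#   wait_until 10 2 curl -fsS http://localhost:9090/healthz && echo "healthy"
```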

Check Task Queue

curl 'http://localhost:9090/api/v1/tasks?status=running'

Check Agents

curl 'http://localhost:9090/api/v1/agents?status=online'

Log Monitoring

# Follow logs
sudo journalctl -u agent-fleet -f

# Search for errors
sudo journalctl -u agent-fleet | grep -i error

Updates

Updating the Binary

# Build new version locally
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# Transfer to server
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/tmp/

# On the server, stop and replace
sudo systemctl stop agent-fleet
sudo cp /tmp/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
sudo systemctl start agent-fleet

Minimal-Downtime Updates

For production deployments with minimal downtime:

# Upload the new binary alongside the old one (requires write access to /opt/agent-fleet/bin)
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/opt/agent-fleet/bin/agent-fleet.new

# On server
sudo mv /opt/agent-fleet/bin/agent-fleet.new /opt/agent-fleet/bin/agent-fleet
sudo systemctl restart agent-fleet

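Because mv within the same directory is an atomic rename, the running process keeps using the old binary until the restart. Downtime is limited to the time systemd takes to stop and start the service.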
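
Keeping the previous binary around makes a rollback as simple as one mv and a restart. The following is a minimal sketch of that idea; `swap_binary` and the .prev suffix are hypothetical and not something agent-fleet provides.

```shell
# Install NEW over CURRENT, keeping the previous binary as CURRENT.prev.
# swap_binary is a hypothetical helper name used for illustration.
swap_binary() (
    new="$1"; current="$2"
    [ -f "$new" ] || { echo "no such file: $new" >&2; exit 1; }
    chmod +x "$new" || exit 1
    if [ -f "$current" ]; then
        cp -p "$current" "$current.prev" || exit 1
    fi
    # Same-directory rename, so the replacement is atomic.
    mv "$new" "$current"
)

# Usage (as root on the server):
#   swap_binary /opt/agent-fleet/bin/agent-fleet.new /opt/agent-fleet/bin/agent-fleet
#   systemctl restart agent-fleet
# Rollback:
#   mv /opt/agent-fleet/bin/agent-fleet.prev /opt/agent-fleet/bin/agent-fleet
#   systemctl restart agent-fleet
```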