# Agent Fleet Deployment Guide

This guide covers deploying Agent Fleet Orchestrator to production, including cross-compilation, systemd service setup, and reverse proxy configuration with Caddy.

## Prerequisites

- Development machine with Rust and cargo
- Target server (e.g., aarch64 Linux)
- cargo-zigbuild for cross-compilation (requires Zig on the development machine)
- Caddy web server (optional, for reverse proxy)

## Building with cargo-zigbuild

cargo-zigbuild cross-compiles by using Zig as the C compiler and linker, so you do not need a separate cross-compilation toolchain for each target.

### Installing cargo-zigbuild

```bash
cargo install cargo-zigbuild
```

Zig itself is installed separately, for example from your package manager or with `pip3 install ziglang`.

### Cross-Compiling for aarch64-unknown-linux-gnu

```bash
# Add the Rust target once (if Rust is managed with rustup)
rustup target add aarch64-unknown-linux-gnu

# Build the release binary
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# The binary will be at:
# target/aarch64-unknown-linux-gnu/release/agent-fleet
```

### Other Target Architectures

```bash
# For the host architecture (e.g., x86_64 Linux)
cargo build --release

# For aarch64 (ARM64 servers)
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# For x86_64 musl (static linking)
cargo zigbuild --target x86_64-unknown-linux-musl --release
```

### Building for Local Testing

For local development on the same architecture:

```bash
cargo build --release
```

## Deployment to aarch64

### 1. Transfer the Binary

After building, transfer the binary to your target server. `/opt/agent-fleet` does not exist yet at this point, so upload to `/tmp` first:

```bash
# Using scp
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/

# Or using rsync
rsync -avz target/aarch64-unknown-linux-gnu/release/agent-fleet user@target-host:/tmp/
```

### 2. Set Up Directory Structure

```bash
ssh user@target-host

# Create the service user (-m creates its home directory, used later for the SSH key)
sudo useradd -r -m -s /bin/false agent-fleet

# Create directories
sudo mkdir -p /opt/agent-fleet/{bin,data,config}
sudo chown -R agent-fleet:agent-fleet /opt/agent-fleet

# Copy the binary uploaded in step 1
sudo cp /tmp/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
```

### 3. Create Configuration File

Create `/opt/agent-fleet/config/config.toml`:

```toml
[server]
bind = "127.0.0.1"
port = 9090

[forgejo]
url = "https://git.0x08.org"
token = "your-forgejo-api-token"
webhook_secret = "your-webhook-secret"

[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10

# Configure remote hosts for ssh_cli execution
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
```

### 4. Create Environment Variables (Optional)

Sensitive values can be kept in a `.env.local` file instead of `config.toml`:

```bash
# /opt/agent-fleet/.env.local
FORGEJO_TOKEN="your-token"
WEBHOOK_SECRET="your-secret"
```

Referencing these variables from `config.toml` is not currently supported, so for now set the values directly in the config file.
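Because the token and webhook secret currently live directly in `config.toml`, it may be worth restricting who can read that file. The following is a minimal sketch, not part of Agent Fleet: the helper name `secure_config` is hypothetical, and it assumes the `agent-fleet` service user and the config path used in this guide.

```shell
# Hypothetical helper: make a config file readable and writable by its owner only.
# Arguments: $1 = path to the config file, $2 = owning user (default: agent-fleet)
secure_config() {
    config_path="$1"
    owner="${2:-agent-fleet}"

    # Remove group and world access so other local users cannot read the secrets.
    chmod 600 "$config_path" || return 1

    # Hand the file to the service user only when running as root and the
    # user actually exists; otherwise leave ownership unchanged.
    if [ "$(id -u)" -eq 0 ] && id "$owner" >/dev/null 2>&1; then
        chown "$owner:$owner" "$config_path"
    fi
}
```

On the server this could be run from a root shell as `secure_config /opt/agent-fleet/config/config.toml`. The service user must remain the owner, otherwise the orchestrator can no longer read its own configuration.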
## Systemd Service

### Create Systemd Service File

Create `/etc/systemd/system/agent-fleet.service`:

```ini
[Unit]
Description=Agent Fleet Orchestrator
After=network.target
Documentation=https://git.0x08.org/zer4tul/agent-fleet

[Service]
Type=simple
User=agent-fleet
Group=agent-fleet
WorkingDirectory=/opt/agent-fleet
ExecStart=/opt/agent-fleet/bin/agent-fleet --config /opt/agent-fleet/config/config.toml
Restart=always
RestartSec=10

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
# read-only rather than true: ProtectHome=true hides /home entirely, which
# would make an ssh_key_path under /home/agent-fleet unreadable to the service.
ProtectHome=read-only
ReadWritePaths=/opt/agent-fleet/data

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=agent-fleet

[Install]
WantedBy=multi-user.target
```

### Enable and Start the Service

```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable to start on boot
sudo systemctl enable agent-fleet

# Start the service
sudo systemctl start agent-fleet

# Check status
sudo systemctl status agent-fleet

# View logs
sudo journalctl -u agent-fleet -f
```

### Management Commands

```bash
# Restart
sudo systemctl restart agent-fleet

# Stop
sudo systemctl stop agent-fleet

# Disable
sudo systemctl disable agent-fleet
```

## Caddy Reverse Proxy

Using Caddy as a reverse proxy provides:

- Automatic HTTPS with Let's Encrypt
- Path-based routing
- Basic auth (optional)
- Request logging

### Install Caddy

```bash
# Ubuntu/Debian
sudo apt install caddy

# Or using the official package
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
```

### Configure Caddy

Create or edit `/etc/caddy/Caddyfile`:

```caddyfile
your-domain.example.com {
    reverse_proxy 127.0.0.1:9090

    # Optional: Basic authentication (the directive is named basic_auth in Caddy 2.8+)
    # Generate the hash with: caddy hash-password
    # basicauth {
    #     admin <bcrypt-hash>
    # }

    # Optional: Log requests
    log {
        output file /var/log/caddy/agent-fleet.log
        format json
    }
}

# Or with a path prefix. Use this in place of the block above;
# handle_path strips the /agent-fleet prefix before proxying.
# your-domain.example.com {
#     handle_path /agent-fleet/* {
#         reverse_proxy 127.0.0.1:9090
#     }
# }
```

### Test and Reload Caddy

```bash
# Test configuration
sudo caddy validate --config /etc/caddy/Caddyfile

# Reload Caddy
sudo systemctl reload caddy

# Or restart
sudo systemctl restart caddy
```

### Verify HTTPS

After Caddy starts, it automatically provisions a TLS certificate from Let's Encrypt. Verify:

```bash
curl https://your-domain.example.com/healthz
```

## Configuration Walkthrough

### Server Configuration

```toml
[server]
bind = "127.0.0.1"  # Bind to localhost (Caddy handles external traffic)
port = 9090         # Internal port
```

**Notes:**

- Use `127.0.0.1` when behind a reverse proxy
- Use `0.0.0.0` if direct access is needed

### Forgejo Configuration

```toml
[forgejo]
url = "https://git.0x08.org"
token = "your-api-token"
webhook_secret = "your-webhook-secret"
```

**Setup Steps:**

1. Generate a Forgejo API token: User Settings → Applications → Generate New Token
2. Configure the webhook in the Forgejo repo settings:
   - URL: `https://your-domain.example.com/api/v1/webhooks/forgejo`
   - Secret: same as `webhook_secret`
   - Events: Issues, Pull Requests, Push

### Orchestrator Configuration

```toml
[orchestrator]
db_path = "/opt/agent-fleet/data/agent-fleet.db"
heartbeat_interval_secs = 60
heartbeat_timeout_threshold = 3
task_timeout_secs = 1800
default_max_retries = 2
dispatch_interval_secs = 10
```

**Explanation:**

- `heartbeat_interval_secs`: How often agents should send heartbeats
- `heartbeat_timeout_threshold`: How many missed heartbeats before an agent is marked offline (3 × 60 = 180 seconds)
- `task_timeout_secs`: Default timeout for tasks (1800 seconds = 30 minutes)
- `default_max_retries`: How many times to retry failed tasks
- `dispatch_interval_secs`: How often the dispatch loop checks for new `ssh_cli` tasks

### Host Configuration for SSH CLI

```toml
[[hosts]]
host_id = "worker-01"
hostname = "192.168.1.100"
ssh_user = "deploy"
ssh_port = 22
ssh_key_path = "/home/agent-fleet/.ssh/id_ed25519"
work_dir = "/opt/agent-workspace"
agents = [
  { agent_type = "codex-cli", max_concurrency = 2, capabilities = ["code:rust"] },
]
```

**SSH Key Setup:**

1. Generate an SSH key on the orchestrator server. The directory `/home/agent-fleet/.ssh` must already exist and be owned by `agent-fleet`; a system user created with `useradd -r` has no home directory unless `-m` was passed.
   ```bash
   sudo -u agent-fleet ssh-keygen -t ed25519 -f /home/agent-fleet/.ssh/id_ed25519
   ```
2. Add the public key to the remote host's `~/.ssh/authorized_keys`
3. Test the SSH connection (this also records the remote host key in `known_hosts`):
   ```bash
   sudo -u agent-fleet ssh -p 22 deploy@192.168.1.100
   ```

**Agent CLI Setup on Remote Host:**

1. Ensure the agent CLI is in `$PATH` for non-interactive SSH sessions
2. Verify: `ssh deploy@host "which codex"`

## Troubleshooting

### Service Won't Start

```bash
# Check service status
sudo systemctl status agent-fleet

# View logs
sudo journalctl -u agent-fleet -n 100

# Check file permissions
ls -la /opt/agent-fleet/
```

### Webhook Not Received

```bash
# Check Caddy logs
sudo journalctl -u caddy -f

# Check agent-fleet logs for webhook errors
sudo journalctl -u agent-fleet | grep webhook

# Verify webhook secret matches
# The secret in config.toml must match the Forgejo webhook secret
```

### SSH Connection Fails

```bash
# Test SSH as the agent-fleet user
sudo -u agent-fleet ssh -v deploy@host

# Check that the SSH key path exists
sudo -u agent-fleet ls -la /home/agent-fleet/.ssh/

# Fix key permissions
sudo -u agent-fleet chmod 600 /home/agent-fleet/.ssh/id_ed25519
```

### Database Lock Issues

If the database becomes locked (e.g., after a crash), move it aside and let the service create a fresh one. The new database starts empty, so keep the backup:

```bash
# Stop the service
sudo systemctl stop agent-fleet

# Move the old database aside as a backup
sudo mv /opt/agent-fleet/data/agent-fleet.db /opt/agent-fleet/data/agent-fleet.db.backup

# Restart (will create a new database)
sudo systemctl start agent-fleet
```

## Monitoring

### Health Check

```bash
curl http://localhost:9090/healthz
# Expected output: "ok"
```

### Check Task Queue

```bash
# Quote the URL so the shell does not interpret "?"
curl 'http://localhost:9090/api/v1/tasks?status=running'
```

### Check Agents

```bash
curl 'http://localhost:9090/api/v1/agents?status=online'
```

### Log Monitoring

```bash
# Follow logs
sudo journalctl -u agent-fleet -f

# Search for errors
sudo journalctl -u agent-fleet | grep -i error
```

## Updates

### Updating the Binary

```bash
# Build new version locally
cargo zigbuild --target aarch64-unknown-linux-gnu --release

# Transfer to server
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/tmp/

# On the server, stop and replace
sudo systemctl stop agent-fleet
sudo cp /tmp/agent-fleet /opt/agent-fleet/bin/
sudo chmod +x /opt/agent-fleet/bin/agent-fleet
sudo systemctl start agent-fleet
```

### Minimal-Downtime Updates

For production deployments, stage the new binary next to the old one and swap it in with a rename, so the service is only down while it restarts:

```bash
# Upload the new binary to a staging location
scp target/aarch64-unknown-linux-gnu/release/agent-fleet user@host:/tmp/agent-fleet.new

# On the server: copy it next to the old binary, then swap and restart
sudo install -m 755 /tmp/agent-fleet.new /opt/agent-fleet/bin/agent-fleet.new
sudo mv /opt/agent-fleet/bin/agent-fleet.new /opt/agent-fleet/bin/agent-fleet
sudo systemctl restart agent-fleet
```

The rename replaces the file in a single step, and the running process keeps using the old binary until `systemctl restart`. This is not strictly zero downtime: requests that arrive while the service restarts will fail.
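
After a restart it can help to confirm that the service is answering before treating the update as complete. The following is a minimal sketch, assuming `curl` is installed on the server and the `/healthz` endpoint from the Monitoring section. The function name `wait_until_healthy` and its default of 10 attempts are illustrative and not part of Agent Fleet.

```shell
# Hypothetical helper: poll a health endpoint until it responds successfully.
# Arguments: $1 = URL (default: local healthz), $2 = max attempts (default: 10)
# Returns 0 as soon as one request succeeds, 1 if every attempt fails.
wait_until_healthy() {
    url="${1:-http://localhost:9090/healthz}"
    attempts="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP errors; -sS hides progress output.
        if curl -fsS "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

One way to use it after `systemctl restart` is `wait_until_healthy http://localhost:9090/healthz 15 || sudo journalctl -u agent-fleet -n 50`, which prints recent logs only if the service never came back.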