February 27, 2026 3 min read

Running AI Workloads on a Self-Hosted Proxmox Lab (Without Going Broke)

Cloud GPUs are expensive if you leave them running. Here’s the hybrid setup I use: Proxmox for everything persistent, RunPod for burst inference, and Tailscale to make it feel like one machine.


The Problem With Cloud-Only AI Development

When I started running more serious AI workloads — training fine-tunes, running inference pipelines, testing ComfyUI workflows — I quickly hit the cost ceiling of cloud-only development. An A100 on RunPod costs around $1.50–$2.00 per hour. That sounds cheap until you realize you’re also paying for storage, for idle time between experiments, and for the friction of constantly uploading and downloading models.

The alternative — buying high-end hardware outright — has a different problem: a single H100 costs more than most cars. So I built a hybrid setup that balances cost, convenience, and capability.

The Local Layer: Proxmox

My home lab runs on a single machine with a Ryzen 9 5900X, 64GB RAM, and an RTX 3090. Proxmox manages the virtualization layer — I run everything in LXC containers rather than full VMs, which keeps overhead low and resource sharing flexible.

The core containers:

  • comfyui-dev — ComfyUI with GPU passthrough, used for workflow development and testing
  • models-cache — NFS share that holds model weights, accessible from all containers and from RunPod pods via Tailscale
  • n8n — automation hub that orchestrates jobs between local and cloud
  • nginx-proxy — Nginx Proxy Manager, handles routing and SSL for everything exposed externally
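The models-cache export can be locked down to the tailnet itself, since Tailscale assigns every node an address in the 100.64.0.0/10 CGNAT range. A minimal sketch of the export — the paths are hypothetical, not my actual layout:

```
# /etc/exports on the models-cache container (paths are illustrative)
# Weights are read-only; jobs write results to a separate rw export.
/export/models   100.64.0.0/10(ro,no_subtree_check)
/export/outputs  100.64.0.0/10(rw,no_subtree_check,no_root_squash)
```

Restricting to 100.64.0.0/10 means only machines on the tailnet can even attempt a mount, which matters once RunPod pods start connecting from arbitrary data centers.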

GPU passthrough in LXC is slightly more involved than in full VMs but works reliably once configured. The key is passing through the correct device nodes and ensuring the container has the right cgroup permissions.
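As a reference sketch, these are the kinds of lines involved in the container's config file (`/etc/pve/lxc/<id>.conf`) for an NVIDIA card. The device major numbers vary by driver version, so verify yours with `ls -l /dev/nvidia*` before copying anything:

```
# Allow the container to open the NVIDIA character devices.
# 195 is the usual major for /dev/nvidia*; the nvidia-uvm major
# varies by driver version — check with: ls -l /dev/nvidia*
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```

The container also needs the NVIDIA userspace libraries installed inside it (matching the host's driver version, without the kernel module), since the kernel module only exists on the host.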

The Cloud Layer: RunPod on Demand

For workloads that need more than 24GB VRAM — training runs, large batch inference, video generation with Wan 2.2 — I spin up RunPod pods on demand. The workflow:

  1. n8n job triggers a RunPod API call to start a pod with the required GPU
  2. Pod comes up, mounts my network storage via Tailscale (models are already there — no upload needed)
  3. Job runs, results are written back to the shared storage
  4. n8n receives the completion webhook, pod is terminated automatically

Total idle cost: zero. I only pay for actual compute time.
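The start half of that loop can be sketched in a few lines of Python. This assumes RunPod's GraphQL API at `api.runpod.io` and its `podFindAndDeployOnDemand` mutation; the image name, GPU type string, and job name below are placeholders, and the real n8n node would also handle the terminate call and error paths:

```python
import json
import os
import urllib.request

RUNPOD_GRAPHQL = "https://api.runpod.io/graphql"


def deploy_mutation(gpu_type: str, image: str, name: str) -> str:
    """Build the GraphQL mutation requesting an on-demand pod."""
    return (
        "mutation { podFindAndDeployOnDemand(input: {"
        f'cloudType: SECURE, gpuCount: 1, gpuTypeId: "{gpu_type}", '
        f'name: "{name}", imageName: "{image}", '
        "containerDiskInGb: 20, volumeInGb: 0"
        "}) { id desiredStatus } }"
    )


def send_query(api_key: str, query: str) -> dict:
    """POST a GraphQL query to RunPod; only called when a key is set."""
    req = urllib.request.Request(
        f"{RUNPOD_GRAPHQL}?api_key={api_key}",
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    query = deploy_mutation(
        "NVIDIA A100 80GB PCIe",          # placeholder GPU type id
        "ghcr.io/example/trainer:latest",  # placeholder image
        "f5-tts-finetune",
    )
    key = os.environ.get("RUNPOD_API_KEY")
    if key:
        print(send_query(key, query))
    else:
        print(query)  # dry run: show the mutation without calling the API
```

Termination works the same way with a `podTerminate` mutation once the completion webhook fires, so nothing survives past the job.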

Networking: Tailscale as the Glue

Tailscale is what makes this feel like one cohesive system rather than two separate environments. Every machine — local containers, RunPod pods, my laptop, client servers — is on the same Tailscale network. I access everything by hostname, no VPN configuration required.

The models-cache NFS share is mounted on RunPod pods over Tailscale. Read speeds are around 100–150 MB/s in practice, which is fast enough that loading a 7B model from the network share adds maybe 20 seconds versus local NVMe. Acceptable.
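On the pod side, with Tailscale's MagicDNS resolving the container hostname, the mount is a single fstab line. The export path is a hypothetical example, and the mount options are a judgment call rather than a recommendation:

```
# /etc/fstab on a RunPod pod — "models-cache" resolves via Tailscale MagicDNS.
# soft + timeo so a dropped tailnet link fails the read instead of hanging the job.
models-cache:/export/models  /models  nfs  ro,soft,timeo=100,retrans=3  0  0
```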

Costs in Practice

For the F5-TTS fine-tuning project: ~€38 in RunPod compute. For a typical month of ComfyUI workflow development: most work happens locally for free, maybe €10–15 in cloud compute for final production runs. Compare that to keeping a cloud GPU running 8 hours a day — that’s €360+/month for an A100.
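The €360 figure is just the idle-GPU arithmetic, taking the low end of the hourly A100 rate quoted earlier (treating the dollar rate as roughly euros) at 8 hours a day:

```python
hourly_rate = 1.50    # EUR, low end of the A100 on-demand range above
hours_per_day = 8
days_per_month = 30

monthly_cost = hourly_rate * hours_per_day * days_per_month
print(f"€{monthly_cost:.0f}/month")  # €360/month
```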

The setup takes a few days to configure properly, but the ongoing savings make it worth it for anyone doing serious AI development work.

Interested in working together? Get in touch →