OpenClaw Kubernetes Operator: Run Production AI Agents With a Single CRD

Running an OpenClaw agent on your laptop is easy. Running a fleet of them in production — with network isolation, automatic backups, rolling updates, and security hardening — is a different story. The OpenClaw Kubernetes Operator aims to close that gap. It wraps the entire lifecycle of an OpenClaw instance into a single Kubernetes custom resource, turning what would normally be nine or more separate manifests into one declarative YAML file.

With 289 GitHub stars, 424 commits, and Apache 2.0 licensing, the operator has quietly become the go-to choice for teams that want to self-host OpenClaw on their own infrastructure. Here is what it does, how to install it, and why it matters for anyone running AI agents at scale.

What the Operator Actually Does

At its core, the operator introduces a single custom resource definition called OpenClawInstance. When you apply one to your cluster, the operator automatically provisions a StatefulSet, Service, NetworkPolicy, PersistentVolumeClaim, and several other resources behind the scenes. You describe what you want — which AI provider to use, which skills to install, how much storage to allocate — and the operator figures out the rest.
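To make that concrete, here is a minimal sketch of what such a resource might look like. The field names (provider, apiKeySecretRef, storage) and the API group are illustrative assumptions inferred from the description above, not copied from the project's published schema:

```yaml
# Hypothetical OpenClawInstance manifest — field names and the
# API group are assumptions, not the operator's actual schema.
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: demo-agent
spec:
  provider: anthropic           # which AI provider the agent uses
  apiKeySecretRef:
    name: openclaw-credentials  # Secret holding the provider API key
  storage:
    size: 10Gi                  # workspace PersistentVolumeClaim size
```

Applying this one manifest would stand in for the StatefulSet, Service, NetworkPolicy, and PVC you would otherwise write by hand.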

This is not just a convenience wrapper. The operator actively manages the agent’s lifecycle: it watches for configuration changes, triggers rolling updates via SHA-256 content hashing, and can even handle automatic version upgrades by polling an OCI registry, snapshotting the workspace to S3 before updating, and rolling back if health checks fail.
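The content-hash trigger is a well-known operator pattern: hash the rendered configuration, stamp the hash onto the pod template, and any config change forces a rollout because the template changed. A sketch of what the managed StatefulSet's pod template might carry (the annotation key is an assumption):

```yaml
# Sketch of the hash-annotation rollout pattern; the actual
# annotation key used by this operator is an assumption.
spec:
  template:
    metadata:
      annotations:
        openclaw.rocks/config-hash: "9f86d081884c7d65..."  # SHA-256 of rendered config
```

When the hash changes, Kubernetes sees a new pod template and performs a rolling update; when it does not, no pods are restarted.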

Security by Default

One of the strongest selling points is the operator’s security posture. Every container runs as non-root (UID 1000) with a read-only root filesystem, dropped Linux capabilities, and seccomp set to RuntimeDefault. A default-deny NetworkPolicy locks down traffic so agents cannot reach anything you have not explicitly allowed. Validating webhooks catch misconfigurations before they ever hit the cluster. Gateway authentication tokens are auto-generated and stored in Kubernetes Secrets, sidestepping the Bonjour/mDNS pairing flow that does not work inside containers.
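These defaults map directly onto standard Kubernetes pod-hardening fields. A sketch of the container securityContext that would deliver the posture described above (values inferred from this description, not copied from the operator's source):

```yaml
# Standard Kubernetes container hardening, matching the defaults
# described above. All field names here are real Kubernetes API fields.
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```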

For organizations subject to compliance requirements, these defaults mean you spend less time writing policy exceptions and more time shipping.

Skills and Plugins, Declaratively

ClawHub skills can be declared directly in the CRD spec. The operator installs them on startup using npm: and pack: prefixes, so your agent arrives fully configured with the tools it needs — no manual post-deploy steps. Combined with config merge mode, the operator can deep-merge your declared settings with any runtime changes the agent makes to its own configuration, preserving self-modifications while still enforcing your baseline.
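As a rough illustration, declaring skills in the spec might look like the following; the list shape, the configMergeMode field, and the example skill names are assumptions, though the npm: and pack: prefixes come from the project itself:

```yaml
# Illustrative only — skill names and surrounding field names are
# hypothetical; the npm:/pack: prefixes are the operator's convention.
spec:
  skills:
    - npm:@clawhub/web-search        # installed from npm at startup
    - pack:github.com/acme/my-skill  # installed from a skill pack
  configMergeMode: deep              # deep-merge baseline with runtime changes
```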

Sidecars for Every Occasion

The operator ships with optional sidecar containers that extend what an agent can do inside the cluster. A Chromium sidecar adds browser automation with persistent profile support — useful for agents that need to interact with web UIs. An Ollama sidecar provides local LLM inference with GPU resource allocation, letting you run models on-cluster without external API calls. A ttyd-based web terminal sidecar gives you a debugging shell without needing kubectl exec access. And Tailscale integration supports both Serve and Funnel modes with SSO authentication, making it straightforward to expose agents securely to external users.
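A spec fragment enabling these sidecars might look like the sketch below. Every field name here is a guess at the shape of the API, except nvidia.com/gpu, which is the standard Kubernetes GPU resource name:

```yaml
# Hypothetical sidecar configuration — field names are assumptions.
spec:
  sidecars:
    chromium:
      enabled: true
      persistentProfile: true  # keep the browser profile across restarts
    ollama:
      enabled: true
      resources:
        limits:
          nvidia.com/gpu: 1    # GPU for on-cluster inference
    webTerminal:
      enabled: true            # ttyd debugging shell, no kubectl exec needed
```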

Installation

You will need Kubernetes 1.28 or later and Helm 3. Installation is a single command:

helm install openclaw-operator \
  oci://ghcr.io/openclaw-rocks/charts/openclaw-operator \
  --namespace openclaw-operator-system \
  --create-namespace

After that, create a Secret with your API credentials (for example, ANTHROPIC_API_KEY) and apply an OpenClawInstance resource. The operator takes it from there, provisioning storage, networking, and the agent pod itself.
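The Secret itself is plain Kubernetes; only the Secret name the instance would reference is an assumption here:

```yaml
# Standard Kubernetes Secret; the name "openclaw-credentials" is an
# example — use whatever your OpenClawInstance spec references.
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-credentials
type: Opaque
stringData:
  ANTHROPIC_API_KEY: sk-ant-...  # placeholder — substitute your real key
```

Apply it with kubectl apply -f before creating the OpenClawInstance so the operator can mount the credentials into the agent pod.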

Observability and Scaling

Built-in Prometheus metrics and ServiceMonitor integration mean you can plug the operator into your existing monitoring stack immediately. Structured JSON logging and Kubernetes events provide visibility into what the operator is doing and why. Horizontal Pod Autoscaling works out of the box with CPU and memory metrics, and PodDisruptionBudgets keep your agents available during cluster maintenance. There is even an instance suspension feature that scales to zero and resumes instantly — handy for development environments or agents that only need to run during business hours.
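The suspension feature presumably reduces to a single spec toggle; the field name below is an assumption based on the kubebuilder/CronJob convention of a suspend flag:

```yaml
# Hypothetical suspension toggle — the field name "suspend" is an
# assumption modeled on the common Kubernetes suspend convention.
spec:
  suspend: true   # scale the agent to zero; set false to resume
```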

Who Is This For?

If you are running a single OpenClaw agent on your laptop, you do not need this. But if you are a platform team managing agents for multiple projects, a startup deploying customer-facing AI assistants, or an enterprise that needs to keep agent workloads inside your own VPC, the operator removes a significant amount of undifferentiated heavy lifting. The combination of declarative configuration, automatic upgrades with rollback, and production-grade security defaults makes it considerably easier to treat AI agents like any other managed workload in your cluster.

The project is open source under Apache 2.0, actively maintained, and includes a detailed README, roadmap, security policy, and contributing guide on GitHub. If Kubernetes is already part of your stack, this is worth a look.
