Claude Code as a Server Admin: From Migration to Daily Operations

Earlier this year I migrated my home server from Gentoo + Docker to Fedora Server 43 + MicroShift. Eight services: a full mail stack (Postfix, Dovecot, Rspamd, ClamAV), Nextcloud with Collabora Online, Vaultwarden, Paperless-ngx, Pi-hole, and monitoring via Grafana Cloud, all on a single VPS and managed via GitOps with Flux CD.

Claude Code was my primary tool throughout. Not as a code generator I reviewed at arm’s length, but as something closer to a co-admin that I paired with in real time. This post is an honest look at what that actually looks like, with concrete examples.


The Numbers

The migration took roughly two weeks. By the time I wrote a one-month retrospective, the tally was 368 commits across two repositories: 232 in the GitOps manifests repo and 136 in the runbook and documentation repo. Those commits don’t happen in isolation: each one is preceded by research, debugging, decision-making, and often several failed attempts.

Claude Code was involved in essentially all of it. Not in a “generate boilerplate” way, but in a “I’m seeing this error and I don’t know why” way.


What Migration Actually Looks Like

The biggest friction points during the migration were things that aren’t well-documented anywhere: MicroShift’s Security Context Constraints, a Btrfs quota interaction that crashed etcd every hour, and Dovecot 2.4’s breaking changes from 2.3. Each of these took hours of debugging. Here’s what the process looked like.

Debugging etcd Crashes

A few days into the migration, the cluster started becoming unresponsive every hour on the dot. The prompt was something like:

“MicroShift etcd is crashing every hour. journalctl -u microshift shows the process is killed. The last thing I see before the crash is a sequence of btrfs quota-related messages. The /var/lib/microshift subvolume has quotas enabled. What’s going on and how do I fix it?”

The result: a detailed explanation of why etcd is sensitive to filesystem latency spikes, why Btrfs quota accounting (especially with many snapshots) introduces exactly those spikes, and a concrete fix — disable quotas on the MicroShift subvolume, verify etcd can now write without latency outliers, adjust snapshot retention to prevent quota pressure from rebuilding. The crash never came back.
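To make “verify etcd can now write without latency outliers” concrete, here is a minimal sketch of an fsync latency probe. It is an illustration under assumptions, not the check that was actually run during the migration: the function names, sample count, and payload size are all hypothetical, and a purpose-built tool such as fio would give more trustworthy numbers. The idea is simply to do many small write-plus-fsync cycles on the filesystem in question and look at the tail of the distribution rather than the average.

```python
import math
import os
import tempfile
import time


def fsync_latencies(directory, samples=200, payload=b"x" * 2048):
    """Write small records to a temp file in `directory`, fsync after each
    write, and return the per-write durations in seconds."""
    durations = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage, as a WAL would
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return durations


def percentile(values, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical usage: point this at a directory on the subvolume under test.
    lat = fsync_latencies(tempfile.gettempdir())
    print(f"p50={percentile(lat, 50) * 1000:.2f} ms  "
          f"p99={percentile(lat, 99) * 1000:.2f} ms  "
          f"max={max(lat) * 1000:.2f} ms")
```

Running it before and after a change such as disabling quotas gives a rough before/after comparison. The etcd documentation suggests keeping the 99th percentile of WAL fsync duration under roughly 10 ms, so the p99 and max values are the ones to watch.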

Mail Stack Service Name Collision

The mail stack involves six services in one namespace: Postfix, Dovecot, Rspamd, ClamAV, MariaDB, Redis. A subtle problem: Kubernetes injects environment variables into each pod for every service in the namespace, using the upper-cased service name (with dashes turned into underscores) as a prefix. If you name a service postfix, Kubernetes injects POSTFIX_PORT (set to a URL-style value such as tcp://<cluster-ip>:<port>), POSTFIX_SERVICE_HOST, and so on. These collide with Postfix’s own environment variable conventions and break the daemon silently.

The prompt was roughly:

“My Postfix pod is starting but immediately mis-routing mail. It’s not reading the correct relay host from the config. I’ve checked the configuration file and it looks right. Could Kubernetes environment variables be interfering?”

Claude Code identified the collision immediately, explained the Kubernetes service env-var injection mechanism, and suggested the fix: rename postfix → mail-postfix and dovecot → mail-dovecot throughout the manifests. This is documented in the runbook now as a note future me will be glad to have.
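The injection mechanism is easy to reproduce on paper. The sketch below derives the variable names Kubernetes generates for a service, following the naming scheme described in the Kubernetes Service documentation. It is a simplified illustration: the function name, the cluster IP, and the port are made up, and the extra variables generated for named ports are left out.

```python
def service_env_vars(service_name, cluster_ip, ports):
    """Return the Docker-links-style variables Kubernetes injects for a service.

    `ports` is a list of (port, protocol) tuples, e.g. [(25, "TCP")].
    Simplified: per-name variables for named ports are omitted.
    """
    # Service names are upper-cased and dashes become underscores.
    prefix = service_name.upper().replace("-", "_")
    env = {}
    if not ports:
        return env

    first_port, first_proto = ports[0]
    env[f"{prefix}_SERVICE_HOST"] = cluster_ip
    env[f"{prefix}_SERVICE_PORT"] = str(first_port)
    # Note the URL-style value: this is not a plain port number.
    env[f"{prefix}_PORT"] = f"{first_proto.lower()}://{cluster_ip}:{first_port}"

    for port, proto in ports:
        base = f"{prefix}_PORT_{port}_{proto.upper()}"
        env[base] = f"{proto.lower()}://{cluster_ip}:{port}"
        env[f"{base}_PROTO"] = proto.lower()
        env[f"{base}_PORT"] = str(port)
        env[f"{base}_ADDR"] = cluster_ip
    return env


if __name__ == "__main__":
    # Hypothetical cluster IP and port, for illustration only.
    for name in ("postfix", "mail-postfix"):
        print(name, sorted(service_env_vars(name, "10.43.0.10", [(25, "TCP")])))
```

With the service named postfix, the output includes POSTFIX_PORT with a tcp:// value. After the rename the same variable becomes MAIL_POSTFIX_PORT, which no longer overlaps with anything beginning with POSTFIX_. Kubernetes also has a pod-spec field, enableServiceLinks: false, that suppresses these variables; the fix described here was the rename.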

Dovecot 2.4 Configuration

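Dovecot 2.4 introduced breaking changes in how SQL authentication is configured. The 2.3-style passdb { driver = sql ... } block, which pointed at a separate SQL config file via args, is no longer accepted. In 2.4 the SQL driver settings and queries live in the main configuration, inside named passdb sql { ... } and userdb sql { ... } blocks. The error messages from Dovecot 2.4 when it encounters 2.3-style config are not helpful. The prompt: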

“I’m migrating Dovecot config from 2.3 to 2.4. The old passdb/userdb SQL configuration no longer works. Dovecot starts but authentication fails silently. Here is the old config and the error I’m seeing in the logs.”

The result was a complete rewrite of the auth configuration for 2.4, with an explanation of what changed and why. Testing against the live pod confirmed it immediately.


MCP: Closing the Feedback Loop

After the migration, I set up two MCP servers that changed how I work with the cluster.

mcp-email: An email MCP server running as a pod in MicroShift, giving Claude Code direct IMAP/SMTP access to my mailbox. Instead of the loop of “run this command, paste the output”, Claude can read and send emails directly when I ask it to.

mcp-openshift: A Kubernetes/OpenShift MCP server (the openshift/openshift-mcp-server project) running in the cluster with cluster-admin rights. Claude can now inspect live cluster state, read pod logs, query resource definitions, and check events — without me serving as an intermediary.

A representative interaction from last week:

“The rspamd dashboard in Grafana shows a spike in rejected messages this morning. Can you check if rspamd is healthy, look at recent logs, and tell me what’s happening?”

Claude called pods_list (namespace: mailstack), identified the rspamd pod, called pods_log, found a pattern of rejections from a specific IP range, correlated it with the dashboard spike, and concluded it was a new spam campaign — not a configuration problem. Total time from question to answer: maybe 90 seconds, no manual command execution.

Compare that to the pre-MCP workflow: run kubectl logs, copy output, paste it in, wait for analysis, maybe run another command, paste again. The MCP servers don’t change what Claude can reason about — they change what Claude can observe without my help.
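The log-triage step in the rspamd example (“found a pattern of rejections from a specific IP range”) can be approximated with a few lines of code. This is a rough sketch of the idea, not what Claude actually executed: the function name is hypothetical, the sample log lines are invented, and real rspamd log lines have a different and richer format. It groups rejection lines by the /24 network of the first IPv4 address found on each line.

```python
import ipaddress
import re
from collections import Counter

# Loose IPv4 matcher; invalid candidates are filtered out by ipaddress below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def rejections_by_network(lines, prefix_len=24, marker="reject"):
    """Count log lines containing `marker`, grouped by source network.

    Returns (network, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in lines:
        if marker not in line.lower():
            continue
        for candidate in IPV4.findall(line):
            try:
                net = ipaddress.ip_network(f"{candidate}/{prefix_len}", strict=False)
            except ValueError:
                continue  # e.g. 999.1.1.1 matched the regex but is not an address
            counts[str(net)] += 1
            break  # count each line once, using its first valid address
    return counts.most_common()


if __name__ == "__main__":
    # Invented sample lines using documentation-reserved address ranges.
    sample = [
        "rspamd: reject from 203.0.113.5",
        "rspamd: reject from 203.0.113.77",
        "rspamd: no action for 198.51.100.1",
        "rspamd: REJECT from 198.51.100.9",
    ]
    for network, count in rejections_by_network(sample):
        print(f"{count:4d}  {network}")
```

A single network dominating the top of that list points toward an external campaign rather than a configuration regression, which is the distinction that mattered in the interaction above.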


Day-to-Day Administration

Since the migration completed, Claude Code has handled:

  • Grafana dashboard fixes: Grafana Operator overwrites UI changes every ~10 minutes. The right fix is always in the GrafanaDashboard YAML, not the UI. When I spotted dashboard regressions, the workflow was: describe what’s wrong, Claude reads the YAML, proposes the correct panel or query change, I review and apply via GitOps.

  • Renovate integration: Automatic image updates via a GitHub Actions-based Renovate bot (every 6 hours). When a Renovate PR has auto-merged and I want to verify that the update was clean, I ask something like: “Check if the nextcloud pod came up healthy after today’s image update.” Claude reads the deployment status and pod events directly via MCP.

  • Script development: Several operational scripts live in the GitOps repo — snap-all (Btrfs snapshot across all subvolumes before changes), flux-upgrade.sh (automated Flux CLI + cluster upgrade), check-image-updates.sh (weekly digest of pending image updates via Telegram). These were developed iteratively: describe what the script needs to do, review the output, refine edge cases.

  • Security hardening: After spotting brute-force patterns in the mail logs, we went through a Postfix and Dovecot hardening session — adding reject_sender_login_mismatch, tightening fail2ban rules, switching fail2ban to the nftables backend. Each change was reasoned through before being applied, with the reasoning documented in the runbook.

  • This blog: Each post in this series — the MicroShift overview, the retrospective, the backup deep-dive, the fail2ban hardening post, the MCP server post — was written collaboratively with Claude Code. I describe what I want to cover, Claude drafts, I edit and rewrite the parts that don’t match how I actually think about the problem. The voice is mine; the speed of getting from “I should write about this” to “here’s a draft” is Claude’s.
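For the “did the pod come up healthy after the image update” check in the Renovate item, the underlying logic is simple enough to sketch. The function below works on a pod object in the shape returned by kubectl get pod -o json. The function name and the restart threshold are assumptions for illustration, and a real check would also look at events and at the deployment’s rollout status.

```python
def pod_is_healthy(pod, max_restarts=0):
    """Return (healthy, reason) for a pod dict in `kubectl get pod -o json` shape."""
    status = pod.get("status", {})
    phase = status.get("phase")
    if phase != "Running":
        return False, f"phase is {phase}"

    containers = status.get("containerStatuses", [])
    if not containers:
        return False, "no container statuses reported"

    for container in containers:
        name = container.get("name")
        if not container.get("ready", False):
            return False, f"container {name} is not ready"
        restarts = container.get("restartCount", 0)
        if restarts > max_restarts:
            return False, f"container {name} restarted {restarts} times"
    return True, "all containers ready"


if __name__ == "__main__":
    # Hand-written example object; field names follow the Pod status schema.
    example = {
        "status": {
            "phase": "Running",
            "containerStatuses": [
                {"name": "nextcloud", "ready": True, "restartCount": 0},
            ],
        }
    }
    print(pod_is_healthy(example))
```

Returning a reason alongside the boolean keeps the answer useful when the check fails, since “not ready” and “restarted several times” call for different follow-up steps.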


What Works Well

Explaining failure modes. The debugging examples above are typical. I rarely come to Claude with “write me a manifest from scratch.” More often it’s “here is a thing that’s broken and here is what I know about it.” Claude is good at identifying failure modes I haven’t considered, especially in systems I know less well (Kubernetes internals, Postfix edge cases, Btrfs quota behavior).

Keeping documentation current. The runbook in the migration repository is ~550 lines across 12 files. Keeping it current as the system evolves is tedious but important. The workflow is “here’s what changed, update the relevant doc section.” I still review everything, but I don’t have to do the first draft.

GitOps discipline. With Claude Code in the loop, it’s easier to maintain the discipline of “everything goes through GitOps, nothing gets changed live and forgotten.” Claude will suggest committing a change to the repo rather than applying it directly, and will remind me to check that Flux has reconciled before declaring success.
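“Check that Flux has reconciled” can be stated precisely. The sketch below takes a Flux Kustomization object as a dict (for example from kubectl get kustomization <name> -o json) and reports whether the latest spec has been observed, the last attempted revision is the one actually applied, and the Ready condition is true. The function name and the example values are assumptions for illustration, and this is one reasonable reading of “reconciled” rather than an official definition.

```python
def flux_reconciled(kustomization):
    """Return True if a Flux Kustomization dict looks fully reconciled."""
    meta = kustomization.get("metadata", {})
    status = kustomization.get("status", {})

    # The controller has processed the current version of the spec.
    generation_seen = (
        status.get("observedGeneration") is not None
        and status.get("observedGeneration") == meta.get("generation")
    )

    # The most recent attempt succeeded, so attempted == applied.
    applied = status.get("lastAppliedRevision")
    revision_applied = (
        applied is not None and applied == status.get("lastAttemptedRevision")
    )

    # The Ready condition is present and true.
    ready = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )
    return generation_seen and revision_applied and ready


if __name__ == "__main__":
    # Hand-written example; the revision string is made up.
    example = {
        "metadata": {"generation": 3},
        "status": {
            "observedGeneration": 3,
            "lastAppliedRevision": "main@sha1:abc123",
            "lastAttemptedRevision": "main@sha1:abc123",
            "conditions": [{"type": "Ready", "status": "True"}],
        },
    }
    print(flux_reconciled(example))
```

Comparing the attempted and applied revisions catches the case where a new commit was fetched but failed to apply while the object still reports an older successful state.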


What Doesn’t Work as Well

It doesn’t know what it doesn’t know. When a problem has an obscure root cause (the etcd/Btrfs quota issue, for instance), Claude sometimes gives a confident initial answer that turns out to be wrong. The fix is the same as with any colleague: stay skeptical, verify, and ask follow-up questions when something doesn’t add up. The MCP servers help here — live cluster state is harder to misinterpret than a description of it.

Context limits matter. A debugging session that spans many back-and-forth messages and large log outputs can hit context limits. I’ve learned to be selective about what I paste in, and to summarize intermediate findings rather than repeating raw output.
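One way to be selective about pasted logs is to collapse repeated lines before sharing them. The helper below is a small sketch of that idea; the function name and the normalization rule (treating every run of digits as interchangeable) are assumptions and will be too aggressive or too lax for some log formats. It keeps one representative line per pattern along with a count.

```python
import re
from collections import Counter


def condense_log(lines, max_lines=20):
    """Collapse lines that differ only in their numbers into 'Nx <example>' rows.

    Returns at most `max_lines` rows, most frequent pattern first.
    """
    counts = Counter()
    first_seen = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Timestamps, PIDs, and counters vary per line; mask them for grouping.
        key = re.sub(r"\d+", "#", line)
        counts[key] += 1
        first_seen.setdefault(key, line)
    return [f"{n}x {first_seen[key]}" for key, n in counts.most_common(max_lines)]


if __name__ == "__main__":
    # Invented sample lines for illustration.
    sample = [
        "10:00:01 connect timeout",
        "10:00:02 connect timeout",
        "10:00:03 disk full",
    ]
    print("\n".join(condense_log(sample)))
```

The condensed output is usually a small fraction of the raw log and still shows which messages dominate, which is often enough for a first round of analysis.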


The Shift

The way I’d summarize the change: I used to do server administration and then document it. Now I do it with documentation happening in parallel, because the overhead of capturing decisions is low enough that it actually happens.

The 368 commits are partly a proxy for that. Each commit is a documented decision — not just a change, but a change with a commit message that explains why. That discipline existed before, but it’s easier to maintain when the tool you’re using to make the change is the same tool you’re using to write the explanation.

If you’re running a homelab at this level of complexity, Claude Code is worth evaluating seriously — not as a way to avoid understanding your systems, but as a way to engage with them more thoroughly than you could alone.