A complete walkthrough of running a Kubernetes-based self-hosted stack on a single VPS — WireGuard VPN, GitOps with Flux, Btrfs snapshots, wildcard TLS, and a full application suite including Nextcloud, Vaultwarden, Paperless, a mail stack, and monitoring. Everything bootstrapped from scratch. Every step documented.
This is written for a Linux administrator who wants to replicate the setup. All personal identifiers have been replaced with generic placeholders.
Last updated: 2026-05-15
Table of Contents
- Changelog
- Why MicroShift?
- Architecture Overview
- Prerequisites
- Step 1 — Base System
- Step 2 — Btrfs Storage Layout
- Step 3 — Firewall & WireGuard
- Step 4 — MicroShift
- Step 5 — GitOps with Flux CD
- Step 6 — Wildcard TLS with cert-manager + acme-dns
- Step 7 — Services
- Step 8 — Operations & Automation
- Step 9 — Backup & Disaster Recovery
- Key Lessons Learned
- Repository Structure
Changelog
What changed on 2026-05-15 (2):
- GitHub repository renamed from gitops to homelab
- Manifest folder renamed from gitops/ to configuration/ inside the repository; all Flux path references updated accordingly
What changed on 2026-05-15:
- snapper: extended to all 8 btrbk subvolumes (added var, var_log, var_lib_containers); all configs use identical retention (24h/8d/5w); snap-all updated accordingly
- btrbk: local retention reduced to snapshot_preserve_min latest, snapshot_preserve 2h 1d 0w; snapper handles local rollback, btrbk local snapshots only serve as parent reference for incremental send; remote retention (Pi) unchanged
- check-backup-btrfs-subvolumes.sh: renamed from check-btrbk-subvolumes.sh; extended to also verify every subvolume has a snapper config (not just btrbk)
- pre-flux-snapshot.sh: now uses snap-all (all 8 configs) instead of snapper -c root create
- rspamd: memory limit raised from 512Mi to 768Mi; rspamd was OOMKilled on every node reboot due to a startup memory spike exceeding the old limit
- postfixadmin: lifecycle postStart hook removed; the hook patched config.local.php to inject encrypt = system, but POSTFIXADMIN_ENCRYPT=system is natively supported by library/postfixadmin:4.0.1-apache; the hook caused a restart on every node reboot (exit code 124, startup race: DB not yet ready → until loop hangs → Kubernetes timeout)
- grafana-operator: leader election disabled (leaderElect: false); the operator was crashing ~3×/day (exit code 2) because the kube-apiserver lease renewal timed out during brief etcd compaction pauses; leader election is pointless on a single-node cluster
What changed on 2026-05-12:
- kindnet CNI: removed the broken kindnet-fix.service workaround; MicroShift 4.21 ships its own internal kindnet manifests with the wrong CIDR; fix: kustomizePaths override excludes /usr/lib/microshift/manifests.d/000-microshift-kindnet/
- snapper: extended to 5 subvolumes (root, home, data, var_lib_pvc, var_lib_microshift); NUMBER_LIMIT for root reduced to 20; snap-all convenience script
- fail2ban: 3 new jails: sshd-unknown (maxretry=1, 7d), dovecot-unknown (maxretry=1, 7d), postfix-rcpt-unknown (maxretry=3, 1h findtime, 7d)
- Postfix / Dovecot: hostPort replaced by hostNetwork: true + dnsPolicy: ClusterFirstWithHostNet so fail2ban sees real client IPs; log-tailer sidecar for kubectl logs
- acme-dns: port 53 now via ClusterIP Service with externalIPs instead of hostPort
- Flux: version 2.8.3 → 2.8.6; all custom scripts centralised in /data/scripts/ (Btrfs-backed) with symlinks in /usr/local/bin/
- MicroShift config: apiServer.logLevel: Warning + auditLog limits (200 MB / 3 files / 7 days)
- podman-image-cleanup: new weekly script (Mon 03:00) with Telegram report
Why MicroShift?
Standard Kubernetes (kubeadm, k3s, k0s) on a single VPS works, but MicroShift brings some specific advantages for an edge/single-node scenario:
- Minimal footprint: Ships without the heavy components (etcd cluster, controller-manager HA). Uses CRI-O and an embedded etcd.
- OpenShift semantics: Security Context Constraints (SCC) instead of PodSecurityAdmission. More expressive, but requires explicit SCC bindings for every workload.
- HAProxy ingress included: openshift-ingress/router-default handles TLS termination out of the box.
- OVN-Kubernetes by default, but we replace it with kube-kindnet (simpler, single-node appropriate).
The tradeoff: SCCs require additional boilerplate for every service account, and kindnet’s default POD_SUBNET doesn’t match MicroShift’s pod CIDR — a one-line config override fixes this permanently.
Architecture Overview
graph TD
Internet["Internet<br/>443/80 HTTPS/HTTP · 25/587/993 Mail · 51820/udp WireGuard"]
subgraph VPS["Fedora Server 43 · 8 vCPU · 16 GiB · Btrfs pool"]
subgraph K8S["MicroShift 4.21 · OKD/SCOS"]
pihole["pihole — DNS server"]
vault["vaultwarden — Password manager"]
paper["paperless — Document management"]
nc["nextcloud — File sync"]
collab["collabora — Online Office"]
mail["mailstack — Mail server"]
mon["monitoring — Grafana"]
hp["homepage — Static website"]
cert["cert-manager — TLS automation"]
acme["acme-dns — ACME DNS-01"]
pihole ~~~ vault ~~~ paper ~~~ nc ~~~ collab ~~~ mail ~~~ mon ~~~ hp ~~~ cert ~~~ acme
end
WG["WireGuard wg0 · 10.0.0.1/24"]
acme ~~~ WG
end
Pi["Raspberry Pi · 10.0.0.12<br/>btrbk remote backup"]
Internet --> pihole
WG --> Pi
GitOps: All Kubernetes manifests live in a GitHub repository (github.com/youruser/homelab). Flux CD watches the repo and applies changes automatically. A GitHub Action creates a Btrfs snapshot before every push.
TLS: One wildcard certificate from Let’s Encrypt covers all domains. cert-manager + acme-dns handle DNS-01 challenges. A systemd timer syncs the certificate to every app namespace weekly.
Backups: btrbk creates hourly snapshots of all Btrfs subvolumes and sends them via SSH to a Raspberry Pi. grub-btrfs registers every snapshot in the GRUB menu for easy rollback.
Prerequisites
| Resource | Value |
|---|---|
| OS | Fedora Server 43 (fresh install) |
| vCPUs | 8 |
| RAM | 16 GiB |
| Disk | ≥ 512 GiB (Btrfs) |
| Network | Public IPv4, ports 22/80/443/51820 reachable |
| Access | Root SSH with key auth |
| DNS | A domain you control at a registrar (for ACME and ingress hostnames) |
| GitHub | Account with a private homelab repository |
| Grafana Cloud | Free tier account (for monitoring) |
| Telegram | Bot token + chat ID (for notifications — optional but recommended) |
The Raspberry Pi (remote backup target) is optional but strongly recommended for disaster recovery.
Step 1 — Base System
SSH Hardening
Move SSH to a non-standard port and disable password auth:
sed -i 's/^#Port 22/Port 222/' /etc/ssh/sshd_config
grep -q "^PermitRootLogin" /etc/ssh/sshd_config \
&& sed -i 's/^PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config \
|| echo "PermitRootLogin prohibit-password" >> /etc/ssh/sshd_config
grep -q "^PasswordAuthentication" /etc/ssh/sshd_config \
&& sed -i 's/^PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config \
|| echo "PasswordAuthentication no" >> /etc/ssh/sshd_config
# SELinux: allow sshd on port 222
dnf install -y policycoreutils-python-utils
semanage port -a -t ssh_port_t -p tcp 222
# Copy your authorized_keys, then:
systemctl restart sshd
Firewall exceptions for port 222 come in Step 3.
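Before restarting sshd, it can help to confirm that the daemon will really pick up the three settings changed above, since a typo here can lock you out. The following is a minimal sketch and not part of the original setup: `check_sshd_effective` is a hypothetical helper that reads the output of `sshd -T` (the effective-configuration dump, which prints lowercase keywords) and succeeds only if port, password authentication and root login match. Some OpenSSH versions report `prohibit-password` under its older alias `without-password`, so both are accepted.

```shell
# check_sshd_effective: reads `sshd -T` output on stdin and returns 0 only
# if all three hardening settings from this section are in effect.
check_sshd_effective() {
  awk '
    $1 == "port"                   && $2 == "222" { p = 1 }
    $1 == "passwordauthentication" && $2 == "no"  { a = 1 }
    $1 == "permitrootlogin" && ($2 == "prohibit-password" || $2 == "without-password") { r = 1 }
    END { exit !(p && a && r) }
  '
}

# Usage on the server, before `systemctl restart sshd`:
#   sshd -t && sshd -T | check_sshd_effective && echo "sshd config OK"
```

Keep the current SSH session open until a second login on port 222 has succeeded.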
etckeeper
Version-control /etc from day one:
dnf install -y etckeeper
etckeeper init
etckeeper commit "initial fedora server setup"
All subsequent /etc changes should be followed by etckeeper commit "<description>". The git log becomes your authoritative change history.
Base Packages
dnf install -y \
vim-enhanced htop jq \
podman btrfs-progs \
fail2ban wireguard-tools \
snapper btrbk \
dnf5-automatic \
policycoreutils-python-utils
Sysctl Tuning
cat > /etc/sysctl.d/99-microshift.conf <<'EOF'
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 16384
EOF
cat > /etc/sysctl.d/90-redis.conf <<'EOF'
vm.overcommit_memory = 1
EOF
cat > /etc/sysctl.d/90-wireguard.conf <<'EOF'
net.ipv4.ip_forward = 1
EOF
sysctl --system
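vm.overcommit_memory=1 is recommended by Redis (used by Rspamd and Nextcloud): without it Redis logs a startup warning and background saves can fail under memory pressure. The raised inotify limits keep MicroShift and Flux from exhausting the default watch and instance quotas.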
Telegram Notifications (optional)
Create a Telegram bot via @BotFather, start a chat and note your chat ID (use @userinfobot). Store credentials in /etc/telegramrc:
install -m 600 /dev/stdin /etc/telegramrc <<'EOF'
TOKEN=<YOUR_BOT_TOKEN>
CHATID=<YOUR_CHAT_ID>
EOF
sendtelegram.sh — reads /etc/telegramrc and sends a message via the Telegram Bot API:
#!/bin/bash
# Usage: sendtelegram.sh [-c configfile] [-t token] [-i chatid] [-p parsemode] [-m message]
while getopts ":c:t:i:p:m:v" opt; do
  case "$opt" in
    c) CONFIGFILE=$OPTARG ;;    t) TOKEN_ARG=$OPTARG ;;
    i) CHATID_ARG=$OPTARG ;;    m) TEXT=$OPTARG ;;
    p) PARSEMODE_ARG=$OPTARG ;; v) VERBOSE=1 ;;
  esac
done
# Load the config file first, then let command-line values override it
if [ -n "${CONFIGFILE:-}" ]; then . "$CONFIGFILE"
elif [ -f /etc/telegramrc ]; then . /etc/telegramrc; fi
if [ -n "${TOKEN_ARG:-}" ]; then TOKEN=$TOKEN_ARG; fi
if [ -n "${CHATID_ARG:-}" ]; then CHATID=$CHATID_ARG; fi
URL="https://api.telegram.org/bot$TOKEN/sendMessage"
# --data-urlencode keeps "&", "+" and newlines in the message text intact
ARGS=(-d "chat_id=$CHATID" -d "disable_web_page_preview=1" --data-urlencode "text=${TEXT:-}")
[ -n "${PARSEMODE_ARG:-}" ] && ARGS+=(-d "parse_mode=$PARSEMODE_ARG")
curl -s --max-time 10 "${ARGS[@]}" "$URL" > /dev/null
install -m 755 sendtelegram.sh /usr/local/bin/sendtelegram.sh
sendtelegram.sh -m "Base system ready"
All other scripts call sendtelegram.sh or source /etc/telegramrc directly.
Automatic Updates
cat > /etc/dnf/automatic.conf <<'EOF'
[commands]
upgrade_type = default
random_sleep = 0
download_updates = yes
apply_updates = yes
reboot = when-needed
reboot_command = "shutdown -r +5 'Reboot after automatic update'"
[emitters]
emit_via = stdio
EOF
systemctl enable --now dnf5-automatic.timer
Mask passim
passim is a P2P cache for fwupd — useless on a headless VPS:
systemctl mask passim
etckeeper commit "base system: ssh/222, packages, sysctl, telegram, dnf-automatic"
snapper -c root create -d "base-system-ready"   # needs the snapper root config from Step 2; on a first pass, run this after Step 2
Step 2 — Btrfs Storage Layout
Why Separate Subvolumes?
Granular subvolumes allow btrbk to snapshot and send only what changed. You can exclude volatile data (/var/cache, /var/tmp) from backups while still including critical data (/var/lib/microshift, PVCs, GitOps working tree).
Subvolume Creation
Fedora installs / as a root subvolume. Mount the top-level and add the rest:
mkdir -p /mnt/btrfs-top
mount -o subvolid=5 /dev/vda3 /mnt/btrfs-top
for sv in var home var_log var_cache var_tmp \
var_lib_microshift var_lib_pvc var_lib_containers var_lib_kubelet \
data btrbk_snapshots; do
btrfs subvolume create /mnt/btrfs-top/$sv
done
umount /mnt/btrfs-top
/etc/fstab Extension
Get your Btrfs UUID with blkid /dev/vda3, then add to /etc/fstab:
UUID=<btrfs-uuid> /var btrfs subvol=var,compress=zstd:1 0 0
UUID=<btrfs-uuid> /home btrfs subvol=home,compress=zstd:1 0 0
UUID=<btrfs-uuid> /var/log btrfs subvol=var_log 0 0
UUID=<btrfs-uuid> /var/cache btrfs subvol=var_cache 0 0
UUID=<btrfs-uuid> /var/tmp btrfs subvol=var_tmp 0 0
UUID=<btrfs-uuid> /var/lib/microshift btrfs subvol=var_lib_microshift 0 0
UUID=<btrfs-uuid> /var/lib/pvc btrfs subvol=var_lib_pvc 0 0
UUID=<btrfs-uuid> /var/lib/containers btrfs subvol=var_lib_containers 0 0
UUID=<btrfs-uuid> /var/lib/kubelet btrfs subvol=var_lib_kubelet 0 0
UUID=<btrfs-uuid> /data btrfs subvol=data 0 0
UUID=<btrfs-uuid> /mnt/btrfs-top btrfs subvolid=5,noauto 0 0
mkdir -p /var/lib/{microshift,pvc,containers,kubelet} /data /mnt/btrfs-top
systemctl daemon-reload && mount -a
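After `mount -a`, a quick check that every mount point is backed by the intended subvolume catches fstab typos early. The sketch below is an illustration rather than part of the original setup: `check_subvol_mounts` is a hypothetical helper that reads `findmnt` output on stdin and reports any pair from the fstab block above that is missing. It assumes the kernel reports the subvolume as `subvol=/<name>` in the mount options.

```shell
# Mount point -> subvolume pairs, taken from the fstab block above.
EXPECTED="/var:var /home:home /var/log:var_log /var/cache:var_cache /var/tmp:var_tmp /var/lib/microshift:var_lib_microshift /var/lib/pvc:var_lib_pvc /var/lib/containers:var_lib_containers /var/lib/kubelet:var_lib_kubelet /data:data"

# check_subvol_mounts: reads "TARGET OPTIONS" lines on stdin, prints one
# MISSING line per unmatched pair, and returns non-zero if any is missing.
check_subvol_mounts() {
  local mounts pair mp sv rc=0
  mounts=$(cat)
  for pair in $EXPECTED; do
    mp=${pair%%:*}; sv=${pair##*:}
    if ! printf '%s\n' "$mounts" | awk -v mp="$mp" -v want="subvol=/$sv" '
          $1 == mp { n = split($2, o, ","); for (i = 1; i <= n; i++) if (o[i] == want) found = 1 }
          END { exit !found }'; then
      echo "MISSING: $mp (expected subvol=/$sv)"; rc=1
    fi
  done
  return $rc
}

# Usage: findmnt -rn -t btrfs -o TARGET,OPTIONS | check_subvol_mounts
```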
SELinux Context for PVCs
The local-path-provisioner init container (busybox) cannot run chcon. Set the SELinux label from the host:
semanage fcontext -a -t container_file_t "/var/lib/pvc(/.*)?"
restorecon -Rv /var/lib/pvc
snapper — Timeline Snapshots
Eight subvolumes get timeline snapshots — all with identical retention (24h/8d/5w, numbered limit 20):
# Root
snapper -c root create-config /
snapper -c root set-config \
TIMELINE_CREATE=yes TIMELINE_CLEANUP=yes \
TIMELINE_LIMIT_HOURLY=24 TIMELINE_LIMIT_DAILY=8 \
TIMELINE_LIMIT_WEEKLY=5 TIMELINE_LIMIT_MONTHLY=0 TIMELINE_LIMIT_YEARLY=0 \
NUMBER_CLEANUP=yes NUMBER_LIMIT=20
# All other subvolumes — same retention
for CFG_SUBVOL in \
"home:/home" "data:/data" \
"var_lib_pvc:/var/lib/pvc" "var_lib_microshift:/var/lib/microshift" \
"var:/var" "var_log:/var/log" "var_lib_containers:/var/lib/containers"; do
CFG="${CFG_SUBVOL%%:*}"
SUBVOL="${CFG_SUBVOL##*:}"
snapper -c "$CFG" create-config "$SUBVOL"
snapper -c "$CFG" set-config \
TIMELINE_CREATE=yes TIMELINE_CLEANUP=yes \
TIMELINE_LIMIT_HOURLY=24 TIMELINE_LIMIT_DAILY=8 \
TIMELINE_LIMIT_WEEKLY=5 TIMELINE_LIMIT_MONTHLY=0 TIMELINE_LIMIT_YEARLY=0 \
NUMBER_CLEANUP=yes NUMBER_LIMIT=20
done
systemctl enable --now snapper-timeline.timer snapper-cleanup.timer
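To confirm that all eight configs exist after the loop, the list can be compared against what snapper reports. This is a minimal sketch under stated assumptions: `missing_snapper_configs` is a hypothetical helper, and it assumes `snapper --csvout list-configs` prints a header row followed by rows whose first column is the config name.

```shell
EXPECTED_CONFIGS="root home data var_lib_pvc var_lib_microshift var var_log var_lib_containers"

# missing_snapper_configs: reads the CSV config list on stdin and prints
# every expected config name that does not appear in it.
missing_snapper_configs() {
  local have cfg
  have=$(tail -n +2 | cut -d, -f1)      # drop the header row, keep column 1
  for cfg in $EXPECTED_CONFIGS; do
    printf '%s\n' "$have" | grep -qx -- "$cfg" || echo "$cfg"
  done
}

# Usage: snapper --csvout list-configs | missing_snapper_configs
```

Empty output means every subvolume has a config.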
snap-all — convenience script that creates a numbered snapshot across all eight configs at once (used before risky changes):
cat > /usr/local/bin/snap-all <<'EOF'
#!/bin/bash
set -euo pipefail
DESC="${1:?Usage: snap-all <description>}"
for cfg in root home data var_lib_pvc var_lib_microshift var var_log var_lib_containers; do
printf " %-22s ... " "$cfg"
snapper -c "$cfg" create --cleanup-algorithm number --description "$DESC"
echo "ok"
done
EOF
chmod 755 /usr/local/bin/snap-all
Usage: snap-all "before-risky-change" — creates one snapshot per config, all with --cleanup-algorithm number so they count against NUMBER_LIMIT and are eventually pruned automatically.
grub-btrfs — Rollback from GRUB
grub-btrfs is not in Fedora repos — build from source:
dnf install -y make gettext
git clone https://github.com/Antynea/grub-btrfs /root/git/grub-btrfs
cd /root/git/grub-btrfs && make install
# Fedora-specific paths
sed -i \
-e 's|#GRUB_BTRFS_GRUB_DIRNAME=.*|GRUB_BTRFS_GRUB_DIRNAME="/boot/grub2"|' \
-e 's|#GRUB_BTRFS_SCRIPT_CHECK=.*|GRUB_BTRFS_SCRIPT_CHECK=grub2-script-check|' \
/etc/default/grub-btrfs/config
systemctl enable --now grub-btrfsd.service
grub2-mkconfig -o /boot/grub2/grub.cfg
The daemon watches /.snapshots via inotify and adds new snapshots to the GRUB menu automatically.
btrbk — Snapshots + Remote Backup
cat > /etc/btrbk/btrbk.conf <<'EOF'
timestamp_format long
# Local: keep only the latest snapshot as parent reference for incremental send
# snapper handles local rollback, btrbk local snapshots are minimal
snapshot_preserve_min latest
snapshot_preserve 2h 1d 0w
# Remote (Pi): full retention history
target_preserve_min 1h
target_preserve 24h 8d 5w
ssh_identity /root/.ssh/id_ed25519
volume /mnt/btrfs-top
snapshot_dir btrbk_snapshots
subvolume root
subvolume var
subvolume home
subvolume var_log
subvolume var_lib_pvc
subvolume var_lib_microshift
subvolume var_lib_containers
subvolume data
target send-receive ssh://backupuser@10.0.0.12/backup/btrfs/server
EOF
systemctl enable --now btrbk.timer
The remote target (Raspberry Pi) is only reachable after WireGuard is configured (Step 3). Until then, btrbk creates local snapshots only.
⚠️ Do NOT Enable Btrfs Quotas
This is a hard-won lesson: Btrfs quotas and etcd are incompatible.
When quotas are enabled, every snapshot deletion (btrbk and snapper-cleanup run hourly/daily) triggers a qgroup rescan. With hundreds of snapshots across many subvolumes, this rescan can block Btrfs metadata operations — including fsync — for up to 15 minutes.
etcd fails with DeadlineExceeded if fsync stalls for more than 5 seconds → kube-apiserver hangs → MicroShift crashes. The MicroShift restart loop persists until the rescan completes, then crashes again at the next btrbk run.
Symptom: MicroShift crashes at the same time every hour, exactly 15 minutes after btrbk runs. journalctl -u microshift -n 100 shows etcd fsync errors.
Fix: btrfs quota disable /
btrfs-list (a useful subvolume overview tool) works without quotas — the REFER/EXCL columns show empty, but all other data is present.
Why Btrfs has this problem (and ZFS doesn’t): ZFS encodes block ownership in the block itself at write time (birth_txg). Space accounting is O(1) per write, always consistent, and snapshot deletion is processed asynchronously via a per-snapshot “deadlist” — no global rescan, never blocking. Btrfs introduced snapshots without built-in ownership metadata, so qgroups must walk backreferences retroactively to determine who owns what — expensive and blocking.
What about simple quotas (squota)? Since kernel 6.7, Btrfs offers an alternative mode (btrfs quota enable --simple /) that attributes extents permanently to their creating subvolume — no backref walking, no rescan, O(1) per operation. This is safe for etcd. The trade-off: snapshots show ~0 exclusive usage (all extents stay attributed to the original subvolume), so snapshot size measurement is not possible. For this setup, quotas remain disabled — squota doesn’t provide useful space accounting for snapshots either.
| | full qgroup | squota | disabled | ZFS |
|---|---|---|---|---|
| How ownership is determined | backref walk (expensive) | creator subvolume (O(1)) | — | birth_txg (O(1)) |
| When accounting happens | retroactively (rescan) | inline | — | inline, atomic |
| Snapshot delete | blocking rescan | no rescan | — | async deadlist |
| Numbers always correct | no (inconsistent flag) | no (snapshots ~0) | — | yes |
| etcd-safe | no | yes | yes | — |
| Snapshot size measurable | yes | no | no | yes |
/boot Backup
cat > /usr/local/sbin/boot-backup.sh <<'EOF'
#!/bin/bash
set -euo pipefail
DEST=/var/lib/boot-backup
mkdir -p "$DEST/boot" "$DEST/efi"
rsync -aAX --delete /boot/ "$DEST/boot/"
rsync -aAX --delete /boot/efi/ "$DEST/efi/"
sfdisk --dump /dev/vda > "$DEST/partition-table.sfdisk"
sgdisk --backup="$DEST/partition-table.sgdisk" /dev/vda
EOF
chmod 755 /usr/local/sbin/boot-backup.sh
# systemd timer (daily + after every dnf5-automatic run)
# See full unit files in the repo
systemctl enable --now boot-backup.timer
Step 3 — Firewall & WireGuard
firewalld Zones
The setup uses three zones:
| Zone | Interface/Source | Purpose |
|---|---|---|
| FedoraServer | ens3 (public NIC) | External traffic, explicit whitelist |
| internal | wg0 (WireGuard) | VPN clients, trusted, full access |
| trusted | 10.42.0.0/16 (Pod CIDR) | Pod-to-pod + kubelet traffic |
# FedoraServer zone — remove defaults, add only what's needed
firewall-cmd --permanent --zone=FedoraServer --remove-service=cockpit
firewall-cmd --permanent --zone=FedoraServer --remove-service=dhcpv6-client
# Custom SSH service on port 222
firewall-cmd --permanent --new-service=myssh
firewall-cmd --permanent --service=myssh --add-port=222/tcp
firewall-cmd --permanent --zone=FedoraServer --add-service=myssh
firewall-cmd --permanent --zone=FedoraServer --remove-service=ssh
# HTTP(S) for ingress
firewall-cmd --permanent --zone=FedoraServer --add-port=80/tcp
firewall-cmd --permanent --zone=FedoraServer --add-port=443/tcp
# Mail ports
for p in 25 587 993; do
firewall-cmd --permanent --zone=FedoraServer --add-port=${p}/tcp
done
# WireGuard
firewall-cmd --permanent --zone=FedoraServer --add-port=51820/udp
# acme-dns (DNS-01 challenge server)
firewall-cmd --permanent --zone=FedoraServer --add-port=53/tcp
firewall-cmd --permanent --zone=FedoraServer --add-port=53/udp
# NAT for WireGuard clients
firewall-cmd --permanent --zone=FedoraServer --add-masquerade
# internal zone — wg0 interface
firewall-cmd --permanent --zone=internal --add-interface=wg0
for p in 25 53 80 222 443 587 993 6443 8000 10250; do
firewall-cmd --permanent --zone=internal --add-port=${p}/tcp
done
firewall-cmd --permanent --zone=internal --add-port=53/udp
# trusted zone — Pod CIDR
firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16
firewall-cmd --permanent --zone=trusted --add-source=169.254.169.1/32
firewall-cmd --permanent --zone=trusted --add-port=30000-32767/tcp
Cross-Zone Forwarding Policies
firewalld requires explicit policies for cross-zone forwarding — plain rules don’t cover forwarded packets:
# WireGuard clients → Internet (NAT)
firewall-cmd --permanent --new-policy=wg-to-internet
firewall-cmd --permanent --policy=wg-to-internet --add-ingress-zone=internal
firewall-cmd --permanent --policy=wg-to-internet --add-egress-zone=FedoraServer
firewall-cmd --permanent --policy=wg-to-internet --set-target=ACCEPT
# WireGuard clients → Pods (via HAProxy DNAT)
firewall-cmd --permanent --new-policy=wg-to-cluster
firewall-cmd --permanent --policy=wg-to-cluster --add-ingress-zone=internal
firewall-cmd --permanent --policy=wg-to-cluster --add-egress-zone=trusted
firewall-cmd --permanent --policy=wg-to-cluster --set-target=ACCEPT
firewall-cmd --reload
Critical: The --add-interface=wg0 for the internal zone must be --permanent. Without it, after a reload the policy zone assignment breaks and VPN clients lose cluster access.
WireGuard Server
cd /etc/wireguard && umask 077
wg genkey | tee server.key | wg pubkey > server.pub
/etc/wireguard/wg0.conf:
[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = <CONTENTS OF server.key>
PostUp = iptables -A FORWARD -i %i -j ACCEPT; iptables -A FORWARD -o %i -j ACCEPT
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -D FORWARD -o %i -j ACCEPT
[Peer]
# Example client
PublicKey = <client-pubkey>
AllowedIPs = 10.0.0.2/32
# Add one [Peer] block per device
Client configuration (use the WireGuard app):
- Endpoint: yourserver.example.com:51820
- AllowedIPs: 0.0.0.0/0 (route all traffic through VPN)
- DNS: 10.0.0.1 (Pihole, deployed later; fall back to 1.1.1.1 initially)
chmod 600 /etc/wireguard/wg0.conf
systemctl enable --now wg-quick@wg0
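The client settings listed above can also be rendered into a file for import into the WireGuard app. The helper below is a sketch, not the author's tooling: `gen_wg_client_conf` is a hypothetical function, and the endpoint and DNS values are the placeholders used in this guide.

```shell
# gen_wg_client_conf <client-private-key> <server-public-key> <client-address>
# Prints a client config that routes all traffic through the VPN.
gen_wg_client_conf() {
  cat <<EOF
[Interface]
PrivateKey = $1
Address = $3
DNS = 10.0.0.1

[Peer]
PublicKey = $2
Endpoint = yourserver.example.com:51820
AllowedIPs = 0.0.0.0/0
EOF
}

# Usage (example peer 10.0.0.2 from wg0.conf):
#   wg genkey | tee client.key | wg pubkey > client.pub
#   gen_wg_client_conf "$(cat client.key)" "$(cat /etc/wireguard/server.pub)" 10.0.0.2/32 > client.conf
```

The matching [Peer] block on the server takes the contents of client.pub as PublicKey.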
fail2ban
Phase 1: Base config (SSH only, before mailstack)
cat > /etc/fail2ban/jail.local <<'EOF'
[DEFAULT]
bantime = 86400
findtime = 259200
maxretry = 3
backend = auto
action = firewallcmd-rich-rules[actiontype=<multiport>]
[sshd]
enabled = true
port = 222
logpath = %(sshd_log)s
EOF
systemctl enable --now fail2ban
Phase 2: After mailstack deployment
The mail logs live inside a PVC mounted at /var/lib/pvc/<pvc-uuid>_mailstack_mail-logs/. Get the UUID:
kubectl get pvc -n mailstack mail-logs -o jsonpath='{.spec.volumeName}'
Add to /etc/fail2ban/jail.local (replace <PVC_UUID> with the value above):
[postfix]
enabled = true
port = smtp,submission
backend = polling
logpath = /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/postfix.log
[postfix-sasl]
enabled = true
port = smtp,submission
backend = polling
logpath = /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/postfix.log
[dovecot]
enabled = true
port = imaps
backend = polling
logpath = /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/dovecot.log
[postfix-sasl-unknown]
enabled = true
filter = postfix-sasl-unknown
action = firewallcmd-rich-rules[actiontype=<allports>]
ignorecommand = /usr/local/bin/fail2ban-sasl-mysql-check.sh <ip>
backend = polling
logpath = /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/postfix.log
maxretry = 1
bantime = 604800
findtime = 86400
[sshd-unknown]
enabled = true
port = 222
backend = systemd
maxretry = 1
bantime = 604800
[dovecot-unknown]
enabled = true
port = imaps
filter = dovecot
backend = polling
logpath = /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/dovecot.log
maxretry = 1
bantime = 604800
[postfix-rcpt-unknown]
enabled = true
port = smtp,submission
filter = postfix
backend = polling
logpath = /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/postfix.log
maxretry = 3
findtime = 3600
bantime = 604800
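The <PVC_UUID> placeholder appears in seven logpath lines above, so filling it in mechanically avoids a missed occurrence. A minimal sketch, assuming the jail blocks were saved to a template file first (`fill_pvc_uuid` and the file name `jail-mail.tmpl` are hypothetical):

```shell
# fill_pvc_uuid <volume-name>: copies stdin to stdout, replacing every
# literal <PVC_UUID> with the given volume name.
fill_pvc_uuid() {
  local vol=${1:?volume name required}
  sed "s|<PVC_UUID>|${vol}|g"
}

# Usage:
#   VOL=$(kubectl get pvc -n mailstack mail-logs -o jsonpath='{.spec.volumeName}')
#   fill_pvc_uuid "$VOL" < jail-mail.tmpl >> /etc/fail2ban/jail.local
```

The substitution relies on the volume name containing neither `|` nor `&`, which holds for the pvc-<uuid> names Kubernetes generates.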
postfix-sasl-unknown — smart SASL jail
This custom jail bans on the first failed SASL login, but only if the attempted username does not exist in the mailbox database. Legitimate users who mistype their password from a new IP are not banned; automated scanners using random addresses are blocked immediately for 7 days.
Custom filter /etc/fail2ban/filter.d/postfix-sasl-unknown.conf:
[INCLUDES]
before = common.conf
[Definition]
failregex = warning: \S+\[<HOST>\](?::\d+)?: SASL \S+ authentication failed: .*, sasl_username=<F-USER>\S+</F-USER>
ignoreregex =
datepattern = %%b %%d %%H:%%M:%%S
              {^LN-BEG}
MySQL check script /usr/local/bin/fail2ban-sasl-mysql-check.sh (return 0 = do NOT ban, return 1 = ban):
#!/bin/bash
IP="$1"
LOG="/var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/postfix.log"
source /etc/fail2ban/.mysql-sasl-check.conf
USER=$(grep -F "[$IP]" "$LOG" | grep "SASL.*authentication failed" | \
grep -oP 'sasl_username=\K\S+' | tail -1)
[ -z "$USER" ] && exit 1
echo "$USER" | grep -qP '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$' || exit 1
SAFE_USER="${USER//\'/\'\'}"
COUNT=$(mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASS" "$MYSQL_DB" -sN \
-e "SELECT COUNT(*) FROM mailbox WHERE username='$SAFE_USER' AND active=1" 2>/dev/null)
[ "$COUNT" = "1" ] && exit 0 || exit 1
chmod 755 /usr/local/bin/fail2ban-sasl-mysql-check.sh
MySQL credentials in /etc/fail2ban/.mysql-sasl-check.conf (chmod 600). The MYSQL_HOST is the cluster IP of the maildb service:
# MYSQL_HOST: kubectl get svc -n mailstack maildb -o jsonpath='{.spec.clusterIP}'
install -m 600 /dev/stdin /etc/fail2ban/.mysql-sasl-check.conf <<'EOF'
MYSQL_HOST=<maildb-cluster-ip>
MYSQL_USER=postfix
MYSQL_PASS=<postfix-db-password>
MYSQL_DB=postfix
EOF
systemctl restart fail2ban
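The username extraction inside the check script can be exercised offline, without touching the database or fail2ban. The sketch below mirrors the script's pipeline but swaps `grep -oP` for a portable `sed` expression; `last_sasl_user` is a hypothetical helper, and the log lines in the usage example are made up for illustration.

```shell
# last_sasl_user <ip>: reads Postfix log lines on stdin and prints the
# sasl_username of the most recent failed SASL login from that IP.
last_sasl_user() {
  grep -F "[$1]" | grep 'SASL.*authentication failed' \
    | sed -n 's/.*sasl_username=\([^[:space:]]*\).*/\1/p' | tail -n 1
}

# Usage:
#   last_sasl_user 203.0.113.5 < /var/lib/pvc/<PVC_UUID>_mailstack_mail-logs/postfix.log
```

Empty output corresponds to the script's `[ -z "$USER" ] && exit 1` branch, i.e. the IP is banned.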
Auto-unban WireGuard clients
Prevent VPN lockout from SSH brute-force detection:
cat > /usr/local/bin/unban-fail2ban-clients.sh <<'EOF'
#!/bin/bash
for JAIL in $(fail2ban-client status | awk -F: '/Jail list/ {print $2}' | tr -d ',\t'); do
for IP in $(fail2ban-client status "$JAIL" | awk -F: '/Banned IP list/ {print $2}'); do
case "$IP" in
10.0.0.*) fail2ban-client set "$JAIL" unbanip "$IP" ;;
esac
done
done
EOF
chmod 755 /usr/local/bin/unban-fail2ban-clients.sh
# /etc/systemd/system/unban-fail2ban-clients.timer
[Unit]
Description=Hourly unban of WireGuard IPs
[Timer]
OnCalendar=hourly
Persistent=true
[Install]
WantedBy=timers.target
systemctl enable --now unban-fail2ban-clients.timer
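A systemd timer activates the service unit of the same name, and only the timer is shown above. A minimal sketch of the matching service follows; the unit contents are an assumption, not the author's file, and `write_unban_service` is a hypothetical helper whose target directory is a parameter so it can be tried outside /etc first.

```shell
# write_unban_service [dir]: writes unban-fail2ban-clients.service into
# dir (default: /etc/systemd/system).
write_unban_service() {
  local dir=${1:-/etc/systemd/system}
  cat > "$dir/unban-fail2ban-clients.service" <<'EOF'
[Unit]
Description=Unban WireGuard client IPs from all fail2ban jails
After=fail2ban.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/unban-fail2ban-clients.sh
EOF
}

# Usage: write_unban_service && systemctl daemon-reload
```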
Step 4 — MicroShift
Install
dnf copr enable -y @redhat-et/microshift
dnf install -y microshift openshift-clients # installs kubectl + oc
Configuration: Replace CNI and Storage
MicroShift defaults to OVN-Kubernetes (CNI) and TopoLVM (storage). Both are replaced:
install -m 644 /dev/stdin /etc/microshift/config.yaml <<'EOF'
network:
cniPlugin: "none"
storage:
driver: none
optionalCsiComponents:
- none
apiServer:
logLevel: Warning
auditLog:
maxFileSize: 200
maxFiles: 3
maxFileAge: 7
EOF
MicroShift 4.21+ ships internal kindnet manifests under /usr/lib/microshift/manifests.d/000-microshift-kindnet/ with POD_SUBNET=10.244.0.0/16 (kind’s default). cniPlugin: none disables OVN-K but not this internal kindnet set. Override kustomizePaths to exclude it:
install -m 644 /dev/stdin /etc/microshift/config.d/02-kindnet-paths.yaml <<'EOF'
manifests:
kustomizePaths:
- /usr/lib/microshift/manifests
- /usr/lib/microshift/manifests.d/000-microshift-kube-proxy
- /etc/microshift/manifests
- /etc/microshift/manifests.d/*
EOF
This keeps kube-proxy from the system set while letting your own kindnet manifest (with the correct CIDR) be the sole source of truth.
CNI: kube-kindnet
Place the kindnet manifest in /etc/microshift/manifests/kindnet.yaml as two documents separated by --- (Namespace + DaemonSet). The critical configuration:
# In the DaemonSet env section:
- name: POD_SUBNET
  value: "10.42.0.0/16"
Why not the default 10.244.0.0/16? MicroShift internally uses 10.42.0.0/16 for the pod CIDR. kindnet’s KIND-MASQ-AGENT chain uses this value to decide which traffic to masquerade. If it doesn’t match, requests from specific IP ranges (e.g., Pihole’s whitelist logic) are incorrectly masqueraded.
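Because a wrong POD_SUBNET fails silently, a small pre-start check of the manifest can be worthwhile. This is a sketch under the assumption that the value sits on the line directly after the `name: POD_SUBNET` entry, as in the fragment above; `pod_subnet_of` is a hypothetical helper.

```shell
# pod_subnet_of <manifest>: prints the value that follows "name: POD_SUBNET",
# with surrounding double quotes removed.
pod_subnet_of() {
  awk '
    /name:[[:space:]]*POD_SUBNET/ { want = 1; next }
    want && /value:/ {
      sub(/^[[:space:]]*value:[[:space:]]*/, "")
      gsub(/"/, "")
      print
      exit
    }
  ' "$1"
}

# Usage:
#   [ "$(pod_subnet_of /etc/microshift/manifests/kindnet.yaml)" = "10.42.0.0/16" ] \
#     || echo "POD_SUBNET does not match the MicroShift pod CIDR"
```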
Storage: local-path-provisioner
Use the standard rancher/local-path-provisioner manifest with two Fedora-specific changes:
- Full registry paths: Fedora blocks short image names. rancher/local-path-provisioner:v0.x → docker.io/rancher/local-path-provisioner:v0.x, busybox:latest → docker.io/library/busybox:latest.
- Remove chcon from the setup script: busybox doesn’t have chcon. The SELinux label for /var/lib/pvc was set by semanage in Step 2, so no runtime chcon is needed.
Add SCC bindings for privileged and hostmount-anyuid in the same manifest.
Start MicroShift
systemctl enable --now microshift
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
echo 'export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig' >> /root/.bashrc
echo 'alias k=kubectl' >> /root/.bashrc
kubectl get pods -A -w
Expected system pods once stable:
- openshift-ingress/router-default (HAProxy)
- openshift-dns/dns-default + node-resolver
- kube-kindnet/kube-kindnet-ds-*
- kube-system/csi-snapshot-controller
- local-path-storage/local-path-provisioner
Step 5 — GitOps with Flux CD
SSH Key for GitHub
ssh-keygen -t ed25519 -C "fedora-server" -f /root/.ssh/id_ed25519 -N ""
cat /root/.ssh/id_ed25519.pub
# → Add as Deploy Key (with write access) to github.com/youruser/homelab
SCC Bindings for Flux
MicroShift’s SCCs reject Flux’s default fsGroup: 1337. Create bindings before bootstrapping, so Flux pods can start:
cat > /etc/microshift/manifests/flux-scc.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
name: flux-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: flux-source-controller-scc
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:openshift:scc:privileged
subjects:
- kind: ServiceAccount
name: source-controller
namespace: flux-system
---
# Repeat for: kustomize-controller, helm-controller, notification-controller
EOF
systemctl restart microshift
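The "Repeat for" comment in the manifest above can be expanded with a short loop instead of copying the binding by hand. A minimal sketch: `gen_flux_scc_bindings` is a hypothetical helper that prints one ClusterRoleBinding per Flux controller, using the same naming pattern and SCC role as the source-controller example.

```shell
# gen_flux_scc_bindings: prints a ClusterRoleBinding document for each of
# the four Flux controller service accounts, each preceded by "---".
gen_flux_scc_bindings() {
  local sa
  for sa in source-controller kustomize-controller helm-controller notification-controller; do
    cat <<EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flux-${sa}-scc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:privileged
subjects:
  - kind: ServiceAccount
    name: ${sa}
    namespace: flux-system
EOF
  done
}

# Usage: write only the Namespace document to flux-scc.yaml, then
#   gen_flux_scc_bindings >> /etc/microshift/manifests/flux-scc.yaml
```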
Flux Bootstrap
curl -s https://fluxcd.io/install.sh | FLUX_VERSION=2.8.6 bash -s /usr/local/bin
flux bootstrap git \
--url=ssh://git@github.com/youruser/homelab \
--branch=main \
--path=configuration/ \
--private-key-file=/root/.ssh/id_ed25519 \
--silent
Clone the working tree to /data (backed by Btrfs, included in backups):
cd /data && git clone git@github.com:youruser/homelab.git .
Telegram Notifications for Flux Events
kubectl create secret generic telegram-secret -n flux-system \
--from-literal=token="$(awk -F= '/^TOKEN=/{print $2}' /etc/telegramrc)"
Then create a Provider (type: telegram) and Alert resource in the flux-system namespace. Key config: use exclusionList to filter transient retry messages (otherwise every health-check failure sends a notification):
# Alert.spec.exclusionList:
- ".*Reconciliation in progress.*"
- ".*retry budget.*"
Pre-Update Btrfs Snapshot via GitHub Action
Before every push to main, a GitHub Action SSHes to the server and creates a Btrfs snapshot. This gives you a clean rollback point if a Flux reconciliation breaks something.
Restricted SSH key (command-restricted, no PTY):
ssh-keygen -t ed25519 -f /root/.ssh/github-actions -N "" -C "gh-actions-snapshot"
KEY=$(cat /root/.ssh/github-actions.pub)
echo "command=\"/usr/local/bin/pre-flux-snapshot.sh\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty $KEY" \
>> /root/.ssh/authorized_keys
cat > /data/scripts/pre-flux-snapshot.sh <<'EOF'
#!/bin/bash
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
flux suspend ks --all
/usr/local/bin/snap-all "pre-flux-update-$(date +%Y%m%d-%H%M%S)"
flux resume ks --all
EOF
chmod 755 /data/scripts/pre-flux-snapshot.sh
ln -s /data/scripts/pre-flux-snapshot.sh /usr/local/bin/pre-flux-snapshot.sh
Store the private key as FEDORA_SSH_KEY secret in the GitHub repo.
Renovate Bot — Automatic Image Updates
Renovate runs as a GitHub Action every 6 hours and opens PRs for new container image tags. Configuration:
{
"extends": ["config:base"],
"automergeType": "pr",
"prHourlyLimit": 2,
"packageRules": [
{
"matchUpdateTypes": ["patch", "minor"],
"automerge": true,
"minimumReleaseAge": "3 days"
},
{
"matchUpdateTypes": ["major"],
"automerge": false
}
]
}
The Btrfs snapshot GitHub Action runs before Renovate’s auto-merges, so every automated image update has a rollback point.
Step 6 — Wildcard TLS with cert-manager + acme-dns
Architecture
architecture-beta
    group certns(cloud)[cert_manager namespace]
    service cname(internet)[acme challenge CNAME]
    service acmedns(server)[acme_dns port 53] in certns
    service issuer(server)[ClusterIssuer DNS 01] in certns
    service secret(disk)[wildcard tls Secret] in certns
    service sync(server)[sync wildcard tls]
    service appns(disk)[app namespace secrets]
    cname:R --> L:acmedns
    issuer:T --> B:acmedns
    issuer:B --> T:secret
    secret:B --> T:sync
    sync:B --> T:appns
Why acme-dns instead of direct DNS-01? Most domain registrars don’t offer a usable DNS API for cert-manager. acme-dns is a minimal DNS server that only handles TXT records for ACME challenges — you point a CNAME at it once, and cert-manager does the rest via a stable API.
Deploy acme-dns via Flux
Create configuration/acme-dns/ manifests (Deployment, Service, PVC, ConfigMap) and apply via Flux. Key points:
- Port 53 is exposed via a ClusterIP Service with externalIPs: [<YOUR_SERVER_IP>]. kube-proxy creates a single DNAT rule directly on the public IP, so no hostPort is needed, and the RollingUpdate strategy works because the Service endpoint is managed by kube-proxy independently of the pod lifecycle.
- PVC for the SQLite credential database
Register one account per apex domain:
POD=$(kubectl get pod -n acme-dns -l app=acme-dns -o name | head -n 1)
for DOMAIN in example.com example.net; do
    # Each call creates a fresh account; print the domain so you know which response belongs to it
    echo "== $DOMAIN"
    kubectl exec -n acme-dns "$POD" -- curl -s -X POST http://localhost:8081/register \
        -H 'Content-Type: application/json' -d "{}"
    echo
done
# Save the JSON output: it contains username, password, fulldomain per domain
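The register endpoint returns one flat JSON object per call, while cert-manager's acme-dns solver reads a single file keyed by domain. A minimal sketch of assembling that file in plain bash; build_acmedns_json is a hypothetical helper, and the registration bodies in the usage line are placeholders for the real responses:

```shell
#!/bin/bash
# Sketch: merge per-domain registration responses into one domain-keyed JSON map.
# Arguments come in pairs: <domain> <registration-json>.
# The registration JSON is passed through unchanged, so it must already be valid JSON.
build_acmedns_json() {
    local first=1
    printf '{'
    while [ "$#" -ge 2 ]; do
        [ "$first" -eq 1 ] || printf ','
        printf '"%s":%s' "$1" "$2"
        first=0
        shift 2
    done
    printf '}\n'
}

# Intended use (responses saved from the registration loop):
#   build_acmedns_json example.com "$(cat example.com.json)" \
#                      example.net "$(cat example.net.json)" > acmedns.json
```

The result is the file to pass to install and kubectl create secret in the next step.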
Store as a Kubernetes secret:
install -m 600 <json-file> /etc/acme-dns/acmedns.json
kubectl create secret generic acmedns-credentials -n cert-manager \
--from-file=acmedns.json=/etc/acme-dns/acmedns.json
CNAME Records (one-time)
At your registrar, for each apex domain:
_acme-challenge.example.com CNAME <fulldomain-from-json>
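Before requesting a certificate it is worth confirming that the CNAME is visible in public DNS. A small sketch of the comparison, assuming dig is installed; cname_matches is a hypothetical helper that ignores case and the trailing dot dig prints:

```shell
#!/bin/bash
# Sketch: compare a resolved CNAME target with the fulldomain from the registration.
cname_matches() {
    local actual="${1%.}" expected="${2%.}"   # strip one trailing dot, if any
    [ "${actual,,}" = "${expected,,}" ]       # DNS names are case-insensitive
}

# Intended use:
#   cname_matches "$(dig +short CNAME _acme-challenge.example.com)" "<fulldomain-from-json>" \
#       && echo "CNAME ok" || echo "CNAME missing or wrong"
```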
cert-manager and Certificate
Deploy cert-manager via Helm (Flux HelmRelease). The Certificate resource covers all your domains:
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.example.com"
    - example.com
    - "*.example.net"
    - example.net
Sync to App Namespaces
Kubernetes requires TLS secrets to be in the same namespace as the Ingress. Sync the wildcard secret weekly:
cat > /usr/local/bin/sync-wildcard-tls.sh <<'EOF'
#!/bin/bash
set -euo pipefail
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
NAMESPACES=(pihole vaultwarden nextcloud paperless collabora mailstack homepage)
TMP=$(mktemp); trap 'rm -f "$TMP"' EXIT
kubectl get secret wildcard-tls -n cert-manager -o json > "$TMP"
for NS in "${NAMESPACES[@]}"; do
    jq --arg ns "$NS" '
        .metadata.name = ($ns + "-tls")
        | .metadata.namespace = $ns
        | del(.metadata.resourceVersion, .metadata.uid,
              .metadata.creationTimestamp, .metadata.ownerReferences,
              .metadata.managedFields)
    ' "$TMP" | kubectl apply -f -
done
EOF
chmod 755 /usr/local/bin/sync-wildcard-tls.sh
# systemd timer: every Monday 00:00
systemctl enable --now sync-wildcard-tls.timer
/usr/local/bin/sync-wildcard-tls.sh # run immediately
When adding a new service with a TLS ingress: add its namespace to NAMESPACES=(), commit, and run the script once manually.
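The timer unit itself is not shown in this step. A sketch of what the pair of units could look like, written by a hypothetical helper (write_sync_units) that takes the target directory as an argument so it can be tried outside /etc first. The unit names follow the timer enabled above; the Description lines are made up:

```shell
#!/bin/bash
# Sketch: write a oneshot service plus a weekly timer (Monday 00:00) for the sync script.
write_sync_units() {
    local dir="$1"
    cat > "$dir/sync-wildcard-tls.service" <<'UNIT'
[Unit]
Description=Copy wildcard TLS secret to app namespaces

[Service]
Type=oneshot
ExecStart=/usr/local/bin/sync-wildcard-tls.sh
UNIT
    cat > "$dir/sync-wildcard-tls.timer" <<'UNIT'
[Unit]
Description=Weekly wildcard TLS sync

[Timer]
OnCalendar=Mon *-*-* 00:00:00

[Install]
WantedBy=timers.target
UNIT
}

# Intended use on the server:
#   write_sync_units /etc/systemd/system && systemctl daemon-reload
```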
Step 7 — Services
All services follow the same pattern:
- Manifests in configuration/<name>/ + Kustomization CR configuration/<name>-ks.yaml
- Secrets created manually with kubectl create secret (never in git)
- flux reconcile kustomization <name> --with-source
- Verify with kubectl get pods -n <namespace>
Here are the notable per-service details:
Pihole
Runs as the DNS server for all WireGuard clients. Uses hostPort: 53 on the WireGuard interface so clients can point directly to 10.0.0.1 as their DNS server.
# configuration/pihole/deployment.yaml (excerpt)
strategy:
  type: Recreate # prevents Pending state when hostPort: 53 is already bound
containers:
  - name: pihole
    image: docker.io/pihole/pihole:latest
    env:
      - name: FTLCONF_dns_upstreams
        value: "8.8.8.8;8.8.4.4"
      - name: FTLCONF_dns_listeningMode
        value: "ALL"
      - name: FTLCONF_webserver_api_password
        valueFrom:
          secretKeyRef:
            name: pihole-password
            key: FTLCONF_webserver_api_password
    ports:
      - containerPort: 53
        hostPort: 53
        hostIP: 10.0.0.1 # bind to WireGuard interface only
        protocol: UDP
        name: dns-udp
      - containerPort: 53
        hostPort: 53
        hostIP: 10.0.0.1
        protocol: TCP
        name: dns-tcp
      - containerPort: 80
        name: http
    securityContext:
      privileged: true # required for FTL (NET_ADMIN)
    volumeMounts:
      - name: pihole-data
        mountPath: /etc/pihole
      - name: dnsmasq-custom
        mountPath: /etc/dnsmasq.d/99-custom.conf
        subPath: 99-custom.conf
# configuration/pihole/ingress.yaml
metadata:
  annotations:
    route.openshift.io/termination: "edge"
    haproxy.router.openshift.io/ip_whitelist: "10.0.0.0/24 <CORP_IP_RANGE>"
spec:
  tls:
    - hosts: [dnsconfig.example.com]
      secretName: pihole-tls
  rules:
    - host: dnsconfig.example.com
# configuration/pihole/scc.yaml: required for privileged workloads
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pihole-privileged-scc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:privileged
subjects:
  - kind: ServiceAccount
    name: default
    namespace: pihole
After deploying: add local DNS entries in Pihole for all services (so service.example.com resolves to 10.0.0.1 for VPN clients, bypassing public DNS).
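If the local entries are kept in the dnsmasq-custom ConfigMap (mounted at /etc/dnsmasq.d/99-custom.conf in the excerpt above) rather than entered in the web UI, they can be generated instead of typed. A sketch under that assumption; gen_local_dns is a hypothetical helper and the host names are the examples used elsewhere in this guide:

```shell
#!/bin/bash
# Sketch: emit one dnsmasq "address=" line per service host name.
# Note: address=/name/ip also answers for subdomains of name.
VPN_IP="10.0.0.1"

gen_local_dns() {
    local host
    for host in "$@"; do
        printf 'address=/%s/%s\n' "$host" "$VPN_IP"
    done
}

gen_local_dns cloud.example.com dnsconfig.example.com
```

The output would go into the 99-custom.conf key of the ConfigMap; Pihole needs a restart to pick it up, since Flux does not restart pods on ConfigMap changes.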
Vaultwarden
Small footprint: 2 Gi PVC, 100 Mi memory limit. Access restricted to VPN + any additional IP ranges.
# configuration/vaultwarden/deployment.yaml (excerpt)
containers:
  - name: vaultwarden
    image: docker.io/vaultwarden/server:1.35.7
    env:
      - name: WEBSOCKET_ENABLED
        value: "true"
      - name: ROCKET_ADDRESS
        value: "0.0.0.0"
    resources:
      limits:
        memory: 100Mi
    volumeMounts:
      - name: data
        mountPath: /data
strategy:
  type: Recreate # important: only one writer at a time
# configuration/vaultwarden/ingress.yaml (excerpt)
annotations:
  haproxy.router.openshift.io/ip_whitelist: "10.0.0.0/24 <CORP_IP_RANGE>"
  haproxy.router.openshift.io/timeout: "300s"
Admin token: openssl rand -base64 48 | tr -d '\n'
Paperless-NGX
- Four PVCs: data (PostgreSQL), media, consume, export
- Critical for restore: always restore paperless-data and paperless-media from the same point in time; they must stay in sync
Nextcloud + MariaDB
# configuration/nextcloud/deployment.yaml (excerpt)
containers:
  - name: nextcloud
    image: docker.io/nextcloud:33-apache
    env:
      - name: MYSQL_HOST
        value: mariadb
      - name: MYSQL_DATABASE
        value: owncloud
      - name: MYSQL_USER
        value: owncloud
      - name: MYSQL_PASSWORD
        valueFrom:
          secretKeyRef:
            name: nextcloud-secret
            key: MYSQL_PASSWORD
      - name: NEXTCLOUD_TRUSTED_DOMAINS
        value: cloud.example.com
      - name: REDIS_HOST
        value: redis
      - name: PHP_UPLOAD_LIMIT
        value: 10G
      - name: PHP_MEMORY_LIMIT
        value: 1G
      - name: APACHE_DISABLE_REWRITE_IP
        value: "1"
      - name: TRUSTED_PROXIES
        value: 10.42.0.0/16 # pod CIDR, required for HAProxy reverse-proxy headers
    resources:
      limits:
        memory: 2Gi
# configuration/nextcloud/ingress.yaml
annotations:
  haproxy.router.openshift.io/timeout: "300s"
  haproxy.router.openshift.io/proxy-body-size: "10g"
  haproxy.router.openshift.io/hsts_header: max-age=15552000;includeSubDomains;preload
Two CronJobs in the same file — Nextcloud’s background job runner (every 5 min) and automatic app updates (daily at 03:00):
# configuration/nextcloud/cronjob.yaml
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nextcloud-cron
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          securityContext:
            runAsUser: 33 # www-data
            runAsGroup: 33
          containers:
            - name: nextcloud-cron
              image: docker.io/nextcloud:33-apache
              command: ["php", "-f", "/var/www/html/cron.php"]
              volumeMounts:
                - {name: html, mountPath: /var/www/html}
                - {name: data, mountPath: /var/www/html/data}
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nextcloud-app-update
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          securityContext:
            runAsUser: 33
            runAsGroup: 33
          containers:
            - name: nextcloud-app-update
              image: docker.io/nextcloud:33-apache
              command: ["php", "-f", "/var/www/html/occ", "app:update", "--all"]
Mailstack
The most complex service — seven interdependent components:
architecture-beta
    group mailstack(cloud)[mailstack namespace]
    service db(database)[MariaDB] in mailstack
    service pa(server)[PostfixAdmin] in mailstack
    service postfix(server)[Postfix SMTP] in mailstack
    service dovecot(server)[Dovecot IMAP] in mailstack
    service rspamd(server)[Rspamd] in mailstack
    service redis(database)[Redis] in mailstack
    service clamav(server)[ClamAV] in mailstack
    db:T --> B:pa
    db:L --> R:postfix
    db:R --> L:dovecot
    postfix:T -- B:rspamd
    postfix:B --> T:dovecot
    rspamd:L --> R:redis
    rspamd:T --> B:clamav
Postfix: Uses hostNetwork: true so that connections arrive at Postfix with the real client IP — critical for fail2ban to see the actual source address. hostPort alone routes connections through the pod bridge (kindnet), which replaces the source IP with the bridge address (10.42.0.1). Config files (main.cf, master.cf, MySQL maps) are mounted from a ConfigMap. Mail logs are written to a shared PVC (mail-logs) that fail2ban reads from the host.
# configuration/mailstack/postfix-deployment.yaml (excerpt)
strategy:
  type: Recreate
template:
  spec:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    containers:
      - name: postfix
        image: docker.io/boky/postfix:latest
        command: ["/usr/sbin/postfix", "-c", "/etc/postfix", "start-fg"]
        securityContext:
          privileged: true
        volumeMounts:
          - {name: config, mountPath: /etc/postfix/main.cf, subPath: main.cf}
          - {name: tls, mountPath: /etc/ssl/mail, readOnly: true}
          - {name: mail-logs, mountPath: /var/log/mail}
      - name: log-tailer # sidecar: tails postfix.log to stdout for kubectl logs
        image: docker.io/library/alpine:3
        command: ["/bin/sh", "-c", "tail -F /var/log/mail/postfix.log"]
        volumeMounts:
          - {name: mail-logs, mountPath: /var/log/mail}
      - name: logrotate # sidecar: daily rotation of postfix.log
        image: docker.io/library/alpine:3
        command: ["/bin/sh", "-c"]
        args: ["apk add logrotate -q && while true; do logrotate ... || { echo 'ERROR'; exit 1; }; sleep 86400; done"]
        volumeMounts:
          - {name: mail-logs, mountPath: /var/log/mail}
Dovecot 2.4 breaking changes (if migrating from 2.3):
- MySQL connection config can no longer use %{env:VAR} in the mysql{} block; mount the credentials as a Kubernetes Secret file instead
- encrypt = dovecot:SHA512-CRYPT renamed to encrypt = system
- ssl_protocols removed, ssl_min_protocol is the new knob
# configuration/mailstack/dovecot-deployment.yaml (excerpt)
strategy:
  type: Recreate
template:
  spec:
    hostNetwork: true
    dnsPolicy: ClusterFirstWithHostNet
    containers:
      - name: dovecot
        image: docker.io/dovecot/dovecot:latest-root
        ports:
          - {containerPort: 993} # metadata only; hostNetwork binds to host stack directly
        securityContext:
          privileged: true
        volumeMounts:
          - {name: config, mountPath: /etc/dovecot/conf.d/10-auth.conf, subPath: 10-auth.conf}
          - {name: db-secret, mountPath: /etc/dovecot/conf.d/01-db-connection.conf,
             subPath: 01-db-connection.conf, readOnly: true} # MySQL credentials as secret file
          - {name: vmail, mountPath: /home/vmail}
          - {name: tls, mountPath: /etc/dovecot/ssl, readOnly: true}
          - {name: mail-logs, mountPath: /var/log/mail}
      - name: log-tailer # sidecar: tails dovecot.log to stdout for kubectl logs
        image: docker.io/library/alpine:3
        command: ["/bin/sh", "-c", "tail -F /var/log/mail/dovecot.log"]
        volumeMounts:
          - {name: mail-logs, mountPath: /var/log/mail}
Rspamd:
# configuration/mailstack/rspamd-deployment.yaml (excerpt)
metadata:
  annotations:
    prometheus.io/scrape: "true" # Alloy picks this up for Grafana Cloud
    prometheus.io/port: "11334"
    prometheus.io/path: "/metrics"
containers:
  - name: rspamd
    image: docker.io/rspamd/rspamd:4.0.1
    securityContext:
      privileged: true
    ports:
      - {containerPort: 11332, name: milter, hostPort: 11332}
      - {containerPort: 11334, name: controller}
    resources:
      requests:
        memory: 128Mi
      limits:
        memory: 768Mi # raised from 512Mi; startup spike causes OOMKill on node reboot
    livenessProbe:
      httpGet: {path: /ping, port: 11334} # use httpGet, NOT tcpSocket
    readinessProbe:
      httpGet: {path: /ping, port: 11334}
    volumeMounts:
      - {name: config, mountPath: /etc/rspamd/local.d}
      - {name: dkim-keys, mountPath: /etc/rspamd/dkim, readOnly: true}
Key Rspamd config: in local.d/dkim_signing.conf, set try_fallback = false to prevent signing with the wrong key when a domain’s DKIM key is missing. In the controller worker config (local.d/worker-controller.inc), set secure_ip to the pod bridge IP (10.42.0.1) so Grafana Alloy can scrape /metrics without authentication.
ClamAV: Run freshclam as a shell loop sidecar, not as an init container (freshclam takes minutes — init containers block pod startup):
- name: freshclam
  image: docker.io/clamav/clamav:latest
  command: ["/bin/sh", "-c"]
  args: ["while true; do freshclam; sleep 3600; done"]
After deploying, activate fail2ban mail jails (see Step 3, Phase 2 above).
Monitoring (Grafana Cloud)
- Grafana Alloy runs as a DaemonSet with hostNetwork: true and scrapes metrics from the node and Kubernetes API
- Grafana Operator manages GrafanaDashboard CRDs and pushes them to Grafana Cloud
- kube-state-metrics for Kubernetes object metrics; needs a custom SCC for the hostmount-anyuid service account
- Important: The Grafana Operator reconciles dashboards every ~10 minutes and overwrites any UI changes. Always fix dashboards in the YAML, not the UI.
- Disable leader election: Add leaderElect: false to the HelmRelease values. On a single-node cluster there is never a competing operator instance, so leader election only adds unnecessary risk of crashes when the kube-apiserver has brief pauses (e.g. during etcd compaction).
- Loki token scope: needs logs:write (created under Stack → Loki → Send Logs, not under Access Policies)
ClickHouse for Rspamd Analytics (optional)
ClickHouse receives per-message metadata from Rspamd and enables detailed spam/ham analysis in Grafana. Rspamd writes to ClickHouse via the HTTP interface; the Grafana datasource uses a suspended CRD approach with a one-time API call to inject credentials.
Step 8 — Operations & Automation
All maintenance scripts live in /data/scripts/ (Btrfs-backed, included in btrbk backups) with symlinks in /usr/local/bin/. They run as systemd oneshot services with timers:
| Script | Timer | Purpose |
|---|---|---|
| sendtelegram.sh | none (library) | Send Telegram messages; called by other scripts |
| boot-backup.sh | daily + post-dnf5-automatic | Backup /boot, /boot/efi, partition table |
| check-backup-btrfs-subvolumes.sh | daily | Alert if any Btrfs subvolume lacks btrbk or snapper config |
| unban-fail2ban-clients.sh | hourly | Unban VPN client IPs from all fail2ban jails |
| pre-flux-snapshot.sh | SSH trigger (GitHub Action) | Btrfs snapshot before Flux reconcile |
| check-flux-update.sh | weekly | Telegram alert when a new Flux version is available |
| sync-wildcard-tls.sh | Mon 00:00 | Copy wildcard TLS secret to all app namespaces |
| system-check.sh | daily 08:00 | Telegram status: WG peers, pods, mail (24h), fail2ban, disk |
| podman-image-cleanup.sh | Mon 03:00 | Remove dangling + unused Podman images, Telegram report |
| snap-all | none (manual, before risky changes) | Snapshot all 8 snapper configs at once (root home data var_lib_pvc var_lib_microshift var var_log var_lib_containers) |
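snap-all is referenced throughout but never listed. A hypothetical reconstruction, assuming it simply loops over the eight snapper configs named in the table and creates one snapshot per config with a shared description; the author's actual script may differ:

```shell
#!/bin/bash
# Hypothetical sketch of snap-all: one snapper snapshot per config, same description.
CONFIGS=(root home data var_lib_pvc var_lib_microshift var var_log var_lib_containers)

snap_all() {
    local desc="${1:-manual-$(date +%Y%m%d-%H%M%S)}" cfg failed=0
    for cfg in "${CONFIGS[@]}"; do
        if snapper -c "$cfg" create -d "$desc"; then
            echo "OK   $cfg"
        else
            echo "FAIL $cfg" >&2
            failed=1      # keep going so the other configs still get a snapshot
        fi
    done
    return "$failed"
}

# Intended use:
#   snap_all "before-kernel-upgrade"
```

Continuing after a failed config is a design choice for this sketch: a partial set of snapshots before a risky change is more useful than none, and the non-zero return still signals the gap.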
system-check.sh
#!/bin/bash
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
source /etc/telegramrc
send() {
    curl -s -X POST "https://api.telegram.org/bot${TOKEN}/sendMessage" \
        -d chat_id="${CHATID}" -d parse_mode="HTML" \
        --data-urlencode text="$1" > /dev/null
}
# WireGuard peer status
WG_PEERS=$(wg show wg0 | grep -c "latest handshake")
# Count handshakes younger than 2h. wg prints e.g. "1 hour, 5 minutes, 3 seconds ago",
# so years/days/hours must be checked before falling back to minutes and seconds.
WG_RECENT=$(wg show wg0 | awk '/latest handshake/ {
    if ($0 ~ /year|day/) next
    if (match($0, /[0-9]+ hour/)) {
        if (substr($0, RSTART, RLENGTH) + 0 < 2) count++
    } else count++
} END { print count + 0 }')
# Kubernetes pod health
NOT_RUNNING=$(kubectl get pods -A --no-headers 2>/dev/null \
    | grep -v -E "Running|Completed" | wc -l)
NOT_RUNNING_LIST=$(kubectl get pods -A --no-headers 2>/dev/null \
    | grep -v -E "Running|Completed" \
    | awk '{print $1"/"$2" ("$4")"}' | head -10)
TOTAL_PODS=$(kubectl get pods -A --no-headers 2>/dev/null | grep -c "Running")
# Mail stats (last 24h from postfix logs)
# grep -c already prints 0 when nothing matches; "|| true" only masks its exit status
MAIL_IN=$(kubectl logs -n mailstack deploy/mail-postfix --since=24h 2>/dev/null \
    | grep -c "postfix/lmtp.*status=sent" || true)
MAIL_OUT=$(kubectl logs -n mailstack deploy/mail-postfix --since=24h 2>/dev/null \
    | grep -c "postfix/smtp.*status=sent" || true)
MAIL_REJECT=$(kubectl logs -n mailstack deploy/mail-postfix --since=24h 2>/dev/null \
    | grep -c "NOQUEUE: reject" || true)
QUEUE_RAW=$(kubectl exec -n mailstack deploy/mail-postfix -- postqueue -p 2>/dev/null \
    | tail -1)
# "leer" = empty
QUEUE=$(echo "$QUEUE_RAW" | grep -q "empty" && echo "leer" || echo "$QUEUE_RAW")
# fail2ban
BANNED_COUNT=0
for jail in $(fail2ban-client status 2>/dev/null \
    | grep "Jail list:" | sed 's/.*Jail list:\s*//' | tr ', ' '\n' | grep -v '^$'); do
    n=$(fail2ban-client status "$jail" 2>/dev/null \
        | grep "Currently banned:" | awk '{print $NF}')
    BANNED_COUNT=$((BANNED_COUNT + ${n:-0}))
done
DISK_FREE=$(df -h / | awk 'NR==2 {print $4 " von " $2 " frei (" $5 " belegt)"}')
# Build status icons
[ "$WG_RECENT" -ge 1 ] \
    && WG_STATUS="Aktiv: ${WG_RECENT}/${WG_PEERS} Peers (Handshake unter 2h)" \
    && WG_ICON="OK" \
    || { WG_STATUS="Keine aktiven Peers!"; WG_ICON="WARN"; }
# $'\n' inserts a real newline; "\n" inside double quotes would be sent literally
[ "$NOT_RUNNING" -eq 0 ] \
    && K8S_STATUS="${TOTAL_PODS} Pods Running" && K8S_ICON="OK" \
    || { K8S_STATUS="${NOT_RUNNING} Pod(s) nicht Running:"$'\n'"${NOT_RUNNING_LIST}"; K8S_ICON="WARN"; }
[ "$BANNED_COUNT" -gt 0 ] && F2B_ICON="WARN" || F2B_ICON="OK"
send "System-Check $(date '+%d.%m.%Y %H:%M')
WireGuard [${WG_ICON}]
${WG_STATUS}
Kubernetes [${K8S_ICON}]
${K8S_STATUS}
Mail letzte 24h:
Eingehend zugestellt: ${MAIL_IN}
Ausgehend gesendet: ${MAIL_OUT}
Rejects: ${MAIL_REJECT}
Queue: ${QUEUE}
Fail2ban [${F2B_ICON}]
Gebannte Clients: ${BANNED_COUNT}
Speicher:
${DISK_FREE}"
Timer — once daily at 08:00:
# /etc/systemd/system/system-check.timer
[Unit]
Description=System Check täglich 08:00
[Timer]
OnCalendar=*-*-* 08:00:00
Persistent=false
[Install]
WantedBy=timers.target
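The handshake check in system-check.sh parses the human-readable durations printed by wg show. A sketch of an alternative that works on epoch timestamps instead, using the latest-handshakes subcommand of wg show; count_recent_peers is a hypothetical helper and the two-hour default mirrors the threshold used in the script:

```shell
#!/bin/bash
# Sketch: count peers whose last handshake is newer than max_age seconds.
# Reads "<public-key> <epoch-seconds>" lines on stdin, the format printed by
# `wg show wg0 latest-handshakes` (a timestamp of 0 means "never").
count_recent_peers() {
    local now="$1" max_age="${2:-7200}"
    awk -v now="$now" -v max="$max_age" \
        '$2 > 0 && (now - $2) < max { n++ } END { print n + 0 }'
}

# Intended use on the server:
#   WG_RECENT=$(wg show wg0 latest-handshakes | count_recent_peers "$(date +%s)")
```

Passing the current time as an argument keeps the function free of side effects, which makes it easy to try against recorded output.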
sync-wildcard-tls.sh
#!/bin/bash
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
NAMESPACES=(pihole vaultwarden nextcloud paperless collabora mailstack homepage)
for NS in "${NAMESPACES[@]}"; do
    export NS
    kubectl get secret wildcard-tls -n cert-manager -o json | python3 -c "
import json, sys, os
s = json.load(sys.stdin)
ns = os.environ['NS']
s['metadata']['namespace'] = ns
s['metadata']['name'] = ns + '-tls'
for k in ['resourceVersion','uid','creationTimestamp','managedFields',
          'annotations','ownerReferences','labels']:
    s['metadata'].pop(k, None)
print(json.dumps(s))
" | kubectl apply -f -
    echo "Synced wildcard-tls -> ${NS}/${NS}-tls"
done
pre-flux-snapshot.sh
#!/bin/bash
export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
DESCRIPTION="vor-flux-update-$(date +%Y%m%d-%H%M%S)"
flux suspend ks --all -n flux-system 2>/dev/null
snap-all "$DESCRIPTION"
flux resume ks --all -n flux-system 2>/dev/null
echo "Snapshots erstellt: $DESCRIPTION"
check-backup-btrfs-subvolumes.sh
Verifies all Btrfs subvolumes are tracked in both btrbk.conf and snapper. Reports gaps via Telegram (triggered by systemd OnFailure):
#!/bin/bash
set -euo pipefail
BTRBK_CONF=/etc/btrbk/btrbk.conf
BTRFS_TOP=/mnt/btrfs-top
EXCLUDED=(btrbk_snapshots var_cache var_tmp var_lib_kubelet)
EXCLUDED_PATTERNS=("root-snap-pre-*")
MOUNTED_HERE=0
if ! mountpoint -q "$BTRFS_TOP" 2>/dev/null; then
    mount "$BTRFS_TOP"; MOUNTED_HERE=1
fi
trap '[[ $MOUNTED_HERE -eq 1 ]] && umount "$BTRFS_TOP" 2>/dev/null || true' EXIT
mapfile -t CURRENT_SUBVOLS < <(
    btrfs subvolume list "$BTRFS_TOP" \
        | awk '$7 == 5 && $NF !~ /\// { print $NF }' | sort)
mapfile -t CONFIGURED < <(
    grep -E '^\s+subvolume\s+' "$BTRBK_CONF" | awk '{print $2}' | sort)
is_excluded() {
    local sv="$1"
    for ex in "${EXCLUDED[@]}"; do [[ "$sv" == "$ex" ]] && return 0; done
    for pat in "${EXCLUDED_PATTERNS[@]}"; do [[ "$sv" == $pat ]] && return 0; done
    return 1
}
MISSING_BTRBK=()
MISSING_SNAPPER=()
for sv in "${CURRENT_SUBVOLS[@]}"; do
    is_excluded "$sv" && continue
    configured=0
    for conf in "${CONFIGURED[@]}"; do [[ "$sv" == "$conf" ]] && configured=1 && break; done
    [[ $configured -eq 0 ]] && MISSING_BTRBK+=("$sv")
    [[ -f "/etc/snapper/configs/$sv" ]] || MISSING_SNAPPER+=("$sv")
done
OVERALL_EXIT=0
[[ ${#MISSING_BTRBK[@]} -gt 0 ]] && { echo "WARN btrbk: ${MISSING_BTRBK[*]}"; OVERALL_EXIT=1; }
[[ ${#MISSING_SNAPPER[@]} -gt 0 ]] && { echo "WARN snapper: ${MISSING_SNAPPER[*]}"; OVERALL_EXIT=1; }
[[ $OVERALL_EXIT -eq 0 ]] && echo "OK: Alle Subvolumes in btrbk und snapper konfiguriert."
exit $OVERALL_EXIT
Output: daily system-check message
System-Check 19.04.2026 08:00
WireGuard [OK]
Aktiv: 10/12 Peers (Handshake unter 2h)
Kubernetes [OK]
45 Pods Running
Mail letzte 24h:
Eingehend zugestellt: 31
Ausgehend gesendet: 22
Rejects: 3
Queue: leer
Fail2ban [OK]
Gebannte Clients: 2
Speicher:
290G von 610G frei (52% belegt)
Step 9 — Backup & Disaster Recovery
Three independent layers protect the system. Each layer has a distinct job and they compose cleanly:
| Layer | Tool | Scope | Where | Retention | Use case |
|---|---|---|---|---|---|
| Local timeline snapshots | snapper | All 8 subvolumes | On-disk .snapshots/ | 24h hourly · 8d daily · 5w weekly | Instant rollback without unmounting |
| Local btrbk snapshots | btrbk | All 8 subvolumes | btrbk_snapshots/ | Latest only (parent reference) | Basis for incremental send to Pi |
| Remote backup | btrbk → Pi | All 8 subvolumes | Raspberry Pi via WireGuard | 24h hourly · 8d daily · 5w weekly | Full recovery after VM loss |
snapper handles local rollback; btrbk handles the remote copy. Local btrbk snapshots are kept minimal — just one per subvolume as the parent reference so the next send to the Pi remains incremental instead of a full transfer.
What is Backed Up
| Subvolume | Mountpoint | snapper | btrbk → Pi | Contains |
|---|---|---|---|---|
| root | / | ✅ | ✅ | System, packages, /etc |
| home | /home | ✅ | ✅ | User home directories |
| data | /data | ✅ | ✅ | GitOps repo, scripts |
| var_lib_pvc | /var/lib/pvc | ✅ | ✅ | Kubernetes PVC data (app data) |
| var_lib_microshift | /var/lib/microshift | ✅ | ✅ | Cluster state, kubeconfig |
| var | /var | ✅ | ✅ | System state, boot backup |
| var_log | /var/log | ✅ | ✅ | Logs |
| var_lib_containers | /var/lib/containers | ✅ | ✅ | Container images |
Not backed up intentionally: var_cache, var_tmp, var_lib_kubelet — ephemeral or rebuildable.
Pre-Update Snapshots
Before every GitOps update, a GitHub Action SSHs into the server and runs:
flux suspend ks --all -n flux-system
snap-all "vor-flux-update-$(date +%Y%m%d-%H%M%S)"
flux resume ks --all -n flux-system
snap-all creates one snapshot in each of the 8 snapper configs, all with the same description (the snapshot numbers can differ per config). If a Flux reconcile breaks something, you can roll back any or all subvolumes to the pre-update state within seconds. No mount is required, just snapper -c <cfg> rollback <nr>.
Monitoring
check-backup-btrfs-subvolumes.sh runs daily via systemd timer. It verifies that every non-excluded Btrfs subvolume has both a btrbk entry and a snapper config. On failure, OnFailure=check-btrbk-notify.service sends a Telegram alert.
check-backup-btrfs-subvolumes.sh
# → OK: Alle Subvolumes in btrbk und snapper konfiguriert.
Scenario A — Rollback (VM still running)
A1. /etc rollback via etckeeper
etckeeper commits every change to /etc as a git commit. Single-file recovery is a one-liner:
# See recent commits
etckeeper vcs log --oneline -10
# Restore a single file
etckeeper vcs checkout HEAD~1 -- /etc/microshift/config.yaml
A2. Root rollback via snapper
The fastest path for a broken system update is the GRUB menu. grub-btrfs adds every snapper snapshot automatically:
1. reboot
2. In the GRUB menu: "Fedora Linux snapshots"
3. Select the snapshot (e.g. "vor-flux-update-20260515-093000")
4. System boots read-only into the snapshot
5. If it works — make the rollback permanent:
snapper rollback # sets snapshot as new default subvolume
reboot # boots into the restored, writable system
Without rebooting into the snapshot first:
snapper -c root list
snapper -c root rollback 116 # rolls back root subvolume
reboot
A3. App data rollback from snapper
snapper snapshots are directly accessible as read-only directories — no mounting required:
# For any subvolume, the snapshots are here:
# /var/lib/pvc/.snapshots/<nr>/snapshot/
# /home/.snapshots/<nr>/snapshot/
# etc.
# Example: restore a PVC directory from snapshot 42
snapper -c var_lib_pvc list
kubectl scale deploy vaultwarden -n vaultwarden --replicas=0
cp -a /var/lib/pvc/.snapshots/42/snapshot/<pvc-dir>/. /var/lib/pvc/<pvc-dir>/
kubectl scale deploy vaultwarden -n vaultwarden --replicas=1
For apps with multiple PVCs (Nextcloud, Paperless), always restore all related PVCs from the same snapshot in one operation; otherwise the database and the files it references end up out of sync.
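A sketch of restoring several PVC directories from one snapshot number in a single step. restore_pvcs is a hypothetical helper; SNAP_ROOT and PVC_ROOT default to the paths used above and are only variables so the function can be tried against a scratch directory. Like the cp -a command above, it overlays files and does not delete files created after the snapshot:

```shell
#!/bin/bash
# Sketch: copy several PVC directories back from ONE snapper snapshot.
SNAP_ROOT="${SNAP_ROOT:-/var/lib/pvc/.snapshots}"
PVC_ROOT="${PVC_ROOT:-/var/lib/pvc}"

restore_pvcs() {
    local nr="$1" dir
    shift
    # Check every source first, so a missing directory aborts before anything is copied
    for dir in "$@"; do
        if [ ! -d "$SNAP_ROOT/$nr/snapshot/$dir" ]; then
            echo "missing in snapshot $nr: $dir" >&2
            return 1
        fi
    done
    for dir in "$@"; do
        mkdir -p "$PVC_ROOT/$dir"
        cp -a "$SNAP_ROOT/$nr/snapshot/$dir/." "$PVC_ROOT/$dir/"
    done
}

# Intended use, with the workloads scaled to 0 first:
#   restore_pvcs 42 <pvc-dir-1> <pvc-dir-2>
```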
A4. Individual subvolume from btrbk snapshot
If the snapper snapshot is too recent or you need a specific point in the btrbk window:
mount /mnt/btrfs-top
btrbk list snapshots # find the right snapshot name
systemctl stop microshift
# Save current state
btrfs subvolume snapshot /mnt/btrfs-top/var_lib_pvc \
/mnt/btrfs-top/btrbk_snapshots/var_lib_pvc.recovery-backup
# Replace with snapshot
umount /var/lib/pvc
btrfs subvolume delete /mnt/btrfs-top/var_lib_pvc
btrfs subvolume snapshot \
/mnt/btrfs-top/btrbk_snapshots/var_lib_pvc.20260515T0200 \
/mnt/btrfs-top/var_lib_pvc
mount /var/lib/pvc
systemctl start microshift
umount /mnt/btrfs-top
Scenario B — Full VM Loss (restore from Pi)
B1. Provision new VM, boot Fedora LiveISO
dnf install -y btrfs-progs rsync gdisk
B2. Restore partition table
# Fetch partition table from Pi backup
scp mko@10.0.0.12:/backup/btrfs/server/var/<snapshot>/lib/boot-backup/partition-table.sfdisk /tmp/
# Restore (same disk size)
sfdisk /dev/vda < /tmp/partition-table.sfdisk
B3. Create filesystems with original UUIDs
mkfs.vfat -F32 /dev/vda1
mkfs.xfs -f /dev/vda2
xfs_admin -U bafddb4f-cda4-4aae-9d4b-b75da20e3680 /dev/vda2
mkfs.btrfs -f \
-L "fedora_v220241250910305995" \
-U 9017b7d7-c4e6-45a3-9c04-48f25cd640fd \
/dev/vda3
The UUIDs must match because they are hardcoded in /etc/fstab and the GRUB config.
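If the UUIDs were not written down beforehand, they can be read from the fstab inside the root snapshot on the Pi. A sketch, assuming fstab references filesystems as UUID=...; fstab_uuids is a hypothetical helper and the snapshot path in the usage line follows the layout used in B2:

```shell
#!/bin/bash
# Sketch: list the distinct "<uuid> <fstype>" pairs an fstab file expects.
# Btrfs subvolumes share one UUID, so they collapse into a single line.
fstab_uuids() {
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1, $3 }' "$1" | sort -u
}

# Intended use from the LiveISO:
#   scp mko@10.0.0.12:/backup/btrfs/server/root/<snapshot>/etc/fstab /tmp/fstab
#   fstab_uuids /tmp/fstab
```

If the EFI partition shows up in that list, its volume ID has to be set as well; mkfs.vfat takes it via -i as a 32-bit hex number without the dash.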
B4. Mount top-level, receive subvolumes from Pi
mkdir -p /mnt/restore
mount -o subvolid=5 /dev/vda3 /mnt/restore
for sv in root var home var_log var_lib_microshift var_lib_pvc var_lib_containers data; do
    SNAP=$(ssh mko@10.0.0.12 "ls /backup/btrfs/server/${sv}/ | sort | tail -1")
    ssh mko@10.0.0.12 "btrfs send /backup/btrfs/server/${sv}/${SNAP}" \
        | btrfs receive /mnt/restore/
    # Convert read-only snapshot to writable subvolume
    btrfs subvolume snapshot "/mnt/restore/${SNAP}" "/mnt/restore/${sv}_new"
    btrfs subvolume delete "/mnt/restore/${SNAP}"
    mv "/mnt/restore/${sv}_new" "/mnt/restore/${sv}"
done
B5. Restore /boot and reinstall GRUB
mkdir -p /mnt/boot /mnt/efi /mnt/sysroot
mount /dev/vda2 /mnt/boot
mount /dev/vda1 /mnt/efi
rsync -aAX /mnt/restore/var/lib/boot-backup/boot/ /mnt/boot/
rsync -aAX /mnt/restore/var/lib/boot-backup/efi/ /mnt/efi/
mount -o subvol=root /dev/vda3 /mnt/sysroot
# ... mount remaining subvolumes ...
for d in dev dev/pts proc sys run; do mount --bind /$d /mnt/sysroot/$d; done
chroot /mnt/sysroot grub2-install /dev/vda
chroot /mnt/sysroot grub2-mkconfig -o /boot/grub2/grub.cfg
touch /mnt/sysroot/.autorelabel # trigger SELinux relabel on first boot
umount -R /mnt/sysroot
reboot
After reboot, SELinux relabels all files (a few minutes), then a second reboot brings the system fully up.
Recovery point
The Pi holds 24 hourly + 8 daily + 5 weekly snapshots per subvolume. You can recover to any point within that five-week window by selecting an older snapshot in step B4.
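Selecting an older snapshot means replacing the "sort | tail -1" in B4 with a cutoff. A sketch of that selection, assuming snapshot names of the form <subvolume>.<timestamp> as in the A4 example (var_lib_pvc.20260515T0200), where timestamps sort lexicographically; pick_snapshot is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: from snapshot names on stdin, print the newest one whose timestamp
# (the part after the last dot) is not later than the given cutoff.
pick_snapshot() {
    local cutoff="$1"
    LC_ALL=C sort | awk -F. -v cutoff="$cutoff" '
        { ts = $NF }
        (ts "") <= (cutoff "") { best = $0 }   # force string comparison
        END { if (best != "") print best }'
}

# Intended use inside the B4 loop:
#   SNAP=$(ssh mko@10.0.0.12 "ls /backup/btrfs/server/${sv}/" | pick_snapshot 20260514T1200)
```

Using the same cutoff for every subvolume keeps the restored subvolumes as close to one point in time as the backup schedule allows.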
Key Lessons Learned
Btrfs quotas kill etcd. Don’t enable them. The combination of hundreds of snapshots + hourly btrbk cleanup + qgroup rescans will crash MicroShift reliably every hour. See Step 2 for full explanation.
MicroShift SCCs are opt-in. Every service account that runs a privileged workload needs an explicit SCC binding. Don’t assume privileged is inherited. Check with kubectl auth can-i use scc/privileged --as=system:serviceaccount:<ns>:<sa>.
nginx in OpenShift must run as non-root. The standard nginx:alpine image listens on port 80 and requires root. Use nginxinc/nginx-unprivileged:alpine on port 8080 instead. MicroShift’s HAProxy router handles TLS termination and proxies to port 8080 with the original Host header.
kindnet’s POD_SUBNET must match MicroShift’s pod CIDR. The default kindnet value is 10.244.0.0/16. MicroShift uses 10.42.0.0/16. Mismatch causes subtle masquerade failures that only appear with IP-whitelisted applications. MicroShift 4.21+ ships internal kindnet manifests that deploy with the wrong default regardless of cniPlugin: none — fix via kustomizePaths in /etc/microshift/config.d/ to exclude /usr/lib/microshift/manifests.d/000-microshift-kindnet/.
Use hostNetwork: true for mail ports, not hostPort. hostPort alone routes new connections through the pod network bridge (kindnet). The source IP arriving at Postfix or Dovecot becomes the bridge address (10.42.0.1), not the real client IP — fail2ban is blind to the attacker. hostNetwork: true connects the pod directly to the host network stack and preserves source IPs end-to-end. Set dnsPolicy: ClusterFirstWithHostNet so cluster-internal DNS (maildb, rspamd) still resolves correctly.
firewalld inter-zone forwarding requires policies. You cannot use --add-forward-port or zone rules to forward traffic between zones. Create explicit Policy objects with ingress/egress zones.
Dovecot 2.4 is a breaking upgrade from 2.3. Dozens of config options were renamed, removed, or changed defaults. Budget significant time for migration testing. Key gotcha: MySQL passwords in mysql{} config blocks cannot use %{env:VAR} syntax — pass them as mounted secret files.
GitOps changes don’t restart pods. Flux applies ConfigMap changes but doesn’t trigger pod restarts. If your application doesn’t watch for config file changes, restart manually with kubectl rollout restart deployment/<name> -n <ns>.
Repository Structure
configuration/
├── kustomization.yaml ← Flux entry point
├── flux-system/ ← Flux own manifests
├── acme-dns/ ← acme-dns service
├── cert-manager/ ← cert-manager + ClusterIssuer + Certificate
├── pihole/ ← DNS server
├── vaultwarden/ ← Password manager
├── paperless/ ← Document management
├── nextcloud/ ← File sync + MariaDB
├── collabora/ ← Online Office
├── mailstack/ ← Postfix + Dovecot + Rspamd + ClamAV + Redis + MariaDB + PostfixAdmin
├── monitoring/ ← Alloy + kube-state-metrics
├── monitoring-grafana/ ← Grafana Operator + dashboards
└── homepage/ ← Static website