High Availability K3s Cluster Setup Guide
Overview
Info — Guide Information
- Difficulty: Advanced
- Time Required: ~1 hour
- Last Updated: March 2024
- K3s Version: v1.33.0+k3s1
- Longhorn Version: v1.6.0
Architecture Overview
graph TB
    subgraph "Master Node"
        A[Master Node<br/>etcd + K3s Server]
    end
    subgraph "Worker Nodes"
        B[Worker Node 1<br/>K3s Agent]
        C[Worker Node 2<br/>K3s Agent]
        D[Worker Node 3<br/>K3s Agent]
    end
    subgraph "Storage Layer"
        E[Longhorn<br/>Distributed Block Storage]
    end
    A --> B
    A --> C
    A --> D
    B --> E
    C --> E
    D --> E
    style A fill:#f9f,stroke:#333
    style E fill:#bbf,stroke:#333
Prerequisites
System Requirements
Master Node
- 2 CPU cores
- 4GB RAM
- 40GB disk space
- Ubuntu 22.04 LTS
- Static IP address
Worker Nodes
- 2 CPU cores
- 8GB RAM
- 100GB disk space
- Ubuntu 22.04 LTS
- Static IP addresses
Network Requirements
- Unrestricted connectivity between nodes
- Internet access for package installation
- Firewall ports open:
- TCP/6443 (K3s API)
- TCP/2379-2380 (etcd)
- UDP/8472 (VXLAN)
- TCP/10250 (kubelet)
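On hosts running UFW, the port list above translates to rules like the following sketch. `open_ports` is a small helper written for this guide (not a ufw feature); by default it only prints the commands, and you pass `sudo` as the runner to actually apply them:

```shell
#!/usr/bin/env bash
# Sketch of UFW rules for the required ports; assumes ufw is installed.
set -euo pipefail

open_ports() {
  local runner="${1:-echo}"         # default: just print the commands
  $runner ufw allow 6443/tcp        # K3s API server
  $runner ufw allow 2379:2380/tcp   # etcd client/peer traffic
  $runner ufw allow 8472/udp        # flannel VXLAN overlay
  $runner ufw allow 10250/tcp       # kubelet
}

open_ports   # dry run; use `open_ports sudo` to apply for real
```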
Server Preparation
Warning — Important
Execute these steps on ALL nodes unless specified otherwise.
System Updates
# Update package list and upgrade system
sudo apt update && sudo apt upgrade -y
# Install required packages
sudo apt install -y \
curl \
gnupg \
nfs-common \
open-iscsi \
jq \
logrotate
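A quick way to confirm the install succeeded is to check that the key binaries landed on PATH (`iscsiadm` is provided by open-iscsi, which Longhorn needs):

```shell
#!/usr/bin/env bash
# Confirm the important binaries from the package list above are on PATH.
set -euo pipefail

for bin in curl jq iscsiadm; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: ok"
  else
    echo "$bin: MISSING"
  fi
done
```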
Configure Log Rotation
Create a logrotate configuration for K3s:
File: /etc/logrotate.d/k3s
/var/log/k3s/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 root root
postrotate
systemctl restart k3s 2>/dev/null || true
systemctl restart k3s-agent 2>/dev/null || true
endscript
}
System Configuration
# Enable and start open-iscsi for Longhorn
sudo systemctl enable --now iscsid
# Configure sysctl settings
cat << EOF | sudo tee /etc/sysctl.d/k3s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.max_map_count = 262144
EOF
sudo sysctl --system
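Before reloading, you can verify the drop-in contains every key this guide relies on. `check_sysctl_conf` is a grep-based helper written for this guide (not a system tool), and the temp file below stands in for /etc/sysctl.d/k3s.conf so the example is self-contained:

```shell
#!/usr/bin/env bash
# Verify a sysctl drop-in file defines each expected key.
set -euo pipefail

check_sysctl_conf() {
  local file="$1"; shift
  local key ok=0
  for key in "$@"; do
    grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" \
      || { echo "missing: $key"; ok=1; }
  done
  return "$ok"
}

# Recreate the drop-in from the step above in a temp file for demonstration;
# on a real node, point check_sysctl_conf at /etc/sysctl.d/k3s.conf instead.
conf="$(mktemp)"
printf '%s\n' \
  'net.bridge.bridge-nf-call-iptables = 1' \
  'net.ipv4.ip_forward = 1' \
  'vm.max_map_count = 262144' > "$conf"

check_sysctl_conf "$conf" \
  net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward vm.max_map_count \
  && echo "all keys present"
```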
K3s Installation
Choose your datastore (important)
K3s supports two common backends:
- Default (single server): SQLite — simple, great for labs; no etcd health to check.
- High Availability: Embedded etcd — requires multiple server nodes; etcd health applies.
The commands below show both paths. Pick one and follow it consistently.
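The reason HA needs multiple servers: embedded etcd stays writable only while a majority (quorum) of members is online, so an n-member cluster tolerates floor((n-1)/2) failures. A quick illustration of the arithmetic:

```shell
#!/usr/bin/env bash
# etcd quorum math: an n-member cluster survives floor((n-1)/2) member failures.
# Note that two servers are no better than one for availability.
set -euo pipefail

fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
  echo "servers=$n tolerates=$(fault_tolerance "$n")"
done
```

This is why production HA clusters use an odd number of servers, typically three or five.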
Master Node Setup
Danger — Master Node Only
Run these commands ONLY on the master node.
# Download K3s installation script
curl -sfL https://get.k3s.io > k3s-install.sh
# Install K3s server with HA etcd and VXLAN
sudo INSTALL_K3S_VERSION="v1.33.0+k3s1" bash k3s-install.sh server \
--cluster-init \
--flannel-backend=vxlan \
--disable traefik \
--disable servicelb \
--disable local-storage \
--tls-san $(hostname -f) \
--write-kubeconfig-mode 644
# Get node token for workers
sudo cat /var/lib/rancher/k3s/server/node-token
Note: To use the default SQLite datastore instead of embedded etcd, run the same install command without --cluster-init. If you choose SQLite, skip the etcd health checks below.
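Before pasting the token into worker installs, a quick format check can catch copy/paste truncation. The layout assumed here ("K10" prefix, CA hash, credential name, password) is an implementation detail that may change between releases, so treat this as a sanity check only:

```shell
#!/usr/bin/env bash
# Sanity-check a K3s node token for the assumed K10<hash>::<name>:<password> layout.
set -euo pipefail

valid_token() {
  printf '%s' "$1" | grep -Eq '^K10[0-9a-f]+::[a-z]+:[0-9a-f]+$'
}

# Fabricated example token for illustration; on a real cluster use the
# contents of /var/lib/rancher/k3s/server/node-token from the master.
token="K10abc123def456::server:0123456789abcdef"

if valid_token "$token"; then
  echo "token format looks valid"
else
  echo "token format unexpected"
fi
```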
Worker Nodes Setup
Info — Worker Nodes Only
Replace MASTER_IP and NODE_TOKEN with your actual values.
# Download K3s installation script
curl -sfL https://get.k3s.io > k3s-install.sh
# Install K3s agent
sudo INSTALL_K3S_VERSION="v1.33.0+k3s1" K3S_URL="https://MASTER_IP:6443" \
K3S_TOKEN="NODE_TOKEN" bash k3s-install.sh agent
Verify Cluster Status
Check Nodes
kubectl get nodes -o wide
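When scripting the join process, a small awk helper (written for this guide, not a kubectl feature) makes it easy to wait until all workers report Ready:

```shell
#!/usr/bin/env bash
# Count Ready nodes from `kubectl get nodes` output; useful in wait loops.
set -euo pipefail

count_ready() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Canned sample output for demonstration; on a live cluster, pipe instead:
#   kubectl get nodes | count_ready
sample='NAME      STATUS     ROLES                       AGE   VERSION
master    Ready      control-plane,etcd,master   10m   v1.33.0+k3s1
worker1   Ready      <none>                      8m    v1.33.0+k3s1
worker2   NotReady   <none>                      1m    v1.33.0+k3s1'

printf '%s\n' "$sample" | count_ready   # prints 2
```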
Check Pods
kubectl get pods -A
Check etcd Health (only if you chose embedded etcd)
# K3s runs etcd embedded in the k3s server process, so there are no etcd pods
# in kube-system. Query the API server's etcd health check instead:
kubectl get --raw /readyz/etcd
# Expected output: ok
# List the nodes carrying the etcd role:
kubectl get nodes -l node-role.kubernetes.io/etcd=true
Longhorn Installation
Prerequisites Check
Run this on all nodes to verify Longhorn requirements:
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/scripts/environment_check.sh | bash
Install Longhorn
Note — Installation Steps
Execute these commands on the master node.
# Add Longhorn Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update
# Install Longhorn
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--version 1.6.0 \
--set defaultSettings.defaultDataPath="/var/lib/longhorn" \
--set defaultSettings.guaranteedInstanceManagerCPU=5
Verify Longhorn Installation
Check Pods
kubectl -n longhorn-system get pods
Check StorageClass
kubectl get sc
Access Dashboard
# Port forward Longhorn UI
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8000:80
Access via: http://localhost:8000
Troubleshooting
Common Issues
Bug — Known Problems and Solutions
Node Not Ready
- Check K3s service status:
  sudo systemctl status k3s
- View K3s logs:
  sudo journalctl -u k3s
etcd Issues
- List available etcd snapshots:
  sudo k3s etcd-snapshot ls
- Check etcd health (etcd runs inside the k3s process, so there is no etcd pod to exec into):
  kubectl get --raw /healthz/etcd
Longhorn Volume Issues
- Check volume status:
  kubectl -n longhorn-system get volumes
- View Longhorn manager logs:
  kubectl -n longhorn-system logs -l app=longhorn-manager
Maintenance
Backup Procedures
etcd Backup
# Create etcd snapshot
sudo k3s etcd-snapshot save --name etcd-backup-$(date +%Y%m%d)
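For unattended backups (e.g. from cron), the dated name above can be produced by a wrapper script; `snapshot_name` is a tiny helper written for this guide:

```shell
#!/usr/bin/env bash
# Build the dated snapshot name used above, ready for cron or a wrapper script.
set -euo pipefail

snapshot_name() {
  printf 'etcd-backup-%s' "$(date +%Y%m%d)"
}

name="$(snapshot_name)"
echo "$name"
# On a server node you would then run (requires root):
#   k3s etcd-snapshot save --name "$name"
```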
Longhorn Backup
# Configure the backup target (Longhorn 1.6 serves this CRD as v1beta2)
kubectl -n longhorn-system apply -f - <<EOF
apiVersion: longhorn.io/v1beta2
kind: BackupTarget
metadata:
  name: default
spec:
  backupTargetURL: s3://your-bucket-name@region/path
  credentialSecret: aws-secret
EOF
Monitoring Setup
Tip — Monitoring Stack
Consider installing:
- Prometheus for metrics collection
- Grafana for visualization
- Alertmanager for notifications
Security Recommendations
Note — NetworkPolicies with flannel (vxlan)
The default K3s CNI here is flannel with vxlan. Flannel itself does not enforce Kubernetes NetworkPolicy resources, but K3s ships an embedded network policy controller (based on kube-router) that enforces standard NetworkPolicy alongside flannel, unless the server was started with --disable-network-policy. For more advanced policy features (L7 rules, cluster-wide policies), replace flannel with a policy-capable CNI such as Cilium or Calico (or Canal: flannel for routing + Calico for policy).
If using a policy‑capable CNI, a common baseline is a default‑deny policy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Optional: to switch from flannel to a policy‑capable CNI, install K3s with flannel disabled and then deploy your CNI of choice:
# Install K3s server without flannel (choose your other flags as needed)
sudo INSTALL_K3S_VERSION="v1.33.0+k3s1" bash k3s-install.sh server \
--flannel-backend=none \
--disable traefik --disable servicelb --disable local-storage
# Example: install Cilium (version pinned as appropriate)
helm repo add cilium https://helm.cilium.io
helm repo update
helm upgrade --install cilium cilium/cilium \
--namespace kube-system \
--set kubeProxyReplacement=false   # Cilium 1.14+ takes true/false; older releases used "disabled"
- Pod security: deny privileged containers (Gatekeeper example; requires OPA Gatekeeper and its K8sPSPPrivilegedContainer constraint template to be installed)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: no-privileged-containers
spec:
  enforcementAction: deny
Quote — Need Help?
If you encounter any issues:
- Check the K3s documentation: https://docs.k3s.io/
- Visit the Longhorn documentation: https://longhorn.io/docs/
- Join the K3s Slack channel: https://slack.rancher.io/