Help:User journeys/Canasta multi-node on a generic VPS with k3s

Journey info  ·  Platform: Generic VPS / self-managed k3s  ·  Time: ~2 hours wall-clock

This journey stands up a multi-node Canasta wiki on generic VPS hosting: Hetzner Cloud, DigitalOcean, Linode, Vultr, Hostinger, or any provider that rents you plain Linux virtual machines. It is the companion to Help:User journeys/Canasta multi-node on AWS EC2 with k3s. The cluster build, shared storage, and wiki creation are identical on any provider, so this page does not repeat them in full; it focuses on what actually differs once you leave AWS, namely networking (there are no security groups, and inter-instance traffic may be blocked by default). For the conceptual material on multi-node Kubernetes, see Help:Multi-node Kubernetes. This is one concrete path, not the only way.

Prerequisites

  • Two VPS instances from your provider, each with at least 4 GiB RAM, a 20 GB disk, and Debian 13 or Ubuntu 24.04 (amd64). For Elasticsearch or heavy traffic, use 8 GiB+.
  • An operator workstation with ssh and the Canasta CLI installed. The k3s install steps need canasta-native mode (the installer needs systemd on the host and can't run from inside the canasta-docker container); see Help:Installation.
  • A domain name under your DNS control. Let's Encrypt issues real certificates once DNS resolves; port 80 must be reachable from the internet for the HTTP-01 challenge.
  • Costs: two VPS instances bill while running; check your provider's pricing. Production sizing scales with worker count.
  • Prior knowledge: basic Linux, SSH, and your provider's console. K3s is installed for you by Canasta.

Read Help:Multi-node Kubernetes first for the storage requirements, HA caveats, and scaling model.

Phase 1: Provision the VPS instances and open the network

Create two Linux VPS instances in your provider's console. Record the public IP of each:

  • NODE1_IP: control plane
  • NODE2_IP: worker

Everything from Phase 2 on is provider-agnostic. The provider-specific work is networking, and on a generic VPS there are two separate layers to get right: the host firewall, and whether the provider lets your instances talk to each other at all.

Host firewall

There are no AWS-style security groups; each host has its own firewall (often ufw or firewalld; on many providers nothing is enabled by default, so everything is open). Open these ports:

From the internet (both nodes):

  • 22/TCP: SSH (ideally restricted to your workstation's IP)
  • 80/TCP: Let's Encrypt HTTP-01 challenge
  • 443/TCP: HTTPS

Between the two nodes (control plane ↔ worker; this is the part that has no security-group shortcut):

  • 6443/TCP: k3s API server
  • 8472/UDP: flannel VXLAN (cross-node pod networking; allowing only TCP will silently break cross-node DNS and ClusterIP services)
  • 10250/TCP: kubelet (metrics, kubectl logs/exec)

If you enable a host firewall, you must add the inter-node rules or the cluster's pod network breaks. With ufw, on each node (substitute the other node's IP for <PEER_IP>):

sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from <PEER_IP> to any port 6443 proto tcp
sudo ufw allow from <PEER_IP> to any port 8472 proto udp
sudo ufw allow from <PEER_IP> to any port 10250 proto tcp
sudo ufw enable
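
For hosts that run firewalld rather than ufw, the same rule set could be expressed roughly as follows. This is a sketch, not part of Canasta: the helper name print_firewalld_rules is hypothetical, and it assumes the default firewalld zone and an IPv4 peer address. It only prints the commands so you can review them before running any of them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints firewalld commands equivalent to the ufw rules above.
# Assumes the default zone and an IPv4 peer address; it executes nothing itself.
print_firewalld_rules() {
  local peer_ip="$1" svc rule port proto
  # Public services: SSH, HTTP (Let's Encrypt HTTP-01), HTTPS.
  for svc in ssh http https; do
    echo "sudo firewall-cmd --permanent --add-service=${svc}"
  done
  # Inter-node ports, limited to the peer: k3s API, flannel VXLAN, kubelet.
  for rule in 6443/tcp 8472/udp 10250/tcp; do
    port="${rule%/*}"
    proto="${rule#*/}"
    echo "sudo firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"${peer_ip}\" port port=\"${port}\" protocol=\"${proto}\" accept'"
  done
  # Permanent rules take effect only after a reload.
  echo "sudo firewall-cmd --reload"
}

# Usage: print_firewalld_rules <PEER_IP>
```

Run it once per node with the other node's IP, then paste the printed commands into that node's shell.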

Provider VM-to-VM isolation (the gotcha that bites first)

On AWS, two instances in the same security group can always reach each other. Many budget VPS providers do the opposite: they isolate instances from each other by default at the network layer (private-VLAN or port isolation) as an anti-abuse measure. With this in place, packets between your two VPS instances never arrive even with the host firewall wide open. Because the block is below the host, the ports look open but nothing gets through. This breaks k3s: the worker can't reach the control-plane API, and cross-node pod networking and DNS fail.

Confirm it before building the cluster. From the control plane, watch for traffic from the worker while the worker pings the control plane:

# On node1 (control plane):
sudo tcpdump -ni any host <NODE2_IP>

# On node2 (worker), in another shell:
ping <NODE1_IP>

If tcpdump on node1 shows packets arriving from NODE2_IP, the path is open; continue. If it shows nothing (only its own gateway ARP), inter-instance traffic is blocked by the provider.

Resolve it, in order of preference:

  1. Use the provider's private network feature if it has one (Hetzner Cloud private networks, DigitalOcean VPC, etc.): attach both instances and use the private IPs for cluster traffic. This is the cleanest option and is not metered as public egress.
  2. If there is no private networking, ask the provider to allow VM-to-VM traffic for your account. Some budget hosts block it by default but enable it per-account on request once you explain you're running a multi-node cluster.
  3. Otherwise, use a provider where same-account instances can talk by default, or fall back to AWS same-VPC per the AWS journey.

Do not start the cluster build until the tcpdump/ping test succeeds in both directions; every later step depends on the nodes reaching each other.
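
To run the ping half of that test in both directions from the controller, a small wrapper along these lines may help. It is a sketch under assumptions: the helper name ping_both_ways is hypothetical, SSH logins use the same admin user shown in Phase 2 unless you pass another, and both host firewalls allow ICMP echo. It does not replace tcpdump, which is still the way to see whether packets physically arrive.

```shell
#!/usr/bin/env bash
# Hypothetical helper, run on the controller: pings each node from the other over SSH.
# Assumes SSH access to both nodes and that ICMP echo is not filtered on the hosts.
ping_both_ways() {
  local node1_ip="$1" node2_ip="$2" ssh_user="${3:-admin}"
  local failed=0
  # node1 -> node2
  if ! ssh "${ssh_user}@${node1_ip}" "ping -c 3 -W 2 ${node2_ip}" >/dev/null; then
    echo "node1 -> node2: no reply"
    failed=1
  fi
  # node2 -> node1
  if ! ssh "${ssh_user}@${node2_ip}" "ping -c 3 -W 2 ${node1_ip}" >/dev/null; then
    echo "node2 -> node1: no reply"
    failed=1
  fi
  if [ "$failed" -eq 0 ]; then
    echo "both directions OK"
  fi
  return "$failed"
}

# Usage: ping_both_ways <NODE1_IP> <NODE2_IP> [ssh_user]
```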

Public IP

A VPS instance usually has a single address (its public IP), with no AWS-style public/private split. Pass that public IP to --public-ip when you install the control plane (next phase) so the k3s API server's TLS certificate covers the address your controller uses to reach it. If you put the instances on a provider private network for cluster traffic, you still pass the public IP here, because that is the address your controller workstation connects to.

Phase 2: Build the cluster

From here on, the steps are identical to the AWS journey; only the networking above differs. Each command is summarized below; see the AWS journey, Phase 2 for the full explanation of each.

# 1. Register both hosts with Canasta (run on the controller)
canasta host add --name node1 --ssh admin@<NODE1_IP>
canasta host add --name node2 --ssh admin@<NODE2_IP>

# 2. Install k3s on the control plane (canasta-native mode)
canasta install k8s-cp --host node1 --public-ip <NODE1_IP>

# 3. Join the worker
canasta install k8s-worker --host node2 --cp-host node1

# 4. Confirm both nodes are Ready
ssh node1 'sudo k3s kubectl get nodes'

If the control-plane node is dual-stack (has both an IPv4 and an IPv6 address), Canasta selects the IPv4 address automatically when building the worker's join URL; there is nothing to configure.
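
If you want to script the readiness check from step 4 rather than read the table by eye, one possible sketch is below. The helper name nodes_ready is hypothetical; it assumes the standard kubectl get nodes --no-headers column layout, where the second column is STATUS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin and
# succeeds only if at least `expected` nodes have a STATUS of exactly "Ready".
nodes_ready() {
  local expected="$1"
  awk -v want="$expected" '
    $2 == "Ready" { n++ }
    END { exit (n + 0 >= want + 0) ? 0 : 1 }
  '
}

# Usage:
#   ssh node1 'sudo k3s kubectl get nodes --no-headers' | nodes_ready 2 && echo "cluster ready"
```

Nodes reported as NotReady or Ready,SchedulingDisabled are deliberately not counted.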

Then, exactly as in the AWS journey:

  • Configure kubectl on the controller: copy the kubeconfig from node1 and rewrite its server: URL to https://<NODE1_IP>:6443 (the address must be one --public-ip covered).
  • Provision shared NFS storage: canasta storage setup nfs --host node1 --install-server --share /srv/nfs/canasta --storage-class-name nfs.
  • Point DNS at any node's public IP (a wildcard for subdomain-routed farms).
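
The kubeconfig rewrite in the first bullet can be sketched as a small filter. The AWS journey remains the authoritative procedure; this sketch assumes the k3s defaults, namely that the kubeconfig lives at /etc/rancher/k3s/k3s.yaml on node1 and that its server: line points at https://127.0.0.1:6443. The helper name rewrite_kubeconfig_server is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replaces the loopback API address in a k3s kubeconfig with
# the node's public IP. Reads the kubeconfig on stdin and writes it to stdout.
rewrite_kubeconfig_server() {
  local public_ip="$1"
  sed -E "s#(server:[[:space:]]*https://)127\.0\.0\.1(:6443)#\1${public_ip}\2#"
}

# Usage (assumes the k3s default kubeconfig path; overwrites ~/.kube/config,
# so back up any existing file first):
#   ssh node1 'sudo cat /etc/rancher/k3s/k3s.yaml' \
#     | rewrite_kubeconfig_server <NODE1_IP> > ~/.kube/config
#   chmod 600 ~/.kube/config
```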

ℹ️ Note: The real-client-IP fix that CrowdSec depends on (k3s Traefik as a DaemonSet with externalTrafficPolicy: Local) is applied automatically by canasta install k8s-cp on every provider. Nothing to configure, but it is why a banned address is blocked correctly on a VPS regardless of which node terminates TLS. See Help:CrowdSec#On Kubernetes.

Phase 3: Create the wiki and validate

Identical to the AWS journey; see the AWS journey, Phase 3. In short:

canasta create \
  --host node1 \
  --orchestrator k8s \
  --id mywiki \
  --wiki main \
  --domain-name wiki.example.com \
  --storage-class nfs \
  --access-mode ReadWriteMany

canasta status --id mywiki   # pods Running, PVCs Bound (RWX nfs), certificate Ready

Both --storage-class and --access-mode ReadWriteMany are needed for a multi-node, multi-replica web tier. Browse to https://wiki.example.com/wiki/Main_Page and confirm a valid Let's Encrypt certificate. Scaling the web tier, Argo CD, and GitOps all work as documented in the AWS journey.
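
The certificate can also be confirmed from the command line instead of the browser. The sketch below assumes the openssl CLI is installed on the workstation; the helper names show_cert and issued_by_lets_encrypt are hypothetical. It treats a Let's Encrypt staging issuer as a failure, since browsers do not trust staging certificates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the issuer and validity dates of the certificate
# served for a domain on port 443. Assumes the openssl CLI is available.
show_cert() {
  local domain="$1"
  openssl s_client -connect "${domain}:443" -servername "${domain}" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -dates
}

# Hypothetical helper: reads that text on stdin and succeeds only if the issuer
# names Let's Encrypt and is not a staging certificate.
issued_by_lets_encrypt() {
  local text
  text="$(cat)"
  case "$text" in
    *STAGING*) return 1 ;;
    *"Let's Encrypt"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   show_cert wiki.example.com | issued_by_lets_encrypt && echo "Let's Encrypt certificate served"
```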

Cleanup / teardown

Tear down in the same order as the AWS journey (workers first, then the wiki instance, then the control plane):

canasta uninstall k8s --host node2 --cp-host node1   # worker (--cp-host removes its Node entry)
canasta delete --host node1 --id mywiki --yes        # the wiki instance
canasta uninstall k8s --host node1                   # control plane

Then destroy both VPS instances in your provider's console to stop billing. Host-firewall rules and any VM-to-VM unblocking the provider applied are removed with the instances, so no separate cleanup is needed.

Production considerations

What this journey simplified that real production should add:

  • Shared storage durability. Managed NFS (EFS, Filestore, Azure Files) is usually unavailable on budget VPS providers, so you are colocating NFS on the control plane, which couples storage availability to that node. Put the NFS export on a provider block-storage volume so it survives a node rebuild, or run a dedicated NFS instance. See Help:Storage.
  • Inter-node networking. Prefer a provider private network over the public interface for cluster traffic: it has lower latency, is not metered as public egress, and sidesteps the VM-to-VM isolation default entirely.
  • Ingress single point of failure. DNS points directly at a node IP, so that node is a single point of failure for ingress unless you add a provider floating IP or external load balancer in front of the nodes.
  • Database HA. The default deployment runs one MariaDB pod on node-local storage. Move to an external managed or replicated database β€” pass USE_EXTERNAL_DB=true + MYSQL_HOST in an envfile to canasta create. See Help:External database.
  • Backups off-host. Set up Restic against a bucket on a different provider and enable scheduled snapshots with canasta backup init + canasta backup schedule set. See Help:Backup and restore.
  • Operator workstation durability. The controller holds per-instance state at ~/canasta/<id>/; run it from a stable host, not a personal laptop, for long-lived deployments.

Troubleshooting

Pitfalls specific to generic-VPS multi-node:

The worker never joins, or joins but its pods can't resolve DNS or reach ClusterIP services. Almost always provider VM-to-VM isolation or a missing inter-node firewall rule (the two layers from Phase 1). Run the tcpdump/ping test; if no packets arrive, the provider is blocking inter-instance traffic (use a private network or ask support to unblock it). If packets do arrive, open 6443/TCP, 8472/UDP, and 10250/TCP between the nodes. A web pod's wait-for-db init container hanging on a DNS lookup is the classic symptom of blocked cross-node networking.
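
Once k3s is running, the TCP ports can be probed directly from each node. The following is a sketch with hypothetical helper names (check_tcp, report_peer_ports); it relies on bash's built-in /dev/tcp redirection and the coreutils timeout command. It cannot probe 8472/UDP, so keep using tcpdump for VXLAN traffic, and remember that 6443 listens only on the control plane.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port opens within 3 s.
# Uses bash's /dev/tcp redirection, so no extra packages are needed.
check_tcp() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: reports reachability of each listed TCP port on the peer.
report_peer_ports() {
  local peer_ip="$1" port
  shift
  for port in "$@"; do
    if check_tcp "$peer_ip" "$port"; then
      echo "${port}/tcp reachable"
    else
      echo "${port}/tcp NOT reachable"
    fi
  done
}

# Usage:
#   on the worker:        report_peer_ports <NODE1_IP> 6443 10250
#   on the control plane: report_peer_ports <NODE2_IP> 10250
```

A "NOT reachable" result means either a firewall drop or nothing listening on that port, so run it only after the k3s service is up on the target node.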

kubectl get nodes from the controller fails with a TLS validation error. The address in ~/.kube/config isn't covered by the k3s API certificate. Re-run canasta install k8s-cp --public-ip <addr> with the address you actually use.
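
To see which addresses the API certificate actually covers before re-running the installer, you could inspect its subjectAltName entries. This is a sketch assuming OpenSSL 1.1.1 or newer (for the x509 -ext option); the helper names api_cert_sans and san_lists_ip are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the subjectAltName entries of the certificate served
# on the k3s API port. Assumes OpenSSL 1.1.1+ on the workstation.
api_cert_sans() {
  local addr="$1"
  openssl s_client -connect "${addr}:6443" </dev/null 2>/dev/null \
    | openssl x509 -noout -ext subjectAltName
}

# Hypothetical helper: reads SAN text on stdin; succeeds if it lists exactly this IP.
san_lists_ip() {
  local ip="$1"
  tr ',' '\n' | sed 's/^[[:space:]]*//' | grep -Fxq "IP Address:${ip}"
}

# Usage:
#   api_cert_sans <NODE1_IP> | san_lists_ip <NODE1_IP> || echo "address not covered by the API certificate"
```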

Certificate stuck Ready=False. DNS hasn't propagated, or port 80 isn't reachable from the internet (host firewall or provider edge); Let's Encrypt's HTTP-01 challenge needs an open port 80.
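
The DNS half of that diagnosis can be checked with a short filter over dig output. This is a sketch: the helper name dns_points_at_node is hypothetical, and it assumes dig is installed on the workstation and that the domain uses plain A records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads resolved A records on stdin (one per line, as printed
# by `dig +short A <domain>`) and succeeds if any equals one of the given node IPs.
dns_points_at_node() {
  local resolved ip
  while IFS= read -r resolved; do
    for ip in "$@"; do
      if [ "$resolved" = "$ip" ]; then
        return 0
      fi
    done
  done
  return 1
}

# Usage:
#   dig +short A wiki.example.com | dns_points_at_node <NODE1_IP> <NODE2_IP> \
#     || echo "DNS does not resolve to a cluster node yet"
```

For the port 80 half, request http://wiki.example.com/ from a machine outside the provider's network; a timeout there points at the host firewall or the provider edge.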

PVCs stuck Pending, NFS mount failures, pods all on one node. These are not VPS-specific; see the AWS journey's troubleshooting and Help:Multi-node Kubernetes#Troubleshooting.

For the broader catalog of multi-node K8s issues that aren't journey-specific, see Help:Multi-node Kubernetes#Troubleshooting.