Journey info · Platform: AWS EC2 / self-managed k3s · Time: ~2 hours wall-clock
This journey provisions a multi-node Canasta wiki on AWS using self-managed k3s. Two EC2 instances form a control-plane + worker cluster, NFS exported from the control plane provides RWX-capable (ReadWriteMany) shared storage, and Canasta runs the wiki with a multi-replica web tier. It's a worked example aimed at operators who want self-managed Kubernetes on commodity infrastructure — for the conceptual material on multi-node K8s requirements and limitations, see Help:Multi-node Kubernetes; this is one path, not the only one.
Prerequisites
- An AWS account with permissions to create EC2 instances, security groups, and (optionally) Route 53 records.
- An operator workstation with:
  - aws CLI v2
  - ssh
  - Canasta CLI installed (either mode works for k3s — see Help:Installation).
- A domain name under your DNS control. Free Let's Encrypt staging certs are fine for a first run; switch to production certs once DNS is verified.
- Costs: two EC2 instances and their EBS root volumes bill while running. Check current AWS pricing for your chosen instance type and region; production sizing scales linearly with worker count.
- Prior knowledge: basic familiarity with EC2 and SSH. K3s is installed for you by Canasta.
For the architecture details (storage requirements, HA caveats, scaling model), read Help:Multi-node Kubernetes first.
Phase 1: Provision the EC2 nodes
Launch two Linux instances. The same shape works on any cloud or bare metal — adapt the security-group / firewall sections to your provider.
Instance launch
Choose an AMI and instance type:
- AMI: Debian 13 or Ubuntu 24.04 (amd64).
- Type: at least 4 GiB RAM (c7i-flex.large, t3.medium). For Elasticsearch or heavy traffic, use 8 GiB+.
- Storage: 20 GB gp3 EBS root volume. The default 8 GB AMI root volume is too small for the Canasta image plus k3s images plus PVC space.
- Key pair: your SSH key.
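If you prefer the CLI over the console, here is a minimal run-instances sketch. The AMI ID, key-pair name, and security-group ID are placeholders (the security group is covered in the next subsection), and the root device name varies by AMI (/dev/xvda on Debian, /dev/sda1 on Ubuntu), so check yours first:
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type t3.medium \
  --key-name my-keypair \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":20,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=canasta-node1}]'
Run it twice, changing the Name tag, to get node1 and node2.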
Record the public IP/hostname of each:
- NODE1_IP — control plane
- NODE2_IP — worker
Security group
Both instances share a single security group with these inbound rules:
- SSH (22/TCP) from the controller's IP.
- HTTP (80/TCP) from anywhere — required for Let's Encrypt HTTP-01 challenges.
- HTTPS (443/TCP) from anywhere.
- Kubernetes API (6443/TCP) from the controller's IP.
- All traffic (TCP + UDP) between instances in the same security group. k3s uses VXLAN over UDP port 8472 for cross-node pod networking; TCP-only will silently break cross-node DNS and ClusterIP services.
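The same inbound rules expressed with the aws CLI, assuming the default VPC (the controller IP below is a placeholder):
SG_ID=$(aws ec2 create-security-group --group-name canasta-k3s \
  --description "Canasta k3s nodes" --query GroupId --output text)
CONTROLLER_IP=203.0.113.10/32   # your workstation's public IP
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 22   --cidr "$CONTROLLER_IP"
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 80   --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 443  --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 6443 --cidr "$CONTROLLER_IP"
# Intra-cluster traffic, both TCP and UDP (covers VXLAN on 8472/udp):
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 0-65535 --source-group "$SG_ID"
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol udp --port 0-65535 --source-group "$SG_ID"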
Phase 2: Build the cluster
Step 1: Register the nodes with Canasta
Canasta targets remote hosts by short name via SSH. Register each node in $CANASTA_CONFIG_DIR/hosts.yml on the controller:
canasta host add --name node1 --ssh admin@<NODE1_IP>
canasta host add --name node2 --ssh admin@<NODE2_IP>
Use whatever short names you prefer (cp and worker1 are also fine). Subsequent commands target these hosts by name with --host node1 / --host node2.
If your ~/.ssh/config already maps node1 to the right user, --ssh node1 (no user@) also works.
Confirm:
canasta host list
Step 2: Install k3s on the control plane
canasta install k8s-cp --host node1 --public-ip <NODE1_IP>
--public-ip tells the installer to add that address to the k3s API server's TLS certificate. You must pass it whenever the controller is on a different machine than the control-plane node — otherwise the cert will only cover the cluster-internal IP and you won't be able to talk to the API from the controller. Pass --public-ip multiple times to add several addresses (e.g. both an IP and a DNS name).
ℹ️ Note: canasta install k8s-cp requires canasta-native mode. The k3s installer needs systemd on the host and can't run from inside the canasta-docker container. Either install the canasta-native CLI on the controller, or install k3s on the host directly with the upstream installer, curl -sfL https://get.k3s.io | sh - (see the sketch below).
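For reference, the manual path the note mentions looks roughly like this; k3s's --tls-san flag plays the same role as --public-ip by adding the address to the API server certificate:
ssh node1 "curl -sfL https://get.k3s.io | sh -s - server --tls-san <NODE1_IP>"
The one-liner installs only k3s itself; the cluster add-ons that canasta install k8s-cp sets up (such as Argo CD, used later in this journey) are not included.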
Step 3: Join the worker
canasta install k8s-worker --host node2 --cp-host node1
Canasta SSHs from the controller to node1, fetches the join token and the cluster-internal API IP, and runs the agent install on node2. No tokens are stored on the controller.
Verify:
ssh node1 'sudo k3s kubectl get nodes'
Both nodes should show Ready within ~30 seconds.
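Under the hood this is the standard k3s agent join. A rough manual equivalent, using the token location k3s writes by default and the control plane's cluster-internal address:
ssh node1 'sudo cat /var/lib/rancher/k3s/server/node-token'   # the join token
ssh node2 "curl -sfL https://get.k3s.io | K3S_URL=https://<NODE1_PRIVATE_IP>:6443 K3S_TOKEN=<token> sh -"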
Step 4: Configure kubectl on the controller
The control plane's kubeconfig lives at /etc/rancher/k3s/k3s.yaml on node1 with its server: URL pointing at https://127.0.0.1:6443. Copy it to the controller and rewrite the server URL to a name reachable from the controller's network:
mkdir -p ~/.kube
ssh node1 'sudo k3s kubectl config view --raw' > ~/.kube/config
# macOS:
sed -i '' "s|server: https://127.0.0.1:6443|server: https://<NODE1_IP>:6443|" ~/.kube/config
# Linux / WSL:
sed -i "s|server: https://127.0.0.1:6443|server: https://<NODE1_IP>:6443|" ~/.kube/config
chmod 600 ~/.kube/config
kubectl get nodes
The address you substitute must be one that --public-ip covered above (otherwise the cert won't validate).
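If you need to check which names the API certificate actually covers (useful for the TLS troubleshooting entry later), openssl 1.1.1+ can print its SANs:
echo | openssl s_client -connect <NODE1_IP>:6443 2>/dev/null | openssl x509 -noout -ext subjectAltName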
Run canasta doctor to flag any remaining missing dependencies (helm, kubectl) before you hit them later:
canasta doctor
Step 5: Set up shared storage (NFS)
The simplest NFS target is the control-plane node itself. canasta storage setup nfs installs the server package, creates the share directory, exports it, and installs the NFS CSI driver + StorageClass on the cluster in one step:
canasta storage setup nfs \
--host node1 \
--install-server \
--share /srv/nfs/canasta \
--storage-class-name nfs
If the NFS server is a separate host, replace --install-server with --server <IP>. --host node1 is still required so canasta runs the helm/kubectl steps from a host that has a working kubeconfig pointing at this cluster.
Verify:
canasta storage list --host node1 # expect 'nfs' listed
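For a lower-level check, confirm the export is visible from the worker and the StorageClass landed (showmount requires the NFS client tools on the worker, e.g. nfs-common):
NFS_SERVER=$(kubectl get sc nfs -o jsonpath='{.parameters.server}')
ssh node2 "showmount -e $NFS_SERVER"   # expect /srv/nfs/canasta in the export list
kubectl get storageclass nfs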
In production you'd usually use a dedicated NFS host or managed NFS (EFS, Filestore, Azure Files) — colocating NFS with the control plane couples storage availability to that node.
Step 6: Point DNS at the cluster
K3s's bundled Traefik ingress listens on 80/443 on every node. Point your wiki's DNS at any node's IP:
wiki.example.com A <NODE1_IP>
Verify: dig +short wiki.example.com.
For subdomain-routed wiki farms, add a wildcard: *.wiki.example.com → <NODE1_IP>.
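If the zone is hosted in Route 53, the same record can be created from the controller (the hosted-zone ID is a placeholder):
aws route53 change-resource-record-sets \
  --hosted-zone-id ZXXXXXXXXXXXXX \
  --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{
      "Name":"wiki.example.com","Type":"A","TTL":300,
      "ResourceRecords":[{"Value":"<NODE1_IP>"}]}}]}'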
Phase 3: Create the wiki and validate
Step 1: Create the instance
canasta create \
--host node1 \
--orchestrator k8s \
--id mywiki \
--wiki main \
--domain-name wiki.example.com \
--storage-class nfs \
--access-mode ReadWriteMany
Both --storage-class and --access-mode are needed for a multi-node, multi-replica web tier. ReadWriteMany declares the four content PVCs as RWX so they can be mounted from multiple nodes at once; without it they default to RWO and the chart's declared contract will be wrong, even if the NFS CSI driver's leniency makes it look like it works.
When a real domain name is used, cert-manager and Let's Encrypt are configured automatically — no manual TLS setup.
The new instance directory lands at ~/canasta/mywiki/ on node1 by default. Override with --path /absolute/path/on/remote.
Verify:
canasta status --id mywiki # pods Running, PVCs Bound (RWX nfs), certificate Ready
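The same can be confirmed with kubectl directly, using the instance namespace referenced later in this journey:
kubectl get pvc -n canasta-mywiki           # ACCESS MODES should read RWX, STORAGECLASS nfs
kubectl get pods -n canasta-mywiki -o wide  # all Running; the NODE column shows placement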
Step 2: Browse to the wiki
Open https://wiki.example.com/wiki/Main_Page in a browser. You should see the MediaWiki main page with a valid Let's Encrypt certificate (green padlock, no warnings).
If the certificate isn't ready yet, give cert-manager 30–60 seconds to complete the ACME challenge. Check status with kubectl describe certificate -n canasta-mywiki.
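cert-manager's intermediate objects show exactly where an issuance is stuck if you need more detail than the Certificate status:
kubectl get certificate,order,challenge -n canasta-mywiki
kubectl describe challenge -n canasta-mywiki   # pending HTTP-01 challenges report their reason here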
Step 3 (optional): Open the Argo CD dashboard
canasta install k8s-cp installs Argo CD on the control plane. To open the UI from the controller:
# Print the auto-generated admin password:
canasta argocd password --host node1
# Tunnel argocd-server to your laptop and print the localhost URL.
# Blocks until Ctrl-C — keep the session running while using the UI.
canasta argocd ui --host node1
Browse https://localhost:8443 (the URL canasta argocd ui prints), accept the self-signed certificate, log in as admin. Applications managed via canasta gitops show up here as Argo CD Applications. Until you run canasta gitops init the dashboard is empty — that's normal.
To list applications without opening the UI:
canasta argocd apps --host node1
Step 4 (optional): Connect to gitops
Put the instance's configuration under git so Argo CD can sync changes from the repo:
canasta gitops init \
--host node1 \
--id mywiki \
--name dev \
--repo git@github.com:<org>/<repo>.git \
--key ~/mywiki-deploy-key
The command pauses and prints an SSH public key — add it as a deploy key with write access in your repo's settings, then press Enter. Argo CD picks up the repo and starts syncing.
canasta gitops status --host node1 --id mywiki
canasta argocd apps --host node1 # expect Synced / Healthy
Step 5 (optional): Scale the web tier
Edit the per-instance values.yaml on node1 and run canasta restart:
canasta list # note the instance path on node1
# Add to <instance_path>/values.yaml on node1, before 'domains:':
# web:
# replicaCount: 3
canasta restart --host node1 --id mywiki
canasta status --id mywiki # web pods listed under 'Pods', NODE column shows spread
The restart flow reads web.replicaCount and applies it via helm upgrade. The new web pods attach to the same NFS-backed RWX PVCs and can land on any node in the cluster.
For a quick experiment without editing values.yaml you can kubectl scale deploy/canasta-mywiki-web -n canasta-mywiki --replicas=3, but that's ephemeral — the next canasta upgrade resets the replica count back to 1.
The K8s scheduler tries to distribute pods by resource availability but does not guarantee spread. With replicaCount: 3 on two nodes you may get 2-1 or 3-0 depending on the scheduler. If you need guaranteed spread, cordon the crowded node to force reschedule:
ssh node1 'sudo k3s kubectl cordon <node-name>'
canasta restart --host node1 --id mywiki
ssh node1 'sudo k3s kubectl uncordon <node-name>'
Adding more workers
canasta host add --name node3 --ssh admin@<NODE3_IP>
canasta install k8s-worker --host node3 --cp-host node1
Shared storage (NFS) and ingress (Traefik) already cover any number of nodes — no reconfiguration needed. New web replicas (after canasta restart) can schedule onto the new node.
Cleanup / teardown
Tear down workers first, then the instance, then the control plane.
Tear down a worker:
canasta uninstall k8s --host node2 --cp-host node1
--cp-host is required for worker uninstalls so canasta can also delete the worker's Node entry from the control plane. Without it, the Node lingers in kubectl get nodes and a worker rejoining with the same hostname reclaims it (preserving its AGE), masking the uninstall.
Tear down the instance:
canasta delete --host node1 --id mywiki --yes
Tear down the control plane:
canasta uninstall k8s --host node1
(--cp-host is not needed here — tearing down the control plane discards the Node list along with everything else.)
Then terminate the EC2 instances and delete the security group in the AWS console (or via aws ec2 terminate-instances + aws ec2 delete-security-group) to stop billing.
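A sketch of that teardown with the aws CLI (instance and security-group IDs are placeholders; the security group can only be deleted after both instances have terminated, hence the wait):
aws ec2 terminate-instances --instance-ids i-0aaaaaaaaaaaaaaaa i-0bbbbbbbbbbbbbbbb
aws ec2 wait instance-terminated --instance-ids i-0aaaaaaaaaaaaaaaa i-0bbbbbbbbbbbbbbbb
aws ec2 delete-security-group --group-id sg-xxxxxxxx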
Production considerations
What this journey simplified that real production should add:
- Database HA — the default deployment runs one MariaDB pod on node-local storage. Move to an external managed database (RDS Multi-AZ, Aurora, Cloud SQL, or a Galera cluster) by passing
USE_EXTERNAL_DB=true + MYSQL_HOST in an envfile to canasta create. See Help:External database.
- NFS placement — colocating NFS with the control plane couples storage availability to that node. A dedicated NFS host or managed NFS (AWS EFS) is more durable.
- Multi-AZ topology — for cloud failure resilience, distribute workers across at least two availability zones. The same canasta-managed flow works; only the EC2 launch step changes.
- ACME production certs — rerun
canasta create without the staging-certs flag once DNS is verified.
- Backup — set up Restic against an off-cluster bucket and enable scheduled snapshots with
canasta backup init + canasta backup schedule set. See Help:Backup and restore.
- Monitoring + alerting — none of this guide covers observability. See Help:Observability.
- Operator workstation durability — the controller holds per-instance state at
~/canasta/<id>/. Run from a stable host (small EC2 instance in the same VPC, CI runner, or shared admin workstation) rather than a personal laptop for long-lived deployments.
Troubleshooting
Common pitfalls specific to this journey:
- Pods on the worker node can't resolve DNS or reach ClusterIP services. VXLAN/UDP is blocked between nodes. Verify the security group allows all UDP (not just TCP) between cluster nodes. Smoking gun: the web pod's wait-for-db init container hangs on a DNS lookup.
- kubectl get nodes from the controller fails with a TLS validation error. The address in ~/.kube/config isn't covered by the k3s API certificate. Re-run canasta install k8s-cp with --public-ip <addr> set to the address you actually use; the API server certificate is rebuilt on each install.
- PVCs stuck Pending. kubectl describe pvc <name> shows the cause. Usually: StorageClass missing, StorageClass doesn't support RWX, or the NFS server isn't reachable from worker nodes.
- NFS mount fails with Failed to resolve server <name>. The StorageClass parameter server: got a hostname the cluster nodes can't resolve. Check it with kubectl get sc nfs -o jsonpath='{.parameters.server}' and recreate with an IP that's reachable from worker nodes.
- Web replicas Pending after scale-up. An RWO PVC is already attached to a different node. Recreate the instance with --access-mode ReadWriteMany.
- Pods all on one node. The scheduler doesn't guarantee spread. Use the cordon workaround in Phase 3 Step 5.
- Certificate stuck Ready=False. Usually DNS hasn't propagated, or port 80 isn't open from the internet (the Let's Encrypt HTTP-01 challenge requires a reachable port 80).
For the broader catalog of multi-node K8s issues that aren't journey-specific, see Help:Multi-node Kubernetes#Troubleshooting.
Windows / WSL notes
Everything in this journey works unmodified on WSL2 Ubuntu. Watch out for:
- SSH key permissions. If the AWS key lives on
/mnt/c/..., WSL sees it as mode 0777 and ssh refuses to use it. Copy the key into ~/.ssh/ inside WSL and chmod 600 it (see the snippet after this list).
- dig isn't preinstalled — sudo apt install dnsutils.
- Argo CD tunnel. WSL2 auto-forwards localhost to the Windows host, so
https://localhost:8443 works from the Windows browser without extra setup.
- sed syntax. Use the Linux form (no '' after -i) — the macOS form errors on WSL.
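The key fix from the first bullet, with a hypothetical key path:
cp /mnt/c/Users/<you>/Downloads/aws-key.pem ~/.ssh/
chmod 600 ~/.ssh/aws-key.pem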