Backing Up Virtual Machines vs Kubernetes

In previous articles we described how we moved from a single VM running Docker Compose to a Kubernetes cluster on Talos OS, and how we then consolidated our zoo of vertical VMs into dedicated namespaces. One piece of the picture was still missing, and it is the most uncomfortable one: the topic everybody talks about and few actually tackle. Backups.

Spoiler: the biggest difference between backing up VMs and backing up Kubernetes isn't in the tools, it's in how the software running on top of them is designed. But let's take it one step at a time — we'll come back to that in a future article.

First rule: not all data is the same

This is a stance we hold regardless of the orchestrator. It applies to VMs, it applies to Kubernetes, it applies to anything we have in production: “data” is not all created equal, and dumping everything in the same place is the easiest way to make your life miserable when it's time to restore.

We draw a line between two categories:

1. Binary files (PDFs, images, etc.)

These belong on dedicated S3-compatible object storage, not on the application's filesystem, and on this point we are very strict:

The bucket is never public. Access goes exclusively through the product's API, never directly to the bucket.
The end user never downloads a file by reading directly from S3: they always go through the application, which validates their identity, checks permissions and — if everything checks out — serves them the file (typically via a short-lived signed URL or a proxied stream).
S3 credentials live in a Kubernetes Secret (or in the equivalent secret manager in the VM world), never in the code, never in configuration files committed in plain text.
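To make the access pattern concrete, here is a minimal Python sketch. It assumes the application serves files through its own download endpoint. The `/files/download` path, `SIGNING_KEY` and the `PERMISSIONS` table are hypothetical. The sketch checks permissions before issuing anything and signs a link that expires after a few minutes. It is a simplified stand-in: a real S3 presigned URL is produced by the provider's SDK using its own signing scheme (AWS Signature Version 4), not by hand-rolled HMAC code like this.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

# Hypothetical values: in practice the key comes from a Kubernetes Secret
# and permissions from the application's own authorization layer.
SIGNING_KEY = b"replace-me"
PERMISSIONS = {"alice": {"invoices/2024/001.pdf"}}  # user -> readable object keys


def sign(key: str, expires: int) -> str:
    """HMAC-SHA256 over object key and expiry. NOT S3's SigV4 scheme."""
    msg = f"{key}:{expires}".encode("utf-8")
    return hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()


def issue_download_url(user: str, key: str, ttl: int = 300,
                       now: Optional[float] = None) -> str:
    """Check permissions first, then return a link valid for `ttl` seconds."""
    if key not in PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} may not read {key}")
    expires = int((time.time() if now is None else now) + ttl)
    query = urlencode({"key": key, "expires": expires, "sig": sign(key, expires)})
    return f"/files/download?{query}"


def verify(key: str, expires: int, sig: str, now: Optional[float] = None) -> bool:
    """Reject expired links and links whose signature does not match."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig.encode("utf-8"),
                               sign(key, expires).encode("utf-8"))
```

The endpoint behind the link would call `verify()`. Only if it returns `True` would it stream the object from the private bucket, using the credentials held in the Secret.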

The reason is as trivial as it is important: if a user can download a file without going through your software, you no longer control who accesses what. And when the time comes for a GDPR audit, or to answer an enterprise client who asks you to “prove who read this document,” you're in trouble.

Pleasant side effect: the application's filesystem slims down. No more 500 GB VMs because “they hold six years' worth of attachments.” The application goes back to the size it should be — a few GB, not hundreds — and restoring it in a disaster scenario becomes an order of magnitude faster.

This doesn't mean the S3 bucket shouldn't be protected. It absolutely should. But backing up a bucket is a problem you tackle separately, with dedicated tools. Sometimes that means third-party software (Veeam, Restic, Kopia, or MinIO's own tooling in MinIO-based environments). Very often it means the backup and replication services offered by the S3 provider itself. Scaleway, for example, offers native replication and versioning, and the other major providers offer comparable features. What matters is keeping the management of file backups separate from the management of application backups.

2. Databases

And here we get to the other half of the problem. Once the binary files are out of the way, what's left to preserve carefully are the databases: PostgreSQL, MySQL, MongoDB, Redis if you really care about its state, vector engines if you have any.

The approaches are well known:

Bash scripts with scheduled hot dumps. The classic pg_dump/mysqldump fired off by a cron job, with external storage as the destination. It works, it's solid, it has the virtue of simplicity.
Native, engine-specific tools. pgBackRest for PostgreSQL, Percona XtraBackup for MySQL, the official MongoDB utilities. These are more sophisticated and can handle incremental backups, point-in-time recovery and retention, though the exact feature set varies from tool to tool.
The provider's own backup services, if the database is a managed service.
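Here is a minimal sketch of the first approach. It assumes a Python wrapper that a cron job calls once a night; the file naming scheme and the seven-dump retention are illustrative choices, not a recommendation. The wrapper writes a timestamped `pg_dump` in custom format and prunes older dumps. Shipping the file to external storage is left out. Credentials are expected to come from the environment (for example `PGPASSWORD` or a `.pgpass` file), not from the command line.

```python
from __future__ import annotations

import datetime as dt
import subprocess
from pathlib import Path


def dump_path(backup_dir: Path, db: str, now: dt.datetime) -> Path:
    """Timestamped file name, e.g. orders-20240501T020000Z.dump."""
    return backup_dir / f"{db}-{now.strftime('%Y%m%dT%H%M%SZ')}.dump"


def pg_dump_argv(conn_uri: str, out: Path) -> list[str]:
    """Hot dump in pg_dump's custom format (compressed, restorable with pg_restore).
    The URI should not embed a password; use PGPASSWORD or ~/.pgpass instead."""
    return ["pg_dump", f"--dbname={conn_uri}", "--format=custom", f"--file={out}"]


def prune(dumps: list[Path], keep: int) -> list[Path]:
    """Dumps to delete, keeping the `keep` newest. Relies on the UTC timestamp
    in the name sorting lexicographically in chronological order."""
    return sorted(dumps, key=lambda p: p.name, reverse=True)[keep:]


def run_nightly(conn_uri: str, db: str, backup_dir: Path, keep: int = 7) -> None:
    """Entry point a cron job would call, e.g. once a night."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    out = dump_path(backup_dir, db, dt.datetime.now(dt.timezone.utc))
    subprocess.run(pg_dump_argv(conn_uri, out), check=True)
    for old in prune(list(backup_dir.glob(f"{db}-*.dump")), keep):
        old.unlink()
```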

Which one to choose depends on how critical the data is, on the RPO (recovery point objective: how much data you can afford to lose) and RTO (recovery time objective: how long a restore may take) you've set yourself, and on the complexity you're willing to take on. As for where to send these backups, the retention rules and the regulatory implications (GDPR, regulated sectors, compliant retention), we'll come back to those in a dedicated article, written together with the colleagues who deal specifically with information security and compliance. The topic deserves more than a paragraph tossed in offhand.

And now let's get to Kubernetes

With the model above in mind — files on S3, databases backed up separately — the question becomes: what's still left to back up on a Kubernetes cluster?

Here's the single most important point of the whole article:

You should not back up the software.

If the software is well designed (and we'll come back to that topic soon too), it's nothing more than a set of versioned Docker images, with a clear compatibility table between the versions of the various components. The images are already in the registry, already versioned, already reproducible. Backing them up would be like backing up a third-party library you can download again from its package repository at any time: pointless.

What you do need to back up is:

1. The contents of the databases (which we already covered above).
2. The contents of the S3 bucket (same thing).
3. The Kubernetes cluster configurations: namespaces, Deployments, Services, Ingresses, ConfigMaps, Secrets, PersistentVolumeClaims, Roles/RoleBindings, NetworkPolicies. All the declarative state that describes how the software runs.

The third item, the cluster configuration, is what really sets backing up Kubernetes apart from backing up a VM.
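As a rough illustration of what "the cluster configuration" means in practice, this sketch shells out to `kubectl` for each resource kind listed above and writes one YAML file per kind. It is only an approximation built on assumptions. The tool described later uses the Kubernetes API directly, and the kind list would need extending for StatefulSets, CronJobs, ServiceAccounts and any custom resources you rely on. The explicit list matters because `kubectl get all` does not include ConfigMaps, Secrets, Ingresses or PersistentVolumeClaims.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Namespaced kinds from the list above; `kubectl get all` would miss several of them.
NAMESPACED_KINDS = [
    "deployments", "services", "ingresses", "configmaps", "secrets",
    "persistentvolumeclaims", "roles", "rolebindings", "networkpolicies",
]


def export_commands(namespace: str) -> list[list[str]]:
    """One kubectl invocation for the Namespace object, then one per kind."""
    cmds = [["kubectl", "get", "namespace", namespace, "-o", "yaml"]]
    for kind in NAMESPACED_KINDS:
        cmds.append(["kubectl", "get", kind, "-n", namespace, "-o", "yaml"])
    return cmds


def export_namespace(namespace: str, out_dir: Path) -> None:
    """Run the commands and write each result to <out_dir>/<kind>.yaml."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in export_commands(namespace):
        kind = cmd[2]
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        (out_dir / f"{kind}.yaml").write_text(result.stdout)
```

Keep in mind that exported Secrets are only base64-encoded. The resulting files therefore need the same protection as the credentials themselves.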

VM vs Kubernetes: the difference is not trivial

A VM backup is the backup of a single monolithic object that contains everything: operating system, application, configuration, data. It is heavy (tens or hundreds of GB even for simple services), slow to create and even slower to restore, and it is rigid: hard to extract just one piece from, hard to restore onto an infrastructure different from the one it originated on.

A backup of a Kubernetes namespace, done right, is instead a few MB of YAML plus the database data. Exporting the manifests takes a few seconds and occupies almost no space. More interestingly, it potentially lets you restore the entire namespace onto another Kubernetes cluster in a matter of minutes. Different cluster, different provider, even a different Kubernetes distribution: if the manifests are written portably, they apply as they are.

It's the difference between moving a house with everything inside it and moving the blueprints of the house, the furniture inventory and the suppliers' addresses. The second approach is infinitely more nimble.
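The "written portably" condition is where most of the work hides, because objects exported from a live cluster carry fields the server filled in. Below is a minimal sketch of the clean-up. It assumes the manifests have been parsed into dicts (for example from `kubectl get -o json`). `make_portable` is a hypothetical helper, and the list of stripped fields is a starting point, not an exhaustive one.

```python
from __future__ import annotations

import copy

# Server-populated metadata tied to the source cluster.
VOLATILE_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                     "generation", "managedFields", "selfLink")
# Annotations written by the PV controller / provisioners when a claim binds.
PVC_BINDING_PREFIXES = ("pv.kubernetes.io/", "volume.kubernetes.io/",
                        "volume.beta.kubernetes.io/")


def make_portable(manifest: dict) -> dict:
    """Return a copy of `manifest` without fields that only make sense on the source cluster."""
    m = copy.deepcopy(manifest)
    m.pop("status", None)
    meta = m.setdefault("metadata", {})
    for field in VOLATILE_METADATA:
        meta.pop(field, None)

    annotations = meta.get("annotations") or {}
    annotations.pop("kubectl.kubernetes.io/last-applied-configuration", None)

    kind = m.get("kind")
    spec = m.get("spec") or {}
    if kind == "Service" and spec.get("clusterIP") not in (None, "None"):
        # ClusterIPs are allocated per cluster; keep "None" so headless Services stay headless.
        spec.pop("clusterIP", None)
        spec.pop("clusterIPs", None)
    if kind == "PersistentVolumeClaim":
        # Binding to a specific PV that will not exist on the target cluster;
        # leaving the bind annotations without volumeName would mark the claim as Lost.
        spec.pop("volumeName", None)
        for key in [k for k in annotations if k.startswith(PVC_BINDING_PREFIXES)]:
            del annotations[key]

    if annotations:
        meta["annotations"] = annotations
    else:
        meta.pop("annotations", None)
    return m
```

Fields such as `storageClassName` or `ingressClassName` may also need remapping when the target cluster names its classes differently. That is a policy decision, not something to strip blindly.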

The tool we had to write ourselves

At this point the natural question is: with all the tools that exist in the Kubernetes ecosystem, wasn't there already something that did exactly this? The honest answer: we looked, we tried several solutions, and none of them did exactly what we needed, in the way we needed it, and with the operational simplicity we demanded. That was true of both commercial products and open-source projects.

So we wrote it ourselves.

How it's built

We deliberately wanted the design to be minimal:

A single self-contained pod, installed into the cluster with a manifest.
It uses Kubernetes' native APIs to read and serialize all of a namespace's resources (and the associated volumes, when relevant).
For each database engine it uses specific, parametric scripts, for both backup and restore: the same interface for whoever uses it, a different implementation underneath depending on the engine (PostgreSQL, MySQL, MongoDB, and so on).
Portable output: the result is an archive containing the namespace's YAML manifests plus the database dumps, and it can be restored onto any Kubernetes cluster with the tool installed.
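The "same interface, different implementation" idea can be sketched as a small registry keyed by engine. Everything here is hypothetical (`Target`, `Command`, the function names), and this is not the authors' tool. The flags are the standard ones of `pg_dump`/`pg_restore`, `mysqldump`/`mysql` and `mongodump`/`mongorestore`. Passwords are assumed to be supplied out of band, through environment variables or option/config files depending on the engine, rather than on the command line. A real MongoDB setup may also need `--authenticationDatabase`.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Target:
    """Connection parameters shared by every engine (hypothetical interface)."""
    engine: str
    host: str
    port: int
    user: str
    database: str


@dataclass(frozen=True)
class Command:
    argv: List[str]
    stdin: Optional[Path] = None  # file piped to stdin, used by the mysql client


def _postgresql(t: Target, f: Path) -> Tuple[Command, Command]:
    conn = [f"--host={t.host}", f"--port={t.port}", f"--username={t.user}"]
    backup = Command(["pg_dump", *conn, "--format=custom", f"--file={f}", t.database])
    restore = Command(["pg_restore", *conn, f"--dbname={t.database}",
                       "--clean", "--if-exists", str(f)])
    return backup, restore


def _mysql(t: Target, f: Path) -> Tuple[Command, Command]:
    conn = [f"--host={t.host}", f"--port={t.port}", f"--user={t.user}"]
    backup = Command(["mysqldump", *conn, "--single-transaction", "--routines",
                      f"--result-file={f}", t.database])
    restore = Command(["mysql", *conn, t.database], stdin=f)
    return backup, restore


def _mongodb(t: Target, f: Path) -> Tuple[Command, Command]:
    conn = [f"--host={t.host}", f"--port={t.port}", f"--username={t.user}"]
    backup = Command(["mongodump", *conn, f"--db={t.database}", f"--archive={f}", "--gzip"])
    restore = Command(["mongorestore", *conn, f"--archive={f}", "--gzip", "--drop"])
    return backup, restore


# One entry per engine; callers never see engine-specific details.
ENGINES: Dict[str, Callable[[Target, Path], Tuple[Command, Command]]] = {
    "postgresql": _postgresql,
    "mysql": _mysql,
    "mongodb": _mongodb,
}


def backup_command(t: Target, f: Path) -> Command:
    return ENGINES[t.engine](t, f)[0]


def restore_command(t: Target, f: Path) -> Command:
    return ENGINES[t.engine](t, f)[1]


def run(cmd: Command) -> None:
    """Execute a command, feeding a file on stdin when the engine needs it."""
    if cmd.stdin is None:
        subprocess.run(cmd.argv, check=True)
    else:
        with cmd.stdin.open("rb") as fh:
            subprocess.run(cmd.argv, stdin=fh, check=True)
```

Adding an engine means adding one function to `ENGINES`. Callers keep using `backup_command` and `restore_command` unchanged.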

No magic, no exotic dependencies, no complicated operator to maintain. A pod, the APIs, the scripts. Simple, clean, effective: the three words we always use as our compass when designing internal tools.
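For the portable archive described above, a plain gzipped tar is enough. Here is a sketch using the standard library's `tarfile`. It assumes the manifests are already serialized as YAML strings and the dumps sit on local disk. The `manifests/` and `dumps/` layout is a hypothetical convention, not a description of the authors' format.

```python
from __future__ import annotations

import io
import tarfile
import time
from pathlib import Path


def build_archive(out: Path, manifests: dict[str, str], dumps: dict[str, Path]) -> None:
    """Write <out> as a .tar.gz with manifests/<kind>.yaml and dumps/<name>."""
    with tarfile.open(out, "w:gz") as tar:
        for kind, text in manifests.items():
            # In-memory YAML: build the tar header by hand, then add the bytes.
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"manifests/{kind}.yaml")
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
        for name, path in dumps.items():
            # Database dumps already on disk are added as-is.
            tar.add(path, arcname=f"dumps/{name}")
```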

Bonus: backing up the entire cluster

There's one last feature we added that we're particularly fond of. The same tool can back up all of the cluster's base configurations, excluding the application namespaces: ingress controller, cert-manager, storage classes, network configurations, system RBAC, and so on.

Why? Because in a disaster recovery scenario, the most painful part isn't restoring the applications — those are manifests, you can carry them wherever you want — it's rebuilding the underlying cluster exactly as it was. With this cluster-level backup, we can set up a new Kubernetes cluster from scratch, apply the bundle, and within minutes have an environment identical to the one we lost. At that point, restoring the application namespaces is just one more command.
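One way to draw the line between "base configuration" and "application namespaces" is a label on each namespace. The sketch below assumes a hypothetical label `backup.example.com/role=application`. It exports the cluster-scoped kinds plus the namespaced kinds of every namespace without that label, using `kubectl`. The `namespaces` mapping would come from something like `kubectl get namespaces -o json`. Which kinds and namespaces to include depends on the cluster; for example, you must decide whether to touch `kube-system`, which some distributions manage themselves.

```python
from __future__ import annotations

# Cluster-scoped kinds; CRDs first so custom resources can be restored after them.
CLUSTER_SCOPED_KINDS = [
    "customresourcedefinitions",
    "storageclasses",
    "ingressclasses",
    "clusterroles",
    "clusterrolebindings",
]
# Namespaced kinds typically used by platform components (adjust as needed).
PLATFORM_KINDS = ("deployments,daemonsets,services,configmaps,secrets,"
                  "serviceaccounts,roles,rolebindings")
# Hypothetical convention: application namespaces carry this label.
APP_LABEL_KEY, APP_LABEL_VALUE = "backup.example.com/role", "application"


def platform_namespaces(namespaces: dict[str, dict[str, str]]) -> list[str]:
    """Namespaces NOT labelled as application namespaces, sorted by name."""
    return sorted(
        name for name, labels in namespaces.items()
        if labels.get(APP_LABEL_KEY) != APP_LABEL_VALUE
    )


def cluster_backup_commands(namespaces: dict[str, dict[str, str]]) -> list[list[str]]:
    """kubectl invocations for the cluster-level (non-application) backup."""
    cmds = [["kubectl", "get", kind, "-o", "yaml"] for kind in CLUSTER_SCOPED_KINDS]
    for ns in platform_namespaces(namespaces):
        cmds.append(["kubectl", "get", PLATFORM_KINDS, "-n", ns, "-o", "yaml"])
    return cmds
```

CustomResourceDefinitions come first on purpose. On restore, resources such as cert-manager issuers can only be applied once their CRDs exist.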

To wrap up

The jump from VMs to Kubernetes changed a lot of things in our day-to-day operations, but backups are where the difference is felt the most — and felt for the better. Four takeaways, if you're interested in mirroring the approach:

1. Always separate files from databases, regardless of the orchestrator. Files on private S3 behind the application's API, databases backed up separately.
2. Don't back up the software: containerized and versioned, it's already its own backup.
3. Leverage Kubernetes' declarative nature: back up the cluster state as YAML, not as disk images.
4. Look for tools that do what you need. If you can't find them, write them: the Kubernetes ecosystem is vast but not omnipotent, and sometimes a well-built pod solves what a commercial product costing thousands of euros a year does not.

In upcoming articles we'll return to the topics we touched on here: what it means to design software so that a DevOps engineer doesn't lose their mind over backups, and — together with our security and compliance colleagues — where these backups should actually be stored, according to which criteria and under which regulatory constraints.

Luca Vitali

Want a Kubernetes infrastructure with backups done right?

Codebaker designs and manages Kubernetes clusters on European cloud (Scaleway) and on-premise, with tailored backup and disaster recovery strategies. If you want to talk it over with people who do it every day, drop us a line.