Migration to Kubernetes
Mar 01, 2021 - Jun 30, 2021
One of the Zwift.com web engineering team's big initiatives in 2021 was migrating its infrastructure from OpsWorks to Kubernetes (k8s). There were a few reasons for this:
- The rest of the engineering org was migrating to k8s
- Across the teams there was contention for hardware resources, especially toward the end of the week when we cut a release candidate (RC). Five teams and more than 20 individuals were competing for a QA environment to get their changes verified before merging into their team branch (and then on into the release branch). That means a lot of PR checker runs, and it means we need quite a few QA environments. Prior to k8s we used OpsWorks, an AWS service that relies on EC2 instances and Chef. Spinning up a new environment there required a veritable ton of manual steps; so many, in fact, that at some point our devops team no longer knew how to do it. With k8s, a new QA environment takes a matter of minutes. As a side effect of the k8s system our devops team designed, we no longer needed their help to do it: they became a provider of tools and consulting, and all the power was now in the hands of the web team. If there was ever contention on an RC cut day, an engineer could spin up a new environment, get their changes verified, then spin it down. Some manual intervention was still required, to copy/paste/tweak some k8s config YAML, but it was very doable.
- Overall cost reduction. Believe it or not, ephemeral compute resources that wake up, do their work, then go back to sleep are far cheaper than ones that are always awake.
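To give a feel for the "copy/paste/tweak some k8s config YAML" step mentioned above, here is a minimal sketch of how it could be scripted. This is not our actual config or tooling: the template contents, the `render_qa_env` helper, the `web` deployment name and the image name are all hypothetical, and the real manifests had a lot more in them. The idea is simply that one base manifest plus an environment name is enough to stamp out a new QA environment.

```python
import re
from string import Template

# Hypothetical base manifest: one namespace per QA environment plus a single
# Deployment. The real configs are not shown in this post.
QA_TEMPLATE = Template("""\
apiVersion: v1
kind: Namespace
metadata:
  name: $env_name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: $env_name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: $image
""")

# Namespace names must be valid DNS labels: lowercase letters, digits and
# hyphens, starting and ending with a letter or digit, at most 63 characters.
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def render_qa_env(env_name: str, image: str, replicas: int = 1) -> str:
    """Return the manifest text for one QA environment (hypothetical helper)."""
    if len(env_name) > 63 or not DNS_LABEL.match(env_name):
        raise ValueError(f"not a valid namespace name: {env_name!r}")
    return QA_TEMPLATE.substitute(env_name=env_name, image=image, replicas=replicas)


if __name__ == "__main__":
    # Example values only; the registry and tag are made up.
    print(render_qa_env("qa-7", "example.registry/web:abc123"))
```

Under these assumptions, the output could be piped to `kubectl apply -f -` to spin the environment up, and `kubectl delete namespace qa-7` would spin it back down. Keeping each environment in its own namespace is what makes the teardown a single command.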
The project took about 4 months to complete. There was quite a lot involved:
- Creating Docker images to support npm (yarn) builds, Docker image builds and curl requests (to clear caches, etc.)
- K8s config to use Docker images for PR checkers
- Creating Jenkins pipelines to handle PR checkers, software builds and deployments
- K8s configs for QA env
- Documentation
- Training teams how to use the new tools to spin up their own environments
- K8s configs for staging (release candidate verification) environment
- K8s configs for production environment
- More documentation
- Testing and launching it all!
I didn't do all of this single-handedly, but I did do the majority of the "boots on the ground" work. It was a big task and I really wanted to avoid it, so I waited until the pain of working in our non-k8s environment became great enough to justify the work. I learned quite a bit about things I never thought I would, and we all got it done. The payoff for great efforts is usually pretty great, and this was no exception.