Written by Steven Cammidge, Systems Reliability Engineer at LINX.
Kernel-based Virtual Machine (KVM) technology isn’t anything new in our industry.
It has been part of the mainline Linux kernel since February 2007 and is widely used for a whole host of applications. For Internet Exchange Points (IXPs) like LINX, though, it has arguably not been used as well as it could be – until now!
Kernel-based Virtual Machine (KVM) – an open-source virtualization technology integrated directly into the Linux kernel, allowing the kernel to act as a hypervisor
Hypervisor – a program used to run and manage one or more virtual machines on a computer
At LINX we have been using KVM, alongside similar tools like VMware, to run our route servers, route collectors and the back-end tooling that keeps our stats online.
Before
Previously, we would build the servers that run these services manually, which took considerable time and involved multiple teams, with each task locked behind completion of the one before it. This serialized the build and inherently added handover time between the different people working on distinct aspects of the project.
Resilience was also a concern: if the single piece of hardware running each service at one of our LINX enabled data centre locations were to fail, we would need backup hardware on standby – or, even worse, we would be delayed waiting on new hardware delivery to get things back up and running.
These were two areas we wished to improve upon.
Removing manual steps, reducing human error, reducing time to recover
Things at LINX are moving at a fast pace right now.
There are new geographies to deploy into, and our existing IXP fabrics are expanding into new data centres ever more frequently. So, we designed a three-pronged solution to address the areas we wanted to improve.
Physical: a model using 2-3 pieces of hardware at each LINX site (depending on requirements) to provide hardware resiliency. Each piece of hardware has a connection to every switch, allowing any workload to run from any piece of hardware.
Virtual: virtualization gives us flexibility in building and restoring servers and services in the event of a failure, including rapid relocation if needed – something that is logistically far more troublesome with physical machines.
IaC/automation: a deployment role in Ansible, with templating capable of configuring the VMs and virtual networks, deploying OS images, then bootstrapping and configuring those images ready for immediate software deployment. This eliminates the manual steps required when deploying new services, reduces the time until they can be brought live and ensures a high degree of standardisation.
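To make the idea concrete, a deployment role of this kind might be driven by a playbook along the following lines. This is a hedged sketch under assumptions, not LINX’s actual automation: the host group, variable names and module choices (a Jinja2 domain template with the community.libvirt collection) are purely illustrative.

```yaml
# Hypothetical playbook sketch – names and paths are illustrative,
# not LINX's actual deployment role.
- name: Deploy a route server VM on a KVM host
  hosts: kvm_hosts
  become: true
  vars:
    vm_name: rs1-example
    vm_memory_mb: 8192
    vm_vcpus: 4
  tasks:
    - name: Render the libvirt domain XML from a template
      ansible.builtin.template:
        src: vm-domain.xml.j2
        dest: "/etc/libvirt/qemu/{{ vm_name }}.xml"

    - name: Define the VM from the rendered XML
      community.libvirt.virt:
        command: define
        xml: "{{ lookup('file', '/etc/libvirt/qemu/' + vm_name + '.xml') }}"

    - name: Start the VM
      community.libvirt.virt:
        name: "{{ vm_name }}"
        state: running
```

Because everything – CPU, memory, disks and the virtual network attachments – lives in the template and its variables, the same role can be replayed unchanged against any KVM host.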
The advantage of this process is that we can define the entire landscape of LINX services, and the KVM-based virtual machines they will run on, as IaC prior to the build – including network configuration and mapping. This not only removes the inherent serialization and handover of tasks during the build, which are instead handled by automation, but also allows us to deploy several servers simultaneously in a fraction of the time.
Infrastructure as Code (IaC) – the practice of managing and provisioning data centre resources through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools
Because it’s IaC, from my side as part of the IS team we can now prepare new deployments in advance and pre-stage almost every aspect of the build, so that all that needs to be done when we are ready is the physical install and the KVM host OS install.
This means that, in theory, we can get new LINX enabled sites up in a matter of hours.
Also, in the event of a LINX route server or collector failure, we simply redeploy the affected server to alternative hardware, either in that location or at another location on the peering LAN. There is no waiting for backup solutions or hardware replacements to restore service, leading to a shorter time to recovery for the affected service.
Seeing Results
We made a new Internet Exchange Point live in Riyadh earlier this year as part of our partnership with Center3. This was the first site we deployed using our new process and utilising the KVM and hypervisor capabilities.
This new process means faster recovery from failures not only on our route server software but also for our member portal and live stats.
We also recently launched a new DDoS mitigation service for our members, released using the same KVM automation. It uses a virtual machine that acts as a blackhole endpoint for Remotely Triggered Black Hole (RTBH) services for LINX member networks peering with the route servers.
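Under the hood, a blackhole endpoint of this kind simply discards whatever traffic is steered towards it. As an illustration only – the prefix below is documentation address space and the task is not LINX’s actual configuration – a Linux VM can be made to drop traffic for a blackholed prefix with a kernel blackhole route, expressed here as a hypothetical Ansible task in the same style as the deployment automation:

```yaml
# Illustrative sketch only: silently discard all traffic to
# 192.0.2.0/24 (documentation space) by installing a kernel
# blackhole route on the endpoint VM.
- name: Install blackhole route for a blackholed prefix
  ansible.builtin.command:
    cmd: ip route add blackhole 192.0.2.0/24
  become: true
```

In a real RTBH deployment the set of blackholed prefixes would be driven by announcements carrying the blackhole community, not hard-coded as above.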
We have used similar tools like VMware in the past; however, as ambassadors for open-source technology, we are finding the KVM solution a good fit for us right now! It gives us great flexibility across our Linux servers: a lightweight, automated VM solution with no vendor or Linux flavour lock-in.