Google Compute Engine offers a unique technology called “Live Migration” that keeps your instances running even when a host undergoes downtime, such as during a software or hardware update. Rather than requiring your instances to be rebooted, Google Compute Engine migrates your running instance to another physical host in the same zone.
Live Migration lets Google perform the maintenance that is integral to keeping its infrastructure protected and reliable, without interrupting any of your instances. It is a very cool feature, and Google claims that:
“your instance might experience a short period of decreased performance, although generally, most instances should not notice any difference.”
Naturally, we wanted to test this claim ;-) Fortunately, you can easily simulate a maintenance event on an instance by using either the gcloud command-line tool or an API call:
gcloud beta compute instances simulate-maintenance-event [INSTANCE_NAME] \
    --zone [ZONE]

OR

POST https://www.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/simulateMaintenanceEvent
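If you would rather script the API call than use gcloud, the request can be assembled along these lines. This is only a sketch: the helper names are ours, the project, zone, and instance values are placeholders, and it assumes you already hold an OAuth 2.0 access token (for example from `gcloud auth print-access-token`). It builds the POST request but deliberately does not send it.

```python
import urllib.request

# Endpoint prefix taken from the REST call shown above.
BASE = "https://www.googleapis.com/compute/beta"


def simulate_maintenance_url(project_id: str, zone: str, instance: str) -> str:
    """Return the simulateMaintenanceEvent URL for one instance."""
    return (
        f"{BASE}/projects/{project_id}/zones/{zone}"
        f"/instances/{instance}/simulateMaintenanceEvent"
    )


def build_request(url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated, empty-body POST request."""
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Placeholder values, purely for illustration.
req = build_request(
    simulate_maintenance_url("my-project", "us-central1-a", "nginx-test"),
    access_token="ACCESS_TOKEN",
)
print(req.get_method(), req.full_url)
# To actually trigger the event: urllib.request.urlopen(req)
```

Keeping the URL construction separate from the request makes it easy to loop over several instances if you want to test more than one machine type.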
To make our test more realistic, we decided to generate some load on our test instance, then initiate a Live Migration event and see how it affects the instance.
Using Cloud Launcher, we created an n1-standard-1 machine running nginx (by Bitnami). For load generation, we decided to use K6, a developer-centric, open source load testing tool written in Go. We opted to store the test results in InfluxDB and visualize them with Grafana.
Here is our script:
To start generating the load, we ran K6 as follows:
k6 run --out influxdb=http://x.x.x.x:8086/myk6db --vus 50 --duration 10m --rps 6000 test.js
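It helps to spell out what these flags ask for. As we understand k6, `--rps` caps the total request rate across all virtual users, so the figures below are upper bounds rather than guaranteed throughput. The arithmetic also assumes each virtual user issues its requests one after another, which is an assumption about the script rather than something k6 enforces.

```python
# Values taken from the k6 command line above.
vus = 50              # --vus
duration_s = 10 * 60  # --duration 10m
rps_cap = 6000        # --rps (treated as a global ceiling)

per_vu_rate = rps_cap / vus          # requests/s each VU must sustain
max_latency_ms = 1000 / per_vu_rate  # slowest average response that still reaches the cap
max_requests = rps_cap * duration_s  # ceiling on total requests in the run

print(f"{per_vu_rate:.0f} req/s per VU")
print(f"{max_latency_ms:.2f} ms max average response time")
print(f"{max_requests:,} requests at most")
```

In other words, under these assumptions the cap is only reached if the server answers in roughly 8 ms on average; slower responses mean the actual request rate sits below 6000 per second.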
After about 60 seconds we started the maintenance event simulation.
As you can see, for about 2 seconds the response time was significantly higher and our instance did not handle any requests. However, no errors occurred and the client's requests were still served.
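One way to put a number on such a pause is to look for the longest gap between consecutive completed requests. The function below is a sketch of that idea, not the analysis behind our charts, and the timestamps are made-up sample data rather than measurements.

```python
from typing import Sequence, Tuple


def longest_stall(completed_at: Sequence[float]) -> Tuple[float, float]:
    """Return (start, duration) of the longest gap between consecutive
    request completion times, given in seconds."""
    if len(completed_at) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(completed_at)
    start, duration = ordered[0], 0.0
    # Walk adjacent pairs and remember the widest gap seen so far.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > duration:
            start, duration = prev, cur - prev
    return start, duration


# Hypothetical completion times (seconds since test start), not measured data.
sample = [59.98, 59.99, 60.00, 62.05, 62.06, 62.07]
start, duration = longest_stall(sample)
print(f"longest stall: {duration:.2f}s starting at t={start:.2f}s")
```

Fed with real per-request timestamps exported from your load tool, the same function would let you compare the pause across instance sizes without reading it off a graph.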
We have repeated the same test on a bigger instance (n1-standard-4):
Finally, we ran another test on an even larger instance serving 10k requests per second (compared to 6k in the previous tests).
As you can see from the charts, the behavior stays consistent across workloads and instances of different types. You can expect a window of about 2 seconds during the live migration in which your instance will not respond. However, no connections will be dropped, and we did not observe any errors during our tests.
Live Migration is a cool and unique feature in the public cloud, and it is the default option when you create instances in Google Compute Engine. Now you can test how your app will behave during a live migration.