Andromeda 2.1 reduces GCP’s intra-zone latency by 40%
By Jake Adriaens, Google Staff Software Engineer
November 3, 2017
Cloud customers now enjoy significantly improved intra-zone network
latency with the release of Andromeda 2.1, a software-defined network
(SDN) stack that underpins all of Google Cloud Platform (GCP). The
latest version of Andromeda reduces network latency between Compute
Engine VMs by 40% over Andromeda 2.0 and by nearly a factor of 8
since we first launched Andromeda in 2014.
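As a rough back-of-the-envelope check, the two figures above can be combined to estimate where Andromeda 2.0 sat relative to the 2014 launch. The sketch below assumes the "factor of 8" is exact (the text says "nearly") and normalizes the original latency to 1.0; the variable names are illustrative only and no absolute latencies are implied.

```python
# Normalize the 2014 launch latency to 1.0 (arbitrary units, not a measurement).
original_latency = 1.0

# Assumption: treat "nearly a factor of 8" as exactly 8.
latency_2_1 = original_latency / 8

# Andromeda 2.1 is 40% lower than 2.0, so latency_2_1 = (1 - 0.40) * latency_2_0.
latency_2_0 = latency_2_1 / (1 - 0.40)

# Implied improvement of each release over the original stack.
improvement_2_0 = original_latency / latency_2_0
improvement_2_1 = original_latency / latency_2_1

print(f"2.0 vs. original: {improvement_2_0:.1f}x lower latency")
print(f"2.1 vs. original: {improvement_2_1:.1f}x lower latency")
```

Under these assumptions, Andromeda 2.0 would have been a little under 5x better than the original stack, with 2.1 supplying the remaining gain.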
This kind of network performance is especially important as more
applications move into the cloud and are accessed via web browsers.
While the headline metric is often bandwidth, network latency is
frequently the more important determinant of application performance.
For example, low latency is essential for financial transactions,
ad-tech, video, gaming and retail, as well as workloads such as HPC
applications, memcache and in-memory databases. Likewise, HTTP-based
microservices will see significant improvement in responsiveness
with reduced latency.
Andromeda 2.1 latency improvements come from a form of hypervisor
bypass that builds on virtio, the Linux paravirtualization standard
for device drivers. Andromeda 2.1 enhancements enable the Compute
Engine guest VM and the Andromeda software switch to communicate
directly via shared memory network queues, bypassing the hypervisor
completely for performance-sensitive per-packet operations.
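To make the idea of a shared-memory network queue concrete, here is a minimal sketch of a fixed-size single-producer/single-consumer ring. It is not Andromeda's or virtio's actual queue layout; the `PacketRing` class and its methods are hypothetical names chosen for illustration, and a real implementation would live in memory mapped by both the guest and the switch and would need memory barriers around the index updates.

```python
from typing import List, Optional


class PacketRing:
    """Illustrative single-producer/single-consumer ring (hypothetical, not virtio's vring)."""

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[bytes]] = [None] * capacity
        self._capacity = capacity
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def push(self, packet: bytes) -> bool:
        """Producer side (e.g. the guest): enqueue a packet, or return False if full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = packet
        self._tail += 1
        return True

    def pop(self) -> Optional[bytes]:
        """Consumer side (e.g. the software switch): dequeue a packet, or None if empty."""
        if self._head == self._tail:
            return None
        packet = self._slots[self._head % self._capacity]
        self._head += 1
        return packet


# Usage: the producer writes directly into the ring and the consumer polls it,
# so no intermediary thread has to copy or forward each packet.
tx_ring = PacketRing(capacity=2)
tx_ring.push(b"packet-1")
tx_ring.push(b"packet-2")
first = tx_ring.pop()
```

Because each index is written by only one side, the two parties can exchange packets without a lock, which is the general property that lets a design like this keep a third party out of the per-packet path.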
In our previous approach, a hypervisor thread served as a bridge
between the guest VM and the Andromeda software switch. Packets
flowed from the VM to a hypervisor thread, to the local host’s
Andromeda software switch, then over the physical network to another
Andromeda software switch, and back up through the hypervisor to the
VM. Further, any time the thread wasn’t bridging packets, it was
descheduled, increasing tail latency for new packet processing. In
many cases, a single network round-trip required four costly
hypervisor thread wakeups!
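The effect of those wakeups can be illustrated with a toy additive model: round-trip time is a fixed base cost plus a per-wakeup penalty. The numbers below are made up purely for illustration (they are not measurements of Andromeda), and the function name is hypothetical; only the count of four wakeups per round trip comes from the description above.

```python
def round_trip_us(base_us: float, wakeups: int, wakeup_cost_us: float) -> float:
    """Toy model: total round-trip time = base cost + wakeups * cost per wakeup."""
    return base_us + wakeups * wakeup_cost_us


# Hypothetical values, chosen only to make the arithmetic easy to follow.
BASE_US = 20.0        # assumed time spent in the switches and on the wire
WAKEUP_COST_US = 5.0  # assumed cost of waking a descheduled thread

# Bridged path: four hypervisor thread wakeups per round trip.
bridged = round_trip_us(BASE_US, wakeups=4, wakeup_cost_us=WAKEUP_COST_US)

# Bypass path: assume no hypervisor wakeups on the per-packet path.
bypass = round_trip_us(BASE_US, wakeups=0, wakeup_cost_us=WAKEUP_COST_US)

print(f"bridged: {bridged} us, bypass: {bypass} us")
```

The model ignores queueing and scheduling variance, so it says nothing about tail latency; it only shows why removing a fixed number of wakeups from every round trip lowers the floor.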
Figure: Andromeda 2.1's optimized datapath using hypervisor bypass.
Andromeda 2.1 performance in action
The new Andromeda 2.1 stack delivers noteworthy reductions in
VM-to-VM network latency. The figure below shows the factor by which
latency has been reduced over time, relative to the median round-trip
time of the original stack.
Figure: Factor by which latency has improved over time.
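An improvement factor of this kind can be computed as the ratio of two median round-trip times. The sketch below shows one way to do that; the sample values are synthetic and the `improvement_factor` helper is a hypothetical name, not part of any Google tool.

```python
from statistics import median
from typing import Sequence


def improvement_factor(baseline_rtts_us: Sequence[float],
                       current_rtts_us: Sequence[float]) -> float:
    """Ratio of the baseline median RTT to the current median RTT."""
    return median(baseline_rtts_us) / median(current_rtts_us)


# Synthetic samples in microseconds, each with one slow outlier.
baseline = [198.0, 200.0, 205.0, 199.0, 240.0]
current = [49.0, 50.0, 51.0, 50.0, 90.0]

factor = improvement_factor(baseline, current)
print(f"median RTT improved by a factor of {factor:.1f}")
```

Using the median rather than the mean keeps a single slow outlier from skewing the comparison, although it also hides changes in tail latency, which would need a separate percentile comparison.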
This reduction in network round-trip times translates into real-world
performance boosts for latency-sensitive applications. Take
Aerospike, a high-performance in-memory NoSQL database. The new
Andromeda stack delivers both a reduction in request latency and
improved request throughput for Aerospike. Because SDN is a
foundational building block of Google Cloud, you should see similar
improvements in intra-zone latency, regardless of what applications
you're running.
Andromeda SDN delivers flexibility and reliability
Andromeda SDN enables more
flexibility than hardware-based stacks. With SDN, we can
quickly develop and overhaul our entire virtual network
infrastructure. We can roll out new cloud network services and
features, apply security patches and gain significant performance
improvements. Better yet, we can confidently deploy to Google Cloud
with no downtime, reboots or even VM migrations, because the
flexibility of SDN allows us to thoroughly test our code. Watch this
space to learn about the new features and enhanced network
performance made possible by our Andromeda SDN foundation.