Gremlin Soundproofs Kubernetes

November 19, 2020

Gremlin has added new features that “soundproof” Kubernetes and help engineers prevent noisy neighbors in a cluster. Sharing resources across machines is neither new nor unique to Kubernetes. However, containers orchestrated by Kubernetes are highly dynamic and ephemeral, and a single cluster can host dozens of apps and hundreds of services. As a result, sharing resources and security permissions is an even larger concern.

According to recent Kubernetes Adoption Research, 59% of large organizations use Kubernetes in production, which mirrors the distribution of companies running chaos attacks on the Gremlin platform. In its CTO’s Guide to Containers and Kubernetes, Gartner predicts that “by 2025, more than 85% of global organizations will be running containerized applications in production, which is a significant increase from fewer than 35% in 2019.”

Beyond flexibility and scalability, Kubernetes adoption is driven by resource efficiency. Containers have a smaller resource footprint, which allows many more tenants per host and raises infrastructure utilization. That density, however, worsens the “noisy neighbor” problem, in which one scaling or misbehaving service degrades another on the same node within a cluster. Without proactive testing, it's difficult to know how a system handles a noisy neighbor in production until demand spikes on a single service. By then it's too late, and customers already feel the impact.

“Kubernetes is becoming the default way to build and operate applications at many enterprises, but along with the advantage of abstraction comes uncertainty,” said Lorne Kligerman, Sr. Director of Product at Gremlin. “We’re providing DevOps teams with better tooling to understand how their Kubernetes applications will behave under various stresses, such as when a neighboring container is spiking with traffic.”

The noisy neighbor problem also raises security concerns, because running chaos experiments in multi-tenant environments requires fine-grained controls. Ideally, individuals and teams are limited to the namespaces where they should be performing attacks. Namespace access control ensures that only team members with the correct permissions can access specific Kubernetes objects, rather than every object in the cluster. This is crucial to ensuring that an engineer’s Chaos Engineering work doesn’t negatively impact neighboring services.

Test individual pod scaling and Kubernetes resource limits to prevent “noisy neighbors” from taking down your application

Easily target specific Kubernetes objects to test how they handle spikes in usage without impacting the entire application

Securely test Kubernetes in shared cluster environments

Running targeted experiments on Kubernetes infrastructure through Gremlin’s intuitive user interface helps SRE and DevOps teams simulate real-world failures. Such failures are unpredictable and difficult to replicate, and they can cause downtime if they happen in production. Engineers can specify exactly which Kubernetes objects they’d like to test and simulate CPU spikes or server shutdowns without affecting the entire cluster. This gives them more confidence in the resiliency of their environments.

“Gremlin makes Chaos Engineering easy and seamless,” said Chaitanya Krant, Engineering Manager at National Australia Bank. “For us, it’s cut down the amount of time involved in designing and executing the chaos experiments, particularly for our Microservices and Kubernetes.”
