Misconfigured Kubeflow workloads are a security risk
By Yossi Weizman, Microsoft Security Research Software Engineer, Azure Security Center, ILDC
June 11, 2020
Azure Security Center (ASC) monitors and defends thousands of Kubernetes clusters running on top of AKS. Azure Security Center regularly searches for and research for new attack vectors against Kubernetes workloads. We recently published a blog post about a large scale campaign against Kubernetes clusters that abused exposed Kubernetes dashboards for deploying cryptocurrency miners.
In this blog, we’ll reveal a new campaign that was observed recently by ASC that targets Kubeflow, a machine learning toolkit for Kubernetes. We observed that this attack effected on tens of Kubernetes clusters.
Kubeflow is an open-source project, started as a project for running TensorFlow jobs on Kubernetes. Kubeflow has grown and become a popular framework for running machine learning tasks in Kubernetes. Nodes that are used for ML tasks are often relatively powerful, and in some cases include GPUs. This fact makes Kubernetes clusters that are used for ML tasks a perfect target for crypto mining campaigns, which was the aim of this attack.
During April, we observed deployment of a suspect image from a public repository on many different clusters. The image is ddsfdfsaadfs/dfsdf:99. By inspecting the image’s layers, we can see that this image runs an XMRIG miner:
This repository contains several more images, which differ in the mining configuration. We saw some deployments of those images too.
Looking at the various clusters that the above image ran on showed that most of them run Kubeflow. This fact implies that the access vector in this attacker is the machine-learning framework.
The question is how can Kubeflow be used as an access vector for such an attack?
Kubeflow framework consists of many different services. Some of those services include: frameworks for training models, Katib and Jupyter notebook server, and more.
Kubeflow is a containerized service: the various tasks run as containers in the cluster. Therefore, if attackers somehow get access to Kubeflow, they have multiple ways to run their malicious image in the cluster.
The framework is divided into different namespaces, which are a collection of Kubeflow services. Those namespaces are translated into Kubernetes namespaces in which the resources are deployed.
In first access to Kubeflow, the user is prompted to create a namespace:
In the picture above, we created a new namespace with the default name anonymous. This namespace is broadly seen in the attack and was one of the indicators to the access vector in this campaign.
Kubeflow creates multiple CRDs in the cluster which expose some functionality over the API server:
In addition, Kubeflow exposes its UI functionality via a dashboard that is deployed in the cluster:
The dashboard is exposed by Istio ingress gateway, which is by default accessible only internally. Therefore, users should use port-forward to access the dashboard (which tunnels the traffic via the Kubernetes API server).
In some cases, users modify the setting of the Istio Service to Load-Balancer which exposes the Service (istio-ingressgateway in the namespace istio-system) to the Internet. We believe that some users chose to do it for convenience: without this action, accessing to the dashboard requires tunneling through the Kubernetes API server and isn’t direct. By exposing the Service to the Internet, users can access to the dashboard directly. However, this operation enables insecure access to the Kubeflow dashboard, which allows anyone to perform operations in Kubeflow, including deploying new containers in the cluster.
If attackers have access to the dashboard, they have multiple methods to deploy a backdoor container in the cluster. We will demonstrate two options:
This image doesn’t necessarily have to be a legitimate notebook image, thus attackers can run their own image using this feature.
The Kubernetes threat matrix that we recently published contains techniques that can be used by attackers to attack the Kubernetes cluster. A representation of this campaign in the matrix would look like:
The attacker used an exposed dashboard (Kubeflow dashboard in this case) for gaining initial access to the cluster. The execution and persistence in the cluster were performed by a container that was deployed in the cluster. The attacker managed to move laterally and deploy the container using the mounted service account. Finally, the attacker impacted the cluster by running a cryptocurrency miner.
How to check if your cluster is impacted?
kubectl get pods –
kubectl get service istio-ingressgateway -n istio-system
Azure Security Center has detected multiple campaigns against Kubernetes clusters in the past that have a similar access vector: an exposed service to the internet. However, this is the first time that we have identified an attack that targets Kubeflow environments specifically.
To learn more about AKS Support in Azure Security Center, please see this documentation.
Start a trial of Azure Security Center Standard to get advanced threat protection capabilities.