We’ve seen an increase in container workloads running in production environments and a new wave of tooling cropping up around container deployments. Microsoft Azure has a number of partners in the container space, and today we’re featuring a new product from Sysdig: Sysdig Secure, which provides run-time container security and forensics.
Pushing container- and microservice-based applications into production will radically change the way you monitor and secure your environment. In this post we’ll review the challenges of this new infrastructure and walk through a number of examples of monitoring and securing Kubernetes on Azure Container Service (ACS) with Sysdig, including:
- How to instrument your Azure environment with Sysdig using Helm
- Best practices for leveraging Kubernetes metadata to optimize and secure your containers
- How troubleshooting and forensics have changed in containerized environments
Why unify Monitoring & Security?
“The purpose and intent of DevSecOps is to build on the mindset that 'everyone is responsible for security' with the goal of safely distributing security decisions at speed and scale to those who hold the highest level of context without sacrificing the safety required.” – DevSecOps
The rise of DevSecOps has created a new role for platform operators who are in charge of providing container-based platforms as a service to their own development teams. This includes giving teams all the performance tooling they need to make sure the services they run are both stable and secure.
These platform operators focus their workflows around two main concepts:
- Visibility – what’s the performance of my service? Is my infrastructure safe?
- Forensics – what happened to the deployment that crashed? What unexpected outbound connection was spawned, and what data was written to disk?
While the questions you ask for monitoring and security are different, the workflow is the same. Sysdig gives developers a unified experience for interacting with their data from a single instrumentation point, with low system and cognitive overhead.
Getting Started with Kubernetes on Azure Container Service (ACS) & Sysdig
If you’re new to ACS, check out this post for step-by-step instructions to get Kubernetes (or your favorite orchestrator) up and running in minutes.
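If you just want a quick feel for the commands involved, here is a minimal sketch using the Azure CLI 2.0. The resource group and cluster names are placeholders, and exact syntax can vary by CLI version, so defer to the post above for the full walkthrough.

```
# Create a resource group and an ACS cluster with Kubernetes as the orchestrator
az group create --name myResourceGroup --location eastus
az acs create --orchestrator-type kubernetes \
  --resource-group myResourceGroup \
  --name myK8sCluster \
  --generate-ssh-keys

# Pull the cluster credentials into your local kubeconfig and confirm the nodes are up
az acs kubernetes get-credentials --resource-group myResourceGroup --name myK8sCluster
kubectl get nodes
```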
We’ll be using a Helm chart to instrument our environment, which will start the Sysdig agent on each host in the Kubernetes cluster. For more info about how Sysdig collects data from your environment, check out our how it works page.
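As a rough sketch of what that instrumentation step looks like with Helm 2: the chart name and value keys below depend on the Sysdig chart version, and the access key is a placeholder, so check the Sysdig documentation for the current chart before running this.

```
# Install the Sysdig agent chart; it runs an agent on every node in the cluster
helm install --name sysdig-agent \
  --set sysdig.accessKey=<YOUR_SYSDIG_ACCESS_KEY> \
  stable/sysdig

# Verify that an agent pod is running on each node
kubectl get pods --all-namespaces -o wide | grep sysdig
```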
Visibility into Kubernetes Services
Performance Monitoring
One of the best parts of Kubernetes is how extensive its internal labeling is. We take advantage of this with groupings in Sysdig Monitor. You’re able to group and explore your containers based on their physical hierarchy (for example, host > pod > container) or their logical microservice hierarchy (for example, namespace > replicaset > pod > container).
Click on each of these images to see the difference between a physical and a logical grouping when monitoring your Docker containers with Kubernetes context.
If you’re interested in the utilization of your underlying physical resources (e.g., identifying noisy neighbors), then the physical hierarchy is great. But if you’re looking to explore the performance of your applications and microservices, then the logical hierarchy is often the best place to start.
In general, the ability to regroup your infrastructure on the fly is a more powerful way to troubleshoot your environment than the typical dashboard.
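If you want to see the raw Kubernetes metadata these groupings are built from, kubectl exposes it directly. The namespace below is just an example; substitute whichever namespace your services run in.

```
# Show pods along with the labels Kubernetes attaches to them
kubectl get pods -n default --show-labels

# Show which node each pod is scheduled on (the physical hierarchy)
kubectl get pods -n default -o wide

# Show the deployments and replicasets that own those pods (the logical hierarchy)
kubectl get deployments,replicasets -n default
```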
Securing Kubernetes Services
This same metadata can be used to protect your Kubernetes services. Using a label like kubernetes.deployment.name, we can enforce a policy to protect a logical service regardless of how many containers, hosts, or Azure regions that deployment runs across.
What we’re looking at below is a policy to protect our redis Kubernetes deployment from an exfiltration event by detecting an unexpected outbound connection from that logical service. From there, we can also take action on any policy violation to stop the container before any data has left our redis service.
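One low-tech way to sanity-check a policy like this is to trigger a harmless outbound connection from inside the redis deployment and confirm that a violation shows up. The label selector, pod name, and destination below are placeholders, and this assumes the container image ships a shell and a basic networking utility.

```
# Find a pod belonging to the redis deployment (selector depends on how the deployment is labeled)
kubectl get pods -l app=redis

# Open an outbound connection that the policy should flag
kubectl exec -it redis-1234567890-abcde -- sh -c "wget -qO- http://example.com"
```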
Forensics in Container Environments
Forensics for troubleshooting and for incident response face the same challenge: containers are ephemeral, and the data we want is often long gone. They’re also essentially black boxes, and it’s often hard to tell what’s actually running inside them.
We don’t have time to SSH into the host and run a core dump if Kubernetes is killing our containers. Our system needs to proactively capture all activity, with the ability to troubleshoot that data outside of production.
Sysdig’s unique instrumentation allows us to capture all activity – users, system calls, network connections, processes, and even content written to disk or sent over the network – both before and after a policy violation. There’s so much data here that it’s best explained in a quick one-minute video. Check out this analysis of what can happen when a user spawns a shell in a container, and all the data we can collect about their subsequent actions.
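Sysdig can trigger these captures automatically when a policy is violated, but the same trace files can also be taken and explored by hand with the open source sysdig CLI. A rough sketch of that workflow, with file and container names as placeholders:

```
# Capture 60 seconds of system activity on a host to a trace file
sudo sysdig -M 60 -w redis-incident.scap

# Replay the capture later, with container context, filtered to the redis container
sysdig -pc -r redis-incident.scap container.name contains redis

# Or browse the same capture interactively
csysdig -r redis-incident.scap
```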
Conclusion
While the end result of your analysis might differ between monitoring and security platforms, the data and the workflow are often the same. You need to be able to view your infrastructure through a Kubernetes lens and see rich activity data about everything going on in your hosts. See Sysdig’s full visibility and forensics capabilities with a single container agent per host in this webinar, or get started in less than 3 minutes with Helm.