Kubernetes
Observability
Monitoring, troubleshooting and securing
Kubernetes with Sumo Logic
Table of contents
Introduction
Chapter 1: Understanding the Kubernetes monitoring landscape
Chapter 2: Challenges of monitoring and troubleshooting in Kubernetes environments
Chapter 3: Why traditional Kubernetes M&T solutions fail
Chapter 4: What to monitor
Chapter 5: Collecting Kubernetes data
Chapter 6: Sumo Logic: A unified DevSecOps platform for Kubernetes
Conclusion: Five reasons why you should choose Sumo Logic for Kubernetes monitoring
Appendix A: Kubernetes metrics
Introduction
Why is monitoring Kubernetes hard? What is
observability in Kubernetes? How do I achieve it?
Chapter 1:
Understanding the Kubernetes
monitoring landscape
Goals of monitoring
First, what are we trying to accomplish by monitoring?
There are many answers to this question, but often, the primary
reason is to ensure reliability. Are things working as expected?
If not, what is broken and why?
Observability vs monitoring
From Distributed Systems
Observability, Cindy Sridharan
argues that
[Figure: container counts plotted against container age (minutes).]
The large volume of containers (generated by multiple customers) lasting less than 5 minutes indicates the potential for net new application architectures using containers for periods of time far less than the amount of time typically needed to activate a virtual machine.
Source: New Relic Docker Beta Program Development Analysis
E-BOOK | Kubernetes Observability
Everything is ephemeral
Everything in Kubernetes is, by design, ephemeral. Kubernetes achieves its elastic ability to scale and contract by taking control over how pods, and the containers within those pods, are deployed. A job needs to be done and Kubernetes schedules a pod. When the job is complete, the pod is destroyed just as freely. But zoom out and we notice that Kubernetes has made the nodes replaceable as well. A server dies and pods are rescheduled to available nodes. Zoom out yet again to the clusters and these too are just as easily replaced.

You have to zoom all the way out to the services to find a component with any staying power inside of Kubernetes. Services and deployments represent the core application. They still change but much less than their underlying components. Most tools weren’t designed to look at an environment from the perspective of these logical abstractions. But these logical abstractions are how Kubernetes organizes itself. Kubernetes has different hierarchies: service, namespace, deployment, or node centric views. Tools should have the flexibility to view Kubernetes through these various lenses.

Tools are distributed
Between logging tools, metrics tools, GitHub, and even SSH, engineers are constantly switching between a variety of tools to gain a complete picture of their system, i.e., observability. Walking through a typical alert investigation, we can quickly get a sense of this. An alert comes in and we immediately go check the logs to find out more about the specific problem. Running through a mental checklist of potential problems, we log into GitHub to see if any new code has been pushed. Did Kubernetes make any scheduling decisions? What are the upstream and downstream dependencies of the error I am seeing? And so on. Rarely are the answers to the puzzle nicely connected and in one place. But the more they are, the quicker we can resolve the issue.
Kubernetes has various hierarchies and Sumo Logic allows you to look at your data through these different lenses — depending on the situation.
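To make the idea of "lenses" concrete, here is a minimal sketch of regrouping the same pods by different hierarchies. The pod records and their field names are hypothetical, chosen only for illustration; they do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical pod records; the field names are assumptions for illustration.
PODS = [
    {"pod": "web-1", "namespace": "shop", "deployment": "web", "service": "frontend", "node": "node-a"},
    {"pod": "web-2", "namespace": "shop", "deployment": "web", "service": "frontend", "node": "node-b"},
    {"pod": "db-1", "namespace": "data", "deployment": "db", "service": "postgres", "node": "node-a"},
]


def group_by(pods, lens):
    """Group pod names by one hierarchy: 'namespace', 'deployment', 'service', or 'node'."""
    groups = defaultdict(list)
    for pod in pods:
        groups[pod[lens]].append(pod["pod"])
    return dict(groups)


# The same data, viewed through two different lenses.
by_node = group_by(PODS, "node")
by_namespace = group_by(PODS, "namespace")
```

The point of the sketch is that the underlying data never changes; only the grouping key does, which is only possible when every record carries all of the hierarchy labels.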
Chapter 3:
Why traditional Kubernetes M&T solutions fail
Kubernetes has several key differences that push the limits of traditional application monitoring. Due to the distributed, ephemeral nature of Kubernetes, most existing solutions fail to give the visibility we might expect, resulting in longer resolution times. Looking at these potential pitfalls can help guide us as we take a fresh look at Kubernetes management and monitoring.

Pods, nodes, even clusters can all be destroyed and rebuilt with ease. Effectively monitoring what is running in Kubernetes means monitoring at the application level, focusing on the Service and Deployment abstractions. Understanding what is happening from

Fragmented visibility
Most solutions only provide visibility into a piece of the Kubernetes environment. Admins are forced to navigate between tools for logs, metrics, events, and security threats to build a real-time picture of application health.
Furthermore, not only the tools but also the data are fragmented. It is nearly impossible to connect the dots between metrics on a node and logs from a pod on that node.

This is because the metadata tagging of the data being collected is not consistent. A metric might be tagged with the pod and cluster it was collected from, while a log might be labeled using a different naming convention. The metadata enrichment process must be streamlined and centralized to gain consistent tagging, and therefore, correlation.

Unfortunately, security visibility is often a low priority for teams running Kubernetes, and existing toolsets rarely capture any sort of security events for Kubernetes. Due to the lack of end-to-end visibility into Kubernetes environments, the risk of undetected security threats is a real issue. Kubernetes also makes it challenging to identify vulnerabilities in images at runtime, enforce security policies, and detect and remediate threats.

That said, end-users won’t care about the difficulties involved when their data is compromised. It is essential to take a more DevSecOps-style approach in Kubernetes environments that incorporates security considerations into the CI/CD lifecycle, and elevates security visibility to the same importance as operational visibility.
[Figure: collection and enrichment, then storage and data access. Fluentd collects logs and events from nodes, pods, and containers into a logging backend; Prometheus collects metrics from nodes, pods, and containers into a separate metrics backend.]
In traditional solutions, log and event collection and enrichment happen separately from metric collection and enrichment, inhibiting the ability to correlate data during troubleshooting.
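The tagging problem described above can be illustrated with a small sketch. It assumes a hypothetical alias table that maps each collector's tag names onto one shared schema; the tag names (`pod_name`, `kubernetes.pod_name`, and so on) are invented for this example and are not taken from any specific collector.

```python
# Hypothetical alias table: these key names are assumptions for illustration.
ALIASES = {
    "pod": "pod", "pod_name": "pod", "kubernetes.pod_name": "pod",
    "cluster": "cluster", "cluster_name": "cluster",
    "node": "node", "host": "node",
}


def normalize(tags):
    """Map collector-specific tag names onto one shared schema."""
    return {ALIASES[key]: value for key, value in tags.items() if key in ALIASES}


def correlate(metrics, logs, keys=("cluster", "pod")):
    """Pair each metric with the log messages that share the same normalized keys."""
    index = {}
    for log in logs:
        tags = normalize(log["tags"])
        index.setdefault(tuple(tags.get(k) for k in keys), []).append(log["message"])
    return [
        (m["name"], index.get(tuple(normalize(m["tags"]).get(k) for k in keys), []))
        for m in metrics
    ]


metrics = [{"name": "cpu_usage", "tags": {"pod": "web-1", "cluster": "prod"}}]
logs = [
    {"message": "OOM warning", "tags": {"kubernetes.pod_name": "web-1", "cluster_name": "prod"}},
    {"message": "started", "tags": {"kubernetes.pod_name": "db-1", "cluster_name": "prod"}},
]
pairs = correlate(metrics, logs)
```

Without the `normalize` step the metric and the log for `web-1` would never match, because their tag names differ even though they describe the same pod. Centralizing enrichment removes the need for such an alias table in the first place.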
Chapter 4:
What to monitor
store logs, and create events which we can collect. We can break down the components into four main parts.

[Figure: a Kubernetes cluster. The master runs the API server, scheduler, controller manager, and etcd; each node runs Kubelet, Kube Proxy, and pods that contain containers.]

Pods
Pods are the lowest level resource in the Kubernetes cluster. A pod is made up of one or more containers. Containers in a given pod share the same network namespace, storage, and resources.

Containers
Containers run inside pods. Containers run the application workloads as well as some Kubernetes components.
Kubernetes metrics

API server
As most communication happens through the API server, monitoring API server request latency can give you a quick insight into larger issues that might be impacting your cluster.
• API server request latency
• Requests per minute
• Etcd requests
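Request latency is commonly exposed as a histogram of cumulative bucket counts rather than as raw timings. As a rough sketch, assuming the API server latency arrives as (upper bound in seconds, cumulative count) pairs, a percentile can be estimated by linear interpolation inside the bucket that contains the target rank. The bucket values below are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into the open-ended bucket.
                return lower_bound
            in_bucket = count - lower_count
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Made-up example: 100 requests, half of them faster than 100 ms.
EXAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)]
p50 = histogram_quantile(0.5, EXAMPLE_BUCKETS)
p99 = histogram_quantile(0.99, EXAMPLE_BUCKETS)
```

The estimate is only as precise as the bucket boundaries, so a rising p99 is best read as a signal to investigate rather than as an exact measurement.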
etcd
etcd is a distributed key-value store that uses the Raft protocol to elect a leader, which coordinates the other members. While leader changes are normal, too many could be a sign of a problem.
• Leader changes
• Quorum
• Disk space
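Two of these checks reduce to simple arithmetic. A majority-based protocol such as Raft needs more than half of the members to agree, and "too many" leader changes can be expressed as the growth of a counter over a time window. The sketch below assumes samples of a monotonically increasing leader-change counter; the window and threshold are arbitrary illustrative values, not recommendations.

```python
def quorum_size(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


def too_many_leader_changes(samples, window_seconds, threshold):
    """Flag excessive leader elections.

    `samples` is a list of (unix_timestamp, counter_value) pairs, oldest first,
    taken from a monotonically increasing leader-change counter.
    Counter resets (for example after a restart) are not handled here.
    """
    latest_time, latest_value = samples[-1]
    in_window = [value for time, value in samples if latest_time - time <= window_seconds]
    return latest_value - in_window[0] > threshold


# Illustrative samples: five leader changes within the last hour.
noisy = [(0, 2), (600, 2), (1200, 3), (1800, 7)]
alert = too_many_leader_changes(noisy, window_seconds=3600, threshold=3)
```

Note that a four-member cluster tolerates no more failures than a three-member one, which is why odd member counts are the usual choice.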
Controller manager
Monitor the requests the controller manager makes to your cloud provider to ensure it can orchestrate successfully. Currently, these metrics are available for AWS, GCE, and OpenStack.
• Cloud provider latency
• Scheduling latency
Scheduler
Watching the request limits will ensure pods don’t fail to run due to lack of resources.
• Request limits
• Quota limits
• Anti-affinity policy
Note: affinity allows you to control where pods run based on specific hardware requirements.
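To show why resource requests matter for scheduling, here is a deliberately simplified fit check: a pod fits on a node only if its CPU and memory requests are no larger than what remains after existing requests are subtracted from the node's allocatable resources. This is a sketch of the general idea, not the scheduler's actual algorithm, and the quantity parsers handle only a few common suffixes.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


# Only the binary suffixes are handled here; Kubernetes accepts more.
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity):
    """Convert a memory quantity such as '256Mi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def fits(pod_requests, node_allocatable, already_requested):
    """Return True if the pod's requests fit in the node's remaining capacity."""
    cpu_free = parse_cpu(node_allocatable["cpu"]) - parse_cpu(already_requested["cpu"])
    mem_free = parse_memory(node_allocatable["memory"]) - parse_memory(already_requested["memory"])
    return (parse_cpu(pod_requests["cpu"]) <= cpu_free
            and parse_memory(pod_requests["memory"]) <= mem_free)


node = {"cpu": "2", "memory": "4Gi"}
used = {"cpu": "1500m", "memory": "3Gi"}
small_pod_fits = fits({"cpu": "500m", "memory": "256Mi"}, node, used)
large_pod_fits = fits({"cpu": "600m", "memory": "256Mi"}, node, used)
```

When no node passes a check like this, the pod stays pending, which is the failure mode that watching request limits is meant to catch early.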
Kubelet
Keeping a close eye on Kubelet ensures that the Control Plane can always communicate with the node that Kubelet is running on.
• Containers currently running
• Current runtime operations
• Operation latency
Nodes
Visibility into the standard host metrics of a node ensures you can monitor the health of each node in your cluster, avoiding any downtime as a result of an issue with a particular node.
• CPU
• Memory consumption
• System load
• Filesystem activity
• Network activity
Containers
At a minimum, you need access to the resource consumption of containers. Kubelet accesses the container metrics from cAdvisor, a daemon that collects, aggregates, processes, and exports information about running containers.
• Resource consumption
• CPU
• Memory
• File System
• Network usage
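Container CPU usage is typically reported as a cumulative counter of CPU seconds (for example, cAdvisor's `container_cpu_usage_seconds_total`), so the useful number is its rate of change. A minimal sketch, assuming two timestamped samples of such a counter and, for memory, a working-set value compared against a limit:

```python
def cpu_cores_used(earlier, later):
    """Average CPU cores used between two samples of a cumulative CPU-seconds counter.

    Each sample is a (unix_timestamp, cpu_seconds_total) pair.
    """
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        # The counter went backwards, e.g. after a container restart:
        # treat the later value as counted from zero.
        v0 = 0.0
    return (v1 - v0) / (t1 - t0)


def memory_utilization(working_set_bytes, limit_bytes):
    """Fraction of the memory limit in use, or None when no limit is set."""
    if not limit_bytes:
        return None
    return working_set_bytes / limit_bytes


# 30 CPU-seconds consumed over 60 wall-clock seconds.
cores = cpu_cores_used((0, 10.0), (60, 40.0))
```

Comparing the resulting core count with the container's CPU request or limit shows how close the workload is to being throttled.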
Kube-State-Metrics
Kube-State-Metrics is a Kubernetes add-on that provides insights into the state of Kubernetes. It watches the Kubernetes API and generates various metrics, so you know what is currently running. Metrics are generated for just about every Kubernetes resource including pods, deployments, daemonsets, and nodes.
• Pod status
• Container resource limits and requests
• Reason container is in waiting state
• Node status
• Deployment status
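As an illustration of the "reason container is in waiting state" signal, the sketch below scans metrics in the Prometheus text format for the `kube_pod_container_status_waiting_reason` series and lists containers stuck in a given state. The sample text is invented for this example, and the parser is intentionally minimal: it ignores optional timestamps and is not a full implementation of the exposition format.

```python
import re

# Matches one sample line of the form: name{labels} value
_SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def waiting_containers(text, reason="CrashLoopBackOff"):
    """Return (namespace, pod, container) for containers waiting with `reason`."""
    found = []
    for line in text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match or match["name"] != "kube_pod_container_status_waiting_reason":
            continue
        labels = dict(_LABEL.findall(match["labels"]))
        if labels.get("reason") == reason and float(match["value"]) == 1:
            found.append((labels["namespace"], labels["pod"], labels["container"]))
    return found


# Invented sample data for illustration only.
SAMPLE_TEXT = '''
# TYPE kube_pod_container_status_waiting_reason gauge
kube_pod_container_status_waiting_reason{namespace="shop",pod="web-1",container="app",reason="CrashLoopBackOff"} 1
kube_pod_container_status_waiting_reason{namespace="shop",pod="web-2",container="app",reason="ImagePullBackOff"} 1
kube_pod_container_status_waiting_reason{namespace="data",pod="db-1",container="pg",reason="CrashLoopBackOff"} 0
'''

crashing = waiting_containers(SAMPLE_TEXT)
```

In practice a Prometheus query would usually do this filtering, but the sketch shows that the namespace, pod, and container labels are what make the result actionable.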
Kubernetes logs
The containers running in Kubernetes emit logs, which are then stored on the node where they run.

Container workloads
Logs from these workloads provide information about the decisions the code is making and the actions it is taking.

Kubernetes components
Logs from these components give insights into the decisions made by Kubernetes.
Kubernetes events
Prometheus pulls data from all of the components and jobs running in Kubernetes.
Kubernetes does not define a single standard approach to log collection, but the most common method is called cluster level logging. Cluster level logging deploys a node level logging agent to each node, which then funnels data to a separate backend for storage and analysis of logs. The primary benefit of this solution is that if a pod dies, the logs detailing what happened are retained. Implementing node level logging without funneling data to a logging backend will not retain log data if pods are evicted or die. Cluster level logging ensures that data is captured and retained. A common tool for implementing cluster level logging is Fluentd (or Fluentbit, a lightweight version of Fluentd), which acts as the node level logging agent funneling data to a logging backend, like Sumo Logic.

Events provide insight into decisions being made by the cluster and unexpected events that occur in Kubernetes. Events are stored in the API server on the master node, and collected using the same method as log collection: via a node level logging agent like Fluentd.

Setup using Helm
Finally, collectors for logs, metrics, events, and security can be easily deployed using Helm, an open source Kubernetes package manager. Helm can significantly simplify the setup process, reducing hundreds of lines of configuration to one. These collection plugins can be used on any Kubernetes cluster, whether one from a managed service like Amazon Elastic Kubernetes Service (EKS) or a cluster you are running entirely on your own.
Cluster level logging implementation with Fluentbit deployed to all nodes for node level logging and Fluentd acting as a centralized metadata enrichment pipeline —grabbing
enrichment data from the API server.
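Part of the metadata a node level agent attaches can be derived from the log file itself. The sketch below assumes container log files on the node are named `<pod>_<namespace>_<container>-<container id>.log`, with a 64-character hexadecimal container id; treat that naming scheme, the path, and the record fields as assumptions for illustration. Richer metadata such as labels, deployment, or service would still have to come from the API server, as in the pipeline pictured above.

```python
import re

# Assumed file-name convention for container logs on a node:
#   <pod>_<namespace>_<container>-<64-hex-char container id>.log
_LOG_NAME = re.compile(
    r'(?P<pod>[^_/]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$'
)


def enrich(path, line, node):
    """Attach pod metadata derived from the log file name to one log line."""
    match = _LOG_NAME.search(path)
    if match is None:
        raise ValueError(f"unrecognized container log path: {path}")
    record = match.groupdict()
    record.update(node=node, message=line.rstrip("\n"))
    return record


# Hypothetical path and log line.
example_path = "/var/log/containers/web-1_shop_app-" + "a" * 64 + ".log"
record = enrich(example_path, "GET /cart 500\n", node="node-a")
```

Because the record carries pod, namespace, container, and node together, it can still be found in the backend after the pod that produced it has been destroyed.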
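Chapter 6:
Sumo Logic: A unified DevSecOps platform for Kubernetes
[Figure: collection with Fluentbit (logs), Fluentd (events), Prometheus (metrics), and Falco (security); enrichment with service, deployment, namespace, node, pod, and container metadata.]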
Sumo Logic collection and enrichment for logs, metrics, events, and security data.
Figure 1. Namespace overview gives quick visibility into pods experiencing issues or in this case, in a CrashLoopBackOff state.
Figure 2. Rich metadata enables Sumo Logic to automatically build out the explorer hierarchy of the components present in your cluster,
and keep the explorer up to date as pods are added and removed.
Figure 3. Security visibility is available at the cluster level alongside log, metric, and event data.
Cloud infrastructure
Security monitoring
Network & OS: operating system, firewall, network devices
Application services: AWS CloudFront, Akamai, Fastly
Application code: Java, Scala, .NET, Rails, Serverless/Lambda
Sumo Logic combines metrics, logs, events, and security to create a real-time view of the performance, uptime, and security of a Kubernetes platform.

Sumo Logic allows admins to monitor and troubleshoot their environments using the mental model of Kubernetes and that of their custom application, rather than being forced through the lens of a server-based approach. View a Kubernetes environment through its different hierarchies: node, deployment, service, and namespace.

Sumo Logic’s solution leverages the de facto standards endorsed by the Cloud Native Computing Foundation (CNCF). Sumo Logic’s solution utilizes the extensive ecosystem of integrations already created and maintained for monitoring Kubernetes.
Common metrics
The Kubernetes Control Plane is the engine that powers Kubernetes. It consists of multiple parts working together to orchestrate your containerized applications. Each piece serves a specific function and exposes its own set of metrics to monitor the health of that component. To effectively monitor the Control Plane, visibility into each component’s health and state is critical.

The API Server provides the front-end for the Kubernetes cluster and is the central point with which all components interact. The following table presents the top metrics you need to have clear visibility into the state of the API Server.
Metric Description
Etcd
Metric Description
Scheduler
Metric Description
Controller manager
Metric Description
Kube-State-metrics
Metric Description
The Nodes of a Kubernetes cluster are made up of multiple parts, and as such you have numerous pieces to monitor.

Keeping a close eye on Kubelet ensures that the Control Plane can always communicate with the node that Kubelet is running on. In addition to the common Go runtime metrics, Kubelet exposes some internals about its actions that are good to track.
Metric Description
Visibility into the standard host metrics of a node ensures you can monitor the health of each node in your cluster, avoiding any downtime as a result of an issue with a particular node. You need visibility into all aspects of the node, including CPU and memory consumption, system load, filesystem activity, and network activity.

Monitoring all of the Kubernetes metrics is just one piece of the puzzle. It is imperative that you also have visibility into the containerized applications that Kubernetes is orchestrating. At a minimum, you need access to the resource consumption of those containers. Kubelet accesses the container metrics from cAdvisor, a tool that analyzes the resource usage of containers and makes it available. These include the standard resource metrics like CPU, memory, file system, and network usage.
See business
differently
www.sumologic.com
© Copyright 2019 Sumo Logic, Inc. All rights reserved. Sumo Logic, Elastic Log Processing, LogReduce, Push Analytics and Big Data
for Real-Time IT are trademarks of Sumo Logic, Inc. All other company and product names mentioned herein may be trademarks of
their respective owners. Updated 09/19