Kubernetes runs containerized applications on a cluster of machines and keeps them in the state you describe. It does this by placing work on the right machines, routing traffic to the right places, and watching for failures and changes.
The basic flow
1. You describe what you want to run
Most Kubernetes workloads start as a declared “desired state” (what should be running, how many copies, and how they should be exposed). Kubernetes is built around declarative configuration and automation.
2. Kubernetes decides where it should run
Kubernetes schedules containers onto machines in the cluster based on available compute resources and what each container needs. Containers run inside Pods; a Pod is the unit Kubernetes places on a machine.
3. Kubernetes keeps checking reality vs. your desired state
Controllers watch the cluster and work to move the current state closer to the desired state, using the API server to make changes.
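The steps above can be made concrete with a manifest. A minimal sketch of a Deployment expressing desired state (the name `web`, the label, and the image `nginx:1.27` are illustrative, not from the original text):

```yaml
# Declarative desired state: "run 3 copies of this container image".
# Name, labels, and image are illustrative examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3            # how many Pods should be running
  selector:
    matchLabels:
      app: web
  template:              # the Pod template each replica is created from
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
```

Applying this manifest states the target; the Deployment controller then creates, replaces, or removes Pods until reality matches it.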
Container scheduling and day-to-day management
Scheduling is the “where should this run?” decision.
1. Pods are scheduled, not individual containers
Kubernetes groups containers into Pods and then places those Pods on machines.
2. The scheduler assigns Pods to a suitable node
The kube-scheduler looks for Pods that aren’t assigned yet and selects a node for them.
3. Node agents keep the Pods running
On each node, the kubelet ensures that the Pods assigned to that node, and the containers inside them, are actually running.
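The scheduler's "where should this run?" decision is driven by resource requests declared in the Pod spec. A sketch, with illustrative values:

```yaml
# The scheduler will only place this Pod on a node that has at least
# the requested CPU and memory available (all values are illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.27
    resources:
      requests:          # what the scheduler uses for placement
        cpu: "250m"      # a quarter of a CPU core
        memory: "128Mi"
      limits:            # hard caps enforced at runtime
        cpu: "500m"
        memory: "256Mi"
```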
Load balancing and service discovery
Containers and Pods can be created, moved, or replaced, so applications need stable ways to find each other.
Service discovery and load balancing are built-in behaviors
Kubernetes provides service discovery and load balancing out of the box, so traffic keeps reaching the right back ends even as individual Pods come and go.
Services provide a stable address for a changing set of Pods
The Service API provides a stable IP address or host name for a service backed by one or more Pods, and Kubernetes tracks the backing Pods through EndpointSlice objects.
Traffic routing updates as Pods change
When Pods behind a service change, the service routing adapts so traffic continues to reach current back ends.
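A Service ties this together with a label selector rather than a list of specific Pods. A sketch (names and ports are illustrative):

```yaml
# The Service gets a stable virtual IP and DNS name. Kubernetes keeps
# the backing EndpointSlices in sync with whichever Pods currently
# match the selector, so routing adapts as Pods change.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # matches Pod labels, not specific Pods
  ports:
  - port: 80          # port the Service exposes
    targetPort: 8080  # port the container listens on
```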
Scaling applications (and why “desired state” matters)
Kubernetes scales workloads toward the state you declare, and it can also scale automatically based on compute utilization (for example, CPU usage).
Common scaling ideas include:
More replicas (more Pods) to handle higher demand.
Fewer replicas when demand drops.
Resource tracking so placement decisions reflect CPU and memory needs.
This ties back to the “desired state” model: you specify the target, and controllers keep working toward it.
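Automatic scaling based on compute utilization is itself expressed as desired state. A sketch of a HorizontalPodAutoscaler targeting a hypothetical `web` Deployment (all numbers are illustrative):

```yaml
# Adjust the Deployment's replica count, within bounds, to keep
# average CPU utilization near the target (values are illustrative).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale up above ~70% average CPU
```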
Self-healing: What happens when something breaks
Kubernetes includes self-healing behaviors that aim to maintain workload health and availability. These include:
Restarting containers that fail.
Replacing failed Pods to maintain the requested number of replicas.
Rescheduling workloads when nodes become unavailable.
Removing unhealthy Pods from Service endpoints so traffic reaches only healthy back ends.
These behaviors rely on health checks: Kubernetes probes container health and restarts or replaces containers when problems are detected.
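The health checks behind these behaviors are declared as probes on the container. A sketch, with illustrative paths and ports:

```yaml
# livenessProbe: if the check keeps failing, the kubelet restarts
#   the container.
# readinessProbe: while the check fails, the Pod is removed from
#   Service endpoints so it receives no traffic.
# Paths, ports, and timings are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.27
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3   # restart after 3 consecutive failures
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```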
The role of Kubernetes KPIs
Key performance indicators (KPIs, or metrics) are used to understand cluster health and workload behavior.
Where KPIs come from
Kubernetes system components emit metrics in the Prometheus exposition format, which makes them easy to use in dashboards and alerts.
Metrics are typically available on a component’s /metrics HTTP endpoint, including components such as kube-apiserver, kube-scheduler, kubelet, kube-proxy, and kube-controller-manager.
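One common way to collect these /metrics endpoints is a Prometheus scrape configuration. A minimal sketch for scraping the kube-apiserver, assuming Prometheus runs in-cluster with a service account (job name and relabeling details are illustrative):

```yaml
# Prometheus scrape-config sketch for the apiserver's /metrics
# endpoint; assumes in-cluster service-account credentials.
scrape_configs:
- job_name: kubernetes-apiservers        # illustrative job name
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints                      # discover endpoints via the API
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: kubernetes                    # keep only the apiserver service
    action: keep
```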
Examples of what KPIs help you spot
Cluster health signals (component-level metrics and error patterns)
Workload stability (for example, frequent restarts or replacements)
Capacity pressure (resource allocation vs. demand, tied to scaling decisions)
Why this matters in day-to-day operations
Monitoring gives teams a fuller view of cluster resources, the Kubernetes API, containers, and logs, which shortens the feedback loop between detecting an issue and fixing it.