Day 58 – Metrics Server and Horizontal Pod Autoscaler (HPA)
On Day 58 of my 90 Days of DevOps journey, I learned how Kubernetes automatically scales applications based on resource usage with the Metrics Server and the Horizontal Pod Autoscaler (HPA).
Yesterday I worked with resource requests and limits.
Today I learned how Kubernetes monitors actual CPU and memory usage in near real time and automatically increases or decreases the number of Pods as the load changes.
This felt very close to real production Kubernetes environments.
Metrics Server
Today I installed the Kubernetes Metrics Server.
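Metrics Server collects resource usage data from the nodes and Pods in the cluster, such as:
CPU usage
Memory usage
It scrapes these values from the kubelet on each node and serves them through the Metrics API (metrics.k8s.io), which is what kubectl top and the HPA read.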
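A minimal sketch of a common way to install and verify Metrics Server, assuming cluster-admin access and that the cluster can reach GitHub. The manifest URL is the release asset published by the kubernetes-sigs/metrics-server project; the minikube line is only relevant if the cluster happens to be minikube, which the post does not say.

```shell
# Install Metrics Server from the kubernetes-sigs release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Alternative for minikube clusters only (assumption: cluster type not stated)
# minikube addons enable metrics-server

# Wait until the metrics-server Deployment in kube-system is ready
kubectl -n kube-system rollout status deployment/metrics-server

# Confirm the Metrics API is registered (AVAILABLE should become True)
kubectl get apiservice v1beta1.metrics.k8s.io

# Query the raw Metrics API that kubectl top and the HPA consume
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```

On local clusters with self-signed kubelet certificates (kind, for example), Metrics Server may only become ready after adding its --kubelet-insecure-tls argument, which the project documents as a testing-only option.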
After enabling Metrics Server, I tested:
kubectl top nodes
and:
kubectl top pods -A
For the first time, I could see real-time CPU and memory usage inside my Kubernetes cluster.
I also learned:
kubectl top shows actual resource usage
while:
kubectl describe pod
shows configured requests and limits.
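That difference was important: requests and limits describe what a container is guaranteed or allowed to use, while kubectl top shows what it is actually consuming.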
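One way to see both views side by side is to point the two commands at the same Pod. This sketch uses the Metrics Server Pod itself, assuming it carries the k8s-app=metrics-server label as it does in the upstream manifest:

```shell
# Pick the metrics-server Pod (label taken from the upstream components.yaml)
POD=$(kubectl -n kube-system get pods -l k8s-app=metrics-server \
  -o jsonpath='{.items[0].metadata.name}')

# Actual usage, as sampled by Metrics Server
kubectl -n kube-system top pod "$POD"

# Configured requests and limits from the Pod spec
kubectl -n kube-system describe pod "$POD" | grep -A 3 -E 'Requests|Limits'

# The same configuration in machine-readable form
kubectl -n kube-system get pod "$POD" \
  -o jsonpath='{.spec.containers[*].resources}'
```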
Exploring kubectl top
Next, I explored different kubectl top commands.
I checked:
Node CPU usage
Node memory usage
Pod resource usage
Pods consuming the most CPU
using:
kubectl top pods -A --sort-by=cpu
This helped me understand how Kubernetes monitors workloads continuously.
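A few more kubectl top variations along the same lines, as a sketch; the kube-system namespace in the last command is just an example:

```shell
# Nodes ordered by CPU, then by memory
kubectl top nodes --sort-by=cpu
kubectl top nodes --sort-by=memory

# The five Pods using the most CPU across all namespaces (first line is the header)
kubectl top pods -A --sort-by=cpu | head -n 6

# Pods ordered by memory usage
kubectl top pods -A --sort-by=memory

# Per-container usage for the Pods in one namespace
kubectl top pods -n kube-system --containers
```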
Deployment with CPU Requests
To scale on CPU utilization, HPA needs CPU requests on the target Pods.
I created a Deployment using the:
registry.k8s.io/hpa-example
image.
Then I added:
resources.requests.cpu: 200m
HPA measures CPU utilization as a percentage of each Pod's CPU request, so without a CPU request it cannot calculate utilization at all.
Forgetting the request is one of the most common mistakes when configuring HPA.
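A Deployment like the one described can be sketched as follows. It is modeled on the php-apache example from the upstream Kubernetes HPA walkthrough; the name php-apache, the run=php-apache label, the 500m CPU limit and the Service are assumptions, not details from the post (the Service gives a load generator a stable DNS name to call).

```shell
# Create the Deployment and a Service in front of it (names are assumptions)
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m   # HPA computes utilization relative to this value
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    run: php-apache
  ports:
  - port: 80
EOF

# Check that the CPU request is present in the Pod template (should print 200m)
kubectl get deployment php-apache \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}'
```

With a 200m request, a Pod using 100m of CPU counts as 50% utilization, which matches the 50% target configured for the HPA in this post.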
Horizontal Pod Autoscaler (HPA)
Next, I created an HPA using:
kubectl autoscale
The HPA was configured with:
minimum replicas: 1
maximum replicas: 10
target CPU utilization: 50%
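Those settings map onto a single kubectl autoscale call. A sketch, assuming the Deployment is named php-apache (a hypothetical name borrowed from the upstream walkthrough). Depending on the kubectl version, --cpu-percent may be reported as deprecated; kubectl autoscale --help lists the flags the installed version supports.

```shell
# Imperative HPA: keep average CPU at 50% of requests, between 1 and 10 replicas
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# Inspect the HPA; TARGETS shows current/target utilization
kubectl get hpa php-apache
```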
Initially, the TARGETS column showed:
<unknown>/50%
because Metrics Server needed some time to collect metrics for the new Pods.
After a short wait, Kubernetes started showing actual CPU utilization values.
This was really interesting to watch.
Generating Load and Auto Scaling
The most exciting part today was testing auto scaling hands-on.
I created a BusyBox load generator that continuously sent requests to the application.
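A load generator of this kind can be sketched with the command used in the upstream HPA walkthrough. It assumes a Service named php-apache (hypothetical here) and the busybox:1.28 image tag:

```shell
# Start an interactive BusyBox Pod that requests the Service in a tight loop.
# --rm deletes the Pod when the command exits (for example after Ctrl+C).
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

# In a second terminal, watch new Pods appear as the HPA scales out
kubectl get pods -l run=php-apache --watch
```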
As the average CPU utilization rose above the 50% target, Kubernetes automatically increased the number of replicas.
I watched the scaling in real time using:
kubectl get hpa --watch
The Deployment scaled from:
1 replica → multiple replicas
automatically.
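How many replicas the HPA asks for follows the formula documented for the HPA algorithm: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The hypothetical shell function below is a simplified sketch of that arithmetic for CPU utilization. It applies the default 10% tolerance and clamps to the min/max replicas, but ignores unready Pods, missing metrics, stabilization windows and behavior policies, so real HPA decisions can differ.

```shell
#!/bin/sh
# Hypothetical helper: a simplified version of the HPA replica calculation
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# It ignores unready Pods, missing metrics, stabilization windows and behavior policies.
desired_replicas() {
  current_replicas=$1  # Pods currently running
  current_util=$2      # average CPU utilization, in percent of the CPU request
  target_util=$3       # HPA target utilization, e.g. 50
  min_replicas=$4      # HPA minReplicas
  max_replicas=$5      # HPA maxReplicas

  # Default tolerance: skip scaling when |current/target - 1| <= 0.1,
  # rewritten in integers as 10 * |current - target| <= target
  diff=$((current_util - target_util))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))
  fi

  if [ $((diff * 10)) -le "$target_util" ]; then
    desired=$current_replicas
  else
    # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for a >= 0, b > 0
    desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
  fi

  # Clamp the result to the configured replica range
  if [ "$desired" -lt "$min_replicas" ]; then
    desired=$min_replicas
  fi
  if [ "$desired" -gt "$max_replicas" ]; then
    desired=$max_replicas
  fi
  echo "$desired"
}

# One Pod at 250% of its CPU request with a 50% target
desired_replicas 1 250 50 1 10   # prints 5
```

The same formula also explains scale-down: four Pods averaging 10% against a 50% target give ceil(4 × 10 / 50) = 1 replica.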
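After deleting the load generator Pod, Kubernetes gradually scaled the Deployment back down. Scale-down is deliberately slower: by default, HPA waits through a 5-minute stabilization window before removing replicas.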
That felt like real cloud infrastructure behavior.
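To see when and why the HPA changed the replica count, a sketch like the following can help. It assumes the names php-apache and load-generator (hypothetical here, borrowed from the upstream HPA walkthrough). The Events section of kubectl describe hpa records each rescale together with its reason.

```shell
# Stop the load if the generator Pod is still running
kubectl delete pod load-generator --ignore-not-found

# Show HPA conditions and the scaling events with their reasons
kubectl describe hpa php-apache

# Follow the replica count while the HPA scales down
kubectl get deployment php-apache --watch
```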
Declarative HPA using YAML
Finally, I created the HPA using YAML with:
autoscaling/v2
I also learned about the behavior section.
The behavior section controls, separately for each direction:
how fast Kubernetes scales up
how fast (or how cautiously) Kubernetes scales down
using scaling policies and stabilization windows.
This gives more control over scaling behavior in production environments.
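A declarative autoscaling/v2 HPA with a behavior section might look like this sketch. The names and all behavior values are illustrative assumptions, not the exact manifest from the post: scale-up reacts immediately but at most doubles the replicas every 15 seconds, while scale-down keeps the 5-minute window and removes at most one Pod per minute.

```shell
# Declarative HPA (autoscaling/v2) with a behavior section; values are assumptions
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to load spikes immediately
      policies:
      - type: Percent
        value: 100                      # add at most 100% of current replicas...
        periodSeconds: 15               # ...every 15 seconds
    scaleDown:
      stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
      policies:
      - type: Pods
        value: 1                        # remove at most one Pod...
        periodSeconds: 60               # ...per minute
EOF
```

If an HPA with the same name was created earlier with kubectl autoscale, kubectl apply updates that object, possibly with a warning about the missing last-applied-configuration annotation.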
I also learned the difference between:
autoscaling/v1
autoscaling/v2
v1 supports only CPU utilization (a single targetCPUUtilizationPercentage field).
v2 supports:
CPU
memory
custom metrics
advanced scaling behavior
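The API difference is easiest to see side by side. In this sketch the names are assumptions, and --dry-run=server asks the API server to validate the objects without saving them. The v1 object can only express a CPU percentage, while the v2 object lists several metrics. The memory target uses AverageValue rather than Utilization because a memory Utilization target would need memory requests on the Pods, and the post only sets a CPU request; 100Mi is an arbitrary illustrative value.

```shell
# autoscaling/v1: a single CPU utilization field, nothing else
kubectl apply --dry-run=server -f - <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
EOF

# autoscaling/v2: a list of metrics; HPA uses the largest replica count any metric asks for
kubectl apply --dry-run=server -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
EOF
```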
What I Learned Today
Today I learned:
Metrics Server
kubectl top
Horizontal Pod Autoscaler
CPU-based auto scaling
Load testing
autoscaling/v1 vs v2
Scaling behavior configuration
Real-time Kubernetes monitoring
Final Thoughts
Today was one of the most interesting Kubernetes learning days so far.
Watching Kubernetes automatically increase and decrease Pods based on traffic felt very powerful.
This is how many modern applications handle changing traffic in production environments.
Learning HPA is helping me understand how scalable cloud-native applications work.
