Day 58 – Metrics Server and Horizontal Pod Autoscaler (HPA)

I am an MCA (Cloud Computing) student at MIT ADT University, focused on becoming a DevOps & Cloud Engineer who builds reliable, scalable, and automated systems. 🏆 5× Hackathon Winner — Hacktoberfest Hackathon, Innovyuh Hackathon – Solution Challenge, Build with AI (GDG Cloud Pune), Project Innovista (Research & Project Competition), NASA International Space Apps Challenge. I work at the intersection of development and operations, where I design, automate, and manage cloud infrastructure: • Cloud & Linux: AWS (EC2, S3, VPC) and Linux system management • Infrastructure as Code: Terraform & Ansible • Containers & Orchestration: Docker & Kubernetes • CI/CD: Jenkins & GitHub Actions • Monitoring & SRE: Prometheus, Grafana, ELK, OpenTelemetry Beyond building, I actively write about DevOps and Cloud to simplify complex concepts and share practical knowledge. I am currently looking for opportunities where I can contribute, learn from industry experts, and grow into a high-impact DevOps Engineer.

On Day 58 of my 90 Days of DevOps journey, I learned how Kubernetes automatically scales applications based on resource usage using Metrics Server and Horizontal Pod Autoscaler (HPA).

Yesterday I worked with resource requests and limits.

Today I learned how Kubernetes collects CPU and memory usage in near real time and automatically adds or removes Pods as load changes.

This felt very close to real production Kubernetes environments.

Metrics Server

Today I installed the Kubernetes Metrics Server.

Metrics Server collects resource usage data like:

CPU usage
Memory usage

from nodes and Pods inside the cluster.

After enabling Metrics Server, I tested:

kubectl top nodes

and:

kubectl top pods -A

For the first time, I could see real-time CPU and memory usage inside my Kubernetes cluster.

I also learned:

kubectl top shows actual resource usage

while:

kubectl describe pod

shows configured requests and limits.

That difference was important.

Exploring kubectl top

Next, I explored different kubectl top commands.

I checked:

Node CPU usage
Node memory usage
Pod resource usage
Pods consuming highest CPU

using:

kubectl top pods -A --sort-by=cpu

This helped me understand how Kubernetes monitors workloads continuously.

Deployment with CPU Requests

To scale on CPU utilization, HPA needs the containers in the target Pods to declare CPU requests.

I created a Deployment using the:

registry.k8s.io/hpa-example

image.

Then I added:

resources.requests.cpu: 200m

Without CPU requests, HPA cannot calculate a utilization percentage, because utilization is measured as actual CPU usage divided by the requested CPU.

That is one of the most common mistakes while configuring HPA.
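Here is a minimal sketch of what such a Deployment could look like, written to a file from the shell. Only the image and the 200m CPU request come from the steps above. The file name, the name php-apache, the labels, and port 80 are placeholders I am assuming for illustration.

```shell
# Write a Deployment manifest with a CPU request so HPA can compute utilization.
# Assumed/hypothetical: the file name, the name "php-apache", the labels, port 80.
cat > hpa-demo-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 200m   # HPA utilization is measured against this value
EOF
```

A file like this could then be applied with kubectl apply -f hpa-demo-deployment.yaml.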

Horizontal Pod Autoscaler (HPA)

Next, I created an HPA using:

kubectl autoscale deployment DEPLOYMENT_NAME --cpu-percent=50 --min=1 --max=10

The HPA was configured with:

minimum replicas: 1
maximum replicas: 10
target CPU utilization: 50%

Initially, the TARGETS column showed:

<unknown>/50%

because Metrics Server needed some time to collect metrics.

After a short wait, Kubernetes started showing actual CPU utilization values.

This was really interesting to watch.

Generating Load and Auto Scaling

The most exciting part today was testing auto scaling practically.

I created a BusyBox load generator that continuously sent requests to the application.
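A load generator like this can be sketched as a small Pod manifest. This is an illustration under assumptions, not a record of my exact command: the Pod name, the file name, the busybox:1.28 tag, and a Service called php-apache in the same namespace are all hypothetical.

```shell
# Write a BusyBox Pod manifest that requests the app in a tight loop.
# Assumed/hypothetical: the Pod name, the file name, and a Service named
# "php-apache" in the same namespace that fronts the Deployment.
cat > load-generator.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox:1.28
      command:
        - /bin/sh
        - -c
        - "while sleep 0.01; do wget -q -O- http://php-apache; done"
EOF
```

It could be started with kubectl apply -f load-generator.yaml and stopped with kubectl delete pod load-generator.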

As average CPU utilization rose above the 50% target, Kubernetes automatically increased the number of replicas.
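To make the scaling decision concrete, here is a rough sketch of the rule HPA applies: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the min and max. The input numbers are made up for illustration, not measurements from my cluster, and the sketch ignores details of the real controller such as its tolerance band and its handling of Pods that are not ready.

```shell
# Sketch of the HPA scaling rule using integer shell arithmetic.
# All input numbers are made-up illustrations, not real measurements.
request_m=200        # CPU request per Pod, in millicores
usage_m=500          # assumed average CPU usage per Pod, in millicores
target_pct=50        # target CPU utilization from the HPA
current_replicas=1
min_replicas=1
max_replicas=10

# Utilization is usage expressed as a percentage of the request.
utilization_pct=$(( usage_m * 100 / request_m ))

# desired = ceil(current * utilization / target), via integer ceiling division.
desired=$(( (current_replicas * utilization_pct + target_pct - 1) / target_pct ))

# Clamp the result to the HPA's min/max bounds.
if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi

echo "utilization=${utilization_pct}% desired_replicas=${desired}"
```

With these example numbers, utilization works out to 250% of the request, so one replica becomes five.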

I watched the scaling in real time using:

kubectl get hpa --watch

The Deployment scaled from:

1 replica → multiple replicas

automatically.

After I deleted the load generator Pod, Kubernetes scaled the Deployment back down gradually. By default, HPA waits out a 5-minute stabilization window before it removes replicas.

That felt like real cloud infrastructure behavior.

Declarative HPA using YAML

Finally, I created the HPA using YAML with:

autoscaling/v2

I also learned about the behavior section.

The behavior section controls:

how fast Kubernetes scales up
how slowly Kubernetes scales down

This gives more control over scaling behavior in production environments.
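As a sketch, an autoscaling/v2 manifest with a behavior section could look like the following. The min, max, and 50% CPU target match the values I used earlier. The names, the file name, and every number inside behavior are assumptions chosen for illustration, not recommended settings.

```shell
# Write an autoscaling/v2 HPA manifest with explicit scale-up/scale-down behavior.
# Assumed/hypothetical: the file name, the name "php-apache", and all behavior values.
cat > hpa-demo-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to load immediately
      policies:
        - type: Percent
          value: 100                    # at most double the replicas...
          periodSeconds: 15             # ...every 15 seconds
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
      policies:
        - type: Pods
          value: 1                      # remove at most 1 Pod...
          periodSeconds: 60             # ...per minute
EOF
```

A file like this could be applied with kubectl apply -f hpa-demo-hpa.yaml in place of the kubectl autoscale command.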

I also learned the difference between:

autoscaling/v1

autoscaling/v2

v1 supports only a target CPU utilization.

v2 supports:

CPU
memory
custom metrics
advanced scaling behavior

What I Learned Today

Today I learned:

Metrics Server
kubectl top
Horizontal Pod Autoscaler
CPU-based auto scaling
Load testing
autoscaling/v1 vs v2
Scaling behavior configuration
Real-time Kubernetes monitoring

Final Thoughts

Today was one of the most interesting Kubernetes learning days so far.

Watching Kubernetes automatically add and remove Pods as CPU load changed felt very powerful.

This is how modern applications handle changing traffic in production environments.

Understanding HPA is helping me understand how scalable cloud-native applications work.

See you on Day 59.