HPA Auto-Scaling

In this tutorial you will deploy a CPU-intensive workload, configure a Horizontal Pod Autoscaler (HPA), generate load against it, and watch Kubernetes automatically scale pods up in response. When the load stops, you will observe pods scale back down. Metrics Server — pre-installed by kinder — powers the CPU metrics that make this possible.

Prerequisites

  • kinder installed
  • Docker (or Podman) installed and running
  • kubectl installed and on PATH

Step 1: Create a cluster

kinder create cluster

Step 2: Deploy a CPU-intensive application

The registry.k8s.io/hpa-example image runs a PHP script that performs CPU-intensive calculations on every request — it is the standard Kubernetes HPA demo workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "200m"
            limits:
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    app: php-apache
  ports:
    - port: 80
      targetPort: 80

Save this as php-apache.yaml and apply it:

kubectl apply -f php-apache.yaml

Wait for the pod to be ready:

kubectl get pods

Expected output:

NAME                         READY   STATUS    RESTARTS   AGE
php-apache-7d8b9c6f5-xk4pq   1/1     Running   0          30s

Step 3: Create a Horizontal Pod Autoscaler

Create an HPA that targets the php-apache Deployment and keeps it between 1 and 5 replicas, adding pods when average CPU utilization exceeds 50% of each pod's CPU request (100m of the 200m requested above).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Save this as php-apache-hpa.yaml and apply it:

kubectl apply -f php-apache-hpa.yaml

Verify the HPA was created:

kubectl get hpa php-apache-hpa

Expected output (TARGETS may show <unknown>/50% initially):

NAME             REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache   <unknown>/50%   1         5         1          10s

Wait about 60 seconds for Metrics Server to complete its first scrape, then check the HPA again:

kubectl get hpa php-apache-hpa

Expected output:

NAME             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache   8%/50%    1         5         1          90s
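
The left-hand value in TARGETS is CPU usage expressed as a percentage of the pod's CPU request, averaged across the Deployment's pods. The sketch below reproduces that arithmetic for a single pod. The function name utilization_pct is hypothetical, and the 16m usage figure is an assumed reading chosen to match the 8% shown above; integer division is used for simplicity.

```shell
# utilization_pct USAGE_MILLICORES REQUEST_MILLICORES
# Prints usage as a whole-number percentage of the CPU request.
utilization_pct() {
  usage_m=$1
  request_m=$2
  echo $(( usage_m * 100 / request_m ))
}

utilization_pct 16 200    # prints 8  (16m used of a 200m request)
utilization_pct 100 200   # prints 50 (the point at which the target is reached)
```

Because the percentage is relative to the request, a container without a CPU request gives the HPA nothing to divide by, which is why the Deployment sets resources.requests.cpu.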

Step 4: Generate load

In a separate terminal, start a load-generating pod inside the cluster. This pod runs a continuous loop that sends requests to the php-apache service, driving up its CPU usage:

kubectl run load-generator \
  --image=busybox:1.28 \
  --restart=Never \
  -- /bin/sh -c "while true; do wget -qO- http://php-apache; done"

The load-generator pod runs inside the cluster and can reach php-apache directly by service name. The continuous wget loop generates sustained CPU load on the PHP containers.

Verify the load generator is running:

kubectl get pod load-generator

Expected output:

NAME             READY   STATUS    RESTARTS   AGE
load-generator   1/1     Running   0          15s

Step 5: Watch the HPA scale up

In a separate terminal, watch the HPA as it responds to the increasing CPU utilization:

kubectl get hpa php-apache-hpa --watch

Expected output (over several minutes as load builds):

NAME             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache   8%/50%    1         5         1          2m
php-apache-hpa   Deployment/php-apache   62%/50%   1         5         1          2m30s
php-apache-hpa   Deployment/php-apache   62%/50%   1         5         2          2m45s
php-apache-hpa   Deployment/php-apache   98%/50%   1         5         2          3m
php-apache-hpa   Deployment/php-apache   98%/50%   1         5         4          3m15s
php-apache-hpa   Deployment/php-apache   51%/50%   1         5         4          3m45s
php-apache-hpa   Deployment/php-apache   48%/50%   1         5         4          4m

TARGETS rises above 50%, the HPA increases REPLICAS, and CPU per pod drops as the load is spread across more instances. The exact numbers will differ based on your machine’s speed.
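
The replica counts in the sample output follow the HPA's documented rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when utilization is within the default 10% tolerance of the target and clamped to minReplicas and maxReplicas. Here is a simplified sketch of that rule using the displayed percentages. The function name desired_replicas is hypothetical, and the real controller works from raw metrics and also accounts for pod readiness and stabilization windows, which this ignores.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_PCT TARGET_PCT MIN MAX
desired_replicas() {
  current=$1; usage=$2; target=$3; min=$4; max=$5

  # Absolute distance between observed and target utilization.
  diff=$(( usage - target ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi

  # Within 10% of the target (|usage/target - 1| <= 0.1): no change.
  if [ $(( diff * 10 )) -le "$target" ]; then
    echo "$current"
    return 0
  fi

  # ceil(current * usage / target) with integer arithmetic.
  desired=$(( (current * usage + target - 1) / target ))

  # Clamp to the HPA's minReplicas and maxReplicas.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 1 62 50 1 5   # prints 2
desired_replicas 2 98 50 1 5   # prints 4
desired_replicas 4 51 50 1 5   # prints 4 (inside the tolerance band)
```

This is why the sample output jumps from 2 to 4 replicas in one step and then holds at 4 once utilization settles near 50%.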

You can also watch the pods being created in real time:

kubectl get pods -l app=php-apache --watch

Expected output (scaling up):

NAME                         READY   STATUS    RESTARTS   AGE
php-apache-7d8b9c6f5-xk4pq   1/1     Running   0          4m
php-apache-7d8b9c6f5-bv9mz   1/1     Running   0          45s
php-apache-7d8b9c6f5-cr7jl   1/1     Running   0          45s
php-apache-7d8b9c6f5-dt2wn   1/1     Running   0          45s

Step 6: Stop the load and watch the HPA scale down

Delete the load-generator pod to stop the traffic:

kubectl delete pod load-generator

Expected output:

pod "load-generator" deleted

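Wait about five minutes, then check the HPA. Scale-down is deliberately slow: by default the HPA waits out a 300-second stabilization window before removing pods.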

kubectl get hpa php-apache-hpa

Expected output:

NAME             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache   0%/50%    1         5         1          12m
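
Rather than re-running the command by hand, you could poll the HPA until it reports the replica count you expect. A minimal sketch: the helper name wait_for_replicas and its defaults (40 tries, 15 seconds apart, roughly 10 minutes in total) are assumptions for illustration, not part of kinder or kubectl. It reads the HPA's status.currentReplicas field through kubectl's jsonpath output.

```shell
# wait_for_replicas HPA_NAME WANTED [MAX_TRIES] [SLEEP_SECONDS]
# Polls the HPA until status.currentReplicas equals WANTED, or gives up.
wait_for_replicas() {
  hpa=$1; wanted=$2; tries=${3:-40}; pause=${4:-15}

  while [ "$tries" -gt 0 ]; do
    current=$(kubectl get hpa "$hpa" -o jsonpath='{.status.currentReplicas}')
    if [ "$current" = "$wanted" ]; then
      echo "$hpa is at $current replica(s)"
      return 0
    fi
    tries=$(( tries - 1 ))
    sleep "$pause"
  done

  echo "timed out waiting for $hpa to reach $wanted replica(s)" >&2
  return 1
}

# Usage: wait_for_replicas php-apache-hpa 1
```

The function returns a non-zero status on timeout, so it can gate a later step in a script.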

Clean up

When you are finished, delete the cluster:

kinder delete cluster

This removes the cluster, all pods, and the HPA. No other cleanup is required.