HPA Auto-Scaling
In this tutorial you will deploy a CPU-intensive workload, configure a Horizontal Pod Autoscaler (HPA), generate load against it, and watch Kubernetes automatically scale pods up in response. When the load stops, you will observe pods scale back down. Metrics Server — pre-installed by kinder — powers the CPU metrics that make this possible.
Prerequisites
- kinder installed
- Docker (or Podman) installed and running
- kubectl installed and on PATH
Step 1: Create the cluster

    kinder create cluster

Step 2: Deploy a CPU-intensive application
The registry.k8s.io/hpa-example image runs a PHP script that performs CPU-intensive calculations on every request. It is the standard Kubernetes HPA demo workload.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: php-apache
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: php-apache
      template:
        metadata:
          labels:
            app: php-apache
        spec:
          containers:
            - name: php-apache
              image: registry.k8s.io/hpa-example
              ports:
                - containerPort: 80
              resources:
                requests:
                  cpu: "200m"
                limits:
                  cpu: "500m"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: php-apache
    spec:
      selector:
        app: php-apache
      ports:
        - port: 80
          targetPort: 80

Save this as php-apache.yaml and apply it:

    kubectl apply -f php-apache.yaml

Wait for the pod to be ready:

    kubectl get pods

Expected output:

    NAME                         READY   STATUS    RESTARTS   AGE
    php-apache-7d8b9c6f5-xk4pq   1/1     Running   0          30s

Step 3: Create a Horizontal Pod Autoscaler
Create an HPA that targets the php-apache deployment, scaling between 1 and 5 replicas when CPU utilization crosses 50%.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: php-apache-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: php-apache
      minReplicas: 1
      maxReplicas: 5
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 50

Save this as php-apache-hpa.yaml and apply it:

    kubectl apply -f php-apache-hpa.yaml

Verify the HPA was created:

    kubectl get hpa php-apache-hpa

Expected output (TARGETS may show <unknown>/50% initially):

    NAME             REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    php-apache-hpa   Deployment/php-apache   <unknown>/50%   1         5         1          10s

Step 4: Verify the HPA is reading metrics
Wait about 60 seconds for Metrics Server to complete its first scrape cycle, then check the HPA again:

    kubectl get hpa php-apache-hpa

Expected output:

    NAME             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache-hpa   Deployment/php-apache   8%/50%    1         5         1          90s

Step 5: Generate load
In a separate terminal, start a load-generating pod inside the cluster. This pod runs a continuous loop that sends requests to the php-apache service, driving up its CPU usage:

    kubectl run load-generator \
      --image=busybox:1.28 \
      --restart=Never \
      -- /bin/sh -c "while true; do wget -qO- http://php-apache; done"

The load-generator pod runs inside the cluster and can reach php-apache directly by service name. The continuous wget loop generates sustained CPU load on the PHP containers.
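The percentage that this load drives up in the TARGETS column is CPU usage measured against the container's cpu request (200m in php-apache.yaml), not against its limit, and averaged across the pods. A rough sketch of that arithmetic follows. The helper name cpu_utilization is hypothetical, and the usage figures are picked only to reproduce percentages that appear in this tutorial's sample outputs:

```shell
# Hypothetical helper: CPU utilization in whole percent, as a share of the
# container's cpu request. Both arguments are in millicores (200 means "200m").
cpu_utilization() {
  usage_m=$1
  request_m=$2
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization 16 200     # prints 8   (idle pod, as in Step 4)
cpu_utilization 124 200    # prints 62  (under load, above the 50% target)
cpu_utilization 100 200    # prints 50  (usage that sits exactly on the target)
```

With a 200m request, the 50% target therefore corresponds to roughly 100m of average CPU use per pod.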
Verify the load generator is running:

    kubectl get pod load-generator

Expected output:

    NAME             READY   STATUS    RESTARTS   AGE
    load-generator   1/1     Running   0          15s

Step 6: Watch pods scale up
In a separate terminal, watch the HPA as it responds to the increasing CPU utilization:

    kubectl get hpa php-apache-hpa --watch

Expected output (over several minutes as load builds):

    NAME             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache-hpa   Deployment/php-apache   8%/50%    1         5         1          2m
    php-apache-hpa   Deployment/php-apache   62%/50%   1         5         1          2m30s
    php-apache-hpa   Deployment/php-apache   62%/50%   1         5         2          2m45s
    php-apache-hpa   Deployment/php-apache   98%/50%   1         5         2          3m
    php-apache-hpa   Deployment/php-apache   98%/50%   1         5         4          3m15s
    php-apache-hpa   Deployment/php-apache   51%/50%   1         5         4          3m45s
    php-apache-hpa   Deployment/php-apache   48%/50%   1         5         4          4m

TARGETS rises above 50%, the HPA increases REPLICAS, and CPU per pod drops as the load is spread across more instances. The exact numbers will differ based on your machine’s speed.
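The replica counts in that output follow the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization), skipped when the ratio is within a tolerance of 1 (10% by default) and clamped to minReplicas and maxReplicas. The sketch below reproduces that rule in integer shell arithmetic. The function name and argument order are illustrative, and it deliberately ignores scaling policies and the scale-down stabilization window that the real controller also applies:

```shell
# Sketch of the documented HPA rule; not the controller's actual code.
# Arguments: current replicas, current utilization %, target utilization %,
#            minReplicas, maxReplicas.
desired_replicas() {
  current=$1; util=$2; target=$3; min=$4; max=$5

  # Default tolerance of 10%: if 0.9 <= util/target <= 1.1, do nothing.
  # Multiplying both sides by 10 * target keeps the check in integers.
  if [ $(( util * 10 )) -ge $(( target * 9 )) ] && \
     [ $(( util * 10 )) -le $(( target * 11 )) ]; then
    echo "$current"
    return
  fi

  # Integer ceiling of current * util / target.
  desired=$(( (current * util + target - 1) / target ))

  # Clamp to the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 1 62 50 1 5    # prints 2  (ceil(1.24))
desired_replicas 2 98 50 1 5    # prints 4  (ceil(3.92))
desired_replicas 4 51 50 1 5    # prints 4  (ratio 1.02, within tolerance)
desired_replicas 4 0 50 1 5     # prints 1  (clamped to minReplicas)
```

The first three calls use the utilization values from the sample output above, which is why REPLICAS moves from 1 to 2 to 4 and then holds at 4 once utilization settles near the target.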
You can also watch the pods being created in real time:

    kubectl get pods --watch

Expected output (scaling up):

    NAME                         READY   STATUS    RESTARTS   AGE
    php-apache-7d8b9c6f5-xk4pq   1/1     Running   0          4m
    php-apache-7d8b9c6f5-bv9mz   1/1     Running   0          45s
    php-apache-7d8b9c6f5-cr7jl   1/1     Running   0          45s
    php-apache-7d8b9c6f5-dt2wn   1/1     Running   0          45s

Step 7: Stop load and observe scale-down
Delete the load-generator pod to stop the traffic:

    kubectl delete pod load-generator

Expected output:

    pod "load-generator" deleted

Wait a few minutes (by default the HPA waits out a 5-minute stabilization window before scaling down), then check the HPA:

    kubectl get hpa php-apache-hpa

Expected output:

    NAME             REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache-hpa   Deployment/php-apache   0%/50%    1         5         1          12m

Clean up

    kinder delete cluster

This removes the cluster, all pods, and the HPA. No other cleanup is required.