Kubernetes Horizontal Pod Autoscaling - Complete Scaling Guide

Comprehensive tutorial on Kubernetes Horizontal Pod Autoscaling (HPA) including configuration, metrics server setup, scaling behavior, and best practices

What is Horizontal Pod Autoscaling?

HPA Overview

  • Automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet
  • Uses the Kubernetes Metrics Server to collect resource metrics
  • Scales based on CPU utilization, memory usage, or custom metrics
  • Works within defined minimum and maximum replica boundaries
  • Essential for handling variable workloads efficiently
[Diagram: the HPA Controller reads pod metrics from the Metrics Server and scales the application's Pods up or down]

Prerequisites

Pods must have resource requests defined for utilization-based HPA to work properly; limits are recommended but not required for the calculation

How HPA Works

HPA Operation

  • Metrics Collection: the HPA controller queries the Metrics Server every 15 seconds by default (the kube-controller-manager --horizontal-pod-autoscaler-sync-period flag)
  • Target Calculation: Calculates desired replica count based on metrics
  • Scaling Decision: Scales according to min and max replica boundaries
  • Cooldown Periods: Prevents rapid scaling oscillations
  • Continuous Monitoring: Constantly evaluates and adjusts as needed
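The target calculation follows a documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch of that arithmetic (the example values are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float) -> int:
    """Replica count from the documented HPA formula:
    desired = ceil(current * currentMetricValue / targetMetricValue)."""
    return math.ceil(current_replicas * current_value / target_value)

# 3 pods averaging 80% CPU against a 50% target:
print(desired_replicas(3, 80, 50))  # ceil(4.8) -> 5
# Load halves: 3 pods at 25% against a 50% target:
print(desired_replicas(3, 25, 50))  # ceil(1.5) -> 2
```

The result is then clamped to the minReplicas/maxReplicas boundaries before being applied.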

Metrics Server

The Metrics Server is a cluster-wide aggregator of resource usage data. It collects metrics from each node and makes them available through the Metrics API.

Cooldown / Delay Periods

Scale up: acts as soon as metrics warrant it (no stabilization window by default)

Scale down: waits through a 5-minute stabilization window before removing pods

Preventing Race Conditions

  • Once a change has been made, HPA waits before making another
  • Scale up: no delay by default in current releases (older releases waited 3 minutes)
  • Scale down: 300-second (5-minute) stabilization window by default
  • Both windows are configurable via behavior.stabilizationWindowSeconds
  • Prevents rapid oscillations and provides stability
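The scale-down stabilization window can be pictured as a rolling record of recent replica recommendations, with the highest one winning, so a brief dip in load does not immediately remove pods. A simplified sketch of that idea (the real controller tracks timestamps rather than a fixed-size window):

```python
from collections import deque

def stabilized_scale_down(recommendations, window: int):
    """Replica counts after applying a scale-down stabilization
    window: the HIGHEST recommendation seen within the window wins,
    so transient dips don't shrink the deployment.

    `recommendations` is a chronological list of desired replica
    counts, one per sync period; `window` is how many recent periods
    are kept (a stand-in for 300s / 15s = 20 periods).
    """
    recent = deque(maxlen=window)
    applied = []
    for rec in recommendations:
        recent.append(rec)
        applied.append(max(recent))  # most conservative (largest) wins
    return applied

# Load spikes to 8 replicas, then dips to 3; with a 4-period window
# the dip is ignored until the high recommendation ages out.
print(stabilized_scale_down([8, 8, 3, 3, 3, 3, 3], window=4))
# -> [8, 8, 8, 8, 8, 3, 3]
```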

HPA kubectl Commands

Imperative Creation

kubectl autoscale deployment [name] \
  --cpu-percent=50 --min=3 --max=10

Create HPA imperatively with CPU target

Declarative Creation

kubectl apply -f [hpa.yaml]

Create HPA from YAML manifest file

Get HPA Status

kubectl get hpa [name]

Get the autoscaler status and current metrics

Delete HPA

kubectl delete -f [hpa.yaml]
kubectl delete hpa [name]

Delete HPA using YAML file or resource name

Complete HPA Examples

Deployment with Resource Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

Simple HPA Example (v1)

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: simple-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 70

Key Configuration Points

  • scaleTargetRef: Reference to the deployment to scale
  • minReplicas: Minimum number of pods (safety net)
  • maxReplicas: Maximum number of pods (cost control)
  • targetCPUUtilizationPercentage: CPU usage target (v1)
  • metrics: Multiple metric targets (v2)
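Utilization targets are always measured relative to each container's resource requests, which is why HPA needs requests set. A small sketch of how the average is derived (the usage figures are illustrative):

```python
def average_utilization(usages_millicores, request_millicores):
    """Average CPU utilization across pods, as a percentage of the
    per-pod request -- the number compared against averageUtilization."""
    per_pod = [u / request_millicores * 100 for u in usages_millicores]
    return sum(per_pod) / len(per_pod)

# Three pods using 60m, 80m and 100m CPU against a 100m request:
print(average_utilization([60, 80, 100], 100))  # 80.0 -> above a 50% target
```

Without requests there is no denominator, which is why HPA reports `<unknown>` targets for such pods.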

HPA v2 Advantages

  • Support for multiple metrics (CPU, memory, custom)
  • More flexible scaling behavior configuration
  • Better control over scaling speed and stabilization
  • The recommended API version (stable since Kubernetes 1.23)

Advanced HPA Configuration

Custom Metrics HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k

Scaling Behavior Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa-behavior
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 90
      - type: Percent
        value: 50
        periodSeconds: 90
      selectPolicy: Max
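With the scaleUp block above, each policy caps how many pods may be added per periodSeconds, and selectPolicy: Max picks the most permissive cap. A simplified worked example of that selection (ignoring the per-period bookkeeping):

```python
import math

def scale_up_limit(current: int, policies, select="Max"):
    """Highest replica count the scaleUp policies allow this period.

    Each policy is (type, value): 'Pods' adds `value` pods,
    'Percent' grows by `value` percent. selectPolicy Max takes the
    most permissive result, Min the most restrictive.
    """
    limits = []
    for ptype, value in policies:
        if ptype == "Pods":
            limits.append(current + value)
        elif ptype == "Percent":
            limits.append(math.ceil(current * (1 + value / 100)))
    return max(limits) if select == "Max" else min(limits)

# From 6 replicas with the policies above (Pods=4, Percent=100):
# Pods allows 10, Percent allows 12; Max -> 12, then clamped by maxReplicas.
allowed = scale_up_limit(6, [("Pods", 4), ("Percent", 100)])
print(min(allowed, 10))  # clamped to maxReplicas: 10
```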

Multiple Metric Types

Resource Metrics

  • CPU utilization
  • Memory usage
  • Most common use case
  • Built-in support

Pods Metrics

  • Custom metrics from pods
  • Requests per second
  • Queue length
  • Requires metrics adapter

Object Metrics

  • Metrics from other objects
  • Ingress request rate
  • Custom resource metrics
  • Advanced use cases

HPA Best Practices

Configuration & Tuning

  • Set appropriate min/max replicas: Balance availability and cost
  • Use realistic CPU targets: 50-80% is typical for web applications
  • Configure resource requests properly: HPA calculations depend on these
  • Use HPA v2: More features and better control than v1
  • Test scaling behavior: Validate under different load patterns
  • Monitor HPA decisions: Use kubectl get hpa regularly

Operational Guidelines

  • Install Metrics Server: Essential for HPA to function
  • Use custom metrics for app-specific scaling: Better than CPU for many workloads
  • Configure cooldown periods appropriately: Prevent thrashing
  • Combine with Cluster Autoscaler: For complete scaling solution
  • Set up alerts: Monitor when HPA reaches max replicas
  • Document scaling policies: Team understanding is crucial

Important: HPA requires the Metrics Server to be installed in your cluster. Most managed Kubernetes services include it by default, but self-managed clusters may need manual installation.

Troubleshooting Common Issues

HPA Not Scaling
  • Check if Metrics Server is running
  • Verify resource requests are set
  • Check HPA status and events with kubectl get hpa and kubectl describe hpa
  • Look at pod metrics with kubectl top pods
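One often-overlooked cause: by default the controller ignores metric ratios within 10% of the target (the kube-controller-manager --horizontal-pod-autoscaler-tolerance flag, default 0.1), so small deviations never trigger scaling. A quick check, with illustrative numbers:

```python
def within_tolerance(current_value: float, target_value: float,
                     tolerance: float = 0.1) -> bool:
    """True when current/target is within the default 10% tolerance,
    meaning the HPA controller will not change the replica count."""
    return abs(current_value / target_value - 1.0) <= tolerance

print(within_tolerance(54, 50))  # True  -> no scaling despite exceeding target
print(within_tolerance(60, 50))  # False -> scale up
```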
Rapid Scaling Oscillations
  • Increase stabilization window
  • Adjust scaling policies
  • Review metric collection frequency
  • Consider application load patterns