Kubernetes Horizontal Pod Autoscaling - Complete Scaling Guide

Comprehensive tutorial on Kubernetes Horizontal Pod Autoscaling (HPA) including configuration, metrics server setup, scaling behavior, and best practices

What is Horizontal Pod Autoscaling?

HPA Overview

  • Automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet
  • Uses the Kubernetes Metrics Server to collect resource metrics
  • Scales based on CPU utilization, memory usage, or custom metrics
  • Works within defined minimum and maximum replica boundaries
  • Essential for handling variable workloads efficiently
[Diagram: the HPA Controller reads pod metrics from the Metrics Server and scales the application's Pods up or down]

Prerequisites

Pods must have resource requests defined for utilization-based HPA to work properly; limits are recommended but not required for the calculation

How HPA Works

HPA Operation

  • Metrics Collection: the HPA controller queries the Metrics Server every 15 seconds by default (the kube-controller-manager --horizontal-pod-autoscaler-sync-period flag)
  • Target Calculation: Calculates desired replica count based on metrics
  • Scaling Decision: Scales according to min and max replica boundaries
  • Cooldown Periods: Prevents rapid scaling oscillations
  • Continuous Monitoring: Constantly evaluates and adjusts as needed
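The target calculation follows a documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch of that arithmetic (the example values are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float) -> int:
    """Replica count from the documented HPA formula:
    desired = ceil(current * currentMetricValue / targetMetricValue)."""
    return math.ceil(current_replicas * current_value / target_value)

# 3 pods averaging 80% CPU against a 50% target:
print(desired_replicas(3, 80, 50))  # ceil(4.8) -> 5
# Load halves: 3 pods at 25% against a 50% target:
print(desired_replicas(3, 25, 50))  # ceil(1.5) -> 2
```

The result is then clamped to the minReplicas/maxReplicas boundaries before being applied.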

Metrics Server

The Metrics Server is a cluster-wide aggregator of resource usage data. It collects metrics from each node and makes them available through the Metrics API.

Cooldown / Delay Periods

Scale up: acts as soon as metrics warrant it (no stabilization window by default)

Scale down: waits through a 5-minute stabilization window before removing pods

Preventing Race Conditions

  • Once a change has been made, HPA waits before making another
  • Scale up: no delay by default in current releases (older releases waited 3 minutes)
  • Scale down: 300-second (5-minute) stabilization window by default
  • Both windows are configurable via behavior.stabilizationWindowSeconds
  • Prevents rapid oscillations and provides stability
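The scale-down stabilization window can be pictured as a rolling record of recent replica recommendations, with the highest one winning, so a brief dip in load does not immediately remove pods. A simplified sketch of that idea (the real controller tracks timestamps rather than a fixed-size window):

```python
from collections import deque

def stabilized_scale_down(recommendations, window: int):
    """Replica counts after applying a scale-down stabilization
    window: the HIGHEST recommendation seen within the window wins,
    so transient dips don't shrink the deployment.

    `recommendations` is a chronological list of desired replica
    counts, one per sync period; `window` is how many recent periods
    are kept (a stand-in for 300s / 15s = 20 periods).
    """
    recent = deque(maxlen=window)
    applied = []
    for rec in recommendations:
        recent.append(rec)
        applied.append(max(recent))  # most conservative (largest) wins
    return applied

# Load spikes to 8 replicas, then dips to 3; with a 4-period window
# the dip is ignored until the high recommendation ages out.
print(stabilized_scale_down([8, 8, 3, 3, 3, 3, 3], window=4))
# -> [8, 8, 8, 8, 8, 3, 3]
```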

HPA kubectl Commands

Imperative Creation

kubectl autoscale deployment [name] \
  --cpu-percent=50 --min=3 --max=10

Create HPA imperatively with CPU target

Declarative Creation

kubectl apply -f [hpa.yaml]

Create HPA from YAML manifest file

Get HPA Status

kubectl get hpa [name]

Get the autoscaler status and current metrics

Delete HPA

kubectl delete -f [hpa.yaml]
kubectl delete hpa [name]

Delete HPA using YAML file or resource name

Complete HPA Examples

Deployment with Resource Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60

Simple HPA Example (v1)

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: simple-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 70

Key Configuration Points

  • scaleTargetRef: Reference to the deployment to scale
  • minReplicas: Minimum number of pods (safety net)
  • maxReplicas: Maximum number of pods (cost control)
  • targetCPUUtilizationPercentage: CPU usage target (v1)
  • metrics: Multiple metric targets (v2)
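Utilization targets are always measured relative to each container's resource requests, which is why HPA needs requests set. A small sketch of how the average is derived (the usage figures are illustrative):

```python
def average_utilization(usages_millicores, request_millicores):
    """Average CPU utilization across pods, as a percentage of the
    per-pod request -- the number compared against averageUtilization."""
    per_pod = [u / request_millicores * 100 for u in usages_millicores]
    return sum(per_pod) / len(per_pod)

# Three pods using 60m, 80m and 100m CPU against a 100m request:
print(average_utilization([60, 80, 100], 100))  # 80.0 -> above a 50% target
```

Without requests there is no denominator, which is why HPA reports `<unknown>` targets for such pods.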

HPA v2 Advantages

  • Support for multiple metrics (CPU, memory, custom)
  • More flexible scaling behavior configuration
  • Better control over scaling speed and stabilization
  • The recommended API version (stable since Kubernetes 1.23)

Advanced HPA Configuration

Custom Metrics HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k

Scaling Behavior Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa-behavior
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 90
      - type: Percent
        value: 50
        periodSeconds: 90
      selectPolicy: Max
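With the scaleUp block above, each policy caps how many pods may be added per periodSeconds, and selectPolicy: Max picks the most permissive cap. A simplified worked example of that selection (ignoring the per-period bookkeeping):

```python
import math

def scale_up_limit(current: int, policies, select="Max"):
    """Highest replica count the scaleUp policies allow this period.

    Each policy is (type, value): 'Pods' adds `value` pods,
    'Percent' grows by `value` percent. selectPolicy Max takes the
    most permissive result, Min the most restrictive.
    """
    limits = []
    for ptype, value in policies:
        if ptype == "Pods":
            limits.append(current + value)
        elif ptype == "Percent":
            limits.append(math.ceil(current * (1 + value / 100)))
    return max(limits) if select == "Max" else min(limits)

# From 6 replicas with the policies above (Pods=4, Percent=100):
# Pods allows 10, Percent allows 12; Max -> 12, then clamped by maxReplicas.
allowed = scale_up_limit(6, [("Pods", 4), ("Percent", 100)])
print(min(allowed, 10))  # clamped to maxReplicas: 10
```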

Multiple Metric Types

Resource Metrics

  • CPU utilization
  • Memory usage
  • Most common use case
  • Built-in support

Pods Metrics

  • Custom metrics from pods
  • Requests per second
  • Queue length
  • Requires metrics adapter

Object Metrics

  • Metrics from other objects
  • Ingress request rate
  • Custom resource metrics
  • Advanced use cases

HPA Best Practices

Configuration & Tuning

  • Set appropriate min/max replicas: Balance availability and cost
  • Use realistic CPU targets: 50-80% is typical for web applications
  • Configure resource requests properly: HPA calculations depend on these
  • Use HPA v2: More features and better control than v1
  • Test scaling behavior: Validate under different load patterns
  • Monitor HPA decisions: Use kubectl get hpa regularly

Operational Guidelines

  • Install Metrics Server: Essential for HPA to function
  • Use custom metrics for app-specific scaling: Better than CPU for many workloads
  • Configure cooldown periods appropriately: Prevent thrashing
  • Combine with Cluster Autoscaler: For complete scaling solution
  • Set up alerts: Monitor when HPA reaches max replicas
  • Document scaling policies: Team understanding is crucial

Important: HPA requires the Metrics Server to be installed in your cluster. Most managed Kubernetes services include it by default, but self-managed clusters may need manual installation.

Troubleshooting Common Issues

HPA Not Scaling
  • Check if Metrics Server is running
  • Verify resource requests are set
  • Check HPA status and events with kubectl get hpa and kubectl describe hpa
  • Look at pod metrics with kubectl top pods
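One often-overlooked cause: by default the controller ignores metric ratios within 10% of the target (the kube-controller-manager --horizontal-pod-autoscaler-tolerance flag, default 0.1), so small deviations never trigger scaling. A quick check, with illustrative numbers:

```python
def within_tolerance(current_value: float, target_value: float,
                     tolerance: float = 0.1) -> bool:
    """True when current/target is within the default 10% tolerance,
    meaning the HPA controller will not change the replica count."""
    return abs(current_value / target_value - 1.0) <= tolerance

print(within_tolerance(54, 50))  # True  -> no scaling despite exceeding target
print(within_tolerance(60, 50))  # False -> scale up
```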
Rapid Scaling Oscillations
  • Increase stabilization window
  • Adjust scaling policies
  • Review metric collection frequency
  • Consider application load patterns