What is Horizontal Pod Autoscaling?
HPA Overview
- Automatically scales the number of pods in a deployment, replica set, or stateful set
- Uses the Kubernetes Metrics Server to collect resource metrics
- Scales based on CPU utilization, memory usage, or custom metrics
- Works within defined minimum and maximum replica boundaries
- Essential for handling variable workloads efficiently
Prerequisites
Pods must define resource requests for the metrics HPA scales on; utilization targets are calculated as a percentage of the requested value (limits are recommended but not strictly required)
How HPA Works
HPA Operation
- Metrics Collection: HPA queries the Metrics Server periodically (every 15 seconds by default, set by the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period)
- Target Calculation: Calculates the desired replica count from the current versus target metric values (see the worked example after this list)
- Scaling Decision: Clamps the desired count to the min and max replica boundaries before applying it
- Cooldown Periods: Prevents rapid scaling oscillations
- Continuous Monitoring: Constantly evaluates and adjusts as needed
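The target calculation step follows the formula documented for the HPA controller. A quick worked example, using a hypothetical 80% current utilization against the 50% target used in the manifests below:

desiredReplicas = ceil( currentReplicas * ( currentMetricValue / desiredMetricValue ) )

Example: 3 replicas averaging 80% CPU utilization with a 50% target:
ceil( 3 * ( 80 / 50 ) ) = ceil( 4.8 ) = 5 replicas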
Metrics Server
The Metrics Server is a cluster-wide aggregator of resource usage data. It collects metrics from each node and makes them available through the Metrics API.
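To confirm the Metrics Server is running and serving data, a few standard checks can be used (the kube-system namespace is the typical install location, though your cluster may differ):

kubectl get deployment metrics-server -n kube-system
kubectl top nodes
kubectl top pods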
Cooldown / Delay Periods
Preventing Race Conditions
- Once a change has been made, HPA waits before making another
- Default scale-down stabilization window: 5 minutes (300 seconds)
- Scale-up has no stabilization delay by default in current releases (older releases applied a fixed 3-minute scale-up delay); one can be set via the behavior field shown later in this section
- Prevents rapid oscillations (flapping) and provides stability
HPA kubectl Commands
Imperative Creation
kubectl autoscale deployment [name] --cpu-percent=50 --min=3 --max=10
Create an HPA imperatively with a CPU utilization target
Declarative Creation
kubectl apply -f [hpa-manifest].yaml
Create an HPA from a YAML manifest file
Get HPA Status
kubectl get hpa
Get the autoscaler status and current metrics
Delete HPA
kubectl delete hpa [name]
kubectl delete -f [hpa-manifest].yaml
Delete an HPA using its YAML file or resource name
Complete HPA Examples
Deployment with Resource Limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
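Assuming the manifest above is saved as webapp-deployment.yaml (an illustrative file name), it can be applied and its per-pod usage checked against the requests it declares:

kubectl apply -f webapp-deployment.yaml
kubectl top pods -l app=webapp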
HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
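After applying the HPA above (the file name webapp-hpa.yaml is illustrative), its current metrics, replica count, and scaling events can be inspected with:

kubectl apply -f webapp-hpa.yaml
kubectl get hpa webapp-hpa --watch
kubectl describe hpa webapp-hpa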
Simple HPA Example (v1)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: simple-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 70
Key Configuration Points
- scaleTargetRef: Reference to the deployment to scale
- minReplicas: Minimum number of pods (safety net)
- maxReplicas: Maximum number of pods (cost control)
- targetCPUUtilizationPercentage: CPU usage target (v1)
- metrics: Multiple metric targets (v2)
HPA v2 Advantages
- Support for multiple metrics (CPU, memory, custom)
- More flexible scaling behavior configuration
- Better control over scaling speed and stabilization
- Current recommended version
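Before choosing between v1 and v2 manifests, it is worth confirming which autoscaling API versions the cluster actually serves:

kubectl api-versions | grep autoscaling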
Advanced HPA Configuration
Custom Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
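Note that the Pods and Object metrics above are not provided by the Metrics Server; they require a custom metrics adapter (for example, prometheus-adapter) that registers the custom.metrics.k8s.io API. Assuming such an adapter is installed, its presence can be verified with:

kubectl get apiservices | grep custom.metrics
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1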
Scaling Behavior Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa-behavior
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 90
      - type: Percent
        value: 50
        periodSeconds: 90
      selectPolicy: Max
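With selectPolicy: Max, the controller applies whichever policy permits the largest change in each period. For example, at 10 current replicas the scaleUp rules above allow up to max(4 pods, 100% of 10 = 10 pods) = 10 additional pods per 60 seconds, while the scaleDown rules allow up to max(2 pods, 50% of 10 = 5 pods) = 5 pods removed per 90 seconds, both subject to their stabilization windows.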
Multiple Metric Types
Resource Metrics
- CPU utilization
- Memory usage
- Most common use case
- Built-in support
Pods Metrics
- Custom metrics from pods
- Requests per second
- Queue length
- Requires metrics adapter
Object Metrics
- Metrics from other objects
- Ingress request rate
- Custom resource metrics
- Advanced use cases
HPA Best Practices
Configuration & Tuning
- Set appropriate min/max replicas: Balance availability and cost
- Use realistic CPU targets: 50-80% is typical for web applications
- Configure resource requests properly: HPA calculations depend on these
- Use HPA v2: More features and better control than v1
- Test scaling behavior: Validate under different load patterns
- Monitor HPA decisions: Use kubectl get hpa regularly
Operational Guidelines
- Install Metrics Server: Essential for HPA to function
- Use custom metrics for app-specific scaling: Better than CPU for many workloads
- Configure cooldown periods appropriately: Prevent thrashing
- Combine with Cluster Autoscaler: For complete scaling solution
- Set up alerts: Monitor when HPA reaches max replicas
- Document scaling policies: Team understanding is crucial
Important: HPA requires the Metrics Server to be installed in your cluster. Most managed Kubernetes services include it by default, but self-managed clusters may need manual installation.
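For self-managed clusters, the Metrics Server project publishes a components manifest that is typically installed with a single command; check the project's documentation for the release matching your Kubernetes version before applying it:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml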
Troubleshooting Common Issues
HPA Not Scaling
- Check if Metrics Server is running
- Verify resource requests are set
- Check HPA status with kubectl get hpa
- Look at pod metrics with kubectl top pods
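The checks above map to a handful of commands; one possible diagnostic sequence (the HPA name is a placeholder):

kubectl get apiservices | grep metrics.k8s.io
kubectl get hpa
kubectl describe hpa [name]
kubectl top pods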
Rapid Scaling Oscillations
- Increase stabilization window
- Adjust scaling policies
- Review metric collection frequency
- Consider application load patterns