Kubernetes Observability - Health Probes Complete Guide

Comprehensive tutorial on Kubernetes Health Probes including Liveness, Readiness, and Startup probes with practical YAML examples and configuration best practices

What are Kubernetes Probes?

Probing the Container

  • The kubelet checks containers periodically using probes
  • Probes determine the health and readiness of containers
  • Three types of probes: Startup, Readiness, and Liveness
  • Each probe can use different action types to check health
  • Essential for maintaining application reliability

Diagram: Kubelet → Probe → Container → Response

Startup Probe

To know when a container has started

Readiness Probe

To know when a container is ready to accept traffic

Liveness Probe

To know when a container must be restarted

Probe Types Explained

Startup Probe

Used for slow-starting containers to determine when the application has successfully started.

  • Disables liveness and readiness checks until it succeeds
  • Useful for legacy applications with long startup times
  • Prevents killing containers during initialization
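
As a rough sketch of how the startup budget works: the kubelet gives the startup probe up to failureThreshold × periodSeconds to succeed before it kills the container. The fragment below assumes a hypothetical application that serves /healthz on port 8080; the numbers are illustrative, not a recommendation for any particular workload.

```yaml
# Fragment of a container spec (endpoint and port are assumptions).
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10      # probe every 10 seconds
  failureThreshold: 30   # startup budget: 30 x 10 s = 300 s (5 minutes)
```

Once the probe succeeds a single time, it does not run again for that container, and the liveness and readiness probes take over.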

Readiness Probe

Determines if a container is ready to serve requests.

  • A failing readiness probe removes the pod from Service endpoints, so it stops receiving traffic
  • The container keeps running; it just receives no traffic through Services
  • Essential for rolling updates and load balancing
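
To illustrate why readiness matters for rolling updates, here is a minimal Deployment sketch. The name, image, port, and /ready path are placeholders, not taken from a real application. During a rollout, a new pod only counts as available once its readiness probe passes, so with maxUnavailable set to 0 the old pods are removed only after their replacements are ready.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0              # never drop below the desired replica count
      maxSurge: 1                    # bring up one extra pod at a time
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example.com/web:1.0   # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready             # assumed readiness endpoint
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
```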

Liveness Probe

Determines if the container is running properly.

  • A failing liveness probe makes the kubelet kill the container, which is then restarted according to the pod's restartPolicy
  • Detects deadlocks and hung applications
  • Ensures application remains responsive

Important: A failing readiness probe will stop the application from receiving traffic. A failing liveness probe will restart the container.

Probe Action Types

ExecAction

Execute a command inside the container

exec:
  command:
  - cat
  - /app/healthy

TCPSocketAction

Check if a TCP socket port is open

tcpSocket:
  port: 8080

HTTPGetAction

Performs an HTTP GET against a specific port and path; any status code from 200 to 399 counts as success

httpGet:
  path: /healthz
  port: 8080

HTTPGet Additional Options

httpGet:
  path: /health
  port: 8080
  host: 127.0.0.1   # defaults to the pod IP; set only in special cases, e.g. a hostNetwork pod listening on 127.0.0.1
  scheme: HTTPS
  httpHeaders:
  - name: Custom-Header
    value: Awesome

Probe Configuration Parameters

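  • initialDelaySeconds: Seconds to wait after the container starts before the first probe (default 0)
  • periodSeconds: How often to probe (default 10, minimum 1)
  • timeoutSeconds: Seconds after which a probe attempt times out and counts as a failure (default 1)
  • successThreshold: Consecutive successes needed after a failure to be considered successful (default 1; must be 1 for liveness and startup probes)
  • failureThreshold: Consecutive failures after which the probe is considered failed (default 3)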
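
The fragment below is a small sketch showing all five parameters on one probe. The /healthz endpoint and port are assumptions, and the values are only there to make the arithmetic concrete.

```yaml
# Fragment of a container spec (endpoint and port are assumptions).
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10   # wait 10 s after the container starts
  periodSeconds: 10         # then probe every 10 s
  timeoutSeconds: 2         # no response within 2 s counts as a failure
  successThreshold: 1       # must be 1 for liveness and startup probes
  failureThreshold: 3       # act after 3 consecutive failures
```

With these values, a container that stops responding is restarted roughly failureThreshold × periodSeconds = 30 seconds later, give or take one period and the timeout.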

Complete Probes Example

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    
    # Startup Probe - for slow starting containers
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 3
      periodSeconds: 10
    
    # Readiness Probe - when container is ready for traffic
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    
    # Liveness Probe - if container is running properly
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Startup Probe

  • HTTP GET to /healthz on port 8080
  • Checks every 10 seconds
  • Allows 3 consecutive failures (about 30 seconds) before the container is killed and restarted
  • Disables other probes until successful

Readiness Probe

  • TCP socket check on port 8080
  • First check no earlier than 5 seconds after the container starts
  • Checks every 10 seconds
  • If it fails, traffic to the pod stops

Liveness Probe

  • TCP socket check on port 8080
  • First check no earlier than 15 seconds after the container starts
  • Checks every 20 seconds
  • If it fails, the container is restarted

Advanced Probe Examples

Exec Action Example

apiVersion: v1
kind: Pod
metadata:
  name: postgres-db
spec:
  containers:
  - name: postgres
    image: postgres:13
    env:
    - name: POSTGRES_PASSWORD
      value: "secret"
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - exec pg_isready -U postgres
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - exec pg_isready -U postgres
      initialDelaySeconds: 5
      periodSeconds: 5

HTTP Get with Headers Example

apiVersion: v1
kind: Pod
metadata:
  name: web-application
spec:
  containers:
  - name: webapp
    image: nginx:latest
    livenessProbe:
      httpGet:
        path: /health
        port: 80
        httpHeaders:
        - name: X-Custom-Auth
          value: "Bearer token123"
        - name: Accept
          value: application/json
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 2
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 2
      failureThreshold: 3

Database with All Three Probes

apiVersion: v1
kind: Pod
metadata:
  name: mysql-database
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "secret123"
    startupProbe:
      exec:
        command:
        - sh
        - -c
        - mysqladmin ping -h localhost -uroot -p"${MYSQL_ROOT_PASSWORD}"
      failureThreshold: 30
      periodSeconds: 10
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - mysql -e 'SELECT 1' -uroot -p"${MYSQL_ROOT_PASSWORD}"
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 30
      periodSeconds: 10

Probes Best Practices

Configuration Guidelines

  • Set appropriate initial delays: Allow applications to start properly
  • Use conservative timeouts: Avoid false positives from slow responses
  • Configure realistic periods: Balance between responsiveness and load
  • Use startup probes for slow applications: Prevent unnecessary restarts
  • Make readiness probes lightweight: They run for the container's whole lifetime, not only at startup
  • Test failure scenarios: Ensure probes work as expected
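
One way to test a failure scenario is a throwaway pod whose health check is designed to start failing. The sketch below is modeled on the exec liveness example in the Kubernetes documentation; the pod name and image tag are assumptions. The container creates /tmp/healthy, removes it after 30 seconds, and the liveness probe then starts failing.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-failure-demo   # hypothetical name
spec:
  containers:
  - name: demo
    image: busybox:1.36         # assumed tag; any image with sh and cat works
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy          # exits non-zero once the file is gone
      initialDelaySeconds: 5
      periodSeconds: 5
```

With the default failureThreshold of 3, the container should be restarted roughly 15 seconds after the file disappears, and the pod's restart count should keep increasing as the cycle repeats.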

Application Design

  • Implement proper health endpoints: /healthz, /readyz, /livez
  • Keep liveness checks self-contained: Don't depend on external services
  • Include dependency checks in readiness: Database, cache, etc.
  • Keep liveness checks simple: Verify that the process can still respond, not that every dependency is up
  • Use different endpoints: Separate health, readiness, and liveness
  • Log probe failures: Help with debugging issues
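
The split between liveness and readiness endpoints described above might look like the following fragment. The paths and port are assumptions about the application, not fixed Kubernetes names.

```yaml
# Fragment of a container spec (paths and port are assumptions).
livenessProbe:
  httpGet:
    path: /livez     # checks only the process itself: no database, no cache
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz    # may also verify dependencies such as the database
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
```

With this layout, a database outage takes pods out of rotation through the readiness probe but does not restart them, because the liveness endpoint never looks at the database.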

Probe Configuration Recommendations

Startup Probe

  • failureThreshold: 30
  • periodSeconds: 10
  • Gives the app up to 5 minutes (30 × 10 seconds) to start; raise failureThreshold for slower apps

Readiness Probe

  • periodSeconds: 5-10
  • timeoutSeconds: 1-3
  • failureThreshold: 3

Liveness Probe

  • periodSeconds: 10-30
  • initialDelaySeconds: 15-30
  • failureThreshold: 3

Warning: Avoid making liveness probes dependent on external services. If an external service fails, it could cause all your containers to restart in a cascade failure.