Autoscaling

July 17, 2025, 4 min read

This document explains Kubernetes autoscaling with the Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler (CA): their mechanisms, configuration, and best practices for optimizing resource usage and cost. It covers how each autoscaler works and when to use it.

Kubernetes autoscaling optimizes resource usage and cost by automatically adjusting pods and nodes based on demand. This document covers HPA, VPA, and CA, their configuration, and practical examples for efficient scaling.

Introduction to Autoscaling

Autoscaling in Kubernetes enables dynamic adjustment of resources to match workload demand, improving efficiency and reducing costs. It operates at both the pod and cluster levels, using different types of autoscalers.

Types of Kubernetes Autoscalers

Kubernetes provides three main autoscalers:

Autoscaler	Function
Horizontal Pod Autoscaler (HPA)	Adjusts the number of pod replicas based on metrics like CPU or memory usage.
Vertical Pod Autoscaler (VPA)	Adjusts resource requests and limits for containers, scaling up or down the resources allocated to pods.
Cluster Autoscaler (CA)	Adjusts the number of nodes in the cluster when pods cannot be scheduled due to resource constraints.

Horizontal Pod Autoscaler (HPA)

HPA automatically increases or decreases the number of running pods in response to workload changes. It uses metrics such as CPU or memory utilization and is configured with minimum and maximum replica counts.

Example: HPA Command

kubectl autoscale deployment nginx --min=2 --max=5 --cpu-percent=50

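This command creates an HPA for the nginx deployment that keeps between 2 and 5 replicas and targets an average CPU utilization of 50% of the CPU the pods request. Replicas are added when average usage rises above the target and removed when it falls below.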
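HPA measures CPU utilization as a percentage of the CPU each container requests, so the target deployment needs CPU requests set; without them the HPA cannot compute a utilization value. A minimal sketch of such a deployment follows, assuming the nginx deployment named in the command above. The image tag, labels, and request values are illustrative assumptions, not taken from the original.

```yaml
# Illustrative Deployment; image tag, labels, and request values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          resources:
            requests:
              cpu: 200m      # a 50% target then means roughly 100m average usage per pod
              memory: 128Mi
```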

Example: HPA YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
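
Because HPA can also act on memory, the manifest above could be extended with a second metric. The variant below is a sketch, not part of the original example: the name nginx-hpa-multi, the 70% memory target, and the scale-down window are assumptions chosen for illustration. When several metrics are listed, HPA computes a desired replica count for each and uses the largest.

```yaml
# Sketch of an HPA with two resource metrics; the name and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-multi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      # Wait 5 minutes of consistently lower load before removing replicas.
      stabilizationWindowSeconds: 300
```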

Vertical Pod Autoscaler (VPA)

VPA adjusts the resource requests and limits for containers, scaling up or down the CPU and memory allocated to pods. It is useful when horizontal scaling is not possible or ideal.

Example: VPA YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: 'apps/v1'
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: 'Auto'
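
VPA is not part of core Kubernetes and has to be installed in the cluster before a manifest like the one above takes effect. A cautious way to start is recommendation-only mode with explicit bounds, sketched below. The name nginx-vpa-recommend and all minimum and maximum values are assumptions for illustration; with updateMode set to 'Off', VPA only publishes recommendations (visible through kubectl describe vpa) and does not change running pods.

```yaml
# Sketch of a recommendation-only VPA; the name and bounds are assumptions.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa-recommend
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: 'Off'        # compute recommendations only, never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: '*'   # apply to every container in the pod
        controlledResources: ['cpu', 'memory']
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: '1'
          memory: 1Gi
```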

Cluster Autoscaler (CA)

CA automatically adjusts the number of nodes in the cluster. When pods cannot be scheduled due to insufficient resources, CA adds nodes; when demand drops, it removes nodes to optimize costs.
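
Removing a node means evicting the pods that run on it. Cluster Autoscaler is typically deployed and configured per cloud provider, so no generic manifest is shown here, but workloads can influence scale-down. The sketch below assumes a hypothetical batch-worker deployment (name, image, and command are placeholders) whose pods should not be evicted; the safe-to-evict annotation asks CA not to remove a node while such a pod is running on it.

```yaml
# Hypothetical workload; name, image, and command are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
      annotations:
        # Ask Cluster Autoscaler not to scale down the node hosting this pod.
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'false'
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ['sh', '-c', 'sleep 3600']
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
```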

Best Practices for Autoscaling

Use HPA for stateless workloads and dynamic scaling.
Use VPA for workloads that cannot easily scale horizontally and need right-sized CPU and memory requests.
Use CA to ensure pods are scheduled efficiently across nodes.
Avoid using HPA and VPA together on CPU/memory metrics; use custom metrics if needed.
Monitor autoscaler performance and adjust thresholds as required.
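
One way to follow the advice on combining HPA and VPA is to let VPA manage CPU and memory while HPA scales on a different signal. The sketch below assumes a per-pod metric named http_requests_per_second exposed through the custom metrics API (for example by a metrics adapter installed in the cluster). The metric name, the HPA name, and the target value are hypothetical.

```yaml
# Sketch of an HPA driven by a custom per-pod metric; metric name and target are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # must be served by the custom metrics API
        target:
          type: AverageValue
          averageValue: '100'              # aim for about 100 requests per second per pod
```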

Common Autoscaling Commands

The following commands are frequently used to manage and monitor autoscaling in Kubernetes. Each command is shown with its description and purpose.

Command	Description	Purpose
kubectl get hpa	Lists all Horizontal Pod Autoscalers in the current namespace.	Monitor the status and configuration of HPA resources.
kubectl describe hpa	Shows detailed information about a specific HPA resource.	View metrics, scaling events, and configuration for an HPA.
kubectl autoscale deployment	Creates an HPA for a deployment.	Enable horizontal scaling for a deployment.
kubectl get vpa	Lists all Vertical Pod Autoscalers in the current namespace.	Monitor the status and configuration of VPA resources.
kubectl describe vpa	Shows details about a specific VPA resource.	View recommendations and updates for pod resources.
kubectl get nodes	Lists all nodes in the cluster.	Monitor available nodes for cluster autoscaling.
kubectl describe node	Shows details about a specific node.	Inspect node resources and scheduling status.

Conclusion

Kubernetes autoscaling provides flexible, efficient resource management at both pod and cluster levels. Understanding and configuring HPA, VPA, and CA enables optimal scaling and cost control for diverse workloads.

FAQ

What is the main purpose of autoscaling in Kubernetes?

Autoscaling automatically adjusts resources such as pods and nodes to match workload demand, optimizing efficiency and cost.

Which autoscaler adjusts the number of pod replicas based on CPU or memory usage?

The Horizontal Pod Autoscaler (HPA) increases or decreases pod replicas in response to metrics like CPU or memory utilization.

What is the function of the Vertical Pod Autoscaler (VPA)?

VPA adjusts resource requests and limits for containers, scaling up or down the CPU and memory allocated to pods.

When should Cluster Autoscaler (CA) be used?

CA is used when pods cannot be scheduled due to insufficient node resources, automatically adding or removing nodes as needed.

Which of the following is not a best practice for autoscaling?

(1) Use HPA for stateless workloads
(2) Use VPA for vertical scaling
(3) Use HPA and VPA together on CPU/memory metrics
(4) Monitor autoscaler performance

Answer: (3). Using HPA and VPA together on CPU/memory metrics is not recommended; use custom metrics if needed.

Match the following autoscaler types with their functions

Autoscaler	Function
A. HPA	1. Adjusts pod replicas based on metrics
B. VPA	2. Adjusts resource requests and limits for pods
C. CA	3. Adjusts the number of nodes in the cluster

Answer: A-1, B-2, C-3.

True or False: HPA and VPA should not be used together on CPU or memory metrics in Kubernetes.

True. Using HPA and VPA together on CPU or memory metrics can cause conflicts; use custom metrics if both are needed.