Kubernetes Cost Optimization:
Cluster Right-Sizing Guide

Kubernetes clusters routinely waste 40–60% of their provisioned compute through over-provisioned resource requests, idle node capacity, and poorly tuned autoscaling. This guide covers the complete stack of K8s cost optimisation — from pod-level resource right-sizing to node pool strategy, Spot integration, and managed service cost management on EKS, AKS, and GKE.

50%: Typical K8s Compute Waste
60%: Savings with Spot Node Pools
40%: Savings from Right-Sizing
30%: Typical VPA Improvement

This guide is part of the Cloud Cost Optimization: Enterprise FinOps Guide. Kubernetes has become the dominant platform for enterprise container workloads, but its flexible scheduling model creates a systematic cost problem: resource requests are set conservatively, nodes are over-provisioned to ensure pod scheduling succeeds, and idle capacity accumulates invisibly across clusters. Understanding the full K8s cost stack — from individual container resource requests to cluster-level autoscaling and managed service pricing — is a prerequisite for meaningful optimisation. For commitment instrument strategy covering EKS, AKS, and GKE nodes, see the Reserved Instances vs Savings Plans guide.

The K8s Cost Architecture

Kubernetes costs operate at four distinct levels, each requiring separate optimisation strategies. At the container level, CPU and memory requests determine how much of a node's capacity is "reserved" for a pod — even if the pod never actually uses that capacity. At the pod level, replica counts, resource limits, and scheduling constraints determine pack density on nodes. At the node level, instance type selection, autoscaling boundaries, and Spot vs on-demand mix determine the cost of raw compute. At the cluster level, managed control plane fees, networking costs, storage, and observability tooling add overhead that compounds at scale.

The Request vs Actual Usage Gap

The root cause of most Kubernetes cost waste is the gap between resource requests (what pods ask for) and actual resource usage (what pods consume). In typical enterprise clusters, CPU requests exceed actual CPU usage by 3–5x and memory requests exceed actual usage by 2–3x. This gap creates clusters that are "full" according to the scheduler — no new pods can be scheduled — while the underlying nodes sit at 20–30% actual CPU utilisation. Right-sizing resource requests is the single highest-leverage Kubernetes cost optimisation action.

Pod and Container Right-Sizing

Pod right-sizing is the process of setting CPU and memory requests to values that accurately reflect actual resource consumption, with appropriate headroom for traffic spikes and garbage collection events. The standard approach is to collect 14–30 days of Prometheus metrics (or equivalent), identify the P95 CPU and memory utilisation for each container, and set requests to P95 + a safety margin (typically 10–20% for CPU, 20–30% for memory).
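The P95-plus-margin rule above can be sketched as a small script. All inputs here are hypothetical samples standing in for 14–30 days of exported Prometheus metrics; the margins follow the ranges given in the text:

```python
import math

def recommend_requests(cpu_samples_mcores, mem_samples_mib,
                       cpu_margin=0.15, mem_margin=0.25):
    """Derive CPU/memory requests from observed usage: P95 plus a safety margin.

    Margins follow the guide's ranges: 10-20% for CPU, 20-30% for memory.
    Inputs are per-interval usage samples (millicores / MiB).
    """
    def p95(samples):
        ordered = sorted(samples)
        return ordered[min(len(ordered) - 1, math.ceil(0.95 * len(ordered)) - 1)]

    return {
        "cpu_request_mcores": math.ceil(p95(cpu_samples_mcores) * (1 + cpu_margin)),
        "mem_request_mib": math.ceil(p95(mem_samples_mib) * (1 + mem_margin)),
    }

# Hypothetical container: idles around 120m CPU with spikes to 300m,
# and around 450 MiB memory with occasional 700 MiB peaks.
cpu = [120] * 90 + [300] * 10   # millicores
mem = [450] * 95 + [700] * 5    # MiB
print(recommend_requests(cpu, mem))
```

Because the P95 already sits above normal load, the resulting request covers spikes without reserving the 3–5x headroom that conservative hand-set requests typically carry.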


Vertical Pod Autoscaler (VPA)

The Kubernetes Vertical Pod Autoscaler automates resource request right-sizing by analysing historical usage and recommending (or automatically applying) updated CPU and memory request values. VPA's update modes include Off (recommendations only, no automatic updates), Initial (requests set only at pod creation), and Auto (requests updated dynamically, which currently requires evicting and recreating pods). For production workloads, VPA in recommendation mode is safe to run immediately; Auto mode requires careful testing as pod restarts can impact availability.

VPA recommendations typically identify 30–50% CPU over-provisioning in enterprise clusters. The most common over-provisioned containers are Java applications (which are often given 2–4 CPU requests but rarely use more than 0.5–1 CPU) and microservices with conservative initial estimates that were never revisited after deployment.

VPA and HPA Conflict

VPA and Horizontal Pod Autoscaler (HPA) cannot safely operate on the same resource simultaneously. If HPA is scaling replicas based on CPU utilisation, and VPA is simultaneously changing CPU requests, the scaling signals interfere — VPA reduces requests, HPA sees lower utilisation per pod, and the system oscillates. The safe configuration: use HPA for CPU-based horizontal scaling and VPA for memory-based right-sizing only, or use a controller like KEDA that avoids the conflict entirely.
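The interference can be seen in a toy simulation (all numbers hypothetical): VPA trims CPU requests toward observed per-pod usage while HPA scales replicas to hold utilisation at a 50% target, and the two signals chase each other instead of converging:

```python
import math

def simulate_conflict(demand_mcores=2000, request=1000, replicas=4, steps=4):
    """Toy model of the VPA/HPA feedback loop: each step applies an HPA-style
    replica update and a VPA-style request update against fixed total demand.
    Returns (replicas, request, utilisation) per step."""
    history = []
    for _ in range(steps):
        per_pod_usage = demand_mcores / replicas          # total demand is fixed
        utilisation = per_pod_usage / request
        history.append((replicas, request, round(utilisation, 2)))
        replicas = max(1, math.ceil(replicas * utilisation / 0.50))  # HPA step
        request = max(1, round(per_pod_usage))                       # VPA step
    return history

print(simulate_conflict())
```

Replicas climb while requests shrink, even though the workload's total demand never changes; splitting responsibilities (HPA on CPU, VPA on memory) removes the shared signal.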

Autoscaling Strategy

Kubernetes autoscaling operates at two levels: Horizontal Pod Autoscaler (HPA) scales the number of pod replicas based on CPU, memory, or custom metrics; Cluster Autoscaler (CA) scales the number of nodes based on pending pod scheduling requirements. Optimising both layers is essential to minimise costs while maintaining availability.

Horizontal Pod Autoscaler Tuning

The most common HPA misconfiguration is setting target CPU utilisation too low — typically 50% — out of fear of latency spikes. This keeps pods running at half their capacity, requiring twice as many replicas (and nodes) as necessary. For most stateless web services and APIs, a target CPU utilisation of 70–80% is appropriate; response time at this level is typically indistinguishable from 50%. Set appropriate minReplicas to ensure availability during scale-up lag, and use KEDA (Kubernetes Event-Driven Autoscaling) for workloads driven by queue depth or custom metrics rather than CPU.
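The replica impact of the target choice falls directly out of the HPA scaling rule, desiredReplicas = ceil(currentReplicas x currentMetric / target). A quick check with illustrative numbers:

```python
import math

def hpa_desired_replicas(current_replicas, current_utilisation, target):
    """The HPA scaling rule: desired = ceil(current * currentMetric / target)."""
    return math.ceil(current_replicas * current_utilisation / target)

# A stateless service whose 10 replicas average 40% CPU:
# a 50% target keeps 8 replicas, while a 75% target needs only 6.
print(hpa_desired_replicas(10, 0.40, 0.50))
print(hpa_desired_replicas(10, 0.40, 0.75))
```

Raising the target from 50% to 75% cuts steady-state replicas (and the nodes behind them) by a quarter in this example, with no change to the workload itself.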

Cluster Autoscaler Optimisation

Cluster Autoscaler (CA) adds nodes when pods are pending due to insufficient capacity and removes nodes that have been underutilised for a configurable period (scale-down-unneeded-time, default 10 minutes). Key tuning parameters for cost optimisation: keep scale-down-utilization-threshold at its default of 0.5 (lowering it leaves lightly loaded nodes running longer); shorten scale-down-delay-after-add below its 10-minute default in environments where bursts subside quickly; and set expander=least-waste so that when CA must add capacity it picks the node group leaving the least unused CPU and memory. Karpenter (AWS-native, with an Azure provider for AKS) offers faster, more cost-aware node provisioning than standard CA, with native Spot instance integration.

Node Pool Optimisation

Node pool design is one of the highest-impact K8s cost levers and one of the most frequently ignored. Most clusters start with a single node pool using a single instance type, which is convenient but suboptimal: different workload profiles (CPU-intensive, memory-intensive, batch, stateful) have different optimal instance types, and mixing them in a single pool means all workloads pay the price of the most conservative choice.


| Workload Type | Recommended Instance Family | Key Characteristic | Pool Strategy |
|---|---|---|---|
| General-purpose web/API | AWS m7i, Azure D-series, GCP N2 | Balanced CPU/memory ratio | On-demand + Spot mix |
| CPU-intensive (ML inference, encoding) | AWS c7i, Azure F-series, GCP C3 | High CPU:memory ratio | On-demand for SLAs |
| Memory-intensive (caching, in-memory DB) | AWS r7i, Azure E-series, GCP M3 | High memory:CPU ratio | On-demand (memory risk) |
| Batch / ML training | AWS p4/p5, Azure NC-series, GCP A3 | GPU acceleration | Spot preferred (70%+ savings) |
| System/monitoring pods | AWS t3/t4g, Azure B-series, GCP E2 | Burstable, low baseline | Small on-demand pool |

Spot and Preemptible Node Pools

Spot Instances (AWS), Azure Spot VMs, and GCP Spot VMs offer 60–91% discounts versus on-demand pricing for the same instance types. For Kubernetes workloads, Spot node pools are one of the most impactful cost levers available — but require architecture accommodations to handle preemption gracefully.

Spot-Compatible Workload Patterns

The following workload types are well-suited to Spot/preemptible nodes: batch processing jobs, ML model training, CI/CD pipeline runners, development and staging environments, stateless microservices with multiple replicas (where losing one node doesn't impact availability), and data processing pipelines with checkpointing. Workloads that are not suitable for Spot: stateful single-instance databases, leader-elected controllers, services with strict single-digit millisecond latency SLAs, and anything where a 30-second shutdown notice cannot be gracefully handled.

The optimal architecture for Spot-enabled clusters uses a mixed node pool strategy: a small on-demand baseline pool (sized for minimum viable capacity) plus a Spot pool for burst and batch capacity. Node affinity and taints/tolerations route appropriate workloads to each pool. Kubernetes Pod Disruption Budgets (PDBs) ensure rolling Spot replacements don't take down more than a defined percentage of a deployment's replicas simultaneously.
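The economics of this mixed-pool design are straightforward to model. The node rate and discount below are hypothetical placeholders, not quoted cloud prices:

```python
def blended_hourly_cost(total_nodes, on_demand_baseline, od_rate, spot_discount):
    """Hourly cost of a mixed pool: an on-demand baseline for minimum viable
    capacity plus Spot nodes for burst and batch capacity.

    od_rate ($/node-hour) and spot_discount (e.g. 0.70 for a 70% discount
    versus on-demand) are illustrative assumptions."""
    spot_nodes = total_nodes - on_demand_baseline
    return on_demand_baseline * od_rate + spot_nodes * od_rate * (1 - spot_discount)

# 20-node cluster with a 6-node on-demand baseline, $0.20/hr nodes, 70% Spot discount:
all_on_demand = 20 * 0.20
mixed = blended_hourly_cost(20, 6, 0.20, 0.70)
print(round(mixed, 2), round(1 - mixed / all_on_demand, 2))
```

Even with nearly a third of capacity kept on-demand for safety, the blended rate lands close to half of an all-on-demand cluster at these assumed prices.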


Namespace Quotas and LimitRanges

ResourceQuotas and LimitRanges are Kubernetes admission control mechanisms that constrain resource consumption at the namespace level. Without them, a single team can inadvertently (or deliberately) consume unlimited cluster resources — degrading performance for all other tenants and causing uncontrolled cost growth.

ResourceQuotas set hard caps on total CPU requests, memory requests, CPU limits, memory limits, and object counts (pods, services, PVCs) per namespace. LimitRanges set default and maximum resource requests and limits for individual containers in a namespace, ensuring that pods without explicit resource declarations are assigned defaults rather than running unconstrained. Together, these mechanisms enforce the resource governance that makes cluster cost attribution and budgeting possible. Without LimitRanges, developers frequently deploy pods with no resource requests — which the scheduler treats as having zero resource requirements, leading to nodes that appear to have spare capacity but are actually overloaded.
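The admission check these mechanisms perform can be sketched in a few lines. The quota values and LimitRange-style defaults below are hypothetical:

```python
def fits_quota(pod_requests, quota):
    """ResourceQuota-style check: the *sum* of requests across a namespace
    must stay under the hard cap. pod_requests is a list of
    (cpu_millicores, memory_mib) tuples."""
    total_cpu = sum(cpu for cpu, _ in pod_requests)
    total_mem = sum(mem for _, mem in pod_requests)
    return total_cpu <= quota["cpu_millicores"] and total_mem <= quota["memory_mib"]

DEFAULTS = (250, 512)  # LimitRange-style defaults applied to pods with no requests

quota = {"cpu_millicores": 4000, "memory_mib": 8192}  # e.g. requests.cpu: 4, requests.memory: 8Gi
pods = [(500, 1024)] * 6 + [DEFAULTS] * 2             # two pods relied on defaults
print(fits_quota(pods, quota))
```

Without the LimitRange defaults, the last two pods would count as zero toward the quota while still consuming real node capacity, which is exactly the invisible-overload failure mode described above.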

EKS, AKS, and GKE Cost Comparison

| Dimension | AWS EKS | Azure AKS | GCP GKE |
|---|---|---|---|
| Control plane cost | $0.10/hr per cluster ($73/mo) | Free (Free tier), $0.10/hr (Standard tier) | Free (1 zonal cluster), $0.10/hr (Standard/Autopilot) |
| Spot node support | Native Spot, Karpenter | Spot node pools, Karpenter | Spot pools, GKE Autopilot Spot |
| Managed autoscaling | Cluster Autoscaler or Karpenter | Cluster Autoscaler or Karpenter | GKE Autopilot (fully managed) |
| RI/commitment coverage | EC2 Savings Plans cover nodes | Azure RIs + Savings Plans | CUDs cover node compute |
| Fargate/serverless nodes | EKS Fargate (higher per-vCPU cost) | AKS Virtual Nodes (ACI) | GKE Autopilot (automated) |
| Cost visibility tooling | Container Insights + Cost Explorer | Container Insights + Cost Management | GKE Cost Allocation (native namespace breakdown) |

K8s Cost Tooling

Several open-source and commercial tools provide Kubernetes-specific cost visibility that cloud-native billing tools lack. OpenCost (CNCF project) provides real-time, namespace-level cost allocation based on node pricing and pod resource consumption — integrating with Prometheus and supporting all three major managed K8s services. Kubecost (commercial; built on OpenCost) adds cost efficiency scores, request right-sizing recommendations, and cluster health metrics. Goldilocks (open-source, by Fairwinds) automates VPA recommendation generation and presents them in a dashboard, making right-sizing analysis accessible without Prometheus expertise.

At the cloud provider level, GKE has the most mature native cost attribution — GKE Cost Allocation in the GCP Billing Console provides namespace and label-based cost breakdown without additional tooling. AWS Container Insights and Azure Monitor provide cluster metrics but require additional work to produce meaningful cost-per-namespace or cost-per-workload views.

Frequently Asked Questions

What is a realistic savings target for Kubernetes cost optimisation?
In enterprise clusters that have not been actively optimised, 30–50% cost reduction is achievable through a combination of resource request right-sizing (15–25% typical), Spot node pool introduction (additional 10–20% depending on eligible workload percentage), and autoscaling tuning (5–15%). Namespace quota enforcement and idle cluster cleanup (dev/staging clusters left running overnight or over weekends) can add another 10–15% on top. The total available savings depend heavily on the current maturity of the cluster's resource management practices.
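One subtlety in stacking these percentages: each optimisation applies to the bill left by the previous one, so the individual figures are not simply additive. A quick check using mid-range figures from the ranges above:

```python
def stacked_savings(reductions):
    """Sequential optimisations compound on the remaining bill,
    so individual percentages are not additive."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1 - r
    return 1 - remaining

# Mid-range figures from the ranges above: right-sizing 20%, Spot 15%,
# autoscaling tuning 10%, quota enforcement and idle cleanup 12%.
print(round(stacked_savings([0.20, 0.15, 0.10, 0.12]), 3))
```

The compounded total comes to roughly 46%, consistent with the 30–50% headline plus the additional cleanup layer.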
Should we use GKE Autopilot or Standard mode for cost optimisation?
GKE Autopilot eliminates node management overhead and charges only for pod-requested resources (not node capacity) — making it naturally more cost-efficient for workloads with highly variable resource requirements. Standard mode gives more control over node pool configuration, Spot integration, and custom autoscaling logic, which typically delivers better optimisation for mature FinOps teams. For teams without dedicated Kubernetes operational expertise, Autopilot is often the lower-waste option. For teams with FinOps maturity and mixed workloads, Standard mode with well-tuned node pools typically delivers lower cost at scale.
How do we apply cloud commitment discounts to Kubernetes node costs?
Cloud commitment instruments (AWS Savings Plans, Azure RIs, GCP CUDs) apply to the underlying EC2/Azure VM/Compute Engine instances running as Kubernetes nodes — not to the K8s construct itself. AWS Compute Savings Plans are particularly well-suited to K8s node pools because they cover any instance type in any region, accommodating Karpenter's dynamic instance selection. For Azure AKS, Azure Savings Plans for Compute or Reserved VM Instances cover node VMs. For GCP GKE Standard, node CUDs apply to the Compute Engine VMs. See the RI vs Savings Plans guide for detailed commitment instrument selection guidance.
What is the cost impact of running multiple small clusters vs one large cluster?
Multiple small clusters increase overhead costs: on AWS EKS, each cluster costs $73/month in control plane fees; even five clusters add $365/month in base overhead. More significantly, small clusters have lower pack efficiency — a cluster with 10 nodes cannot achieve the same bin-packing density as a 100-node cluster. Multi-tenant large clusters with proper namespace isolation (quotas, RBAC, network policies) typically achieve 15–30% better node utilisation than equivalent capacity spread across multiple clusters. The exceptions are regulatory/compliance requirements that mandate cluster isolation, and clusters in different regions for latency or data sovereignty reasons.
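The two effects in that answer, linear control plane overhead and bin-packing loss, can be checked with a short sketch. The utilisation figures are illustrative, not measurements:

```python
import math

EKS_CONTROL_PLANE_MONTHLY = 73.0  # $0.10/hr per cluster, from the comparison table

def overhead(clusters):
    """Control plane fees grow linearly with cluster count."""
    return clusters * EKS_CONTROL_PLANE_MONTHLY

def nodes_needed(node_equivalents_of_work, achievable_utilisation):
    """Lower bin-packing density in small clusters means more nodes
    for the same workload. Utilisation figures here are assumed."""
    return math.ceil(node_equivalents_of_work / achievable_utilisation)

print(overhead(5))                      # five EKS clusters: $365/mo in fees alone
print(nodes_needed(60, 0.55))           # many small clusters at 55% utilisation
print(nodes_needed(60, 0.70))           # one large cluster at 70% utilisation
```

At these assumed utilisation levels, consolidation saves roughly two dozen nodes for the same workload, which typically dwarfs the control plane fee difference.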
