Azure Cloud | June 14, 2026 | 6 min read

Kubernetes on Azure: 5 YAML Tweaks That Cut Cluster Cost in Half

Techseria Team

Most AKS clusters are dramatically over-provisioned. Not because the engineers who built them were reckless — but because Kubernetes cost optimisation requires intentional configuration that does not happen by default.

The default AKS cluster: fixed node count, no autoscaling, resource requests set to match the biggest load spike the developer can imagine, no namespace quotas, all workloads running on on-demand nodes. The result: 60–70% of provisioned compute capacity running unused, billed at full price, every hour of every day.

This article covers five specific YAML configurations that address the five biggest sources of AKS waste. Each section includes the actual YAML, an explanation of what it does, and — where applicable — before/after cost data.

Real client outcome for context: a UK SaaS company running a 40-node AKS cluster (a mix of Standard_D4s_v3 nodes) was paying £18,400/month. After implementing all five configurations over 6 weeks, the bill fell to £9,100/month, a 51% reduction with no degradation in application performance or availability.

Fix 1: Set Accurate Resource Requests and Limits

Cost impact: typically the single largest source of AKS waste

Resource requests are the values Kubernetes uses to schedule pods onto nodes. If a pod requests 2 vCPU and 4GB RAM, the Kubernetes scheduler reserves that capacity on a node, even if the pod is actually using 0.3 vCPU and 800MB in production.

The consistent finding across AKS audits: developers set resource requests to 3–5x actual utilisation. It feels safe. In aggregate, it means nodes fill up based on reserved capacity, not actual usage — forcing the cluster to provision more nodes than the actual workload requires.

Diagnosing your current waste:

First, check actual resource usage vs. requests:

# For each pod, compare requests to actual usage
kubectl top pods --all-namespaces --sort-by=cpu

# For nodes, see requested vs. allocatable:
kubectl describe nodes

If your node description shows "CPU Requests: 3800m (95%)" but `kubectl top node` shows actual CPU usage at 28%, you have a resource request inflation problem.

The YAML fix — before and after:

# BEFORE: Developer's "safe" estimate
resources:
  requests:

After running the application in production for 30 days and sampling actual metrics from Azure Monitor (Container Insights), the actual P99 usage was 180m CPU and 380MB memory.

# AFTER: Based on 30-day P99 actual usage + 30% headroom
resources:
  requests:

Impact: With accurate requests, 3x more pods fit on each node. A 12-node cluster that was scheduling 4 pods per node can now schedule 12 pods per node — reducing the required node count from 12 to 4 for the same workload. Node cost reduction: 67%.
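The node-count arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes pods are uniform and that packing density is the only constraint; it ignores DaemonSet overhead, system-reserved capacity, and memory limits, so treat the result as an upper bound on the saving. The function name `nodes_required` is illustrative, not part of any Kubernetes tooling.

```python
import math

def nodes_required(total_pods: int, pods_per_node: int) -> int:
    """Smallest node count that fits total_pods at the given packing density."""
    return math.ceil(total_pods / pods_per_node)

# 12 nodes x 4 pods per node before right-sizing
total_pods = 12 * 4

before = nodes_required(total_pods, pods_per_node=4)
after = nodes_required(total_pods, pods_per_node=12)
reduction = 1 - after / before

print(f"nodes before: {before}, after: {after}, reduction: {reduction:.0%}")
```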

How to do this safely:

Deploy Container Insights on your AKS cluster (Azure Monitor integration)
Run workloads for 2–4 weeks under real production load
Query P95/P99 CPU and memory usage per container from Azure Monitor
Set requests to P95 actual + 20% headroom; set limits to 2x requests
Deploy changes in a staging environment first; validate with load testing
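The sizing rule in the steps above (P95 plus 20% headroom, limits at 2x requests) can be sketched as follows. The sample values are made up for illustration, and the nearest-rank percentile is an assumption; Azure Monitor may compute percentiles differently, so prefer its figures where available.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def recommend(samples, pct=95, headroom=0.20, limit_factor=2.0):
    """Return (request, limit): percentile plus headroom, limit = factor x request."""
    request = round(percentile(samples, pct) * (1 + headroom))
    limit = round(request * limit_factor)
    return request, limit

# Hypothetical per-container CPU samples, in millicores
cpu_samples = [90, 110, 120, 130, 140, 150, 150, 160, 170, 180]

cpu_request, cpu_limit = recommend(cpu_samples)
print(f"cpu request: {cpu_request}m, cpu limit: {cpu_limit}m")
```

The same function works for memory samples in MiB. Rounding the request up to a convenient value (for example 250m rather than 216m) is a reasonable manual step afterwards.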

Fix 2: Enable and Configure the Cluster Autoscaler

Cost impact: eliminates over-provisioned base node count

AKS cluster autoscaler scales the node pool up when pods cannot be scheduled (insufficient capacity) and down when nodes have been underutilised for a configurable period. Without it, you manually set a node count and pay for it whether the load justifies it or not.

Out of the box, the autoscaler settings tend to work against cost in two ways: the scale-down utilisation threshold is too conservative (nodes never qualify for removal) and the scale-down delay is too long (nodes linger after load drops).

Enable cluster autoscaler on your node pool:

# Node pool with autoscaler enabled
apiVersion: v1
kind: NodePool  # This is illustrative; use az aks nodepool update or ARM template

Via Azure CLI:

az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \

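Tune the autoscaler ConfigMap for faster scale-down (on AKS, where the autoscaler is managed, the same settings are applied with `az aks update --cluster-autoscaler-profile`):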

apiVersion: v1
kind: ConfigMap
metadata:

Important: set Pod Disruption Budgets to control scale-down behaviour:

Without PDBs, the autoscaler may terminate nodes with running pods too aggressively, causing availability issues. Define PDBs for all production workloads:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
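To see how a PDB limits scale-down, it helps to model the budget as simple arithmetic. The sketch below assumes a budget expressed as `minAvailable` and treats a node drain as evicting all matching pods at once. That is a deliberate simplification: in practice pods are evicted one at a time and replacements may start on other nodes. The function names are illustrative.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a minAvailable budget permits right now (never negative)."""
    return max(healthy_pods - min_available, 0)

def can_drain(pods_on_node: int, healthy_pods: int, min_available: int) -> bool:
    """True if every matching pod on the node could be evicted without breaching the budget."""
    return pods_on_node <= disruptions_allowed(healthy_pods, min_available)

# Hypothetical deployment: 5 healthy replicas, budget of minAvailable: 3
print(disruptions_allowed(5, 3))           # evictions currently permitted
print(can_drain(2, 5, 3), can_drain(3, 5, 3))
```

A budget that equals the replica count permits zero disruptions, which blocks scale-down of any node hosting those pods. Leaving at least one replica of slack keeps the autoscaler effective.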

Cost impact for the 40-node cluster:

Before: static 40-node cluster, minimum 35 nodes even at off-peak. After: autoscaler min 8 nodes, max 50 nodes. Off-peak (nights/weekends): scales to 8–12 nodes. Weekday peak: scales to 30–40 nodes.

Effective node-hours reduction: ~35%. Monthly saving on node costs alone: ~£3,500.

Fix 3: Add a Spot Node Pool for Fault-Tolerant Workloads

Cost impact: 60–80% cost reduction on eligible workloads

AKS supports mixed node pools — you can have an on-demand node pool for production-critical workloads and one or more Spot node pools for workloads that can tolerate interruption.

Workloads suitable for Spot node pools:

Background job processors
Queue workers
Batch data transformation
Cache warmers
Non-production environments (dev, staging, QA running in the same cluster)

Creating a Spot node pool:

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \

Schedule workloads onto the Spot node pool using tolerations:

Spot nodes are automatically tainted with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`. To schedule a workload onto Spot nodes, add the corresponding toleration and a node affinity:

apiVersion: apps/v1
kind: Deployment
metadata:

Handle SIGTERM in your application for graceful Spot eviction:

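import signal
import sys

def handle_sigterm(signum, frame):
    # Finish or hand back in-flight work here, then exit cleanly
    sys.exit(0)

# Register the handler so eviction triggers a graceful shutdown
signal.signal(signal.SIGTERM, handle_sigterm)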

Cost impact for the 40-node cluster:

The company had 12 nodes running queue workers and batch jobs — workloads that could tolerate interruption. Moving these to Spot:

Before: 12 × Standard_D4s_v3 on-demand = £1,920/month
After: 12 × Standard_D4s_v3 Spot (~75% discount) = £480/month
Monthly saving: £1,440
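The same calculation as a short Python sketch, using the per-node price implied by the figures above (£1,920 / 12 nodes). The 75% discount is the article's approximation; actual Spot pricing varies by region and over time.

```python
def monthly_cost(nodes: int, on_demand_per_node: float, discount: float = 0.0) -> float:
    """Monthly cost in GBP for a pool of identical nodes at a given discount."""
    return nodes * on_demand_per_node * (1 - discount)

on_demand_per_node = 1920 / 12   # £160 per node per month, implied by the figures above

on_demand = monthly_cost(12, on_demand_per_node)
spot = monthly_cost(12, on_demand_per_node, discount=0.75)

print(f"on-demand: £{on_demand:,.0f}, spot: £{spot:,.0f}, saving: £{on_demand - spot:,.0f}")
```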

Fix 4: Deploy Vertical Pod Autoscaler (VPA)

Cost impact: eliminates static resource waste as load patterns change

Even after right-sizing resource requests (Fix 1), applications evolve. A service that needed 250m CPU in January may need 400m in June as user traffic grows. Without ongoing adjustment, requests drift out of alignment with reality — either too low (causing OOMKills and throttling) or too high (wasting capacity).

Vertical Pod Autoscaler (VPA) automatically recommends and optionally adjusts resource requests based on actual usage history. It is the automation layer that keeps resource requests accurate over time.

Install VPA on AKS:

# VPA is not installed by default on AKS; add it via the add-on or a manual install
az aks update \
  --resource-group myResourceGroup \

Configure VPA for a deployment (Recommendation mode — safe start):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:

Check VPA recommendations:

kubectl describe vpa api-server-vpa -n production

# Output includes:
# Target (what VPA recommends): cpu: 180m, memory: 340Mi

Use these recommendations to update your deployment's resource requests, then consider switching VPA to `Auto` mode for automatic application.
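When reviewing recommendations by hand, a simple drift check helps decide which deployments are worth touching. The sketch below compares a current CPU request with the VPA target and flags differences beyond a tolerance. The 15% tolerance and the function names are assumptions for illustration, and only the `m` suffix and plain core counts are parsed.

```python
def parse_cpu(value: str) -> int:
    """Convert a Kubernetes CPU quantity ('180m' or '0.5') to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)

def needs_update(current: str, target: str, tolerance: float = 0.15) -> bool:
    """True when the current request differs from the VPA target by more than tolerance."""
    cur, tgt = parse_cpu(current), parse_cpu(target)
    return abs(cur - tgt) / tgt > tolerance

# Target of 180m taken from the example output above; current values are hypothetical
print(needs_update("250m", "180m"))   # large gap, worth updating
print(needs_update("190m", "180m"))   # within tolerance, leave alone
```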

Note on VPA + HPA interaction: Do not use VPA in `Auto` mode simultaneously with Horizontal Pod Autoscaler (HPA) on CPU/memory metrics. They will conflict. Use HPA for scaling replicas based on custom metrics (queue depth, request rate) and VPA for right-sizing resource requests.

Fix 5: Implement Namespace Resource Quotas

Cost impact: prevents resource sprawl and enforces cost accountability

Without namespace quotas, any team or developer with cluster access can deploy workloads with any resource requests they choose — including unreasonably high requests that prevent other workloads from scheduling efficiently. This is a governance issue that directly impacts cost.

Namespace quotas enforce a ceiling on total resource consumption within a namespace. Combined with LimitRange objects that set default requests/limits for pods that do not specify them, quotas create cost accountability at the team level.

ResourceQuota for a production namespace:

apiVersion: v1
kind: ResourceQuota
metadata:

LimitRange to enforce defaults (prevents pods with no resource spec from consuming unbounded resources):

apiVersion: v1
kind: LimitRange
metadata:
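The way the two objects interact can be modelled in a few lines: a pod that omits a CPU request receives the LimitRange default, and the resulting request is then counted against the namespace quota. This is a simplified sketch of that admission logic for CPU requests only, with made-up numbers; the real admission controllers also handle limits, memory, and object counts.

```python
def effective_request(pod_cpu_m, default_request_m):
    """A pod without a CPU request receives the LimitRange default."""
    return default_request_m if pod_cpu_m is None else pod_cpu_m

def admits(pod_cpu_m, used_m, quota_m, default_request_m):
    """True if the pod's effective CPU request still fits under the namespace quota."""
    return used_m + effective_request(pod_cpu_m, default_request_m) <= quota_m

# Hypothetical namespace: 4000m quota, 100m default request, 3800m already in use
print(admits(None, used_m=3800, quota_m=4000, default_request_m=100))   # default applied, fits
print(admits(4000, used_m=500, quota_m=4000, default_request_m=100))    # oversized request, rejected
```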

Cost impact for the 40-node cluster:

Before namespace quotas, the development namespace was consuming resources equivalent to 8 production-grade nodes — developers had deployed test workloads with 4-vCPU requests that ran continuously. After implementing namespace quotas with a 4-vCPU total limit for the development namespace and auto-shutdown schedules, those 8 nodes were freed.

Full Before/After Cost Summary: 40-Node Cluster

| Optimisation | Monthly saving |
|---|---|
| Right-sized resource requests | £3,200 |
| Cluster autoscaler (off-peak scale-down) | £3,500 |
| Spot node pool for batch workloads | £1,440 |
| VPA right-sizing (ongoing drift correction) | £800 |
| Namespace quotas (dev environment control) | £1,200 |
| Remaining monthly bill | £9,100 (was £18,400) |
| Total monthly saving | £9,300 (51%) |

Implementation timeline: 6 weeks. Engineering investment: approximately 60 hours of senior DevOps/SRE time across 3 weeks (60 × £150/hour = £9,000 one-time cost). Payback period: about 1 month.
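The headline figures can be derived from the two bill amounts and the one-time cost. The sketch below uses only those three numbers from the article, so the saving here is the net change in the bill rather than a sum of the individual line items.

```python
bill_before = 18_400      # GBP per month before optimisation
bill_after = 9_100        # GBP per month after optimisation
one_time_cost = 9_000     # GBP engineering investment

monthly_saving = bill_before - bill_after
saving_pct = monthly_saving / bill_before
payback_months = one_time_cost / monthly_saving

print(f"monthly saving: £{monthly_saving:,} ({saving_pct:.0%})")
print(f"payback: {payback_months:.1f} months")
```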

Not Sure Where to Start?

These five configurations reduce AKS cost by 40–60% in most environments. But implementing them correctly — without introducing latency regressions, OOMKills, or scheduling failures — requires careful workload profiling and testing before production rollout.

Techseria's AKS Cost Optimisation engagement includes:

Full cluster audit (resource requests vs. actual usage, node utilisation, namespace analysis)
Configuration design for all five optimisations, tailored to your workload profile
Staging environment testing and performance validation
Production rollout with monitoring and rollback plan
Ongoing Azure Cost Management dashboards for AKS spend visibility

Fixed-fee from £12,000 for clusters up to 50 nodes. Typical saving: 8–15x the engagement cost in the first year.

[Book a Strategy Session](https://techseria.com/contact) to discuss your AKS environment and get a specific cost reduction estimate for your cluster configuration.
