Error #1: "Resource Exhausted" — Troubleshooting and Fix

Regular monitoring and proactive resource management are essential to maintaining a stable Kubernetes environment.

DevOps Diaries

Hey — It's Avinash Tietler 👋

Here you get use cases, top news, tools, and articles from the DevOps world.

IN TODAY'S EDITION

🔬Use Case:

Why am I getting a "Resource Exhausted" error in Kubernetes?

📺Top News:

📢 Remote Job:

EXL is hiring a DevOps Engineer - Location: Worldwide (Remote)

📑Resource:

USE CASE

Why am I getting a "Resource Exhausted" error in Kubernetes?

Possible Causes

  1. Insufficient Node Resources: The cluster nodes do not have enough CPU or memory to schedule new pods.

  2. Improper Resource Requests and Limits: Pods request more resources than are available, or no resource limits are defined, leading to over-utilization.

  3. High Cluster Workload: The cluster is running too many pods, causing resource contention.

  4. Evictions: Nodes under resource pressure evict pods to reclaim resources.

Step-by-Step Resolution

Identify the Cause

  • Check Pod Events:

    kubectl describe pod <pod-name>

Look for messages like Insufficient CPU or Insufficient memory.

  • Check Node Status:

    kubectl get nodes

    kubectl describe node <node-name>

    Review node resource capacity and utilization.

  • Monitor Resource Usage: Use metrics-server or tools like Prometheus and Grafana to monitor CPU and memory usage across the cluster.
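With metrics-server installed, `kubectl top` gives a quick snapshot of current usage without a full monitoring stack:

```shell
# Per-node CPU and memory usage (requires metrics-server)
kubectl top nodes

# Per-pod usage across all namespaces, highest memory consumers first
kubectl top pods --all-namespaces --sort-by=memory
```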

Verify Resource Requests and Limits

  • Ensure that each pod has proper resource requests and limits defined.

    resources:
      requests:
        memory: "128Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"
        cpu: "1000m"

  • Check existing pod configurations:

    kubectl get pods -o yaml

Adjust Resource Requests and Limits

  • Modify Resource Configuration: Update resource requests and limits in the pod specification to match actual needs.

kubectl edit deployment <deployment-name>
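In the deployment manifest, the `resources` stanza belongs under each container in the pod template. A minimal sketch (the container name and image are placeholders):

```yaml
spec:
  template:
    spec:
      containers:
        - name: app            # placeholder container name
          image: app:latest    # placeholder image
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "256Mi"
              cpu: "500m"
```

Requests are what the scheduler reserves; limits are the hard cap enforced at runtime.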

Scale the Application

  • If the workload is high, scale up the application using replicas:

    kubectl scale deployment <deployment-name> --replicas=<number>

Add More Nodes to the Cluster

  • For On-Prem Clusters: Add more nodes manually.

  • For Cloud Providers: Enable cluster auto-scaling. Example (GKE):

gcloud container clusters update <cluster-name> --enable-autoscaling --min-nodes=<min> --max-nodes=<max>

Implement Resource Quotas

  • Limit resource usage per namespace to prevent overuse:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: resource-quota
      namespace: <namespace>
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: "8Gi"
        limits.cpu: "8"
        limits.memory: "16Gi"
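A LimitRange can complement the quota by giving default requests and limits to containers that declare none — closing the "no resource limits defined" gap noted above. The values below are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: <namespace>
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no requests
        cpu: "250m"
        memory: "128Mi"
      default:               # applied when a container sets no limits
        cpu: "500m"
        memory: "256Mi"
```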

Enable Cluster Autoscaler

  • Use Kubernetes Cluster Autoscaler to dynamically add nodes when resources are exhausted.

Optimize Existing Resources

  • Identify and terminate unused pods.

  • Reduce resource limits for less critical applications.

Monitor and Prevent Future Issues

  • Set up alerting for high resource utilization.

  • Use Vertical Pod Autoscaler (VPA) or Horizontal Pod Autoscaler (HPA) for dynamic scaling.
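As a sketch, an HPA that scales a deployment on CPU utilization might look like this (the name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <deployment-name>
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```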

By identifying the root cause, ensuring proper resource configuration, and scaling your cluster appropriately, you can resolve and prevent "Resource Exhausted" errors. Regular monitoring and proactive resource management are essential to maintaining a stable Kubernetes environment.
