
Error #22 - Pod Eviction Error: Troubleshoot and Fix

Pod eviction in Kubernetes is a critical mechanism that preserves cluster stability under resource constraints, node failures, or policy enforcement. By understanding why evictions happen and how to respond, you can minimize disruptions and keep your workloads running smoothly.

IN TODAY'S EDIT

Use Case

Pod Eviction Error: Troubleshoot and Fix

🚀 Top News

Siri's Silent Listen: Apple's $95 million privacy settlement and what it means for you

📚️ Resources:

Learn New Thing: A tutorial for Selenium automation testing tool lovers.

Want to prepare for interviews & certifications?

USE CASE

Pod Eviction Error: Troubleshoot and Fix

Pod eviction in Kubernetes refers to the process where a running pod is forcibly terminated and removed from a node. Evictions can happen due to resource constraints, policy enforcement, or node failures. Understanding why evictions occur, how to troubleshoot them, and how to prevent them is crucial for maintaining a stable Kubernetes cluster.

Pod Eviction in Kubernetes

Pod eviction occurs when Kubernetes removes a pod from a node, either as part of a voluntary disruption (e.g., a node drain during maintenance) or involuntarily (e.g., resource pressure on the node or a node failure). Unlike pod deletion, which is triggered by a user, eviction is typically initiated by Kubernetes itself.

Evictions can be initiated by:

  • The Kubelet (node-level component)

  • The Kubernetes Scheduler (based on node availability)

  • The Cluster Autoscaler

  • Taints and Tolerations policies

  • Pod Disruption Budgets (PDBs) during voluntary disruptions

Causes of Pod Eviction

A. Resource Pressure on the Node

  1. Memory Pressure: If available memory on a node falls below the eviction threshold, the kubelet reports MemoryPressure and starts evicting pods, beginning with those using more memory than they requested.

  2. CPU Pressure: CPU is a compressible resource, so contention normally causes throttling rather than kubelet eviction; severe CPU starvation can still fail probes and degrade workloads.

  3. Disk Pressure: If available disk space or inodes fall below the threshold, the kubelet reports DiskPressure and evicts pods.

  4. PID Pressure: If the node runs low on process IDs (PIDs), the kubelet reports PIDPressure and evicts pods.

How to Identify Resource Pressure?

Run the following command: kubectl describe node <node-name>

Look for conditions like:

  • MemoryPressure: True

  • DiskPressure: True

  • PIDPressure: True
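
Alternatively, you can print just the condition types and statuses with a jsonpath query:

kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'

Any of these pressure conditions reporting True means the kubelet may already be evicting pods on that node.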

B. Node Failures or Unresponsiveness

  • If a node becomes unreachable due to a network issue, hardware failure, or crashes, the pods on that node may be evicted.

  • Kubernetes marks the node as NotReady; once the pod's toleration for the node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taint expires (5 minutes by default), the pod is evicted and a replacement is scheduled on another node.

Check Node Status: kubectl get nodes

If a node is NotReady, the pods on it will be evicted.

C. Taints and Tolerations

  • Nodes can have taints, which prevent pods from being scheduled on them unless the pods carry a matching toleration.

  • If a node has a NoExecute taint that a running pod does not tolerate, the pod is evicted; NoSchedule taints only block new pods from being scheduled.

Check for Taints: kubectl describe node <node-name> | grep Taint

D. Pod Disruption Budgets (PDBs)

  • PDBs define the minimum number (or percentage) of pods that must remain available during voluntary disruptions (e.g., node drains during upgrades).

  • Eviction requests that would push availability below the budget are blocked until enough healthy replacement pods exist.

Check PDBs: kubectl get pdb

E. Cluster Autoscaler

  • When scaling down nodes in a cluster, pods may be evicted if they don’t fit on remaining nodes.

  • Non-PDB-protected pods are more likely to be evicted.
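
If the standard Cluster Autoscaler should leave a particular pod alone during scale-down, you can opt it out with its safe-to-evict annotation on the pod (or on the controller's pod template). A minimal fragment:

metadata:
  annotations:
    # Ask the Cluster Autoscaler not to evict this pod when scaling down
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"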

F. Failed Liveness or Readiness Probes

  • If a pod continuously fails its liveness probe, the kubelet restarts its containers and the pod can end up in CrashLoopBackOff; its controller may then replace it. Readiness probe failures remove the pod from Service endpoints rather than evicting it.

Check Pod Events: kubectl describe pod <pod-name>

Troubleshooting Pod Evictions

Step 1: Check Eviction Events

Run: kubectl get events --sort-by=.metadata.creationTimestamp

Look for events mentioning eviction, such as:

Pod <pod-name> evicted: node had insufficient memory.
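
Pods evicted for node pressure are also left behind in the Failed phase with reason Evicted until they are cleaned up, so you can list them directly with either of these commands:

kubectl get pods -A --field-selector=status.phase=Failed

kubectl get pods -A | grep Evicted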

Step 2: Describe the Pod

kubectl describe pod <pod-name>

This will show detailed information about why the pod was evicted.

Step 3: Check Node Conditions

kubectl describe node <node-name>

Look for MemoryPressure, DiskPressure, or PIDPressure.

Step 4: Identify Taints

kubectl describe node <node-name> | grep Taint

If there are taints, ensure the pod has a corresponding toleration.

Step 5: Check Pod Disruption Budgets (PDBs)

kubectl get pdb -A

If PDBs are too strict, voluntary evictions (such as node drains during upgrades) are blocked and can stall maintenance; if they are too loose, your workloads get little protection during disruptions.

Step 6: Check Autoscaler Logs

If cluster autoscaling is enabled, check logs:

kubectl logs -n kube-system deployment/cluster-autoscaler

Step 7: Check Liveness and Readiness Probes

kubectl describe pod <pod-name> | grep -A10 Liveness

If the probes fail frequently, the kubelet keeps restarting the containers and the pod can land in CrashLoopBackOff; tune the probe endpoint, timings, and failureThreshold rather than letting it flap, as in the sketch below.
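
A minimal probe sketch for a container spec, assuming the app exposes a /healthz endpoint on port 8080 (both are placeholders for your application):

livenessProbe:
  httpGet:
    path: /healthz          # placeholder health endpoint
    port: 8080              # placeholder container port
  initialDelaySeconds: 15   # give the app time to start before probing
  periodSeconds: 10
  failureThreshold: 3       # restart only after three consecutive failures
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5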

Preventive Measures for Pod Evictions

A. Optimize Resource Requests and Limits

Ensure your pod requests and limits are well defined, as in the sketch below.
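
A minimal sketch of a pod with explicit requests and limits; the name, image, and values are placeholders and should be sized from observed usage:

apiVersion: v1
kind: Pod
metadata:
  name: web-app              # placeholder name
spec:
  containers:
    - name: app
      image: nginx:1.25      # placeholder image
      resources:
        requests:            # what the scheduler reserves on the node
          cpu: "250m"
          memory: "256Mi"
        limits:              # hard caps enforced at runtime
          cpu: "500m"
          memory: "512Mi"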

B. Use Quality of Service (QoS) Classes

Kubernetes prioritizes pods based on QoS:

  1. Guaranteed: Every container has CPU and memory requests equal to its limits (least likely to be evicted).

  2. Burstable: At least one container has requests, but requests and limits are not all equal.

  3. BestEffort: No requests or limits at all (first to be evicted).

Use Guaranteed QoS for critical workloads.
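
A minimal sketch of the resources block that yields Guaranteed QoS; it must appear under every container in the pod spec, and the values are placeholders:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:                    # identical to requests, so the pod is classed as Guaranteed
    cpu: "500m"
    memory: "512Mi"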

C. Set Pod Priority Classes

Assign higher priority to critical workloads:
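
A sketch of a PriorityClass and a pod that references it; the class name and value are placeholders, so pick a value that fits your cluster's priority scheme:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-priority      # placeholder class name
value: 1000000                 # higher value = scheduled and retained ahead of lower-priority pods
globalDefault: false
description: "For business-critical workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app           # placeholder name
spec:
  priorityClassName: critical-priority
  containers:
    - name: app
      image: nginx:1.25        # placeholder image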

D. Adjust Eviction Thresholds

Modify eviction thresholds in the kubelet configuration, for example:

--eviction-hard=memory.available<200Mi,nodefs.available<10%,imagefs.available<15%
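
If you manage the kubelet through a configuration file rather than flags, the same thresholds look like this (values copied from the flag above):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"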

E. Implement Pod Anti-Affinity

Spread replicas across nodes so a single node failure or eviction doesn't take them all out at once:
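
A minimal sketch for a Deployment's pod template, assuming its pods carry the label app: web (the label and topology key are the knobs to adjust):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web                          # placeholder label selecting this workload's pods
        topologyKey: kubernetes.io/hostname   # at most one matching pod per node

Use preferredDuringSchedulingIgnoredDuringExecution instead if you want soft spreading rather than a hard scheduling constraint.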

F. Use Tolerations and Node Selectors

Allow pods to run on tainted nodes:
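
A sketch of a pod spec fragment, assuming a node tainted with dedicated=critical:NoExecute and labeled node-role=critical (taint key, value, and label are placeholders):

tolerations:
  - key: "dedicated"           # placeholder taint key
    operator: "Equal"
    value: "critical"          # placeholder taint value
    effect: "NoExecute"        # matching the taint effect keeps the pod from being evicted
nodeSelector:
  node-role: "critical"        # placeholder node label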

G. Ensure Nodes Have Sufficient Resources

Monitor node capacity (requires the metrics-server add-on): kubectl top node

Scale up the cluster if necessary.

H. Use Pod Disruption Budgets (PDBs) Wisely

Avoid over-restricting PDBs. For example, minAvailable: 2 ensures at least two pods remain available during voluntary disruptions; if that equals your replica count, drains will be blocked entirely. A full manifest follows below.
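
A minimal PodDisruptionBudget sketch, assuming the workload's pods carry the label app: web (placeholder):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                # placeholder name
spec:
  minAvailable: 2              # keep at least two pods running during voluntary disruptions
  selector:
    matchLabels:
      app: web                 # placeholder label selecting the workload's pods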
