Error #3: OOMKilled Troubleshoot and Fix
Kubernetes has a lot of built-in capabilities to ensure your workloads get enough CPU and memory to stay healthy. However, misconfiguration is a common reason why Kubernetes kills pods even when the workload itself is behaving normally.

DevOps Diaries
Hey — It's Avinash Tietler 👋
Here you get use cases, top news, tools, and articles from the DevOps world.
IN TODAY'S EDITION
Use Case
Troubleshoot and Fix Kubernetes OOMKilled error
🚀 Tech News
👀 Remote Jobs
Senior Technical Lead - Proximity Works | Location: Worldwide (Remote)
📚️ Resources
USE CASE
Troubleshoot and Fix Kubernetes OOMKilled error
OOMKilled (Out of Memory Killed) occurs in Kubernetes when a container exceeds the memory limit specified in its resource configuration. The kubelet, in coordination with the container runtime and the kernel's cgroups, tracks and enforces each container's memory usage.
When a container uses more memory than its allocated limit, the kernel's Out of Memory (OOM) Killer terminates the container to protect the system from running out of memory.
Possible Reasons Behind OOMKilled
Insufficient Memory Limits
The memory limit specified in the container's resource configuration is too low for its actual workload.
Memory Leak in Application
The application running inside the container has a memory leak, causing it to consume increasing amounts of memory over time.
Unexpected High Workload
A sudden surge in traffic or workload might cause the application to use more memory than anticipated.
Improper Resource Allocation
Containers are deployed without specifying resource limits, leading to unbounded memory usage and competition for system resources.
Misconfigured Applications
Applications are configured to use more memory than the container is allowed.
Multiple Containers on the Same Node
If multiple containers run on the same node, one container consuming excessive memory can lead to eviction of the others.
Node Resource Exhaustion
The node itself may not have enough memory to handle all the containers running on it.
Unoptimized Code or Queries
Poorly optimized application code or inefficient database queries can lead to excessive memory usage.
How to Resolve OOMKilled
Analyze Logs
Use kubectl logs <pod-name> -c <container-name> to check the container logs for any application-specific issues that occurred before the termination (add --previous to read the logs of the killed instance once the container has restarted).
Investigate system logs or use kubectl describe pod <pod-name> to get more details about the OOMKilled event.
Increase Memory Limits
Update the container's memory limits in the resources section of the Pod's YAML file.
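For example, a minimal Pod sketch with a raised memory limit (the pod name, container name, image, and values are illustrative placeholders, not from the original article):
apiVersion: v1
kind: Pod
metadata:
  name: demo-app               # hypothetical name
spec:
  containers:
  - name: app                  # hypothetical container name
    image: nginx:1.27          # placeholder image
    resources:
      requests:
        memory: "256Mi"        # amount the scheduler reserves for this container
      limits:
        memory: "512Mi"        # hard ceiling; exceeding it triggers an OOMKill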
Apply the changes using kubectl apply -f <file.yaml>.
Optimize Application Code
Fix memory leaks by identifying inefficient code or dependencies.
Perform load testing to ensure the application handles memory efficiently.
Use Vertical Pod Autoscaler (VPA)
Deploy a Vertical Pod Autoscaler to dynamically adjust memory and CPU requests for the container.
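A minimal VPA sketch, assuming the VPA components are installed in the cluster and the workload is a Deployment named demo-app (a placeholder name):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app             # hypothetical Deployment to autoscale
  updatePolicy:
    updateMode: "Auto"         # VPA may evict pods to apply new requests
In Auto mode the VPA recreates pods to apply its recommendations, so pair it with a PodDisruptionBudget if availability matters.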
Scale Out Workload
Horizontal scaling can reduce memory usage per container by distributing the workload across multiple replicas:
kubectl scale deployment <deployment-name> --replicas=<number>
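Scaling out can also be automated with a HorizontalPodAutoscaler driven by memory utilization; a sketch assuming the metrics server is installed and a Deployment named demo-app (a placeholder):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app             # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75 # scale out when average memory use passes 75% of requests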
Monitor Resource Usage
Use tools like Prometheus, Grafana, or Kubernetes Dashboard to monitor resource consumption.
Analyze metrics from kubectl top pod and kubectl top node.
Set Proper Resource Requests and Limits
Ensure that requests and limits are set appropriately to prevent resource contention:
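A hedged sketch of a container resources block with both CPU and memory bounds (the values are placeholders to be tuned against observed usage):
resources:
  requests:
    cpu: "250m"              # reserved for scheduling decisions
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"          # hard ceiling; exceeding it gets the container OOMKilled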
Use Quality of Service (QoS) Classes
Assign QoS classes (Guaranteed, Burstable, BestEffort) by setting both requests and limits. A pod whose containers have requests equal to limits for both CPU and memory receives the Guaranteed class and is less likely to be OOMKilled under node memory pressure.
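For illustration, a resources block that yields the Guaranteed class (values are placeholders; requests must equal limits for every container in the pod):
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"              # equal to the request
    memory: "512Mi"          # equal to the request, so the pod is Guaranteed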
Avoid Memory Overcommitment
Ensure the node has sufficient memory to handle all scheduled containers without overcommitting resources.
Preemptive Scaling of Nodes
Use Cluster Autoscaler to add nodes when resource limits on the current nodes are reached.
Enable Debugging Tools
Tools like kubectl-debug, kubectl exec, and kubectl cp can help debug memory usage within a running container.
Investigate the Node
Check the node's memory usage with kubectl describe node <node-name> to identify whether the node itself is low on memory.