Error #3: OOMKilled Troubleshoot and Fix
Kubernetes has a lot of built-in capabilities to ensure your workloads get enough CPU and memory to stay healthy. However, misconfiguration is a common reason why Kubernetes kills pods even when the workload itself is behaving normally.

DevOps Diaries
Hey — It's Avinash Tietler 👋
Here you get use cases, top news, tools, and articles from the DevOps world.
IN TODAY'S EDITION
Use Case
Troubleshoot and Fix Kubernetes OOMKilled error
🚀 Tech News
👀 Remote Jobs
Senior Technical Lead - Proximity Works | Location: Worldwide (Remote)
📚️ Resources
USE CASE
Troubleshoot and Fix Kubernetes OOMKilled error
OOMKilled (Out of Memory Killed) occurs in Kubernetes when a container exceeds the memory limit specified in its resource configuration. The kubelet, in coordination with the container runtime and the kernel's cgroups, tracks and enforces each container's memory usage.
When a container uses more memory than its allocated limit, the kernel's Out of Memory (OOM) Killer terminates the container to protect the system from running out of memory.
Possible Reasons Behind OOMKilled
Insufficient Memory Limits
The memory limit specified in the container's resource configuration is too low for its actual workload.
Memory Leak in Application
The application running inside the container has a memory leak, causing it to consume increasing amounts of memory over time.
Unexpected High Workload
A sudden surge in traffic or workload might cause the application to use more memory than anticipated.
Improper Resource Allocation
Containers are deployed without specifying resource limits, leading to unbounded memory usage and competition for system resources.
Misconfigured Applications
Applications are configured to use more memory than the container is allowed.
Multiple Containers on the Same Node
If multiple containers run on the same node, one container consuming excessive memory can lead to eviction of the others.
Node Resource Exhaustion
The node itself may not have enough memory to handle all the containers running on it.
Unoptimized Code or Queries
Poorly optimized application code or inefficient database queries can lead to excessive memory usage.
How to Resolve OOMKilled
Analyze Logs
Use kubectl logs <pod-name> -c <container-name> to check the container logs for any application-specific issues that occurred before the termination (add --previous to read the logs of the killed instance once the container has restarted).
Investigate system logs or use kubectl describe pod <pod-name> to get more details about the OOMKilled event.
Increase Memory Limits
Update the container's memory limits in the resources section of the Pod's YAML file.
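For example, a minimal Pod sketch with a raised memory limit (the pod name, container name, image, and values are illustrative placeholders, not from the original article):
apiVersion: v1
kind: Pod
metadata:
  name: demo-app               # hypothetical name
spec:
  containers:
  - name: app                  # hypothetical container name
    image: nginx:1.27          # placeholder image
    resources:
      requests:
        memory: "256Mi"        # amount the scheduler reserves for this container
      limits:
        memory: "512Mi"        # hard ceiling; exceeding it triggers an OOMKill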
Apply the changes using kubectl apply -f <file.yaml>.
Optimize Application Code
Fix memory leaks by identifying inefficient code or dependencies.
Perform load testing to ensure the application handles memory efficiently.
Use Vertical Pod Autoscaler (VPA)
Deploy a Vertical Pod Autoscaler to dynamically adjust memory and CPU requests for the container.
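A minimal VPA sketch, assuming the VPA components are installed in the cluster and the workload is a Deployment named demo-app (a placeholder name):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app             # hypothetical Deployment to autoscale
  updatePolicy:
    updateMode: "Auto"         # VPA may evict pods to apply new requests
In Auto mode the VPA recreates pods to apply its recommendations, so pair it with a PodDisruptionBudget if availability matters.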
Scale Out Workload
Horizontal scaling can reduce memory usage per container by distributing the workload across multiple replicas:
kubectl scale deployment <deployment-name> --replicas=<number>
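Scaling out can also be automated with a HorizontalPodAutoscaler driven by memory utilization; a sketch assuming the metrics server is installed and a Deployment named demo-app (a placeholder):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app             # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75 # scale out when average memory use passes 75% of requests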
Monitor Resource Usage
Use tools like Prometheus, Grafana, or Kubernetes Dashboard to monitor resource consumption.
Analyze metrics from kubectl top pod and kubectl top node.
Set Proper Resource Requests and Limits
Ensure that requests and limits are set appropriately to prevent resource contention:
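A hedged sketch of a container resources block with both CPU and memory bounds (the values are placeholders to be tuned against observed usage):
resources:
  requests:
    cpu: "250m"              # reserved for scheduling decisions
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"          # hard ceiling; exceeding it gets the container OOMKilled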
Use Quality of Service (QoS) Classes
Assign QoS classes (Guaranteed, Burstable, BestEffort) by setting both requests and limits. A pod whose containers have requests equal to limits for both CPU and memory receives the Guaranteed class and is less likely to be OOMKilled under node memory pressure.
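For illustration, a resources block that yields the Guaranteed class (values are placeholders; requests must equal limits for every container in the pod):
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"              # equal to the request
    memory: "512Mi"          # equal to the request, so the pod is Guaranteed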
Avoid Memory Overcommitment
Ensure the node has sufficient memory to handle all scheduled containers without overcommitting resources.
Preemptive Scaling of Nodes
Use Cluster Autoscaler to add nodes when resource limits on the current nodes are reached.
Enable Debugging Tools
Tools like kubectl-debug, kubectl exec, and kubectl cp can help debug memory usage within a running container.
Investigate the Node
Check the node's memory usage with kubectl describe node <node-name> to identify whether the node itself is low on memory.