Error #3: Troubleshoot and Fix the Kubernetes OOMKilled Error

Kubernetes has many built-in capabilities to ensure your workloads get enough CPU and memory to stay healthy. However, misconfiguration is a common reason why Kubernetes kills pods even when the workload itself is behaving normally.

DevOps Diaries

Hey — It's Avinash Tietler 👋

Here you get use cases, top news, tools, and articles from the DevOps world.

IN TODAY'S EDITION

Use Case
  • Troubleshoot and Fix Kubernetes OOMKilled error

🚀 Tech News
👀 Remote Jobs
📚️ Resources

USE CASE

Troubleshoot and Fix Kubernetes OOMKilled error

OOMKilled (Out of Memory Killed) occurs in Kubernetes when a container exceeds the memory limit specified in its resource configuration. The kubelet and the container runtime translate that limit into a Linux cgroup setting, and the kernel enforces it at runtime.

When a container uses more memory than its allocated limit, the kernel's Out of Memory (OOM) Killer terminates it to enforce the limit and protect the node from running out of memory.

Possible Reasons Behind OOMKilled

  1. Insufficient Memory Limits
    The memory limit specified in the container's resource configuration is too low for its actual workload.

  2. Memory Leak in Application
    The application running inside the container has a memory leak, causing it to consume increasing amounts of memory over time.

  3. Unexpected High Workload
    A sudden surge in traffic or workload might cause the application to use more memory than anticipated.

  4. Improper Resource Allocation
    Containers are deployed without specifying resource limits, leading to unbounded memory usage and competition for system resources.

  5. Misconfigured Applications
    Applications are configured to use more memory than the container is allowed, for example a JVM heap size (-Xmx) set higher than the container's memory limit.

  6. Multiple Containers on the Same Node
    If multiple containers are running on the same node, one container consuming excessive memory can lead to eviction of others.

  7. Node Resource Exhaustion
    The node itself may not have enough memory to handle all the containers running on it.

  8. Unoptimized Code or Queries
    Poorly optimized application code or inefficient database queries could lead to excessive memory usage.

How to Resolve OOMKilled

  • Analyze Logs

    • Use kubectl logs <pod-name> -c <container-name> (add --previous if the container has already been restarted) to check the container logs for application-specific issues leading up to the termination.

    • Investigate system logs or use kubectl describe pod <pod-name> to get more details about the OOMKilled event.
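
    In the kubectl describe pod output, an OOMKilled container is usually easy to spot. The exact layout varies by Kubernetes version, but the container status typically includes an excerpt like this (137 = 128 + SIGKILL):

      Last State:     Terminated
        Reason:       OOMKilled
        Exit Code:    137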

  • Increase Memory Limits

    • Update the container's memory limits in the resources section of the Pod's YAML file:

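      A minimal sketch (the Pod, container, and image names and the values are illustrative; size the limit to the application's observed peak usage):

        apiVersion: v1
        kind: Pod
        metadata:
          name: web                 # hypothetical Pod name
        spec:
          containers:
            - name: app             # hypothetical container name
              image: nginx:1.25     # placeholder image
              resources:
                requests:
                  memory: "256Mi"
                limits:
                  memory: "512Mi"   # raise this if the workload legitimately needs more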

    Apply the changes using kubectl apply -f <file.yaml>.

  • Optimize Application Code

    • Fix memory leaks by identifying inefficient code or dependencies.

    • Perform load testing to ensure the application handles memory efficiently.

  • Use Vertical Pod Autoscaler (VPA)

    • Deploy a Vertical Pod Autoscaler to dynamically adjust memory and CPU requests for the container.
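
    A minimal VerticalPodAutoscaler manifest, assuming the VPA components are installed in the cluster (the names below are hypothetical):

      apiVersion: autoscaling.k8s.io/v1
      kind: VerticalPodAutoscaler
      metadata:
        name: web-vpa               # hypothetical name
      spec:
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: web                 # hypothetical Deployment to manage
        updatePolicy:
          updateMode: "Auto"        # VPA may evict pods to apply updated requests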

  • Scale Out Workload

    • Horizontal scaling can reduce memory usage per container by distributing the workload across multiple replicas:

kubectl scale deployment <deployment-name> --replicas=<number>
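
    If the workload scales well horizontally, a HorizontalPodAutoscaler can automate this. A sketch using the autoscaling/v2 API with a memory target (the Deployment name and threshold are illustrative):

      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: web-hpa               # hypothetical name
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: web                 # hypothetical Deployment
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: memory
              target:
                type: Utilization
                averageUtilization: 70   # percentage of the memory request, not the limit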

  • Monitor Resource Usage

    • Use tools like Prometheus, Grafana, or Kubernetes Dashboard to monitor resource consumption.

    • Analyze metrics from kubectl top pod and kubectl top node.

  • Set Proper Resource Requests and Limits

    • Ensure that requests and limits are set appropriately to prevent resource contention; the resources example under "Increase Memory Limits" above shows both fields.

  • Use Quality of Service (QoS) Classes

    • Assign QoS classes (Guaranteed, Burstable, BestEffort) by setting both requests and limits. Guaranteed pods are less likely to be OOMKilled.
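
    For example, a Pod is classified as Guaranteed when every container sets CPU and memory requests equal to its limits (values here are illustrative):

      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"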

  • Avoid Memory Overcommitment

    • Ensure the node has sufficient memory to handle all scheduled containers without overcommitting resources.

  • Preemptive Scaling of Nodes

    • Use Cluster Autoscaler to add nodes when pending pods cannot be scheduled because the existing nodes are out of allocatable memory or CPU.

  • Enable Debugging Tools

    • Tools like kubectl-debug, kubectl exec, and kubectl cp can help debug memory usage within a running container.

  • Investigate the Node

    • Check the node's memory usage using kubectl describe node <node-name> to identify if the node itself is low on memory.
