The "insufficient capacity" scheduling error in Kubernetes (reported as a FailedScheduling event while the pod sits in Pending) occurs when no node in the cluster has enough free resources (CPU, memory, storage, or specific hardware like GPUs) to place the pod. Let's break it down:
🔴 Causes of the Insufficient Capacity Error:
Node Resource Exhaustion – All available nodes lack sufficient CPU/memory.
Pod Resource Requests Exceed Available Resources – The requested resources are too high.
Tainted or Unschedulable Nodes – Nodes might have taints preventing scheduling (see the toleration sketch after this list).
Affinity and Anti-Affinity Constraints – Pods might have rules restricting where they can run.
Resource Limits on Namespaces – Namespace-level quotas might be exhausted.
Pod Disruption Budget (PDB) Restrictions – PDBs don't block scheduling directly, but they can stop the scheduler from preempting lower-priority pods that would otherwise free up capacity.
GPU or Special Hardware Constraints – If specific hardware is needed but unavailable.
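For the taints case above: if a taint is intentional, the pod needs a matching toleration before it can land on those nodes. A minimal pod-spec fragment, assuming a hypothetical `dedicated=gpu:NoSchedule` taint:

```yaml
# Pod spec fragment – tolerate a hypothetical dedicated=gpu:NoSchedule taint
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```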
🛠 Troubleshooting the Insufficient Capacity Error:
Check Node Resource Availability:
```bash
kubectl describe node <node-name>
```
Look for the `Allocatable` vs `Capacity` values.
Check Pod Scheduling Events:
```bash
kubectl describe pod <pod-name>
```
Look for messages like `0/5 nodes available: insufficient memory`.
List Available Nodes:
```bash
kubectl get nodes
```
Ensure enough nodes are in the `Ready` state.
Check Resource Requests & Limits:
```bash
kubectl get pod <pod-name> -o yaml
```
Adjust the requests in the deployment YAML if needed.
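To survey requests across a whole namespace in one shot, a kubectl custom-columns one-liner works (the `default` namespace is just an example):

```bash
# Show CPU/memory requests per pod; empty cells mean no request was set
kubectl get pods -n default -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
```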
Inspect Node Affinity & Tolerations:
```bash
kubectl describe pod <pod-name>
```
Confirm the pod's affinity rules and tolerations actually match the labels and taints on schedulable nodes.
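If affinity is the blocker, the pod spec will contain a block like the sketch below; the `disktype=ssd` label is hypothetical, and it must exist on at least one node with free capacity:

```yaml
# Pod spec fragment – hard requirement on a (hypothetical) node label
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: disktype
              operator: In
              values: ["ssd"]
```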
Check Namespace Quotas:
```bash
kubectl get resourcequota --all-namespaces
```
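If a quota is exhausted, either raise it or reduce the pods' requests. A minimal ResourceQuota sketch (the name and namespace are examples):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota   # example name
  namespace: dev        # example namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```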
Monitor Cluster Metrics (Optional):
```bash
kubectl top nodes
kubectl top pods
```
Identify resource-heavy workloads (these commands require the metrics-server add-on).
✅ Preventive Approaches:
Autoscaling
Enable autoscaling. A HorizontalPodAutoscaler scales pod replicas; scaling the nodes themselves is the job of the Cluster Autoscaler, which is configured per cloud provider. A working HPA needs a `scaleTargetRef` (the Deployment name here is an example):
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:            # required – the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # assumes a Deployment named my-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```
Pair this with the Cluster Autoscaler so that unschedulable pods trigger new nodes.
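Node-level autoscaling is provider-specific; as one example, on GKE (cluster and pool names are hypothetical):

```bash
# Let the node pool grow and shrink between 1 and 5 nodes
gcloud container clusters update my-cluster \
  --enable-autoscaling --min-nodes=1 --max-nodes=5 \
  --node-pool=default-pool
```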
Resource Requests & Limits:
Set realistic requests and limits on every container so the scheduler can place pods accurately:
```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
```
Node Pool Expansion:
Increase the node count manually in your cloud provider's node-pool settings, as shown below.
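For example, on GKE (names are hypothetical):

```bash
# Resize a node pool to 5 nodes
gcloud container clusters resize my-cluster \
  --node-pool=default-pool --num-nodes=5
```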
Optimize Workloads:
Scale down less critical workloads.
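For instance, shrinking a non-critical deployment frees capacity for higher-priority pods (the name `batch-worker` is hypothetical):

```bash
# Reduce replicas to free node capacity
kubectl scale deployment batch-worker --replicas=1
```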
Monitor with Prometheus & Grafana:
Set up alerts for high resource usage.
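A sketch of a Prometheus alerting rule, assuming node_exporter metrics are being scraped:

```yaml
groups:
  - name: cluster-capacity
    rules:
      - alert: NodeMemoryAlmostFull
        # Fires when a node has had <10% memory available for 10 minutes
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} has under 10% memory available"
```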