Error #12: Service Discovery Failure: Troubleshoot and Fix
Service discovery is a mechanism by which services discover each other dynamically without the need for hard coding IP addresses or endpoint configuration.

IN TODAY'S EDIT
⌛ Use Case: Service Discovery Failure: Troubleshoot and Fix
🚀 Top News: Siri's Silent Listen: Apple's $95 million privacy settlement and what it means for you
📚️ Resources: Learn a new thing: a tutorial for Selenium automation testing tool lovers; prepare for interviews & certifications
Before we begin... a big thank you to our supporters.
Inviul is a multi-niche learning platform. It covers topics like Selenium, Appium, Cucumber, Java, and many more.
USE CASE
Service Discovery Failure: Troubleshoot and Fix
Cloud-native applications run as microservices using pods or containers. In these environments, microservices need to communicate dynamically without manual configuration.
Service discovery makes this possible.
In this article, we will discuss the service discovery failure error and how to resolve it.
Service discovery failure in Kubernetes occurs when a service cannot connect to the required resources (e.g., Pods or external services) due to misconfigurations or network issues. This failure disrupts the communication between services, leading to application downtime or degraded performance.
Possible Causes
Incorrect Service Configuration:
Misconfigured selectors in the service definition.
Service name or labels do not match the target Pods.
Network Issues:
Problems with the Kubernetes network plugin (CNI).
Network policies blocking traffic between Pods.
DNS Resolution Issues:
CoreDNS Pod issues (e.g., not running or misconfigured).
Errors in Kubernetes DNS configurations.
Pod Readiness Problems:
Pods targeted by the service are not in a Ready state.
Deployment issues causing pods to fail health checks.
IP Allocation Problems:
Exhaustion of IP addresses in the Kubernetes cluster.
Issues with the IP range of the Pod network.
Service Type Mismatch:
Using the wrong service type (ClusterIP, NodePort, LoadBalancer) for your use case.
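The first cause above (selectors that do not match the target Pods' labels) is the most common, and it often looks like the following pair of manifests. All names here are hypothetical examples:

```yaml
# The Service selects Pods labeled app=web...
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web            # selector the Service uses to find endpoints
  ports:
    - port: 80
      targetPort: 8080
---
# ...but the Deployment labels its Pods app=webapp, so the
# Service ends up with no endpoints and discovery fails.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp     # does not match the Service selector above
    spec:
      containers:
        - name: web
          image: nginx:1.25
```

With this mismatch, `kubectl get endpoints web-svc` would show no addresses, which is the telltale symptom to look for.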
Steps to Troubleshoot and Fix
Step 1: Verify Service and Pod Configuration
Check the service definition for correct selectors:
kubectl get svc <service-name> -o yaml
Ensure the service selector matches the labels of the target Pods:
kubectl get pods --show-labels
Step 2: Check Pod Status
Ensure the Pods targeted by the service are running and ready:
kubectl get pods -o wide
If Pods are not Ready, investigate using:
kubectl describe pod <pod-name>
kubectl logs <pod-name>
Step 3: Validate DNS
Test DNS resolution within the cluster:
kubectl run dns-test --image=busybox --restart=Never --rm -it -- nslookup <service-name>.<namespace>.svc.cluster.local
Ensure CoreDNS Pods are running:
kubectl get pods -n kube-system | grep coredns
Step 4: Inspect Network Configuration
Check for network policies that might block traffic:
kubectl get networkpolicy -A
Review the CNI plugin logs for errors.
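When a restrictive policy turns up, a frequent culprit is that DNS traffic to CoreDNS (port 53) is blocked, which makes every service lookup fail. A sketch of a policy that re-allows DNS egress for all Pods in a namespace (the namespace name is a hypothetical example):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-app          # hypothetical namespace
spec:
  podSelector: {}            # applies to every Pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```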
Step 5: Test Connectivity
Use tools like curl, wget, or ping inside a Pod to test service accessibility:
kubectl exec -it <pod-name> -- curl http://<service-name>:<port>
Step 6: Review Logs
Check service and kube-proxy logs:
kubectl logs -n kube-system <kube-proxy-pod-name>
Step 7: Validate IP Allocation
Check for IP exhaustion in the cluster:
kubectl describe node | grep PodCIDR
Fixes
Update Service Selectors:
Correct the labels in the service definition to match the target Pods.
Restart CoreDNS:
If DNS is the issue, restart CoreDNS Pods:
kubectl rollout restart deployment coredns -n kube-system
Adjust Network Policies:
Modify or remove restrictive network policies that block traffic.
Fix Pod Readiness:
Resolve readiness issues by checking container health checks and configurations.
Scale and Manage IPs:
Increase the IP address range for the cluster if exhausted.
Update Service Type:
Use the appropriate service type based on your requirements.
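For the Pod-readiness fix above, remember that a Service only routes traffic to Pods whose readiness probe passes, so a failing probe silently removes endpoints. A minimal readinessProbe sketch for a container spec (the path and port are assumptions about the application):

```yaml
# Container snippet: a failing readinessProbe takes the Pod
# out of the Service's endpoints until it passes again.
containers:
  - name: web
    image: nginx:1.25
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz     # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```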
Preventive Tips
Use Consistent Labeling:
Establish a standardized labeling convention across services and Pods.
Set Up Monitoring:
Use tools like Prometheus and Grafana to monitor network and DNS health.
Test Configurations:
Test service configurations in a staging environment before deploying to production.
Review Resource Limits:
Ensure adequate IP ranges and cluster resources to handle workloads.
Automate Health Checks:
Regularly validate service and DNS configurations using automated scripts.
Enable Logging and Alerts:
Set up logging for CoreDNS, kube-proxy, and application services to detect and alert on failures early.
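The automated validation suggested above can start very small. As a sketch, the core check, that a Service's selector actually matches a Pod's labels, is just a dictionary subset test; in practice you would feed it from `kubectl get svc/pod -o json`, and all names below are hypothetical:

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """A Service selects a Pod when every selector key/value pair
    appears in the Pod's labels (extra Pod labels are fine)."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


if __name__ == "__main__":
    # Example data, shaped like the output of `kubectl ... -o json`
    service_selector = {"app": "web", "tier": "frontend"}
    matching_pod = {"app": "web", "tier": "frontend", "pod-template-hash": "abc123"}
    orphan_pod = {"app": "webapp"}  # typo'd label: the Service finds no endpoints

    print(selector_matches(service_selector, matching_pod))  # True
    print(selector_matches(service_selector, orphan_pod))    # False
```

Running this kind of check in CI against rendered manifests catches the selector-mismatch failure mode before it ever reaches the cluster.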