Troubleshooting
Common issues and solutions for Sligo Enterprise deployment.
Installation Issues
Helm Install Fails
Symptom: helm install command fails
Common causes:
1. **Missing secrets**
   ```bash
   # Check if all secrets exist
   kubectl get secrets -n sligo

   # Expected secrets:
   # - nextjs-secrets
   # - backend-secrets
   # - mcp-gateway-secrets
   # - postgres-secrets (or postgres-external-secrets)
   ```
   **Solution:** Create missing secrets (see [SECRETS.md](/sligo-helm-charts/SECRETS/))
2. **Invalid values**
   ```bash
   # Validate your values file
   helm template test sligo/sligo-cloud -f values-production.yaml
   ```
   **Solution:** Fix syntax errors in values file
3. **Namespace doesn’t exist**
   ```bash
   # Create namespace
   kubectl create namespace sligo
   ```
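The secret check above can be scripted so that only the missing names are reported. This is a minimal sketch, assuming `kubectl` is already configured for the target cluster; `missing_secrets` is a hypothetical helper name, not something the chart ships.

```bash
# Print the name of every expected secret that is absent from a namespace.
# Assumption: kubectl is on PATH and points at the right cluster.
# missing_secrets is a hypothetical helper, not part of the chart.
missing_secrets() {
  ns="$1"; shift
  for name in "$@"; do
    # "kubectl get secret <name>" exits non-zero when the lookup fails
    if ! kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
      echo "$name"
    fi
  done
}

# Usage:
#   missing_secrets sligo nextjs-secrets backend-secrets mcp-gateway-secrets postgres-secrets
```

A name is also printed when the lookup fails for another reason, such as missing RBAC permissions or an unreachable cluster, so an unexpectedly long list is worth checking by hand.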
Pod Issues
Pods Not Starting (Pending)
Symptom: Pods stuck in Pending state
kubectl get pods -n sligo
# NAME READY STATUS RESTARTS AGE
# sligo-app-xxx 0/1 Pending 0 5m
Diagnosis:
kubectl describe pod <pod-name> -n sligo
Common causes:
- Insufficient resources
  - Error: `Insufficient cpu` or `Insufficient memory`
  - Solution: Scale down other workloads or add nodes
- Image pull errors
  - Error: `ErrImagePull` or `ImagePullBackOff`
  - Solution: Check image repository URLs and credentials
- Storage class not available
  - Error: `Pending PersistentVolumeClaim`
  - Solution: Verify storage class exists
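To see which of the causes above applies without describing each pod by hand, the events of every Pending pod can be listed in one pass. A sketch under the assumption that `kubectl` is configured for the cluster; `pending_pod_events` is a hypothetical helper name.

```bash
# Show events for every Pending pod in a namespace (default: sligo).
# pending_pod_events is a hypothetical helper, not part of the chart.
pending_pod_events() {
  ns="${1:-sligo}"
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending -o name |
  while read -r pod; do
    name="${pod#pod/}"   # "-o name" prints "pod/<name>"
    echo "== $name"
    kubectl get events -n "$ns" \
      --field-selector "involvedObject.kind=Pod,involvedObject.name=$name" \
      --sort-by=.lastTimestamp </dev/null
  done
}

# Usage:
#   pending_pod_events sligo
```

The event messages carry the same strings listed above, for example `Insufficient cpu` for scheduling failures.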
Pods CrashLooping
Symptom: Pods restarting repeatedly
kubectl get pods -n sligo
# NAME READY STATUS RESTARTS AGE
# sligo-app-xxx 0/1 CrashLoopBackOff 5 10m
Diagnosis:
# Check logs
kubectl logs <pod-name> -n sligo --previous
# Check events
kubectl describe pod <pod-name> -n sligo
Common causes:
- Missing environment variables
  - Check logs for “environment variable not set” errors
  - Solution: Verify secrets are created correctly
- Database connection issues
  - Check logs for connection errors
  - Solution: Verify database credentials and connectivity
- Application errors
  - Check logs for application-specific errors
  - Solution: Contact support with logs
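When several pods are crash-looping, the log check above can be repeated for each of them. The following is a sketch, assuming `kubectl` is configured and each pod has a single container; `crashloop_logs` is a hypothetical helper name.

```bash
# Print the previous container's log tail for every pod in CrashLoopBackOff.
# crashloop_logs is a hypothetical helper, not part of the chart.
crashloop_logs() {
  ns="${1:-sligo}"
  # Column 3 of "kubectl get pods" output is STATUS
  kubectl get pods -n "$ns" --no-headers |
  awk '$3 == "CrashLoopBackOff" { print $1 }' |
  while read -r pod; do
    echo "== $pod (previous container)"
    kubectl logs "$pod" -n "$ns" --previous --tail=50 </dev/null
  done
}

# Usage:
#   crashloop_logs sligo
```

For pods with more than one container, `kubectl logs` needs `-c <container>` to select which log to print.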
Pods Not Ready
Symptom: Pods running but not ready
kubectl get pods -n sligo
# NAME READY STATUS RESTARTS AGE
# sligo-app-xxx 0/1 Running 0 10m
Diagnosis:
# Check readiness probe
kubectl describe pod <pod-name> -n sligo | grep -A 10 "Readiness"
# Check logs
kubectl logs <pod-name> -n sligo
Common causes:
- Readiness probe failing
  - Probe endpoint not responding
  - Solution: Verify health endpoint is working
- Application slow to start
  - Increase `initialDelaySeconds` in readiness probe
  - Solution: Update values:
    app:
      readinessProbe:
        initialDelaySeconds: 60 # Increase from 15
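The readiness delay can also be applied to a running release from the command line. A sketch, assuming the chart is `sligo/sligo-cloud` and the values file is `values-production.yaml` as used earlier; the release name is not given in this guide, so it is passed as an argument, and `set_readiness_delay` is a hypothetical helper name.

```bash
# Raise the readiness probe delay on an existing release.
# Assumptions: the release name is supplied by the caller, and the
# namespace, chart, and values file match those used elsewhere in this guide.
set_readiness_delay() {
  release="$1"
  seconds="${2:-60}"
  helm upgrade "$release" sligo/sligo-cloud \
    -n sligo \
    -f values-production.yaml \
    --set "app.readinessProbe.initialDelaySeconds=$seconds"
}

# Usage:
#   set_readiness_delay <release-name> 60
```

A `--set` override is not written back to the values file, so the same change should be made in `values-production.yaml` to survive the next upgrade.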
Networking Issues
Cannot Access Application
Symptom: Application URL not accessible
Diagnosis:
# Check ingress
kubectl get ingress -n sligo
# Check services
kubectl get svc -n sligo
# Check ALB creation (AWS)
kubectl describe ingress -n sligo
Common causes:
- Ingress not created
  - No ALB provisioned
  - Solution: Check ingress controller is installed
- DNS not configured
  - Domain not pointing to ALB
  - Solution: Create CNAME record
- Security group blocking traffic
  - ALB security group too restrictive
  - Solution: Allow inbound traffic on ports 80/443
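The first two causes above can be told apart by checking whether each ingress has been assigned a load balancer hostname. A sketch, assuming `kubectl` is configured; `ingress_status` is a hypothetical helper name.

```bash
# Report the load balancer hostname assigned to each ingress, if any.
# ingress_status is a hypothetical helper, not part of the chart.
ingress_status() {
  ns="${1:-sligo}"
  kubectl get ingress -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.loadBalancer.ingress[0].hostname}{"\n"}{end}' |
  while read -r name host; do
    if [ -z "$host" ]; then
      echo "$name: no load balancer hostname yet"
    else
      echo "$name: $host"
    fi
  done
}

# Usage:
#   ingress_status sligo
```

An empty hostname points at the ingress controller. A populated one is the target the domain's CNAME record should resolve to.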
Pods Cannot Connect to Database
Symptom: Application logs show database connection errors
Diagnosis:
# Test from pod
kubectl exec -it <pod-name> -n sligo -- sh
# Inside pod:
nc -zv postgres 5432 # For internal DB
nc -zv <rds-host> 5432 # For external DB
Common causes:
- Wrong database host
  - Solution: Verify `database.external.host` in values
- Database security group
  - RDS security group not allowing EKS traffic
  - Solution: Add EKS security group to RDS allowed list
- Credentials incorrect
  - Solution: Verify secrets have correct values
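The connectivity test above can be run without opening an interactive shell. A sketch, assuming the application image includes `nc` with `-z` and `-w` support, as the manual check implies; `db_reachable` is a hypothetical helper name.

```bash
# Test TCP reachability of a database host from inside a pod.
# Assumption: the pod's image provides nc; db_reachable is a hypothetical helper.
db_reachable() {
  pod="$1"; host="$2"; port="${3:-5432}"
  if kubectl exec "$pod" -n sligo -- nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Usage:
#   db_reachable <pod-name> postgres      # internal DB
#   db_reachable <pod-name> <rds-host>    # external DB
```

A reachable port with continued connection errors in the logs suggests a credentials problem rather than a network one.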
Database Issues
PostgreSQL Not Starting
Symptom: PostgreSQL pod not starting
Diagnosis:
kubectl logs postgres-0 -n sligo
kubectl describe statefulset postgres -n sligo
Common causes:
- PVC not bound
  - Check: `kubectl get pvc -n sligo`
  - Solution: Verify storage class exists
- Insufficient resources
  - Solution: Increase resource limits
- Data corruption
  - Last resort: Delete PVC and start fresh (data loss!)
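The PVC check above can be narrowed to claims that are not yet bound. A sketch, assuming `kubectl` is configured; `unbound_pvcs` is a hypothetical helper name.

```bash
# List PersistentVolumeClaims whose status is anything other than Bound.
# unbound_pvcs is a hypothetical helper, not part of the chart.
unbound_pvcs() {
  ns="${1:-sligo}"
  # Columns of "kubectl get pvc": NAME STATUS VOLUME ...
  kubectl get pvc -n "$ns" --no-headers |
  awk '$2 != "Bound" { print $1 " is " $2 }'
}

# Usage:
#   unbound_pvcs sligo
```

If a claim is listed, `kubectl get storageclass` shows which storage classes the cluster actually offers.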
Cannot Connect to PostgreSQL
Diagnosis:
# From within cluster
kubectl run -it --rm debug --image=postgres:15 --restart=Never -n sligo -- \
psql -h postgres -U sligo -d sligo
# Check service
kubectl get svc postgres -n sligo
Ingress Issues
ALB Not Created (AWS)
Symptom: No load balancer appears in AWS console
Diagnosis:
# Check ingress events
kubectl describe ingress -n sligo
# Check ALB controller logs
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
Common causes:
- ALB controller not installed
  - Solution: Install AWS Load Balancer Controller
- IAM permissions missing
  - Solution: Attach the required IAM policy to the node role (or to the controller's service account role if the cluster uses IAM roles for service accounts)
- Invalid annotations
  - Solution: Check ALB annotations in values
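Controller logs are verbose, so it can help to filter them for lines that point at the causes above. A sketch using the same label selector as the diagnosis step; the keyword list is an assumption, and `alb_controller_errors` is a hypothetical helper name.

```bash
# Show recent ALB controller log lines that look like errors.
# The grep keywords are a guess at useful markers, not an official list.
alb_controller_errors() {
  kubectl logs -n kube-system \
    -l app.kubernetes.io/name=aws-load-balancer-controller \
    --tail=200 |
  grep -iE 'error|denied|unauthorized'
}

# Usage:
#   alb_controller_errors
```

No output at all may mean either that nothing matched or that no controller pods carry that label, which would indicate the controller is not installed.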
SSL/TLS Issues
Symptom: HTTPS not working or certificate errors
Common causes:
- Certificate ARN incorrect
  - Solution: Verify ACM certificate ARN in annotations
- Certificate not validated
  - Solution: Complete ACM certificate validation
- Domain mismatch
  - Certificate domain doesn’t match ingress host
  - Solution: Update certificate or ingress host
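All three causes above can be checked by asking ACM about the certificate referenced in the annotations. A sketch, assuming the AWS CLI is installed with credentials that can read ACM; `acm_cert_status` is a hypothetical helper name.

```bash
# Print the status and primary domain of an ACM certificate.
# Assumption: the AWS CLI is installed and authorized for ACM.
acm_cert_status() {
  arn="$1"
  # ARN layout: arn:aws:acm:<region>:<account>:certificate/<id>
  region=$(printf '%s' "$arn" | cut -d: -f4)
  aws acm describe-certificate \
    --region "$region" \
    --certificate-arn "$arn" \
    --query 'Certificate.[Status,DomainName]' \
    --output text
}

# Usage:
#   acm_cert_status <certificate-arn>
```

A failed lookup suggests the ARN is wrong, a status such as `PENDING_VALIDATION` means validation is incomplete, and the printed domain can be compared with the ingress host.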
Performance Issues
High CPU Usage
Diagnosis:
# Check resource usage
kubectl top pods -n sligo
# Check configured resource limits
kubectl describe pod <pod-name> -n sligo | grep -A 10 "Limits"
Solutions:
- Increase resource limits
  app:
    resources:
      limits:
        cpu: 2000m # Increase from 500m
- Scale horizontally
  app:
    replicaCount: 5 # Increase from 2
- Enable autoscaling
  app:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
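Before changing limits or replica counts, it helps to know which pods are actually close to their CPU limit. A sketch that filters `kubectl top` output by a threshold in millicores, assuming metrics-server is installed; `pods_over_cpu` is a hypothetical helper name.

```bash
# List pods whose current CPU usage exceeds a threshold given in millicores.
# Assumption: metrics-server is running so that "kubectl top" works.
pods_over_cpu() {
  ns="$1"; limit_m="$2"
  # Columns of "kubectl top pods": NAME CPU(cores) MEMORY(bytes)
  kubectl top pods -n "$ns" --no-headers |
  awk -v limit="$limit_m" '{
    cpu = $2
    # Usage is normally printed as "<n>m"; treat a bare number as whole cores
    if (cpu ~ /m$/) { sub(/m$/, "", cpu) } else { cpu = cpu * 1000 }
    if (cpu + 0 > limit + 0) print $1, cpu "m"
  }'
}

# Usage:
#   pods_over_cpu sligo 400   # pods using more than 400m
```

Pods that sit just under the configured limit are candidates for a higher limit. Usage spread evenly across all replicas points toward scaling out instead.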
High Memory Usage
Solutions:
- Increase memory limits
  app:
    resources:
      limits:
        memory: 2Gi # Increase from 512Mi
- Check for memory leaks
  - Review application logs
  - Monitor over time
Slow Response Times
Common causes:
- Database queries slow
  - Solution: Add database indexes
  - Solution: Enable connection pooling
- Insufficient resources
  - Solution: Increase CPU/memory limits
- Too few replicas
  - Solution: Scale up or enable autoscaling
Getting Help
If you’re still experiencing issues:
Gather information:
# Get all resources
kubectl get all -n sligo
# Get events
kubectl get events -n sligo --sort-by='.lastTimestamp'
# Get logs from all pods
kubectl logs -n sligo -l app.kubernetes.io/name=sligo-cloud --tail=100
Contact support:
- Email: support@sligo.ai
- Include:
  - Helm chart version
  - Kubernetes version
  - AWS region (if applicable)
  - Error messages and logs
  - Steps to reproduce
Community resources:
- GitHub Issues: https://github.com/Sligo-AI/sligo-helm-charts/issues
- Documentation: https://sligo-ai.github.io/sligo-helm-charts
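The information-gathering commands listed under Getting Help can be collected into files for attaching to a support request. A sketch, assuming `kubectl` and `helm` are configured for the cluster; the output directory name and `collect_diagnostics` are hypothetical.

```bash
# Save the diagnostics requested by support into one directory.
# collect_diagnostics and the default directory name are hypothetical.
collect_diagnostics() {
  ns="${1:-sligo}"
  out_dir="${2:-sligo-diagnostics}"
  mkdir -p "$out_dir"
  kubectl get all -n "$ns" > "$out_dir/resources.txt" 2>&1
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out_dir/events.txt" 2>&1
  kubectl logs -n "$ns" -l app.kubernetes.io/name=sligo-cloud --tail=100 > "$out_dir/logs.txt" 2>&1
  kubectl version > "$out_dir/kubernetes-version.txt" 2>&1
  # "helm list" includes the chart version of each release
  helm list -n "$ns" > "$out_dir/helm-releases.txt" 2>&1
  echo "$out_dir"
}

# Usage:
#   collect_diagnostics sligo ./sligo-diagnostics
```

Logs can contain credentials or customer data, so the files are worth reviewing before they are sent.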