June 15, 2026 · 9 min read · kubernetes
Pod Stuck in Pending: The Complete Kubernetes Troubleshooting Guide
A pod stuck in Pending means Kubernetes has accepted it but cannot run it yet. Unlike CrashLoopBackOff (the container runs and dies), a Pending pod has never started — the scheduler cannot find a home for it, or something it needs does not exist. Here is every cause and the exact command to confirm it.
The first move, always
Before guessing, ask the scheduler what it is unhappy about:
kubectl describe pod <pod>
Scroll to the Events section at the bottom. The scheduler writes its reasoning there in plain text, for example "0/3 nodes are available: 3 Insufficient memory", "pod has unbound immediate PersistentVolumeClaims", or "node(s) had untolerated taint {key: value}". Most Pending diagnoses end right here. The causes below map directly to those messages.
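If you would rather skip the rest of the describe output, the same events can be queried directly. The wrappers below are a minimal sketch: the function names pending_pods and pod_events are made up for illustration, and they assume kubectl is already pointed at the right cluster.

```shell
# Hypothetical helpers; the names are illustrative, not kubectl built-ins.

# List every pod currently in the Pending phase, across all namespaces.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Show only the events recorded for one pod, oldest first.
# Usage: pod_events <namespace> <pod>
pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$2" \
    --sort-by=.metadata.creationTimestamp
}
```

Run pending_pods first to find the namespace and pod name, then feed both to pod_events.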
Cause 1: Insufficient CPU or memory
The most common cause. No node has enough allocatable resources to satisfy the pod's requests.
kubectl describe nodes | grep -A5 'Allocated resources'
Compare the pod's requests against what each node has free. Fix: lower the pod's requests if they are inflated, scale up the node pool, or enable the cluster autoscaler so a new node is added on demand.
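To put the two sides of that comparison next to each other, something like the following can help. It is a sketch with hypothetical function names. Note that allocatable is a node's total schedulable capacity, not what is free right now; free capacity is allocatable minus the requests already listed under Allocated resources.

```shell
# Hypothetical helpers; the names are illustrative.

# Print each container's resource requests for one pod.
# Usage: pod_requests <namespace> <pod>
pod_requests() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# Print allocatable CPU and memory for every node.
node_allocatable() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

The scheduler needs the sum of a pod's container requests to fit on a single node, so a pod asking for 2Gi will not land on a cluster whose largest free slice is 1.4Gi, however much memory is free in total.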
Cause 2: PVC not bound
If the pod mounts a PersistentVolumeClaim that is not Bound, the pod waits indefinitely.
kubectl get pvc
kubectl describe pvc <claim>
A Pending PVC usually means the claim names a StorageClass that does not exist or is misspelled, or that no provisioner is available to satisfy it. Fix: create or correct the StorageClass, or confirm your CSI driver / provisioner is running. See the PVC section of our Kubernetes monitoring guide.
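The StorageClass check can be scripted. This is a sketch under assumptions: the function names are hypothetical, and it only tells you whether the named class exists, not whether its provisioner is healthy.

```shell
# Hypothetical helpers; the names are illustrative.

# Print the StorageClass a claim asks for (empty if none is named).
# Usage: pvc_storage_class <namespace> <claim>
pvc_storage_class() {
  kubectl get pvc "$2" -n "$1" -o jsonpath='{.spec.storageClassName}'
}

# Report whether that StorageClass exists in the cluster.
# Usage: check_pvc_class <namespace> <claim>
check_pvc_class() {
  pvc_class=$(pvc_storage_class "$1" "$2")
  if [ -z "$pvc_class" ]; then
    echo "PVC $2 names no StorageClass; it relies on the cluster default, if one is set"
  elif kubectl get storageclass "$pvc_class" >/dev/null 2>&1; then
    echo "StorageClass $pvc_class exists; check the provisioner next"
  else
    echo "StorageClass $pvc_class not found"
  fi
}
```

For example, check_pvc_class default data-0 would report a missing class if the claim asks for fast-ssd and no such StorageClass has been created.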
Cause 3: Node selector or affinity mismatch
If the pod has a nodeSelector or nodeAffinity rule that no node satisfies, it stays Pending. The describe output says something like node(s) didn't match Pod's node affinity/selector.
Fix: correct the label requirement, or label a node to match — kubectl label node <node> <key>=<value>.
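Before relabeling anything, it helps to see what the pod asks for and which nodes already qualify. A minimal sketch with hypothetical function names; it covers nodeSelector only, and affinity rules under .spec.affinity need a separate look.

```shell
# Hypothetical helpers; the names are illustrative.

# Print the pod's nodeSelector map.
# Usage: pod_node_selector <namespace> <pod>
pod_node_selector() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.nodeSelector}'
}

# List the nodes that carry a given label.
# Usage: nodes_with_label <key>=<value>
nodes_with_label() {
  kubectl get nodes -l "$1"
}
```

If nodes_with_label returns no nodes for a label the pod requires, that label is the mismatch.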
Cause 4: Taints and tolerations
Nodes can carry taints (for example, a GPU pool or a control-plane node) that repel pods lacking the matching toleration. The event reads node(s) had untolerated taint {key: value}.
kubectl describe node <node> | grep -i taint
Fix: add the matching tolerations to the pod spec, or schedule it onto an untainted pool.
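To see taints across the whole cluster instead of one node at a time, and to compare them with what the pod tolerates, a sketch like this may be useful. The function names are hypothetical.

```shell
# Hypothetical helpers; the names are illustrative.

# One line per node, listing the keys of any taints it carries.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Print the tolerations declared on one pod.
# Usage: pod_tolerations <namespace> <pod>
pod_tolerations() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.tolerations}'
}
```

A node whose taint key appears in the first output but in no toleration from the second is off limits to the pod.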
Cause 5: Resource quota exceeded
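If the namespace has a ResourceQuota and the new pod would push it over, the API server rejects the pod at admission. The pod is never created, so it does not show up as Pending at all; instead the owning Deployment or Job runs short of pods, and its ReplicaSet or Job records a FailedCreate event mentioning "exceeded quota". It looks like a stuck rollout, which is why it belongs on this list.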
kubectl describe resourcequota -n <namespace>
Fix: raise the quota, free capacity by removing unused workloads, or right-size the pod's requests.
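A quota rejection is typically recorded as a FailedCreate event on the controller that tried to create the pod, so filtering events by that reason is a quick way to confirm it. A sketch with a hypothetical function name:

```shell
# Hypothetical helper; the name is illustrative.

# List FailedCreate events in a namespace; quota rejections mention
# "exceeded quota" in the message column.
# Usage: quota_rejections <namespace>
quota_rejections() {
  kubectl get events -n "$1" --field-selector reason=FailedCreate
}
```

FailedCreate has other causes too, so read the message text before blaming the quota.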
Cause 6: Missing pull secrets
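Less common, and strictly a post-scheduling failure: a pod referencing an imagePullSecret that does not exist is still scheduled, but if the image is private the kubelet cannot pull it and the containers never start. The pod's phase stays Pending while kubectl get pods shows ErrImagePull or ImagePullBackOff. Fix: create the secret in the right namespace and reference it correctly in the pod spec or service account.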
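Checking that every referenced pull secret actually exists can be scripted. This is a sketch with a hypothetical function name; it confirms only that each secret is present in the pod's namespace, not that its credentials are valid.

```shell
# Hypothetical helper; the name is illustrative.

# For each imagePullSecret named in the pod spec, report whether a
# secret of that name exists in the same namespace.
# Usage: check_pull_secrets <namespace> <pod>
check_pull_secrets() {
  for secret_name in $(kubectl get pod "$2" -n "$1" \
      -o jsonpath='{.spec.imagePullSecrets[*].name}'); do
    if kubectl get secret "$secret_name" -n "$1" >/dev/null 2>&1; then
      echo "$secret_name: present"
    else
      echo "$secret_name: missing"
    fi
  done
}
```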
A fast diagnostic order
Work top-down — it matches frequency:
- kubectl describe pod <pod> → read the Events. (Solves most cases instantly.)
- If "insufficient resources" → check node allocatable vs requests.
- If "unbound PVC" → check PVCs and StorageClass.
- If "didn't match affinity/taint" → check node labels and taints.
- If admission was denied → check ResourceQuota.
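The order above can be sketched as a small classifier that maps an event message to the next check. The function name classify_pending and its output labels are invented for illustration, and the patterns cover only the messages quoted in this guide. Real events often list several reasons in one line, and this sketch reports only the first that matches.

```shell
# Hypothetical helper; the name and output labels are illustrative.
# Usage: classify_pending "<event message>"
classify_pending() {
  case "$1" in
    *Insufficient*|*insufficient*)
      echo "resources: compare node allocatable with pod requests" ;;
    *"unbound immediate PersistentVolumeClaims"*)
      echo "storage: check PVCs and StorageClass" ;;
    *"didn't match Pod's node affinity/selector"*)
      echo "placement: check node labels" ;;
    *"untolerated taint"*)
      echo "placement: check node taints and pod tolerations" ;;
    *"exceeded quota"*)
      echo "quota: check ResourceQuota" ;;
    *)
      echo "unknown: read the full Events output" ;;
  esac
}
```

For example, classify_pending "pod has unbound immediate PersistentVolumeClaims" points you at storage.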
How Tracegrid identifies the reason
Tracegrid reads the same scheduler events you would — but it does it the instant the pod goes Pending and tells you the specific constraint in plain English: "Pending: no node has 2Gi free; largest free is 1.4Gi on node-2," or "Pending: unbound PVC data-0, StorageClass fast-ssd not found." You skip the describe-and-decode loop entirely and go straight to the fix. That is the difference between monitoring that reports state and monitoring that explains it.
Written by Pradip — founder of Tracegrid, building AI infrastructure intelligence so small teams get senior-SRE answers at 3am.