Known Issues
kinder doctor runs 24 diagnostic checks across 9 categories to detect common environment problems before they cause cryptic cluster creation failures. This page documents every check: what it detects, why it matters, and how to fix it.
For exit codes, CI usage, and common error scenarios, see the Troubleshooting page.
How to run diagnostics
Section titled “How to run diagnostics”kinder doctorUse --output json for machine-readable output (useful in CI pipelines):
kinder doctor --output jsonExit codes
Section titled “Exit codes”| Exit code | Meaning |
|---|---|
| 0 | All checks passed (or skipped) |
| 1 | One or more checks failed |
| 2 | One or more checks warned (no failures) |
If both failures and warnings exist, exit code 1 takes priority.
Runtime
Section titled “Runtime”Container Runtime
Section titled “Container Runtime”What it detects: Whether a container runtime (Docker, Podman, or nerdctl) is installed and responding. Checks in order: docker, podman, nerdctl — stops at the first binary found.
Why it matters: A working container runtime is required for cluster creation. Without one, kinder create cluster cannot start.
Platforms: All
How to fix:
If no runtime is found, install one:
# Docker (recommended)# https://docs.docker.com/get-docker/
# Podman# https://podman.io/getting-started/installation
# nerdctl# https://github.com/containerd/nerdctlIf a runtime is found but not responding, start the daemon:
# Linuxsudo systemctl start docker
# macOS / Windows# Open Docker DesktopDocker
Section titled “Docker”Disk Space
Section titled “Disk Space”What it detects: Available disk space on the Docker data root directory. Warns below 5 GB, fails below 2 GB.
Why it matters: Docker images and container layers require significant disk space. Running out mid-creation causes corrupted images and cryptic errors.
Platforms: All
How to fix:
# Remove unused containers, networks, and dangling imagesdocker system prune
# Also remove unused imagesdocker image prune -aDaemon JSON Init Config
Section titled “Daemon JSON Init Config”What it detects: Whether any daemon.json file has "init": true set. Searches six candidate locations: /etc/docker/daemon.json, ~/.docker/daemon.json, $XDG_CONFIG_HOME/docker/daemon.json, /var/snap/docker/current/config/daemon.json, ~/.rd/docker/daemon.json, and C:\ProgramData\docker\config\daemon.json (Windows).
Why it matters: Docker’s init option injects tini as PID 1 in every container. kind nodes expect to run their own init system, so "init": true breaks node startup with: Couldn't find an alternative telinit implementation to spawn.
Platforms: All
How to fix:
Remove "init": true from your daemon.json and restart Docker:
{ "init": false}sudo systemctl restart dockerSee kind known issues: Docker init for details.
Docker Snap
Section titled “Docker Snap”What it detects: Whether Docker is installed via snap by resolving the docker binary path through symlinks and checking for /snap/ in the resolved path.
Why it matters: Snap-confined Docker uses a restricted TMPDIR that prevents kind from writing temporary files during cluster creation.
Platforms: Linux only
How to fix:
Option A — set a snap-accessible TMPDIR:
export TMPDIR=$HOME/tmpmkdir -p $TMPDIROption B — reinstall Docker via apt (recommended):
sudo snap remove docker# Follow: https://docs.docker.com/engine/install/ubuntu/Docker Socket Permissions
Section titled “Docker Socket Permissions”What it detects: Whether docker info fails with “permission denied,” indicating the current user cannot access the Docker socket.
Why it matters: Without socket access, no Docker commands can run, including cluster creation.
Platforms: Linux only
How to fix:
sudo usermod -aG docker $USERnewgrp dockerkubectl
Section titled “kubectl”What it detects: Whether the kubectl binary is installed and responds to kubectl version --client.
Why it matters: kubectl is required to interact with the cluster after creation. Without it, you cannot deploy workloads or inspect cluster state.
Platforms: All
How to fix:
Install kubectl: https://kubernetes.io/docs/tasks/tools/
kubectl Version Skew
Section titled “kubectl Version Skew”What it detects: Whether the installed kubectl client version is more than one minor version away from the Kubernetes version used by the default kind node image.
Why it matters: The Kubernetes version skew policy allows kubectl to be within +/-1 minor version of the cluster. Larger skew can cause API incompatibilities, missing fields, or unexpected behavior.
Platforms: All
How to fix:
Update kubectl to match your cluster version: https://kubernetes.io/docs/tasks/tools/
NVIDIA Driver
Section titled “NVIDIA Driver”What it detects: Whether nvidia-smi is installed and can query the driver version.
Why it matters: The NVIDIA driver is the foundation for GPU container support. Without it, the GPU device plugin cannot access GPUs.
Platforms: Linux only
How to fix:
Install NVIDIA drivers: https://www.nvidia.com/drivers
Verify installation:
nvidia-smiNVIDIA Container Toolkit
Section titled “NVIDIA Container Toolkit”What it detects: Whether nvidia-ctk (NVIDIA Container Toolkit CLI) is available on PATH.
Why it matters: The container toolkit provides the runtime hooks that expose GPU devices inside containers.
Platforms: Linux only
How to fix:
sudo apt-get install -y nvidia-container-toolkitSee the NVIDIA Container Toolkit install guide for other distributions.
NVIDIA Docker Runtime
Section titled “NVIDIA Docker Runtime”What it detects: Whether the nvidia runtime is registered in Docker’s runtime configuration by querying docker info.
Why it matters: Docker must be configured to use the NVIDIA runtime so that GPU devices are passed through to kind node containers.
Platforms: Linux only
How to fix:
sudo nvidia-ctk runtime configure --runtime=dockersudo systemctl restart dockerVerify the runtime is registered:
docker info | grep -i nvidiaKernel
Section titled “Kernel”Inotify Limits
Section titled “Inotify Limits”What it detects: The values of fs.inotify.max_user_watches and fs.inotify.max_user_instances. Warns if watches are below 524288 or instances are below 512.
Why it matters: kind nodes run multiple inotify watchers (kubelet, kube-apiserver, etcd). Low limits cause “too many open files” errors that crash cluster components.
Platforms: Linux only
How to fix:
sudo sysctl fs.inotify.max_user_watches=524288sudo sysctl fs.inotify.max_user_instances=512To make these changes persistent across reboots:
echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.d/99-kind.confecho "fs.inotify.max_user_instances=512" | sudo tee -a /etc/sysctl.d/99-kind.confsudo sysctl --systemSee kind known issues: inotify for details.
Kernel Version
Section titled “Kernel Version”What it detects: The running kernel version via uname. Fails if below 4.6.
Why it matters: Linux kernels before 4.6 lack cgroup namespace support, which kind requires to isolate container resources. This is a hard blocker — kind cannot function on older kernels.
Platforms: Linux only
How to fix:
Upgrade your kernel to 4.6 or later. On Ubuntu:
sudo apt-get update && sudo apt-get dist-upgradeSecurity
Section titled “Security”AppArmor
Section titled “AppArmor”What it detects: Whether the AppArmor kernel module is loaded and enabled by reading /sys/module/apparmor/parameters/enabled.
Why it matters: AppArmor can cache stale container profiles that prevent kind containers from starting. This is a known Docker issue (moby/moby#7512).
Platforms: Linux only
How to fix:
If kind containers fail to start with AppArmor-related errors:
sudo aa-remove-unknownThis removes stale AppArmor profiles without disabling AppArmor entirely.
Local Path CVE
Section titled “Local Path CVE”What it detects: The deployed version of local-path-provisioner on a running cluster, by querying the deployed image tag. Warns when the version is below v0.0.34 — the fix release for CVE-2025-62878 (path traversal allowing a malicious workload to write outside its PV).
Why it matters: A workload exploiting CVE-2025-62878 can read or write host filesystem paths outside its PersistentVolume, breaking the isolation guarantees of dynamic provisioning. The fix is straightforward (a version bump), but the vulnerability is silent — there is no functional symptom until exploitation.
Platforms: All. Skipped when local-path-provisioner is not installed (addons.localPath: false).
Status values:
- ok — local-path-provisioner is at v0.0.34 or higher (kinder ships v0.0.35+ pinned)
- warn — version below v0.0.34 detected
- skip — addon not installed, or unable to probe deployed version
How to fix:
If you’re running the kinder default (localPath: true), simply create a fresh cluster — kinder pins the embedded manifest to a fix-version image. If you’ve overridden the manifest or are running a derived addon, bump the local-path-provisioner image tag to v0.0.34 or later.
SELinux
Section titled “SELinux”What it detects: Whether SELinux is in enforcing mode by reading /sys/fs/selinux/enforce. Only warns on Fedora, where /dev/dma_heap permission denials are a known issue.
Why it matters: On Fedora, SELinux enforcing mode denies access to /dev/dma_heap, which breaks kind node containers.
Platforms: Linux only
How to fix:
Temporary (until reboot):
sudo setenforce 0Persistent:
# Edit /etc/selinux/configSELINUX=permissivePlatform
Section titled “Platform”Firewalld
Section titled “Firewalld”What it detects: Whether firewalld is running with the nftables backend by checking firewall-cmd --state and parsing /etc/firewalld/firewalld.conf. If the FirewallBackend line is absent, it defaults to nftables (the Fedora 32+ default).
Why it matters: The nftables backend bypasses iptables rules that Docker and kind rely on for container networking. This causes network connectivity failures between pods and nodes.
Platforms: Linux only
How to fix:
Edit /etc/firewalld/firewalld.conf:
FirewallBackend=iptablesThen restart firewalld:
sudo systemctl restart firewalldSee kind known issues: firewalld for details.
Host Mount Path
Section titled “Host Mount Path”What it detects: Whether every host path declared in a cluster config’s extraMounts exists and is accessible. The check reads extraMounts via --config <file>. Relative paths are resolved with filepath.Abs before os.Stat.
Why it matters: A non-existent host path causes kinder create cluster to fail with a cryptic Docker mount error mid-provisioning, after several other steps have already run. Catching the missing path at doctor-time saves minutes of false-start cluster creation.
Platforms: All. Skipped when --config is not passed or extraMounts is absent.
Status values:
- ok — host path exists and is accessible
- fail — host path does not exist on the filesystem
- warn — host path exists but cannot be read (permission error)
How to fix:
Create the directory before invoking kinder create cluster:
mkdir -p /Users/me/work/myapp-datakinder create cluster --config cluster.yamlFor relative paths, remember that they resolve against your current working directory, not the config file’s directory.
Docker Desktop File Sharing
Section titled “Docker Desktop File Sharing”What it detects: On macOS, whether every host path in extraMounts is within Docker Desktop’s configured file-sharing roots. Reads ~/Library/Group Containers/group.com.docker/settings-store.json to extract the shared roots; falls back to Docker Desktop default sharing roots when settings-store.json is absent.
Why it matters: Docker Desktop on macOS only mounts paths that are under approved file-sharing roots. Paths outside the shared roots silently fail to mount — the container starts, but the bind mount is empty. The pod sees an empty directory instead of your expected files, causing confusing application-level errors.
Platforms: macOS only. Skipped on Linux and Windows.
Status values:
- ok — every configured
extraMounthost path is under a shared root - warn — one or more
extraMountpaths are outside the shared roots
How to fix:
Open Docker Desktop → Settings → Resources → File Sharing, add the parent directory containing your mount path, and apply & restart Docker Desktop. Or move your mount source under an already-shared root (e.g., ~/).
See the Host Directory Mounting guide for the end-to-end pattern.
WSL2 Cgroup
Section titled “WSL2 Cgroup”What it detects: Whether the system is running under WSL2 using a multi-signal approach (requires /proc/version containing “microsoft” AND at least one of: $WSL_DISTRO_NAME set, /proc/sys/fs/binfmt_misc/WSLInterop exists, or WSLInterop-late exists). If WSL2 is confirmed, verifies that cgroup v2 controllers (cpu, memory, pids) are available.
Why it matters: kind nodes need cgroup v2 controllers for resource management. WSL2 does not enable cgroup v2 by default on older distributions, causing kubelet failures.
Platforms: Linux only (WSL2 runs a Linux kernel)
How to fix:
Create or edit %UserProfile%\.wslconfig in Windows:
[wsl2]kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1Then restart WSL:
wsl --shutdownSee WSL2 cgroup v2 setup for a detailed guide.
Rootfs Device
Section titled “Rootfs Device”What it detects: Whether Docker is using the BTRFS storage driver or a BTRFS backing filesystem by querying docker info for both the .Driver field and .DriverStatus metadata.
Why it matters: BTRFS has known issues with Docker’s overlay storage and device node management. kind containers may fail to create or run with device-related errors.
Platforms: Linux only
How to fix:
Option A — use a different filesystem for Docker storage:
Dedicate an ext4 or XFS partition for /var/lib/docker.
Option B — switch Docker storage driver to overlay2:
See kind known issues: Docker on BTRFS for configuration steps.
Network
Section titled “Network”Subnet Clashes
Section titled “Subnet Clashes”What it detects: Whether Docker network subnets (from the kind and bridge networks) overlap with host routing table entries. On Linux, parses ip route show; on macOS, parses netstat -rn. Self-referential routes (Docker’s own routes in the host table) are excluded from comparison.
Why it matters: When Docker subnets overlap with VPN, corporate network, or other host routes, cluster nodes cannot reach external services and pods may have connectivity failures.
Platforms: All (route parsing is platform-specific)
How to fix:
- Identify the conflicting network:
docker network inspect kind- Configure custom Docker address pools in
/etc/docker/daemon.json:
{ "default-address-pools": [ { "base": "10.200.0.0/16", "size": 24 } ]}- Restart Docker and recreate the cluster:
sudo systemctl restart dockerkinder delete clusterkinder create clusterOffline Readiness
Section titled “Offline Readiness”What it detects: Whether every container image required by the active addon set is already present in the local image store. Used as a pre-flight check for kinder create cluster --air-gapped. The catalog is sourced from RequiredAllImages() in pkg/cluster/internal/providers/common, which aggregates Images []string from each enabled addon package.
Why it matters: Air-gapped cluster creation (--air-gapped) disables every network call for image pulls. If even one required image is absent locally, creation fast-fails with a missing-image list — useful, but only after you’ve already invoked the command. Running this check first lets you pre-load everything in advance.
Platforms: All. Skipped when no container runtime is found.
Status values:
- ok — every required image is present in the local image store
- warn — one or more required images are absent (printed as a tabwriter table)
- skip — no container runtime found
How to fix:
Pull each missing image with your provider’s pull command:
docker pull docker.io/metallb/controller:v0.15.3docker pull docker.io/envoyproxy/envoy:v1.36.2# ...etc for every image listed by the warningThe Working Offline guide walks through the full two-mode workflow: pre-create image baking vs. post-create kinder load images.
Cluster Lifecycle
Section titled “Cluster Lifecycle”Cluster Node Skew
Section titled “Cluster Node Skew”What it detects: Kubernetes version skew across the nodes of a running cluster. Reports a warning when worker nodes are more than 3 minor versions behind the control-plane, or when control-plane nodes in an HA cluster do not all share the same minor version. Reads node Kubernetes versions via in-container kubelet --version (matches realListNodes semantics in the doctor package).
Why it matters: Out-of-skew workers can fail to register with the control-plane, refuse to schedule pods that need newer API features, or break in-place upgrade paths. Mixed control-plane minors are not supported by kubeadm and produce subtle etcd / scheduler bugs.
Platforms: All. Skipped when no kinder cluster is found, or when nodes use non-semver image tags (e.g. latest) that cannot be parsed.
Status values:
- ok — all nodes within skew policy
- warn — workers more than 3 minors behind CP, or CP minor mismatch detected
- skip — cluster not found, or non-semver node image tags in use
How to fix:
Recreate the cluster with a config that complies with skew rules. For multi-version clusters, see the Multi-Version Clusters guide — kinder validates skew at config-parse time so most violations are caught before any container is created.
Cluster Resume Readiness
Section titled “Cluster Resume Readiness”What it detects: Whether etcd quorum is healthy on a multi-control-plane cluster before kinder resume brings worker nodes back up. Probes etcd via crictl exec <etcd-id> etcdctl endpoint health against the first running control-plane node.
Why it matters: When a stopped HA cluster has lost more control-plane containers than its quorum tolerates (e.g., 2 of 3 CPs were docker rm’d while the cluster was paused), bringing workers up against a broken etcd produces hours of cascading kubelet/scheduler errors with no obvious root cause. This check warns up-front so the user can recover etcd before resume.
Platforms: All. Skipped on single-control-plane clusters (no quorum risk).
Status values:
- ok — all declared CP containers are present and etcd reports healthy on the bootstrap node
- warn — at least one CP container is missing/stopped, OR etcd reports unhealthy members below quorum
- skip — single-CP cluster, or no kinder cluster found, or
crictl/etcdctlnot reachable inside the etcd container (rare; indicates a non-standard node image)
How to fix:
If the warning fires because CP containers are stopped, restart them with docker start <node-name> before running kinder resume. If etcd itself reports degraded health, restore from the most recent kinder snapshot (the snapshot’s etcd state was captured while the cluster was running).
Automatic Mitigations
Section titled “Automatic Mitigations”When you run kinder create cluster, the create flow calls ApplySafeMitigations() before provisioning nodes. This system can automatically apply safe fixes without any manual intervention.
Tier-1 constraints
Section titled “Tier-1 constraints”Automatic mitigations are limited to tier-1 operations only:
- Allowed: Environment variable adjustments, cluster configuration tweaks
- Never allowed: Running
sudo, modifying system files (daemon.json,sysctl.conf,firewalld.conf), or elevating privileges
This design ensures that kinder never makes system-level changes without explicit user action. When a problem is detected that requires system changes, kinder doctor documents the fix command for you to run manually.
Current state
Section titled “Current state”No tier-1 automatic mitigations currently exist. The infrastructure is fully wired and ready for future use — when a safe, idempotent mitigation is identified, it can be added to the SafeMitigations() registry without any architectural changes.
All 24 checks documented on this page are diagnostic only. They report problems and suggest fixes but do not modify your system.