containerd burning a CPU core: GOGC=100 is the wrong default on large nodes

A few C19 large-node machines were running containerd at ~100% of a CPU core, and users reported node jitter. A single perf capture made the flame graph embarrassingly clear: 99% of the CPU time was in the Go GC. The fix is one environment variable. But looking back at the incident, I realized that Go's default GOGC=100 is essentially an anti-pattern for long-running processes with a small live heap.

TL;DR

  • Root cause: containerd's live heap is small (a few hundred MB), but its short-term allocation rate is high. With the default GOGC=100, GC triggers constantly and almost all CPU goes to GC.
  • Fix: GOGC=1000, which loosens the GC trigger threshold. Cost: resident memory grows from ~700 MB to a peak of ~1 GB (acceptable). containerd CPU drops from ~1 core to ~0.4 core.

Incident

Two C19 nodes reported abnormal containerd CPU usage:

  • Nodes: large business machines, dozens of pods each
  • Symptom: containerd long-running at 100% of one core
  • Memory: containerd RSS only ~700 MB (normal for containerd)

Key contrast: CPU saturated, memory low. "High CPU, low memory" is the classic signature of GC thrashing in a Go process.

Analysis

Find the hot spot

perf record -g -p $(pidof containerd) -- sleep 20

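The flame graph: 99% of stacks point at runtime.gcBgMarkWorker / runtime.scanobject / runtime.greyobject, i.e. the GC's mark phase. Combined with an RSS of only ~700 MB, this points to GC running very frequently, not to a huge heap that is expensive to scan.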

Verify the hypothesis

Disable GC entirely:

GOGC=off

Result:

  • Memory: climbs without bound. After running for a while it reached 16 GB (expected, since GC is off).
  • CPU: drops to ~0.2 core.

Confirmed: GC is the CPU consumer. But GOGC=off isn't usable in production. Time to find a middle ground.

After tuning, we settled on GOGC=1000:

  • Memory: ~700 MB steady state, peaking around ~1 GB (modest growth, because the live heap is small)
  • CPU: ~0.4 core
  • containerd functional paths (create/destroy/status): verified normal

Why does cranking GOGC drop CPU from 1 core to 0.4?

Go's GC trigger is controlled by GOGC:

Target heap memory = Live heap + (Live heap + GC roots) × GOGC / 100

GOGC is the percentage by which the heap may grow beyond the live heap before the next GC. Ignoring GC roots, the default GOGC=100 means:

  • live heap = 10 MB → next GC at 20 MB
  • live heap = 700 MB → next GC at 1.4 GB
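The formula is easy to check by hand. This small Go sketch evaluates it for the two examples above, plus a hypothetical 100 MB live heap at GOGC=1000. heapGoal is a hypothetical helper that mirrors the formula, not a Go runtime API.

```go
package main

import "fmt"

// heapGoal applies the heap-goal formula from the Go GC Guide:
//   goal = live + (live + roots) * GOGC / 100
// gogc must be >= 0 (GOGC=off has no finite goal).
func heapGoal(live, roots uint64, gogc int) uint64 {
	return live + (live+roots)*uint64(gogc)/100
}

func main() {
	const MB = 1 << 20
	fmt.Println(heapGoal(10*MB, 0, 100)/MB, "MB")   // 20 MB
	fmt.Println(heapGoal(700*MB, 0, 100)/MB, "MB")  // 1400 MB
	fmt.Println(heapGoal(100*MB, 0, 1000)/MB, "MB") // 1100 MB
}
```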

But containerd's profile is "long-lived process + many short-lived temporary objects":

  • live heap stable at a few hundred MB
  • short bursts allocate many short-lived objects (each container create/destroy/status query spawns a flurry of intermediate objects)
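  • newly allocated bytes quickly reach 100% of the live heap → GC fires
  • every cycle re-marks the entire live heap (the runtime.scanobject / runtime.greyobject frames in the flame graph), even though those long-lived objects have not changed since the last cycle
  • so GC fires over and over, most of its work is re-scanning the same live objects, and the CPU goes almost entirely to GC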

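Cranking GOGC relaxes the trigger threshold: short-lived objects are allowed to die in bulk, and each cycle sweeps many of them at once. The work per cycle stays roughly the same (it is dominated by marking the live heap), but each cycle reclaims far more garbage, so both GC frequency and CPU per byte collected go down. The price: higher peak memory.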
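A back-of-envelope model makes the trade-off concrete. Between cycles the heap may grow by (live heap + GC roots) × GOGC / 100, so at a steady allocation rate the number of cycles per second is roughly the allocation rate divided by that headroom, and each cycle pays to mark the live heap again. The sketch below assumes a hypothetical 2000 MB/s allocation rate and a 300 MB live heap (not measured from containerd) and ignores roots.

```go
package main

import "fmt"

// gcCyclesPerSec is a rough estimate: between cycles the heap may grow by
// (live + roots) * GOGC / 100 bytes, and the program fills that headroom
// at allocRate bytes per second.
func gcCyclesPerSec(allocRate, live, roots float64, gogc int) float64 {
	headroom := (live + roots) * float64(gogc) / 100
	return allocRate / headroom
}

func main() {
	const MB = 1 << 20
	// Hypothetical workload, not measured from containerd.
	allocRate := 2000.0 * MB // bytes allocated per second
	live := 300.0 * MB       // stable live heap in bytes
	for _, gogc := range []int{100, 1000} {
		fmt.Printf("GOGC=%4d: %.2f GC cycles/s\n", gogc, gcCyclesPerSec(allocRate, live, 0, gogc))
	}
}
```

In this model, going from GOGC=100 to 1000 cuts the cycle rate by 10×. The measured drop is smaller (if the ~0.2 core seen with GOGC=off is taken as the non-GC baseline, GC went from roughly 0.8 to 0.2 core), so treat the model as a rough guide, not a prediction.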

More detail: Go GC Guide.

Fix

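Trivial. Add the variable to the unit file (a drop-in created with systemctl edit containerd is safer, since package upgrades can overwrite files under /usr/lib):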

/usr/lib/systemd/system/containerd.service

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Environment="GOGC=1000"

systemctl daemon-reload && systemctl restart containerd. Done.
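To confirm that the restarted daemon actually picked up the variable, you can read its environment from /proc/<pid>/environ, which on Linux holds NUL-separated KEY=VALUE pairs. A minimal Go sketch, assuming it runs as root on the node; the tool name gogccheck is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strings"
)

// lookupEnv searches the NUL-separated KEY=VALUE pairs found in
// /proc/<pid>/environ for key.
func lookupEnv(environ []byte, key string) (string, bool) {
	for _, kv := range bytes.Split(environ, []byte{0}) {
		if k, v, ok := strings.Cut(string(kv), "="); ok && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	pid := "self" // default: inspect this process
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. the output of pidof containerd
	}
	data, err := os.ReadFile("/proc/" + pid + "/environ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err) // another user's process needs root
		os.Exit(1)
	}
	if v, ok := lookupEnv(data, "GOGC"); ok {
		fmt.Println("GOGC =", v)
	} else {
		fmt.Println("GOGC not set (runtime default is 100)")
	}
}
```

Running gogccheck $(pidof containerd) after the restart should report GOGC = 1000.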

Choosing a GOGC value

We picked 1000, but it's not a magic number:

  • GOGC=200–500: middle ground. Modest memory growth, meaningful CPU reduction.
  • GOGC=1000: our choice. It lets the heap grow to roughly 11× the live heap between cycles; the measured peak stayed around ~1 GB, which is acceptable.
  • GOGC=off: unbounded memory growth, debug only.
  • More refined: set GOMEMLIMIT (Go 1.19+) as a soft memory cap; the runtime collects more aggressively as total memory approaches the limit. This is more stable than tuning GOGC alone, but requires more precise sizing.
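The same two knobs are exposed in Go's runtime/debug package: debug.SetGCPercent corresponds to GOGC, and debug.SetMemoryLimit (Go 1.19+) to GOMEMLIMIT. As a sketch of the soft-cap behaviour, the program below turns the GOGC-based trigger off entirely and sets a 64 MiB limit; GC still runs, but only when total runtime memory nears the limit. The limit and churn sizes are arbitrary assumptions.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink forces each buffer onto the heap.
var sink []byte

// cyclesUnderLimit disables the GOGC-based trigger (like GOGC=off), sets a
// soft memory limit (like GOMEMLIMIT), churns through churnMB MiB of
// short-lived buffers and returns the number of GC cycles that ran.
func cyclesUnderLimit(limit int64, churnMB int) uint32 {
	oldPct := debug.SetGCPercent(-1)
	oldLimit := debug.SetMemoryLimit(limit)
	defer debug.SetGCPercent(oldPct)
	defer debug.SetMemoryLimit(oldLimit)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	const chunk = 32 << 10
	for i := 0; i < (churnMB<<20)/chunk; i++ {
		sink = make([]byte, chunk)
	}

	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	n := cyclesUnderLimit(64<<20, 512)
	fmt.Printf("GOGC=off, 64 MiB limit: %d GC cycles\n", n)
}
```

Without the SetMemoryLimit call, the same loop would run no GC cycles at all and the heap would grow by the full churn, which is the GOGC=off behaviour from the verification step.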

Heuristic: the smaller your live heap and the more memory you can spare, the higher you can push GOGC.

Appendix

Before / after (measured):

Metric       | GOGC=100 (default) | GOGC=off        | GOGC=1000
CPU          | ~1 core            | ~0.2 core       | ~0.4 core
RSS          | ~700 MB            | climbs to 16 GB | ~700 MB (peak ~1 GB)
GC frequency | high               | 0               | low

Zoe

AI Infra Engineer · LLM Serving · GPU/RDMA · indie hacker, obsessed with shipping tools
