6 Key Insights About Stack Allocation in Go for Faster Programs

<h2 id="intro">Introduction</h2> <p>Go's runtime team has been hard at work reducing the overhead of heap allocations in the latest releases (Go 1.22 and 1.23). Heap allocations not only consume CPU cycles for memory management but also burden the garbage collector—even with advances like the Green Tea GC. The solution? Shifting more allocations to the stack, which is far cheaper and can be reclaimed automatically. In this article, we break down six crucial insights about stack allocation, using a common slice-growth pattern to illustrate the problem and the fix. By understanding these points, you can write more efficient Go code that runs faster and generates less garbage.</p> <h2 id="item1">1. The High Cost of Heap Allocations</h2> <p>Every time your Go program allocates memory from the heap, a significant amount of code executes to satisfy that request. The allocator must find a free block, manage fragmentation, and update metadata. Additionally, heap allocations increase the load on the garbage collector, which must later scan and reclaim unused objects. Even with incremental and concurrent GC improvements, the overhead remains non-trivial. In hot code paths, heap allocations can become a major bottleneck. <a href="#item2">Stack allocations</a> avoid this overhead entirely, making them a prime target for optimization.</p> <h2 id="item2">2. Why Stack Allocations Are Faster</h2> <p>Stack allocations are fundamentally cheaper than heap allocations. When a function is called, its local variables are allocated on the stack by simply moving the stack pointer—a constant-time operation that often requires no explicit allocation call. 
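</p>

<p>To make the difference concrete, here is a minimal sketch (the helper names <code>heapBuf</code> and <code>stackBuf</code> are my own) that uses <code>testing.AllocsPerRun</code> to count heap allocations per call: a buffer whose address is stored in a package-level variable must live on the heap, while one that never leaves its function can stay on the stack.</p>

```go
package main

import (
	"fmt"
	"testing"
)

var sink *[128]byte

// heapBuf escapes: its address is saved in a package-level
// variable, so the compiler must place the array on the heap.
func heapBuf() {
	var buf [128]byte
	sink = &buf
}

// stackBuf stays local: the array never leaves the function,
// so escape analysis keeps it on the stack.
func stackBuf() int {
	var buf [128]byte
	buf[0] = 1
	return int(buf[0])
}

func main() {
	fmt.Println(testing.AllocsPerRun(1000, heapBuf))                   // 1 allocation per call
	fmt.Println(testing.AllocsPerRun(1000, func() { _ = stackBuf() })) // 0 allocations
}
```

<p>You can confirm the compiler's reasoning with <code>go build -gcflags=-m</code>, which prints each escape-analysis decision. 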
Stack memory is automatically reclaimed when the function returns, placing zero burden on the garbage collector. This also leads to better cache locality: stack data is reused promptly and stays hot in the CPU cache. In contrast, heap objects may be scattered across memory, causing cache misses. For performance-sensitive code, moving allocations to the stack can yield dramatic speedups.</p> <h2 id="item3">3. The Slice Growth Problem</h2> <p>Consider the common pattern of building a slice by appending items received from a channel. The slice starts empty, and <code>append</code> dynamically grows the backing array. On the first append, it allocates a backing array of capacity 1. When that fills, it allocates a new array of capacity 2, copies the old data, and discards the original. Next it allocates capacity 4, then 8, and so on, roughly doubling each time for small slices (the growth factor tapers off for larger ones). While this amortizes the cost over many appends, the early allocations are wasteful. For a slice that ultimately contains only a few items, you might allocate and discard many small arrays, each requiring a heap allocation and later GC.</p> <h2 id="item4">4. The Wasteful Startup Phase</h2> <p>The startup phase—when the slice grows from capacity 1 to a moderate capacity—is where most of the allocation overhead occurs. Each resizing produces garbage that the collector must later clean up. If your slice never grows large, this startup phase accounts for your entire allocation cost: repeated small allocations, copies, and deallocations. In a tight loop processing many tasks, this can severely degrade performance. The <code>append</code> function doesn't know the final size, so it cannot size the backing array aggressively up front. The result is a hidden cost that many developers overlook. <a href="#item5">Pre-allocating capacity</a> sidesteps this entirely.</p> <h2 id="item5">5. 
A Simple Optimization: Pre-allocate Slice Capacity</h2> <p>If you know the approximate number of elements your slice will hold—or even an upper bound—you can pre-allocate the backing array using <code>make([]task, 0, estimatedCap)</code>. This ensures that <code>append</code> never needs to allocate new memory until the slice exceeds that capacity. For example, if you expect at most 100 tasks, use <code>tasks := make([]task, 0, 100)</code>. Now all appends within that capacity are allocation-free. This simple change eliminates the wasteful startup phase and reduces GC pressure. The cost is just one heap allocation upfront instead of many small ones. Pre-allocation also improves cache behavior because all elements are contiguous and allocated in a single block.</p> <h2 id="item6">6. Additional Benefits of Pre-allocation</h2> <p>Pre-allocating slice capacity does more than reduce heap allocations. It also minimizes copying: since the backing array never needs to be reallocated, existing elements stay in place. This saves CPU cycles and reduces memory fragmentation. Moreover, the single allocation is easier for the GC to track than many small ones. Better still, if the requested capacity is a compile-time constant, the slice is small, and escape analysis proves it never leaves the function, the Go compiler may allocate the backing array on the stack, turning the heap allocation into a near-zero-cost stack allocation. This is part of a broader effort in Go's compiler to move objects that do not escape their function onto the stack automatically. By pre-allocating with a constant capacity, you make it more likely that the entire slice lives on the stack.</p> <h2 id="conclusion">Conclusion</h2> <p>Heap allocations are expensive, but many can be avoided by pushing memory onto the stack. Understanding the slice growth pattern—and the overhead of its startup phase—is a key step. 
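</p>

<p>The growth pattern from insight 3 and the pre-allocation fix from insight 5 are easy to measure. The sketch below (the <code>task</code> type and the sizes are illustrative) counts heap allocations with <code>testing.AllocsPerRun</code>:</p>

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int }

// grow appends with no capacity hint, so append must repeatedly
// allocate larger backing arrays and copy the old contents over.
func grow(n int) []task {
	var tasks []task
	for i := 0; i < n; i++ {
		tasks = append(tasks, task{id: i})
	}
	return tasks
}

// preallocated reserves the full capacity up front, so every
// append reuses the single backing array.
func preallocated(n int) []task {
	tasks := make([]task, 0, n)
	for i := 0; i < n; i++ {
		tasks = append(tasks, task{id: i})
	}
	return tasks
}

func main() {
	fmt.Printf("no hint: %.0f allocations\n", testing.AllocsPerRun(100, func() { _ = grow(100) }))
	fmt.Printf("pre-allocated: %.0f allocation\n", testing.AllocsPerRun(100, func() { _ = preallocated(100) }))
}
```

<p>On a typical build, the unhinted version performs several allocations for 100 elements while the pre-allocated version performs exactly one. 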
By pre-allocating slice capacity, you can eliminate a cascade of small allocations, reduce GC load, and often let the compiler place the backing array on the stack entirely. These optimizations are especially valuable in hot code paths. Apply these insights to your own programs, and you'll see tangible performance gains with minimal code changes.</p>
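<p>As a closing illustration of insight 6, here is a small sketch (the function name and sizes are my own) in which a slice with a constant capacity never escapes its function, so the backing array can live entirely on the stack:</p>

```go
package main

import (
	"fmt"
	"testing"
)

// sum builds a small slice whose capacity is a compile-time
// constant and which never leaves the function; escape analysis
// can therefore place the backing array on the stack.
func sum() int {
	nums := make([]int, 0, 8)
	for i := 1; i <= 8; i++ {
		nums = append(nums, i)
	}
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sum()) // 36
	fmt.Println(testing.AllocsPerRun(1000, func() { _ = sum() })) // 0: no heap allocation
}
```

<p>Running <code>go build -gcflags=-m</code> on this file should report that the <code>make</code> call does not escape, confirming the stack placement.</p>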