JOURNAL FOR TUESDAY 24TH FEBRUARY, 2026
______________________________________________________________________________
SUBJECT: Notes on Go 1.26 performance
DATE: Tue 24 Feb 22:35:18 GMT 2026
On the 10th of February, Go 1.26.0 was released. Various articles were written
with headlines such as:
“Silently boost your backend performance with Go 1.26”
“Go 1.26 unleashes performance-boosting Green Tea GC”
“Go 1.26 Released: Enhanced Performance and New ...”
It’s a shame, but I’m not seeing much of a performance gain. In fact, a lot of
my tests show a regression — at least on Intel.
For my testing I used my desktop machine (Intel) and my MacBook Pro M4 (ARM).
The desktop has an Intel Core i9-12900T with 64GB of RAM; the MacBook has an
Apple M4 with 32GB of RAM.
For the tests I’m using my standard benchmarks for Mere (my own programming
language).
I thought it might be interesting to start with a baseline comparison between
the machines. I’ve included figures for the MacBook in both performance mode and
power saving mode. The Intel desktop is the baseline for the comparisons.
This is with Go 1.25.7:
                    Intel   --- ARM Performance ---    --- ARM Power Save ---
BENCHMARK            TIME    TIME     DIFF   % DIFF    TIME     DIFF   % DIFF
loop-counter       10.26s   8.59s   -1.67s   16.28%  16.63s   +6.37s   62.09%
indexed-counter    36.39s  21.58s  -14.81s   40.70%  42.55s   +6.16s   16.93%
counter            10.30s   8.70s   -1.60s   15.53%  17.16s   +6.86s   66.60%
eratosthenes-sieve 10.75s   8.95s   -1.80s   16.74%  19.10s   +8.35s   77.67%
We can see that the MacBook, in performance mode, wipes the floor with the
desktop. However, I could practically hear the juice being sucked out of the
MacBook battery during testing :P
All of the tests in this post use the MacBook in performance mode.
The first set of benchmarks compares Go 1.25.7 with stock Go 1.26.0:
Desktop (Intel)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter       10.26s  12.17s   +1.91s   18.62%
indexed-counter    36.39s  51.87s  +15.48s   42.54%
counter            10.30s  11.73s   +1.43s   13.88%
eratosthenes-sieve 10.75s  11.20s   +0.45s    4.19%
MacBook Pro M4 (ARM)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter        8.59s   8.02s   -0.57s    6.64%
indexed-counter    21.58s  20.83s   -0.75s    3.48%
counter             8.70s   8.51s   -0.19s    2.18%
eratosthenes-sieve  8.95s   9.92s   +0.97s   10.84%
Here we can see that on Intel performance is up to 42% worse, while on ARM it
is up to 7% better, except for one case (eratosthenes-sieve) where it is over
10% worse.
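Phrases such as "up to 7% better" summarise the largest change in a table. A
hypothetical Go helper that picks out the worst slowdown and the best speed-up
from a set of Go 1.25.7 vs Go 1.26.0 timings; the `row` type and `extremes`
function are illustrative names, and the figures are the MacBook results above:

```go
package main

import "fmt"

// row holds one benchmark's timings in seconds.
type row struct {
	name     string
	baseline float64 // Go 1.25.7
	current  float64 // Go 1.26.0
}

// extremes returns the largest slowdown and the largest speed-up, each as a
// percentage of the baseline time. A value of 0 means no row moved in that
// direction.
func extremes(rows []row) (worst, best float64) {
	for _, r := range rows {
		pct := (r.current - r.baseline) / r.baseline * 100
		if pct > worst {
			worst = pct
		}
		if -pct > best {
			best = -pct
		}
	}
	return worst, best
}

func main() {
	arm := []row{
		{"loop-counter", 8.59, 8.02},
		{"indexed-counter", 21.58, 20.83},
		{"counter", 8.70, 8.51},
		{"eratosthenes-sieve", 8.95, 9.92},
	}
	worst, best := extremes(arm)
	fmt.Printf("up to %.2f%% worse, up to %.2f%% better\n", worst, best)
}
```

With the MacBook figures it prints `up to 10.84% worse, up to 6.64% better`.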
There is a setting, ‘GOEXPERIMENT=nogreenteagc’, that allows us to disable the
new Green Tea garbage collector. Let’s see if that is the culprit on Intel:
Desktop (Intel)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter       10.26s  11.51s   +1.25s   12.18%
indexed-counter    36.39s  47.84s  +11.45s   31.46%
counter            10.30s  11.05s   +0.75s    7.28%
eratosthenes-sieve 10.75s  10.92s   +0.17s    1.58%
MacBook Pro M4 (ARM)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter        8.59s   8.20s   -0.39s    4.54%
indexed-counter    21.58s  21.27s   -0.31s    1.44%
counter             8.70s   8.41s   -0.29s    3.33%
eratosthenes-sieve  8.95s   8.98s   +0.03s    0.34%
On Intel performance is now up to 31% worse. On ARM it’s now only up to 4%
better. So, is it better to use the Green Tea garbage collector on ARM but not
on Intel?
There is another new feature that allocates more slices on the stack. This can
be disabled using the ‘-gcflags=all=-d=variablemakehash=n’ compiler flags.
Desktop (Intel)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter       10.26s  10.62s   +0.36s    3.51%
indexed-counter    36.39s  49.64s  +13.25s   36.41%
counter            10.30s  10.76s   +0.46s    4.47%
eratosthenes-sieve 10.75s  10.17s   -0.58s    5.40%
MacBook Pro M4 (ARM)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter        8.59s   6.73s   -1.86s   21.65%
indexed-counter    21.58s  19.65s   -1.93s    8.94%
counter             8.70s   8.30s   -0.40s    4.60%
eratosthenes-sieve  8.95s   8.52s   -0.44s    4.92%
In this case Intel is up to 36% worse. On ARM performance is up to 22% better.
Once again, it seems better to use the new slice allocation on ARM but not on
Intel?
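One way to see whether the stack allocation change affects a particular piece
of code is to count heap allocations per call with `testing.AllocsPerRun`,
once from a normal build and once from a build with
‘-gcflags=all=-d=variablemakehash=n’. A sketch; `sumSquares` is a hypothetical
function with a variable-sized make, and whether its backing array ends up on
the stack depends on the Go version, the size requested and the flags, so no
particular count is claimed:

```go
package main

import (
	"fmt"
	"testing"
)

// sink stops the compiler from optimising the work away.
var sink int

// sumSquares builds a temporary slice whose length is only known at run time.
// The slice does not escape, so the compiler may be able to place a small
// backing array on the stack instead of the heap.
func sumSquares(n int) int {
	buf := make([]int, n)
	for i := range buf {
		buf[i] = i * i
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	for _, n := range []int{2, 4, 64} {
		// Average number of heap allocations per call over 1000 runs.
		allocs := testing.AllocsPerRun(1000, func() {
			sink = sumSquares(n)
		})
		fmt.Printf("n=%d: %.0f heap allocations per call\n", n, allocs)
	}
}
```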
Another new feature that can be turned off is an experiment to reduce the cost
of small object allocations.
This can be done by setting ‘GOEXPERIMENT=nosizespecializedmalloc’ at build
time. Let’s try it:
Desktop (Intel)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter       10.26s  12.11s   +1.85s   18.03%
indexed-counter    36.39s  48.48s  +12.09s   33.22%
counter            10.30s  11.71s   +1.41s   13.69%
eratosthenes-sieve 10.75s  11.07s   +0.32s    2.98%
On its own, this makes Go 1.26 on Intel only slightly better than stock Go
1.26, and worse than simply turning off the Green Tea garbage collector.
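Since the change targets small object allocations, a microbenchmark that does
nothing but allocate small objects is a more direct probe than a whole-program
benchmark. A sketch using `testing.Benchmark`, which is documented as usable
outside ‘go test’; the `point` type and `sink` variable are hypothetical. The
idea is to build it once normally and once with
‘GOEXPERIMENT=nosizespecializedmalloc’ and compare the ns/op figures; no
expected numbers are given here:

```go
package main

import (
	"fmt"
	"testing"
)

// point is a small (24-byte) object of the kind the allocator change targets.
type point struct{ x, y, z float64 }

// sink is a package-level pointer so each allocation escapes to the heap.
var sink *point

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = &point{x: float64(i)}
		}
	})
	// String reports iterations and ns/op; MemString adds B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```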
How about combining ‘GOEXPERIMENT=nogreenteagc,nosizespecializedmalloc’ and
‘-gcflags=all=-d=variablemakehash=n’? Then we get:
Desktop (Intel)
                   1.25.7  1.26.0     DIFF   % DIFF
loop-counter       10.26s  10.38s   +0.12s    1.17%
indexed-counter    36.39s  48.55s  +12.16s   33.42%
counter            10.30s  10.79s   +0.49s    4.76%
eratosthenes-sieve 10.75s  10.15s   -0.60s    5.58%
These figures are better than stock Go 1.26 across the board. How much better?
Comparing to our original stock Go 1.26 times on Intel we get:
Desktop (Intel)     Stock  +Flags
                   1.26.0  1.26.0     DIFF   % DIFF
loop-counter       12.17s  10.38s   -1.79s   14.71%
indexed-counter    51.87s  48.55s   -3.32s    6.40%
counter            11.73s  10.79s   -0.94s    8.01%
eratosthenes-sieve 11.20s  10.15s   -1.05s    9.38%
Turning off the Green Tea garbage collector, the extra stack allocation of
slices and the new allocation of small objects gives an overall performance
improvement for Go 1.26, at least on Intel, on my desktop. However, performance
is still a way off Go 1.25.7: three of the four benchmarks are slower,
indexed-counter by 33% :(
Go 1.26 is a disappointment and I’m not that impressed. Worse, expect the
options to turn off the Green Tea garbage collector and the new small object
allocation to go away in Go 1.27.
Should I use stock Go 1.26.0, Go 1.26.0 with a bunch of options or continue
using Go 1.25.7 for now? I just don’t know ¯\_(ツ)_/¯
--
Diddymus