WolfMUD: Journal for Tuesday 24th February, 2026


                   JOURNAL FOR TUESDAY 24TH FEBRUARY, 2026
______________________________________________________________________________

SUBJECT: Notes on Go 1.26 performance
   DATE: Tue 24 Feb 22:35:18 GMT 2026

On the 10th February Go 1.26.0 was released. Various articles were written
with headlines such as:

  “Silently boost your backend performance with Go 1.26”
  “Go 1.26 unleashes performance-boosting Green Tea GC”
  “Go 1.26 Released: Enhanced Performance and New ...”

It’s a shame, but I’m not seeing much of a performance gain. In fact, a lot of
my tests show a regression — at least on Intel.

For my testing I used my desktop machine (Intel) and my MacBook Pro M4 (ARM).
The desktop is an Intel Core i9-12900T with 64GB RAM. The MacBook is an Apple
M4 with 32GB RAM.

For the tests I’m using my standard benchmarks for Mere (my own programming
language).

I thought it might be interesting to start with a baseline comparison between
the machines. I’ve included the MacBook with figures for performance mode and
power saving mode. The Intel desktop is used as the baseline for comparisons.
This is with Go 1.25.7:


                     Intel   --- ARM Performance ---  --- ARM Power Save ---
           BENCHMARK   TIME    TIME     DIFF  % DIFF    TIME    DIFF  % DIFF
        loop-counter 10.26s   8.59s   -1.67s  16.28%  16.63s  +6.37s  62.09%
     indexed-counter 36.39s  21.58s  -14.81s  40.70%  42.55s  +6.16s  16.93%
             counter 10.30s   8.70s   -1.60s  15.53%  17.16s  +6.86s  66.60%
  eratosthenes-sieve 10.75s   8.95s   -1.80s  16.74%  19.10s  +8.35s  77.67%


We can see that the MacBook, in performance mode, wipes the floor with the
desktop. However, I could practically hear the juice being sucked out of the
MacBook battery during testing :P

All of the tests in this post use the MacBook in performance mode.

The first set of benchmarks compares Go 1.25.7 with stock Go 1.26.0:


               Desktop (Intel)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter 10.26s  12.17s   +1.91s  18.62%
                  indexed-counter 36.39s  51.87s  +15.48s  42.54%
                          counter 10.30s  11.73s   +1.43s  13.88%
               eratosthenes-sieve 10.75s  11.20s   +0.45s   4.19%

               MacBook Pro M4 (ARM)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter  8.59s   8.02s   -0.57s   6.64%
                  indexed-counter 21.58s  20.83s   -0.75s   3.48%
                          counter  8.70s   8.51s   -0.19s   2.18%
               eratosthenes-sieve  8.95s   9.92s   +0.97s  10.84%


Here we can see that on Intel performance is up to 42% worse, while on ARM it
is up to 7% better, except for one case where it is over 10% worse.

There is a setting ‘GOEXPERIMENT=nogreenteagc’ that allows us to disable the
new Green Tea garbage collector. Let’s see if that is the culprit on Intel:


               Desktop (Intel)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter 10.26s  11.51s   +1.25s  12.18%
                  indexed-counter 36.39s  47.84s  +11.45s  31.46%
                          counter 10.30s  11.05s   +0.75s   7.28%
               eratosthenes-sieve 10.75s  10.92s   +0.17s   1.58%

               MacBook Pro M4 (ARM)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter  8.59s   8.20s   -0.39s   4.54%
                  indexed-counter 21.58s  21.27s   -0.31s   1.44%
                          counter  8.70s   8.41s   -0.29s   3.33%
               eratosthenes-sieve  8.95s   8.98s   +0.03s   0.34%


On Intel performance is now up to 31% worse. On ARM it’s now only up to 4% better.
So is it better to use the Green Tea garbage collector on ARM but not on Intel?

There is another new feature that allocates more slices on the stack. This can
be disabled using the ‘-gcflags=all=-d=variablemakehash=n’ compiler flags.


               Desktop (Intel)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter 10.26s  10.62s   +0.36s   3.51%
                  indexed-counter 36.39s  49.64s  +13.25s  36.41%
                          counter 10.30s  10.76s   +0.46s   4.47%
               eratosthenes-sieve 10.75s  10.17s   -0.58s   5.40%

               MacBook Pro M4 (ARM)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter  8.59s   6.73s   -1.86s  21.65%
                  indexed-counter 21.58s  19.65s   -1.93s   8.94%
                          counter  8.70s   8.30s   -0.40s   4.60%
               eratosthenes-sieve  8.95s   8.52s   -0.44s   4.92%


In this case Intel is up to 36% worse. On ARM performance is up to 22% better.
Once again, it seems better to use the new slice allocation on ARM but not on
Intel?

Another new feature that can be turned off is an experiment to reduce the cost
of small object allocations.

This can be done by setting ‘GOEXPERIMENT=nosizespecializedmalloc’ at build
time. Let’s try it:


               Desktop (Intel)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter 10.26s  12.11s   +1.85s  18.03%
                  indexed-counter 36.39s  48.48s  +12.09s  33.22%
                          counter 10.30s  11.71s   +1.41s  13.69%
               eratosthenes-sieve 10.75s  11.07s   +0.32s   2.98%


On its own this makes little difference on Intel. The times are only slightly
better than stock Go 1.26 and still behind Go 1.25.7.

How about combining ‘GOEXPERIMENT=nogreenteagc,nosizespecializedmalloc’ and
‘-gcflags=all=-d=variablemakehash=n’? Then we get:


               Desktop (Intel)
                                  1.25.7  1.26.0     DIFF  % DIFF
                     loop-counter 10.26s  10.38s   +0.12s   1.17%
                  indexed-counter 36.39s  48.55s  +12.16s  33.42%
                          counter 10.30s  10.79s   +0.49s   4.76%
               eratosthenes-sieve 10.75s  10.15s   -0.60s   5.58%


These figures are better than stock Go 1.26 across the board. How much better?
Comparing to our original stock Go 1.26 times on Intel we get:


               Desktop (Intel)     Stock  +Flags
                                  1.26.0  1.26.0     DIFF  % DIFF
                     loop-counter 12.17s  10.38s   -1.79s  14.71%
                  indexed-counter 51.87s  48.55s   -3.32s   6.40%
                          counter 11.73s  10.79s   -0.94s   8.01%
               eratosthenes-sieve 11.20s  10.15s   -1.05s   9.38%


Turning off the Green Tea garbage collector, the stack slice allocations and
the new small object allocations gives an overall performance improvement for
Go 1.26, at least on Intel, on my desktop. However, the performance is still a
ways off Go 1.25.7 :(

Go 1.26 — a disappointment and I’m not that impressed. Worse, in Go 1.27
expect the option to turn off the Green Tea garbage collector and small
allocations to go away.

Should I use stock Go 1.26.0, Go 1.26.0 with a bunch of options or continue
using Go 1.25.7 for now? I just don’t know ¯\_(ツ)_/¯

--
Diddymus
