Up to Main Index                               Up to Journal for May, 2026

                      JOURNAL FOR SUNDAY 17TH MAY, 2026
______________________________________________________________________________

SUBJECT: Replacing Mountains with Ant Hills (tiny webserver setup) Part 3
   DATE: Sun 17 May 18:08:59 BST 2026

In part 2 of this mini-series[1] we ended with the i9 desktop benchmarks:


  >wrk -t8 -c1500 -d30 --latency \
    https://www.phreaks1.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks1.dev/journal/2026/3/28.html
    8 threads and 1500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    15.97ms   60.47ms   1.07s    98.85%
      Req/Sec     1.32k   116.73     1.57k    81.57%
    Latency Distribution
       50%    5.67ms
       75%   13.70ms
       90%   28.78ms
       99%   86.10ms
    313532 requests in 30.06s, 1.60GB read
    Socket errors: connect 0, read 2, write 0, timeout 0
  Requests/sec: 10,429.40
  Transfer/sec:     54.54MB


However, the main point of this series was not to chase requests per second. I
started out trying to see if I could replace Apache with some simple, well
understood tools and ended up documenting the experience along the way. Good
performance was just a happy side effect.

With that in mind, I wanted to see how well my stunnel+httpd setup scaled down
as well as up. Naturally, I put it on one of my trusty Raspberry Pi 4s (4GB),
and here are the results:


  >wrk -t8 -c500 -d30 --latency \
    https://www.phreaks8.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks8.dev/journal/2026/3/28.html
    8 threads and 500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    47.70ms  123.75ms   1.81s    92.05%
      Req/Sec    88.30     27.87   190.00     63.23%
    Latency Distribution
       50%   11.90ms
       75%   24.38ms
       90%   89.26ms
       99%  684.81ms
    21103 requests in 30.05s, 98.90MB read
    Socket errors: connect 0, read 0, write 0, timeout 1
  Requests/sec:    702.19
  Transfer/sec:      3.29MB


Not too shabby. The stunnel+httpd setup managed to push over seven hundred
requests/sec with five hundred concurrent connections over HTTPS, with a
median response time of 11.9ms. The “server” software was using 80% user CPU,
20% system CPU and a light 42MB of RAM, with `wrk` run from the i9 desktop.
From a user perspective, even under heavy load, stunnel+httpd can deliver a
superb, snappy user experience on an older Raspberry Pi 4 (currently £96). I’m
not chasing requests/second, but those figures look mighty fine ;)
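
Copying figures out of `wrk` reports by hand gets tedious after the third
machine. A minimal Python sketch for pulling out the headline numbers might
look like the following. The names `to_ms`, `summarise` and `SAMPLE` are made
up for illustration, and the sample text is trimmed from the Raspberry Pi 4
report above:

```python
import re

# Trimmed from the Raspberry Pi 4 report; only the lines the parser reads.
SAMPLE = """\
  Latency Distribution
     50%   11.90ms
     75%   24.38ms
     90%   89.26ms
     99%  684.81ms
  Requests/sec:    702.19
"""

# Multipliers that convert wrk's latency units to milliseconds.
UNIT_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0, "m": 60000.0}


def to_ms(text):
    """Convert a wrk latency such as '11.90ms' or '1.18s' to milliseconds."""
    match = re.fullmatch(r"([\d.]+)(us|ms|s|m)", text)
    if match is None:
        raise ValueError(f"unrecognised latency: {text!r}")
    return float(match.group(1)) * UNIT_MS[match.group(2)]


def summarise(report):
    """Extract requests/sec and the latency percentiles from a wrk report."""
    rate = re.search(r"Requests/sec:[ \t]*([\d,.]+)", report)
    if rate is None:
        raise ValueError("no Requests/sec line found")
    # Percentile lines look like '     50%   11.90ms'.
    rows = re.findall(r"^[ \t]*(\d+)%[ \t]+(\S+)[ \t]*$", report, re.M)
    return {
        # Strip thousands separators in case the figure was written as 10,429.40.
        "requests_per_sec": float(rate.group(1).replace(",", "")),
        "latency_ms": {int(pct): to_ms(value) for pct, value in rows},
    }


print(summarise(SAMPLE))
```

The parser only looks at the `Requests/sec` line and the percentile rows, so
the rest of the report can be pasted in unchanged.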

That’s all well and good, but how low can we go? The current price of a
Raspberry Pi is high… except… the original Raspberry Pi Zero W is still £14.40…


  >wrk -t8 -c500 -d30 --latency \
    https://www.phreaks0.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks0.dev/journal/2026/3/28.html
    8 threads and 500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency   366.91ms  210.90ms   1.58s    77.23%
      Req/Sec     7.67      6.78    50.00     85.07%
    Latency Distribution
       50%  318.82ms
       75%  441.92ms
       90%  636.79ms
       99%    1.18s
    962 requests in 30.10s, 4.51MB read
  Requests/sec:     31.96
  Transfer/sec:    153.37KB


Wait! What? Well, that wasn’t a complete disaster, and no magic smoke either :P
That’s over thirty requests/sec with five hundred concurrent connections over
HTTPS. What’s more, the CPU stayed under 50°C in a case with passive cooling
while dangling from the desktop i9’s USB port. Oh, and the median response time
was just under 319ms. Networking used TCP/IP over USB for this test, not Wi-Fi.

Now there is one little “cheat” used in the stunnel configuration for the
Raspberry Pis. You see, the Raspberry Pi 4, and earlier models, do not have any
cryptographic extensions built into the CPU, meaning all the cryptographic
computations have to be done longhand in software. To help the little Pis out
we used a more CPU-friendly cipher in the `stunnel` configuration that is just
as secure as before:


  ciphersuites = TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256
  curves = X25519
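
One way to confirm which suite a client actually ends up with is to ask the
TLS library after the handshake. The following is a rough sketch using
Python's standard `ssl` module. The function name `negotiated_cipher` is
hypothetical, and the result depends on what the connecting client offers as
well as on the server settings above:

```python
import socket
import ssl


def negotiated_cipher(host, port=443, timeout=5.0):
    """Return (cipher name, protocol version, secret bits) for a connection."""
    # Default context: verifies the certificate and offers the client's
    # usual set of cipher suites.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.cipher()


# Example call (needs network access to the server being checked):
#   print(negotiated_cipher("www.phreaks0.dev"))
```
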


The Raspberry Pi Zero is so cute, let’s give it a much more realistic load and
see what it can really do:


  >wrk -t8 -c10 -d30 --latency \
    https://www.phreaks0.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks0.dev/journal/2026/3/28.html
    8 threads and 10 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    87.26ms   28.47ms 211.43ms   70.21%
      Req/Sec     4.89      1.25    10.00     83.54%
    Latency Distribution
       50%   83.77ms
       75%  104.86ms
       90%  124.61ms
       99%  169.93ms
    1148 requests in 30.02s, 5.38MB read
  Requests/sec:     38.24
  Transfer/sec:    183.52KB


Nice! :) Just over thirty-eight requests/sec with a median response time of
83.77ms with ten concurrent connections over HTTPS.

Here are all three machines in a nice summary for an easy comparison:


  Metric             Intel Core i9 PC   Raspberry Pi 4  Raspberry Pi Zero W
  -----------------  -----------------  --------------  -------------------
  Hardware Class     High-End Desktop   Modern SBC      Legacy SBC
  CPUs               8P/8E, 24 Threads  4 Cortex-A72    ARM1176JZF-S
  Speed              4.8GHz/3.6GHz      1.8GHz          1GHz
  Memory             64GB               4GB             512MB
  Cost               £2,500             £96             £14.40
  Crypto Extensions  Hardware Accel.    None/Software   None/Software
  Cipher Suite       TLS_AES_256_GCM    TLS_CHACHA20    TLS_CHACHA20
  Concurrency        1,500              500             500
  Requests / Sec     10,429.40 (Note)   702.19           31.96
  Median Latency      5.67ms             11.90ms        318.82ms
  99th% Latency      86.10ms            684.81ms          1.18s
  Requests / Day     901,065,600        60,652,800      2,678,400
  Req / Day per £    360,426            631,800         186,000
  Core Temperature   85°C (Fan)         60°C (Fan)      <50°C (Passive)
  Memory Footprint   255MB              42MB            29MB

    Note: The i9 Workstation was simultaneously running the `wrk` load
    generator on half of its available CPU cores during the test.
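
  The two derived rows can be reproduced with a little arithmetic. This is a
  sketch only: it assumes the requests/sec figure is truncated to a whole
  number before scaling to a day, which happens to match the table, and that
  the last derived row is requests per day divided by the cost in pounds. The
  names `MACHINES`, `per_day` and `per_day_per_pound` are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (requests/sec reported by wrk, cost in pence) from the summary table.
MACHINES = {
    "Intel Core i9 PC": (10429.40, 250000),
    "Raspberry Pi 4": (702.19, 9600),
    "Raspberry Pi Zero W": (31.96, 1440),
}


def per_day(requests_per_sec):
    # Assumption: the rate is truncated to whole requests before scaling.
    return int(requests_per_sec) * SECONDS_PER_DAY


def per_day_per_pound(requests_per_sec, cost_pence):
    # Working in pence keeps the division in integers, so £14.40 is exact.
    return per_day(requests_per_sec) * 100 // cost_pence


for name, (rate, pence) in MACHINES.items():
    print(f"{name:20} {per_day(rate):>13,} {per_day_per_pound(rate, pence):>9,}")
```
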


How do those numbers translate to the real world? Let’s point the Google
Chrome web browser at the sites, poke around in the dev tools and find out:


                    ------i9------  -----RPi4-----  -----RPi0-----
  Metric            Cold    Cached  Cold    Cached  Cold    Cached
  ----------------  ------  ------  ------  ------  ------  ------
  Requests          5       4       5       4       5       4
  Transferred       61.7kB  2.3kB   61.6kB  2.3kB   61.5kB  2.3kB
  Resources         71.5kB  71.3kB  71.3kB  71.1kB  71.3kB  71.1kB
  Finish            149ms   58ms    193ms   65ms    276ms   111ms
  DOMContentLoaded   68ms   58ms     84ms   65ms    115ms   111ms
  Load              141ms   126ms   178ms   178ms   247ms   180ms

  Cold = non-cached cold start, Cached = all but HTML cached


A surprising result: all of the load times are very similar…

At the end of the day, a user at a browser can get nearly the same amazingly
fast experience from a tiny Raspberry Pi Zero W as from an Intel Core i9
workstation. Peel away all the layers, optimize your stack, and what you are
left with is the physics of a network connection and the internet itself.

You can’t go faster than the physics.

Where you can go faster is at scale, handling multiple concurrent requests.
That is where the i9 shines.
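
To put rough numbers on that contrast, the cold "Load" times and the `wrk`
throughput figures can be compared as simple ratios against the i9. This is
only a sketch: the dictionary and function names are illustrative, and the
throughput runs used different concurrency levels (1,500 versus 500), so the
second ratio is indicative at best:

```python
# Cold "Load" times from the browser table, in milliseconds.
COLD_LOAD_MS = {"i9": 141, "RPi4": 178, "RPi0": 247}

# Requests/sec from the wrk runs in the summary table.
REQUESTS_PER_SEC = {"i9": 10429.40, "RPi4": 702.19, "RPi0": 31.96}


def times_slower(load_ms, machine, baseline="i9"):
    """How many times longer a single page load takes than on the baseline."""
    return load_ms[machine] / load_ms[baseline]


def times_less_throughput(rates, machine, baseline="i9"):
    """How many times fewer requests/sec the machine sustains."""
    return rates[baseline] / rates[machine]


for machine in ("RPi4", "RPi0"):
    load = times_slower(COLD_LOAD_MS, machine)
    throughput = times_less_throughput(REQUESTS_PER_SEC, machine)
    print(f"{machine}: {load:.2f}x page load time, {throughput:.1f}x less throughput")
```

On these figures a single page load on the Pi Zero W takes less than twice as
long as on the i9, while the i9 sustains several hundred times the request
rate.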

As before, there is a little more to this: redirects for people landing using
‘HTTP’, logging, virtual hosting for multiple sites, serving git repositories
and more. But I’m saving that for a full how-to guide in the Annex!

Now who wants to show me Apache or Nginx running on this cute little board
with their stack? No? Nobody? What do you mean there’s only 512MB of RAM!?

--
Diddymus

  [1] Replacing Mountains with Ant Hills (tiny webserver setup) Part 2
      /journal/2026/5/12.html

