JOURNAL FOR SUNDAY 17TH MAY, 2026
______________________________________________________________________________
SUBJECT: Replacing Mountains with Ant Hills (tiny webserver setup) Part 3
DATE: Sun 17 May 18:08:59 BST 2026
In part 2 of this mini series[1] we ended with the i9 desktop benchmarks:

  >wrk -t8 -c1500 -d30 --latency \
       https://www.phreaks1.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks1.dev/journal/2026/3/28.html
    8 threads and 1500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    15.97ms   60.47ms   1.07s    98.85%
      Req/Sec     1.32k   116.73     1.57k    81.57%
    Latency Distribution
       50%    5.67ms
       75%   13.70ms
       90%   28.78ms
       99%   86.10ms
    313532 requests in 30.06s, 1.60GB read
    Socket errors: connect 0, read 2, write 0, timeout 0
  Requests/sec:  10,429.40
  Transfer/sec:     54.54MB

However, the main point of this series was not to chase requests per second. I
started out trying to see if I could replace Apache with some simple, well
understood tools and ended up documenting the experience along the way. Good
performance was just a happy side effect.

With that in mind, I wanted to see how well my stunnel+httpd setup scaled down
as well as up. Naturally I put it on one of my trusty Raspberry Pi 4s (4GB) and
here are the results:

  >wrk -t8 -c500 -d30 --latency \
       https://www.phreaks8.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks8.dev/journal/2026/3/28.html
    8 threads and 500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    47.70ms  123.75ms   1.81s    92.05%
      Req/Sec    88.30     27.87   190.00     63.23%
    Latency Distribution
       50%   11.90ms
       75%   24.38ms
       90%   89.26ms
       99%  684.81ms
    21103 requests in 30.05s, 98.90MB read
    Socket errors: connect 0, read 0, write 0, timeout 1
  Requests/sec:    702.19
  Transfer/sec:      3.29MB

Not too shabby. The stunnel+httpd setup managed to push just over seven
hundred requests/sec with five hundred concurrent connections over HTTPS, with
a median response time of 11.9ms. The “server” software was using 80% user
CPU, 20% system CPU and a light 42MB of RAM, with `wrk` run from the i9
desktop.

From a user perspective, even under heavy load, stunnel+httpd can deliver a
superb, snappy user experience on an older Raspberry Pi 4 (currently £96). I’m
not chasing requests/second, but those figures look mighty fine ;)
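
For anyone repeating these runs, the two headline figures quoted above (requests/sec and median latency) can be pulled out of a `wrk --latency` report automatically. This is only a minimal sketch in Python: the `parse_wrk` helper is hypothetical, and it assumes the report layout shown in this entry, with latencies given in us, ms or s.

```python
import re

# Multipliers to convert wrk latency suffixes into milliseconds.
_UNITS_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}


def parse_wrk(report: str) -> dict:
    """Extract requests/sec and median latency (ms) from a wrk text report."""
    rps = re.search(r"Requests/sec:\s*([\d,.]+)", report)
    p50 = re.search(r"^\s*50%\s+([\d.]+)(us|ms|s)\s*$", report, re.MULTILINE)
    if rps is None or p50 is None:
        raise ValueError("input does not look like a wrk --latency report")
    return {
        # Thousands separators, as used in this journal, are stripped first.
        "requests_per_sec": float(rps.group(1).replace(",", "")),
        "median_ms": float(p50.group(1)) * _UNITS_MS[p50.group(2)],
    }


# A fragment of the Raspberry Pi 4 report from this entry.
sample = """\
  Latency Distribution
     50%   11.90ms
     75%   24.38ms
Requests/sec:    702.19
"""
result = parse_wrk(sample)
print(result["requests_per_sec"], "requests/sec,", result["median_ms"], "ms median")
```

Feeding it the saved output of each run makes it easy to keep the numbers for several machines side by side.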

That’s all well and good, but how low can we go? The current price of a
Raspberry Pi is high… except… the original Raspberry Pi Zero W is still £14.40…

  >wrk -t8 -c500 -d30 --latency \
       https://www.phreaks0.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks0.dev/journal/2026/3/28.html
    8 threads and 500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency   366.91ms  210.90ms   1.58s    77.23%
      Req/Sec     7.67      6.78    50.00     85.07%
    Latency Distribution
       50%  318.82ms
       75%  441.92ms
       90%  636.79ms
       99%    1.18s
    962 requests in 30.10s, 4.51MB read
  Requests/sec:     31.96
  Transfer/sec:    153.37KB

Wait! What? Well, that wasn’t a complete disaster, and no magic smoke either :P
That’s over thirty requests/sec with five hundred concurrent connections over
HTTPS. What’s more, the CPU stayed under 50°C in a case with passive cooling
while dangling from the desktop i9’s USB port. Oh, and the median response
time was only 318ms. Networking used TCP/IP over USB for this test, not Wi-Fi.

Now there is one little “cheat” used in the stunnel configuration for the
Raspberry Pis. You see, the Raspberry Pi 4 and earlier models do not have any
cryptographic extensions built into the CPU, meaning all the cryptographic
computations have to be done longhand in software. To help the little Pis out
we used a more CPU-friendly cipher in the `stunnel` configuration that is just
as secure as before:

  ciphersuites = TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256
  curves = X25519

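
One way to confirm the setting took effect is to connect and ask which cipher suite was actually negotiated. The following is a minimal sketch using Python’s standard `ssl` module; the helper names are made up for illustration, and the suite that gets picked also depends on what the client offers, so treat the result as a hint rather than proof.

```python
import socket
import ssl


def negotiated_cipher(host: str, port: int = 443, timeout: float = 5.0):
    """Return (cipher name, TLS version, secret bits) agreed with the server."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.cipher()


def is_chacha20(cipher_name: str) -> bool:
    """True when the suite uses ChaCha20-Poly1305 rather than AES-GCM."""
    return "CHACHA20" in cipher_name.upper()


# Example (needs network access, so it is left commented out):
#   name, version, bits = negotiated_cipher("www.phreaks0.dev")
#   print(name, version, bits, is_chacha20(name))
```

If the first element of the returned tuple is `TLS_CHACHA20_POLY1305_SHA256`, the Pi is using the software-friendly cipher for that client.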
The Raspberry Pi Zero W is so cute. Let’s give it a much more realistic load
and see what it can really do:

  >wrk -t8 -c10 -d30 --latency \
       https://www.phreaks0.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks0.dev/journal/2026/3/28.html
    8 threads and 10 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    87.26ms   28.47ms 211.43ms   70.21%
      Req/Sec     4.89      1.25    10.00     83.54%
    Latency Distribution
       50%   83.77ms
       75%  104.86ms
       90%  124.61ms
       99%  169.93ms
    1148 requests in 30.02s, 5.38MB read
  Requests/sec:     38.24
  Transfer/sec:    183.52KB

Nice! :) Just over thirty-eight requests/sec with a median response time of
83.77ms with ten concurrent connections over HTTPS.
Here are all three machines in a nice summary for an easy comparison:

  Metric             Intel Core i9 PC   Raspberry Pi 4  Raspberry Pi Zero W
  -----------------  -----------------  --------------  -------------------
  Hardware Class     High-End Desktop   Modern SBC      Legacy SBC
  CPUs               8P/8E, 24 Threads  4 Cortex-A72    ARM1176JZF-S
  Speed              4.8GHz/3.6GHz      1.8GHz          1GHz
  Memory             64GB               4GB             512MB
  Cost               £2,500             £96             £14.40
  Crypto Extensions  Hardware Accel.    None/Software   None/Software
  Cipher Suite       TLS_AES_256_GCM    TLS_CHACHA20    TLS_CHACHA20
  Concurrency        1,500              500             500
  Requests / Sec     10,429.40 (Note)   702.19          31.96
  Median Latency     5.67ms             11.90ms         318.82ms
  99th% Latency      86.10ms            684.81ms        1.18s
  Requests / Day     901,065,600        60,652,800      2,678,400
  Requests / £       360,426            631,800         186,000
  Core Temperature   85°C (Fan)         60°C (Fan)      <50°C (Passive)
  Memory Footprint   255MB              42MB            29MB

Note: The i9 Workstation was simultaneously running the `wrk` load
generator on half of its available CPU cores during the test.
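
The two derived rows, Requests / Day and Requests / £, can be reproduced with a little arithmetic. Here is a minimal sketch in Python. It assumes, judging from the figures themselves, that the requests/sec value is cut down to a whole number before being multiplied by the 86,400 seconds in a day; the function names are made up for illustration.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(requests_per_sec: float) -> int:
    """Whole requests/sec (fraction dropped) sustained for a full day."""
    return math.floor(requests_per_sec) * SECONDS_PER_DAY


def requests_per_pound(requests_per_sec: float, cost_pounds: float) -> int:
    """Daily requests divided by the hardware cost, rounded to a whole number."""
    return round(requests_per_day(requests_per_sec) / cost_pounds)


# (requests/sec, cost in pounds) taken from the summary table.
machines = {
    "Intel Core i9 PC": (10429.40, 2500.00),
    "Raspberry Pi 4": (702.19, 96.00),
    "Raspberry Pi Zero W": (31.96, 14.40),
}

for name, (rps, cost) in machines.items():
    per_day = requests_per_day(rps)
    per_pound = requests_per_pound(rps, cost)
    print(f"{name:20} {per_day:>12,} {per_pound:>8,}")
```

These are of course theoretical ceilings that assume the benchmark rate could be held around the clock.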
How do those numbers translate to the real world? Let’s point the Google
Chrome web browser at the sites, poke around in the dev tools and find out:

                    ------i9------  -----RPi4-----  -----RPi0-----
  Metric            Cold    Cached  Cold    Cached  Cold    Cached
  ----------------  ------  ------  ------  ------  ------  ------
  Requests          5       4       5       4       5       4
  Transferred       61.7kB  2.3kB   61.6kB  2.3kB   61.5kB  2.3kB
  Resources         71.5kB  71.3kB  71.3kB  71.1kB  71.3kB  71.1kB
  Finish            149ms   58ms    193ms   65ms    276ms   111ms
  DOMContentLoaded  68ms    58ms    84ms    65ms    115ms   111ms
  Load              141ms   126ms   178ms   178ms   247ms   180ms

  Cold = non-cached cold start, Cached = all but HTML cached

A surprising result: all of the load times are very similar…
At the end of the day, a user at a browser can get nearly the same amazingly
fast experience from a tiny Raspberry Pi Zero W as from an Intel Core i9
workstation. Peel away all the layers, optimize your stack, and what you are
left with is the physics of a network connection and the internet itself.
You can’t go faster than the physics.
Where you can go faster is at scale, handling multiple concurrent requests.
That is where the i9 shines.
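
To put rough numbers on that contrast, compare the gap a single visitor sees with the gap in peak throughput. This is a quick sketch in Python using figures from the two tables in this entry; the variable names are illustrative, and the throughput runs used different concurrency levels (1,500 versus 500), so the second ratio is only a ballpark.

```python
# Cold page load times (ms) from the browser table.
cold_load_ms = {"i9": 141, "RPi0": 247}

# Peak requests/sec from the wrk summary table.
requests_per_sec = {"i9": 10429.40, "RPi0": 31.96}

# How much slower one cold page load is on the Pi Zero W.
page_gap = cold_load_ms["RPi0"] / cold_load_ms["i9"]

# How much more traffic the i9 can absorb under load.
throughput_gap = requests_per_sec["i9"] / requests_per_sec["RPi0"]

print(f"single cold page load: {page_gap:.2f}x slower on the Pi Zero W")
print(f"peak throughput:       {throughput_gap:.0f}x higher on the i9")
```

A single page load differs by less than a factor of two, while the throughput differs by a factor of several hundred.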
As before, there is a little more to this: redirects for people landing using
‘HTTP’, logging, virtual hosting for multiple sites, serving git repositories
and more. But I’m saving that for a full how-to guide in the Annex!
Now who wants to show me Apache or Nginx running on this cute little board
with their stack? No? Nobody? What do you mean there’s only 512Mb RAM!?
--
Diddymus
[1] Replacing Mountains with Ant Hills (tiny webserver setup) Part 2
/journal/2026/5/12.html