Up to Main Index                               Up to Journal for May, 2026

                      JOURNAL FOR TUESDAY 12TH MAY, 2026
______________________________________________________________________________

SUBJECT: Replacing Mountains with Ant Hills (tiny webserver setup) Part 2
   DATE: Tue 12 May 19:28:03 BST 2026


     TL;DR Over 10k req/sec, 99% within 87ms, 1500 concurrent connections


Previously[1] I showed a webserver setup using just `socat` and busybox
`httpd`. The results were reasonable:

  That’s one million successful requests and nearly 22Gb of data transferred
  using one thousand concurrent TLS (HTTPS) connections — delivered in an
  average 686ms. The “server” software was running at under 30% CPU and using
  10Mb RAM while `ab` was trying to take over the rest of the machine.

For the following, the servers and benchmarks were all running on the same
desktop PC: Intel i9-12900T (8p/8e cores @ 4.8Ghz/3.6Ghz, 24 threads) with
64Gb RAM. When you see figures like 950% CPU, that is 9 1/2 threads worth of
CPU being used.

Due to some limitations in `ab` (ApacheBench) that I was using for testing, I
have switched to `wrk` and re-run the benchmark for the socat+httpd setup
used in Part 1:


  >wrk -t8 -c1500 -d30 --latency \
    https://www.phreaks1.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks1.dev/journal/2026/3/28.html
    8 threads and 1500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    90.35ms  150.77ms   1.35s    96.07%
      Req/Sec   618.17    186.79     1.49k    68.86%
    Latency Distribution
       50%   59.12ms
       75%   77.33ms
       90%  107.37ms
       99%    1.05s
    144909 requests in 30.10s, 679.11MB read
    Socket errors: connect 0, read 0, write 0, timeout 276
  Requests/sec: 4814.62
  Transfer/sec: 22.56MB


The socat+httpd setup was managing to push nearly five thousand requests/sec
with one and a half thousand concurrent connections over HTTPS, with an
average response time of 60ms. The “server” software was using 100% CPU and
10Mb RAM, `wrk` was using 450% CPU. From a user perspective, even under heavy
load, socat+httpd can deliver a superb, snappy user experience.

Is it possible to improve upon this while still only using simple tools? Hold
my coffee…

Here are my latest results from testing an improved setup:


  >wrk -t8 -c1500 -d30 --latency \
    https://www.phreaks1.dev/journal/2026/3/28.html
  Running 30s test @ https://www.phreaks1.dev/journal/2026/3/28.html
    8 threads and 1500 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    15.97ms   60.47ms   1.07s    98.85%
      Req/Sec     1.32k   116.73     1.57k    81.57%
    Latency Distribution
       50%    5.67ms
       75%   13.70ms
       90%   28.78ms
       99%   86.10ms
    313532 requests in 30.06s, 1.60GB read
    Socket errors: connect 0, read 2, write 0, timeout 0
  Requests/sec: 10429.40
  Transfer/sec: 54.54MB


Now we are achieving over ten thousand requests/second with one and a half
thousand concurrent connections over HTTPS, with a median response time of
only 5.67ms. The “server” software was running at 950% CPU and using 255Mb RAM
while `wrk` at 700% CPU was trying to take over the rest of the machine.

Here is a side-by-side comparison of the two benchmarks:


    Metric          Benchmark 1       Benchmark 2        Variance
    --------------  ----------------  -----------------  ----------------
    Throughput      4,814.62 req/sec  10,429.40 req/sec  +5614.78 req/sec
    Transfer Rate      22.56 MB/s         54.54 MB/s       +31.98 MB/s
    Median Latency     59.12 ms            5.67 ms         -53.45 ms
    90% Latency       107.37 ms           28.78 ms         -78.59 ms
    99% Latency     1,050.00 ms           86.10 ms        -963.90 ms
    CPU Utilization   100.00 %           950.00 %         +850.00 %


What is the secret this time? It’s `stunnel` and busybox `httpd`. While
`socat` is the “Swiss Army Knife” for networking, `stunnel` is the precision
scalpel for TLS encapsulation. The busybox `httpd` configuration is the same
embarrassingly small file as before:


    H:./public
    I:index.html
    E404:/public/not-found.html
    .ttf:application/font-sfnt
    .woff:application/font-woff
    .js:application/javascript
    .wasm:application/wasm


Launching busybox `httpd` stays the same as well:


    busybox httpd -c ./httpd.conf -p 127.0.0.1:8080 &
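

Before adding TLS on top, it is worth a quick sanity check that the backend
answers on the loopback address. A minimal check (the page path is simply the
one used in the benchmarks above):


    # Expect "200" back from the plain-HTTP backend
    curl -s -o /dev/null -w '%{http_code}\n' \
      http://127.0.0.1:8080/journal/2026/3/28.html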


Next we need a configuration file for `stunnel`:


    foreground = no
    fips = no
    debug = 0
    output = /dev/null
    stack = 65536

    [my_frontend]
    accept = 443
    connect = 127.0.0.1:8080
    cert = ./combined.pem
    verify = 0
    options = -NO_TICKET
    ticketKeySecret = 455ef80a2f9c6dbf01693e1d9f6d50ba455ef80a2…
    ticketMacSecret = 41ca772c0db7ec9e695c5761d9beb3c741ca772c0…
    sslVersionMin = TLSv1.3
    ciphersuites = TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256
    sessionCacheSize = 20000
    sessionCacheTimeout = 600
    socket = l:TCP_NODELAY=1
    socket = r:TCP_NODELAY=1
    socket = l:SO_REUSEADDR=1
    socket = l:SO_KEEPALIVE=1
    TIMEOUTclose = 0
    TIMEOUTidle = 60
    TIMEOUTbusy = 20


That’s not too bad. Only 25 lines (631 bytes) of additional configuration for
over twice the performance. It could be smaller, but it contains a few
performance tweaks. We launch `stunnel` into the background like so:


    stunnel ./stunnel.conf > /dev/null 2>&1 &


Our monitoring script, run by cron @reboot and every 15 minutes, becomes:


    #!/bin/sh
    if ! pgrep -f "stunnel" > /dev/null; then
      stunnel ./stunnel.conf > /dev/null 2>&1 &
    fi

    if ! pgrep -f "busybox httpd" > /dev/null; then
      busybox httpd -c ./httpd.conf -p 127.0.0.1:8080 &
    fi


That’s it! Two configuration files, two commands and a tiny script for a
high-performance, self-healing, TLS-encrypted fortress.
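
For a quick end-to-end check that the whole chain is up and serving over
TLS 1.3, something like this will do (a sketch, fetching the same page as the
benchmarks):


    # Force TLS 1.3 and expect "200" back through stunnel
    curl -s --tlsv1.3 -o /dev/null -w '%{http_code}\n' \
      https://www.phreaks1.dev/journal/2026/3/28.html
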

But it has come at a cost…

The “server” footprint has grown from 10Mb to 255Mb RAM, although that is
still far less than Apache[2] would need. The extra memory is mostly the
large TLS session cache set in the `stunnel` configuration with
`sessionCacheSize = 20000`.
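
If RAM matters more than saved handshakes, the cache can simply be shrunk. A
sketch of the trade-off, with illustrative values, for the `[my_frontend]`
section:


    ; Smaller TLS session cache: less RAM, but more full handshakes
    sessionCacheSize = 1000
    sessionCacheTimeout = 300
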

Whereas `socat` forks a clean new process to handle each connection,
`stunnel` is a single long-running, multi-threaded process. It is this
per-connection forking that limits `socat`: every new connection pays the
price of a fork before any work gets done.
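
The difference in process models is easy to see while a benchmark is running
(a rough sketch using the standard procps tools; exact output will depend on
your system):


    # socat: one forked child per in-flight connection
    pgrep -c socat

    # stunnel: one process, many threads (NLWP is the thread count)
    ps -C stunnel -o pid,nlwp,rss,cmd
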

By leveraging TLS session resumption, `stunnel` avoids repeating the costly
cryptographic handshake for returning clients, which is one of the main
reasons the median latency plummeted from 59.12ms to just 5.67ms for a much
more responsive user experience. Being multi-threaded, `stunnel` also spreads
the work across far more cores (950% CPU) than the forking `socat` managed
(100%). Note that `stunnel` has the same pedigree as `socat`, having been
around since 1998.
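
Resumption can be checked by hand with `openssl s_client` (a sketch; with TLS
1.3 the session ticket arrives after the handshake, so the first connection
is held open briefly to receive it):


    # First connection: hold it open for a second and save the session ticket
    sleep 1 | openssl s_client -connect www.phreaks1.dev:443 \
      -sess_out /tmp/tls.session > /dev/null 2>&1

    # Second connection: present the ticket and look for "Reused"
    openssl s_client -connect www.phreaks1.dev:443 \
      -sess_in /tmp/tls.session < /dev/null 2> /dev/null | grep Reused
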

At the end of the day it’s up to you. Is it socat+httpd? Or stunnel+httpd with
twice the throughput and ten times lower latency, at the cost of a 25 line
(631 byte) configuration file, much heavier CPU usage and 245Mb more RAM?

As before, there is a little more to this: redirects for people landing using
‘HTTP’, logging, virtual hosting for multiple sites, serving git repositories
and more. But I’m saving that for a full how-to guide in the Annex!

Now, who has my coffee?

--
Diddymus

  [1] Replacing Mountains with Ant Hills (tiny webserver setup) Part 1
      /journal/2026/4/25.html

  [2] For 1,000 concurrent connections Apache can consume 400Mb-1Gb of RAM
      depending on whether the event MPM or the prefork MPM is used. Nginx
      uses 40-60Mb.


  Up to Main Index                               Up to Journal for May, 2026