Up to Main Index                               Up to Journal for May, 2021

                       JOURNAL FOR SUNDAY 9TH MAY, 2021
______________________________________________________________________________

SUBJECT: Another gripe sorted out
   DATE: Mon 10 May 02:22:25 BST 2021

Since the release of v0.0.18 there was another gripe which I have neglected to
mention. How was I measuring performance and how could people measure the
performance of their own servers?

To be honest, I was using a quick hack. Simply logging to the server log and
then a lot of grep and awk afterwards — not very pretty, not something I would
expect users to be doing.

I’ve spent some time this week looking at adding some proper metrics to the
log status line. Here is an abbreviated example showing me, just wandering
around a bit:


  U[3Mb +16kb] A[+784] O[12379 +625] T[273 +0] W[4.941µs 4.941µs] P[1 1]
  U[3Mb  +8kb] A[+382] O[12665 +286] T[273 +0] W[2.241µs 4.941µs] P[1 1]
  U[3Mb  +0b ] A[+599] O[13104 +439] T[273 +0] W[1.727µs 4.941µs] P[1 1]
  U[3Mb +16kb] A[+339] O[13345 +241] T[273 +0] W[1.621µs 4.941µs] P[1 1]
  U[3Mb +16kb] A[+578] O[13809 +464] T[273 +0] W[1.638µs 4.941µs] P[1 1]


That’s a maximum of about 5µs (microseconds) which is not too shabby.

To make the lines fit this post I left out the ‘G’ column, the number of
goroutines which was 29. The interesting column is the new ‘W’ or wait time
column. This shows the maximum wait time to acquire locks since the last
logged statistics and the maximum wait time since the server started. The wait
time to acquire locks directly effects how long it takes the server to respond
to player commands. Ideally this should be below 1 second, at a maximum I’d
say it could peek at 3 seconds.

How does this look with say 30,000 players using my desktop and the botrunner?
It looks something like this, with even more columns dropped to fit:


  U[683Mb  -50Mb] A[+35450656] O[5725831 -1479284] W[236.439ms 419.999ms]
  U[627Mb  -55Mb] A[+35394834] O[4060084 -1665747] W[144.588ms 419.999ms]
  U[781Mb +153Mb] A[+35132771] O[8298108 +4238024] W[164.868ms 419.999ms]
  U[714Mb  -66Mb] A[+35278292] O[6627162 -1670946] W[146.795ms 419.999ms]
  U[654Mb  -59Mb] A[+35169809] O[4850261 -1776901] W[331.530ms 419.999ms]


This time the maximum wait time for locks is a hair over 400ms or just under
half a second wait time. Again, not too shabby ;)

A quick note on the implementation of this statistic. Originally I was using a
channel to store the maximum value. However, this added a huge bottle neck as
now every player was waiting on the channel to become available. Instead I
switched to a read/write mutex. We check if the wait time was above the
maximum recorded so far using a read lock. Then we only take a read/write lock
if we need to record our new value. This proved to be much more scaleable.
I’ve left both versions in the git repository for those who like to go digging
in the history.

There were a few more gripes with the botrunning using hard-coded ports, IP
addresses and other values. I’m currently busy sorting out as many gripes as I
can. If you have a gripe I haven’t addressed please let me know and I’ll add
it to my list: diddymus@wolfmud.org

I’ve not retested all of my Raspberry Pi yet with the new metrics, which could
be an interesting exercise to do. First I need to recompile Go as Go 1.16.4
was released 6th May. All of the performance testing for this article was done
using that release.

Changes are out on the public dev branch. There are a few cosmetic only code
fixes and a bug fix for niceDate with dates before the 10th of the month.

--
Diddymus


  Up to Main Index                               Up to Journal for May, 2021