Up to Main Index Up to Journal for May, 2021 JOURNAL FOR SUNDAY 9TH MAY, 2021 ______________________________________________________________________________ SUBJECT: Another gripe sorted out DATE: Mon 10 May 02:22:25 BST 2021 Since the release of v0.0.18 there was another gripe which I have neglected to mention. How was I measuring performance and how could people measure the performance of their own servers? To be honest, I was using a quick hack. Simply logging to the server log and then a lot of grep and awk afterwards — not very pretty, not something I would expect users to be doing. I’ve spent some time this week looking at adding some proper metrics to the log status line. Here is an abbreviated example showing me, just wandering around a bit: U[3Mb +16kb] A[+784] O[12379 +625] T[273 +0] W[4.941µs 4.941µs] P[1 1] U[3Mb +8kb] A[+382] O[12665 +286] T[273 +0] W[2.241µs 4.941µs] P[1 1] U[3Mb +0b ] A[+599] O[13104 +439] T[273 +0] W[1.727µs 4.941µs] P[1 1] U[3Mb +16kb] A[+339] O[13345 +241] T[273 +0] W[1.621µs 4.941µs] P[1 1] U[3Mb +16kb] A[+578] O[13809 +464] T[273 +0] W[1.638µs 4.941µs] P[1 1] That’s a maximum of about 5µs (microseconds) which is not too shabby. To make the lines fit this post I left out the ‘G’ column, the number of goroutines which was 29. The interesting column is the new ‘W’ or wait time column. This shows the maximum wait time to acquire locks since the last logged statistics and the maximum wait time since the server started. The wait time to acquire locks directly effects how long it takes the server to respond to player commands. Ideally this should be below 1 second, at a maximum I’d say it could peek at 3 seconds. How does this look with say 30,000 players using my desktop and the botrunner? It looks something like this, with even more columns dropped to fit: U[683Mb -50Mb] A[+35450656] O[5725831 -1479284] W[236.439ms 419.999ms] U[627Mb -55Mb] A[+35394834] O[4060084 -1665747] W[144.588ms 419.999ms] U[781Mb +153Mb] A[+35132771] O[8298108 +4238024] W[164.868ms 419.999ms] U[714Mb -66Mb] A[+35278292] O[6627162 -1670946] W[146.795ms 419.999ms] U[654Mb -59Mb] A[+35169809] O[4850261 -1776901] W[331.530ms 419.999ms] This time the maximum wait time for locks is a hair over 400ms or just under half a second wait time. Again, not too shabby ;) A quick note on the implementation of this statistic. Originally I was using a channel to store the maximum value. However, this added a huge bottle neck as now every player was waiting on the channel to become available. Instead I switched to a read/write mutex. We check if the wait time was above the maximum recorded so far using a read lock. Then we only take a read/write lock if we need to record our new value. This proved to be much more scaleable. I’ve left both versions in the git repository for those who like to go digging in the history. There were a few more gripes with the botrunning using hard-coded ports, IP addresses and other values. I’m currently busy sorting out as many gripes as I can. If you have a gripe I haven’t addressed please let me know and I’ll add it to my list: diddymus@wolfmud.org I’ve not retested all of my Raspberry Pi yet with the new metrics, which could be an interesting exercise to do. First I need to recompile Go as Go 1.16.4 was released 6th May. All of the performance testing for this article was done using that release. Changes are out on the public dev branch. There are a few cosmetic only code fixes and a bug fix for niceDate with dates before the 10th of the month. -- Diddymus Up to Main Index Up to Journal for May, 2021