Up to Main Index                             Up to Journal for April, 2017

                     JOURNAL FOR FRIDAY 21ST APRIL, 2017
______________________________________________________________________________

SUBJECT: Info, tips & updated on debugging WolfMUD
   DATE: Fri 21 Apr 23:58:54 BST 2017

With the recent journal entries last month on debugging, performance and
sorting out deallocation of objects for garbage collection I’ve had a lot of
questions asking how I actually debug WolfMUD. Hopefully this post, and some
recent changes on the public dev branch, will help with debugging WolfMUD and
provide ideas for debugging Go programs in general.

My main tool is the obvious one, to sprinkle Printf lines throughout the code:


  log.Printf("[DEBUG] %p %[1]T %s: %s", t, t.UID(), FindName(t).Name("?"))


This will write messages to the log something like:


  2017/04/21 20:04:22.702800 thing.go:61: [DEBUG] 0xc42010fda0 *attr.Thing
  #UID-2Q: "the tavern door"


For temporary debugging I always put "[DEBUG]" in the messages so that the
lines are easy to find and take out again once I’ve finished debugging. If a
particular message is very useful for debugging I’ll drop the "[DEBUG]" and
condition the message with a configuration option.

I also use the #DUMP command a lot to inspect the state of Things. The #DUMP
command is a player command that can be used from within WolfMUD if enabled in
the server configuration[1]. An example of the output from ‘#DUMP CHEST’:


  0xc4200eb770 *attr.Thing #UID-3B, 7 attributes:
    0xc4201249e0 *attr.Description "This is a very stout wooden chest about 2
    feet wide and 1 foot deep. Thick metal bands bind it."
    0xc420124a00 *attr.Name "an iron bound chest"
    0xc420124a40 *attr.Alias 2 aliases: "CHEST", "#UID-3B"
    0xc4200eb530 *attr.Reset After: 5m0s Jitter: 1m0s Spawn: false
      0x0 event.Cancel
    0xc4200eb560 *attr.Cleanup After: 5m0s Jitter: 1m0s
      0x0 event.Cancel
    0xc4200f4b90 *attr.Inventory Lock ID: 83, 1 items (split: 0):
      0xc4200eb740 *attr.Thing #UID-3A, 7 attributes:
        0xc4200f4be0 *attr.Inventory Lock ID: 84, 1 items (split: 0):
          0xc4200eb650 *attr.Thing #UID-39, 6 attributes:
            0xc420124a80 *attr.Alias 2 aliases: "BALL", "#UID-39"
            0xc4200eb5c0 *attr.Reset After: 0s Jitter: 2m0s Spawn: true
              0x0 event.Cancel
            0xc4200eb5f0 *attr.Cleanup After: 1s Jitter: 0s
              0x0 event.Cancel
            0xc420124aa0 *attr.Description "This is a small, red, rubber ball."
            0xc420124ac0 *attr.Name "a small red ball"
            0xc4200eb620 *attr.Locate -> Origin: 0xc4200f4be0 a small leather
            pouch, Where: 0xc4200f4be0 a small leather pouch
        0xc420124b00 *attr.Description "This is a small pouch made of soft
        dark leather."
        0xc420124b20 *attr.Name "a small leather pouch"
        0xc420124b60 *attr.Alias 2 aliases: "POUCH", "#UID-3A"
        0xc4200eb6b0 *attr.Reset After: 1m0s Jitter: 1m0s Spawn: false
          0x0 event.Cancel
        0xc4200eb6e0 *attr.Cleanup After: 10m0s Jitter: 10m0s
          0x0 event.Cancel
        0xc4200eb710 *attr.Locate -> Origin: 0xc4200f4b90 an iron bound chest,
        Where: 0xc4200f4b90 an iron bound chest
    0xc4200eb7a0 *attr.Locate -> Origin: 0xc4200e3b80 Fireplace,
    Where: 0xc4200e3b80 Fireplace


The #DUMP command provides a lot of information about an item, player,
location or any other Thing and will recursively display information for
Inventory content. So using #DUMP on a location will list all of it’s content
including items, players and their content — the main reason why I add aliases
to locations even though they are not really needed.

When debugging clean ups, resets and respawning I needed to be able to check
the status of items that were not in play. One disadvantage of #DUMP is that
it only works on aliases for a Thing in the current location. To be able to
#DUMP an arbitrary Thing I used the unsafe package to dump any Thing by its
address. For example ‘#DUMP 0xc4200eb650’:


  0xc4200eb650 *attr.Thing #UID-39, 6 attributes:
    0xc420124a80 *attr.Alias 2 aliases: "BALL", "#UID-39"
    0xc4200eb5c0 *attr.Reset After: 0s Jitter: 2m0s Spawn: true
      0x0 event.Cancel
    0xc4200eb5f0 *attr.Cleanup After: 1s Jitter: 0s
      0x0 event.Cancel
    0xc420124aa0 *attr.Description "This is a small, red, rubber ball."
    0xc420124ac0 *attr.Name "a small red ball"
    0xc4200eb620 *attr.Locate -> Origin: 0xc4200f4be0 a small leather pouch,
    Where: 0xc4200f4be0 a small leather pouch


A common technique I used was to #DUMP a Thing to find it’s address. Then junk
the Thing / wait for reset or respawn and then #DUMP the address to examine
the state of the Thing. Since discovering debug.SetPanicOnFault the #DUMP
command no longer crashes the server if invalid addresses are given or the
address is not that of a Thing ;)

Also while working on clean ups, resets and respawning I needed to make sure
items were being properly garbage collected. For this I added some logging to
the NewThing and Free methods as well as a finalizer to Thing so I knew when
it was garbage collected. This appears in the log as:


  NewThing: 0xc420014300 *attr.Thing #UID-6K: "a small red ball"
  Disposing: 0xc420014300 *attr.Thing #UID-6K: "a small red ball"
  Finalizing: 0xc420014300 *attr.Thing #UID-6K


Note, I’ve dropped the date and time from these log lines to keep the line
lengths short. Also note, this works best if Stats.Rate is set low — I usually
set it to 10s — and Stats.GC is set to true. The changes for logging Thing
information has not been released until now as they make the log very noisy.
The format of the finalizer for Thing is quite simple and generic, I often
reuse variations of it with a "[DUMP]" prefix when debugging allocations and
garbage collection issues:


    runtime.SetFinalizer(t, func(t has.Thing) {
      log.Printf("Finalizing: %s", t)
    })


This change is now in the public dev branch along with some configuration
changes which I will go into shortly.

For general performance, allocation and memory debugging I use the standard
pprof tool along with a #DEBUG command. The #DEBUG command allows players to
start and stop different profiles from within WolfMUD. For example:


  >#debug memprof start
  Memory profiling turned on.
 // wait a while...
 >#debug memprof stop
 Memory profiling turned off.


This generates a file called memprof in the server’s current directory that we
can examine with Go’s pprof tool:


  >go tool pprof ./server memprof
  File: server
  Type: inuse_space
  Time: Apr 21, 2017 at 9:36pm (UTC)
  Entering interactive mode (type "help" for commands, "o" for options)
  (pprof) top
  Showing nodes accounting for 16094.02kB, 69.27% of 23234.47kB total
  Dropped 69 nodes (cum <= 116.17kB)
  Showing top 10 nodes out of 72
        flat  flat%   sum%        cum   cum%
   3749.62kB 16.14% 16.14%  3749.62kB 16.14% runtime.malg src/runtime/proc.go
   2187.28kB  9.41% 25.55%  2187.28kB  9.41% attr.NewAlias attr/alias.go
   2031.05kB  8.74% 34.29%  2031.05kB  8.74% attr.NewInventory internal/brl.go
   1562.34kB  6.72% 41.02%  3005.62kB 12.94% comms.newClient comms/client.go
   1251.75kB  5.39% 46.41%  5095.88kB 21.93% runtime.systemstack asm_amd64.s
   1249.88kB  5.38% 51.78%  1249.88kB  5.38% attr.(*Thing).Add attr/thing.go
   1249.88kB  5.38% 57.16%  2812.22kB 12.10% net.(*netFD).accept fd_unix.go
   1093.64kB  4.71% 61.87%  1093.69kB  4.71% frontend.New frontend/frontend.go
    937.41kB  4.03% 65.91%   937.41kB  4.03% net.sockaddrToTCP tcpsock_posix.go
    781.17kB  3.36% 69.27%   781.17kB  3.36% attr.NewInventory attr/inventory.go
  (pprof)


Note, the names on the right above have been badly mangled to keep the line
lengths short for this post only. If the figures look high that’s because
there were a lot of ‘players’ on the server when the example was made. The
corresponding status line from the logs:


  U[110Mb  +46kb] O[357548  +869] T[10184  +0] G[10020  +0] P 10000/10000


That’s 110Mb of memory used 357,548 objects 10,184 Things 10,020 goroutines
for 10,000 players ;)

#DEBUG can be used for CPU, memory and block profiling and for making heap
dumps. It can also force a client goroutine to panic — for testing error
handling. The #DEBUG command has not been released before and was only for my
testing and debugging but it is now on the public dev branch and behind a new
configuration option.

So what are these new configuration options I’ve been mentioning? Debugging
creates a lot of noise in the log. Some of the debugging commands are also not
suitable for players on a ‘public’ server — even if it’s just run for a few
friends. These reasons are why I’ve held back for so long on committing the
changes. To deal with this I have improved the debug configuration for the
server and finally added all of my debugging code and commands for others to
use. The new configuration section looks like:


  //
  // Debug configuration
  //
    Debug.LongLog:      false
    Debug.Panic:        false
    Debug.AllowDump:    false
    Debug.AllowDebug:   false
    Debug.Events:       false
    Debug.Things:       false


The default configuration in data/config.wrj has been updated and the new
options are detailed in docs/configuration-file.txt but briefly:


     LongLog: If true adds millisecond precision and file/line number of where
              log message was written to log lines.
       Panic: If true causes client and event goroutine to panic and terminate
              the server instead of recovering gracefully.
   AllowDump: If true enables the #DUMP command.
  AllowDebug: If true enables the #DEBUG command.
      Events: If true writes log messages when an event is queued, cancelled
              or delivered.
      Things: If true writes log messages when a Thing is created, disposed of
              or finalized.


The old Server.Debug setting has been removed. The new #DEBUG command takes
the following options — detailed in cmd/debug.go:


  DEBUG MEMPROF   ( START | STOP )
  DEBUG CPUPROF   ( START | STOP )
  DEBUG BLOCKPROF ( START | STOP )
  DEBUG HEAPDUMP
  DEBUG PANIC


You can also use ON and OFF instead of START or STOP.

And that is the whirlwind tour of debugging WolfMUD. There are also a few
other updates on the public dev branch. The $ACTION and $CLEANUP commands now
work with a passed alias. A count of the number of live Things has been added
to the log status lines. The frog in the gardens has been added as has the
tavern cat. Lease acquire and release log messages have been dropped as they
were actually misleading and confusing under heavy loads.

Anything else I wanted to say?

I now have a Raspberry Pi running Raspbian for testing WolfMUD and a spare
backup. Both are Model B rev 2 boards running at 900MHz. Currently the Pi can
handle about 4,000 players. At some point I’ll probably upgrade to a Raspberry
Pi 3. Maybe get a cheap, dinky Pi Zero W for WolfMUD as well.

I’ve also switched over to the Go 1.8.1 release for WolfMUD without issues.

--
Diddymus

  [1] Commands starting with a ‘#’ like #DUMP and #DEBUG are supposed to be
      privileged commands but I haven’t implemented access levels yet.


  Up to Main Index                             Up to Journal for April, 2017