Up to Main Index                              Up to Journal for July, 2017

                     JOURNAL FOR SATURDAY 22ND JULY, 2017
______________________________________________________________________________

SUBJECT: Let’s try this again…
   DATE: Sat 22 Jul 21:56:25 BST 2017

Argh! This is driving me mad. How does the WolfMUD server know when I am
writing a journal entry? Last week I was writing an entry after the server had
been running for days without issue. As I finished writing the server threw
out another data race trace, making the whole post moot. This week I reworked
the post after a lot more debugging and testing and the server did it again :(

As you can probably guess I’m still trying to solve the data races. I believe
I’m making some progress as it’s taking longer and longer for the server to
throw out the warnings. I have not managed to find the root cause of the issue
yet and it’s getting very frustrating. As far as I can tell all of the correct
locks are being taken and in the correct order. It doesn’t help that I’ve only
been getting short periods of time to work on WolfMUD lately. I really need
some proper time to sit down and work through the issues and the code properly
instead of adding random debugging and poking things with a sharp stick.

I did find and fix one locking issue. We should always take the lock of the
outermost Inventory when locking. For example, if we have a location with a
player who has a container and an item resets in the container we need to lock
the outermost inventory, which in this case would be the location — currently
the container’s Inventory would be locked. In this example the Inventory
hierarchy is container→player→location. I’ve added some code to automatically
find the outermost Inventory lock when adding a lock. Guess I need to go check
documentation in various places as well now.

One thing I have found really helpful when debugging is this little function:


  func GetGID() []byte {
    b := make([]byte, 64)
    b = b[:runtime.Stack(b, false)]
    b = bytes.TrimPrefix(b, []byte("goroutine "))
    return b[:bytes.IndexByte(b, ' ')]
  }


It simply returns the current goroutine number so that I can prefix debug
messages with it and match up the log lines with the goroutine numbers
reported in the data race trace. Yes, I know it’s a bad idea[1]. The function
is fragile and parses the goroutine number out of the stack data — the format
of which may change in the future breaking the function. For now, for me, for
debugging, it’s a great help. It’s not very efficient and I’ve found other
ways of doing it, but this is portable across platforms.

Where does this all leave me? I’ve had two machines running  a server each
with player bots running around for over a day with no data races. Then again,
I’m just finishing this entry so… just checked and both machines still running
okay ;) Think it might be time to tidy up the latest round of fixes and push
out some updates at last.

--
Diddymus

  [1] Apparently if you do something like this, especially to implement
      goroutine local storage, general consensus is you’re going to hell.


  Up to Main Index                              Up to Journal for July, 2017