WolfMUD: Journal for Wednesday 26th July, 2017



                     JOURNAL FOR WEDNESDAY 26TH JULY, 2017
______________________________________________________________________________

SUBJECT: Data race fixes delayed, New debug tooling
   DATE: Wed 26 Jul 22:50:45 BST 2017

Some people have noticed that I haven’t made the changes detailed in my last
journal entry available. The reason is simple: the WolfMUD server was being a
real pain and dropped a data race trace five minutes after I published.

*sigh*

I spent time pouting, grumbling and throwing away all of the changes I’d made.

I then spent a lot of time thinking about how to improve the current situation
with respect to solving the data races. In theory the locking rules in WolfMUD
are very simple:


  1. Locking is only done on the outermost Inventory of an Inventory hierarchy
  2. When handling the attributes of a Thing, you need to lock the outermost
     Inventory the Thing is in
  3. Using a Locate Attribute is concurrently safe
  4. Using a Thing itself (not its Attributes) is concurrently safe


When executing any player or scripted command there is some standard setup
that is performed. First we need to know where the ‘actor’[1] is. To do that
we find the actor’s Locate Attribute via the FindLocate function (rule 4 means
this is safe) and then ask it where the actor is via the Locate.Where method
(rule 3 means this is safe). We can then lock the outermost Inventory of where
the actor is, which we find by calling the Inventory.Outermost method (rule
1). After this setup is completed we are free to access anything within the
outermost Inventory including nested Inventories like containers, other
players, items, items in containers etc.
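
As a rough illustration of the setup described above, here is a minimal sketch
in Go. The types are simplified, hypothetical stand-ins and not the actual
WolfMUD code: only the names FindLocate, Locate.Where and Inventory.Outermost
come from the text, everything else (fields, the setup function, the recheck
after locking) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Inventory is a simplified, hypothetical stand-in for WolfMUD's Inventory.
// Only the outermost Inventory of a hierarchy is ever locked (rule 1).
type Inventory struct {
	sync.Mutex
	name   string
	parent *Inventory // nil for an outermost Inventory
}

// Outermost walks up the hierarchy to the Inventory that carries the lock.
func (i *Inventory) Outermost() *Inventory {
	for i.parent != nil {
		i = i.parent
	}
	return i
}

// Locate records where a Thing is. In this sketch it guards itself with its
// own mutex so it can be used without an Inventory lock (rule 3).
type Locate struct {
	mu    sync.RWMutex
	where *Inventory
}

// Where returns the Inventory the owning Thing is currently in.
func (l *Locate) Where() *Inventory {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.where
}

// Thing is a hypothetical stand-in. Its locate field is set once at creation
// and never changed, so reading it needs no lock (rule 4).
type Thing struct {
	name   string
	locate *Locate
}

// FindLocate returns the Locate attribute of a Thing.
func FindLocate(t *Thing) *Locate { return t.locate }

// setup locks and returns the outermost Inventory the actor is in. The
// recheck after locking is an assumption of this sketch: the actor may have
// moved while we were waiting for the lock.
func setup(actor *Thing) *Inventory {
	for {
		out := FindLocate(actor).Where().Outermost()
		out.Lock()
		if FindLocate(actor).Where().Outermost() == out {
			return out
		}
		out.Unlock()
	}
}

func main() {
	tavern := &Inventory{name: "tavern"}
	chest := &Inventory{name: "chest", parent: tavern}
	actor := &Thing{name: "rat", locate: &Locate{where: chest}}

	locked := setup(actor)
	defer locked.Unlock()
	fmt.Println("locked:", locked.name) // prints "locked: tavern"
}
```

Even though the actor in the example sits inside a chest, it is the tavern
that ends up locked, which is what makes everything nested inside it safe to
touch.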

If we need multiple locations, for example when moving from one location to
another, we do the setup as above. This locks the location where we are. We
then find the Inventory we want to move to via the Exit attribute of the
current (locked) Inventory’s parent Thing. We then add the lock for the
destination and reacquire our locks. Rule 1 says we can now access anything
within the current and destination Inventories. In this case we can remove the
player from the current location, add them to the destination, notify players
at the current location that we left, notify players at the destination that
we arrived and finally access the items at the destination to describe the new
location.
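
The ‘add a lock and reacquire’ step can be sketched roughly as follows. This
is only an illustration under stated assumptions: the lockSet type, the id
field and the move function are hypothetical, and taking the locks in
ascending id order is a common way to avoid deadlocks between two movers going
in opposite directions, not necessarily how WolfMUD orders its locks.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Inventory is a hypothetical stand-in for an outermost (location) Inventory.
type Inventory struct {
	sync.Mutex
	id      int                   // unique id, assumed to define the lock order
	name    string
	exits   map[string]*Inventory // direction -> destination
	players map[string]bool
}

// lockSet is a hypothetical helper holding the Inventories a command needs.
type lockSet []*Inventory

// add returns the set with i included, sorted by id and without duplicates.
func (ls lockSet) add(i *Inventory) lockSet {
	for _, l := range ls {
		if l == i {
			return ls
		}
	}
	ls = append(ls, i)
	sort.Slice(ls, func(a, b int) bool { return ls[a].id < ls[b].id })
	return ls
}

// lock acquires every lock in the set in ascending id order.
func (ls lockSet) lock() {
	for _, l := range ls {
		l.Lock()
	}
}

// unlock releases every lock in the set in the reverse order.
func (ls lockSet) unlock() {
	for n := len(ls) - 1; n >= 0; n-- {
		ls[n].Unlock()
	}
}

// move relocates player from one location through the exit dir. It reports
// whether the move happened.
func move(player string, from *Inventory, dir string) bool {
	locks := lockSet{}.add(from)
	locks.lock()
	to := from.exits[dir] // read under the lock on the current location
	locks.unlock()
	if to == nil {
		return false
	}

	// Add the destination and reacquire all of the locks together.
	locks = locks.add(to)
	locks.lock()
	defer locks.unlock()

	// We dropped our lock for a moment, so check the player is still here.
	if !from.players[player] {
		return false
	}
	delete(from.players, player)
	to.players[player] = true
	return true
}

func main() {
	cellar := &Inventory{id: 1, name: "cellar", players: map[string]bool{}}
	tavern := &Inventory{
		id:      2,
		name:    "tavern",
		exits:   map[string]*Inventory{"down": cellar},
		players: map[string]bool{"actor": true},
	}
	fmt.Println(move("actor", tavern, "down"), cellar.players["actor"])
}
```

With both locations locked, the leave and arrive notifications and the
description of the destination could all be produced before the deferred
unlock runs.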

For the most part this all works beautifully :)

Where things tend to go wonky is where something is ‘out of play’ — that is
not in an Inventory we can lock on. For example there is a race when first
adding a player, or removing them from the game world. Items can also go ‘out
of play’ during a reset.

After throwing away all of my pending changes I did something crazy, really
crazy. I realised I needed to validate rules 1 and 2 above. I spent ages
adding a locks parameter as the first parameter of nearly every function or
method for every Attribute. The locks were a simple []has.Inventory containing
a copy of the state locks being held. That way I could verify at every step
that any Thing or Attribute that was touched was under a lock and log when it
wasn’t.

Only one problem: there were a huge number of lock verifying changes and it
was difficult to tell them apart from actual bug fixes. So I threw all of
those changes away as well :(

The time spent on that failed exercise wasn’t wasted. I realised I only needed
to check the locks in a few places. Most of the other changes were only there
to transport the lock data and propagate it through the different calls to
where it was needed. As a result I modified the thing.Attrs method to make
sure the Thing is in the hierarchy of a locked Inventory. To access an
Attribute you have to use a finder function and they all call the thing.Attrs
method, so that single check covers every Attribute access. I’ve also added
checks to the Inventory methods to make sure an Inventory is always in the
hierarchy of a locked Inventory, and overridden the BRL Lock and Unlock
methods exposed in the Inventory type to record the locks and unlocks. Last of
all I’ve added some DebugXXX functions to a separate attr/debugging.go file.
With these few, simple changes I can now monitor all of the locks and report
when an improper Inventory or Thing access is detected.
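
To give a feel for the idea, here is a minimal sketch of lock recording and
access checking. It is not the code from attr/debugging.go: the DebugCheck
function, the held map and the trace slice are hypothetical names invented for
the example, and the Inventory type is a simplified stand-in. Note that the
check only confirms that somebody holds the lock, not that the caller does.

```go
package main

import (
	"fmt"
	"sync"
)

// Inventory is a simplified, hypothetical stand-in. The real lock is mu; the
// Lock and Unlock methods wrap it so every lock and unlock is recorded.
type Inventory struct {
	mu     sync.Mutex
	name   string
	parent *Inventory // nil for an outermost Inventory
}

var (
	heldMu sync.Mutex                // guards held and trace
	held   = map[*Inventory]bool{}   // outermost Inventories currently locked
	trace  []string                  // ordered record of locks and accesses
)

// Lock takes the real lock, then records that it is held.
func (i *Inventory) Lock() {
	i.mu.Lock()
	heldMu.Lock()
	held[i] = true
	trace = append(trace, "lock "+i.name)
	heldMu.Unlock()
}

// Unlock records the release, then releases the real lock.
func (i *Inventory) Unlock() {
	heldMu.Lock()
	delete(held, i)
	trace = append(trace, "unlock "+i.name)
	heldMu.Unlock()
	i.mu.Unlock()
}

// Outermost walks up the hierarchy to the Inventory that carries the lock.
func (i *Inventory) Outermost() *Inventory {
	for i.parent != nil {
		i = i.parent
	}
	return i
}

// DebugCheck reports whether i is in the hierarchy of a locked Inventory and
// adds the access, good or bad, to the trace.
func DebugCheck(i *Inventory, what string) bool {
	heldMu.Lock()
	defer heldMu.Unlock()
	ok := held[i.Outermost()]
	if ok {
		trace = append(trace, "access "+what+" in "+i.name)
	} else {
		trace = append(trace, "IMPROPER access "+what+" in "+i.name)
	}
	return ok
}

func main() {
	tavern := &Inventory{name: "tavern"}
	chest := &Inventory{name: "chest", parent: tavern}

	DebugCheck(chest, "peek") // no lock held: reported as improper
	tavern.Lock()
	DebugCheck(chest, "open") // under the tavern's lock: fine
	tavern.Unlock()

	for _, line := range trace {
		fmt.Println(line)
	}
}
```

Because the trace is an ordered list rather than a single report, it shows the
sequence of locks, accesses and unlocks leading up to a bad access.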

The difference between my tracing and the standard data race trace is that I
can now follow the flow of locking, unlocking and accesses as opposed to the
simple snapshot displayed by the standard data race trace.

So apologies for the delay to everyone waiting for the data race fixes. With
my new debugging ‘tool’ I hope to nail the remaining data races I know about,
and I’ve already started fixing some I didn’t know about.

On a side note, Go 1.9 RC1 was announced today :)

--
Diddymus

  [1] The actor is whatever is performing the command, usually a player but
      can actually be any Thing.

