Up to Main Index                              Up to Journal for July, 2017

                      JOURNAL FOR SUNDAY 9TH JULY, 2017
______________________________________________________________________________

SUBJECT: Cause of data race found
   DATE: Sun  9 Jul 23:08:31 BST 2017

I finally think have a handle on the data race that has been plaguing me for
the last week. I don’t have a fix yet, but I’ve found out what the issue is.
The reset/respawn code is locking on the wrong inventories.

Discovering the root cause of the issue has been a nightmare. I find that the
trace produced when a data race occurs does not provide enough detail. You get
the stack trace for the goroutines involved but the parameter information is
missing, unlike a trace from a nil panic for example. I guess storing the
extra information would take too much memory — there is already a large memory
overhead when enabling race detection.

It also hasn’t helped that triggering the data race can take up to half an
hour for each run. That’s a long time to wait to see if your logging is going
to capture the data you need. You’ll also be logging and capturing a lot of
data waiting for the data race, then you need to sift through all the data
after each failure.

After adding a lot of debugging code, and a lot of waiting, it was this log
that pointed to the root cause of the data race:


	Event delivered for 0xc420176480 *attr.Thing #UID-3A: "a small red ball"
		Input: "$SPAWN"
	[DEBUG] 0xc4200ad340 Handle: $SPAWN []
	[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
	[DEBUG] 0xc4200ad340 Handle: $SPAWN []
	[DEBUG] 0xc4200ad340 Locked: 0xc420176500 *attr.Thing #UID-3B a small leather
		pouch
	[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
	==================
	WARNING: DATA RACE
	Write at 0x00c420173140 by goroutine 74:
		code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Move()
				.../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:218 +0xf31
		code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Add()
				.../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:115 +0x6a
		code.wolfmud.org/WolfMUD.git/cmd.Reset()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/reset.go:44 +0x1fb
		code.wolfmud.org/WolfMUD.git/cmd.(*state).handleCommand()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:193 +0x5ab
		code.wolfmud.org/WolfMUD.git/cmd.(*state).sync()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:157 +0x1a8
		code.wolfmud.org/WolfMUD.git/cmd.Script()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:74 +0x8a
		code.wolfmud.org/WolfMUD.git/event.Queue.func1()
				.../src/code.wolfmud.org/WolfMUD.git/event/event.go:102 +0x7c6

	Previous read at 0x00c420173140 by goroutine 20:
		code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Search()
				.../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:257 +0x49
		code.wolfmud.org/WolfMUD.git/cmd.Take()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/take.go:76 +0x342
		code.wolfmud.org/WolfMUD.git/cmd.(*state).handleCommand()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:193 +0x5ab
		code.wolfmud.org/WolfMUD.git/cmd.(*state).sync()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:157 +0x1a8
		code.wolfmud.org/WolfMUD.git/cmd.Parse()
				.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:63 +0x74
		code.wolfmud.org/WolfMUD.git/frontend.(*game).gameProcess()
				.../src/code.wolfmud.org/WolfMUD.git/frontend/game.go:54 +0xe8
		code.wolfmud.org/WolfMUD.git/frontend.(*game).
			(code.wolfmud.org/WolfMUD.git/frontend.gameProcess)-fm()
				.../src/code.wolfmud.org/WolfMUD.git/frontend/game.go:47 +0x41
		code.wolfmud.org/WolfMUD.git/frontend.(*frontend).Parse()
				.../src/code.wolfmud.org/WolfMUD.git/frontend/frontend.go:153 +0x126
		code.wolfmud.org/WolfMUD.git/comms.(*client).process()
				.../src/code.wolfmud.org/WolfMUD.git/comms/client.go:128 +0x66b


The key lines are those just before the trace:


	Event delivered for 0xc420176480 *attr.Thing #UID-3A: "a small red ball"
		Input: "$SPAWN"
	[DEBUG] 0xc4200ad340 Handle: $SPAWN []
	[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
	[DEBUG] 0xc4200ad340 Handle: $SPAWN []
	[DEBUG] 0xc4200ad340 Locked: 0xc420176500 *attr.Thing #UID-3B a small leather
		pouch
	[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5


We can see the server is processing a "$SPAWN" command. It first locks on
"bot5" and then relocks on "a small leather pouch" and "bot5". This is clearly
wrong, we should only ever be locking top level inventories.

This is quite a relief. I was starting to question whether my whole locking
scheme with the BRL — big room lock — was flawed.

Now I need to fix event processing and the reset/respawn code. I need to check
the action event code — just in case it has a similar issue. I’m also thinking
I might modify the cmd/state.AddLock method to automatically find the top most
Inventory when adding a lock.

--
Diddymus


  Up to Main Index                              Up to Journal for July, 2017