Up to Main Index Up to Journal for July, 2017
JOURNAL FOR SUNDAY 9TH JULY, 2017
______________________________________________________________________________
SUBJECT: Cause of data race found
DATE: Sun 9 Jul 23:08:31 BST 2017
I finally think have a handle on the data race that has been plaguing me for
the last week. I don’t have a fix yet, but I’ve found out what the issue is.
The reset/respawn code is locking on the wrong inventories.
Discovering the root cause of the issue has been a nightmare. I find that the
trace produced when a data race occurs does not provide enough detail. You get
the stack trace for the goroutines involved but the parameter information is
missing, unlike a trace from a nil panic for example. I guess storing the
extra information would take too much memory — there is already a large memory
overhead when enabling race detection.
It also hasn’t helped that triggering the data race can take up to half an
hour for each run. That’s a long time to wait to see if your logging is going
to capture the data you need. You’ll also be logging and capturing a lot of
data waiting for the data race, then you need to sift through all the data
after each failure.
After adding a lot of debugging code, and a lot of waiting, it was this log
that pointed to the root cause of the data race:
Event delivered for 0xc420176480 *attr.Thing #UID-3A: "a small red ball"
Input: "$SPAWN"
[DEBUG] 0xc4200ad340 Handle: $SPAWN []
[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
[DEBUG] 0xc4200ad340 Handle: $SPAWN []
[DEBUG] 0xc4200ad340 Locked: 0xc420176500 *attr.Thing #UID-3B a small leather
pouch
[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
==================
WARNING: DATA RACE
Write at 0x00c420173140 by goroutine 74:
code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Move()
.../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:218 +0xf31
code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Add()
.../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:115 +0x6a
code.wolfmud.org/WolfMUD.git/cmd.Reset()
.../src/code.wolfmud.org/WolfMUD.git/cmd/reset.go:44 +0x1fb
code.wolfmud.org/WolfMUD.git/cmd.(*state).handleCommand()
.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:193 +0x5ab
code.wolfmud.org/WolfMUD.git/cmd.(*state).sync()
.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:157 +0x1a8
code.wolfmud.org/WolfMUD.git/cmd.Script()
.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:74 +0x8a
code.wolfmud.org/WolfMUD.git/event.Queue.func1()
.../src/code.wolfmud.org/WolfMUD.git/event/event.go:102 +0x7c6
Previous read at 0x00c420173140 by goroutine 20:
code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Search()
.../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:257 +0x49
code.wolfmud.org/WolfMUD.git/cmd.Take()
.../src/code.wolfmud.org/WolfMUD.git/cmd/take.go:76 +0x342
code.wolfmud.org/WolfMUD.git/cmd.(*state).handleCommand()
.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:193 +0x5ab
code.wolfmud.org/WolfMUD.git/cmd.(*state).sync()
.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:157 +0x1a8
code.wolfmud.org/WolfMUD.git/cmd.Parse()
.../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:63 +0x74
code.wolfmud.org/WolfMUD.git/frontend.(*game).gameProcess()
.../src/code.wolfmud.org/WolfMUD.git/frontend/game.go:54 +0xe8
code.wolfmud.org/WolfMUD.git/frontend.(*game).
(code.wolfmud.org/WolfMUD.git/frontend.gameProcess)-fm()
.../src/code.wolfmud.org/WolfMUD.git/frontend/game.go:47 +0x41
code.wolfmud.org/WolfMUD.git/frontend.(*frontend).Parse()
.../src/code.wolfmud.org/WolfMUD.git/frontend/frontend.go:153 +0x126
code.wolfmud.org/WolfMUD.git/comms.(*client).process()
.../src/code.wolfmud.org/WolfMUD.git/comms/client.go:128 +0x66b
The key lines are those just before the trace:
Event delivered for 0xc420176480 *attr.Thing #UID-3A: "a small red ball"
Input: "$SPAWN"
[DEBUG] 0xc4200ad340 Handle: $SPAWN []
[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
[DEBUG] 0xc4200ad340 Handle: $SPAWN []
[DEBUG] 0xc4200ad340 Locked: 0xc420176500 *attr.Thing #UID-3B a small leather
pouch
[DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5
We can see the server is processing a "$SPAWN" command. It first locks on
"bot5" and then relocks on "a small leather pouch" and "bot5". This is clearly
wrong, we should only ever be locking top level inventories.
This is quite a relief. I was starting to question whether my whole locking
scheme with the BRL — big room lock — was flawed.
Now I need to fix event processing and the reset/respawn code. I need to check
the action event code — just in case it has a similar issue. I’m also thinking
I might modify the cmd/state.AddLock method to automatically find the top most
Inventory when adding a lock.
--
Diddymus
Up to Main Index Up to Journal for July, 2017