Up to Main Index Up to Journal for July, 2017 JOURNAL FOR SUNDAY 9TH JULY, 2017 ______________________________________________________________________________ SUBJECT: Cause of data race found DATE: Sun 9 Jul 23:08:31 BST 2017 I finally think have a handle on the data race that has been plaguing me for the last week. I don’t have a fix yet, but I’ve found out what the issue is. The reset/respawn code is locking on the wrong inventories. Discovering the root cause of the issue has been a nightmare. I find that the trace produced when a data race occurs does not provide enough detail. You get the stack trace for the goroutines involved but the parameter information is missing, unlike a trace from a nil panic for example. I guess storing the extra information would take too much memory — there is already a large memory overhead when enabling race detection. It also hasn’t helped that triggering the data race can take up to half an hour for each run. That’s a long time to wait to see if your logging is going to capture the data you need. You’ll also be logging and capturing a lot of data waiting for the data race, then you need to sift through all the data after each failure. After adding a lot of debugging code, and a lot of waiting, it was this log that pointed to the root cause of the data race: Event delivered for 0xc420176480 *attr.Thing #UID-3A: "a small red ball" Input: "$SPAWN" [DEBUG] 0xc4200ad340 Handle: $SPAWN [] [DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5 [DEBUG] 0xc4200ad340 Handle: $SPAWN [] [DEBUG] 0xc4200ad340 Locked: 0xc420176500 *attr.Thing #UID-3B a small leather pouch [DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5 ================== WARNING: DATA RACE Write at 0x00c420173140 by goroutine 74: code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Move() .../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:218 +0xf31 code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Add() .../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:115 +0x6a code.wolfmud.org/WolfMUD.git/cmd.Reset() .../src/code.wolfmud.org/WolfMUD.git/cmd/reset.go:44 +0x1fb code.wolfmud.org/WolfMUD.git/cmd.(*state).handleCommand() .../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:193 +0x5ab code.wolfmud.org/WolfMUD.git/cmd.(*state).sync() .../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:157 +0x1a8 code.wolfmud.org/WolfMUD.git/cmd.Script() .../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:74 +0x8a code.wolfmud.org/WolfMUD.git/event.Queue.func1() .../src/code.wolfmud.org/WolfMUD.git/event/event.go:102 +0x7c6 Previous read at 0x00c420173140 by goroutine 20: code.wolfmud.org/WolfMUD.git/attr.(*Inventory).Search() .../src/code.wolfmud.org/WolfMUD.git/attr/inventory.go:257 +0x49 code.wolfmud.org/WolfMUD.git/cmd.Take() .../src/code.wolfmud.org/WolfMUD.git/cmd/take.go:76 +0x342 code.wolfmud.org/WolfMUD.git/cmd.(*state).handleCommand() .../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:193 +0x5ab code.wolfmud.org/WolfMUD.git/cmd.(*state).sync() .../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:157 +0x1a8 code.wolfmud.org/WolfMUD.git/cmd.Parse() .../src/code.wolfmud.org/WolfMUD.git/cmd/state.go:63 +0x74 code.wolfmud.org/WolfMUD.git/frontend.(*game).gameProcess() .../src/code.wolfmud.org/WolfMUD.git/frontend/game.go:54 +0xe8 code.wolfmud.org/WolfMUD.git/frontend.(*game). (code.wolfmud.org/WolfMUD.git/frontend.gameProcess)-fm() .../src/code.wolfmud.org/WolfMUD.git/frontend/game.go:47 +0x41 code.wolfmud.org/WolfMUD.git/frontend.(*frontend).Parse() .../src/code.wolfmud.org/WolfMUD.git/frontend/frontend.go:153 +0x126 code.wolfmud.org/WolfMUD.git/comms.(*client).process() .../src/code.wolfmud.org/WolfMUD.git/comms/client.go:128 +0x66b The key lines are those just before the trace: Event delivered for 0xc420176480 *attr.Thing #UID-3A: "a small red ball" Input: "$SPAWN" [DEBUG] 0xc4200ad340 Handle: $SPAWN [] [DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5 [DEBUG] 0xc4200ad340 Handle: $SPAWN [] [DEBUG] 0xc4200ad340 Locked: 0xc420176500 *attr.Thing #UID-3B a small leather pouch [DEBUG] 0xc4200ad340 Locked: 0xc420054900 *attr.Thing #UID-4S bot5 We can see the server is processing a "$SPAWN" command. It first locks on "bot5" and then relocks on "a small leather pouch" and "bot5". This is clearly wrong, we should only ever be locking top level inventories. This is quite a relief. I was starting to question whether my whole locking scheme with the BRL — big room lock — was flawed. Now I need to fix event processing and the reset/respawn code. I need to check the action event code — just in case it has a similar issue. I’m also thinking I might modify the cmd/state.AddLock method to automatically find the top most Inventory when adding a lock. -- Diddymus Up to Main Index Up to Journal for July, 2017