Up to Main Index Up to Journal for July, 2017 JOURNAL FOR SATURDAY 22ND JULY, 2017 ______________________________________________________________________________ SUBJECT: Let’s try this again… DATE: Sat 22 Jul 21:56:25 BST 2017 Argh! This is driving me mad. How does the WolfMUD server know when I am writing a journal entry? Last week I was writing an entry after the server had been running for days without issue. As I finished writing the server threw out another data race trace, making the whole post moot. This week I reworked the post after a lot more debugging and testing and the server did it again :( As you can probably guess I’m still trying to solve the data races. I believe I’m making some progress as it’s taking longer and longer for the server to throw out the warnings. I have not managed to find the root cause of the issue yet and it’s getting very frustrating. As far as I can tell all of the correct locks are being taken and in the correct order. It doesn’t help that I’ve only been getting short periods of time to work on WolfMUD lately. I really need some proper time to sit down and work through the issues and the code properly instead of adding random debugging and poking things with a sharp stick. I did find and fix one locking issue. We should always take the lock of the outermost Inventory when locking. For example, if we have a location with a player who has a container and an item resets in the container we need to lock the outermost inventory, which in this case would be the location — currently the container’s Inventory would be locked. In this example the Inventory hierarchy is container→player→location. I’ve added some code to automatically find the outermost Inventory lock when adding a lock. Guess I need to go check documentation in various places as well now. One thing I have found really helpful when debugging is this little function: func GetGID() []byte { b := make([]byte, 64) b = b[:runtime.Stack(b, false)] b = bytes.TrimPrefix(b, []byte("goroutine ")) return b[:bytes.IndexByte(b, ' ')] } It simply returns the current goroutine number so that I can prefix debug messages with it and match up the log lines with the goroutine numbers reported in the data race trace. Yes, I know it’s a bad idea[1]. The function is fragile and parses the goroutine number out of the stack data — the format of which may change in the future breaking the function. For now, for me, for debugging, it’s a great help. It’s not very efficient and I’ve found other ways of doing it, but this is portable across platforms. Where does this all leave me? I’ve had two machines running a server each with player bots running around for over a day with no data races. Then again, I’m just finishing this entry so… just checked and both machines still running okay ;) Think it might be time to tidy up the latest round of fixes and push out some updates at last. -- Diddymus [1] Apparently if you do something like this, especially to implement goroutine local storage, general consensus is you’re going to hell. Up to Main Index Up to Journal for July, 2017