                      JOURNAL FOR MONDAY 19TH JUNE, 2017
______________________________________________________________________________

SUBJECT: An update on data races
   DATE: Mon 19 Jun 22:57:38 BST 2017

This weekend has been hot, over 30°C. This weekend has also seen me trying to
debug some data races in WolfMUD. This requires running WolfMUD for hours on
end with tens of thousands of players. As a result my tiny study has been
extremely hot and noisy, what with the cooling fans in my machines ramping up
their speed and dumping the heat into the study.

I did start testing on my dinky desktop machine. A few times it became
non-responsive and locked up completely due to running out of RAM and starting
to swap heavily to disk. This machine only has a dual core and 4GB RAM in it.
So I ended up running tests on another machine with 8 cores and 16GB RAM.

Running the race detector does have a large RAM overhead, especially when you
have to increase the history size to avoid “failed to restore the stack”
errors in the race reports[1].

Investigating and debugging the data races has left me a little mystified. The
errors I have found are indeed actual errors and have been present for some
time now. However, the race detector has been silent until recently. Maybe the
race detector has become smarter?

So what is the issue in WolfMUD that is causing a data race?

In the cmd package there is a state type. The state type coordinates the
parsing and processing of commands. It also handles locking via the state.sync
method and the BRL (big room lock) found in attr/internal. The data race
arises because the state.newState method has this line in it:

    s.where = attr.FindLocate(t).Where()

We are accessing a Thing’s attributes to find a Where attribute, to find out
where the current actor/command issuer is. However, this is done before
calling the sync method and so before our locking has been set up.

Fixing the data race requires a number of changes.

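
As a rough illustration of the ordering problem described above, here is a
minimal Go sketch. It is not WolfMUD's actual code: the locate and state
types, the single brl mutex, and the newStateRacy, newStateLocked and move
functions are all hypothetical stand-ins, and one global mutex is a deliberate
simplification of the real locking. It only shows why reading a field before
the lock is held can race with a writer that does hold the lock. It does not
attempt the actual fix.

```go
package main

import (
	"fmt"
	"sync"
)

// brl is a hypothetical stand-in for the big room lock: a single mutex.
var brl sync.Mutex

// locate is a hypothetical, cut-down stand-in for the Locate attribute.
type locate struct {
	where string
}

// state is a hypothetical, cut-down stand-in for the cmd package's state.
type state struct {
	where string
}

// newStateRacy mirrors the problematic ordering: where is read before any
// lock is held, so the read can race with a concurrent call to move.
func newStateRacy(l *locate) *state {
	s := &state{}
	s.where = l.where // unsynchronised read

	brl.Lock()
	defer brl.Unlock()
	// ... command processing would happen here, under the lock ...
	return s
}

// newStateLocked performs the same read, but only after taking the lock.
func newStateLocked(l *locate) *state {
	brl.Lock()
	defer brl.Unlock()
	return &state{where: l.where}
}

// move changes the location while holding the lock.
func move(l *locate, to string) {
	brl.Lock()
	l.where = to
	brl.Unlock()
}

func main() {
	l := &locate{where: "tavern"}

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			move(l, "street")
		}()
		go func() {
			defer wg.Done()
			_ = newStateLocked(l) // synchronised with move via brl
		}()
	}
	wg.Wait()

	fmt.Println(newStateLocked(l).where)
}
```

Swapping newStateRacy into the goroutine in main and running with
"go run -race" would be expected to produce a race report, although the
detector can only report races that actually occur during a given run.
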
The Locate type, specifically its ‘where’ and ‘origin’ fields, needs to be
accessible in a concurrently safe way. The FindLocate method iterates over a
Thing’s map of attributes. This access also needs to be made concurrently safe
and it looks like I’m going to have to rework all of the current finders.

When I say access should be concurrently safe, I mean access outside of the
protection of the state.sync method, which would normally handle all of the
locking and synchronisation.

I’m none too happy about adding lots of locks, as it will mean every Locate
instance will have a sync.Mutex and every Thing will have a sync.Mutex as
well. Instead of having a lock for the Thing type and the BRL for the Locate
type, I’m wondering if I should just promote the BRL to the Thing type
instead?

It’s now nearly 11 pm and the temperature in my study is still 32°C :P

--
Diddymus

  [1] See: https://golang.org/doc/articles/race_detector.html#Options