WolfMUD: Journal for Saturday 5th February, 2022


                   JOURNAL FOR SATURDAY 5TH FEBRUARY, 2022
______________________________________________________________________________

SUBJECT: A minor setback, moving on to plan C
   DATE: Sat  5 Feb 20:31:07 GMT 2022

Still working on health and HIT for player combat. My initial plan A didn’t
work out. It was getting overly complex with far too many corner cases failing.
So I deleted all the ugly…

I then tried something I really didn’t want to do. I implemented a plan B with
the health regeneration using the event scheduling system. The implementation
was much simpler. However, what would happen if I ran 64,000 bots and had
64,000 additional events registered? Turns out CPU usage goes up a little and
some additional memory is allocated.

Problem solved, right? No :( While testing I managed to trigger a nil panic.
The player had been freed on the networking/client side after a network error
(I hit Ctrl-C to abort the botrunner), and the health regeneration trigger on
the game side then tried to access the now nil maps in the freed player. It
took a good few hours of debugging to realise why the server had panicked.
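
The failure mode can be sketched in a few lines. This assumes that the panic
came from a write to a map that had been set to nil when the player was freed
(reading a nil map is harmless in Go, writing to one panics). The player type,
its fields and the regen function are hypothetical, not WolfMUD's own.

```go
package main

import "fmt"

// player is a hypothetical stand-in for the real player type.
type player struct {
	freed bool
	attrs map[string]int
}

// free releases the player's resources, leaving the map nil.
func (p *player) free() {
	p.attrs = nil
	p.freed = true
}

// regen plays the part of the late regeneration event: it writes to the
// player's map and reports whether that write panicked.
func regen(p *player) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	p.attrs["health"]++ // assignment to an entry in a nil map panics
	return false
}

func main() {
	p := &player{attrs: map[string]int{"health": 5}}
	fmt.Println(regen(p)) // prints false: the player is still live

	p.free()              // e.g. the network side drops the connection
	fmt.Println(regen(p)) // prints true: the event fired after the free
}
```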

I tried to fix the issue by checking if the player had been freed already.
Luckily I test a lot with the race detector, as the new check started causing
a data race. What? Why? The code was locking and everything looked fine:


  func (s *state) Parse(input string) (cmd string) {
    if input = strings.TrimSpace(input); len(input) != 0 {
      // Hold the BWL lock until Parse returns.
      BWL.Lock()
      defer BWL.Unlock()

      // New check: do nothing if the actor has already been freed.
      if s.actor.Is&Freed == Freed {
        return ""
      }

      s.parse(input)
      s.mailman()
    }
    return s.cmd
  }


The “if s.actor.Is&Freed == Freed {” was racing with the “s.parse(input)” ??

Turns out I can’t use the event scheduler for player events without triggering
a data race. I’m not sure of the exact cause, but I suspect it’s because the
networking code normally always uses the same state, which it creates and
controls outside of the lock, and all calls are made on that goroutine. When
the event is introduced, a new state is created when the event fires and the
calls are made on a different goroutine. Accessing the actor through two
different states on two different goroutines then causes the data race, even
though the lock has been taken by each goroutine in turn. For the above code
the race is between the read of “s.actor.Is&Freed” in one state and a write by
“s.parse(input)” in another state, both affecting the same actor.
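
The shape described above, two states sharing one actor and used from two
goroutines, can be sketched as follows. This is not a reproduction of the
race: the types, the Freed value and the isFreed/free helpers are hypothetical,
and here every access to the actor happens while the lock is held. Under Go's
memory model a sync.Mutex orders such accesses, so running this sketch with
"go run -race" should report nothing. That may hint that, in the real code,
some access to the actor takes place outside the locked region, but that is
only a guess.

```go
package main

import (
	"fmt"
	"sync"
)

// Freed is a hypothetical flag bit; the real value is not shown in the source.
const Freed uint32 = 1 << 0

// BWL stands in for the global lock used in the excerpt above.
var BWL sync.Mutex

// actor and state are hypothetical, reduced to the one field used here.
type actor struct{ Is uint32 }

type state struct{ actor *actor }

// isFreed reads the actor's flag while holding the lock.
func (s *state) isFreed() bool {
	BWL.Lock()
	defer BWL.Unlock()
	return s.actor.Is&Freed == Freed
}

// free sets the actor's flag while holding the lock.
func (s *state) free() {
	BWL.Lock()
	defer BWL.Unlock()
	s.actor.Is |= Freed
}

func main() {
	a := &actor{}
	network := &state{actor: a} // state owned by the networking goroutine
	event := &state{actor: a}   // second state, created when an event fires

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); network.free() }()
	go func() { defer wg.Done(); _ = event.isFreed() }()
	wg.Wait()

	fmt.Println(event.isFreed()) // prints true: free has completed by now
}
```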

It took a lot more debugging to unravel what was going on there.

I now have to throw away plan B and come up with a plan C — I already have
more ideas to try. In the meantime I’ve reverted all my changes. I’ve also had
a script running that loops through running the botrunner for a while, then
unceremoniously kills it, waits a bit then starts over — just in case the data
race wasn’t introduced by these changes…

--
Diddymus

