JOURNAL FOR WEDNESDAY 11TH APRIL, 2018
______________________________________________________________________________

SUBJECT: Commenting regular expressions in Go
   DATE: Wed 11 Apr 21:51:14 BST 2018

I am currently waiting to see if there is any fallout from the recent player
saving and loading changes before preparing the next release. So far things
have been quiet, and my own testing has not uncovered any issues.

It is a sad fact that more time is spent debugging WolfMUD than adding new
features. I’ve been promising to write tests for WolfMUD for ages. I feel the
time has come to knuckle down and sort out testing. There are a few areas of
the code that have tests written already, but they need improving.

Testing is often seen as a boring, unglamorous and thankless task. However, I
look at it as a way of not just hunting for bugs, but of reviewing and
improving the code. Once all of the tests are written it will allow me to make
some far reaching changes. Changes that until now I’ve not been confident
enough to make without introducing subtle mistakes.

Where to start? I thought I’d start with the recordjar package. It has some
tests already, but I’m not happy with them. Looking at recordjar.go one of the
first lines I see is this:

  var splitLine = regexp.MustCompile(`^(?:([^\s:]+):)?\s*(.*?)$`)

I love regular expressions, but was this one correct? What did it actually do?
First thing to do, break the expression down and work out the parts.

In Go there is no easy way to document regular expressions. Something like
Perl’s /x modifier that lets you put comments inside the regular expression
would be a nice addition. The best compromise I could come up with was to join
a []string, which can be commented and formatted.
However, all of the quoting and commas get in the way of seeing the regular
expression, and you can’t indent parts of it:

  var splitLine = regexp.MustCompile(strings.Join([]string{
    `^`,         // match start of string
    `(?:`,       // non-capture group for 'field:'
    `([^\s:]+)`, // capture 'field' - non-whitespace/non-colon
    `:`,         // non-capture match of colon as field:value separator
    `)?`,        // match non-captured 'field:' zero or once, prefer once
    `\s*`,       // consume any whitespace - leading or after 'field:' if matched
    `(.*?)`,     // capture everything left unmatched, not greedy
    `$`,         // match at end of string
  }, ""))

Meh, better than `^(?:([^\s:]+):)?\s*(.*?)$`, but still ugly :(

So, while working on this post I decided to do more than moan. I created a
very simple CommentedRE function:

  // uncommentRE is a regular expression to remove embedded comments, the
  // whitespace before them, leading whitespace on each line and newlines
  // from a regular expression string.
  var uncommentRE = regexp.MustCompile(`(?m)(?:\s*#\s.*$|^\s*|\n)`)

  // CommentedRE uncomments a commented regular expression. It takes a regular
  // expression as a string and removes: comments delimited with a '#' and at
  // least one whitespace character, any whitespace before a comment, leading
  // whitespace on each line, and newlines. Trailing whitespace that is not
  // followed by a comment is left in place. The resulting string is then
  // returned.
  func CommentedRE(re string) string {
    return uncommentRE.ReplaceAllString(re, "")
  }

I then modified the splitLine string and commented the regular expression:

  var splitLine = regexp.MustCompile(CommentedRE(`
    ^             # match start of string
    (?:           # non-capture group for 'field:'
      ([^\s:]+)   # capture 'field' - non-whitespace/non-colon
      :           # non-capture match of colon as field:value separator
    )?            # match non-captured 'field:' zero or once, prefer once
    \s*           # consume any whitespace - leading or after 'field:' if matched
    (.*?)         # capture everything left unmatched, not greedy
    $             # match at end of string
  `))

Much better! I decided to stick with Perl’s ‘#’ as the comment delimiter to
avoid confusion with real Go comments. There are still some downsides.
Any ‘#’ character followed by whitespace will be removed from the commented
regular expression, along with the rest of its line. Using ‘go fmt’ will not
format the comments nicely; they have to be manually aligned. Lastly, I now
have another regular expression to write tests for!

Getting back to the splitLine regular expression. The regular expression is
used to split a line in a .wrj file into a field name and its data. For
example:

  Name: Diddymus

The non-capturing group will match ‘Name:’, and within that group the first
capturing group will match ‘Name’. The reason for the non-capturing group is
that if the colon is not matched with a field name we want it captured by the
second capturing group, which captures the data associated with the field. For
example in:

  OnAction: The rabbit hops around a bit.
          : The rabbit makes a soft squeaking and chattering noise.

The second line is a continuation of the first, and the colon at the start is
part of the data.

A nice property of using splitLine.FindSubmatch with the capturing groups is
that, for a line without an embedded newline, it will always return a three
element slice: the original input, the field name and the data. If there is no
field name that element will be a nil []byte, and if there is no data that
element will be an empty []byte. Both have zero length and convert to the
empty string.

Testing the splitLine regular expression is quite simple:

  func TestSplitLine(t *testing.T) {
    for _, test := range []struct {
      input string
      field string // Expected field name
      data  string // Expected data value
    }{
      {"a: b", "a", "b"},     // Normal 'field: data'
      {"a:b", "a", "b"},      // 'field:data' - no space
      {"a:", "a", ""},        // field only
      {"a: ", "a", ""},       // field only - with space
      {":b", "", ":b"},       // no field, ':' + data only
      {": b", "", ": b"},     // no field, ': ' + data only
      {"b", "", "b"},         // data only
      {":", "", ":"},         // colon only
      {"", "", ""},           // empty line
      {" ", "", ""},          // space only line
      {"a:b:c", "a", "b:c"},  // field:data + embedded colon
      {"a: b:c", "a", "b:c"}, // field: data + embedded colon

      // Don't expect to see these lines, such lines should be filtered out
      // and not passed to splitLine.
      {"// Comment", "", "// Comment"}, // a comment line
      {"%%", "", "%%"},                 // a record separator
    } {
      t.Run(test.input, func(t *testing.T) {
        have := splitLine.FindSubmatch([]byte(test.input))
        if lhave, lwant := len(have), 3; lhave != lwant {
          t.Errorf("length - have: %d %q, want %d [%q %q %q]",
            lhave, have, lwant, test.input, test.field, test.data)
          return
        }
        if have, want := string(have[1]), test.field; have != want {
          t.Errorf("field - have: %q, want: %q", have, want)
        }
        if have, want := string(have[2]), test.data; have != want {
          t.Errorf("data - have: %q, want: %q", have, want)
        }
      })
    }
  }

That’s one line tested, just over 6300 more lines of code to go…

--
Diddymus
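
As a rough check on the CommentedRE helper described earlier, here is a
minimal standalone sketch. The uncommentRE and CommentedRE definitions are
copied from this entry; the constants original and commented and the main
function are names made up for illustration and are not WolfMUD code. It shows
the commented pattern reducing to the original one-liner, and the downside of
a ‘#’ followed by whitespace being treated as a comment:

```go
package main

import (
	"fmt"
	"regexp"
)

// uncommentRE and CommentedRE are copied from the journal entry.
var uncommentRE = regexp.MustCompile(`(?m)(?:\s*#\s.*$|^\s*|\n)`)

func CommentedRE(re string) string {
	return uncommentRE.ReplaceAllString(re, "")
}

// original and commented are illustrative names, not taken from WolfMUD.
// original is the one-line splitLine pattern quoted in the entry.
const original = `^(?:([^\s:]+):)?\s*(.*?)$`

// commented is the same pattern written in the commented style.
const commented = `
	^             # match start of string
	(?:           # non-capture group for 'field:'
	  ([^\s:]+)   # capture 'field'
	  :           # colon separating field and value
	)?            # 'field:' is optional
	\s*           # consume any whitespace
	(.*?)         # capture the remaining data
	$             # match at end of string
`

func main() {
	// The commented form should collapse to the original pattern.
	fmt.Println(CommentedRE(commented) == original) // true

	// A comment and the whitespace before it are removed.
	fmt.Printf("%q\n", CommentedRE(`a # b`)) // "a"

	// Downside: '#' followed by whitespace inside a character class is
	// treated as the start of a comment, so the rest of the line is lost.
	fmt.Printf("%q\n", CommentedRE(`[# ]+`)) // "["

	// A '#' that is not followed by whitespace is left alone.
	fmt.Printf("%q\n", CommentedRE(`[#]+`)) // "[#]+"
}
```

The third result is no longer a valid regular expression, so a pattern that
genuinely needs a ‘#’ followed by a space cannot be written in the commented
style as it stands.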