                     JOURNAL FOR FRIDAY 31ST AUGUST, 2018
______________________________________________________________________________

SUBJECT: Improved recordjar encoder and tests
   DATE: Fri 31 Aug 05:05:56 BST 2018

This month has been a poor one for the journal. Only two entries with just
enough time to squeeze in one more. I’ve not been idle. As mentioned in my
last entry, I’ve continued working on writing tests and improving the
existing code. Currently this involves giving the recordjar encoder and
decoder packages some much needed attention.

I thought the encoder and decoder were in pretty good shape; they do their
jobs, after all. However, there are issues such as not trimming white space,
white space in keywords and inconsistent ordering due to the use of maps.

A lot of the issues I’m seeing you wouldn’t notice as a normal user. Reading
and writing of zone and player files just works. Data being read is sanity
checked as is data being written out again. It’s the bit in the middle,
between reading and writing, where code is manipulating the data that things
can go wrong.

I could just say “if the code is messing up the data it needs fixing”. That
doesn’t sit well with me, especially as I expect other people to work with
the code as well. Who knows, the recordjar package might be useful enough to
someone that they want to use it in a different project altogether, which is
fine by me.

I’ve therefore been spending time making the encoder code a bit better:

- Encoding a string now trims leading/trailing white space[1].

- Encoding a keyword now trims leading, trailing and internal white space.
  If you have a keyword such as “On Action” it’s actually two keywords, so
  the white space is now removed resulting in “Onaction”[2].

- Encoding a keyword list now does the same removal of white space as for
  individual keywords. In addition it drops empty and duplicate keywords,
  and sorts the keywords consistently.

- Encoding a pair list, such as exits “E→L4 NE→L3 N→L1”, trims leading,
  trailing and internal white space for the name and value as per keywords.
  The pairs are also sorted consistently.

- Encoding a string list now trims leading and trailing white space from
  each string. Strings are also sorted in a consistent order.

- Encoding a keyed string now trims leading, trailing and internal white
  space from the keys. Leading and trailing white space is also trimmed from
  the string value. If no value is specified only the key is returned,
  without the delimiter.

- Encoding a keyed string list now trims white space as for a keyed string.
  A keyed string list should be sorted by the keys, in a consistent order.

- Encoding a duration now rounds to the nearest second, with half seconds
  rounded up. The code has also been simplified by using bytes.Replace
  instead of a complex hand crafted replacement.

- Encoding a date/time is now always in the UTC timezone, and the decoder
  now always decodes to a time.Time in UTC as well.

I’ve paid a lot of attention to sorting and the sequencing of things so that
ordering is performed consistently. This will be important for the new zone
file formatting tool. I have a prototype, but it’s currently useless as it
strips comments out of the files[3]. Without reproducibility, due to random
ordering, the formatting tool would generate a lot of noise.

Encoding a string list or keyed string list has had a documented, long
standing bug fixed. Before the fix, strings in the list were just
concatenated together:

  OnAction: The cat curls up and starts to purr like a buzz-saw. : The cat
            starts to claw at the furniture, scratching deep gouges into the
            wood. : The cat starts to wash itself.

When formatted correctly, this is how it should look:

  OnAction: The cat curls up and starts to purr like a buzz-saw.
          : The cat starts to claw at the furniture, scratching deep gouges
            into the wood.
          : The cat starts to wash itself.

The fix does introduce another documented bug, well, more of a nit really.
The formatting outdents strings with a leading “\n: ” by two character
places. However, the text is not refolded to take the extra positions into
account.

Unrelated to the recordjar package, a minor change slipped in: logging now
uses UTC for the timestamp. This avoids gaps in the logs when entering
daylight saving and, more importantly, avoids overlapping log entries when
exiting daylight saving. A few people nagged me about this, and I listened.

Tests and benchmarks have been written for all of the above changes. Some of
the tests might be a little dubious and need refining.

Everything is now on the public dev branch.

--
Diddymus

  [1] While working on this article I’m wondering if I should be trimming
      white space except for line feeds ‘\n’? Could it mess up formatting
      using blank lines, like in the server greeting? I’ll have to spend
      some time looking into that…

  [2] Still deciding on whether to make the “Onaction” to “On-Action”
      change.

  [3] Preserving comments is also on the todo list.