                     JOURNAL FOR FRIDAY 31ST AUGUST, 2018
______________________________________________________________________________

SUBJECT: Improved recordjar encoder and tests
   DATE: Fri 31 Aug 05:05:56 BST 2018

This month has been a poor one for the journal. Only two entries so far, with
just enough time to squeeze in one more.

I’ve not been idle. As mentioned in my last entry, I’ve continued working on
writing tests and improving the existing code. Currently this involves giving
the recordjar encoder and decoder packages some much needed attention.

I thought the encoder and decoder were in pretty good shape — they do their
jobs after all. However, there are issues such as not trimming white space,
white space in keywords and inconsistent ordering due to the use of maps.

A lot of the issues I’m seeing you wouldn’t notice as a normal user. Reading
and writing of zone and player files just works. Data being read is sanity
checked as is data being written out again. It’s the bit in the middle,
between reading and writing, where code is manipulating the data that things
can go wrong.

I could just say “if the code is messing up the data it needs fixing”. That
doesn’t sit well with me, especially as I expect other people to work with the
code as well. Who knows, the recordjar package might be useful enough to
someone that they want to use it in a different project altogether, which is
fine by me.

I’ve therefore been spending time making the encoder code a bit better:


  - Encoding a string now trims leading/trailing white space[1].

  - Encoding a keyword now trims leading, trailing and internal white space.
    If you have a keyword such as “On Action” it’s actually two keywords, so
    the white space is now removed resulting in “Onaction”[2].

  - Encoding a keyword list now does the same removal of white space as for
    individual keywords. In addition it drops empty and duplicate keywords,
    and sorts the keywords consistently.

  - Encoding a pair list — such as exits “E→L4 NE→L3 N→L1” — trims leading,
    trailing and internal white space for the name and value as per keywords.
    The pairs are also sorted consistently.

  - Encoding a string list now trims leading and trailing white space from
    each string. Strings are also sorted in a consistent order.

  - Encoding a keyed string now trims leading, trailing and internal white
    space from the keys. Leading and trailing white space is also trimmed from
    the string value. If no value is specified only the key is returned,
    without the delimiter.

  - Encoding a keyed string list now trims white space as for a keyed string.
    The list is now also sorted by key, in a consistent order.

  - Encoding a duration now rounds to the nearest second, with half seconds
    rounded up. The code has also been simplified, using bytes.Replace instead
    of a complex hand-crafted replacement.

  - Encoding a date/time is now always in the UTC timezone, and the decoder
    now always decodes to a time.Time in UTC as well.
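
As a rough illustration of the keyword and keyword list rules above, here is
a minimal sketch. The names keyword and keywordList are hypothetical and are
not the recordjar API. The casing (first letter upper, the rest lower) is
inferred from the “Onaction” example, and the single space separator and
plain string sort are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// keyword is a hypothetical helper, not the recordjar API. It removes all
// leading, trailing and internal white space from s, then upper-cases the
// first letter and lower-cases the rest (assumed from "Onaction").
func keyword(s string) string {
	s = strings.Join(strings.Fields(s), "")
	if s == "" {
		return ""
	}
	r := []rune(strings.ToLower(s))
	r[0] = unicode.ToUpper(r[0])
	return string(r)
}

// keywordList is a hypothetical helper. It normalises each keyword, drops
// empty and duplicate keywords, sorts what is left and joins the result
// with single spaces (the separator is an assumption).
func keywordList(words []string) string {
	seen := make(map[string]bool)
	list := []string{}
	for _, w := range words {
		k := keyword(w)
		if k == "" || seen[k] {
			continue
		}
		seen[k] = true
		list = append(list, k)
	}
	sort.Strings(list)
	return strings.Join(list, " ")
}

func main() {
	fmt.Println(keyword(" On Action ")) // Onaction
	fmt.Println(keywordList([]string{"narrative", " ", "Start", "NARRATIVE"}))
	// Narrative Start
}
```

Duplicates are detected after normalising, so “narrative” and “NARRATIVE”
count as the same keyword.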


I’ve paid a lot of attention to sorting and the sequencing of things so that
ordering is performed consistently. This will be important for the new zone
file formatting tool — I have a prototype, but it’s currently useless as it
strips comments out of the files[3]. Without reproducible output, random
ordering would cause the formatting tool to generate a lot of noise.
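
To show why maps cause the ordering problem, here is a minimal sketch of
encoding a pair list reproducibly. Go deliberately randomises map iteration
order, so the names are collected and sorted before anything is written. The
pairList name, the map[string]string type and the plain string sort are all
assumptions, and the encoder may well order exits differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pairList is a hypothetical helper. Ranging over a map directly gives a
// different order from run to run, so the names are sorted first and the
// pairs are then emitted in that fixed order.
func pairList(pairs map[string]string) string {
	names := make([]string, 0, len(pairs))
	for name := range pairs {
		names = append(names, name)
	}
	sort.Strings(names)

	list := make([]string, 0, len(names))
	for _, name := range names {
		list = append(list, name+"→"+pairs[name])
	}
	return strings.Join(list, " ")
}

func main() {
	exits := map[string]string{"N": "L1", "NE": "L3", "E": "L4"}
	fmt.Println(pairList(exits)) // E→L4 N→L1 NE→L3
}
```

However many times this runs the output is identical, which is what a
formatting tool needs to avoid spurious differences.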

Encoding a string list or keyed string list has had a documented, long
standing bug fixed. Before the fix, strings in the list were just concatenated
together:


  OnAction: The cat curls up and starts to purr like a buzz-saw. : The cat
            starts to claw at the furniture, scratching deep gouges into the
            wood. : The cat starts to wash itself.


When formatted correctly, this is how it should look:


  OnAction: The cat curls up and starts to purr like a buzz-saw.
          : The cat starts to claw at the furniture, scratching deep gouges
            into the wood.
          : The cat starts to wash itself.


This does introduce another documented bug, well more of a nit really. The
formatting outdents strings with a leading “\n: ” by two character places.
However, the text is not refolded to take the extra positions into account.
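
The fix amounts to joining the strings with the “\n: ” separator rather than
concatenating them. A minimal sketch, assuming a hypothetical stringList
helper and a plain string sort, and leaving out the folding and indenting
that the real encoder performs:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stringList is a hypothetical helper. It trims leading and trailing white
// space from each string, sorts the strings and joins them with "\n: " so
// that every string after the first starts on its own line.
func stringList(items []string) string {
	list := make([]string, 0, len(items))
	for _, s := range items {
		list = append(list, strings.TrimSpace(s))
	}
	sort.Strings(list)
	return strings.Join(list, "\n: ")
}

func main() {
	fmt.Println(stringList([]string{
		" The cat starts to wash itself. ",
		"The cat curls up and starts to purr like a buzz-saw.",
	}))
	// The cat curls up and starts to purr like a buzz-saw.
	// : The cat starts to wash itself.
}
```

Note that strings.TrimSpace also removes leading and trailing line feeds.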

Unrelated to the recordjar package, a minor change that slipped in — logging
now uses UTC for the timestamp. This avoids gaps in the logs when entering
daylight saving and, more importantly, avoids overlapping log entries when
exiting daylight saving. A few people nagged me about this, and I listened.

Tests and benchmarks have been written for all of the above changes. Some of
the tests might be a little dubious and need refining.

Everything is now on the public dev branch.

--
Diddymus

  [1] While working on this article I’m wondering if I should be trimming
      white space except for line feeds ‘\n’? Could it mess up formatting
      using blank lines, like in the server greeting? I’ll have to spend some
      time looking into that…

  [2] Still deciding on whether to make the “Onaction” to “On-Action” change.

  [3] Preserving comments is also on the todo list.