JOURNAL FOR WEDNESDAY 15TH SEPTEMBER, 2021
______________________________________________________________________________

SUBJECT: Text folding now enabled for text sent to players
DATE: Wed 15 Sep 21:07:51 BST 2021

Plain text. Simple, reliable, portable, hrm… portable apart from various line endings, tabs, code pages and control codes.

What about Unicode? Is something written using Unicode plain text? Some people say yes, some say no… I think Unicode is not strictly plain text because it requires processing. Take UTF-8: it is an encoding and needs to be decoded into codepoints. Even if you start with just unencoded codepoints, they require additional processing. Take the grapheme ‘é’: in UTF-8 it could be encoded as the bytes ‘0xC3 0xA9’ or ‘0x65 0xCC 0x81’. Once decoded, the UTF-8 bytes become the codepoints ‘U+00E9’ or ‘U+0065 U+0301’. The second form may then be normalised to ‘U+00E9’.

Depending on your operating system, locale settings and how tarnished your luck currently is, displaying a file containing the following:

  It's not like Zoë is going to the café, which has a very nice façade, for some crème brûlée with her doppelgänger!

could result in any of:

  It's not like Zoe<CC><88> is going to the cafe<CC><81>, which has a very nice fac<CC><A7>ade, for some cre<CC><80>me bru<CC><82>le<CC><81>e with her doppelga<CC><88>nger!

  It's not like ZoeÌ is going to the cafeÌ, which has a very nice façade, for some creÌme bruÌleÌe with her doppelgaÌnger!

  It's not like Zoe ~L is going to the cafe ~L, which has a very nice fac ~L ade, for some cre ~Lme bru ~Lle ~Le with her doppelga ~Lnger!

  It's not like Zo? is going to the caf?, which has a very nice fa?ade, for some cr?me br?l?e with her doppelg?nger!

  It's not like Zoe�. is going to the cafe�., which has a very nice façade, for some cre�.me bru�.le�.e with her doppelga�.nger!

Why do I bring this up? It sort of explains the rabbit hole I’ve been down for the last week or so.
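To make the ‘é’ example concrete, here is a small Go sketch that prints the UTF-8 bytes and the decoded codepoints of both forms. The describe helper is hypothetical and not part of WolfMUD. Only the standard library is used; normalisation is not in the standard library (the golang.org/x/text/unicode/norm package provides it), so the sketch only shows that the two forms differ until something normalises them.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// describe reports the UTF-8 bytes of s, the codepoints they decode to,
// and how many codepoints there are. (Hypothetical helper for illustration.)
func describe(s string) string {
	var cps []string
	for _, r := range s { // ranging over a string decodes UTF-8 into runes
		cps = append(cps, fmt.Sprintf("%U", r))
	}
	return fmt.Sprintf("bytes: % X, codepoints: %s (%d)",
		[]byte(s), strings.Join(cps, " "), utf8.RuneCountInString(s))
}

func main() {
	precomposed := "\u00e9" // é as a single codepoint, U+00E9
	decomposed := "e\u0301" // e followed by U+0301 COMBINING ACUTE ACCENT

	fmt.Println(describe(precomposed))
	fmt.Println(describe(decomposed))

	// Both display as é, but without normalisation they are different strings.
	fmt.Println("equal:", precomposed == decomposed)
}
```

The first form is two bytes and one codepoint, the second is three bytes and two codepoints, and a plain byte comparison reports them as unequal.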
WolfMUD needs to fold, or wrap, text so that paragraphs fit nicely on a player’s screen. At the moment a width of 80 characters is assumed, but in the future it will be variable and set by the player.

Folding/wrapping ASCII is easy. Split the text into words on an ASCII space ‘0x20’ and add words together until the next word would exceed the line length, at which point you insert ‘\r\n’ to start the next line and continue adding words. The length of a word is its number of bytes, one byte per visible character.

WolfMUD will let you use Unicode. You can write zone files in any language you want, although at the moment you would need to translate any hard-coded message text. I’m also not sure how well placeholders for substitutions, “You put %X into %Y” for example, would work in languages other than English.

This means that each and every message sent to a client has to be individually folded/wrapped. Every message therefore needs to be converted to runes, processed, converted back to bytes and sent to the client. Converting bytes or a string to runes takes time, as does converting back to bytes, and the processing itself takes time. Working out the length of a word is exceptionally tedious: until you process the stream you have an unknown number of bytes per codepoint and an unknown number of codepoints per grapheme. Some codepoints are non-spacing, or combining, or zero width… Unicode can be quite a headache, even with the built-in support Go provides.

I tried to relieve my headache with a dose of 3rd party libraries. For what I needed they were bloated and/or slow :(

WolfMUD has a Fold function in text/fold.go which is quite good. It also handles ANSI escape sequences for colours, ‘␠’[1] U+2420 for hard spaces and a few other bits. Fold is already quite fast. However, a lot of effort and work has been put into the experiment to make it fast, and using the current Fold function would have just slowed things down again :( I wanted a faster Fold method, dammit!
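The folding described above, including the Unicode and ANSI complications, can be sketched in Go. This is a minimal illustration under stated assumptions, not WolfMUD's text/fold.go: fold, visibleWidth and hardSpace are hypothetical names; a word's width is approximated as its rune count, with combining/enclosing marks (Unicode categories Mn and Me) and ANSI CSI colour sequences counted as zero; and wide East Asian characters, zero width joiners and full grapheme segmentation are deliberately ignored. It ranges over the string directly, so UTF-8 is decoded as it goes rather than converting the whole message to a []rune first.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hardSpace is U+2420 SYMBOL FOR SPACE. In this sketch it marks a space that
// must never be used as a break point; it is written out as ' ' after folding.
const hardSpace = "\u2420"

// visibleWidth approximates how many columns a word occupies. It counts runes,
// except that ANSI CSI sequences (ESC '[' ... final byte) and combining or
// enclosing marks count as zero. Wide characters are NOT handled here.
func visibleWidth(word string) int {
	width := 0
	inCSI, prevEsc := false, false
	for _, r := range word { // decodes UTF-8 on the fly, no []rune copy
		switch {
		case inCSI:
			// A CSI sequence ends with a final byte in the range 0x40-0x7E.
			if r >= 0x40 && r <= 0x7E {
				inCSI = false
			}
		case prevEsc && r == '[':
			inCSI = true
		case r == '\x1b':
			// ESC itself takes up no space.
		case unicode.In(r, unicode.Mn, unicode.Me):
			// Combines with the preceding character, so no extra column.
		default:
			width++
		}
		prevEsc = r == '\x1b'
	}
	return width
}

// fold wraps s so lines are at most width visible columns wide where possible.
// Words are split on ASCII spaces only; a word wider than width gets a line of
// its own rather than being broken. Each "\n" (or "\r\n") becomes "\r\n".
func fold(s string, width int) string {
	var out strings.Builder
	s = strings.ReplaceAll(s, "\r\n", "\n")
	for i, line := range strings.Split(s, "\n") {
		if i > 0 {
			out.WriteString("\r\n")
		}
		col := 0 // visible columns used on the current output line
		for _, word := range strings.Split(line, " ") {
			if word == "" {
				continue // collapse runs of spaces
			}
			w := visibleWidth(word)
			switch {
			case col == 0:
				// The first word on a line is always written as-is.
			case col+1+w > width:
				out.WriteString("\r\n") // word would overflow: start a new line
				col = 0
			default:
				out.WriteByte(' ')
				col++
			}
			// Hard spaces become real spaces only after the break decision.
			out.WriteString(strings.ReplaceAll(word, hardSpace, " "))
			col += w
		}
	}
	return out.String()
}

func main() {
	text := "It's not like Zoe\u0308 is going to the cafe\u0301 for some " +
		"\x1b[33mcre\u0300me bru\u0302le\u0301e\x1b[0m with" + hardSpace +
		"her doppelga\u0308nger!"
	fmt.Println(fold(text, 20))
}
```

With a width of 20 the sample folds into five lines: "It's not like Zoë is", "going to the café", "for some crème", "brûlée with her" and "doppelgänger!". The colour codes take no columns, the decomposed accents do not inflate the word widths, and "with her" stays together because of the hard space. Whether an over-long word should be split instead of given its own line is a separate design choice.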
So I set out to write a better Fold, and took over a week doing it… The new implementation takes about 62% less time per operation, which makes it roughly 2.6 to 2.7 times as fast (about 165% faster) ;)

Here are some benchstat results for folding 4.7k of ASCII text and 4.8k of Unicode text at different widths. Most messages are a lot shorter, usually less than 512 bytes:

  name                          old time/op    new time/op    delta
  FoldLipsumASCII/Width_20-4    117µs ± 6%     44µs ± 2%      -62.79%  (p=0 n=10+10)
  FoldLipsumASCII/Width_40-4    115µs ± 4%     43µs ± 2%      -62.25%  (p=0 n=10+10)
  FoldLipsumASCII/Width_80-4    116µs ± 4%     44µs ± 2%      -62.31%  (p=0 n=10+10)
  FoldLipsumASCII/Width_100-4   116µs ± 4%     43µs ± 1%      -62.64%  (p=0 n=10+10)
  FoldLipsumASCII/Width_120-4   116µs ± 3%     43µs ± 3%      -62.54%  (p=0 n=9+10)
  FoldLipsumASCII/Width_140-4   115µs ± 6%     44µs ± 2%      -61.96%  (p=0 n=10+9)
  FoldLipsumASCII/Width_160-4   116µs ± 3%     43µs ± 2%      -62.61%  (p=0 n=10+10)
  FoldLipsumUTF8/Width_20-4     122µs ± 4%     46µs ± 2%      -62.67%  (p=0 n=9+10)
  FoldLipsumUTF8/Width_40-4     121µs ± 4%     46µs ± 1%      -62.22%  (p=0 n=10+10)
  FoldLipsumUTF8/Width_80-4     123µs ± 2%     46µs ± 2%      -62.60%  (p=0 n=8+10)
  FoldLipsumUTF8/Width_100-4    123µs ± 3%     46µs ± 2%      -62.60%  (p=0 n=9+10)
  FoldLipsumUTF8/Width_120-4    124µs ± 1%     46µs ± 2%      -62.80%  (p=0 n=8+9)
  FoldLipsumUTF8/Width_140-4    125µs ± 2%     46µs ± 3%      -63.08%  (p=0 n=8+10)
  FoldLipsumUTF8/Width_160-4    122µs ± 4%     46µs ± 3%      -62.24%  (p=0 n=10+10)

The new Fold function even passes all of the current Fold tests. With a server running 64,000 bots, folding text adds about 5-10% CPU overhead. I may have taken a few liberties with my Unicode codepoints, UTF-8, ASCII and ANSI escape sequence handling, but it all seems to be hanging together.

The changes are out on the public experiment branch. I’ve reverted the previous changes for the ghastly line endings hack for Windows players; the new Fold method converts ‘\n’ to ‘\r\n’ just like the previous Fold method.
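benchstat's delta column is the change in time per operation, not in speed, so a "percent faster" figure has to be derived from it. A minimal sketch of that conversion, assuming delta = (new - old) / old × 100; speedup and fromDelta are hypothetical helpers, and the deltas are copied from the benchstat table above.

```go
package main

import "fmt"

// speedup returns how many times as fast the new code is, given the old and
// new time per operation (any unit, as long as both use the same one).
func speedup(oldTime, newTime float64) float64 {
	return oldTime / newTime
}

// fromDelta converts a benchstat time/op delta in percent (e.g. -62.31) into
// a speedup factor, using newTime = oldTime * (1 + delta/100).
func fromDelta(deltaPercent float64) float64 {
	return speedup(1, 1+deltaPercent/100)
}

func main() {
	rows := []struct {
		name  string
		delta float64 // benchstat delta column, in percent
	}{
		{"FoldLipsumASCII/Width_80", -62.31},
		{"FoldLipsumASCII/Width_140", -61.96}, // smallest improvement in the table
		{"FoldLipsumUTF8/Width_140", -63.08},  // largest improvement in the table
	}
	for _, row := range rows {
		f := fromDelta(row.delta)
		fmt.Printf("%-26s %.2fx as fast (%.0f%% faster)\n", row.name, f, (f-1)*100)
	}
}
```

A 62% reduction in time means the new code does the same work in about 0.38 of the time, so it is about 1/0.38 ≈ 2.6 times as fast. Across the table this ranges from roughly 2.63x to 2.71x.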
Now to add some colour and a little text formatting…

--
Diddymus

[1] U+2420 is the “symbol for space”, which may not render on this page as it is not in the Go Mono font this site uses, although your browser may substitute another font. It should render as a superscript ‘S’ over a subscript ‘P’. It’s like ℅ but with no slash, and with C&O replaced by S&P :P