Up to Main Index                             Up to Journal for April, 2019

                     JOURNAL FOR THURSDAY 4TH APRIL, 2019
______________________________________________________________________________

SUBJECT: 16 years that are gone forever and I’ll never have again
   DATE: Thu  4 Apr 21:55:08 BST 2019

Title inspired by Guns N’ Roses “14 Years” ;)

Monday morning I found an email in my spam folder. It was sent late Sunday
night, and contained a scan of a letter. My employment was terminated and I
wasn’t to finish serving my notice period. An impersonal email, no call, no
thank you for your service, no heads-up, nothing. So much for 16 years’ service
and loyally seeing it through till the very bitter end instead of bailing. I
am now unemployed and find myself in a strange limbo between jobs.

But enough of that!

While rewriting my static site generator I’ve been slowly fixing up entries to
this journal, and the website in general, as I spot things. It was during one
round of such changes that I noticed something odd. Some of the glyphs for
older journal entries appeared different. They were being rendered using a
different font as they didn’t appear in Roboto Mono. Odd.

Way back in 2016 I switched the site to use the then-new Go Mono font. Then on
a whim I changed to Roboto Mono back in April/May 2018. I say a whim because
I’m not sure why I made the change. I suspect it was a performance thing.

Anyway, Roboto has a lot fewer glyphs in it than Go. I’m still not 100% happy
with either font. However, not finding a suitable replacement, I’ve switched
back to the Go font for now. The question is: are all of the glyphs I’ve used
now covered? How would I find out?

First step is to get a list of all the glyphs in each of the two fonts. Easy
enough:


  $ fc-match --format='%{file}\n%{charset}\n' 'Go Mono'
  /usr/share/fonts/fonts-go/Go-Mono.ttf
  20-7e a0-17f 192 1fa-1ff 218-21b 2c6-2c7 2c9 2d8-2dd 384-38a 38c 38e-3a1
  3a3-3ce 400-45f 490-491 1e80-1e85 1ef2-1ef3 2013-2015 2017-201e 2020-2022
  2026 2030 2032-2033 2039-203a 203c 203e 2044 207f 20a3-20a4 20a7 20ac 2105
  2113 2116 2122 2126 212e 215b-215e 2190-2195 21a8 2202 2206 220f 2211-2212
  2215 2219-221a 221e-221f 2229 222b 2248 2260-2261 2264-2265 2302 2310
  2320-2321 2500 2502 250c 2510 2514 2518 251c 2524 252c 2534 253c 2550-256c
  2580 2584 2588 258c 2590-2593 25a0-25a1 25aa-25ac 25b2 25ba 25bc 25c4
  25ca-25cb 25cf 25d8-25d9 25e6 263a-263c 2640 2642 2660 2663 2665-2666
  266a-266b f800 fb01-fb02 fffd

  $ fc-match --format='%{file}\n%{charset}\n' 'Roboto Mono'
  /home/rolfea/.fonts/RobotoMono-Regular.ttf
  20-7e a0-17f 192 1a0-1a1 1af-1b0 1f0 1fa-1ff 218-21b 237 259 2bc 2c6-2c7 2c9
  2d8-2dd 2f3 300-301 303 309 30f 323 384-38a 38c 38e-3a1 3a3-3ce 3d1-3d2 3d6
  400-486 488-513 1e00-1e01 1e3e-1e3f 1e80-1e85 1ea0-1ef9 1f4d 2000-200b
  2013-2015 2017-201e 2020-2022 2025-2026 2030 2032-2033 2039-203a 203c 2044
  2074 207f 20a3-20a4 20a7 20ab-20ac 2105 2113 2116 2122 2126 212e 215b-215e
  2202 2206 220f 2211-2212 221a 221e 222b 2248 2260 2264-2265 25ca f6c3 feff
  fffc-fffd


I then mixed in a little ed-fu. I really like regular expressions and because
of that I’m probably one of the few people who actually likes the ed editor[1].
I ended up with glyphs.ed as:


  # Delete file name from 1st line
  1d
  # Split ranges onto separate lines
  ,s/\s/\
  /g
  # Turn single value X into a range X-X
  ,s/^\([0-9a-f]\+\)$/&-&/
  # add leading ‘  {0x’
  ,s/^/  {0x/
  # Change the ‘-’ in the range to ‘, 0x’
  ,s/-/, 0x/
  # Add trailing ‘},’
  ,s/$/},/
  # Insert ‘{’ and the ASCII range at start of file
  1i
  {
    {0x00, 0x1f}, // Added ASCII range
  .
  # Add ‘}’ at the end of the file
  $a
  }
  .
  # Write out to a temporary file
  w temp.txt
  # Quit ed
  q


Scripting ed with this file produced something I could use in a struct with
start and end ranges for a font:


  $ cat glyphs.ed | ed <(fc-match --format='%{file}\n%{charset}\n' 'Go Mono')
  $ cat temp.txt
  {
    {0x00, 0x1f}, // Added ASCII range
    {0x20, 0x7e},
    {0xa0, 0x17f},
    {0x192, 0x192},
    //
    // Table truncated for brevity
    //
    {0xf800, 0xf800},
    {0xfb01, 0xfb02},
    {0xfffd, 0xfffd},
  }
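
For comparison, the same transformation can be sketched directly in Go. This
is only an illustration and not what was actually used: the names glyphRange
and parseCharset are made up here, and the input is assumed to be the
whitespace separated hex ranges that fc-match prints for %{charset}.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// glyphRange is a hypothetical name for one inclusive range of code points.
type glyphRange struct {
	from, to rune
}

// parseCharset turns charset text such as "20-7e a0-17f 192" into inclusive
// ranges. As in the ed script, a single value X becomes the range X-X.
func parseCharset(charset string) ([]glyphRange, error) {
	var ranges []glyphRange
	for _, field := range strings.Fields(charset) {
		lo, hi, found := strings.Cut(field, "-")
		if !found {
			hi = lo // single value: start and end are the same
		}
		from, err := strconv.ParseUint(lo, 16, 32)
		if err != nil {
			return nil, fmt.Errorf("bad range %q: %w", field, err)
		}
		to, err := strconv.ParseUint(hi, 16, 32)
		if err != nil {
			return nil, fmt.Errorf("bad range %q: %w", field, err)
		}
		ranges = append(ranges, glyphRange{rune(from), rune(to)})
	}
	return ranges, nil
}

func main() {
	// Only the first three entries of the Go Mono charset, for brevity.
	ranges, err := parseCharset("20-7e a0-17f 192")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Print the ranges in the same shape as the table in temp.txt.
	for _, r := range ranges {
		fmt.Printf("  {0x%x, 0x%x},\n", r.from, r.to)
	}
}
```

Unlike the ed script, this sketch does not add the extra {0x00, 0x1f} range;
that would still need adding by hand.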


I then wrote some quick and nasty Go code that simply loads a file, converts
the data to a []rune and then checks if each rune is in the generated table:


  package main

  import (
    "bytes"
    "fmt"
    "io/ioutil"
    "os"
    "strconv"
  )

  var glyphs = []struct {
    from, to rune
  }{
    {0x00, 0x1f}, // Added ASCII range
    {0x20, 0x7e},
    {0xa0, 0x17f},
    {0x192, 0x192},
    //
    // Table truncated for brevity
    //
    {0xf800, 0xf800},
    {0xfb01, 0xfb02},
    {0xfffd, 0xfffd},
  }

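  func main() {
    // Need a file name to check
    if len(os.Args) < 2 {
      fmt.Fprintln(os.Stderr, "usage: glyphs FILE")
      os.Exit(1)
    }
    data, err := ioutil.ReadFile(os.Args[1])
    if err != nil {
      fmt.Fprintln(os.Stderr, err)
      os.Exit(1)
    }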
    notFound := make(map[rune]struct{})
  nextr:
    for _, r := range bytes.Runes(data) {
      for _, p := range glyphs {
        if r >= p.from && r <= p.to {
          continue nextr
        }
      }
      notFound[r] = struct{}{}
    }
    if len(notFound) > 0 {
      fmt.Printf("File: %s\nGlyphs not in font:\n", os.Args[1])
      for r := range notFound {
        fmt.Printf("  %-8U  %12v  %-4q\n", r, utf8(r), r)
      }
    }
  }

  func utf8(r rune) string {
    b := []byte(string(r))
    s := bytes.Repeat([]byte(" .."), 4-len(b))
    for _, b := range b {
      s = append(s, ' ')
      if b < 16 {
        s = append(s, '0')
      }
      s = append(s, strconv.FormatInt(int64(b), 16)...)
    }
    return string(s)
  }


Like I said, quick and nasty hack :) Running it produces:


  $ find ./public -name "*.txt" | xargs -n1 ./glyphs
  File: ./public/journal/2016/10/16.txt
  Glyphs not in font:
    U+82B3     .. e8 8a b3  '芳'
    U+0301     .. .. cc 81  '́'
  File: ./public/journal/2016/11/11.txt
  Glyphs not in font:
    U+2420     .. e2 90 a0  '␠'
  File: ./public/journal/2012/10/4.txt
  Glyphs not in font:
    U+611B     .. e6 84 9b  '愛'
  File: ./public/journal/2012/11/26.txt
  Glyphs not in font:
    U+0336     .. .. cc b6  '̶'
  File: ./public/journal/2013/4/11.txt
  Glyphs not in font:
    U+23CE     .. e2 8f 8e  '⏎'


I could then edit the files as necessary and remove or replace any offending
glyphs.
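
One thing the hack above does not do is report the glyphs in a stable order,
as Go randomises map iteration. A possible refinement is sketched below, under
some assumptions: the table is sorted and its ranges do not overlap (true of
the fc-match output, which is in ascending order), so a binary search can
replace the linear scan. The names glyphRange, inFont and missing are made up
for this sketch and the table is cut down to its first few entries.

```go
package main

import (
	"fmt"
	"sort"
)

// glyphRange is a hypothetical name for one inclusive range of code points.
type glyphRange struct{ from, to rune }

// glyphs is a heavily truncated table; it must stay sorted by range.
var glyphs = []glyphRange{
	{0x00, 0x1f},
	{0x20, 0x7e},
	{0xa0, 0x17f},
	{0x192, 0x192},
}

// inFont reports whether r falls inside any range in the table. It finds the
// first range ending at or after r, then checks that range also starts at or
// before r.
func inFont(r rune) bool {
	i := sort.Search(len(glyphs), func(i int) bool { return glyphs[i].to >= r })
	return i < len(glyphs) && glyphs[i].from <= r
}

// missing returns each rune of text that is not in the table, once only and
// in ascending code point order.
func missing(text string) []rune {
	seen := make(map[rune]struct{})
	var out []rune
	for _, r := range text {
		if inFont(r) {
			continue
		}
		if _, dup := seen[r]; dup {
			continue
		}
		seen[r] = struct{}{}
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// With the truncated table only U+23CE and U+611B are reported.
	for _, r := range missing("愛 café ⏎ 愛") {
		fmt.Printf("  %-8U  %q\n", r, r)
	}
}
```

For a table of this size the linear scan is perfectly adequate; the sorted
output is the more useful part, as repeated runs can then be compared.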

All in all, a nice quiet afternoon just hacking around :)

--
Diddymus

  [1] I've written about ed before: ../../2016/6/8.html

