
                    JOURNAL FOR THURSDAY 31ST AUGUST, 2023
______________________________________________________________________________

SUBJECT: What’s my line again?
   DATE: Thu 31 Aug 20:47:05 BST 2023

With Mere v0.0.7 out I’ve been casting my eye over the todo list and one item
in particular has been nagging me for ages: debugging and errors. Having a
language that blows up with an underlying stack trace is not good. It should
provide helpful error messages and say where the error occurred.

Which brings me nicely to today’s topic. Tracking line numbers is hard. Don’t
think so? Take this simple little program:


    1 trace true
    2 call test 3
    3 dump
    4
    5 // test function
    6 test: func limit
    7   for x = 1; x <= limit; x++
    8     if x == 1
    9       println "one"
   10     elif x == 2
   11       println "two"
   12     else
   13       println "many"
   14     fi
   15   next
   16 endfunc


To compile this into something that can be executed it needs to go through a
number of compile phases. Mere has some debugging flags that let me capture
the output from the different phases. This is after the ‘Rewrite2’ phase where
the code is still kind of readable, if you can read reverse polish notation:


    1  … true trace ;
    2  … test 3 call ;
    3  dump ;
    4  ;
    5  ;
    6  guard∙0 goto ;
    7  test: … ·test·limit func ;
    8  x 1 = ;
    9  loop0∙cond goto ;
   10  test·loop0∙start: x 1 == ! is ;
   11  if1∙0 goto ;
   12  … "one" println ;
   13  if∙1 goto ;
   14  test·if1∙0: x 2 == ! is ;
   15  if1∙1 goto ;
   16  … "two" println ;
   17  if∙1 goto ;
   18  test·if1∙1: ;
   19  … "many" println ;
   20  test·if1∙2: test·if∙1: ;
   21  x ++ ;
   22  test·loop0∙cond: x limit <= is ;
   23  loop0∙start goto ;
   24  … endfunc ;
   25  guard∙0: ;


We have now gone from 16 lines of code to 25. Which lines were the original
source code? Here is the problem: how do you keep track of the original source
code lines when the lines are rewritten and new lines added?

A ‘conventional’ compiler would generate an abstract syntax tree, or AST, and
there would be a field for such information. Mere is not conventional, it’s
very simple under the hood, and it cheats…

When Mere initially tokenizes the original source code it adds internal labels
for the original source lines. The same program as initially tokenized:


    1  ·L∙1: trace true ;
    2  ·L∙2: call test 3 ;
    3  ·L∙3: dump ;
    4  ·L∙4: ;
    5  ·L∙5: ;
    6  ·L∙6: test: func limit ;
    7  ·L∙7: for x = 1 ;
    8  x < = limit ;
    9  x + + ;
   10  ·L∙8: if x = = 1 ;
   11  ·L∙9: println "one" ;
   12  ·L∙10: elif x = = 2 ;
   13  ·L∙11: println "two" ;
   14  ·L∙12: else ;
   15  ·L∙13: println "many" ;
   16  ·L∙14: fi ;
   17  ·L∙15: next ;
   18  ·L∙16: endfunc ;


There are two extra lines, 8 and 9 in the above, where the for-next loop has
been split into its three parts. Mere is happy to cope with multiple labels
for the same line, as in line 6 where we have “·L∙6:” and “test:”. The nice
thing about tagging the lines using labels is they move around with the code.
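
As a rough illustration of why labels travel with the code, here is a small
sketch in Python rather than Mere. It is not Mere’s implementation: the Stmt
class and the tag_lines and source_line helpers are hypothetical names made up
for this example, and the tokenizing is a naive split on whitespace. Only the
‘·L∙N’ label format is taken from the listings above.

```python
from dataclasses import dataclass, field

LINE_PREFIX = "·L∙"  # prefix of the internal line labels shown above


@dataclass
class Stmt:
    """One statement: the labels attached to it plus its tokens (hypothetical)."""
    labels: list = field(default_factory=list)
    tokens: list = field(default_factory=list)


def tag_lines(source):
    """Split naively on whitespace and label each statement with its line."""
    return [
        Stmt(labels=[f"{LINE_PREFIX}{n}"], tokens=text.split())
        for n, text in enumerate(source.splitlines(), start=1)
    ]


def source_line(stmt):
    """Return the source line recorded in a statement's labels, or None."""
    for label in stmt.labels:
        if label.startswith(LINE_PREFIX):
            return int(label[len(LINE_PREFIX):])
    return None


program = tag_lines("trace true\ncall test 3\ndump")

# A toy rewrite: a later phase inserts a new statement that has no line label.
# The statements after it shift down, but their labels go with them.
program.insert(2, Stmt(tokens=["guard∙0", "goto"]))

for stmt in program:
    print(source_line(stmt), " ".join(stmt.tokens))
```

After the insert, ‘dump’ is the fourth statement but still reports source line
3, while the inserted goto reports no source line of its own.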

Here is the ‘Rewrite2’ phase again, this time with the additional labels:


    1  ·L∙1: … true trace ;
    2  ·L∙2: … test 3 call ;
    3  ·L∙3: dump ;
    4  ·L∙4: ;
    5  ·L∙5: ;
    6  guard∙0 goto ;
    7  ·L∙6: test: … ·test·limit func ;
    8  ·L∙7: x 1 = ;
    9  loop0∙cond goto ;
   10  test·loop0∙start: ·L∙8: x 1 == ! is ;
   11  if1∙0 goto ;
   12  ·L∙9: … "one" println ;
   13  ·L∙10: if∙1 goto ;
   14  test·if1∙0: x 2 == ! is ;
   15  if1∙1 goto ;
   16  ·L∙11: … "two" println ;
   17  ·L∙12: if∙1 goto ;
   18  test·if1∙1: ;
   19  ·L∙13: … "many" println ;
   20  ·L∙14: test·if1∙2: test·if∙1: ;
   21  ·L∙15: x ++ ;
   22  test·loop0∙cond: x limit <= is ;
   23  loop0∙start goto ;
   24  ·L∙16: … endfunc ;
   25  guard∙0: ;


The labels can now be used to identify the original source lines. Don’t worry,
your executable isn’t going to be littered with labels. There is an ‘Unlabel’
phase which cleans up and optimizes internal labels :) If we run the program,
the dump produced by the original line 3 in the code looks like this:


  ============================== State Dump ==============================
    SP: 9  inst: 67/0  stacks: 1 vids: 0/3 xref: 15
  -- Global Storage ------------------------------------------------------
    0x0007:L  test       13
  -- Stack Storage -------------------------------------------------------
      0:    9
  -- Code ----------------------------------------------------------------
       1       0 | O[77:…] b[true] F[21:trace] O[78:;]
       2       4 | O[77:…] V[0x0007:L "test"] i[3] F[22:call] O[78:;]
       3       9 | O[3:dump] O[78:;]
       4      11 | L[77] O[5:goto] O[78:;]
     0x0007 > test
       6      14 | O[77:…] V[0x001b:? "test·limit"] F[24:func] O[78:;]
       7      18 | V[0x001c:? "test·x"] i[1] O[58:=] O[78:;]
       7      22 | L[65] O[5:goto] O[78:;]
       8      25 | V[0x001c:? "test·x"] i[1] O[50:==] O[28:!] L[34] O[57:is]
                   O[78:;]
       8      32 | L[41] O[5:goto] O[78:;]
       9      35 | O[77:…] s["one"] F[17:println] O[78:;]
      10      39 | L[62] O[5:goto] O[78:;]
      10      42 | V[0x001c:? "test·x"] i[2] O[50:==] O[28:!] L[51] O[57:is]
                   O[78:;]
      10      49 | L[58] O[5:goto] O[78:;]
      11      52 | O[77:…] s["two"] F[17:println] O[78:;]
      12      56 | L[62] O[5:goto] O[78:;]
      13      59 | O[77:…] s["many"] F[17:println] O[78:;]
      14      63 | V[0x001c:? "test·x"] O[29:++] O[78:;]
      14      66 | V[0x001c:? "test·x"] V[0x001b:? "test·limit"] O[47:<=]
                   L[74] O[57:is] O[78:;]
      14      72 | L[24] O[5:goto] O[78:;]
      16      75 | O[77:…] F[25:endfunc] O[78:;]
      18      78 |

  =============================== End Dump ===============================


The first column of numbers is the original source line number and the second
column is Mere’s instruction number. That’s not all. Line numbers are now
provided in traces, as produced by the original line 1 in the code:


       1    3          ; |   b[false]
       2    4          … |
       2    7       call |   O[77:…] L[0x0007 test] i[3]
       6   14          … |   i[3]
       6   16       func |   i[3] O[77:…] V[0x001b:? test·limit]
       6   17          ; |
       7   20         =i |   V[0x001c:? test·x] i[1]
       7   21          ; |   i[1]
       7   23      gotoL |   L[65]
      14   68       <=ii |   V[0x001c:i test·x] V[0x001b:i test·limit]
      14   70       isLb |   b[true] L[74]
      14   71          ; |
      14   73      gotoL |   L[24]
       8   27       ==ii |   V[0x001c:i test·x] i[1]
       8   28         !b |   b[true]
       8   30       isLb |   b[false] L[34]
       9   35          … |
       9   37    println |   O[77:…] s["one"]
  one
       9   38          ; |
      10   40      gotoL |   L[62]
      14   64        ++i |   V[0x001c:i test·x]
      14   65          ; |   i[2]
      14   68       <=ii |   V[0x001c:i test·x] V[0x001b:i test·limit]
      14   70       isLb |   b[true] L[74]
      14   71          ; |
      14   73      gotoL |   L[24]
       8   27       ==ii |   V[0x001c:i test·x] i[1]
       8   28         !b |   b[false]
       8   30       isLb |   b[true] L[34]
       8   31          ; |
       8   33      gotoL |   L[41]
      10   44       ==ii |   V[0x001c:i test·x] i[2]
      10   45         !b |   b[true]
      10   47       isLb |   b[false] L[51]
      11   52          … |
      11   54    println |   O[77:…] s["two"]
  two
      11   55          ; |
      12   57      gotoL |   L[62]
      14   64        ++i |   V[0x001c:i test·x]
      14   65          ; |   i[3]
      14   68       <=ii |   V[0x001c:i test·x] V[0x001b:i test·limit]
      14   70       isLb |   b[true] L[74]
      14   71          ; |
      14   73      gotoL |   L[24]
       8   27       ==ii |   V[0x001c:i test·x] i[1]
       8   28         !b |   b[false]
       8   30       isLb |   b[true] L[34]
       8   31          ; |
       8   33      gotoL |   L[41]
      10   44       ==ii |   V[0x001c:i test·x] i[2]
      10   45         !b |   b[false]
      10   47       isLb |   b[true] L[51]
      10   48          ; |
      10   50      gotoL |   L[58]
      13   59          … |
      13   61    println |   O[77:…] s["many"]
  many
      13   62          ; |
      14   64        ++i |   V[0x001c:i test·x]
      14   65          ; |   i[4]
      14   68       <=ii |   V[0x001c:i test·x] V[0x001b:i test·limit]
      14   70       isLb |   b[false] L[74]
      16   75          … |
      16   76    endfunc |   O[77:…]
       2    8          ; |
       3    9       dump |
       3   10          ; |
       4   12      gotoL |   L[77]


As in the dump, the first column of numbers is the original source line number
and the second column is Mere’s instruction number.

Now that we can match Mere’s instruction number to an original source line, we
have the information needed to write some nice error handling and error
messages.
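
One possible way to do that matching, sketched in Python rather than Mere. It
assumes a table of (first instruction number, source line) pairs is recorded
when the line labels are removed. The entry does not say how Mere stores this,
so the table layout, the line_for and error_message names and the message
format are all made up for illustration. The numbers are copied from the
first two columns of the dump above.

```python
import bisect

# (first instruction number, source line) pairs, taken from the dump above.
# Only the first instruction for each source line is listed; instructions in
# between belong to the closest entry at or before them.
LINE_TABLE = [
    (0, 1), (4, 2), (9, 3), (11, 4), (14, 6), (18, 7), (25, 8), (35, 9),
    (39, 10), (52, 11), (56, 12), (59, 13), (63, 14), (75, 16), (78, 18),
]
STARTS = [start for start, _ in LINE_TABLE]


def line_for(inst):
    """Map an instruction number to the source line recorded for it."""
    if inst < 0:
        raise ValueError("instruction number must not be negative")
    # Find the last table entry whose first instruction is <= inst.
    index = bisect.bisect_right(STARTS, inst) - 1
    return LINE_TABLE[index][1]


def error_message(inst, text):
    """Format an error with its source line (hypothetical message format)."""
    return f"line {line_for(inst)}: {text}"


# Instruction 37 is the println of "one" in the trace above.
print(error_message(37, "something went wrong"))
```

The lookup agrees with the trace: instruction 37 maps to line 9, instruction
68 to line 14 and instruction 7 to line 2.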

I’ve also been working on updating Mere ICE to show line numbers next to the
code in the editor.

And yes, when you’ve been looking at all of this for hours on end it does
start to look like[1] something out of “The Matrix” :)


  é π             Ÿ       ѝ   ž   ј   Ŕ       ж           ω     Ï         ī
  Ū Œ       ï     Ǻ ţ     ι   η   Ş Œ Ä       Ѕ         Ĥ Ό    Ŧ     ŀ ř ё
    Ї       Ћ     Щ ъ     Õ       Ī Ţ љ æ     Ч         Ш Υ ю   π   Ћ ş Đ З
    т   ч   ī     Џ И   ς Ц   Á   є Ѓ Ѕ Ċ   ĸ ĸ   ẅ       Њ å   ń û Û ø ż Ѐ É
    Ò   Ģ   ǿ     ω Ч   Ô Р   Ẅ ý   ί њ β   Ŷ į   Ú       ē Ĵ   ŷ Ï ћ ŵ Ẁ É д
    Ι   У   ц ł   Ò Ï   љ Ā   Ŏ ij   Σ Б Ň   П     Ť   І   Ο á   ş ŀ Ŧ ς   Ŵ â
    σ   Α     š     Щ   Π Ö   Ă     Ų Ś     ċ     ά   Д   м   Ω   ї Ğ đ   Ė Ī
    Η   к     ј     χ   ς Ņ   Ỳ     Ж ѐ           Ŵ   К   ÿ   Љ ќ   ω û   м α
    η   ď           Џ   ĕ й ā Œ     а ā     Œ     ô   Č   Ó   ŧ Ϋ   ķ     Ŗ Ļ
    ı   ẅ           ѝ   Ε Ņ Ђ ђ   ú ϊ с   Ò ν     ī   Ы         Ĕ   ë     ъ Ц
        ј           È   Џ ј ř Ó   Ż ß ŭ   ς ij є       ũ   Ţ     Т   ũ     ī
                  х Į я Θ Υ Ѓ ñ   α ί     Þ φ С       ą   з     Ў   э   Ý
          č     Ж ō Ț ţ î ξ ł     Ů Н   ё ō š û       ш   Ѐ     й   Γ   Λ
    Ẃ     љ     ы ν Ź Ô č   Ț     È     ĵ   ф з   П   Ŗ   ϋ   п Ī   Ή   Ф   ē
    ч     Ч     ζ     İ Ÿ   ш           Ţ   Ŗ ĩ   й   О Ε ŏ   ѐ ŋ ʼn Ț   Ĩ   ў
    Ы     į     Ħ       ĺ   ü               ϊ ĥ   Α   ĸ Έ Õ   Щ   π Ġ   ѓ α Ш
  ī л     ý     Ö       ό   Ŭ   ſ ʼn         ỳ Ǻ   ŧ   ё ξ Ê   Ń   ŝ Ь   ò ť â
  з Ĥ     þ     Ù       Ε   ř   Ή σ         β Ъ   Ā   é ќ ε   æ   ē ё     È Ω
  ø ι     Í     Ħ     Ѕ ī   ș Σ ă Ε         Β К   Ş   φ Ð Ϊ   Ð   Ź Ǻ     ť Ú
    ï     υ ή   û     Ŷ   Ď Β     Х   ĥ     Ŋ õ ç Ħ   IJ Ÿ     š   Х ĵ   Ť   ō
    ů     ß α   Ǻ     Н   Η ň     Ā   ģ     Ω ζ к Ί   Н ú     í   Щ ø   Ђ Ļ ѐ
    Њ     п δ   Ǻ   з ĩ   ж       ċ   ħ Э     Ђ Ç ў   Ώ ή   χ ß     ύ É Ğ φ
    Ċ     Ǽ     ĩ   я П   й     ş Ŕ Ѓ   О     ώ Ý Î   г     ǽ ψ   Ќ Ǻ ή Ύ Η
                Ý   Є           Ų   ů   γ       Ş     Ú     Φ     Т   ά П

--
Diddymus

  [1] If you want your own “Matrix” generator, here’s my quick five-minute hack:

    chars  = "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
    chars += "ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľ"
    chars += "ĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽ"
    chars += "žſƒǺǻǼǽǾǿȘșȚțΆΈΉΊΌΎΏΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλ"
    chars += "μνξοπρςστυφχψωϊϋόύώЀЁЂЃЄЅІЇЈЉЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫ"
    chars += "ЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџҐґẀẁẂẃẄẅỲỳ"
    lc = len chars
    line = ""
    for x = 0; x < 39; x++
      if rnd 100 > 80
        line += chars[rnd lc -1] + " "
      else
        line += "  "
      fi
    next
    for x = 0; x < 24; x++
      range k; v; line
        is k % 2 == 1; continue
        if v != " "
          if rnd 100 > 10
            line[k] = chars[rnd lc -1]
          else
            line[k] = " "
          fi
        elif rnd 100 > 85
          line[k] = chars[rnd lc -1]
        fi
      next
      println line
    next


