                    JOURNAL FOR SATURDAY 26TH AUGUST, 2023
______________________________________________________________________________

SUBJECT: Too magical for some?
   DATE: Sat 26 Aug 22:16:41 BST 2023

After the last journal entry there were some comments on the “regexp string”
feature. I originally said:

  A “regexp string” is only found after the regexp keyword, the string must
  be quoted with backticks, the string is only interpreted once — at compile
  time.

People felt it was a little bit too magical, in the sense that a normal string
used in a particular way had special meaning and could catch people out.

I was also told I was trying to be too clever — what if regexp was followed by
a concatenation such as `string` + `string`? Was it still a special case?

The “regexp string” example I gave was:


    re = regexp `
      ^            # anchor at beginning
      0x           # hexadecimal prefix
      [a-fA-F0-9]  # a hexadecimal digit
      +            # one or more times
      $            # anchor at end
    `
    println literal re


When the code ran it would produce:


    regexp(`^0x[a-fA-F0-9]+$`)


What to do? I still wanted the feature. I’ve had a play with various ideas
trying to come up with something that was simple, explicit and non-magical.

My solution is to add another regular-expression-specific operator, ~x, for
extended strings. The ~x operator takes a string, removes embedded comments
and white-space, and returns the resulting string.

Let me demonstrate using the above example:


    re = regexp ~x `
      ^            # anchor at beginning
      0x           # hexadecimal prefix
      [a-fA-F0-9]  # a hexadecimal digit
      +            # one or more times
      $            # anchor at end
    `
    println literal re


The ~x operator works on a string and returns a string. Any string operations
work as expected — no magic.
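
For anyone who wants to see the idea outside of Mere, here is a rough Python
sketch of a string-in, string-out helper in the spirit of ~x. It is not Mere's
implementation: the name strip_extended is made up for illustration, and I am
assuming a comment runs from # to the end of its line and that all white-space
is dropped. It ignores edge cases such as an escaped # or a # inside a
character class.

```python
import re


def strip_extended(pattern: str) -> str:
    """Hypothetical stand-in for ~x: drop comments, then drop white-space."""
    # Assumption: a comment runs from '#' to the end of its line.
    without_comments = re.sub(r"#[^\n]*", "", pattern)
    # Assumption: all remaining white-space, including newlines, is removed.
    return re.sub(r"\s+", "", without_comments)


extended = """
  ^            # anchor at beginning
  0x           # hexadecimal prefix
  [a-fA-F0-9]  # a hexadecimal digit
  +            # one or more times
  $            # anchor at end
"""

compact = strip_extended(extended)
print(compact)  # ^0x[a-fA-F0-9]+$
```

Because the helper only ever sees and returns a plain string, the result can be
concatenated, stored or compiled like any other string.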

As regexp and strings are closely related, I did have more magic that let you
use them interchangeably. For example, the comparison operators let you compare
regexp and strings. For now I’ve dropped those changes as there are no other
instances in the language where types behave like that. If you want to compare
a regexp and a string you can use a string conversion:


    re = regexp `[0-9]`
    println string re == "[0-9]"


I guess in the long run this makes sense: a regexp should never be equal to a
string, as they are different types.
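
Other languages draw the same line. As a comparison only, and not a statement
about how Mere works internally, Python's re module treats a compiled pattern
and its source text as different types, so you compare them by converting the
pattern back to a string first:

```python
import re

pattern = re.compile(r"[0-9]")

# A compiled pattern and a string are different types, so they never compare equal.
mixed = pattern == "[0-9]"
print(mixed)    # False

# Converting back to the source text gives a string-to-string comparison.
as_text = pattern.pattern == "[0-9]"
print(as_text)  # True
```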

After all the tweaking and poking, can you still do crazy shit?


    > cat crazy.mr
    text = `"abc", "def,ghi", "jkl"`
    range ; v; text ~s (~x `
        ^"|"$  # starts/ends with literal "
      `) "" ~c ~x `
        "      # literal "
        \s*    # optional white-space
        ,      # literal ,
        \s*    # optional white-space
        "      # literal "
      `
      println v
    next
    > mere crazy.mr
    abc
    def,ghi
    jkl
    >


Guess that depends on your definition of crazy shit…
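
For comparison, here is roughly the same job sketched in Python. I am assuming,
going by the output above, that ~s substitutes matches with the given string
and that ~c splits the text on matches. Python has no ~x string operator, so
the sketch uses the re.VERBOSE flag instead, which ignores white-space and
comments when the pattern is compiled.

```python
import re

text = '"abc", "def,ghi", "jkl"'

# Assumed equivalent of: text ~s (~x `...`) ""
edge_quotes = re.compile(r'''
    ^"|"$  # starts/ends with literal "
''', re.VERBOSE)

# Assumed equivalent of: ... ~c ~x `...`
separator = re.compile(r'''
    "      # literal "
    \s*    # optional white-space
    ,      # literal ,
    \s*    # optional white-space
    "      # literal "
''', re.VERBOSE)

trimmed = edge_quotes.sub("", text)  # drop the outermost quotes
fields = separator.split(trimmed)    # split on quote, comma, quote

for v in fields:
    print(v)
```

The comma inside "def,ghi" survives because the separator pattern only matches
a comma that has a quote on both sides.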

--
Diddymus