                    JOURNAL FOR SUNDAY 20TH AUGUST, 2023
______________________________________________________________________________

SUBJECT: Love it or hate it…
   DATE: Sun 20 Aug 20:46:00 BST 2023

Since the release of Mere v0.0.6 I’ve been busy tidying up some of the Mere
code. I’ve also been working on a new feature which people are either going to
love or hate…

I’ve used quite a number of programming languages on many different platforms
over the years. One of the features I have always loved is Perl’s built-in use
of regular expressions. I knew I wanted something similar for Mere. It’s a
feature I’ve alluded to before and why I reserved the tilde ‘~’ character ages
ago.

As of this afternoon I have a working implementation:

  >cat regexp.mr
  /*
    A mere RegExp example…

    ~~ is a regexp match (like compare '==')
    ~  is a regexp replacement
    ~= is a regexp replace and assign (like add and assign '+=')
  */

  text = "The quick brown foxy..."
  println "Text is: " text

  print "Text contains f..y?: "
  println text ~~ "f..y"

  print "Replace '...' with '…' : "
  println text ~ "\\.\\.\\." "…"

  print "Replace and assign 'foxy' with 'wolf': "
  println text ~= "foxy" "wolf"

  print "Swap 'quick' and 'brown': "
  println text ~= "(.*)(quick)(.*)(brown)(.*)" "$1$4$3$2$5"

  println "Final text: " text

  >mere regexp.mr
  Text is: The quick brown foxy...
  Text contains f..y?: true
  Replace '...' with '…' : The quick brown foxy…
  Replace and assign 'foxy' with 'wolf': The quick brown wolf...
  Swap 'quick' and 'brown': The brown quick wolf...
  Final text: The brown quick wolf...
  >

There is quite a lot going on here. First we have ‘~~’ which is a regular
expression match. Think of it like the comparison operator ‘==’, but for
regular expressions. For example:

  a ~~ "the"        // does 'a' contain the letters "the" anywhere
  a ~~ "\\bthe\\b"  // does 'a' contain the word "the"

Next we have replacement by regular expression ‘~’.
This returns, as a string, the result of replacing a regular expression match
with a string. For example:

  a ~ "\\bteh\\b" "the"  // returns the result of replacing all occurrences
                         // of the word "teh" in 'a' with "the", no
                         // assignment, just return the resulting string

Last of all we have ‘~=’ to perform a regular expression replacement and
assign the result back to a variable. For example:

  a ~= "\\bteh\\b" "the"  // replace the word "teh" with "the" in 'a' and
                          // assign resulting string back to 'a'

I’m also thinking I should have a way to identify regular expression strings.
In Perl you would use ‘/’ as delimiters[1]. The only spare character I have is
‘@’ which would make code look ugly. I might be able to dual-use ‘\’ instead?

  $foo =~ m/abc/  // Perl: does $foo contain "abc"?
  foo ~~ "abc"    // Mere currently same thing
  foo ~~ \abc\    // Possible change…

Why would I want to do that? If you know which strings are regular expressions
you can pre-compile them. This improves performance as they can be reused. For
example, assume I have a list of UK telephone numbers I want to (very badly)
validate. I could write something like:

  >cat validate.mr
  range ; v; []string(
    "+44(0)20 7946 1234",
    "+44 20 7946 1234",
    "020 7946 1234",
    "020-7946-1234",
    "02079461234",
    "tel: 02079461234",
    "02079461234, ext 2",
  )
    clean = v ~ "\\(0\\)|[ ()-]" ""
    printf "%5t - %s (%s)\n", clean ~~ "^(\\+44|0)\\d{10}$", v, clean
  next

  >mere validate.mr
   true - +44(0)20 7946 1234 (+442079461234)
   true - +44 20 7946 1234 (+442079461234)
   true - 020 7946 1234 (02079461234)
   true - 020-7946-1234 (02079461234)
   true - 02079461234 (02079461234)
  false - tel: 02079461234 (tel:02079461234)
  false - 02079461234, ext 2 (02079461234,ext2)
  >

However, there are two regular expressions in the loop: “\\(0\\)|[ ()-]” and
“^(\\+44|0)\\d{10}$”. The current implementation compiles each of them again
on every iteration.

Knowing a string is a regular expression also lets you handle the string
specially in other ways.
For example, to allow the regular expression to be annotated:

  validatePhone = \
    ^           // match start of string
    (\\+44|0)   // begins with +44 or 0
    \\d{10}     // followed by 10 digits
    $           // match end of string
  \

This is very important for regular expressions: complex ones tend to look a
lot like line noise :|

I still need to experiment and make sure the addition of regular expressions
makes sense and fits with the rest of the language. However, I’m quite excited
for this feature :)

Love it? Hate it? Let me know your thoughts: diddymus@wolfmud.org

--
Diddymus

  [1] Other delimiters are available, but ‘/’ is commonly used.
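
A note on the pre-compiling idea from the validate.mr example: the sketch
below shows the general technique in Python rather than Mere, purely as an
illustration. It assumes Python's re module treats these two patterns closely
enough to Mere's regular expressions, which may not hold in every detail. The
names STRIP, VALID, clean and is_valid are made up for the example. Both
patterns are compiled once, before the loop, and then reused for every number.

```python
import re

# Patterns copied from validate.mr. Python raw strings remove the need for
# the doubled backslashes. Each pattern is compiled once here, not per loop
# iteration. (Assumption: Python's 're' is only a stand-in for Mere's engine.)
STRIP = re.compile(r"\(0\)|[ ()-]")
VALID = re.compile(r"^(\+44|0)\d{10}$")


def clean(number: str) -> str:
    """Remove "(0)", spaces, brackets and hyphens from a number."""
    return STRIP.sub("", number)


def is_valid(number: str) -> bool:
    """Report whether the cleaned number is +44 or 0 followed by 10 digits."""
    return VALID.search(clean(number)) is not None


if __name__ == "__main__":
    numbers = [
        "+44(0)20 7946 1234",
        "+44 20 7946 1234",
        "020 7946 1234",
        "020-7946-1234",
        "02079461234",
        "tel: 02079461234",
        "02079461234, ext 2",
    ]
    for v in numbers:
        # Mimic the "%5t - %s (%s)" layout used by the Mere printf.
        print(f"{str(is_valid(v)).lower():>5} - {v} ({clean(v)})")
```

The loop body now only runs the already compiled patterns, which is the saving
that marking regular expression strings would let an implementation make
automatically.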