JOURNAL FOR WEDNESDAY 10TH APRIL, 2019
______________________________________________________________________________

SUBJECT: Idly pondering what if…
   DATE: Wed 10 Apr 15:47:45 BST 2019

Yesterday I saw a commit on the golang-checkins mailing list that made me pause and ponder for a while. It concerned the test used in an ‘if’ statement.

The ‘if’ statement is one that programmers use regularly when coding. It is used to control the conditional execution of code. A simple example is:

  if x > 0 {
    foo()
  }

In this example, if x is greater than 0 then we call foo. There are other forms, such as ‘if…else’ and ‘if…else if…else’. However, what I want to focus on is the test expression.

The commit that made me pause and ponder was titled:

  strings: use Go style character range comparison in ToUpper/ToLower

This was the original code:

  if c >= 'a' && c <= 'z' {
    c -= 'a' - 'A'
  }

It was changed to:

  if 'a' <= c && c <= 'z' {
    c -= 'a' - 'A'
  }

The first way of writing the ‘if’ statement is how I would normally write it. It’s how you would say it: if c is greater than or equal to 'a' and c is less than or equal to 'z', do something. This seems very natural and is easy to comprehend.

The second form needs some additional thinking before you can be sure it is correct. However, I can see the merit of having the “c” variable in the middle and the comparison values 'a' and 'z' on the outside: visually it’s saying that “c” is between 'a' and 'z'.

What would the ‘if’ statement be to test for “c” being outside the range 'a' to 'z'? Normally I would write:

  if c < 'a' || c > 'z' {
  }

However, to get the visual style we would write:

  if c < 'a' || 'z' < c {
  }

Here we have “c” on the outside and the comparison values in the middle. Visually we are saying that “c” is outside the range 'a' to 'z'. This one is harder to comprehend. What is “'z' < c” actually testing for, and what are the limits being tested here?
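To convince myself the two styles really are interchangeable, here is a minimal Go sketch. The helper names (isLowerGoStyle, isLowerNatural, outsideGoStyle, outsideNatural, toUpperASCII) are hypothetical and not taken from the Go source tree. It checks every byte value and confirms that both ways of writing each test agree, and that the “outside” test is simply the negation of the “inside” test (De Morgan’s law): !('a' <= c && c <= 'z') is c < 'a' || 'z' < c, so “'z' < c” is just “c > 'z'” written the other way round. toUpperASCII only illustrates the single-byte conversion from the commit; it is not the real strings.ToUpper, which also has to deal with non-ASCII input.

```go
package main

import "fmt"

// isLowerGoStyle uses the "lo <= c && c <= hi" form from the commit:
// c sits between the two bounds, reading like 'a' <= c <= 'z'.
func isLowerGoStyle(c byte) bool {
	return 'a' <= c && c <= 'z'
}

// isLowerNatural uses the "c >= lo && c <= hi" form of the original code.
func isLowerNatural(c byte) bool {
	return c >= 'a' && c <= 'z'
}

// outsideGoStyle uses the "c < lo || hi < c" form: c sits outside the bounds.
func outsideGoStyle(c byte) bool {
	return c < 'a' || 'z' < c
}

// outsideNatural uses the "c < lo || c > hi" form.
func outsideNatural(c byte) bool {
	return c < 'a' || c > 'z'
}

// toUpperASCII applies the conversion from the commit to a single byte.
// 'a' - 'A' is 32, so subtracting it maps 'a'..'z' onto 'A'..'Z'.
func toUpperASCII(c byte) byte {
	if 'a' <= c && c <= 'z' {
		c -= 'a' - 'A'
	}
	return c
}

func main() {
	// Check every byte value: both "inside" forms agree, both "outside"
	// forms agree, and "outside" is never equal to "inside".
	for i := 0; i < 256; i++ {
		c := byte(i)
		in := isLowerGoStyle(c)
		if in != isLowerNatural(c) ||
			outsideGoStyle(c) != outsideNatural(c) ||
			outsideGoStyle(c) == in {
			fmt.Printf("mismatch at %d\n", i)
			return
		}
	}
	fmt.Println("all 256 byte values agree")

	// Only lower case ASCII letters are changed.
	fmt.Printf("%c %c %c\n", toUpperASCII('q'), toUpperASCII('Q'), toUpperASCII('!'))
}
```

Running it should print “all 256 byte values agree” followed by “Q Q !”. So the choice between the styles is purely about readability; the compiled tests mean the same thing.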
I did some digging in the Go source tree and found a lot of tests using the form ‘lo <= x && x <= hi’ and only a few of the form ‘x < lo || hi < x’. I also searched around and found conventions for comments, indenting, line lengths and naming, but nothing on ‘if’ statements. So where did this style come from?

The only vague reference I found was on Wikipedia[1]:

  Early toolsmiths writing in C under Unix began developing idioms at a rapid rate to classify characters into different types. For example, in the ASCII character set, the following test identifies a letter:

    if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))

However, no citation was provided for this claim.

So, based on the above, I went looking for some old Unix source code. I downloaded the source for Unix 6th Edition and started looking for myself. I found both the ‘x >= lo && x <= hi’ and ‘lo <= x && x <= hi’ styles in the Unix source code. I also found the ‘x < lo || x > hi’ style, but not the ‘x < lo || hi < x’ style. In addition to ‘if’ statements, the styles were also used in ‘while’ loops and ‘switch’ statements.

So it looks like this style of writing an ‘if’ statement, when testing for a range, has a long history behind it. I’m still not sure if I like the ‘lo <= x && x <= hi’ and ‘x < lo || hi < x’ styles though :P

--
Diddymus

[1] Wikipedia, C character classification:
    https://en.wikipedia.org/wiki/C_character_classification