Reified Unreassignable Nothingness: SPACE RUNES


First, Let's Introduce RUNE!s #ᚠ #ᚡ #ᚢ #ᚣ #ᚤ

Runes are used for character literals. They are a fusion of historical Rebol's issue! and char! types.

The name has precedent:

  • Rust - "The rune type represents a user-perceived character. It roughly corresponds to a Unicode grapheme cluster but with some nice properties."

  • Go - "Go 1 introduces a new basic type, rune, to represent individual Unicode code points. It is an alias for int32, analogous to byte as an alias for uint8."

MOST (but crucially, not ALL) Ren-C runes start with # in their LOAD-able representation:

>> second "abc"
== #a

>> type of #a
== ~{rune!}~  ; antiform (datatype)

>> append "abc" #a
== "abca"

The HASH (or OCTOTHORPE/POUND) rune is a special case: it represents itself, rather than being written as ##.

>> second "a#b"
== #

>> to text! #
== "#"

While runes like #a don't need to be wrapped in quotes, others need to be:

>> third "[x]"
== #"]"

>> third [a b #]
== #  ; as opposed to #]

Note that if we allowed things like #] to be the rune for a bracket, that would imply:

>> third [a b #]]  ; this would be pretty confusing!
== #]

... and it would seriously limit the ability to allow things like #(...) or #"..." or #[...] as special forms. Discussion here.

As with historical Rebol issue!, the RUNE! type can hold more than one codepoint:

>> type of #abcd
== ~{rune!}~  ; antiform (datatype)

>> length of #abcd
== 4

When a function like PRINT is performing delimiting (e.g. injecting spacing automatically between tokens), runes aren't considered to need delimiting. So:

>> print ["abc" "def" #ghij "klm no"]
abc defghijklm no

That might seem a little arbitrary for multi-codepoint runes. But it's of high importance with single codepoint runes like newline, because you wouldn't want newline to introduce a space on the previous and next lines:

>> print ["line" "one" newline "line" "two"]
line one  ; <-- you wouldn't want a space at the end of this line
line two  ; <-- you wouldn't want a space at the beginning of this line

(Historical Redbol has bugs pertaining to this which you might not notice in printed output. There are stray spaces at the ends of lines that should not be there, that you only notice when writing files and examining the bytes.)
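
The delimiting rule described above can be modeled outside of Ren-C. The following is a minimal Python sketch, not Ren-C's actual implementation: the `Rune` class and the `delimit` function are hypothetical names invented for illustration, and the only rule encoded is "a delimiter goes between two neighbors only when neither of them is a rune".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rune:
    """Hypothetical stand-in for a RUNE! value; `text` holds its codepoints."""
    text: str


NEWLINE = Rune("\n")


def delimit(items, delimiter=" "):
    """Join items, adding the delimiter only between two non-rune items."""
    out = []
    prev_is_rune = True  # nothing precedes the first item, so no delimiter
    for item in items:
        is_rune = isinstance(item, Rune)
        if not prev_is_rune and not is_rune:
            out.append(delimiter)
        out.append(item.text if is_rune else str(item))
        prev_is_rune = is_rune
    return "".join(out)


print(delimit(["abc", "def", Rune("ghij"), "klm no"]))
print(delimit(["line", "one", NEWLINE, "line", "two"]))
```

Under this model the newline rune suppresses the space on both sides of it, so no stray space ends up at the end of the first line or the start of the second.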


And now: the small design choice with big impact...


Plain _ Is The Rune For ASCII SPACE

>> second "a b"
== _

>> type of _
== ~{rune!}~  ; antiform (datatype)

>> to text! _
== " "

>> mold _
== "_"

If you have a rune consisting only of spaces, its representation will be that number of underscores:

>> tab4: ____

>> print [tab4 "Runes" "don't count" "in delimiting"]
    Runes don't count in delimiting
    ^-- this is spaced in by 4, not 5, because of the rune! delimit rule

But if a rune mixes spaces with non-space characters, its representation will be escaped (currently in quotes, though this may not be final):

>> to rune! "abc def ghi"
== #"abc def ghi"

This means underscores are generally ordinary WORD! characters, so long as a token does not solely consist of underscores.
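
The representation rules shown so far can be summarized in a small model. This is a Python sketch under stated assumptions, not the Ren-C molding code: `mold_rune`, `is_space_rune` and the `NEEDS_QUOTES` set are hypothetical names, and the exact set of characters that forces the quoted form is a guess based only on the examples above (space and `]` are confirmed by the post, the rest are assumed).

```python
# Characters assumed (for this sketch only) to force the quoted form.
# The post confirms space and "]"; the others are guesses.
NEEDS_QUOTES = set(' \t\n[](){}"')


def mold_rune(text):
    """Return a LOAD-able spelling for a rune, following the post's examples."""
    if not text:
        raise ValueError("empty runes are outside what this sketch models")
    if all(ch == " " for ch in text):
        return "_" * len(text)        # runs of spaces become underscores
    if text == "#":
        return "#"                    # HASH stands for itself, not ##
    if any(ch in NEEDS_QUOTES for ch in text):
        # Quoted form; no escaping of embedded quotes is attempted here.
        return '#"' + text + '"'
    return "#" + text                 # plain form, e.g. #a or #abcd


def is_space_rune(token):
    """True if a token consists solely of underscores (a rune of spaces)."""
    return token != "" and set(token) == {"_"}


print(mold_rune("    "), mold_rune("abcd"), mold_rune("abc def ghi"))
```

In this model a token such as `my_var` is not a space rune, because it does not consist solely of underscores, which matches the point that underscores remain ordinary WORD! characters.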

A Powerful Exception With Broad Application

Let's briefly discuss history.

In the earliest days of Ren-C, the _ character was chosen as the literal representation of what one might think of as the Redbol "NONE!".

  1. _ was reified

    • it could be put in a block!

      • this contrasted with things like the state of an unset variable (which Ren-C vehemently insisted was "the sort of thing that could not be put in a block!")

  2. _ was unreassignable

    • if you had a dialected purpose with a slot that might hold a variable, holding a _ was clearly differentiated

    • _ wasn't a WORD!, so it could truly "opt out" of things that a variable name might opt-in to. e.g. with for-each _ [a b c] [...] you could fully skip out on naming a loop variable.

  3. _ was nothingness

    • ... or at least, as close to nothing as something could be, while still being able to be put in a block!

    • for-each 'x _ [...code...] was equivalent to for-each 'x [] [...code...]

    • As advanced features came online, things like [_ var]: some-multi-returner .... could realize that a slot was not meant to be assigned when unpacking multiple return values.

  4. _ was "falsey"

    • This was a by-product of the heritage of being the new analogue to NONE!.

    • The rise of generalized isotopes introduced ~null~ antiforms as a non-reified "falsey" state, raising some tensions with the reified _ and #[false] states...

[1] + [2] Were Kept, [3] "Evolved", [4] Was Dropped

In the age of isotopes, all reified values came to behave "as-is". Only antiforms had exceptions in behavior:

>> append [a b c] [d e]
== [a b c [d e]]

>> spread [d e]
== ~(d e)~  ; antiform (splice)

>> append [a b c] spread [d e]
== [a b c d e]

>> append [a b c] _
== [a b c _]

To be both "reified" and "nothing", _ had to serve two masters. When it came to mechanical contexts like APPEND or FIND, the reified-ness had to win out.
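
The "as-is unless it's an antiform" behavior of APPEND can be modeled as follows. This is a Python sketch for illustration only: `Splice`, `spread`, `append` and `SPACE` are hypothetical stand-ins, and the single rule encoded is that only a splice is treated specially, while every ordinary value (including the space rune) is appended as one item.

```python
class Splice:
    """Hypothetical stand-in for the antiform that SPREAD produces."""

    def __init__(self, items):
        self.items = list(items)


def spread(block):
    """Wrap a block so that append() will add its items individually."""
    return Splice(block)


SPACE = "_"  # placeholder for the reified space rune; only identity matters


def append(series, value):
    """Append as-is, unless the value is a Splice (then append its items)."""
    if isinstance(value, Splice):
        series.extend(value.items)
    else:
        series.append(value)
    return series


print(append(["a", "b", "c"], ["d", "e"]))
print(append(["a", "b", "c"], spread(["d", "e"])))
print(append(["a", "b", "c"], SPACE))
```

The last call shows why reified-ness wins in mechanical contexts: because `SPACE` is an ordinary value in this model, it is added to the series like anything else rather than being treated as "nothing".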

TBD: Expand...

I'm falling asleep and will write more later. Until I do, see:

Why Does _ Form As Space, Not Underscore?
