Why Does _ Form As Space, Not Underscore?

...and why does # form as a newline?

The following code has predictable behavior for everything except _ and #:

>> for-each 'item [
       ? * + - = |     ; word!s - ordinary
       < > : / . %     ; word!s - special (limited in some contexts)
       _               ; rune! - "space" (?)
       #               ; rune! - "hash"/"octothorpe"/"pound"
   ][
       append string item
   ]

>> string         v-- NOT a hash mark, but a newline
== "?*+-=|<>:/.% ^/"
                ^-- NOT an underscore, but a space

Why is _ the RUNE! for "space" instead of the RUNE! for "underscore" (or "blank" if that's a better name)? And why is # the RUNE! for "newline"?

If _ were blank/underscore it could render as an underscore in strings, making this less surprising:

>> append "abc" _
== "abc_"

And instead of writing:

>> parse "a_b_c" [#a #_ #b #_ #c]
== #c

You could write:

>> parse "a_b_c" [#a _ #b _ #c]
== #c

It's deterministic, and therefore entirely predictable. :slight_smile:

If you look just at this situation... it does seem like it might be a design flaw.

But...

The choice to bend the mechanics for SPACE and NEWLINE is directly tied to the prevalence of spaces and newlines, and the objective difficulty of representing them otherwise. On the other hand, underscores and hash marks come up far less often... so having to write them as #_ and #"#" is a worthwhile tradeoff.

(For more information, see: "Space Runes: Reified Unreassignable Nothingness").

If You Want Source Representation, Use MOLD

Had you written append string mold item instead of just append string item it would have given you the desired result.

It would have also expanded the code so it worked with all legal single-character tokens (including the ones that won't implicitly convert to string):

>> for-each 'item [
       ? * + - = |     ; word!s - ordinary
       < > : / . %     ; word!s - special (limited in some contexts)
       #               ; rune! - "hash"/"octothorpe"/"pound"
       _               ; rune! - "space"
       $               ; tied! - tied "space" rune
       ^               ; metaform! - meta "space" rune
       @               ; pinned! - pinned "space" rune
       ~               ; quasiform! - quasi "space" rune (a "quasar")
       ,               ; comma!
   ][
       append string mold item
   ]

>> string
== "?*+-=|<>:/.%#_$^@~,"

(Do note the situation is still contrived. You can't expect to put an arbitrary single character into lists and have it work out here. Consider [ or " or ) etc. But I think it's impressive that it covers as much as it does!)
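For anyone following along without a Ren-C build, here is a rough Python model of the distinction. It is only a sketch: tokens are stood in for by plain strings, and the names form_token and mold_token are made up for illustration (they are not Ren-C or Python APIs). The only thing it tries to show is that forming special-cases exactly two runes, while molding never does.

```python
# Toy model, not Ren-C: each token is a Python string holding its source spelling.
FORM_OVERRIDES = {
    "_": " ",   # the space rune forms as a space character
    "#": "\n",  # the hash rune forms as a newline
}


def mold_token(token: str) -> str:
    """Model of MOLD: give back the source spelling unchanged."""
    return token


def form_token(token: str) -> str:
    """Model of appending without MOLD: only the two invisible runes differ."""
    return FORM_OVERRIDES.get(token, token)


tokens = ["?", "*", "+", "-", "=", "|", "<", ">", ":", "/", ".", "%", "_", "#"]
formed = "".join(form_token(t) for t in tokens)
molded = "".join(mold_token(t) for t in tokens)
```

In this model, formed ends with a space and a newline while molded ends with the literal characters _ and #, which mirrors the two console sessions above.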

Fair enough. :+1:

But then it seems you would be locked out of ever getting a space character from MOLD.

Is there any x which would give:

>> mold x
== " "

:red_question_mark:

No. The closest you can get is to mold an empty splice (a.k.a. "NONE") and produce an empty string:

>> mold _
== "_"

>> none
== \~[]~\  ; antiform (splice!) "none"

>> mold none
== ""

So we might ask... what would give (none => " "), but also (_ => "_") ? :thinking:

Not Hard To Write If You Want It

???: lambda [x [<opt-out> element? none?]] [
    if none? x [copy " "] else [mold x]
]

That gives you the ability to opt-out via void to get a null, and opt-in via none to get a space. (It returns the space as a TEXT! and not a RUNE!, to give consistent output with MOLD. And it copies it so you get a new string, also for mold parity.)

>> ??? 1
== "1"

>> ??? first [$]
== "$"

>> ??? none
== " "

>> ??? ()
== ~null~  ; anti
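The three-way dispatch may be easier to see in a Python sketch. This is a model under assumptions, not a port: VOID and NONE are hypothetical sentinel objects standing in for Ren-C's void and empty splice, space_or_mold is an invented name, and Python's str() is only a placeholder for MOLD.

```python
class _Sentinel:
    """Named marker object; stands in for a Ren-C state with no Python equivalent."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


VOID = _Sentinel("VOID")  # models void: the caller opts out entirely
NONE = _Sentinel("NONE")  # models the empty splice: the caller opts in to a space


def space_or_mold(x, mold=str):
    """Opt out on VOID (None models ~null~), give " " on NONE, else mold the value."""
    if x is VOID:
        return None
    if x is NONE:
        return " "  # Python strings are immutable, so no copy is needed here
    return mold(x)
```

Calling space_or_mold(1) gives "1", space_or_mold(NONE) gives " ", and space_or_mold(VOID) gives None, matching the four console results above.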

If it were to be included in the box, I don't know what I'd call it.

Any Builtins That Could "Just do this"?

It can't be TO RUNE!, as that must be reversible:

>> to rune! _
== #_  ; illegal... (to (type of x) x) must be (copy x) 

By contrast, (make (type of x) x) has no particular rule saying that it has to give something equal to X back. But I'm pretty sure that passing INTEGER! to MAKE RUNE! should give runes with that codepoint:

>> make rune! 32
== _  ; not #32

>> {rune! 32}  ; cool shorthand with FENCE!/CONSTRUCT
== _

Hence (make rune! 0) wouldn't be #0, which is what we'd want for this purpose. So MAKE RUNE! doesn't fit either.

I kind of can't think of anything that fits the idea of producing spaces from thin air out of holes. Maybe it will occur to me.

But I'm happy enough with the proposed solution for those who need it.

I thought it would be interesting to mention that, with two exceptions, the single-character tokens now come in QUASIFORM!:

>> for-each 'item [
   ~?~  ~*~  ~+~  ~-~  ~=~  ~|~
   ~<~  ~>~  ~:~  ~/~  ~.~  ~%~
   ~#~
   ~$~  ~^~  ~@~
   ~,~
][
   assert [quasiform? item]
]

The exceptions are ~~~ and ~_~, which don't exist because ~ is already the quasiform of _.

(However ~~~ does exist as a token... it's just the quasiform of a three-space rune (quasi ___). So it's available for dialecting, but it's not a "quasiform of ~" because that would be a quasiform of a quasiform.)

But in the list above, ~#~ has a valid antiform (it's a labeled TRASH!...) as does ~,~ (its antiform is VOID!).

~$~, ~^~, and ~@~ are forbidden from having antiforms in the current thinking (no antiforms with Sigils, and these are sigilized SPACE runes). That's an empirical rule based on gut feeling, so it could be changed.

The others do not have valid antiforms at this time, but could--they would be KEYWORD!. If anyone thinks of particularly creative uses for these keyword antiforms, let me know. (We could use any of these for "null" and "okay", but I think the words work better.)

In any case... more parts for dialecting, related to single-character intent!

I'll point out that # has not been the pound sign in historical Rebol...

Rebol2 considers it an "empty issue":

rebol2>> type? #
== issue!

rebol2>> form #
== ""

(If we need that, I don't have a problem with it being #"", though so far I've said there's no such thing and ASCII "NUL" is simply the BINARY! #{00})

Red doesn't define #:

red>> type? #
*** Syntax Error: (line 1) invalid issue at #

R3-Alpha used it as a representation of NONE!, but didn't commit to it as canon (e.g. MOLD NONE is still "none"):

r3-alpha>> type? #
== none!

r3-alpha>> form #
== "none"

r3-alpha>> mold none
== "none"

Having # represent newline is functionally very useful, considering it has historically been #"^/" which is quite awkward.

Regarding usage of #, I was just looking at some of @gchiu's code:

let data: unspaced [
    surname "," firstnames space "(" opt title ")" space "DOB:" space dob space "NHI:" space nhi newline
    street newline town newline city newline newline
    "phone:" space opt phone newline
    "mobile:" space opt mobile newline
    "email:" space opt email
]

And I thought "hey, underscore is finally committed as the space rune, might as well use that":

let data: unspaced [
    surname "," firstnames _ "(" opt title ")" _ "DOB:" _ dob _ "NHI:" _ nhi newline
    street newline town newline city newline newline
    "phone:" _ opt phone newline
    "mobile:" _ opt mobile newline
    "email:" _ opt email
]

And the # helps as well:

let data: unspaced [
    surname "," firstnames _ "(" opt title ")" _ "DOB:" _ dob _ "NHI:" _ nhi #
    street # town # city ##
    "phone:" _ opt phone #
    "mobile:" _ opt mobile #
    "email:" _ opt email
]

Note you can even double it up if you like: ## is a single token that forms as two newlines, so you can write it instead of # #.
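As a rough illustration of what that block produces, here is a Python model. Everything in it is an assumption made for the sketch: unspaced is a hand-written stand-in (not the Ren-C function), None models a field that OPT drops, and the patient data is invented placeholder text.

```python
SPACE = " "     # what _ forms as
NEWLINE = "\n"  # what # forms as; ## is modeled as NEWLINE * 2


def unspaced(items):
    """Join items with no delimiter, skipping None (models an OPT that vanished)."""
    return "".join(str(item) for item in items if item is not None)


title = None  # placeholder: pretend the title field was never set

data = unspaced([
    "Smith", ",", "Jane", SPACE, "(", title, ")", SPACE,
    "DOB:", SPACE, "1970-01-01", NEWLINE,
    "1 Main St", NEWLINE * 2,
    "phone:", SPACE, "555-0100",
])
```

The empty parentheses left behind when title is missing are visible in the result.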

_ seems fairly close to the "absence of anything in this spot", while # seems a bit like its nemesis... the "everything is filled in" character.

If _ says "I'm actually invisible, and lightly separate", and you were looking for a complement that says "I'm actually invisible, but heavily separate", you can't do too much better than this. They're tied together, in that "the two single-character RUNE!s are invisible".

Of Course, Many Cases Want Interpolation...

Graham's code could be tackled with COMPOSE now...modulo some issues with leading whitespace that multiline strings should probably not include by default:

let data: compose2 '{} --[
    {surname}, {firstnames} ({? title}) DOB: {dob} NHI: {nhi}
    {street}
    {town}
    {city}

    phone: {? phone}
    mobile: {? mobile}
    email: {? email}
]--

There are still some questions about how to get an optional substitution to be able to opt out of its surrounding boilerplate. For instance: how might ({? title}) ask to not only opt out of the contents of the parentheses, but opt out of the parentheses themselves... is there an easy way to do that?

But anyway, not all cases will be a fit for turning into interpolation, so I think the # as newline may be a usage that people would come to appreciate.


The more I've thought about this, the more I wonder if empty splice (none) should mold as space.

Molding a splice of multiple items gets you spaces between them:

>> mold spread [a b]
== "a b"

So... why not:

>> mold spread []
== " "

Then VOID could give you an empty string (and VETO, as is nearly universal, could give you null):

>> mold ()
== ""

>> mold veto
== \~null~\  ; antiform (logic!)

The source representation of an empty splice could be thought of as any whitespace, but a single space might have the most practical value, in terms of giving you a way to make more potential outputs with mold...since emptiness can be communicated other ways than empty strings.

>> mold none
== " "
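To make the proposal concrete, here is a small Python sketch of the rule being floated. It is a model of the idea only: mold_splice is a hypothetical name, items are represented by their source spellings as strings, and the empty_as parameter exists just to contrast the proposed result with the current one.

```python
def mold_splice(items, empty_as=" "):
    """Model of molding a splice: items separated by single spaces.

    Under the proposal an empty splice molds as one space (empty_as=" ").
    Passing empty_as="" models today's behavior of giving an empty string.
    """
    if not items:
        return empty_as
    return " ".join(items)


proposed_empty = mold_splice([])               # the proposed result
current_empty = mold_splice([], empty_as="")   # what the thread says happens now
two_items = mold_splice(["a", "b"])
```

One way to read the proposal: a plain join of zero items would give "", so the single space for the empty case is a deliberate special case and not something that falls out of the joining rule.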

Though it gets weird. What if you spread something which had a newline flag, and mold that?

>> block: []
== []

>> new-line block 'yes
== [
]

>> mold block
== "[^/]"

>> spread block
== \~[
]~\  ; antiform (splice!)

>> mold spread block
== "^/"  ; <-- does this make sense?

Asking these questions as if they are serious questions--vs. a descent into a kind of self-fueled madness--does often strike me as weird. But hey...Rebol has only ever been tangentially related to engineering. It's art, people.

:man_artist: :artist_palette: