NULL/VOID/TRASH evolution from NONE!/UNSET!

hostilefork · May 14, 2021, 2:44pm

Here is some history that explains how NULL and VOID and TRASH evolved from Rebol2's types.

Rebol2/R3-Alpha/Red Have Two Kinds of Nothing (both reified)

Historical Redbol gives you two main choices for "nothingness"...#[none] and #[unset]... both of which can be found either in variables, or as values in blocks:

rebol2>> block: reduce [none print "print returns unset"]
print returns unset
== [none unset]  ; misleadingly renders as WORD!s

rebol2>> type? first block
== none!

rebol2>> type? second block
== unset!

Using #[none] has the advantage of being "friendly" on access via word, allowing you to write things like:

rebol2>> var: none

rebol2>> either var [print "do something with var"] [print "do something else"]
do something else

But when var contained an #[unset], you'd get an error instead:

rebol2>> unset 'var

rebol2>> either var [print "do something with var"] [print "do something else"]
** Script Error: var has no value

So instead of using var directly, you had to do something more circuitous and pass the word "var" into a special test routine (morally equivalent to today's set? 'var).

Hence #[none] was reached for frequently out of convenience. Yet this convenience came at a cost: it was very easy to accidentally append one to a block, even when its "nothing here" intent suggested you didn't want to add anything at all.

But it's hard to say: sometimes you did want to add #[none] to a block, to serve as a placeholder.

Also, being able to enumerate a block which contained #[unset] values was problematic, because if you did something like a FOR-EACH it would appear that the variable you were enumerating with was itself not set.

Early Ren-C Made Reified BLANK! and non-Valued NULL

One thing that bugged me was that there was no "pretty" representation for a non-valued state in a block... and that #[none] often thus displayed itself as the word none (seen in the example at the top of the post).

So the BLANK! datatype took the single underscore _.

>> second [a _]
== _

>> if blank? _ [print "yep, it's a blank"]
yep, it's a blank

>> if not _ [print "blank is also falsey"]
blank is also falsey

And critically, one of the first things I tried to do was rethink the #[unset] state into something that you'd never find in a block, and called it NULL (as well as made it correspond to C/JavaScript null in the API):

>> second [a _]
== _

>> third [a _]
; null

Since NULL couldn't be found in a block, it wasn't ambiguous when you got NULL back from a block operation as to whether there was a "null in that position".

But it's still just two things:

blank! - A nothing you can put in a block
  - it was logically false
  - it was friendly via word access (no need for GET-WORD!)

null - A nothing you couldn't put in a block
  - it was also logically false
  - it was unfriendly via word access (need GET-WORD! for :VAR, or SET? 'VAR)

This put you in a difficult situation for your choices of emptiness when you were dealing with something like:

append block value  ; what nothing state should you use for value?

If you wanted to avoid accidentally appending blanks to arrays, you kind of wanted NULL so you'd get an error. But once you used NULL, you could not write the convenient if value [...] control structure.

Later Ren-C added a separate "ornery" non-Value State

A third state was added to be neither logically true nor false, and that would trigger an error on accessing a variable with it. (I'll whitewash history a bit and say this state was always called "TRASH", and also always could not be put in blocks.)

This was the new state of unset variables:

>> unset $x

>> x
** Error: X is an unset variable

>> get:any $x
== \~\  ; trash!

>> if get:any $x [print "Ornery!"]
** Error: trash is neither logically true nor false

So NULL now represented a middle ground. It was something that was easy to test for being nothing (using IF) but that was impossible to accidentally put into a block.

This gave you three behaviors:

[1]  >> trash-value
     ** Error: TRASH-VALUE variable is unset

[2]  >> null-value
     ; null

     >> append [a b] null-value
     ** Error: APPEND does not allow adding NULL to blocks

[3]  >> blank-value
     == _

     >> append [a b] blank-value
     == [a b _]

WORD! Antiforms Brought Infinite Non-Valued Choices

Eventually the NULL state became an isotope of the WORD! null, so a ~null~ antiform.

It joined ~okay~ as an antiform you could test for truthiness and falseyness.

You'd use the null antiform to initialize a variable that some code may or may not assign, when you want to be able to test afterward whether it did.

 directory: ~null~

 for-each [key val] config [
     if key = 'directory [
         if directory [
             fail ["Directory was already set by config:" directory]
         ]
         directory: val
     ]
 ]

VOID Provided a Clean "Opt-Out" Option

An unfortunate sacrifice that had been made in the design was that the "non-valued" status of NULL was chosen to raise attention to an error condition, rather than be an opportunity to opt-out of an APPEND:

>> append [a b] null-value
** Error: This error is the choice that we went with

>> append [a b] null-value
== [a b]  ; would have been another possibility, but too accident prone

Some "strange" things were tried...such as making it so that appending a BLANK! was a no-op, and if you wanted to append a literal blank you had to append a quoted blank:

 >> append [a b] _
 == [a b]  ; hmmm.

 >> quote _
 == '_

 >> append [a b] quote _
 == [a b _]  ; hmmm.

(It wasn't that strange considering appending a BLOCK! would append its contents, and a quoted block was being tried as the way of specifying /ONLY. This line of thinking ultimately led to the designs for the isotopes that solve things like splicing intent, so it wasn't all for naught!)

After invisibles were rethought as GHOST (antiform comma), another unstable antiform state of VOID came as another piece of the puzzle.

>> void
== \~[]~\  ; antiform (pack!) "void"

>> lift void
== ~[]~

I realized that void was the perfect choice for opting out of operations:

>> append [a b] void
== [a b]

>> append void [a b c]
== \~null~\  ; antiform

As you see above, an operation can return null when it has no better answer to give back for a no-op. This gives good error locality, since the null won't trigger another opt-out unless you explicitly convert it to a void with OPT.

>> append (append void [a b c]) [d e f]
** Error: APPEND doesn't accept ~NULL~ antiform for the series argument

>> opt null
== \~[]~\  ; antiform (pack!) "void"

>> append (opt append void [a b c]) [d e f]
== \~null~\  ; antiform

Beyond this, underscore became just the character literal for SPACE...

This gives a (seemingly) complete picture

[1]  >> trash-value
     ** Error: TRASH-VALUE variable is unset

     >> append [a b] get:any $trash-value
     ** Error: APPEND does not allow adding ~ antiforms to blocks

[2]  >> void-func  ; must be a function, since variables can't store void
     == \~[]~\  ; antiform (pack!) "void"

     >> append [a b] void-func
     == [a b]

[3]  >> null-value
     == \~null~\  ; antiform

     >> append [a b] null-value
     ** Error: APPEND does not allow adding NULL to blocks

[3a] >> opt null-value
     == \~[]~\  ; antiform (pack!) "void"

     >> append [a b] opt null-value
     == [a b]

[4]  >> space
     == _

     >> append [a b] space
     == [a b _]
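
The behaviors listed above can be mimicked in a toy Python model. This is only a sketch for building intuition, not how Ren-C is implemented: the names `Antiform`, `TRASH`, `VOID`, `NULL`, `SPACE`, `append` and `opt` are stand-ins invented for the illustration, and a real void can't be stored in a variable the way this model's `VOID` object can. It also models only the value argument of APPEND.

```python
class Antiform:
    """Stand-in for a state that cannot be stored in a block (illustrative only)."""
    def __init__(self, form):
        self.form = form

    def __repr__(self):
        return f"{self.form} antiform"


TRASH = Antiform("~")       # hypothetical model of an unset variable's contents
VOID = Antiform("~[]~")     # hypothetical model of void
NULL = Antiform("~null~")   # hypothetical model of null
SPACE = "_"                 # an ordinary value that can live in a block


def append(block, value):
    """Model of how APPEND treats each kind of nothing as its value argument."""
    if value is VOID:
        return block                      # opt out: nothing is added
    if value is NULL or value is TRASH:
        raise TypeError(f"APPEND does not allow adding {value!r} to blocks")
    block.append(value)
    return block


def opt(value):
    """Model of OPT: NULL becomes VOID, anything else passes through."""
    return VOID if value is NULL else value


print(append(["a", "b"], opt(NULL)))   # ['a', 'b']
print(append(["a", "b"], SPACE))       # ['a', 'b', '_']
```

In this model, only the explicit `opt(...)` call turns a null into something APPEND will silently skip, which is the point of keeping NULL and VOID distinct.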

bradrn · March 9, 2024, 1:17am

I’ve read this through a couple of times, but am still not sure I understand it correctly. The discursive treatment in the [second] post is nice for motivating the various types, but confusing for actually getting a handle on how they work.

So, trying to summarise the situation, here’s my understanding at the moment of the various values involved:

‘Space’: a character literal _.
‘Trash’: the antiform of _. Throws an error on variable access, or any other attempt to use it. Used to represent unset variables.
‘Void’: the antiform of an empty block, i.e. a multi-return containing no elements. Throws on attempts to use it, except with APPEND, where it acts as a no-op. Not sure how this is used.
‘Null’: the antiform of null. Does not throw an error on variable access, and tests falsey in conditionals, but throws an error on other attempts to use it. Used to represent uninitialised variables.
‘Ghost’: the antiform of comma. Completely ignored by the evaluator.

Does this seem correct?

hostilefork · March 9, 2024, 2:14am

Note that GHOST is only ignored in "interstitial slots". If you try to call a function that isn't expecting a GHOST and pass it as an argument, that's an error:

>> append [a b c] 'd comment "ignored"
== [a b c d]

>> append [a b c] comment "not ignored" 'd
** Script Error: APPEND is missing its VALUE argument

At one point in time, the second worked. For an understanding of why it no longer does, see:

Making Invisible Functions (e.g. COMMENT, ELIDE)

Void is used generically in many places when you want things to vanish:

>> compose [<a> (if null [<b>] else [void]) <c>]
== [<a> <c>]

Allowing NULL to vanish here would be too liberal and not reveal what were likely errors. If you have something that may be NULL that you want to convert to a VOID if so, you can use OPT.

VOID is also used for opting out of things, using the "void-in-null-out" strategy. Compare:

>> block: ["a" "b"]

>> unspaced block
== "ab"

>> to word! unspaced block
== ab

With:

>> block: []

>> unspaced block
== \~null~\  ; antiform

>> to word! unspaced block
** Script Error: to expects [~[]~ element?] for its value argument

>> opt unspaced block
== \~[]~\  ; antiform (pack!) "void"

>> to word! opt unspaced block
== \~null~\  ; antiform

With:

>> block: null

>> unspaced block
** Script Error: unspaced expects [~[]~ text! block! the-block! issue!]
                 for its line argument

>> unspaced opt block
== \~null~\  ; antiform

>> to word! opt unspaced opt block
== \~null~\  ; antiform

Historical Redbol had a lot of people asking that things give NONE! back when they took NONE! in, and this "none propagation" was messy: whole chains would opt themselves out without anyone knowing where the problem was. Void-in-null-out encourages being more purposeful, because you only throw in an OPT where you need one.
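
To make the void-in-null-out pattern concrete, here is a toy Python model of the UNSPACED and TO WORD! transcripts above. It is a sketch under stated assumptions rather than Ren-C's actual implementation: `Antiform`, `VOID`, `NULL`, `opt`, `unspaced` and `to_word` are hypothetical stand-ins, and a plain Python string stands in for a WORD!.

```python
class Antiform:
    """Stand-in for a Ren-C antiform; purely illustrative."""
    def __init__(self, form):
        self.form = form

    def __repr__(self):
        return f"{self.form} antiform"


VOID = Antiform("~[]~")     # hypothetical model of void
NULL = Antiform("~null~")   # hypothetical model of null


def opt(value):
    """Model of OPT: NULL becomes VOID, anything else passes through."""
    return VOID if value is NULL else value


def unspaced(block):
    """Model of UNSPACED following void-in-null-out."""
    if block is VOID:
        return NULL                        # opted out: void in, null out
    if block is NULL:
        raise TypeError("unspaced does not accept ~null~")
    text = "".join(block)
    return text if text else NULL          # nothing to join gives null, as in the thread


def to_word(text):
    """Model of TO WORD! following void-in-null-out (a str stands in for a WORD!)."""
    if text is VOID:
        return NULL
    if text is NULL:
        raise TypeError("to does not accept ~null~")
    return text


print(to_word(unspaced(["a", "b"])))        # ab
print(to_word(opt(unspaced(opt(NULL)))))    # ~null~ antiform
```

Each function rejects NULL outright, so a chain only opts out where an `opt` was written, and removing either `opt` in the last line turns the quiet null into an error at that step.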