Impedance Matching LIFT The Universe With Baseline

hostilefork · June 7, 2025, 11:00pm

(Sorry for the EE term. If you're unfamiliar: Impedance Matching)

A while back I realized that it's best if OBJECT!s, MODULE!s, LET!s, etc. store their contents in lifted representation.

...or rather... they store "historically normal" values in lifted representation (QUASIFORM! and QUOTED!). The unlifted band would then be used for special signals.

One Example: Unlifted TRASH! would denote true unsetness...

And the term actually now fits, i.e. a state of absence of value, beneath the layer of what you could accomplish with SET.

With SET, you can only get "trashed" values:

>> x: ~
== \~\  ; antiform (trash!) "tripwire"

>> trashed? $x
== \~okay~\  ; antiform  <-- it's trashed, all right...

>> unset? $x
== \~null~\  ; antiform   <-- but it's SET to TRASH!, it's NOT "unset"!

However, special tools and special cases would go beneath SET.

>> tweak $x ~   ; tweak doesn't do the implicit LIFT that SET does...

>> unset? $x
== \~okay~\  ; antiform

There will be shorthands for that like (unset $x). But also, you get this "unset" state as the default states in MAKE FRAME!:

>> f: make frame! negate/

>> unset? $f.number
== \~okay~\  ; antiform  <-- actually unspecialized, not specialized to trash!

This brings the long hoped-for distinction between unspecialized values, and values that are purposefully trash! And what was fretted over as being a "hidden bit" is anything but... it's just one "out-of-band" operator away.

e.g. Note that if you use TWEAK with a lifted value, that's just like SET:

>> tweak $x lift ~  ; synonym for (set $x ~)

>> unset? $x
== \~null~\  ; antiform

>> trashed? $x
== \~okay~\  ; antiform
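The SET vs. TWEAK distinction in the transcripts above can be modeled in a few lines of C. This is only a toy sketch: ToyCell, Toy_Lift, Toy_Set, Toy_Tweak and the numeric lift encoding (0 = antiform, 1 = plain, 2 = quasiform) are names and values made up for illustration, not the actual Ren-C internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Toy lift encoding (an assumption for illustration only):
//   0 = antiform, 1 = plain, 2 = quasiform, 3 = quoted once, ...
enum { TOY_ANTIFORM = 0, TOY_PLAIN = 1, TOY_QUASIFORM = 2 };

typedef enum { TOY_KIND_TRASH, TOY_KIND_INTEGER } ToyKind;

typedef struct {
    ToyKind kind;
    uint8_t lift;  // hypothetical stand-in for the LIFT_BYTE
    int payload;
} ToyCell;

// Lifting adds 2: antiform -> quasiform, plain -> quoted.
static inline ToyCell Toy_Lift(ToyCell c) {
    assert(c.lift <= UINT8_MAX - 2);
    c.lift = (uint8_t)(c.lift + 2);
    return c;
}

static inline ToyCell Toy_Trash(void) {
    ToyCell c = { TOY_KIND_TRASH, TOY_ANTIFORM, 0 };
    return c;
}

// TWEAK writes the slot exactly as given: no implicit lift.
static inline void Toy_Tweak(ToyCell *slot, ToyCell dual) {
    *slot = dual;
}

// SET lifts first, so whatever it stores lands in the lifted band.
static inline void Toy_Set(ToyCell *slot, ToyCell value) {
    Toy_Tweak(slot, Toy_Lift(value));
}

// Unset: an unlifted trash antiform sitting directly in the slot.
static inline bool Toy_Is_Unset(const ToyCell *slot) {
    return slot->kind == TOY_KIND_TRASH && slot->lift == TOY_ANTIFORM;
}

// Trashed: a lifted trash (quasiform in the slot), i.e. SET to TRASH!.
static inline bool Toy_Is_Trashed(const ToyCell *slot) {
    return slot->kind == TOY_KIND_TRASH && slot->lift == TOY_QUASIFORM;
}
```

In this model, Toy_Set with a trash value yields a slot that is trashed but not unset, while Toy_Tweak with the same value yields one that is unset. Tweaking with an already-lifted value behaves like Toy_Set.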

That's Just One Example...

It's the gateway to SETTER and GETTER functions (actually desired by Carl... or setters at least... but he didn't know how to do 'em)

And for stylized setters/getters that do specific things (like type checking) there can be specially understood representations in the unlifted band.

Vocabulary term: I call the multiplexing of lifted and unlifted values together "DUAL REPRESENTATION"

All Of It Is Powered By Common GET and SET Code

This wouldn't work if random places in the code ran off and inspected fields of OBJECT!s literally.

You have to go through some common path. Otherwise you wind up with some callsites honoring the generalized conventions and others not.

It's kind of like how random code in R3-Alpha would ignore the PROTECT status of variables. Ren-C has fought long and hard to nail that kind of thing down, and make sure at compile-time that you can be certain the checks aren't being skipped.

Piping everything through common GET and SET code paths ensures that as features like type checks or accessors are added, you don't have rogue code that doesn't honor the convention.
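To make the "one common path" idea concrete, here is a minimal sketch in C. Everything in it (ToySlot, Toy_Slot_Set, the result enum, the typecheck callback) is hypothetical and far simpler than the real thing. The point is only that when every write funnels through one function, a newly added rule is enforced for all callers at once.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SLOT_SET_OK,
    SLOT_SET_PROTECTED,
    SLOT_SET_BAD_TYPE
} SlotSetResult;

typedef struct {
    int value;
    bool is_protected;       // hypothetical stand-in for PROTECT status
    bool (*typecheck)(int);  // optional rule; NULL means "accept anything"
} ToySlot;

// The single choke point for writes. Callers never touch slot->value
// directly, so no callsite can skip the protection or type checks.
static inline SlotSetResult Toy_Slot_Set(ToySlot *slot, int value) {
    if (slot->is_protected)
        return SLOT_SET_PROTECTED;
    if (slot->typecheck != NULL && !slot->typecheck(value))
        return SLOT_SET_BAD_TYPE;
    slot->value = value;
    return SLOT_SET_OK;
}

// Example rule that can be plugged into a slot.
static inline bool Toy_Is_Non_Negative(int v) {
    return v >= 0;
}
```

A callsite that wrote slot->value directly would compile fine but would silently bypass both checks, which is exactly the kind of rogue code the common path is meant to rule out.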

It's been challenging to do this--and right now it's messy and slow--but the commonality means it's worth it to invest in optimizations for that one true path. And Ren-C has plenty of optimization tools at its disposal, which have been evolving over the years... for when the timing is right...

But What About Normal, Boring, Context-Building Code?

Here's an example, just the one on my screen right now.

It's some random code out of the POSIX CALL implementation, related to... forking processes or something:

if (Bool_ARG(INFO)) {
    VarList* info = Alloc_Varlist(TYPE_OBJECT, 2);

    Init_Integer(Append_Context(info, CANON(ID)), forked_pid);
    if (Bool_ARG(WAIT))
        Init_Integer(Append_Context(info, CANON(EXIT_CODE)), exit_code);

    return Init_Object(OUT, info);
}

Dumb, simple code making an OBJECT! with 2 fields in it, appending those fields (which default to an erased state you have to fill in to be correct code), and then setting the erased cells to mundane values.

If I were to go lockstep through code that looked like this and change it for lifting to appease the common GET and SET code, it would start looking like:

if (Bool_ARG(INFO)) {
    VarList* info = Alloc_Varlist(TYPE_OBJECT, 2);

    Liftify(  // <-- new wart
        Init_Integer(Append_Context(info, CANON(ID)), forked_pid)
    );
    if (Bool_ARG(WAIT))
        Liftify(  // <-- new wart
            Init_Integer(Append_Context(info, CANON(EXIT_CODE)), exit_code)
        );

    return Init_Object(OUT, info);
}

Liftify adds 2 to the LIFT_BYTE. (Review LIFT_BYTE if you want an introduction to that.)

Liftify also has to check for overflow (e.g. that you're not going past 255 for the LIFT_BYTE value). Maybe the optimizer can figure out it doesn't need that check here? Though I try not to rely on the optimizer too much...
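For a sense of what that check costs, here is a rough sketch of the operation as described. LiftCell and Toy_Try_Liftify are made-up names, the real Liftify() signature isn't shown in this post, and returning a bool on overflow is just one possible way to report it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t lift_byte;  // hypothetical stand-in for the LIFT_BYTE
    int payload;
} LiftCell;

// Adds 2 to the lift byte, refusing if the result would exceed 255.
// On failure the cell is left untouched.
static inline bool Toy_Try_Liftify(LiftCell *cell) {
    if (cell->lift_byte > UINT8_MAX - 2)
        return false;  // lifting would overflow the byte
    cell->lift_byte = (uint8_t)(cell->lift_byte + 2);
    return true;
}
```

When the cell was freshly initialized one line earlier, the branch can never be taken, but the compiler only knows that if it can see through the initialization.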

This Parallels The "Too Many `^`" Of Usermode Code

My observation in LIFT the UNIVERSE was that usermode code was becoming contaminated with lifts in places that weren't really the concern of that code. (That's why the robots are celebrating, they're throwing carets in the trashcan...)

Here the C code shows some of the same problem as the carets, manifesting as calls to Liftify(). It's getting uglier, and spreading that ugliness around.

Should A `CELL_FLAG_DUAL` Be Sacrificed For This?

I don't like wasting the very few CELL_FLAG_XXX. But over time, silly ones have been freed up to give us some wiggle room (e.g. the now-completely superfluous CELL_FLAG_FALSEY).

And maybe this is a really good case where it could be of help to sacrifice one. Since all the GET and SET that's not this kind of stuff is running through centralized code...it could be tolerant of cells in contexts that weren't initialized with CELL_FLAG_DUAL, and just know that those are to be taken literally.

It complicates things a little bit in that one "big, beautiful code path". But as a caller of TWEAK or GET and SET you're insulated from the complication. It's a black box... maybe the cell has CELL_FLAG_DUAL and maybe it doesn't, you'll never know.
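Here is roughly what that tolerance could look like, as a toy sketch. FlagCell, TOY_FLAG_DUAL, the Toy_Central_* functions, and the lift encoding (0 = antiform, 1 = plain, lifting adds 2) are all assumptions for illustration, not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FLAG_DUAL 0x01u  // hypothetical bit standing in for CELL_FLAG_DUAL

typedef struct {
    uint8_t flags;
    uint8_t lift_byte;  // toy encoding: 0 antiform, 1 plain, 2 quasiform, ...
    int payload;
} FlagCell;

// Simple construction code: writes a plain cell, no lift and no flag.
static inline void Toy_Init_Literal(FlagCell *slot, int payload) {
    slot->flags = 0;
    slot->lift_byte = 1;
    slot->payload = payload;
}

// Central SET: stores the lifted form and marks the slot as dual.
static inline void Toy_Central_Set(FlagCell *slot, FlagCell value) {
    assert(value.lift_byte <= UINT8_MAX - 2);
    *slot = value;
    slot->lift_byte = (uint8_t)(value.lift_byte + 2);
    slot->flags |= TOY_FLAG_DUAL;
}

// Central GET: returns true and writes the ordinary value to *out, or
// returns false if the slot holds an unlifted dual signal (e.g. unset),
// in which case the contents of *out should not be used.
static inline bool Toy_Central_Get(FlagCell *out, const FlagCell *slot) {
    *out = *slot;
    out->flags &= (uint8_t)~TOY_FLAG_DUAL;
    if (!(slot->flags & TOY_FLAG_DUAL))
        return true;   // no flag: take the cell literally
    if (slot->lift_byte < 2)
        return false;  // flagged and unlifted: special signal
    out->lift_byte = (uint8_t)(slot->lift_byte - 2);  // flagged: unlift
    return true;
}
```

The caller of Toy_Central_Get gets the same answer whether the slot was filled by Toy_Init_Literal or by Toy_Central_Set, which is the "black box" property described above.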

Just Have To Catch Confusions Before They Happen...

Probably the best approach is to add asserts that fire if code somehow runs through paths that don't use the common GET, and to make sure Type_Of(), Quotes_Of(), etc. assert on anything that has CELL_FLAG_DUAL.

I'm not sure how many legitimate codepaths there will be that duck the legitimate GET, but there are some reasonable cases (such as the code I give above) that are just doing a simple construction and probably don't need to be more complicated than they already are.

Will It Slow Things Down?

I posted this under Optimization because it's trading off some runtime code to make the C code more tolerable.

But rest assured, this is not the flag test that will be the bottleneck of the system.

(Compared to naive Liftify() everywhere, it probably at least breaks even, since it skips the overflow check on the LIFT_BYTE.)

hostilefork · June 8, 2025, 6:35pm

Really, this flag is only applicable to cells in OBJECT!, MODULE!, LET!, FRAME!, ERROR!, etc.

And it's not "sticky" (e.g. not part of CELL_MASK_PERSISTENT) so it won't be copied when the Cell is copied.

So it's actually not that "wasteful"... the flag can be used for other things for other slots (e.g. it can be the same flag that Level output cells use to track when actions/ghosts are UNSURPRISING...)

Thus it's CELL_FLAG_SLOT_HINT_DUAL, the same bit as CELL_FLAG_OUT_HINT_UNSURPRISING... and besides Level output cells and context Slots it's open for other uses (on elements of lists, etc.).

(Although, this might be contentious if we expect functions to be able to operate on and react to UNSURPRISING-ness of their arguments. Today, the flag is specifically internal. But internal things have a tendency to become exposed to usermode. It may be that we could actually implement the unsurprising bit via unlifted values without the dual representation flag, if that's the case...)
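The bit-sharing and the non-sticky behavior can be sketched like this. The bit values, the TOY_ names, HintCell, and Toy_Copy_Cell are all hypothetical; the post doesn't give the real flag layout or what CELL_MASK_PERSISTENT actually contains.

```c
#include <stdint.h>

// Hypothetical bit assignments, for illustration only.
#define TOY_FLAG_STICKY  0x01u  // some flag that is meant to survive copies
#define TOY_FLAG_HINT    0x02u  // one physical bit with context-dependent meaning

// Two names for the same bit, depending on where the cell lives.
#define TOY_FLAG_SLOT_HINT_DUAL         TOY_FLAG_HINT  // in context slots
#define TOY_FLAG_OUT_HINT_UNSURPRISING  TOY_FLAG_HINT  // in Level output cells

// Only bits in this mask are carried by a copy; the hint bit is left out.
#define TOY_MASK_PERSISTENT  TOY_FLAG_STICKY

typedef struct {
    uint8_t flags;
    int payload;
} HintCell;

// Copies the payload, but only the persistent flags.
static inline void Toy_Copy_Cell(HintCell *dst, const HintCell *src) {
    dst->payload = src->payload;
    dst->flags = (uint8_t)(src->flags & TOY_MASK_PERSISTENT);
}
```

Because a copy never carries the hint bit, a cell copied out of a context slot into a Level output cell can't be misread as UNSURPRISING, and vice versa.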

hostilefork · June 9, 2025, 5:48pm

Not Carrying CELL_FLAG_DUAL Is Actually Critical

I realized one real performance issue things bump up against is the impact on native functions.

Natives rely on the idea that their arguments in the FRAME! are not abstracted, so they don't worry about SETTERS or GETTERS etc.

They use a macro called ARG() to get at their arguments, and (currently) expect that argument to be in non-DUAL format. If it's a non-^META argument, they expect it to be unlifted. And if it's a ^META argument, they expect it to be lifted.

This does seem to suggest that the historical "last mile" unlifting is still going to be required for non-^META arguments for NATIVES, BUT with the twist that it doesn't set CELL_FLAG_SLOT_HINT_DUAL.

That is to say, it's an implementation detail. The slot is still conceptually "lifted", but for the convenience of the natives it has a flag missing. The absence of that flag tells GET that the cell is not actually an unlifted dual state, and tells SET that it can completely overwrite the cell without worrying that it's a SETTER-like thing.

I feel like this should not be done for non-native functions: while it might seem like the usermode code wouldn't know the difference, you would wind up losing the typecheck state.

^META Arguments To Natives Are Thus Weird

While stable states can squeak by unlifted, unstable antiform states must be meta-represented (if they're going to be in a context with variables, like FRAME!s are).

To continue with the theme of "do what it's always done", that means natives are "special" and just understand how to deal with ^META arguments that are lifted but don't have SLOT_HINT_DUAL set.

You might say: "but wouldn't that trip up debuggers and other code trying to look at a native's FRAME!..."

...yes, but... it's really none of any outside inspector's business what a native is doing with its frame. You can't poke arbitrary bit patterns into a native's Cell (it makes many assumptions and unlike usermode code you can cause crashes by putting unexpected bit patterns in a native's frame). For this reason, natives always mark their frames immutable by usermode code... it's read only.

I guess the long and short of this is that natives will receive their non-^META args unlifted and without SLOT_HINT_DUAL, and their ^META args lifted without SLOT_HINT_DUAL, which is essentially exactly the same as today.

hostilefork · June 9, 2025, 6:03pm

This assumption creates some limits...

For instance, imagine you do this:

ap10: specialize append/ [value: getter [print "GETTING!" 5 + 5]]

If APPEND is native code and doesn't run through common behaviors for GET and SET, but works on the Cell directly, this can't work.

I wouldn't want a limitation like "no getters or setters as frame fields"... that's lame.

So I don't see that big of a problem if the code which does the "flattening" of lifted forms with CELL_FLAG_DUAL (into an unlifted form without the flag) just errors if it gets down to the point of native execution. So if you do something like this in usermode frames or higher-level wrappers, you have to undo it before you get to native code execution.

Making native code honor setters/getters on their own adapted frames would complicate things significantly for a fringe feature that would kind of unmoor the debugging situation completely.

Wait Just Do One GET of The Accessor

...hold up.

Given that you're handing over this FRAME! to the native to do whatever it will do, and the fields "belong to it" after the handoff... why not just turn any accessor into its last GET?

You'd have no way of knowing if it read all the fields, made a copy, and operated on the copy. So it could just act like that.

I think this isn't really a problem after all. You know you lose control once the native runs. So, it's not beholden to any contracts on any of the fields. While it might reuse them mechanically for space reasons, that's not your business.
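A sketch of that "one last GET" flattening step, under heavy assumptions: ArgSlot, ToyGetter, and Toy_Flatten_For_Native are invented names, and a real frame holds Cells rather than ints. It only shows the shape of the idea, which is to run each accessor once at handoff and leave a plain value behind.

```c
#include <stddef.h>

// Hypothetical accessor: computes a value from some opaque state.
typedef int (*ToyGetter)(void *state);

typedef struct {
    ToyGetter getter;  // NULL for an ordinary value slot
    void *state;       // passed to the getter; unused for ordinary slots
    int value;
} ArgSlot;

// Before the native runs, replace every accessor with the result of a
// single GET. Afterward the native can read slot.value directly.
static inline void Toy_Flatten_For_Native(ArgSlot *slots, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (slots[i].getter == NULL)
            continue;  // already a plain value
        slots[i].value = slots[i].getter(slots[i].state);
        slots[i].getter = NULL;  // the accessor is gone after the handoff
        slots[i].state = NULL;
    }
}

// Example getter in the spirit of (getter [print "GETTING!" 5 + 5]):
// it counts its calls through the state pointer instead of printing.
static inline int Toy_Counting_Ten(void *state) {
    int *calls = state;
    *calls += 1;
    return 5 + 5;
}
```

With this shape the getter fires exactly once per call of the specialized function, no matter how many times the native reads the argument.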