LIFT_BYTE (Bedrock, Antiform, Quasi, Quoted...)

The "game" of Rebol is played with cells that are the size of four platform pointers. So on a 32-bit platform a cell is 16 bytes, and on a 64-bit platform it is 32 bytes.

I've illustrated Ren-C's spin on this "game" previously:

The bits and bytes in the header are arranged in a platform-independent way. Regardless of the endianness of the machine, the bits in the header will be in the same order. The first byte is chosen with a pattern that specifically will never occur as a leading byte in a UTF-8 sequence...allowing an arbitrary pointer to be discerned as pointing to a cell or to the beginning of a UTF-8 string.
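A minimal sketch of how that discernment could work. The exact bit pattern is an assumption made for illustration: it supposes a cell's first byte has its top two bits set to binary 10, which is the UTF-8 continuation-byte pattern and so can never start a valid sequence. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// ASSUMPTION: a cell's first header byte looks like 10xxxxxx. In UTF-8,
// bytes of that form are continuation bytes, so no valid string starts
// with one.
#define CELL_LEAD_MASK   0xC0u
#define CELL_LEAD_SIGNAL 0x80u

// Hypothetical helper: true if the first byte at `p` carries the signal.
static bool first_byte_marks_cell(const void *p) {
    assert(p != NULL);
    unsigned char first = *(const unsigned char *)p;
    return (first & CELL_LEAD_MASK) == CELL_LEAD_SIGNAL;
}
```

Any ASCII text, an empty string (first byte 0x00), or a multi-byte lead such as 0xC3 fails this test, so only the cell header passes it.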

The "payload" is specifically aligned to a 64-bit boundary on both 32-bit and 64-bit platforms. This is important if it contains something like a double precision floating point number. It is also a union, which means that if it has constituent fields, they must be read from exactly the same union definition which was used to assign them. The "extra" is separate, meaning it is decoupled from the payload and can be assigned and read on its own terms (e.g. BLOCK! and WORD! could have a "binding" in extra that is read and written in common, without invalidating their payloads).
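To make the shape concrete, here is a rough sketch of a four-pointer cell. All field and type names are assumptions for illustration, not the actual Ren-C definitions; only the overall layout (header, extra, then a two-pointer payload union) follows the description above.

```c
#include <assert.h>   // static_assert (C11)
#include <stddef.h>   // offsetof
#include <stdint.h>

// Hypothetical field names; only the overall shape follows the description.
typedef union {
    struct {
        void *array;       // e.g. BLOCK!: pointer to an array of more cells
        uintptr_t index;   // ...and an index into that array
    } list;
    double decimal;        // the reason the payload wants 64-bit alignment
    int64_t integer;
} Payload;

typedef struct {
    uintptr_t header;   // platform-independent bits, incl. the HEART_BYTE
    void *extra;        // e.g. a "binding", usable regardless of payload
    Payload payload;    // read back via the same member that was written
} Cell;

static_assert(sizeof(Cell) == 4 * sizeof(void *), "cell is four pointers");
static_assert(offsetof(Cell, payload) % 8 == 0, "payload is 64-bit aligned");
```

Because the header and extra together take two pointers, the payload lands at offset 8 on 32-bit platforms and offset 16 on 64-bit ones, which gives the 64-bit alignment either way.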

The HEART_BYTE encodes what we would think of as the underlying datatype, and cues the interpretation of the contents of the cell. For instance the byte corresponding to a BLOCK! tells us that the payload consists of a pointer to an array of more cells as well as an index into the block.

Enter the QUOTE_BYTE

Bits in the header are scarce. And at one time, quoting was implemented by only two bits... for quoting levels of 0, 1, 2, or 3. Higher quoting levels were achieved by changing the HEART_BYTE to indicate QUOTED!, and then the payload was changed to point to a single-element array that held the quoted cell, along with an integer holding the quoting level (which could go up beyond millions). It was tricky to do, but it worked.

Eventually, the complex mechanics behind flipping to a different payload for higher levels of quoting were scrapped, and an entire byte in the header was sacrificed for the quote level. This QUOTE_BYTE permitted quoting levels from 0 to 255, and I decided that was more than enough.

When isotopes were originally introduced, there was a flag taken to say something was an antiform. However, I realized that something should not be quoted and be an antiform at the same time. Hence the antiform state could be thought of as a special value of the quote byte.

Initially I chose 255 for antiforms, leaving 0-254 as the ordinary quoting levels. But the theory of isotopes evolved to where not only were there antiforms, but there needed to be a form of quoting that would produce antiforms under evaluation...so-called quasiforms. And it began to make sense to think of the antiform state as being obviously "less" than other quoting levels, so it became 0.

But everything got bumped up by 1, when an even "lower" state was motivated. While antiforms are known for not being able to appear in lists, they can still be products of evaluation. But special states that can't be evaluative products were introduced, that could represent deeper properties of where a variable could be located...or if it couldn't be located at all (e.g. an alias, or a setter/getter function).

https://rebol.metaeducation.com/t/the-rules-of-duals/2640

QUOTE_BYTE becomes LIFT_BYTE: New Interpretation

What it worked out to was:

  • Lift byte of 0 is the "bedrock state": #define BEDROCK_0 0

  • Lift byte of 1 is an "antiform": #define ANTIFORM_1 1

  • Lift byte of 2 is plain not-quoted: #define NOQUOTE_2 2

  • Lift byte of 3 is a quasiform: #define QUASI_3 3

  • Lift byte of 4 is single quoted plain form: #define ONEQUOTE_NONQUASI_4 4

A lift byte of 5 is a single-quoted quasiform. That is, there's no such thing as a quasi-quoted, just a quoted-quasi: '~foo~ is legal but ~'foo~ is not.

So the interpretation of the LIFT_BYTE proceeds like that.

  • Lift byte of 6 is a double quoted plain form

  • Lift byte of 7 is a double quoted quasiform

  • Lift byte of 8 is a triple quoted plain form

  • Lift byte of 9 is a triple quoted quasiform

etc.
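The pattern in the list above can be decoded arithmetically. This is a sketch using the #define values given above; the two helper functions are hypothetical names, not Ren-C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BEDROCK_0 0
#define ANTIFORM_1 1
#define NOQUOTE_2 2
#define QUASI_3 3
#define ONEQUOTE_NONQUASI_4 4

// Hypothetical helper: number of quote marks for a non-antiform lift byte.
static unsigned quote_depth(uint8_t lift) {
    assert(lift >= NOQUOTE_2);  // bedrock and antiforms carry no quotes
    return (unsigned)(lift - NOQUOTE_2) / 2;
}

// Hypothetical helper: is there a quasiform underneath the quotes?
static bool is_quasi_under_quotes(uint8_t lift) {
    assert(lift >= NOQUOTE_2);
    return (lift & 1) != 0;  // odd bytes from 3 upward are quasiforms
}
```

So a byte of 5 decodes as depth 1 with a quasiform underneath ('~foo~), and a byte of 8 decodes as depth 3 with a plain form.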

The evaluator drops one level of quoting, with the base case that quasiforms produce antiforms and the plain form does whatever its evaluator rule is (WORD! looks up, etc.).

The QUOTE operator won't work on antiforms and the UNQUOTE operator won't work on quasiforms. Instead you have to use the LIFT and UNLIFT operations, which handle those exceptions but just act like QUOTE and UNQUOTE otherwise.
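Here is a sketch of how the four operations relate at the byte level, using the numbering given earlier. The function names and the bool-returning "try" style are assumptions for illustration. One thing the numbering makes visible: lifting an antiform (1) to a quasiform (3) is the same +2 step as adding a quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ANTIFORM_1 1
#define NOQUOTE_2 2
#define QUASI_3 3
#define ONEQUOTE_NONQUASI_4 4

// Hypothetical: QUOTE refuses antiforms (and bedrock), else adds a level.
static bool try_quote(uint8_t *lift) {
    assert(lift != 0);
    if (*lift < NOQUOTE_2 || *lift > 253) return false;
    *lift = (uint8_t)(*lift + 2);
    return true;
}

// Hypothetical: UNQUOTE needs a quote to remove, so it refuses quasiforms.
static bool try_unquote(uint8_t *lift) {
    assert(lift != 0);
    if (*lift < ONEQUOTE_NONQUASI_4) return false;
    *lift = (uint8_t)(*lift - 2);
    return true;
}

// Hypothetical: LIFT also accepts antiforms, turning them into quasiforms.
static bool try_lift(uint8_t *lift) {
    assert(lift != 0);
    if (*lift < ANTIFORM_1 || *lift > 253) return false;
    *lift = (uint8_t)(*lift + 2);
    return true;
}

// Hypothetical: UNLIFT also accepts quasiforms, producing antiforms.
static bool try_unlift(uint8_t *lift) {
    assert(lift != 0);
    if (*lift < QUASI_3) return false;  // plain forms can't be unlifted
    *lift = (uint8_t)(*lift - 2);
    return true;
}
```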

Bedrock states are manipulated with the TWEAK function (as they are beyond the reach of SET and GET). But if you TWEAK with a lifted value, it will behave the same as a SET of the unlifted value.

As things have "evolved", the HEART_BYTE is now multiplexed with the Cell's "SIGIL", chewing out two bits:

There are 4 sigil states (SIGIL_NONE, SIGIL_TIE, SIGIL_PIN, SIGIL_META), which take 2 bits out of the heart byte, leaving 6 bits for the "fundamental" types.
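A small sketch of that multiplexing. Which two bits hold the sigil, and the numeric order of the sigil states, are assumptions here (top two bits, in the order listed); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// ASSUMPTION: numeric values follow the order the states are listed in.
typedef enum {
    SIGIL_NONE = 0,
    SIGIL_TIE = 1,
    SIGIL_PIN = 2,
    SIGIL_META = 3
} Sigil;

// ASSUMPTION: the sigil lives in the top 2 bits, the heart in the low 6.
#define HEART_BITS 6
#define HEART_MASK 0x3Fu

static uint8_t pack_heart_and_sigil(uint8_t heart, Sigil sigil) {
    assert(heart <= HEART_MASK);  // only 6 bits available for the heart
    return (uint8_t)(((unsigned)sigil << HEART_BITS) | heart);
}

static uint8_t heart_of(uint8_t byte) {
    return (uint8_t)(byte & HEART_MASK);
}

static Sigil sigil_of(uint8_t byte) {
    return (Sigil)(byte >> HEART_BITS);
}
```

Six bits give 64 heart states; with 0 reserved for extension types, that leaves 63 fundamental ones.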

Thanks to extension types (HEART_BYTE() = 0, for good technical reasons) there are now as many types as one might want: "Extension Types" Implementation

Why Were SIGILs Generalized?

@bradrn always had it out for the non-generality... and I didn't disagree, just didn't know how to do any better.

As "meta-variables" became reimagined, (^foo: ...) could assign any antiform (including unstable ones), and (^foo) could fetch any antiform as well.

What's so special about 3 SIGILs? I don't know. Three is a magic number.

In any case, this reduces the number of fundamental types to 63. Due to CHAIN! replacing things like SET-WORD! or GET-PATH! etc, we're far from hitting that limit now.

I implemented an optimization here...

I bumped everything (except BEDROCK_0) up by 1.

Then I split the antiform lift byte, to UNSTABLE_ANTIFORM_1 and STABLE_ANTIFORM_2.

This makes the test for whether you need to decay much faster (important!), while testing for Is_Antiform() got a tiny bit slower (LIFT_BYTE() <= 2 instead of LIFT_BYTE() == 1, which may not even be slower).

Lifting things from antiform to non-antiform is the "same price": you check to see if things are greater than STABLE_ANTIFORM_2 and if so you add a quote level, otherwise both UNSTABLE_ANTIFORM_1 and STABLE_ANTIFORM_2 become QUASIFORM_4.

There's a slight added detail when transitioning things from (non-antiform => antiform), in that there's a bit more nuance that each type with an antiform has to set the byte according to stable or unstable. But there was already a switch() statement to do a bit of checks per-antiform type so this effectively adds nothing. Also, most antiform values are created directly in the antiform state and know what byte they should use.
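The tests described in this post can be sketched as below. UNSTABLE_ANTIFORM_1, STABLE_ANTIFORM_2 and QUASIFORM_4 are the names used above; NOQUOTE_3 and the function names are assumptions. Reading "needs decay" as "is an unstable antiform" is also an assumption, as is the idea that bedrock cells are filtered out before these checks run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BEDROCK_0 0
#define UNSTABLE_ANTIFORM_1 1
#define STABLE_ANTIFORM_2 2
#define NOQUOTE_3 3    // assumed name for the bumped-up plain state
#define QUASIFORM_4 4

// The hot test: a single byte comparison.
static bool needs_decay(uint8_t lift) {
    return lift == UNSTABLE_ANTIFORM_1;
}

// A range check instead of an equality check.
// ASSUMPTION: bedrock cells never reach this test.
static bool is_antiform(uint8_t lift) {
    assert(lift != BEDROCK_0);
    return lift <= STABLE_ANTIFORM_2;
}

// Lifting: both antiform bytes collapse to the quasiform, and everything
// else gains a quote level (+2).
static uint8_t lifted(uint8_t lift) {
    assert(lift != BEDROCK_0 && lift <= 253);
    if (lift > STABLE_ANTIFORM_2)
        return (uint8_t)(lift + 2);
    return QUASIFORM_4;
}
```

Going the other way (quasiform to antiform) is where the per-type choice of stable versus unstable has to be made, so it doesn't reduce to arithmetic on the byte alone.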

This packs quite a punch. It's the hardest-working byte in all of Rebol!

Reflecting on this optimization...

What I did was trade a quoting level to make a test faster. But this isn't the only test we might want to make faster.

Checking for null--for instance--could be faster if we dedicated a byte to that. NULL_ANTIFORM_3 (or whatever).

It loses a quote, but makes checking for null fast--you don't have to mask for "is it an antiform and is it a word and is it the symbol for null"

(That's accelerated by a flag already so it doesn't check the symbol--it's only a mask operation on the header... but checking just a byte is faster than a mask.)

We could go further. If we wanted Is_Block() to be fast, it could dedicate a lift byte to that.

So imagine if we said that quoting levels only got the 128 upper states. That's 64 levels of quoting (you have to have a bit for the quasiform status).

Then out of the 128 lower states of the LIFT_BYTE, 64 could be the fundamental undecorated types. There aren't a corresponding 64 antiforms... but the antiforms there are could be arranged such that the unstable ones were beneath the stable ones. No ideas off the top of my head on what to do with the other states...but, there'd be about 50 leftover.

(Update: ooh, this all applies to bedrock states too... another level in the strata. So parameter "HOLE" and "DRAIN" and "ALIAS" and such can all be their own lift bytes, too...faster recognition.)

This creates a sort of weird redundancy where for basic types their lift byte is their heart byte plus 64. But since the heart has to be right for the quoted/quasi levels I don't see much point in exploiting that because you'd throw off all the functions that check hearts for cell layout purposes.
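A speculative sketch of that partitioning, just to make the ranges concrete. The boundaries (upper 128 states for quoting, heart plus 64 for undecorated types) come from the description above; the placement of the quasi bit in the lowest bit, the enum, and all function names are assumptions, and how an unquoted quasiform fits in is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    LIFT_LOW_SPECIAL,     // 0..63: antiforms, bedrock states, spare
    LIFT_FUNDAMENTAL,     // 64..127: undecorated type, heart + 64
    LIFT_QUOTED_OR_QUASI  // 128..255: 64 levels times a quasi bit
} LiftClass;

static LiftClass classify_lift(uint8_t lift) {
    if (lift >= 128) return LIFT_QUOTED_OR_QUASI;
    if (lift >= 64) return LIFT_FUNDAMENTAL;
    return LIFT_LOW_SPECIAL;
}

// For an undecorated fundamental type, the heart is recoverable.
static uint8_t heart_of_fundamental(uint8_t lift) {
    assert(classify_lift(lift) == LIFT_FUNDAMENTAL);
    return (uint8_t)(lift - 64);
}

// ASSUMPTION: within the upper range, the low bit is the quasi status and
// the remaining 6 bits count levels 0..63.
static unsigned upper_level(uint8_t lift) {
    assert(lift >= 128);
    return (lift - 128u) >> 1;
}

static bool upper_quasi_bit(uint8_t lift) {
    assert(lift >= 128);
    return (lift & 1) != 0;
}
```

With this arrangement a check like Is_Block() would be one byte comparison against (heart of BLOCK! + 64).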

This sounds like a worthwhile tradeoff. 64 levels of quoting is a "big enough" number; I think that gives you headroom for most things (I once implemented an "arbitrary escape hatch" that did unlimited quoting, back when there were only 2 bits of quote space in the header...but abandoned it when I reworked things for LIFT_BYTE...it would be possible to do again but I'm not sure I want to encourage people to do arbitrary unary counting with quote levels.)

I'll take a quick crack at this and see how much it helps. I imagine it will be nontrivial.

I've spent a week working on epicycles of this change, and it has been VERY frustrating in terms of the performance outcomes. :pouting_cat: But it's been a good opportunity to simplify the code to allow changes to be experimented with more freely.

I think what's going to be fast or slow will depend a lot on your compiler (and machine architecture). If you're using something like TCC with no optimizations, one technique might be faster there. So it's good to write things in a way that makes changes easy.

Modern Optimizers Will Drive You Crazy

Today's fancy optimizers make it really hard to predict what will speed things up vs. slow them down. You might think you're saving by making an operation cheaper, but it turns out that if you've already done some more expensive operation the compiler might be able to use a subresult from that operation more effectively than the thing you lowered the cost of.

It's been so consistently the case that "improvements" have led to slowdowns that I frequently want to scream. It can be really unbelievable.

But... what I've done is broken apart the various states, so "Heart" and "Type" are not assumed to have overlapping byte values that can be converted via cast. You have to use Type_From_Heart() or Byte_From_Type() or Heart_From_Byte() etc.
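A toy sketch of why routing everything through conversion functions helps. The function names mirror the ones mentioned above in lowercase, but the signatures, the enum values, and the tiny two-type universe are all made up for illustration. The numbering of Heart and Type is deliberately different here, so a plain cast between them would give the wrong answer.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical values: the two enums intentionally do NOT line up.
typedef enum { HEART_INTEGER = 1, HEART_BLOCK = 2 } Heart;
typedef enum { TYPE_BLOCK = 10, TYPE_INTEGER = 11 } Type;

// Decode a raw byte into a Heart, checking it is one we know about.
static Heart heart_from_byte(uint8_t b) {
    assert(b == HEART_INTEGER || b == HEART_BLOCK);
    return (Heart)b;
}

// Map a Heart to its Type through a switch, never through a cast.
static Type type_from_heart(Heart h) {
    switch (h) {
      case HEART_INTEGER: return TYPE_INTEGER;
      case HEART_BLOCK: return TYPE_BLOCK;
    }
    assert(0);  // unreachable for the two hearts in this toy example
    return TYPE_INTEGER;
}

// Encode a Type as a byte; the only place that knows the numbering.
static uint8_t byte_from_type(Type t) {
    return (uint8_t)t;
}
```

If the numbering of either enum changes later, only these three functions need to be touched, which is the future-proofing being described.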

I had wanted to avoid this initially, but it's really the best answer for future-proofing the code.

KIND_BYTE => HEARTSIGIL_BYTE

At this point the LIFT_BYTE actually encodes what can be thought of as the TYPE... just such that there isn't a single canonized state for quoting. It might seem tempting to call it the TYPE_BYTE, but this would be confusing since things like TYPE_BLOCK and TYPE_INTEGER are the type enum. So I'm keeping the name as LIFT_BYTE.

I renamed KIND_BYTE to HEARTSIGIL_BYTE, because "KIND" didn't really explain anything, while "HEARTSIGIL" tells you exactly what it is (2 bits for the Sigil, 6 bits for the Heart).

Also, I pared back some of the C++-isms

It's important to use things like enum class in the debug build to be more type safe (so the enums aren't compatible with each other or integers, the way default C enums are). And it's also very important to rig things up so that the "0 state" of Type or Heart can't be compared with another 0 state and assumed equal (a 0 means it's an extension type, and the actual type is stored in another Cell slot... you don't want any two extension types to compare equal).

So that's all kept. But I had tried to be clever with operator overloading so that reassigning things like the LIFT_BYTE(X) = Y ran overloaded C++ code. That makes less sense than just making a function like Tweak_Lift_Byte(X, Y), which also allows arbitrary processing (e.g. Y might not be a byte, but a larger enum value masked in).

This also means C builds can run the extra debug checks if they want, so it just makes more sense than doing operator overloading magic.
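As a sketch of the function-style approach: the struct, the protection flag, and the signature below are all hypothetical stand-ins, not the real Tweak_Lift_Byte(). It shows the two points made above: the value passed can be wider than a byte and get masked in, and the check is plain C, so a C debug build runs it too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a cell header.
typedef struct {
    uint8_t heartsigil;
    uint8_t lift;
    bool is_protected;  // made-up flag, just to give the check a job
} MiniCell;

// Hypothetical function form of "LIFT_BYTE(cell) = value".
static void tweak_lift_byte(MiniCell *cell, unsigned wide_value) {
    assert(cell != NULL);
    assert(!cell->is_protected);  // debug-only check, works in plain C
    cell->lift = (uint8_t)(wide_value & 0xFFu);  // keep only the low byte
}
```

Calling tweak_lift_byte(&c, 0x0104u) stores 4 in the lift byte and leaves the rest of the cell alone.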