"Extension Types" Implementation

Where can I find the discussion about the new implementation of user-defined datatypes?

The new thing that has come on the scene isn't what I'd really call "user defined datatypes" as much as "extension defined datatypes". It's for C programmers to implement types like IMAGE! or GOB! with a DLL or statically linked module...without those being built in a priori.

The feature's goal was to get past a historical property that limited Rebol to 64 built-in datatypes, which had to be named in the core interpreter and could not be changed or extended. Ren-C wanted to be much more modular...to avoid carrying the weight of things like GOB! to the JavaScript build (or a redundant IMAGE! datatype that was handled by a browser's canvas.) Then the web build could choose its own extension types, perhaps some kind of JAVASCRIPT-OBJECT! proxy or a CANVAS!, etc.

This was needed while breaking the project up into independently selectable extensions--of which there are now 31. See the README.md for a few notes:

https://github.com/metaeducation/ren-c/tree/master/extensions

(At this time, the web build uses only JavaScript, Console, and Debugger.)

Implementation Details

A "value cell" in Rebol and Red is four platform pointers in size. Of these, the first platform pointer slot is used as bits for a "header". How the other three pointers are interpreted depends on a byte in that header...which was called the VAL_TYPE() in R3-Alpha (though Ren-C calls this the "cell heart").

Of this byte's possible states, only 64 are used in R3-Alpha--and, I believe, in Red. That number was chosen instead of 256 so that a TYPESET! needs only 64 bits, one per kind...making typesets small enough to fit in the rest of the cell. Ren-C has broken this barrier, with something nearer to 256 fundamental types (plus built-in typesets), but that's still just a finite number of built-in things.
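To make the 64-bit reasoning concrete, here is a rough sketch of a typeset as a bitset with one bit per datatype kind. The names (`Typeset`, `typeset_bit`, `typeset_has`) and the kind numbers are made up for illustration; this is not R3-Alpha's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not R3-Alpha's actual definitions.
using Typeset = std::uint64_t;  // one bit per datatype kind, kinds 0..63

// A 64-bit set fits in the three non-header slots of a cell, even when
// platform pointers are only 32 bits wide (3 * 4 = 12 bytes).
static_assert(sizeof(Typeset) <= 3 * sizeof(void*), "typeset fits in payload");

// Bit mask for a single kind; `kind` must be below 64.
constexpr Typeset typeset_bit(unsigned kind) {
    return Typeset{1} << kind;
}

// True if `kind` is a member of the set; `kind` must be below 64.
constexpr bool typeset_has(Typeset ts, unsigned kind) {
    return ((ts >> kind) & 1u) != 0;
}

// Small usage demo with made-up kind numbers.
inline void typeset_demo() {
    const Typeset any_number = typeset_bit(2) | typeset_bit(3);
    assert(typeset_has(any_number, 3));
    assert(!typeset_has(any_number, 10));
}
```

With 256 kinds the same scheme would need 256 bits, which no longer fits in three pointer-sized slots on any common platform.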

So "extension types" are a primordial implementation of a strategy to reserve one heart byte to mean "this cell gives up one of its three non-header platform-pointer-units to be a heap pointer to information about its type and its behaviors". That allows an arbitrary number of these to be added. They can't pack quite as much data into their cells as the built-in types, since they only have two pointers instead of three to work with. But given that you can always point to some allocated data (and usually need to), it's not a big problem.
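As a rough picture of that trade-off, the sketch below lays out a four-pointer cell where a reserved heart byte switches the payload from three free slots to one type pointer plus two free slots. Everything here is an assumption for illustration: the names, the reserved value 0xFF, and keeping the heart byte in the low bits of the header are not Ren-C's actual layout.

```cpp
#include <cassert>
#include <cstdint>

// All names here are hypothetical; this is not Ren-C's actual cell layout.
constexpr std::uint8_t HEART_EXTENSION_SKETCH = 0xFF;  // assumed reserved value

// Heap-allocated record describing one extension type.
struct ExtensionTypeInfo {
    const char* name;
};

struct Cell {
    std::uintptr_t header;  // assumption: heart byte kept in the low 8 bits
    union {
        // Built-in types get all three payload slots.
        struct { std::uintptr_t slot1, slot2, slot3; } builtin;
        // Extension types give up one slot to point at their type record.
        struct { const ExtensionTypeInfo* type; std::uintptr_t slot1, slot2; } extension;
    } payload;
};

// Holds on mainstream platforms, where uintptr_t is pointer-sized.
static_assert(sizeof(Cell) == 4 * sizeof(void*), "cell is four platform pointers");

inline std::uint8_t heart_of(const Cell& c) {
    return static_cast<std::uint8_t>(c.header & 0xFF);
}

inline Cell make_extension_cell(const ExtensionTypeInfo* info,
                                std::uintptr_t a, std::uintptr_t b) {
    Cell c{};
    c.header = HEART_EXTENSION_SKETCH;
    c.payload.extension.type = info;
    c.payload.extension.slot1 = a;
    c.payload.extension.slot2 = b;
    return c;
}

// Returns the type record for extension cells, or nullptr for built-ins.
inline const ExtensionTypeInfo* extension_type_of(const Cell& c) {
    return heart_of(c) == HEART_EXTENSION_SKETCH ? c.payload.extension.type
                                                 : nullptr;
}
```

In practice one of the two remaining slots would usually point at allocated data, which is why losing the third slot costs little.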

Open Questions

How datatypes will participate in a naming ecology is not known. Right now the theory is that they register via a URL!. That is to say that type of foo could come back as something including http://example.com/types/matrix. While that's a bit drawn out, one idea that came up in error IDs was that there might be a form of comparison function that lets you get as specific as you want about that... e.g.

>> /matrix submatches http://example.com/types/matrix
== #[true]

>> /types/matrix submatches http://example.com/types/matrix
== #[true]
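Since SUBMATCHES is only an idea at this point, its semantics are not pinned down. One possible reading of the two examples above is "the path matches if it is a suffix of the URL, starting at a slash". A sketch of that reading, where the function and its rules are my assumption rather than an existing native:

```cpp
#include <cassert>
#include <string>

// Hypothetical reading of SUBMATCHES: `tail` must begin with '/' and be a
// suffix of `url`. Because the tail starts with a slash, a match always
// begins at a path-segment boundary.
inline bool submatches(const std::string& tail, const std::string& url) {
    if (tail.empty() || tail[0] != '/')
        return false;
    if (tail.size() > url.size())
        return false;
    return url.compare(url.size() - tail.size(), tail.size(), tail) == 0;
}

// Mirrors the two console examples, plus two non-matches.
inline void submatches_demo() {
    const std::string url = "http://example.com/types/matrix";
    assert(submatches("/matrix", url));
    assert(submatches("/types/matrix", url));
    assert(!submatches("/vector", url));
    assert(!submatches("/atrix", url));  // not at a segment boundary
}
```

The longer the path you give, the more specific the match, which is the "as specific as you want" property.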

There's still plenty to worry about. But the first-tier goal has been achieved: being able to build variants of Rebol without GOB! or IMAGE! or VECTOR! or STRUCT! (or mentioning them in the built-in type table), while still keeping all those features working.

I have not logged in since last year, but I have been reading about the progress here without logging in. That is why I had not seen your reply. I'll read it this week, thanks.

Due to the type system being a mess and needing rethinking, I had taken all of the extension type implementations (IMAGE!, GOB!, STRUCT! etc.) out of the main repository and moved them into their own GitHub projects.

(I'd wanted to do that anyway--because I wanted separate issue trackers for each extension, distinct from the core.)

Then I ripped out the first attempt at extension types because it was very messy.

But now we have a big advancement, in terms of a unified model of Generic Dispatch which uses the same methodology to dispatch things like COMPARE, MAKE, MOLD, and everything else. This uniformity makes it easier to make an extension type conform to the interface, because there's only one interface to conform to.
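To illustrate why one interface helps, here is a toy model of uniform dispatch: a single table keyed by (generic, type), so an extension registers its handlers the same way for every generic. This is only a sketch of the general idea. The class, the string keys, and the molded output format are invented for the example and say nothing about how Ren-C actually stores or looks up its dispatchers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: one table serves every generic (COMPARE, MOLD, ...).
using Handler = std::function<std::string(const std::string&)>;

class GenericRegistry {
public:
    // An extension calls this once per generic it wants to support.
    void add(const std::string& generic, const std::string& type, Handler handler) {
        table_[std::make_pair(generic, type)] = std::move(handler);
    }

    // Returns false when no handler is registered for this pair.
    bool dispatch(const std::string& generic, const std::string& type,
                  const std::string& arg, std::string& out) const {
        auto it = table_.find(std::make_pair(generic, type));
        if (it == table_.end())
            return false;
        out = it->second(arg);
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, Handler> table_;
};

inline void dispatch_demo() {
    GenericRegistry registry;
    registry.add("mold", "image!", [](const std::string& arg) {
        return "#[image! " + arg + "]";  // made-up molded form
    });

    std::string out;
    assert(registry.dispatch("mold", "image!", "2x2", out));
    assert(out == "#[image! 2x2]");
    assert(!registry.dispatch("compare", "image!", "2x2", out));  // not registered
}
```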

So I'm starting to return to putting the implementation in.

One thing I'm doing differently this time is I'm making a special choice of the HEART_BYTE() value to indicate custom types.

I'm using zero.

Why is it important to use zero? Well, because I have a C++ class for encoding optionality, like std::optional (or Rust's Option, or Haskell's Maybe). This class protects you at compile-time from trying to pass optional things to places that expect the thing to be there.

But in order to compile to C, the state used has to be C's idea of "falsey" when not under the C++ rules. That means null pointers or 0 values.

It's hard to stress how important it is when dealing with a feature like this to get good solid checking on when you are dealing with an extension type and when you aren't. You also need protections, like a protection against comparison:

if (Type_Of(cell1) == Type_Of(cell2))
    return "They're equal types!";  // ...OR NOT, if distinct extension types!

When you are comparing two bytes and one of the bytes encodes "it's custom, and you have to look elsewhere in the cell for the specific extension type", then you need to stop that kind of code from being written at compile-time. C++ can do this by deleting the overload when optional types try to compare:

#if CHECK_OPTIONAL_TYPEMACRO
    bool operator==(Option(Type)& a, Option(Type)& b) = delete;
    bool operator!=(Option(Type)& a, Option(Type)& b) = delete;
#endif

So what's neat is that in the C build, Option(Type) is a macro that just turns into Type. And you can test it for falseyness, with the code working just the same...without the compile-time checks.
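For readers who want to see the mechanics, below is a stripped-down sketch of the pattern. It is not the actual Ren-C code: `OptionHolder`, `SKETCH_CHECK_OPTIONAL`, `unwrap_opt`, and the enum values are hypothetical stand-ins, and the real class does more. The point is only that the zero state is falsey in both builds, while the checked build refuses to compile a direct comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum; zero means "extension type, look elsewhere in the cell".
enum Type : std::uint8_t {
    TYPE_0 = 0,
    TYPE_INTEGER = 1,
    TYPE_BLOCK = 2
};

#define SKETCH_CHECK_OPTIONAL 1  // set to 0 to mimic the plain C build

#if SKETCH_CHECK_OPTIONAL
    template <typename T>
    class OptionHolder {
    public:
        OptionHolder(T v) : value_(v) {}
        // Falsey exactly when the wrapped value is the zero state.
        explicit operator bool() const { return value_ != T(); }
        T unwrap() const { assert(value_ != T()); return value_; }
    private:
        T value_;
    };

    // Writing `a == b` on two optionals is rejected at compile time.
    template <typename T>
    bool operator==(const OptionHolder<T>&, const OptionHolder<T>&) = delete;
    template <typename T>
    bool operator!=(const OptionHolder<T>&, const OptionHolder<T>&) = delete;

    #define Option(T) OptionHolder<T>
    #define unwrap_opt(x) ((x).unwrap())
#else
    #define Option(T) T
    #define unwrap_opt(x) (x)
#endif

inline Option(Type) type_of_heart(std::uint8_t heart) {
    return static_cast<Type>(heart);
}

// Only answers for built-in types; extension types need a lookup in the cell.
inline bool same_builtin_type(Option(Type) a, Option(Type) b) {
    if (!a || !b)
        return false;
    return unwrap_opt(a) == unwrap_opt(b);
}
```

Flipping `SKETCH_CHECK_OPTIONAL` to 0 leaves `same_builtin_type` unchanged and behaving the same, it just loses the protection against someone writing `a == b` directly.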

It might sound like a small thing to change the byte to be 0 for extension types, but it makes a tremendous difference for problems that I was seeing in the code when it was a TYPE_CUSTOM that you had to remember to check for. This gives a big leg up.

Anyway, while the IMAGE! and VECTOR! and FFI! code are all fairly messy and outdated at this point, it would be nice to bring them back into the fold just to show that the system can be extended. So I'm working with IMAGE! at the moment as a first extension type to bring back.
