Advancements in CastHelper for Hooked Casts

Progress has been fairly amazing on the debug checks for casts.

Writing hooks for the casts used to mean writing code that was very hard for a non-C++ programmer to read...but the "user experience" is now very simple!

Just to give one example, consider casting to an "Element" (e.g. an Array element, something that can live in a BLOCK! or GROUP!).

As a reminder of what the idea is: anywhere in the source where you write either:

cast(Element*, ...)
cast(const Element*, ...)

In an ordinary C build, it will just act like:

(Element*)(...)
(const Element*)(...)

But in an instrumented build using C++, it's possible to add checks. These checks can be at compile-time (prohibiting conversions of some combination of types)... or they can be at runtime, validating the bit patterns of the thing being converted.
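As a rough sketch of that split (not the actual Ren-C definitions), the macro could be wired something like this. Only cast() and CastHelper are names from this post; Hooked_Cast, Widget, and its magic field are hypothetical and made up for illustration.

```cpp
#include <cassert>

struct Widget { int magic; };  // hypothetical type with a validity marker

#if defined(__cplusplus)
    // Default rule (assumed): with no hook written, just do the plain cast.
    template<typename From, typename To>
    struct CastHelper {
        static To convert(From p) { return (To)(p); }
    };

    // A runtime-checked rule for one specific conversion: void* -> Widget*
    template<>
    struct CastHelper<void*, Widget*> {
        static Widget* convert(void* p) {
            Widget* w = (Widget*)(p);
            assert(w->magic == 0x1234);  // validate the bits being converted
            return w;
        }
    };

    // Hypothetical dispatcher: deduce the FROM type, then look up the rule.
    template<typename To, typename From>
    To Hooked_Cast(From p) { return CastHelper<From, To>::convert(p); }

    #define cast(T, expr)  Hooked_Cast<T>(expr)
#else
    #define cast(T, expr)  ((T)(expr))  // plain C build: no checks at all
#endif
```

With this, cast(Widget*, some_void_pointer) runs the assert in a C++ build and costs nothing in a C build, while conversions with no hook fall through to the default rule.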

For Element, we have:

DECLARE_C_TYPE_LIST(g_convertible_to_cell,
    Cell, Atom, Element, Value,
    Pairing,
    Node, Byte, char, void
);

template<typename F>
struct CastHelper<const F*, const Element*> {
  static const Element* convert(const F* p)
  {
    STATIC_ASSERT(In_C_Type_List(g_convertible_to_cell, F));

    const Cell* c = u_cast(const Cell*, p);
    Assert_Cell_Readable(c);
    assert(LIFT_BYTE(c) != ANTIFORM_0);
    return u_cast(const Element*, c);
  }
};

That's extremely easy to read!

  • We are making sure that the LIFT_BYTE() is not 0, hence not an antiform.

  • We're also checking that it's a valid readable cell (e.g. the bits have NODE_FLAG_NODE and NODE_FLAG_CELL, and NODE_FLAG_UNREADABLE is not set).

  • Plus, at compile-time, it's stopping you from all manner of accidental casts that would make Cells from things that can't make sense as Cells.

What template<typename F> means is that this is a "wildcard" pattern-matching rule, that the compiler will try to match against any pointer to F. The name F is arbitrary, but chosen to represent "FROM", e.g. the datatype we are converting from.

(It would be possible to use more than one wildcard, e.g. template<typename F, typename T> and match patterns in both the "TO" and the "FROM". But here we're fixed as defining conversions TO an Element*, so there's no second parameter to the template.)
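To make the pattern matching concrete, here is a small sketch of how the compiler picks between a default rule, a two-wildcard rule, and the one-wildcard rule fixed on Element. The empty Element and Other structs and the rule numbers are hypothetical stand-ins, there only to show which pattern wins.

```cpp
#include <cassert>

struct Element {};  // hypothetical stand-in for the real Element
struct Other {};    // hypothetical unrelated type

// Default rule: matches any FROM/TO pair that nothing more specific matches.
template<typename From, typename To>
struct CastHelper { static constexpr int rule = 0; };

// Two wildcards: any const F* converted to any const T*.
template<typename F, typename T>
struct CastHelper<const F*, const T*> { static constexpr int rule = 2; };

// One wildcard: any const F* converted to exactly const Element*.
// This is more specialized than the two-wildcard rule, so it wins.
template<typename F>
struct CastHelper<const F*, const Element*> { static constexpr int rule = 1; };

// Extra parentheses keep the template commas from splitting assert()'s
// macro argument.
void Check_Rule_Selection() {
    assert((CastHelper<const char*, const Element*>::rule == 1));
    assert((CastHelper<const char*, const Other*>::rule == 2));
    assert((CastHelper<int, long>::rule == 0));
}
```

The compiler always prefers the most specialized pattern that matches, which is why fixing the TO side as const Element* overrides the broader rule.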

Write Just The Const Casts, Works For Mutable

I hammered on this until I could get it to where you could just write the const casts, and it will take care of the mutable form casts (including blocking casting away constness)... running through the same code. So you don't have to write two entry points and do the piping through common code yourself.
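One way this could be wired, as a guess at the mechanism rather than the real implementation: the dispatcher adds const to both sides, runs the single const hook, and then restores the requested constness on the result. Hooked_Cast, g_hook_calls, and the one-field Element are hypothetical names for the sake of the sketch.

```cpp
#include <cassert>
#include <type_traits>

struct Element { unsigned char lift; };  // hypothetical stand-in

static int g_hook_calls = 0;  // only here so the demo can observe the hook

// Default rule (assumed): plain cast when no hook is written.
template<typename From, typename To>
struct CastHelper {
    static To convert(From p) { return (To)(p); }
};

// The only hook the user writes: const in, const out.
template<typename F>
struct CastHelper<const F*, const Element*> {
    static const Element* convert(const F* p) {
        ++g_hook_calls;
        const Element* e = (const Element*)(p);
        assert(e->lift != 0);
        return e;
    }
};

// Hypothetical dispatcher: mutable and const casts share the const hook.
template<typename To, typename F>
To Hooked_Cast(F* p) {
    static_assert(std::is_pointer<To>::value, "sketch handles pointers only");
    using T = typename std::remove_pointer<To>::type;
    static_assert(
        std::is_const<T>::value || !std::is_const<F>::value,
        "cast() may not cast away constness"
    );
    using ConstF = typename std::add_const<F>::type;
    using ConstT = typename std::add_const<T>::type;
    ConstT* result = CastHelper<ConstF*, ConstT*>::convert(p);
    return const_cast<To>(result);  // give back the constness that was asked for
}
```

Here Hooked_Cast<Element*>(mutable_ptr) and Hooked_Cast<const Element*>(const_ptr) both run the same hook body, while Hooked_Cast<Element*>(const_ptr) fails to compile on the second static_assert.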

I'm on the fence on whether it's worth it to put in wiring to make it possible to separately hook mutable casts--to make sure the bit patterns you have are legal for a mutable pointer. That's not really needed given the protection against casting away constness, and in the cases where you do cast mutably from raw pointers you are probably doing that on purpose. But the default definitely needs to be that if you write just the const casts, you hook all of the casts for that pattern match. That is what you want 99% of the time...it shouldn't be laborious.

C++ Is Minimized: DECLARE_C_TYPE_LIST()

Without DECLARE_C_TYPE_LIST() and In_C_Type_List(), this would look like:

using g_convertible_to_cell = CTypeList<
    Cell, Atom, Element, Value,
    Pairing,
    Node, Byte, char, void
>;

STATIC_ASSERT((g_convertible_to_cell::contains<F>{}));

It's quirky--including the quirk that you can't call the STATIC_ASSERT() macro on expressions which contain templating <...> markers unless they're wrapped in an extra set of parentheses. The alternative is to use static_assert() directly, which requires two arguments (a condition and a message string) in C++11.
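For the curious, here is one way the pieces behind those macros could be built in C++11. This is a sketch under assumptions, not the real definitions: Is_In_List, g_demo_list, and Checked_To_Int are invented names, and the macro bodies are my guess at what such wrappers might expand to.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: is T one of the types in Ts...?
template<typename T, typename... Ts>
struct Is_In_List : std::false_type {};  // empty list: not found

template<typename T, typename First, typename... Rest>
struct Is_In_List<T, First, Rest...>
  : std::conditional<
        std::is_same<T, First>::value,
        std::true_type,
        Is_In_List<T, Rest...>
    >::type {};

template<typename... Ts>
struct CTypeList {
    template<typename T>
    using contains = Is_In_List<T, Ts...>;
};

// Assumed macro bodies. The parentheses added inside the macros are what
// let the call sites stay free of C++ punctuation.
#define STATIC_ASSERT(cond)  static_assert((cond), #cond)
#define DECLARE_C_TYPE_LIST(name, ...)  using name = CTypeList<__VA_ARGS__>
#define In_C_Type_List(list, T)  (list::contains<T>{})

DECLARE_C_TYPE_LIST(g_demo_list, int, char, void);

// The preprocessor splits macro arguments on commas that are not inside
// parentheses, and <...> does not count as parentheses. So
// STATIC_ASSERT(std::is_same<int, int>::value) would be seen as two
// arguments; the extra set of parentheses fixes it:
STATIC_ASSERT((std::is_same<int, int>::value));

template<typename F>
const int* Checked_To_Int(const F* p) {
    STATIC_ASSERT(In_C_Type_List(g_demo_list, F));  // compile-time filter
    assert(p != nullptr);                           // runtime check
    return (const int*)(p);
}
```

Calling Checked_To_Int() with a const double* would fail to compile, because double is not in g_demo_list.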

Also Less C++: u_cast()

You don't want to run the cast hooks while implementing a cast hook! Originally I used reinterpret_cast in this code, so it looked like:

    const Cell* c = reinterpret_cast<const Cell*>(p);
    Assert_Cell_Readable(c);
    assert(LIFT_BYTE(c) != ANTIFORM_0);
    return reinterpret_cast<const Element*>(c);

But the casting system offers u_cast() as an "unchecked" cast that never runs the hooks (but is easier to spot as being a cast than the parentheses cast it expands to).

It makes it shorter and less scary to use u_cast(), and that's what's used in the rest of the codebase to implement unchecked casts. So that helps make the code look more like the rest of the C.

Should The template<> be Abstracted Away, Too?

I don't think so.

I think that goes into the realm of pandering a bit too much to C fraidy-cats. It's tougher to abstract and I think that token-for-token, it gets it right.

The value here is apparent--and I think I've done as much pandering as is appropriate. What's left is legitimate C++ that matches the essential complexity of the problem. (It's not even duplicating the signature of the types needlessly, as you might use reference types in those positions.)

In Total, This Is Awesome Stuff

It's not just about the common casts. This gives you surgical precision if you're facing a particular debugging problem that's narrowed down, and you want to write a bit of custom instrumentation just to catch the problem you're working on.
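As an illustration of that kind of one-off instrumentation, here is a sketch of a hook that traps one suspicious address whenever it gets cast to a particular type. Everything here is hypothetical (Stub, g_watch_address, g_watch_hits); it just shows the shape such a temporary hook might take.

```cpp
#include <cassert>

struct Stub { int flags; };  // hypothetical type being debugged

static const void* g_watch_address = nullptr;  // address under suspicion
static int g_watch_hits = 0;

// Default rule (assumed): plain cast when no hook is written.
template<typename From, typename To>
struct CastHelper {
    static To convert(From p) { return (To)(p); }
};

// Temporary debugging hook: every cast to const Stub* passes through here.
template<typename F>
struct CastHelper<const F*, const Stub*> {
    static const Stub* convert(const F* p) {
        assert(p != nullptr);
        if (g_watch_address != nullptr && (const void*)(p) == g_watch_address)
            ++g_watch_hits;  // a convenient place for a breakpoint
        return (const Stub*)(p);
    }
};
```

Set g_watch_address from the debugger or from a test, and every cast of that pointer to a Stub lands on the marked line, with the call stack showing who did the cast.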

Ren-C is able to stay robust despite being a very "Amish" codebase, due to having many of these kinds of features to keep the trains running on time.


I'll mention that I'm factoring this stuff out (along with the contravariance mechanics, and other things) as a library which other C projects could use.

But also... so it can have its own set of regression tests. Because currently the only "test" is having it build the Ren-C codebase. So if I tweak it and something stops working (e.g. it doesn't run a cast hook that it should) I have no real way of finding out about that, besides setting breakpoints in the hooks and seeing if they're hit or not.
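A regression test for "does the hook actually run?" could be as simple as counting hook invocations. The following is only a sketch of the idea, with hypothetical names (Hooked_Cast, g_element_hook_runs, Run_Cast_Hook_Tests) and a simplified one-field Element; the macro wiring is assumed, not the library's.

```cpp
#include <cassert>

struct Element { unsigned char lift; };  // hypothetical stand-in

static int g_element_hook_runs = 0;

// Default rule (assumed): plain cast when no hook is written.
template<typename From, typename To>
struct CastHelper {
    static To convert(From p) { return (To)(p); }
};

template<typename F>
struct CastHelper<const F*, const Element*> {
    static const Element* convert(const F* p) {
        assert(p != nullptr);
        ++g_element_hook_runs;  // lets the test observe that the hook ran
        return (const Element*)(p);
    }
};

template<typename To, typename From>
To Hooked_Cast(From p) { return CastHelper<From, To>::convert(p); }

#define cast(T, expr)    Hooked_Cast<T>(expr)
#define u_cast(T, expr)  ((T)(expr))

// Returns the number of failed expectations (0 means all passed).
int Run_Cast_Hook_Tests() {
    int failures = 0;
    Element e = {1};
    const void* raw = &e;

    g_element_hook_runs = 0;
    const Element* checked = cast(const Element*, raw);
    if (g_element_hook_runs != 1) ++failures;  // hook must run exactly once
    if (checked != &e) ++failures;

    g_element_hook_runs = 0;
    const Element* unchecked = u_cast(const Element*, raw);
    if (g_element_hook_runs != 0) ++failures;  // u_cast must bypass the hook
    if (unchecked != &e) ++failures;

    g_element_hook_runs = 0;
    const char* bytes = cast(const char*, raw);
    if (g_element_hook_runs != 0) ++failures;  // unrelated cast: no Element hook
    if ((const void*)bytes != raw) ++failures;

    return failures;
}
```

A tweak that silently stops a hook from firing would then show up as a nonzero return value instead of needing a breakpoint to notice.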

Library Name: "Needful"

Because it has wrappers like Sink(T), Need(T), Init(T)... ChatGPT had a funny suggestion of calling the library "Needful", as in the amusing Indian-English phrase "do the needful".

I legitimately think this could be helpful for many codebases that have to build as C, for embedded purposes/etc. A C++ compiler is a static analysis tool you almost certainly already have around, and there's nothing special to set up or install or configure.

It needs a good marketing phraseology...

"Needful: The library that does nothing in your C programs (and you NEED it!)"

:-)
