If R3-Alpha REDUCE were in Ren-C Today

If you look at the R3-Alpha implementation of REDUCE, it has two parts.

One part is a spec in a natives list (in %natives.r):

reduce: native [
    {Evaluates expressions and returns multiple results.}
    value
    /no-set {Keep set-words as-is. Do not set them.}
    /only {Only evaluate words and paths, not functions}
    words [block! none!] {Optional words that are not evaluated (keywords)}
    /into {Output results into a block with no intermediate storage}
    out [any-block!]
]

The second part is the C code (in %n-control.c):

/***********************************************************************
**
*/	REBNATIVE(reduce)
/*
***********************************************************************/
{
    if (IS_BLOCK(D_ARG(1))) {
        REBSER *ser = VAL_SERIES(D_ARG(1));
        REBCNT index = VAL_INDEX(D_ARG(1));
        REBVAL *val = D_REF(5) ? D_ARG(6) : 0;

        if (D_REF(2))
            Reduce_Block_No_Set(ser, index, val);
        else if (D_REF(3))
            Reduce_Only(ser, index, D_ARG(4), val);
        else
            Reduce_Block(ser, index, val);
        return R_TOS;
    }

    return R_ARG1;
}

Let's Reimagine This The Ren-C Way...

I'll start out by just doing the mechanical and aesthetic improvements to the build process, naming, and conventions.

Ready?


D_ARG(n) => ARG(NAME) (At Zero Runtime Cost)

:one: :man_mage:

/***********************************************************************
**
*/	REBNATIVE(reduce)
/*
***********************************************************************/
{
    INCLUDE_PARAMS_OF_REDUCE;

    if (IS_BLOCK(ARG(VALUE))) {
        REBSER *ser = VAL_SERIES(ARG(VALUE));  // i.e. not VAL_SERIES(D_ARG(1))
        REBCNT index = VAL_INDEX(ARG(VALUE));
        REBVAL *val = REF(INTO) ? ARG(OUT) : 0;

        if (REF(NO_SET))
            Reduce_Block_No_Set(ser, index, val);
        else if (REF(ONLY))
            Reduce_Only(ser, index, ARG(WORDS), val);
        else
            Reduce_Block(ser, index, val);
        return R_TOS;
    }

    return R_ARG1;
}

The first clarifying change is to have the make prep step automatically generate a macro for each native (e.g. INCLUDE_PARAMS_OF_REDUCE). When you put it in the body of the function, it expands to several DECLARE_PARAM() lines, which declare constants that the ARG() and REF() macros can find.

These INCLUDE_PARAMS_OF definitions look like this:

#define INCLUDE_PARAMS_OF_REDUCE \
    DECLARE_PARAM(VALUE, 1); \
    DECLARE_PARAM(NO_SET, 2); \
    DECLARE_PARAM(ONLY, 3); \
    DECLARE_PARAM(WORDS, 4); \
    DECLARE_PARAM(INTO, 5); \
    DECLARE_PARAM(OUT, 6)

DECLARE_PARAM makes a local enum definition:

#define DECLARE_PARAM(name,n) \
    enum { param_##name##_ = (n) }  // enums force compile-time const

Then we change:

#define D_ARG(n)  (ds+(DSF_SIZE+n))

To be:

#define ARG(name)  (ds + (DSF_SIZE + param_##name##_))

Note that these constants have no runtime cost--they are exactly as cheap as the hardcoded integers!

They can't be macros themselves (since they're scoped per-function). And const int might not be optimized out by a bad compiler--or even a good one at some optimization levels. So they're implemented as local enum declarations, which are guaranteed (for all practical purposes in any compiler you'd ever actually encounter) to act just like you typed the number directly into source.
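To make the mechanism concrete, here is a minimal standalone sketch of the enum trick. The REBVAL layout, the DSF_SIZE value, the INCLUDE_PARAMS_OF_DEMO macro and the function names are stand-ins invented for illustration; only the DECLARE_PARAM and ARG definitions mirror the ones above.

```c
#include <assert.h>

typedef struct { int payload; } REBVAL;  // stand-in cell, not the real layout

#define DSF_SIZE 3  // hypothetical frame-header size, chosen for the demo

// Each DECLARE_PARAM becomes an enum constant scoped to the function body.
#define DECLARE_PARAM(name, n) \
    enum { param_##name##_ = (n) }

// ARG() pastes the name back together; the offset is a compile-time constant.
#define ARG(name)  (ds + (DSF_SIZE + param_##name##_))

#define INCLUDE_PARAMS_OF_DEMO \
    DECLARE_PARAM(VALUE, 1); \
    DECLARE_PARAM(OUT, 2)

// Adds the payloads of two arguments that are fetched by name.
static int Sum_Demo_Args(REBVAL *ds) {
    INCLUDE_PARAMS_OF_DEMO;

    // An enum constant is an integer constant expression, so it can size an
    // array. A negative size here would be a compile-time error.
    char proof[param_OUT_ == 2 ? 1 : -1];
    (void)proof;

    return ARG(VALUE)->payload + ARG(OUT)->payload;
}

// A different function can reuse the name OUT with a different number.
static int First_Of_Other(REBVAL *ds) {
    DECLARE_PARAM(OUT, 1);
    return ARG(OUT)->payload;
}

// Builds a fake frame and reads it back through both functions.
int demo_named_args(void) {
    REBVAL frame[8] = {{0}};
    frame[DSF_SIZE + 1].payload = 40;
    frame[DSF_SIZE + 2].payload = 2;
    return Sum_Demo_Args(frame) * 100 + First_Of_Other(frame);
}
```

Because the names live in block scope, two natives can both have an OUT parameter at different positions without clashing, which a file-level #define could not do.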


Now Let's Put The Spec In The C File

:two: :notebook:

//
//  reduce: native [
//      {Evaluates expressions and returns multiple results.}
//      value
//      /no-set {Keep set-words as-is. Do not set them.}
//      /only {Only evaluate words and paths, not functions}
//      words [block! none!] {Optional words that are not evaluated (keywords)}
//      /into {Output results into a block with no intermediate storage}
//      out [any-block!]
//   ]
//
DECLARE_NATIVE(REDUCE)
{
    INCLUDE_PARAMS_OF_REDUCE;

    if (IS_BLOCK(ARG(VALUE))) {
        REBSER *ser = VAL_SERIES(ARG(VALUE));
        REBCNT index = VAL_INDEX(ARG(VALUE));
        REBVAL *val = REF(INTO) ? ARG(OUT) : 0;

        if (REF(NO_SET))
            Reduce_Block_No_Set(ser, index, val);
        else if (REF(ONLY))
            Reduce_Only(ser, index, ARG(WORDS), val);
        else
            Reduce_Block(ser, index, val);
        return R_TOS;
    }

    return R_ARG1;
}

The make prep build step always had the job of scanning the source code to connect the C function definitions with their matching native specs.

But Ren-C's make prep step just goes the extra mile and extracts the native spec from comments.

This was done by @Brett many years ago in the beginning of Ren-C. :tada:
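The extraction idea can be sketched in a few lines of C. This is only an illustration of the concept: the real make prep step is not written this way, and Extract_Native_Spec is a hypothetical helper. It assumes a spec starts on a comment line containing ": native [" and ends on a comment line holding only "]".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Copies the first native spec found in `src` into `out`, with the "//"
// prefix and leading spaces removed from each line.  Returns the number of
// spec lines copied, or 0 if none was found or `out` is too small.
int Extract_Native_Spec(const char *src, char *out, size_t out_size) {
    int lines = 0;
    int in_spec = 0;
    size_t used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    while (*src != '\0') {
        size_t len = strcspn(src, "\n");  // length of the current line

        if (len >= 2 && src[0] == '/' && src[1] == '/') {
            const char *text = src + 2;
            size_t text_len = len - 2;
            while (text_len > 0 && *text == ' ') { ++text; --text_len; }

            if (!in_spec) {  // look for the 10-character marker ": native ["
                for (size_t i = 0; i + 10 <= text_len; ++i)
                    if (memcmp(text + i, ": native [", 10) == 0) {
                        in_spec = 1;
                        break;
                    }
            }
            if (in_spec) {
                if (used + text_len + 2 > out_size)
                    return 0;  // no room for text, newline and terminator
                memcpy(out + used, text, text_len);
                used += text_len;
                out[used++] = '\n';
                out[used] = '\0';
                ++lines;
                if (text_len == 1 && text[0] == ']')
                    break;  // closing bracket on its own line ends the spec
            }
        }
        else if (in_spec)
            break;  // a non-comment line also ends the spec

        src += len;
        if (*src == '\n')
            ++src;
    }
    return lines;
}

// Runs the extractor over a tiny in-memory "source file".
int demo_extract(void) {
    const char *source =
        "//\n"
        "//  reduce: native [\n"
        "//      value\n"
        "//   ]\n"
        "//\n"
        "DECLARE_NATIVE(REDUCE)\n"
        "{\n";
    char spec[128];
    int n = Extract_Native_Spec(source, spec, sizeof spec);
    return n == 3 && strcmp(spec, "reduce: native [\nvalue\n]\n") == 0;
}
```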


Now Let's Modernize Naming Conventions

:three: :man_teacher:

// ...SPEC...
//
DECLARE_NATIVE(REDUCE)
{
    INCLUDE_PARAMS_OF_REDUCE;

    if (Is_Block(ARG(VALUE))) {
        Flex* flex = Series_Flex(ARG(VALUE));
        Index index = Series_Index(ARG(VALUE));
        Value* val = REF(INTO) ? ARG(OUT) : 0;

        if (REF(NO_SET))
            Reduce_Block_No_Set(flex, index, val);
        else if (REF(ONLY))
            Reduce_Only(flex, index, ARG(WORDS), val);
        else
            Reduce_Block(flex, index, val);
        return R_TOS;
    }

    return R_ARG1;
}

  • C functions are Words_Separated_By_Underscores(). At least one underscore appears in every function name.

  • C datatypes are CamelCase, which in the reduced case of a one-word name is just a single capitalized word (e.g. Flex, Index, Value).

  • Macros are sometimes ALL_CAPS and sometimes not, depending on whether they are trying to look "function-like". (If they are function-like, they must not duplicate their arguments in the expansion, so each argument is evaluated only once.)
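The rule about macros is easiest to see with a side effect. The following is a small sketch with made-up names (SQUARE_UNSAFE, Square_Once, Next_Value), not code from either codebase: the ALL_CAPS macro repeats its argument, so the argument expression runs twice, while the function-styled version runs it once.

```c
#include <assert.h>

// Repeats `x` in its expansion, so the ALL_CAPS name warns the caller that
// this does not behave like a function call.
#define SQUARE_UNSAFE(x)  ((x) * (x))

// Evaluates its argument exactly once, so it may use a function-style name.
static inline int Square_Once(int x) { return x * x; }

static int calls = 0;
static int Next_Value(void) { return ++calls; }  // side effect: counts calls

// How many times does the argument expression run under the macro?
int count_evaluations_unsafe(void) {
    calls = 0;
    (void)SQUARE_UNSAFE(Next_Value());
    return calls;
}

// And under the function-like version?
int count_evaluations_once(void) {
    calls = 0;
    (void)Square_Once(Next_Value());
    return calls;
}
```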

An important naming choice was made to call the resizable data structure inside series cells a "Flex", and not try to call that a "Series" (or REBSER).

This way we know that a Series is a type of Cell... a composite of the data (a Flex...flexible memory abstraction) and an Index, along with a header. You can get your bearings a lot better this way.

e.g. here we see Series_Index(...) and know that it's asking a question of an ANY-SERIES Cell. We don't need to say Val_Series_Index(...) or Cell_Series_Index(...) to reinforce that it's a cell ... it has to be a cell to have an index.

But if we had a datatype called REBSER would you think you should be able to ask for the Rebser_Index(...)? This is why a choice like Flex is much stronger for the non-Cell stub, to separate it from what we think of as "a series".
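As a rough picture of the Cell/Flex split, here is a sketch with hypothetical struct layouts (the real Cell and Flex are more involved). Two cells share one Flex but sit at different positions, which is why the index is a property of the Cell and not of the Flex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical layouts for illustration only.
typedef struct { const char *data; size_t used; } Flex;  // the resizable memory
typedef ptrdiff_t Index;

typedef struct {
    uintptr_t header;  // type bits, flags, etc.
    Flex *flex;        // the data, possibly shared by many cells
    Index index;       // this cell's own position in that data
} Cell;

// Questions asked of a series Cell, not of a Flex.
static Flex* Series_Flex(const Cell *c) { return c->flex; }
static Index Series_Index(const Cell *c) { return c->index; }

// Two cells over the same Flex, at different positions.
int demo_two_positions(void) {
    Flex shared = { "abcdef", 6 };
    Cell head = { 0, &shared, 0 };
    Cell tail = { 0, &shared, 4 };

    return Series_Flex(&head) == Series_Flex(&tail)
        && Series_Index(&head) != Series_Index(&tail)
        && Series_Flex(&tail)->data[Series_Index(&tail)] == 'e';
}
```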


Enforce Flex Subclasses In C++

:four: :chart_increasing:

// ...SPEC...
//
DECLARE_NATIVE(REDUCE)
{
    INCLUDE_PARAMS_OF_REDUCE;

    if (Is_Block(ARG(VALUE))) {
        Array* array = List_Array(ARG(VALUE));  // <-- Array* subclass of Flex
        Index index = Series_Index(ARG(VALUE));
        Value* val = REF(INTO) ? ARG(OUT) : 0;

        if (REF(NO_SET))
            Reduce_Block_No_Set(array, index, val);
        else if (REF(ONLY))
            Reduce_Only(array, index, ARG(WORDS), val);
        else
            Reduce_Block(array, index, val);
        return R_TOS;
    }

    return R_ARG1;
}

It's extremely helpful to know if the Flex you are dealing with is the kind that holds Cells (an "Array"), or string data (a "Strand"), or bytes (a "Binary").

For instance, R3-Alpha's Reduce_Block() would have no idea what to do if you passed it a REBSER containing bytes for a BINARY!. It would just crash. The only tool available was a runtime assert that you hadn't been passed an unexpected type.

This is the fault of C. It doesn't have inheritance. If you wanted Flex*, Array*, Strand*, and Binary* to be type-incompatible in the compiler, they would have to point to completely different types...and then the memory couldn't legally be shared by common routines, due to what is known as "Strict Aliasing".

Ren-C simply uses inheritance when built as C++, and plain typedefs when built as C, e.g.

// 1. "Strand" holds UTF-8 bytes which are used by ANY-STRING types, but can
//     be aliased as a BLOB!.  So Strand Flexes inherit from Binary Flexes,
//     and all operations legal on a Binary Flex are legal on Strands too.
//
#if CPLUSPLUS_11
    struct Binary : public Flex {};
    struct Strand : public Binary {};  // [1]
    struct Array : public Flex {};
#else
    typedef Flex Binary;
    typedef Flex Strand;
    typedef Flex Array;
#endif

Yes, it really is that simple. If you flip a switch and compile with C++, you find the places where you mistakenly tried to pass string memory to something that was expecting cells.
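For contrast, here is a sketch of what the plain C side of that switch looks like. The Flavor tag and the function names are hypothetical, added so the example has something to check at runtime. In this C build the three names are the same type, so handing a Strand to an Array routine compiles without complaint and can only be noticed while running.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical runtime tag saying what a Flex holds.
typedef enum { FLAVOR_ARRAY, FLAVOR_BINARY, FLAVOR_STRAND } Flavor;

typedef struct { Flavor flavor; size_t used; } Flex;

// C build: the subclass names are documentation only, all one type.
typedef Flex Binary;
typedef Flex Strand;
typedef Flex Array;

// Legal on a Binary, and therefore on a Strand too.
size_t Binary_Used(const Binary *b) {
    assert(b->flavor == FLAVOR_BINARY || b->flavor == FLAVOR_STRAND);
    return b->used;
}

// Only meaningful for cell-holding Flexes.  The C compiler cannot stop a
// Strand* from arriving here, so the mismatch is only visible at runtime.
int Is_Really_Array(const Array *a) {
    return a->flavor == FLAVOR_ARRAY;
}

int demo_flavors(void) {
    Strand text = { FLAVOR_STRAND, 5 };
    Array block = { FLAVOR_ARRAY, 3 };
    int score = 0;

    if (Binary_Used(&text) == 5) score += 1;  // Strand used as Binary: fine
    if (Is_Really_Array(&block)) score += 2;  // Array used as Array: fine
    if (!Is_Really_Array(&text)) score += 4;  // wrong kind, yet it compiled
    return score;
}
```

With the struct-inheritance definitions shown above, the last call would be rejected by a C++ compiler, since a Strand* does not convert to an Array*.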

This is the tip of the iceberg for why being able to compile a C codebase with C++ gives tremendous leverage. The benefits go exponential after that.


Ren-C isn't just about naming (!!!)

The transformations in Ren-C align the implementation with a vision of smaller, more composable pieces in the evaluator, enabling easier reasoning about natives and opening the door for richer, user-exposed control over evaluation.

But what I've shown so far above doesn't actually "do" anything different: you'd generate the same binary .EXE file on disk as before.

So I'll describe things that would affect the .EXE in a separate post.


Let Us Unify ARGuments and REFinements

:five: :robot:

//
//  reduce: native [
//      "Evaluates expressions and returns multiple results."
//      value
//      /no-set "Keep set-words as-is. Do not set them."
//      /only "Only evaluate words and paths, optionally just in a list"
//          [block! logic!]
//      /into "Output results into a block with no intermediate storage"
//          [any-block!]
//   ]
//
DECLARE_NATIVE(REDUCE)
{
    INCLUDE_PARAMS_OF_REDUCE;

    if (Is_Block(ARG(VALUE))) {
        Flex* flex = Series_Flex(ARG(VALUE));
        Index index = Series_Index(ARG(VALUE));
        Option(Value*) val = ARG(INTO);

        if (ARG(NO_SET))
            Reduce_Block_No_Set(flex, index, val);
        else if (ARG(ONLY))
            Reduce_Only(flex, index, unwrap ARG(ONLY), val);
        else
            Reduce_Block(flex, index, val);
        return R_TOS;
    }

    return R_ARG1;
}

We've saved two Cells of space on every call to REDUCE. That's 32 bytes on 32-bit machines, and 64 bytes on 64-bit machines. Not only do we save the space, but we also save the CPU cycles to fill that space. Plus people don't have to come up with names for the refinement arguments!

(Not a day has passed where I've missed multi-argument refinements.)
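The byte counts follow from the parameter counts: the old spec had six frame slots (value, no-set, only, words, into, out) and the new one has four. Here is a quick check of that arithmetic, assuming a Cell is four pointer-sized slots, which is what the 16-byte and 32-byte figures imply. The struct and Bytes_Saved are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed size only: four pointer-sized slots per Cell.
typedef struct { uintptr_t slot[4]; } Cell;

// Bytes no longer needed per call when parameters are removed.
size_t Bytes_Saved(size_t old_params, size_t new_params) {
    return (old_params - new_params) * sizeof(Cell);
}
```

Bytes_Saved(6, 4) is two Cells, which comes to 32 bytes with 4-byte pointers and 64 bytes with 8-byte pointers.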

Notice that REF() is gone. Instead, the ARG() macro is informed by the expansion of INCLUDE_PARAMS_OF_REDUCE as to which arguments are refinements. It would look something like this:

#define INCLUDE_PARAMS_OF_REDUCE \
    DECLARE_PARAM(Need(Value*), VALUE, 1, false); \
    DECLARE_PARAM(Option(Value*), NO_SET, 2, true); \
    DECLARE_PARAM(Option(Value*), ONLY, 3, true); \
    DECLARE_PARAM(Option(Value*), INTO, 4, true)

What's happening here is that the ARG() macro knows whether to produce something that is either a null pointer or a Cell* (hence testable with C's if()), or whether to always give back the Cell*. The true or false guides the extraction.

In the C++ build you are protected from writing if (ARG(VALUE)): because VALUE is not a refinement, the Need() wrapper enforces that you can't convert it to boolean.

In a C build, Option(Value*) and Need(Value*) are both just Value*, so you're on your honor to know that writing if (ARG(VALUE)) is a bad idea: since VALUE is not a refinement, ARG(VALUE) will never turn a "none!" into a C NULL. But I think the protection is good to have!
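One way the C-build extraction could work is sketched below. The Value layout, the is_none flag, the refine_ constants and the local frame array are all assumptions made for the demo, not the actual Ren-C definitions. The type argument is accepted but ignored here, since in C both wrappers collapse to the plain pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool is_none; int payload; } Value;  // stand-in cell

#define Option(T) T  // C build: plain pointer, nullable by convention
#define Need(T)   T  // C build: plain pointer, never null by convention

// Records both the slot number and whether the slot is a refinement.
#define DECLARE_PARAM(type, name, n, is_refinement) \
    enum { param_##name##_ = (n), refine_##name##_ = (is_refinement) }

// Refinement slots holding none come back as NULL, so if (ARG(...)) works.
// Ordinary arguments always come back as the cell itself.
#define ARG(name) \
    ((refine_##name##_ && frame[param_##name##_].is_none) \
        ? (Value*)NULL : &frame[param_##name##_])

int demo_arg_extraction(void) {
    // slot 0 unused; VALUE = 10; /ONLY not supplied; /INTO supplied as 7
    Value frame[4] = { {true, 0}, {false, 10}, {true, 0}, {false, 7} };

    DECLARE_PARAM(Need(Value*), VALUE, 1, false);
    DECLARE_PARAM(Option(Value*), ONLY, 2, true);
    DECLARE_PARAM(Option(Value*), INTO, 3, true);

    int result = ARG(VALUE)->payload;  // required argument: always a cell
    if (ARG(ONLY))
        result += 100;  // skipped, because the refinement slot holds none
    if (ARG(INTO))
        result += ARG(INTO)->payload;  // refinement present: use its value
    return result;
}
```

Since the refinement flag is an enum constant, a compiler can fold the test away for ordinary arguments, so ARG(VALUE) costs the same as before.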

(Note that because /ONLY is now a refinement that carries its own value, whatever represents the "no refinement" state can't also be passed as that value. So I changed the spec to assume you pass a LOGIC! when you have no list of words... you could use anything that's not none here...)


Optimized Generalized Returns: Direct To OUT!

:six: :racing_car:

// ...SPEC...
//
DECLARE_NATIVE(REDUCE)
{
    INCLUDE_PARAMS_OF_REDUCE;

    if (Is_Block(ARG(VALUE))) {
        Flex* flex = Series_Flex(ARG(VALUE));
        Index index = Series_Index(ARG(VALUE));
        Option(Value*) val = ARG(INTO);

        if (ARG(NO_SET))
            Reduce_Block_No_Set(OUT, flex, index, val);
        else if (ARG(ONLY))
            Reduce_Only(OUT, flex, index, unwrap ARG(ONLY), val);
        else
            Reduce_Block(OUT, flex, index, val);
        return OUT;
    }

    return COPY_TO_OUT(ARG(VALUE));
}

R3-Alpha returned an enum value to say where to look to find an output value. The full-on generic way of saying this would be to write to the output cell... whose name was D_RET. And if you wanted the interpreter to look in the D_RET cell you would return the R_RET enum value.

Hence the way to implement the COPY_TO_OUT() I show above, if you were in R3-Alpha, would be with a macro:

#define COPY_TO_OUT(cell) (*D_RET = *(cell), R_RET)  // R3-Alpha version

Ren-C doesn't use a return enum, it uses a void pointer... and due to the various values in the system having "detectability" it's possible to discern what type you returned.

But one very special pointer you can return is the OUT pointer. So it's like being both D_RET and R_RET... it's the place to write to and the signal that's where you wrote.

Just to demonstrate, let's show how to write R_TOS (return value on top of stack) in Ren-C:

#define R_TOS (Copy_Cell(OUT, TOP), DROP(), OUT)

But there is no R_TOS in Ren-C, and I re-styled the code above to pass in OUT to the Reduce_Xxx() functions. Why would I do that when it seems to make the code more verbose?

The reason is that in Ren-C, you don't have to use the stack...or even a frame Cell...to act as a waystation to move cells to where they're ultimately going to land. OUT is a direct pointer to where the evaluator was asked to write the output.

This means that if you're building a frame for a function call, and one of its arguments is the evaluation product of another function, that other function receives the literal address of the slot where the data is to be written.

So not only is making the OUT parameter explicit in Reduce good for documenting that it has a product, it also means you're writing things directly to where they should be.
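The "OUT is both the destination and the signal" idea can be sketched as follows. The names Bounce, Dispatcher, Double_It and Fulfill_Slot are assumptions for the demo, and the real dispatcher signature and the other kinds of return pointer are not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int payload; } Value;  // stand-in cell
typedef const void *Bounce;  // hypothetical name for what a native returns

typedef Bounce (*Dispatcher)(Value *OUT, const Value *arg);

// Writes its product into OUT, then returns OUT to say "the result is there".
static Bounce Double_It(Value *OUT, const Value *arg) {
    OUT->payload = arg->payload * 2;
    return OUT;
}

// The caller picks where OUT points.  Here it is an argument slot of the
// frame being built for the next call, so no intermediate copy is needed.
static int Fulfill_Slot(Dispatcher native, Value *slot, const Value *arg) {
    Bounce b = native(slot, arg);
    return b == slot;  // pointer identity means the result is already in place
}

int demo_direct_write(void) {
    Value next_frame_args[2] = { {0}, {0} };
    Value input = { 21 };

    int in_place = Fulfill_Slot(&Double_It, &next_frame_args[1], &input);
    return in_place ? next_frame_args[1].payload : -1;
}
```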


Further Requires Isotopes, CHAIN!s, librebol, etc.

:infinity: :atom_symbol:

I just wanted to show off how the baseline comprehensibility of Ren-C code has gotten much better even if you stayed in the R3-Alpha headspace.

In particular: I was motivated to show off how I got rid of the REF(...) vs. ARG(...) distinction. This is helpful if you ever change the spec in a way that turns something from a refinement into an argument or vice versa.

It would be fully possible to keep R3-Alpha more or less "as is" and adopt these design improvements. But if you see these things as improvements, why would you stop here?

Wouldn't you imagine even greater heights have been achieved using this baseline?

:mountain: :man_climbing:
