First Bootstrap In At Least 6 Months: Watershed Changes

hostilefork · November 21, 2024, 5:08am

Evolving Ren-C is a messy and difficult balancing act. Due to the experimental nature and bus-factor-of-one staffing, it's not always a good use of time to completely push through a change in all code...until it's seen to be a good change.

The adoption of dozens of experimental ideas had led to a situation where being able to bootstrap the codebase stalled... for possibly the longest period yet. This is to say that while the old 2018-era executable is used to make all the .h and .c files to build the current sources, the executable it built would be unable to do so.

But a few days ago, I managed to accomplish bootstrap. And I've gotten several other codebases that had been atrophying (Whitespacers, Rebol-HTTPD, Query) up to date.

While there are a lot of shaky parts (in particular, the workings of the new form of methodization that relies on leading-dot .member accesses), I'd say overall things are very promising.

One of the toughest points was the institution of "strict mode"... where you cannot assign to a variable that has not been pre-declared in some way (either as a module-level declaration, a LET, a <local>, made with WRAP, etc.) This is a big change, but a good one... for all the reasons "strict mode" is known to be good, but also because it eliminated "attachment binding".
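
To make the rule concrete, here is a minimal sketch of the idea in Python rather than Ren-C. It is only a toy model: the `StrictEnv` class and its `declare`/`assign`/`get` methods are hypothetical names made up for illustration, and `declare` stands in for any of the declaring forms listed above. It says nothing about how Ren-C implements binding.

```python
class StrictEnv:
    """Toy variable environment: assignment requires a prior declaration."""

    def __init__(self):
        self._slots = {}

    def declare(self, name, value=None):
        # Stand-in for any declaring form (module-level, LET, <local>, WRAP...)
        self._slots[name] = value

    def assign(self, name, value):
        # Strict mode: never create a variable as a side effect of assignment
        if name not in self._slots:
            raise NameError(f"{name} is not declared, refusing to create it")
        self._slots[name] = value

    def get(self, name):
        return self._slots[name]


env = StrictEnv()
env.declare("count")
env.assign("count", 10)       # fine: declared first

try:
    env.assign("cuont", 11)   # typo: rejected instead of making a new variable
    typo_rejected = False
except NameError:
    typo_rejected = True
```

The practical benefit in this model is that a misspelled assignment fails at the point of the mistake, instead of quietly creating a second variable.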

Not Easy, But Very Instructive

In some ways, the story of Ren-C's evolution is told by Rebmake, and the process which produces all the .h and .c files that the build relies on.

It's certainly one of the top 3 biggest and snakiest Rebol codebases (I'd argue it's probably the trickiest codebase of its size, more so than Red or Atronix Zoe). And it contains the code of Carl, Shixin, Brett, BrianH, myself, and others... a mixture of styles, and patches, and hacks. Some huge bends come from needing to run in an executable that's not much beyond a patched version of what ran in 2018.

I've also explained why adopting Rebmake was very consequential, and complicating. It stretched the build process's concerns to a superset of those of CMake and GNU make, instead of keeping it narrow and focused like other approaches (Giulio and Andreas tried their hands at much more succinct ideas, and Oldes is using something much lighter in daily work today).

But this has informed the design, hardened it, and challenged it.

A New "Weird" World: Driven By Reason and Experience

With changes like :refinement instead of /refinement, and the switch to where actions can be invoked as /foo, plus member variables being indicated with .field -- things are starting to look quite different.

But I can say with confidence that from a usage perspective, it is very clearly better.

The slashes for functions are quite empowering. Unlike the pox of :deactivating GET-WORD!s, slashes tell you what you know instead of what you DON'T know.

Refinements being done with colons does come out as what seems like a casualty of that, where you can have something like:

data: copy:part series pos

It mingles refinements and assignments notationally, which some might find bad. Though I think it's nothing like mingling field accesses and refinements with both being slashes.

And once you get used to it, I think it actually is nice to have the colons blend more quietly. This allows the slashes to stand out, which really is a better use of that pop.

The .field accesses are--I think--a pretty definite win. I've made it so that the lone period (.) defaults to one of those sneaky functions that looks at the current environment, and gives you the object that the .field accesses are from. So if you really want to bring all those fields into scope where you can use them undecorated, you'd be able to do something along the lines of:

use <*> .
print ["Now I can access" field "without dots"]

(Though the notation for that is still in flux.)

But really, it's very hard to keep things straight and know what's an argument and what's a member... so I like the dots. (They were added on purpose...and are harder to implement than just doing lookup in the object as a higher priority for regular words, so of course I must like them!)

So...What's NOT Working Well?

One of the biggest problems I've run into is that the easy interplay between WORD! and SET-WORD! and GET-WORD! has been replaced by some really finicky sequence mechanics.

For instance, this no longer works:

>> to word! first [a: 10]
** Error

TO's rule is reversibility. And (to chain! 'a) shouldn't be biased to either a: or :a.

Maybe you could argue for saying that a: is more useful, and so TO for sequences should put words at the head. But that's not really the case for a. or a/, is it?

Then you have composites like /a:, and the question of just how many routines have to deal with these composites... and what the rules are. If you do a SET of plain a to an action, should that be an error unless you change it to set /a ? Should SET support /a: or make you extract things down?

I've been building little parts to help attack this, such as a function RESOLVE for picking the variable out of a sequence:

>> resolve first [/a:]
== a

>> resolve first [a.b/]
== a.b

And that's the kind of thing that helps pick up the slack from things like TO WORD!.

It's still new and awkward territory. There's certainly going to be some amount of irreducible complexity that comes from working with these new parts, but I'm hopeful that I'll be able to reduce the pain as things develop.

Overall, Things Are Reasonably Strong

I talk about how Ren-C's development methodology just keeps solidifying, to let it move on to building higher things.

Even with many pieces teetering on the edge, there's still a solidity underneath it all that means I almost never wind up chasing Heisenbugs. When a part needs to be hardened, it can be hardened.

In any case, it's good to see it bootstrapping after a long time of not. I'm hopeful that by the end of the year I'll feel comfortable enough to make new bootstrap executables and push out the web build, so expect a big "drop" sometime in December.

hostilefork · July 19, 2025, 1:20am

Since the November bootstrap, there have been two more occasions where I've gotten the ducks in a row well enough to bootstrap.

The first was hacked together--e.g. the new executable could bootstrap the code with modifications, but those modifications meant the old bootstrap executable couldn't build it.

That bootstrap informed me of weak spots that guided more development to a second bootstrap this week, which is much more solid. And the bootstrap executable has been updated enough to be able to run the same code.

Things Are In A Good Direction, But Painfully Slow

I've been piping things through more and more general mechanics, with the goal of introducing chokepoints that enable stuff like accessor functions and typechecked variables.

Just intuitively speaking, you can imagine that code which used to assume it could just go directly to the memory location of a variable and access it now has to go through a general mechanic... which would make it slower.

But it's a fairly big deal, because now any code that accesses a variable has to be prepared to have arbitrary usermode code run that gets that variable. This means every such moment has to be safe from garbage collection. It means there has to be more arbitrary error handling--not just a boolean of "was the variable trash or not".
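
A rough way to picture the chokepoint, sketched in Python as a toy model and not as anything resembling the actual C internals: every read goes through one function, a slot may hold either a plain value or a getter, and a failing getter has to surface as a real error instead of a yes/no flag. The names `Slot`, `read_variable` and `VariableError` are hypothetical, and the sketch does not attempt to model the garbage collection concerns.

```python
class VariableError(Exception):
    """Raised by the chokepoint when a read cannot produce a value."""


class Slot:
    """One variable: either a plain stored value or a getter function."""

    def __init__(self, value=None, getter=None):
        self.value = value
        self.getter = getter


def read_variable(slots, name):
    # Single chokepoint: every read comes through here, never straight to storage
    slot = slots.get(name)
    if slot is None:
        raise VariableError(f"{name} has no slot")
    if slot.getter is None:
        return slot.value          # fast case: plain stored value
    try:
        return slot.getter()       # arbitrary user code runs at this point
    except Exception as exc:
        # Richer than a "was it trash?" boolean: the failure carries a cause
        raise VariableError(f"getter for {name} failed: {exc}") from exc


slots = {
    "plain": Slot(value=10),
    "computed": Slot(getter=lambda: 2 * 21),
    "broken": Slot(getter=lambda: 1 / 0),
}

plain_value = read_variable(slots, "plain")
computed_value = read_variable(slots, "computed")

try:
    read_variable(slots, "broken")
    broken_failed = False
except VariableError:
    broken_failed = True
```

Even in this small model the extra cost is visible: the plain read now pays for a lookup and a branch, and every caller has to be ready for an exception.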

One of the places that gets especially hard is the evaluator, e.g. doing lookahead for infix functions. If we're saying every expression evaluation has to check if the next expression is an infix function or not, you theoretically pay for that lookup twice. (Imagine EVAL:STEP going one step at a time, looking ahead and deciding the next thing isn't infix and returning... then you take another step. The thing you looked ahead at last time is now the first thing in the subsequent step. What caching lets you take advantage of that? Can you cache it?)
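
The double lookup can be illustrated with a small Python model. This is a sketch under heavy assumptions, not the Ren-C evaluator: the `Stepper` class, its one-entry cache, and the `INFIX` table are all hypothetical, values are plain integers, and infix operators are applied left to right with no precedence. The cache remembers the result of the last lookahead, so the next step can reuse it for its first item.

```python
INFIX = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


class Stepper:
    """Evaluates a flat list one expression at a time, counting lookups."""

    def __init__(self, items):
        self.items = items
        self.pos = 0
        self.lookups = 0          # how many uncached lookups were paid for
        self._cached_pos = None
        self._cached_op = None

    def _lookup(self, index):
        # Reuse the previous lookahead if it examined this same position
        if self._cached_pos == index:
            return self._cached_op
        self.lookups += 1
        item = self.items[index] if index < len(self.items) else None
        op = INFIX.get(item) if isinstance(item, str) else None
        self._cached_pos, self._cached_op = index, op
        return op

    def step(self):
        # The first item of a step was the lookahead target of the last step
        if self._lookup(self.pos) is not None:
            raise SyntaxError("expression cannot start with an infix operator")
        value = self.items[self.pos]
        self.pos += 1
        # Keep consuming while the next item is an infix operator
        while (op := self._lookup(self.pos)) is not None:
            value = op(value, self.items[self.pos + 1])
            self.pos += 2
        return value


stepper = Stepper([1, "+", 2, 10, "*", 3])
first = stepper.step()    # evaluates 1 + 2, looks ahead at 10 and stops
second = stepper.step()   # starts at 10 and reuses the cached lookahead
```

In this run the cache saves one lookup out of six. The sketch quietly assumes that nothing can change what an item means between two steps, which is exactly the assumption that becomes questionable once arbitrary user code can run during a variable access.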

Not Time For Optimization, Yet

There's a lot of pain right now related to this common code path, and it's not quite at the point of actualization yet (type checking almost implemented, but not yet...)

So I definitely don't want to start optimizing until there are some working examples of type checking and accessor functions. But even then... it's not time... because...

Debugging!

While performance tuning is necessary, I really think before I go too far with anything I need to tend to debugging. It's been a long time building up to it, but I've been laying things down bit by bit so it could work.

Being able to go step-by-step isn't just useful for people who know the language, but critical for people learning the language.

And I don't want to throw in wacky optimizations that would compromise debugging. Optimizations have to be debugging sensitive, and know not to do the wacky optimization if you need to see every step.

The biggest problem now in debugging though, is how to show the code. Rebol code is a weird graph. And you're stepping through generated code a lot of the time.

On the plus side, with virtual binding now, you can see things like "what variables are in scope".

Errors Are Also Really Poor Right Now

I've been pushing ahead just getting mechanical things to work, and getting by because I can break in the C debugger and see what's going wrong. But if I couldn't, a lot of errors suck.

Something that's much more viable now with PANICs is that since they are rare and not intended to be recovered from, a working debugger could throw you in at the moment of the panic...and let you see the call stack. So while better error messages are important, this would make the biggest difference.

But No Excuses... Need To Push Builds

Slowness or bad errors aren't really the issue. The reason CI stopped is ostensibly because there had to be changes to align the bootstrap executable to massive changes (ranging from the format of strings and refinements to the question of how methods were dispatched).

So the question is whether that stuff has stabilized to where hardening a batch of bootstrap executables is a good idea. I think the time is at hand.