Module Startup and Shutdown (Constructors, Destructors?)

I've been trying to move toward a model where extensions are just "modules that happen to ship with some native code in a DLL". This helps avoid having parallel-and-variously-incompatible versions of the same features.

In trying to merge the functionalities, one thing extensions could do was run an arbitrary Startup() hook when their DLL was loaded, and a Shutdown() hook when it was unloaded. So if you used a native API that had a paired open/close, you had a moment to do both.

However, since a module can ship with natives, this raises the question of why the startup code can't just be run as part of the normal course of the module:

 Rebol [
     Title: {ODBC Extension}
     Type: 'Module
 ]

 call-odbc-init-c-function
 odbc-settings: make object! [...]
 ...

Being able to call a native living in the extension like CALL-ODBC-INIT-C-FUNCTION is every bit as good as having a special esoteric C function exposed that the DLL loader looks up with OS APIs and calls with a magic incantation. All that magic is already done to provide new natives to call--why not use it?

Plus you have more options--you can break the init into multiple functions, have it get parameters from the environment, etc. Also very important: there doesn't have to be a distinct model for error handling if something goes wrong--error handling already had to have an answer for everything else you might be calling, so why make it special for the init?

...but what about the shutdown?

It's not obvious that only a module which has some of its code written as user natives would need a shutdown. What if you have a module that opens a persistent network connection--all in usermode--and wants to do some kind of graceful signoff if it can? Why should "extensions" be special?

If that generic hook were available, then native code could be run by putting it in a native ACTION! and doing it that way--just like the init.

This could be a SHUTDOWN: field in the module header. Or it could be a "register-on-shutdown-callback" method that modules offer to the code running in their body (kind of the way they would offer things like EXPORT).

But it seems like maybe it should be more general. Rebol doesn't have constructors and destructors...but, maybe it should? There is now an explicit FREE which can be used to kill off an object, and only HANDLE! does cleanup...but maybe objects should be able to do something about it too.

For now the easiest thing to do to keep extensions going is just to make some module-specific solution and move on. But it's worth thinking about--are there other languages in Rebol's family which have interesting constructor/destructor behavior? Or bad behavior that would be good to know about and avoid? Just wanted to post a note on the topic...

IIRC the only place in Rebol 2 that one can attach automatic cleanup code is in a port scheme, handling a port close event. Which is not exactly a generally useful solution.


So... constructors and destructors provide a commitment to what to do when you abruptly leave a "scope" where an object is declared...regardless of how you leave.

If you were cooking on the stove and turned on the oven, you can make a pretty good general rule that you don't want to leave the house with it on. It doesn't matter if you get a phone call that causes you to go by way of the garage and out the garage door... or out the back door, or the front, or climb out a window. Leaving the oven on is bad. So if turning on the oven could instantiate a magical "turn the oven off if I forget and leave the house" gadget, such a gadget sounds good.

Historically Rebol had at least one barrier...

Instead of the house example...imagine we think of functions, BLOCK!s, or GROUP!s as being the kind of thing you might want to run some cleanup code for no-matter-how-you-leave-them. There had been a particular "way of leaving" that was expensive to have every function catch. That was exceptions--which Ren-C has called "failures".

Rebol (and Ren-C) use a style of coding in the C that allows reacting to things like memory allocation errors with an "exception". This contrasts with needing to test every single series allocation (the way you would test a malloc() for NULL). These exceptions used the only method available for jumping up the machine stack in C: setjmp() and longjmp()

While the mechanism behind THROW was cheap, this mechanism was expensive. And only TRAP-style (formerly named TRY-style) constructs would do it...because setting up a CPU state buffer at every stack frame boundary would be prohibitive.

Stackless is removing this particular barrier

By bouncing everything through a "Trampoline", an infinite number of stack levels can be navigated with only one setjmp/longjmp. All an exception has to do is longjmp to that. Then stack levels (Rebol levels, not C levels...since there's only one!) can be traversed just as easily as you might bubble THROWs up for processing. They can be examined and do whatever handling they like.
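A toy sketch of the trampoline idea in Python may help make this concrete. This is hypothetical illustration, not Ren-C's actual C internals: all "levels" are driven from one loop, so a single top-level handler can unwind any logical depth without per-frame setup cost.

```python
# Hypothetical sketch: logical stack levels driven by one trampoline loop.
# One handler catches any exception, then walks the logical levels to run
# their cleanups--no per-frame setjmp-style buffer needed.

class Level:
    """One logical stack level: a step function plus optional cleanup."""
    def __init__(self, step, cleanup=None):
        self.step = step          # callable returning a deeper Level or None
        self.cleanup = cleanup    # run when this level is unwound

def trampoline(root):
    stack = [root]
    try:
        while stack:
            nxt = stack[-1].step()
            if nxt is None:
                stack.pop()        # level finished normally
            else:
                stack.append(nxt)  # push a deeper level
    except Exception:
        # Single catch point: traverse levels like bubbling a THROW,
        # letting each do whatever handling it likes.
        while stack:
            level = stack.pop()
            if level.cleanup:
                level.cleanup()
```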

Given that performance is no longer a barrier to reacting to arbitrary exceptions, what might such things look like? I don't really know. Might just mean people are more liberal with TRAPs, if they're cheaper. :-/

I'm not going to go research it right now. But I'm pointing it out in case there are other languages out there--ones that were more technically free, not enslaved to setjmp/longjmp--that devised cool features despite being largely GC-based.

So that's a question on the table: is there something in the exception-handling world that this is an opportunity to introduce? All I really know on the topic is C++, and nothing immediately comes to mind besides having a way to dash off code that will run when a block exits.

Gist of such an idea would be something like the LATER below:

>> do [
    print "Entering block"
    later [print "First exit code"]
    (
        print "Entering group"
        later [print "Second exit code"]
        print "Doing more stuff"
        later [print "Third exit code"]
        if condition [
            fail "now we fail"
        ]
        print "This only happens if condition is false"
    )
    print "This only happens if no failure, as well"
 ]

With condition false, you'd get:

 Entering block
 Entering group
 Doing more stuff
 This only happens if condition is false
 Third exit code  ; running in reverse order...
 Second exit code
 This only happens if no failure, as well
 First exit code

But with condition true and the failure, you'd get:

 Entering block
 Entering group
 Doing more stuff
 Third exit code
 Second exit code
 First exit code
 ** Error: now we fail
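For comparison, Python can express the same nesting of scoped exit code with contextlib.ExitStack (this is just an analog of the LATER sketch above, not a proposal for the syntax): callbacks registered during a "block" run in reverse order when that block exits, whether normally or via an exception.

```python
# Python analog of the LATER example: each ExitStack models one scope
# (the BLOCK! and the GROUP!), running its registered exit code in
# reverse order no matter how the scope is left.

from contextlib import ExitStack

def demo(condition):
    log = []
    try:
        with ExitStack() as block:            # outer BLOCK! scope
            log.append("Entering block")
            block.callback(log.append, "First exit code")
            with ExitStack() as group:        # inner GROUP! scope
                log.append("Entering group")
                group.callback(log.append, "Second exit code")
                log.append("Doing more stuff")
                group.callback(log.append, "Third exit code")
                if condition:
                    raise RuntimeError("now we fail")
                log.append("This only happens if condition is false")
            log.append("This only happens if no failure, as well")
    except RuntimeError as e:
        log.append("** Error: " + str(e))
    return log
```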

We'd imagine it not being just failures, but RETURNs and THROWs as well. But that was possible before. The new novelty comes with "exceptions" that originate internally (vs. being provoked by the FAIL native, which didn't have to leverage that same mechanism...but did). These features no longer require any kind of advance payment to plan for, the way declaring a TRAP did. It costs no more for a block to be ready to accept an arbitrary failure (even a rug-pulled-out-from-under-a-native problem like out-of-memory that it didn't gracefully check for).

So a family of possibilities opens up.


I noticed something in Go related to this, which is their defer keyword:

"A defer statement pushes a function call onto a list. The list of saved calls is executed after the surrounding function returns. Defer is commonly used to simplify functions that perform various clean-up actions."

This is quite similar to my suggestion. Though we do not have the concept of a "currently running function", we do have a "currently running block". But that wouldn't be much use if you wrote something like:

 some-code
 if condition [
     defer [...cleanup...]
 ]

Running at the end of that block would be no different than just calling normally. So that wouldn't work so well.

We might also be able to use things to identify frames; e.g. defer 'return [...whatever...] could get the binding out of the RETURN. Another alternative might say that a construct that wishes to use DEFER might make a definitional defer which encodes the frame...the way that RETURN does.
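To make the "definitional DEFER tied to a frame" idea concrete, here's a rough Python sketch (all names here are made up for illustration): a wrapper hands each call its own defer, so deferred code runs when that call exits, no matter how.

```python
# Hypothetical sketch of function-scoped DEFER: a decorator hands the
# wrapped function a `defer` callable bound to that call's lifetime,
# analogous to how a definitional RETURN is bound to its frame.

import functools

def with_defer(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        deferred = []
        try:
            return fn(deferred.append, *args, **kwargs)
        finally:
            for thunk in reversed(deferred):   # LIFO, like Go's defer
                thunk()
    return wrapper

@with_defer
def example(defer, log):
    log.append("open")
    defer(lambda: log.append("close"))  # runs on ANY exit from example
    log.append("work")
    return "done"
```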

In Rebol 1.0, there's something called SHIELD:

shield before-block main-block after-block

Shield a block from catch and other types of exception handling, allowing it to take the necessary steps to initialize and finalize its state.

print catch 'throw [
    shield [
        print "entering"
    ][
        repeat n 10 [if n > 5 [throw "thrown out"]]
    ][
        print "exiting"
    ]
]

This will output:

entering
exiting
thrown out

It might seem unclear why you would need to have a "before-block" instead of just writing your code before the call to SHIELD. But Rebol 1.0 is very function-driven. So:

(before-code shield main-block after-block)

...would be a GROUP! instead of a single function call, and I believe the thinking was that a single function would "fit in more slots" where a GROUP! would complicate things. (?)
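For reference, SHIELD's behavior maps directly onto try/finally. Here is a minimal Python sketch of the same three-block shape:

```python
# Minimal analog of Rebol 1.0's SHIELD: the before code runs, then the
# main body, and the after code runs no matter how the body is left
# (normal completion, exception, or any other non-local exit).

def shield(before, main, after):
    before()
    try:
        return main()
    finally:
        after()
```

Throwing out of the main block still runs the after block first, matching the "entering / exiting / thrown out" output shown above.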

Anyway...this category of things is still a functionality gap in Ren-C, because all you can do is trap and rescue and catch things...and you have to care about whether you're catching or trapping, and you have to rethrow or re-fail. Things like SHIELD and DEFER are all trivial to implement, but it's just not certain what the right way to get constructor/destructor type behaviors is in this language.


I love GENERATORS and YIELDERS, however they have a dark side...

If you are enumerating something with a generator or yielder, and you don't call it to exhaustion, it never cleans up... leaving latent locks on series, and just clogging up memory.

History and experience with serious languages like C++ and Rust have shown that we don't really have a better answer for default lifetime control than scope.

So let's try a thought experiment.

>> foo: func [data [block!]] [
      let g: make-generator [
          for-each 'item data [print "Inside!" yield item + 1]
      ]
      return reduce [g g]
   ]

>> foo [10 20 30 40]
Inside!
Inside!
== [11 21]

Imagine if at the moment of returning from FOO, the incomplete generator G should be destroyed, and its locks released.

OTOH, if you have a module-level variable that's a generator, we can't automatically get rid of it. And setting it to NULL wouldn't be enough to get it synchronously GC'd. You would have to FREE it.
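Python's generators face the same problem in miniature, and show one shape of the answer: an unexhausted generator keeps its state (and any cleanup pending in a finally) until it is closed explicitly or collected.

```python
# An unexhausted Python generator holds its resources until closed.
# The finally clause runs on exhaustion, on an explicit close(), or
# (unpredictably) at GC time--so explicit close() is the reliable path.

cleaned = []

def counting(data):
    try:
        for item in data:
            yield item + 1
    finally:
        cleaned.append(True)    # the cleanup code

g = counting([10, 20, 30, 40])
first_two = [next(g), next(g)]  # take only two items; generator stays live
g.close()                       # explicit cleanup, like a manual FREE
```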

So Many Questions...

There's actually one important mechanism that the system has now, which is the ability to FREE basically anything and have stray references crash the GC.

So if LET wants to, it can FREE whatever it holds when the scope exits.

But now we enter a situation where not just exiting from scope, but a new assignment would have to free the generator too:

foo: func [data [block!]] [
    let g: make-generator [
        for-each 'item data [print "Inside!" yield item + 1]
    ]
    let data: reduce [g g]
    g: 1020
    return data
]

Overwriting G with 1020 would synchronously FREE the previous contents of G.

Clearly not all variables being overwritten should free them. If you passed G to another function, and that function made a couple of calls and then overwrote the argument... it shouldn't in the general case mean FOO lost G... unless it did a transfer of ownership.

So maybe this needs to be conveyed with a new concept, like C++'s unique_ptr.

Let's say the concept is UNIQUE:

let g: unique make-generator [
     for-each 'item data [print "Inside!" yield item + 1]
]

When you say that something is unique, you're saying that if that variable slot gets overwritten for any reason... it should free the value. This needs to include cases of exiting scope.

This may have potential, and it may be able to be implemented using the same mechanics as accessor. In other words, it could be usermode... UNIQUE would be an infix function that specifically sets up the left hand side as a slot that runs code on overwrite of the variable.
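As a sanity check that the slot-level concept hangs together, here is a small Python sketch (UniqueSlot is an invented name): a slot that runs a "free" action on its old contents whenever it is overwritten or released.

```python
# Hypothetical sketch of a UNIQUE-style slot: overwriting the slot (or
# releasing it on scope exit) frees whatever it previously held.

class UniqueSlot:
    def __init__(self, value, free):
        self._value = value
        self._free = free              # cleanup to run on overwrite/release

    def get(self):
        return self._value

    def set(self, new_value):
        if self._value is not None:
            self._free(self._value)    # synchronously free old contents
        self._value = new_value

    def release(self):                 # what scope exit would call
        self.set(None)
```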

What About Indefinite Word Lifetime?

This has always been a bit of a thorn:

What happens to function args when the call ends?

If LETs and function args are getting wiped clean on function exit... (sometimes clearing out UNIQUE or other accessors, other times having no effect)... then returning code bound to those LETs would not be usable.

There's never really been a satisfying answer to indefinite lifetime. But it's almost like you want to do the opposite of UNIQUE, to tell a variable "hey, I need you to live". Some kind of UNSCOPE operation, or SURVIVE or something.

>> f: func [x] [return [x]]

>> b: f 10
== [x]

>> reduce b
!! PANIC: X is trash  ; or whatever, wiped out by function exit

vs.

>> f: func [x] [survive $x, return [x]]

>> b: f 10
== [x]

>> reduce b
[10]

So what SURVIVE would do is stop the system from setting X to trash on exit of F.

This starts to put a lot of little invisible bits on things, which makes me a bit uneasy. But there's no in-band way to do this. The values themselves can't encode information about their lifetimes (e.g. some UNIQUE! antiform which is a box around a value... you'd have to unbox the value every time you used it).

Though... quick devil's advocacy detour :ogre: ...it is the case that quasiform actions can be accessed and run via /^g, and so you might say "What if meta actions were immune to freeing, but normal ones were not." That's kind of dumb, but it does point to the idea that there might be some way of encoding the "don't free me automatically on exiting functions" state in a value itself, if you were willing to use a special decoration to refer to that value to undo whatever you did to make it denote indefinite lifetime.

Maybe this is "meta-fencing" values?

>> f: func [x] [x: generator [yield 10], return [x]]

>> b: f 10
== [x]

>> reduce b
!! PANIC: X is trash  ; or whatever, wiped out by function exit

vs.

>> f: func [x] [^{x}: generator [yield 10], return [^{x}]]

>> b: f 10
== [^{x}]

>> reduce b
[10]

The thought here would be something like:

>> ^{x}: 10
== 10

>> x
== {'10}

>> ^{x}
== 10

Then say that these fences keep alive things that would otherwise be wiped out on function exit. Notably you could put quasiforms of generators and such in these.

Not a good idea, I was just playing devil's advocate when I said the only way to talk about modifying a variable's lifetime was with out of band bits...and needed to justify that statement.

Could UNIQUE And SURVIVE Cover Enough Common Cases?

It's hard to say.

But GENERATOR and YIELDER definitely present a challenge, because cleaning them up requires running finalization code. That finalization code might panic. If that happens at an arbitrary moment in time, you'll just get a random error popping out of nowhere.

So in a sense, a running GENERATOR and YIELDER pretty much can't be GC'd--in effect, they hold references to themselves while they're still running. That's among the many reasons why it's important to clean them up intentionally.
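Python exhibits exactly this hazard (a sketch for comparison): if the finalization code in a generator's finally can itself fail, the error surfaces at whatever arbitrary moment the close happens.

```python
# When a generator's cleanup code can fail, the failure pops out at
# close time--wherever and whenever that happens to be.

def risky():
    try:
        yield 1
    finally:
        raise RuntimeError("panic during cleanup")

g = risky()
next(g)             # generator is now suspended mid-body
try:
    g.close()       # cleanup runs here... and blows up here
    outcome = "clean"
except RuntimeError as e:
    outcome = str(e)
```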

They're not the only entities with this problem. User objects that hold onto resources are in the same category; they need to be shut down.

It seems to me that something needs to be tried, I just wanted to look at the landscape a bit and see if there was anything new in the picture.
