What happens to function args when the call ends?

R3-Alpha's CLOSURE provided two things. One was a unique identity for the words of a function's arguments and locals for each recursion. This is what I've called "specific binding" and now comes "for free" in all functions...so you don't even have to think about it. (It's not exactly free, but we can hope it will converge to "very low cost".)

So in Ren-C:

>> foo: function [x code] [
    append code $(print x)  ; 2025-update: use bound group!
    if x > 0 [
        probe code
        eval code
        foo (x - 1) code
    ]
]

>> foo 2 []
[(print x)]
2
[(print x) (print x)]
2 ;-- R3-Alpha FUNCTION! got 1, only CLOSURE! got 2
1

Users can now take that for granted. :thumbsup:

But what I want to talk about is the other emergent feature of R3-Alpha CLOSURE!. This was that if an ANY-WORD! that was bound to the arguments or locals "escaped" the lifetime of the call, that word would continue to have its value after the function ended...for as long as references to it existed.

>> f: closure [x] [return [x]]

>> b: f 10
== [x]

>> reduce b
== [10]

Functions did not do this:

>> f: function [x] [return [x]]

>> b: f 10
== [x]

>> reduce b
** Script error: x word is not bound to a context

It goes without saying that the closure mechanic is going to cost more, just by the very fact that it needs to hold onto the memory for what the word looks up to. But the way things work today, it doesn't just hold onto that one cell of data...it holds onto all the args and locals of the function. (R3-Alpha was more inefficient still...it not only kept the whole frame of values alive, it made a deep copy of the function body on every invocation of that function...so that the body could be updated to refer to that "frame". Specific binding lets Ren-C dodge that bullet.)

Now and again, the "keep-things-simple" voice says that the system would be simpler and faster if all executing frames (and their frame variables) died after a function ended. If you wanted to snapshot the state of a FRAME! for debugging purposes--to look at after the function ends--you could COPY it into a heap-based object, and return that. If you really were in one of the circumstances where you wanted an arg or local's word to survive, you could manually make an object to hold just those words, and bind to that.
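For instance, a minimal sketch of that manual approach might look like this (hedged: the exact BIND incantation has shifted across Ren-C's history, so treat this as the shape of the idea rather than the precise modern spelling):

foo: function [x] [
    persist: make object! compose [x: (x)]  ; heap object capturing X's value
    return bind [x] persist  ; the block's X now points at PERSIST, not the frame
]

>> b: foo 10
== [x]

>> reduce b
== [10]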

But @Ladislav had a compelling case:

foo: function [x] [
    y: 10
    return function [z] [x + y + z]
]

If x and y were to go bad after foo exited, the returned function would be useless.

Some new mechanics related to Move_Value() are creating possibilities for "automatic closure-i-fication", where stack cells are converted into a heap object at the moment it's noticed that a bound word is "escaping". If none escape, then everything stays on the stack.

But though you might think these kinds of escapes are rare, remember that some bindings aren't even intentional. When you return a block out of a function, it might just have stray bindings on words that happen to overlap with something visible to the binding. (Which makes one wonder: when returning a BLOCK! as data, should you always UNBIND/DEEP it before returning...to scrub off any inadvertent pointers into your local state that it carries? Should there be a RETURN/BOUND to avoid the scrub?) These invisible bindings would trigger the auto-closurification in what might seem like random cases to the user.
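A (hedged) sketch of what that scrubbing might look like, assuming the historical UNBIND/DEEP which modifies its argument and returns it:

foo: function [x] [
    result: [x just some data]  ; this X accidentally overlaps FOO's argument
    return unbind/deep result   ; scrub stray frame pointers before it escapes
]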

And remember--each time a word bound to a frame escapes--we're still talking about copying all the values in the frame. (It might be possible to break this down to a smaller granularity, e.g. a PAIR!-wise binding, where what closure-i-fication does is pack each key/value into a REBSER node.)

Were the user to get involved, and specify the cases, I might suggest something a bit like this (if <HAS> were taken to mean "a kind of per-instance static", while <STATIC> were used for all instances):

foo: function [x <has> x2 y] [
    x2: x
    y: 10
    return function [z] [x2 + y + z]
] 

The advantages are that any words which "escape" would be explicitly handled by the user, reducing the burden on the system; the entire frame would not need to be preserved, only the part of the frame holding these persistent values. The disadvantage is that it's not automatic, while other languages--even JavaScript--do it automatically.

So how do people feel on this matter? What's acceptable or unacceptable? @MarkI said at one point that he was opposed to locals and args outliving the function call because it created "garbage". Is it wise to hide the consequences from the user, and burden the system with the logic of making it automatic?


As per GitHub issue 605, my preference is for automatic closure-i-fication.

Then if you want to avoid any unintentional escapes, perhaps add a new function spec tag like <safe> (or similar, e.g. <pure> or <cleaned>) which would automatically UNBIND/DEEP any returned BLOCK!

PS. I'm a pragmatist, so if there are too many costs involved with automatic closure-i-fication then I'm happy to leave FUNCTION as is and use <has> or <durable>, or even keep a CLOS/CLOSURE wrapper.

PPS. However the x2 in the <has> workaround ruffles my feathers a bit :slight_smile: I'd prefer to go with something like:

foo: function [<durable> x <has> y] [
    y: 10
    function [z] [x + y + z]
]

In fact I'd go further and do this:

foo: function [x [<durable>] <has> y (10)] [
    function [z] [x + y + z]
]

That is, if we can tag individual args? Could be a handy feature going forward for other things!

There's a mechanical reason why making arguments durable is a problem. Briefly:

Let's imagine you are calling a function that has 10 slots among arguments, refinements, and refinement arguments. Fulfilling each argument involves an evaluation, and during any evaluation a garbage collection may occur.

So let's say your function has fulfilled argument 1, and gets to argument 2. And let's say there is a lot of computation to do to supply argument 2--enough so that a GC is triggered.

This GC needs to know about argument 1, and not free any resources it holds onto. And if argument 2 has been partially or temporarily evaluated, that needs to be taken care of as well (hence the argument slot is initialized with GC-readable "nothingness" before the evaluation starts, and during evaluation must stay GC-readably-legit in some way). But the GC must know not to look at arguments 3-10, because those are still raw uninitialized bits.

One possibility would be to do a pre-walk and format cells 3-10 so they aren't random noise. R3-Alpha did this. But the frame knows how far along in argument processing it is, so when the GC runs it can look at the frame stack and know where to stop. That's cheaper than needing two separate walks of the arguments on every function call.

So we happily avoid pre-walking the cells, and the evaluator itself just initializes cells as it goes along fulfilling arguments. Unfortunately, the formatting process is different for stack cells and indefinite lifetime cells, which live in arrays. If argument fulfillment has to be sensitive to whether that argument will live indefinitely, then you have to pre-walk it in the stack case...to initialize with bits that can be sniffed by the evaluator.

By splitting things out so that ordinary arguments and locals are known to always have stack lifetime, the formatting process doesn't have to worry about a cell's previous formatting bits--it can just write stack initialization into it.

So if we can promise we aren't ever going to do argument evaluation into cells with indefinite lifetime (i.e. no durable slots that are also args), then it's more efficient. That said, users could be made unaware of this at a higher level... the "real argument" could be named out of the way somehow, and then the durable non-argument could take the argument's name and proxy its value once the function starts running.

But such things can get messy. (What about an adaptation--how would it know about the funny-named variable that actually holds the argument it's interested in?) So for starters, I'd rather have the underlying mechanic and "rules of the game" visible, so people understand what's happening.


NB. Here's my idea from chat. However, after reading your explanation more closely I may end up treading over the same ground :frowning:

If <has> is the way to go, then we could possibly solve the feather ruffling with a bit of extra FUNCTION generation (i.e. a pre-processor/macro).

So for the following simple example:

foo: function [x [<durable>]] [
    function [y] [x + y]
]

FUNCTION could pre-expand this into (something like) this:

foo: function [`x has x] [
    x: `x
    function [y] [x + y]
]

The use of backtick is just for an example. In Lisp you have GENSYM for creating symbols (i.e. words) which don't stomp on anything else.

My spec is also just an example. If we could work out what all the closed-over words are, then you could go for FUNCTION [x] [...]. Alternatively it could be CLOSURE [x] [...].

Obviously the bound words would still be exposed but it is controlled and easily identifiable (by nomenclature convention).

Anyway more food for thought!

Yup, and along the lines of things I've considered.

But the central questions remain ones we can sort of discuss abstractly. Like: do you really want locals to be surviving by default, by accident?

I guess if we can agree that survive-by-default is a bad thing, and that CLOSURE is too broad a brush to include in the box, then all we're down to is the question above about when you want to mark an argument as durable.

From an implementation point of view, there's some stuff I really need to get integrated on a branch that's been hanging around too long and I'm tired of rebasing it. It contains the first inklings of virtual binding, but there simply are mechanical problems with using it with persistent parameters/refinements/locals. These problems may not be forever, but they are there for now.

So how bad would it be if, for the moment, args and locals and refinements did not outlive the call? Then <has> could be changed to be different from <static>, meaning per-instance values. It would be an optimized way of reusing a function's frame node (pointing at stack data) to also act as the node for the portion that outlives the invocation, and would duck further problems for now. This means USE could still be written as eval func compose [ (args)] body or similar (not that this is the greatest idea in the first place).

It's a first step that doesn't throw out much infrastructure for changing our minds later. If virtual binding gets further, we'll know a lot more about how everything could work...which will likely affect these discussions.

For the moment, I am going to kill CLOSURE. Also, locals to a FUNCTION which are leaked past that function's lifetime will give an error when accessed.

To facilitate this, I've turned it around so that USE is now its own native (as opposed to being built on CLOSURE) whose rebound body will have bindings that outlive the USE (since it creates an OBJECT!).

That means it's easy enough to write:

foo: function [a b <local> c d] [
    use [e f] [
        ;-- e and f will be alive after function call ends
    ]
]
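And a (hedged) illustration of that comment in practice--a word bound into the OBJECT! that USE creates can be handed outward and still resolve after the function has exited:

leak: function [<local> result] [
    use [e] [
        e: 304
        result: [e]  ; this E is bound into the OBJECT! that USE made
    ]
    return result
]

>> b: leak
== [e]

>> reduce b
== [304]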

@draegtun Note: If this is the intended semantics for USE, then <USE> might be even better than <HAS> in the function spec. :-/ I don't know. It's a question of how often USE intended indefinite lifetime...

This is not the pinnacle of efficiency, since USE doesn't have the same power the system does to avoid copying and rebinding (yet). BUT it's far more taxing on the system to be stuck assuming that you always want leaked args and local words to have indefinite binding lifetimes. And as we've emphasized above, I've become wary of the idea that survival-by-default is even a desirable semantic, when Rebol's model leaks bindings unintentionally all over the place.

More importantly, this unblocks the development of new and interesting ideas in the core, which might even be able to make the deep binding that USE does "closer to free".

Looking back at Ladislav's "good" example of closure necessity...

foo: function [x] [
    y: 10
    return function [z] [x + y + z]
]

His point being that the returned function is useless if x and y are expired references once foo is off the stack. It is a compelling case, but...

...it suggests that if anything, the "closuring" is a property of the usage. Why would you be annotating foo to say it's a "special kind of function whose variables outlive its call", as opposed to annotating the returned function as having "special kinds of references"?:

foo: function [x] [
    y: 10
    return function [z <use> x y] [x + y + z]
]

To my mind this makes a lot more sense. If you delete the motivating usage you don't have to update anything about the enclosing function.

Whether <use>-ing is automatic or not is another question. But that aside, I do think that the existence of a CLOSURE function or a CLOSURE! datatype is not the answer.

Tinkering with JavaScript a bit, the above pattern is omnipresent.

I'm definitely leaning there now myself. It goes without saying that if JavaScript can do something extremely useful that we can't do, that is bad.

When I brought this up exactly a year ago, I mentioned a possibility that was emerging:

The infrastructure to do this is there, and it implements a very coarse version. The poor man's approach is to consider an "escape" to have happened any time a bound item that is resident in a BLOCK! in an action's body winds up being moved outside of that body.

The optimization that hasn't been done is to detect when a word bound to a frame is moved into a cell belonging to a frame that will outlive it. For instance:

below: func [<local> x] [
    x: 10
    above 'x
]

above: func [w [word!] <local> x-reference] [
    x-reference: w
    print get x-reference
]

There's no technical reason why putting the word for x into the x-reference local variable should force closure-ification, nor should passing it as the argument w--because both w and x-reference are cells in the frame for above, which is above the below frame. Since below will outlive above, those cells can just use a direct pointer to below's frame.

Without that optimization, it's extremely coarse. Half--or more than half--of usermode actions will have to be closure-ified due to something that happens in their body. (This isn't surprising, because just calling IF and passing it a BLOCK! from the body would trigger it, since the optimization hasn't been implemented. The IF's argument cell is at a higher stack level, but it is being treated as if it had indefinite lifetime, forcing auto-closure-ification.)

The good news is that there are a lot of ways to get that number down, which can now be explored. Moreover, it's good news that the basic mechanism is working (i.e. the mechanism that's even letting escapes start to be counted at all), because that mechanism is integral to virtual binding, which is still on the agenda and is being enabled by advancements a bit at a time (pun intended).

But the overall news is that I'm leaning toward feeling that automatic closure-ification is likely non-negotiable. We can't let JavaScript be more ergonomic about something like this.

As the question of what makes a language "timeless" has become central, we can't let JavaScript have the upper hand here. There are too many uses for this.

With the impending unification of FUNC and FUNCTION as synonyms, I think we should fold in indefinite lifetime as the default. Frames will also be smaller, because locals will be managed using a different technique.

There's a lot of optimization possible--and the codebase is under control to try it.


This feels like justification in and of itself. I feel like most of the future users are going to be coming from a JavaScript-heavy background. We have to avoid making choices that make the language seem inferior or incapable of handling common and useful idioms.

The deed is done:

There's already some optimization and detection of cases that "leak" words. This detection is good enough that natives don't persist their frames (unless you get references to them via the debugger)--e.g. the optimization is not "it's a native" but "no references to words bound into the frame escaped".

We'll just have to commit to doing better with performance (and it actually doesn't seem terrible at the moment). But the usermode experience needs to be the "timeless" one.

What is the mechanism to retain the old behaviour? Can I "unbind" locals and have deeper function calls fail to find them?

The one that you design.

All function invocations use the same "paramlist" (which is also the identity of the function) as the specification of the ordering of the keys of the frame. Each invocation of a function starts with an unmanaged "varlist" of equal or greater length (which may be recycled from other calls) to hold the values for that frame. This varlist starts out unmanaged, but may become managed if a reference to the frame "leaks".

Here you can see the behavior when a function call finishes: what it does with frames that were never managed vs. those discovered to be managed. Previously, discovering that a frame had become managed would collapse the series node and free its data allocation. (It had to keep the stub around, because otherwise pointer dereferences would crash.) Most of the change is to instead leave the managed frame as-is and allow its references to continue resolving:

https://github.com/metaeducation/ren-c/pull/1015/files#diff-94ddbdf54cabd760b45d9ca65e2739b2R703

(Technically speaking, in the "unavailable" node strategy, once the GC sees that a reference points to a stub it could re-point the reference to a canon "unavailable" stub...and free the memory for the stub it was pointing to. This would lose some amount of added information, e.g. knowing the paramlist of the specific function that was called. It was never implemented for just that reason--it would make debugging harder.)

Anyway...the mechanisms are still there, but I'm convinced of what the default should be (and now quite convinced of the FUNC/FUNCTION synonym as well).

What happens to function args when the call ends?

For many years, it's been accepted that function arguments outlive the function.

I think that's right. The bias should clearly be to let them live as long as references exist... and you have to ask for them to be thrown away, e.g. FUNCTION:LITE.

Perhaps those who are biased to efficiency might define FUNC as the LITE version. Or maybe even that could be the default for FUNC? So long as you got a good error accessing dead variables telling you "hey this frame was destroyed because you used FUNC instead of FUNCTION" that might be a good tradeoff.

Needing Values To Outlive Is Actually Kind of Rare

I broke the feature accidentally, and didn't notice for a bit until I looked into why QUIT had stopped working. It turns out that QUIT depended on it, because MAKE-QUIT wraps up a function you pass in (quit*) that represents your local notion of QUIT. That quit* is a parameter to MAKE-QUIT, used by the QUIT function it produces.
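The shape of that dependency is roughly as follows (a hedged sketch--the real MAKE-QUIT has more to it, and the names and parameter handling here are purely illustrative):

make-quit: func [quit*] [
    ; the produced QUIT references QUIT*, which is an *argument* of MAKE-QUIT...
    ; so that argument's cell must outlive this call for QUIT to keep working
    return func [] [quit* 0]  ; the 0 just stands in for an exit code
]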

But the interesting thing is that the system STILL mostly works without it. So I do think we'd likely benefit by finding some way to prune the cases.

FUNC being a lesser-featured version of FUNCTION on this axis feels like it might be the right psychological hack. It makes the language feel more solid, compared with needing CLOSURE to pick up the slack from a weak FUNCTION.

"Noticing When Things Outlive Stack Levels" Is Gone

These mechanics were wiped out a long time ago. They were incompatible with generators and other ideas, and very tricky.

And things have gotten even harder to rein in. Due to how modern binding works, if you write something like:

                            v-- the FRAME! inherits this block's binding
foo: function [msg [text!]] [
   print ["Message is:" msg]
   return [1 2 3]
]

When that [1 2 3] block was "evaluated", it captured a binding... of the FRAME! and whatever that frame inherited from the body. So you'd be returning a BLOCK! with a binding that rolls in all of that.

If you want to avoid this, you have to write:

foo: function [msg [text!]] [
   print ["Message is:" msg]
   return '[1 2 3]  ; <-- tick mark needed to dodge binding capture
]

It seems a bit sad to have to say "if you don't ugly up your code with tick marks, it won't be as good... so you should quote your data blocks when they're just data".

BUT on the other hand, I've definitely seen places where that's a helpful cue that your BLOCK! is in an evaluative context. So it does have some benefits, and I'm warming up to it.