How to Capture Binding Of PARSE Items

hostilefork · January 29, 2024, 7:44pm

Consider some simple code that used to "work" (only in the simplest of cases, of course):

>> parse [word: 10] [
       let word: set-word! let val: integer! (
           set word val
       )
   ]

We're getting some unbound values by structural extraction. But now that structural extraction doesn't propagate bindings... how do we look those values up in an environment?

We'd get the wrong answer if we said set (inside [] word) val... that would try to bind the "word" word to the LET variable from the rule. I made it conflict just to stress the point that the processing code is not the right environment to be looking up values in the data most of the time.

When PARSE is doing the processing (and recursions in our data for us), we're cut out of the loop on binding.
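A toy model may help make the problem concrete. This is a sketch in Python, not Ren-C: `Env`, `Word`, `inside` and `set_word` are hypothetical stand-ins for environments, words, INSIDE/IN and SET, and real binding is considerably richer than a chain of dictionaries. It only illustrates that binding a structurally extracted word in the rule's environment assigns the LET variable, while binding it in the environment of the data assigns the variable the data meant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Env:
    """A toy environment: a dict of variables plus an optional parent."""
    vars: dict
    parent: Optional["Env"] = None

    def find(self, name):
        """Return the Env in the chain that defines `name`, or None."""
        env = self
        while env is not None:
            if name in env.vars:
                return env
            env = env.parent
        return None


@dataclass
class Word:
    """A word is a spelling plus an optional binding (None means unbound)."""
    spelling: str
    binding: Optional[Env] = None


def inside(env, word):
    """Return a copy of `word` bound into `env` (toy stand-in for INSIDE/IN)."""
    return Word(word.spelling, env)


def set_word(word, value):
    """Assign through the word's binding; unbound words cannot be set."""
    if word.binding is None:
        raise LookupError(f"{word.spelling} is unbound")
    target = word.binding.find(word.spelling)
    if target is None:
        raise LookupError(f"{word.spelling} not found in binding")
    target.vars[word.spelling] = value


# The data block [word: 10] belongs to the user's environment.
user_env = Env({"word": None})
data_env = user_env
data_block = [Word("word"), 10]      # items carry no binding of their own

# The rule's LET creates its own `word` variable, shadowing the user's.
rule_env = Env({"word": None, "val": None}, parent=user_env)

extracted = data_block[0]            # structural extraction: still unbound

# Wrong: binding in the rule's environment hits the LET variable.
set_word(inside(rule_env, extracted), 10)
wrong = (rule_env.vars["word"], user_env.vars["word"])

# Right: bind in the environment associated with the input data.
rule_env.vars["word"] = None
set_word(inside(data_env, extracted), 10)
right = (rule_env.vars["word"], user_env.vars["word"])
```

In this model `wrong` ends up as `(10, None)` and `right` as `(None, 10)`: the same unbound word lands in a different variable depending on which environment supplies the binding.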

Solution Tactics

You can use the <input> TAG! combinator to get the input, and if there were an IN combinator you could do the binding yourself, handling recursions explicitly:

>> parse [[word: 10]] [
       let i: <input>
       subparse in (i) block! [  ; make subparse input propagate specifier
           let sub: <input>
           let word: set-word! let val: integer! (
               set (in sub word) val 
           )
       ]
   ]

Making this a little easier might be a combinator for capturing the parse state object, for getting the input more easily at any time.

>> parse [[word: 10]] [
       let s: <state>
       subparse in (s.input) block! [  ; subparse changes s.input
           let word: set-word! let val: integer! (
               set (in s.input word) val
           )
       ]
   ]

Certainly some pain involved here. Perhaps @bradrn can appreciate the reason why propagating binding through structure automatically seemed necessary so things like this worked "like magic".

But it was bad magic. If the structural operations presume ideas about binding, that ties our hands in the interpretation of binding for the input block. We have [[word: 10]] now, but what if we wanted something like [let word [word: 10]]? It's up to the parse of this "dialect" to decide the bindings, rather than having them applied automatically. Only by refusing that automatic propagation can the LET in PARSE above be implemented!

Though actually in this simple case, you could just say:

>> parse [[word: 10]] [
       subparse in <input> block! [  ; make subparse input propagate specifier
           let word: in <input> set-word! let val: integer! (
               set word val  ; word was already bound by `in <input>` above
           )
       ]
   ]

Even briefer, a TAG! combinator <in> that means in <input>:

parse [[word: 10]] [
   subparse <in> block! [
       let word: <in> set-word! let val: integer! (
           set word val
       )
   ]
]

Not too arduous, and you have the necessary hook points for alternative binding interpretation when you need it. And if you're just processing code structurally, you don't have to worry about it.

(Note: Trying this I remembered that TAG! combinators haven't been set up to take arguments. Should they be able to? Maybe not... none do at the moment, and it seems a reasonable policy to say they don't. If not a TAG! then what should this be? It could be the behavior of the @ operator... which is a bit incongruous with how @word etc. are handled in PARSE, but lines up sort of with wanting to capture the current sense of binding on the next argument. Something to think about, I'm calling it *in* as a placeholder just to move along)

Other Places This Pops Up

If you're writing something like a FOR-EACH loop, and you want to get the bindings of things, you can look the thing up in an environment that you have on hand:

>> block: [word: 10]
>> for-each [word val] block [
      set (in block word) val
   ]

>> word
== 10

It's manual, but it works. But what if the block were literal, and you didn't have access to it?

>> for-each [word val] [word: 10] [
      set (??? word) val
   ]

Where this may be pointing is that instead of trying to imagine weirdly designed FOR-EACH variants that incorporate binding, it may be that you should think in terms of PARSE as the tool for when you want to enumerate with binding...

bradrn · January 29, 2024, 11:59pm

Although a combinator <in> (or, from the other thread, *in*) seems like the best option here, it’s worth noting that these kinds of combinators are quite standard in parser combinator libraries, e.g. megaparsec. They let you write a bunch of really useful things: for instance match, which yields the portion of the input which was consumed during a subparse.

hostilefork · January 30, 2024, 12:11am

Not sure what you're referring to being similar? The binding is a very distinct issue.

Just to get on the same page for terminology...

SUBPARSE in Ren-C (traditionally INTO) spans only one element...the sub-series you are parsing.

 >> parse [1 "aabb"] [integer! subparse text! [some "a" some "b"]]
 == "b"

 >> parse [1 [a a b b]] [integer! subparse block! [some 'a some 'b]]
 == b

If you want to get a span of data out of a rule's match, there is ACROSS in Ren-C (traditionally COPY):

>> parse "aaabcbcaabc" [collect some [some "a" | keep across some ["b" | "c"]]]
== ["bcbc" "bc"]

(It doesn't have a secondary multi-return of the original synthesized product of the rule you copied across, but it could.)

But this doesn't help with the binding issue at hand, because when you copy data out of input arrays that is what I'm calling "structural". So it doesn't take the specifier into account.

bradrn · January 30, 2024, 12:30am

What I quoted:

I’m saying that such combinators, which capture the parse state, are standard in parser combinator libraries.

Ah, in that case I didn’t mean ‘subparse’ in the same way. The match combinator I mentioned in megaparsec sounds like ACROSS here.

hostilefork · January 31, 2024, 2:35am

There was an instance of this in @Brett's %source-analysis.r (yes, it's still running...and presenting ponderable situations).

    for-each list [tabbed whitespace-at-eol] [
        if not empty? get list [
            emit as tag! list [(file) (get list)]
        ]
    ]

Here you can say "well, those variables are in the current context anyway" and write:

    for-each list [tabbed whitespace-at-eol] [
        if not empty? get inside [] list [
            emit as tag! list [(file) (get inside [] list)]
        ]
    ]

Not a great answer. But it's there.

However, another idea is to use @word under reduction, picking up the binding:

    for-each list reduce [@tabbed @whitespace-at-eol] [
        if not empty? get list [
            emit as tag! list [(file) (get list)]
        ]
    ]

And if you are a fan of GET-BLOCK! for REDUCE this becomes:

    for-each list :[@tabbed @whitespace-at-eol] [
        if not empty? get list [
            emit as tag! list [(file) (get list)]
        ]
    ]

It's hard to say how "natural" this will feel. Someone who has only ever known that get 'tabbed behaves the same as get first [tabbed whitespace-at-eol] (and won't work), while get @tabbed works but requires evaluation, might find this a perfectly obvious thing to reach for.

I'm trying to avoid a generalized harden-bindings [tabbed whitespace-at-eol] as long as I can, because I don't really like the idea of that being a common way to solve problems. At some point, something like that will have to be made, but it's going to raise a lot of questions (should all things be hardened, or just things the evaluator would... so quoted values remain unaffected?)

bradrn · January 31, 2024, 4:45am

It feels pretty natural to me: conceptually, foobar is now a simple name, while @foobar is the variable that name refers to. On the other hand, I’m not sure I would have noticed it on simply perusing this code.

Agreed on that point.

I think it should work like this:

harden-bindings: func [block] [
    return collect [for-each value block [
        keep in block value
    ]]
]

(Apologies if I made any mistakes there, I’m still not great with actually programming in Rebol, but hopefully it should be clear enough.)

But I guess this just pushes back the problem to the behaviour of IN: what happens if you do in block ''word? (Or, if you prefer, in block first ['word].) Personally, I think it should make a bound, quoted word. Perhaps one could defend a situation where quoted words can never be bound, but I don’t love that.

hostilefork · October 2, 2024, 11:58pm

So we're now in a situation where the $ operator in plain code will bind what it gets in the "current environment".

This makes $ foo a synonym for inside [] foo

It is very tempting to call this arity-1 $ operator "BIND", and to say that $foo is a BIND-WORD!, and $(a b c) is a BIND-GROUP!... etc.

That would require some pushing around of terms to get an arity-2 form, e.g.

bind element  ; use "current context" of callsite

bind:in element context  ; use passed-in context

(I know I've insisted that there not be a lot of these functions that sense the callsite's context. Yes they are dicey, there should not be many of them. But $ / BIND is an exception.)

So if we were willing to say that arity-1 BIND is "bind by default, where it makes sense" we could have:

parse [[word: 10]] [
   subparse bind block! [
       let word: bind set-word! let val: integer! (
           set word val
       )
   ]
]

Or:

parse [[word: 10]] [
   subparse $ block! [
       let word: $ set-word! let val: integer! (
           set word val
       )
   ]
]

Or even:

parse [[word: 10]] [
   subparse $block! [
       let word: $set-word! let val: integer! (
           set word val
       )
   ]
]

Though as WRAP is showing, binding environments are going to be pretty dynamic. UPARSE's slightly-informed-guesswork may not cover your case.

So there will presumably need to be some programmatic manipulation of the "current" environment to account for whatever the dialect you are parsing is doing.

I'm not 100% sure about the arity-1 BIND idea, but it's starting to seem sensible.

hostilefork · December 28, 2025, 9:50pm

I'm realizing that one way or another, I've been talking about these two choices:

(1) don't project the binding from the container, but keep the binding that was on the item if it had one
(2) project the binding from the container if it doesn't have a binding, else keep it as-is

I'm kind of skeptical how useful (1) really is. You're at risk of getting unbound material, but you don't always get unbound material. What sort of situations motivate this?

If you really have a situation that's "(1)-like"...then it seems most of the time...the container itself should be unbound. If that wasn't the case and the container had a binding, what exactly is your justification for ignoring it and not using (2)?
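To pin down the difference between the two choices, here is a toy model in Python (not Ren-C). `Word`, `Block`, `pick_keep_only` and `pick_project` are hypothetical names, and bindings are reduced to plain strings naming an environment. It is only a sketch of the two policies as described above, not of how any actual picking operation is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    spelling: str
    binding: Optional[str] = None    # environment name; None means unbound


@dataclass(frozen=True)
class Block:
    items: tuple
    binding: Optional[str] = None    # the container's own binding, if any


def pick_keep_only(block, index):
    """Choice (1): never project the container's binding onto the item."""
    return block.items[index]


def pick_project(block, index):
    """Choice (2): project the container's binding onto unbound items only."""
    item = block.items[index]
    if isinstance(item, Word) and item.binding is None:
        return Word(item.spelling, block.binding)
    return item


# One unbound item and one item that already carries a binding.
block = Block((Word("a"), Word("b", "other-env")), binding="block-env")

policy1 = [pick_keep_only(block, i).binding for i in range(2)]
policy2 = [pick_project(block, i).binding for i in range(2)]

# With an unbound container, choice (2) degenerates into choice (1).
unbound = Block(block.items)
same = [pick_project(unbound, i) == pick_keep_only(unbound, i) for i in range(2)]
```

Here `policy1` is `[None, 'other-env']` (sometimes unbound, sometimes not), `policy2` is `['block-env', 'other-env']`, and `same` is `[True, True]`. That last result is the point made above: if the container is unbound, (2) already behaves like (1).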

So this is steering me toward thinking that (2) would be the default, hence your hardening here could simply be:

harden-bindings: lambda [block] [
    map-each value block [value]
]

But recognizing (1) as probably-useless has a counterpoint: what if you explicitly want unbound material? How about that's where we use the quote mark?

no-bindings: lambda [block] [
    map-each 'value block [value]
]

The idea that you can trust that what you get back isn't bound helps squelch stray bindings where they aren't meaningful (at the risk of ignoring a binding that was meaningful). There are aspects of ignoring binding that make me uneasy, but at least if you're the chokepoint for this decision you stop spreading the meaninglessness past the point where you didn't intend it.

And this has a certain technical advantage... iterators only need to have one return modality. The thing that's feeding the MAP-EACH or whatever always propagates bindings where it makes sense, and then it's the SET assignment argument that tells the variable whether to rip it off or not. So you don't have to feed a "don't propagate bindings" flag into the informational request; the binding is dropped uniformly on assignment.

>> var1: $x
== x  ; bound

>> var1
== x  ; bound

>> ['var2]: $x
== x  ; bound

>> var2
== x  ; unbound

(Note that this obviously can't be 'var2: var1, because the quote would quote the SET-WORD; hence the SET-BLOCK is needed as a container.)
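The "one return modality" idea can be sketched as a toy model in Python (again, not Ren-C). `iterate`, `assign` and `map_each` are hypothetical names, a binding is just a string, and a leading apostrophe in the target name plays the role of the quoted variable. It assumes the behavior in the transcript above: the assignment expression still yields the bound value, while the variable stores the unbound one.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Word:
    spelling: str
    binding: Optional[str] = None    # environment name; None means unbound


def iterate(items, container_binding):
    """The single iterator modality: always project onto unbound words."""
    for item in items:
        if isinstance(item, Word) and item.binding is None:
            item = replace(item, binding=container_binding)
        yield item


def assign(variables, target, value):
    """Store `value`; a leading apostrophe on `target` strips the binding."""
    stored = value
    if target.startswith("'"):
        target = target[1:]
        if isinstance(stored, Word):
            stored = replace(stored, binding=None)
    variables[target] = stored
    return value                     # the expression result keeps its binding


def map_each(target, items, container_binding):
    """Collect what the loop variable holds on each step."""
    variables = {}
    name = target.lstrip("'")
    out = []
    for item in iterate(items, container_binding):
        assign(variables, target, item)
        out.append(variables[name])
    return out


items = [Word("a"), Word("b")]
hardened = [w.binding for w in map_each("value", items, "env")]
stripped = [w.binding for w in map_each("'value", items, "env")]

variables = {}
result = assign(variables, "'var2", Word("x", "env"))
```

In this model `hardened` is `['env', 'env']` and `stripped` is `[None, None]`, using the same iterator both times. `result.binding` is `'env'` while `variables['var2'].binding` is `None`, mirroring the `['var2]: $x` transcript.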

Tying This Back To PARSE Rules

Above I said I'm leaning to a default where:

FOR-EACH always propagates binding from container (or leaves binding as is, if something already has it)
If that's not what you want it's your job to either:
- Unbind the container before asking for elements out of it
- Unbind the thing you get back after the bindings are composed
  - We help make this easier with quoted iterative variables requesting the unbind

So let's look at the original problem in PARSE (rewritten to use FENCE!...)

I feel like the default might be that you get the binding propagation and that "just works".

If you want to strip off the binding (let's say to avoid the OBJECT! bound keys ambiguity) you would say:

>> parse [word: 10] {
       word: unbind set-word! val: integer! (
            make object! compose [(word): val]
       )
    }

>> parse [word: 10] {  ; ...alternatively
       word: set-word! val: integer! (
            make object! compose [(unbind word): val]
       )
    }

Or since this is an assignment context, you might prefer the quoted-assign trick I suggest, for cuing SET to strip the binding:

>> parse [word: 10] {
       ['word]: set-word! val: integer! (
            make object! compose [(word): val]
       )
   }

Is This... Err... "Bind The Galaxy"?

Whatever this is... leads to a more historical-Rebol-compatible model, spreading bindings more places (just one step at a time).

But if you have discipline of using quoted blocks as containers when bindings aren't actually meaningful, that would probably mitigate a lot of the damage.

The one nit is that Sigil composition issue: no obvious way to say "reuse variable, but reuse it ^META".

for-each var [a b c] [...]  ; bind propagate, new var
for-each $var [a b c] [...]  ; bind propagate, reuse var
for-each 'var [a b c] [...]  ; unbind result, new var
for-each '$var [a b c] [...]  ; unbind result, reuse var

for-each ^var [a b c] [...]  ; bind propagate, new var, meta
for-each '^var [a b c] [...]  ; unbind result, new var, meta
???  ; bind propagate, reuse var, meta
???  ; unbind result, reuse var, meta

Not the end of the world if reuse is distributive across BLOCK!s used for iteration vars:

for-each $[^var] [a b c] [...]  ; bind propagate, reuse var, meta
for-each '$[^var] [a b c] [...]  ; unbind result, reuse var, meta

I feel a bit less bad about that than I would saying meta is distributive:

for-each ^[$var] [a b c] [...]  ; bind propagate, reuse var, meta
for-each '^[$var] [a b c] [...]  ; unbind result, reuse var, meta

Something about that puts me off. I guess because we're used to binding being controlled invisibly from "outside", but not meta semantics. So I probably would avoid implementing that.

hostilefork · December 29, 2025, 12:16am

hostilefork:

>> ['var2]: $x
== x  ; bound

>> var2
== x  ; unbound
(Note that this obviously can't be 'var2: var1, because the quote would quote the SET-WORD; hence the SET-BLOCK is needed as a container.)

It occurs to me that if foo.'1 is the way of saying "don't do the default binding propagation", one might say foo.' as a way of saying "pick foo itself--no selector--and unbind it"

Could work for assignments:

>> var1: $x
== x  ; bound

>> var1
== x  ; bound

>> var2.': $x
== x  ; bound

>> var2
== x  ; unbound

But meh. Don't love it, especially on the assignment. The SET-BLOCK looks better.

hostilefork · March 3, 2026, 7:20am

hostilefork:

for-each var [a b c] [...]  ; bind propagate, new var
for-each $var [a b c] [...]  ; bind propagate, reuse var
for-each 'var [a b c] [...]  ; unbind result, new var
for-each '$var [a b c] [...]  ; unbind result, reuse var

This idea of getting binding propagation by default--where you have to use symbols to disable it--makes it easy to do certain things.

But doing "the more complex thing" without decoration does make it a bit harder to tell what's going on.

Consider this example from %source-analysis.r, that had to do a REDUCE in order to get bound variables to look up for the get list operation:

for-each 'list reduce [$tabbed $whitespace-at-eol] [
    if not empty? get list [...]
]

Now the apostrophe on 'list would throw the bindings away. Also, you can just say:

for-each list [tabbed whitespace-at-eol] [
    if not empty? get list [...]
]

The lack of an apostrophe on the variable means you get the binding propagated from the container.

Part of me prefers the version before the change--with the REDUCE, the $ signs on the things you want bound helping cue "these are variables", and the apostrophe on the loop variable to say "this is being passed to FOR-EACH as a name, not evaluatively" (being unrelated to binding).

Another part of me appreciates having mechanics that actually work to power the "Rebolness" of the much more Rebol2-looking version.