MAXMATCH Parser Combinator, With/Without Rollback!

hostilefork · August 12, 2021, 6:04am

So I've gone ahead with my proposed implementation (the version where a GROUP! can begin with a <delay> tag, as in (<delay> ...)) and implemented the PHASE construct.

You can see how simple the GROUP! combinator is. It looks for the <delay> tag, and if it finds it, it uses NEXT to skip past the tag, and adds the remaining group data to the pending list. PHASE just filters anything that's a group out of the pending list and runs it. There's a PHASE automatically added to the top-level of every UPARSE operation.

Because this is the understood protocol for what GROUP!s in the pending list mean, any combinator can stick deferred code into that list--just by adding groups to it.
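To make the protocol concrete, here is a toy model in Python. It is not the Ren-C implementation: deferred groups are modeled as plain callables, and the names `DELAY`, `group_combinator`, and `phase` are made up for illustration.

```python
# Toy model only (NOT the UPARSE source). A "group" is a list holding an
# optional DELAY marker followed by a zero-argument callable.

DELAY = "<delay>"  # hypothetical stand-in for the <delay> tag


def group_combinator(group, pending):
    """Run a group immediately, or queue it if it starts with DELAY."""
    if group and group[0] == DELAY:
        pending.append(group[1])  # skip past the tag, defer the rest
        return None
    return group[0]()             # ordinary group: evaluate right away


def phase(pending):
    """Run every deferred callable; hand back the items that were not groups."""
    remaining = []
    for item in pending:
        if callable(item):
            item()
        else:
            remaining.append(item)
    return remaining


log = []
pending = ["'kept-value"]  # a non-group item some other combinator queued
group_combinator([lambda: log.append("now")], pending)
group_combinator([DELAY, lambda: log.append("later")], pending)
log_before_phase = list(log)  # only the immediate group has run so far
pending = phase(pending)      # the deferred group runs here
```

In this model any other combinator can defer work the same way, by appending a callable to `pending`.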

So...with that tool in hand, I went back to tackle the problem that we saw above with the hooked-word-combinator demo (I'm now calling it "TRACKPARSE" rather than parsetree...I'll save that name for something that more closely gives your results). As a refresher, the problem was:

>> trackparse "fffyyy" [foo-rule some "x" | foo-rule some "y"]
foo-rule [
] => "fff"
foo-rule [
] => "fff"

We didn't want the FOO-RULE to be contributing to the stack log when it was a member of the alternate that failed.

The way I approach it now is to push groups of deferred code that append strings to the stack log. It sticks one deferred string append before all the pendings returned by the WORD!'s processed rule, and one more after all of them.

(By no means am I suggesting this is an ideal way to do this, but it is just a corrected version of the off-the-cuff code from before!)

It achieves the desired result!
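Here is a rough Python sketch of why deferring the log appends fixes the rollback problem. It is a toy model under stated assumptions, not the TRACKPARSE code: each combinator returns `(rest, pending)` on success or `None` on failure, and `lit`, `some`, `seq`, `alt`, `tracked`, and `run` are hypothetical names.

```python
# Toy model only: pending items are callables that run after the parse succeeds.

def lit(text):
    def parser(inp):
        return (inp[len(text):], []) if inp.startswith(text) else None
    return parser


def some(p):
    def parser(inp):
        result = p(inp)
        if result is None:
            return None
        rest, pending = result
        while True:
            nxt = p(rest)
            if nxt is None or nxt[0] == rest:  # stop on failure or no progress
                return rest, pending
            rest, pending = nxt[0], pending + nxt[1]
    return parser


def seq(*parsers):
    def parser(inp):
        pending = []
        for p in parsers:
            result = p(inp)
            if result is None:
                return None  # pendings gathered so far are simply dropped
            inp, more = result
            pending = pending + more
        return inp, pending
    return parser


def alt(*parsers):
    def parser(inp):
        for p in parsers:
            result = p(inp)
            if result is not None:
                return result
        return None
    return parser


def tracked(name, p, log):
    """Wrap a rule so its log entries are deferred, one before and one after."""
    def parser(inp):
        result = p(inp)
        if result is None:
            return None
        rest, pending = result
        matched = inp[:len(inp) - len(rest)]
        before = lambda: log.append(name + " [")
        after = lambda: log.append('] => "' + matched + '"')
        return rest, [before] + pending + [after]
    return parser


def run(p, inp):
    result = p(inp)
    if result is None:
        return None
    rest, pending = result
    for deferred in pending:  # plays the role of the implicit top-level PHASE
        deferred()
    return rest


log = []
foo_rule = tracked("foo-rule", some(lit("f")), log)
rule = alt(seq(foo_rule, some(lit("x"))), seq(foo_rule, some(lit("y"))))
rest = run(rule, "fffyyy")
```

In this model `foo_rule` matches inside the first alternate, but its two deferred appends are thrown away when `some(lit("x"))` fails, so `log` ends up holding only the pair from the alternate that succeeded.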

It's probably easiest to just look at the implementation to see what's happening in the COLLECT, GATHER, and PHASE cases.

COLLECT is siphoning out the QUOTED! values from the pending material. It unquotes them.
GATHER gets the BLOCK!s, all of which are assumed to have the form [var: value]
I've already shown that PHASE (which also brackets the entirety of a UPARSE) gets the GROUP!s
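The "each construct siphons out its own datatype" idea can be modeled in a few lines of Python. This is an illustrative sketch only: the `Quoted`, `Pair`, and `Group` classes are hypothetical stand-ins for QUOTED!, [var: value] BLOCK!s, and GROUP!s, and the functions are not the real combinators.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Quoted:   # stand-in for a QUOTED! value left by KEEP
    value: Any


@dataclass
class Pair:     # stand-in for a [var: value] BLOCK!
    var: str
    value: Any


@dataclass
class Group:    # stand-in for a deferred GROUP!
    run: Callable[[], None]


def siphon(pending, kind):
    """Split pending into (items of `kind`, everything else), keeping order."""
    taken = [item for item in pending if isinstance(item, kind)]
    rest = [item for item in pending if not isinstance(item, kind)]
    return taken, rest


def collect(pending):
    taken, rest = siphon(pending, Quoted)
    return [q.value for q in taken], rest      # "unquote" what was kept


def gather(pending):
    taken, rest = siphon(pending, Pair)
    return {p.var: p.value for p in taken}, rest


def phase(pending):
    taken, rest = siphon(pending, Group)
    for group in taken:
        group.run()
    return rest


ran = []
pending = [
    Quoted("a"),
    Pair("x", 10),
    Group(lambda: ran.append("side effect")),
    Quoted("b"),
]
kept, pending = collect(pending)
fields, pending = gather(pending)
pending = phase(pending)
```

Each consumer takes only the kind it understands and passes the rest along, which is why the constructs can nest without stepping on each other in this model.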

(Efficiency sidenote: I probably should have used something like @[...] blocks for GATHER, and used BLOCK!s for KEEP to complement the QUOTED!s. This would defer splicing until the final COLLECT, when you have a better idea of how big the total series will be.)

In any case, this strategy will obviously run out of datatypes at some point. So once the common "lightweight" values are spoken for, an ecology based around something like an OBJECT! which uses a key like combinator: as a tag to know whether to pay attention to something is probably the safest bet. Perhaps something lighter weight like EVENT! which could put a label in the cell spot where a block index would usually be would be of use here.
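A minimal Python sketch of that tagged-record alternative, assuming each pending entry is a dict with a `combinator` key. The key name comes from the paragraph above; the `payload` field and the `take_for` helper are invented for illustration.

```python
# Toy model only: entries are filtered by an explicit tag, not by datatype.

def take_for(pending, combinator):
    """Return (payloads tagged for `combinator`, all other entries)."""
    mine = [e["payload"] for e in pending if e.get("combinator") == combinator]
    rest = [e for e in pending if e.get("combinator") != combinator]
    return mine, rest


pending = [
    {"combinator": "collect", "payload": "a"},
    {"combinator": "my-custom", "payload": {"depth": 2}},
    {"combinator": "collect", "payload": "b"},
]
collected, pending = take_for(pending, "collect")
```

The trade-off in this model is a heavier entry per item in exchange for an open-ended set of tags.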

What I'm trying to do here--though--is to make this intrinsically hackable. If you want QUOTED! to mean something else in the pending list, it's not like you can't rewrite COLLECT and KEEP. Everything is supposed to be modular and comprehensible.

I realise case studies would be of value. C-lexicals and the build process might be one. JSON-type response data might be another. Will keep it in mind.

I definitely want to get scenarios worked through before going down the rabbit hole of optimizing all this with native code. The %examples/ directory in the parse tests can hopefully be home to some good challenges of the model. Throw hardballs at it!

And if there's any chance you can just take a little time to skim through the parse tests and see if everything "jibes" that would be great:

https://github.com/metaeducation/ren-c/tree/master/tests/parse

Anything you want to test should work (very slowly) in the web REPL. I'm trying to be good about taking basically every experiment I type down and making sure it gets incarnated as a test instead of just tried once and forgotten about...