The PARSE of PROGRESS

There has been a lot of fiddling over time with PARSE's return value. :violin:

It was long believed that a failed PARSE should return NULL. This would make it play nicely with ELSE and THEN. The question was thus what to return on success:

  1. Just returning ~okay~ makes the output of PARSE easier to read in tutorials. This isn't overwhelmingly important.

  2. Returning the input value would make it easy to use PARSE as a validator for data.

    if parse data [integer! integer!] [  ; exactly two integers
       call-routine data
    ] else [fail]
    
    call-routine (parse data [integer! integer!] else [fail])  ; nicer
    
    call-routine non null parse data [integer! integer!]  ; even nicer :-)
    
  3. Returning how far a successful parse got was strictly more informative, as the information about a partial match is difficult to reconstruct otherwise (see the sketch just after this list).
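
Hypothetically, option 3 would have looked something like this (not actual behavior, just a sketch of what that option would mean):

>> parse "aaabbb" [some "a"]
== "bbb"  ; hypothetical: the position the rules advanced to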

For at least some time, @rgchris favored #3, because many sophisticated tasks are helped by knowing how far PARSE got. But that required changing the semantics of PARSE so it would not automatically fail on partial inputs, and the rules had to explicitly ask to hit an <end>.

But the need to tack on <end> made some things seem less concise and elegant. And surveying how other languages do "destructuring" made me feel that PARSE requiring completion was the best answer in the Redbol world. When you're matching a structure against [tag! tag!] it feels somewhat wrong for [<x> <y> <z>] to "match" when it seems "over the limit".

UPARSE Offers The Best Of All Worlds

Everything changed with UPARSE.

First of all, if a PARSE doesn't match, it raises a definitional error. This provides a welcome safety net.

>> parse "abc" ["ab"]
** Error: PARSE partially matched the input, but didn't reach the tail

You can use TRY PARSE if you like and get NULL... though that possibly conflates with a NULL synthesized by the last matching rule (e.g. OPT synthesizes NULL when the optional thing was not there). You can use EXCEPT to specifically handle the error in a postfix manner. Or using META/EXCEPT will give you a plain ERROR! on a definitional error, and a META'd value otherwise.
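
As a quick sketch of those first two styles (the EXCEPT branch here just ignores the error it receives and returns a placeholder tag):

>> try parse "abc" ["ab"]
== ~null~  ; anti

>> parse "abc" ["ab"] except e -> [<no match>]
== <no match>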

All rules synthesize a result (though a GHOST result is legal, e.g. you can ELIDE a rule), and you can end the parse at any time with ACCEPT:

>> parse "abc" ["ab", accept <input>]
== "abc"

>> parse "abc" ["ab", accept <here>]
== "c"
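
And to illustrate the GHOST point: ELIDE runs a rule but makes its result vanish, so the overall result comes from the last rule that synthesized a visible value (a small sketch, assuming the usual behavior where a matched TEXT! rule synthesizes its text):

>> parse "aaabbb" [some "a" elide some "b"]
== "a"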

You can even pack up multi-return values and give them back. The possibilities are pretty much endless, and so the policy of returning the synthesized result has won out.

I've mentioned that this is pretty easy to write. But that doesn't mean there shouldn't be a name for it...

It seems to me a reasonably good name for this is PARSE-THRU...

>> parse-thru "aaabbb" [some "a"]
== "bbb"

It can be implemented any number of ways, but an easy one is to ADAPT the rules slightly before running the PARSE. Since RULES is a BLOCK!, you can just compose it in, and follow it with an ACCEPT of wherever the current position is.

/parse-thru: adapt parse/ [
    rules: compose [(rules) accept <here>]
]

This will default to erroring if it doesn't match, so you'd have to use try parse-thru if you wanted a null when there was a deliberate mismatch:

>> parse-thru "bbbaaa" [some "a"]
** Error: PARSE BLOCK! combinator did not match input

>> try parse-thru "bbbaaa" [some "a"]
== ~null~

If you want to work around this, there are lots of ways to do it. You could add an alternate that returns null:

/parse-thru: adapt parse/ [
    rules: compose:deep [[(rules) accept <here>] | accept null]
]

Or rig it up so that the rule is optional, and use PARSE:RELAX to remove the requirement that it reach the end:

/parse-thru: adapt parse:relax/ [
    rules: compose*:deep [opt [(rules) accept <here>]]
]

Lots of ways to get the effect:

>> parse-thru "bbbaaa" [some "a"]
== ~null~  ; anti

Another Interesting Interface: PARSE-MATCH

Being able to get the input, or a NULL, can be useful as well. A similar technique will get it: swap the <here> combinator out for the <input> combinator, keep the requirement of reaching the end (made explicit here with <end>), and add a fallback alternate that ACCEPTs NULL:

/parse-match: adapt parse/ [
    rules: compose [(rules) <end> <input> | accept null]
]

>> parse-match "aaabbb" [some "a" some "b"]
== "aaabbb"

>> parse-match "bbbaaa" [some "a" some "b"]
== ~null~  ; anti

>> parse-match "aaabbb" [some "a"]
== ~null~  ; anti

Endless Possibilities... But How To Compose Them?

In the Visual Parse Demo I showed how a tweaked PARSE variant that I called EPARSE could be rigged up to make underlines in the web-based text editor for anything you marked with a MARK combinator (with rollback, such that marks would not be made if the whole rule did not ultimately match...)

So do you have to write EPARSE-THRU and EPARSE-MATCH?

If these modes were refinements on PARSE itself, instead of being done with wrappers, you'd get EPARSE:THRU and EPARSE:MATCH "for free". Perhaps they could be more efficient in their implementation as well.

But then you start having situations where people can do nonsensical combinatorics, like eparse:thru:match. :frowning:

...or (Weird Idea) Could PARSE Have Some Other Hookability?

It might be that if you asked to PARSE an OBJECT!, the object could act as some kind of specification... like providing the combinators and saying where to look for the data.

e.g. parse editor [some "a"] could look at the editor object, and have behaviors particular to that object. This would mean that parse-match editor [some "a"] could work as well.
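
Purely as a sketch of that weird idea (the field names here are invented for illustration, not an existing protocol):

; speculative: an object that tells PARSE how to treat it
editor: make object! [
    data: "aaabbb"          ; where this object says the input lives
    combinators: copy []    ; stand-in for a table of combinator overrides
]

parse editor [some "a"]         ; would read DATA and COMBINATORS off the object
parse-match editor [some "a"]   ; ...and wrappers like PARSE-MATCH would work too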

Separate Entry Points vs. Refinements Is The Safer Bet

In the scheme of things, having PARSE-MATCH and PARSE-THRU as separate entry points is easiest, because you'll be able to do that regardless.

But like I say, defaulting to the synthesized result of the rules... with an error raised if a match or ACCEPT is not reached... that's a super powerful default that I'm really happy with.

I put together an interesting dialected example of PARSE going one step at a time:

Evaluator Hooking ("RebindableSyntax")

This is an inversion of control that steps away from "PARSE and be done". It's kind of like how @rgchris rebelled against the "ZIP and be done" model of the ZIP dialect.

I think the GENERATOR / YIELDER model works well here. Like I point out, you can process your data one rule at a time:

make-parser: lambda [data] [
    make-yielder [rule [block!]] [
        parse data [opt some yield/ rule]
    ]
]

And there you go.

>> parser: make-parser [a b c "d" 1020 "e" 304]

>> parser [across some word!]
== [a b c]

>> parser [one]
== "d"

>> parser [collect some [keep integer! | text!]]
== [1020 304]

>> parser [one]
** Error: enumeration done

>> try parser [one]
== ~null~

TRY suppresses arbitrary mismatch errors, so it's probably not what you want to use for end detection. Note you can use done? parser [] to test whether a parse is done, since it's a yielder... the only way an empty block rule wouldn't match would be if the parse ended (might need to rig that up specially, but it's doable).

Maybe this PARSER should tolerate non-BLOCK!s too...

>> parser: make-parser [a 1020]

>> parser word!
== a

>> done? parser []
== ~null~  ; anti

>> parser [one]
== 1020

>> done? parser []
== ~okay~  ; anti

This Looks Very Sweet... :candy:

I think this crystallizes that we need the creation operations to be named like MAKE-PARSER and MAKE-YIELDER, because you need to be able to name the products (PARSER, YIELDER, GENERATOR...)

So it occurs to me that the name is kind of wrong here.

In parser combinator systems, a parser is typically a function that takes input and returns a result (and possibly the remaining unconsumed input), while a combinator is a higher-order function that assembles or modifies parsers.

What's being produced here is a generator-like function, pre-bound to an input, that yields values extracted by applying rules (parser expressions) passed in per call. Once the input is consumed or the stream is otherwise closed, it reports completion.

That flips the traditional model: instead of fixing the rules and varying the input, it fixes the input and varies the rules.
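
To make the flip concrete, a minimal sketch (assuming PARSE's rules parameter is named RULES, as the ADAPT examples above suggest, and reusing MAKE-PARSER from the previous post):

; traditional: fix the rules, vary the input
>> /match-two-ints: specialize parse/ [rules: [integer! integer!]]

>> match-two-ints [1 2]
== 2

; this pattern: fix the input, vary the rules
>> parser: make-parser [1 2]

>> parser [integer!]
== 1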

When I ask AIs, they don't seem to be aware of this pattern having a name. I might have to make one up. "Parse Pump"? I don't know if pump [...rule...] looks great. Maybe call the variable "PROCESS", or "GRAB"?

>> grab: make-parse-pump [a 1020]

>> grab word!
== a

>> done? grab []
== ~null~  ; anti

>> grab [one]
== 1020

>> done? grab []
== ~okay~  ; anti

:thinking:

Naming is hard.
