Portable Bridge Notation (PBN) Parser

I decided I would do a small task in the web Ren-C build, which is to take the "Portable Bridge Notation" for representing a deal of a hand of cards, and turn it into blocks of symbolic data representing each player's hand.

The notation is pretty straightforward, e.g.

N:QJ6.K652.J85.T98 873.J97.AT764.Q4 K5.T83.KQ9.A7652 AT942.AQ4.32.KJ3

Separated by spaces are the cards for each of 4 hands. The suits are separated by dots, and the order is clubs.diamonds.hearts.spades. T is used for 10, while J/Q/K/A are the typical Jack/Queen/King/Ace. The first letter is a direction (N=North, E=East, S=South, W=West) of which player the first hand represented is for.

This case decodes like so:

(console escaping is now done by Rebol code, so this shows some colorization and live hyperlink as an example of what we might do with that).

I wanted the PBN conversion to be accessible and demonstrate "best practices". Here's what I came up with:

whitespace: [some [space | newline]]

suit-order: [♣ ♦ ♥ ♠]
direction-order: [N E S W]
rank-rule: ['A | 'K | 'Q | 'J | 'T | '9 | '8 | '7 | '6 | '5 | '4 | '3 | '2]

pbn-to-hands: func [
    "Convert portable bridge notation to BLOCK!-structured hands"

    return: [object!]
    pbn [text!]
][
    let suit: null

    let one-hand: null
    let one-hand-rule: [
        (suit: '♣)
        [
            collect one-hand [4 [
                13 [
                    set rank rank-rule
                    keep :[
                        as word! unspaced [suit either rank = #"T" [10] [rank]]
                    ]
                ]
                :(
                    suit: select suit-order suit
                    if suit ["."] else '[ahead any [whitespace]]
                )
            ]]
        ]
        |
        (panic "Invalid Hand information in PBN")
    ]

    let hands: make object! [N: E: S: W: null]

    let start: null
    let direction: null

    parse pbn [
        any space  ; We allow leading whitespace, good idea?

        [
            set start ['N | 'E | 'S | 'W] (
                start: to word! start
                direction: start
            )
            |
            (panic "PBN must start with N, E, S, or W")
        ]

        [":" | (panic "PBN second character must be `:`")]

        [
            [4 [
                one-hand-rule (  ; Should set `one-hand` if rule succeeds.
                    hands/(direction): one-hand
                    one-hand: null
                    direction: (select direction-order direction) else [
                        first direction-order
                    ]
                )
                any whitespace  ; Should more than one space between hands be ok?
            ]]
            |
            (panic "PBN must have 4 hand definitions")
        ]
        end
    ]

    assert [direction = start]  ; skipping around should have cycled
    return hands
]
2 Likes

Conceptual Problems Encountered While Writing This

This is pretty straightforward as tasks go. And certainly the PARSE-based code is more pleasing to look at than many versions. But I'll list a few gripes.

  • The C code I looked at to model from used a variable called first for the starting "direction". I used the same name and obliviously overwrote FIRST the Rebol operation. This made later code not work. It's certainly a cautionary tale for the casual overwriting of standard library functions...which makes you wonder if there's anything we could do to warn you about doing this accidentally, and then have an override notation. Something in the function header? Something on the LET itself? What would cause these warnings and why?

  • Cards usually are written the other way, as 4♠ and not :spade_suit:4 (something about rendering in Discourse seems to think the latter notation is worthy of a much bigger spade...no idea why, might be worth finding out). But that would not be legal as a Rebol WORD!. The impedance match of such things is not a unique problem of Rebol, and there are more options like <4♠> while most languages would just have one string type. But it was a bit disappointing nonetheless to have to make the concession.

  • Returning OBJECT! still makes me uncomfortable, as I feel there's a real inelegance to the single OBJECT! coming back as MAKE OBJECT! [...], and it goes out of the domain of concrete parts fast. I'm worried about adding methods and having that look ugly. I've written about wanting to maybe find some grand unifying theory where OBJECT! is just some constrained optimized BLOCK!. But that feels distant. In short, I'm just a bit torn over the return format here... I want to feel like we made it better than the PBN, not worse.

Technical Issues Encountered While Writing This

I deliberately chose to use the UTF-8 characters for card suits in the code. Partially to make it look "cool", but also to exercise the code paths.

One of the big problem areas with doing so was the Windows Console. If I tried to paste any code containing the card suits, they'd be invisible. This is because the console layer has no idea what a PASTE is, so what Windows does is it simulates key-down and key-up events for every character as if you were typing them. They must have gotten something wrong, because the card suits were not getting key downs... only key ups. Others have faced this problem and worked around it., so I incorporated their workaround.

Next is that I got it in my head that I wanted to use the lighter notation of 'N to match a letter in the input rather than "N" or #"N". It's 3 fewer apostrophe-style marks than a string, and it seems there's no harm in allowing you to match WORD!s against strings. After all, Rebol2/Red/R3-Alpha allow you to FIND that way:

>> find "abchellodef" 'hello
== "hellodef"

So I went ahead and added that ability, for both WORD!s and INTEGER!s in strings. This kind of opens a can of worms--as we might ask why looking for an INTEGER! in a string wouldn't be searching for the codepoint. But you can do that with find some-string to char! some-int. BINARY! is another story, and with doors open for searching binaries for strings it's the case that searching for integers finds the byte value and not the string-ized representation of that integer as ASCII. It's something to think about.

I decided to use COLLECT, but when I did I realized the thing I was collecting was not KEEPing material from the input, but a synthesized card symbol. The KEEP we had has the default interpretation of assuming you mean a pattern:

>> parse "aaa" [collect data [some [keep "a"]]]
== ""

>> data
== ["a" "a" "a"]

But what if each time I see an "a", I want to keep a "b"? That's not coming from the input.

@rgchris and I have been discussing the three things you might want to do with DO-style code embedded into a parse:

  • vaporize the result (currently ()) - this is usually the traditional behavior of GROUP! in PARSE. But when used as a parameter to a rule, this could vary. e.g. change [some parse rule] (some code) will use the code to generate what to change to.

  • treat the result as a rule (currently :()) - this has become a favorite of mine, as it frees us from the oddity of PARSE's IF and trying to map control structures into PARSE. You don't have to pre-COMPOSE a PARSE rule, but every time the code is visited it effectively re-runs a composition.

  • fabricate material unavailable in input (currently :[]) - when you look at things like CHANGE or the particular need to KEEP something that isn't in the input series, you have to be able to run code to make that new data.

So the answer with this setup of "match a but keep b" would look like:

>> parse "aaa" [collect data [some ["a" keep :["b"]]]]
== ""

>> data
== ["b" "b" "b"]

Whatever our beliefs about notation, these are desires you can have. In this case I made :[] work as described so I could keep my synthesized data.

I wanted to make the KEEP rule look like:

keep :[join suit (either rank = 'T [10] [rank)]

Because I had the idea that if you matched a WORD! in your text input, that the SET would come back as the WORD! for the character... not the character itself. This is a concept that needs to be thought out more, because I'm not sure we completely understand the rationale discerning SET and COPY and what their behavior should be.


As usual, Rebol-ish code can be nice to look at once it's written...but the path to getting there can be pretty hard. As usual: a debugger would really help.

2 Likes

Tag! doesn't seem all that bad an option, nor does reversing as :spade_suit:4 either (epecially if as a word you set it to "4♠").

What about Map! ?—seems that is positioned as the more structured Block! within data structures where Object! is more for carrying class-like logic.

Note that I used SELECT here code, to step through items in order:

let suit-order: [♣ ♦ ♥ ♠]
; ...
suit: select suit-order suit

This advanced through the suits until the end was reached.

I also used it for dealing cards in directions. This case could start at any direction, and needed to cycle:

let direction-order: [N E S W]
;
direction: (select direction-order direction) else [first direction-order]

I was a bit torn on how to do this. I could have gone more like:

suit-state: head [♣ ♦ ♥ ♠]
suit: does [suit-state.1]
;
suit-state: next suit-state

This idea felt obfuscating; hiding the current suit in a series position just seemed wrong.

I even toyed with the idea of doing it as state transitions, if we are to think in terms of SELECT/SKIP 2 or SELECT seeking out SET-WORD! :

next-suit: [
     ♣: ♦
     ♦: ♥
     ♥: ♠
     ♠: _
]
; ...
suit: select next-suit suit

But that's just not what I was looking for. I felt in the end the suit order had to be just what it was, as suit-order: [♣ ♦ ♥ ♠]

...I don't know what it gets at, other than that what I was ultimately looking for here didn't fit either of our ideas. I really wanted to advance through things in a series in a context where FOR-EACH wasn't what I was using at that moment (because I was inside a PARSE rule that I didn't want to exit).

Food for thought.

Common issue. I'm sure we've all cycled through each of those different options trying to figure out the best fit.

It's nice to be able to do this:

direction: (select direction-order direction) else [first direction-order]

Obfuscating in this way is the Rebol way. If you're willing to pay a little for something a little less opaque, you could just say: append suite-order suit: take suite-order

Also, I've been using linked lists a fair bit of late (I have thoughts on potentially optimizing the linked list concept for tree building, but I digress):

♣: [next ♦ back ♠]
♦: [next ♥ back ♣]
♥: [next ♠ back ♦]
♠: [next ♣ back ♥]

suit: ♣
suit: (suit).next
; had problems with this on subsequent calls, needed (get suit).next
1 Like

This makes me wonder if there should be a native for iterating.

 >> suit-order: head [♣ ♦ ♥ ♠]

 >> suit: iterates suit-order

 >> suit
 == ♣

 >> suit
 == ♣

 >> suit/next
 == ♦

 >> suit
 == ♦

Etc. In stackless this can be written in usermode as a YIELDER, with something approximating:

 iterates: func [data /cycle] [
      return yielder [/next /back] [
          forever [  ; e.g. lib/cycle, FOREVER for clarity w/cycle refinement
               if next [
                   if tail? data: my lib/next [
                       if cycle [data: head data]
                   ]
               ]
               if back [
                   if head? data: my lib/back [
                       if cycle [data: back tail data]
                   ]
               ]
               yield data/1
          ]
      ]
 ]

It could be made as a relatively efficient native. It might even use binding tricks to bind a single prototype action it to the data it iterates, so an ACTION! need not be created on each call.

You'd be in a situation where :SUIT was an ACTION! not a WORD!, which might seem uncomfortable. Though this doesn't seem that much different than iterators in any other language--they're always another type from the data they wrap.

My thought is an optimized 5-way (parent, first[child], last, back, next) NODE! type. I've been using blocks for this and has several disadvantages, including being difficult to PROBE and inefficient to traverse or convert to BLOCK!. Maintaining relationships would still be on the user, thus could be used for a simple linked list, double linked list or full-blown tree structure.

A post was split to a new topic: Non-Expanding BLOCK! Idiom?

5 Years Have Gone By... How Much Better Can We Do?

Foundationally better... :rock:

whitespace: [some [space | newline]]

suit-order: [♣ ♦ ♥ ♠]
direction-order: [N E S W]
rank-order: [2 3 4 5 6 7 8 9 T J Q K A]

one-hand-rule: [
    let suit: ('♣)
    collect [repeat 4 [
        repeat 13 [
            let rank: any @rank-order  ; RANK-ORDER is literals, not rules
            keep (join word! [suit, rank = 'T then [10] else [rank]])
        ]
        inline (
            suit: select suit-order suit
            if suit ["."] else $[ahead [whitespace | <end>]]
        )
    ]]
    |
    reject "Invalid Hand information in PBN"
]

pbn-to-hands: func [
    "Convert portable bridge notation to BLOCK!-structured hands"

    return: [object!]
    pbn [text!]
    <local> start direction
][
    let hands: parse pbn [gather [
        opt whitespace  ; We allow leading whitespace, good idea?

        [
            start: any @direction-order  ; DIRECTION-ORDER is literals, not rules
            | reject "PBN must start with N, E, S, or W"
        ]
        direction: (start)

        [":" | reject "PBN second character must be `:`"]

        [repeat 4 [
            emit (direction): one-hand-rule (  ; e.g. [emit N: ...]
                direction: (select direction-order direction) else [
                    first direction-order
                ]
            )
            opt whitespace  ; Is more than one space between hands ok?
        ]
        |
        reject "PBN must have 4 hand definitions"
    ]]

    assert [direction = start]  ; skipping around should have cycled
    return hands
]

And it works!

>> pbn-to-hands --[
       N:QJ6.K652.J85.T98 873.J97.AT764.Q4 K5.T83.KQ9.A7652 AT942.AQ4.32.KJ3
   ]--
== #[object! [
        N: '[♣Q ♣J ♣6 ♦K ♦6 ♦5 ♦2 ♥J ♥8 ♥5 ♠10 ♠9 ♠8]
        E: '[♣8 ♣7 ♣3 ♦J ♦9 ♦7 ♥A ♥10 ♥7 ♥6 ♥4 ♠Q ♠4]
        S: '[♣K ♣5 ♦10 ♦8 ♦3 ♥K ♥Q ♥9 ♠A ♠7 ♠6 ♠5 ♠2]
        W: '[♣A ♣10 ♣9 ♣4 ♣2 ♦A ♦Q ♦4 ♥3 ♥2 ♠K ♠J ♠3]
   ]

You Can Use LET In PARSE

It might seem like a minor feature, but the underpinnings that allow a combinator to expand the environment required lots of advances.

GROUP! Rule Handling Became More Homogeneous

UPARSE has an overarching policy that rules synthesize values, and a BLOCK! will synthesize the last value synthesized by the rules within it.

I became more comfortable with GROUP! synthesizing values always. It's not like the values are discarded sometimes, and heeded other times.

This means that the style preferred by @rgchris, where you can just say keep (...) without needing to decorate to say something like keep :[...], has won out. My paranoia about the decoration being needed to say "this isn't a value-discarding case of the group" no longer applies (to the extent it was ever worth worrying about).

As for retriggering rules, I'm happy right now with that being done via inline (...). It all fits together. So instead of:

:(
    suit: select suit-order suit
    if suit ["."] else '[ahead [whitespace | end]]
)

You say:

inline (
    suit: select suit-order suit
    if suit ["."] else [[ahead [whitespace | <end>]]]
)

Note that I chose to instead make the branch $, to get a bound version of a "soft quoted" branch:

inline (
    suit: select suit-order suit
    if suit ["."] else $[ahead [whitespace | <end>]]
)

You can't just use a quote there, because the unquoted block you got back from the branch would be unbound. It's a matter of taste I guess if you like this better than putting the block in a block... but it's faster, because the evaluator does not get invoked on the block. And I think it looks clearer, once you have a basic understanding of what's going on with binding.

The [repeat 4 ...] Instead Of [4 ...] Is Much Better

No qualms about that change.

If you're in a context where you want 4 to do a REPEAT 4 implicitly, that's possible. UPARSE is easy to customize with custom combinators, and we should make it even easier to do off the cuff (as easy as RebindableSyntax in the evaluator).

Uses GATHER Instead Of Assigning Hands

The original code did:

let hands: make object! [N: E: S: W: null]
parse [... hand: collect [...] (hand/(direction): hand ...) ...]

The new code does:

hands: parse [gather [... emit (direction): collect ...]]

It's much more powerful to synthesize the object from the rule--instead of pre-creating the object and passing it to the rule to have its fields assigned. Obviously it makes the rule easier to reuse. But also it means that the rule can make decisions about the shape and structure of the object.

There's only one little detail here, which is that in today's world (and maybe tomorrow's) the order of fields in an object matters in some places. By not pre-creating the object, the field order of the generated object will be different based on what order the fields are emitted. Here that might be exactly what you want--the order of fields in the object will reflect the order of hands in the original notation. But it's something to consider.

JOIN WORD! For The Win

Instead of:

keep (as word! unspaced [suit either rank = #"T" [10] [rank]])

This has:

keep (join word! [suit, rank = 'T then [10] else [rank]])

I like the THEN/ELSE rhythm better than I like EITHER, but that's a matter of taste.

Also, matching a literal WORD! or INTEGER! rule in a string gives you the original rule (unquoted) as a product now, not a character:

>> parse "a" ['a]
== a

>> parse "10" ['10]
== 10

So this gets a bit cleaner.

ANY And Literal Matching With @PINNED!

The new meaning of the ANY rule is more like the ANY in the regular evaluator. It implicitly considers there to be a | between the elements of the block you pass it, so it's picking out of a set of alternates.

By default those alternates are rules, not literal values to match against. So if you're trying to match literal things, you have to put them in quotes:

>> parse [a a b a b c c] [some any ['a 'b 'c]]
== c

But we can use a "PINNED" block (@[...]) as a signal that we want a literal match:

>> parse [a a b a b c c] [some any @[a b c]]
== c

If your block of options are in a variable, and you don't have your block of options decorated, you can use an operator to add the @ ... that operator is called PIN :pushpin:

>> suit-order: [♣ ♦ ♥ ♠]

>> parse [♣ ♦ ♦ ♥ ♠ ♣] [some any (pin suit-order)]
== ♣

Or you could put the @ on the data array itself:

>> suit-order: @[♣ ♦ ♥ ♠]

>> parse [♣ ♦ ♦ ♥ ♠ ♣] [some any (suit-order)]
== ♣

But I offered another option, which is to use a pinned word at the reference site, and it will then assume you mean you wanted to treat what it looks up to as literal data:

>> suit-order: [♣ ♦ ♥ ♠]

>> parse [♣ ♦ ♦ ♥ ♠ ♣] [some any @suit-order]
== ♣

(If you just said any suit-order then it would only accept plain blocks, and treat it as parse rules with no literal recognition... the dialect subtleties here are complex, but I think that's for the best.)

REJECT Instead Of PANIC

reject "message" saves you on parentheses vs. (panic "message") ...

But the difference is much greater than that. REJECT, by being cooperative with PARSE, returns the error from PARSE itself, so you can react to it if you want. But also--it can encode information about what the state of the parse was at that moment into the error.

This idea that PARSE will by default give you a good diagnostic--that the console can break down for you and report visually--will hopefully make a big difference. This way the default of leaving error handling off your PARSE will be translated into a PANIC at that point, and you'll get much better information.

1 Like