@rgchris's Iterator Framework (in Oldes Rebol3)

This is a fairly underbaked show & tell. I'd hoped to have developed it more, but I'm almost out of time to work on Rebol projects before other things take precedence once again :pensive_face:. Also, I've had so much success working with some of the principles used here that I wanted to get some of it out now while I have a chance. So here is: Iterators, Part I


Not included in this article is my grids iterator. It's a little involved in the setup, but that's ok:

import r3:rgchris:iterate

size: 612x792

grid: iterate/new iterators/grids [
    /page size /margins 68x67 /gutter 14x14
    /columns 7 /rows 8
]

This dovetails nicely with my SVG module in populating the grid:

import r3:rgchris:svg

svg/encode svg/create size [
    loop 52 [
        iterate/next grid
        circle #[fill: grey] grid/middle .5 * minimum grid/width grid/height
    ]
]

The resultant SVG is a letter-sized page with fifty-two well-aligned circles. I used this principle in an unofficial MLB schedule (along with the SVG module).
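
For anyone wondering what the grid iterator has to compute, here is a rough sketch of the arithmetic in Python (not the Rebol implementation). It assumes the margins apply equally on opposite sides and that cells are visited row by row; `Cell` and `grid_cells` are made-up names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    x: float
    y: float
    width: float
    height: float

    @property
    def middle(self):
        # Centre point of the cell, comparable to grid/middle above.
        return (self.x + self.width / 2, self.y + self.height / 2)


def grid_cells(page, margins, gutter, columns, rows):
    """Yield cells row by row (assumed order), left to right."""
    page_w, page_h = page
    margin_x, margin_y = margins
    gutter_x, gutter_y = gutter
    # Space left after margins and gutters, shared evenly between cells.
    width = (page_w - 2 * margin_x - (columns - 1) * gutter_x) / columns
    height = (page_h - 2 * margin_y - (rows - 1) * gutter_y) / rows
    for row in range(rows):
        for col in range(columns):
            yield Cell(
                margin_x + col * (width + gutter_x),
                margin_y + row * (height + gutter_y),
                width,
                height,
            )


cells = list(grid_cells((612, 792), (68, 67), (14, 14), columns=7, rows=8))
radius = 0.5 * min(cells[0].width, cells[0].height)
```

Under those assumptions the settings above give cells of 56 by 70 units, so each circle would have a radius of 28.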


Also not included is the general iterative nature of most of the 'codecs' I've added. The idea of a Rebol-mode Deflate decoder might seem tortuously (:turtle:) slow. However, it has a couple of features not available to the built-in decoder: it doesn't need to copy its input and it doesn't need to get to the end. The first feature is useful for Deflate-encoded content within Zip files or PDFs (or indeed Xara files). The decoder terminates when the Deflate stream terminates, thus aiding decoders/unpackers of formats that don't tell you how long the Deflate stream is (see again, PDF, Xara).

import r3:rgchris:deflate

decoder: flate/decoders/new #{4bcf57484cc9492d5248cd5548ccc90100}
decoder/window: 3
to string! flate/decoders/next decoder

=> "go "
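
Stepping through the output a few bytes at a time, and stopping where the stream stops, can also be sketched with Python's `zlib`. This is an analogue rather than a port of the Rebol decoder: `DeflateIterator`, `window` and `trailing` are names assumed for illustration, and `wbits=-15` selects a raw Deflate stream with no header.

```python
import zlib


class DeflateIterator:
    """Pull-style wrapper: each next() yields at most `window` bytes."""

    def __init__(self, data, window):
        self._inflater = zlib.decompressobj(wbits=-15)  # raw Deflate
        self._pending = data
        self.window = window

    def next(self):
        # Returns None once the stream has ended (or the input ran dry).
        if self._inflater.eof:
            return None
        chunk = self._inflater.decompress(self._pending, self.window)
        # Input not yet consumed is kept for the following call.
        self._pending = self._inflater.unconsumed_tail
        return chunk or None

    @property
    def trailing(self):
        # Bytes found after the end of the Deflate stream, if any.
        return self._inflater.unused_data


decoder = DeflateIterator(
    bytes.fromhex("4bcf57484cc9492d5248cd5548ccc90100"), window=3
)
first = decoder.next()
```

Here `first` holds the same three bytes as the Rebol example. Once the stream has ended, `trailing` exposes whatever followed it, which is the part that helps with container formats that don't record the stream length.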

The HTML decoder works along similar lines. Caveat: HTML can't be normalized without creating a full DOM tree first, so any iterative approach tied to parsing will be limited compared to one using the DOM. (Neither iterator has yet been developed to the Iterate standard.)

import r3:rgchris:html

decoder: html/decoders/new "<b>Foo"

neaten/pairs collect-while [
    html/decoders/next decoder
][
    keep decoder/event
    keep decoder/value
]

=> [
    open "b"
    text "Foo"
]
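
Python's standard `html.parser` is a comparable non-normalizing tokenizer, so it makes a handy illustration of the caveat. This is an analogy only, not how the Rebol decoder is built; `EventCollector` and `tokenize` are hypothetical names.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record (event, value) pairs exactly as the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tokenize(markup):
    collector = EventCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered
    return collector.events


events = tokenize("<b>Foo")
```

The result contains only the open and text events: nothing closes the `b`, and no html/head/body scaffolding is implied, because no tree has been built.
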

By comparison, walking a DOM built from the same markup reports the normalized structure:

import r3:rgchris:html

decoder: dom/walk load-html "<b>Foo"

neaten/pairs collect-while [
    decoder/next
][
    keep decoder/event
    keep any [
        decoder/node/name
        decoder/node/value
    ]
]

=> [
    open _
    open "html"
    empty "head"
    open "body"
    open "b"
    text "Foo"
    close "b"
    close "body"
    close "html"
    close _
]
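
The event sequence above is what a plain depth-first walk produces. A minimal Python sketch, assuming a toy `Node` type (hypothetical, not the module's DOM) in which text nodes carry a value and childless elements report as empty:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def walk(node):
    """Yield (event, name-or-value) pairs in document order."""
    if node.value is not None:
        yield ("text", node.value)
    elif not node.children:
        yield ("empty", node.name)
    else:
        yield ("open", node.name)
        for child in node.children:
            yield from walk(child)
        yield ("close", node.name)


# Hand-built tree standing in for the normalized result of "<b>Foo".
document = Node(children=[
    Node("html", children=[
        Node("head"),
        Node("body", children=[
            Node("b", children=[Node(value="Foo")]),
        ]),
    ]),
])

events = list(walk(document))
```

The unnamed document root shows up as `None` here, where the Rebol output shows `_`.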

The iterator/folders model in the linked article is another of my favourites.


It's good to get these scripts out, even if I still have to document them and the documentation will likely prompt alterations (and ARGH, time!). I think there's enough here to warrant some lower-level integration (with PORTs, which is probably/possibly where they should be).


Definitely useful and implements a missing functionality. (People have reinvented the directory walk so many times, each version having its own bugs.)

One thing that jumps out at me is that your iterator namespace is not separate from the iterated-upon namespace. You have iterator functions and fields (like /NEXT) and then a per-iterator set of custom fields applicable to whatever the case calls for.

This might work when you're making a custom iterator class for every iteration, but to mesh well with more mundane generic collection iterators I think that would be too verbose (the generic iterator would have to say iter/value/field and couldn't just do iter(??)field for some short symbolic definition of ??). Also it might invite bugs when you can't easily null out the entirety of a state object in one go, but have to be sure you update some subset of the total fields without missing any.

C++ separates the spaces, and uses a syntax trick to make it not so terrible:

  • Anything you want to ask for on the iterator would be done with a dot access, like a normal member

  • Anything you want to do with the current element you do through a dereference step * or an arrow which folds the dereference and dot access together ->

Like this:

 iterator.act_on_iterator()  // use dot to call method on iterator itself

 Item item = *iterator;  // use dereference to get at current item

 String s1 = (*iterator).full;  // one way to extract property of current item
 String s2 = iterator->full;  // arrow is a convenience syntax for the same access

 iterator->delete_file();  // arrow can also call methods on current object

 for (Item item : collection) { ... }  // modern C++ range-based for drives the iterator for you

(I've made a separate thread for any topic specific to discussing influences of C++ iterators.)
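
To make the separation concrete outside of C++, here is a small Python sketch in which iterator state sits on the cursor itself and everything about the current element hangs off a single `item` field. `Cursor`, `index` and `item` are illustrative names only.

```python
class Cursor:
    """Iterator-level state on the cursor, element state behind .item."""

    def __init__(self, items):
        self._items = list(items)
        self.index = -1    # belongs to the iterator
        self.item = None   # the "dereferenced" current element

    def next(self):
        if self.index + 1 < len(self._items):
            self.index += 1
            # One assignment swaps out every per-element field at once.
            self.item = self._items[self.index]
            return True
        self.item = None
        return False


files = [
    {"name": "a.txt", "full": "/docs/a.txt"},
    {"name": "b.txt", "full": "/docs/b.txt"},
]

cursor = Cursor(files)
seen = []
while cursor.next():
    seen.append((cursor.index, cursor.item["full"]))
```

Because the element is replaced wholesale, no stale field from the previous step can linger, which is the bug risk mentioned above.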

Could We Generically "Dereference" Iterators?

Interestingly, Boris has written some of his thoughts on iterators... and suggests using GET to access the current item:

red-hof/code-analysis/iterators.md at master · greggirwin/red-hof · GitHub

Thinking of it this way, the analogue in Ren-C would be:

 decoder/some-method-on-iterator
 decoder.some-data-member-on-iterator

 (get decoder).some-data-member-on-item
 (get decoder)/some-method-on-item

Could @ Be "Dereference"?

We could repurpose @word to mean get (or more specifically, get the @word, e.g. "iterator-get").

The current @xxx usages most useful for the system are the lone @ (especially for API purposes) and the @BLOCK! for asking for inert behavior, e.g.

>> join text! [1 + 1]
== "2"

>> join text! @[1 + 1]
== "1+1"

>> block: [1 + 1]

>> decorate block '@
== @[1 + 1]

>> pin block
== @[1 + 1]

>> join text! pin block
== "1+1"

But beyond that, the inertness of @a and @a.b hasn't been all it's cracked up to be, because anyone wanting to take advantage of that is dialecting, and probably wants $a and ^a to be looked at literally as well--so you're often creating a literal context (either putting them in a block, or having a function take its arguments literally).

(I considered the concept of @foo being "dereference foo" with @foo.bar being interpreted as (@foo).bar, but this abuse is not viable, for reasons that are beyond the scope of this post.)

Grafting That In

Here's what it might look like in Ren-C:

import <r3:rgchris:html>

decoder: dom/walk load-html "<b>Foo"

neaten:pairs collect-while [
    decoder/next
][
    keep decoder.event  ; property of the iterator?
    keep any [
        (@decoder).name  ; property of the thing being iterated
        (@decoder).value
    ]
]

How's that look? :+1:

(Hopefully you're getting a sense of how nice it is to see when something is a refinement vs. a field vs. a function call...it's really hard for me now to suss out what historical Redbol code is doing, the dot-vs-colon-vs-slash really helps.)

The idea here would be that the iteration idea is standardized such that if you weren't interested in the iterator-specific properties (let's say EVENT doesn't matter), you could just speak in terms of the standard:

decoder: dom/walk load-html "<b>Foo"

collect [
     for-each 'node decoder [
         keep node.name
         keep node.value
     ]
]

So this is like the C++ concept, that the things that make it an iterator are about speaking GET and /NEXT. Your specific iterator here has a notion of events, but I don't think this fits all iteration scenarios...e.g. you may already have a defined object you're iterating.
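
Python draws the same line: the iteration protocol is just `__iter__` and `__next__`, and anything extra is an ordinary attribute on the iterator. A sketch, with `Walker` and its dict-based tree being hypothetical stand-ins for the DOM walker:

```python
class Walker:
    """Speaks the standard protocol; .event is an iterator-specific extra."""

    def __init__(self, root):
        self._steps = self._walk(root)
        self.event = None

    def _walk(self, node):
        yield "open", node
        for child in node.get("children", []):
            yield from self._walk(child)
        yield "close", node

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the inner generator ends the loop as usual.
        self.event, node = next(self._steps)
        return node


tree = {"name": "body", "children": [{"name": "b"}]}

# Generic code only needs the protocol and never looks at .event.
visited = [node["name"] for node in Walker(tree)]

# Code that cares about the extra property reads it off the iterator.
walker = Walker(tree)
opened = [node["name"] for node in walker if walker.event == "open"]
```

Generic consumers see each node on both the open and the close step; only code that knows about `event` can tell the two apart.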


Actually, if I correctly understand what NEATEN does here, you can use KEEP:LINE in Ren-C:

collect-while [
    decoder/next
][
    keep decoder.event  ; property of the iterator?
    keep:line any [
        (@decoder).name  ; property of the thing being iterated
        (@decoder).value
    ]
]

KEEP is a specialization of APPEND, and APPEND supports :LINE... in lists (new-line marker) and in strings or binaries (line feed character).

>> collect [keep 'foo, keep:line 'bar, keep 'baz, keep:line 'mumble]
== [
    foo bar
    baz mumble
]

>> s: ""

>> append s "foo" append:line s "bar" append s "baz" append:line s "mumble"
== "foobar^/bazmumble^/"

Appreciate the assessment :blush:

There's a difference between the examples in the linked article and the examples above here. In the doc, the operations in the ITERATE namespace are kept separate from those of the iterated object. My principal aim is looking for those patterns to see where those separations (and hopefully optimizations) can be made. The examples above (other than GRID) were amongst my earlier efforts.

I'm mindful that PORT! has something like iterators in its current design. The idea of using standard verbs certainly is appealing (until [not event: take my-decoder]). However, something about the mechanism of creating ports (either with a URL or a [scheme: 'thing ...] block) doesn't sit right, nor does the awkward extraction of field values (query my-decoder 'tag-name).

My sense is that iterators need a PORT!-like construct that allows for progression through standard verbs and access through field access with some optimization mechanisms if applicable. Perhaps that is PORT! and it needs to move in that direction, but I'm not so sure about that—PORT! is already stretched in ways that make it unsuitable for some roles that are within its remit.

I don't doubt there's bugs in this one, it's the interface that's the point. There's no mandated loops, and each iteration spoon-feeds a variety of views of the current position. Prepping a folder for archive, nixing specific filenames, accessing full paths—all accounted for in this implementation.

Noted. I don't know if it's always possible to discern line placements in-place, but the option to do so is nice.


4 posts were split to a new topic: Using @ For Iterators