INTERSECT/UNION/etc. Give Another Win For Isotopes!

hostilefork · March 24, 2025, 1:23pm

The set (collection) operations in Rebol are useful.

rebol2>> set1: [a b c]
rebol2>> set2: [b c d]

rebol2>> intersect set1 set2
== [b c]

When I first encountered it, I thought it strange that it wasn't mutating the first argument.

rebol2>> set1
== [a b c]  ; not modified by INTERSECT

Everywhere else in Rebol, it seemed that a verb like that (APPEND, REVERSE, etc.) modified its argument. What was special about INTERSECT that it didn't?

Anyway, that was just the first thing I noticed. But digging around in the code there were questions... such as, why should it only take block lists?

rebol2>> intersect [a b c] quote (b c d)
** Script Error: Expected one of: block! - not: paren!

red>> intersect [a b c] quote (b c d)
*** Script Error: intersect does not allow paren! for its set2 argument

I raised the question to @BlackATTR who suggested that maybe single elements should just go in the list, as whole items:

>> intersect [a b c] '(b c d)
== []

>> union [a b c] '(b c d)
== [a b c (b c d)]

And then I noticed... splices could draw the distinction!

>> union [a b c] [b c d]
== [a b c [b c d]]

>> union [a b c] spread [b c d]
== [a b c d]

>> union [a b c] spread '(b c d)
== [a b c d]

>> union '(a b c) spread [b c d]
== (a b c d)

This gives you the power to easily do set operations with single elements, and splices with SPREAD dispel the type information so there's no question what the return type should be: the type of the first set!
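To make the rule concrete, here is a rough model in Python rather than Ren-C. It is only a sketch of the idea: the `Splice` class and the `spread` and `union` functions are hypothetical stand-ins, with Python lists playing the role of blocks and tuples the role of groups.

```python
class Splice:
    """Hypothetical stand-in for a SPREAD result: items with no container type."""

    def __init__(self, items):
        self.items = list(items)


def spread(container):
    """Discard the container's type, keeping only its items."""
    return Splice(container)


def union(first, second):
    """Return a container of `first`'s type holding each distinct item once.

    A Splice contributes its items one by one; any other value,
    including a list or tuple, is treated as a single element.
    """
    incoming = second.items if isinstance(second, Splice) else [second]
    result = []
    for item in list(first) + incoming:
        if item not in result:
            result.append(item)
    return type(first)(result)


print(union(["a", "b", "c"], ["b", "c", "d"]))          # list added as one item
print(union(["a", "b", "c"], spread(["b", "c", "d"])))  # items merged
print(union(("a", "b", "c"), spread(["b", "c", "d"])))  # result keeps tuple type
```

Because the splice carries no type of its own, the only type left to pick for the result is that of the first argument.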

And I think to be consistent with the rest of the language, the operations should modify the first set argument by default. But if you use the OF operations you get a copy.

Then, the OF operations might have different parts of speech:

 intersection of set1 set2
 => intersect (copy set1) set2

 union of set1 set2
 => unite (copy set1) set2

This would open up things like union and intersection to be nouns.

 union: union of set1 set2

This seems to me to be much better and a lot more consistent!
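The mutating/copying split can be modeled in Python as a sketch. The names `intersect` and `intersection_of` are hypothetical here, and the second argument is assumed to be a plain list of items (as if it had already been spread).

```python
def intersect(target, other):
    """Verb form: modify `target` in place, keeping only items found in `other`."""
    target[:] = [item for item in target if item in other]
    return target


def intersection_of(set1, set2):
    """Noun form: run the verb on a copy, leaving `set1` untouched."""
    return intersect(list(set1), set2)


set1 = ["a", "b", "c"]
set2 = ["b", "c", "d"]

result = intersection_of(set1, set2)
print(result, set1)  # set1 is unchanged by the OF form

intersect(set1, set2)
print(set1)          # set1 itself has now been narrowed
```

The OF form is just the verb applied to a copy, so only one implementation of the actual set logic is needed.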

bradrn · March 25, 2025, 6:05am

To me this is highly confusing and contradicts the meaning of the term union, which can only mean ‘take these two lists/sets and merge them together’. The operation shown here is simply appending an item to the end of a list, which Python calls add (for sets) / append (for lists) and which JavaScript calls push.

So then, what to do with cases like intersect [a b c] '(b c d)? I think the only consistent behaviour is to throw an error: any other choice leads to an inconsistency either with the case of a list, or with the case of an atomic value.

hostilefork · March 25, 2025, 10:13am

The difference is not adding it if it's already there.

>> union of [a b [b c d] c] [b c d]
== [a b [b c d] c]

>> intersect [a b c] '(b c d)
== []

>> intersect [a (b c d) c] '(b c d)
== [(b c d)]

(As usual, binding questions arise... what if the bindings are different... )

Though I feel like there needs to be canonization of the order of the sets somehow, otherwise you can't do comparisons, like:

 >> data: [a b 2 d]

 >> i1: intersection of data spread [2 d e]
 == [2 d]

 >> i2: intersection of data spread [e 2 d]
 == [d 2]
 
 >> i1 = i2
 == ~false~  ; anti

So you need something that would turn both [2 d] and [d 2] into the same order.

I don't think this is "sorting". I feel like sorting should require the items be comparable. It's canonization, and I don't think there's any way to do this that would guarantee it would give the same order in different versions of the interpreter (or outside of the same run, e.g. things might get different pointers and it has to canonize on the pointer value).
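As a rough illustration of canonizing without comparing the items to each other, here is a Python sketch. The `canonize` function is hypothetical, and it assumes every value has a stable printed form, which is exactly what values identified only by pointer would lack, so it doesn't settle the cross-run question.

```python
def canonize(items):
    """Put items in a fixed order without comparing values of different types.

    Each item is keyed by (type name, printed form), so an integer and a
    string never have to be compared directly.
    """
    return sorted(items, key=lambda item: (type(item).__name__, repr(item)))


i1 = [2, "d"]
i2 = ["d", 2]

print(i1 == i2)                      # order differs, so the lists differ
print(canonize(i1) == canonize(i2))  # same order after canonizing
```

The order this produces is arbitrary rather than meaningful; it only has to be the same for both inputs.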

I haven't "sorted out" my thoughts on this yet.