When Should Functions Validate Their Type Specs?

hostilefork · March 31, 2025, 5:05am

You're able to use things in the body of a function that aren't defined yet:

>> foo: func [x] [if cool-number? x [print "It's cool"]]

No error there, until you run the function.

But what if you use a datatype that's not defined yet? Here's Rebol2:

rebol2>> foo: func [x [cool-number!]] [print "Typechecked as cool!"]
** Script Error: Invalid argument: cool-number!

rebol2>> cool-number!: integer!

rebol2>> foo: func [x [cool-number!]] [print "Typechecked as cool!"]
; no error

Red seems to just do a literal symbolic check; you can't define types under other words:

red>> cool-number!: integer!

red>> foo: func [x [cool-number!]] [print "Typechecked as cool!"]
*** Script Error: cool-number! has no value

This Is A Problem For Extension Types

Let's say you have a collection of routines in a module, and one of them is able to operate on IMAGE!. But let's say you don't load the IMAGE! extension, and you don't use that routine.

The aggressive requirement that all types in type specs be resolvable at FUNC declaration time would prevent that module from loading.

It creates ordering problems... where you suddenly have to worry about the order you're loading extensions in, even when they don't depend on each other in order to load... just because they can operate on each other's types. And it makes loading impossible if two extensions refer to each other's types.

Is The Answer To Wait Until The Function Is Called?

The historical problem with waiting was that the information in the spec was compacted and thrown away. But modern PARAMETER! works differently, and could handle it.

There are issues with type constraints changing out from under you, with regards to specialization and such. See this post, which surveys what happens when you do things like INTEGER!: TAG! in the middle of a run:

Survey of Redefining Datatype WORD!s

The answer there may be that you have to lock any variables that are used as type constraints so you can't change them.
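As a rough illustration of the locking idea, here is a small Python model (not Ren-C code). The `Env` class, its `lock` method, and `make_func` are hypothetical names invented for this sketch; the assumption is simply that mentioning a name in a type spec freezes that name against later reassignment.

```python
class LockedError(Exception):
    """Raised on an attempt to rebind a name that a type spec depends on."""


class Env:
    """Toy variable environment; `lock` is a hypothetical freezing hook."""

    def __init__(self):
        self._vars = {}
        self._locked = set()

    def set(self, name, value):
        if name in self._locked:
            raise LockedError(f"{name} is in use as a type constraint")
        self._vars[name] = value

    def get(self, name):
        return self._vars[name]

    def lock(self, name):
        self._locked.add(name)


def make_func(env, spec, body):
    """Build a checked function and lock every name its spec mentions."""
    for name in spec:
        env.get(name)   # the name must already exist (KeyError otherwise)
        env.lock(name)  # from now on it cannot be redefined

    def call(x):
        if not any(env.get(name)(x) for name in spec):
            raise TypeError(f"{x!r} does not match {spec}")
        return body(x)

    return call


env = Env()
env.set("integer!", lambda v: isinstance(v, int))
env.set("tag!", lambda v: isinstance(v, str))

foo = make_func(env, ["integer!"], lambda x: x)

try:
    env.set("integer!", env.get("tag!"))  # like INTEGER!: TAG! mid-run
    redefined = True
except LockedError:
    redefined = False
```

In this model the redefinition is refused, so `foo` keeps checking against the constraint it was defined with. Whether locking should happen at definition time or at first call is left open here.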

I'm not concerned about performance issues... it can be made to perform as well as it should.

The real question is just what to do about situations like:

>> foo: func [x [integer! askjdfljaslkdjfkakjsdhf]] [return x]

>> foo 10
== 10

Is that sane? You might say "no, that's obviously insane", but consider you could have just as easily written:

>> foo: func [x] [if integer? x [return x] askjdfljaslkdjfkakjsdhf]

>> foo 10
== 10

But taken to its logical conclusion, tolerating undefined things in type specs and just skipping them would also permit:

>> foo: func [x [askjdfljaslkdjfkakjsdhf integer!]] [return x]

>> foo 10
== 10

Okay, That Is Crazy

It seems there needs to be some concept of forward-declaration. Something that type specs tolerate as a non-match unless the thing loads.

It may be (and I'm still working on the finer details of this), that all antiform fences canonize to the same antiform fence for that word, which identifies a datatype. Meaning you could do a forward-declaration by just saying:

image!: ~{image!}~

Then when the actual IMAGE! library gets loaded (perhaps through a DLL and LOAD-EXTENSION) it agrees on that.
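To make the "agrees on that" part concrete, here is a minimal Python model (not Ren-C code) of canonizing by name. The `Datatype` class, `canon_datatype`, and `load_image_extension` are hypothetical names for this sketch; the assumption is that any two requests for the same type name yield one shared identity.

```python
_CANON = {}  # type name -> the one shared Datatype object for that name


class Datatype:
    """Toy stand-in for a datatype identified by its canonical name."""

    def __init__(self, name):
        self.name = name
        self.impl = None  # filled in when the real extension loads


def canon_datatype(name):
    """Return the single shared Datatype for `name`, creating it if needed."""
    dt = _CANON.get(name)
    if dt is None:
        dt = _CANON[name] = Datatype(name)
    return dt


# Forward declaration in a module that merely mentions the type.
image_forward = canon_datatype("image!")


def load_image_extension():
    """Hypothetical loader: asks for the same canonical name and fills it in."""
    dt = canon_datatype("image!")
    dt.impl = "pixel buffer support"
    return dt


image_loaded = load_image_extension()
```

Here `image_forward` and `image_loaded` are the same object, so a spec that captured the forward declaration sees the real implementation once the extension arrives.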

But even if that were able to work for the narrow case of a DATATYPE!, it doesn't help so much with functions. Let's say the datatypes were RGB-IMAGE! and RGBA-IMAGE! to distinguish having an alpha channel (for instance). Then there was ANY-IMAGE?. What if you want to use ANY-IMAGE? as a constraint?

Maybe this calls for a generalized "forward declaration" datatype? Something that is essentially unset, but tolerated by type specs?

Hey, Maybe That Already Exists... And It's Just Tolerance Of A Specific TRIPWIRE?

rgb-image!: ~<forward>~  ; hm, forward can mean a lot of things.

any-image?: ~<pending>~  ; "pending" sounds better.  "I know it's coming"

rgba-image!: ~<unavailable>~  ; A "softer" unset state?

Well, it's one thought. If tripwires were immutable (they should be, but aren't today) then checking for a specific word in the tripwire can be made arbitrarily fast.

Setting things to this state might be done with something like:

extern [rgb-image! any-image? rgba-image!]
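A rough Python model (not Ren-C code) of how that tolerance might behave. The names `PENDING`, `extern`, and `typecheck` are assumptions made up for this sketch: a pending name counts as a non-match, while a name that was never declared at all is still an error.

```python
class Pending:
    """Marker for a name that was announced but has not been loaded yet."""

    def __repr__(self):
        return "~<pending>~"


PENDING = Pending()


def extern(env, names):
    """Hypothetical EXTERN: mark each not-yet-defined name as pending."""
    for name in names:
        env.setdefault(name, PENDING)


def typecheck(env, spec, value):
    """Return True if any constraint in `spec` accepts `value`.

    Pending names count as a non-match; unknown names are an error.
    Every name is examined, so gibberish is caught even after a match.
    """
    matched = False
    for name in spec:
        if name not in env:
            raise NameError(f"{name} is not defined or declared pending")
        constraint = env[name]
        if constraint is PENDING:
            continue
        if constraint(value):
            matched = True
    return matched


env = {"integer!": lambda v: isinstance(v, int)}
extern(env, ["rgb-image!", "any-image?", "rgba-image!"])

ok = typecheck(env, ["integer!", "any-image?"], 10)  # pending is skipped
miss = typecheck(env, ["any-image?"], 10)            # nothing matched

try:
    typecheck(env, ["integer!", "askjdfljaslkdjfkakjsdhf"], 10)
    gibberish_allowed = True
except NameError:
    gibberish_allowed = False

# Once the extension loads, the same spec starts matching images.
env["any-image?"] = lambda v: isinstance(v, bytes)
image_ok = typecheck(env, ["integer!", "any-image?"], b"\x00")
```

The identity comparison against a single `PENDING` object stands in for the "arbitrarily fast" check that immutable tripwires would allow.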

Something Like This Has To Be The Right Direction

I don't think the arbitrary-garbage-in-typespecs is an idea with a future.

One problem with tripwires, though, is that you get a difference between:

something?: ~<pending>~
foo: func [x [integer! something?]] [...]

and

something?: ~<pending>~
foo: func [x] [if all [not integer? x  not something? x] [fail "typecheck"] ...]

While the typecheck was willing to overlook something? being not defined, the explicit call is not willing to do that.

But you do have control there, to test to make sure the SOMETHING? isn't a tripwire before calling it.

foo: func [x] [
    if all [
        not integer? x
        any [not set? $something?  not something? x]
    ][
        fail "typecheck"
    ]
    ...
]

bradrn · April 5, 2025, 7:58am

hostilefork:

But taken to its logical conclusion, tolerating undefined things in type specs and just skipping them would also permit:
>> foo: func [x [askjdfljaslkdjfkakjsdhf integer!]] [return x]

>> foo 10
== 10
Okay, That Is Crazy

Honestly… I don’t find it crazy in the least! In fact, this is precisely the behaviour I expect from a dynamically-typed language with runtime type validation on functions.

Building on this, here’s an idea — though I don’t know whether it’s plausible or not:

- There is a single, dedicated, globally-accessible environment in which all type names are bound
- When the function is run, the type spec is evaluated within the context of this special environment
- At definition time the type spec is not evaluated at all: all identifiers within it are simply preserved for later evaluation, whether defined in the current environment or not

This would enable using any type name in any function, while still letting Ren-C know where to look when running the function. I also like the idea of separating the As long as the relevant module is imported before the function is run, it all works out.
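The three points above can be modeled in a few lines of Python (a sketch, not Ren-C code). `TYPE_ENV` and `func` are hypothetical names; the assumptions are that the spec is stored untouched at definition time, and that an undefined name is an error at call time rather than being skipped.

```python
TYPE_ENV = {}  # the single, dedicated environment for type names


def func(spec, body):
    """Define a function; the spec's names are stored, not looked up."""

    def call(x):
        # Resolve every name now, at call time; unknown names are errors.
        missing = [name for name in spec if name not in TYPE_ENV]
        if missing:
            raise NameError(f"undefined type names: {missing}")
        if not any(TYPE_ENV[name](x) for name in spec):
            raise TypeError(f"{x!r} does not match {spec}")
        return body(x)

    return call


# Definition succeeds even though nothing is registered yet.
foo = func(["cool-number!"], lambda x: "Typechecked as cool!")

try:
    foo(10)
    early_call_failed = False
except NameError:
    early_call_failed = True

# "Importing the module" registers the name; the same function now works.
TYPE_ENV["cool-number!"] = lambda v: isinstance(v, int)
result = foo(10)
```

In this model load order stops mattering for definitions; only the state of the environment at the moment of the call counts.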

hostilefork · April 5, 2025, 8:55am

This part I don't think is that crazy, and it may just wind up being what has to be done to make things work out mechanically. Don't feel sure yet.

Working with the FFI right now for instance: I definitely think validating the specs for interfacing with the C functions needs to be done at FFI-binding-creation time vs. waiting for the call. And I can't totally articulate why that's different.

But regardless: I do feel I know that whenever the evaluation happens, just ignoring everything that isn't defined is a bad direction. So something has to happen before the call to resolve all the things in the spec in some way, so that you know they're not gibberish.

Because type constraints are so numerous (e.g. EVEN? or ANY-LIST?), it's hard to say that spec blocks look in a particular place, vs. just using your current binding environment.

Curiously though--for unrelated reasons--there is a module now for the datatypes themselves (builtin plus extension). It's used as a sparse map from type names (the module keys) to DATATYPE! instances...which has a couple of purposes:

- It lets questions like Type_Of(cell) in the internals very quickly return a direct pointer to an existing DATATYPE! cell without having to worry about storage or lifetime of the returned result.
- The module's "Patch" for the variable provides a canon identity for extension datatypes, so that all Cells that are of an extension type hold that pointer.
  - (Again, extension type value cells must sacrifice one of their 4 platform pointers to indicate the type, as it doesn't fit in the header's type byte. And the header's byte is 0 to say that it is such a cell. So the extension cell has only 2 pointers worth left.)

The builtin datatypes used to have entries in the lib module, but I realized that putting them into this datatypes module could be just as optimal... so both the lib and datatypes modules are hybrids which have a certain number of members whose addresses are known at compile-time, and then others that are dynamic.

bradrn · April 5, 2025, 9:25am

I suspect the difference is that C is statically typed, so each function does have a specific type which can be checked in advance.

Oh, I’m not talking about ignoring them! Undefined type names should certainly result in an error at call-time.

I don’t see how this follows… why can’t type constraints be numerous, and at the same time have a particular environment where they’re resolved?

hostilefork · April 5, 2025, 9:58am

Well I mean there's nothing particular that makes EVEN? a type constraint... other than taking one argument and returning LOGIC!.

I don't know that it's necessary or desirable to say "everything you put in type specs has to be looked up in this place"... the place seems to be whatever your working context is.

bradrn · April 5, 2025, 11:48am

OK, fair enough! But in this case I’d say it ought to work like any other function call, i.e. by disallowing forward declarations. And then the answer to ‘what if a function uses an undefined type’ becomes ‘restructure your code’, which seems reasonable to me.