Looks Like FILE! Immutability Is A Good Idea

Since changing URL! to be immutable, we've realized several benefits. Notably, you can no longer produce values that claim to be of type URL! but don't validate as one:

red>> url: https://red-lang.org
== https://red-lang.org

red>> reverse url
== gro.gnal-der//:sptth

sptth! indeed, I say. :face_vomiting:

ren-c>> url: http://hostilefork.com
== http://hostilefork.com

ren-c>> reverse url
** Script Error: reverse expects [~void~ any-series? any-sequence? pair!] 

But besides that we get another advantage: we don't have to be paranoid about URL! changing out from under us.

So for instance, when you LOAD some code from a URL! then we poke the address of that URL's data into all the blocks that get loaded. This way you can ask for the FILE OF and get that URL! back.

(As it gives a URL! sometimes and not a FILE!, that makes me wonder if we should call that SOURCE OF, and find another way to ask for source code... like IMPLEMENTATION OF).

In any case, since we know the URL! can't change out from under us, we don't have to worry about storing the pointer that was passed to TRANSCODE by LOAD directly. If we did have to worry, we'd need to make a copy of it.

But with mutable FILE! we do have to worry, and copy it. Otherwise:

>> file: %my-awesome-script.r

>> code: transcode:file (read file) file
== [reverse [my script is awesome]]

>> file of code.1
** Error: Only ANY-LIST? encode the file they were loaded from

>> file of code.2
== %my-awesome-script.r

>> replace file "awesome" "dumb"
== %my-dumb-script.r

>> file of code.2
== %my-dumb-script.r  ; ...but we loaded it from %my-awesome-script.r !
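The hazard above is ordinary aliasing, so it can be shown outside Ren-C too. What follows is a rough Python analogy, not Ren-C code: `Block`, `transcode_sharing` and `transcode_copying` are made-up names, a mutable `bytearray` stands in for a mutable FILE!, and an immutable `str` stands in for an immutable one.

```python
class Block:
    """Stand-in for a loaded block that remembers where it came from."""

    def __init__(self, items, source):
        self.items = items
        self.source = source  # hypothetical slot behind "FILE OF"


def transcode_sharing(items, source):
    # Stores the caller's object directly: cheap, but only safe if
    # `source` can never be mutated afterwards.
    return Block(items, source)


def transcode_copying(items, source):
    # Defensive copy: what you are forced to do when `source` is mutable.
    return Block(items, bytes(source))


# Mutable name: the sharing block's idea of its origin changes under it.
name = bytearray(b"my-awesome-script.r")
shared = transcode_sharing(["reverse"], name)
copied = transcode_copying(["reverse"], name)
name[3:10] = b"dumb"  # in-place edit, like REPLACE on a mutable FILE!

shared_after = bytes(shared.source)  # follows the edit
copied_after = copied.source         # still the original name

# Immutable name: sharing is safe, and "editing" produces a new value.
frozen = "my-awesome-script.r"
safe = transcode_sharing(["reverse"], frozen)
renamed = frozen.replace("awesome", "dumb")
```

With an immutable name the copy in `transcode_copying` becomes unnecessary, which is the saving described above for URL!.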

Immutable FILE! means we could kill empty FILE!

Empty FILE! doesn't make sense.

  • Did you want something that would cause an error when you appended it somewhere? Use NULL.

  • Did you want something that would be a no-op when you appended it somewhere? Use VOID.

All the AIs say roughly this:

There are no known filesystems that allow the empty string ("") as a valid filename. Most modern filesystems, including ext4, NTFS, FAT variants, and others, explicitly disallow the empty string as a filename. This restriction is consistent with POSIX standards, which define filenames as non-empty character strings.

Historically, some older UNIX systems might have treated an empty string as an alias for the current directory, but this behavior was likely unintended and is no longer supported in contemporary systems.

If we could get rid of empty FILE!, that gives us one more leg up in terms of the datatype providing actual value vs. just being a weak alias for string.

Interpolation Is Stronger Than Mutable Manipulation

Now that we have interpolation, I think it can replace a lot of cases where you might have thought you needed to manipulate a file directly.

Also, the -OF functions will let you do things to immutable types:

>> replace of %my-dumb-script.r "dumb" "awesome"
== %my-awesome-script.r

I think this also points out a missing ability for PARSE, namely PARSE OF for immutable types. e.g.:

 parse-of tuple rule
 =>
 as tuple! parse to block! tuple rule

So for instance:

 >> parse of 'a.a.a.b.c.d [remove some 'a ~accept~]
 == b.c.d

Best of both worlds!
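The `parse-of` expansion above is a general pattern: convert the immutable value to a mutable scratch copy, run the mutating operation, convert back. As a hedged Python analogy (not Ren-C; `of`, `remove_leading` and `remove_leading_of` are names invented for this sketch), a `tuple` plays the immutable TUPLE! and a `list` plays the BLOCK!:

```python
def of(mutator, to_mutable, from_mutable):
    """Lift an in-place `mutator` so it can be applied to an immutable value."""

    def lifted(value, *args):
        scratch = to_mutable(value)   # like TO BLOCK! tuple
        mutator(scratch, *args)       # mutate the private copy only
        return from_mutable(scratch)  # like AS TUPLE! block

    return lifted


def remove_leading(items, unwanted):
    # In-place: drop every leading occurrence of `unwanted`.
    while items and items[0] == unwanted:
        del items[0]


remove_leading_of = of(remove_leading, to_mutable=list, from_mutable=tuple)

original = ("a", "a", "a", "b", "c", "d")
result = remove_leading_of(original, "a")  # original is left untouched
```

The caller never sees the scratch copy, so the immutable value keeps its guarantees while the familiar mutating vocabulary stays usable.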

Retaking % As A WORD!

With no empty files, we can say that % is clearly a WORD!

Maybe some other things too. %% could be a WORD!, or a file with the name "%". There are lots of edge cases there, e.g. to make a file with the name " you'd have to say %-{"}-. So I don't think we should be afraid to make %% a WORD! if that provides another interesting symbol.

This Seems Like A Good Direction

I've had the thought before, but recent improvements to OF make it more palatable.

What motivated me to think about this right now was that I was resurrecting the LIBRARY! codebase, where you can load a DLL:

>> make library! %some-thing.dll
== #[library!]

I was thinking "Hm, it seems like it would be nice if it stored the filename".

>> make library! %some-thing.dll
== #[library! %some-thing.dll]

And I was going to poke the FILE! value into a Cell in the LIBRARY!'s Stub. But then I thought "oh no, what if they change it."

They shouldn't be able to. Immutable FILE! is just better all around for the system.


I think % as an indicator of a file can be avoided completely.
You have URIs already.

So you can do: file:///home/user/file.text
Or file://./my_local_file.txt

As a matter of fact with URIs you can do many more things.

Environment variables? env://my_var.

With many usability and consistency benefits.

Marcel P. Weiher has written a couple of papers about this idea:

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=polymorphic+identifiers+weiher&btnG=#d=gs_qabs&t=1757513658703&u=%23p%3DTQmJ4XN70-kJ

env://my_var might well make sense in non-dialected code, even though $my_var is available as a lexical part, because in evaluation it means "bind in current context":

>> my_var: 10
== 10

>> 'my_var  ; quote suppresses evaluation, one quote level dropped
== my_var

>> get 'my_var
** PANIC: (my_var not bound)  ; quoting doesn't bind

>> $my_var
== my_var  ; bound

>> get $my_var
== 10

So since $my_var is already taken there for a meaning, env://my_var could be a good way of addressing it.

But if you're in a shell-dialect context where binding isn't relevant, you may want to apply that lexical part of $my_var to mean "get environment variable":

call [echo $my_var]

This is kind of the general tradeoff in the language--where the box of parts is open to your alternative designs. You tailor the dialect to make what you want to say more succinct...

...and while URL! is generalized, it's not succinct.

In line with what I'm saying above: Rebol tries to give people a set of lexical parts that are useful for dialecting purposes, so every datatype is kind of a "suggestion"... designed for "abuse". FILE! is one of those things, where you can give arbitrary meanings to it. [%foo <baz> #bar] can mean whatever you want it to mean.

But FILE!'s design choice to allow embedded slashes and not require delimiters (e.g. that %foo/baz.bar is a string type instead of a PATH! with a TUPLE! in it) makes it often useful for representing filenames.

So we don't want to "avoid it completely". It's a part of the lexical toolbox.

Also, I've been kind of shoring up the COMPOSE behavior to make it less accident prone:

>> bad-dir: %foo

>> compose %(bad-dir)/file.txt
** Error: FILE! spliced into FILE! can't end in slash
          unless splice slot followed by slash

>> good-dir: %foo/

>> compose %(good-dir)/file.txt
== %foo/file.txt

These little prescriptive details are the kinds of things that I think can make software more sane. And the generality of URL! wouldn't fit with this.
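To make the slash rule concrete, here is a small Python sketch of one reading of it. This is not the Ren-C implementation: `splice_file` is a hypothetical helper. The "can't end in slash unless the slot is followed by a slash" check is taken from the error text, while the "must end in slash when the slot is followed by a slash" check is only inferred from the `%foo` example being rejected.

```python
def splice_file(spliced: str, rest: str) -> str:
    """Join a spliced path piece with the literal text that follows the slot."""
    slot_followed_by_slash = rest.startswith("/")
    ends_in_slash = spliced.endswith("/")

    if slot_followed_by_slash and not ends_in_slash:
        # Inferred from the %foo example: a directory must say it is one.
        raise ValueError("spliced value must end in slash when slot is followed by slash")
    if ends_in_slash and not slot_followed_by_slash:
        # From the error text: no trailing slash unless a slash follows the slot.
        raise ValueError("spliced value can't end in slash unless slot is followed by slash")

    if slot_followed_by_slash:
        return spliced + rest[1:]  # the two slashes collapse into one
    return spliced + rest


joined = splice_file("foo/", "/file.txt")  # "foo/file.txt"
```

The point of the strictness is that a value's directory-ness is stated once, at the place it is defined, rather than patched up at each use.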

Anyway, there are reasons why we like the FILE! type. But I have been wondering if maybe the core of the system should speak in terms of URL! (e.g. the higher levels of READ always turn %file into file:/// for the lower levels). Not sure, it's just a thought...


That is what Weiher's Objective-Smalltalk does.
There, URLs (what he calls polymorphic identifiers) are used as the universal addressing mechanism.

Here is a video discussion (only the first 30 minutes are the talk; the rest is discussion).

Another thing to consider here is what to do when the variable is not literal.

>> varname: "my_var"

>> read env://???

Schemes might do some kind of auto-composition, so the ENV scheme could interpret env://(varname), but since not all schemes do that it would be up to the scheme to do it. At least with modern binding in Ren-C it's possible.

>> varname: "my_var"
== "my_var"

>> compose env://(varname)
== env://my_var

(not possible in Rebol2, R3-Alpha, Red, or the other variants...)

Although I should say it's possible with caveats. Currently string types don't carry their own bindings, so a function like COMPOSE is using the binding of the callsite. This means that if you passed just env://(varname) as an argument to a function it couldn't be evaluated by the callee.

Anyway, we could say that URL! has evaluator behavior...but then you'd have to evaluate it, vs. just pick it out of a location:

>> url: second [<something> env://(varname)]
== env://(varname)

>> get url
** PANIC: URL! doesn't have binding

We could get into binding URL!s which would start looking like:

>> url: pick [<something> env://(varname)] '$2  ; alias as SECOND$ ?
== env://(varname)  ; bound

>> get url
== "MY-VARIABLE-CONTENTS"

But there's a lot of speculation in that. Up until now I've said strings should not carry bindings, but... :man_shrugging:


+1. In fact, once URL!/FILE! are treated as immutable identifiers, safe composition across schemes falls out naturally. That’s the same direction you’re pointing at with env://(varname).

Weiher’s work frames this as Polymorphic Identifiers (PIs)—first-class, URI-like references—plus Storage Combinators, a small storage protocol (read/write/update/delete) implemented by stores (not “datatypes”). With PIs as the uniform substrate, scheme-handlers compose cleanly.


Composite Scheme-Handlers (via Storage Combinators)

These don’t provide their own storage; they mediate one or more base schemes/stores.

Relative Scheme.

Makes access relative to a base URI.
Example: rfc: as a Relative scheme with base http://datatracker.ietf.org/doc, so rfc:rfc2396 resolves to the IETF page.

Filter/Mapping Scheme.

Transforms bytes to domain objects (or vice-versa) on the way in/out (think transparent compression, encryption, or MIME-based deserialization).

Sequential Scheme.

Tries multiple bases in order until a lookup succeeds.
Example: var: that searches ivar:// → globals:// → env://.

Caching Scheme

Reads from a fast tier and falls back to a slow tier; on hit/miss it writes through to maintain the cache.
Example: combine mem:// + file:// in front of http:// for read-through caching.
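A minimal Python sketch of three of these combinators, under loud assumptions: this is neither Weiher's API nor Rebol's scheme machinery, and `DictStore`, `Relative`, `Sequential` and `Caching` are names invented here. Stores expose only `read` (plus `write` on the base store), and `None` means "not found".

```python
class DictStore:
    """Base store: a plain in-memory mapping from reference to value."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def read(self, ref):
        return self.data.get(ref)  # None means "not found"

    def write(self, ref, value):
        self.data[ref] = value


class Relative:
    """Resolve every reference against a base before delegating."""

    def __init__(self, base, store):
        self.base, self.store = base.rstrip("/"), store

    def read(self, ref):
        return self.store.read(self.base + "/" + ref)


class Sequential:
    """Try each store in order until one has the reference."""

    def __init__(self, *stores):
        self.stores = stores

    def read(self, ref):
        for store in self.stores:
            value = store.read(ref)
            if value is not None:
                return value
        return None


class Caching:
    """Read from the fast tier; on a miss, fetch from slow and fill fast."""

    def __init__(self, fast, slow):
        self.fast, self.slow = fast, slow

    def read(self, ref):
        value = self.fast.read(ref)
        if value is None:
            value = self.slow.read(ref)
            if value is not None:
                self.fast.write(ref, value)
        return value


# Placeholder data standing in for real network and environment lookups.
page_url = "http://datatracker.ietf.org/doc/rfc2396"
web = DictStore({page_url: "<rfc 2396 page>"})
rfc = Relative("http://datatracker.ietf.org/doc", web)

var = Sequential(DictStore({"x": 1}), DictStore(), DictStore({"HOME": "/Users/Chris"}))

mem = DictStore()
cached = Caching(mem, web)
first = cached.read(page_url)  # miss in mem, fetched from web, then cached
```

Each combinator only wraps other stores, which is why they stack: `Caching(mem, rfc)` or `Sequential(cached, var)` need no extra code.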


So yes: treating identifiers as first-class, immutable refs makes the kind of auto-composition you describe straightforward. Polymorphic Identifiers give the uniform addressing; Storage Combinators give the uniform operations. That aligns nicely with the idea of having the core “speak in terms of URL!” while letting higher layers write in %file, sigils or literal syntaxes.


I might have to ask @rgchris as the local URL! advocate to interpret all of this. :slightly_smiling_face:

If you are advocating for interpolation inside URL! as a feature we are going to be dealing with strings capturing environments, and URL! having evaluative behavior...

Do we actually want this?

>> env://(varname)
== "MY-VAR-CONTENTS"

>> 'env://(varname)
== env://(varname)  ; unbound

>> $env://(varname)
== env://(varname)  ; bound

Or perhaps bind on plain eval:

>> env://(varname)
== env://(varname)  ; bound

>> get env://(varname)
== "MY-VAR-CONTENTS"

>> get second [<whatever> env://(varname)]
** PANIC: env://(varname) not bound

(This does also raise questions of GET vs. READ vs LOAD... READ has been used for http:// historically. I don't like READ returning things besides binary data.)

Yes, we can do any of that. If this is the direction people want. I don't know if that implies ALL strings do binding capture under evaluation, but... if they did, that would change the COMPOSE mechanics of strings to match those of lists.

Is this a direction we want to go? I don't know. Maybe it is the logical conclusion of the methodology. Or maybe it's crazy. You guys tell me.


There are a lot of things going on here that I'm not sure I fully grasp, or at least they're getting jumbled in a way that's difficult to follow.


With regard to (im)mutability of URL values, as a current member of the string family, you have access to stringy methods of manipulation and all of the familiarity of function that offers:

combine [
    https:// hostname path
]

replace http://some.old.url http:// https://

Of course, some aspects of that seemingly make no sense, like REVERSE. But then if I have the text of Moby-Dick in a string, I'm not going to use REVERSE on that either. Or SORT, or whatever unless there's some specific reason to do so.

Not to say this NEW-URL might not be a more appropriate choice, I don't know. It really depends on what it is URL values are ultimately used for.

scheme: "http"
read compose (scheme)://some/path

Also, it would be useful to draw a distinction between the stringy composition methods of FILE!/URL! and pathy composition. I can't say foo: "foo" and expect foo/bar = "foo/bar", so path composition is not stringy.

Also, URLs are as important as literal values as they are for the potential values they represent. I feel that's worth stressing.


The conversion of URL to scheme/port is a separate concern. I don't believe composition should occur at the scheme level, that is to say a URL should be resolved first:

read http://(host)/index.html

How would a scheme implementation resolve HOST? It would seem to exist in a very separate (small-d) domain.


I'm reasonably cool with this narrow definition of READ. Though I also think it's important to delineate operations that happen on a URL vs. a PORT. The historical convenience of the equivalence between read url and read port has obscured the boundary and expectations between the values. copy url invokes the stringiness of the URL! type instead of copy port which is scheme-specific.

How then would you resolve env:HOME? Perhaps it should be binary?

>> read env:HOME
== #{2F55736572732F4368726973}

Perhaps GET is more appropriate (and doesn't—to my knowledge—have a predefined role with URL values):

>> get env:HOME
== "/Users/Chris"

What is the rationale behind that? How would that translate were I to say get https://example.com?


The Smalltalk discussions are a little difficult to follow, the terminology is unfamiliar. If I understand correctly, it seems to track quite well with the way URL/schemes/ports work in Rebol currently. I do think that read env:HOME is more enduring than get-env "HOME". Indeed, the schemes I wrote for ReplPad stand up pretty well in this light (write log::info "Info" write log::error "Warning" read %/), not least considering how much Ren-C has changed since I wrote them.

ENV as a scheme:

; quickie read-only ENV scheme for Rebol 3 Alpha/Oldes
;
sys/make-scheme [
    name: 'env
    actor: [
        read: func [port] [
            get-env find/match/tail port/spec/ref "env:"
        ]
    ]
]

read env:HOME

Ren-C now has an ENVIRONMENT!, which you can instantiate under what name you want, so you can alias system.environment as just env in global scope.

At which point, you are comparing to env.HOME (or env."HOME")

I guess I'm still torn on the religion of the URL!. It's a nice trick to have a namespace ("scheme space") that a colon (and slash slash, under current rules) buys you by coming after the scheme, and to implicitly quote it as a literal. That does seem a bargain: you're getting $schemes.name."what/ever" for the low-ish price of name://what/ever, with name kept free in global scope.

While I recognize that value, it's in the context of a system that has to actually do the work. And that work is helped by program structure.

But...I wouldn't have gone after string interpolation if I didn't think the language needed more competence with strings. :man_shrugging:

So using tuple access (and the ENVIRONMENT!) you can indeed say get $env.HOME.

There's a kind of expectation-management going on, which seems to constantly be subverted by the interconnected world...that a simple operation will do a simple thing. The idea of something like a GET doing a network access might look to make sense if you say get http://whatever but when it is obscured behind get var that feels dangerous.

Maybe ENVIRONMENT! is already a problem. Should there be limits on what TUPLE! access (hence PICK and GET) is allowed to do, that you trust it only reads variables from inside your program memory, and anything that's "interprocess" needs a LOAD or READ at minimum?


A post was split to a new topic: Why doesn't e.g. $HOME default to environment lookup?