The design pattern you are converging on is actually very modern
There is very strong prior art for non-string structured URL types, and importantly, the successful designs all converge on one key principle:
A URL is not fundamentally a string. It is a structured value with a canonical string serialization.
Structure unlocks everything:
- normalization
- equality comparison
- efficient manipulation
- composition
- canonicalization
- avoiding reparsing
The string form is just one projection.
This distinction becomes crucial once you want composability, normalization, identity, or safe manipulation.
Let’s look at the models that actually worked.
Modern gold standard: structured URL objects with canonical serialization
The most successful implementations all follow essentially the same internal structure:
URL {
scheme: "http"
username: optional "user"
password: optional "pass"
host: "example.com"
port: optional 80
path: ["foo", "bar"]
query: [ ("x", "1"), ("y", "2") ]   (ordered pairs; names may repeat)
fragment: optional "section"
}
Critically:
- The string form is derived
- The object form is primary
Not vice-versa.
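The record above can be sketched in Python as a frozen dataclass. This is a minimal, hypothetical illustration (the `Url` class and its field names are assumptions that mirror the record, not any library's API). The components are the stored value, and `str()` derives the serialization on demand, percent-encoding each part with the standard `urllib.parse` helpers. The query is kept as ordered name/value pairs because real query strings can repeat names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class Url:
    """Hypothetical structured URL value: the fields are primary and the
    string form is derived from them on demand, never stored."""
    scheme: str
    host: str
    path: tuple[str, ...] = ()                 # decoded path segments
    query: tuple[tuple[str, str], ...] = ()    # ordered pairs; names may repeat
    port: Optional[int] = None
    username: Optional[str] = None
    password: Optional[str] = None
    fragment: Optional[str] = None

    def __str__(self) -> str:
        # Serialize each component, escaping it for the position it lands in.
        out = f"{self.scheme}://"
        if self.username is not None:
            out += quote(self.username, safe="")
            if self.password is not None:
                out += ":" + quote(self.password, safe="")
            out += "@"
        out += self.host
        if self.port is not None:
            out += f":{self.port}"
        out += "/" + "/".join(quote(seg, safe="") for seg in self.path)
        if self.query:
            out += "?" + urlencode(self.query)
        if self.fragment is not None:
            out += "#" + quote(self.fragment, safe="")
        return out
```

For example, `str(Url("http", "example.com", ("a b",), (("q", "x&y"),)))` yields `http://example.com/a%20b?q=x%26y`. Escaping happens in exactly one place, and dataclass equality compares components field by field rather than spellings.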
Examples:
WHATWG URL (JavaScript, browsers, Node.js)
This is the most influential modern design.
let u = new URL("http://example.com:80/foo/bar?x=1#frag")
u.protocol // "http:"
u.hostname // "example.com"
u.port // "" (80 is the default for http:, so the parser drops it)
u.pathname // "/foo/bar"
u.searchParams.get("x") // "1"
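Internally it is completely structured: the spec defines a URL record with scheme, username, password, host, port, path, query, and fragment fields.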
String reconstruction:
u.toString() // "http://example.com/foo/bar?x=1#frag"
Canonicalization rules apply automatically: the redundant default port :80 is gone from the output.
This is the model that browsers and Node.js expose, so it underpins URL handling across the modern web platform.
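A few of those canonicalization rules can be sketched in Python with the standard `urllib.parse` module. This is a hypothetical helper (`canonicalize` and `DEFAULT_PORTS` are illustrative names) that covers only scheme and host lowercasing, default-port elision, and an empty path becoming `/`. The WHATWG parser does much more, including dot-segment removal, IDNA host processing, and percent-encoding normalization.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these schemes get default-port elision in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(raw: str) -> str:
    """Apply a few WHATWG-style normalizations; far from the full algorithm.

    Assumes no IPv6 literal hosts (urllib's .hostname drops the brackets).
    """
    parts = urlsplit(raw)                 # urlsplit lowercases the scheme
    scheme = parts.scheme
    host = parts.hostname or ""           # .hostname is lowercased
    userinfo = ""
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        userinfo += "@"
    netloc = userinfo + host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"        # keep only non-default ports
    path = parts.path or "/"              # empty path becomes "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With this helper, `canonicalize("HTTP://Example.COM:80")` and `canonicalize("http://example.com/")` both return `http://example.com/`. This is the equality-by-canonical-form benefit listed at the top.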
Python urllib.parse (ParseResult / SplitResult)
Python represents URLs as structured named tuples:
from urllib.parse import urlparse
u = urlparse("http://example.com/foo")
u.scheme   # 'http'
u.netloc   # 'example.com'
u.path     # '/foo'
u.query    # ''
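It can recombine the parts into a URL string, which may differ slightly from the input (for example, the scheme is lowercased):
u.geturl()   # 'http://example.com/foo'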
Again, structured primary, string secondary.
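Because the parts are fields, editing a URL means replacing a field rather than splicing text. Here is a short sketch using only documented `urllib.parse` functions. The URLs and parameter names are invented for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

u = urlsplit("http://example.com/foo?x=1&y=2")

# Manipulate the query as data. (dict() would collapse repeated names;
# that is acceptable for this example.)
params = dict(parse_qsl(u.query))
params["x"] = "42"
params["page"] = "a b&c"                 # urlencode escapes the space and '&'

# SplitResult is a named tuple, so _replace returns an edited copy.
edited = urlunsplit(u._replace(path="/foo/bar", query=urlencode(params)))

# Composition: resolve a relative reference against a base (RFC 3986 rules).
resolved = urljoin("http://example.com/docs/guide/", "../api/index.html")
```

Here `edited` is `http://example.com/foo/bar?x=42&y=2&page=a+b%26c` and `resolved` is `http://example.com/docs/api/index.html`. The original `u` is unchanged.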
Rust url crate (excellent modern design)
Rust’s URL type is particularly clean:
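use url::Url;

let url = Url::parse("http://example.com/foo")?;  // inside a fn returning Result
url.scheme()          // "http"
url.host_str()        // Some("example.com")
url.path_segments()   // Some(iterator yielding "foo")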
Internally it keeps a single normalized serialization plus the offsets of each component, so the URL is parsed once and the accessors just read from it.
Successful systems treat URLs as semantic objects, not as strings with flavor
Rebol historically treated URL! as:
string + interpretation hook
Modern systems treat URL as:
structured semantic entity
This is a profound difference.
The failure mode: string-only URL types
Languages that treat URLs as just strings suffer from:
• repeated parsing everywhere
• normalization inconsistencies
• bugs with escaping
• ambiguity (is this string raw, percent-encoded, or already normalized?)
• inefficient recomputation
This is why modern systems moved away from string-only URL types.
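The escaping bug is easy to reproduce. This small Python sketch contrasts string splicing with letting a serializer build the query. The URL and the search term are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

term = "fish & chips"   # user input that happens to contain '&'

# String-only: splice text together and hope nothing needs escaping.
naive = "http://example.com/search?q=" + term + "&page=2"

# Structured: hand the pairs to the serializer, which escapes them.
safe = "http://example.com/search?" + urlencode({"q": term, "page": "2"})

# Read both back the way a server would.
naive_params = parse_qs(urlsplit(naive).query)
safe_params = parse_qs(urlsplit(safe).query)
```

In the naive version the `&` inside the search term is read as a parameter separator, so the server sees a truncated `q`. The structured version round-trips to `{"q": ["fish & chips"], "page": ["2"]}`.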
Removing URL! lexical syntax is actually very sound
Lexical URL syntax is fundamentally problematic because it introduces grammar conflicts.
You already discovered the exact collision:
http://example.com
can legitimately decompose into:
PATH! [
    CHAIN! [http:]
    TUPLE! [example com]
]
This is actually elegant.
The historical URL! lexical syntax is the anomaly, not your generalization.
Using TAG! as a lexical delimiter is very clean
read <http://example.com>
This works because TAG! becomes a quoting construct.
TAG! becomes analogous to:
• string literal
• block literal
But with a hint of semantic intent.
This is similar to:
• XML literal
• HTML literal
• the angle-bracket delimiters RFC 3986 (Appendix C) recommends for URIs embedded in plain text
And it avoids the lexical ambiguity entirely.
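The claim that a delimited literal needs no URL grammar in the lexer can be illustrated with a toy scanner in Python. Everything here is hypothetical: the regex, `scan`, and the token kinds are illustrative and do not reflect how Rebol or Ren-C actually scan. The scanner only has to find the closing `>`. Deciding whether the text is a URL is deferred to a later conversion step such as `to url!`.

```python
import re

# Hypothetical toy scanner, NOT Rebol's actual lexer: a '<' opens a tag that
# runs to the next '>', and anything else is a whitespace-delimited word.
TOKEN = re.compile(r"<([^<>]*)>|(\S+)")


def scan(source: str) -> list[tuple[str, str]]:
    """Split source into ('tag', text) and ('word', text) tokens.

    Inside a tag the scanner never looks at ':', '/', or '.', so no URL
    grammar has to be baked into the lexer.
    """
    tokens = []
    for m in TOKEN.finditer(source):
        if m.group(1) is not None:
            tokens.append(("tag", m.group(1)))
        else:
            tokens.append(("word", m.group(2)))
    return tokens
```

`scan("read <http://example.com/a?b=1>")` yields a word `read` followed by a tag whose text is the whole URL. The scanner treats `<div>` in exactly the same way.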
My strong recommendation
- Yes, absolutely remove URL! lexical syntax.
- Keep URL! semantic type.
- Use TAG! for literal representation.
- Internally represent URL! structurally, not as string.
Provide:
to url! <http://example.com>
and
form url
as inverse operations.
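The intended round-trip property can be sketched in Python, with `urllib.parse` standing in for the URL! internals. `to_url` and `form_url` are hypothetical names for this illustration, not Rebol functions.

```python
from urllib.parse import SplitResult, urlsplit, urlunsplit


def to_url(text: str) -> SplitResult:
    """Hypothetical analogue of `to url! <...>`: parse the text once."""
    parts = urlsplit(text)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {text!r}")
    return parts


def form_url(url: SplitResult) -> str:
    """Hypothetical analogue of `form url`: derive text from the structure."""
    return urlunsplit(url)
```

The round trip is exact for text that is already canonical. For other spellings, `form_url(to_url(s))` returns the normalized form: `HTTP://example.com/foo` comes back as `http://example.com/foo`. So the two operations are inverses up to canonicalization.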
You are actually fixing a historical mistake
Rebol conflated lexical representation with semantic representation.
Modern systems have learned to separate them.
Your direction aligns with the successful designs.
URLs are endpoints.
Not strings.
Rebol’s original PORT! model was ahead of its time, but URL! lexical syntax muddied it.
You're cleaning that up.