Realistically Migrating Rebol to "UTF8 Everywhere"

Coming back to this to try to get it out the door, I think "half done" was about right. It took another week or two of work, and it's going to take a bit more before it's really "done"...it's on the slow side at the moment.

Nevertheless, I've gone ahead and committed it... UTF-8 Everywhere Lives!

There's a whole new set of interesting angles to how BINARY! and TEXT! can intermix in PARSE:

I also imported a file from the W3C to the tests, and got things started on how more purposeful tests might be written:

This approach isn't impossible...but it hinges on having a value cell in your hand at the moment of doing the lookup. A lot of places have series nodes that aren't paired with any value, so there'd be no caching.

For the moment, the main caching is done on the series itself. Small series don't bother with a cache; larger ones could have several. It gives orders-of-magnitude speedups, and a collection of large parses (like source analysis) drops from the scale of "a day" to the scale of "minutes".
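To make the idea of caching on the series concrete, here is a minimal Python sketch of mapping a codepoint index to a byte offset in UTF-8 data using a few stored "bookmarks". This is not the actual implementation: the class name `Utf8Series`, the threshold `CACHE_MIN_BYTES`, and the spacing `BOOKMARK_STRIDE` are all hypothetical, picked only for illustration. It also ignores mutation, which a real series would have to handle by updating or discarding its cache.

```python
class Utf8Series:
    """Hypothetical UTF-8 backed string with (codepoint index -> byte offset) bookmarks."""

    CACHE_MIN_BYTES = 64   # assumed threshold: smaller series get no cache
    BOOKMARK_STRIDE = 32   # assumed spacing between bookmarks, in codepoints

    def __init__(self, text):
        self.data = text.encode("utf-8")
        # bookmarks[k] is the byte offset of codepoint number k * BOOKMARK_STRIDE
        self.bookmarks = []
        if len(self.data) >= self.CACHE_MIN_BYTES:
            self._build_bookmarks()

    @staticmethod
    def _is_continuation(byte):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx
        return byte & 0xC0 == 0x80

    def _build_bookmarks(self):
        index = 0
        for offset, byte in enumerate(self.data):
            if not self._is_continuation(byte):  # start of a codepoint
                if index % self.BOOKMARK_STRIDE == 0:
                    self.bookmarks.append(offset)
                index += 1

    def byte_offset(self, index):
        """Return the byte offset of the codepoint at `index`."""
        if index < 0:
            raise IndexError(index)
        if self.bookmarks:
            # Jump to the nearest bookmark at or before the requested index
            slot = min(index // self.BOOKMARK_STRIDE, len(self.bookmarks) - 1)
            current = slot * self.BOOKMARK_STRIDE
            offset = self.bookmarks[slot]
        else:
            current, offset = 0, 0  # no cache: scan from the head
        while current < index:
            if offset >= len(self.data):
                raise IndexError(index)
            offset += 1  # step past the lead byte
            while offset < len(self.data) and self._is_continuation(self.data[offset]):
                offset += 1  # step past continuation bytes
            current += 1
        if offset >= len(self.data):
            raise IndexError(index)
        return offset

    def char_at(self, index):
        """Return the single codepoint at `index` as a str."""
        start = self.byte_offset(index)
        end = start + 1
        while end < len(self.data) and self._is_continuation(self.data[end]):
            end += 1
        return self.data[start:end].decode("utf-8")
```

With bookmarks, a lookup walks at most one stride's worth of codepoints instead of scanning from the head of the series, which is the kind of saving that matters when something like PARSE seeks by index over and over.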

This is where the big speedup is going to come from, but I figured it would be better to phase it. Not only will the extra processing in PARSE give this some testing for a while, but people can also adapt to the first set of necessary changes before being hit with needing to rewrite any parse rules that expected to modify the iterated series without using parse keywords.
