Exposing the #{00} At The End of Strings

hostilefork · April 8, 2025, 7:37pm

When you have a UTF-8 string like TEXT! represented internally, it has a 0 byte at its end.

This is leveraged in various places internally.

In the libRebol API, functions like rebLockUtf8() can give you direct access to the internal byte buffer, and you get the terminator along with it, which makes the buffer much more useful:

const char* utf8 = rebLockUtf8("reverse -{cba}-");

assert(utf8[0] == 'a');
assert(utf8[3] == '\0');  // "abc" is 3 bytes, so the terminator is at index 3

char buffer[4];
strcpy(buffer, utf8);  // kind of thing that would be hard if not terminated

rebUnlockUtf8(utf8);

But what if you're doing something like the FFI, and you want a BINARY! that has the #{00} on the end?

>> utf8 of "abc"
== #{616263}  ; new allocation

>> as blob! "abc"
== #{616263}  ; alias of memory, no additional allocation

Neither case gives you the internal #{00}.

You can work around this by appending the byte to the new allocation, or by joining the alias with the byte to synthesize a new value:

append (utf8 of "abc") #{00}

join (as blob! "abc") #{00}

But you shouldn't have to make a copy at all.

This is sort of a nuance on the AS aliasing process. Most clients don't want to see the #{00} byte at the end, but those that do could get big savings (especially on longer strings) if the alias could include it.
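To make the savings concrete, here is a rough C sketch of the idea. It is not the actual Ren-C implementation: `Utf8Storage`, `ByteView`, `view_content()` and `view_with_terminator()` are hypothetical names made up for illustration. The only assumption is the one stated above, that the string's buffer holds its content bytes followed by a 0 byte.

```c
#include <stddef.h>

/* Hypothetical stand-in for a string's internal storage: the content
   bytes, followed by one 0 byte that is not counted in `len`. */
typedef struct {
    const unsigned char *bytes;  /* content; the terminator follows it */
    size_t len;                  /* content length, terminator excluded */
} Utf8Storage;

/* Hypothetical byte view, playing the role of an aliased blob. */
typedef struct {
    const unsigned char *data;
    size_t size;
} ByteView;

/* What aliasing exposes today: the content bytes only. */
ByteView view_content(const Utf8Storage *s) {
    ByteView v = { s->bytes, s->len };
    return v;
}

/* The idea: same pointer, one byte longer, so the 0 is visible.
   Nothing is allocated and nothing is copied. */
ByteView view_with_terminator(const Utf8Storage *s) {
    ByteView v = { s->bytes, s->len + 1 };
    return v;
}
```

Both views share the same pointer, so the cost of exposing the terminator does not grow with the length of the string, whereas the APPEND and JOIN workarounds each copy every byte.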

Something worth thinking about.