`tac` : Implementation of UNIX's line reverser

I mentioned I wanted to study some basic utilities... even more basic than grep. Small codebases put the focus on big design points.

Here was my off-the-cuff line reverser (TAC ... reverse CAT, where CAT is like the Windows Command Shell's TYPE). I wrote it a year ago, while I was solely focused on getting READ-LINE to work cross-platform from piped input... so almost no thought went into it:

; %tac.r, version 0.0.2
;
; * COLLECT the lines
; * REVERSE the collection
; * output the result DELIMIT-ed with newline
;
write stdout opt delimit:tail newline reverse collect [
    insist [not keep opt read-line]
]

Right off the bat, you can see it's using twice the memory it needs to. It's collecting a block of strings, and then, while that whole block is still in memory, it merges it into one giant string before output. At minimum, this should loop and write the strings out of the block one at a time. (Though doing it this way does draw attention to an interesting point about DELIMIT, which I'll get to later.)
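That lower-memory variant can be sketched as follows (a sketch only; the FOR-EACH arity used here is an assumption, and COLLECT is still needed since reversal requires all the input--see the note below):

; sketch: still COLLECTs all the lines, but writes each one out
; individually instead of merging the block into one giant string
; with DELIMIT
for-each line reverse collect [
    insist [not keep opt read-line]
][
    write stdout line
    write stdout newline  ; each collected line gets its newline back
]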

Note: This line-reversing task is one of those pathological cases that can't be done in a "streaming" way. You can't start writing anything to the output until you've read the input to the end. (Doing better needs a random-access I/O PORT! that can SEEK the end of the file and go backwards...but the standard input device can't do this.)

Why Does DELIMIT:TAIL Ever Return NULL?

The OPT in [write stdout opt delimit:tail ...] is there because DELIMIT can return NULL. If it does, we want to opt out of the WRITE (since passing the null would cause a failure).

One might ask if it should never be able to return NULL when you use :TAIL. At the moment, it can:

>> delimit:tail "," ["a" "b"]
== "a,b,"

>> delimit []
== ~null~  ; antiform

>> delimit:tail "," []
== ~null~  ; antiform

Maybe that last one should be "," instead? Perhaps when you have :HEAD or :TAIL, you never get null back?

But... let's stick to looking at the use cases.

What Does %tac.r Want From DELIMIT:TAIL Here?

If we look at the edge case here, there is a difference between these two situations:

  1. If the first call to READ-LINE returns an empty string, and the second call returns NULL

    • This happens when you pipe in a 1-byte file containing a single line feed, e.g. a file containing one line that's empty.

    • With the code above, COLLECT produces the block [""] for this case.

  2. If the first call to READ-LINE returns NULL

    • This happens when you pipe in a 0-byte file, e.g. a file containing no lines at all

    • With the code above, COLLECT produces the block [] for this case.

So perhaps you see why DELIMIT chooses to react with some kind of signal when the block contents vaporize. It's precisely because cases like this tend to need some kind of special handling, and it's not good to gloss over that.

In this case, the empty block (which corresponds to the 0-byte file input, e.g. 0 lines) should result in there being no write to the output. So the default behavior of WRITE STDOUT VOID is the right answer.
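To make that concrete, here's how the 0-line case flows through the original expression (a sketch; the ~null~ rendering follows the examples above):

>> delimit:tail newline []
== ~null~  ; antiform

>> write stdout opt delimit:tail newline []
; OPT converts the NULL to void, and the WRITE of void is a no-op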

More to Study, I Just Thought That Bit Was Interesting...


"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."


The interface for READ-LINE and friends was made before the existence of definitional errors.

There were three return states:

  • Multi-return pack of ~[string data, end of file flag]~

  • NULL

  • An ~escape~ antiform (no longer a legal "keyword")

The case of the ~escape~ antiform pretty clearly should be a definitional error. We can see that it would screw up programs like TAC, with the antiform being truthy. (The original choice was made when they were ornery.)

If you aren't rigged up to handle the user canceling (via EXCEPT) then it should create an error and the program should halt. You don't want it to be conflated with NULL as an ordinary "no more input available, normal completion" condition, and you don't want it to be something that is a branch trigger (which is everything but null these days).

The reason for the end of file flag is that you could do a READ and the other side of the pipe could hang up... not strictly at the point of a newline. Returning NULL in that case might throw away data you were interested in getting. So this was a way of letting you know if it wasn't really a complete string--if you cared.

The EOF Flag Can Be Ignored, And Is A Mistake

Returning EOF as a secondary result isn't a good design, as casual usage will conflate reading an incomplete line with reading a complete one. The other side can hang up on you, and you won't know it.

I think a better answer here is to have a :RAW mode which includes the newline at the end of the string you get. Then you can detect if the newline is there or not. If it's not, your read was prematurely interrupted.

if not (line: read-line:raw except [print "Escape!", quit 1]) [
    print "No more input left to read."
    quit 0
]
if newline <> try last line [  ; need TRY, since string may be empty
    print ["Incomplete line was read:" mold line]
    quit 2
]
try take:last line  ; again, TRY in case string is empty
print ["Complete line read:" mold line]
quit 0

So that gives you coverage of what this layer of abstraction can provide you.

If you don't use the :RAW mode, then the other end of the pipe disconnecting mid-line would cause an abrupt failure.

Seems good to me.


I retconned this from the old-style UNTIL to INSIST, because I believe it's truly better to have UNTIL be arity-2.

Writing out a BLOCK! as lines is probably common enough that it should be offered:

write-lines stdout reverse collect [
    insist [not keep opt read-line stdin]
]

[] is an edge case there. Unless someone can put together a convincing argument otherwise, I'd say it shouldn't write anything... and [""] should be required if you want to write a single newline. That would give it parity with the implementation as I had it.
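A hypothetical WRITE-LINES with those semantics might be sketched like this (the name comes from the proposal above; the FUNC spec and FOR-EACH usage are assumptions, not a real definition):

; one newline per item: [] writes nothing, [""] writes a single newline
write-lines: func [port lines [block!]] [
    for-each line lines [
        write port line
        write port newline
    ]
]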

Speaking in the negative with INSIST is kind of weird. I'd rather read:

write-lines stdout reverse collect [
    while [keep opt read-line stdin] [
        noop
    ]
]

I'm of course trying to not use variables here as part of the "puzzle", but using variables makes it more comprehensible:

write-lines stdout reverse collect [
    let line
    while [line: read-line stdin] [
        keep line
    ]
]

WHILE passes its condition to a body that's a function, so you could do:

write-lines stdout reverse collect [
    while [read-line stdin] (line -> [keep line])
]

And of course, that function is just KEEP:

write-lines stdout reverse collect [
    while [read-line stdin] keep/
]

There's also ATTEMPT. ATTEMPT is particularly interesting if you want to throw in error-handling, because you're not so concerned about where your loop conditions wind up:

write-lines stdout reverse collect [
    attempt [
        let line: read-line stdin except e -> [
            ... your error handling here ...
        ]
        if line [
            keep line
            again
        ]
    ]
]

And you could be creative with CYCLE, getting rid of the need for OPT on the KEEP by just putting a BREAK on the NULL-yielding READ-LINE case:

write-lines stdout reverse collect [
    cycle [keep (read-line stdin else [break])]
]

The parentheses actually aren't necessary there, because it runs one complete expression on the left, so it connects that way naturally:

write-lines stdout reverse collect [
    cycle [keep read-line stdin else [break]]
]

Whether that reads intuitively to people or not is subject to your acclimation to the feature. You could also say the more traditional:

write-lines stdout reverse collect [
    cycle [keep any [read-line stdin, break]]
]

Most Readable Is WHILE w/LINE Variable

It would be even better if LET variables could be visible in WHILE bodies, which I think is definitely worth pursuing:

write-lines stdout reverse collect [
    while [let line: read-line stdin] [
        keep line
    ]
]

I kind of feel like there's a comprehensibility threshold with "don't use any intermediate variables", and testing the result of a KEEP is kind of counterintuitive.

Best power-user version is likely:

write-lines stdout reverse collect [
    while [read-line stdin] keep/
]