On 10/31/2005 04:04 PM, Jonathan Turkanis wrote:
In the above example, the filter is automatically closed at the end of main; this causes the gzip footer to be written. But since no data was ever compressed, the gzip header has never been written.
I guess this is a bug of some sort. What behavior would you expect in this case? It seems to me it would make the most sense to output data in the gzip format representing a 0-length file.
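To make "data in the gzip format representing a 0-length file" concrete, here is a minimal sketch. It uses Python's standard gzip module purely as a convenient way to inspect the format; it is not Boost.Iostreams and says nothing about how gzip_compressor is implemented. The variable names are illustrative only.

```python
import gzip

# Compress zero bytes of input. The result is still a complete gzip
# stream (header, an empty deflate body, and a footer), so it is not empty.
empty_stream = gzip.compress(b"")

# Every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
has_magic = empty_stream[:2] == b"\x1f\x8b"

# Decompressing the stream gives back the zero-length payload.
round_trip = gzip.decompress(empty_stream)

print(len(empty_stream), has_magic, round_trip)
```

So a valid "empty" gzip file is a small non-empty file, which is different from writing nothing at all.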
That would also make sense to me, but it would be inconsistent with the bzip2_compressor behavior, which doesn't write any footer if there was no header.
I can't really change the behavior of bzip2, since it's just a wrapper around libbz2, whereas with gzip I implemented the header and footer myself. I wouldn't worry too much about consistency, since this is a corner case.
Well, either way is fine with me personally, as long as the resulting file is a valid gzip/bzip2 file (which isn't the case with gzip in 1.33.0). Although, strictly speaking, a zero-length file is neither a gzip nor a bzip2 file, most people will be able to cope with it nevertheless. So I don't feel strongly about it either way. But people may still expect (as I did) that switching between gzip_compressor and bzip2_compressor preserves the same behavior. So I would prefer that both write nothing to the stream in this case, rather than have them behave differently (since bzip2 can't be changed easily). Would you find it too ugly/wrong to modify gzip_compressor to delay writing the header until some data is actually written?
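The "delay the header until some data is written" idea can be sketched as follows. This is only an illustration in Python, with the standard gzip module standing in for the compressor; LazyGzipWriter is a hypothetical name, not part of Boost.Iostreams or of any existing library.

```python
import gzip
import io


class LazyGzipWriter:
    """Hypothetical wrapper: starts the gzip stream only on the first
    non-empty write, so an unused writer leaves the sink untouched."""

    def __init__(self, sink):
        self._sink = sink   # any binary file-like object
        self._gz = None     # the real gzip writer, created lazily

    def write(self, data):
        if not data:
            return 0        # empty writes must not trigger the header
        if self._gz is None:
            # From here on GzipFile takes care of the header and, on close(), the footer.
            self._gz = gzip.GzipFile(fileobj=self._sink, mode="wb")
        return self._gz.write(data)

    def close(self):
        # Emit a footer only if a header was emitted.
        if self._gz is not None:
            self._gz.close()
            self._gz = None


# Closed without writing anything: the sink stays empty.
untouched = io.BytesIO()
writer = LazyGzipWriter(untouched)
writer.close()

# Closed after writing: the sink holds one complete gzip stream.
used = io.BytesIO()
writer = LazyGzipWriter(used)
writer.write(b"hello")
writer.close()
```

With this approach both outcomes stay valid: either nothing is written, or a complete stream with matching header and footer is.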
It would also make it impossible to simply open a file in append mode without writing any data to it.
I don't follow. What do you want to be able to do?
Well, suppose a program keeps a log file which is gzipped. Every time the program runs and opens the log file in append mode, some data gets written to the file, even if the program exits without logging any information, so the file would grow continuously, albeit slowly. Of course, the obvious workaround would be to delay opening the log file until there's some data to be written. But that may be less convenient and/or intuitive. I realize that this may not be smart to start with, since writing small chunks to a compressed file in this way can sometimes make the file much larger than if it were uncompressed. That's why I said that the ideal solution would be to open the file, push it into a filtering_stream with bzip2_compressor, and then seek to the end, so that the header and footer would appear only at the beginning and at the end of the file, and not between the chunks written in separate sessions. I'm just not sure how easy/possible it is to implement that.
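The growth described above can be simulated in a few lines. This sketch again uses Python's standard gzip module rather than Boost.Iostreams, and it assumes that every session appends one complete gzip member even when nothing is logged (the "valid empty stream" behavior, not what 1.33.0 does). run_session is a hypothetical helper.

```python
import gzip


def run_session(log, messages):
    """Simulate one program run: open the log in append mode, write the
    messages, close. Each run appends one complete gzip member."""
    return log + gzip.compress(b"".join(messages))


log = b""
log = run_session(log, [b"started\n"])
size_after_first = len(log)

log = run_session(log, [])               # exits without logging anything
size_after_idle = len(log)               # the file grew anyway

log = run_session(log, [b"stopped\n"])

# Concatenated gzip members decompress to the concatenated payloads.
contents = gzip.decompress(log)

# Many tiny members cost more than the raw text they carry, because
# every member repeats the header and footer.
lines = [b"tick\n"] * 50
chunked = b"".join(gzip.compress(line) for line in lines)
raw = b"".join(lines)

print(size_after_first, size_after_idle, len(chunked), len(raw))
```

The log stays readable because the gzip format allows several members in one file, but each session adds fixed overhead regardless of how little it writes.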
This could be fixed if gzip_compressor were seekable. Is it possible to implement this?
The only way I can see to implement this would be to buffer all i/o and only compress or decompress it when the stream is closed. This could be implemented as an adapter that would work with almost any filter, so I wouldn't want to build it into gzip. I'll put this on my list of possibilities for 1.34.
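A rough sketch of such a buffering adapter, in Python for brevity: BufferedCompressingSink is a hypothetical name, gzip.compress is just one possible choice for the wrapped filter, and nothing here reflects an actual Boost.Iostreams design.

```python
import gzip
import io


class BufferedCompressingSink:
    """Hypothetical adapter: keeps all output uncompressed in memory so
    that seeking is trivial, and compresses once when closed."""

    def __init__(self, sink, compress=gzip.compress):
        self._sink = sink             # binary file-like object receiving the result
        self._compress = compress     # any bytes -> bytes function
        self._buffer = io.BytesIO()   # holds the entire uncompressed stream

    def write(self, data):
        return self._buffer.write(data)

    def seek(self, offset, whence=io.SEEK_SET):
        # Works because the buffered data is still uncompressed.
        return self._buffer.seek(offset, whence)

    def close(self):
        # Compress everything in one pass and hand it to the sink.
        self._sink.write(self._compress(self._buffer.getvalue()))
        self._buffer.close()


out = io.BytesIO()
sink = BufferedCompressingSink(out)
sink.write(b"hello world")
sink.seek(6)                # jump back into already written data
sink.write(b"there")        # overwrite "world"
sink.close()
```

The price of this generality is that the whole uncompressed stream sits in memory until close.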
So the entire uncompressed file would be in memory? Doesn't the
gzip/bzip2 interface provide a more efficient alternative? Not even
for forward-only seeking?
--
Tiago de Paula Peixoto