IOStreams zlib_decompressor does not check crc for empty string
When decompressing an empty file, the zlib_decompressor does not check the CRC, which can cause an exception to be raised on a correctly formatted gzip file, especially when the gzip file is created by concatenating compressed blocks. This comes up especially when working with bgzip files (see the specification at http://bioinformatics.oxfordjournals.org/content/27/5/718.full; all bgzip files end in a special empty block).

I have attached a demonstration file (zlib_test.cpp) that illustrates the problem. This program simply reads compressed input line by line from stdin and writes it to stdout (nearly equivalent to zcat). Assuming we compile to a.out, the following Bash commands illustrate the issue:

    $ cat <(echo "foo" | gzip -c) <(echo -n "" | gzip) <(echo "bar" | gzip) | ./a.out
    foo

Whereas we expect the following:

    $ cat <(echo "foo" | gzip -c) <(echo -n "" | gzip) <(echo "bar" | gzip) | zcat
    foo
    bar

I have attached a patch to zlib.cpp (base version 1.51, but I confirmed that the problem persists in 1.58) which I believe should resolve the issue by simply setting the CRC to 0 when an empty string is decompressed.

Thanks for your time,
John Wallace
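For reference, here is a minimal standalone sketch of why 0 is the right value for the empty case. It is not the attached patch and does not use Boost or zlib; the function name crc32_gzip is hypothetical and exists only for this illustration. It computes the CRC-32 used in gzip trailers bit by bit, and for zero bytes of input the result is 0, which is the value stored in the trailer of an empty gzip member.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only (not part of Boost.Iostreams).
// Bitwise CRC-32 as used by gzip: reflected polynomial 0xEDB88320,
// initial value 0xFFFFFFFF, final XOR with 0xFFFFFFFF.
std::uint32_t crc32_gzip(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and XOR in the polynomial;
            // otherwise just shift. The mask is all ones or all zeros.
            std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With no input bytes the loop never runs, so the initial value and the final XOR cancel and crc32_gzip(nullptr, 0) returns 0. A decompressor that leaves its running CRC uninitialized (or stale) after an empty member would therefore disagree with a valid trailer.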