My guess is that this compresses less efficiently as you would have to shard the dictionaries. Might be close though for large files. I was surprised that there were no speed or efficiency comparisons in the README.
The max window size for zlib is 32 KB, so I don't think the default sharding at 128 KB would change much. You can pass the -b parameter if you find that a larger block size works better on your data.
If you are looking for details of the design of pigz, there is a very well-documented overview in the source of pigz.c:
I tested it on a 680 MB text file. gzip compresses it to 246.0 MB; pigz compresses it to 245.5 MB. I see a similar percent change on a 3.8 MB text file. So they are approximately equivalent.
First, thanks for the numbers; it's useful to see real-world examples.
Second, and this isn't meant to be a critique (I'm just trying to understand a phenomenon I keep seeing): is there a reason you prefer presenting it as a percentage decrease? Every time I read "X% decrease" I feel obliged to go back to the source numbers, because people mess that terminology up so often that I'm never sure whether it's being used correctly (you are). For myself, I generally write "X ran in Y% of the time Z took," specifically because I don't want people to misinterpret it. Is the "X% decrease" presentation preferred, taught, or considered standard? Am I alone in feeling it's more likely to be misinterpreted?
(Sorry your comment is the one I brought this up on, I've just been wondering this for a while.)
In my experience, it is more typical to use percent change or relative change in the physical sciences, and this is how I was taught. Just to be clear: if you have values t1 and t2, the relative change is (t1 - t2)/t1. There is a one-to-one correspondence with what you described, which is t2/t1.
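For a concrete example, here is the same comparison stated both ways, using the gzip/pigz output sizes reported upthread (just arithmetic, sketched in Python):

    # relative change vs. ratio, using the output sizes reported above (in MB)
    t1, t2 = 246.0, 245.5              # gzip output, pigz output
    relative_change = (t1 - t2) / t1   # ~0.002, i.e. about a 0.2% decrease
    ratio = t2 / t1                    # ~0.998, i.e. pigz's output is ~99.8% of gzip's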
I think teej explained it well. If I say "the new value is +20% or -20%", it is immediately obvious those have the same magnitude and opposite direction. But, for some people, when I say "the new value is 120% or 80% of the old value", it is not immediately obvious that they have the same magnitude. It requires a small extra step for the reader to realize that this means the same amount of relative change.
I always state it as a relative change. I find that people can get really confused if you say "X ran in 110% of the time of Y," even though it is stated in a clear way.
My preferred way of communicating this concept is "we observed a +10% change in X compared to Y." I always use a +/- sign and this helps signal that I'm talking about a relative change.
If I am comparing percents, I'll always specify "relative" or "absolute" change, though I prefer to use relative change. Occasionally, if the change is small, I will use basis points instead of percents to communicate the absolute change.
The input blocks, while compressed independently, have the last 32K of the previous block loaded as a preset dictionary to preserve the compression effectiveness of deflating in a single thread.
I'll check it out. However, if you have to wait for the previous block to compress the following block, I don't see how you can parallelize it completely. My assumption is that you would have to shard the file at a higher level and still compress those shards independently. That should get close to the same results as a sequential compressor, but for small file sizes both this effect and Amdahl's law would start to rear their heads. I suppose you could get around that by automatically not parallelizing compression below some minimum file size.
> However, if you have to wait for the previous block to compress the following block, I don't see how you can parallelize it completely.
That is not what the man page says, though. The block size for any given session is fixed, so you know the boundaries of each block prior to compression.
Each block gets a copy of the last 32 kbytes of data from the end of the prior block.
The algorithm used by gzip compresses by finding repeated strings within the last 32 kbytes of uncompressed data it has seen, so there is no dependency on the prior block's compressed output; all the current block needs is a copy of the last 32k of the prior block's uncompressed data, which is known before compression starts.
There's no "generating" of a dictionary. The gzip algorithm is based upon the "dictionary" being the last seen 32k bytes of uncompressed data based upon where in the file the compressor is presently working (technically in compression circles it is a 'windowed' algorithm, not a 'dictionary' algorithm). It compresses by finding a repetitive string in that 32k window that matches a string at the current ___location, and outputting an instruction that says (in effect): "seek backwards in the uncompressed data 200 bytes, then copy 100 bytes from there to here".
So as long as each parallel block has the final 32k of the prior block prepended, the compression effectiveness will be essentially identical between a straight sequential compression and a pigz parallel compression, because at byte 0 of the current block the 32k window of uncompressed prior data is available to match against, just as if the compressor were running sequentially.
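Here is a minimal sketch of that idea using Python's zlib bindings (this is not how pigz itself assembles its output; pigz produces a single continuous gzip stream, and the block splitting here is purely illustrative): each block is deflated independently, but gets the last 32 KB of the preceding input as a preset dictionary, so back-references can still reach across the block boundary.

    import zlib

    BLOCK = 128 * 1024   # pigz's default block size
    WINDOW = 32 * 1024   # deflate's maximum back-reference window

    def compress_blocks(data):
        blocks = []
        for i in range(0, len(data), BLOCK):
            tail = data[max(0, i - WINDOW):i]   # last 32 KB preceding this block
            if tail:
                co = zlib.compressobj(level=9, wbits=-15, zdict=tail)
            else:
                co = zlib.compressobj(level=9, wbits=-15)   # first block: no dictionary
            blocks.append(co.compress(data[i:i + BLOCK]) + co.flush())
        return blocks

    def decompress_blocks(blocks):
        data = b""
        for raw in blocks:
            tail = data[-WINDOW:]               # same 32 KB the compressor used
            if tail:
                do = zlib.decompressobj(wbits=-15, zdict=tail)
            else:
                do = zlib.decompressobj(wbits=-15)
            data += do.decompress(raw) + do.flush()
        return data

    payload = b"the quick brown fox jumps over the lazy dog " * 20000
    assert decompress_blocks(compress_blocks(payload)) == payload

Each block comes out as its own raw deflate stream here; the point is just that the preset dictionary lets block N+1 match against the tail of block N without waiting for block N to finish compressing.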
The only growth from pigz comes from needing to round each parallel compressed block up to a multiple of 8 bits (the Huffman codes that are output are bitstrings that don't align with 8-bit byte boundaries). But worst case that is 7 bits per parallel block. Given the performance gains on multi-CPU systems, a net increase of a few hundred bytes to a few kilobytes is not likely to matter. If those bytes did matter, then one should use bzip2 or lzip or xz and get much higher compression ratios (at the expense of much longer run times).
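As a rough upper bound for the 680 MB example earlier in the thread (assuming the default 128 KB block size mentioned above):

    # worst-case byte-alignment padding for a 680 MB input at 128 KB blocks
    blocks = (680 * 1024) // 128     # 5,440 blocks
    padding = blocks * 7 / 8         # 4,760 bytes, i.e. under 5 KB in the worst case
    # next to a ~245 MB compressed output, that is a rounding error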