dbzip2 vincit
I’ve managed to bang my dbzip2 prototype into a pretty decent state now, rewriting some of the lower-level bitstream code as a C module while keeping the high-level bits in Python.
It divides up input into proper-sized blocks and combines output blocks into a single output stream, achieving bit-for-bit compatibility with single-threaded standard bzip2. While still slower than bzip2smp for local threads, I was quite pleased to find it scales to multiple remote threads well enough to really look worth it:

This was using Wikimedia’s database servers; beefy Opteron boxes with gigabit ethernet and usually a lot of idle CPU cycles while they wait on local disk I/O.
The peak throughput on my initial multiple-server tests was about 24 megabytes per second with 10 remote threads, and I was able to get over 19 megs/sec on my full gigabyte test file, compressing it in under a minute. With some further work and better stability, this could be really helpful in getting the big data dumps going faster.
Next step: parallel decompression…?
