Chapter 3. Data compression

Table of Contents

Utilities
Which compression method is best?

At4J provides its own implementation of bzip2 and makes other data compression algorithms available through third-party libraries. All compression methods use Java's stream metaphor: data is compressed by writing it to an OutputStream and decompressed by reading it from an InputStream. The example below shows how data is compressed and then decompressed using bzip2 compression.

Example 3.1. Compressing and decompressing with bzip2

String toCompress = "Compress me!";

// This will contain the compressed byte array
ByteArrayOutputStream bout = new ByteArrayOutputStream();

// Settings for the bzip2 compressor
BZip2OutputStreamSettings settings = new BZip2OutputStreamSettings().
  // Use four encoder threads to speed up compression
  setNumberOfEncoderThreads(4);

OutputStream out = new BZip2OutputStream(bout, settings);
try
{
  // Compress the data
  out.write(toCompress.getBytes());
}
finally
{
  out.close();
}

byte[] compressed = bout.toByteArray();

// This will print a long list of numbers starting with "[66, 90, 104, ..."
// (the ASCII codes for "BZh", the bzip2 stream signature)
System.out.println(Arrays.toString(compressed));

// Decompress the data again
StringBuilder decompressed = new StringBuilder();
InputStream in = new BZip2InputStream(
  new ByteArrayInputStream(compressed));
try
{
  byte[] barr = new byte[64];
  int noRead = in.read(barr);
  while(noRead > 0)
  {
    decompressed.append(new String(barr, 0, noRead));
    
    noRead = in.read(barr);
  }
}
finally
{
  in.close();
}

// This will print "Compress me!"
System.out.println(decompressed.toString());
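Because all of the compression methods follow the same stream metaphor, the pattern above carries over unchanged to other formats. As a sketch, here is the same round trip written against the JDK's own java.util.zip gzip streams instead of the At4J bzip2 classes (the class name GzipRoundTrip is just for this illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip
{
  // Compress and then decompress a string through the JDK's gzip streams
  static String roundTrip(String s) throws Exception
  {
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    OutputStream out = new GZIPOutputStream(bout);
    try
    {
      // Compress by writing to the wrapping OutputStream
      out.write(s.getBytes("UTF-8"));
    }
    finally
    {
      out.close();
    }

    // Decompress by reading from the wrapping InputStream
    StringBuilder decompressed = new StringBuilder();
    InputStream in = new GZIPInputStream(
      new ByteArrayInputStream(bout.toByteArray()));
    try
    {
      byte[] barr = new byte[64];
      int noRead = in.read(barr);
      while (noRead > 0)
      {
        decompressed.append(new String(barr, 0, noRead, "UTF-8"));
        noRead = in.read(barr);
      }
    }
    finally
    {
      in.close();
    }
    return decompressed.toString();
  }

  public static void main(String[] args) throws Exception
  {
    // Prints "Compress me!"
    System.out.println(roundTrip("Compress me!"));
  }
}
```

Only the two wrapping stream classes differ from the bzip2 example; the write/read logic is identical.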


Utilities

EntityFS has some utility classes that make the I/O programming less verbose. The example below does the same as the example above, but uses the StreamUtil class for reading data from the decompressing stream.

Example 3.2. Compressing and decompressing with bzip2 using EntityFS utilities

String toCompress = "Compress me!";

// This will contain the compressed byte array
ByteArrayOutputStream bout = new ByteArrayOutputStream();

OutputStream out = new BZip2OutputStream(bout);
try
{
  // Compress the data
  out.write(toCompress.getBytes());
}
finally
{
  out.close();
}

byte[] compressed = bout.toByteArray();

// This will print a long list of numbers starting with "[66, 90, 104, ..."
// (the ASCII codes for "BZh", the bzip2 stream signature)
System.out.println(Arrays.toString(compressed));

// Decompress the data again. Use StreamUtil to read data.
byte[] decompressed = StreamUtil.readStreamFully(
  new BZip2InputStream(
    new ByteArrayInputStream(compressed)), 64);

// This will print "Compress me!"
System.out.println(new String(decompressed));


The following EntityFS classes are useful when working with files and streams:


Which compression method is best?

The answer is, of course: it depends. The performance characteristics of the different compression methods are investigated in the At4J test report. The table below summarizes the characteristics of the different compression methods:
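The speed-versus-size trade-off exists within a single algorithm as well as between algorithms. As a self-contained illustration using only the JDK's DEFLATE implementation (not the At4J classes), the sketch below compresses the same repetitive input at the fastest and the strongest compression level and compares the resulting sizes:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class CompressionLevels
{
  // Compress data with DEFLATE at the given level and return the size
  static int compressedSize(byte[] data, int level) throws Exception
  {
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    DeflaterOutputStream out =
      new DeflaterOutputStream(bout, new Deflater(level));
    try { out.write(data); } finally { out.close(); }
    return bout.size();
  }

  public static void main(String[] args) throws Exception
  {
    // Highly repetitive data compresses well at any level
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++)
    {
      sb.append("Compress me! ");
    }
    byte[] data = sb.toString().getBytes("UTF-8");

    System.out.println("original:  " + data.length);
    System.out.println("fastest:   "
      + compressedSize(data, Deflater.BEST_SPEED));
    System.out.println("smallest:  "
      + compressedSize(data, Deflater.BEST_COMPRESSION));
  }
}
```

On input like this, BEST_COMPRESSION produces output no larger than BEST_SPEED, at the cost of more CPU time; which trade-off is right depends on the application.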