Chapter 6. LZMA compression

Table of Contents

Standalone LZMA tools

LZMA, the Lempel-Ziv-Markov chain algorithm, is a compression algorithm that has been under development since 1998. See the Wikipedia article on LZMA.

At4J uses Igor Pavlov's LZMA implementation from the LZMA SDK. It is built around a standalone encoder and a standalone decoder. The encoder reads data from an uncompressed stream and writes it to a compressed stream, and the decoder does the opposite. Both drive the whole transfer themselves, which is quite unlike the Java streams API, where client code pushes data into an output stream or pulls it from an input stream.

At4J provides stream implementations on top of the encoder and the decoder, in effect turning them inside out. To accomplish this, the encoder or the decoder runs in a separate thread for as long as the stream writing to it or reading from it is open. The compressing stream implementation is LzmaOutputStream and the decompressing stream implementation is LzmaInputStream.
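The inside-out arrangement described above can be sketched with nothing but the JDK. This is an illustration of the general technique, not At4J's actual implementation: the class names InsideOutSketch and EncodingOutputStream are hypothetical, and Deflate stands in for LZMA so that the sketch runs without the LZMA SDK. The standalone encode method drives the whole transfer itself, and the wrapper stream feeds it through a pipe from a worker thread that lives until the stream is closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class InsideOutSketch {

    // Copies everything from in to out.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Hypothetical stand-in for a standalone encoder: it reads its input to
    // the end and writes the compressed form. Deflate is only a placeholder
    // codec here; it is not LZMA.
    static void encode(InputStream in, OutputStream out) throws IOException {
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            copy(in, deflater);
        }
    }

    // An OutputStream facade over the standalone encoder. Bytes written by
    // the client travel through a pipe to the encoder thread.
    static final class EncodingOutputStream extends OutputStream {
        private final PipedOutputStream pipe = new PipedOutputStream();
        private final Thread worker;
        private volatile IOException failure;

        EncodingOutputStream(OutputStream target) throws IOException {
            PipedInputStream encoderInput = new PipedInputStream(pipe);
            worker = new Thread(() -> {
                // Closing the read end on exit lets a blocked writer fail fast.
                try (InputStream in = encoderInput) {
                    encode(in, target);
                } catch (IOException e) {
                    failure = e;
                }
            }, "encoder");
            worker.start();
        }

        @Override
        public void write(int b) throws IOException {
            pipe.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            pipe.write(b, off, len);
        }

        @Override
        public void close() throws IOException {
            // Closing the pipe signals end of input; then wait for the encoder.
            pipe.close();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for the encoder", e);
            }
            if (failure != null) {
                throw failure;
            }
        }
    }

    // Compresses text through the facade, then decompresses it again.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (OutputStream os = new EncodingOutputStream(compressed)) {
                os.write(text.getBytes(StandardCharsets.UTF_8));
            }
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            try (InputStream is = new InflaterInputStream(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                copy(is, plain);
            }
            return new String(plain.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Compress me!"));
    }
}
```

As with the At4J streams, the worker thread only stops once the stream is closed, so closing the stream in a finally block or a try-with-resources statement matters more here than for an ordinary stream.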

Clients are, of course, free to choose between using the LZMA SDK's Encoder and Decoder classes, or using At4J's LzmaOutputStream and LzmaInputStream.

Warning

LZMA does not seem to work well with the IBM JDK. See the At4J test results.

An LzmaOutputStream is configured using an LzmaOutputStreamSettings object. There are several configurable parameters. See the LzmaOutputStreamSettings documentation for details. By default, the output stream writes its configuration before the compressed data. By doing so, an LzmaInputStream reading from the file does not have to be configured manually; it just reads its configuration from the file header. If the compressed data does not contain the compression settings, the input stream can be configured using an LzmaInputStreamSettings object.
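To give an idea of what such a configuration header can contain, the sketch below packs and unpacks the five-byte coder properties used by the LZMA SDK: one byte combining the lc, lp and pb parameters, followed by the dictionary size as a little-endian 32-bit integer. That the settings At4J writes follow exactly this layout is an assumption, and the class LzmaPropertiesSketch and its methods are hypothetical helpers, not part of At4J or the LZMA SDK.

```java
public class LzmaPropertiesSketch {

    // Packs lc, lp, pb and the dictionary size into five bytes, following
    // the LZMA SDK coder properties layout.
    static byte[] encode(int lc, int lp, int pb, int dictionarySize) {
        byte[] props = new byte[5];
        props[0] = (byte) ((pb * 5 + lp) * 9 + lc);
        for (int i = 0; i < 4; i++) {
            // Little-endian: least significant byte first.
            props[1 + i] = (byte) (dictionarySize >>> (8 * i));
        }
        return props;
    }

    // Unpacks five property bytes into {lc, lp, pb, dictionarySize}.
    static int[] decode(byte[] props) {
        int first = props[0] & 0xFF;
        int lc = first % 9;
        int rest = first / 9;
        int lp = rest % 5;
        int pb = rest / 5;
        int dictionarySize = 0;
        for (int i = 0; i < 4; i++) {
            dictionarySize |= (props[1 + i] & 0xFF) << (8 * i);
        }
        return new int[] { lc, lp, pb, dictionarySize };
    }

    public static void main(String[] args) {
        // lc=3, lp=0, pb=2 are the LZMA SDK defaults; the dictionary size
        // is 2^8 = 256 bytes.
        byte[] props = encode(3, 0, 2, 1 << 8);
        StringBuilder hex = new StringBuilder();
        for (byte b : props) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "5d 00 01 00 00"
    }
}
```

A reader that finds these bytes at the start of the data can configure itself from them, which is the point of writing the settings ahead of the compressed data.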

The example below shows how data is compressed by writing it to an LZMA output stream and then decompressed again by reading from an LZMA input stream.

Example 6.1. Compressing and decompressing with LZMA

// Data will be compressed to the File f

String toCompress = "Compress me!";

// Create a new LZMA output stream with the default settings. This will write
// the compression settings before the compressed data, so that the stream that
// will read the data later on does not have to be configured manually.
// This starts a new encoder thread.
OutputStream os = new LzmaOutputStream(new FileOutputStream(f));
try
{
  os.write(toCompress.getBytes());
}
finally
{
  // This closes the encoder thread.
  os.close();
}

// Read the compressed data
// This starts a new decoder thread.
InputStream is = new LzmaInputStream(new FileInputStream(f));
try
{
  // Use the EntityFS StreamUtil utility to make our job easier.
  // This will print "Compress me!"
  System.out.println(
    new String(
      StreamUtil.readStreamFully(is, 32)));
}
finally
{
  // This closes the decoder thread.
  is.close();
}


The example below writes LZMA compressed data to a file without writing the compression settings, and then reads the data again using a manually configured input stream.

Example 6.2. Compressing and decompressing with LZMA using manual configuration

// Data will be compressed to the File f

String toCompress = "Compress me!";

// Create the configuration for the output stream. Set two properties and use
// the default values for the other properties. 
LzmaOutputStreamSettings outSettings = new LzmaOutputStreamSettings().
  // Do not write the configuration to the file
  setWriteStreamProperties(false).
  // Use a dictionary size of 2^8 = 256 bytes
  setDictionarySizeExponent(8);
  
// Create a new LZMA output stream with the custom settings.
OutputStream os = new LzmaOutputStream(new FileOutputStream(f), outSettings);
try
{
  os.write(toCompress.getBytes());
}
finally
{
  os.close();
}

// Create the configuration for the input stream. Configure it using properties
// from the output stream configuration above.
LzmaInputStreamSettings inSettings = new LzmaInputStreamSettings().
  setProperties(outSettings.getProperties());
  
// Read the compressed data with a manually configured input stream.
InputStream is = new LzmaInputStream(new FileInputStream(f), inSettings);
try
{
  // Use the EntityFS StreamUtil utility to make our job easier.
  // This will print "Compress me!"
  System.out.println(
    new String(
      StreamUtil.readStreamFully(is, 32)));
}
finally
{
  // This closes the decoder thread.
  is.close();
}


The LzmaWritableFile and LzmaReadableFile objects can transparently compress data written to and decompress data read from a file.

The next example does the same as Example 6.1, “Compressing and decompressing with LZMA”, except that it uses the LzmaReadableFile and LzmaWritableFile classes.

Example 6.3. Compressing and decompressing with LZMA using At4J readable and writable LZMA files

// Data will be compressed to the File f

String toCompress = "Compress me!";

// Wrap the File in a ReadWritableFileAdapter to make it a
// ReadWritableFile
ReadWritableFile fa = new ReadWritableFileAdapter(f);

// Write the data using the EntityFS utility class Files and an
// LzmaWritableFile with its default configuration.
Files.writeText(new LzmaWritableFile(fa), toCompress);

// Read the data, again using Files. The data is read from an unconfigured
// LzmaReadableFile.
// This will print out "Compress me!"
System.out.println(
  Files.readTextFile(
    new LzmaReadableFile(fa)));


Standalone LZMA tools

The Lzma and UnLzma classes have runnable main methods that emulate the behavior of the lzma and unlzma commands. See their API documentation for details on how to use them.