Chapter 9. Zip

Table of Contents

Character encoding in Zip files
Significant Zip features not supported by At4J
Reading Zip archives
Extracting from Zip archives
Creating Zip archives
Adding support for unsupported features
Adding a new compression method
Adding a new external attribute type
Adding a new extra field type
Standalone Zip tools

The Zip file format was originally developed in 1989 by Phil Katz of PKWARE for the PKZIP program. A Zip archive contains file and directory entries, where each file's data is compressed individually. The archive consists of a sequence of Zip entries, each holding the entry's metadata and, for file entries, the file data, followed by a central directory where some of the metadata for each entry is repeated.
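
The layout described above can be observed in the raw bytes of an archive. The following is a minimal sketch that uses the JDK's java.util.zip classes rather than At4J; the class name ZipLayoutSketch and its helper methods are made up for this illustration. It builds a one-entry archive in memory and locates the record signatures defined in the Zip application note.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustration only: uses java.util.zip from the JDK, not At4J.
final class ZipLayoutSketch
{
  // Record signatures as they appear on disk (the little-endian encodings of
  // 0x04034b50, 0x02014b50 and 0x06054b50).
  static final byte[] LOCAL_HEADER = {'P', 'K', 3, 4};
  static final byte[] CENTRAL_HEADER = {'P', 'K', 1, 2};
  static final byte[] END_OF_CENTRAL_DIR = {'P', 'K', 5, 6};

  // Build a small archive with one file entry in memory.
  static byte[] buildArchive()
  {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bos, StandardCharsets.UTF_8))
    {
      zos.putNextEntry(new ZipEntry("d/f"));
      zos.write("Contents of f".getBytes(StandardCharsets.UTF_8));
      zos.closeEntry();
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  // Return the index of the first occurrence of sig at or after from, or -1.
  static int indexOf(byte[] data, byte[] sig, int from)
  {
    outer:
    for (int i = Math.max(from, 0); i <= data.length - sig.length; i++)
    {
      for (int j = 0; j < sig.length; j++)
      {
        if (data[i + j] != sig[j])
        {
          continue outer;
        }
      }
      return i;
    }
    return -1;
  }

  public static void main(String[] args)
  {
    byte[] zip = buildArchive();
    System.out.println("Local header at:      " + indexOf(zip, LOCAL_HEADER, 0));
    System.out.println("Central directory at: " + indexOf(zip, CENTRAL_HEADER, 0));
    System.out.println("End record at:        " + indexOf(zip, END_OF_CENTRAL_DIR, 0));
  }
}
```

The entry's local header comes first, and the central directory and its end record follow after the entry data.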

The Zip specification allows for several different compression methods, even within the same Zip archive. The At4J implementation supports the following:

The Deflated and Stored methods are most common and are widely supported by Zip software.

Each Zip entry has a metadata record associated with it. It contains data such as the entry's absolute location in the archive, its last modification time, its external file attributes and a comment. (See ZipEntry.) The format of the external file attributes is configurable so that it can capture significant attributes from the file system containing the files that were added to the archive. Unix external file attributes, for instance, contain information on the entry's permission mode (the same mode as used by the chmod command), such as 0644 or 0755.
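
To make the permission mode concrete: by the convention used by Info-Zip for Unix entries, the Unix mode occupies the upper 16 bits of the 32-bit external file attributes value. The sketch below packs and unpacks such a value with plain bit operations. The class UnixModeSketch and its methods are hypothetical helpers written for this illustration; they are not part of At4J, whose UnixExternalFileAttributes class should be used in real code.

```java
// Illustration only: plain bit operations, not the At4J API.
final class UnixModeSketch
{
  // File type bits from the Unix st_mode field.
  static final int S_IFREG = 0100000; // regular file
  static final int S_IFDIR = 0040000; // directory

  // Pack a file type and a permission mode into the 32-bit external file
  // attributes value. The Unix mode is stored in the upper 16 bits.
  static long toExternalAttributes(int fileType, int permissions)
  {
    return ((long) (fileType | (permissions & 07777)) << 16) & 0xFFFFFFFFL;
  }

  // Extract the permission bits (such as 0644) from the attributes value.
  static int permissionsOf(long externalAttributes)
  {
    return (int) ((externalAttributes >>> 16) & 07777);
  }

  // Test if the file type bits say that the entry is a directory.
  static boolean isDirectory(long externalAttributes)
  {
    return ((externalAttributes >>> 16) & 0170000) == S_IFDIR;
  }

  public static void main(String[] args)
  {
    long attrs = toExternalAttributes(S_IFREG, 0644);
    System.out.println("Attributes:  0x" + Long.toHexString(attrs));
    System.out.println("Permissions: 0" + Integer.toOctalString(permissionsOf(attrs)));
    System.out.println("Directory:   " + isDirectory(attrs));
  }
}
```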

The entry metadata can be, and often is, extended using extra fields that contain metadata that does not fit into the standard metadata record. This can for instance be timestamps with a higher precision than the timestamps in the standard record.
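
On disk, the extra fields of an entry form one block in which each field consists of a two-byte header ID, a two-byte data size and the field data, with all numbers stored little-endian. The sketch below splits such a block and reads the last modification time from an extended timestamp field (header ID 0x5455). It is an illustration in plain Java under those assumptions; ExtraFieldSketch and its methods are invented names and do not reflect how At4J's ZipEntryExtraField classes are implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Illustration only: parses the raw extra field layout, not the At4J API.
final class ExtraFieldSketch
{
  // Header ID of the extended timestamp extra field ("UT").
  static final int EXTENDED_TIMESTAMP_ID = 0x5455;

  // Split a raw extra field block into a map from header ID to field data.
  static Map<Integer, byte[]> split(byte[] block)
  {
    Map<Integer, byte[]> res = new LinkedHashMap<>();
    ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
    while (buf.remaining() >= 4)
    {
      int id = buf.getShort() & 0xFFFF;
      int size = buf.getShort() & 0xFFFF;
      if (size > buf.remaining())
      {
        break; // Truncated field; stop parsing.
      }
      byte[] data = new byte[size];
      buf.get(data);
      res.put(id, data);
    }
    return res;
  }

  // Read the last modification time (seconds since the Unix epoch) from
  // extended timestamp data. Bit 0 of the flags byte says if it is present.
  static OptionalLong lastModifiedSeconds(byte[] data)
  {
    if (data.length < 5 || (data[0] & 1) == 0)
    {
      return OptionalLong.empty();
    }
    int seconds = ByteBuffer.wrap(data, 1, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    return OptionalLong.of(seconds);
  }

  // A hand-made block with one extended timestamp field whose modification
  // time is 1000000000 seconds after the epoch.
  static byte[] sampleBlock()
  {
    return new byte[] {
      0x55, 0x54,                           // header ID 0x5455
      0x05, 0x00,                           // data size 5
      0x01,                                 // flags: modification time present
      0x00, (byte) 0xCA, (byte) 0x9A, 0x3B  // 1000000000, little-endian
    };
  }

  public static void main(String[] args)
  {
    byte[] data = split(sampleBlock()).get(EXTENDED_TIMESTAMP_ID);
    System.out.println("Last modified: " + lastModifiedSeconds(data));
  }
}
```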

The Zip archive itself can also have a comment. It is often printed by the Zip program when the archive is being unzipped.

The Zip file format is specified in PKWARE's Zip application note and in Info-Zip's Zip application note. See also the Wikipedia article on Zip.

Neither PKWARE's nor Info-Zip's application notes specify which character encoding to use for encoding text metadata. Windows (and DOS) programs use Codepage 437 to encode file paths, and the platform's default charset (Codepage 1252 in Sweden, for instance) for other text metadata such as comments. Unix programs use the platform's default charset (often UTF-8 or ISO-8859-1) for all text data. The Unicode path extra field (UnicodePathExtraField) can be, but seldom is, used to store a UTF-8-encoded version of an entry's path.
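
The practical consequence is that the reader of an archive has to know, or guess, the charset used by the program that wrote it. The sketch below shows what happens when a path is encoded with one charset and decoded with another. It uses only the JDK's charset support; ZipCharsetSketch and its methods are hypothetical names for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: shows why the charset used for Zip metadata matters.
final class ZipCharsetSketch
{
  // "räksmörgås.txt", written with Unicode escapes so that the source file's
  // own encoding does not matter.
  static final String PATH = "r\u00e4ksm\u00f6rg\u00e5s.txt";

  // Encode a path the way an archiver using the charset cs would store it.
  static byte[] encodePath(String path, Charset cs)
  {
    return path.getBytes(cs);
  }

  // Decode stored path bytes the way a reader assuming the charset cs would.
  static String decodePath(byte[] raw, Charset cs)
  {
    return new String(raw, cs);
  }

  public static void main(String[] args)
  {
    byte[] stored = encodePath(PATH, StandardCharsets.ISO_8859_1);
    System.out.println("Right charset: "
      + decodePath(stored, StandardCharsets.ISO_8859_1));
    System.out.println("Wrong charset: "
      + decodePath(stored, StandardCharsets.UTF_8));
  }
}
```

This is why the ZipFile constructor used in the examples in this chapter takes charset arguments.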

The following significant Zip features are not supported:

A Zip archive is read by creating a ZipFile object on the Zip file. The ZipFile object contains a ZipEntry object for each entry in the archive.

Example 9.1. Reading data from a Zip archive

// Read data from the Zip file f

// The UTF-8 charset
Charset utf8 = Charset.forName("utf8");

// Create the Zip file object. Text and file paths in the Zip file are encoded
// in UTF-8
ZipFile zf = new ZipFile(f, utf8, utf8);
try
{
  // Print out the names of the child entries of the directory entry /d
  ZipDirectoryEntry d = (ZipDirectoryEntry) zf.get(new AbsoluteLocation("/d"));
  System.out.println("Contents of /d: " + d.getChildEntries().keySet());

  // Print out the contents of the file /d/f
  ZipFileEntry df = (ZipFileEntry) d.getChildEntries().get("f");
  
  // Use the EntityFS utility class Files to read the text in the file.
  System.out.println(Files.readTextFile(df, utf8));
}
finally
{
  // Close the Zip archive to release all resources associated with it.
  zf.close();
}


External file attributes, compression method metadata and extra fields can be accessed through ZipEntry objects. External file attributes are represented by a ZipExternalFileAttributes-implementing object, compression method metadata by a ZipEntryCompressionMethod object and extra fields by a list of ZipEntryExtraField objects. Each extra field is represented by two objects since it occurs both in the Zip entry's metadata (the local header) and in the central directory at the end of the Zip file. The isInLocalHeader method of a ZipEntryExtraField object can be used to query where the object got its data from: the local header or the central directory.

Example 9.2. Reading metadata from a Zip entry

// Create a Zip archive object for the archive in the file f
// The Zip file metadata is encoded using the UTF-8 charset.
ZipFile zf = new ZipFile(f, Charsets.UTF8, Charsets.UTF8);

// Print out the archive comment
System.out.println(zf.getComment());

// Get the file entry /f1
ZipFileEntry zfe = (ZipFileEntry) zf.get(new AbsoluteLocation("/f1"));

// Print out its comment
System.out.println(zfe.getComment());

// Print out its compression method
System.out.println(zfe.getCompressionMethod().getName());

// Print out its Unix permissions mode
System.out.println(
  ((UnixExternalFileAttributes) zfe.getExternalFileAttributes()).
    getEntityMode());

// Print out the value of the last modification time from the extended timestamp
// extra field from the local file header. Format the data using a
// SimpleDateFormat object.
System.out.println(
  new SimpleDateFormat("yyyyMMdd").format(
    zfe.getExtraField(ExtendedTimestampExtraField.class, true).
      getLastModified()));

// Close the Zip archive to release all resources associated with it.
zf.close();


The ZipFile object uses a ZipFileParser object to parse the contents of the Zip file. It has a few extension points where additional functionality can be plugged in. See the section called “Adding support for unsupported features” below.

Zip entries can be extracted using the ArchiveExtractor. There is no custom extractor for Zip archives.

A Zip archive is created using a ZipBuilder object. It is configured with a ZipBuilderSettings object.

Each added entry is configured with a ZipEntrySettings object. It contains properties for the compression method to use, for the extra fields to add, for the entry comment and for how the external file attributes should be represented. The builder uses the strategy described in the section called “Determining the metadata for an entry” to arrive at the effective settings for each entry.

Below is an example that shows how a Zip archive is built using a ZipBuilder.

Example 9.3. Building a Zip archive

// Build the Zip file "myArchive.zip" in the directory targetDir.
RandomlyAccessibleFile zipFile = Directories.newFile(targetDir, "myArchive.zip");

// Configure the global Zip builder settings.

// Create a factory object for the external attributes metadata
ZipExternalFileAttributesFactory extAttrsFactory =
  new UnixExternalFileAttributesFactory(
    //
    // Set files to be world readable
    UnixEntityMode.forCode(0644),
    //
    // Set directories to be world executable
    UnixEntityMode.forCode(0755));

ZipBuilderSettings settings = new ZipBuilderSettings().
  //
  // Set the default file entry settings.
  setDefaultFileEntrySettings(
    new ZipEntrySettings().
      //
      // Use bzip2 compression for files entries.
      // NOTE: bzip2 is not supported by all Zip implementations!
      setCompressionMethod(BZip2CompressionMethod.INSTANCE).
      //
      // Use the external attributes factory created above
      setExternalFileAttributesFactory(extAttrsFactory).
      //
      // Add an extra field factory for creating the Unicode path extra field
      // that stores the entry's path name encoded in UTF-8.
      addExtraFieldFactory(UnicodePathExtraFieldFactory.INSTANCE)).
  //
  // Set the default directory entry settings.
  setDefaultDirectoryEntrySettings(
    new ZipEntrySettings().
      //
      // Use the external attributes factory created above.
      setExternalFileAttributesFactory(extAttrsFactory).
      //
      // An extra field factory for creating the Unicode path extra field.
      addExtraFieldFactory(UnicodePathExtraFieldFactory.INSTANCE)).
  //
  // Set a Zip file comment.
  setFileComment("This is myArchive.zip's comment.");

// Create the Zip builder
ZipBuilder zb = new ZipBuilder(zipFile, settings);

// Add a global rule that says that all script files (files ending with .sh)
// should be world executable.
zb.addRule(
  new ArchiveEntrySettingsRule<ZipEntrySettings>(
    new ZipEntrySettings().
      //
      // This object only has to contain the difference between the default file
      // settings and the settings for this rule due to the way in which
      // settings are combined.
      setExternalFileAttributesFactory(
        new UnixExternalFileAttributesFactory(
          //
          // Files are world executable.
          UnixEntityMode.forCode(0755),
          //
          // Directories are world executable. (No directories will be matched
          // by the rule's filter, though.)
          UnixEntityMode.forCode(0755))),
    //
    // The filter that determines which entries the rule will be applied to.
    FileETAF.FILTER.and(
      new NameGlobETAF("*.sh"))));

// Add the directory hierarchy under the directory src to the location /source
// in the archive.
zb.addRecursively(src, new AbsoluteLocation("/source"));

// Close the builder to finish writing the archive.
zb.close();
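
The comment in the rule above notes that a rule's settings object only has to contain the difference from the default settings. The idea can be illustrated with a small sketch in plain Java where unset properties are null and fall back to the defaults. Everything here is hypothetical: the Settings class is an invented stand-in for ZipEntrySettings, and the sketch is not At4J's actual combination strategy, which is described in the section called “Determining the metadata for an entry”.

```java
// Illustration only: a hypothetical model of "only the difference" settings.
final class SettingsCombinationSketch
{
  // Invented stand-in for ZipEntrySettings. A null property means "not set".
  static final class Settings
  {
    final String compressionMethod;
    final Integer unixMode;
    final String comment;

    Settings(String compressionMethod, Integer unixMode, String comment)
    {
      this.compressionMethod = compressionMethod;
      this.unixMode = unixMode;
      this.comment = comment;
    }

    // Properties set in override win. Unset properties fall back to the
    // values of this object.
    Settings combineWith(Settings override)
    {
      return new Settings(
        override.compressionMethod != null ? override.compressionMethod : compressionMethod,
        override.unixMode != null ? override.unixMode : unixMode,
        override.comment != null ? override.comment : comment);
    }
  }

  public static void main(String[] args)
  {
    Settings defaults = new Settings("bzip2", 0644, null);
    // The rule for script files only says what differs: the permission mode.
    Settings scriptRule = new Settings(null, 0755, null);

    Settings effective = defaults.combineWith(scriptRule);
    System.out.println("Compression: " + effective.compressionMethod);
    System.out.println("Mode:        0" + Integer.toOctalString(effective.unixMode));
  }
}
```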


The shortcut method setCompressionLevel on the ZipBuilderSettings object can be used for setting the default compression level for files without having to create a new ZipEntryCompressionMethod object.

Example 9.4. Build a Zip archive and set the compression level

// Build the Zip file "myArchive.zip" in the directory targetDir. Use the best
// possible (deflate) compression.
RandomlyAccessibleFile zipFile = Directories.newFile(targetDir, "myArchive.zip");

// Configure the global Zip builder settings.

ZipBuilderSettings settings = new ZipBuilderSettings().
  //
  // Set maximum compression level for the default file compression method
  // (deflate)
  setCompressionLevel(CompressionLevel.BEST);

// Create the Zip builder
ZipBuilder zb = new ZipBuilder(zipFile, settings);

// Add the directory hierarchy under the directory src to the location /source
// in the archive.
zb.addRecursively(src, new AbsoluteLocation("/source"));

// Close the builder to finish writing the archive.
zb.close();
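
To get a feeling for what a compression level changes, the effect can be measured with the JDK's own deflate implementation. The sketch below uses java.util.zip.Deflater directly and does not involve At4J; DeflateLevelSketch and its methods are invented for this illustration, and the mapping between At4J's CompressionLevel values and Deflater levels is not something this sketch claims to know.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration only: uses java.util.zip.Deflater from the JDK, not At4J.
final class DeflateLevelSketch
{
  // Deflate the input at the given level and return the compressed size.
  static int compressedSize(byte[] input, int level)
  {
    Deflater deflater = new Deflater(level);
    try
    {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[4096];
      int total = 0;
      while (!deflater.finished())
      {
        total += deflater.deflate(buf);
      }
      return total;
    }
    finally
    {
      // Release the native resources held by the deflater.
      deflater.end();
    }
  }

  // Some repetitive text that compresses well.
  static byte[] sampleData()
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200; i++)
    {
      sb.append("At4J builds Zip archives. ");
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args)
  {
    byte[] data = sampleData();
    System.out.println("Uncompressed:     " + data.length);
    System.out.println("No compression:   " + compressedSize(data, Deflater.NO_COMPRESSION));
    System.out.println("Best speed:       " + compressedSize(data, Deflater.BEST_SPEED));
    System.out.println("Best compression: " + compressedSize(data, Deflater.BEST_COMPRESSION));
  }
}
```

Higher levels trade compression time for smaller output, so the best level is mostly worthwhile for archives that are built once and read many times.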


It is possible to plug in support for new extra field types, new compression methods and new external attribute types in the ZipFile and ZipBuilder objects.

Feature implementations will have to work with raw, binary data read from and written to Zip files. Implementers will probably find the number types in the org.at4j.support.lang package, and perhaps the utilities in the org.at4j.support.util package, useful.
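
The raw data is mostly made up of unsigned, little-endian numbers, which Java's signed primitive types do not map onto directly. The sketch below shows the kind of conversion involved, written by hand in plain Java. LittleEndianSketch is an invented name; the sketch does not show the API of At4J's own number types, which are the better choice in real feature implementations.

```java
// Illustration only: hand-written conversions, not At4J's number types.
final class LittleEndianSketch
{
  // Read an unsigned 16-bit little-endian number starting at offset off.
  static int readUnsignedShort(byte[] b, int off)
  {
    return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
  }

  // Read an unsigned 32-bit little-endian number starting at offset off. A
  // long is needed since the value may not fit in a signed int.
  static long readUnsignedInt(byte[] b, int off)
  {
    return (b[off] & 0xFFL)
      | ((b[off + 1] & 0xFFL) << 8)
      | ((b[off + 2] & 0xFFL) << 16)
      | ((b[off + 3] & 0xFFL) << 24);
  }

  // Write the 32 least significant bits of v as a little-endian number.
  static void writeUnsignedInt(long v, byte[] b, int off)
  {
    b[off] = (byte) v;
    b[off + 1] = (byte) (v >>> 8);
    b[off + 2] = (byte) (v >>> 16);
    b[off + 3] = (byte) (v >>> 24);
  }

  public static void main(String[] args)
  {
    // The first four bytes of a local file header.
    byte[] header = {0x50, 0x4B, 0x03, 0x04};
    System.out.println("Signature: 0x" + Long.toHexString(readUnsignedInt(header, 0)));
  }
}
```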

This is how to make ZipFile understand a new compression method:

  1. Implement a new ZipEntryCompressionMethod class.
  2. Implement a new ZipEntryCompressionMethodFactory class.
  3. Create a new ZipFileParser instance.
  4. Register the new compression method factory in the Zip file parser's compression method factory registry.
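
The steps above follow a common factory registry pattern: the parser looks up a factory by the compression method code found in the entry's header and lets the factory create the method object. The sketch below models that pattern in plain Java. All types in it are hypothetical stand-ins invented for this illustration; the real signatures of ZipEntryCompressionMethod, ZipEntryCompressionMethodFactory and the parser's registry are found in the At4J API documentation and may well differ.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a hypothetical model of a factory registry. None of
// these types are the real At4J interfaces.
final class MethodRegistrySketch
{
  // Invented stand-in for ZipEntryCompressionMethod.
  interface CompressionMethod
  {
    String getName();
  }

  // Invented stand-in for ZipEntryCompressionMethodFactory.
  interface CompressionMethodFactory
  {
    // The compression method code stored in the Zip entry header.
    int getCode();

    CompressionMethod create();
  }

  // Invented stand-in for the parser's compression method factory registry.
  static final class Registry
  {
    private final Map<Integer, CompressionMethodFactory> factories = new HashMap<>();

    void register(CompressionMethodFactory factory)
    {
      factories.put(factory.getCode(), factory);
    }

    // Find the factory registered for the code and let it create the method.
    CompressionMethod methodFor(int code)
    {
      CompressionMethodFactory factory = factories.get(code);
      if (factory == null)
      {
        throw new UnsupportedOperationException(
          "Unsupported compression method code: " + code);
      }
      return factory.create();
    }
  }

  // Create a trivial factory for a method with the given code and name.
  static CompressionMethodFactory factory(final int code, final String name)
  {
    return new CompressionMethodFactory()
    {
      public int getCode()
      {
        return code;
      }

      public CompressionMethod create()
      {
        return () -> name;
      }
    };
  }

  public static void main(String[] args)
  {
    Registry registry = new Registry();
    // 99 is an arbitrary code picked for this example.
    registry.register(factory(99, "my-method"));
    System.out.println(registry.methodFor(99).getName());
  }
}
```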

To use the new compression method with the ZipBuilder, use it with the ZipEntrySettings objects for the files that should be compressed using the new method, or with the default file settings objects if all files should be compressed using it.

This is how to make ZipFile understand a new external attribute type:

  1. Implement a new ZipExternalFileAttributes class.
  2. Implement a new ZipExternalFileAttributesFactory class.
  3. Create a new ZipFileParser instance.
  4. Register the new external attributes factory in the Zip file parser's external attributes factory registry.

To use the new external attributes object with the ZipBuilder, use the factory with the ZipEntrySettings objects for the entries that should use the new attributes, or with the default file and directory settings objects if all entries should use them.

This is how to make ZipFile understand a new extra field type:

  1. Implement a new ZipEntryExtraField class.
  2. Implement a new ZipEntryExtraFieldParser class.
  3. Create a new ZipFileParser instance.
  4. Register the new extra field parser in the Zip file parser's extra field parser registry.

This is how to add entries using the new extra fields to a ZipBuilder:

  1. Implement a new ZipEntryExtraFieldFactory class.
  2. Use the new extra field factory with the ZipEntrySettings for the entries that should have the new extra fields, or with the default file and directory settings objects if all file and directory entries should have them.

The Zip and Unzip classes emulate the behavior of the zip and unzip commands. See their API documentation for details on how to use them.