The Zip file format was originally developed in 1989 by Phil Katz of PKWARE for the PKZIP program. A Zip archive contains file and directory entries, and each file's data is compressed individually. The archive consists of a sequence of Zip entries, each holding the entry's metadata and, for file entries, the file data, followed by a central directory where some of each entry's metadata is repeated.
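The layout described above can be observed with nothing but the JDK. The following is a minimal sketch that uses java.util.zip (not At4J) to build a one-entry archive in memory and then checks the record signatures defined in the Zip application note: a local header at the start of the archive and a central directory whose offset is recorded in the end-of-central-directory record. The class and method names are made up for this illustration, and the sketch assumes a small archive without an archive comment and without Zip64 records.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipLayoutSketch {
    // Record signatures from the Zip application note, read as little-endian ints.
    static final int LOCAL_HEADER_SIG = 0x04034b50;
    static final int CENTRAL_HEADER_SIG = 0x02014b50;
    static final int END_OF_CENTRAL_DIR_SIG = 0x06054b50;

    // Size of the end-of-central-directory record when there is no archive comment.
    static final int END_RECORD_SIZE = 22;

    /** Builds a tiny one-entry archive in memory using the JDK's Zip support. */
    static byte[] buildArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("Hello, Zip!".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the four-byte, little-endian signature at the given offset. */
    static int signatureAt(byte[] archive, int offset) {
        return ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Returns the central directory offset stored in the end record (assumes no archive comment). */
    static int centralDirectoryOffset(byte[] archive) {
        int endRecord = archive.length - END_RECORD_SIZE;
        if (endRecord < 0 || signatureAt(archive, endRecord) != END_OF_CENTRAL_DIR_SIG) {
            throw new IllegalArgumentException("No end-of-central-directory record at the expected position");
        }
        // The offset field follows the signature, four two-byte counters and the directory size.
        return ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN).getInt(endRecord + 16);
    }

    public static void main(String[] args) {
        byte[] archive = buildArchive();
        int cd = centralDirectoryOffset(archive);
        System.out.println("Local header first: " + (signatureAt(archive, 0) == LOCAL_HEADER_SIG));
        System.out.println("Central directory at offset " + cd + ": "
                + (signatureAt(archive, cd) == CENTRAL_HEADER_SIG));
    }
}
```

A parser that has to cope with archive comments would instead search backwards from the end of the file for the end record signature.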
The Zip specification allows for several different compression methods, even within the same Zip archive. The At4J implementation supports the following:
The Deflated and Stored methods are most common and are widely supported by Zip software.
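To illustrate that one archive can mix compression methods, here is a small sketch that uses the JDK's java.util.zip classes rather than At4J. It writes one Stored and one Deflated entry and reads the method of each entry back. The class and method names are hypothetical. Note that the JDK requires the size and CRC of a Stored entry to be set before the entry is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CompressionMethodSketch {
    /** Writes the payload twice: once uncompressed (Stored) and once Deflated. */
    static byte[] buildArchive(byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry stored = new ZipEntry("stored.txt");
            stored.setMethod(ZipEntry.STORED);
            // Stored entries need their size and checksum up front.
            CRC32 crc = new CRC32();
            crc.update(payload);
            stored.setSize(payload.length);
            stored.setCompressedSize(payload.length);
            stored.setCrc(crc.getValue());
            zos.putNextEntry(stored);
            zos.write(payload);
            zos.closeEntry();

            ZipEntry deflated = new ZipEntry("deflated.txt");
            deflated.setMethod(ZipEntry.DEFLATED);
            zos.putNextEntry(deflated);
            zos.write(payload);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the compression method code of each entry, keyed by entry name. */
    static Map<String, Integer> methods(byte[] archive) {
        Map<String, Integer> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zis.getNextEntry(); e != null; e = zis.getNextEntry()) {
                result.put(e.getName(), e.getMethod());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] payload = "some text, some text, some text".getBytes();
        System.out.println(methods(buildArchive(payload)));
    }
}
```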
Each Zip entry has a metadata record associated with it. It contains data such as the entry's absolute location in the archive, its last modification time, its external file attributes and a comment. (See ZipEntry.) The format of the external file attributes is configurable so that significant attributes can be captured from the file system containing the files that were added to the archive. Unix external file attributes, for instance, contain the entry's permission mode (the same mode that the chmod command uses), such as 0644 or 0755.
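For readers less used to octal permission modes, the following sketch decodes a mode such as 0644 into the familiar rwx notation using the JDK's PosixFilePermission types. It is independent of At4J's UnixEntityMode class, and the class and method names are made up for this illustration.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class UnixModeSketch {
    // Permissions ordered from the most significant mode bit (0400) to the least (0001).
    private static final PosixFilePermission[] BIT_ORDER = {
        PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE, PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE, PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
    };

    /** Converts the nine permission bits of a Unix mode into a permission set. */
    static Set<PosixFilePermission> toPermissions(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BIT_ORDER.length; i++) {
            if ((mode & (0400 >> i)) != 0) {
                perms.add(BIT_ORDER[i]);
            }
        }
        return perms;
    }

    /** Formats a mode the way "ls -l" does, without the file type character. */
    static String describe(int mode) {
        return PosixFilePermissions.toString(toPermissions(mode));
    }

    public static void main(String[] args) {
        System.out.println(describe(0644)); // rw-r--r--
        System.out.println(describe(0755)); // rwxr-xr-x
    }
}
```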
The entry metadata can be, and often is, extended using extra fields that contain metadata that does not fit into the standard metadata record. This can for instance be timestamps with a higher precision than the timestamps in the standard record.
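On disk, the extra fields of an entry form a sequence of blocks, each consisting of a two-byte header ID, a two-byte data size and the data itself, all little-endian. The sketch below parses such a sequence with plain JDK classes and decodes the modification time from an Info-Zip extended timestamp field (header ID 0x5455). It does not use At4J's ZipEntryExtraField classes. The class and method names, the sample block and the second field's ID 0xFFEE are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtraFieldSketch {
    static final int EXTENDED_TIMESTAMP_ID = 0x5455;

    /** Splits a raw extra field area into (header ID -> data) pairs. */
    static Map<Integer, byte[]> parse(byte[] extra) {
        Map<Integer, byte[]> fields = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int headerId = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                throw new IllegalArgumentException("Truncated extra field " + Integer.toHexString(headerId));
            }
            byte[] data = new byte[size];
            buf.get(data);
            fields.put(headerId, data);
        }
        return fields;
    }

    /** Returns the modification time (seconds since the epoch) from extended timestamp data. */
    static long extendedTimestampModTime(byte[] data) {
        // Bit 0 of the flags byte says that a modification time follows.
        if (data.length < 5 || (data[0] & 1) == 0) {
            throw new IllegalArgumentException("No modification time present");
        }
        return ByteBuffer.wrap(data, 1, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** A hand-made sample: an extended timestamp field followed by a hypothetical field 0xFFEE. */
    static byte[] sampleBlock() {
        ByteBuffer buf = ByteBuffer.allocate(15).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) EXTENDED_TIMESTAMP_ID).putShort((short) 5)
           .put((byte) 1).putInt(1_000_000_000);
        buf.putShort((short) 0xFFEE).putShort((short) 2).put((byte) 7).put((byte) 9);
        return buf.array();
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> fields = parse(sampleBlock());
        System.out.println("Fields found: " + fields.size());
        System.out.println("Modified: " + extendedTimestampModTime(fields.get(EXTENDED_TIMESTAMP_ID)));
    }
}
```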
The Zip archive itself can also have a comment. It is often printed by the Zip program when the archive is being unzipped.
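As a point of comparison, the archive comment can also be written and read with the JDK's own Zip classes. The sketch below is not At4J code. It writes a comment to a temporary archive and reads it back. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ArchiveCommentSketch {
    /** Writes an archive with the given comment to a temporary file and reads the comment back. */
    static String roundTrip(String comment) {
        try {
            Path tmp = Files.createTempFile("comment-sketch", ".zip");
            try {
                try (ZipOutputStream zos =
                         new ZipOutputStream(Files.newOutputStream(tmp), StandardCharsets.UTF_8)) {
                    zos.setComment(comment);
                    // An archive needs at least one entry to be written by the JDK.
                    zos.putNextEntry(new ZipEntry("empty.txt"));
                    zos.closeEntry();
                }
                try (ZipFile zf = new ZipFile(tmp.toFile(), StandardCharsets.UTF_8)) {
                    return zf.getComment();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("This is myArchive.zip's comment."));
    }
}
```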
The Zip file format is specified in PKWARE's Zip application note and in Info-Zip's Zip application note. See also the Wikipedia article on Zip.
Neither PKWARE's nor Info-Zip's application notes specify which character encoding to use for text metadata. Windows (and DOS) programs use Codepage 437 to encode file paths, and the platform's default charset (Codepage 1252 in Sweden, for instance) for other text metadata such as comments. Unix programs use the platform's default charset (often UTF-8 or ISO-8859-1) for all text data. The Unicode path extra field (UnicodePathExtraField) can be, but seldom is, used to store a UTF-8-encoded version of an entry's path.
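The practical consequence is that a reader must know, or guess, the charset used by the program that wrote the archive. The sketch below shows what happens when a path that was encoded as UTF-8 is decoded as ISO-8859-1. It uses only the JDK's standard charsets. The class name and the sample file name are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PathEncodingSketch {
    // "räksmörgås.txt", written with Unicode escapes so that the source file stays ASCII.
    static final String ORIGINAL = "r\u00e4ksm\u00f6rg\u00e5s.txt";

    /** The bytes an archiver using UTF-8 would store for the path. */
    static byte[] storedBytes() {
        return ORIGINAL.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes stored path bytes using the charset that the reader assumes. */
    static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    public static void main(String[] args) {
        byte[] raw = storedBytes();
        System.out.println("Decoded as UTF-8:      " + decode(raw, StandardCharsets.UTF_8));
        System.out.println("Decoded as ISO-8859-1: " + decode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding with the wrong charset turns each of the three non-ASCII letters into two unrelated characters, so the garbled path is three characters longer than the original.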
The following significant Zip features are not supported:
A Zip archive is read by creating a ZipFile object on the Zip file. The ZipFile object contains a ZipEntry object for each entry in the archive.
Example 9.1. Reading data from a Zip archive
// Read data from the Zip file f

// The UTF-8 charset
Charset utf8 = Charset.forName("utf8");

// Create the Zip file object. Text and file paths in the Zip file are encoded
// in UTF-8
ZipFile zf = new ZipFile(f, utf8, utf8);
try {
    // Print out the names of the child entries of the directory entry /d
    ZipDirectoryEntry d = (ZipDirectoryEntry) zf.get(new AbsoluteLocation("/d"));
    System.out.println("Contents of /d: " + d.getChildEntries().keySet());

    // Print out the contents of the file /d/f
    ZipFileEntry df = (ZipFileEntry) d.getChildEntries().get("f");

    // Use the EntityFS utility class Files to read the text in the file.
    System.out.println(Files.readTextFile(df, utf8));
} finally {
    // Close the Zip archive to release all resources associated with it.
    zf.close();
}
External file attributes, compression method metadata and extra fields can be accessed through ZipEntry objects. External file attributes are represented by a ZipExternalFileAttributes-implementing object, compression method metadata by a ZipEntryCompressionMethod object and extra fields by a list of ZipEntryExtraField objects. Each extra field is represented by two objects since it occurs both in the Zip entry's metadata (the local header) and in the central directory at the end of the Zip file. The isInLocalHeader method of a ZipEntryExtraField object can be used to query where the object got its data from: the local header or the central directory.
Example 9.2. Reading metadata from a Zip entry
// Create a Zip archive object for the archive in the file f
// The Zip file metadata is encoded using the UTF-8 charset.
ZipFile zf = new ZipFile(f, Charsets.UTF8, Charsets.UTF8);

// Print out the archive comment
System.out.println(zf.getComment());

// Get the file entry /f1
ZipFileEntry zfe = (ZipFileEntry) zf.get(new AbsoluteLocation("/f1"));

// Print out its comment
System.out.println(zfe.getComment());

// Print out its compression method
System.out.println(zfe.getCompressionMethod().getName());

// Print out its Unix permissions mode
System.out.println(
    ((UnixExternalFileAttributes) zfe.getExternalFileAttributes()).
        getEntityMode());

// Print out the value of the last modification time from the extended timestamp
// extra field from the local file header. Format the data using a
// SimpleDateFormat object.
System.out.println(
    new SimpleDateFormat("yyyyMMdd").format(
        zfe.getExtraField(ExtendedTimestampExtraField.class, true).
            getLastModified()));
The ZipFile object uses a ZipFileParser object to parse the contents of the Zip file. It has a few extension points where additional functionality can be plugged in. See the section called “Adding support for unsupported features” below.
Zip entries can be extracted using the ArchiveExtractor. There is no custom extractor for Zip archives.
A Zip archive is created using a ZipBuilder object. It is configured with a ZipBuilderSettings object.
Each added entry is configured with a ZipEntrySettings object. It contains properties for the compression method to use, for the extra fields to add, for the entry comment and for how the external file attributes should be represented. The builder uses the strategy described in the section called “Determining the metadata for an entry” to arrive at the effective settings for each entry.
Below is an example that shows how a Zip archive is built using a ZipBuilder.
Example 9.3. Building a Zip archive
// Build the Zip file "myArchive.zip" in the directory targetDir.
RandomlyAccessibleFile zipFile = Directories.newFile(targetDir, "myArchive.zip");

// Configure the global Zip builder settings.

// Create a factory object for the external attributes metadata
ZipExternalFileAttributesFactory extAttrsFactory =
    new UnixExternalFileAttributesFactory(
        // Set files to be world readable
        UnixEntityMode.forCode(0644),
        // Set directories to be world executable
        UnixEntityMode.forCode(0755));

ZipBuilderSettings settings = new ZipBuilderSettings().
    // Set the default file entry settings.
    setDefaultFileEntrySettings(
        new ZipEntrySettings().
            // Use bzip2 compression for file entries.
            // NOTE: bzip2 is not supported by all Zip implementations!
            setCompressionMethod(BZip2CompressionMethod.INSTANCE).
            // Use the external attributes factory created above
            setExternalFileAttributesFactory(extAttrsFactory).
            // Add an extra field factory for creating the Unicode path extra field
            // that stores the entry's path name encoded in UTF-8.
            addExtraFieldFactory(UnicodePathExtraFieldFactory.INSTANCE)).
    // Set the default directory entry settings.
    setDefaultDirectoryEntrySettings(
        new ZipEntrySettings().
            // Use the external attributes factory created above.
            setExternalFileAttributesFactory(extAttrsFactory).
            // An extra field factory for creating the Unicode path extra field.
            addExtraFieldFactory(UnicodePathExtraFieldFactory.INSTANCE)).
    // Set a Zip file comment.
    setFileComment("This is myArchive.zip's comment.");

// Create the Zip builder
ZipBuilder zb = new ZipBuilder(zipFile, settings);

// Add a global rule that says that all script files (files ending with .sh)
// should be world executable.
zb.addRule(
    new ArchiveEntrySettingsRule<ZipEntrySettings>(
        new ZipEntrySettings().
            // This object only has to contain the difference between the default file
            // settings and the settings for this rule due to the way in which
            // settings are combined.
            setExternalFileAttributesFactory(
                new UnixExternalFileAttributesFactory(
                    // Files are world executable.
                    UnixEntityMode.forCode(0755),
                    // Directories are world executable. (No directories will be matched
                    // by the rule's filter, though.)
                    UnixEntityMode.forCode(0755))),
        // The filter that determines which entries the rule will be applied to.
        FileETAF.FILTER.and(
            new NameGlobETAF("*.sh"))));

// Add the directory hierarchy under the directory src to the location /source
// in the archive.
zb.addRecursively(src, new AbsoluteLocation("/source"));

// Close the builder to finish writing the archive.
zb.close();
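The rule in Example 9.3 only sets the external file attributes factory. Everything else is taken from the default file settings. One way to picture this kind of combination is a settings object whose unset properties are null, overlaid on top of the defaults. The sketch below illustrates that idea with a hypothetical Settings class. It is not At4J's implementation, and the actual strategy is the one described in the section called “Determining the metadata for an entry”.

```java
public class SettingsCombinationSketch {
    /** A hypothetical, immutable settings object where null means "not set". */
    static final class Settings {
        final String compressionMethod;
        final Integer unixMode;

        Settings(String compressionMethod, Integer unixMode) {
            this.compressionMethod = compressionMethod;
            this.unixMode = unixMode;
        }

        /** Returns new settings where every property set in the rule replaces this object's value. */
        Settings overriddenBy(Settings rule) {
            return new Settings(
                rule.compressionMethod != null ? rule.compressionMethod : compressionMethod,
                rule.unixMode != null ? rule.unixMode : unixMode);
        }

        @Override
        public String toString() {
            return compressionMethod + ", mode " + Integer.toOctalString(unixMode);
        }
    }

    public static void main(String[] args) {
        Settings defaults = new Settings("bzip2", 0644);
        // The rule for script files only states what differs from the defaults.
        Settings scriptRule = new Settings(null, 0755);
        System.out.println(defaults.overriddenBy(scriptRule)); // bzip2, mode 755
    }
}
```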
The shortcut method setCompressionLevel on the ZipBuilderSettings object can be used for setting the default compression level for files without having to create a new ZipEntryCompressionMethod object.
Example 9.4. Build a Zip archive and set the compression level
// Build the Zip file "myArchive.zip" in the directory targetDir. Use the best
// possible (deflate) compression.
RandomlyAccessibleFile zipFile = Directories.newFile(targetDir, "myArchive.zip");

// Configure the global Zip builder settings.
ZipBuilderSettings settings = new ZipBuilderSettings().
    // Set maximum compression level for the default file compression method
    // (deflate)
    setCompressionLevel(CompressionLevel.BEST);

// Create the Zip builder
ZipBuilder zb = new ZipBuilder(zipFile, settings);

// Add the directory hierarchy under the directory src to the location /source
// in the archive.
zb.addRecursively(src, new AbsoluteLocation("/source"));

// Close the builder to finish writing the archive.
zb.close();
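To get a feeling for what a deflate compression level does, the JDK's java.util.zip.Deflater can be used on its own. The sketch below compresses the same data with the fastest and with the best level and reports the sizes. It does not use At4J, and how At4J's CompressionLevel values map to deflate levels is not shown here. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionLevelSketch {
    /** Compresses the input with the given deflate level and returns the compressed bytes. */
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            // Release the native resources held by the deflater.
            deflater.end();
        }
    }

    /** Builds some repetitive sample text that compresses well. */
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("Line ").append(i).append(" of a fairly repetitive text file.\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        System.out.println("Original:   " + data.length + " bytes");
        System.out.println("BEST_SPEED: " + deflate(data, Deflater.BEST_SPEED).length + " bytes");
        System.out.println("BEST:       " + deflate(data, Deflater.BEST_COMPRESSION).length + " bytes");
    }
}
```

Higher levels trade CPU time for smaller output, so the best level is mostly worth it for archives that are written once and read many times.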
It is possible to plug in support for new extra field types, new compression methods and new external attribute types in the ZipFile and ZipBuilder objects.
Feature implementations will have to work with raw, binary data read from
and written to Zip files. They will probably find the number types in the
org.at4j.support.lang package and perhaps the
utilities in the org.at4j.support.util package
useful.
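The Zip format stores multi-byte numbers as unsigned, little-endian values, which Java's signed primitive types do not represent directly. The sketch below shows the kind of conversion that a feature implementation has to do, using only the JDK. It does not use the number types from the org.at4j.support.lang package, and the class and method names are made up for this illustration.

```java
public class ZipNumbersSketch {
    /** Reads an unsigned, little-endian 16-bit value. */
    static int readUnsignedShort(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    /** Reads an unsigned, little-endian 32-bit value into a long so that it cannot go negative. */
    static long readUnsignedInt(byte[] data, int offset) {
        return (data[offset] & 0xFFL)
            | ((data[offset + 1] & 0xFFL) << 8)
            | ((data[offset + 2] & 0xFFL) << 16)
            | ((data[offset + 3] & 0xFFL) << 24);
    }

    /** Encodes an unsigned 32-bit value as four little-endian bytes. */
    static byte[] writeUnsignedInt(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("Not an unsigned 32-bit value: " + value);
        }
        return new byte[] {
            (byte) value, (byte) (value >>> 8), (byte) (value >>> 16), (byte) (value >>> 24)
        };
    }

    public static void main(String[] args) {
        // The local file header signature "PK\3\4" as it appears in a Zip file.
        byte[] signature = { 0x50, 0x4b, 0x03, 0x04 };
        System.out.println(Long.toHexString(readUnsignedInt(signature, 0))); // 4034b50
    }
}
```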
This is how to make ZipFile understand a new compression method:
To use the new compression method with the ZipBuilder, use it with the ZipEntrySettings objects for the files that should be compressed using the new method, or with the default file settings objects if all files should be compressed using it.
This is how to make ZipFile understand a new external attribute type:
To use the new external attributes object with the ZipBuilder, use the factory with the ZipEntrySettings objects for the entries that should use the new attributes, or with the default file and directory settings objects if all entries should use them.
This is how to make ZipFile understand a new extra field type:
This is how to add entries using the new extra fields to a ZipBuilder: