Tar is an ancient file format, originally used for making tape backups (Tape ARchives). A Tar file consists of a list of Tar entries. Each entry has a header containing its metadata, followed by its data. The metadata includes, among other things, the entry's Unix permission mode, for instance 0755 or 0644.
There are four significant versions of the Tar file format: Unix V7, Ustar, Gnu Tar and Pax (Posix.1-2001).
Each format is backwards compatible with earlier formats.
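The header-then-data layout described above can be illustrated with a small, self-contained sketch that uses only the JDK. It is not part of At4J; the class and method names are hypothetical. It assumes the Ustar field layout (name at byte 0, mode at byte 100, size at byte 124, type flag at byte 156, magic at byte 257) and ignores the header checksum for brevity.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not an At4J type.
class TarHeaderSketch {
    // Every Tar header occupies one 512-byte block.
    static final int BLOCK_SIZE = 512;

    // Read a text field that is terminated by a NUL byte or by the field end.
    static String readText(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    // Numeric fields are stored as ASCII octal digits, padded with spaces or NULs.
    static long readOctal(byte[] header, int offset, int length) {
        String digits = readText(header, offset, length).trim();
        return digits.isEmpty() ? 0L : Long.parseLong(digits, 8);
    }

    static void writeText(byte[] header, int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, header, offset, bytes.length);
    }

    // Build a minimal header for a 13-byte regular file (checksum left out).
    static byte[] sampleHeader() {
        byte[] header = new byte[BLOCK_SIZE];
        writeText(header, 0, "d/f.txt");        // entry name
        writeText(header, 100, "0000644");      // permission mode, octal
        writeText(header, 124, "00000000015");  // data size, octal 15 = 13 bytes
        header[156] = '0';                      // type flag: regular file
        writeText(header, 257, "ustar");        // magic that marks a Ustar header
        return header;
    }

    public static void main(String[] args) {
        byte[] h = sampleHeader();
        System.out.println("name:  " + readText(h, 0, 100));
        System.out.println("mode:  " + Long.toOctalString(readOctal(h, 100, 8)));
        System.out.println("size:  " + readOctal(h, 124, 12));
        System.out.println("ustar: " + readText(h, 257, 6).equals("ustar"));
    }
}
```

In a real archive, the entry's data follows the header, padded to a multiple of 512 bytes.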
The Tar file format does not support any kind of compression of its entries. However, the Tar file itself is often compressed using gzip or bzip2 compression.
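Because compression is applied to the archive as a whole rather than to individual entries, it can be treated as a plain stream wrapper. The following sketch shows the idea using only the JDK's `java.util.zip` classes. The byte array is a stand-in for a real Tar file, and the class name is hypothetical; At4J's own compressing file wrappers are not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class, not an At4J type.
class WholeArchiveCompressionSketch {
    // Compress the complete archive bytes with gzip.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress gzip data back to the original archive bytes.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for an uncompressed Tar file: twenty empty 512-byte blocks.
        byte[] archive = new byte[20 * 512];
        byte[] packed = gzip(archive);
        System.out.println("uncompressed: " + archive.length + " bytes");
        System.out.println("compressed:   " + packed.length + " bytes");
        System.out.println("round trip ok: "
            + java.util.Arrays.equals(archive, gunzip(packed)));
    }
}
```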
For more information on the Tar file format, see the Wikipedia article on Tar and the Gnu Tar manual.
There is no standard dictating which character encoding to use for a Tar entry's text metadata, such as its path. Unix Tar programs use the platform's default charset (often UTF-8 or ISO-8859-1), while Windows programs often use code page 437. Pax metadata variables are always encoded in UTF-8.
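The effect of guessing the wrong charset can be sketched with the JDK alone. The example below encodes an entry name with one charset and decodes it with another; the class and method names are hypothetical and nothing here is At4J API. The name is written with Unicode escapes so that the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not an At4J type.
class EntryNameCharsetSketch {
    // The name "räksmörgås.txt", written with Unicode escapes.
    static final String NAME = "r\u00e4ksm\u00f6rg\u00e5s.txt";

    // Encode a name the way an archive writer would, then decode it the way
    // a reader that assumes another charset would.
    static String roundTrip(String name, Charset writer, Charset reader) {
        byte[] raw = name.getBytes(writer);
        return new String(raw, reader);
    }

    public static void main(String[] args) {
        // The same name occupies a different number of header bytes.
        System.out.println(NAME.getBytes(StandardCharsets.UTF_8).length);      // 17
        System.out.println(NAME.getBytes(StandardCharsets.ISO_8859_1).length); // 14

        // Reading with the charset that was used for writing preserves the name.
        System.out.println(
            roundTrip(NAME, StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(NAME)); // true

        // Reading with a different charset yields a different, garbled name.
        System.out.println(
            roundTrip(NAME, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                .equals(NAME)); // false
    }
}
```

This is why the examples in this chapter pass an explicit charset when opening or creating an archive.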
The following significant Tar features are not supported:
A Tar archive is read by creating a TarFile object on the Tar file. The TarFile object contains a TarEntry object for each entry in the archive.
Example 8.1. Reading data from a Tar archive

    // Read data from the Tar file f

    // The UTF-8 charset
    Charset utf8 = Charset.forName("utf8");

    // Create the Tar file object. Text in the Tar file is encoded in UTF-8
    TarFile tf = new TarFile(f, utf8);
    try {
      // Print out the names of the child entries of the directory entry /d
      TarDirectoryEntry d = (TarDirectoryEntry) tf.get(new AbsoluteLocation("/d"));
      System.out.println("Contents of /d: " + d.getChildEntries().keySet());

      // Print out the contents of the file /d/f
      TarFileEntry df = (TarFileEntry) d.getChildEntries().get("f");

      // Use the EntityFS utility class Files to read the text in the file.
      System.out.println(Files.readTextFile(df, utf8));
    } finally {
      // Close the Tar archive to release all resources associated with it.
      tf.close();
    }
To access file format version-specific data, the TarEntry objects can be cast to the types representing each Tar file format:
Table 8.1. Tar entry objects
| Format | Base | File entries | Directory entries | Symbolic link entries | 
|---|---|---|---|---|
| Unix V7 | TarEntry | TarFileEntry | TarDirectoryEntry | TarSymbolicLinkEntry | 
| Ustar | UstarEntry | UstarFileEntry | UstarDirectoryEntry | UstarSymbolicLinkEntry | 
| Gnu Tar | UstarEntry | UstarFileEntry | UstarDirectoryEntry | UstarSymbolicLinkEntry | 
| Pax | PaxEntry | PaxFileEntry | PaxDirectoryEntry | PaxSymbolicLinkEntry | 
More sophisticated entry types inherit from their less sophisticated brethren,
for instance PaxFileEntry → UstarFileEntry →
TarFileEntry.
The root directory entry in the TarFile, i.e. the directory
entry with the absolute location / in the
archive, is never present in the Tar archive itself. It is always of the type
TarDirectoryEntry.
The next example shows how a pax variable for an entry in a Posix.1-2001-compatible Tar archive is read:
Example 8.2. Reading a pax variable for an entry

    // Parse the Tar archive in the file f

    // The contents of this Tar archive are encoded with UTF-8. Most of its
    // metadata is stored in Pax variables, which are always encoded in UTF-8, so
    // if we did not know the archive's encoding beforehand, that would probably
    // not matter.
    TarFile tf = new TarFile(f, Charset.forName("utf8"));
    try {
      // The Tar entry for the file räksmörgås.txt (räksmörgås = shrimp sandwich)
      PaxFileEntry fe = (PaxFileEntry) tf.get(
        new AbsoluteLocation("/räksmörgås.txt"));

      // Print out all Pax variable names
      System.out.println("Pax variables: " + fe.getPaxVariables().keySet());

      // Print out the value of the ctime variable (file status change time)
      System.out.println("ctime: " + fe.getPaxVariables().get("ctime"));
    } finally {
      // Close the Tar archive to release its associated resources
      tf.close();
    }
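For background, Pax variables are stored in the archive as extended header records of the form `<length> <keyword>=<value>\n`, where the decimal length counts the bytes of the whole record, including the length digits and the trailing newline. The sketch below parses and formats such records using only the JDK. It is an illustration of the record format, not At4J's implementation, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not an At4J type.
class PaxRecordSketch {
    // Parse a sequence of "<length> <keyword>=<value>\n" records.
    static Map<String, String> parse(byte[] data) {
        Map<String, String> variables = new LinkedHashMap<>();
        int pos = 0;
        while (pos < data.length) {
            // The length field ends at the first space.
            int space = pos;
            while (space < data.length && data[space] != ' ') {
                space++;
            }
            if (space == data.length) {
                throw new IllegalArgumentException("No length field at offset " + pos);
            }
            int length = Integer.parseInt(
                new String(data, pos, space - pos, StandardCharsets.US_ASCII));
            int end = pos + length;
            // A record holds at least the digits, the space and the newline.
            if (length < space - pos + 2 || end > data.length || data[end - 1] != '\n') {
                throw new IllegalArgumentException("Malformed record at offset " + pos);
            }
            // Keyword and value are UTF-8 text between the space and the newline.
            String record =
                new String(data, space + 1, end - space - 2, StandardCharsets.UTF_8);
            int eq = record.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' at offset " + pos);
            }
            variables.put(record.substring(0, eq), record.substring(eq + 1));
            pos = end;
        }
        return variables;
    }

    // Format one record. The length prefix counts its own digits as well.
    static byte[] format(String key, String value) {
        byte[] body = (" " + key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);
        int total = body.length + String.valueOf(body.length).length();
        int digits = String.valueOf(total).length();
        if (digits != String.valueOf(body.length).length()) {
            // Adding the prefix pushed the total to one more digit.
            total = body.length + digits;
        }
        byte[] prefix = String.valueOf(total).getBytes(StandardCharsets.US_ASCII);
        byte[] record = new byte[total];
        System.arraycopy(prefix, 0, record, 0, prefix.length);
        System.arraycopy(body, 0, record, prefix.length, body.length);
        return record;
    }

    public static void main(String[] args) {
        byte[] data = "30 mtime=1350244992.023960108\n19 path=/etc/hosts\n"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(data).keySet()); // [mtime, path]
    }
}
```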
To extract entries from a Tar archive, use the TarExtractor. It extracts entries while parsing the archive, which makes it faster than the more generic ArchiveExtractor. The extraction process can be configured with a TarExtractSpecification object.
Example 8.3. Extracting Java source files from a Tar archive

    // Extract XML and Java source files from the Tar archive in the file f to the
    // target directory d. The archive is compressed using gzip.

    // Use a GZipReadableFile to transparently decompress the file contents.
    ReadableFile decompressedView = new GZipReadableFile(f);
    TarExtractor te = new TarExtractor(decompressedView);

    // Create a custom specification object
    TarExtractSpecification spec = new TarExtractSpecification().
      //
      // Don't overwrite files
      setOverwriteStrategy(DontOverwriteAndLogWarning.INSTANCE).
      //
      // Only extract XML and Java source files.
      // Filter on data found in the Tar entry header. The filters used are from
      // the org.at4j.tar package and they implement EntityFS' ConvenientFilter
      // interface and the marker interface TarEntryHeaderDataFilter. Custom
      // filters are easy to implement.
      //
      // We choose to only extract files. Necessary parent directories will be
      // created automatically.
      //
      // Be sure to get the parentheses right when combining filters!
      setFilter(
        TarFileEntryFilter.FILTER.and(
          new TarEntryNameGlobFilter("*.java").or(
            new TarEntryNameGlobFilter("*.xml")))).
      //
      // The archive is encoded using UTF-8.
      setFileNameCharset(Charset.forName("utf8"));

    // Extract!
    te.extract(d, spec);
There are two different classes for creating Tar archives: TarBuilder and TarStreamBuilder. TarBuilder is a StreamAddCapableArchiveBuilder, but it requires a RandomlyAccessibleFile to write to. TarStreamBuilder is not stream add capable, but it makes do with only a WritableFile to write data to[1].
Both Tar archive builders use a TarEntryStrategy object that determines which Tar file format version the created archive will be compatible with. The available strategies are V7TarEntryStrategy, UstarEntryStrategy, GnuTarEntryStrategy and PaxTarEntryStrategy. The default strategy is the GnuTarEntryStrategy.
The configurable metadata for each added Tar entry is represented by a TarEntrySettings object. The effective metadata for the entry is arrived at using the process described in the section called “Determining the metadata for an entry”.
Below is an example that shows how a Tar archive is built using the TarBuilder.
Example 8.4. Build a Tar archive using the Tar builder

    // Build the Tar file "myArchive.tar" in the directory targetDir.
    RandomlyAccessibleFile tarFile = Directories.newFile(targetDir, "myArchive.tar");

    // Configure global Tar builder settings.
    TarBuilderSettings settings = new TarBuilderSettings().
      //
      // Make files and directories owned by the user rmoore (1234), group bonds
      // (4321).
      //
      // The settings object we create here will be combined with the default
      // settings, which means that we only have to set the properties that
      // we want to change from the default values.
      setDefaultFileEntrySettings(
        new TarEntrySettings().
          setOwnerUid(1234).
          setOwnerUserName("rmoore").
          setOwnerGid(4321).
          setOwnerGroupName("bonds")).
      setDefaultDirectoryEntrySettings(
        new TarEntrySettings().
          setOwnerUid(1234).
          setOwnerUserName("rmoore").
          setOwnerGid(4321).
          setOwnerGroupName("bonds")).
      //
      // Use a Tar entry strategy that will create a Posix.1-2001-compatible
      // archive
      setEntryStrategy(
        // Encode file names using UTF-8
        new PaxTarEntryStrategy(Charset.forName("utf8")));

    // Create the Tar builder
    TarBuilder builder = new TarBuilder(tarFile, settings);

    // Add a global rule that says that script files should be executable.
    builder.addRule(
      new ArchiveEntrySettingsRule<TarEntrySettings>(
        //
        // The global rule's settings
        new TarEntrySettings().
          //
          // The code is an octal value, the same as is used with the chmod
          // command.
          setEntityMode(UnixEntityMode.forCode(0755)),
        //
        // The global rule's filter
        new NameGlobETAF("*.sh")));

    // Add all files and directories from the src directory to the /source
    // directory in the archive
    builder.addRecursively(src, new AbsoluteLocation("/source"));

    // Add the headlines from The Times online to indicate the build date...
    // Open a stream
    InputStream is =
      new URL("http://www.timesonline.co.uk/tol/feeds/rss/topstories.xml").
        openStream();
    try {
      builder.add(is, new AbsoluteLocation("/todays_news.xml"));
    } finally {
      is.close();
    }

    // Close the builder to finish writing the archive.
    builder.close();
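The entity mode codes used in these examples (0755, 0644, 0640) are octal literals, which in Java are written with a leading zero. The sketch below, which uses only the JDK and hypothetical names, decodes such a code into the familiar `rwx` notation. It is meant to show how the bits are laid out, not how At4J's UnixEntityMode is implemented.

```java
// Hypothetical illustration class, not an At4J type.
class UnixModeSketch {
    // Render the nine permission bits of a mode as "rwxrwxrwx" style text:
    // three flags each for the owner, the group and everyone else.
    static String toSymbolic(int mode) {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    // The "other read" permission is the bit with octal value 4.
    static boolean isWorldReadable(int mode) {
        return (mode & 0004) != 0;
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic(0755)); // rwxr-xr-x
        System.out.println(toSymbolic(0644)); // rw-r--r--
        System.out.println(toSymbolic(0640)); // rw-r-----
        System.out.println(isWorldReadable(0640)); // false
    }
}
```

Note that the leading zero matters: the decimal literal 755 is a different number than the octal literal 0755 and describes a different set of permissions.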
The following example shows how a Tar archive is built and compressed using the TarStreamBuilder.
Example 8.5. Build a Tar archive using the Tar stream builder

    // Build the Tar file "myArchive.tar.bz2" in the directory targetDir.
    // Use a BZip2WritableFile to compress the archive while it is created.
    WritableFile tarFile = new BZip2WritableFile(
      Directories.newFile(targetDir, "myArchive.tar.bz2"));

    // Configure global Tar builder settings.
    // Use the default Tar entry strategy (GnuTarEntryStrategy).
    TarBuilderSettings settings = new TarBuilderSettings().
      //
      // Files are not world readable
      setDefaultFileEntrySettings(
        new TarEntrySettings().
          setEntityMode(UnixEntityMode.forCode(0640)));

    // Create the Tar builder
    TarStreamBuilder builder = new TarStreamBuilder(tarFile, settings);

    // Add two files
    builder.add(
      new NamedReadableFileAdapter(
        new CharSequenceReadableFile("The contents of this file are secret!"),
        "secret.txt"),
      AbsoluteLocation.ROOT_DIR);

    builder.add(
      new NamedReadableFileAdapter(
        new CharSequenceReadableFile("The contents of this file are public!"),
        "public.txt"),
      AbsoluteLocation.ROOT_DIR,
      //
      // Use custom settings for this file
      new TarEntrySettings().
        setEntityMode(UnixEntityMode.forCode(0644)));

    // Close the builder to finish the file.
    builder.close();
The Tar class has a runnable main method that emulates
the behavior of the tar command. See its API
documentation for details on how to use it.
[1] This means that a program can give the Tar stream builder a transparently compressing writable file implementation such as GZipWritableFile, BZip2WritableFile or LzmaWritableFile to have the archive compressed while it is created.