source: http://www.pkware.com/documents/casestudies/APPNOTE.TXT␊ |
␊ |
File: APPNOTE.TXT - .ZIP File Format Specification␊ |
Version: 6.3.1 ␊ |
Revised: April 11, 2007␊ |
Copyright (c) 1989 - 2007 PKWARE Inc., All Rights Reserved.␊ |
␊ |
The use of certain technological aspects disclosed in the current␊ |
APPNOTE is available pursuant to the below section entitled␊ |
"Incorporating PKWARE Proprietary Technology into Your Product".␊ |
␊ |
I. Purpose␊ |
----------␊ |
␊ |
This specification is intended to define a cross-platform,␊ |
interoperable file storage and transfer format. Since its ␊ |
first publication in 1989, PKWARE has remained committed to ␊ |
ensuring the interoperability of the .ZIP file format through ␊ |
publication and maintenance of this specification. We trust that ␊ |
all .ZIP compatible vendors and application developers that have ␊ |
adopted and benefited from this format will share and support ␊ |
this commitment to interoperability.␊ |
␊ |
II. Contacting PKWARE␊ |
---------------------␊ |
␊ |
PKWARE, Inc.␊ |
648 N. Plankinton Avenue, Suite 220␊ |
Milwaukee, WI 53203␊ |
+1-414-289-9788␊ |
+1-414-289-9789 FAX␊ |
zipformat@pkware.com␊ |
␊ |
III. Disclaimer␊ |
---------------␊ |
␊ |
Although PKWARE will attempt to supply current and accurate␊ |
information relating to its file formats, algorithms, and the␊ |
subject programs, the possibility of error or omission cannot ␊ |
be eliminated. PKWARE therefore expressly disclaims any warranty ␊ |
that the information contained in the associated materials relating ␊ |
to the subject programs and/or the format of the files created or␊ |
accessed by the subject programs and/or the algorithms used by␊ |
the subject programs, or any other matter, is current, correct or␊ |
accurate as delivered. Any risk of damage due to any possible␊ |
inaccurate information is assumed by the user of the information.␊ |
Furthermore, the information relating to the subject programs␊ |
and/or the file formats created or accessed by the subject␊ |
programs and/or the algorithms used by the subject programs is␊ |
subject to change without notice.␊ |
␊ |
If the version of this file is marked as a NOTIFICATION OF CHANGE,␊ |
the content defines an Early Feature Specification (EFS) change ␊ |
to the .ZIP file format that may be subject to modification prior ␊ |
to publication of the Final Feature Specification (FFS). This␊ |
document may also contain information on Planned Feature ␊ |
Specifications (PFS) defining recognized future extensions.␊ |
␊ |
IV. Change Log␊ |
--------------␊ |
␊ |
Version       Change Description                         Date
-------       ------------------                         ----------
5.2           -Single Password Symmetric Encryption      06/02/2003
               storage

6.1.0         -Smartcard compatibility                   01/20/2004
              -Documentation on certificate storage

6.2.0         -Introduction of Central Directory         04/26/2004
               Encryption for encrypting metadata
              -Added OS/X to Version Made By values

6.2.1         -Added Extra Field placeholder for         04/01/2005
               POSZIP using ID 0x4690

              -Clarified size field on
               "zip64 end of central directory record"

6.2.2         -Documented Final Feature Specification    01/06/2006
               for Strong Encryption

              -Clarifications and typographical
               corrections

6.3.0         -Added tape positioning storage            09/29/2006
               parameters

              -Expanded list of supported hash algorithms

              -Expanded list of supported compression
               algorithms

              -Expanded list of supported encryption
               algorithms

              -Added option for Unicode filename
               storage

              -Clarifications for consistent use
               of Data Descriptor records

              -Added additional "Extra Field"
               definitions

6.3.1         -Corrected standard hash values for        04/11/2007
               SHA-256/384/512
␊ |
␊ |
V. General Format of a .ZIP file␊ |
--------------------------------␊ |
␊ |
Files are stored in arbitrary order. Large .ZIP files can span multiple
volumes or be split into user-defined segment sizes. All values␊ |
are stored in little-endian byte order unless otherwise specified. ␊ |
␊ |
Overall .ZIP file format:␊ |
␊ |
[local file header 1]␊ |
[file data 1]␊ |
[data descriptor 1]␊ |
. ␊ |
.␊ |
.␊ |
[local file header n]␊ |
[file data n]␊ |
[data descriptor n]␊ |
[archive decryption header] ␊ |
[archive extra data record] ␊ |
[central directory]␊ |
[zip64 end of central directory record]␊ |
[zip64 end of central directory locator] ␊ |
[end of central directory record]␊ |
␊ |
␊ |
A. Local file header:␊ |
␊ |
local file header signature 4 bytes (0x04034b50)␊ |
version needed to extract 2 bytes␊ |
general purpose bit flag 2 bytes␊ |
compression method 2 bytes␊ |
last mod file time 2 bytes␊ |
last mod file date 2 bytes␊ |
crc-32 4 bytes␊ |
compressed size 4 bytes␊ |
uncompressed size 4 bytes␊ |
file name length 2 bytes␊ |
extra field length 2 bytes␊ |
␊ |
file name (variable size)␊ |
extra field (variable size)␊ |
␊ |
B. File data␊ |
␊ |
Immediately following the local header for a file␊ |
is the compressed or stored data for the file. ␊ |
The series of [local file header][file data][data␊ |
descriptor] repeats for each file in the .ZIP archive. ␊ |
␊ |
C. Data descriptor:␊ |
␊ |
crc-32 4 bytes␊ |
compressed size 4 bytes␊ |
uncompressed size 4 bytes␊ |
␊ |
This descriptor exists only if bit 3 of the general␊ |
purpose bit flag is set (see below). It is byte aligned␊ |
and immediately follows the last byte of compressed data.␊ |
This descriptor is used only when it was not possible to␊ |
seek in the output .ZIP file, e.g., when the output .ZIP file␊ |
was standard output or a non-seekable device. For ZIP64(tm) format␊ |
archives, the compressed and uncompressed sizes are 8 bytes each.␊ |
␊ |
When compressing files, compressed and uncompressed sizes
should be stored in ZIP64 format (as 8 byte values) when a
file's size exceeds 0xFFFFFFFF. However, ZIP64 format may be
used regardless of the size of a file. When extracting, if
the zip64 extended information extra field is present for
the file, the compressed and uncompressed sizes will be 8
byte values.
␊ |
Although not originally assigned a signature, the value ␊ |
0x08074b50 has commonly been adopted as a signature value ␊ |
for the data descriptor record. Implementers should be ␊ |
aware that ZIP files may be encountered with or without this ␊ |
signature marking data descriptors and should account for␊ |
either case when reading ZIP files to ensure compatibility.␊ |
When writing ZIP files, it is recommended to include the␊ |
signature value marking the data descriptor record. When␊ |
the signature is used, the fields currently defined for␊ |
the data descriptor record will immediately follow the␊ |
signature.␊ |
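A reader that must accept data descriptors both with and without the optional signature could look like the following Python sketch. The function name is hypothetical, and the caller is assumed to already know (for example from a zip64 extended information extra field) whether the sizes are 8 bytes.

```python
import struct

DD_SIGNATURE = 0x08074B50

def parse_data_descriptor(buf: bytes, offset: int, zip64: bool = False):
    """Return (crc32, compressed_size, uncompressed_size, next_offset).

    Accepts descriptors with or without the optional 0x08074b50 signature.
    Caveat: an unsigned descriptor whose CRC-32 happens to equal 0x08074b50
    is indistinguishable here; a real reader could cross-check the values
    against the central directory.
    """
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == DD_SIGNATURE:
        offset += 4  # skip the signature; the fields follow immediately
    size_fmt = "Q" if zip64 else "I"  # 8-byte sizes for ZIP64 archives
    fmt = "<I" + size_fmt * 2
    crc, csize, usize = struct.unpack_from(fmt, buf, offset)
    return crc, csize, usize, offset + struct.calcsize(fmt)
```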
␊ |
An extensible data descriptor will be released in a future␊ |
version of this APPNOTE. This new record is intended to␊ |
resolve conflicts with the use of this record going forward,␊ |
and to provide better support for streamed file processing.␊ |
␊ |
When the Central Directory Encryption method is used, the data␊ |
descriptor record is not required, but may be used. If present,␊ |
and bit 3 of the general purpose bit field is set to indicate␊ |
its presence, the values in fields of the data descriptor␊ |
record should be set to binary zeros.␊ |
␊ |
D. Archive decryption header: ␊ |
␊ |
The Archive Decryption Header is introduced in version 6.2␊ |
of the ZIP format specification. This record exists in support␊ |
of the Central Directory Encryption Feature implemented as part of ␊ |
the Strong Encryption Specification as described in this document.␊ |
When the Central Directory Structure is encrypted, this decryption␊ |
header will precede the encrypted data segment. The encrypted␊ |
data segment will consist of the Archive extra data record (if␊ |
present) and the encrypted Central Directory Structure data.␊ |
The format of this data record is identical to the Decryption␊ |
header record preceding compressed file data. If the central ␊ |
directory structure is encrypted, the location of the start of␊ |
this data record is determined using the Start of Central Directory␊ |
field in the Zip64 End of Central Directory record. Refer to the ␊ |
section on the Strong Encryption Specification for information␊ |
on the fields used in the Archive Decryption Header record.␊ |
␊ |
␊ |
E. Archive extra data record: ␊ |
␊ |
archive extra data signature 4 bytes (0x08064b50)␊ |
extra field length 4 bytes␊ |
extra field data (variable size)␊ |
␊ |
The Archive Extra Data Record is introduced in version 6.2␊ |
of the ZIP format specification. This record exists in support␊ |
of the Central Directory Encryption Feature implemented as part of ␊ |
the Strong Encryption Specification as described in this document.␊ |
When present, this record immediately precedes the central ␊ |
directory data structure. The size of this data record will be␊ |
included in the Size of the Central Directory field in the␊ |
End of Central Directory record. If the central directory structure␊ |
is compressed, but not encrypted, the location of the start of␊ |
this data record is determined using the Start of Central Directory␊ |
field in the Zip64 End of Central Directory record. ␊ |
␊ |
␊ |
F. Central directory structure:␊ |
␊ |
[file header 1]␊ |
.␊ |
.␊ |
. ␊ |
[file header n]␊ |
[digital signature] ␊ |
␊ |
File header:␊ |
␊ |
central file header signature 4 bytes (0x02014b50)␊ |
version made by 2 bytes␊ |
version needed to extract 2 bytes␊ |
general purpose bit flag 2 bytes␊ |
compression method 2 bytes␊ |
last mod file time 2 bytes␊ |
last mod file date 2 bytes␊ |
crc-32 4 bytes␊ |
compressed size 4 bytes␊ |
uncompressed size 4 bytes␊ |
file name length 2 bytes␊ |
extra field length 2 bytes␊ |
file comment length 2 bytes␊ |
disk number start 2 bytes␊ |
internal file attributes 2 bytes␊ |
external file attributes 4 bytes␊ |
relative offset of local header 4 bytes␊ |
␊ |
file name (variable size)␊ |
extra field (variable size)␊ |
file comment (variable size)␊ |
␊ |
Digital signature:␊ |
␊ |
header signature 4 bytes (0x05054b50)␊ |
size of data 2 bytes␊ |
signature data (variable size)␊ |
␊ |
With the introduction of the Central Directory Encryption ␊ |
feature in version 6.2 of this specification, the Central ␊ |
Directory Structure may be stored both compressed and encrypted. ␊ |
Although not required, it is assumed when encrypting the␊ |
Central Directory Structure, that it will be compressed␊ |
for greater storage efficiency. Information on the␊ |
Central Directory Encryption feature can be found in the section␊ |
describing the Strong Encryption Specification. The Digital ␊ |
Signature record will be neither compressed nor encrypted.␊ |
␊ |
G. Zip64 end of central directory record␊ |
␊ |
zip64 end of central dir ␊ |
signature 4 bytes (0x06064b50)␊ |
size of zip64 end of central␊ |
directory record 8 bytes␊ |
version made by 2 bytes␊ |
version needed to extract 2 bytes␊ |
number of this disk 4 bytes␊ |
number of the disk with the ␊ |
start of the central directory 4 bytes␊ |
total number of entries in the␊ |
central directory on this disk 8 bytes␊ |
total number of entries in the␊ |
central directory 8 bytes␊ |
size of the central directory 8 bytes␊ |
offset of start of central␊ |
directory with respect to␊ |
the starting disk number 8 bytes␊ |
zip64 extensible data sector (variable size)␊ |
␊ |
The value stored into the "size of zip64 end of central␊ |
directory record" should be the size of the remaining␊ |
record and should not include the leading 12 bytes.␊ |
␊ |
Size = SizeOfFixedFields + SizeOfVariableData - 12.␊ |
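Because the 12 excluded bytes are exactly the signature (4 bytes) and the size field itself (8 bytes), the stored value equals the length of everything that follows the size field. A minimal Python builder for a Version 1 record, with a hypothetical function name, makes the arithmetic concrete: with no extensible data the fixed fields total 56 bytes, so the stored size is 44.

```python
import struct

ZIP64_EOCD_SIGNATURE = 0x06064B50
# Fields after the size field: version made by, version needed, this disk,
# disk with CD start, entries on this disk, total entries, CD size, CD offset.
_BODY = struct.Struct("<HHIIQQQQ")  # 44 bytes

def build_zip64_eocd(made_by, needed, disk, cd_disk, entries_disk,
                     entries_total, cd_size, cd_offset, extensible=b""):
    """Build a Version 1 zip64 end of central directory record."""
    body = _BODY.pack(made_by, needed, disk, cd_disk, entries_disk,
                      entries_total, cd_size, cd_offset) + extensible
    # The stored size excludes the 4-byte signature and the 8-byte size
    # field (the "leading 12 bytes"), so it is just len(body).
    return struct.pack("<IQ", ZIP64_EOCD_SIGNATURE, len(body)) + body
```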
␊ |
The above record structure defines Version 1 of the ␊ |
zip64 end of central directory record. Version 1 was ␊ |
implemented in versions of this specification preceding ␊ |
6.2 in support of the ZIP64 large file feature. The ␊ |
introduction of the Central Directory Encryption feature ␊ |
implemented in version 6.2 as part of the Strong Encryption ␊ |
Specification defines Version 2 of this record structure. ␊ |
Refer to the section describing the Strong Encryption ␊ |
Specification for details on the version 2 format for ␊ |
this record.␊ |
␊ |
Special purpose data may reside in the zip64 extensible data␊ |
sector field following either a V1 or V2 version of this␊ |
record. To ensure identification of this special purpose data␊ |
it must include an identifying header block consisting of the␊ |
following:␊ |
␊ |
Header ID - 2 bytes␊ |
Data Size - 4 bytes␊ |
␊ |
The Header ID field indicates the type of data that is in the ␊ |
data block that follows.␊ |
␊ |
Data Size identifies the number of bytes that follow for this␊ |
data block type.␊ |
␊ |
Multiple special purpose data blocks may be present, but each␊ |
must be preceded by a Header ID and Data Size field. Current␊ |
mappings of Header ID values supported in this field are as␊ |
defined in APPENDIX C.␊ |
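Walking the extensible data sector amounts to reading repeated 6-byte headers (a 2-byte Header ID followed by a 4-byte Data Size). A Python sketch under that reading follows; the function name is hypothetical, and the code does not interpret any particular Header ID.

```python
import struct

def iter_zip64_extensible_data(sector: bytes):
    """Yield (header_id, data) for each special purpose data block."""
    pos = 0
    while pos < len(sector):
        if pos + 6 > len(sector):
            raise ValueError("truncated block header")
        # 2-byte Header ID, then 4-byte Data Size (little-endian)
        header_id, size = struct.unpack_from("<HI", sector, pos)
        pos += 6
        if pos + size > len(sector):
            raise ValueError("block data runs past end of sector")
        yield header_id, sector[pos:pos + size]
        pos += size
```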
␊ |
H. Zip64 end of central directory locator␊ |
␊ |
zip64 end of central dir locator ␊ |
signature 4 bytes (0x07064b50)␊ |
number of the disk with the␊ |
start of the zip64 end of ␊ |
central directory 4 bytes␊ |
relative offset of the zip64␊ |
end of central directory record 8 bytes␊ |
total number of disks 4 bytes␊ |
␊ |
I. End of central directory record:␊ |
␊ |
end of central dir signature 4 bytes (0x06054b50)␊ |
number of this disk 2 bytes␊ |
number of the disk with the␊ |
start of the central directory 2 bytes␊ |
total number of entries in the␊ |
central directory on this disk 2 bytes␊ |
total number of entries in␊ |
the central directory 2 bytes␊ |
size of the central directory 4 bytes␊ |
offset of start of central␊ |
directory with respect to␊ |
the starting disk number 4 bytes␊ |
.ZIP file comment length 2 bytes␊ |
.ZIP file comment (variable size)␊ |
␊ |
J. Explanation of fields:␊ |
␊ |
version made by (2 bytes)␊ |
␊ |
The upper byte indicates the compatibility of the file␊ |
attribute information. If the external file attributes ␊ |
are compatible with MS-DOS and can be read by PKZIP for ␊ |
DOS version 2.04g then this value will be zero. If these ␊ |
attributes are not compatible, then this value will ␊ |
identify the host system on which the attributes are ␊ |
compatible. Software can use this information to determine␊ |
the line record format for text files etc. The current␊ |
mappings are:␊ |
␊ |
0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)␊ |
1 - Amiga 2 - OpenVMS␊ |
3 - UNIX 4 - VM/CMS␊ |
5 - Atari ST 6 - OS/2 H.P.F.S.␊ |
7 - Macintosh 8 - Z-System␊ |
9 - CP/M 10 - Windows NTFS␊ |
11 - MVS (OS/390 - Z/OS) 12 - VSE␊ |
13 - Acorn Risc 14 - VFAT␊ |
15 - alternate MVS 16 - BeOS␊ |
17 - Tandem 18 - OS/400␊ |
19 - OS/X (Darwin) 20 thru 255 - unused␊ |
␊ |
The lower byte indicates the ZIP specification version ␊ |
(the version of this document) supported by the software ␊ |
used to encode the file. The value/10 indicates the major ␊ |
version number, and the value mod 10 is the minor version ␊ |
number. ␊ |
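Splitting the field into its two bytes is a shift and a mask. A small Python sketch follows; the host table is deliberately partial and the function name is illustrative.

```python
# Partial copy of the host mapping above (upper byte of "version made by").
HOST_SYSTEMS = {0: "MS-DOS and OS/2", 3: "UNIX", 10: "Windows NTFS",
                19: "OS/X (Darwin)"}

def decode_version_made_by(value: int):
    host = value >> 8     # upper byte: attribute compatibility
    spec = value & 0xFF   # lower byte: ZIP specification version
    major, minor = divmod(spec, 10)
    return HOST_SYSTEMS.get(host, f"host {host}"), f"{major}.{minor}"
```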
␊ |
version needed to extract (2 bytes)␊ |
␊ |
The minimum supported ZIP specification version needed to ␊ |
extract the file, mapped as above. This value is based on ␊ |
the specific format features a ZIP program must support to ␊ |
be able to extract the file. If multiple features are␊ |
applied to a file, the minimum version should be set to the ␊ |
feature having the highest value. New features or feature ␊ |
changes affecting the published format specification will be ␊ |
implemented using higher version numbers than the last ␊ |
published value to avoid conflict.␊ |
␊ |
Current minimum feature versions are as defined below:␊ |
␊ |
1.0 - Default value␊ |
1.1 - File is a volume label␊ |
2.0 - File is a folder (directory)␊ |
2.0 - File is compressed using Deflate compression␊ |
2.0 - File is encrypted using traditional PKWARE encryption␊ |
2.1 - File is compressed using Deflate64(tm)␊ |
2.5 - File is compressed using PKWARE DCL Implode ␊ |
2.7 - File is a patch data set ␊ |
4.5 - File uses ZIP64 format extensions␊ |
4.6 - File is compressed using BZIP2 compression*␊ |
5.0 - File is encrypted using DES␊ |
5.0 - File is encrypted using 3DES␊ |
5.0 - File is encrypted using original RC2 encryption␊ |
5.0 - File is encrypted using RC4 encryption␊ |
5.1 - File is encrypted using AES encryption␊ |
5.1 - File is encrypted using corrected RC2 encryption**␊ |
5.2 - File is encrypted using corrected RC2-64 encryption**␊ |
6.1 - File is encrypted using non-OAEP key wrapping***␊ |
6.2 - Central directory encryption␊ |
6.3 - File is compressed using LZMA␊ |
6.3 - File is compressed using PPMd+␊ |
6.3 - File is encrypted using Blowfish␊ |
6.3 - File is encrypted using Twofish␊ |
␊ |
␊ |
* Early 7.x (pre-7.2) versions of PKZIP incorrectly set the␊ |
version needed to extract for BZIP2 compression to be 50␊ |
when it should have been 46.␊ |
␊ |
** Refer to the section on Strong Encryption Specification␊ |
for additional information regarding RC2 corrections.␊ |
␊ |
*** Certificate encryption using non-OAEP key wrapping is the␊ |
intended mode of operation for all versions beginning with 6.1.␊ |
Support for OAEP key wrapping should only be used for␊ |
backward compatibility when sending ZIP files to be opened by␊ |
versions of PKZIP older than 6.1 (5.0 or 6.0).␊ |
␊ |
+ Files compressed using PPMd should set the version
needed to extract field to 6.3; however, not all ZIP
programs enforce this and may be unable to decompress
data files compressed using PPMd if this value is set.
␊ |
When using ZIP64 extensions, the corresponding value in the␊ |
zip64 end of central directory record should also be set. ␊ |
This field should be set appropriately to indicate whether ␊ |
Version 1 or Version 2 format is in use. ␊ |
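Because the rule is to take the highest minimum version among the features in use, the calculation is a `max` over a lookup table. In this Python sketch the feature keys are hypothetical labels and the table covers only part of the list above; values are stored as in the field itself, so 45 means 4.5.

```python
# Subset of the minimum feature versions above, encoded as value * 10.
MIN_VERSION = {
    "default": 10, "volume_label": 11, "directory": 20, "deflate": 20,
    "traditional_encryption": 20, "deflate64": 21, "dcl_implode": 25,
    "patch_data": 27, "zip64": 45, "bzip2": 46, "des": 50, "3des": 50,
    "rc4": 50, "aes": 51, "lzma": 63, "ppmd": 63,
}

def version_needed(*features: str) -> int:
    """Return the highest minimum version among the features in use."""
    return max([MIN_VERSION["default"]] + [MIN_VERSION[f] for f in features])
```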
␊ |
general purpose bit flag: (2 bytes)␊ |
␊ |
Bit 0: If set, indicates that the file is encrypted.␊ |
␊ |
(For Method 6 - Imploding)␊ |
Bit 1: If the compression method used was type 6,␊ |
Imploding, then this bit, if set, indicates␊ |
an 8K sliding dictionary was used. If clear,␊ |
then a 4K sliding dictionary was used.␊ |
Bit 2: If the compression method used was type 6,␊ |
Imploding, then this bit, if set, indicates␊ |
3 Shannon-Fano trees were used to encode the␊ |
sliding dictionary output. If clear, then 2␊ |
Shannon-Fano trees were used.␊ |
␊ |
(For Methods 8 and 9 - Deflating)␊ |
Bit 2 Bit 1␊ |
0 0 Normal (-en) compression option was used.␊ |
0 1 Maximum (-exx/-ex) compression option was used.␊ |
1 0 Fast (-ef) compression option was used.␊ |
1 1 Super Fast (-es) compression option was used.␊ |
␊ |
(For Method 14 - LZMA)␊ |
Bit 1: If the compression method used was type 14,␊ |
LZMA, then this bit, if set, indicates␊ |
an end-of-stream (EOS) marker is used to␊ |
mark the end of the compressed data stream.␊ |
If clear, then an EOS marker is not present␊ |
and the compressed data size must be known␊ |
to extract.␊ |
␊ |
Note: Bits 1 and 2 are undefined if the compression␊ |
method is any other.␊ |
␊ |
Bit 3: If this bit is set, the fields crc-32, compressed ␊ |
size and uncompressed size are set to zero in the ␊ |
local header. The correct values are put in the ␊ |
data descriptor immediately following the compressed␊ |
data. (Note: PKZIP version 2.04g for DOS only ␊ |
recognizes this bit for method 8 compression, newer ␊ |
versions of PKZIP recognize this bit for any ␊ |
compression method.)␊ |
␊ |
Bit 4: Reserved for use with method 8, for enhanced␊ |
deflating. ␊ |
␊ |
Bit 5: If this bit is set, this indicates that the file is ␊ |
compressed patched data. (Note: Requires PKZIP ␊ |
version 2.70 or greater)␊ |
␊ |
Bit 6: Strong encryption. If this bit is set, you should␊ |
set the version needed to extract value to at least␊ |
50 and you must also set bit 0. If AES encryption␊ |
is used, the version needed to extract value must ␊ |
be at least 51.␊ |
␊ |
Bit 7: Currently unused.␊ |
␊ |
Bit 8: Currently unused.␊ |
␊ |
Bit 9: Currently unused.␊ |
␊ |
Bit 10: Currently unused.␊ |
␊ |
Bit 11: Language encoding flag (EFS). If this bit is set,␊ |
the filename and comment fields for this file␊ |
must be encoded using UTF-8. (see APPENDIX D)␊ |
␊ |
Bit 12: Reserved by PKWARE for enhanced compression.␊ |
␊ |
Bit 13: Used when encrypting the Central Directory to indicate ␊ |
selected data values in the Local Header are masked to␊ |
hide their actual values. See the section describing ␊ |
the Strong Encryption Specification for details.␊ |
␊ |
Bit 14: Reserved by PKWARE.␊ |
␊ |
Bit 15: Reserved by PKWARE.␊ |
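A Python sketch of decoding the bits described above with `enum.IntFlag`; the member names are our own, and only a subset of bits is modelled. The Deflate helper maps bits 2 and 1 to the compression options listed for methods 8 and 9.

```python
from enum import IntFlag

class GPFlag(IntFlag):
    ENCRYPTED = 1 << 0          # bit 0
    DATA_DESCRIPTOR = 1 << 3    # bit 3: sizes/CRC follow the data
    PATCHED_DATA = 1 << 5       # bit 5
    STRONG_ENCRYPTION = 1 << 6  # bit 6
    UTF8 = 1 << 11              # bit 11: language encoding flag (EFS)
    CD_MASKED = 1 << 13         # bit 13: local header values masked

DEFLATE_OPTIONS = {0: "Normal", 1: "Maximum", 2: "Fast", 3: "Super Fast"}

def deflate_option(flags: int, method: int):
    """Return the Deflate option encoded in bits 2-1, or None when the
    method is not 8 or 9 (the bits mean something else, or nothing)."""
    if method not in (8, 9):
        return None
    return DEFLATE_OPTIONS[(flags >> 1) & 0b11]
```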
␊ |
compression method: (2 bytes)␊ |
␊ |
(see accompanying documentation for algorithm␊ |
descriptions)␊ |
␊ |
0 - The file is stored (no compression)␊ |
1 - The file is Shrunk␊ |
2 - The file is Reduced with compression factor 1␊ |
3 - The file is Reduced with compression factor 2␊ |
4 - The file is Reduced with compression factor 3␊ |
5 - The file is Reduced with compression factor 4␊ |
6 - The file is Imploded␊ |
7 - Reserved for Tokenizing compression algorithm␊ |
8 - The file is Deflated␊ |
9 - Enhanced Deflating using Deflate64(tm)␊ |
10 - PKWARE Data Compression Library Imploding (old IBM TERSE)␊ |
11 - Reserved by PKWARE␊ |
12 - File is compressed using BZIP2 algorithm␊ |
13 - Reserved by PKWARE␊ |
14 - LZMA (EFS)␊ |
15 - Reserved by PKWARE␊ |
16 - Reserved by PKWARE␊ |
17 - Reserved by PKWARE␊ |
18 - File is compressed using IBM TERSE (new)␊ |
19 - IBM LZ77 z Architecture (PFS)␊ |
98 - PPMd version I, Rev 1␊ |
␊ |
date and time fields: (2 bytes each)␊ |
␊ |
The date and time are encoded in standard MS-DOS format.␊ |
If input came from standard input, the date and time are␊ |
those at which compression was started for this data. ␊ |
If encrypting the central directory and general purpose bit ␊ |
flag 13 is set indicating masking, the value stored in the ␊ |
Local Header will be zero. ␊ |
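This section does not spell out the MS-DOS layout. The conventional encoding packs the time as hours (bits 15-11), minutes (bits 10-5) and seconds divided by two (bits 4-0), and the date as years since 1980 (bits 15-9), month (bits 8-5) and day (bits 4-0). A Python sketch of converting in both directions, assuming that layout and ignoring time zones:

```python
from datetime import datetime

def dos_to_datetime(dos_date: int, dos_time: int) -> datetime:
    year = 1980 + (dos_date >> 9)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2  # stored in 2-second units
    return datetime(year, month, day, hour, minute, second)

def datetime_to_dos(dt: datetime) -> tuple[int, int]:
    if not 1980 <= dt.year <= 2107:  # 7-bit year offset from 1980
        raise ValueError("year not representable in MS-DOS format")
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```

Because seconds are stored in 2-second units, an odd second value is rounded down on the way in.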
␊ |
CRC-32: (4 bytes)␊ |
␊ |
The CRC-32 algorithm was generously contributed by␊ |
David Schwaderer and can be found in his excellent␊ |
book "C Programmer's Guide to NetBIOS" published by
Howard W. Sams & Co. Inc. The 'magic number' for
the CRC is 0xdebb20e3. The proper CRC pre- and
post-conditioning is used, meaning that the CRC register
is pre-conditioned with all ones (a starting value␊ |
of 0xffffffff) and the value is post-conditioned by␊ |
taking the one's complement of the CRC residual.␊ |
If bit 3 of the general purpose flag is set, this␊ |
field is set to zero in the local header and the correct␊ |
value is put in the data descriptor and in the central␊ |
directory. When encrypting the central directory, if the␊ |
local header is not in ZIP64 format and general purpose ␊ |
bit flag 13 is set indicating masking, the value stored ␊ |
in the Local Header will be zero. ␊ |
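The conditioning described above corresponds to the common reflected CRC-32 with polynomial 0xEDB88320 (the bit-reversed form of 0x04C11DB7), which is also what Python's `zlib.crc32` computes. A table-driven Python sketch follows. `crc32_register` returns the register without the final one's complement, so the 'magic number' can be observed: feeding a message followed by its own CRC in little-endian order leaves the register at 0xdebb20e3.

```python
import zlib

def _make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

_TABLE = _make_table()

def crc32_register(data: bytes, reg: int = 0xFFFFFFFF) -> int:
    """Run the CRC register (pre-conditioned to all ones) over `data`."""
    for b in data:
        reg = _TABLE[(reg ^ b) & 0xFF] ^ (reg >> 8)
    return reg

def crc32(data: bytes) -> int:
    """Post-condition by taking the one's complement of the register."""
    return crc32_register(data) ^ 0xFFFFFFFF
```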
␊ |
compressed size: (4 bytes)␊ |
uncompressed size: (4 bytes)␊ |
␊ |
The size of the file compressed and uncompressed,␊ |
respectively. When a decryption header is present it will␊ |
be placed in front of the file data and the value of the␊ |
compressed file size will include the bytes of the decryption␊ |
header. If bit 3 of the general purpose bit flag is set, ␊ |
these fields are set to zero in the local header and the ␊ |
correct values are put in the data descriptor and␊ |
in the central directory. If an archive is in ZIP64 format␊ |
and the value in this field is 0xFFFFFFFF, the size will be␊ |
in the corresponding 8 byte ZIP64 extended information ␊ |
extra field. When encrypting the central directory, if the␊ |
local header is not in ZIP64 format and general purpose bit ␊ |
flag 13 is set indicating masking, the value stored for the ␊ |
uncompressed size in the Local Header will be zero. ␊ |
␊ |
file name length: (2 bytes)␊ |
extra field length: (2 bytes)␊ |
file comment length: (2 bytes)␊ |
␊ |
The length of the file name, extra field, and comment␊ |
fields respectively. The combined length of any␊ |
directory record and these three fields should not␊ |
generally exceed 65,535 bytes. If input came from standard␊ |
input, the file name length is set to zero. ␊ |
␊ |
disk number start: (2 bytes)␊ |
␊ |
The number of the disk on which this file begins. If an
archive is in ZIP64 format and the value in this field is
0xFFFF, the disk number will be in the corresponding 4 byte
zip64 extended information extra field.
␊ |
internal file attributes: (2 bytes)␊ |
␊ |
Bits 1 and 2 are reserved for use by PKWARE.␊ |
␊ |
The lowest bit of this field indicates, if set, that
the file is apparently an ASCII or text file. If not
set, it indicates that the file apparently contains
binary data. The remaining bits are unused in version 1.0.
␊ |
The 0x0002 bit of this field indicates, if set, that a ␊ |
4 byte variable record length control field precedes each ␊ |
logical record indicating the length of the record. The ␊ |
record length control field is stored in little-endian byte␊ |
order. This flag is independent of text control characters, ␊ |
and if used in conjunction with text data, includes any ␊ |
control characters in the total length of the record. This ␊ |
value is provided for mainframe data transfer support.␊ |
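A Python sketch of splitting data that uses this variable record length convention. The wording above leaves open whether the 4-byte length counts itself; this sketch assumes it does not, so treat that as an assumption to verify against real data. The function name is illustrative.

```python
import struct

def split_variable_records(data: bytes) -> list[bytes]:
    """Split data whose logical records are each preceded by a 4-byte
    little-endian length field (assumed not to include itself)."""
    records, pos = [], 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated record length field")
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if pos + length > len(data):
            raise ValueError("record runs past end of data")
        records.append(data[pos:pos + length])  # control chars included
        pos += length
    return records
```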
␊ |
external file attributes: (4 bytes)␊ |
␊ |
The mapping of the external attributes is␊ |
host-system dependent (see 'version made by'). For␊ |
MS-DOS, the low order byte is the MS-DOS directory␊ |
attribute byte. If input came from standard input, this␊ |
field is set to zero.␊ |
␊ |
relative offset of local header: (4 bytes)␊ |
␊ |
This is the offset from the start of the first disk on
which this file appears, to where the local header should
be found. If an archive is in ZIP64 format and the value
in this field is 0xFFFFFFFF, the offset will be in the
corresponding 8 byte zip64 extended information extra field.
␊ |
file name: (Variable)␊ |
␊ |
The name of the file, with optional relative path.␊ |
The path stored should not contain a drive or␊ |
device letter, or a leading slash. All slashes␊ |
should be forward slashes '/' as opposed to␊ |
backslashes '\' for compatibility with Amiga
and UNIX file systems etc. If input came from standard␊ |
input, there is no file name field. If encrypting␊ |
the central directory and general purpose bit flag 13 is set ␊ |
indicating masking, the file name stored in the Local Header ␊ |
will not be the actual file name. A masking value consisting ␊ |
of a unique hexadecimal value will be stored. This value will ␊ |
be sequentially incremented for each file in the archive. See␊ |
the section on the Strong Encryption Specification for details ␊ |
on retrieving the encrypted file name. ␊ |
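When writing an archive, a host path can be normalised toward the recommended form with a few string operations. The Python sketch below is a minimal, hypothetical helper: it converts backslashes, drops a leading drive letter and leading slashes, and does not attempt to handle `..` components, UNC paths or other device prefixes, which a real implementation would also need to consider.

```python
import re

def to_zip_name(path: str) -> str:
    """Normalise a host path into a relative, forward-slash file name."""
    name = path.replace("\\", "/")          # backslashes -> forward slashes
    name = re.sub(r"^[A-Za-z]:", "", name)  # drop a drive letter such as "C:"
    return name.lstrip("/")                 # drop leading slashes
```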
␊ |
extra field: (Variable)␊ |
␊ |
This is for expansion. If additional information␊ |
needs to be stored for special needs or for specific ␊ |
platforms, it should be stored here. Earlier versions ␊ |
of the software can then safely skip this field, and
find the next file or header. This field will be 0 ␊ |
length in version 1.0.␊ |
␊ |
In order to allow different programs and different types␊ |
of information to be stored in the 'extra' field in .ZIP␊ |
files, the following structure should be used for all␊ |
programs storing data in this field:␊ |
␊ |
header1+data1 + header2+data2 . . .␊ |
␊ |
Each header should consist of:␊ |
␊ |
Header ID - 2 bytes␊ |
Data Size - 2 bytes␊ |
␊ |
Note: all fields stored in Intel low-byte/high-byte order.␊ |
␊ |
The Header ID field indicates the type of data that is in␊ |
the following data block.␊ |
␊ |
Header IDs of 0 thru 31 are reserved for use by PKWARE.
The remaining IDs can be used by third party vendors for
proprietary usage.␊ |
␊ |
The current Header ID mappings defined by PKWARE are:␊ |
␊ |
0x0001 Zip64 extended information extra field␊ |
0x0007 AV Info␊ |
0x0008 Reserved for extended language encoding data (PFS)␊ |
(see APPENDIX D)␊ |
0x0009 OS/2␊ |
0x000a NTFS ␊ |
0x000c OpenVMS␊ |
0x000d UNIX␊ |
0x000e Reserved for file stream and fork descriptors␊ |
0x000f Patch Descriptor␊ |
0x0014 PKCS#7 Store for X.509 Certificates␊ |
0x0015 X.509 Certificate ID and Signature for ␊ |
individual file␊ |
0x0016 X.509 Certificate ID for Central Directory␊ |
0x0017 Strong Encryption Header␊ |
0x0018 Record Management Controls␊ |
0x0019 PKCS#7 Encryption Recipient Certificate List␊ |
0x0065 IBM S/390 (Z390), AS/400 (I400) attributes ␊ |
- uncompressed␊ |
0x0066 Reserved for IBM S/390 (Z390), AS/400 (I400) ␊ |
attributes - compressed␊ |
0x4690 POSZIP 4690 (reserved) ␊ |
␊ |
Third party mappings commonly used are:␊ |
␊ |
␊ |
0x07c8 Macintosh␊ |
0x2605 ZipIt Macintosh␊ |
0x2705 ZipIt Macintosh 1.3.5+␊ |
0x2805 ZipIt Macintosh 1.3.5+␊ |
0x334d Info-ZIP Macintosh␊ |
0x4341 Acorn/SparkFS ␊ |
0x4453 Windows NT security descriptor (binary ACL)␊ |
0x4704 VM/CMS␊ |
0x470f MVS␊ |
0x4b46 FWKCS MD5 (see below)␊ |
0x4c41 OS/2 access control list (text ACL)␊ |
0x4d49 Info-ZIP OpenVMS␊ |
0x4f4c Xceed original location extra field␊ |
0x5356 AOS/VS (ACL)␊ |
0x5455 extended timestamp␊ |
0x554e Xceed unicode extra field␊ |
0x5855 Info-ZIP UNIX (original, also OS/2, NT, etc)␊ |
0x6542 BeOS/BeBox␊ |
0x756e ASi UNIX␊ |
0x7855 Info-ZIP UNIX (new)␊ |
0xa220 Microsoft Open Packaging Growth Hint␊ |
0xfd4a SMS/QDOS␊ |
␊ |
Detailed descriptions of Extra Fields defined by third ␊ |
party mappings will be documented as information on␊ |
these data structures is made available to PKWARE. ␊ |
PKWARE does not guarantee the accuracy of any published␊ |
third party data.␊ |
␊ |
The Data Size field indicates the size of the following␊ |
data block. Programs can use this value to skip to the␊ |
next header block, passing over any data blocks that are␊ |
not of interest.␊ |
␊ |
Note: As stated above, the size of the entire .ZIP file␊ |
header, including the file name, comment, and extra␊ |
field should not exceed 64K in size.␊ |
␊ |
In case two different programs should appropriate the same␊ |
Header ID value, it is strongly recommended that each␊ |
program place a unique signature of at least two bytes in␊ |
size (and preferably 4 bytes or bigger) at the start of␊ |
each data area. Every program should verify that its␊ |
unique signature is present, in addition to the Header ID␊ |
value being correct, before assuming that it is a block of␊ |
known type.␊ |
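Using the Data Size field to skip blocks of no interest, a reader can walk an extra field as in this Python sketch (function name illustrative). Any trailing bytes too short to hold a 4-byte header are ignored here, which is a lenient choice a stricter reader might reject.

```python
import struct

def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) pairs from an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        # 2-byte Header ID, 2-byte Data Size (Intel byte order)
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError(f"extra block 0x{header_id:04x} overruns the field")
        yield header_id, extra[pos:pos + size]
        pos += size  # skip to the next header block
```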
␊ |
-Zip64 Extended Information Extra Field (0x0001):␊ |
␊ |
The following is the layout of the zip64 extended ␊ |
information "extra" block. If one of the size or␊ |
offset fields in the Local or Central directory␊ |
record is too small to hold the required data,␊ |
a Zip64 extended information record is created.␊ |
The order of the fields in the zip64 extended ␊ |
information record is fixed, but the fields will␊ |
only appear if the corresponding Local or Central␊ |
directory record field is set to 0xFFFF or 0xFFFFFFFF.␊ |
␊ |
Note: all fields stored in Intel low-byte/high-byte order.␊ |
␊ |
Value                     Size      Description
-----                     ----      -----------
(ZIP64) 0x0001            2 bytes   Tag for this "extra" block type
Size                      2 bytes   Size of this "extra" block
Original Size             8 bytes   Original uncompressed file size
Compressed Size           8 bytes   Size of compressed data
Relative Header Offset    8 bytes   Offset of local header record
Disk Start Number         4 bytes   Number of the disk on which
                                    this file starts
␊ |
This entry in the Local header must include BOTH original␊ |
and compressed file size fields. If encrypting the ␊ |
central directory and bit 13 of the general purpose bit␊ |
flag is set indicating masking, the value stored in the␊ |
Local Header for the original file size will be zero.␊ |
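Because the fields appear in fixed order but only when their header counterpart holds the sentinel value, a reader has to consult the header values to know which fields to consume. A Python sketch with a hypothetical function name follows; it takes the payload after the 4-byte tag/size header. For a Local header, where both size fields must be present, the caller is assumed to pass both sizes as 0xFFFFFFFF.

```python
import struct

def apply_zip64_extra(data: bytes, uncompressed: int, compressed: int,
                      local_offset: int = 0, disk_start: int = 0):
    """Return the resolved (uncompressed, compressed, offset, disk) values."""
    pos = 0

    def take(fmt):
        nonlocal pos
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return value

    # Fixed order; each field is present only if its header value overflowed.
    if uncompressed == 0xFFFFFFFF:
        uncompressed = take("<Q")
    if compressed == 0xFFFFFFFF:
        compressed = take("<Q")
    if local_offset == 0xFFFFFFFF:
        local_offset = take("<Q")
    if disk_start == 0xFFFF:
        disk_start = take("<I")
    return uncompressed, compressed, local_offset, disk_start
```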
␊ |
␊ |
-OS/2 Extra Field (0x0009):␊ |
␊ |
The following is the layout of the OS/2 attributes "extra" ␊ |
block. (Last Revision 09/05/95)␊ |
␊ |
Note: all fields stored in Intel low-byte/high-byte order.␊ |
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
(OS/2) 0x0009 2 bytes Tag for this "extra" block type␊ |
TSize 2 bytes Size for the following data block␊ |
BSize 4 bytes Uncompressed Block Size␊ |
CType 2 bytes Compression type␊ |
EACRC 4 bytes CRC value for uncompressed block
(var) variable Compressed block␊ |
␊ |
The OS/2 extended attribute structure (FEA2LIST) is
compressed and then stored in its entirety within this
structure. There will only ever be one "block" of data in
the variable-length (var) field.
␊ |
-NTFS Extra Field (0x000a):␊ |
␊ |
The following is the layout of the NTFS attributes ␊ |
"extra" block. (Note: At this time the Mtime, Atime␊ |
and Ctime values may be used on any WIN32 system.) ␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(NTFS)    0x000a      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of the total "extra" block
          Reserved    4 bytes   Reserved for future use
          Tag1        2 bytes   NTFS attribute tag value #1
          Size1       2 bytes   Size of attribute #1, in bytes
          (var.)      Size1     Attribute #1 data
          .
          .
          .
          TagN        2 bytes   NTFS attribute tag value #N
          SizeN       2 bytes   Size of attribute #N, in bytes
          (var.)      SizeN     Attribute #N data
␊ |
For NTFS, values for Tag1 through TagN are as follows:␊ |
(currently only one set of attributes is defined for NTFS)␊ |
␊ |
          Tag         Size      Description
          ---         ----      -----------
          0x0001      2 bytes   Tag for attribute #1
          Size1       2 bytes   Size of attribute #1, in bytes
          Mtime       8 bytes   File last modification time
          Atime       8 bytes   File last access time
          Ctime       8 bytes   File creation time
␊ |
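The sketch below walks the NTFS sub-blocks and converts the three times in attribute tag 0x0001. The text above does not state the time encoding. This sketch assumes the standard Windows FILETIME representation (100-nanosecond intervals since 1601-01-01 UTC), and `data` is assumed to be the payload following the tag and TSize.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: Mtime/Atime/Ctime hold Windows FILETIME values, i.e. counts
# of 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(ticks: int) -> datetime:
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

def parse_ntfs_extra(data: bytes) -> dict:
    """Return the tag 0x0001 times from an 0x000a payload (bytes after TSize)."""
    pos = 4  # skip the 4-byte Reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if pos + size > len(data):
            raise ValueError("NTFS attribute runs past the end of the block")
        if tag == 0x0001 and size >= 24:
            mtime, atime, ctime = struct.unpack_from("<QQQ", data, pos)
            return {"mtime": filetime_to_datetime(mtime),
                    "atime": filetime_to_datetime(atime),
                    "ctime": filetime_to_datetime(ctime)}
        pos += size  # skip attributes this sketch does not understand
    return {}
```
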
-OpenVMS Extra Field (0x000c):␊ |
␊ |
The following is the layout of the OpenVMS attributes ␊ |
"extra" block.␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(VMS)     0x000c      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of the total "extra" block
          CRC         4 bytes   32-bit CRC for remainder of the block
          Tag1        2 bytes   OpenVMS attribute tag value #1
          Size1       2 bytes   Size of attribute #1, in bytes
          (var.)      Size1     Attribute #1 data
          .
          .
          .
          TagN        2 bytes   OpenVMS attribute tag value #N
          SizeN       2 bytes   Size of attribute #N, in bytes
          (var.)      SizeN     Attribute #N data
␊ |
Rules:␊ |
␊ |
1. There will be one or more attributes present, each
preceded by the above TagX & SizeX values. These values
are identical to the ATR$C_XXXX and ATR$S_XXXX constants
defined in ATR.H under OpenVMS C. Neither of these values
will ever be zero.
␊ |
2. No word alignment or padding is performed.␊ |
␊ |
3. A well-behaved PKZIP/OpenVMS program should never produce␊ |
more than one sub-block with the same TagX value. Also,␊ |
there will never be more than one "extra" block of type␊ |
0x000c in a particular directory record.␊ |
␊ |
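Here is a minimal Python sketch of a reader that splits the OpenVMS block into its sub-blocks and enforces the three rules above: tags and sizes are never zero, there is no padding, and no tag repeats. It assumes `data` is the payload after the tag and TSize. The leading CRC is returned but not verified, because the exact CRC coverage is not modelled here.

```python
import struct

def parse_vms_extra(data: bytes) -> dict:
    """Split an 0x000c payload (bytes after TSize) into {tag: bytes}."""
    if len(data) < 4:
        raise ValueError("missing CRC field")
    (crc,) = struct.unpack_from("<I", data, 0)
    pos = 4
    attrs = {}
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated sub-block header")
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if tag == 0 or size == 0:
            raise ValueError("TagX and SizeX must never be zero")
        if tag in attrs:
            raise ValueError(f"duplicate tag 0x{tag:04x}")
        if pos + size > len(data):
            raise ValueError("truncated attribute data")
        attrs[tag] = data[pos:pos + size]
        pos += size  # no word alignment or padding between sub-blocks
    return {"crc": crc, "attrs": attrs}
```
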
-UNIX Extra Field (0x000d):␊ |
␊ |
The following is the layout of the UNIX "extra" block.␊ |
Note: all fields are stored in Intel low-byte/high-byte ␊ |
order.␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(UNIX)    0x000d      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size for the following data block
          Atime       4 bytes   File last access time
          Mtime       4 bytes   File last modification time
          Uid         2 bytes   File user ID
          Gid         2 bytes   File group ID
          (var)       variable  Variable length data field
␊ |
The variable length data field will contain file type ␊ |
specific data. Currently the only values allowed are␊ |
the original "linked to" file names for hard or symbolic ␊ |
links, and the major and minor device node numbers for␊ |
character and block device nodes. Since device nodes␊ |
cannot be either symbolic or hard links, only one set of␊ |
variable length data is stored. Link files will have the␊ |
name of the original file stored. This name is NOT NULL␊ |
terminated. Its size can be determined by checking TSize -␊ |
12. Device entries will have eight bytes stored as two 4␊ |
byte entries (in little endian format). The first entry␊ |
will be the major device number, and the second the minor␊ |
device number.␊ |
␊ |
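The sketch below decodes the UNIX block, including the TSize - 12 rule for the link name and the two little-endian device numbers. It rests on two labelled assumptions. First, Atime and Mtime are taken to be seconds since 1970-01-01 UTC, which the text does not state. Second, the `is_device` flag is hypothetical and must come from the entry's file type, because the block itself does not say which kind of variable data it carries.

```python
import struct
from datetime import datetime, timezone

def parse_unix_extra(data: bytes, is_device: bool = False) -> dict:
    """Decode an 0x000d payload (the TSize bytes after the tag and size)."""
    if len(data) < 12:
        raise ValueError("UNIX extra block shorter than its 12 fixed bytes")
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", data, 0)
    info = {
        # Assumption: times are Unix epoch seconds.
        "atime": datetime.fromtimestamp(atime, tz=timezone.utc),
        "mtime": datetime.fromtimestamp(mtime, tz=timezone.utc),
        "uid": uid,
        "gid": gid,
    }
    var = data[12:]  # TSize - 12 bytes of type-specific data
    if is_device:
        if len(var) != 8:
            raise ValueError("device entry needs two 4-byte numbers")
        info["major"], info["minor"] = struct.unpack("<II", var)
    elif var:
        info["link_target"] = bytes(var)  # raw name, not NUL-terminated
    return info
```
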
-PATCH Descriptor Extra Field (0x000f):␊ |
␊ |
The following is the layout of the Patch Descriptor "extra"␊ |
block.␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(Patch)   0x000f      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of the total "extra" block
          Version     2 bytes   Version of the descriptor
          Flags       4 bytes   Actions and reactions (see below)
          OldSize     4 bytes   Size of the file about to be patched
          OldCRC      4 bytes   32-bit CRC of the file to be patched
          NewSize     4 bytes   Size of the resulting file
          NewCRC      4 bytes   32-bit CRC of the resulting file
␊ |
Actions and reactions␊ |
␊ |
          Bits        Description
          ----        -----------
          0           Use for auto detection
          1           Treat as a self-patch
          2-3         RESERVED
          4-5         Action (see below)
          6-7         RESERVED
          8-9         Reaction (see below) to absent file
          10-11       Reaction (see below) to newer file
          12-13       Reaction (see below) to unknown file
          14-15       RESERVED
          16-31       RESERVED

Actions

          Action      Value
          ------      -----
          none        0
          add         1
          delete      2
          patch       3

Reactions

          Reaction    Value
          --------    -----
          ask         0
          skip        1
          ignore      2
          fail        3
␊ |
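To make the bit layout concrete, here is a minimal Python sketch that unpacks the Patch Descriptor payload (the bytes after the tag and TSize) and splits Flags into the named groups from the tables above. The function and dictionary key names are illustrative only.

```python
import struct

ACTIONS = {0: "none", 1: "add", 2: "delete", 3: "patch"}
REACTIONS = {0: "ask", 1: "skip", 2: "ignore", 3: "fail"}

def decode_patch_flags(flags: int) -> dict:
    """Split the 32-bit Flags field into the bit groups listed above."""
    return {
        "auto_detect": bool(flags & 0x1),              # bit 0
        "self_patch": bool(flags & 0x2),               # bit 1
        "action": ACTIONS[(flags >> 4) & 0x3],         # bits 4-5
        "on_absent": REACTIONS[(flags >> 8) & 0x3],    # bits 8-9
        "on_newer": REACTIONS[(flags >> 10) & 0x3],    # bits 10-11
        "on_unknown": REACTIONS[(flags >> 12) & 0x3],  # bits 12-13
    }

def parse_patch_descriptor(data: bytes) -> dict:
    """Decode an 0x000f payload (the 22 bytes after the tag and TSize)."""
    version, flags, old_size, old_crc, new_size, new_crc = struct.unpack_from(
        "<HIIIII", data, 0)
    return {"version": version, "flags": decode_patch_flags(flags),
            "old_size": old_size, "old_crc": old_crc,
            "new_size": new_size, "new_crc": new_crc}
```
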
Patch support is provided by PKPatchMaker(tm) technology and is ␊ |
covered under U.S. Patents and Patents Pending. The use or ␊ |
implementation in a product of certain technological aspects set␊ |
forth in the current APPNOTE, including those with regard to ␊ |
strong encryption, patching, or extended tape operations requires␊ |
a license from PKWARE. Please contact PKWARE with regard to ␊ |
acquiring a license. ␊ |
␊ |
-PKCS#7 Store for X.509 Certificates (0x0014):␊ |
␊ |
This field contains information about each of the certificates
that files may be signed with. When the Central Directory Encryption
feature is enabled for a ZIP file, this record will appear in ␊ |
the Archive Extra Data Record, otherwise it will appear in the ␊ |
first central directory record and will be ignored in any ␊ |
other record.␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(Store)   0x0014      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of the store data
          TData       TSize     Data about the store
␊ |
␊ |
-X.509 Certificate ID and Signature for individual file (0x0015):␊ |
␊ |
This field contains the information about which certificate in ␊ |
the PKCS#7 store was used to sign a particular file. It also ␊ |
contains the signature data. This field can appear multiple ␊ |
times, but can only appear once per certificate.␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(CID)     0x0015      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of data that follows
          TData       TSize     Signature Data
␊ |
-X.509 Certificate ID and Signature for central directory (0x0016):␊ |
␊ |
This field contains the information about which certificate in ␊ |
the PKCS#7 store was used to sign the central directory structure.␊ |
When the Central Directory Encryption feature is enabled for a ␊ |
ZIP file, this record will appear in the Archive Extra Data Record, ␊ |
otherwise it will appear in the first central directory record.␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(CDID)    0x0016      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of data that follows
          TData       TSize     Data
␊ |
-Strong Encryption Header (0x0017):␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
          0x0017      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of data that follows
          Format      2 bytes   Format definition for this record
          AlgID       2 bytes   Encryption algorithm identifier
          Bitlen      2 bytes   Bit length of encryption key
          Flags       2 bytes   Processing flags
          CertData    TSize-8   Certificate decryption extra field data
                                (refer to the explanation for CertData
                                in the section describing the
                                Certificate Processing Method under
                                the Strong Encryption Specification)
␊ |
␊ |
-Record Management Controls (0x0018):␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(Rec-CTL) 0x0018      2 bytes   Tag for this "extra" block type
          CSize       2 bytes   Size of total extra block data
          Tag1        2 bytes   Record control attribute 1
          Size1       2 bytes   Size of attribute 1, in bytes
          Data1       Size1     Attribute 1 data
          .
          .
          .
          TagN        2 bytes   Record control attribute N
          SizeN       2 bytes   Size of attribute N, in bytes
          DataN       SizeN     Attribute N data
␊ |
␊ |
-PKCS#7 Encryption Recipient Certificate List (0x0019): ␊ |
␊ |
This field contains information about each of the certificates␊ |
used in encryption processing and it can be used to identify who is␊ |
allowed to decrypt encrypted files. This field should only appear ␊ |
in the archive extra data record. This field is not required and ␊ |
serves only to aid archive modifications by preserving public
encryption key data. Individual security requirements may dictate ␊ |
that this data be omitted to deter information exposure.␊ |
␊ |
Note: all fields are stored in Intel low-byte/high-byte order.
␊ |
          Value       Size      Description
          -----       ----      -----------
(CStore)  0x0019      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size of the store data
          TData       TSize     Data about the store
␊ |
TData:␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
          Version     2 bytes   Format version number; must be 0x0001
                                at this time
          CStore      (var)     PKCS#7 data blob
␊ |
␊ |
-MVS Extra Field (0x0065):␊ |
␊ |
The following is the layout of the MVS "extra" block.␊ |
Note: Some fields are stored in Big Endian format.␊ |
All text is in EBCDIC format unless otherwise specified.␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(MVS)     0x0065      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size for the following data block
          ID          4 bytes   EBCDIC "Z390" 0xE9F3F9F0 or
                                "T4MV" for TargetFour
          (var)       TSize-4   Attribute data (see APPENDIX B)
␊ |
␊ |
-OS/400 Extra Field (0x0065):␊ |
␊ |
The following is the layout of the OS/400 "extra" block.␊ |
Note: Some fields are stored in Big Endian format.␊ |
All text is in EBCDIC format unless otherwise specified.␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(OS400)   0x0065      2 bytes   Tag for this "extra" block type
          TSize       2 bytes   Size for the following data block
          ID          4 bytes   EBCDIC "I400" 0xC9F4F0F0 or
                                "T4MV" for TargetFour
          (var)       TSize-4   Attribute data (see APPENDIX A)
␊ |
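Because the MVS and OS/400 blocks share tag 0x0065, a reader has to tell them apart by the 4-byte ID. The following Python sketch does this using Python's built-in "cp037" EBCDIC codec, which maps "Z390" and "I400" to the byte values listed in the tables. It assumes "T4MV" is also EBCDIC-encoded, in line with the note that all text is EBCDIC unless otherwise specified. A TargetFour ID alone does not identify the platform.

```python
# EBCDIC code page 037 ships with Python as the "cp037" codec.
Z390_ID = "Z390".encode("cp037")   # b"\xe9\xf3\xf9\xf0", MVS
I400_ID = "I400".encode("cp037")   # b"\xc9\xf4\xf0\xf0", OS/400
T4MV_ID = "T4MV".encode("cp037")   # TargetFour; EBCDIC encoding assumed

def classify_0065(data: bytes) -> tuple[str, bytes]:
    """Return (platform, attribute data) for an 0x0065 payload after TSize."""
    if len(data) < 4:
        raise ValueError("0x0065 block too short for its ID")
    ident, attrs = data[:4], data[4:]  # attrs is the TSize-4 attribute data
    if ident == Z390_ID:
        return "MVS", attrs
    if ident == I400_ID:
        return "OS/400", attrs
    if ident == T4MV_ID:
        return "TargetFour", attrs     # platform not given by the ID alone
    raise ValueError(f"unrecognised 0x0065 ID {ident.hex()}")
```
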
␊ |
Third-party Mappings:␊ |
␊ |
-ZipIt Macintosh Extra Field (long) (0x2605):␊ |
␊ |
The following is the layout of the ZipIt extra block ␊ |
for Macintosh. The local-header and central-header versions ␊ |
are identical. This block must be present if the file is ␊ |
stored MacBinary-encoded and it should not be used if the file ␊ |
is not stored MacBinary-encoded.␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(Mac2)    0x2605      Short     tag for this extra block type
          TSize       Short     total data size for this block
          "ZPIT"      beLong    extra-field signature
          FnLen       Byte      length of FileName
          FileName    variable  full Macintosh filename
          FileType    Byte[4]   four-byte Mac file type string
          Creator     Byte[4]   four-byte Mac creator string
␊ |
␊ |
-ZipIt Macintosh Extra Field (short, for files) (0x2705):␊ |
␊ |
The following is the layout of a shortened variant of the␊ |
ZipIt extra block for Macintosh (without "full name" entry).␊ |
This variant is used by ZipIt 1.3.5 and newer for entries of␊ |
files (not directories) that do not have a MacBinary encoded␊ |
file. The local-header and central-header versions are identical.␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(Mac2b)   0x2705      Short     tag for this extra block type
          TSize       Short     total data size for this block (12)
          "ZPIT"      beLong    extra-field signature
          FileType    Byte[4]   four-byte Mac file type string
          Creator     Byte[4]   four-byte Mac creator string
          fdFlags     beShort   attributes from FInfo.frFlags,
                                may be omitted
          0x0000      beShort   reserved, may be omitted
␊ |
␊ |
-ZipIt Macintosh Extra Field (short, for directories) (0x2805):␊ |
␊ |
The following is the layout of a shortened variant of the␊ |
ZipIt extra block for Macintosh used only for directory␊ |
entries. This variant is used by ZipIt 1.3.5 and newer to ␊ |
save some optional Mac-specific information about directories.␊ |
The local-header and central-header versions are identical.␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
(Mac2c)   0x2805      Short     tag for this extra block type
          TSize       Short     total data size for this block (12)
          "ZPIT"      beLong    extra-field signature
          frFlags     beShort   attributes from DInfo.frFlags, may
                                be omitted
          View        beShort   ZipIt view flag, may be omitted
␊ |
␊ |
The View field specifies ZipIt-internal settings as follows:␊ |
␊ |
Bits of the View field:
bit 0       if set, the folder is shown expanded (open)
            when the archive contents are viewed in ZipIt.
bits 1-15   reserved, zero.
␊ |
␊ |
-FWKCS MD5 Extra Field (0x4b46):␊ |
␊ |
The FWKCS Contents_Signature System, used in␊ |
automatically identifying files independent of file name,␊ |
optionally adds and uses an extra field to support the␊ |
rapid creation of an enhanced contents_signature:␊ |
␊ |
Header ID = 0x4b46␊ |
Data Size = 0x0013␊ |
Preface = 'M','D','5'␊ |
followed by 16 bytes containing the uncompressed file's
128-bit MD5 hash(1), low byte first.
␊ |
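As an illustration of the layout just described (Header ID 0x4b46, Data Size 0x0013, the 'M','D','5' preface and a 16-byte digest), here is a minimal Python sketch that builds the complete extra field. It assumes that "low byte first" corresponds to the standard RFC 1321 digest byte order, which is what `hashlib.md5().digest()` returns. The function name is hypothetical.

```python
import hashlib
import struct

def fwkcs_md5_extra(uncompressed: bytes) -> bytes:
    """Build a complete 0x4b46 extra field: header plus 19-byte payload."""
    digest = hashlib.md5(uncompressed).digest()  # 16 bytes, RFC 1321 order
    payload = b"MD5" + digest                     # 'M','D','5' preface
    return struct.pack("<HH", 0x4B46, len(payload)) + payload
```
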
When FWKCS revises a .ZIP file central directory to add␊ |
this extra field for a file, it also replaces the␊ |
central directory entry for that file's uncompressed␊ |
file length with a measured value.␊ |
␊ |
FWKCS provides an option to strip this extra field, if␊ |
present, from a .ZIP file central directory. In adding␊ |
this extra field, FWKCS preserves .ZIP file Authenticity␊ |
Verification; if stripping this extra field, FWKCS␊ |
preserves all versions of AV through PKZIP version 2.04g.␊ |
␊ |
FWKCS, and FWKCS Contents_Signature System, are␊ |
trademarks of Frederick W. Kantor.␊ |
␊ |
(1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer␊ |
Science and RSA Data Security, Inc., April 1992.␊ |
ll.76-77: "The MD5 algorithm is being placed in the␊ |
public domain for review and possible adoption as a␊ |
standard."␊ |
␊ |
-Microsoft Open Packaging Growth Hint (0xa220):␊ |
␊ |
          Value       Size      Description
          -----       ----      -----------
          0xa220      Short     tag for this extra block type
          TSize       Short     size of Sig + PadVal + Padding
          Sig         Short     verification signature (A028)
          PadVal      Short     Initial padding value
          Padding     variable  filled with NULL characters
␊ |
␊ |
file comment: (Variable)␊ |
␊ |
The comment for this file.␊ |
␊ |
number of this disk: (2 bytes)␊ |
␊ |
The number of this disk, which contains the end of central
directory record. If an archive is in ZIP64 format and the
value in this field is 0xFFFF, the actual disk number will be
in the corresponding 4 byte zip64 end of central directory
field.
␊ |
␊ |
number of the disk with the start of the central␊ |
directory: (2 bytes)␊ |
␊ |
The number of the disk on which the central
directory starts. If an archive is in ZIP64 format
and the value in this field is 0xFFFF, the actual disk
number will be in the corresponding 4 byte zip64 end of
central directory field.
␊ |
total number of entries in the central dir on ␊ |
this disk: (2 bytes)␊ |
␊ |
The number of central directory entries on this disk.
If an archive is in ZIP64 format and the value in
this field is 0xFFFF, the actual count will be in the
corresponding 8 byte zip64 end of central
directory field.
␊ |
total number of entries in the central dir: (2 bytes)␊ |
␊ |
The total number of files in the .ZIP file. If an
archive is in ZIP64 format and the value in this field
is 0xFFFF, the actual count will be in the corresponding
8 byte zip64 end of central directory field.
␊ |
size of the central directory: (4 bytes)␊ |
␊ |
The size (in bytes) of the entire central directory.␊ |
If an archive is in ZIP64 format and the value in ␊ |
this field is 0xFFFFFFFF, the size will be in the ␊ |
corresponding 8 byte zip64 end of central ␊ |
directory field.␊ |
␊ |
offset of start of central directory with respect to␊ |
the starting disk number: (4 bytes)␊ |
␊ |
Offset of the start of the central directory on the
disk on which the central directory starts. If an
archive is in ZIP64 format and the value in this
field is 0xFFFFFFFF, the actual offset will be in the
corresponding 8 byte zip64 end of central
directory field.
␊ |
.ZIP file comment length: (2 bytes)␊ |
␊ |
The length of the comment for this .ZIP file.␊ |
␊ |
.ZIP file comment: (Variable)␊ |
␊ |
The comment for this .ZIP file. ZIP file comment data␊ |
is stored unsecured. No encryption or data authentication␊ |
is applied to this area at this time. Confidential information␊ |
should not be stored in this section.␊ |
␊ |
zip64 extensible data sector (variable size)␊ |
␊ |
(currently reserved for use by PKWARE)␊ |
␊ |
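The fields above belong to the end of central directory record, which a reader normally locates by scanning backwards from the end of the file, because a .ZIP file comment of up to 65535 bytes may follow it. The following minimal Python sketch relies on the record's standard signature 0x06054b50 and its fixed 22-byte layout. A candidate match is accepted only if its comment length reaches exactly to end of file, which guards against a signature appearing inside the comment. The sketch also flags the 0xFFFF / 0xFFFFFFFF sentinels that point to ZIP64 records. Names are illustrative.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # 0x06054b50 stored low byte first
EOCD_FMT = "<4sHHHHIIH"    # fixed 22-byte part, up to the comment length
EOCD_SIZE = struct.calcsize(EOCD_FMT)

def read_eocd(archive: bytes) -> dict:
    """Find and decode the end of central directory record."""
    lowest = max(0, len(archive) - EOCD_SIZE - 0xFFFF)
    end = len(archive)
    while True:
        pos = archive.rfind(EOCD_SIG, lowest, end)
        if pos < 0:
            raise ValueError("end of central directory record not found")
        if pos + EOCD_SIZE <= len(archive):
            fields = struct.unpack_from(EOCD_FMT, archive, pos)
            # Accept the hit only if its comment ends exactly at end of file.
            if pos + EOCD_SIZE + fields[-1] == len(archive):
                break
        end = pos + len(EOCD_SIG) - 1  # keep searching before this hit
    _, disk, cd_disk, here, total, cd_size, cd_offset, _ = fields
    return {
        "disk": disk, "cd_disk": cd_disk,
        "entries_on_disk": here, "entries_total": total,
        "cd_size": cd_size, "cd_offset": cd_offset,
        "comment": archive[pos + EOCD_SIZE:],
        # A field set to -1 means the real value is in the ZIP64 records.
        "needs_zip64": 0xFFFF in (disk, cd_disk, here, total)
                       or 0xFFFFFFFF in (cd_size, cd_offset),
    }
```
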
␊ |
K. Splitting and Spanning ZIP files␊ |
␊ |
Spanning is the process of segmenting a ZIP file across ␊ |
multiple removable media. This support has typically only ␊ |
been provided for DOS formatted floppy diskettes. ␊ |
␊ |
File splitting is a newer derivative of spanning. ␊ |
Splitting follows the same segmentation process as␊ |
spanning, however, it does not require writing each␊ |
segment to a unique removable medium and instead supports␊ |
placing all pieces onto local or non-removable locations
such as file systems, local drives, folders, etc.
␊ |
A key difference between spanned and split ZIP files is␊ |
that all pieces of a spanned ZIP file have the same name. ␊ |
Since each piece is written to a separate volume, no name ␊ |
collisions occur and each segment can reuse the original ␊ |
.ZIP file name given to the archive.␊ |
␊ |
Sequence ordering for DOS spanned archives uses the DOS ␊ |
volume label to determine segment numbers. Volume labels␊ |
for each segment are written using the form PKBACK#xxx, ␊ |
where xxx is the segment number written as a decimal ␊ |
value from 001 - nnn.␊ |
␊ |
Split ZIP files are typically written to the same location␊ |
and are subject to name collisions if the spanned name␊ |
format is used since each segment will reside on the same ␊ |
drive. To avoid name collisions, split archives are named ␊ |
as follows.␊ |
␊ |
Segment 1 = filename.z01␊ |
Segment n-1 = filename.z(n-1)␊ |
Segment n = filename.zip␊ |
␊ |
The .ZIP extension is used on the last segment to support␊ |
quickly reading the central directory. The segment number␊ |
n should be a decimal value.␊ |
␊ |
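The naming rule can be generated mechanically. This Python sketch assumes the segment number is zero-padded to two digits, as in "filename.z01". The text only requires the number to be decimal, so the padding width is an assumption. Python's format specifier widens the number automatically past 99.

```python
def split_segment_names(base: str, segments: int) -> list[str]:
    """File names for an n-segment split archive: .z01 ... .z(n-1), then .zip."""
    if segments < 1:
        raise ValueError("a split archive has at least one segment")
    # Assumption: two-digit zero padding, widened automatically past 99.
    names = [f"{base}.z{i:02d}" for i in range(1, segments)]
    names.append(f"{base}.zip")  # last segment keeps .zip for fast CD access
    return names
```

For example, a three-segment archive named "backup" yields backup.z01, backup.z02 and backup.zip.
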
Spanned ZIP files may be PKSFX Self-extracting ZIP files.␊ |
PKSFX files may also be split; however, in this case
the first segment must be named filename.exe. The first
segment of a split PKSFX archive must be large enough to␊ |
include the entire executable program.␊ |
␊ |
Capacities for split archives are as follows.␊ |
␊ |
Maximum number of segments = 4,294,967,295 - 1␊ |
Maximum .ZIP segment size = 4,294,967,295 bytes␊ |
Minimum segment size = 64K␊ |
Maximum PKSFX segment size = 2,147,483,647 bytes␊ |
␊ |
Segment sizes may differ; however, by convention, all
segment sizes should be the same with the exception of the
last, which may be smaller. Local and central directory
header records must never be split across a segment boundary. ␊ |
When writing a header record, if the number of bytes remaining ␊ |
within a segment is less than the size of the header record,␊ |
end the current segment and write the header at the start␊ |
of the next segment. The central directory may span segment␊ |
boundaries, but no single record in the central directory␊ |
should be split across segments.␊ |
␊ |
Spanned/Split archives created using PKZIP for Windows␊ |
(V2.50 or greater), PKZIP Command Line (V2.50 or greater),␊ |
or PKZIP Explorer will include a special spanning ␊ |
signature as the first 4 bytes of the first segment of␊ |
the archive. This signature (0x08074b50) will be ␊ |
followed immediately by the local header signature for␊ |
the first file in the archive. ␊ |
␊ |
A special spanning marker may also appear in spanned/split ␊ |
archives if the spanning or splitting process starts but ␊ |
only requires one segment. In this case the 0x08074b50 ␊ |
signature will be replaced with the temporary spanning ␊ |
marker signature of 0x30304b50. Split archives can␊ |
only be uncompressed by other versions of PKZIP that␊ |
know how to create a split archive.␊ |
␊ |
The signature value 0x08074b50 is also used by some
ZIP implementations as a marker for the Data Descriptor
record. Conflicts arising from this dual assignment can be
avoided by using the position of the signature within the
ZIP file to determine which use is intended.
␊ |
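The position-based disambiguation described above can be sketched in a few lines of Python. The sketch checks the first 8 bytes of a first segment for the spanning signature or the temporary marker, each followed immediately by the standard local file header signature 0x04034b50. It assumes the temporary marker sits in the same position as the signature it replaces. The function name is hypothetical.

```python
import struct

SPAN_SIG = struct.pack("<I", 0x08074B50)       # b"PK\x07\x08"
TEMP_SPAN_SIG = struct.pack("<I", 0x30304B50)  # b"PK00"
LOCAL_SIG = struct.pack("<I", 0x04034B50)      # local file header signature

def classify_first_segment(head: bytes) -> str:
    """Classify the opening bytes of a first segment (or whole archive)."""
    lead, follower = head[:4], head[4:8]
    # A Data Descriptor always follows file data, so it cannot sit at
    # offset 0 in front of the first local header.
    if lead == SPAN_SIG and follower == LOCAL_SIG:
        return "spanned/split"
    if lead == TEMP_SPAN_SIG and follower == LOCAL_SIG:
        return "single-segment span marker"
    if lead == LOCAL_SIG:
        return "plain"
    return "unknown"
```
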
L. General notes:␊ |
␊ |
1) All fields unless otherwise noted are unsigned and stored␊ |
in Intel low-byte:high-byte, low-word:high-word order.␊ |
␊ |
2) String fields are not null terminated, since the␊ |
length is given explicitly.␊ |
␊ |
3) The entries in the central directory may not necessarily␊ |
be in the same order that files appear in the .ZIP file.␊ |
␊ |
4) If one of the fields in the end of central directory␊ |
record is too small to hold required data, the field␊ |
should be set to -1 (0xFFFF or 0xFFFFFFFF) and the␊ |
ZIP64 format record should be created.␊ |
␊ |
5) The end of central directory record and the␊ |
Zip64 end of central directory locator record must␊ |
reside on the same disk when splitting or spanning␊ |
an archive.␊ |
␊ |
VI. UnShrinking - Method 1␊ |
--------------------------␊ |
␊ |
Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm␊ |
with partial clearing. The initial code size is 9 bits, and␊ |
the maximum code size is 13 bits. Shrinking differs from␊ |
conventional Dynamic Ziv-Lempel-Welch implementations in several␊ |
respects:␊ |
␊ |
1) The code size is controlled by the compressor, and is not␊ |
automatically increased when codes larger than the current␊ |
code size are created (but not necessarily used). When␊ |
the decompressor encounters the code sequence 256␊ |
(decimal) followed by 1, it should increase the code size␊ |
read from the input stream to the next bit size. No␊ |
blocking of the codes is performed, so the next code at␊ |
the increased size should be read from the input stream␊ |
immediately after where the previous code at the smaller␊ |
bit size was read. Again, the decompressor should not␊ |
increase the code size used until the sequence 256,1 is␊ |
encountered.␊ |
␊ |
2) When the table becomes full, total clearing is not␊ |
performed. Rather, when the compressor emits the code␊ |
sequence 256,2 (decimal), the decompressor should clear␊ |
all leaf nodes from the Ziv-Lempel tree, and continue to␊ |
use the current code size. The nodes that are cleared␊ |
from the Ziv-Lempel tree are then re-used, with the lowest␊ |
code value re-used first, and the highest code value␊ |
re-used last. The compressor can emit the sequence 256,2␊ |
at any time.␊ |
␊ |
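The control-code handling in rules 1 and 2 can be isolated from the dictionary logic. The following Python sketch only tokenizes a Shrink stream. It reads codes at the current size, treats 256,1 as "increase the code size for the next code", and reports 256,2 as a partial-clear event. It deliberately omits the Ziv-Lempel tree and the leaf-node clearing. The text above does not state the bit order, so the sketch assumes codes are packed least-significant bit first. The function name is hypothetical.

```python
from typing import Iterator, Optional, Tuple

def shrink_code_events(data: bytes) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield ("code", n) and ("partial_clear", None) events from a Shrink stream."""
    code_size = 9          # initial code size
    bitbuf = bitcount = pos = 0

    def next_code() -> Optional[int]:
        nonlocal bitbuf, bitcount, pos
        while bitcount < code_size:
            if pos >= len(data):
                return None        # out of input; trailing pad bits ignored
            bitbuf |= data[pos] << bitcount   # assumed LSB-first packing
            pos += 1
            bitcount += 8
        code = bitbuf & ((1 << code_size) - 1)
        bitbuf >>= code_size
        bitcount -= code_size
        return code

    while True:
        code = next_code()
        if code is None:
            return
        if code != 256:
            yield ("code", code)
            continue
        sub = next_code()          # second half is read at the current size
        if sub is None:
            return
        if sub == 1:
            if code_size == 13:
                raise ValueError("code size would exceed 13 bits")
            code_size += 1         # applies from the very next code
        elif sub == 2:
            yield ("partial_clear", None)
        else:
            raise ValueError(f"unknown control sequence 256,{sub}")
```
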
VII. Expanding - Methods 2-5␊ |
----------------------------␊ |
␊ |
The Reducing algorithm is actually a combination of two␊ |
distinct algorithms. The first algorithm compresses repeated␊ |
byte sequences, and the second algorithm takes the compressed␊ |
stream from the first algorithm and applies a probabilistic␊ |
compression method.␊ |
␊ |
The probabilistic compression stores an array of 'follower␊ |
sets' S(j), for j=0 to 255, corresponding to each possible␊ |
ASCII character. Each set contains between 0 and 32␊ |
characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.␊ |
The sets are stored at the beginning of the data area for a␊ |
Reduced file, in reverse order, with S(255) first, and S(0)␊ |
last.␊ |
␊ |
The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },␊ |
where N(j) is the size of set S(j). N(j) can be 0, in which␊ |
case the follower set for S(j) is empty. Each N(j) value is␊ |
encoded in 6 bits, followed by N(j) eight bit character values␊ |
corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If␊ |
N(j) is 0, then no values for S(j) are stored, and the value␊ |
for N(j-1) immediately follows.␊ |
␊ |
Immediately after the follower sets is the compressed data
stream. The compressed data stream can be interpreted for the␊ |
probabilistic decompression as follows:␊ |
␊ |
let Last-Character <- 0.␊ |
loop until done␊ |
if the follower set S(Last-Character) is empty then␊ |
read 8 bits from the input stream, and copy this␊ |
value to the output stream.␊ |
otherwise if the follower set S(Last-Character) is non-empty then␊ |
read 1 bit from the input stream.␊ |
if this bit is not zero then␊ |
read 8 bits from the input stream, and copy this␊ |
value to the output stream.␊ |
otherwise if this bit is zero then␊ |
read B(N(Last-Character)) bits from the input␊ |
stream, and assign this value to I.␊ |
Copy the value of S(Last-Character)[I] to the␊ |
output stream.␊ |
␊ |
assign the last value placed on the output stream to␊ |
Last-Character.␊ |
end loop␊ |
␊ |
B(N(j)) is defined as the minimal number of bits required to␊ |
encode the value N(j)-1.␊ |
␊ |
The decompressed stream from above can then be expanded to␊ |
re-create the original file as follows:␊ |
␊ |
let State <- 0.␊ |
␊ |
loop until done␊ |
read 8 bits from the input stream into C.␊ |
case State of␊ |
0: if C is not equal to DLE (144 decimal) then␊ |
copy C to the output stream.␊ |
otherwise if C is equal to DLE then␊ |
let State <- 1.␊ |
␊ |
1: if C is non-zero then␊ |
let V <- C.␊ |
let Len <- L(V).
let State <- F(Len).␊ |
otherwise if C is zero then␊ |
copy the value 144 (decimal) to the output stream.␊ |
let State <- 0.
␊ |
2: let Len <- Len + C␊ |
let State <- 3.␊ |
␊ |
3: move backwards D(V,C) bytes in the output stream␊ |
(if this position is before the start of the output␊ |
stream, then assume that all the data before the␊ |
start of the output stream is filled with zeros).␊ |
copy Len+3 bytes from this position to the output stream.␊ |
let State <- 0.␊ |
end case␊ |
end loop␊ |
␊ |
The functions F, L, and D are dependent on the 'compression
factor', 1 through 4, and are defined as follows:␊ |
␊ |
For compression factor 1:␊ |
L(X) equals the lower 7 bits of X.␊ |
F(X) equals 2 if X equals 127 otherwise F(X) equals 3.␊ |
D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.␊ |
For compression factor 2:␊ |
L(X) equals the lower 6 bits of X.␊ |
F(X) equals 2 if X equals 63 otherwise F(X) equals 3.␊ |
D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.␊ |
For compression factor 3:␊ |
L(X) equals the lower 5 bits of X.␊ |
F(X) equals 2 if X equals 31 otherwise F(X) equals 3.␊ |
D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.␊ |
For compression factor 4:␊ |
L(X) equals the lower 4 bits of X.␊ |
F(X) equals 2 if X equals 15 otherwise F(X) equals 3.␊ |
D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.␊ |
␊ |
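The DLE state machine and the F, L, and D functions translate directly into code. The following Python sketch covers only the second (expansion) stage. It assumes its input is the byte stream already produced by the follower-set stage, and it follows the zero-fill rule for back-references that point before the start of the output. The function name is hypothetical.

```python
DLE = 144

def unreduce_expand(data: bytes, factor: int) -> bytes:
    """DLE expansion stage of Reduce for compression factor 1 through 4."""
    if not 1 <= factor <= 4:
        raise ValueError("compression factor must be 1 through 4")
    len_mask = 0x7F >> (factor - 1)    # L(X): lower 7, 6, 5 or 4 bits
    out = bytearray()
    state, v, length = 0, 0, 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & len_mask
                state = 2 if length == len_mask else 3   # F(Len)
            else:
                out.append(DLE)        # DLE, 0 encodes a literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            # D(V, C): (upper `factor` bits of V) * 256 + C + 1
            distance = (v >> (8 - factor)) * 256 + c + 1
            start = len(out) - distance
            for i in range(length + 3):
                pos = start + i
                out.append(out[pos] if pos >= 0 else 0)  # zeros before start
            state = 0
    return bytes(out)
```

For instance, with factor 1 the bytes "ABCD" followed by 144, 1, 3 expand to "ABCDABCD": Len is 1, the distance is 4, and Len+3 = 4 bytes are copied.
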
VIII. Imploding - Method 6␊ |
--------------------------␊ |
␊ |
The Imploding algorithm is actually a combination of two distinct␊ |
algorithms. The first algorithm compresses repeated byte␊ |
sequences using a sliding dictionary. The second algorithm is␊ |
used to compress the encoding of the sliding dictionary output,␊ |
using multiple Shannon-Fano trees.␊ |
␊ |
The Imploding algorithm can use a 4K or 8K sliding dictionary␊ |
size. The dictionary size used can be determined by bit 1 in the␊ |
general purpose flag word; a 0 bit indicates a 4K dictionary␊ |
while a 1 bit indicates an 8K dictionary.␊ |
␊ |
The Shannon-Fano trees are stored at the start of the compressed␊ |
file. The number of trees stored is defined by bit 2 in the␊ |
general purpose flag word; a 0 bit indicates two trees stored, a␊ |
1 bit indicates three trees are stored. If 3 trees are stored,␊ |
the first Shannon-Fano tree represents the encoding of the␊ |
Literal characters, the second tree represents the encoding of␊ |
the Length information, the third represents the encoding of the␊ |
Distance information. When 2 Shannon-Fano trees are stored, the␊ |
Length tree is stored first, followed by the Distance tree.␊ |
␊ |
The Literal Shannon-Fano tree, if present, is used to represent
the entire ASCII character set, and contains 256 values. This␊ |
tree is used to compress any data not compressed by the sliding␊ |
dictionary algorithm. When this tree is present, the Minimum␊ |
Match Length for the sliding dictionary is 3. If this tree is␊ |
not present, the Minimum Match Length is 2.␊ |
␊ |
The Length Shannon-Fano tree is used to compress the Length part␊ |
of the (length,distance) pairs from the sliding dictionary␊ |
output. The Length tree contains 64 values, ranging from the␊ |
Minimum Match Length, to 63 plus the Minimum Match Length.␊ |
␊ |
The Distance Shannon-Fano tree is used to compress the Distance␊ |
part of the (length,distance) pairs from the sliding dictionary␊ |
output. The Distance tree contains 64 values, ranging from 0 to␊ |
63, representing the upper 6 bits of the distance value. The␊ |
distance values themselves will be between 0 and the sliding␊ |
dictionary size, either 4K or 8K.␊ |
␊ |
The Shannon-Fano trees themselves are stored in a compressed␊ |
format. The first byte of the tree data represents the number of␊ |
bytes of data representing the (compressed) Shannon-Fano tree␊ |
minus 1. The remaining bytes represent the Shannon-Fano tree␊ |
data encoded as:␊ |
␊ |
High 4 bits: Number of values at this bit length + 1. (1 - 16)␊ |
Low 4 bits: Bit Length needed to represent value + 1. (1 - 16)␊ |
␊ |
The Shannon-Fano codes can be constructed from the bit lengths␊ |
using the following algorithm:␊ |
␊ |
1) Sort the Bit Lengths in ascending order, while retaining the␊ |
order of the original lengths stored in the file.␊ |
␊ |
2) Generate the Shannon-Fano trees:␊ |
␊ |
Code <- 0␊ |
CodeIncrement <- 0␊ |
LastBitLength <- 0␊ |
i <- number of Shannon-Fano codes - 1 (either 255 or 63)␊ |
␊ |
loop while i >= 0␊ |
Code = Code + CodeIncrement␊ |
if BitLength(i) <> LastBitLength then␊ |
LastBitLength=BitLength(i)␊ |
CodeIncrement = 1 shifted left (16 - LastBitLength)␊ |
ShannonCode(i) = Code␊ |
i <- i - 1␊ |
end loop␊ |
␊ |
3) Reverse the order of all the bits in the above ShannonCode()␊ |
vector, so that the most significant bit becomes the least␊ |
significant bit. For example, the value 0x1234 (hex) would␊ |
become 0x2C48 (hex).␊ |
␊ |
4) Restore the order of Shannon-Fano codes as originally stored␊ |
within the file.␊ |
␊ |
Example:␊ |
␊ |
This example will show the encoding of a Shannon-Fano tree␊ |
of size 8. Notice that the actual Shannon-Fano trees used␊ |
for Imploding are either 64 or 256 entries in size.␊ |
␊ |
Example: 0x02, 0x42, 0x01, 0x13␊ |
␊ |
The first byte indicates that 3 bytes of tree data follow.
Decoding those bytes:
0x42 = 5 codes of 3 bits long␊ |
0x01 = 1 code of 2 bits long␊ |
0x13 = 2 codes of 4 bits long␊ |
␊ |
This would generate the original bit length array of:␊ |
(3, 3, 3, 3, 3, 2, 4, 4)␊ |
␊ |
There are 8 codes in this table for the values 0 through 7. Using
the algorithm to obtain the Shannon-Fano codes produces:␊ |
␊ |
                                     Reversed  Order     Original
Val  Sorted  Constructed Code        Value     Restored  Length
---  ------  ----------------        --------  --------  --------
0:   2       1100000000000000        11        101       3
1:   3       1010000000000000        101       001       3
2:   3       1000000000000000        001       110       3
3:   3       0110000000000000        110       010       3
4:   3       0100000000000000        010       100       3
5:   3       0010000000000000        100       11        2
6:   4       0001000000000000        1000      1000      4
7:   4       0000000000000000        0000      0000      4
␊ |
The values in the Val, Order Restored and Original Length columns␊ |
now represent the Shannon-Fano encoding tree that can be used for␊ |
decoding the Shannon-Fano encoded data. How to parse the␊ |
variable length Shannon-Fano values from the data stream is beyond␊ |
the scope of this document. (See the references listed at the end of␊ |
this document for more information.) However, traditional decoding␊ |
schemes used for Huffman variable length decoding, such as the␊ |
Greenlaw algorithm, can be successfully applied.␊ |
␊ |
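The four steps above and the worked example can be checked with a short Python sketch. It expands the compressed tree bytes into bit lengths, builds codes from a stable ascending sort, bit-reverses each 16-bit code, and returns the codes in the original value order. Each result holds the code in its low `length` bits. The function names are hypothetical.

```python
def sf_bit_lengths(tree_data: bytes) -> list[int]:
    """Expand compressed tree bytes (first byte = byte count - 1) to bit lengths."""
    count = tree_data[0] + 1
    lengths = []
    for b in tree_data[1:1 + count]:
        # High nibble: number of values - 1; low nibble: bit length - 1.
        lengths.extend([(b & 0x0F) + 1] * ((b >> 4) + 1))
    return lengths

def sf_codes(lengths: list[int]) -> list[tuple[int, int]]:
    """Return (reversed code, bit length) for each value in original order."""
    order = sorted(range(len(lengths)), key=lambda v: lengths[v])  # stable sort
    codes = [0] * len(lengths)
    code = increment = last_len = 0
    for i in reversed(range(len(order))):   # i from n-1 down to 0
        value = order[i]
        code += increment
        if lengths[value] != last_len:
            last_len = lengths[value]
            increment = 1 << (16 - last_len)
        codes[value] = code                 # restores the original order
    # Step 3: reverse all 16 bits so the MSB becomes the LSB.
    return [(int(f"{codes[v]:016b}"[::-1], 2), lengths[v])
            for v in range(len(lengths))]
```

Running it on the example bytes 0x02, 0x42, 0x01, 0x13 reproduces the Order Restored column: formatting each code to its length gives 101, 001, 110, 010, 100, 11, 1000, 0000.
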
The compressed data stream begins immediately after the␊ |
compressed Shannon-Fano data. The compressed data stream can be␊ |
interpreted as follows:␊ |
␊ |
loop until done␊ |
read 1 bit from input stream.␊ |
␊ |
if this bit is non-zero then (encoded data is literal data)␊ |
if Literal Shannon-Fano tree is present␊ |
read and decode character using Literal Shannon-Fano tree.␊ |
otherwise␊ |
read 8 bits from input stream.␊ |
copy character to the output stream.␊ |
otherwise (encoded data is sliding dictionary match)␊ |
if 8K dictionary size␊ |
read 7 bits for offset Distance (lower 7 bits of offset).␊ |
otherwise␊ |
read 6 bits for offset Distance (lower 6 bits of offset).␊ |
␊ |
using the Distance Shannon-Fano tree, read and decode the␊ |
upper 6 bits of the Distance value.␊ |
␊ |
using the Length Shannon-Fano tree, read and decode␊ |
the Length value.␊ |
␊ |
Length <- Length + Minimum Match Length␊ |
␊ |
if Length = 63 + Minimum Match Length␊ |
read 8 bits from the input stream,␊ |
add this value to Length.␊ |
␊ |
move backwards Distance+1 bytes in the output stream, and␊ |
copy Length characters from this position to the output␊ |
stream. (if this position is before the start of the output␊ |
stream, then assume that all the data before the start of␊ |
the output stream is filled with zeros).␊ |
end loop␊ |
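Two details of the loop above are easy to get wrong: the match starts Distance+1 bytes back, and positions before the start of the output read as zeros. The sketch below covers only those steps; the function names and the read_byte callable are hypothetical, and Huffman decoding of Distance and Length is left out.

```python
def implode_length(decoded: int, min_match: int, read_byte) -> int:
    """Turn the decoded Length value into a match length.

    min_match is the Minimum Match Length defined for the file; read_byte
    is a caller-supplied function returning the next 8 bits of input.
    """
    length = decoded + min_match
    if length == 63 + min_match:
        length += read_byte()
    return length


def implode_copy(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes found Distance+1 bytes back in `out`.

    Bytes before the start of the output read as zeros, and copying one
    byte at a time lets a match overlap the bytes it is producing.
    """
    start = len(out) - (distance + 1)
    for k in range(length):
        pos = start + k
        out.append(out[pos] if pos >= 0 else 0)
```

For example, with b"ab" already output, a Distance of 1 and a Length of 4 moves back 2 bytes and appends "abab".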
␊ |
IX. Tokenizing - Method 7␊ |
-------------------------␊ |
␊ |
This method is not used by PKZIP.␊ |
␊ |
X. Deflating - Method 8␊ |
-----------------------␊ |
␊ |
The Deflate algorithm is similar to the Implode algorithm using␊ |
a sliding dictionary of up to 32K with secondary compression␊ |
from Huffman/Shannon-Fano codes.␊ |
␊ |
The compressed data is stored in blocks with a header describing␊ |
the block and the Huffman codes used in the data block. The header␊ |
format is as follows:␊ |
␊ |
Bit 0: Last Block bit This bit is set to 1 if this is the last␊ |
compressed block in the data.␊ |
Bits 1-2: Block type␊ |
00 (0) - Block is stored - All stored data is byte aligned.
Skip bits until the next byte boundary; the next word is
the block length, followed by the one's complement of the
block length word. The remaining data in the block is the
stored data.
␊ |
01 (1) - Use fixed Huffman codes for literal and distance codes.␊ |
Lit Code    Bits     Dist Code   Bits
---------   ----     ---------   ----
0 - 143     8        0 - 31      5
144 - 255   9
256 - 279   7
280 - 287   8

Literal codes 286-287 and distance codes 30-31 are
never used but participate in the Huffman construction.
␊ |
10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)␊ |
␊ |
11 (3) - Reserved - Flag an "Error in compressed data" if seen.
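The block header bits and the stored block layout can be checked with a small sketch. It assumes Deflate's convention (RFC 1951) that bits are taken from the least significant end of each byte and that the length words are little-endian. The function names are hypothetical, and the decoder handles only streams made entirely of stored blocks; it is not a full inflater.

```python
import struct
import zlib


def read_block_header(first_byte: int) -> tuple[bool, int]:
    """Return (last_block, block_type) from the first header byte.

    Deflate fills each byte starting at its least significant bit, so bit 0
    is the Last Block bit and bits 1-2 are the block type.
    """
    return bool(first_byte & 1), (first_byte >> 1) & 0b11


def inflate_stored_only(data: bytes) -> bytes:
    """Decode a raw Deflate stream that contains only stored (type 0) blocks.

    Simplification: each block header is assumed to start on a byte
    boundary, which holds when every earlier block is also a stored block.
    """
    out = bytearray()
    pos = 0
    while True:
        last, btype = read_block_header(data[pos])
        if btype != 0:
            raise ValueError(f"block type {btype} is not handled by this sketch")
        # The other 5 bits of the header byte are skipped to reach a byte boundary.
        length, nlength = struct.unpack_from("<HH", data, pos + 1)
        if length != nlength ^ 0xFFFF:
            raise ValueError("block length does not match its one's complement")
        block = data[pos + 5:pos + 5 + length]
        if len(block) != length:
            raise ValueError("truncated stored block")
        out += block
        pos += 5 + length
        if last:
            return bytes(out)


# Hand-built single stored block, cross-checked against zlib's raw inflater.
stream = bytes([0x01]) + struct.pack("<HH", 5, 5 ^ 0xFFFF) + b"hello"
assert inflate_stored_only(stream) == zlib.decompress(stream, -15) == b"hello"
```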
␊ |
Expanding Huffman Codes␊ |
-----------------------␊ |
If the data block is stored with dynamic Huffman codes, the Huffman␊ |
codes are sent in the following compressed format:␊ |
␊ |
5 Bits: # of Literal/Length codes sent - 257 (257 - 286)
All other codes are never sent.
5 Bits: # of Dist codes - 1 (1 - 32)
4 Bits: # of Bit Length codes - 4 (4 - 19)
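The three counts follow the 3-bit block header and are read least significant bit first. The sketch below uses a hypothetical BitReader class and the count offsets of 257, 1 and 4 given in RFC 1951 (HLIT, HDIST and HCLEN); it assumes the reader is already positioned just after the block header.

```python
class BitReader:
    """Reads Deflate bit fields: bits are consumed from the least
    significant end of each byte, and multi-bit values are LSB first."""

    def __init__(self, data: bytes):
        self.data = data
        self.bitpos = 0

    def bits(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.bitpos >> 3]
            value |= ((byte >> (self.bitpos & 7)) & 1) << i
            self.bitpos += 1
        return value


def read_dynamic_counts(r: BitReader) -> tuple[int, int, int]:
    """Read the literal/length, distance and bit length code counts."""
    n_lit = r.bits(5) + 257
    n_dist = r.bits(5) + 1
    n_bitlen = r.bits(4) + 4
    return n_lit, n_dist, n_bitlen
```

The two bytes 0xBD, 0x3F hold the maximum values 29, 29 and 15 in these 14 bits, which decode to 286 literal/length codes, 30 distance codes and 19 bit length codes.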
␊ |
The Huffman codes are sent as bit lengths and the codes are built as␊ |
described in the implode algorithm. The bit lengths themselves are␊ |
compressed with Huffman codes. There are 19 bit length codes:␊ |
␊ |
0 - 15: Represent bit lengths of 0 - 15␊ |
16: Copy the previous bit length 3 - 6 times.␊ |
The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)␊ |
Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will␊ |
expand to 12 bit lengths of 8 (1 + 6 + 5)␊ |
17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)␊ |
18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)␊ |
␊ |
The lengths of the bit length codes are sent packed 3 bits per value␊ |
(0 - 7) in the following order:␊ |
␊ |
16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15␊ |
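The permuted order and the run-length codes 16-18 can be expressed as follows. Both functions are hypothetical helpers: the first places the transmitted 3-bit values back at their symbol positions, and the second expands already-decoded (symbol, extra bits value) pairs into bit lengths.

```python
# Order in which the 3-bit lengths of the 19 bit length codes are sent.
BITLEN_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


def place_bitlen_code_lengths(values: list[int]) -> list[int]:
    """Map the transmitted 3-bit values to symbols 0..18; unsent ones stay 0."""
    lengths = [0] * 19
    for symbol, value in zip(BITLEN_ORDER, values):
        lengths[symbol] = value
    return lengths


def expand_run_lengths(symbols: list[tuple[int, int]]) -> list[int]:
    """Expand decoded (symbol, extra_value) pairs into bit lengths.

    0-15 are literal bit lengths (extra value ignored); 16 repeats the
    previous length 3-6 times; 17 and 18 emit 3-10 and 11-138 zeros.
    """
    out: list[int] = []
    for symbol, extra in symbols:
        if symbol <= 15:
            out.append(symbol)
        elif symbol == 16:
            if not out:
                raise ValueError("code 16 with no previous bit length")
            out.extend([out[-1]] * (3 + extra))
        elif symbol == 17:
            out.extend([0] * (3 + extra))
        elif symbol == 18:
            out.extend([0] * (11 + extra))
        else:
            raise ValueError(f"invalid bit length symbol {symbol}")
    return out
```

The example in the list, codes 8, 16 (+2 bits 11), 16 (+2 bits 10), corresponds to expand_run_lengths([(8, 0), (16, 3), (16, 2)]), which yields twelve lengths of 8.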
␊ |
The Huffman codes should be built as described in the Implode algorithm␊ |
except codes are assigned starting at the shortest bit length, i.e. the␊ |
shortest code should be all 0's rather than all 1's. Also, codes with␊ |
a bit length of zero do not participate in the tree construction. The␊ |
codes are then used to decode the bit lengths for the literal and ␊ |
distance tables.␊ |
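The assignment rule in the paragraph above is the canonical code construction described in RFC 1951, section 3.2.2. A minimal sketch, with canonical_codes as a hypothetical name:

```python
def canonical_codes(lengths: list[int]) -> dict[int, tuple[int, int]]:
    """Assign Deflate Huffman codes from bit lengths.

    Codes of the shortest length are assigned first, starting from all
    zeros, and consecutive symbols of one length get consecutive codes.
    Symbols with bit length 0 get no code. Returns {symbol: (code, length)}
    with each code written most significant bit first.
    """
    max_len = max(lengths, default=0)
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1

    # First code of each length: previous first code plus count, shifted left.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    codes = {}
    for symbol, n in enumerate(lengths):
        if n:
            codes[symbol] = (next_code[n], n)
            next_code[n] += 1
    return codes


# The fixed literal/length table (block type 01) as bit lengths.
FIXED_LIT_LENGTHS = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8
fixed = canonical_codes(FIXED_LIT_LENGTHS)
```

Unlike the Implode construction, the same bit lengths (3, 3, 3, 3, 3, 2, 4, 4) produce codes 010, 011, 100, 101, 110, 00, 1110, 1111 here, with the shortest code all zeros.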
␊ |
The bit lengths for the literal/length table are sent first, with the
number of entries given by the 5 bits sent earlier. There are up to
286 literal/length codes: the first 256 represent the respective 8-bit
characters, code 256 represents the End-Of-Block code, and the
remaining 29 codes (257 - 285) represent copy lengths of 3 thru 258.
There are up to 30 distance codes representing distances from 1 thru
32K as described below.
␊ |
Length Codes␊ |
------------␊ |
      Extra                    Extra                    Extra                    Extra
Code  Bits   Length      Code  Bits   Lengths     Code  Bits   Lengths     Code  Bits   Length(s)
----  -----  ---------   ----  -----  ---------   ----  -----  ---------   ----  -----  ---------
257   0      3           265   1      11,12       273   3      35-42       281   5      131-162
258   0      4           266   1      13,14       274   3      43-50       282   5      163-194
259   0      5           267   1      15,16       275   3      51-58       283   5      195-226
260   0      6           268   1      17,18       276   3      59-66       284   5      227-257
261   0      7           269   2      19-22       277   4      67-82       285   0      258
262   0      8           270   2      23-26       278   4      83-98
263   0      9           271   2      27-30       279   4      99-114
264   0      10          272   2      31-34       280   4      115-130
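The Length Codes table can be generated from its extra-bit counts, since each base length is the previous base plus 2 raised to the previous code's extra bits; code 285 is the one exception and stands for exactly 258. A sketch, with hypothetical names:

```python
# Extra bits for length codes 257..285, read off the Length Codes table.
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]


def _length_bases() -> list[int]:
    bases, length = [], 3
    for extra in LENGTH_EXTRA[:-1]:
        bases.append(length)
        length += 1 << extra
    return bases + [258]          # code 285: exactly 258, no extra bits


LENGTH_BASE = _length_bases()


def decode_length(code: int, extra_value: int) -> int:
    """Return the copy length for literal/length code 257..285."""
    i = code - 257
    if not 0 <= i < len(LENGTH_BASE):
        raise ValueError(f"not a length code: {code}")
    if not 0 <= extra_value < (1 << LENGTH_EXTRA[i]):
        raise ValueError("extra value does not fit in the extra bits")
    return LENGTH_BASE[i] + extra_value
```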
␊ |
Distance Codes␊ |
--------------␊ |
      Extra                      Extra                      Extra                      Extra
Code  Bits   Dist          Code  Bits   Dist          Code  Bits   Distance      Code  Bits   Distance
----  -----  -----------   ----  -----  -----------   ----  -----  -----------   ----  -----  -----------
0     0      1             8     3      17-24         16    7      257-384       24    11     4097-6144
1     0      2             9     3      25-32         17    7      385-512       25    11     6145-8192
2     0      3             10    4      33-48         18    8      513-768       26    12     8193-12288
3     0      4             11    4      49-64         19    8      769-1024      27    12     12289-16384
4     1      5,6           12    5      65-96         20    9      1025-1536     28    13     16385-24576
5     1      7,8           13    5      97-128        21    9      1537-2048     29    13     24577-32768
6     2      9-12          14    6      129-192       22    10     2049-3072
7     2      13-16         15    6      193-256       23    10     3073-4096
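The Distance Codes table follows a regular pattern: codes 0-3 have no extra bits, and from code 4 on, each pair of codes adds one extra bit. A sketch that rebuilds the table and decodes a distance (names are hypothetical):

```python
# Extra bits for distance codes 0..29, read off the Distance Codes table.
DIST_EXTRA = [0, 0, 0, 0] + [(code - 2) // 2 for code in range(4, 30)]


def _dist_bases() -> list[int]:
    bases, dist = [], 1
    for extra in DIST_EXTRA:
        bases.append(dist)
        dist += 1 << extra
    return bases


DIST_BASE = _dist_bases()


def decode_distance(code: int, extra_value: int) -> int:
    """Return the distance for distance code 0..29."""
    if not 0 <= code < len(DIST_BASE):
        raise ValueError(f"not a distance code: {code}")
    if not 0 <= extra_value < (1 << DIST_EXTRA[code]):
        raise ValueError("extra value does not fit in the extra bits")
    return DIST_BASE[code] + extra_value
```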
␊ |
The compressed data stream begins immediately after the␊ |
compressed header data. The compressed data stream can be␊ |
interpreted as follows:␊ |
␊ |
do
read header from input stream.

if stored block
skip bits until byte aligned
read count and 1's complement of count
copy count bytes of data to the output stream
otherwise
if dynamic Huffman block
read the code tables (see Expanding Huffman Codes)
loop until end of block code sent
decode literal character from input stream
if literal < 256
copy character to the output stream
otherwise
if literal = end of block
break from loop
otherwise
decode length from literal and its extra bits
decode distance and its extra bits from input stream

move backwards distance bytes in the output stream, and
copy length characters from this position to the output
stream.
end loop
while not last block
␊ |
if data descriptor exists␊ |
skip bits until byte aligned␊ |
read crc and sizes␊ |
endif␊ |
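The copy step in the loop above may have a length larger than the distance, in which case the copy overlaps the bytes it has just produced. The sketch below shows that step only; copy_match is a hypothetical name. Unlike the Implode description, it moves back exactly `distance` bytes, and it treats a distance reaching before the start of the output as an error, which is a choice of this sketch.

```python
def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes starting `distance` bytes back in `out`.

    Copying byte by byte lets length exceed distance, so a distance of 1
    repeats the last byte `length` times.
    """
    if not 1 <= distance <= len(out):
        raise ValueError("distance reaches before the start of the output")
    start = len(out) - distance
    for k in range(length):
        out.append(out[start + k])
```

For example, with b"ab" already output, distance 2 and length 5 produce b"abababa".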
␊ |
XI. Enhanced Deflating - Method 9␊ |
---------------------------------␊ |
␊ |
The Enhanced Deflating algorithm is similar to Deflate but␊ |
uses a sliding dictionary of up to 64K. Deflate64(tm) is supported␊ |
by the Deflate extractor. ␊ |
␊ |
XII. BZIP2 - Method 12␊ |
----------------------␊ |
␊ |
BZIP2 is an open-source data compression algorithm developed by ␊ |
Julian Seward. Information and source code for this algorithm␊ |
can be found on the internet.␊ |
␊ |
XIII. LZMA - Method 14 (EFS)␊ |
----------------------------␊ |
␊ |
LZMA is a block-oriented, general purpose data compression algorithm ␊ |
developed and maintained by Igor Pavlov. It is a derivative of LZ77␊ |
that utilizes Markov chains and a range coder. Information and ␊ |
source code for this algorithm can be found on the internet. Consult ␊ |
with the author of this algorithm for information on terms or ␊ |
restrictions on use.␊ |
␊ |
Support for LZMA within the ZIP format is defined as follows: ␊ |
␊ |
The Compression method field within the ZIP Local and Central ␊ |
Header records will be set to the value 14 to indicate data was␊ |
compressed using LZMA. ␊ |
␊ |
The Version needed to extract field within the ZIP Local and ␊ |
Central Header records will be set to 6.3 to indicate the ␊ |
minimum ZIP format version supporting this feature.␊ |
␊ |
File data compressed using the LZMA algorithm must be placed ␊ |
immediately following the Local Header for the file. If a ␊ |
standard ZIP encryption header is required, it will follow ␊ |
the Local Header and will precede the LZMA compressed file ␊ |
data segment. The location of the LZMA compressed data segment 
within the ZIP format will be as shown:␊ |
␊ |
[local header file 1]␊ |
[encryption header file 1]␊ |
[LZMA compressed data segment for file 1]␊ |
[data descriptor 1]␊ |
[local header file 2]␊ |
␊ |
The encryption header and data descriptor records may␊ |
be conditionally present. The LZMA Compressed Data Segment ␊ |
will consist of an LZMA Properties Header followed by the ␊ |
LZMA Compressed Data as shown:␊ |
␊ |
[LZMA properties header for file 1]␊ |
[LZMA compressed data for file 1]␊ |
␊ |
The LZMA Compressed Data will be stored as provided by the ␊ |
LZMA compression library. Compressed size, uncompressed ␊ |
size and other file characteristics about the file being ␊ |
compressed must be stored in standard ZIP storage format.␊ |
␊ |
The LZMA Properties Header will store specific data required to ␊ |
decompress the LZMA compressed Data. This data is set by the ␊ |
LZMA compression engine using the function WriteCoderProperties() ␊ |
as documented within the LZMA SDK. ␊ |
␊ |
Storage fields for the property information within the LZMA ␊ |
Properties Header are as follows:␊ |
␊ |
LZMA Version Information   2 bytes
LZMA Properties Size       2 bytes
LZMA Properties Data       variable, defined by "LZMA Properties Size"
␊ |
LZMA Version Information - this field identifies which version of ␊ |
the LZMA SDK was used to compress a file. The first byte will ␊ |
store the major version number of the LZMA SDK and the second ␊ |
byte will store the minor number. ␊ |
␊ |
LZMA Properties Size - this field defines the size of the remaining ␊ |
property data. Typically this size should be determined by the ␊ |
version of the SDK. This size field is included as a convenience␊ |
and to help avoid any ambiguity should it arise in the future due␊ |
to changes in this compression algorithm. ␊ |
␊ |
LZMA Properties Data - this variable sized field records the required 
values for the decompressor as defined by the LZMA SDK. The 
data stored in this field should be obtained using the 
WriteCoderProperties() function in the version of the SDK defined by 
the "LZMA Version Information" field. 
␊ |
The layout of the "LZMA Properties Data" field is a function of the␊ |
LZMA compression algorithm. It is possible that this layout may be␊ |
changed by the author over time. The data layout in version 4.32 ␊ |
of the LZMA SDK defines a 5 byte array that uses 4 bytes to store ␊ |
the dictionary size in little-endian order. This is preceded by a ␊ |
single packed byte as the first element of the array that contains␊ |
the following fields:␊ |
␊ |
PosStateBits␊ |
LiteralPosStateBits␊ |
LiteralContextBits␊ |
␊ |
Refer to the LZMA documentation for a more detailed explanation of ␊ |
these fields. ␊ |
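A sketch of reading the LZMA Properties Header follows. Two points are assumptions, not statements of this document: the 2-byte Properties Size is read little-endian like other ZIP fields, and the packed first byte is decoded as (PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits, which is how the LZMA SDK encodes it to our knowledge. All names are hypothetical.

```python
import struct
from typing import NamedTuple


class LzmaZipProps(NamedTuple):
    sdk_major: int
    sdk_minor: int
    lc: int          # LiteralContextBits
    lp: int          # LiteralPosStateBits
    pb: int          # PosStateBits
    dict_size: int


def parse_lzma_header(segment: bytes) -> tuple[LzmaZipProps, int]:
    """Parse the LZMA Properties Header at the start of a method 14 segment.

    Returns the decoded properties and the offset of the compressed data.
    """
    major, minor, props_size = struct.unpack_from("<BBH", segment, 0)
    props = segment[4:4 + props_size]
    if props_size != 5 or len(props) != 5:
        raise ValueError("only the 5-byte properties layout is handled here")
    packed = props[0]
    if packed >= 9 * 5 * 5:
        raise ValueError("invalid packed properties byte")
    lc, lp, pb = packed % 9, (packed // 9) % 5, packed // 45
    (dict_size,) = struct.unpack_from("<I", props, 1)    # little-endian
    return LzmaZipProps(major, minor, lc, lp, pb, dict_size), 4 + props_size


def lzma_has_eos_marker(gp_flags: int) -> bool:
    """General purpose bit 1 signals that the EOS marker is present."""
    return bool(gp_flags & 0x0002)
```

With SDK version 4.32, a packed byte of 0x5D and a 64K dictionary, the header is 9 bytes long and decodes to lc=3, lp=0, pb=2.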
␊ |
Data compressed with method 14, LZMA, may include an end-of-stream␊ |
(EOS) marker ending the compressed data stream. This marker is not␊ |
required, but its use is highly recommended to facilitate processing,
and implementers should include the EOS marker whenever possible.
When the EOS marker is used, general purpose bit 1 must be set. If␊ |
general purpose bit 1 is not set, the EOS marker is not present.␊ |
␊ |
XIV. PPMd - Method 98␊ |
---------------------␊ |
␊ |
PPMd is a data compression algorithm developed by Dmitry Shkarin
which includes a carryless range coder developed by Dmitry Subbotin.
This algorithm is based on prediction by partial matching (PPM) on
multiple order contexts. Information and source code for this algorithm
can be found on the internet. Consult with the author of this␊ |
algorithm for information on terms or restrictions on use.␊ |
␊ |
Support for PPMd within the ZIP format currently is provided only ␊ |
for version I, revision 1 of the algorithm. Storage requirements␊ |
for using this algorithm are as follows:␊ |
␊ |
Parameters needed to control the algorithm are stored in the two␊ |
bytes immediately preceding the compressed data. These bytes are␊ |
used to store the following fields:␊ |
␊ |
Model order - sets the maximum model order, default is 8, possible␊ |
values are from 2 to 16 inclusive␊ |
␊ |
Sub-allocator size - sets the size of sub-allocator in MB, default is 50,␊ |
possible values are from 1MB to 256MB inclusive␊ |
␊ |
Model restoration method - sets the method used to restart the context
model when memory is exhausted; values are:
␊ |
0 - restarts model from scratch - default␊ |
1 - cut off model - decreases performance by as much as 2x␊ |
2 - freeze context tree - not recommended␊ |
␊ |
An example for packing these fields into the 2 byte storage field is␊ |
illustrated below. These values are stored in Intel low-byte/high-byte␊ |
order.␊ |
␊ |
wPPMd = (Model order - 1) + ␊ |
((Sub-allocator size - 1) << 4) + ␊ |
(Model restoration method << 12)␊ |
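The packing formula places the model order minus 1 in bits 0-3, the sub-allocator size in MB minus 1 in bits 4-11, and the restoration method in bits 12-15, stored little-endian. A sketch of packing and unpacking, with hypothetical function names and range checks taken from the limits listed above:

```python
import struct


def pack_ppmd_params(order: int = 8, mem_mb: int = 50, restore: int = 0) -> bytes:
    """Pack the PPMd parameters into the 2-byte little-endian field."""
    if not 2 <= order <= 16:
        raise ValueError("model order must be 2..16")
    if not 1 <= mem_mb <= 256:
        raise ValueError("sub-allocator size must be 1..256 MB")
    if restore not in (0, 1, 2):
        raise ValueError("model restoration method must be 0, 1 or 2")
    w = (order - 1) + ((mem_mb - 1) << 4) + (restore << 12)
    return struct.pack("<H", w)


def unpack_ppmd_params(field: bytes) -> tuple[int, int, int]:
    """Return (model order, sub-allocator size in MB, restoration method)."""
    (w,) = struct.unpack("<H", field[:2])
    return (w & 0x0F) + 1, ((w >> 4) & 0xFF) + 1, w >> 12
```

With the defaults (order 8, 50 MB, method 0), wPPMd is 0x0317, stored as the bytes 0x17, 0x03.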
␊ |
␊ |
XV. Traditional PKWARE Encryption␊ |
---------------------------------␊ |
␊ |
The following information discusses the decryption steps␊ |
required to support traditional PKWARE encryption. This␊ |
form of encryption is considered weak by today's standards␊ |
and its use is recommended only for situations with␊ |
low security needs or for compatibility with older .ZIP ␊ |
applications.␊ |
␊ |
Decryption␊ |
----------␊ |
␊ |
PKWARE is grateful to Mr. Roger Schlafly for his expert contribution ␊ |
towards the development of PKWARE's traditional encryption.␊ |
␊ |
PKZIP encrypts the compressed data stream. Encrypted files must␊ |
be decrypted before they can be extracted.␊ |
␊ |
Each encrypted file has an extra 12 bytes stored at the start of␊ |
the data area defining the encryption header for that file. The␊ |
encryption header is originally set to random values, and then␊ |
itself encrypted using three 32-bit keys. The key values are
initialized using the supplied encryption password. After each byte␊ |
is encrypted, the keys are then updated using pseudo-random number␊ |
generation techniques in combination with the same CRC-32 algorithm␊ |
used in PKZIP and described elsewhere in this document.␊ |
␊ |
The following are the basic steps required to decrypt a file:
␊ |
1) Initialize the three 32-bit keys with the password.␊ |
2) Read and decrypt the 12-byte encryption header, further␊ |
initializing the encryption keys.␊ |
3) Read and decrypt the compressed data stream using the␊ |
encryption keys.␊ |
␊ |
Step 1 - Initializing the encryption keys␊ |
-----------------------------------------␊ |
␊ |
Key(0) <- 305419896␊ |
Key(1) <- 591751049␊ |
Key(2) <- 878082192␊ |
␊ |
loop for i <- 0 to length(password)-1␊ |
update_keys(password(i))␊ |
end loop␊ |
␊ |
Where update_keys() is defined as:␊ |
␊ |
update_keys(char):␊ |
Key(0) <- crc32(Key(0),char)
Key(1) <- Key(1) + (Key(0) & 000000ffH)
Key(1) <- Key(1) * 134775813 + 1
Key(2) <- crc32(Key(2),Key(1) >> 24)
end update_keys␊ |
␊ |
Where crc32(old_crc,char) is a routine that given a CRC value and a␊ |
character, returns an updated CRC value after applying the CRC-32␊ |
algorithm described elsewhere in this document.␊ |
␊ |
Step 2 - Decrypting the encryption header␊ |
-----------------------------------------␊ |
␊ |
The purpose of this step is to further initialize the encryption␊ |
keys, based on random data, to render a plaintext attack on the␊ |
data ineffective.␊ |
␊ |
Read the 12-byte encryption header into Buffer, in locations␊ |
Buffer(0) thru Buffer(11).␊ |
␊ |
loop for i <- 0 to 11␊ |
C <- Buffer(i) ^ decrypt_byte()
update_keys(C)
Buffer(i) <- C
end loop␊ |
␊ |
Where decrypt_byte() is defined as:␊ |
␊ |
unsigned char decrypt_byte()␊ |
local unsigned short temp␊ |
temp <- Key(2) | 2␊ |
decrypt_byte <- (temp * (temp ^ 1)) >> 8␊ |
end decrypt_byte␊ |
␊ |
After the header is decrypted, the last 1 or 2 bytes in Buffer␊ |
should be the high-order word/byte of the CRC for the file being␊ |
decrypted, stored in Intel low-byte/high-byte order. Versions of␊ |
PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is␊ |
used on versions after 2.0. This can be used to test if the password␊ |
supplied is correct or not.␊ |
␊ |
Step 3 - Decrypting the compressed data stream␊ |
----------------------------------------------␊ |
␊ |
The compressed data stream can be decrypted as follows:␊ |
␊ |
loop until done␊ |
read a character into C␊ |
Temp <- C ^ decrypt_byte()
update_keys(Temp)
output Temp␊ |
end loop␊ |
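Steps 1 to 3 can be put together in a short sketch. One assumption matters here: crc32(old_crc, char) is taken to be a single CRC-32 table step with no initial or final inversion, so Python's zlib.crc32 (which applies both) is not used directly. The 32-bit key arithmetic wraps, and temp is truncated to 16 bits as the "unsigned short" declaration implies. The class and function names are hypothetical, and encrypt() is included only as the inverse, so the sketch can be checked by a round trip.

```python
def _make_crc_table() -> list[int]:
    """Standard reflected CRC-32 table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


def crc32_byte(crc: int, b: int) -> int:
    """crc32(old_crc, char): one table step, with no pre/post inversion."""
    return CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)


class TraditionalCipher:
    """Key schedule and byte stream of traditional PKWARE encryption."""

    def __init__(self, password: bytes):
        # Step 1: fixed initial keys, then fold in each password byte.
        self.keys = [305419896, 591751049, 878082192]
        for b in password:
            self._update_keys(b)

    def _update_keys(self, b: int) -> None:
        k0, k1, k2 = self.keys
        k0 = crc32_byte(k0, b)
        k1 = (k1 + (k0 & 0xFF)) & 0xFFFFFFFF
        k1 = (k1 * 134775813 + 1) & 0xFFFFFFFF      # 32-bit wraparound
        k2 = crc32_byte(k2, k1 >> 24)
        self.keys = [k0, k1, k2]

    def _decrypt_byte(self) -> int:
        temp = (self.keys[2] | 2) & 0xFFFF           # "local unsigned short temp"
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def decrypt(self, data: bytes) -> bytes:
        """Steps 2 and 3: XOR with the key stream, update keys with plaintext."""
        out = bytearray()
        for c in data:
            p = c ^ self._decrypt_byte()
            self._update_keys(p)
            out.append(p)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        """Inverse of decrypt(): keys are still updated with the plaintext."""
        out = bytearray()
        for p in data:
            out.append(p ^ self._decrypt_byte())
            self._update_keys(p)
        return bytes(out)


def check_header(cipher: TraditionalCipher, header: bytes, crc: int) -> bool:
    """Decrypt the 12-byte header and compare its last byte with the CRC's
    high-order byte (the 1-byte check used since PKZIP 2.0)."""
    return cipher.decrypt(header[:12])[11] == ((crc >> 24) & 0xFF)
```

The same cipher object must decrypt the header and then the data, because the header bytes advance the keys.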
␊ |
␊ |
XVI. Strong Encryption Specification␊ |
------------------------------------␊ |
␊ |
The Strong Encryption technology defined in this specification is ␊ |
covered under a pending patent application. The use or implementation␊ |
in a product of certain technological aspects set forth in the current␊ |
APPNOTE, including those with regard to strong encryption, patching, ␊ |
or extended tape operations requires a license from PKWARE. Portions␊ |
of this Strong Encryption technology are available for use at no charge.␊ |
Contact PKWARE for licensing terms and conditions. Refer to section II␊ |
of this APPNOTE (Contacting PKWARE) for information on how to ␊ |
contact PKWARE. ␊ |
␊ |
Version 5.x of this specification introduced support for strong ␊ |
encryption algorithms. These algorithms can be used with either ␊ |
a password or an X.509v3 digital certificate to encrypt each file. ␊ |
This format specification supports either password or certificate ␊ |
based encryption to meet the security needs of today, to enable ␊ |
interoperability between users within both PKI and non-PKI ␊ |
environments, and to ensure interoperability between different ␊ |
computing platforms that are running a ZIP program. ␊ |
␊ |
Password based encryption is the most common form of encryption ␊ |
people are familiar with. However, inherent weaknesses with ␊ |
passwords (e.g. susceptibility to dictionary/brute force attack) ␊ |
as well as password management and support issues make certificate ␊ |
based encryption a more secure and scalable option. Industry ␊ |
efforts and support are defining and moving towards more advanced ␊ |
security solutions built around X.509v3 digital certificates and ␊ |
Public Key Infrastructures (PKI) because of the greater scalability, 
administrative options, and more robust security over traditional ␊ |
password based encryption. ␊ |
␊ |
Most standard encryption algorithms are supported with this␊ |
specification. Reference implementations for many of these ␊ |
algorithms are available from either commercial or open source ␊ |
distributors. Readily available cryptographic toolkits make␊ |
implementation of the encryption features straightforward. 
This document is not intended to provide a treatise on data ␊ |
encryption principles or theory. Its purpose is to document the ␊ |
data structures required for implementing interoperable data ␊ |
encryption within the .ZIP format. It is strongly recommended that ␊ |
you have a good understanding of data encryption before reading ␊ |
further.␊ |
␊ |
The algorithms introduced in Version 5.0 of this specification ␊ |
include:␊ |
␊ |
RC2 40 bit, 64 bit, and 128 bit␊ |
RC4 40 bit, 64 bit, and 128 bit␊ |
DES␊ |
3DES 112 bit and 168 bit␊ |
␊ |
Version 5.1 adds support for the following:␊ |
␊ |
AES 128 bit, 192 bit, and 256 bit␊ |
␊ |
␊ |
Version 6.1 introduces encryption data changes to support ␊ |
interoperability with Smartcard and USB Token certificate storage ␊ |
methods which do not support the OAEP strengthening standard.␊ |
␊ |
Version 6.2 introduces support for encrypting metadata by compressing ␊ |
and encrypting the central directory data structure to reduce information ␊ |
leakage. Information leakage can occur in legacy ZIP applications ␊ |
through exposure of information about a file even though that file is ␊ |
stored encrypted. The information exposed consists of file ␊ |
characteristics stored within the records and fields defined by this ␊ |
specification. This includes data such as a file's name, its original 
size, timestamp and CRC32 value. ␊ |
␊ |
Version 6.3 introduces support for encrypting data using the Blowfish␊ |
and Twofish algorithms. These are symmetric block ciphers developed ␊ |
by Bruce Schneier. Blowfish supports using a variable length key from ␊ |
32 to 448 bits. Block size is 64 bits. Implementations should use 16␊ |
rounds and the only mode supported within ZIP files is CBC. Twofish ␊ |
supports key sizes 128, 192 and 256 bits. Block size is 128 bits. ␊ |
Implementations should use 16 rounds and the only mode supported within␊ |
ZIP files is CBC. Information and source code for both Blowfish and ␊ |
Twofish algorithms can be found on the internet. Consult with the author␊ |
of these algorithms for information on terms or restrictions on use.␊ |
␊ |
Central Directory Encryption provides greater protection against ␊ |
information leakage by encrypting the Central Directory structure and ␊ |
by masking key values that are replicated in the unencrypted Local ␊ |
Header. ZIP compatible programs that cannot interpret an encrypted ␊ |
Central Directory structure cannot rely on the data in the corresponding ␊ |
Local Header for decompression information. ␊ |
␊ |
Extra Field records that may contain information about a file that should ␊ |
not be exposed should not be stored in the Local Header and should only ␊ |
be written to the Central Directory where they can be encrypted. This ␊ |
design currently does not support streaming. Information in the End of ␊ |
Central Directory record, the Zip64 End of Central Directory Locator, ␊ |
and the Zip64 End of Central Directory record is not encrypted. Access 
to view data on files within a ZIP file with an encrypted Central Directory␊ |
requires the appropriate password or private key for decryption prior to ␊ |
viewing any files, or any information about the files, in the archive. ␊ |
␊ |
Older ZIP compatible programs not familiar with the Central Directory ␊ |
Encryption feature will no longer be able to recognize the Central ␊ |
Directory and may assume the ZIP file is corrupt. Programs that ␊ |
attempt streaming access using Local Headers will see invalid ␊ |
information for each file. Central Directory Encryption need not be ␊ |
used for every ZIP file. Its use is recommended for greater security. ␊ |
ZIP files not using Central Directory Encryption should operate as ␊ |
in the past. ␊ |
␊ |
This strong encryption feature specification is intended to provide for ␊ |
scalable, cross-platform encryption needs ranging from simple password␊ |
encryption to authenticated public/private key encryption. ␊ |
␊ |
Encryption provides data confidentiality and privacy. It is ␊ |
recommended that you combine X.509 digital signing with encryption ␊ |
to add authentication and non-repudiation.␊ |
␊ |
␊ |
Single Password Symmetric Encryption Method:␊ |
-------------------------------------------␊ |
␊ |
The Single Password Symmetric Encryption Method using strong ␊ |
encryption algorithms operates similarly to the traditional ␊ |
PKWARE encryption defined in this format. Additional data ␊ |
structures are added to support the processing needs of the ␊ |
strong algorithms.␊ |
␊ |
The Strong Encryption data structures are:␊ |
␊ |
1. General Purpose Bits - Bits 0 and 6 of the General Purpose bit ␊ |
flag in both local and central header records. Both bits set ␊ |
indicates strong encryption. Bit 13, when set, indicates the Central
Directory is encrypted and that selected fields in the Local Header␊ |
are masked to hide their actual value.␊ |
␊ |
␊ |
2. Extra Field 0x0017 in central header only.␊ |
␊ |
Fields to consider in this record are:␊ |
␊ |
Format - the data format identifier for this record. The only␊ |
value allowed at this time is the integer value 2.␊ |
␊ |
AlgId - integer identifier of the encryption algorithm from the␊ |
following range␊ |
␊ |
0x6601 - DES␊ |
0x6602 - RC2 (version needed to extract < 5.2)␊ |
0x6603 - 3DES 168␊ |
0x6609 - 3DES 112␊ |
0x660E - AES 128 ␊ |
0x660F - AES 192 ␊ |
0x6610 - AES 256 ␊ |
0x6702 - RC2 (version needed to extract >= 5.2)␊ |
0x6720 - Blowfish␊ |
0x6721 - Twofish␊ |
0x6801 - RC4␊ |
0xFFFF - Unknown algorithm␊ |
␊ |
Bitlen - Explicit bit length of key␊ |
␊ |
32 - 448 bits␊ |
␊ |
Flags - Processing flags needed for decryption␊ |
␊ |
0x0001 - Password is required to decrypt␊ |
0x0002 - Certificates only␊ |
0x0003 - Password or certificate required to decrypt␊ |
␊ |
Values > 0x0003 reserved for certificate processing␊ |
␊ |
␊ |
3. Decryption header record preceding compressed file data.␊ |
␊ |
-Decryption Header:␊ |
␊ |
Value      Size      Description
-----      ----      -----------
IVSize     2 bytes   Size of initialization vector (IV)
IVData     IVSize    Initialization vector for this file
Size       4 bytes   Size of remaining decryption header data
Format     2 bytes   Format definition for this record
AlgID      2 bytes   Encryption algorithm identifier
Bitlen     2 bytes   Bit length of encryption key
Flags      2 bytes   Processing flags
ErdSize    2 bytes   Size of Encrypted Random Data
ErdData    ErdSize   Encrypted Random Data
Reserved1  4 bytes   Reserved certificate processing data
Reserved2  (var)     Reserved for certificate processing data
VSize      2 bytes   Size of password validation data
VData      VSize-4   Password validation data
VCRC32     4 bytes   Standard ZIP CRC32 of password validation data
␊ |
IVData - The size of the IV should match the algorithm block size.
The IVData can be completely random data. If the size of
the randomly generated data does not match the block size,
it should be padded with zeros or truncated as
necessary. If IVSize is 0, then IV = CRC32 + Uncompressed
File Size (as a 64 bit little-endian, unsigned integer value).
␊ |
Format - the data format identifier for this record. The only␊ |
value allowed at this time is the integer value 3.␊ |
␊ |
AlgId - integer identifier of the encryption algorithm from the␊ |
following range␊ |
␊ |
0x6601 - DES␊ |
0x6602 - RC2 (version needed to extract < 5.2)␊ |
0x6603 - 3DES 168␊ |
0x6609 - 3DES 112␊ |
0x660E - AES 128 ␊ |
0x660F - AES 192 ␊ |
0x6610 - AES 256 ␊ |
0x6702 - RC2 (version needed to extract >= 5.2)␊ |
0x6720 - Blowfish␊ |
0x6721 - Twofish␊ |
0x6801 - RC4␊ |
0xFFFF - Unknown algorithm␊ |
␊ |
Bitlen - Explicit bit length of key␊ |
␊ |
32 - 448 bits␊ |
␊ |
Flags - Processing flags needed for decryption␊ |
␊ |
0x0001 - Password is required to decrypt␊ |
0x0002 - Certificates only␊ |
0x0003 - Password or certificate required to decrypt␊ |
␊ |
Values > 0x0003 reserved for certificate processing␊ |
␊ |
ErdData - Encrypted random data is used to store random data that␊ |
is used to generate a file session key for encrypting ␊ |
each file. SHA1 is used to calculate hash data used to ␊ |
derive keys. File session keys are derived from a master ␊ |
session key generated from the user-supplied password.␊ |
If the Flags field in the decryption header contains ␊ |
the value 0x4000, then the ErdData field must be ␊ |
decrypted using 3DES. If the value 0x4000 is not set,␊ |
then the ErdData field must be decrypted using AlgId.␊ |
␊ |
␊ |
Reserved1 - Reserved for certificate processing, if value is␊ |
zero, then Reserved2 data is absent. See the explanation␊ |
under the Certificate Processing Method for details on␊ |
this data structure.␊ |
␊ |
Reserved2 - If present, the size of the Reserved2 data structure ␊ |
is located by skipping the first 4 bytes of this field ␊ |
and using the next 2 bytes as the remaining size. See␊ |
the explanation under the Certificate Processing Method␊ |
for details on this data structure.␊ |
␊ |
VSize - This size value will always include the 4 bytes of the␊ |
VCRC32 data and will be greater than 4 bytes.␊ |
␊ |
VData - Random data for password validation. This data is VSize - 4
bytes in length, so that VData and VCRC32 together occupy VSize
bytes; VSize must be a multiple of the encryption block size.
VCRC32 is a checksum value of VData. VData and VCRC32 are
stored encrypted and start the stream of encrypted data for a file.
␊ |
␊ |
4. Useful Tips␊ |
␊ |
Strong Encryption is always applied to a file after compression. The
block oriented algorithms all operate in Cipher Block Chaining (CBC) 
mode. The block size used for AES and Twofish encryption is 16 bytes.
All other block algorithms use a block size of 8 bytes. Two IDs are
defined for RC2 to account for a discrepancy found in the implementation
of the RC2 algorithm in the cryptographic library on Windows XP SP1 and
all earlier versions of Windows. It is recommended that zero-length
files not be encrypted; however, programs should be prepared to extract
them if they are found within a ZIP file.
␊ |
A pseudo-code representation of the encryption process is as follows:␊ |
␊ |
Password = GetUserPassword()␊ |
MasterSessionKey = DeriveKey(SHA1(Password)) ␊ |
RD = CryptographicStrengthRandomData() ␊ |
For Each File␊ |
IV = CryptographicStrengthRandomData() ␊ |
VData = CryptographicStrengthRandomData()␊ |
VCRC32 = CRC32(VData)␊ |
FileSessionKey = DeriveKey(SHA1(IV + RD))
ErdData = Encrypt(RD,MasterSessionKey,IV) ␊ |
Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)␊ |
Done␊ |
␊ |
The function names and parameter requirements will depend on␊ |
the choice of the cryptographic toolkit selected. Almost any␊ |
toolkit supporting the reference implementations for each␊ |
algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft␊ |
CryptoAPI libraries are all known to work well. ␊ |
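The general purpose bit checks and the Decryption Header layout in items 1 and 3 can be read with a sketch like the one below. It is illustrative only: the names are hypothetical, the "Size" field is returned as read rather than interpreted, certificate data (a non-zero Reserved1) is not handled, and fitting a zero-IVSize IV to the block size by truncating or zero padding is this sketch's reading of the IVData description. No decryption is performed.

```python
import struct
from typing import NamedTuple


def uses_strong_encryption(gp_flags: int) -> bool:
    """General purpose bits 0 and 6 both set indicate strong encryption."""
    return gp_flags & 0x0041 == 0x0041


def central_directory_encrypted(gp_flags: int) -> bool:
    """General purpose bit 13 marks an encrypted Central Directory."""
    return bool(gp_flags & 0x2000)


class DecryptionHeader(NamedTuple):
    iv: bytes
    remaining_size: int    # the "Size" field, kept as read
    alg_id: int
    bitlen: int
    flags: int
    erd_data: bytes
    vsize: int
    data_offset: int       # offset of the encrypted VData + VCRC32 + file data


def parse_decryption_header(buf: bytes, block_size: int, crc32: int = 0,
                            uncompressed_size: int = 0) -> DecryptionHeader:
    (iv_size,) = struct.unpack_from("<H", buf, 0)
    pos = 2 + iv_size
    if iv_size:
        iv = buf[2:pos]
    else:
        # IVSize == 0: IV is the CRC32 followed by the 64-bit uncompressed size.
        iv = struct.pack("<IQ", crc32, uncompressed_size)
    iv = iv[:block_size].ljust(block_size, b"\0")    # fit to the block size

    size, fmt, alg_id, bitlen, flags, erd_size = struct.unpack_from("<IHHHHH", buf, pos)
    pos += 14
    if fmt != 3:
        raise ValueError(f"unexpected decryption header format {fmt}")
    erd_data = buf[pos:pos + erd_size]
    pos += erd_size
    (reserved1,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if reserved1 != 0:
        raise NotImplementedError("certificate data (Reserved2) is not handled here")
    (vsize,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    return DecryptionHeader(iv, size, alg_id, bitlen, flags, erd_data, vsize, pos)
```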
␊ |
␊ |
Single Password - Central Directory Encryption:␊ |
-----------------------------------------------␊ |
␊ |
Central Directory Encryption is achieved within the .ZIP format by ␊ |
encrypting the Central Directory structure. This encapsulates the metadata ␊ |
most often used for processing .ZIP files. Additional metadata is stored for ␊ |
redundancy in the Local Header for each file. The process of concealing ␊ |
metadata by encrypting the Central Directory does not protect the data within ␊ |
the Local Header. To avoid information leakage from the exposed metadata ␊ |
in the Local Header, the fields containing information about a file are masked. ␊ |
␊ |
Local Header:␊ |
␊ |
Masking replaces the true content of the fields for a file in the Local ␊ |
Header with false information. When masked, the Local Header is not ␊ |
suitable for streaming access and the options for data recovery of damaged␊ |
archives are reduced. Extra Data fields that may contain confidential
data should not be stored within the Local Header. The value set into␊ |
the Version needed to extract field should be the correct value needed to␊ |
extract the file without regard to Central Directory Encryption. The fields ␊ |
within the Local Header targeted for masking when the Central Directory is ␊ |
encrypted are:␊ |
␊ |
Field Name                 Mask Value
------------------         ---------------------------
compression method         0
last mod file time         0
last mod file date         0
crc-32                     0
compressed size            0
uncompressed size          0
file name (variable size)  Base 16 value from the
                           range 1 - 0xFFFFFFFFFFFFFFFF
                           represented as a string whose
                           size will be set into the
                           file name length field
␊ |
The Base 16 value assigned as a masked file name is simply a sequentially␊ |
incremented value for each file starting with 1 for the first file. ␊ |
Modifications to a ZIP file may cause different values to be stored for ␊ |
each file. For compatibility, the file name field in the Local Header ␊ |
should never be left blank. As of Version 6.2 of this specification, ␊ |
the Compression Method and Compressed Size fields are not yet masked.␊ |
Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format␊ |
should not be masked. ␊ |
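The masking rules can be sketched as below. The names are hypothetical, and two choices are assumptions: the Base 16 name is written with upper-case digits, and, following the Version 6.2 note, the compression method and compressed size are carried over unmasked. ZIP64 marker values (0xFFFF, 0xFFFFFFFF) are left in place.

```python
from dataclasses import dataclass, replace

ZIP64_MARKERS = (0xFFFF, 0xFFFFFFFF)


@dataclass(frozen=True)
class LocalHeaderFields:
    compression_method: int
    mod_time: int
    mod_date: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    file_name: bytes


def masked_name(index: int) -> bytes:
    """Sequential Base 16 name for the index-th file, starting at 1."""
    if not 1 <= index <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("index out of range for a masked file name")
    return format(index, "X").encode("ascii")     # upper case is an assumption


def mask_local_header(h: LocalHeaderFields, index: int) -> LocalHeaderFields:
    def mask(value: int) -> int:
        # ZIP64 marker values are not masked.
        return value if value in ZIP64_MARKERS else 0

    # Compression method and compressed size are not masked (Version 6.2 note).
    return replace(h,
                   mod_time=mask(h.mod_time),
                   mod_date=mask(h.mod_date),
                   crc32=mask(h.crc32),
                   uncompressed_size=mask(h.uncompressed_size),
                   file_name=masked_name(index))
```

For the 26th file, the masked name would be the string "1A", and its length would go into the file name length field.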
␊ |
Encrypting the Central Directory:␊ |
␊ |
Encryption of the Central Directory does not include encryption of the ␊ |
Central Directory Signature data, the Zip64 End of Central Directory␊ |
record, the Zip64 End of Central Directory Locator, or the End␊ |
of Central Directory record. The ZIP file comment data is never␊ |
encrypted.␊ |
␊ |
Before encrypting the Central Directory, it may optionally be compressed.␊ |
Compression is not required, but for storage efficiency it is assumed␊ |
this structure will be compressed before encrypting. Similarly, this ␊ |
specification supports compressing the Central Directory without␊ |
requiring that it also be encrypted. Early implementations of this␊ |
feature will assume the encryption method applied to files matches the ␊ |
encryption applied to the Central Directory.␊ |
␊ |
Encryption of the Central Directory is done in a manner similar to␊ |
that of file encryption. The encrypted data is preceded by a ␊ |
decryption header. The decryption header is known as the Archive␊ |
Decryption Header. The fields of this record are identical to␊ |
the decryption header preceding each encrypted file. The location␊ |
of the Archive Decryption Header is determined by the value in the␊ |
Start of the Central Directory field in the Zip64 End of Central␊ |
Directory record. When the Central Directory is encrypted, the␊ |
Zip64 End of Central Directory record will always be present.␊ |
␊ |
The layout of the Zip64 End of Central Directory record for all␊ |
versions starting with 6.2 of this specification will follow the␊ |
Version 2 format. The Version 2 format is as follows:␊ |
␊ |
The leading fixed size fields within the Version 1 format for this
record remain unchanged. The record signature for both Version 1 
and Version 2 will be 0x06064b50. The new fields defining Version 2
of this record begin immediately after the last byte of the field
known as the Offset of Start of Central Directory With Respect to
the Starting Disk Number.
␊ |
New fields for Version 2:␊ |
␊ |
Note: all fields stored in Intel low-byte/high-byte order.␊ |
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
Compression Method 2 bytes Method used to compress the␊ |
Central Directory␊ |
Compressed Size 8 bytes Size of the compressed data␊ |
Original Size 8 bytes Original uncompressed size␊ |
AlgId 2 bytes Encryption algorithm ID␊ |
BitLen 2 bytes Encryption key length␊ |
Flags 2 bytes Encryption flags␊ |
HashID 2 bytes Hash algorithm identifier␊ |
Hash Length 2 bytes Length of hash data␊ |
Hash Data (variable) Hash data␊ |
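A reader could parse a Version 2 record roughly as sketched below. The Version 1 field order (signature, record size, version made by, version needed, disk numbers, entry counts, Central Directory size and offset) is taken from the Zip64 End of Central Directory record definition earlier in this specification. The Zip64EocdV2 type and parse_zip64_eocd_v2() are hypothetical names, and the sketch assumes the whole record is already in memory.

```python
import struct
from typing import NamedTuple

V1_FMT = "<IQHHIIQQQQ"   # Version 1 fixed fields, 56 bytes, little-endian
V2_FMT = "<HQQHHHHH"     # Version 2 fixed fields, 28 bytes, then Hash Data
ZIP64_EOCD_SIG = 0x06064B50


class Zip64EocdV2(NamedTuple):
    version_needed: int
    cd_offset: int           # where the Archive Decryption Header starts
    compression_method: int
    compressed_size: int
    original_size: int
    alg_id: int
    bit_len: int
    flags: int
    hash_id: int
    hash_data: bytes


def parse_zip64_eocd_v2(buf: bytes) -> Zip64EocdV2:
    v1 = struct.unpack_from(V1_FMT, buf, 0)
    if v1[0] != ZIP64_EOCD_SIG:
        raise ValueError("not a Zip64 End of Central Directory record")
    off = struct.calcsize(V1_FMT)
    method, csize, osize, alg, bits, flags, hash_id, hlen = struct.unpack_from(V2_FMT, buf, off)
    off += struct.calcsize(V2_FMT)
    hash_data = bytes(buf[off:off + hlen])
    if len(hash_data) != hlen:
        raise ValueError("truncated Hash Data")
    # v1[3] is Version Needed to Extract; v1[9] is the Offset of Start of Central Directory.
    return Zip64EocdV2(v1[3], v1[9], method, csize, osize, alg, bits, flags, hash_id, hash_data)
```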
␊ |
The Compression Method accepts the same range of values as the ␊ |
corresponding field in the Central Header.␊ |
␊ |
The Compressed Size and Original Size values will not include the␊ |
data of the Central Directory Signature which is compressed or␊ |
encrypted.␊ |
␊ |
The AlgId, BitLen, and Flags fields accept the same range of values
as the corresponding fields within the 0x0017 record. 
␊ |
HashID identifies the algorithm used to hash the Central Directory 
data. This data does not have to be hashed, in which case the␊ |
values for both the HashID and Hash Length will be 0. Possible ␊ |
values for HashID are:␊ |
␊ |
Value Algorithm␊ |
------ ---------␊ |
0x0000 none␊ |
0x0001 CRC32␊ |
0x8003 MD5␊ |
0x8004 SHA1␊ |
0x8007 RIPEMD160␊ |
0x800C SHA256␊ |
0x800D SHA384␊ |
0x800E SHA512␊ |
␊ |
When the Central Directory data is signed, the same hash algorithm
used to hash the Central Directory for signing should be used.
This is recommended for processing efficiency; however, it is 
permissible for any of the above algorithms to be used independent 
of the signing process.
␊ |
The Hash Data will contain the hash data for the Central Directory.␊ |
The length of this data will vary depending on the algorithm used.␊ |
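The sketch below maps each HashID to its digest length and checks Hash Data against the Central Directory bytes. Two points are assumptions, not statements from this section: which form of the Central Directory is hashed (plaintext or compressed/encrypted) is left to the caller, and the CRC32 value is assumed to be stored little-endian like other ZIP integers. The helper names are hypothetical.

```python
import hashlib
import struct
import zlib

# Digest length in bytes for each HashID in the table above.
HASH_LENGTHS = {
    0x0000: 0, 0x0001: 4, 0x8003: 16, 0x8004: 20,
    0x8007: 20, 0x800C: 32, 0x800D: 48, 0x800E: 64,
}
HASHLIB_NAMES = {
    0x8003: "md5", 0x8004: "sha1", 0x8007: "ripemd160",
    0x800C: "sha256", 0x800D: "sha384", 0x800E: "sha512",
}


def central_directory_hash(hash_id: int, cd_data: bytes) -> bytes:
    if hash_id == 0x0000:
        return b""  # not hashed: HashID and Hash Length are both 0
    if hash_id == 0x0001:
        # Assumption: CRC32 stored as 4 little-endian bytes.
        return struct.pack("<I", zlib.crc32(cd_data))
    # ripemd160 is absent from some OpenSSL builds; hashlib.new raises ValueError then.
    return hashlib.new(HASHLIB_NAMES[hash_id], cd_data).digest()


def verify_hash(hash_id: int, hash_data: bytes, cd_data: bytes) -> bool:
    if len(hash_data) != HASH_LENGTHS[hash_id]:
        return False  # Hash Length does not match the algorithm
    return hash_data == central_directory_hash(hash_id, cd_data)
```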
␊ |
The Version Needed to Extract should be set to 62.␊ |
␊ |
The value for the Total Number of Entries on the Current Disk will␊ |
be 0. These records will no longer support random access when␊ |
encrypting the Central Directory.␊ |
␊ |
When the Central Directory is compressed and/or encrypted, the␊ |
End of Central Directory record will store the value 0xFFFFFFFF␊ |
as the value for the Total Number of Entries in the Central␊ |
Directory. The value stored in the Total Number of Entries in␊ |
the Central Directory on this Disk field will be 0. The actual␊ |
values will be stored in the equivalent fields of the Zip64␊ |
End of Central Directory record.␊ |
␊ |
Decrypting and decompressing the Central Directory is accomplished␊ |
in the same manner as decrypting and decompressing a file.␊ |
␊ |
Certificate Processing Method:␊ |
-----------------------------␊ |
␊ |
The Certificate Processing Method for ZIP file encryption 
defines the following additional data fields:␊ |
␊ |
1. Certificate Flag Values␊ |
␊ |
Additional processing flags that can be present in the Flags field of both ␊ |
the 0x0017 field of the central directory Extra Field and the Decryption ␊ |
header record preceding compressed file data are:␊ |
␊ |
0x0007 - reserved for future use␊ |
0x000F - reserved for future use␊ |
0x0100 - Indicates non-OAEP key wrapping was used. If this
field is set, the version needed to extract must
be at least 61. This means OAEP key wrapping is not␊ |
used when generating a Master Session Key using␊ |
ErdData.␊ |
0x4000 - ErdData must be decrypted using 3DES-168, otherwise use the␊ |
same algorithm used for encrypting the file contents.␊ |
0x8000 - reserved for future use␊ |
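The two defined certificate flag bits could be interpreted as in this sketch. The function name and the returned dictionary keys are hypothetical; the checks follow the rules listed above, namely that 0x0100 requires a version needed to extract of at least 61, and that 0x4000 selects 3DES-168 for ErdData instead of the file-content algorithm.

```python
NON_OAEP_KEY_WRAP = 0x0100
ERD_3DES_168 = 0x4000


def describe_cert_flags(flags: int, version_needed: int) -> dict:
    """Interpret certificate bits of a 0x0017 or Decryption Header Flags field."""
    non_oaep = bool(flags & NON_OAEP_KEY_WRAP)
    if non_oaep and version_needed < 61:
        raise ValueError("non-OAEP key wrapping requires version needed to extract >= 61")
    return {
        "oaep_key_wrapping": not non_oaep,
        # When clear, ErdData uses the same algorithm as the file contents.
        "erd_uses_3des_168": bool(flags & ERD_3DES_168),
    }
```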
␊ |
␊ |
2. CertData - Extra Field 0x0017 record certificate data structure␊ |
␊ |
The data structure used to store certificate data within the section
of the Extra Field defined by the CertData field of the 0x0017
record is as shown:
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
RCount 4 bytes Number of recipients. ␊ |
HashAlg 2 bytes Hash algorithm identifier␊ |
HSize 2 bytes Hash size␊ |
SRList (var) Simple list of recipients' hashed public keys
␊ |
␊ |
RCount This defines the number of intended recipients whose 
public keys were used for encryption. This identifies␊ |
the number of elements in the SRList.␊ |
␊ |
HashAlg This defines the hash algorithm used to calculate␊ |
the public key hash of each public key used␊ |
for encryption. This field currently supports␊ |
only the following value for SHA-1␊ |
␊ |
0x8004 - SHA1␊ |
␊ |
HSize This defines the size of a hashed public key.␊ |
␊ |
SRList This is a variable length list of the hashed ␊ |
public keys for each intended recipient. Each ␊ |
element in this list is HSize bytes long. The total size of 
SRList is determined using RCount * HSize.␊ |
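A CertData block might be decoded as sketched here, splitting SRList into RCount entries of HSize bytes each. The function name is hypothetical, and the little-endian byte order is assumed from the general ZIP convention since this table does not restate it.

```python
import struct
from typing import List, Tuple


def parse_cert_data(buf: bytes) -> Tuple[int, List[bytes]]:
    """Return (HashAlg, SRList) from a 0x0017 CertData block."""
    rcount, hash_alg, hsize = struct.unpack_from("<IHH", buf, 0)
    if hsize == 0:
        if rcount:
            raise ValueError("HSize is 0 but RCount is not")
        return hash_alg, []
    start = 8
    end = start + rcount * hsize  # total SRList size = RCount * HSize
    if len(buf) < end:
        raise ValueError("CertData truncated")
    sr_list = [bytes(buf[i:i + hsize]) for i in range(start, end, hsize)]
    return hash_alg, sr_list
```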
␊ |
␊ |
3. Reserved1 - Certificate Decryption Header Reserved1 Data:␊ |
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
RCount 4 bytes Number of recipients. ␊ |
␊ |
RCount This defines the number of intended recipients whose 
public keys were used for encryption. This defines␊ |
the number of elements in the REList field defined below.␊ |
␊ |
␊ |
4. Reserved2 - Certificate Decryption Header Reserved2 Data Structures:␊ |
␊ |
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
HashAlg 2 bytes Hash algorithm identifier␊ |
HSize 2 bytes Hash size␊ |
REList (var) List of recipient data elements␊ |
␊ |
␊ |
HashAlg This defines the hash algorithm used to calculate␊ |
the public key hash of each public key used␊ |
for encryption. This field currently supports␊ |
only the following value for SHA-1␊ |
␊ |
0x8004 - SHA1␊ |
␊ |
HSize This defines the size of a hashed public key␊ |
defined in REHData.␊ |
␊ |
REList This is a variable length list of recipient data. 
Each element in this list consists of a Recipient␊ |
Element data structure as follows:␊ |
␊ |
␊ |
Recipient Element (REList) Data Structure:␊ |
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
RESize 2 bytes Size of REHData + REKData␊ |
REHData HSize Hash of recipient's public key
REKData (var) Simple key blob␊ |
␊ |
␊ |
RESize This defines the size of an individual REList ␊ |
element. This value is the combined size of the␊ |
REHData field + REKData field. REHData is defined by␊ |
HSize. REKData is variable and can be calculated␊ |
for each REList element using RESize and HSize.␊ |
␊ |
REHData Hashed public key for this recipient.␊ |
␊ |
REKData Simple Key Blob. The format of this data structure␊ |
is identical to that defined in the Microsoft␊ |
CryptoAPI and generated using the CryptExportKey()␊ |
function. The version of the Simple Key Blob␊ |
supported at this time is 0x02 as defined by␊ |
Microsoft.␊ |
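Combining Reserved1 and Reserved2, a reader could walk REList as sketched below. Each element's REKData length is RESize - HSize, as described above. The function name is hypothetical, little-endian byte order is assumed, and the sketch does not interpret the Simple Key Blob itself.

```python
import struct
from typing import List, Tuple


def parse_recipient_list(reserved1: bytes, reserved2: bytes) -> List[Tuple[bytes, bytes]]:
    """Return (REHData, REKData) pairs from Reserved1/Reserved2 decryption header data."""
    (rcount,) = struct.unpack_from("<I", reserved1, 0)
    hash_alg, hsize = struct.unpack_from("<HH", reserved2, 0)
    if hash_alg != 0x8004:
        raise ValueError(f"unsupported HashAlg 0x{hash_alg:04X}")
    off, out = 4, []
    for _ in range(rcount):
        (resize,) = struct.unpack_from("<H", reserved2, off)
        off += 2
        if resize < hsize or off + resize > len(reserved2):
            raise ValueError("malformed REList element")
        element = reserved2[off:off + resize]
        # REHData is HSize bytes; REKData (Simple Key Blob) is the remainder.
        out.append((bytes(element[:hsize]), bytes(element[hsize:])))
        off += resize
    return out
```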
␊ |
Certificate Processing - Central Directory Encryption:␊ |
------------------------------------------------------␊ |
␊ |
Central Directory Encryption using Digital Certificates will ␊ |
operate in a manner similar to that of Single Password Central␊ |
Directory Encryption. The Archive Extra Data record will only be
present when there is data to place into it. Currently, data is placed into this
record when digital certificates are used for either encrypting ␊ |
or signing the files within a ZIP file. When only password ␊ |
encryption is used with no certificate encryption or digital ␊ |
signing, this record is not currently needed. When present, this ␊ |
record will appear before the start of the actual Central Directory ␊ |
data structure and will be located immediately after the Archive ␊ |
Decryption Header if the Central Directory is encrypted.␊ |
␊ |
The Archive Extra Data record will be used to store the following␊ |
information. Additional data may be added in future versions.␊ |
␊ |
Extra Data Fields:␊ |
␊ |
0x0014 - PKCS#7 Store for X.509 Certificates␊ |
0x0016 - X.509 Certificate ID and Signature for central directory␊ |
0x0019 - PKCS#7 Encryption Recipient Certificate List␊ |
␊ |
For digital certificate processing, the 0x0014 and 0x0016 Extra Data
records would otherwise be located in the first record of the Central
Directory. When encrypting or compressing the Central Directory, the
0x0014 and 0x0016 records must be located in the Archive Extra Data
record and they should not remain in the first Central Directory
record. The Archive Extra Data record will also be used to store
the 0x0019 data. 
␊ |
When present, the size of the Archive Extra Data record will be␊ |
included in the size of the Central Directory. The data of the␊ |
Archive Extra Data record will also be compressed and encrypted␊ |
along with the Central Directory data structure.␊ |
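Once the Central Directory has been decrypted and decompressed, the certificate fields could be pulled from the Archive Extra Data record as sketched here. The record layout (4-byte signature 0x08064b50, 4-byte extra field length, then extra fields each with a 2-byte Header ID and a 2-byte data size) comes from the Archive Extra Data record definition earlier in this specification. The function name is hypothetical.

```python
import struct
from typing import Dict

ARCHIVE_EXTRA_SIG = 0x08064B50
CERT_IDS = {0x0014, 0x0016, 0x0019}


def certificate_fields(record: bytes) -> Dict[int, bytes]:
    """Collect 0x0014/0x0016/0x0019 blocks from a decrypted Archive Extra Data record."""
    sig, length = struct.unpack_from("<II", record, 0)
    if sig != ARCHIVE_EXTRA_SIG:
        raise ValueError("not an Archive Extra Data record")
    data = record[8:8 + length]
    fields, off = {}, 0
    while off + 4 <= len(data):
        header_id, size = struct.unpack_from("<HH", data, off)
        if header_id in CERT_IDS:
            fields[header_id] = bytes(data[off + 4:off + 4 + size])
        off += 4 + size  # skip fields this sketch does not handle
    return fields
```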
␊ |
Certificate Processing Differences:␊ |
␊ |
The Certificate Processing Method of encryption differs from the␊ |
Single Password Symmetric Encryption Method as follows. Instead␊ |
of using a user-defined password to generate a master session key,␊ |
cryptographically random data is used. The key material is then␊ |
wrapped using standard key-wrapping techniques. This key material␊ |
is wrapped using the public key of each recipient that will need␊ |
to decrypt the file using their corresponding private key.␊ |
␊ |
This specification currently assumes digital certificates will follow␊ |
the X.509 V3 format for 1024 bit and higher RSA format digital␊ |
certificates. Implementation of this Certificate Processing Method␊ |
requires supporting logic for key access and management. This logic␊ |
is outside the scope of this specification.␊ |
␊ |
OAEP Processing with Certificate-based Encryption:␊ |
␊ |
OAEP stands for Optimal Asymmetric Encryption Padding. It is a␊ |
strengthening technique used for small encoded items such as decryption␊ |
keys. This is commonly applied in cryptographic key-wrapping techniques␊ |
and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification ␊ |
were designed to support OAEP key-wrapping for certificate-based ␊ |
decryption keys for additional security. ␊ |
␊ |
Support for private keys stored on Smartcards or Tokens introduced␊ |
a conflict with this OAEP logic. Most card and token products do ␊ |
not support the additional strengthening applied to OAEP key-wrapped ␊ |
data. In order to resolve this conflict, versions 6.1 and above of this ␊ |
specification will no longer support OAEP when encrypting using ␊ |
digital certificates. ␊ |
␊ |
Versions of PKZIP available during initial development of the ␊ |
certificate processing method set a value of 61 into the ␊ |
version needed to extract field for a file. This indicates that ␊ |
non-OAEP key wrapping is used. This affects certificate encryption ␊ |
only, and password encryption functions should not be affected by ␊ |
this value. This means values of 61 may be found on files encrypted␊ |
with certificates only, or on files encrypted with both password␊ |
encryption and certificate encryption. Files encrypted with both␊ |
methods can safely be decrypted using the password methods documented.␊ |
␊ |
XVII. Change Process␊ |
--------------------␊ |
␊ |
In order for the .ZIP file format to remain a viable definition, this␊ |
specification should be considered as open for periodic review and␊ |
revision. Although this format was originally designed with a ␊ |
certain level of extensibility, not all changes in technology␊ |
(present or future) were or will be necessarily considered in its␊ |
design. If your application requires new definitions to the␊ |
extensible sections in this format, or if you would like to ␊ |
submit new data structures, please forward your request to␊ |
zipformat@pkware.com. All submissions will be reviewed by the␊ |
ZIP File Specification Committee for possible inclusion into␊ |
future versions of this specification. Periodic revisions␊ |
to this specification will be published to ensure interoperability. ␊ |
We encourage comments and feedback that may help improve clarity ␊ |
or content.␊ |
␊ |
XVIII. Incorporating PKWARE Proprietary Technology into Your Product␊ |
--------------------------------------------------------------------␊ |
␊ |
PKWARE is committed to the interoperability and advancement of the␊ |
.ZIP format. PKWARE offers a free license for certain technological␊ |
aspects described above under certain restrictions and conditions.␊ |
However, the use or implementation in a product of certain technological␊ |
aspects set forth in the current APPNOTE, including those with regard to␊ |
strong encryption, patching, or extended tape operations requires a ␊ |
license from PKWARE. Please contact PKWARE with regard to acquiring␊ |
a license.␊ |
␊ |
XIX. Acknowledgements␊ |
----------------------␊ |
␊ |
In addition to the above mentioned contributors to PKZIP and PKUNZIP,␊ |
I would like to extend special thanks to Robert Mahoney for suggesting␊ |
the extension .ZIP for this software.␊ |
␊ |
XX. References␊ |
--------------␊ |
␊ |
Fiala, Edward R., and Greene, Daniel H., "Data compression with␊ |
finite windows", Communications of the ACM, Volume 32, Number 4,␊ |
April 1989, pages 490-505.␊ |
␊ |
Held, Gilbert, "Data Compression, Techniques and Applications,␊ |
Hardware and Software Considerations", John Wiley & Sons, 1987.␊ |
␊ |
Huffman, D.A., "A method for the construction of minimum-redundancy␊ |
codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,␊ |
pages 1098-1101.␊ |
␊ |
Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
Number 10, October 1989, pages 29-37.␊ |
␊ |
Nelson, Mark, "The Data Compression Book", M&T Books, 1991.␊ |
␊ |
Storer, James A., "Data Compression, Methods and Theory",␊ |
Computer Science Press, 1988.
␊ |
Welch, Terry, "A Technique for High-Performance Data Compression",␊ |
IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.␊ |
␊ |
Ziv, J. and Lempel, A., "A universal algorithm for sequential data
compression", IEEE Transactions on Information Theory, Volume 23,
Number 3, May 1977, pages 337-343.
␊ |
Ziv, J. and Lempel, A., "Compression of individual sequences via␊ |
variable-rate coding", IEEE Transactions on Information Theory,␊ |
Volume 24, Number 5, September 1978, pages 530-536.␊ |
␊ |
␊ |
APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions␊ |
--------------------------------------------------------------␊ |
␊ |
Field Definition Structure:␊ |
␊ |
a. field length including length 2 bytes␊ |
b. field code 2 bytes␊ |
c. data x bytes␊ |
␊ |
Field Code Description␊ |
4001 Source type, e.g. CLP
4002 The text description of the library ␊ |
4003 The text description of the file␊ |
4004 The text description of the member␊ |
4005 x'F0' or 0 is PF-DTA, x'F1' or 1 is PF-SRC
4007 Database Type Code 1 byte␊ |
4008 Database file and fields definition␊ |
4009 GZIP file type 2 bytes␊ |
400B IFS code page 2 bytes␊ |
400C IFS Creation Time 4 bytes␊ |
400D IFS Access Time 4 bytes␊ |
400E IFS Modification time 4 bytes␊ |
005C Length of the records in the file 2 bytes␊ |
0068 GZIP two words 8 bytes␊ |
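The Field Definition Structure shared by Appendices A and B could be walked as in this sketch. Two details are assumptions, not stated in the appendix: the 2-byte length and code are read as big-endian, and the length is taken to cover the whole field (length, code, and data), so the data is length - 4 bytes. The function name is hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_attribute_fields(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field code, data) pairs from a run of field definitions."""
    off = 0
    while off + 4 <= len(data):
        # Assumption: big-endian length/code; length includes itself and the code.
        length, code = struct.unpack_from(">HH", data, off)
        if length < 4 or off + length > len(data):
            raise ValueError(f"bad field length {length} at offset {off}")
        yield code, bytes(data[off + 4:off + length])
        off += length
```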
␊ |
APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions␊ |
------------------------------------------------------------␊ |
␊ |
Field Definition Structure:␊ |
␊ |
a. field length including length 2 bytes␊ |
b. field code 2 bytes␊ |
c. data x bytes␊ |
␊ |
Field Code Description␊ |
0001 File Type 2 bytes ␊ |
0002 NonVSAM Record Format 1 byte␊ |
0003 Reserved␉␉␊ |
0004 NonVSAM Block Size 2 bytes Big Endian␊ |
0005 Primary Space Allocation 3 bytes Big Endian␊ |
0006 Secondary Space Allocation 3 bytes Big Endian␊ |
0007 Space Allocation Type 1 byte flag
0008 Modification Date Retired with PKZIP 5.0 +␊ |
0009 Expiration Date Retired with PKZIP 5.0 +␊ |
000A PDS Directory Block Allocation 3 bytes Big Endian binary value␊ |
000B NonVSAM Volume List variable␉␉␊ |
000C UNIT Reference Retired with PKZIP 5.0 +␊ |
000D DF/SMS Management Class 8 bytes EBCDIC Text Value␊ |
000E DF/SMS Storage Class 8 bytes EBCDIC Text Value␊ |
000F DF/SMS Data Class 8 bytes EBCDIC Text Value␊ |
0010 PDS/PDSE Member Info. 30 bytes␉␊ |
0011 VSAM sub-filetype 2 bytes␉␉␊ |
0012 VSAM LRECL 13 bytes EBCDIC "(num_avg num_max)"␊ |
0013 VSAM Cluster Name Retired with PKZIP 5.0 +␊ |
0014 VSAM KSDS Key Information 13 bytes EBCDIC "(num_length num_position)"␊ |
0015 VSAM Average LRECL 5 bytes EBCDIC num_value padded with blanks␊ |
0016 VSAM Maximum LRECL 5 bytes EBCDIC num_value padded with blanks␊ |
0017 VSAM KSDS Key Length 5 bytes EBCDIC num_value padded with blanks␊ |
0018 VSAM KSDS Key Position 5 bytes EBCDIC num_value padded with blanks␊ |
0019 VSAM Data Name 1-44 bytes EBCDIC text string␊ |
001A VSAM KSDS Index Name 1-44 bytes EBCDIC text string␊ |
001B VSAM Catalog Name 1-44 bytes EBCDIC text string␊ |
001C VSAM Data Space Type 9 bytes EBCDIC text string␊ |
001D VSAM Data Space Primary 9 bytes EBCDIC num_value left-justified␊ |
001E VSAM Data Space Secondary 9 bytes EBCDIC num_value left-justified␊ |
001F VSAM Data Volume List variable EBCDIC text list of 6-character Volume IDs␊ |
0020 VSAM Data Buffer Space 8 bytes EBCDIC num_value left-justified␊ |
0021 VSAM Data CISIZE 5 bytes EBCDIC num_value left-justified␊ |
0022 VSAM Erase Flag 1 byte flag␉␉␊ |
0023 VSAM Free CI % 3 bytes EBCDIC num_value left-justified␊ |
0024 VSAM Free CA % 3 bytes EBCDIC num_value left-justified␊ |
0025 VSAM Index Volume List variable EBCDIC text list of 6-character Volume IDs␊ |
0026 VSAM Ordered Flag 1 byte flag␉␉␊ |
0027 VSAM REUSE Flag 1 byte flag␉␉␊ |
0028 VSAM SPANNED Flag 1 byte flag␉␉␊ |
0029 VSAM Recovery Flag 1 byte flag␉␉␊ |
002A VSAM WRITECHK Flag 1 byte flag␉␉␊ |
002B VSAM Cluster/Data SHROPTS 3 bytes EBCDIC "n,y"␉␊ |
002C VSAM Index SHROPTS 3 bytes EBCDIC "n,y"␉␊ |
002D VSAM Index Space Type 9 bytes EBCDIC text string␊ |
002E VSAM Index Space Primary 9 bytes EBCDIC num_value left-justified␊ |
002F VSAM Index Space Secondary 9 bytes EBCDIC num_value left-justified␊ |
0030 VSAM Index CISIZE 5 bytes EBCDIC num_value left-justified␊ |
0031 VSAM Index IMBED 1 byte flag␉␉␊ |
0032 VSAM Index Ordered Flag 1 byte flag␉␉␊ |
0033 VSAM REPLICATE Flag 1 byte flag␉␉␊ |
0034 VSAM Index REUSE Flag 1 byte flag␉␉␊ |
0035 VSAM Index WRITECHK Flag 1 byte flag Retired with PKZIP 5.0 +␊ |
0036 VSAM Owner 8 bytes EBCDIC text string␊ |
0037 VSAM Index Owner 8 bytes EBCDIC text string␊ |
0038 Reserved␊ |
0039 Reserved␊ |
003A Reserved␊ |
003B Reserved␊ |
003C Reserved␊ |
003D Reserved␊ |
003E Reserved␊ |
003F Reserved␊ |
0040 Reserved␊ |
0041 Reserved␊ |
0042 Reserved␊ |
0043 Reserved␊ |
0044 Reserved␊ |
0045 Reserved␊ |
0046 Reserved␊ |
0047 Reserved␊ |
0048 Reserved␊ |
0049 Reserved␊ |
004A Reserved␊ |
004B Reserved␊ |
004C Reserved␊ |
004D Reserved␊ |
004E Reserved␊ |
004F Reserved␊ |
0050 Reserved␊ |
0051 Reserved␊ |
0052 Reserved␊ |
0053 Reserved␊ |
0054 Reserved␊ |
0055 Reserved␊ |
0056 Reserved␊ |
0057 Reserved␊ |
0058 PDS/PDSE Member TTR Info. 6 bytes Big Endian␊ |
0059 PDS 1st LMOD Text TTR 3 bytes Big Endian␊ |
005A PDS LMOD EP Rec # 4 bytes Big Endian␊ |
005B Reserved␊ |
005C Max Length of records 2 bytes Big Endian␊ |
005D PDSE Flag 1 byte flag␊ |
005E Reserved␊ |
005F Reserved␊ |
0060 Reserved␊ |
0061 Reserved␊ |
0062 Reserved␊ |
0063 Reserved␊ |
0064 Reserved␊ |
0065 Last Date Referenced 4 bytes Packed Hex "yyyymmdd"␊ |
0066 Date Created 4 bytes Packed Hex "yyyymmdd"␊ |
0068 GZIP two words 8 bytes␊ |
0071 Extended NOTE Location 12 bytes Big Endian␊ |
0072 Archive device UNIT 6 bytes EBCDIC␊ |
0073 Archive 1st Volume 6 bytes EBCDIC␊ |
0074 Archive 1st VOL File Seq# 2 bytes Binary␊ |
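Several z/OS values above are EBCDIC text or packed dates. The helpers below sketch how they might be decoded. They are hypothetical, and they assume EBCDIC code page 037 (the digits 0xF0-0xF9, the blank 0x40, and the parentheses agree across common EBCDIC code pages) and that a Packed Hex "yyyymmdd" date stores one decimal digit per nibble.

```python
from typing import Tuple


def ebcdic_num(value: bytes) -> int:
    """Decode an EBCDIC num_value field, blank-padded or left-justified."""
    return int(value.decode("cp037").strip())  # assumption: code page 037


def ebcdic_pair(value: bytes) -> Tuple[int, int]:
    """Decode a 13-byte EBCDIC "(n1 n2)" field such as VSAM LRECL (0012)."""
    first, second = value.decode("cp037").strip().strip("()").split()
    return int(first), int(second)


def packed_hex_date(value: bytes) -> str:
    """Decode a 4-byte Packed Hex "yyyymmdd" date, one digit per nibble."""
    if len(value) != 4:
        raise ValueError("expected 4 bytes")
    return value.hex()
```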
␊ |
APPENDIX C - Zip64 Extensible Data Sector Mappings (EFS)␊ |
--------------------------------------------------------␊ |
␊ |
-Z390 Extra Field:␊ |
␊ |
The following is the general layout of the attributes for the ␊ |
ZIP 64 "extra" block for extended tape operations. Portions of ␊ |
this extended tape processing technology are covered under a 
pending patent application. The use or implementation in a ␊ |
product of certain technological aspects set forth in the ␊ |
current APPNOTE, including those with regard to strong encryption,␊ |
patching or extended tape operations, requires a license from␊ |
PKWARE. Please contact PKWARE with regard to acquiring a license. ␊ |
␊ |
␊ |
Note: some fields are stored in Big Endian format. All text is 
in EBCDIC format unless otherwise specified.
␊ |
Value Size Description␊ |
----- ---- -----------␊ |
(Z390) 0x0065 2 bytes Tag for this "extra" block type␊ |
Size 4 bytes Size for the following data block␊ |
Tag 4 bytes EBCDIC "Z390"␊ |
Length71 2 bytes Big Endian␊ |
Subcode71 2 bytes Enote type code␊ |
FMEPos 1 byte␊ |
Length72 2 bytes Big Endian␊ |
Subcode72 2 bytes Unit type code␊ |
Unit 1 byte Unit␊ |
Length73 2 bytes Big Endian␊ |
Subcode73 2 bytes Volume1 type code␊ |
FirstVol 1 byte Volume␊ |
Length74 2 bytes Big Endian␊ |
Subcode74 2 bytes FirstVol file sequence␊ |
FileSeq 2 bytes Sequence ␊ |
␊ |
APPENDIX D - Language Encoding (EFS)␊ |
------------------------------------␊ |
␊ |
The ZIP format has historically supported only the original IBM PC character ␊ |
encoding set, commonly referred to as IBM Code Page 437. This limits storing ␊ |
file name characters to only those within the original MS-DOS range of values ␊ |
and does not properly support file names in other character encodings, or ␊ |
languages. To address this limitation, this specification will support the ␊ |
following change. ␊ |
␊ |
If general purpose bit 11 is unset, the file name and comment should conform ␊ |
to the original ZIP character encoding. If general purpose bit 11 is set, the ␊ |
filename and comment must support The Unicode Standard, Version 4.1.0 or ␊ |
greater using the character encoding form defined by the UTF-8 storage ␊ |
specification. The Unicode Standard is published by The Unicode
Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files ␊ |
is expected to not include a byte order mark (BOM). ␊ |
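Reading and writing names under the bit 11 rule could look like this sketch. The function names are hypothetical, and preferring CP437 whenever a name fits (so that older readers still see it correctly) is a design choice made here, not a requirement of this appendix.

```python
from typing import Tuple

FLAG_UTF8 = 1 << 11  # general purpose bit 11


def decode_zip_name(raw: bytes, general_purpose_flags: int) -> str:
    if general_purpose_flags & FLAG_UTF8:
        return raw.decode("utf-8")  # bit 11 set: UTF-8, no BOM expected
    return raw.decode("cp437")      # bit 11 clear: IBM Code Page 437


def encode_zip_name(name: str) -> Tuple[bytes, int]:
    """Return (raw bytes, bit 11 value), using CP437 when the name fits."""
    try:
        return name.encode("cp437"), 0
    except UnicodeEncodeError:
        return name.encode("utf-8"), FLAG_UTF8
```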
␊ |
Applications may choose to supplement this file name storage through the use ␊ |
of the 0x0008 Extra Field. Storage for this optional field is currently 
undefined; however, it will be used to allow storing extended information 
on source or target encoding that may further assist applications with file ␊ |
name, or file content encoding tasks. Please contact PKWARE with any␊ |
requirements on how this field should be used.␊ |
␊ |
The 0x0008 Extra Field storage may be used with either setting for general ␊ |
purpose bit 11. Examples of the intended usage for this field include storing 
whether "modified-UTF-8" (JAVA) or UTF-8-MAC is used. Similarly, other 
commonly used character encoding (code page) designations can be indicated ␊ |
through this field. Formalized values for use of the 0x0008 record remain ␊ |
undefined at this time. The definition for the layout of the 0x0008 field␊ |
will be published when available. Use of the 0x0008 Extra Field provides␊ |
for storing data within a ZIP file in an encoding other than IBM Code␊ |
Page 437 or UTF-8.␊ |
␊ |
General purpose bit 11 will not imply any encoding of file content or␊ |
password. Values defining character encoding for file content or ␊ |
password must be stored within the 0x0008 Extended Language Encoding ␊ |
Extra Field.␊ |
␊ |
␊ |