Warning, /cpukit/compression/zlib/doc/rfc1950.txt is written in an unsupported language. File is not indexed.
0001
0002
0003
0004
0005
0006
0007 Network Working Group P. Deutsch
0008 Request for Comments: 1950 Aladdin Enterprises
0009 Category: Informational J-L. Gailly
0010 Info-ZIP
0011 May 1996
0012
0013
0014 ZLIB Compressed Data Format Specification version 3.3
0015
0016 Status of This Memo
0017
0018 This memo provides information for the Internet community. This memo
0019 does not specify an Internet standard of any kind. Distribution of
0020 this memo is unlimited.
0021
0022 IESG Note:
0023
0024 The IESG takes no position on the validity of any Intellectual
0025 Property Rights statements contained in this document.
0026
0027 Notices
0028
0029 Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
0030
0031 Permission is granted to copy and distribute this document for any
0032 purpose and without charge, including translations into other
0033 languages and incorporation into compilations, provided that the
0034 copyright notice and this notice are preserved, and that any
0035 substantive changes or deletions from the original are clearly
0036 marked.
0037
0038 A pointer to the latest version of this and related documentation in
0039 HTML format can be found at the URL
0040 <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
0041
0042 Abstract
0043
0044 This specification defines a lossless compressed data format. The
0045 data can be produced or consumed, even for an arbitrarily long
0046 sequentially presented input data stream, using only an a priori
0047 bounded amount of intermediate storage. The format presently uses
0048 the DEFLATE compression method but can be easily extended to use
0049 other compression methods. It can be implemented readily in a manner
0050 not covered by patents. This specification also defines the ADLER-32
0051 checksum (an extension and improvement of the Fletcher checksum),
0052 used for detection of data corruption, and provides an algorithm for
0053 computing it.
0054
0055
0056
0057
0058 Deutsch & Gailly Informational [Page 1]
0059
0060 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0061
0062
0063 Table of Contents
0064
0065 1. Introduction ................................................... 2
0066 1.1. Purpose ................................................... 2
0067 1.2. Intended audience ......................................... 3
0068 1.3. Scope ..................................................... 3
0069 1.4. Compliance ................................................ 3
0070 1.5. Definitions of terms and conventions used ................ 3
0071 1.6. Changes from previous versions ............................ 3
0072 2. Detailed specification ......................................... 3
0073 2.1. Overall conventions ....................................... 3
0074 2.2. Data format ............................................... 4
0075 2.3. Compliance ................................................ 7
0076 3. References ..................................................... 7
0077 4. Source code .................................................... 8
0078 5. Security Considerations ........................................ 8
0079 6. Acknowledgements ............................................... 8
0080 7. Authors' Addresses ............................................. 8
0081 8. Appendix: Rationale ............................................ 9
0082 9. Appendix: Sample code ..........................................10
0083
0084 1. Introduction
0085
0086 1.1. Purpose
0087
0088 The purpose of this specification is to define a lossless
0089 compressed data format that:
0090
0091 * Is independent of CPU type, operating system, file system,
0092 and character set, and hence can be used for interchange;
0093
0094 * Can be produced or consumed, even for an arbitrarily long
0095 sequentially presented input data stream, using only an a
0096 priori bounded amount of intermediate storage, and hence can
0097 be used in data communications or similar structures such as
0098 Unix filters;
0099
0100 * Can use a number of different compression methods;
0101
0102 * Can be implemented readily in a manner not covered by
0103 patents, and hence can be practiced freely.
0104
0105 The data format defined by this specification does not attempt to
0106 allow random access to compressed data.
0107
0108
0109
0110
0111
0112
0113
0114 Deutsch & Gailly Informational [Page 2]
0115
0116 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0117
0118
0119 1.2. Intended audience
0120
0121 This specification is intended for use by implementors of software
0122 to compress data into zlib format and/or decompress data from zlib
0123 format.
0124
0125 The text of the specification assumes a basic background in
0126 programming at the level of bits and other primitive data
0127 representations.
0128
0129 1.3. Scope
0130
0131 The specification specifies a compressed data format that can be
0132 used for in-memory compression of a sequence of arbitrary bytes.
0133
0134 1.4. Compliance
0135
0136 Unless otherwise indicated below, a compliant decompressor must be
0137 able to accept and decompress any data set that conforms to all
0138 the specifications presented here; a compliant compressor must
0139 produce data sets that conform to all the specifications presented
0140 here.
0141
0142 1.5. Definitions of terms and conventions used
0143
0144 byte: 8 bits stored or transmitted as a unit (same as an octet).
0145 (For this specification, a byte is exactly 8 bits, even on
0146 machines which store a character on a number of bits different
0147 from 8.) See below, for the numbering of bits within a byte.
0148
0149 1.6. Changes from previous versions
0150
0151 Version 3.1 was the first public release of this specification.
0152 In version 3.2, some terminology was changed and the Adler-32
0153 sample code was rewritten for clarity. In version 3.3, the
0154 support for a preset dictionary was introduced, and the
0155 specification was converted to RFC style.
0156
0157 2. Detailed specification
0158
0159 2.1. Overall conventions
0160
0161 In the diagrams below, a box like this:
0162
0163 +---+
0164 | | <-- the vertical bars might be missing
0165 +---+
0166
0167
0168
0169
0170 Deutsch & Gailly Informational [Page 3]
0171
0172 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0173
0174
0175 represents one byte; a box like this:
0176
0177 +==============+
0178 | |
0179 +==============+
0180
0181 represents a variable number of bytes.
0182
0183 Bytes stored within a computer do not have a "bit order", since
0184 they are always treated as a unit. However, a byte considered as
0185 an integer between 0 and 255 does have a most- and least-
0186 significant bit, and since we write numbers with the most-
0187 significant digit on the left, we also write bytes with the most-
0188 significant bit on the left. In the diagrams below, we number the
0189 bits of a byte so that bit 0 is the least-significant bit, i.e.,
0190 the bits are numbered:
0191
0192 +--------+
0193 |76543210|
0194 +--------+
0195
0196 Within a computer, a number may occupy multiple bytes. All
0197 multi-byte numbers in the format described here are stored with
0198 the MOST-significant byte first (at the lower memory address).
0199 For example, the decimal number 520 is stored as:
0200
0201 0 1
0202 +--------+--------+
0203 |00000010|00001000|
0204 +--------+--------+
0205 ^ ^
0206 | |
0207 | + less significant byte = 8
0208 + more significant byte = 2 x 256
0209
0210 2.2. Data format
0211
0212 A zlib stream has the following structure:
0213
0214 0 1
0215 +---+---+
0216 |CMF|FLG| (more-->)
0217 +---+---+
0218
0219
0220
0221
0222
0223
0224
0225
0226 Deutsch & Gailly Informational [Page 4]
0227
0228 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0229
0230
0231 (if FLG.FDICT set)
0232
0233 0 1 2 3
0234 +---+---+---+---+
0235 | DICTID | (more-->)
0236 +---+---+---+---+
0237
0238 +=====================+---+---+---+---+
0239 |...compressed data...| ADLER32 |
0240 +=====================+---+---+---+---+
0241
0242 Any data which may appear after ADLER32 are not part of the zlib
0243 stream.
0244
0245 CMF (Compression Method and flags)
0246 This byte is divided into a 4-bit compression method and a 4-
0247 bit information field depending on the compression method.
0248
0249 bits 0 to 3 CM Compression method
0250 bits 4 to 7 CINFO Compression info
0251
0252 CM (Compression method)
0253 This identifies the compression method used in the file. CM = 8
0254 denotes the "deflate" compression method with a window size up
0255 to 32K. This is the method used by gzip and PNG (see
0256 references [1] and [2] in Chapter 3, below, for the reference
0257 documents). CM = 15 is reserved. It might be used in a future
0258 version of this specification to indicate the presence of an
0259 extra field before the compressed data.
0260
0261 CINFO (Compression info)
0262 For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
0263 size, minus eight (CINFO=7 indicates a 32K window size). Values
0264 of CINFO above 7 are not allowed in this version of the
0265 specification. CINFO is not defined in this specification for
0266 CM not equal to 8.
0267
0268 FLG (FLaGs)
0269 This flag byte is divided as follows:
0270
0271 bits 0 to 4 FCHECK (check bits for CMF and FLG)
0272 bit 5 FDICT (preset dictionary)
0273 bits 6 to 7 FLEVEL (compression level)
0274
0275 The FCHECK value must be such that CMF and FLG, when viewed as
0276 a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
0277 is a multiple of 31.
0278
0279
0280
0281
0282 Deutsch & Gailly Informational [Page 5]
0283
0284 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0285
0286
0287 FDICT (Preset dictionary)
0288 If FDICT is set, a DICT dictionary identifier is present
0289 immediately after the FLG byte. The dictionary is a sequence of
0290 bytes which are initially fed to the compressor without
0291 producing any compressed output. DICT is the Adler-32 checksum
0292 of this sequence of bytes (see the definition of ADLER32
0293 below). The decompressor can use this identifier to determine
0294 which dictionary has been used by the compressor.
0295
0296 FLEVEL (Compression level)
0297 These flags are available for use by specific compression
0298 methods. The "deflate" method (CM = 8) sets these flags as
0299 follows:
0300
0301 0 - compressor used fastest algorithm
0302 1 - compressor used fast algorithm
0303 2 - compressor used default algorithm
0304 3 - compressor used maximum compression, slowest algorithm
0305
0306 The information in FLEVEL is not needed for decompression; it
0307 is there to indicate if recompression might be worthwhile.
0308
0309 compressed data
0310 For compression method 8, the compressed data is stored in the
0311 deflate compressed data format as described in the document
0312 "DEFLATE Compressed Data Format Specification" by L. Peter
0313 Deutsch. (See reference [3] in Chapter 3, below)
0314
0315 Other compressed data formats are not specified in this version
0316 of the zlib specification.
0317
0318 ADLER32 (Adler-32 checksum)
0319 This contains a checksum value of the uncompressed data
0320 (excluding any dictionary data) computed according to Adler-32
0321 algorithm. This algorithm is a 32-bit extension and improvement
0322 of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
0323 standard. See references [4] and [5] in Chapter 3, below)
0324
0325 Adler-32 is composed of two sums accumulated per byte: s1 is
0326 the sum of all bytes, s2 is the sum of all s1 values. Both sums
0327 are done modulo 65521. s1 is initialized to 1, s2 to zero. The
0328 Adler-32 checksum is stored as s2*65536 + s1 in most-
0329 significant-byte first (network) order.
0330
0331
0332
0333
0334
0335
0336
0337
0338 Deutsch & Gailly Informational [Page 6]
0339
0340 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0341
0342
0343 2.3. Compliance
0344
0345 A compliant compressor must produce streams with correct CMF, FLG
0346 and ADLER32, but need not support preset dictionaries. When the
0347 zlib data format is used as part of another standard data format,
0348 the compressor may use only preset dictionaries that are specified
0349 by this other data format. If this other format does not use the
0350 preset dictionary feature, the compressor must not set the FDICT
0351 flag.
0352
0353 A compliant decompressor must check CMF, FLG, and ADLER32, and
0354 provide an error indication if any of these have incorrect values.
0355 A compliant decompressor must give an error indication if CM is
0356 not one of the values defined in this specification (only the
0357 value 8 is permitted in this version), since another value could
0358 indicate the presence of new features that would cause subsequent
0359 data to be interpreted incorrectly. A compliant decompressor must
0360 give an error indication if FDICT is set and DICTID is not the
0361 identifier of a known preset dictionary. A decompressor may
0362 ignore FLEVEL and still be compliant. When the zlib data format
0363 is being used as a part of another standard format, a compliant
0364 decompressor must support all the preset dictionaries specified by
0365 the other format. When the other format does not use the preset
0366 dictionary feature, a compliant decompressor must reject any
0367 stream in which the FDICT flag is set.
0368
0369 3. References
0370
0371 [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
0372 available in ftp://ftp.uu.net/pub/archiving/zip/doc/
0373
0374 [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
0375 available in ftp://ftp.uu.net/graphics/png/documents/
0376
0377 [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
0378 available in ftp://ftp.uu.net/pub/archiving/zip/doc/
0379
0380 [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
0381 Transmissions," IEEE Transactions on Communications, Vol. COM-30,
0382 No. 1, January 1982, pp. 247-252.
0383
0384 [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
0385 November, 1993, pp. 144, 145. (Available from
0386 gopher://info.itu.ch). ITU-T X.244 is also the same as ISO 8073.
0387
0388
0389
0390
0391
0392
0393
0394 Deutsch & Gailly Informational [Page 7]
0395
0396 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0397
0398
0399 4. Source code
0400
0401 Source code for a C language implementation of a "zlib" compliant
0402 library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
0403
0404 5. Security Considerations
0405
0406 A decoder that fails to check the ADLER32 checksum value may be
0407 subject to undetected data corruption.
0408
0409 6. Acknowledgements
0410
0411 Trademarks cited in this document are the property of their
0412 respective owners.
0413
0414 Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
0415 the related software described in this specification. Glenn
0416 Randers-Pehrson converted this document to RFC and HTML format.
0417
0418 7. Authors' Addresses
0419
0420 L. Peter Deutsch
0421 Aladdin Enterprises
0422 203 Santa Margarita Ave.
0423 Menlo Park, CA 94025
0424
0425 Phone: (415) 322-0103 (AM only)
0426 FAX: (415) 322-1734
0427 EMail: <ghost@aladdin.com>
0428
0429
0430 Jean-Loup Gailly
0431
0432 EMail: <gzip@prep.ai.mit.edu>
0433
0434 Questions about the technical content of this specification can be
0435 sent by email to
0436
0437 Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
0438 Mark Adler <madler@alumni.caltech.edu>
0439
0440 Editorial comments on this specification can be sent by email to
0441
0442 L. Peter Deutsch <ghost@aladdin.com> and
0443 Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
0444
0445
0446
0447
0448
0449
0450 Deutsch & Gailly Informational [Page 8]
0451
0452 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0453
0454
0455 8. Appendix: Rationale
0456
0457 8.1. Preset dictionaries
0458
0459 A preset dictionary is specially useful to compress short input
0460 sequences. The compressor can take advantage of the dictionary
0461 context to encode the input in a more compact manner. The
0462 decompressor can be initialized with the appropriate context by
0463 virtually decompressing a compressed version of the dictionary
0464 without producing any output. However for certain compression
0465 algorithms such as the deflate algorithm this operation can be
0466 achieved without actually performing any decompression.
0467
0468 The compressor and the decompressor must use exactly the same
0469 dictionary. The dictionary may be fixed or may be chosen among a
0470 certain number of predefined dictionaries, according to the kind
0471 of input data. The decompressor can determine which dictionary has
0472 been chosen by the compressor by checking the dictionary
0473 identifier. This document does not specify the contents of
0474 predefined dictionaries, since the optimal dictionaries are
0475 application specific. Standard data formats using this feature of
0476 the zlib specification must precisely define the allowed
0477 dictionaries.
0478
0479 8.2. The Adler-32 algorithm
0480
0481 The Adler-32 algorithm is much faster than the CRC32 algorithm yet
0482 still provides an extremely low probability of undetected errors.
0483
0484 The modulo on unsigned long accumulators can be delayed for 5552
0485 bytes, so the modulo operation time is negligible. If the bytes
0486 are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
0487 and order sensitive, unlike the first sum, which is just a
0488 checksum. That 65521 is prime is important to avoid a possible
0489 large class of two-byte errors that leave the check unchanged.
0490 (The Fletcher checksum uses 255, which is not prime and which also
0491 makes the Fletcher check insensitive to single byte changes 0 <->
0492 255.)
0493
0494 The sum s1 is initialized to 1 instead of zero to make the length
0495 of the sequence part of s2, so that the length does not have to be
0496 checked separately. (Any sequence of zeroes has a Fletcher
0497 checksum of zero.)
0498
0499
0500
0501
0502
0503
0504
0505
0506 Deutsch & Gailly Informational [Page 9]
0507
0508 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0509
0510
0511 9. Appendix: Sample code
0512
0513 The following C code computes the Adler-32 checksum of a data buffer.
0514 It is written for clarity, not for speed. The sample code is in the
0515 ANSI C programming language. Non C users may find it easier to read
0516 with these hints:
0517
0518 & Bitwise AND operator.
0519 >> Bitwise right shift operator. When applied to an
0520 unsigned quantity, as here, right shift inserts zero bit(s)
0521 at the left.
0522 << Bitwise left shift operator. Left shift inserts zero
0523 bit(s) at the right.
0524 ++ "n++" increments the variable n.
0525 % modulo operator: a % b is the remainder of a divided by b.
0526
0527 #define BASE 65521 /* largest prime smaller than 65536 */
0528
0529 /*
0530 Update a running Adler-32 checksum with the bytes buf[0..len-1]
0531 and return the updated checksum. The Adler-32 checksum should be
0532 initialized to 1.
0533
0534 Usage example:
0535
0536 unsigned long adler = 1L;
0537
0538 while (read_buffer(buffer, length) != EOF) {
0539 adler = update_adler32(adler, buffer, length);
0540 }
0541 if (adler != original_adler) error();
0542 */
0543 unsigned long update_adler32(unsigned long adler,
0544 unsigned char *buf, int len)
0545 {
0546 unsigned long s1 = adler & 0xffff;
0547 unsigned long s2 = (adler >> 16) & 0xffff;
0548 int n;
0549
0550 for (n = 0; n < len; n++) {
0551 s1 = (s1 + buf[n]) % BASE;
0552 s2 = (s2 + s1) % BASE;
0553 }
0554 return (s2 << 16) + s1;
0555 }
0556
0557 /* Return the adler32 of the bytes buf[0..len-1] */
0558
0559
0560
0561
0562 Deutsch & Gailly Informational [Page 10]
0563
0564 RFC 1950 ZLIB Compressed Data Format Specification May 1996
0565
0566
0567 unsigned long adler32(unsigned char *buf, int len)
0568 {
0569 return update_adler32(1L, buf, len);
0570 }
0571
0572
0573
0574
0575
0576
0577
0578
0579
0580
0581
0582
0583
0584
0585
0586
0587
0588
0589
0590
0591
0592
0593
0594
0595
0596
0597
0598
0599
0600
0601
0602
0603
0604
0605
0606
0607
0608
0609
0610
0611
0612
0613
0614
0615
0616
0617
0618 Deutsch & Gailly Informational [Page 11]
0619