Just An Application

July 13, 2013

The Great Android Security Hole Of ’08 ? – Part One: ZIP Files

1.0 The Structure

A ZIP file comprises one or more Files, followed by a Central Directory, followed by an End of Central Directory Record.

[Figure: the structure of a ZIP file]

1.1 Files

A File comprises a Local File Header, followed by the File data, optionally followed by a Data Descriptor.

1.1.1 Local File Header

Signature: 0x04034b50

Field                        Size In Bytes
signature                    4
version needed to extract    2
general purpose bit flags    2
compression method           2
last mod file time           2
last mod file date           2
crc-32                       4
compressed size              4
uncompressed size            4
file name length             2
extra field length           2
file name                    file name length
extra field                  extra field length

The Local File Header contains information about the file whose data immediately follows.

Unfortunately, it does not always contain the information you actually want, which is the crc-32 of the File data and its compressed and uncompressed sizes.

This is because it is permissible to generate a Local File Header with all these values as zero.

This makes it possible to write a ZIP file without ever seeking backwards, which is great for the writer of the file but rubbish for anyone trying to read it.

If the crc-32, compressed size and uncompressed size are set to zero for this reason, then bit three of the general purpose bit flags field should be set and a Data Descriptor containing the correct values should follow the File data.
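As a rough illustration, the fixed part of a Local File Header can be decoded with Python's struct module. This is a minimal sketch rather than a complete reader: the names parse_local_header, LOCAL_HEADER and sizes_deferred are made up for the example, and the archive is generated in memory with the standard zipfile module purely so there is something to parse.

```python
import io
import struct
import zipfile

# Fixed part of a Local File Header (30 bytes, little-endian), in the order
# given in the table above.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50
FLAG_DATA_DESCRIPTOR = 1 << 3  # bit three of the general purpose bit flags


def parse_local_header(buf, offset=0):
    """Decode the Local File Header starting at `offset` in `buf`."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_length, extra_length) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("not a Local File Header")
    name_start = offset + LOCAL_HEADER.size
    return {
        "flags": flags,
        "method": method,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "name": buf[name_start:name_start + name_length],
        # The File data starts after the variable-length name and extra field.
        "data_offset": name_start + name_length + extra_length,
        # True when the real values are deferred to a Data Descriptor.
        "sizes_deferred": bool(flags & FLAG_DATA_DESCRIPTOR),
    }


# Build a small archive in memory so the example is self-contained.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello")

header = parse_local_header(archive.getvalue())
print(header["name"], header["sizes_deferred"], header["compressed_size"])
```

When sizes_deferred is true, the crc32 and size values returned here cannot be trusted and the reader has to look elsewhere for them.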

1.1.2 File Data

The File Data is the contents of the file. Its format is specified by the value of the ‘compression method’ of the preceding Local File Header.

1.1.3 Data Descriptor

Signature: 0x08074b50

Field                   Size In Bytes
signature [optional]    4
crc-32                  4
compressed size         4
uncompressed size       4

A Data Descriptor should be present immediately following the File data if bit 3 of the general purpose bit flags field of the preceding Local File Header is set.

Just to make it more interesting for anyone writing code to read ZIP files, the signature field does not have to be present.

This is because originally it did not have a signature.
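One way to cope with the optional signature is to peek at the first four bytes and skip them if they match. The sketch below does that. The function name parse_data_descriptor is hypothetical, and the approach is a heuristic: a crc-32 could in principle have the same value as the signature, which this example simply ignores.

```python
import struct
import zlib

DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(buf, offset):
    """Decode a Data Descriptor at `offset`, with or without its signature.

    Returns (crc32, compressed_size, uncompressed_size, descriptor_length).
    """
    (first_word,) = struct.unpack_from("<I", buf, offset)
    # Heuristic: if the first word matches, assume it is the signature.
    start = offset + 4 if first_word == DESCRIPTOR_SIGNATURE else offset
    crc32, compressed_size, uncompressed_size = struct.unpack_from("<III", buf, start)
    return crc32, compressed_size, uncompressed_size, (start - offset) + 12


# Hand-built descriptors for five bytes of stored data, in both forms.
data = b"hello"
fields = struct.pack("<III", zlib.crc32(data), len(data), len(data))
with_signature = struct.pack("<I", DESCRIPTOR_SIGNATURE) + fields
without_signature = fields

print(parse_data_descriptor(with_signature, 0))
print(parse_data_descriptor(without_signature, 0))
```

Both calls return the same three values. Only the reported length differs, 16 bytes with the signature and 12 without.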

1.2 The Central Directory

The Central Directory comprises one or more File Headers.

1.2.1 File Header

Signature: 0x02014b50

Field                              Size In Bytes
signature                          4
version made by                    2
version needed to extract          2
general purpose bit flags          2
compression method                 2
last mod file time                 2
last mod file date                 2
crc-32                             4
compressed size                    4
uncompressed size                  4
file name length                   2
extra field length                 2
file comment length                2
disk number start                  2
internal file attributes           2
external file attributes           4
relative offset of local header    4
file name                          file name length
extra field                        extra field length
file comment                       file comment length

A File Header contains information about a ‘File’ somewhere ahead of the Central Directory in the ZIP file.

Unlike the corresponding Local File Header, the crc-32, compressed size and uncompressed size fields must be present and correct.

The ‘relative offset of local header’ field specifies the offset, from the start of the ZIP file, of the Local File Header of the File that the information in the File Header applies to.

This means that given a File Header it is always possible to read the File data of the associated file.
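To make that concrete, here is a sketch that decodes two File Headers and follows each ‘relative offset of local header’ back to the File data. The names parse_file_header and read_file_data are invented for the example. It assumes an uncompressed (stored) archive built in memory with the standard zipfile module, and it takes a shortcut to locate the Central Directory that only works because the archive has no comment.

```python
import io
import struct
import zipfile
import zlib

# Fixed part of a Central Directory File Header (46 bytes, little-endian).
FILE_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")
FILE_HEADER_SIGNATURE = 0x02014B50


def parse_file_header(buf, offset):
    """Decode the File Header at `offset` in the Central Directory."""
    fields = FILE_HEADER.unpack_from(buf, offset)
    if fields[0] != FILE_HEADER_SIGNATURE:
        raise ValueError("not a Central Directory File Header")
    name_length, extra_length, comment_length = fields[10:13]
    name_start = offset + FILE_HEADER.size
    return {
        "method": fields[4],
        "crc32": fields[7],
        "compressed_size": fields[8],
        "uncompressed_size": fields[9],
        "local_header_offset": fields[16],
        "name": buf[name_start:name_start + name_length],
        # Offset of whatever follows this File Header.
        "next_offset": name_start + name_length + extra_length + comment_length,
    }


def read_file_data(buf, header):
    """Return the raw (possibly compressed) File data for a File Header."""
    local = header["local_header_offset"]
    # The name and extra field lengths are read from the Local File Header
    # itself, since its extra field need not match the File Header's.
    name_length, extra_length = struct.unpack_from("<HH", buf, local + 26)
    start = local + 30 + name_length + extra_length
    return buf[start:start + header["compressed_size"]]


archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
raw = archive.getvalue()

# Shortcut for this example only: with no archive comment, the End of Central
# Directory Record is exactly the last 22 bytes of the file.
(directory_offset,) = struct.unpack_from("<I", raw, len(raw) - 22 + 16)
first = parse_file_header(raw, directory_offset)
second = parse_file_header(raw, first["next_offset"])
for entry in (first, second):
    data = read_file_data(raw, entry)
    # The data is stored, so the crc-32 can be checked against it directly.
    print(entry["name"], data, zlib.crc32(data) == entry["crc32"])
```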

1.3 The End Of Central Directory Record

Signature: 0x06054b50

Field                                               Size In Bytes
signature                                           4
(three 2-byte multi-disk fields)                    6
total number of entries in the central directory    2
size of the central directory                       4
offset of start of central directory                4
.ZIP file comment length                            2
.ZIP file comment                                   .ZIP file comment length

The End of Central Directory Record marks the end of the Central Directory. It is the last thing in the ZIP file.

It contains the location of the first File Header in the Central Directory, in the ‘offset of start of central directory’ field, as well as the number of File Headers in the Central Directory.
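The record is small enough to decode in a few lines. The following is a sketch only: parse_end_record and the dictionary keys are names chosen for the example, the six bytes after the signature are skipped rather than interpreted, and the caller is assumed to already know where the record starts.

```python
import io
import struct
import zipfile

# Fixed part of the End of Central Directory Record (22 bytes, little-endian).
# The table above lists the six bytes after the signature as a single 6-byte
# entry; they are skipped here with "6x".
END_RECORD = struct.Struct("<I6xHIIH")
END_SIGNATURE = 0x06054B50


def parse_end_record(buf, offset):
    """Decode the End of Central Directory Record starting at `offset`."""
    (signature, entry_count, directory_size,
     directory_offset, comment_length) = END_RECORD.unpack_from(buf, offset)
    if signature != END_SIGNATURE:
        raise ValueError("not an End of Central Directory Record")
    comment_start = offset + END_RECORD.size
    return {
        "entry_count": entry_count,
        "directory_size": directory_size,
        "directory_offset": directory_offset,
        "comment": buf[comment_start:comment_start + comment_length],
    }


archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
raw = archive.getvalue()

# This archive has no comment, so the record is exactly the last 22 bytes.
end = parse_end_record(raw, len(raw) - END_RECORD.size)
print(end["entry_count"], end["directory_offset"], end["directory_size"])
```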

2.0 The Idiosyncrasies

2.1 You Cannot Read A ZIP File From The Front

This is not strictly true.

You can attempt to read a ZIP file from the front, that is, starting at byte 0. But if, as is often the case, a Local File Header does not include the size of the File data which follows, then you have a problem.

At this point you have to start scanning forward looking for the next recognizable signature, which can be the signature of a Data Descriptor, a Local File Header, or a File Header.

This is slow and tedious, and it assumes that File data will never include a valid record signature, which is a tad optimistic.
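It is easy to construct a case where that optimism is misplaced. The sketch below stores an uncompressed entry whose contents include the four bytes of a Local File Header signature, then scans the archive for that signature. The helper find_all is a made-up name, and the archive is built in memory with the standard zipfile module. The same thing happens naturally whenever one ZIP file is stored uncompressed inside another.

```python
import io
import struct
import zipfile

LOCAL_SIGNATURE = struct.pack("<I", 0x04034B50)  # b"PK\x03\x04" on disk

# A stored (uncompressed) entry whose contents happen to include the bytes
# of a Local File Header signature.
payload = b"some data " + LOCAL_SIGNATURE + b" more data"
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("tricky.bin", payload)
raw = archive.getvalue()


def find_all(buf, pattern):
    """Return every offset at which `pattern` occurs in `buf`."""
    offsets = []
    start = buf.find(pattern)
    while start != -1:
        offsets.append(start)
        start = buf.find(pattern, start + 1)
    return offsets


# A forward scan sees the real header at offset 0 and also a false match
# inside the File data.
hits = find_all(raw, LOCAL_SIGNATURE)
print(hits)
```

A scanner that trusted every match would treat the middle of tricky.bin as the start of another File.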

2.2 Finding The Central Directory

So if you cannot reliably read a ZIP file from the front, what do you do?

Answer: you read it from the back.

The only guaranteed way to find out what is supposed to be in a ZIP file is by examining the Central Directory, so the first thing you do with a ZIP file is read that.

But to read the Central Directory you need to know where it is.

Helpfully, the End of Central Directory Record contains the offset of the start of the Central Directory, so actually the first thing to do is to read that, and we know where that is.

Well, we do and we don’t, because although it is at the end of the ZIP file it is not fixed length: it ends with a variable-length comment.

So the first thing we need to do is scan backwards from the end of the ZIP file looking for the signature of the End of Central Directory Record.
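The backwards scan might look something like this. It is a sketch under a couple of assumptions that are mine rather than anything stated above: the function name find_end_record is hypothetical, the search is limited to the last 22 + 65535 bytes because the comment length field is two bytes wide, and a candidate is only accepted if its comment length takes the record exactly to the end of the file, which helps to reject signature bytes that merely appear inside a comment.

```python
import io
import struct
import zipfile

END_SIGNATURE = b"PK\x05\x06"  # 0x06054b50 as stored, little-endian
END_FIXED_SIZE = 22            # size of the record without its comment
MAX_COMMENT = 0xFFFF           # the comment length field is two bytes


def find_end_record(buf):
    """Return the offset of the End of Central Directory Record, or -1."""
    lowest = max(0, len(buf) - END_FIXED_SIZE - MAX_COMMENT)
    position = len(buf) - END_FIXED_SIZE
    while position >= lowest:
        if buf[position:position + 4] == END_SIGNATURE:
            # Accept the candidate only if its comment length takes the
            # record exactly to the end of the file.
            (comment_length,) = struct.unpack_from("<H", buf, position + 20)
            if position + END_FIXED_SIZE + comment_length == len(buf):
                return position
        position -= 1
    return -1


archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.comment = b"a comment of arbitrary length"
raw = archive.getvalue()

offset = find_end_record(raw)
print(offset, raw[offset:offset + 4])
```

A stricter reader might go on to check that the offset and size of the Central Directory found in the record are consistent with the position of the record itself.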

3.0 The Semantics

A ZIP file is simply a container and as such it has no semantics of its own, which is perfectly reasonable.

As long as the format is correct a ZIP file can contain pretty much anything.

This is something of a trap for the unwary programmer writing code to handle ZIP files as part of an ‘application’ that accepts them as some sort of ‘input’.

A great deal of effort will be expended dealing with the idiosyncrasies of the ZIP file format because otherwise it won’t be possible to do whatever it is with the contents of the ZIP file which prompted the writing of the code in the first place.

Rather less effort may go into thinking about what ought to be in the ZIP file being handled and, perhaps more importantly, what ought not to be.

The problem is exacerbated by the fact that the test data for the code (hopefully there will be some) will probably be created by a tool that effectively enforces some set of implicit semantics of its own and which is guaranteed to generate well-formed vanilla ZIP files.

One result of all this may be that the code is ill-prepared to handle an inadvertently or intentionally generated ZIP file which contains things it ought not to.


Copyright (c) 2013 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.
