Just An Application

November 16, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Three — Now Read Your Header

The 512 byte header of a compound file can be represented as a Swift struct like this

    struct FlatFileHeader
    {
        let signature               : EightBytes
        let clsid                   : CLSID
        //
        let minor                   : UInt16
        let major                   : UInt16
        let byteOrder               : UInt16
        let sectorShift             : UInt16
        let miniSectorShift         : UInt16
        let reserved                : SixBytes
        let nDirSectors             : UInt32
        let nFATSectors             : UInt32
        let firstDirSector          : UInt32
        let xactionSig              : UInt32
        let miniStreamCutoffSize    : UInt32
        let firstMiniFATSector      : UInt32
        let nMiniFATSectors         : UInt32
        let firstDIFATSector        : UInt32
        let nDIFATSectors           : UInt32
        let difat                   : DIFAT
    }

It is effectively a straight transcription from the specification with four exceptions

In the specification the first field is defined as

    Header Signature (8 bytes): ... 

the second field is defined as

    Header CLSID (16 bytes): ... 

the eighth field is defined as

    Reserved (6 bytes): ... 

and the last field is defined as

    DIFAT (436 bytes): ... 

In all these cases the field could be represented as

    [UInt8]

but that fails to capture the exact size of each field, so we do this instead.

We represent the ‘Header Signature’ using the struct EightBytes which looks something like this

    struct EightBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
        let b6 : UInt8
        let b7 : UInt8
    }

We represent the ‘Header CLSID’ using the struct CLSID which looks something like this

    struct CLSID
    {
        let first   : EightBytes
        let second  : EightBytes
    }

We represent the ‘Reserved’ field using the struct SixBytes which looks something like this

    struct SixBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
    }

The DIFAT field is not really 436 bytes but 109 32-bit integers which we can represent using the struct DIFAT which looks something like this

    struct DIFAT
    {
        let i0  : UInt32
        let i1  : UInt32
    }

At the moment it only represents the first two values but it can be ‘extended’ if necessary.

The result of using this seemingly random combination of rather odd structures is that the struct FlatFileHeader is indeed ‘flat’ which is to say that are all its fields are value types. They are in fact all structs.

Bearing in mind that the compound file format is little endian and so is this computer, and if we assume the Swift compiler

  1. represents the values of the UInt<N> types by the exact number of bytes necessary when the value is contained in a struct

  2. represents the fields in exactly the same order that they were defined and wihout padding,

  3. that it does the same recursively with the nested struct values, and

  4. that it ensures that the memory allocated for the struct at runtime is at least 4 byte aligned

then, not at all accidentally, the representation of the struct in memory would be identical to the representation of the header in the compound file, and vice-versa.

It is the vice-versa case which is of interest since it would imply that if we had an NSData object containing at least the
first 512 bytes of a compound file then we could ‘read’ the header like this

    let flatHeader = UnsafePointer<FlatFileHeader>(data.bytes).memory

This is not necessarily the piece of insane optimism that it might at first appear.

Given the seamless interworking between Swift and Objective-C it would make a great deal of sense if at runtime a Swift struct meeting the right criteria was identical to the equivalent Objective-C struct.

Running this

    ...
    
    let data = NSData(contentsOfFile:fileName)
    
    if data == nil
    {
        return
    }
    
    let nBytes = data!.length
    
    if nBytes < CFBFFormat.HEADER_SIZE
    {
        return
    }
        
    let flatHeader = UnsafePointer<FlatFileHeader>(data!.bytes).memory
            
    print("Signature:\t\t\t")
    for i in 0 ..< 8
    {
        print("\(flatHeader.signature[i]) ")
    }
    println()
    println("Major:\t\t\t\t\(flatHeader.major)")
    println("Minor:\t\t\t\t\(flatHeader.minor)")
    println("Byte order:\t\t\t\(flatHeader.byteOrder)")
    println("Sector shift:\t\t\(flatHeader.sectorShift)")
    println("MiniSector shift:\t\(flatHeader.miniSectorShift)")
    println("N dir sectors:\t\t\(flatHeader.nDirSectors)")
    println("N FAT sectors:\t\t\(flatHeader.nFATSectors)")
    println()
            
    ...

prints this

    Signature:          208 207 17 224 161 177 26 225
    Major:              3
    Minor:              62
    Byte order:         65534
    Sector shift:       9
    MiniSector shift:   6
    N dir sectors:      0
    N FAT sectors:      1

The specification gives the signature bytes as

    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1

so we appear to have ‘read’ the header successfully.

Additional checks on the fields with predefined values.

Major version is 3 in which case the specification says the minor version should be 0x003E which it is.

Byte order should be 0xFFFE which it is.

The sector shift is correct, as is the minisector shift.

The number of directory sectors in a version 3 file is always 0 and it is

All done with nary a getUInt16 or a getUInt32 in sight.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Advertisements

Blog at WordPress.com.

%d bloggers like this: