Just An Application

November 16, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Three — Now Read Your Header

The 512 byte header of a compound file can be represented as a Swift struct like this

    struct FlatFileHeader
    {
        let signature               : EightBytes
        let clsid                   : CLSID
        //
        let minor                   : UInt16
        let major                   : UInt16
        let byteOrder               : UInt16
        let sectorShift             : UInt16
        let miniSectorShift         : UInt16
        let reserved                : SixBytes
        let nDirSectors             : UInt32
        let nFATSectors             : UInt32
        let firstDirSector          : UInt32
        let xactionSig              : UInt32
        let miniStreamCutoffSize    : UInt32
        let firstMiniFATSector      : UInt32
        let nMiniFATSectors         : UInt32
        let firstDIFATSector        : UInt32
        let nDIFATSectors           : UInt32
        let difat                   : DIFAT
    }

It is effectively a straight transcription from the specification, with four exceptions.

In the specification the first field is defined as

    Header Signature (8 bytes): ... 

the second field is defined as

    Header CLSID (16 bytes): ... 

the eighth field is defined as

    Reserved (6 bytes): ... 

and the last field is defined as

    DIFAT (436 bytes): ... 

In all these cases the field could be represented as

    [UInt8]

but that fails to capture the exact size of each field, so we do this instead.

We represent the ‘Header Signature’ using the struct EightBytes which looks something like this

    struct EightBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
        let b6 : UInt8
        let b7 : UInt8
    }
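
The listing later in this post indexes the signature as signature[i], which the struct as shown does not support on its own. Here is a minimal sketch, in current Swift, of how a subscript might be added. The switch-based body and the failure on an out of range index are assumptions of this sketch, not necessarily what the real struct does.

```swift
// Sketch only: one way EightBytes could support signature[i].
struct EightBytes {
    let b0: UInt8
    let b1: UInt8
    let b2: UInt8
    let b3: UInt8
    let b4: UInt8
    let b5: UInt8
    let b6: UInt8
    let b7: UInt8

    // Returns the byte at position `index`, which must be in 0 ..< 8.
    subscript(index: Int) -> UInt8 {
        switch index {
        case 0: return b0
        case 1: return b1
        case 2: return b2
        case 3: return b3
        case 4: return b4
        case 5: return b5
        case 6: return b6
        case 7: return b7
        default: preconditionFailure("EightBytes index out of range: \(index)")
        }
    }
}

// The signature bytes given in the specification.
let signature = EightBytes(b0: 0xD0, b1: 0xCF, b2: 0x11, b3: 0xE0,
                           b4: 0xA1, b5: 0xB1, b6: 0x1A, b7: 0xE1)

// Collect the bytes through the subscript.
let signatureBytes = (0 ..< 8).map { signature[$0] }
print(signatureBytes)
```

The fields stay as they were, so the struct is still ‘flat’; the subscript is just a convenience on top.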

We represent the ‘Header CLSID’ using the struct CLSID which looks something like this

    struct CLSID
    {
        let first   : EightBytes
        let second  : EightBytes
    }

We represent the ‘Reserved’ field using the struct SixBytes which looks something like this

    struct SixBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
    }

The DIFAT field is not really 436 bytes but 109 32-bit integers which we can represent using the struct DIFAT which looks something like this

    struct DIFAT
    {
        let i0  : UInt32
        let i1  : UInt32
    }

At the moment it only represents the first two values but it can be ‘extended’ if necessary.
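
If all 109 values are needed, one alternative to adding 109 fields is to decode them from the raw header bytes into an array. This swaps the flat struct for a plain [UInt32], so it is a different technique from the one used in this post. The names readDIFAT, difatOffset and difatCount are hypothetical. The offset of 76 is simply what is left of the 512 byte header once the 436 DIFAT bytes are taken off the end.

```swift
// Hypothetical helper: decode the 109 DIFAT entries from a 512 byte header.
let headerSize  = 512
let difatCount  = 109
let difatOffset = headerSize - difatCount * 4   // 512 - 436 = 76

func readDIFAT(_ header: [UInt8]) -> [UInt32]? {
    // Refuse anything shorter than a complete header.
    guard header.count >= headerSize else { return nil }

    var entries = [UInt32]()
    entries.reserveCapacity(difatCount)

    for i in 0 ..< difatCount {
        let o = difatOffset + i * 4
        // Assemble each little endian value byte by byte, so the result
        // does not depend on the byte order of the machine running this.
        let b0 = UInt32(header[o])
        let b1 = UInt32(header[o + 1]) << 8
        let b2 = UInt32(header[o + 2]) << 16
        let b3 = UInt32(header[o + 3]) << 24
        entries.append(b0 | b1 | b2 | b3)
    }
    return entries
}

// Made-up header for illustration: first DIFAT entry 0, the rest 0xFFFFFFFF.
var sampleHeader = [UInt8](repeating: 0, count: headerSize)
for i in (difatOffset + 4) ..< headerSize {
    sampleHeader[i] = 0xFF
}

let difat = readDIFAT(sampleHeader)!
print(difat.count, difat[0], difat[1])
```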

The result of using this seemingly random combination of rather odd structures is that the struct FlatFileHeader is indeed ‘flat’, which is to say that all its fields are value types. They are in fact all structs.

Bearing in mind that the compound file format is little endian and so is this computer, if we assume that the Swift compiler

  1. represents the values of the UInt<N> types by the exact number of bytes necessary when the value is contained in a struct,

  2. represents the fields in exactly the same order that they were defined and without padding,

  3. does the same recursively with the nested struct values, and

  4. ensures that the memory allocated for the struct at runtime is at least 4 byte aligned

then, not at all accidentally, the representation of the struct in memory would be identical to the representation of the header in the compound file, and vice-versa.
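
These assumptions can be checked rather than taken on trust. Here is a minimal sketch in current Swift using MemoryLayout, which postdates this post. The expected values in the comments are just the arithmetic for the fields as declared, with the two element DIFAT. Swift does not formally promise this layout for its own structs, so treat the result as an observation about the compiler in use rather than a guarantee.

```swift
// The structs from this post, written compactly.
struct EightBytes { let b0, b1, b2, b3, b4, b5, b6, b7: UInt8 }
struct SixBytes   { let b0, b1, b2, b3, b4, b5: UInt8 }
struct CLSID      { let first, second: EightBytes }
struct DIFAT      { let i0, i1: UInt32 }

struct FlatFileHeader {
    let signature: EightBytes
    let clsid: CLSID
    let minor, major, byteOrder, sectorShift, miniSectorShift: UInt16
    let reserved: SixBytes
    let nDirSectors, nFATSectors, firstDirSector, xactionSig: UInt32
    let miniStreamCutoffSize, firstMiniFATSector, nMiniFATSectors: UInt32
    let firstDIFATSector, nDIFATSectors: UInt32
    let difat: DIFAT
}

// Sizes of the building blocks: 8, 6 and 16 bytes if nothing is padded.
let eightSize = MemoryLayout<EightBytes>.size
let sixSize   = MemoryLayout<SixBytes>.size
let clsidSize = MemoryLayout<CLSID>.size

// Byte offsets of selected fields within the header.
// In-order layout without padding gives 34, 40 and 76.
let reservedOffset    = MemoryLayout<FlatFileHeader>.offset(of: \FlatFileHeader.reserved)
let nDirSectorsOffset = MemoryLayout<FlatFileHeader>.offset(of: \FlatFileHeader.nDirSectors)
let difatFieldOffset  = MemoryLayout<FlatFileHeader>.offset(of: \FlatFileHeader.difat)

// 76 bytes of fixed fields plus 8 bytes for the two element DIFAT.
let flatSize      = MemoryLayout<FlatFileHeader>.size
let flatAlignment = MemoryLayout<FlatFileHeader>.alignment

print("EightBytes:", eightSize, "SixBytes:", sixSize, "CLSID:", clsidSize)
print("reserved at", reservedOffset as Any)
print("nDirSectors at", nDirSectorsOffset as Any)
print("difat at", difatFieldOffset as Any)
print("size:", flatSize, "alignment:", flatAlignment)
```

With all 109 DIFAT values in place the size would come to 76 + 436 = 512, which is the size of the header in the file.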

It is the vice-versa case which is of interest since it would imply that if we had an NSData object containing at least the
first 512 bytes of a compound file then we could ‘read’ the header like this

    let flatHeader = UnsafePointer<FlatFileHeader>(data.bytes).memory

This is not necessarily the piece of insane optimism that it might at first appear.

Given the seamless interworking between Swift and Objective-C it would make a great deal of sense if at runtime a Swift struct meeting the right criteria was identical to the equivalent Objective-C struct.
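
In current Swift the same idea is usually spelled with a raw buffer and a load, rather than by converting the pointer and reading memory. Here is a minimal sketch, assuming a little endian machine and Swift 5.7 or later for loadUnaligned(as:). MiniHeader and readMiniHeader are hypothetical names, MiniHeader being a cut-down stand-in for FlatFileHeader, and the bytes are made up for illustration rather than read from a file.

```swift
struct EightBytes { let b0, b1, b2, b3, b4, b5, b6, b7: UInt8 }

// Hypothetical stand-in covering the first 34 bytes of the header.
struct MiniHeader {
    let signature: EightBytes
    let clsidFirst: EightBytes
    let clsidSecond: EightBytes
    let minor: UInt16
    let major: UInt16
    let byteOrder: UInt16
    let sectorShift: UInt16
    let miniSectorShift: UInt16
}

// Reinterpret the leading bytes as a MiniHeader.
// Assumes the in-memory layout of the struct matches the file layout.
func readMiniHeader(_ bytes: [UInt8]) -> MiniHeader? {
    guard bytes.count >= MemoryLayout<MiniHeader>.size else { return nil }
    let header = bytes.withUnsafeBytes { raw in
        // loadUnaligned makes no demands on the alignment of the source.
        raw.loadUnaligned(as: MiniHeader.self)
    }
    return header
}

// Made-up sample: the signature, a zero CLSID, then five little endian UInt16s.
var sample = [UInt8](repeating: 0, count: 512)
sample.replaceSubrange(0 ..< 8, with: [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])
sample.replaceSubrange(24 ..< 34, with: [0x3E, 0x00, 0x03, 0x00, 0xFE, 0xFF,
                                         0x09, 0x00, 0x06, 0x00])

let miniHeader = readMiniHeader(sample)!
print(miniHeader.major, miniHeader.minor, miniHeader.byteOrder)
```

Using loadUnaligned sidesteps the alignment assumption. The assumptions about size, order and padding still have to hold.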

Running this

    ...
    
    let data = NSData(contentsOfFile:fileName)
    
    if data == nil
    {
        return
    }
    
    let nBytes = data!.length
    
    if nBytes < CFBFFormat.HEADER_SIZE
    {
        return
    }
        
    let flatHeader = UnsafePointer<FlatFileHeader>(data!.bytes).memory
            
    print("Signature:\t\t\t")
    for i in 0 ..< 8
    {
        print("\(flatHeader.signature[i]) ")
    }
    println()
    println("Major:\t\t\t\t\(flatHeader.major)")
    println("Minor:\t\t\t\t\(flatHeader.minor)")
    println("Byte order:\t\t\t\(flatHeader.byteOrder)")
    println("Sector shift:\t\t\(flatHeader.sectorShift)")
    println("MiniSector shift:\t\(flatHeader.miniSectorShift)")
    println("N dir sectors:\t\t\(flatHeader.nDirSectors)")
    println("N FAT sectors:\t\t\(flatHeader.nFATSectors)")
    println()
            
    ...

prints this

    Signature:          208 207 17 224 161 177 26 225
    Major:              3
    Minor:              62
    Byte order:         65534
    Sector shift:       9
    MiniSector shift:   6
    N dir sectors:      0
    N FAT sectors:      1

The specification gives the signature bytes as

    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1

so we appear to have ‘read’ the header successfully.

We can make some additional checks on the fields with predefined values.

Major version is 3 in which case the specification says the minor version should be 0x003E which it is.

Byte order should be 0xFFFE which it is.

The sector shift is correct, as is the minisector shift.

The number of directory sectors in a version 3 file is always 0, and it is.
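
These checks are easy to mechanise. Here is a minimal sketch in current Swift, covering only the version 3 values mentioned in this post. HeaderFields and headerProblems are hypothetical names, and the struct is a cut-down stand-in for the relevant fields of FlatFileHeader.

```swift
// Hypothetical stand-in for the fields being checked.
struct HeaderFields {
    let signature: [UInt8]
    let minor, major, byteOrder, sectorShift, miniSectorShift: UInt16
    let nDirSectors: UInt32
}

let expectedSignature: [UInt8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]

// Returns a description of each check that fails; empty means all passed.
func headerProblems(_ h: HeaderFields) -> [String] {
    var problems = [String]()

    if h.signature != expectedSignature {
        problems.append("unexpected signature")
    }
    if h.byteOrder != 0xFFFE {
        problems.append("unexpected byte order")
    }
    if h.miniSectorShift != 6 {            // 1 << 6 = 64 byte mini sectors
        problems.append("unexpected minisector shift")
    }
    if h.major == 3 {
        if h.minor != 0x003E {
            problems.append("unexpected minor version")
        }
        if h.sectorShift != 9 {            // 1 << 9 = 512 byte sectors
            problems.append("unexpected sector shift")
        }
        if h.nDirSectors != 0 {
            problems.append("unexpected number of directory sectors")
        }
    } else {
        problems.append("major version \(h.major) is not covered by this sketch")
    }
    return problems
}

// The values printed above.
let good = HeaderFields(signature: expectedSignature, minor: 62, major: 3,
                        byteOrder: 65534, sectorShift: 9, miniSectorShift: 6,
                        nDirSectors: 0)
// The same values with the byte order field corrupted.
let bad = HeaderFields(signature: expectedSignature, minor: 62, major: 3,
                       byteOrder: 0xFEFF, sectorShift: 9, miniSectorShift: 6,
                       nDirSectors: 0)

print(headerProblems(good))
print(headerProblems(bad))
```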

All done with nary a getUInt16 or a getUInt32 in sight.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 15, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Two — Of Bytes And Pointers To Bytes

Whether we choose to access the contents of a file as a whole or in part, we will end up with an NSData object with some bytes in it, so how do you get at the bytes?

As before the answer is, in exactly the same way as you do in Objective-C.

So that would be like this then ?

    let bytes = data.bytes

It would.

The similarities between doing things in Swift and Objective-C extend to the type of the property bytes.

In Objective-C it is declared like this

    @property(readonly) const void *bytes

and in Swift like this

    var bytes: UnsafePointer<Void> { get }

So what we have got hold of is something with the type

    UnsafePointer<Void>

which is the equivalent of

    const void*

and it is about as useful, which is to say, not very.

If we do this

    let b = bytes[0]

then the compiler will helpfully volunteer the warning

    Constant 'b' inferred to have type 'Void' which may be unexpected

Possibly not unexpected, it is an ‘unsafe pointer’ to ‘Void’ after all, but definitely of limited utility.

An empty tuple, for that is what a ‘Void’ is of course, specialises in representing nothing, a job at which it excels, but it makes for a very unconvincing byte.

What we want is a

    UnsafePointer<UInt8>

in the same way that we would want a

    const uint8_t*

or something in Objective-C.

In Objective-C you do this to get one

    const uint8_t* bytes = (const uint8_t*)data.bytes;

and in Swift you do this

    let bytes = UnsafePointer<UInt8>(data.bytes)

Once you have one, you can access the byte to which it ‘points’ directly

    let b = bytes.memory

or by using a subscript

    let c = bytes[1]
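
For comparison, current Swift spells these operations differently. The untyped pointer is an UnsafeRawPointer, memory has become pointee, and a typed pointer is obtained from a raw one by stating what the memory holds rather than by a plain conversion. Here is a minimal sketch, using an array as the source of bytes rather than an NSData object to keep it self-contained.

```swift
let source: [UInt8] = [0xD0, 0xCF, 0x11, 0xE0]

let (first, second) = source.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> (UInt8, UInt8) in
    // The raw pointer plays the part of UnsafePointer<Void>: no element type.
    // baseAddress is non-nil here because the array is not empty.
    let untyped: UnsafeRawPointer = raw.baseAddress!

    // The equivalent of the cast to const uint8_t*. This is only legitimate
    // because the memory really does hold UInt8 values.
    let bytes = untyped.assumingMemoryBound(to: UInt8.self)

    // pointee reads the byte pointed at, the subscript reads a neighbour.
    return (bytes.pointee, bytes[1])
}

print(first, second)
```

The pointer is only valid inside the closure, which is why the two bytes are copied out rather than the pointer itself.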

Also, just as you can in Objective-C, you can ‘walk’ right off the end of the associated memory because it really is an ‘unsafe’ pointer.

    ...
    
    let data  = NSData()
    let bytes = UnsafePointer<UInt8>(data.bytes)
    
    for i in 0 ..< 16
    {
        println(bytes[i])
    }

    ...

at which point everything may come to a grinding halt, but then again it may not, it all depends.
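
One way to avoid walking off the end is to keep the length with the pointer. Here is a minimal sketch in current Swift using UnsafeBufferPointer, which pairs a base address with a count so that iteration stops where the memory does. The helper name collectBytes is hypothetical. Subscripting a buffer pointer out of range is still only checked in debug builds, so this narrows the problem rather than removing it.

```swift
// Hypothetical helper: copy out every byte the buffer claims to have.
func collectBytes(_ buffer: UnsafeBufferPointer<UInt8>) -> [UInt8] {
    var result = [UInt8]()
    // Iteration is driven by buffer.count, so an empty buffer yields nothing.
    for byte in buffer {
        result.append(byte)
    }
    return result
}

let empty: [UInt8] = []
let three: [UInt8] = [1, 2, 3]

let fromEmpty = empty.withUnsafeBufferPointer { collectBytes($0) }
let fromThree = three.withUnsafeBufferPointer { collectBytes($0) }

print(fromEmpty, fromThree)
```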

The other way to get at the bytes in an NSData object is, needless to say, exactly the same as the other way you would do it in Objective-C, viz.

    let length = data.length
    let bytes  = UnsafeMutablePointer<UInt8>.alloc(length)

    data.getBytes(bytes, length:length)

If you are not happy walking off the end of other people’s memory and prefer walking off the end of your own, this is the option for you.

The memory returned by the call to alloc is not managed and must be explicitly freed by a call to dealloc.

    bytes.dealloc(length)

The allocated memory is also not initialized to anything in particular and especially not to zero.
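
In current Swift alloc and dealloc have become allocate(capacity:) and deallocate(), and Data offers copyBytes(to:count:) where NSData has getBytes. Here is a minimal sketch, assuming Foundation is available. The function name byteSum is hypothetical and exists only to give the copied bytes something to do. The defer ensures the memory is released on every path out of the function.

```swift
import Foundation

// Hypothetical example: copy the bytes out of a Data value and add them up.
func byteSum(of data: Data) -> Int {
    let length = data.count
    let bytes  = UnsafeMutablePointer<UInt8>.allocate(capacity: length)

    // The memory is not managed, so pair the allocation with its release.
    defer { bytes.deallocate() }

    // The fresh allocation holds nothing in particular; give it defined contents.
    bytes.initialize(repeating: 0, count: length)

    // Copy the bytes into memory we own.
    data.copyBytes(to: bytes, count: length)

    var sum = 0
    for i in 0 ..< length {
        sum += Int(bytes[i])
    }
    return sum
}

let total = byteSum(of: Data([1, 2, 3, 4] as [UInt8]))
print(total)
```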

In Swift an

    UnsafeMutablePointer<T>

is to an

    UnsafePointer<T>

as, in Objective-C,

    T*

is to

    const T*

so you can modify the memory an UnsafeMutablePointer<T> ‘points at’ directly

    bytes.memory = UInt8(length)

or using a subscript

    bytes[8] = UInt8(length)

As well as the subscript functions, the UnsafePointer<T> and UnsafeMutablePointer<T> types support a variety of operators.

For example

    let bytes  = UnsafeMutablePointer<UInt8>.alloc(16)
    
    for i in 0 ..< 16
    {
        bytes[i] = UInt8(i)
    }
        
    var p = bytes
        
    for i in 0 ..< 8
    {
        println(p.memory)
        p += 2
    }
        
    let end = bytes + 8
        
    p = bytes
        
    while p < end
    {
        println(p++.memory)
    }
        
    p = bytes + 7
        
    while p >= bytes
    {
        println(p--.memory)
    }

The ‘mutating’ operators pre/post decrement/increment etc. can only be used if the ‘pointer’ is referenced from a mutable variable.
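
The ++ and -- operators were later removed from Swift, and memory became pointee, so the loops above need a different spelling today. Here is a minimal sketch of the same three walks in current Swift, collecting the values into arrays instead of printing them as it goes. The function name walk is hypothetical.

```swift
func walk() -> (evens: [UInt8], forward: [UInt8], backward: [UInt8]) {
    let count = 16
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: count)
    defer { bytes.deallocate() }

    // Fill the memory with 0, 1, 2 ... 15.
    for i in 0 ..< count {
        (bytes + i).initialize(to: UInt8(i))
    }

    // Every second byte, stepping the pointer by two each time.
    var evens = [UInt8]()
    var p = bytes
    for _ in 0 ..< 8 {
        evens.append(p.pointee)
        p += 2
    }

    // The first eight bytes, front to back; p += 1 replaces p++.
    var forward = [UInt8]()
    let end = bytes + 8
    p = bytes
    while p < end {
        forward.append(p.pointee)
        p += 1
    }

    // The first eight bytes, back to front; p -= 1 replaces p--.
    var backward = [UInt8]()
    p = bytes + 7
    while p >= bytes {
        backward.append(p.pointee)
        p -= 1
    }

    return (evens, forward, backward)
}

let walked = walk()
print(walked.evens)
print(walked.forward)
print(walked.backward)
```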

You can create an UnsafePointer<T> from an UnsafeMutablePointer<T>

    let bytes     = UnsafeMutablePointer<UInt8>.alloc(16)
    let immutable = UnsafePointer<UInt8>(bytes)

and even vice-versa

    let mutable   = UnsafeMutablePointer<UInt8>(data.bytes)

which is a bit worrying but then the clue is in the name. UnsafePointer<T>s and UnsafeMutablePointer<T>s are ‘unsafe’.
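
Current Swift makes the worrying direction a little more explicit: converting from immutable to mutable needs the mutating: label. Here is a minimal sketch, with a hypothetical function name.

```swift
func roundTrip() -> UInt8 {
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: 16)
    defer { bytes.deallocate() }
    bytes.initialize(repeating: 0, count: 16)

    // Mutable to immutable needs no ceremony.
    let immutable = UnsafePointer<UInt8>(bytes)

    // Immutable to mutable has to be asked for by name.
    let mutable = UnsafeMutablePointer<UInt8>(mutating: immutable)

    // Writing through it is legitimate here only because the memory
    // was mutable to begin with.
    mutable[3] = 42

    return immutable[3]
}

let roundTripped = roundTrip()
print(roundTripped)
```

The label does not make the conversion any safer. It just makes it harder to do by accident.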


