The 512-byte header of a compound file can be represented as a Swift struct like this:
struct FlatFileHeader
{
let signature : EightBytes
let clsid : CLSID
//
let minor : UInt16
let major : UInt16
let byteOrder : UInt16
let sectorShift : UInt16
let miniSectorShift : UInt16
let reserved : SixBytes
let nDirSectors : UInt32
let nFATSectors : UInt32
let firstDirSector : UInt32
let xactionSig : UInt32
let miniStreamCutoffSize : UInt32
let firstMiniFATSector : UInt32
let nMiniFATSectors : UInt32
let firstDIFATSector : UInt32
let nDIFATSectors : UInt32
let difat : DIFAT
}
It is effectively a straight transcription from the specification, with four exceptions.
In the specification the first field is defined as
Header Signature (8 bytes): ...
the second field is defined as
Header CLSID (16 bytes): ...
the eighth field is defined as
Reserved (6 bytes): ...
and the last field is defined as
DIFAT (436 bytes): ...
In all these cases the field could be represented as
[UInt8]
but that fails to capture the exact size of each field, so we do this instead.
We represent the ‘Header Signature’ using the struct EightBytes, which looks something like this:
struct EightBytes
{
let b0 : UInt8
let b1 : UInt8
let b2 : UInt8
let b3 : UInt8
let b4 : UInt8
let b5 : UInt8
let b6 : UInt8
let b7 : UInt8
}
We represent the ‘Header CLSID’ using the struct CLSID, which looks something like this:
struct CLSID
{
let first : EightBytes
let second : EightBytes
}
We represent the ‘Reserved’ field using the struct SixBytes, which looks something like this:
struct SixBytes
{
let b0 : UInt8
let b1 : UInt8
let b2 : UInt8
let b3 : UInt8
let b4 : UInt8
let b5 : UInt8
}
The DIFAT field is not really 436 bytes but 109 32-bit integers, which we can represent using the struct DIFAT, which looks something like this:
struct DIFAT
{
let i0 : UInt32
let i1 : UInt32
}
At the moment it only represents the first two values but it can be ‘extended’ if necessary.
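One way the DIFAT struct might be ‘extended’ to all 109 entries is sketched below. It reuses the trick used for CLSID, building bigger structs out of pairs of smaller ones. The names Words2 to Words64 and FullDIFAT are hypothetical, the code is written in present-day Swift rather than the Swift of this post, and it relies on the layout assumptions discussed in this post.

```swift
// Hypothetical building blocks: each struct is a pair of the previous one.
struct Words2  { let a, b: UInt32 }
struct Words4  { let a, b: Words2 }
struct Words8  { let a, b: Words4 }
struct Words16 { let a, b: Words8 }
struct Words32 { let a, b: Words16 }
struct Words64 { let a, b: Words32 }

// 64 + 32 + 8 + 4 + 1 = 109 entries, which should occupy 436 bytes.
struct FullDIFAT {
    let w64: Words64
    let w32: Words32
    let w8: Words8
    let w4: Words4
    let w1: UInt32

    static let count = 109

    // Read entry `index` straight out of the struct's own bytes.
    subscript(index: Int) -> UInt32 {
        precondition(index >= 0 && index < FullDIFAT.count, "index out of range")
        return withUnsafeBytes(of: self) {
            $0.load(fromByteOffset: index * MemoryLayout<UInt32>.stride, as: UInt32.self)
        }
    }
}

// Build a value from 109 known integers to exercise the subscript.
let entries: [UInt32] = (0..<109).map { UInt32($0) }
let fullDIFAT = entries.withUnsafeBytes { $0.load(as: FullDIFAT.self) }

print("FullDIFAT size:", MemoryLayout<FullDIFAT>.size)
print("first:", fullDIFAT[0], "last:", fullDIFAT[108])
```

The subscript avoids having to name 109 separate fields in order to get at an individual entry.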
The result of using this seemingly random combination of rather odd structures is that the struct FlatFileHeader is indeed ‘flat’, which is to say that all its fields are value types. They are in fact all structs.
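The difference between a ‘flat’ field and a [UInt8] field can be seen by comparing sizes. This is a small sketch in present-day Swift. ArrayBacked and StructBacked are hypothetical names used only for the comparison.

```swift
// EightBytes mirrors the struct shown earlier in this post.
struct EightBytes {
    let b0, b1, b2, b3, b4, b5, b6, b7: UInt8
}

// Hypothetical: the signature held as an array.
struct ArrayBacked {
    let signature: [UInt8]
}

// Hypothetical: the signature held as a fixed-size struct.
struct StructBacked {
    let signature: EightBytes
}

// An Array value holds only a reference to separately allocated storage,
// so its size inside a struct is one pointer however many bytes it contains.
let arraySize = MemoryLayout<ArrayBacked>.size
let structSize = MemoryLayout<StructBacked>.size
print("ArrayBacked: \(arraySize) bytes, StructBacked: \(structSize) bytes")
```

Only the struct version keeps the eight bytes inside the enclosing value, which is what the rest of this post relies on.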
Bearing in mind that the compound file format is little-endian and so is this computer, if we assume that the Swift compiler
- represents the values of the UInt<N> types by the exact number of bytes necessary when the value is contained in a struct,
- represents the fields in exactly the same order that they were defined and without padding,
- does the same recursively with the nested struct values, and
- ensures that the memory allocated for the struct at runtime is at least 4-byte aligned
then, not at all accidentally, the representation of the struct in memory would be identical to the representation of the header in the compound file, and vice-versa.
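These assumptions can be checked rather than taken on trust. The sketch below uses MemoryLayout from present-day Swift, which is not what this post was written against. The helper fieldOffset is hypothetical, and the expected offsets are simply the running totals of the field sizes given in the specification.

```swift
struct EightBytes { let b0, b1, b2, b3, b4, b5, b6, b7: UInt8 }
struct SixBytes   { let b0, b1, b2, b3, b4, b5: UInt8 }
struct CLSID      { let first, second: EightBytes }
struct DIFAT      { let i0, i1: UInt32 }  // first two of 109 entries, as in the post

struct FlatFileHeader {
    let signature: EightBytes
    let clsid: CLSID
    let minor, major, byteOrder, sectorShift, miniSectorShift: UInt16
    let reserved: SixBytes
    let nDirSectors, nFATSectors, firstDirSector, xactionSig: UInt32
    let miniStreamCutoffSize, firstMiniFATSector, nMiniFATSectors: UInt32
    let firstDIFATSector, nDIFATSectors: UInt32
    let difat: DIFAT
}

// Hypothetical helper: byte offset of a stored field within the struct.
func fieldOffset(_ path: PartialKeyPath<FlatFileHeader>) -> Int? {
    return MemoryLayout<FlatFileHeader>.offset(of: path)
}

let observedOffsets: [Int?] = [
    fieldOffset(\FlatFileHeader.signature), fieldOffset(\FlatFileHeader.clsid),
    fieldOffset(\FlatFileHeader.minor), fieldOffset(\FlatFileHeader.major),
    fieldOffset(\FlatFileHeader.byteOrder), fieldOffset(\FlatFileHeader.sectorShift),
    fieldOffset(\FlatFileHeader.miniSectorShift), fieldOffset(\FlatFileHeader.reserved),
    fieldOffset(\FlatFileHeader.nDirSectors), fieldOffset(\FlatFileHeader.nFATSectors),
    fieldOffset(\FlatFileHeader.firstDirSector), fieldOffset(\FlatFileHeader.xactionSig),
    fieldOffset(\FlatFileHeader.miniStreamCutoffSize),
    fieldOffset(\FlatFileHeader.firstMiniFATSector),
    fieldOffset(\FlatFileHeader.nMiniFATSectors),
    fieldOffset(\FlatFileHeader.firstDIFATSector),
    fieldOffset(\FlatFileHeader.nDIFATSectors), fieldOffset(\FlatFileHeader.difat),
]

// Offsets of the same fields in the file, from the specification's field sizes.
let expectedOffsets: [Int?] = [0, 8, 24, 26, 28, 30, 32, 34, 40,
                               44, 48, 52, 56, 60, 64, 68, 72, 76]

let headerSize = MemoryLayout<FlatFileHeader>.size
let headerAlignment = MemoryLayout<FlatFileHeader>.alignment

// With the two-entry DIFAT the struct should be 76 + 8 = 84 bytes;
// with all 109 entries it would be 76 + 436 = 512.
print("size:", headerSize, "alignment:", headerAlignment)
print("offsets match the specification:", observedOffsets == expectedOffsets)
```

If any of the assumptions did not hold, for example if padding were inserted between fields, the offsets would differ and the comparison would report false.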
It is the vice-versa case which is of interest, since it would imply that if we had an NSData object containing at least the first 512 bytes of a compound file then we could ‘read’ the header like this:
let flatHeader = UnsafePointer<FlatFileHeader>(data.bytes).memory
This is not necessarily the piece of insane optimism that it might at first appear.
Given the seamless interworking between Swift and Objective-C it would make a great deal of sense if at runtime a Swift struct meeting the right criteria was identical to the equivalent Objective-C struct.
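For anyone trying this with a current toolchain, the same idea might look something like the sketch below. It assumes Swift 5.7 or later for loadUnaligned, uses Data (the value-type counterpart of NSData), and assumes a little-endian machine as above. The function readFlatHeader, the constant headerSize and the hand-built sample bytes are all hypothetical.

```swift
import Foundation

struct EightBytes { let b0, b1, b2, b3, b4, b5, b6, b7: UInt8 }
struct SixBytes   { let b0, b1, b2, b3, b4, b5: UInt8 }
struct CLSID      { let first, second: EightBytes }
struct DIFAT      { let i0, i1: UInt32 }  // first two entries only, as in the post

struct FlatFileHeader {
    let signature: EightBytes
    let clsid: CLSID
    let minor, major, byteOrder, sectorShift, miniSectorShift: UInt16
    let reserved: SixBytes
    let nDirSectors, nFATSectors, firstDirSector, xactionSig: UInt32
    let miniStreamCutoffSize, firstMiniFATSector, nMiniFATSectors: UInt32
    let firstDIFATSector, nDIFATSectors: UInt32
    let difat: DIFAT
}

let headerSize = 512  // size of the header in the file

// Hypothetical helper: reinterpret the leading bytes of `data` as a header.
// loadUnaligned copies the bytes out, so `data` need not be 4-byte aligned.
func readFlatHeader(from data: Data) -> FlatFileHeader? {
    guard data.count >= headerSize else { return nil }
    return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
        raw.loadUnaligned(as: FlatFileHeader.self)
    }
}

// A hand-built header, for illustration only.
var bytes = [UInt8](repeating: 0, count: headerSize)
bytes.replaceSubrange(0..<8, with: [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])
bytes[24] = 0x3E                   // minor version, little-endian
bytes[26] = 0x03                   // major version
bytes[28] = 0xFE; bytes[29] = 0xFF // byte order
bytes[30] = 0x09                   // sector shift
bytes[32] = 0x06                   // mini sector shift
bytes[44] = 0x01                   // number of FAT sectors
let sample = Data(bytes)

if let header = readFlatHeader(from: sample) {
    print("major:", header.major, "minor:", header.minor)
}
```

Copying the bytes out sidesteps the alignment assumption, at the cost of a copy of the header.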
Running this
...
let data = NSData(contentsOfFile:fileName)
if data == nil
{
return
}
let nBytes = data!.length
if nBytes < CFBFFormat.HEADER_SIZE
{
return
}
let flatHeader = UnsafePointer<FlatFileHeader>(data!.bytes).memory
print("Signature:\t\t\t")
for i in 0 ..< 8
{
print("\(flatHeader.signature[i]) ")
}
println()
println("Major:\t\t\t\t\(flatHeader.major)")
println("Minor:\t\t\t\t\(flatHeader.minor)")
println("Byte order:\t\t\t\(flatHeader.byteOrder)")
println("Sector shift:\t\t\(flatHeader.sectorShift)")
println("MiniSector shift:\t\(flatHeader.miniSectorShift)")
println("N dir sectors:\t\t\(flatHeader.nDirSectors)")
println("N FAT sectors:\t\t\(flatHeader.nFATSectors)")
println()
...
prints this
Signature: 208 207 17 224 161 177 26 225
Major: 3
Minor: 62
Byte order: 65534
Sector shift: 9
MiniSector shift: 6
N dir sectors: 0
N FAT sectors: 1
The specification gives the signature bytes as
0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
so we appear to have ‘read’ the header successfully.
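The printing code above indexes the signature as flatHeader.signature[i], which the EightBytes struct as shown earlier does not support on its own. One possible way to provide it is sketched below in present-day Swift. The subscript is an assumption and is not necessarily how it was actually done.

```swift
struct EightBytes {
    let b0, b1, b2, b3, b4, b5, b6, b7: UInt8
}

extension EightBytes {
    // Hypothetical subscript: read byte `index` from the struct's own bytes.
    subscript(index: Int) -> UInt8 {
        precondition(index >= 0 && index < 8, "index out of range")
        return withUnsafeBytes(of: self) { $0[index] }
    }
}

// The signature bytes given in the specification.
let signature = EightBytes(b0: 0xD0, b1: 0xCF, b2: 0x11, b3: 0xE0,
                           b4: 0xA1, b5: 0xB1, b6: 0x1A, b7: 0xE1)

let rendered = (0..<8).map { String(signature[$0]) }.joined(separator: " ")
print(rendered)  // 208 207 17 224 161 177 26 225
```

The decimal values are the same ones that appear in the output above.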
Some additional checks on the fields with predefined values:
The major version is 3, in which case the specification says the minor version should be 0x003E, which it is.
The byte order should be 0xFFFE, which it is.
The sector shift is correct (9 for a version 3 file), as is the mini sector shift (6).
The number of directory sectors in a version 3 file is always 0, and it is.
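The checks above are easy enough to express in code. This is a sketch in present-day Swift. HeaderFields and problems(in:) are hypothetical, and HeaderFields holds only the fields being checked rather than the whole header. The expected values are the ones the specification gives for a version 3 file.

```swift
// Hypothetical: just the fields with predefined values.
struct HeaderFields {
    let signature: [UInt8]
    let major, minor, byteOrder, sectorShift, miniSectorShift: UInt16
    let nDirSectors: UInt32
}

let expectedSignature: [UInt8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]

// Returns a description of each check that fails; empty means all passed.
func problems(in h: HeaderFields) -> [String] {
    var found: [String] = []
    if h.signature != expectedSignature { found.append("unexpected signature") }
    if h.byteOrder != 0xFFFE { found.append("byte order is not 0xFFFE") }
    if h.miniSectorShift != 6 { found.append("mini sector shift is not 6") }
    if h.major == 3 {
        if h.minor != 0x003E { found.append("minor version is not 0x003E") }
        if h.sectorShift != 9 { found.append("sector shift is not 9") }
        if h.nDirSectors != 0 { found.append("directory sector count is not 0") }
    }
    return found
}

// The values printed above.
let printed = HeaderFields(signature: [208, 207, 17, 224, 161, 177, 26, 225],
                           major: 3, minor: 62, byteOrder: 65534,
                           sectorShift: 9, miniSectorShift: 6, nDirSectors: 0)

print(problems(in: printed).isEmpty ? "header looks valid" : "header has problems")
```

Only the version 3 rules mentioned in this post are covered; a version 4 file would need its own checks.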
All done with nary a getUInt16 or a getUInt32 in sight.
Copyright (c) 2014 By Simon Lewis. All Rights Reserved.
Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.
Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.