Just An Application

November 21, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Seven — Stream And Storage Objects

We can model the combination of a set of sectors and the associated file allocation table as a SectorSpace.

    protocol SectorSpace
    {
        func data(index:SectorIndex) -> ByteData?
    }

A SectorSpace is an object capable of returning the contents of a stream given the index of the first sector in that ‘space’.
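As a rough illustration of what such an object has to do, here is a minimal sketch in current Swift syntax rather than the 2014 syntax used in the rest of this post. ToySectorSpace and its in-memory arrays are hypothetical, not the types used in this series. The end-of-chain marker 0xFFFFFFFE is the ENDOFCHAIN value from the format specification.

```swift
// ENDOFCHAIN marker used in a compound file's allocation tables.
let endOfChain: UInt32 = 0xFFFF_FFFE

// Hypothetical toy model: sectors are byte arrays, and fat[i] holds the
// index of the sector that follows sector i in the same stream.
struct ToySectorSpace {
    let sectors: [[UInt8]]
    let fat: [UInt32]

    // Concatenates the sectors of the stream that starts at `start`.
    // Returns nil if the chain runs outside the tables or is longer than
    // the table itself, which can only happen if it loops.
    func data(startingAt start: UInt32) -> [UInt8]? {
        var result: [UInt8] = []
        var index = start
        var visited = 0

        while index != endOfChain {
            guard Int(index) < sectors.count,
                  Int(index) < fat.count,
                  visited < fat.count
            else {
                return nil
            }
            result += sectors[Int(index)]
            index = fat[Int(index)]
            visited += 1
        }
        return result
    }
}

// Three two-byte sectors chained in the order 0, 2, 1.
let toySpace = ToySectorSpace(
    sectors: [[1, 2], [3, 4], [5, 6]],
    fat: [2, endOfChain, 1])

let toyData = toySpace.data(startingAt: 0)
```

The sketch returns whole sectors. Trimming the result to the stream's actual size is left to the caller.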

There are two possible SectorSpaces in a compound file. The first represents the sectors in the file itself in combination with the FAT. The second represents the sectors in the mini stream in combination with the mini FAT.

We can implement the first straight away. We have a SectorSource for the sectors in the file and the FAT.

For the second we have the mini FAT but we do not have the sectors stored in the mini stream.

The mini stream is an internal stream stored in sectors in the file itself, so it can be constructed using the first SectorSpace which represents those sectors and the FAT.

To construct the mini stream we need to know the starting sector and the size. These are stored in the directory entry for the root storage object.

We can define them as properties in the class RootStorageEntry

    let miniStreamStart : SectorIndex
    let miniStreamSize  : StreamSize

We can make the mini stream sector space like this

    private func makeMiniStreamSpace(
                     rootStorageEntry:
                         RootStorageEntry,
                     miniFAT:
                         FileAllocationTable,
                     fileSpace:
                         SectorSpace) -> SectorSpace?
    {
        if let data = fileSpace.data(rootStorageEntry.miniStreamStart)
        {
            return
                MiniStreamSpace(
                    data:
                        data,
                    size:
                        rootStorageEntry.miniStreamSize,
                    fat:
                        miniFAT,
                    sectorSize:
                        CFBFFormat.MINI_SECTOR_SIZE)
        }
        else
        {
            return nil
        }
    }

Now we have our two sector spaces we can implement a stream factory that can create a stream object given the index of its first sector and its size.

The size below which a stream object is stored in the mini stream is defined by the miniStreamCutoffSize field in the header. This and the two sector spaces are all the stream factory needs.

    final class StreamFactory
    {
        init(fileSpace:SectorSpace, miniStreamSpace:SectorSpace, miniStreamCutoffSize:StreamSize)
        {
            self.fileSpace            = fileSpace
            self.miniStreamSpace      = miniStreamSpace
            self.miniStreamCutoffSize = miniStreamCutoffSize
        }
    
        //
    
        func makeStream(entry:StreamEntry) -> Stream?
        {
            let size = entry.streamSize
    
            if size >= miniStreamCutoffSize
            {
                return Stream(size:size, start: entry.startingSector, space: fileSpace)
            }
            else
            {
                return Stream(size:size, start: entry.startingSector, space: miniStreamSpace)
            }
        }
    
        //
    
        private let fileSpace               : SectorSpace
        private let miniStreamSpace         : SectorSpace
        private let miniStreamCutoffSize    : StreamSize
    }
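The boundary case is worth spelling out. The cutoff is the size below which a stream lives in the mini stream, so a stream whose size is exactly equal to the cutoff is stored in the file's own sectors. The sketch below uses current Swift syntax. The Space enum and the space(forStreamSize:cutoff:) helper are hypothetical names for illustration, and 4096 is assumed as the cutoff value.

```swift
// Hypothetical stand-in for the two sector spaces.
enum Space {
    case file
    case miniStream
}

// Picks the space for a stream of the given size: strictly below the
// cutoff means the mini stream, anything else means the file's sectors.
func space(forStreamSize size: UInt64, cutoff: UInt64) -> Space {
    return size < cutoff ? .miniStream : .file
}

let assumedCutoff: UInt64 = 4096   // assumed value of miniStreamCutoffSize

let justBelow = space(forStreamSize: 4095, cutoff: assumedCutoff)
let exactly   = space(forStreamSize: 4096, cutoff: assumedCutoff)
```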

Once we have the stream factory we can define a class which implements a storage object.

All it needs is the StorageEntry which represents the storage object in the directory so it can find the stream and storage objects it contains, and the stream factory so that it can create stream objects as necessary.

    final class Storage
    {
        init(entry:StorageEntry, streamFactory:StreamFactory)
        {
            self.entry          = entry
            self.streamFactory  = streamFactory
            self.storageTable   = [String: Storage]()
            self.streamTable    = [String: Stream]()
        }
    
        //
    
        func getStream(var path:[String], name:String) -> Stream?
        {
            if path.count != 0
            {
                return getStorage(path.removeAtIndex(0))?.getStream(path, name: name)
            }
            else
            {
                return getStream(name)
            }
        }
    
        func getStorage(storageName:String) -> Storage?
        {
            var storage = storageTable[storageName]
    
            if storage != nil
            {
                return storage
            }
    
            let storageEntry = entry.getStorageEntry(storageName)
    
            if storageEntry == nil
            {
                return nil
            }
            storage = Storage(entry: storageEntry!, streamFactory: streamFactory)
            storageTable[storageName] = storage
            return storage
        }
    
        func getStream(streamName:String) -> Stream?
        {
            var stream = streamTable[streamName]
    
            if stream == nil
            {
                let streamEntry = entry.getStreamEntry(streamName)
    
                if streamEntry == nil
                {
                    return nil
                }
                stream = streamFactory.makeStream(streamEntry!)
                streamTable[streamName] = stream
            }
            return stream
        }
    
        //
    
        private let entry           : StorageEntry
        private let streamFactory   : StreamFactory
        //
        private var storageTable    : [String: Storage]
        private var streamTable     : [String: Stream]
    }

We can define a CompoundFile as a very simple wrapper around the Storage instance which represents the root storage object.

    final class CompoundFile
    {
        init(rootStorage:Storage)
        {
            self.rootStorage = rootStorage
        }
    
        //
    
        func getStream(#storage:[String], name:String) -> Stream?
        {
            return rootStorage.getStream(storage, name:name)
        }
    
        //
    
        private let rootStorage: Storage
    }
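The path-based lookup can be tried out in isolation with a toy model. This is a sketch in current Swift syntax. ToyStorage, with its dictionaries of nested storages and raw stream bytes, is a hypothetical stand-in for the Storage class above, and the names "Docs", "Summary" and "Top" are made up.

```swift
// Hypothetical stand-in for Storage: nested storages and streams are
// held directly in dictionaries rather than looked up in the directory.
final class ToyStorage {
    var storages: [String: ToyStorage] = [:]
    var streams: [String: [UInt8]] = [:]

    // Walks `path` one storage name at a time, then looks up the stream
    // by name in the storage that the path ends at.
    func getStream(path: [String], name: String) -> [UInt8]? {
        guard let first = path.first else {
            return streams[name]
        }
        return storages[first]?.getStream(path: Array(path.dropFirst()), name: name)
    }
}

let toyRoot = ToyStorage()
let toyDocs = ToyStorage()

toyDocs.streams["Summary"] = [1, 2, 3]
toyRoot.storages["Docs"] = toyDocs
toyRoot.streams["Top"] = [9]

let nested  = toyRoot.getStream(path: ["Docs"], name: "Summary")
let topmost = toyRoot.getStream(path: [], name: "Top")
let missing = toyRoot.getStream(path: ["Missing"], name: "Summary")
```

An empty path means the stream is looked up in the root storage itself, and a path that names a non-existent storage yields nil.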

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Six — Where Is Everything ? The Directory Edition

Now we have the file allocation table we can also read ‘the directory’.

The directory is an internal stream containing an entry for each storage object and stream object in the compound file.

The first entry in the directory is always the entry for the root storage object.

The entries for all the storage and stream objects contained within a given storage object are linked together as a red-black tree. The containing storage object has a link to the root of that tree.

Each entry in the directory is represented by a DirectoryEntry which is 128 bytes long.

This means we can read the directory at the sector level rather than at the stream level, as there is guaranteed to be a whole number of entries per sector.
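The arithmetic behind that guarantee is simple enough to check. The sketch below uses the two sector sizes the format defines, 512 bytes for version three and 4096 bytes for version four. The constant names are illustrative and are not the ones in CFBFFormat.

```swift
let directoryEntrySize = 128
let v3SectorSize = 512
let v4SectorSize = 4096

// Both sector sizes are exact multiples of the entry size, so no
// directory entry ever straddles a sector boundary.
let v3EntriesPerSector = v3SectorSize / directoryEntrySize
let v4EntriesPerSector = v4SectorSize / directoryEntrySize

let v3Remainder = v3SectorSize % directoryEntrySize
let v4Remainder = v4SectorSize % directoryEntrySize
```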

The firstDirSector header field identifies the first sector of the directory internal stream and the nDirSectors header field specifies the number of sectors.

In a version three compound file the nDirSectors field is always zero so it is of no use whatsoever. We have no choice but to iterate over all the sectors in sequence.

    private func readDirectoryV3(
                     header:
                         FileHeader,
                     fat:
                         FileAllocationTable,
                     sectors:
                         SectorSource) -> RootStorageEntry?
    {
        if let sequence = fat.sequence(header.firstDirSector)
        {
            let builder = DirectoryBuilder()
    
            for sectorIndex in sequence
            {
                let sector = sectors.sector(sectorIndex)
    
                if sector == nil
                {
                    return nil
                }
    
                var entryBytes = sector!
    
                for i in 0 ..< CFBFFormat.V3_DIRECTORY_ENTRIES_PER_SECTOR
                {
                    if !builder.addEntry(entryBytes)
                    {
                        return nil
                    }
                    entryBytes += CFBFFormat.DIRECTORY_ENTRY_SIZE
                }
            }
            return builder.build()
        }
        else
        {
            return nil
        }
    }

The DirectoryBuilder is passed a pointer to the bytes for each DirectoryEntry in each sector of the directory.

Within the DirectoryBuilder we can use the ‘flat struct’ technique to ‘read’ the DirectoryEntry.

In this case the struct FlatDirectoryEntry looks like this

    struct FlatDirectoryEntry
    {
        struct Name
        {
            let name0   : EightBytes
            let name1   : EightBytes
            let name2   : EightBytes
            let name3   : EightBytes
            let name4   : EightBytes
            let name5   : EightBytes
            let name6   : EightBytes
            let name7   : EightBytes
        }
    
        let name            : Name          // 64 bytes
        let nameLength      : UInt16
        let type            : UInt8
        let colour          : UInt8
        let left            : UInt32
        let right           : UInt32
        let child           : UInt32
        let clsid           : CLSID
        let state           : UInt32
        let created         : EightBytes
        let modified        : EightBytes
        let startingSector  : UInt32
        let streamSize      : UInt64
    }
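As a sanity check, the sizes of the fields should add up to the 128 bytes of a DirectoryEntry. This sketch, in current Swift syntax, simply sums them. It assumes EightBytes is eight bytes long, as its name suggests, and that a CLSID is sixteen bytes.

```swift
// Field sizes in bytes, in declaration order.
let fieldSizes: [(name: String, size: Int)] = [
    ("name",           8 * 8),  // eight EightBytes fields
    ("nameLength",     2),
    ("type",           1),
    ("colour",         1),
    ("left",           4),
    ("right",          4),
    ("child",          4),
    ("clsid",          16),     // assumed size of CLSID
    ("state",          4),
    ("created",        8),
    ("modified",       8),
    ("startingSector", 4),
    ("streamSize",     8)
]

let totalSize = fieldSizes.reduce(0) { $0 + $1.size }
```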

The name field can contain up to thirty-two little-endian UTF-16 characters including a terminating ‘null’ character.

It is possible to define a struct with thirty-two UInt16 fields but it's not going to be much use without some additional effort, so we settle for something that is the right length.

The nameLength field specifies the length of the name including the terminating ‘null’ character, in bytes for some reason.
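To make the byte counting concrete, here is a minimal sketch in current Swift syntax that decodes a name from the raw bytes of the name field using only the standard library. The decodeName function is hypothetical. It is not the approach taken in this post, which uses NSString.

```swift
// Decodes a directory entry name from its 64-byte field.
// `nameLength` is a byte count that includes the two-byte terminator.
func decodeName(field: [UInt8], nameLength: Int) -> String? {
    guard nameLength >= 0,
          nameLength % 2 == 0,
          nameLength <= 64,
          nameLength <= field.count
    else {
        return nil
    }
    if nameLength == 0 {
        return ""              // an unallocated entry has no name
    }

    let byteCount = nameLength - 2   // drop the terminating null
    var units: [UInt16] = []
    var i = 0

    // Each UTF-16 code unit is stored low byte first.
    while i < byteCount {
        units.append(UInt16(field[i]) | (UInt16(field[i + 1]) << 8))
        i += 2
    }
    return String(decoding: units, as: UTF16.self)
}

// "AB" plus terminator occupies six bytes; the rest of the field is padding.
let sampleField: [UInt8] = [0x41, 0x00, 0x42, 0x00, 0x00, 0x00]
                         + [UInt8](repeating: 0, count: 58)

let sampleName = decodeName(field: sampleField, nameLength: 6)
```

Note that String(decoding:as:) replaces malformed UTF-16 rather than failing, so this sketch is more forgiving than a strict validator would be.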

The type field must be one of

  • 0x00 (Unknown/Unallocated)

  • 0x01 (Storage Object)

  • 0x02 (Stream Object)

  • 0x05 (Root Storage Object)

The colour field is the ‘colour’ of the entry in the red-black tree in which it appears.

The left and right fields give the indices of the left and right children of the entry, if any, in the red-black tree in which it appears.

The child field is only valid if the entry represents a storage object. It is the index of the entry at the root of the red-black tree of the entries for the storage objects and stream objects ‘contained’ within the storage object.
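The way the left, right and child fields fit together can be sketched with a toy directory. This is written in current Swift syntax. ToyEntry and the two functions are hypothetical, the sample names are made up, and the sketch assumes a well-formed, acyclic tree. The value 0xFFFFFFFF is the format's NOSTREAM marker meaning 'no entry'.

```swift
let noStream: UInt32 = 0xFFFF_FFFF   // NOSTREAM: no such entry

// Hypothetical cut-down directory entry holding only the links.
struct ToyEntry {
    let name: String
    let left: UInt32
    let right: UInt32
    let child: UInt32
}

// In-order walk of the tree rooted at `index`, collecting entry names.
// Assumes the links are well formed and contain no cycles.
func collectNames(_ entries: [ToyEntry], from index: UInt32, into names: inout [String]) {
    guard index != noStream, Int(index) < entries.count else {
        return
    }
    let entry = entries[Int(index)]

    collectNames(entries, from: entry.left, into: &names)
    names.append(entry.name)
    collectNames(entries, from: entry.right, into: &names)
}

// Names of everything directly contained in the storage at `storageIndex`.
func childNames(of storageIndex: Int, in entries: [ToyEntry]) -> [String] {
    var names: [String] = []
    collectNames(entries, from: entries[storageIndex].child, into: &names)
    return names
}

// Entry 0 is the root storage. Its child link points at entry 2, which
// is the root of a tree containing entries 1, 2 and 3.
let toyDirectory = [
    ToyEntry(name: "Root Entry", left: noStream, right: noStream, child: 2),
    ToyEntry(name: "A", left: noStream, right: noStream, child: noStream),
    ToyEntry(name: "B", left: 1, right: 3, child: noStream),
    ToyEntry(name: "C", left: noStream, right: noStream, child: noStream)
]

let rootChildren = childNames(of: 0, in: toyDirectory)
```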

The startingSector and streamSize fields specify the first sector of a stream object and its total size. They are only valid if the entry represents a stream object or the root storage object, where they describe the mini stream.

We can ‘read’ the DirectoryEntry using the FlatDirectoryEntry struct like this.

    let flatEntry = UnsafePointer<FlatDirectoryEntry>(bytes).memory

We can represent the type of a DirectoryEntry using an enum with the appropriate raw values.

    enum ObjectType: UInt8
    {
        case Unknown     = 0
        case Storage     = 1
        case Stream      = 2
        case RootStorage = 5
    }

and then attempt to construct one with the value of the type field to see whether it is valid.

    let type      = ObjectType(rawValue:flatEntry.type)
    
    if type == nil
    {
        return nil
    }

We can check that the nameLength field is an even number and that it is within bounds.

    let nameLength = Int(flatEntry.nameLength)
    
    if ((nameLength & 1) != 0) || nameLength > CFBFFormat.DIRECTORY_ENTRY_MAX_NAME_LENGTH
    {
        return nil
    }

The easiest way to construct the name itself is to use the pointer to the bytes while remembering that an Unknown/Unallocated entry has no name.

    var n : String?

    if nameLength != 0
    {
        n = NSString(bytes:bytes, length:Int(nameLength - 2), encoding: NSUTF16LittleEndianStringEncoding)
    }
    else
    {
        n = ""
    }
    if n == nil
    {
        return nil
    }

    let name = n!

Now we have both a type and a name we can engage in some gratuitous switchery to check that they are both valid.

    private func ensure(index:Int, name:String, type:ObjectType) -> Bool
    {
        switch type
        {
            case .Unknown where index != 0 && name == "":
    
                return true
    
            case .Storage where index != 0 && name != "",
                 .Stream  where index != 0 && name != "":
    
                return true
    
            case .RootStorage where index == 0 && name == CFBFFormat.DIRECTORY_ROOT_ENTRY_NAME:
    
                return true
    
            default:
    
                return false
        }
    }

If all the entries are read successfully we can then call the DirectoryBuilder build method.

If successful the build method returns a RootStorageEntry object. This can be used to find any storage object or stream object in the compound file.



November 18, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Five — Where Is Everything ? The Much Smaller Sectors Edition

The size of the sectors in a compound file is a function of the version. In a version three compound file the sector size is 512 bytes. In a version four compound file the sector size is 4096 bytes.

If a compound file contains a large number of stream objects that are smaller than a sector, and/or whose last part only partially fills a sector, then there can be a considerable amount of wasted space.

To help avoid this, stream objects below a certain size may be stored as a series of much smaller 64-byte sectors instead.
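A quick calculation shows how much difference this makes. The sketch below, in current Swift syntax, takes a hypothetical 100-byte stream and works out the space left unused when it is stored in whole sectors of each size. The allocated(size:sectorSize:) helper is illustrative only.

```swift
// Bytes occupied when a stream of `size` bytes is stored in whole sectors.
func allocated(size: Int, sectorSize: Int) -> Int {
    return (size + sectorSize - 1) / sectorSize * sectorSize
}

let exampleStreamSize = 100   // hypothetical small stream

// Unused bytes for each sector size.
let wasteV3   = allocated(size: exampleStreamSize, sectorSize: 512)  - exampleStreamSize
let wasteV4   = allocated(size: exampleStreamSize, sectorSize: 4096) - exampleStreamSize
let wasteMini = allocated(size: exampleStreamSize, sectorSize: 64)   - exampleStreamSize
```

The 100-byte stream leaves 412 bytes unused in a 512-byte sector and 3996 bytes unused in a 4096-byte sector, but only 28 bytes unused when stored in two 64-byte sectors.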

These sectors are in turn stored as the contents of an internal stream called the ‘mini stream’. This stream has an associated file allocation table, the ‘mini FAT’.

The mini FAT, like the FAT, is stored in sectors. Unlike the FAT, the sector index chain for the mini FAT is stored in the FAT.

Having read the FAT we can now read the mini FAT.

The starting sector and the number of sectors are specified by the

    firstMiniFATSector

and

    nMiniFATSectors

fields in the header.

We can read the mini FAT using readFAT since the structure of the mini FAT is identical to that of the FAT.

    private func readMiniFAT(
                     header:
                         FileHeader,
                     nEntriesPerSector:
                         Int,
                     fat:
                         FileAllocationTable,
                     sectors:
                         SectorSource) -> FileAllocationTable?
    {
        if let sequence = fat.sequence(header.firstMiniFATSector)
        {
            return
                readFAT(
                        header.nMiniFATSectors,
                    nEntriesPerSector:
                        nEntriesPerSector,
                    sequence:
                        sequence,
                    sectors:
                        sectors)
        }
        else
        {
            return nil
        }
    }

