Just An Application

November 21, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Seven — Stream And Storage Objects

We can model the combination of a set of sectors and the associated file allocation table as a SectorSpace.

    protocol SectorSpace
    {
        func data(index:SectorIndex) -> ByteData?
    }

A SectorSpace is an object capable of returning the contents of a stream given the index of the first sector in that ‘space’.
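As a rough illustration of what such an object has to do, here is a minimal sketch in current Swift syntax (not the 2014 syntax used in the rest of this post). The type ChainedSectorSpace and its array-backed storage are hypothetical stand-ins, not the implementation used here; the only outside fact relied on is that the compound file specification marks the end of a chain with the value 0xFFFFFFFE.

```swift
// Sketch only: hypothetical names, current Swift syntax.
typealias SectorIndex = UInt32

// Marks the end of a sector chain in a FAT (ENDOFCHAIN in the CFB specification).
let endOfChain: SectorIndex = 0xFFFFFFFE

struct ChainedSectorSpace {
    let sectors: [[UInt8]]      // sector contents, indexed by sector number
    let fat: [SectorIndex]      // fat[i] is the sector that follows sector i

    // Concatenates the sectors of the chain starting at `start`.
    // Returns nil if the chain leaves the table or loops.
    func data(startingAt start: SectorIndex) -> [UInt8]? {
        var result: [UInt8] = []
        var current = start
        var visited = 0
        while current != endOfChain {
            let i = Int(current)
            // A valid chain can never visit more sectors than the FAT has entries.
            guard i < sectors.count, i < fat.count, visited < fat.count else { return nil }
            result += sectors[i]
            current = fat[i]
            visited += 1
        }
        return result
    }
}
```

The same shape works for both spaces described next: only the source of the sectors and the table that links them differ.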

There are two possible SectorSpaces in a compound file. The first represents the sectors in the file itself in combination with the FAT. The second represents the sectors in the mini stream in combination with the mini FAT.

We can implement the first straight away. We have a SectorSource for the sectors in the file and we have the FAT.

For the second we have the mini FAT but we do not have the sectors stored in the mini stream.

The mini stream is an internal stream stored in sectors in the file itself, so it can be constructed using the first SectorSpace which represents those sectors and the FAT.

To construct the mini stream we need to know the starting sector and the size. These are stored in the directory entry for the root storage object.

We can define them as properties in the class RootStorageEntry

    let miniStreamStart : SectorIndex
    let miniStreamSize  : StreamSize
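For reference, here is a minimal sketch of where those two values sit in the raw bytes. It assumes the layout given in the compound file specification: a 128-byte directory entry with the starting sector as a 4-byte little-endian value at offset 116 and the stream size as an 8-byte little-endian value at offset 120. The helper names are hypothetical and the code uses current Swift syntax.

```swift
// Sketch only: hypothetical helper names, current Swift syntax.

// Reads a little-endian unsigned integer of `count` bytes at `offset`.
func readLE(_ bytes: [UInt8], at offset: Int, count: Int) -> UInt64? {
    guard offset >= 0, count > 0, count <= 8, offset + count <= bytes.count else { return nil }
    var value: UInt64 = 0
    for i in 0..<count {
        value |= UInt64(bytes[offset + i]) << (8 * i)
    }
    return value
}

struct MiniStreamLocation {
    let start: UInt32   // index of the first sector of the mini stream
    let size: UInt64    // size of the mini stream in bytes
}

// Extracts the mini stream location from the root storage's directory entry.
func miniStreamLocation(rootEntry: [UInt8]) -> MiniStreamLocation? {
    guard rootEntry.count == 128,
          let start = readLE(rootEntry, at: 116, count: 4),
          let size = readLE(rootEntry, at: 120, count: 8) else { return nil }
    return MiniStreamLocation(start: UInt32(start), size: size)
}
```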

We can make the mini stream sector space like this

    private func makeMiniStreamSpace(
                     rootStorageEntry:
                         RootStorageEntry,
                     miniFAT:
                         FileAllocationTable,
                     fileSpace:
                         SectorSpace) -> SectorSpace?
    {
        if let data = fileSpace.data(rootStorageEntry.miniStreamStart)
        {
            return
                MiniStreamSpace(
                    data:
                        data,
                    size:
                        rootStorageEntry.miniStreamSize,
                    fat:
                        miniFAT,
                    sectorSize:
                        CFBFFormat.MINI_SECTOR_SIZE)
        }
        else
        {
            return nil
        }
    }
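The MiniStreamSpace class is not shown, but the essential step is slicing the mini stream's data into mini sectors. A minimal sketch, assuming the mini stream has already been read into a byte array and that mini sectors are 64 bytes long (the size the specification fixes for them); the function name is hypothetical and the code uses current Swift syntax.

```swift
// Sketch only: hypothetical names, current Swift syntax.
let miniSectorSize = 64   // bytes per mini sector

// Returns the bytes of mini sector `index` within the mini stream,
// or nil if that mini sector lies beyond the end of the data.
func miniSector(_ index: UInt32, in miniStream: [UInt8]) -> ArraySlice<UInt8>? {
    let start = Int(index) * miniSectorSize
    let end = start + miniSectorSize
    guard end <= miniStream.count else { return nil }
    return miniStream[start..<end]
}
```

Unlike sectors in the file itself, mini sectors have no header in front of them, so mini sector 0 starts at offset 0 of the mini stream.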

Now we have our two sector spaces we can implement a stream factory that can create a stream object given the index of its first sector and its size.

The size below which a stream object is stored in the mini stream is defined by the miniStreamCutoffSize field in the header. This and the two sector spaces are all the stream factory needs.

    final class StreamFactory
    {
        init(fileSpace:SectorSpace, miniStreamSpace:SectorSpace, miniStreamCutoffSize:StreamSize)
        {
            self.fileSpace            = fileSpace
            self.miniStreamSpace      = miniStreamSpace
            self.miniStreamCutoffSize = miniStreamCutoffSize
        }
    
        //
    
        func makeStream(entry:StreamEntry) -> Stream?
        {
            let size = entry.streamSize
    
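            // Only streams smaller than the cutoff live in the mini stream;
            // a stream of exactly the cutoff size is stored in ordinary sectors.
            if size >= miniStreamCutoffSize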
            {
                return Stream(size:size, start: entry.startingSector, space: fileSpace)
            }
            else
            {
                return Stream(size:size, start: entry.startingSector, space: miniStreamSpace)
            }
        }
    
        //
    
        private let fileSpace               : SectorSpace
        private let miniStreamSpace         : SectorSpace
        private let miniStreamCutoffSize    : StreamSize
    }

Once we have the stream factory we can define a class which implements a storage object.

All it needs is the StorageEntry which represents the storage object in the directory, so that it can find the stream and storage objects it contains, and the stream factory, so that it can create stream objects as necessary.

    final class Storage
    {
        init(entry:StorageEntry, streamFactory:StreamFactory)
        {
            self.entry          = entry
            self.streamFactory  = streamFactory
            self.storageTable   = [String: Storage]()
            self.streamTable    = [String: Stream]()
        }
    
        //
    
        func getStream(var path:[String], name:String) -> Stream?
        {
            if path.count != 0
            {
                return getStorage(path.removeAtIndex(0))?.getStream(path, name: name)
            }
            else
            {
                return getStream(name)
            }
        }
    
        func getStorage(storageName:String) -> Storage?
        {
            var storage = storageTable[storageName]
    
            if storage != nil
            {
                return storage
            }
    
            let storageEntry = entry.getStorageEntry(storageName)
    
            if storageEntry == nil
            {
                return nil
            }
            storage = Storage(entry: storageEntry!, streamFactory: streamFactory)
            storageTable[storageName] = storage
            return storage
        }
    
        func getStream(streamName:String) -> Stream?
        {
            var stream = streamTable[streamName]
    
            if stream == nil
            {
                let streamEntry = entry.getStreamEntry(streamName)
    
                if streamEntry == nil
                {
                    return nil
                }
                stream = streamFactory.makeStream(streamEntry!)
                streamTable[streamName] = stream
            }
            return stream
        }
    
        //
    
        private let entry           : StorageEntry
        private let streamFactory   : StreamFactory
        //
        private var storageTable    : [String: Storage]
        private var streamTable     : [String: Stream]
    }
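The path handling in getStream(path, name:) is a plain recursive walk: peel off the first storage name, descend, and repeat until the path is empty. Here is a minimal sketch of just that walk in current Swift syntax, using a hypothetical Node class with in-memory dictionaries in place of Storage, StorageEntry and StreamFactory.

```swift
// Sketch only: a simplified stand-in for Storage, current Swift syntax.
final class Node {
    private var storages: [String: Node] = [:]
    private var streams: [String: [UInt8]] = [:]

    // Adds an empty child storage and returns it.
    @discardableResult
    func addStorage(_ name: String) -> Node {
        let node = Node()
        storages[name] = node
        return node
    }

    func addStream(_ name: String, _ bytes: [UInt8]) {
        streams[name] = bytes
    }

    // Walks `path` one storage name at a time, then looks up the stream by name.
    func stream(path: [String], name: String) -> [UInt8]? {
        guard let first = path.first else { return streams[name] }
        return storages[first]?.stream(path: Array(path.dropFirst()), name: name)
    }
}
```

A missing storage anywhere along the path short-circuits to nil through the optional chain, which is the same behaviour as the version above.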

We can define a CompoundFile as a very simple wrapper around the Storage instance which represents the root storage object.

    final class CompoundFile
    {
        init(rootStorage:Storage)
        {
            self.rootStorage = rootStorage
        }
    
        //
    
        func getStream(#storage:[String], name:String) -> Stream?
        {
            return rootStorage.getStream(storage, name:name)
        }
    
        //
    
        private let rootStorage: Storage
    }

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 17, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Four — Where Is Everything ? Sectors Edition

Conceptually a compound file is a collection of

  • storage objects, and

  • stream objects

arranged in a ‘tree’, plus internal streams which hold metadata.

A storage object is a named collection of stream objects and storage objects.

A stream object is a named sequence of bytes.

A compound file is considered to be a single storage object, the ‘root storage’ object, containing other storage objects and stream objects.

A storage object is a purely logical collection.

Unlike stream objects, storage objects do not exist as separate entities. For example, the stream objects ‘contained’ within a single storage object are not grouped together in the file.

At the lowest level a compound file comprises a header and some number of fixed-length sectors.

Sectors are used to store the contents of stream objects and internal streams.

If a stream object or internal stream spans multiple sectors then those sectors may appear anywhere in the file in any order.

To access the contents of a stream object or internal stream it is necessary to know which sector contains the first part and which sectors contain the other parts and in what order.

Given the name of a stream object, the starting sector can be determined using ‘the directory’, which is an internal stream.

Given the index of the sector which contains part of a stream object, the index of the sector which contains the next part, if any, can be determined using the ‘file allocation table’ (FAT), which is another internal stream.

To access the contents of a named stream object it is therefore first necessary to read the directory.

The directory is stored in one or more sectors, so before it can be read the file allocation table must already have been read in order to locate those sectors.

The file allocation table is also stored in one or more sectors. Fortunately we don’t have to have already read it in order to read it, because the sectors which comprise the file allocation table are specified by the ‘double-indirect file allocation table’ (DIFAT) which is another internal stream.

The first sector of the DIFAT is specified in the header. Unlike all other sectors in a compound file, the DIFAT sectors are chained together using information in the sectors themselves, not the FAT.

In addition, the first 109 entries of the DIFAT are stored in the header itself, as the DIFAT field, which means that when the FAT occupies no more than 109 sectors there are no DIFAT sectors to read at all.
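A minimal sketch of pulling those entries out of the header, in current Swift syntax. It assumes the layout given in the compound file specification: the DIFAT field starts at byte offset 76 of the header and holds 109 little-endian 32-bit entries, with unused entries set to 0xFFFFFFFF. It also assumes the used entries are contiguous, so it stops at the first unused one. The function name is hypothetical.

```swift
// Sketch only: hypothetical names, current Swift syntax.
let freeSector: UInt32 = 0xFFFFFFFF   // FREESECT: marks an unused DIFAT entry

// Returns the FAT sector indexes listed in the header's DIFAT field.
func headerDIFAT(_ header: [UInt8]) -> [UInt32]? {
    let offset = 76
    let entryCount = 109
    guard header.count >= offset + entryCount * 4 else { return nil }
    var result: [UInt32] = []
    for i in 0..<entryCount {
        let p = offset + i * 4
        // Assemble one little-endian 32-bit entry.
        let b0 = UInt32(header[p])
        let b1 = UInt32(header[p + 1]) << 8
        let b2 = UInt32(header[p + 2]) << 16
        let b3 = UInt32(header[p + 3]) << 24
        let entry = b0 | b1 | b2 | b3
        if entry == freeSector { break }   // assumption: used entries come first
        result.append(entry)
    }
    return result
}
```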

To read the file allocation table we must iterate over the entries in the DIFAT reading each of the specified sectors and concatenating the contents.
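The concatenation step can be sketched as follows, in current Swift syntax. The names are hypothetical, the closure readSector stands in for whatever supplies sector contents, and the FAT is represented simply as an array of little-endian 32-bit entries.

```swift
// Sketch only: hypothetical names, current Swift syntax.

// Decodes a sector's bytes into little-endian 32-bit FAT entries.
func fatEntries(in sector: [UInt8]) -> [UInt32] {
    var entries: [UInt32] = []
    var p = 0
    while p + 4 <= sector.count {
        let b0 = UInt32(sector[p])
        let b1 = UInt32(sector[p + 1]) << 8
        let b2 = UInt32(sector[p + 2]) << 16
        let b3 = UInt32(sector[p + 3]) << 24
        entries.append(b0 | b1 | b2 | b3)
        p += 4
    }
    return entries
}

// Builds the whole FAT by reading each sector named by the DIFAT, in order.
// `readSector` returns nil for a sector that cannot be read.
func readWholeFAT(difat: [UInt32], readSector: (UInt32) -> [UInt8]?) -> [UInt32]? {
    var fat: [UInt32] = []
    for index in difat {
        guard let sector = readSector(index) else { return nil }
        fat += fatEntries(in: sector)
    }
    return fat
}
```

The order of the DIFAT entries matters: entry n of the DIFAT supplies the n-th block of FAT entries, wherever that sector happens to be in the file.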

From this point on everything has to be accessed in terms of sectors so we can start by defining the SectorSource protocol.

    protocol SectorSource
    {
        var sectorSize : Int { get }
    
        func sector(index:SectorIndex) -> UnsafePointer<UInt8>?
    }

A SectorSource is an object capable of returning a sector given its index.

The SectorIndex type is defined like this

    typealias SectorIndex   = UInt32

The sectorSize property specifies the size of all sectors returned by a successful call to the sector method.
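To make the idea concrete, here is a minimal sketch of a sector source backed by an in-memory copy of the file, in current Swift syntax. The type is hypothetical and returns array slices rather than the UnsafePointer used by the protocol above. It relies on one fact from the compound file specification: the header occupies the space of one sector, so sector n starts at byte offset (n + 1) * sectorSize.

```swift
// Sketch only: hypothetical type, current Swift syntax.
struct InMemorySectorSource {
    let bytes: [UInt8]     // the whole compound file
    let sectorSize: Int    // taken from the header

    // Sector n starts at byte offset (n + 1) * sectorSize, because the
    // header occupies the space of one sector at the start of the file.
    func sector(_ index: UInt32) -> ArraySlice<UInt8>? {
        let start = (Int(index) + 1) * sectorSize
        let end = start + sectorSize
        guard sectorSize > 0, end <= bytes.count else { return nil }
        return bytes[start..<end]
    }
}
```

Returning nil for an out-of-range index also covers the special marker values used in the FAT, since they are far larger than any real sector number.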

Given a SectorSource object for the sectors in the file and the sequence of sector indexes from the DIFAT we can read the FAT like this

    private func readFAT(
                     nSectors:
                         Int,
                     nEntriesPerSector:
                         Int,
                     sequence:
                         SectorIndexSequence,
                     sectors:
                         SectorSource) -> FileAllocationTable?
    {
        if nSectors == 1
        {
            var g = sequence.generate()
    
            if let sectorIndex = g.next()
            {
                return readSingleSectorFAT(sectorIndex, nEntries:nEntriesPerSector, sectors: sectors)
            }
            else
            {
                return nil
            }
        }
        else
        {
            // A FAT which spans more than one sector is not handled here
            return nil
        }
    }

It is especially easy in the case of this particular file since the entire FAT is contained in a single sector.

    private func readSingleSectorFAT(index:SectorIndex, nEntries:Int, sectors:SectorSource) -> FileAllocationTable?
    {
        if let sector = sectors.sector(index)
        {
            return SingleSectorFAT(bytes:sector, nEntries:nEntries)
        }
        else
        {
            return nil
        }
    }

The result is an object which implements the FileAllocationTable protocol.

    protocol FileAllocationTable
    {
        func next(index: SectorIndex) -> SectorIndex?
        
        func sequence(index:SectorIndex) -> SectorIndexSequence?
    }

Given a sector index the next method returns the index of the next sector as held in the file allocation table.

Given the index of the first sector of a stream object the sequence method will return the indices of all the sectors containing the stream object in order.
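A minimal sketch of both methods over a FAT held as a plain array, in current Swift syntax. The type ArrayFAT is hypothetical and is not the SingleSectorFAT class used above. It assumes, per the compound file specification, that values above 0xFFFFFFFA are special markers (ENDOFCHAIN is 0xFFFFFFFE) rather than real sector numbers.

```swift
// Sketch only: hypothetical type, current Swift syntax.
struct ArrayFAT {
    let entries: [UInt32]   // entries[i] is the sector after sector i

    // Largest value that is a real sector number (MAXREGSECT in the CFB
    // specification); anything above it is a marker such as ENDOFCHAIN.
    static let maxRegularSector: UInt32 = 0xFFFFFFFA

    // The sector that follows `index`, or nil at the end of the chain
    // or if `index` is outside the table.
    func next(_ index: UInt32) -> UInt32? {
        guard Int(index) < entries.count else { return nil }
        let entry = entries[Int(index)]
        return entry <= ArrayFAT.maxRegularSector ? entry : nil
    }

    // All sector indexes of the chain that starts at `start`, in order.
    // The prefix guards against a corrupt table that loops.
    func chain(from start: UInt32) -> [UInt32] {
        let indexes = sequence(first: start, next: { self.next($0) })
        return Array(indexes.prefix(entries.count))
    }
}
```

The starting index is not validated here, so a caller passing an index outside the table gets back a one-element chain containing just that index.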

The SectorIndexSequence type is defined like this

    typealias SectorIndexSequence = SequenceOf<SectorIndex>
