Just An Application

November 17, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Four — Where Is Everything ? Sectors Edition

Conceptually a compound file is a collection of

  • storage objects, and

  • stream objects

arranged in a ‘tree’, plus internal streams which hold metadata.

A storage object is a named collection of stream objects and storage objects.

A stream object is a named sequence of bytes.

A compound file is considered to be a single storage object, the ‘root storage’ object, containing other storage objects and stream objects.

A storage object is a purely logical collection.

Storage objects do not exist as separate entities, unlike stream objects, and the stream objects ‘contained’ within a single storage object are not, for example, grouped together in the file.

At the lowest level a compound file comprises a header and some number of fixed-length sectors.
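As a rough illustration of this layout, here is a minimal sketch of how a sector index might be turned into a byte offset. It assumes, going by the [MS-CFB] specification rather than anything stated above, that the header takes up the space of exactly one sector at the start of the file, so that sector 0 begins one sector size in. The function name sectorOffset is hypothetical, and the sketch uses current Swift syntax.

```swift
// Hypothetical helper: byte offset of a sector within the file.
// Assumes the header occupies exactly one sector's worth of space,
// so sector 0 begins immediately after it.
func sectorOffset(index: UInt32, sectorSize: Int) -> Int
{
    return (Int(index) + 1) * sectorSize
}
```

With 512 byte sectors, for example, sector 3 would start at byte 2048.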

Sectors are used to store the contents of stream objects and internal streams.

If a stream object or internal stream spans multiple sectors then those sectors may appear anywhere in the file in any order.

To access the contents of a stream object or internal stream it is necessary to know which sector contains the first part, which sectors contain the other parts, and in what order.

Given the name of a stream object, the starting sector can be determined using ‘the directory’, which is an internal stream.

Given the index of a sector which contains part of a stream object, the index of the sector which contains the next part, if any, can be determined using the ‘file allocation table’ (FAT), which is another internal stream.
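The idea of following a chain through the FAT can be sketched as below. This is only an illustration under some assumptions: the FAT has already been decoded into an array of 32-bit entries, the end of a chain is marked by the value 0xFFFFFFFE (the ENDOFCHAIN value from the [MS-CFB] specification), and the function name chain is made up for the example. It uses current Swift syntax.

```swift
// Marker for the end of a chain (ENDOFCHAIN in [MS-CFB]).
let endOfChain: UInt32 = 0xFFFFFFFE

// Hypothetical helper: follow a chain through a FAT which has
// already been decoded into an array of entries.
// Returns nil if the chain leaves the table or loops.
func chain(from start: UInt32, fat: [UInt32]) -> [UInt32]?
{
    var result: [UInt32] = []
    var current = start

    while current != endOfChain
    {
        // A valid chain can never visit more sectors than the FAT has entries.
        if Int(current) >= fat.count || result.count >= fat.count
        {
            return nil
        }
        result.append(current)
        current = fat[Int(current)]
    }
    return result
}
```

The bound on the length of the result is a cheap guard against a corrupt table in which a chain loops back on itself.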

To access the contents of a named stream object it is therefore first necessary to read the directory.

The directory is itself stored in one or more sectors, so before we can read it we must first have read the file allocation table in order to determine their whereabouts.

The file allocation table is also stored in one or more sectors. Fortunately we don’t have to have already read it in order to read it, because the sectors which comprise the file allocation table are specified by the ‘double-indirect file allocation table’ (DIFAT) which is another internal stream.

The first sector of the DIFAT is specified in the header and, unlike all other sectors in a compound file, the DIFAT sectors are chained together using information in the sectors themselves, not the FAT.

In addition the first 109 entries in the DIFAT are stored in the header itself, as the DIFAT field, which means that unless the FAT occupies more than 109 sectors there are no DIFAT sectors to read at all.

To read the file allocation table we must iterate over the entries in the DIFAT, reading each of the specified sectors and concatenating the contents.
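Collecting the DIFAT entries might look something like the sketch below. It rests on a few assumptions taken from the [MS-CFB] specification rather than from the text above: unused entries hold 0xFFFFFFFF, the last entry in each DIFAT sector is the index of the next DIFAT sector, and the chain ends with 0xFFFFFFFE. The function fatSectorIndices and its readEntries parameter, which stands in for whatever reads and decodes a sector, are hypothetical. It uses current Swift syntax.

```swift
let freeSector: UInt32 = 0xFFFFFFFF   // unused DIFAT entry
let endOfChain: UInt32 = 0xFFFFFFFE   // no further DIFAT sectors

// Hypothetical helper: gather the indices of the FAT sectors, first
// from the entries in the header, then from any DIFAT sectors.
func fatSectorIndices(
         headerDIFAT: [UInt32],
         firstDIFATSector: UInt32,
         readEntries: (UInt32) -> [UInt32]?) -> [UInt32]?
{
    // Entries held in the header come first; unused slots are skipped.
    var result = headerDIFAT.filter { $0 != freeSector }
    var next = firstDIFATSector
    var visited = Set<UInt32>()

    while next != endOfChain
    {
        // Give up on a looping chain, an unreadable sector or an empty one.
        guard visited.insert(next).inserted,
              let entries = readEntries(next),
              let link = entries.last
        else
        {
            return nil
        }
        // All but the last entry are FAT sector indices;
        // the last entry is the index of the next DIFAT sector.
        result += entries.dropLast().filter { $0 != freeSector }
        next = link
    }
    return result
}
```

The FAT itself would then be the concatenation of the contents of the sectors at the returned indices, in that order.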

From this point on everything has to be accessed in terms of sectors so we can start by defining the SectorSource protocol.

    protocol SectorSource
    {
        var sectorSize : Int { get }
    
        func sector(index:SectorIndex) -> UnsafePointer<UInt8>?
    }

A SectorSource is an object capable of returning a sector given its index.

The SectorIndex type is defined like this

    typealias SectorIndex   = UInt32

The sectorSize property specifies the size of all sectors returned by a successful call to the sector method.
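To make the idea concrete, here is a minimal sketch of an in-memory sector source over the bytes of an entire file. It is not the implementation used here: the type name ByteArraySectorSource is made up, it returns an array slice rather than an UnsafePointer to keep the example safe, it assumes sector 0 starts one sector size into the file, and it uses current Swift syntax.

```swift
typealias SectorIndex = UInt32

// Hypothetical in-memory sector source over the bytes of a whole file.
struct ByteArraySectorSource
{
    let bytes: [UInt8]
    let sectorSize: Int

    func sector(_ index: SectorIndex) -> ArraySlice<UInt8>?
    {
        // Assumes sector 0 starts one sector size into the file, after the header.
        let start = (Int(index) + 1) * sectorSize
        let end = start + sectorSize

        // Fail rather than return a short sector.
        guard end <= bytes.count else { return nil }

        return bytes[start..<end]
    }
}
```

Returning nil for a sector which is not wholly present means that a caller never sees fewer than sectorSize bytes.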

Given a SectorSource object for the sectors in the file and the sequence of sector indexes from the DIFAT we can read the FAT like this

    private func readFAT(
                     nSectors:
                         Int,
                     nEntriesPerSector:
                         Int,
                     sequence:
                         SectorIndexSequence,
                     sectors:
                         SectorSource) -> FileAllocationTable?
    {
        if nSectors == 1
        {
            var g = sequence.generate()
    
            if let sectorIndex = g.next()
            {
                return readSingleSectorFAT(sectorIndex, nEntries:nEntriesPerSector, sectors: sectors)
            }
            else
            {
                return nil
            }
        }
        else
        {
            return nil
        }
    }

This is especially easy in the case of this particular file since the entire FAT is contained in a single sector, which is the only case readFAT handles so far. Anything else results in nil.

    private func readSingleSectorFAT(index:SectorIndex, nEntries:Int, sectors:SectorSource) -> FileAllocationTable?
    {
        if let sector = sectors.sector(index)
        {
            return SingleSectorFAT(bytes:sector, nEntries:nEntries)
        }
        else
        {
            return nil
        }
    }

The result is an object which implements the FileAllocationTable protocol.

    protocol FileAllocationTable
    {
        func next(index: SectorIndex) -> SectorIndex?
        
        func sequence(index:SectorIndex) -> SectorIndexSequence?
    }

Given a sector index the next method returns the index of the next sector as held in the file allocation table.
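As an illustration of what a next method has to do, here is a minimal sketch of a FAT backed by the raw bytes of a sector. It is not the SingleSectorFAT used above, which is not shown. The type name ByteBackedFAT is made up, and the sketch assumes, per the [MS-CFB] specification, that each entry is a 4-byte little-endian value and that values above 0xFFFFFFFA are markers, such as end-of-chain, rather than sector indexes. It uses current Swift syntax.

```swift
typealias SectorIndex = UInt32

// Hypothetical FAT backed by the raw bytes of one sector.
struct ByteBackedFAT
{
    let bytes: [UInt8]

    // Assumes each entry is a 4-byte little-endian sector index.
    var entryCount: Int { return bytes.count / 4 }

    func next(_ index: SectorIndex) -> SectorIndex?
    {
        let i = Int(index)

        guard i < entryCount else { return nil }

        let o = i * 4
        let value = UInt32(bytes[o])
                  | UInt32(bytes[o + 1]) << 8
                  | UInt32(bytes[o + 2]) << 16
                  | UInt32(bytes[o + 3]) << 24

        // Values above 0xFFFFFFFA are treated as markers, not sector indexes.
        return value > 0xFFFFFFFA ? nil : value
    }
}
```

Note that this sketch returns nil both at the end of a chain and for an index which is out of range, so a caller cannot tell the two apart.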

Given the index of the first sector of a stream object the sequence method will return the indices of all the sectors containing the stream object in order.

The SectorIndexSequence type is defined like this

    typealias SectorIndexSequence = SequenceOf<SectorIndex>

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.
