Just An Application

November 21, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Six — Where Is Everything ? The Directory Edition

Now that we have the file allocation table, we can also read ‘the directory’.

The directory is an internal stream containing an entry for each storage object and stream object in the compound file.

The first entry in the directory is always the entry for the root storage object.

The entries for all the storage and stream objects contained within a given storage object are linked together as a red-black tree. The containing storage object has a link to the root of that tree.

Each entry in the directory is represented by a DirectoryEntry which is 128 bytes long.

This means we can read the directory at the sector level rather than at the stream level, as there is guaranteed to be a whole number of entries per sector.
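
The arithmetic behind that guarantee can be sketched as follows. This is a minimal illustration in current Swift syntax rather than the 2014 syntax used in the rest of this post. The sector sizes are those the format defines for versions three and four, and entriesPerSector is a hypothetical helper, not part of the code in this series.

```swift
// Sizes in bytes. A directory entry is 128 bytes; a sector is
// 512 bytes in a version three file and 4096 in a version four file.
let directoryEntrySize = 128
let v3SectorSize       = 512
let v4SectorSize       = 4096

// Hypothetical helper: the number of entries in a sector, or nil
// if the entries do not divide the sector exactly.
func entriesPerSector(sectorSize: Int, entrySize: Int) -> Int?
{
    guard entrySize > 0, sectorSize % entrySize == 0 else
    {
        return nil
    }
    return sectorSize / entrySize
}

let v3Entries = entriesPerSector(sectorSize: v3SectorSize, entrySize: directoryEntrySize)
let v4Entries = entriesPerSector(sectorSize: v4SectorSize, entrySize: directoryEntrySize)
```

Both divisions are exact, so an entry never straddles a sector boundary.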

The firstDirSector header field identifies the first sector of the directory internal stream and the nDirSectors header field specifies the number of sectors.

In a version three compound file the nDirSectors field is always zero so it is of no use whatsoever. We have no choice but to iterate over all the sectors in sequence.

    private func readDirectoryV3(
                     header:
                         FileHeader,
                     fat:
                         FileAllocationTable,
                     sectors:
                         SectorSource) -> RootStorageEntry?
    {
        if let sequence = fat.sequence(header.firstDirSector)
        {
            let builder = DirectoryBuilder()
    
            for sectorIndex in sequence
            {
                let sector = sectors.sector(sectorIndex)
    
                if sector == nil
                {
                    return nil
                }
    
                var entryBytes = sector!
    
                for _ in 0 ..< CFBFFormat.V3_DIRECTORY_ENTRIES_PER_SECTOR
                {
                    if !builder.addEntry(entryBytes)
                    {
                        return nil
                    }
                    entryBytes += CFBFFormat.DIRECTORY_ENTRY_SIZE
                }
            }
            return builder.build()
        }
        else
        {
            return nil
        }
    }

The DirectoryBuilder is passed a pointer to the bytes for each DirectoryEntry in each sector of the directory.
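
If we would rather not do pointer arithmetic, the same walk over a sector can be sketched with array slices. This is only an illustration in current Swift syntax: it assumes the sector has already been read into a [UInt8], and entrySlices is a hypothetical helper rather than anything the DirectoryBuilder actually uses.

```swift
// Hypothetical helper: split the bytes of one directory sector into
// consecutive 128-byte slices, one per DirectoryEntry.
// Returns nil if the sector is empty or not a whole number of entries.
func entrySlices(in sector: [UInt8], entrySize: Int = 128) -> [ArraySlice<UInt8>]?
{
    guard entrySize > 0, !sector.isEmpty, sector.count % entrySize == 0 else
    {
        return nil
    }
    return stride(from: 0, to: sector.count, by: entrySize).map
    {
        start in sector[start ..< start + entrySize]
    }
}

// A zero-filled stand-in for a version three directory sector.
let sector = [UInt8](repeating: 0, count: 512)
let slices = entrySlices(in: sector)
```

Each slice keeps the indices of the original array, so the fourth slice of a 512-byte sector starts at index 384.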

Within the DirectoryBuilder we can use the ‘flat struct’ technique to ‘read’ the DirectoryEntry.

In this case the struct FlatDirectoryEntry looks like this:

    struct FlatDirectoryEntry
    {
        struct Name
        {
            let name0   : EightBytes
            let name1   : EightBytes
            let name2   : EightBytes
            let name3   : EightBytes
            let name4   : EightBytes
            let name5   : EightBytes
            let name6   : EightBytes
            let name7   : EightBytes
        }
    
        let name            : Name          // 64 bytes
        let nameLength      : UInt16
        let type            : UInt8
        let colour          : UInt8
        let left            : UInt32
        let right           : UInt32
        let child           : UInt32
        let clsid           : CLSID
        let state           : UInt32
        let created         : EightBytes
        let modified        : EightBytes
        let startingSector  : UInt32
        let streamSize      : UInt64
    }

The name field can contain up to thirty-two little-endian UTF-16 characters including a terminating ‘null’ character.

It is possible to define a struct with thirty-two UInt16 fields, but it's not going to be much use without some additional effort, so we settle for something that is the right length.

The nameLength field specifies the length of the name including the terminating ‘null’ character, in bytes for some reason.

The type field must be one of

  • 0x00 (Unknown/Unallocated)

  • 0x01 (Storage Object)

  • 0x02 (Stream Object)

  • 0x05 (Root Storage Object)

The colour field is the ‘colour’ of the entry in the red-black tree in which it appears: 0x00 for red and 0x01 for black.

The left and right fields give the indices of the left and right children of the entry in the red-black tree in which it appears. A value of 0xFFFFFFFF means there is no such child.

The child field is only valid if the entry represents a storage object. It is the index of the entry at the root of the red-black tree of the entries for the storage objects and stream objects ‘contained’ within the storage object.
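
To make the linkage concrete, here is a minimal sketch of listing the contents of a storage object by walking its tree in order. It is written in current Swift syntax and is not how the DirectoryBuilder does it: Entry is a hypothetical, pared-down stand-in for a directory entry, and the sample entries are made up for the illustration. The value 0xFFFFFFFF is the one the format uses to mean ‘no entry’.

```swift
let noStream: UInt32 = 0xFFFF_FFFF   // marks a missing left, right or child link

// Hypothetical entry holding only what the traversal needs.
struct Entry
{
    let name  : String
    let left  : UInt32
    let right : UInt32
    let child : UInt32
}

// In-order walk of the tree rooted at index, collecting names.
// The visited set stops us looping forever on a corrupt file.
func collect(_ index: UInt32, _ entries: [Entry],
             _ visited: inout Set<UInt32>, _ names: inout [String])
{
    guard index != noStream,
          Int(index) < entries.count,
          visited.insert(index).inserted else
    {
        return
    }
    let entry = entries[Int(index)]
    collect(entry.left, entries, &visited, &names)
    names.append(entry.name)
    collect(entry.right, entries, &visited, &names)
}

func childNames(ofStorageAt index: Int, in entries: [Entry]) -> [String]
{
    var visited = Set<UInt32>()
    var names   = [String]()
    collect(entries[index].child, entries, &visited, &names)
    return names
}

// Made-up directory: the root storage's tree is rooted at entry 2.
let entries = [
    Entry(name: "Root Entry", left: noStream, right: noStream, child: 2),
    Entry(name: "A", left: noStream, right: noStream, child: noStream),
    Entry(name: "B", left: 1, right: 3, child: noStream),
    Entry(name: "C", left: noStream, right: noStream, child: noStream),
]
let names = childNames(ofStorageAt: 0, in: entries)
```

The colour field plays no part in a read-only traversal like this one. It only matters when validating the tree or inserting entries into it.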

The startingSector and streamSize fields are only meaningful if the entry represents a stream object, in which case they specify the first sector of the stream and its total size, or the root storage object, in which case they describe the mini stream instead.

We can ‘read’ the DirectoryEntry using the FlatDirectoryEntry struct like this.

    let flatEntry = UnsafePointer<FlatDirectoryEntry>(bytes).memory
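
The ‘flat struct’ technique relies on the struct being laid out in memory exactly as the entry is laid out in the file. An alternative, sketched here in current Swift syntax, is to read each field at its explicit byte offset. The offsets follow from the field sizes above (64 + 2 + 1 + 1 + 4 + 4 + 4 + 16 + 4 + 8 + 8 + 4 + 8 = 128). ParsedEntry, readLE and parseEntry are hypothetical names introduced for the illustration, and only a subset of the fields is read.

```swift
// Read a little-endian unsigned integer of type T starting at offset.
func readLE<T: FixedWidthInteger & UnsignedInteger>(
         _ bytes: [UInt8], at offset: Int, as type: T.Type) -> T?
{
    let size = MemoryLayout<T>.size
    guard offset >= 0, offset + size <= bytes.count else { return nil }
    var value: T = 0
    // Start from the most significant (last) byte and shift it up.
    for i in (0 ..< size).reversed()
    {
        value = (value << 8) | T(bytes[offset + i])
    }
    return value
}

// Hypothetical result type holding a subset of the fields.
struct ParsedEntry
{
    let nameLength     : UInt16
    let type           : UInt8
    let colour         : UInt8
    let left           : UInt32
    let right          : UInt32
    let child          : UInt32
    let startingSector : UInt32
    let streamSize     : UInt64
}

func parseEntry(_ bytes: [UInt8]) -> ParsedEntry?
{
    guard bytes.count == 128,
          let nameLength     = readLE(bytes, at:  64, as: UInt16.self),
          let left           = readLE(bytes, at:  68, as: UInt32.self),
          let right          = readLE(bytes, at:  72, as: UInt32.self),
          let child          = readLE(bytes, at:  76, as: UInt32.self),
          let startingSector = readLE(bytes, at: 116, as: UInt32.self),
          let streamSize     = readLE(bytes, at: 120, as: UInt64.self) else
    {
        return nil
    }
    return ParsedEntry(nameLength: nameLength, type: bytes[66], colour: bytes[67],
                       left: left, right: right, child: child,
                       startingSector: startingSector, streamSize: streamSize)
}

// A made-up entry: only the bytes we care about are set.
var raw = [UInt8](repeating: 0, count: 128)
raw[64]  = 22                        // nameLength, low byte first
raw[66]  = 5                         // type: root storage
raw[67]  = 1                         // colour
raw[76]  = 1                         // child
raw[116] = 3                         // startingSector
raw[120] = 0x40; raw[121] = 0x02     // streamSize = 0x0240
let parsed = parseEntry(raw)
```

This is more code than the one-liner, but it does not depend on how the compiler chooses to pad or align the struct's fields.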

We can represent the type of a DirectoryEntry using an enum with the appropriate raw values.

    enum ObjectType: UInt8
    {
        case Unknown     = 0
        case Storage     = 1
        case Stream      = 2
        case RootStorage = 5
    }

We can then attempt to construct one from the value of the type field to see whether it is valid.

    let type      = ObjectType(rawValue:flatEntry.type)
    
    if type == nil
    {
        return nil
    }

We can check that the nameLength field is an even number and that it is within bounds.

    let nameLength = Int(flatEntry.nameLength)
    
    if ((nameLength & 1) != 0) || nameLength > CFBFFormat.DIRECTORY_ENTRY_MAX_NAME_LENGTH
    {
        return nil
    }

The easiest way to construct the name itself is to use the pointer to the bytes while remembering that an Unknown/Unallocated entry has no name.

    var n : String?

    if nameLength != 0
    {
        n = NSString(bytes:bytes, length:Int(nameLength - 2), encoding: NSUTF16LittleEndianStringEncoding)
    }
    else
    {
        n = ""
    }
    if n == nil
    {
        return nil
    }

    let name = n!
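
For comparison, here is a sketch of the same decoding in current Swift syntax, without Foundation. It assumes the entry is available as a [UInt8], and decodeName is a hypothetical helper. One difference in behaviour to be aware of: String(decoding:as:) never fails, it replaces ill-formed UTF-16 with U+FFFD, whereas the NSString initialiser above can return nil.

```swift
// Hypothetical helper: decode the name stored in the first nameLength
// bytes of an entry. nameLength includes the two-byte terminator.
func decodeName(entry: [UInt8], nameLength: Int) -> String?
{
    guard nameLength >= 0,
          nameLength % 2 == 0,
          nameLength <= 64,
          nameLength <= entry.count else
    {
        return nil
    }
    if nameLength == 0
    {
        return ""                    // unknown/unallocated entries have no name
    }
    // Combine byte pairs into UTF-16 code units, low byte first,
    // stopping before the terminating null.
    let units: [UInt16] = stride(from: 0, to: nameLength - 2, by: 2).map
    {
        i in UInt16(entry[i]) | (UInt16(entry[i + 1]) << 8)
    }
    return String(decoding: units, as: UTF16.self)
}

// Build a made-up entry whose name is "Root Entry":
// ten characters plus the terminator is twenty-two bytes.
var entry = [UInt8](repeating: 0, count: 128)
for (i, unit) in "Root Entry".utf16.enumerated()
{
    entry[2 * i]     = UInt8(unit & 0xFF)
    entry[2 * i + 1] = UInt8(unit >> 8)
}
let decoded = decodeName(entry: entry, nameLength: 22)
```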

Now we have both a type and a name we can engage in some gratuitous switchery to check that they are both valid.

    private func ensure(index:Int, name:String, type:ObjectType) -> Bool
    {
        switch type
        {
            case .Unknown where index != 0 && name == "":
    
                return true
    
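            // A where clause only guards the pattern it follows, so each pattern needs its own.
            case .Storage where index != 0 && name != "",
                 .Stream  where index != 0 && name != "":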
    
                return true
    
            case .RootStorage where index == 0 && name == CFBFFormat.DIRECTORY_ROOT_ENTRY_NAME:
    
                return true
    
            default:
    
                return false
        }
    }
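
A self-contained version of the same check in current Swift syntax makes it easy to try a few cases. This is a sketch rather than the code used in this series: isValid is a hypothetical name, and rootEntryName is an assumed stand-in for CFBFFormat.DIRECTORY_ROOT_ENTRY_NAME, set to the name the format gives the root entry. Note that in a case with several patterns a where clause guards only the pattern it follows, so each pattern is given its own.

```swift
enum ObjectType: UInt8
{
    case unknown     = 0
    case storage     = 1
    case stream      = 2
    case rootStorage = 5
}

// Assumed value of CFBFFormat.DIRECTORY_ROOT_ENTRY_NAME.
let rootEntryName = "Root Entry"

func isValid(index: Int, name: String, type: ObjectType) -> Bool
{
    switch type
    {
        case .unknown where index != 0 && name.isEmpty:

            return true

        // Each pattern carries its own where clause.
        case .storage where index != 0 && !name.isEmpty,
             .stream  where index != 0 && !name.isEmpty:

            return true

        case .rootStorage where index == 0 && name == rootEntryName:

            return true

        default:

            return false
    }
}

let rootOK        = isValid(index: 0, name: "Root Entry", type: .rootStorage)
let storageAtZero = isValid(index: 0, name: "X", type: .storage)
let namelessStore = isValid(index: 3, name: "", type: .storage)
let streamOK      = isValid(index: 3, name: "Data", type: .stream)
```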

If all the entries are read successfully we can then call the DirectoryBuilder build method.

If successful, the build method returns a RootStorageEntry object, which can be used to find any storage object or stream object in the compound file.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.
