Just An Application

September 10, 2015

Swift Miscellany: Too Many Switches — The Case Of The Nested Enums

At the lexical level Swift code comprises five token types

  • identifiers

  • keywords

  • literals

  • operators

  • punctuation

seperated by whitespace, so a Swift token can be represented quite nicely by an enum.

    enum Token
    {
        case IDENTIFIER
        case KEYWORD
        case LITERAL
        case OPERATOR
        case PUNCTUATION
    }

The token type on its own is not use much use.

In each case we also need the associated value which, because its Swift, we can store in the enum as well.

    enum Token
    {
        case IDENTIFIER(Identifier)
        case KEYWORD(Keyword)
        case LITERAL(Literal)
        case OPERATOR(Operator)
        case PUNCTUATION(Punctuation)
    }

The types of the associated values are also best represented as enums, for example,

    enum Keyword: String
    {
        case AS             = "as"
        case ASSOCIATIVITY  = "associativity"
    
        case BREAK          = "break"
    
        case CASE           = "case"
        case CATCH          = "catch"
        case CLASS          = "class"
        case CONTINUE       = "continue"
        case CONVENIENCE    = "convenience"
    
        ...
    
        case WEAK           = "weak"
        case WHERE          = "where"
        case WHILE          = "while"
        case WILL_SET       = "willSet"
    }
    enum Punctuation: String
    {
        case AMPERSAND              = "&"
        case ARROW                  = "->"
        case AT                     = "@"
    
        case COLON                  = ":"
        case COMMA                  = ","
    
        case DOT                    = "."
    
        ...
        
        case LEFT_PAREN             = "("
        case RIGHT_PAREN            = ")"
    }

Now we have a 'nested enum' which is all very interesting but how do you use it ?

Parsing A Swift Pattern

This is the Swift grammar for a pattern

    patternwildcard-pattern type-annotationopt 
    patternidentifier-pattern type-annotationopt 
    patternvalue-binding-pattern 
    patterntuple-pattern type-annotationopt 
    patternenum-case-pattern 
    patternoptional-pattern 
    patterntype-casting-pattern 
    patternexpression-pattern 

and these are the grammars for the various types of pattern

    wildcard-pattern_ 
    identifier-patternidentifier 
    value-binding-patternvar pattern | let pattern 
    tuple-pattern( tuple-pattern-element-listopt ) 
    enum-case-patterntype-identifieropt . enum-case-name tuple-patternopt 
    optional-patternidentifier-pattern ?
    type-casting-patternis-pattern | as-pattern
    is-patternis type 
    as-patternpattern  as type 
    expression-patternexpression

A pattern can start with an indentifier, a keyword, punctuation, or anything an expression can start with which adds operators so we now have a full-house.

To parse a pattern given a token we need to identify the token type and then for each type identify whether it can start a pattern.

To identify the token type given the enum token representation above we can use a switch.

    func parse() throws -> Pattern
    {
        let pattern : Pattern
    
        switch peek()
        {
            case .IDENTIFIER:
    
                pattern = ????
    
            case .KEYWORD:
    
                pattern = ????
    
            case .PUNCTUATION:
    
                pattern = ????
    
            default:
    
                pattern = try expression()
        }
        return pattern
    }

That won't work as is because in the keyword or punctuation case we need to know what the actual keyword or punctuation is.

We can fix this by binding the associated values in those cases.

    func parse() throws -> Pattern
    {
        let pattern : Pattern
    
        switch peek()
        {
            case let .IDENTIFIER:
    
                pattern = ????
    
            case let .KEYWORD(keyword):
    
                pattern = ????
    
            case let .PUNCTUATION(punctuation):
    
                pattern = ????
    
            default:
    
                pattern = try expression()
        }
        return pattern
    }

Now we can see whether we have the 'right kind' of keyword or punctuation.

In case case we have another enum so we can use a switch.

    func parse() throws -> Pattern
    {
        let pattern : Pattern
    
        switch peek()
        {
            case .IDENTIFIER:
    
                pattern = ????
    
            case let .KEYWORD(keyword):
    
                switch keyword
                {
                    case .IS:
    
                        pattern = try isPattern()
    
                    case .LET:
    
                        pattern = try valueBindingPattern()
                        
                    case .UNDERSCORE:
                    
                        pattern = try wilcardPattern()
    
                    case .VAR:
    
                        pattern = try valueBindingPattern()
    
                    default:
    
                        pattern = try expression()
                }
    
            case let .PUNCTUATION(punctuation):
    
                switch punctuation
                {
                    case .DOT:
    
                        pattern = try enumCasePattern()
    
                    case .LEFT_PAREN:
    
                        pattern = try tuplePattern()
    
                    default:
    
                        pattern = try expression()
                }
    
            default:
    
                pattern = try expression()
        }
        return pattern
    }

Now we've got a bit of a mess really. Lets just hope they don't add any more pattern types. Maybe nested enums are not such a great idea after all.

In fact nested enums are a perfectly fine idea. The reason its a bit of a mess is the nested switches, but nested enums do not require nested switches. We, in this case I, have simply been doing it wrong.

A Closer Look At Patterns

Of the eight patterns in the grammar shown above, the switch in the last version of the method uses two

The case

    case .IDENTIFIER:

uses the simplest form of an enum-case-pattern

    . enum-case-name

The other two cases

    case let .KEYWORD(keyword):

and

    case let .PUNCTUATION(punctuation):

use the let form of a value-binding-pattern.

A value-binding-pattern is simply a pattern prefixed by either of the keywords let or var.

The prefixed pattern in each case is another example of an enum-case-pattern.

This time the optional tuple-pattern suffix element is present.

A tuple-pattern is a tuple-pattern-element-list delimited by parentheses.

This is the grammar for a tuple-pattern-element-list

    tuple-pattern-element-listtuple-pattern-element | tuple-pattern-element , tuple-pattern-element-list
    tuple-pattern-elementpattern

A tuple-pattern is a parentheses delimited list of comma separated patterns.

In the two cases above there is only one pattern in the list and that is an identifier-pattern.

According to the grammar there can be any number of patterns and the patterns are not limited to identifier-patterns.

At this point it is worth mentioning that the grammar as given in the Swift language reference is not always entirely accurate.

In the case of a value-binding-pattern it would imply that you could write this

    case let let .KEYWORD(keyword):

which you cannot.

Nor can you write this

    case let .KEYWORD(let keyword):

although you can write

    case .KEYWORD(let keyword):

In short, value-binding-patterns are not recursive whatever the grammar might say.

On the other hand, enum-case-patterns are recursive which means that you can write this

    case .KEYWORD(.IS):

which means that the method above can be rewritten more succinctly like so

    func parse() throws -> Pattern
    {
        let pattern : Pattern
    
        switch peek()
        {
            case .IDENTIFIER:
    
                pattern = try identifierOrEnum()
    
            case .KEYWORD(.IS):
    
                pattern = try isPattern()
    
            case .KEYWORD(.LET):
    
                pattern = try valueBindingPattern(.LET)
    
            case .KEYWORD(.UNDERSCORE):
    
                pattern = try wildcardPattern()
    
            case .KEYWORD(.VAR):
    
                pattern = try valueBindingPattern(.VAR)
    
            case .PUNCTUATION(.DOT):
    
                pattern = try enumCasePattern()
    
            case .PUNCTUATION(.LEFT_PAREN):
    
                pattern = try tuplePattern()
    
            default: // expression
    
                pattern = try expression()
        }
        if peek(.AS)
        {
            return try asPattern(pattern)
        }
        else
        {
            return pattern
        }
    }

There is no need for nested switches because we can use nested patterns instead.


Copyright (c) 2015 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog's author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Advertisements

September 9, 2015

Swift Miscellany: Keywords, Reserved And Escaped

There are a fair few keywords in Swift 2.0, I think we’re up to around about seventy-five at the moment, but it wasn’t until I started rummaging around in the grammar that I discovered that some of them are context senstive.

The context sensitive keywords are those which are used in only one specific context and most of the declaration modifiers

Infix Operator Declaration

  • associativity

  • left

  • none

  • precedence

  • right

are reserved keywords in the context of an infix-operator-declaration.

    infix-operator-declaration infix operator operator { infix-operator-attributesopt }
    infix-operator-attributes precedence-clauseopt associativity-clauseopt
    precedence-clause precedence precedence-level
    associativity-clause associativity associativity
    associativity left | right | none

Computed Properties/Variables

  • get

and

  • set

are reserved keywords in the context of a computed property/variable declaration.

    variable-declarationvariable-declaration-head variable-name type-annotation getter-setter-block
    getter-setter-block{ getter-clause setter-clauseopt }
    getter-setter-block{ setter-clause getter-clause }
    getter-clauseattributesopt get code-block
    setter-clauseattributesopt set setter-nameopt code-block

Property/Variable Observers

  • didSet

and

  • willSet

are keywords in the context of a stored variable observer declaration or a property observer declaration

    variable-declarationvariable-declaration-head variable-name initializer willSet-didSet-block
    variable-declarationvariable-declaration-head variable-name type-annotation initializeropt willSet-didSet-block
    willSet-didSet-block{ willSet-clause didSet-clauseopt }
    willSet-didSet-block{ didSet-clause willSet-clauseopt }
    willSet-clauseattributesopt willSet setter-nameopt code-block
    didSet-clauseattributesopt didSet setter-nameopt code-block

Metatype Type

  • Protocol

and

  • Type

are keywords in the context of a metatype-type

    metatype-typetype  . Type | type  . Protocol 

Declaration Modifiers

  • convenience

  • dynamic

  • final

  • infix

  • indirect

  • lazy

  • mutating

  • nonmutating

  • optional

  • override

  • postfix

  • prefix

  • required

  • unowned

  • weak

are reserved keywords when they appear before a declaration.

When not being used in their specific contexts reserved keywords can moonlight as identifiers which means that you can, if you want to, do this

    ...
    
    private var final    = true
    private var left     = true
    private var optional = true
    private var override = false
    private var required = true
    private var right    = false
    private var set      = true
    private var unowned  = true
    private var weak     = false
    
    ...

or this

    enum associativity
    {
        case left
        case right
        case none
    }

or even this

    func infix()
    {
    
    }

Escaped Keywords

If your favourite identifier is a non-reserved Swift keyword there is still hope.

Any Swift keyword can be used as an identifier if it is escaped, so if you have always wanted a class called class

    class `class`
    {
    
    }

or a global variable called self

    var `self` = `class`()

you can have one.

The escape character is the grave accent/backtick/backquote character (`) ASCII 96/0x60.


Copyright (c) 2015 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 21, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Seven — Stream And Storage Objects

We can model the combination of a set of sectors and the associated file allocation table as a SectorSpace.

    protocol SectorSpace
    {
        func data(index:SectorIndex) -> ByteData?
    }

A SectorSpace is an object capable of returning the contents of a stream given the index of the first sector in that ‘space’.

There are two possible SectorSpaces in a compound file. The first represents the sectors in the file itself in combination with the FAT. The second represents the sectors in the mini stream in combination with the mini FAT.

We can implements the first straight away. We have a SectorSource for the sectors in the file and the FAT.

For the second we have the mini FAT but we do not have the sectors stored in the mini stream.

The mini stream is an internal stream stored in sectors in the file itself, so it can be constructed using the first SectorSpace which represents those sectors and the FAT.

To construct the mini stream we need to know the starting sector and the size. These are stored in the directory entry for the root storage object.

We can define them as properties in the class RootStorageEntry

    let miniStreamStart : SectorIndex
    let miniStreamSize  : StreamSize

We can make the mini stream sector space like this

    private func makeMiniStreamSpace(
                     rootStorageEntry:
                         RootStorageEntry,
                     miniFAT:
                         FileAllocationTable,
                     fileSpace:
                         SectorSpace) -> SectorSpace?
    {
        if let data = fileSpace.data(rootStorageEntry.miniStreamStart)
        {
            return
                MiniStreamSpace(
                    data:
                        data,
                    size:
                        rootStorageEntry.miniStreamSize,
                    fat:
                        miniFAT,
                    sectorSize:
                        CFBFFormat.MINI_SECTOR_SIZE)
        }
        else
        {
            return nil
        }
    }

Now we have our two sector spaces we can implement a stream factory that can create a stream object given the index of its first sector and its size.

The size below which a stream object is stored in the mini stream is defined by the miniStreamCutoffSize field in the header. This and
the two sector spaces is all the stream factory needs.

    final class StreamFactory
    {
        init(fileSpace:SectorSpace, miniStreamSpace:SectorSpace, miniStreamCutoffSize:StreamSize)
        {
            self.fileSpace            = fileSpace
            self.miniStreamSpace      = miniStreamSpace
            self.miniStreamCutoffSize = miniStreamCutoffSize
        }
    
        //
    
        func makeStream(entry:StreamEntry) -> Stream?
        {
            let size = entry.streamSize
    
            if size > miniStreamCutoffSize
            {
                return Stream(size:size, start: entry.startingSector, space: fileSpace)
            }
            else
            {
                return Stream(size:size, start: entry.startingSector, space: miniStreamSpace)
            }
        }
    
        //
    
        private let fileSpace               : SectorSpace
        private let miniStreamSpace         : SectorSpace
        private let miniStreamCutoffSize    : StreamSize
    }

Once we have the stream factory we can define a class which implements a storage object.

All it needs is the StorageEntry which represents the storage object in the directory so it can find the stream and storage objects it contains, and the stream factory so that it can create stream objects as necessary,

    final class Storage
    {
        init(entry:StorageEntry, streamFactory:StreamFactory)
        {
            self.entry          = entry
            self.streamFactory  = streamFactory
            self.storageTable   = [String: Storage]()
            self.streamTable    = [String: Stream]()
        }
    
        //
    
        func getStream(var path:[String], name:String) -> Stream?
        {
            if path.count != 0
            {
                return getStorage(path.removeAtIndex(0))?.getStream(path, name: name)
            }
            else
            {
                return getStream(name)
            }
        }
    
        func getStorage(storageName:String) -> Storage?
        {
            var storage = storageTable[storageName]
    
            if storage != nil
            {
                return storage
            }
    
            let storageEntry = entry.getStorageEntry(storageName)
    
            if storageEntry == nil
            {
                return nil
            }
            storage = Storage(entry: storageEntry!, streamFactory: streamFactory)
            storageTable[storageName] = storage
            return storage
        }
    
        func getStream(streamName:String) -> Stream?
        {
            var stream = streamTable[streamName]
    
            if stream == nil
            {
                let streamEntry = entry.getStreamEntry(streamName)
    
                if streamEntry == nil
                {
                    return nil
                }
                stream = streamFactory.makeStream(streamEntry!)
                streamTable[streamName] = stream
            }
            return stream
        }
    
        //
    
        private let entry           : StorageEntry
        private let streamFactory   : StreamFactory
        //
        private var storageTable    : [String: Storage]
        private var streamTable     : [String: Stream]
    }

We can define a CompoundFile as a very simple wrapper around the Storage instance which represents the root storage object.

    final class CompoundFile
    {
        init(rootStorage:Storage)
        {
            self.rootStorage = rootStorage
        }
    
        //
    
        func getStream(#storage:[String], name:String) -> Stream?
        {
            return rootStorage.getStream(storage, name:name)
        }
    
        //
    
        private let rootStorage: Storage
    }

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 18, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Five — Where Is Everything ? The Much Smaller Sectors Edition

The size of the sectors in a compound file is a function of the version. In a version three compound file the sector size is 512 bytes. In a version four compound file the sector size is 4096 bytes.

If a compound file contains a large number of stream objects that are smaller than a sector, and/or whose last part only partially fills a sector than there can be a considerable amount of wasted space.

To help avoid this stream objects below a certain size may be stored as a series of much smaller 64 byte sectors instead.

These sectors are in turn stored as the contents of an internal stream called the ‘mini stream’. This stream has an associated file allocation table, the ‘mini FAT’.

The mini FAT like the FAT is stored in sectors. Unlike the FAT the sector index chain for the Mini FAT is stored in the FAT.

Having read the FAT we can now read the mini FAT.

The starting sector and the number of sectors are specified by the

    firstMiniFATSector

and

    nMiniFATSectors

fields in the header

We can read the mini FAT using readFAT since the structure of the mini FAT is identical to that of the FAT.

    private func readMiniFAT(
                     header:
                         FileHeader,
                     nEntriesPerSector:
                         Int,
                     fat:
                         FileAllocationTable,
                     sectors:
                         SectorSource) -> FileAllocationTable?
    {
        if let sequence = fat.sequence(header.firstMiniFATSector)
        {
            return
                readFAT(
                        header.nMiniFATSectors,
                    nEntriesPerSector:
                        nEntriesPerSector,
                    sequence:
                        sequence,
                    sectors:
                        sectors)
        }
        else
        {
            return nil
        }
    }

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 17, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Four — Where Is Everything ? Sectors Edition

Conceptually a compound file is a collection of

  • storage objects, and

  • stream objects.

arranged in a ‘tree’, plus internal streams which hold metadata.

A storage object is a named collection of stream objects and storage objects.

A stream object is a named sequence of bytes.

A compound file is considered to be a single storage object, the ‘root storage’ object, containing other storage objects and stream objects.

A storage object is a purely logical collection.

Storage objects do not exist as a separate entities unlike stream objects and stream objects ‘contained’ within a single storage object are not grouped together for example.

At the lowest level a compound file comprises a header and some number of fixed-length sectors.

Sectors are used to store the contents of stream objects and internal streams.

If a stream object or internal stream spans multiple sectors then those sectors may appear anywhere in the file in any order.

To access the contents of a stream object or internal stream it is necessary to know which sector contains the first part and which sectors contain the other parts and in what order.

Given the name of a stream object the starting sector can be determined using ‘the directory’ which is an internal stream.

Given the index of the sector which contains part of a stream object the index of the sector which contains the next part, if any, can be determined using the ‘file allocation table’ (FAT) which is another internal stream.

To access the contents of a given named stream object therefore it is first necessary to read the directory.

The directory is stored in one or more sectors so to read it, it is first necessary to have read the file allocation table to determine their whereabouts.

The file allocation table is also stored in one or more sectors. Fortunately we don’t have to have already read it in order to read it, because the sectors which comprise the file allocation table are specified by the ‘double-indirect file allocation table’ (DIFAT) which is another internal stream.

The first sector of the DIFAT is specified in the header and unlike all other sectors in a compound file the DIFAT sectors are chained together using information in the sectors themselves not the FAT.

In addition the first 109 entries in the DIFAT also appear in the header as the DIFAT field, which means that in some cases it may not be necessary to read the DIFAT sectors at all.

To read the file allocation table we must iterate over the entries in the DIFAT reading each of the specified sectors and concatenating the contents.

From this point on everything has to be accessed in terms of sectors so we can start by defining the SectorSource protocol.

    protocol SectorSource
    {
        var sectorSize : Int { get }
    
        func sector(index:SectorIndex) -> UnsafePointer<UInt8>?
    }

A SectorSource is an object capable of returning a sector given its index.

The SectorIndex type is defined like this

    typealias SectorIndex   = UInt32

The sectorSize property specifies the size of all sectors returned by a successful call to the sector method.

Given a SectorSource object for the sectors in the file and the sequence of sector indexes from the DIFAT we can read the FAT like this

     private func readFAT(
                      nSectors:
                          Int,
                      nEntriesPerSector:
                          Int,
                      sequence:
                          SectorIndexSequence,
                      sectors:
                          SectorSource) -> FileAllocationTable?
    {
        if nSectors == 1
        {
            var g = sequence.generate()
    
            if let sectorIndex = g.next()
            {
                return readSingleSectorFAT(sectorIndex, nEntries:nEntriesPerSector, sectors: sectors)
            }
            else
            {
                return nil
            }
        }
        else
        {
            return nil
        }
    }

It is especially easy in the case of this particular file since the entire FAT is contained in a single sector.

    private func readSingleSectorFAT(index:SectorIndex, nEntries:Int, sectors:SectorSource) -> FileAllocationTable?
    {
        if let sector = sectors.sector(index)
        {
            return SingleSectorFAT(bytes:sector, nEntries:nEntries)
        }
        else
        {
            return nil
        }
    }

The result is an object which implements the FileAllocationTable protocol.

    protocol FileAllocationTable
    {
        func next(index: SectorIndex) -> SectorIndex?
        
        func sequence(index:SectorIndex) -> SectorIndexSequence?
    }

Given a sector index the next method returns the index of the next sector as held in the file allocation table.

Given the index of the first sector of a stream object the sequence method will return the indices of all the sectors containing the stream object in order.

The SectorIndexSequence type is defined like this

    typealias SectorIndexSequence = SequenceOf<SectorIndex>

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 16, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Three — Now Read Your Header

The 512 byte header of a compound file can be represented as a Swift struct like this

    struct FlatFileHeader
    {
        let signature               : EightBytes
        let clsid                   : CLSID
        //
        let minor                   : UInt16
        let major                   : UInt16
        let byteOrder               : UInt16
        let sectorShift             : UInt16
        let miniSectorShift         : UInt16
        let reserved                : SixBytes
        let nDirSectors             : UInt32
        let nFATSectors             : UInt32
        let firstDirSector          : UInt32
        let xactionSig              : UInt32
        let miniStreamCutoffSize    : UInt32
        let firstMiniFATSector      : UInt32
        let nMiniFATSectors         : UInt32
        let firstDIFATSector        : UInt32
        let nDIFATSectors           : UInt32
        let difat                   : DIFAT
    }

It is effectively a straight transcription from the specification with four exceptions

In the specification the first field is defined as

    Header Signature (8 bytes): ... 

the second field is defined as

    Header CLSID (16 bytes): ... 

the eighth field is defined as

    Reserved (6 bytes): ... 

and the last field is defined as

    DIFAT (436 bytes): ... 

In all these cases the field could be represented as

    [UInt8]

but that fails to capture the exact size of each field, so we do this instead.

We represent the ‘Header Signature’ using the struct EightBytes which looks something like this

    struct EightBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
        let b6 : UInt8
        let b7 : UInt8
    }

We represent the ‘Header CLSID’ using the struct CLSID which looks something like this

    struct CLSID
    {
        let first   : EightBytes
        let second  : EightBytes
    }

We represent the ‘Reserved’ field using the struct SixBytes which looks something like this

    struct SixBytes
    {
        let b0 : UInt8
        let b1 : UInt8
        let b2 : UInt8
        let b3 : UInt8
        let b4 : UInt8
        let b5 : UInt8
    }

The DIFAT field is not really 436 bytes but 109 32-bit integers which we can represent using the struct DIFAT which looks something like this

    struct DIFAT
    {
        let i0  : UInt32
        let i1  : UInt32
    }

At the moment it only represents the first two values but it can be ‘extended’ if necessary.

The result of using this seemingly random combination of rather odd structures is that the struct FlatFileHeader is indeed ‘flat’ which is to say that are all its fields are value types. They are in fact all structs.

Bearing in mind that the compound file format is little endian and so is this computer, and if we assume the Swift compiler

  1. represents the values of the UInt<N> types by the exact number of bytes necessary when the value is contained in a struct

  2. represents the fields in exactly the same order that they were defined and wihout padding,

  3. that it does the same recursively with the nested struct values, and

  4. that it ensures that the memory allocated for the struct at runtime is at least 4 byte aligned

then, not at all accidentally, the representation of the struct in memory would be identical to the representation of the header in the compound file, and vice-versa.

It is the vice-versa case which is of interest since it would imply that if we had an NSData object containing at least the
first 512 bytes of a compound file then we could ‘read’ the header like this

    let flatHeader = UnsafePointer<FlatFileHeader>(data.bytes).memory

This is not necessarily the piece of insane optimism that it might at first appear.

Given the seamless interworking between Swift and Objective-C it would make a great deal of sense if at runtime a Swift struct meeting the right criteria was identical to the equivalent Objective-C struct.

Running this

    ...
    
    let data = NSData(contentsOfFile:fileName)
    
    if data == nil
    {
        return
    }
    
    let nBytes = data!.length
    
    if nBytes < CFBFFormat.HEADER_SIZE
    {
        return
    }
        
    let flatHeader = UnsafePointer<FlatFileHeader>(data!.bytes).memory
            
    print("Signature:\t\t\t")
    for i in 0 ..< 8
    {
        print("\(flatHeader.signature[i]) ")
    }
    println()
    println("Major:\t\t\t\t\(flatHeader.major)")
    println("Minor:\t\t\t\t\(flatHeader.minor)")
    println("Byte order:\t\t\t\(flatHeader.byteOrder)")
    println("Sector shift:\t\t\(flatHeader.sectorShift)")
    println("MiniSector shift:\t\(flatHeader.miniSectorShift)")
    println("N dir sectors:\t\t\(flatHeader.nDirSectors)")
    println("N FAT sectors:\t\t\(flatHeader.nFATSectors)")
    println()
            
    ...

prints this

    Signature:          208 207 17 224 161 177 26 225
    Major:              3
    Minor:              62
    Byte order:         65534
    Sector shift:       9
    MiniSector shift:   6
    N dir sectors:      0
    N FAT sectors:      1

The specification gives the signature bytes as

    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1

so we appear to have ‘read’ the header successfully.

Additional checks on the fields with predefined values.

Major version is 3 in which case the specification says the minor version should be 0x003E which it is.

Byte order should be 0xFFFE which it is.

The sector shift is correct, as is the minisector shift.

The number of directory sectors in a version 3 file is always 0 and it is

All done with nary a getUInt16 or a getUInt32 in sight.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 15, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part Two — Of Bytes And Pointers To Bytes

Whether we chose to access the contents of a file as a whole or in part, we will end up with an NSData object with some bytes in, so how do you get at the bytes ?

As before the answer is, in exactly the same way as you do in Objective-C.

So that would be like this then ?

    let bytes = data.bytes

It would.

The similarities between doing things in Swift and Objective-C extend to the type of the property bytes.

In Objective-C it is declared like this

    @property(readonly) const void *bytes

and in Swift like this

    var bytes: UnsafePointer<Void> { get }

So what we have got hold of is something with the type

    UnsafePointer<Void>

which is the equivalent of

    const void*

and it is about as useful, which is to say, not very.

If we do this

    let b = bytes[0]

then the compiler will helpfully volunteer the warning

    Constant 'b' inferred to have type 'Void' which may be unexpected

Possibly not unexpected, it is an ‘unsafe pointer’ to ‘Void’ after all, but definitely of limited utility.

An empty tuple, for that is what a ‘Void’ is of course, specialises in representing nothing, a job at which it excels. but it makes for a very unconvincing byte.

What we want is a

    UnsafePointer<UInt8>

in the same way that we would want a

    const uint8_t*

or something in Objective-C.

In Objective-C you do this to get one

    const uint8_t* bytes = (const uint8_t*)data.bytes;

and in Swift you do this

    let bytes = UnsafePointer<UInt8>(data.bytes)

Once you have one, you can access the byte to which it ‘points’ directly

    let b = bytes.memory

or by using a subscript

    let c = bytes[1]

Also, just as you can in Objective-C, you can ‘walk’ right off the end of the associated memory because it really is an ‘unsafe’ pointer.

    ...
    
    let data  = NSData()
    let bytes = UnsafePointer<UInt8>(data.bytes)
    
    for i in 0 ..< 16
    {
        println(bytes[i])
    }

    ...

at which point everything may come to a grinding halt, but then again it may not, it all depends.

The other way to get at the bytes in an NSData object is, needless to say, exactly the same as the other you would do it in
Objective-C, viz.

    let bytes = UnsafeMutablePointer<UInt8>.alloc(length)

    data.getBytes(bytes, length:data.length)

If you are not happy walking off the end of other people’s memory and prefer walking off the end of your own, this is the option for you.

The memory returned by the call to alloc is not managed and must be explicitly freed by a call to dealloc.

    bytes.dealloc(length)

The allocated memory is also not initialized to anything in particular and especially not to zero.

In Swift an

    UnsafeMutablePointer<T>

is to an

    UnsafePointer<T>

as, in Objective-C,

    T*

is to

    const T*

so you can modify the memory an UnsafeMutablePointer<T> ‘points at’ directly

    bytes.memory = UInt8(length)

or using a subscript

    bytes[8] = UInt8(length)

As well as the subscript functions UnsafePointer<T> and UnsafeMutablePointer<T> types support a variety of operators.

For example

    let bytes  = UnsafeMutablePointer<UInt8>.alloc(16)
    
    for i in 0 ..< 16
    {
        bytes[i] = UInt8(i)
    }
        
    var p = bytes
        
    for i in 0 ..< 8
    {
        println(p.memory)
        p += 2
    }
        
    let end = bytes + 8
        
    p = bytes
        
    while p < end
    {
        println(p++.memory)
    }
        
    p = bytes + 7
        
    while p >= bytes
    {
        println(p--.memory)
    }

The ‘mutating’ operators pre/post decrement/increment etc. can only be used if the ‘pointer’ is referenced from a mutable variable.

You can create an UnsafePointer<T> from an UnsafeMutablePointer<T>

    let bytes     = UnsafeMutablePointer<UInt8>.alloc(16)
    let immutable = UnsafePointer<UInt8>(bytes)

and even vice-versa

    let mutable   = UnsafeMutablePointer<UInt8>(data.bytes)

which is a bit worrying but then the clue is in the name. UnsafePointer<T>s and UnsafeMutablePointer<T>s, are ‘unsafe’.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 13, 2014

Swift vs. The Compound File Binary File Format (aka OLE/COM): Part One — First Open Your File

Supplies of booby trapped PDFs masquerading as invoices seem to have been exhausted. The same is true of .EXE’s concealed in ZIPs.

Fortunately an enterprising prospector somewhere has struck a rich seam of ‘word’ documents of dubious provenance which are now being made freely available to me and doubtless to many other lucky people as well.

Not being able to open my document in the approved manner I am unable to experience at first hand the planned ‘surprise’, but I don’t see why I should miss all the fun so I thought I would open my document in an unapproved manner by pulling it to bits programatically to see if I could find out what it was intended to do, although I could probably guess, and more interestingly, how.

In the case of PDFs the surprise involved Javascript and a buffer overflow. This time I though it was more likely to involve macros so I wondered if some Visual Basic code might put in an appearance at some point.

After a bit of rummaging around I came up with the specification for the likely format of my ‘word’ document.

Strangely the specification is a PDF.

According to the first paragraph of the specification the format has no less than three names

This document specifies a new structure called the Microsoft Compound File Binary (CFB) file format, also known as the Object Linking and Embedding (OLE) or Component Object Model (COM) structured storage compound file implementation binary file format. This structure name can be shortened to compound file.

or is that two names ?

Anyway in what follows I will follow the suggestion that the name can be shortened to compound file although I do reserve the right to put it in ‘quotes’ on occasion.

I was vaguely aware that whatever it was called it was some kind of container format but I was somewhat taken aback do discover exactly how ‘baroque’ it was. Still, in for a penny, in for a pound.

The choice of Swift for the task in hand was based on the fact that I wanted to do something real-worldish with it as a break from my ongoing forays into the low-level implementation details.

Using Xcode to create a ‘Command Line Tool’ with the language set to Swift results in a file called

    main.swift

which contains this

    import Foundation
    
    println("Hello, World!")

which is all very canonical and stuff, but where is main and where are the arguments ?

In a dangerous break with tradition a Swift command line program does not have to have a main function. This is clearly ‘not right’.

Not being irredeemably iconoclastic however it does enable access to the command line arguments via the variables

    C_ARGC

and

    C_ARGV

As is apparent from the names these correspond to the the

    argc

and

    argv

arguments passed to the standard C/C++ main function.

Using these we can set the world to rights by building ourselves an argument list and handing it to our very own main function.

    let argc          = Int(C_ARGC)
    var argp          = C_ARGV
    var argv:[String] = []
    
    for i in 0 ..< argc
    {
        if let s = String(UTF8String: argp.memory)
        {
            argv.append(s)
        }
        else
        {
            break
        }
        ++argp
    }
        
    func main(args: [String])
    {
        ...
    }
        
    if argv.count == argc
    {
        main(argv)
    }

Now how do you open a file in Swift ?

It is very easy to forget when starting to use Swift that if you know how to do something in Objective-C you will nearly always know how to do it in Swift.

If you are happy to work with the contents of the file as one lump you can do this

    let data = NSData(contentsOfFile:fileName)

This is simply the Swift equivalent of this

    NSData* data = [NSData dataWithContentsOfFile:fileName];.

in Objective-C

Alternatively you can do this and then read the contents as required

    let fh = NSFileHandle(forReadingAtPath:fileName)

This is the Swift equivalent of doing this

     NSFileHandle* fh = [NSFileHandle fileHandleForReadingAtPath:fileName];

in Objective-C

A general rule of thumb is that if it is an Objective-C class factory method it will be a Swift constructor with a keyword argument that results from chopping off the Objective-C class name from the front of the method name, except when it isn't as in the first example when the 'with' goes missing as well.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog's author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 3, 2014

So Swift Then: The Curried Function Type Encoding Mystery Concluded

While currying may be a theoretically elegant technique for thinking about and solving certain problems, when it comes to using it to write software it may not be very practical

Calling N one argument functions in a row, with each function allocating a closure, is going to be an expensive undertaking in terms of speed and memory, and it is going to be a lot slower than one function call of N arguments that does not allocate any closures.

If the sequence of function calls is complete, that is, they are being used to obtain the result rather than an intermediate closure then the allocated closures are one-shot. They are going to be deallocated straight away. This cost is incurred every time the result is obtained using all N function calls.

Of course the compiler could attempt some kind of special case optimization when the curried function is called. It could attempt to turn the N function calls back into one function call of N arguments but in the general case, that is, when the curried function is not guaranteed to be in the same module as the caller it is a bit stuck, unless it wants to decompile the object code.

Alternatively it could de-curry the curried function when it compiled it which is of course what it is doing in the case of xper.plusN

It is turning this

    func plusN(n: Int)(i: Int) -> Int
    {
        return i + n
    }

into this

    func 'plusN[decurried]'(n: Int, #i: Int) -> Int
    {
        return i + n
    }

For this to work the compiler must do two additional things when the curried function xper.plusN is called.

In the the function cod

    import xper
    
    func cod(x: Int, y:Int) -> Int
    {
        return xper.plusN(x)(i:y) // ignore the 'external name' its not really there
    }

it must convert the repeated single argument function call sequence to a standard function call.

In the function saithe

    import xper
    
    func saithe(x: Int, y:Int) -> Int
    {
        let plusX = xper.plusN(x)
    
        return plusX(i:y) // ignore the 'external name' its not really there
    }

it must generate the functions necessary to make it appear that the curried function still exists.

This explains what we see when cod and saithe are compiled.

In the first case ‘plusN[decurried]’ can be called directly.

In the second case the compiler needs to generate the equivalent of this

    func 'xper.plusN'(n: Int) -> (i: Int) -> Int
    {
        return { (i : Int) in return 'plusN[decurried]'(n, i: i) }
    }

To rewrite a call to a curried function and generate the local functions if necessary, the compiler must be able to identify when a public function in a module is a curried function, but that is not what the symbol associated with the object code is for.

We can demonstrate this by compiling the function cod using the module information in

  xper.swiftmodule

associated with the curried function version of plusN, and the library

  libxper.dylib

obtained by compiling its vanilla function equivalent, in which case this happens.

    $ swiftc -I. -lxper -L. -module-name caller -emit-library call_plusN.swift
    Undefined symbols for architecture x86_64:
      "__TF4xper5plusNfSiFT1iSi_Si", referenced from:
         __TF6caller3codFTSiSi_Si in call_plusN-7db2ff.o
    ld: symbol(s) not found for architecture x86_64
    <unknown>:0: error: linker command failed with exit code 1 (use -v to see invocation)

The compiler has used the module information that identifies plusN as a curried function and compiled accordingly.

It has specified that the object code of the caller needs to be linked against the object code identified by the symbol

    __TF4xper5plusNfSiFT1iSi_Si

and left it to the linker to do whatever it is that linkers do.

In this case there is an undefined symbol for some object code so the linker goes and looks for the piece of object code for which that symbol is defined.

What it does if it finds it depends on where the object code is found.

What it does if doesn’t find it is stop.

In this case the only other object code is in

  libxper.dylib

and because the version it is has been given to link against does not contain the compiled code for the curried function it cannot find a matching defined symbol so it stops.

To match undefined and defined symbols does not require the linker to understand their format or even to be aware that they have a format.

The linker likes to match symbols and the only thing it asks is that they be unique.

If the compiler is prepared to guarantee that no two functions with the same name and the same type can exist at the same time in the same module, then encoding a function’s name and type into the symbol for that function is a guaranteed way to ensure its uniqueness.

In the case of a curried function the compiler needs to generate a symbol which can be used to identify the de-curried function which is the result of the compilation.

Like all symbols used for linking it must be unique.

Given that the compiler is treating curried functions as a special case the most straight forward solution is for it to use a distinct ‘curried function type’ encoding which reflects this to generate the symbols for them.

Hence the lower-case ‘f’.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 1, 2014

So Swift Then: The Curried Function Type Encoding Mystery Continued

Compiling the curried function

    public func plusN(n: Int)(i: Int) -> Int
    {
        return i + n
    }

on its own in the module xper gives us the symbol

    __TF4xper5plusNfSiFT1iSi_Si

All the examples of mangled names of vanilla functions we have seen to date have been in the format.

    "__TF" 'function name' "F" 'parameter encoding' 'return type encoding' 'suffix'?

The format of the symbol for the curried function is identical to this except that there is a lower-case ‘f’ rather than an upper case ‘F’

We have been assuming that the upper case ‘F’ immediately following the function name in the symbol for a vanilla function is the prefix for a ‘function type’.

If this is true than the lower case ‘f’ is presumably the prefix for a different kind of function type.

The obvious candidate is the ‘curried function type’ but why is it necessary to distinguish between a ‘function type’ and a ‘curried function type’ in the mangled name of the function ?

The compiler does not distinguish between a curried function and the vanilla version of the curried function at compile time but it does ensure that it is possible to identify which one was compiled to produce the object code.

This implies that a caller would have to be linked against either one or the other despite calls to both functions being identical.

Here is a function which calls plusN and then immediately calls the function that was returned

    import xper
    
    func cod(x: Int, y:Int) -> Int
    {
        return xper.plusN(x)(i:y) // ignore the 'external name' its not really there
    }

This call works whether it is the curried function or the vanilla function equivalent that is being called.

Here is a function that does the same thing as the function cod with an intermediate step just to make it clear what is going
on.

    import xper
    
    func saithe(x: Int, y:Int) -> Int
    {
        let plusX = xper.plusN(x)

        return plusX(i:y) // ignore the 'external name' its not really there
    }

This call also works whether it is the curried function or the vanilla function equivalent that is being called.

This is not surprising since they are obviously both doing the same thing, that is

  • calling the function xper.plusN with the argument x

  • calling the function returned from the call to xper.plusN with the argument y

In saithe the function returned from the call to xper.plusN is explicitly referenced but the only thing done with it is to call it. It does not ‘escape’ from the function at any point so the two functions are effectively identical.

If we compile the function cod on its own in the module caller linking against the module xper containing the curried function plusN we get the symbol

    __TF6caller3codFTSiSi_Si

for the function itself, and the undefined symbol

  __TF4xper5plusNfSiFT1iSi_Si

which is the symbol for the curried function xper.plusN.

If we compile the function saithe on its own in the module caller linking against the module xper containing the curried function plusN then, as before, we get the symbol

    __TF6caller6saitheFTSiSi_Si

for the function itself, the undefined symbol

  __TF4xper5plusNfSiFT1iSi_Si

which is the symbol for the curried function xper.plusN, as before, and quite unexpectedly the symbols for two local
functions.

    __TF4xper5plusNFSiFT1iSi_Si

and

    __TPA__TF4xper5plusNfSiFT1iSi_Si

Who ordered those ?

The function type encoded in the first symbol is

    (Int) -> (i: Int) -> Int

which looks strangely familiar.

The only place in the function saithe that it could be called is here.

    let plusX = xper.plusN(x)

If we assume that the compiler knows what it is doing then that must be what it is for.

The symbol for the second function

    __TPA__TF4xper5plusNfSiFT1iSi_Si

looks like the symbol for the curried function xper.plusN with the prefix

    __TPA

We have never seen anything like this before so it is a bit difficult to say what this kind of function might be for.

Continuing with the hypothesis that the compiler knows what it is doing we have to assume that it needs this function for something, and given that is has already generated a function to return a function then presumably this is the function that gets returned.

If this is true then the reading of the function type in the mangled name has to be something other than the type apparently encoded.

The function type needs to be

    (i: Int) -> Int

and that is not what is encoded according to the vanilla function type reading.

If the compiler really has had to generate a function to do this

    let plusX = xper.plusN(x)

then the implication is that the function plusN in the module xper cannot be called at that point.

It must be possible to call it since the first example works and the compiler did not need to generate any additional functions to make it work.

If the function plusN in the module xper cannot be called in saithe but it can be called in cod then we can infer both

  • what its function type is not, and

  • what it must be

which explains both why the generated local functions are needed and what they do.

It also confirms that the reading of ‘curried function’ types is not the same as the reading of ‘vanilla function’ types when they appear in mangled function names.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Older Posts »

Blog at WordPress.com.

%d bloggers like this: