Just An Application

August 25, 2015

Swift Miscellany: Parsing With Generics And Protocols

The pseudo-code to parse a parenthesis delimited comma separated list of homogeneous elements, such as this one

    tuple-pattern : '(' tuple-pattern-element-list? ')'
    
    tuple-pattern-element-list : tuple-pattern-element
                               | tuple-pattern-element ',' tuple-pattern-element-list

or this

    parenthesized-expression : '(' expression-element-list? ')'
    
    expression-element-list : expression-element
                            | expression-element ',' expression-element-list

looks something like this

    ...
    
    if next token != '('
    {
        error
    }
    
    initialize element list
    
    while peek token != ')'
    {
        element = try to parse element
        
        add element to element list
    }
    if next token != ')'
    {
        really bad error
    }
    return element list
    
    ...

The differences in each case will be the types of ‘element’ and ‘element list’ and exactly what is involved in parsing the element.

The number of times you get to write the actual code in practice will depend to a large extent on the programming language you are using.

Since Swift supports generics we should be able to get away with writing the actual code just once.

We can define a function with a signature something like this

    func list<T>(tokens:TokenStream, 'something to parse the element of type T') -> [T]?

One choice for the ‘something to parse the element of type T’ is another function.

The function is passed the tokens and it returns an instance of type T or nil if there was an error which gives us the signature

    (tokens:TokenStream) -> T?

which in turn gives us the function

    func list(tokens:TokenStream, parser:(tokens:TokenStream) -> T?)  -> [T]?
    {
        if !next(tokens, .LEFT_PAREN)
        {
            return nil
        }
    
        var elements = [T]()
    
        while !peek(tokens, .RIGHT_PAREN)
        {
            let element = parser(tokens:tokens)
    
            if element == nil
            {
                return nil
            }
    
            elements.append(element!)
    
            if !peek(tokens, .COMMA)
            {
                break
            }
            assertNext(tokens, .COMMA)
        }
        assertNext(tokens, .RIGHT_PAREN)
        return elements
    }

In Swift 2.0 we can make things simpler by dispensing with the returning nil on error convention.

    func list(tokens:TokenStream, parser:(tokens:TokenStream) throws -> T) throws  -> [T]
    {
        try next(tokens, .LEFT_PAREN)
    
        var elements = [T]()
    
        while !peek(tokens, .RIGHT_PAREN)
        {
            elements.append(try parser(tokens:tokens))
    
            if !peek(tokens, .COMMA)
            {
                break
            }
            assertNext(tokens, .COMMA)
        }
        assertNext(tokens, .RIGHT_PAREN)
        return elements
    }

Taking A Slightly More Object-oriented Approach

For practical or ideological reasons we might want to use an ‘object’ or ‘objects’ to do the parsing rather than functions.

In this case we might want to pass an instance of a type capable of doing the parsing of the element rather than using one these old-fangled closure thingys.

For the method, as it now is, to remain generic one possibility would be to pass an instance of a generic class in which case the function signature would look something like this

    func list<T>(parser:GenericListElementParser<T>) throws -> [T]

This is pretty straightforward but not that flexible.

The original function could take any function, method, or closure with the right signature, but in this case any object passed would either have to an instance of the generic class itself or a sub-class of it.

It might be better if we could define a protocol which the object must implement but there seems to be a problem.

    protocol ListElementParser
    {
        func parse() throws -> ????
    }

How do we define the return type of the method the object must implement ?

Unlike classes, enums and structs, protocol definitions cannot have a generic argument clause so how do we define the return type ?

The answer is to define an ‘associated type’ that can then be referenced in subsequent declarations in the protocol

    protocol ListElementParser
    {
        typealias ElementType
        
        func parse() throws -> ElementType
    }

For each type that implements the protocol ListElementParser, for example

    final class ExpressionElementParser: ListElementParser
    {
        func parse() throws -> ExpressionElement
        {
            ...
        }
    }

the concrete type corresponding to ElementType is inferred from the declaration of the parse method.

The function signature now has to be generic in the argument type and the return type but what does the constraint look like ?

    func list<????>(parser:P) throws -> T

It has to include both P and T, and we have to constrain the type P to implement the ListElementParser.

The constraint on the the type P can be expressed like this

    P:ListElementParser

so how about

    func list<P:ListElementParser, T>(p:P) throws -> [T]
    {
        try next(.LEFT_PAREN)
    
        var elements = [T]()
    
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try p.parse())
    
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

Nope, the compiler is having none of it.

It states, not unreasonably

    Cannot invoke 'append' with an argument list of type '(P.ElementType)'

Just plonking the T in the generic argument list tells the compiler nothing.

We have to relate the type T to the type ElementType being returned from the call to the parse method.

Fortunately there is a place in the generic parameter list where we can put requirements like this.

The requirement we have is that

the type ElementType of the ListElementParser type is the same as the type T

which in generic parameter clause requirements speak can be expressed very simply

    P.ElementType == T

We can add this to the generic parameter clause in the ‘requirements section’ and the compiler is happy.

    func list<P:ListElementParser, T where P.ElementType == T>(p:P) throws -> [T]
    {
        try next(.LEFT_PAREN)
    
        var elements = [T]()
    
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try p.parse())
    
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

An Incremental Improvement

To use the function we need to first construct an instance of the parser which is a pain if the only reason for constructing it is to pass it to the function.

Why can’t the function construct the parser itself ?

In Swift it can.

We need to change the function signature to take a type which implements ListElementParser.

    func list<P:ListElementParser, T where P.ElementType == T>(pType:P.Type) throws -> [T]

and we need to specify that the type implements the necessary constructor.

    protocol ListElementParser
    {
        typealias ElementType
    
        init(context:ParserContext, tokens:TokenStream)
    
        func parse() throws -> ElementType
    }

To construct an instance we simply do

    let parser = pType.init(context:context, tokens:tokens)

The function now looks like this

    func list<P:ListElementParser, T where P.ElementType == T>(pType:P.Type) throws -> [T]
    {
        try next(.LEFT_PAREN)
    
        let parser   = pType.init(context:context, tokens:tokens)
        var elements = [T]()
    
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try parser.parse())
    
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

To invoke the method we now pass the type of the parser

    try list(ExpressionElementParser.self)

Another Incremental Improvement

Now it is no longer necessary to contruct an instance of the parser for type T but it is still necessary to know what the type of parser is.

It would be simpler if it was possible to simply pass the type of T and decouple the caller from the details of how T‘s get parsed.

We can also do this.

To do so we need another protocol.

    protocol ListElement
    {
        typealias ParserType: ListElementParser
    
        static var parserType : ParserType.Type { get }
    }

and the method now looks like this

    func list<T:ListElement where T.ParserType.ElementType == T>(tType:T.Type) throws -> [T]
    {
        try next(.LEFT_PAREN)
            
        let parser   = tType.parserType.init(context:context, tokens:tokens)
        var elements = [T]()
            
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try parser.parse())
            
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

For this to work in the ExpressionElementParser case the ExpressionElement type has to implement the ListElement protocol.

    enum ExpressionElement: ListElement
    {
        static let parserType = ExpressionElementParser.self
    
        //
    
        case EXPRESSION(Expression)
        case NAMED_EXPRESSION(String, Expression)
    }

Now we can simply pass the type of the elements in the list.

    try list(ExpressionElement.self)

Via the extension mechanism this approach can be made to work with any class, enum, or struct type.

Defining

    final class StringListElementParser: ListElementParser
    {
        init(context:ParserContext, tokens:TokenStream)
        {
            // ...
        }
    
        func parse() throws -> String
        {
            // ...
        }
    }

and

    extension String: ListElement
    {
        static let parserType = StringListElementParser.self
    }

means we can do

    try list(String.self)

Copyright (c) 2015 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Advertisements

Create a free website or blog at WordPress.com.

%d bloggers like this: