Just An Application

September 9, 2015

Swift Miscellany: More Parsing With Generics And Protocols

Having stumbled upon my new, at least to me, idiom I was curious whether it could be used as the basis for a complete parser.

The answer seems to be yes.

With a handful of protocols and generic methods it is possible to write a basic parser for Swift which is at least capable of parsing itself.

Protocols

Parsable

    protocol Parsable
    {
        typealias ParserType: ElementParser

        static var parserType: ParserType.Type { get }
    }
    protocol ElementParser
    {
        typealias ElementType
    
        init(context:ParserContext, tokens:TokenStream)
    
        func parse() throws -> ElementType
    }

A type is parsable if it has an associated ElementParser type

Element

    protocol Element: Parsable
    {
    }

An Element is a non-terminal which appears exactly once in a grammar production.

A condition-clause is an example of an Element.

It appears exactly once, for example, in a guard-statement

    guard-statement -> guard condition-clause else code-block

or a while-statement

    while-statement -> while condition-clause code-block

OptionalElement

    protocol OptionalElement
    {
        typealias ParserType: OptionalElementParser
    
        static var parserType: ParserType.Type { get }
    }
    protocol OptionalElementParser: ElementParser
    {
        func elementIsPresent() -> Bool
    }

An OptionalElement is a non-terminal which may appear exactly once or not at all in a grammar production.

A generic-parameter-clause

    generic-parameter-clause< generic-parameter-list requirement-clauseopt >

and a type-inheritance-clause

    type-inheritance-clause: class-requirement , type-inheritance-list
    
    type-inheritance-clause: class-requirement
    
    type-inheritance-clause: type-inheritance-list

are both examples of an OptionalElement.

They may both appear, or not appear, in a class-declaration

    class-declarationattributesopt access-level-modifieropt class class-name generic-parameter-clauseopt type-inheritance-clauseopt class-body

an enum-declaration

    enum-declarationattributesopt access-level-modifieropt union-style-enum
    enum-declarationattributesopt access-level-modifieropt raw-value-style-enum
    union-style-enumindirectopt enum enum-name generic-parameter-clauseopt type-inheritance-clauseopt { union-style-enum-membersopt }
    raw-value-style-enumenum enum-name generic-parameter-clauseopt type-inheritance-clauseopt { raw-value-style-enum-members }

or a struct-declaration

    struct-declarationattributesopt access-level-modifieropt struct class-name generic-parameter-clauseopt type-inheritance-clauseopt struct-body

A generic-parameter-clause may also appear, or not appear, in a function-declaration

    function-declarationfunction-head function-name generic-parameter-clauseopt function-signature function-bodyopt 

or an initializer declaration

    initializer-declarationinitializer-head generic-parameter-clauseopt parameter-clause throwsopt initializer-body 
    initializer-declarationinitializer-head generic-parameter-clauseopt parameter-clause rethrows initializer-body 

A type-inheritance-clause may also appear in an extension-declaration

    extension-declarationaccess-level-modifieropt extension type-identifier type-inheritance-clauseopt extension-body 

A type-annotation is a third example of an OptionalElement

    type-annotation: attributesopt type 

A type-annotation is optional in a constant declaration or a variable declaration, but it is required in a parameter, so the type TypeAnnotation is declared as both an Element and an OptionalElement.

    extension TypeAnnotation: Element, OptionalElement
    {
        static let parserType = TypeAnnotationParser.self
    }

RepeatableElement

    protocol RepeatableElement: Parsable
    {
    }

A RepeatableElement is a non-terminal which can appear one or more times in a grammar production. If it appears more than once then there is a separator between each occurence.

A condition is an example of a RepeatableElement.

    conditionavailability-condition | case-condition | optional-binding-condition 

It can appear one or more times in a condition-list.

    condition-listcondition | condition , condition-list

If there are multiple occurences then they are separated by commas

BlockElement

    protocol BlockElement: Parsable
    {
    }

A BlockElement is a non-terminal that can appear zero or more times in a grammer production, delimited by left and right braces and with no separators.

A statement is a BlockElement.

Zero or more statements can appear in a code-block

    statementsstatement statementsopt
    code-block{ statementsopt }

A declaration is also a BlockElement.

Zero or more declarations can appear in the body of a class or struct declaration

    declarationsdeclaration declarationsopt
    class-body{ declarationsopt }
    struct-body{ declarationsopt }

Both declarations and statements are also Elements since they can appear outside blocks.

Statements, for example, can appear in a switch-case

Declarations can appear anywhere in an enum-declaration.

The Declaration and Statement types are therefore declared to implement both the BlockElement and the Element protocols.

    extension Declaration: BlockElement, Element
    {
        static let parserType = DeclarationParser.self
    }
    extension Statement: BlockElement, Element
    {
        static let parserType = StatementParser.self
    }

Generic Methods

The following methods are defined on the class Parser which is the base class for the various element parsers.

element<E:Element … >

    final func element<E:Element where E.ParserType.ElementType == E>(eType:E.Type) throws -> E
    {
        let parser = eType.parserType.init(context:context, tokens: tokens)
    
        return try parser.parse()
    }

The element method takes a type which implements the Element protocol and returns an instance of that type.

optional<O:OptionalElement … >

    final func optional<O:OptionalElement where O.ParserType.ElementType == O>(oType:O.Type) throws -> O?
    {
        let parser = oType.parserType.init(context:context, tokens: tokens)
    
        if parser.elementIsPresent()
        {
            return try parser.parse()
        }
        else
        {
            return nil
        }
    }

The optional method takes a type which implements the OptionalElement protocol and returns an instance of that type or nil if the element is not present.

elements<R:RepeatableElement …>

     final func elements
        <
            R:RepeatableElement where R.ParserType.ElementType == R
        >
        (rType:R.Type, _ separator:Punctuation = .COMMA) throws -> [R]
        {
            let parser   = rType.parserType.init(context:context, tokens: tokens)
            var elements = [try parser.parse()]
        
            while peek(separator)
            {
                assertNext(separator)
                elements.append(try parser.parse())
            }
            return elements
        }

The elements method takes a type which implements the RepeatableElement protocol and returns an array of one or more instances of that type.

The method takes a second argument which specifies the separator.

By default the separator is defined to be a comma, but specifying a dot instead means it is also possible to use the method to parse, for example, an import-path

    import-pathimport-path-identifier | import-path-identifier . import-path 

or a type-identifier

    type-identifiertype-name generic-argument-clauseopt| type-name generic-argument-clauseopt. type-identifier 

elements<O:OptionalElement …>

    final func elements<O:OptionalElement where O.ParserType.ElementType == O>(oType:O.Type) throws -> [O]?
    {
        let parser = oType.parserType.init(context:context, tokens: tokens)
    
        if parser.elementIsPresent()
        {
            var elements = [O]()
    
            while parser.elementIsPresent()
            {
                elements.append(try parser.parse())
            }
            return elements
        }
        else
        {
            return nil
        }
    }

The elements method takes a type which implements the OptionalElement protocol and returns an array of one or more instances of that type or nil.

block<B:BlockElement …>

    final func block<B:BlockElement where B.ParserType.ElementType == B>(bType:B.Type) throws -> [B]
    {
        try next(.LEFT_BRACE)
    
        let parser   = bType.parserType.init(context:context, tokens: tokens)
        var elements = [B]()
    
        while !peek(.RIGHT_BRACE)
        {
            elements.append(try parser.parse())
        }
        assertNext(.RIGHT_BRACE)
        return elements
    }

The block method takes a type which implements the BlockElement protocol and returns an array of zero or more instances of that type or nil.

Examples

Class Declaration

    final class ClassDeclarationParser: Parser
    {
        func parse() throws -> Class
        {
            assertNext(.CLASS)
    
            let name  = try identifier()
            let gpc   = try optional(GenericParameterClause.self)
            let tic   = try optional(TypeInheritanceClause.self)
    
            pushContext(DeclarationContext())
    
            let decls = try block(Declaration.self)
    
            popContext()
    
            if gpc == nil
            {
                return .CLASS(name, tic, decls)
            }
            else
            {
                return .GENERIC_CLASS(name, gpc!, tic, decls)
            }
        }
    }

Extension Declaration

    final class ExtensionDeclarationParser: Parser
    {
        func parse() throws -> Extension
        {
            assertNext(.EXTENSION)
    
            let typeIdentifier  = try element(TypeIdentifier.self)
            let typeInheritance = try optional(TypeInheritanceClause.self)
            let body            = try block(Declaration.self)
    
            return
                Extension(
                    typeId:
                        typeIdentifier,
                    typeInheritance:
                        typeInheritance,
                    declarations:
                        body)
        }
    }

This method would be even simpler if Swift guarantees that arguments are evaluated left-to-right but I’ve not been able to find anything to that effect as yet.

Guard Statement

    final class GuardStatementParser: Parser
    {
        func parse() throws -> Guard
        {
            assertNext(.GUARD)
    
            let condition = try element(ConditionClause.self)
    
            try next(.ELSE)
    
            let codeBlock = try block(Statement.self)
    
            return Guard(condition: condition, codeBlock:codeBlock)
        }
    }

While Statement

    final class WhileStatementParser: Parser
    {
        func parse() throws -> While
        {
            assertNext(.WHILE)
    
            let condition = try element(ConditionClause.self)
            let codeBlock = try block(Statement.self)
    
            return While(condition:condition, codeBlock:codeBlock)
        }
    }

Type Annotation

    final class TypeAnnotationParser: Parser, OptionalElementParser
    {
        func elementIsPresent() -> Bool
        {
            return peek(.COLON)
        }
    
        func parse() throws -> TypeAnnotation
        {
            assertNext(.COLON)
    
            let attributes = try elements(Attribute.self)
            let type       = try element(Type.self)
    
            if attributes == nil
            {
                return .TYPE(type)
            }
            else
            {
                return .ATTRIBUTED_TYPE(attributes!, type)
            }
        }
    }

Copyright (c) 2015 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

August 25, 2015

Swift Miscellany: Parsing With Generics And Protocols

The pseudo-code to parse a parenthesis delimited comma separated list of homogeneous elements, such as this one

    tuple-pattern : '(' tuple-pattern-element-list? ')'
    
    tuple-pattern-element-list : tuple-pattern-element
                               | tuple-pattern-element ',' tuple-pattern-element-list

or this

    parenthesized-expression : '(' expression-element-list? ')'
    
    expression-element-list : expression-element
                            | expression-element ',' expression-element-list

looks something like this

    ...
    
    if next token != '('
    {
        error
    }
    
    initialize element list
    
    while peek token != ')'
    {
        element = try to parse element
        
        add element to element list
    }
    if next token != ')'
    {
        really bad error
    }
    return element list
    
    ...

The differences in each case will be the types of ‘element’ and ‘element list’ and exactly what is involved in parsing the element.

The number of times you get to write the actual code in practice will depend to a large extent on the programming language you are using.

Since Swift supports generics we should be able to get away with writing the actual code just once.

We can define a function with a signature something like this

    func list<T>(tokens:TokenStream, 'something to parse the element of type T') -> [T]?

One choice for the ‘something to parse the element of type T’ is another function.

The function is passed the tokens and it returns an instance of type T or nil if there was an error which gives us the signature

    (tokens:TokenStream) -> T?

which in turn gives us the function

    func list(tokens:TokenStream, parser:(tokens:TokenStream) -> T?)  -> [T]?
    {
        if !next(tokens, .LEFT_PAREN)
        {
            return nil
        }
    
        var elements = [T]()
    
        while !peek(tokens, .RIGHT_PAREN)
        {
            let element = parser(tokens:tokens)
    
            if element == nil
            {
                return nil
            }
    
            elements.append(element!)
    
            if !peek(tokens, .COMMA)
            {
                break
            }
            assertNext(tokens, .COMMA)
        }
        assertNext(tokens, .RIGHT_PAREN)
        return elements
    }

In Swift 2.0 we can make things simpler by dispensing with the returning nil on error convention.

    func list(tokens:TokenStream, parser:(tokens:TokenStream) throws -> T) throws  -> [T]
    {
        try next(tokens, .LEFT_PAREN)
    
        var elements = [T]()
    
        while !peek(tokens, .RIGHT_PAREN)
        {
            elements.append(try parser(tokens:tokens))
    
            if !peek(tokens, .COMMA)
            {
                break
            }
            assertNext(tokens, .COMMA)
        }
        assertNext(tokens, .RIGHT_PAREN)
        return elements
    }

Taking A Slightly More Object-oriented Approach

For practical or ideological reasons we might want to use an ‘object’ or ‘objects’ to do the parsing rather than functions.

In this case we might want to pass an instance of a type capable of doing the parsing of the element rather than using one these old-fangled closure thingys.

For the method, as it now is, to remain generic one possibility would be to pass an instance of a generic class in which case the function signature would look something like this

    func list<T>(parser:GenericListElementParser<T>) throws -> [T]

This is pretty straightforward but not that flexible.

The original function could take any function, method, or closure with the right signature, but in this case any object passed would either have to an instance of the generic class itself or a sub-class of it.

It might be better if we could define a protocol which the object must implement but there seems to be a problem.

    protocol ListElementParser
    {
        func parse() throws -> ????
    }

How do we define the return type of the method the object must implement ?

Unlike classes, enums and structs, protocol definitions cannot have a generic argument clause so how do we define the return type ?

The answer is to define an ‘associated type’ that can then be referenced in subsequent declarations in the protocol

    protocol ListElementParser
    {
        typealias ElementType
        
        func parse() throws -> ElementType
    }

For each type that implements the protocol ListElementParser, for example

    final class ExpressionElementParser: ListElementParser
    {
        func parse() throws -> ExpressionElement
        {
            ...
        }
    }

the concrete type corresponding to ElementType is inferred from the declaration of the parse method.

The function signature now has to be generic in the argument type and the return type but what does the constraint look like ?

    func list<????>(parser:P) throws -> T

It has to include both P and T, and we have to constrain the type P to implement the ListElementParser.

The constraint on the the type P can be expressed like this

    P:ListElementParser

so how about

    func list<P:ListElementParser, T>(p:P) throws -> [T]
    {
        try next(.LEFT_PAREN)
    
        var elements = [T]()
    
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try p.parse())
    
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

Nope, the compiler is having none of it.

It states, not unreasonably

    Cannot invoke 'append' with an argument list of type '(P.ElementType)'

Just plonking the T in the generic argument list tells the compiler nothing.

We have to relate the type T to the type ElementType being returned from the call to the parse method.

Fortunately there is a place in the generic parameter list where we can put requirements like this.

The requirement we have is that

the type ElementType of the ListElementParser type is the same as the type T

which in generic parameter clause requirements speak can be expressed very simply

    P.ElementType == T

We can add this to the generic parameter clause in the ‘requirements section’ and the compiler is happy.

    func list<P:ListElementParser, T where P.ElementType == T>(p:P) throws -> [T]
    {
        try next(.LEFT_PAREN)
    
        var elements = [T]()
    
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try p.parse())
    
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

An Incremental Improvement

To use the function we need to first construct an instance of the parser which is a pain if the only reason for constructing it is to pass it to the function.

Why can’t the function construct the parser itself ?

In Swift it can.

We need to change the function signature to take a type which implements ListElementParser.

    func list<P:ListElementParser, T where P.ElementType == T>(pType:P.Type) throws -> [T]

and we need to specify that the type implements the necessary constructor.

    protocol ListElementParser
    {
        typealias ElementType
    
        init(context:ParserContext, tokens:TokenStream)
    
        func parse() throws -> ElementType
    }

To construct an instance we simply do

    let parser = pType.init(context:context, tokens:tokens)

The function now looks like this

    func list<P:ListElementParser, T where P.ElementType == T>(pType:P.Type) throws -> [T]
    {
        try next(.LEFT_PAREN)
    
        let parser   = pType.init(context:context, tokens:tokens)
        var elements = [T]()
    
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try parser.parse())
    
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

To invoke the method we now pass the type of the parser

    try list(ExpressionElementParser.self)

Another Incremental Improvement

Now it is no longer necessary to contruct an instance of the parser for type T but it is still necessary to know what the type of parser is.

It would be simpler if it was possible to simply pass the type of T and decouple the caller from the details of how T‘s get parsed.

We can also do this.

To do so we need another protocol.

    protocol ListElement
    {
        typealias ParserType: ListElementParser
    
        static var parserType : ParserType.Type { get }
    }

and the method now looks like this

    func list<T:ListElement where T.ParserType.ElementType == T>(tType:T.Type) throws -> [T]
    {
        try next(.LEFT_PAREN)
            
        let parser   = tType.parserType.init(context:context, tokens:tokens)
        var elements = [T]()
            
        while !peek(.RIGHT_PAREN)
        {
            elements.append(try parser.parse())
            
            if !peek(.COMMA)
            {
                break
            }
            assertNext(.COMMA)
        }
        assertNext(.RIGHT_PAREN)
        return elements
    }

For this to work in the ExpressionElementParser case the ExpressionElement type has to implement the ListElement protocol.

    enum ExpressionElement: ListElement
    {
        static let parserType = ExpressionElementParser.self
    
        //
    
        case EXPRESSION(Expression)
        case NAMED_EXPRESSION(String, Expression)
    }

Now we can simply pass the type of the elements in the list.

    try list(ExpressionElement.self)

Via the extension mechanism this approach can be made to work with any class, enum, or struct type.

Defining

    final class StringListElementParser: ListElementParser
    {
        init(context:ParserContext, tokens:TokenStream)
        {
            // ...
        }
    
        func parse() throws -> String
        {
            // ...
        }
    }

and

    extension String: ListElement
    {
        static let parserType = StringListElementParser.self
    }

means we can do

    try list(String.self)

Copyright (c) 2015 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Create a free website or blog at WordPress.com.