Having stumbled upon my new, at least to me, idiom I was curious whether it could be used as the basis for a complete parser.
The answer seems to be yes.
With a handful of protocols and generic methods it is possible to write a basic parser for Swift which is at least capable of parsing itself.
Protocols
Parsable
protocol Parsable
{
typealias ParserType: ElementParser
static var parserType: ParserType.Type { get }
}
protocol ElementParser
{
typealias ElementType
init(context:ParserContext, tokens:TokenStream)
func parse() throws -> ElementType
}
A type is parsable if it has an associated ElementParser type
Element
protocol Element: Parsable
{
}
An Element
is a non-terminal which appears exactly once in a grammar production.
A condition-clause is an example of an Element.
It appears exactly once, for example, in a guard-statement
guard-statement -> guard condition-clause else code-block
or a while-statement
while-statement -> while condition-clause code-block
OptionalElement
protocol OptionalElement
{
typealias ParserType: OptionalElementParser
static var parserType: ParserType.Type { get }
}
protocol OptionalElementParser: ElementParser
{
func elementIsPresent() -> Bool
}
An OptionalElement
is a non-terminal which may appear exactly once or not at all in a grammar production.
A generic-parameter-clause
generic-parameter-clause → < generic-parameter-list requirement-clauseopt >
and a type-inheritance-clause
type-inheritance-clause → : class-requirement , type-inheritance-list
type-inheritance-clause → : class-requirement
type-inheritance-clause → : type-inheritance-list
are both examples of an OptionalElement.
They may both appear, or not appear, in a class-declaration
class-declaration → attributesopt access-level-modifieropt class class-name generic-parameter-clauseopt type-inheritance-clauseopt class-body
an enum-declaration
enum-declaration → attributesopt access-level-modifieropt union-style-enum
enum-declaration → attributesopt access-level-modifieropt raw-value-style-enum
union-style-enum → indirectopt enum enum-name generic-parameter-clauseopt type-inheritance-clauseopt { union-style-enum-membersopt }
raw-value-style-enum → enum enum-name generic-parameter-clauseopt type-inheritance-clauseopt { raw-value-style-enum-members }
or a struct-declaration
struct-declaration → attributesopt access-level-modifieropt struct class-name generic-parameter-clauseopt type-inheritance-clauseopt struct-body
A generic-parameter-clause may also appear, or not appear, in a function-declaration
function-declaration → function-head function-name generic-parameter-clauseopt function-signature function-bodyopt
or an initializer declaration
initializer-declaration → initializer-head generic-parameter-clauseopt parameter-clause throwsopt initializer-body
initializer-declaration → initializer-head generic-parameter-clauseopt parameter-clause rethrows initializer-body
A type-inheritance-clause may also appear in an extension-declaration
extension-declaration → access-level-modifieropt extension type-identifier type-inheritance-clauseopt extension-body
A type-annotation is a third example of an OptionalElement
type-annotation → : attributesopt type
A type-annotation is optional in a constant declaration or a variable declaration, but it is required in a parameter, so the type TypeAnnotation
is declared as both an Element and an OptionalElement.
extension TypeAnnotation: Element, OptionalElement
{
static let parserType = TypeAnnotationParser.self
}
RepeatableElement
protocol RepeatableElement: Parsable
{
}
A RepeatableElement
is a non-terminal which can appear one or more times in a grammar production. If it appears more than once then there is a separator between each occurence.
A condition is an example of a RepeatableElement.
condition → availability-condition | case-condition | optional-binding-condition
It can appear one or more times in a condition-list.
condition-list → condition | condition , condition-list
If there are multiple occurences then they are separated by commas
BlockElement
protocol BlockElement: Parsable
{
}
A BlockElement
is a non-terminal that can appear zero or more times in a grammer production, delimited by left and right braces and with no separators.
A statement is a BlockElement.
Zero or more statements can appear in a code-block
statements → statement statementsopt
code-block → { statementsopt }
A declaration is also a BlockElement.
Zero or more declarations can appear in the body of a class or struct declaration
declarations → declaration declarationsopt
class-body → { declarationsopt }
struct-body → { declarationsopt }
Both declarations and statements are also Elements since they can appear outside blocks.
Statements, for example, can appear in a switch-case
Declarations can appear anywhere in an enum-declaration.
The Declaration
and Statement
types are therefore declared to implement both the BlockElement and the Element protocols.
extension Declaration: BlockElement, Element
{
static let parserType = DeclarationParser.self
}
extension Statement: BlockElement, Element
{
static let parserType = StatementParser.self
}
Generic Methods
The following methods are defined on the class Parser
which is the base class for the various element parsers.
element<E:Element … >
final func element<E:Element where E.ParserType.ElementType == E>(eType:E.Type) throws -> E
{
let parser = eType.parserType.init(context:context, tokens: tokens)
return try parser.parse()
}
The element
method takes a type which implements the Element
protocol and returns an instance of that type.
optional<O:OptionalElement … >
final func optional<O:OptionalElement where O.ParserType.ElementType == O>(oType:O.Type) throws -> O?
{
let parser = oType.parserType.init(context:context, tokens: tokens)
if parser.elementIsPresent()
{
return try parser.parse()
}
else
{
return nil
}
}
The optional
method takes a type which implements the OptionalElement
protocol and returns an instance of that type or nil if the element is not present.
elements<R:RepeatableElement …>
final func elements
<
R:RepeatableElement where R.ParserType.ElementType == R
>
(rType:R.Type, _ separator:Punctuation = .COMMA) throws -> [R]
{
let parser = rType.parserType.init(context:context, tokens: tokens)
var elements = [try parser.parse()]
while peek(separator)
{
assertNext(separator)
elements.append(try parser.parse())
}
return elements
}
The elements
method takes a type which implements the RepeatableElement
protocol and returns an array of one or more instances of that type.
The method takes a second argument which specifies the separator.
By default the separator is defined to be a comma, but specifying a dot instead means it is also possible to use the method to parse, for example, an import-path
import-path → import-path-identifier | import-path-identifier . import-path
or a type-identifier
type-identifier → type-name generic-argument-clauseopt| type-name generic-argument-clauseopt. type-identifier
elements<O:OptionalElement …>
final func elements<O:OptionalElement where O.ParserType.ElementType == O>(oType:O.Type) throws -> [O]?
{
let parser = oType.parserType.init(context:context, tokens: tokens)
if parser.elementIsPresent()
{
var elements = [O]()
while parser.elementIsPresent()
{
elements.append(try parser.parse())
}
return elements
}
else
{
return nil
}
}
The elements
method takes a type which implements the OptionalElement
protocol and returns an array of one or more instances of that type or nil.
block<B:BlockElement …>
final func block<B:BlockElement where B.ParserType.ElementType == B>(bType:B.Type) throws -> [B]
{
try next(.LEFT_BRACE)
let parser = bType.parserType.init(context:context, tokens: tokens)
var elements = [B]()
while !peek(.RIGHT_BRACE)
{
elements.append(try parser.parse())
}
assertNext(.RIGHT_BRACE)
return elements
}
The block
method takes a type which implements the BlockElement
protocol and returns an array of zero or more instances of that type or nil.
Examples
Class Declaration
final class ClassDeclarationParser: Parser
{
func parse() throws -> Class
{
assertNext(.CLASS)
let name = try identifier()
let gpc = try optional(GenericParameterClause.self)
let tic = try optional(TypeInheritanceClause.self)
pushContext(DeclarationContext())
let decls = try block(Declaration.self)
popContext()
if gpc == nil
{
return .CLASS(name, tic, decls)
}
else
{
return .GENERIC_CLASS(name, gpc!, tic, decls)
}
}
}
Extension Declaration
final class ExtensionDeclarationParser: Parser
{
func parse() throws -> Extension
{
assertNext(.EXTENSION)
let typeIdentifier = try element(TypeIdentifier.self)
let typeInheritance = try optional(TypeInheritanceClause.self)
let body = try block(Declaration.self)
return
Extension(
typeId:
typeIdentifier,
typeInheritance:
typeInheritance,
declarations:
body)
}
}
This method would be even simpler if Swift guarantees that arguments are evaluated left-to-right but I’ve not been able to find anything to that effect as yet.
Guard Statement
final class GuardStatementParser: Parser
{
func parse() throws -> Guard
{
assertNext(.GUARD)
let condition = try element(ConditionClause.self)
try next(.ELSE)
let codeBlock = try block(Statement.self)
return Guard(condition: condition, codeBlock:codeBlock)
}
}
While Statement
final class WhileStatementParser: Parser
{
func parse() throws -> While
{
assertNext(.WHILE)
let condition = try element(ConditionClause.self)
let codeBlock = try block(Statement.self)
return While(condition:condition, codeBlock:codeBlock)
}
}
Type Annotation
final class TypeAnnotationParser: Parser, OptionalElementParser
{
func elementIsPresent() -> Bool
{
return peek(.COLON)
}
func parse() throws -> TypeAnnotation
{
assertNext(.COLON)
let attributes = try elements(Attribute.self)
let type = try element(Type.self)
if attributes == nil
{
return .TYPE(type)
}
else
{
return .ATTRIBUTED_TYPE(attributes!, type)
}
}
}
Copyright (c) 2015 By Simon Lewis. All Rights Reserved.
Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.
Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.