Just An Application

November 3, 2014

So Swift Then: The Curried Function Type Encoding Mystery Concluded

While currying may be a theoretically elegant technique for thinking about and solving certain problems, when it comes to using it to write software it may not be very practical.

Calling N one-argument functions in a row, with each function allocating a closure, is going to be expensive in terms of both speed and memory, and a lot slower than one function call of N arguments that does not allocate any closures.

If the sequence of function calls is complete, that is, they are being used to obtain the result rather than an intermediate closure, then the allocated closures are one-shot. They are going to be deallocated straight away. This cost is incurred every time the result is obtained using all N function calls.

Of course the compiler could attempt some kind of special case optimization when the curried function is called. It could attempt to turn the N function calls back into one function call of N arguments but in the general case, that is, when the curried function is not guaranteed to be in the same module as the caller it is a bit stuck, unless it wants to decompile the object code.

Alternatively it could de-curry the curried function when it compiles it, which is of course what it is doing in the case of xper.plusN.

It is turning this

    func plusN(n: Int)(i: Int) -> Int
    {
        return i + n
    }

into this

    func 'plusN[decurried]'(n: Int, #i: Int) -> Int
    {
        return i + n
    }

For this to work the compiler must do two additional things when the curried function xper.plusN is called.

In the function cod

    import xper
    
    func cod(x: Int, y:Int) -> Int
    {
        return xper.plusN(x)(i:y) // ignore the 'external name', it's not really there
    }

it must convert the repeated single argument function call sequence to a standard function call.

In the function saithe

    import xper
    
    func saithe(x: Int, y:Int) -> Int
    {
        let plusX = xper.plusN(x)
    
        return plusX(i:y) // ignore the 'external name', it's not really there
    }

it must generate the functions necessary to make it appear that the curried function still exists.

This explains what we see when cod and saithe are compiled.

In the first case ‘plusN[decurried]’ can be called directly.

In the second case the compiler needs to generate the equivalent of this

    func 'xper.plusN'(n: Int) -> (i: Int) -> Int
    {
        return { (i : Int) in return 'plusN[decurried]'(n, i: i) }
    }
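The curried declaration syntax used in these posts is no longer accepted by current Swift compilers, so the following is only a hand-written sketch of the arrangement described above, in current Swift syntax. The names plusNDecurried and the wrapper plusN are hypothetical stand-ins for what the compiler generates, and the 'external name' is dropped because current Swift closures cannot carry argument labels.

```swift
// Hypothetical stand-in for the single de-curried function the compiler emits.
func plusNDecurried(_ n: Int, i: Int) -> Int {
    return i + n
}

// Hypothetical stand-in for the generated wrapper: it captures n in a closure
// and forwards to the de-curried function when the closure is finally called.
func plusN(_ n: Int) -> (Int) -> Int {
    return { i in plusNDecurried(n, i: i) }
}

// The cod case: both arguments are available at the call site,
// so the de-curried function can be called directly and no closure is needed.
func cod(_ x: Int, _ y: Int) -> Int {
    return plusNDecurried(x, i: y)
}

// The saithe case: the intermediate function is held in a variable,
// so the call goes through the wrapper, which allocates a closure.
func saithe(_ x: Int, _ y: Int) -> Int {
    let plusX = plusN(x)
    return plusX(y)
}
```

Both cod and saithe compute the same result; the difference is only whether a closure has to exist in between.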

To rewrite a call to a curried function and generate the local functions if necessary, the compiler must be able to identify when a public function in a module is a curried function, but that is not what the symbol associated with the object code is for.

We can demonstrate this by compiling the function cod using the module information in

  xper.swiftmodule

associated with the curried function version of plusN, and the library

  libxper.dylib

obtained by compiling its vanilla function equivalent, in which case this happens.

    $ swiftc -I. -lxper -L. -module-name caller -emit-library call_plusN.swift
    Undefined symbols for architecture x86_64:
      "__TF4xper5plusNfSiFT1iSi_Si", referenced from:
         __TF6caller3codFTSiSi_Si in call_plusN-7db2ff.o
    ld: symbol(s) not found for architecture x86_64
    <unknown>:0: error: linker command failed with exit code 1 (use -v to see invocation)

The compiler has used the module information that identifies plusN as a curried function and compiled accordingly.

It has specified that the object code of the caller needs to be linked against the object code identified by the symbol

    __TF4xper5plusNfSiFT1iSi_Si

and left it to the linker to do whatever it is that linkers do.

In this case there is an undefined symbol for some object code so the linker goes and looks for the piece of object code for which that symbol is defined.

What it does if it finds it depends on where the object code is found.

What it does if it doesn’t find it is stop.

In this case the only other object code is in

  libxper.dylib

and because the version it has been given to link against does not contain the compiled code for the curried function, it cannot find a matching defined symbol, so it stops.

To match undefined and defined symbols does not require the linker to understand their format or even to be aware that they have a format.

The linker likes to match symbols and the only thing it asks is that they be unique.
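As a rough illustration of symbols being matched as opaque strings, here is a minimal sketch in current Swift. The function unresolvedSymbols is hypothetical and is not how any real linker is implemented. The contents of vanillaLibrary are also an assumption: it uses the symbol reported elsewhere in these posts for a vanilla plusN, which may not be exactly what libxper.dylib contained in the experiment above.

```swift
// Hypothetical sketch: a "linker" that only compares symbol strings.
// It never looks inside a symbol, so it has no idea what 'f' or 'F' means.
func unresolvedSymbols(undefined: [String], libraries: [Set<String>]) -> [String] {
    var defined = Set<String>()
    for library in libraries {
        defined.formUnion(library)
    }
    // Anything the caller needs that no library defines stays unresolved.
    return undefined.filter { !defined.contains($0) }
}

// The symbol the caller was compiled to reference (taken from the error output above).
let callerUndefined = ["__TF4xper5plusNfSiFT1iSi_Si"]

// Assumed contents of a library built from a vanilla version of plusN.
let vanillaLibrary: Set<String> = ["__TF4xper5plusNFSiFSiSi"]

// A library built from the curried version of plusN.
let curriedLibrary: Set<String> = ["__TF4xper5plusNfSiFT1iSi_Si"]
```

Resolving callerUndefined against vanillaLibrary leaves the curried symbol unresolved, while resolving it against curriedLibrary leaves nothing unresolved.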

If the compiler is prepared to guarantee that no two functions with the same name and the same type can exist at the same time in the same module, then encoding a function’s name and type into the symbol for that function is a guaranteed way to ensure its uniqueness.

In the case of a curried function the compiler needs to generate a symbol which can be used to identify the de-curried function which is the result of the compilation.

Like all symbols used for linking it must be unique.

Given that the compiler is treating curried functions as a special case, the most straightforward solution is for it to generate their symbols using a distinct ‘curried function type’ encoding which reflects this.

Hence the lower-case ‘f’.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

November 1, 2014

So Swift Then: The Curried Function Type Encoding Mystery Continued

Compiling the curried function

    public func plusN(n: Int)(i: Int) -> Int
    {
        return i + n
    }

on its own in the module xper gives us the symbol

    __TF4xper5plusNfSiFT1iSi_Si

All the examples of mangled names of vanilla functions we have seen to date have been in the format.

    "__TF" 'function name' "F" 'parameter encoding' 'return type encoding' 'suffix'?

The format of the symbol for the curried function is identical to this except that there is a lower-case ‘f’ rather than an upper-case ‘F’.
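To make the format concrete, here is a small sketch in current Swift that splits a symbol of this shape into its length-prefixed identifiers and the type encoding that follows. The type MangledFunctionName and the function parseMangledFunctionName are hypothetical names, and the parser only handles the simple module-plus-function-name symbols seen in these posts; it makes no attempt to cover the rest of the mangling scheme.

```swift
// Hypothetical result type for the sketch below.
struct MangledFunctionName {
    let identifiers: [String]   // e.g. ["xper", "plusN"]
    let typeEncoding: String    // everything after the last identifier

    // True when the type encoding begins with the lower-case 'f'.
    var startsWithLowerCaseF: Bool { return typeEncoding.hasPrefix("f") }
}

// Splits "__TF" + length-prefixed identifiers + type encoding.
// Returns nil if the symbol does not have that shape.
func parseMangledFunctionName(_ symbol: String) -> MangledFunctionName? {
    let prefix = "__TF"
    guard symbol.hasPrefix(prefix) else { return nil }

    var rest = symbol.dropFirst(prefix.count)
    var identifiers: [String] = []

    // Each identifier is a decimal length followed by that many characters.
    while let first = rest.first, first >= "0", first <= "9" {
        let digits = rest.prefix(while: { $0 >= "0" && $0 <= "9" })
        guard let length = Int(digits), rest.count >= digits.count + length else {
            return nil
        }
        rest = rest.dropFirst(digits.count)
        identifiers.append(String(rest.prefix(length)))
        rest = rest.dropFirst(length)
    }

    guard !identifiers.isEmpty else { return nil }
    return MangledFunctionName(identifiers: identifiers, typeEncoding: String(rest))
}
```

Applied to `__TF4xper5plusNfSiFT1iSi_Si` this yields the identifiers xper and plusN and the type encoding `fSiFT1iSi_Si`, which makes the lower-case ‘f’ easy to test for.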

We have been assuming that the upper case ‘F’ immediately following the function name in the symbol for a vanilla function is the prefix for a ‘function type’.

If this is true then the lower-case ‘f’ is presumably the prefix for a different kind of function type.

The obvious candidate is the ‘curried function type’ but why is it necessary to distinguish between a ‘function type’ and a ‘curried function type’ in the mangled name of the function ?

The compiler does not distinguish between a curried function and the vanilla version of the curried function at compile time but it does ensure that it is possible to identify which one was compiled to produce the object code.

This implies that a caller would have to be linked against either one or the other despite calls to both functions being identical.

Here is a function which calls plusN and then immediately calls the function that was returned

    import xper
    
    func cod(x: Int, y:Int) -> Int
    {
        return xper.plusN(x)(i:y) // ignore the 'external name', it's not really there
    }

This call works whether it is the curried function or the vanilla function equivalent that is being called.

Here is a function that does the same thing as the function cod with an intermediate step just to make it clear what is going on.

    import xper
    
    func saithe(x: Int, y:Int) -> Int
    {
        let plusX = xper.plusN(x)

        return plusX(i:y) // ignore the 'external name', it's not really there
    }

This call also works whether it is the curried function or the vanilla function equivalent that is being called.

This is not surprising since they are obviously both doing the same thing, that is

  • calling the function xper.plusN with the argument x

  • calling the function returned from the call to xper.plusN with the argument y

In saithe the function returned from the call to xper.plusN is explicitly referenced but the only thing done with it is to call it. It does not ‘escape’ from the function at any point so the two functions are effectively identical.

If we compile the function cod on its own in the module caller linking against the module xper containing the curried function plusN we get the symbol

    __TF6caller3codFTSiSi_Si

for the function itself, and the undefined symbol

  __TF4xper5plusNfSiFT1iSi_Si

which is the symbol for the curried function xper.plusN.

If we compile the function saithe on its own in the module caller linking against the module xper containing the curried function plusN then, as before, we get the symbol

    __TF6caller6saitheFTSiSi_Si

for the function itself, the undefined symbol

  __TF4xper5plusNfSiFT1iSi_Si

which is the symbol for the curried function xper.plusN, as before, and quite unexpectedly the symbols for two local functions.

    __TF4xper5plusNFSiFT1iSi_Si

and

    __TPA__TF4xper5plusNfSiFT1iSi_Si

Who ordered those ?

The function type encoded in the first symbol is

    (Int) -> (i: Int) -> Int

which looks strangely familiar.

The only place in the function saithe that it could be called is here.

    let plusX = xper.plusN(x)

If we assume that the compiler knows what it is doing then that must be what it is for.

The symbol for the second function

    __TPA__TF4xper5plusNfSiFT1iSi_Si

looks like the symbol for the curried function xper.plusN with the prefix

    __TPA

We have never seen anything like this before so it is a bit difficult to say what this kind of function might be for.

Continuing with the hypothesis that the compiler knows what it is doing, we have to assume that it needs this function for something, and given that it has already generated a function to return a function, then presumably this is the function that gets returned.

If this is true then the reading of the function type in the mangled name has to be something other than the type apparently encoded.

The function type needs to be

    (i: Int) -> Int

and that is not what is encoded according to the vanilla function type reading.

If the compiler really has had to generate a function to do this

    let plusX = xper.plusN(x)

then the implication is that the function plusN in the module xper cannot be called at that point.

It must be possible to call it since the first example works and the compiler did not need to generate any additional functions to make it work.

If the function plusN in the module xper cannot be called in saithe but it can be called in cod then we can infer both

  • what its function type is not, and

  • what it must be

which explains both why the generated local functions are needed and what they do.

It also confirms that the reading of ‘curried function’ types is not the same as the reading of ‘vanilla function’ types when they appear in mangled function names.



October 31, 2014

So Swift Then: The Curried Function Type Encoding Mystery

We have a model of how function types are encoded when function names are mangled.

It works for all the vanilla functions we have looked at to date, albeit we have not looked at that many, but there are at least three other kinds of function we have not looked at.

The first kind are ‘curried’ functions.

Currying, the term originated in the lambda calculus, is the transformation of a function that takes N arguments into N functions each of one argument so that

    F(a[0], ..., a[N-1]) == f[0](a[0])(a[1]) ... (a[N-1])

For I from 0 to N – 2, function I takes argument I and returns function I + 1.

For I == N – 1, function I takes argument I and returns the result.
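For N == 2 the transformation can be written down once as a generic helper. This is a minimal sketch in current Swift; the names curry, add and curriedAdd are hypothetical and are not part of the standard library.

```swift
// Hypothetical helper: turns a two-argument function into a chain of
// two one-argument functions, i.e. F(a, b) == curry(F)(a)(b).
func curry<A, B, C>(_ f: @escaping (A, B) -> C) -> (A) -> (B) -> C {
    return { a in
        // Function 0 takes the first argument and returns function 1.
        return { b in
            // Function 1 takes the last argument and returns the result.
            return f(a, b)
        }
    }
}

// An ordinary two-argument function to curry.
func add(_ a: Int, _ b: Int) -> Int {
    return a + b
}

let curriedAdd = curry(add)
```

With these definitions `curriedAdd(1)(2)` and `add(1, 2)` produce the same value.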

It’s a neat trick if you can pull it off.

The usual example for N == 2 is something like this.

    func plusN(n: Int) -> (Int) -> Int
    {
        func plusI(i: Int) -> Int
        {
            return i + n
        }
    
        return plusI
    }

In Swift this can also be written like this

    func plusN(n: Int)(i: Int) -> Int
    {
        return i + n
    }

What is interesting in this context is how the function name gets mangled when the curried function syntax is used.

Compiling the first version gives us the symbol

    __TF4xper5plusNFSiFSiSi

Compiling the second version gives us the symbol

    __TF4xper5plusNfSiFT1iSi_Si

which is interesting both because the function type starts with an ‘f’ rather than an ‘F’ and because the second ‘parameter’ as written, which is actually the first parameter of the returned function, has apparently acquired an external name.

    __TF4xper5plusNfSiFT1iSi_Si

It should be possible to invoke either version like this

    func plus(a: Int, b:Int) -> Int
    {
        return plusN(a)(b)
    }

This function duly compiles with the first definition but compiling it with the second definition gets you this

    $ swiftc -module-name xper  -emit-library functions.swift
    functions.swift:83:21: error: missing argument label 'i:' in call
        return plusN(a)(b)
                        ^
                        i:

The compiler does indeed believe that the second parameter has an ‘external name’.

The only clue to what is going on is the symbol.

    __TF4xper5plusNfSiFT1iSi_Si

We know that compiling the function

    func bass(e i : Int) -> Int
    {
        return i
    }

gives us the symbol

    __TF4xper4bassFT1eSi_Si

so it looks as though the compiler believes the return type of the function plusN is

a function with a single parameter with the external name ‘i’ of type Int which returns an Int.

The definition of a function type given in the ‘red book’ is

    function-type → type -> type

The definition of a type is

    type → array-type
           | dictionary-type
           | function-type
           | type-identifier
           | tuple-type
           | optional-type
           | implicitly-unwrapped-optional-type
           | protocol-composition-type
           | metatype-type

The definition of a tuple type is

    tuple-type → ( tuple-type-body(opt) )

    tuple-type-body → tuple-type-element-list ...(opt)

    tuple-type-element-list → tuple-type-element
                               | tuple-type-element , tuple-type-element-list

    tuple-type-element → attributes(opt) inout(opt) type
                               | inout(opt) element-name type-annotation

    element-name → identifier

This is all a bit misleading because you simply cannot use an arbitrary tuple type wherever you can use a type.

For example the return type of a function can be a tuple type, but it certainly cannot be the tuple type

    (inout Int)

Which is a bit disappointing really.

It would be quite an interesting feature, although it is not entirely clear whether it would enable the caller to alter the value inside the called function after the called function had returned the value to them, or conversely enable the called function to alter the returned value inside the caller after it had returned the value to the caller.

Either way an opportunity missed I think.

We have already seen another return type example.

A return type cannot be a single named element tuple type.

This is covered in the Tuple Type section where it says

you can name a tuple element only when the tuple type has two or more elements

so you can’t do this either

    func cod(#i: Int) -> Int
    {
        return 0
    }
    
    func dab() -> (i: Int) -> Int
    {
        return cod
    }

except that you can.

The effect of this is to specify that the parameter of the returned function has the external name ‘i’ and the type Int although you would be hard pushed to discover that other than by trial and error.

Note that the return type of the function dab

    (i: Int) -> Int

is the same text that appears in the curried function version of the function plusN.

    func plusN(n: Int)(i: Int) -> Int
    {
        return i + n
    }

If you were to mistakenly transform that version into this one

    func plusN(n: Int) -> (i : Int) -> Int
    {
        return { i in  i + n }
    }

you would end up with the right behaviour but with an external name you do not want.

This is not what is happening with the curried function we have been looking at, because the resulting symbol would not be the same, but it is presumably something similar that occurs at some stage during the compilation.

Let’s assume the appearance of the ‘external name’ when using the curried syntax in this way is a bug and that it is going to get fixed.

That leaves the lower case ‘f’.

You cannot compile the curried and vanilla versions of plusN above together in the same file. The compiler considers one to be a redeclaration of the other.

Yet if you compile them on their own you get different symbols even if the difference is the case of a single letter.

Is the difference meaningful or is it some kind of artefact ?



October 29, 2014

So Swift Then: Mangled Function Names And Function Types — Function Types Go Recursive

Now we have a model of how function types are encoded, let’s see if we can break it.

What happens if we use it on a function which returns a function?

Starting with the function

    func bass() -> ()
    {
    }

Now we have a function to return like so

    func cod() -> () -> ()
    {
        return bass
    }

Compiling cod gives us the symbol

    __TF4xper3codFT_FT_T_

giving us

    FT_FT_T_

as the function type.

Now for the model.

Starting by creating some useful objects.

    let encoder     = Encoder()
    let builder     = FunctionTypeBuilder()
    let ftVoidVoid  = builder.build()

then

    let ft6000 = builder.returnType(ftVoidVoid).build()

    println(ft6000.encode(encoder))

prints

    FT_FT_T_

We’re off to a flying start.

This is a function which returns a function which returns a function

    func dab() -> () -> () -> ()
    {
        return cod
    }

and compiling it gives us the symbol

    __TF4xper3dabFT_FT_FT_T_

giving us

    FT_FT_FT_T_

as the function type.

This

    let ft6001 = builder.returnType(ft6000).build()
    
    println(ft6001.encode(encoder))

prints

    FT_FT_FT_T_

so that’s alright.

A function that takes an Int and a function as arguments

    func eel(x: Int, f: (Int) -> (Int)) -> Int
    {
        return f(x)
    }

Compiling it gives us the symbol

    __TF4xper3eelFTSiFSiSi_Si

giving us

    FTSiFSiSi_Si

as the function type.

    let ftIntInt = builder.intParam().intReturnType().build()
    let ft6002   = builder.intParam().param(ftIntInt).intReturnType().build()
    
    println(ft6002.encode(encoder))

prints

    FTSiFSiSi_Si

A function that takes a function that takes two Ints and returns a pair of Ints

    func flounder(f: (Int, Int) -> (Int, Int))
    {
    }

Compiling it gives us the symbol

    __TF4xper8flounderFFTSiSi_TSiSi_T_

giving us

    FFTSiSi_TSiSi_T_

as the function type.

    let intType  = BuiltinType.IntType
    let fnType   = builder.tupleTypeParam(intType, intType).tupleReturnType(intType, intType).build()
    let ft6003   = builder.param(fnType).build()
    
    println(ft6003.encode(encoder))

prints

    FFTSiSi_TSiSi_T_

which of course it would !

A function that takes a function which returns a function as its argument and returns the result of calling that function

    func goby(f: () -> () -> ()) -> () -> ()
    {
        return f()
    }

Pick the parentheses out of that.

The compiler having picked the parentheses out comes up with the symbol

    __TF4xper4gobyFFT_FT_T_FT_T_

which gives us the function type

    FFT_FT_T_FT_T_

and we’ll just have to trust that it knows what it’s doing.

    let ft6004   = builder.param(ft6000).returnType(ftVoidVoid).build()
    
    println(ft6004.encode(encoder))

prints

    FFT_FT_T_FT_T_

so if the compiler knows what it’s doing so does the model, and possibly vice versa.

And that’s enough of that without a dedicated test harness that can count ‘F’s and ‘T’s and stuff.

It certainly looks as though the model’s behaviour matches that of the compiler when it comes to function types as arguments and return types.
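Since the FunctionTypeBuilder used in these examples is not shown in this post, here is a separate, much smaller sketch in current Swift of the same recursive idea. The enum TypeModel and its cases are hypothetical names, not the author's model, and it covers only Int, unnamed tuples and function types.

```swift
// Hypothetical, cut-down type model: just enough to show the recursion.
indirect enum TypeModel {
    case int
    case tuple([TypeModel])
    case function(parameters: TypeModel, result: TypeModel)

    var encoding: String {
        switch self {
        case .int:
            return "Si"
        case .tuple(let elements):
            // A single unnamed element is encoded as the bare element type.
            if elements.count == 1 {
                return elements[0].encoding
            }
            return "T" + elements.map { $0.encoding }.joined() + "_"
        case .function(let parameters, let result):
            // Parameters and result are themselves types, so function
            // types nest to any depth without special handling.
            return "F" + parameters.encoding + result.encoding
        }
    }
}

let voidType = TypeModel.tuple([])

// () -> ()
let voidToVoid = TypeModel.function(parameters: voidType, result: voidType)

// () -> () -> ()   (the type of cod)
let codType = TypeModel.function(parameters: voidType, result: voidToVoid)

// (() -> () -> ()) -> () -> ()   (the type of goby)
let gobyType = TypeModel.function(parameters: .tuple([codType]), result: voidToVoid)
```

Under these assumptions `codType.encoding` is `FT_FT_T_` and `gobyType.encoding` is `FFT_FT_T_FT_T_`, matching the function types extracted from the symbols above.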



October 28, 2014

So Swift Then: More Fun With Mangled Names Continued

Function Type Encoding: A Model In Swift

Type Encoding

Encodeable

We start by defining the Encodeable protocol

    protocol Encodeable
    {
        func encode(encoder:Encoder) -> String
    }

Type

A Type must be encodeable. That’s it at the moment.

    protocol Type: Encodeable
    {
    }

Builtin Types

Built in types are represented by the enum BuiltinType

We will assume that the Encoder is responsible for knowing how built-in types are actually encoded.

    enum BuiltinType: Type
    {
        case ArrayType
        case BoolType
        case DoubleType
        case IntType
        case OptionalType
        case StringType
        case UintType
    
        //
    
        func encode(encoder:Encoder) -> String
        {
            return encoder.encode(self)
        }
    }

Generic Types

Generic types are represented by sub-classes of GenericType.

A generic type encodes itself.

    class GenericType: Type
    {
        init(baseType:Type, parameterTypes:Type...)
        {
            self.baseType       = baseType
            self.parameterTypes = parameterTypes
        }
    
        //
    
        func encode(encoder:Encoder) -> String
        {
            var encoding = "G"
    
            encoding += baseType.encode(encoder)
            for t in parameterTypes
            {
                encoding += t.encode(encoder)
            }
            encoding += "_"
            return encoding
        }
    
        //
    
        private let baseType        : Type
        private let parameterTypes  : [Type]
    }
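Stripped of the class machinery, the generic encoding above is just a prefix, the base type, the parameter types and a terminator. The following is a minimal string-level sketch in current Swift; encodeGeneric is a hypothetical helper, and the codes "Sa", "Sq" and "Si" are taken from the example outputs elsewhere in this post.

```swift
// Hypothetical helper mirroring GenericType.encode at the string level:
// "G" + base type encoding + parameter type encodings + "_".
func encodeGeneric(base: String, parameters: [String]) -> String {
    var encoding = "G"

    encoding += base
    for parameter in parameters {
        encoding += parameter
    }
    encoding += "_"
    return encoding
}

// [Int]: base "Sa" with the single parameter "Si".
let arrayOfInt = encodeGeneric(base: "Sa", parameters: ["Si"])

// Int?: base "Sq" with the single parameter "Si".
let optionalInt = encodeGeneric(base: "Sq", parameters: ["Si"])
```

These give `GSaSi_` and `GSqSi_` respectively.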

Array Types

Array types are represented by instances of the class ArrayType

    final class ArrayType: GenericType
    {
        init(elementType:Type)
        {
            super.init(baseType:BuiltinType.ArrayType, parameterTypes:elementType)
        }
    }

Optional Types

Optional types are represented by instances of the class OptionalType

    final class OptionalType: GenericType
    {
        init(type:Type)
        {
            super.init(baseType:BuiltinType.OptionalType, parameterTypes:type)
        }
    }

Tuple Types

The Empty Tuple Type

The type of the empty tuple

    ()

is represented by an instance of the class EmptyTupleType

    final class EmptyTupleType: TupleType
    {
        func encode(encoder:Encoder) -> String
        {
            return "T_"
        }
    }

    typealias VoidType = EmptyTupleType

Single Element Tuple Types

The type of a tuple with a single unnamed element is represented by an instance of the class SingleElementTupleType

    final class SingleElementTupleType: TupleType
    {
        init(elementType:Type)
        {
            self.elementType = elementType
        }
        
        //

        func encode(encoder:Encoder) -> String
        {
            return elementType.encode(encoder)
        }

        //

        private let elementType: Type
    }

It encodes itself by returning the encoding of its element type.

Multi Element Tuple Types

The type of any tuple which has more than one element is represented by an instance of the class MultiElementTupleType

    final class MultiElementTupleType: TupleType
    {
        init(first:TupleElementType, second:TupleElementType, rest:[TupleElementType])
        {
            var elementTypes = [first, second]
    
            elementTypes += rest
            self.elementTypes = elementTypes
        }
    
        //
    
        func encode(encoder:Encoder) -> String
        {
            var encoding = "T"
    
            for et in elementTypes
            {
                encoding += et.encode(encoder)
            }
            encoding += "_"
            return encoding
        }
    
        //
    
        private let elementTypes: [TupleElementType]
    }

The type of a tuple element is represented by an instance of the enum TupleElementType

    enum TupleElementType: Encodeable
    {
        case NameAndType(String, Type)
        case TypeOnly(Type)
    
        func encode(encoder:Encoder) -> String
        {
            switch self
            {
                case let .NameAndType(name, type):
    
                    var encoding = ""
    
                    encoding += encoder.encode(name)
                    encoding += type.encode(encoder)
                    return encoding
    
                case let .TypeOnly(type):
    
                    return type.encode(encoder)
            }
        }
    }
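The Encoder itself is not shown here, so how it encodes an element name is an assumption. Judging by the symbols seen so far (for example `4xper`, `5plusN` and `1e`), a name appears to be encoded as its length followed by the name. A minimal sketch in current Swift, with the hypothetical helper encodeIdentifier and ASCII names only:

```swift
// Hypothetical helper: encodes a name as its length followed by the name,
// as in "4xper" or "1e". Only plain ASCII names are considered here.
func encodeIdentifier(_ name: String) -> String {
    return "\(name.utf8.count)\(name)"
}

// A named tuple element is then the encoded name followed by the
// encoded type, e.g. the element "x: Int" with Int encoded as "Si".
func encodeNamedElement(name: String, typeEncoding: String) -> String {
    return encodeIdentifier(name) + typeEncoding
}
```

For example `encodeNamedElement(name: "x", typeEncoding: "Si")` gives `1xSi`.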
    

Function Type Encoding

The encoding of a function type is the encoding of its parameters immediately followed by the encoding of its return type.

A function’s parameters are encoded as though they comprise a tuple type.

To do this for certain function parameters we need two additional ‘synthetic’ tuple types.

Synthetic Tuple Types

Single Named Element Tuple Types

The compiler refuses to acknowledge the existence of single named element tuple types.

We need to encode a parameter with an external name as though it were one so we define the class SingleNamedElementTupleType

    final class SingleNamedElementTupleType: TupleType
    {
        init(name:String, type:Type)
        {
            self.name = name
            self.type = type
        }
    
        //
    
        func encode(encoder:Encoder) -> String
        {
            var encoding = "T"
    
            encoding += encoder.encode(name)
            encoding += type.encode(encoder)
            encoding += "_"
            return encoding
        }
    
        //
    
        private let name:   String
        private let type:   Type
    }

Variadic Tuple Types

To represent a function’s parameters as a tuple type when that function has a variadic parameter we need a ‘variadic tuple’ type.

A ‘variadic tuple’ type is represented by an instance of the class VaradicTupleType

    final class VaradicTupleType: TupleType
    {
        init(var elementTypes:[TupleElementType])
        {
            assert(elementTypes.count != 0)
            
            var last : TupleElementType

            switch elementTypes.removeLast()
            {
                case let .NameAndType(name, type):

                    last = TupleElementType.NameAndType(
                                                name,
                                                ArrayType(
                                                    elementType:type))

                case let .TypeOnly(type):

                    last = TupleElementType.TypeOnly(
                                                ArrayType(
                                                    elementType:type))
            }
            elementTypes.append(last)
            self.elementTypes = elementTypes
        }

        //

        func encode(encoder:Encoder) -> String
        {
            var encoding = "t"

            for et in elementTypes
            {
                encoding += et.encode(encoder)
            }
            encoding += "_"
            return encoding
        }

        //

        private let elementTypes : [TupleElementType]
    }

Parameter Tuple Element Types

Vanilla

A vanilla parameter of type T is represented by an instance of

    TupleElementType.TypeOnly

For example

    i : Int

is represented by

    TupleElementType.TypeOnly(BuiltinType.IntType)

External Names

A parameter with an external name is represented by an instance of

    TupleElementType.NameAndType

For example

    e i : Int

is represented by

    TupleElementType.NameAndType("e", BuiltinType.IntType)

inout

The type of an inout parameter is represented by an instance of the class ReferenceType.

    final class ReferenceType: Type
    {
        init(type:Type)
        {
            self.type = type
        }
    
        //
    
        func encode(encoder:Encoder) -> String
        {
            return "R" + type.encode(encoder)
        }
    
        //
    
        private let type: Type
    }

If it does not have an ‘external name’ the parameter is represented by an instance of

    TupleElementType.TypeOnly

or by an instance of

    TupleElementType.NameAndType

otherwise.

Variadic

If a function has a variadic parameter then its parameters are represented by an instance of the class VaradicTupleType.

The parameter itself is represented by an instance of ArrayType.
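At the string level the variadic case of the model amounts to wrapping the last element in the array encoding and using a lower-case ‘t’ for the tuple. The sketch below, in current Swift, shows only what the model described here would produce; encodeVariadicParameters is a hypothetical helper and its output has not been checked against symbols produced by the compiler.

```swift
// Hypothetical helper mirroring VaradicTupleType at the string level.
// Takes the already-encoded element types, the last being the variadic one.
func encodeVariadicParameters(_ elementEncodings: [String]) -> String {
    precondition(!elementEncodings.isEmpty, "a variadic tuple needs at least one element")

    var elements = elementEncodings
    let last = elements.removeLast()

    // The variadic parameter is represented as an array of its element type.
    elements.append("GSa" + last + "_")

    // Lower-case 't' marks the tuple as variadic in this model.
    return "t" + elements.joined() + "_"
}
```

For a single variadic Int parameter the model therefore yields `tGSaSi__`.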

Function Types

A function type is represented by an instance of the class FunctionType.

    final class FunctionType: Type
    {
        init(parameters:TupleType, returnType:TupleType)
        {
            self.parameters = parameters
            self.returnType = returnType
        }
    
        //
    
        func encode(encoder:Encoder) -> String
        {
            var encoding = "F"
    
            encoding += parameters.encode(encoder)
            encoding += returnType.encode(encoder)
            return encoding
        }
    
        //
    
        private let parameters: TupleType
        private let returnType: TupleType
    }

Other Types

Of the other types we have seen, class, enum and struct types can all be represented by sub-classes of NamedType.

    class NamedType: Type
    {
        init(prefix:String, name:Name)
        {
            self.prefix = prefix
            self.name   = name
        }
    
        func encode(encoder:Encoder) -> String
        {
            return encoder.encode(prefix:prefix, name:name)
        }
    
        private let prefix: String
        private let name:   Name
    }

The actual encoding of the names is done by the Encoder. This makes it possible for the Encoder to replace the name of the type with a substitution pattern when appropriate.

The type names are represented by instances of the enum Name.

    enum Name
    {
        case Local([String])
        case External([String])
        case Swift([String])
    }

The representation of the protocol type is left as an exercise for the reader.

Some Examples

Starting with the obvious one

    let encoder = Encoder()
    let builder = FunctionTypeBuilder()
    
    let ft0001 = builder.build()
    
    println(ft0001.encode(encoder))

prints

    FT_T_

So far so good.

Some return types.

An integer

    let ft0002 = builder.returnType(BuiltinType.IntType).build()
    
    println(ft0002.encode(encoder))

prints

    FT_Si

An array of integers

    let ft0003 = builder.returnType(ArrayType(elementType:BuiltinType.IntType)).build()
    
    println(ft0003.encode(encoder))

prints

    FT_GSaSi_

An optional integer

    let ft0004 = builder.optionalReturnType(BuiltinType.IntType).build()

    println(ft0004.encode(encoder))

prints

    FT_GSqSi_

A single unnamed element tuple

    let ft0005 = builder.tupleReturnType(BuiltinType.IntType).build()
    
    println(ft0005.encode(encoder))

prints

    FT_Si

A multiple element tuple

    let ft0006 = builder.tupleReturnType(BuiltinType.IntType, BuiltinType.IntType).build()
    
    println(ft0006.encode(encoder))

prints

    FT_TSiSi_

A named element tuple

    let ft0007 = builder.namedTupleReturnType((name:"x", type:BuiltinType.IntType), rest:(name:"y", type:BuiltinType.IntType)).build()
    
    println(ft0007.encode(encoder))

prints

    FT_T1xSi1ySi_

Some parameters

An integer parameter

    let ft1000 = builder.param(BuiltinType.IntType).build()

    println(ft1000.encode(encoder))

prints

    FSiT_

An integer parameter with an external name

    let ft1001 = builder.param(externalName:"e", type:BuiltinType.IntType).build()

    println(ft1001.encode(encoder))

prints

    FT1eSi_T_

Two integer parameters

    let ft1002 = builder.param(BuiltinType.IntType).param(BuiltinType.IntType).build()

    println(ft1002.encode(encoder))

prints

    FTSiSi_T_

An inout integer parameter

    let ft1003 = builder.inoutParam(BuiltinType.IntType).build()

    println(ft1003.encode(encoder))

prints

    FRSiT_

An integer parameter and an inout integer parameter

    let ft1004 = builder.param(BuiltinType.IntType).inoutParam(BuiltinType.IntType).build()

    println(ft1004.encode(encoder))

prints

    FTSiRSi_T_

A single unnamed element tuple

    let ft1005 = builder.tupleTypeParam(BuiltinType.IntType).build()

    println(ft1005.encode(encoder))

prints

    FSiT_

A multiple unnamed element tuple

    let ft1006 = builder.tupleTypeParam(BuiltinType.IntType, BuiltinType.IntType).build()

    println(ft1006.encode(encoder))

prints

    FTSiSi_T_

Conclusions

The model is based on two simple assumptions

  1. a single unnamed element tuple type is encoded as the type of its element, and

  2. a function’s parameters are encoded as though they comprised a tuple type

The resulting behaviour seems to be accurate up to and including generating the same encodings for ostensibly ‘different’ sets of function parameters.
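
The two assumptions are small enough to be sketched in a few lines. What follows is not the model from this post, just a hypothetical cut-down version of it: the TypeModel enum and both function names are mine, built-in types are passed in already encoded, and it is written in current Swift syntax rather than that of Xcode 6.1.

```swift
// A hypothetical mini-model, much smaller than the classes in the post.
indirect enum TypeModel {
    case builtin(String)                           // an already encoded type such as "Si"
    case array(TypeModel)                          // [T]
    case tuple([(name: String?, type: TypeModel)]) // elements with optional names
}

func encode(_ type: TypeModel) -> String {
    switch type {
    case .builtin(let code):
        return code
    case .array(let element):
        return "GSa" + encode(element) + "_"
    case .tuple(let elements):
        // Assumption 1: a single unnamed element tuple is encoded as its element.
        if elements.count == 1 && elements[0].name == nil {
            return encode(elements[0].type)
        }
        var encoding = "T"
        for element in elements {
            if let name = element.name {
                encoding += String(name.utf8.count) + name
            }
            encoding += encode(element.type)
        }
        return encoding + "_"
    }
}

// Assumption 2: the parameters are encoded as though they comprised a tuple type.
func encodeFunction(parameters: [(name: String?, type: TypeModel)],
                    returnType: TypeModel) -> String {
    return "F" + encode(.tuple(parameters)) + encode(returnType)
}

let int = TypeModel.builtin("Si")
let void = TypeModel.tuple([])

// Two Int parameters and one (Int, Int) parameter end up with the same encoding.
let twoParameters = encodeFunction(
    parameters: [(name: nil, type: int), (name: nil, type: int)],
    returnType: void)
let oneTupleParameter = encodeFunction(
    parameters: [(name: nil, type: .tuple([(name: nil, type: int), (name: nil, type: int)]))],
    returnType: void)
print(twoParameters, oneTupleParameter)
```

The collision is not coded anywhere explicitly. It is simply what happens when the single tuple parameter is flattened by the first assumption.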


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

October 27, 2014

So Swift Then: More Fun With Mangled Names

What follows is an attempt to determine

  1. whether the function name mangling machine (FNMM) has something against single unnamed element tuple types only when they appear in vanilla parameters or whether it just objects to them on principle wherever they appear

  2. whether it has any idiosyncrasies with respect to any of the other possible tuple types.

There are effectively four kinds of tuple.

  • the empty tuple

  • unnamed element tuples, i.e., tuples where all the elements are unnamed

  • named element tuples, i.e., tuples where all the elements are named

  • mixed element tuples, i.e., tuples which are a mixture of named and unnamed elements

Unnamed and named tuples can further be divided into single and multiple element cases.

Tuple types can appear in the declarations of return types and of parameter types and as components of other type declarations.

1.0 Return Types

1.1 Void

We already know that the FNMM encodes the empty tuple type as

    T_

whenever it is used as a return type.

1.2 Unnamed Element Tuples

1.2.1 Single Element

Compiling

    func cod() -> (Int)
    {
        return (0)
    }

gives us the symbol

     __TF4xper3codFT_Si

so the FNMM also flattens a single element tuple type when it is used as a return type.

This means that there is a collision between the mangled names of functions which return

    T

and those which return

    (T)

and have identical parameters.

In practice it turns out that the compiler considers them to be the same function when it can see both of them at the same time

    $ swiftc -module-name xper -emit-library functions03.swift
    functions03.swift:8:6: error: invalid redeclaration of 'cod()'
    func cod() -> (Int)
         ^
    functions03.swift:3:6: note: 'cod()' previously declared here
    func cod() -> Int
         ^

Then there are optionals.

Compiling

    func cod() -> (Int)?
    {
        return (0)
    }

gives us the symbol

     __TF4xper3codFT_GSqSi_

The FNMM has flattened the tuple type again.

Compiling

    func cod() -> (Int)!
    {
        return (0)
    }

gives us the symbol

     __TF4xper3codFT_GSQSi_

At least the FNMM is consistent, but is it recursive ?

Compiling

    func cod() -> (((Int)))
    {
        return (((0)))
    }

gives us the symbol

     __TF4xper3codFT_Si

so the FNMM is indeed recursive.

Compiling

    func cod() -> [(Int)]
    {
        return []
    }

gives us the symbol

     __TF4xper3codFT_GSaSi_

so the FNMM is not easily fooled either.

1.2.2 Multi Element

Compiling

    func cod() -> (Int, String, Int)
    {
        return (1, "", 5)
    }

gives us the symbol

     __TF4xper3codFT_TSiSSSi_

The FNMM does not treat a multiple unnamed element tuple type specially when it is used as a return type.

1.3 Named Tuples

1.3.1 Single Element

Compiling

    func cod() -> (i: Int)
    {
        return (i: 0)
    }

doesn't.

Instead this happens

    $ swiftc -module-name xper -emit-library  functions03.swift
    functions03.swift:40:16: error: cannot create a single-element tuple with an element label
    func cod() -> (i: Int)
                   ^~~

so the FNMM never gets the chance to encode the return type.

The compiler error message is a tad misleading.

You CAN create a single named element tuple, for example, this will compile

    func cod() -> (Int)
    {
        let t = (i: 0)

        return t
    }

you just cannot explicitly specify its type, so this version will not compile

    func cod() -> (Int)
    {
        let t : (i: Int) = (i: 0)
    
        return t
    }

and the reason for that is probably because that isn’t its type.

Note the return type of the function and the fact that this compiles

    func cod() -> (Int)
    {
        let t : (Int) = (i: 0)
    
        return t
    }

1.3.2 Multi Element

Compiling

    func cod() -> (i: Int, j: Int)
    {
        return (i: 0, j: 0)
    }

gives us the symbol

     __TF4xper3codFT_T1iSi1jSi_

The FNMM does not treat a multiple named element tuple type specially when it is used as a return type.

Of course, compiling

    func cod() -> (i: (Int), j: (Int))
    {
        return (i: 0, j: 0)
    }

also gives us the symbol

     __TF4xper3codFT_T1iSi1jSi_

Luckily this is not a problem as the compiler, as in the case above, considers them to be one and the same function.

    $ swiftc  -module-name xper -emit-library somefuncs.swift
    somefuncs.swift:7:6: error: invalid redeclaration of 'cod()'
    func cod() -> (i: (Int), j: (Int))
         ^
    somefuncs.swift:1:6: note: 'cod()' previously declared here
    func cod() -> (i: Int, j: Int)
         ^

1.4 Mixed Element Tuples

Compiling

    func cod() -> (s: String, String)
    {
        return (s: "", "")
    }

gives us the symbol

     __TF4xper3codFT_T1sSSSS_

The FNMM does not treat a mixed element tuple type specially when it is used as a return type.

2.0 Parameters

2.1 Vanilla

2.1.1 Void

Compiling

    func cod(t:())
    {
    }

gives us the symbol

     __TF4xper3codFT_T_

A function that takes a Void argument is the same as a function that takes no arguments at all.

Digression

Of course, a function that takes more than one Void argument is not the same as a function that takes no arguments at all.

Compiling

    func cod(v0:(), v1:(), v2:())
    {
    }

gives us the symbol

    __TF4xper3codFTT_T_T__T_

Discuss.

End of digression

2.1.2 Unnamed Element Tuples

2.1.2.1 Single Element

We already know the answer to this one.

The FNMM flattens a single unnamed element tuple type when it appears as the type of a vanilla parameter.

2.1.2.2 Multi Element

Compiling

    func cod(t:(Int, Int))
    {
    }

gives us the symbol

    __TF4xper3codFTSiSi_T_

The FNMM does not treat a multiple unnamed element tuple type specially when it is used as the type of a vanilla parameter.

2.1.3 Named Tuples

2.1.3.1 Single Element

As in the return type case above, not supported by the compiler.

2.1.3.2 Multi Element

Compiling

    func cod(t:(x:Int, y:Int))
    {
    }

gives us the symbol

    __TF4xper3codFT1xSi1ySi_T_

The FNMM does not treat a multiple named element tuple type specially when it is used as the type of a vanilla parameter.

2.1.4 Mixed Element Tuples

Compiling

    func cod(t:(x:Int, String, y:Int))
    {
    }

gives us the symbol

    __TF4xper3codFT1xSiSS1ySi_T_

The FNMM does not treat a mixed element tuple type specially when it is used as the type of a vanilla parameter.

2.2 External Names

2.2.1 Void

Compiling

    func cod(e t:())
    {
    }

gives us the symbol

     __TF4xper3codFT1eT__T_

How do you call it ?

Like this, obviously

    cod(e:())

2.2.2 Unnamed Element Tuples

2.2.2.1 Single Element

Compiling

    func cod(e t:(Int))
    {
    }

gives us the symbol

     __TF4xper3codFT1eSi_T_

The FNMM has flattened the single unnamed element tuple type as usual.

2.2.2.2 Multi Element

Compiling

    func cod(e t:(Int, Int))
    {
    }

gives us the symbol

     __TF4xper3codFT1eTSiSi__T_

The FNMM does not treat a multiple unnamed element tuple type specially when it is used as the type of a vanilla parameter with an external name.

2.2.3 Named Element Tuples

2.2.3.1 Single Element

Not supported by the compiler.

2.2.3.2 Multi Element

Compiling

    func cod(e t:(i: Int, j:Int))
    {
    }

gives us the symbol

     __TF4xper3codFT1eT1iSi1jSi__T_

The FNMM does not treat a multiple named element tuple type specially when it is used as the type of a vanilla parameter with an external name.

2.2.4 Mixed Element Tuples

Compiling

    func cod(e t:(x:Int, String, y:Int))
    {
    }

gives us the symbol

    __TF4xper3codFT1eT1xSiSS1ySi__T_

The FNMM does not treat a mixed element tuple type specially when it is used as the type of a vanilla parameter with an external name.

2.3 inout

2.3.1 Void

Compiling

    func cod(inout t:())
    {
    }

gives us the symbol

     __TF4xper3codFRT_T_

Yes you can call it.

You need a mutable empty tuple of course.

    $ swift
    Welcome to Swift!  Type :help for assistance.
      1> func cod(inout t:()) { println(t) }
      2> func dab()
      3. {
      4.     var empty : () = ()
      5.
      6.     cod(&empty)
      7. }
      8> dab()
    ()
      9>
      

You can have an external name as well if you want.

Compiling

    func cod(inout e t:())
    {
    }

gives us the symbol

     __TF4xper3codFT1eRT__T_

2.3.2 Unnamed Element Tuples

2.3.2.1 Single Element

Compiling

    func cod(inout t:(Int))
    {
    }

gives us the symbol

     __TF4xper3codFRSiT_

and compiling

    func cod(inout e t:(Int))
    {
    }

gives us the symbol

     __TF4xper3codFT1eRSi_T_

In both cases the FNMM has flattened the single unnamed element tuple type as usual.

2.3.2.2 Multi Element

Compiling

    func cod(inout t:(Int, Int))
    {
    }

gives us the symbol

     __TF4xper3codFRTSiSi_T_

and compiling

    func cod(inout e t:(Int, Int))
    {
    }

gives us the symbol

     __TF4xper3codFT1eRTSiSi__T_

The FNMM does not treat a multiple unnamed element tuple type specially when it is used as the type of an inout parameter with or without an external name.

2.3.3 Named Element Tuples

2.3.3.1 Single Element

Not supported by the compiler.

2.3.3.2 Multi Element

Compiling

    func cod(inout t:(x:Int, y:Int))
    {
    }

gives us the symbol

     __TF4xper3codFRT1xSi1ySi_T_

and compiling

    func cod(inout e t:(x:Int, y:Int))
    {
    }

gives us the symbol

     __TF4xper3codFT1eRT1xSi1ySi__T_

The FNMM does not treat a multiple named element tuple type specially when it is used as the type of an inout parameter with or without an external name.

2.3.4 Mixed Element Tuples

Compiling

    func cod(inout t:(x:Int, String, y:Int))
    {
    }

gives us the symbol

    __TF4xper3codFRT1xSiSS1ySi_T_

and compiling

    func cod(inout e t:(x:Int, String, y:Int))
    {
    }

gives us the symbol

    __TF4xper3codFT1eRT1xSiSS1ySi__T_

The FNMM does not treat a mixed element tuple type specially when it is used as the type of an inout parameter with or without an external name.

2.4 variadic

2.4.1 Void

Compiling

    func cod(t:()...)
    {
    }

gives us the symbol

     __TF4xper3codFtGSaT___T_

and you can call it too.

You end up with an array of empty tuples.

Compiling

    func cod(voids t:()...)
    {
    }

gives us the symbol

     __TF4xper3codFt5voidsGSaT___T_

Needless to say you can call that one as well.

2.4.2 Unnamed Element Tuples

2.4.2.1 Single Element

Compiling

    func cod(t:(Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFtGSaSi__T_

and compiling

    func cod(ints t:(Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFt4intsGSaSi__T_

In both cases the FNMM has flattened the single element tuple type as usual.

2.4.2.2 Multi Element

Compiling

    func cod(t:(Int, Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFtGSaTSiSi___T_

and compiling

    func cod(e t:(Int, Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFt1eGSaTSiSi___T_

The FNMM does not treat a multiple unnamed element tuple type specially when it is used as the type of a variadic parameter with or without an external name.

2.4.3 Named Element Tuples

2.4.3.1 Single Element

Not supported by the compiler.

2.4.3.2 Multi Element

Compiling

    func cod(t:(i:Int, j:Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFtGSaT1iSi1jSi___T_

and compiling

    func cod(e t:(i:Int, j:Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFt1eGSaT1iSi1jSi___T_

The FNMM does not treat a multiple named element tuple type specially when it is used as the type of a variadic parameter with or without an external name.

2.4.4 Mixed Element Tuples

Compiling

    func cod(t:(x:Int, String, y:Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFtGSaT1xSiSS1ySi___T_

and compiling

    func cod(e t:(x:Int, String, y:Int)...)
    {
    }

gives us the symbol

     __TF4xper3codFt1eGSaT1xSiSS1ySi___T_

The FNMM does not treat a mixed element tuple type specially when it is used as the type of a variadic parameter with or without an external name.

3.0 Conclusion

The FNMM simply cannot abide single unnamed element tuple types. Either that or they don’t exist.
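
For what it is worth, the 'they don't exist' reading can be checked directly, at least in current Swift, where wrapping a type in parentheses does not produce a distinct tuple type. The following is a small sketch written in current syntax, so it says nothing definite about what the Xcode 6.1 compiler does internally, and the constant names are mine.

```swift
// Wrapping a type in parentheses does not create a new type, so no
// conversion is needed when moving between the different spellings.
let plain: Int = 0
let wrapped: (Int) = plain
let nested: (((Int))) = wrapped

// Comparing the metatypes shows that the spellings name the same type.
print((Int).self == Int.self)
print((((Int))).self == Int.self)
print([(Int)].self == [Int].self)
print(type(of: nested))
```

If the parentheses never produce a tuple type in the first place then there is nothing for the FNMM to flatten.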

Postscript

While ploughing through that lot it occurred to me that the FNMM’s dislike of single unnamed element tuple types is at the bottom of the way in which it encodes a function’s parameters.

Unfortunately the explanation is too long to fit into the margin of this post.


October 24, 2014

So Swift Then: Fun With Mangled Names Continued

Continuing where we left off in last week’s thrilling installment, that is, with the encoding of function parameters in mangled function names.

While the return type of a function is necessarily a type and nothing but a type, Swift function parameters come in a variety of different flavours.

Some of these flavours affect how a function is called and what it can be called with.

Mangled function names are potentially used by the linker so the names must include an encoding of the relevant information about the function parameters which affect how the function can be called.

A basic vanilla function parameter is of the form

    'name' ":" 'type'

for example

    func cod(b: Bool)

In the case of a vanilla parameter only the type is significant so we would expect that the parameter would appear in a mangled function name as the encoding of its type.

A vanilla parameter can be augmented with an ‘external name’ which is declared before the ‘name’, for example

    func cod(e b: Bool)

The ‘external name’ must appear in any call to the function, for example

    cod(e: true)

In this case, since the external name is significant, we would expect both it and the type to appear in a mangled function name in some way.

By default function parameters are immutable. A function parameter can be made mutable using the

    var

keyword, for example

    func cod(var b: Bool)

The fact that a parameter of a function is mutable is not apparent to a caller of that function, consequently we would not expect the encoding of a mutable parameter to be any different from that of an immutable one.

A function parameter can be made mutable such that changes to it are visible to the caller using the

    inout

keyword, for example

    func cod(inout b: Bool)

A call to this function would look like this

    cod(&flag)

and flag must be mutable.

In this case we would expect the encoding to comprise the encoding of the parameter type plus something to indicate that it is an inout parameter.

A vanilla parameter can have a default value added, like so

    func cod(b: Bool = true)

A parameter with a default value must have an ‘external name’. If it does not have an explicit one, the ‘name’ is also used as the ‘external name’.

A call to the function above with a value would look like this

    cod(b:true)

and without one, so the default value is used, like this

    cod()

While the presence of a default value does affect how the function can be called, the effect is to define a function with two different entry points.

It is easier for the compiler to handle this by generating what looks like two different functions than for the linker to edit compiled code at the call site.

Hence, in this case we would only expect the ‘external name’ and the parameter type to appear in the parameter encoding.

A Swift function can be defined to take a variable number of parameters like so

    func cod(flags: Bool...)

In the body of the function

    flags

has the type

    [Bool]

One candidate for the encoding of this kind of parameter would be the encoding of the appropriate array type.

Vanilla Parameters

Compiling

    func cod(b: Bool)
    {
    }

gives us the symbol

    __TF4xper3codFSbT_

Compiling

    func cod(b: Bool, c: Bool)
    {
    }

gives us the symbol

    __TF4xper3codFTSbSb_T_

Compiling

    func cod(b: Bool, c: Bool, d: Bool)
    {
    }

gives us the symbol

    __TF4xper3codFTSbSbSb_T_

On the basis of these three examples so far we can infer the following rules for the encoding of the parameters of a function with N vanilla parameters.

If N == 0 the parameters are represented by the encoding of the empty tuple type

    ()

If N == 1 the parameters are represented by the encoding of

    T

where T is the type of the single parameter.

If N > 1 then the parameters are represented by the encoding of the tuple type

    (T[0], ..., T[N-1])

where T[i] is the type of the i'th parameter.
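
Written out as code the three rules look like this. It is only a sketch of the rules as inferred so far: the function name is hypothetical, the parameter types are passed in already encoded, for example "Sb" for Bool, and it uses current Swift syntax rather than that of Xcode 6.1.

```swift
// Hypothetical helper: takes the already encoded type of each vanilla
// parameter and applies the three inferred rules.
func encodeVanillaParameters(_ types: [String]) -> String {
    switch types.count {
    case 0:
        return "T_"                       // the empty tuple type ()
    case 1:
        return types[0]                   // just T, with no tuple wrapper
    default:
        return "T" + types.joined() + "_" // the tuple type (T[0], ..., T[N-1])
    }
}

print(encodeVanillaParameters([]))
print(encodeVanillaParameters(["Sb"]))
print(encodeVanillaParameters(["Sb", "Sb", "Sb"]))
```

The middle case is the only one that does not produce something starting with 'T' and ending with '_'.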

As it stands the N == 1 case is a bit odd. Why is it a special case ? What if the single parameter has a tuple type ?

Compiling this

    func cod(b: (Bool))
    {
    }

gives us the symbol

    __TF4xper3codFSbT_

NOT

    __TF4xper3codFTSb_T_

which is surprising but it does mean that the rule for N == 1 holds.

Compiling this

    func cod(b: (Bool,Bool))
    {
    }

gives us the symbol

    __TF4xper3codFTSbSb_T_

which is even more surprising.

Two functions with different numbers of parameters end up with the same mangled name.

That cannot be right.

How can you have both of the functions in the same library ?

What happens if you compile both of them in the same file ?

    $ swiftc -module-name xper -emit-library funcs.swift
    Basic Block in function '_TF4xper3codFTSbSb_T_' does not have terminator!
    label %entry1
    LLVM ERROR: Broken function found, compilation aborted!

That's not good.

What about putting them in different files ?

    $ swiftc -module-name xper -emit-library func01.swift func02.swift
    duplicate symbol __TF4xper3codFTSbSb_T_ in:
        [..]/func01-b2b04f.o
        [..]/func02-5b9bf6.o
    ld: 1 duplicate symbol for architecture x86_64
    <unknown>:0: error: linker command failed with exit code 1 (use -v to see invocation)

That's not good either.

OK, them's the rules and they are broken.

Sidles away nonchalantly, hands in pockets, hoping no one is going to ask him to pay for the damage.

Parameter Type Substitution Syntax

There is one slight twist with the encoding of function parameter types even in the all vanilla parameters case.

Compiling

    func cod(c1: Character, c2: Character)
    {
    }

gives us the symbol

    __TF4xper3codFTOSs9CharacterS0__T_

rather than

    __TF4xper3codFTOSs9CharacterOSs9Character_T_

If we read

    S0_

as substitute the 0th parameter type then it makes sense.

External Names

Adding an external name to our first example

    func cod(e b: Bool)
    {
    }

and compiling gives us the symbol

    __TF4xper3codFT1eSb_T_

We now have the function's parameters represented by

    T1eSb_

which looks like the encoding of the named tuple type

    (e:Bool)

Adding external names to our second example

    func cod(e b: Bool, f c: Bool)
    {
    }

and compiling gives us the symbol

    __TF4xper3codFT1eSb1fSb_T_

and we now have the function's parameters represented by an encoding of the named tuple type

    (e:Bool, f:Bool)

Adding external names to the first two parameters of our third example

    func cod(e b: Bool, f c: Bool, d: Bool)
    {
    }

and compiling gives us the symbol

    __TF4xper3codFT1eSb1fSbSb_T_

and we now have the function's parameters represented by an encoding of the hybrid named/unnamed tuple type

    (e:Bool, f:Bool, Bool)

as you might expect.

Mutable Parameters

Modifying our first example once more

    func cod(var b: Bool)
    {
    }

and compiling gives us the symbol

    __TF4xper3codFSbT_

As expected, the presence of the var keyword has no effect on the encoding of the parameter.

inout Parameters

Compiling

    func cod(inout b: Bool)
    {
    }

gives us the symbol

    __TF4xper3codFRSbT_

and

    RSb

for the encoding of the parameter with an

    R

for 'Reference' or 'Recondite' or something.

An inout parameter can have an 'external name'.

Compiling

    func cod(inout e b: Bool)
    {
    }

gives us the symbol

    __TF4xper3codFT1eRSb_T_

and

    T1eRSb_

for the encoding of the parameter.

Default Values

Compiling

    func cod(b: Bool = true)
    {
    }

gives us the symbol

    __TF4xper3codFT1bSb_T_

and a second symbol

    __TIF4xper3codFT1bSb_T_A_

The first symbol is the same as the symbol for the function

    func cod(b b: Bool)
    {
    }

and would enable the linker to identify the entry point for calls to the function made with a value.

The second symbol presumably enables the linker to identify the entry point for the calls to the function made without a value.

Adding another parameter

    func cod(s:String, b: Bool = true)
    {
    }

and compiling gives us the two symbols

    __TF4xper3codFTSS1bSb_T_

and

    __TIF4xper3codFTSS1bSb_T_A0_

because there are still only two ways to call the function, for example

    cod("")

and

    cod("", b:false)

Compiling a function with two parameters both with default values

    func cod(i:Int = 2, b: Bool = true)
    {
    }

gives us three symbols

    __TF4xper3codFT1iSi1bSb_T_
    __TIF4xper3codFT1iSi1bSb_T_A0_
    __TIF4xper3codFT1iSi1bSb_T_A_

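that is, one for the function itself and, it would seem, one for each parameter with a default value, since there are actually four ways to call the function.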

In all these examples it looks as though it is the 'A' suffix on the 'secondary' symbols which identifies the parameter value or values which are being defaulted.

Variadic Parameters

Compiling

    func cod(flags: Bool...)
    {
    }

gives us the symbol

    __TF4xper3codFtGSaSb__T_

and

    tGSaSb__

for the encoding of the parameter.

We have

    GSaSb_

for

    [Bool]

but 't' rather than 'T' for the tuple.

Adding another parameter

    func  cod(s: String, flags:  Bool...)
    {
    }

and compiling gives us the symbol

    __TF4xper3codFtSSGSaSb__T_

A variadic parameter can have an 'external name'

    func  cod(flags f:  Bool...)
    {
    }

Compiling this gives us the symbol

    __TF4xper3codFt5flagsGSaSb__T_

The 'external name' is now present as we would expect.

In all these examples the 't' remains resolutely lower-case so it looks as though it is connected with the presence of the variadic parameter.
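
Going purely on the symbols seen so far, the encoding of a parameter list containing a variadic parameter can be sketched as follows. The Parameter struct and the function name are hypothetical, the types are passed in already encoded, and the rule itself is simply inferred from the examples in this post. The syntax is that of current Swift rather than Xcode 6.1.

```swift
// Hypothetical parameter description; the names are mine, not the compiler's.
struct Parameter {
    var externalName: String? = nil
    var type: String          // the already encoded type, e.g. "Sb" for Bool
    var isVariadic = false
}

// Encodes a parameter list that contains a variadic parameter. In the symbols
// seen so far such a list always starts with a lower-case 't', even when the
// variadic parameter is the only one.
func encodeVariadicParameterList(_ parameters: [Parameter]) -> String {
    var encoding = "t"
    for parameter in parameters {
        if let name = parameter.externalName {
            encoding += String(name.utf8.count) + name
        }
        // The variadic parameter is encoded as an array of its type.
        encoding += parameter.isVariadic ? "GSa" + parameter.type + "_" : parameter.type
    }
    return encoding + "_"
}

print(encodeVariadicParameterList([Parameter(type: "Sb", isVariadic: true)]))
print(encodeVariadicParameterList([Parameter(type: "SS"),
                                   Parameter(type: "Sb", isVariadic: true)]))
print(encodeVariadicParameterList([Parameter(externalName: "flags", type: "Sb", isVariadic: true)]))
```

Note that, unlike the vanilla case, there is no special treatment of a single parameter here, which is at least consistent with the three symbols above.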

Not The Summary

This is another post that is now way too long so it's time to call a halt.

Coming up next time, what has the function name mangling machine got against single element type tuples ?


October 23, 2014

So Swift Then: Fun With Mangled Names

For a while writing about Swift was something of a Sisyphean task as features kept changing either just after I had posted about them, or just before I was about to, so in the end I decided to wait for the dust to settle before writing anything else.

With the advent of iOS 8.1, OS X 10.10 and Xcode 6.1 there is now a version of Swift which cannot keep changing.

I am not assuming that there will not be any more changes, it is just that for the moment I am basing everything on the version available in Xcode 6.1.

As is my wont I have started trying to do something ‘real’ in Swift and while rummaging around trying to work out how to do something I stumbled upon Swift’s version of mangled names.

These turn out to be quite interesting in what they reveal about how Swift works, or at least how it currently works, and they also turn out to be a good way of gaining an understanding of certain aspects of Swift as a language.

Swift mangled names can be found by looking at the symbols in Swift dynamic libraries.

If you have a Swift file you can turn it into a Swift dynamic library by doing something like this

    swiftc -emit-library -module-name xper functions.swift

You can then use nm to look at the resulting symbols.

The simplest possible Swift function definition is this

    func cod() -> Void
    {
    }

which defines a function which takes no arguments and returns nothing.

When a Swift function returns Void the return type can be omitted so the simplest possible Swift function definition is actually

    func cod()
    {
    }

Compiling this in the module xper gives us the symbol

    __TF4xper3codFT_T_

which unsurprisingly doesn't tell us a great deal, although we might conjecture that names are encoded as their length in ASCII followed by the characters, hence

    4xper

and

    3cod
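
If the conjecture is right then encoding and decoding names is straightforward. The following is a minimal sketch under that assumption, with hypothetical function names, which treats names as plain ASCII and is written in current Swift syntax rather than that of Xcode 6.1.

```swift
// Encodes a name as its length in decimal followed by its characters.
func encodeName(_ name: String) -> String {
    return String(name.utf8.count) + name
}

// Reads one length-prefixed name from the front of the text and returns it
// together with whatever follows, or nil if the text does not start with one.
// Assumption: the names are plain ASCII.
func decodeName(_ text: Substring) -> (name: String, rest: Substring)? {
    let digits = text.prefix(while: { $0.isASCII && $0.isNumber })
    guard let length = Int(digits), length > 0 else { return nil }
    let body = text.dropFirst(digits.count)
    guard body.count >= length else { return nil }
    return (String(body.prefix(length)), body.dropFirst(length))
}

// Peel the names off the front of the symbol, minus its "__TF" prefix.
var remaining: Substring = "4xper3codFT_T_"
while let decoded = decodeName(remaining) {
    print(decoded.name)
    remaining = decoded.rest
}
print(remaining)
```

Whatever is left over once the names have been removed is the part that remains to be explained.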

What if we try returning something ?

Compiling the definition

    func cod() -> Bool
    {
        return true
    }

gives us the symbol

    __TF4xper3codFT_Sb

The trailing

    T_

has been replaced by

    Sb

so it looks like the return type is at the end.

Let's try returning a Character instead.

Compiling the definition

    func cod() -> Character
    {
        return "A"
    }

gives us the symbol

    __TF4xper3codFT_OSs9Character

Not sure what that is about but the name 'Character' is encoded as we would expect and it is definitely at the end.

Trying some more return types

    func cod() -> Double
    {
        return 0.0
    }

gives us the symbol

    __TF4xper3codFT_Sd

while

    func cod() -> Int
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_Si

and

    func cod() -> UInt
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_Su

So far so good. Everything is at the end.

Some size specific integers.

    func cod() -> Int16
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs5Int16

and

    func cod() -> Int32
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs5Int32

OK, so not a lot like the Int case.

What about the unsigned versions ?

    func cod() -> UInt16
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs6UInt16

and

    func cod() -> UInt32
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs6UInt32

Not making a lot of progress right now. Although all the signed and unsigned integer types seem to be encoded as the same kind of 'something' it is not currently obvious what the 'something' is.

Let's try something different.

What about returning an array ?

    func cod() -> [Int]
    {
        return []
    }

gives us the symbol

    __TF4xper3codFT_GSaSi_

We've got an

    Si

at least.

Presumably the

    GSa

prefix is the code for 'array'.

We have also got a

    _

suffix.

A dictionary ?

    func cod() -> [Int: Int]
    {
        return [Int: Int]()
    }

gives us the symbol

    __TF4xper3codFT_GVSs10DictionarySiSi_

Now we've got a

    VSs

again, albeit with a 'G' prefix.

We've also got

    Si

twice which does at least make sense, and another

    _

suffix.

How about a tuple ?

    func cod() -> (Int, Int)
    {
        return (0, 1)
    }

gives us the symbol

    __TF4xper3codFT_TSiSi_

We've got a

    T

prefix, followed by

    Si

twice, corresponding to the tuple element types, followed by a

    _

suffix, which is interesting, because if

    TSiSi_

encodes

    (Int, Int)

then presumably

    T_

encodes

    ()

Given that Void is simply an alias for the empty tuple

    ()

then we would expect the return type of a function with a Void return type to be encoded as

    T_

and if we look at the first example we see that it is.

Note also that in every example to date the encoding of the return type has been preceded by

    T_

and in every example to date the function has no arguments.

Carrying on with return types for the moment.

What about returning a String ?

    func cod() -> String
    {
        return ""
    }

gives us the symbol

     __TF4xper3codFT_SS

String is analogous to Bool, Double, Int and UInt it would appear.
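
Collecting the two letter codes seen so far into a table gives something like the following. It is a sketch, not a description of what the compiler does: the table and the function are hypothetical, they cover only the types that have turned up so far, and the syntax is that of current Swift rather than Xcode 6.1.

```swift
// The two letter codes seen so far; every other type has a longer encoding.
let shortCodes: [String: String] = [
    "Bool": "Sb", "Double": "Sd", "Int": "Si", "UInt": "Su", "String": "SS",
]

// Hypothetical helper: builds the symbol for a function which takes no
// arguments and returns one of the types in the table.
func symbol(module: String, function: String, returning type: String) -> String? {
    guard let code = shortCodes[type] else { return nil }
    let names = String(module.utf8.count) + module
              + String(function.utf8.count) + function
    return "__TF" + names + "FT_" + code
}

print(symbol(module: "xper", function: "cod", returning: "String") ?? "no short code")
print(symbol(module: "xper", function: "cod", returning: "Int16") ?? "no short code")
```

The second call fails because Int16 is one of the types that does not have a short code.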

Time to try returning some non-builtin defined types

Given the class Thing then

    func cod() -> Thing
    {
        return Thing()
    }

gives us the symbol

    __TF4xper3codFT_CS_5Thing

which gives us

    C

for class presumably.

In addition to classes there are protocols, so let's return one.

Given the protocol ByteSource implemented by the class ByteBuffer then compiling

    func cod() -> ByteSource
    {
        return ByteBuffer()
    }

gives us the symbol

    __TF4xper3codFT_PS_10ByteSource_

so that's

    P

for 'protocol' then, except that we have a '_' suffix which implies that there can be more than one protocol name so it's really 'protocols'.

Compiling this, where ByteSink is an additional protocol and the class ByteBuffer now implements both ByteSink and ByteSource

    func cod() -> protocol<ByteSource,ByteSink>
    {
        return ByteBuffer()
    }

duly gives us the symbol

    __TF4xper3codFT_PS_8ByteSinkS_10ByteSource_

Then there is the 'no protocol' case

    func cod() -> protocol<>
    {
        return 0
    }

duly gives us the symbol

    __TF4xper3codFT_P_

If you are wondering what you can actually do with the result of that function the answer is anything that you can do with something of type Any.

The type

    Any

is simply an alias for

    protocol<>
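
For anyone following along with a more recent compiler, here is a sketch of the same two functions in later Swift syntax, where protocol<ByteSource,ByteSink> is spelled ByteSource & ByteSink and the empty composition is written simply as Any. The protocol members read and write, and the function name anything, are assumptions made up for this example, since the protocols' contents are not shown here. The mangling scheme has also changed since this was written, so a current compiler will not produce the symbols shown in this post.

```swift
// Hypothetical protocol members; the post does not show them.
protocol ByteSource { func read() -> UInt8? }
protocol ByteSink { func write(_ byte: UInt8) }

final class ByteBuffer: ByteSource, ByteSink {
    private var bytes: [UInt8] = []
    func write(_ byte: UInt8) { bytes.append(byte) }
    func read() -> UInt8? { return bytes.isEmpty ? nil : bytes.removeFirst() }
}

// Formerly protocol<ByteSource,ByteSink>.
func cod() -> ByteSource & ByteSink {
    return ByteBuffer()
}

// Formerly protocol<>, the type that Any aliases.
func anything() -> Any {
    return 0
}

let buffer = cod()
buffer.write(42)
if let byte = buffer.read() { print(byte) }          // 42
if let number = anything() as? Int { print(number) } // 0
```

The result of cod can be used as both a source and a sink, while the result of anything has to be cast before anything useful can be done with it.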

But I digress.

Onwards with enums.

Given an enum Element then compiling

    func cod() -> Element
    {
        return Element.He
    }

gives us the symbol

    __TF4xper3codFT_OS_7Element

Interestingly we've seen something like this before.

The encoding for Character is

    OSs9Character

so we've got

    O S_ 7Element

and

    O Ss 9Character

If

    O

is the type prefix for an enum, then we have

    "O" 'something' 'enum name'

We've also seen

    "C" 'something' 'class name'

in the class example above, and

    "P" 'something' 'protocol name' "_"

in the protocol example above.

No idea about the 'something' as yet, so moving right along.

What about a struct ?

Given an empty struct AnotherThing

    func cod() -> AnotherThing
    {
        return AnotherThing()
    }

gives us the symbol

    __TF4xper3codFT_VS_12AnotherThing

We've seen some types with a 'V' prefix before, namely

  • VSs5Int16

  • VSs5Int32

  • VSs6UInt16

  • VSs6UInt32

as well as something that might have either a 'GV' or a 'V' prefix

    GVSs10DictionarySiSi_

We know that dictionaries and structs are passed by value so 'V' might mean value, but so are arrays, and the array type encoding we have seen does not have a 'V' prefix.

We also know that explicitly sized signed and unsigned integer types are actually structs so for the moment we will assume that

    V

is the type prefix for struct.

We now have four type encodings of the form

    'type prefix' 'something' 'type name'

In the class and protocol case the 'something' is

    S_

In the enum cases the 'something' is either

    S_

or

    Ss

The same thing is true in the struct cases

In all the examples to date the 'something' is always

    S_

when the type is local to the module and

    Ss

when it is a built-in type.

It looks as though 'something' might be the module name where

    S_

is 'this module' and

    Ss

is 'Swift'

We can try and confirm this by moving one of the local types into another module.

If we move the Element type into the module other and compile this

    import other
    
    func cod() -> other.Element
    {
        return other.Element.He
    }

we would expect the resulting symbol to be

    __TF4xper3codFT_O5other7Element

and it is.

Since types can be nested in Swift the type encodings are likely to actually be of the form

    'type prefix' 'fully qualified type name'

Compiling

    struct Node
    {
        enum Colour: UInt8
        {
            case Red   = 0
            case Black = 1
        }

        var colour = Colour.Red
    }

    func cod() -> Node.Colour
    {
        return Node.Colour.Black
    }

gives us the symbol

    __TF4xper3codFT_OVS_4Node6Colour

which gives us

    "O" 'fully qualified type name'

where the 'fully qualified name' is three elements long.
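
The nesting can be sketched as a small recursive model. Everything here (Scope, lengthPrefixed, encode) is a made-up name for illustration. It takes the working hypothesis that S_ means 'this module' and Ss means 'Swift' at face value, and it assumes ASCII names so that the character count is the length that gets written out.

```swift
// Hypothetical model: a type is declared either directly in a module
// or inside another class ("C"), enum ("O") or struct ("V").
indirect enum Scope {
    case thisModule       // "S_" under the working hypothesis
    case swift            // "Ss"
    case module(String)   // any other module, length-prefixed
    case nested(kind: Character, name: String, parent: Scope)
}

// "Element" -> "7Element". Assumes an ASCII name.
func lengthPrefixed(_ name: String) -> String {
    return "\(name.count)\(name)"
}

func encode(_ scope: Scope) -> String {
    switch scope {
    case .thisModule:
        return "S_"
    case .swift:
        return "Ss"
    case .module(let name):
        return lengthPrefixed(name)
    case .nested(let kind, let name, let parent):
        // Type prefix, then whatever encloses the type, then its own name.
        return String(kind) + encode(parent) + lengthPrefixed(name)
    }
}

let node = Scope.nested(kind: "V", name: "Node", parent: .thisModule)
let colour = Scope.nested(kind: "O", name: "Colour", parent: node)
print(encode(colour)) // OVS_4Node6Colour
```

The same function reproduces the earlier single-level examples, for instance an enum Element in the module other comes out as O5other7Element.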

What else can a function return ?

There are optionals.

An optional Int

    func cod() -> Int?
    {
        return nil
    }

gives us the symbol

    __TF4xper3codFT_GSqSi_

This looks as though it follows the same pattern as the encoding for array, dictionary, and tuple types, namely

    'type prefix' 'element-type(s)' "_"

What about an optional array ?

    func cod() -> [Int]?
    {
        return nil
    }

gives us the symbol

    __TF4xper3codFT_GSqGSaSi__

which gives us a return type encoding of

    "GSq" 'array type encoding' "_"

as we would expect.

We now have three encodings, array, dictionary and optional, which have a 'G' prefix and a '_' suffix.

All three are generic types so it looks as though their encodings are instances of a more general generic type encoding

Defining the canonical generic type Stack<T> and compiling this

    func cod() -> Stack<Int>
    {
        return Stack<Int>()
    }

gives us the symbol

    __TF4xper3codFT_GVS_5StackSi_

which matches the form of the dictionary type encoding.
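
Putting the generic cases side by side, a sketch of the common shape looks like this. The helper encodeGeneric is a made-up name. It assumes the generic type itself and its type arguments have already been encoded.

```swift
// Hypothetical helper: "G", the encoding of the generic type itself,
// the encodings of its type arguments, then a "_" terminator.
func encodeGeneric(base: String, arguments: [String]) -> String {
    return "G" + base + arguments.joined() + "_"
}

let optionalInt = encodeGeneric(base: "Sq", arguments: ["Si"])        // Int?
let arrayOfInt = encodeGeneric(base: "Sa", arguments: ["Si"])         // [Int]
let optionalArray = encodeGeneric(base: "Sq", arguments: [arrayOfInt]) // [Int]?
let dictionary = encodeGeneric(base: "VSs10Dictionary", arguments: ["Si", "Si"])
let stack = encodeGeneric(base: "VS_5Stack", arguments: ["Si"])       // Stack<Int>

print(optionalArray) // GSqGSaSi__
print(stack)         // GVS_5StackSi_
```

Seen this way, Sa and Sq play the same role for Array and Optional as VS_5Stack does for Stack. They just appear to be shorter spellings.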

Where there is a '?' there is always a '!'

Compiling

    func cod() -> Int!
    {
        return nil
    }

gives us the symbol

    __TF4xper3codFT_GSQSi_

so SQ is to '!' as Sq is to '?'.

What about types ? Can you return a type ? You can access them, so you should be able to return them.

Compiling

    func cod() -> UInt16.Type
    {
        return UInt16.self
    }

gives us the symbol

    __TF4xper3codFT_MVSs6UInt16

which gives us another type prefix

    M

for

    Meta

or something like that.

And of course, functions can return functions

Compiling the not terribly useful functions

    func zero() -> Int
    {
        return 0
    }
    
    func cod() -> () -> Int
    {
        return zero
    }

gives us the symbol

    __TF4xper3codFT_FT_Si

which would appear to give us

    F

as the type prefix for a function.

This post is already way too long so the encoding of function parameters will have to be the next post.

In the meantime here is a summary of what we know so far about how Swift types are encoded in mangled names in the form of an ad-hoc syntax diagram

    type-encoding            := builtin-type
                                |
                                class-type-encoding
                                |
                                enum-type-encoding
                                |
                                function-type-encoding
                                |
                                generic-type-encoding
                                |
                                meta-type-encoding
                                |
                                protocols-type-encoding
                                |
                                struct-type-encoding
                                |
                                tuple-type-encoding
    
                     
    builtin-type             := "SS"    // String
                                |
                                "Sb"    // Bool
                                |
                                "Sd"    // Double
                                |
                                "Si"    // Int
                                |
                                "Su"    // UInt
    
                    
    class-type-encoding      := "C" fully-qualified-name
    
    
    enum-type-encoding       := "O" fully-qualified-name
    
    
    function-type-encoding   := "F" ???? type-encoding
    
    
    generic-type-encoding    := "G" "Sa" type-encoding "_"                  // array
                                |
                                "G" class-type-encoding type-encoding+ "_"  // generic class
                                |
                                "G" enum-type-encoding type-encoding+ "_"   // generic enum
                                |
                                "G" struct-type-encoding type-encoding+ "_" // generic struct
                                |
                                "G" "SQ" type-encoding "_"                  // implicit optional
                                |
                                "G" "Sq" type-encoding "_"                  // optional
                                
                                
    meta-type-encoding       := "M" type-encoding                          // ???? conjecture based on one example ! ????
                                
                                
    protocols-type-encoding  := "P" fully-qualified-name* "_"
    
    
    struct-type-encoding     := "V" fully-qualified-name
    
    
    tuple-type-encoding      := "T" type-encoding* "_"
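
To check that the diagram hangs together, here is a toy reader for it, written in current Swift. It is a sketch under stated assumptions and certainly not the real demangler. All the names (TypeReader, describe and so on) are made up. It assumes that S_ means 'this module' and Ss means 'Swift', that the unknown part of the function type encoding is itself a type encoding, as in the one example seen so far, and that Sa, Sq and SQ can be read like any other type code.

```swift
// A toy reader for the diagram above. Every name here is hypothetical.
struct TypeReader {
    let chars: [Character]
    let thisModule: String
    var pos = 0
    var atEnd: Bool { return pos == chars.count }

    // Consume `code` if the remaining input starts with it.
    mutating func eat(_ code: String) -> Bool {
        guard chars[pos...].starts(with: code) else { return false }
        pos += code.count
        return true
    }

    // Length-prefixed identifier such as "7Element".
    mutating func identifier() -> String? {
        let start = pos
        var length = 0
        while pos < chars.count, let digit = chars[pos].wholeNumberValue {
            length = length * 10 + digit
            pos += 1
        }
        guard pos > start, pos + length <= chars.count else { return nil }
        let name = String(chars[pos..<pos + length])
        pos += length
        return name
    }

    // fully-qualified-name: a module or an enclosing type, then an identifier.
    mutating func qualifiedName() -> String? {
        let context: String?
        if eat("S_") {
            context = thisModule        // assumption: "S_" means 'this module'
        } else if eat("Ss") {
            context = "Swift"
        } else if pos < chars.count, "COV".contains(chars[pos]) {
            pos += 1
            context = qualifiedName()   // nested in a class, enum or struct
        } else {
            context = identifier()      // some other module
        }
        guard let outer = context, let name = identifier() else { return nil }
        return outer + "." + name
    }

    // Type encodings up to and including the terminating "_".
    mutating func typeList() -> [String]? {
        var types: [String] = []
        while !eat("_") {
            guard let next = parseType() else { return nil }
            types.append(next)
        }
        return types
    }

    mutating func parseType() -> String? {
        let codes = [("SS", "String"), ("Sb", "Bool"), ("Sd", "Double"), ("Si", "Int"),
                     ("Su", "UInt"), ("Sa", "Array"), ("Sq", "Optional"),
                     ("SQ", "ImplicitlyUnwrappedOptional")]
        for (code, name) in codes {
            if eat(code) { return name }
        }
        guard pos < chars.count else { return nil }
        let prefix = chars[pos]
        pos += 1
        switch prefix {
        case "C", "O", "V":
            return qualifiedName()
        case "M":
            return parseType().map { $0 + ".Type" }
        case "F":   // assumption: the "????" is itself a type encoding
            guard let input = parseType(), let output = parseType() else { return nil }
            return input + " -> " + output
        case "T":
            return typeList().map { "(" + $0.joined(separator: ", ") + ")" }
        case "P":
            var names: [String] = []
            while !eat("_") {
                guard let name = qualifiedName() else { return nil }
                names.append(name)
            }
            return names.isEmpty ? "Any" : names.joined(separator: " & ")
        case "G":
            guard let generic = parseType(), let arguments = typeList() else { return nil }
            return generic + "<" + arguments.joined(separator: ", ") + ">"
        default:
            return nil
        }
    }
}

func describe(_ encoding: String, thisModule: String = "xper") -> String? {
    var reader = TypeReader(chars: Array(encoding), thisModule: thisModule)
    guard let result = reader.parseType(), reader.atEnd else { return nil }
    return result
}

print(describe("OVS_4Node6Colour") ?? "unrecognised") // xper.Node.Colour
print(describe("GSqGSaSi__") ?? "unrecognised")       // Optional<Array<Int>>
```

The input to describe is just the return type portion of a symbol, that is, everything after the __TF4xper3codFT_ in the examples above. Anything the diagram does not cover comes back as nil rather than a guess.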

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog's author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.
