Just An Application

October 23, 2014

So Swift Then: Fun With Mangled Names

For a while writing about Swift was something of a Sisyphean task as features kept changing either just after I had posted about them, or just before I was about to, so in the end I decided to wait for the dust to settle before writing anything else.

With the advent of iOS 8.1, OS X 10.10 and Xcode 6.1 there is now a version of Swift which cannot keep changing.

I am not assuming that there will be no more changes; it is just that, for the moment, I am basing everything on the version available in Xcode 6.1.

As is my wont, I have started trying to do something ‘real’ in Swift, and while rummaging around trying to work out how to do something I stumbled upon Swift’s version of mangled names.

These turn out to be quite interesting in what they reveal about how Swift works, or at least how it currently works, and they also turn out to be a good way of gaining an understanding of certain aspects of Swift as a language.

Swift mangled names can be found by looking at the symbols in Swift dynamic libraries.

If you have a Swift file you can turn it into a Swift dynamic library by doing something like this

    swiftc -emit-library -module-name xper functions.swift

You can then use nm to look at the resulting symbols.

The simplest possible Swift function definition is this

    func cod() -> Void
    {
    }

which defines a function which takes no arguments and returns nothing.

When a Swift function returns Void the return type can be omitted, so the simplest possible Swift function definition is actually

    func cod()
    {
    }

Compiling this in the module xper gives us the symbol

    __TF4xper3codFT_T_

which unsurprisingly doesn't tell us a great deal, although we might conjecture that names are encoded as their length, written in ASCII decimal digits, followed by the characters themselves, hence

    4xper

and

    3cod
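To make the length-prefix conjecture concrete, here is a minimal sketch that peels such names off the front of the part of the symbol that follows `__TF`. It assumes a name is always a run of ASCII decimal digits followed by exactly that many characters; `readLengthPrefixedNames` is a hypothetical helper, and the code is written in present-day Swift (5.x) rather than the Swift 1.1 of Xcode 6.1.

```swift
/// Hypothetical helper: reads consecutive identifiers of the conjectured form
/// <decimal length><that many characters> from the front of `text`, stopping at
/// the first character that does not start such an identifier.
func readLengthPrefixedNames(_ text: String) -> (names: [String], rest: String) {
    let chars = Array(text)
    var names: [String] = []
    var i = 0
    while i < chars.count {
        // Scan the decimal length, which may have more than one digit.
        var j = i
        var length = 0
        while j < chars.count, chars[j].isASCII, let digit = chars[j].wholeNumberValue {
            length = length * 10 + digit
            j += 1
        }
        // No digits, or a length that runs past the end: stop without consuming.
        if j == i || j + length > chars.count { break }
        names.append(String(chars[j..<(j + length)]))
        i = j + length
    }
    return (names, String(chars[i...]))
}

// The text after "__TF" in the symbol for `func cod()` in the module xper.
let split = readLengthPrefixedNames("4xper3codFT_T_")
print(split.names, split.rest)   // prints ["xper", "cod"] FT_T_
```

It stops at the `F`, which is not a digit, leaving `FT_T_` still to be explained.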

What if we try returning something?

Compiling the definition

    func cod() -> Bool
    {
        return true
    }

gives us the symbol

    __TF4xper3codFT_Sb

The trailing

    T_

has been replaced by

    Sb

so it looks like the return type is at the end.

Let's try returning a Character instead.

Compiling the definition

    func cod() -> Character
    {
        return "A"
    }

gives us the symbol

    __TF4xper3codFT_OSs9Character

Not sure what that is about, but the name 'Character' is encoded as we would expect, and it is definitely at the end.

Trying some more return types

    func cod() -> Double
    {
        return 0.0
    }

gives us the symbol

    __TF4xper3codFT_Sd

while

    func cod() -> Int
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_Si

and

    func cod() -> UInt
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_Su

So far so good. Everything is at the end.
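The codes seen so far fit in a small lookup table. The sketch below assumes, based only on these examples, that for a function with no arguments everything after the first `FT_` is the return type encoding; `builtinCodes` and `returnTypeEncoding(of:)` are hypothetical names, and the code is present-day Swift (5.x), not Swift 1.1.

```swift
/// Two-character return type codes observed so far (hypothetical table).
let builtinCodes: [String: String] = [
    "Sb": "Bool",
    "Sd": "Double",
    "Si": "Int",
    "Su": "UInt",
]

/// Returns whatever follows the first "FT_" in `symbol`, which in the examples
/// so far is the return type encoding, or nil if there is no "FT_" at all.
func returnTypeEncoding(of symbol: String) -> String? {
    let marker = "FT_"
    var rest = Substring(symbol)
    // Walk forward one character at a time until the marker is at the front.
    while !rest.isEmpty {
        if rest.hasPrefix(marker) { return String(rest.dropFirst(marker.count)) }
        rest = rest.dropFirst()
    }
    return nil
}

for symbol in ["__TF4xper3codFT_Sb", "__TF4xper3codFT_Sd", "__TF4xper3codFT_Si", "__TF4xper3codFT_Su"] {
    let code = returnTypeEncoding(of: symbol) ?? ""
    print(symbol, "->", builtinCodes[code] ?? "unknown")
}
// prints e.g. __TF4xper3codFT_Sb -> Bool
```

Searching for `FT_` is deliberately naive: a module or function name that happened to contain those three characters would fool it.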

Some size-specific integers.

    func cod() -> Int16
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs5Int16

and

    func cod() -> Int32
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs5Int32

OK, so not a lot like the Int case.

What about the unsigned versions?

    func cod() -> UInt16
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs6UInt16

and

    func cod() -> UInt32
    {
        return 0
    }

gives us the symbol

    __TF4xper3codFT_VSs6UInt32

Not making a lot of progress right now. Although all the signed and unsigned integer types seem to be encoded as the same kind of 'something' it is not currently obvious what the 'something' is.

Let's try something different.

What about returning an array?

    func cod() -> [Int]
    {
        return []
    }

gives us the symbol

    __TF4xper3codFT_GSaSi_

We've got an

    Si

at least.

Presumably the

    GSa

prefix is the code for 'array'.

We have also got a

    _

suffix.

A dictionary?

    func cod() -> [Int: Int]
    {
        return [Int: Int]()
    }

gives us the symbol

    __TF4xper3codFT_GVSs10DictionarySiSi_

Now we've got a

    VSs

again, albeit with a 'G' prefix.

We've also got

    Si

twice which does at least make sense, and another

    _

suffix.

How about a tuple?

    func cod() -> (Int, Int)
    {
        return (0, 1)
    }

gives us the symbol

    __TF4xper3codFT_TSiSi_

We've got a

    T

prefix, followed by

    Si

twice, corresponding to the tuple element types, followed by a

    _

suffix, which is interesting, because if

    TSiSi_

encodes

    (Int, Int)

then presumably

    T_

encodes

    ()

Given that Void is simply an alias for the empty tuple

    ()

then we would expect the return type of a function with a Void return type to be encoded as

    T_

and if we look at the first example we see that it is.
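If `T`, element types, `_` really is the tuple encoding, a decoder for tuples of the two-letter standard types takes only a few lines. This is a minimal sketch under that assumption; `decodeTuple` and `standardCodes` are hypothetical names, the table only knows the codes seen so far, and the code is present-day Swift (5.x).

```swift
/// Two-character codes seen so far for standard types.
let standardCodes: [String: String] = ["Sb": "Bool", "Sd": "Double", "Si": "Int", "Su": "UInt"]

/// Hypothetical decoder for the conjectured tuple form "T" element* "_",
/// restricted to elements that are two-character standard type codes.
func decodeTuple(_ encoding: String) -> String? {
    let chars = Array(encoding)
    guard chars.count >= 2, chars.first == "T", chars.last == "_" else { return nil }
    var elements: [String] = []
    var i = 1
    // Consume two-character codes until only the closing "_" is left.
    while i < chars.count - 1 {
        guard i + 2 <= chars.count - 1,
              let name = standardCodes[String(chars[i..<(i + 2)])] else { return nil }
        elements.append(name)
        i += 2
    }
    return "(" + elements.joined(separator: ", ") + ")"
}

print(decodeTuple("T_") ?? "?")       // prints ()
print(decodeTuple("TSiSi_") ?? "?")   // prints (Int, Int)
```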

Note also that in every example to date the encoding of the return type has been preceded by

    T_

and in every example to date the function has no arguments.

Carrying on with return types for the moment.

What about returning a String?

    func cod() -> String
    {
        return ""
    }

gives us the symbol

    __TF4xper3codFT_SS

It would appear that String is analogous to Bool, Double, Int and UInt.

Time to try returning some types that are not built in.

Given the class Thing then

    func cod() -> Thing
    {
        return Thing()
    }

gives us the symbol

    __TF4xper3codFT_CS_5Thing

which gives us

    C

for class presumably.

In addition to classes there are protocols, so let's return one.

Given the protocol ByteSource implemented by the class ByteBuffer then compiling

    func cod() -> ByteSource
    {
        return ByteBuffer()
    }

gives us the symbol

    __TF4xper3codFT_PS_10ByteSource_

so that's

    P

for 'protocol' then, except that we have a '_' suffix, which implies that there can be more than one protocol name, so it's really 'protocols'.

Compiling this, where ByteSink is an additional protocol and the class ByteBuffer now implements both ByteSink and ByteSource

    func cod() -> protocol<ByteSource,ByteSink>
    {
        return ByteBuffer()
    }

duly gives us the symbol

    __TF4xper3codFT_PS_8ByteSinkS_10ByteSource_
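Reading the protocol encoding as `P`, zero or more names, then `_`, gives the sketch below. It assumes each name is introduced by `S_`, as it is in every example here, and that names carry the same length prefix as before; `decodeProtocols` is a hypothetical name, and the code is present-day Swift (5.x). Note that `ByteSink` comes out first even though it was written second in the source.

```swift
/// Hypothetical decoder for the conjectured form "P" ("S_" name)* "_", where each
/// name is <decimal length><characters>. Returns nil if the input does not fit.
func decodeProtocols(_ encoding: String) -> [String]? {
    let chars = Array(encoding)
    guard chars.first == "P" else { return nil }
    var names: [String] = []
    var i = 1
    while i < chars.count {
        if chars[i] == "_" {
            // The closing "_" must be the very last character.
            return i == chars.count - 1 ? names : nil
        }
        // In these examples every protocol name is introduced by "S_".
        guard i + 1 < chars.count, chars[i] == "S", chars[i + 1] == "_" else { return nil }
        i += 2
        var length = 0
        while i < chars.count, let digit = chars[i].wholeNumberValue {
            length = length * 10 + digit
            i += 1
        }
        guard length > 0, i + length <= chars.count else { return nil }
        names.append(String(chars[i..<(i + length)]))
        i += length
    }
    return nil  // ran out of input before the closing "_"
}

print(decodeProtocols("PS_8ByteSinkS_10ByteSource_") ?? [])   // prints ["ByteSink", "ByteSource"]
```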

Then there is the 'no protocol' case

    func cod() -> protocol<>
    {
        return 0
    }

which, when compiled, gives us the symbol

    __TF4xper3codFT_P_

If you are wondering what you can actually do with the result of that function the answer is anything that you can do with something of type Any.

The type

    Any

is simply an alias for

    protocol<>

But I digress.

Onwards with enums.

Given an enum Element then compiling

    func cod() -> Element
    {
        return Element.He
    }

gives us the symbol

    __TF4xper3codFT_OS_7Element

Interestingly we've seen something like this before.

The encoding for Character is

    OSs9Character

so we've got

    O S_ 7Element

and

    O Ss 9Character

If

    O

is the type prefix for an enum, then we have

    "O" 'something' 'enum name'

We've also seen

    "C" 'something' 'class name'

in the class example above, and

    "P" 'something' 'protocol name' "_"

in the protocol example above.

No idea about the 'something' as yet, so moving right along.

What about a struct ?

Given an empty struct AnotherThing

    func cod() -> AnotherThing
    {
        return AnotherThing()
    }

gives us the symbol

    __TF4xper3codFT_VS_12AnotherThing

We've seen some types with a 'V' prefix before, namely

  • VSs5Int16

  • VSs5Int32

  • VSs6UInt16

  • VSs6UInt32

as well as something that might have either a 'GV' or a 'V' prefix

    GVSs10DictionarySiSi_

We know that dictionaries and structs are passed by value, so 'V' might mean 'value', but so are arrays, and the array type encoding we have seen does not have a 'V' prefix.

We also know that explicitly sized signed and unsigned integer types are actually structs so for the moment we will assume that

    V

is the type prefix for struct.

We now have four type encodings of the form

    'type prefix' 'something' 'type name'

In the class and protocol case the 'something' is

    S_

In the enum cases the 'something' is either

    S_

or

    Ss

The same thing is true in the struct cases.

In all the examples to date the 'something' is always

    S_

when the type is local to the module and

    Ss

when it is a built-in type.

It looks as though 'something' might be the module name where

    S_

is 'this module' and

    Ss

is 'Swift'.

We can try to confirm this by moving one of the local types into another module.

If we move the Element type into the module other and compile this

    import other
    
    func cod() -> other.Element
    {
        return other.Element.He
    }

we would expect the resulting symbol to be

    __TF4xper3codFT_O5other7Element

and it is.
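The module conjecture can be expressed as a small function that splits the 'something' off whatever follows the type prefix. This is a sketch assuming exactly three forms: `S_` for the module being compiled, `Ss` for Swift, and otherwise an explicit length-prefixed module name. `splitModule` and its `currentModule` parameter are hypothetical, and the code is present-day Swift (5.x).

```swift
/// Hypothetical helper: splits the module 'something' off the front of `rest`
/// (the text after a type prefix such as "O"), returning the module it appears
/// to denote and the text that follows it.
func splitModule(_ rest: Substring, currentModule: String) -> (module: String, rest: Substring)? {
    if rest.hasPrefix("S_") { return (currentModule, rest.dropFirst(2)) }
    if rest.hasPrefix("Ss") { return ("Swift", rest.dropFirst(2)) }
    // Otherwise expect an explicit <length><name>, e.g. "5other".
    let digits = rest.prefix(while: { $0.isASCII && $0.isNumber })
    guard let length = Int(digits), rest.count >= digits.count + length else { return nil }
    let afterDigits = rest.dropFirst(digits.count)
    return (String(afterDigits.prefix(length)), afterDigits.dropFirst(length))
}

for encoding in ["OSs9Character", "OS_7Element", "O5other7Element"] {
    // Drop the one-character type prefix ("O"), then split off the module.
    if let split = splitModule(encoding.dropFirst(), currentModule: "xper") {
        print(split.module, split.rest)
    }
}
// prints "Swift 9Character", "xper 7Element" and "other 7Element" on separate lines
```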

Since types can be nested in Swift, the type encodings are likely to actually be of the form

    'type prefix' 'fully qualified type name'

Compiling

    struct Node
    {
        enum Colour: UInt8
        {
            case Red   = 0
            case Black = 1
        }

        var colour = Colour.Red
    }

    func cod() -> Node.Colour
    {
        return Node.Colour.Black
    }

gives us the symbol

    __TF4xper3codFT_OVS_4Node6Colour

which gives us

    "O" 'fully qualified type name'

where the 'fully qualified name' is three elements long.
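If the 'something' can itself be a type, the fully qualified name becomes recursive: a context is `S_`, `Ss`, a module name, or another `C`, `O` or `V` type. Here is a sketch of that reading which produces a dotted path. The grammar in the comments is conjecture based on the examples, `readNominalPath` and `qualifiedName` are hypothetical names, and the code is present-day Swift (5.x).

```swift
/// Reads <decimal length><characters> from the front of `s`.
func readName(_ s: inout Substring) -> String? {
    let digits = s.prefix(while: { $0.isASCII && $0.isNumber })
    guard let length = Int(digits), s.count >= digits.count + length else { return nil }
    let name = String(s.dropFirst(digits.count).prefix(length))
    s = s.dropFirst(digits.count + length)
    return name
}

/// Conjectured:  nominal := ("C" | "O" | "V") context name
///               context := "S_" | "Ss" | nominal | <module name>
func readNominalPath(_ s: inout Substring, currentModule: String) -> [String]? {
    guard let kind = s.first, "COV".contains(kind) else { return nil }
    let saved = s
    s = s.dropFirst()
    var path: [String]
    if s.hasPrefix("S_") {
        path = [currentModule]
        s = s.dropFirst(2)
    } else if s.hasPrefix("Ss") {
        path = ["Swift"]
        s = s.dropFirst(2)
    } else if let outer = readNominalPath(&s, currentModule: currentModule) {
        path = outer           // a nested type: the context is itself a type
    } else if let module = readName(&s) {
        path = [module]
    } else {
        s = saved
        return nil
    }
    guard let name = readName(&s) else {
        s = saved
        return nil
    }
    path.append(name)
    return path
}

func qualifiedName(_ encoding: String, currentModule: String) -> String? {
    var s = Substring(encoding)
    guard let path = readNominalPath(&s, currentModule: currentModule), s.isEmpty else { return nil }
    return path.joined(separator: ".")
}

print(qualifiedName("OVS_4Node6Colour", currentModule: "xper") ?? "?")   // prints xper.Node.Colour
```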

What else can a function return ?

There are optionals.

An optional Int

    func cod() -> Int?
    {
        return nil
    }

gives us the symbol

    __TF4xper3codFT_GSqSi_

This looks as though it follows the same pattern as the encoding for array, dictionary, and tuple types, namely

    'type prefix' 'element-type(s)' "_"

What about an optional array ?

    func cod() -> [Int]?
    {
        return nil
    }

gives us the symbol

    __TF4xper3codFT_GSqGSaSi__

which gives us a return type encoding of

    "GSq" 'array type encoding' "_"

as we would expect.

We now have three encodings, array, dictionary and optional, which have a 'G' prefix and a '_' suffix.

All three are generic types, so it looks as though their encodings are instances of a more general generic type encoding.

Defining the canonical generic type Stack<T> and compiling this

    func cod() -> Stack<Int>
    {
        return Stack<Int>()
    }

gives us the symbol

    __TF4xper3codFT_GVS_5StackSi_

which matches the form of the dictionary type encoding.

Where there is a '?' there is always a '!'.

Compiling

    func cod() -> Int!
    {
        return nil
    }

gives us the symbol

    __TF4xper3codFT_GSQSi_

so SQ is to '!' as Sq is to '?'.
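All of the generic encodings so far fit one shape: `G`, a base type, one or more argument types, then `_`. The sketch below decodes that shape, rendering `Sa`, `Sq` and `SQ` with Swift's shorthand `[T]`, `T?` and `T!`. It assumes the base is either one of those three codes or a `C`, `O` or `V` type from `S_` or `Ss` (whose module it drops), and that arguments are standard two-letter codes or further generics. `decodeGeneric` and its helpers are hypothetical, and the code is present-day Swift (5.x).

```swift
/// Two-character codes for standard types seen in this post.
let standardCodes: [String: String] = ["SS": "String", "Sb": "Bool", "Sd": "Double", "Si": "Int", "Su": "UInt"]

/// Reads <decimal length><characters> from the front of `s`.
func readName(_ s: inout Substring) -> String? {
    let digits = s.prefix(while: { $0.isASCII && $0.isNumber })
    guard let length = Int(digits), s.count >= digits.count + length else { return nil }
    let name = String(s.dropFirst(digits.count).prefix(length))
    s = s.dropFirst(digits.count + length)
    return name
}

/// Conjectured:
///   type := standard-code | ("C"|"O"|"V") ("S_"|"Ss") name | "G" base type+ "_"
///   base := "Sa" | "Sq" | "SQ" | ("C"|"O"|"V") ("S_"|"Ss") name
func readType(_ s: inout Substring) -> String? {
    if let name = standardCodes[String(s.prefix(2))] {
        s = s.dropFirst(2)
        return name
    }
    if let kind = s.first, "COV".contains(kind) {
        var rest = s.dropFirst()
        guard rest.hasPrefix("S_") || rest.hasPrefix("Ss") else { return nil }
        rest = rest.dropFirst(2)                  // the module is dropped from the output
        guard let name = readName(&rest) else { return nil }
        s = rest
        return name
    }
    guard s.first == "G" else { return nil }
    var rest = s.dropFirst()
    let base: String
    if let code = ["Sa", "Sq", "SQ"].first(where: { rest.hasPrefix($0) }) {
        base = code
        rest = rest.dropFirst(2)
    } else if let first = rest.first, "COV".contains(first), let nominal = readType(&rest) {
        base = nominal
    } else {
        return nil
    }
    // One or more argument types, terminated by "_".
    var args: [String] = []
    while let next = rest.first, next != "_" {
        guard let arg = readType(&rest) else { return nil }
        args.append(arg)
    }
    guard !args.isEmpty, rest.first == "_" else { return nil }
    s = rest.dropFirst()
    switch base {
    case "Sa": return "[" + args.joined(separator: ", ") + "]"     // Array
    case "Sq": return args[0] + "?"                                 // Optional
    case "SQ": return args[0] + "!"                                 // implicitly unwrapped Optional
    default:   return base + "<" + args.joined(separator: ", ") + ">"
    }
}

func decodeGeneric(_ encoding: String) -> String? {
    var s = Substring(encoding)
    guard let type = readType(&s), s.isEmpty else { return nil }
    return type
}

print(decodeGeneric("GVSs10DictionarySiSi_") ?? "?")   // prints Dictionary<Int, Int>
```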

What about types? Can you return a type? You can access them, so you should be able to return them.

Compiling

    func cod() -> UInt16.Type
    {
        return UInt16.self
    }

gives us the symbol

    __TF4xper3codFT_MVSs6UInt16

which gives us another type prefix

    M

for

    Meta

or something like that.

And of course, functions can return functions.

Compiling the not terribly useful functions

    func zero() -> Int
    {
         return 0
    }
    
    func cod() -> () -> Int
    {
        return zero
    }

gives us the symbol

    __TF4xper3codFT_FT_Si

which would appear to give us

    F

as the type prefix for a function.
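If `F` marks a function type, then the `FT_` that follows `cod` in every symbol so far is presumably the type of `cod` itself: `F`, an empty parameter tuple `T_`, then the result. The sketch below decodes that reading for the types seen so far. It is conjecture, not the compiler's algorithm; `decodeSignature` and its helpers are hypothetical, and the code is present-day Swift (5.x).

```swift
/// Two-character codes for standard types seen in this post.
let standardCodes: [String: String] = ["SS": "String", "Sb": "Bool", "Sd": "Double", "Si": "Int", "Su": "UInt"]

/// Conjectured grammar, limited to what the examples exercise:
///   type     := standard-code | tuple | function
///   tuple    := "T" type* "_"
///   function := "F" tuple type        (parameters, then result)
func readType(_ s: inout Substring) -> String? {
    if let name = standardCodes[String(s.prefix(2))] {
        s = s.dropFirst(2)
        return name
    }
    var rest = s.dropFirst()
    switch s.first {
    case "T"?:
        var elements: [String] = []
        while let next = rest.first, next != "_" {
            guard let element = readType(&rest) else { return nil }
            elements.append(element)
        }
        guard rest.first == "_" else { return nil }
        s = rest.dropFirst()
        return "(" + elements.joined(separator: ", ") + ")"
    case "F"?:
        // The parameters appear to be encoded as a tuple ("T_" when there are none).
        guard rest.first == "T", let params = readType(&rest), let result = readType(&rest) else { return nil }
        s = rest
        return params + " -> " + result
    default:
        return nil
    }
}

func decodeSignature(_ encoding: String) -> String? {
    var s = Substring(encoding)
    guard let type = readType(&s), s.isEmpty else { return nil }
    return type
}

// The text after "__TF4xper3cod" in the symbol for the function-returning `cod`.
print(decodeSignature("FT_FT_Si") ?? "?")   // prints () -> () -> Int
```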

This post is already way too long so the encoding of function parameters will have to be the next post.

In the meantime, here is a summary of what we know so far about how Swift types are encoded in mangled names, in the form of an ad-hoc syntax diagram:

    type-encoding            := builtin-type
                                |
                                class-type-encoding
                                |
                                enum-type-encoding
                                |
                                function-type-encoding
                                |
                                generic-type-encoding
                                |
                                meta-type-encoding
                                |
                                protocols-type-encoding
                                |
                                struct-type-encoding
                                |
                                tuple-type-encoding
    
                     
    builtin-type             := "SS"    // String
                                |
                                "Sb"    // Bool
                                |
                                "Sd"    // Double
                                |
                                "Si"    // Int
                                |
                                "Su"    // UInt
    
                    
    class-type-encoding      := "C" fully-qualified-name
    
    
    enum-type-encoding       := "O" fully-qualified-name
    
    
    function-type-encoding   := "F" ???? type-encoding
    
    
    generic-type-encoding    := "G" "Sa" type-encoding "_"                  // array
                                |
                                "G" class-type-encoding type-encoding+ "_"  // generic class
                                |
                                "G" enum-type-encoding type-encoding+ "_"   // generic enum
                                |
                                "G" struct-type-encoding type-encoding+ "_" // generic struct
                                |
                                "G" "SQ" type-encoding "_"                  // implicit optional
                                |
                                "G" "Sq" type-encoding "_"                  // optional
                                
                                
    meta-type-encoding       := "M" type-encoding                          // ???? conjecture based on one example ! ????
                                
                                
    protocols-type-encoding  := "P" fully-qualified-name* "_"
    
    
    struct-type-encoding     := "V" fully-qualified-name
    
    
    tuple-type-encoding      := "T" type-encoding* "_"
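As a way of checking that the diagram hangs together, here is a sketch of a recursive-descent decoder that follows it rule by rule and renders each encoding back as Swift-like type syntax. Everything in it is conjecture drawn from the examples in this post (including the one-example `M` rule); `TypeDecoder` and its `currentModule` parameter, the module `S_` is taken to stand for, are hypothetical, and the code is present-day Swift (5.x), not the compiler's own demangler.

```swift
/// A sketch of a decoder for the ad-hoc grammar above. The grammar is conjecture
/// drawn from this post's examples; this is not the Swift compiler's demangler.
struct TypeDecoder {
    let currentModule: String  // the module that "S_" is taken to stand for
    private static let builtins = ["SS": "String", "Sb": "Bool", "Sd": "Double", "Si": "Int", "Su": "UInt"]

    func decode(_ encoding: String) -> String? {
        var s = Substring(encoding)
        guard let type = readType(&s), s.isEmpty else { return nil }
        return type
    }

    /// <decimal length><characters>
    private func readName(_ s: inout Substring) -> String? {
        let digits = s.prefix(while: { $0.isASCII && $0.isNumber })
        guard let length = Int(digits), s.count >= digits.count + length else { return nil }
        let name = String(s.dropFirst(digits.count).prefix(length))
        s = s.dropFirst(digits.count + length)
        return name
    }

    /// fully-qualified-name := ("S_" | "Ss" | ("C"|"O"|"V") fully-qualified-name | module-name) name
    private func readQualifiedName(_ s: inout Substring) -> String? {
        let saved = s
        let context: String?
        if s.hasPrefix("S_") || s.hasPrefix("Ss") {
            context = s.hasPrefix("S_") ? currentModule : "Swift"
            s = s.dropFirst(2)
        } else if let k = s.first, "COV".contains(k) {
            s = s.dropFirst()              // nested type: the context is itself a type
            context = readQualifiedName(&s)
        } else {
            context = readName(&s)         // an explicitly named module
        }
        guard let outer = context, let name = readName(&s) else { s = saved; return nil }
        return outer + "." + name
    }

    /// item* "_", consuming the closing "_"
    private func readList(_ s: inout Substring, _ item: (inout Substring) -> String?) -> [String]? {
        var items: [String] = []
        while let next = s.first, next != "_" {
            guard let x = item(&s) else { return nil }
            items.append(x)
        }
        guard s.first == "_" else { return nil }
        s = s.dropFirst()
        return items
    }

    private func readType(_ s: inout Substring) -> String? {
        if let name = TypeDecoder.builtins[String(s.prefix(2))] {
            s = s.dropFirst(2)
            return name
        }
        guard let prefix = s.first else { return nil }
        var rest = s.dropFirst()
        let result: String?
        switch prefix {
        case "C", "O", "V":  // class, enum, struct
            result = readQualifiedName(&rest)
        case "P":            // protocols
            result = readList(&rest, readQualifiedName).map { "protocol<" + $0.joined(separator: ", ") + ">" }
        case "T":            // tuple
            result = readList(&rest, readType).map { "(" + $0.joined(separator: ", ") + ")" }
        case "M":            // metatype (conjecture from a single example)
            result = readType(&rest).map { $0 + ".Type" }
        case "F":            // function: parameter tuple, then result
            guard rest.first == "T", let params = readType(&rest), let ret = readType(&rest) else { return nil }
            result = params + " -> " + ret
        case "G":            // generic: base, one or more arguments, "_"
            var base = ["Sa", "Sq", "SQ"].first(where: { rest.hasPrefix($0) })
            if base != nil { rest = rest.dropFirst(2) } else { base = readType(&rest) }
            guard let b = base, let args = readList(&rest, readType), !args.isEmpty else { return nil }
            switch b {
            case "Sa": result = "[" + args.joined(separator: ", ") + "]"
            case "Sq": result = args[0] + "?"
            case "SQ": result = args[0] + "!"
            default:   result = b + "<" + args.joined(separator: ", ") + ">"
            }
        default:
            return nil
        }
        if result != nil { s = rest }
        return result
    }
}

let decoder = TypeDecoder(currentModule: "xper")
print(decoder.decode("GSqGSaSi__") ?? "?")   // prints [Int]?
```

Anything the diagram does not yet cover, such as the `????` in the function rule or function parameters, makes `decode` return nil rather than guess.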

Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog's author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.
