Just An Application

September 15, 2011

Android Internals: Resources – Part Four: The StringPool Chunk

1.0 The Example

Immediately following the Table header is a StringPool chunk.

    0000000 02 00 0c 00 64 04 00 00 01 00 00 00 01 00 1c 00
    0000010 d0 00 00 00 06 00 00 00 00 00 00 00 00 01 00 00
    0000020 34 00 00 00 00 00 00 00 00 00 00 00 1d 00 00 00
    0000030 3a 00 00 00 57 00 00 00 6d 00 00 00 8f 00 00 00
    0000040 1a 1a 72 65 73 2f 64 72 61 77 61 62 6c 65 2d 6c
    0000050 64 70 69 2f 69 63 6f 6e 2e 70 6e 67 00 1a 1a 72
    0000060 65 73 2f 64 72 61 77 61 62 6c 65 2d 6d 64 70 69
    0000070 2f 69 63 6f 6e 2e 70 6e 67 00 1a 1a 72 65 73 2f
    0000080 64 72 61 77 61 62 6c 65 2d 68 64 70 69 2f 69 63
    0000090 6f 6e 2e 70 6e 67 00 13 13 72 65 73 2f 6c 61 79
    00000a0 6f 75 74 2f 6d 61 69 6e 2e 78 6d 6c 00 1f 1f 48
    00000b0 65 6c 6c 6f 20 57 6f 72 6c 64 2c 20 50 65 6e 64
    00000c0 72 61 67 6f 6e 41 63 74 69 76 69 74 79 21 00 09
    00000d0 09 50 65 6e 64 72 61 67 6f 6e 00 00 00 02 1c 01
    00000e0 88 03 00 00 7f 00 00 00 78 00 70 00 65 00 72 00
    00000f0 2e 00 72 00 65 00 73 00 6f 00 75 00 72 00 63 00
    0000100 65 00 73 00 2e 00 70 00 65 00 6e 00 64 00 72 00
    0000110 61 00 67 00 6f 00 6e 00 00 00 00 00 00 00 00 00
    0000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	
    ...
	

The StringPool chunk header occupies bytes 0x0c to 0x27 of the dump, and the StringPool chunk body occupies bytes 0x28 to 0xdb. The bytes from 0xdc onwards belong to the next chunk.

2.0 The StringPool Chunk Header

The format of a StringPool chunk header is defined by the following C++ struct (see frameworks/base/include/ResourceTypes.h lines 382-410):

    struct ResStringPool_header
    {
        struct ResChunk_header header;

        // Number of strings in this pool (number of uint32_t indices that follow
        // in the data).
        uint32_t stringCount;

        // Number of style span arrays in the pool (number of uint32_t indices
        // follow the string indices).
        uint32_t styleCount;

        // Flags.
        enum {
            // If set, the string index is sorted by the string values (based
            // on strcmp16()).
            SORTED_FLAG = 1<<0,

            // String pool is encoded in UTF-8
            UTF8_FLAG = 1<<8
        };
        uint32_t flags;

        // Index from header of the string data.
        uint32_t stringsStart;

        // Index from header of the style data.
        uint32_t stylesStart;
    };

2.1 header

The header field is a struct ResChunk_header instance.

The header.type field is always 0x0001 (RES_STRING_POOL_TYPE).

The header.headerSize field is always 0x001c.

2.2 stringCount

The stringCount field specifies the number of strings in the StringPool.

2.3 styleCount

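The styleCount field specifies the number of entries in the style indices table. The style indices are paired with the string indices. So this is the number of strings, counting from the first, that have an entry in the style data, although an entry may be empty. This field may be, and in fact usually is, zero.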

2.4 flags

The flags field holds none, either, or both of the bit-flags SORTED_FLAG (0x00000001) and UTF8_FLAG (0x00000100).
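Testing the flags is a matter of masking. The following is a minimal C++ sketch. The constants restate the enum values above under the hypothetical names kSortedFlag and kUtf8Flag. describeFlags is likewise a hypothetical helper and is not part of the Android source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values of the two bit-flags, restated from ResStringPool_header.
constexpr uint32_t kSortedFlag = 1u << 0;  // SORTED_FLAG, 0x001
constexpr uint32_t kUtf8Flag   = 1u << 8;  // UTF8_FLAG,   0x100

// Summarise a flags value: whether the pool is sorted and which
// string data format (UTF-8 or 16-bit) it uses.
std::string describeFlags(uint32_t flags) {
    std::string s = (flags & kSortedFlag) ? "sorted, " : "";
    s += (flags & kUtf8Flag) ? "UTF-8" : "16-bit";
    return s;
}
```

Consider the flags value in the example dump: bytes 00 01 00 00 at 0x1c, that is 0x00000100. For that value it reports UTF-8 with no sorting.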

2.5 stringsStart

The stringsStart field specifies the offset from the start of the StringPool chunk to the start of the string data in the body of this chunk.

2.6 stylesStart

The stylesStart field specifies the offset from the start of the StringPool chunk to the start of the string style data in the body of this chunk. This field will be zero if the styleCount field is zero.
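Putting the header fields together, here is a sketch of decoding a StringPool chunk header from raw bytes. It is not the Android implementation. StringPoolHeader, readU16, readU32 and parseStringPoolHeader are hypothetical names. The sketch assumes the whole file has been read into a std::vector<uint8_t> and that pos is the offset of the chunk. Values are assembled byte by byte because the data is little-endian, so the code does not depend on host byte order or struct packing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decoded copy of ResStringPool_header (field names follow the struct).
struct StringPoolHeader {
    uint16_t type;
    uint16_t headerSize;
    uint32_t size;
    uint32_t stringCount;
    uint32_t styleCount;
    uint32_t flags;
    uint32_t stringsStart;
    uint32_t stylesStart;
};

// Little-endian reads, independent of host byte order.
static uint16_t readU16(const std::vector<uint8_t>& buf, size_t pos) {
    assert(pos + 2 <= buf.size());
    return static_cast<uint16_t>(buf[pos] | (buf[pos + 1] << 8));
}

static uint32_t readU32(const std::vector<uint8_t>& buf, size_t pos) {
    assert(pos + 4 <= buf.size());
    return static_cast<uint32_t>(buf[pos]) |
           (static_cast<uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// Decode the header of a StringPool chunk that starts at pos,
// rejecting anything whose type/headerSize are not 0x0001/0x001c.
StringPoolHeader parseStringPoolHeader(const std::vector<uint8_t>& buf, size_t pos) {
    StringPoolHeader h{};
    h.type = readU16(buf, pos);
    h.headerSize = readU16(buf, pos + 2);
    if (h.type != 0x0001 || h.headerSize != 0x001c)
        throw std::runtime_error("not a StringPool chunk header");
    h.size = readU32(buf, pos + 4);
    h.stringCount = readU32(buf, pos + 8);
    h.styleCount = readU32(buf, pos + 12);
    h.flags = readU32(buf, pos + 16);
    h.stringsStart = readU32(buf, pos + 20);
    h.stylesStart = readU32(buf, pos + 24);
    return h;
}
```

Given the 28 bytes from 0x0c to 0x27 of the example dump, it should give:

- a chunk size of 0xd0
- stringCount 6
- styleCount 0
- flags 0x100
- stringsStart 0x34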

3.0 The StringPool Chunk Body

The StringPool chunk body comprises either four sections

  • a table of string indices

  • a table of style indices

  • the string data

  • the style data

if string style data is present, or two sections

  • a table of string indices

  • the string data

if it is not.

3.1 The String Indices

Immediately following the StringPool chunk header are stringCount 32-bit integers. Each integer specifies the start of the data defining a string as an offset from the start of the string data section.
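The string indices are relative to the string data section, and that section is itself located relative to the chunk. Finding a string's data therefore means adding three values. Below is a sketch with the hypothetical helpers stringPositions and readU32. It assumes the file is held in a std::vector<uint8_t> and that chunkStart is the offset of the StringPool chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian 32-bit read.
static uint32_t readU32(const std::vector<uint8_t>& buf, size_t pos) {
    assert(pos + 4 <= buf.size());
    return static_cast<uint32_t>(buf[pos]) |
           (static_cast<uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// Absolute buffer position of each string's data:
//   chunkStart + stringsStart + (string index entry i).
// The index table itself starts right after the 0x1c-byte header.
std::vector<size_t> stringPositions(const std::vector<uint8_t>& buf, size_t chunkStart,
                                    uint32_t stringCount, uint32_t stringsStart) {
    std::vector<size_t> out;
    size_t table = chunkStart + 0x1c;
    for (uint32_t i = 0; i < stringCount; ++i)
        out.push_back(chunkStart + stringsStart + readU32(buf, table + 4 * static_cast<size_t>(i)));
    return out;
}
```

In the example, chunkStart is 0x0c and stringsStart is 0x34. So string[1], whose index entry is 0x1d, starts at 0x0c + 0x34 + 0x1d = 0x5d.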

3.2 The String Data

Despite the following comment in frameworks/base/include/ResourceTypes.h lines 370-375

At stringsStart are all of the UTF-16 strings concatenated together; each starts with a uint16_t of the string's length and each ends with a 0x0000 terminator. If a string is > 32767 characters, the high bit of the length is set meaning to take those 15 bits as a high word and it will be followed by another uint16_t containing the low word.

the string data can be in two different formats.

If the UTF8_FLAG is not set in the flags field, then the string data format is as described in the comment; otherwise it is in a UTF-8 format.

3.2.1 The 16-bit Format

The 16-bit format is as described in the comment.

The data for each string comprises

  • the length of the string in characters

  • the 16-bit characters

  • a trailing 16-bit zero

The length is encoded as either one or two 16-bit integers as per the comment.

The length does not include the trailing zero.
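Here is a sketch of decoding one string in the 16-bit format. It assumes pos is a position computed from the string indices. decodeString16 and readU16 are hypothetical names, and the terminator check is a plain assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Little-endian 16-bit read.
static uint16_t readU16(const std::vector<uint8_t>& buf, size_t pos) {
    assert(pos + 2 <= buf.size());
    return static_cast<uint16_t>(buf[pos] | (buf[pos + 1] << 8));
}

// Decode one string in the 16-bit format starting at pos.
// Length: one uint16_t or, if its high bit is set, the low 15 bits as
// the high word followed by a second uint16_t holding the low word.
std::u16string decodeString16(const std::vector<uint8_t>& buf, size_t pos) {
    uint32_t len = readU16(buf, pos);
    pos += 2;
    if (len & 0x8000) {
        len = ((len & 0x7FFF) << 16) | readU16(buf, pos);
        pos += 2;
    }
    std::u16string s;
    for (uint32_t i = 0; i < len; ++i, pos += 2)
        s.push_back(static_cast<char16_t>(readU16(buf, pos)));
    assert(readU16(buf, pos) == 0);  // trailing 16-bit zero
    return s;
}
```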

3.2.2 UTF-8 Format

The UTF-8 data format can be determined by examining the code used to write it (see frameworks/base/tools/aapt/StringPool.cpp lines 233-277) or the code used to read it (see frameworks/base/include/ResourceTypes.cpp lines 545-564).

The data for each string comprises

  • the length of the string in characters

  • the length of the UTF-8 encoding of the string in bytes

  • the UTF-8 encoded string

  • a trailing 8-bit zero

The lengths are encoded in the same way as for the 16-bit format but using 8-bit rather than 16-bit integers.

The lengths do not include the trailing zero.
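The UTF-8 format can be decoded in a similar way. In this sketch, decodeLength8 and decodeString8 are hypothetical names. The two-byte case mirrors the 16-bit rule: the low 7 bits of the first byte become the high byte of the length. The bytes are returned undecoded as a std::string; converting them to UTF-16 is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode an 8-bit length at pos and advance pos past it. If the high bit
// of the first byte is set, its low 7 bits are the high byte and the
// next byte is the low byte.
static size_t decodeLength8(const std::vector<uint8_t>& buf, size_t& pos) {
    assert(pos < buf.size());
    size_t len = buf[pos++];
    if (len & 0x80) {
        assert(pos < buf.size());
        len = ((len & 0x7F) << 8) | buf[pos++];
    }
    return len;
}

// Decode one string in the UTF-8 format starting at pos: the length in
// characters, then the length in bytes, then the bytes and a 0 byte.
std::string decodeString8(const std::vector<uint8_t>& buf, size_t pos,
                          size_t* charCount = nullptr) {
    size_t chars = decodeLength8(buf, pos);
    size_t bytes = decodeLength8(buf, pos);
    assert(pos + bytes < buf.size());
    assert(buf[pos + bytes] == 0);  // trailing 8-bit zero
    if (charCount != nullptr) *charCount = chars;
    return std::string(reinterpret_cast<const char*>(buf.data()) + pos, bytes);
}
```

The example has the bytes 09 09 50 65 6e 64 72 61 67 6f 6e 00 starting at 0xcf. Applied to them, it returns "Pendragon" with a character count of 9.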

3.2.3 Padding

Irrespective of the format, the string data section is always padded with zero bytes so that it ends on a 32-bit boundary. This ensures that any 32-bit integer fields that follow, in this chunk or in subsequent chunks, are correctly aligned.
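The amount of padding is whatever rounds the section up to a multiple of four bytes. A one-line sketch follows; paddingTo4 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Zero bytes needed after `size` bytes of string data so that the
// section ends on a 32-bit (4-byte) boundary.
constexpr size_t paddingTo4(size_t size) {
    return (4 - size % 4) % 4;
}
```

In the example in section 4.0, the string data proper runs from 0x40 to 0xda, which is 0x9b (155) bytes. So a single padding byte at 0xdb is needed.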

An Aside

If you create an Android project using the Eclipse ADT plugin, the string data in the StringPool chunks in the Resource Table will be in the UTF-8 format.

If you create an Android project using the android command line tool and then build it from the command line using ant, the string data in the StringPool chunks in the Resource Table will be in the 16-bit format.

Strange but true.

3.3 The Style Indices

If present, the style indices section immediately follows the string indices section. It comprises styleCount 32-bit integers. Each integer specifies the start of the style data for a string as an offset from the start of the style data section.

The string and style indices are paired. The style data specified by the entry at index i in the style indices is for the string specified by the entry at index i in the string indices.
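The pairing is purely positional. Locating the style data for a string therefore needs three things: the string's index, styleCount, and the position of the style indices, which start right after the string indices. Below is a sketch using the hypothetical names stylePosition and readU32. It assumes the file is held in a std::vector<uint8_t> and that chunkStart is the offset of the StringPool chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian 32-bit read.
static uint32_t readU32(const std::vector<uint8_t>& buf, size_t pos) {
    assert(pos + 4 <= buf.size());
    return static_cast<uint32_t>(buf[pos]) |
           (static_cast<uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// Find the style data for string i. String i has a style entry only if
// i < styleCount. On success, stores the absolute position in `out`.
bool stylePosition(const std::vector<uint8_t>& buf, size_t chunkStart,
                   uint32_t stringCount, uint32_t styleCount,
                   uint32_t stylesStart, uint32_t i, size_t& out) {
    if (i >= styleCount) return false;
    // Style indices follow the 0x1c-byte header and the string indices.
    size_t styleTable = chunkStart + 0x1c + 4 * static_cast<size_t>(stringCount);
    out = chunkStart + stylesStart + readU32(buf, styleTable + 4 * static_cast<size_t>(i));
    return true;
}
```

Take the values from section 5.0: chunk at 0x0c, stringCount 9, styleCount 5, stylesStart 0xfc. With these, string[4] maps to style data at 0x118, while strings from string[5] onwards have no style entry.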

3.4 The Style Data

When present, the style data section comprises styleCount pieces of individual string style data.

The style data for an individual string comprises a sequence of instances of the C++ struct ResStringPool_span, which is defined as follows (see frameworks/base/include/ResourceTypes.h lines 416-429):

    struct ResStringPool_span
    {
        enum {
            END = 0xFFFFFFFF
        };

        // This is the name of the span -- that is, the name of the XML
        // tag that defined it.  The special value END (0xFFFFFFFF) indicates
        // the end of an array of spans.
        ResStringPool_ref name;

        // The range of characters in the string that this span applies to.
        uint32_t firstChar, lastChar;
    };

The sequence of ResStringPool_spans for an individual string is terminated by a 32-bit integer with the value END (0xFFFFFFFF).

The style data section itself is terminated by two further 32-bit integers, each with the value END (0xFFFFFFFF).
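Reading the spans for one string is a loop that stops at the first END. Here is a sketch in which Span, readSpans, readU32 and kSpanEnd are hypothetical names. It assumes pos points at that string's style data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kSpanEnd = 0xFFFFFFFF;  // ResStringPool_span::END

// One decoded ResStringPool_span.
struct Span {
    uint32_t name;       // string pool index of the tag name, e.g. "b"
    uint32_t firstChar;  // first character covered by the span
    uint32_t lastChar;   // last character covered (inclusive)
};

// Little-endian 32-bit read.
static uint32_t readU32(const std::vector<uint8_t>& buf, size_t pos) {
    assert(pos + 4 <= buf.size());
    return static_cast<uint32_t>(buf[pos]) |
           (static_cast<uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// Read the spans for one string, stopping at the END marker.
std::vector<Span> readSpans(const std::vector<uint8_t>& buf, size_t pos) {
    std::vector<Span> spans;
    for (;;) {
        uint32_t name = readU32(buf, pos);
        if (name == kSpanEnd) break;
        spans.push_back({name, readU32(buf, pos + 4), readU32(buf, pos + 8)});
        pos += 12;  // sizeof(ResStringPool_span)
    }
    return spans;
}
```

For string[4] in section 5.0, the spans at 0x118 decode to (6, 0, 4), (7, 6, 10) and (8, 13, 29). That is "b" over "Hello", "u" over "World" and "i" over "TintagelActivity!", which shows that lastChar is inclusive.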

4.0 The Example Annotated

This is the annotated version of the StringPool chunk immediately following the Table chunk header from the example.

    ...

    0000000c 01 00       // type [STRING_POOL]
    0000000e 1c 00       // header size
    00000010 d0 00 00 00 // chunk size
    --------------------

    00000014 06 00 00 00 // stringCount
    00000018 00 00 00 00 // styleCount
    0000001c 00 01 00 00 // flags (UTF8_FLAG)
    00000020 34 00 00 00 // stringsStart (address 00000040)
    00000024 00 00 00 00 // stylesStart  (0: no style data)
    ++++++++++++++++++++

    00000028 00 00 00 00 // string[0]
    0000002c 1d 00 00 00 // string[1]
    00000030 3a 00 00 00 // string[2]
    00000034 57 00 00 00 // string[3]
    00000038 6d 00 00 00 // string[4]
    0000003c 8f 00 00 00 // string[5]

    00000040 1a 1a 72 65 // [0] "res/drawable-ldpi/icon.png"
    00000044 73 2f 64 72
    00000048 61 77 61 62
    0000004c 6c 65 2d 6c
    00000050 64 70 69 2f
    00000054 69 63 6f 6e
    00000058 2e 70 6e 67
    0000005c 00 1a 1a 72 // [1] "res/drawable-mdpi/icon.png"
    00000060 65 73 2f 64
    00000064 72 61 77 61
    00000068 62 6c 65 2d
    0000006c 6d 64 70 69
    00000070 2f 69 63 6f
    00000074 6e 2e 70 6e
    00000078 67 00 1a 1a // [2] "res/drawable-hdpi/icon.png"
    0000007c 72 65 73 2f
    00000080 64 72 61 77
    00000084 61 62 6c 65
    00000088 2d 68 64 70
    0000008c 69 2f 69 63
    00000090 6f 6e 2e 70
    00000094 6e 67 00 13 // [3] "res/layout/main.xml"
    00000098 13 72 65 73
    0000009c 2f 6c 61 79
    000000a0 6f 75 74 2f
    000000a4 6d 61 69 6e
    000000a8 2e 78 6d 6c
    000000ac 00 1f 1f 48 // [4] "Hello World, PendragonActivity!"
    000000b0 65 6c 6c 6f
    000000b4 20 57 6f 72
    000000b8 6c 64 2c 20
    000000bc 50 65 6e 64
    000000c0 72 61 67 6f
    000000c4 6e 41 63 74
    000000c8 69 76 69 74
    000000cc 79 21 00 09 // [5] "Pendragon"
    000000d0 09 50 65 6e
    000000d4 64 72 61 67
    000000d8 6f 6e 00 00
    ==================== [End of STRING_POOL]

    ...

5.0 Styled Strings: An Example

For the Table’s StringPool chunk to contain any style data, there must be at least one Resource which is a styled string.

If we create a vanilla Android project using ADT in Eclipse and then modify the generated strings.xml file to look like this


    <?xml version="1.0" encoding="utf-8"?>
    <resources>
        <string name="hello"><b>Hello</b> <u>World</u>, <i>TintagelActivity!</i></string>
        <string name="app_name">Tintagel</string>
    </resources>

then the resulting Resource Table’s StringPool chunk looks like this

    ...

    0000000c 01 00       // type [STRING_POOL]
    0000000e 1c 00       // header size
    00000010 3c 01 00 00 // chunk size
    --------------------

    00000014 09 00 00 00 // stringCount
    00000018 05 00 00 00 // styleCount
    0000001c 00 01 00 00 // flags (UTF8_FLAG)
    00000020 54 00 00 00 // stringsStart (address 00000060)
    00000024 fc 00 00 00 // stylesStart  (address 00000108)
    ++++++++++++++++++++

    00000028 00 00 00 00 // string[0]
    0000002c 1d 00 00 00 // string[1]
    00000030 3a 00 00 00 // string[2]
    00000034 57 00 00 00 // string[3]
    00000038 6d 00 00 00 // string[4]
    0000003c 8e 00 00 00 // string[5]
    00000040 99 00 00 00 // string[6]
    00000044 9d 00 00 00 // string[7]
    00000048 a1 00 00 00 // string[8]

    0000004c 00 00 00 00 // style[0]
    00000050 04 00 00 00 // style[1]
    00000054 08 00 00 00 // style[2]
    00000058 0c 00 00 00 // style[3]
    0000005c 10 00 00 00 // style[4]

    00000060 1a 1a 72 65 // [0] "res/drawable-ldpi/icon.png"
    00000064 73 2f 64 72
    00000068 61 77 61 62
    0000006c 6c 65 2d 6c
    00000070 64 70 69 2f
    00000074 69 63 6f 6e
    00000078 2e 70 6e 67
    0000007c 00 1a 1a 72 // [1] "res/drawable-mdpi/icon.png"
    00000080 65 73 2f 64
    00000084 72 61 77 61
    00000088 62 6c 65 2d
    0000008c 6d 64 70 69
    00000090 2f 69 63 6f
    00000094 6e 2e 70 6e
    00000098 67 00 1a 1a // [2] "res/drawable-hdpi/icon.png"
    0000009c 72 65 73 2f
    000000a0 64 72 61 77
    000000a4 61 62 6c 65
    000000a8 2d 68 64 70
    000000ac 69 2f 69 63
    000000b0 6f 6e 2e 70
    000000b4 6e 67 00 13 // [3] "res/layout/main.xml"
    000000b8 13 72 65 73
    000000bc 2f 6c 61 79
    000000c0 6f 75 74 2f
    000000c4 6d 61 69 6e
    000000c8 2e 78 6d 6c
    000000cc 00 1e 1e 48 // [4] "Hello World, TintagelActivity!"
    000000d0 65 6c 6c 6f
    000000d4 20 57 6f 72
    000000d8 6c 64 2c 20
    000000dc 54 69 6e 74
    000000e0 61 67 65 6c
    000000e4 41 63 74 69
    000000e8 76 69 74 79
    000000ec 21 00 08 08 // [5] "Tintagel"
    000000f0 54 69 6e 74
    000000f4 61 67 65 6c
    000000f8 00 01 01 62 // [6] "b"
    000000fc 00 01 01 75 // [7] "u"
    00000100 00 01 01 69 // [8] "i"
    00000104 00 00 00 00
    00000108 ff ff ff ff // [0] END
    0000010c ff ff ff ff // [1] END
    00000110 ff ff ff ff // [2] END
    00000114 ff ff ff ff // [3] END
    00000118 06 00 00 00 // [4][0] name
    0000011c 00 00 00 00 // [4][0] firstChar
    00000120 04 00 00 00 // [4][0] lastChar
    00000124 07 00 00 00 // [4][1] name
    00000128 06 00 00 00 // [4][1] firstChar
    0000012c 0a 00 00 00 // [4][1] lastChar
    00000130 08 00 00 00 // [4][2] name
    00000134 0d 00 00 00 // [4][2] firstChar
    00000138 1d 00 00 00 // [4][2] lastChar
    0000013c ff ff ff ff // [4] END
    00000140 ff ff ff ff // END (style data terminator)
    00000144 ff ff ff ff // END (style data terminator)
    ==================== [End of STRING_POOL]

    ...

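The first four strings in the StringPool are not styled and hence have no spans. They still need entries in the style data, each consisting of just an END marker. This ensures the style data for the fifth string, which is styled, is at index 4 in the style indices and is therefore paired with string[4].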

Another Aside

A curious thing about styled strings is that you can mark them up with any tags you like.

The example above actually works: the b, u and i tags are rendered as bold, underlined and italic text respectively.

However, changing the strings.xml file above to look like this


    <?xml version="1.0" encoding="utf-8"?>
    <resources>
        <string name="hello"><bold>Hello</bold> <underline>World</underline>, <italic>TintagelActivity!</italic></string>
        <string name="app_name">Tintagel</string>
    </resources>

still results in a StringPool chunk with style data; it just has no effect whatsoever.


Copyright (c) 2011 By Simon Lewis. All Rights Reserved.


6 Comments »

  1. […] a StringPool chunk […]

    Pingback by Android Internals: Binary XML – Part Two: The XML Chunk « Just An Application — September 22, 2011 @ 7:15 am

  2. // Index from header of the string data.
    // Index from header of the style data.

    Shouldn’t that be “Index from [top of the] header TO the … data”?

    Comment by Pawel Veselov — November 18, 2011 @ 4:44 am

    • Again the code is taken directly from the Android source code. Something like

      // the offset from the start of this header to the start of the string data

      might be clearer if slightly more verbose 🙂

      Comment by Simon Lewis — November 18, 2011 @ 8:47 am

  3. 00000060 1a 1a 72 65 // [0] “res/drawable-ldpi/icon.png”
    00000064 73 2f 64 72
    00000068 61 77 61 62
    0000006c 6c 65 2d 6c
    00000070 64 70 69 2f
    00000074 69 63 6f 6e
    00000078 2e 70 6e 67
    0000007c 00

    Why does ‘1a’ repeat twice? The description suggests it would start with an 8bit length, and if 0x80 is set, it would be followed by another 8bit for lower bits of the length…

    Comment by Pawel Veselov — November 18, 2011 @ 7:30 am

    • The first one is the number of characters in the string, the second one the length of the UTF8 encoding in bytes.

      As it is an ASCII string they are the same.

      Comment by Simon Lewis — November 18, 2011 @ 8:54 am

  4. […] The dataType field is 3 which identifies it as a string. The data field is 4, which, in the case of a value of type string, is the index of the string within the String pool chunk […]

    Pingback by Android Internals: Resource Ids And Resource Lookup « Just An Application — January 22, 2013 @ 9:46 pm

