Just An Application

August 25, 2014

Anatomy Of A PDF: Part Eight — The Denouement

Filed under: BMP, PDF, Security, XFA - Simon Lewis @ 9:42 am

Given that this thing is in the wild, and putting aside the possibility that it is a piece of very elaborate performance art, it must be targeting an actual vulnerability.

It’s pretty obvious what the program is and what the platform is, so typing those along with terms like PDF, XFA, and BMP in some combination into your search engine of choice turns up all sorts of stuff, but it looks like CVE-2013-2729 is the vulnerability in question.

See here and here for the gory details of how it actually exploits the heap corruption and what happens once it has done so.

The heap implementation targeted is the Low Fragmentation Heap (LFH). See the paper “Understanding the Low Fragmentation Heap” by Chris Valasek for a detailed description of how it works and how to do unpleasant things to it. Be aware that this is a PDF which in the circumstances … ! You can find it here.

For details of how synthesized x86 machine code can be made to run when it shouldn’t be, see here. Warning: may contain assembler.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Anatomy Of A PDF: Part Seven — How To Overflow An Unsigned Integer Using Nothing But Bytes !

Filed under: BMP, BMP Run Length Encoding, Image Formats, Security - Simon Lewis @ 8:33 am

The theory of overflowing unsigned integers is, fortunately, well understood.

You fill them up with ‘F’s, you can use ‘f’s instead if you prefer, and then you add 1, like so.

    ...
    
    uint32_t i = 0xFFFFFFFF;

    i += 1;
    
    ...

and i is now zero.

Nothing to it.

Alternative combinations of ‘F’s, not to mention other hex digits and increments, are possible but it’s best to start off with the base case and make sure you’ve got the hang of it before moving on to the more advanced stuff.
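For anyone who wants to try the more advanced stuff without a C compiler to hand, here is a minimal sketch in Python. Python integers never overflow, so the 32-bit truncation has to be spelled out with a mask. `add_u32` is a made-up helper name used purely for illustration.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit unsigned integer can hold


def add_u32(value, increment):
    """Add two integers the way a uint32_t would: modulo 2**32."""
    return (value + increment) & MASK32


# The base case: all 'F's plus one.
print(hex(add_u32(0xFFFFFFFF, 1)))      # 0x0

# Another combination of hex digits and increment that also wraps to zero.
print(hex(add_u32(0xFFFFFF00, 0x100)))  # 0x0
```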

One obvious application of this is when you have an excess of large unsigned integers and a shortage of zeroes. You can use this technique to turn the former into the latter.

Another application is this

    0000000 42 4d 00 00 00 00 00 00 00 00 00 00 00 00 40 00
    0000010 00 00 2e 01 00 00 01 00 00 00 01 00 08 00 01 00
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
    0000030 00 00 00 00 00 00 52 47 42 41 52 47 42 41 00 02
    0000040 ff 00 00 02 ff 00 00 02 ff 00 00 02 ff 00 00 02
    *
    4040440 f8 00 00 08 01 00 00 00 00 00 27 05 00 02 00 ff
    4040450 00 02 00 ff 00 02 00 ff 00 02 00 ff 00 02 00 ff
    *
    4040470 00 02 00 ff 00 0a 58 58 58 58 58 58 58 58 58 58
    4040480

which is the really really big BMP that came free with the PDF.

The result of Base64 decoding the image data in the XML clocks in at 67,372,160 bytes, which is surprisingly large for an image which is only 302×1.

The reason that you can fit 67MB of data on nine lines of hexdump output is that the data is intensely repetitive.

The vast majority of the file comprises the four bytes

    00 02 ff 00

The first occurrence is at 0x00003E and the last occurrence ends at 0x404043E, which gives a grand total of 0x4040400 bytes or 0x1010100 repetitions.

If you multiply 0xFF by 0x1010100 you get 0xFFFFFF00 which is a mere 0xFF+1 away from being 0 in certain circumstances.

This is interesting because that multiplication is equivalent to the effect of the four bytes

    00 02 ff 00

being repeated 16843008 times in the BMP image data.
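The arithmetic above is easy to check. The following sketch uses only the offsets quoted from the hexdump; the variable names are mine.

```python
FIRST_START = 0x00003E   # offset of the first "00 02 ff 00"
LAST_END = 0x404043E     # offset just past the last one
COMMAND_LENGTH = 4       # bytes per repeated command

total_bytes = LAST_END - FIRST_START
repetitions = total_bytes // COMMAND_LENGTH
x_total = 0xFF * repetitions

print(hex(total_bytes))            # 0x4040400
print(hex(repetitions))            # 0x1010100
print(repetitions)                 # 16843008
print(hex(x_total))                # 0xffffff00
print(hex(0x100000000 - x_total))  # 0x100, which is 0xFF + 1
print(0x4040480)                   # 67372160, the size of the whole file
```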

As we know, the header specifies that the image is run-length encoded, which means that the image data is a series of commands which are evaluated at runtime to produce the bytes that comprise the image.

The image is built up left to right, bottom to top as the commands are evaluated.

If the result of evaluating a command is a sequence of bytes, those bytes are appended at the offset within the current line specified by the current x position.

Both the current x position and the current y position, which specifies the current line, can be changed using a delta command.

The four bytes

    00 02 ff 00

are an example of a delta command, the effect of which is to add 0xFF to the current x position.

If that command is successfully evaluated 16843008 times then you know

  1. that whatever is doing the evaluating is not doing any kind of bounds checking in this particular case, and

  2. that the current x position is now 0xFFFFFF00
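As a rough illustration of those two conclusions, here is a sketch of a delta command being decoded and then applied repeatedly with no bounds check. The function names are hypothetical, and the 32-bit wrap-around is an assumption about the evaluator rather than something known at this point.

```python
def decode_delta(command):
    """Return (dx, dy) for a 4-byte RLE/8 delta command: 00 02 dx dy."""
    if len(command) != 4 or command[0] != 0x00 or command[1] != 0x02:
        raise ValueError("not a delta command")
    return command[2], command[3]


def x_after_repeats(command, count, x=0):
    """Current x position after evaluating the same delta `count` times,
    with no bounds check and 32-bit unsigned wrap-around (an assumption)."""
    dx, _ = decode_delta(command)
    return (x + dx * count) & 0xFFFFFFFF


print(decode_delta(bytes.fromhex("0002ff00")))                    # (255, 0)
print(hex(x_after_repeats(bytes.fromhex("0002ff00"), 16843008)))  # 0xffffff00
```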

The command at 0x404043E which immediately follows the repetitions is another delta command

    00 02 f8 00

except that this time it adds 0xF8 to the current x position, which brings it up to 0xFFFFFFF8 as well as creating the expectation that the next command is going to feature the number 8.

The next command is at 0x4040442 and it is this

    00 08 01 00 00 00 00 00 27 05

and lo and behold there is the number 8.

The effect of the command is to add the last eight bytes of the command as image data at the current x position.

Clearly at this point it would be very irresponsible of whatever it is that is evaluating the commands not to perform a bounds check, which means that we are going to discover exactly how the current x position is being represented.

The width and height fields of this particular BMP are signed 32-bit integers so it would not be unreasonable for the current x position to be represented using an unsigned 32-bit integer.

If this command succeeds then either

  • there is no bounds checking

OR

  • the current x position is represented by the command evaluator as an unsigned 32-bit integer

Whichever it is, those eight bytes are going to end up somewhere where they are going to make no useful contribution to the image at all.
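To see why an unsigned 32-bit x position could let the command through even when there is a bounds check, consider the sketch below. The check `x + count <= width` is a guess at the kind of test an evaluator might make, not something taken from the target. The last line is also an inference rather than anything established above: if the position is then used as a 32-bit offset it behaves like -8, which would put the bytes just before the start of the line.

```python
MASK32 = 0xFFFFFFFF
WIDTH = 302  # image width from the BMP header


def passes_check_u32(x, count, width=WIDTH):
    """Hypothetical bounds check done in wrapping 32-bit unsigned arithmetic."""
    return ((x + count) & MASK32) <= width


def as_signed32(value):
    """Reinterpret a 32-bit unsigned value as a signed 32-bit one."""
    return value - 0x100000000 if value & 0x80000000 else value


x = (0xFFFFFF00 + 0xF8) & MASK32  # position after the 00 02 f8 00 delta
print(hex(x))                     # 0xfffffff8
print(passes_check_u32(x, 8))     # True: x + 8 wraps round to 0
print(x + 8 <= WIDTH)             # False: without the wrap the check fails
print(as_signed32(x))             # -8
```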

After all the excitement of adding some actual bytes, albeit in the wrong place, it all seems to go horribly wrong.

Starting at 0x404044C another delta command

    00 02 00 ff

which adds 0xFF to the current y position, is repeated ten times.

The final command at 0x4040474

    00 0a 58 58 58 58 58 58 58 58 58 58

then adds ten bytes at the current x position, except of course that it doesn’t.

The current y position is out of bounds, since the image is supposed to be 302×1 after all, and at 10 × 0xFF = 2550 it is nothing like large enough to benefit from any kind of unsigned integer overflow. That is intentional.
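The whole tail of the file can be walked through in a few lines. This is only a sketch: it handles the two command types that actually appear, starts from the x position reached by the repeated deltas, and assumes that both positions are 32-bit unsigned values that wrap. `walk` and `TAIL` are names I have made up.

```python
MASK32 = 0xFFFFFFFF

# The 66 bytes from 0x404043E to the end of the file, taken from the hexdump.
TAIL = bytes.fromhex(
    "0002f800"                # delta: x += 0xF8
    "00080100000000002705"    # absolute: eight bytes of image data
    + "000200ff" * 10         # delta: y += 0xFF, ten times
    + "000a" + "58" * 10      # absolute: ten bytes of image data
)


def walk(data, x, y):
    """Yield (offset, kind, x, y) for each delta or absolute command."""
    pos = 0
    while pos < len(data):
        if data[pos] != 0x00:
            raise ValueError("encoded runs are not handled in this sketch")
        code = data[pos + 1]
        if code == 0x02:                  # delta: 00 02 dx dy
            x = (x + data[pos + 2]) & MASK32
            y = (y + data[pos + 3]) & MASK32
            yield pos, "delta", x, y
            pos += 4
        elif code >= 0x03:                # absolute: 00 n, then n bytes
            yield pos, "absolute %d" % code, x, y
            x = (x + code) & MASK32       # position moves past the written bytes
            pos += 2 + code + (code & 1)  # data is padded to an even length
        else:
            raise ValueError("end-of-line and end-of-bitmap are not handled")


steps = list(walk(TAIL, x=0xFFFFFF00, y=0))
for offset, kind, x, y in steps:
    print(hex(0x404043E + offset), kind, hex(x), y)
```

The last line printed is for the command at 0x4040474, by which time y has reached 2550 for an image that is one line high.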

Causing the image load to fail results in the memory allocated for the image being freed.

If the eight bytes that actually got written have ended up in the right wrong place then presumably their effect is to cause the heap management code to free the wrong object at this point.



August 24, 2014

Anatomy Of A PDF: Part Five — Q: When Is A Form Not A Form ? A: When It Is A Can Of Worms

Filed under: BMP, Document Format, Image Formats, Javascript, PDF, Programming Languages, XFA - Simon Lewis @ 12:56 pm

Top-Level Structure

This is the top-level structure of the XFA form lurking in Object 1.

I have omitted all the non-interesting elements, elided the contents of the image and script elements, and renamed a few things but apart from that this is what 91MB of XML looks like.

Impressive isn’t it ?


    <xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/" timeStamp="2014-01-21T18:18:41Z">
        <template xmlns="http://www.xfa.org/schema/xfa-template/3.1/">
            <?formServer defaultPDFRenderFormat acrobat9.1static?>
            ...
            <subform ...>
                
                ...
                <field name="Image_1">
                    <ui>
                        <imageEdit/>
                    </ui>
                    <value>
                        <image>...</image>
                    </value>
                </field>
                <field name="Image_2">
                    <ui>
                        <imageEdit/>
                    </ui>
                    <value>
                        <image>...</image>
                    </value>
                </field>
                <variables>
                    <script name="..." contentType="application/x-javascript">...</script>
                    <script name="..." contentType="application/x-javascript">...</script>
                    <?templateDesigner expand 1?>
                </variables>
                <subform ...>
                    <field name="Image_3">
                        <ui>
                            <imageEdit/>
                        </ui>
                        <value>
                            <image>...</image>
                        </value>
                    </field>
                </subform>
                <event activity="initialize" name="...">
                    <script contentType="application/x-javascript">...</script>
                </event>
                <event activity="docReady" ref="$host" name="...">
                    <script contentType="application/x-javascript">...</script>
                </event>
            </subform>
            ...
        </template>
        
        
        ...
        
    </xdp:xdp>    

Scripts

As predicted the form contains scripts.

As you can see there are four script elements containing chunks of Javascript.

Taking all four together there is approximately 20KB of Javascript.

Two of the script elements are associated with events so one lot of Javascript will get to run when the “initialize” event occurs and the other lot when the “docReady” event occurs.
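If you would rather count than squint, something like the following works. It is a sketch run against a cut-down copy of the structure above, using Python’s standard `xml.etree.ElementTree`. The script bodies and the `name` values are placeholders, since the real ones are elided.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.xfa.org/schema/xfa-template/3.1/}"

# A cut-down copy of the structure above; names and script bodies are placeholders.
SKELETON = """
<template xmlns="http://www.xfa.org/schema/xfa-template/3.1/">
  <subform>
    <variables>
      <script name="a" contentType="application/x-javascript">/* ... */</script>
      <script name="b" contentType="application/x-javascript">/* ... */</script>
    </variables>
    <event activity="initialize" name="c">
      <script contentType="application/x-javascript">/* ... */</script>
    </event>
    <event activity="docReady" ref="$host" name="d">
      <script contentType="application/x-javascript">/* ... */</script>
    </event>
  </subform>
</template>
"""

root = ET.fromstring(SKELETON)
scripts = list(root.iter(NS + "script"))
event_scripts = [
    event.get("activity")
    for event in root.iter(NS + "event")
    if event.find(NS + "script") is not None
]

print(len(scripts))   # 4
print(event_scripts)  # ['initialize', 'docReady']
```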

Images

As you can see there are three images.

The default encoding for image data in XFA is Base64. This is not overridden anywhere so the data for each image is Base64 encoded.

Obfuscation

What you cannot see, because I’ve omitted it, is that the chunks of Javascript are partially and mildly obfuscated.

More Scripts

To see what I mean by mildly obfuscated consider the following.

One of the Javascript chunks contains these strings.

    "VW~`~`~XYZa!~`bcde!fghij~`~klm~``~~nopqrs~````tuvwx~~~yz01!234~~56789+``/~~~"
    
    "AB!~CDEF!`GHI!``JKL!~~MNOP!```QRSTU"

If you bolt the first on to the end of the second and then strip out all the occurrences of the characters ‘!’, ‘~’, and ‘`’ you end up with

    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

which looks remarkably like the ‘Base 64 Alphabet’ as described in RFC 4648 because that’s what it is.
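The bolting and stripping can be reproduced directly. The sketch below is in Python rather than the Javascript found in the form, and makes no claim about how the form’s own code goes about it.

```python
import string

FIRST = "VW~`~`~XYZa!~`bcde!fghij~`~klm~``~~nopqrs~````tuvwx~~~yz01!234~~56789+``/~~~"
SECOND = "AB!~CDEF!`GHI!``JKL!~~MNOP!```QRSTU"

# Bolt the first on to the end of the second, then drop the filler characters.
joined = SECOND + FIRST
alphabet = joined.translate(str.maketrans("", "", "!~`"))

# The Base 64 Alphabet from RFC 4648, built from the standard library.
expected = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(alphabet)
print(alphabet == expected)  # True
```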

There are also unobfuscated references to the images I’ve named “Image_1” and “Image_2” in the Javascript.

Given that

  1. the image data is Base64 encoded, and

  2. that there are the makings of a ‘Base 64 Alphabet’ sat in the Javascript,

it doesn’t take an enormous leap of imagination to wonder whether the referenced images are really images at all or whether the Javascript is going to decode them and turn them into something else entirely.

Extracting the image data into files and Base64 decoding them produces two more chunks of Javascript.
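The extraction step might look something like this. It is a sketch only: the sample document and its payload are invented stand-ins for the real form, and `decode_images` is a hypothetical helper. The standard library’s `base64` and `xml.etree.ElementTree` modules do the work.

```python
import base64
import xml.etree.ElementTree as ET

XFA_NS = {"t": "http://www.xfa.org/schema/xfa-template/3.1/"}

# Made-up stand-in for the form: one "image" whose data is really a script.
payload = base64.b64encode(b"var version = 1;").decode("ascii")
SAMPLE = (
    '<template xmlns="http://www.xfa.org/schema/xfa-template/3.1/">'
    '<subform><field name="Image_1"><ui><imageEdit/></ui>'
    "<value><image>" + payload + "</image></value></field></subform>"
    "</template>"
)


def decode_images(xml_text):
    """Map each field name to the Base64-decoded bytes of its image element."""
    root = ET.fromstring(xml_text)
    decoded = {}
    for field in root.iter("{%s}field" % XFA_NS["t"]):
        image = field.find("t:value/t:image", XFA_NS)
        if image is not None and image.text:
            decoded[field.get("name")] = base64.b64decode(image.text)
    return decoded


print(decode_images(SAMPLE))  # {'Image_1': b'var version = 1;'}
```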

One of them contains a table indexed by version number, indicating presumably that the Javascript can tailor its behaviour depending on what version of the target executable it finds itself running in.

Dark Matter

OK so there’s around 20KB of Javascript.

The two pseudo images taken together are about 10KB.

There is maybe another 10KB of XFA related random angle bracket action.

Where is the other ~90.95MB ?

It’s the data for what I’ve named “Image_3”.

A Really Really Big Image

Apparently Image_3 is a really really big image, but is it ?

Given that Image_1 and Image_2 turned out not to be images at all is Image_3 also something else in disguise ?

Base64 decoding the data does not produce Javascript.

What it produces is this

    0000000 42 4d 00 00 00 00 00 00 00 00 00 00 00 00 40 00
    0000010 00 00 2e 01 00 00 01 00 00 00 01 00 08 00 01 00
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
    0000030 00 00 00 00 00 00 52 47 42 41 52 47 42 41 00 02
    0000040 ff 00 00 02 ff 00 00 02 ff 00 00 02 ff 00 00 02
    *
    4040440 f8 00 00 08 01 00 00 00 00 00 27 05 00 02 00 ff
    4040450 00 02 00 ff 00 02 00 ff 00 02 00 ff 00 02 00 ff
    *
    4040470 00 02 00 ff 00 0a 58 58 58 58 58 58 58 58 58 58
    4040480

Yes that is the entire thing.

As you can see, although it starts off promisingly, it becomes tediously predictable almost straightaway with the four bytes

    00 02 ff 00

repeating over and over and over again.

That’s what’s IN it, but what IS it ?

The first two bytes are the ASCII characters ‘B’ and ‘M’, which would seem to indicate that it is a BMP image. That is a pain because the BMP image format is not officially documented.

According to the various bits of unofficial documentation, BMP images can have a bewildering variety of headers and this one seems to have an OS/2 2.x header.

Treating it as an OS/2 2.x header would mean that the compression type of the image is RLE/8, where RLE means run-length encoded. This is supported by the image data following the header, which makes sense as RLE/8 data.
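Here is a sketch of pulling those fields out of the start of the file with Python’s `struct` module. The field layout is the commonly described one (a 14-byte file header followed by a header whose first fields are size, width, height, planes, bits per pixel and compression), which is an assumption given the state of the documentation. The bytes are copied from the hexdump.

```python
import struct

# The first 0x22 bytes of the decoded Image_3 data, copied from the hexdump.
HEADER = bytes.fromhex(
    "42 4d 00 00 00 00 00 00 00 00 00 00 00 00 40 00 "
    "00 00 2e 01 00 00 01 00 00 00 01 00 08 00 01 00 "
    "00 00"
)

COMPRESSION_NAMES = {0: "none", 1: "RLE/8", 2: "RLE/4"}

# File header: signature, file size, two reserved words, offset to image data.
signature, file_size, _res1, _res2, data_offset = struct.unpack_from(
    "<2sIHHI", HEADER, 0
)

# Start of the bitmap header: little-endian, width and height are signed.
header_size, width, height, planes, bpp, compression = struct.unpack_from(
    "<IiiHHI", HEADER, 14
)

print(signature)                       # b'BM'
print(header_size)                     # 64
print(width, height)                   # 302 1
print(planes, bpp)                     # 1 8
print(COMPRESSION_NAMES[compression])  # RLE/8
```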

Javascript + BMP Image == What ?

So there you have it.

An XFA form with four chunks of Javascript partially and not very successfully obfuscated, two hidden chunks of Javascript likewise, and one great big run-length encoded BMP image.

Why ?

What is this particular can of worms going to do when unleashed on a poor, unsuspecting, but probably not very little executable ?

