Just An Application

August 25, 2014

Anatomy Of A PDF: Part Eight — The Denouement

Filed under: BMP, PDF, Security, XFA — Tags: , , , , , — Simon Lewis @ 9:42 am

Given that this thing is in the wild and putting aside the possibility that it is a piece of very elaborate performance art then it must be targeting an actual vulnerability.

Its pretty obvious what the program is and what the platform is, so typing those along with terms like PDF, XFA, and BMP in some combination into your search engine of choice turns up all sorts of stuff but it looks like CVE-2013-2729 is the vulnerability in question.

See here and here for the gory details of how it actually exploits the heap corruption and what happens once it has done so.

The heap implementation targeted is the Low Fragmentation Heap (LFH). See the paper “Understanding the Low Fragmentation Heap” by Chris Valasek for a detailed description of how it works and how to do unpleasant things to it. Be aware that this is a PDF which in the circumstances … ! You can find it here.

For details of how synthesized x86 machine code can be made to run when it shouldn’t be see here. Warning, may contain assembler


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Advertisements

August 24, 2014

Anatomy Of A PDF: Part Five — Q: When Is A Form Not A Form ? A: When It Is A Can Of Worms

Filed under: BMP, Document Format, Image Formats, Javascript, PDF, Programming Languages, XFA — Tags: , , , , — Simon Lewis @ 12:56 pm

Top-Level Structure

This is the top-level structure of the XFA form lurking in Object 1.

I have omitted all the non-interesting elements, elided the contents of the image and script elements, and renamed a few things but apart from that this is what 91MB of XML looks like.

Impressive isn’t it ?


    <xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/" timeStamp="2014-01-21T18:18:41Z">
        <template xmlns="http://www.xfa.org/schema/xfa-template/3.1/">
            <?formServer defaultPDFRenderFormat acrobat9.1static?>
            ...
            <subform ...>
                
                ...
                <field name="Image_1">
                    <ui>
                        <imageEdit/>
                    </ui>
                    <value>
                        <image>...</image>
                    </value>
                </field>
                <field name="Image_2">
                    <ui>
                        <imageEdit/>
                    </ui>
                    <value>
                        <image>...</image>
                    </value>
                </field>
                <variables>
                    <script name="..." contentType="application/x-javascript">...</script>
                    <script name="..." contentType="application/x-javascript">...</script>
                    <?templateDesigner expand 1?>
                </variables>
                <subform ...>
                    <field name="Image_3">
                        <ui>
                            <imageEdit/>
                        </ui>
                        <value>
                            <image>...</image>
                        </value>
                    </field>
                </subform>
                <event activity="initialize" name="...">
                    <script contentType="application/x-javascript">...</script>
                </event>
                <event activity="docReady" ref="$host" name="...">
                    <script contentType="application/x-javascript">...</script>
                </event>
            </subform>
            ...
        </template>
        
        
        ...
        
    </xdp:xdp>    

Scripts

As predicted the form contains scripts.

As you can see there are four script elements containing chunks of Javascript.

Taking all four together there is approximately 20KB of Javascript.

Two of the script elements are associated with events so one lot of Javascript will get to run when the “initialize” event occurs and the other lot when the “docReady” event occurs.

Images

As you can see there are three images.

The default encoding for image data in XFA is Base64. This is not over-ridden anywhere so the data for each image is Base64 encoded.

Obfuscation

What you cannot see because I’ve omitted it, is that the chunks of Javascript are partially and mildly obfuscated.

More Scripts

To see what I mean by mildly obfuscated consider the following.

One of the Javascript chunks contains these strings.

    "VW~`~`~XYZa!~`bcde!fghij~`~klm~``~~nopqrs~````tuvwx~~~yz01!234~~56789+``/~~~"
    
    "AB!~CDEF!`GHI!``JKL!~~MNOP!```QRSTU"

If you bolt the first on to the end of the second and then strip out all the occurences of the characters ‘!’, ”~’, and ‘`’ you end up with

    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

which looks remarkably like the ‘Base 64 Alphabet’ as described in RFC 4648 because that’s what it is.

There are also unobfuscated references to the images I’ve named “Image_1” and “Image_2” in the Javascript.

Given that

  1. the image data is Base64 encoded, and

  2. that there are the makings of a ‘Base 64 Alphabet’ sat in the Javascript,

it doesn’t take an enormous leap of imagination to wonder whether the referenced images are really images at all or whether the Javascript is going to decode them and turn them into something else entirely.

Extracting the image data into files and Base64 decoding them produces two more chunks of Javascript.

One of them contains a table indexed by version number indicating presumably that the Javascript can tailor its behaviour depending on what version of the target executable it finds itself running in.

Dark Matter

OK so there’s around 20KB of Javascript.

The two pseudo images taken together are about 10KB.

There is maybe another 10KB of XFA related random angle bracket action.

Where is the other ~90.95MB ?

Its the data for what I’ve named “Image_3”.

A Really Really Big Image

Apparently Image_3 is a really really big image, but is it ?

Given that Image_1 and Image_2 turned out not to be images at all is Image_3 also something else in disguise ?

Base64 decoding the data does not produce Javascript.

What it produces is this

    0000000 42 4d 00 00 00 00 00 00 00 00 00 00 00 00 40 00
    0000010 00 00 2e 01 00 00 01 00 00 00 01 00 08 00 01 00
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
    0000030 00 00 00 00 00 00 52 47 42 41 52 47 42 41 00 02
    0000040 ff 00 00 02 ff 00 00 02 ff 00 00 02 ff 00 00 02
    *
    4040440 f8 00 00 08 01 00 00 00 00 00 27 05 00 02 00 ff
    4040450 00 02 00 ff 00 02 00 ff 00 02 00 ff 00 02 00 ff
    *
    4040470 00 02 00 ff 00 0a 58 58 58 58 58 58 58 58 58 58
    4040480

Yes that is the entire thing.

As you can see although it starts off promisingly it becomes tediously predictable almost straightaway with the four bytes

    00 02 ff 00

repeating over and over and over again.

That’s what’s IN it, but what IS it ?

The first two bytes are the ASCII characters ‘B’ and ‘M’ which would seem to indicate that it is a BMP image which is a pain because the BMP image format is not officially documented.

According to the various bits of unofficial documentation BMP images can have a bewilderingly variety of headers and this one seems to have an OS/2 2.x header.

Treating it as an OS/2 2.x header would mean that the compression type of the image is RLE/8 where RLE means run-length encoded. This is supported by the image data which follows the header which makes sense as RLE/8 data.

Javascript + BMP Image == What ?

So there you have it.

An XFA form with four chunks of Javascript partially and not very successfully obfuscated, two hidden chunks of Javascript likewise, and one great big run-length encoded BMP image.

Why ?

What is this particular can of worms going to do when unleashed on a poor, unsuspecting, but probably not very little executable ?


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

August 23, 2014

Anatomy Of A PDF: Part Three — Objects, Objects, Objects

According to the cross-reference table there are six objects in use starting with object number 1.

Starting from the root object as specified by the trailer.

Object 3

Object 3 starts at 13294 which 0x33EE.

    3 0 obj
    <<
    /Extensions <</ADBE <</ExtensionLevel 3 /BaseVersion /1.7 >> >>
    /AcroForm 2 0 R
    /Type /Catalog
    /Pages 4 0 R
    /NeedsRendering true
    >>
    endobj

The root object should be a Catalog and the Type entry shows that it is.

AcroForm

The presence of the AcroForm entry indicates that the PDF contains an interactive form.

The entry references Object 2 which should therefore be an interactive form dictionary specifying the form.

Pages

The Pages entry references Object 4 which should therefore be a page tree node.

NeedsRendering

The NeedsRendering entry is another clue that the document contains a form.

Object 2

Object 2 start at offset 13263 which is 0x33CF and it looks like this.

    2 0 obj
    <</XFA 1 0 R >>
    endobj

According to Object 3, the Document Catalog, this should be an interactive form dictionary and the presence of the XFA entry confirms that it is.

XFA is the XML Forms Architecture. Amongst other things it supports scripting, and the Adobe implementation the scripting support includes Javascript, which in this context seems highly significant.

The XFA entry identifies the object which contains the XML which describes the XFA form.

The referenced object is Object 1, so ten to one on there are scripts in Object 1 and they do something unpleasant.

Object 1

Object 1 starts at offset 15.

According to Object 2 it should contain XML specifying an XFA form.

It is the biggest object by far so it definitely contains something, and presumably something nasty at that, so we’ll save it for later.

Object 4

Object 4 starts at 13443 which is 0x3483 and it looks like this

    4 0 obj
    <<
        /Count 1
        /Kids [5 0 R]
        /Type /Pages
    >>
    endobj

According to Object 3, the Document Catalog, this should be a page tree node with a Type of Pages and it is.

The value of the Kids (sic) entry is an Array of Object References of length one.

This entry identifies the children of this object, which are either collections of pages or individual pages which make up the document.

Object 5

Object 5 starts at 13499 which is 0x34BB and it looks like this

    5 0 obj
    <<
        /Parent 4 0 R
        /Type /Page
        /Contents 6 0 R
        /Resources << /Font <&lt/F1 <</BaseFont /Helvetica /Subtype /Type1 /Name /F1 >> >> >>
    >>
    endobj

According to Object 4, the Pages Object, this should be either another page tree node or page object. The Type entry Page tells us that it is a page object.

The Parent entry is a reference to Object 4 which is indeed the parent of this object.

The Contents entry is a reference to Object 6.

The Resources entry is an example of an entry whose value is a dictionary, which is itself an example of a dictionary with entries whose values are dictionaries.

Object 6

Object 6 starts at 13644 which is 0x354c.

    6 0 obj
    <</Length 23 >>
    stream
    BT /F1 24 Tf 100 100 Td
    endstream
    endobj

We know that it is the contents of the page represented by Object 5.

It is in fact a Content Stream which comprises a series of graphic operators and their operands, with the operands preceding their operators (cough, Postscript, cough).

For what its worth

    BT

is the ‘begin text’ operator.

    Tf

is the ‘set text font and size’ operator.

It is preceded by its operands

    /F1 24

with F1 being the font as defined in Object 5 and 24 being the size.

    Td

is the ‘move text’ operator.

It is preceded by its operands

    100 100

And that’s it. Not a very exciting page it has to be said.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Create a free website or blog at WordPress.com.

%d bloggers like this: