Javascript | Just An Application

August 25, 2014

Anatomy Of A PDF: Part Six — The Allocation And Deallocation Of Strings By Javascript Considered Harmful ?

Filed under: Javascript, Security — Tags: Javascript, Security — Simon Lewis @ 6:53 am

When it is not Base64 decoding other bits of Javascript disguised as images the Javascript in the XFA form is mostly messing about with strings which on the face of it doesn’t look as though it could be that dangerous.

The messing about occurs in three distinct phases.

The Allocation And Selective Deallocation Phase

The Javascript triggered by the initialize event starts by creating four arrays.

It then proceeds to set each element of the first array to a newly allocated string.

Each string is the result of concatenating two distinctive patterns plus two elements computed using the current iteration index.

Interestingly the size of the created strings looks very much as though it may be the same as the width of the really really big BMP image.

Having gone to all that trouble to create all those different strings it then proceeds to repeatedly null out every tenth element of the first array.

Search Phase

The search phase is performed by the Javascript triggered by the docReady event.

The code iterates over all the elements of the first array. If the element is not null the first character of the string is compared with the value that was used to initialize it when the string was created.

If they are not equal the index of the element is noted, the contents of part of the changed string are saved and the iteration terminates.

Targetted Deallocation and Second Allocation Phase

This phase is also performed by the Javascript triggered by the docReady event but only occurs if a changed string was found during the search phase.

This phase begins by assembling a new string from various bits of hex.

The element in the first array which references the changed string and the element before it in the array are then repeatedly set to null.

Each element of the second array is then set to a string created from the new string constructed at the start of this phase.

It is at this point there is a reference to “Image_2” which we know to be Base64 encoded Javascript.

The image data is decoded and then handed off to eval which is masequerading under an alias !

The “Image_2” code builds another string out of information from the version indexed table in “Image_1”, large amounts of repeated hex, some of which is in all probability x86 code, the strings

    "VirtualProtect"

and

    "KERNEL32"

plus the result of unescaping a big chunk of data which, it turns out, contains, amongst other things, the hard-wired URL of a .exe.

I cannot begin to imagine what they are going to try and do with all of that !

The string is then repeatedly concatenated with itself until the result is a string which occupies at least 1MB of memory.

One hundred copies of this string are then created, presumably just to be on the safe side ?

By this point the heap must surely consist almost entirely of x86 code !

Conjecture

The order of events is

the initialize event code runs.
System attempts to load really, really, big BMP
the docReady event code runs.

and what happens is

the initialize event code sets up a known heap state based on the allocation of strings with distinctive contents, interspersed with holes created by freeing every tenth string.
the System image loading code corrupts the heap attempting to load the really, really, big BMP
he docReady event code finds the location of the heap corruption on the basis of the known state set up by the initialize event code and exploits it.

Given that the Javascript it looking for a string that has changed underneath it so to speak the assumption must be that the string has been freed despite still being in use. Once freed it is re-used by non Javascript code which changes it.

This in turn implies that the heap corruption involves the data used by the heap management code to keep track of which pieces of memory in the heap are in use and which are free. Such data is often contiguous with the memory that is allocated, in which case the easiest way to corrupt it is to write beyond the limits of a piece of memory as allocated which is what the really really big BMP is for.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate
and specific direction to the original content.

August 24, 2014

Anatomy Of A PDF: Part Five — Q: When Is A Form Not A Form ? A: When It Is A Can Of Worms

Filed under: BMP, Document Format, Image Formats, Javascript, PDF, Programming Languages, XFA — Tags: BMP, Javascript, PDF, ProgrammingLanguages, XFA — Simon Lewis @ 12:56 pm

Top-Level Structure

This is the top-level structure of the XFA form lurking in Object 1.

I have omitted all the non-interesting elements, elided the contents of the image and script elements, and renamed a few things but apart from that this is what 91MB of XML looks like.

Impressive isn’t it ?


    <xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/" timeStamp="2014-01-21T18:18:41Z">
        <template xmlns="http://www.xfa.org/schema/xfa-template/3.1/">
            <?formServer defaultPDFRenderFormat acrobat9.1static?>
            ...
            <subform ...>
                
                ...
                <field name="Image_1">
                    <ui>
                        <imageEdit/>
                    </ui>
                    <value>
                        <image>...</image>
                    </value>
                </field>
                <field name="Image_2">
                    <ui>
                        <imageEdit/>
                    </ui>
                    <value>
                        <image>...</image>
                    </value>
                </field>
                <variables>
                    <script name="..." contentType="application/x-javascript">...</script>
                    <script name="..." contentType="application/x-javascript">...</script>
                    <?templateDesigner expand 1?>
                </variables>
                <subform ...>
                    <field name="Image_3">
                        <ui>
                            <imageEdit/>
                        </ui>
                        <value>
                            <image>...</image>
                        </value>
                    </field>
                </subform>
                <event activity="initialize" name="...">
                    <script contentType="application/x-javascript">...</script>
                </event>
                <event activity="docReady" ref="$host" name="...">
                    <script contentType="application/x-javascript">...</script>
                </event>
            </subform>
            ...
        </template>
        
        
        ...
        
    </xdp:xdp>

Scripts

As predicted the form contains scripts.

As you can see there are four script elements containing chunks of Javascript.

Taking all four together there is approximately 20KB of Javascript.

Two of the script elements are associated with events so one lot of Javascript will get to run when the “initialize” event occurs and the other lot when the “docReady” event occurs.

Images

As you can see there are three images.

The default encoding for image data in XFA is Base64. This is not over-ridden anywhere so the data for each image is Base64 encoded.

Obfuscation

What you cannot see because I’ve omitted it, is that the chunks of Javascript are partially and mildly obfuscated.

More Scripts

To see what I mean by mildly obfuscated consider the following.

One of the Javascript chunks contains these strings.

    "VW~`~`~XYZa!~`bcde!fghij~`~klm~``~~nopqrs~````tuvwx~~~yz01!234~~56789+``/~~~"
    
    "AB!~CDEF!`GHI!``JKL!~~MNOP!```QRSTU"

If you bolt the first on to the end of the second and then strip out all the occurences of the characters ‘!’, ”~’, and ‘`’ you end up with

    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

which looks remarkably like the ‘Base 64 Alphabet’ as described in RFC 4648 because that’s what it is.

There are also unobfuscated references to the images I’ve named “Image_1” and “Image_2” in the Javascript.

Given that

the image data is Base64 encoded, and
that there are the makings of a ‘Base 64 Alphabet’ sat in the Javascript,

it doesn’t take an enormous leap of imagination to wonder whether the referenced images are really images at all or whether the Javascript is going to decode them and turn them into something else entirely.

Extracting the image data into files and Base64 decoding them produces two more chunks of Javascript.

One of them contains a table indexed by version number indicating presumably that the Javascript can tailor its behaviour depending on what version of the target executable it finds itself running in.

Dark Matter

OK so there’s around 20KB of Javascript.

The two pseudo images taken together are about 10KB.

There is maybe another 10KB of XFA related random angle bracket action.

Where is the other ~90.95MB ?

Its the data for what I’ve named “Image_3”.

A Really Really Big Image

Apparently Image_3 is a really really big image, but is it ?

Given that Image_1 and Image_2 turned out not to be images at all is Image_3 also something else in disguise ?

Base64 decoding the data does not produce Javascript.

What it produces is this

    0000000 42 4d 00 00 00 00 00 00 00 00 00 00 00 00 40 00
    0000010 00 00 2e 01 00 00 01 00 00 00 01 00 08 00 01 00
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
    0000030 00 00 00 00 00 00 52 47 42 41 52 47 42 41 00 02
    0000040 ff 00 00 02 ff 00 00 02 ff 00 00 02 ff 00 00 02
    *
    4040440 f8 00 00 08 01 00 00 00 00 00 27 05 00 02 00 ff
    4040450 00 02 00 ff 00 02 00 ff 00 02 00 ff 00 02 00 ff
    *
    4040470 00 02 00 ff 00 0a 58 58 58 58 58 58 58 58 58 58
    4040480

Yes that is the entire thing.

As you can see although it starts off promisingly it becomes tediously predictable almost straightaway with the four bytes

    00 02 ff 00

repeating over and over and over again.

That’s what’s IN it, but what IS it ?

The first two bytes are the ASCII characters ‘B’ and ‘M’ which would seem to indicate that it is a BMP image which is a pain because the BMP image format is not officially documented.

According to the various bits of unofficial documentation BMP images can have a bewilderingly variety of headers and this one seems to have an OS/2 2.x header.

Treating it as an OS/2 2.x header would mean that the compression type of the image is RLE/8 where RLE means run-length encoded. This is supported by the image data which follows the header which makes sense as RLE/8 data.

Javascript + BMP Image == What ?

So there you have it.

An XFA form with four chunks of Javascript partially and not very successfully obfuscated, two hidden chunks of Javascript likewise, and one great big run-length encoded BMP image.

Why ?

What is this particular can of worms going to do when unleashed on a poor, unsuspecting, but probably not very little executable ?