Just An Application

September 8, 2014

The Mystery Of The Unsigned JAR: Part Four — JARs Within JARs

Filed under: Security, Things That Go Bump In The Night — Tags: , — Simon Lewis @ 7:15 am

Given the calls to getClass and getResourceAsStream in the load method, the hidden JAR is obviously in the JAR.

There are only six files in the JAR and five of them are a priori not a JAR so that leaves the putative ‘dll’[1].

In fact the ‘dll’ is an encrypted JAR.

It contains a variety of things none of which you would want running on your computer.

Perhaps the most interesting thing is that it is multi-platform. There is code for installing itself on Linux, MacOS X and Windows.

The one thing I am not clear about is how the thing is intended to be run in the first place. The top-level JAR contains a manifest with a

    Main-Class

entry, so the JAR is ‘runnable’ in that respect, but it arrived as an attachment in an e-mail.

Are there really still/were there ever e-mail clients that would automatically run a JAR found as an attachment and without any kind of attempt at sand-boxing ?

Notes

  1. The ‘dll’ might have been more credible as a dll if there had been an actual call to System.loadLibrary somewhere


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

September 7, 2014

The Mystery Of The Unsigned JAR: Part Three — The Little JAR That Hid

This is the bytecode for a method called load with everything removed except the calls to methods in the Java APIs.

As you can see, it is possible to deduce what the method is doing solely on the basis of the Java API method calls.


    public void load();
      Code:
         0: aload_0
         1: dup
         2: invokevirtual #6                  // Method java/lang/Object.getClass:()Ljava/lang/Class;
         
            ...
            
        11: invokevirtual #7                  // Method java/lang/Class.getResourceAsStream:(Ljava/lang/String;)Ljava/io/InputStream;
        
            ...
        
        47: new           #12                 // class java/util/jar/JarInputStream
        50: dup
        51: new           #13                 // class java/io/ByteArrayInputStream
        54: dup
        55: aload_1
        56: invokespecial #14                 // Method java/io/ByteArrayInputStream."<init>":([B)V
        59: invokespecial #15                 // Method java/util/jar/JarInputStream."<init>":(Ljava/io/InputStream;)V
        62: astore_2
        
            ...
            
        75: aload_2
        76: invokevirtual #16                 // Method java/util/jar/JarInputStream.getNextJarEntry:()Ljava/util/jar/JarEntry;
        79: dup
        80: astore        4
        82: ifnull        235
        85: aload         4
        87: invokevirtual #17                 // Method java/util/jar/JarEntry.getName:()Ljava/lang/String;
        90: dup
        91: astore        5
        
            ...
            
        99: invokevirtual #18                 // Method java/lang/String.endsWith:(Ljava/lang/String;)Z
       102: ifeq          172
       
            ...
            
       117: iconst_0
       118: iconst_1
       119: dup
       120: pop2
       121: aload         5
       123: dup_x1
       124: invokevirtual #19                 // Method java/lang/String.length:()I
       127: bipush        6
       129: iconst_1
       130: dup
       131: pop2
       132: isub
       133: invokevirtual #20                 // Method java/lang/String.substring:(II)Ljava/lang/String;
       136: bipush        47
       138: iconst_1
       139: dup
       140: pop2
       141: bipush        46
       143: iconst_1
       144: dup
       145: pop2
       146: invokevirtual #21                 // Method java/lang/String.replace:(CC)Ljava/lang/String;
       149: astore        5
       
            ...
            
       163: aload_3
       164: invokevirtual #22                 // Method java/util/HashMap.put:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
       167: pop
       168: goto          75
       171: pop
       172: aload         4
       174: invokevirtual #23                 // Method java/util/jar/JarEntry.isDirectory:()Z
       177: ifeq          185
       180: aload_2
       181: goto          76
      
            ...

The calls to the JarInputStream and JarEntry methods make it clear that the method is iterating over the contents of a JAR

The 6 in

    127: bipush        6

just happens to be the length of the suffix

    .class

The 47 in

    136: bipush        47

is the solidus character (‘/‘) and the 46 in

    141: bipush        46

is the dot character (‘.‘).

The sequences following the iconst_0 instruction and each bipush instruction

    117: iconst_0
    118: iconst_1
    119: dup
    120: pop2
    
    ...

    127: bipush        6
    129: iconst_1
    130: dup
    131: pop2
    
    ...
    
    136: bipush        47
    138: iconst_1
    139: dup
    140: pop2
    
    ...

    141: bipush        46
    143: iconst_1
    144: dup
    145: pop2

are all either pretty feeble attempts at obfuscation or evidence of a severely confused compiler as they have no effect at all.

From time immemorial Java class loaders have been converting the JarEntry names of class files into class names and that is what this method is doing as well.

If you have a class, for example,

    xper.mcm.CrashPow

then the name of the JarEntry for the class file in a JAR will be

    xper/mcm/CrashPow.class

Chop off the .class suffix

    117: iconst_0

      0
      
    118: iconst_1
    
      0, 1
      
    119: dup
    
      0, 1, 1

    120: pop2
    
      0
      
    121: aload         5
    
      0, entry_name
      
    123: dup_x1
    
      entry_name, 0, entry_name
      
    124: invokevirtual #19                 // Method java/lang/String.length:()I
    
      entry_name, 0, length(entry_name)
      
    127: bipush        6
    
      entry_name, 0, length(entry_name), 6
      
    129: iconst_1
    
      entry_name, 0, length(entry_name), 6, 1
      
    130: dup
    
      entry_name, 0, length(entry_name), 6, 1, 1
    
    131: pop2
    
      entry_name, 0, length(entry_name), 6
    
    132: isub
    
      entry_name, 0, length(entry_name)_minus_6
      
    133: invokevirtual #20                 // Method java/lang/String.substring:(II)Ljava/lang/String;

      substring(entry_name, 0, length(entry_name)_minus_6))

and replace the occurrences of ‘/‘ with ‘.

    136: bipush        47
    
      substring(...), 47
      
    138: iconst_1
    
      substring(...), 47, 1
      
    139: dup
    
      substring(...), 47, 1, 1
      
    140: pop2
    
      substring(...), 47
      
    141: bipush        46
    
       substring(...), 47, 46
       
    143: iconst_1
    
      substring(...), 47, 46, 1
      
    144: dup
    
      substring(...), 47, 46, 1, 1

    145: pop2
    
      substring(...), 47, 46
       
    146: invokevirtual #21                 // Method java/lang/String.replace:(CC)Ljava/lang/String;
    
      substring(...).replace(47, 46)
        
    149: astore        5

and you get the canonical Java class name back again

The method is storing the contents of the class files in the JAR it is iterating over in a HashMap with the class names as the keys.

Now where’s the JAR ?


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

September 6, 2014

The Mystery Of The Unsigned JAR: Part Two — The Sekrit Method

The ‘sekrit method’ takes a single String argument and returns a String and it could not be more obvious if it tried.

Unlike the other methods in the class it has a non-meaningful name. Instead its name features a large subset of the lower case letters of the alphabet in order. I don’t think there is any coding standard that requires that, ‘tho I suppose there could be.

It is always passed a String literal and the String literal is always gibberish, yet the return values are often passed to factory methods in the Java API which take specific well-formed names.

The inference is obvious. The method takes obfuscated Strings and turns them into unobfuscated Strings.

The nice thing about Java VM bytecode is that you can decompile it using nothing more than a pencil and a piece of paper, oh, and a knowledge of the Java VM opcodes !

Being stack based it is pretty easy to work out what is going on by tracing the state of the stack and the local variables after each instruction.

Of course if you don’t have a piece of paper and a pencil handy you can always write some code to do the same thing.

The following shows the stack/local variable state after each instruction for a large part of the sekrit method.

The top of the stack is on the right.

    ...

    
    34: invokevirtual #232                // Method java/lang/StringBuffer.toString:()Ljava/lang/String;
    
        S
    
    37: dup
    
        S, S
    
    38: invokevirtual #19                 // Method java/lang/String.length:()I
    
        S, length(S)
    
    41: iconst_1
    
        S, length(S), 0x01
    
    42: isub
    
        S, length(S)_minus_one
    
    43: iconst_2
    
        S, length(S)_minus_one, 0x02
    
    44: iconst_3
    
        S, length(S)_minus_one, 0x02, 0x03
    
    45: ishl
    
        S, length(S)_minus_one, 0x10
    
    46: iconst_3
    
        S, length(S)_minus_one, 0x10, 0x03
    
    47: iconst_5
    
        S, length(S)_minus_one, 0x10, 0x03, 0x05
    
    48: ixor
    
        S, length(S)_minus_one, 0x10, 0x06
    
    49: ixor
    
        S, length(S)_minus_one, 0x16
    
    50: iconst_2
    
        S, length(S)_minus_one, 0x16, 0x02
    
    51: iconst_5
    
        S, length(S)_minus_one, 0x16, 0x02, 0x05
    
    52: ixor
    
        S, length(S)_minus_one, 0x16, 0x07
    
    53: iconst_3
    
        S, length(S)_minus_one, 0x16, 0x07, 0x03
    
    54: ishl
    
        S, length(S)_minus_one, 0x16, 0x38
    
    55: iconst_2
    
        S, length(S)_minus_one, 0x16, 0x38, 0x02
    
    56: ixor
    
        S, length(S)_minus_one, 0x16, 0x3A
    
    57: iconst_2
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x02
    
    58: iconst_5
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x02, 0x05
    
    59: ixor
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x07
    
    60: iconst_4
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x07, 0x04
    
    61: ishl
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x70
    
    62: iconst_1
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x70, 0x01
    
    63: dup
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x70, 0x01, 0x01
    
    64: ishl
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x70, 0x02
    
    65: ixor
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72
    
    66: aload_0
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, avar0
    
    67: invokevirtual #19                 // Method java/lang/String.length:()I
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, length(avar0)
    
    70: dup
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, length(avar0), length(avar0)
    
    71: newarray       char
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, length(avar0), char[length(avar0)]
    
    73: iconst_1
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, length(avar0), char[length(avar0)], 0x01
    
    74: dup
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, length(avar0), char[length(avar0)], 0x01, 0x01
    
    75: pop2
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, length(avar0), char[length(avar0)]
    
    76: swap
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, char[length(avar0)], length(avar0)
    
    77: iconst_1
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, char[length(avar0)], length(avar0), 0x01
    
    78: isub
    
        S, length(S)_minus_one, 0x16, 0x3A, 0x72, char[length(avar0)], length(avar0)_minus_1
    
    79: dup_x2
    
        S, length(S)_minus_one, 0x16, 0x3A, length(avar0)_minus_1, 0x72, char[length(avar0)], length(avar0)_minus_1
    
    80: istore_1
    
        S, length(S)_minus_one, 0x16, 0x3A, length(avar0)_minus_1, 0x72, char[length(avar0)]
    
        {
            ivar1 == length(avar0)_minus_1
        }
    
    81: astore_3
    
        S, length(S)_minus_one, 0x16, 0x3A, length(avar0)_minus_1, 0x72
    
        {
            ivar1 == length(avar0)_minus_1
            avar3 == char[length(avar0)]
        }
    
    
    82: istore        7
    
        S, length(S)_minus_one, 0x16, 0x3A, length(avar0)_minus_1
    
        {
            ivar1 == length(avar0)_minus_1
            avar3 == char[length(avar0)]
            ivar7 == 0x72
        }
    
    84: dup_x2
    
        S, length(S)_minus_one, length(avar0)_minus_1, 0x16, 0x3A, length(avar0)_minus_1
    
    85: pop
    
        S, length(S)_minus_one, length(avar0)_minus_1, 0x16, 0x3A
    
    86: istore        4
    
        S, length(S)_minus_one, length(avar0)_minus_1, 0x16
    
        {
            ivar1 == length(avar0)_minus_1
            avar3 == char[length(avar0)]
            ivar4 == 0X3A
            ivar7 == 0x72
        }
    
    88: pop
    
        S, length(S)_minus_one, length(avar0)_minus_1
    
    89: swap
    
        S, length(avar0)_minus_1, length(S)_minus_one
    
    90: dup
    
        S, length(avar0)_minus_1, length(S)_minus_one, length(S)_minus_one
    
    91: istore_2
    
        S, length(avar0)_minus_1, length(S)_minus_one
    
        {
            ivar1 == length(avar0)_minus_1
            ivar2 == length(S)_minus_one
            avar3 == char[length(avar0)]
            ivar4 == 0X3A
            ivar7 == 0x72
        }
    
    92: istore        5
    
        S, length(avar0)_minus_1
    
        {
            ivar1 == length(avar0)_minus_1
            ivar2 == length(S)_minus_one
            avar3 == char[length(avar0)]
            ivar4 == 0X3A
            ivar5 == length(S)_minus_one
            ivar7 == 0x72
    }
    
    94: swap
    
        length(avar0)_minus_1, S
    
    95: astore        6
    
        length(avar0)_minus_1
    
        {
            ivar1 == length(avar0)_minus_1
            ivar2 == length(S)_minus_one
            avar3 == char[length(avar0)]
            ivar4 == 0x3A
            ivar5 == length(S)_minus_one
            avar6 == S
            ivar7 == 0x72
        }
    
    97: goto          163
    
    
    100: aload_3
    
        char[length(avar0)]
    
    101: iload         4
    
        char[length(avar0)], 0x3A
    
    103: aload_0
    
        char[length(avar0)], 0x3A, avar0
    
    104: iload_1
    
        char[length(avar0)], 0x3A, avar0, ivar1
    
    105: iinc          1, -1
    
        char[length(avar0)], 0x3A, avar0, ivar1
    
    108: dup_x2
    
        char[length(avar0)], ivar1, 0x3A, avar0, ivar1
    
    109: invokevirtual #236                // Method java/lang/String.charAt:(I)C
    
        char[length(avar0)], ivar1, 0x3A, c
    
    112: aload         6
    
        char[length(avar0)], ivar1, 0x3A, c, avar6
    
    114: iload_2
    
        char[length(avar0)], ivar1, 0x3A, c, avar6, ivar2
    
    115: invokevirtual #236                // Method java/lang/String.charAt:(I)C
    
        char[length(avar0)], ivar1, 0x3A, c, d
    
    118: ixor
    
        char[length(avar0)], ivar1, 0x3A, c_xor_d
    
    119: ixor
    
        char[length(avar0)], ivar1, c_xor_d_xor_0x3A
    
    120: i2c
    
        char[length(avar0)], ivar1, c_xor_d_xor_0x3A
    
    121: castore
    
    
    122: iload_1
    
        ivar1
    
    123: ifge          130
    126: aload_3
    127: goto          167
    
    
    130: aload_3
    
        avar3
    
    131: iload         7
    
        avar3, 0x72
    
    133: aload_0
    
        avar3, 0x72, avar0
    
    134: iload_1
    
        avar3, 0x72, avar0, ivar1
    
    135: dup_x2
    
        avar3, ivar1, 0x72, avar0, ivar1
    
    136: invokevirtual #236                // Method java/lang/String.charAt:(I)C
    
        avar3, ivar1, 0x72, c
    
    139: aload         6
    
        avar3, ivar1, 0x72, c, avar6
    
    141: iload_2
    
        avar3, ivar1, 0x72, c, avar6, ivar2
    
    142: invokevirtual #236                // Method java/lang/String.charAt:(I)C
    
        avar3, ivar1, 0x72, c, d
        
    145: ixor
    
        avar3, ivar1, 0x72, c_xor_d
    
    146: ixor
    
        avar3, ivar1, c_xor_d_xor_0x72
    
    147: i2c
    
        avar3, ivar1, c_xor_d_xor_0x72
    
    148: iinc          1, -1
    151: iinc          2, -1
    154: castore
    
    
    155: iload_2
    
        ivar2
    
    156: ifge          162
    
    159: iload         5
    
        ivar5
        
    161: istore_2
    
    
    162: iload_1
    
        ivar1
        
    163: ifge          100
    
    
    166: aload_3
    
        avar3
        
    167: invokespecial #239                // Method java/lang/String."<init>":([C)V
        
        
    170: areturn

The method is basically is a loop which starts at

    100: aload_3

and ends at

    163: ifge          100

Everything before the loop is initialization code, some of which seems to have been partially obfuscated by the use of various bitwise operators acting on small integers.

Following the initialization we can deduce the following.

The local variable avar0 holds the obfuscated String passed as the argument.

The local variable avar3 holds an array of chars which will contain the unobfuscated characters.

The local variable ivar1 holds the index used to access the characters in avar0 and avar3. Note that it being decremented

    105: iinc          1, -1

and, compared against 0

    123: ifge          130

so both avar0 and avar3 are being iterated over backwards.

The local variable ivar2 is used in the same way to iterate backwards over the String in avar6.

Each time round the loop a character is added to avar3. The character is the result of XORing a character from avar0 and a character from avar6 and the constant 0x3A. [Instructions 100 to 121]

If there are still characters to be added a second character is added which is the result of XORing a character from avar0 and a character from avar6 and the constant 0x72. [Instructions 130 to 154]

Note that ivar1 is being decremented twice each time round the loop but ivar2 only once.

The String in avar6 is constructed using the name of the calling method so the obfuscation is context sensitive.

If you convert all of the above back into Java you will probably end up with something like this

        ...
    
        int     lengthA = a.length();
        int     lengthS = s.length();
        char[]  chars   = new char[lengthA];
        int     i       = lengthA - 1;
        int     j       = lengthS - 1;
    
        while (true)
        {
            char c = a.charAt(i);
            char d = s.charAt(j);
    
            chars[i] = (char)(c ^ d ^ 0x3a);
    
            --i;
            if (i < 0)
            {
                break;
            }
    
            c = a.charAt(i);
            d = s.charAt(j);
    
            chars[i] = (char)(c ^ d ^ 0x72);
    
            --i;
            --j;
            if (j < 0)
            {
                j = lengthS -1;
            }
            if (i < 0)
            {
                break;
            }
        }
        return new String(chars);

where a is the obfuscated String being passed as the argument and s is the context sensitive ‘mask’.

I doubt whether it compiles into exactly the same bytecode but it definitely deobfuscates the String literals in the class files correctly.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

September 5, 2014

The Mystery Of The Unsigned JAR: Part One — Enter A Small JAR, Furtively

Filed under: Security, Things That Go Bump In The Night — Tags: , , — Simon Lewis @ 6:41 am

The unsolicited contributions to my PDF collection seem to have dried up which is a bit of a disappointment. I hope it wasn’t something I said.

I have had a couple more ZIPed .exes but they are not really my kind of thing, then I got sent a JAR file.

Initially I suspected it might be a ZIPed .exe in disguise, which would have been something of an anti-climax, but it turns out to be an actual JAR with a manifest and everything.

Its not exactly the world’s biggest JAR file. Its about 50KB and contains a manifest, four class files and a dll.

More accurately it contains

  • a manifest which is definitely a manifest

  • four class files which are definitely class files since javap can parse them, and

  • a file which has the suffix .dll but which file cannot identify

After taking a quick look at the class files courtesy of ‘javap -c -private‘ there are three things that stand out

  1. there is a ‘sekrit method’

  2. there is a class loader without any classes to load unless it is a very narcissistic class loader which only loads itself

  3. there is something which purports to be a dll and in all probability isn’t


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

August 28, 2014

Anatomy Of A PDF Continued: #4 — Part Two: What Is It With This Obfuscation Lark ?

Javascript Obfuscation Considered …

What IS the point of doing this

    ...
    
    var sekritVar0001 = [0x30,0x74,0x67,0x72,0x6E,0x63,0x65,0x67, 1, 10, 40];
    
    ...

and then this somewhere else ?

    ...
    
    for(var q = 0; q &lt; sekritVar0001.length-3; q++)
    sekritVar0002 += String.fromCharCode(sekritVar0001[q]-2);

    ...

Anybody who has managed to lever open a mal-formed PDF and get to the point where they can actually see it is hardly likely to give up at this juncture just because they are going to have to iterate over an array subtract 2 from every element and turn into a character now are they ?

It says

   .replace

for pity’s sake.

And then this.

   ...
    
    var p1 = "(\/[^\\/";
    
    ...
    
    var p2 = "(\/[\\/";
    var sekritVar0003 = "x" + sekritVar0002 + p1 + "\\d]\/g,'')";
    var sekritVar0004 = "z" + sekritVar0002 + p2 + "]\/g,',')";
    
    ...

Oh no the regular expressions are in two halves ! Oh woe is me !

The first one says strip out everything that isn’t a digit or the character ‘/’.

The second one says turn all the ‘/’s into ‘,’s.

If you apply the second to the result of the first you end up with a lot of comma separated integers, and here’s the start of some non-base64 encoded image data.

    fpo10/t10/hmA32/Ac32/nCK32/XX32/yCO32/R32 ...

What a coincidence.

As for this

    ...
    
    function sekritFun0001(x)
    {
        var s = [];
    
        var z = sekritFun0002(sekritVar0003);
        z = sekritFun0002(sekritVar0004);
    
        var ar = sekritFun0002("[" + z + "]");
    
        for (var i = 0; i &lt; ar.length; i ++)
        {
            var j = ar[i];
            if ((j &gt;= 33) &amp;&amp; (j &lt;= 126))
            {
                s[i] = String.fromCharCode(33 + ((j + 14) % 94));
            }
            else
            {
                s[i] = String.fromCharCode(j);
            }
        }
        return s.join('');
    }
    
    ...

A variable z which we have worked out is a string that looks like a comma separated list of integers is topped and tailed with what looks suspiciously like the delimiters of a Javascript array literal and is then passed to sekritFun0002 and the result is assigned to the variable ar.

Then there is a for loop which is under the mistaken impression that ar references an array.

Now let me guess. The function sekritFun0002 is really the well known turn strings into arrays as if by magic function for which Javascript fortunately defines a four letter abbreviation.

Then there is the body of the loop.

To cut a long story short here is some Java code which does the same thing as the Javascript function written in the increasingly popular no-regexp idiom

    private static String decode(byte[] theBytes)
    {
        StringBuilder builder = new StringBuilder();
        int           nBytes  = theBytes.length;
        int           v       = 0;

        for (int i = 0; i < nBytes; i++)
        {
            byte b = theBytes[i];
    
            if (b == '/')
            {
                // end of number
    
                int c = 0;
    
                if ((v >= 33) && (v <= 126))
                {
                    c = 33 + ((v + 14) % 94);
                }
                else
                {
                    c = v;
                }
                builder.append((char)c);
                v = 0;
            }
            else
            if (b >= '0' && (b <= '9'))
            {
                v *= 10;
                v += b - '0';
            }
        }
        return builder.toString();
    }

If you run it on the two non-images needless to say you get Javascript.

… Pointless ?

If the point of the obfuscation is that re-obfuscation will result in the hash of the PDF changing then, as I have already observed, there are much much easier ways to achieve the same effect.

If the point of the obfuscation is to conceal the existence of certain strings that could be used to identify the PDF as malicious it is only necessary if the assumption is that something has

  1. parsed the PDF file

  2. extracted Object 1 into a usable state which so far has always required inflating it twice

  3. parsed the XML of the XFA form and identified the different elements

After step 1 the structure of the PDF is apparent and can be considered characteristic of this particular PDF.

After step 2 the ratio of the original to the final size of Object 1 is apparent and can also considered to be characteristic of this particular PDF.

After step 3 another characteristic of this particular PDF is apparent without even looking at the Javascript elements.

To misquote Mae West is that an image in your form or are you just pleased to see me ?

The content of the form is dominated by one thing, an image. To all intents and purposes that’s all it is.

If whatever it is has got this far it can be about 99% certain that it is looking at an instance of this particular malicious PDF.

It can increase the certainty still further by inspecting the image without even decoding it.

In short, unless you can conceal the elephant in the form, there is no need to obfuscate the Javascript other than for amusement.

Postscript (Not The Language)

Even if it was a valid PDF I don’t think this one would do what it was intended to do either.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Anatomy Of A PDF Continued: #4 — Part One: Now What ?

My collection of a single PDF continues to grow apace whether I want it to or not.

I got three ‘invoices’ in one go the other day but they were ZIPs and ZIPs usually means .exes and so it proved.

Then today a PDF arrived which not only supposedly originated in a completely different continent to the other three, but is four times as big as the others. That has got to be good hasn’t it ?

It turned out to be a bit of disappointment taken as PDF because technically it isn’t one. I know I got it for nothing and everything but honestly it comes to something when people can’t even be bothered to get the format right.

Still whats the point of writing a whacking great chunk of Objective-C to read well-formed PDFs if you can’t hack lumps out of it until it can read things that are not well-formed PDFs ? After some judicious hard-wiring of this and that I managed to extract Object 1 once again and inflate it twice as per usual.

And ?

And the XML is exactly the same as all the others ?

Yes and no. The overall structure is pretty much the same but the data for the first two images isn’t.

It does not look like Base64 encoded data and Base64 decoding it definitely does not produce Javascript.

A quick look for the Base64 alphabet construction kit reveals that it has gone walk about, but both images are referenced in the Javascript so the supposition has got to be that it is still really Javascript but its not Base64 encoded. Its a bit of a disappointment but you have got to move with the times I suppose. Base64 encoding was so last week.

Time to find out what this weeks fashionable Javascript encoding technology is i suppose.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

August 26, 2014

Anatomy Of A PDF: Afterword

Filed under: CVE, CVE-2013-2729, PDF, PDF Vulnerability, Security — Tags: , , , — Simon Lewis @ 7:14 am

Since I started writing these posts anonymous benefactors have very kindly presented me with two further versions of the original PDF to add to my collection.

I say versions because although they both possess exactly the same structure as the original, the size of Object 1 is slightly different in each one and the binary sludge is different.

This of course means that the hash of the file will be different in each case, which in turn means it is very likely that any hash based AV scanner will miss these slightly different versions unless they are kept updated with the hashes of these new versions as they appear.

Looking at the actual XML it is apparent that the obfuscation of the Javascript has resulted in different variable names in each version but that there is no difference between what is obfuscated and what is not in any of them.

One additional thing all three versions have in common is that I suspect they won’t actually work.

There is no question what they are trying to do and how they are trying to do it, and at least one version has been seen in the wild by someone else that works, but it is a distinct possibility that the versions I have do not.

I have no way of proving this one way or another as I do not have access to the appropriate environment.

If I am wrong and they will in fact do what they are intended to do I would obviously be interested in knowing why I am wrong. It is all grist to the mill.

If I am right then it is all to the good, at least these particular versions cannot cause any damage, so for obvious reasons I am not going to say why they will not work as intended other than that it is a very simple mistake.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

August 25, 2014

Anatomy Of A PDF: Part Eight — The Denouement

Filed under: BMP, PDF, Security, XFA — Tags: , , , , , — Simon Lewis @ 9:42 am

Given that this thing is in the wild and putting aside the possibility that it is a piece of very elaborate performance art then it must be targeting an actual vulnerability.

Its pretty obvious what the program is and what the platform is, so typing those along with terms like PDF, XFA, and BMP in some combination into your search engine of choice turns up all sorts of stuff but it looks like CVE-2013-2729 is the vulnerability in question.

See here and here for the gory details of how it actually exploits the heap corruption and what happens once it has done so.

The heap implementation targeted is the Low Fragmentation Heap (LFH). See the paper “Understanding the Low Fragmentation Heap” by Chris Valasek for a detailed description of how it works and how to do unpleasant things to it. Be aware that this is a PDF which in the circumstances … ! You can find it here.

For details of how synthesized x86 machine code can be made to run when it shouldn’t be see here. Warning, may contain assembler


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Anatomy Of A PDF: Part Seven — How To Overflow An Unsigned Integer Using Nothing But Bytes !

Filed under: BMP, BMP Run Length Encoding, Image Formats, Security — Tags: , , , — Simon Lewis @ 8:33 am

The theory of overflowing unsigned integers is, fortunately, well understood.

You fill them up with ‘F’s, you can use ‘f’s instead if you prefer, and then you add 1, like so.

    ...
    
    uint32_t i = 0xFFFFFFFF;

    i += 1;
    
    ...

and i is now zero.

Nothing to it.

Alternative combinations of ‘F’s not to mention other hex digits and increments are possible but its best to start off with the base case and make sure you’ve got the hang of it before moving on to the more advanced stuff.

One obvious application of this is when you have an excess of large unsigned integers and a shortage of zeroes. You can use this technique to turn the former into the latter.

Another application is this

    0000000 42 4d 00 00 00 00 00 00 00 00 00 00 00 00 40 00
    0000010 00 00 2e 01 00 00 01 00 00 00 01 00 08 00 01 00
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
    0000030 00 00 00 00 00 00 52 47 42 41 52 47 42 41 00 02
    0000040 ff 00 00 02 ff 00 00 02 ff 00 00 02 ff 00 00 02
    *
    4040440 f8 00 00 08 01 00 00 00 00 00 27 05 00 02 00 ff
    4040450 00 02 00 ff 00 02 00 ff 00 02 00 ff 00 02 00 ff
    *
    4040470 00 02 00 ff 00 0a 58 58 58 58 58 58 58 58 58 58
    4040480

which is the really really big BMP that came free with the PDF.

The result of Base64 decoding the image data in the XML clocks in at 67,372,160 bytes, which is surprisingly large for an image which is only 302×1.

The reason that you can fit 67MB of data on nine lines of hexdump output is that the data is intensively repetitive.

The vast majority of the file comprises the four bytes

    00 02 ff 00

The first occurence is at 0x00003E and the last occurence ends at 0x404043E which gives a grand total of 0x4040400 bytes or 0x1010100 repetitions.

If you multiply 0xFF by 0x1010100 you get 0xFFFFFF00 which is a mere 0xFF+1 away from being 0 in certain circumstances.

This is interesting because that multiplication is equivalent to the effect of the four bytes

    00 02 ff 00

being repeated 16843008 times in the BMP image data.

As we know the header specifies that the image is run length encoded which means that the image data is a series of commands which are evaluated at runtime to produce the bytes that comprise the image.

The image is built up left to right, bottom to top as the commands are evaluated.

If the result of evaluating a command is a sequence of bytes, those bytes are appended at the offset within the current line specified by the current x position.

Both the current x position and the current y position, which specifies the current line, can be changed using a delta command.

The four bytes

    00 02 ff 00

are an example of a delta command the effect of which is to add 0xFF to the current x position.

If that command is successfully evaluated 16843008 times then you know

  1. that whatever is doing the evaluating is not doing any kind of bounds checking in this particular case, and

  2. that the current x position is now 0xFFFFFF00

The command at 0x404043E which immediately follows the repetitions is another delta command

    00 02 f8 00

except that this time it adds 0xF8 to the current x position which brings it up 0xFFFFFFf8 as well as creating the expectation that the next command is going to feature the number 8.

The next command is at 0x4040442 and it is this

    00 08 01 00 00 00 00 00 27 05

and lo and behold there is the number 8.

The effect of the command is to add the last eight bytes of the command as image data at the current x position

Clearly at this point it would be a very irresponsible of whatever it is that is evaluating the commands not to perform a bounds check, which means that we are going to discover exactly how the current x position is being represented.

The width and height fields of this particular BMP are signed 32-bit integers so it would not be unreasonable for the current x position to be represented using an unsigned 32 bit integer.

If this command succeeds than either

  • there is no bounds checking

OR

  • the current x position is represented by the command evaluator as an unsigned 32 bit integer

Whichever it is, those eight bytes are going to end up somewhere where they are going to make no useful contribution to the image at all.

After all the excitement of adding some actual bytes, albeit in the wrong place, it all seems to go horribly wrong.

Starting at 0x404044C another delta command

    00 02 00 ff

which adds 0xFF to the current y position, is repeated ten times.

The final command at 0x4040474

    00 0a 58 58 58 58 58 58 58 58 58 58

then adds ten bytes at the current x position, except of course that it doesn’t.

The current y position is out of bounds, the image is supposed to be 302×1 after all, and nothing like large enough to benefit from any kind of unsigned integer overflow and that is intentional.

Causing the image load to fail results in the memory allocated for the image being freed.

If the eight bytes that actually got written have ended up in the right wrong place then presumably their effect is to cause the heap management code to free the wrong object at this point.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Anatomy Of A PDF: Part Six — The Allocation And Deallocation Of Strings By Javascript Considered Harmful ?

Filed under: Javascript, Security — Tags: , — Simon Lewis @ 6:53 am

When it is not Base64 decoding other bits of Javascript disguised as images the Javascript in the XFA form is mostly messing about with strings which on the face of it doesn’t look as though it could be that dangerous.

The messing about occurs in three distinct phases.

The Allocation And Selective Deallocation Phase

The Javascript triggered by the initialize event starts by creating four arrays.

It then proceeds to set each element of the first array to a newly allocated string.

Each string is the result of concatenating two distinctive patterns plus two elements computed using the current iteration index.

Interestingly the size of the created strings looks very much as though it may be the same as the width of the really really big BMP image.

Having gone to all that trouble to create all those different strings it then proceeds to repeatedly null out every tenth element of the first
array.

Search Phase

The search phase is performed by the Javascript triggered by the docReady event.

The code iterates over all the elements of the first array. If the element is not null the first character of the string is compared with the
value that was used to initialize it when the string was created.

If they are not equal the index of the element is noted, the contents of part of the changed string are saved and the iteration terminates.

Targetted Deallocation and Second Allocation Phase

This phase is also performed by the Javascript triggered by the docReady event but only occurs if a changed string was found during the search phase.

This phase begins by assembling a new string from various bits of hex.

The element in the first array which references the changed string and the element before it in the array are then repeatedly set to null.

Each element of the second array is then set to a string created from the new string constructed at the start of this phase.

It is at this point there is a reference to “Image_2″ which we know to be Base64 encoded Javascript.

The image data is decoded and then handed off to eval which is masequerading under an alias !

The “Image_2″ code builds another string out of information from the version indexed table in “Image_1″, large amounts of repeated hex, some of which is in all probability x86 code, the strings

    "VirtualProtect"

and

    "KERNEL32"

plus the result of unescaping a big chunk of data which, it turns out, contains, amongst other things, the hard-wired URL of a .exe.

I cannot begin to imagine what they are going to try and do with all of that !

The string is then repeatedly concatenated with itself until the result is a string which occupies at least 1MB of memory.

One hundred copies of this string are then created, presumably just to be on the safe side ?

By this point the heap must surely consist almost entirely of x86 code !

Conjecture

The order of events is

  1. the initialize event code runs.

  2. System attempts to load really, really, big BMP

  3. the docReady event code runs.

and what happens is

  1. the initialize event code sets up a known heap state based on the allocation of strings with distinctive contents, interspersed with holes created by freeing every tenth string.

  2. the System image loading code corrupts the heap attempting to load the really, really, big BMP

  3. he docReady event code finds the location of the heap corruption on the basis of the known state set up by the initialize event code and exploits it.

Given that the Javascript it looking for a string that has changed underneath it so to speak the assumption must be that the string has been freed despite still being in use. Once freed it is re-used by non Javascript code which changes it.

This in turn implies that the heap corruption involves the data used by the heap management code to keep track of which pieces of memory in the heap are in use and which are free. Such data is often contiguous with the memory that is allocated, in which case the easiest way to corrupt it is to write beyond the limits of a piece of memory as allocated which is what the really really big BMP is for.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate
and specific direction to the original content.

Older Posts »

The WordPress Classic Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: