Using a CompoundFile instance we can create a Stream instance for any stream object in the file, as long as we know what it is called.
If we continue to assume that whatever this particular compound file does is being done by macros, then we need to know where macros are stored in a Word document.
To find this out we need to consult a second specification pithily entitled
[MS-DOC]: Word (.doc) Binary File Format
According to section 2.1.9 Macros Storage
The Macros storage is an optional storage that contains the macros for the file. If present, it MUST be a Project Root Storage as defined in [MS-OVBA] section 2.2.1.
Curiously, every other section that describes a storage explicitly specifies its name, for example:
The Custom XML Data storage is an optional storage whose name MUST be “MsoDataStore”.
Section 2.1.9 is the only one that does not, so we will have to assume for the moment that its name is going to be "Macros" or something of that ilk.
Moving on to specification number three
[MS-OVBA]: Office VBA File Format Structure
as referenced from section 2.1.9 quoted above, section 2.2.1 Project Root Storage starts
A single root storage. MUST contain VBA Storage (section 2.2.2) and PROJECT Stream (section 2.2.7).
Going further down the rabbit hole we find section 2.2.2 VBA Storage
A storage that specifies VBA project and module information. MUST have the name “VBA” (case-insensitive). MUST contain _VBA_PROJECT Stream (section 2.3.4.1) and dir Stream (section 2.3.4.2). MUST contain a Module Stream (section 2.2.5) for each module in the VBA project.
It's not obvious from that where the actual code is, but a quick look at section 2.2.5 tells us.
A stream (1) that specifies the source code of modules in the VBA project. The name of this stream is specified by MODULESTREAMNAME (section 2.3.4.2.3.2.3). MUST contain data as specified by Module Stream (section 2.3.4.3).
So that's where the source code is, but the name of the stream, it appears, is elsewhere.
In fact the name is in a MODULESTREAMNAME record, which is in a MODULE record, which is in a PROJECTMODULES record, which is in the “dir” stream.
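To make that nesting a little easier to picture, here is a minimal sketch of it as Swift types. The type names, the single field kept on each one, and the sample stream names are all hypothetical; the real records in [MS-OVBA] carry a good deal more, and nothing here actually parses a “dir” stream.

```swift
// Hypothetical, heavily simplified model of the nesting described above.
// Only the one field we care about at this point is kept.
struct ModuleRecord {
    let streamName: String   // the value held by the MODULESTREAMNAME record
}

struct ProjectModulesRecord {
    let modules: [ModuleRecord]   // one MODULE record per module
}

struct DirStream {
    let projectModules: ProjectModulesRecord
}

// Walk "dir" stream -> PROJECTMODULES -> MODULE -> MODULESTREAMNAME
// and collect the names of the module streams.
func moduleStreamNames(in dir: DirStream) -> [String] {
    return dir.projectModules.modules.map { $0.streamName }
}

// Made-up sample data, purely for illustration.
let exampleDir = DirStream(
    projectModules: ProjectModulesRecord(modules: [
        ModuleRecord(streamName: "ThisDocument"),
        ModuleRecord(streamName: "Module1")
    ])
)
let exampleNames = moduleStreamNames(in: exampleDir)
print(exampleNames)
```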
In the face of all that it's tempting to just guess which stream it must be. There can't be that many of them, can there?
Assuming we can find it, what's in it?
A module stream, it turns out, contains a variable-length record followed by the compressed source code, so even if we guess which stream it is, that is not going to do us much good.
The length of that first variable-length record is defined in a MODULEOFFSET record, which is also contained within a MODULE record, and so on and so forth.
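A minimal sketch of why the offset matters, assuming we already have the module stream's bytes and the value from the MODULEOFFSET record in hand. The function name and the sample bytes are made up for illustration, and no decompression is attempted.

```swift
// Split a module stream's bytes into the leading variable-length record
// and the compressed source code that follows it.
// `textOffset` stands in for the value carried by the MODULEOFFSET record.
// Returns nil if the offset does not fall within the stream.
func splitModuleStream(_ bytes: [UInt8], textOffset: Int)
    -> (header: [UInt8], compressedSource: [UInt8])? {
    guard textOffset >= 0 && textOffset <= bytes.count else { return nil }
    return (header: Array(bytes[0..<textOffset]),
            compressedSource: Array(bytes[textOffset..<bytes.count]))
}

// Placeholder bytes, not taken from a real file.
let sampleStream: [UInt8] = [0xAA, 0xBB, 0x01, 0x02, 0x03]
if let parts = splitModuleStream(sampleStream, textOffset: 2) {
    print("header bytes: \(parts.header.count), compressed bytes: \(parts.compressedSource.count)")
}
```

Without the offset there is no way to tell where the first record ends and the compressed source begins, which is the whole problem.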
There is nothing for it: we are going to have to get hold of the “dir” stream.
We are looking for a stream named “dir” within a storage object which is definitely named “VBA” or “vba” or “VbA” or something like that, which is within a storage object which might be called “Macros”, maybe.
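Given the uncertainty about case, a lookup that ignores it seems prudent. What follows is a sketch only: Node is a hypothetical in-memory stand-in for a compound file's directory tree, not the CompoundFile class used in this post, and the sample tree is made up. Ignoring case for every name is an assumption as well; the text quoted above only promises it for “VBA”.

```swift
// Hypothetical in-memory stand-in for a compound file's directory tree.
enum Node {
    case stream(name: String)
    case storage(name: String, children: [Node])

    var name: String {
        switch self {
        case .stream(let name): return name
        case .storage(let name, _): return name
        }
    }
}

// Walk down through `storages` starting at `root`, then look for a stream
// called `streamName`. All name comparisons ignore case (an assumption).
func findStream(in root: Node, storages: [String], streamName: String) -> Node? {
    var current = root
    for wanted in storages {
        guard case .storage(_, let children) = current else { return nil }
        let next = children.first(where: { child in
            if case .storage = child {
                return child.name.lowercased() == wanted.lowercased()
            }
            return false
        })
        guard let found = next else { return nil }
        current = found
    }
    guard case .storage(_, let children) = current else { return nil }
    return children.first(where: { child in
        if case .stream = child {
            return child.name.lowercased() == streamName.lowercased()
        }
        return false
    })
}

// Made-up sample tree, for illustration only.
let sampleRoot = Node.storage(name: "Root Entry", children: [
    .stream(name: "WordDocument"),
    .storage(name: "Macros", children: [
        .stream(name: "PROJECT"),
        .storage(name: "VbA", children: [
            .stream(name: "_VBA_PROJECT"),
            .stream(name: "dir")
        ])
    ])
])
let foundDir = findStream(in: sampleRoot, storages: ["Macros", "VBA"], streamName: "dir")
print(foundDir?.name ?? "not found")
```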
Trying
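// Open the compound file named by the first command-line argument.
let cff = CompoundFileFactory()
let cf = cff.open(argv[1])
// Look for the "dir" stream in the "VBA" storage within the "Macros" storage.
let dirStream = cf?.getStream(storage: ["Macros", "VBA"], name: "dir")
// Read the stream's contents; this is nil if any step above failed.
let data = dirStream?.data()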
results in a non-nil value for data, so we are nearly there.
Copyright (c) 2014 By Simon Lewis. All Rights Reserved.
Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.
Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.