According to the cross-reference table there are six objects in use starting with object number 1.
We start from the root object, as specified by the trailer.
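As a rough illustration of how the cross-reference table yields those object offsets, here is a minimal Python sketch. The table itself is not reproduced in this post, so the `XREF` text below is a hypothetical reconstruction built from the offsets quoted further down. The `0 7` subsection header and the free entry for object 0 are assumptions, as is the helper name `parse_xref`.

```python
# Hypothetical xref section: the offsets are the ones quoted in this post,
# but the subsection header "0 7" and the free entry are assumptions.
XREF = (
    "xref\n"
    "0 7\n"
    "0000000000 65535 f \n"
    "0000000015 00000 n \n"
    "0000013263 00000 n \n"
    "0000013294 00000 n \n"
    "0000013443 00000 n \n"
    "0000013499 00000 n \n"
    "0000013644 00000 n \n"
)


def parse_xref(text):
    """Return {object_number: byte_offset} for the in-use ('n') entries."""
    lines = text.splitlines()
    if lines[0].strip() != "xref":
        raise ValueError("not a classic cross-reference section")
    offsets = {}
    i = 1
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2:
            break  # anything else (e.g. 'trailer') ends the table
        first, count = int(header[0]), int(header[1])
        for n in range(count):
            offset, _generation, kind = lines[i + 1 + n].split()
            if kind == "n":
                offsets[first + n] = int(offset)
        i += 1 + count
    return offsets


in_use = parse_xref(XREF)
print(in_use)
# {1: 15, 2: 13263, 3: 13294, 4: 13443, 5: 13499, 6: 13644}
```

This only handles the classic textual table; cross-reference streams would need a different approach.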
Object 3
Object 3 starts at 13294 which is 0x33EE.
3 0 obj
<<
/Extensions <</ADBE <</ExtensionLevel 3 /BaseVersion /1.7 >> >>
/AcroForm 2 0 R
/Type /Catalog
/Pages 4 0 R
/NeedsRendering true
>>
endobj
The root object should be a Catalog and the Type entry shows that it is.
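The same check can be done mechanically. The sketch below is only an illustration: the real file is not available here, so `data` is a stand-in made of padding followed by Object 3 as quoted above, and `read_object` is a hypothetical helper that naively reads up to the next `endobj`.

```python
def read_object(data: bytes, offset: int) -> bytes:
    """Return the bytes from offset up to and including the next 'endobj'.

    Naive on purpose: a stream containing the bytes 'endobj' would fool it.
    """
    end = data.index(b"endobj", offset) + len(b"endobj")
    return data[offset:end]


# Object 3 exactly as quoted in this post.
CATALOG = (
    b"3 0 obj\n"
    b"<<\n"
    b"/Extensions <</ADBE <</ExtensionLevel 3 /BaseVersion /1.7 >> >>\n"
    b"/AcroForm 2 0 R\n"
    b"/Type /Catalog\n"
    b"/Pages 4 0 R\n"
    b"/NeedsRendering true\n"
    b">>\n"
    b"endobj\n"
)

offset = 13294
# Stand-in for the real file: filler bytes so Object 3 sits at its offset.
data = b"\n" * offset + CATALOG

obj = read_object(data, offset)
print(hex(offset))                    # 0x33ee
print(obj.startswith(b"3 0 obj"))     # True
print(b"/Type /Catalog" in obj)       # True
```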
AcroForm
The presence of the AcroForm entry indicates that the PDF contains an interactive form.
The entry references Object 2 which should therefore be an interactive form dictionary specifying the form.
Pages
The Pages entry references Object 4 which should therefore be a page tree node.
NeedsRendering
The NeedsRendering entry is another clue that the document contains a form.
Object 2
Object 2 starts at offset 13263 which is 0x33CF and it looks like this.
2 0 obj
<</XFA 1 0 R >>
endobj
According to Object 3, the Document Catalog, this should be an interactive form dictionary and the presence of the XFA entry confirms that it is.
XFA is the XML Forms Architecture. Amongst other things it supports scripting, and in the Adobe implementation the scripting support includes JavaScript, which in this context seems highly significant.
The XFA entry identifies the object which contains the XML which describes the XFA form.
The referenced object is Object 1, so ten to one on there are scripts in Object 1 and they do something unpleasant.
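The chain from the Catalog to the XFA object can be followed mechanically. This is a minimal sketch, assuming the object bodies are already available as text keyed by object number. `OBJECTS`, `references` and `xfa_object` are hypothetical names, and the regular expression only handles the simple `name n g R` form seen in this file.

```python
import re

# Object bodies as quoted in this post (Object 1 itself is not needed here).
OBJECTS = {
    3: "<< /Extensions <</ADBE <</ExtensionLevel 3 /BaseVersion /1.7 >> >> "
       "/AcroForm 2 0 R /Type /Catalog /Pages 4 0 R /NeedsRendering true >>",
    2: "<</XFA 1 0 R >>",
}

# Matches '/Key <object number> <generation> R'.
REF = re.compile(r"/(\w+)\s+(\d+)\s+(\d+)\s+R\b")


def references(body):
    """Map each key whose value is an indirect reference to its object number."""
    return {key: int(num) for key, num, _gen in REF.findall(body)}


def xfa_object(objects, root):
    """Follow Catalog -> AcroForm -> XFA and return the XFA object number."""
    acroform = references(objects[root]).get("AcroForm")
    if acroform is None:
        return None
    return references(objects[acroform]).get("XFA")


print(references(OBJECTS[3]))   # {'AcroForm': 2, 'Pages': 4}
print(xfa_object(OBJECTS, 3))   # 1
```

A real tool would parse the dictionaries properly rather than pattern-match them, but for a file this small the shortcut is enough to show where the interesting object lives.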
Object 1
Object 1 starts at offset 15.
According to Object 2 it should contain XML specifying an XFA form.
It is the biggest object by far so it definitely contains something, and presumably something nasty at that, so we’ll save it for later.
Object 4
Object 4 starts at 13443 which is 0x3483 and it looks like this.
4 0 obj
<<
/Count 1
/Kids [5 0 R]
/Type /Pages
>>
endobj
According to Object 3, the Document Catalog, this should be a page tree node with a Type of Pages and it is.
The value of the Kids (sic) entry is an Array of Object References of length one.
This entry identifies the children of this object, which are either collections of pages or individual pages which make up the document.
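To illustrate how the Kids entries lead down to the individual pages, here is a small sketch of a page tree walk. It assumes the object bodies are available as text keyed by object number. The body for Object 5 is abridged from the listing elsewhere in this post, and `kids` and `pages` are hypothetical helper names.

```python
import re

OBJECTS = {
    4: "<< /Count 1 /Kids [5 0 R] /Type /Pages >>",
    # Abridged: the Resources entry is left out because it is not needed here.
    5: "<< /Parent 4 0 R /Type /Page /Contents 6 0 R >>",
}


def kids(body):
    """Return the object numbers referenced by the /Kids array, if any."""
    match = re.search(r"/Kids\s*\[(.*?)\]", body, re.S)
    if not match:
        return []
    return [int(num) for num, _gen in re.findall(r"(\d+)\s+(\d+)\s+R", match.group(1))]


def pages(objects, node):
    """Walk the page tree depth first and return the page object numbers."""
    body = objects[node]
    if re.search(r"/Type\s*/Pages\b", body):
        found = []
        for kid in kids(body):
            found.extend(pages(objects, kid))
        return found
    return [node]  # a leaf, i.e. a page object


print(kids(OBJECTS[4]))    # [5]
print(pages(OBJECTS, 4))   # [5]
```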
Object 5
Object 5 starts at 13499 which is 0x34BB and it looks like this.
5 0 obj
<<
/Parent 4 0 R
/Type /Page
/Contents 6 0 R
/Resources << /Font <</F1 <</BaseFont /Helvetica /Subtype /Type1 /Name /F1 >> >> >>
>>
endobj
According to Object 4, the Pages Object, this should be either another page tree node or a page object. The value of the Type entry, Page, tells us that it is a page object.
The Parent entry is a reference to Object 4 which is indeed the parent of this object.
The Contents entry is a reference to Object 6.
The Resources entry is an example of an entry whose value is a dictionary, which is itself an example of a dictionary with entries whose values are dictionaries.
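The nesting is easier to see once the dictionary is turned into a data structure. The following is a minimal sketch of a recursive parser that copes only with what the Resources value actually contains, namely names and nested dictionaries. `TOKEN`, `parse` and `RESOURCES` are hypothetical names, and the parser ignores arrays, strings, numbers with special meaning and indirect references.

```python
import re

# The value of the Resources entry exactly as quoted in Object 5.
RESOURCES = "<< /Font <</F1 <</BaseFont /Helvetica /Subtype /Type1 /Name /F1 >> >> >>"

# Tokens: dictionary delimiters, /Names, or any other run of regular characters.
TOKEN = re.compile(r"<<|>>|/[^\s<>/\[\]()]+|[^\s<>/\[\]()]+")


def parse(tokens, pos=0):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    token = tokens[pos]
    if token == "<<":
        result = {}
        pos += 1
        while tokens[pos] != ">>":
            key = tokens[pos][1:]                 # drop the leading '/'
            value, pos = parse(tokens, pos + 1)   # the value may be a dictionary too
            result[key] = value
        return result, pos + 1                    # step over the closing '>>'
    return token, pos + 1                         # a name or other simple token


resources, _ = parse(TOKEN.findall(RESOURCES))
print(resources)
# {'Font': {'F1': {'BaseFont': '/Helvetica', 'Subtype': '/Type1', 'Name': '/F1'}}}
```

Three levels of dictionary, then: Resources, Font and F1.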
Object 6
Object 6 starts at 13644 which is 0x354C.
6 0 obj
<</Length 23 >>
stream
BT /F1 24 Tf 100 100 Td
endstream
endobj
We know that it is the contents of the page represented by Object 5.
It is in fact a Content Stream which comprises a series of graphic operators and their operands, with the operands preceding their operators (cough, PostScript, cough).
For what it's worth, BT is the ‘begin text’ operator.
Tf is the ‘set text font and size’ operator. It is preceded by its operands /F1 24, with F1 being the font as defined in Object 5 and 24 being the size.
Td is the ‘move text position’ operator. It is preceded by its operands 100 100, which are the horizontal and vertical offsets.
And that’s it. Not a very exciting page it has to be said.
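Since the operands come before their operators, the stream from Object 6 can be split into operations with nothing more than a list that collects operands until an operator turns up. This is a minimal sketch covering only the three operators that appear on this page. `OPERATORS` and `operations` are hypothetical names, and a real content stream would need a proper tokenizer for strings and arrays.

```python
# The stream data from Object 6, exactly as quoted.
STREAM = b"BT /F1 24 Tf 100 100 Td"

# Only the operators that occur in this stream; the full set is much larger.
OPERATORS = {"BT", "Tf", "Td"}


def operations(stream):
    """Group each operator with the operands that precede it."""
    operands = []
    result = []
    for token in stream.decode("ascii").split():
        if token in OPERATORS:
            result.append((token, operands))
            operands = []          # start collecting for the next operator
        else:
            operands.append(token)
    return result


print(len(STREAM))          # 23, which agrees with the /Length entry
print(operations(STREAM))
# [('BT', []), ('Tf', ['/F1', '24']), ('Td', ['100', '100'])]
```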
Copyright (c) 2014 By Simon Lewis. All Rights Reserved.
Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.
Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.