Just An Application

August 23, 2014

Anatomy Of A PDF: Part Three — Objects, Objects, Objects

According to the cross-reference table there are six objects in use starting with object number 1.

Starting from the root object as specified by the trailer.

Object 3

Object 3 starts at 13294 which 0x33EE.

    3 0 obj
    <<
    /Extensions <</ADBE <</ExtensionLevel 3 /BaseVersion /1.7 >> >>
    /AcroForm 2 0 R
    /Type /Catalog
    /Pages 4 0 R
    /NeedsRendering true
    >>
    endobj

The root object should be a Catalog and the Type entry shows that it is.

AcroForm

The presence of the AcroForm entry indicates that the PDF contains an interactive form.

The entry references Object 2 which should therefore be an interactive form dictionary specifying the form.

Pages

The Pages entry references Object 4 which should therefore be a page tree node.

NeedsRendering

The NeedsRendering entry is another clue that the document contains a form.

Object 2

Object 2 start at offset 13263 which is 0x33CF and it looks like this.

    2 0 obj
    <</XFA 1 0 R >>
    endobj

According to Object 3, the Document Catalog, this should be an interactive form dictionary and the presence of the XFA entry confirms that it is.

XFA is the XML Forms Architecture. Amongst other things it supports scripting, and the Adobe implementation the scripting support includes Javascript, which in this context seems highly significant.

The XFA entry identifies the object which contains the XML which describes the XFA form.

The referenced object is Object 1, so ten to one on there are scripts in Object 1 and they do something unpleasant.

Object 1

Object 1 starts at offset 15.

According to Object 2 it should contain XML specifying an XFA form.

It is the biggest object by far so it definitely contains something, and presumably something nasty at that, so we’ll save it for later.

Object 4

Object 4 starts at 13443 which is 0x3483 and it looks like this

    4 0 obj
    <<
        /Count 1
        /Kids [5 0 R]
        /Type /Pages
    >>
    endobj

According to Object 3, the Document Catalog, this should be a page tree node with a Type of Pages and it is.

The value of the Kids (sic) entry is an Array of Object References of length one.

This entry identifies the children of this object, which are either collections of pages or individual pages which make up the document.

Object 5

Object 5 starts at 13499 which is 0x34BB and it looks like this

    5 0 obj
    <<
        /Parent 4 0 R
        /Type /Page
        /Contents 6 0 R
        /Resources << /Font <&lt/F1 <</BaseFont /Helvetica /Subtype /Type1 /Name /F1 >> >> >>
    >>
    endobj

According to Object 4, the Pages Object, this should be either another page tree node or page object. The Type entry Page tells us that it is a page object.

The Parent entry is a reference to Object 4 which is indeed the parent of this object.

The Contents entry is a reference to Object 6.

The Resources entry is an example of an entry whose value is a dictionary, which is itself an example of a dictionary with entries whose values are dictionaries.

Object 6

Object 6 starts at 13644 which is 0x354c.

    6 0 obj
    <</Length 23 >>
    stream
    BT /F1 24 Tf 100 100 Td
    endstream
    endobj

We know that it is the contents of the page represented by Object 5.

It is in fact a Content Stream which comprises a series of graphic operators and their operands, with the operands preceding their operators (cough, Postscript, cough).

For what its worth

    BT

is the ‘begin text’ operator.

    Tf

is the ‘set text font and size’ operator.

It is preceded by its operands

    /F1 24

with F1 being the font as defined in Object 5 and 24 being the size.

    Td

is the ‘move text’ operator.

It is preceded by its operands

    100 100

And that’s it. Not a very exciting page it has to be said.


Copyright (c) 2014 By Simon Lewis. All Rights Reserved.

Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and owner Simon Lewis is strictly prohibited.

Excerpts and links may be used, provided that full and clear credit is given to Simon Lewis and justanapplication.wordpress.com with appropriate and specific direction to the original content.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: