English [Trend Micro 2015] [Analysis-other 100] Write-Up


Category: Analysis-others
Points: 100

Please fix the PDF file for me.



If you have ever tried to read the PDF specification, you know it’s a real mess. There are several versions of the format (PDF 1.3, PDF 1.4, PDF 1.7 and certainly others). The format is also very loose: you can do weird things without breaking the file, but as a consequence its internals are difficult to read.
Moreover, none of us was used to working with the PDF specification, so it was quite difficult, but we did it, and it was really interesting 🙂

Anyway, the goal of this challenge is to fix the PDF. Most PDF viewers are unable to open the file, so something must be missing. To find out what, we can try to read it with the Origami framework and other tools.

$ pdfsh
Welcome to the PDF shell (Origami release 1.2.6) [OpenSSL: yes, JavaScript: no]

>> PDF.read 'fix_my_pdf.pdf'
[info ] ...Reading header...
[info ] ...Parsing revision 1...
[error] Breaking on: "stream\nx\x01\xED..." at offset 0x1581
[error] Last exception: [Origami::InvalidObjectError] Cannot determine object (no:13,gen:0) type
[info ] ...Parsing xref table...
[info ] ...Parsing trailer...
[info ] ...Propagating types...

---------- Header ----------
 [+] Major version: 1
 [+] Minor version: 3
---------- Body ----------
 5 0 R ContentStream
 6 0 R Integer
 3 0 R Page
 7 0 R Resources
 8 0 R FormXObject
 9 0 R Integer
 10 0 R Dictionary
 11 0 R ImageXObject
 12 0 R Integer
 14 0 R Integer
 17 0 R ImageXObject
 18 0 R Integer
 19 0 R ImageXObject
 20 0 R Integer
 15 0 R ExtGState
 16 0 R Array
 4 0 R PageTreeNode
 21 0 R Catalog
 2 0 R MetadataStream
 22 0 R Integer
 23 0 R ByteString
 24 0 R ByteString
 25 0 R ByteString
 26 0 R ByteString
 1 0 R Dictionary
---------- Trailer ---------
 [*] /Size: 27
 [*] /Root: 21 0 R
 [*] /Info: 1 0 R
 [*] /ID: [ <<<4A35CD9A9BEE822FD415DF26633E5E4A&>>>; <<<4A35CD9A9BEE822FD415DF26633E5E4A>>> ]
 [+] startxref: 32548

$ pdfimages fix_my_pdf.pdf .
Syntax Error (756): XObject 'Im2' is wrong type

OK, we have some errors, which give us a starting point for the forensics: “Cannot determine object (no:13,gen:0) type” and “XObject 'Im2' is wrong type”.
With these hints, we can open pdfwalker and search for an object with ID 13.

By the way, here is a quick explanation of the PDF format before moving on. When you load a PDF, you actually load objects which have properties. For example (as you’ll see a bit later), you can load a FormXObject, which is a container for other objects. This FormXObject has properties, for example the length (/Length), the compression filter (/Filter), the object type (/Type), etc. But it’s not as simple as it seems: a property value can be a literal… or a reference to another object ID, which adds an amazing mess 🙂
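As an illustration of that last point, here is a minimal Python sketch that tells literal values apart from indirect references (the "N G R" form). The helper name `classify` is made up for this example, and the sample entries are copied from the image dictionary shown later in this write-up; it is not a real PDF parser.

```python
import re

# An indirect reference looks like "15 0 R": object number, generation, then R.
REF = re.compile(r"^(\d+)\s+(\d+)\s+R$")

def classify(value: str) -> str:
    """Return a short description of a PDF dictionary value (hypothetical helper)."""
    value = value.strip()
    m = REF.match(value)
    if m:
        return f"reference to object {m.group(1)} (generation {m.group(2)})"
    if value.startswith("/"):
        return "name literal"
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", value):
        return "numeric literal"
    return "other (array, string, nested dictionary, ...)"

# A few entries from the ImageXObject dictionary of object 11.
entries = {
    "/Length": "12 0 R",
    "/Subtype": "/Image",
    "/Width": "305",
    "/ColorSpace": "16 0 R",
}

for key, value in entries.items():
    print(f"{key:12} {value:8} -> {classify(value)}")
```

Note that even /Length, which one would expect to be a plain number, is a reference here: the actual value lives in object 12.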

So, back to our challenge. PDFWalker is launched, and we are looking for an object with ID 13. We just have to look for “XX 0 obj” strings (in this case, this string may not even be mandatory).
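The same search can be sketched outside of PDFWalker with a few lines of Python. This is only a rough scan under stated assumptions: it expects LF line endings, it can report false positives if compressed stream data happens to contain a matching byte sequence, and the function name `list_objects` and the sample bytes are invented for the example (the sample is not the challenge file).

```python
import re

# "N G obj" at the start of a line: object number, generation number, keyword.
OBJ_MARKER = re.compile(rb"^(\d+)\s+(\d+)\s+obj\b", re.MULTILINE)

def list_objects(data: bytes):
    """Return (object number, generation, byte offset) for every marker found."""
    return [(int(m.group(1)), int(m.group(2)), m.start())
            for m in OBJ_MARKER.finditer(data)]

# Tiny hand-made sample standing in for the real file contents.
sample = (
    b"%PDF-1.3\n"
    b"8 0 obj\n<< /Type /XObject >>\nendobj\n"
    b"10 0 obj\n<< >>\nendobj\n"
)

for num, gen, offset in list_objects(sample):
    print(f"object {num} {gen} found at offset {offset:#x}")
```

On a real file you would read the bytes with `open(path, "rb").read()` and pass them to `list_objects`.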

Let's walk through IDs

In the screenshot above, the selected object is a FormXObject with ID 8. We looked through every ID, but there wasn’t any 13. Never mind, we still have another hint: “Im2”. Let’s look for it (Ctrl+F).

Amazing boxes, isn't it ?

So, as you can see, there are two references. The first one (in the right corner) is in the FormXObject data, and the second one (on the left) is in a dictionary property. Let’s study them. The FormXObject is just a form containing other objects (like a div in HTML, for example). These objects are listed in its Resources property, whose value points to object 10 (“10 0 obj”). Looking at this object (just double-click on the word “Resources”), we can see that it is… our dictionary containing Im2. It has the following properties:

10 0 obj
 /ProcSet [ /PDF /ImageB /ImageC /ImageI ]
 /ExtGState << /Gs1 15 0 R >>
 /XObject << /Im1 11 0 R /Im2 13 0 R >>

Im2 references an object with ID 13! Hmmm, seems consistent, everything’s related 🙂

So far, without even opening the PDF in a viewer, we can sum up that it contains a form at position 153:318-642-455 (see the BBox section of the FormXObject), and that this form has a dictionary of two XObjects with IDs 11 & 13. Clicking on “Im1” in the left panel makes PDFWalker show the object, which is an ImageXObject. But when we click on “Im2”, an error message tells us that the object has not been found.

Oh noooo! Not found

Now, it seems pretty clear that what we are looking for is object 13. It MUST be somewhere. To find it, why not check with a hexadecimal editor?

PDF bless us

It’s right here, in front of us! But since pdfwalker didn’t find it, something must be wrong with this part. The best way to find out what is to compare it with a valid ImageXObject, so let’s look for object 11.

We got it :)
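The comparison between the two objects can also be sketched in Python: for each “N G obj” marker, look at what comes right after it. A healthy stream object starts with a dictionary (“<<”), while a broken one jumps straight to the “stream” keyword, which matches the Origami message `Breaking on: "stream\n..."`. The function names and sample bytes below are made up for this example, and the scan assumes LF line endings.

```python
import re

# Object marker followed by optional whitespace, so m.end() lands on the next token.
OBJ_MARKER = re.compile(rb"^(\d+)\s+(\d+)\s+obj\b[ \t\r\n]*", re.MULTILINE)

def first_bytes_after_markers(data: bytes):
    """Map each object number to the first few bytes following 'N G obj'."""
    return {int(m.group(1)): data[m.end():m.end() + 6]
            for m in OBJ_MARKER.finditer(data)}

def streams_without_dictionary(data: bytes):
    """Object numbers where 'stream' directly follows the object marker."""
    return [num for num, head in first_bytes_after_markers(data).items()
            if head.startswith(b"stream")]

# Hand-made sample imitating the situation of objects 11 and 13.
sample = (
    b"11 0 obj\n<< /Length 12 0 R /Subtype /Image >>\n"
    b"stream\nAAAA\nendstream\nendobj\n"
    b"13 0 obj\nstream\nBBBB\nendstream\nendobj\n"
)

print(streams_without_dictionary(sample))
```

Run against this sample, only object 13 is reported, because nothing sits between its marker and its stream data.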

OK, now that we have a valid object, we have to look for the differences. For the 13 0 obj section, we have this:

13 0 obj

For the 11 0 obj section here is what we have :

11 0 obj
<< /Length 12 0 R /Type /XObject /Subtype /Image /Width 305 /Height 50 /ColorSpace 16 0 R /Intent /Perceptual /SMask 17 0 R /BitsPerComponent 8 /Filter /FlateDecode >>

It seems that our broken object is missing some data: the whole dictionary that follows “11 0 obj” has no counterpart after “13 0 obj”. As said before, PDF is a very loose format. Unlike some other file formats (BMP for example), a wrong data offset is not fatal here, since many PDF viewers will rescan the entire document to find object delimiters. The best way to fix object 13 is simply to copy/paste the missing part, as shown below, and to modify the length reference (12 0 R is the length of ImageXObject 11, and during our research we saw that object 14 seemed to be the length of ImageXObject 13):

13 0 obj
<< /Length 14 0 R /Type /XObject /Subtype /Image /Width 305 /Height 50 /ColorSpace 16 0 R /Intent /Perceptual /SMask 17 0 R /BitsPerComponent 8 /Filter /FlateDecode >>

Copy/paste!
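The same copy/paste can be done programmatically. The sketch below is a hedged illustration, not a general repair tool: the function name `add_missing_dictionary` is made up, it assumes LF line endings and a generation number of 0, and it works on a tiny hand-made sample rather than on the challenge file. Inserting bytes also shifts the offsets of everything that follows, so the xref table is no longer accurate; it relies on the viewer being tolerant about that.

```python
# Dictionary copied from object 11, with /Length pointing to object 14 instead of 12.
IMAGE_DICT = (
    b"<< /Length 14 0 R /Type /XObject /Subtype /Image /Width 305 /Height 50 "
    b"/ColorSpace 16 0 R /Intent /Perceptual /SMask 17 0 R "
    b"/BitsPerComponent 8 /Filter /FlateDecode >>"
)

def add_missing_dictionary(data: bytes, obj_num: int, dictionary: bytes) -> bytes:
    """Insert `dictionary` between 'N 0 obj' and a directly following 'stream'."""
    broken = b"\n%d 0 obj\nstream" % obj_num
    if data.count(broken) != 1:
        raise ValueError("expected exactly one broken object marker")
    fixed = b"\n%d 0 obj\n" % obj_num + dictionary + b"\nstream"
    return data.replace(broken, fixed)

# Hand-made sample: object 13 goes straight from its marker to its stream.
sample = b"%PDF-1.3\n13 0 obj\nstream\nDATA\nendstream\nendobj\n"
patched = add_missing_dictionary(sample, 13, IMAGE_DICT)
print(patched.decode("latin-1"))
```

On a real file, the patched bytes would be written to a new file (opened in "wb" mode) so that the original stays untouched.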

OK, file saved, we just have to test it 🙂

Well... At least we can open it :)

Well well well. Not so bad, but there is no flag here.
After some tests (actually a lot, over several hours), we saw that changing the ImageXObject width and height gave us some results. We doubled the values and opened the PDF again.

Is it... a flag?

We’re almost done! There seems to be some text in the image, but changing the width/height again didn’t work. Maybe there is something else to change? After all, there aren’t a lot of properties. Why not try changing BitsPerComponent to 16?
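These trial-and-error changes can be reasoned about with a bit of arithmetic: a viewer expects the decoded image data to hold one byte-aligned row per line, so Width, Height and BitsPerComponent together determine how many bytes it will read. The sketch below computes that expected size for the values we tried. The function name `expected_bytes` is made up, and the number of colour components is an assumption (1 here), since the colour space in object 16 is not shown in this write-up; the real decoded length of object 13 is not stated either, so this only illustrates the relationship.

```python
def expected_bytes(width: int, height: int, bits_per_component: int,
                   components: int = 1) -> int:
    """Bytes of raw sample data for an image whose rows are padded to whole bytes."""
    row_bits = width * components * bits_per_component
    row_bytes = (row_bits + 7) // 8  # round each row up to a byte boundary
    return row_bytes * height

# The three settings tried in this write-up (components=1 is an assumption).
for width, height, bpc in [(305, 50, 8), (610, 100, 8), (610, 100, 16)]:
    size = expected_bytes(width, height, bpc)
    print(f"{width}x{height} at {bpc} bits per component -> {size} bytes")
```

Comparing these numbers with the actual decompressed stream length would be a quicker way to guess the right dimensions than opening the file in a viewer after each change.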


FINALLY! The flag was in front of us! We did it!

Flag is TMCTF{There is always light behind the clouds.}

Here is the fixed pdf 🙂

As said at the beginning of the write-up, none of us was good at PDF forensics, so it was really cool to solve this challenge and to learn a bit about how PDF works.



The lsd
