[Pwn2Win 2016] [Forensics 80 – Samuel Riff Breese] Write-up

Description

We are given a PNG image file that hides top-secret information from Mister Riff.


Question: the flag will be XXX (uppercase)

Solution


Introduction:

At first sight, the key part of the title was the “Riff” in Samuel “Riff” Breese.
RIFF usually means audio, and more specifically WAV files: Wikipedia – WAV/RIFF.
We were fairly sure we would have to build a WAV file at some point, so we started looking for the WAV header data.
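For reference, a WAV file opens with the ASCII tags 'RIFF' and 'WAVE', which are easy to spot once hex-encoded. A minimal sketch, assuming the data turns up as hex text (as it eventually did here); looks_like_wav_hex is a hypothetical helper, not part of the original solution:

```python
import binascii

# Hex encodings of the ASCII tags that open a canonical WAV file
RIFF_HEX = binascii.hexlify(b"RIFF").decode().upper()   # '52494646'
WAVE_HEX = binascii.hexlify(b"WAVE").decode().upper()   # '57415645'
FMT_HEX = binascii.hexlify(b"fmt ").decode().upper()    # '666D7420', the next tag


def looks_like_wav_hex(hex_text):
    """Return True if hex text starts with 'RIFF', a 4-byte size, then 'WAVE'."""
    hex_text = hex_text.strip().upper()
    # Byte k sits at characters 2k..2k+1: bytes 0-3 'RIFF', 4-7 size, 8-11 'WAVE'
    return hex_text[0:8] == RIFF_HEX and hex_text[16:24] == WAVE_HEX


print(looks_like_wav_hex("52494646" "24080000" "57415645"))   # True
```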


Looking for the datas:

The file looked like an entirely black picture.


Looking at the actual pixel values (rather than their rendered color, which is black), we saw that they were not all the same.
Creating a new image in which each pixel's value becomes its color revealed something interesting:

Have you looked closely enough? Yes, that's text, tiny tiny text.
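The code for this value-to-color step is not shown. A minimal sketch of the idea, assuming a single-band image (palette indices or grayscale) whose values are read as a flat list, as PIL's Image.getdata() returns; stretch_values is a hypothetical helper:

```python
def stretch_values(values):
    """Map raw pixel values (e.g. palette indices) onto the 0-255 gray range.

    Values that all render as black can still differ from one another;
    stretching [min, max] to [0, 255] makes those differences visible.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1          # avoid dividing by zero on a flat image
    return [(v - lo) * 255 // span for v in values]


# With PIL (not run here), roughly:
#   im = Image.open("samuel.png")             # hypothetical filename
#   out = Image.new("L", im.size)
#   out.putdata(stretch_values(list(im.getdata())))
print(stretch_values([0, 1, 2, 3]))   # [0, 85, 170, 255]
```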
Looking at the clock, there were only 6 hours left… Time to grab a pencil and some paper!
..
..
..
Nooo, just kidding  :p

Getting the data, the right way:

We had a picture, but not a very clean one, and OCR would not like it much.


The first step was to clean the picture up.
From the initial file, we generated a series of copies, each with a slightly different threshold value deciding whether a pixel becomes black or white.

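from PIL import Image

def enhance(filepath):
   # Assumes a single-band image (grayscale or palette), so pixdata[x, y] is an int
   im = Image.open(filepath)
   pixdata = im.load()
   # Write one black & white version of the image per threshold value (0-63)
   for threshold in range(64):
      img = Image.new("L", im.size, 255)
      pixdata2 = img.load()
      for y in range(im.size[1]):
         for x in range(im.size[0]):
            # Pixels above the threshold become white, the others black
            if pixdata[x, y] > threshold:
               pixdata2[x, y] = 255
            else:
               pixdata2[x, y] = 0
      img.save('enh_%i_' % threshold + filepath)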

The result was 64 images to choose from.
We picked file #38 (anything around 35-40 is good too) because the letters and digits were well outlined and did not overlap.


The second step was to split this file into lines of text to feed the OCR.
OCR works better on large images, so we also had to upscale each line.
PIL offers several resampling filters; we tried some of them and checked which one gave the best result:

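from PIL import Image

def enhance_image(filename):
   # Image.LANCZOS is the same filter as Image.ANTIALIAS, which Pillow 10 removed
   methods = [Image.LANCZOS, Image.BILINEAR, Image.BICUBIC, Image.NEAREST]
   for method in methods:
      image_name = increase(filename, 4, method)   # upscale x4 (helper not shown)
      txt = image_to_txt(image_name, 1)            # OCR the upscaled image
      # A clean line holds only hex digits, so blanks give a rough error count
      print("method %i - len : %i - error:%i" % (method, len(txt), txt.count(' ')))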
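The increase() helper used here (and in cut_image) is never shown in the write-up. A minimal sketch of what it might look like, assuming it upscales by an integer factor with PIL's Image.resize and returns the name of the new file; the big<factor>_ naming scheme is our own guess, not the author's:

```python
import os

from PIL import Image


def increase(filename, factor, method):
    """Hypothetical helper: upscale an image `factor` times with a PIL filter.

    Saves the result next to the original as big<factor>_<name> and returns
    that path (the naming scheme is an assumption).
    """
    im = Image.open(filename)
    big = im.resize((im.size[0] * factor, im.size[1] * factor), method)
    folder, base = os.path.split(filename)
    out_name = os.path.join(folder, "big%i_%s" % (factor, base))
    big.save(out_name)
    return out_name
```

Returning the new name lets enhance_image compare filters on the same source line without each run overwriting the previous one.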


Image.BICUBIC gave the best results, so we used it.


We cut the file into lines, upscaled each line and saved it as a BMP file, which we then fed to ‘tesseract’ from the shell.

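from PIL import Image

def cut_image(filepath, size=14):
   im = Image.open(filepath)
   pixdata = im.load()
   nb_lines = im.size[1] // size        # 3374 px / 14 px = 241 lines for our image
   for i in range(nb_lines):
      img = Image.new("L", (im.size[0], size), 255)
      pixdata2 = img.load()
      # Copy the i-th horizontal strip of `size` rows
      for y in range(i * size, (i + 1) * size):
         for x in range(im.size[0]):
            pixdata2[x, y - i * size] = pixdata[x, y]
      line_name = '%i_' % i + filepath[:-3] + 'bmp'
      img.save(line_name)
      increase(line_name, 4, Image.BICUBIC)   # upscale x4 for the OCR (helper not shown)
   return nb_lines                            # number of line files written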
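PIL's Image.crop can replace the pixel-by-pixel copy in cut_image. A sketch of the box arithmetic, where line_boxes is a hypothetical helper; with a height of 3374 and 14-pixel lines it yields 3374 / 14 = 241 boxes, the same count as the 241 line files:

```python
def line_boxes(width, height, line_height=14):
    """(left, upper, right, lower) crop boxes for each full line of text.

    Leftover rows at the bottom (height % line_height) are skipped,
    like the integer division in cut_image.
    """
    return [(0, top, width, top + line_height)
            for top in range(0, height - line_height + 1, line_height)]


# With PIL (not run here), for an opened image `im`:
#   for i, box in enumerate(line_boxes(*im.size)):
#       im.crop(box).save("%i_line.bmp" % i)    # hypothetical filenames
print(len(line_boxes(500, 3374)))   # 241
```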

 

To help the OCR, we created a config file telling it to look only for hexadecimal characters.

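import subprocess

def image_to_txt(base_filename, number_of_files):
   text = ''
   for i in range(number_of_files):
      filename = '%i%s' % (i, base_filename)
      output = 'myout%i' % i
      # 'justhexa' limits the possible chars to 0-9A-F
      args = ['tesseract', filename, output, '/opt/local/share/tessdata/configs/justhexa']
      subprocess.check_call(args)        # wait for tesseract (Popen alone did not)
      with open(output + '.txt') as f:   # tesseract writes its result to <output>.txt
         text += f.read().strip()
   return text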
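The content of the justhexa config is not shown. A plausible sketch, assuming it relies on Tesseract's tessedit_char_whitelist parameter (a config file is plain "parameter value" lines); write_hex_whitelist_config is a hypothetical helper. Whether the whitelist is honored depends on the Tesseract version and engine, so check yours:

```python
def write_hex_whitelist_config(path):
    """Write a Tesseract config restricting recognition to hex digits 0-9A-F."""
    with open(path, "w") as f:
        # A Tesseract config file holds one "parameter value" pair per line
        f.write("tessedit_char_whitelist 0123456789ABCDEF\n")
    return path


# Usage (hypothetical file names), matching the call in image_to_txt:
#   write_hex_whitelist_config("justhexa")
#   subprocess.check_call(["tesseract", "0_line.bmp", "myout0", "justhexa"])
```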

 

Improving data:

At this point we had 241 files, each corresponding to a line of text.
We concatenated all the files and… nothing! There were lots of mistakes, so we had to clean the data again //ocr <3 🙁

-Comments are in the code-

import binascii

def make_final_file(filepath='', datas=''):
   if datas == '':
      with open(filepath) as f:
         datas = f.read().strip()

   # First problem with Tesseract: we sometimes get a 'blank' instead of a char.
   # Looking closely, most of the time it was a badly read '7';
   # sometimes it also missed a '0' in the '808' series.
   chars = list(datas)
   for j in range(len(chars)):
      if chars[j] not in '0123456789ABCDEF':
         chars[j] = '0' if j > 0 and chars[j - 1] == '8' else '7'
   datas = ''.join(chars)

   # Second problem: Tesseract sometimes misses a char or sees 2 chars instead of one.
   # The text is decoded as hex pairs, so one char too many or too few shifts all the following data.
   # We have lots of 80, which is 0 (silence) in an 8-bit WAV file.
   # A shifted run of 80s reads as "080808": when we see it, we emit a filler byte and move
   # the read position back one char so the next pairs line up again (limits, but does not
   # fully solve, the problem).
   out = ""
   offset = 0
   for i in range(len(datas) // 2 - 6):
      if datas[i*2 - offset:i*2 + 6 - offset] == "080808":
         out += "00"
         offset += 1
      else:
         out += datas[i*2 - offset:(i + 1)*2 - offset]

   # Write the corrected hex text to a file, just in case
   with open('rawoutput_texte_corrected.txt', 'w') as f:
      f.write(out)
   # Create our final WAV (RIFF) file
   hex_data = binascii.unhexlify(out)
   with open('rawfile4.wav', 'wb') as g:
      g.write(hex_data)
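The comments above rely on how 8-bit WAV samples work: they are unsigned, so 0x80 (128) is the zero line and a run of "80" is silence. A small sketch, assuming 8-bit PCM data (implied by the write-up treating 80 as silence); unsigned8_to_signed is a hypothetical helper. It also shows why one dropped hex digit turns silence into the "080808" pattern the code searches for:

```python
def unsigned8_to_signed(hex_text):
    """Decode 8-bit unsigned PCM samples given as hex text to signed amplitudes."""
    return [b - 128 for b in bytes.fromhex(hex_text)]


silence = "80" * 4
print(unsigned8_to_signed(silence))         # [0, 0, 0, 0]

# One dropped hex digit shifts every following pair by a nibble,
# so a silent run is read as "08 08 08", a near full-scale signal.
shifted = silence[1:]                       # "0808080"
pairs = [shifted[i:i + 2] for i in range(0, len(shifted) - 1, 2)]
print(pairs)                                # ['08', '08', '08']
print(unsigned8_to_signed("".join(pairs)))  # [-120, -120, -120]
```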

 

We finally had our WAV file.

Not very nice, is it? OCR may have mixed up a few values!
That is not a big deal in the sound data, but it is if the header is broken.
We opened the WAV file in a hex editor and checked the header against the first line of text.

(Figure: the canonical WAVE file format. Source: http://soundfile.sapp.org/doc/WaveFormat/)
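Part of this check can be automated. A sketch, assuming the file uses the canonical 44-byte layout from the page cited above (a 'fmt ' chunk immediately followed by 'data'); read_wav_header and header_problems are hypothetical helpers, and the consistency rules follow the format's own definitions (BlockAlign = NumChannels * BitsPerSample / 8, ByteRate = SampleRate * BlockAlign):

```python
import struct

# Canonical 44-byte WAV header, little-endian, field by field
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")
FIELDS = ("ChunkID", "ChunkSize", "Format", "Subchunk1ID", "Subchunk1Size",
          "AudioFormat", "NumChannels", "SampleRate", "ByteRate",
          "BlockAlign", "BitsPerSample", "Subchunk2ID", "Subchunk2Size")


def read_wav_header(data):
    """Unpack the first 44 bytes of a WAV file into a dict of named fields."""
    return dict(zip(FIELDS, WAV_HEADER.unpack_from(data, 0)))


def header_problems(h):
    """Return the names of fields that look wrong (e.g. OCR mistakes)."""
    problems = []
    # The four ASCII tags must match exactly
    for field, tag in (("ChunkID", b"RIFF"), ("Format", b"WAVE"),
                       ("Subchunk1ID", b"fmt "), ("Subchunk2ID", b"data")):
        if h[field] != tag:
            problems.append(field)
    # BlockAlign and ByteRate are derived from the other format fields
    block_align = h["NumChannels"] * h["BitsPerSample"] // 8
    if h["BlockAlign"] != block_align:
        problems.append("BlockAlign")
    if h["ByteRate"] != h["SampleRate"] * block_align:
        problems.append("ByteRate")
    return problems


# Usage:
#   with open("rawfile4.wav", "rb") as f:
#       print(header_problems(read_wav_header(f.read())))
```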

Once the header was corrected, we only had to adjust the file length (ChunkSize) and the data chunk length (Subchunk2Size).
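Recomputing those two sizes is simple arithmetic: ChunkSize is the file length minus 8 (the 'RIFF' tag and the size field itself), and Subchunk2Size is the number of bytes after the 44-byte header. A sketch, assuming the canonical layout with no extra chunks; fix_wav_sizes is a hypothetical helper:

```python
import struct


def fix_wav_sizes(data):
    """Make ChunkSize and Subchunk2Size agree with the real file length.

    Assumes the canonical 44-byte header, i.e. 'data' is the only chunk after 'fmt '.
    """
    fixed = bytearray(data)
    struct.pack_into("<I", fixed, 4, len(fixed) - 8)    # ChunkSize: bytes after 'RIFF' + size field
    struct.pack_into("<I", fixed, 40, len(fixed) - 44)  # Subchunk2Size: sample bytes after the header
    return bytes(fixed)


# Usage (output name is hypothetical):
#   with open("rawfile4.wav", "rb") as f:
#       data = f.read()
#   with open("fixed.wav", "wb") as f:
#       f.write(fix_wav_sizes(data))
```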
That's all! We listened for the flag again!

We still couldn't hear the flag! 🙁
They had warned us that the flag had to be in uppercase; were they screaming it?

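But wait… these beeps sounded familiar, and the rest of the challenge name suddenly made sense: Morse code is named after Samuel Finley Breese Morse.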


We just needed to translate it from Morse code (see Wikipedia – Morse code).
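The write-up does not say how the beeps were transcribed into dots and dashes, but the lookup itself is mechanical. A minimal sketch using the international Morse alphabet, assuming a transcription with a space between letters and " / " between words; decode_morse is a hypothetical helper:

```python
# International Morse code for letters and digits
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
    "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
    ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
}


def decode_morse(text):
    """Decode letters separated by spaces and words separated by ' / '."""
    words = text.strip().split(" / ")
    return " ".join("".join(MORSE[symbol] for symbol in word.split())
                    for word in words)


print(decode_morse("-... .- --.. .. -. --. .-"))   # BAZINGA
```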


WRITEBAZINGATOGOON, i.e. “WRITE BAZINGA TO GO ON”

The flag was: CTF-BR{BAZINGA}
