[NDH 2017] [Stegano 150 – Codetalkers] Write Up

Description

It may be oldschool but was proven effective during a war…

Will you be able to obtain the message in time ?

Resolution

We were given a ~15MB GIF file.
It was composed of 1247 frames, each one having a logo like this:

Some logos appeared several times within the first few seconds.
This suggested a text written in a dingbat font and shown one letter at a time.
Our goal was to extract all the “letters” from the animation.

For this, we had to:

extract the logo from each frame
compare it to the logos already seen
if the logo is unknown, assign it a new arbitrary letter
otherwise, reuse the letter assigned to its first occurrence
append the letter to a variable
output the variable once all frames are processed
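
The steps above can be sketched on their own, independently of any image processing. This is a minimal illustration, assuming each frame has already been reduced to a comparable key (a hash string, for instance); the function name `label_sequence` and the sample keys are hypothetical and not part of the original script.

```python
import string


def label_sequence(keys):
    """Map each distinct key to the next unused letter, in order of first appearance."""
    alphabet = iter(string.ascii_letters + string.digits)
    mapping = {}
    labels = []
    for key in keys:
        if key not in mapping:
            # Unknown key: give it a new arbitrary letter.
            mapping[key] = next(alphabet)
        # Known key: reuse the letter of its first occurrence.
        labels.append(mapping[key])
    return "".join(labels)


# Hypothetical keys standing in for per-frame hashes.
print(label_sequence(["h1", "h2", "h1", "h3", "h2"]))  # abacb
```

Exact matching like this is only the starting point: two frames showing the same logo must produce the same key, which is why the choice of hash matters.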

As it’s easier to only compare the logo and not the full frame, we had to crop each frame to retrieve only the colored part:

Then we used the Python library ImageHash to generate a hash of each logo.
It offers 4 hashing methods, but only 2 are relevant in our case.
Both work in the frequency domain: perceptual hashing (pHash, based on the DCT) and wavelet hashing (wHash, based on the DWT).

As the DWT tends to give better results on images, we chose it. 🙂
After some attempts, we hit an issue: some logos were nearly identical, but not exactly.
Their hashes were therefore not identical either, differing by a single character:

20428f0b0bdf7e3c #f0b0 versus:
20428f0f0bdf7e3c #f0f0

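To fix this, we had to measure how similar the 2 strings are.
One of the easiest ways is to use the Hamming distance.
It is the number of positions at which the 2 hashes differ; on the binary values, it is the number of bits set in Hash #1 ⊕ Hash #2.
The script compares the hex strings character by character and treats a distance below 5 as the same logo.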
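
As a small sketch of this comparison, the two hashes shown above can be compared both bit by bit and character by character. The function names are illustrative; the only assumption is that both hashes are hex strings of the same length.

```python
def hamming_bits(hex1, hex2):
    """Number of differing bits between two equal-length hex strings."""
    if len(hex1) != len(hex2):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two values differ; count those bits.
    return bin(int(hex1, 16) ^ int(hex2, 16)).count("1")


def hamming_chars(hex1, hex2):
    """Number of differing hex characters, as compared on the strings."""
    if len(hex1) != len(hex2):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hex1, hex2))


h1 = "20428f0b0bdf7e3c"
h2 = "20428f0f0bdf7e3c"
print(hamming_bits(h1, h2))   # 1 (0xb = 1011 and 0xf = 1111 differ by one bit)
print(hamming_chars(h1, h2))  # 1
```

The bit-level distance is finer grained: one differing hex character can hide anywhere from 1 to 4 differing bits, so a threshold has to be chosen with the unit in mind.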

The final Python code:

#!/usr/bin/python3

# - libs -
import string
import os
from PIL import Image, ImageChops
import imagehash
import itertools
import operator

# - variables -
mapping = {}
characters = list(string.ascii_letters + string.digits)
result = ''
BandW = False # True if the frames are already B&W (skips the conversion below)
debug = False

# - script -
def main():
  os.makedirs('output', exist_ok=True) # save() fails if the folder is missing
  gif2png('codetalkers.gif', 'output')

def gif2png(gifimg, outFolder):
  global result
  frame = Image.open(gifimg)
  nframes = 0
  while frame:
    newframe = autocrop(frame, (0, 0, 0, 0))
    if not BandW:
      newframe = newframe.convert('L')
      newframe = newframe.point(lambda x: 0 if x<8 else 255, '1')
    outFile = '%s/%s-%s.png' % (outFolder, os.path.basename(gifimg), '{:06d}'.format(nframes))
    hash = str(imagehash.whash(newframe))
    if debug:
      print('testing',hash)
    if hash not in mapping:
      for i in mapping:
        score = hamming_distance(hash, i)
        if debug:
          print(hash,i,score)
        if score < 5:
          if debug:
            print(hash,'~',i)
          mapping[hash] = mapping[i]
          break
    if hash not in mapping:
      mapping[hash] = characters.pop(0)
    if debug:
      print('mapping['+hash+']=',mapping[hash])
      print(outFile,mapping[hash])
    result += mapping[hash]
    newframe.save(outFile, 'PNG')
    nframes += 1
    if debug:
      print(result)
    try:
      frame.seek(nframes)
    except EOFError:
      print(result)
      break
  return True


def autocrop(im, bgcolor):
  if im.mode != "RGB":
    im = im.convert("RGB")
  bg = Image.new("RGB", im.size, bgcolor)
  diff = ImageChops.difference(im, bg)
  bbox = diff.getbbox()
  if bbox:
    return im.crop(bbox)
  return im # nothing to crop: keep the frame rather than returning None

def hamming_distance(str1, str2):
  assert len(str1) == len(str2)
  return sum(map(operator.ne, str1, str2))

if __name__ == "__main__":
  main()

The output of the script was:

abcdefgbhbfifjfkllmnobpqmekrkibobcdqflbdrbhkjdhoijjpdsdcdknkipodtjqkiucdqflbtdhcbhlmvdpktdibjwbobsdcrkifbobkswpbtfibpjwbfhdgivkhfkijodtbcdefnofixrhdrhfbjkhmojkipkhpojwbtfhojbcdefgkoshbkjbpfiekrkiqmowfxbjkykunhfjkgwdgkorkhjdtjwbjbkcgdhyfixdiijjpdsdcdofcdpbcdqflbfijbhibjrlkjtdhcynhfjkjddyfiorfhkjfdithdcgbkjwbhtdhbskojojwkjnobpomcqdlojdowdggbkjwbhswfibobswkhksjbhokipojhbbjofxiokipthdcckixkjwkjnobpojdsuomcqdlojdbzrhboobcdjfdioonswkolfxwjqnlqoofxiftmfixfiorfhkjfdijwbtfhojobjdtbcdefgkoshbkjbpAorAhjdtfcdpbocbookxfixtbkjnhbojdwblrtksflfjkjbblbsjhdifssdccnifskjfdiAipjdobhvbkokpfojfixnfowfixtbkjnhbthdcdjwbhobhvfsbounhfjkshbkjbpjwbtfhojbcdefqkobpdijwbbzrhboofdiojwkjwbdqobhvbprbdrlbckufixkipdjwbhjwfixofijwbsfjmwniphbpodtbcdefswkhksjbhogbhbbisdpbpfijwbnifsdpbojkipkhpjwbkppfjfdiodhfxfikllmhbBnbojbpqmxddxlbykjcdcdfckhypkvfokipckhynooswbhbhghdjbjwbtfhojphktjtdhsdiofpbhkjfdiqmjwbnifsdpbjbswifsklsdccfjjbbAipkrrlbfisgwdobmkondyfpAkiprbjbhbpqbhxedfibpjwbtfhojdttfsfklnjsrhdrdoklgbijjwhdnxwkldixobhfbodtsdccbijfixqmcbcqbhodtjwbnifsdpbsdiodhjfnckipikjfdiklojkipkhpfCkjfdiqdpfbodtvkhfdnosdnijhfborkhjfsfrkjfixgwbijhkiocfjjbpbcdefomcqdlokhborbsftfbpkoAjgdqmjbobBnbisbsdixhkjnlkjfdioftmdnskihbkpjwfocbookxbjwbtlkxfowxzbbijvzmojhbgsCencmtgnzxChiprsyofntksmBcD

The first and last letters must be removed, as they correspond to the “Receiving transmission” and “Roger that” frames, not to logos.
There were 27 unique letters, so we deduced they were probably the letters A-Z plus the dot.
In fact they were not: one of them was a remaining “near hash”, but it doesn’t matter, as this letter is processed manually later.
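
Trimming the two marker frames and counting the remaining symbols takes only a few lines. The sketch below uses a short hypothetical string rather than the real output, and the helper name `strip_markers` is an assumption made for illustration.

```python
from collections import Counter


def strip_markers(labels):
    """Drop the first and last symbols, which stand for non-logo frames."""
    return labels[1:-1]


# Hypothetical sample: "a" and "E" play the role of the two marker frames.
sample = "abcbdbcE"
body = strip_markers(sample)
counts = Counter(body)

print(body)                   # symbols left after trimming
print(len(counts))            # number of distinct symbols
print(counts.most_common(3))  # most frequent symbols first
```

The same frequency table is also a useful first look at a substitution cipher, since the most common symbols usually map to the most common letters of the language.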

A scrambled text that uses only 26 letters looks like a monoalphabetic substitution.
It means each letter has been replaced by another letter of the alphabet, for example: A -> T, B -> H, C -> I, etc.
Several methods can be used to decode this kind of ciphertext.

We chose the N-gram model, as it gave the best results, and obtained the following clear message (spaces have been added):

emoji were initially used by japanese mobile operators ntt do com oau and softbanz mobile formerly vodafone these companies each defined their own variants of emoji using proprietary standards the first emoji was created in japan by shige takazu rita who was part of the team working on ntt do comosimo de mobile internet platform kurita took inspiration from weather forecasts that used symbols to show weather chinese characters and street signs and from manga that used stocz symbols to express emotions such as light bulbs signifying inspiration the first set of emoji was created Qs pQrt of imodes messaging features to help facilitate electronic communication Qnd to serve as a distinguishing feature from other services zurita created the first emoji based on the expressions that he observed people mazing and other things in the city hundreds of emoji characters were encoded in the unicode standard the additions originally reEuested by google kat momoi mark davis and markus scherer wrote the first draft for consideration by the unicode technical committee Qnd apple inc whose yasuo kid Q and pete redberg joined the first official utc proposal went through a long series of commenting by members of the unicode consortium and national standard iMation bodies of various countries participating when transmitted emoji symbols are specified as Q two byte seEuence congratulations if you can read this message the flag is h g xeen tvxy strew cMju my fwux g M r ndp c k siu fac yEm

It’s the article about Emoji on Wikipedia.
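
The substitution can be checked by hand on the start of the message. Aligning the first letters of the script output (after dropping the leading marker) with “emoji were initially” gives part of the key. The sketch below only applies that partial key; it does not search for a key the way an N-gram solver does, and the names `PARTIAL_KEY` and `decode` are illustrative.

```python
# Partial key read off by aligning the start of the ciphertext
# with the start of the clear text ("emoji were initially").
PARTIAL_KEY = {
    "b": "e", "c": "m", "d": "o", "e": "j", "f": "i", "g": "w",
    "h": "r", "i": "n", "j": "t", "k": "a", "l": "l", "m": "y",
}


def decode(ciphertext, key, unknown="?"):
    """Apply a substitution key; symbols missing from the key become `unknown`."""
    return "".join(key.get(ch, unknown) for ch in ciphertext)


print(decode("bcdefgbhbfifjfkllm", PARTIAL_KEY))  # emojiwereinitially
```

Marking unmapped symbols with a placeholder makes it easy to see which letters still need to be resolved.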

We fixed the letters that were still incorrect:

the “Q” as “a” (e.g. created Qs pQrt of -> created as part of)
the “E” as “q” (e.g. two byte seEuence -> two byte sequence)
the “M” as “z” (e.g. standard iMation -> standardization)

Then we removed the spaces to get the correct final sentence:

congratulations if you can read this message the flag is hgxeentvxystrewczjumyfwuxgzrndpcksiufacyqm

The flag was hgxeentvxystrewczjumyfwuxgzrndpcksiufacyqm
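
The manual cleanup can be replayed in a few lines. The input is the flag fragment exactly as it appears in the decoded message; besides “Q” and “E”, the final flag also has “z” where the decoded text shows “M”, so that replacement is included. The variable names are illustrative.

```python
# Flag fragment as it appears in the decoded message, before cleanup.
raw = "h g xeen tvxy strew cMju my fwux g M r ndp c k siu fac yEm"

# Symbols the solver left unresolved, with the letters they stand for.
fixes = {"Q": "a", "E": "q", "M": "z"}

# Replace the unresolved symbols and drop the spaces added for readability.
flag = "".join(fixes.get(ch, ch) for ch in raw if ch != " ")

print(flag)  # hgxeentvxystrewczjumyfwuxgzrndpcksiufacyqm
```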

0x90r00t
