Python Unescape 16-bit Unicode String to File

Archived here for me; maybe someone else will need it. Frequently when our analysts are doing malcode analysis, particularly on malicious PDF documents, they see shellcode in the form of 16-bit Unicode values (%uXXXX escapes) that are then unescaped onto the heap by calling the JavaScript unescape() function. Problem is, we do most of our malicious JavaScript analysis from the command line with SpiderMonkey, and it has some truncation issues when unescaping 16-bit Unicode (it handles ASCII just fine). The devs are well aware of the issue, btw, so don’t bother them ;-).
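To make the target byte layout concrete, here is a small Python 3 illustration. It assumes the shellcode ends up in memory as little-endian UTF-16 code units (the usual case on x86); the variable names are just for this example.

```python
# Two 16-bit values as they would appear in '%u30CC%u4560'
code_units = "\u30cc\u4560"

# Assumption: little-endian layout, so the low byte of each value comes first
heap_bytes = code_units.encode("utf-16-le")

# Show the resulting bytes as hex
print(heap_bytes.hex())
```

So %u30CC becomes the bytes CC 30, not 30 CC, which is why the script has to swap each pair.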

So I wrote a quickie to take a string, massage it to the right byte order, and slap it to STDOUT, which the analyst can then redirect to a file or whatever. If there is a much easier way to do this, I’m all ears.


import binascii
import re
import sys

# print usage and quit if the argument count is wrong
if len(sys.argv) != 2:
  print("Usage: " + sys.argv[0] + " <string to decode>")
  print("where string is something like '%u30CC%u4560'")
  print("Keep in mind this only works for unicode 16-bit")
  print("which means 2 bytes (four hexadecimal chars with %u")
  print("in front of them).")
  sys.exit(1)

# strip surrounding quotes, then convert to upper since case doesn't matter
string = sys.argv[1].strip('"').strip("'").upper()

# drop the %u or \u prefixes, which should leave only hex digits
string = re.sub(r'(%|\\)U', '', string)

# rudimentary input validation: anything left that isn't hex is an error
if re.search(r'[^A-F0-9]', string):
  print("invalid string submitted\nonly the following chars are allowed:")
  print(r''' % \ u U A-F a-f 0-9 ' " ''')
  sys.exit(1)

# every 16-bit value needs exactly four hex chars
if len(string) % 4 != 0:
  print("you are missing some characters, must be in groups of 4")
  print("did your copy mess up?")
  sys.exit(1)

# swap the two bytes of each 16-bit value so the low byte comes first
b = ""
for i in range(0, len(string), 4):
  b = b + string[i+2:i+4] + string[i:i+2]

result = binascii.a2b_hex(b)

# write the raw bytes to STDOUT (binary buffer on Python 3, plain stdout on 2)
getattr(sys.stdout, "buffer", sys.stdout).write(result)
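On the "easier way" question, here is one possible shorter take for Python 3, offered as a sketch rather than a drop-in replacement. The function name unescape_u16 is made up for this example, and it is stricter than the script above: it assumes every group carries its %u or \u prefix and rejects bare hex.

```python
import re
import struct

# One %uXXXX or \uXXXX group; the four hex digits are captured
_TOKEN = re.compile(r'(?:%|\\)[uU]([0-9A-Fa-f]{4})')
# The whole input must be nothing but such groups
_WHOLE = re.compile(r'(?:(?:%|\\)[uU][0-9A-Fa-f]{4})+')


def unescape_u16(text):
    """Convert '%u30CC%u4560' style input to little-endian bytes."""
    cleaned = text.strip('"').strip("'")
    if not _WHOLE.fullmatch(cleaned):
        raise ValueError("expected only %uXXXX or \\uXXXX groups")
    values = [int(h, 16) for h in _TOKEN.findall(cleaned)]
    # '<H' is an unsigned 16-bit little-endian value; pack them all at once
    return struct.pack('<%dH' % len(values), *values)
```

From a script, something like sys.stdout.buffer.write(unescape_u16(sys.argv[1])) would send the raw bytes to STDOUT for redirection, same as before.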
