1 files changed, 601 insertions, 0 deletions
diff --git a/_posts/2022-12-31-linux-detours.md b/_posts/2022-12-31-linux-detours.md
new file mode 100644
index 0000000..062c6c9
--- /dev/null
+++ b/_posts/2022-12-31-linux-detours.md
@@ -0,0 +1,601 @@
+---
+layout: post
+title: A Tiny C (x86_64) Function Hooking Library
+author: Dylan Müller
+---
+
+> Function detouring is a powerful hooking technique that allows for the
+> interception of `C/C++` functions. `cdl86` aims to be a tiny `C` detours
+> library for `x86_64` binaries.
+
+1. [Overview](#overview)
+2. [Detour Methods](#detour-methods)
+3. [JMP Patching](#jmp-patching)
+4. [INT3 Patching](#int3-patching)
+5. [Code Injection](#code-injection)
+6. [API](#api)
+7. [Source Code](#source-code)
+
+# Overview
+
+Note: This article covers the Linux-specific details of the library. Windows
+support has since been added.
+
+See:
+[https://github.com/lunar-rf/cdl86](https://github.com/lunar-rf/cdl86)
+
+[Microsoft Research](https://en.wikipedia.org/wiki/Microsoft_Research) currently
+maintains a library known as [MS Detours](https://github.com/microsoft/Detours).
+It allows for the interception of Windows `API` calls within the memory address
+space of a process.
+
+This is useful, for example, if you are writing a `D3D9` (`DirectX`) hook and
+need to intercept certain graphics routines. This is commonly done for `ESP`
+and wallhacks, where the `Z-buffer` needs to be disabled for certain character
+models; for `D3D9` this might involve hooking `DrawIndexedPrimitive`.
+
+
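+```
+HRESULT WINAPI hkDrawIndexedPrimitive(LPDIRECT3DDEVICE9 pDevice,
+    D3DPRIMITIVETYPE Type, INT BaseVertexIndex, UINT MinVertexIndex,
+    UINT NumVertices, UINT startIndex, UINT primCount)
+{
+    // Check player model strides, primitive count, etc.
+    ...
+    // Disable depth testing for the matched models
+    pDevice->SetRenderState(D3DRS_ZENABLE, D3DZB_FALSE);
+    ...
+    // Call the original function (through the trampoline) and return its result
+    return oDrawIndexedPrimitive(pDevice, Type, BaseVertexIndex,
+                                 MinVertexIndex, NumVertices, startIndex,
+                                 primCount);
+}
+```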
+
+In order to disable the `Z-buffer` in this example we need access to a valid
+`LPDIRECT3DDEVICE9` device pointer within the running process. This is where
+detouring comes in handy. Generally, the procedure to hook a specific function
+is as follows:
+
+- Declare a function pointer type with the target function's signature:
+
+```
+typedef HRESULT (WINAPI* tDrawIndexedPrimitive)(LPDIRECT3DDEVICE9 pDevice,
+    D3DPRIMITIVETYPE Type, INT BaseVertexIndex, UINT MinVertexIndex,
+    UINT NumVertices, UINT startIndex, UINT primCount);
+```
+
+- Define a detour function with the same signature:
+
+```
+HRESULT WINAPI hkDrawIndexedPrimitive(LPDIRECT3DDEVICE9 pDevice,
+    D3DPRIMITIVETYPE Type, INT BaseVertexIndex, UINT MinVertexIndex,
+    UINT NumVertices, UINT startIndex, UINT primCount)
+```
+
+- Assign the target function's address in memory to the function pointer, in
+  this case a `VTable` entry:
+
+```
+#define DIP 82 // index of DrawIndexedPrimitive in the IDirect3DDevice9 vtable
+tDrawIndexedPrimitive oDrawIndexedPrimitive = (tDrawIndexedPrimitive)SomeVTable[DIP];
+```
+
+- Call `DetourFunction`:
+
+```
+DetourFunction((void**)&oDrawIndexedPrimitive, &hkDrawIndexedPrimitive);
+```
+
+`DetourFunction` then uses the `oDrawIndexedPrimitive` function pointer to
+modify the instructions at the target function so that control flow is
+transferred to the detour function, and updates the pointer so that calling
+`oDrawIndexedPrimitive` still runs the original code.
+
+At this point any calls to the device's `DrawIndexedPrimitive` method will be
+rerouted to `hkDrawIndexedPrimitive`. This is a very powerful concept, as it
+gives us access to every argument passed to the target function. As
+demonstrated, it is possible to hook both `C` and `C++` functions.
+
+The main difference is that a `C++` member function receives a hidden `this`
+pointer as its first argument (here, `pDevice`). A detour for a `C++` method
+can therefore be written in `C` by declaring this extra argument explicitly.
+
+Detours is great, but it is only available for Windows. The aim of the `cdl86`
+project is to create a simple, compact detours library for `x86_64` Linux. What
+follows is a brief explanation of how the library was designed.
+
+# Detour methods
+
+Two different approaches to function detouring were investigated and
+implemented in the `cdl86` `C` library. First, let's look at a typical function
+call in a simple `C` program. We will be using `GDB` to inspect the resulting
+disassembly.
+
+```
+#include <stdio.h>
+
+int add(int x, int y)
+{
+    return x + y;
+}
+int main()
+{
+    printf("%i", add(1,1));
+    return 0;
+}
+```
+
+Compile with:
+```
+gcc main.c -o main
+```
+
+and then debug with `GDB`:
+
+```
+gdb main
+```
+
+To list all the functions in the binary, enter `info functions` at the `gdb`
+prompt. The relevant part of the output is:
+
+```
+0x0000000000001100  __do_global_dtors_aux
+0x0000000000001140  frame_dummy
+0x0000000000001149  add
+0x0000000000001161  main
+0x00000000000011a0  __libc_csu_init
+0x0000000000001210  __libc_csu_fini
+0x0000000000001218  _fini
+```
+
+Let's disassemble the main function with `disas /r main`:
+
+```
+Dump of assembler code for function main:
+   0x0000000000001161 <+0>:     f3 0f 1e fa     endbr64
+   0x0000000000001165 <+4>:     55      push   %rbp
+   0x0000000000001166 <+5>:     48 89 e5        mov    %rsp,%rbp
+   0x0000000000001169 <+8>:     be 01 00 00 00  mov    $0x1,%esi
+   0x000000000000116e <+13>:    bf 01 00 00 00  mov    $0x1,%edi
+   0x0000000000001173 <+18>:    e8 d1 ff ff ff  callq  0x1149 <add>
+   0x0000000000001178 <+23>:    89 c6   mov    %eax,%esi
+```
+
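+`callq` has a single operand, the address of the function being called. Here
+it is encoded as a signed 32-bit displacement (`d1 ff ff ff`) relative to the
+address of the next instruction (`0x1178`). The instruction pushes the return
+address (the address of the instruction following the call) onto the stack
+and then transfers control to the target function.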
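+
+As a worked example, the sketch below decodes the `e8 d1 ff ff ff` call from
+the listing above. It assumes an `x86_64` (little-endian) host, and the helper
+names `rel32_target` and `rel32_reachable` are hypothetical, not part of
+`cdl86`. The second helper shows why a direct `call`/`jmp` cannot reach an
+arbitrary address: the displacement is a signed 32-bit value, so the
+destination must lie within about ±2 GiB of the next instruction.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Destination of a rel32 call/jmp: the address of the next instruction
+ * plus the sign-extended 32-bit displacement that follows the opcode. */
+static uint64_t rel32_target(uint64_t insn_addr, const uint8_t *insn,
+                             size_t insn_len)
+{
+    int32_t disp;
+    memcpy(&disp, insn + 1, sizeof disp); /* little-endian, after 0xe8 */
+    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
+}
+
+/* True if dest fits in the rel32 field of an insn_len-byte instruction
+ * located at insn_addr. */
+static bool rel32_reachable(uint64_t insn_addr, size_t insn_len, uint64_t dest)
+{
+    int64_t diff = (int64_t)(dest - (insn_addr + insn_len));
+    return diff >= INT32_MIN && diff <= INT32_MAX;
+}
+```
+
+With the bytes at `0x1173`, `rel32_target` yields `0x1178 - 0x2f = 0x1149`
+(`add`). By contrast, `rel32_reachable` reports that a 5-byte jump placed in an
+`mmap`'d page near `0x7ffff7ffb000` could not reach code near `0x555555561395`,
+which is why the patch described below loads a full 64-bit address instead.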
+
+You may have also noticed the presence of the `endbr64` instruction. This
+instruction is part of [Intel's Control-Flow Enforcement Technology
+(CET)](https://software.intel.com/content/www/us/en/develop/articles/technical-look-control-flow-enforcement-technology.html)
+and executes as a `NOP` on processors that do not support it. `CET` is designed
+to provide hardware protection against `ROP` (Return-oriented Programming) and
+similar methods which manipulate control flow using *existing* code.
+
+Its two main features are:
+
+* A shadow stack for tracking return addresses.
+* Indirect branch tracking, which `endbr64` is a part of.
+
+`Intel CET`, however, does not prevent us from modifying control flow
+**directly** by writing new instructions into memory.
+
+# JMP Patching
+
+The first method of function detouring we will explore involves inserting a
+`JMP` instruction at the beginning of the target function to transfer control
+to the detour function. Note that we need a `JMP` (specifically `jmpq`) rather
+than a `CALL` in order to preserve the stack: a `CALL` would push an extra
+return address, shifting any stack-passed arguments and making the detour
+return into the patched target instead of to the original caller.
+
+A direct `jmpq` only encodes a signed 32-bit displacement, so it cannot reach
+an arbitrary `64-bit` address. Instead, we first load the address we want to
+jump to into a register and jump through it. That register must not be used for
+passing arguments in the calling convention, which on `x86_64` Linux is the
+System V AMD64 ABI (the first six integer arguments go in `%rdi`, `%rsi`,
+`%rdx`, `%rcx`, `%r8` and `%r9`). `%rax` is not used for ordinary arguments (it
+holds the return value, and `%al` carries the vector-register count for
+variadic calls), so for simplicity we use this register in our design.
+
+The following is a disassembly of the instructions required for a `JMP` to a
+`64-bit` immediate address:
+
+```
+0x0000555555561389 <+0>: 48 b8 b1 13 56 55 55 55 00 00 movabs $0x5555555613b1,%rax
+0x0000555555561393 <+10>: ff e0	jmpq   *%rax
+```
+
+`12` bytes are required in total: `10` for the `movabs` instruction (the
+`48 b8` opcode followed by the 8-byte detour address, which it moves into
+`%rax`) and `2` for `jmpq *%rax` (`ff e0`). Immediate values are stored in
+little-endian (LE) byte order, which is why `0x5555555613b1` appears as
+`b1 13 56 55 55 55 00 00`.
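+
+The encoding can be reproduced with a few lines of `C`. This is a minimal
+sketch assuming an `x86_64` (little-endian) host; `gen_jmpq_rax` is a
+hypothetical stand-in for `cdl86`'s `cdl_gen_jmpq_rax`, whose body is not shown
+in this article.
+
+```c
+#include <stdint.h>
+#include <string.h>
+
+#define JMP_PATCH_LEN 12
+
+/* Write "movabs $dest,%rax; jmpq *%rax" (12 bytes) to buf. */
+static uint8_t *gen_jmpq_rax(uint8_t *buf, uint64_t dest)
+{
+    buf[0] = 0x48;                       /* REX.W prefix                 */
+    buf[1] = 0xb8;                       /* B8+r: mov imm64 into %rax    */
+    memcpy(buf + 2, &dest, sizeof dest); /* imm64, little-endian         */
+    buf[10] = 0xff;                      /* FF /4: jmp r/m64             */
+    buf[11] = 0xe0;                      /* ModRM 11 100 000: *%rax      */
+    return buf;
+}
+```
+
+For `0x5555555613b1` it produces exactly the bytes from the listing above:
+`48 b8 b1 13 56 55 55 55 00 00 ff e0`.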
+
+We therefore need to patch **at least** `12` bytes in memory at the location of
+our target function, rounded up to a whole number of instructions. The original
+bytes are still needed, however, so we cannot simply discard them. Instead, we
+copy them to the start of what I will call a 'trampoline function', whose
+layout is as follows:
+
+```
+trampoline <0x23215412>:
+    (original instruction bytes which were patched)
+    JMP (target + JMP patch length)
+```
+
+Simply put, the trampoline function behaves as the original, unpatched
+function. As shown above, it consists of the target function's original
+instruction bytes followed by a jump back into the target function, just past
+the patched bytes.
+
+The trampoline generation code for `cdl86` is shown below:
+
+```
+#include <stdint.h>
+#include <string.h>
+#include <sys/mman.h>
+
+/* BYTES_JMP_PATCH and cdl_gen_jmpq_rax() are defined elsewhere in cdl86. */
+uint8_t *cdl_gen_trampoline(uint8_t *target, uint8_t *bytes_orig, int size)
+{
+    uint8_t *trampoline;
+    int prot = 0x0;
+    int flags = 0x0;
+
+    /* New function should have read, write and
+     * execute permissions.
+     */
+    prot = PROT_READ | PROT_WRITE | PROT_EXEC;
+    flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+    /* We use mmap to allocate trampoline memory pool. */
+    trampoline = mmap(NULL, size + BYTES_JMP_PATCH, prot, flags, -1, 0);
+    if (trampoline == MAP_FAILED)
+        return NULL;
+
+    /* Copy the original (overwritten) instructions. */
+    memcpy(trampoline, bytes_orig, size);
+    /* Append a jump back to the first instruction after
+     * the patched bytes in the target function. */
+    cdl_gen_jmpq_rax(trampoline + size, target + size);
+
+    return trampoline;
+}
+```
+
+The trampoline is allocated with a call to `mmap` using the
+`PROT_READ | PROT_WRITE | PROT_EXEC` memory protection flags, so it is
+executable from the start. The target function, on the other hand, lives in a
+code page that is normally mapped read/execute only, so its protections must be
+changed before the patch can be written. Here is a snippet from the `cdl86`
+library for setting memory attributes:
+
+```
+#include <stdint.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+/* Set R/W/X memory protections for the code page containing `code`. */
+int cdl_set_page_protect(uint8_t *code)
+{
+    int perms = 0x0;
+    int ret = 0x0;
+
+    /* Read, write and execute perms. */
+    perms = PROT_EXEC | PROT_READ | PROT_WRITE;
+    /* Calculate page size; mprotect needs a page-aligned start address. */
+    uintptr_t page_size = sysconf(_SC_PAGE_SIZE);
+    ret = mprotect(code - ((uintptr_t)(code) % page_size), page_size, perms);
+
+    return ret;
+}
+```
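+
+`cdl_set_page_protect` changes the permissions of a single page. If the patched
+bytes straddle a page boundary, the second page stays unwritable. A range-aware
+variant might look like the sketch below; `set_rwx_range` is a hypothetical
+name, and the code assumes the page size is a power of two, which holds on
+`x86_64` Linux.
+
+```c
+#define _DEFAULT_SOURCE /* glibc extensions such as MAP_ANONYMOUS */
+#include <stddef.h>
+#include <stdint.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+/* Make every page overlapping [addr, addr + len) readable, writable and
+ * executable. Returns 0 on success, or -1 with errno set by mprotect. */
+static int set_rwx_range(void *addr, size_t len)
+{
+    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
+    uintptr_t start = (uintptr_t)addr & ~(page - 1);                  /* round down */
+    uintptr_t end = ((uintptr_t)addr + len + page - 1) & ~(page - 1); /* round up   */
+    return mprotect((void *)start, end - start,
+                    PROT_READ | PROT_WRITE | PROT_EXEC);
+}
+```
+
+Called with the target address and the patch length (for example
+`set_rwx_range(target, 12)`), it covers one page or two, as needed.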
+
+The general procedure to place the `JMP` hook is as follows:
+
+1. Determine the minimum number of bytes required for a `JMP` patch.
+2. Create trampoline function.
+3. Set memory permissions (read, write, execute).
+4. Generate `JMP` to detour at target function.
+5. Fill unused bytes with `NOP`.
+6. Assign trampoline address to target function pointer.
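+
+The steps above can be tried end to end without `cdl86` or a disassembler. The
+sketch below is an illustration, not the library's implementation: it assumes a
+hand-assembled target function in an `mmap`'d RWX page, so the instruction
+boundaries are known in advance (`endbr64` + `lea` + five `NOP`s add up to
+exactly `12` bytes) and no `mprotect` call is needed. All names are
+hypothetical.
+
+```c
+#define _DEFAULT_SOURCE /* MAP_ANONYMOUS */
+#include <stdint.h>
+#include <string.h>
+#include <sys/mman.h>
+
+#define PATCH_LEN 12 /* movabs $imm64,%rax (10) + jmpq *%rax (2) */
+
+typedef int add_t(int x, int y);
+
+static add_t *orig_fn; /* step 6: will point at the trampoline */
+
+/* Write "movabs $dest,%rax; jmpq *%rax" at code. */
+static void gen_jmpq_rax(uint8_t *code, uint64_t dest)
+{
+    code[0] = 0x48;
+    code[1] = 0xb8;
+    memcpy(code + 2, &dest, sizeof dest);
+    code[10] = 0xff;
+    code[11] = 0xe0;
+}
+
+static int detour_fn(int x, int y)
+{
+    (void)x;
+    (void)y;
+    return orig_fn(5, 5); /* call the original via the trampoline */
+}
+
+/* Hand-assembled add(): endbr64; lea (%rdi,%rsi,1),%eax; 5 x nop; ret */
+static const uint8_t target_code[] = {
+    0xf3, 0x0f, 0x1e, 0xfa,       /* endbr64 (a NOP without CET) */
+    0x8d, 0x04, 0x37,             /* lea (%rdi,%rsi,1),%eax      */
+    0x90, 0x90, 0x90, 0x90, 0x90, /* padding up to 12 bytes      */
+    0xc3                          /* ret                         */
+};
+
+/* Returns the hooked target function, or NULL on failure. */
+static add_t *install_demo_hook(void)
+{
+    /* Step 3 (permissions): one RWX page holding target and trampoline. */
+    uint8_t *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
+                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (page == MAP_FAILED)
+        return NULL;
+    uint8_t *target = page;
+    uint8_t *tramp = page + 64;
+    memcpy(target, target_code, sizeof target_code);
+
+    /* Step 1: PATCH_LEN ends exactly on an instruction boundary here,
+     * so no NOP filling (step 5) is required. */
+
+    /* Step 2: trampoline = original 12 bytes + jump back past the patch. */
+    memcpy(tramp, target, PATCH_LEN);
+    gen_jmpq_rax(tramp + PATCH_LEN, (uint64_t)(uintptr_t)(target + PATCH_LEN));
+
+    /* Step 4: overwrite the first 12 bytes with a jump to the detour. */
+    gen_jmpq_rax(target, (uint64_t)(uintptr_t)detour_fn);
+
+    /* Step 6: expose the trampoline as "the original function". */
+    orig_fn = (add_t *)(void *)tramp;
+    return (add_t *)(void *)target;
+}
+```
+
+Calling the returned pointer with `(1, 1)` enters `detour_fn`, which calls the
+original through the trampoline with `(5, 5)` and returns `10`; calling
+`orig_fn(2, 3)` directly still returns `5`.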
+
+Let's have a look at all of this in action using `GDB`. I will be using the
+[basic_jmp.c](https://github.com/lunar-rf/cdl86/blob/master/tests/basic_jmp.c)
+test case in the `cdl86` library. The source code for this test case is shown
+below:
+
+```
+#include "cdl.h"
+
+typedef int add_t(int x, int y);
+add_t *addo = NULL;
+
+int add(int x, int y)
+{
+    printf("Inside original function\n");
+    return x + y;
+}
+
+int add_detour(int x, int y)
+{
+    printf("Inside detour function\n");
+    return addo(5,5);
+}
+
+int main()
+{
+    struct cdl_jmp_patch jmp_patch = {};
+    addo = (add_t*)add;
+
+    printf("Before attach: \n");
+    printf("add(1,1) = %i\n\n", add(1,1));
+
+    jmp_patch = cdl_jmp_attach((void**)&addo, add_detour);
+    if(jmp_patch.active)
+    {
+        printf("After attach: \n");
+        printf("add(1,1) = %i\n\n", add(1,1));
+        printf("== DEBUG INFO ==\n");
+        cdl_jmp_dbg(&jmp_patch);
+    }
+
+    cdl_jmp_detach(&jmp_patch);
+    printf("\nAfter detach: \n");
+    printf("add(1,1) = %i\n\n", add(1,1));
+
+    return 0;
+}
+```
+
+We compile the source file with the following command (adapted from the
+project's makefile):
+
+```
+gcc -I../ -g basic_jmp.c ../cdl.c ../lib/libudis86/*.c -o basic_jmp
+```
+
+Then load into `GDB` using:
+
+```
+gdb basic_jmp
+```
+
+Once `GDB` has loaded, we set breakpoints at lines `24` and `27` of
+`basic_jmp.c` (before and after the call to `cdl_jmp_attach()`) using the
+commands:
+
+```
+break 24
+break 27
+```
+
+We start execution of the program with:
+
+```
+run
+```
+
+`GDB` will then inform you that the first breakpoint has been triggered. For this
+first breakpoint we are interested in the `add()` function's assembly before the
+hook has taken place. To inspect this assembly, provide:
+
+```
+disas /r add
+```
+```
+Dump of assembler code for function add:
+   0x0000555555561389 <+0>:	f3 0f 1e fa	endbr64
+   0x000055555556138d <+4>:	55	push   %rbp
+   0x000055555556138e <+5>:	48 89 e5	mov    %rsp,%rbp
+   0x0000555555561391 <+8>:	48 83 ec 10	sub    $0x10,%rsp
+   0x0000555555561395 <+12>:	89 7d fc	mov    %edi,-0x4(%rbp)
+```
+
+This is the disassembly of the unaltered target function. The `12`-byte `JMP`
+patch will be written at this address. The first `4` instructions occupy
+exactly `4 + 1 + 3 + 4 = 12` bytes, so they will be copied to the trampoline
+function, followed by a `JMP` to address `0x0000555555561395`, and that's all we
+need for the trampoline!
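+
+Choosing how many bytes to move into the trampoline can be sketched as a simple
+accumulation over instruction lengths. In practice the lengths come from a
+disassembler (the build command above links `libudis86`); here they are
+hard-coded from the `disas /r add` listing, and `patch_size` is a hypothetical
+helper rather than `cdl86` code.
+
+```c
+#include <stddef.h>
+
+#define JMP_PATCH_LEN 12 /* movabs $imm64,%rax (10) + jmpq *%rax (2) */
+
+/* Return the number of bytes to relocate so that only whole instructions
+ * are moved into the trampoline, storing the instruction count in *n_insns.
+ * Returns 0 if the listed instructions are shorter than the patch. */
+static size_t patch_size(const size_t *lens, size_t n, size_t *n_insns)
+{
+    size_t total = 0;
+    for (size_t i = 0; i < n; i++) {
+        total += lens[i];
+        if (total >= JMP_PATCH_LEN) {
+            *n_insns = i + 1;
+            return total; /* bytes past JMP_PATCH_LEN get NOP (0x90) filler */
+        }
+    }
+    *n_insns = 0;
+    return 0;
+}
+```
+
+For `add`, the lengths `4, 1, 3, 4, 3` give `12` bytes across `4`
+instructions, so the trampoline resumes at
+`0x555555561389 + 12 = 0x555555561395`. A layout such as `4, 1, 3, 3, 4` would
+give `15` bytes instead, leaving `3` bytes after the `JMP` to fill with `NOP`s.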
+
+Now the fun part! Let's continue execution to the next breakpoint, by which
+point our `JMP` hook has been placed.
+
+```
+continue
+```
+
+Let's examine the disassembly of our `add()` function once again:
+
+```
+Dump of assembler code for function add:
+   0x0000555555561389 <+0>: 48 b8 b1 13 56 55 55 55 00 00 movabs $0x5555555613b1,%rax
+   0x0000555555561393 <+10>: ff e0	jmpq   *%rax
+   0x0000555555561395 <+12>: 89 7d fc	mov    %edi,-0x4(%rbp)
+   0x0000555555561398 <+15>: 89 75 f8	mov    %esi,-0x8(%rbp)
+```
+
+`0x5555555613b1` is the address of our detour/intercept function. Let's examine
+the disassembly of our detour function:
+
+```
+disas /r 0x5555555613b1
+```
+
+```
+Dump of assembler code for function add_detour:
+   0x00005555555613b1 <+0>:	f3 0f 1e fa	endbr64
+   0x00005555555613b5 <+4>:	55	push   %rbp
+   0x00005555555613b6 <+5>:	48 89 e5	mov    %rsp,%rbp
+   0x00005555555613b9 <+8>:	48 83 ec 10	sub    $0x10,%rsp
+   0x00005555555613bd <+12>:	89 7d fc	mov    %edi,-0x4(%rbp)
+   0x00005555555613c0 <+15>:	89 75 f8	mov    %esi,-0x8(%rbp)
+   0x00005555555613c3 <+18>:	48 8d 3d 53 5c 00 00	lea    0x5c53(%rip),%rdi
+   0x00005555555613ca <+25>:	e8 b1 fd ff ff	callq  0x555555561180 <puts@plt>
+   0x00005555555613cf <+30>:	48 8b 05 ba bc 01 00	mov    0x1bcba(%rip),%rax
+   0x00005555555613d6 <+37>:	be 05 00 00 00	mov    $0x5,%esi
+   0x00005555555613db <+42>:	bf 05 00 00 00	mov    $0x5,%edi
+   0x00005555555613e0 <+47>:	ff d0	callq  *%rax
+   0x00005555555613e2 <+49>:	c9	leaveq
+   0x00005555555613e3 <+50>:	c3	retq
+```
+
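+We can see that our trampoline function is called through the address stored
+in the `QWORD` (our function pointer `addo`) at address `0x55555557d090`. The
+`mov 0x1bcba(%rip),%rax` instruction is RIP-relative: it is `7` bytes long, so
+it reads from `0x5555555613d6 + 0x1bcba = 0x55555557d090`. Let's dereference
+it: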
+
+```
+print /x *(long unsigned int*)(0x55555557d090)
+```
+```
+$20 = 0x7ffff7ffb000
+```
+
+So the function pointer is pointing to address `0x7ffff7ffb000`, which is our
+trampoline function. Let's disassemble it:
+
+```
+x/10i 0x7ffff7ffb000
+```
+
+```
+   0x7ffff7ffb000:	endbr64
+   0x7ffff7ffb004:	push   %rbp
+   0x7ffff7ffb005:	mov    %rsp,%rbp
+   0x7ffff7ffb008:	sub    $0x10,%rsp
+   0x7ffff7ffb00c:	movabs $0x555555561395,%rax
+   0x7ffff7ffb016:	jmpq   *%rax
+   0x7ffff7ffb018:	add    %al,(%rax)
+   0x7ffff7ffb01a:	add    %al,(%rax)
+   0x7ffff7ffb01c:	add    %al,(%rax)
+   0x7ffff7ffb01e:	add    %al,(%rax)
+```
+
+You can see that our trampoline contains the first `4` instructions that were
+replaced when the `JMP` patch was placed in our target function, followed by a
+jump back to address `0x555555561395`, which was disassembled earlier. The
+`add %al,(%rax)` lines after it are just the zero-filled remainder of the
+`mmap` allocation (`00 00`) decoded as instructions. This should give you an
+idea of how the control flow modification is achieved.
+
+# INT3 Patching
+
+There is another method of function detouring which involves placing `INT3`
+breakpoints at the start of the target function in memory. `INT3` breakpoints
+are encoded with the `0xCC` opcode:
+
+```
+/* Generate int3 instruction. */
+uint8_t *cdl_gen_swbp(uint8_t *code)
+{
+    *(code + 0x0) = 0xCC;
+    return code;
+}
+```
+
+So rather than placing a `JMP` patch to the detour, we simply write the byte
+`0xCC` to the target function, being careful to `NOP` the unused bytes. When
+the CPU executes the `INT3` instruction, the Linux kernel sends a `SIGTRAP`
+signal to the process.
+
+We can register our own signal handler, but we need some additional
+information about the signal, namely its context. A context is the saved state
+of a program's registers and stack at the point where it was interrupted. We
+need it to compare the breakpoint's `RIP` value against any active software
+breakpoints.
+
+This is how the signal handler is registered in `cdl86`:
+
+```
+    struct sigaction sa = {};
+
+    /* Initialise cdl signal handler. */
+    if (!cdl_swbp_init)
+    {
+        /* Request signal context info which
+         * is required for RIP register comparison.
+         */
+        sa.sa_flags = SA_SIGINFO | SA_ONESHOT;
+        sa.sa_sigaction = (void *)cdl_swbp_handler;
+        sigaction(SIGTRAP, &sa, NULL);
+        cdl_swbp_init = true;
+    }
+    ...
+```
+
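+Note the use of `SA_SIGINFO` to get context information. `SA_ONESHOT` (an older
+name for `SA_RESETHAND`) restores the default `SIGTRAP` action once the handler
+has been invoked, so with this flag the handler fires only once unless it is
+reinstalled. The software breakpoint handler is then defined as follows: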
+
+```
+void cdl_swbp_handler(int sig, siginfo_t *info, struct ucontext_t *context)
+{
+    int i = 0x0;
+    bool active = false;
+    uint8_t *bp_addr = NULL;
+
+    /* The RIP register points to the instruction after
+     * the int3 breakpoint, so we subtract 0x1.
+     */
+    bp_addr = (uint8_t *)(context->uc_mcontext.gregs[REG_RIP] - 0x1);
+
+    /* Iterate over all breakpoint structs. */
+    for (i = 0; i < cdl_swbp_size; i++)
+    {
+        active = cdl_swbp_hk[i].active;
+        /* Compare addresses of active breakpoints only. */
+        if (active && bp_addr == cdl_swbp_hk[i].bp_addr)
+        {
+            /* Update RIP and reset context. */
+            context->uc_mcontext.gregs[REG_RIP] = (greg_t)cdl_swbp_hk[i].detour;
+            setcontext(context);
+        }
+    }
+}
+```
+
+Note that if a match of the `RIP` value to any known breakpoints occurs the `RIP`
+value for the current context is updated and the new context applied using
+`setcontext()`. A trampoline function similar to our `JMP` patch is allocated
+and serves the same purpose.
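+
+The same redirection can be reproduced in a small, self-contained program.
+This sketch assumes `x86_64` Linux with glibc (for `REG_RIP`) and uses a
+hand-assembled target in an `mmap`'d page; all names are hypothetical. Unlike
+the `cdl86` handler it does not call `setcontext()`: on Linux, simply returning
+from an `SA_SIGINFO` handler makes the kernel resume with the (modified)
+context. It also omits the trampoline, so the original function cannot be
+called once the breakpoint is planted.
+
+```c
+#define _GNU_SOURCE /* REG_RIP, MAP_ANONYMOUS */
+#include <signal.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <ucontext.h>
+
+typedef int add_t(int x, int y);
+
+static uint8_t *bp_addr; /* address of the planted int3 */
+static add_t *bp_detour; /* where execution resumes on a hit */
+
+static void trap_handler(int sig, siginfo_t *info, void *uc_void)
+{
+    ucontext_t *uc = uc_void;
+    (void)sig;
+    (void)info;
+    /* After int3 executes, RIP points one byte past the 0xCC. */
+    uint8_t *hit = (uint8_t *)uc->uc_mcontext.gregs[REG_RIP] - 1;
+    if (hit == bp_addr)
+        uc->uc_mcontext.gregs[REG_RIP] = (greg_t)(uintptr_t)bp_detour;
+    /* Returning lets the kernel restore the modified context. */
+}
+
+static int mul_detour(int x, int y) { return x * y; }
+
+/* Returns the breakpointed target, or NULL on failure. */
+static add_t *install_int3_demo(void)
+{
+    static const uint8_t code[] = {
+        0x90,             /* nop (1 byte): replaced by int3 below */
+        0x8d, 0x04, 0x37, /* lea (%rdi,%rsi,1),%eax               */
+        0xc3              /* ret                                  */
+    };
+    uint8_t *target = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
+                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (target == MAP_FAILED)
+        return NULL;
+    memcpy(target, code, sizeof code);
+    bp_addr = target;
+    bp_detour = mul_detour;
+
+    struct sigaction sa;
+    memset(&sa, 0, sizeof sa);
+    sigemptyset(&sa.sa_mask);
+    sa.sa_flags = SA_SIGINFO; /* no SA_RESETHAND: the handler stays installed */
+    sa.sa_sigaction = trap_handler;
+    if (sigaction(SIGTRAP, &sa, NULL) != 0)
+        return NULL;
+
+    target[0] = 0xCC; /* plant the software breakpoint */
+    return (add_t *)(void *)target;
+}
+```
+
+Calling the returned pointer with `(3, 4)` traps on the `0xCC`, the handler
+moves `RIP` to `mul_detour`, and the call returns `12` instead of `7`. Because
+the handler is not reset, later calls are redirected as well.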
+
+# Code Injection
+
+`cdl86` assumes that you are operating in the address space of the target
+process. In practice this usually means injecting code (typically a shared
+library) into the target, which requires an
+[injector](https://github.com/lunar-rf/robocraft/tree/main/injector).
+
+Once a shared library (`.so`) has been injected, you can use the following code
+to get the base address of the main executable module (this holds for a
+position-independent executable; for a non-PIE binary `l_addr` is `0`):
+
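+```
+#include <dlfcn.h>
+#include <link.h>
+#include <inttypes.h>
+#include <stdio.h>
+
+__attribute__((constructor)) static void init(void)
+{
+    /* ... */
+    /* On glibc, the handle returned by dlopen(NULL, ...) for the main
+     * program points to its struct link_map. */
+    struct link_map *lm = dlopen(NULL, RTLD_NOW);
+    printf("base = %" PRIx64 "\n", (uint64_t)lm->l_addr);
+    /* ... */
+}
+```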
+
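+Or find the address of a function by symbol name. Note that `dlsym` on the main
+program only finds symbols in its dynamic symbol table, for example when it was
+linked with `-rdynamic`: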
+
+```
+void* dl_handle = dlopen(NULL, RTLD_LAZY);
+void* add_ptr = dlsym(dl_handle, "add");
+```
+
+# API
+The API for the `cdl86` library is shown below:
+
+```
+struct cdl_jmp_patch cdl_jmp_attach(void **target, void *detour);
+struct cdl_swbp_patch cdl_swbp_attach(void **target, void *detour);
+void cdl_jmp_detach(struct cdl_jmp_patch *jmp_patch);
+void cdl_swbp_detach(struct cdl_swbp_patch *swbp_patch);
+void cdl_jmp_dbg(struct cdl_jmp_patch *jmp_patch);
+void cdl_swbp_dbg(struct cdl_swbp_patch *swbp_patch);
+```
+
+# Source code
+You can find the `cdl86` source code
+[here](https://github.com/lunar-rf/cdl86).<br>
+
+# Signature
+
+```
++---------------------------------------+
+|   .-.         .-.         .-.         |
+|  /   \       /   \       /   \        |
+| /     \     /     \     /     \     / |
+|        \   /       \   /       \   /  |
+|         "_"         "_"         "_"   |
+|                                       |
+|  _   _   _ _  _   _   ___   ___ _  _  |
+| | | | | | | \| | /_\ | _ \ / __| || | |
+| | |_| |_| | .` |/ _ \|   /_\__ \ __ | |
+| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| |
+|                                       |
+|                                       |
+| Lunar RF Labs                         |
+| https://lunar.sh                      |
+|                                       |
+| Research Laboratories                 |
+| Copyright (C) 2022-2025               |
+|                                       |
++---------------------------------------+
+```
+