Diffstat (limited to '_posts/2020-10-24-tiny-linux-c-binaries.md')
-rw-r--r-- _posts/2020-10-24-tiny-linux-c-binaries.md 389
1 files changed, 389 insertions, 0 deletions
diff --git a/_posts/2020-10-24-tiny-linux-c-binaries.md b/_posts/2020-10-24-tiny-linux-c-binaries.md
new file mode 100644
index 0000000..ce90f48
--- /dev/null
+++ b/_posts/2020-10-24-tiny-linux-c-binaries.md
@@ -0,0 +1,389 @@
+---
+layout: post
+title: Tiny C Binaries
+author: Dylan Müller
+---
+
+> By default, following the linking stage, `GCC` generates `ELF` binaries that contain
+> redundant section data that increase executable size.
+
+1. [ELF Binaries](#elf-binaries)
+2. [Size Optimisation](#size-optimisation)
+3. [Linux Syscalls](#linux-syscalls)
+4. [Custom Linker Script](#custom-linker-script)
+5. [GCC flags](#gcc-flags)
+6. [SSTRIP](#sstrip)
+7. [Source Code](#source-code)
+
+# ELF Binaries
+
+The standard file format for executable object code on Linux is `ELF` (Executable
+and Linkable Format); it is the successor to the older `COFF` `UNIX` file format.
+
+`ELF` binaries consist of two parts: the `ELF` header and the file data (object
+code). The `ELF` header format for `64-bit` binaries is shown in the table below:
+
+| Offset | Field | Description | Value |
+|--------|------------------------|----------------------------------------|---------------------------------------------------------------------------------------|
+| 0x00 | e_ident[EI_MAG0..EI_MAG3] | magic number | 0x7F 'E' 'L' 'F' (0x7F 0x45 0x4C 0x46) |
+| 0x04 | e_ident[EI_CLASS] | 32/64-bit | 0x2 = 64-bit |
+| 0x05 | e_ident[EI_DATA] | endianness | 0x1 = little<br>0x2 = big |
+| 0x06 | e_ident[EI_VERSION] | elf version | 0x1 = original |
+| 0x07 | e_ident[EI_OSABI] | system ABI | 0x00 = System V<br>0x02 = NetBSD<br>0x03 = Linux<br>0x09 = FreeBSD<br> |
+| 0x08 | e_ident[EI_ABIVERSION] | ABI Version | * ignored for static-linked binaries<br>* vendor specific for dynamic-linked binaries |
+| 0x09 | e_ident[EI_PAD] | undefined | * padded with zeros |
+| 0x10 | e_type | object type | 0x00 = ET_NONE<br>0x01 = ET_REL<br>0x02 = ET_EXEC<br>0x03 = ET_DYN<br>0x04 = ET_CORE |
+| 0x12 | e_machine | system ISA | 0x3E = amd64<br>0xB7 = ARM (v8/64) |
+| 0x14 | e_version | elf version | 0x1 = original |
+| 0x18 | e_entry | entry point | 64-bit entry point address |
+| 0x20 | e_phoff | program header table offset | 64-bit program header table offset |
+| 0x28 | e_shoff | section header table offset | 64-bit section header table offset |
+| 0x30 | e_flags | processor-specific flags | vendor specific or pad with zeros |
+| 0x34 | e_ehsize | elf header size | 0x40 = 64-bit, 0x34 = 32-bit |
+| 0x36 | e_phentsize | program header table entry size | - |
+| 0x38 | e_phnum | #(num) entries in program header table | - |
+| 0x3A | e_shentsize | section header table entry size | - |
+| 0x3C | e_shnum | #(num) entries in section header table | - |
+| 0x3E | e_shstrndx | section names index into section table | - |
+| 0x40 | | | End of 64-bit ELF header |
+
+The Linux program loader uses these fields to resolve the entry point for code
+execution, along with details such as the `ABI` version, the `ISA` type and the
+section listings.
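+
+As a rough illustration of how these fields can be read, the sketch below copies
+the first 64 bytes of a file image into the `Elf64_Ehdr` structure from `<elf.h>`
+and checks the identification bytes. The helper name `read_elf64_header` is
+hypothetical, and the code assumes a little-endian host reading a little-endian
+binary:
+
+```c
+#include <elf.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical helper (not part of the original post): copies an ELF64
+ * header out of a raw byte buffer after checking the identification bytes.
+ * Returns 0 on success, -1 if the buffer is not a 64-bit ELF image. */
+int read_elf64_header(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
+{
+    if (len < sizeof(Elf64_Ehdr))
+        return -1;                      /* shorter than the 0x40-byte header */
+    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
+        return -1;                      /* magic is not 0x7F 'E' 'L' 'F' */
+    if (buf[EI_CLASS] != ELFCLASS64)
+        return -1;                      /* 0x2 = 64-bit */
+    /* Assumption: host byte order matches the file (little-endian). */
+    memcpy(out, buf, sizeof(*out));
+    return 0;
+}
+```
+
+On success, fields such as `out->e_entry` and `out->e_machine` can be read
+directly, since the structure layout follows the offsets in the table.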
+
+A sample hello world program is shown below and was compiled with `GCC` using `gcc
+main.c -o example`.
+
+```
+#include <stdio.h>
+
+int main(int argc, char *argv[]){
+ printf("Hello, World!");
+ return 0;
+}
+```
+
+This produced an output executable of roughly **17 KB**! If you've ever
+programmed in assembly you might be surprised at the rather large file size for
+such a simple program.
+
+`GNU-binutils` `objdump` allows us to inspect the full list of `ELF` sections with
+the `-h` flag.
+
+After running `objdump -h example` on our sample binary we see that there are a
+large number of `GCC` derived sections, such as `.gnu.version` and
+`.note.gnu.property`, attached to the binary image. The question becomes how
+much data these additional sections consume and to what degree we can 'strip'
+out redundant data.
+
+![enter image description here](https://journal.lunar.sh/images/2/01.png)
+
+`GNU-binutils` comes with a handy utility called `strip`, which attempts to remove
+unused `ELF` sections from a binary. Running `strip -s example` results in only a
+slightly smaller file of around **14.5 KB**. Clearly, we need to strip much
+more! :open_mouth:
+
+# Size Optimisation
+
+`GCC` offers a large number of optimisation flags. These include the common
+`-O2 -O3 -Os` flags as well as many less widely used compile time options,
+which we will explore further. Since we have not compiled with any optimisation
+thus far, as a first step we recompile the above example with `-Os`, to
+optimise for size.
+
+![meme](https://journal.lunar.sh/images/memes/meme_00.png)
+
+And we see no decrease in size! This is expected behaviour, however, since the
+`-Os` flag does not consider redundant section data for removal. On the
+contrary, the additional section information placed by `GCC` in the output
+binary is considered useful at this level of optimisation.
+
+In addition, the use of `printf` binds object code from the standard library
+into the final output executable and so we will instead call through to the
+Linux kernel directly to print to the standard output stream.
+
+# Linux Syscalls
+
+System calls on Linux are invoked with the `x86_64` `syscall` instruction, and
+syscall parameters follow a specific register order. For `x86_64`
+([System V ABI - Section
+A.2.1](https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf)), the order
+of parameters for Linux system calls is as follows:
+
+| description | register (64-bit) |
+|----------------|----------|
+| syscall number | rax |
+| arg 1 | rdi |
+| arg 2 | rsi |
+| arg 3 | rdx |
+| arg 4 | r10 |
+| arg 5 | r8 |
+| arg 6 | r9 |
+
+
+Arguments at user mode level (System V `x86_64` calling convention), however, are
+passed in the following order:
+
+| description | register (64-bit) |
+|-------------|-----|
+| arg 1 | rdi |
+| arg 2 | rsi |
+| arg 3 | rdx |
+| arg 4 | rcx |
+| arg 5 | r8 |
+| arg 6 | r9 |
+
+To call through to the Linux kernel from `C`, an assembly wrapper is required to
+translate user mode arguments (`C` formal parameters) into kernel `syscall`
+arguments:
+
+```
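+syscall:
+ mov rax,rdi /* rax = syscall number (C arg 1) */
+ mov rdi,rsi /* kernel arg 1 = C arg 2 */
+ mov rsi,rdx /* kernel arg 2 = C arg 3 */
+ mov rdx,rcx /* kernel arg 3 = C arg 4 */
+ mov r10,r8 /* kernel arg 4 = C arg 5 */
+ mov r8,r9 /* kernel arg 5 = C arg 6 */
+ syscall /* result is returned in rax */
+ ret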
+```
+
+We may then make a call to this assembly routine from `C` using the following
+function signature:
+
+```
+void* syscall(
+ void* syscall_number,
+ void* param1,
+ void* param2,
+ void* param3,
+ void* param4,
+ void* param5
+);
+```
+
+To write to the standard output stream we invoke syscall `0x1`, which handles
+file output. A useful `x86_64` Linux syscall table can be found
+[here](https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/).
+Syscall `0x1` takes three arguments and has the following signature:
+
+`sys_write( unsigned int fd, const char *buf, size_t count)`
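+
+For comparison, the same call can be sketched without a separate assembly file
+by using `GCC` extended inline assembly. This is only an illustration of the
+register assignments in the syscall table, not the approach used in this post;
+the name `raw_write` is hypothetical and the code assumes `x86_64` Linux with
+`GCC` or a compatible compiler:
+
+```c
+/* Hypothetical alternative to the assembly wrapper: issue sys_write (0x1)
+ * directly. Assumes x86_64 Linux; it will not build on other targets. */
+long raw_write(unsigned int fd, const char *buf, unsigned long count)
+{
+    long ret;
+    __asm__ volatile (
+        "syscall"
+        : "=a"(ret)                 /* rax: return value on exit       */
+        : "a"(1L),                  /* rax: syscall number on entry    */
+          "D"((long)fd),            /* rdi: arg 1, file descriptor     */
+          "S"(buf),                 /* rsi: arg 2, buffer address      */
+          "d"(count)                /* rdx: arg 3, byte count          */
+        : "rcx", "r11", "memory"    /* syscall overwrites rcx and r11  */
+    );
+    return ret;                     /* bytes written, or -errno        */
+}
+```
+
+Calling `raw_write(1, "Hello, World", 12)` would then write the string to the
+standard output stream and return the number of bytes written.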
+
+A file called `base.c` was created, implementing both `syscall` and print wrappers:
+
+```
+// base.c
+typedef unsigned long int uintptr;
+typedef long int intptr;
+
+void* syscall(
+ void* syscall_number,
+ void* param1,
+ void* param2,
+ void* param3,
+ void* param4,
+ void* param5
+);
+
+static intptr print(void const* data, uintptr nbytes)
+{
+ return (intptr)
+ syscall(
+ (void*)1, /* sys_write */
+ (void*)(intptr)1, /* STD_OUT */
+ (void*)data,
+ (void*)nbytes,
+ 0,
+ 0
+ );
+}
+
+int main(int argc, char *argv[]){
+ print("Hello, World", 12);
+ return 0;
+}
+```
+
+To stop `GCC` from linking in standard library object code, the `-nostdlib` flag
+should be passed at compile time. There is one caveat, however: certain symbols,
+such as `_start`, which handles program startup and the passing of the command
+line arguments to `main`, are now left for us to implement, otherwise we will
+segfault :-/
+
+However, this is quite trivial and luckily program initialisation is well
+defined by the [System V ABI - Section
+3.4](https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf).
+
+On process entry, the memory pointed to by `rsp` holds the argument count, while
+the array of `64-bit` pointers to the argument strings begins at `rsp+0x8`.
+
+From here the argument count and the address of the string pointer array can be
+passed to `rdi` and `rsi` respectively, the first two parameters of `main()`.
+When `main` returns, syscall `0x3c` (exit) is invoked to terminate the program
+gracefully.
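+
+The stack layout at process entry can also be sketched in `C`. The structure
+and function names below are hypothetical, and `sp` stands for the value that
+`rsp` holds when `_start` begins executing. The environment pointer array that
+follows `argv` comes from the same section of the ABI:
+
+```c
+#include <stdint.h>
+
+/* Hypothetical view of the words found at rsp on process entry. */
+struct entry_args {
+    long argc;
+    char **argv;
+    char **envp;
+};
+
+/* Assumes a 64-bit target, so each stack slot is 8 bytes wide. */
+struct entry_args read_entry_stack(uintptr_t *sp)
+{
+    struct entry_args a;
+    a.argc = (long)sp[0];           /* [rsp]     = argument count         */
+    a.argv = (char **)(sp + 1);     /* rsp + 0x8 = argv[0], argv[1], ...  */
+    a.envp = a.argv + a.argc + 1;   /* skip the NULL that terminates argv */
+    return a;
+}
+```
+
+This mirrors what the startup code has to do in assembly: load the count, take
+the address of the slot after it, and hand both to `main`.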
+
+Both the syscall and program startup assembly wrappers (written for `GAS`, the
+GNU assembler) were placed in a file called `boot.s`:
+
+```
+/* boot.s */
+.intel_syntax noprefix
+.text
+.globl _start, syscall
+
+_start:
+ xor rbp,rbp /* rbp = 0 */
+ pop rdi /* rdi = argc, rsp= rsp + 8 */
+ mov rsi,rsp /* rsi = char *ptr[] */
+ and rsp,-16 /* align rsp to 16 bytes */
+ call main
+ mov rdi,rax /* rax = main return value */
+ mov rax,60 /* syscall= 0x3c (exit) */
+ syscall
+ ret
+
+syscall:
+ mov rax,rdi
+ mov rdi,rsi
+ mov rsi,rdx
+ mov rdx,rcx
+ mov r10,r8
+ mov r8,r9
+ syscall
+ ret
+```
+
+Finally, `GCC` was invoked with `gcc base.c boot.s -nostdlib -o base`.
+
+![enter image description here](https://journal.lunar.sh/images/2/05.png)
+
+Wait what!? We still get a **~14 KB** executable after all that work? Yep.
+Although we have optimised the main object code for our example, we have not yet
+stripped out the redundant `ELF` sections, which account for most of the file
+size.
+
+# Custom Linker Script
+
+Although it is possible to strip some redundant sections from an `ELF` binary
+using `strip`, it is much more efficient to use a custom linker script.
+
+A linker script specifies precisely which `ELF` sections to include in the output
+binary, which means we can eliminate *almost* all redundancy. Care, however,
+must be taken to ensure that essential sections such as `.text`, `.data` and
+`.rodata*` are not discarded during linking, to avoid a segmentation fault.
+
+The linker script that I came up with is shown below (`x86_64.ld`):
+
+```
+OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
+ "elf64-x86-64")
+OUTPUT_ARCH(i386:x86-64)
+ENTRY(_start)
+
+SECTIONS
+{
+ . = 0x400000 + SIZEOF_HEADERS;
+ .text : { *(.text) *(.data*) *(.rodata*) *(.bss*) }
+}
+```
+
+The linker script sets the virtual base address of the output binary to `0x400000`
+and retains only the essential sections.
+
+Custom linker scripts are passed to `GCC` with the `-T` switch, and the binary
+was built with: `gcc -T x86_64.ld base.c boot.s -nostdlib -o base`.
+
+This produced an output executable of around **2.7 KB**.
+
+This is much better, but there is still some room for improvement using
+additional `GCC` compile time switches.
+
+# GCC Flags
+
+We have thus far managed to shrink our executable size down to **~2.7 KB** from our
+initial file size of **~17 KB** by stripping redundant section data using a custom
+linker script and removing standard library object code.
+
+However, `GCC` has several compile time flags that can further help in removing
+unwanted code sections. These include:
+
+| flag | description |
+|----------------------|---------------------------------------|
+| -ffunction-sections | place each function into own section |
+| -fdata-sections | place each data item into own section |
+| -Wl,\--gc-sections | strip unused sections (linker) |
+| -fno-unwind-tables | remove unwind tables |
+| -Wl,\--build-id=none | remove build-id section |
+| -Qn | remove .ident directives |
+| -Os | optimize code for size |
+| -s | strip symbol table and relocation info |
+
+We compile our example again with: `gcc -T x86_64.ld base.c boot.s -nostdlib -o
+base -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-unwind-tables
+-Wl,--build-id=none -Qn -Os -s`.
+
+This produces an output executable with a size of **~1.5 KB** but we can still go
+further!
+
+Additionally, you can include the `-static` switch to ensure a static binary.
+This results in an output executable of **~640 bytes**.
+
+# SSTRIP
+
+Despite all our optimisation thus far, there are still a few redundant code and
+data sections in our dynamically linked output executable. Enter `sstrip`...
+
+[sstrip](https://github.com/aunali1/super-strip) is a useful utility that
+attempts to identify which sections of an `ELF` binary are to be loaded into
+memory during program execution. Based on this, all unused code and data
+sections are removed. It is comparable to `strip` but performs section removal
+more aggressively.
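+
+To make the idea concrete, here is a rough sketch of how the loaded portion of a
+file could be measured. It is not taken from `sstrip` itself; it assumes that
+"loaded into memory" is decided by the program header table, that the buffer has
+already been validated as a little-endian `ELF64` image, and the function name
+`elf64_loaded_size` is hypothetical:
+
+```c
+#include <elf.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical helper: returns how many leading bytes of an ELF64 image are
+ * covered by the ELF header, the program header table, or the file data of
+ * any program header entry. Returns 0 if there is no usable table. */
+size_t elf64_loaded_size(const unsigned char *file, size_t len)
+{
+    Elf64_Ehdr eh;
+    if (len < sizeof eh)
+        return 0;
+    memcpy(&eh, file, sizeof eh);
+
+    /* The program header table itself has to survive truncation. */
+    size_t table_end = eh.e_phoff + (size_t)eh.e_phnum * eh.e_phentsize;
+    if (eh.e_phnum == 0 || eh.e_phentsize < sizeof(Elf64_Phdr) || table_end > len)
+        return 0;
+
+    size_t end = table_end > sizeof eh ? table_end : sizeof eh;
+    for (size_t i = 0; i < eh.e_phnum; i++) {
+        Elf64_Phdr ph;
+        memcpy(&ph, file + eh.e_phoff + i * eh.e_phentsize, sizeof ph);
+        if (ph.p_offset + ph.p_filesz > end)
+            end = ph.p_offset + ph.p_filesz;   /* furthest byte a segment uses */
+    }
+    return end < len ? end : len;
+}
+```
+
+Anything past the returned offset, which typically includes the section header
+table and the section name strings, is not needed to run the program.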
+
+Running `./sstrip base`, we get our final executable binary with a size of
+**~830 bytes**!
+
+At this point it would probably be best to switch to assembly to get smaller
+file sizes. However, the goal of this `journal` was to create small executables
+written in `C`, and I think we've done quite well to reduce the size from
+**~17 KB** down to **~830 bytes**!
+
+![enter image description here](https://journal.lunar.sh/images/2/08.png)
+
+As a final comment, you might be wondering whether we could have simply run
+`sstrip` on our **~17 KB** executable in the first place. The answer is no.
+
+I tried doing this and ended up with a binary image of around **12 KB**, so it
+seems that `sstrip` needs a bit of additional assistance in the form of our
+manual optimisations to get really `tiny` binaries!
+
+# Source Code
+
+Source code used in this `journal` is available at:
+[https://github.com/lunar-rf/tinybase](https://github.com/lunar-rf/tinybase)
+
+# Signature
+
+```
++---------------------------------------+
+| .-. .-. .-. |
+| / \ / \ / \ |
+| / \ / \ / \ / |
+| \ / \ / \ / |
+| "_" "_" "_" |
+| |
+| _ _ _ _ _ _ ___ ___ _ _ |
+| | | | | | | \| | /_\ | _ \ / __| || | |
+| | |_| |_| | .` |/ _ \| /_\__ \ __ | |
+| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| |
+| |
+| |
+| Lunar RF Labs |
+| https://lunar.sh |
+| |
+| Research Laboratories |
+| Copyright (C) 2022-2025 |
+| |
++---------------------------------------+
+```
+