author | dmlunar <root@lunar.sh> | 2025-03-14 13:56:27 +0200 |
---|---|---|
committer | dmlunar <root@lunar.sh> | 2025-08-08 16:16:49 +0200 |
commit | 994b347557ccf03af0cd910d8ba50d127b7a61dd (patch) | |
tree | 9ece6c3e1b6bf5477737df137df0536b0c8e9559 /_posts | |
initial commit.
Diffstat (limited to '_posts')
-rw-r--r-- | _posts/2020-10-24-tiny-linux-c-binaries.md | 389 | ||||
-rw-r--r-- | _posts/2020-11-03-transistor-design-for-newbies.md | 472 | ||||
-rw-r--r-- | _posts/2020-12-11-mono-dot-net-injection.md | 330 | ||||
-rw-r--r-- | _posts/2022-12-31-linux-detours.md | 601 | ||||
-rw-r--r-- | _posts/2023-09-17-gsctool.md | 537 | ||||
-rw-r--r-- | _posts/2024-03-20-rf-primer.md | 753 |
6 files changed, 3082 insertions, 0 deletions
diff --git a/_posts/2020-10-24-tiny-linux-c-binaries.md b/_posts/2020-10-24-tiny-linux-c-binaries.md new file mode 100644 index 0000000..ce90f48 --- /dev/null +++ b/_posts/2020-10-24-tiny-linux-c-binaries.md @@ -0,0 +1,389 @@ +--- +layout: post +title: Tiny C Binaries +author: Dylan Müller +--- + +> By default, following the linking stage, `GCC` generates `ELF` binaries that contain +> redundant section data that increase executable size. + +1. [ELF Binaries](#elf-binaries) +2. [Size Optimisation](#size-optimisation) +3. [Linux Syscalls](#linux-syscalls) +4. [Custom Linker Script](#custom-linker-script) +5. [GCC flags](#gcc-flags) +6. [SSTRIP](#sstrip) +7. [Source Code](#source-code) + +# ELF Binaries + +The standard file format for executable object code on Linux is `ELF` (Executable +and Linkable Format), it is the successor to the older `COFF` `UNIX` file format. + +`ELF` Binaries consist of two sections, the `ELF` header and file data (object +code). The `ELF` header format for `64-bit` binaries is shown in the table below: + +| Offset | Field | Description | Value | +|--------|------------------------|----------------------------------------|---------------------------------------------------------------------------------------| +| 0x00 | e_ident[EI_MAG0] | magic number | 0x7F | +| 0x04 | e_ident[EI_CLASS] | 32/64-bit | 0x2 = 64-bit | +| 0x05 | e_ident[EI_DATA] | endianness | 0x1 = little<br>0x2 = big | +| 0x06 | e_ident[EI_VERSION] | elf version | 0x1 = original | +| 0x07 | e_ident[EI_OSABI] | system ABI | 0x00 = System V<br>0x02 = NetBSD<br>0x03 = Linux<br>0x09 = FreeBSD<br> | +| 0x08 | e_ident[EI_ABIVERSION] | ABI Version | * ignored for static-linked binaries<br>* vendor specific for dynamic-linked binaries | +| 0x09 | e_ident[EI_PAD] | undefined | * padded with zeros | +| 0x10 | e_type | object type | 0x00 = ET_NONE<br>0x01 = ET_REL<br>0x02 = ET_EXEC<br>0x03 = ET_DYN<br>0x04 = ET_CORE | +| 0x12 | e_machine | system ISA | 0x3E = amd64<br>0xB7 = ARM (v8/64) 
| +| 0x14 | e_version | elf version | 0x1 = original | +| 0x18 | e_entry | entry point | 64-bit entry point address | +| 0x20 | e_phoff | program header table offset | 64-bit program header table offset | +| 0x28 | e_shoff | section header table offset | 64-bit section header table offset | +| 0x30 | e_flags | processor-specific flags | vendor specific or pad with zeros | +| 0x34 | e_ehsize | elf header size (bytes) | 0x40 = 64-bit, 0x34 = 32-bit | +| 0x36 | e_phentsize | program header entry size | - | +| 0x38 | e_phnum | #(num) entries in program header table | - | +| 0x3A | e_shentsize | section header entry size | - | +| 0x3C | e_shnum | #(num) entries in section header table | - | +| 0x3E | e_shstrndx | section names index into section table | - | +| 0x40 | | | End of 64-bit ELF header | + +These data fields are used by the Linux `PL` (program loader) to resolve the entry +point for code execution along with various fields such as the `ABI` version, `ISA` +type, as well as section listings. + +A sample hello world program is shown below and was compiled with `GCC` using `gcc +main.c -o example`. + +``` +#include <stdio.h> + +int main(int argc, char *argv[]){ + printf("Hello, World!"); + return 0; +} +``` + +This produced an output executable of **~17 KB**! If you've ever +programmed in assembly you might be surprised at the rather large file size for +such a simple program. + +`GNU-binutils` `objdump` allows us to inspect the full list of `ELF` sections with +the `-h` flag. + +After running `objdump -h example` on our sample binary we see that there are a +large number of `GCC` derived sections, such as `.gnu.version` and `.note.gnu.property`, +attached to the binary image. The question becomes how much data these +additional sections are consuming and to what degree we can 'strip' out +redundant data. + + + +`GNU-binutils` comes with a handy utility called `strip`, which attempts to remove +unused `ELF` sections from a binary. Running `strip -s example` results in only a +slightly smaller file of around **~14.5 KB**.
Clearly, we need to strip much +more! :open_mouth: + +# Size Optimisation + +`GCC` contains a large number of optimisation flags. These include the common +`-O2 -O3 -Os` flags as well as many less widely used compile time options, +which we will explore further. Since we have not compiled with any +optimisation thus far, as a first step we recompile the above example with +`-Os`, to optimise for size. + + + +And we see no decrease in size! This is expected behaviour, however, since the +`-Os` flag does not consider all redundant section data for removal. On the +contrary, the additional section information placed by `GCC` in the output binary +is considered useful at this level of optimisation. + +In addition, the use of `printf` binds object code from the standard library +into the final output executable and so we will instead call through to the +Linux kernel directly to print to the standard output stream. + +# Linux Syscalls + +System calls on Linux are invoked with the `x86_64` `syscall` instruction and syscall +parameters follow a very specific order on `64-bit` architectures.
For `x86_64` +([System V ABI - Section +A.2.1](https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf)), the order +of parameters for Linux system calls is as follows: + +| description | register (64-bit) | +|----------------|----------| +| syscall number | rax | +| arg 1 | rdi | +| arg 2 | rsi | +| arg 3 | rdx | +| arg 4 | r10 | +| arg 5 | r8 | +| arg 6 | r9 | + + +Arguments at user mode level (System V AMD64 calling convention), however, are passed in +the following order: + +| description | register (64-bit) | +|-------------|-----| +| arg 1 | rdi | +| arg 2 | rsi | +| arg 3 | rdx | +| arg 4 | rcx | +| arg 5 | r8 | +| arg 6 | r9 | + +To call through to the Linux kernel from `C`, an assembly wrapper was required to +translate user mode arguments (`C` formal parameters) into kernel `syscall` +arguments: + +``` +syscall: + mov rax,rdi + mov rdi,rsi + mov rsi,rdx + mov rdx,rcx + mov r10,r8 + mov r8,r9 + syscall + ret +``` + +We may then make a call to this assembly routine from `C` using the following +function signature: + +``` +void* syscall( + void* syscall_number, + void* param1, + void* param2, + void* param3, + void* param4, + void* param5 +); +``` + +To write to the standard output stream we invoke syscall `0x1`, which handles +file output. A useful `x86_64` Linux syscall table can be found +[here](https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/).
+Syscall `0x1` takes three arguments and has the following signature: + +`sys_write(unsigned int fd, const char *buf, size_t count)` + +A file called `base.c` was created, declaring the `syscall` wrapper and implementing a `print` helper: + +``` +// base.c +typedef unsigned long int uintptr; +typedef long int intptr; + +void* syscall( + void* syscall_number, + void* param1, + void* param2, + void* param3, + void* param4, + void* param5 +); + +static intptr print(void const* data, uintptr nbytes) +{ + return (intptr) + syscall( + (void*)1, /* sys_write */ + (void*)(intptr)1, /* fd 1 = stdout */ + (void*)data, + (void*)nbytes, + 0, + 0 + ); +} + +int main(int argc, char *argv[]){ + print("Hello, World", 12); + return 0; +} +``` + +To stop `GCC` from linking in standard library object code, the +`-nostdlib` flag should be passed at compile time. There is one caveat, however, +in that certain symbols, such as `_start`, which handle program startup and the +passing of the command line arguments to `main`, will be left up to us to +implement; otherwise we will segfault :-/ + +However, this is quite trivial and luckily program initialisation is well +defined by the [System V ABI - Section +3.4](https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf). + +On entry, register `rsp` points to the argument count, while the +address given by `rsp+0x8` holds the start of an array of `64-bit` pointers to the argument +strings. + +From here the argument count and the address of the string pointer array can be passed to +`rdi` and `rsi` respectively, the first two parameters of `main()`. Upon exit, +a call to syscall `0x3c` is then made to handle program termination gracefully.
+ +Both the syscall and program startup assembly wrappers (written for `GAS`, the GNU assembler) were +placed in a file called `boot.s`: + +``` +/* boot.s */ +.intel_syntax noprefix +.text +.globl _start, syscall + +_start: + xor rbp,rbp /* rbp = 0 */ + pop rdi /* rdi = argc, rsp = rsp + 8 */ + mov rsi,rsp /* rsi = char *argv[] */ + and rsp,-16 /* align rsp to 16 bytes */ + call main + mov rdi,rax /* rdi = main return value (exit status) */ + mov rax,60 /* syscall = 0x3c (exit) */ + syscall + ret + +syscall: + mov rax,rdi + mov rdi,rsi + mov rsi,rdx + mov rdx,rcx + mov r10,r8 + mov r8,r9 + syscall + ret +``` + +Finally, `GCC` was invoked with `gcc base.c boot.s -nostdlib -o base`. + + + +Wait, what!? We still get a **~14 KB** executable after all that work? Yep: +although we have optimised the main object code for our example, we have not yet +stripped out redundant `ELF` sections, which contribute the majority of the file +size. + +# Custom Linker Script + +Although it is possible to strip some redundant sections from an `ELF` binary +using `strip`, it is much more efficient to use a custom linker script. + +A linker script specifies precisely which `ELF` sections to include in the output +binary, which means we can eliminate *almost* all redundancy. Care, however, +must be taken to ensure that essential sections such as `.text`, `.data` and +`.rodata*` are not discarded during linking to avoid a segmentation fault. + +The linker script that I came up with is shown below (`x86_64.ld`): + +``` +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) + +SECTIONS +{ + . = 0x400000 + SIZEOF_HEADERS; + .text : { *(.text) *(.data*) *(.rodata*) *(.bss*) } +} +``` + +The linker script sets the virtual base address of the output binary to `0x400000` +and retains only the essential sections. + +Custom linker scripts are passed to `GCC` with the `-T` switch and the resulting +binary was compiled with: `gcc -T x86_64.ld base.c boot.s -nostdlib -o base`.
+ +This produced an output executable of around **~2.7 KB**. + +This is much better, but there is still some room for improvement using +additional `GCC` compile time switches. + +# GCC Flags + +We have thus far managed to shrink our executable size down to **~2.7 KB** from our +initial file size of **~17 KB** by stripping redundant section data using a custom +linker script and removing standard library object code. + +However, `GCC` has several compile time flags that can further help in removing +unwanted code sections, these include: + +| flag | description | +|----------------------|---------------------------------------| +| -ffunction-sections | place each function into own section | +| -fdata-sections | place each data item into own section | +| -Wl,\--gc-sections | strip unused sections (linker) | +| -fno-unwind-tables | remove unwind tables | +| -Wl,\--build-id=none | remove build-id section | +| -Qn | remove .ident directives | +| -Os | optimize code for size | +| -s | strip all sections | + +Compiling our example again with: `gcc -T x86_64.ld base.c boot.s -nostdlib -o +base -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-unwind-tables +-Wl,--build-id=none -Qn -Os -s`. + +This produces an output executable with a size of **~1.5 KB** but we can still go +further! + +Additionally, you can include the `-static` switch to ensure a static binary. +This results in an output executable of **~640 bytes**. + +# SSTRIP + +Despite all our optimisation thus far, there are still a few redundant code and +data sections in our dynamically linked output executable. Enter `sstrip`... + +[sstrip](https://github.com/aunali1/super-strip) is a useful utility that +attempts to identify which sections of an `ELF` binary are to be loaded into +memory during program execution. Based off this, all unused code and data +sections are then subsequently removed. It is comparable to `strip` but performs +section removal more aggressively. 
+ +Running `./sstrip base` we get our final executable binary with a size of **~830 +bytes**! + +At this point it would probably be best to switch to assembly to get smaller +file sizes. However, the goal of this `journal` was to create small executables +written in `C` and I think we've done quite well to reduce the size from **~17 KB** +down to **~830 bytes**! + + + +As a final comment you might be wondering if we could have simply run `sstrip` +on our **~17 KB** executable in the first place, and the answer is no. + +I tried doing this and ended up with a binary image of around **~12 KB** so it seems +that `sstrip` needs a bit of additional assistance in the form of our manual +optimisations to get really `tiny` binaries! + +# Source Code + +Source code used in this `journal` is available at: +[https://github.com/lunar-rf/tinybase](https://github.com/lunar-rf/tinybase) + +# Signature + +``` ++---------------------------------------+ +| .-. .-. .-. | +| / \ / \ / \ | +| / \ / \ / \ / | +| \ / \ / \ / | +| "_" "_" "_" | +| | +| _ _ _ _ _ _ ___ ___ _ _ | +| | | | | | | \| | /_\ | _ \ / __| || | | +| | |_| |_| | .` |/ _ \| /_\__ \ __ | | +| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| | +| | +| | +| Lunar RF Labs | +| https://lunar.sh | +| | +| Research Laboratories | +| Copyright (C) 2022-2025 | +| | ++---------------------------------------+ +``` + diff --git a/_posts/2020-11-03-transistor-design-for-newbies.md b/_posts/2020-11-03-transistor-design-for-newbies.md new file mode 100644 index 0000000..d473fed --- /dev/null +++ b/_posts/2020-11-03-transistor-design-for-newbies.md @@ -0,0 +1,472 @@ +--- +layout: post +title: Transistor Circuit Design For Newbies +author: Dylan Müller +--- + +> `BJTs` are important electronic devices that find use in a wide range of +> applications. Learn how to design circuits with them. + +1. [Principle of operation](#principle-of-operation) +3. [Transistor as a switch](#transistor-as-a-switch) +4. 
[Transistor as an amplifier](#transistor-as-an-amplifier) +5. [LTSpice](#ltspice) + +# Principle of Operation + +There are various analogies that you will most likely come across when first +learning about transistors. A useful analogy is that of a mechanically +controlled water valve. + +{:width="500px"} + +Here it is important to reference the water analogy of current and voltage. In +the water analogy we picture a column of water moving through a pipe. + +We define current as the movement of water (`charge`) through the pipe (wire), or +in mathematical terms the rate of flow of water (`charge`) past a given point with +respect to time: + +$$ i=\frac{dQ}{dt} $$ + +Voltage is analogous to the pressure differential between two points. For +example, suppose we suspend water in a pipe and then apply a high pressure at +the top and a lower pressure at the bottom. We have just set up a 'water +potential difference' between two points and this tends to move water (`charge`) +from the higher pressure region (voltage) to the lower pressure region. + +The higher the water potential, the faster the column of water (`charge`) moves +through the pipe when it has the chance. + +In reality, voltage arises due to the presence of electric fields. For a given +electric field between two points, a positive test charge may be placed at any +distance along the electric field lines. Its 'field potential' varies with position: +a positive charge placed closer to the positive end of the electric field +feels more repulsion (and therefore has a higher potential to do work) than at +the negative end of the field. + +Potential difference (voltage) is just a differential measure of this electric +'field potential' or, put differently, the capacity of charge to do work in the +presence of an `electric` field: + +$$ V_{f} - V_{i} = -\int \overrightarrow{E} \cdot d\overrightarrow{s} $$ + +With this in mind the idea of a water valve then makes sense.
The valve consists +of three parts: one port attached to one end of the pipe, another port attached to the other +end of the pipe, and the valve itself, sitting in the middle and +regulating the flow of water between both ends. + +By rotating the valve we adjust the water flow rate (current) through the pipe. +This is the basic principle of operation of a transistor. However, rather than +applying a mechanical torque, we apply a potential difference at the base to +regulate current flow. + +You may think of the degree to which the mechanical valve is open or closed as +proportional to the voltage applied at the base of the transistor. This means +that we can control a potentially larger current through the transistor using a +smaller current through the base (through the application of a base voltage). +This is one of the useful properties of transistors. + + + +Bipolar Junction Transistors (`BJTs`) usually consist of three semiconductor +layers which can be of two types: `n` or `p`. The individual `silicon` layers are +crystalline structures that have what are known as dopants added to them. These +are individual elements (`phosphorus`, `boron`) added to neutral `silicon` (and +replace the corresponding `silicon` atoms) in order to change the electrical +properties of the layer. + + + +For example, `boron` `[B]` dopant has a valency (number of outer electrons) of `3`, +while `silicon` has a valency of `4`. This means that when `boron` and `silicon` bond +covalently (sharing each other's electrons) there is a mismatch (`3` < `4`) +between their valence electrons, leaving a 'hole', which needs to be filled with +an electron in order to match `silicon's` valency. This results in a crystal +structure whose mobile charge carriers are these positive holes, the `p` type layer.
+ +In contrast, `phosphorus` `[P]` dopant has a valency of `5`. Again there is a mismatch +(`5` > `4`) with `silicon's` valency (`4`), allowing for the extra electron of +`phosphorus` to move freely through the crystal structure and giving the +crystal layer mobile negative charge carriers, the `n` type layer. + +{:height="200px"} + +If we were to place an `n` region and `p` region together we would form an +electronic device known as a diode. A diode is a `2` terminal device (with the `n` +side connected to the negative terminal (`cathode`) and `p` side connected to the +positive terminal (`anode`)) that only allows current flow in one direction. + +It is also worth noting that by placing an `n` and `p` region next to one another +there is a localised effect at their layer boundary that results in a small +number of electrons (from the `n` type region) migrating to the `p` type region in +what is known as the depletion region. + +{:height="300px"} + +The migration of electrons from the `n` type region to the `p` type region at the `np` +boundary sets up what is known as a barrier potential, a secondary electric +field at the `np` layer boundary in opposition to the primary `E-field` (between `p` +and `n`). + +This is the amount of voltage (`pressure`) required to force `n` layer electrons +through the `np` barrier (the secondary `E-field`) where they can flow into the +positive terminal (`anode`) of the diode. + +It is equivalent to having a water valve initially shut tight and requiring a +torque in order to get water flowing. A typical value for the barrier potential +of garden variety diodes is between `0.3v` and `0.7v`. + + + +A bipolar junction transistor (`BJT`) may be viewed as a combination of two diodes +(shown below for an `NPN` transistor): + + + +An `NPN` `BJT` transistor has two current paths, one from the collector to emitter +and the other from the base to emitter.
The current flow from collector to +emitter represents the water flow in the pipe containing the valve, while the +current flow from base to emitter represents the degree to which the valve is +open or closed. + +You might be wondering why conventional (positive) current flows backwards +through the `base-collector` diode (from collector to emitter) for an `NPN` +transistor. As it turns out, current can actually flow in both directions +through a diode. However, it takes much more voltage to 'push' charge through a +diode in the direction it's meant to block than in the direction it is meant to +flow. + +The ratio of `collector-emitter` current to `base-emitter` current is known as $$\beta$$ +and is an important consideration in the design of circuits using transistors: + +$$ I_{C} = \beta I_{B} $$ + +Both transistor current paths have an associated voltage drop/potential +difference across them. + +For the current flow from base to emitter, there is the `base-emitter` voltage +drop $$V_{BE}$$ and from collector to emitter there is the `collector-emitter` +voltage drop $$V_{CE}$$ : + +{:height="200px"} + +$$V_{CE}$$, $$V_{BE}$$ and $$V_{CB}$$ take predictable +values in the three modes of operation of a transistor, which are: + +* **Cut-off** (The transistor acts as an open circuit; valve closed). + $$V_{BE}$$ < `0.7V` +* **Saturation** (The transistor acts as a short circuit; valve completely open). + $$V_{BE}$$ >= `0.7V` +* **Active** (The transistor acts as an amplifier; valve varies between closed + and completely open). + +# Transistor as a switch + +When using a transistor as a switch we place the transistor into one of two +states: cut-off or saturation. + +The following switching circuit is usually employed (with an `NPN` `BJT`, shown +together with an `LED`): + +{:height="300px"} + + +The circuit consists of a base current limiting resistor $$R_{B}$$ +as well as a `collector-emitter` current limiting resistor $$R_{LIM}$$.
+ +$$R_{B}$$ serves to set up the correct base current, while $$R_{LIM}$$ +serves to limit the maximum current through the `LED` (shown in red) when the +transistor is switched fully on (driven into saturation). + +To calculate the values for resistors $$R_{B}$$ and $$R_{LIM}$$ we use +the equation relating base current to collector current defined earlier: + +$$ I_{C} = \beta I_{B} $$ + +The first question becomes what collector current $$I_{C}$$ we desire. This +value depends on the device/load you are trying to switch on/off. It is worth +noting that when a transistor is switched fully on (is in saturation mode) the +equivalent circuit (simplified) is as follows (shown without the `LED`; you can +assume the `LED` follows resistor $$R_{C}$$): + +{:width="450px"} + +Thus at the collector a direct connection to ground is made. However, this +connection is not perfect and there is an associated voltage drop from collector +to emitter of typically around `0.2v` ($$V_{CE}$$) rather than `0v`. Determining +the relevant value for $$I_{C}$$ is then just a matter of how much current your +load (`LED` in our case) requires. + +For example, a typical green `LED` requires around `15mA` of current to light up +brightly so we set $$I_{C}$$ = `15mA`. A green `LED` also typically has a `2v` +drop across it. To calculate $$R_{LIM}$$ we use Ohm's law: + +$$ R_{LIM} = \frac{V_{CC} - V_{LED} - V_{CE}}{I_{DESIRED}} $$ + +Given the `LED` and collector to emitter voltage drops of `2v` and `0.2v` +respectively, we can further reduce the above expression to: + +$$ R_{LIM} = \frac{V_{CC} - 2 - 0.2}{15 \cdot 10^{-3}} $$ + +Choosing $$V_{CC}$$ is just a matter of what you have at hand. For example, +a `5v` or `9v` supply would be adequate to drive the transistor into saturation as +long as $$V_{CC} > $$ `0.7v` (due to the base emitter voltage drop) and $$V_{CC} >$$ +`2.2v` (for the `LED` plus $$V_{CE}$$).
+ +Assume $$V_{CC}$$ = `5v`, then $$R_{LIM}$$ = `186.7` $$\Omega$$ + +In calculating the required base current, we use the transistor's $$\beta$$ value. This +can be found on the transistor's datasheet and typically varies anywhere +between `20` and `200`. The rule of thumb is to use the minimum value of $$\beta$$ for a +specific transistor type. For the standard garden variety `2N2222` transistor, the +minimum value of $$\beta$$ is around `75`. Therefore to calculate $$I_{B}$$, we have: + +$$ I_{B} = \frac{I_{C} \cdot SF}{\beta_{min}} = \frac{15mA \cdot 5}{75} = 1mA $$ + +You might have noticed an additional factor called `SF` (safety factor). This +is a factor typically around `5-10` that we multiply our calculated $$I_{B}$$ +with in order to ensure we drive the transistor into saturation. This gives a +value of around `1mA` for $$I_{B}$$. + +Given $$I_{B}$$, calculating $$R_{B}$$ becomes trivial as we know the +voltage across $$R_{B}$$ as: $$V_{CC} - V_{BE}$$ (think of +$$V_{BE}$$ as a `0.7v` diode) and so we apply Ohm's law once again: + +$$ R_{B} = \frac{V_{CC} - V_{BE}}{I_{B}} = \frac{5-0.7}{1 \cdot 10^{-3}} = 4.3k\Omega $$ + +Now you can connect a switch between the base resistor and $$V_{CC}$$ or connect the +base resistor directly to the output of a `5V-TTL` micro-controller in order to +turn the `LED` on and off! The benefit of using a transistor to do this is that we +require a relatively small current (`~1mA`) in order to switch a much larger +current through the `LED` (`15mA`)! + +In conclusion: +1. Determine required collector current $$I_{C}$$. +2. Calculate $$R_{LIM}$$ (Ohm's law). +3. Calculate $$I_{B}$$ using lowest value for $$\beta$$. +4. Multiply $$I_{B}$$ by safety factor `5-10`. +5. Calculate $$R_{B}$$ (Ohm's law).
+ +The simple `LED` transistor circuit was modelled in `LTSpice`, with the `LED` +represented as a series voltage source (representing the `2v` voltage drop): + +{:width="400px"} + + A simulation of the `DC` operating point of the circuit yielded: + +{:height="200px"} + +Here we can see the `~1mA` base current ($$I_{B}$$) driving `~15mA` collector +($$I_{C}$$) current. All current values are shown in `SI` units of amperes +(`A`). + +# Transistor as an amplifier + +Here we operate the transistor in its active mode to achieve linear +amplification. Linear amplification means that our output should be a +proportional scaling of our input. For example, if we feed in a sine wave we +should ideally get a scaled sine wave out, i.e. with no distortion/clipping. + +There are various circuit configurations used to achieve amplification using +transistors. A useful 'template' is known as common emitter configuration (shown +below with an `NPN` transistor): + + {:width="600px"} + +Here we model a `20 mVp` (20mV amplitude) sinusoidal signal source with a +resistance of `50` $$\Omega$$, but your input can be practically anything. + +It should be noted that there are two electrical 'components' of the above +circuit: `AC` (the fluctuating component) and `DC` (the static component). + +When analysing a circuit from a `DC` perspective there are a few rules to follow: +* Capacitors become open circuits. +* Inductors become short circuits. + +This means that at the base of `Q1`, `C3` becomes an open connection, i.e. the base +of the transistor cannot see signal source `V2` or the `50` $$\Omega$$ resistor. +Additionally, capacitor `C1` becomes an open circuit and therefore has no effect +(it's as if all the capacitors weren't there in the first place). + +Capacitor `C3` is known as a `DC` blocking capacitor and is used to remove the `DC` +component of the input signal at the feed point (base of `Q1`).
All signals have a +`DC` component: + +{:height="300px"} + +Effectively `C3` serves to isolate the fluctuating (`AC`) component from the net +signal, that is, we need a signal that is centred on the line `y = 0`. + +Capacitor `C2` is also a `DC` blocking capacitor and serves to remove any `DC` +offset at the output of the amplifier. + +The role of capacitor `C1` is a bit more involved and requires an understanding +of `AC` circuit analysis, specifically the `AC` signal gain/amplification +$$A_{v}$$ which, for common emitter configuration, is given by: + +$$ A_{v} = \frac{z_{out}}{r'e + R_{e}} $$ + +Here $$z_{out}$$ represents the output impedance of the common-emitter +amplifier which is given by the parallel combination of $$R_{c}$$ and your +load resistance, $$R_{L}$$ (connected to `C2`). + +$$ z_{out} = \frac{R_{c} \cdot R_{L}}{R_{c} + R_{L}} $$ + +From an `AC` perspective: +* Capacitors become short circuits. +* Inductors become open circuits. +* `DC` voltage sources become grounds. + +The term $$r'e$$ is known as the transistor's `AC` base-emitter junction resistance +and is given by: + +$$ r'e = \frac{25mV}{I_{E}} $$ + +The introduction of capacitor `C1` nulls out the term $$R_{e}$$ from the +expression for $$A_{v}$$. This is typically done to achieve higher values +of $$A_{v}$$ than would otherwise be possible if resistor $$R_{e}$$ was +still present. For lower, more controlled values of $$A_{v}$$, resistor +$$R_{e}$$ should not be bypassed by capacitor `C1`. + +The first step in the design of the amplifier is choosing $$R_{c}$$ such that +$$z_{out}$$ isn't affected by changes in $$R_{L}$$. For example, for a +large value of $$R_{L}$$ choose $$R_{c} \ll R_{L}$$. + +For the purposes of our example we assume $$R_{L}$$ = `100` $$k\Omega$$.
We then choose +$$R_{c}$$ = `5` $$k\Omega$$ + +Next we determine the maximum `AC` gain possible given a fixed $$z_{out}$$ : + +$$ A_{v} = \frac{0.7(\frac{V_{CC}}{2})}{0.025} $$ + +It is usually good practice to give `30%` of $$\frac{V_{CC}}{2}$$ to $$R_{e}$$ and `70%` to $$R_{c}$$. Higher +ratios of $$V_{CC}(R_{e})$$ to $$V_{CC}(R_{c})$$ might lead to higher `AC` gain ($$A_{v}$$) but +could sacrifice operational stability as a result. + +Given $$V_{CC}$$ = `5V`, we get $$A_{v}$$ = `70`. This is the highest +expected voltage gain for this amplifier. + +We know that: + +$$ I_{E} \approx I_{C} \approx \frac{0.025 A_{v}}{z_{out}} $$ + +Thus, given $$A_{v}$$ = `70`, $$z_{out}$$ = `5` $$k\Omega$$ we have $$I_{E}$$ = +`0.35mA`. We are now able to calculate $$R_{e}$$ : + +$$ R_{e} = \frac{0.3(\frac{V_{CC}}{2})}{I_{E}} $$ + +For $$V_{CC}$$ = `5V`, $$I_{E}$$ = `0.35mA` we get $$R_{e} \approx$$ `2.1` $$k\Omega$$. + +A useful parameter for common emitter configuration is the `AC` input impedance +(looking in from `C3`) and is given by: + +$$ z_{in} = (\frac{1}{R_{1}} + \frac{1}{R_{2}} + \frac{1}{R_{base}})^{-1} $$ + +Here $$R_{base}$$ represents the AC input impedance of transistor `Q1` +(looking into the base): + +$$ R_{base} = \beta \cdot r'e $$ + +We know how to calculate `r'e` from earlier and we use the minimum value of $$\beta$$ (`75` +for `2N2222`) to calculate $$R_{base}$$ : + +$$ R_{base} = 75 \cdot \frac{25}{0.35} $$ + +Thus $$R_{base}$$ = `5.4` $$k\Omega$$ + +Returning to our `DC` analysis, we calculate the expected voltage at the +transistor base: + +$$ V_{B} = V_{Re} + 0.7 $$ + +We know that $$V_{Re}$$ is `30%` of $$\frac{V_{CC}}{2}$$, which gives $$V_{B}$$ = `1.45V`. 
+Now given $$I_{E}$$ = `0.35mA` we can again use our minimum value for $$\beta$$ to +calculate our required base current: + +$$ I_{B} = \frac{0.35 mA}{75} $$ + +Thus $$I_{B}$$ = `4.67uA` + +At this point we need to ensure that small changes in the value of base current +(which occur due to variations in $$\beta$$) do not significantly affect the `DC` +operating point of the amplifier circuit. + +In order to ensure a stable operating point we 'stiffen' the voltage divider by +ensuring that only a small fraction of the total resistor divider current flows +into the base of transistor `Q1`. + +A good rule of thumb is to allow for `1%` of the total divider current to pass +into the base of the transistor. + +$$ \frac{1}{100} \cdot I_{R_{1}} = 4.67uA $$ + +We can therefore assume that $$I_{R1} \approx I_{R2}$$ and solving the +above expression yields $$I_{R1}$$ = `0.467mA`. Since we know the voltage +across $$R_{2}$$ (given by $$V_{B}$$) we can calculate the resistance +value: + +$$ R_{2} = \frac{1.45}{0.99(0.467 \cdot 10^{-3})} $$ + +This gives $$ R_{2} \approx$$ `3.1` $$k\Omega $$. Finally we calculate the value of +$$R_{1}$$ : + +$$ R_{1} = \frac{5-1.45}{0.467 \cdot 10^{-3}} $$ + +$$ R_{1} \approx $$ `7.6` $$ k\Omega $$ + +The values of capacitors `C3`, `C2` and `C1` are chosen such that the capacitive +reactance (resistance at `AC`) at the desired signal frequency is minimal. + +Capacitive reactance is given by: + +$$ X_{C} = \frac{1}{2\pi fC} $$ + +Now that we have all the required component values, we build the circuit in +`LTSpice`: + + + +A simulation of the `DC` operating point was performed: + +{:width="500px"} + +Here we can see our expected $$V_{base}$$ of around `1.45V` and an emitter +current of around `0.38mA` (instead of `0.35mA`), not too bad!
Let's measure the +voltage gain (with the signal source set to a peak amplitude of `1mV` and a `100K` +$$\Omega$$ load attached): + +{:width="500px"} + +Our output across our load is seen reaching an amplitude of `70mV` and so we have +a voltage gain of `~70`. + +# LTSpice + +You can download `LTSpice` from +[https://www.analog.com/en/design-center/design-tools-and-calculators/ltspice-simulator.html](https://www.analog.com/en/design-center/design-tools-and-calculators/ltspice-simulator.html) + +# Signature + +``` ++---------------------------------------+ +| .-. .-. .-. | +| / \ / \ / \ | +| / \ / \ / \ / | +| \ / \ / \ / | +| "_" "_" "_" | +| | +| _ _ _ _ _ _ ___ ___ _ _ | +| | | | | | | \| | /_\ | _ \ / __| || | | +| | |_| |_| | .` |/ _ \| /_\__ \ __ | | +| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| | +| | +| | +| Lunar RF Labs | +| https://lunar.sh | +| | +| Research Laboratories | +| Copyright (C) 2022-2025 | +| | ++---------------------------------------+ +``` diff --git a/_posts/2020-12-11-mono-dot-net-injection.md b/_posts/2020-12-11-mono-dot-net-injection.md new file mode 100644 index 0000000..bcde97b --- /dev/null +++ b/_posts/2020-12-11-mono-dot-net-injection.md @@ -0,0 +1,330 @@ +--- +layout: post +title: Mono/.NET Injection Under Linux +author: Dylan Müller +--- + +> Learning `mono` or `C#` library injection through a [Robocraft](https://robocraftgame.com/) exploit. The method used in +> this publication can be used to modify a wide range of Unity games. + +1. [Mono Overview](#mono-overview) +2. [Exploiting Robocraft](#exploiting-robocraft) +3. [Source Code](#source-code) + +# Mono Overview + +[Mono](https://en.wikipedia.org/wiki/Mono_%28software%29) is an open source port +of the `.NET` framework which runs on a variety of operating systems (including +Linux). 
+ +The `mono` build chain compiles `C#` source code (`.cs` files) down to `IL` (intermediate +language) spec'd byte code which is then executed by the `CLR` (Common Language +Runtime) layer provided by `mono`. + +Due to the translation down to `IL`, module decompilation as well as +modification/reverse engineering is relatively straightforward and a variety of +`C#` `IL` decompilers/recompilers already exist +([dnSpy](https://github.com/dnSpy/dnSpy/), +[ILSpy](https://github.com/icsharpcode/ILSpy)). + +The focus of this journal is on managed library injection, more specifically the +ability to inject `C#` code of our own and interact with/modify a target host. + +# Exploiting Robocraft + +[Robocraft](https://en.wikipedia.org/wiki/Robocraft) is an online `MMO` game +developed by `freejam` games. It features futuristic robotic battles and is an +example of an application we wish to tamper with. + + + +`Robocraft` uses the [Unity3D](https://unity3d.com/get-unity/download) engine, +which is a high level `C#` component based game engine. + +World entities in `Unity3D` derive from class `UnityEngine::GameObject` and may +have a number of components attached to them such as: rigidbodies, mesh +renderers, scripts, etc. + +`UnityEngine::GameObject` has many useful +[properties](https://docs.unity3d.com/ScriptReference/GameObject.html) such as a +name (string), tag, transform (position), etc. as well as static methods for +finding objects by name, tag, etc. These methods become useful when injecting +our own code as they provide a facility for interfacing with the game engine +from an external context (our `C#` script). + +Browsing the `Robocraft` root directory (installed via Steam) revealed a few +directories that seemed interesting: + + - `Robocraft_Data` + - `lib64` + - `lib32` + - `EasyAntiCheat` + + + +Upon further inspection of the `Robocraft_Data` directory, we find the folders +containing the managed (`C#/mono`) portion of the application.
In particular, the +Managed folder contains the `C#` libraries in `DLL` form of the `Unity3D` Engine as well +as other proprietary modules from the game developer. + + + +However, at this point it's worth noting the presence of the `EasyAntiCheat` folder +in the root game directory which confirms the presence of an `anti-cheat` client. + +After some research I found out a few interesting details about the game's +`anti-cheat` client `EasyAntiCheat`: + + - The client computes hashes of all binary images during startup (including + managed libraries) and cross-references them to prevent modification to game + binaries. + - Uses a heartbeat mechanism to ensure presence of the `anti-cheat` client (To + mitigate `anti-cheat` removal). + - Works with an online service known as `RoboShield` to monitor server side + parameters such as position, velocity, damage, etc and assigns each user + a trust score. The lower the score the higher the chance of getting kicked + from subsequent matches. This score seems to be persistent. + +Nonetheless, nothing seemed to prevent us from injecting our own `C#` library at +runtime and this was the vector employed with `Robocraft`. The advantage of this +method was that no modification to the game binaries would be required and +therefore any client side anti-tamper protection could be bypassed. + +In order to inject our own `C#` code we need to somehow force the client to load +our own `.NET/mono` library at runtime. This may be accomplished by a stager +payload which is essentially a shared library that makes internal calls to +`libmono.so`. + +Some interesting symbols found in `libmono.so` include: + + - `mono_get_root_domain` - get handle to primary domain. + - `mono_thread_attach` - attach to domain. + - `mono_assembly_open` - load assembly. + - `mono_assembly_get_image` - get assembly image. + - `mono_class_from_name` - get handle to class. + - `mono_class_get_method_from_name` - get handle to class method.
+ - `mono_runtime_invoke` - invoke class method. + +The function signatures for these symbols are shown below: + +``` +typedef void* (*mono_thread_attach)(void* domain); +typedef void* (*mono_get_root_domain)(); +typedef void* (*mono_assembly_open)(char* file, void* stat); +typedef void* (*mono_assembly_get_image)(void* assembly); +typedef void* (*mono_class_from_name)(void* image, char* namespacee, char* name); +typedef void* (*mono_class_get_method_from_name)(void* classs, char* name, DWORD param_count); +typedef void* (*mono_runtime_invoke)(void* method, void* instance, void* *params, void* exc); +``` + +In order to perform code injection, firstly a handle to the root application +domain must be retrieved using `mono_get_root_domain`. The primary application +thread must then be bound to the root domain using `mono_thread_attach` and the +assembly image loaded with `mono_assembly_open` and `mono_assembly_get_image`. + +Next the assembly class and class method to execute may be found by name using +`mono_class_from_name` and `mono_class_get_method_from_name`. + +Finally the class method may be executed using `mono_runtime_invoke`. It should +be noted that the class method to execute should be declared as static.
+ +The resulting stager payload is shown below: + +``` +#include <iostream> +#include <link.h> +#include <fstream> + +using namespace std; + +typedef unsigned long DWORD; + +typedef void* (*mono_thread_attach)(void* domain); +typedef void* (*mono_get_root_domain)(); +typedef void* (*mono_assembly_open)(char* file, void* stat); +typedef void* (*mono_assembly_get_image)(void* assembly); +typedef void* (*mono_class_from_name)(void* image, char* namespacee, char* name); +typedef void* (*mono_class_get_method_from_name)(void* classs, char* name, DWORD param_count); +typedef void* (*mono_runtime_invoke)(void* method, void* instance, void* *params, void* exc); + + +mono_get_root_domain do_mono_get_root_domain; +mono_assembly_open do_mono_assembly_open; +mono_assembly_get_image do_mono_assembly_get_image; +mono_class_from_name do_mono_class_from_name; +mono_class_get_method_from_name do_mono_class_get_method_from_name; +mono_runtime_invoke do_mono_runtime_invoke; +mono_thread_attach do_mono_thread_attach; + +int __attribute__((constructor)) init() +{ + void* library = dlopen("./Robocraft_Data/Mono/x86_64/libmono.so", RTLD_NOLOAD | RTLD_NOW); + + do_mono_thread_attach = (mono_thread_attach)(dlsym(library, "mono_thread_attach")); + do_mono_get_root_domain = (mono_get_root_domain)(dlsym(library, "mono_get_root_domain")); + do_mono_assembly_open = (mono_assembly_open)(dlsym(library, "mono_assembly_open")); + do_mono_assembly_get_image = (mono_assembly_get_image)(dlsym(library, "mono_assembly_get_image")); + do_mono_class_from_name = (mono_class_from_name)(dlsym(library, "mono_class_from_name")); + do_mono_class_get_method_from_name = (mono_class_get_method_from_name)(dlsym(library, "mono_class_get_method_from_name")); + do_mono_runtime_invoke = (mono_runtime_invoke)(dlsym(library, "mono_runtime_invoke")); + + + do_mono_thread_attach(do_mono_get_root_domain()); + void* assembly = do_mono_assembly_open("./Robocraft_Data/Managed/Client.dll", NULL); + + void* Image = 
do_mono_assembly_get_image(assembly); + void* MonoClass = do_mono_class_from_name(Image, "Test", "Test"); + void* MonoClassMethod = do_mono_class_get_method_from_name(MonoClass, "Load", 0); + + do_mono_runtime_invoke(MonoClassMethod, NULL, NULL, NULL); + + return 0; +} + +void __attribute__((destructor)) shutdown() +{ + +}; +``` + +The stager payload shown above loads the mono assembly located in +`<root>/Robocraft_Data/Managed/Client.dll` into memory and executes the class +method Load within the `namespace` `Test` and `class` `Test` (`Test::Test::Load`). + +Load has the following signature: `public static void Load()` The stager may +be compiled with: `gcc -fpic -shared stager.cpp -o stager.so`. + +In order to inject the stager into the target process you may use any standard +Linux shared library injector. + +With the capability of loading our own `mono` code into the target process, we +need to ensure that our injected `C#` code stays persistent, i.e to prevent +de-allocation due to garbage collection. + +For `Unity3D` this is typically achieved using the following pattern: + +``` + public class Exploit : MonoBehaviour + {...} + + public static class Test + { + private static GameObject loader; + public static void Load() + { + loader = new GameObject(); + loader.AddComponent<Exploit>(); + UnityEngine.Object.DontDestroyOnLoad(loader); + } + } +``` + +It is also worth keeping track of the `mono/.NET` assembly versions used in the +original application. Ideally you would want to use an identical `.NET` version as +compiling your `C#` exploit with the wrong `.NET` version can cause your exploit to +fail. + +For `Robocraft` `.NET` `v2.0` was required. Finding support for an older version of +`.NET` can be difficult as most modern `C#` `IDE's` do not support such an old target. +A simple solution to this problem is to download an older version of `mono`. + +At this point the second stage payload (our `C#` exploit) can be developed. 
I +chose to implement three simple functionalities: + + - Increase/decrease game speed. + +``` +if(Input.GetKeyDown(KeyCode.F2)){ + speedhack = !speedhack; + if(speedhack == true){ + Time.timeScale = 3; + }else{ + Time.timeScale = 1; + } + } +``` + + - Clip through walls/obstacles. + +``` +if(Input.GetKeyDown(KeyCode.F3)){ + collision = !collision; + GameObject obj = GameObject.Find("Player Machine Root"); + Rigidbody rb = obj.GetComponent<Rigidbody>(); + if(collision == true){ + rb.detectCollisions = false; + }else{ + rb.detectCollisions = true; + } + } +``` + + + + +- Place all network entities near player. + +``` +if(Input.GetKeyDown(KeyCode.F1)) +{ + salt = !salt; + GameObject obj = GameObject.Find("Player Machine Root"); + position = obj.transform.position; + + foreach(GameObject gameObj in GameObject.FindObjectsOfType<GameObject>()) + { + if(gameObj.name == "centerGameObject") + { + GameObject parent = gameObj.transform.parent.gameObject; + if(parent.name != "Player Machine Root"){ + MonoBehaviour[] comp = parent.GetComponents<MonoBehaviour>(); + foreach (MonoBehaviour c in comp){ + c.enabled = !salt; + + } + Vector3 myposition = position; + parent.transform.position = myposition; + + } + + } + } +} +``` + + + +In order to find the names of the game objects for the main player as well as +network players you can simply iterate through all the global game objects and +dump the corresponding names to a text file. + +# Source Code + +All source code for this `journal` is hosted at +[https://github.com/lunar-rf/robocraft](https://github.com/lunar-rf/robocraft) + +# Signature + +``` ++---------------------------------------+ +| .-. .-. .-.
| +| / \ / \ / \ | +| / \ / \ / \ / | +| \ / \ / \ / | +| "_" "_" "_" | +| | +| _ _ _ _ _ _ ___ ___ _ _ | +| | | | | | | \| | /_\ | _ \ / __| || | | +| | |_| |_| | .` |/ _ \| /_\__ \ __ | | +| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| | +| | +| | +| Lunar RF Labs | +| https://lunar.sh | +| | +| Research Laboratories | +| Copyright (C) 2022-2025 | +| | ++---------------------------------------+ +``` + diff --git a/_posts/2022-12-31-linux-detours.md b/_posts/2022-12-31-linux-detours.md new file mode 100644 index 0000000..062c6c9 --- /dev/null +++ b/_posts/2022-12-31-linux-detours.md @@ -0,0 +1,601 @@ +--- +layout: post +title: A Tiny C (x86_64) Function Hooking Library +author: Dylan Müller +--- + +> Function detouring is a powerful hooking technique that allows for the +> interception of `C/C++` functions. `cdl86` aims to be a tiny `C` detours +> library for `x86_64` binaries. + +1. [Overview](#overview) +2. [JMP Patching](#jmp-patching) +3. [INT3 Patching](#int3-patching) +4. [Code Injection](#code-injection) +5. [API](#api) +6. [Source Code](#source-code) + +# Overview + +Note: This article covers the Linux-specific details of the library. Windows +support has since been added. + +See: +[https://github.com/lunar-rf/cdl86](https://github.com/lunar-rf/cdl86) + +[Microsoft Research](https://en.wikipedia.org/wiki/Microsoft_Research) currently +maintains a library known as [MS Detours](https://github.com/microsoft/Detours). +It allows for the interception of Windows `API` calls within the memory address +space of a process. + +This might be useful in certain situations such as if you are writing a `D3D9` +(`DirectX`) hook and you need to intercept certain graphics routines. This is +commonly done for `ESP` and wallhacks where the `Z-buffer` needs to be +disabled for certain character models, for `D3D9` this might involve hooking +`DrawIndexedPrimitive`.
+ + +``` +HRESULT WINAPI hkDrawIndexedPrimitive(LPDIRECT3DDEVICE9 pDevice, ...args) +{ + // Check player model strides, primitive count, etc + ... + pDevice->SetRenderState(D3DRS_ZENABLE, false); + ... + // Call original function and return + oDrawIndexedPrimitive(...) + return ... +} +``` + +In order to disable the `Z-buffer` in this example we need access to a valid +`LPDIRECT3DDEVICE9` context within the running process. This is where detours +comes in handy. Generally, the procedure to hook a specific function is as +follows: + +- Declare a function pointer with target function signature: + +``` +typedef HRESULT (WINAPI* tDrawIndexedPrimitive)(LPDIRECT3DDEVICE9 pDevice, ...args); +``` + +- Define detour function with same function signature: + +``` +HRESULT WINAPI hkDrawIndexedPrimitive(LPDIRECT3DDEVICE9 pDevice, ...args) +``` + +- Assign the function pointer the target function's address in memory. In this + case a `VTable` entry. + +``` +#define DIP 0x55 +tDrawIndexedPrimitive oDrawIndexedPrimitive = (tDrawIndexedPrimitive)SomeVTable[DIP]; +``` + +- Call DetourFunction: + +``` +DetourFunction((void**)&oDrawIndexedPrimitive, &hkDrawIndexedPrimitive) +``` + +`DetourFunction` then uses the `oDrawIndexedPrimitive` function pointer and +modifies the instructions at the target function in order to transfer control +flow to the detour function. + +At this point any calls to `DrawIndexedPrimitive` within the `LPDIRECT3DDEVICE9` +class will be rerouted to `hkDrawIndexedPrimitive`. You can see that this is a +very powerful concept and gives us access to the arguments passed to the target function. As +demonstrated, it is possible to hook both `C` and `C++` functions. + +The difference generally is that the first argument to a `C++` function is a +hidden `this` pointer. Therefore you can define a `C++` detour in `C` with this +extra argument. + +Detours is great, but it is only available for Windows.
The aim of the `cdl86` +project is to create a simple, compact detours library for `x86_64` Linux. What +follows is a brief explanation on how the library was designed. + +# Detour methods + +Two different approaches to method detouring were investigated and implemented +in the `cdl86` `C` library. First let's have a look at a typical function call for a +simple `C` program. We will be using `GDB` to inspect the resulting disassembly. + +``` +#include <stdio.h> + +int add(int x, int y) +{ + return x + y; +} +int main() +{ + printf("%i", add(1,1)); + return 0; +} +``` + +Compile with: +``` +gcc main.c -o main +``` + +and then debug with `GDB`: + +``` +gdb main +``` + +To list all the functions in the binary, supply `info functions` to the `gdb` +command prompt. + +``` +0x0000000000001100 __do_global_dtors_aux +0x0000000000001140 frame_dummy +0x0000000000001149 add +0x0000000000001161 main +0x00000000000011a0 __libc_csu_init +0x0000000000001210 __libc_csu_fini +0x0000000000001218 _fini +``` + +Let's disassemble the main function with `disas /r main`: + +``` +Dump of assembler code for function main: + 0x0000000000001161 <+0>: f3 0f 1e fa endbr64 + 0x0000000000001165 <+4>: 55 push %rbp + 0x0000000000001166 <+5>: 48 89 e5 mov %rsp,%rbp + 0x0000000000001169 <+8>: be 01 00 00 00 mov $0x1,%esi + 0x000000000000116e <+13>: bf 01 00 00 00 mov $0x1,%edi + 0x0000000000001173 <+18>: e8 d1 ff ff ff callq 0x1149 <add> + 0x0000000000001178 <+23>: 89 c6 mov %eax,%esi +``` + +`callq` has one operand which is the address of the function being called. It +pushes the current value of `%rip` (next instruction after call) onto the stack +and then transfers control flow to the target function. + +You may have also noticed the presence of the `endbr64` instruction. 
This +instruction is specific to Intel processors and is part of [Intel's Control-Flow +Enforcement Technology +(CET)](https://software.intel.com/content/www/us/en/develop/articles/technical-look-control-flow-enforcement-technology.html). +`CET` is designed to provide hardware protection against `ROP` (Return-oriented +Programming) and similar methods which manipulate control flow using *existing* +byte code. + +Its two main features are: + +* A shadow stack for tracking return addresses. +* Indirect branch tracking, which `endbr64` is a part of. + +`Intel CET` however does not prevent us from modifying control flow **directly** +by inserting instructions into memory. + +# JMP Patching + +The first method of function detouring we will explore involves inserting a `JMP` +instruction at the beginning of the target function to transfer control over to +the detour function. It should be noted that in order to preserve the stack we +need to use a `JMP` (specifically `jmpq`) instruction rather than a `CALL`. + +Since `jmpq` cannot encode a `64-bit` absolute address as an immediate operand we will +have to first store the address we want to jump to into a register. We need to +choose a register that is not used to pass arguments under the default calling +convention (`System V AMD64` on `x86_64` Linux). `%rax` is not used to pass regular +arguments (only `%al` is read, and only by variadic functions) and so for simplicity we use this register in our +design. + +The following is a disassembly of the instructions required for a `JMP` to a +`64-bit` immediate address: + +``` +0x0000555555561389 <+0>: 48 b8 b1 13 56 55 55 55 00 00 movabs $0x5555555613b1,%rax +0x0000555555561393 <+10>: ff e0 jmpq *%rax +``` + +You can see that `12` bytes are required to encode the `movabs` instruction (which +moves the detour address into `%rax`) as well as the `jmpq` instruction. +Immediate values are stored in little endian (LE) encoding.
+ +So we can therefore conclude that we need to patch **at least** `12` bytes in +memory at the location of our target function. These `12` bytes however are +important and we cannot simply discard them. It turns out that we actually place +these bytes at the start of what I will call a 'trampoline function'. Its +layout is as follows: + +``` +trampoline <0x23215412>: + (original instruction bytes which were patched) + JMP (target + JMP patch length) +``` + +Simply put, the trampoline function behaves as the original, unpatched function. +As shown above it consists of the target function's original instruction bytes +followed by a jump back into the target function, offset by the `JMP` patch length. + +The trampoline generation code for `cdl86` is shown below: + +``` +uint8_t *cdl_gen_trampoline(uint8_t *target, uint8_t *bytes_orig, int size) +{ + uint8_t *trampoline; + int prot = 0x0; + int flags = 0x0; + + /* New function should have read, write and + * execute permissions. + */ + prot = PROT_READ | PROT_WRITE | PROT_EXEC; + flags = MAP_PRIVATE | MAP_ANONYMOUS; + + /* We use mmap to allocate trampoline memory pool. */ + trampoline = mmap(NULL, size + BYTES_JMP_PATCH, prot, flags, -1, 0); + memcpy(trampoline, bytes_orig, size); + /* Generate jump to address just after call + * to detour in trampoline. */ + cdl_gen_jmpq_rax(trampoline + size, target + size); + + return trampoline; +} +``` + +You can see that the allocation of the trampoline function occurs through a call +to `mmap` with the `PROT_READ | PROT_WRITE | PROT_EXEC` memory protection flags. + +Therefore it should also be noted that the correct memory permissions should be +set for both the target function before modification as well as the trampoline +function, after allocation. Here is a snippet from the `cdl86` library for +setting memory attributes: + +``` +/* Set R/W memory protections for code page.
*/ +int cdl_set_page_protect(uint8_t *code) +{ + int perms = 0x0; + int ret = 0x0; + + /* Read, write and execute perms. */ + perms = PROT_EXEC | PROT_READ | PROT_WRITE; + /* Calculate page size */ + uintptr_t page_size = sysconf(_SC_PAGE_SIZE); + ret = mprotect(code - ((uintptr_t)(code) % page_size), page_size, perms); + + return ret; +} +``` + +The general procedure to place the `JMP` hook is as follows: + +1. Determine the minimum number of bytes required for a `JMP` patch. +2. Create trampoline function. +3. Set memory permissions (read, write, execute). +4. Generate `JMP` to detour at target function. +5. Fill unused bytes with `NOP`. +6. Assign trampoline address to target function pointer. + +Let's have a look at all of this in action using `GDB`. I will be using the +[basic_jmp.c](https://github.com/lunar-rf/cdl86/blob/master/tests/basic_jmp.c) +test case in the `cdl86` library. The source code for this test case is shown +below: + +``` +#include "cdl.h" + +typedef int add_t(int x, int y); +add_t *addo = NULL; + +int add(int x, int y) +{ + printf("Inside original function\n"); + return x + y; +} + +int add_detour(int x, int y) +{ + printf("Inside detour function\n"); + return addo(5,5); +} + +int main() +{ + struct cdl_jmp_patch jmp_patch = {}; + addo = (add_t*)add; + + printf("Before attach: \n"); + printf("add(1,1) = %i\n\n", add(1,1)); + + jmp_patch = cdl_jmp_attach((void**)&addo, add_detour); + if(jmp_patch.active) + { + printf("After attach: \n"); + printf("add(1,1) = %i\n\n", add(1,1)); + printf("== DEBUG INFO ==\n"); + cdl_jmp_dbg(&jmp_patch); + } + + cdl_jmp_detach(&jmp_patch); + printf("\nAfter detach: \n"); + printf("add(1,1) = %i\n\n", add(1,1)); + + return 0; +} +``` + +We compile the following source file with (modified from makefile): + +``` +gcc -I../ -g basic_jmp.c ../cdl.c ../lib/libudis86/*.c -g -o basic_jmp +``` + +Then load into `GDB` using: + +``` +gdb basic_jmp +``` + +Once `GDB` has loaded, we insert a breakpoints at lines `24` and 
`27` using the +command: + +``` +break 24 +break 27 +``` + +We start execution of the program with: + +``` +run +``` + +`GDB` will then inform you that the first breakpoint has been triggered. For this +first breakpoint we are interested in the `add()` function's assembly before the +hook has taken place. To inspect this assembly, provide: + +``` +disas /r add +``` +``` +Dump of assembler code for function add: + 0x0000555555561389 <+0>: f3 0f 1e fa endbr64 + 0x000055555556138d <+4>: 55 push %rbp + 0x000055555556138e <+5>: 48 89 e5 mov %rsp,%rbp + 0x0000555555561391 <+8>: 48 83 ec 10 sub $0x10,%rsp + 0x0000555555561395 <+12>: 89 7d fc mov %edi,-0x4(%rbp) +``` + +This is the disassembly of the unaltered target function. `12` bytes for the `JMP` +patch will have to be written at this address. Therefore the first `4` +instructions will need to be written to the trampoline function followed by a +`JMP` to address `0x0000555555561395` and that's all we need for the trampoline! + +Now the fun part! Let's continue execution to the next breakpoint, where our +`JMP` hook will be placed. + +``` +continue +``` + +Let's examine the disassembly of our `add()` function once again: + +``` +Dump of assembler code for function add: + 0x0000555555561389 <+0>: 48 b8 b1 13 56 55 55 55 00 00 movabs $0x5555555613b1,%rax + 0x0000555555561393 <+10>: ff e0 jmpq *%rax + 0x0000555555561395 <+12>: 89 7d fc mov %edi,-0x4(%rbp) + 0x0000555555561398 <+15>: 89 75 f8 mov %esi,-0x8(%rbp) +``` + +`0x5555555613b1` is the address of our detour/intercept function. 
Let's examine +the disassembly of our detour function: + +``` +disas /r 0x5555555613b1 +``` + +``` +Dump of assembler code for function add_detour: + 0x00005555555613b1 <+0>: f3 0f 1e fa endbr64 + 0x00005555555613b5 <+4>: 55 push %rbp + 0x00005555555613b6 <+5>: 48 89 e5 mov %rsp,%rbp + 0x00005555555613b9 <+8>: 48 83 ec 10 sub $0x10,%rsp + 0x00005555555613bd <+12>: 89 7d fc mov %edi,-0x4(%rbp) + 0x00005555555613c0 <+15>: 89 75 f8 mov %esi,-0x8(%rbp) + 0x00005555555613c3 <+18>: 48 8d 3d 53 5c 00 00 lea 0x5c53(%rip),%rdi + 0x00005555555613ca <+25>: e8 b1 fd ff ff callq 0x555555561180 <puts@plt> + 0x00005555555613cf <+30>: 48 8b 05 ba bc 01 00 mov 0x1bcba(%rip),%rax + 0x00005555555613d6 <+37>: be 05 00 00 00 mov $0x5,%esi + 0x00005555555613db <+42>: bf 05 00 00 00 mov $0x5,%edi + 0x00005555555613e0 <+47>: ff d0 callq *%rax + 0x00005555555613e2 <+49>: c9 leaveq + 0x00005555555613e3 <+50>: c3 retq +``` + +We can see that the call to our trampoline function goes through the address stored +in the `QWORD` (our function pointer) at address `0x55555557d090`. +Let's dereference it: + +``` +print /x *(long unsigned int*)(0x55555557d090) +``` +``` +$20 = 0x7ffff7ffb000 +``` + +So the function pointer is pointing to address `0x7ffff7ffb000` which is our +trampoline function. Let's disassemble it: + +``` +x/10i 0x7ffff7ffb000 +``` + +``` + 0x7ffff7ffb000: endbr64 + 0x7ffff7ffb004: push %rbp + 0x7ffff7ffb005: mov %rsp,%rbp + 0x7ffff7ffb008: sub $0x10,%rsp + 0x7ffff7ffb00c: movabs $0x555555561395,%rax + 0x7ffff7ffb016: jmpq *%rax + 0x7ffff7ffb018: add %al,(%rax) + 0x7ffff7ffb01a: add %al,(%rax) + 0x7ffff7ffb01c: add %al,(%rax) + 0x7ffff7ffb01e: add %al,(%rax) +``` + +You can see that our trampoline contains the first `4` instructions that were +replaced when the `JMP` patch was placed in our target function. These are followed by a +`jmpq` back to address `0x555555561395` which was disassembled earlier. This should +give you an idea of how the control flow modification is achieved.
+ +# INT3 Patching + +There is another method of function detouring which involves placing `INT3` +breakpoints at the start of the target function in memory. `INT3` breakpoints +are encoded with the `0xCC` opcode: + +``` +/* Generate int3 instruction. */ +uint8_t *cdl_gen_swbp(uint8_t *code) +{ + *(code + 0x0) = 0xCC; + return code; +} +``` + +So rather than placing a `JMP` patch to the detour we simply write the byte +`0xCC` to the target function being careful to `NOP` the unused bytes. Once the +`RIP` register reaches an address of an `INT3` breakpoint the Linux kernel sends +a `SIGTRAP` signal to the process. + +We can register our own signal handler but we need some additional info on the +signal such as context information. A context is the state of a program's +registers and stack. We need this info to compare the breakpoints `RIP` value to +any active global software breakpoints. + +This is how the signal handler is registered in `cdl86`: + +``` + struct sigaction sa = {}; + + /* Initialise cdl signal handler. */ + if (!cdl_swbp_init) + { + /* Request signal context info which + * is required for RIP register comparison. + */ + sa.sa_flags = SA_SIGINFO | SA_ONESHOT; + sa.sa_sigaction = (void *)cdl_swbp_handler; + sigaction(SIGTRAP, &sa, NULL); + cdl_swbp_init = true; + } + ... +``` + +Note the use of `SA_SIGINFO` to get context information. The software breakpoint +handler is then defined as follows: + +``` +void cdl_swbp_handler(int sig, siginfo_t *info, struct ucontext_t *context) +{ + int i = 0x0; + bool active = false; + uint8_t *bp_addr = NULL; + + /* RIP register point to instruction after the + * int3 breakpoint so we subtract 0x1. + */ + bp_addr = (uint8_t *)(context->uc_mcontext.gregs[REG_RIP] - 0x1); + + /* Iterate over all breakpoint structs. */ + for (i = 0; i < cdl_swbp_size; i++) + { + active = cdl_swbp_hk[i].active; + /* Compare breakpoint addresses. */ + if (bp_addr == cdl_swbp_hk[i].bp_addr) + { + /* Update RIP and reset context. 
*/ + context->uc_mcontext.gregs[REG_RIP] = (greg_t)cdl_swbp_hk[i].detour; + setcontext(context); + } + } +} +``` + +Note that if a match of the `RIP` value to any known breakpoints occurs the `RIP` +value for the current context is updated and the new context applied using +`setcontext()`. A trampoline function similar to our `JMP` patch is allocated +and serves the same purpose. + +# Code Injection + +`cdl86` assumes that you are operating in the address space of the target +process. Therefore code injection is often required in practice and requires the +use of an +[injector](https://github.com/lunar-rf/robocraft/tree/main/injector). + +Once a shared library (`.so`) has been injected you can use the following code +to get the base address of the main executable module: + +``` +#include <link.h> +#include <inttypes.h> + +int __attribute__((constructor)) init() +{ +... + struct link_map *lm = dlopen(0, RTLD_NOW); + printf("base = %" PRIx64 , lm->l_addr); +... + +} +``` + +Or find the address of a function by symbol name: + +``` +void* dl_handle = dlopen(NULL, RTLD_LAZY); +void* add_ptr = dlsym(dl_handle, "add"); +``` + +# API +The API for the `cdl86` library is shown below: + +``` +struct cdl_jmp_patch cdl_jmp_attach(void **target, void *detour); +struct cdl_swbp_patch cdl_swbp_attach(void **target, void *detour); +void cdl_jmp_detach(struct cdl_jmp_patch *jmp_patch); +void cdl_swbp_detach(struct cdl_swbp_patch *swbp_patch); +void cdl_jmp_dbg(struct cdl_jmp_patch *jmp_patch); +void cdl_swbp_dbg(struct cdl_swbp_patch *swbp_patch); +``` + +# Source code +You can find the `cdl86` source code +[here](https://github.com/lunar-rf/cdl86).<br> + +# Signature + +``` ++---------------------------------------+ +| .-. .-. .-. 
| +| / \ / \ / \ | +| / \ / \ / \ / | +| \ / \ / \ / | +| "_" "_" "_" | +| | +| _ _ _ _ _ _ ___ ___ _ _ | +| | | | | | | \| | /_\ | _ \ / __| || | | +| | |_| |_| | .` |/ _ \| /_\__ \ __ | | +| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| | +| | +| | +| Lunar RF Labs | +| https://lunar.sh | +| | +| Research Laboratories | +| Copyright (C) 2022-2025 | +| | ++---------------------------------------+ +``` + diff --git a/_posts/2023-09-17-gsctool.md b/_posts/2023-09-17-gsctool.md new file mode 100644 index 0000000..ca0bce0 --- /dev/null +++ b/_posts/2023-09-17-gsctool.md @@ -0,0 +1,537 @@ +--- +layout: post +title: A simple GSC loader for COD Black Ops 1 +author: Dylan Müller +--- + +> Learn how `GSC` scripts are loaded into [Black Ops 1](https://en.wikipedia.org/wiki/Call_of_Duty:_Black_Ops) and how to write your own +> simple `GSC` loader in `C` for the Microsoft Windows version of the game. + +1. [Fond Memories](#fond-memories) +2. [Game Script Code](#game-script-code) +3. [GSC Script Call Chain](#gsc-script-call-chain) +4. [Format of raw GSC scripts](#format-of-raw-gsc-scripts) +5. [Loading and executing GSC script assets](#loading-and-executing-gsc-script-assets) +6. [Black Ops Offsets](#black-ops-offsets) +7. [Source Code](#source-code) +8. [Demo](#demo) + +# Fond Memories + + +Call of Duty: Black Ops 1 + +`Call of Duty: Black Ops 1` for Microsoft Windows was released all the way back +in 2010! At the time I was still a teenager in high-school and a dedicated +gamer. + +It seemed like just another purchase on Steam back then but I would never have +expected a game to produce so many fond memories playing Kino der Toten zombies +with friends from all over the world. + +I mean who doesn't remember all the iconic glitches, mystery boxes, etc. of the +game? Some say the `Black Ops 1` `zombies` was the best in the `Black Ops` series. +There is something about the game aesthetics that creates a rush of nostalgia +:video_game:. + +> Ok enough rambling, let's get technical...
+ +# Game Script Code + +`GSC` is an internal scripting language that is used to script missions and game +modes for `Black Ops 1`. It has a syntax similar to `C++` but is a very limited +language. Here is an example: + +``` +#include common_scripts\utility; +#include maps\_utility; + +init() +{ + level thread on_player_connect(); + + // TODO: put weapon init's here + maps\_flashgrenades::main(); + maps\_weaponobjects::init(); + maps\_explosive_bolt::init(); + maps\_flamethrower_plight::init(); + maps\_ballistic_knife::init(); +} + +on_player_connect() +{ + while( true ) + { + level waittill("connecting", player); + + player.usedWeapons = false; + player.hits = 0; + + player thread on_player_spawned(); + } +} +``` + +`GSC` scripts are only loaded and executed when a map or game has started and in +`Black Ops 1` they are reloaded every time a map or game mode is restarted. Writing +custom `GSC` scripts is thus the basis for the vast majority of `Black Ops 1` +modding efforts. + +`GSC` scripts are first loaded into memory and then compiled with an internal `GSC` +compiler. There is a public `GSC` compiler and decompiler on `GitHub`: +[gsc-tool](https://github.com/xensik/gsc-tool) but support for `Black Ops 1 (T5)` +seems to be missing. + +Luckily it turns out we don't actually need an external `GSC` compiler to compile +our own `GSC` scripts. We can (in theory) simply call the `compile()` function of +the internal `GSC` compiler and pass our mod script. + +Whenever we talk about hooking a function or calling an in-game function we +usually assume that static analysis has been performed on the game binary to +find functions and offsets of interest. `IDA Pro` is my disassembler of choice +but `Ghidra` is probably also worth mentioning. + +For debugging, `Cheat Engine` usually does the job quite well. However debugging +`Black Ops 1` is not as simple as attaching to the process and setting +breakpoints. 
+ +As far as I know this was possible in the official release version of the game, +but the game received multiple updates/patches which introduced various +primitive `anti-debugging` and anti cheat detections which must be bypassed. + +Usually if I am starting reverse engineering on a game, my first quick option is +[scyllahide](https://github.com/nihilus/ScyllaHide). This seemed to work +for the latest `Black Ops 1` patch which allowed me to set breakpoints in game +which aided in the search for various `Black Ops 1` function addresses. + +As we know research is an important part of any reverse engineering task, +without adequate research you could be wasting hours, days or even months of +your time by doing the same work that others have done before. + +So I started research and found out that the source code for the `CoD 4` server +was leaked at some point and can be found in the following repo: +[cod4x_server](https://github.com/callofduty4x/CoD4x_Server) + +This is very valuable information and was the basis for understanding how `GSC` +scripts are loaded and compiled in game. + +# GSC Script Call Chain + +The next task was figuring out the call chain (sequence of function calls) which +is required for loading custom `GSC` scripts. So I decided to search for the +keyword `script` and stumbled upon a function called `Scr_LoadScript` +[here](https://github.com/lunarbin/CoD4x_Server/blob/14f0d8a205d80c9b30e308b57f0cd9bd6d7cdb1c/src/scr_vm_main.c#L1018) +which seemed to be responsible for loading a `GSC` or `CSC` script (`CSC` scripts are +almost identical to `GSC` scripts but are executed by the client instead of a +server). + +The only argument to `Scr_LoadScript` was a const character string however: + +``` +unsigned int Scr_LoadScript(const char* scriptname) +``` + +which I assumed was some type of resource/asset string. 
I therefore proceeded to +look for any functions that would load assets and stumbled upon `DB_AddXAsset` +[here](https://github.com/lunarbin/CoD4x_Server/blob/14f0d8a205d80c9b30e308b57f0cd9bd6d7cdb1c/src/bspc/db_miniload.cpp#L2919). + +The function signature for `DB_AddXAsset` was as follows: + +``` +XAssetHeader __cdecl DB_AddXAsset(XAssetType type, XAssetHeader header) +``` + +I decided to search for the definition of `XAssetHeader`. `XAssetHeader` was +defined as a union of pointers to various +[structs](https://github.com/lunarbin/CoD4x_Server/blob/14f0d8a205d80c9b30e308b57f0cd9bd6d7cdb1c/src/xassets.h#L81): + +``` +union XAssetHeader +{ + struct XModelPieces *xmodelPieces; + struct PhysPreset *physPreset; + struct XAnimParts *parts; + ... + struct FxImpactTable *impactFx; + struct RawFile *rawfile; + struct StringTable *stringTable; + void *data; +}; +``` + +Out of all the structs listed `RawFile` seemed like the most interesting for our +purposes (loading `GSC` script assets). Let's have a look at the `RawFile` +structure +[itself](https://github.com/lunarbin/CoD4x_Server/blob/14f0d8a205d80c9b30e308b57f0cd9bd6d7cdb1c/src/xassets/rawfile.h#L5): + +``` +struct RawFile +{ + const char *name; + int len; + const char *buffer; +}; +``` + +Seems simple enough: +* `name` is the resource/asset identifier. +* `len` is the total buffer length. +* `buffer` is the buffer holding our script to compile. + +One thing worth noting is that we don't actually know the format that scripts +are stored in before they are compiled (i.e. the format of `buffer`). For example, +are they plaintext, compressed or encrypted at all? We will discuss this further +on. + +Let us now for a brief moment turn our attention to the following code block, +which uses the `type` and `header` arguments of `DB_AddXAsset`: + +``` +XAssetEntry newEntry; + +newEntry.asset.type = type; +newEntry.asset.header = header; + +existingEntry = DB_LinkXAssetEntry(&newEntry, 0); +``` + +This seems to be doing the actual asset allocation. 
The first argument is of type +`XAssetEntry` which is defined as: + +``` +struct XAssetEntry +{ + struct XAsset asset; + byte zoneIndex; + bool inuse; + uint16_t nextHash; + uint16_t nextOverride; + uint16_t usageFrame; +}; +``` + +The first element of this struct `asset` looks interesting: + +``` +struct XAsset +{ + enum XAssetType type; + union XAssetHeader header; +}; +``` + +We know what `XAssetHeader` is and `XAssetType` is a simple enum which is used +to indicate the asset type: + +``` +enum XAssetType +{ + ASSET_TYPE_XMODELPIECES, + ASSET_TYPE_PHYSPRESET, + ASSET_TYPE_XANIMPARTS, + ... + ASSET_TYPE_RAWFILE +} +``` + +`ASSET_TYPE_RAWFILE` seems like the right enum value in our case. With all this +information we now know what is required to load a script asset: + +1. Allocate `RawFile` struct with script name, size and data buffer. +2. Initialise `XAssetHeader` union with pointer to our `Rawfile` struct. +3. Allocate `XAssetEntry` struct. +4. Set `asset.type` to `ASSET_TYPE_RAWFILE` (`XAssetEntry`). +5. Set `asset.header` to `XAssetHeader` allocated earlier (`XAssetEntry`). +6. Call `DB_LinkXAssetEntry` to 'link' the asset. + +After this series of steps it should be possible to call `Scr_LoadScript` with +the path of our asset. This would be the same as the `name` member of the +`RawFile` struct. + +# Format of raw GSC scripts + +We now have most of the required information to load `GSC` script assets into +memory but one question remains, what format are they in? + +To answer this question I thought a good starting point would be to hook/detour +`DB_LinkXAssetEntry` and dump the `RawFile` data buffer for a `GSC` script to a +file for analysis. + +To detour functions in the game we need to bypass an annoying `anti-debug` +feature introduced into the game that uses `GetTickCount` to analyse a thread's +timing behaviour. 
+ +Using Cheat Engine I managed to trace the evil function to `0x004C06E0`: + +```c +#define T5_Thread_Timer 0x004C06E0 +``` + + +Anti-debug timer view in `IDA`. + +So make sure you patch this out, otherwise the game will simply close some +time after detouring. + +We would have to hook the function before loading a solo zombie match as script +assets are loaded into memory shortly after starting a game mode. I wrote a +simple detouring library called [cdl86](https://github.com/lunarbin/cdl86) +that can `hook/detour` functions either by inserting a `JMP` opcode at the target +function's address to a detour function or through a software breakpoint. + +I will leave this task up to the reader to perform. What you need is an address +and I took the liberty of finding the address of `DB_LinkXAssetEntry` for you in +`IDA Pro`. If you are wondering how I found the address, it was a combination +of research, debugging and static analysis. + +A lot comes down to recognising patterns in the disassembly. In general, as you +find the addresses of functions it gradually becomes easier to find the addresses of +other related functions. + +Bear in mind this is the address for the last `x86` (`32-bit`) `PC` version of the +game: + +```c +#define T5_DB_LinkXAssetEntry 0x007A2F10 +``` + +Once you have a `GSC` script dump, open it in your favorite hex editor (I used +`HxD`). This is a dump of a small `GSC` script from memory: + + + +We can immediately see that the script is not stored in simple plaintext. In +this case I like to look for `magic` bytes which may indicate if the file was +compressed. `zlib` compression is one of the most common compression types for +binary files. + +Based on research I did, `zlib` uses the following magic bytes: + +``` +78 01 - No Compression/low +78 9C - Default Compression +78 DA - Best Compression +``` + +These bytes should appear not too far from the start of the file. 
Notice that in +the dump these bytes appear at offset `0x8` and `0x9` so there is a good chance +this file is `zlib` compressed. Let's test this theory. Here is a simple `zlib` +`inflate` Python script: + +``` +import sys +import zlib + +# Usage: inflate.py <dump file> <output file> +with open(sys.argv[1], 'rb') as dump: +    compressed_data = dump.read() + +# Skip the 8 bytes that precede the zlib stream. +unzipped = zlib.decompress(compressed_data[8:]) + +with open(sys.argv[2], 'wb') as result: +    result.write(unzipped) +``` + +I get the following text: + +``` +// THIS FILE IS AUTOGENERATED, DO NOT MODIFY +main() +{ + self setModel("c_jap_takeo_body"); + self.voice = "vietnamese"; + self.skeleton = "base"; +} + +precache() +{ + precacheModel("c_jap_takeo_body"); +} + +``` +This text has a size of `0xC6`. + +Great and `+1` for the data being stored in plaintext! + +So what are the first `8` bytes used for? + +If you have a sharp eye you might have noticed that these `8` bytes actually +contain `2` values. + +The first `4` bytes seem to represent our `inflated` (decompressed) data size of +`0xC6` and the second value seems to represent the `size of the file - 8 bytes` +or the size of our compressed data which is `0x9A`. + +These values seem to be stored in `little-endian` format, i.e. each `4` byte integer +is stored least significant byte (`LSB`) first. + +Great, we now have enough information to load a plaintext `GSC` script, compress it +and load it as an asset using `DB_LinkXAssetEntry`. It's pretty cool that it is +this straightforward to decompress `GSC` scripts as it gives insight into how the +game modes/logic were implemented and plenty of ideas for mods of course! + +So in theory we could inject a `DLL`, read our `GSC` script, compress it, link it +with `DB_LinkXAssetEntry` and then call `Scr_LoadScript` right? We will discuss +this in the next section. + +# Loading and executing GSC script assets + +It turns out that we cannot simply call `Scr_LoadScript` when we like. 
As +mentioned above `CoD` `Black Ops` will load all script assets using +`DB_LinkXAssetEntry` and then compile them using `Scr_LoadScript` while a map is +loading. + +Therefore we have to load our `GSC` script at the correct time, otherwise it won't +work. The strategy is therefore to hook `Scr_LoadScript` and then load our +custom `GSC` script asset when `Scr_LoadScript` is loading one of the last `GSC` +scripts for the current map. + +It turns out that the name of one of the last `GSC` scripts contains the current map name. + +To get the value of the current map we can utilize the function `Dvar_FindVar` +which I found to be at address `0x0057FF80`: + +``` +#define T5_Dvar_FindVar 0x0057FF80 +``` + +Its function prototype is defined as follows: +``` +typedef uint8_t* (__cdecl* Dvar_FindVar_t)( + uint8_t* variable +); +``` + +The variable in question is `mapname`, which holds the name of the current map. It +should be noted, however, that `Scr_LoadScript` does not execute our `GSC` script. +Instead we defer execution of the main function in our `GSC` script until the map +has finished loading. + +To this end I found a convenient function that is called when the game is just +about to start at address `0x004B7F80`: + +``` +#define T5_Scr_LoadGameType 0x004B7F80 +``` + +with the following signature: +``` +typedef void (__cdecl* Scr_LoadGameType_t)( + void +); +``` + +To defer execution, however, we need to obtain the function handle after the `GSC` +script has been loaded in our hooked `Scr_LoadScript`. In this case we utilize a +function called `Scr_GetFunctionHandle` which returns the address of a function +which has been loaded into the `GSC` execution environment or `VM`. + +It has the following function signature: + +``` +typedef int32_t (__cdecl* Scr_GetFunctionHandle_t)( + int32_t scriptInstance, + const uint8_t* scriptName, + const uint8_t* functioName +); +``` + +`scriptInstance` is a variable which determines whether the script is a `GSC` or +`CSC` type script. 
A value of `0` indicates a `GSC` script while `1` indicates a `CSC` +script. + +The function handle to our loaded script is then stored in a global variable and +the function can be executed in its own thread using `Scr_ExecThread` which has +the following prototype: + +``` +typedef uint16_t (__cdecl* Scr_ExecThread_t)( + int32_t scriptInstance, + int32_t handle, + int32_t paramCount +); +``` + +I found this function at address `0x005598E0`: +``` +#define T5_Scr_ExecThread 0x005598E0 +``` + +I also noticed during static analysis that a call to `Scr_FreeThread` was +always made following a call to `Scr_ExecThread` so that the pseudo code would +look something like: + +``` +/* scriptInstance 0 = GSC, no parameters passed */ +uint16_t handle = Scr_ExecThread(0, func_handle, 0); +/* free the handle returned by Scr_ExecThread */ +Scr_FreeThread(handle, 0); +``` + +We obviously also need a compression library and for this I am using +[miniz](https://github.com/lunarbin/miniz). + +So to summarize the steps to inject our `GSC` script via our `Scr_LoadScript` hook: + +1. Query map being loaded with `Dvar_FindVar`. +2. Compare current `scriptName` with map name. +3. On match open `GSC` script, compress and load asset as described above. +4. Call `Scr_LoadScript` on our loaded script asset. +5. Get function handle with `Scr_GetFunctionHandle`. + +To execute our `GSC` script entry function via our `Scr_LoadGameType_hk` hook: +1. Call `Scr_ExecThread` on function handle. +2. Call `Scr_FreeThread` on handle returned by `Scr_ExecThread`. 
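Step 3 of the first list relies on the 8-byte header worked out earlier: the inflated size, then the compressed size, both as little-endian `4` byte integers, followed by the `zlib` stream. As a minimal sketch of that layout, written in the same spirit as the Python inflate script rather than as the loader's actual `C` code, the buffer could be packed and unpacked as shown below. The helper names `pack_gsc` and `unpack_gsc` are hypothetical, and it is an assumption that the game accepts the default `zlib` compression level:

```python
import struct
import zlib

# Two little-endian 32-bit unsigned integers:
# inflated (plaintext) size, then compressed size.
HEADER = struct.Struct("<II")


def pack_gsc(source: bytes) -> bytes:
    """Build a buffer in the layout inferred from the memory dump (hypothetical helper)."""
    compressed = zlib.compress(source)  # default level is an assumption
    return HEADER.pack(len(source), len(compressed)) + compressed


def unpack_gsc(blob: bytes) -> bytes:
    """Reverse of pack_gsc: validate the header and inflate the payload."""
    inflated_size, compressed_size = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:HEADER.size + compressed_size]
    source = zlib.decompress(payload)
    if len(source) != inflated_size:
        raise ValueError("inflated size does not match header")
    return source


if __name__ == "__main__":
    script = b'main()\n{\n\tself.skeleton = "base";\n}\n'
    blob = pack_gsc(script)
    print(blob[:HEADER.size].hex(), unpack_gsc(blob) == script)
```

In the loader itself the resulting buffer and its total length would be what ends up in the `buffer` and `len` members of the `RawFile` struct, although whether `len` counts the header is not confirmed by the dump analysis above.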
+ +# Black Ops Offsets + +I have included all the offsets I found for the game using `IDA`: +``` +#define T5_Scr_LoadScript 0x00661AF0 +#define T5_Scr_GetFunctionHandle 0x004E3470 +#define T5_DB_LinkXAssetEntry 0x007A2F10 +#define T5_Scr_ExecThread 0x005598E0 +#define T5_Scr_FreeThread 0x005DE2C0 +#define T5_Scr_LoadGameType 0x004B7F80 +#define T5_Dvar_FindVar 0x0057FF80 +#define T5_Assign_Hotfix 0x007A4800 +#define T5_init_trigger 0x00C793B0 +#define T5_Thread_Timer 0x004C06E0 +``` + +A simple challenge would be to find these offsets yourself, it is not difficult. + +`Black Ops 1` uses the `__cdecl` calling convention for functions. + +# Source Code + +I have included the source code of my simple `GSC` loader in the +[following](https://github.com/lunarbin/gsctool/) repo. + +I have also included a sample `GSC` script that spawns a raygun at spawn in +zombies solo mode and dumps `GSC` scripts as they are loaded via a +`DB_LinkXAssetEntry` hook. + +# Demo + + + +# Signature + +``` ++---------------------------------------+ +| .-. .-. .-. | +| / \ / \ / \ | +| / \ / \ / \ / | +| \ / \ / \ / | +| "_" "_" "_" | +| | +| _ _ _ _ _ _ ___ ___ _ _ | +| | | | | | | \| | /_\ | _ \ / __| || | | +| | |_| |_| | .` |/ _ \| /_\__ \ __ | | +| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| | +| | +| | +| Lunar RF Labs | +| https://lunar.sh | +| | +| Research Laboratories | +| Copyright (C) 2022-2025 | +| | ++---------------------------------------+ +``` + diff --git a/_posts/2024-03-20-rf-primer.md b/_posts/2024-03-20-rf-primer.md new file mode 100644 index 0000000..4992121 --- /dev/null +++ b/_posts/2024-03-20-rf-primer.md @@ -0,0 +1,753 @@ +--- +layout: post +title: RF Circuit and Power Amplifier Design Basics +author: Dylan Müller +--- + +> A short primer on some of the basic concepts related to `RF` circuit +> and `RF` power amplifier design. + +1. [Average Power](#average-power) +2. [Transmission Lines](#transmission-lines) +3. [Impedance Matching](#impedance-matching) +4. 
[Electromagnetic Transducers](#electromagnetic-transducers) +5. [S-Parameters](#s-parameters) +6. [Harmonic Balance](#harmonic-balance) +7. [Lumped and Distributed Element Networks](#lumped-and-distributed-element-networks) +8. [Classes Of Operation](#classes-of-operation) +9. [Stability Analysis](#stability-analysis) +10. [Efficiency](#efficiency) +11. [P1dB Compression Point](#p1db-compression-point) +12. [Load Pull](#load-pull) +13. [LC Filtering](#lc-filtering) + +# Foreword + +The aim of this journal entry is to review some basic technical concepts +pertaining to general `RF` circuit design and modelling as well as `RF` power +amplifier design. + +`RF` circuit design is typically not covered in detail at an undergraduate level +and the author hopes that this journal entry will provide some useful +information to readers not familiar with the subject. + +# Average Power + +In order to be successful in `RF` power amplifier design, it is necessary to +understand `AC` power from both a frequency and time domain perspective. + +In the frequency-domain, complex `AC` power ($$S$$) is given by: + +$$ S = \overline{V}.\overline{I}^* $$ + +Where $$\overline{V}$$ and $$\overline{I}$$ are the voltage and current phasors. In +`RF` power amplifier design we are typically concerned with maximizing the real +component of complex power $$\Re(S)$$ which represents power dissipated in a load +such as an antenna (modelled by it's radiation resistance). The imaginary part +of $$S$$ represents reactive power which does not perform any useful work. + +Average real power ($$P_{avg}$$) is the industry standard measure of power for `RF` +and `microwave` systems and is measured in the `SI` units of `watts (W)`. 
Average +power over total time ($$T$$) for a continuous-wave (`CW`) signal is defined by the +following time-domain integral: + +$$ P_{avg} = \frac{1}{T} \int\limits_0^T v(t).i(t) dt $$ + +The terms $$v(t)$$ and $$i(t)$$ can then be expanded: + +$$ v(t) = v_{p}sin(2\pi ft), i(t) = i_{p}sin(2\pi ft + \theta) $$ + +Here $$\theta$$ represents the phase angle between $$i(t)$$ and $$v(t)$$ and ($$v_{p}$$) +and ($$i_{p}$$) represent the peak values of the voltage and current waveforms. + +It can be shown that for purely sinusoidal voltage and current waveforms: + +$$ P_{avg} = \frac{1}{2}v_{p}i_{p}cos(\theta) $$ + +Hence maximum real power is obtained when the current and voltage waveforms are +in phase. $$cos (\theta)$$ is commonly referred to as the power factor and this +constant relates real power to apparent power $$\|S\|$$. Apparent power is the +total complex power available to a particular load. + +In practice average real power is specified in terms of decibels referenced to `1` +`milliwatt (mW)` ($$dBm$$): + +$$ dBm(P) = 10\log_{10}(\frac{P}{1mW}) $$ + +While relative power is measured in decibels ($$dB$$): + +$$ dB(P) = 10\log_{10}(\frac{P}{P_{ref}}) $$ + +## Transmission Lines + +Traditional lumped circuit theory is only applicable to low frequency circuits +as it assumes that wires are lossless with negligible electrical length. +Electrical length is usually measured in terms of multiples of the wavelength +($$\lambda$$) of the highest frequency signal that is conducted over the line. + +As a rule of thumb, when the length of a wire is greater or equal to +$$\lambda/10$$, the voltage and current in the wire as a function of position +along the wire will no longer be constant with time and wave-like behaviour such +as reflection and standing waves start becoming important. In these cases +utilizing a distributed model, such as the transmission line, often becomes +necessary. 
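The $$\lambda/10$$ rule of thumb is easy to check numerically. The sketch below is only illustrative: the function names are hypothetical, and the optional `velocity_factor` parameter (not discussed above) is an assumption used to model a wave that travels slower than $$c$$ on a real line.

```python
C0 = 299_792_458.0  # speed of light in a vacuum, m/s


def wavelength(freq_hz, velocity_factor=1.0):
    """Wavelength in metres; velocity_factor < 1 is an assumed slow-down on a real line."""
    return velocity_factor * C0 / freq_hz


def needs_distributed_model(length_m, freq_hz, velocity_factor=1.0):
    """Apply the lambda/10 rule of thumb: True when the wire is electrically long."""
    return length_m >= wavelength(freq_hz, velocity_factor) / 10.0


if __name__ == "__main__":
    # Compare a 1 cm and a 50 cm wire at 100 MHz.
    for length in (0.01, 0.5):
        print(length, needs_distributed_model(length, 100e6))
```

At `100 MHz` the wavelength in free space is roughly `3 m`, so under this rule anything longer than about `30 cm` would call for a transmission line model.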
+ +Let $${u}$$ and $${i}$$ represent the voltage and current along the transmission line as +functions of position and time: + +$$ u = u(x,t), i = i(x,t) $$ + +Then at distance $${x+dx}$$ voltage and current can be expressed with a first-order Taylor +series expansion: + +$$ u(x + dx, t) = u(x,t) + \frac{\partial u}{\partial x} dx $$ + +$$ i(x + dx, t) = i(x,t) + \frac{\partial i}{\partial x} dx $$ + +A segment of a traditional transmission line is shown in the figure below. It +consists of a series distributed resistance $$R(x)$$, inductance $$L(x)$$ as well as +segment capacitance $$C(x)$$ and conductance $$G(x)$$. A transmission line is +thought of as a continuous series of these segments. + + + +It can be shown that the equations that describe changes in the voltage and +current on the line are given by: + +$$ \frac{\partial u}{\partial x} dx = -Ri - L \frac{\partial i}{\partial t} $$ + +$$ \frac{\partial i}{\partial x} dx = -Gu - C \frac{\partial u}{\partial t} $$ + +The equations above are known as the telegrapher's equations. These two +equations can be combined to produce two partial differential equations with one +isolated variable each ($${u}$$ or $${i}$$). + +$$u$$ and $$i$$ are related by the characteristic impedance ($$Z_{0}$$) of the +transmission line which can be derived from the telegrapher's equations: + +$$ Z_{0} = \frac{V^+(x)}{I^+(x)} = \sqrt{\frac{R(x)+j\omega L(x)}{G(x)+j\omega +C(x)}} $$ + +For most transmission lines dielectric and conductor losses are low, so we +assume: + +$$ R(x) \ll \omega L(x), G(x) \ll \omega C(x) $$ + +As a result the equation above simplifies to: + +$$ Z_{0} = \sqrt{\frac{L(x)}{C(x)}} $$ + +It is worth noting that characteristic impedance $$Z_{0}$$ of a transmission line +only has an effect at `RF` frequencies. It is not possible to measure the +characteristic impedance of a transmission line directly with a multimeter +because resistance (Ohm's law) and characteristic impedance (electromagnetic +property) are not the same concept. 
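The two expressions for $$Z_{0}$$ can be compared numerically. In the sketch below the function names are hypothetical and the per-metre line constants (`250 nH/m`, `100 pF/m`, `0.1` $$\Omega$$`/m`) are illustrative assumptions chosen to land near `50` $$\Omega$$, not values taken from a datasheet:

```python
import cmath
import math


def z0_general(r, l, g, c, freq_hz):
    """Characteristic impedance from the full expression sqrt((R + jwL) / (G + jwC))."""
    omega = 2.0 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * omega * l) / (g + 1j * omega * c))


def z0_lossless(l, c):
    """Low-loss approximation sqrt(L / C)."""
    return math.sqrt(l / c)


if __name__ == "__main__":
    # Assumed per-metre constants for illustration only.
    L_PER_M = 250e-9   # H/m
    C_PER_M = 100e-12  # F/m
    R_PER_M = 0.1      # ohm/m
    print(z0_lossless(L_PER_M, C_PER_M))
    print(z0_general(R_PER_M, L_PER_M, 0.0, C_PER_M, 100e6))
```

With these assumed values the lossy result differs from the lossless one only by a small imaginary part, which is why the simplified form is usually good enough at `RF`.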
+ +Examples of transmission lines include coaxial cable and microstrip line. +Transmission lines typically have a characteristic impedance of `50` $$\Omega$$ in +`RF` systems and are used to carry `RF` signals from one point in a circuit to +another with minimal losses. + +# Impedance Matching + +It is often said that impedance matching is what truly differentiates `RF` circuit +design from low frequency circuit design. Indeed, it is one of the most common +tasks for an `RF` designer. + +`RF` signals travelling along a transmission line can be thought of as +electromagnetic power waves. Power waves are a hypothetical construct, one of +the many possible linear transformations of voltage and current. + +The figure below shows a typical `RF circuit` consisting of a `RF` power source +($$G$$) with an impedance of ($$Z_{R}$$) and a load with an impedance of $$Z_{L}$$. +The interface of the generator and load impedance is indicated by the dotted +line. + + + +It is at this interface that we experience reflection of the power wave (back to +the generator) when $$Z_{R} \neq Z_{L}$$. If the impedance of the generator and +load were equal then no reflection occurs. + +In general, the goal of an `RF` system is to transfer `RF` power as efficiently from +one point to another with minimal reflections. The degree of an electromagnetic +power wave reflected (at the boundary) is determined by the reflection +coefficient ($$\Gamma$$): + +$$ \Gamma_{L} = \frac{Z_{L} - Z_{0}}{Z_{L}+Z_{0}} $$ + +A $$\Gamma$$ of $$0$$ indicates no reflection, while a $$\Gamma$$ of `1` or `-1` +represents total reflection with or without phase inversion. + +Here ($$Z_{0}$$) represents the reference impedance of the system which is +typically `50` $$\Omega$$. Maximum power transfer over the boundary from the +generator to the load is only satisfied when: + +$$ Z_{L} = Z_{R}^* $$ + +This expression is known as the conjugate match rule for maximum power transfer. 
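As a quick numerical illustration of the reflection coefficient formula, the sketch below evaluates $$\Gamma_{L}$$ for a few loads against a `50` $$\Omega$$ reference. The function names are hypothetical, and the reflected power fraction $$\|\Gamma\|^2$$ is a standard result for a real reference impedance that is not derived in this entry:

```python
def reflection_coefficient(z_load, z0=50.0):
    """Gamma = (Z_L - Z_0) / (Z_L + Z_0) for a (possibly complex) load impedance."""
    return (z_load - z0) / (z_load + z0)


def reflected_power_fraction(z_load, z0=50.0):
    """Fraction of incident power reflected, |Gamma|^2, assuming a real z0."""
    return abs(reflection_coefficient(z_load, z0)) ** 2


if __name__ == "__main__":
    # Matched load, 2:1 resistive mismatch, short circuit, complex load.
    for z in (50.0, 100.0, 0.0, 25 + 25j):
        print(z, reflection_coefficient(z), reflected_power_fraction(z))
```

A matched load gives $$\Gamma = 0$$ while a short circuit gives $$\Gamma = -1$$, the total reflection with phase inversion mentioned above.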
+In most practical `RF` systems $$Z_{L} \neq Z_{R}$$ and so a method of impedance +transformation is required to satisfy the conjugate match rule. + +Impedance transformation networks allow a $$Z_{L}$$ with both a real and imaginary +part to be transformed into another complex impedance using reactive components. +Reactive components do not dissipate real power unlike resistors which is why +resistors are rarely used in `RF` impedance matching and filter networks. + +The most basic type of impedance transformation network is known as the `LC` +network which consists of an inductor and capacitor. It can be used to transform +a complex load ($$R_{L}$$) to `50` $$\Omega$$. The figure below shows the basic +configuration of the `LC` impedance transformation network. + + + +In addition to the `LC` network's impedance transformation property, the `LC` +network can have a low-pass or high-pass response depending on the shunt element +($$jB$$). If it is capacitive then the `LC` network will have a low-pass response or +a high-pass response if it is inductive. + +Typically the low pass configuration (with a shunt capacitor) is desired. + +It should be noted that the shunt element ($$jB$$) should be placed on the side +with the largest impedance. Therefore there are two possible configurations of +this network: + + + +So that type `1` is used when $$R_{L} > R_{S}$$ and type `2` is used when $$R_{L} < +R_{S}$$. Here $$Z_{S}$$ represents $$Z_{0}$$ the system reference impedance $$(50 +\Omega)$$. + +The design procedure for a type `1` `LC` network is as follows. First we evaluate +the input impedance ($$Z_{in}$$) looking into the matching network. + + + +We start by defining a few variables. Let: + +$$ Z_{L} = R_{L} + jX_{L} $$ + +$$ Z_{S} = Z_{0} = 50 \Omega $$ + +$$ Z_{in} = jX + \frac{1}{(jB)^{-1} + (Z_{L})^{-1}} $$ + +Matching conditions are met when $$Z_{in} = Z_{S} = Z_{0}$$. 
+ +The equation above can be simplified and separated into real and imaginary +parts: + +$$ B(X + X_{L}) = -(Z_{0}R_{L}+XX_{L}) $$ + +$$ B(R_{L} - Z_{0}) = Z_{0}X_{L} - R_{L}X $$ + +`B` can then be solved using the following equation: + +$$ B = -\frac{R_{L}^2+X_{L}^2}{X_{L} + \sqrt{\frac{R_{L}}{Z_{0}}} \sqrt{R_{L}^2 + X_{L}^2 - Z_{0}R_{L}}} $$ + +Once `B` is obtained, `X` may be found by rearranging the second equation: + +$$ X = \frac{B(R_{L} - Z_{0}) - Z_{0}X_{L}}{-R_{L}} $$ + +The figure below depicts the second variant of the `LC` matching network. + + + +Let: + +$$ Z_{L} = R_{L} + jX_{L} $$ + +$$ Z_{S} = Z_{0} = 50 \Omega $$ + +$$ Z_{in} = \frac{1}{(jB)^{-1} + (Z_{L} + jX)^{-1}} $$ + +Matching conditions are met when $$Z_{in} = Z_{S} = Z_{0}$$. + +The equation above can be simplified and separated into real and imaginary +parts: + +$$ B(X_{L} + X) = -Z_{0}R_{L} $$ + +$$ B(Z_{0} - R_{L}) = -Z_{0}(X_{L}+X) $$ + +Again, `B` can then be solved using the following equation: + +$$ B = -Z_{0}\sqrt{\frac{R_{L}}{Z_{0}- R_{L}}} $$ + +Again, `X` may be found by rearranging for $$(X_{L} + X)$$ and substituting to +yield: + +$$ X = \sqrt{R_{L}(Z_{0} - R_{L})} - X_{L} $$ + +For both `LC` circuit types it is assumed that $$jB$$ represents a capacitive +element and $$jX$$ represents an inductive element so that a low-pass response is +obtained. + +Matching networks with more than two reactive elements also exist. These include +the $$\pi$$-section and `T-section` matching networks that contain `3` reactive +elements each. The figure below depicts the $$\pi$$-section network. + + + +The $$\pi$$-section network is made up of a type `2` `LC` matching network in cascade +with a type `1` `LC` matching network. The central element $$jX$$ is the sum of the +inductive reactance ($$jX$$) of each `LC` matching section, which is combined into a +single reactance. + +A virtual load resistance $$R_{x}$$ is interposed in-between the two `LC` matching +networks. 
The goal of the first `LC` matching network is to then match the source +$$Z_{S}$$ to $$R_{x}$$ while the second `LC` matching network matches $$R_{x}$$ to +$$Z_{L}$$. + +The value of $$R_{x}$$ is chosen according to the desired `Q-factor` and must be +smaller than $$R_{L}$$ and $$R_{S}$$. The `Q-factor` of a matching network describes +how well power is transferred from source to load as you deviate from the +designed center frequency of the matching network ($$f_{0}$$). + +It is defined as follows: + +$$ Q = \frac{f_{0}}{B} $$ + +Here $$f_{0}$$ represents the center frequency of the matching network and $$B$$ +(not to be confused with the shunt element $$jB$$ above) represents the total +bandwidth over which no more than `3dB` of power is lost from source to load. + +A high `Q-factor` is often desirable for a narrowband matching network. It should +be mentioned that the `Q-factor` cannot be controlled with a simple `2` element `LC` +matching network (where it is fixed by $$R_{L}$$ and $$R_{S}$$). This is one of +the advantages of the $$\pi$$-section matching network. + +The `Q-factor` for the $$\pi$$-section filter is given by: + +$$ Q_{\pi} = \sqrt{\frac{\max(R_{S}, R_{L})}{R_{x}} - 1} $$ + +Here `max()` is a function that returns the maximum of two values. Hence the +`Q-factor` of a $$\pi$$-section matching network will be determined by the `LC` +matching section with the highest `Q-factor`. + +# Electromagnetic Transducers + +Once the necessary `RF` power has been generated and transmitted over various +transmission lines and matching networks it then becomes necessary to convert +this `RF` power into electromagnetic waves which can start propagating through +free space to reach a receiver. For this purpose we use an electromagnetic +transducer known as an antenna. + +An antenna can also be thought of as an impedance transformation network that +matches an `RF` circuit (`50` $$\Omega$$ typical) to free space (`377` $$\Omega$$). 
+ +An antenna can be as simple as a piece of wire connected to an `RF` output port +but the efficiency of the antenna in this case would be very low. The two most +common types of antennas are the `monopole` and `dipole` antenna (shown below). + + + +The `monopole` antenna consists of a `quarter-wave` length ($$\lambda/4$$) of wire of +length $$L$$ which is connected to the 'hot' `RF` input terminal and an (ideally) infinitely +large `RF` ground plane conductor which is connected to electrical ground. The +'hot' and `RF` ground elements are electrically separated from each other. + +Most practical `HF` and `VHF` `monopole` antennas are shorter than a quarter +wavelength due to size constraints and are therefore modelled as electrically +small `monopoles` ($$L \leq (1/8)\lambda$$) from this point forward. + +An electrically small `monopole` antenna can be modelled as a series circuit with +the elements shown in the image below. + + + +$$L_{a}$$ represents the antenna conductor inductance. This is the parasitic +inductance of the active element of the antenna which is just a wire made out of +a material such as copper with a given length and diameter. This value is usually very +small and can be neglected. + +The reactance of an electrically short `monopole` ($$L \leq (1/8)\lambda$$) is +represented by $$X_{a}$$ and can be calculated as follows: + +$$ X_{a} = 60(1 - \ln{(\frac{L}{a})})\cot{(2\pi\frac{L}{\lambda})}j \Omega $$ + +This equation originally appears in the "Antenna Engineering Handbook", Third +Edition by `Richard C. Johnson`. + +From the equation above it can be seen that the reactance of an electrically +short `monopole` is primarily capacitive: for a thin wire ($$L \gg a$$) the +logarithmic term makes the bracket negative while the cotangent term is positive, +so $$X_{a}$$ is negative. + +The radius of the wire (active element) in meters is given by $$a$$ and the +operational wavelength is given by $$\lambda$$. 
+The operational wavelength can be
+calculated from the transmit frequency ($$f_{t}$$) and the speed of light in a
+vacuum ($$c$$):
+
+$$ \lambda = \frac{c}{f_{t}} $$
+
+The loss resistance $$r_{a}$$ is the effective `AC` resistance of the antenna active
+element due to the skin effect. The skin effect is a phenomenon whereby `RF`
+current tends to flow near the surface of the conductor as the frequency, and
+consequently the magnetic field strength at the center of the conductor,
+increases.
+
+The skin depth is given by:
+
+$$ \delta = \frac{1}{\sqrt{\pi f_{t} \mu_{0} \sigma}} $$
+
+$$\sigma$$ is the conductivity of the active element's material (copper)
+while $$\mu_{0}$$ is the permeability of free space. Once $$\delta$$ is computed the
+real `AC` resistance can be found as follows:
+
+$$ r_{a} = \frac{L}{2\pi a \delta \sigma} $$
+
+At `VHF` the loss resistance $$r_{a}$$ is also very small and can typically be
+neglected.
+
+The radiation resistance $$r_{R}$$ of an electrically small monopole
+($$L \leq (1/8)\lambda$$) is the effective real resistance that represents the power
+that is radiated away from the antenna as electromagnetic waves. For an
+electrically short `monopole`, radiation resistance is given by the equation below:
+
+$$ r_{R} = 40 \pi^2(\frac{L}{\lambda})^2 \Omega $$
+
+$$r_{G}$$ represents the losses in the `RF` ground plane and this parameter is
+usually neglected. A stable `RF` ground plane is essential for an antenna to
+function correctly. An antenna ground plane serves as an `RF` return path for `AC`
+displacement current.
+
+For a `monopole` the ground plane also serves the function of mirroring the active
+radiation element to form the second half of the antenna. This result is
+obtained using image theory and is beyond the scope of this journal entry.
+
+# S-Parameters
+
+Most passive `RF` circuits such as filters, matching networks, etc. are linear.
+That is, their output voltage, current and power relationships are derived from a
+system of linear equations. Some active devices such as `class-A` `small-signal`
+amplifiers are also linear.
+
+Scattering parameters or `S-parameters` can be used to model the behaviour of
+these `linear` `RF` networks when stimulated by a steady-state signal.
+
+
+The image above depicts a `DUT` 'black box' which we use to derive the scattering
+parameters. The network has `2` distinct ports (`port 1`, `port 2`) and `4` scattering
+parameters: $$S_{11}, S_{12}$$ and $$S_{21}, S_{22}$$. The scattering parameters are
+complex values with both a real and imaginary part.
+
+We define each scattering parameter in terms of two voltage waves at the input
+(`port 1`) and output (`port 2`). Let $$a_{1}, b_{1}$$ represent the incident and
+reflected voltage waves at `port 1` while $$a_{2}, b_{2}$$ represent the incident
+and reflected voltage waves at `port 2`.
+
+The `S-parameters` can then be defined as follows, each with the incident wave at
+the other port set to zero (that port is terminated in the reference impedance
+$$Z_{0}$$):
+
+$$ S_{11} = \frac{b_{1}}{a_{1}} = \frac{V_{1}^-}{V_{1}^+}, \quad S_{12} = \frac{b_{1}}{a_{2}} = \frac{V_{1}^-}{V_{2}^+} $$
+
+$$ S_{21} = \frac{b_{2}}{a_{1}} = \frac{V_{2}^-}{V_{1}^+}, \quad S_{22} = \frac{b_{2}}{a_{2}} = \frac{V_{2}^-}{V_{2}^+} $$
+
+It should be mentioned at this point that `S-parameters` can also be defined in
+terms of 'power waves' by considering the complex input impedance of the `DUT`.
+However the voltage wave definition is the most popular.
+
+A system of linear equations can then be derived for $$b_{1}$$ and $$b_{2}$$:
+
+$$ b_{1} = S_{11}a_{1} + S_{12}a_{2} $$
+
+$$ b_{2} = S_{21}a_{1} + S_{22}a_{2} $$
+
+Which can be expressed in terms of the `S-parameter` matrix:
+
+$$ \begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} \begin{bmatrix} a_{1} \\ a_{2} \end{bmatrix} $$
+
+There are various useful definitions that can be derived from the scattering
+parameters.
+$$S_{21}$$ and $$S_{11}$$ are the most commonly used scattering
+parameters.
+
+The first useful definition is the logarithmic power gain of the network in `dB`
+($$G_{p}$$) which can be expressed in terms of $$|S_{21}|$$:
+
+$$ G_{p} = 20\log_{10}|S_{21}| $$
+
+$$S_{11}$$ is equal to the input reflection coefficient $$\Gamma_{in}$$ and can be
+used to obtain the input impedance of the network $$Z_{in}$$:
+
+$$ S_{11} = \Gamma_{in} $$
+
+$$ Z_{in} = Z_{0} \frac{1+S_{11}}{1-S_{11}} $$
+
+The ratio of reflected ($$P_{ref}$$) and incident power ($$P_{inc}$$) at `port 1` is
+given by:
+
+$$ \frac{P_{ref}}{P_{inc}} = |\Gamma_{in}|^2 $$
+
+Another useful definition is the input (`port 1`) return loss $$RL_{in}$$:
+
+$$ RL_{in} = -20\log_{10}(|S_{11}|) $$
+
+From the above formula it can be seen that return loss is typically a positive
+number (since $$|S_{11}| \leq 1$$ for a passive network), however sometimes it is
+quoted as a negative number, in which case the data is referring to the `log`
+magnitude of $$S_{11}$$ directly ($$-RL_{in}$$), not the actual return loss which is
+positive as mentioned.
+
+Return loss for $$S_{11}$$ is a measure of how well matched `port 1` of the
+network is to the reference impedance. A return loss greater than `10 dB` is
+usually desirable for a good match.
+
+Another measure of how well matched a network is to the reference impedance is
+called `VSWR` (voltage standing wave ratio). The `VSWR` of `port 1` ($$s_{in}$$) is
+defined by:
+
+$$ s_{in} = \frac{1+|S_{11}|}{1-|S_{11}|} $$
+
+`VSWR` is typically used to measure the matching conditions of antennas. A
+`VSWR` of $$s_{in} < 2$$ is generally considered suitable for most antenna
+applications.
+
+A `VNA` (Vector Network Analyzer) is an instrument that is used to measure
+`S-parameters`. Most affordable commercial `VNAs` are `2-port` `1-path` devices, i.e.
+they only measure $$S_{11}$$ and $$S_{21}$$ and the `DUT` must therefore be reversed
+to obtain $$S_{22}$$ and $$S_{12}$$.
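The definitions above can be collected into a small helper. This is only a minimal sketch: the function name `port_metrics`, its dictionary keys and the sample values are illustrative assumptions, and it expects linear (not `dB`) complex values for $$S_{11}$$ and $$S_{21}$$ with $$0 < |S_{11}| < 1$$.

```python
import math


def port_metrics(s11: complex, s21: complex, z0: float = 50.0) -> dict:
    """Derive common figures of merit from linear, complex S11 and S21.

    Assumes 0 < |S11| < 1 and |S21| > 0; a perfect match (|S11| = 0) or a
    total reflection (|S11| = 1) would make the log or the division blow up.
    """
    mag11 = abs(s11)
    return {
        "gain_db": 20 * math.log10(abs(s21)),        # G_p = 20 log10 |S21|
        "return_loss_db": -20 * math.log10(mag11),   # RL_in = -20 log10 |S11|
        "vswr": (1 + mag11) / (1 - mag11),           # s_in
        "reflected_power_fraction": mag11 ** 2,      # P_ref / P_inc
        "z_in": z0 * (1 + s11) / (1 - s11),          # input impedance
    }


# Hypothetical measurement: |S11| = 1/3 (purely real) and |S21| = 0.5.
example = port_metrics(s11=complex(1 / 3, 0), s21=complex(0.5, 0))
```

With $$|S_{11}| = 1/3$$ the helper gives a `VSWR` of `2` and a return loss of roughly `9.5 dB`, which sits right at the edge of the `10 dB` and $$s_{in} < 2$$ rules of thumb quoted above.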
+
+The input and output impedance obtained from $$S_{11}$$ and $$S_{22}$$ can also be
+represented in graphical form as a Smith chart. A `Smith chart` is a plot of the
+complex plane in which the imaginary axis (`y-axis`) has been bent around the
+real axis (`x-axis`). It can be used to plot any value of complex impedance. An
+example of the `Smith chart` is shown in the figure below.
+
+The top half of the Smith chart represents inductive reactance while the bottom
+half represents capacitive reactance. The circles passing through the `x-axis` are
+known as constant resistance circles where the leftmost point on the real axis
+of the Smith chart represents a short circuit (`SC`) while the rightmost point
+represents an open circuit (`OC`).
+
+For an ideal `50` $$\Omega$$ match there should be a single point in the middle of
+the Smith chart.
+
+# Harmonic Balance
+
+`RF` power amplifiers are `large-signal` non-linear devices.
+
+At this point a distinction is required between small signal and large signal
+amplifiers. For `small-signal` amplifiers the input and output power is typically
+small and these devices typically operate in their `linear` region.
+
+`Large-signal`, `class AB`, `B` and `C` devices typically operate with large input and
+output power and their response is strongly `non-linear` due to the class of
+operation. Therefore for power amplifier classes other than `class A`,
+`S-parameters` cannot be used reliably in the design of `RF` power amplifiers.
+
+Rather a designer relies on vendor supplied `non-linear` software models of the
+power amplifier transistor and typically uses a `non-linear` frequency domain
+analysis technique such as the harmonic balance method (`HBM`) to characterise the
+power amplifier's performance.
+
+`Harmonic balance` is a frequency domain technique used to calculate the steady
+state response of a non-linear circuit.
+`HBM` can be defined in multiple ways,
+however in this case let us demonstrate `HBM` through an example circuit.
+
+Given a circuit with $$N$$ nodes, let vector $$v$$ represent the respective node
+voltages. For ease of representation we model a circuit with capacitors and
+voltage controlled resistors. Then applying `KCL` (Kirchhoff's Current Law) to the
+circuit yields the following system of equations:
+
+$$ f(v,t) = i(v(t)) + \frac{d}{dt}q(v(t)) + \int_{-\infty}^{t}{y(t-\tau)v(\tau)\,d\tau} + i_{s}(t) = 0 $$
+
+We let $$q$$ and $$i$$ represent the sum of the charges and currents entering the
+nodes due to the non-linearities. $$y$$ is the impulse response matrix of the
+circuit with all non-linearities removed while $$i_{s}$$ represents the external
+source currents.
+
+We then convert the equation above into the frequency domain:
+
+$$ F(V) = I(V) + ZQ(V) + YV + I_{s} = 0 $$
+
+Here $$Z$$ represents a matrix with frequency coefficients representing the
+differentiation step. The convolution integral in the equation above maps to `YV`
+as shown where `Y` is the admittance matrix for the `linear` portion of the circuit.
+
+`V` then contains the Fourier coefficients of the voltage at each of the $$N$$
+nodes at every harmonic. This process is nothing more than `KCL` in the frequency
+domain for `non-linear` circuits.
+
+# Lumped and Distributed Element Networks
+
+As mentioned above, as the size of a circuit element starts to approach a
+fraction (typically `1/10`) of the wavelength of the highest `RF` signal frequency,
+the lumped element approximation for the circuit element no longer holds. In
+these cases lumped or discrete components cannot be used and so a distributed
+element/model must be utilized.
+
+Traditionally the use of lumped element components at `RF` frequencies is most
+common below around `500 MHz`. Above `500 MHz` these lumped circuit
+elements become more difficult to design with.
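The `1/10` wavelength rule of thumb is easy to check numerically. The sketch below is only illustrative: the function names and the `velocity_factor` parameter (used to approximate a shorter wavelength on a substrate or in a cable) are assumptions and not something defined in this entry.

```python
C = 299_792_458.0  # speed of light in a vacuum, m/s


def max_lumped_size_m(f_hz: float, fraction: float = 0.1,
                      velocity_factor: float = 1.0) -> float:
    """Largest element size (in meters) for which the lumped model is assumed valid.

    velocity_factor scales the free-space wavelength for propagation in a
    dielectric; 1.0 means free space.
    """
    wavelength = velocity_factor * C / f_hz
    return fraction * wavelength


def is_lumped(size_m: float, f_hz: float, **kwargs) -> bool:
    """True when an element of size_m is below the chosen fraction of a wavelength."""
    return size_m < max_lumped_size_m(f_hz, **kwargs)


# At 500 MHz in free space the wavelength is about 0.6 m, so the limit is about 6 cm.
limit_500mhz = max_lumped_size_m(500e6)
```

At `500 MHz` the limit works out to roughly `6 cm` in free space, and it shrinks further on a real substrate where the velocity factor is below `1`.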
+
+# Classes Of Operation
+
+Depending on the `DC` operating point of the power transistor, different values of
+efficiency and output power can be obtained for the same input power. Efficiency
+is generally considered the most important design parameter for an `RF` power
+amplifier.
+
+In the `class-A` configuration the transistor is biased such that the quiescent
+drain current is equal to the peak amplitude of the current expected through the
+load. This allows for a symmetrical voltage and current swing at the output and
+the transistor conducts for the full `360` degrees of the input waveform.
+
+The advantages of this configuration are excellent linearity and gain at the
+expense of reduced efficiency. A `class-A` power amplifier with an inductively
+loaded drain has a maximum theoretical efficiency of `50%`.
+
+`Class-B` power amplifiers aim to achieve greater efficiency by only conducting
+for half of the input drive cycle (`180` degrees). `Class-B` power amplifiers have a
+maximum theoretical efficiency of `78.5%` ($$\pi/4$$) and the transistor is biased
+at cutoff. `Class-B` `PAs` can be placed in either a single-ended or a push-pull
+configuration. The advantage of `class-B` is increased efficiency at the expense
+of decreased linearity.
+
+A compromise is thus needed between `class A` and `class B` such that we have
+sufficient linearity at a reasonable efficiency. A `class-AB` power amplifier is a
+solution to this problem.
+
+In the `class-AB` mode of operation the transistor is biased with a quiescent
+drain current slightly to moderately above cutoff, depending on the linearity
+requirements. This improves the linearity of the power amplifier while typically
+maintaining an efficiency of `50%` to `70%` in practice. `Class-AB` power
+amplifiers have a conduction angle between `180` and `360` degrees.
+
+The `single-ended`, `class-AB` mode of operation is therefore a popular choice
+amongst designers.
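The relationship between conduction angle and efficiency can be made concrete with the textbook reduced-conduction-angle expression. This is a sketch under idealised assumptions that this entry does not state explicitly: full rail-to-rail voltage swing, all harmonics shorted at the drain and no knee voltage. The function name is hypothetical.

```python
import math


def ideal_drain_efficiency(conduction_angle_deg: float) -> float:
    """Ideal drain efficiency for a conduction angle in degrees (0 < angle <= 360).

    Ratio of fundamental RF output power to DC power for a truncated
    sinusoidal drain current with full voltage swing.
    """
    theta = math.radians(conduction_angle_deg)
    fundamental = theta - math.sin(theta)
    dc = 4 * (math.sin(theta / 2) - (theta / 2) * math.cos(theta / 2))
    return fundamental / dc


# Class A (360 degrees), an example class AB bias (270 degrees) and class B (180 degrees).
efficiencies = {angle: ideal_drain_efficiency(angle) for angle in (360, 270, 180)}
```

Under these assumptions the `class-A` case evaluates to `0.5` and the `class-B` case to $$\pi/4$$ (about `78.5%`), with `class-AB` conduction angles falling in between. Practical amplifiers land below these ideal figures.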
+
+# Stability Analysis
+
+`RF` power amplifiers are inherently unstable devices which often require some
+form of stabilization to operate correctly.
+
+Amplifier instability is usually caused by some type of gain and feedback
+mechanism. In `MOSFET` transistors the feedback mechanism is typically due to the
+gate to drain capacitance that couples a portion of the output back into the
+input and vice versa. This effect often manifests as unwanted oscillation at
+either the input or output of the `PA`.
+
+Small signal parameters (such as `S-parameters`) are typically used to
+characterize stability of `RF` power amplifiers even though these are `non-linear`
+devices. This is because small signal models can still provide useful insights
+into the design of a `PA` without requiring complex non-linear calculations.
+Typically the operation of a `PA` is linearized around an operating point.
+
+There are various stability factors available to a designer and an amplifier may
+be conditionally stable or unconditionally stable. If an amplifier is
+conditionally stable then there exist load or source impedances that cause
+the amplifier to oscillate. Therefore unconditional stability is usually
+desired.
+
+The most common stability factor in use is the `Rollett` stability factor ($$K$$)
+which is defined as follows:
+
+$$ K = \frac{1 - |S_{11}|^2 - |S_{22}|^2 + | \Delta |^2}{2|S_{12}S_{21}|} $$
+
+Here $$\|\Delta\|$$ is defined as the magnitude of the scattering-matrix determinant:
+
+$$ | \Delta | = |S_{11}S_{22} - S_{21}S_{12}| $$
+
+We define three more stability criteria in terms of $$\Delta, B_{1}$$ and
+$$B_{2}$$:
+
+$$ B_{1} = 1 + |S_{11}|^2 - |S_{22}|^2 - |\Delta|^2 $$
+
+$$ B_{2} = 1 + |S_{22}|^2 - |S_{11}|^2 - |\Delta|^2 $$
+
+In order for an amplifier to be unconditionally stable we require the
+following conditions:
+
+$$ K > 1 $$
+
+$$ |\Delta| < 1 $$
+
+$$ B_{1} > 0, B_{2} > 0 $$
+
+# Efficiency
+The most common measure of `RF` power amplifier efficiency is $$PAE$$
+(power added efficiency) and is defined as follows:
+
+$$ PAE = \frac{P_{out} - P_{in}}{P_{DC}} $$
+
+Here $$P_{in}$$ represents the `RF` input power from the source, $$P_{out}$$
+represents `RF` power delivered to the load and $$P_{DC}$$ represents the total `DC`
+power.
+
+# P1dB Compression Point
+
+The `P1dB` point is defined as the output power level at which
+the gain of an amplifier decreases by `1 dB` from its nominal value, which
+indicates the onset of gain non-linearity.
+
+Most amplifiers start to compress approximately `5` to `10 dB` below their `P1dB`
+point.
+
+The `P1dB` point indicates that power amplifiers have a linear and a non-linear
+region of power gain.
+
+# Load Pull
+
+`Load-pull` is an empirical `RF` `PA` design technique in which the
+reflection coefficient or impedance presented to the drain of an `RF` power
+transistor is varied by an electrical or mechanical impedance tuner to any
+arbitrary value. The technique is traditionally used to determine the optimum
+load impedance to present to an `RF` power amplifier for maximum output power.
+
+Once the optimum load impedance is determined the synthesis of matching networks
+can then take place.
+`Load-pull` tuners are expensive devices and are therefore
+typically out of the reach of most students or experimenters.
+
+In the case where a physical `load-pull` tuner is not available and a `non-linear`
+model for the chosen `RF` power transistor exists, then a simulated `load-pull` can
+be performed.
+
+# LC Filtering
+
+As `non-linear` devices, power amplifiers typically produce
+harmonic frequency content that must be filtered out in order to comply with
+regulatory standards on spurious emissions. `LC` networks can be constructed with
+a varying number of elements (poles) in order to achieve a specific roll-off in
+the stop-band.
+
+Two types of filter response are commonly used in `RF` circuit design: the
+`Chebyshev` and `Butterworth` responses. `Chebyshev` `LC` filters typically have a
+steeper roll-off but suffer from passband ripple.
+
+`Butterworth` `LC` filters typically have a flat response in the passband with a
+gradual roll-off in the stop band. These types of filters can be designed
+manually using filter constants or using `RF` design software such as `Matlab`
+(`RF Toolbox`), typically together with an optimization algorithm. We typically
+optimize for either stopband attenuation or input impedance.
+
+That concludes this journal entry! I have just touched on some basic principles that
+might help with understanding more advanced concepts.
+
+# Signature
+
+```
++---------------------------------------+
+| .-. .-. .-. |
+| / \ / \ / \ |
+| / \ / \ / \ / |
+| \ / \ / \ / |
+| "_" "_" "_" |
+| |
+| _ _ _ _ _ _ ___ ___ _ _ |
+| | | | | | | \| | /_\ | _ \ / __| || | |
+| | |_| |_| | .` |/ _ \| /_\__ \ __ | |
+| |____\___/|_|\_/_/ \_\_|_(_)___/_||_| |
+| |
+| |
+| Lunar RF Labs |
+| https://lunar.sh |
+| |
+| Research Laboratories |
+| Copyright (C) 2022-2025 |
+| |
++---------------------------------------+
+``` |