Dynamic Linking and Memory Relocations in Rust
When you compile source code into object files (such as .o files), the compiler generates machine code along with metadata that indicates how different parts of the code should be adjusted when the program is loaded into memory. These adjustments are known as relocations. They ensure that references to functions and variables point to the correct memory addresses, even if the final placement of the code in memory isn't known at compile time.
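A relocation typically specifies the offset to patch within the section being relocated, the symbol whose address is needed, the relocation type (which determines how the final value is computed and how many bytes are written), and, for entries in .rela sections, an explicit addend.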
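To make those fields concrete, here is a minimal sketch of how a 64-bit ELF relocation entry with an addend (Elf64_Rela) is laid out and how its r_info field packs the symbol index and relocation type. Rela64 and pack_r_info are hypothetical names. The 24-byte layout and the 32/32 split of r_info follow the ELF64 format and mirror the ELF64_R_SYM, ELF64_R_TYPE, and ELF64_R_INFO macros.

```rust
/// Hypothetical mirror of an ELF64 `Elf64_Rela` entry (three 8-byte fields).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rela64 {
    pub r_offset: u64, // where to patch, relative to the section being relocated
    pub r_info: u64,   // symbol index (high 32 bits) + relocation type (low 32 bits)
    pub r_addend: i64, // constant added to the computed value
}

impl Rela64 {
    /// Equivalent of the ELF64_R_SYM macro.
    pub fn sym(&self) -> u32 {
        (self.r_info >> 32) as u32
    }

    /// Equivalent of the ELF64_R_TYPE macro.
    pub fn r_type(&self) -> u32 {
        (self.r_info & 0xffff_ffff) as u32
    }
}

/// Equivalent of the ELF64_R_INFO macro: packs a symbol index and a type.
pub fn pack_r_info(sym: u32, r_type: u32) -> u64 {
    ((sym as u64) << 32) | r_type as u64
}
```

For example, `pack_r_info(5, 4)` describes a type-4 relocation (R_X86_64_PLT32 on x86-64) against symbol table entry 5.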
In this guide, we’ll focus on parsing ELF object files, extracting relocation entries, resolving symbol addresses across multiple libraries, and applying these relocations to a simulated memory space.
Setting Up the Environment
Before diving into the code, ensure you have Rust installed. You’ll also need the goblin, anyhow, and plain crates, which facilitate parsing ELF files, error handling, and byte-level data manipulation, respectively.
Cargo.toml
Begin by setting up your Cargo.toml with the necessary dependencies:
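[package]
name = "toy_linker_demo"
version = "0.1.0"
edition = "2021"

[dependencies]
goblin = "0.7"
anyhow = "1.0"
plain = "0.2" # match the plain version goblin itself uses, so goblin's Rel/Rela implement this crate's Plain trait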
Writing the Linker in Rust
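We’ll construct a Rust program that simulates a simple linker. This linker will parse ELF object files, copy their .text, .data, and .rodata sections into a simulated memory buffer, record the symbols each file exports in a global symbol table, and apply relocations using addresses resolved across the files loaded so far.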
Structuring the Global Symbol Table
To manage symbols across multiple libraries, we introduce a GlobalSymbolTable. This structure maintains a mapping of exported symbols to their memory addresses and keeps track of loaded memory sections.
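use std::collections::HashMap;
use std::fs;

use anyhow::{anyhow, Result};
use goblin::elf::Sym;
use goblin::elf64::reloc::{Rel, Rela}; // 64-bit ELF relocation entry layouts
use goblin::Object;
use plain::from_bytes;

// A symbol defined by one object file and made visible to the others
struct ExportedSymbol {
    file_name: String,
    address: usize, // Memory address where the symbol resides
}

struct GlobalSymbolTable {
    exports: HashMap<String, ExportedSymbol>, // symbol name -> where it lives
    mem_map: HashMap<String, Vec<u8>>,        // file name -> its loaded memory image
}

impl GlobalSymbolTable {
    fn new() -> Self {
        Self {
            exports: HashMap::new(),
            mem_map: HashMap::new(),
        }
    }
}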
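The exports map above silently overwrites an entry if two files define the same name. The hypothetical SymbolTable below is a stricter variant: it refuses duplicate definitions and exposes a resolve lookup. This sketch treats every export as a strong definition. Real linkers additionally allow weak definitions to be overridden.

```rust
use std::collections::HashMap;

/// Hypothetical export record, similar to ExportedSymbol above.
#[derive(Debug, Clone, PartialEq)]
pub struct Export {
    pub file_name: String,
    pub address: usize,
}

/// Hypothetical stricter symbol table: duplicate definitions are an error.
#[derive(Debug, Default)]
pub struct SymbolTable {
    exports: HashMap<String, Export>,
}

impl SymbolTable {
    /// Records a definition, refusing to overwrite an existing one.
    pub fn export(&mut self, name: &str, file_name: &str, address: usize) -> Result<(), String> {
        if let Some(existing) = self.exports.get(name) {
            return Err(format!(
                "duplicate symbol '{}': defined in {} and {}",
                name, existing.file_name, file_name
            ));
        }
        self.exports.insert(
            name.to_string(),
            Export { file_name: file_name.to_string(), address },
        );
        Ok(())
    }

    /// Returns the address of a previously exported symbol, if any.
    pub fn resolve(&self, name: &str) -> Option<usize> {
        self.exports.get(name).map(|e| e.address)
    }
}
```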
Loading and Relocating Object Files
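The core functionality resides in the load_and_relocate_object function. This function does the following:
1. Reads the object file.
2. Parses it as ELF.
3. Copies the .text, .data, and .rodata sections into a memory buffer.
4. Collects the symbol table and records the exported symbols.
5. Applies relocations.
6. Stores the patched memory image in the global symbol table.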
Here’s how the function is implemented:
fn load_and_relocate_object(
    file_name: &str,
    load_base: usize,
    global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
    println!("Loading file: {} at base 0x{:x}", file_name, load_base);

    // 1) Read the object file
    let bytes = fs::read(file_name)?;

    // 2) Parse the ELF
    let obj = match Object::parse(&bytes)? {
        Object::Elf(elf) => elf,
        _ => {
            println!("Not an ELF file: {}", file_name);
            return Ok(());
        }
    };

    // Create a memory buffer (64 KB for demonstration).
    // Caution: the loop below indexes it with load_base + sh_addr, which is
    // larger than 64 KB for the bases used in main(); see "Resolving Slice
    // Out-of-Bounds Errors" further down.
    let mut memory = vec![0u8; 65536];

    // 3) Copy .text, .data and .rodata into 'memory'
    for sh in &obj.section_headers {
        if sh.sh_size == 0 {
            continue;
        }
        if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
            if name == ".text" || name == ".data" || name == ".rodata" {
                let section_start = load_base + (sh.sh_addr as usize);
                let section_end = section_start + (sh.sh_size as usize);
                let file_offset = sh.sh_offset as usize;
                let file_end = file_offset + (sh.sh_size as usize);
                memory[section_start..section_end]
                    .copy_from_slice(&bytes[file_offset..file_end]);
                println!("Copied section {}: 0x{:x}..0x{:x}",
                         name, section_start, section_end);
            }
        }
    }

    // 4) Collect the symbol table. Keep every entry (including the null symbol
    //    at index 0 and unnamed section symbols) so that the symbol index stored
    //    in a relocation can be used directly as an index into `symbols`.
    let mut symbols: Vec<(String, Sym)> = Vec::new();
    for sym in obj.syms.iter() {
        let name = obj.strtab.get_at(sym.st_name).unwrap_or("");
        symbols.push((name.to_string(), sym));
    }

    // 4b) Classify named symbols: st_shndx == 0 (SHN_UNDEF) means imported;
    //     defined GLOBAL or WEAK symbols are exported; locals stay private.
    for (sym_name, sym) in &symbols {
        if sym_name.is_empty() {
            continue;
        }
        let bind = sym.st_bind();
        let is_visible = bind == goblin::elf::sym::STB_GLOBAL
            || bind == goblin::elf::sym::STB_WEAK;
        if sym.st_shndx == 0 {
            // It's an undefined symbol => we'll patch references to it
            println!("Symbol '{}' is UNDEF in {}", sym_name, file_name);
        } else if is_visible {
            let sym_addr = load_base + sym.st_value as usize;
            println!("Symbol '{}' exported at 0x{:x} by {}",
                     sym_name, sym_addr, file_name);
            global_syms.exports.insert(sym_name.clone(), ExportedSymbol {
                file_name: file_name.to_string(),
                address: sym_addr,
            });
        }
    }

    // 5) Apply relocations: .rel.* (Rel) or .rela.* (Rela)
    apply_rel_or_rela(&obj, &bytes, false, load_base, &mut memory, &symbols, global_syms)?;
    apply_rel_or_rela(&obj, &bytes, true, load_base, &mut memory, &symbols, global_syms)?;

    // 6) Store the final memory buffer
    global_syms.mem_map.insert(file_name.to_string(), memory);
    Ok(())
}
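The export test in step 4b comes down to two ELF fields. The first is st_shndx: a value of 0 (SHN_UNDEF) means the symbol is defined elsewhere. The second is the symbol's binding, stored in the high four bits of st_info. The std-only sketch below classifies a symbol from those raw values. goblin's Sym carries the same st_info and st_shndx fields, and the constant values come from the ELF specification. SymKind and classify are hypothetical names.

```rust
// Constants from the ELF specification
pub const SHN_UNDEF: u16 = 0;
pub const STB_GLOBAL: u8 = 1;
pub const STB_WEAK: u8 = 2;

#[derive(Debug, PartialEq)]
pub enum SymKind {
    Undefined, // referenced here, defined in another file
    Exported,  // defined here with GLOBAL or WEAK binding
    Local,     // defined here but private (locals, section and file symbols)
}

/// Hypothetical classifier working on raw ELF64 symbol fields.
pub fn classify(st_info: u8, st_shndx: u16) -> SymKind {
    let bind = st_info >> 4; // ELF64_ST_BIND: high four bits of st_info
    match (st_shndx, bind) {
        (SHN_UNDEF, _) => SymKind::Undefined,
        (_, STB_GLOBAL) | (_, STB_WEAK) => SymKind::Exported,
        _ => SymKind::Local,
    }
}
```

A section symbol (binding LOCAL, type SECTION) or the file-name symbol that compilers emit is defined but not global, so it lands in Local instead of being published to other files.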
Handling Relocations
Relocations are entries that specify where and how to adjust addresses in the loaded sections. The apply_rel_or_rela function processes both .rel.* and .rela.* relocation sections. It utilizes the plain crate to parse raw bytes into Rel or Rela structures.
fn apply_rel_or_rela(
    obj: &goblin::elf::Elf,
    file_bytes: &[u8],
    is_rela: bool,
    load_base: usize,
    memory: &mut [u8],
    symbols: &[(String, goblin::elf::Sym)],
    global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
    // Pick relocation sections by header type. Matching on the name would not
    // work for the Rel pass, because ".rela.text" also starts with ".rel".
    let wanted_type = if is_rela {
        goblin::elf::section_header::SHT_RELA
    } else {
        goblin::elf::section_header::SHT_REL
    };
    let entry_size = if is_rela {
        std::mem::size_of::<Rela>()
    } else {
        std::mem::size_of::<Rel>()
    };

    for sh in &obj.section_headers {
        if sh.sh_type != wanted_type {
            continue;
        }
        let name = obj.shdr_strtab.get_at(sh.sh_name).unwrap_or("<unnamed>");
        println!("Processing relocation section: {}", name);

        let count = sh.sh_size as usize / entry_size;
        let mut offset = sh.sh_offset as usize;
        for _ in 0..count {
            let entry = &file_bytes[offset..offset + entry_size];
            offset += entry_size;

            // from_bytes returns &T (and fails if `entry` is misaligned for T);
            // Rel and Rela are Copy, so dereference to get owned values.
            let (r_offset, r_info, addend) = if is_rela {
                let rela: Rela = *from_bytes::<Rela>(entry)
                    .map_err(|e| anyhow!("Failed to parse Rela: {:?}", e))?;
                (rela.r_offset, rela.r_info, rela.r_addend)
            } else {
                let rel: Rel = *from_bytes::<Rel>(entry)
                    .map_err(|e| anyhow!("Failed to parse Rel: {:?}", e))?;
                // .rel entries carry no r_addend field (the addend lives in the
                // bytes being patched); this toy linker treats it as 0
                (rel.r_offset, rel.r_info, 0)
            };

            let sym_index = (r_info >> 32) as usize; // ELF64_R_SYM
            let r_type = (r_info & 0xffff_ffff) as u32; // ELF64_R_TYPE
            apply_one_reloc(
                r_offset as usize,
                sym_index,
                r_type,
                addend,
                load_base,
                memory,
                symbols,
                global_syms,
            )?;
        }
    }
    Ok(())
}
This function walks over all section headers and picks out the relocation sections (.rel.* or .rela.*). For each relocation entry, it parses the raw bytes into a Rel or Rela structure and splits r_info into a symbol index and a relocation type. It then delegates the actual patching to apply_one_reloc.
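plain::from_bytes hands back a reference, and it returns an error if the byte slice is not aligned for the target type. If you would rather not depend on the alignment of the file buffer, you can decode entries field by field. The sketch below is a hypothetical alternative (parse_rela64 is not part of goblin or plain). It reads a little-endian ELF64 .rela section into (r_offset, symbol index, type, addend) tuples.

```rust
/// Hypothetical helper: decodes a little-endian ELF64 .rela.* section into
/// (r_offset, symbol index, relocation type, r_addend) tuples.
pub fn parse_rela64(section: &[u8]) -> Result<Vec<(u64, u32, u32, i64)>, String> {
    const ENTRY_SIZE: usize = 24; // sizeof(Elf64_Rela): three 8-byte fields

    if section.len() % ENTRY_SIZE != 0 {
        return Err(format!(
            "section size {} is not a multiple of {}",
            section.len(),
            ENTRY_SIZE
        ));
    }

    // Copy 8 bytes into an array so no particular alignment is required
    fn read_u64(b: &[u8]) -> u64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(b);
        u64::from_le_bytes(buf)
    }

    Ok(section
        .chunks_exact(ENTRY_SIZE)
        .map(|e| {
            let r_offset = read_u64(&e[0..8]);
            let r_info = read_u64(&e[8..16]);
            let r_addend = read_u64(&e[16..24]) as i64; // reinterpret as signed
            let sym = (r_info >> 32) as u32; // ELF64_R_SYM
            let r_type = (r_info & 0xffff_ffff) as u32; // ELF64_R_TYPE
            (r_offset, sym, r_type, r_addend)
        })
        .collect())
}
```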
Patching Memory with Relocations
The apply_one_reloc function performs the actual memory patching. It calculates the final address for a symbol and updates the memory buffer accordingly.
fn apply_one_reloc(
    reloc_offset: usize,
    sym_index: usize,
    r_type: u32,
    addend: i64,
    load_base: usize,
    memory: &mut [u8],
    symbols: &[(String, goblin::elf::Sym)],
    global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
    let patch_addr = load_base + reloc_offset;
    println!("Applying reloc @ 0x{:x}, sym_idx {}, type {}, addend={}",
             patch_addr, sym_index, r_type, addend);

    // 1) Find the symbol name from sym_index
    let (sym_name, sym) = match symbols.get(sym_index) {
        Some(pair) => pair,
        None => {
            eprintln!("No symbol for index {}", sym_index);
            return Ok(()); // Skip relocations whose symbol index is out of range
        }
    };

    // 2) Resolve the symbol address
    let final_addr: u64 = if sym.st_shndx == 0 {
        // Imported symbol; look it up in the global symbol table
        if let Some(export) = global_syms.exports.get(sym_name) {
            export.address as u64
        } else {
            eprintln!("Symbol '{}' not found in global exports!", sym_name);
            0
        }
    } else {
        // Symbol defined in this file; compute its address from load_base
        (load_base + sym.st_value as usize) as u64
    };

    // Incorporate the addend into the relocation value (S + A)
    let reloc_value = final_addr.wrapping_add(addend as u64);

    // 3) Patch the memory buffer with the computed value (little-endian).
    //    This always writes a 64-bit absolute value, which is correct for
    //    R_X86_64_64 only; types such as R_X86_64_PC32 and R_X86_64_PLT32
    //    expect a 32-bit PC-relative value instead.
    let bytes = reloc_value.to_le_bytes();
    let dst = memory
        .get_mut(patch_addr..patch_addr + bytes.len())
        .ok_or_else(|| anyhow!("Relocation at 0x{:x} lies outside the memory buffer", patch_addr))?;
    dst.copy_from_slice(&bytes);

    println!("  -> Patched 0x{:x} with 0x{:x} (symbol={})",
             patch_addr, reloc_value, sym_name);
    Ok(())
}
This function begins by calculating the absolute address where the relocation needs to be applied. It then retrieves the symbol’s name and determines whether the symbol is local or imported. For imported symbols, it looks up the address in the global symbol table. Finally, it updates the memory buffer at the specified offset with the resolved address, taking into account any addend.
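apply_one_reloc writes every relocation as an 8-byte absolute address, which is only the right computation for R_X86_64_64. The x86-64 System V psABI defines other types as well. For example, R_X86_64_PC32 stores the 32-bit value S + A - P, where S is the symbol address, A the addend, and P the address being patched.

The sketch below shows type-aware patching for a handful of common types. The function name and signature are hypothetical. Treating R_X86_64_PLT32 like R_X86_64_PC32 assumes the target is reachable directly without a PLT, as in this toy setup.

```rust
use std::convert::TryFrom;

// x86-64 relocation type numbers from the System V psABI
pub const R_X86_64_64: u32 = 1;
pub const R_X86_64_PC32: u32 = 2;
pub const R_X86_64_PLT32: u32 = 4;
pub const R_X86_64_32: u32 = 10;
pub const R_X86_64_32S: u32 = 11;

/// Hypothetical type-aware patcher.
/// s = symbol address (S), a = addend (A), p = address of the patch site (P),
/// patch_off = index of the patch site inside `memory`.
pub fn patch(
    memory: &mut [u8],
    patch_off: usize,
    r_type: u32,
    s: u64,
    a: i64,
    p: u64,
) -> Result<(), String> {
    let value = s.wrapping_add(a as u64); // S + A
    let bytes: Vec<u8> = match r_type {
        // 64-bit absolute: S + A
        R_X86_64_64 => value.to_le_bytes().to_vec(),
        // 32-bit PC-relative: S + A - P (PLT32 handled the same, no PLT here)
        R_X86_64_PC32 | R_X86_64_PLT32 => {
            let rel = value.wrapping_sub(p) as i64;
            let rel32 = i32::try_from(rel)
                .map_err(|_| format!("PC-relative value {} does not fit in 32 bits", rel))?;
            rel32.to_le_bytes().to_vec()
        }
        // 32-bit absolute, zero-extended: S + A must fit in u32
        R_X86_64_32 => {
            let v = u32::try_from(value)
                .map_err(|_| format!("value {:#x} does not fit in 32 bits", value))?;
            v.to_le_bytes().to_vec()
        }
        // 32-bit absolute, sign-extended: S + A must fit in i32
        R_X86_64_32S => {
            let v = i32::try_from(value as i64)
                .map_err(|_| format!("value {:#x} does not fit in signed 32 bits", value))?;
            v.to_le_bytes().to_vec()
        }
        other => return Err(format!("unsupported relocation type {}", other)),
    };
    let end = patch_off
        .checked_add(bytes.len())
        .ok_or_else(|| "patch offset overflow".to_string())?;
    let dst = memory
        .get_mut(patch_off..end)
        .ok_or_else(|| format!("patch at {:#x} is outside the buffer", patch_off))?;
    dst.copy_from_slice(&bytes);
    Ok(())
}
```

For a call such as the one from foo to my_add, x86-64 compilers typically emit an R_X86_64_PLT32 (or R_X86_64_PC32) relocation with an addend of -4. The addend is -4 because the CPU computes the call target relative to the end of the 4-byte displacement.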
The Main Function
The main function orchestrates the loading and linking process. It initializes the global symbol table, loads each object file, and displays the resolved symbols.
fn main() -> Result<()> {
    let mut global_symbols = GlobalSymbolTable::new();

    // Load 'b.o' first, then 'a.o': a.o imports my_add, which b.o exports
    load_and_relocate_object("b.o", 0x20000, &mut global_symbols)?;
    load_and_relocate_object("a.o", 0x30000, &mut global_symbols)?;

    println!("\nDone loading both libraries!\n");
    println!("Global symbols known are:");
    for (name, sym) in &global_symbols.exports {
        println!(" - {} => address 0x{:x} (in file {})",
                 name, sym.address, sym.file_name);
    }
    Ok(())
}
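This function loads the object files one at a time. Symbols are resolved while each file is being relocated, so a file can only use exports from files loaded before it. That is why b.o, which defines my_add, is loaded before a.o, which calls it. After loading, main prints every exported symbol recorded in the global table.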
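If you swapped the two calls in main, my_add would still be missing from the table when a.o is relocated. A common way to remove this ordering dependency is to work in two passes: first collect every definition, then resolve references. The hypothetical link_two_pass below sketches that idea on a simplified, hand-built description of each object (ObjectInfo), not on goblin types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified description of one object file.
pub struct ObjectInfo {
    pub name: String,
    pub base: u64,                   // where the file is loaded
    pub defines: Vec<(String, u64)>, // exported symbol -> offset from base
    pub imports: Vec<String>,        // undefined symbols it references
}

/// Pass 1 collects every export; pass 2 resolves each file's imports against
/// the complete table, so the order of `objects` no longer matters.
/// Returns (file name, symbol) -> resolved address.
pub fn link_two_pass(
    objects: &[ObjectInfo],
) -> Result<HashMap<(String, String), u64>, String> {
    // Pass 1: gather all definitions (a later duplicate overwrites an earlier one)
    let mut exports: HashMap<&str, u64> = HashMap::new();
    for obj in objects {
        for (sym, offset) in &obj.defines {
            exports.insert(sym.as_str(), obj.base + offset);
        }
    }

    // Pass 2: resolve imports against the full table
    let mut resolved = HashMap::new();
    for obj in objects {
        for sym in &obj.imports {
            let addr = exports
                .get(sym.as_str())
                .ok_or_else(|| format!("undefined symbol '{}' in {}", sym, obj.name))?;
            resolved.insert((obj.name.clone(), sym.clone()), *addr);
        }
    }
    Ok(resolved)
}
```

With this structure, passing a.o before b.o still resolves my_add. A symbol that no file defines becomes an explicit error instead of a silent zero.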
Compiling and Running the Linker
Before running the linker, ensure that your object files (a.o and b.o) are in ELF format. On macOS, the default object file format is Mach-O. goblin can parse Mach-O, but it returns such files as Object::Mach rather than Object::Elf, and this toy linker only handles ELF. To generate ELF object files on macOS, you need to cross-compile them targeting Linux.
Compiling C Source Files to ELF Object Files
Assuming you have two C source files, a.c and b.c, where a.c references a function defined in b.c:
b.c
// b.c
int my_add(int x, int y) {
    return x + y;
}
a.c
// a.c
extern int my_add(int x, int y);
int foo(int val) {
    return my_add(val, 5);
}
Compile these files to ELF object files using clang with the appropriate target:
clang -c -target x86_64-linux-gnu b.c -o b.o
clang -c -target x86_64-linux-gnu a.c -o a.o
Ensure you have the necessary cross-compilation tools installed. On macOS, for example, installing LLVM with Homebrew (brew install llvm) provides a clang that can target Linux.
Addressing Common Issues
Handling Non-ELF Object Files on macOS
If you run the linker on Mach-O object files, goblin will not parse them as ELF, so load_and_relocate_object skips them with messages like:
Not an ELF file: b.o
Not an ELF file: a.o
To avoid this, ensure you’re using ELF-formatted object files by cross-compiling as shown above.
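Before handing a file to goblin, you can check its first bytes to see which format you actually have. ELF files start with 0x7f followed by "ELF". 64-bit Mach-O files on little-endian Macs start with the bytes cf fa ed fe, which is MH_MAGIC_64 (0xfeedfacf) stored little-endian. The detect_format helper below is a hypothetical sketch; goblin's Object::parse performs its own, more complete detection.

```rust
/// Hypothetical result of a quick magic-number check.
#[derive(Debug, PartialEq)]
pub enum ObjFormat {
    Elf,
    MachO64,
    Unknown,
}

/// Identifies an object file format from its first bytes.
pub fn detect_format(bytes: &[u8]) -> ObjFormat {
    if bytes.starts_with(b"\x7fELF") {
        ObjFormat::Elf
    } else if bytes.starts_with(&[0xcf, 0xfa, 0xed, 0xfe]) {
        // MH_MAGIC_64 (0xfeedfacf) stored little-endian, as on x86-64 and arm64 Macs
        ObjFormat::MachO64
    } else {
        ObjFormat::Unknown
    }
}
```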
Resolving Slice Out-of-Bounds Errors
During the relocation process, you might encounter errors indicating that a slice index is out of bounds. For example:
thread 'main' panicked at src/main.rs:91:23:
range end index 131090 out of range for slice of length 65536
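This occurs because section_start is computed as load_base + sh_addr. In an object file, sh_addr is typically 0, so b.o's .text starts at 0x20000 and ends at 0x20012 (131090). Both values lie past the end of the 65,536-byte (0x10000) buffer. To address this:
1. Increase the Memory Buffer Size: allocate enough space to cover the highest address you load into.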
let mut memory = vec![0u8; 2 * 1024 * 1024]; // 2 MB buffer
2. Adjust Section Placement: Instead of using sh.sh_addr directly, manage section placement within the buffer to ensure they fit.
let mut place_offset = 0;
for sh in &obj.section_headers {
    if sh.sh_size == 0 {
        continue;
    }
    if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
        if name == ".text" || name == ".data" || name == ".rodata" {
            // Place each section right after the previous one instead of at load_base + sh_addr
            let section_start = place_offset;
            let section_end = section_start + (sh.sh_size as usize);
            if section_end > memory.len() {
                anyhow::bail!("Out of space in memory buffer!");
            }
            let file_offset = sh.sh_offset as usize;
            let file_end = file_offset + (sh.sh_size as usize);
            memory[section_start..section_end]
                .copy_from_slice(&bytes[file_offset..file_end]);
            println!("Copied section {} into memory offset {:#x}..{:#x}",
                     name, section_start, section_end);
            place_offset = section_end; // Advance for the next section
        }
    }
}
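This places the sections one after another inside the buffer, so they no longer overlap. Without this change, every section would start at load_base, since sh_addr is typically 0 for each of them. Running out of space now produces a clear error instead of a slice panic. Note, however, that symbol values (st_value) and relocation offsets (r_offset) in a relocatable object are relative to their own section. A complete version would therefore remember where each section was placed and add that offset when computing addresses.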
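Building on that idea, the sketch below also honors each section's alignment (sh_addralign) and records where every section landed. That record is what you need to turn section-relative symbol values and relocation offsets into buffer offsets. SectionToLoad and place_sections are hypothetical names, and the capacity check mirrors the "Out of space" test above.

```rust
use std::collections::HashMap;

/// Hypothetical description of a section to load (taken from its header).
pub struct SectionToLoad<'a> {
    pub name: &'a str,
    pub size: usize,
    pub align: usize, // sh_addralign; 0 or 1 means no constraint
}

/// Places sections back to back, rounding each start up to its alignment,
/// and returns section name -> buffer offset where it was placed.
pub fn place_sections<'a>(
    sections: &[SectionToLoad<'a>],
    capacity: usize,
) -> Result<HashMap<&'a str, usize>, String> {
    let mut placements = HashMap::new();
    let mut cursor = 0usize;
    for s in sections {
        let align = s.align.max(1);
        let start = (cursor + align - 1) / align * align; // round up to alignment
        let end = start + s.size;
        if end > capacity {
            return Err(format!(
                "section {} needs {:#x}..{:#x}, but the buffer holds {:#x} bytes",
                s.name, start, end, capacity
            ));
        }
        placements.insert(s.name, start);
        cursor = end;
    }
    Ok(placements)
}
```

With such a map, a defined symbol's address would be load_base + placement of its section + st_value, rather than load_base + st_value. In practice you would key the map by section index, because st_shndx refers to sections by index.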
Ignoring Unrelated Relocations
Object files may contain relocation entries for sections like .rela.eh_frame, which pertain to debugging and unwinding information. These relocations reference symbols that your toy linker doesn't handle, resulting in messages like:
Applying reloc @ 0x20020, sym_idx 2, type 2, addend=0
No symbol for index 2
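To keep these messages out of your output, you have two options.
1. Skip these sections inside the section-header loop of apply_rel_or_rela, right after the section name is looked up:
if name == ".rela.eh_frame" || name == ".rela.debug_info" {
    println!("Skipping relocations in {}", name);
    continue;
}
2. Avoid generating them in the first place by compiling without unwind tables and debug information:
clang -c -target x86_64-linux-gnu a.c -o a.o -fno-asynchronous-unwind-tables -fno-exceptions -g0
clang -c -target x86_64-linux-gnu b.c -o b.o -fno-asynchronous-unwind-tables -fno-exceptions -g0
The second approach keeps those relocation sections out of the object files entirely, so your linker never sees them and the output stays clean.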
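Rather than listing every section to skip, you could invert the test and only process relocation sections whose target section you actually loaded. By convention, a relocation section is named after the section it patches: .rela.text patches .text. The authoritative link is the relocation section header's sh_info field, which holds the target section's index. The helpers below are a hypothetical, name-based sketch of that filter.

```rust
/// Hypothetical helper: returns the section a relocation section patches,
/// using the .rel<target> / .rela<target> naming convention.
pub fn reloc_target(name: &str) -> Option<&str> {
    if name.starts_with(".rela.") {
        Some(&name[".rela".len()..])
    } else if name.starts_with(".rel.") {
        Some(&name[".rel".len()..])
    } else {
        None
    }
}

/// Process a relocation section only if its target section was loaded.
pub fn should_process(name: &str, loaded: &[&str]) -> bool {
    match reloc_target(name) {
        Some(target) => loaded.contains(&target),
        None => false,
    }
}
```

With .text, .data, and .rodata loaded, .rela.text passes this filter, while .rela.eh_frame and .rela.debug_info are skipped without being named explicitly.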
Running the Linker
With the code set up and the object files in ELF format, running the linker should process the sections and apply relocations. An abridged example of the output might look like this. Exact addresses, symbol indices, and relocation types depend on your compiler and flags.
Loading file: b.o at base 0x20000
Copied section .text: 0x20000..0x20012
Copied section .data: 0x20100..0x201XX
Symbol 'my_add' exported at 0x20000 by b.o
Processing relocation section: .rela.text
Applying reloc @ 0x30014, sym_idx 4, type 4, addend=0
No symbol for index 4
Processing relocation section: .rela.eh_frame
Applying reloc @ 0x30020, sym_idx 2, type 2, addend=0
-> Patched 0x30020 with 0x20000 (symbol=my_add)
Done loading both libraries!
Global symbols known are:
- my_add => address 0x20000 (in file b.o)
- foo => address 0x30000 (in file a.o)
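In this output, b.o exports my_add at its load base (0x20000) and a.o exports foo at 0x30000. A relocation inside a.o is also patched with the address of my_add from b.o, which is the cross-file resolution the global symbol table exists for.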
Enhancing the Toy Linker
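While this example provides a foundational understanding of memory relocations and symbol resolution, real-world linkers handle many complexities beyond its scope. These range from section alignment and merging to a wide range of relocation types and dynamic linking.
To expand this toy linker, two natural next steps stand out. First, compute symbol and patch addresses from each section's actual placement. Second, handle relocation types individually instead of always writing a 64-bit absolute address. Examples include the 32-bit PC-relative R_X86_64_PC32 and R_X86_64_PLT32 relocations that x86-64 compilers use for calls.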
By manually parsing ELF object files and applying relocations in Rust, we’ve built a simplified linker that demonstrates the core principles of symbol resolution and memory patching.
You can find the complete program on my GitHub repo here.
Wanna talk? Leave a comment or drop me a message!
All the best,
Luis Soares luis@luissoares.dev
Lead Software Engineer | Blockchain & ZKP Protocol Engineer | 🦀 Rust | Web3 | Solidity | Golang | Cryptography | Author