That would warrant an entire blog post to describe all the pitfalls [1], but I'll condense it down to the highlights.
On MIPS, loading a pointer is classically done in two instructions, a LUI and an ADDIU, which forms a HI16/LO16 relocation pair I need to identify precisely in order to delink code. I'm using Ghidra's references in my analyzers, but these are attached to only one instruction operand, typically a register or an immediate constant.
So my MIPS analyzer has to traverse the graph of register dependencies for a reference within an instruction and find which two instructions are the relocation pair. It's trickier than it sounds because references can have an addend that's baked in the immediate constants (so we can't just search for the pattern of the address bits inside the instructions) and complex memory access patterns inside large functions can create large graphs (ADDU in particular generates two branches to analyze, one per source register). It's bad enough that I have one method inside my analyzer in particular that is recursive and takes six arguments, four of which rotate right one step at each iteration.
But that graph traversal can't be done in reverse program order, because there are instruction patterns that can terminate the graph traversal too early with the right mix of branches, instruction sequencing and register reuse. I've had to integrate code flow analysis to figure out which parent code block has to be actually considered during the register graph traversal.
But the most evil horror is the branch delay slot. One particular peephole optimizer consists of vacuuming up a branch target instruction inside a branch delay slot and shift the branch target one instruction forward, which effectively shortens the execution flow by one instruction. It also duplicates the instruction, which is catastrophic if it had a HI16 relocation because now we have LO16 relocations with multiple HI16 parents, which can't be represented by object file formats. I have to detect and undo that optimization on the fly by shifting the branch targets one instruction back, which I accomplish by adjusting the relocation addends for the branches.
I've only written relocation analyzers for x86 and MIPS so far. I don't know what other horrors are lurking inside other architectures, but I expect that all RISC architectures with split relocations will require some form of that register graph traversal and code flow analysis [2]. What I do know is that my MIPS relocation analyzer [3] is probably the most algorithmically complex piece of code I've ever written so far, one that I've rewritten a half-dozen times over two years due to all the edge cases that kept popping up. I also had to create an extensive regression test suite to keep the analyzer from breaking down in subtle ways every time I need to modify it. I expect that there are still edge cases to fix in there that I haven't encountered yet.
Wow I only really know amd64, arm64, and i8086. What is it about MIPS that makes it so evil?