I am a beginner at programming language implementation, but here's what I've learned.

The first two chapters of the LLVM tutorial are good enough to learn the idea of how to implement lexing and parsing:

https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index...

I wasn't targeting LLVM so I didn't do much except skim the LLVM stuff.

This document explains Pratt parsing well, which is useful for understanding how to parse expressions with operator precedence:

https://abarker.github.io/typped/pratt_parsing_intro.html
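To give a feel for the idea, here is a minimal Pratt-style sketch in C that evaluates integer arithmetic as it parses. It is not taken from the linked document or from my compiler; the names (Parser, binding_power, parse_expr, eval) and the binding power numbers are made up for illustration, and there is no error handling or division-by-zero check.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical minimal Pratt parser: evaluates + - * / over integers. */
typedef struct { const char *p; } Parser;

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Binding power of an infix operator; 0 means "not an infix operator". */
static int binding_power(char op) {
    switch (op) {
    case '+': case '-': return 10;
    case '*': case '/': return 20;
    default:            return 0;
    }
}

static long parse_expr(Parser *ps, int min_bp);

/* Prefix position: parenthesised expression, unary minus, or a number. */
static long parse_prefix(Parser *ps) {
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps, 0);
        skip_ws(ps);
        if (*ps->p == ')') ps->p++;
        return v;
    }
    if (*ps->p == '-') {
        ps->p++;
        return -parse_expr(ps, 30);   /* binds tighter than any infix op */
    }
    char *end;
    long v = strtol(ps->p, &end, 10);
    ps->p = end;
    return v;
}

/* Parse operators whose binding power is strictly greater than min_bp. */
static long parse_expr(Parser *ps, int min_bp) {
    long left = parse_prefix(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        int bp = binding_power(op);
        if (bp == 0 || bp <= min_bp) break;
        ps->p++;
        long right = parse_expr(ps, bp);  /* passing bp gives left-associativity */
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left /= right; break;
        }
    }
    return left;
}

long eval(const char *src) {
    Parser ps = { src };
    return parse_expr(&ps, 0);
}
```

Calling eval("1+2*3") recurses into the right-hand side with the binding power of "+", so the "*" is consumed there before the addition happens. A real front end would build AST nodes instead of computing values.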

At university I wrote a (poor) Motorola 68000 emulator, so I had a Motorola CPU textbook, and that's how I learned how instruction encoding works. I wrote a Java Swing GUI tool to create the opmasks for each instruction component. It was a strange way of doing it, but it worked.

For AMD64 (x86_64) I wrote some GNU Assembler and then reviewed the generated machine code with the following commands.

Put the C you want the assembly for in example.c:

  gcc -o example example.c

Put your GNU Assembler in example.S:

  gcc -o example example.S

Then run this and review example-text to see the machine code and opcodes beside the assembly:

  objdump -dj .text example > example-text

I used this page to work out the ModR/M format for opcodes:

https://www.cs.uaf.edu/2016/fall/cs301/lecture/09_28_machine...
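As a rough sketch of that format in C: the ModR/M byte packs a 2-bit mod field, a 3-bit reg field and a 3-bit r/m field, and in 64-bit code a REX prefix carries the operand-size bit plus the extra register bits. The helper names modrm and rex are hypothetical, not from the linked page, and this ignores SIB bytes and displacements.

```c
#include <stdint.h>

/* ModR/M layout: mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0. */
uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm) {
    return (uint8_t)(((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7));
}

/* REX prefix: 0100WRXB. W=1 selects 64-bit operand size; R and B are
   the fourth bit of the reg and r/m register numbers (for r8-r15).
   The X bit (SIB index extension) is left at 0 in this sketch. */
uint8_t rex(int w, int reg, int rm) {
    return (uint8_t)(0x40 | (w ? 0x08 : 0)
                          | (((reg >> 3) & 1) << 2)
                          | ((rm >> 3) & 1));
}
```

For example, with rax = 0 and rbx = 3, "add rax, rbx" (Intel operand order) using opcode 0x01 is the bytes rex(1, 3, 0), 0x01, modrm(3, 3, 0), which is 48 01 d8. That should match what objdump prints for the same instruction.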

I have a barebones toy compiler that compiles a simple mathematical expression to assembly. It does simple live range analysis and register allocation, and uses A-Normal Form as its representation:

https://replit.com/@Chronological/Compiler3
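To illustrate the general technique (this is a sketch under my own assumptions, not the code at that link): in A-Normal Form every intermediate result gets its own temporary, so a live range can be approximated as "from the defining instruction to the last instruction that reads it". Here temp i is assumed to be defined by instruction i, and the names Instr, live_ranges and allocate are hypothetical.

```c
#define NO_TEMP  (-1)
#define MAX_REGS 16

/* One ANF binding: temp i = op(temp a, temp b).
   Operands are earlier temps, or NO_TEMP for constants. */
typedef struct { int a, b; } Instr;

/* last_use[t] = index of the last instruction that reads temp t,
   or t itself if nothing ever reads it. */
void live_ranges(const Instr *code, int n, int *last_use) {
    for (int i = 0; i < n; i++) {
        last_use[i] = i;
        if (code[i].a != NO_TEMP) last_use[code[i].a] = i;
        if (code[i].b != NO_TEMP) last_use[code[i].b] = i;
    }
}

/* Greedy allocation: at instruction i, release registers whose temp
   is read for the last time at or before i, then give temp i the
   lowest free register. Returns -1 when registers run out (a real
   allocator would spill to the stack instead). */
int allocate(const int *last_use, int n, int nregs, int *reg_of) {
    int owner[MAX_REGS];
    if (nregs > MAX_REGS) return -1;
    for (int r = 0; r < nregs; r++) owner[r] = NO_TEMP;
    for (int i = 0; i < n; i++) {
        int chosen = -1;
        for (int r = 0; r < nregs; r++) {
            if (owner[r] != NO_TEMP && last_use[owner[r]] <= i)
                owner[r] = NO_TEMP;           /* live range has ended */
            if (owner[r] == NO_TEMP && chosen < 0)
                chosen = r;
        }
        if (chosen < 0) return -1;
        owner[chosen] = i;
        reg_of[i] = chosen;
    }
    return 0;
}
```

For (1 + 2) * (3 + 4) the ANF is seven bindings: t0 = 1, t1 = 2, t2 = t0 + t1, t3 = 3, t4 = 4, t5 = t3 + t4, t6 = t2 * t5. Because a dying operand's register is released before the destination is chosen, t2 can reuse the register of t0, which suits x86's two-operand instructions.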

I have the beginnings of a toy JIT compiler here which generates some machine code for MOVs and ADD but I haven't implemented much else... I need to implement C FFI.

http://github.com/samsquire/compiler

Here is the code that generates an "add" instruction:

https://github.com/samsquire/compiler/blob/main/jitcompiler....

I use a case statement to match on the source and destination registers and literally insert the opcode bytes into a malloc'd array.

You essentially end up with a nested switch statement.
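A cut-down sketch of that shape, limited to two registers so the nesting is visible. This is not the code from the repository; Reg, CodeBuf, codebuf_new, emit3 and emit_add are hypothetical names, and the bytes are the standard encodings of 64-bit ADD with opcode 0x01.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { RAX, RBX } Reg;   /* hypothetical two-register subset */

typedef struct { uint8_t *bytes; size_t len, cap; } CodeBuf;

CodeBuf codebuf_new(size_t cap) {
    CodeBuf b = { malloc(cap), 0, cap };
    assert(b.bytes != NULL);
    return b;
}

static void emit3(CodeBuf *b, uint8_t b0, uint8_t b1, uint8_t b2) {
    assert(b->len + 3 <= b->cap);   /* sketch: no growth, just a bounds check */
    b->bytes[b->len++] = b0;
    b->bytes[b->len++] = b1;
    b->bytes[b->len++] = b2;
}

/* Emit "add dst, src" (Intel operand order) for 64-bit registers.
   Each case is REX.W, opcode 0x01, then the ModR/M byte.
   Returns 0 on success, -1 for a combination not handled here. */
int emit_add(CodeBuf *b, Reg dst, Reg src) {
    switch (dst) {
    case RAX:
        switch (src) {
        case RAX: emit3(b, 0x48, 0x01, 0xC0); return 0;  /* add rax, rax */
        case RBX: emit3(b, 0x48, 0x01, 0xD8); return 0;  /* add rax, rbx */
        }
        break;
    case RBX:
        switch (src) {
        case RAX: emit3(b, 0x48, 0x01, 0xC3); return 0;  /* add rbx, rax */
        case RBX: emit3(b, 0x48, 0x01, 0xDB); return 0;  /* add rbx, rbx */
        }
        break;
    }
    return -1;
}
```

The nested switch grows quadratically with the number of registers, so computing the ModR/M byte from the register numbers is the usual next step. Also note that malloc'd memory is typically not executable, so to actually run the bytes they generally need to end up in a region mapped with execute permission (for example via mmap on Linux).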



