My experience crafting an interpreter with Rust (2021) (ceronman.com)
176 points by airstrike on March 25, 2023 | hide | past | favorite | 52 comments



Crafting Interpreters is the best compiler/interpreter book I've read. I had tried both the Dragon Book and Andrew Appel's "Modern Compiler Implementation in Java", and Crafting Interpreters feels like a masterclass in how to write an accessible book. The author just has a special talent for explaining complex concepts simply.


I have a question about the book. I haven't been able to ask anyone else because I don't know anyone else who is working through the book.

Is the first half of the book boring or is it just me?

I wasn't really expecting the scanner/lexer to be interesting, but the interpreter part feels like an absolute slog as well. I started with a lot of enthusiasm but just couldn't keep going after implementing functions. I downloaded someone else's JsLox implementation and was planning on just skipping to the VM part of the book, hoping to find it more engaging.

I'm also a pretty junior developer. I have about 5 years of experience since I started learning to code, so I just may not "get" it yet. But I found writing the interpreter incredibly boring.


It's possible the subject itself is "boring", and I have found other compiler books to mostly be intolerable.

Crafting Interpreters, IMHO, is the most accessible book on the subject. If you didn't find the implementation of the first part exciting, I'd suggest ignoring the book's code after reading the text, implementing it yourself, and later comparing with the book's implementation. I encountered several pleasant surprises using this approach.


Yeah I personally found the second half of the book much more interesting! My strategy was to read the first half of the book casually, but then actually implement the second half (in Rust, like the article here).


Compiler writing does involve a fair amount of tedium. I don't think it has anything to do with experience, either. It's why tools like Yacc, Lemon, or ANTLR exist.


May I suggest Rob Pike's video on writing a lexer? It will make the task far more interesting.

https://youtu.be/HxaD_trXwRE


Except where comparisons are being made, the first and second halves are mostly independent. If the tree-walking interpreter isn't what you hoped for, you can, in principle, skip finishing it. The bytecode VM half is a re-implementation (in C instead of the Java used in the first half), and only the scanner/tokenizer has some inevitable redundancy between the two. For me, having had little prior knowledge of interpreters, it was still good to have fully gone through both halves, so I could relate to the comparisons and trade-offs of both implementations.


Perhaps you had no motivation for writing an interpreter? Almost all programming tasks of any interest require you to have some special problem to solve.

I really don't know how to describe this, but if you have no specific goal, you are not going to do, or learn how to do, anything. Why do you deeply, deeply want to learn how to write an interpreter? What problem will writing an interpreter solve for you?


Nothing in particular other than knowledge that I can use across domains. I don't want to know how to write an interpreter per se, but I want to understand the steps from text to machine code.

I think it has already yielded benefits in my better understanding of environments and lexical scoping, grammars + LALR. It has just given me different conceptual tools that I can use to learn new concepts.

A concrete example was that I was playing around with a recursive prompt with ChatGPT the other day trying to figure out if I could get it to write a more specific version of a prompt I had given it and sort of bootstrap its way to a very specialized "train-of-thought". I was thinking about grammars and how recursive prompting is sort of like a meta-grammar. It's just a new way of thinking about things.


Could be worth a try. I didn't find the first half as boring as you did, but I did find the second part more fun.


One tip: Copilot is absurdly good at giving you interpreter/compiler boilerplate. You can definitely use it to speed through the boring parts.


Well, the Dragon Book is not very good, and it's only really applicable if you want to write a compiler for an imperative language in an imperative language.


Modern Compiler Implementation in Java is not really comparable to Crafting Interpreters, is it? I've read the former cover to cover (the ML version) and I've looked through Crafting Interpreters, and I don't feel there is much overlap between the two.


> I would often write huge refactorings thinking that they would be a solution to a borrow checker error, just to get another one at the end of the refactoring.

Yeah this can often be a problem because the borrow checker doesn't kick in until the refactoring is done and no other errors are left.


Proficiency in Rust requires learning what won't work, rather than trying various things until they compile. Borrow checking is a set of rules to design for, rather than an oracle to test against.

It does take time and effort to really internalize these rules. But eventually you should be able to build a mental model of how ownership and borrowing work: consider what owns the data, whether it can be borrowed within statically known scopes, and whether it will need to mix sharing and mutation, and then apply Rust-specific types and design patterns to match.

Conversely, there are things that you just can't do, like using temporary references for parent-child relationships. Knowing this will save you time from even trying.

Perhaps learning materials need to be more explicit about it. For example, I suspect lots of people are discovering the limitation of self-referential structs the hard way.


Count me in. I was doing Advent of Code and tried to solve one problem with a tree that had parent references. Much thinking went into trying to please the borrow checker, with no results; I added some interior mutability and got stuck.

I agree that it should be more prominent somewhere. I got most of my info about that from Stack Overflow, and I tried using a hashmap to simulate the tree structure, but I don't think I finished that one.
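For what it's worth, the index-based workaround that often comes up for this is fairly compact. A minimal sketch (all names here are illustrative, not from any particular crate) of a tree whose nodes live in a Vec and refer to parents by index, so the borrow checker never sees a self-referential structure:

```rust
// Sketch of a parent-linked tree using indices instead of references.

struct Node {
    value: i32,
    parent: Option<usize>, // index into `Tree::nodes`
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    // Adding a node only needs `&mut self`; no reference to an
    // existing node is held across the push.
    fn add(&mut self, value: i32, parent: Option<usize>) -> usize {
        self.nodes.push(Node { value, parent });
        self.nodes.len() - 1
    }

    // Walking up to the root is a loop over indices.
    fn path_to_root(&self, mut id: usize) -> Vec<i32> {
        let mut path = vec![self.nodes[id].value];
        while let Some(p) = self.nodes[id].parent {
            path.push(self.nodes[p].value);
            id = p;
        }
        path
    }
}

fn main() {
    let mut tree = Tree::new();
    let root = tree.add(1, None);
    let child = tree.add(2, Some(root));
    let leaf = tree.add(3, Some(child));
    assert_eq!(tree.path_to_root(leaf), vec![3, 2, 1]);
}
```

The trade-off is that indices, unlike references, can dangle logically (point at a removed or replaced node), which is exactly what generational arenas address.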


This was my experience too. Although, it's less of an issue now. It was definitely frustrating, though.


This is really interesting!

I wrote a mini Lisp interpreter in Rust while following along with the book! I ended up using a generational arena to store objects too, but I used a "slotmap", which I wrote myself but is similar to the slotmap crate (see link below).

I am surprised and also sad that the arena had such a large performance impact, because it's a very nice pattern and lets you completely avoid unsafe code and raw pointer wrangling.

I do wonder whether changes could have been made to speed up the safe arena-style object store! Ideally, accessing objects would be as fast as a regular index into a vector, which should be pretty fast.

Boxing the inner object in the GcHeader is going to make accessing objects through the Gc do some extra pointer chasing. I think you could avoid this, and the dynamic dispatch in general, using enums and static generics!

Anyways this was fun to read, thanks!

https://crates.io/crates/slotmap
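For readers unfamiliar with the pattern, here is a minimal generational-arena sketch (deliberately much simpler than the slotmap crate; all names invented for illustration). Stale keys are detected by comparing generations, which is what makes reusing slots safe:

```rust
// Minimal generational-index arena. Lookup is an index plus one
// generation compare, i.e. close to a plain Vec index.

#[derive(Clone, Copy, PartialEq, Debug)]
struct Key {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // indices of vacated slots, ready for reuse
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Key { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Key { index: self.slots.len() - 1, generation: 0 }
        }
    }

    fn get(&self, key: Key) -> Option<&T> {
        let slot = self.slots.get(key.index)?;
        if slot.generation == key.generation {
            slot.value.as_ref()
        } else {
            None // key is stale: the slot was reused
        }
    }

    fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index)?;
        if slot.generation != key.generation {
            return None;
        }
        slot.generation += 1; // invalidate outstanding keys
        self.free.push(key.index);
        slot.value.take()
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("lox");
    assert_eq!(arena.get(a), Some(&"lox"));
    arena.remove(a);
    assert_eq!(arena.get(a), None); // stale key no longer resolves
    let b = arena.insert("clox");   // slot reused, new generation
    assert_eq!(arena.get(b), Some(&"clox"));
}
```

The extra generation check on every access is part of where the overhead relative to raw pointers comes from.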


A downside of Rust for applications like virtual machines is that it doesn't have a goto statement. Ideally, both a labelled and a computed goto would be available in the language.


Not only does it not have computed goto (almost no languages do, tbh), it also doesn't have guaranteed tail-call elimination, nor does it allow you to use the GHC calling convention. Those are more egregious, imo.


Why is the lack of "GHC calling convention" egregious? I've never heard this before as an argument against Rust.


I was looking for that (tail recursion), and it seems the story and feature request go back years and it still hasn't been added. Somewhere someone wrote that the issue is with LLVM, but can't the tail call be transformed into an iteration before that?


The most practical uses of recursive functions are those that cannot easily be eliminated. For an interpreter, that could mean dozens of functions all calling each other in a pattern determined at runtime.

The issue used to be with LLVM, yes, but that appears to be solved (see Clang's 'musttail' attribute). Instead, it's now more about language design: constructors, destructors, and all of the fancy language features Rust has must work with this new feature, with no changes visible elsewhere.


Tail call optimization isn’t just for recursive functions. Any tail call can be optimized to little more than a jump, which makes implementing threaded code or similar much nicer. Your compiled code will just push the instruction pointer along an array of opcodes in a tight loop with no function call overhead.
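In Rust today, lacking guaranteed tail calls and computed goto, the usual fallback is a single dispatch loop over the opcode array. A hedged sketch (the opcodes are invented for illustration, not from any real VM):

```rust
// Without guaranteed tail calls, a Rust bytecode interpreter typically
// dispatches through one match inside a loop. With computed goto or
// tail calls, each handler would instead jump directly to the next.

#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Halt,
}

fn run(code: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0; // instruction pointer walks the opcode array
    loop {
        match code[ip] {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Halt => return stack.pop().unwrap(),
        }
        ip += 1;
    }
}

fn main() {
    let code = [Op::Push(2), Op::Push(3), Op::Add, Op::Halt];
    assert_eq!(run(&code), 5);
}
```

The complaint upthread is that every instruction funnels back through the single match, where a tail-called or goto-threaded dispatch gives each handler its own jump, which can help branch prediction.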


Exactly, though I do not see how it is not an example of recursion.


> it also doesn't have guaranteed tail-call elimination

I had no idea about that. Is there a specific reason why that choice was made?


AFAIK guaranteed TCE wasn't supported in LLVM at all when Rust was still being designed and considered adding it. Support for musttail in LLVM has been added relatively recently.

WASM doesn't support guaranteed TCE yet (it's a proposal), and that's an important target for Rust.

https://github.com/rust-lang/rfcs/issues/2691


`become` is still a reserved keyword for this in Rust, so there's that.


I solved it in the case of an interpreter with

  'outer: loop {
    loop {
      let res = self.interpreter_loop(…);
      match res {
        Continue => {}
        Return => break 'outer,
        Exception => break,
      }
    }
  }

I also made sure that the interpreter_loop call had an inline pragma, but it was otherwise a naive switch-case over the instructions. I didn't find a way to make a template interpreter with safe Rust.


I think a bigger issue, from what I have looked at, is the complexity of encoding/decoding bytes all over the place.

I suspect that if union were nicer to use and worked the same as enum in Rust, it would do us a favor! Also, I wish I could write this easily:

    enum Tag {
        Int,
        Bool
    }

    enum Data {
        Int(Vec<i32>),
        Bool(Vec<bool>)
    }

    struct Val {
        tag:Tag,
        data:Data
    }

    let n = val.as_i32::<Tag::Int>(&self) -> Vec<i32>


There are some crates that let you do that. For example enum-ptr

https://crates.io/crates/enum-ptr


As someone not versed in Virtual Machine development, is goto so important? What do you gain by using it?


> What do you gain by using it?

Performance, although this possibly depends on your compiler, whether you use PGO, and similar finicky issues.

Example: https://eli.thegreenplace.net/2012/07/12/computed-goto-for-e...

Some prior HN discussion: https://news.ycombinator.com/item?id=18678920

Another example where goto is relevant is implementing finite automata. A (very short) paper from 1988 that discusses three different ways of implementing a finite state machine is "How (Not) to Code a Finite State Machine". The documentation of RE2C may be even more interesting: https://re2c.org

RE2C is a program that compiles finite automata into C, Go, or Rust code. It provides many implementation strategies: it can make use of computed or labelled gotos when the language provides them.

Implementing pushdown automata comes with similar issues.
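Without goto, a hand-written DFA in Rust usually ends up as a state enum plus a match inside a loop. A small hedged sketch (recognizing simple signed integers; the states and function name are invented for illustration):

```rust
// A hand-rolled DFA, written the way you would without goto: an
// explicit state enum and a match in a loop. re2c-generated code
// with labelled gotos would jump between states directly instead.

#[derive(Clone, Copy)]
enum State {
    Start,  // nothing consumed yet
    Sign,   // consumed a leading + or -
    Digits, // consumed at least one digit
}

fn is_signed_integer(input: &str) -> bool {
    let mut state = State::Start;
    for c in input.chars() {
        state = match (state, c) {
            (State::Start, '+' | '-') => State::Sign,
            (State::Start | State::Sign | State::Digits, '0'..='9') => State::Digits,
            _ => return false, // dead state: reject immediately
        };
    }
    matches!(state, State::Digits) // accept only with at least one digit
}

fn main() {
    assert!(is_signed_integer("42"));
    assert!(is_signed_integer("-7"));
    assert!(!is_signed_integer(""));
    assert!(!is_signed_integer("+"));
    assert!(!is_signed_integer("4x2"));
}
```

The loop-plus-match form forces every transition back through the dispatch point, which is the overhead the goto-based strategies in RE2C avoid.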


Goto is quite a powerful tool for having a full, imperative grip on control flow. It is often useful due to its simple yet powerful ability to "just" work. Virtual machines often encounter situations where tight control over the flow can be advantageous for performance and similar reasons.

Goto has, IIRC, been a point of discussion for Rust from the language-design point of view. I don't think it promotes good design for most applications, but YMMV.


In what situations do you miss goto the most, when implementing a VM?


I like how the author identified that his program was slower than he thought it should be, profiled, and identified some low-hanging fruit. He tried quite a lot of things, some of which panned out and some of which did not. Interpreters and Rust aside, this is a perfect example of software performance work.


I have also made an interpreter using Rust. https://github.com/loda-lang/loda-rust/tree/develop/rust_pro...

You can try the interpreter in the browser for computing the prime numbers: https://loda-lang.org/edit/?oeis=40

The language is "LODA", a math AI, using OEIS as training data. https://loda-lang.org/


I'm curious about the extent to which DoS (denial of service) is a concern for an interpreter like this. I can see it being an issue if you're exposing a service to external actors, but how big of a concern is it really in this context?


It was a big problem for web-facing services in interpreted languages "back in the day." All the dynamic languages basically fixed this problem ten years ago (or 20 years ago for Perl): https://lwn.net/Articles/474912/

If you weren't going to expose clox to the web or other untrusted input (and why would you), it's not a concern.


I went down that rabbit hole myself after reading the article. This sums up the situation: https://sudonull.com/post/127497-Non-cryptographic-hash-func...


I'm going to assume that at least one popular interpreter uses a weak hash function for its hashtables/objects, and I haven't ever heard about it being an issue.


> While the clox compiler does prevent a lot of these cases, it does not prevent stack overflows. The author intentionally decided to skip this check because it would require a lot of boilerplate code. Real-world interpreters using unsafe structures like these must absolutely do it though.

You can do both overflow and underflow checks without any efficiency loss by using virtual memory page faults to "catch" the resulting access (which is in any case a fatal error).


Only if under/overflow can be bounded. E.g., with stack probing.


Does Crafting Interpreters require preexisting C knowledge? I'm curious to follow the book, but my experience with C is beginner level at best.


It doesn't. Half of the book is not even in C (the book is half Java, for the simpler interpreter, and half C, for the bytecode compiler + VM). The book explains what the code needs to do very, very clearly, so even without looking at the C code you would gain a lot of value.


I think you should at least understand the basic concepts like loops, variables, pointers and whatnot before trying to follow the book, otherwise it may be a little rough.

You definitely don't need any expert knowledge though! Also following the book will increase your C knowledge as you go (at least in the second half).


I have the book but haven't gone through it. When I do eventually get to it, I will probably ignore all of the implementation languages and instead choose something like F# or Racket. You could do something similar.


I've been teaching a class for the last few weeks with a dozen people who have never programmed before, using that book. It's going well.


https://github.com/vishpat/lisp-rs

A simple Lisp Interpreter in Rust


.


Seems up to me at the moment.


nope, works



