Parsers are one of those fields I wish I had the time to really study deeply. I'm especially impressed with LALRPOP.
Cribbed from a previous comment of mine:
From time to time I find myself revisiting this thread: "Writing parsers like it is 2017"[0].
In particular I love a Rust parser generator called LALRPOP and its emphasis on diagnosing ambiguous grammars [1].
> What I’ve tried to do now in LALRPOP is to do that clever thinking for you, and instead present the error message in terms of your grammar. Perhaps even more importantly, I’ve also tried to identify common beginner problems and suggest solutions.
They work out a fairly deep example with error guidance in ambiguous grammars in that post.
Cool idea. The way it handles alternatives looks awkward since each alternative is a field in a struct. You have to check each field to find the one that's not nil.
(See Term in the example.)
Normally this would be done with an interface, like in go/ast
Although, on second thought, this means writing a chain of if-else statements instead of a type switch in a visitor. Maybe it's not that big a difference?
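For what it's worth, the go/ast-style alternative can be sketched like this (the type and method names here are made up for illustration, not taken from any real library): a marker interface plus one concrete type per alternative, so consumers write a type switch instead of checking which struct field is non-nil.

```go
package main

import "fmt"

// Term is a marker interface; each grammar alternative gets its own
// concrete type, in the style of go/ast's Node/Expr interfaces.
type Term interface{ isTerm() }

type NumLit struct{ Value int }  // a number literal alternative
type Var struct{ Name string }   // a variable reference alternative

func (NumLit) isTerm() {}
func (Var) isTerm()    {}

// describe dispatches on the concrete type with a type switch,
// rather than testing struct fields for nil.
func describe(t Term) string {
	switch n := t.(type) {
	case NumLit:
		return fmt.Sprintf("number %d", n.Value)
	case Var:
		return fmt.Sprintf("variable %s", n.Name)
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(NumLit{Value: 42}))
	fmt.Println(describe(Var{Name: "x"}))
}
```

Whether the type switch is really better than a chain of nil checks is, as the parent says, debatable; the main win is that the compiler knows the set of alternatives.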
I've used this before - really nice library and an easy way of getting started without having to write so much custom code. The only downside is that, of course, a hand-written parser is still faster :)
This looks really cool. I spent a few weeks last year trying to build my first programming language, but I kept getting bogged down by parsing and eventually gave up.
You should check out Thorsten Ball's interpreterbook.com and compilerbook.com. I went through the former - it's amazing. I'm eagerly awaiting the release of the latter.
It walks through the creation of an entire interpreter from scratch using just core Go packages, including test coverage.
Parsing isn't actually that hard once you wrap your head around the problem. There are lots of good guides for writing parsers online which really helped me out when I was feeling the same burden as you are.
So what I'm trying to say is stick with it and I'm sure you'll crack it. :)
Definitely. I used to always get stuck on the parsing part, and it definitely is time consuming to get right, but once you get past it once it is much easier to tackle in the future.
Then you realize that parsing is only a small piece of making your own language!
The Eureka moment for me was learning that a parser isn't a single entity. It's reading tokens, making an abstract syntax tree, etc. Once I learned how to break the problem down I found writing parsers very easy.
As for it only being a small part of making your own language, that is so true too. However I do think it's a rewarding project to tackle.
The ability to parse is one of the things that differentiates "power programmers" from people just hooking things together. If you wish to be one of the former, it is on the Must Study list.
It is true that many, perhaps even most, programmers can go their entire career without having to write their own code to take a text file and view it as a tree of tokens. This is even more true than ever with so many off-the-shelf serialization options, meaning that creating your own new one had better have a really, really darned good justification.

But if you conceptualize a parser as "a thing that receives a stream of low-level, primitive inputs and needs to extract sensible state information and know when constraints have been violated"... ah, now that sounds pretty useful, right? And very modern, as we continue our inevitable march towards viewing the world in terms of event streams rather than blobs of completed input that we can take in all at once. The generalized parsing problem of taking a stream of primitives and extracting higher-level meaning from them is as relevant as ever, if not more so.
How do you write a parser in a language like Go? I stumbled upon parser combinators and tried to shoehorn them into Go, but the lack of generics mostly prohibited this. I also tried imperatively writing the parser, but this was tedious and error prone. I also explored parser generators, but these seemed to have huge learning curves and still produced awful code. Beyond that, it’s hard to tell what kind of parser I need because of all of the academic jargon. :/
I would love to be pointed in the right direction!
The first thing you need to do is scan your file for tokens (e.g. "{", "}", "[", "]", ",", "\"", etc. in the case of JSON).
Then you traverse your tokens and turn them into an AST (Abstract Syntax Tree), which is essentially just a "struct" version of the code (in the dumbest description imaginable). The point of this is that you start adding definitions to your tokens. E.g. what does a "[" mean?
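The scanning step can be sketched in a few lines of Go (this is an illustrative toy, not a real JSON scanner): it turns a JSON-ish input into a flat list of tokens, grouping digit runs into number tokens and skipping whitespace. Real JSON scanning would also need strings, floats, and true/false/null.

```go
package main

import "fmt"

// scan splits input into single-character punctuation tokens and
// multi-character number tokens, discarding whitespace.
func scan(input string) []string {
	var tokens []string
	i := 0
	for i < len(input) {
		c := input[i]
		switch {
		case c == ' ' || c == '\t' || c == '\n':
			i++ // skip whitespace
		case c == '{' || c == '}' || c == '[' || c == ']' || c == ',' || c == ':':
			tokens = append(tokens, string(c)) // single-character token
			i++
		case c >= '0' && c <= '9':
			j := i
			for j < len(input) && input[j] >= '0' && input[j] <= '9' {
				j++ // consume the whole digit run as one token
			}
			tokens = append(tokens, input[i:j])
			i = j
		default:
			i++ // this sketch silently ignores anything else
		}
	}
	return tokens
}

func main() {
	fmt.Println(scan("[10, 2]"))
}
```

A parsing pass would then walk this token slice and build the AST structs described above.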
From there you can start writing your real business logic since you now have a Go struct version of your code which you can traverse.
However, I would really recommend googling how to write your own compiler. There are dozens of good resources out there (I can't recall the one that I used off hand) and they'll explain this process a thousand times better than I have above.
One last thing to note, the theory of writing a parser is pretty language agnostic so don't get put off that one guide is in C or another in Java or Python or LISP or whatever. As long as you get the theory nailed you can write the Go code equivalent without too much headache.
Parser combinators don't need generics, as they are merely higher-order recursive descent parsers. I.e. recursive parsers where functions are created via other functions to generalize and compose them. Go is expressive enough to do that.
For example, let's say we start with a function that parses a single character, such as '[':

    CHAR := func(c byte) func() bool {
        return func() bool {
            if pos < len(buf) && buf[pos] == c {
                pos++
                return true
            }
            return false
        }
    }

(Here buf and pos are a shared input buffer and cursor; a real implementation would carry them in a context structure.)

Now let's do the same for parsing a sequence. We combine other parser functions and return one that runs them one by one.

    SEQ := func(args ...func() bool) func() bool {
        return func() bool {
            for _, f := range args {
                if !f() {
                    return false
                }
            }
            return true
        }
    }
    parser := SEQ(
        CHAR('['),
        SEQ(
            CHAR('1'),
            CHAR('2'),
            CHAR('3'),
        ),
        CHAR(']'),
    )
This is the idea.
This is not a real world example though. You'll need more primitives, plus a wrapper that handles various things: restoring the buffer index on failure, captures that save the starting index, moving buf/pos into a context structure, a virtual stack to store captures/errors, etc.
But it's important to get there on your own. And it's pretty much as fast and as flexible as completely hand-written recursive descent parsers and as powerful as parser generators.
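As one concrete example of the "restore buffer index on failure" wrapper, here is a sketch of an ordered-choice combinator (call it ALT; the name and the package-level buf/pos are my own illustrative choices, reusing the CHAR/SEQ primitives from above):

```go
package main

import "fmt"

// Shared input and cursor; a real parser would keep these in a
// context struct rather than package-level variables.
var buf string
var pos int

// CHAR returns a parser that matches one byte and advances the cursor.
func CHAR(c byte) func() bool {
	return func() bool {
		if pos < len(buf) && buf[pos] == c {
			pos++
			return true
		}
		return false
	}
}

// SEQ runs its sub-parsers one by one, failing on the first failure.
func SEQ(args ...func() bool) func() bool {
	return func() bool {
		for _, f := range args {
			if !f() {
				return false
			}
		}
		return true
	}
}

// ALT tries each alternative in order, rewinding pos after a failed
// attempt so the next alternative starts from the same position.
func ALT(args ...func() bool) func() bool {
	return func() bool {
		for _, f := range args {
			saved := pos
			if f() {
				return true
			}
			pos = saved // backtrack
		}
		return false
	}
}

func main() {
	buf, pos = "(x)", 0
	parens := ALT(
		SEQ(CHAR('['), CHAR('x'), CHAR(']')),
		SEQ(CHAR('('), CHAR('x'), CHAR(')')),
	)
	// The first alternative fails partway through; ALT rewinds and
	// the second alternative matches the whole input.
	fmt.Println(parens())
}
```

Without the rewind, a partially-matched failed alternative would leave the cursor in the middle of the input and corrupt every later attempt.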
Is that a parser? That looks like a matcher. I actually built something that looks almost exactly like this, but it doesn't get you an AST, right? What am I missing?
To get to an AST you have to create AST nodes. Make a function that captures the output and put it in the places where you want to create AST nodes. Sort of like this:

    SEQ(
        ...
        CAPTURE(
            // parser to capture
            SEQ(...),
            // extra arguments:
            // a name, a function to call
            // that creates an AST node, etc.
        ),
    )
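A minimal concrete version of that CAPTURE idea, reusing the same illustrative buf/pos/CHAR/SEQ primitives (all names here are made up for the sketch): it records the span of input the wrapped parser matched and hands it to a callback, which is where you would construct an AST node.

```go
package main

import "fmt"

// Shared input and cursor, as in the earlier sketch.
var buf string
var pos int

func CHAR(c byte) func() bool {
	return func() bool {
		if pos < len(buf) && buf[pos] == c {
			pos++
			return true
		}
		return false
	}
}

func SEQ(args ...func() bool) func() bool {
	return func() bool {
		for _, f := range args {
			if !f() {
				return false
			}
		}
		return true
	}
}

// CAPTURE remembers where the wrapped parser started, and on success
// passes the matched text to emit, which can build an AST node.
func CAPTURE(p func() bool, emit func(text string)) func() bool {
	return func() bool {
		start := pos
		if !p() {
			return false
		}
		emit(buf[start:pos])
		return true
	}
}

func main() {
	buf, pos = "[123]", 0
	var captured []string
	parser := SEQ(
		CHAR('['),
		CAPTURE(
			SEQ(CHAR('1'), CHAR('2'), CHAR('3')),
			func(text string) { captured = append(captured, text) },
		),
		CHAR(']'),
	)
	fmt.Println(parser(), captured)
}
```

In a real parser the callback would push a node onto a stack (or return it) instead of appending raw strings, but the shape is the same.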
[0]: https://news.ycombinator.com/item?id=15016061
[1]: http://smallcultfollowing.com/babysteps/blog/2016/03/02/nice...