Ohm – A library and language for building parsers, interpreters, compilers, etc. (github.com/harc)
348 points by testing_1_2_3_4 on March 27, 2021 | 100 comments



Ohm’s key selling point for me is the visual editor environment, which shows how the parser is executing on various sample inputs as you modify the grammar. It makes writing parsers fun rather than tedious. One of the best applications of “live programming” I’ve seen.

https://ohmlang.github.io/editor/


A lot of regex testers do this and I can't imagine writing a regex or a parser without.


>a parser without

can you show me a parser generator that produces this kind of visualization?


Not a syntactic parser generator, but [chevrotain](https://chevrotain.io/playground/) can generate flow diagrams for your ruleset. IIRC not just in the playground, but in general.


ANTLR has several tools that produce this kind of visualization; VS Code has some great plugins for it.


great, thanks!


I used to debug the parsing process for a VHDL grammar (which is ambiguous at the lexeme level) with parser combinators and the Haskell REPL.

Whenever my "compiler" found a syntax error in the test suite, I could load the part of the source around the error and investigate where my parser's error or omission was by running parsers for smaller and smaller parts of the grammar on smaller and smaller parts of the input.

It was 12 years ago.

And yes, it is fun. ;)


I'm so happy to see this on HN. I've used Ohm for several projects. If you want a tutorial for building a simple programming language using Ohm, check out this series I put on GitHub.

https://github.com/joshmarinacci/meowlang
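
For a taste of the workflow: the core ohm-js loop is define a grammar, match input, attach semantics. A minimal sketch along the lines of the official arithmetic tutorial (illustrative only, not code from meowlang):

    // Sketch of the typical ohm-js workflow (TypeScript).
    import * as ohm from 'ohm-js';

    const g = ohm.grammar(`
      Arith {
        Exp    = AddExp
        AddExp = AddExp "+" num  -- plus
               | num
        num    = digit+
      }
    `);

    const semantics = g.createSemantics().addOperation('eval', {
      AddExp_plus(left, _op, right) { return left.eval() + right.eval(); },
      num(_digits) { return parseInt(this.sourceString, 10); },
    });

    const m = g.match('1 + 2 + 3');
    if (m.succeeded()) console.log(semantics(m).eval()); // 6

Rules like `Exp = AddExp` need no action because Ohm passes single-child nodes through automatically, which keeps the semantics roughly as small as the grammar.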


Compiler compilers are great, I love writing DSLs for my projects. I usually use yacc/lex, or write my own compiler (typically in go these days).

However (and this is just me talking), I don't see the point in a javascript-based compiler. Surely any file format/DSL/programming language you write will be parsed server-side?


> I don't see the point in a javascript-based compiler

JavaScript is a full programming language. Why wouldn't it be a fine choice to write a compiler in? People have a funny idea that compilers are unusually complex software, or somehow low-level. In reality they're conceptually simple: as long as your language lets you write a function from one array of bytes to another array of bytes, you can write a compiler in it. And for practicalities beyond that you just need basic records or objects or some other kind of structure, and you can have a pleasant experience writing a compiler.
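
To make that concrete, here's the whole idea at toy scale (a deliberately silly sketch, just to show that "compiler" means "function from source text to target text"):

    // A one-instruction compiler: source text in, JavaScript out.
    function compile(source: string): string {
      // "Parse": split a line like "add 2 3" into tokens.
      const [op, a, b] = source.trim().split(/\s+/);
      // "Code generation": emit JS for the one instruction we support.
      if (op === 'add') return `console.log(${Number(a)} + ${Number(b)});`;
      throw new Error(`unknown instruction: ${op}`);
    }

    console.log(compile('add 2 3')); // prints: console.log(2 + 3);

Everything a production compiler adds (ASTs, symbol tables, optimization passes) is elaboration of that same shape.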

> Surely any file format/DSL/programming language you write will be parsed server-side?

JavaScript can be used user-side, or anywhere else. It's just a regular programming language.


> I don't see the point in a javascript-based compiler

TypeScript, Sass, JSX... There are a lot of languages running on top of JS. Or you might want to do colorizing or autoformatting on input in the browser?

Along with all that, there are, as mentioned, Node.js and Deno for running server-side.

But at any rate - lots of front-end problems involve various kinds of parsing/validation and transformation (eg: processing.js).


> Why wouldn't it be a fine choice to write a compiler in?

JavaScript doesn't seem suited to compiler construction because it lacks many of the features that make it pleasant (e.g. strong, rich types, algebraic data types, etc.).

It might be "fine" but it's not "good".


I interned with the PI behind Ohm (Alex Warth) and one of his reasons for using the browser was simple:

“If I send someone an executable, they will never download it. If I send them a URL, they have no excuse.”


We are talking about a compiler here.

If someone interested in a compiler doesn't download it, it's not an excuse, it's a filter. Or a warning sign.


You know all those jokes that people like Linus make about Real Programmers—the ones who have hair on their chests, etc—you know those are all jokes, right? Jokes in the laughing-at-them sort of way, the way Colbert did it—not something that you're supposed to unironically buy into.

> If someone interested in a compiler doesn't download it, it's not an excuse, it's a filter. Or a warning sign.

You're so invested in gatekeeping that you're confusing the point of research with technofetishism.

Here's what Joe Armstrong had to say in "The Mess We're In":

"I downloaded this program, and I followed the instructions, and it said I didn't have grunt installed! [...Then] I installed grunt, and it said grunt was installed, and then I ran the script that was gonna make my slides... and it said 'Unable to find local grunt'."

Looks like someone needs to go dig up Joe and let him know that the real problem is that there was a mistake in letting him get past the point where he was supposed to be filtered out.


> doesn't download it, it's not an excuse, it's a filter

If it's a decently large project, sure. But if it's a small project with only a couple contributors who I've never heard of? There's the potential for that to be hiding malicious code. Plus the potential complexity of getting a project that's only ever been built on (say) 2 computers to successfully compile and run on my system. Plus figuring out whatever build system and weird flags they happen to use. And potentially wrangling a bunch of dependencies.

All that just to take a quick look at a language that might not actually be of interest to me in the end. The browser offers huge benefits here - follow a link and play around in a text box. It just works. (This is also why I use Godbolt - I don't want to bother with a Windows VM or wrangle ten different versions of Clang and GCC myself.)


Spoken like someone who has never taught real students!


I mean, it's JavaScript; I don't think it's intended for you to write C compilers in it. But for compile-to-JS languages, it's a real asset to be able to run it in the browser, although more and more that can be done with WebAssembly as well. However, look at the projects listed as using it - it may not even be for web languages, but just for projects that need to parse something.


(also just me talking -- here are some potential counterpoints)

The choice of language often matters a lot less than how familiar you are with it (and its ecosystem(s)). I think it's totally reasonable to want to use JS for a compiler in, e.g., a Node project if for no other reason than to not have to learn too many extra things at once to be productive with the new tool.

I also don't think it's fair to assume everything will be parsed, tokenized, etc server-side. Even assuming that data originates server-side (since if it didn't you very well might have a compelling case for handling it client-side if for no other reason than latency), it's moderately popular nowadays to serve a basically static site describing a bunch of dynamic things for the frontend to do. Doing so can make it easier/cheaper to hit any given SLA at the cost of making your site unusable for underpowered clients and pushing those costs to your users, and that tradeoff isn't suitable everywhere, but it does exist.

It's interesting that you seem to implicitly assume the only reason somebody would choose JS is that they're writing frontend code. It's personally not my first choice for most things, but it's not too hard to imagine that some aspect of JS (e.g., npm) might make it a top contender for a particular project despite its other flaws and tradeoffs.


This makes me feel really good. I’m working on my first DSL and I’m writing it in JS. I really don’t know what I’m doing, and it felt like JS wasn’t as good a choice as a more “serious” language like C++.

But I’m standing my ground because I’m not even writing a proper “compiler” - in my case, the output is JSON. So it just kinda feels like it makes sense to stick with JS.


Write software, grow from your experience and - sometimes - mistakes. Worrying and getting caught in analysis paralysis is the best way to stagnate.

Tons of brilliant people and huge companies have made software in X and found out it would have been better in Y. There's always tomorrow, as long as you learned today.


If your ecosystem is JS, having a JS-based compiler is pretty convenient. As long as it's just "slower by some constant", rather than by an order of magnitude at runtime, the fact that it's not as fast as yacc/bison etc. is pretty much irrelevant. Being able to keep everything JS is quite powerful, both for people new to the idea who started their programming career using JS, and for seasoned devs working in large JS codebases.

(and you can always decide that you need more speed - if you have a grammar defined, it's almost trivial to feed it to some other parser-generator)


>Surely any file format/DSL/programming language you write will be parsed server-side?

Well, JavaScript has been used heavily on the server side for over a decade, with Node, WASM and other projects.

And as far as raw speed goes, something like V8 smokes all scripting languages bar maybe LuaJIT.

So, there's that...


There’s definitely a use for js based parsing for tooling that runs in the browser (autocomplete, documentation browsing etc). Integration with the Monaco editor is a common use case.


There's a great deal of value to making programming environments available in a browser, especially in the context of creative coding and education. I have built and used many such tools which are purely client-side.

There is a world of difference in accessibility between a tool that requires installation and a tool that you can use by following a hyperlink.


> I don't see the point in a javascript-based compiler

My CC is JavaScript-based (well, it was initially, then TypeScript; now a lot of it is written in itself).

99% of the time I use the actual languages I make in it server side (nodejs), but I am able to develop the languages in my browser using https://jtree.treenotation.org/designer/. It's super easy and fun (at least for me, UX sucks for most people at the moment). There's something somewhat magical about being able to tweak a language from my iPhone and then send the new lang to someone via text. (Warning: Designer is still hard to use and a big refresh is overdue).


Wait, what do you use treenotation for? What are the languages for? I think I'm just a little surprised someone's using treenotation other than to play with it.


Oh yeah I use it everywhere. The ideas are live on prod on a number of sites.

My recent fun public focus is powering Scroll (https://scroll.publicdomaincompany.com/). "Scrolldown" now powers my blog (an example post: https://github.com/breck7/breckyunits.com/blob/main/insist-o...). From what I'm seeing so far, Scrolldown may be one of the first Tree Lang breakouts. Simple, but powerful thanks to its extensibility.

TreeBase is used extensively at a few moderately successful websites. I think Tree Notation (or 2D langs generally) will be used OOMs more in this domain. It integrates so incredibly seamlessly with Git.

At Our World in Data, Tree Notation is used by the researchers to build interactive data visualizations (https://www.youtube.com/watch?v=vn2aJA5ANUc&t=145s). That one uses its own implementation called "GridLang" because I didn't want to depend on jtree, which is a bit too R&D for a site with that kind of traffic. The 2D lang/Tree Notation ideas are so simple that it's easy to roll your own code and you don't have to use "jtree". I view the "jtree" project in a way as just an experiment to confirm that yes, you can do anything/everything without any visible syntax characters. Space is all you need.

On the contracting side I'm helping a crypto group with a shockingly ambitious 2-D crypto.


In that case, may I ask why you are not a Racket user? Sounds like it'd save you a ton of time and keep your implementations high-level.


A ton of front end templating languages/frameworks. They involve compilers to different degrees, don't they?


This is an example of a library we built using Ohm: https://github.com/Bridgeconn/usfm-grammar [1]

It works great for our use-case though I have been eyeing tree-sitter[2] for its ability to do partial parses.

[1] USFM: https://ubsicap.github.io/usfm/ [2] https://tree-sitter.github.io/tree-sitter/


This is a follow-up to a major component of the http://vpri.org/writings.php project, which created a self-contained office suite, OS and compiler suite in something like 100-200k lines of code without external dependencies.


They were trying for 10k lines of code (I think I saw Alan Kay mention online that they got to about 20k lines).


Do you have a link to the project? I'm failing to find it on that page.


Not OP, and can't google now, but the project was called STEPS; they did a down-to-the-metal OS including networking and GUI (and more) in 20k lines.

Don't remember anything about an office suite. Related names I remember are Alan Kay, Dan Amelang, Alessandro Warth and Ian Piumarta.


The biggest artifact from STEPS was Frank, which was at the time bootstrapped using Squeak Smalltalk and included the work from Ian Piumarta (IDST/Maru, which was a fully bootstrapped LISP down to the metal), Dan Amelang (Nile, the graphics language, and Gezira, the 2.5D graphics library implemented in Nile, which both depended on Maru), Alex Warth (OMeta, which had some sort of relationship to Ian's work on Maru), Yoshiki Ohshima (a lot of the experimental things from Alan's demos of Frank were made by Yoshiki) and then several other names. I got close to getting Frank working, but honestly, I'm not sure it's worth it at this point. A lot of the work is 10-15 years old, and the last time I dove in, I ran into issues running 32-bit binaries. The individual components are more interesting and could be packaged together in some other way.

Since it was a research project, STEPS never quite achieved a cohesive, unified experience, but they proved that the individual components could be substantially minimized and the cost of developing them amortized over a large project like a full GUI environment. Nile and some of the applications of Maru, like a minimal but functioning TCP/IP stack that can be compiled to bare metal by virtue of being made in Maru, still fascinate me.

Work on Maru is ongoing, albeit run by a community (with some input from Ian), Nile has been somewhat reborn of late, Ohm is again under active development as the successor to OMeta and Alan is still around.

(Source: Dan is a friend and colleague, and I've met a few of the STEPS/VPRI people that way.)


Is there some place STEPS fans can gather and compare our notes? There are archives of the FONC mailing list here [1].

I'm an outsider and also never got Frank to work. I was waiting for the Nile/Gezira thesis to get a high-level (but hopefully also somewhat detailed) description of how they handled graphics. I vaguely remember getting parts of idst working, but for each of these projects there were always multiple versions lying around. Sometimes in odd places.

I read Alex Warth's thesis and it's well written, in a way that makes it very easy to understand. So, of course, I had to implement my own OMeta variant [2].

Also, the VPRI website itself says it's shutting down (presumably folks moved to HARC at that time?).

Edit to add that OMeta is the language agnostic parser and compiler!

[1] https://www.mail-archive.com/[email protected]/ [2] https://github.com/asrp/pymetaterp


> Is there some place STEPS fans can gather and gather our notes? There are archives of the FONC mailing list.

Maru development is documented on an active mailing list.[1] Ohm development is being coordinated through GitHub. I'd personally like to take the extant code from OMeta/JS and the JS implementation of Nile & Gezira, and modernize them.

Recently I've been wondering if there's enough interest for a Discord server or something. (In the spirit of STEPS, it'd be ideal to make a new collaborative thing that's really different than static text/audio/video on the web, but gotta start somewhere. :) ) Unfortunately, I have had other, higher-priority projects at the moment, so I have taken no initiative to try to build a community.

I will also say that in my opinion, it's not clear to many of the people who made this stuff how special it is. The only exception to that is Bret Victor, who actually is not well-understood, but even the banana pudding versions of his ideas are typically much better than the industry's.

> I'm an outsider and also never got Frank to work. I was waiting for the Nile/Gezira thesis to get a high-level (but hopefully also somewhat detailed) description of how they handled graphics. I vaguely remember getting parts of idst working, but for each of these projects there were always multiple versions lying around. Sometimes in odd places.

I've never gotten Frank to work, and I abandoned my attempts. I've seen it run, though. The name was fully truthful: it really is Frankenstein's monster.

I did get Nile + Gezira to work (albeit in a very crude way by printing numbers to the console rather than hooking it up to a frame buffer). That's how I met Dan. I don't want to betray any confidences with him, but there is ongoing work with Nile.

Here's Dan himself presenting a related language at Oracle Open World in a demo (around 25 mins in).[2] (Full disclosure: I worked on the demo.)

If it were me getting started, I would take a look at the JavaScript implementation of Nile in Dan's Nile repo on GitHub. It should more or less work out of the box, and there's an HTML file containing a fairly full subset of Gezira. The only problem is that the JS style is way out of date, and so it does some things that are heavily frowned upon today. It may not work with tools like Webpack.

The Maru-based Nile is trickier to get working, but it does work. The issue with Ian's Maru is that it's quite hard to reason about and lacks clear debugging tools. I've gotten both up and running. I seem to remember the Boehm GC was pivotal in getting Maru to bootstrap and then run Nile.

> I read Alex Warth's thesis and it's well written, in a way that makes it very easy to understand. So, of course, I had to implement my own OMeta variant [2].

Pymetaterp is cool! I agree: Warth's work on OMeta was impressive. In some ways, Ohm feels inferior to me, though they're both good tools with lots of potential.

OMeta is the one tool from STEPS that is basically simple to understand and use without having to do a bunch of code archaeology.

> Also, the VPRI website itself says it's shutting down (presumably folks moved to HARC at that time?).

VPRI closed because STEPS ended and because Alan had to retire at some point. HARC and/or CDG Labs continued the work, but then closed as well. (I don't know all of the details, but someone here suggested SAP withdrew funding. That would track with what I do know.)

Today, Ian is teaching in Japan, Dan is at Vianai, Alex is at (IIRC) Google, Yoshiki is at Croquet, Bret Victor is doing Dynamicland, Vi Hart is at Microsoft Research and then Alan is retired. There were quite a few others I'm missing, and they are all doing interesting things as well.

[1] https://groups.google.com/g/maru-dev

[2] https://www.oracle.com/openworld/on-demand.html?bcid=6092429...


I've put up a barebones Slack [1] and editable Wiki [2]. I might fill that with info I have in the coming weeks since I realized all I had were scattered files.

Diving into this a bit, I remembered that fonc had its own (now defunct) wiki. [3] It seems like a lot of the important pages were unfortunately not updated, though.

[1] https://join.slack.com/t/footprintsorg/shared_invite/zt-o7ch... [2] https://hackmd.io/SB4QqG7bSxmgoUvPPoSzUA [3] http://vpri.org/fonc_wiki/ [3, archive.org] https://web.archive.org/web/20110901193854/http://vpri.org/f...


> I will also say that in my opinion, it's not clear to many of the people who made this stuff how special it is. The only exception to that is Bret Victor, who actually is not well-understood, but even the banana pudding versions of his ideas are typically much better than the industry's.

I would love to hear more about how you believe not only outsiders, but also the people who made this stuff, misunderstand this work.

How do you see the importance of STEPS and Bret Victor's work? I'm a big fan, and you clearly have a lot of knowledge. I'd love to read more!


Thanks, a lot of this is new and useful to me.

> Recently I've been wondering if there's enough interest for a Discord server or something. (In the spirit of STEPS, it'd be ideal to make a new collaborative thing that's really different than static text/audio/video on the web, but gotta start somewhere. :) ) Unfortunately, I have had other, higher-priority projects at the moment, so I have taken no initiative to try to build a community.

I don't really like Discord because they keep asking for phone verification, and early on they pretty aggressively shut down alternate-client attempts.

What about Mattermost? I could try to set one up, though initially we wouldn't have email notifications or a CDN. Might not be so good if the initial group is small.

Slack? Don't know how they compare to Discord but at least they don't ask for phone verification.

A subreddit? A mailing list? Some kind of fediverse thing?

If there's some possibility of migrating to our own platform, I guess it doesn't matter as much where we start.

I could try to set something up in the coming week. But interest in this HN thread will have died down by that time.

> I did get Nile + Gezira to work (albeit in a very crude way by printing numbers to the console rather than hooking it up to a frame buffer). That's how I met Dan. I don't want to betray any confidences with him, but there is ongoing work with Nile.

Nice! I'm not anywhere near that. I'm still looking for a description of what it _is_ and, at a very high level, how it works internally. Something like "it's mathematical notation to describe the pixel positions/intensities implicitly via constraint equations; it uses a <something> solver for ...". What's in quotes could be way off; it's from memory of what I remember seeing.

> I've gotten both up and running. I seem to remember the Boehm GC was pivotal in getting Maru to bootstrap and then run Nile.

I also vaguely remember something about getting the right Boehm GC version so that some of

> Pymetaterp is cool! I agree: Warth's work on OMeta was impressive. In some ways, Ohm feels inferior to me, though they're both good tools with lots of potential.

Thanks! I share similar thoughts about Ohm. Having a visual editor is very nice, though I tend to use breakpoints for parser debugging [1].

Edit to add that id-objmodel [2] is another STEPS project I found to be simple and useful as an idea.

[1] See, for example, "Debugging" in https://blog.asrpo.com/adding_new_statement [2] https://www.piumarta.com/software/id-objmodel/


> I ran into issues running 32-bit binaries

You could use a VM to work around this issue, no?

It seems to me that making a working demo of Frank as an open-source project should be the first priority, even if it runs only in a 32-bit VM, because then, if the demo is interesting, you may even get help from others in "modernizing" Frank so that it runs natively.


I have a (mostly) working Frank. I have collected all the source code, papers and talks from VPRI and the Squeak and Croquet research community - several TB, but it needs to be organised. Volunteers? morphle at ziggo dot nl. I'm working on a many-core processor for FPGA, ASIC and Wafer Scale Integration to run all these bytecoded VMs and GUIs.


Can you link to both the Maru community and the reborn Nile work? I've always tried to follow the latter, but [1] seems to be the only place to find information and it's been silent for a long time.

[1] https://github.com/damelang/nile/issues/3


Maru development is documented on an active mailing list.[1]

Dan did a demo of a language related to Nile at Oracle Open World in September 2019. (Full disclosure: I worked on the demo.) I would predict that more information about Nile will be forthcoming this year.

[1] https://groups.google.com/g/maru-dev

[2] https://www.oracle.com/openworld/on-demand.html?bcid=6092429...


Can you publish what you have collected?


I plan to publish it as a working system. That seems to be true to the spirit of STEPS, VPRI and Alan Kay himself.


My collection needs to be reorganised so we can publish it. You can help us do that. morphle at ziggo dot nl


Thank you, funnily enough this led me back to the orange website:

"STEPS Toward the Reinvention of Programming, 2012 Final Report Submitted to the National Science Foundation (NSF) October 2012"

https://news.ycombinator.com/item?id=11686325


The 'Word' equivalent was called Frank, but AFAIK nobody has been able to reproduce what was demonstrated.

Quite painfully ironic for a software research project that they didn't properly use a VCS.


They did use a VCS, actually, but a lot of them used SVN, and each person in the STEPS project hosted their own code. Most of those servers have gone dark now, though you can find random ports over to GitHub (rarely with the version history). As far as I can tell, Dan Amelang and Alex Warth were the only two who used git or moved their code over to git.


See:

https://en.m.wikipedia.org/wiki/Ometa (including reference section)

Or go to: http://www.vpri.org/writings.php

If I recall correctly you want: "STEPS Toward the Reinvention of Programming, 2012 Final Report Submitted to the National Science Foundation (NSF) October 2012" (and earlier reports)

Discussed on hn: https://news.ycombinator.com/item?id=11686325

And: https://news.ycombinator.com/item?id=585360

Notable for implementing TCP/IP by parsing the RFC.

"A Tiny TCP/IP Using Non-deterministic Parsing Principal Researcher: Ian Piumarta

For many reasons this has been on our list as a prime target for extreme reduction. (...) See Appendix E for a more complete explanation of how this “Tiny TCP” was realized in well under 200 lines of code, including the definitions of the languages for decoding header format and for controlling the flow of packets."

(...)

"Appendix E: Extended Example: A Tiny TCP/IP Done as a Parser (by Ian Piumarta) Elevating syntax to a 'first-class citizen' of the programmer's toolset suggests some unusually expres- sive alternatives to complex, repetitive, opaque and/or error-prone code. Network protocols are a per- fect example of the clumsiness of traditional programming languages obfuscating the simplicity of the protocols and the internal structure of the packets they exchange. We thought it would be instructive to see just how transparent we could make a simple TCP/IP implementation. Our first task is to describe the format of network packets. Perfectly good descriptions already exist in the various IETF Requests For Comments (RFCs) in the form of "ASCII-art diagrams". This form was probably chosen because the structure of a packet is immediately obvious just from glancing at the pictogram. For example:

  +-------------+-------------+-------------------------+----------+----------------------------------------+
  | 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
  +-------------+-------------+-------------------------+----------+----------------------------------------+
  |   version   |  headerSize |      typeOfService      |                     length                        |
  +-------------+-------------+-------------------------+----------+----------------------------------------+
  |                     identification                  |  flags   |                  offset                |
  +---------------------------+-------------------------+----------+----------------------------------------+
  |       timeToLive          |         protocol        |                    checksum                       |
  +---------------------------+-------------------------+---------------------------------------------------+
  |                                               sourceAddress                                             |
  +---------------------------------------------------------------------------------------------------------+
  |                                             destinationAddress                                          |
  +---------------------------------------------------------------------------------------------------------+


If we teach our programming language to recognize pictograms as definitions of accessors for bit fields within structures, our program is the clearest expression of its own meaning. The following expression creates an IS grammar that describes ASCII art diagrams."
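
(The IS grammar itself isn't reproduced above, but the accessors it generates amount to bit-field extraction. A hand-written sketch of what the first rows of the diagram define - illustrative only, not the STEPS code:)

    // Bit-field accessors for the first rows of the header diagram.
    const fields = (h: Uint8Array) => ({
      version:        h[0] >> 4,           // bits 00-03
      headerSize:     h[0] & 0x0f,         // bits 04-07
      typeOfService:  h[1],                // bits 08-15
      length:         (h[2] << 8) | h[3],  // bits 16-31
      identification: (h[4] << 8) | h[5],  // second row, bits 00-15
    });

The point of the STEPS approach is that these definitions fall out of the pictogram itself rather than being written (and mis-written) by hand.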


Each PEG generator promises a revolution but only burns a car.

I was disappointed with how they do operator precedence; they use the usual trick to make a PEG do operator precedence, which looks cool when you apply it to two levels of precedence, but if you tried to implement C or Python with it, it would get unwieldy. Most of your AST winds up being nodes that exist just to force precedence in your grammar, and working with the AST is a mess.
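
For anyone who hasn't seen it, the usual trick is one rule per precedence level (standard PEG notation, a sketch rather than Ohm's exact syntax):

    Exp    <- AddExp
    AddExp <- MulExp (("+" / "-") MulExp)*
    MulExp <- PriExp (("*" / "/") PriExp)*
    PriExp <- "(" Exp ")" / Number

Fine for two levels, but a bare `42` parses as Exp(AddExp(MulExp(PriExp(Number)))), and a language with C's roughly fifteen precedence levels needs that many cascaded rules, with the corresponding wrapper nodes around every leaf.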

For all the horrors of the Bell C compilers, having an explicit numeric precedence for operators was a feature in yacc that newer parser gens often don't have.

I worked out the math, and it is totally possible to add a stage that adds the necessary nodes to a PEG to make numeric precedence work, and also deletes the fake nodes from the parsed AST. Unparsing I'm not so sure of, since if someone wrote

   int a = (b + c);
how badly you want to keep the parens is up to you; a system like that MUST have an unparse-parse identity in terms of the 'value of the expression', but for software engineering automation you want to keep the text of the source code as stable as you can.


This title is misleading. It's a library and language for building parsers. Full stop. Parsing toolkit, as they say themselves.


The title copies the second sentence of their readme:

> You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages.


I guess it depends on what it means to somebody to build a compiler. Something like yacc says "compiler compiler" in the name but really it is a parser generator. The hard part of industrial compilers is the optimization.


I've used PEGs in the past. They're nice since they combine the mental model of LL grammars with the automation of LALR parser generators. However, it is quite easy to accidentally write rules where the second alternative is never parsed, due to PEGs' ordered choice. For instance:

    ident ::= name | name ("." name)+

Because with PEGs the parser tries the first alternative and then the second, and because whenever the second alternative matches, the first one also matches (a bare name is a prefix of the dotted form), we will never parse the second alternative. That's kinda annoying.

Of course with PEG tools you could probably solve this by computing the first sets for both rules and noticing that they're the same. Hopefully that's what this tool does.
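
The standard workarounds are mechanical once you spot the problem - order the alternatives longest-first, or factor out the common prefix so there's only one alternative:

    ident ::= name ("." name)+ | name
    ident ::= name ("." name)*

Either form accepts the same language; the second also spares you the duplicated name in the grammar.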


A related pitfall is left-recursion, and there's indeed a way to deal with it in PEG parsers: https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion.


I recently wrote a similar parser, maybe less fancy, for a workshop on parsing. It displays the abstract syntax tree with d3.js and also has a built-in evaluator for a limited set of language constructs. https://fransfaase.github.io/ParserWorkshop/Online_inter_par... It is based on a parser I implemented in C++.


I’ve built a number of toy language projects with Ohm and it’s really wonderful. Just a joy to use the visual tooling also. All around really beautiful machinery


Always fun to find the first commit:

https://github.com/harc/ohm/commit/4611bf63c5ecb90d782112d68...

2014

Neat tool. I write parsers by hand though. More fun, and you can be a lot sleazier.


When should one use Ohm over Racket?


When they want a library and toolkit for building parsers and languages, rather than a general programming language based on Scheme.


So, I guess you don't know why OP specifically asked about Racket: https://www.cs.utah.edu/plt/dagstuhl19/ https://beautifulracket.com/stacker/why-make-languages.html


Nah, I know about Racket's DSL support and its touting itself as friendly to language writing, but it's still not the same as a dedicated parsing toolkit, the same way I wouldn't consider a Lisp with reader macros equivalent either...


... but racket basically exists to create parsers and languages. It happens to also be a general programming language. But so is JS nowadays with Node.


We are using Ohmjs on a project at work and it is fantastic. I'm hoping one day that Ohmjs and Ohm/s (Squeak) can be compatible again -- would love to have the Smalltalk version of the interpreter and environment we built using this.


Speaking of - what’s the status of HARC? Is it defunct?


Yep, HARC is no more. I don't recall the exact history but iirc SAP withdrew its funding and HARC basically ceased to exist.

Now Ohm survives as an open-source project, Bret Victor continues his work with Dynamicland, and Vi Hart is currently employed at Microsoft Research.


Defunct enough to let their TLS cert expire.


Love it, this is great for teaching purposes.


I recently created a library for the other part of an interpreter.

https://github.com/codr7/liblg

https://github.com/codr7/liblgpp


It'd be cool if the online editor dispensed with the need to "write the grammar" entirely. A node-based parser generator, in addition to Ohm being yet another grammar-based parser generator, would be pretty great.


Even better would be to generate the parser from examples. See the Microsoft Research Excel Flash Fill paper.


If I want to modify GraphQL to support custom syntax, would Ohm work? Or does a solution exist already for my needs?


I'd rather put my hand in boiling water than develop a compiler in a dynamic, weakly typed language.


Please don't post unsubstantive and/or flamebait comments to HN. We don't want tedious flamewars here, including programming language flamewars that were tedious 20 years ago.

We detached this subthread from https://news.ycombinator.com/item?id=26604134.


My experience doing both in practice is that the type system helps you with things that aren't really a problem anyway (a compiler doesn't really have complex data structures and you don't often get these basic things wrong) and all but the most sophisticated type systems don't even begin to help you with things you really need help with - maintaining invariants.


Exactly. E.g. remove this line, and incorrect optimizations ensue, so the compiler fails halfway through recompiling itself:

http://www.kylheku.com/cgit/txr/tree/share/txr/stdlib/optimi...

The type is fine whether or not the line is present. It's all about that invariant.

None of the hair pulling I've experienced in compiler debugging had anything even remotely to do with type, which is something flushed out by testing.

Whenever doing anything, like an optimization test case, I put in print statements during development to see that it's being called, and what it's doing. You'd never add a new case into a compiler that you never tested. Just from the sheer psychology of it: too much work goes into it to then not bother running it. Plus the curiosity of seeing how often the case happens over a corpus of code.


Write a compiler in a strongly typed language, and then remove all the type annotations. This may come as a shock, but this is what a compiler (or any codebase) could look like when developed in a weakly typed language.


> Write a compiler in a strongly typed language, and then remove all the type annotations.

Help! That's what I did. I chose to write the compiler in OCaml, a language that's already ~30 years old by now. But I cannot find any type annotations! What should I do? I'm stuck!


No, you're done.

> language that's already ~30 years old by now

Relevance?


OCaml is statically typed; it just uses type inference in 99% of cases. So you're wrong in this case: he got all the advantages of static typing. Errors are found at compile time.


You're confusing strong and static typing (JavaScript has neither). In more sophisticated languages such as Scala, C# or Haskell you can let the compiler infer the types for you, and then ask your IDE which type that is. This way you don't need to type out all the boilerplate, you get to see what a function's signature is, and you get compiler errors rather than runtime errors.


Welcome to TypeScript.


That doesn't work if you're using types for anything beyond correctness-checking. Type-driven dispatch, for example, which tends to be used heavily in big compiler and interpreter projects. And tagged unions (or algebraic datatypes), a natural fit for representing ASTs, become more unwieldy without type-directed features like pattern matching.
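
Concretely, the tagged-union point looks like this in TypeScript (a minimal sketch):

    // A tagged-union AST: the `kind` tag drives dispatch, and the
    // compiler checks the switch over it for exhaustiveness.
    type Expr =
      | { kind: 'num'; value: number }
      | { kind: 'add'; left: Expr; right: Expr }
      | { kind: 'neg'; operand: Expr };

    function evaluate(e: Expr): number {
      switch (e.kind) {
        case 'num': return e.value;
        case 'add': return evaluate(e.left) + evaluate(e.right);
        case 'neg': return -evaluate(e.operand);
      }
    }

Delete a case (or add a new node kind) and the compiler, under strict settings, flags every switch that no longer covers `Expr` - exactly the help that vanishes when you "just remove the type annotations."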


> Type-driven dispatch

Smalltalk and everything since then would like a word with you.

> And tagged unions (or algebraic datatypes) [...] type-directed features like pattern matching.

Erlang and Prolog would like to have a chat, too.


I'm talking about language implementation, not features of languages implemented.


Sounds like a double standard and possibly moving the goalposts. There are strongly typed languages that don't have those features, and compiler codebases that don't use that kind of architecture. Do they get a pass or not?


I'm directly responding to the claim that you can "write a compiler in a strongly typed language, and then remove all the type annotations", and that's what compiler architecture looks like in a "weakly typed language". Compiler projects built in languages with richer typing can and do use it for purposes beyond correctness checking, and the idea that you can simply erase all the types and expect the code to work the same is a misconception.


You're pointing to cases to invalidate the claim as if that claim was supposed to be interpreted to be bracketed by a universal quantifier. It wasn't, and arguing in that way is not particularly insightful.

"You can write a compiler in a weakly typed language that resembles a compiler in a strongly typed language." Happy now?

The point, which you have ignored, is that there are strongly typed languages where the features you're relying on are not present. In fact, this is true of a bunch of the compilers that are among the most widely used in the world--ones that people are using to build projects written in C and C++ and things like the language support baked into IDEs for Java, C#, etc. So the relevant factor is not "strong vs. weak?" but rather those features (structural matching, etc) that you are relying on.

And let's be real, the original comment ("I'd rather put my hand in boiling water than develop a compiler in a dynamic weak typed language"; now flagged) was no more than a drive-by insult.


> "You can write a compiler in a weakly typed language that resembles a compiler in a strongly typed language." Happy now?

Sure, that's a better claim.

> The point, which you have ignored, is that there are strongly typed languages where the features you're relying on are not present.

There sure are! I don't think I was ever trying to say otherwise.

> In fact, this is true of a bunch of the compilers that are among the most widely used in the world--ones that people are using to build projects written in C and C++ and things like the language support baked into IDEs for Java, C#, etc.

Sorry, I'm having trouble parsing this. Are you referring to compilers of C/C++ here, or compilers written in those languages? The architectures of compilers I've worked with that were built in C++ were specifically on my mind when I wrote my comment.

> And let's be real, the original comment... was no more than a drive-by insult.

I think you're maybe reading too much into me here? I didn't write the comment you're referring to. I responded very narrowly to a claim, in the light of a common misconception about how type systems factor into software architecture (namely, that they don't do anything as long as your code is "correct"). I'm picking up a lot of hostility that I don't think I've earned.


By type-driven dispatch do you mean dynamic dispatch on more than 1 parameter? Most statically typed languages do not have this and you have to write a bunch of boilerplate to get them to pretend that they do.


No, I'm referring to things like specialization, which you'll see used in any large language runtime built in C++, for example.


So "type driven dispatch" meant only dispatch that can be decided at compile time? Is it even called dispatch at that point?


Yeah, static dispatch.


Because you prefer to be beaten with a stick after work, right? It helps your swollen hand.

Lisp is one of the best compiler implementation languages. Doing the same in C or C++ is about 3-20x more effort.


Extremely beside the point, but it's not about Lisp, it's about automatic memory management, and to a lesser extent lambdas and pattern matching.

There's nothing magical about Lisp that makes it super fit for compiler development.


Right. For compiler development you just need proper tree-handling libs, proper generic types and proper macro abstractions. Static langs rarely have these features, besides OCaml and Mercury, but most dynamic langs have them. Hence the parent's comment was being criticized.

JavaScript is of course torture for other reasons, but Lisp, Prolog, Clojure et al. do make sense. Lisp is the language with the most implemented compilers. Prolog is probably the easiest: Prolog compilers are usually much shorter and better than Lisp ones. The only "super fit" is OCaml, because it already comes with all the infrastructure, C parsers and such. In Lisp you'd need to write 50 lines.


OHM is also the acronym for Open Hardware Monitor, a great open-source project for monitoring computer temperatures, fan speeds, voltages, etc: https://openhardwaremonitor.org/



