Soufflé: A Datalog Synthesis Tool for Static Analysis

NeutralForest · on Nov 30, 2022

I sometimes see posts about Datalog & Co here. I must say I don't understand where and when this is used. Like, I see the home page, I see the example page, and I still don't understand. Is there any actual application using this? What for?

jitl · on Nov 30, 2022

You could theoretically use Datalog anywhere you're currently using a SELECT SQL statement - it's a declarative way to produce a set of tuples given some existing sets of tuples.
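
To make the "sets of tuples in, set of tuples out" point concrete, here is a minimal Python sketch (Python and the `parent` data are my own choices for illustration, not taken from any Datalog engine): one Datalog rule, the equivalent SQL join, and the same thing evaluated as a set comprehension.

```python
# Hypothetical EDB relation: parent(X, Y) means "X is a parent of Y".
parent = {("alice", "bob"), ("bob", "carol"), ("bob", "dave")}

# Datalog: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# SQL:     SELECT p1.x, p2.y FROM parent p1 JOIN parent p2 ON p1.y = p2.x;
grandparent = {(x, z)
               for (x, y1) in parent
               for (y2, z) in parent
               if y1 == y2}  # the shared variable Y becomes the join condition

print(sorted(grandparent))  # [('alice', 'carol'), ('alice', 'dave')]
```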

But Datalog is much better at deductive reasoning than SQL. There's a great interactive tutorial here: https://percival.ink/. To see how Datalog and SQL can be equivalent, I have a build of percival.ink that transforms simpler queries into SQLite: https://percival.jake.tl/

You can also see how Souffle itself can be useful in this blog post: https://ianthehenry.com/posts/drinking-with-datalog/

That post describes building a recipe engine that can tell you, given a bunch of recipes + the current contents of your bar cart, what drinks you can make, as well as giving you top ingredients you could buy that would allow you to make new drinks. It's quite a readable walkthrough of a very practical application of Datalog/Souffle in a place where SQL would certainly struggle.
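
The post itself uses Soufflé; as a rough illustration of the two queries it describes, here is a plain-Python sketch with made-up recipes (the names `recipes`, `bar_cart`, `can_make` and `unlocks` are hypothetical). The "top ingredients" part is simplified here to counting drinks that are missing exactly one ingredient.

```python
from collections import Counter

# Hypothetical data: recipe name -> set of required ingredients.
recipes = {
    "gin and tonic": {"gin", "tonic"},
    "negroni": {"gin", "campari", "sweet vermouth"},
    "daiquiri": {"rum", "lime", "sugar"},
    "gimlet": {"gin", "lime", "sugar"},
}
bar_cart = {"gin", "tonic", "lime", "sugar"}

# Roughly, in Datalog style with stratified negation:
#   missing(D)  :- needs(D, I), !have(I).
#   can_make(D) :- recipe(D), !missing(D).
can_make = {d for d, needs in recipes.items() if needs <= bar_cart}

# For drinks missing exactly one ingredient, count how many drinks
# buying that single ingredient would unlock.
unlocks = Counter()
for needs in recipes.values():
    missing = needs - bar_cart
    if len(missing) == 1:
        unlocks[next(iter(missing))] += 1

print(sorted(can_make))       # ['gimlet', 'gin and tonic']
print(unlocks.most_common())  # [('rum', 1)]
```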

emmanueloga_ · on Dec 1, 2022

I'm curious: I can't find any reference to "unification" in percival's source code [1]. I'd assume you can't have datalog without unification :-)

1: https://github.com/ekzhang/percival

infogulch · on Dec 1, 2022

I've also worked on percival a bit. It compiles (transpiles?) the datalog AST into JavaScript code on demand and executes it to get the results; see [1]. Percival's creator, Eric, submitted a Show HN that received a couple of comments [3], and also submitted a 10-minute presentation about the project to the HYTRADBOI 'virtual conference' earlier this year [2]. The Have You Tried Rubbing A Database On It conference included several awesome presentations featuring datalog, which readers may find interesting [4].

[1]: https://github.com/ekzhang/percival/blob/main/crates/perciva...

[2]: https://www.hytradboi.com/2022/percival-a-reactive-language-...

[3]: https://news.ycombinator.com/item?id=29521975

[4]: https://www.hytradboi.com/

philzook · on Dec 1, 2022

Depending on what you mean, I'd say datalog is partially characterized, compared to prolog, by its lack of unification (it's also typically executed bottom-up, and is sometimes considered not to have compound terms). Unification is roughly bidirectional pattern matching, whereas datalog rules are in essence performing unidirectional pattern matching / queries on a database.
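
A toy illustration of the difference, in Python. The `Var` class and both functions are my own sketch, not taken from percival or any Prolog/Datalog engine, and the unifier omits the occurs check. `match` only allows variables in the rule's pattern and matches it against ground facts; `unify` allows variables on both sides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def walk(t, s):
    # Follow variable bindings in substitution s until reaching a non-bound term.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def match(pattern, fact, s):
    """One-directional: only `pattern` may contain variables; `fact` is ground."""
    if isinstance(pattern, Var):
        if pattern in s:
            return s if s[pattern] == fact else None
        return {**s, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple) and len(pattern) == len(fact):
        for p, f in zip(pattern, fact):
            s = match(p, f, s)
            if s is None:
                return None
        return s
    return s if pattern == fact else None

def unify(a, b, s):
    """Bidirectional: variables may appear on either side (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

X, Y = Var("X"), Var("Y")
# Datalog-style: a rule pattern matched against a ground fact.
print(match(("parent", X, "bob"), ("parent", "alice", "bob"), {}))  # {Var(name='X'): 'alice'}
# Prolog-style: variables on both sides get bound.
print(unify(("f", X, "b"), ("f", "a", Y), {}))  # {Var(name='X'): 'a', Var(name='Y'): 'b'}
# Matching fails here because the right-hand side is not ground.
print(match(("f", X, "b"), ("f", "a", Y), {}))  # None
```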

zozbot234 · on Dec 1, 2022

You can do "deductive" reasoning in SQL too, via views and possibly recursive CTEs.
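
For comparison, a sketch of the classic recursive `ancestor` rule as an SQLite recursive CTE, run from Python's built-in `sqlite3` module (the `parent` table and its columns are assumptions for the example). Using `UNION` rather than `UNION ALL` discards duplicate rows, which keeps the recursion from looping forever if the data contains cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (x TEXT, y TEXT)")
conn.executemany("INSERT INTO parent VALUES (?, ?)",
                 [("alice", "bob"), ("bob", "carol"), ("carol", "dave")])

# Datalog equivalent:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
rows = conn.execute("""
    WITH RECURSIVE ancestor(x, y) AS (
        SELECT x, y FROM parent
        UNION
        SELECT p.x, a.y FROM parent p JOIN ancestor a ON p.y = a.x
    )
    SELECT x, y FROM ancestor ORDER BY x, y
""").fetchall()

print(rows)
# [('alice', 'bob'), ('alice', 'carol'), ('alice', 'dave'),
#  ('bob', 'carol'), ('bob', 'dave'), ('carol', 'dave')]
```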

jitl · on Dec 1, 2022

Sure; you can do most Datalog things in SQL and most SQL things in Datalog. But I find it much clearer to express deductive reasoning in Datalog. I built a toy Datalog-to-CTE compiler at https://perceval.jake.tl if you want to play with some examples of the equivalence.

burakemir · on Nov 30, 2022

(Mangle was posted recently, and I like it, so I am a bit biased.)

Datalog is a formalism like relational algebra but additionally supports recursion.
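
What "supports recursion" means operationally: rules are applied repeatedly until no new facts appear (a fixpoint). Below is a minimal Python sketch of semi-naive evaluation, a standard bottom-up strategy where each round only joins the facts derived in the previous round; the `edge`/`path` relations are just the usual textbook example, not anything specific to Mangle or Soufflé.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    path = set(edges)
    delta = set(edges)  # facts newly derived in the previous round
    while delta:
        # Join only the new facts against edge, instead of all of path.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path  # keep only genuinely new facts
        path |= delta
    return path  # fixpoint: a round produced nothing new

edges = {(1, 2), (2, 3), (3, 1)}  # a cycle; evaluation still terminates
print(sorted(transitive_closure(edges)))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
```

Termination follows from monotonicity: facts are only ever added, and there are finitely many possible `path` tuples over a finite set of constants.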

So it can roughly play the role of SQL. For example, compiler writers have used it to query their symbol tables and identify patterns (this seems to be the origin of Soufflé: "static analysis").

I believe the reemergence of datalog is at least partly due to compute and memory being more plentiful, so it is more affordable and practical to use a declarative query language. A lot of databases (or the part that is relevant for answering a particular query) comfortably fit in a single machine's memory, and an expressive query language is fun to use. There is also an incremental evaluation story.

Another potential reason is that the logical data model makes it easy to represent any kind of data without schema changes and such.

A lot of programs end up doing either SQL or something SQL-like, so there are many applications for such datalog-like languages.

In many places where people use custom "rule languages" they could use datalog instead. (edited: typo & rule languages)

lukashrb · on Nov 30, 2022

As I understand it, implicit joins are another selling point.
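
A small sketch of what "implicit joins" means, in Python. The capitalized-string-as-variable convention and the `employee`/`manager` relations are assumptions made for illustration. In a rule body, reusing a variable name is itself the join condition, whereas SQL needs an explicit `JOIN ... ON`.

```python
def is_var(term):
    # Convention assumed here: variables are capitalized strings, as in Datalog.
    return isinstance(term, str) and term[:1].isupper()

def eval_body(body, db):
    """Return every variable binding that satisfies all atoms in `body`.

    Atoms sharing a variable name are joined implicitly: a variable bound
    by one atom must take the same value in every later atom.
    """
    bindings = [{}]
    for rel, *args in body:
        next_bindings = []
        for b in bindings:
            for fact in db[rel]:
                if len(fact) != len(args):
                    continue
                nb = dict(b)
                for arg, val in zip(args, fact):
                    if is_var(arg):
                        if nb.setdefault(arg, val) != val:
                            break  # variable already bound to a different value
                    elif arg != val:
                        break  # constant in the rule does not match the fact
                else:
                    next_bindings.append(nb)
        bindings = next_bindings
    return bindings

# Hypothetical database of facts.
db = {
    "employee": {("ann", "eng"), ("bob", "ops")},
    "manager": {("eng", "carol"), ("ops", "dan")},
}
# boss(E, M) :- employee(E, D), manager(D, M).   D is the implicit join key.
# SQL: SELECT e.name, m.name FROM employee e JOIN manager m ON e.dept = m.dept;
result = {(b["E"], b["M"])
          for b in eval_body([("employee", "E", "D"), ("manager", "D", "M")], db)}
print(sorted(result))  # [('ann', 'carol'), ('bob', 'dan')]
```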

mechtaev · on Nov 30, 2022

Souffle Datalog is used for defining program analyses in projects such as Doop [1] for Java and cclyzer++ [2] for LLVM.

GitHub's CodeQL [3] is another Datalog dialect used for detecting bugs and vulnerabilities.

Datomic [4] is a database that uses Datalog as the query language.

[1] https://bitbucket.org/yanniss/doop/src/master/

[2] https://github.com/GaloisInc/cclyzerpp

[3] https://codeql.github.com/

[4] https://www.datomic.com/

gavinray · on Dec 1, 2022

I'd never heard of this cclyzer tool, thanks for sharing. This looks really interesting. Know of any other tools for C/C++/JVM static analysis that folks might not be aware of?

At the moment, I use GCC's -fanalyzer, the LLVM sanitizers + static analyzer, FB's Infer, and PVS Studio.

yababa_y · on Dec 1, 2022

https://github.com/kframework/c-semantics: while you can do static analysis with this, its dynamic instrumentation of UB is far more thorough than UBSan.

gavinray · on Dec 1, 2022

This is great and I'd also never seen this either -- thank you!

ruricolist · on Dec 1, 2022

DDisasm ("a fast disassembler which is accurate enough for the resulting assembly code to be reassembled") uses Souffle: https://grammatech.github.io/prj/ddisasm/

philzook · on Dec 1, 2022

From my perspective, its main use case is static program analysis: dataflow analyses that over-approximate what values certain variables can take or where certain references can point (https://yanniss.github.io/points-to-tutorial15.pdf). I assume people in different sectors of the software industry may have different use cases they find compelling.
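
To give a flavor of the points-to analyses in the linked tutorial, here is a rough Python sketch of a flow-insensitive, field-sensitive, Andersen-style analysis computed as a naive fixpoint. The relation names and encoding are my own and only loosely mirror the Datalog rules such tools use; real analyses like Doop also handle calls, types, and context sensitivity, which this ignores.

```python
def points_to(alloc, move, load, store):
    """Naive fixpoint for a toy inclusion-based points-to analysis.

    alloc: {(var, heap)}        var = new ...   (heap = allocation site)
    move:  {(to, frm)}          to = frm
    load:  {(to, base, fld)}    to = base.fld
    store: {(base, fld, frm)}   base.fld = frm
    """
    var_pts = set(alloc)  # (var, heap): var may point to heap
    fld_pts = set()       # (base_heap, fld, heap): base_heap.fld may point to heap
    changed = True
    while changed:
        before = (len(var_pts), len(fld_pts))
        # VarPointsTo(to, h) :- Move(to, frm), VarPointsTo(frm, h).
        var_pts |= {(to, h) for (to, frm) in move for (v, h) in var_pts if v == frm}
        # FldPointsTo(bh, f, h) :- Store(base, f, frm), VarPointsTo(frm, h), VarPointsTo(base, bh).
        fld_pts |= {(bh, f, h)
                    for (base, f, frm) in store
                    for (v1, h) in var_pts if v1 == frm
                    for (v2, bh) in var_pts if v2 == base}
        # VarPointsTo(to, h) :- Load(to, base, f), VarPointsTo(base, bh), FldPointsTo(bh, f, h).
        var_pts |= {(to, h)
                    for (to, base, f) in load
                    for (v, bh) in var_pts if v == base
                    for (bh2, f2, h) in fld_pts if bh2 == bh and f2 == f}
        changed = (len(var_pts), len(fld_pts)) != before
    return var_pts, fld_pts

# Hypothetical program:
#   a = new A()   // site h1
#   b = a
#   c = new C()   // site h2
#   b.f = c
#   d = a.f       // a and b alias, so d may point to h2
vp, fp = points_to(alloc={("a", "h1"), ("c", "h2")},
                   move={("b", "a")},
                   load={("d", "a", "f")},
                   store={("b", "f", "c")})
print(sorted(vp))  # [('a', 'h1'), ('b', 'h1'), ('c', 'h2'), ('d', 'h2')]
```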

Static analyses tend to depend on each other (mutually recursive), so slamming all the rules together in a single system is useful. Loops in your programs lead to loops in your analysis somewhere, so the recursive nature of datalog is also useful. The monotonic, accumulating, and terminating nature of datalog is also desirable for static analyses. The logic of program analyses is subtle and complex to get right, so it's nice to have a high-level declarative way to state and adjust them as time goes on. See monotone frameworks: https://tudelft-cs4200-2019.github.io/lectures/statics/monot...
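
The monotone-framework view can be sketched with a classic worklist solver for reaching definitions, in Python. The CFG encoding and the `gen`/`kill` sets are hypothetical, and this is a textbook algorithm, not code from the linked lecture. The loop in the example CFG feeds facts back into the loop header, and the analysis terminates because each node's OUT set only grows within a finite set of definitions.

```python
from collections import deque

def reaching_definitions(succ, gen, kill):
    """Worklist solver for a forward, may (union) dataflow problem.

    succ: node -> list of successor nodes (the CFG; loops allowed)
    gen/kill: node -> set of definitions generated / killed at that node
    Returns IN[n]: definitions that may reach the entry of node n.
    """
    preds = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            preds[s].append(n)
    IN = {n: set() for n in succ}
    OUT = {n: set() for n in succ}
    work = deque(succ)
    while work:
        n = work.popleft()
        IN[n] = set().union(*(OUT[p] for p in preds[n]))  # meet = union
        new_out = gen[n] | (IN[n] - kill[n])               # monotone transfer
        if new_out != OUT[n]:  # OUT only grows, so this eventually stops
            OUT[n] = new_out
            work.extend(succ[n])
    return IN

# Hypothetical program:
#   1: x = 0          (d1)
#   2: while ...:     loop header
#   3:     x = x + 1  (d2)
#   4: print(x)
succ = {1: [2], 2: [3, 4], 3: [2], 4: []}
gen = {1: {"d1"}, 2: set(), 3: {"d2"}, 4: set()}
kill = {1: {"d2"}, 2: set(), 3: {"d1"}, 4: set()}
IN = reaching_definitions(succ, gen, kill)
print(sorted(IN[4]))  # ['d1', 'd2']
```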

Mostly, I just think it's all kind of neat. Same with most other CS topics. I only work in the software industry because I find something compelling about the subject matter. That's true for most of us, I assume. Well, and money of course :). Some bits of CS I don't care about until I find some reason related to things I already think are neat.

WesternStar · on Nov 30, 2022

It's also being used in the creation of Rust's next-gen borrow checker, Polonius.

rad_gruchalski · on Nov 30, 2022

Not Datalog but inspired by Datalog: https://www.openpolicyagent.org/docs/latest/policy-language/.

sasaf5 · on Dec 1, 2022

"Synthesis", "Static Analysis", I thought it was related to chip design!

Krasnol · on Dec 1, 2022

> Soufflé is short for Systematic, Ontological, Undiscovered Fact Finding Logic Engine. The EDB represents the uncooked Soufflé and the IDB causes the Soufflé to rise, i.e., monotonically increasing knowledge. When it stops rising and a fixed-point is reached, the result is a puffed-up ready-to-eat Soufflé. Big thanks to Nicholas Allen and Diane Corney from Oracle Labs/Brisbane for finding a translation.

Great name...not.

I know coming up with a name for a project is hard, but why not at least google it before you choose it as a name? This is horrible. Just go and take some lesser-known Hindu deity if you have no idea at all and don't care. There are many.

HelloNurse · on Dec 1, 2022

"Soufflé" is a frivolous name, but serious enough for academic purposes, AND (as explained in the quote) unusually meaningful AND a common word: what more can you ask for?

pron · on Dec 1, 2022

Oracle Labs have another static analysis tool called Parfait. I don't know which came first, but they got a little theme going there.