In a classic relational model everything is a tuple containing scalar values. Graph-relational extends the relational data model in three ways:
- every relation always has a global immutable key independent of data (explicit autoincrement keys aren't needed)
- this enables us to add a "reference type", which is essentially a pointer to some other record (i.e. a foreign key)
- attributes can be set-valued, so you can have nested collections in queries and in your data model.
This is what lets us do `Movie.actors.name` instead of a bunch of `JOINs`, because `actors` is declared as a set-valued reference type in the `Movie` relation.
That's awesome. I think you've hit the nail on the head by trying to fix the SQL part of relational databases and not the relational part. It's been a pet peeve of mine for ages that relational databases have been described as inadequate for modelling relationships and graph databases have been described as the solution. You CAN'T fix the problem just by going from n-ary to binary relationships.
How deeply is EdgeDB integrated into Posgresql? Any chance it could be used to query other databases eventually?
Firstly, what happens to the performance when I have a sizeable resultset of set-valued data?
I've seen similar ideas implemented in the past that look fine for the Movies and Actors or Books and Authors examples but fall apart badly when you query a number of fields (20+) that have sets within them, which can happen on say, a sizeable reference database of marketing information.
Another question: How deep in the graph can I go, and how much circular reference protection is there? E.g. if I query Movie.actors.movies.actors?
I'm interested in graph databases and data modeling and while it offers some convenience I'm always skeptical but hopeful (mostly from having lost a lot of hours) that these problems have been solved sufficiently to keep performance good in practical use cases.
EdgeDB is graph-relational, not a pure graph database, and so the performance characteristics of traversing links are that of a relational JOIN. Which, of course, depends wholly on the size of each relation being joined. So, if you want to select the list of actors for _every_ movie in your database and there are lots of movies, it'd be a pretty expensive operation. If, on the other hand, you want to select some relationships on a handful of objects (or even just one), then it doesn't really matter that much how deep your link traversal is, because all of the steps would be fast index scans.
> How deep in the graph can I go,
As much as you want, though the path must be explicit, EdgeQL currently doesn't have any way to say "traverse link foo recursively".
I’ve always wished for MongoDB to have a “deepFind”, so that when I fetch a document, it will fetch the nested relations also instead of doing an aggression to do the lookup. Feel like if their objectID only included a collection name reference then somehow it should be possible. Perhaps a depth parameter would use be useful for more relational data.
Congrats on the milestone! Will definitively have a look at edgeDB.
There used to be a DBLink data type containing both collection and id. But without any additional features using it.
It was considered bad practice and eventually got deprecated. Since 99% of the time the collection link in a given attribute is fixed and known in advance so it’s just duplicate information.
This is a nice example. How the data is stored physically? Does the model work for large datasets and when it could break down? What are optimal workloads? Do we still need to fiddle with indexing and such?
Data is stored relationally in Postgres in 3NF. References are indexed automatically, but you still need to index type properties if you use them in `filter`.
Wait, it stores the data in Postgres? So this is essentially a data model on Postgres?
FWIW, I do think there's a space in the market for a thin wrapper over Postgres (or MySQL) which would automate certain optimisations such as whether to index a particular table. It always struck me as perverse that that optimisation was delegated to the developer, when it's no more subjective or application-specific than a thousand other automated optimisations the engine makes. I'd be really interested if your project covered that.
It's built on Postgres, but it isn't a _thin_ wrapper. We lean hard into Postgres query machinery and type system in order to pull off EdgeQL and graph-relational efficiently.
Ah, OK, interesting! I don't have an immediate use case personally, but I wish you guys the very best. Honestly, database space needs way more competition than it has at present.
There are countless permutations of the choices that database designers face, so it's a shame there aren't mature products for more of them. I hope this particular permutation turns out to be a good one for lots of people :)
We compile EdgeQL queries into SQL currently, because it makes the architecture simpler and less us run on unmodifed Postgres, but conceptually nothing stops us from targeting the query planner directly via an extension or an alternative frontend that consumes EdgeQL IR direclty.
Ah, this is interesting: so Postgres is effectively a 'backend' for you, in much the same way that e.g. InnoDB is a backend for MySQL[0]?
And you - or hypothetically the end user - could change the backend, e.g. to Cockroach for better horizontal scalability, while trusting that EdgeDB will only rely on Postgres's public API at least in meeting its own public API/contract?
[0] It's hard to make that analogy with Postgres b/c it only has one storage engine, but of course the separation still exists.
We're working on a more comprehensive explanation of why EdgeDB isn't an ORM. Does EdgeDB do "object-relational mapping" under the hood — absolutely. The reason we try to distance ourselves from the category of ORMs is that the term "ORM" comes with a big bag of preconceptions that don't apply here.
EdgeDB has:
- Full schema model with indexing, constraints, defaults, computed properties, stored procedures
- A query language that replaces SQL. If there's something you can do in SQL that isn't possible in EdgeQL, it's a bug.
- The query language is backed by a full type system, grammar, set of functions and operators, etc.
- A set of drivers for different languages that implement our binary protocol.
By any definition, EdgeDB is a database. It's a new abstraction built on a lower-level abstraction: Postgres's query engine. Both abstractions indubitably fit any reasonable definition of "database".
Basically: just because there's a declarative object-oriented schema doesn't mean this "is just an ORM" (unless your definition is quite pedantic).
It runs as a separate (stateless) process between the client and the PostgreSQL server. There was a talk about the details of the architecture on the live stream today: https://youtu.be/WRZ3o-NsU_4?t=5294
It's something we want to do at some point, but unlike Hasura, which operates on GraphQL which is conceptually much simpler and limited, EdgeQL would be much harder to fit onto a "pipelined polling" model that Hasura utilizes to implement subs.
So, there is, basically, no way to work with Postgres directly, as well as installing EdgeDB with the use of the existing Postgres installation and its data, right? Feels like a Postgres is a prisoner of the EdgeDB :)
If you look at the code they used to have —postgres-dsn URL, now they changed it to —backend-dsn. If you go beyond the marketing, edgedb is a postgres connection pool+orm combined together with a query language (its a huge work in itself and commendable given it takes away many dependencies and provide a consistent developer experience). It’s leveraging postgresql server which is the true database providing ACID compliance.
Edgedb is more akin to hasura than to a database in traditional sense.
Hi there!
Yes, I do understand and that the EdgeDB is "a postgres connection pool+orm combined together" and the work guys did is incredible. And I do beleive it will find its niche. But for me, personaly, the idea of "postgres is now a black box for you, don't even try to go deeper into it" stops me from digging into EdgeDB right away (the tooling is amazing! migrations!) and even trying it in our production for our enterprize clients. Probably, with time, when edgeql can fully replace sql and plpgsql, we could say that EdgeDB is ready for everything.
BTW, @RedCrowbar, how can I call a udf or procedure (written in sql/plpgsql) from edgeql? I didn't find it in the docs.
There is no way to call UDF SQL (or PL/pgSQL) functions from EdgeDB, because there is no way to _define_ or manage them. The only place that is allowed to do this is the standard library (and, in the near future, extensions).
We realize that the database must be extensible and flexible, so non-EdgeQL UDF will become a reality (and if things work out the way we hope they will, they'll be amazing and far beyond what you can do with plpgsql).
Re "just an ORM", it seems like the word "mapper" is the hint that it isn't, since there's no "mapping" from the object-oriented view to the big-pile-of-scalars in a traditional relational db.
It's the first time I hear about graph-relational DBs. I remember back in college learning about graph databases, but since I never touched one I don't remember much TBH.
Is a graph-relational database something completely disjointed from a graph database? Or do they share some performance improvements to some use cases? Also does EdgeDB keep the advantages of a true graph database even being based on Postgres?
> It's the first time I hear about graph-relational DBs.
This is unsurprising, because we just invented the term :-)
> Is a graph-relational database something completely disjointed from a graph database?
Graph-relational is still relational, i.e. it's a relational model with extensions that make modeling and querying graph-like data easier. And in apps everything is graph-like (hence GraphQL etc). An important point is that graph-relational, like relational is storage-agnostic, i.e. it makes no assumptions on how data is actually arranged on disk.
Pure graph databases, on the other hand, encode the assumption that data is actually _physically_ organized as a graph into their model and query languages.
I guess the word "graph" is simply too overloaded in computing.
There’s at least one German-Bulgarian company called Plan-Vision that implemented such graph-relational approach like 15 years ago. their VSQL is similarly working on the E/R conceptual level and gets translated (or compiled into) to the underlying Postgresql or Oracle. You also get a neat EcmaScript like language that works with the collections in a graph like manner.
Long before Arango, Orient etc.
The company is absolutely nowhere near to you guys in terms of marketing, but their thing works with more than 40 enterprise clients so far.
So you definitely did not ‘just’ invent the concept. A lot of companies approach the problem one way or another…
>Pure graph databases, on the other hand, encode the assumption that data is actually _physically_ organized as a graph into their model and query languages.
What would be the benefit / disadvantage in each case?
In a classic relational model everything is a tuple containing scalar values. Graph-relational extends the relational data model in three ways:
- every relation always has a global immutable key independent of data (explicit autoincrement keys aren't needed)
- this enables us to add a "reference type", which is essentially a pointer to some other record (i.e. a foreign key)
- attributes can be set-valued, so you can have nested collections in queries and in your data model.
This is what lets us do `Movie.actors.name` instead of a bunch of `JOINs`, because `actors` is declared as a set-valued reference type in the `Movie` relation.