A couple of years ago I spent a lot of time figuring out how to deal with a large graph. My conclusion - there will never be such a thing as a "graph database". There are many efforts in this area, someone here already mentioned SPARQL and RDF, you can google for "triple stores", etc. There are also large-scale graph processing tools on top of Hadoop such as Giraph, or GraphX for Spark.
For the particular project we ended up using Redis and storing the graph as an adjacency list in a machine with 128GB of RAM.
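For the curious, "adjacency list in Redis" amounts to something like the sketch below using redis-py; the adj:<node> key convention and the client-side traversal are illustrative assumptions, not the exact layout we used.

```python
# Minimal sketch of a graph stored as adjacency sets in Redis (redis-py).
# The "adj:<node>" key convention is illustrative, not the original project's schema.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def add_edge(src, dst):
    # One set per node holding its outgoing neighbours
    r.sadd(f"adj:{src}", dst)

def neighbors(node):
    return r.smembers(f"adj:{node}")

def bfs(start, max_depth=3):
    # Simple breadth-first traversal done client-side
    seen, frontier = {start}, [start]
    for _ in range(max_depth):
        nxt = []
        for node in frontier:
            for nb in neighbors(node):
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return seen
```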
The reason I don't think there ever will be a "graph database" is because there are so many different ways you can store a graph, so many things you might want to do with one. It's trivial to build a "graph database" in a few lines of any programming language - graph traversal is (hopefully) taught in any decent CS course.
Also - the latest versions of PostgreSQL have all the features to support graph storage. It's ironic how PostgreSQL is becoming a SQL database that is gradually taking over the "NoSQL" problem space.
From my point of view, the fact that you can very easily add an expert index to a graph database written in a modern language (i.e. not C/C++) makes it even easier to customize an existing graph database to suit your exact needs. In turn, storage and runtime can be tuned more easily, making it easy to get the performance you need. But at the end of the day, not having to deal with algebra is the best part.
PostgreSQL has supported recursive queries for years, and in Oracle you have CONNECT BY. I have only used a recursive WITH once, and it was just a quick demo, but my understanding is that updates are extremely expensive.
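For anyone who hasn't seen a recursive query, here is a minimal sketch of graph reachability; it uses SQLite via Python for brevity (PostgreSQL accepts essentially the same WITH RECURSIVE syntax), and the edges(src, dst) table is made up for illustration.

```python
# Sketch of graph reachability via WITH RECURSIVE, using SQLite for brevity;
# PostgreSQL accepts essentially the same query. The edges(src, dst) table is made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
con.executemany("INSERT INTO edges VALUES (?, ?)",
                [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

rows = con.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable
""").fetchall()
print([n for (n,) in rows])   # nodes reachable from 'a': a, b, c, d (order may vary)
```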
We're using TitanDB. One of the main benefits for us is that AWS has provided backend integration with DynamoDB. This affords you practically infinite and painless scaling on a pay-as-you-go model. Love it.
Depends on what kind of data and graph you are going to store/use. Neo4j is quite popular, cypher isn't very hard to learn, and it has lots of examples. Might be a good choice for a beginner.
Not really, if you learn Cypher you should be fine learning the basics of Gremlin, SPARQL, or other languages to operate on graphs in a few hours.
There was some post about enabling SPARQL in Neo4J, but when you install Neo4J it comes with cypher by default (not sure if it supports anything else).
I use Apache Jena + SPARQL, but had to use Neo4J to help in a master thesis. Took me a few hours of "How the heck can I do that same thing I'd do in SPARQL that way?", plus some reading of the tutorials.
Cypher expresses the graph patterns that you're looking for in an ASCII-art syntax, so you don't lose sight of the core of your question. On top of that you get filtering, projection, aggregation, and pagination.
The most fun part is the in-query dataflow, which allows you to pass information from one query part to the next (projected, aggregated, ordered, etc.).
And the really cool collection and map functions save you a lot of roundtrips between client and server.
ConceptNet [1] started out as an academic project that I was responsible for for a while. Since then I've left to start a company but I still maintain ConceptNet and build lots of stuff on it.
Here's a list of databases, some of them graph databases, some of them barely databases, where I've tried to store and look up edges of ConceptNet:
- SQLite
- PostgreSQL
- MongoDB
- Some awful IBM quad-store
- HypergraphDB
- Tinkerpop
- Neo4J
- Solr
- Riak
- SQLite with APSW to speed up importing
- Just a hand-rolled hashtable on disk
Here are the systems that have succeeded to any extent, in that I could do simple things with them and they didn't collapse:
- PostgreSQL
- SQLite with APSW to speed up importing
- Just a hand-rolled hashtable on disk
The time when I tried Tinkerpop, HypergraphDB, and Neo4J because I had a graph and graph databases are supposed to be good at graphs was particularly terrible. Graph databases seem to only be good at dealing with graphs so small that anything can deal with them.
If this has changed, please point me at an open-source graph database that's not terrified of gigabytes. (No trying to sell me SaaS, please.)
Graph DB technology has been advancing fast over the last few years, and more evolutions are coming down the pipe. For example, Titan and Blazegraph are distributed and can handle billions of edges, and Blazegraph can be GPU-accelerated, which "demonstrated a throughput of 32 Billion Traversed Edges Per Second (32 GTEPS), traversing a scale-free graph of 4.3 billion directed edges in 0.15 seconds" (https://www.blazegraph.com/product/gpu-accelerated/).
NB: TinkerPop is not a graph DB -- it's a graph software stack / computing framework for graph DBs (OLTP) and graph analytic systems (OLAP). Since TinkerPop is integrated with almost all of the graph DBs and graph processing engines, its mailing lists are a good place to discuss and get help with graph-related projects.
I like how at least the BlazeGraph people are talking about billions of edges and not thousands, but I'm not sure that's something I could use. That seems to be a "pre-order" page, so it sounds neither open source nor existent. And I'm trying to figure out what their normal non-GPU non-distributed software is, but it seems to mostly be a pile of Javadocs.
Using distributed computing on mere gigabytes of data is silly.
I think TinkerPop was something else back in 2011, but apologies if I've used the wrong terminology.
Even knowing the ConceptNet project a little, I still don't understand the workload/algorithms you run on the database. Put another way, «this doesn't work for us» doesn't help anybody.
It really depends on the kind of algorithm you run on the database.
Based on open source projects, in read/write mode, no db can help you, since you load everything into memory. As a noob NLP user, I'd rather use something like AjguDB https://github.com/amirouche/ajgudb
It's interesting. I suppose most graph db operations could be easily enough broken down into a series of steps that could be performed in a more scalable but slower way.
Did your hand-rolled hashtable have any characteristics that would make its performance difficult for a smarter optimizer (if such a thing existed in Neo4j) to match?
Can you pseudocode an example slow query/operation and indicate how many edges/vertices were being considered at each step?
Sorry to ask these kinds of questions, I'm just really curious about the situation you described.
The failures of these databases were a lot more fundamental than I think you're looking for. And so far it hasn't been a trade-off where a non-graph DB has been more scalable but slower; instead, non-graph DBs have been more scalable and faster.
Here's what I have to be able to do in the database:
1. Import millions of edges from a flat file (time limit: 24 hours)
2. Query any node to return up to 100 edges connected to it (time limit: 100 milliseconds)
3. (nice to have) Find the maximal core of nodes that all have degree at least n to each other (time limit: a few hours)
4. Iterate all the edges between the nodes in a specified subset, such as the degree-3 core, which may still be millions of edges (time limit: a few hours)
#3 is optional, and the alternative is to export all the edges and compute it outside the database. But it's the only thing here that's actually a graph algorithm. However, every open-source graph database I've tried is orders of magnitude too slow at one of the other steps. They either fail at importing, fail at iterating, or fail to respond to trivial queries in a timely manner.
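To make that concrete: requirements #1 and #2 need nothing graph-specific, and a plain edge table with indexes over the endpoints is roughly what the working setups boil down to. A simplified sketch (not the real ConceptNet schema):

```python
# Simplified sketch of requirements 1 and 2 with plain SQLite; not ConceptNet's real schema.
import sqlite3

con = sqlite3.connect("edges.db")
con.execute("PRAGMA journal_mode=WAL")
con.execute("""CREATE TABLE IF NOT EXISTS edges (
                   src TEXT, dst TEXT, weight REAL, data TEXT)""")

# 1. Bulk import: executemany inside one transaction, build the indexes afterwards.
def import_edges(rows):            # rows: iterable of (src, dst, weight, data)
    with con:
        con.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", rows)
    con.execute("CREATE INDEX IF NOT EXISTS idx_src ON edges(src)")
    con.execute("CREATE INDEX IF NOT EXISTS idx_dst ON edges(dst)")

# 2. Up to 100 edges touching a node: an indexed lookup, comfortably under 100 ms.
def edges_for(node, limit=100):
    return con.execute(
        "SELECT * FROM edges WHERE src = ? OR dst = ? LIMIT ?",
        (node, node, limit)).fetchall()
```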
I forgot to mention one other non-graph-database system that met my requirements, which is Kyoto Cabinet. The main downside of it is the GPLv3 license.
Might I ask what sort of dataset size, servers, etc. you are using? I'm looking for a graph database and Cayley seems the best fit, though I'm not sure what sort of limits on the data there would be in the real world.
Cayley has been stable for us so far, but I can't vouch for its scalability, as our database is very small, less than 6M quads, so an 8GB machine is more than enough.
What's the story on inserting data into Cayley? I think every single code example I have seen on it only shows traversing graphs with it.
Also, if you don't mind me asking, how does it not being a property graph affect modelling your data and queries? At a glance it seems that queries would get significantly more complex if you wish to take several properties of a vertex into account.
Cayley can be run in two ways: as HTTP service or as a Go library that you import from your Go app. To insert into Cayley when it's running as HTTP service you can do something like: `curl http://localhost:64210/api/v1/write -d '[{ "subject": "Krissy", "predicate": "loves", "object": "Justin Trudeau"}]'`.
Or if you like the idea of using a graph database developed by a secretive organisation full of geniuses dedicated to collecting and organising the world's information, but would rather it was public-sector, there's Gaffer:
It's been stable for us since deployment eight months ago, although we haven't pushed it too hard - our data is relatively small and queries not too complex.
From what I've seen, our use has a lot in common with Seed-DB, only in a different economic sector/activity.
"The performance will suffer if the dataset is much bigger than the memory."
That is a huge drawback when compared to relational databases.
A good follow-up question would be: which open-source graph databases can reasonably import and store graph data that's not small -- that is, more data than fits in RAM? Without proprietary extensions?
Let me explain this quotation. When your graph data (including indices) no longer fits into the RAM of a single server, you can either live with the higher latency of loading data from disk or you can use sharding, which will lead to communication and therefore slower traversals.
That does not mean that things stop working, but performance degrades: you can no longer visit tens of millions of nodes per second in a traversal, as you can in RAM on a single server.
If you actually only traverse a much smaller hot subgraph, I would probably go for the disk based single server approach.
If your graph has a natural, known clustering, then an optimized sharding solution with fine-tuned sharding keys is probably your best bet.
You can do all this with ArangoDB.
However, graph traversals vary greatly in many respects, and your mileage may vary accordingly, with any approach.
I would love to chat in more detail about your use case.
Throwing money at RAM because most graph databases haven't figured out how to use the disk effectively is not a good use of money. What's cheaper than buying all the RAM in the universe is figuring out a different system besides a graph database that does the job.
A good example would be the graph of Wikipedia links. About 100 million edges among 5 million nodes, last I checked. The nodes have large differences in degree.
The raw data for this is not the slightest bit large. We're only talking about gigabytes. But it would absolutely destroy Neo4J to even try to import it, to say nothing of running an interesting algorithm that justifies using a graph database on it, and Neo4J seems to be everyone's favorite open-source graph database for some reason.
There are multiple systems out there; however, I have my doubts. It is important that your data does not get corrupted and that your transactions do not get lost. Furthermore, speedups are possible with certain indices. That is why I personally would want to see some more safety/speed analysis and comparisons between the different systems.
(Full disclosure: I'm the author, we are VC backed) https://github.com/amark/gun is an Open Source graph database with Firebase like realtime synchronization.
Dijkstra's algorithm is wonderful, but it is by no means a requirement for being a graph database. A graph database is just that: a database composed of nodes that can interconnect into a graph. GUN supports this and allows for traversing the graph. We haven't implemented Dijkstra's algorithm, which is what your "discussion" refers to.
We've never talked about Dijkstra as far as I recall. Your product doesn't support graphs any more than, say, mongodb does because ultimately all you are doing is loading an object from JSON and then sending it to the consumer. This is in contrast to true graph databases whose main selling point is their ability to efficiently traverse, filter and aggregate large graphs to find the answer to some question, and then send the answer to the consumer.
Gun doesn't meet anyone's definition of "graph database" other than your own. If I load some JSON from a URL, and use lodash to pluck some data out of it, is it a graph database?
Are you inventing your own new, hipster definition of a "graph database" here? I've used graph databases a lot, and I never heard of any of the requirements you've listed.
Not at all but maybe I'm not being clear. My definition of "graph database" is "a database which can efficiently represent graphs and offers functionality for traversing them, in order to allow queries such as 'find the business relationships between user X and user Y based on who they've worked with' (ala linkedin)"
This is not what gun does. It alternately calls itself "the simplest database out there", "not a database" and "a distributed cache". It provides a mechanism for sharing a list of objects across multiple peers, but must transfer all of the data to each peer. It is conceptually similar to downloading a large chunk of JSON from a server and using lodash, ramda etc to query it, but no one would call that a graph database.
> a database which can efficiently represent graphs and offers functionality for traversing them
First part is matching the commonly accepted definition (the one that had been around for about 50 years). The second part is your own invention.
> This is not what gun does
I did not even have a chance to take a look at that product yet. So far I'm just puzzled by the graph database definition some people seem to be using in this thread.
I only just discovered graph databases a few years ago, so I have no idea what the definition of "graph database" was in ye olde days, but the parent's definition is the only one I'm familiar with.
And it makes the most sense. After all, as programmers we're rarely concerned about the layout of data in memory, but rather the abstract data type (ADT) that we have to work with. An ADT is defined not by its memory layout (a set of vertices and a set of edges do not a graph make -- a set, of course, is also an ADT, and there are several possible ways to represent a graph in memory), but by the operations that are defined for the data type and their characteristics (i.e., an adjacency relation, possibly along with an incidence relation, does a graph make). Of course, specialized traversal operations are more of a convenience than a necessity (and they typically allow greater performance than implementing solely in terms of the adjacency relation), but the point stands.
Consider that a list may be represented in several ways: cons cells, classes, closures (just to name a few). But for clients of the list, none of that really matters. It only matters if a list has certain operations: cons, car, cdr (or some equivalent interface). As far as clients are concerned, any object which provides the list interface is a list; inversely, any object which does not provide that interface is not a list.
In a relational database, it hardly matters how data are stored in memory. What matters is that they provide an interface that allows relational algebra (or some close approximation) to be performed on the data. Likewise, I'd argue that how a graph database stores its data is inconsequential, and the only requirement is that it expose graph operations on that data.
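A toy illustration of that point: the two classes below use completely different memory layouts, yet either one is a "graph" to any client that only relies on the operations (the names are mine, purely for illustration).

```python
# Two different memory layouts behind the same graph "interface".
# Either object counts as a graph for clients that only use add_edge()/neighbors().
class AdjacencyDictGraph:
    def __init__(self):
        self.adj = {}                       # node -> set of neighbours
    def add_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
    def neighbors(self, u):
        return self.adj.get(u, set())

class EdgeSetGraph:
    def __init__(self):
        self.edges = set()                  # flat set of (u, v) pairs
    def add_edge(self, u, v):
        self.edges.add((u, v))
    def neighbors(self, u):
        return {v for (x, v) in self.edges if x == u}

def reachable(graph, start):
    """Works with either representation -- it relies only on the operations."""
    seen, stack = {start}, [start]
    while stack:
        for v in graph.neighbors(stack.pop()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen
```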
> After all, as programmers we're rarely concerned about the layout of data in memory, but rather the abstract data type (ADT) that we have to work with. An ADT is defined not by its memory layout
Alternatively, you can think of it as a tradeoff between expressiveness and computing speed (over an area of expertise), so really somewhere in between memory layout and ADT.
> In a relational database, it hardly matters how data are stored in memory.
Of course it does. It depends on where you want to optimise for speed.
> I'd argue that how a graph database stores its data is inconsequential, and the only requirement is that it expose graph operations on that data.
No. It makes different trade-offs for different purpose so it's not inconsequential.
Graph DBs might be only a niche that only a few people have to use, but they are still worth engineering because they help the ADT/expressiveness cause.
I see now that I didn't make this clear in my comment, but I'll reproduce part of my reply to the sibling:
> I don't mean to imply that choice of data layout/representation doesn't matter at all, but that it doesn't matter for the purpose of deciding what constitutes a graph and distinguishing graphs from non-graph objects. Of course, as with any ADT, there are various trade-offs that need to be considered before deciding on a particular memory layout.
The beauty of ADTs is that once you've exposed your operations, you are free to change the memory layout without breaking clients -- or even to supply multiple structures with different layouts at the same time, each optimized for a different use case -- and in doing so, you never change the notion of what constitutes a graph.
A typical graph dbms exposes a node selection (often only allowing to start at a single entry node or selecting a set of nodes by tags) and arc selection / filtering. Given that they were mostly used for CADs such an interface makes a lot of sense.
None of such databases ever featured a query language capable of defining a Dijkstra algorithm.
And, no, for a typical use of a graph database, it matters most how cheap it is to follow a graph arc. Therefore, representation matters. Otherwise a graph interface on top of a relational storage would have been sufficient.
> A typical graph dbms exposes a node selection (often only allowing to start at a single entry node or selecting a set of nodes by tags) and arc selection / filtering.
That doesn't contradict my view. In fact, I'd argue that it counts, due to the fact that graph operations are provided.
> None of such databases ever featured a query language capable of defining a Dijkstra algorithm.
Most modern graph databases do seem to feature some sort of query language. I won't argue that it's strictly necessary, as long as you have well-suited, well-defined operations on graphs. I can't speak to Dijkstra's algorithm -- that was a part of the thread that I overlooked previously.
> And, no, for a typical use of a graph database, it matters most how cheap it is to follow a graph arc. Therefore, representation matters. Otherwise a graph interface on top of a relational storage would have been sufficient.
I don't mean to imply that choice of data layout/representation doesn't matter at all, but that it doesn't matter for the purpose of deciding what constitutes a graph and distinguishing graphs from non-graph objects. Of course, as with any ADT, there are various trade-offs that need to be considered before deciding on a particular memory layout.
> None of such databases ever featured a query language capable of defining a Dijkstra algorithm.
Yes, because Dijkstra is not the most interesting thing to write against a graph in everyday use. Dijkstra is a primitive that you use but have to tweak to solve your particular problem. What is the point of optimizing for writing that particular algorithm?
You seem to follow the idea that there is a super-algorithm that defines the way the mind works; instead, I think it's many small algorithms with a similar purpose.
Perhaps this is a terminology issue; I'll admit that I'm too young to have used pre-relational graph databases, but presumably they have a way of navigating / jumping between documents/vertices (presumably based on pointers), otherwise what point would there be in having a graph?
I'd encourage you to look at the product and see whether it meets your definition.
Yes, of course you could always select a node (often you'd always have to start from a single root node), select arcs, filter the arcs by some criteria, etc.
But I've never seen a complex query language that would allow you to express complex traversal strategies (like Dijkstra's algorithm), and from your wording I concluded that this was your requirement for something to be called a graph database.
Query language isn't a requirement but exposing the capability to do those kinds of selects / filtering operations is. It doesn't have to support Dijkstra's algorithm but it should be possible to implement it (efficiently).
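To make that concrete, here is Dijkstra written against nothing but a neighbour-lookup callable; `neighbors(node)` returning `(neighbor, weight)` pairs stands in for whatever node/arc selection a given store exposes, and is an assumption rather than any particular product's API.

```python
# Dijkstra against a bare neighbour-lookup callable; "neighbors" stands in for
# whatever select/filter operations the database exposes (an assumption, not
# any particular product's API).
import heapq

def dijkstra(neighbors, source, target):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for nb, w in neighbors(node):     # one "follow the arcs" call per node
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return None                           # target unreachable
```

How efficient this is then comes down entirely to how cheap that neighbour-lookup call is, which is the point being argued above.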
Everybody's focused on graph databases here but let's talk about Cray! One of the most forward-thinking computer technology companies ever to exist is starting to get out there again. If they got a few hundred million dollars from an outside investor, they could do friggin' incredible things. They already do incredible things but not out there in the way it so easily could be.
Cray is a brand name that has been passed around between half a dozen companies (including Sun and SGI), dotted with various kinds of product reboots and commercial failures. Cool stuff, but supercomputing isn't the most financially sound business, it seems. The current name holder is the company previously called Tera, originally famous for making an aggressively multithreaded HPC computer.
"an aggressively multithreaded HPC computer" - still sold as Urika-GD, marketed as a dedicated graph appliance. That's where Cray Graph Engine originated.
I am a huge fan of graph-y stuff. I did several iterations of a graph database written in Python, using files, bsddb, and right now wiredtiger. I also use Gremlin for querying. Have a look at the code: https://github.com/amirouche/ajgudb.
I've seen people using graph databases as a general-purpose backing store for webapps/microservices. What are people's opinions about this?
My feeling is that graph databases are not suitable/ready for — for lack of a better term — the kind of document-like entity relationship graphs we typically use in webapps. Typical data models don't represent data as vertices and edges, but as entities with relationships ("foreign keys" in RDBMS nomenclature) embedded in the entities themselves.
This coincidentally applies to the relational model, in its most pure, formal, normal form, but the web development community has long established conventions of ORMing their way around this. The thing is, you shouldn't need an ORM with a graph database.
It introduces a false dichotomy, "graph vs relational".
In fact, most (if not all) graph algorithms can be expressed using linear algebra (with specific addition and multiplication operations). And matrix multiplication is a select from two matrices, joined with "where i=j", with aggregation over identical result coordinates.
The selection of multiplication and addition operations can account for different "data stored in links and nodes".
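As a concrete example of the correspondence, one step of reachability is a sparse matrix-vector product, and that product is literally "join on matching indices, then aggregate". A plain-Python sketch, with dicts standing in for relations:

```python
# One reachability step as a sparse matrix-vector product, written as
# "join on matching indices, then aggregate" -- the relational reading of
# multiplication. Dicts stand in for relations here; swapping (+, *) for
# (min, +) turns the same shape into a shortest-path relaxation step.
A = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1}   # adjacency "matrix" as a relation
x = {"a": 1}                                        # frontier vector: currently at 'a'

def matvec(A, x):
    y = {}
    for (i, j), a_ij in A.items():             # select from A
        if i in x:                             # join condition: A's row index matches x's index
            y[j] = y.get(j, 0) + a_ij * x[i]   # aggregate over identical result coordinates
    return y

print(matvec(A, x))             # {'b': 1}: nodes one hop from 'a'
print(matvec(A, matvec(A, x)))  # {'c': 1}: two hops
```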
So there is no such dichotomy "graph vs relational".
Strictly speaking yeah. Practically speaking: Not really true.
Just because something can be done doesn't mean it can be done easily or well. I've done a lot of work with relational databases, and I love them for a lot of data sets. But I have also done a lot of work with graph databases - and they make working with graph-shaped data a pleasure. I could do a graph in SQL; it's even moderately straightforward in Postgres these days by using WITH RECURSIVE - but it's still not as simple as just loading Orient or Arango for those tasks.
It's the same reason I keep multiple knives in my kitchen. Sure I could do everything with an 8" chef's knife, but the paring knife and the boning knife just make some tasks easier.
Facebook has a very good paper describing how they do graphs on top of a relational database. Google "facebook tao" for details.
I read that and implemented my own version with SQL in < 500 lines of Python code, and found it just perfect for my own use cases. I can easily query any edges or nodes at web speed (< 10 ms) from databases with millions of nodes and edges, GBs of info.
I am curious what I might be missing with that approach as compared to a real graph database?
From my quick read of TAO, it seems to be doing essentially what graph databases do, but with the data storage layer being SQL rather than some other object store, and with the interface layer being not quite as feature-complete. The query layer in TAO seems to lack a way to follow multiple edges without first returning to the application code, which graph DBs present as a native feature. The other thing that's lacking, which some graph DBs provide, is labeled edges - that is, edges that contain data besides just an "edge type".
I implemented a table for nodes and one for edges. The Nodes table has a JSON field for each node.
If I need more info for a particular edge type, I just add a new node entry of type "Edge_info" that links the edge type to a JSON object containing that info. I found that very flexible, but I have not used any real graph database.
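If I understand the description, it boils down to something like the two-table sketch below; the column names and the JSON-per-row details here are my guesses, purely illustrative.

```python
# Illustrative two-table layout along the lines described above: a nodes table
# with a JSON payload per node, an edges table keyed by type, and extra
# per-edge-type info stored as a special "Edge_info" node. Column names are guesses.
import json, sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, data TEXT);  -- data = JSON blob
    CREATE TABLE edges (src INTEGER, dst INTEGER, type TEXT);
    CREATE INDEX idx_edges_src ON edges(src, type);
    CREATE INDEX idx_edges_dst ON edges(dst, type);
""")

def add_node(node_type, payload):
    cur = con.execute("INSERT INTO nodes (type, data) VALUES (?, ?)",
                      (node_type, json.dumps(payload)))
    return cur.lastrowid

def add_edge(src, dst, edge_type):
    con.execute("INSERT INTO edges VALUES (?, ?, ?)", (src, dst, edge_type))

def out_neighbors(src, edge_type):
    # Indexed lookup of one hop's worth of neighbours, JSON payload included.
    return con.execute("""SELECT n.id, n.data FROM edges e
                          JOIN nodes n ON n.id = e.dst
                          WHERE e.src = ? AND e.type = ?""",
                       (src, edge_type)).fetchall()
```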
Curious, since you seem to have experience with both graph and relational databases (and I have not really worked much with the former)... if I had a graph where I just want to compute the shortest weighted path between two nodes, which model would suit better and how much difference would it make in terms of performance?
Of course you can express anything on top of a relational model. But for graphs such a representation would have been awfully inefficient. For this reason, CADs never even tried to switch to relational data storage once those fancy new relational databases appeared; most of the professional CADs are still using good old graph databases.
I am part of the team developing a Russian CAD system [0]. It uses what one could consider a hypergraph DB (a relation includes many objects), but that DBMS has queries on par with SQL, and they have proven very useful in the development of a CAD.
What you describe can be explained by development inertia. Most CADs are C/C++, and these languages are not very well suited for changes that cut through the whole code base (a change of storage engine and data model).
I also ran experiments during the development of a graph analytics engine at one point. The relational model (actually, the linear algebra model) proved itself very competitive. It allows for easy distribution of data, the operations over distributed data are close to optimal, etc.
I worked on a (very old) ship-building and factory-building CAD, written mostly in PL/I and Fortran. It was built around a graph database. Throughout its long and turbulent history, there were many attempts at moving to relational storage, every time resulting in an orders-of-magnitude drop in performance.
Keep in mind that in such a CAD designs are huge. Think of an aircraft carrier scale of "huge". And it was designed when memory was very limited. Therefore, pretty much all the CAD operations depended on the database access.
So, nobody really cared about the queries, they were insignificant. What people cared about was:
* Performance of following an arc
* Transactions
* Data consistency
* Compactness of representation (remember, disk space is also a limited thing when you're building aircraft carriers).
Has the state of the art changed since the time of their last attempt to move to a relational DB? Has the hardware changed during that time?
You tell us about some old project, written in hard-to-maintain languages, which had many failed attempts to adapt to new tech. This is exactly what to expect.
I am talking about a relatively modern language (C#) using good DB tech (lagging about seven, maybe five, years behind the state of the art). Maybe the story will be different in our case.
Fundamentally, nothing has changed in relational storage. The follow-a-graph-edge operation is as expensive as it used to be (it involves an index lookup; it cannot be cheap).
If you know a relational arrangement suitable for a cheap O(1) edge traversal - please share. But I am very skeptical.
And I cannot see how the host language is relevant at all. C#, Haskell, whatever - none can make data access operations cost less than what the data model predicts.
Use covering indices for edges; they are vastly cheaper than regular indices. Choose a write-optimized storage layer (fractal indices (circa ~2007, BTW!), LSM) for the DB. Don't do singular accesses, use bulk accesses (and now we hit the wall of PL/I and Fortran you mentioned). I can go on.
I cannot help but feel that what you describe is a classic example of technical inertia due to massive technical debt. It cannot prove that relational DBs are bad for CADs.
Is any of the indices you're describing O(1)? Very, very unlikely.
As for the bulk operations, they're in most cases totally useless. Rendering - maybe, but most of the other CAD operations require precise edge following.
And why even go to all the trouble of using this totally unsuitable relational representation when a proper graph DBMS is so much easier? Relational religion is so funny, almost as funny as OOP.
What exactly are these "most CAD operations"? I bet they follow many links from nodes (note the plural!) in most of them. Single node operations are seldom and even then there are many links to follow. This is true for scheme/PCB editor, I bet it is even more true for arch design.
You can look at these indices as O(1) operations. Btrees are just like that.
(if you think that memory access is O(1), you are wrong)
Graph DBs, more often than not, are ad hoc, bug-ridden, slow, poor implementations of one tenth of a relatively complete relational DB.
CSG, place and route (pipes, cables, etc.), design constraint checks, all that stuff.
> I bet they follow many links from nodes (note the plural!) in most of them.
Not that many, mostly single-digit numbers.
> This is true for scheme/PCB editor
Which is very, very different from an oil refinery or an aircraft carrier. Both in a scale and typical operations.
> You can look at these indices as O(1) operations. Btrees are just like that.
WAT?!? Not even close. O(log n) at best. And a multiplier there is huge.
> Graph DBs, more often than not, are ad hoc, bug-ridden, slow, poor implementations of one tenth of a relatively complete relational DB.
What?
Graph DBs are orders of magnitude simpler than any relational pile of a mess. It's really hard to screw them up. Everything is trivial there, including transactions, logging, referential transparency and all that.
Design constraint checks touch A LOT of stuff: usually things have several classes attached to them, and checks need to be performed on the union or intersection of class-related data. Place and route, even for a 2D PCB, also touches a lot of stuff - it needs to access constraints, at the very least. In our case, the constraint system had to return data to the PCB real-time router in under 100us, for several tens of millions of different classes of constraints. I cannot see how you can speak of singular accesses in this context. Or "single-digit" numbers of accesses.
Now I'll leave the conversation. We clearly have different views on almost everything, including, but not limited to, "huge multipliers".
Anybody know dgraph.io?
It's a scalable, distributed, low-latency, high-throughput graph database over terabytes of structured data.
Dgraph supports Facebook's GraphQL as its query language, responds in JSON, and uses Facebook's RocksDB, a very fast key-value store, as its storage engine.
See more at https://github.com/dgraph-io/dgraph
One of the biggest challenges in databases is handling concurrency and sharding; I wish this had talked a bit more about how that changes between a graph database and a relational database.