Hacker News new | past | comments | ask | show | jobs | submit login

Dijkstra's algorithm is wonderful, but by no means is a requirement for being a graph database. A graph database is that, a database composed of nodes that can interconnect into a graph. GUN supports this and allows for traversing the graph. We haven't implemented Dijkstra's algorithm, which is what your "discussion" refers to.



We've never talked about Dijkstra as far as I recall. Your product doesn't support graphs any more than, say, mongodb does because ultimately all you are doing is loading an object from JSON and then sending it to the consumer. This is in contrast to true graph databases whose main selling point is their ability to efficiently traverse, filter and aggregate large graphs to find the answer to some question, and then send the answer to the consumer.

Gun doesn't meet anyone's definition of "graph database" other than your own. If I load some JSON from a URL, and use lodash to pluck some data out of it, is it a graph database?


Dijkstra was your complaint about shortest path.

GUN can do efficient traversal and filtering, and this is going to be even better in our 0.5.x release with lexical cursor support.

By "anyone" do you mean Wikipedia's? https://en.m.wikipedia.org/wiki/Graph_database , because GUN does match its definition. Although we haven't implemented Dijkstra's.

I'm out in France right now and just boarded a plane to Slovenia, so I won't be able to reply again. Have a good one.


Are you inventing your own new, hipstor definition of a "graph database" here? I used graph databases a lot, and I never heard of any of the requirements you've listed.


Not at all but maybe I'm not being clear. My definition of "graph database" is "a database which can efficiently represent graphs and offers functionality for traversing them, in order to allow queries such as 'find the business relationships between user X and user Y based on who they've worked with' (ala linkedin)"

This is not what gun does. It alternately calls itself "the simplest database out there", "not a database" and "a distributed cache". It provides a mechanism for sharing a list of objects across multiple peers, but must transfer all of the data to each peer. It is conceptually similar to downloading a large chunk of JSON from a server and using lodash, ramda etc to query it, but no one would call that a graph database.


> a database which can efficiently represent graphs and offers functionality for traversing them

First part is matching the commonly accepted definition (the one that had been around for about 50 years). The second part is your own invention.

> This is not what gun does

I did not even have a chance to take a look at that product yet. So far I'm just puzzled by the graph database definition some people seem to be using in this thread.


I only just discovered graph databases a few years ago, so I have no idea what the definition of "graph database" was in ye olde days, but the parent's definition is the only one I'm familiar with.

And it makes the most sense. After all, as programmers we're rarely concerned about the layout of data in memory, but rather the abstract data type (ADT) that we have to work with. An ADT is defined not by it's memory layout (i.e., a set of vertices and a set of edges do not a graph make (set, of course, also being an ADT) -- there are several possible ways to represent a graph in memory), but by the operations that are defined for the data type and their characteristics (i.e., an adjacency relation (possibly along with an incidence relation) does a graph make). Of course, specialized traversal operations are more of a convenience than a necessity (and they typically allow greater performance than implementing solely in terms of the adjacency relation), but the point stands.

Consider that a list may be represented in several ways: cons cells, classes, closures (just to name a few). But for clients of the list, none of that really matters. It only matters if a list has certain operations: cons, car, cdr (or some equivalent interface). As far as clients are concerned, any object which provides the list interface is a list; inversely, any object which does not provide that interface is not a list.

In a relational database, it hardly matters how data are stored in memory. What matters is that they provide an interface that allows relational algebra (or some close approximation) to be performed on the data. Likewise, I'd argue that how a graph database stores its data is inconsequential, and the only requirement is that it expose graph operations on that data.


> After all, as programmers we're rarely concerned about the layout of data in memory, but rather the abstract data type (ADT) that we have to work with. An ADT is defined not by it's memory layout

Otherwise, you can think of it as tradeoff, between expressiveness and computing speed (over an area of expertise) so really in between Memory Layout and ADT.

> In a relational database, it hardly matters how data are stored in memory.

Of course it does. It depends on where you want to optimise for speed.

> I'd argue that how a graph database stores its data is inconsequential, and the only requirement is that it expose graph operations on that data.

No. It makes different trade-offs for different purpose so it's not inconsequential.

GraphDB might be only a niche where only a few people have to use it. But still worth engineering because it help the ADT/expresiveness cause.


I see now that I didn't make this clear in my comment, but I'll reproduce part of my reply to the sibling:

> I don't mean to imply that choice of data layout/representation doesn't matter at all, but that it doesn't matter for the purpose of deciding what constitutes a graph and distinguishing graphs from non-graph objects. Of course, as with any ADT, there are various trade-offs that need to be considered before deciding on a particular memory layout.

The beauty of ADTs is that once you've exposed your operations, you are free to change the memory layout without breaking clients -- or even to supply multiple structures with different layouts at the same time, each optimized for a different use case -- and in doing so, you never change the notion of what constitutes a graph.


A typical graph dbms exposes a node selection (often only allowing to start at a single entry node or selecting a set of nodes by tags) and arc selection / filtering. Given that they were mostly used for CADs such an interface makes a lot of sense.

None of such databases ever featured a query language capable of defining a Dijkstra algorithm.

And, no, for a typical use of a graph database, it matters most how cheap it is to follow a graph arc. Therefore, representation matters. Otherwise a graph interface on top of a relational storage would have been sufficient.


> A typical graph dbms exposes a node selection (often only allowing to start at a single entry node or selecting a set of nodes by tags) and arc selection / filtering.

That doesn't contradict my view. In fact, I'd argue that it counts, due to the fact that graph operations are provided.

> None of such databases ever featured a query language capable of defining a Dijkstra algorithm.

Most modern graph databases do seem to feature some sort of query language. I won't argue that it's strictly necessary, as long as you have well-suited, well-defined operations on graphs. I can't speak to Dijkstra's algorithm -- that was a part of the thread that I overlooked previously.

> And, no, for a typical use of a graph database, it matters most how cheap it is to follow a graph arc. Therefore, representation matters. Otherwise a graph interface on top of a relational storage would have been sufficient.

I don't mean to imply that choice of data layout/representation doesn't matter at all, but that it doesn't matter for the purpose of deciding what constitutes a graph and distinguishing graphs from non-graph objects. Of course, as with any ADT, there are various trade-offs that need to be considered before deciding on a particular memory layout.


> None of such databases ever featured a query language capable of defining a Dijkstra algorithm.

Yes, because Dijkstra is not what is the most interesting stuff to write against a graph in every day use. Dijkstra is a primitive than you use but that you have to tweak to solve the particular problem. What is the interest of optimizing writing that particular algorithm?

You seem to follow the idea that there is a super-algorithm to define the way mind works instead I think that's it many small algorithms with similar purpose.


perhaps this is a terminology issue, I'll admit that I'm too young to have used pre-relational graph databases, but presumably they have a way of navigating / jumping between documents/vertices, (presumably based on pointers), otherwise what point would there be in having a graph?

I'd encourage you to look at the product and see whether it meets your definition.


Yes, of course you could always select a node (often you'd always have to start from a single root node), select arcs, filter the arcs by some criteria, etc.

But I've never seen a complex query language that would allow to express any complex traversal strategies (like Dijkstra algorithm), and from your wording I concluded that this was your requirement for something to be called a graph database.


Query language isn't a requirement but exposing the capability to do those kinds of selects / filtering operations is. It doesn't have to support Dijkstra's algorithm but it should be possible to implement it (efficiently).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: