Hacker News new | past | comments | ask | show | jobs | submit login

I only just discovered graph databases a few years ago, so I have no idea what the definition of "graph database" was in ye olde days, but the parent's definition is the only one I'm familiar with.

And it makes the most sense. After all, as programmers we're rarely concerned about the layout of data in memory, but rather the abstract data type (ADT) that we have to work with. An ADT is defined not by it's memory layout (i.e., a set of vertices and a set of edges do not a graph make (set, of course, also being an ADT) -- there are several possible ways to represent a graph in memory), but by the operations that are defined for the data type and their characteristics (i.e., an adjacency relation (possibly along with an incidence relation) does a graph make). Of course, specialized traversal operations are more of a convenience than a necessity (and they typically allow greater performance than implementing solely in terms of the adjacency relation), but the point stands.

Consider that a list may be represented in several ways: cons cells, classes, closures (just to name a few). But for clients of the list, none of that really matters. It only matters if a list has certain operations: cons, car, cdr (or some equivalent interface). As far as clients are concerned, any object which provides the list interface is a list; inversely, any object which does not provide that interface is not a list.

In a relational database, it hardly matters how data are stored in memory. What matters is that they provide an interface that allows relational algebra (or some close approximation) to be performed on the data. Likewise, I'd argue that how a graph database stores its data is inconsequential, and the only requirement is that it expose graph operations on that data.




> After all, as programmers we're rarely concerned about the layout of data in memory, but rather the abstract data type (ADT) that we have to work with. An ADT is defined not by it's memory layout

Otherwise, you can think of it as tradeoff, between expressiveness and computing speed (over an area of expertise) so really in between Memory Layout and ADT.

> In a relational database, it hardly matters how data are stored in memory.

Of course it does. It depends on where you want to optimise for speed.

> I'd argue that how a graph database stores its data is inconsequential, and the only requirement is that it expose graph operations on that data.

No. It makes different trade-offs for different purpose so it's not inconsequential.

GraphDB might be only a niche where only a few people have to use it. But still worth engineering because it help the ADT/expresiveness cause.


I see now that I didn't make this clear in my comment, but I'll reproduce part of my reply to the sibling:

> I don't mean to imply that choice of data layout/representation doesn't matter at all, but that it doesn't matter for the purpose of deciding what constitutes a graph and distinguishing graphs from non-graph objects. Of course, as with any ADT, there are various trade-offs that need to be considered before deciding on a particular memory layout.

The beauty of ADTs is that once you've exposed your operations, you are free to change the memory layout without breaking clients -- or even to supply multiple structures with different layouts at the same time, each optimized for a different use case -- and in doing so, you never change the notion of what constitutes a graph.


A typical graph dbms exposes a node selection (often only allowing to start at a single entry node or selecting a set of nodes by tags) and arc selection / filtering. Given that they were mostly used for CADs such an interface makes a lot of sense.

None of such databases ever featured a query language capable of defining a Dijkstra algorithm.

And, no, for a typical use of a graph database, it matters most how cheap it is to follow a graph arc. Therefore, representation matters. Otherwise a graph interface on top of a relational storage would have been sufficient.


> A typical graph dbms exposes a node selection (often only allowing to start at a single entry node or selecting a set of nodes by tags) and arc selection / filtering.

That doesn't contradict my view. In fact, I'd argue that it counts, due to the fact that graph operations are provided.

> None of such databases ever featured a query language capable of defining a Dijkstra algorithm.

Most modern graph databases do seem to feature some sort of query language. I won't argue that it's strictly necessary, as long as you have well-suited, well-defined operations on graphs. I can't speak to Dijkstra's algorithm -- that was a part of the thread that I overlooked previously.

> And, no, for a typical use of a graph database, it matters most how cheap it is to follow a graph arc. Therefore, representation matters. Otherwise a graph interface on top of a relational storage would have been sufficient.

I don't mean to imply that choice of data layout/representation doesn't matter at all, but that it doesn't matter for the purpose of deciding what constitutes a graph and distinguishing graphs from non-graph objects. Of course, as with any ADT, there are various trade-offs that need to be considered before deciding on a particular memory layout.


> None of such databases ever featured a query language capable of defining a Dijkstra algorithm.

Yes, because Dijkstra is not what is the most interesting stuff to write against a graph in every day use. Dijkstra is a primitive than you use but that you have to tweak to solve the particular problem. What is the interest of optimizing writing that particular algorithm?

You seem to follow the idea that there is a super-algorithm to define the way mind works instead I think that's it many small algorithms with similar purpose.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: