ConceptNet [1] started out as an academic project that I was responsible for for a while. Since then I've left to start a company, but I still maintain ConceptNet and build lots of stuff on it.
Here's a list of databases, some of them graph databases, some of them barely databases, where I've tried to store and look up edges of ConceptNet:
- SQLite
- PostgreSQL
- MongoDB
- Some awful IBM quad-store
- HypergraphDB
- Tinkerpop
- Neo4J
- Solr
- Riak
- SQLite with APSW to speed up importing
- Just a hand-rolled hashtable on disk
Here are the systems that have succeeded to any extent, in that I could do simple things with them and they didn't collapse:
- PostgreSQL
- SQLite with APSW to speed up importing
- Just a hand-rolled hashtable on disk
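The "hand-rolled hashtable on disk" is nothing clever, roughly: each node is a key, and the value is the serialized list of edges touching that node. A minimal sketch of the idea, using Python's built-in dbm as a stand-in (not the actual code, and the example edge is made up):

    import dbm
    import json

    def add_edge(db, start, end, rel, weight):
        # Store the edge under both endpoints so either node can look it up.
        for node in (start, end):
            key = node.encode("utf-8")
            edges = json.loads(db[key]) if key in db else []
            edges.append([start, end, rel, weight])
            db[key] = json.dumps(edges)

    def edges_for_node(db, node, limit=100):
        key = node.encode("utf-8")
        return json.loads(db[key])[:limit] if key in db else []

    with dbm.open("edges", "c") as db:
        add_edge(db, "/c/en/cat", "/c/en/pet", "/r/IsA", 1.0)
        print(edges_for_node(db, "/c/en/cat"))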
The time when I tried TinkerPop, HypergraphDB, and Neo4j, because I had a graph and graph databases are supposed to be good at graphs, was particularly terrible. Graph databases seem to only be good at dealing with graphs so small that anything can deal with them.
If this has changed, please point me at an open-source graph database that's not terrified of gigabytes. (No trying to sell me SaaS, please.)
Graph DB technology has been advancing fast over the last few years, and more evolutions are coming down the pipe. For example, Titan and Blazegraph are distributed and can handle billions of edges, and Blazegraph can be GPU-accelerated, which "demonstrated a throughput of 32 Billion Traversed Edges Per Second (32 GTEPS), traversing a scale-free graph of 4.3 billion directed edges in 0.15 seconds" (https://www.blazegraph.com/product/gpu-accelerated/).
NB: TinkerPop is not a graph DB -- it's a graph software stack / computing framework for graph DBs (OLTP) and graph analytics systems (OLAP). Since TinkerPop is integrated with almost all of the graph DBs and graph processing engines, its mailing lists are a good place to discuss and get help with graph-related projects.
I like how at least the BlazeGraph people are talking about billions of edges and not thousands, but I'm not sure that's something I could use. That seems to be a "pre-order" page, so it sounds neither open source nor existent. And I'm trying to figure out what their normal non-GPU non-distributed software is, but it seems to mostly be a pile of Javadocs.
Using distributed computing on mere gigabytes of data is silly.
I think TinkerPop was something else back in 2011, but apologies if I've used the wrong terminology.
Knowing the ConceptNet project a little, I still don't understand the workload/algorithms you run on the database. In other words, «this doesn't work for us» doesn't help anybody.
It really depends on the kind of algorithm you run on the database.
Based on open-source projects, in read/write mode no DB can help you, since you end up loading everything into memory. As a noob NLP user, I'd rather use something like AjguDB: https://github.com/amirouche/ajgudb
It's interesting. I suppose most graph db operations could be easily enough broken down into a series of steps that could be performed in a more scalable but slower way.
Did your hand-rolled hashtable have any characteristics that would make its performance difficult for a smarter optimizer (if such a thing existed in Neo4j) to match?
Can you pseudocode an example slow query/operation and indicate how many edges/vertices were being considered at each step?
Sorry to ask these kinds of questions, I'm just really curious about the situation you described.
The failures of these databases were a lot more fundamental than I think you're looking for. And so far it hasn't been a trade-off where a non-graph DB has been more scalable but slower; instead, non-graph DBs have been more scalable and faster.
Here's what I have to be able to do in the database:
1. Import millions of edges from a flat file (time limit: 24 hours)
2. Query any node to return up to 100 edges connected to it (time limit: 100 milliseconds)
3. (nice to have) Find the maximal core of nodes that all have degree at least n to each other (time limit: a few hours)
4. Iterate all the edges between the nodes in a specified subset, such as the degree-3 core, which may still be millions of edges (time limit: a few hours)
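For what it's worth, requirements 1, 2, and 4 don't need anything graph-specific. Here's a minimal sketch of what they look like on top of plain SQLite -- illustrative only, assuming a tab-separated flat file of (start, end, relation, weight); it's not the actual ConceptNet schema or code:

    import sqlite3

    conn = sqlite3.connect("edges.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS edges (
            start_node TEXT NOT NULL,
            end_node   TEXT NOT NULL,
            rel        TEXT NOT NULL,
            weight     REAL NOT NULL
        )
    """)
    # For a real bulk load you'd create these after importing, not before.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_start ON edges (start_node)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_end ON edges (end_node)")

    def import_edges(path):
        # Requirement 1: load millions of edges from a tab-separated flat file.
        with open(path) as f:
            rows = (line.rstrip("\n").split("\t") for line in f)
            conn.executemany(
                "INSERT INTO edges (start_node, end_node, rel, weight)"
                " VALUES (?, ?, ?, ?)", rows)
        conn.commit()

    def edges_for_node(node, limit=100):
        # Requirement 2: up to 100 edges touching a node, in milliseconds.
        return conn.execute(
            "SELECT start_node, end_node, rel, weight FROM edges"
            " WHERE start_node = ? OR end_node = ? LIMIT ?",
            (node, node, limit)).fetchall()

    def edges_within(nodes):
        # Requirement 4: iterate every edge whose endpoints are both in a subset.
        node_set = set(nodes)
        for row in conn.execute("SELECT start_node, end_node, rel, weight FROM edges"):
            if row[0] in node_set and row[1] in node_set:
                yield row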
#3 is optional, and the alternative is to export all the edges and compute it outside the database. But it's the only thing here that's actually a graph algorithm. However, every open-source graph database I've tried is orders of magnitude too slow at one of the other steps. They either fail at importing, fail at iterating, or fail to respond to trivial queries in a timely manner.
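And computing #3 outside the database really is simple enough: it's the standard k-core "peeling" algorithm on an exported edge list. A minimal sketch (networkx's k_core does the same job if the graph fits in memory):

    from collections import defaultdict

    def k_core(edges, k):
        # Build an undirected adjacency map from an iterable of (node, node) pairs.
        neighbors = defaultdict(set)
        for u, v in edges:
            neighbors[u].add(v)
            neighbors[v].add(u)

        # Peel off nodes with degree < k; removing one can push its
        # neighbors below k, so keep going until nothing is left to remove.
        to_remove = [n for n, nbrs in neighbors.items() if len(nbrs) < k]
        while to_remove:
            node = to_remove.pop()
            for nbr in neighbors.pop(node, ()):
                nbrs = neighbors.get(nbr)
                if nbrs is None:
                    continue  # already peeled off
                nbrs.discard(node)
                if len(nbrs) < k:
                    to_remove.append(nbr)
        return set(neighbors)  # the nodes of the maximal k-core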
I forgot to mention one other non-graph-database system that met my requirements, which is Kyoto Cabinet. The main downside of it is the GPLv3 license.
[1] http://conceptnet5.media.mit.edu