Hacker News new | past | comments | ask | show | jobs | submit login

Ooh, bad memories, my boss made the same argument at me last week.

Anyway! The relational model uses tables to model all data. Though you can squeeze a comment tree into a relational data mode any number of ways, it is most clearly expressed by a hierarchical model as in XML, S-exprs or a tree data type in your favorite language.

So I'm not saying it can't be expressed as relational data, what I mean is that that representation is not its most natural or useful form.

At work I use a query like:

    select s1.id, s2.id, s3.id, s4.id ...
    FROM subject s1
    LEFT JOIN subject s2 WHERE s2.id=s1.parent
    LEFT JOIN subject s3 WHERE s3.id=s2.parent
    LEFT JOIN subject s4 WHERE s4.id=s3.parent
    ....
    WHERE s1.id=?
to get all parents of a given node. It's ugly, inefficient, and fragile. You can use the nested set model to simplify this particular query, but inserting rows becomes extremely complex and painful; you really just trade one problem for another.

So, while you can model a tree as relational data, it's not a good match.




I think what the parent meant is that such a tree is fundamentally relational (not merely that it can be modeled as such). Since a relation is just a mapping between tuples and a set of logical values (TRUE and FALSE in most cases) then clearly there is a well-defined relation among the notion of 'parent' and 'child'. Thus the problem is the de-facto standard query language. It seems to me that we could use new special query language (or an extension to SQL) that could handle creating trees from pure relations in a performant and lexically elegant way. Or maybe it’s already been tried and I missed it. (edit: looks like a lot of smart people have already figured it out)


Agreed. An interesting alternative approach to a single 'parent' pointer in the record is the use of an ancestor table.

http://www.evolt.org/working_with_hierarchical_data_in_sql_u...

This allows all children (or parents) to be fetched in a single query. A suitable "order by" clause also groups them together for easier processing in one pass.

The cost is the same as any denormalisation - additional updates and integrity issues - so it isn't a no-brainer but it's probably a win in most such situations.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: