Hacker News | __afk__'s comments

Great read, thank you. Also nice that you brought in RDF databases to help expand and refine the problem space. When a graph database topic here on HN leads with an LPG approach, 99% of the time there is hardly a mention of RDF. That leaves everyone a little worse off.


I agree. As I say in the blog, there is no perfect data model. RDF is a great fit for certain things, e.g., storing DBpedia-like datasets and doing "automatic" advanced OWL reasoning over them by implementing the OWL rules. For example, if a record t is of type Y and there is a rule that says every Y is also a Z, then by the OWL reasoning rules t is also of type Z. So if a user scans all Z's, a system that does automatic OWL reasoning should return record t. RDFox is an interesting system that attempts this and occupies a niche. Overall, the semantic web community has thought deeply about these kinds of problems and pioneered these types of computations.
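The subclass rule described above can be sketched as a tiny forward-chaining loop in Python (the triples and names here are made up for illustration, not any real system's API):

```python
def saturate(triples):
    """Apply the subclass-entailment rule until a fixpoint:
    if (x, type, Y) and (Y, subClassOf, Z) then infer (x, type, Z)."""
    triples = set(triples)
    while True:
        inferred = {
            (x, "type", z)
            for (x, p1, y) in triples if p1 == "type"
            for (y2, p2, z) in triples if p2 == "subClassOf" and y2 == y
        }
        if inferred <= triples:   # nothing new: fixpoint reached
            return triples
        triples |= inferred

data = {
    ("t", "type", "Y"),        # record t is a Y
    ("Y", "subClassOf", "Z"),  # every Y is also a Z
}

closed = saturate(data)
# Scanning all Z's now returns t, as an OWL-reasoning system would.
zs = sorted(x for (x, p, c) in closed if p == "type" and c == "Z")
```

Real reasoners like RDFox apply many such rules (the full OWL RL rule set) with far more efficient indexing, but the fixpoint idea is the same.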

Property GDBMSs can do manual reasoning queries if you ask questions with a Kleene star, but they would have to implement the OWL rules, an "OWL rule" data type, a URI data type, etc. There is no other way I'm aware of. I am convinced that with the right architecture this should be possible, and if we see interest, we might work on it in Kùzu. We do definitely plan to support URIs as first-class data types, so at least common queries over URI-heavy datasets and manual reasoning can be done efficiently; that should cover most use cases for applications that want to model their data as triples of URIs.
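A manual reasoning query with a Kleene star might look like the following Cypher sketch (the node labels and relationship types are hypothetical, not any particular system's schema):

```cypher
// Hypothetical schema: instances attach to a (:Class) node via :IS_A, and
// classes form a hierarchy via :SUBCLASS_OF. The *0.. Kleene star walks the
// subclass chain explicitly, standing in for automatic OWL reasoning.
MATCH (x)-[:IS_A]->(:Class)-[:SUBCLASS_OF*0..]->(z:Class {name: 'Z'})
RETURN x
```

The user has to know to write the star; an OWL reasoner would apply the equivalent closure automatically.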


There are more ways to type (at the schema level and inline) and more types (see the common XSD types) available OOB in XML than JSON could ever have. Then we can introduce custom types on top. JSON looks rather impoverished in comparison.
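As a small illustration, here is an XSD fragment combining a built-in type with a custom restriction type (the element names are invented for the example):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Built-in xs:date: validated as a calendar date, not just a string. -->
  <xs:element name="shipped" type="xs:date"/>

  <!-- Custom type: a SKU constrained by a regular expression. -->
  <xs:simpleType name="SKU">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{3}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Plain JSON has numbers, strings, booleans, and null; anything like a date or a constrained identifier is a convention the consumer has to know about.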


Many people "love YAML" and "hate XML" but have never dealt with XSD, so they don't even realize they're talking about syntax instead of functionality.

Admittedly, XML isn't the most pleasant syntax to interact with by hand.


XML was a great idea. At some point in the early aughts, however, XML abuse became rampant: too many crappy tools, too many half-assed attempts at putting square pegs into round holes. It didn't help that XML got inextricably associated with those horrible, over-complicated ws-* standards that ended up sending Russian-novel-sized "hello world" information exchanges.

The main idea, however, was wonderful. An information exchange format (xml itself), a way to define schemas (xsd), a transformation language (xslt), and stylesheets (xsl?). I kind of miss it.


I miss XSLT every time I see megabytes of JavaScript converting some JSON into a soup of HTML to display in browsers. A bunch of JavaScript with a bunch of overwrought "components" is serving as a very complicated templating system.

A lot of really stupid JavaScript could be done fairly easily with XSLT natively inside a browser engine. An XML document can point to its own XSL stylesheet(s) to get rendered into the appropriate delivery format, or just be used directly as data.
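For anyone who never saw this in action, a minimal sketch (file names invented): the XML document points at its own stylesheet with a processing instruction, and the browser applies the transform.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="items.xsl"?>
<items>
  <item>Apples</item>
  <item>Pears</item>
</items>
```

And `items.xsl` turns the data into HTML:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Render each <item> as a list entry. -->
  <xsl:template match="/items">
    <ul>
      <xsl:for-each select="item">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

No script, no framework: the same document serves as data for machines and as a rendered page for people.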


I don't miss it. I had several projects where it seemed like a good idea. I pushed it as a good idea. I really wanted it to work. Some developers (usually the sharper bulbs) got it. A lot of developers really struggled with it. Especially the folks who were 'front end' developers. Even in the 2000's, when everything was trying to go XML, staffing those roles was hard.


Tooling was XSLT's Achilles heel. Too many tools had no or poor XSLT support. What's confusing to me is a lot of the JavaScript templating today has equally bad tooling yet everyone's in love with it.


Doesn't that suggest that tooling wasn't the issue after all?


The lack of tooling for XSLT and modern JavaScript stuff are on different sides of the adoption curve. XSLT lacked tooling on the upside of the curve so it never hit critical mass. Its other problems could have been solved had it hit critical mass and seen wider adoption.


XSL (Extensible Stylesheet Language) was the stylesheet part; it included XSLT (transformations), XPath, and XSL-FO (XSL Formatting Objects).

XSLT and XPath each went on to great solo careers, XSL-FO kind of fizzled when XML+XSL failed to displace HTML+CSS.


XSL-FO is great for turning XML into PDFs, though.
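For reference, a minimal XSL-FO document looks like this; fed to a formatter such as Apache FOP, it produces a one-page PDF (dimensions and text are illustrative):

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="210mm" page-height="297mm">
      <fo:region-body margin="20mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="14pt">Hello, PDF</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

In practice you rarely write FO by hand; an XSLT transform generates it from the source XML.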

There's also XQuery for transformations and XSD/RelaxNG for schemas. The whole combination of an XML database, a WYSIWYG XML editor (via XSLT), and output transformations is very powerful; every journal we published and every site we ran at the British Medical Journal was generated that way.
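An XQuery FLWOR expression gives a feel for the transformation side (document and element names are hypothetical):

```xquery
(: Select 2003 articles and render them as HTML list items. :)
for $a in doc("journal.xml")//article
where $a/@year = "2003"
order by $a/title
return <li>{ string($a/title) }</li>
```

Against an XML database this runs over the whole collection, which is what makes the database-plus-transformation pipeline practical for publishing.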


XML syntax is fine so long as you're using a proper IDE. I manage a team of reporting analysts and all of them come from Business Analyst or Accountancy backgrounds; they write and maintain XSLT transforms just fine, once they spend a day or two getting used to IntelliJ IDEA.


> Many people "love YAML" and "hate XML" but have never dealt with XSD, so they don't even realize they're talking about syntax instead of functionality.

No, I'm pretty sure that if they have only used XML syntax as a data representation and not XML-associated tooling like XSD, they are quite aware that when they say they hate XML they are talking about syntax.

OTOH, I am surprised at the number of people who seem to think "has a schema language that can be used to specify structure beyond just the basic raw language" distinguishes XML from other data representation languages.

Or, more generally, the way that "if you don't like XML, maybe your problem is you aren't using enough of it" has become an edgy contrarian position rather than the stereotypical establishment enterprise-consultant thing it was 15 or so years ago.


JSON-LD can express any RDF datatype, and some XSD types (though not all) have standardized matches in RDF.
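For example, a JSON-LD value can carry an explicit XSD datatype (the URIs below are placeholders):

```json
{
  "@context": { "xsd": "http://www.w3.org/2001/XMLSchema#" },
  "@id": "http://example.org/order/1",
  "http://example.org/shipped": {
    "@value": "2021-08-05",
    "@type": "xsd:date"
  }
}
```

The `@value`/`@type` pair maps directly to an RDF typed literal, so the date survives as a date rather than a bare string.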


Thanks!


Take up a specialty that sits somewhere between SW and Data Science? I'm a freelancer specializing in knowledge graphs and see a world of promise in this area. It's still a very young field. Get deep on some graph databases, learn the ropes with the W3C semantic web stack, and start looking for companies that have ongoing graph/semantics projects. All the big tech companies have significant investments in this area, Pharma and science are not far behind, and enterprise will be joining in soon as they realize their data landscapes are hopeless. And of course, finish your degree.


I applaud Apple for this. It reaffirms the reasons why it's one of the few companies I feel strongly about. If you can't understand the logic of a decision like this, maybe you don't realize the mountain of suffering that is caused by people trading this material. I recommend listening to Sam Harris's podcast with Gabriel Dance (#213 - THE WORST EPIDEMIC) to get a better picture of the problem.

The main beef people seem to have here is the slippery slope argument. Binary choices are nice, I agree - but almost always they obscure a complex surface that deserves nuance.


This system will only search for known images. If you make new images of child pornography, you're fine (or at least, you won't be flagged by this system). So this initiative does nothing to prevent child abuse.


I understand that. I recommend listening to the podcast if you want to understand the issue in more detail. It's a heavy subject but people here generally want to see both sides of an issue, and I am pretty confident that most people have not really taken in the untold damage this brings to the abuse victims and their families as they are continuously notified about past images of their abuse being found and circulated. Which is how this actually works.

I would be worried about a model predicting child sexual abuse content from unknown images but I am not in the least concerned with one that fingerprints known images.


Self-describing?

Maybe to someone who can make sense of the DDL and read the language the column names are written in. And who understands all the implicit units, the rules around nulls/empties, and the presence of magic strings (SSN, SKU) and special numbers (-1), and on and on. For that you need something like RDF and a proper data model.
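The contrast can be made concrete with a Turtle sketch (the URIs are invented for illustration): instead of a column named `wt` holding `12.5`, the unit and meaning travel with the data.

```turtle
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The property name carries the unit; the literal carries its datatype.
ex:shipment42 ex:weightInKilograms "12.5"^^xsd:decimal ;
              ex:destination       ex:Berlin .
```

Nothing here depends on reading the DDL or guessing the conventions of whoever designed the table.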


Aren't you conflating the lexicon of data management with specific implementations of a relational database management system (RDBMS)?

Sorry, but your response sounds snarky and reminds me of all the ego hurdles I had to overcome when leaving/loving databases and set theory. Please remember that your comment could be someone's first introduction or early step in learning.


If you use Oracle, PostgreSQL, or MySQL (those are the ones I'm familiar with), you can always query the data dictionary and see how your tables relate. For me, that is self-describing.
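In PostgreSQL, for instance, the standard `information_schema` views expose tables, columns, and their declared types:

```sql
-- List every column in the public schema with its type and nullability.
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```

Foreign-key relationships are similarly queryable via `information_schema.table_constraints` and `key_column_usage`.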


Graph databases can hold totally bespoke data that makes sense only to the consuming application or they can hold data that has been factored and connected to outside terminologies and external datasets. One holds data, one holds knowledge.


The amazing part about JSON-LD is that you can make up your own terms when you're missing terminology! And, if you care enough, you can push for them to be introduced into Schema.org.
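A small example of mixing Schema.org terms with a made-up one (the custom namespace and term are hypothetical):

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "favoriteEditor": "https://example.org/vocab/favoriteEditor"
  },
  "@type": "Person",
  "name": "Ada",
  "favoriteEditor": "ed"
}
```

`name` and `Person` resolve to Schema.org; `favoriteEditor` resolves to your own namespace until (or unless) a standard term exists.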


As a semantic architect, this is not my experience. In fact, I see very few large graphs in the wild. The problem is, unsurprisingly, that describing data is difficult. Relating your own conceptualization of a ___domain to another's is frustrating and time-consuming. It will always be easier to create a bespoke model, so people just don't do it. As for OBO, there are many interesting comments here. The OBO ontologies all utilize BFO as an upper-level ontology, and in this regard they are united; otherwise, their quality and utility vary tremendously. I still believe in this work and hope that one day everyone will think about their data as being longer-lived and more important than the software that generated it.


Thank you for sharing your thoughts. Just curious: If you were tasked with architecting and implementing a semantic layer for a complex SaaS platform in a large ___domain from scratch, what would be your approach and what technology stack would you prefer to use and why? What best practices would you adopt, if any?


Lovely to finally see Microsoft adopting RDF and taking on projects like this. There is hope yet for those of us who believe in the promise of semantic data.

