Hacker News | __afk__'s comments

Great read, thank you. Also nice that you brought in RDF databases to help expand and refine the problem space. When a graph database topic here on HN leads with an LPG approach, 99% of the time there is hardly a mention of RDF. That leaves everyone a little worse off.


I agree. As I say in the blog, there is no perfect data model. RDF is a great fit for certain things, e.g., storing DBpedia-like datasets and doing "automatic" advanced OWL reasoning over them by implementing the OWL rules. For example, if a record t is of type Y and there is a rule that says every Y is also a Z, then by the OWL reasoning rules t is also of type Z. So if a user scans all Z's, a system that does automatic OWL reasoning should return record t. RDFox is an interesting system that attempts this and occupies a niche. Overall, the semantic web community has thought deeply about these kinds of problems and pioneered these types of computations.
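The subclass rule described above can be sketched as a tiny forward-chaining loop in Python (the triples and names here are made up for illustration, not any real system's API):

```python
def saturate(triples):
    """Apply the subclass-entailment rule until a fixpoint:
    if (x, type, Y) and (Y, subClassOf, Z) then infer (x, type, Z)."""
    triples = set(triples)
    while True:
        inferred = {
            (x, "type", z)
            for (x, p1, y) in triples if p1 == "type"
            for (y2, p2, z) in triples if p2 == "subClassOf" and y2 == y
        }
        if inferred <= triples:   # nothing new: fixpoint reached
            return triples
        triples |= inferred

data = {
    ("t", "type", "Y"),        # record t is a Y
    ("Y", "subClassOf", "Z"),  # every Y is also a Z
}

closed = saturate(data)
# Scanning all Z's now returns t, as an OWL-reasoning system would.
zs = sorted(x for (x, p, c) in closed if p == "type" and c == "Z")
```

Real reasoners like RDFox apply many such rules (the full OWL RL rule set) with far more efficient indexing, but the fixpoint idea is the same.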

Property GDBMSs can do manual reasoning queries if you ask questions with a Kleene star, but they would have to implement the OWL rules, an "OWL rule" data type, a URI data type, etc. There is no other way I'm aware of. I am convinced that with the right architecture this should be possible, and if we see interest, we might work on it in Kùzu. We do definitely plan to support URIs as first-class data types, so at least common queries over URI-heavy datasets and manual reasoning can be done efficiently; that should cover most use cases for applications that want to model their data as triples of URIs.
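A manual reasoning query with a Kleene star might look like the following Cypher sketch (the node labels and relationship types are hypothetical, not any particular system's schema):

```cypher
// Hypothetical schema: instances attach to a (:Class) node via :IS_A, and
// classes form a hierarchy via :SUBCLASS_OF. The *0.. Kleene star walks the
// subclass chain explicitly, standing in for automatic OWL reasoning.
MATCH (x)-[:IS_A]->(:Class)-[:SUBCLASS_OF*0..]->(z:Class {name: 'Z'})
RETURN x
```

The user has to know to write the star; an OWL reasoner would apply the equivalent closure automatically.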


There are more ways to type (at the schema level and inline) and more types (see the common XSD types) available OOB in XML than JSON could ever have. Then we can introduce custom types on top. JSON looks rather impoverished in comparison.
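As a small illustration, here is an XSD fragment combining a built-in type with a custom restriction type (the element names are invented for the example):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Built-in xs:date: validated as a calendar date, not just a string. -->
  <xs:element name="shipped" type="xs:date"/>

  <!-- Custom type: a SKU constrained by a regular expression. -->
  <xs:simpleType name="SKU">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{3}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Plain JSON has numbers, strings, booleans, and null; anything like a date or a constrained identifier is a convention the consumer has to know about.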


Many people "love YAML" and "hate XML" but have never dealt with XSD, so they don't even realize they're talking about syntax instead of functionality.

Admittedly, XML isn't the most pleasant syntax to interact with by hand.


XML was a great idea. At some point in the early aughts, however, XML abuse became rampant: too many crappy tools, too many half-assed attempts at putting square pegs into round holes. It didn't help that XML got inextricably associated with those horrible, over-complicated ws-* standards that ended up sending Russian-novel-sized "hello world" information exchanges.

The main idea, however, was wonderful. An information exchange format (xml itself), a way to define schemas (xsd), a transformation language (xslt), and stylesheets (xsl?). I kind of miss it.


I miss XSLT every time I see megabytes of JavaScript converting some JSON into a soup of HTML to display in browsers. A bunch of JavaScript with a bunch of overwrought "components" is serving as a very complicated templating system.

A lot of really stupid JavaScript could be done fairly easily with XSLT natively inside a browser engine. An XML document can point to its own XSL stylesheet(s) to get rendered into the appropriate delivery format, or just be used directly as data.
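For anyone who never saw this in action, a minimal sketch (file names invented): the XML document points at its own stylesheet with a processing instruction, and the browser applies the transform.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="items.xsl"?>
<items>
  <item>Apples</item>
  <item>Pears</item>
</items>
```

And `items.xsl` turns the data into HTML:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Render each <item> as a list entry. -->
  <xsl:template match="/items">
    <ul>
      <xsl:for-each select="item">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

No script, no framework: the same document serves as data for machines and as a rendered page for people.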


I don't miss it. I had several projects where it seemed like a good idea. I pushed it as a good idea. I really wanted it to work. Some developers (usually the sharper bulbs) got it. A lot of developers really struggled with it. Especially the folks who were 'front end' developers. Even in the 2000's, when everything was trying to go XML, staffing those roles was hard.


Tooling was XSLT's Achilles heel. Too many tools had no or poor XSLT support. What's confusing to me is a lot of the JavaScript templating today has equally bad tooling yet everyone's in love with it.


Doesn't that suggest that tooling wasn't the issue after all?


The lack of tooling for XSLT and modern JavaScript stuff are on different sides of the adoption curve. XSLT lacked tooling on the upside of the curve so it never hit critical mass. Its other problems could have been solved had it hit critical mass and seen wider adoption.


XSL (Extensible Stylesheet Language) was the stylesheet part; it included XSLT (transformations), XPath, and XSL-FO (XSL Formatting Objects).

XSLT and XPath each went on to great solo careers, XSL-FO kind of fizzled when XML+XSL failed to displace HTML+CSS.


XSL-FO is great for turning XML into PDFs, though.
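For reference, a minimal XSL-FO document looks like this; fed to a formatter such as Apache FOP, it produces a one-page PDF (dimensions and text are illustrative):

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="210mm" page-height="297mm">
      <fo:region-body margin="20mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="14pt">Hello, PDF</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

In practice you rarely write FO by hand; an XSLT transform generates it from the source XML.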

There's also XQuery for transformations and XSD/RelaxNG for schemas. The whole combination of an XML database, a WYSIWYG XML editor (via XSLT), and output transformations is very powerful; every journal we published and every site we ran at the British Medical Journal was generated that way.
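An XQuery FLWOR expression gives a feel for the transformation side (document and element names are hypothetical):

```xquery
(: Select 2003 articles and render them as HTML list items. :)
for $a in doc("journal.xml")//article
where $a/@year = "2003"
order by $a/title
return <li>{ string($a/title) }</li>
```

Against an XML database this runs over the whole collection, which is what makes the database-plus-transformation pipeline practical for publishing.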


XML syntax is fine so long as you're using a proper IDE. I manage a team of reporting analysts and all of them come from Business Analyst or Accountancy backgrounds; they write and maintain XSLT transforms just fine, once they spend a day or two getting used to IntelliJ IDEA.


> Many people "love YAML" and "hate XML" but have never dealt with XSD, so they don't even realize they're talking about syntax instead of functionality.

No, I'm pretty sure that if they have only used XML syntax as a data representation and not XML-associated tooling like XSD, they are quite aware that when they say they hate XML they are talking about syntax.

OTOH, I am surprised at the number of people who seem to think "has a schema language that can be used to specify structure beyond just the basic raw language" distinguishes XML from other data representation languages.

Or, more generally, the way that "if you don't like XML, maybe your problem is you aren't using enough of it" has become an edgy contrarian position rather than the stereotypical establishment enterprise-consultant thing it was 15 or so years ago.


JSON-LD can express any RDF datatype, and some XSD types (though not all) have standardized matches in RDF.
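For example, a JSON-LD value can carry an explicit XSD datatype (the URIs below are placeholders):

```json
{
  "@context": { "xsd": "http://www.w3.org/2001/XMLSchema#" },
  "@id": "http://example.org/order/1",
  "http://example.org/shipped": {
    "@value": "2021-08-05",
    "@type": "xsd:date"
  }
}
```

The `@value`/`@type` pair maps directly to an RDF typed literal, so the date survives as a date rather than a bare string.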


Thanks!


Take up a specialty that sits somewhere between SW and Data Science? I'm a freelancer specializing in knowledge graphs and see a world of promise in this area. It's still a very young field. Get deep on some graph databases, learn the ropes with the W3C semantic web stack, and start looking for companies that have ongoing graph/semantics projects. All the big tech companies have significant investments in this area, Pharma and science are not far behind, and enterprise will be joining in soon as they realize their data landscapes are hopeless. And of course, finish your degree.


I applaud Apple for this. It reaffirms the reasons why it's one of the few companies I feel strongly about. If you can't understand the logic of a decision like this, maybe you don't realize the mountain of suffering that is caused by people trading this material. I recommend listening to Sam Harris's podcast with Gabriel Dance (#213 - THE WORST EPIDEMIC) to get a better picture of the problem.

The main beef people seem to have here is the slippery slope argument. Binary choices are nice, I agree - but almost always they obscure a complex surface that deserves nuance.


This system will only search for known images. If you make new images of child pornography, you're fine (or at least, you won't be flagged by this system). So this initiative does nothing to prevent child abuse.


I understand that. I recommend listening to the podcast if you want to understand the issue in more detail. It's a heavy subject but people here generally want to see both sides of an issue, and I am pretty confident that most people have not really taken in the untold damage this brings to the abuse victims and their families as they are continuously notified about past images of their abuse being found and circulated. Which is how this actually works.

I would be worried about a model predicting child sexual abuse content from unknown images but I am not in the least concerned with one that fingerprints known images.


Self-describing?

Maybe to someone who can make sense of the DDL and read the language the column names are written in. And who understands all the implicit units, the rules around nulls/empties, and the presence of magic strings (SSN, SKU) and special numbers (-1), and on and on. For that you need something like RDF and a proper data model.
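The contrast can be made concrete with a Turtle sketch (the URIs are invented for illustration): instead of a column named `wt` holding `12.5`, the unit and meaning travel with the data.

```turtle
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The property name carries the unit; the literal carries its datatype.
ex:shipment42 ex:weightInKilograms "12.5"^^xsd:decimal ;
              ex:destination       ex:Berlin .
```

Nothing here depends on reading the DDL or guessing the conventions of whoever designed the table.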


Aren't you conflating the lexicon of data management with specific implementations of a relational database management system (RDBMS)?

Sorry, but your response sounds snarky and reminds me of all the ego hurdles I had to overcome when leaving/loving databases and set theory. Please remember that your comment could be someone's first introduction or early step in learning.


If you use Oracle, PostgreSQL, or MySQL (those are the ones I'm familiar with), you can always query the data dictionary and see how your tables relate. For me, that is self-describing.
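In PostgreSQL, for instance, the standard `information_schema` views expose tables, columns, and their declared types:

```sql
-- List every column in the public schema with its type and nullability.
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```

Foreign-key relationships are similarly queryable via `information_schema.table_constraints` and `key_column_usage`.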


Graph databases can hold totally bespoke data that makes sense only to the consuming application or they can hold data that has been factored and connected to outside terminologies and external datasets. One holds data, one holds knowledge.


The amazing part about JSON-LD is that you can make up your own terms when you're missing terminology! And, if you care enough, you can push for them to be introduced into Schema.org.
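A small example of mixing Schema.org terms with a made-up one (the custom namespace and term are hypothetical):

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "favoriteEditor": "https://example.org/vocab/favoriteEditor"
  },
  "@type": "Person",
  "name": "Ada",
  "favoriteEditor": "ed"
}
```

`name` and `Person` resolve to Schema.org; `favoriteEditor` resolves to your own namespace until (or unless) a standard term exists.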


As a semantic architect, this is not my experience. In fact, I see very few large graphs in the wild. The problem is, unsurprisingly, that describing data is difficult. Relating your own conceptualization of a ___domain to another's is frustrating and time-consuming. It will always be easier to create a bespoke model, so people just don't do it. As for OBO, there are many interesting comments here. The OBO ontologies all utilize BFO as an upper-level ontology, and in this regard they are united; otherwise, their quality and utility vary tremendously. I still believe in this work and hope that one day everyone will think about their data as being longer-lived and more important than the software that generated it.


Thank you for sharing your thoughts. Just curious: If you were tasked with architecting and implementing a semantic layer for a complex SaaS platform in a large ___domain from scratch, what would be your approach and what technology stack would you prefer to use and why? What best practices would you adopt, if any?


Lovely to finally see Microsoft adopting RDF and taking on projects like this. There is hope yet for those of us who believe in the promise of semantic data.

