
I'm as big a critic of Google as anyone, but I'm always surprised at modern day takes around the lost semantic web technologies - they are missing facts or jumping to conclusions in hindsight.

Here's what people should know.

1) The failure of XHTML was very much a multi-vendor, industry-wide affair. The problem was that the syntax of XML was stricter than the syntax of HTML, and the web was already littered with broken HTML that all the browser vendors had to parse through layers of quirk handling. There was simply no clear user payoff for moving to the stricter parsing rules of XML, and there was basically no vendor who wanted to do the work. To my memory Google does not really stand out here; like all the other vendors, they largely avoided working on what was frequently referred to as a science project.
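To make the strictness gap concrete, here is a minimal sketch using only the Python standard library. The markup string and the helper names (`parses_as_xml`, `html_start_tags`) are made up for illustration. Note that `html.parser` is only a lenient tokenizer, not a browser-grade parser with real quirk handling, so treat this as a rough analogy.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical "tag soup": unclosed <p> and <br>, mis-nested <b>/<i>.
BROKEN = "<html><body><p>Hello<br><b><i>world</b></i></body></html>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup):
    """Feed the markup to the lenient tokenizer and list the tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


xml_ok = parses_as_xml(BROKEN)   # the XML parser rejects the whole document
tags = html_start_tags(BROKEN)   # the HTML tokenizer keeps going regardless
print(xml_ok, tags)
```

An XML parser has to reject the entire page on the first well-formedness error, which is exactly the behaviour that would have broken a large share of the existing web.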

2) In subsequent years, Google has actually delivered a semantic web of sorts: https://developers.google.com/search/docs/appearance/structu...

A few things stand out as interesting. First of all, the old semantic web never had a business case. JSON-LD Structured Data does: Google will parse your structured data and use it to inform the various snippets, factoids, previews and interactive widgets they show all over their search engine and other web properties. As a result, JSON-LD has taken off massively. Millions of websites have adopted it. The data is there in the document; it is just in a JSON-LD section. If you work in SEO you know all about this. It seems to be quite rare that anyone on Hacker News is aware of it, however.
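For anyone who has not looked at one of these sections: the data lives in a `<script type="application/ld+json">` element, so pulling it out takes a few lines. A rough sketch with the Python standard library follows. The page content, the `JsonLdExtractor` class and the `extract_jsonld` helper are all invented for illustration; this is not how Google's crawler does it.

```python
import json
from html.parser import HTMLParser

# Hypothetical page; the headline and author are invented.
PAGE = """<html><head><title>Example</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Hypothetical headline",
 "author": {"@type": "Person", "name": "Jane Doe"}}
</script>
</head><body><h1>Hypothetical headline</h1></body></html>"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the page


def extract_jsonld(html):
    """Return a list of decoded JSON-LD objects found in the HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.items


items = extract_jsonld(PAGE)
print(items[0]["@type"], items[0]["headline"])
```

The consumer never has to understand the surrounding HTML at all; it only has to find the script element and hand the contents to a JSON parser.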

Second interesting thing: why did we end up with the semantic data being in JSON in a separate section of the file? I don't know. I think everyone just found that interleaving it within the HTML was not that useful. For the legacy reasons discussed earlier, HTML is a mess. It's difficult to parse. It's overloaded with a lot of stuff. JSON is the more modern thing. It seems reasonable to me that we ended up with this implementation. Note that Google does have some level of support for other semantic data formats, like RDFa, which I think is embedded directly in the HTML, but it is not popular.

Which brings us to the third interesting thing: the schemas Google uses with JSON-LD are standards, or at least... standard-y. The W3C is involved. Google, Yahoo, Yandex and Microsoft have made the largest contributions, to my knowledge. You can read all about it on schema.org.

TL;DR - XHTML was not a practical technology and no browser or tool vendor wanted to support it. We eventually got the semantic web anyway!






The “semantic web” has been successful in a few areas, though not nearly as much as SQL or document databases. Many data formats use it, such as RSS feeds and the XMP metadata used by Adobe tools.

Point n°2 is only partially correct.

Google does support multiple semantic web standards: RDFa, JSON-LD and, I believe, microdata as well.

JSON-LD is much simpler to extract and parse. However, it makes the site's HTML bigger, because information gets duplicated compared to RDFa, where values can be inlined.
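A toy illustration of the duplication point, assuming a made-up product name and two hand-written fragments (neither comes from a real site). The RDFa version annotates the visible element, while the JSON-LD version has to repeat the value in its own block.

```python
NAME = "Acme Anvil"  # hypothetical product name

# RDFa: the visible <h1> carries the annotation, so the value appears once.
RDFA_PAGE = (
    '<div vocab="https://schema.org/" typeof="Product">'
    f'<h1 property="name">{NAME}</h1>'
    "</div>"
)

# JSON-LD: the value appears in the script block and again in the markup.
JSONLD_PAGE = (
    '<script type="application/ld+json">'
    f'{{"@context": "https://schema.org", "@type": "Product", "name": "{NAME}"}}'
    "</script>"
    f"<h1>{NAME}</h1>"
)

rdfa_count = RDFA_PAGE.count(NAME)
jsonld_count = JSONLD_PAGE.count(NAME)
print(rdfa_count, jsonld_count)
```

On a real page the overhead depends on how much of the structured data is also shown to the reader, so this only shows the shape of the trade-off, not its size.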



