- From: Gannon Dick <gannon_dick@yahoo.com>
- Date: Sat, 17 Mar 2012 14:52:19 -0700 (PDT)
- To: public-lod <public-lod@w3.org>
- Cc: "eGov IG \(Public\)" <public-egov-ig@w3.org>
- Message-ID: <1332021139.97115.YahooMailNeo@web112615.mail.gq1.yahoo.com>
"A criticism voiced by detractors of Linked Data suggest that Linked Data modeling is too hard or time consuming." There are some sets of standard codes which are infrequently updated.� It might pay for a data set repository to build identifiers to order.� In this way, the standards can be maintained complete and, more to the point, applications can "assume" they are complete. There is an example (ISO 639 Language Codes) here: http://www.rustprivacy.org/2012/urn/lang/loc.tar.gz This includes two mysql databases: 1. A "lite" version with just the tables needed to specify either "terminology" or "bibliographic" codes (including currency).� I used the D2R Server. 2. A full maintainable version, which starts with a "maintain table" and regenerates the tables which address the sticky bits. (The following in case you get caught playing with this at your day job, otherwise, have fun) There are a number of little technical issues, but for Government, one huge Moral Hazard.� The language of Legislation, Policy and Statistical Reporting are coupled with Jurisdiction.� The Moral Hazard arises from the situation where speaking a language not understood by a psychiatrist is then considered insane.� Nobody wants a government who acts like that, and the Open Data Community doesn't want data sets which skip over distinct populations (without saying so) either. --Gannon ________________________________ From: Bernadette Hyland <bhyland@3roundstones.com> To: Hugh Glaser <hg@ecs.soton.ac.uk>; Yury Katkov <katkov.juriy@gmail.com> Cc: Semantic Web <semantic-web@w3.org>; public-lod <public-lod@w3.org> Sent: Friday, March 16, 2012 4:11 PM Subject: Re: How to find the data I need in LD? algorithm and questions. Hi, Hugh - I responded earlier today to Yury, off-list. �So I would offer a different perspective, perhaps because the sun is out here today and it is Friday afternoon and the plum blossoms are blooming... We've moved from: * shouting (circa 2003-2006) to * the meme of Linked Data by TimBL (2007) [1]� * proof-of-concepts (2008-2010) to * a couple academic books, conference talks & keynotes on real world deployments involving LD/LOD�(2010, 2011) to * developers books, W3C Recommendations, published use cases/CXO guides (2012) FWIW, I offered to fold in some of Yury's guidance to the draft Linked Data Cookbook[2] and suggested the cookbook as a possible resource for his students. If you are open to a different viewpoint, here is what I see on the ground in 2012. �There are publishers, both in the private & public sector, who are beginning to publish data as Linked Data. �It is of course a new approach to data publishing and consumption and there are some really entrenched players, so it isn't going to happen within one or two years. �Furthermore, everyone has a "day job" and learning yet another way to publish your data doesn't sound like a career-building activity on face value ... I contend, it will take some public successes, plus a couple of pragmatic Linked Data books for developers, some cookbooks or how-to's, and some well-formed W3C Recommendations for Linked Open Data to be pervasive ... all of which is in progress. It will take probably 10 years before LD/LOD publishing is 'mainstream' but make no mistake, it will happen. �A�Linked Data approach to publishing data (on the Web of data) is as disruptive as the Web of documents was circa 1995. �� It will save organizations millions and governments billions of dollars (or their currency equivalents) in enterprise information integration. 
________________________________
From: Bernadette Hyland <bhyland@3roundstones.com>
To: Hugh Glaser <hg@ecs.soton.ac.uk>; Yury Katkov <katkov.juriy@gmail.com>
Cc: Semantic Web <semantic-web@w3.org>; public-lod <public-lod@w3.org>
Sent: Friday, March 16, 2012 4:11 PM
Subject: Re: How to find the data I need in LD? algorithm and questions.

Hi, Hugh -

I responded earlier today to Yury, off-list. So I would offer a different perspective, perhaps because the sun is out here today and it is Friday afternoon and the plum blossoms are blooming...

We've moved from:
* shouting (circa 2003-2006) to
* the meme of Linked Data by TimBL (2007) [1] to
* proof-of-concepts (2008-2010) to
* a couple of academic books, conference talks & keynotes on real world deployments involving LD/LOD (2010, 2011) to
* developers' books, W3C Recommendations, published use cases/CXO guides (2012)

FWIW, I offered to fold some of Yury's guidance into the draft Linked Data Cookbook [2] and suggested the cookbook as a possible resource for his students.

If you are open to a different viewpoint, here is what I see on the ground in 2012. There are publishers, both in the private & public sector, who are beginning to publish data as Linked Data. It is of course a new approach to data publishing and consumption, and there are some really entrenched players, so it isn't going to happen within one or two years. Furthermore, everyone has a "day job", and learning yet another way to publish your data doesn't sound like a career-building activity at face value ...

I contend it will take some public successes, plus a couple of pragmatic Linked Data books for developers, some cookbooks or how-tos, and some well-formed W3C Recommendations for Linked Open Data to be pervasive ... all of which is in progress. It will probably take 10 years before LD/LOD publishing is 'mainstream', but make no mistake, it will happen. A Linked Data approach to publishing data (on the Web of data) is as disruptive as the Web of documents was circa 1995. It will save organizations millions and governments billions of dollars (or their currency equivalents) in enterprise information integration.

Do I have documented ROIs in a glossy printed consulting report to back that up? No, not yet. I believe we (as in the Linked Data ecosystem) will have this soon. The numbers & case studies will come from big international organizations involved in issue tracking & customer care, business publishing, healthcare, logistics and defense (the non-secret-squirrel part of defense).

Regardless of whether orgs are doing LD behind the firewall or in front of it, publishing Linked Data makes good economic sense, but we're in the early days. Don't lose heart. I see university students learning about LD now in undergrad CS classes. About 20 of us from the UK, Netherlands, Spain, US, India, Australia in government / academe / private sector meet weekly in the W3C Gov't Linked Data Working Group to nut out vocabs, best practices & a cookbook for gov't publication & consumption.

FYR, data.gov recently featured a blogpost [3] by a uni student who did a mashup where he didn't know the publisher of US Gov't content, although he did work under the supervision of someone who knows a bit about RDF.

Kind regards,
Bernadette Hyland

[1] http://www.w3.org/DesignIssues/LinkedData.html
[2] http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
[3] http://www.data.gov/communities/node/116/blogs/6170

On Mar 16, 2012, at 4:15 PM, Hugh Glaser wrote:

> Hi Yury
> Well I am sorry to see you have had no response, but it is not so surprising, really.
> You will find that essentially there are very few people doing what you are trying to do.
> The Semantic Web and Linked Data world is made up of people who publish, and rarely consume.
> It is almost unheard of for someone to consume someone else's data, unless they know the publisher.
> Everyone is shouting, but not many are listening.
> OK, I might not be in a great mood today, but I'm not far wrong.
>
> To your problem.
> Your steps seem reasonable.
> I would, however, add the use of VoiD (http://www.w3.org/TR/void/, http://semanticweb.org/wiki/VoiD).
> VoiD is designed to deliver what you want, I think (if it doesn't, then it should be made to).
> Some sites do publish VoiD descriptions, and these can often be located automatically by looking in the sitemap, which can in turn be discovered by looking in robots.txt.
> Keith Alexander has a store of collected VoiD descriptions (http://kwijibo.talis.com/voiD/), as do we (http://void.rkbexplorer.com).
> I would also suggest that my own site, http://sameas.org, might lead from interesting URIs to other related URIs, and hence to interesting stores.
>
> Hope that helps.
> Best
> Hugh
>
> On 16 Mar 2012, at 04:58, Yury Katkov wrote:
>
>> Hi!
>>
>> What do you usually do when you want to find a dataset for your needs?
>> I'm preparing a tiny tutorial on this topic for the students and ask
>> you to share your experience.
>> My typical algorithm is the following:
>> 0) Define the topic. I have to know precisely what kind of data I need.
>> 1) Look at the Linked Data cloud and other visualizations to ensure that
>> the needed data is present somewhere. If, for example, I want to
>> improve Mendeley or Zotero, I look at these visualizations and search
>> for publication data.
>> 2) Search for the needed properties and classes with Sindice, Sig.ma and Swoogle.
>> 3) Look at the CKAN description of the dataset, its XML sitemap and VoiD metadata.
>> 4) Explore the datasets that were found in the previous step with some
>> simple SPARQL queries like these:
>>
>> SELECT DISTINCT ?p WHERE {
>>   ?s ?p ?o
>> }
>>
>> SELECT DISTINCT ?class WHERE {
>>   { ?class a rdfs:Class . }
>>   UNION
>>   { ?class a owl:Class . }
>> }
>>
>> SELECT DISTINCT ?label WHERE {
>>   { ?a rdfs:label ?label }
>>   UNION
>>   { ?a dc:title ?label }
>>   # and possibly some more things to search foaf:name's and so on
>> }
>>
>> I can also use COUNTing and GROUPing BY to get some quick statistics
>> about the datasets.
>> 5) When I find some interesting URIs I use the semantic web browsers
>> Marbles and Sig.ma to navigate through the dataset.
>> 6) Ask those smart guys on the Semantic Web mailing list and the Public LOD
>> mailing list. Probably go to SemanticOverflow and ask for help there
>> as well.
>> ======================
>> Here are my questions:
>>
>> 1) What else do you typically do to find datasets?
>> 2) Is there a resource where I can find a brief description of a
>> dataset in terms of the properties and classes that are mentioned there? And
>> those cool arrows in Richard Cyganiak's diagram: is there a resource
>> where I can find information about the relationships between a given
>> dataset and the rest of the world?
>> 3) I have a similar algorithm for searching vocabularies. Can resources
>> like Schemapedia help me in searching for datasets?
>> 4) Do you know any other SPARQL queries that can be handy when
>> I search for something in a dataset?
>>
>> Sincerely yours,
>> -----
>> Yury Katkov
>>
>
> --
> Hugh Glaser,
>    Web and Internet Science
>    Electronics and Computer Science,
>    University of Southampton,
>    Southampton SO17 1BJ
> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
> http://www.ecs.soton.ac.uk/~hg/
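Two generic queries in the spirit of the thread above, for Yury's question 4. Both are sketches against an unspecified endpoint rather than recipes for any particular dataset. The first is the COUNT/GROUP BY census Yury alludes to in his step 4; it ranks the classes of an unfamiliar dataset by instance count:

SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class . }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 20

The second puts Hugh's VoiD suggestion in query form. Run against a store of collected VoiD descriptions (such as the two Hugh lists), it pulls out each dataset's title and SPARQL endpoint, where declared:

PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?dataset ?title ?endpoint
WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset dcterms:title ?title . }
  OPTIONAL { ?dataset void:sparqlEndpoint ?endpoint . }
}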
Received on Saturday, 17 March 2012 21:52:49 UTC