Re: voiD 1.0 guide comments

Jiri,

Thanks for the feedback.

On 1 Feb 2009, at 21:03, Jiri Prochazka wrote:
> In the article I haven't found a solid definition of what is a dataset
> and when to use another dataset/subset. I think this has to be clearly
> defined.

"A dataset in voiD (void:Dataset) is a collection of data, which is:
- published and maintained by a single provider, and
- available as RDF, and
- accessible, for example, through dereferenceable HTTP URIs or a
SPARQL endpoint."

I think this is as clear as possible without becoming overly
constraining.

> From what I understood, the publisher which is the "primary key" of
> datasets.

It's three points, see above.

> I think that it should be emphasized that categorizing datasets should
> only be used, if the data in it are somewhat homogeneous - the
> categorization applies to all of it.

Categorization is an art that is way older than voiD, and we don't
want to tell people how to do it properly! And I definitely don't agree
with you when you say that "a categorization must apply to all of the
dataset". For example, I think it would be absolutely adequate to say
that DBpedia is about people and geography, because it is a sizable
and valuable resource for both those areas, even though it also
contains data about lots of other things.
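
To make that concrete, here is a rough sketch of what such a description
could look like, written as a small Python snippet that prints Turtle.
It is only an illustration under assumptions: the helper name
describe_dataset, the local name :DBpedia, and the two subject URIs I
picked for "people" and "geography" are mine and not taken from the
guide.

```python
# Namespace declarations used by the generated Turtle.
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix : <#> .\n"
)


def describe_dataset(name, subjects):
    """Render a void:Dataset with zero or more dcterms:subject values.

    Hypothetical helper for illustration; `name` becomes a local name
    in the document's own namespace.
    """
    statements = [f":{name} a void:Dataset"]
    statements += [f"    dcterms:subject <{uri}>" for uri in subjects]
    return PREFIXES + "\n" + " ;\n".join(statements) + " .\n"


# Assumed subject URIs: illustrative picks, not prescribed anywhere.
dbpedia_turtle = describe_dataset(
    "DBpedia",
    [
        "http://dbpedia.org/resource/Person",
        "http://dbpedia.org/resource/Geography",
    ],
)
print(dbpedia_turtle)
```

Nothing here claims the subjects are exhaustive; they only say what the
dataset is a good resource for.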

> I guess the categorization it is fairly unusable in use cases like
> personal website, because the information are various...

Well, http://dbpedia.org/resource/Personal_web_page might be a nice  
subject here. (Assuming that you do have some interesting RDF on your  
site!)

(I note with regret that the Wikipedia article on "Random stuff" has
been deleted; it would have made for another nice DBpedia resource...)

> Another thing - dataset partitioning. Combination of dataset
> categorization and partitioning led me to great confusion - I have
> thought voiD also wanted to categorize the data in the dataset.
> Better to put a notice that partitioning should be used carefully and
> that it was designed for mirroring of datasets.

I don't understand. "I have thought voiD also wanted to categorize
the data in the dataset" -- yes, that IS what we want. "partitioning
was designed for mirroring of datasets" -- no, it was designed for
cases where voiD authors want to say something about just a part of
the dataset, and not about the entire dataset, for whatever reason.
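
A minimal sketch of that idea, again as Python that prints Turtle: a
parent dataset points to a part via void:subset, and the subject is
attached to the part only. The helper name describe_partition, the
local names :DBpedia and :DBpedia_Geography, and the subject URI are
assumptions made up for this example.

```python
# Namespace declarations used by the generated Turtle.
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix : <#> .\n"
)


def describe_partition(parent, part, part_subject):
    """Render a dataset with one subset that carries its own subject.

    Hypothetical helper for illustration; the subject is stated about
    the part only, not about the parent dataset.
    """
    return (
        PREFIXES
        + "\n"
        + f":{parent} a void:Dataset ;\n"
        + f"    void:subset :{part} .\n\n"
        + f":{part} a void:Dataset ;\n"
        + f"    dcterms:subject <{part_subject}> .\n"
    )


# Assumed names and URI, chosen only to illustrate the pattern.
partition_turtle = describe_partition(
    "DBpedia",
    "DBpedia_Geography",
    "http://dbpedia.org/resource/Geography",
)
print(partition_turtle)
```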

Best,
Richard



>
>
> Best regards,
> Jiri Prochazka
>
>
> PS: Please send the replies also directly to me, as I am not subscribed
> to this list.
>

Received on Monday, 2 February 2009 00:44:57 UTC