Re: [Dbpedia-discussion] DBtax questions from Marco Fossati on 2015-09-18 (public-lod@w3.org from September 2015)

From: Marco Fossati <hell.j.fox@gmail.com>
Date: Fri, 18 Sep 2015 10:03:15 +0200
To: John Flynn <jflynn12@verizon.net>, 'Magnus Knuth' <magnus.knuth@hpi.uni-potsdam.de>
Cc: public-lod@w3.org, 'dbpedia-discussion' <dbpedia-discussion@lists.sourceforge.net>
Message-ID: <55FBC543.2080706@gmail.com>

Hi John,

I agree with you, but DBTax is general-purpose, not domain-specific.
Cheers!

On 9/18/15 1:25 AM, John Flynn wrote:
> I guess this is a "point-of-view" comment, but attempting to assign "correct" types to entities seems upside-down. An ontology, consisting of specific classes, subclasses, properties, and subproperties plus the specific relationships between them, should describe a specific domain of interest. Once the ontology for that domain of interest is created, the process of identifying the entities/instances that belong within it, and assigning them, can begin. If the ontology is properly designed, it should be very clear which entities fit within that domain of interest as well as where they fit.
>
> John Flynn
> http://semanticsimulations.com
>
> -----Original Message-----
> From: Marco Fossati [mailto:hell.j.fox@gmail.com]
> Sent: Thursday, September 17, 2015 11:26 AM
> To: Magnus Knuth
> Cc: public-lod@w3.org; dbpedia-discussion
> Subject: Re: [Dbpedia-discussion] DBtax questions
>
> Hi Magnus, and thanks for your interest,
>
> Generally speaking, assigning "correct" types to entities is always a highly subjective task.
> From a strictly linguistic point of view, a classification taxonomy is itself a very debatable way to describe the semantics of content expressed in natural language: to fully understand the meaning of, e.g., a Wikipedia article, one should always take contextual information into account.
>
> That said, the main goal of DBTax is to assign as many types as possible, provided that they are different from owl#Thing.
> In this way, we can cluster entities with more meaningful types and query the knowledge base accordingly.
>
> Of course, you could say that owl#Thing has 100% coverage, but does that make sense?
> The claimed 99% instead stems from a *set* of more specific types.
> The price of this high recall is lower precision.
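To make the coverage argument concrete, here is a minimal Python sketch, assuming each entity's types are available as a set of IRIs. The function name `type_coverage` and the toy entities and types are hypothetical illustrations, not DBTax code or data. It shows why counting owl#Thing trivially yields 100% coverage, whereas counting only more specific types gives a lower, more informative figure.

```python
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

def type_coverage(types_by_entity, ignore=frozenset({OWL_THING})):
    """Fraction of entities carrying at least one type outside `ignore`."""
    if not types_by_entity:
        return 0.0
    typed = sum(1 for types in types_by_entity.values() if set(types) - ignore)
    return typed / len(types_by_entity)

# Hypothetical toy assignment: every entity gets owl:Thing, only some get more.
toy = {
    "ex:e1": {OWL_THING, "ex:TypeA"},
    "ex:e2": {OWL_THING, "ex:TypeB"},
    "ex:e3": {OWL_THING, "ex:TypeA", "ex:TypeC"},
    "ex:e4": {OWL_THING},
}

print(type_coverage(toy, ignore=frozenset()))  # counting owl:Thing -> 1.0
print(type_coverage(toy))                      # specific types only -> 0.75
```

Coverage alone says nothing about whether the specific types are correct; measuring precision would additionally require a gold standard of type judgments.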
>
> On 9/17/15 4:04 PM, Magnus Knuth wrote:
>> One structural problem I noticed when looking at the approach [http://jens-lehmann.org/files/2015/semantics_dbtax.pdf] is that most (non-complex) categories contain an article with exactly the same name, e.g. dbr:President dc:subject dbc:President. And indeed, these resources are typed accordingly, e.g. http://it.dbpedia.org/resource/Presidente is a dbtax:President and http://it.dbpedia.org/resource/Pagoda is a dbtax:Pagoda.
> That is obvious for a human, but is it the same for an algorithm? :-)
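A minimal Python sketch of how an algorithm might at least detect the pattern Magnus describes: it flags dc:subject links whose article and category share a local name, i.e. cases where the article probably describes the class itself rather than an instance of it. The helper names and the sample links are hypothetical, the heuristic assumes prefixed names with a single ':' prefix, and it is not part of DBTax.

```python
def local_name(term: str) -> str:
    """Local part of a prefixed name such as 'dbr:President' (assumes one ':' prefix)."""
    return term.split(":", 1)[1] if ":" in term else term

def same_name_pairs(subject_links):
    """Return (article, category) dc:subject pairs whose local names coincide."""
    return [(a, c) for a, c in subject_links if local_name(a) == local_name(c)]

# Hypothetical dc:subject links, written as (article, category) prefixed names.
links = [
    ("dbr:President", "dbc:President"),
    ("dbr:Barack_Obama", "dbc:Presidents_of_the_United_States"),
    ("dbr:Pagoda", "dbc:Pagoda"),
]
print(same_name_pairs(links))
# -> [('dbr:President', 'dbc:President'), ('dbr:Pagoda', 'dbc:Pagoda')]
```

Exact string equality is deliberately crude: it misses plural/singular variants and language-specific inflections, so a real pipeline would need some normalization on top of it.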
>>
>> A type coverage of more than 99 percent is very suspicious, because I'd expect many more resources in DBpedia to be untypeable. Why? A lot of articles in DBpedia describe very abstract concepts, e.g. Liberty, Nationality, Social_inequality (admittedly, you have the class dbtax:Concept, but then what is not a concept?), or they describe classes themselves, e.g. President, Country, Person, Plane (you do have the class dbtax:Classification, but it is not used as such [http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D&format=text%2Fhtml&debug=on]). For some articles it is debatable whether they describe an instance or a class, e.g. Volkswagen_Polo, Horse.
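The SPARQL link in the paragraph above is URL-encoded. As a sketch, Python's standard `urllib.parse` module can recover the query it runs; the helper name `decode_sparql_link` is hypothetical. Decoded, the query simply lists every resource typed as dbtax:Classification.

```python
from urllib.parse import urlparse, parse_qs

# The SPARQL endpoint link quoted in the thread, verbatim (split for readability).
LINK = ("http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+"
        "%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D"
        "&format=text%2Fhtml&debug=on")

def decode_sparql_link(link: str) -> str:
    """Return the SPARQL query carried in the `query` parameter of an endpoint URL."""
    params = parse_qs(urlparse(link).query)  # decodes '+' to space and %XX escapes
    return params["query"][0]

print(decode_sparql_link(LINK))
# -> SELECT * {?res a <http://dbpedia.org/dbtax/Classification>}
```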
>>
>> I see that the classes you extracted are truly valuable for enriching the DBpedia ontology, but they clearly need some tidying up and disambiguation.
> I completely agree: I think we should merge DBTax into the DBpedia ontology mappings wiki to do so.
> BTW, DBTax overlaps with the DBpedia ontology by more than 20%.
>
> Cheers!
>
>

-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

Received on Friday, 18 September 2015 08:03:46 UTC