- From: Marco Fossati <hell.j.fox@gmail.com>
- Date: Fri, 18 Sep 2015 10:03:15 +0200
- To: John Flynn <jflynn12@verizon.net>, 'Magnus Knuth' <magnus.knuth@hpi.uni-potsdam.de>
- Cc: public-lod@w3.org, 'dbpedia-discussion' <dbpedia-discussion@lists.sourceforge.net>
Hi John,

I agree with you, but DBTax is general-purpose, not domain-specific.

Cheers!

On 9/18/15 1:25 AM, John Flynn wrote:
> I guess this is a "point-of-view" comment, but attempting to assign "correct" types to entities seems upside-down. An ontology, consisting of specific classes, subclasses, properties, and subproperties plus the specific relationships between these, should describe a specific domain of interest. Once the domain-of-interest ontology is created, the process of identifying and assigning entities/instances that belong within that domain of interest can begin. If the ontology is properly designed, it should be very clear which entities fit within that domain of interest as well as where they fit.
>
> John Flynn
> http://semanticsimulations.com
>
> -----Original Message-----
> From: Marco Fossati [mailto:hell.j.fox@gmail.com]
> Sent: Thursday, September 17, 2015 11:26 AM
> To: Magnus Knuth
> Cc: public-lod@w3.org; dbpedia-discussion
> Subject: Re: [Dbpedia-discussion] DBtax questions
>
> Hi Magnus, and thanks for your interest,
>
> Generally speaking, assigning "correct" types to entities is always a highly subjective task.
> From a strictly linguistic point of view, a classification taxonomy is itself a very debatable way to describe the semantics of content expressed in natural language: one should always keep contextual information in mind to deeply understand the sense of, e.g., a Wikipedia article.
>
> That said, the main goal of DBTax is to assign as many types as possible, provided that they are different from owl#Thing.
> In this way, we can cluster entities with more meaningful types and query the knowledge base accordingly.
>
> Of course, you can say that owl#Thing has 100% coverage, but does it make sense?
> The claimed 99% instead stems from a *set* of more specific types.
> Then, high recall comes with a precision cost.
>
> On 9/17/15 4:04 PM, Magnus Knuth wrote:
>> One structural problem I recognized when seeing the approach [http://jens-lehmann.org/files/2015/semantics_dbtax.pdf] is that most (non-complex) categories contain an article with exactly the same name, e.g. dbr:President dc:subject dbc:President. And indeed these resources are typed accordingly, e.g. http://it.dbpedia.org/resource/Presidente is a dbtax:President and http://it.dbpedia.org/resource/Pagoda is a dbtax:Pagoda.
> That is obvious for a human, but is it the same for an algorithm? :-)
>>
>> A type coverage of more than 99 percent is very suspicious, because I'd expect many more resources in DBpedia to be non-typeable. Why? A lot of articles in DBpedia describe very abstract concepts, e.g. Liberty, Nationality, Social_inequality (well, you have the class dbtax:Concept, but what, on the other hand, is not a concept?), or they describe classes themselves, e.g. President, Country, Person, Plane (well, you have the class dbtax:Classification, but it is not used as such [http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D&format=text%2Fhtml&debug=on]). For some articles it is arguable whether they are an instance or a class, e.g. Volkswagen_Polo, Horse.
>>
>> I see that the classes you extracted are truly valuable for enriching the DBpedia ontology, but they obviously need some tidy-up and disambiguation efforts.
> I completely agree: I think we should merge DBTax into the DBpedia ontology mappings wiki to do so.
> BTW, DBTax overlaps with the DBpedia ontology by more than 20%.
>
> Cheers!
>
> --
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
Received on Friday, 18 September 2015 08:03:46 UTC