Re: PHP RDF fetching code from Hugh Glaser on 2010-01-28 (public-lod@w3.org from January 2010)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Thu, 28 Jan 2010 12:26:29 +0000
To: Stephane Corlosquet <scorlosquet@gmail.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <EMEW3|da3f70751b24b3437a0fdc964f295579m0RCQa02hg|ecs.soton.ac.uk|C78732F5.FC4D%>
Thanks for the pointer.
(Won't actually look at the ARC code at the moment, as it may be hard to comply with Benji's license.)

However, rather than being as clever as possible, I thought I should respect what the publisher said: so perhaps check the Content-Type first, then the extension, rather than ignoring them.
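
A minimal sketch of that precedence in PHP, mapping to rapper input syntax names. The function name and both lookup tables are assumptions for illustration. text/plain is deliberately treated as uninformative, since many servers send it for any file they do not recognise, and N3 is mapped to rapper's Turtle parser on the assumption that the data stays within the Turtle subset.

```php
<?php
// Hypothetical helper: pick a rapper input syntax, trusting the Content-Type
// first and the file extension second, and falling back to rapper's 'guess'.
function rapper_input_for(?string $contentType, ?string $extension): string
{
    // Specific RDF media types. text/plain is left out on purpose: many servers
    // send it for any file they do not recognise, so it says little.
    $byType = array(
        'application/rdf+xml'  => 'rdfxml',
        'text/turtle'          => 'turtle',
        'application/x-turtle' => 'turtle',
        'application/turtle'   => 'turtle',
        'text/n3'              => 'turtle', // assumption: the N3 stays within Turtle
        'text/rdf+n3'          => 'turtle',
    );
    $byExtension = array(
        'rdf' => 'rdfxml',
        'ttl' => 'turtle',
        'n3'  => 'turtle',
        'nt'  => 'ntriples',
    );

    if ($contentType !== null) {
        // Strip parameters such as "; charset=utf-8" and normalise case.
        $parts = explode(';', $contentType);
        $mime = strtolower(trim($parts[0]));
        if (isset($byType[$mime])) {
            return $byType[$mime];
        }
    }
    if ($extension !== null) {
        $ext = strtolower(ltrim($extension, '.'));
        if (isset($byExtension[$ext])) {
            return $byExtension[$ext];
        }
    }
    return 'guess'; // nothing usable from the publisher: let rapper guess
}
```

So rapper_input_for('text/plain', 'ttl') gives 'turtle', while an application/rdf+xml response wins over whatever the extension says.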

The reason I wasn't relying on rapper --guess is that the handover to rapper is part of the RDF store, and I will probably use other stores that don't use rapper.
Also, I wanted to gather statistics on what RDF formats people were using, and couldn't see a rapper option that reports the input type it guessed.

At the moment I record the Content-Type and the extension, and then let rapper or whatever do their magic; I guess that is enough.

Cheers
Hugh

On 28/01/2010 02:25, "Stephane Corlosquet" <scorlosquet@gmail.com> wrote:

Hugh,

The ARC2 parser has a "built-in RDF format detector" [1]. You might want to look at the code to see how it's done.

Why not use the --guess option of rapper?
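
A minimal sketch of handing a downloaded file to rapper from PHP, assuming rapper is on the PATH; the helper name rapper_command is hypothetical. The input syntax can be 'guess' (the same as --guess) or an explicit parser name such as 'turtle' or 'rdfxml', and passing the URL the data was fetched from as the base URI keeps relative URIs in the file resolving correctly even though rapper reads a local copy.

```php
<?php
// Hypothetical helper: build a rapper command line for a locally saved file.
// $inputSyntax is a rapper parser name ('guess', 'turtle', 'rdfxml', 'ntriples');
// $baseUri should be the URL the data was actually fetched from.
function rapper_command(string $file, string $inputSyntax, string $baseUri): string
{
    // escapeshellarg() quotes each argument so paths and URLs cannot inject shell syntax.
    return 'rapper -q -i ' . escapeshellarg($inputSyntax)
        . ' -o ntriples '
        . escapeshellarg($file) . ' '
        . escapeshellarg($baseUri);
}

// Usage (requires rapper to be installed):
// exec(rapper_command('/tmp/data.ttl', 'guess', 'http://example.org/data.ttl'), $lines, $status);
// $status is rapper's exit code; $lines holds the N-Triples output.
```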

Steph.

[1] http://arc.semsol.org/docs/v2/parsing

On Wed, Jan 27, 2010 at 9:08 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
On 27/01/2010 09:49, "Tom Heath" <tom.heath@talis.com> wrote:

> +1 for Moriarty, whether you're working with the Platform or not. Ian
> and the other contributors have done a great job - personally I'd
> start here before writing any new code.
Too true mate.

Now my next bit of pissing about.
Before writing it (if I can find the gumption).
Don't think this is in Moriarty, as the Talis Platform is, of course, well-behaved.

I run cURL, using an amended version of what was described before (as at the end of this message).

So now I need to deal with what comes back.
I actually hand it over to rapper, so I would sort of like to know what format the data is in, to improve reliability by setting rapper's input type parameter (-i).
I am trying to avoid looking inside the file, although am happy to if someone can provide the code :-).
The Content-Type is unreliable: for example, it could be (and is likely to be) text/plain for a Turtle file that someone has put on a standard web server.
So it is the usual problem of messing about with extensions, modified by extra information from the Content-Type.
Of course we need to worry about the final URL (curl_getinfo($ch)['url']), possibly as well as the requested URI, as either might be where the extension is.
So perhaps something that sets the Content-Type in curl_getinfo($ch) as best it can?
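
A sketch of the extension half, assuming the final URL is checked before the requested URI; rdf_extension is a hypothetical name. parse_url() drops any query string or fragment before pathinfo() looks for an extension, so .../foo.ttl?format=x still yields ttl.

```php
<?php
// Hypothetical helper: find a file extension, preferring the final URL curl
// ended up at ($info['url']) over the URI originally requested.
function rdf_extension(string $finalUrl, string $requestedUri): ?string
{
    foreach (array($finalUrl, $requestedUri) as $url) {
        // Only the path can carry an extension; parse_url() drops ?query and #fragment.
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path) || $path === '') {
            continue;
        }
        $ext = pathinfo($path, PATHINFO_EXTENSION);
        if ($ext !== '') {
            return strtolower($ext);
        }
    }
    return null; // neither URI has an extension
}

// Usage after the fetching code:
// $ext = rdf_extension($info['url'], $_REQUEST['uri']);
```

The result can then be weighed against $info['content_type'] to decide which rapper input type to set.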

Any offers? (Pretty please!)
And maybe we can feed back to Moriarty, PEAR, etc., unless it is already there and I missed it.

On another worry: if the requested URI does a 302 to a new URI, which then does a 303, it looks like an interesting challenge to capture the new URI as expected. I don't intend to do this at the moment, but if anyone has done that, ...
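
One possible approach, sketched with hypothetical function names: switch off CURLOPT_FOLLOWLOCATION and follow redirects by hand, recording every hop. The fetcher is passed in as a callable so the chain logic can be checked without network access; the curl-based fetcher relies on CURLINFO_REDIRECT_URL (PHP 5.3.7 or later) to turn a relative Location header into an absolute URL.

```php
<?php
// Hypothetical curl-based fetcher: one request, no automatic redirects.
// Returns array(status code, absolute redirect target or null).
function fetch_status_no_follow(string $uri): array
{
    $ch = curl_init($uri);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml, text/turtle"));
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // CURLINFO_REDIRECT_URL is the Location header resolved to an absolute URL.
    $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);
    return array($status, $location ? $location : null);
}

// Follow redirects one hop at a time, recording each URI and its status code.
function follow_redirects(string $uri, callable $fetch, int $maxHops = 10): array
{
    $hops = array();
    $current = $uri;
    for ($i = 0; $i <= $maxHops; $i++) {
        list($status, $location) = $fetch($current);
        $hops[] = array('uri' => $current, 'status' => $status);
        $isRedirect = in_array($status, array(301, 302, 303, 307, 308), true);
        if (!$isRedirect || $location === null) {
            return $hops;
        }
        $current = $location;
    }
    throw new RuntimeException("More than $maxHops redirects from $uri");
}

// The URI that answered 303 See Other, typically the resource the document describes.
function uri_answering_303(array $hops): ?string
{
    foreach ($hops as $hop) {
        if ($hop['status'] === 303) {
            return $hop['uri'];
        }
    }
    return null;
}

// Usage: $hops = follow_redirects($_REQUEST['uri'], 'fetch_status_no_follow');
```

This fetcher discards response bodies, so the last URI in $hops would still be fetched as before (or the fetcher extended to return the body as well).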

Enjoy.
Hugh

PHP much preferred.

Fetching code:
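$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $_REQUEST['uri']);
// Only allow http(s), including after redirects (curl would otherwise accept file:// etc.)
curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
curl_setopt($ch, CURLOPT_REDIR_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
curl_setopt($ch, CURLOPT_USERAGENT, "http://void.rkbexplorer.com/ submission agent 1.0");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302/303/307 redirects
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml, text/n3, text/rdf+n3, text/turtle, application/x-turtle, application/turtle, text/plain"));
$data = curl_exec($ch);  // false on transport failure
$info = curl_getinfo($ch);  // $info['url'] = final URL, $info['content_type'] = returned Content-Type
$error = ($data === false) ? curl_error($ch) : null;  // must be read before curl_close()
curl_close($ch);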

>
> My 2p worth :)
>
> Tom.
>
>
> 2010/1/26 Ian Davis <lists@iandavis.com>:
>> You may find something useful in my Moriarty project:
>>
>> http://code.google.com/p/moriarty/
>>
>> It's geared towards the Talis Platform but there is a lot of code in
>> there that has no dependencies on the platform, e.g.:
>>
>> http://code.google.com/p/moriarty/source/browse/trunk/httprequest.class.php
>>
>> some documentation for that class here:
>>
>> http://code.google.com/p/moriarty/wiki/HttpRequest
>>
>> Ian
>>
>
>
Received on Thursday, 28 January 2010 12:27:28 UTC