Re: scientific publishing process (was Re: Cost and access) from Hugh Glaser on 2014-10-05 (public-lod@w3.org from October 2014)

From: Hugh Glaser <hugh@glasers.org>
Date: Sun, 5 Oct 2014 18:17:35 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Laura Dawson <Laura.Dawson@bowker.com>, Daniel Schwabe <dschwabe@inf.puc-rio.br>, W3C Semantic Web IG <semantic-web@w3.org>, W3C LOD Mailing List <public-lod@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Prud'hommeaux <eric@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Bernadette Hyland <bhyland@3roundstones.com>
Message-Id: <D3642989-0D9F-4AB0-A374-E0543723E83D@glasers.org>
Hi Ivan,
> On 5 Oct 2014, at 16:42, Ivan Herman <ivan@w3.org> wrote:
> 
> 
> On 05 Oct 2014, at 16:47 , Laura Dawson <Laura.Dawson@bowker.com> wrote:
> 
>> I think I mentioned previously, Ivan, but perhaps not on this thread -
>> Hugh McGuire has developed a Wordpress tool called PressBooks which allows
>> you to write a book in HTML and export it as an EPUB file. He even
>> supports schema.org markup in a separate plugin.
>> (http://www.pressbooks.com)
> 
> Indeed, I forgot!
> 
> The problem with this service (and, I guess, with the others too) is that, at least through the standard offers on the sites, it may not be appropriate for a workshop, which would require giving access to a large(r) number of submitters in the submission phase, followed by a selection process that ends up with a small number of the submissions in the final book. This does not really fit the business models. It should be up to the scholarly publishers to pick this up…
Yes, we must keep remembering that the documents are simply one bit of a social machine, long before they get anywhere near (the unlikely event of them) being published.
> 
> (But I guess we digress greatly from the main topic of this mailing list, ie, semantic web…)
We did that quite a while ago, I think :-)
But in the end you just gotta go with the flow, man.

Best
Hugh
> 
> Ivan
> 
>> 
>> On 10/5/14, 10:34 AM, "Ivan Herman" <ivan@w3.org> wrote:
>> 
>>> This is not a direct answer to Daniel, but rather expanding on what he
>>> said. Actually, he and I were (and still are) in the same IW3C2
>>> committee, ie, we share the experience; and I was one of those (although
>>> the credit really goes to Bob Hopgood, actually, who was pushing that the
>>> most) who tried to come up with a proper XHTML template.
>>> 
>>> The real problem is still the missing tooling. Authors, even if
>>> technically savvy like this community, want to do what they set out to do:
>>> write their papers as quickly as possible. They do not want to spend
>>> their time going through some esoteric CSS massaging, for example. Let us
>>> face it: we are not yet there. The tools for authoring are still very
>>> poor. This in spite of the fact that many realize that PDF is really not
>>> the format for our age; we need much more than a reproduction of a
>>> printed page digitally (as someone referred to in the thread I really
>>> suffer when I have to read, let alone review, an article in PDF on my
>>> iPad...).
>>> 
>>> But I do see an evolution that might change this in the coming years. Laura
>>> dropped the magic word in the early phases of this thread: ePub. ePub is
>>> a packaged (zip archived) HTML site, with some additional information. It
>>> is the format that most of the ebook readers understand (hey, it can even
>>> be converted into a Kindle format:-). Both Firefox and Chrome have ePub
>>> reader extensions available and Mac OS comes with a free ebook reader
>>> (iBooks) that is based on it. I expect (hope) that the convergence between
>>> ePub and browsers will bring these even closer in the coming years.
>>> Because ePub is a packaged web site, with the core content in HTML5 (or
>>> SVG), metadata can be added to the content in RDFa, microdata, embedded
>>> JSON-LD; in fact, metadata can also be added to the archive as a separate
>>> file so if you are crazy enough you can even add RDF data in RDF/XML (no,
>>> please, don't do it:-). And, of course, it can be as much of a hypertext
>>> as you can master:-)
>>> 
>>> Tooling? No, not yet:-( Well, not yet for lambda users. But there, too,
>>> there is an evolution. The fact is that publishers are working on "XML
>>> first" (or "HTML first") workflows. O'Reilly's Atlas tool[1] means that
>>> authors prepare their documents in, essentially, HTML (well, a restricted
>>> profile thereof), and the output is then produced in EPUB, PDF, or pure
>>> HTML at the end. Companies are created that do similar things and where
>>> small(er) publishers can develop full projects (Metrodigi[2], Inkling[3],
>>> Hachette, ...; but I do not think it is possible to use these for a big
>>> conference, although, who knows?). Importantly to this community, these
>>> tools also include annotation facilities, akin to MS Word's commenting
>>> tools.
>>> 
>>> Where does it take us _now_? Much against my instinct and with a bleeding
>>> heart I have to accept that conferences of the size of WWW, but even ISWC
>>> or ESWC, cannot reasonably ask their submitters to submit in ePub (or
>>> HTML). Yet. Not today. It is a chicken and egg problem, and change may
>>> come only with events, as well as more progressive scholarly publishers,
>>> experimenting with this. Just like Daniel (and Bernadette) I would love
>>> to see that happening for smaller workshops (if budget allows, I could
>>> imagine a workshop teaming up with, say, Metrodigi to produce the
>>> workshop's proceedings). But I am optimistic that the change will happen
>>> within a foreseeable time and our community (as any scholarly community,
>>> I believe) will have to prepare itself for a change in this area.
>>> 
>>> Adding my 2¢ to Daniel's:-)
>>> 
>>> Ivan
>>> 
>>> P.S. For LaTeX users: I guess the main advantage of LaTeX is the math
>>> part. And this is the saddest story of all: MathML has been around for a
>>> long time, and it is, actually, part of ePUB as well, but authoring
>>> proper mathematics is the toughest with the tools out there. Sigh...
>>> 
>>> P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that
>>> space...
>>> 
>>> 
>>> [1] https://atlas.oreilly.com
>>> [2] http://metrodigi.com
>>> [3] https://www.inkling.com
>>> 
>>> 
>>> 
>>> On 04 Oct 2014, at 04:14 , Daniel Schwabe <dschwabe@inf.puc-rio.br> wrote:
>>> 
>>>> As is often the case on the Internet, this discussion gives me a
>>>> terrible sense of déjà vu. We've had this discussion many times before.
>>>> Some years back the IW3C2 (the steering committee for the WWW
>>>> conference series, of which I am part) first tried to require HTML for
>>>> the WWW conference paper submissions, then was forced to make it
>>>> optional because authors simply refused to write in HTML, and eventually
>>>> dropped it because NO ONE (ok, very very few hardy souls) actually sent
>>>> in HTML submissions.
>>>> Our conclusion at the time was that the tools simply were not there,
>>>> and it was too much of a PITA for people to produce HTML instead of
>>>> using the text editors they are used to. Things don't seem to have
>>>> changed much since.
>>>> And this is simply looking at formatting the pages, never mind the
>>>> whole issue of actually producing hypertext (i.e., turning the article's
>>>> text into linked hypertext), beyond the easily automated ones (e.g.,
>>>> links to authors, references to papers, etc.). Producing good
>>>> hypertext, and consuming it, is much harder than writing plain text. And
>>>> most authors are not trained in producing this kind of content. Making
>>>> this actually "semantic" in some sense is still, in my view, a research
>>>> topic, not a routine reality.
>>>> Until we have robust tools that make it as easy for authors to write
>>>> papers with the advantages afforded by PDF, without its shortcomings, I
>>>> do not see this changing.
>>>> I would love to see experiments (e.g., certain workshops) to try it out
>>>> before making this a requirement for whole conferences.
>>>> Bernadette's suggestions are a good step in this direction, although I
>>>> suspect it is going to be harder than it looks (again, I'd love to be
>>>> proven wrong ;-)).
>>>> Just my personal 2c
>>>> Daniel
>>>> 
>>>> 
>>>> On Oct 3, 2014, at 12:50  - 03/10/14, Peter F. Patel-Schneider
>>>> <pfpschneider@gmail.com> wrote:
>>>> 
>>>>> In my opinion PDF is currently the clear winner over HTML in both the
>>>>> ability to produce readable documents and the ability to display
>>>>> readable documents in the way that the author wants them to display.
>>>>> In the past I have tried various means to produce good-looking HTML and
>>>>> I've always gone back to a setup that produces PDF.  If a document is
>>>>> available in both HTML and PDF I almost always choose to view it in
>>>>> PDF.  This is the case even though I have particular preferences in how
>>>>> I view documents.
>>>>> 
>>>>> If someone wants to change the format of conference submissions, then
>>>>> they are going to have to cater to the preferences of authors, like me,
>>>>> and reviewers, like me.  If someone wants to change the format of
>>>>> conference papers, then they are going to have to cater to the
>>>>> preferences of authors, like me, attendees, like me, and readers, like
>>>>> me.
>>>>> 
>>>>> I'm all for *better* methods for preparing, submitting, reviewing, and
>>>>> publishing conference (and journal) papers.  So go ahead, create one.
>>>>> But just saying that HTML is better than PDF in some dimension, even if
>>>>> it were true, doesn't mean that HTML is better than PDF for this
>>>>> purpose.
>>>>> 
>>>>> So I would say that the semantic web community is saying that there
>>>>> are better formats and tools for creating, reviewing, and publishing
>>>>> scientific papers than HTML and tools that create and view HTML.  If
>>>>> there weren't these better ways then an HTML-based solution might be
>>>>> tenable, but why use a worse solution when a better one is available?
>>>>> 
>>>>> peter
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 10/03/2014 08:02 AM, Phillip Lord wrote:
>>>>> [...]
>>>>>> 
>>>>>> As it stands, the only statement that the semantic web community are
>>>>>> making is that web formats are too poor for scientific usage.
>>>>> [...]
>>>>>> 
>>>>>> Phil
>>>>>> 
>>>> 
>>>> Daniel Schwabe                      Dept. de Informatica, PUC-Rio
>>>> Tel:+55-21-3527 1500 r. 4356        R. M. de S. Vicente, 225
>>>> Fax: +55-21-3527 1530               Rio de Janeiro, RJ 22453-900, Brasil
>>>> http://www.inf.puc-rio.br/~dschwabe
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C 
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> GPG: 0x343F1A3D
>>> WebID: http://www.ivan-herman.net/foaf#me
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Received on Sunday, 5 October 2014 17:18:08 UTC