OpenJournal - Discuss academic papers (open source)

streptomycin · on Feb 26, 2013

When I read papers, I often write short notes to myself about them, as I imagine most people do. I could imagine these notes as also being comments on a site like this, particularly if it's all open and community-run. However, if the paper I'm reading is some 10 year old paper in some niche subfield, which ultimately is the majority of what scientists end up reading, I'm just going to be talking to myself there, so what's the point? I'm not going to submit a paper there that I know no other readers will care about.

How do you overcome that barrier? It's not even network effects alone, it's like extreme network effects because the tail is so long in science. I guess it's really the barrier between being high-level discussion about only the latest big-name science news in certain fields, and being a central hub for all kinds of relatively short comments on scientific papers. Seems like a difficult problem to solve.

Have you thought about maybe pre-populating your database with all papers, ready for discussion, rather than having it as a Reddit-like discussion of only recently submitted things? Or is there some other kind of grand vision for where this will go?

kanzure · on Feb 26, 2013

  > When I read papers, I often write short notes to myself about
  > them, as I imagine most people do.

I picked up a strange habit because of a high school biology teacher.. the vast majority of homework that he assigned was inevitably content creation based on source material. As a consequence, when I read papers now, I compulsively:

* re-draw diagrams in different styles

* make tables for data that wasn't presented as a table

* categorize which claims are associated with which citations

* make lists, lots of lists! You can never have enough lists. Gunkel said so: http://ideonomy.mit.edu/gunkel.html

* change the way information is displayed

* demonstrate equations, or at least attempt to work through them

* make up summaries of content

* paste direct excerpts.. which over time deteriorate in usefulness.

So far, every "annotation solution" I have tried just impedes my work, so I always resort to using some HTTP file server to host my files, or write HTML when I want to add markup to my notes.

mekarpeles · on Feb 26, 2013

One of my buddies (michael) is doing something similar with annotations and social reading @ pen.fm. I'll see how we may be able to incorporate some of these ideas.

I do have around 450k papers from JSTOR (released legally via archive.org). I have these papers indexed and can expose them via openjournal.

Also, as I mentioned in my previous post, a few guys from Berkeley (Tony Chen et al.) are working on peer library, which will be a more complete non-profit academic search engine.

Good points about the 'barrier'. I'm hoping people will contribute their own papers (even if they are not published).

My ultimate vision would be for people to write their papers within a git repo and then upload/submit their .tex source (along with unit tests). I'm in the process of building some of these features.

Also, experimenting with some ocr and pdf analysis to scrape as much useful contextual information as I can from the papers contributed (for the great benefit of our users)

Thanks for taking the time to respond, streptomycin!

streptomycin · on Feb 26, 2013

  > I do have around 450k papers from JSTOR (released legally via archive.org). I have these papers indexed and can expose them via openjournal.

PubMed would be another obvious datasource, if you were going to go that route.

jayunit · on Feb 26, 2013

Not to speak for Mek, but one hope I have is that these small personal notes automatically make their way onto places like OpenJournal and start to seed conversations. Something like a cross between ScienceBlogs and Twitter, with enough mini metadata to tie to the paper and paragraph.

Do you use a library management tool like Zotero or Mendeley for notetaking, or writing on printouts, or something else?

jamesjporter · on Feb 26, 2013

I use Zotero to manage my paper library and it's wonderful. One of my favorite features is tag search. If I find a paper where they measure, say, the strength of a particular protein-DNA interaction that I'm interested in, but I know I'm never going to remember the title, I can just tag it with "Pnt binding strength" or something, and then search for it later.
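As a rough illustration of the tag-then-search workflow described above, here is a minimal sketch in Python. It is not how Zotero implements tag search; the `TagIndex` class, its method names, and the paper titles are all hypothetical, and matching is a plain case-insensitive substring test.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index mapping free-form tags to paper titles."""

    def __init__(self):
        self._papers_by_tag = defaultdict(set)

    def tag(self, paper_title, tag):
        # Normalize case and surrounding whitespace so lookups are forgiving.
        self._papers_by_tag[tag.strip().lower()].add(paper_title)

    def search(self, query):
        # Return every paper whose tag contains the query as a substring.
        needle = query.strip().lower()
        hits = set()
        for tag, papers in self._papers_by_tag.items():
            if needle in tag:
                hits |= papers
        return sorted(hits)


index = TagIndex()
# Both titles are made up for the example.
index.tag("A hypothetical paper on protein-DNA interactions", "Pnt binding strength")
index.tag("Another hypothetical paper", "imaging methods")
print(index.search("pnt binding"))
```

The point is only that a tag you invent yourself can stand in for a title you will not remember; a real tool would persist the index and probably rank results.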

mekarpeles · on Feb 27, 2013

Thanks James, tag / faceted search is something I've envisioned for openjournal. I'll start working on the feature.

streptomycin · on Feb 26, 2013

I used to use plain text files. Recently I started using Mendeley because it's so damn convenient and it has a very nice UI, but my problems with Mendeley are (1) comments are private and (2) it's proprietary.

http://www.researchblogging.org/ is a great website that aggregates more long-form posts about papers, but it has a really shitty UI and it doesn't do anything for short comments on random papers by random non-bloggers.

mekarpeles · on Feb 27, 2013

Let's see what inspiration we can draw from researchblogging.org -- I've read a bunch of data driven / research oriented blogs (or blogs about [understanding] research) and found them helpful.

MrGunn · on Feb 27, 2013

Research blogging is great, but just to clear up the misunderstanding, comments are only private on Mendeley if you make them in a private group. Public groups are open to the web.

mekarpeles · on Feb 26, 2013

I'm a fan of plaus, academia, and Mendeley, etc. I wish there was more of an outlet/community for people to contribute their own research (in a way that made the research repeatable and improvable).

jayunit · on Feb 26, 2013

Mek, this is great. Thanks for sharing & open-sourcing it. I love the idea of bringing together paper discussions online - whether post-publication peer review like f1000research.com or more casual discussion like r/science.

One challenge I've found is that it's often difficult for discussion sites to gain sufficient traction to build a critical mass of discussion - http://plasmyd.com, http://papercritic.com, http://scicombinator.com, http://chemfeeds.com.

I wonder if focusing on supporting existing small-group interactions (real-life journal clubs) would help?

I took a slightly different approach when I wrote http://www.papernautapp.com and chose instead to aggregate existing discussions about academic papers (mostly blogs, a few news sites, HN, and r/science, with a goal to cover more sites and mailing lists). It's also freely licensed, and there are some interesting things I discovered that might be useful to OpenJournal (looking at your TODO list and GH issues):

* CrossRef.org runs a ton of cool lookup/crossref/deref services at http://labs.crossref.org/

* They also have some great libraries at https://github.com/crossref/ - who wouldn't geek out at this: http://labs.crossref.org/pdfextract/

* If you want to do some auto-identification on webpages, the https://github.com/zotero/translators project is great and actively maintained by the Zotero community.

(Some notes on how Papernaut is put together, if you're interested: http://jayunit.net/2013/01/06/papernaut-exploring-online-dis... )

Lastly, if you're fostering discussion and feedback on papers, there's overlapping interest with the http://altmetrics.org and http://altmetric.com folks.

ejstronge · on Feb 27, 2013

I think the issue of getting enough traction might be mitigated by restricting the papers that could be discussed. HN is a good example of this - very few posts engender discussions but those that seem promising (by the number of upvotes) are given a spotlight on the front page.

I think an improvement to the services you linked to would be to add a few new articles each week from very selective journals/conferences in each field. I imagine existing measures like a journal's impact score or the number of a conference's attendees would be a good start and tracking blogposts (as you're already doing) could be a good supplement.

This might help pull older or less visible publications out of obscurity; if something published in a domain-specific journal is germane to a discussion, a commenter might point this out while discussing a more highly visible article.

mekarpeles · on Feb 27, 2013

Thanks ejstronge and great idea. I'll see if I can get a group of hackers together to post one recommended paper a week.

Do you have any interest in being informed? Also, do you know anyone who may be interested in helping curate / contribute?

ejstronge · on March 1, 2013

Hey mekarpeles, thanks for responding. I'm a biologist by training and would be interested in joining if that's a field you'd be featuring - I think an environment where people could discuss recent, high-impact biology articles would be great.

yliu · on Feb 26, 2013

Something like this is long overdue, I think. Great work. Open academic publication models have had a difficult time for a number of reasons, but systems like this are very helpful in making the case for openness.

One thing that always bothers me with a purely Reddit-style, point-based system for surfacing academic discussions across domains, though, is that it's unclear what kind of papers are being surfaced: a very good paper in a very niche space may not get the attention that a mediocre paper written for a mass audience (for some definition of "mass") would. Is that an acceptable drawback for openjournal? Or should there be some way for niche papers to gain exposure? Forking openjournal and making your own "sub-openjournal" for your research domain? Weighted voting mechanisms?

Also, like reddit, it might be useful to have a mechanism to demonstrate, emphasize, and/or sort by specific commenters' backgrounds, training, and credentials. For many domains, peer review and commentary from people in the same field might be more useful than general commentary.

As a minor wish, I've always wanted to see a mechanism for encouraging sharing of implementations, test code, and other raw experimental results along with the actual papers. 'Cause really, for most cases, I'm not going to implement a multi-page algorithm just to verify a conclusion or make use of an insight. But if I can fork and compile a github repo associated with the paper...

robrenaud · on Feb 26, 2013

Why use this instead of a subreddit?

I'd love a great solution to this problem and I'd even consider trying to build one, but I am not sure there is any money in it.

One thing I'd love in an academic paper reader is something that allowed comments/annotations inline with the paper. For example, if a paper in the future contradicts something that is stated, you could add a comment linking to the contradiction. Or you could merely ask and provide clarifications, or comment on simpler alternatives to a given part of the paper.

Also, I'd like to be able to rate papers for say, readability or difficulty, tag them as theoretical or empirical, etc.

I'd also like if cited papers were automatically dereferenced so I didn't have to hunt down the references myself.

Personalization would also be a nice feature. E.g., recommend other papers by the same author, or other highly cited papers that cite/are cited by a given paper, or frequent co-authors of some author that I like.

I'd love to be able to download a bunch of papers easily for offline viewing.

mhluongo · on Feb 27, 2013

We launched http://scholr.ly in January- we've still got plenty of issues, but we handle some of your use cases, like citation linking. We also first-class authors so you can see an easy summary of a researcher's work. We're still working on personalization and have kicked around tagging/rating for some time- I'd love your feedback.

It seems like the rest of your issues could be solved by Mendeley- WDYT?

mekarpeles · on Feb 27, 2013

mhluongo -- Love that you specialize in search and have author profiles.

You guys should get in touch with the peer library guys, send me an email if you'd like an intro. I'd love to see more collaboration in the space.

Internet Archive (archive.org) is also interested in contributing to the space and has been super helpful in aiding our efforts at open journal.

I think the three biggest problems in the space are (1) discovery + accessibility (including open-access), (2) collaboration (sharing, commenting, contributing), and (3) quality assurance (maintainability, scm-backed, repeatable research).

There are many solutions to target discovery and accessibility but I'm (as an academic) personally dissatisfied with the level of sharing/collaboration/openness, the lack of community, and the lack of standards in academic research. I think the world needs for academia and research what github did for social programming.

mhluongo · on Feb 27, 2013

mekarpeles - an intro to the PeerLibrary team would be awesome- I'll do that.

I've loved the GitHub for science analogy ever since it first surfaced a couple years ago- I couldn't agree more.

mekarpeles · on Feb 26, 2013

rob, not really interested in the monetary aspects of this problem, more interested in the open-access nature.

There is a subreddit for academic papers, I figured it would make sense having something ultra-targeted for computer science papers and I didn't feel like the right community was using the subreddit (the results quality wasn't great). It did teach me that a lot of people like requesting papers, so this is something I am considering.

Also, this was a good opportunity for me to attack a problem I am really passionate about while testing out a web framework I've been writing (waltz).

Finally, I did the project for sentimental reasons. I was talking to Aaron Swartz about open journal a while back over skype and was looking forward to working with him on it, so I thought it would be nice to finish it in his honor :o)

You have some great ideas (mass download is something I've seen requested by some of my friends from my phd program). There's a team called peer library who's attacking many of these problems and I'll be giving them as much support as I can.

Thanks for your kind remarks + great feedback

nhaehnle · on Feb 26, 2013

The academic world could benefit from a place that allows commenting on any paper (you can have too many comments; not sure how moderating such a system would work).

To this end, it would be great if this were written in such a way that it implicitly considers papers from all the "standard" academic sources as part of the system, ideally with duplicate removal.

That is, automatically add articles from arXiv and major currently existing journals and conferences, try to automatically detect duplicate papers (perhaps add a concept of versions of papers).

In addition, such a site could really benefit by having "virtual journals", where users collect topical collections of must-read papers.
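One simple way to approach the duplicate-detection idea above is to group records by a normalized title. The sketch below is only an assumption about how that could be done, not anything OpenJournal implements; the function names `title_key` and `group_versions` and the sample records are hypothetical, and real deduplication would also need to compare authors, years, or identifiers such as DOIs.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title):
    """Reduce a title to a comparison key (hypothetical helper)."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and collapse every run of non-alphanumerics to one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def group_versions(records):
    """Group (source, title) pairs whose normalized titles match."""
    groups = defaultdict(list)
    for source, title in records:
        groups[title_key(title)].append(source)
    return dict(groups)


# Made-up records standing in for the same paper seen in several places.
records = [
    ("arXiv", "A Hypothetical Paper: On Widgets"),
    ("journal", "A hypothetical paper on widgets."),
    ("conference", "Something Else Entirely"),
]
versions = group_versions(records)
for key, sources in versions.items():
    print(key, "->", sources)
```

Each group could then be presented as one paper with several versions, which is roughly the "versions of papers" concept mentioned above.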

mtrn · on Feb 26, 2013

This is great and I would use this. I even think about porting/forking/stealing this for a German audience.

The only thing that would keep such a site from growing is the relative reservation of less technical crowds (at least that has been my observation). HN, proggit, SO: they are all useful and fun for the technical minded. Similar sites for other segments (excluding cats, cats always win) are much less active and sometimes fail to attract some critical mass.

mekarpeles · on Feb 27, 2013

Please do, let me know how I can help you with it.

[email protected]

mekarpeles · on Feb 26, 2013

Would love feedback.

Please feel welcome to submit issues on github and I will try to deal with them in realtime: https://github.com/mekarpeles/openjournal/issues

mekarpeles · on Feb 26, 2013

Also, thanks for your patience, it's running on a micro EC2 and I haven't put too much effort yet into optimizing request handling (just using waltz over web.py at this point)

stillbourne · on Feb 26, 2013

RSS/Atom please.

mekarpeles · on March 3, 2013

https://hackerlist.net:1443/rss -- done, please let me know if you'd like it in a different format or if there are any mistakes. Thanks, stillbourne!

mekarpeles · on Feb 26, 2013

Adding this to the github issue tracker, I'll get this set up for you as soon as I have a break in my schedule today. Thanks!

mekarpeles · on Feb 26, 2013

Issue opened -- thanks! https://github.com/mekarpeles/openjournal/issues/16