When I read papers, I often write short notes to myself about them, as I imagine most people do. I could imagine these notes as also being comments on a site like this, particularly if it's all open and community-run. However, if the paper I'm reading is some 10 year old paper in some niche subfield, which ultimately is the majority of what scientists end up reading, I'm just going to be talking to myself there, so what's the point? I'm not going to submit a paper there that I know no other readers will care about.
How do you overcome that barrier? It's not even network effects alone, it's like extreme network effects because the tail is so long in science. I guess it's really the barrier between being high-level discussion about only the latest big-name science news in certain fields, and being a central hub for all kinds of relatively short comments on scientific papers. Seems like a difficult problem to solve.
Have you thought about maybe pre-populating your database with all papers, ready for discussion, rather than having it as a Reddit-like discussion of only recently submitted things? Or is there some other kind of grand vision for where this will go?
> When I read papers, I often write short notes to myself about
> them, as I imagine most people do.
I picked up a strange habit because of a high school biology teacher.. the vast majority of homework that he assigned was inevitably content creation based on source material. As a consequence, when I read papers now, I compulsively:
* re-draw diagrams in different styles
* make tables for data that wasn't presented as a table
* categorize which claims are associated with which citations
* demonstrate equations, or at least attempt to work through them
* make up summaries of content
* paste direct excerpts.. which over time deteriorate in usefulness.
So far, every "annotation solution" I have tried just impedes my work, so I always resort to using some HTTP file server to host my files, or write HTML when I want to add markup to my notes.
One of my buddies (michael) is doing something similar with annotations and social reading @ pen.fm. I'll see how we may be able to incorporate some of these ideas.
I do have around 450k papers from JSTOR (released legally via archive.org). I have these papers indexed and can expose them via openjournal.
Also, as I mentioned in my previous post, a few guys from Berkeley (Tony Chen et al) are working on peer library, which will be a more complete non-profit academic search engine.
Good points about the 'barrier'. I'm hoping people will contribute their own papers (even if they are not published).
My ultimate vision would be for people to write their papers within a git repo and then upload/submit their .tex source (along with unit tests). I'm in the process of building some of these features.
Also, experimenting with some ocr and pdf analysis to scrape as much useful contextual information as I can from the papers contributed (for the great benefit of our users)
Thanks for taking the time to respond, streptomycin!
Not to speak for Mek, but one hope I have is that these small personal notes automatically make their way onto places like OpenJournal and start to seed conversations. Something like a cross between ScienceBlogs and Twitter, with enough mini metadata to tie to the paper and paragraph.
Do you use a library management tool like Zotero or Mendeley for notetaking, or writing on printouts, or something else?
I use zotero to manage my paper library and it's wonderful. One of my favorite features is tag search. If I find a paper where they measure, say, the strength of a particular protein-DNA interaction that I'm interested in, but I know I'm never going to remember the title, I can just tag it with "Pnt binding strength" or something, and then search for it later.
I used to use plain text files. Recently I started using Mendeley because it's so damn convenient and it has a very nice UI, but my problems with Mendeley are (1) comments are private and (2) it's proprietary.
http://www.researchblogging.org/ is a great website that aggregates more long-form posts about papers, but it has a really shitty UI and it doesn't do anything for short comments on random papers by random non-bloggers.
Let's see what inspiration we can draw from researchblogging.org -- I've read a bunch of data driven / research oriented blogs (or blogs about [understanding] research) and found them helpful.
Research blogging is great, but just to clear up the misunderstanding, comments are only private on Mendeley if you make them in a private group. Public groups are open to the web.
I'm a fan of plaus, academia, Mendeley, etc. I wish there was more of an outlet/community for people to contribute their own research (in a way that made the research repeatable and improvable).
Mek, this is great. Thanks for sharing & open-sourcing it. I love the idea of bringing together paper discussions online - whether post-publication peer review like f1000research.com or more casual discussion like r/science.
I wonder if focusing on supporting existing small-group interactions (real-life journal clubs) would help?
I took a slightly different approach when I wrote http://www.papernautapp.com and chose instead to aggregate existing discussions about academic papers (mostly blogs, a few news sites, HN, and r/science, with a goal to cover more sites and mailing lists). It's also freely licensed, and there are some interesting things I discovered that might be useful to OpenJournal (looking at your TODO list and GH issues):
* If you want to do some auto-identification on webpages, the https://github.com/zotero/translators project is great and actively maintained by the Zotero community.
I think the issue of getting enough traction might be mitigated by restricting the papers that could be discussed. HN is a good example of this - very few posts engender discussions but those that seem promising (by the number of upvotes) are given a spotlight on the front page.
I think an improvement to the services you linked to would be to add a few new articles each week from very selective journals/conferences in each field. I imagine existing measures like a journal's impact score or the number of a conference's attendees would be a good start and tracking blogposts (as you're already doing) could be a good supplement.
This might help pull older or less visible publications out of obscurity; if something published in a domain-specific journal is germane to a discussion, a commenter might point this out while discussing a more highly visible article.
Hey mekarpeles, thanks for responding. I'm a biologist by training and would be interested in joining if that's a field you'd be featuring - I think an environment where people could discuss recent, high-impact biology articles would be great.
Something like this is long overdue, I think. Great work. Open academic publication models have had a difficult time for a number of reasons, but systems like this are very helpful in making the case for openness.
One thing that always bothers me with a purely Reddit-style, point-based system for surfacing academic discussions across domains, though, is that it's unclear what kind of papers are being surfaced: a very good paper in a very niche space may not get the attention that a mediocre paper written for a mass audience (for some definition of "mass") would. Is that an acceptable drawback for openjournal? Or should there be some way for niche papers to gain exposure? Forking openjournal and making your own "sub-openjournal" for your research domain? Weighted voting mechanisms?
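To make the "weighted voting" option concrete, here is a minimal sketch of one possible scheme: divide a paper's upvotes by the number of readers in its field, so a niche paper is compared against its own audience rather than the whole site. Everything here is an assumption for illustration (the `normalized_score` name, the `field_readers` count, the smoothing `prior`); none of it is taken from openjournal.

```python
def normalized_score(upvotes, field_readers, prior=10):
    """Upvotes per reader in the paper's field.

    `prior` is a hypothetical smoothing constant that keeps a tiny field
    with one or two votes from dominating the ranking.
    """
    return upvotes / (field_readers + prior)


# Hypothetical records: a niche paper and a mass-audience paper.
papers = [
    {"title": "mass-audience paper", "upvotes": 50, "field_readers": 5000},
    {"title": "niche paper", "upvotes": 8, "field_readers": 40},
]

# Sort highest normalized score first.
ranked = sorted(
    papers,
    key=lambda p: normalized_score(p["upvotes"], p["field_readers"]),
    reverse=True,
)
```

Under these made-up numbers the niche paper ranks first despite having far fewer raw votes. Whether that is desirable, and how field size would actually be measured, is exactly the open question.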
Also, like reddit, it might be useful to have a mechanism to demonstrate, emphasize, and/or sort by specific commenters' backgrounds, training, and credentials. For many domains, peer review and commentary from people in the same field might be more useful than general commentary.
As a minor wish, I've always wanted to see a mechanism for encouraging sharing of implementations, test code, and other raw experimental results along with the actual papers. 'Cause really, for most cases, I'm not going to implement a multi-page algorithm just to verify a conclusion or make use of an insight. But if I can fork and compile a github repo associated with the paper...
I'd love a great solution to this problem and I'd even consider trying to build one, but I am not sure there is any money in it.
One thing I'd love in an academic paper reader is something that allowed comments/annotations inline with the paper. For example, if a paper in the future contradicts something that is stated, you could add a comment linking to the contradiction. Or you could merely ask and provide clarifications, or comment on simpler alternatives to a given part of the paper.
Also, I'd like to be able to rate papers for say, readability or difficulty, tag them as theoretical or empirical, etc.
I'd also like if cited papers were automatically dereferenced so I didn't have to hunt down the references myself.
Personalization would also be a nice feature. E.g., recommend other papers by the same author, or other highly cited papers that cite/are cited by a given paper, or frequent co-authors of some author that I like.
I'd love to be able to download a bunch of papers easily for offline viewing.
We launched http://scholr.ly in January- we've still got plenty of issues, but we handle some of your use cases, like citation linking. We also first-class authors so you can see an easy summary of a researcher's work. We're still working on personalization and have kicked around tagging/rating for some time- I'd love your feedback.
It seems like the rest of your issues could be solved by Mendeley- WDYT?
mhluongo -- Love that you specialize in search and have author profiles.
You guys should get in touch with the peer library guys, send me an email if you'd like an intro. I'd love to see more collaboration in the space.
Internet Archive (archive.org) is also interested in contributing to the space and has been super helpful in aiding our efforts at open journal.
I think the three biggest problems in the space are (1) discovery + accessibility (including open-access), (2) collaboration (sharing, commenting, contributing), and (3) quality assurance (maintainability, scm-backed, repeatable research).
There are many solutions that target discovery and accessibility, but I'm (as an academic) personally dissatisfied with the level of sharing/collaboration/openness, the lack of community, and the lack of standards in academic research. I think the world needs for academia and research what github did for social programming.
rob, not really interested in the monetary aspects of this problem, more interested in the open-access nature.
There is a subreddit for academic papers, I figured it would make sense having something ultra-targeted for computer science papers and I didn't feel like the right community was using the subreddit (the results quality wasn't great). It did teach me that a lot of people like requesting papers, so this is something I am considering.
Also, this was a good opportunity for me to attack a problem I am really passionate about while testing out a web framework I've been writing (waltz).
Finally, I did the project for sentimental reasons. I was talking to Aaron Swartz about open journal a while back over skype and was looking forward to working with him on it, so I thought it would be nice to finish it in his honor :o)
You have some great ideas (mass download is something I've seen requested by some of my friends from my phd program). There's a team called peer library who's attacking many of these problems and I'll be giving them as much support as I can.
The academic world could benefit from a place that allows commenting on any paper (you can have too many comments; not sure how moderating such a system would work).
To this end, it would be great if this were written in such a way that it implicitly considers papers from all the "standard" academic sources as part of the system, ideally with duplicate removal.
That is, automatically add articles from arXiv and major currently existing journals and conferences, try to automatically detect duplicate papers (perhaps add a concept of versions of papers).
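As a rough illustration of the duplicate-detection idea, the sketch below groups records whose titles match after normalization and treats each group as versions of one paper. It is only a starting point under stated assumptions: the record fields (`title`, `source`) and the function names are hypothetical, and real matching would also need authors, DOIs, or fuzzy comparison.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def group_versions(papers):
    """Group paper records that share a normalized title."""
    groups = defaultdict(list)
    for paper in papers:
        groups[normalize_title(paper["title"])].append(paper)
    return dict(groups)


# Hypothetical records pulled from different sources.
papers = [
    {"title": "On the Naïve Méthod: A Survey", "source": "arXiv"},
    {"title": "On the naive method - a survey.", "source": "journal"},
    {"title": "A Different Paper", "source": "conference"},
]

versions = group_versions(papers)
```

Here the arXiv preprint and the journal copy end up under the same key, which is where a "versions of a paper" concept could hang.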
In addition, such a site could really benefit by having "virtual journals", where users collect topical collections of must-read papers.
This is great and I would use this. I even think about porting/forking/stealing this for a German audience.
The only thing that would keep such a site from growing is the relative reservation of less technical crowds (at least that has been my observation). HN, proggit, SO: they are all useful and fun for the technical minded. Similar sites for other segments (excluding cats, cats always win) are much less active and sometimes fail to attract some critical mass.
Also, thanks for your patience, it's running on a micro EC2 and I haven't put too much effort yet into optimizing request handling (just using waltz over web.py at this point)