Poll: If I asked, would you send me your "Saved Stories"?
82 points by RiderOfGiraffes on Feb 28, 2011 | 61 comments
Before I start on the project I wanted to ask this: would you be willing to send to me the first four pages (items 1 through 120) of your "Saved Stories"?

Obviously this would take some effort on your behalf, but it's not that hard. One downside for you is that I could then see if you've been upvoting my contributions, and you might fear reprisals if you didn't. That, of course, is daft.

But more seriously, you would be releasing some personal data to me, and although I see it as harmless, I would protect it as well as I protect my own personal data. Which is reasonably well.

Even so, you'd have to trust me by as much as you care about the data.

If enough people say yes then I'll design the processing and get back to you. Even then, it might take some time, so this really is just testing the waters with no promises.

But would you?

ADDED IN EDIT: The contributions can be anonymous - yes.

Probably - if you explain your project first.
243 points
Yes, almost certainly.
102 points
Maybe - depends what it's for.
66 points
Probably - if I had time when you asked.
37 points
Yes - if it's anonymous. (added after there were about 50 votes - yes, this will skew the results.)
24 points
Probably not - you'd have to convince me.
22 points
Absolutely not.
10 points



Absolutely, as long as you send me a picture of you riding a giraffe!


Agreed.


Or a unicorn, manatee or narwhal. Pony in a pinch.


Just scraped all 345 of my saved stories and exported them to Google Docs using the Scraper Extension[1,2] and the XPath:

    //*[contains(concat( " ", @class, " " ), concat( " ", "title", " " )) and (((count(preceding-sibling::*) + 1) = 3) and parent::*)]//a
I got that nasty XPath using http://www.selectorgadget.com/

Anyone who is interested, the equivalent CSS selector is:

    .title:nth-child(3) a
Note: this just scrapes the title and url, not the submitter or number of upvotes

[1] https://chrome.google.com/webstore/detail/mbigbapnjcgaffohmb...

[2] https://github.com/mnmldave/scraper/
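
For anyone who would rather script this than use the extension, here is a minimal Python sketch of what `.title:nth-child(3) a` selects: every link inside an element that has the class `title` and is the third child of its parent. The `SAMPLE` markup and the `extract_stories` name are made up for illustration, and the sketch assumes well-formed markup; the real saved-stories page would likely need a proper HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for two rows of a saved-stories page.
SAMPLE = """
<table>
  <tr>
    <td class="title">1.</td>
    <td><a href="vote?for=1">up</a></td>
    <td class="title"><a href="http://example.com/a">Story A</a></td>
  </tr>
  <tr>
    <td class="title">2.</td>
    <td><a href="vote?for=2">up</a></td>
    <td class="title"><a href="http://example.com/b">Story B</a></td>
  </tr>
</table>
"""


def extract_stories(markup):
    """Return (title, url) pairs matching `.title:nth-child(3) a`."""
    root = ET.fromstring(markup)
    stories = []
    for parent in root.iter():
        children = list(parent)
        if len(children) < 3:
            continue
        third = children[2]  # nth-child(3): third child of any tag
        classes = (third.get("class") or "").split()
        if "title" not in classes:
            continue
        # Collect every <a> at or below the matching element.
        for link in third.iter("a"):
            stories.append((link.text, link.get("href")))
    return stories


if __name__ == "__main__":
    for title, url in extract_stories(SAMPLE):
        print(title, url)
```

As with the selector itself, this only recovers the title and url, not the submitter or number of upvotes.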


RiderOfGiraffes,

Since it's you, I'll definitely send this data.

It just has been my observation that you actively scan the new page here, and comment/upvote first on a lot of deserving posts that otherwise would sink without a trace. (Unless you have already written a bot for this purpose :-))

So it's a good time to say - Thank you. Just let us know in which format you need this data.

p.s. I do remember you working on a dup detector bot as well a while ago.


There should be an option on here... "if you provide me a quick script to run, then yes, I'll provide the data"


I voted Probably - if you explain your project first. So yeah, if you feel like telling us more about what this is for, I'm very likely game; as long as the project is anything reasonably interesting. I'm guessing some sort of HN meta-analysis of what stories get saved, or why, or when or something?


Well, it is a meta-analysis, but I'm still doing mock-experiments to see if the analysis I want will be possible, so I don't want to say too much yet. I'm not being secretive, I just need to understand more myself before asking for help or data.

I would certainly provide some information about what I'm doing.

And thanks for responding. Belatedly I've realised that I submitted this when the item will only get 30 to 40 minutes on the "New" page, so I probably won't get many responses. Every reply is worth a lot.


To be very blunt, you'd have to convince me that what you want to do with it was interesting enough to be worth my effort to collect and send you the information - either by coming up with a very cool use or a really easy way for me to send it.

I don't mind sharing that information with you, otherwise.


What do you mean "Saved Stories?"


Upvoted = saved.


Woah. Didn't realize that was there.

That's really a random bunch of stories in my feed. I can't believe I upvoted some of them. But hell, I'll trade the lot for a photo of RiderOfGiraffes doing just that.


I didn't know that either, but it's a nice feature. More than once I later wished I had kept some bookmark of stories I found via HN. Now I'll just upvote more and save all the precious bits.


Looks like submitted = saved as well.


That's because submitted -> upvoted, and upvoted = saved.


Just spend half an hour throwing together a Chrome plugin and see how many you get to install it.


May take longer than half an hour, but I'd like to volunteer for this [Chrome plugin for "Saved Stories" extraction]. Planned to do something similar for a side project anyway and it sounds like whatever you're cooking up might benefit the community. If RoG is interested, feel free to get in touch: username[ at ]creventures.net


If PG would add a checkbox to our profiles to make the saved history public, I'd check it.


Yes, if you agree to run all of our saved stories through DirectedEdge's API and show me a personalized homepage.

(Hoping that's what you're doing. It seems painfully obvious that someone should).


Currently the DE API doesn't have the concept of things decaying over time (though there are some kludgy work-arounds that a couple folks are using), but we've got a customer of some significance that specifically needs that feature, so it'll probably make its way to the API in short order.

I've thought about nudging pg to see if he'd let me goof around with the actual data some, but I've been worried that it could turn into a distraction.

Beyond that there are a couple interesting things:

• I have some half-nuanced thoughts on the interaction between personalized news and communities, notably that I'm not sure that people want personalized news, though they often think they do. A lot of the function of news is to facilitate "informedness" which breaks down if parts of a community are reading different things.

• I'm not sure that the upvote density on HN is actually high enough to drive personalization. There aren't very many upvotes in aggregate. I only upvote a few things per week, really, and the frontpage is really only the sum of < 1000 votes in a community of tens of thousands of people. It's unclear to me that people upvote things often enough for there to be enough overlap to do reasonable personalization.
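
One rough way to put a number on "enough overlap" would be the Jaccard similarity between pairs of users' saved-story sets. This is only a sketch under assumptions: the `saved` mapping, the user names and the item ids below are invented, and nothing here reflects how DirectedEdge's API or HN actually works.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(saved):
    """Map each pair of users to the Jaccard similarity of their saved sets."""
    return {
        (u, v): jaccard(saved[u], saved[v])
        for u, v in combinations(sorted(saved), 2)
    }


if __name__ == "__main__":
    # Hypothetical data: user -> set of upvoted story ids.
    saved = {"u1": {1, 2, 3}, "u2": {2, 3, 4}, "u3": {9}}
    for pair, score in pairwise_overlap(saved).items():
        print(pair, score)
```

If most pairs score at or near zero, there is probably too little shared signal for this kind of personalization.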


That's very close to what I'd be doing, although not identical. I could probably piggy-back that work onto my intended analysis. Thanks for the suggestion.


I'm more worried about my time than my "privacy", so pray you ask me at a good time ;)


Yes. You've clearly shown you have good intent towards the HN community (your research recently on 'New' upvotes, your dupe detector work, generally being a good poster, etc). I'm interested to see what you'll do with it, and certainly willing to send them once you provide some sort of instructions how you'd like to receive them.

Also, it'd be interesting if there was an option to make our saved stories public a la reddit. I don't think voting rings are much of a concern at HN, at least as far as I can see.


I don’t know, if you dropped those pages then they could be read by those who ride shorter animals.


I have thought about this concept for a while. Since you are broaching the subject, I will throw my thought in.

I thought it would be quite interesting if PG recorded all actions of all HN members. Then did some data mining on the results. I would like to see if there was some correlation between what interested folks, and their success as founders.


The response I wanted to click is, "Yes, if you reported it anonymously." What I did choose is "Yes, almost certainly."


On a similar note, would the project still work if we reported it to you anonymously? I trust that you won't do anything bad with what I up-voted, but others might care more about what they've saved, and maybe letting them give it to you anonymously would get you more submissions.


It would work, but it would make the results less useful to you. Just as useful to me, though, so I wouldn't mind. I just couldn't feed back the results, that's all.


Since you say it would be "just as useful to me", I assume that means the analysis doesn't depend on the actual person who submitted the data (correlating it with age, or something), because you couldn't do that (easily) without a name attached.

Couldn't you then, in that case, give each anonymous user a number, and say "Anonymous User 1"? On the other hand, it might still be possible to trace the data back to the user who made those upvotes.


Absolutely. I can just call each of you XX_01 up to XX_(however many) and then do my analysis. If it shows something interesting about you and your voting patterns (and I don't know what it might show - this is (informal) research so I don't (yet) know what I'm doing) then I could only tell those who weren't anonymous.

I'll probably try to arrange something so that people can semi-reverse engineer their results. Once I've worked out what I'm doing, and get any results, I'll let people know, and they can decide whether to release their data, or become known, or whatever.

I have thought about this, and although I haven't reached any firm conclusions, my principle is that people's data is theirs to release.
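
The XX_01 to XX_(however many) labelling could be sketched like this. It is a minimal illustration, not the actual processing: `assign_labels` and the example names are hypothetical, and the labels alone do not stop someone from recognising a contributor from the voting data itself.

```python
def assign_labels(usernames, prefix="XX"):
    """Map each username to a label such as XX_01, XX_02, ... in input order."""
    # Pad to at least two digits, more if there are 100+ contributors.
    width = max(2, len(str(len(usernames))))
    return {
        name: f"{prefix}_{i:0{width}d}"
        for i, name in enumerate(usernames, start=1)
    }


if __name__ == "__main__":
    # Hypothetical contributors.
    print(assign_labels(["alice", "bob", "carol"]))
```

Keeping the mapping private, or discarding it, is what would make the contributions anonymous; only those who chose to be known could be told their own results.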


Could you expand on what you mean by "saved stories" and how we would send them to you?


For you it's here: http://news.ycombinator.com/saved?id=pbhjpbhj

This is from your profile page: http://news.ycombinator.com/user?id=pbhjpbhj

Other people can't see your saved stories - only you can. Click on my profile to see the difference:

http://news.ycombinator.com/user?id=RiderOfGiraffes

I would provide instructions.


Wow. The HN interface sure keeps its features carefully hidden.


Ha ha, I've been here a couple of years now and hadn't noticed it - I don't really look at my own profile ... Funnily enough this is a feature I thought was lacking.

Looks like some UI design changes might be needed.


To clarify, stories are automatically “saved” and put in that list when you upvote them. Only upvoted submissions appear in the list, not upvoted comments.


Voted yes. I would like to know if there is a way to automatically get a list of all of them, not only the first four pages. If there is, I wouldn't mind sending them all.


All this time I assumed they were public, as on reddit!


You could always provide the option to scrape them by giving a username and password. Obviously not everyone would go for this, but if you committed to deleting the credentials afterwards, or even just noted when you're done so people can change their password, I would be fine with it.

Just looking at the saved list, some of the most interesting questions would be: at what rate do people upvote/save, and, as some analysis of groupthink I guess, how many of a person's upvoted stories were exceptionally highly upvoted stories, or something.


While I am one of those who would hand over my saved stories, I wonder about the validity of any conclusions you would draw from the group analysis, given the dual nature of the upvote button. The button used to upvote an article is the same one HNers use to save an article to read later, and they might not have upvoted it had they read it first. I'm not quite sure what could be concluded from such an unclean data point.


Could you expand on that? I was unaware of any way to "save" a story without upvoting it.


That's my point: there is no way to save a story without upvoting it, though the two ought to be separate. People use the vote button to bookmark stories they want to read later, but are forced to vote as well.


Judging from the comments in this post, I don't think many people knew about the saved stories feature to begin with, and if they do I doubt many upvote a story purely to read it later. There are better services to do that like Instapaper.


Ah, your point is that they may upvote it in order to save it for reading later, but had they read it first, they might not have upvoted it.

Yes, understood. I believe that will not be a significant effect in the analysis I want to perform, but I will keep it in mind.

Thank you.


If I might add a data point: when I am busy, I primarily use the upvote button to read later. It's my Instapaper. I upvote the article based on the content when I am not too busy.


Yes... if you give me a button to click that will do it.


Hmm, I don't save stories. I check if they're in delicious (by whom and with what tags), and I've started checking pinboard.in. Stories that hit here or reddit or, say, lambda ultimate, Artima, InfoQ etc. invariably get into delicious quickly.

Or, if it's a topic that's really important to me, I check if they show up on the first page of Google or DuckDuckGo results when I enter 3 or 4 reasonable sets of search terms.


This data is very interesting to me. I have always had a feeling that upvotes were slightly skewed towards websites that fail against major corporate firewalls. For example, I cannot read blogs at work so I upvote more blogs than I probably would if I could see them at work. Good luck on your venture, and you definitely have my vote!


It would be very nice to know what the data is intended for. Most companies won't give a clue as to why they are collecting some information, but folks here are as curious as they seem (or maybe it's just me), so it would be nicer to have that information as well. At least we would know what we are contributing to.


Yes, no problem. Drop me an email when you need them and I'll try to send them in less than one day (everything is pretty tied here, so that may be two days or three, depends on my inbox flow).


That'd be a very interesting data set. I'd love to explore it too - I'm also sure personalized news feeds are achievable and valuable.

PG: why not make an anonymous sample available for us all to play with?


Wow, until now I had assumed upvotes/saved stories were public.


I'm very interested in this. Let us know how you want the data.


Personally I'd be interested in the results and any statistical analysis. Would this be made public?


I only have 46, but I'd be happy to send those over if suitable.


"... But more seriously, you would be releasing some personal data to me ..."

No, but then again how would I be able to stop you?


Sorry, I don't understand your question. Can you elaborate? Thanks.


"... I don't understand your question. Can you elaborate? ..."

While I don't like the idea of being identified, or having things inferred, by what I read, I can't actually stop you.


Your saved stories show up on your profile page - but only for you. Nobody else sees that link.


"... [would you be willing to send to me the first four pages (items 1 through 120) of your "Saved Stories"?] Nobody else sees that link. ..."

How daft I didn't see that. Submissions are public ~ http://news.ycombinator.com/submitted?id=Vivtek So RiderOfGiraffes will rely on willing users to submit them? Why not scrape names off existing comments in posts? and submissions?


Because he doesn't want submissions or comments, he wants saved posts, which is really different for me, at least, and I assume for most of us. I've submitted 8 posts (only one of which went front page) of which naturally I didn't upvote any, because I can't. I've upvoted 135 posts ... which is way more than I would have thought, actually. And I've made roughly a thousand comments, which I would estimate would be on maybe 500 posts (because I don't normally get into extended discussion, so I'm guessing I average out to maybe 2 comments a post) - and the posts I comment on are not necessarily those I upvote, because I usually forget that you can upvote posts unless I really like a post.


I will send it all. You do not have to tell me about your project. Just be sure to post it here when you are ready.



