I rarely read even the articles with interesting titles that make it to HN's front page.
This is because when I did read more of them, they usually turned out to be a lot less interesting than the ensuing discussion on HN.
So now I use the HN discussion as a proxy for article quality. In the HN discussion I can often find a good summary of the article and get a sense of whether the article is likely to be worth reading or not.
Maybe 1 out of 10 (or fewer) of the articles that look interesting to me on HN wind up being ones I actually bother to click through. And of the ones I click through, only 1 out of 10 wind up deserving to be read rather than skimmed.
Some years back, there were a couple of "HN Full Feed" type RSS feeds that would deliver the contents of the entire linked article, so I could read them without even bothering to go to the web site.
I valued these services not only because they were more convenient (no clicking through and waiting for the article to load), but also because there'd be less tracking of my interests this way.
I also have JavaScript disabled for 99% of the sites I visit, and I'm considering starting to use Tor for more of my browsing. It's really nobody's business what I'm reading, and it's a real pity the Internet wasn't built with more inherent privacy and anonymity features.
Agreed, I'm the same way. I probably haven't clicked an article on HN in weeks; I usually just read the comment sections of articles that have interesting titles.
Yet that discussion is littered with uninformed pot-shots at the topic. We can hope that more than 6% of the HN comments come from folks who read/understand the topic I guess.
Btw how did you accumulate your '1 in 10' stats? Off the cuff? Can you think of a way to measure this? Because I don't think 1 in 10 articles 'deserve to be read'. Somebody DID read them, took the time to post them here. So for some audience at least they were meaningful.
"Btw how did you accumulate your '1 in 10' stats? Off the cuff?"
Yep. Just a rough estimate based on my sense of how many articles I actually bother clicking through to. Having a more accurate estimate of my own click-through rate would not be valuable to me, so I never bothered to try to find out.
"Can you think of a way to measure this?"
If I were interested in gathering such stats for myself, I suppose I could use a browser add-on to measure my HN use.
Alternatively, HN could start using indirect links. But I'm not sure if I'd stay with HN if they started doing that. I hate being spied upon, which is one major reason I stopped using Google, and would probably drop HN as well if they started going down that road. Not that HN really needs to do that, since they already know which HN discussion pages I open and what I write (which are reasons for me to start making myself a bit more anonymous in my HN use).
"Because I don't think 1 in 10 articles 'deserve to be read'. Somebody DID read them, took the time to post them here. So for some audience at least they were meaningful."
It all depends on who the audience is, doesn't it? If you aim for the lowest common denominator, you'll probably get a bigger audience. This is a major reason for much of the mainstream media content being such utter garbage (from my perspective).
Also, just because someone clicked through on an HN link doesn't mean that they liked what they found when they got there. The same goes for tracking of people clicking on "Like" buttons or even sending links to their friends.
People could have all sorts of reasons for clicking "Like" buttons that have nothing to do with them enjoying or even reading the content. And I can't count the number of times I've forwarded unread articles to friends because I thought it might be something they might be interested in, but that I had no interest in myself.
These data are not being interpreted correctly. Analytics calculates time-on-page based on the time between loads of the Google JS embedded in your pages. Any visitor who 'bounces' - that is, only visits a single page - only loads the JS once, so their time-on-page is recorded as 0 seconds, regardless of how long they actually spent on the page.
I've checked how it works with the "Real-Time" option of Google Analytics. When I closed a tab, Google Analytics knew that I had stopped reading the page (the online user counter went down by 1). Maybe it works differently with a standard report (I didn't read the code), but maybe not? They could use JavaScript browser event detection, and I don't think such a bug would remain unnoticed.
Are you absolutely sure this is how it works? I have a one-page website where the average time is 1:36. Shouldn't it stay at 0 since the users are not clicking through to other pages?
Maybe I'm misinterpreting something, but the way I understand it, the last (or only) page of a website is handled differently. If it's a single page website with a demo download, and someone downloads the demo, then the time between pageload and the download would be counted. If there's a Flash movie, the time between loading the page and playing the movie would be counted. It also looks like you can come up with your own custom events to track, maybe like clicking an outgoing link?
I've never seriously used GA so take my interpretation with a basketball-sized grain of salt.
Before changing the site to a multiple-page site, I was averaging 1.2 pages/visit. Again, that was with a one-page site, but I would develop on the server, so I figured GA was picking that up. Since changing to a multiple-page site my pages/visit is 2.55 and my average time on site has increased by about 20 seconds, which seems like a lot. Maybe people are on 56k, haha.
But does anyone actually know? I can see from this thread that plenty of people thought they knew, but it seems nearly half thought it worked one way, and nearly half thought it worked the other way.
I apologise for my naive misunderstanding of the Analytics metric. I've added a note about this at the top of my post. Thanks for the explanation!
Exactly what I came here to say. Those people could have spent 10 hours really absorbing that one article and their time would have still been recorded as 0 seconds.
There seems to be a lot of confusion about how GA tracks user engagement, which is understandable as even the Support article linked in another comment doesn't accurately explain what happens with single page visits.
First off, the metric by definition will always be skewed lower than reality. For multi-page visits, GA takes the time of the first hit and time of the second hit to calculate time on page (and will chain these together to get time on site). Since the page the user leaves on doesn't have a "second hit", that time is never included.[1]
For single page visits, as blog posts tend to be, the calculation is slightly different.[1]
Time on Page = (time of last “engagement hit” on page) – (time of first hit from page)
If you set up Event Tracking to trigger as a user scrolls to predetermined lengths of your article, it'll trigger these 'engagement hits' and give you a better approximation of time on site. If you just throw in a standard tracking code that fires off a _trackPageview() event on page load, then GA will never see a second engagement and will not be able to calculate any approximation of time on page/site, so it'll default into the "less than 10 seconds" bucket. Depending on what blogging platform you're using, there are some add-ins that provide such functionality.[2]
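For what it's worth, here's roughly what such scroll-triggered engagement hits might look like, assuming the classic async (_gaq) snippet that the _trackPageview() call above belongs to; the category, action, and thresholds are just illustrative, not anything GA prescribes:

    // Sketch of scroll-depth "engagement hits", assuming the classic async
    // ga.js snippet (same API family as _trackPageview above). Category,
    // action, and thresholds are made up for illustration.
    var _gaq = _gaq || [];
    var thresholds = [25, 50, 75, 100];   // percent of page scrolled
    var fired = {};

    window.addEventListener('scroll', function () {
      var doc = document.documentElement;
      var scrolledPct = (window.pageYOffset + window.innerHeight) /
                        doc.scrollHeight * 100;
      thresholds.forEach(function (pct) {
        if (scrolledPct >= pct && !fired[pct]) {
          fired[pct] = true;
          // Each event is an interaction hit, so GA gets a "second hit"
          // to subtract from and can approximate time on page.
          _gaq.push(['_trackEvent', 'Reading', 'scroll-depth', pct + '%']);
        }
      });
    });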
Analytics calculates the time on page by the time difference _between_ page hits. One hit: 0 seconds on site. Because of this, it isn't an accurate metric to measure engagement for a single blog post.
Most long articles are just a waste of time: the actual information can be condensed into a single paragraph or less, and the rest is just redundancy or useless information.
As other commenters have mentioned, 0 seconds doesn't mean anything. Meanwhile, those 18 visitors who spent more than 1800+ seconds on your page? Probably they just opened the page in a new tab and only got around to reading it a few hours (or days) later. So data at both extremes are useless.
If we ignore the 0-second anomaly, it looks like we've got a nice bell curve peaking between 180-600 seconds, probably closer to 180 than to 600. That sounds about right for a 670-word article.
While I didn't track engagement time, I looked at the number of comments (both here and on my posts) and shares on Twitter and Facebook to try and figure out how much of it was "real" traffic.
3920 visits
00:00:14 average visit duration
99.1% less than 10 seconds
Somewhat depressing...
(edit: according to lenazegher's comment the average visit duration stats might not be as bad as they look since my bounce rate was pretty high, ~95% on both posts)
Interesting. I posted a link to my shithole of a site in a comment a hundred deep in a post that had reached the top of the frontpage and had fallen to pretty much the bottom when I posted in the thread.
I got an extra 300 visitors that day and about 50 the next. My average is around 25 visitors per day, so this is a big and noticeable spike.
I guess I am kinda shocked that a random link in the middle of a dying thread generated that much traffic, while something hitting the frontpage generated only about 53 times more.
You should put your site in your profile so you can see how many people come after you post an offhand reference to the fact that you may have once written something interesting elsewhere.
Well I thought it was interesting that my shitty comment that linked to something I wrote got so many hits while getting on the frontpage got so little in comparison.
I could get your angst if I had linked to it again or actually linked to it in my profile. But I did neither. So now I just think you are a dick.
I had a Show HN that made no headway on the frontpage (like 3 karma, then it disappeared into the depths), but I posted that same link, relevantly, as a comment somewhere in the middle of the thread and it absolutely blew up. It's now my highest-starred repo on GitHub and I started the phone interview process with Hulu because of it. Don't underestimate the comments here. A TON of people browse these posts too.
This discussion highlights exactly why I don't consider GA a very useful tool - there is no real transparency as to what and how data is collected/measured/filtered [That I've been able to find, anyway].
So in the end, the only useful information you get from GA data is the rate of change (which is useful for many things) -- but not, for instance, the actual number of visits to your pages -- because you have no idea what is counted and what isn't -- and what is considered a visit.
If the article is long, I usually add it to instapaper or pocket to read later. The time spent on the actual site is low but I still engage with the content.
I was just discussing this with a friend today. We both had front page stories on HN recently. He reported 6k of 6.8k[1] leaving within 10 seconds and I saw that 11k of 13.5k[2] left within 10 seconds.
If these numbers aren't accurate due to Google Analytics, I'd be interested to know a way to get the accurate numbers.
The other annoying thing was that HTTPS never sends referrers. Hence, not a single one of my visits said it was from Hacker News.
I know you don't want to leak the referrer in most circumstances when it's HTTPS, but it just seems so vital. The Internet was built on and understood through referrals and links; losing the ability to see referrers seems quite unfortunate, especially if the whole Internet ends up on HTTPS.
Google and Facebook are the only ones who would be able to stitch together significant portions of referral traffic due to Google Analytics or Facebook Like / Connect.
Everyone else is just left stumbling around blind.
"I was just discussing this with a friend today. We both had front page stories on HN recently. He reported 6k of 6.8k[1] leaving within 10 seconds and I saw that 11k of 13.5k[2] left within 10 seconds."
This is probably minority behavior, but I will often use Instapaper to bookmark articles for later, and then read a batch together. For most in-depth articles, I probably spend less than ten seconds deciding whether to click "Read Later" before leaving, even though I do in fact read them later.
I did consider the possibility of Instapaper, Readability or other similar apps, but as you say I couldn't imagine they'd be the majority, even on the relatively tech savvy Hacker News.
As an example of alternate HN clients, hckrnews.com had 124 referrals, ihackernews.com had 85, HackerWeb had 77 and PulseWeb had 48.
That's a grand total of around 300 out of 13.5k.
I'd imagine Readability and Instapaper to be big but probably only some small multiple of that at best.
> As an example of alternate HN clients, hckrnews.com had 124 referrals, ihackernews.com had 85, HackerWeb had 77 and PulseWeb had 48. That's a grand total of around 300 out of 13.5k.
I see similar proportions for submissions of my pages as well.
To get an accurate number for the duration of a visit you'd have to make an AJAX request every second or so, or somehow make a request whenever the user closes the tab. You'd also need to measure whether the tab loses focus. To be even more accurate, you could store how much the user has scrolled over the duration, to see if the visitor actually read to the bottom. These approaches are seen as quite invasive just to get the duration of a visit. I know that e.g. Woopra makes a request every 10 seconds with different metrics, so they at least get better accuracy than Google Analytics.
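As a rough sketch of that kind of heartbeat approach (the /heartbeat endpoint, the 10-second interval, and the payload fields are all made up for illustration):

    // Hypothetical heartbeat tracker along the lines described above.
    var startTime = Date.now();
    var maxScroll = 0;

    window.addEventListener('scroll', function () {
      var doc = document.documentElement;
      var depth = (window.pageYOffset + window.innerHeight) / doc.scrollHeight;
      maxScroll = Math.max(maxScroll, depth);
    });

    setInterval(function () {
      if (document.hidden) return;      // only count time while the tab is visible
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/heartbeat');   // made-up endpoint
      xhr.setRequestHeader('Content-Type', 'application/json');
      xhr.send(JSON.stringify({
        secondsOnPage: Math.round((Date.now() - startTime) / 1000),
        maxScrollDepth: Math.round(maxScroll * 100)
      }));
    }, 10000);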
Understood. Other services do accurately measure this however. ChartBeat[1] tries to differentiate between users who are reading, writing and so on. Re: your example, it does literally keep track of how far you scrolled as well.
Unfortunately, my personal website is not their target market. Plans start at $9.95 per month: infinitely more than my hosting cost and also overkill for whenever I'm not front paged. It'd only really make sense for me if it had pricing based on usage, which I can almost guarantee they'll never launch.
I'd argue that Google Analytics is vague in defining or explaining their terms. In a perfect world, yes, they'd offer "pings" to see if a user is active, but even without that I'd at least hope they clarify their terms.
Misunderstood terms lead to misunderstood statistics.
The fewer misunderstood statistics, the better.
Here's a bit of data: last week my page http://www.gwern.net/Google%20shutdowns was submitted to Hacker News and hit the front page for a while, racking up thousands of visitors. As it happens, I was running an A/B test on fonts, where a JavaScript timer sleeps 40 seconds and then fires, telling Google Analytics that a reader has 'converted'. (This hopefully avoids the bouncing distortion of the 'time on page' metric.) So, what percentage of readers stayed on the page long enough for the timer to fire after 40 seconds? (The Markdown source is somewhere around 12k words, so it's not the quickest read in the world.)
Then they won't be counted in either the page-load figures (Analytics never ran) or the conversion figures (neither Analytics nor the conversion trigger ever ran).
If I read your page on my desktop, JavaScript was off - unless I had forgotten to revoke temporary permissions after whitelisting Google Analytics on another page.
Which illustrates that Google Analytics reports something, but what it reports is only what it reports. To put it another way, Google Analytics records information useful to Google. What it reports back to the datapoint is designed to appear useful to the datapoint. The purpose of the information provided is solely to encourage the datapoint to keep using Google Analytics so that Google can keep using the datapoint's website to track people on the Internet.
For me (echoing some of what gnosis has stated), the real value that HN brings is the discussion. Frequently, I find the comments, insights, opinions, and tangents elicited by HN submissions to be more interesting and thought-provoking than the submissions themselves. I typically browse through the discussion a good bit before ever clicking through to the article/site which initially drove the discussion. There are plenty of "show-and-tell" mechanisms on the Web. What sets HN apart, in my mind, is the round-table that develops in response to a lot of those submissions.
As many have mentioned, this is an incorrect interpretation, and quite common to see on blogs or single-page sites. Although technically it's not dependent on pageviews, but on interactions (so pageviews or events).
So, one common way to handle this on blogs is to use setTimeout in conjunction with an event. Basically you fire an event after 15 or so seconds, which will then count as an interaction.
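Something like this, assuming the classic async _gaq snippet (the 15-second delay and the category/label are arbitrary):

    // Hypothetical "engaged reader" timer. Once the event fires it counts
    // as an interaction, so the visit is no longer recorded as a
    // zero-second bounce.
    var _gaq = _gaq || [];
    setTimeout(function () {
      _gaq.push(['_trackEvent', 'Engagement', 'stayed', '15 seconds']);
    }, 15000);

Note that because the event is an interaction hit, it also lowers the reported bounce rate: single-page readers who stick around stop showing up as bounces.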
As another data point, I recently had a 60% read rate on a post that was on HN's front page for a while.
It was a post on medium.com about Pac-Man. Medium tracks number of views and number of reads per day. I think they use a metric that's not just time-on-page to differentiate between views and reads. My post had about 26k views and about 15k reads the day it was on the front page.
Paul, you may want to compare raw web server logs to the numbers you get from GA. I wouldn't be surprised if there's a big discrepancy, especially when there's HN in the mix. Moreover, those who are nerdy enough to surf with tracking scripts blocked might be the ones who actually read the article ;)
I tend to quickly go through the front page and bookmark the articles that I'm interested in. Then, when I have the time and I'm on my tablet - I read them.
It may be that I'm not the only person that does this kinda thing.