I rarely read even the articles with interesting titles that make it to HN's front page.
This is because when I did read more of them, they usually turned out to be a lot less interesting than the ensuing discussion on HN.
So now I use the HN discussion as a proxy for article quality. In the HN discussion I can often find a good summary of the article and get a sense of whether the article is likely to be worth reading or not.
Maybe 1 out of 10 (or fewer) of the articles that look interesting to me on HN wind up being ones I actually bother to click through. And of the ones I click through, only 1 out of 10 wind up deserving to be read rather than skimmed.
Some years back, there were a couple of "HN Full Feed" type RSS feeds that would deliver the contents of the entire linked article, so I could read them without even bothering to go to the web site.
I valued these services not only because they were more convenient (no clicking through and waiting for the article to load), but also because there'd be less tracking of my interests this way.
I also have JavaScript disabled for 99% of the sites I visit, and I'm considering starting to use Tor for more of my browsing. It's really nobody's business what I'm reading, and it's a real pity the Internet wasn't built with more inherent privacy and anonymity features.
Agreed, I'm the same way. I probably haven't clicked an article on HN in weeks; I usually just read the comment sections of articles that have interesting titles.
Yet that discussion is littered with uninformed pot-shots at the topic. We can hope that more than 6% of the HN comments come from folks who read/understand the topic I guess.
Btw how did you accumulate your '1 in 10' stats? Off the cuff? Can you think of a way to measure this? Because I don't think 1 in 10 articles 'deserve to be read'. Somebody DID read them, took the time to post them here. So for some audience at least they were meaningful.
"Btw how did you accumulate your '1 in 10' stats? Off the cuff?"
Yep. Just a rough estimate based on my sense of how many articles I actually bother clicking through to. Having a more accurate estimate of my own click-through rate would not be valuable to me, so I never bothered to try to find out.
"Can you think of a way to measure this?"
If I were interested in gathering such stats for myself, I suppose I could use a browser add-on to measure my HN use.
Alternatively, HN could start using indirect links. But I'm not sure if I'd stay with HN if they started doing that. I hate being spied upon, which is one major reason I stopped using Google, and would probably drop HN as well if they started going down that road. Not that HN really needs to do that, since they already know which HN discussion pages I open and what I write (which are reasons for me to start making myself a bit more anonymous in my HN use).
"Because I don't think 1 in 10 articles 'deserve to be read'. Somebody DID read them, took the time to post them here. So for some audience at least they were meaningful."
It all depends on who the audience is, doesn't it? If you aim for the lowest common denominator, you'll probably get a bigger audience. This is a major reason for much of the mainstream media content being such utter garbage (from my perspective).
Also, just because someone clicked through on an HN link doesn't mean that they liked what they found when they got there. The same goes for tracking of people clicking on "Like" buttons or even sending links to their friends.
People could have all sorts of reasons for clicking "Like" buttons that have nothing to do with them enjoying or even reading the content. And I can't count the number of times I've forwarded unread articles to friends because I thought it might be something they might be interested in, but that I had no interest in myself.
These data are not being interpreted correctly. Analytics calculates time-on-page based on the time between loads of the Google JS embedded in your pages. Any visitor who 'bounces' - that is, only visits a single page - only loads the JS once, so their time-on-page is recorded as 0 seconds, regardless of how long they actually spent on the page.
I've checked how it works with the "Real-Time" option of Google Analytics. When I closed a tab, Google Analytics knew that I had stopped reading the page (the online user counter went down by 1). Maybe it works differently with a standard report (I didn't read the code), but maybe not? They could use JavaScript browser event detection, and I don't think such a bug would remain unnoticed.
Are you absolutely sure this is how it works? I have a one-page website where the average time is 1:36. Shouldn't it stay at 0 since the users are not clicking through to other pages?
Maybe I'm misinterpreting something, but the way I understand it, the last (or only) page of a website is handled differently. If it's a single page website with a demo download, and someone downloads the demo, then the time between pageload and the download would be counted. If there's a Flash movie, the time between loading the page and playing the movie would be counted. It also looks like you can come up with your own custom events to track, maybe like clicking an outgoing link?
I've never seriously used GA so take my interpretation with a basketball-sized grain of salt.
Before changing the site to a multiple-page site, I was averaging 1.2 pages/visit. Again, that was with a one-page site, but I would develop on the server, so I figured GA was picking that up. Since changing to a multiple-page site my pages/visit is 2.55 and my average time on site has increased by about 20 seconds, which seems like a lot. Maybe people are on 56k, haha.
But does anyone actually know? I can see from this thread that plenty of people thought they knew, but it seems nearly half thought it worked one way, and nearly half thought it worked the other way.
I apologise for my naive misunderstanding of the Analytics metric. I've added a note about this at the top of my post. Thanks for the explanation!
Exactly what I came here to say. Those people could have spent 10 hours really absorbing that one article and their time would have still been recorded as 0 seconds.
There seems to be a lot of confusion about how GA tracks user engagement, which is understandable as even the Support article linked in another comment doesn't accurately explain what happens with single page visits.
First off, the metric by definition will always be skewed lower than reality. For multi-page visits, GA takes the time of the first hit and time of the second hit to calculate time on page (and will chain these together to get time on site). Since the page the user leaves on doesn't have a "second hit", that time is never included.[1]
For single page visits, as blog posts tend to be, the calculation is slightly different.[1]
Time on Page = (time of last “engagement hit” on page) – (time of first hit from page)
If you set up Event Tracking to trigger as a user scrolls to predetermined lengths of your article, it'll trigger these 'engagement hits' and give you a better approximation of time on site. If you just throw in a standard tracking code that fires off a _trackPageview() event on page load, then GA will never see a second engagement and will not be able to calculate any approximation of time on page/site, so it'll default into the "less than 10 seconds" bucket. Depending on what blogging platform you're using, there are some add-ins that provide such functionality.[2]
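For what it's worth, here's roughly what such scroll-triggered engagement hits might look like, assuming the classic async (_gaq) snippet that the _trackPageview() call above belongs to; the category, action, and thresholds are just illustrative, not anything GA prescribes:

    // Sketch of scroll-depth "engagement hits", assuming the classic async
    // ga.js snippet (same API family as _trackPageview above). Category,
    // action, and thresholds are made up for illustration.
    var _gaq = _gaq || [];
    var thresholds = [25, 50, 75, 100];   // percent of page scrolled
    var fired = {};

    window.addEventListener('scroll', function () {
      var doc = document.documentElement;
      var scrolledPct = (window.pageYOffset + window.innerHeight) /
                        doc.scrollHeight * 100;
      thresholds.forEach(function (pct) {
        if (scrolledPct >= pct && !fired[pct]) {
          fired[pct] = true;
          // Each event is an interaction hit, so GA gets a "second hit"
          // to subtract from and can approximate time on page.
          _gaq.push(['_trackEvent', 'Reading', 'scroll-depth', pct + '%']);
        }
      });
    });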
Analytics calculates the time on page by the time difference _between_ page hits. One hit: 0 seconds on site. Because of this, it isn't an accurate metric to measure engagement for a single blog post.
Most long articles are just a waste of time: the actual information can be condensed into a single paragraph or less, and the rest is just redundancy or useless information.
As other commenters have mentioned, 0 seconds doesn't mean anything. Meanwhile, those 18 visitors who spent more than 1800+ seconds on your page? Probably they just opened the page in a new tab and only got around to reading it a few hours (or days) later. So data at both extremes are useless.
If we ignore the 0-second anomaly, it looks like we've got a nice bell curve peaking between 180-600 seconds, probably closer to 180 than to 600. That sounds about right for a 670-word article.
While I didn't track engagement time, I looked at the number of comments (both here and on my posts) and shares on Twitter and Facebook to try and figure out how much of it was "real" traffic.
3920 visits
00:00:14 average visit duration
99.1% less than 10 seconds
Somewhat depressing...
(edit: according to lenazegher's comment the average visit duration stats might not be as bad as they look since my bounce rate was pretty high, ~95% on both posts)
Interesting. I posted a link to my shithole of a site in a comment a hundred deep in a post that had reached the top of the frontpage and had fallen to pretty much the bottom when I posted in the thread.
I got an extra 300 visitors that day and about 50 the next. My average is around 25 visitors per day, so this is a big and noticeable spike.
I guess I am kinda shocked that a random link in the middle of a dying thread generated that much traffic, while something hitting the frontpage generated only about 53 times more.
You should put your site in your profile so you can see how many people come after you post an offhand reference to the fact that you may have once written something interesting elsewhere.
Well I thought it was interesting that my shitty comment that linked to something I wrote got so many hits while getting on the frontpage got so little in comparison.
I could get your angst if I had linked to it again or actually linked to it in my profile. But I did neither. So now I just think you are a dick.
I had a Show HN that made no headway on the frontpage (like 3 karma, then it disappeared into the depths), but I posted that same link, relevantly, as a comment somewhere in the middle of the thread and it absolutely blew up. It's now my highest-starred repo on GitHub and I started the phone interview process with Hulu because of it. Don't underestimate the comments here. A TON of people browse these posts too.
This discussion highlights exactly why I don't consider GA a very useful tool - there is no real transparency as to what and how data is collected/measured/filtered [That I've been able to find, anyway].
So in the end, the only useful information you get from GA data is the rate of change (which is useful for many things) -- but not, for instance, the actual number of visits to your pages -- because you have no idea what is counted and what isn't -- and what is considered a visit.
If the article is long, I usually add it to instapaper or pocket to read later. The time spent on the actual site is low but I still engage with the content.
I was just discussing this with a friend today. We both had front page stories on HN recently. He reported 6k of 6.8k[1] leaving within 10 seconds and I saw that 11k of 13.5k[2] left within 10 seconds.
If these numbers aren't accurate due to Google Analytics, I'd be interested to know a way to get the accurate numbers.
The other annoying thing was that HTTPS never sends referrers. Hence, not a single one of my visits said it was from Hacker News.
I know you don't want to leak the referrer in most circumstances when it's HTTPS, but it just seems so vital. The Internet was built on and understood through referrals and links; losing the ability to see referrers seems quite unfortunate, especially if the whole Internet ends up on HTTPS.
Google and Facebook are the only ones who would be able to stitch together significant portions of referral traffic due to Google Analytics or Facebook Like / Connect.
Everyone else is just left stumbling around blind.
"I was just discussing this with a friend today. We both had front page stories on HN recently. He reported 6k of 6.8k[1] leaving within 10 seconds and I saw that 11k of 13.5k[2] left within 10 seconds."
This is probably minority behavior, but I will often use Instapaper to bookmark articles for later, and then read a batch together. For most in-depth articles, I probably spend less than ten seconds deciding whether to click "Read Later" before leaving, even though I do in fact read them later.
I did consider the possibility of Instapaper, Readability or other similar apps, but as you say I couldn't imagine they'd be the majority, even on the relatively tech savvy Hacker News.
As an example of alternate HN clients, hckrnews.com had 124 referrals, ihackernews.com had 85, HackerWeb had 77 and PulseWeb had 48.
That's a grand total of around 300 out of 13.5k.
I'd imagine Readability and Instapaper to be big but probably only some small multiple of that at best.
> As an example of alternate HN clients, hckrnews.com had 124 referrals, ihackernews.com had 85, HackerWeb had 77 and PulseWeb had 48. That's a grand total of around 300 out of 13.5k.
I see similar proportions for submissions of my pages as well.
To get an accurate number for the duration of a visit you'd have to make an AJAX request every second or so, or somehow make a request whenever the user closes the tab. You'd also need to measure whether the tab loses focus. To be even more accurate, you could store how much the user has scrolled over the duration, to see if the visitor actually read to the bottom. These approaches are seen as quite invasive just to get the duration of a visit. I know that e.g. Woopra makes a request every 10 seconds with different metrics, so they at least get better accuracy than Google Analytics.
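As a rough sketch of that kind of heartbeat approach (the /heartbeat endpoint, the 10-second interval, and the payload fields are all made up for illustration):

    // Hypothetical heartbeat tracker along the lines described above.
    var startTime = Date.now();
    var maxScroll = 0;

    window.addEventListener('scroll', function () {
      var doc = document.documentElement;
      var depth = (window.pageYOffset + window.innerHeight) / doc.scrollHeight;
      maxScroll = Math.max(maxScroll, depth);
    });

    setInterval(function () {
      if (document.hidden) return;      // only count time while the tab is visible
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/heartbeat');   // made-up endpoint
      xhr.setRequestHeader('Content-Type', 'application/json');
      xhr.send(JSON.stringify({
        secondsOnPage: Math.round((Date.now() - startTime) / 1000),
        maxScrollDepth: Math.round(maxScroll * 100)
      }));
    }, 10000);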
Understood. Other services do accurately measure this however. ChartBeat[1] tries to differentiate between users who are reading, writing and so on. Re: your example, it does literally keep track of how far you scrolled as well.
Unfortunately, my personal website is not their target market. Plans start at $9.95 per month: infinitely more than my hosting cost and also overkill for whenever I'm not front paged. It'd only really make sense for me if it had pricing based on usage, which I can almost guarantee they'll never launch.
I'd argue that Google Analytics is vague in defining or explaining their terms. In a perfect world, yes, they'd offer "pings" to see if a user is active, but even without that I'd at least hope they clarify their terms.
Misunderstood terms lead to misunderstood statistics.
The fewer misunderstood statistics, the better.
Here's a bit of data: last week my page http://www.gwern.net/Google%20shutdowns was submitted to Hacker News and hit the front page for a while, racking up thousands of visitors. As it happens, I was running an A/B test on fonts, where a JavaScript timer sleeps 40 seconds and then fires, telling Google Analytics that a reader has 'converted'. (This hopefully avoids the bouncing distortion of the 'time on page' metric.) So, what percentage of readers stayed on the page long enough for the timer to fire after 40 seconds? (The Markdown source is somewhere around 12k words, so it's not the quickest read in the world.)
Then they won't be counted in either the page-load figures (Analytics never ran) or the conversion figures (neither Analytics nor the conversion trigger ever ran).
If I read your page on my desktop, JavaScript was off - unless I had forgotten to revoke temporary permissions after whitelisting Google Analytics on another page.
Which illustrates that Google Analytics reports something, but what it reports is only what it reports. To put it another way, Google Analytics records information useful to Google. What it reports back to the datapoint is designed to appear useful to the datapoint. The purpose of the information provided is solely to encourage the datapoint to keep using Google Analytics so that Google can keep using the datapoint's website to track people on the Internet.
For me (echoing some of what gnosis has stated), the real value that HN brings is the discussion. Frequently, I find the comments, insights, opinions, and tangents elicited by HN submissions to be more interesting and thought-provoking than the submissions themselves. I typically browse through the discussion a good bit before ever clicking through to the article/site which initially drove the discussion. There are plenty of "show-and-tell" mechanisms on the Web. What sets HN apart, in my mind, is the round-table that develops in response to a lot of those submissions.
As many have mentioned, this is an incorrect interpretation, and quite common to see on blogs or single-page sites. Although technically it's not dependent on pageviews, but on interactions (so pageviews or events).
So, one common way to handle this on blogs is to use setTimeout in conjunction with an event. Basically you fire an event after 15 or so seconds, which will then count as an interaction.
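Something like this, assuming the classic async _gaq snippet (the 15-second delay and the category/label are arbitrary):

    // Hypothetical "engaged reader" timer. Once the event fires it counts
    // as an interaction, so the visit is no longer recorded as a
    // zero-second bounce.
    var _gaq = _gaq || [];
    setTimeout(function () {
      _gaq.push(['_trackEvent', 'Engagement', 'stayed', '15 seconds']);
    }, 15000);

Note that because the event is an interaction hit, it also lowers the reported bounce rate: single-page readers who stick around stop showing up as bounces.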
As another data point, I recently had a 60% read rate on a post that was on HN's front page for a while.
It was a post on medium.com about Pac-Man. Medium tracks number of views and number of reads per day. I think they use a metric that's not just time-on-page to differentiate between views and reads. My post had about 26k views and about 15k reads the day it was on the front page.
Paul, you may want to compare raw web server logs to the numbers you get from GA. I wouldn't be surprised if there's a big discrepancy, especially when there's HN in the mix. Moreover, those who are nerdy enough to surf with tracking scripts blocked might be the ones who actually read the article ;)
I tend to quickly go through the front page and bookmark the articles that I'm interested in. Then, when I have the time and I'm on my tablet - I read them.
It may be that I'm not the only person that does this kinda thing.