Vertical vs. grid product listings - a surprising AB test (westiseast.co.uk)
131 points by westiseast on Nov 11, 2011 | 57 comments



If you're going to A/B test something, you need to reduce the variables down to one. For all you know, the supposedly 'pretty' framing and box-shadow you added to the grid layout may actually be what's causing customers to prefer the flat list.

Testing multiple variables in a given trial makes it impossible to discover which change (or combination thereof) actually improved the conversion rate.


The frame and box shadow may also slow the site down. Amazon found that a 100 ms increase in page load time reduces sales by 1%.


Yeah, it's a blunt tool. The next planned test is to 'prettify' the vertical listing and see what happens.


That was my first reaction to this test as well. It's not an A/B test and the conclusions are pretty much worthless.

From experience though, an 'uglier' site getting more clicks is actually fairly common. I've seen it across hundreds of thousands of visitors and a multitude of landing pages. The rationalization that seems most common is that people click to 'get away' or get to whatever they want.


Well, they could have done a multivariate test if there was enough traffic to make it statistically significant and tested all combos, like Google Website Optimizer (GWO) allows.
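
To make "all combos" concrete, here is a minimal sketch of a full-factorial design for the two things that changed in this test. The factor names and levels are assumptions read off the discussion (layout, plus the frame and box-shadow styling); it says nothing about how GWO itself is configured.

```python
from itertools import product

# Hypothetical factors taken from the discussion: layout and decorative styling.
factors = {
    "layout": ["vertical", "grid"],
    "frame_and_shadow": [False, True],
}

def all_variants(factors):
    """Return one dict per combination of factor levels (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

variants = all_variants(factors)
for variant in variants:
    print(variant)
```

Two factors with two levels each give four variants, so the same traffic is split four ways instead of two. That is why a multivariate test needs more visitors to reach significance.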


The author made one glaring mistake: basing "performance" on click through rates. What really matters on an ecommerce site is revenue.

So which format resulted in a higher revenue? You'd be silly to make the assumption that greater click through = greater revenue.


"You'd be silly to make the assumption that greater click through = greater revenue."

Performance is based upon how well the conversion funnel performs. Revenue of course matters, but the way you get that revenue is to get people to go through the conversion funnel. What this A/B test shows, though crudely, is one optimization of one portion of the funnel. When you get more people to click on, say, a product page, you're "widening the funnel" at that particular point.

It's entirely possible to widen the funnel higher up, and still have the same amount of revenue; however, this doesn't mean that it was a useless optimization. Whenever you widen the funnel, you get the potential for higher revenue, especially with further optimizations. That makes for an increase in performance.

Lastly, widening the funnel at any point also means that you are getting increased page views, increased product views, and increased stickiness. That is _always_ a good thing, regardless of whether you increase your revenue or not. It means that people are more likely to interact with, and buy from, your site in the future-- users are naturally more likely to go to a site that they know. So, greater click throughs to a product page, though you may not immediately see an effect, will almost always lead to greater revenue.


A wider funnel isn't always better. The thing that should be optimised is volume multiplied by some measure of "lead quality" (and total money spent is probably a good approximation for a business to use).

For example, would you rather have 1000 people through the funnel with a 1% average likelihood of purchasing (for an expected 10 sales) or 100 people through the funnel with a 50% average likelihood of purchasing (for an expected 50 sales)?

It may well be that you can get more people through the funnel but it doesn't make it a good optimisation if the people you are enticing are unlikely to buy anything from you.

Real world example: Apple products. Apple specifically target the top end of consumers (narrow funnel) who are happy to pay a little bit more, whereas other producers go for the mass market (wider funnel) sales at a lower margin.
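
The 1000-versus-100 comparison above is just volume multiplied by lead quality. A minimal sketch, using only the numbers from that example (the function name is made up for illustration):

```python
def expected_sales(visitors, purchase_probability):
    """Expected number of sales = volume x average likelihood of purchasing."""
    return visitors * purchase_probability

wide = expected_sales(1000, 0.01)    # wide funnel, low-quality leads
narrow = expected_sales(100, 0.50)   # narrow funnel, high-quality leads
print(f"wide funnel: {wide:.0f} sales, narrow funnel: {narrow:.0f} sales")
```

Swapping the purchase probability for expected revenue per visitor gives the "total money spent" version of the same metric.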


Yep, 100% correct.

Although in this case, my justification is - the 'distance' between this product listing and an actual purchase is maybe 4 steps total (ie. view product, add to basket, give details, pay), and there are decreasing response percentages at each step. If I tried to measure actual sales with my current tool the test would take much much longer to conclude. Rest assured, I've got AB tests running elsewhere in the process too :)


But this makes your entire article nonsense, do you not get that?

The difference in click-through could be because one layout was better at conveying information than the other, so people didn't need to click through to make a purchasing decision. Maybe the list meant that they ignored the text?

What if the end result is you still only sold 20 boxes of tea for each? Your assumption would then be wrong, the grid would be better because it's serving less page views.

What if because the consumer had to click around more you actually had less sales? They took too long to find what they wanted?


You're sort of touching on two of the important metrics; product views per session, and page views per product view. You want to maximize the former, and minimize the latter. Why would you want to maximize the former? Stickiness. Your customer will be more likely to stay around and purchase, whether it's now, or later, if they see a lot of different products-- even if they aren't finding what they're looking for (which, btw, is impossible to divine from this metric).

"What if because the consumer had to click around more you actually had less sales?"

That's very difficult to measure-- you can't assume that a customer at the browse portion of the funnel is able to find what they're looking for or not, just by looking at the number. You might want to look at the number of add-to-cart actions per product view for a better answer to that question.


"one of these listing styles performed 15% better than the other in generating clicks on actual products. One of them results in 40% click through, while the other one generates only a quarter"

That isn't 15% better. That's a 60% increase in click-through rate. That's massive.
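
The two figures differ because one is an absolute difference and the other a relative one. A quick sketch using the 40% and 25% rates quoted above; treating the 25% layout as the baseline is an assumption:

```python
def lift(baseline_rate, variant_rate):
    """Return (absolute difference in percentage points, relative increase in %)."""
    absolute_points = (variant_rate - baseline_rate) * 100
    relative_percent = (variant_rate / baseline_rate - 1) * 100
    return absolute_points, relative_percent

points, percent = lift(0.25, 0.40)
print(f"{points:.0f} percentage points, {percent:.0f}% relative increase")
```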

We had already planned to remove a grid display from our app and replace it with a linear flow, but this seals the deal. Thanks!


Why not AB test yours and share the results? My results are accurate and significant (statistically) but I don't know under what circumstances they are transferable. I have a hunch that I get these results because it's a small inventory of products, but with a large inventory, maybe it works differently.


Our app is not live to the public yet. Otherwise we would.


cool - well when it is live, I'd love to see it, and AB test results!


I assumed list would do better.

There are basic underlying usability rules that permeate everything.

The human eye scans like an F.

It's almost the golden rule of design. Always build your layouts in an F structure for the most important items.

A list design naturally complements the F pattern.


You articulated my point better than I could have.

I too guessed, and have always believed, that vertical listings are a "more natural" way of consuming information than grid listings. Grids are too overwhelming to digest any of the information and make a decision. They are probably good to browse sometimes.

I also wanted to say that the common perception that prettier is better is rarely true. Google is a great example and so was Plentyoffish. Utility and usability pretty much make the biggest difference and can make up for a lack of pretty design. In fact, pretty design is often not the most usable design.


There is also a cultural element to consider. Some regions, like Japan, prefer high-information density, cluttered UIs.

A prime example is yahoo.com VS yahoo.co.jp

White space is an important tool in controlling that density.


I don't think Yahoo Japan is any more dense or cluttered than regular Yahoo. Maybe it's just the unfamiliar characters giving that appearance?



Interesting - have you got any external links you can post about this 'F' behaviour? I'd be interested in reading them!



Can someone let the guys over at The Verge (http://www.theverge.com/) know? That layout is driving me nuts. Which is a shame because they have some talented writers.


Wow. I wasn't completely convinced about a grid being an unnatural layout to scan until I visited that site. That's amazing.



I'd imagine this depends a lot on what you are displaying. In this case there is no significant visual difference between the products, and you have to read the text to find out what it actually is. Then I can see having the text in a list would be more natural.

In cases where there is enough visual difference between the products so that they can be distinguished between without any text, then I think a grid is probably a better layout since you can compare more products at a time. So while the results are certainly interesting, I don't think I would try to generalize them too far.


I think it depends.

I'm used to looking at very visual samples in a grid format because it allows for more large images to be shown in a row. Things like furniture, clothing, templates... For things that require some text descriptions (such as tech specs), I think a list view is superior.


It's a bit hard to assess the significance of this. Sample size isn't even given. 40 people went through this test, or 4 million?


Sorry, yes, you're right. The sample size was not huge (about 500 people viewed each variant), although the result was statistically significant. I just added a note to the post about this. My aim is to repeat the test at a later date to see if the results repeat themselves.


Correct me if I am wrong, but isn't statistical significance a function of sample size?


Yes-- sample size, and effect size. He's saying that the results were strong enough that given the sample size, they were statistically significant.

Simple example: if I have a study of 20,000 people and I find that 51% of people prefer foos to bars, that may be similarly significant to your study of 20 people, all of whom prefer foos to bars.
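
A rough way to check the foos-versus-bars example is an exact binomial test. This sketch assumes the null hypothesis is a 50/50 preference, which the comment above does not state, and the function name is made up for illustration.

```python
import math

def two_sided_p_value(successes, n):
    """Exact two-sided binomial p-value against a 50/50 null hypothesis.

    Uses integer arithmetic for the tail sum so large n does not overflow floats.
    """
    k = max(successes, n - successes)     # the null is symmetric, so fold to the upper tail
    term = math.comb(n, k)                # C(n, k)
    tail = 0
    for i in range(k, n + 1):
        tail += term
        term = term * (n - i) // (i + 1)  # C(n, i + 1) computed from C(n, i)
    return min(1.0, 2 * tail / 2 ** n)

p_large = two_sided_p_value(10_200, 20_000)  # 51% of 20,000 people
p_small = two_sided_p_value(20, 20)          # all 20 of 20 people
print(f"n=20000: p = {p_large:.2g}; n=20: p = {p_small:.2g}")
```

Under that assumption both results fall well below the conventional 0.05 threshold, and the unanimous sample of 20 actually yields the smaller p-value, so a large effect can make up for a small sample.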


You are right.

The author should plug the numbers in here: http://www.usereffect.com/split-test-calculator

I'd go as far as to consider the data worthless without the confidence value.
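
For anyone who wants to see roughly what such a calculator does, here is a sketch of a pooled two-proportion z-test. The click counts are assumptions reconstructed from the rounded figures in this thread (40% and 25% click-through, roughly 500 views per variant), not the author's raw data, and the linked calculator may use a different method.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Assumed counts: 40% of 500 views versus 25% of 500 views.
z, p = two_proportion_z_test(200, 500, 125, 500)
print(f"z = {z:.2f}, p = {p:.2g}")
```

The confidence value reported by split-test calculators is usually one minus a p-value of this kind.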


If I've understood it correctly, the smaller your sample size, the bigger the difference you need between response rates. So if the difference is 60%, then a sample of 30 people can be statistically significant. But if you're looking for a 5% improvement on something, you'll need to sample a large number of people.

I just used this:

http://www.prconline.com/education/tools/statsignificance/in...
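
The trade-off between effect size and sample size can be sketched with the usual textbook approximation for comparing two proportions. The 5% significance level, 80% power, and 25% baseline click-through are assumptions chosen for illustration; the result is a planning estimate, not an exact requirement.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

big_effect = sample_size_per_variant(0.25, 0.40)      # 60% relative improvement
small_effect = sample_size_per_variant(0.25, 0.2625)  # 5% relative improvement
print(f"60% lift: ~{big_effect} per variant; 5% lift: ~{small_effect} per variant")
```

Under these assumptions the small improvement needs on the order of a hundred times more visitors than the large one, which matches the intuition above.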


Not surprising at all. Conventional layout wisdom says give the eye a clear path in a direction it knows.

A multi-dimensional grid takes you in a Z shape, whereas a vertical lineup's images and text are all aligned and easier to distinguish.


I guess it was surprising for a bozo like me :)

I think I'd overestimated "pretty" and underestimated "functional" in this case, so the results were a wakeup.


I was surprised that the author was assuming everyone would guess the grid view would do better. It seems completely obvious to me that a list view is better.


I am surprised you would assume that everyone would think like you.


The ranking may also play a role in this: ranking on a grid is usually presented left-to-right top-to-bottom, but that's not the order the eye scans (while in a vertical layout the top-to-bottom order is natural).

This paper from Yahoo Research contains an interesting analysis of the problem:

Optimizing Two-Dimensional Search Results Presentation

http://www.chierichetti.name/papers/optimizing2d.pdf?attredi...


The test doesn't control for all the variables to make the conclusion that verticals are better than grids. The grid elements having borders and shadows may hurt it. Hard to tell but pics in the grid look smaller than in vertical. The spacing between elements in grid is tighter than between elements in vertical.

I'd still expect vertical to do better even if the grid was equally well executed, but reality beats expectations.


It's interesting, I was considering moving to a grid. I think with the test conducted here, the images in the vertical listing are a bit bigger than those in the grid. Would be interesting to test the grid vs a more compact product listing.

It's all a function of how many products you have, really. Ideally you're sending people to pages that are focused enough that they don't need to trawl through many products.


Color me unsurprised. The grid calls for bi-dimensional navigation while the list is mono-dimensional, less brain cycles burnt to go through it.


exactly, I find navigating a vertical list much less 'stressful' than trying to understand the ordering of a matrix while trying to digest the actual product info.

Also the eye is attracted first to the product images then moves naturally to the right to read the description, with the matrix the eye has to move less naturally downward to read the related product info.


This also has to do with the audience. A single A/B test that measured results on a listing of tea products doesn't tell you much in general. Consider how Amazon displays their products: http://i.mking.me/3t3e2E3z2G1Y1X0T2633 I can imagine Amazon has done a ton of A/B testing...


Audience, and also variance of product. The product images shown in the test are extremely similar, reducing the effectiveness of any arrangement that emphasizes visuals over details.

Amazon grids don't always use images that are quite as effective at indicating discrete products as the book-cover screenshot you've linked (just try browsing through headphones), but for a surprisingly sizable portion of their catalogue image-oriented browsing is both effective and enjoyable.

People like looking at pictures - but showing effectively the same picture over and over again for different products makes me (as a user) feel like I'm wasting time at best, misled at worst ("the items in this list of unique things can't possibly all look the same in person, can they?").


I think it really depends on how many products you need to display. If you have 200 products in a category, I think you'd want the user to be able to scan very quickly through all of them (via a grid), whereas if there are only a few, you'd want them to consider each product individually (vertical listing).


That's my feeling too - perhaps it's also a function of how many products you think might be suitable for a person. eg. if I'm looking at iconfinder.com I like 100 to a page, in a grid, so I can scan the icons really fast, little scrolling. I'm going to eliminate 99% of the icons very quickly.

But with teas, it's perhaps not so obvious, so the vertical listing seems to work better to allow people to choose.


This is great, I've been having a debate with a friend about this for a while now. This settles it (in my favor).


Though you should really AB test that ;)


Testimonial here, bought teas from Chris earlier this year, the products and the service are both very good indeed.

If your experience of Chinese tea is getting a pot of Jasmine in a restaurant, try these, it will be a revelation.


Well, Amazon uses the list-view in their search results. Never noticed this but now I assume that if they do it then it's been tested and deemed better.


Not exactly. Amazon uses list view if you do not select a department for your search. As far as I can tell, they use grid view if you do select a department. There must be something to that. I'd love to learn more about this if anybody has done extensive testing here.

Edit: It looks like there may be exceptions to the grid view within departments (see "Amazon Instant Video", "Books", and "Video Games").


Have you A/B tested the retention rate of those users? I'd be interested in seeing if you kept more users with the prettier layout.


No - it was a limitation of the test design, but it strictly measured just click-through performance for one step of the sales funnel. Other (equally important) considerations measuring quality of the interaction weren't measured. There's pros and cons of setting up the test this way, but I think that could be discussed for years :)


This is part of the reason why people use Tumblr over Pinterest.


Plus an extra four years of bake time doesn't hurt Tumblr either.


Nice insight Chris, hope things are going well in China!


Are there any other studies on this?



