
How can they tell whether I've been buying cigarettes or carrots at the grocery store?

My credit card statement just shows a store name, timestamp and amount. They'd presumably have to be colluding with the grocery chains to get the sort of information mentioned in the article.




From TFA, emphasis added: "The company purchases the data from brokers who cull public records, store loyalty program transactions, and credit card purchases."

Store loyalty programs do track SKU-level purchases. There was a case years ago where a patron tripped and fell at a store, and filed a personal injury suit. The store pulled up that person's loyalty program records, noted that they'd been purchasing a larger than average amount of alcoholic beverages, and insinuated at trial that the patron might have been drunk.

Guess who won that case.


Who won that case?


The article seems to be taking things further than is necessarily possible right now, but while it might be hard to know precisely what you buy at a supermarket, it's a pretty fair bet that if you made a purchase at McDonald's you were buying junk food; if you made a purchase at an off-licence you were probably buying alcohol. And so on.

The data might not be perfectly accurate but it can leak a lot of probable information.

If you ran a machine learning algorithm on the data, without knowing how much anything cost and just wanted to correlate certain purchase amounts with whether people tended to get sick or not, you would probably find correlations with certain amounts that happen to correspond to things like cigarette purchases.

This is especially likely because people often tend to buy only cigarettes, or maybe a couple of other things, rather than only buying them along with a larger group of items that would tend to disguise the purchase.

It should be pretty easy to spot the difference in average prices between someone buying a packet of cigarettes, vs a bar of chocolate, vs a week's worth of shopping. It might not be super accurate for each data point, but given enough data it's likely some fairly consistent patterns will emerge.

In fact, one of the things about ML is that it's good at spotting all sorts of correlations. Those don't prove the existence of a causal link, but often that doesn't matter: the fact a correlation exists is enough. So simple things like buying patterns might be correlated with certain tendencies or risk factors, regardless of what the contents of the purchases actually are (of course this is purely hypothetical).
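To make that concrete, here's a rough sketch of the simplest version of the idea: group transactions by exact amount and see how often each amount co-occurs with an outcome. Everything here is made up for illustration (the function name, the `min_count` cutoff, the record layout), and a real system would presumably use something much more sophisticated than a per-amount rate.

```python
from collections import defaultdict

def outcome_rate_by_amount(records, min_count=2):
    """Return {amount_cents: fraction of those purchases linked to the outcome}.

    records is an iterable of (amount_cents, had_outcome) pairs. Amounts seen
    fewer than min_count times are dropped, since a rate computed from one
    data point says very little.
    """
    totals = defaultdict(lambda: [0, 0])  # amount -> [purchases, outcomes]
    for amount, had_outcome in records:
        totals[amount][0] += 1
        totals[amount][1] += int(had_outcome)
    return {
        amount: outcomes / purchases
        for amount, (purchases, outcomes) in totals.items()
        if purchases >= min_count
    }

# Entirely invented toy data: (amount in cents, later got sick?)
toy_records = [
    (900, True), (900, True), (900, False),
    (150, False), (150, False),
    (4321, True),
]
rates = outcome_rate_by_amount(toy_records)
```

Note that nothing in there knows what a 900-cent purchase actually is. An amount that keeps showing up alongside the outcome gets flagged either way, which is the correlation-without-causation point.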


If it were me, and that was all I had to go on, I'd take the amount and work out the likely combinations of goods that you might buy with that amount, then use other purchases where the store was likely to be selling one thing to alter the probability assigned to various purchase combinations. You could also weight it by buying frequency, the likelihood of particular purchasing habits in their demographic - things like that.
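Roughly what I mean, as a sketch. The catalog, prices and popularity weights below are invented, and multiplying per-item weights assumes items are bought independently, which is a big simplification. The demographic and frequency weighting would go into those weights.

```python
from itertools import combinations_with_replacement

# Hypothetical catalog: item -> (price in cents, made-up popularity weight)
CATALOG = {
    "carrots": (100, 0.3),
    "chocolate": (150, 0.5),
    "cigarettes": (900, 0.2),
}

def candidate_baskets(total_cents, max_items=4):
    """Return {basket: probability} over baskets whose prices sum to total_cents.

    A basket is a sorted tuple of item names (repeats allowed). Each basket is
    scored by the product of its items' weights, then the scores are
    normalised so they sum to 1. Returns {} if no basket matches.
    """
    scores = {}
    names = sorted(CATALOG)
    for n in range(1, max_items + 1):
        for basket in combinations_with_replacement(names, n):
            if sum(CATALOG[item][0] for item in basket) == total_cents:
                weight = 1.0
                for item in basket:
                    weight *= CATALOG[item][1]
                scores[basket] = weight
    norm = sum(scores.values())
    return {basket: w / norm for basket, w in scores.items()} if norm else {}

# 300 cents is ambiguous here: two chocolate bars or three lots of carrots.
guesses = candidate_baskets(300)
best = max(guesses, key=guesses.get)
```

With this toy catalog a 1050-cent charge can only be cigarettes plus chocolate, while 300 cents is split between two baskets with the chocolate one more likely. Brute-force enumeration obviously blows up with a real store's catalog, so treat it as an illustration of the idea only.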

Maybe you wouldn't be able to tell precisely what someone bought, but I imagine you could get a reasonable idea.

Not that I doubt they have other sources of info; the article and rosser say as much.


When the merchant submits the transaction to their credit card processor, depending on the processor, they can have the option of sending a list of items. I'm not sure why anyone does (maybe it'd be required for sales tax purposes, or for more complicated rate calculations), but for example PayPal Pro [1] offers the ability to send a list of items.

[1]: https://developer.paypal.com/webapps/developer/docs/api/#ite...



