Is preventing browser fingerprinting a lost cause? (2012) [pdf]

rblatz · on Dec 18, 2015

When the EFF's Panopticlick says my iPhone running the latest version of iOS is basically unique, I start to get suspicious. There are millions of phones just like mine out there.

saurik · on Dec 18, 2015

I just tried it out, and 1) their definition of "nearly unique" was "1 in 2068276.0 browsers", which I agree is not good but also seems to be a non-intuitive rating scale, and 2) I swear from looking at the results that they are failing to take into account cross-variable correlations (but have not dug into this enough to be sure, so don't quote me on this or anything: do your own research), by which I mean that each entropy source is considered in isolation, even though my screen size and user agent (which tells you I have an iPhone running iOS 8.4) should massively discount the screen size statistic (which is effectively just telling you I have an iPhone 6), as well as my "browser plugin details" (as they are likely the same for every single iPhone), and probably to some lesser extent even my time zone (I live in California, where I would imagine having a semi-recent iPhone would be more common, though I could see an argument that my using an out-of-date firmware is less common ;P). FWIW, if this is not yet including any knowledge of my IP address, then one in millions is brutal, as I am probably one of only a small handful of people in Isla Vista that has this specific iPhone configuration.

jrapdx3 · on Dec 19, 2015

It is disconcerting to see the browser "fingerprint" and realize it's very difficult to significantly reduce its "uniqueness" based on all the factors listed.

I guess that's the point, each element in the list contributes to the uniqueness "profile". I don't think it matters that certain elements just add up to identifying an iPhone 6 or whatever we're using.

I gather the profile is strictly by the numbers, here tallied as bits of entropy. The statistical inference of uniqueness reflects the ability to use the indexed bits as a pattern to identify the particular browser and presumably the individual user (if not by the user's name).

It's one case where using common, generic components offers the greatest anonymity. I'd guess the HN crowd is more likely than average to go for exotic or bleeding edge configurations which no doubt will have the most distinct fingerprints of all.

saurik · on Dec 19, 2015

Yes, I understand that; my point is that if "one in ten people are using a device with this exact screen resolution" and "one in a thousand people have this user agent" then it is not true that "one in ten thousand people have a device like this one" if "half the people with this user agent have this exact screen resolution": it is only then true that "one in two thousand people have a device like this one" (one in a thousand for the user agent, times one in two for the resolution given that user agent). If everyone with that user agent has that screen resolution, the result would just be "one in a thousand people have a device like this one".

I am concerned that they are calculating entropy incorrectly as I feel very confident that there are pretty obvious correlations between many of these variables: given my user agent you can guess my screen resolution with a probability much better than random chance (as you know I have an iPhone, and there are only so many screen resolutions for an iPhone, and some of these iPhone configurations are more common than others in the wild) and it might be the case that you can guess my "plugin configuration" with 100% accuracy.
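A minimal sketch of that arithmetic, using the made-up rates from this comment (1 in 10 for the screen resolution, 1 in 1,000 for the user agent). The function names are hypothetical and this is not Panopticlick's actual code; it only contrasts multiplying the two marginal rates with applying the chain rule, P(ua, res) = P(ua) * P(res | ua).

```python
from fractions import Fraction


def one_in_naive(p_resolution, p_user_agent):
    """'One in N' if the two values are treated as independent."""
    return 1 / (p_resolution * p_user_agent)


def one_in_joint(p_user_agent, p_resolution_given_ua):
    """'One in N' via the chain rule: P(ua, res) = P(ua) * P(res | ua)."""
    return 1 / (p_user_agent * p_resolution_given_ua)


# Illustrative rates taken from the comment above, not measured data.
p_res = Fraction(1, 10)
p_ua = Fraction(1, 1000)

naive = one_in_naive(p_res, p_ua)                  # independence assumed
half = one_in_joint(p_ua, Fraction(1, 2))          # half of this UA has this resolution
everyone = one_in_joint(p_ua, Fraction(1, 1))      # everyone with this UA has it

print(naive, half, everyone)
```

With these inputs the independence assumption gives one in 10,000, the "half" case gives one in 2,000, and the fully redundant case gives one in 1,000, so the naive figure overstates uniqueness whenever the two values are positively correlated.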

jrapdx3 · on Dec 19, 2015

OK, IIUC I'd agree that the "entropy" or uniqueness calculation would be misleading if in fact all user agents with a particular string identifier had the same screen resolution. IOW in that case it's redundant information, and therefore would represent only 10 bits of entropy vs. 13.
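As a rough check on the "10 bits vs. 13" figures, here is a small sketch that converts "one in N" rates to bits using surprisal, log2(N). The rates are the hypothetical ones from the comments above (1 in 1,000 for the user agent, 1 in 10 for the screen resolution), and the helper name is made up for illustration.

```python
import math


def surprisal_bits(one_in_n):
    """Bits of identifying information in an observation seen in 1 of N browsers."""
    return math.log2(one_in_n)


ua_bits = surprisal_bits(1000)        # user agent alone
res_bits = surprisal_bits(10)         # screen resolution alone

# Treating the two as independent simply adds the bits.
naive_total = ua_bits + res_bits

# If the user agent fully determines the resolution, P(res | ua) = 1,
# so the resolution contributes log2(1) = 0 extra bits.
redundant_total = ua_bits + surprisal_bits(1)

print(round(ua_bits, 2), round(res_bits, 2))
print(round(naive_total, 2), round(redundant_total, 2))
```

So the rounded "10 vs. 13" is about 9.97 bits versus 13.29 bits; the gap of roughly 3.3 bits is exactly what the redundant screen resolution would have been credited with.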

However, I imagine keeping track of all correlations of this type would be a big burden for the identifying algorithm. Perhaps it's more practical to sacrifice some accuracy for the sake of expediency.

If the goal is determining the probability that this browser is the same browser that previously connected to my server, the estimate just has to be good enough. Additional refinement may simply not have a big enough payoff to make it worth the trouble.

Just my view from the sideline...

saurik · on Dec 19, 2015

The "goal" of this website is to educate people about browser fingerprinting, not to provide a practical means for you to fingerprint people on your website. I do not think it is OK if their algorithm is flawed (and I stress "if", as I wanted to open a discussion about whether or not their algorithm is flawed, and was somewhat shocked to end up in an argument that somehow it doesn't matter that the algorithm is flawed ;P), as then they would essentially be using pseudo-science to scare people into believing something; and even if the fear is correct, I'm really a strong believer that the knowledge leading to the fear needs to be accurate or you end up with people making really dumb decisions either at a personal or federal level to try to mitigate the misunderstood problem.

jrapdx3 · on Dec 19, 2015

I sure didn't intend to be argumentative, and as I said, I do think you are right, the fingerprinting example is flawed just as you point out.

I was only considering the calculations in the abstract, as representing how such algorithms might be used by a real entity attempting to track its users. I was putting myself in the place of the tracker and imagining how I'd deal with the issues you brought up. I certainly do not do, and never have done, any such thing on the servers I run in real life.

A year or so ago, when I first became aware of the Panopticlick site, I was impressed with how easy it was to more or less uniquely identify my browser. I tried numerous ways to make it less unique. Best I could do at the time was reduce the "entropy" from 22 to 15 or so. Even though I knew these results were hypothetical, it was still educational.

So the Panopticlick site makes its point even if it's not 100% accurate. How much it scares people is hard to say, but probably not a lot. Bottom line is I've been aware that fingerprinting could be happening, but also that there's not a whole lot I can do to reduce my browser's uniqueness. Obviously, it hasn't stopped me from getting on the web anyway.

cpeterso · on Dec 19, 2015

If I'm reading it correctly, a recent Microsoft study of Bing and Hotmail traffic showed 80% of users could be uniquely identified using passive, server-side fingerprinting of just client IP address and User-Agent header.

http://research.microsoft.com/pubs/156901/ndss2012.pdf

"User-agent strings combined with IP addresses have an entropy of 20.29 bits—higher than that of browser plug-ins, screen resolution, timezone, and system fonts combined."

compactmani · on Dec 19, 2015

Or said more succinctly, the matrix of users by features is low rank. Which is likely the case.
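One way to see that redundancy in numbers, reading "low rank" loosely as "the features mostly repeat each other": a toy sketch with an invented eight-browser sample (the user agent labels and resolutions are placeholders, not real data). It compares the sum of per-feature Shannon entropies with the entropy of the combined fingerprint.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Empirical Shannon entropy, in bits, of a list of observed values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Invented sample: every "iPhone UA" row has the same resolution.
rows = (
    [("iPhone UA", "375x667")] * 4
    + [("Desktop UA", "1920x1080")] * 2
    + [("Desktop UA", "1366x768")] * 2
)

ua_entropy = entropy_bits([ua for ua, _ in rows])
res_entropy = entropy_bits([res for _, res in rows])
joint_entropy = entropy_bits(rows)

print(ua_entropy + res_entropy, joint_entropy)
```

In this sample the per-feature entropies sum to 2.5 bits, while the combined fingerprint only carries 1.5 bits, because the resolution already tells you the user agent.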

PadEdKedpo · on Dec 18, 2015

There are many reasons why your seemingly generic hardware has a unique fingerprint. Without access to your results page I couldn't tell you exactly why, but there are a lot of things that can be used to fingerprint your device, including browser configuration, how your GPU handles vectors (same model chips, different results), injected supercookies, or possibly other unique identifiers that your browser is leaking.

If you use Tor Browser on Panopticlick, you'll see it report your browser to be fairly generic.

jandrese · on Dec 18, 2015

They do look at hashes of specific system objects (GL Context, canvas) and I'm curious how immutable those are on a particular device. It could be that they're accidentally including some bits from /dev/random and calling everybody unique. That doesn't necessarily translate into being trackable.

semi-extrinsic · on Dec 18, 2015

If you look at the detailed breakdown, those are not counted towards the fingerprint.

But I think there is a big inconsistency in their data. My Moto G3 is identified as "1 in 3 million browsers have this value" based on just the user agent (which does not contain anything region specific). Since more than 0.5 million Moto G3s were sold just in India in the first 2 months after release [1], their numbers imply there are more than 100 unique browsers for every living human.

[1] http://www.bgr.in/news/motorola-sold-over-5-lakh-units-of-mo...
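A back-of-the-envelope version of that estimate. It assumes, as the comment implicitly does, that every one of those phones sends the same user agent string and that the "1 in 3 million" rate applies to all browsers worldwide; the world population figure is my own rough 2015 number, not something from the thread.

```python
# Figures from the comment above.
one_in = 3_000_000            # "1 in 3 million browsers have this value"
moto_g3_units = 500_000       # lower bound on units sold in India alone

# Assumption: rough 2015 world population, about 7.3 billion.
world_population = 7_300_000_000

# If at least 500,000 browsers share the value and they make up only
# 1 in 3 million of all browsers, the total must be at least:
implied_browsers = one_in * moto_g3_units
browsers_per_human = implied_browsers / world_population

print(implied_browsers, round(browsers_per_human))
```

That works out to at least 1.5 trillion browsers, or about 205 per person. The more likely explanation is that the rate is measured against the site's own visitor sample rather than against all browsers in the world, so it cannot be extrapolated this way.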

livingparadox · on Dec 19, 2015

Running the fingerprinting test on my normal browser and then incognito implies that it's at least consistent within a small time window.

gleb · on Dec 18, 2015

I agree, seems like BS, at least for iPhones. I think that tool may incorrectly assume that all factors it tests for are uncorrelated, e.g. screen size vs. browser user agent.

Zikes · on Dec 18, 2015

Millions of iPhones out there, thousands in your area, hundreds that are the same model/screen size, and so on.

CyberDildonics · on Dec 19, 2015

Probably not as many with the same IP.

mnot · on Dec 18, 2015

For the current thinking about this in the W3C (or at least in the TAG), see: http://www.w3.org/2001/tag/doc/unsanctioned-tracking/

There's also a document being put together by the Privacy Interest Group: https://w3c.github.io/fingerprinting-guidance/

schoen · on Dec 18, 2015

Does anyone know who the author of this presentation is? It doesn't seem to be credited to anybody within the presentation itself.

dsp1234 · on Dec 18, 2015

It appears to be Brad Hill[0] according to https://www.w3.org/2002/09/wbs/1/tpac2012followup/results

[0] - https://twitter.com/hillbrad?lang=en

finid · on Dec 18, 2015

Based on test results using Panopticlick given at http://linuxbsdos.com/2015/12/18/trying-to-prevent-browser-f..., it appears so, though using Tor Browser Hardened tilts the odds in your favor slightly.

taf2 · on Dec 19, 2015

An interesting point that I think is easy to forget: at least with browsers we have the option to study and understand how the browser can be used to track us. For the pro-app crowd, the tracking capabilities of a native app are not only far superior, they also require significantly more work to understand. It is also not a matter of trying to understand 3 or 4 browsers but rather a near infinite number of apps...

CyberDildonics · on Dec 19, 2015

Has anyone made a VM specifically for this? I would think you could randomly clock up and down the VM speed to get around the profiling fingerprinting.

alfiedotwtf · on Dec 19, 2015

With CSS media queries, can we finally get rid of the user agent string?

greggman · on Dec 19, 2015

No, because too many features are still not easily discoverable. My latest example: whether or not the Web Audio API can analyse streaming audio data. Currently iOS WebKit and Android Chrome cannot, but there is no easy way to detect that :(

alfiedotwtf · on Dec 19, 2015

How about falling back to JavaScript to detect?