
So what primary key does WideAngle use to track users across sessions? It mentions anonymised IP. Isn't that what Google does?

You mention that you store anonymised IPs: "Unlike some other vendors, our anonymization process is not reversible." What is the methodology here?




Since most people are still on IPv4, does this even mean anything? You'd need the salt stored in some way to reproduce hashes at all, and creating 4 billion hashes to find an IP won't take any meaningful amount of time. Even with a high-cost algorithm, if the government requires finding the IP (because honestly Google wouldn't care here, the unique identifier is what they need), they'll be able to find it. And if it's a truly irreversible hash, wouldn't it also be impossible to link up two separate requests?
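To make the brute-force point concrete, here is a minimal sketch. It assumes a hypothetical scheme (SHA-256 over salt plus the IP as text) and an attacker who already holds the salt; neither is confirmed to be what WideAngle does. The search is limited to a /24 documentation range to keep it quick, while the full IPv4 space is 2^32 candidates.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str, salt: bytes) -> str:
    """Hypothetical scheme: SHA-256 over salt + IP text (an assumption)."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def brute_force(target: str, salt: bytes, network: str) -> Optional[str]:
    """Hash every address in `network` and return the one matching `target`."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr), salt) == target:
            return str(addr)
    return None


salt = b"known-salt"  # assumption: the salt is still stored somewhere
leaked = hash_ip("203.0.113.77", salt)

# With the salt, enumerating the candidate range recovers the address.
recovered = brute_force(leaked, salt, "203.0.113.0/24")

# Without the right salt, the same search finds nothing.
missed = brute_force(leaked, b"wrong-salt", "203.0.113.0/24")
```

The same loop works for any range; only the number of candidates changes, which is why the secrecy and lifetime of the salt matter more than the hash function.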


Guessing the IP would be impractical. Absolutely. But without a random component, it could be "reversed". For example, say I would like to retroactively check when and where you, ApolloFortyNine, visited my site. All I would need is your IP (residential IPs change, but not that often) and User-Agent. I could replicate the hashing algorithm and identify your traffic.
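A rough sketch of that retroactive lookup, assuming a hypothetical unsalted fingerprint (SHA-256 over IP and User-Agent) and a made-up log format. The addresses, pages and dates are illustrative only.

```python
import hashlib


def fingerprint(ip: str, user_agent: str) -> str:
    """Hypothetical unsalted scheme: SHA-256 over IP and User-Agent."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


# Hypothetical stored log: (day, page, visitor hash). No raw IPs are kept.
log = [
    ("2024-01-03", "/pricing", fingerprint("198.51.100.7", "Mozilla/5.0 (X11)")),
    ("2024-01-03", "/blog", fingerprint("192.0.2.10", "Mozilla/5.0 (Mac)")),
    ("2024-02-11", "/docs", fingerprint("198.51.100.7", "Mozilla/5.0 (X11)")),
]


def visits_of(ip: str, user_agent: str) -> list:
    """Recompute the fingerprint for a known visitor and scan the old log."""
    wanted = fingerprint(ip, user_agent)
    return [(day, page) for day, page, h in log if h == wanted]


# Anyone who learns the IP and User-Agent later can pull the full history.
history = visits_of("198.51.100.7", "Mozilla/5.0 (X11)")
```

Nothing here needs the hash to be inverted: knowing the inputs is enough to find every matching row, across any number of days.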

The random component prevents that. And yes, there is a trust component. You have to trust that we discard these salts after 24h. We operate in Germany, in a legal framework that allows you to sue us if we mislead you. So at a certain point, technology must make way for the legal system.

Because the salt is rotated every few hours, and never kept for more than 24h, we can determine with sufficient probability that two requests are from the same visit/session. So we have an indication of a new/unique visit within a short window. Not days, but hours.
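The rotation described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual WideAngle implementation: the class name `RotatingHasher` is hypothetical, and HMAC-SHA256 keyed with the salt is swapped in as one reasonable way to mix a secret into the hash.

```python
import hashlib
import hmac
import secrets


class RotatingHasher:
    """Hypothetical helper: keeps the current salt in memory only."""

    def __init__(self) -> None:
        self._salt = secrets.token_bytes(32)  # never logged or persisted

    def rotate(self) -> None:
        """Replace the salt; the old value is dropped and cannot be recovered."""
        self._salt = secrets.token_bytes(32)

    def visitor_id(self, ip: str, user_agent: str) -> str:
        """HMAC-SHA256 of IP and User-Agent under the current salt (assumed scheme)."""
        msg = f"{ip}|{user_agent}".encode("utf-8")
        return hmac.new(self._salt, msg, hashlib.sha256).hexdigest()


hasher = RotatingHasher()

# Two requests inside the same salt window map to the same identifier.
first = hasher.visitor_id("198.51.100.7", "Mozilla/5.0 (X11)")
second = hasher.visitor_id("198.51.100.7", "Mozilla/5.0 (X11)")

# After rotation, the same visitor gets an unrelated identifier.
hasher.rotate()
later = hasher.visitor_id("198.51.100.7", "Mozilla/5.0 (X11)")
```

In a real deployment `rotate()` would run on a timer; the point is only that identifiers match within a window and stop matching once the salt is gone.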

If you were to transmit a parameter that additionally attached Personal Data (email, User ID) to that session, then that becomes identifiable and is no longer anonymous. But that is strictly AT YOUR DISCRETION. And we NEVER share it with anyone but you. You will also need to inform your guests that you associate personal data, and ask for consent. But until you do, we cannot identify anyone after the salts cycle.


A randomized, daily rotated, unguessable component is added to every hash. There is a whole bucket of these, such that across a single day, per group of users, there is a small overlap. These are transient and strictly never logged. After 24h there is no way for us to reverse the IP. To reverse the IP we would need this transient value (long gone by that point), the EXACT user agent and the IP itself.

We mentioned "Unlike some other vendors" because we noticed that not everyone is (or was, at the time of our research) adding a random component. Without that component (salt, if you like), you cannot guess the IP, but knowing the user's IP and agent, you could find their historical traffic, and hence attribute the traffic to an individual.

Our solution can't do it.

This practice has been used and documented in software engineering for a while now.



