Serious question: what is the value of web analytics for people?
I run a SaaS business and I dropped Google Analytics a long, long time ago. Primarily because of the tracking, but also because I really couldn't see the value of the data.
In the old days, you could at least use the "Referer" (sic) header to know where people came from and what they searched for. But that is long gone, and the only source of that data is Google/Bing search console.
Page visits are a vanity metric: they tell me nothing about my business. The only things that actually matter for a SaaS are signups and MRR. Measuring your business by page views is like measuring the business performance of a Walmart by counting cars on the freeway nearby. Yes, the numbers are somewhat related, but you can't draw any conclusions.
I made it a point not to include any third-party JavaScript on my site, but even if I were to make an exception for these analytics, I can't really see the point, unless you are running an ad-driven site where pageviews are king.
This seems contrarian just for the sake of being contrarian, given how much literature there is about this, and the fact that it's almost self-evident. Tracking impact of your changes, seeing if your users are getting lost after changing something, understanding where they spend the most time, etc.
Say, for example, all your users start spending 30% more time on your reset password page after you pushed out some changes. How would you know? What could be the causes of that? Could something be broken with the login? Apply this to everything.
Not having analytics is literally not caring about what they do in your product, so you're either never changing the product and 100% confident it'll always work, or you're probably giving them a worse experience than you could.
How you do this tracking is another story, but there are ethical ways to do it.
> Tracking impact of your changes, seeing if your users are getting lost after changing something [...]
The change of adding obnoxious tracking of course accounts for some user loss itself, which it cannot measure.
On some of those "modern" websites, that show me a whitescreen without JS, I check my uBlockOrigin and see the domain of that website and some Google shit? Tab closed. No thank you, I will go elsewhere.
Normal people will not see the tracking. It's when laws force the cookie banners that it starts to become an item in people's minds, because that cookie banner is annoying.
Laws don't force the cookie banners, laws force requiring consent for personalised tracking. Banners as we know them are malicious compliance. There's a difference.
I'm a bit confused. You're claiming what I'm saying is false, but you're just referring to someone advising something as a precaution? Do you have a primary source for a legislation mandating cookie banners? (Also, is there a cookie banner on apple.com?)
There is no "disproportionate punishment" under GDPR in practice, unless you're doing something egregious, and even then (see Facebook). I'm very familiar with the UK regulator, they publish their enforcement actions [1]. I'm not aware of a single case of a cautionary letter, much less "disproportionate punishment", that they sent over a cookie banner on its own. Are you?
Besides, you correctly hinted at the incentive structure. Your lawyer might advise you to slap a cookie banner just because they have zero incentive not to, they don't care about your users' experience. You might care though. Personally I consulted multiple external DPOs and lawyers, as well as primary sources, before forming my opinion.
I take my legal advice from lawyers, not the internet. They are the ones defending us in court if need be.
Their position was simple: my team uses 3rd party analytics tools (no ads or anything) so IPs will be passed and cookies will be stored. We don’t control them, we don’t know what kind, if they can be considered personal info or not (GDPR is intentionally vague - classic bad law). So we need to be extra careful since our regulator is not a sane one like the UK’s. Thus: follow the common practice - cookie banner. End of story.
> We don’t control them, we don’t know what kind, if they can be considered personal info or not
If I were you, I'd consider changing my lawyers. This is explicitly forbidden by GDPR (art 28), you have to know what your contracted data processors are doing, and you have to have processes in place to assure data subjects' rights (e.g. remove their data from your contracted third parties on request). Cookie banners have nothing to do with this, and you're in breach of GDPR, cookie banner or not. If your lawyers didn't stop you from breaching art 28 but recommended slapping a cookie banner "to be extra careful", that's a major red flag.
That “we” was the lawyer’s “we”. But their point stands: tools change and even if we understand and trust their specs and descriptions now, those change too inevitably in the future.
A bad law, an ambiguous law compels you to be defensive and take precautions. Cookie banners are one of many such defenses and everybody seems to be doing it, validating our strategy.
Thanks for your advice, but unless you are willing to defend me in court and put your money where your mouth is, with all due respect, I will consider its value to be exactly how much I paid for it.
GDPR is not in any way ambiguous there, take a look for yourself [1]. Keeping an eye on those changes is a part of your responsibilities as a data controller, it's your vendors' responsibility to inform you of any changes, and it's your responsibility to vet vendors for GDPR compliance. Again, if your lawyers didn't explain this to you (and you haven't read the law yourself), I'd be very cautious of those lawyers.
On the other hand they probably realise there's zero chance for substantial review of your GDPR practices by the regulator (much less seeing them in court), so they can recommend sticking a useless plaster (opt-in has to be specific, and how can it be specific if you collect it for unknown future changes) and keep you in the dark about more substantial requirements.
GDPR is a very good and clearly stated law, you can read through it yourself in about half an hour to an hour, a negligible time investment for such an important piece of legislation. The purported ambiguity is a psyop by people who don't want to comply.
The only way GDPR is unambiguous is if you interpret it in the strictest sense. Which we actually did - you truly have to, in a business-hostile place like the EU.
For example, consider IP addresses as PII. (This is of course not clearly specified by the GDPR). Then analytics processing them needs consent. Thus cookie popup.
Anything else is interpretation unproven in court.
Sales is driven through traffic. No traffic == no sales.
Understanding what drives traffic to your SaaS website is such an important piece of information. For instance, say you write two articles, one describing how to use your product to achieve a certain thing which customers want to do, and another which compares your product to a competitor product. If one of the two creates 50x more traffic than the other, then you'd certainly want to know this, because then you know which articles give you the biggest return on your time writing them.
Just one of so many examples of how web analytics is such an important tool for being a good sales person.
That sounds like a non-example. Why do you need invasive, personalized surveillance for that? Traffic and aggregate data are an entirely different question.
Do you not at least want to know page views ratio to sign ups so you can see conversion rate? Or do you have a different / better way to do things like testing a new design / price.
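To make the page-views-to-signups ratio concrete, here is a minimal sketch of comparing conversion rates before and after a redesign. The function name and all the numbers are made up for illustration, and a real comparison would also need enough traffic for the difference to mean anything.

```python
def conversion_rate(signups, page_views):
    """Fraction of page views that ended in a signup (0.0 if there was no traffic)."""
    if page_views == 0:
        return 0.0
    return signups / page_views


# Hypothetical figures for an old and a new landing page design.
old_design = conversion_rate(signups=30, page_views=2000)
new_design = conversion_rate(signups=45, page_views=2500)

print(f"old: {old_design:.2%}, new: {new_design:.2%}")
```

Note that signups alone (30 vs 45) would overstate the improvement here, since traffic also went up.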
Only page view? That's not really useful and you already got that with backend logs.
With true analytics, understanding a typical session helps with optimising users' workflow, making sure relevant features are easily discovered at the right place.
It really helps when you want to work on user experience. You may need metrics such as LCP, INP and CLS with details per type of page, plus the ability to drill down into the data and get it in real time.
The ROI of such a script depends on what you do with the data. If that's vanity or not even looked at, you are emitting CO2 for nothing.
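On the point that backend logs already give you page views, a rough sketch of what that could look like. It assumes access logs in Common Log Format; the regex, the function name and the sample lines are all illustrative, not taken from any particular server setup.

```python
import re
from collections import Counter

# Pulls method, path and status out of a Common Log Format line (assumed format).
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_page_views(lines):
    """Count successful GET requests per path, ignoring query strings."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that don't look like access log entries
        if match["method"] == "GET" and match["status"] == "200":
            views[match["path"].split("?", 1)[0]] += 1
    return views


sample_log = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '203.0.113.8 - - [10/Oct/2023:13:56:01 +0000] "GET /pricing?ref=hn HTTP/1.1" 200 5120',
    '203.0.113.9 - - [10/Oct/2023:13:56:09 +0000] "POST /signup HTTP/1.1" 302 0',
    '203.0.113.9 - - [10/Oct/2023:13:56:30 +0000] "GET /missing HTTP/1.1" 404 153',
]
views = count_page_views(sample_log)
print(views.most_common())
```

This counts requests, not people, and does nothing about bots or caching layers that never hit the backend.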
>optimising users workflow, making sure relevant features are easily discovered
>work on user experience
These are qualitative improvements which are extremely unlikely to stem from quantitative metrics, especially when the sample size is not significant (which is the case for the vast majority of pages in existence).
In around 8 years of web development, mostly consulting with a focus on ecommerce, I've never seen a net gain from using analytics on a site. If the end goal is to produce data for the sake of data, well sure that will work. Rarely does anyone analyze the data though, and I've never seen anyone dig into the validity of the data and ensure that Google Analytics is in fact accurate and reliable for them.
One of the most disappointing client experiences I had was after building a custom shop for a company that was heavily focused on graphic art. We optimized the hell out of their site, getting performance scores of 97+ when every page was image heavy and included a product grid designed for a masonry grid look similar to Pinterest.
A few days before launch they asked us to add their Google Pixel script. The next day they had included 7 or 8 different third party scripts and blown performance scores into the mid 50s. It's their site and they can do what they want with it, but I sure could have saved a lot of dev time if performance didn't matter at all.
Page visits tell you how many people you get. If you then compare that with how many signups you get, you have a conversion rate. That’s an important figure. Page visits can also tell you if your marketing efforts have worked. Imagine doing all the marketing work and not knowing if it did anything.
OP clearly stated that signups and MRR are the really important figures for SaaS. Not incidentally, those two metrics also tell you if your marketing efforts are working.
> Not incidentally, those two metrics also tell you if your marketing efforts are working.
No, they don't. They don't tell you if visits are up, if more people heard of you or anything. They just tell you that x number of people signed up. We can guess that marketing is going better but maybe it's the time of year where more people need the service. If signups go down, maybe you just had downtime or something on your page was broken.
If you look at any number in isolation you're never going to get the full picture.
And your MRR can go up without any marketing. You can just do sales.
> No, they don't. They don't tell you if visits are up, if more people heard of you or anything. They just tell you that x number of people signed up.
In my experience, it's extremely cheap and easy to get a load of fake page impressions from bots, or to buy your US-only company loads of pageviews from low-cost-of-living countries, or to expand the top of your sales funnel with weak prospects who'll never convert to sales.
Seems to me only a fool would pat themselves on the back for doing so.
What? Who is honestly doing that? Are you just making random stuff up?
Imma make my analytics look really good when they're crap because??? People buy fake followers because others can see it. No one else is looking at your analytics. And you sure as hell don't want to increase your page views since your conversion rate would tank and that's the most important metric.
>They don't tell you if visits are up, if more people heard of you or anything.
Again, if you don't care about visits, you don't care if they're up. OP said it best: signups and MRR.
People hearing about you: do you seriously believe that website analytics are suitable tools that provide reliable metrics for brand/product awareness, recognition, product-market fit, etc.?
>maybe it's the time of year where more people need the service. If signups go down, maybe you just had downtime or something on your page was broken.
Exactly, seasonality and website uptime / page functionality are important. They should be measured. At the same time, website analytics have nothing valuable to add to these measurements.
>And your MRR can go up without any marketing. You can just do sales.
I think you are circling around it: all those analytics metrics are just a means to justify the existence of useless 'marketers' who have no idea how to actually measure brand visibility, recognition, or any qualitative metric. These 'specialists' can't even fathom (heh) that business seasonality is something that shows up in a north-star metric and have no imagination or technical ability to set up a website monitoring service or a crawler, use a CRM for attribution, etc.
Oh, they did. My bad. OP is a god, they can't be wrong. Oh wait, I'm saying OP is narrow-minded and missing out.
> People hearing about you: do you seriously believe that website analytics are suitable tools that provide reliable metrics for brand/product awareness, recognition, product-market fit, etc.?
Why are you bringing up PMF when it comes to analytics? BUT! Yes, they can. If your users are using your shit all the time and you got analytics all over your app, you've probably got a better
But remember when I said earlier that looking at a single stat in isolation is bad? Ssh.
> I think you are circling around it: all those analytics metrics are just a means to justify the existence of useless 'marketers' who have no idea how to actually measure brand visibility, recognition, or any qualitative metric. These 'specialists' can't even fathom (heh) that business seasonality is something that shows up in a north-star metric and have no imagination or technical ability to set up a website monitoring service or a crawler, use a CRM for attribution, etc.
"Useless marketers"...
Anyways, you're complaining about other people being useless while you're saying all data except for your north-star metrics is useless.
Imo, this is arrogance and ignorance mixed together.
The value of web analytics for our organization lies in the same realm as the value of Plausible over any third party analytics: The Funnel.
We're a membership driven organization, and by "membership" I mean we rely on donations to fund our content creation (Though whether you're a member or not you have the same level of access to our content). We care about raw traffic numbers, because it relates directly to our mission of informing people. It tells us how many people we inform day to day.
So yeah, we care about those raw numbers, and those numbers are difficult to get without JavaScript right now because of caching and the terrible log retention of our hosting providers.
Raw traffic numbers only tell part of the story though. We want to know the path people take from first landing on the site to becoming a donating member so that (in theory) we can do more of the things that promote that behavior in more people. That's The Funnel, and that's where orgs like Plausible are best. They're first party tracking, so the data stays with us. Also since they're first party tracking we can track a person's overall relationship with our site, from the first news story they read to the moment they first hit our donation page 3 years into the relationship or whatever.
We should be able to do that with our GA set up, but one of the reasons I want us to shift to Plausible is for its simplicity.
You got quite a few SEO-garbage-level nonsense replies. In my experience, you are right, and most tracking metrics have long since become the (vanity) goal to justify the existence of these 'digital marketers'.
It's funny that they spout nonsense about better UX or how you wouldn't be able to do CRO when you'd just laid out two metrics that are actually important and don't require any website analytics to track.
Cool. Perhaps companies and governments (gov.uk I'm looking at you) could consider using this stuff instead of forwarding all their public interactions to an unaccountable US corp.
Let me also plug my free, open-source and self-hosted event-based analytics solution: Fugu (https://github.com/shafy/fugu). Fugu does not track unique visitors (not even daily like Plausible does) and is made for event-based tracking. Comes with included Docker config to make it easy breezy to self-host.
I'm planning on running a small niche WordPress blog that I would like to monetize with adsense & possibly an affiliate program. I see there's a lot of choices for analytics available listed by users in this post. Does Adsense require Google Analytics or could I use one of these more privacy friendly ones?
AdSense and Google Analytics are two separate products; you can use either, both or neither. If you have AdSense though, you're already allowing Google to track your users, so I don't think ditching Analytics would make the blog any more private.
It's also cookieless, the hosted version is free to use within reason, and it's extremely lightweight if you choose to self-host it. It doesn't even need a separate database, it can run self-contained with SQLite (or Postgres if you prefer). A good fit for small sites where the big industrial-grade solutions are overkill.
This service claims to not track personal data, yet their docs admit to storing hash(siteID + User-Agent + IP) + seen_paths on their backend for session tracking.[1]
Sites can track sessions without tracking personal data.
Right below that the docs also say that this hash is not persisted, only cached in memory and mapped to a UUIDv4. The UUIDv4 is what persists between sessions.
> The IP address and User-Agent are never stored to the database or disk, and there is no conceivable way to trace the random UUID back to this.
>
> It’s only stored in memory, which is needed anyway for basic networking to work.
I can't say whether that is GDPR compliant, but it's definitely not storing the hash.
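For anyone who wants to picture it, here is a minimal sketch of that scheme as I read the quoted docs. It is not the project's actual code: the class name, the use of SHA-256 and the separator are my assumptions.

```python
import hashlib
import uuid


class SessionMap:
    """Maps a request fingerprint to a random UUIDv4, kept only in process memory."""

    def __init__(self):
        # fingerprint hash -> random UUID; in this sketch it is never written to disk
        self._sessions = {}

    def session_id(self, site_id, user_agent, ip):
        fingerprint = hashlib.sha256(
            "\x00".join([site_id, user_agent, ip]).encode("utf-8")
        ).hexdigest()
        if fingerprint not in self._sessions:
            # uuid4() is random, so the ID itself carries nothing about IP or User-Agent
            self._sessions[fingerprint] = uuid.uuid4()
        return self._sessions[fingerprint]


sessions = SessionMap()
first = sessions.session_id("site-1", "ExampleBrowser/1.0", "203.0.113.7")
again = sessions.session_id("site-1", "ExampleBrowser/1.0", "203.0.113.7")
other = sessions.session_id("site-1", "ExampleBrowser/1.0", "203.0.113.8")
print(first == again, first == other)
```

Only the UUID would be written next to the page views; once the in-memory map is gone, nothing links the UUID back to the fingerprint.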
Fetch an empty resource that is privately cacheable, set to max-age=0, and has an ETag containing the current timestamp and a random session id. The browser will consider its cached copy always stale.
When you next fetch that resource, because it is stale, the browser will revalidate it by passing an If-None-Match header containing the ETag. Update the ETag to include the original timestamp and the current timestamp.
So on every page load (or whichever other event you want to measure), you will be told when that session started, the session id and when that visitor was last seen.
To set the maximum session duration, reset the ETag if the last seen timestamp passed to you in If-None-Match is too long ago.
This can even work without JavaScript by using an img element.
The only data tracked with this is the session start time, last seen time, and a random session id. Since the session id isn’t related to any of your business logic, it cannot be used to identify an individual.
To further isolate this data, locate the tracking resource on a different hostname. As long as your cookies aren’t scoped to cover that hostname, the browser won’t send them with the request, so your analytics backend can’t record identifying information from them even if it wanted to. This will also prevent you from tracking which page is being visited, though you can override that with the no-referrer-when-downgrade referrer policy.
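A minimal sketch of the ETag bookkeeping described above, kept framework-free so it only deals with header values. The ETag layout, the 30-minute timeout and all names are my assumptions, not part of any standard or library.

```python
import secrets
import time

MAX_IDLE_SECONDS = 30 * 60  # assumed maximum gap before a new session starts

# Headers the (empty) tracking resource would be served with.
CACHE_HEADERS = {"Cache-Control": "private, max-age=0"}


def next_etag(if_none_match, now=None):
    """Return (new_etag, session) for one request to the tracking resource.

    Assumed ETag layout: "<session_id>.<started>.<last_seen>".
    session["last_seen"] is when this visitor was previously seen.
    """
    now = int(time.time()) if now is None else int(now)
    session = None
    if if_none_match:
        value = if_none_match.strip()
        if value.startswith("W/"):  # tolerate a weak-validator prefix
            value = value[2:]
        try:
            session_id, started, last_seen = value.strip('"').split(".")
            started, last_seen = int(started), int(last_seen)
        except ValueError:
            pass  # malformed header: fall through and start a new session
        else:
            if now - last_seen <= MAX_IDLE_SECONDS:
                session = {"id": session_id, "started": started, "last_seen": last_seen}
    if session is None:
        # First visit, unparseable ETag, or session idle for too long: reset.
        session = {"id": secrets.token_hex(8), "started": now, "last_seen": now}
    new_etag = '"{}.{}.{}"'.format(session["id"], session["started"], now)
    return new_etag, session


etag, session = next_etag(None, now=1000)    # first request, no If-None-Match
etag, session = next_etag(etag, now=1060)    # revalidation one minute later
print(etag, session)
```

The server would answer every request with a 200, the new ETag and CACHE_HEADERS rather than a 304, since the point is to hand the browser an updated validator each time.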
You just reinvented analytics cookies. You’d be surprised, but they don’t store PII either. It’s usually just a randomized session ID and timestamps, like you’re suggesting.
> In comparison, in the context of the European GDPR, the Article 29 Working Party[6] considered hashing to be a technique for pseudonymization that “reduces the linkability of a dataset with the original identity of a data subject” and thus “is a useful security measure,” but is “not a method of anonymisation.”[7] In other words, from the perspective of the Article 29 Working Party, while hashing might be a useful security technique, it is not sufficient to convert personal data into deidentified data.
Indeed you are correct. Plausible it is not. They should put their cookie consent back up, and need to inform their users how they are indeed processing the data collected from personal users.
The problem is that this is only what they say they do; there are too many examples of companies not complying with their own policies and regulations. They should explain the above-mentioned algorithm in their data privacy declaration published online. Also, even a hash can be considered private and personal data unless it has been protected sufficiently. Thus you need to inform your users anyway.
Good approach. IP addresses are personal data. So the data and the hash are subject to GDPR.
You still need consent to collect it - well or some other kind of legal shenanigans. The intent is to track a person, it is not technically necessary. You might have a legitimate interest - but in the end you still have to consider the GDPR to use this tool.
Turns out that many officials believe this is fine. Companies using Plausible, Matomo and similar services have been under scrutiny.
The IP address is required for the site to function: your server can't not collect it. Plausible also only processes it for uniqueness and doesn't save it as is. Interestingly, most web servers and firewalls have to keep track of IP addresses, so they will be saved in access logs and caches, making them more problematic than Plausible. Yet it's most likely fine, because the intent is not to track individual users but to improve the service and keep it running. Plausible's intent is also not to track individual users but to collect visitor counts, which is something used for improving the service too.
I have experience from state-funded projects in Central European countries. AFAIK what they battle/hate most is what goes against the spirit of the law. So mainly popups that are hyperdesigned to be confusing, so people are forced, tricked or annoyed into accepting everything.
Another thing they battle is how long data is saved and where the data is shared.
If you self-host a service like Plausible or Matomo that does everything possible to be compliant, then it's fine.
I think there is a marketing tactic that ad/analytics companies and marketers use against services like Plausible. They say these services also require a cookie popup and won't give you as much detailed info, so why would you use them? Most websites would be fine with the limited data Plausible provides, but it breaks the ad/analytics industry's business plan.
> Plausible also only processes it for uniqueness and doesn't save it as is
That's exactly the point. Processing of personal data to identify a unique person.
Regarding firewalls and logs: It's argued that this is legitimate interest as it is stated in Recital 49 of the GDPR. So they got a free pass, for better or worse.
> I think you might be permanently spreading fear
Don't get me wrong, I like the approach. But it's not a get out of GDPR free card.
> That's exactly the point. Processing of personal data to identify a unique person.
Not sure that's what I said. They cannot identify a unique person. They identify unique legitimate visits per day.
If logs and firewalls count as legitimate interest because you have to give the server your IP address for everything to work, then the same can be said about Plausible, especially since the IP address is immediately thrown away, unlike with firewalls, where the main point is to keep a record of bad actors.
It is very different from Google Analytics, where the whole point is to pinpoint repeat visitors, their behaviour, etc. You simply can't do that with a service like Plausible. What you can do is know how many legitimate visits you had and what was visited. For most websites that is enough. At the same time, I would be surprised if knowing how many people visited your site were not a legitimate requirement for a service to function.
Legitimate interest still requires the data subject to be informed under Art 13. Not sure how that would be accomplished without at least an info banner. (This goes for server logs too.)
If you have a website you have to write this in your Privacy Policy and most do.
Firewalls are a curious case. It is argued that the data is not collected but transmitted to the controller. Almost as if you get a letter with personal data and now have to deal with it.
Yes, it's a stretch. Not happy with it but I don't see any practical solution either...
AFAIK it's not enough to write it in your privacy policy. Art 21 of the GDPR makes this explicit:
> (4) At the latest at the time of the first communication with the data subject, the right referred to in paragraphs 1 and 2 shall be explicitly brought to the attention of the data subject and shall be presented clearly and separately from any other information.
I am not a lawyer, but as far as I can tell, there is no legal way to collect PII (including IP address) or place tracking identifiers on the user's device without at least informing the user explicitly under the GDPR and the ePrivacy Directive.
You are correct. In the early days of the GDPR, people thought about putting a page in front of the original page, without any data collection, presenting only the privacy information.
But soon there was agreement that Art 13 lit. 4 could be interpreted to mean that, as long as you don't have any data collection beyond server logs, this would be deemed sufficient. Or, in other words, as long as you don't invoke Art 21 lit. 1 of the GDPR.
But since everybody wants to track you on the basis of their legitimate interest, the web became full of cookie banners.
That's a bit simplistic. IP addresses are not unequivocally personal data. Let's rewind back a bit, GDPR Art. 4:
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
IP addresses only allow identifying a natural person when combined with other data, such as ISP data or a profile built over dozens of websites. This is not the same kind of personal data as a name + address, Breyer notwithstanding (note the bit about the ISP in the judgment).
GDPR is not about identifying an abstract entity, it's about identifying a natural person. Doing the former for long enough/with enough data allows the latter, but especially with time-limited in-memory hashes that's a non-existent window of opportunity.
In practice this'd probably need to be resolved in court, and I'm sure not a single SME using Plausible or similar will even get a stern letter, much less fined.
> In practice this'd probably need to be resolved in court, and I'm sure not a single SME using Plausible or similar will even get a stern letter, much less fined.
Agreed.
Plausible just makes false claims like:
> All the site measurement is carried out absolutely anonymously. Cookies are not used and no personal data is collected. There are no persistent identifiers.
That's a heavy statement and it is simply not true, as you quoted:
> an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person
hash(daily_salt + website_domain + ip_address + user_agent) will fall under this definition.
But again, you are right, it's better than anything any other service does.
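To make the hash(daily_salt + website_domain + ip_address + user_agent) construction concrete, a minimal sketch. SHA-256, the separator byte and every name here are assumptions on my part; the thread only says "hash". It shows why the value works as an identifier within a day and stops matching once the salt rotates.

```python
import hashlib
import secrets


def visitor_id(daily_salt, website_domain, ip_address, user_agent):
    """Hash the four inputs together; SHA-256 is an assumed choice of hash."""
    digest = hashlib.sha256(daily_salt)
    for part in (website_domain, ip_address, user_agent):
        digest.update(b"\x00" + part.encode("utf-8"))
    return digest.hexdigest()


salt_today = secrets.token_bytes(16)
salt_tomorrow = secrets.token_bytes(16)  # the old salt would be discarded on rotation

first = visitor_id(salt_today, "example.com", "203.0.113.7", "ExampleBrowser/1.0")
again = visitor_id(salt_today, "example.com", "203.0.113.7", "ExampleBrowser/1.0")
next_day = visitor_id(salt_tomorrow, "example.com", "203.0.113.7", "ExampleBrowser/1.0")

print(first == again)     # same visitor, same day: identical ID
print(first == next_day)  # salt rotated: the IDs no longer match
```

Within a single day the value behaves like any other per-visitor identifier, which is the part of the definition being argued about here.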
What are your thoughts on aggregated data? You can still identify unique visitors, but it's aggregated data, so you can't link it back to the individual.
I have doubts that just identifying unique visitors would also identify individuals. Their current approach of creating a random ID which is unique for 24 hours should not violate GDPR? Or would it?
You begin at a point where you have data to aggregate. This data is linked to individuals.
Anonymisation of data is data processing, and some argue that it is subject to a privacy impact assessment, because if done poorly it has great negative consequences for individuals who can be deanonymized.
The duration itself does not change the outcome.
That said, the approach Plausible takes is much better than any cookie used.
I think you can argue if this holds up: you cannot retrieve the ip from the hash (and residential IPs are usually dynamic). The short lifetime together with never storing the hash makes it so you cannot de-anonymise the user.
No one will get fined for not asking consent for this. Our DPO just said ‘don’t be silly’ when I asked him. But we will see if it gets tested (my bet: it won’t).
You don't need to retrieve the ip to make it PII, the hash itself is PII.
You might not think of it as containing actual "personal information", but its sole purpose is to attempt to uniquely identify a person. That makes it PII.
> (and residential IPs are usually dynamic)
This actually makes the short lifetime more suitable as a PII, because it reduces the likelihood of the same IP being used by a different person being tracked as the same person.
> The short lifetime together with never storing the hash makes it so you cannot de-anonymise the user.
That also doesn't matter, because the lifetime of the token is long enough to track the user through an entire typical session, maybe several.
The stupid thing in all these shenanigans is that collecting the data isn't itself the problem, it's not getting the user's consent. Just tell the user what you're doing, and it's not a problem - if it's a "technically required" cookie they can make an informed choice to use your site or not, if it's an "optionally required" cookie, they can choose whether to accept or not. Most users won't care and will click on the biggest, most obvious buttons. The ones that do care are likely atypical and would skew your metrics anyway.
You can as long as you have IPv4 visitors, because the search space is small enough to brute-force. There are only four billion IP addresses. The user-agent complicates things a little but there aren’t many of those, so you could retrieve the IP addresses of most visitors from the hash if you wanted to.
> residential IPs are usually dynamic
Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way.
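A small sketch of the brute-force point, assuming whoever runs the search knows the salt and domain (as the operator would) and that the hash is plain SHA-256 over the joined inputs. Both are assumptions for illustration. The demo only walks a /24 documentation range; the full IPv4 space is the same loop over about 4.3 billion addresses.

```python
import hashlib
import ipaddress


def fingerprint(salt, domain, ip, user_agent):
    """Assumed construction: SHA-256 over the joined inputs."""
    return hashlib.sha256("|".join([salt, domain, ip, user_agent]).encode()).hexdigest()


def recover_ip(target, salt, domain, user_agents, network):
    """Try every address in `network` with every candidate user agent."""
    for address in ipaddress.ip_network(network):
        for user_agent in user_agents:
            if fingerprint(salt, domain, str(address), user_agent) == target:
                return str(address), user_agent
    return None


known_agents = ["ExampleBrowser/1.0", "ExampleBrowser/2.0"]
stored_hash = fingerprint("todays-salt", "example.com", "203.0.113.77", "ExampleBrowser/2.0")

print(recover_ip(stored_hash, "todays-salt", "example.com", known_agents, "203.0.113.0/24"))
```

Without the salt the search doesn't work, which is why discarding the salt on rotation matters more than the hash function itself.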
I get what you're saying - in that if you know the IP address, then you can often easily discover who the individual is. I'd counter that actually, for most people this isn't the case - for many companies, only the ISP, Google, Apple, Facebook etc know who the real user of an IP is... (incidentally, the people most keen to force analytics on you, but that's another issue).
However, that is all kind of moot. The hash itself is PII, because it can be used to track an individual. PII isn't about the difficulty of determining the specific identity of a user, it's about the difficulty in identifying a specific user. The distinction is subtle, but important.
Take an example - people are using a wireless hotspot somewhere, maybe you own a coffee shop, and over the course of a few weeks, you're alerted to the fact that someone has been accessing some illegal content that could get your business in trouble. You've been careful to comply with the GDPR, and your logs only include the time and hostname of the server accessed. On its own, there is no PII there. But combine that with, say, credit card transactions or video footage, and find out who was in the coffee shop every time this happened. Then boom! Suddenly, your time has become PII. Maybe not uniquely correlated to a single person, but to a group of people. With every instance of a correlation to that person and a group of random people, it doesn't take many to narrow it down to a specific individual.
This is why, to actually comply with GDPR, you need to only store logs for as short a time as is technically required (legally beyond a month is hard to justify, ideally a few days at most) and then you should aggregate into groups where individuals cannot be isolated. If your aggregations result in groups of people that are too small, you need to change the aggregation groups, or report an empty group. It's totally fine to store data like "on this day, n people went from this page to this page, average linger time blah seconds" if n is 10 or more. If n is 1 or close to it, that data is still identifying.
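The aggregate-then-suppress idea above can be sketched like this. The threshold of 10 is just the number from this comment, not a legal constant, and the event fields (`day`, `from`, `to`) are made up for illustration.

```python
from collections import Counter

MIN_GROUP_SIZE = 10  # threshold from the comment above; an assumption, not a legal rule

def aggregate_transitions(events, min_group_size=MIN_GROUP_SIZE):
    """Count (day, from_page, to_page) transitions and drop groups that are too small."""
    counts = Counter((e["day"], e["from"], e["to"]) for e in events)
    return {key: n for key, n in counts.items() if n >= min_group_size}

# Twelve visitors took the same path; one visitor took a unique path.
events = (
    [{"day": "2024-05-01", "from": "/", "to": "/pricing"}] * 12
    + [{"day": "2024-05-01", "from": "/pricing", "to": "/reset-password"}] * 1
)
report = aggregate_transitions(events)
print(report)
```

The single-visitor transition never reaches the stored report, so only the group of 12 survives once the raw events are deleted.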
That part was responding to where you said "Usually isn’t good enough. I’ve had residential IPs that are on public record belonging to me personally. IP addresses can be personally identifying information, so they need to be treated that way."
My point is that whether you can determine the IP address from the hash or not doesn't matter. The hash itself is PII.
You would still have to produce the paperwork for this.
Most websites don't get fined using GA. Plausible is a huge step in the right direction, but their claims are very strong and not backed up by the GDPR if you take a closer look.
Regarding fines: most offices will give you a warning instead of a fine; you adjust your cookie banner and you are good to go.
Currently using Umami, but I've considered switching to Plausible due to Umami's less-than-stellar development performance (e.g. breaking the site details page for a few days recently).
Also happily using hosted GoatCounter. Last year I noticed some occasional operational hiccups, like brief service downtime, but this year it's been completely stable as far as I can tell.
How are you self hosting it? I find its requirements extremely heavy for a simple analytics solution. It requires a PostgreSQL and Clickhouse database. I don't find self hosting Clickhouse particularly easy. Wish they had an option to just use SQLite as an alternative.
I completely agree that the self-hosting story for Plausible is overkill for most websites.
So much so that I made my own that focuses on self-hostability using SQLite and DuckDB (no external dependencies, can run on a 256MB VM): https://github.com/medama-io/medama
I've not tried their hosted version no. I doubt that there would be a seamless way to switch between them since all the data lives in Clickhouse, but I could be wrong.
Can I use Plausible in a desktop application? I would like to know exactly which versions of an open source desktop app I maintain are actively used, so I know where to pay attention and invest effort, as I would like my users to be constantly migrating forward - we do have like 20 years of backwards compatibility so we push things forward very slowly.
I was able to do it pretty easily with a mobile app, should be just as easy on desktop. You could even register custom “pages” for various parts of the desktop app.
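A rough sketch of what that could look like from a desktop app, assuming Plausible's events endpoint (`POST /api/event` with `name`, `url` and `domain` in a JSON body) behaves as I understand it; check the current docs before relying on it. The `app://localhost/...` URL scheme, the domain, and the user agent string are illustrative choices, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint; a self-hosted instance would use its own host.
PLAUSIBLE_ENDPOINT = "https://plausible.io/api/event"

def build_event(domain, app_version, screen, user_agent):
    """Build (but do not send) a pageview event that encodes the app version in the URL."""
    payload = {
        "name": "pageview",
        "domain": domain,
        "url": f"app://localhost/{app_version}/{screen}",
    }
    return urllib.request.Request(
        PLAUSIBLE_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )

req = build_event("myapp.example", "2.4.1", "main-window", "MyApp/2.4.1 (Linux)")
print(req.get_method(), req.full_url, req.data.decode())
# urllib.request.urlopen(req, timeout=5)  # uncomment to actually send the event
```

Putting the version in the "page" path means the per-page breakdown doubles as a per-version breakdown. Whether a non-browser user agent is accepted or filtered is something to verify against your instance.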
Plausible is very nice, but it lacks much of the information from Matomo (like “after viewing /foo, visitors tend to view…”). Matomo is very nice, but it lacks the free Google Search Console integration (“people are currently finding you from these Google searches: …”) from Plausible.[0]
I’m vain and curious enough to want to see the Google data, but not so much as to pay $160/yr for the Matomo plugin for my personal blog.
[0] This isn’t the same as Google Analytics. You can get this information without installing a tracker on your site.
I think Plausible’s self-hosting is not simple, requiring unnecessarily heavy databases like ClickHouse, which can be overkill for the average website owner. Comparatively, this project can effectively run on a 256MB VM for most small websites with no external dependencies.
I was always confused by GDPR. What are the minimum requirements to avoid the banner? Anonymising the IPs and not keeping anything else, or you can keep anything as long as you don't share them with third-parties?
Essential cookies (e.g. a cookie that saves the cart's content in an e-commerce app) are fine. PII (personally identifiable information) is never fine (this includes IP addresses, email addresses, more or less exact geolocations) - so anonymized IP is ok.
> Of course mapping each IP to random id and not storing the mapping should be completely ok.
If it was a different random id for every request, then sure, OK.
If it's the same random id used on multiple requests, then it becomes PII, as its purpose is to uniquely identify an individual. It should not be logged or stored.
Services like Plausible add time into the mix. So you know that someone visited these 5 pages in 20 min. But you won't know about returning visitors. I think that's a pretty significant difference.
But if what you are saying is true then it's impossible to know how many people visited your website unless you have a banner. What about logs then? Sounds like everybody is happily using those because they are "legitimate interest", since servers couldn't work without them, but it's way more identifying data than what Plausible saves.
> Services like Plausible add time into the mix. So you know that someone visited these 5 pages in 20 min. But you won't know about returning visitors. I think that's a pretty significant difference.
That doesn't make it any less PII. Also, the 20 minutes thing is just a number you plucked out of thin air - it's actually valid for 24 hours.
> But if what you are saying is true then it's impossible to know how many people visited your website unless you have a banner.
No, that's not what I'm saying at all. First of all, that claim is clearly false. If your web server logged only the URL and nothing else, no time, nothing, you would have accurate usage counts for every single part of your site.
For the record, I actually think Plausible attempts to do a good job - it's clear they are trying their best to be privacy focused, not log anything, only provide data in aggregate - that's all good stuff. However, I'm not sure their stance that they don't require consent is valid, because the hash itself is PII. The reason I think the hash is PII is because of how it is being used - to identify an individual user.
Oh, and servers can work perfectly fine without logs. People like logs, but they're by no means necessary.
Logs by themselves aren't necessarily a problem if you have a clear data policy in place, and there is a legitimate use for them. The point is disclosure of the data use, and timely deletion of any data that isn't strictly necessary for the business use. So, you can keep PII around relating to billing for as long as they have a subscription, or as long as you are legally required to keep customer records for. After that, they need to be deleted. Anything like access logs that you can justify a business need for can be kept, perhaps a few days or ideally hours until you extract aggregate data, but again you need to state that in your privacy policy, and they should be deleted as soon as reasonably possible.
And as I said before, all you need to do to comply with the law is to make sure you have the user's consent before tracking them. It isn't really that onerous. The question is, if you don't want the user to know how you're tracking them, why not? What are you hiding?
> And as I said before, all you need to do to comply with the law is to make sure you have the user's consent before tracking them. It isn't really that onerous. The question is, if you don't want the user to know how you're tracking them, why not? What are you hiding?
This is a super weird spin on what I said. I work on content-heavy media sites that are not ad-driven. They're funded either by grants, like research or journalism, or they present commercial work. Architects, design studios, publishers, writers… All of these clients want to have ballpark numbers of how many people visited the site. Nobody processes or sells this data. It's 10s to 100s of visitors a day. We try to use the most private way we know of.
It's crazy that because of the sick practices of this industry I am suddenly the suspicious one. Some kind of nothing-to-hide fallacy, huh? No, we are not hiding anything. We just don't want an annoying consent banner because of a visitor counter. The ones hiding something are the ones with tricky psycho designed multi step consent banners. We just don't want to be in the same bunch just because of a few basic stats.
> All of these clients want to have ballpark numbers of how many people visited the site.
You don't need cookies for that.
Again, as I've said before, you can for instance log data for technical reasons, e.g. wanting to post-mortem a failure or attack, as long as the data is deleted promptly as a matter of course. You shouldn't use the PII in that log for analysis without the user's consent (so for a log file, that means you probably should never use the IP address except for endpoints that are only accessible to logged in users), but the URL they accessed isn't PII (unless you start putting identifying tokens in it).
If you just want ballpark numbers, just extract the URL field only, and count how many times each appears. Obviously, this will give you metrics on how popular each page / asset is, not how many unique users you have. To do that, you have to identify unique users, and to do that you need to have their consent.
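As a minimal sketch of that, assuming access logs in the usual Common Log Format (the sample lines below are made up), you can pull out only the request path and count it. The IP and timestamp are never read into the result.

```python
import re
from collections import Counter

# Matches the request part of a Common Log Format line, e.g. "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def count_hits(log_lines):
    """Count hits per path, ignoring every other field in the log line."""
    hits = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if m:
            path = m.group(1).split("?", 1)[0]  # drop query strings, which may carry tokens
            hits[path] += 1
    return hits

sample = [
    '198.51.100.7 - - [01/May/2024:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120',
    '203.0.113.9 - - [01/May/2024:10:00:05 +0000] "GET /blog/post-1?ref=x HTTP/1.1" 200 5120',
    '203.0.113.9 - - [01/May/2024:10:00:09 +0000] "GET /about HTTP/1.1" 200 2048',
]
hits = count_hits(sample)
print(hits.most_common())
```

This gives page popularity, not unique visitors: the two hits on `/blog/post-1` could be one person or two, and the counter deliberately cannot tell.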
> We just don't want an annoying consent banner because of a visitor counter.
But the law requires you to get their consent.
> The ones hiding something are the ones with tricky psycho designed multi step consent banners.
To be fair, I agree with you. They are deliberately designed to be awful in the hopes that the user will just take the path of least resistance and accept their terms. However, it is still a choice. In the cases when I see such a consent form, I either just close the window or I re-open it in incognito mode so I won't get a persistent cookie if it's something I really want to read.
The point is that the regulatory line needs to be drawn somewhere. The law at the moment says the line is: If PII is required for your site to function, then you must ensure the user knows you're doing it. If PII isn't strictly required for your site to function, but it provides a benefit to your company (usually re-framed as how it ultimately helps the customer), then you must request consent. Both of these cases are covered by the usual kind of popup, but that's why you'll see some that you can disable (like sharing data with partners) and some you can't (like cookies for logging in). But you still need consent.
> We just don't want to be in the same bunch just because of a few basic stats.
Then just collect basic stats like how many hits each page got. That's fine, you don't need cookies or PII for that. Number of active users isn't a basic stat though, as it clearly requires you to distinguish between different users and any process you use to do that creates PII.
Perhaps you should consider just explaining why you want the cookie in your popup. If you word it in such a way that explains that you're only using daily active users as a metric to justify continued funding, you'll probably find most people are totally happy to click accept. A message plus simple ACCEPT / DECLINE is fine, as long as the message makes clear what you're doing. Note that you can set an "essential cookie" in response to them clicking DECLINE as long as you've explained that the website uses essential and non-essential cookies, but obviously it shouldn't contain anything other than a simple accept/decline result.
Nobody is setting any cookie. You know these services are cookieless; instead they use an IP + salt + time hash derived from what the client sends.
The problem with server-side metrics (and why Google Analytics became so popular) is, first, that they pick up lots of noise visits from bots. But more importantly, it's often not possible to implement them because the hosting is handled by an unable or unwilling third party.
I will not jump the gun just yet. We will stay in this gray zone until I see the authorities have problems with the approach of Matomo/Plausible. So far I have seen the opposite. If they did, we would remove the analytics entirely, because there is nothing worse than a cookie banner, which instantly annoys users and puts you on a level with any other mainstream site that does fingerprinted tracking.
It's not a clear-cut case with IPs. As you say, if your server logs IPs, that seems to be classified as "legitimate interest" (for security reasons). But if you use that data to track unique users for product dev, marketing etc. reasons, that's not "legitimate interest" anymore. At least, this is my understanding.
For example, it would make stopping a DDoS attack much harder if you would need to anonymize IPs.
Yeah, great point. It's how you process and store the data that's important.
One of the key rights individuals have is to request that ALL PII about them is deleted from all of your records, and you have to comply with this request within a certain timeframe, and a maximum of 30 days. This includes backups, logs, everything.
Obviously, it's impractical to try to edit old backups to remove PII, so you have to be careful how you deal with logs in the first place - you might want them to be backed up on another machine with a maximum lifetime of a few days, you might want to not back them up at all and only backup your aggregated data, etc.
But keeping logs for a few days can be justified for, as you say, DDoS mitigation, post-failure root-cause analysis, etc. The default should be to delete that data as soon as it's no longer useful for that purpose, which for most companies will be a couple of days, maybe another couple for the weekend. You can keep it longer, for instance for active analysis, but the default should be to delete it as soon as possible.
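A "delete by default" retention job can be as small as the sketch below. The four-day limit mirrors the "couple of days plus the weekend" figure above and is only an example; the directory layout and `*.log` naming are assumptions. In practice your log rotation tool may already offer an equivalent setting.

```python
import os
import tempfile
import time
from pathlib import Path

def purge_old_logs(log_dir, max_age_days, now=None):
    """Delete *.log files whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in Path(log_dir).glob("*.log"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Demo in a throwaway directory: one week-old log, one fresh log.
with tempfile.TemporaryDirectory() as d:
    old, new = Path(d, "access-old.log"), Path(d, "access-new.log")
    old.write_text("old\n")
    new.write_text("new\n")
    week_ago = time.time() - 7 * 86400
    os.utime(old, (week_ago, week_ago))  # backdate the old file's mtime
    removed = purge_old_logs(d, max_age_days=4)
    remaining = sorted(p.name for p in Path(d).iterdir())
print(removed, remaining)
```

Run from a daily scheduled job, this makes deletion the default and keeping a log longer the explicit exception.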
It probably depends on how far you go with the fingerprinting. If it's only user agent, I would guess it's ok. If you start adding more and more info to the fingerprint, it will become PII at some point.
Not sure how much of an IPv4 address must be anonymized. If you want to be sure, just anonymize the whole thing. It's important to make it random, and not to use a hashing function that always gives the same output for the same input IP (in that case, it counts as pseudonymized and can be PII).
Also, IANAL, just a dude who is passionate about online privacy.
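To illustrate the difference between a deterministic hash and a random replacement, here is a small sketch. The function names are made up, and `truncate_ipv4` is included only to show what partial anonymization looks like; whether any given number of kept bits is enough is exactly the open question above.

```python
import hashlib
import ipaddress
import secrets

def deterministic_id(ip: str) -> str:
    """Same IP always gives the same output: pseudonymous and still linkable."""
    return hashlib.sha256(ip.encode()).hexdigest()[:16]

def random_id(_ip: str) -> str:
    """Fresh random token per call; the IP is never used or stored."""
    return secrets.token_hex(8)

def truncate_ipv4(ip: str, keep_bits: int = 24) -> str:
    """Zero the host bits, keeping only the first keep_bits of the address."""
    net = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(net.network_address)

ip = "198.51.100.7"
print(deterministic_id(ip) == deterministic_id(ip))  # linkable across requests
print(random_id(ip) == random_id(ip))                # not linkable
print(truncate_ipv4(ip), truncate_ipv4(ip, keep_bits=16))
```

The deterministic version lets anyone holding the data group requests by visitor, which is what makes it pseudonymized rather than anonymized.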
Well, AFAIK a simple session cookie doesn't need a banner. Also, I think if you do everything local to your system you don't need one either. The point where you need one is if you use any system that relies on third parties to track the user.
So if you store and analyze everything "locally" on your server, you don't need cookies and therefore no banner, no matter how much you "track", since it's all requests made to your own server whose telemetry you merely use.
You can't share that data without consent, but that's a separate data protection matter from the cookie banners.
Oh, and the GDPR is mostly confusing because it was interpreted with malicious compliance by the whole industry - at least in effect if not intention.
It is simply easy for upper management to take a "better safe than sorry" approach, and by now the banners have reached a degree of dark pattern development that is horrifying in its relentlessness.
So much this. The whole ad industry is afraid that most websites would switch to simpler more private compliant alternatives which would break their business (of reselling snooping data).
So they are on a marketing campaign to paint these alternatives as non-compliant and requiring banners too. Basically every fart now needs a consent banner, and when you already have a banner, why not have this most invasive visitor screen recorder analytics that we send to our 743 partners in real time.
What’s that? We need users consent for ad cookies? Ok let’s also make them consent to the session cookie too as a way to confuse them or get them to lazily just click the accept all cookies button rather than find the exact cookies the site needs to run without ads.
legitimate interest - anything to make your application function.
You have an online mail service: you have to save the email addresses of emails you receive so you can respond to those.
You allow people to forward their received emails to other email addresses: you need to save those other email addresses.
This would be in DBs for that stuff. If you have third-party marketing analytics, just because you have a legitimate interest in saving an email address to make the application work doesn't mean you can pass that email address into third-party marketing analytics. That is not legitimate interest.
If you have a newsletter service and someone signs up to receive the newsletter, then you need to save their email address to send that newsletter. You don't need to ask; they have implicitly given you permission by asking you to send them the newsletter.
If you have a process for removing users from the service for violating the terms, then you probably need to be able to keep information about them; otherwise they can just ask you to get rid of their info and then sign up again. This would come under the parts of the Digital Services Act about obligations to users, the appeals process for removal, etc. It's a different thing, but if you have removed someone you need to be able to identify them when they try to come back.
> legitimate interest - anything to make your application function.
Plus the data that you're required to retain by other laws. E.g. banks/financial institutions might be required to retain a lot of data for several years for audit and compliance purposes.
I figured the parent poster already covered that with
> If it's strictly necessary, e.g. logging in or legal obligation, you're fine and don't need to ask