I once did something similar with a recipe from a cookbook where the recipe started at the bottom of one page and continued onto the next page. It correctly identified the first few ingredients present in the photo of the first page but then proceeded to hallucinate another half-dozen or so ingredients in order to generate a complete recipe.
> If I try to do an UPDATE ... SET x = x + 1, that will always increment correctly in SQL. But if read x from an ORM object and write back x + 1, that looks like I'm just writing a constant, right?
This is not specific to ORMs... you can run into the same problem without one.
> Extra magic: if you've read a class from the db, pass it around, and then modify a field in that class, will that perform a db update: now? later? never?
In every ORM I've used you have specific control over when this happens.
I've never understood the ORM hate because a good ORM will get out of the way and let you write raw SQL when necessary while still offering all of the benefits you get out of an ORM when working with query results:
1. Mapping result rows back to objects, especially from joins where you will get back multiple rows per "object" that need to be collated.
2. Automatic handling of many-to-many relationships so you don't have to track which ids to add/remove from the join table yourself.
3. Identity mapping so if you query for the same object in different parts of your UI you always get the same underlying instance back.
4. Unit of work tracking so if you modify two properties of one object and one property of another the correct SQL is issued to only update those three particular columns.
5. Object change events so if you fetch a list of objects to display in the UI and some other part of your UI (or a background thread) add/updates/deletes an object, your list is automatically updated.
6. And finally in cases where your SQL is dynamic having a query builder is way cleaner than concatenating strings together.
For those who are against ORMs I am curious how you deal with these problems instead.
You are describing data mapper ORMs, a.k.a the good ORM. I think all the other ORM-loathing guys here had bad experiences with active record ORMs, a.k.a the bad ORM.
Also, infrastructure guys and DBA types tend not to like ORMs. But they are not the ones trying to manage the complexity in the business process. They just see our queries are not optimal, and it is everything to them.
Right! They should really be considered two different things. I've worked a lot with Django (the bad type) which people tend to love, but I've seen the horrors that it can produce. What they seem to love about it is being able to write ridiculously complicated SQL using ridiculously complicated Python. I don't get it. These types of ORMs don't even fully map to objects. The "objects" it gives you are nothing more than database rows, so it's all at the same abstraction level as SQL, but it just looks like Python. It's crazy.
SQLAlchemy is the real deal, but it's more difficult and people prefer easy.
Oh was I enthusiastic when I first got my hands on an active record ORM: "I can use all my usual objects and it'll manage the SQL for me? Wow!". That enthusiasm reached rock bottom rather quickly as soon as I wanted to fine tune things. Turns out I'm not a fan of mutating hierarchical objects and then calling a magical .commit()-method on it, or worse: letting the ORM do it implicitly. That abstraction is just not for me and I'd rather get my hands "dirty" writing SQL, I guess.
Yes I suspect a lot of the ORM hate comes 1) from people using poorly designed ones or 2) from people working on projects that don't really require the features I mentioned. Like if you are generating reports that just issue a bunch of queries and then dump the results to the screen you probably don't care that much about the lifetime of what you've retrieved. But just because an ORM might not be the right tool for your project doesn't make it a bad tool overall, that would be like saying hammers are bad tools because they can't be used to screw in screws.
Before becoming too overconfident in SQLite note that Rebello et al. (https://ramalagappan.github.io/pdfs/papers/cuttlefs.pdf) tested SQLite (along with Redis, LMDB, LevelDB, and PostgreSQL) using a proxy file system to simulate fsync errors and found that none of them handled all failure conditions safely.
In practice I believe I've seen SQLite databases corrupted due to what I suspect are two main causes:
1. The device powering off during the middle of a write, and
2. The device running out of space during the middle of a write.
I'm pretty sure that's not where I originally saw his comments. I remember his criticisms being a little more pointed. Although I guess "This is a bunch of academic speculation, with a total absence of real world modeling to validate the failure scenarios they presented" is pretty pointed.
I believe it is impossible to prevent dataloss if the device powers off during a write. The point about corruption still stands and appears to be used correctly from what I skimmed in the paper. Nice reference.
> I believe it is impossible to prevent dataloss if the device powers off during a write.
Most devices write sectors atomically, and so you can build a system on top of that that does not lose committed data. (Of course if the device powers off during a write then you can lose the uncommitted data you were trying to write, but the point is you don't ever have corruption, you get either the data that was there before the write attempt or the data that is there after).
Only way I know of is if you have e.g. a RAID controller with a battery-backed write cache. Even that may not be 100% reliable but it's the closest I know of. Of course that's not a software solution at all.
That's uh, not running out of power in the middle of the write. That's having extra special backup power to finish the write. If your battery dies mid cache-write-out, you're still screwed.
> In the case of Google OAuth, it's possible to forego this in order to allow any Google user from any Google workspace to login to your application.
There are plenty of use cases where this is appropriate. If you wanted to allow users to login to Hacker News with their Google accounts you would use this option because you do not care what workspace they belong to.
> Some applications (e.g. Tailscale) take advantage of the public Google OAuth API to provide private internal corporate accounts.
This is a misuse of the public Google OAuth API. Your first link clearly states: "A public application allows access to users outside of your organization (@your-organization.com). Access can be from consumer accounts, like @gmail.com, or other organizations, like @partner-organization.com." In other words it is intended for scenarios where you want to allow access to users outside your workspace.
> Instead, Google instructs you to look at the "hd" parameter, specific to Google, to determine the Google Workspace a given user belongs to for security purposes.
According to your second link the "hd" parameter only tells you what ___domain the user belongs to, it does not tell you what workspace the user belongs to.
> You can avoid this issue by using a custom Google OIDC IdP configured for internal access only in your applications, rather than using a pre-configured public Google OIDC IdP
So Google offers an OAuth integration option that actually restricts access to your specific workspace. Choosing to ignore this option and instead integrating with the option designed for public access from all Google accounts, and then calling it a vulnerability when someone can login with an account from another workspace, is frankly, absurd.
> This is a misuse of the public Google OAuth API. Your first link clearly states: "A public application allows access to users outside of your organization (@your-organization.com). Access can be from consumer accounts, like @gmail.com, or other organizations, like @partner-organization.com." In other words it is intended for scenarios where you want to allow access to users outside your workspace.
> According to your second link the "hd" parameter only tells you what ___domain the user belongs to, it does not tell you what workspace the user belongs to.
From the docs:
> If you need to validate that the ID token represents a Google Workspace or Cloud organization account, you can check the `hd` claim, which indicates the hosted ___domain of the user. This must be used when restricting access to a resource to only members of certain domains. The absence of this claim indicates that the account does not belong to a Google hosted ___domain.
Note also that even Google conflates "domains" with "Google hosted domains" with "Google Workspace or Cloud organization accounts."
> Choosing to ignore this option and instead integrating with the option designed for public access from all Google accounts, and then calling it a vulnerability when someone can login with an account from another workspace, is frankly, absurd.
At this point, if you still believe calling this a vulnerability is absurd, I don't think there's anything more I can say to convince you. Google paid out the bounty because they didn't believe it was abusrd.
I personally think that the best counterargument to calling it a vulnerability is: "well, sure, Google is reusing the Google Workspace identifier for different workspaces, which could be used to impersonate a user; but if you own the ___domain, you can also receive email as that user and reset the account that way."
I suppose this comes down to the interpretation of the documentation. Note that it only says "a workspace", not "a specific workspace" or "which workspace".
1) The "hd" claim tells you that the user is a member of a workspace. If the user is a member of a workspace it tells you the ___domain name of that workspace.
2) The "hd" claim tells you which specific workspace the user is a member of.
You are taking interpretation (2) whereas I am taking interpretation (1). I believe interpretation (1) is correct given the next sentence says you can use the "hd" claim to restrict access to only members of certain domains. If interpretation (2) was intended, they could have instead said you can use the "hd" claim to restrict access to only members of a certain workspace.
If Google is at fault for anything here it is for writing confusing documentation, however given the totality of the documentation where:
a) Google describes public applications as intended for logins from all Google accounts regardless of workspace, and
b) Google offers the internal application option for situations where you want to restrict logins to users of a specific workplace,
I'm going to stand by my conclusion that the real fault lies with service providers choosing the wrong integration option in the first place and then making invalid assumptions about what information the "hd" claim supplies in the public option.
I agree, I don't think this is a problem with Google's Oauth implementation, it's a problem with the service providers who authenticate users via the mere existence of an email address ending in @company.com without checking if the email address actually belongs to an active employee.
If, when you logged into Slack via Google Oauth with the email address [email protected], Slack checked with company.com whether [email protected] was a valid user that should be allowed to login, then this problem would be avoided entirely because the defunct company would no longer report any valid users.
This would also avoid further problems with attackers being able to login with unattended email addresses like [email protected], as was discussed here: https://news.ycombinator.com/item?id=41818459 (There's a lot of discussion below about whether the "sub" claim is stable or not but it's a red herring because of this IMO, also the proposed fix in the article wouldn't address it either.)
By looking the account up with Google's People API - https://developers.google.com/people
They would have to verify the account is active
If I log in using Google oauth, you already know the Google account is active.
AND the id hasn't changed
Yes, but that's an additional check, separate from the one you suggested would eliminate the issue:
If, when you logged into Slack via Google Oauth with the email address [email protected], Slack checked with company.com whether [email protected] was a valid user that should be allowed to login, then this problem would be avoided entirely because the defunct company would no longer report any valid users.
> If I log in using Google oauth, you already know the Google account is active.
You know there is an active Google account but (for the public OAuth integration option) it can be any Google account from any workspace, or no workspace.
"A public application allows access to users outside of your organization (@your-organization.com). Access can be from consumer accounts, like @gmail.com, or other organizations, like @partner-organization.com." [1]
> Yes, but that's an additional check, separate from the one you suggested would eliminate the issue:
If you set up an internal OAuth integration option no separate check is necessary, it will actually restrict access to users of your workspace.
"An internal application will only allow access to users from your organization (@your-organization.com)." [1]
You can use the SAML integration option as well. [2]
Right, this additional check should not be necessary in a typical OAuth or OIDC flow. This workaround is only necessary in this case because the API Google offers to services has a hole in it.
It is certainly an option to pre-provision accounts in your application (e.g. Slack) and then have any users authenticating from the SSO product compared against the list of authorized users. And for some products which sell licenses by seat, this is exactly what they do.
But for many products which are meant to be available to an entire organization, this is a big part of what SSO was supposed to solve in the first place: IT no longer has to provision (and de-provision) user accounts in every single application. Maintaining an allowlist in each application makes this pointless.
Maybe you can clarify what part of the linked comment didn't make sense? It's a bit hard to make out where the confusion is if the above comment didn't help clarify.
If I'm understanding your comment correctly there are three different ways to connect Google SSO to your application:
1. SAML. This avoids the issue because certificates need to be exchanged between Google and your application, but an attacker that recreates a duplicate workspace using your ___domain won't have access to those certificates. Only users from your workspace will be allowed to login.
2. A custom Google OIDC IdP configured for internal access only. This also avoids the issue because a secret key is required to set this up and the attacker won't have access to that key. Again, only users from your workspace will be allowed to login.
3. The public Google OAuth API which will allow any Google user from any workspace (or non-workspace users) to login to your application.
Assuming the above is correct, if you decide to connect Google SSO to your application using option (3), then it is not a vulnerability that a Google account from a different workspace can connect to your application, because the entire point of option (3) is that users with any Google account (regardless of workspace) can connect to your application.
If you intended to restrict your application to users of your own workspace then you should have used option (1) or (2).
First of all, the whole point of SSO is that the only and final source on who are the valid active users is the IdP, in this case Google's public OAuth instance. If the IdP says that the current request is coming from the real [email protected], you give them access. And Google's public OAuth instance will confirm this even if the current Google Workspace associated with example.com is a different one than a few months ago (though the "sub" field of the attestation will be different than before, which the service provider is supposed to check).
Second of all, even if it was recognized that this is a different [email protected], they'd still have access to all sorts of company internal resources, that may still contain sensitive data, especially for small companies which inherently trust all employees.
Google's public OAuth is intended for situations where you want to allow anyone with a Google account to login to your application, regardless of workspace. Given that, it is not a Google OAuth vulnerability that a user can login to your application using a Google account from a different workspace.
If you want to integrate Google SSO into your application and restrict logins to a specific workspace, there are other options you can use that will actually check if the user belongs to that workspace (SAML or internal OAuth).
To the extent that service providers are using Google's public OAuth and then trying to read the ___domain name out of the returned ID token in order to restrict logins a specific Google workspace, they are using Google's public OAuth instance for a situation it was not intended for because ___domain names do not map 1:1 to workspaces. However that is a vulnerability on the service provider's side, not Google's.
To the extent that a service provider offers Google SSO integration via Google's public OAuth but also via workspace restricted options, if a company selects the former instead of the latter then the responsibility for the vulnerability is on the company's side. (Slack, for example, offers both because there are some Slack groups where allowing access from any Google account makes sense, and other groups where you would want to restrict access to Google accounts from a specific workspace.)
Where does this “directory of active users” exist? If it is controlled by slack, then you are relying on a failed startup to properly notify ALL the 3rd parties when they shut down. Failed startups don’t always shut down cleanly like that.
> Failed startups don’t always shut down cleanly like that.
Agreed, and with the number of services and the "ease" of oauth it's likely impossible to even track. You could make a list of the major ones, but there could be hundreds per user, ultimately thousands of unique services used depending on the breadth of the startup's activities.
Yes. If you've set up your Slack so each login checks against the identity provider to ensure an active user is logging in, that would resolve the issue, no?
Even if you take over company.com's ___domain you can't reconfigure company.com's Slack to point to a new identity provider?
I think you may be a bit confused about the players here. When you use Google OAuth to login, it _is_ your identity provider, and it is reporting to Slack that the user exists. Google is reporting the user exists because it exists in the Google Workspace directory. You use this as your source of truth for provisioning users, and they automatically get access to all of your company's apps.
The problem is that even though the user has the same email ([email protected]), and the same Google Workspace ___domain ("hd": example.com), this is actually a _new_ Google Workspace account. But nothing Google provides to Slack allows them to detect this.
Slack, et al can fix this by _not_ using the public Google OAuth integration, and forcing every use to configure an individual internal Google OAuth integration. But they use the public one because Google has said it is a safe and secure way to operate their service.
What I'm suggesting is if you were able to pre-configure Slack to only allow logins for valid users from Google Workspace X, then even if someone creates a new workspace Y with the same ___domain, Slack would still be checking against workspace X. (And similar for non-Google based identity providers.)
And people are telling you that this is not possible with the Google public OAuth API. When Slack asks Google's public OAuth instance if user [email protected] is valid, Google checks with the Google Workspace associated to example.com, and returns to Slack a response saying "yes, that user is valid, here is more information from example.com". This can be the same Workspace or another one, Google isn't really telling Slack apparently.
Now, there is another field called "sub", that should be a unique ID for the Google Workspace or the specific account, but it seems that this is not always stable, per the article, so people integrating with Google OAuth don't trust it.
> And people are telling you that this is not possible with the Google public OAuth API.
Yes I understand, however it is possible to integrate Slack and Google SSO in such a way that it checks that the user belongs to the correct workspace, correct? Either via the SAML integration (https://support.google.com/a/answer/6357481) or an internal Google OAuth integration? The purpose of the public Google OAuth API as opposed to the previous two options is to allow logins from non-workspace or cross-workspace Google accounts, correct?
Only if you use the Google public OAuth integration. If you instead use the SAML integration with Slack as described in the link above you don’t have this problem.
Bingo! Now looking back to your original comment, this is what I was trying to clarify:
> I agree, I don't think this is a problem with Google's Oauth implementation, it's a problem with the service providers who authenticate users via the mere existence of an email address ending in @company.com without checking if the email address actually belongs to an active employee.
It's a problem with Google's public OAuth implementation when used for private workspace accounts, despite Google's docs stating that this is a valid use. :)
> despite Google's docs stating that this is a valid use. :)
I don't think Google's docs actually say this. I assume you are referring to the "hd" claim, but that only says:
"The ___domain associated with the Google Workspace or Cloud organization of the user. Provided only if the user belongs to a Google Cloud organization. You must check this claim when restricting access to a resource to only members of certain domains. The absence of this claim indicates that the account does not belong to a Google hosted ___domain."
It does not say you can use this claim to restrict access to members of a certain workspace, only for a certain ___domain.
I think certain service providers might have made the assumption that if a user belongs to a certain ___domain that also means they belong to a certain workspace, but that is clearly not a valid assumption.
I think that Google's public OAuth integration is only intended for use in situations where you want to allow logins from any Google account, regardless of workspace membership, and if you want to restrict logins to Google accounts belonging to a specific workspace, you are supposed to use one of the other integrations.
Given all that, I still do not think this is a problem with Google's OAuth implementation. Instead it is a problem with service providers who have incorrectly used the wrong type of Google SSO integration. Or in the case of service providers that offer multiple Google SSO integration options (like Slack), it is a problem with the company for selecting the wrong one.
>> I think certain service providers might have made the assumption that if a user belongs to a certain ___domain that also means they belong to a certain workspace, but that is clearly not a valid assumption.
> If you need to validate that the ID token represents a Google Workspace or Cloud organization account, you can check the `hd` claim, which indicates the hosted ___domain of the user. This must be used when restricting access to a resource to only members of certain domains. The absence of this claim indicates that the account does not belong to a Google hosted ___domain.
Over-moderation is mostly to blame for this, IMO. Too many questions get closed as a duplicate of an existing question without the mod realizing that the answer to the existing question is obsolete and no longer useful.
Yeah, any website that uses JavaScript to load page elements needs to solve this. My hope is that a library like HTMX can help reduce the friction in implementing the best possible patterns for doing so.
See the proposed moveBefore API [1] which is meant to solve those issues at the platform level. In the meantime, htmx has their morphdom extension, which I’d say is a must when replacing any page content with interactive elements within. Aside from that one major gotcha, the main issues with htmx are the poor accessibility of the examples in the docs [2]. In particular seeing so much ARIA without the corresponding keyboard support.