Hacker News new | past | comments | ask | show | jobs | submit | WinDoctor's comments login

Pull requests originate from a low-trust environment (open source) and bring very costly context switching when forced into high-trust environments (PRs reviewed by team members).


> why the SQL API even allows injection vulnerability

How would one implement this?

"SQL APIs" use prepared statements. Meaning you have a string for SQL and some dynamic variables that inject into that string via $1, $2 etc.

BUT if the developer builds that string itself dynamically from a variable, then you have SQL injection again.
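To make the difference concrete, here is a hedged sketch using Python's stdlib sqlite3 (where the placeholder syntax is `?` rather than `$1`; the table and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # hostile input

# Safe: the SQL string is constant; the value travels separately.
rows = conn.execute(
    "SELECT id FROM users WHERE username = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the hostile input matches nothing

# Unsafe: the developer makes the string itself dynamic.
rows = conn.execute(
    "SELECT id FROM users WHERE username = '" + user_input + "'"
).fetchall()
print(rows)  # [(1,)] -- injection succeeded
```

The prepared form never lets the value touch the SQL text, which is exactly what the string-concatenation form gives up.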


> How would one implement this?

The low-level API could simply not allow SQL statements as strings, and instead provide separate functions to build the queries and statements.

It would provide entry points which could be used to ensure proper escaping and such, and would still allow for easily generating queries dynamically in the cases where that is needed.

Of course, it doesn't completely guard against Bobby Tables[1]; one could imagine someone including a run-time code generator and feeding it unprotected SQL as input.

But it should make it a lot more difficult, as injecting unprotected user data would be much more "unnatural", requiring going against the grain. Also, the "query_execute" function could raise an error if there's more than one statement, requiring a different function for batch execution.

Pseudo-codish example off the top of my head, for the sake of illustration:

   is_active = str_to_bool(args['active']); // from user
   qry = new_query(ctx);
   users_alias = new_table_alias(qry, 't');
   query_select_column(users_alias, 'id');
   query_select_column(users_alias, 'username');
   query_from_table(users_alias, 'users');
   filter_active = query_column_eq_clause(users_alias, 'active', is_active);
   where = query_where(qry);
   query_where_append(where, filter_active);   
   cursor = query_execute(qry);
[1]: https://xkcd.com/327/
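A minimal runnable take on the same idea, in Python over sqlite3 (the builder names are hypothetical and only loosely mirror the pseudo-code above): the builder validates identifiers and only ever emits placeholders, so user data never enters the SQL string.

```python
import sqlite3

class Query:
    """Toy query builder: column/table names must be valid identifiers,
    and values only ever travel as bound parameters."""

    def __init__(self, table):
        self.table = self._ident(table)
        self.columns = []
        self.filters = []  # "col = ?" fragments
        self.params = []   # values bound to the placeholders

    @staticmethod
    def _ident(name):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
        return name

    def select(self, *cols):
        self.columns += [self._ident(c) for c in cols]
        return self

    def where_eq(self, col, value):
        self.filters.append(f"{self._ident(col)} = ?")
        self.params.append(value)
        return self

    def execute(self, conn):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return conn.execute(sql, self.params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, active INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 1), (2, 'bob', 0)")

rows = Query("users").select("id", "username").where_eq("active", 1).execute(conn)
print(rows)  # [(1, 'alice')]
```

Note that a hostile value passed to `where_eq` is harmless (it's bound, not spliced), and a hostile column name raises rather than being interpolated.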


"Gee, this new programming language / API makes it hard to copy my SQL queries across. Better use something else."


If that's all the database drivers supported...


Easy. Don’t write queries in a language (SQL) which interpolates content without escaping it for the enclosing structure.

Go one level up.

For example, prepared statements should not allow literal values in the SQL string, only placeholders, which are then bound to values, like PDO does.


It would be a bit annoying to have to prepare outside and pass in every SQL literal you need to use in your query.

I'd rather have the SQL API take not strings but a special type that a string can't be directly converted into without escaping (by default).

In C++, user-defined literals could be used to create this special type easily. Similar constructs exist in some other languages.
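A sketch of that special type in Python (the class name and API are hypothetical): gluing a plain string onto a query escapes it as a SQL string literal by default, while trusted SQL text must already be wrapped in the type.

```python
class SqlFragment:
    """SQL text that is trusted by construction."""

    def __init__(self, sql=""):
        self.sql = sql

    @staticmethod
    def _escape(value):
        # Escape an arbitrary string into a SQL string literal
        # by doubling embedded single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    def __add__(self, other):
        if isinstance(other, SqlFragment):
            # trusted + trusted: concatenate as-is
            return SqlFragment(self.sql + other.sql)
        # trusted + raw string: the string gets escaped
        return SqlFragment(self.sql + self._escape(other))

q = SqlFragment("SELECT id FROM users WHERE name = ") + "bobby'; DROP TABLE users;--"
print(q.sql)
# SELECT id FROM users WHERE name = 'bobby''; DROP TABLE users;--'
```

The injection attempt ends up inert inside a quoted literal; getting raw SQL in requires the deliberate act of wrapping it in `SqlFragment`, which is the "going against the grain" mentioned upthread.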


Literally a library can generate SQL statements and compile them

JS and PHP have tagged literals

But they have to be “escaped” properly before being interpolated!


That's the whole point of having a separate type for queries. Whenever you try to glue a string to a query the string gets escaped.


Literally what PostgreSQL does.

E.g. you can do manual selects from the internal catalog tables to see the same content as the `\dt` command shows.


Oh, cool. Where's the schema?


Everything in PostgreSQL is a role.

It's just named such that when a role is allowed to `LOGIN`, it's considered a user.


Someone was feeling very clever when they came up with this idea.


It makes a lot of sense, especially since in the SQL standard, you can grant privileges to "users" or "roles."

Might as well simplify the mental model and make them the same.


This is part of what many people find so confusing. In most systems, a "role" is a group (or something closely resembling a group), not a user. The weird terminology confuses beginners.


It's a bit confusing and legacy.

All roles function like you would expect groups to function.

A role that is not allowed to login is a `group`.

While the CREATE USER and CREATE GROUP commands still exist, they are simply aliases for CREATE ROLE.
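Sketched in SQL (illustrative names; this is an untested fragment, since it needs a live PostgreSQL server):

```sql
-- A "user" is just a role that may log in:
CREATE ROLE app_user LOGIN PASSWORD 'secret';

-- A "group" is just a role that may not:
CREATE ROLE readers NOLOGIN;

-- Membership works the same way for both:
GRANT readers TO app_user;
```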


To me, the comment you are replying to is saying that you should differentiate roles by service, not "end user".



Oh it's really worrying that you post a link to a SO page as the recommended 'cookbook' for something that's referred to as a powerful feature. And both of the answers only have a single up vote each. Seems like there's a serious gap here.


You can help out by posting an answer or upvoting the good ones.


I don't think Stack Overflow is a good place for good documentation. I think the Postgres documentation should be better.


I agree with the sentiment that IAM in PostgreSQL is complex.

What makes it complex is that there are 3 layers of objects (Database, Schema, Tables) and also implicit grants given to DB object owners.

To be able to select from a table you need:

* CONNECT on the Database

* USAGE on the Schema (Given implicitly to schema owner)

* SELECT on the Table (Given implicitly to table owner)

To see these privileges we need to understand acl entries of this format

`grantee=privilege-abbreviation[*].../grantor`

* Use \l+ to see privileges of Database

* Use \dn+ to see privileges of Schemas

* Use \dp+ to see privileges of Tables

Privileges are seen [here](https://www.postgresql.org/docs/current/ddl-priv.html)

e.g. in the following example user has been given all permissions by postgres role

`user=arwdDxt/postgres`

If the whole ACL column is empty for a given object, the object has default owner privileges (all privileges to the owner). If only the grantee part of an entry is empty, it means privileges granted to the PUBLIC role (every role that exists), as in:

`=r/postgres`

Also, it's confusing when the public schema is used: you have CREATE permission on the schema, so when tables are created by the same user you later select data with, you have owner permissions out of the box.
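The three layers above, as an untested SQL fragment (database, schema, and table names are made up):

```sql
CREATE ROLE reporter LOGIN PASSWORD 'secret';

GRANT CONNECT ON DATABASE appdb TO reporter;  -- layer 1: Database
GRANT USAGE ON SCHEMA app TO reporter;        -- layer 2: Schema
GRANT SELECT ON app.orders TO reporter;       -- layer 3: Table
```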


> To be able to select from a table you need

> * CONNECT

> * USAGE

> * SELECT

Isn't LOGIN (https://www.postgresql.org/docs/16/role-attributes.html) also needed?

  Only roles that have the LOGIN attribute can be used as the initial role name for a database connection


You can log in and then change your role (if you have permissions) with SET ROLE NEW_USER;

We managed to kludge our way to defaulting to read-only, then using SET ROLE to do writes when needed.
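That read-only-by-default kludge might look roughly like this (illustrative role names; an untested fragment, and not necessarily the setup described above):

```sql
-- Log in as a role with no write grants...
CREATE ROLE app_ro LOGIN PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_ro;

-- ...and escalate only when a write is actually needed:
CREATE ROLE app_rw NOLOGIN;
GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;
GRANT app_rw TO app_ro;

-- In the session:
SET ROLE app_rw;
-- ...do writes...
RESET ROLE;
```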


Good point. Connecting to the cluster is indeed a 4th layer.

There you do need a user with LOGIN, a valid password, and possibly SSL.


> What makes it complex is that there are 3 layers of objects (Database, Schema, Tables) and also implicit grants given to DB object owners

The main privilege system also includes Columns, as well as Databases/Schemas/Tables. You can SELECT from a table if you have been granted SELECT on the table, or if you have been granted it on the specific columns used in your query. ("A user may perform SELECT, INSERT, etc. on a column if they hold that privilege for either the specific column or its whole table. Granting the privilege at the table level and then revoking it for one column will not do what one might wish: the table-level grant is unaffected by a column-level operation." [1])

There's also a system of Row Security Policies [2].

[1]: https://www.postgresql.org/docs/current/sql-grant.html

[2]: https://www.postgresql.org/docs/current/ddl-rowsecurity.html


I've seen funny things happen when developers think their GRANT is doing what ALTER DEFAULT PRIVILEGES does, and a day later break production because their user doesn't have access to the new tables.


What do you mean exactly?

If roles have INHERIT, then doing the following works, no?

* Role A creates a table

* GRANT A TO B;

* Role B can read from the table just like A can.

Also if Role A creates new table, Role B can read that too no?


Remove any "if" from your comment and it seems like you understood what would fail! There are more ways that misunderstanding GRANT vs default privileges will create issues as well.


Then I would argue for trying to centrally ensure that all roles have INHERIT in your DBs. :thinking:

Doing ALTER DEFAULT PRIVILEGES could be another future footgun of its own.
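The contrast, as an untested SQL fragment (role name made up):

```sql
-- Affects only tables that exist right now:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_user;

-- Affects tables created in the future -- and only those created
-- by the role that ran this statement (a footgun of its own):
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO app_user;
```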


I think you probably don’t want the application user to inherit from the superuser you may use for migrations since some migrations require a superuser.


IMO, the permission management of MS SQL Server is the best one around. It gets all the basics right, without any big issue. Other DBMSs should copy it, or use it as a starting point if they want to go further (and there is plenty of stuff to improve on it).

TLDR, the container objects and the contained ones all share the same kind of permissions. Permissions of the container are applied to the contained unless explicitly changed.

So, if you grant select on the schema dbo to a, a will get select on all tables there. If you want to remove some table, you revoke the select on that specific table. And there is both metadata to discover where a specific privilege comes from and specific commands that edit the privileges on a specific level.


> data is effectively modelled in a nosql style.

What do you mean by this? Any references I could read?


You can probably search around for references on distributed data management. But basically, when your data is split across different instances, you can no longer do joins or transactions or rely on the database for ACID guarantees. All these things must instead be handled in application code, which is exactly how you would work with data in a NoSQL database as well.
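For instance, a join that SQL would do in one statement becomes application code once the tables live in different services. A hedged Python sketch, where the in-memory dicts stand in for two separate databases:

```python
# Two "services", each owning its own data store.
users_db = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders_db = [
    {"order_id": 10, "user_id": 1, "total": 25},
    {"order_id": 11, "user_id": 2, "total": 40},
]

# The "join" now happens in application code, with no
# transactional guarantees spanning the two stores.
report = [
    {"user": users_db[o["user_id"]]["name"], "total": o["total"]}
    for o in orders_db
]
print(report)
# [{'user': 'alice', 'total': 25}, {'user': 'bob', 'total': 40}]
```

In a single SQL database this would be one `JOIN`, inside one transaction; here, consistency between the two stores is the application's problem.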


This might help: https://microservices.io/patterns/data/database-per-service....

Varies by actual DB but: with this architecture, if the DBs are NoSQL then you didn't lose as much by splitting up the databases, but if they were SQL then among other things you lose transactions across those DBs (or now need costly distributed transactions), effectively making the collective SQL system behave more like a NoSQL DB.


I think he means the data is not relational anymore and it is not normalized.


Probably means that the data isn't modelled at all, just a dump of the application architecture.


I would assume Putin hopes that a war unites the Russian people and increases his own power more than it loses trust.

It's a gamble for sure.



Incredible! :)

What do you mean by hand-tuned Postgres & hand-tuned ZFS. What did you change and why?


If I remember correctly, for ZFS it was some stuff about cache size and RAM usage, since we were in a VM and couldn't be too greedy. Something about a NVME drive needing slightly different settings. I didn't spend much time on this. For Postgres it was so so so many things, I played with it on and off for months, benchmarking certain key queries and such. And I read a LOT of blog posts about how to make Postgres and ZFS work together ideally, one thing I remember in particular was waffling back and forth about logbias=throughput vs logbias=latency. Just google "postgres zfs logbias" and you'll get MANY conflicting opinions. (edit: when I google it now in incognito, my reddit post is at the top lol)

But for what I actually changed, off the top of my head, I set the recordsizes in ZFS large enough that Postgres could safely (because of ZFS CoW) have full_page_writes off, and combined with synchronous_commit off, that really sped up the overall system and made the WAL logs much smaller. After looking just now at postgresql.conf, various other things were tweaked, such as seq_page_cost, random_page_cost, effective_cache_size, effective_io_concurrency, max_worker_processes, default_statistics_target, dynamic_shared_memory_type, work_mem, maintenance_work_mem, shared_buffers, but those were not quite as important. (plus some uninteresting tweaks to WAL behavior, since we had replicas that got WAL logs shipped every few minutes with rsync)
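For flavor, the settings mentioned might look roughly like this (values are placeholders for illustration, not the actual configuration described above):

```
# ZFS dataset holding the Postgres data directory
zfs set recordsize=8K tank/pgdata       # at least Postgres' 8 kB page size
zfs set logbias=throughput tank/pgdata  # the contested setting

# postgresql.conf
full_page_writes = off       # safe only because ZFS CoW writes records atomically
synchronous_commit = off
```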


Thanks for the in-depth response!

