
The intended usage is that the client tells the server, "I want to load data from a file /path/to/data.txt on my local filesystem" in a SQL command. As part of the protocol for executing the query, the server sends a message to the client to request the contents of /path/to/data.txt. Unfortunately clients don't validate the file request and will send any file (ex: /path/to/secrets.txt) even if there was no legit data request in their command.
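
For example, the intended flow starts with a statement like this (table name hypothetical), after which the server asks the client to stream the named file back:

    LOAD DATA LOCAL INFILE '/path/to/data.txt'
    INTO TABLE my_table
    FIELDS TERMINATED BY ',';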

This has been an issue with MySQL client drivers for years. I found and fixed the same issue in MariaDB Connector/J (a JDBC driver wire compatible with MySQL databases) in 2015. It rejects LOAD DATA LOCAL requests from the server unless the client app preregistered an InputStream (the Java interface for a generic stream of bytes) as the data for the command being executed.

This is one of the many many reasons I love open source database drivers. I was able to find and fix this issue only because I could see the source code. Similar "features" in proprietary databases could go unnoticed for years and even when discovered may not have feature flags to disable them.


Similarly, MySQL Connector/J also used to attempt to deserialize binary data that looked like a serialized Java object (CVE-2017-3523). Doing this with untrusted data can often be used to obtain arbitrary code execution. Connecting to an untrusted server does not appear to be a use-case that received enough attention.


This seems like a weird design choice. Why would you need to load a file from the file system as part of a select?

Unless I'm missing some kind of use case this seems like a bad protocol design.


Loading a CSV is a common use case. PostgreSQL has a similar \COPY command used for the same purpose (though that's a client-side command, not server-side, as far as I know).


psql’s \copy command utilizes the server’s COPY functionality, which absolutely can read and write files on the server or run commands there [1].

  COPY with a file name instructs the PostgreSQL server to directly read from or
  write to a file.  The file must be accessible by the PostgreSQL user (the user
  ID the server runs as) and the name must be specified from the viewpoint of the
  server.

Clients using COPY, including psql's \copy, often pass STDIN or STDOUT as the "file", which allows data to be transmitted over the wire rather than from the server's filesystem.
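
For example (table and file names hypothetical), the first form below reads a file on the database server itself, while the second streams a local file over the client connection:

    -- Server-side: the path is resolved on the database server
    COPY my_table FROM '/var/lib/postgresql/data.csv' CSV;

    -- Client-side via psql: \copy reads the local file and feeds it to COPY ... FROM STDIN
    \copy my_table FROM 'data.csv' CSV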

https://www.postgresql.org/docs/11/sql-copy.html


https://www.postgresql.org/docs/9.0/sql-copy.html

PostgreSQL COPY can read or write to server files, though it requires superuser permissions.

There is also COPY FROM/TO STDIN/STDOUT, which allows the client to send the files on the same connection.


Performance reasons.


I can't agree that this is an example of an open source advantage.

For me it is, first and foremost, an example of bad protocol design.


Bad protocol design occurs in both OSS and proprietary software.

However, with proprietary software the protocol is unknown unless it has been published. With OSS, you at least have the source code of the implementation.

As you should know, proprietary software relies on the owners to fix the problem. With OSS, "anyone" can provide a fix - and even if the owner does not wish to include the fix in the official build (which would look very bad on them, in this instance), "anyone" can apply it to their own copy.

Meaning, it's vastly easier for a third party to discover and fix issues in OSS than in proprietary software.


"However, with proprietary software the protocol is unknown unless it has been published" this is not true. Proprietary software does not necessarily mean opaque protocols. It's chicken and egg question.

1) IIS is a proprietary server, but speaks the open HTTP protocol. Proprietary software may implement a well-known protocol; this is probably the most common case.

2) SQL Server is proprietary software, but speaks a documented protocol, TDS, whose specification is published.

3) Oracle Database is proprietary software and speaks the undocumented TNS protocol.

A protocol is just a specification. If the design is secure in the first place, that is far better.

There are proprietary HTTP clients, but no HTTP server can request a file from the client. So the HTTP protocol is better than the MySQL protocol. If someone writes a custom open source MySQL client, it will probably be affected. So this is bad design. If someone writes a custom open source HTTP client, it will not be affected. So this is good design.

Open source does not outweigh bad design. I see no sense in "open source vs. bad design". Bad design is bad design, no matter what the license is. There is nothing good in keeping bad software alive just because it's open source. The fact that you can play with the code and fix a security bug is very nice, at most. The fact that the protocol was misdesigned is paramount.


Right? I said "the protocol is unknown unless it has been published". HTTP has been published.

No one has said nor implied that OSS had any effect on the protocol.

The assertion was, the patcher was thankful it was open source. If it were not, (s)he would not have been able to fix it.


It's strange to blame the client, while it's actually the fault of the server. A DB server should not be able to pipe an arbitrary file to the client.


You've got it backwards - the DB server can pull an arbitrary file from the client, so it is considered the fault of the client - it should not allow that. The mysql client is the one receiving the request, and it should apply standard security practice by not blindly trusting an incoming request and instead validating that the path matches an earlier load request the client sent to the server. (Although a better approach IMO would be to modify the wire protocol so the server "request" does not use the file name, but instead uses an ID from the earlier client request.)


You misunderstood what's happening here. A rogue server can ask the client to read any file on the client's file system, and the client will comply without validating that it actually requested this.


That’s not what this is about. The intended use is: client tells server to load a file, server sends a request for the file, client sends the file. Except that the client will send the server whatever file it requests. In fact the client doesn’t even need to tell the server to request a file. The server can just request whatever file it wants whenever it likes and the client will send it.


Uh. Ok. "I was able to find and fix this issue only because I could see the source code." This is how all security issues happen. If I were a terrible person, I would create scripts that prey upon people that didn't patch.


People find vulnerabilities in closed source software too, they just don't have the source to patch it.


Actually, I think there are cases where people have patched closed source software. Binary patches are possible, just harder to write. (Not trying to say you're wrong: it's definitely easier if one has the source. Only that some humans are both determined and skilled, and some incredible stuff comes from that combination.)

IIRC, there was a flaw in Flash patched in this manner; it was using memcpy(), which requires the source and destination regions to not overlap, but they did. In this case, the fix is fairly simple: one just needs to call memmove(), which conveniently takes the same args in the same order.

(IIRC, there was a lot of consternation getting Adobe to fix that properly, given how obvious the bug was.)


> PostgreSQL 11 introduces SQL stored procedures that allow users to use embedded transactions (i.e. BEGIN, COMMIT/ROLLBACK) within a procedure. Procedures can be created using the CREATE PROCEDURE command and executed using the CALL command.

I'm pretty sure this is going to be my favorite feature in v11, as it will facilitate conditional DDL for the few remaining things that aren't already allowed in transactions, in particular adding values to enums.

Prior to v11 you could create new enums in a transaction but could not add values to an existing one. IIUC how this v11 feature works, you could now have a stored procedure check the current value list of the enum and decide to issue an ALTER TYPE ... ADD VALUE ... at runtime, optionally discarding any concurrency error. All of that could run from a SQL command, no external program needed.
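
Something like this sketch is what I have in mind (type and label names hypothetical; whether ALTER TYPE ... ADD VALUE is actually allowed in this context depends on the usual transaction-block restrictions):

    CREATE PROCEDURE add_status_if_missing()
    LANGUAGE plpgsql
    AS $$
    BEGIN
      -- Hypothetical enum type/label; check the current label list first
      IF NOT EXISTS (
            SELECT 1
            FROM pg_enum e
            JOIN pg_type t ON t.oid = e.enumtypid
            WHERE t.typname = 'order_status'
              AND e.enumlabel = 'archived') THEN
        EXECUTE 'ALTER TYPE order_status ADD VALUE ''archived''';
      END IF;
    END;
    $$;

    CALL add_status_if_missing();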

> Prior to PostgreSQL 11, one such feature was using the ALTER TABLE .. ADD COLUMN command where the newly created column had a DEFAULT value that was not NULL. Prior to PostgreSQL 11, when executing aforementioned statement, PostgreSQL would rewrite the whole table, which on larger tables in active systems could cause a cascade of problems. PostgreSQL 11 removes the need to rewrite the table in most cases, and as such running ALTER TABLE .. ADD COLUMN .. DEFAULT .. will execute extremely quickly.

... and this is a close second favorite. Having NOT NULL constraints on columns makes schemas much more pleasant to work with, but for schema migrations that means requiring a DEFAULT clause to populate existing data. While there are ways to slowly migrate to a NOT NULL / DEFAULT config for a new column (e.g. add the column without the constraint, migrate data piecemeal, likely in batches of N records, then enable the constraint), having it for free in core without rewriting the table at all is simply awesome.
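
Concretely, something like this (table/column names hypothetical) should now be fast even on a large table:

    ALTER TABLE accounts
      ADD COLUMN status text NOT NULL DEFAULT 'active';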


Feature author here. Running not-allowed-in-transaction-block DDL, such as VACUUM, still won't work in stored procedures. Room for future improvement.


I use DO for this sort of thing.


You also cannot run VACUUM inside a DO block. It's the same thing underneath.


No, but I use it for the sort of DDL that's missing IF [NOT] EXISTS syntax.
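
For example, a sketch (role name hypothetical) of guarding DDL that has no IF NOT EXISTS form:

    DO $$
    BEGIN
      -- CREATE ROLE has no IF NOT EXISTS clause, so check the catalog first
      IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'app_readonly') THEN
        CREATE ROLE app_readonly;
      END IF;
    END;
    $$;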


> ... and this is a close second favorite. Having NOT NULL constraints on columns makes schemas much more pleasant to work with, but for schema migrations that means requiring a DEFAULT clause to populate existing data. While there are ways to slowly migrate to a NOT NULL / DEFAULT config for a new column (e.g. add the column without the constraint, migrate data piecemeal, likely in batches of N records, then enable the constraint), having it for free in core without rewriting the table at all is simply awesome.

I'm not 100% sure but I don't think that's what it says. If you create a nullable column with a non-null default it won't rewrite the whole table anymore. You could get this behaviour in PostgreSQL 10 by creating the column and setting its default in separate DDL statements (all in a single transaction).
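
i.e. something like this (table/column names hypothetical):

    BEGIN;
    ALTER TABLE accounts ADD COLUMN status text;
    ALTER TABLE accounts ALTER COLUMN status SET DEFAULT 'active';
    COMMIT;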


> You could get this behaviour in PostgreSQL 10 by creating the column and setting its default in separate DDL statements (all in a single transaction).

But that'd mean that the existing rows wouldn't have the DEFAULT value, and thus you'd have to manually update the whole table. Whereas the facilities in 11 set it for all existing rows without rewriting the whole table.


Hashing a value from Math.random() with a cryptographic hash (e.g. SHA256) doesn't make it cryptographically random[1].

If you want a random string, get one directly via crypto.randomBytes(...)[2]:

    const id = crypto.randomBytes(32).toString('hex');
[1]: https://github.com/bluzi/jsonstore/blob/87af0d3ef6bf11222b98...

[2]: https://nodejs.org/api/crypto.html#crypto_crypto_randombytes...


Why not use uuids?


UUIDs are fine too. What matters is how they're generated.

If you're generating v4 UUIDs server side using the "uuid" NPM module then you're fine, as internally it uses crypto.randomBytes(...)[1] with almost 16 bytes of randomness (UUIDs are 16 bytes, but a proper v4 UUID has to override some of the bits to conform to the spec[2]).

If you're rolling your own UUID function or generating them client side then they may not be as random as you think. For example the same uuid NPM module silently uses Math.random()[3] on the client side if it can't find a better alternative. It's fine for something purely local to the one browser but I wouldn't rely on it being unique globally.

[1]: https://github.com/kelektiv/node-uuid/blob/17d443f7d8cfb65a7...

[2]: https://github.com/kelektiv/node-uuid/blob/17d443f7d8cfb65a7...

[3]: https://github.com/kelektiv/node-uuid/blob/17d443f7d8cfb65a7...


These need not be cryptographically random either.


Thanks, feel free to create a pull request.


What's the min version of the Postgres server this can be used with?


WAL-G uses non-exclusive backups, so at least 9.6.


Are there plans to support exclusive backups / older PG versions?


Not right now. Maybe use WAL-E until you upgrade?


I've used WAL-E (the predecessor of this) for backing up Postgres DBs for years and it's been a very pleasant experience. From what I've read so far, this looks like it's superior in every way: lower resource usage, faster operation, and the switch to Go for WAL-G (vs. Python for WAL-E) means no more mucking with Python versions either.

Great job to everybody that's working on this. I'm looking forward to trying it out.


Python versioning can be a bitch, but virtualenv/virtualenvwrapper make 99.999999% of the problems go away.

I have a huge ETL pipeline at Twitch that relies heavily on wal-e, and installs it on worker nodes and things like that.

That being said, if WAL-G is faster, I don't care what it is written in, and am happy to use it.


Every time I see that mentioned, my mind goes to the movie. https://g.co/kgs/d8G9u5


I don't think it's a coincidence WAL-E is named as it is :)


Finally!

Having dealt with many database drivers over the years, I can say first hand that closed source drivers are the bane of my existence. The only thing worse than running into a deep-in-the-stack bug in a database driver is one that you can't correct, let alone debug properly.

Regardless of your views on closed source software on the server side of things, it's a disservice to your customers not to have open source client-side drivers. The secret sauce is always on the server side, so keeping the client closed doesn't buy you anything. At best there's a (very poor) argument for security through obscurity, or you're doing something silly like hiding server-side features behind client-side feature flags.

Plus the more tech savvy of your users will directly contribute back to improve the driver, both by fixing existing bugs and by adding support for existing database server features. In this particular case, I've got a laundry list of things I'd like corrected / added / improved in the SQL Server JDBC driver. Had it been open sourced a few years back, I would have already done them myself.

Unless there's a legal restriction for not doing so, like say not owning the original source, there's no good reason not to open source the client side of things.


> the bane of my existence

This reminds me far too much of the .NET DB2 drivers. What a steaming pile, not to mention having to navigate the labyrinth of the IBM download site to even find it in the first place.


You know what's worse? DB2 for iSeries. Pisses me off to no end that pulling JTOpen from Maven is crazy easy, but if you want to access your data from any other language, prepare to shell out thousands of dollars for "DB2 Connect" (per server!), because apparently, even though you've already paid through the nose for IBM i, it's totally incomprehensible that you shouldn't be charged for accessing the data anywhere else.


Usually plays out something like this:

"Hi $BIG_COMPANY, we'd like to expand our usage of $SOFTWARE that we've already got a $BIG_MONEY license for. Where can we get client drivers for a new app that will use it?"

"Hi $CUSTOMER, that's great news. Send us a check for $MORE_MONEY and we'd be happy to help."

"Hi $BIG_COMPANY, to clarify, we're already paying $BIG_MONEY for a license for the server and we'd like to expand our usage of it. That means in the longer term we'd be more entrenched into using it and would likely renew your contract."

"Hi $CUSTOMER, that's great news. Send us a check for $MORE_MONEY and we'd be happy to help."


Thank you. I will now need several hours of therapy to get quality sleep again.

Porting a .NET application built for MS SQL to DB2 and dealing with all the ... peculiarities of the DB2 driver is something I will hopefully never have to do again.


> The secret sauce is always on the server side

Not so much anymore? The tendency is towards smarter clients and possibly dumber servers.

cf. Kafka.

Also, fast switchover in Oracle's JDBC drivers requires some code beyond what the JDBC API requires.


Does this allow users to execute custom SQL, in addition to ORM methods, and have it be part of the same transaction?


It is not supported today, but it wouldn't be any problem to implement it. Please create an issue if you want it.


I played around with it quite a bit in early 2014 and even wrote a plugin for it[1]. It gave me a better understanding of Docker and of abstracting things into containers. The internals are all bash scripts, so it's pretty easy to follow how it works and modify it.

[1]: https://github.com/sehrope/dokku-logging-supervisord


From the link:

    ITERATIONS = 600
    ...
    crypto.pbkdf2 pwd, salt, ITERATIONS, LEN, (err, hash) ->

That's way too small for the number of iterations. Something like 100K would be a better choice.

Alternatively here's a version that uses bcrypt:

    bcrypt = require 'bcrypt'
    rounds = Number(process.env.BCRYPT_ROUNDS || 12)

    module.exports =
      hash: (password, cb) ->
        bcrypt.hash password, rounds, cb

      compare: (password, hashedPassword, cb) ->
        bcrypt.compare password, hashedPassword, cb


From my testing 600 took about 15ms. 100k would take about 2.5 seconds.


For me 600 iterations takes about 3ms (I guess my laptop is a bit faster). A decent range to shoot for is .5-1 sec.

Test program:

    crypto = require 'crypto'

    password = 'testing'
    len = 128
    salt = crypto.randomBytes(len)

    iters = Number(process.argv[2] || 600)

    console.log 'Testing iters=%s', iters
    for i in [1..10]
      start = Date.now()
      # 'sha1' was the implicit default digest at the time; newer Node requires it explicitly
      crypto.pbkdf2Sync password, salt, iters, len, 'sha1'
      elapsed = Date.now() - start
      console.log '   Test #%s - %s ms', i, elapsed
Output:

      $ coffee pbkdf2-test.coffee 100000
      Testing iters=100000
         Test #1 - 497 ms
         Test #2 - 510 ms
         Test #3 - 496 ms
         Test #4 - 525 ms
         Test #5 - 510 ms
         Test #6 - 493 ms
         Test #7 - 521 ms
         Test #8 - 518 ms
         Test #9 - 510 ms
         Test #10 - 498 ms

      $ coffee pbkdf2-test.coffee 10000
      Testing iters=10000
         Test #1 - 54 ms
         Test #2 - 50 ms
         Test #3 - 50 ms
         Test #4 - 55 ms
         Test #5 - 51 ms
         Test #6 - 52 ms
         Test #7 - 50 ms
         Test #8 - 49 ms
         Test #9 - 51 ms
         Test #10 - 50 ms

      $ coffee pbkdf2-test.coffee 600
      Testing iters=600
         Test #1 - 3 ms
         Test #2 - 3 ms
         Test #3 - 3 ms
         Test #4 - 3 ms
         Test #5 - 3 ms
         Test #6 - 4 ms
         Test #7 - 3 ms
         Test #8 - 4 ms
         Test #9 - 3 ms
         Test #10 - 3 ms


Nice, I was actually testing on my server hardware which is obviously lower end. This is hopefully useful for people though.

For me 1 second seems pretty aggressive; that's CPU time/latency per login.


> ... and if streams become a better and better option as response sizes get larger and larger. The streams syntax is nicer looking in any case.

fs.readFile(...) reads the entire contents of the file into memory. For small files that may be acceptable, but if you're dealing with anything that could be large it's not going to be very pleasant, or even work properly.

I just tried it out on my laptop with a 100MB file and a 1GB file. For the 100MB file, using streams is about 25% slower. However, the sync method failed for the 1GB file:

    Style            100MB       1GB
    =====            =====       ===
    fs.readFileSync  .176s    <failed>
    streams/pipe     .234s      1.25s


By adapting the highWaterMark option one could probably tweak it even further.

Default buffer size seems to be 64k: https://github.com/joyent/node/blob/912b5e05811fd24f09f9d652...

Stream buffering: http://www.nodejs.org/api/stream.html#stream_buffering

