I have to admit, I haven't done any benchmarking against the existing FDWs for Clickhouse.
But I actually wrote the Go FDW 2 years ago(around Dec 2018). ;)
There weren't any Clickhouse FDW available at that time and I probably would've tried them as well.
Now I just got around to write the blog post and convincing the team to release the code.
Though I have a suspicion that the percona FDW might win in the benchmarks as they won't have to pay the penalties when crossing the Go land to C land.[1]
Thanks for the reply. I really appreciate your post and the additional FDW. If you ever decide to give it a go, a benchmark would be interesting, if for no other reason than to see how much penalty cgo actually incurs vs possible efficiency gains in your implementation strategy.
Yeah, the original Percona FDW hasn't been active for a while, only had a few commits after the first release and was missing quite a few features. Also doesn't support anything more recent than Postgres 11.
So the "adjust" CH FDW is the only usable choice today, thankfully quite actively maintained, primarily by the Adjust devs. I'd presume it's used for their production load.
I sometimes fondly remember my time as an office hero, using MSAccess to attach to just about any data source imaginable, copy to a temptable and run whatever cleaning was required before loading into the server database.
Good times; almost 25 years ago now. Sometimes I wonder if we're stuck.
It is embarrassingly easy, once you get over the (minimal) hurdle of setting up the ODBC connections, docs over here for MySQL 5.6: https://dev.mysql.com/doc/connector-odbc/en/
In MSAccess, just go to "External Data" -> "New Data Source" -> "From Other Sources" -> "ODBC Databases" (chose whether to link or import the data) and then create a new data source under "Machine Data Sources" using the MySQL driver.
We use them in production at Splitgraph [0] to power our DDN (like a CDN, but for data). We make a PostgreSQL-compatible endpoint available to the public to query any of the tens of thousands of open datasets by referencing them as virtual tables: they're not hosted by us but we proxy to them using Postgres FDWs. When a query comes in, we intercept it and redirect it to a FDW instance that handles query translation and planning from the PG dialect to that of the backend data source.
We wrote an FDW for Socrata-powered [1] government open data portals to query the public datasets that we index in the Splitgraph catalog as a proof-of-concept. However, there are plenty of other FDWs that we're working on integrating to let people add their own backend data sources (RDS, Snowflake etc).
FDW plugin quality varies (some of them can't push down all predicates or JOINs) but it's definitely an interesting way to think about accessing data. We also added a lot of scaffolding around foreign data wrappers in our open-source tool [2] that makes it easy to add a FDW-managed data source to a PostgreSQL instance.
Then you might be even more surprised to find out that Foreign Data Wrappers are Postgres' implementation of the ANSI SQL standard extension known as "SQL/MED" [1], where MED stands for "management of external data".
I spent a number of years working on a data federation/virtualization engine - and SQL/MED is very much related to that.
It's actually a relatively unknown topic by many software/data engineers I have worked with, but things like GraphQL federation (example, Apollo GraphQL) or some of the more popular tools such as Presto, Dremio, Denodo, etc.) are where more advanced versions of this are today.
SQL/MED and what Postgres can do is quite cool, but just know that any time you have system boundaries you cross (e.g. between heterogenous systems), things like joins, data types, and many other things becoming a bit more difficult - or you just have to think about them more. But very cool tech.
I've used SQL/MED in Postgres, FDW, linked servers in SQL Server, database links in Oracle, and more advanced virtualization/federation engines also.
If you haven't been exposed to this area before, highly recommended as another tool to know about for the toolkit.
There are generally two classes for FDWs: Postgres<->Postgres and Postgres->Everything else.
The first one is generally suitable for production and is very useful for sharded Postgres when you want to communicate across shards without having to go back out through the application.
The second one's mileage really varies. Some implementations might or might not be prod ready or mig target only specific version combinations. Can be very useful for data engineering or analytics use cases for quick ETL into a staging database. Or for data migrations between database vendors.
Ville Tuulos - How to Build a SQL-based Data Warehouse for 100+ Billion Rows in Python
PyData SV 2014 -
In this talk, we show how and why AdRoll built a custom, high-performance data warehouse in Python which can handle hundreds of billions of data points with sub-minute latency on a small cluster of servers. This feat is made possible by a non-trivial combination of compressed data structures, meta-programming, and just-in-time compilation using Numba, a compiler for numerical Python. To enable smooth interoperability with existing tools, the system provides a standard SQL-interface using Multicorn and Foreign Data Wrappers in PostgreSQL.
Currently using this one https://github.com/pramsey/pgsql-ogr-fdw for an image migration project. Using it to pull image metadata thru ODBC. While it's primary design is for GIS data, it works well for any ODBC database.
I have done something similar. For a software, which take shared libraries - usually written in C - as plugins, I wrote such a plugin in Go. This was a very good experience. It took only little work to set up the C compatible Go functions realizing the API, but the rest I could implement in Go which made the life so nice. I also ended up calling back into some of the applications APIs from Go, that worked seamlessly.
Go has good facilities interfacing with C, the only attention you need to pay is properly handling C pointers (manual memory management) vs. Go pointers (automatically managed via the GC). But with very little care this is not a big issue. The Go part of the code is however much nicer than if you had to implement the functionality in C. (Yes, I do think that Go is a great C replacement)
I only just discovered foreign data wrappers today but it seems like you can write these in python though right? Might be easier than using Go to interface with C.
So, naive question, and I know the concept of a ‘database’ is a bit different between MySQL and Postgres, does Postgres require a FDW to communicate between multiple databases on the same server?