Thanks so much for sharing this article. It reminded me of a fallacy we ran into when we first built TiDB Cloud (a distributed database service): "Transport cost is zero."
We deploy TiDB across three Availability Zones (AZs) in one AWS region. Each AZ contains a SQL computing server and a columnar storage replica server. Sometimes the computing server sends a request to the replica server in another AZ and pulls the result back, and when the data volume is high, this costs us a lot of money. Unfortunately, we only found out about this when we received the bill from AWS. After that, we started optimizing the size of the transferred data :sob:
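To make the surprise concrete: AWS bills cross-AZ traffic in both directions (roughly $0.01/GB out plus $0.01/GB in, so about $0.02/GB total; treat that rate and the workload numbers below as assumptions for illustration). A back-of-the-envelope sketch:

```go
package main

import "fmt"

func main() {
	// Hypothetical workload numbers, just to show how fast this adds up.
	const (
		gbPerQuery    = 0.001 // ~1 MB pulled from a remote-AZ replica per query
		queriesPerSec = 100.0 // sustained cross-AZ read rate
		pricePerGB    = 0.02  // ~$0.01/GB out + $0.01/GB in across AZs (assumed)
		secondsPerDay = 86400.0
	)
	gbPerMonth := gbPerQuery * queriesPerSec * secondsPerDay * 30
	fmt.Printf("cross-AZ traffic: %.0f GB/month -> $%.0f/month\n",
		gbPerMonth, gbPerMonth*pricePerGB)
}
```

Even this modest 100 MB/s of cross-AZ reads lands around $5,000 a month, which is exactly the kind of line item you only notice on the bill.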
I have to say, this problem is still not fully solved, but here are a few things we have been doing:
- We used gRPC with no compression before; now we tune the compression level to balance CPU overhead against data volume (see the sketch after this list).
- We are improving our SQL optimizer to model network cost more accurately, and our data placement rules to place data more sensibly, so that less traffic crosses AZs.
- As a cloud service, we also plan to expose this cost transparently to our customers, like MongoDB Atlas does. :-)
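For the first bullet, here is a minimal sketch of what that tuning looks like with grpc-go's built-in gzip codec. The address, port, and compression level are placeholder assumptions, not our production settings; we picked our real level by benchmarking:

```go
package main

import (
	"compress/gzip"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	grpcgzip "google.golang.org/grpc/encoding/gzip"
)

func main() {
	// Trade CPU for bytes on the wire: BestSpeed compresses less but is
	// cheap; higher levels shrink cross-AZ traffic at the cost of latency.
	if err := grpcgzip.SetLevel(gzip.BestSpeed); err != nil {
		log.Fatal(err)
	}

	// Ask for gzip on every call made through this client connection.
	// "replica.internal:3930" is a hypothetical replica address.
	conn, err := grpc.Dial("replica.internal:3930",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(grpc.UseCompressor(grpcgzip.Name)))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... issue RPCs over conn as usual; payloads are now compressed.
}
```

Note that SetLevel adjusts the registered gzip compressor process-wide in grpc-go, so one call covers every connection in the binary.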
There are "clouds" with lower networking costs, like Digitalocean/linode. But they are not "capable" enough to support a sizable company. Actually, networking is like a "tax" on top of the whole platform.
One of the real problems is that cloud providers can simply waive cross-AZ traffic for their own managed services while charging their customers a lot for the same traffic.
> One of the real problems is that cloud providers can simply waive cross-AZ traffic for their own managed services while charging their customers a lot for the same traffic.
That is why, no matter how smart your software is, they will always have the upper hand and can build things better themselves (which is why everybody is using the BSL license).
"There is one administrator." I always found that the hardest fallacy to recognize.
It’s like owning the entire supply chain. I remember this bit from someone who used to work at Heroku: "It’s impossible for us to switch. We spent so much time making the system work with AWS that switching to or adding another provider would be akin to starting over."
What can everyone do before that "new cloud" happens? Approaches like the one in the post above, where their cloud database tries different things on the technology side, seem like the right and practical way to pursue for now.
emm... moving and rebuilding a whole system on another cloud hurts the wallet too, but yes, things may change if everyone votes with their wallet. The reality is that big players like Snowflake or Databricks have more bargaining power than smaller players on the market, and as long as those big players stay with these providers, the game may not change.
Some Conway’s Law hooey at my current job means I don’t even have that information. I’m doing work to reduce cluster size, and maybe we’d be better off if I worked on cross-AZ chatter instead. No idea. Suspicions, yes. Dark, cynical suspicions, but no idea.