A database for 2022 (tailscale.com)
624 points by tosh on April 1, 2022 | 311 comments



This thread has (at least at the moment) serious Bob Martin Sudoku Solver energy to it. Tailscale has solved an infamously complicated problem using, for the most part, simple tools. They're not just successful; they're remarkably successful, spookily successful, upsettingly successful.

Consider whether the secret sauce here might not be au courant database choices, but rather something much harder for random teams to pick up and run with: just straight up good design and programming.

This is a thing that happens in other fields, too; some of the hardest dishes to cook seem extremely simple and have few ingredients. They're hard because there's no place to hide: your technique has to be flawless or they don't work. A famous entrance exam for new cooks at fine dining restaurants is "make an omelet". Tailscale is, thankfully (and, to me, irritatingly), more like an omelet than a truffled hash.


> This thread has (at least at the moment) serious Bob Martin Sudoku Solver energy to it.

To the forum's credit, the top comment (by u/judofyr) the last time Tailscale blogged about their unusual db setup got it exactly right: https://news.ycombinator.com/item?id=25768042

> Interesting choice of technology, but [tailscale] didn't completely convince me to why this is better than just using SQLite or PostgreSQL with a lagging replica.

> In particular [tailscale has] designed a very complicated system: Operationally you need an etcd cluster and a tailetc cluster. Code-wise you now have to maintain your own transaction-aware caching layer on top of etcd. That's quite a brave task considering how many databases fail at Jepsen.

> Considering that high-availability was not a requirement and that the main problem with the previous solution was performance ("writes went from nearly a second (sometimes worse!) to milliseconds") it looks like a simple server with SQLite + some indexes could have gotten [tailscale] quite far.


I did commit the cardinal sin of HN metacommentary and referred to the state of the thread at a point in time as if it was a durable property of the thread, which was a mistake I own and take full responsibility for.


If anything, this ^^^ is how you win the long game. History will surface itself when you make assertions. Graceful revocability scales.


At least he's not the Dropbox guy


This and the “no Wi-Fi, less space than a Nomad” comments on Slashdot about the iPod are my favourite examples of “you don’t get consumers” comments.


to quote u/InGoodFaith (news.ycombinator.com/item?id=29555814):

> Wouldn't also be HN without the misinterpretation of BrandonM's response. Here is dang's comment about that topic/meme: https://news.ycombinator.com/item?id=29178442


As someone who was there to witness it when it happened, that is not the impression I got at all, and I think Dang may have taken the more humble interpretation in that remark.


You don't need to have been there, though, as anyone can read the perfectly-preserved exchange even now.


Lmao Stavros, we should share a laptop or something, or just speak over the phone, the way we run into each other so often.

It's different reading it at the time because it factors in the context of the community back then, as well as the lived experience of what it was like to be a member or participant in those discussions. With that contrast, you can get an idea of the intent of those comments beyond just the words, since they are often (and in this case) reinterpreted through the lens of present-day contexts.

Example: ask what people think of the hit TV show The Apprentice today in the present, then hop in a DeLorean and ask someone before the Trump candidacy or presidency what they think of the show.


I guess that's true, the cultural context is different now.

You can call me anytime <3


And some things that should not have been forgotten were lost. History became legend. Legend became myth...


The Dropbox guy admitted he was wrong too, but we won't let him live it down. At least we don't remember his name.


What does that mean? Can you explain?


This comment trivializing Dropbox: https://news.ycombinator.com/item?id=9224


I for one do not think the comments trivialized anything. I see these as inevitable questions that a leader like Drew Houston should be prepared to answer at some point. Think of it this way: would you disqualify me if you were hiring and I asked a similar question at the interview?

I think the underlying actual question for Dropbox to ponder (but perhaps not answer publicly) was "what is your moat?" without the business development language and at least for me that remains an open question.

I remember I was in college when Dropbox started and I thought Dropbox was silly to offer more storage for inviting more users. It was easy to approach people and ask them to try something in college. I think I almost maxed out my invites. I remember feeling I was taking Dropbox to the cleaners...

One activity question for HN: why do you think Dropbox continues to exist today while copycats like Barracuda Copy have failed?


> One activity question for HN: why do you think Dropbox continues to exist today while copycats like Barracuda Copy have failed?

FWIW, i've been using Dropbox since practically the day they went online, and have been a subscriber for all but the first few months of that time. They've never once failed me, never been offline when i needed them, never lost anything, they created a Linux sync client out of the box (whereas Google first claimed they would make GDrive for Linux but never did), and it's easy to manage via multiple interfaces (mainly their own web UI and https://rclone.org, which can proxy your dropbox behind an ad-hoc FTP server running in your LAN).

They're extremely pricey compared to the same storage in GDrive, but Dropbox has never let me down so i'm in no hurry to drop them.


That comment is unfairly judged, taking into account how much they used AWS instead of their own solution.

"...Half-a-billion people stored files on Dropbox. Well, sort of. Really, the files were in Amazon’s cloud. .." [1]

[1] "The Epic Story of Dropbox's Exodus From the Amazon Cloud Empire"

https://www.wired.com/2016/03/epic-story-dropboxs-exodus-ama...


Why can't a company use cloud? It's not like there's Amazonbox client I could install now and get the same service. Dropbox never was about the storage itself, IMHO - it was about the ease of usage. There were enough options to get storage - cloud or non-cloud - when it came out, but none of them had the easy folder-based sync and sharing (public folder) of Dropbox.


It's not about whether the company uses the cloud. It's about whether they are solving the hard problems or instead outsourcing the technical challenges to their cloud provider while charging the user an unjustifiable premium for the intuitive user interface :-) So it's about the eventual value add and the business case.

The challenging part is to provide the back-end: get massive storage, with proper checksum integrity, at a cost-effective price and with flawless security.

A single command (and minor variations around it):

aws s3 sync <name_source_dir> s3://<name_target_bucket>

gets you all that. It can be easily mapped to an intuitive user interface by a software development boutique in weeks.

And because that is the difficult part, it seems that for the first 8 years it was solved for them by AWS, according to their own reported history.

Sharing a folder or file from Dropbox is no more intuitive than using the S3 console. Looking at the very colorful history of security issues around Dropbox [1] it's certainly not safer.

So why would a user pay more for storage than with a cloud provider, with notoriously less security? Does a tiny layer on top of one or more cloud providers justify the added value for the user, or the current valuation of 9 billion dollars?

[1] "Criticism of Dropbox": https://en.wikipedia.org/wiki/Criticism_of_Dropbox


And for what it's worth, systems like minio have emerged due to that commercial innovation. AWS has made a lot of investment in that system and will earn perpetual annuities from it, but by creating a de facto protocol we now get many choices that use its API. It's interesting because AWS never was benevolent with their S3 code; they just created a novel and rigorous offering that became worthy of emulation. Dropbox really hasn't made any equivalent contribution. If anything, the open source projects that have emerged to offer self-hosted alternatives to it are more feature-complete and extensible, such as Nextcloud/ownCloud, Pydio, Syncthing, etc.

side note: I'm filled with regret for every document that I authored on Dropbox Paper. It didn't last for long, but there was a time when it was the official wiki for a company I worked at. It's not impossible to export, but never with full fidelity.


> Sharing a folder or file from Dropbox is no more intuitive than using the S3 console.

I've not really used the S3 console but, at least on macOS, sharing a file from Dropbox is as simple as "right-click on the file in Finder, Share..." then enter an email address or create a share link from the popup. Or I can right-click and "Copy Dropbox Link" if the direct link is good enough.

Is the S3 console simpler than a right-click?


Even easier and safer: create a signed URL that expires after 5 minutes and send it to the user...

aws s3 presign s3://awsexamplebucket/test2.txt --expires-in 300
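
For anyone who wants the same thing programmatically rather than via the CLI, here is a minimal sketch with the AWS SDK for Go v2. The bucket and key are just the example values from the command above; the rest is standard SDK usage, not anything specific to this thread:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        cfg, err := config.LoadDefaultConfig(context.TODO())
        if err != nil {
            log.Fatal(err)
        }
        presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

        // Equivalent of `aws s3 presign ... --expires-in 300`: a GET link valid for 5 minutes.
        req, err := presigner.PresignGetObject(context.TODO(), &s3.GetObjectInput{
            Bucket: aws.String("awsexamplebucket"),
            Key:    aws.String("test2.txt"),
        }, s3.WithPresignExpires(5*time.Minute))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(req.URL) // hand this URL to the user
    }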


But now you have to know which bucket the file is in, have the AWS CLI tools installed, use Terminal, copy the URL to messages / email, etc.

How is this easier than "right-click, pick a menu option"?

(If I want an expiring Dropbox link, it's there under 'link settings' in the pop-up. Admittedly only has date granularity but still there as an option.)


Example from the console: "Setting up an Amazon S3 bucket to share files"

https://gist.github.com/pjkelly/be2cd3881e766620a411


Again, how is this easier than a single right-click in Finder? Also don't those files need to be public to create a link? You can't create a link to a non-public file which grants access either directly via the link or with a password?


...are we deliberately reenacting the original Dropbox thread here?


It usually ends up happening when the Dropbox thread comes up, it’s really interesting seeing both sides rehash the same points.


Their fundamental value is UX, not technology. Their technology could be whatever and users wouldn't care - the users care about syncing their files easily. A few geeks might flame about it on a tech forum, but nothing more.

Not that I like Dropbox too much. I'm going to switch to my self-hosted solution. But it's certainly not as easy as using Dropbox, even as someone who has used AWS frequently since 2012.


The median Dropbox user doesn’t know what S3 is or what ‘console’ means.


People seem really stressed that they've changed databases twice before. Their service never went down and they didn't stop delivering features while all this was happening, so I don't see the big deal. You pick what works at the time, and incrementally evolve as problems and solutions become apparent.

This article could have easily been "why we were down for a day while some auto-vacuum setting was broken." Then everyone would be like "why didn't you just scp a json file between machines" ;)


Another comment on this thread tried to dunk on them, saying that they'd done this so many times before (JSON to etcd, etcd to sqlite) that they must be getting really good at it. That comment struck me as so close to understanding what they're doing right here, it was painful.


Yeah. Maybe people are overestimating how expensive this kind of change is. We can all imagine being part of a larger org, where one day an All Hands appears on your calendar, and some exec you've never heard of proclaims the beginning of a 2 year multi-team database migration project. So now instead of making nifty software, you are just in 40 hours of meetings a week fighting over small details on this migration nobody wants to do, and everyone is miserable about it. Features stopped getting released, people leave the company, the weekly status meeting never makes the project seem anywhere near completion.

Say "switch databases on a successful project" and I bet people's minds go there. That is indeed expensive and painful, but if you have a small team, it's just being agile.

There is also the "punching up" / sour grapes aspect that can't be ignored. "I would have picked the right database from the start, but nobody's funding my crazy idea." Life just isn't fair, and HN is where you can vent, I guess.


I have the odd feeling that this kind of overestimation comes from the experience that folks tend to have in writing more traditional N-tier applications, especially post-Rails. A certain kind of database-first, ORM-first design philosophy really took hold after that, and it became somewhat common to see incredulity in response to telling someone that you weren't using something like MySQL or Postgres in production.

The kinds of projects that I more generally tend to see make these migrations successfully are the ones that design with in-memory data structures first and then sort out how best to persist that kind of thing to disk after their ___domain model has solidified a little bit.

Tailscale smells a lot like that to me.


When I used dynamic languages (Node.js and Ruby) I leaned on the database as the structural source of truth for types—it was the rigor a successful project needs.

But as I’ve been using only statically typed languages the last few years (on the backend), I’ve moved to designing the domains in the code, like you said.


Yup.

When Tailscale migrates databases, it is a few engineers working for a few weeks solving problems that already exist.

When a company with 50k employees does it, it takes two years and only makes things worse.


+1 on good design and programming.

My go-to stack of choice is PHP/MySQL. I've never not been able to scale a project. At one point while working at Comcast I prototyped a system that tracked every error code from every cable set-top box (150 million) in the country, on a MySQL instance running on a MacBook Pro, pulling data from Splunk in real time. All so I could generate some PNG charts and embed them in Slack.

I know there are limits to MySQL but so far in my career I haven't had any projects hit those limits because hardware has been getting better faster.


This comment makes me unreasonably giddy.

The replies are focused on MySQL (and to a lesser extent, PHP). None of that matters.

The hero here is the logging system serving as seekable stream of collected events. (I’m trying hard here to not say K-fka).

By consuming events from that source and being able to perform time-based seeking, you’ve offloaded (or dare I say, eliminated) your scaling pressures.

Even if you commit the “sins” of no indexes on heavy queries, using MyISAM, or building a giant, memory-intensive associative array… none of it matters.

You can choose to wipe your local dataset, start again in a different language, switch to another data store model, and none of your downstream consumers will know nor care (unless you blog about it).

*This* type of design, that prioritizes consumer-facing resiliency and continuity, is what I look for in systems that scale.


But my Kubernetes cluster on AWS can handle this for only $30,000 a month

That Comcast prototype sounds amazing. PHP and MySQL can really take you very far; I use them for nearly everything and it is so incredibly easy to develop and “deploy” new versions. I use Propel as the ORM; it stays very close to the DB, and I can change schemas fast and just let the auto-update handle the rest.


> But my Kubernetes cluster on AWS can handle this for only $30,000 a month

A k8s cluster for something like that wouldn't cost anywhere near that amount, plus you would get the advantage of being able to deploy your app in multiple availability zones. If you don't need k8s features, there are definitely simpler options that will give you redundancy and won't break the bank.


I only used a MacBook because it was faster to ask IT for one than to have my ticket processed and approved by the cloud and security teams, both of which required a write-up on why I needed a VM and what justification I had for needing it. Meanwhile IT is like "here you go".


"know there are limits to MySQL"

I have gone past MySQL's limits a few times. Whether you will go beyond its capability is a simple measure: can your hardware write fast enough to keep up? If your hardware maxes out writing 200 MB per second to disk, welp, MySQL maxes out at 200 MB per second. If you need 400 MB per second of writes, you need two instances that can write independently. That is when you are fucked. At that point you get into sharding and doing wacky things MySQL isn't meant to do. For reads from disk, MySQL's replication scales just fine.


That's what I mean though. I've never hit those limits. These days it's easy to put together a RAID 5 with NVMe disks and you'll quickly reach gigabyte-per-second write speeds.


What cloud service can you do that on? Seriously asking.


One of the largest forums online in 2003 was ezboard. A billion page views per month. Back then, that was beyond insane. I started work there as a young idiot. "What db you guys using, mysql?...." long pause... "We use a file system." I dug in. Yep. A fucking file system. Files allowed: simple backups, impossible speed, obvious cache systems, partitioning.

Later on I met the roommate of the author of memcache. A few hundred lines of C. Completely changed the game for how caching systems worked. Years later a super senior FB engineer explained to me over a beer: "we basically use mysql as a file store, everything else is in memcache." Good ideas are stupid simple.


The guy who wrote memcache is on this thread, I think?


And he even works at Tailscale now.


bradfitz lol he is.


nobody goes to these restaurants for their omelets, though at the same time nobody should care how tailscale actually implements things so long as it works well.

imo they’re not solving a notoriously hard problem, they’re realizing that they have the tiniest version of this problem (and that will most likely continue to be true). when that’s the case and you want to prioritize stuff that moves the needle like dev velocity, why do anything less simple? getting hung up on building scalable solutions far beyond what you need is somewhere between a fool’s game and a big-co employee’s game. far too often people introduce complexity way before it’s needed, and that almost always slows down your devs’ ability to actually fix bugs and make the product more reliable.


They don't, but there are other "finesse" dishes that people do in fact go to restaurants for, notable for their simplicity and for the fact that you can only pull them off with flawless technique. Japanese cuisine, in particular, is notorious for this.


There’s a manga about a food critic (Oishinbou [1]), where he takes another food critic to a hole in the wall Tokyo restaurant as an example of elite cooking. The chef-owner brings him his meal.

Which is just a piece of cooked fish, rice, and miso soup. The critic is first almost insulted by such a simple meal. But then realizes the fish, rice and soup are prepared to perfection, and is amazed at the chef’s skill.

[1] https://en.wikipedia.org/wiki/Oishinbo


Which is basically also the ending of the Pixar film Ratatouille. The protagonist wows a critic with a simple dish prepared to perfection. It’s a common trope, but also true. “Simple” food prepared well can be much better than a complex dish with many flavors.

Similarly, I’ve also heard (probably from a show like Top Chef, so YMMV) things like a salad or soup are also quick ways to differentiate chefs. In this case, it’s less about technique and more about identifying and mixing flavors.

It works because there is no place to hide. You either have good flavor or not.


Please could someone explain the "Bob Martin Sudoku Solver energy" reference?



my interpretation is that there is a certain class of people who think "if i just apply the process then i will get a satisfactory result", as bob martin does with using TDD to implement a sudoku solver instead of just thinking through the problem and designing a solution up front. i don't think it's a particularly great analogy because the bob martin example is about trying to use a "mindless" process to achieve a desired design, not picking an overkill option.


> serious Bob Martin Sudoku Solver energy to it

You mean Ron Jeffries?


The Bob Martin Sudoku Solver is the same thing but twice as expensive and replaced what little self-awareness it had with unnecessary sexual innuendo.


I might!


(Not to namedrop, but, uh) I know apenwarr glancingly IRL and he's amazing. A force of nature. When people talk about "10x" programmers I think of him.


I loved this article, despite the negativity.

A discussion that keeps recurring on HN is how you can get a lot done with simple, boring solutions. This is a perfect example of that in practice. What’s the simplest thing that’ll work? Do that, monitor the solution, and when you start hitting a limitation reevaluate what the next appropriate solution is.

There are plenty of comments here deriding this as insanity when we have established best practices, but “best practices for me are not necessarily best practices for thee”.

People aren’t perfect, not every engineer comes fully-formed grokking the industry’s best solutions and their trade offs.

This is what learning looks like. This is what an evolving system looks like. And this is how valuable software gets built every day.

Who gives a fuck about scalability on a PoC? “Let’s just put it in a JSON file”. I love it. Personally I start at the “use SQLite” step and go from there because SQLite is such amazing software. Thanks to this post I’m excited to try out Litestream.

You might rightly point out that Tailscale isn’t a side project/PoC anymore. But that’s exactly why they’re changing to SQLite and writing this post; this is their team learning to deal with larger problems and changing needs, and they’re sharing that journey with us.

As long as they’re not _ignoring_ serious risks let them experiment. But try not to get too upset if they’re making different risk/effort trade-offs than you would. Even if it’s “objectively” the “wrong” thing.

My rubric for technical sanity is:

- are you taking (and testing) backups?

- is your infra/data secure?

- are you monitoring what’s happening?

- can you recover from catastrophic failure in an acceptable time frame?

- are you meeting your legal obligations?

Cover those needs and you can be forgiven a lot of suboptimal/experimental implementation details.


> People aren’t perfect, not every engineer comes fully-formed grokking the industry’s best solutions and their trade offs.

The thing is, David Crawshaw is being extremely honest and humble here in admitting their shortcomings and how they're learning from it... but an entire troupe of Silicon Valley engs seem to have lost sleep over it, for reasons beyond me.


Are these even shortcomings? They made JSON scale for 18 months, which is an achievement. They seamlessly switched to etcd, which is one of the "right things" for distributed state problems. Like us, they discovered that when you adopt a "distributed state" solution, you inherit all the API limitations those things come with because nothing really solves (or can solve) the "multiple writers multiple readers real-time consistency" problem. Nothing broke, but etcd was a pain to work with: anything you store in etcd, the developers that work with it have to understand the rituals of etcd. Wouldn't it be nice to have that just be a SQL database? Oh, look: that's easy to do: use sqlite, and ship WAL segments.

There is negativity on this thread, but it strikes me as incredibly, even embarrassingly, ill-informed.


The question is, is this the simple solution? Perhaps I and others have been spoiled by cloud offerings, but having set up RDS once and never having to worry much about the db architecture for years as we scale will do that to you.


I’ll be honest and admit my own bias toward cheaper, self-managed tooling. That said, RDS definitely meets the criteria of simple.

From what I understand RDS works great and you’re happy to pay what feels like the tiny cost…right up until it doesn’t anymore.

You’re paying Amazon a premium for them to manage the tedious stuff. But sometimes dealing with the tedious is what teaches you how the stack works and makes you much more prepared to solve those more complex problems down the road.

The uncomfortable fact is that businesses end up locked in to a vendor just because they’ve let their own capability wither away.

Clearly David’s point about avoiding MySQL and PostgreSQL due to fears of “vendor lock-in” resonates with me, even if I personally think that’s not quite where the threat is.

…for Postgres anyway. I’ve heard enough stories that I never want to touch an Oracle product in my career.


I favorited your post. It has taken out some insecurities that I had about best practices. I couldn't explicitly formulate them either. Thanks for the (professional/career) advice/therapy :)


They're using Litestream [1] to replicate a SQLite database; it streams additions to the SQLite WAL file to object storage and can replay them back. This is fairly hands-off from SQLite's perspective. There's also the SQLite Session Extension [2], which is a built-in way to support generating and applying "patches".

I'm curious how these tools will mature, it seems like a good match for microservices.

[1]: https://github.com/benbjohnson/litestream

[2]: https://www.sqlite.org/sessionintro.html


The litestream project was created by https://github.com/benbjohnson who wrote https://github.com/boltdb/bolt (a key value store) which has been instrumental (from my point of view) in the Go community as one of the original choices for an embedded database as it was not only fast, but had transactions with stable snapshots.

It was used by https://github.com/blevesearch/bleve, https://github.com/etcd-io/etcd, and number of other projects.

These days, https://github.com/dgraph-io/badger is often favored because of its improved write throughput.


Bolt author here. Badger is a good database but it's mostly a tradeoff of write versus read throughput, specifically, range queries. Badger is an LSM so it doesn't perform as well (last time I checked) with iterating over ordered key/value pairs. LSMs have bloom filters to speed up point queries so those aren't as much of an issue.


Another one of Ben's projects is on the front page right now:

Postgres wire compatible SQLite proxy - https://news.ycombinator.com/item?id=30875837


It should be noted that BoltDB has been retired as it suffered from a number of congenital defects. It was implicated in the major Roblox outage that happened a couple months ago, and Consul doesn't use it anymore.

https://news.ycombinator.com/item?id=30015913

(This comment is not to suggest that the Litestream project is of poor quality or anything like that.)


BoltDB author here. Just to clarify, the project was retired because of maintenance burden. CoreOS wanted to make some changes but I didn't have the bandwidth to test and maintain it all. Since this was before Go had good version management, we decided that they could fork the project as "bbolt" and users could move over as needed. They did a good job maintaining it so I eventually archived the original project.


It is strange that, although https://github.com/syndtr/goleveldb has been used in the go-ethereum project for years, it is rarely mentioned (compared to other Go DBs).


Yes, it has its own trade-offs, but it is certainly a top performer for certain workloads: https://github.com/smallnest/kvbench


I watched an interview of Mr. Hipp, creator of SQLite, that I can't find now but was pretty interesting. Aside from him being way different than I expected, in a good way - very down to earth and friendly - he was asked specifically about that, and more or less answered that his job was to write a solid DB, and replication can be done elsewhere. That's a pretty honest answer, and it looks like someone took up the challenge.


He was on an episode of the Changelog podcast in the fall and went into similar topics: https://changelog.com/news/RRAw/visit It's a good listen.



Not commenting on tailscale, but for the state of databases.

Sometimes boring is the right choice. PostgreSQL has worked for decades now, and seems to have regained much of the performance that MySQL once boasted.

If you do this for the money, investing in tried and true (but boring) software should be the default solution.

I watched with dread how the MongoDB fiasco played out a decade ago. Meanwhile, I kept using PostgreSQL and had a good ride.


Yep.

IMHO MySQL becoming boring (fast, pretty reliable, non-compliant quirks being ironed out, decent backup tool availability, credible admin tools, JSON and CTE support) is one of the most important long term trends on the internet.

Along with the parallel evolution of PHP.


I strongly dislike the combination of php and MySQL, from the (recent) experiences I’ve had with it. I don’t see a good reason to pick that as a greenfield stack, but I could be convinced otherwise.


For me it's not necessarily the tech themselves, but the lack of any framework to guide development for the next guy.

All the PHP+MySQL I've worked with has been horribly bespoke and brittle.

Want to significantly change a URL? Need to refactor every path in the .PHP file, because it's importing something from ../.. Or get in the spaghetti business with path aliasing on the webserver.

Want to refactor the database schema? Difficulty: Impossible, because raw SQL strings are scattered everywhere, including on other systems.

Want to add/modify a new internal CRUD form? Gonna be several hours of work to RE what the last guy did to keep things consistent, while tiptoeing into the SQL to not break anything. With a proper framework, this is a 5 minute task and a few LOC into the "admin interface".

It works, and you can do a lot with just a few php files and a basic Linux system, but there are downsides.


All of these problems are created by the developers.

I have created many on-the-fly mini-PHP frameworks to use within existing legacy code bases to avoid all of this.

The easiest way to avoid manually requiring files is to use classes with the autoloader. Done.

The second easiest solution is to have constants that hold the full path to a directory, and then just use the correct constant when doing a require. No relative paths required.

You can even have one require constant per file if that is your game.

Raw sql is fine, just don’t scatter it in every file, instead put sql in specific repositories, e.g ArticleRepository, and do all article queries in there. Now it is easy.

An ORM actually creates more problems when refactoring because entities are spread out everywhere.

If you use something like Phpstorm IDE it can help refactor raw sql too.

The problem is that many developers don’t think this through and, even worse, they accept the situation and don’t do anything about it even when they could. Instead they go to forums and complain. I’m sorry, but I have zero tolerance for that.


A decent ORM framework provides all those things out of the box, and I disagree that the ORM creates more problems than it's worth. It is a fine tool to accelerate development.


There are several PHP frameworks worth using.

Laravel is, IMO, among the best web frameworks available full stop. Certainly one of the most maturely documented.

It addresses all the points you make.


I had a 10 year career doing PHP, in the last year or so I also enjoyed Laravel and built 10 or so sites with it. It did many things right.

Eventually some Blade template error resulted in an impossible stack trace due to the insane amount of magic behind the scenes in Laravel, and while debugging that I decided to stop relying on such a black-magic hack of a framework. Around the same time Go gained in popularity, and I have been working with Go since and am very happy with it.

This was Laravel 5 days and I'm sure much has changed.

I completely adored the Laracasts made by Jeffrey Way. I wish every framework had a guy like that to produce learning materials.


Frameworks help developers move in the right direction, and that is a good thing; however, when you have used a framework long enough you start seeing the cracks.

Frameworks solve a general problem; you, on the other hand, have a specific problem. To be able to accommodate your specific problem and everyone else's specific problems, frameworks tend to rely heavily on magic.

* magic classes - using PHP magic methods for everything; the actual class is usually empty and things happen elsewhere.

* magic files - put a file with the correct name in a folder somewhere and things happen

* magic configuration - put a hard coded string in some config file and you change the entire behaviour of the app.

* magic layers - if you follow the execution path with the debugger you spend the majority of time circling around in different layers and very little time in your own code.

All of these things are considered bad practice when doing it in your own code, but for some bizarre reason it is considered good practice in frameworks.


> All of these things are considered bad practice when doing it in your own code, but for some bizarre reason it is considered good practice in frameworks.

I think that’s because that kind of code has a much higher maintenance burden. It’s much more difficult to get up to speed on what’s going on in complex magical code. But you can probably trust a framework that’s achieved critical mass to make reasonable decisions and stay alive for a while.


Yeah I would love if these places had a solid Laravel deployment, but they sadly don't.

And if I'm gonna deploy a new framework greenfield, I'll just go with Django since Python has more mindshare in my industry.


MySQL + PHP (Laravel + Composer + Lighthouse + Spatie MediaLibrary) is very productive.


I agree it's _initially_ productive, but then it becomes very difficult to maintain.


Why/where, particularly in Laravel?

Laravel has an excellent ORM, a great query builder (I write almost no SQL), an effective database migrations system, a solid class hierarchy (no include statements anywhere), a really useful job queuing system, and it is built around Composer (one of the best package managers anywhere).

Lighthouse is really a first-class, schema-first GraphQL binding with a very logical code interface.

I don't see any code maintenance problems in Laravel that can't be mitigated by the same discipline you need anywhere else.


I don’t have anything objective to back this up, but in my experience it requires a lot more discipline and initial knowledge from the devs. Much of the existing documentation for PHP is just plain wrong, or horrible advice. It’s difficult for new devs to learn these practices. PHP lets you be ‘lazy’ and hack in shortcuts to solve problems - lots of discipline required there.

If you have a solid team and some people with a good background in the framework and PHP that can mentor effectively, then it probably won’t be bad. It still would not be my first choice though.


Compared to what?


You say 'sometimes'; I think it should be the default position.

When you want to deviate from the boring software and technology, there must be a very sound 'business case' for it.

If you are building a planetary scale business, OK, maybe that warrants something more fancy.

But how many companies really need that? I'm not limiting myself to the HN unicorns, but looking at the whole market.


I agree with you and I also wrote:

> If you do this for the money, investing in tried and true (but boring) software should be the default solution.

Meaning if you are just playing around, then by all means evaluate new and shiny stuff.


It depends. I was in a demo call with Blaze, a CDP (Customer Data Platform) and Segment.io competitor. Their underlying db seems to be MongoDB, and for a CDP that makes a lot of sense. In a CDP you collect all kinds of data on the n hard IDs that identify one person interacting with your web assets; schemaless and JSON-first is a lot easier to reason about in this context than SQL. This is because you keep enriching your profile with additional attributes over time. Can PostgreSQL do this? Absolutely, but it’s not its main feature.

Things are complex, especially in data, I think it’s important to evaluate use cases rather than going with generic assumptions like “use boring tech”.


> I watched with dread how the MongoDB fiasco played out a decade ago.

Could you elaborate on this? MongoDB as a company is worth $30B, so it looks like they did at least some things right.


Their paid managed service works well for enterprise SaaS companies who need < 3-4 TB of storage max ever per customer cluster and have enterprise scale users (<100k users).

It also works well for early stage Consumer Internet startups that haven't yet achieved huge growth but care a lot about developer productivity while churning features at high velocity.

But it gets blamed for reliability issues suffered by highly successful Consumer Internet companies that have achieved scale, have DAU/MAU above 10M, and have a lot more than 4 TB of data in a single cluster.

But that's basically a good problem to have at that stage. Usually such a company would have many more problems – monolithic application with monolithic database with huge unmanageable schema and indexes gone wild etc. Usual solution at that stage would be some sort of rearchitecture towards Microservices with multiple specialized databases for different use-cases – usually cloud hosted managed databases with horizontal scalability dedicated for online user-path workloads and separate OLAP tech stack for offline ETL/analytics workloads.


At work I have a multi-petabyte cluster in MongoDB Atlas with no issues. The managed service is fantastic; I'm going to need some citations on how it degrades after a few terabytes. I don't really like MongoDB's document-model-everywhere approach but I have to respect how well their managed service works.


Grossly misinformed post. 4TB (compressed) per shard may be what you're thinking of?


There were many new Node projects using MongoDB and I remember several failed, but the details elude me.

There was a Slack competitor using MongoDB that was snappy and JS-hipster trendy but ultimately failed because of an unreliable database, IIRC.


How many crypto currency projects went down with $million+ losses due to using mongoDB as the backend?


they did their marketing right.


I believe using a large JSON file is not half bad, but you do run into the problem of how you index and query it in a meaningful way. I actually ran into this problem when building a game because a document doesn't provide a great model.

The relational model is... JUST... SO... GOOD. And, it is a shame that most of the relational systems are so complicated.

A document within Adama (https://www.adama-platform.com/) is basically a giant JSON file held within memory with clients connected via a WebSocket. I'm basically building my own indexing since I want the indexing to be reactive, and I've got exceptionally fun optimization problems.

Litestream, in my opinion, is a great way to get started. Actually, it's beyond fantastic if you maintain the one-server-to-one-database model, because migrations become so easy. So, I applaud the team for taking this simple approach.


If you have a "single source of truth", sqlite/Litestream seems like a pretty-near-optimal way of taking advantage of the relational model while keeping design simplicity. I don't know what the higher-level architecture of Tailscale is, but we have the same problems; we have a complicated "single source of truth" that takes the form of a Consul cluster, but sqlite makes an absolute ton of sense for us, because we can condense the Consul cluster down to a SQL schema and then ship it around our fleet.

(we don't use Litestream right now; we're pushing the complexity up a layer in our design instead)

A lot of people on this thread are looking down their noses at sqlite, but I think they're kind of beclowning themselves; the unreasonable effectiveness of sqlite has been a meme in infra dev for a couple years now. It's not a new idea. Lots of people are doing stuff like this.

The funniest bit on this thread is the person saying they should use RDS, as if their infrastructure was just a big Rails app.


It would be interesting to know why a standard boring RDS setup wouldn’t solve their problem completely. In fact I would be more interested to understand that than the actual details of SQLite tailing.

(I think the reason they gave was vendor lock-in, but apart from that, I didn’t understand why it wouldn’t be adequate)


One obvious reason not to use RDS is wanting disk-local caches of information replicated from a single leader, rather than having every single machine in your fleet calling out to an external service on every read. That's certainly why we're not considering Postgres in our infrastructure, even though managed Postgres is a product we in fact offer.

We use Postgres! It's the backing state for our API, and an important source of truth in our architecture. Postgres is a great way of serving a GraphQL API. It is not necessarily a good way to back an infrastructure service.


Sounds like Postgres LISTEN/NOTIFY could be viable for your high-read-low-write use case? Or is it not scalable enough for the fleet size?


That's still all your infrastructure components calling out to an external service on every read --- and for what advantage?

This isn't an app server; it's an infrastructure component, running (I don't know about Tailscale here, but we use SQLite in similar uses cases) on potentially hundreds or thousands of machines.


No, you don't need to make any network call on every read with listen/notify. It will essentially be a local cache, working the same way as etcd watcher, using one postgres connection per machine.
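
A rough sketch of that pattern in Go, assuming the lib/pq listener, a hypothetical "config_changed" channel, and a payload that names the changed key (none of these names come from the thread; it's just an illustration of keeping a local cache warm via NOTIFY):

    package main

    import (
        "log"
        "sync"
        "time"

        "github.com/lib/pq"
    )

    var (
        mu    sync.RWMutex
        cache = map[string]string{} // local read cache; reads never leave the machine
    )

    func main() {
        connStr := "postgres://app@db.internal/app?sslmode=require" // assumed DSN

        listener := pq.NewListener(connStr, 10*time.Second, time.Minute,
            func(ev pq.ListenerEventType, err error) {
                if err != nil {
                    log.Println("listener event:", err)
                }
            })
        if err := listener.Listen("config_changed"); err != nil { // hypothetical channel
            log.Fatal(err)
        }

        for n := range listener.Notify {
            if n == nil {
                // Connection was re-established; conservatively drop the whole cache.
                mu.Lock()
                cache = map[string]string{}
                mu.Unlock()
                continue
            }
            // The payload identifies the changed row; invalidate it so the next
            // read repopulates from Postgres. Ordinary reads keep hitting the map.
            mu.Lock()
            delete(cache, n.Extra)
            mu.Unlock()
        }
    }

The trade-off is that you still keep one open Postgres connection per machine, and a cache miss still means a network round trip; whether that beats shipping the whole dataset to local disk depends on the fleet size.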


Makes sense once you think of Tailscale as an infrastructure-level service and not just an app. Thanks


Really curious…how do you meaningfully overlay any indexes on top of a big-ass JSON file? Technical details are appreciated, and no problem if it’s your secret sauce—just very curious how this is accomplished!


So, a funny thought experiment: what happens when you parse a JSON file? At the same time, you are also indexing by primary key (the field name).

So, I mirror this thinking and have the document be just an object, with the keys being the primary key. Then, I simply index all the children by their fields, based on hints from the developer via the index keyword.

So, if you have

  record R {
    public int id;
    client int owner;
    int age;
    index age;
  }
  table<R> rows;

then queries for age can be accelerated by the table.

like "iterate rows where age==42" will basically hone in on the bucket of age==42. I currently only index clients by hash and integers.

The critical aspect which makes this work is that I monitor all mutations. When a child object has a field mutated, it is removed from all indices and placed into an "unknown" index. Any queries will also consider that bucket, since the purpose of an index is simply to narrow the field. Once data changes are persisted, the index is updated and items are moved out of the unknown bucket. This works fairly well because the indices are primarily used during the privacy-check phase.


The normal way? You can implement whatever kind of index you like — b-tree index, bitmap index, hash index are all useful and conceptually simple if you're familiar with the backing data structures.

For example, if you want to index a "foreign key" id stored in each "record" in a JSON array of objects, you build a hash table from the FK id values to the JSON array indices of the objects that have that id. It can be as stupid simple as an `fk_index = defaultdict(set)` somewhere in your program, to use a Pythonism.

Now when someone wants JSON objects in that array matching a given FK id, they can just O(1) look in the index to know the position of records that match. Much better than an O(N) scan of every item in the array.

Of course you have to maintain the index as writes to the JSON happen, but that's not bad once you understand how things work. No real secret sauce.
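
The same idea as the defaultdict(set) above, sketched in Go with a made-up record shape (owner_id standing in for the foreign key):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Hypothetical shape of one record in the big JSON array.
    type Record struct {
        ID      int `json:"id"`
        OwnerID int `json:"owner_id"` // the "foreign key" being indexed
    }

    func main() {
        raw := []byte(`[{"id":1,"owner_id":7},{"id":2,"owner_id":7},{"id":3,"owner_id":9}]`)

        var records []Record
        if err := json.Unmarshal(raw, &records); err != nil {
            panic(err)
        }

        // Build the hash index: owner_id -> positions of matching records in the array.
        fkIndex := map[int][]int{}
        for i, r := range records {
            fkIndex[r.OwnerID] = append(fkIndex[r.OwnerID], i)
        }

        // O(1) lookup of positions instead of an O(N) scan of the whole array.
        for _, i := range fkIndex[7] {
            fmt.Println(records[i]) // {1 7} and {2 7}
        }

        // On writes, append to / delete from fkIndex alongside the JSON mutation.
    }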


The secret sauce may be the need to take control of the write path.


Honest question: what do the sqlite authors think of litestream?

I seem to recall Richard Hipp on the "changelog" podcast mentioning it, but I don't remember what he said.

2 episodes here: https://changelog.com/person/drh/podcasts#feed

I think it was very neutral, something to the effect of "there are multiple solutions".

But I know essentially nothing about sqlite internals so I can't judge, or maybe that's why I didn't understand what he said.

I guess I'm wondering if it's a recommended/supported mode of operation in sqlite. What are the failure cases? How much data can you lose?


> Honest question: what do the sqlite authors think of litestream?

Litestream author here. Dr. Hipp and his team reached out when I first released Litestream and we had a video call together. They were fantastic. Really friendly and down to earth. I explained how I put together Litestream and we went back and forth on different approaches. They were really helpful with understanding some of the shared-memory stuff that makes the new read replication work. They ended the call by asking me if there was anything they could help with or anything I needed.

> I guess I'm wondering if it's a recommended/supported mode of operation in sqlite.

I don't think the SQLite team endorses any tooling outside of what they build AFAIK. So in that sense, no, it's not officially supported. However, the API for maintaining checkpointing is publicly available and there are docs for controlling it from outside processes (which is what Litestream does).

> What are the failure cases? How much data can you lose?

Litestream is designed so that it keeps retrying in the event that you can't connect to a replication destination. S3 is pretty reliable so it may be a network outage that could cut you off. You can also enable Prometheus metrics to be reported out of Litestream if you want to add monitoring and alerting.

As far as data loss, by default it's set up to bundle database changes together every 1 second, compress them, and upload them to somewhere like S3. So your window for data loss is 1 second unless S3 goes down.

You can also run regular backups with the SQLite CLI and upload those as a fallback. That's what I typically do since it's really cheap and easy to set up an hourly cronjob and I'm overly paranoid. :)
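
For anyone who would rather keep that fallback snapshot inside the application process instead of a cronjob plus the CLI, here is a rough equivalent sketch in Go using SQLite's VACUUM INTO (assuming the mattn/go-sqlite3 driver; the database path is made up and the upload step is omitted):

    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "time"

        _ "github.com/mattn/go-sqlite3" // assumption: any Go SQLite driver works similarly
    )

    func main() {
        db, err := sql.Open("sqlite3", "app.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Hourly fallback snapshot alongside Litestream's continuous replication.
        for range time.Tick(time.Hour) {
            dest := fmt.Sprintf("backup-%s.db", time.Now().UTC().Format("2006-01-02T15"))
            // VACUUM INTO (SQLite 3.27+) writes a consistent copy of the database
            // into a brand-new file.
            if _, err := db.Exec(`VACUUM INTO ?`, dest); err != nil {
                log.Println("backup failed:", err)
                continue
            }
            // Upload dest to S3 (or wherever) here.
        }
    }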


Wow that's great that you were able to discuss with the SQLite team directly. Did you discuss how litestream compares to the session extension?


We didn’t discuss the session extension. That extension solves a different problem of having multiple primaries that need to merge their data. I haven’t used it personally but it looks great for that purpose.


Cool, thanks for the response!


I'm so desperate for a SQL database where I can just put it on a bunch of commodity hardware via Docker, connect them up and never worry about this again. Ideally it'd monitor my queries and create indexes for me. pleeeeeeaaaaaaaaaaaassssseeeeeeeeee

/dream

FoundationDB is similar but you have to do so much yourself. It's more or less what I'm describing for a key value store though. Not sure why it's not more popular.


I am having quite a good time with CockroachDB. It has been mostly set-and-forget, and it auto-balances, and does rolling upgrades pretty smoothly.


Solving performance by mindlessly adding indexes works great as long as you don't have too much data and have endless RAM.

The index backing the unique constraint on the URL table for my search engine is around 23 GB. The entire server has 128 GB of RAM, for comparison.


I am hoping EdgeDB heads this direction to some extent.


FoundationDB is an Apple thing, isn’t it? Might explain why. It feels kinda like they’ve put it out there and it just hasn’t had anyone with a marketing budget to push it along.


It started as a proprietary database. Apple acquired it and then open sourced it eventually.


Didn't it start as an open source database? Then Apple acquired it and made it closed source, then open sourced it again after several years.

It is weird that this DB has a good reputation which doesn't match its popularity.


FoundationDB was the first publicly available distributed transactional KV store on the market that I can recall. It wasn’t “open” though - always proprietary. They generated a lot of hype and good press for being this NoSQL DB that was all about ACID correctness in the age of “mongo is webscale (by writing to /dev/null)”. They had a demo showing impressive throughput on a low-power cluster with no data loss during a power cycle. I think that despite generally impressive tech they had some trouble selling, but did end up selling the company to Apple.


No, it was not open source prior to Apple releasing it as Apache 2.0 in April of 2018.


The source was licensed to some people but it wasn’t open source IIRC.


I lamented this exact situation in a post I made last week. I'm with you on this completely.


You probably don't really want that. But not in the way that people didn't want WoW classic.


Can you elaborate on why you say this?


Because it would probably lead to the database doing stuff that you don't want it to do, with no way for you to tell it to do something else.


I was pondering using simple decentralized databases that are kept synchronized asynchronously.

I arrived at a design where SQLite databases would be synchronized via Kafka. Kafka is really robust, and has friendly semantics when configured for in-order delivery.

The catch is, you don't issue writes to SQLite anymore. You write Kafka messages, and have to be prepared to resolve concurrency problems at read+write-to-db time. For example, user registration is not an atomic action anymore - you write a registration-attempt message, and then it might race with a concurrent registration attempt from another node (say due to retries and random load balancing).

You resolve the race at message-reception time (doable due to the stable order seen by multiple reader nodes). I expect this would work nicely, but it requires you to rethink all actions that used to be synchronous db writes.
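
A rough sketch of what that consume-and-resolve step could look like in Go, assuming segmentio/kafka-go, a single-partition "registrations" topic keyed by username, and each node consuming the full topic into its own SQLite copy (all of these names are illustrative, not from the comment above):

    package main

    import (
        "context"
        "database/sql"
        "log"
        "os"

        "github.com/segmentio/kafka-go"
        _ "github.com/mattn/go-sqlite3"
    )

    func main() {
        db, err := sql.Open("sqlite3", "local.db")
        if err != nil {
            log.Fatal(err)
        }
        if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS users (
            username TEXT PRIMARY KEY, payload BLOB)`); err != nil {
            log.Fatal(err)
        }

        host, _ := os.Hostname()
        r := kafka.NewReader(kafka.ReaderConfig{
            Brokers: []string{"kafka.internal:9092"},
            Topic:   "registrations",
            GroupID: "replica-" + host, // each node is its own group, so every node sees every message
        })
        defer r.Close()

        for {
            m, err := r.ReadMessage(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            // Deterministic conflict resolution: because every node sees the
            // messages in the same order, "first registration attempt for a
            // username wins" gives the same answer everywhere; later attempts
            // become no-ops.
            if _, err := db.Exec(
                `INSERT INTO users (username, payload) VALUES (?, ?)
                 ON CONFLICT(username) DO NOTHING`,
                string(m.Key), m.Value); err != nil {
                log.Println("apply failed:", err)
            }
        }
    }

The sketch leans on the ordering guarantee mentioned above: it only holds per partition, so you either keep a single partition or partition by the key you resolve conflicts on.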


There are more robust solutions out there for this architectural pattern. Look into Dolt (DoltHub) and Noms.


I love this. If it works, keep it until shortly before it stops working. The older I get the more love I have for simple solutions that are easy to work with (and change when you've grown out of them).

These days I wince when I hear of overcomplicated solutions for an MVP that "will scale" at a nebulous point in the future, while we're going to be paying the complexity penalty every day for a benefit that may never come.


I for one am enjoying this. Sure they could probably just use a regular ol' database, but where is the excitement in that!


sqlite is, like, the regular-est of the databases.


Not quite; I got bitten by the weak type system more than once.


Enough people have had this complaint that sqlite has added STRICT tables: https://www.sqlite.org/stricttables.html
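
A quick illustration of what STRICT changes, sketched with Go's database/sql (assuming the mattn/go-sqlite3 driver built against SQLite 3.37 or newer):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/mattn/go-sqlite3"
    )

    func main() {
        db, err := sql.Open("sqlite3", ":memory:")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // STRICT tables enforce declared column types instead of best-effort coercion.
        if _, err := db.Exec(`CREATE TABLE users (id INTEGER, name TEXT) STRICT`); err != nil {
            log.Fatal(err)
        }

        // On an ordinary table this would happily store the text; on a STRICT
        // table it fails because 'not-a-number' can't be losslessly converted.
        _, err = db.Exec(`INSERT INTO users (id, name) VALUES ('not-a-number', 'alice')`)
        fmt.Println(err)
    }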


Oh, finally! I haven't heard about it. Thank you!


Funny how, in here, people praise SQLite but dismiss MySQL when it's a tiny bit off the SQL standard, and people complain about how PHP converts values with a best-effort guess across types but not when SQLite does that too.

Don't know why anyone would use SQLite in production except as some KV store embedded in an app.


Because an SQL engine that defers control over connections, concurrency, storage, and process management to users is extremely valuable for multiple reasons, including simple cases like caches where you'd ordinarily just store things in flat files or a BerkeleyDB-style database, and complex cases like distributed statekeeping where existing SQL databases aren't going to do anything close to the right thing for your application, but you also don't want to reinvent SQL and query planning and write-ahead-logs to build your complex database.

The vibe I get from this thread is that a lot of people have only experienced SQLite from the vantage point of something like Rails running in a "test" environment, and are used to explaining to people on some PHPbb somewhere that you can't really use SQLite to back a multiuser SAAS app. That's as may be, but when you're building infra components, you control who the "users" are; not everything is a web server taking requests from all comers.


Adding to what tptacek said, it seems you're also out of date with SQLite, they added strict typing recently.



SQLite is the most deployed database in the world


Not for this kind of application. Same as Java being on 13 billion devices; it doesn't mean anything if it's some config store underneath some mobile or desktop app.


> some config store underneath some mobile or desktop app

Do you understand what kind of application Tailscale is? For all intents and purposes it's exactly what you described above.


Excitement is only for people who are learning.


you should always be learning. and excited when you are. do you know everything?


There are projects you can try new things with, and projects that should just work when many people are expected to use them. Don't push your excitement onto the users.


I really hope they aren't taking the same tack when dealing with the encryption portion of their service.


They're using WireGuard for the tunneling. I don't think a lot of people will object.


What would be the analogue?

Using openssl with a wrapper service? Doesn’t seem particularly dubious to me.


> Using openssl with a wrapper service?

Coincidentally, one of tailscale's founders already built this: https://github.com/sshuttle/sshuttle


They have been trying to create or reinvent the database rather than using tried and true solutions. The analogue would be rolling your own cryptography.


Related:

Why I Built Litestream - https://news.ycombinator.com/item?id=26103776 - Feb 2021 (176 comments)


At first I was also a little confused by this architecture. I think that was because I was missing a clear picture of what they want to do. I think it is:

- one leader that can write to the db

- a ton of other hosts that can read from the "streamed" db and want to do so often/fast

- hosts send their writes (if they have any?) to the leader via some api call (?)

Is that right? If so, the solution makes a ton of sense. "Just use pg/MySQL" in that case would mean hundreds (thousands?) of read replicas, which doesn't sound fun, or every read over the network.


It's replacing etcd, which has essentially that model.


It looks like you guys use AWS. Why not just use RDS if you don’t want to deal with database management?


Disclaimer: I hate doing ops, but I’ve been in a position to actively hate doing it fairly regularly for several years of my career. My perspective is as a person who doesn’t want to deal with any of this kind of stuff. So I’ve probably failed to acquire knowledge which would make it less frictionful for me, purely from lack of interest.

RDS has some significant downsides which I would personally consider no go if I were in a position to evaluate it. The one which stands out as particularly painful from my past experience is… it’s excruciatingly slow to provision or make configuration changes. Like lose whole days of work to a few iterations of trial and error slow. Combined with AWS’ sprawling and inscrutable set of authorization and configuration options, the weird idiosyncrasies between most of their offerings, and the absolutely opaque naming applied to most of those offerings… trying to use RDS effectively as a managed database service felt more to me like becoming a full time ops professional.


Idk, I just use the Terraform module and set the Helm values on whatever should consume the db with whatever I get back from the module then call it a day. Then again, I'm not scaling big at all.


Some engineering teams always seem to take overcomplication one step too far. It's very hard to estimate future work, and overconfident engineers consistently downplay the costs.


I'm not sure why you're being downvoted. You're right: running databases isn't for every team! Reach for a managed database if you can.

But tailscale isn't some random group of engs. They've probably got the chops to pull off literally anything they want to. I mean TFA casually mentions online cross-database transfers, multiple zero-downtime schema migrations, inspecting litestream's replication code for feasibility, deftly modifying sqlite WAL checkpoints... all in one breath.


if they're this talented, is their time really best spent on dba work rather than improving the product?


> is their time really best spent on dba work...

It seems to me that tailscale engs want to avoid DBA work but also not use managed offerings, and so, they're comfortable paying the costs they have to (such as multiple migrations).

> ...rather than improving the product?

Well, you'd guess they want to be able to continually improve their already credible product too. When TFA points out that zero vendor lock-in and hassle-free, local end-to-end tests are non-negotiable, I think it is for this reason.

----

> if they're this talented, is their time really best spent on...

From: https://tailscale.com/blog/go-linker/

"People are often surprised and sometimes horrified when they learn that Tailscale maintains its own fork of the Go toolchain. Tailscale is a small startup. Isn't that a horrible distraction, a flagrant burning of innovation tokens?"

"Maybe. But the thing is, you write code with the engineers you have."

"We had a problem: We kept crashing on iOS, and in addition to being awful, it was preventing us from adding features."

"Another team might have decided to cut even more features on iOS to try to achieve stability, or limited in some way the size of the tailnet that iOS could interact with."

"Another team might have radically redesigned the data structures to squeeze every last drop out of them."

"Another team might have rewritten the entire thing in Rust or C."

"Another team might have decided to accept the crashes and attempted to mitigate the pain by making re-establishment of connections faster."

"Another team might have decided to just live with it and put their focus elsewhere."

"The Tailscale team has Go expertise, spanning the standard library to the toolchain to the runtime to the ecosystem. It’s an asset, and it would be foolish not to use it when the occasion arises. And the fun thing about working on low level, performance-sensitive code is that that occasion arises with surprising frequency."

"Blog posts about how people solve their problems are fun and interesting, but they must always be taken with a healthy dose of context. There may be no other startups in existence for which working on the Go linker would be a sensible choice, but it was for us."


> When TFA points out that zero vendor lock-in and hassle-free, local end-to-end tests are non-negotiable, I think it is for this reason.

if zero vendor lock-in and hassle-free, local end-to-end tests are non-negotiable, why are they using s3? migrating to another s3 compatible backend would be similar in effort to migrating from aurora mysql or postgres to another managed mysql or postgres service or to self-hosted mysql or postgres


First, S3 and a SQL database aren't comparable. But I think you're bringing up S3 because they're using Litestream to ship WAL frames to S3. Go read the Litestream documentation; Litestream syncs to basically anything. They don't need to "migrate to another S3 compatible backend"; they can migrate to almost anything that can save a file.

It's a super confusing argument regardless, because the industry is lousy with "S3-compatible backends".


> migrating to another s3 compatible backend would be similar in effort to migrating from aurora mysql or postgres to another managed mysql or postgres service or to self-hosted mysql or postgres

You may be right. I have no experience migrating litestream but from the docs (https://litestream.io/guides/) it is literally cp'ing files from S3 to wherever and exec'ing one of these one-liners (of course, the devil is in the details):

   litestream restore -o my.db s3://BUCKETNAME/PATHNAME
   litestream restore -o my.db abs://STORAGEACCOUNT@CONTAINERNAME/PATH
   litestream restore -o my.db gcs://BUCKET/PATH
   litestream restore -o my.db s3://SPACENAME.nyc3.digitaloceanspaces.com/db
   litestream restore -o my.db s3://BUCKETNAME.us-east-1.linodeobjects.com/db
   litestream restore -o my.db sftp://USER:PASSWORD@HOST:PORT/PATH


Litestream author here. Yeah, you're basically right, but it's simpler than a DB migration. No need to copy the old data over. You can remove the `-litestream` metadata directory and point it at a new replication destination, and it'll automatically re-snapshot the database and begin replication.


Some examples: they could get rid of that pointless Bootstrap on their website, they are shoving almost 1.5 MB for a single font alone on their main page, and their HTML semantics are nowhere to be found. This will all impact their bounce rate, accessibility and SEO.

I just don't believe the tale of "such skilled engineering teams" which don't show that in their products but blogposts.


What if I told you that the product engineering team is almost never the same team maintaining the website; hell it’s likely the website is contracted out and maintained by the marketing team.



So… litestream is bidirectional? Nothing can go wrong when daemon A and daemon B write almost at the same time?

Edit: I just found this tips and gotcha page: https://litestream.io/tips/


Litestream author here. I'm glad you were able to track down what you were looking for. For posterity, I'll add a response here. Litestream is a one-way, physical replication tool so you have a single primary node that continuously copies changes out to an external destination (S3, SFTP, etc).

If you're looking to have multiple primaries then you can try using the SQLite session extension[1] and copying patch files. That's quite a bit more complicated though.

[1]: https://www.sqlite.org/sessionintro.html


This is great. We've been using sqlite in production as our exclusive database storage solution for over half a decade. For non-trivial, multi-user systems too.

Replication of data has always been a concern for us, with the current mitigation being periodic VM snapshots. In order to engage larger customers, we would need to tighten this up.

Our next gen data storage technology leverages some dark magic from fintech. Append-only log, single writer, synchronous replication, etc. Most recent benchmarks with replication to 1 sync witness on the same LAN are in excess of 200k business objects created or updated per second.
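
For the curious, the general idea (heavily simplified, definitely not our actual implementation, all names made up) looks something like this toy sketch: one goroutine owns the file, every record is length-prefixed and fsynced, and a synchronous-replication variant would also wait for a witness ack before returning.

    // Toy append-only log with a single writer. Not production code.
    package aolog

    import (
        "encoding/binary"
        "os"
        "sync"
    )

    type Log struct {
        mu sync.Mutex
        f  *os.File
    }

    func Open(path string) (*Log, error) {
        f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
        if err != nil {
            return nil, err
        }
        return &Log{f: f}, nil
    }

    // Append writes one length-prefixed record and forces it to disk before returning.
    // A synchronous-replication variant would also wait here for an ack from the witness.
    func (l *Log) Append(record []byte) error {
        l.mu.Lock()
        defer l.mu.Unlock()

        var hdr [4]byte
        binary.BigEndian.PutUint32(hdr[:], uint32(len(record)))
        if _, err := l.f.Write(hdr[:]); err != nil {
            return err
        }
        if _, err := l.f.Write(record); err != nil {
            return err
        }
        return l.f.Sync()
    }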


> The obvious candidates were MySQL (or one of its renamed variants given who bought it) or PostgreSQL, but several of us on the team have operational experience running these databases and didn’t enjoy the prospect of wrestling with the ops overhead of making live replication work and behave well. Other databases like CockroachDB looked very tempting, but we had zero experience with it. And we didn’t want to lock ourselves into a cloud provider with a managed product like Spanner.

What about managed MySQL/PostgreSQL? No lock-in, and installing them locally is trivial.


Managed DBs have tremendous lock-in. Just try migrating off RDS with zero downtime. You can't, because they've "managed" your ability to configure external replicas. Then they built a whole brittle data migration service that probably won't work for your DB.


I'm having a really difficult time migrating my managed postgresql instance in Google Cloud just to get it to a newer version. I'm down to my last option: shut down everything, export, re-import somewhere else again.

I did briefly have logical replication working, and so had my near zero downtime solution in hand. But the replication broke after a few days because I delayed cutting over, and then I couldn't get it going again. I couldn't find out how to debug this either.

The lesson I've learned here is that I'll be avoiding managed DB products whenever I can. It takes away control that turns out I sometimes really need.


they've already had to migrate twice due to questionable technology decisions. having to migrate again due to needing to change cloud providers seems a lot less likely than having to migrate again because their outside the box technology choice didn't pan out.


"Questionable technology decisions". You're trying to dunk, but I don't think you understand where the hoop is. Their technology decisions have panned out Tailscale well. We should all be so fortunate. This isn't Twitter with the "fail whale"; the only reason you know about any of this stuff is because they wrote about it. They ran their entire service with a JSON file backend for 18 months, and switched from it to etcd without you even knowing about it.


i really don't know a ton about this product or team but it sounds like if they had used aurora mysql or aurora postgres in the first place then there would be nothing to write a blog post about because it would've just worked and kept working. they say they want to avoid vendor lock-in but if the vendor became a real issue they'd be doing their first migration instead of being on v3 already. additionally, their bespoke solution relies on s3, which is also a vendor-specific technology, so it seems like they haven't avoided vendor lock in? i've seen many cases of developers doing more work to avoid vendor lock-in than it would take to replatform if it ended up being a necessity, and this really feels like that looking at it from the outside. i'd understand this better if mysql or postgres couldn't solve their problem, but that is not the case here, and i can't wrap my head around a company who is ok with their devs reinventing a very good wheel 3 times when the obvious choice would've worked fine the first time. it seems like they are successful in spite of these decisions, not because of these decisions.

https://mcfunley.com/choose-boring-technology


You should start by learning more about the product, and then tell them they should use Aurora for all their backing store.


[flagged]


You don't know what this database is --- upthread, you said you don't even know what the product is. All you appear to know is that they should be using something like RDS. Isn't that a weird position to take?


it's also kinda weird to write a blog post about a database being used for what is apparently a very specific use case without mentioning what that use case is. the burden of proof is on tailscale to explain why they need to deviate from industry norm here and clearly from the entire comment thread of people wondering the same thing as me they haven't done that. this blog post might actually be valuable if they included more context so that people could learn when something like this might be a good idea, especially since it's not a good idea >99% of the time. as is, no one should be surprised by this response.


This is one of several comments you've written where you've acknowledged you don't know what the use case is. But you've stridently insisted that they should have used Aurora Postgres or Aurora MySQL. You get how strange this take is, right?

One thing that would have helped your writing on this thread: a lot more question marks. It's OK not to understand something! Asking questions helps everybody.


at no point in the blog post or this thread has anyone actually explained what their super special use case is that makes doing what almost everyone else does a bad choice and this replicated sqlite thing a good choice. it's pretty clear you're an investor in this company or have some other vested interest in shilling for them, good luck with that.


>> it seems like they are successful in spite of these decisions, not because of these decisions

That is my conclusion, too.


The thing about startup decisions is: most of them are "wrong". Or they start right and become wrong later. Successful startups aren't successful in spite of their wrong decisions, they're successful because they can change them very quickly (then write a blog post about it and get more customers).

There's also a strong correlation between people who look at things from weird angles and also build good products. Why are you surprised the people who invented an entirely new way of doing VPNs also don't cargo cult database storages?


it's not cargo culting if it works, and they even say in the article that mysql or postgres would've worked

maybe next they will stop cargo culting operating systems and switch to SerenityOS?


They're talking about running many instances of MySQL locally, not hooking all their systems up to RDS.


no they aren't

> Tailscale’s coordination server, our “control plane”, has become known as CONTROL. It’s currently a single Go process on a single VM.

source: https://tailscale.com/blog/an-unlikely-database-migration/


The s3 api is rapidly becoming a distributed storage standard. Building your product around it is hardly lock-in these days.

AWS does offer one (very reliable) implementation but they’re very definitely not a monopoly.


Sounds like you need to do a lot more research before commenting on this? It's pretty easy to find in the Litestream docs that it replicates to S3, Google Cloud Storage, Azure, and other options even including SFTP. In fact by off-loading the storage integration details to Litestream, the Tailscale people now get seamless storage vendor independence almost for free.

This is what a really smart and future-proof solution looks like.


or they could have used mysql or postgres hosted by aws, gcp, azure, etc? i would put money on this not being the last time they change databases


Again: this is an infrastructure component, running on many machines internally. It's not a Rails app. We use SQLite in very similar circumstances across hundreds of machines. Having all those machines schlepping all their reads back to RDS would not only be untenably slow, but it would also make the whole system less stable. I don't think you've really thought through the design at all, and you should before making comments like these.

I'd put money on this not being the last database change too! But not for the same reasons you would.


Unless I'm misunderstanding something, it seems to me that Tailscale's controller, discussed in the previous blog post [1], is like a typical CRUD API server, currently only running on one machine (they're not using Litestream's work-in-progress live read replicas), not something widely distributed like Fly's service discovery infrastructure.

I'm not joining the "just use RDS" crowd, but it looks to me like using a managed DB service like RDS would have been a reasonable decision for this application.

[1]: https://tailscale.com/blog/an-unlikely-database-migration/


I'll take that money quite easily ;-)


They're not questionable because they didn't work, they're questionable because they were so difficult to run that they had to migrate twice.

You could absolutely run a web site using Brainfuck without any failures or any customers realising it. Doesn't mean it's not a questionable decision!


The etcd solution was highly questionable. Actually, a prime example for reinventing the wheel, but worse.


Choosing a technology that supports your business needs and growth, and that also lets you easily migrate to something else once it stops being suitable (or your requirements change, or you discover a way in which the tech doesn't work for you, or whatever), doesn't sound like a questionable technology decision, it sounds like a great one.


If you install it locally, that is pretty much the definition of an unmanaged database. Unmanaged by anyone other than the user, of course.


You can use a managed version of the database in production, and an "unmanaged" local Docker image during development.


A little off topic, but I just installed Tailscale on my dev stack (EC2, bare metal at home, and my MacBook Air). Mind blown by its simplicity and "it just works". I use it for logging (ELK is too heavy for my free tier EC2) and CI/CD (again, too heavy for my free tier EC2).


Maybe I missed something. How do you account for VM downtime? The VM containing the service writing to the SQLite db can go down without warning. How do you mitigate that?


It doesn’t. Given that it periodically dumps the WAL to an S3 bucket for actual storage, this sounds like async replication. Async replication is prone to losing the last bits (last by time) of data when the node goes down. They’re probably more or less fine if the SQLite file is stored on EBS, though; it means they’ll just have to restart the node that failed or reattach the EBS volume to a new healthy node. It does mean downtime, though.

On another note I’m using Tailscale and it’s a wonderful service and I get it for free.


That was a fun read but I'm missing something here. I've never worked on large scale systems and I don't understand why they didn't choose MySQL or PostgreSQL.

Some comments say that it's because they're complicated but what exactly is complicated about them?

Can someone with more experience explain to me what would have happened if, for example, they used PostgreSQL from the beginning?

P.S Apologies if this is a stupid question.

Edit: Looks like I hurt someone's feelings in this comment.


i don't think it's a stupid question and not sure why you're getting downvoted. the earlier blog post from tailscale goes into more detail about how the system in question works: https://tailscale.com/blog/an-unlikely-database-migration/

> Tailscale’s coordination server, our “control plane”, has become known as CONTROL. It’s currently a single Go process on a single VM.

if they had used postgres/mysql or some other external RDBMS, then that would have meant managing a whole other set of dependencies and doing at least an IPC/pipe roundtrip for every single query, instead of being able to use an embedded database like a custom-built JSON db or SQLite, which only needs in-process syscalls for reading/writing to disk/cache.

so, this allows them to scale their queries vertically orders of magnitude more than an external db (SQLite can do an insane amount of simple read queries per second) as well as making their whole dev/test/release cycle simpler and easier to manage.

that's my understanding anyway - hopefully it makes some sense.
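
to make the in-process point concrete, here's a toy example (driver choice is just an assumption); the read at the end is a library call inside the same process, not a round trip to a separate database server:

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/mattn/go-sqlite3" // assumption: any embedded SQLite driver would do
    )

    func main() {
        // the "database server" is just a file opened by this process
        db, err := sql.Open("sqlite3", "control.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, name TEXT)`); err != nil {
            log.Fatal(err)
        }
        if _, err := db.Exec(`INSERT INTO nodes (name) VALUES (?)`, "example-node"); err != nil {
            log.Fatal(err)
        }

        // no network hop, no connection pool to a remote server; hot reads are mostly page-cache hits
        var name string
        if err := db.QueryRow(`SELECT name FROM nodes ORDER BY id LIMIT 1`).Scan(&name); err != nil {
            log.Fatal(err)
        }
        fmt.Println(name)
    }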


It has been almost 15 months since the previous blog post:

> Unfortunately, if we want to move to running more than one CONTROL process simultaneously for high-availability and better release management, that means we no longer have exactly one process managing the data

Do you guys still only have 1 CONTROL process?


(Ignoring the fact this may be an April fools joke.)

> We’re going to put everything in a single file on disk again.

For most small scale projects I'm just reading and writing lines to a file on disk. Sure, I could over-engineer some database, or I could get a minimal viable project off the ground and worry about scaling later. Most projects run into other issues long before scalability becomes one.

I found myself recently wanting a lightweight key-value store for random structures. For this purpose, I created a toy single-file key-value store [1] in less than 256 lines of C. It's not quite ready for prime time, but it's incredibly easy to use.

[1] https://gitlab.com/danbarry16/u-database


> This gives you near real time backups (*or with a couple deft modifications, lets your app block at critical sections until the backup is done*)

Does anyone have an example of how to do this?
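
My guess at the shape of it (not the article's actual modification, you'd have to read Litestream's code for that): commit the write, then block until whatever is shipping the WAL reports it has caught up. `waitForReplicated` below is a hypothetical stand-in for that hook.

    // Hypothetical sketch only; the replication hook is a placeholder.
    package durable

    import (
        "context"
        "database/sql"
        "fmt"
        "time"
    )

    // waitForReplicated stands in for a hook into the (modified) replication layer: it
    // should return once every WAL frame written before the call has been uploaded.
    func waitForReplicated(ctx context.Context) error {
        select {
        case <-time.After(time.Second): // placeholder: poll the replicator's position instead
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    // criticalWrite commits a change, then blocks until the backup has it, so the caller
    // can treat the write as durable beyond the local disk.
    func criticalWrite(ctx context.Context, db *sql.DB, key, value string) error {
        if _, err := db.ExecContext(ctx, `INSERT INTO kv (k, v) VALUES (?, ?)`, key, value); err != nil {
            return err
        }
        if err := waitForReplicated(ctx); err != nil {
            return fmt.Errorf("committed locally but not yet backed up: %w", err)
        }
        return nil
    }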


... this is just replication of a database? As in the full database per node/replica?

Haven't AWS's SQL database-as-a-service offerings had this for years now?


Tailscale mention they want to run tests locally, not deal with MySQL/Postgres ops overhead themselves, and also avoid vendor lock-in (rules out RDS / Aurora, PlanetScale, CockroachDB, YugaByte, Spanner et al).

Maybe Oracle remains an option. ;)


Maybe I'm doing something wrong, but our team has had a few dozen PostgreSQL RDS instances running on AWS for the past four years and the ops is... basically zero?


I don't see how CockroachDB ends up in your list of DBs that lead to vendor lock-in or being unable to run a cluster locally, because it's open source and you can run your own cluster. Is there something about it I haven't spotted that puts it in that list?


CockroachDB is licensed under the BSL, so it is not really open source but rather a source-available project. I checked their LICENSE file [0] and it mentions a bunch of other open source licenses alongside the BSL. I randomly checked a code file [1] and it seems the code is licensed under the BSL.

[0] - https://github.com/cockroachdb/cockroach/blob/2c4e2c6/LICENS...

[1] - https://github.com/cockroachdb/cockroach/blob/2c4e2c6/pkg/kv...


Very interesting. I'd never heard of it but apparently MariaDB uses it too.

TL;DR is the source is available but you can't use it in production without paying a fee. However versions older than 4 years become truly open source, so you can use old versions for free.

Interesting compromise. Might be better than "open core".


That’s not quite right. You can very much use the BSL code in production without a license, and many, many companies do. The BSL contains a space for additional use grants. In crdb’s license, you can use the product for anything except building a hosted database-as-a-service product. The definition of what that is is a little bit vague, but it more or less means that the users of the product you sell shouldn’t be able to cause schema changes. This is all just protection against cloud providers taking the product and reselling it. I believe the terms of this BSL are also that it becomes Apache licensed 3 years after the code has been released.


You absolutely cannot use the BSL code in production without a commercial license. That's literally the first thing it says:

> The Licensor hereby grants you the right to copy, modify, create derivative works, redistribute, and make non-production use of the Licensed Work. The Licensor may make an Additional Use Grant, above, permitting limited production use.

Maybe specific software modifies that as you say, but the base BSL doesn't. I agree it's super vaguely worded ("production" isn't defined at all). Also it turns out MariaDB itself isn't BSL licensed - it's GPL, just some extra projects around it.


> The Licensor may make an Additional Use Grant, above, permitting limited production use.

See the text of the Additional Use Grant for CockroachDB [1]:

> Additional Use Grant: You may make use of the Licensed Work, provided that you may not use the Licensed Work for a Database Service.

> A “Database Service” is a commercial offering that allows third parties (other than your employees and contractors) to access the functionality of the Licensed Work by creating tables whose schemas are controlled by such third parties.

(disclaimer: I work at Cockroach Labs)

[1] - https://github.com/cockroachdb/cockroach/blob/master/license...


Ops overhead would be my guess.

Take a look at https://litestream.io/guides/systemd/ and compare that with CockroachDB's equivalent.

In all fairness, litestream is no CockroachDB/PlanetScale/YugaByte; but why pay the ops overhead (and your lawyers to make sense of their "OSS" licenses) when litestream would do just as nicely.

TFA goes (emphasis mine):

Litestream..grabs a lock so that no other process can checkpoint. It then watches the WAL file and streams the appended blocks up to S3, periodically checkpointing the database for you when it has the necessary segments uploaded. ___This gives you near real time backups and lets you replay your database from S3 trivially..It's a great hack, and a small enough program that I could read my way through it entirely before committing to it.___


Pretty sure you could track down instances of this technology going back decades now. Nothing is new under the sun. Doesn't mean you can use it.


I don't think they are claiming to have invented database replication, so what?


I set up a similar system in the past with redis replicas — it was about 200 MB of data that I needed quick access to on ~300 servers around the world — it was easy to set up and worked really well.


I'm getting half-remembered flashbacks to some of the distributed, self-replicating Lotus Notes applications I worked on very early in my career


I loved file-based H2 as well. Is there any reason in 2022 to use H2 over sqlite?


Probably not if you can just jump to a single-file Firebird database, and open that one either in embedded (single-user) or server (multi-user) mode.


April fool’s?


Who would use it if it’s just for one year?


This is an April fool's joke, right?

Edit: no it's not..

> Footnote: coworkers point out it’s April Fool’s today and request that I clarify this isn’t a joke. Joke’s on them: every day’s April Fool’s in the Tailscale Database Engineering department.


I suggested that footnote text just for these comments. :)


So to recap their timeline:

  * Instead of an actual database use a JSON file.
  * Write a blog post about how that didn't scale.
  * Instead of an actual database hand-roll something else.
  * Write a blog post about how that didn't scale.
  * Instead of a database with built-in replication which is tailor built for key-value storage use SQLite with a single table containing key-value pairs and some fresh glue to make it replicate.
I'm sorry, but what the fuck? Just writing the blog posts alone might have consumed more work-hours than what it would have taken to set a database up in the first place.

Are we supposed to expect another blog post somewhere around a year down the road titled "SQLite didn't scale" or "The new library for replication that was released a few months ago contained a bug that took our prod down"?


Do you want to know how popular systems are actually built, from the inside, or do you want carefully groomed triumphal announcements of new, perfectly-formed features?

I struggle with this a lot in my writing too, but I've drawn a conclusion I'm sticking with: it's better to relay what the team is actually doing, "warts" and all. We ranked on HN a few weeks ago with a post about user-mode WireGuard that was basically a litany of engineering mistakes, and it's one of the things I've most enjoyed writing.

Every serious product gets built the way Tailscale is describing. You start some components with the simplest thing that can reasonably work, and see how far you can take them. The JSON file Tailscale used to use was one of the most interesting and valuable things they wrote about.

If you want content about how to build best-practices-compliant hyperscaler-grade systems, there are places to go to get that kind of content. But you've never had a problem with Tailscale that owed to their JSON file backend not scaling --- how I know that is, you've likely never experienced a stability problem with Tailscale at all, because it just doesn't break. That they got it to work, for a long time, with a JSON file backend should make you think harder about how you build things.


I have nothing to add to this discussion, but I wanted to add that I love this comment. Sometimes we have to take a step back and really think about why we're making all of these choices and picking certain tools for a job.


I love Tailscale and it's true that I am not suffering from stability problems with its database choices.

But, I do have challenges working with their ACL structure -- which is backed by their database -- and specifically would love to be able to update users/groups separately from tag groups and both of those separately from ACL rules and I can't help but wonder if the fact that this feature doesn't exist relates to the fact that the database doesn't normalize these concepts into separate "tables" separated by "foreign keys".


Tailscale engineer here.

Unrelated.

We just haven't worked on it much. A bunch of other stuff has been a higher priority. But better ACL management will happen.


This is great to hear. We've been happy users for a couple of years, and the device tagging support has come a long way (and is a big enabler for sensible ACLs), but trying to use Tailscale's ACLs in general has been our biggest pain point with the product.


> Do you want to know how popular systems are actually built, from the inside, or do you want carefully groomed triumphal announcements of new, perfectly-formed features?

IMO both this and "announcements of new, perfectly-formed features" are at the same level (if not lower, because a new feature is still something new). Both are essentially marketing with no technical content that's really worth your time. Except that it's now the third iteration of "we did an obviously questionable thing and replaced it with another questionable solution".

> it's better to relay what the team is actually doing, "warts" and all. We ranked on HN a few weeks ago with a post about user-mode WireGuard that was basically a litany of engineering mistakes, and it's one of the things I've most enjoyed writing.

I can agree with that, but I wouldn't bunch your writeup into the same group. That's some pretty decent technical content about the problems that were encountered. Meanwhile in these writeups what were the problems? "JSON file database too slow", "Hand-rolled software has to be maintained in-house"?

> Every serious product gets built the way Tailscale is describing. You start some components with the simplest thing that can reasonably work, and see how far you can take them. The JSON file Tailscale used to use was one of the most interesting and valuable things they wrote about.

Yet they keep changing it, migrating, writing tests for the new code, and writing blog posts. Even if the new code is less than 100 lines, the time all these things take is substantial.


> Do you want to know how popular systems are actually built, from the inside, or do you want carefully groomed triumphal announcements of new, perfectly-formed features?

It depends on what that tool is doing and if the thing it's doing is a core competency for your company.

It absolutely makes sense to geek out about stuff like this if it's part of your core product. However, if it isn't, then do yourself and your engineering team a favor and pick boring technologies.

Do you know what DB system Tailscale would be using if they'd chosen something boring like Postgres or Cassandra? Postgres or Cassandra.

The cost of not doing that is that they are on their 3rd iteration of db technologies, plus the development effort of rediscovering the same problems.


The cost of complication is high. They're optimizing for simplicity. Both for low friction development and easy operation.

sqlite is amazingly simple for local dev. Probably even simpler than a json file.

High availability Postgres is complex to operate (you need a whole separate strongly consistent data store to coordinate it). Cassandra is also very complex.

sqlite is, again, simpler. Litestream makes sqlite plausible for production.


> sqlite is amazingly simple for local dev

You are buying simpler local dev with a more complicated and error prone production.

> High availability Postgres is complex to operate

Highly available SQLite is complicated. Made more complicated by the fact that SQLite wasn't designed at all around the notion of being a multi-tenant database system. Instead, you've got to bolt on solutions to fix the fact that you chose SQLite.

Containers make local dev with postgres or other db technologies pretty slick. I have a hard time buying the "oh no, but then we need to run docker" argument, because docker containers have added benefits that often simplify local dev (all your dev tools are defined in the Dockerfile; done, a new dev installs docker and moves on from there).

> sqlite is, again, simpler. Litestream makes sqlite plausible for production.

SQLite is a fantastic tool, when it's used in the right ___location. A backend database is not the right ___location for SQLite. Litestream may make it "plausible" but it certainly doesn't make it a sane choice.


They don't have a multi-tenant database problem! This is the second weird thing you've said about this design (the other was that they'd need to run Jepsen tests for it). They're not a database hosting provider; they're the single producer and the single consumer of their database. They have a single writing leader, and read-only replicas. That's how Litestream works; it's the only way it works.


> You are buying simpler local dev with a more complicated and error prone production.

How so? Any evidence for that?

> More complicated by the fact that SQLite wasn't designed at all around the notion of being a multi-tenant database system.

Neither was Tailscale, that's why the engineers decided they were a good fit. But I'm sure you know more about it than they do.

> Containers make local dev with postgres or other db technologies pretty slick.

No they don't. Dockerizing everything for local dev bogs you down with Docker's added latency for every operation, and this is not even getting into the latencies added by having to periodically `docker pull`, `build` images, provision the disk space for the images, and maintain all that infrastructure over time. Compared to that, an SQLite database is basically nothing.

> All your dev tools defined in the docker file. Done, new dev installs docker and moves from there

Docker is not a good way to do local dev, overall. The entire point of local dev is speed of iteration. If you need a usable test environment, set one up in UAT.

> A backend database is not the right ___location for SQLite.

The SQLite team actually disagrees with you on this: https://www.sqlite.org/whentouse.html

> SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites)....Systems designers report success using SQLite as a data store on server applications running in the datacenter, or in other words, using SQLite as the underlying storage engine for an application-specific database server.

But again, I'm sure you know better than them.


They picked (checks notes) sqlite. It's quite possibly the most boring choice available!


And they tacked on Litestream, which made for far more interesting failure modes. Hopefully they ran their Jepsen tests... oh wait, no, they didn't really care about the reasons they ruled out traditional db choices.


Do you understand how Litestream works? What's the Jepsen test you'd do here, to make sure your single, statically assigned leader isn't conflicting with... itself? All Litestream does is ship WAL segments. The whole point of this design is not having the distributed consensus problems (and concomitant schema and development inflexibility) that etcd has.


Yes, but userland WireGuard is unique and useful. This is ignoring textbooks, competitors who write about their exploits warts and all, and screams of "get a more experienced db team to work with you"... shoving it into a json screams of throw it at a disk and let the hardware deal with it


>> shoving it into a json screams of throw it at a disk and let the hardware deal with it

Which...worked. For quite a long time. And required no dev effort. Freeing devs up to focus on other things.

They did it as a POC, and it worked, better even than they had anticipated; why over-engineer it up front? Even when it turns out they are gonna need it (i.e., YAGNI no longer applies), they have been able to push off their "solve problems customers don't care about" to instead favor "problems customers do care about". That...seems like sound engineering to me?


Speaking as someone who has had to fix things like this...

When you reach capacity it comes to a screaming halt: once the hardware capacity has been reached, performance doesn't just degrade, it's completely gone.

This is not about best use. Hell, JSON file performance may be well within the spec of the system requirements, but jumping up and down without perf numbers or specs is typical of "sweep it under the carpet for a generation"...


That would be a good critique if it were happening, but they gracefully transitioned from a JSON file(!) backend to etcd, and gracefully transitioned from etcd to sqlite. I'd say they're going to gracefully transition from sqlite, but I think the thing people don't have their heads around is that it's possible they'll never need to do that, because of The Unreasonable Effectiveness of SQLite.


Tailscale employee here.

Our database needs are tiny, as explained in the earlier post. So we optimize for things like: "can we run all our tests quickly and easily in many environments without containers and VMs?" All three of our storage schemes have had that property.

We have MySQL and PostgreSQL veterans on the team. We know those options well.


> without containers

Why?

The official docker postgres package weighs in at 100MB. Assuming you standardized on that, then every environment would end up having a fresh postgres image just waiting to be started up. Meaning, the actual cost is the memory for the server and startup time, not the image itself.

Mount the data on tmpfs and you can startup and setup a db in very little time (I know, because that's what we do).

You might be able to eke out a bit more performance by hand tuning test files... but... to what end?


I can't speak for Tailscale, but our whole raison d'etre is running containers for people, and almost none of our development environments are containerized either. Containers (and remote dev environments) create friction. They're useful tools when that friction is unavoidable anyways, but when it isn't --- and you can design systems with that being a goal --- it makes sense to take full advantage of that.


I write applications that use Postgres, and don't use Docker for testing. I just create a database based on the name of the test case, delete it, create it, explode the schema in there, and run the tests. The reason is because I don't want to pay Postgres' startup costs every time I run the tests. Tests are run thousands of times a day, while you're in the mindset to rapidly iterate. Any delay is unacceptable.

At some point, migrations were taking too long, so I just started caching the database schema. I have a go generate thingie that creates a fresh database, applies all the migrations, and dumps the schema to a file that gets checked in with the code. (CI checks that the migrations actually result in the database state that's checked in. Yes, there is something that regexes out the hostname from the sql file, since Postgres dumps that in there by default and that results in spurious diffs.)

I have learned that people often put up with really sub-optimal developer workflows. My test is that you should be able to add "print hello world" to the top of your main function, and see that printed on your screen in less than 5 seconds. (Yes, even if your application is complicated. I have a hacked up copy of K8s that starts that quickly, so that you can write an app that deeply integrates with K8s and not have to reuse a cluster between tests, or pay a long cluster creation cost. All obstacles must be removed!) If it takes longer than that, you are just throwing your developer's salaries into the toilet. (If it takes 6 seconds, you open up HN, and there's an hour gone!)
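
If anyone wants to copy that workflow, the per-test-database part is roughly the sketch below (connection strings, the cached schema.sql, and the driver are placeholders for whatever your project uses; multi-statement Exec also depends on your driver):

    package testdb

    import (
        "database/sql"
        "fmt"
        "os"
        "regexp"
        "strings"
        "testing"

        _ "github.com/lib/pq" // assumption: swap in your preferred Postgres driver
    )

    var unsafeChars = regexp.MustCompile(`[^a-z0-9_]+`)

    // New drops and recreates a database named after the test, loads the cached schema
    // dump into it, and hands back a connection that is closed when the test ends.
    func New(t *testing.T) *sql.DB {
        t.Helper()
        name := "test_" + unsafeChars.ReplaceAllString(strings.ToLower(t.Name()), "_")

        admin, err := sql.Open("postgres", "host=localhost dbname=postgres sslmode=disable")
        if err != nil {
            t.Fatal(err)
        }
        defer admin.Close()

        if _, err := admin.Exec(fmt.Sprintf(`DROP DATABASE IF EXISTS %q`, name)); err != nil {
            t.Fatal(err)
        }
        if _, err := admin.Exec(fmt.Sprintf(`CREATE DATABASE %q`, name)); err != nil {
            t.Fatal(err)
        }

        db, err := sql.Open("postgres", "host=localhost sslmode=disable dbname="+name)
        if err != nil {
            t.Fatal(err)
        }
        t.Cleanup(func() { db.Close() })

        // schema.sql is the pre-baked dump produced by the go:generate step described above.
        schema, err := os.ReadFile("schema.sql")
        if err != nil {
            t.Fatal(err)
        }
        if _, err := db.Exec(string(schema)); err != nil {
            t.Fatal(err)
        }
        return db
    }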


Have a look at this headline:

> Zero config VPN. Installs on any device in minutes, manages firewall rules for you, and works from anywhere.

I don't know how their product works, but my intuition is that your suggestions aren't feasible.


I suggest reading the article. The reason they haven't chosen "boring" tech has nothing to do with the clients running their software and everything to do with their internal development experiences.

> MySQL (or PostgreSQL) would come next. I’m not particularly familiar with anything MySQL post 1998, but I’m sure it would work. The HA story for open source databases is somewhat surprising, though: you can either have traditional lagging replicas, or commit to no-primary-replica clusters that have very surprising transaction semantics. I wasn’t excited about trying to design a stable API or good network graph calculations on top of those semantics. CockroachDB looked very promising, and indeed still does! But it’s relatively new for a database and I was a little concerned about getting attached to features in a fresh DBMS that would be hard to migrate away from if we needed to.

The reason they are ruling out products isn't "this won't work where it runs" its "I don't like this". They talk about worrying about network graphs and proceed to use SQLite and copy garbage around.


If you're dinging them for not using Cockroach, they explained why they're not using Cockroach in the exact paragraph you quoted: if they use Cockroach, they're committing to the scaling and distribution system that Cockroach provides, which might or might not be a good fit for them 2 years from now. If it isn't a good fit, they're stuck with a huge engineering bill to get themselves out of Cockroach.

We're moving towards sqlite for things for the same reason. sqlite is simple to reason about. The distributed state problem is not simple, but our distributed state problem is not Tailscale's and theirs isn't yours; every distributed state problem is unhappy in its own way. As they work out the contours of their specific distributed state problem, sqlite isn't the component that's going to break down.


> If it isn't a good fit, they're stuck with a huge engineering bill to get themselves out of Cockroach.

Similar to the current engineering bills they are paying because they switched from a json file -> etcd -> sqlite?

Seems like the cost of switching isn't really an issue for them since they've pulled that lever multiple times now.


It does seem that way. Keep thinking along those lines. You're almost there!


> we run all our tests quickly and easily in many environments without containers and VM

How's that an important metric to optimize for? Did you ever benchmark the tests with other solutions?


I believe "quickly" means "quick for a human to do" in this case.


In what world would “make dev environments better” _not_ be an important thing?


It seems like they're _really_ struggling with it. It's kind of entertaining to watch though

edit: also don't forget these blog posts are content marketing for their product. that's really the point anyway


If from your perspective, a super-successful VPN product is 'really struggling', you must have incredibly high standards! We all hope to hear more about your successes, I'm sure.


do you want to hear about my successes?

they're not going to be about shoe-horning a JSON file or a SQLite database file into my enterprise product.


Just waiting with bated breath to find out!


Sure, so what are they about?


Doesn’t the blog post explain why they chose this step rather than doing the thing you suggest? In particular I don’t see how this comment really addresses any of their arguments.

And if they got on ok with the json file, maybe you should update towards the features of a complicated RDBMS not being worth the cost?


No, it actually doesn't. The blog post explains "Hey, we've chosen a bunch of crazy ways to store data on the backend. We looked at non-crazy ways, but decided instead it'd be more fun to rsync SQLite data around because SQLite is my favorite DB!"

> MySQL (or PostgreSQL) would come next. I’m not particularly familiar with anything MySQL post 1998, but I’m sure it would work. The HA story for open source databases is somewhat surprising, though: you can either have traditional lagging replicas, or commit to no-primary-replica clusters that have very surprising transaction semantics. I wasn’t excited about trying to design a stable API or good network graph calculations on top of those semantics. CockroachDB looked very promising, and indeed still does! But it’s relatively new for a database and I was a little concerned about getting attached to features in a fresh DBMS that would be hard to migrate away from if we needed to.

Seriously.. consider this.. they are worried about the transaction semantics of using db replicas and instead CHOSE SQL LITE!


Sqlite replication via litestream has basically no overhead. It's a great choice if your database only needs to service one client.

It sounds like you're essentially arguing that people should always use the most robust data store available regardless of cost or requirements. Their needs were served just fine by a JSON file for quite some time — if that's all you need, then maintaining a full Postgres cluster is the "crazy" solution, actually.


And?

(It's sqlite, or SQLite if you're being pedantic; it's not "SQL LITE").


> In the first place

Yes, but ongoing maintenance isn’t free.

If anything, this whole DB funkiness indicates to me just how nice a product you can build despite it.


Okay, but this is a database now.

Is there a reason the replication needs to be built-in?


I thought replication was useful for backups or testing. Maybe other things. edit: I know practically nothing about db.


Sorry, I probably didn't phrase it very clearly. I don't think there is anything wrong with needing some external help for that, but the combination of their choices is IMO a bit bizarre:

  * SQLite which I mostly see as an embeddable database for applications, used on a server.
  * Single table with key-value pairs.
  * Having the tool for replication be a quite young project that's in active development.


Plenty of people use sqlite on servers.

They have a single table with KV pairs right now as an artifact of having used the "database with built-in replication which is tailor built for key-value storage" you complain that they're moving away from. The point of the blog post is that they're doing that in part to move away from the single table of KV pairs.

They're not "sharding".


> SQLite which I mostly see as an embeddable database for applications

This is common but incorrect view of SQLite. It's able to handle many server-side workloads, as long as you're aware of its limitations (and the Tailscale team certainly are).

https://www.sqlite.org/whentouse.html


Those limitations are also the limitations of the backing stores of many distributed state solutions; you just don't see them because they're hidden behind complicated APIs.


Why not? You can be a spectator to it and watch them succeed or fail.


There's a lot of negativity here, so I just wanted to comment that I think the approach Tailscale is taking with sqlite is pretty neat. MySQL and Postgres are both pretty complicated, so gluing together two simpler systems, sqlite and litestream, sounds appealing, and I'm interested to hear how it turns out in the long run.


Threads on HN are very sensitive to initial conditions. A very negative (and I think pretty facile) comment took an early lead here, most likely because of its snark/dunk value. Give the thread time, it'll even out.


Agreed, it's a bit of a shame for HN imo. I see a team that is being brilliant and understanding their ___domain down to the core, with as minimal resources as they can get away with.

There is incredible value in incrementally adding complexity only when needed.


tailscale = YAGNI to the MAX! i applaud their willingness to think outside the box 99% of devs seem to be happy to squeeze themselves into just because they read about it in a book by some self-appointed tech god.


Been reading HN a while and it's disappointing to see this much unjustified negativity.


Totally off topic. I am not a UI/UX designer, but why can't the page scale to the size of the screen? Sure, I could ctrl/cmd-+ but is it hard to detect the screen size? Genuine question, not trying to be condescending.


This should be a new logical fallacy: appeal to current year. Every time I hear someone say "it's $CURRENT_YEAR, we obviously need to implement $PARTISAN_POLICY" I wonder about what fringe politics will be "obviously correct" in 3022.


Nothing in this post claims that the year intrinsically makes this the correct choice. It is the database for 2022 because it is now 2022 and this is the database they are now using.


Seconded. It's a logical fallacy because they are using the year as proof that we should adopt their policy. Except the year only proves what year it is - nothing else.


oh boy. is nobody allowed to make a little joke anymore. you should take yourself a lot less seriously my dude.


First: I am not "my dude" to you.

And, completely apart from the database, it's a really valid point - not a joke at all.

When someone says, "It's 2022. Why are we still arguing about X?" or some such, it means that they haven't been able to persuade the rest of the world about X on the merits of their argument, and so they want to short-circuit the argument by appearing "modern" or "up to date" or "on the right side of history". They want to win by persuading the other side to give up in embarrassment, not by virtue of the quality of their position. (In their own eyes, their position is probably self-evidently the only right and moral one, but that doesn't tell us much.) It's a way of trying to win by default, and it deserves nothing other than being regarded as "you lose by default".


> First: I am not "my dude" to you.

Well, calling yourself "AnimalMuppet" you kind of set yourself up for dude-dom.


[flagged]


not everything is a joke. and some jokes are not funny.

especially ad hominem criticism.


April fools joke? I do not get it.


I think I'm missing some context. Using a text file and using etcd as a DB for a production system seems like a terrible engineering decision. It seems like something you'd do as a proof of concept or side project. It's interesting that they're blogging about this, as if they're proud of it. I guess I'm just missing the point. This is their 3rd DB migration, something that I prefer to avoid at all costs. I guess they have different priorities and values. But I'm just confused: why would an engineer want to join a company that is making these decisions? Why would the company want their users to know about these decisions? They started using etcd knowing that it would become a bottleneck; that should have been a non-starter, right?


Tailscale engineer here.

> Using a text file and ...
> It seems like something you'd do as a proof of concept

I mean, using a text file for the proof of concept is exactly what happened. And then it grew too quickly and we had to get off of it eventually, but we always knew that. We were just amazed at how long it worked. It survived much longer than we'd thought.


> But I'm just confused, why would an engineer want to join a company that is making these decisions? Why would the company want their users to know about these decisions?

This might actually be a really good filter for which types of engineers are good fit for their company. Because my read was very different than yours, as I looked at the decision lineage and thought to myself I could see myself making every one of those decisions. I of course can't say I would've with the same problems and constraints they had, just that I know I've considered similar solutions.

The big difference might be, on the third iteration I tend to land on embedding Raft directly into the backend, because lots of the problems I look at tend to benefit from an in-order log that is fully replicated to a quorum of servers. But based on the post and the efficiency they're going for, backing up the WAL on sqlite may make a lot of sense.

Actually litestream looks like it might have some fairly similar properties, so might be something else I'd like to add to my toolbox. Although I wonder what sort of replication lag there might be to replicas when using s3 as the stable store.


Litestream author here. The default configuration is to replicate any new changes every second to S3. This is mainly a cost issue though. S3 doesn't charge for ingress bandwidth but it does charge $0.000005 per PUT request. If you have a non-stop, continuous stream of writes against your database then that's about $12/mo. In practice, most people don't have a continuous stream of writes. Michael Lynch, for example, wrote a post about how he pays $0.03/month for his Litestream replication[1].
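
For anyone checking that worst case, the arithmetic is roughly:

    1 PUT/sec × 86,400 sec/day × 30 days ≈ 2.6M PUTs/month
    2.6M PUTs × $0.000005/PUT ≈ $13/month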

I've worked with Raft a lot in my career. I wrote the original etcd Raft implementation (sorry!) [2] and even wrote a visualization explaining Raft[3]. It's a great tool when you need really high durability guarantees but I'm not convinced it gives you better availability guarantees than a single node that can quickly be recovered.

[1]: https://mtlynch.io/litestream/

[2]: https://github.com/goraft/raft

[3]: http://thesecretlivesofdata.com/raft/


There's a world of difference between someone doing that for a Google-scale thing with big teams supporting it, or for a core part of making the best WireGuard product that only they can do, versus a system that could have been Postgres. Now, when the authors leave for their own sqlite-by-mail startup or are partying at SREConf over "our n+1th attempt to not do postgres", an outage causes an unnecessary and ungoogleable shitshow for their colleagues.

A good engineer can build all sorts of surprising things. A senior one knows why they should reserve that for only the special bits. I like reading the first two types of articles -- we frequently give tech talks ourselves -- but for making boring systems weird, I get flashbacks of consultantware/resumeware/NIH that took ages to weed out. I try to give talks on say GPUs for weirdly shaped problems, not on why we celebrated when we finally had time to rip out mongo from where it didn't belong :)


the live replication feature litestream is implementing won't be using s3 - it will be direct node to node replication. hasn't been released yet and currently s3 is used for a cold backup that can be quickly restored to a fresh node as far as i know. https://github.com/benbjohnson/litestream/issues/8


> Using a text file and using etcd as a DB for a production system seems like a terrible engineering decision

I'd say this could easily be a great engineering decision if it meets the needs of an organisation in the moment. The thing with any engineering decision is that it involves a bunch of trade-offs – doing something that you know will hit a scaling wall can be worth it, if it can get you moving faster. There will never be a perfect solution, and in these decisions I find that simplicity is often a massively under-rated benefit.

I'd even venture to say that I've seen more organisational failures as a result of optimistic over-engineering than I have optimistic under-engineering. That includes many of my own projects – ones where I've gone "oh I guess I'll do it properly this time" and then found myself six months down the line throwing out a ton of code, schema, or infrastructure I spent ages on because some requirements changed or some assumptions didn't pan out. It's hard to remember that lesson sometimes and I've been doing this for 15 years.

It's like a company I used to work for where the whole product was driven by a massive semi-structured graph database – but instead of being implemented like that it was a big ball of MongoDB JSON documents glued together with PHP by someone who was more of a ___domain expert than an engineer. Often pretty sketchy to work on, but an engineering decision that was a success – in the sense that it allowed that product to exist at all where it otherwise might not have.

> why would an engineer want to join a company that is making these decisions? Why would the company want their users to know about these decisions?

Sign me up as both an engineer and a user. It is extraordinarily refreshing to hear "we tried a bunch of different stuff, some of it was dumb, and this is where we're at now."


> why would an engineer want to join a company that is making these decisions?

Hmm, I can’t quite articulate why, but I would. Something about having the courage to be different.


I would actually love to work at a company like that. I've been a professional developer for over 20 years now and at least in my experience, too much unneeded complexity has been the root cause of so many defects and product failures. I'm constantly fighting battles to keep things simple. All the young devs on my team want to use every fancy new technology that passes by, but I just want things to work for my customers with the smallest amount of downtime and late night pages.

If a single JSON file worked at the time, then that's clearly all they needed. It certainly hasn't seemed to limit their success.


> I'm constantly fighting battles to keep things simple. All the young devs on my team want to use every fancy new technology that passes by...

For me, it's not so much my fellow developers -- half my age, but quite bright enough to realise the benefits of simplicity -- as it is that the vendors of the tools our corporation has decided to use have in turn decided to implement them not just as a spaghetti-reinforced mudball, but as a spaghetti-reinforced mudball of spaghetti-reinforced mudballs.

(That these SRMB^2s are then just part of a larger spaghetti-reinforced mudball of SRMB^2s, i.e. a spaghetti-reinforced mudball cubed, is of course not the vendors' fault but corporate's. But then, aren't corporate systems always SRMBs? So if at least the components were simple, the end result would be only an SRMB to the power of one, not three... Sigh.)


> All the young devs on my team want to use every fancy new technology that passes by, but I just want things to work for my customers with the smallest amount of downtime and late night pages.

I feel your pain, Dave. Resume-driven development and FAANG envy seem to be the dominant characteristics of far too many folks in the industry these days.


Yeah, it’s perfectly reasonable to use a text file instead of a database until you find evidence that you need a “real” database.
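
For anyone curious what that looks like in practice, here's a rough sketch of the single-JSON-file pattern (not Tailscale's actual code; the names and fields are made up): all state lives in memory behind a mutex, and every write re-serialises the whole thing and atomically renames it into place so a crash can't leave a half-written file.

    package main

    import (
        "encoding/json"
        "os"
        "sync"
    )

    type Store struct {
        mu   sync.Mutex
        path string
        Data map[string]string // stand-in for the real application state
    }

    // Load reads the whole JSON file into memory, or starts empty on first run.
    func Load(path string) (*Store, error) {
        s := &Store{path: path, Data: map[string]string{}}
        b, err := os.ReadFile(path)
        if os.IsNotExist(err) {
            return s, nil
        } else if err != nil {
            return nil, err
        }
        return s, json.Unmarshal(b, &s.Data)
    }

    // Set updates one key and rewrites the entire file atomically.
    func (s *Store) Set(k, v string) error {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.Data[k] = v

        b, err := json.MarshalIndent(s.Data, "", "  ")
        if err != nil {
            return err
        }
        tmp := s.path + ".tmp"
        if err := os.WriteFile(tmp, b, 0o600); err != nil {
            return err
        }
        return os.Rename(tmp, s.path) // atomic on POSIX filesystems
    }

It stops being reasonable once the file is big enough that rewriting it on every change hurts, which is roughly the wall the article describes hitting.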


How did they think about going from a single JSON file to either MySQL or Postgres?

I know they say this isn’t April fools but I’m not sure I would put it past the engineering team. This can’t be real.

“We looked at real solutions like cockroach but nobody could be bothered to rtfm”


I'm sympathetic to the idea of avoiding a database because no one on your team has experience with it.

On my old team, our product supported Oracle DB (along with some other databases), but no one on the team was really an Oracle expert, and we would frequently run into questions that we didn't really know how to answer. We would have dropped it if we could have.

It's easy to say "rtfm", but doing so doesn't make you an expert in the system. And as I bet many HN readers know, there's a big difference between "I read the quickstart guide and got a DB working in an afternoon" and "We've hit an error that the manual says nothing about, but our DB expert saw this once, 7 years ago, and remembers how to recover".


> "We've hit an error an error that the manual says nothing about, but our DB expert saw this once, 7 years ago, and remembers how to recover"

Yup, this rings true. Once worked on a large Cassandra deployment – holding people's bank accounts, so, ya know, somewhat critical – and we eventually had to get one of the authors of Cassandra onto our on-call team. Even he was stumped by some of the bugs and failure modes. (And, incidentally, the word 'anticompaction' is still enough to trigger my PTSD.)

The notion that the documentation is an accurate and exhaustive explanation of every single thing about a product is a junior-engineer mistake that no one with rough ops experience would make. Nor, for that matter, would anyone who's written a database that's in use (as I unfortunately have), or really any other piece of software.

And the same goes for the notion that "everyone should go with MySQL or Postgres for everything because they are Proper Databases that are scalable and won't go down". If I were being charitable, I'd say it's a 'no one got fired for buying IBM' argument; if I were being honest, I'd say it puts me in mind of 'MongoDB is webscale' (https://www.youtube.com/watch?v=b2F-DItXtZs). I don't know why bad engineers are so terrified of thinking for themselves specifically – well, generally, but especially – when it comes to databases. They aren't magical contraptions. They are just portable filesystem wrappers.


I see a dozen comments here, all critiquing their proposal. Surely this is April Fools?

> I'm sorry, but what the fuck? Just writing the blog posts alone might have consumed more work-hours than what it would have taken to set a database up in the first place.

> Using a text file and using etcd as a DB for a production system seems like a terrible engineering decision. It seems like something you'd do as a proof of concept or side project. It's interesting that they're blogging about this, as if they're proud of it.

> This is their 3rd DB Migration, something that I prefer to avoid at all costs. I guess they have different priorities and values. But I'm just confused, why would an engineer want to join a company that is making these decisions?


They put an update at the bottom of the article clarifying that it is not, in fact, a joke.



