I never stop being impressed at how often people will jump to odd, unsupportable conclusions like, "using MySQL will make this thing faster".
I've seen it so many times over the years regarding users and email configurations. I can't count the number of times I've dropped into someone's badly behaving mail configuration and found they had MySQL hosting the users, and explained it was for "performance" reasons. Somehow they didn't grasp that /etc/passwd fits entirely in memory, and the map files Postfix uses for various lookups and stuff are already a database (just one specifically designed for the task at hand) and also fit entirely in memory. Putting that MySQL layer in there is disastrous if performance matters; it is orders of magnitude slower for any case I've seen...still plenty fast for most cases, but it's ridiculous that this idea gets cargo-culted around that if you store your mail users in MySQL your mail server will be faster.
A little knowledge is a dangerous thing, is what I'm trying to say, and people who know MySQL is "fast" may not know enough to know that it's not the right tool for the job in a lot of cases...and is probably slower for many use cases. I'm pretty confident this is one of those cases. SQLite is wicked fast on small data sets, and being smaller means more of it will fit in memory; I can't think of any way MySQL could be a more performant choice for this workload.
Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
> Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
It's really not that hard to imagine. There is an embedded library version of MySQL called libmysqld which is made for this very purpose. Of course as you point out it's quite overkill for something that would work perfectly well with Berkeley DBs.
I've never seen it used like that, but Microsoft's equivalent SQL Server Express is a nightmare in deployment and support, because you need to administer a full-blown SQL Server on every customer PC (backups, migration debugging, …). I've no idea why people don't use SQLite instead.
_SQL Server Express_ is not the MS equivalent of SQLite.
_SQL Server Express_ is just a free (as in beer) limited version of _SQL Server_. It's still a full-blown traditional server like MySQL, Oracle and other that require services and administration tools to be installed.
Microsoft has 2 equivalents to SQLite: the Jet database engine is pre-installed in all versions of Windows and allows creation of .mdb databases (those used by older versions of MS Access). You don't need to include any DLL file to use it.
The other is _SQL Server Compact_ which, like SQLite, is an embedded engine that you can bundle with your application by including a library.
All of these embedded databases are able to do multi-user writes to some extent. Jet is actually quite good if you're careful with locking (emphasis on careful). SQLite is a de-facto standard because it's simple, performant, cross-platform and flexible.
While it can replace full-blown databases in some cases, it's far from being always true. There are still many cases where using something like SQL Server Express may make more sense, for instance if you want to offer a path to your customer for drop-in replacement of the database based on the growth of their needs over time. Not saying it's not a costly lock-in, but it's an easy one to sell.
It should be noted, of course, that both Jet and SQL Server Compact are essentially considered "deprecated" technology and are not recommended for new development efforts. The compact database format Microsoft recommends these days (and which comes bundled with the Universal Windows Platform SDK) is now actually SQLite.
SQLite is popular enough for Windows applications that SQLite had to rename its temporary files to "etilqs_..." (SQLite backwards) to avoid getting unnecessary bug reports from naive users (https://answers.microsoft.com/en-us/windows/forum/windows_7-...)
Certainly, from my experience the main reason for using it is that once it becomes too large for a desktop machine it's trivial to migrate to a 'proper' MS SQL server.
I also don't find the admin side much of a faff. Provide your users a backup/restore option within your app and ensure you install your own instance.
SQLite is a practical option if concurrent writes are not required. It is not meant for that, and that is also one of the reasons why it is both fast and simple, since without concurrency, a whole set of really complex problems goes away. SQLite is great when the communication and arbitration between two services takes place at the application level, as is the case with DNS AXFR requests, for example, and each of those services may have its own private data store. In such a situation, an immediate benefit is realized by having a ___domain specific, uniform data manipulation tool (SQL) without having to write custom data manipulation code. This is especially well suited to scenarios where configuration management is performed via OS packaging.
Yup, I did some benchmarking (for a webapp which runs a separate process for each customer) and SQLite had the highest TPS and lowest memory usage by a large margin.
Probably because it runs in the same address space as each server process and there is no message passing (TCP/unix socket) overhead.
> I never stop being impressed at how often people will jump to odd, unsupportable, conclusions like, "using MySQL will make this thing faster".
That is a good opportunity to ask questions and learn about the person, their knowledge and their use case.
Yeah, sometimes the answer is they don't know how stuff works underneath and are just following a blog post or a trend, or heard someone talk about it at a meetup.
Programmers love programming tools / frameworks / languages. There are so many new shiny things out there. New ones pop up every month. New language, new build system, new DB, new UI library, new message queue, etc. Interestingly, over the years lots of those have had great performance improvements. Maybe they took advantage of a new kernel API or some new algorithm and so on. That feeds the hype -- put new shiny things in and performance doubles.
So people have been trained to expect all the new things to just magically be better. But like I said, you have to probe and find out the reason they propose the change; only then can you see whether it is justified or not (one of the reasonable responses could be "I've actually measured with a production load and it does look better").
It's mostly about concurrency: SQLite serialises all writes. For an embedded database in a single-user application that is OK, but for a web application that may have a lot of concurrent writes going on, it's a performance disaster area. Note that for good performance on a write-heavy web application even MySQL can be a problem unless you use InnoDB, as MyISAM has the same table-level write locking behaviour.
I'm a little ignorant of the situation on the sqlite side, so this may be a dumb question...
Say you have an MVC web app, could you have asynchronous calls to a thread performing the actual DB writes?
From your app's perspective, concurrent writes would be placed into a queue that performs them serially.
Are there any functional issues to this approach? Thanks in advance for your thoughts or info.
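To make the question concrete, here's a rough Python sketch of the approach I have in mind (table, file and queue names are all made up): one writer thread owns the SQLite connection, and request handlers just enqueue work.

    import queue
    import sqlite3
    import threading

    write_queue = queue.Queue()

    def writer_loop(db_path):
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS events (name TEXT)")
        while True:
            sql, params = write_queue.get()   # blocks until a write arrives
            conn.execute(sql, params)
            conn.commit()
            write_queue.task_done()

    threading.Thread(target=writer_loop, args=("app.db",), daemon=True).start()

    # a request handler enqueues its write instead of touching the DB directly
    write_queue.put(("INSERT INTO events (name) VALUES (?)", ("signup",)))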
So many times I see people handing out a root password to edit /etc/passwd. Sure, it may be crazy for performance, but even a trivial number of users and aliases is more easily managed with a remote MySQL client. Everyone that has used Postfix in a production environment knows the pain otherwise; the rest, feel free to vote down.
There are practically countless ways to delegate management of mailboxes and users without granting root. That's not even a hard problem to solve, and certainly not one that justifies introducing a hugely complex additional variable to the equation.
Anyway, I'm not saying "never use MySQL for mail users" (though I think the percentage of deployments where it makes sense is closer to none than it is to one); I'm just trying to make the point that MySQL is, in some folks' minds, a magical solution to performance problems. Often, it not only introduces needless complexity, it won't even improve performance. It's a classic example of "when all you have is a hammer, everything starts to look like a nail". MySQL is a very fine hammer. It just isn't the right tool for every job.
The article we're talking about is another case where a little knowledge is a dangerous thing. A desktop app serving one user with a tiny data set (as I understand it, we're talking about the metadata for a person's music collection) is exactly the right workload for SQLite. I'd be shocked if a naive port to MySQL were faster (though they acknowledge that there's room for query optimization), and not at all surprised if it were slower. And, I know it'll require more memory and disk space for the same working set.
> Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
The Amarok music player on the KDE Linux desktop environment (at least on version 3.5) kept the music collection metadata in a database. You could choose SQLite, MySQL or PostgreSQL during installation, if I remember correctly. It worked OK because of the package system.
I guess if you used separately installed databases there would be some conflicts, though.
Not faster, perhaps, but with sqlite3 I had locking issues, and after I switched to MySQL I don't. It's really that simple. The app was used by a small office of 40 people at first, then started being used by close to a hundred people, and that's when the locking errors began.
That's a pretty big difference in scale. The app in the original article seems to be a desktop application, not a client/server system with many users. There's plenty of use cases where MySQL (or PostgreSQL) is a great choice. Yours is probably one of them.
I don't even know what it is, so I wish I had looked it up first, but now it's too late. Also, MySQL will aid in later replicating the application between different locations to ensure availability for each office.
Yeah, the WAL basically makes "concurrent processes reading/writing to the same database" work magically, whereas without it you'll get all sorts of issues and timeouts. I don't know why it isn't enabled by default, or at least more prominently advertised as an option.
Initially, WAL mode was off by default because older versions of SQLite do not support it, and it is a property of the database file. Thus, a database created in WAL mode would be unreadable by older SQLites. But WAL has been available for 6 years now, so it might be reasonable to make it the default. We will take your suggestion under consideration. Thanks.
Wow, the article is such a breath of fresh air, primarily because the author demonstrates common sense.
He (they?) picked SQLite for all the correct reasons:
- best tool for the job for their situation;
- write-light and read-heavy;
- zero configuration;
- easy to embed;
- understanding that optimizing queries by far gives the best performance in the shortest amount of time.
As an aside, I'm currently using SQLite for Bacula and Postfix, and it's a joy to use; the only drawback I found so far is lack of REGEXP REPLACE in the SQL dialect which the database supports (must be loaded with .load /path/to/lib/libpcre.so, but it is not part of the language). I used the Oracle RDBMS for my PowerDNS deployments, but in retrospect, the way PowerDNS works, SQLite would have been an even better match. All in all, it is great to read that someone picked it for all the correct reasons, rather than some fashion trend, as is often the case in computer industry.
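For what it's worth, the same extension can be loaded from Python's sqlite3 module too; a rough sketch (the table is hypothetical, and this requires a Python build with extension loading enabled; the pcre extension provides the matching REGEXP operator):

    import sqlite3

    conn = sqlite3.connect("mail.db")
    conn.enable_load_extension(True)                # off by default for safety
    conn.load_extension("/path/to/lib/libpcre.so")  # provides the REGEXP operator
    conn.enable_load_extension(False)
    rows = conn.execute(
        "SELECT address FROM aliases WHERE address REGEXP ?",
        (r"^postmaster@",),
    ).fetchall()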
Premature optimization is evil, but preemptive optimization is necessary unless you want to paint yourself into a corner. I realized this after implementing a bitcoin full node.
In my bitcoin implementation, as an experiment, I tried storing the blockchain in SQLite, Postgres, and LevelDB. I gathered up a bunch of data from the first ~200k blocks of the blockchain and benchmarked all three databases. I queried for something like 30,000 utxos out of a set of a couple million. What took 300-400ms in LevelDB took 1.6 seconds in Postgres (in the REPL; in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in Postgres took over 30 seconds in SQLite.
Now, you can tell me I did the benchmarks wrong, and "oh, if you just did this it would be faster!", but 30+ seconds is slower to an absolutely insane level. Needless to say, I went the key-value store route, but I was still astounded at how slow SQLite was once it got a few million records in the database.
I actually like sqlite, but when you know you're going to be dealing with 70gb of data and over 10 million records, preemptive optimization is the key. If I were the author, I would consider switching to postgres if there are over 500k-1m records to be expected. That being said, if they're partial to sqlite, SQLightning (https://github.com/LMDB/sqlightning) looks pretty interesting (SQLite with an LMDB backend).
edit: To clarify, these weren't particularly scientific benchmarks. This was me timing a very specific query to get an idea of the level of data management I was up against. Don't take my word for it.
What operations were you doing in your benchmark? What was SQLite actually doing (CPU?, disk I/O?) while taking 30 seconds to finish? This would be a lot more useful and less FUDdy with more detail.
I'll clarify here: these weren't very scientific benchmarks, as I alluded to. So, don't take my word for it. I didn't measure ops per second, I didn't take into account system load or cpu usage, etc. This is all anecdotal evidence. I'm not saying this an announcement that "you shouldn't use sqlite". I measured the time of one query.
It was a benchmark I ran just to personally give me a general idea of what I was up against in terms of data management. This is primarily why I didn't post them in the first place. This was several months ago. I'll try to find the code that ran them and put them up on github if I can, but I doubt it will be that useful without the actual data, which isn't easy to upload.
So yes, to clarify, don't take my word for it. This is just my experience.
I would be interested in a repo or post about your setup for the actual project: downloading, storing and using blockchain data. Sounds super interesting, so if it (article, repo, site) exists and isn't private, I'd be keen. Cheers.
The project itself is an alternative bitcoin full node, one of the few (along with btcd, bitcoinj, etc.), and the only one that runs in the browser: http://bcoin.io/browser.html. I've posted it here before but no one seemed to be interested.
It's probably not good if you want to index anything more than addresses, but you could easily modify it to index other things if you wanted to. The actual blockchain database resides here: https://github.com/bcoin-org/bcoin/blob/coinviews5/lib/bcoin...
^ That's the current branch I'm working on which, as it happens, optimizes a lot of the storage of utxos. Before, it was storing them the messy bitcoinj way. Now it uses coin viewpoints similar to bitcoind and btcd.
It would still allow you to use a simpler solution and not paint yourself into a corner if you optimized on SQLite; no preemptive optimization necessary w.r.t. choice of DBMS if you can perform the work on the simpler solution.
> What took 300-400ms in leveldb took 1.6 seconds in postgres (on the repl. in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in postgres took over 30 seconds in SQlite.
I'm sorry to say but you're almost certainly doing something wrong then.
It's possible, but it was a very simple database layout. Pretty hard to screw up. Then again, I screw up simple things all the time. But whatever I screwed up would likely have been duplicated on the postgres side since the sqlite and postgres setups were nearly identical. Postgres performed fine compared to sqlite. It was a bit behind leveldb, but I figure the overhead of flushing the data to a socket has an added cost. Leveldb has the advantage of being in the same process.
If I remember right: querying in LevelDB was iteration from [20-byte-hash][lo-utxo-id] to [20-byte-hash][hi-utxo-id] and subsequent lookups on the keys containing the actual data. The SQL table setup was `id` as a char primary key and `address` as a char with an index. The query was a simple `select data from utxos where address = ?`. The actual `data` blob was on average maybe 60 bytes in size.
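Roughly, the SQLite side looked like this (reconstructed from memory, not the actual benchmark code; names are approximate):

    import sqlite3

    conn = sqlite3.connect("utxos.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS utxos (
            id      TEXT PRIMARY KEY,
            address TEXT,
            data    BLOB              -- serialized utxo, ~60 bytes
        );
        CREATE INDEX IF NOT EXISTS utxos_by_address ON utxos (address);
    """)

    # the query being timed: every utxo blob belonging to one address
    rows = conn.execute(
        "SELECT data FROM utxos WHERE address = ?", ("some-address",)
    ).fetchall()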
Maybe there was something I could have done better there. I'm not sure. This was just my experience. This is all beside the point anyway. My point was whatever database was better, it was worth testing each one, and I don't consider it to be premature optimization.
By table you mean index, right?
What matters most is the size of the index, and most index data structures (see B-trees) work so that very few disk seeks are needed, even if the index is large.
Perhaps if you're using a really slow spinning disk and have no indexes, or the results are distributed evenly across the entire dataset (i.e. lots of random, non-sequential access).
It's just that 300ms is really slow, even with a largish dataset that doesn't fit into memory. Perhaps you hit a corner case in some way that destroyed performance in SQLite, but I'd be surprised if those results were representative.
With relational databases, you absolutely can see subsecond to 30+ second ranges for the same query on the same data, if you don't have proper indexing or stats on your tables.
Depends on your disk system and data set size... It can also mean 2-5 hours or days versus a subsecond query. Try having a 6 TB database with an integer column, then select just one row from it. Either it's indexed or it's not; you don't need to explain that. If the query ends up taking forever, it wasn't indexed.
Between that, and packages readily available on most Linux and BSD distros out there (and, in most cases, installed by default), it's well on its way to becoming a de facto standard system API for relational storage.
It's amusing how Microsoft's different departments keep trying to kill off each other. They spent so much time trying to shove SQL Server Express down everyone's throat, and now everyone gets SQLite included instead.
SQLite is not a competitor to SQL Express, because the latter is still an out-of-proc server. It is a competitor to SQL CE (which shipped its last release to date in 2011, so...).
The nice thing about SQLite is how little it assumes about the world outside. That made it easy to run in WinRT application sandbox with minimal changes; and for quite a while, it was the only local DB readily available to WinRT code written in C# or C++ (JS got IndexedDB).
For example, his POST++ has a direct mapping to C++ classes, so if you use C++ then you don't need any ORM.
In my Sciter[1] Engine I am using his DyBase library [3] as a built-in persistence for Sciter's script [2] (JavaScript++).
With DyBase in script you have features similar to MongoDB (a noSQL free-form DB) but without any need for an ORM or DAL - you can declare some root object to be persistable and access those data trees as if they are JavaScript objects. The engine pumps objects from the DB into memory when they are needed:
var storage = Storage.open(...);
var dataRoot = storage.root; // all things inside are persistable
dataRoot.topics = []; // flat persistable list
dataRoot.topics.push({ foo:1, bar:2 }); // storing object
/* create indexed collection with string keys, keys can be unique or not */
dataRoot.titles = storage.createIndex(#string);
DyBase has Python bindings too.
[1] http://sciter.com - multiplatform HTML/CSS UI Engine for Desktop and Mobile Application
I was unfamiliar with this project and assumed it was a hosted service at first. Not so, this is a local application, so an embedded database makes sense.
It took until the very last paragraph for the blog post to make that point.
FWIW, I've run SQLite in a few production sites (low 6 figure pageviews per month) and it has worked fantastically. If you understand and work with the limitations, it really is amazing how much you can get out of it.
I'm actually surprised WordPress hasn't ever moved over to it for ease of installation/deployment - WordPress and PHP seem more likely to trip over in most deployments I've seen before SQLite would.
> I'm actually surprised WordPress hasn't ever moved over to it for ease of installation/deployment
For the same reasons WordPress hasn't moved to Postgres or doesn't include feature X, Y or Z. WordPress has to remain relatively stable. A change in the database means breaking the WordPress ecosystem, as many plugins add tables to the DB. There is very little abstraction when it comes to the WordPress API and its interaction with the DB. WordPress doesn't ship with an ORM.
What are the limitations? I love SQLite, and have vaguely heard that it has issues with concurrency, but what, exactly? I use it on production for www.tithess.gr and it seems to be working beautifully, no concurrency problems whatsoever there.
What problems should I be expecting in a multi-access scenario? I've never had that question answered adequately.
Only one process can have a SQLite database file open for writes at a time. Multiple processes/threads can read, however.
That's about it. I'm hesitant to describe it as an "issue" with concurrency because that has connotations that it's a problem with SQLite that the authors should address. Rather, it's part of the simplicity that is a key design feature of SQLite.
As SQLite's own documentation[0] says, it doesn't compete with client/server databases like Oracle/Postgres/MySQL - it competes with fopen() (edit: not a system call, see below), hand-maintained serialization/pickling formats, etc.
Nitpick: fopen() is not a syscall, it's part of stdio which is in libc (in user mode). The nearest POSIX syscall equivalent is open(2), but stdio does things like buffering and formatting in user mode on top. It's also more portable to non-Unix because stdio is in the C language spec.
I had heard this phrase before as simply "it competes with fopen()".
Andrew has already spoken to the single writer matter. Another issue is it's not very easy to scale horizontally at all. You could shard, but you're not going to be doing replication very easily (although a project does exist to provide replicated SQLite - https://github.com/rqlite/rqlite - but then you might as well finally use Postgres or whatever).
My attitude to this is if my project can be working well and turning a profit within SQLite's limitations, we can worry about migrating to a different stack later on. The same reason I use Ruby for almost everything initially too.
Consider a multi-user WordPress site, for example. You can have one SQLite DB per user - there is simply no information to share between users. If so, you can split all your users between multiple backend machines and do per-user request routing - like one machine serving users from A to F. SQLite has a cheap DB loading sequence, so you can open/close it for each page request. That would be perfect horizontal scaling, as all your users will not fight for single DB access.
Or just remove one [DB] layer completely. Such databases can be stored directly on machines executing http requests.
And if you need to rebuild DB structure you don't need to stop the world - do them one by one.
Of course, all this depends on amount of data you need to store for each user and informational structure of the app.
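A very rough sketch of the routing idea in Python (every name here is hypothetical):

    import sqlite3

    BACKENDS = {"a-f": "db1", "g-m": "db2", "n-z": "db3"}   # one machine per letter range

    def backend_for(user):
        first = user[0].lower()
        for letters, host in BACKENDS.items():
            lo, hi = letters.split("-")
            if lo <= first <= hi:
                return host
        return "db1"                                        # fallback

    def handle_request(user):
        # runs on the chosen backend; opening a SQLite file is cheap enough to
        # do per request, and no other user ever contends for this file
        conn = sqlite3.connect("/data/users/%s.db" % user)
        try:
            return conn.execute("SELECT * FROM posts ORDER BY created DESC").fetchall()
        finally:
            conn.close()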
You cannot have multiple writers to a SQLite database. Only a single file descriptor may be open with 'write' access. As long as you can get the performance you need out of a single writer, then you're good!
Clarification: You can have multiple connections (aka file descriptors) open on the same database file for writing at the same time. But only one can be actively writing at a time. SQLite uses file locking to serialize writes. Multiple writers can exist concurrently, they merely have to take turns writing.
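In Python terms, "taking turns" mostly comes down to giving each connection a busy timeout, so a writer waits for the lock instead of failing immediately; a small sketch (table name made up):

    import sqlite3

    conn = sqlite3.connect("shared.db", timeout=5.0)   # wait up to 5s for the write lock
    conn.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT, n INTEGER)")
    try:
        with conn:                                     # one transaction per write
            conn.execute("UPDATE counters SET n = n + 1 WHERE name = ?", ("hits",))
    except sqlite3.OperationalError as exc:
        # the lock was still held after the timeout ("database is locked")
        print("write could not get its turn:", exc)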
I've run into this issue before with Django's automatically generated migrations breaking in SQLite. As a workaround, I rm my local test db or make the change myself.
The worst thing that I experienced was actually the lack of basic support for routine schema alterations that you find in other DBMSs. I don't have time to recall exactly, but relatively mundane things like altering the names or data types of columns required basically dropping and rebuilding the columns / tables from scratch. It really made the ongoing maintenance quite painful compared to other DBMSs.
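For anyone who hasn't hit this, the workaround is the classic rebuild dance; a rough sketch (table and columns are made up, changing a column's type):

    import sqlite3

    conn = sqlite3.connect("app.db")
    with conn:
        conn.executescript("""
            -- setup only: the "original" table with the old column type
            CREATE TABLE IF NOT EXISTS tracks (id INTEGER PRIMARY KEY, title TEXT, length INTEGER);

            CREATE TABLE tracks_new (
                id     INTEGER PRIMARY KEY,
                title  TEXT,
                length REAL               -- the "altered" column type
            );
            INSERT INTO tracks_new (id, title, length)
                SELECT id, title, length FROM tracks;
            DROP TABLE tracks;
            ALTER TABLE tracks_new RENAME TO tracks;
        """)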
Another limitation can be the aspect of security. If you have a more complex application / set of applications, you may want to use different database users for different purposes. Maybe some audit / append-only tables as well.
With sqlite whoever controls the app can do whatever they want with the database file.
Wouldn't a "full" RDBMS like Mysql/Postgres offer a ton of benefits over SQLite (like the features to handle edge cases as they arise) to the point where, even if SQLite would work, SQLite still wouldn't be the ideal choice?
Does wordpress completely abstract the database or do third-party plugins use their own interfaces to the database, in which case a migration to SQLite would break a lot of them?
Wordpress does attempt some abstraction of the database via its various wrapper classes/functions like wpdb, WP_Query, etc. It's definitely not ORM-level abstraction by any stretch of the imagination, though.
Agreed. Not that the 3 items are not relevant, but the fact that it's a local application sounds to me like the most decisive argument in favor of an embedded DB.
SQLite also does remarkably well with recovering from all manner of power loss / crashes / worst case scenarios. We created a "power loss" rig just to test this facility for one particular system. Really SQLite's biggest weakness is concurrency, and if your app needs that in any serious amount you probably ought to look elsewhere. If you're just dealing with occasional concurrency though SQLite shouldn't be dismissed out-of-hand.
The breadth and depth to which SQLite is tested[0] is both admirable and inspiring.
It's not just "internal" tests, like unit tests or things that can be run from client C programs, but tests of all kinds of hard-to-simulate external situations like out-of-memory and power failure situations.
That is simply an amazing write-up. I'd still not call it a high-assurance system given some things are missing. Yet the amount of rigor in the testing phase could easily be the baseline for that niche, in that it exceeds about anything I've seen. There's basically so little I could suggest on the code or testing side that I'm not going to even bother here, given how marginal it would be due to the effort already put in. Just too well done for mere speculation.
I also noted that the assert section essentially does Design-by-Contract. This is a subset of formal specification whose prime value is in preventing interface errors (the vast majority in big apps) and supporting formal verification. Done in the design/spec phase in high-assurance work since both Edsger Dijkstra and Margaret Hamilton independently discovered the technique's value. Done at the language level since Eiffel, due to Bertrand Meyer. Good to see that, even if not using all the techniques, they're applying the 80/20 rule to get the most benefit out of what formal specs they're using. Asserts also let you easily enable run-time checks if you can accept the performance hit. Nice.
I gained a newfound respect for SQLite recently after reading Dan Luu's post on file consistency (http://danluu.com/file-consistency/). I had always thought of SQLite as a bit of a toy database, but having read that post I was surprised by how rigorously it appears to have been developed.
Reading that post makes me want to investigate LMDB as an alternative "competition for fopen()". I'd be very interested if anyone had opinions/experience on that idea.
(Disclaimer: LMDB author here) When your data is simple, too simple for SQL, LMDB is the rational choice. (And as others have already pointed out, you can use SQLightning, SQLite built on LMDB, if you really want the relational data model.)
Nothing else even approaches LMDB for small footprint, efficiency, and reliability.
It works very well for concurrent readers and can handle one concurrent writer when set to WAL mode. Saying that it's no good at concurrency is selling it short. It's blazing fast as long as the use case doesn't require multiple concurrent writers. In fact, it will be faster than client-server databases when you aren't hitting a weak point like that, since it has no IPC overhead.
For any database that isn't huge, a library embedded into your application is going to be faster than anything that has to communicate with a server over a socket connection. Though both execute SQL queries, SQLite is completely different from relational database servers and appropriate in many places where running a full RDBMS is not. For example, you can't run MySQL or Postgres on the iPhone, but you can use SQLite.
Could you expand on some of your use cases? An R data frame, by definition, has to fit into memory, so it would seem that any sort of map/filter/group/fold operation would be fastest if performed in-memory, as well. And I assumed that e.g. joining data frames (where you would run out of memory really quickly if your datasets are large to begin with) would be uncommon... am I wrong?
I for example often need to score/model data which doesn't fit in RAM (on my PC) so I use libraries like bigGLM which can use out-of-memory data to build the models. One of the options is SQLite, but you can use an ODBC connection.
Additionally, I can explore slices of the data, which resides only on disk. I don't even need to import it. I can use dplyr (very famous package for aggregations and slicing) which will map the R syntax to SQL which is executed by SQLite.
The setup is quite interesting. The SQL data frames library will embed a data frame in SQLite and let you use SQL. It has been quite useful in transitioning our SQL data analysts to R.
> we have almost no indices, no principled denormalization
Sounds like an easy win. People are probably suggesting a database switch because they're finding issues with the current speed, but they're not using their current database to its full potential yet.
Is the bottleneck even the database in the first place? From the article, that doesn't seem to be the case:
> The main case when beets writes to its database is on import, and import performance is dominated by I/O to the network and to music files.
Small file I/O is universally slow, even on SSDs, and if you're hitting rate-limited MusicBrainz servers for each file, database performance is almost irrelevant.
This appears to be a desktop application, so SQLite seems like a good fit. Although if some people are requesting it, perhaps they have their reasons. For example, if I had an RDBMS already set up with proper backups I would generally prefer to use that, since it's little effort to make yet another allocation on it and use that. It would most likely also enable me to have access to the same databases from multiple computers. With SQLite you would have to solve the same problems again (e.g. store the database in a Dropbox account, and deal with the shortcomings of that approach).
IMO, instead of writing about how SQLite is the best choice for the project, I think it would be better to add support for multiple backends. Something that is good for you and the majority of users is not necessarily good for everyone.
SQLite is fine for small-scale systems. It is not a "web scale" database, but not every web site is "web scale."
SQLite does have performance limits, and will break at a certain load, but until then, it's okay.
For single user databases, like desktop applications, SQLite is awesome!
What the others bring to the table is concurrent server performance, user management, and such.
There's nothing surprising about this, right?
I'm a long time user and lover of SQLite, since way back when. Use it in a lot of our projects (web and Win32) that require local databases for logging etc.
Sure, for larger databases or concurrent user access to a DB we use other databases, but nothing beats the 'zero footprint' install of SQLite. I even enjoyed the fact that earlier versions of SQLite forced me to think about optimising my queries due to the lack of nested SELECTs.
SQLite still kind of reminds me of my early days in MySQL. I was recently trying to downgrade MySQL 5.7 to 5.6 after a slew of issues, which forced me to reminisce about how simple things used to be when MySQL was a lot like SQLite still is now...
And much overlooked. It doesn't get half the love it deserves. While it could do with some work around making the sysadmin experience a bit better and having a way of aliasing short 'public' database names to the file-system-level database, it's quite a good RDBMS.
About a decade ago, I wrote a system of intermittently connected systems that would run independently and sync up to a central DB, all using Firebird (with a UUID custom type) and a few utilities that ran via C# (Mono under SUSE). It worked surprisingly well for the purpose it was designed for.
Firebird was definitely nice in terms of being able to utilize the same db from the local systems (embedded) to the centralized server (service mode). As you say, it could use some love, but some of that could be done via symlink and/or consistent structure (/var/firebird/db/*). I haven't even looked at it in a while, wonder how hard it would be to use with electron/node, may have to look.
Beet is an awesome program; you should really check it out if you are still among the minority of people who actually have a music collection and don't rent access from spotify/itunes/etc.
I'm glad to see this post; one of the reasons that I like beet so much is that everything is self-contained.
You're doing it right for your application! MySQL or PostgreSQL would most probably be slower and introduce a lot more overhead as they are client/server oriented systems. Don't listen to those armchair architects!
I don't get the point of this article. SQLite is fine, especially as an embedded database, but once you have concurrent access it starts to suffer because it has very coarse-grained locks, so a "real" database is better for a distributed single-DB design. It's more about using the right tool for the job, and the author seems to be talking himself out of some kind of guilt for SQLite being the right tool for him.
I think you're misreading the post. They're just trying to explain why sqlite is the right tool for their particular job, so they don't have to keep explaining it to other people who keep trying to shoehorn in the wrong tool. Lack of concurrency in their app is specifically mentioned as a reason sqlite is appropriate. There's no guilt there.
I can't blame them. I've been a huge fan of SQLite for years. Anytime I need storage it's my default choice unless there is a specific reason to use something else.
Another nice advantage of it is if you are distributing something that requires a small dataset[0][1]. If I give you both the code and the data already imported into a sqlite database, then you can use the code right away, or you can dump the data to a different database very easily.
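For example, from Python the whole file can be walked out as portable SQL; a rough sketch (file names made up):

    import sqlite3

    src = sqlite3.connect("dataset.db")
    with open("dataset.sql", "w") as out:
        for statement in src.iterdump():   # CREATE TABLE ... / INSERT ... statements
            out.write(statement + "\n")
    # the resulting .sql is plain DDL/DML, so it can be replayed into a fresh
    # SQLite file or, with minor tweaks, loaded into another engine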
I have used SQLite for similar use cases, but occasionally it's led to a corrupted db. I had a cron task writing to it once a day, but an issue with the scheduler led to 2 tasks one day with the latter one finishing before the former.
Of course I can add locking or something in my code, but I'd prefer to handle at a lower level — for example, have SQLite take the latest write without corrupting. I'm hoping someone has solved this problem with SQLite elegantly.
SQLite is supposed to handle multiple concurrent writes by returning SQLITE_BUSY. I'm imagining hypothetical other causes for your corruption problem, like power loss at a bad moment.
The article you linked has an interesting line: "Unfortunately, most consumer-grade mass storage devices lie about syncing." That's the kind of problem I was talking about.
This is helpful; it's possible I'm assuming too much. The corruption did happen only once and was also during a time period that the server (a PaaS) was running maintenance including some downtime.
I may be imagining things but I seem to recall that at one point you had to build your own SQLite to get safe concurrent writes, but that was made the default years ago.
Haven't used SQLite in a while, but this is how I did it.
You should enable WAL (PRAGMA journal_mode=WAL); I believe you need to do this on every connection.
Now, whenever you are accessing the data, do it within transactions (https://www.sqlite.org/lang_transaction.html). Generally SQLite will acquire the proper locks when needed; it minimizes the amount of locked time by deferring locks until they are needed, but you can use the IMMEDIATE or EXCLUSIVE modifiers. If you use the EXCLUSIVE modifier, for example, only one program will be able to read/write at a time. You will generally want the deferred (default) behavior, since it still provides the safety but multiple applications can still access the database at the same time.
Also note that locking might not work if you keep the database on a network file system.
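Putting the recipe together from Python, it looks roughly like this (my sketch, not gospel; the table is made up):

    import sqlite3

    # isolation_level=None: manage transactions explicitly with BEGIN/COMMIT
    conn = sqlite3.connect("shared.db", timeout=10.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")      # per connection, as noted above
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (name TEXT)")

    # BEGIN is DEFERRED by default: locks are taken only when actually needed.
    # BEGIN IMMEDIATE would grab the write lock up front instead.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO jobs (name) VALUES (?)", ("nightly",))
    conn.execute("COMMIT")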
Some people are unfamiliar with the phrase "right tool for the job".
As the developers behind the project, I'd have to think the authors are in the best position to make the determination about which tool is appropriate.
"right tool for the job" ends up being almost meaningless unless the players involved have enough experience with multiple tools to make a useful judgement. Too often "right tool" is "whatever I've already used before". :/
That is true. I failed to adjust my perspective for the fact that not everyone has nearly thirty years of programming experience given I still consider myself an amateur.
Still, we should recall that the utility gained in using the right tool is finite and variable. In some cases it can be like night and day, while in other cases a moderate or negligible improvement might be all you see. Under some circumstances familiarity with a given tool may outweigh the advantages of using a better-suited tool - especially if a team lacks well-rounded expertise.
Reinforcing the other side of the argument is the fact that programmers are often subject to constraints placed on them from on high that restrict the choices they might make in such.
Given the specific situation, the author seems to build a pretty good case for SQLite at least by my own limited understanding of the facts.
The "constraints" part is def something to remember. I've done a lot of short term consulting or training over the years, and I'll come in to a new team and see what they're doing. The "right tool" line is often used to justify something, and it's often either because it's the only thing person X knew, but often the 'right tool' might take an extra several weeks to get the team up to speed on. It's definitely the 'right' tool, but they're denied that decision.
My reaction to "right tool" was less about this particular article (and I agree with you) as it was to the phrase in general. I've seen it used to justify just about any tech decision ever made.
The other reality is... if you're a C# shop, with a team of 15 C#/.NET devs, the 'right tools', if there's any deadline at all, will probably involve Windows, C# and .NET tools, even if they are demonstrably inferior to, say, a Linux-based stack. Deadlines, existing code and budgets do play a role in decision making, further confounding the usefulness of the "right tool" rule.
For everyone loving SQLite, you should consider donating to them. I remember a post this last year about the maintainers working on it full time, but making much less than most of us probably do.
Implementing WebSQL was so much work that all the browser vendors just basically bundled SQLite and called it a day. The problem was that the working group required at least one other implementation, before it became a standard. But that was such a monstrous project that none of the browser vendors wanted to take it on.
What the browser vendors agreed on is that they would rather have IndexedDB. I've never used it, but they say it is a lower-level API than SQL, and using it you could build your own SQL abstraction layer above it.
So we have as standards the key-value-based localStorage and then, halfway between that and SQL, is IndexedDB.
I participated in the HTML5 specification working group at the W3C as an Invited Expert. We had a proposal to add SQL storage to HTML5, but it was the pretty much unanimous opinion that adding a specification of any SQL flavor to the HTML spec would be too much.
Yeah, basically it would've had to specify all of SQL, or to say "do what SQLite does", and neither option was very palatable to all the parties. Some browsers [1] did implement WebSQL before it was formally abandoned, though.
Some browsers can use open source libraries; some simply cannot, for whatever reasons. Yet formally we need at least two independent implementations of the same thing in order for the spec to get Recommendation status.
So each UA (a.k.a. browser) has to implement any SQL spec from scratch, adding its own SQL flavor.
In fact, I am pretty sure that it is possible to add some basic SQL features in JS on top of IndexedDB. There should be such things already, I think.
SQLite is Ok, but write access must be synchronized.
I used it for my Flask (Python) application and was forced to switch to PostgreSQL because of synchronization problems. I would prefer sticking with SQLite which was simpler to manage.
The author doesn't say a word about synchronization when writing to SQLite.
The entire thing reeks of self-congratulation on how smart he is for not engaging in "premature optimization" by using an embedded database engine... as the database engine embedded in a single-user application.
I have written several "Use SQLite!" posts that have made the rounds on Hacker News... reading this watered-down post, which is devoid of any new, surprising or usable info, it strikes me that repping SQLite has achieved meme status.
If you want tangible info you can actually use, read SQLite's documentation. There's a wealth of information there.