For tests, first use a real database. If you can't because the database just doesn't want to cooperate (ahem, Mongo, ahem), then use a fake. If you think you can't use a fake, stop lying to yourself and use a fake.
Mocks for databases are extremely brittle and complicated.
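To make the distinction concrete, here's a minimal sketch of what I mean by a fake, assuming a hypothetical UserRepo interface (all names here are illustrative, not from any particular codebase):

    # A "fake" is a real, working in-memory implementation, not a mock that
    # merely records calls. The UserRepo shape below is hypothetical.
    class FakeUserRepo:
        def __init__(self):
            self._rows = {}

        def save(self, user_id, data):
            self._rows[user_id] = dict(data)

        def get(self, user_id):
            return self._rows.get(user_id)

    def test_save_then_get():
        repo = FakeUserRepo()
        repo.save(1, {"name": "Ada"})
        assert repo.get(1) == {"name": "Ada"}
        # A mock would instead assert *which methods were called*, which is
        # exactly what makes it brittle when the query path gets refactored.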
Ahem MongoDB? I must admit, I've never understood what MongoDB's real use case is supposed to be. Whenever I've looked at it, there has always been a better alternative. When I've had to use it, I've had to debug performance problems that wouldn't have existed with alternative solutions.
It sounds cool. But running software isn't about sounding cool.
A decade ago, it was really clear: as https://aphyr.com/posts/284-call-me-maybe-mongodb explains, MongoDB didn't really work. They've since fixed that, so now it runs acceptably correctly. I just don't know why I'd ever want to use it.
I worked full time on a pretty seriously trafficked product based on MongoDB for 3 years, and I still don't know of anywhere I'd want to use MongoDB. I'd basically always want either a DB with a schema or a super fast opaque-store style cache.
Also, their hosted offering (MongoDB Atlas) was not well operated, and it took down our company for two days. Our MongoDB instance stopped accepting new connections, restarting it via their control panel didn't bring it back up, and their support literally said "we don't know why this happened or how to fix it" for about a day and a half, while we were on their highest-tier ultra-platinum support contract. Ultimately, we had to do 24 hours of research and then tell their own support how to fix the problem using their special shell access. If I recall correctly, it was some issue with WiredTiger (the storage engine in the Mongo version we were using).
After that experience, I'd never use anything produced by MongoDB for anything; we moved everything we possibly could out of Mongo and into a more traditional RDBMS (PostgreSQL) and never had to deal with issues like that again.
This is a pretty common experience with mongo tools and support, sadly.
Recently I ran into a tool that spat out "you probably want to use this option!" We paid for enterprise support, so I asked why this option was not documented, and they said because it is dangerous. Can you imagine if the "-f" flag for rm weren't in the man page? Ridiculous.
MongoDB ships with horizontal sharding out of the box, has idiomatic and well-maintained drivers for pretty much every language you could want (no C-library reuse), is reasonably vendor-neutral and can be run locally, and the data modeling it encourages is both preferred by some people and steers users away from patterns that don't scale well under other models. Whether these things are important to you is a different question, but there is a lot to like that the alternatives may not have answers for. If you currently spend, or plan to spend, more than $10K per month on your database, I think MongoDB is one of the strongest choices out there.
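For what it's worth, here's roughly what the out-of-the-box sharding looks like from a driver; this is a rough sketch using pymongo against a mongos router, and the database, collection, and key names are invented:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # a mongos router, not a bare mongod

    # Enable sharding for the database, then shard one collection on a
    # hashed key so writes spread across shards.
    client.admin.command("enableSharding", "appdb")
    client.admin.command(
        "shardCollection", "appdb.events",
        key={"user_id": "hashed"},
    )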
Also want to add that you can definitely use MongoDB (or any other database) in a way that doesn't scale well. I have personally run MongoDB at petabyte scale and had a relatively great experience.
For most use cases, PostgreSQL is cheaper and faster to run.
A lot fewer companies are "web scale" than think they are. For the ones that are, there are other competitors, like Snowflake, that work well.
As for scaling, it depends on what you want to do. If you want to do things that look like joins, maybe it wasn't the right choice, though I've definitely made it work.
I used MongoDB at a company where engineering policy was strictly that MongoDB was our only allowable database, including in cases where it was clearly not the best choice.
Were there good aspects? Sure... kind of. It was super super easy to just throw data into the database. Need to add a field? Who cares, just add it. Need to remove a field? Who cares, just remove it -- as long as the calling code is null-safe. And it was likewise super easy to store denormalized data for fast lookups when it made sense to do so, as well as deeply-nested things and arbitrary JSON blobs synced from our CMS. And queries could be stored and manipulated as actual Python dicts instead of opaque strings, whereas you normally need an ORM or query builder to do that in SQL. And you could get the best-of-both-worlds-ish with the ODMantic framework, where you could spit out a warning (but not an error) and recover gracefully if there happened to be bad data in the database.
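To illustrate the "queries are just dicts" point, here's a rough pymongo sketch; the collection and field names are made up:

    from pymongo import MongoClient

    coll = MongoClient()["appdb"]["articles"]

    # Throw arbitrary, nested, denormalized data in -- no migration required.
    coll.insert_one({"slug": "hello", "tags": ["intro"],
                     "cms": {"blocks": [{"type": "text", "body": "hi"}]}})

    # Build the query as a plain dict and tweak it like any other value.
    query = {"tags": "intro"}
    include_cms_only = True  # e.g. some feature flag
    if include_cms_only:
        query["cms.blocks"] = {"$exists": True}
    doc = coll.find_one(query)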
Basically it allows you to forego any foresight or serious design, and instead just throw shit together. Which is great for getting a prototype going really fast and giving the appearance of a highly productive team. That is, until you run out of new microservices to prototype and now you have to start adding features to existing code and fixing bugs. IMO it was completely "penny-wise and pound-foolish" with respect to developer time, but I can at least see the appeal if you're a particular type of engineer operating under a particular set of incentives.
As for using MongoDB for things it was actually meant for (writing a ton of schemaless JSON blobs really fast and figuring out reads later), I have no idea because we never used it for that and had nothing really resembling that use case.
In my company they use it to store a large number of varying-sized JSONs. You can create indexes to make search queries lightning fast, along the lines of the sketch below. It's also been extremely stable.
I wasn't involved in setting it up tho, so can't say anything about how difficult it is to work with on the technical side.
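For reference, creating that kind of index is a one-liner; a rough pymongo sketch with invented names:

    from pymongo import MongoClient, ASCENDING

    coll = MongoClient()["appdb"]["payloads"]

    # Index the fields you actually filter on; without an index every query
    # is a full collection scan, with one the lookups are effectively instant.
    coll.create_index([("customer_id", ASCENDING), ("created_at", ASCENDING)])
    coll.find_one({"customer_id": 42})  # served via the index prefix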
I don't know. Like any other tool, it depends on how you use it. I worked for a unicorn with a presence in multiple countries, around 1,500 engineers, and they were using Mongo as the main DB in multiple microservices. It worked fine. I'm not saying it was justified, but we never had performance issues.
We had some pretty bad performance with $lookup queries but it all magically went away after adding some indexes. I have a lot of grievances with teams that use MongoDB, but relatively few grievances with MongoDB itself along these lines.
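For context, that kind of $lookup looks roughly like this in pymongo, and the fix is usually just an index on the joined collection's foreignField (names here are invented):

    from pymongo import MongoClient

    db = MongoClient()["appdb"]

    # Without this index, $lookup scans the customers collection per order.
    db.customers.create_index("customer_id")

    orders_with_customers = db.orders.aggregate([
        {"$lookup": {
            "from": "customers",
            "localField": "customer_id",
            "foreignField": "customer_id",
            "as": "customer",
        }},
    ])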
Sometimes you cannot choose. Sometimes you are handed down some decisions already made, and reverting them may have a cost the business doesn't want to pay.
PG and Mongo will vertically scale about the same, depending on your queries. They were probably using sharding with tiny instances, which is dumb. Also, large documents don't really hurt performance with Mongo, except maybe on writes, or with large array fields due to replication implications.
Yeah, that's a bad idea. You should instead just create a doc for each number. Each time you add a number, the entire list needs to be copied to disk and to each secondary, so the cost grows quickly with each write.
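The two modeling options look something like this (a sketch with pymongo; collection and field names are hypothetical):

    from pymongo import MongoClient

    db = MongoClient()["appdb"]

    # Growing-array version: the document keeps getting bigger, and per the
    # point above each append rewrites the whole list.
    db.series.update_one({"_id": "readings"},
                         {"$push": {"numbers": 42}}, upsert=True)

    # One-doc-per-number version: each write stays small and constant-cost.
    db.readings.insert_one({"series": "readings", "number": 42})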
It's trivial to spin up a PG instance on an AF_LOCAL (Unix ___domain) socket, create the DB, populate it with schema and test data, then run your app's tests. If all the tests have different non-overlapping data then you can even share the one instance for all the tests.
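A rough sketch of that flow, assuming the standard Postgres CLI tools are on PATH and psycopg2 is the driver your app uses; the paths and schema are placeholders:

    import subprocess, tempfile
    import psycopg2

    pgdata = tempfile.mkdtemp()   # throwaway data directory
    sockdir = tempfile.mkdtemp()  # directory for the Unix ___domain socket

    subprocess.run(["initdb", "-D", pgdata], check=True)
    subprocess.run(
        ["pg_ctl", "-D", pgdata, "-w", "start",
         "-o", f"-c listen_addresses='' -k {sockdir}"],  # socket only, no TCP
        check=True,
    )
    subprocess.run(["createdb", "-h", sockdir, "testdb"], check=True)

    conn = psycopg2.connect(host=sockdir, dbname="testdb")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE users (id serial PRIMARY KEY, name text)")
        cur.execute("INSERT INTO users (name) VALUES ('Ada')")
    # ...run the app's tests against conn, then: pg_ctl -D <pgdata> stop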