MongoDB shares jump more than 30% in $192M IPO

shubhamjain · on Oct 19, 2017

MongoDB isn't usually seen favorably but everyone must admit that it's an unlikely success story that deserves admiration. Think about it. Bringing a database to the market with a completely different paradigm, growing it to the enterprise-production-ready level, and creating a billion-dollar business around is no small deal. Yes, they did ride the NoSQL zeitgeist but they survived when others had no major success.

Undeniably, they did have their fuck-ups in the start, but I think they have done a good job fixing them.

munk-a · on Oct 19, 2017

If they had done so by presenting a product that was picked up due to it's merit and continued to gain traction based on that then I'd be rather impressed. Instead the story of MongoDB seems to be how extremely well targeted marketing and sales can build a so-so product into a huge IPO, there have been performance comparisons showing it isn't even the best at what it does so this IPO is riding on the network effect of those early stages of marketing alone.

In our economy this is sadly an entirely valid route to making money but their mindshare is going to continue to collapse as the warts on their product show more and more.

So the admiration you have should probably be directed at the wool the marketers behind MongoDB managed to pull over your eyes.

rburhum · on Oct 19, 2017

Most people forget that with startups, what you are trying to create is a business... the technology behind is only one element to it.

It is extremely difficult to do good marketing and sales and they did it.

Solve a customer problem while building a healthy business... a success story in my book.

gxs · on Oct 19, 2017

Agree - I think there is a perception that if you build an awesome product it sells itself.

There is so much work to running a successful business, especially at that scale. Marketing, Sales, Implementation, Support, on top of G&A, and that's not even touching on HR and retaining employees in a competitive market.

There is a reason some people are known as the "business" guys, I think expertise in that area is underrated, especially among people who've only worked at smaller companies or startups.

Edit: Genuinely curious to know why I'm being downvoted. Care to elaborate on why you're downvoting?

sanderjd · on Oct 20, 2017

I didn't downvote you, but if I had, it would be because while you're right that building sales and marketing is difficult and doing so successfully is certainly admirable, another fundamentally important part of creating a successful software business is building a successful product. I suspect your downvoters don't believe that Mongo has actually done that part.

onion2k · on Oct 20, 2017

There are a lot of successful software businesses built on terrible products. In fact, in my experience, a lot of startups put far too much emphasis on software quality. As painful as it is for many developer/designer entrepreneurs to hear, if your product solves the customer's problem at a price they're happy to pay then more than good enough to sell to them, even if it's absolute garbage from a tech perspective.

There are plenty of reasons to write good quality code (it's more maintainable, moves faster, more secure, requires less resources, more fun, etc) but "because you won't be successful if you don't" really isn't one of them.

KGIII · on Oct 20, 2017

Speculation: You were being downvoted because software types often express ideas that can be summed up as business is easy and the people doing the business aspects aren't important. They seem to think that everyone and anyone can do it and, when they hear success stories as examples, it reinforces those ideas.

A good example is some of the comments that software folks make about the sales department.

They are different skillsets and people have varied abilities in each. The business aspect is not actually easy. It is also not always done well. The software end isn't always done well, either.

My guess would be that is why you were being downvoted. I notice a number of votes are emotional in nature. They aren't based on logic. I see some that are most easily explained as, "I don't want to know." The comment is factual, not confrontational, and topical. The vote is still negative.

Meh... I don't mind it when it happens to me. I have karma to spare and don't actually make comments for the points. I say what I feel needs to be said and try to stick mostly within the guidelines. (I do meander off topic sometimes.)

But, no... I don't comment for the points. I comment for the replies and thoughts. They are usually quite brilliant and insightful.

sanderjd · on Oct 20, 2017

I did not downvote the parent comment, but I did downvote you, because I think your comment stereotypes "software types" in an unfair way. It is possible to both have a lot of respect for the difficulty of building the business side of an organization and for the folks who can do it well, and to think that it is unwise to build a business based mostly on sales and marketing without focusing enough on the quality of the product itself.

KGIII · on Oct 20, 2017

You missed the part where I said 'often.' That is okay by me. Karma is only good if you spend it.

sanderjd · on Oct 20, 2017

Sure, but your speculation suggests that the "often" is enough to explain the parent's downvotes. I think there's a better explanation that doesn't impugn certain "types" of people. I'm just speculating too of course.

KGIII · on Oct 20, 2017

Just out of curiosity, what do you think is a better (and more probable) answer?

sanderjd · on Oct 22, 2017

Sorry I just saw this - it's what I said in my original comment that you replied to, I think a better and more probable explanation is that people "have a lot of respect for the difficulty of building the business side of an organization and for the folks who can do it well, [but that they] think that it is unwise to build a business based mostly on sales and marketing without focusing enough on the quality of the product itself".

mrbrowning · on Oct 20, 2017

Mongo as a business has largely catapulted itself to where it is by making money off of information assymetry more than actual technical superiority. They may have since corrected a lot of the fundamental technical isssues with their product, but that doesn’t change the fact that (contrary to SV dogma) their success doesn’t rest on the creation of wealth, only its extraction from magpie-type customers. Downvoters likely chafe at the notion that such a strategy is laudable or valid.

rtpg · on Oct 20, 2017

Even assuming that the product is so-so, saying that they did not create wealth is a bit disingenuous. They basically created "NoSQL" discussions at a time where most settled on some SQL variant despite their difficulties.

It's not like they were just a vendor of some other product. They actually spent money on the product, on evangelism (kind of marketing, but also kind of development), on so much.

Stripe's technical achievements at first were basically "a nice button" and a slightly friendlier risk model. Yet they are lauded (as they should be!). We can cut Mongo some slack given how we act about other companies.

philwelch · on Oct 20, 2017

I would say that NoSQL was more popularized by Amazon’s whitepaper on Dynamo, which they published in 2007.

alexasmyths · on Oct 21, 2017

"their success doesn’t rest on the creation of wealth, only its extraction from magpie-type customers"

To make such an assertion, you're going to have to provide evidence that the people using Mongo and it's services, are, in fact too stupid to know what's good for them, and that they are absolutely better off using something else.

nopzor · on Oct 19, 2017

Have no idea why you're being downvoted. I think your point is salient and under appreciated. Certainly from my own experiences it's easy to get caught up in the product and the tech, and not focus enough on the biz side of things. Ie. Sales and Marketing.

Basically I'm in complete agreement with you, and feel this is a hard learned lesson.

_8huj · on Oct 20, 2017

I'm not sure HN celebrates products for simply being successfully marketed.

That said, I do think they made a way of handling a data model pleasurable. Different paradigm++

developer2 · on Oct 20, 2017

>> not sure HN celebrates products for simply being successfully marketed

Except that Mongo didn't find success through "simply being successfully marketed". So many people in this thread cannot seem to understand how to look at the business as a whole. Software devs are blasting the software for not being a mind-blowing piece of technology, while not understanding that it has its niche and - apparently - a good business strategy.

You cannot (typically) succeed by only having an amazing product. You also (typically) cannot succeed by dumping millions into sales and marketing for a truly garbage product. The product itself and the business behind it are both necessary elements for success.

There are a lot of developers in this thread trying very hard to appear smart by jumping on the "Mongo is a joke" bandwagon.

quickben · on Oct 20, 2017

Have you looked at their financials? They are in the same bandwagon as their technology.

gaius · on Oct 21, 2017

Mongo's business model is "exploiting the technically naive". They are the CueCat of databases.

alexasmyths · on Oct 21, 2017

How is it exploitative to give something away for free?

Imagine if everyone using MongoDB had to pay for it, you know, like most things in the world?

I suggest they are giving away their cake and picking up a few crumbs after the fact.

Builders don't typically give away homes to sell a few cabinets.

lafar6502 · on Oct 21, 2017

Exploiting with products available for free? Oh really?

gaius · on Oct 22, 2017

The first taste is always free

awinder · on Oct 20, 2017

I agree but given that the technology is the product here, they’re in a bit of a different situation. Evaluating the technology is evaluating the product here.

rburhum · on Oct 20, 2017

During my first startup as the CTO I made the mistake of thinking this way. I was so wrong I burned through 2M like it was nothing.

The "product" of a startup is a sustainable business model that can scale fast. The core technology may or may not be a key element part of that model.

So they solved a customer problem with interesting technology - regardless of arguments if it ACID compliant, or as good as <replace with whatever you think is "better">.

Where they sustainable at the beginning? No. Are they sustainable now and did they scale appropriately? Yes.

As a startup that went through it stages, they did it. This is not easy because it requires all the elements to work properly (market need, sales, marketing, engineering, operations, etc).

So what if RethinkDB/or_other has a feature that is better? Irrelevant.

Even better they did this ethically, which is more than I can say about other "startups" with insane funding and dubious and questionable strategies.

jameskilton · on Oct 19, 2017

Ah, right, like RethinkDB?

As much as I'd love this to be true, the #1 measure of success is if people even know you exist and #2 is if you solve enough of a customer's problems that it's worth using your product. MongoDB hit these two points hard right out of the gate and are now very successful because of it.

takeda · on Oct 19, 2017

I think it all comes down to what definition is of solving a problem.

What Mongo did is implementing these features just well enough so they can put checkpoint in their marketing brochures.

When you actually start using it you learn that most of it is either performing slow, does not work correctly or once in a while corrupts your data.

Yes they hit these points hard, but only from marketing point of view. They are now universally hated by ops and developers because these people now have to deal with fallout.

RethinkDB actually is an example that database is something that's not good as a base for a startup[1], and the Open Source approach (like PostgrSQL) is more suitable.

MongoDB is example that you can base a startup on a database technology and succeed, if you can sleep well at night that your product can cause people to lose data.

[1] The reason RethinkDB failed was because they refused to provide half solutions like MongoDB, everyone expects startup to start making money, but building a reliable databases is not something that's easy and can't be done quickly.

Jare · on Oct 19, 2017

> Yes they hit these points hard, but only from marketing point of view

MongoDB serviced one of my projects for 5 years without a single hitch. It did everything I wanted it to do, it did it well, it did it exactly as advertised (no more, no less), and it did it right when I needed it. It never stopped improving during those years, massively in some aspects. The company behind it only grew stronger during those years, and along with it, so did my confidence that my project wouldn't need to undergo re-engineering further down the line.

Seriously, as a professional software developer, what more do you want.

Drdrdrq · on Oct 19, 2017

My experience too, 6 years. Never been a fanboy, but don't understand the hate either. MongoDB never surprised me, the tradeoffs were known in advance (at least the ones that mattered to me).

takeda · on Oct 19, 2017

Based on my experience, your use case was simple enough that any database would do just fine.

Jare · on Oct 19, 2017

Our use case was not simple, but MongoDB made it simple to implement. We used pretty much every feature of MongoDB. Any other database could have been coerced to do what we needed, of course, but with greatly varying degrees of extra complexity or overhead.

orf · on Oct 20, 2017

Please elaborate on how your use case was not simple, and how your data was not relational in the slightest.

tluyben2 · on Oct 19, 2017

Because of the term 'NoSQL' people (programmers) assumed they could throw all the information all those neckbeards are whining about overboard. Because NoSQL, people don't seem to treat it like a database and just toss data in it and run queries in the most insane manner possible. Just ignore the fact that your code makes it so that every single query is doing a full tablescan! That doesn't matter on NoSQL right? For a long time, case insensitive queries on a field were done (and recommended) using a regexp...

noncoml · on Oct 19, 2017

MongoDBs clients are much more simple to use than that of RethinkDB. Imho that was the number one reason it didn’t gain traction.

mfjordvald · on Oct 20, 2017

I'm curious what you mean by this. We use a combination of RethinkDB + Deepstream at work and it's amazing to work with with really simple querying language. I might very well be missing something, though, as I'm not the main developer so not too deep into it.

noncoml · on Oct 20, 2017

For one is the extra database handle and run I have to use every time, instead of the commands being methods to the db handle object. The repetition of males code more verbose.

Secondly, it’s the ReQL. It’s not native but not JSON based either. A weird hybrid mix.

But it’s only my personal opinion based on my taste.

lafar6502 · on Oct 21, 2017

And there are no bad reviews of Rethinkdb because nobody used it for real

VintageCool · on Oct 21, 2017

There was at least one review from someone who switched from RethinkDB to Postgres, which includes a lot of pain points and the line:

> A RethinkDB employee told me he thought I was their biggest user in terms of how hard I was pushing RethinkDB.

http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.ht...

wnevets · on Oct 19, 2017

>how extremely well targeted marketing and sales can build a so-so product into a huge IPO

Thats how most products succeed if you ask me.

munk-a · on Oct 19, 2017

I don't disagree, but we haven't seen them close the distance in terms of performance.

Let me clarify a bit, I'm not saying that this tactic is invalid or inefficient within our current system, in fact this is a great way to land a good payday if you can pull it off. My point was more that MongoDB is going to implode when their performance shortfalls become visible to the wider populace, if that happens, of course. It's still possible they take the earnings from this IPO and dump heavy into R&D investment to make up the lost ground, but it's also very possible they take this payday and ride out the tech until it breaks down.

takeda · on Oct 19, 2017

Even minor things like picking a product name had impact. They also implemented most desirable features (the gotcha was that they were half assed, but if your evaluation was also half assed as is usually done you won't notice it)

I actually chose it back in 2010 project and on paper MongoDB looked better than other databases. MongoDB name as they mentioned originates from Humongous Database giving false impression that it can scale well.

GenericsMotors · on Oct 20, 2017

> Even minor things like picking a product name had impact.

They get zero points for this.

It pretty much reads as "Retard"DB in Portuguese, Spanish, and IIRC German.

sigzero · on Oct 20, 2017

Not that it is 100% reliable but I put "mongo" in Google translate for all three of those languages and it comes out "mongo". Unless mongo is "slang" in those languages.

GenericsMotors · on Oct 20, 2017

It is slang, short for "mongoloide".

https://en.wiktionary.org/wiki/mongo

Interesting this isn't the case with English.

zimpenfish · on Oct 20, 2017

English would be "mong". Probably because we got lazy with our abusive terms in the 70s and 80s.

https://en.wiktionary.org/wiki/mong#English (Etymology 3)

Arnt · on Oct 20, 2017

It was picked up due to merit, IMNSHO, just not the for the factors you think highly of.

It didn't do ACID, not even nearly, but it did other praiseworthy things, with "laser focus on four critical things: onboarding, usability, libraries and support" (https://www.nemil.com/mongo/2.html). Please don't be disparaging about any of those four.

hodgesrm · on Oct 20, 2017

Amen to that. The first time I used MongoDB it was stupid easy to get started. We implemented data replication from MySQL into MongoDB in a 3 hour hackathon, something that would have taken at least a week for a relational database. For a full description of the experience see:

http://scale-out-blog.blogspot.com/2011/05/introducing-mysql...

Don't put down 'easy'. It's incredibly hard to do well.

P.s., I'm DBMS guy and have lived and breathed Jim Grey-style transactional consistency for decades. I'm still impressed by what they did.

capkutay · on Oct 19, 2017

How did MongoDB go to market differently than the other enterprise NoSQL DBs (MarkLogic, DataStax)?

pja · on Oct 19, 2017

I’m told they absolutely flooded tech meetups in the SV area and with advertorial talks - talks on NoSQL (that just happened to use MongoDB of course), lots of FUD about SQL vs. NoSQL etc etc.

I’ve no idea to what extent this was true, or if it was whether it was their main marketing approach, but it might have been quite an effective strategy: talks at meetups are often 'trusted' to a greater extent than other ways of reaching devs & if you can find a new generation of devs who are just starting out and hook them on your product before they get the chance to explore other approaches then it could work. High cost of course (someone has to pay for all that time) but if it gets you a leg up on your competitors in a $billion business then that’s what the VC money is for...

VintageCool · on Oct 19, 2017

MongoDB was huge at hackathons, too. People could implement an app in a weekend using MongoDB because it had no integrity checks and none of its drawbacks manifest themselves in the first two days of development.

munk-a · on Oct 19, 2017

I totally get this, but a hackathon is a one time software build and you don't need to lean heavily on a schema if you've only got one state your data has ever existed in, MongoDB is great for templating like this.

However, when it comes to writing software that's meant to survive changes and versions the schema ends up providing a safety net that ensures you can ignore the data that exists and assume that the structure described by your schema is consistent. Data integrity is key to maintainability and those constraints are what keep your data clean.

pja · on Oct 19, 2017

I wouldn't be at all surprised to discover that they seeded the hackathons with ringers too: Devs on payroll that turned up & implemented a website/app in a weekend that naturally 'just happened' to use MongoDB.

Makes you wonder what other meetup/hackathon visible tech is really covert marketing...

derefr · on Oct 19, 2017

> their mindshare is going to continue to collapse as the warts on their product show more and more

I'm not clear on why you think this—software products with huge mindshare and an open contribution model, get their warts fixed, usually by programmers sponsored by corporations repackaging the software (Redhat and Canonical with Linux) or running the software in a SaaS model (Cloudant with CouchDB.)

tsilva · on Oct 19, 2017

- "We are better then you are. We have better stuff..."

- "You don't get it, Steve. That doesn't matter!"

https://www.youtube.com/watch?v=CBri-xgYvHQ

SmirkingRevenge · on Oct 20, 2017

Javascript, x86, MySQL, DOS.... the market victors in this industry are often just "good enough" technologically speaking and sometimes even kinda suck.

Add MongoDB to the list. Its "good enough", and marketed well.

adrianN · on Oct 20, 2017

That seems to be the business model of all "enterprise" products.

amazingman · on Oct 20, 2017

Good for them. I hope their longer-term employees who likely traded lower salary for stock options get a nice windfall from this.

nagarjun · on Oct 20, 2017

What would you consider to be a "top-notch" NoSQL database?

darkr · on Oct 20, 2017

DynamoDB, Cassandra

rljy · on Oct 20, 2017

Has MongoDB improved post funding? I got involved in Docker early and invested about a year of work into a project based on Docker. Docker was slow and had warts and my project ultimately is almost unusable due to Docker's slowness and warts. I knew about the warts near the beginning, but I assumed that when they got $40million, they would fix the warts. They didn't. Indeed, it seems that quality has not changed since they got funded, perhaps it has even decreased! If nothing else, it has certainly gotten harder to install on linux. I think that it is often the case that a user base buys into an idea and funding comes and there is a vision. There is nothing wrong with that, so long as the company actually invests the money into achieving that vision. I mean, lots of people have bought into Tesla's vision, that vision hasn't been achieved yet, it's still mostly just marketing, especially the self driving part, but no one is going to hate them for marketing the vision if they DO succeed, or at least try valiantly.

dominotw · on Oct 20, 2017

> it has certainly gotten harder to install on linux.

Downvoted for this. I haven't used docker much but I just did

> yum install docker-ce

and started the service. took me less than 2 mins.

rljy · on Oct 20, 2017

I recently was at a seminar where the lady sitting next to me was a network analyst. Her laptop was running debian and she needed to install docker to be able to work along in the seminar, so she did the logical thing which was to go to docker.com and opened the "Get docker" menu. Under that menu, there are two main options, "Mac" and "Windows". She managed to get down to the "for servers" section and click "debian" at which point she was directed to "download docker from the docker store" at which point she was lost for a while. Then she managed to get to this document https://docs.docker.com/engine/installation/linux/docker-ce/... at which point I had to help her, because she didn't know which exact version of debian she was running and didn't know how to find out. I would say it was pretty much a catastrophe, as she spent about half an hour trying to get docker installed.

You ran "yum install docker-ce" you were lucky that you knew the name of the package. I didn't know the name on Fedora, and looking here https://docs.docker.com/engine/installation/linux/docker-ce/... didn't help one bit. Indeed, that page doesn't include a line with "yum" in it at all!

rogerb · on Oct 20, 2017

Lol. Yes, you customers of mongodb ! You’re all wrong and duped by pretty pictures! If just you could achieve the enlightenment of munk-a ! Lol. I’ve seen this idiotic response from the very beginning days of mongo (I worked there). Marketing at mongo did little more than maniacally focusing on making sure our customers and users where tied into the community and supremely supported on any issue they might have. But go on - keep predicting imminent failure, your sentiment is shared by a lot of people who have never build a company and frankly never will.

rdslw · on Oct 20, 2017

I don't buy "creating a billion-dollar business".

Ive read a Bloomberg story on the IPO and saw there two hilarious sequences "MongoDB has 4,300 paying customers. MongoDB employs 820 people in 29 offices" - that does not compute (for me) so I started analyzing their financial report (btw funny is nobody in whole HN discussion quoted numbers from the report so far, but a lot of 'im buying their stock' talks here).

In the report, scroll to the bottom where they put some cream: they show for last year 91m rev from subscriptions plus 10m from services. While at the same time they show operating LOSS of 85 millions (due to cost, people, marketing, sales, services etc). Khem khem, I know they are growing (holy world) but no, that does not compute.

Of course it is just mine opinion, and looking on the IPO results, rather unpopular one :) but I will stand by it. Especially in next 3 years. Caveat emptor.

I will short them soon.

austenallred · on Oct 20, 2017

Make sure to look at the revenue growth rate before you short (or encourage HN to do so). What happens if that growth rate continues for another year or two, and they have a standard growth multiple? You’ll lose your shirt.

It seems like you’re trying to value the company in a manner that doesn’t make sense for this stage of a company, and you could lose a hell of a lot of money.

For the uninitiated in the stock market: you shouldn’t really be shorting anything, especially not growth stocks, and abso-freaking-lutely not immediately following an IPO.

syllogism · on Oct 20, 2017

So long as you're making a loss, there's a lot of ways to "creatively" grow revenue. In the limit case (which would be fraud), you and I can just trade out expensive invoices, declaring each as a loss. This puts a lot of revenue on our books, with no value exchanged.

Mongo don't have to be committing fraud, of course. They could be doing any number of actual business activities that make legitimate trades --- but trades optimised at revenue growth, not profitability.

The major metric for companies used to be profit growth. When companies were optimising for that, it was smart to look at revenue growth as a leading indicator. But Goodhart's law ruins everything: now companies know to optimise for revenue growth, and so its value as a metric is much diminished.

austenallred · on Oct 20, 2017

Companies optimize for future cashflow, because that’s how value is created.

syllogism · on Oct 20, 2017

Mongo's current cashflow is negative, and the trend is increasingly negative.

I had a look at their prospectus, which describes the bulk of their revenue as subscriptions. Subscriptions sound like good unit economics. It's instructive to compare their pitch to investors to the pitch they make their customers: http://s3.amazonaws.com/info-mongodb-com/TCO_MongoDB_vs._Ora...

When Mongo talk to their customers, they describe the license cost as $0 --- they fold that into support. That sounds more like a service.

In other words: customer fires $100k of staff, pays Mongo $80k in "subscription", Mongo hires $120k of staff, which they tally up as "customer success". There's no expense category for support in their prospectus, so clearly the support personnel are filed under "sales and marketing".

It's the same old story. They're just selling $1 bills for $0.80 a piece.

throw98987 · on Oct 20, 2017

How can one short it efficiently?

austenallred · on Oct 20, 2017

I mean this in the kindest way possible: If you don’t know the answer to that question, you shouldn’t be shorting anything. Especially not at IPO.

nailer · on Oct 20, 2017

Outside the US, you can do OTC derivatives via Spread Betting. Eg, give me/take from me a pound for every point this stock moves for/against my bet. Most spread betting providers in the UK carry NASDAQ stocks.

Note the downside is unlimited when you short via spread betting. Eg, there's no maximum value of a stock, so you might get proper f'd.

013a · on Oct 20, 2017

The downside is not unlimited when purchasing put options.

notyourday · on Oct 20, 2017

It will be optionable soon. Buy out of money puts.

trive_news · on Oct 20, 2017

You could buy the stock of a competitor if you can't get any stock to borrow for short sale.

droidist2 · on Oct 22, 2017

What's their main competitor? Oracle?

icahnvalyou · on Oct 20, 2017

Most online brokers don't have lending inventory available yet. TD won't let me short.

notyourday · on Oct 20, 2017

Of course they don't. Settlement time is T+3. Unless you are MM, until the first IPO + 3 days no shares can be located. At T+3 it will go into hard to borrow list.

icahnvalyou · on Oct 27, 2017

ojr · on Oct 20, 2017

you have to buy with borrowed money, sell the stock and then buy the same amount at a lower price to make money shorting, I do not believe you will short them or have the capital to make significant money doing so but it is interesting to see the value you place on your analysis

enjo · on Oct 19, 2017

It really is an interesting story. They aren't best in class in any of the metrics we think of as mattering. It's not the highest scale system out there. It's not the most durable. Or the most available. Hell it's not even particularly reliable (at least throughout its history).

It is however..simple. Very simple. It's easy to reason about. It's easy to setup. For 90% of use cases it's very easy to administer.

It turns out the market for that type of data store. Something you can apt-get install and just start dropping data into is pretty massive. I've used MongoDB on a few occasions. Usually thinking I'm just using it to bootstrap a project, but three years later it's still running because it's just good enough to keep me from moving on to something else.

stanfordkid · on Oct 19, 2017

It really was the first NoSQL database that many programmers used -- and it came out right around the same time that JSON was really taking off. The timing was perfect, the product was simple. Fact is most people don't really need insane scale and there are thousands of use cases where you just need to read and write JSON and fetch things by a few different fields. I think they nailed it. Every other NoSQL DB out there at the time was insanely hard to "just play with" (e.g HBase, Cassandra etc.)

I think you hit the nail on the head -- for many programmers who were "coming of age" at that time Mongo was the easiest to experiment with and to get working.

devdad · on Oct 19, 2017

This is me - I had taken SQL at uni and found MongoDB to be much easier to reason about since I knew my frontend parts already. One of the apps I've built has a terrible database structure (my first real life server setup!), and is probably the worst codebase I've ever written. I will never show it publicly. It currently serves 100k+ users and the users have no idea how messed up the server code is. It won't scale to a million users, but that's okay and the technology served me well as a junior.

I since turned to RethinkDB and sometimes just a classic SQL will do, but Mongo is really easy to get started with.

pessimizer · on Oct 19, 2017

> It really was the first NoSQL database that many programmers used

That was because it was heavily marketed, not because of timing.

> Mongo was the easiest to experiment with and to get working.

It really wasn't, it was the one they had heard about.

derefr · on Oct 19, 2017

> It is however..simple. Very simple. It's easy to reason about. It's easy to setup. For 90% of use cases it's very easy to administer.

I think most of HN would feel this way about, say, Redis. And Redis isn't "highest scale" or "most durable" or "most available" either. (Though it is pretty reliable.)

It's interesting, then, to compare the general impression people here have to MongoDB to the one they have of Redis. To me, Mongo is a "why use it when Postgres is just as easy to install", while Redis is exactly what I'd think of "to bootstrap a project, but three years later it's still running."

Is it just the slightly-different pitched use-cases of "working store" (Redis) vs "persistent store" (Mongo)? Is it that Redis still has its uses even when you've got Postgres there beside it, whereas Mongo doesn't so much (at least since Postgres got JSON columns)?

nemothekid · on Oct 19, 2017

Redis makes very different promises than mongodb. Redis has always documented how exactly persistence works and how you could lose data. Redis' benchmarks arent dishonest. Redis isn't marketed as a SQL replacement nor as a primary data store.

adamnemecek · on Oct 19, 2017

> It's easy to reason about.

...until it's not. Then it's really not.

sametmax · on Oct 19, 2017

But how many people reach this stage ?

yeukhon · on Oct 19, 2017

It starts really early on. When I first used MongoDB I thought to myself "what a fresh breath, I can finally ignore normalization!" The documents I inserted would contain all the information I needed. One retrieval and I got what I needed. Wow!! But then I started doing sorting, searching and I had to do most of the work on the client side (my backend). At that point, I found myself in trouble because in my other tables I also dup my information. Data has to be updated in multiple places. So I thought "let's take out the dup data." Then I found myself not knowing how to structure my document data anymore... back to some sort of normalization. At the time the searching and ranking in MongoDB were also poor, so I was forced to doing the entire thing on the client side regardless.

I went back to PostgreSQL since. I probably needed a good one-on-one expert training with MongoDB, but I just found myself happier with RDBMS. You don't have to be strict normalization in RDBMS, just enough to make sense for your use cases.

One thing I really did like about MongoDB back then was storing blob (files). It was the best solution available without setting up S3. At the time there was a limitation with 4GB (??) but MongoDB worked for my use case anyway. That being said, please don't store files in any databases today. Use DB to store references to a real object storage like S3. When the DB crashed, you better hope no corruption.

goldfishcaura · on Oct 20, 2017

It amazes me how quickly our industry has forgotten the need for DBAs. With these MongoDBs, MPP cloud dbs and Hadoops, everyone seems to have assumed that engineers can now do all db work. This is reflected in the titles too: Data "Engineer".

But from my perspective, this is delusional. There is a lot that goes into DBA's experience that is not solved by the performance improvements in databases over the past decade. But there are more choices. 20-30 years ago, you would have been forced to write code on Oracle and you would have asked for help before deciding how to structure the data. Today, with more choices, you just read some online opinions, and jump on it without any internal resource to guide you.

Not saying the world of Oracle was great, but the young on this thread (me included) would benefit from respecting the experience of the old.

yeukhon · on Oct 21, 2017

I agree. The issue I see is it's pretty tough to hire very good DBA that knows the new cool technologies to be very honest. I've worked with some DBAs but they have no experience with Cassandra or whatever !MySQL !Postgres !Oracle !SQL, and they also have very difficult time integrating themselves with the developers. It turns out the developers have better understanding of Cassandra than the DBAs and DevOps/Ops. As Ops we just learn from them and from incidents.... Good DBAs also tend to do a lot of testing and development besides reading manual and utilizing past experiences.

droidist2 · on Oct 22, 2017

> everyone seems to have assumed that engineers can now do all db work

Yes, everyone wants "full stack" and so you have a bunch of people haphazardly adorning themselves with the "full stack" label.

adamnemecek · on Oct 19, 2017

It's a gradual process and you start feeling the pain fairly early on.

zitterbewegung · on Oct 19, 2017

Developer ergonomics are way more important than many features to get wide adoption. You can fix the other things with time. Worse is better.

brianwawok · on Oct 19, 2017

Except when you poison mindshare.

For example, I used Mongo back in the early days. Was terrible. I will now never use it again. I don't care if it shoots lasers. It is dead to me.

Obviously the happy developer count is way more than the hate it for life count, so I am the odd man out here. Perhaps many users never actually had many GB of data or had to deal with the data loss side of things?

erik_seaberg · on Oct 19, 2017

Maybe they haven't noticed? I've seen databases fail in ways that take a long time to detect on accident, if you don't have a source of truth to sanity check against.

bsaul · on Oct 19, 2017

maybe mongo is the db of failed products. Perfect until you really need a database.

The high valuation could be an index of the number of failing products happily using the tech :)

jsjohnst · on Oct 19, 2017

[apparently my opinion is unwanted]

jjnoakes · on Oct 19, 2017

Let's tone down the drama a little; I am willing to bet there aren't many people at all who would literally choose to lose their home versus work with a technology they don't like (even if for valid reasons, and even if those reasons include data loss or other catastrophes).

If my boss tells me to use MongoDB tomorrow or live on the street despite my advice to the contrary, I will use MongoDB happily. I may look for a new job in my free time if using MongoDB makes me miserable, but I certainly won't be living on the street...

jsjohnst · on Oct 19, 2017

[or maybe the fact I watched a hundred million $ startup go down the drain in part due to bad engineering choices, of which Mongo was one, doesn’t entitle me to a negative opinion of Mongo]

jjnoakes · on Oct 19, 2017

You can say you'd choose to be homeless all you want, but saying "many" would do the same I think is being a little too dramatic.

And quitting your job (as you now point out, but didn't originally) is much different from "choosing to be homeless".

jsjohnst · on Oct 19, 2017

I used the word quit in my very first post on this thread in a way that should’ve been obvious I meant quit my job.

jjnoakes · on Oct 19, 2017

It wasn't the word "quit" that was being objected to though; it was your equating quitting to being homeless (in the very same first post in this thread) and then further saying you think many others would also rather quit and be homeless than work with MongoDB.

That's a bit too dramatic.

Next time, just say "I'd rather quit than work with MongoDB again", and you won't have this problem.

jsjohnst · on Oct 19, 2017

> Next time, just say "I'd rather quit than work with MongoDB again", and you won't have this problem.

Fair point... “I would quit rather than work with MongoDB again” is more accurate, but still encapsulates your point.

The point I was trying to make earlier and did in a way overly dramatic for you is that I’d never take a tech job again if it meant I had to use Mongo.

pessimizer · on Oct 19, 2017

hyperbole

Thaxll · on Oct 19, 2017

lol, the things you read on HN, very entertaining.

rareattention · on Oct 20, 2017

>They aren't best in class in any of the metrics we think of as mattering.

MongoDB is the best in class at how programmer-friendly it is. Its API is easy to work with. This is especially the case if you are using node and javascript.

RHSman2 · on Oct 19, 2017

Have you ever tried to write a mongo query? Simple, it is not.

inferiorhuman · on Oct 19, 2017

> Bringing a database to the market with a completely different paradigm, growing it to the enterprise-production-ready level

Neither of those are true, however.

threeseed · on Oct 19, 2017

It was the first, well known, company supported JSON document store.

It is being used by Facebook, Metlife, Expedia, Sony, eBay, Adobe etc.

In what way isn't it ready for production use cases ?

flukus · on Oct 19, 2017

You need more than just a list of companies using it, you can create a similar list for just about any technology, good or bad.

Sony is a worldwide company with 127,000 employees, they alone probably use just about every database system around. So saying sony uses mongo isn't impressive, for all we know it was just a side project from an intern that has 100 documents, there is no context.

threeseed · on Oct 19, 2017

If we use your tortured logic that reference sites are meaningless then no technology in the history of the world is ever production ready.

Well I have worked for a number of billion dollar companies who have run MongoDB so there is first hand evidence.

inferiorhuman · on Oct 19, 2017

Hell, I work at a Fortune 1000 company that uses Mongo. Thankfully I'm not on that team (we're mostly postgres with some mysql derivatives mixed in). Trust me, nobody here actually likes dealing with Mongo.

flukus · on Oct 19, 2017

By your logic Visual Source Safe was production ready.

grafporno · on Oct 20, 2017

Well _maybe_ reference sites _are_ meaningless as measure of production readiness?

inferiorhuman · on Oct 19, 2017

> It was the first, well known, company supported JSON document store.

And how is that a paradigm shift? Non-relational, non-SQL databases have been around for a long, long time.

praneshp · on Oct 19, 2017

> In what way isn't it ready for production use cases ?

Your parent commenter doesn't like Mongo. So it's not production ready.

inferiorhuman · on Oct 19, 2017

> Your parent commenter doesn't like Mongo. So it's not production ready.

Eh I don't like Mongo because I've used it. IDK, it's been a few years since I've had to deal with mongo in a production environment. I don't miss it. I dislike that it's slow to get data to/from the javascript interpreter -- and that the solution was to work around it with the aggregation framework.

I don't like the unsafe defaults (data integrity, access control).

I don't like the magical, unreliable sharding that you have little to no control over.

flavio81 · on Oct 19, 2017

>it's an unlikely success story that deserves admiration

Well, it's a success story due to marketing, smart marketing, clever marketing, and more marketing, not due to lots of technical merits.

And it was not the first document store to appear, just the most successful. For starters, CouchDB was previous to MongoDB.

megaman22 · on Oct 19, 2017

Notes was a document store before JSON, or even XML was even a thing. The wheel turns.

hinkley · on Oct 19, 2017

Reminds me of JBoss, right down to the astroturfing.

munk-a · on Oct 19, 2017

Honestly, this story being on the front page of hackernews might be another example of that astroturfing. It's not impossible that they're trying to drive that price up as much as possible before dumping their shares.

thawkins · on Oct 19, 2017

Couch was a nightmare to use, mingo made all the friction to get it into products go away, especialy when teamed with dynamic languages.

karmelapple · on Oct 19, 2017

Using CouchDB on our team has been a pretty good experience; nothing like a nightmare.

We used replication on the native apps for offline mode, kept things simpler with online-only for our web app, and have had a very good time overall.

We are also using JSON schema for our CouchDB documents so we ensure we don't have all kinds of wild / wrong data showing up; if you had really hairy, unstructured data, I can see why a dynamic language might help with that bad situation, but for our use case, we were able to take advantage of the powerful replication built into CouchDB and get solid offline syncing for all of our native apps that also plays nicely with live updates in a web browser.

jared0x90 · on Oct 19, 2017

Did you really find CouchDB that bad to work with? I never in the end settled on either couch or mongo for a project long term but when I setup couch to test it out a few years ago I had it up and running with functional replication quite quickly (i.e., one afternoon) from being a complete novice with the platform. It's built in web panel at the time basically handled everything from basic setup to replication quickly.

bborud · on Oct 19, 2017

You forgot to mention: bringing a database to market that didn't work. Because for any real definition of "works" MongoDB didn't deliver on that.

(Note: I used to be optimistic about NoSQL, but I was dismayed by the extremely low quality that NoSQL products started out at, and stayed at, for a very long time)

GenericsMotors · on Oct 20, 2017

> Bringing a database to the market with a completely different paradigm

Ah yes, the "toss your data over the fence and hope for the best" paradigm.

Truly groundbreaking.

user5994461 · on Oct 19, 2017

Depends what you mean by production ready. Sure thing, $192M is not a billion dollar business.

yaseer · on Oct 19, 2017

$192M is the value of the shares sold in the IPO.

The MDB's market cap, i.e. it's valuation, is around $1.17 billion as of the time this article was written:

https://www.cnbc.com/2017/10/18/mongodb-prices-its-ipo-worth...

EDIT: Added 'billion'

kureikain · on Oct 19, 2017

Yes,

Recently I read about Postgres 10 release. Almost all of the features of that are available in MongoDB too.

4 or 5 years ago, MongoDB was bad. But nowsaday the landscape was changed, and I would say MongoDB is quite decent to work with now. It's the easiest database to manage, compare with MySQl, PostgreSQL, ElasticSearch.

gremlinsinc · on Oct 19, 2017

Is it me..or does Mongo not seem as relevent and 'hip' as it once was... I mean I feel postgres is much more solid, and you can combine some of the aspects of document store via the json data types they added... of course I'm not really a DBA and don't have a lot of Mongo experience ... but personally I feel rdbms make more sense for growth/scaling..

da_chicken · on Oct 19, 2017

RDBMS makes sense for most applications. Most applications store data that can be fit to the relational model. Most applications aren't big data or data mining OLAP.

Most RDBMSs can do key-value stores very well now. Most applications also care more about consistency over availability, which is what RDBMSs do (CAP theorem). Many NoSQL data stores choose availability and partitioning and sacrifice consistency (i.e., "eventual consistency"). There's a lot of applications that you can't sacrifice consistency for. Electronic health records, financial records, student records, employee records, etc. You care that the data are accurate and up to date, and you want the system to error if it can't provide that. Wrong answers and "close enough" answers aren't good enough.

Now, if you're running Reddit or Wikipedia or Facebook or HN... do you really care if a user doesn't get the absolute latest version of a document or comment? No, not really. If the content is hours old it's a problem, but it's not a big deal if it's a few minutes out of date. You care more that your users get a version of the document more than you care that they get the latest version of the document.

gaius · on Oct 19, 2017

Most RDBMSs can do key-value stores very well now.

Yep, all of MongoDB is just one bullet point on Postgres's list of features. Anyone spending on money on it ought to be hauled before the shareholders and given a talking to on fiduciary responsibility...

elvinyung · on Oct 19, 2017

>just one bullet point

Tell me again how Postgres can seamlessly do horizontal scaling and synchronous replication?

orf · on Oct 20, 2017

https://jepsen.io/analyses/mongodb-3-4-0-rc3

> MongoDB’s version 0 replication protocol is inherently unsafe.

Tell me again how MongoDB took 8 years to get to the point where its replication is kind of OK.

skrebbel · on Oct 20, 2017

This subthread is about the future, not the past. 8 years ago, PG didn't have json support so your point is moot.

I'm very curious about what people think about the future of Mongo, independently and particularly in comparison to Postgres. However every time that comes up, people keep bringing up that Mongo was a buggy piece of crap in some irrelevant past. So what?

orf · on Oct 20, 2017

> This subthread is about the future, not the past. 8 years ago, PG didn't have json support so your point is moot.

8 years ago PG did have replication though, so not sure why it not having feature X 8 years ago makes my point moot.

People keep bringing up that it was a buggy piece of crap because its the icing on the cake and pretty much something you never want your database to be, past or present. Not that software configured by default to eat your data and not persist it can be called a database mind you.

elvinyung · on Oct 20, 2017

I don't see this as a problem. It takes years for any software project to mature, a DBMS even more so. I'm sure that I can go back to the 1980s and find gamebreaking bugs in the original POSTGRES. It has been years for MongoDB to approach maturity.

Of course I would prefer Postgres when I can use it, and I can generally use it basically all the time, but NoSQL still has its use cases.

shawn-butler · on Oct 19, 2017

Synchronous replication was added in 9.1 and much improved in 9.6? pglogical[0] works pretty well for me under 10 but I have no production experience with bdr[1].

[0]: https://www.2ndquadrant.com/en/resources/pglogical/ [1]: https://www.2ndquadrant.com/en/resources/bdr/

elvinyung · on Oct 20, 2017

IMO, sure but it's far from seamless. (I also looked at pg's quorum commits, but the same applies.)

In general Postgres was not designed at its core for a distributed world. Even now, replication feels like an afterthought in the grand scheme of things, and sharding nonexistent without extensions.

tormeh · on Oct 20, 2017

You mean asynchronous?

elvinyung · on Oct 20, 2017

tribaal · on Oct 20, 2017

> Now, if you're running Reddit or Wikipedia or Facebook or HN...

Wikipedia uses MariaDB (so, MySQL). https://meta.wikimedia.org/wiki/Wikimedia_servers#Software

da_chicken · on Oct 20, 2017

Wikipedia predates NoSQL. It runs on PHP + MySQL because that's what was most popular back in 2001, and they have no intrest in completely rewriting their entire stack just to use Cassandra or MongoDB. That doesn't mean a NoSQL data store wouldn't work extremely well for the type of application that Wikipedia is.

sigzero · on Oct 20, 2017

MongoDB is the MariaDB of the NoSQL world.

paulmd · on Oct 19, 2017

> Now, if you're running Reddit or Wikipedia or Facebook or HN... do you really care if a user doesn't get the absolute latest version of a document or comment?

I mean... do you? I often come back a few minutes after posting to add something I forgot or rephrase something for clarity. I hate when I am tweaking a Reddit comment a couple times during a period of high server load and I get served an old version of the comment and end up losing something I added in a previous edit.

With something like Wikipedia it would be quite frustrating to lose revisions.

Obviously it is what it is, I can't change their codebase, and I'm sure it's necessary as currently engineered, but is there really no other way to cluster their data except "one big table"? Maybe like shard subreddits to specific servers ala Hyperdex?

But yeah, most places that Mongo is applied aren't exactly Facebook or Reddit either, in terms of total data throughput.

da_chicken · on Oct 23, 2017

Oh, it will certainly come up, but it's not going to break Reddit if you get an old version of a comment as long as it's eventually consistent. Nobody is going to die, and nobody is going to lose any money.

Data stores like Cassandra and MongoDB don't lose revisions. That's not the kind of consistency we're talking about. CAP consistency is just getting the most recent version. You won't lose data -- data loss is a bug, not expected behavior, just like any other data store -- you just won't always get the most recent version of it. And, keep in mind, when we talk about eventual consistency here we generally mean "consistent on all nodes within a few minutes, but we're not blocking reads to write this data." It's not going to take hours.

That said, if you find you get an old version of your own comment, I'd be more willing to believe it's the fact that your request failed with a 503 error or otherwise timed out as much as it was a data store problem. Next time it happens, wait 5 minutes and try again.

> is there really no other way to cluster their data except "one big table"? Maybe like shard subreddits to specific servers ala Hyperdex?

The whole point of MongoDB or Cassandra is that you can get shards without all the headache that RDBMSs usually put you through. You configure your sharding function and let the system do the rest. You don't have to connect to the right shard or anything of the sort, which some RDBMSs do (or did, it's been awhile since I've looked) require with sharding.

Reddit has their code and architecture posted, though it's out-of-date now, it makes it clear that it's basically just two big tables:

https://github.com/reddit/reddit/wiki/Architecture-Overview

It's PostgreSQL, ThingDB, Cassandra, memcached, and RabbitMQ.

Daycrawler · on Oct 20, 2017

It's not about the applications. It's about the components of the application. Reddit, Wikipedia, Facebook, HN, all use a mixture of RDBMS and NoSQL.

rareattention · on Oct 20, 2017

>RDBMS makes sense for most applications.

Why? An RDBMS has never been the best option for any application I have created and I have created standard business applications as well as consumer applications.

eternalban · on Oct 19, 2017

> RDBMS makes sense for most applications.

But "applications" are built by development teams.

So: Does an "RDBMS makes sense for most applications"?

kornish · on Oct 19, 2017

Could you point to a definition of "application" that has the word "team" in it?

eternalban · on Oct 19, 2017

Why should I waste time on a non-sequitur?

zzzcpan · on Oct 19, 2017

You don't seem to know this, but no traditional RDBMSs actually provide CAP consistency, for that they would have to use at least two-phase commit or something, but they don't. So, they all are noCAP databases. Electronic health or financial records are way safer in a proper eventually consistent database, like orders of magnitude safer, but everyone just takes the risk with some insurance at best to cover the losses.

EDIT: If you downvote, please explain why. You can't disagree with the truth.

artimaeis · on Oct 19, 2017

> "no traditional RDBMS actually provide CAP consistency, for that they would have to use at least two-phase commit

https://docs.microsoft.com/en-us/sql/t-sql/language-elements...

> If the transaction committed was a Transact-SQL distributed transaction, COMMIT TRANSACTION triggers MS DTC to use a two-phase commit protocol to commit all of the servers involved in the transaction. If a local transaction spans two or more databases on the same instance of the Database Engine, the instance uses an internal two-phase commit to commit all of the databases involved in the transaction.

I'm only versed in SQL-Server but I'm pretty sure other RDBMS vendors provide similar functionality.

zzzcpan · on Oct 19, 2017

Thanks for pointing out. Oracle has distributed transactions too.

toomuchtodo · on Oct 19, 2017

In certain environments, it is better to fail then to have data that isn't immediately consistent. Finance and healthcare are two such systems. Availability is not always paramount.

https://en.wikipedia.org/wiki/Database_transaction

https://en.wikipedia.org/wiki/Atomicity_(database_systems)

zzzcpan · on Oct 19, 2017

They only guarantee consistency as long as you don't use them over a network, i.e. communications with the database are always reliable. But once you do use them over a network - CAP theorem comes in and forces you to either use something like two-phase commit or no promises of consistency. Which is the opposite of what his post implied, like there is CAP consistency with those databases. But there never was!

Although I kind of got used to RDBMS crowd not understanding consistency, it's just another technology cult.

richardwhiuk · on Oct 19, 2017

Proper RDMS databases have two phase commit with transactions.....

zzzcpan · on Oct 19, 2017

Which ones? Traditional mainstream RDBMSs, like Mysql and Postgres don't use two-phase commit protocol. Obviously new distributed ones do it properly, but we are not talking about them.

orf · on Oct 20, 2017

https://www.postgresql.org/docs/9.3/static/sql-prepare-trans...

cortesoft · on Oct 19, 2017

Wait, CAP theorem just says that you have to sacrifice availability of you want Consistency and Partition tolerance.

da_chicken · on Oct 19, 2017

> You don't seem to know this, but no traditional RDBMSs actually provide CAP consistency, for that they would have to use at least two-phase commit or something, but they don't.

At the single server level (which is how I think others here are interpreting your comment)? No, they all do, with the exception of some configurations of MySQL (especially older editions, which is why it's often maligned by DBAs). That's what transaction logs do. They're literally a write ahead log (WAL). You commit a transaction, and the DB first obtains an exclusive lock on the affected rows (or page, or table). Any other transaction attempting to read or update those rows will be blocked (with exceptions). It then writes the change to the transaction log and flushes the change to disk. Then it writes the changes to the database file and flushes the change to disk. Then it returns the results of the query to the user. Many RDBMSs let you control how tightly the locks are and the degree that the data are isolated during a transaction.

At the distributed network server level? Then I guess I kind of agree with you, sure. RDBMSs let you "get around" the problems of distributed scaling by not letting you do it easily. SQL servers often only have master/slave or publisher/subscriber setups or otherwise partition the data between instances with sharding. There's no need for raft or paxos type algorithms because they don't attempt to implement a true multi-master environment. There's either a fixed overall master, or each server is the deterministic master of it's own little world, so you avoid consistency problems with distributed data. However, in doing so you sacrifice availability, since if a shard goes down so does all that data, or if the master is busy then you can't always submit queries to the slaves. Replication is used for redundancy, not scaling or load balancing. The solution RDBMSs had was sharding + master/slave replication for redundancy, which can get messy fast and has issues like hot spots or limited queries or variant performance. It's just a lot harder to do than it feels like it should be, and with storage as cheap as it is it feels like a waste of effort.

That said, some RDBMSs do allow you to use multimaster, bidirectional, or peer-to-peer replication, but most of those configurations basically warn you that you're sacrificing consistency by doing it and all of them that I've seen are a huge pain in the ass that makes shard + replicate look like child's play. They also have schema requirements that make life difficult, and they're somewhat notorious for being difficult both to administer and develop for. You have to design the whole thing from the ground up to work with this type of replication, it still feels like a house of cards, and it's this exact level of pain in the ass that encouraged the partitioning and availability focused NoSQL data stores.

However... most applications don't need that kind of scaling. They don't need a database in every time zone for single millisecond response times globally. They don't have the users to demand it, or don't have the quantity of data to require it, or have other requirements that make a traditional RDBMS desirable where you can't accept a system that allows for out-of-date data (which is when PACELC theorem kicks in because NoSQL typically doesn't have locking like an RDBMS does to mitigate this particular problem).

kuschku · on Oct 19, 2017

Actually, the trend with NewSQL goes towards providing CAP consistency, even with multi-master replication.

Google’s Cloud SQL is a good example of that, by using TrueTime as transaction id, and an MVCC implementation, they are able to provide consistency, while also being good enough on the other metrics.

Some NewSQL implementations copy that concept, but unless you run GPS clocks yourself, you’ll get slightly worse results.

bpicolo · on Oct 19, 2017

To be fair, Mongo is better than it's ever been. They even got through Jepsen just fine after wiredtiger.

That said, it's pretty hard to make sense of when you would want non-relational dbms these days, especially in an era where you can get 100core systems in AWS/GCP. Write-scaling is still a pretty obvious reason, though things like Citus might help here.

dsp1234 · on Oct 19, 2017

They even got through Jepsen just fine after wiredtiger.

With non-default settings[0].

The Jepsen tests passed with the "linearizable" read concern. The default read concern is "local", which "Provides no guarantee that the data has been written to a majority of the replica set members (i.e. may be rolled back)."[1] This is like having "READ UNCOMMITTED" be the default read level in a traditional database system.

The Jepsen tests passed with the "majority" write concern. The default write concern is "1", which means only the "primary" in a replica set needs to acknowledge the write[2]. This does not guarantee safety in the face of network partitions.

It's still not safe out of the box.

[0] - https://jepsen.io/analyses/mongodb-3-4-0-rc3

"With the v1 protocol, majority writes, and linearizable reads, MongoDB 3.4.1 (and the current development release, 3.5.1) pass all MongoDB Jepsen tests:"

[1] - https://docs.mongodb.com/manual/reference/read-concern/

[2] - https://docs.mongodb.com/manual/reference/write-concern/

stingraycharles · on Oct 19, 2017

These kind of default settings make sense, though. If you really care about lineair writes, you know how to configure it.

Same goes for example for PostgreSQL [1], that uses Read Committed rather than Serializable transaction isolation by default because for the majority of the people, this is fine, and the performance tradeoffs are worth it.

[1] https://www.postgresql.org/docs/9.5/static/transaction-iso.h...

bpicolo · on Oct 19, 2017

Yeah, and you can turn off fsync in postgres if you want a blazing fast db that loses your data. :D

threeseed · on Oct 19, 2017

This meme isn't even funny anymore.

It was a bug that was fixed many, many years ago and was only true if you didn't use any client libraries.

threeseed · on Oct 19, 2017

Not sure if you know much about databases but those are standard defaults.

Cassandra as well doesn't require a full quorum to acknowledge writes. It just relies on the closest node. Likewise for Oracle.

http://docs.datastax.com/en/archived/cassandra/2.0/cassandra...

growse · on Oct 19, 2017

> That said, it's pretty hard to make sense of when you would want non-relational dbms these days. Write-scaling is still a pretty obvious reason, though things like Citus might help here.

Personally, I find a non-relational database useful when my data model is non-relational.

hungerstrike · on Oct 19, 2017

What is an example of non-relational data?

A tree is relational. Each child has a relation to its parent.

I can't imagine data that has no relation (no connection) to anything else. Maybe what you meant was heterogeneous (e.g. data elements that do not all have the same attributes) - but even then I can't readily come up with an example.

balfirevic · on Oct 19, 2017

Word "relational" in relational databases does not stand for relationships between the tables [0]. That being said, relational model is a great fit for plenty (most?) of the data.

[0] - https://en.wikipedia.org/wiki/Relation_(database)

hungerstrike · on Oct 19, 2017

Thanks for the technical correction. That also being said, I think that definition is terrible and that anybody trying to understand what a relational database is would be better off not reading that article.

Yes, I agree that most types of data can be stored in tabular form (or an "n-ary relation" per Wikipedia). I'm just wondering what concrete types of data one would rather store in a document.

I don't think there is a good example. The decision to store some data outside of an RDBMS must have more to do with the processing model or something else.

growse · on Oct 20, 2017

> I don't think there is a good example. The decision to store some data outside of an RDBMS must have more to do with the processing model or something else.

What else other than the processing model and business requirements would determine how you model and store your data?

hungerstrike · on Oct 20, 2017

That was my point. The person I responded to said that they only store "non-relational data" in NoSQL so I was asking what that was.

balfirevic · on Oct 21, 2017

To clear up the possible confusion, the person you are currently responding to said that, not me.

dandermotj · on Oct 19, 2017

Sparse heterogeneous data is often the type of data stored in NoSQL dbs. Modelled in a relational way, this produces many tables with many NULL fields, while keeping it in a key value format is neat and tidy.

I'd recommend the paper What Goes Around Comes Around[1], the first paper in Readings in Database Systems[2]

[1] https://scholar.google.com/scholar?cluster=73661829057771494... [2]redbook.io

wvenable · on Oct 19, 2017

> Sparse heterogeneous data is often the type of data stored in NoSQL dbs.

I still can't imagine what sparse heterogeneous data exists in the world that makes sense to store. Any type of querying or processing requires some kind of structure (even if implicit in the code) which you can just put in different table structures.

You have to make sense of data to process it and that kind of implies a structure, doesn't it? Am I missing some obvious example of heterogeneous data?

threeseed · on Oct 19, 2017

Customer Analytical Record / Feature Engineering Store

One customer column, tens of thousands of attribute columns.

If you need everything about a customer it is a single, O(1) fetch operation which makes it perfect for driving chat bots, call centres, websites, operational decisioning engines, dashboards etc. Almost every large company will have one of these.

You can't really do it in relational systems properly because (a) you hit the column limit, (b) often it is sparse i.e. lots of NULLs everywhere, (c) you need this system to be distributed since it often gets a lot of load.

rwallace · on Oct 19, 2017

What would the attribute columns consist of? My experience has been with named columns defined individually by humans, of which I've never seen more than a few hundred; how do you get tens of thousands? Are they a different kind of thing?

threeseed · on Oct 19, 2017

Most companies who do it purely by humans can easily get into the thousands of attributes. Have seen it many times before where you hit the column limit of a SQL database.

But where you get into tens/hundreds of thousands is when you have machine learning models automatically selecting and storing important features from the data.

growse · on Oct 19, 2017

Tick (market) data is another good example of this. A given 'Tick' is just an event that can have any of up to thousands of different attributes set (often just a handful).

bpicolo · on Oct 19, 2017

You've got options there though

https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...

threeseed · on Oct 19, 2017

No.

EAVT is great as an intermediate format but it is absolutely useless to query for since most of the time you are trying to find a set of attributes for a given entity i.e. full table scan.

What you want is a "wide table". One entity column and all the attribute columns to the right. Often with most of the values set to null.

This is the dream use case for MongoDB since it you can ignore sparse values yet when you query it via their drivers it will appear as a wide table. You can't do this at all in PostgreSQL since you will hit a column limit.

flukus · on Oct 19, 2017

> EAVT is great as an intermediate format but it is absolutely useless to query for since most of the time you are trying to find a set of attributes for a given entity i.e. full table scan.

This is what indexes are for. An index on the entity id should avoid any full table scans.

threeseed · on Oct 20, 2017

EAVT table with 100 million entities and 10000 attributes = one trillion row table.

And you want to build indexes on half the table ?

Good luck with that.

flukus · on Oct 20, 2017

> Often with most of the values set to null.

Your math is at odds with your own requirements, null values don't need a row.

threeseed · on Oct 20, 2017

You clearly don't understand what you're talking about.

Sparcity is an issue for the wide table not the EAVT form.

kuschku · on Oct 19, 2017

> You can't do this at all in PostgreSQL since you will hit a column limit.

JSONB is designed for exactly this, isn’t it?

threeseed · on Oct 19, 2017

Sure. But MongoDB is far better at scaling, has infinitely better drivers (including Spark) and is about an order of magnitude faster than PostgreSQL for partial updates.

The lack of a Spark driver alone renders PostgreSQL useless for most companies.

growse · on Oct 19, 2017

Any data can be modelled in a relational way, I guess. It doesn't mean it's the best representation.

Try modelling a cyclic graph in a relational way and you'll quickly tie yourself in knots trying to update and query it.

The point is, relational databases a great for storing data that you've decided to model relationally. If you decide not to, then you probably want some other sort of database.

gaius · on Oct 20, 2017

The point is, relational databases a great for storing data that you've decided to model relationally. If you decide not to, then you probably want some other sort of database.

Sure. But that database is not MongoDB.

growse · on Oct 20, 2017

Oh, I totally agree!

The "you only need relational databases" mantra bugs me though, because it's so obviously not true.

mygo · on Oct 20, 2017

such a naive outlook. MongoDB has two things that are being incredibly overlooked right now: 1- company stability. So you don't have to redo your database in 3 years. 2- developer community / pervasiveness. So you have an easier time integrating it into your projects, even with its technical shortcomings.

I bet two years ago there was someone out there saying "if you're going to do NoSQL you better use RethinkDB over MongoDB"

How great the technology is, is absolutely not the only factor. Good thing people can take in many different factors when making their decisions.

pcsanwald · on Oct 19, 2017

instead of non-relational, think of it as denormalized. I also can't think of any cases where you wouldn't want some relationships. I can absolutely rattle off tons of cases where applications benefit greatly from some kind of denormalization.

In years past, people called these "data warehouses" and essentially took snapshots of their production DBs and denormalized the hell out of them so that aggregations wouldn't crash the server.

bpicolo · on Oct 19, 2017

Sure. The most direct argument against that is that Postgresql jsonb is just non-relational data support in a first class relational DB, which is pretty great, so to an extend you get the best of both worlds, though I'm sure you can find a case where it's not quite optimal vs some nosql db.

This talk is a pretty nifty perf overview:

https://www.percona.com/live/e17/sessions/high-performance-j...

That said, if you know beforehand that horizontal scaling will be a crucial factor, probably postgres isn't the first choice. But with how fast CPUs are these days it's usually not important for a long time.

jsjohnst · on Oct 19, 2017

> But with how fast CPUs are these days it's usually not important for a long time.

That’s a very naive statement to make.

takeda · on Oct 20, 2017

If you grow and you hit hardware limits then congratulations you are the next facebook/google/etc. Also there's nothing stopping you from switching your current database to something else.

Absolutely majority of the companies will do just fine, because the hardware improves faster than their demands. Starting out with a distributed system "because one day we might need it" is just silly, because chances are you'll never hit it, and you'll have to pay for the overhead of having a distributed system (which is non trivial).

Actually my company started using PG and had presentation and someone asked if we considered a distributed database so we can scale. The presenter nicely said it was evaluated and this solution worked best, but that was too nice.

1. It's only about 100GB of data

2. The hardware is barely utilized, we didn't tune it (except some standard memory settings), because there's no need yet.

3. Our data is relational (in fact most data from most companies is relational)

bpicolo · on Oct 19, 2017

Is it? Because there are a ton of companies out there running postgres, mysql and scaling workloads just fine.

jsjohnst · on Oct 19, 2017

I upvoted you because you’re technically right for many cases, but if you’d ever dealt with a database you couldn’t scale further (we were using multiple of the largest instances EC2 had at the time) and new data was flowing in faster than you could delete it and downtime wasn’t an option, you too would hate any predecessors who said “you can always scale higher” before things got so bad it was almost impossible to recover.