MongoDB shares jump more than 30% in $192M IPO (cnbc.com)
384 points by bnewton on Oct 19, 2017 | 414 comments



MongoDB isn't usually seen favorably, but everyone must admit that it's an unlikely success story that deserves admiration. Think about it. Bringing a database to market with a completely different paradigm, growing it to the enterprise-production-ready level, and creating a billion-dollar business around it is no small feat. Yes, they did ride the NoSQL zeitgeist, but they survived when others had no major success.

Undeniably, they did have their fuck-ups at the start, but I think they have done a good job fixing them.


If they had done so by presenting a product that was picked up due to its merit and continued to gain traction based on that, then I'd be rather impressed. Instead, the story of MongoDB seems to be how extremely well-targeted marketing and sales can build a so-so product into a huge IPO. There have been performance comparisons showing it isn't even the best at what it does, so this IPO is riding on the network effect of that early marketing alone.

In our economy this is sadly an entirely valid route to making money but their mindshare is going to continue to collapse as the warts on their product show more and more.

So the admiration you have should probably be directed at the wool the marketers behind MongoDB managed to pull over your eyes.


Most people forget that with startups, what you are trying to create is a business... the technology behind it is only one element.

It is extremely difficult to do good marketing and sales and they did it.

Solve a customer problem while building a healthy business... a success story in my book.


Agree - I think there is a perception that if you build an awesome product it sells itself.

There is so much work to running a successful business, especially at that scale. Marketing, Sales, Implementation, Support, on top of G&A, and that's not even touching on HR and retaining employees in a competitive market.

There is a reason some people are known as the "business" guys, I think expertise in that area is underrated, especially among people who've only worked at smaller companies or startups.

Edit: Genuinely curious to know why I'm being downvoted. Care to elaborate on why you're downvoting?


I didn't downvote you, but if I had, it would be because while you're right that building sales and marketing is difficult and doing so successfully is certainly admirable, another fundamentally important part of creating a successful software business is building a successful product. I suspect your downvoters don't believe that Mongo has actually done that part.


There are a lot of successful software businesses built on terrible products. In fact, in my experience, a lot of startups put far too much emphasis on software quality. As painful as it is for many developer/designer entrepreneurs to hear, if your product solves the customer's problem at a price they're happy to pay, then it's more than good enough to sell to them, even if it's absolute garbage from a tech perspective.

There are plenty of reasons to write good-quality code (it's more maintainable, lets you move faster, is more secure, requires fewer resources, is more fun, etc.), but "because you won't be successful if you don't" really isn't one of them.


Speculation: You were being downvoted because software types often express ideas that can be summed up as "business is easy and the people doing the business aspects aren't important." They seem to think that anyone and everyone can do it, and when they hear success stories as examples, it reinforces those ideas.

A good example is some of the comments that software folks make about the sales department.

They are different skillsets and people have varied abilities in each. The business aspect is not actually easy. It is also not always done well. The software end isn't always done well, either.

My guess would be that is why you were being downvoted. I notice a number of votes are emotional in nature. They aren't based on logic. I see some that are most easily explained as, "I don't want to know." The comment is factual, not confrontational, and topical. The vote is still negative.

Meh... I don't mind it when it happens to me. I have karma to spare and don't actually make comments for the points. I say what I feel needs to be said and try to stick mostly within the guidelines. (I do meander off topic sometimes.)

But, no... I don't comment for the points. I comment for the replies and thoughts. They are usually quite brilliant and insightful.


I did not downvote the parent comment, but I did downvote you, because I think your comment stereotypes "software types" in an unfair way. It is possible to both have a lot of respect for the difficulty of building the business side of an organization and for the folks who can do it well, and to think that it is unwise to build a business based mostly on sales and marketing without focusing enough on the quality of the product itself.


You missed the part where I said 'often.' That is okay by me. Karma is only good if you spend it.


Sure, but your speculation suggests that the "often" is enough to explain the parent's downvotes. I think there's a better explanation that doesn't impugn certain "types" of people. I'm just speculating too of course.


Just out of curiosity, what do you think is a better (and more probable) answer?


Sorry I just saw this - it's what I said in my original comment that you replied to, I think a better and more probable explanation is that people "have a lot of respect for the difficulty of building the business side of an organization and for the folks who can do it well, [but that they] think that it is unwise to build a business based mostly on sales and marketing without focusing enough on the quality of the product itself".


Mongo as a business has largely catapulted itself to where it is by making money off of information asymmetry more than actual technical superiority. They may have since corrected a lot of the fundamental technical issues with their product, but that doesn’t change the fact that (contrary to SV dogma) their success doesn’t rest on the creation of wealth, only its extraction from magpie-type customers. Downvoters likely chafe at the notion that such a strategy is laudable or valid.


Even assuming that the product is so-so, saying that they did not create wealth is a bit disingenuous. They basically created the "NoSQL" discussion at a time when most settled on some SQL variant despite their difficulties.

It's not like they were just a vendor of some other product. They actually spent money on the product, on evangelism (kind of marketing, but also kind of development), on so much.

Stripe's technical achievements at first were basically "a nice button" and a slightly friendlier risk model. Yet they are lauded (as they should be!). We can cut Mongo some slack given how we act about other companies.


I would say that NoSQL was more popularized by Amazon’s whitepaper on Dynamo, which they published in 2007.


"their success doesn’t rest on the creation of wealth, only its extraction from magpie-type customers"

To make such an assertion, you're going to have to provide evidence that the people using Mongo and its services are, in fact, too stupid to know what's good for them, and that they would absolutely be better off using something else.


I have no idea why you're being downvoted. I think your point is salient and underappreciated. Certainly from my own experience, it's easy to get caught up in the product and the tech and not focus enough on the biz side of things, i.e. sales and marketing.

Basically I'm in complete agreement with you, and feel this is a hard learned lesson.


I'm not sure HN celebrates products for simply being successfully marketed.

That said, I do think they made a way of handling a data model pleasurable. Different paradigm++


>> not sure HN celebrates products for simply being successfully marketed

Except that Mongo didn't find success through "simply being successfully marketed". So many people in this thread cannot seem to understand how to look at the business as a whole. Software devs are blasting the software for not being a mind-blowing piece of technology, while not understanding that it has its niche and - apparently - a good business strategy.

You cannot (typically) succeed by only having an amazing product. You also (typically) cannot succeed by dumping millions into sales and marketing for a truly garbage product. The product itself and the business behind it are both necessary elements for success.

There are a lot of developers in this thread trying very hard to appear smart by jumping on the "Mongo is a joke" bandwagon.


Have you looked at their financials? They are in the same bandwagon as their technology.


Mongo's business model is "exploiting the technically naive". They are the CueCat of databases.


How is it exploitative to give something away for free?

Imagine if everyone using MongoDB had to pay for it, you know, like most things in the world?

I suggest they are giving away their cake and picking up a few crumbs after the fact.

Builders don't typically give away homes to sell a few cabinets.


Exploiting with products available for free? Oh really?


The first taste is always free


I agree but given that the technology is the product here, they’re in a bit of a different situation. Evaluating the technology is evaluating the product here.


During my first startup, as the CTO, I made the mistake of thinking this way. I was so wrong I burned through $2M like it was nothing.

The "product" of a startup is a sustainable business model that can scale fast. The core technology may or may not be a key element part of that model.

So they solved a customer problem with interesting technology - regardless of arguments about whether it's ACID compliant, or as good as <replace with whatever you think is "better">.

Were they sustainable at the beginning? No. Are they sustainable now, and did they scale appropriately? Yes.

As a startup that went through its stages, they did it. This is not easy because it requires all the elements to work properly (market need, sales, marketing, engineering, operations, etc.).

So what if RethinkDB/or_other has a feature that is better? Irrelevant.

Even better, they did this ethically, which is more than I can say about other "startups" with insane funding and dubious, questionable strategies.


Ah, right, like RethinkDB?

As much as I'd love this to be true, the #1 measure of success is whether people even know you exist, and #2 is whether you solve enough of a customer's problems that it's worth using your product. MongoDB hit these two points hard right out of the gate and is now very successful because of it.


I think it all comes down to what the definition of solving a problem is.

What Mongo did is implement these features just well enough that they could put checkmarks in their marketing brochures.

When you actually start using it, you learn that most of it either performs slowly, doesn't work correctly, or once in a while corrupts your data.

Yes, they hit these points hard, but only from a marketing point of view. They are now universally hated by ops and developers because those people have to deal with the fallout.

RethinkDB is actually an example that a database is not a good base for a startup[1], and that the open-source approach (like PostgreSQL) is more suitable.

MongoDB is an example that you can base a startup on database technology and succeed, if you can sleep well at night knowing your product can cause people to lose data.

[1] The reason RethinkDB failed was that they refused to provide half-solutions like MongoDB did. Everyone expects a startup to start making money, but building a reliable database is not easy and can't be done quickly.


> Yes they hit these points hard, but only from marketing point of view

MongoDB serviced one of my projects for 5 years without a single hitch. It did everything I wanted it to do, it did it well, it did it exactly as advertised (no more, no less), and it did it right when I needed it. It never stopped improving during those years, massively in some aspects. The company behind it only grew stronger during those years, and along with it, so did my confidence that my project wouldn't need to undergo re-engineering further down the line.

Seriously, as a professional software developer, what more do you want?


My experience too, 6 years. Never been a fanboy, but don't understand the hate either. MongoDB never surprised me, the tradeoffs were known in advance (at least the ones that mattered to me).


Based on my experience, your use case was simple enough that any database would do just fine.


Our use case was not simple, but MongoDB made it simple to implement. We used pretty much every feature of MongoDB. Any other database could have been coerced to do what we needed, of course, but with greatly varying degrees of extra complexity or overhead.


Please elaborate on how your use case was not simple, and how your data was not relational in the slightest.


Because of the term 'NoSQL', people (programmers) assumed they could throw overboard all the information those neckbeards keep whining about. Because it's NoSQL, people don't seem to treat it like a database: they just toss data in and run queries in the most insane manner possible. Just ignore the fact that your code makes every single query do a full table scan! That doesn't matter in NoSQL, right? For a long time, case-insensitive queries on a field were done (and recommended) using a regexp...
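
(A rough sketch of that last pattern, in Python/pymongo with hypothetical names - unless a regex is a case-sensitive prefix match, Mongo can't use an index for it and scans the whole collection:)

    from pymongo import MongoClient, ASCENDING

    users = MongoClient().mydb.users  # assumes a local mongod; names hypothetical

    # Case-insensitive regex: can't use an index, scans every document.
    users.find({"email": {"$regex": "^alice@example\\.com$", "$options": "i"}})

    # The usual workaround: store and index a normalized copy of the field.
    users.create_index([("email_lower", ASCENDING)])
    users.find({"email_lower": "alice@example.com"})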


MongoDB's clients are much simpler to use than RethinkDB's. IMHO that was the number one reason it didn't gain traction.


I'm curious what you mean by this. We use a combination of RethinkDB + Deepstream at work, and it's amazing to work with, with a really simple query language. I might very well be missing something, though, as I'm not the main developer, so I'm not too deep into it.


For one, it's the extra database handle and run() I have to use every time, instead of the commands being methods on the db handle object. The repetition makes code more verbose.

Secondly, it's ReQL. It's not native, but not JSON-based either. A weird hybrid mix.

But it’s only my personal opinion based on my taste.


And there are no bad reviews of RethinkDB because nobody used it for real.


There was at least one review from someone who switched from RethinkDB to Postgres, which includes a lot of pain points and the line:

> A RethinkDB employee told me he thought I was their biggest user in terms of how hard I was pushing RethinkDB.

http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.ht...


>how extremely well targeted marketing and sales can build a so-so product into a huge IPO

That's how most products succeed, if you ask me.


I don't disagree, but we haven't seen them close the distance in terms of performance.

Let me clarify a bit, I'm not saying that this tactic is invalid or inefficient within our current system, in fact this is a great way to land a good payday if you can pull it off. My point was more that MongoDB is going to implode when their performance shortfalls become visible to the wider populace, if that happens, of course. It's still possible they take the earnings from this IPO and dump heavy into R&D investment to make up the lost ground, but it's also very possible they take this payday and ride out the tech until it breaks down.


Even minor things like picking a product name had impact. They also implemented most of the desirable features (the gotcha was that they were half-assed, but if your evaluation was also half-assed, as is usually done, you wouldn't notice).

I actually chose it for a project back in 2010, and on paper MongoDB looked better than other databases. The MongoDB name, as they mentioned, originates from "humongous database", giving the false impression that it can scale well.


> Even minor things like picking a product name had impact.

They get zero points for this.

It pretty much reads as "Retard"DB in Portuguese, Spanish, and IIRC German.


Not that it is 100% reliable, but I put "mongo" into Google Translate for all three of those languages and it comes out "mongo". Unless mongo is slang in those languages.


It is slang, short for "mongoloide".

https://en.wiktionary.org/wiki/mongo

Interestingly, this isn't the case in English.


English would be "mong". Probably because we got lazy with our abusive terms in the 70s and 80s.

https://en.wiktionary.org/wiki/mong#English (Etymology 3)


It was picked up due to merit, IMNSHO, just not for the factors you think highly of.

It didn't do ACID, not even nearly, but it did other praiseworthy things, with "laser focus on four critical things: onboarding, usability, libraries and support" (https://www.nemil.com/mongo/2.html). Please don't be disparaging about any of those four.


Amen to that. The first time I used MongoDB, it was stupid easy to get started. We implemented data replication from MySQL into MongoDB in a 3-hour hackathon, something that would have taken at least a week for a relational database. For a full description of the experience, see:

http://scale-out-blog.blogspot.com/2011/05/introducing-mysql...

Don't put down 'easy'. It's incredibly hard to do well.

P.S., I'm a DBMS guy and have lived and breathed Jim Gray-style transactional consistency for decades. I'm still impressed by what they did.


How did MongoDB go to market differently than the other enterprise NoSQL DBs (MarkLogic, DataStax)?


I’m told they absolutely flooded tech meetups in the SV area with advertorial talks - talks on NoSQL (that just happened to use MongoDB, of course), lots of FUD about SQL vs. NoSQL, etc.

I’ve no idea to what extent this was true, or whether it was their main marketing approach, but it might have been quite an effective strategy: talks at meetups are often 'trusted' to a greater extent than other ways of reaching devs, and if you can find a new generation of devs who are just starting out and hook them on your product before they get the chance to explore other approaches, then it could work. High cost, of course (someone has to pay for all that time), but if it gets you a leg up on your competitors in a $billion business then that's what the VC money is for...


MongoDB was huge at hackathons, too. People could implement an app in a weekend using MongoDB because it had no integrity checks and none of its drawbacks manifest themselves in the first two days of development.


I totally get this, but a hackathon is a one-time software build, and you don't need to lean heavily on a schema if your data has only ever existed in one state. MongoDB is great for templating like this.

However, when it comes to writing software that's meant to survive changes and versions, the schema ends up providing a safety net that ensures you can ignore the data that already exists and assume that the structure described by your schema is consistent. Data integrity is key to maintainability, and those constraints are what keep your data clean.
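
(To be fair, Mongo does let you opt back into some of that safety net. A minimal sketch via Python/pymongo, assuming MongoDB 3.2+ document validation and hypothetical collection/field names:)

    from pymongo import MongoClient

    db = MongoClient().mydb

    # Reject inserts/updates that are missing the fields the app relies on.
    db.create_collection("orders", validator={
        "user_id": {"$type": "objectId"},
        "total_cents": {"$type": "int"},
        "status": {"$in": ["pending", "paid", "shipped"]},
    })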


I wouldn't be at all surprised to discover that they seeded the hackathons with ringers too: devs on payroll who turned up and implemented a website/app in a weekend that naturally 'just happened' to use MongoDB.

Makes you wonder what other meetup/hackathon visible tech is really covert marketing...


> their mindshare is going to continue to collapse as the warts on their product show more and more

I'm not clear on why you think this - software products with huge mindshare and an open contribution model get their warts fixed, usually by programmers sponsored by corporations repackaging the software (Red Hat and Canonical with Linux) or running it in a SaaS model (Cloudant with CouchDB).


- "We are better then you are. We have better stuff..."

- "You don't get it, Steve. That doesn't matter!"

https://www.youtube.com/watch?v=CBri-xgYvHQ


JavaScript, x86, MySQL, DOS... the market victors in this industry are often just "good enough", technologically speaking, and sometimes even kinda suck.

Add MongoDB to the list. It's "good enough", and marketed well.


That seems to be the business model of all "enterprise" products.


Good for them. I hope their longer-term employees who likely traded lower salary for stock options get a nice windfall from this.


What would you consider to be a "top-notch" NoSQL database?


DynamoDB, Cassandra


Has MongoDB improved post-funding? I got involved in Docker early and invested about a year of work into a project based on Docker. Docker was slow and had warts, and my project is ultimately almost unusable due to Docker's slowness and warts. I knew about the warts near the beginning, but I assumed that when they got $40 million, they would fix them. They didn't. Indeed, it seems that quality has not changed since they got funded - perhaps it has even decreased! If nothing else, it has certainly gotten harder to install on Linux.

I think it is often the case that a user base buys into an idea, funding comes, and there is a vision. There is nothing wrong with that, so long as the company actually invests the money into achieving that vision. I mean, lots of people have bought into Tesla's vision; that vision hasn't been achieved yet - it's still mostly just marketing, especially the self-driving part - but no one is going to hate them for marketing the vision if they DO succeed, or at least try valiantly.


> it has certainly gotten harder to install on Linux.

Downvoted for this. I haven't used docker much but I just did

> yum install docker-ce

and started the service. Took me less than 2 mins.


I was recently at a seminar where the lady sitting next to me was a network analyst. Her laptop was running Debian, and she needed to install Docker to be able to follow along in the seminar, so she did the logical thing, which was to go to docker.com and open the "Get Docker" menu. Under that menu, there are two main options, "Mac" and "Windows". She managed to get down to the "for servers" section and click "Debian", at which point she was directed to "download docker from the docker store", at which point she was lost for a while. Then she managed to get to this document https://docs.docker.com/engine/installation/linux/docker-ce/... at which point I had to help her, because she didn't know which exact version of Debian she was running and didn't know how to find out. I would say it was pretty much a catastrophe, as she spent about half an hour trying to get Docker installed.

You ran "yum install docker-ce" you were lucky that you knew the name of the package. I didn't know the name on Fedora, and looking here https://docs.docker.com/engine/installation/linux/docker-ce/... didn't help one bit. Indeed, that page doesn't include a line with "yum" in it at all!


Lol. Yes, you customers of MongoDB! You're all wrong and duped by pretty pictures! If only you could achieve the enlightenment of munk-a! Lol. I've seen this idiotic response from the very beginning days of Mongo (I worked there). Marketing at Mongo did little more than maniacally focus on making sure our customers and users were tied into the community and supremely supported on any issue they might have. But go on - keep predicting imminent failure. Your sentiment is shared by a lot of people who have never built a company and frankly never will.


I don't buy "creating a billion-dollar business".

I've read a Bloomberg story on the IPO and saw two hilarious sentences there: "MongoDB has 4,300 paying customers. MongoDB employs 820 people in 29 offices." That does not compute (for me), so I started analyzing their financial report. (Btw, it's funny that nobody in the whole HN discussion has quoted numbers from the report so far, despite a lot of "I'm buying their stock" talk here.)

In the report, scroll to the bottom, where they put some cream: for last year they show $91M in revenue from subscriptions plus $10M from services, while at the same time showing an operating LOSS of $85M (due to costs: people, marketing, sales, services, etc.). Ahem - I know they are growing, but no, that does not compute.

Of course, this is just my opinion and, looking at the IPO results, a rather unpopular one :) but I will stand by it, especially over the next 3 years. Caveat emptor.

I will short them soon.


Make sure to look at the revenue growth rate before you short (or encourage HN to do so). What happens if that growth rate continues for another year or two, and they have a standard growth multiple? You’ll lose your shirt.

It seems like you’re trying to value the company in a manner that doesn’t make sense for this stage of a company, and you could lose a hell of a lot of money.

For the uninitiated in the stock market: you shouldn’t really be shorting anything, especially not growth stocks, and abso-freaking-lutely not immediately following an IPO.


So long as you're making a loss, there are a lot of ways to "creatively" grow revenue. In the limit case (which would be fraud), you and I could just trade expensive invoices, declaring each as a loss. This puts a lot of revenue on our books with no value exchanged.

Mongo doesn't have to be committing fraud, of course. They could be doing any number of actual business activities that make legitimate trades --- but trades optimised for revenue growth, not profitability.

The major metric for companies used to be profit growth. When companies were optimising for that, it was smart to look at revenue growth as a leading indicator. But Goodhart's law ruins everything: now companies know to optimise for revenue growth, and so its value as a metric is much diminished.


Companies optimize for future cashflow, because that’s how value is created.


Mongo's current cashflow is negative, and the trend is increasingly negative.

I had a look at their prospectus, which describes the bulk of their revenue as subscriptions. Subscriptions sound like good unit economics. It's instructive to compare their pitch to investors to the pitch they make their customers: http://s3.amazonaws.com/info-mongodb-com/TCO_MongoDB_vs._Ora...

When Mongo talk to their customers, they describe the license cost as $0 --- they fold that into support. That sounds more like a service.

In other words: customer fires $100k of staff, pays Mongo $80k in "subscription", Mongo hires $120k of staff, which they tally up as "customer success". There's no expense category for support in their prospectus, so clearly the support personnel are filed under "sales and marketing".

It's the same old story. They're just selling $1 bills for $0.80 a piece.


How can one short it efficiently?


I mean this in the kindest way possible: If you don’t know the answer to that question, you shouldn’t be shorting anything. Especially not at IPO.


Outside the US, you can do OTC derivatives via spread betting. E.g., give me/take from me a pound for every point this stock moves for/against my bet. Most spread betting providers in the UK carry NASDAQ stocks.

Note that the downside is unlimited when you short via spread betting. E.g., there's no maximum value of a stock, so you might get properly f'd.


The downside is not unlimited when purchasing put options.


It will be optionable soon. Buy out-of-the-money puts.


You could buy the stock of a competitor if you can't get any stock to borrow for short sale.


What's their main competitor? Oracle?


Most online brokers don't have lending inventory available yet. TD won't let me short.


Of course they don't. Settlement time is T+3. Unless you are a MM, no shares can be located until IPO + 3 days. At T+3 it will go onto the hard-to-borrow list.


TIL


You have to borrow the stock, sell it, and then buy the same amount back at a lower price to make money shorting. I do not believe you will short them or have the capital to make significant money doing so, but it is interesting to see the value you place on your analysis.


It really is an interesting story. They aren't best in class in any of the metrics we think of as mattering. It's not the highest scale system out there. It's not the most durable. Or the most available. Hell it's not even particularly reliable (at least throughout its history).

It is, however... simple. Very simple. It's easy to reason about. It's easy to set up. For 90% of use cases it's very easy to administer.

It turns out the market for that type of data store - something you can apt-get install and just start dropping data into - is pretty massive. I've used MongoDB on a few occasions, usually thinking I'm just using it to bootstrap a project, but three years later it's still running because it's just good enough to keep me from moving on to something else.


It really was the first NoSQL database that many programmers used -- and it came out right around the same time that JSON was really taking off. The timing was perfect, the product was simple. The fact is, most people don't really need insane scale, and there are thousands of use cases where you just need to read and write JSON and fetch things by a few different fields. I think they nailed it. Every other NoSQL DB out there at the time was insanely hard to "just play with" (e.g. HBase, Cassandra, etc.).

I think you hit the nail on the head -- for many programmers who were "coming of age" at that time Mongo was the easiest to experiment with and to get working.


This is me - I had taken SQL at uni and found MongoDB to be much easier to reason about since I knew my frontend parts already. One of the apps I've built has a terrible database structure (my first real life server setup!), and is probably the worst codebase I've ever written. I will never show it publicly. It currently serves 100k+ users and the users have no idea how messed up the server code is. It won't scale to a million users, but that's okay and the technology served me well as a junior.

I since turned to RethinkDB and sometimes just a classic SQL will do, but Mongo is really easy to get started with.


> It really was the first NoSQL database that many programmers used

That was because it was heavily marketed, not because of timing.

> Mongo was the easiest to experiment with and to get working.

It really wasn't, it was the one they had heard about.


> It is however..simple. Very simple. It's easy to reason about. It's easy to setup. For 90% of use cases it's very easy to administer.

I think most of HN would feel this way about, say, Redis. And Redis isn't "highest scale" or "most durable" or "most available" either. (Though it is pretty reliable.)

It's interesting, then, to compare the general impression people here have of MongoDB to the one they have of Redis. To me, Mongo is a "why use it when Postgres is just as easy to install", while Redis is exactly what I'd think of "to bootstrap a project, but three years later it's still running."

Is it just the slightly different pitched use cases of "working store" (Redis) vs. "persistent store" (Mongo)? Is it that Redis still has its uses even when you've got Postgres there beside it, whereas Mongo doesn't so much (at least since Postgres got JSON columns)?
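
(A minimal sketch of the "Postgres got JSON columns" point, in Python with psycopg2 - database and table names hypothetical, assuming Postgres 9.4+ jsonb:)

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical database
    cur = conn.cursor()

    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")
    cur.execute("INSERT INTO docs (body) VALUES (%s)",
                (json.dumps({"user": "alice", "tags": ["db", "json"]}),))

    # @> is jsonb containment; a GIN index on body makes this kind of query fast.
    cur.execute("SELECT body->>'user' FROM docs WHERE body @> %s",
                ('{"tags": ["db"]}',))
    print(cur.fetchall())
    conn.commit()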


Redis makes very different promises than MongoDB. Redis has always documented exactly how persistence works and how you could lose data. Redis' benchmarks aren't dishonest. Redis isn't marketed as a SQL replacement, nor as a primary data store.


> It's easy to reason about.

...until it's not. Then it's really not.


But how many people reach this stage?


It starts really early on. When I first used MongoDB, I thought to myself, "What a breath of fresh air, I can finally ignore normalization!" The documents I inserted would contain all the information I needed. One retrieval and I got what I needed. Wow!! But then I started doing sorting and searching, and I had to do most of the work on the client side (my backend). At that point, I found myself in trouble because I had also duplicated my information in my other tables. Data had to be updated in multiple places. So I thought, "Let's take out the duplicated data." Then I found myself not knowing how to structure my document data anymore... back to some sort of normalization. At the time, the searching and ranking in MongoDB were also poor, so I was forced to do the entire thing on the client side regardless.
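
(The trap, sketched in Python/pymongo with hypothetical collections - the duplicated data is great to read and painful to update:)

    from pymongo import MongoClient

    db = MongoClient().mydb

    # Each comment embeds a copy of its author, so reads are one query...
    db.comments.insert_one({
        "post_id": 1,
        "text": "Nice article!",
        "author": {"id": 42, "name": "Alice"},
    })

    # ...but a simple rename now has to touch every duplicated copy.
    db.users.update_one({"_id": 42}, {"$set": {"name": "Alicia"}})
    db.comments.update_many({"author.id": 42},
                            {"$set": {"author.name": "Alicia"}})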

I went back to PostgreSQL since then. I probably needed good one-on-one expert training with MongoDB, but I just found myself happier with an RDBMS. You don't have to use strict normalization in an RDBMS - just enough to make sense for your use cases.

One thing I really did like about MongoDB back then was storing blobs (files). It was the best solution available without setting up S3. At the time there was a 4GB (??) limitation, but MongoDB worked for my use case anyway. That being said, please don't store files in any database today. Use the DB to store references to a real object storage like S3. When the DB crashes, you'd better hope there's no corruption.


It amazes me how quickly our industry has forgotten the need for DBAs. With these MongoDBs, MPP cloud DBs, and Hadoops, everyone seems to have assumed that engineers can now do all DB work. This is reflected in the titles, too: Data "Engineer".

But from my perspective, this is delusional. There is a lot that goes into a DBA's experience that is not solved by the performance improvements in databases over the past decade. But there are more choices. 20-30 years ago, you would have been forced to write code on Oracle, and you would have asked for help before deciding how to structure the data. Today, with more choices, you just read some online opinions and jump on it without any internal resource to guide you.

Not saying the world of Oracle was great, but the young on this thread (me included) would benefit from respecting the experience of the old.


I agree. The issue I see is that, to be very honest, it's pretty tough to hire a very good DBA who knows the cool new technologies. I've worked with some DBAs, but they had no experience with Cassandra or anything outside MySQL/Postgres/Oracle/SQL, and they also had a very difficult time integrating themselves with the developers. It turns out the developers have a better understanding of Cassandra than the DBAs and DevOps/Ops. As Ops, we just learn from them and from incidents... Good DBAs also tend to do a lot of testing and development besides reading manuals and drawing on past experience.


> everyone seems to have assumed that engineers can now do all db work

Yes, everyone wants "full stack" and so you have a bunch of people haphazardly adorning themselves with the "full stack" label.


It's a gradual process and you start feeling the pain fairly early on.


Developer ergonomics are way more important than many features to get wide adoption. You can fix the other things with time. Worse is better.


Except when you poison mindshare.

For example, I used Mongo back in the early days. It was terrible. I will now never use it again. I don't care if it shoots lasers. It is dead to me.

Obviously the happy-developer count is way higher than the hate-it-for-life count, so I am the odd man out here. Perhaps many users never actually had many GB of data or had to deal with the data-loss side of things?


Maybe they haven't noticed? I've seen databases fail in ways that take a long time to detect - sometimes they're only caught by accident - if you don't have a source of truth to sanity-check against.


Maybe Mongo is the DB of failed products: perfect until you really need a database.

The high valuation could be an index of the number of failing products happily using the tech :)


[apparently my opinion is unwanted]


Let's tone down the drama a little; I am willing to bet there aren't many people at all who would literally choose to lose their home versus work with a technology they don't like (even if for valid reasons, and even if those reasons include data loss or other catastrophes).

If my boss tells me to use MongoDB tomorrow or live on the street despite my advice to the contrary, I will use MongoDB happily. I may look for a new job in my free time if using MongoDB makes me miserable, but I certainly won't be living on the street...


[or maybe the fact I watched a hundred million $ startup go down the drain in part due to bad engineering choices, of which Mongo was one, doesn’t entitle me to a negative opinion of Mongo]


You can say you'd choose to be homeless all you want, but saying "many" would do the same I think is being a little too dramatic.

And quitting your job (as you now point out, but didn't originally) is much different from "choosing to be homeless".


I used the word quit in my very first post on this thread in a way that should’ve been obvious I meant quit my job.


It wasn't the word "quit" that was being objected to though; it was your equating quitting to being homeless (in the very same first post in this thread) and then further saying you think many others would also rather quit and be homeless than work with MongoDB.

That's a bit too dramatic.

Next time, just say "I'd rather quit than work with MongoDB again", and you won't have this problem.


> Next time, just say "I'd rather quit than work with MongoDB again", and you won't have this problem.

Fair point... “I would quit rather than work with MongoDB again” is more accurate, but still encapsulates your point.

The point I was trying to make earlier and did in a way overly dramatic for you is that I’d never take a tech job again if it meant I had to use Mongo.


hyperbole


lol, the things you read on HN, very entertaining.


>They aren't best in class in any of the metrics we think of as mattering.

MongoDB is best in class at programmer-friendliness. Its API is easy to work with. This is especially the case if you are using Node and JavaScript.


Have you ever tried to write a mongo query? Simple, it is not.


> Bringing a database to the market with a completely different paradigm, growing it to the enterprise-production-ready level

Neither of those are true, however.


It was the first well-known, company-supported JSON document store.

It is being used by Facebook, MetLife, Expedia, Sony, eBay, Adobe, etc.

In what way isn't it ready for production use cases?


You need more than just a list of companies using it; you can create a similar list for just about any technology, good or bad.

Sony is a worldwide company with 127,000 employees; they alone probably use just about every database system around. So saying Sony uses Mongo isn't impressive - for all we know it was just a side project from an intern that has 100 documents. There is no context.


If we use your tortured logic that reference sites are meaningless then no technology in the history of the world is ever production ready.

Well, I have worked for a number of billion-dollar companies who have run MongoDB, so there is firsthand evidence.


Hell, I work at a Fortune 1000 company that uses Mongo. Thankfully I'm not on that team (we're mostly postgres with some mysql derivatives mixed in). Trust me, nobody here actually likes dealing with Mongo.


By your logic, Visual SourceSafe was production ready.


Well, _maybe_ reference sites _are_ meaningless as a measure of production readiness?


> It was the first, well known, company supported JSON document store.

And how is that a paradigm shift? Non-relational, non-SQL databases have been around for a long, long time.


> In what way isn't it ready for production use cases ?

Your parent commenter doesn't like Mongo. So it's not production ready.


> Your parent commenter doesn't like Mongo. So it's not production ready.

Eh, I don't like Mongo because I've used it. IDK, it's been a few years since I've had to deal with Mongo in a production environment. I don't miss it. I dislike that it's slow to get data to/from the JavaScript interpreter -- and that the solution was to work around it with the aggregation framework.
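
(For anyone who hasn't seen it, that workaround looks roughly like this in Python/pymongo, collection name hypothetical - the pipeline runs natively inside mongod instead of pushing documents through the JS engine the way map-reduce did:)

    from pymongo import MongoClient

    events = MongoClient().mydb.events  # hypothetical collection

    # Top ten URLs by page views, computed entirely server-side.
    pipeline = [
        {"$match": {"type": "page_view"}},
        {"$group": {"_id": "$url", "views": {"$sum": 1}}},
        {"$sort": {"views": -1}},
        {"$limit": 10},
    ]
    for doc in events.aggregate(pipeline):
        print(doc["_id"], doc["views"])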

I don't like the unsafe defaults (data integrity, access control).

I don't like the magical, unreliable sharding that you have little to no control over.


>it's an unlikely success story that deserves admiration

Well, it's a success story due to marketing - smart marketing, clever marketing, and more marketing - not due to lots of technical merit.

And it was not the first document store to appear, just the most successful. For starters, CouchDB predates MongoDB.


Notes was a document store before JSON, or even XML, was a thing. The wheel turns.


Reminds me of JBoss, right down to the astroturfing.


Honestly, this story being on the front page of Hacker News might be another example of that astroturfing. It's not impossible that they're trying to drive the price up as much as possible before dumping their shares.


Couch was a nightmare to use; Mongo made all the friction of getting it into products go away, especially when teamed with dynamic languages.


Using CouchDB on our team has been a pretty good experience; nothing like a nightmare.

We used replication on the native apps for offline mode, kept things simpler with online-only for our web app, and have had a very good time overall.

We are also using JSON Schema for our CouchDB documents to ensure we don't have all kinds of wild/wrong data showing up. If you had really hairy, unstructured data, I can see why a dynamic language might help with that bad situation, but for our use case, we were able to take advantage of the powerful replication built into CouchDB and get solid offline syncing for all of our native apps that also plays nicely with live updates in a web browser.


Did you really find CouchDB that bad to work with? I never settled on either Couch or Mongo for a project long-term, but when I set up Couch to test it out a few years ago, I had it up and running with functional replication quite quickly (i.e., one afternoon), coming in as a complete novice with the platform. Its built-in web panel at the time basically handled everything from basic setup to replication.


You forgot to mention: bringing a database to market that didn't work. Because by any real definition of "works", MongoDB didn't deliver on that.

(Note: I used to be optimistic about NoSQL, but I was dismayed by the extremely low quality that NoSQL products started out at, and stayed at, for a very long time)


> Bringing a database to the market with a completely different paradigm

Ah yes, the "toss your data over the fence and hope for the best" paradigm.

Truly groundbreaking.


Depends what you mean by production ready. Sure thing, $192M is not a billion dollar business.


$192M is the value of the shares sold in the IPO.

MDB's market cap, i.e. its valuation, is around $1.17 billion as of the time this article was written:

https://www.cnbc.com/2017/10/18/mongodb-prices-its-ipo-worth...

EDIT: Added 'billion'


Yes,

Recently I read about the Postgres 10 release. Almost all of its features are available in MongoDB too.

4 or 5 years ago, MongoDB was bad. But nowadays the landscape has changed, and I would say MongoDB is quite decent to work with now. It's the easiest database to manage, compared with MySQL, PostgreSQL, or Elasticsearch.


Is it me... or does Mongo not seem as relevant and 'hip' as it once was? I mean, I feel Postgres is much more solid, and you can get some of the aspects of a document store via the JSON data types they added. Of course, I'm not really a DBA and don't have a lot of Mongo experience... but personally I feel an RDBMS makes more sense for growth/scaling.


RDBMS makes sense for most applications. Most applications store data that can be fit to the relational model. Most applications aren't big data or data mining OLAP.

Most RDBMSs can do key-value stores very well now. Most applications also care more about consistency than availability, which is what RDBMSs provide (CAP theorem). Many NoSQL data stores choose availability and partition tolerance and sacrifice consistency (i.e., "eventual consistency"). There are a lot of applications where you can't sacrifice consistency: electronic health records, financial records, student records, employee records, etc. You care that the data are accurate and up to date, and you want the system to error if it can't provide that. Wrong answers and "close enough" answers aren't good enough.

Now, if you're running Reddit or Wikipedia or Facebook or HN... do you really care if a user doesn't get the absolute latest version of a document or comment? No, not really. If the content is hours old, it's a problem, but it's not a big deal if it's a few minutes out of date. You care more that your users get a version of the document than that they get the latest version of it.


> Most RDBMSs can do key-value stores very well now.

Yep, all of MongoDB is just one bullet point on Postgres's list of features. Anyone spending money on it ought to be hauled before the shareholders and given a talking-to on fiduciary responsibility...


>just one bullet point

Tell me again how Postgres can seamlessly do horizontal scaling and synchronous replication?


https://jepsen.io/analyses/mongodb-3-4-0-rc3

> MongoDB’s version 0 replication protocol is inherently unsafe.

Tell me again how MongoDB took 8 years to get to the point where its replication is kind of OK.


This subthread is about the future, not the past. 8 years ago, PG didn't have JSON support, so your point is moot.

I'm very curious about what people think about the future of Mongo, independently and particularly in comparison to Postgres. However every time that comes up, people keep bringing up that Mongo was a buggy piece of crap in some irrelevant past. So what?


> This subthread is about the future, not the past. 8 years ago, PG didn't have json support so your point is moot.

8 years ago PG did have replication though, so not sure why it not having feature X 8 years ago makes my point moot.

People keep bringing up that it was a buggy piece of crap because it's the icing on the cake, and pretty much something you never want your database to be, past or present. Not that software configured by default to eat your data and not persist it can be called a database, mind you.


I don't see this as a problem. It takes years for any software project to mature, a DBMS even more so. I'm sure I could go back to the 1980s and find game-breaking bugs in the original POSTGRES. It has taken MongoDB years to approach maturity.

Of course I would prefer Postgres when I can use it, and I can generally use it basically all the time, but NoSQL still has its use cases.


Synchronous replication was added in 9.1 and much improved in 9.6. pglogical[0] works pretty well for me under 10, but I have no production experience with BDR[1].

[0]: https://www.2ndquadrant.com/en/resources/pglogical/

[1]: https://www.2ndquadrant.com/en/resources/bdr/


IMO, sure but it's far from seamless. (I also looked at pg's quorum commits, but the same applies.)

In general, Postgres was not designed at its core for a distributed world. Even now, replication feels like an afterthought in the grand scheme of things, and sharding is nonexistent without extensions.


You mean asynchronous?


No.


> Now, if you're running Reddit or Wikipedia or Facebook or HN...

Wikipedia uses MariaDB (so, MySQL). https://meta.wikimedia.org/wiki/Wikimedia_servers#Software


Wikipedia predates NoSQL. It runs on PHP + MySQL because that's what was most popular back in 2001, and they have no interest in completely rewriting their entire stack just to use Cassandra or MongoDB. That doesn't mean a NoSQL data store wouldn't work extremely well for the type of application that Wikipedia is.


MongoDB is the MariaDB of the NoSQL world.


> Now, if you're running Reddit or Wikipedia or Facebook or HN... do you really care if a user doesn't get the absolute latest version of a document or comment?

I mean... do you? I often come back a few minutes after posting to add something I forgot or rephrase something for clarity. I hate when I am tweaking a Reddit comment a couple times during a period of high server load and I get served an old version of the comment and end up losing something I added in a previous edit.

With something like Wikipedia it would be quite frustrating to lose revisions.

Obviously it is what it is - I can't change their codebase, and I'm sure it's necessary as currently engineered - but is there really no other way to cluster their data except "one big table"? Maybe shard subreddits to specific servers, a la HyperDex?

But yeah, most places that Mongo is applied aren't exactly Facebook or Reddit either, in terms of total data throughput.


Oh, it will certainly come up, but it's not going to break Reddit if you get an old version of a comment as long as it's eventually consistent. Nobody is going to die, and nobody is going to lose any money.

Data stores like Cassandra and MongoDB don't lose revisions. That's not the kind of consistency we're talking about. CAP consistency is just getting the most recent version. You won't lose data -- data loss is a bug, not expected behavior, just like any other data store -- you just won't always get the most recent version of it. And, keep in mind, when we talk about eventual consistency here we generally mean "consistent on all nodes within a few minutes, but we're not blocking reads to write this data." It's not going to take hours.

That said, if you find you get an old version of your own comment, I'd be more willing to believe it's the fact that your request failed with a 503 error or otherwise timed out as much as it was a data store problem. Next time it happens, wait 5 minutes and try again.

> is there really no other way to cluster their data except "one big table"? Maybe like shard subreddits to specific servers ala Hyperdex?

The whole point of MongoDB or Cassandra is that you can get shards without all the headache that RDBMSs usually put you through. You configure your sharding function and let the system do the rest (see the sketch below). You don't have to connect to the right shard or anything of the sort, which some RDBMSs do (or did; it's been a while since I've looked) require with sharding.
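
(Roughly, in Python/pymongo - names hypothetical, and this assumes a running sharded cluster behind a mongos: you declare the shard key once and the cluster handles routing and balancing from there.)

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos-host:27017")  # hypothetical mongos address

    client.admin.command("enableSharding", "mydb")
    client.admin.command("shardCollection", "mydb.users",
                         key={"user_id": "hashed"})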

Reddit has their code and architecture posted, though it's out-of-date now, it makes it clear that it's basically just two big tables:

https://github.com/reddit/reddit/wiki/Architecture-Overview

It's PostgreSQL, ThingDB, Cassandra, memcached, and RabbitMQ.


It's not about the applications. It's about the components of the application. Reddit, Wikipedia, Facebook, HN, all use a mixture of RDBMS and NoSQL.


>RDBMS makes sense for most applications.

Why? An RDBMS has never been the best option for any application I have created and I have created standard business applications as well as consumer applications.


> RDBMS makes sense for most applications.

But "applications" are built by development teams.

So: does an "RDBMS make sense" for most development teams?


Could you point to a definition of "application" that has the word "team" in it?


Why should I waste time on a non-sequitur?


You don't seem to know this, but no traditional RDBMS actually provides CAP consistency; for that they would have to use at least two-phase commit or something, but they don't. So they are all no-CAP databases. Electronic health or financial records are way safer in a proper eventually consistent database - like orders of magnitude safer - but everyone just takes the risk, with some insurance at best to cover the losses.

EDIT: If you downvote, please explain why. You can't disagree with the truth.


> "no traditional RDBMS actually provide CAP consistency, for that they would have to use at least two-phase commit

https://docs.microsoft.com/en-us/sql/t-sql/language-elements...

> If the transaction committed was a Transact-SQL distributed transaction, COMMIT TRANSACTION triggers MS DTC to use a two-phase commit protocol to commit all of the servers involved in the transaction. If a local transaction spans two or more databases on the same instance of the Database Engine, the instance uses an internal two-phase commit to commit all of the databases involved in the transaction.

I'm only versed in SQL Server, but I'm pretty sure other RDBMS vendors provide similar functionality.


Thanks for pointing that out. Oracle has distributed transactions too.


In certain environments, it is better to fail than to have data that isn't immediately consistent. Finance and healthcare are two such systems. Availability is not always paramount.

https://en.wikipedia.org/wiki/Database_transaction

https://en.wikipedia.org/wiki/Atomicity_(database_systems)


They only guarantee consistency as long as you don't use them over a network, i.e. as long as communications with the database are always reliable. But once you do use them over a network, the CAP theorem comes in and forces you to either use something like two-phase commit or make no promises of consistency. Which is the opposite of what his post implied - that there is CAP consistency with those databases. But there never was!

Although I've kind of gotten used to the RDBMS crowd not understanding consistency; it's just another technology cult.


Proper RDBMS databases have two-phase commit with transactions...


Which ones? Traditional mainstream RDBMSs, like MySQL and Postgres, don't use the two-phase commit protocol. Obviously the new distributed ones do it properly, but we are not talking about them.



Wait, the CAP theorem just says that you have to sacrifice availability if you want consistency and partition tolerance.


> You don't seem to know this, but no traditional RDBMSs actually provide CAP consistency, for that they would have to use at least two-phase commit or something, but they don't.

At the single-server level (which is how I think others here are interpreting your comment)? No, they all do, with the exception of some configurations of MySQL (especially older editions, which is why it's often maligned by DBAs). That's what transaction logs do. They're literally a write-ahead log (WAL). You commit a transaction, and the DB first obtains an exclusive lock on the affected rows (or page, or table). Any other transaction attempting to read or update those rows will be blocked (with exceptions). It then writes the change to the transaction log and flushes the change to disk. Then it writes the changes to the database file and flushes the change to disk. Then it returns the results of the query to the user. Many RDBMSs let you control how tight the locks are and the degree to which the data are isolated during a transaction.

At the distributed network server level? Then I guess I kind of agree with you, sure. RDBMSs let you "get around" the problems of distributed scaling by not letting you do it easily. SQL servers often only have master/slave or publisher/subscriber setups, or otherwise partition the data between instances with sharding. There's no need for Raft- or Paxos-type algorithms because they don't attempt to implement a true multi-master environment. There's either a fixed overall master, or each server is the deterministic master of its own little world, so you avoid consistency problems with distributed data. However, in doing so you sacrifice availability, since if a shard goes down, so does all that data, and if the master is busy, you can't always submit queries to the slaves. Replication is used for redundancy, not scaling or load balancing. The solution RDBMSs had was sharding + master/slave replication for redundancy, which can get messy fast and has issues like hot spots, limited queries, and variable performance. It's just a lot harder to do than it feels like it should be, and with storage as cheap as it is, it feels like a waste of effort.

That said, some RDBMSs do allow you to use multimaster, bidirectional, or peer-to-peer replication, but most of those configurations basically warn you that you're sacrificing consistency by doing it and all of them that I've seen are a huge pain in the ass that makes shard + replicate look like child's play. They also have schema requirements that make life difficult, and they're somewhat notorious for being difficult both to administer and develop for. You have to design the whole thing from the ground up to work with this type of replication, it still feels like a house of cards, and it's this exact level of pain in the ass that encouraged the partitioning and availability focused NoSQL data stores.

However... most applications don't need that kind of scaling. They don't need a database in every time zone for single millisecond response times globally. They don't have the users to demand it, or don't have the quantity of data to require it, or have other requirements that make a traditional RDBMS desirable where you can't accept a system that allows for out-of-date data (which is when PACELC theorem kicks in because NoSQL typically doesn't have locking like an RDBMS does to mitigate this particular problem).


Actually, the trend with NewSQL is towards providing CAP consistency, even with multi-master replication.

Google's Cloud Spanner is a good example of that: by using TrueTime timestamps as transaction IDs and an MVCC implementation, they are able to provide consistency while also being good enough on the other metrics.

Some NewSQL implementations copy that concept, but unless you run GPS clocks yourself, you’ll get slightly worse results.


To be fair, Mongo is better than it's ever been. They even got through Jepsen just fine after WiredTiger.

That said, it's pretty hard to make sense of when you would want a non-relational DBMS these days, especially in an era where you can get 100-core systems in AWS/GCP. Write scaling is still a pretty obvious reason, though things like Citus might help here.


They even got through Jepsen just fine after WiredTiger.

With non-default settings[0].

The Jepsen tests passed with the "linearizable" read concern. The default read concern is "local", which "Provides no guarantee that the data has been written to a majority of the replica set members (i.e. may be rolled back)."[1] This is like having "READ UNCOMMITTED" be the default read level in a traditional database system.

The Jepsen tests passed with the "majority" write concern. The default write concern is "1", which means only the "primary" in a replica set needs to acknowledge the write[2]. This does not guarantee safety in the face of network partitions.

It's still not safe out of the box.

[0] - https://jepsen.io/analyses/mongodb-3-4-0-rc3

"With the v1 protocol, majority writes, and linearizable reads, MongoDB 3.4.1 (and the current development release, 3.5.1) pass all MongoDB Jepsen tests:"

[1] - https://docs.mongodb.com/manual/reference/read-concern/

[2] - https://docs.mongodb.com/manual/reference/write-concern/
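
For what it's worth, opting into the safer semantics from a driver is straightforward; a minimal pymongo sketch (database/collection names are hypothetical):

    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

    # Ask for majority-acknowledged writes and linearizable reads on this
    # collection; the server defaults stay at w=1 / "local" otherwise.
    orders = client.shop.get_collection(
        "orders",
        write_concern=WriteConcern(w="majority"),
        read_concern=ReadConcern("linearizable"),
    )

    orders.insert_one({"sku": "abc", "qty": 1})  # acked by a majority
    doc = orders.find_one({"sku": "abc"})        # linearizable read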


These kinds of default settings make sense, though. If you really care about linearizable writes, you know how to configure them.

The same goes for PostgreSQL [1], which uses Read Committed rather than Serializable transaction isolation by default, because for the majority of people this is fine and the performance tradeoffs are worth it.

[1] https://www.postgresql.org/docs/9.5/static/transaction-iso.h...
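
And opting up to Serializable in Postgres is likewise a per-session (or per-transaction) switch; a minimal psycopg2 sketch (the orders table is hypothetical):

    import psycopg2
    from psycopg2 import extensions

    conn = psycopg2.connect("dbname=app")

    # The default is READ COMMITTED; upgrade this session to SERIALIZABLE.
    conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_SERIALIZABLE)

    with conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")
        print(cur.fetchone())

    # Under contention, serializable transactions can abort with a
    # serialization failure (SQLSTATE 40001) and must be retried.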


Yeah, and you can turn off fsync in postgres if you want a blazing fast db that loses your data. :D


This meme isn't even funny anymore.

It was a bug that was fixed many, many years ago and was only true if you didn't use any client libraries.


Not sure if you know much about databases but those are standard defaults.

Cassandra as well doesn't require a full quorum to acknowledge writes. It just relies on the closest node. Likewise for Oracle.

http://docs.datastax.com/en/archived/cassandra/2.0/cassandra...


> That said, it's pretty hard to make sense of when you would want non-relational dbms these days. Write-scaling is still a pretty obvious reason, though things like Citus might help here.

Personally, I find a non-relational database useful when my data model is non-relational.


What is an example of non-relational data?

A tree is relational. Each child has a relation to its parent.

I can't imagine data that has no relation (no connection) to anything else. Maybe what you meant was heterogeneous (e.g. data elements that do not all have the same attributes) - but even then I can't readily come up with an example.


Word "relational" in relational databases does not stand for relationships between the tables [0]. That being said, relational model is a great fit for plenty (most?) of the data.

[0] - https://en.wikipedia.org/wiki/Relation_(database)


Thanks for the technical correction. That also being said, I think that definition is terrible and that anybody trying to understand what a relational database is would be better off not reading that article.

Yes, I agree that most types of data can be stored in tabular form (or an "n-ary relation" per Wikipedia). I'm just wondering what concrete types of data one would rather store in a document.

I don't think there is a good example. The decision to store some data outside of an RDBMS must have more to do with the processing model or something else.


> I don't think there is a good example. The decision to store some data outside of an RDBMS must have more to do with the processing model or something else.

What else other than the processing model and business requirements would determine how you model and store your data?


That was my point. The person I responded to said that they only store "non-relational data" in NoSQL so I was asking what that was.


To clear up the possible confusion, the person you are currently responding to said that, not me.


Sparse heterogeneous data is often the type of data stored in NoSQL DBs. Modelled in a relational way, this produces many tables with many NULL fields, while keeping it in a key-value format is neat and tidy.

I'd recommend the paper What Goes Around Comes Around[1], the first paper in Readings in Database Systems[2]

[1] https://scholar.google.com/scholar?cluster=73661829057771494...

[2] redbook.io


> Sparse heterogeneous data is often the type of data stored in NoSQL dbs.

I still can't imagine what sparse heterogeneous data exists in the world that makes sense to store. Any type of querying or processing requires some kind of structure (even if implicit in the code) which you can just put in different table structures.

You have to make sense of data to process it and that kind of implies a structure, doesn't it? Am I missing some obvious example of heterogeneous data?


Customer Analytical Record / Feature Engineering Store

One customer column, tens of thousands of attribute columns.

If you need everything about a customer, it is a single O(1) fetch operation, which makes it perfect for driving chat bots, call centres, websites, operational decisioning engines, dashboards, etc. Almost every large company will have one of these.

You can't really do it properly in relational systems because (a) you hit the column limit, (b) the data is often sparse, i.e. lots of NULLs everywhere, and (c) you need this system to be distributed since it often gets a lot of load.
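
To make the shape of the data concrete, a minimal sketch (field names are made up) of such a sparse record as a document, via pymongo:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    profiles = client.analytics.customer_profiles

    # Only the attributes this customer actually has are stored; the
    # other ~10,000 possible "columns" simply aren't present.
    profiles.insert_one({
        "_id": "cust-42",
        "lifetime_value": 1830.50,
        "churn_score": 0.07,
        "last_channel": "web",
    })

    # One fetch by primary key drives the chat bot / dashboard / etc.
    doc = profiles.find_one({"_id": "cust-42"})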


What would the attribute columns consist of? My experience has been with named columns defined individually by humans, of which I've never seen more than a few hundred; how do you get tens of thousands? Are they a different kind of thing?


Most companies that do it purely by hand can easily get into the thousands of attributes. I've seen it many times before: you hit the column limit of a SQL database.

But where you get into tens/hundreds of thousands is when you have machine learning models automatically selecting and storing important features from the data.


Tick (market) data is another good example of this. A given 'Tick' is just an event that can have any of up to thousands of different attributes set (often just a handful).



No.

EAVT is great as an intermediate format but it is absolutely useless to query for since most of the time you are trying to find a set of attributes for a given entity i.e. full table scan.

What you want is a "wide table". One entity column and all the attribute columns to the right. Often with most of the values set to null.

This is the dream use case for MongoDB, since you can ignore sparse values, yet when you query it via their drivers it will appear as a wide table. You can't do this at all in PostgreSQL since you will hit a column limit.


> EAVT is great as an intermediate format but it is absolutely useless to query for since most of the time you are trying to find a set of attributes for a given entity i.e. full table scan.

This is what indexes are for. An index on the entity id should avoid any full table scans.
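
A minimal sketch of that index on a hypothetical EAV table, via psycopg2:

    import psycopg2

    conn = psycopg2.connect("dbname=analytics")
    with conn, conn.cursor() as cur:
        # With a composite index on (entity, attribute), "all attributes
        # for entity X" is an index range scan, not a full table scan.
        cur.execute("""
            CREATE INDEX IF NOT EXISTS eav_entity_attr_idx
            ON eav (entity, attribute)
        """)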


EAVT table with 100 million entities and 10000 attributes = one trillion row table.

And you want to build indexes on half the table?

Good luck with that.


> Often with most of the values set to null.

Your math is at odds with your own requirements: null values don't need a row.


You clearly don't understand what you're talking about.

Sparsity is an issue for the wide table, not the EAVT form.


> You can't do this at all in PostgreSQL since you will hit a column limit.

JSONB is designed for exactly this, isn’t it?
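
For what it's worth, a minimal sketch (table and field names are made up) of the sparse-attribute pattern with JSONB:

    import psycopg2

    conn = psycopg2.connect("dbname=analytics")
    with conn, conn.cursor() as cur:
        # Sparse attributes live in one jsonb column instead of
        # thousands of mostly-NULL columns.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS customer_profiles (
                customer_id text PRIMARY KEY,
                attrs       jsonb NOT NULL DEFAULT '{}'
            )
        """)
        cur.execute(
            "INSERT INTO customer_profiles VALUES (%s, %s)",
            ("cust-42", '{"churn_score": 0.07, "last_channel": "web"}'),
        )
        # A GIN index makes containment queries on attrs efficient.
        cur.execute("""
            CREATE INDEX IF NOT EXISTS profiles_attrs_idx
            ON customer_profiles USING gin (attrs)
        """)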


Sure. But MongoDB is far better at scaling, has infinitely better drivers (including Spark) and is about an order of magnitude faster than PostgreSQL for partial updates.

The lack of a Spark driver alone renders PostgreSQL useless for most companies.


Any data can be modelled in a relational way, I guess. It doesn't mean it's the best representation.

Try modelling a cyclic graph in a relational way and you'll quickly tie yourself in knots trying to update and query it.
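
For a taste of those knots, a rough sketch under assumed names (an edges table, psycopg2 as the client): reachability over a possibly cyclic graph needs a recursive CTE with a hand-rolled cycle guard, or the query never terminates:

    import psycopg2

    conn = psycopg2.connect("dbname=graphs")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS edges (src int, dst int)")
        # Nodes reachable from node 1. The path array and is_cycle flag
        # are boilerplate you must repeat in every traversal query.
        cur.execute("""
            WITH RECURSIVE reach(node, path, is_cycle) AS (
                SELECT dst, ARRAY[src, dst], src = dst
                FROM edges WHERE src = 1
                UNION ALL
                SELECT e.dst, r.path || e.dst, e.dst = ANY(r.path)
                FROM edges e JOIN reach r ON e.src = r.node
                WHERE NOT r.is_cycle
            )
            SELECT DISTINCT node FROM reach
        """)
        print(cur.fetchall())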

The point is, relational databases are great for storing data that you've decided to model relationally. If you decide not to, then you probably want some other sort of database.


The point is, relational databases are great for storing data that you've decided to model relationally. If you decide not to, then you probably want some other sort of database.

Sure. But that database is not MongoDB.


Oh, I totally agree!

The "you only need relational databases" mantra bugs me though, because it's so obviously not true.


Such a naive outlook. MongoDB has two things that are being incredibly overlooked right now: 1) company stability, so you don't have to redo your database in 3 years; 2) developer community / pervasiveness, so you have an easier time integrating it into your projects, even with its technical shortcomings.

I bet two years ago there was someone out there saying "if you're going to do NoSQL you better use RethinkDB over MongoDB"

How great the technology is, is absolutely not the only factor. It's a good thing people can weigh many different factors when making their decisions.


Instead of non-relational, think of it as denormalized. I also can't think of any cases where you wouldn't want some relationships, but I can absolutely rattle off tons of cases where applications benefit greatly from some kind of denormalization.

In years past, people called these "data warehouses" and essentially took snapshots of their production DBs and denormalized the hell out of them so that aggregations wouldn't crash the server.


Sure. The most direct argument against that is that PostgreSQL's jsonb is just non-relational data support in a first-class relational DB, which is pretty great, so to an extent you get the best of both worlds, though I'm sure you can find a case where it's not quite optimal vs. some NoSQL DB.

This talk is a pretty nifty perf overview:

https://www.percona.com/live/e17/sessions/high-performance-j...

That said, if you know beforehand that horizontal scaling will be a crucial factor, probably postgres isn't the first choice. But with how fast CPUs are these days it's usually not important for a long time.


> But with how fast CPUs are these days it's usually not important for a long time.

That’s a very naive statement to make.


If you grow and hit hardware limits, then congratulations, you are the next Facebook/Google/etc. Also, there's nothing stopping you from switching your current database to something else.

The absolute majority of companies will do just fine, because the hardware improves faster than their demands. Starting out with a distributed system "because one day we might need it" is just silly, because chances are you'll never hit that point, and you'll have to pay for the overhead of having a distributed system (which is non-trivial).

Actually, my company started using PG, and at a presentation someone asked if we had considered a distributed database so we could scale. The presenter nicely said it was evaluated and this solution worked best, but that was too nice:

1. It's only about 100GB of data

2. The hardware is barely utilized, we didn't tune it (except some standard memory settings), because there's no need yet.

3. Our data is relational (in fact most data from most companies is relational)


Is it? Because there are a ton of companies out there running postgres, mysql and scaling workloads just fine.


I upvoted you because you're technically right for many cases. But if you'd ever dealt with a database you couldn't scale further (we were using multiples of the largest instances EC2 had at the time), where new data was flowing in faster than you could delete it and downtime wasn't an option, you too would hate any predecessors who said "you can always scale higher" before things got so bad it was almost impossible to recover.


Yeah. There are definitely cases where the scaling options make sense along different axis for sure. My point is mostly that relational DBs tend to strike a good default ground if you aren't sure what axis you'll care about.


PostgreSQL JSONB doesn't have a dedicated driver, though; i.e., you can't do partial updates via the JDBC/ODBC drivers.

Which means you can't use it for any big data/analytics use cases. MongoDB has fantastic client libraries e.g. Spark, Java.


Why wouldn't you be able to do partial updates with JDBC drivers?


The JDBC driver for postgres fully supports partial updates with JSONB, though?
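
A partial update is just a SQL statement (jsonb_set), so any driver that can execute SQL can do it, JDBC and ODBC included; a sketch against a hypothetical customer_profiles table, via psycopg2:

    import psycopg2

    conn = psycopg2.connect("dbname=analytics")
    with conn, conn.cursor() as cur:
        # Update one key inside the jsonb document in place; the rest
        # of the document is untouched.
        cur.execute("""
            UPDATE customer_profiles
            SET attrs = jsonb_set(attrs, '{churn_score}', %s::jsonb)
            WHERE customer_id = %s
        """, ("0.12", "cust-42"))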


That makes complete sense, but it seems to be very rare that data is non relational in my experience.


I sorta feel the same way, but then I think about the 800(?) employees they have and their combined skill set in database administration, internals, research, and philosophy, and I'm left feeling like there has got to be a very strong technical and business use case that I just don't see. I would imagine that if I were locked in a room with their Sr. product people and devs, they could easily convince most people of MongoDB. I would pay money to see a debate-style DB-off between Postgres's core devs and representatives from MongoDB.


Mongo is no longer as "hip", but it is slowly breaking into the enterprise as people learn the right use cases for it.


I still have trouble finding proper use cases for MongoDB that a more solid database couldn't handle.


And as it matures to meet more cases (e.g. new consensus, storage solutions).


This fact has not made its way from programmers up to higher management yet, it seems.


I had a phone screen with a company just a couple of months ago that ran on MongoDB and they were committed to it.


Sounds like you dodged a bullet!


It's not hip anymore, but it's still relevant. A company might need to be hip to hype up their technology at the initial stage to gain audiences but eventually they need to settle on a mature business model that might not seem exciting but would let the company survive.


Whenever I hear people say things like "Oh, this app won't work well with an RDBMS and definitely needs a document-style DB", I recall that Facebook has used MySQL (an RDBMS) from its beginning and still does. This shows that an RDBMS can be relentlessly scaled and can work with very complex apps.


And Facebook has had to spend millions making it work for their use cases. It's not like Facebook just enables the --scale flag on vanilla MySQL and then walks away. Not saying they'd be able to make vanilla Mongo scale to their needs without serious investment either, but saying "Facebook uses this, so it's good enough for me" discounts a lot of the work that goes into making those solutions work at their needs/scale.


That's precisely why they IPO'd. Private investors needed an exit while they shift their money to Docker Inc for the next tech hype train.

https://www.hntrends.com/2017/september.html?compare1=Mongod...


Postgres is actually much less solid; it's a database from a different era and doesn't come close to any post-CAP database, even the infamous MongoDB.


Yeah... no. Can you explain why being "from a different era" (and so having a hell of a lot more development time and battle testing) is a bad thing?

While you're at it, you could also elaborate on why Postgres is much less "solid" than a database that literally eats writes without any consensus as to whether they are valid and/or actually written. After that you could explain why "post-CAP" is a thing.

Until you do, your comment is pretty useless and it sounds like you could do with a nice shot of consistent, well designed database right to the heart.


I don't understand why everyone is so happy when IPOs go up and make it sound like a good event.

I see it as the founders needlessly missing out on 30% of money (in this case), which ends up going in the pockets of the Wall Street middle men that get first access to the stock offering.


I see it as the founders needlessly missing out on 30% of money (in this case), which ends up going in the pockets of the Wall Street middle men that get first access to the stock offering.

That's the system as it's currently implemented, yes. Underwriters have some folks on tap that they'll let in on the IPO. IOW, folks that'll dump money into the IPO. But those folks aren't suckers, they'd like to see a return on their investment, if not individual investments then at least in aggregate. So, in summary:

1. Companies want someone to buy their new shares.

2. Brokerages have such folks on tap.

3. However, those investors want a return.

4. So the underwriters set the IPO price a bit low so as to increase the chances of investors getting that return, which means those investors will come back next time for, say, pets.com's IPO.

I doubt this is written down anywhere, but that's the impression I get from observing IPOs (tech and non-tech) for 20 years or so.


Those investors are given allocations based on how much the brokers like them, how much brokerage they pay, etc. The book build isn't done on the highest price; that's the messed-up part.

If it were done on the highest price, then those investors who are buying after the IPO would have put in bids during the IPO, and there wouldn't be a pricing gap.


There are other reasons you might want a pop, as opposed to squeezing every possible cent out of the IPO:

- For better or worse, a pop is seen as a successful IPO. A lot of the market is about expectations, and if you have an "unsuccessful" IPO you are going to get bad press.

- Underwriters are selling to the same institutional investors over and over. They're going to promise those investors that there will be a reward for getting in on this IPO. If an underwriter sells a bunch of IPOs that don't go anywhere they are going to have trouble continuing to underwrite


Both of those are meaningless to a company when compared to having 30% more money to devote to their projects. Because of lockups, the IPO pop is irrelevant to employees and most insiders, whose proceeds are usually determined by the stock's value six months from IPO.

The first point amounts to one good press cycle. You can get that other ways and the lasting value of a single cycle of fluffed up good press based on underwriters and their cronies making money on their shares is nil.

The second is entirely the underwriter's problem. A company only IPOs once; there's no reason for them to take a hit to help the underwriter and friends make money for no reason. The underwriter takes a cut and a fee regardless.


> The second is entirely the underwriter's problem. A company only IPOs once; there's no reason for them to take a hit to help the underwriter and friends make money for no reason. The underwriter takes a cut and a fee regardless.

Think of it as an additional cost of underwriting just broken out in a weird way and it might be more palatable.

If you want a lot of interest in your IPO then you probably need underwriters with big institutional relationships. The underwriters with the best relationships are going to be the ones who help their clients get good returns. You're just paying for better underwriters by losing part of the pop instead of paying them fees directly.

Companies are free to go to smaller underwriters in exchange for an IPO price that's closer to what the underwriter thinks the market will bear. And yet most of them choose not to.

Given that almost every IPO has a pop, what do you think is most likely:

* CEOs & VCs taking their companies public don't realize that they are leaving money on the table by underpricing the IPO. You have two groups of sophisticated finance people and the underwriting side always manages to fleece the company going public.

* There are structural reasons and incentives for creating an IPO pop

You can't even accuse people of trying to fleece the low-level employees and retail investors. The employees are locked up either way and whether you have a pop or not, by the time the shares reach the retail investors they're the same price either way.


The company "missed out" on the money, not the founders right? The IPO wasn't with the all of the founders shares. They may also be restricted on when and how much they can sell. Part of the value is creating buzz.

If you IPO at 24 and it jumps to 32, hopefully it is still 30 when you can sell. If you IPO at 32 and it drops to 26 because there is no buzz, you make a lot less when you sell.


The company would be more valuable (having more cash) or less diluted (having sold fewer shares) if it had IPO'd at a higher price.


Pricing is hard, but yes, I think they did price this one a little too low.

That said, the goal is for shares to have a nice upward pop when they hit the open market.

This helps encourage a broader base of shareholders, protecting against the case where a big shareholder decides to dump all of their shares.

This also helps in the case where the company comes out with bad news in the near future. Shareholders who make money are less likely to sue.


The ones who made money are the ones who sold and who are ex-shareholders.


And the underwriters, and the early shareholders who grabbed a little chunk of that 30% on the way up.


> Pricing is hard, but yes, I think they did price this one a little too low.

Auctions can work pretty well for pricing things.


Sorry, why exactly do you think 7.5x next year's sales is too low?


Because the market would have supported a higher price.


I actually gambled on this one. I'd noticed the hype and read around this morning.

I didn't go in that big, but we will see. I bought 700 shares at $29.12, sometime around noon. I was a bit busy so didn't pay much attention to this thread, but they closed at a bit over $32.00.

I haven't yet set a mark to sell them. I do have it set to notify me if they go below $29.00, and I will sell them all if they go below $24.00. Ideally, I'll keep the shares for a full year, at minimum.

I figured 20k isn't too much to risk, though I technically risk less than that because I will sell if they go below $24.

So, we shall see... We shall see... It is one of the rare times when I am betting against the folks on the tech forums. The commentary seems to be largely negative, concerning the software itself - I've never personally used it. I'm betting that the hype train will continue, regardless of it not being perfect in every way.


This comment is a nice illustration of the Keynesian beauty contest: https://en.m.wikipedia.org/wiki/Keynesian_beauty_contest


That is a neat article, thanks! I'd never seen it before but it's fairly close to what I've been doing.

I do another one and someone told me there's a technical name for it. I forget the name.

When I go shopping, I'll discreetly look in shopping carts. I'll make a mental note and check to see how well the shelves are stocked in comparison to other products. If I see a lot of the same company in carts, to the point where the store hasn't kept up with stocking, then I'll look further into the parent company.

So far, it has done well. I'd never played in the market before and my 401k was always managed. So, this is pretty new for me. I've been at it since maybe 2011.


In what world is 20k on one stock not "too much risk"?


I am financially secure. At most, I risk about $3500 because I'll sell if they go below $24 each.

I did the same thing with Tesla, though they were $24 at the time.

I've written about this before, here in HN comments. I often make stock choices based on the commentary at sites like Slashdot and HN. No, I don't listen to people saying to sell or buy, I listen to the hype and commentary about the companies themselves. As another example I did well with Yahoo!

To be clear, this isn't serious money. I have someone who professionally manages my finances. This is more of an experiment and is meant for playing around.

I know, that probably sounds a bit terrible. But it's not money I'm worried about losing, and I'm actually doing pretty well at speculating. I make greater returns than the person who does it for me, which is to be expected: I take a bunch of risks, don't do traditional research, and don't even check the daily prices.


This assumes you catch it before it falls below $24, which you'll have to by hand since you said you don't have an automatic stop loss, just a notification.


Yeah, I'll get the notice and decide at that point. My current thoughts are $24 is the limit. I think I'll adjust the notification to $25 and maybe automate selling at $24.

That way, I can change my mind if, for some reason, I think it will rally.


Stock traders can easily go into the $XX,000 range for a single stock. Just don't let those shares fall to zero before selling them and you're only really risking $X,000.


Well, the founders’ shares are usually locked up for months after the IPO, right? More like the VC investors and company missed out on some of their money (which is less than 30% for a 30% rise)


Didn't Zuckerberg immediately issue a lot more shares and instantly sell them at the IPO?


Depends on how the number of shares offered and the dilution was calculated.


Presumably the founders didn't sell all their shares.

An IPO that goes up instead of down is much more likely to attract more investment, thus driving the founders' remaining shares up.


It'll be interesting to see what happens to Spotify if they go ahead with the direct listing.


Having gone through a few acquisitions where MongoDB was used, I would never recommend using it from a legal/compliance perspective. You either have to pay for a very expensive commercial license OR adhere to their AGPL license (which is very difficult).

https://opensource.google.com/docs/using/agpl-policy/

https://github.com/mongodb/mongo/blob/master/GNU-AGPL-3.0.tx...


Why do you need a commercial license or have to do anything special for AGPL compliance if you aren't modifying the codebase and using the binaries from Mongo's website/repo?

It looks like all of the drivers you would use to connect to it are Apache 2.0 licensed so that wouldn't be a reason.


>Twenty-one percent of respondents told industry site StackOverflow that MongoDB was the most popular database, second only to versions from the dominant technology, SQL, which traces its roots back to legacy technology companies such as Microsoft, IBM and Oracle.

Microsoft, IBM, and Oracle are legacy technology companies? What kind of hipster journalist wrote this article?


->"MongoDB was the database that most software developers said they wanted to work with, according to StackOverflow's survey of 64,000 developers."

Does this just seem like it can't be true? Does anyone know where or why this would be the case?


Curiosity? Redis and Postgres are still above it as the "most loved": https://insights.stackoverflow.com/survey/2017#technology-mo...


A lot of the bootcamp schools use mongo from what I remember. An interesting comparison would be what devs with 5+ years of experience prefer to use.


That's not "software developers", though. And how many people would that actually be, not most I'd think...


A friend of mine works at a commercial real estate consulting company. One of his former co-workers was non-technical but kept insisting they use Mongo for their projects. He believed it would automatically boost the performance of a system. I guess he thought of it like installing a turbocharger in a car.

I have no idea what his sources were for this, but it signals Mongo's marketing power to me that a non-developer would become so avid about it.


After the fiasco with thousands of unprotected MongoDB instances, you can figure out who their major audience is.


Maybe they just surveyed bootcamp attendees.


Hmm, I work for a competing NoSQL company, and the good news about the IPO is that it should raise the visibility of all database products, not just Mongo. The second part is that an IPO for a database company is viable in this climate. Some of the early workers have been waiting for a long time for the IPO. I bet this spawns a new group of seed investors.


yes MDB's success is fantastic news for innovation in the space as a whole. I hope your company does well too.


Ach, the interminable hatred of MongoDB is an anthropological artifact worthy of study.

It's the only truly working example of a horizontally scalable, arbitrary document storage and retrieval system with indexing on any element. It is a much more general tool than an RDBMS, and should never be used when an RDBMS would do the job.

However, it's really good at collecting searchable, arbitrary schemaless data in real time. The newest versions do what they're supposed to do rather well; it's a tool like any other tool.

That said, the company is guilty of overhyping for sure, and I wouldn't invest in the stock on a rational basis.


> should never be used when an RDBS would do the job

The hate comes exactly from that. The "hipsters" just use it for everything. Try reading a "modern" webdev tutorial without seeing it used wrongly.


Only? Many of the early MongoDB employees came from MarkLogic. Several have returned too.

(Full disclosure: PM at MarkLogic)


Well, you guys must have a pretty lame marketing department, because I've been using NoSQL for the better part of a decade and I've never heard of you.

A quick look at your web page tells me why. You have no open source or free version that I can download and kick the tires of.


Mongo is a marketing company for a mediocre product, but no doubt they're damn good at marketing it.


They're also f*ing annoying. One of their sales people was bugging me for 2-3 weeks sending e-mail nearly every day. He even subscribed me to their mailing lists without my permission.


Per crunchbase they raised their last funding round on a pre-money valuation of $1.6 billion. Since the market cap now is less than that valuation, why is the coverage saying this is a big success? It seems to me like this is a down round of some sort. Does anyone know if I am missing something here? Or does this type of down round not really affect the employees common stock?


That really depends on the terms of their last investment round. As far as I know those haven't been disclosed. I would suspect that the investor in the last round is compensated with more shares, probably something similar to if he had invested at a 1.1 billion instead of 1.6 billion valuation. Maybe not though. Depends on the terms.


After being introduced to the MongoDB stale read issue in 2015 [1], we abandoned MongoDB and never looked back. Does anyone know if this is resolved in the latest versions?

[1] https://aphyr.com/posts/322-jepsen-mongodb-stale-reads


It got significantly better in 3.4 -- this analysis claims that the issues were finally resolved in 3.4 (although they were in the RC) https://jepsen.io/analyses/mongodb-3-4-0-rc3


Thanks. Seems they did some good work indeed. Though it takes time before they get back the trust they lost over this.


Still, the settings that let them pass these tests are not defaults.

And the reason they're not the defaults is that they drastically reduce performance.

Edit: It also looks like Kyle saved testing of server crashes and restarts for next time, which is another problem that's difficult to handle when performance is important.


In real live production workloads I've had nothing but pain with MongoDB. They seem more like a marketing and sales team piggybacking on a strange and immature database technology.


Let's just hope the DB that is used to store the data on said shares isn't MongoDB ;)


Interesting valuation. 8x revenue is a success for SaaS companies. They were initially priced at $20/share, which would imply ~9.5x revenue. At $32/share they're at ~16x revenue. Very steep.

They also fail the VC "rule of 40", where a SaaS company's growth rate plus profit margin should add up to at least 40% (e.g. 50% growth and negative 10% margins, or 10% growth and 30% margins). They seem to be at 50% growth but minus 40% margins.
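
Worked out with those numbers: 50% growth + (-40%) margins = 10%, well short of the 40% bar.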

Somehow they pulled it off. Great for them!


For non-tech people, MongoDB often represents areas such as "big data". I work with a lot of people who have very little tech knowledge, and the name comes up frequently to demonstrate that you're hip. It's the same way that being on AWS convinces investors, because running your own servers is so yesterday.

Only Hadoop probably has an even bigger name. That's what everyone connects with machine learning and AI, so every big company needs a Hadoop cluster.


As usual, the confusion and religious comments are numerous. There is no such thing as "nosql". There are different types of databases, with traditional relational being useful for 95% of scenarios (especially on increasingly fast servers with decent replication features) while the rest of the time something more specific is needed.

SQL is just an interface, obviously common to relational databases, but it can be applied to any datastore. Spark/Drill/Presto/Dremio/etc. can give you SQL over any data, even just files in a folder somewhere, so let's keep the actual database technology distinct from the access path.

Document stores are definitely useful. MongoDB is one of the better ones today although it had a rocky start. RethinkDB was an interesting experiment but never matured, RavenDB is a solid contender, Couchbase has proven itself, Riak might stick around, and there are dozens of others.

There is a place for everything and MongoDB is being used by plenty of companies to great extent. It might not always be the right choice but when it is, it works incredibly well. Good luck to the team, I'm glad to see the success in both the product and the company.


> There is no such thing as nosql

This is not at all correct. SQL has well defined semantics that are standardized around the relational model and ACID guarantees. A nosql datastore is one that intentionally makes tradeoffs that force it to deviate from that model. The name makes sense if you understand that SQL was the first broadly adopted language that targeted the relational model.

I'm not very familiar with the systems you mention other than Presto. But if they do not provide relational guarantees then even if they have SQL-like syntax the semantics are sufficiently different for them to not be implementing a true SQL. Hence, nosql.


SQL = structured query language. That's all it is, a language to access and manipulate data and data structures. It comes in several variations, standards and dialects. Any system can implement (a form of) SQL as an interface, just as any system can also implement ACID guarantees around data access, but these are completely separate concepts. ACID isn't even mentioned on the SQL wiki page. [1]

Relational databases have a history of offering both in a single package, but there is no "true" SQL. There are also plenty of non-relational data stores that offer SQL and/or ACID guarantees so this limited generalization isn't accurate or useful.

"NoSQL" has no meaning other than originally describing systems that did not have any SQL access at all, usually due to different storage layers, distributed architectures, custom access protocols, and general immaturity around usage. A decade later, all of these systems have evolved and there's both convergence and specialization everywhere.

It would be far better to just talk about the actual type of database, and the interfaces and guarantees it provides, rather than marketing jargon like nosql.

1. https://en.wikipedia.org/wiki/SQL


Marten > RavenDB!


In some cases, but RavenDB 4.0 is natively clustered and has more querying capabilities and better performance when using all of the document store functionality.


Maybe machines are too fast today. Back in the 70s, even departmental computers were slow. This might have forced better engineering choices onto the product designers.

The difference between an O(n log n) algorithm and an O(n^2) one could have made the difference between a decent product and a totally unusable one.

Nowadays, toy examples can seem to work fine even when the products have really terrible implementations.


Obligatory: MongoDB is web scale.[0]

[0]: https://youtu.be/b2F-DItXtZs


I think the share price will be eventually consistent.


Is this a pun? For puns I go to Reddit, for substance to HN.


My sincerest apologies. I hope I didn’t ruin your entire week.


What's MongoDB's business model? Open source, plus a separate closed-source "enterprise" version? Are contributors OK with this, or are most contributors in-house? It seems Nginx et al. are also using this business model... I'm thinking about starting my own open-source business.


Bait and switch

They promise that it's so easy to use you don't need a DBA, you don't need any ops staff to run it, etc, just develop and go! Then once you're in too deep, they get you...


Open-source and an army of sales people that will convince big company decision-makers who don't understand what open-source is to go for the hosted solution.


Support contracts, setup, etc.


The irony of the situation is that MongoDB is now pretty good at horizontal scalability, probably because of all the big companies that were fooled into using it, which then had to fix it :-)

Seeing the comments in this thread, it's interesting how it gets compared with PostgreSQL. People are missing the point: NoSQL only happened because of horizontal scalability requirements; that's the number one reason people want NoSQL.

Just because PostgreSQL can now store and query JSON doesn't mean that PostgreSQL can scale horizontally. In fact, PostgreSQL sucks at horizontal scaling; historically its replication story has been worse than MySQL's, actually.

And you might not have big data, but you might want redundancy, and scenarios with pretty tight SLAs are not uncommon at all.


> "Most applications today run on a database technology that was introduced in the 1970s," Ittycheria said. "In the '70s, I was using a rotary phone to have a phone conversation. So people are looking for a modern, scalable and flexible platform."

It's like a weird version of the Turing test where you have to decide whether someone's speaking seriously or in jest when they talk about NoSQL.

https://www.youtube.com/watch?v=b2F-DItXtZs


"Most spaceflight trajectories are based on physics that was introduced in the 1680s. In the 1680s, we were using slide rules to multiply numbers."


Actually they were using logarithmic tables then.



TIL. Thanks!


I wish I could upvote you 10 times.


I'm continuously amazed when I hear about MongoDB in use. Not over, say, Postgres (I get that there are NoSQL advantages), but over literally any other NoSQL option.

https://en.wikipedia.org/wiki/Poe%27s_law is the term you're looking for...


In general NoSQL solutions are optimized for certain use-cases at the detriment of others. If you're looking for a general purpose database, then RDBMSs like PostgreSQL are your best bet.

So that said, when comparing NoSQL solutions, ending up with an apples versus oranges comparison is almost inevitable.


The funny thing is, PG will outperform most NoSQL solutions as a NoSQL store for the vast majority of use cases.


Well, I for one think PostgreSQL is overrated.

The number one reason people want NoSQL is horizontal scaling, and for that PostgreSQL is terrible, with all available solutions being hacks that don't work.


I don't disagree with you for teams who know which technology is right and which isn't, for their specific use cases.

But more common than not, inexperienced devs are using Mongo and similar to store relational data, simply because they were sold the 'MEAN' stack and didn't realize that, while it's easy to get a quick prototype running, a year or two later you eventually need things like transactions and joins, and NoSQL is absolutely the wrong technology most of the time.


Citus works just fine.


What % of people have a workload beyond what PG on an i3.16xlarge can handle?


I always think this when people talk about scaling. You can buy off-the-shelf boxes now with 48 or 96 cores, 1-2 TB of RAM, and internal bays for tens of TB of SSD, or connect one to an AFA and get hundreds of TB. This is not even an exotic custom-built system, just a commodity, and has been for years. Running a recent version of a conventional database on a box like this goes a very, very long way, with very little hassle, because you don't even need to think about "sharding", and you can always add a hot standby for offloading reads, or for redundancy in another DC, or whatever. Systems like this can quite happily bottleneck on the network before the database starts breaking a sweat.

Remember, sharding isn't scaling the database. Sharding is admitting your database can't scale so you're offloading the problem to another layer.


You can have 224 cores and 12TB RAM in a commodity Supermicro box.


Indeed. You would need one helluva Postgres workload to overload that. Not one in a million people genuinely have that requirement.


It's also about locality of your data. Having a global infrastructure means 150-200ms minimum latency per query if you have a system in India or Singapore with a database server in a US data center region. That adds up quick.


That is an orthogonal concern to what is being discussed: if you run a geo-distributed Mongo cluster, you will either have slow queries or will compromise on data integrity.


It's not orthogonal, because it is harder to have master-master cross-data-center setups in relational databases, even in scenarios where eventual consistency is acceptable (such as a use case for a piece of our infrastructure).

Sure, MongoDB may not be the best fit for it, but my point is more that in scenarios where horizontal scaling is an important consideration, some NoSQL solutions make more sense than SQL solutions. It's not just about single-box performance.


people who don't want to get woken up if Amazon decides to terminate said i3.16xlarge...


Not a fan of AWS or the cloud in general, but I've accepted that this is the direction the world is going for now. The AWS instance type was purely for illustration purposes.


Missed the point completely...


I think you should reconsider that dismissal: everyone needs n > 1 instances for reliability while increasingly few tasks require more {CPU,RAM,IOPs} than a single server can provide. That means that a growing percentage of problems will require clustering for reliability more than performance, and that favors the easiest to manage since in most cases every option will be fast enough.


Horizontal scaling is often just as much about redundancy as it is about scaling; no single-server solution is a valid answer for a production application.


People tend to copycat what Google and the like are doing without giving much thought to the fact that in many cases the solutions they adapt are simply due to the fact that they don't have an alternative and not because it's "better".


I second this. Mongo clusters are easy to manage, and the aggregation pipeline and MapReduce engines do wonders on a sharded cluster with tons of data. That's the selling point of Mongo for me.
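
As a flavor of the pipeline (hypothetical events collection), which the cluster can fan out across shards:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    events = client.analytics.events

    # Count purchases per region; on a sharded cluster each shard runs
    # $match/$group locally and mongos merges the partial results.
    pipeline = [
        {"$match": {"type": "purchase"}},
        {"$group": {"_id": "$region", "n": {"$sum": 1}}},
        {"$sort": {"n": -1}},
    ]
    for row in events.aggregate(pipeline):
        print(row["_id"], row["n"])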


It doesn't matter how easy it is to scale if it doesn't do it right. You can win all the internet points when it comes to speed and ease of use, but it doesn't matter if it comes at the cost of data integrity.


> It doesn't matter how easy to is to scale if it doesn't do it right.

I would refer you to an earlier comment:

> In general NoSQL solutions are optimized for certain use-cases at the detriment of others

Your definition of 'right' is absolutely not the only definition of 'right'.

More to the point: there are a lot of problem spaces where the data integrity provided by MongoDB is more than sufficient.


Sorry, I meant "correct" and not "right," as the former is objective and the latter can be (though often isn't) subjective. MongoDB is not correct; in fact, it is provably incorrect. If your needs to don't require correctness, then by all means.


> with tons of data

Define tons of data. I'll bet a beer that what you say can fit on a single postgres instance and even be small enough to run fine on a cheapish developer laptop.


That's interesting to me as I've been looking at Mongo aggregations for a solution to a problem but couldn't find any research on the performance of them.


NoSQL is optimized for no solutions. You do all the optimizations yourself. Only then are you allowed to speak of "optimized for certain use cases at the detriment of others."


I can say that CouchDB makes Mongo look like an absolute dreamboat.


Which NoSQL would you prefer over mongo?


Literally any other. I'm unaware of any alternative so unreliable.

But I do think http://www.scylladb.com/ is great.


Literally any other. I'm unaware of any alternative so unreliable.

There was a post on LinkedIn last year, "MongoDB: The Frankenstein Monster of NoSQL Databases"[0], by the CTO of SlamData, which I think produces an analytics package for MongoDB or the like.

The piece is an interesting dive into why MongoDB is what it is (at least as of March 2016) -- and given his connection to MongoDB and its employees -- being a partner of theirs -- it's quite eye-opening (and I'm surprised with almost 200 comments in this thread that no one posted it previously):

Much like Mary Shelly’s Frankenstein monster, MongoDB’s data access layer is sewn together from ragged pieces that don’t fit together. Pieces that were never designed to fit together.

The result, depending on your point of view, is either an Enterprise-grade NoSQL database destined to supplant Oracle, or an unholy abomination of nature, deserving of an angry mob bearing torches and pitchforks.

Let me dissect this creature so you can decide for yourself.

[0] https://www.linkedin.com/pulse/mongodb-frankenstein-monster-...


Another post[0] by the same author complements the one I posted above (this one is more recent and is focused more on MongoDB's moves as a company, relating what he saw at MongoDB World 2017):

You might expect a database company to announce improved scalability, reliability, or performance. Or perhaps announce some of the countless features that users have been requesting for years, which are collecting dust in MongoDB's ever-growing issue tracker.

These are sane, logical expectations for a database company, so you'd be forgiven for being shocked at MongoDB's announcements that the company is investing massively into every product category except database technology.

Yet, to those who know MongoDB's troubled history, these announcements come as no surprise. In fact, I even predicted the launch of Stitch just over a year ago, at the last MongoDB World.

MongoDB didn't start as a database company, it's never acted like a database company, and in my opinion, it doesn't really have the DNA of a database company.

If his assessment in this next snippet is correct, one wonders about those who invested in this IPO:

MongoDB has no ecosystem. There are no analytics tools for MongoDB (except SlamData [N.B. this is the author's company] ), no backup software, no recovery software, no data integration software, no query optimization software, no data management software, nothing. Zilch.

There's just MongoDB.

In hindsight, this is an inevitable consequence of a company without database DNA trying to build and monetize a database. MongoDB couldn't figure out how to build an Oracle-sized empire on a database—partially, I'd argue, because they couldn't figure out how to build a database—yet they have to hit their sales quotas.

If you can't sell the database at scale, you end up trying to build and sell an ecosystem around the database. Slowly, bit by bit, MongoDB went after their early partners, trying to put them all out of business to drive a few million here and there.

The result is a "database company" that sells everything under the sun, including database management tools, data exploration tools, cloud hosting tools, cloud hosting services, and soon, BaaS (Bubble makes a return!) and BI software. Everything except, you know, a database.

[0] https://www.linkedin.com/pulse/mongodb-world-2017-lonely-sto...



Couchbase has a pretty nice mobile sync system, with clients for both Android and iOS.


Just want to note Couchbase is not just for mobile. Couchbase Server is an enterprise-class document-oriented (like Mongo) db with a dynamic query language that's a superset of SQL. (FD: I work for Couchbase).


I started a company and chose couchbase. We haven't launched yet, so I've been using the community version and cobbling stuff together.

In this context: please, for the love of God, improve your documentation and dev tools. I love the underlying technology but my experience getting a basic service up and running has been pretty mixed. One of the reasons mongodb has been so successful is that there's 50 bajillion articles showing you how to hack up some crappy code that roughly does what you need it to do. Some of those articles are less bad than others, but in a pinch you can find something to at least get you on the right track. Couchbase doesn't have that deep well of experience to draw on. If you run into a problem, or experience strange behavior, it's up to you to figure out what's going on. That would be ok, but even the official documentation and standard dev tools are not good enough. To get people to adopt couchbase you need to do more to get them started.

Javascript-specific whining: it's particularly frustrating to find an official ODM like ottoman and discover that half the features don't work and that there are tons of bugs that haven't been fixed for about a year. These aren't minor bugs either; some of them stop you using headline features. Forget full text search, N1QL is mostly unusable! Check out ottoman bug #153 for details there.


The problem I found with Couch is handling user permissions at the database level. Except for that detail, it's one of the best databases I've ever used.


With the sync gateway you get pretty fine-grained control. Every create or update goes through a sync function you can define in JavaScript, which can deny requests. You control which users see what by assigning documents to channels, so a user can only pull down a document if it is on a channel they have access to.

The issue I've found with the sync gateway is that the queries you can perform are more limited than what you can do directly on a Couchbase store e.g. joining data is difficult.

BTW, Couchbase should not be confused with CouchDB. Similar name and might have some history in common, but Couchbase is more fully featured.


Agree, and also I would distinguish between Couchbase Enterprise Edition versus Community Edition.


Cassandra is a pretty good alternative if you're looking for excellent write speeds and scalability.


We're using Cassandra at scale (~1M concurrent users at peak). It is, in our experience, an absolutely terrible piece of software.

The development community doesn't seem to care about fixing bugs, and when they do fix things, they reliably introduce new, often devastating defects. "Move fast and break things" is unforgivable at the persistence layer. We're stuck using an ancient, unsupported version, because it is, per our empirical testing, the least bad. But we made that choice, and for now we're stuck with it.

Please, let our pain be a lesson to you. I am totally willing to be someone else's "That's how you get ants..." on this point. Our emoji for it in Slack is a burning poo. It's that bad.

EDIT: phrasing.


I am not a DBA, obviously. I did have to work as one, but that was a lot of years ago. So, pardon me if this is a dumb question.

You say you're stuck with it. Can't you just change a few things, shut down for a little while, export your data, and insert it into a new DB? I know it is hard to do it with writes happening at the same time, but it seems like you could freeze it as read only, extract the data, and put it into a DB that has been prepped ahead of time and tested.

I know we did this more than once BUT it wasn't with public facing data and we were able to migrate while still working on existing data. We just couldn't add more data while doing so.


No, it is not that simple. "We just couldn't add more data while doing so" is, by itself, a deal-breaker.


>We're stuck using an ancient, unsupported version, because it is, per our empirical testing, the least bad. But we made that choice, and for now we're stuck with it.

Which is this least bad version, if you don't mind my asking?


2.2 causes us the least pain.


I work on more mature projects, so I've never used a pure NoSQL environment, but I do enjoy working with a split cache/persistence layer approach combining an RDBMS (preferring PostgreSQL) behind a cache layer (preferring libmemcached).


MarkLogic has a much better design when it comes to durability of writes and ACID in general. Disclosure: I work for MarkLogic.


Druid works far better for our uses.


I've been enjoying using ArangoDB.


Aerospike for low-latency small row sizes, HBase for everything else.


Most vehicles are rolling around on technology that was invented 10000 years ago.

(Is that the same idea?)


That's interesting.


Well, the DBMSs of the '70s were nothing like the ones around today.


was it not round?


Firestone recalled those tires in 1999.


I guess it didn’t have a hollow core...


Meanwhile, almost all of those applications are written in languages that are 20+ years old (Ruby, Java, PHP, JavaScript... Python is almost 30).


While the languages may be old, they're constantly updated. PHP 7 is light years ahead of 5, and 4 was just a joke. Python 3, same. Just because something is new or old doesn't make it better or worse. I mean, many devs swear by Vim, which is like 40 years old. Is Sublime Text better because it's newer?

(I use Sublime, but I admire those who've jumped into Vim for the productivity boost that brings.)


That’s exactly the point though. It’s not like we’re plopping a floppy from Oracle into every machine running a modern SQL db.


> "Most applications today run on a database technology that was introduced in the 1970s. We built ours on ideas that were tried and discarded in favor of that technology."

Not entirely true, but nuance often precludes persuasion.

Many of the ideas behind NoSQL databases can be very valuable, given the right context, but there are a lot of good reasons relational databases have been the de-facto default over the last four decades.


Worse, judging a phone system by the handset rather than the network behind it is exactly the kind of superficial understanding of technology that dovetails with the video way too nicely.


Yes, dumb argument. That YouTube video was the first thing I thought of when I read the sentence.

Let's scrap everything that was invented in the 70's or before.


I saw that.



I know there's quite some aversion to NoSQL around here, and generally I don't care much, as I'm seldom dealing directly with databases.

But recently, I've been exposed to a fairly big and complex SQL one with several references between entities and lots, lots of X_has_Y tables. This makes me think that with growing complexity (which seems to be a general trend), NoSQL databases seem more practical at some point, or at least something less rigid than classic relational ones. I'm not saying SQL is obsolete, but it seems like its ___domain of usefulness is shrinking.


I'd argue the opposite is true. NoSQL requires you to be more intentional with your schema design since you can only query keys. As a database grows with complexity, you can find yourself in a world of pain with NoSQL if you all of a sudden need to query something in a different manner. At least with SQL you have more options with WHERE clauses & indexes even if performance might not be top notch.


Not sure of others, but not true with Couchbase. You have queries just as flexible as with SQL.


One of the reasons I love PostgreSQL's jsonb type is that you get the power of a relational database and the power of a (NoSQL?) document database at the same time. I haven't run into any query I couldn't write against a jsonb doc. Granted, dot notation on JSON would be nicer than 'prop' -> 'prop' -> 'prop', but I don't feel limited.
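
For illustration, that operator chain on a hypothetical events table, via psycopg2 (-> descends and returns jsonb, ->> extracts the leaf as text, @> tests containment):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT doc -> 'customer' -> 'address' ->> 'city'
            FROM events
            WHERE doc @> '{"type": "signup"}'
        """)
        print(cur.fetchall())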


The point is that the relational model is a fully general-purpose mathematical model for managing data with integrity and with ad hoc access.

Most other models are for high performance of specific access paths and punt integrity management to code, which is a throwback to the 70s. They’re glorified file systems and data structure caches. Mongo has a reputation for losing your data.

There are cases where scale and availability kill you and you need something like Cassandra or whatnot. But there are few as flexible and general as Oracle or Postgres.


I don't think the HN community is averse to NoSQL solutions, but is averse to MongoDB in particular.


Everyone here seems to love Postgres and hate on Mongo. I have no technical knowledge to compare the two, so IMO a lot of that love and hate is really about "project attitude": MongoDB is a suit, a sellout, with bullshit marketing and all, while Postgres is like some roots hippie that cares a lot more about technical values and neglects marketing.


I think most of it was a reaction to the massive marketing push Mongo made. There were so many years of people hyping it as the answer to all of your data storage needs, followed by accounts of users hitting bugs or reimplementing most of a SQL database in code, and the promised performance benefits either didn't materialize or hadn't been necessary in the first place (“web scale” turned out to mean only dozens to hundreds of requests per second for many apps).

Meanwhile, Postgres was quietly plugging away adding new features and continuing to deliver solid performance for a wider range of workloads, including better performance on JSON document storage.


MongoDB accrued a lot of ill will due to some extremely questionable defaults, which remain defaults to this day. There's no question that you can write a fast database when there's no guarantee that data ever hits the disk, but developers tend not to like it when a database accepts their write and then silently loses data. It's also great for toy problems and 15-minute demos... but then you inevitably run into its limitations and end up re-implementing a database in your app.
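
For what it's worth, you can opt into stronger guarantees explicitly rather than relying on the defaults; a pymongo sketch (the database and collection names are made up):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")
    # w="majority": wait until a majority of replica set members acknowledge
    # j=True: wait until the write is journaled to disk, not just in memory
    events = client.app.get_collection(
        "events", write_concern=WriteConcern(w="majority", j=True))
    events.insert_one({"type": "signup"})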

Even at its best, there is essentially no reason to choose MongoDB over Postgres with JSONB-type columns. They are essentially the same data model but Postgres gives you better guarantees of data consistency, plus a forward migration path to relational data when the day inevitably arrives when you need to model relationships between entities.

At this point, Postgres is where most open-source RDBMS development work is concentrated. It's not only a solid codebase; it's piling up features quickly, and there are relatively few niches it doesn't fill at least adequately. Those remaining niches are covered by commercial products built on top of Postgres (e.g., EnterpriseDB or CitusDB). It's pretty much a one-stop shop for application development. You can use it for everything from GIS to machine learning [0] pretty efficiently, and it will pretty much just do the right thing without you watching.

NoSQL really fits best around the margins, like as an auxiliary system for analytics. There is really almost no use case where "user inputs data and we lose it" is acceptable application behavior, so consistency is a business requirement for your master database whether you realize it or not. And consistency across a distributed system is hard, so it almost always makes sense to sidestep clustering until the last possible moment. Buying a bigger machine is cheap, replication/failover is a lot easier than consistency between distributed masters, and if you are really up against the wall there are those commercial products that can do this with Postgres.

If you want to make an analogy... Oracle is the suit, Postgres is the hardworking small business that is slowly but surely eating up Oracle's lunch, and MongoDB is a trustafarian with a hot-dog detector app. And that's why there's a lot of resentment towards MongoDB.

[0]: The 9.x series and 10.0 release have been absolutely jam-packed with new features, it's absurd how fast development is moving at the moment. One of my favorites... indexed cube queries. A cube is a data-cube type, an N-dimensional cube of data. One feature of this is distance queries, which have obvious applications in pattern recognition tasks (eg k-nearest-neighbor). One of the features in 9.6 is index functionality for these, so you can now do indexed KNN searches on your data...

https://www.depesz.com/2016/01/10/waiting-for-9-6-cube-exten...
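
The shape of such an indexed KNN query, roughly (hypothetical points table with a cube column named vec, via psycopg2):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS cube")
        # the GiST index is what lets ORDER BY <-> (distance) avoid a full scan
        cur.execute("CREATE INDEX IF NOT EXISTS points_vec_idx ON points USING gist (vec)")
        cur.execute("""
            SELECT id
            FROM points
            ORDER BY vec <-> cube(array[1, 2, 3]::float8[])
            LIMIT 5
        """)
        print(cur.fetchall())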


> NoSQL really fits best around the margins, like as an auxiliary system for analytics.

I'd say it also fits well in two niches: document datastores (so long as there's some JOIN support, via referencing nested documents vs direct nesting) and graph stores.

I remember 10+ years ago working on storing nested sets in the RDBMS, and it wasn't pretty. And then there's the RDBMS schema for Magento 1, with key-value tables all over the place that NoSQL would have removed the need for.
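
On the JOIN-support point, this is roughly what a reference-style join looks like in MongoDB's aggregation pipeline (pymongo; the orders/customers collections are made up):

    from pymongo import MongoClient

    client = MongoClient()
    db = client.app
    # each order stores a customer_id reference instead of embedding the customer
    pipeline = [
        {"$lookup": {
            "from": "customers",
            "localField": "customer_id",
            "foreignField": "_id",
            "as": "customer",
        }},
    ]
    for doc in db.orders.aggregate(pipeline):
        print(doc)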


Can you define what "nested sets" means more specifically?

Postgres supports hierarchical/nested structures using the "ltree" column type. There is nothing stopping you from defining a primary key of (eg) "set1.set10.set100". There is also support for recursive views/etc to operate on these kinds of sets.

Again, if you have some kind of "sparse" column, it can make sense to put that into a JSONB column. This is effectively the same thing as attaching an unstructured document to a record for this use-case.
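
A quick sketch of the ltree part (made-up sets table, assuming psycopg2):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS ltree")
        cur.execute("CREATE TABLE IF NOT EXISTS sets (path ltree PRIMARY KEY)")
        cur.execute("""
            INSERT INTO sets VALUES ('set1'), ('set1.set10'), ('set1.set10.set100')
            ON CONFLICT DO NOTHING
        """)
        # <@ matches any path at or below the given one; a GiST index can serve it
        cur.execute("SELECT path FROM sets WHERE path <@ 'set1.set10'")
        print(cur.fetchall())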


Sure: http://mikehillyer.com/articles/managing-hierarchical-data-i...

Joe Celko popularised them. I've been unable to find when they were first introduced, but a search of my source code archive points to having written one ~14 years ago.
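
For reference, the classic subtree query under that model, via psycopg2; the column names (name, lft, rgt) follow the linked article, while the categories table name is shortened:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn.cursor() as cur:
        # every descendant's (lft, rgt) interval nests inside its ancestor's
        cur.execute("""
            SELECT child.name
            FROM categories parent
            JOIN categories child ON child.lft BETWEEN parent.lft AND parent.rgt
            WHERE parent.name = %s
        """, ("ELECTRONICS",))
        print(cur.fetchall())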


> so long as there's some JOIN support, via referencing nested documents vs direct nesting)

Which has its own problems. PG does this just fine, with a full battle-tested relational system to back it (and you) up.

> with key-value tables all over the place which NoSQL would have removed the need for.

Product X having a stupid schema is not a good basis for an argument for or against a particular product.


> Product X having a stupid schema is not a good basis for an argument for or against a particular product.

What other way could Magento have implemented user-defined columns at the time, using an RDBMS? In 2009, when MongoDB was released, JSON column types were something to dream of, and the alternative was storing serialised data in a BLOB field. That "stupid schema" did not have an alternative I can think of, except NoSQL.


I'm not sure if those were questionable defaults, or questionable design decisions which were the only option at the time, and now persist as questionable defaults.

I'm pretty sure that mmap was the only storage engine available for MongoDB for most of the hype period.


I got downvoted in another thread for saying NoSQL has a lot of advantages in reducing work, like not having to worry about carefully creating indexes when joining billion-row tables. Someone replied that you don't need indexes. I replied that I don't know how you plan on running joins on billion-row tables in Postgres without indexes. Got downvoted again.

I bet they are still waiting for that join....


Whether you need indexes really depends on what you’re trying to do. Full table scans are more efficient if you are doing analytics on most of those rows in a time series. If you need 1 row in a billion, yes, you need an index.


If you're doing a join on two size-N tables, the naive cost per record is O(N), so you end up doing N^2 lookups: basically reading the entire DB into memory for each record.


No. Worst case, a nested-loop join of two tables M and N would be O(MN). But most real databases have a merge or hash join, which can bring this down to O(M+N) if the tables are already stored in a sorted, indexed order. It is also rare for the cardinalities of these tables to be similar in an analytical query: usually you have one time-series "fact table" and a bunch of dimensions to enrich that data.

If I’m doing analytics on a time series this gets even better with partition pruning and hash joins or bitmap indexes. And if I have a columnar database, that blows up this whole complexity argument.

My point is that layout and indexes should never be assumed to be one-size-fits-all. Keep in mind, if I'm doing analytics I want to bring my time-series data into memory at least as a stream, since I need to calculate / filter / transform the records. Not everything is about rendering a page on a website.
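
To make the complexity point concrete, here's a toy hash join in plain Python: one pass to build a dict over the smaller (dimension) side, one pass to probe it with the fact rows, so O(M+N) instead of O(MN):

    def hash_join(facts, dims, fact_key, dim_key):
        # build phase, O(M): index the dimension rows by join key
        index = {}
        for d in dims:
            index.setdefault(d[dim_key], []).append(d)
        # probe phase, O(N): one dict lookup per fact row, no rescanning
        for f in facts:
            for d in index.get(f[fact_key], []):
                yield {**f, **d}

    facts = [{"ts": 1, "dim_id": 10}, {"ts": 2, "dim_id": 20}]
    dims = [{"id": 10, "label": "a"}, {"id": 20, "label": "b"}]
    print(list(hash_join(facts, dims, "dim_id", "id")))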


From the sounds of it, you were downvoted for not recognizing this as a textbook scenario with decades of practice and engine improvements devoted to allowing you to balance query performance against your server budget.

Interestingly enough, many of those techniques are common in the NoSQL world as well — a billion records is enough to require thinking about data flow anywhere — but the difference is that you have to deploy them more frequently.


If you're dealing with billion row tables and using mongodb you are way out of your league. Good luck.


MongoDB is not the only NoSQL DB :) I use Hive/HBase.


> less rigid than classic relational ones

Less rigid than what? The schema is going to exist somewhere...


So you gonna do all those JOINs in code???


Part of the point is you don't need so many joins.


Does anyone use Mongo anymore?

Edit: Honest to god question. I was surprised when I saw the headline, because I've never seen anyone use mongo in production, and never ran into any articles talking about using mongo.


Oh, hackernews


I can't think of a compelling reason to use Mongo over Cassandra except "we already use it and don't want to change".


It's a fair bit easier to set up than Cassandra, and a document store is simply easier for most people to reason about than column families.

For something like your typical Rails app, Mongo has some real nice properties, and that makes it sticky. You start out with Mongo because you can just drop data in and off you go. You keep using it because Mongo, despite all of its drawbacks, is really plenty good enough for more than 80% of the stuff on the web.


What I'm mostly interested to see is the rate of new projects using mongo.

Do they have growth at a rate that justifies their valuation?

If that's the case, is HN living in its own bubble? Because if you read HN, you'd think that nobody would use MongoDB for a new project.


What should one use then?


At my (admittedly tiny) place of work, we use it in production for literally everything, alongside Redis. It gets the job done. Not my favorite. I'd rather be using PostgreSQL, since our data is 100% relational. But through some wicked twist of fate, I'm the only one who can use SQL DBs, and the Sr. devs like MongoDB.


yes?


Are you saying it has become ubiquitous?


$192M IPO means $192M of funds raised in the sale, at a valuation of $1.2B.


Which is a little more than the amount of money they lost over the last two years. Taking into account their ~$100M in the bank, this gives them roughly 3.5 years at the same loss level.


I hope Neo4j will achieve something similar. Relational does a lot very well but for certain use cases a graph is far superior.


Technology aside, how's their financial status now that they have IPO'ed? Does it seem like a good investment?


I would look at year over year growth. How you define growth will decide the riskiness for you.


heisenbit above notes that this raises enough money so that the company can last 3.5 years at their current burn rate.


MongoDB worth > $1B?! Ladies and Gentlemen, we're back in 2000 again. Brace yourselves for the bubble bursting!


This article is still relevant: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...

The main point is: most of your data is most likely relational and not document based. Use the DB that fits your data model, and not the data model that fits your DB.


Even for document data, Postgres is quite competitive

https://www.percona.com/live/e17/sites/default/files/slides/...


ride that gravy train till there is no more gravy


I thought they were supposed to go for closer to $1B.


The IPO raised $192M; the company itself is worth slightly above $1B.


Who would have thought, given that at some point this "Don't use MongoDB" post seemed to have taken over HN: https://news.ycombinator.com/item?id=3202081


This means the company absolutely positively mispriced the offering. The company could have successfully raised nearly 30% more money.


That's not how it works. When you're shopping around your IPO before going public, you need to get commitments from institutional investors to buy your stock at IPO at a particular price. Just because people will start buying at +30% shortly after IPO does not mean you can get investors to commit to a 30% higher IPO price beforehand.


That's exactly how it works. What you are describing is the standard IB B.S. speech. It has been debunked multiple times, starting from the time of DLJ.


Huh? I'm not saying it's reasonable/good or doesn't involve cronyism and "discounts" for "favored" investors, but what I described is actually how it works.


Twitter... Snapchat... MongoDB... Beware!


In a world where Giphy is valued by investors at $400M, I'm not sure if half that for MongoDB is good or bad.


They sold 8,000,000 shares at $24 but that is only 16% [1] of shares outstanding. The stock is ~$30 as of now, which yields a total valuation of $1.47B

[1] https://www.sec.gov/Archives/edgar/data/1441816/000104746917...


They raised 192M but their valuation is much higher.


So instructions on making a successful company in 2017:

1. Spot a non-trivial expense type applicable to most Fortune 500 companies.

2. Make a startup offering the same thing below cost.

3. Use your network to get sales people that personally know exec-level people from Fortune 500 and will pitch the product to them.

4. Get them to sign up. Of course they will; you're offering it below cost at the investors' expense.

5. Show impressive revenue and customer base growth and forget the word "profitability".

6. Make an IPO and cash in before the public realizes that you might have been selling dollars for ninety cents.

Who's left holding the bag? Average Joe, whose pension fund ended up investing in a promising technology company showing exemplary revenue growth over an extended time period...


To be fair, Average Joe's pension fund was probably invested in one of the VCs who cashed out at the IPO too. So Average Joe likely still benefits at the end of the day.



