
We've been running PouchDB in production for ~15 months now. We chose it because it was a greenfield project and it gave us two things: easy offline support, and real-time syncing that makes it easy to build collaboration a la Google Docs. Because the entire thing is a web app with an app cache manifest, deploying new versions is very little hassle.
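
For context, the heart of the real-time piece is just PouchDB's live two-way sync. A minimal sketch (the database names and URL here are made up):

    // Local browser db plus the remote per-tenant Couch db
    var local = new PouchDB('tenant-data');
    var remote = new PouchDB('https://couch.example.com/tenant-data');

    // Continuous two-way replication; 'change' fires as edits arrive,
    // which is what drives the Google Docs-style collaboration
    local.sync(remote, { live: true, retry: true })
      .on('change', function (info) { /* update the UI */ })
      .on('error', function (err) { console.error(err); });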

In terms of architecture, we have about 250 tenants with a separate Couch database for each. We're still running Couch 1.6 and have yet to evaluate Couch 2.0.

It's been a mostly smooth ride, but since this is a very unusual architecture we had to tackle a few interesting problems along the way.

1. Load times. Once you get over a certain db size, the initial load from a clean slate takes ages because PouchDB is super chatty. I'm talking 15-30 minutes to do an initial sync of a 20-30 MB database. We had to resort to pouch-dump to produce dump files periodically, and that helped a lot. I think this issue has been rectified with Couch 2.0 and the sync protocol update.
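
If anyone wants to try this, the dump/load flow looks roughly like the sketch below (assuming nolanlawson's pouchdb-dump-cli and pouchdb-load packages; the URLs are illustrative):

    // Server side (shell): pouchdb-dump http://localhost:5984/tenant-data > dump.txt
    // Browser side: seed the local db from the static dump file first,
    // then let normal replication catch up from there
    PouchDB.plugin(require('pouchdb-load'));
    var db = new PouchDB('tenant-data');
    db.load('https://cdn.example.com/dumps/tenant-data.txt').then(function () {
      return db.sync('https://couch.example.com/tenant-data', { live: true, retry: true });
    });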

2. Browser limits. Once we hit the inherent storage capacity of some browsers (namely Safari on iOS, with its 50 MB cap) we had to get creative. Now we're running two CouchDB databases for each tenant: one has the full data and the other contains only the last 7-8 days. Pouch syncs to the latter. We run filtered replications between the full db and the reduced db and do periodic purging. On the client side, if a customer tries to go back more than 7 days, we just use Pouch in online-only mode, where it acts as a client library to the remote Couch and doesn't sync locally.
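
The client-side fallback amounts to something like this (a sketch; the database names are illustrative):

    // Recent data: synced locally from the reduced (last ~7 days) db
    var recent = new PouchDB('tenant-recent');
    recent.sync('https://couch.example.com/tenant-recent', { live: true, retry: true });

    // Older data: a PouchDB instance pointed straight at the full remote db.
    // Nothing is stored locally; every call goes over the wire.
    var fullRemote = new PouchDB('https://couch.example.com/tenant-full');

    function getDoc(id, olderThan7Days) {
      return olderThan7Days ? fullRemote.get(id) : recent.get(id);
    }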

3. Dealing with conflicts. This may or may not matter depending on your ___domain, but you have to be aware of data conflicts. CouchDB/PouchDB is an eventually consistent, multi-master setup, so you will get data conflicts when people update the same entity based on the same source revision. PouchDB has nice hooks to let you deal with this but you have to architect for it.
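
The basic detection/resolution pattern looks like this (a sketch; the doc id is made up, and a real app would merge the losing revisions instead of just discarding them):

    // Ask for conflicts explicitly; doc._conflicts lists the losing revs
    db.get('invoice:123', { conflicts: true }).then(function (doc) {
      var losers = doc._conflicts || [];
      return Promise.all(losers.map(function (rev) {
        // Domain-specific merge logic would go here; deleting the losing
        // revision is what marks the conflict as resolved
        return db.remove(doc._id, rev);
      }));
    });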

4. Custom back-end logic. Because Pouch talks directly to Couch, you can't exactly execute custom back-end logic on writes. We had to introduce a REST back-channel to make sure our back-end runs extra logic when needed.
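
In practice that means a write goes to Pouch as usual, plus an explicit call to our API. A sketch (the endpoint here is hypothetical):

    db.put(updatedDoc).then(function (result) {
      // Back-channel: tell the API server to run side effects
      // (notifications, billing, etc.) for this change
      return fetch('https://api.example.com/events', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ docId: result.id, rev: result.rev })
      });
    });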

5. We had some nasty one-off surprises. The last one was an object that had 1,700 or so revisions in Couch; once it synced to PouchDB, it would crash the Chrome tab in a matter of seconds. Due to the way PouchDB stores the revision tree (lots of nested arrays), Chrome would choke during the JSON.parse() call and eat up memory until the tab crashed. We resolved this one by reducing the revision history limit that is kept.
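
For reference, capping the revision history is a one-liner on each side (the value 50 is illustrative; PouchDB's default revs_limit is 1000):

    // Client side: cap the depth of the revision tree PouchDB keeps
    var db = new PouchDB('tenant-data', { revs_limit: 50 });

    // Server side (shell):
    // curl -X PUT http://localhost:5984/tenant-data/_revs_limit -d '50'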




That chattiness is what has driven me away from Pouch, sadly. It's a flaw in the Couch replication protocol design that won't be fixed until the spec is changed.


The chattiness is mostly addressed with _bulk_get in CouchDB 2.0 - Pouch will automatically use it if the server supports it. Another option is to stick an HTTP/2 proxy in front of your CouchDB instance - the chatter to the db is ultimately still there, but it significantly reduces the latency cost for the PouchDB client. There are plans to add first-class HTTP/2 support to Couch, but for remote client architectures just adding a proxy should be a significant improvement. Projects like https://github.com/cloudant-labs/envoy take this a step further and provide an extensible proxy (e.g. you can do sub-database access control, etc.).


> We had some nasty one-off surprises. The last one was an object that had 1,700 or so revisions in Couch; once it synced to PouchDB, it would crash the Chrome tab in a matter of seconds. Due to the way PouchDB stores the revision tree (lots of nested arrays), Chrome would choke during the JSON.parse() call and eat up memory until the tab crashed. We resolved this one by reducing the revision history limit that is kept.

I think I remember this issue (I was formerly a heavy contributor to PouchDB). I think Nolan ended up writing a non-recursive JSON parser to deal with it, and there was some debate about whether it made sense to use it, as it was significantly slower (though it could handle deeply nested structures).


Yup, exactly. We use JSON.parse inside a try/catch and fall back to vuvuzela (https://github.com/nolanlawson/vuvuzela), a non-recursive JSON parser, in cases of stack overflow (here's the code: https://github.com/pouchdb/pouchdb/blob/62be5fed959bbdf91758...).
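
Condensed, the fallback is just this (a sketch, not the exact PouchDB source linked above):

    var vuvuzela = require('vuvuzela');

    function safeJsonParse(str) {
      try {
        return JSON.parse(str); // fast path
      } catch (e) {
        // Deeply nested rev trees can blow the stack in JSON.parse;
        // vuvuzela parses iteratively, so it's slower but stack-safe
        return vuvuzela.parse(str);
      }
    }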

Unfortunately, the only way to resolve this without vuvuzela would have been to change the structure of the stored documents, which would have required a large migration, so I'm glad to hear that the vuvuzela solution was the right way to go.


This is very interesting to read. I'm currently working on an Electron app that uses PouchDB and has a lot to do with revisions, which is one of the big reasons I chose PouchDB.

Regarding your third point on conflicts, could you shed some more light on:

> PouchDB has nice hooks to let you deal with this but you have to architect for it.



Good report! On #4, have you considered a client-db-server approach, where your server just listens to changes in the db and acts accordingly? Is there something in your specific case that prevents this approach?
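
Concretely, I'm imagining something like this (a sketch; a Node process using PouchDB as a client to the remote Couch):

    var PouchDB = require('pouchdb');
    var db = new PouchDB('http://localhost:5984/tenant-data');

    // Follow the changes feed and run back-end logic per change
    db.changes({ since: 'now', live: true, include_docs: true })
      .on('change', function (change) {
        // custom logic here: validation, notifications, etc.
      })
      .on('error', function (err) { console.error(err); });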



