Anyone using this? Looks interesting.

jgaa · on July 13, 2020

Yes. I use it in a setup where some streams are geo-replicated to other continents.

Most of the time, it just works.

Here is a performance test program I wrote to see what I could squeeze out of it:

   https://github.com/jgaa/pulsartest

ckdarby · on July 13, 2020

Yes, I'm using it in production and we're transitioning from our existing system over to it entirely.

Running with official helm charts in AWS managed kubernetes and using EBS.

Currently passing about 20k/s in and 20k/s out.

If you've got any specific questions, just let me know.

samdung · on July 13, 2020

Not yet, but I'm reading the docs. It has 'once only delivery', which is good. Still trying to find out whether it does FIFO (I think it's hard to achieve this in a distributed system).

jackvanlightly · on July 13, 2020

Yes it does FIFO.

Just like RabbitMQ, Apache Kafka and many other distributed systems, writes go through an elected leader, who is able to ensure ordering guarantees.

Specifically with Apache Pulsar, each topic has an owner broker (leader) who accepts writes and serves readers.

It should be noted that Apache Pulsar supports shared subscriptions which allow two or more consumers to compete for the same messages, like having two consumers on a RabbitMQ queue. Here FIFO order cannot be guaranteed for all kinds of reasons.
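One of those reasons can be sketched with a toy simulation. This is plain Python, not Pulsar client code: the round-robin dispatch, the function name `completion_order` and the processing times are all assumptions made up for illustration. It only shows that when two consumers share a queue and one message is slow to process, work finishes out of publish order.

```python
# Toy simulation, not Pulsar client code: one queue, two competing consumers.
# Processing times are made-up numbers chosen to show the reordering.

def completion_order(messages, processing_time):
    """Dispatch messages round-robin to two consumers and return the
    order in which processing finishes (simulated time, no threads)."""
    free_at = [0, 0]   # simulated time at which each consumer becomes idle
    finished = []      # (finish_time, message)
    for i, msg in enumerate(messages):
        consumer = i % 2                      # assumed round-robin dispatch
        done = free_at[consumer] + processing_time[msg]
        free_at[consumer] = done
        finished.append((done, msg))
    return [msg for _, msg in sorted(finished)]

messages = ["m1", "m2", "m3", "m4"]
processing_time = {"m1": 5, "m2": 1, "m3": 1, "m4": 1}  # m1 is slow

order = completion_order(messages, processing_time)
print(order)  # ['m2', 'm4', 'm1', 'm3']
```

With a single consumer the same messages would finish in publish order, which is why per-consumer FIFO only holds when nobody competes for the queue.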

SuddsMcDuff · on July 13, 2020

See also: competing consumers pattern https://www.enterpriseintegrationpatterns.com/patterns/messa...

dikei · on July 13, 2020

https://pulsar.apache.org/docs/en/concepts-messaging/#orderi...

Basically, they have the same ordering guarantee as Kafka: FIFO is guaranteed for messages from the same producer, within the same partition. If you need producer-level FIFO, then you can only use 1 partition.

There's no ordering guarantee for messages coming from different producers.

eis · on July 13, 2020

To add:

In a distributed system, an ordering guarantee by producer (and optionally partition "key" which in the end is like an extension of the topic key) is pretty much all you need and all you can get.

If you have two producers, then how would one decide which one sent its message first? Go by some timestamp? Clocks are unreliable for these purposes, so it comes down to consensus. And letting the queue decide which one came first is equivalent. Once a message has been acknowledged by the queue, its ordering cannot change anymore.
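A minimal sketch of that point, as a toy append-only log in Python. The `Log` class and the clock values are hypothetical and have nothing to do with Pulsar's actual API. Producer B's clock is assumed to run behind, so sorting by producer timestamp disagrees with the order in which the log acknowledged the messages.

```python
# Toy model: the log, not the producers' clocks, decides the order.

class Log:
    def __init__(self):
        self.entries = []  # (producer, payload, producer_clock)

    def append(self, producer, payload, producer_clock):
        """Acknowledge a message and return its position in the log.
        Once appended, the position never changes."""
        self.entries.append((producer, payload, producer_clock))
        return len(self.entries) - 1

log = Log()

# Producer B's clock runs behind, so its timestamp looks older
# even though its message reaches the log second.
pos_a = log.append("A", "a1", producer_clock=100)
pos_b = log.append("B", "b1", producer_clock=95)

log_order = [payload for _, payload, _ in log.entries]
clock_order = [payload for _, payload, _ in
               sorted(log.entries, key=lambda e: e[2])]

print(log_order)    # ['a1', 'b1']
print(clock_order)  # ['b1', 'a1']
```

Every reader of the log sees `a1` before `b1`, regardless of what the producers' clocks claim.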

oweiler · on July 13, 2020

What do you mean by FIFO? Messages delivered in insertion order? With Kafka, this is easy: messages with the same key land in the same partition. I guess Pulsar offers something similar.
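For illustration, key-based partitioning can be sketched in a few lines of Python. This is not how either client library is implemented: real clients use their own hash functions, and `zlib.crc32`, `NUM_PARTITIONS`, `partition_for` and the sample events are stand-ins chosen for this example. The point is only that a deterministic hash of the key keeps all messages for one key in one partition, in insertion order.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a deterministic hash (crc32 stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Each partition is an append-only list, so order within it is insertion order.
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

def values_for(key: str):
    """Read back one key's messages from the single partition that holds them."""
    return [v for k, v in partitions[partition_for(key)] if k == key]

print(values_for("user-1"))  # ['login', 'click', 'logout']
```

Messages for different keys may land in different partitions, and nothing here orders them relative to each other.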

EdwardDiego · on July 13, 2020

I've spent a lot of time poking it and running it through scenarios to get an understanding of it - haven't used it in production though. Might look at trialling it there in a year or so, once some of the early bugs are shaken out.

It has a lot of promise, but adopting it right now is going to require you to spend a lot of time reading source code when you, like I did, find the docs are out of date or have giant holes.

Things I liked:

Tiered storage - the ability to offload old data to S3 / Azure / HDFS, and then have it reloaded transparently when a consumer reads it, is awesome (albeit with some delay, but their reasoning is that historical reads don't need millisecond latency).

Inter-cluster replication is a lot easier than running up a Kafka Connect cluster to run Mirror Maker 2, but there's a slight caveat that it doesn't work for chained clusters (A -> B -> C). A message from A won't be replicated to C by B. But Pulsar assumes A is going to replicate to both of them if they need it.
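The chained-cluster caveat can be modelled with a toy sketch in Python. The `Cluster` class and its `publish` method are hypothetical and only mimic the behaviour described above: a cluster forwards messages that were published to it locally, and does not forward messages it received through replication.

```python
# Toy model of non-transitive replication; not Pulsar code.

class Cluster:
    def __init__(self, name):
        self.name = name
        self.peers = []     # clusters this one replicates to
        self.messages = []  # (payload, name of the origin cluster)

    def publish(self, payload, replicated_from=None):
        origin = replicated_from or self.name
        self.messages.append((payload, origin))
        # Only locally produced messages are forwarded to peers.
        if replicated_from is None:
            for peer in self.peers:
                peer.publish(payload, replicated_from=self.name)

# Chained: A -> B -> C. B does not pass A's message on to C.
a, b, c = Cluster("A"), Cluster("B"), Cluster("C")
a.peers = [b]
b.peers = [c]
a.publish("hello")
chained = [len(x.messages) for x in (a, b, c)]

# Fan-out: A replicates to both B and C directly.
a2, b2, c2 = Cluster("A"), Cluster("B"), Cluster("C")
a2.peers = [b2, c2]
a2.publish("hello")
fan_out = [len(x.messages) for x in (a2, b2, c2)]

print(chained)  # [1, 1, 0]
print(fan_out)  # [1, 1, 1]
```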

Schema registry baked in is nice. Pulsar's replication automatically replicating the schema to other clusters is verrrrry nice for consumers on a different cluster. It's doable with Confluent's Schema Registry, but it's another thing to manage.

Load balancing brokers - haven't seen it in action, but they attempt to redistribute partition ownership based on load.

The only downsides were that the docs were sparse or out of date in places, especially the BookKeeper ones. There was some configuration fun and some odd errors with the BK command line tools (and they could never tell me which bookie was the auditor), and there are two scripts that do very similar things, but not quite. That said, having an auditor process that checks for under-replicated segments automatically is nice, once it's working.

There are several tools for Kafka that do something similar, but it's nice to have it out of the box.

Can't really comment on its capacities to act as a message queue, but it has that too.

But yeah, I reckon I'll look again in a year.

On an interesting note, Pulsar is very similar in design to Twitter's PubSub system - near-stateless, easily scalable brokers with (another layer) then BK as the storage layer. They moved to Kafka, but then, they have Twitter-sized problems, so YMMV. https://blog.twitter.com/engineering/en_us/topics/insights/2...