Same. Ran a perf test recently. With two single-core brokers I got 2000 rps with persistence and peaked at 12000 rps without persistence.
We’ve also had similar issues to OP, except the fix just came down to configuring the Java client with a prefetch of 0 so that long jobs don’t block other msgs from being processed by other clients. We also use separate queues for different workloads.
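Here's a toy simulation of why prefetch matters. It's a simplified model I made up for illustration, not any broker's real dispatch algorithm. With a large prefetch the broker pushes messages into each consumer's local buffer up front, so a short job queued behind a long one stays stuck even when another consumer is idle. With prefetch 0 a consumer only takes a message when it's free. The class and method names (`PrefetchSim`, `simulatePush`, `simulatePull`) and the workload are hypothetical. The comment doesn't say which broker this was. If it's ActiveMQ Classic (just a guess), zero prefetch is usually set on the connection URI, e.g. `tcp://localhost:61616?jms.prefetchPolicy.all=0`.

```java
import java.util.Arrays;

/**
 * Toy model (hypothetical, not a real broker's dispatch algorithm) of how a
 * large prefetch lets one long job delay short jobs an idle consumer could run.
 */
public class PrefetchSim {

    /**
     * Push model with a large prefetch: the whole backlog is dealt round-robin
     * into each consumer's local buffer up front, so a message stuck behind a
     * long job cannot move to an idle consumer. Returns each job's completion time.
     */
    static long[] simulatePush(int[] durations, int consumers) {
        long[] clock = new long[consumers]; // when each consumer finishes its buffered work
        long[] done = new long[durations.length];
        for (int i = 0; i < durations.length; i++) {
            int c = i % consumers;          // strict round-robin assignment
            clock[c] += durations[i];
            done[i] = clock[c];
        }
        return done;
    }

    /**
     * Pull model (prefetch 0): a consumer takes the next message only when it
     * is idle, so each message goes to whichever consumer frees up first.
     */
    static long[] simulatePull(int[] durations, int consumers) {
        long[] clock = new long[consumers];
        long[] done = new long[durations.length];
        for (int i = 0; i < durations.length; i++) {
            int c = 0;
            for (int k = 1; k < consumers; k++) {
                if (clock[k] < clock[c]) c = k; // earliest-idle consumer wins
            }
            clock[c] += durations[i];
            done[i] = clock[c];
        }
        return done;
    }

    public static void main(String[] args) {
        // Hypothetical workload: one long job followed by three short ones, two consumers.
        int[] durations = {100, 1, 1, 1};
        System.out.println("push (large prefetch): " + Arrays.toString(simulatePush(durations, 2)));
        System.out.println("pull (prefetch 0):     " + Arrays.toString(simulatePull(durations, 2)));
    }
}
```

Running it prints `[100, 1, 101, 2]` for push and `[100, 1, 2, 3]` for pull. The short job that gets buffered behind the long one finishes at t=101 instead of t=2. The tradeoff is that zero prefetch generally means a round trip per message, so throughput drops. That's why it tends to be used for slow or uneven jobs rather than everywhere.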
It seems that OP's company doesn't really know anything about the job workload beforehand, as the jobs are created by their customers. Having different queues for short/long workloads might be impossible.