If you have a distributed architecture you are probebly going to build some nodes. When programming that software for these nodes you want to use all the 4-8 cores that the node can use. Sure you could only use one core on every maschine and spin up 4-8 maschines but that not effective.
no it's not. How do you coordinate between those processes? It's far easier to use a producer/consumer model on the jvm with threads. I do this all the time and it's really easy, much easier than managing multiple processes and some kind of external queue like redis.
If you're writing a distributed app you have to write the process coordination logic and the fact that two processes happen to be on the same machine is incidental. This is how all the big webapps scale.