The Concurrency Myth (fupeg.blogspot.com)
46 points by dfabulich on June 5, 2011 | 43 comments



I'm currently looking for an article by a professor or a Haskell developer (can't remember which) discussing the difference between parallelism and concurrency. I can't find it, but I'll post these for now:

http://www.danielmoth.com/Blog/threadingconcurrency-vs-paral...

It's a worthy read and may change your mind at least a bit. Overall, though, while I agree that not all developers would benefit from learning functional languages such as the ones specified, I truly believe that most would benefit greatly. Anyway, if I find the article I'm telling you about, I'll post it as well.

EDIT: I believe this was it: http://ghcmutterings.wordpress.com/2009/10/06/parallelism-co...

EDIT 2: I also believe that most of these languages actually promise more. For example, one of the biggest promises Haskell makes is no side effects. This has major implications for concurrency/parallelism, but it also (in my opinion) gives you safer, cleaner, easier-to-maintain code. (It does take a memory hit, I'll admit to that, but I personally prefer the safety.) Erlang's biggest promise, I believe, is its fault tolerance. Overall, I honestly believe that functional programming puts you in a different mindset, and once you get used to it (I'm still learning, but even now I find it mind-opening) you see things more clearly. But when it comes down to it, it's impossible to be objective about this. I will objectively say, though: everyone should try at least one functional language, just to get a different view of things :D
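
If it helps to see what "no side effects" buys you in practice, here is a tiny, made-up Haskell sketch: the type of a function tells you whether it can perform I/O at all.

    -- A pure function: given the same input it always returns the same
    -- output and can never write files, mutate globals, or touch the network.
    double :: Int -> Int
    double x = x * 2

    -- An effectful function: the IO in the type is how the compiler knows
    -- (and tells readers) that side effects may happen here.
    greet :: String -> IO ()
    greet name = putStrLn ("Hello, " ++ name)

    main :: IO ()
    main = do
      greet "world"
      print (double 21)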


Absolutely! A strong component of FP is getting programs right, and forcing the explicit management of state is critical to that. Of course, that also makes FP a lot harder; Haskell was not so pretty before the emergence of monads.


I thought you were referring to this one: http://existentialtype.wordpress.com/2011/03/17/parallelism-...


...though I wasn't. I will read it... tomorrow. Heheh, thanks though :D



This really hit the nail on the head for me. I'd switch to Scala/Haskell/Clojure if one of them could give me a multithreaded GUI toolkit, where I'd never have to worry about whether I was on the event thread or not.

Is it impossible? A "Failed Dream"? http://weblogs.java.net/blog/kgh/archive/2004/10/multithread...


A nice overview; unfortunately, he stopped before the real insight.

'there is a fundamental conflict here between a thread wanting to go "up" and other threads wanting to go "down"'

Can anybody explain that fundamental conflict?


Each level has its own (set of) locks. When a thread goes up/down, it needs to grab a lock from the higher/lower level before it can proceed. When one thread goes up at the same time as another thread goes down and they need the same locks, you get deadlock.
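
A sketch of that lock-order inversion, in Haskell with two MVars standing in for the per-level locks (the GUI details are elided and the names are made up):

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.MVar (newMVar, takeMVar, putMVar)

    main :: IO ()
    main = do
      upperLock <- newMVar ()   -- lock for the parent widget's level
      lowerLock <- newMVar ()   -- lock for the child widget's level

      -- Thread A: an event going "down" grabs the upper lock, then the lower.
      _ <- forkIO $ do
        takeMVar upperLock
        threadDelay 1000                 -- widen the race window
        takeMVar lowerLock
        putStrLn "down traversal done"
        putMVar lowerLock ()
        putMVar upperLock ()

      -- Thread B: a widget call going "up" grabs the lower lock, then the upper.
      _ <- forkIO $ do
        takeMVar lowerLock
        threadDelay 1000
        takeMVar upperLock               -- with bad timing, both threads now wait forever
        putStrLn "up traversal done"
        putMVar upperLock ()
        putMVar lowerLock ()

      -- If the race triggers, neither "done" line is printed before we exit.
      threadDelay 200000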

Is that what you were asking about, or is there something subtler I'm missing?


That is more of a fundamental problem of synchronization than about GUIs. Locks are anti-modular. So one solution could be to abandon locks, which is hardly a fresh insight.

An event queue just makes the queue the one and only synchronization point.
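
Something like this, if it helps make it concrete (a toy Haskell sketch, not a real toolkit): one thread owns the GUI state and drains the queue, and everyone else only enqueues.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.STM (atomically)
    import Control.Concurrent.STM.TQueue (TQueue, newTQueueIO, readTQueue, writeTQueue)
    import Control.Monad (forever)

    main :: IO ()
    main = do
      events <- newTQueueIO :: IO (TQueue String)

      -- Any thread may post an event; the queue is the only shared point.
      _ <- forkIO $ atomically (writeTQueue events "button clicked")
      _ <- forkIO $ atomically (writeTQueue events "window resized")

      -- The "event thread": the only thread that ever touches GUI state.
      _ <- forkIO $ forever $ do
        ev <- atomically (readTQueue events)
        putStrLn ("handling: " ++ ev)   -- imagine repainting widgets here

      threadDelay 100000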


The locking problems that article mentions are a perfect use-case for software transactional memory. Which Haskell and Clojure have. I would wager that someone could make a straightforward multithreaded GUI toolkit in either of those languages.

http://en.wikipedia.org/wiki/Software_transactional_memory
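
A toy Haskell sketch of the shape such an update could take with STM (this is not a real toolkit binding; the "model" and "label" here are just stand-ins):

    import Control.Concurrent.STM

    -- A "model" value and a "view" value kept consistent without any locks.
    -- STM actions compose, so a larger update can reuse this one atomically.
    updateBoth :: TVar Int -> TVar String -> Int -> STM ()
    updateBoth model label n = do
      writeTVar model n
      writeTVar label ("value: " ++ show n)

    main :: IO ()
    main = do
      model <- newTVarIO 0
      label <- newTVarIO "value: 0"
      atomically (updateBoth model label 42)   -- either both change or neither does
      readTVarIO label >>= putStrLn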


I don't think a multithreaded GUI toolkit is a failed dream. But it might take the creation of a GUI toolkit designed from scratch to support it.

Though it's not out of the question that some relatively small and clever bit of code could wrap STM around a GUI toolkit.


> It's hard to imagine a web application where one HTTP request comes in and a dozen threads (or processes, whatever) are spawned.

Really? I think that's a failure of the author's imagination. Nearly every web application that I've worked on since 2003 has had a significant amount of concurrency involved in handling each request.


The project I'm working on right now (Rails) could certainly use the hell out of it. Hell, people were thrilled by how much faster things were after I pushed everything that didn't need to happen at request-time into the background queue.

Just having futures for database access would help, although I'm not sure how that would be possible with Rails' lazy querying.

All that said, the core Ruby guys are probably better off spending their time on performance enhancements than removing enough of the GIL to allow that kind of multithreaded shenanigans.


At minimum, doing your data fetching in parallel / asynchronously is a huge boost. Likewise flushing to the client in concurrent / out-of-order chunks, possibly in parallel with other work.
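
For example, something like this (a Haskell sketch assuming the async package; fetchUser and fetchOrders are placeholders for real database or HTTP calls):

    import Control.Concurrent.Async (concurrently)

    -- Placeholder "fetches"; imagine database or HTTP calls here.
    fetchUser :: Int -> IO String
    fetchUser uid = return ("user " ++ show uid)

    fetchOrders :: Int -> IO [String]
    fetchOrders uid = return ["order 1", "order 2"]

    main :: IO ()
    main = do
      -- Both fetches run at the same time; we wait for whichever finishes last.
      (user, orders) <- concurrently (fetchUser 7) (fetchOrders 7)
      putStrLn (user ++ " has " ++ show (length orders) ++ " orders")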


The exact point of the article. The concurrency stuff is abstracted away from the developer.


The kind of app that needs to scale up to 4-8 processors but never higher is pretty rare. For anything bigger than that you need a full-blown distributed architecture. Unless you're sure you can scale forever in one box, relying on language-level parallelism is asking for trouble.


This is a false dichotomy; taking advantage of language-level parallelism doesn't prevent you from supporting a full-blown distributed architecture.

In fact, if so inclined, you can model your distributed architecture on top of language-level (or OS-level) parallelism; message passing can be considerably cheaper when you can rely on simple shared memory mechanisms.

Parallelism in the language environment allows you to make better use of the hardware you have. In a server environment, there's a fixed power utilization cost for each machine, and a per-CPU core power cost. It's cheaper and more efficient to scale up cores than it is to scale horizontally across machines. In fact, the more efficient you are, the longer you can put off the complexity of large-scale horizontal scaling.

In a mobile environment, the equation is different -- you can't just add more mobile devices to scale horizontally. The device has a fixed number of cores, period. If you don't use them, your application's performance will not be up to par with other applications.


But all the assumptions you make about reliability in a single box go right out the window when you start building a distributed app. The point is that you can't rely on simple shared memory mechanisms anymore. All those pretty parallel algorithms go straight in the bin. This is why Erlang looks nothing like Haskell.

And if you have to endure the pain of writing a distributed app, why would you layer another level of concurrent complexity underneath it?


Local concurrency and shared memory can be used to improve the performance of operations that may also be distributed; assuming proper implementations, a shared-memory mailbox will always be cheaper than the multiple data copies forced by network- or pipe-based messaging, even if the API for both transports is identical.
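
For instance, a local mailbox can be nothing more than an in-process channel behind a send/receive API (a Haskell sketch; a network-backed version exposing the same two functions would pay for serialization and a socket round trip on every message):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

    -- One "mailbox" API; this backend is just a shared-memory channel.
    type Mailbox a = Chan a

    send :: Mailbox a -> a -> IO ()
    send = writeChan

    receive :: Mailbox a -> IO a
    receive = readChan

    main :: IO ()
    main = do
      box <- newChan
      _ <- forkIO (send box "hello from another thread")
      msg <- receive box
      putStrLn msg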


I tend to see using 4-8 cores as having roughly the same use case as compiler optimizations: making code running on one box do so faster, usually by a constant factor. That's a pretty big use case, though. If, for example, a significant part of Matlab's built-in functions could use all 4 of my cores, and got maybe a 2x speedup overall from doing so, that'd be pretty great. (Matlab in particular seems to have already added quite a bit of this.)
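
That kind of constant-factor win is roughly what parMap from Control.Parallel.Strategies gives you (a sketch assuming the parallel package; slowSum is just a stand-in for CPU-bound work, and you'd compile with -threaded and run with +RTS -N to actually use the cores):

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- A deliberately dumb, CPU-bound function standing in for one chunk of
    -- numeric work; parMap evaluates the chunks on however many cores the
    -- runtime is given.
    slowSum :: Int -> Int
    slowSum n = sum [1 .. n]

    main :: IO ()
    main = print (sum (parMap rdeepseq slowSum (replicate 8 20000000)))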


"The kind of app that needs to scale up to 4-8 processors but never higher is pretty rare."

I disagree. At my last job I exclusively made internal web apps that could always live on a single server. They never needed to scale beyond the cores on 1 box, but most benefited from using all cores on that box.


Triple-A 3D games might fall into this category.


Yeah, and the high-end multimedia authoring tools, and database engines. Ironically, all of these need every drop of performance you can squeeze out of a machine, so nice high-level languages like Haskell are off the table and you have to slog through with C++ and low-level threading.


Funnily enough, many DB engines are written in Erlang, a usually slow, high-level language: CouchDB, Riak, Mnesia, Hibari, Amazon SimpleDB, etc. They're not completely off the table.


I thought Moore's law was enabling new processors to include more cores at no extra cost. So there should be processors with more than 8 cores before too long.


If you have a distributed architecture, you are probably going to build some nodes. When programming the software for those nodes, you want to use all the 4-8 cores each node has. Sure, you could use only one core on every machine and spin up 4-8 machines, but that's not effective.


Unless each of your processes needs a lot of memory, the easy way to leverage those cores is to spin up a server process for each core.


No, it's not. How do you coordinate between those processes? It's far easier to use a producer/consumer model on the JVM with threads. I do this all the time and it's really easy, much easier than managing multiple processes and some kind of external queue like Redis.


If you're writing a distributed app you have to write the process coordination logic and the fact that two processes happen to be on the same machine is incidental. This is how all the big webapps scale.


But you make a giant mistake assuming that everyone is writing big web apps.


Assuming no such thing. It's just the easiest example to hand. At Pixar we used similar techniques for rendering.


You said, "the easy way to leverage those cores is to spin up a server process for each core."

It's not THE easy way. It's actually the harder way, but in some cases, like giant public web apps, you have to do it the hard way anyway.

Managing a work queue with java.util.concurrent in process is far easier than managing many processes and coordinating across something like Redis.
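
For what it's worth, here is the same in-process work-queue idea sketched in Haskell rather than java.util.concurrent (a toy example, but note there is no external broker involved):

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.STM (atomically)
    import Control.Concurrent.STM.TQueue (newTQueueIO, readTQueue, writeTQueue)
    import Control.Monad (forM_, forever)

    -- One producer, several consumers, one in-process queue: no external
    -- broker, no cross-process coordination.
    main :: IO ()
    main = do
      jobs <- newTQueueIO

      forM_ [1 .. 4 :: Int] $ \workerId ->
        forkIO $ forever $ do
          job <- atomically (readTQueue jobs)
          putStrLn ("worker " ++ show workerId ++ " handled job " ++ show (job :: Int))

      forM_ [1 .. 20] $ \n -> atomically (writeTQueue jobs n)
      threadDelay 100000   -- crude: give workers time before the program exits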


I think the problem is that terms are getting mixed up. A client-server arrangement spread across multiple machines can run single-threaded and share no memory and still be concurrent. And concurrent code can still run on a single processor.

The ability to stretch an application across more than one machine is obviously essential to the modern world. The internet is by definition a kind of distributed computing system.

Some languages have features that promote a particular kind of parallel programming. But they are not suited to every task, and that's OK!


I think the point is you don't have to switch; even if you're doing some kind of realtime web app you can still piece it together using PHP and Redis (though I'm not sure whether there's a good comet/websocket framework for PHP). But it's sure going to be a hack. Also, with Akka or OTP you get a lot of stuff for free: supervisor hierarchies, lightweight processes (not each one needs an OS thread), pattern matching. So I would agree that pitching these technologies to garden-variety web and mobile apps on the value proposition that they're better at dealing with multi-core is disingenuous. As you said, it has its benefits there. But maybe some of the proponents are just using multi-core to illustrate a general trend, rather than a singular value proposition.


I am not sure this article gets it quite right. Yes, there are a lot of frameworks that help shield the developer from having to deal with multi-threading concerns. However, so long as you have shared state, you still need to worry about the problem.

Let's take the example of a web application, be it in PHP, Java, etc.: What happens if your user has multiple browser tabs open to your application and submits concurrent requests that modify the user's session? With multiple cores, it is much more likely that the concurrency bugs in your application will be exposed. Languages like Clojure help you explicitly reason about shared mutable state.
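
A minimal illustration in Haskell STM (Clojure's refs and atoms enforce the same discipline); the "session" here is just a counter standing in for real session state:

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.STM

    -- A session counter bumped by two "requests" at once. Because each
    -- update runs inside atomically, increments can't be lost the way a
    -- read-modify-write on plain shared memory could be.
    bumpSession :: TVar Int -> IO ()
    bumpSession session = atomically (modifyTVar' session (+ 1))

    main :: IO ()
    main = do
      session <- newTVarIO 0
      _ <- forkIO (bumpSession session)   -- request from tab 1
      _ <- forkIO (bumpSession session)   -- request from tab 2
      threadDelay 100000
      readTVarIO session >>= print        -- always 2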


A lot of the seriously parallel stuff is going to the GPU, because 4 CPU cores is so much less than what you get on a GPU. A lot of big-data problems are IO bound, not CPU bound, and a lot of multicore servers are being partitioned into virtual ones with fewer cores. So we have avoided a lot of parallel coding for a bit. Both AMD and Nvidia are going for a mix of CPU and GPU, and maybe a split of a smallish number of CPU cores plus a wide parallel GPU for parallel code will be a success, with the option of lots of CPU cores for threaded server workloads.


Distributed computing does not involve the kind of shared mutable state that functional programming can protect you from. Distributed map/reduce systems like Hadoop manage shared state complexity despite being written in Java.

Isn't map/reduce kind of drawn from the functional paradigm?


Not just "kind of", more like "absolutely": mapping and reducing come directly from the functional paradigm.
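
For example, word count (the canonical MapReduce demo) is literally a map followed by a per-key reduce; a Haskell sketch using Data.Map:

    import Data.Char (toLower)
    import qualified Data.Map.Strict as M

    -- The "map" phase turns input into (key, value) pairs.
    mapPhase :: String -> [(String, Int)]
    mapPhase text = [ (map toLower w, 1) | w <- words text ]

    -- The "reduce" phase folds the values for each key.
    reducePhase :: [(String, Int)] -> M.Map String Int
    reducePhase = M.fromListWith (+)

    main :: IO ()
    main = print (reducePhase (mapPhase "To be or not to be"))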


Hadoop is about concurrent disk IO, not concurrent use of multiple cores.


Same difference. In a map/reduce framework, you don't need the workers to have multiple cores; they could all be single-core machines. So in retrospect, it is addressing the same issue.


It's not the same thing. CPUs are only getting faster in the sense that they are getting more cores. How is a programmer to take advantage of that? That is the question. Hadoop is not about this problem; its goal is to overcome the bottleneck of disk IO for large data sets.


Map/Reduce is about taking an operation that would be otherwise sequential and breaking it up into smaller, parallelized tasks. Then worker machines can concurrently process the operation. It's not about disk IO. I recommend you read the original paper: http://web.mit.edu/6.033/www/papers/mapreduce-osdi04.pdf


We were talking about Hadoop specifically. This is from Hadoop: The Definitive Guide:

"The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds—the rate at which data can be read from drives— have not kept up. One typical drive from 1990 could store 1370 MB of data and had a transfer speed of 4.4 MB/s,§ so you could read all the data from a full drive in around five minutes. Almost 20 years later one terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.

This is a long time to read all data on a single drive—and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes. Only using one hundredth of a disk may seem wasteful. But we can store one hundred datasets, each of which is one terabyte, and provide shared access to them. We can imagine that the users of such a system would be happy to share access in return for shorter analysis times, and, statistically, that their analysis jobs would be likely to be spread over time, so they wouldn’t interfere with each other too much."



