Go concurrency isn't parallelism: Real world lessons with Monte Carlo sims (soroushjp.com)
57 points by soroushjp on Feb 7, 2015 | 52 comments



TLDR: Adjust GOMAXPROCS if you want a speedup from multiple goroutines.

http://golang.org/pkg/runtime/

"GOMAXPROCS sets the maximum number of CPUs that can be executing simultaneously and returns the previous setting. If n < 1, it does not change the current setting. The number of logical CPUs on the local machine can be queried with NumCPU. This call will go away when the scheduler improves."

It will be nice when this requirement is eliminated.


Actually the most interesting part of the article is not about using GOMAXPROCS, but exactly how setting GOMAXPROCS doesn't automagically turn your program into a parallel one: only a real analysis (helped by the profiler) tells you whether you've achieved true parallelism. The author's program wasn't truly parallel until he found the bottleneck with the global mutex lock.

Said differently: concurrency is easy ("just" go func() all the things), parallelism is hard (GOMAXPROCS is not enough, you'll have to go deeper)
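
To make that concrete, here's a minimal sketch of the shape of the fix (illustrative code, not the author's exact program): math/rand's top-level functions share one mutex-protected source, so each worker gets its own generator, and results come back over a channel.

  package main

  import (
      "fmt"
      "math/rand"
      "runtime"
      "time"
  )

  // countInside reports how many of n random points in the unit square
  // fall inside the quarter circle, using a private PRNG so workers
  // never contend on math/rand's global lock.
  func countInside(n int, seed int64, out chan<- int) {
      r := rand.New(rand.NewSource(seed))
      inside := 0
      for i := 0; i < n; i++ {
          x, y := r.Float64(), r.Float64()
          if x*x+y*y <= 1 {
              inside++
          }
      }
      out <- inside
  }

  func main() {
      runtime.GOMAXPROCS(runtime.NumCPU()) // concurrency alone isn't parallelism
      workers := runtime.NumCPU()
      per := 1000000 / workers
      out := make(chan int, workers)
      for i := 0; i < workers; i++ {
          go countInside(per, time.Now().UnixNano()+int64(i), out)
      }
      total := 0
      for i := 0; i < workers; i++ {
          total += <-out
      }
      fmt.Println(4 * float64(total) / float64(per*workers)) // ~pi
  }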


Yup, here's what I normally use for these situations (posted it in the Disqus comments at the end of the article, too):

  // Initialize to use all available CPU cores
  func init() {
      runtime.GOMAXPROCS(runtime.NumCPU())
  }


Maybe GOMAXPROCS exists so that the number of hardware threads that might be spawned by the Go application can be controlled in a more system-agnostic way. Without such an env variable, on Windows (for example) you can control this with CPU affinity (and possibly other, better ways), but I'm not sure about Linux/OSX/etc. So this kind of deals with it upfront.
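
For what it's worth, the runtime also reads GOMAXPROCS from the environment, so it can be controlled per-run without touching the code (the binary name here is just an example):

  $ GOMAXPROCS=4 ./montecarlo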


I think that the single-threaded default is going to bite them hard in the long run. I've already discovered libraries that were never tested with GOMAXPROCS > 1 and are not thread safe. They should default it to at least 2 to make sure these bugs are shaken out.


The go race detector (go test -race) is actually really good at finding these kinds of issues, regardless of GOMAXPROCS. I've gotten a lot more value from running my tests with the race detector in a single process than running the tests with multiple processes and hoping to encounter thread-safety issues in a debuggable way.
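
For anyone who hasn't tried it: a tiny contrived example it flags even with GOMAXPROCS=1, since the detector tracks happens-before relationships rather than waiting for an unlucky interleaving. Run it with go run -race (file name is arbitrary) and you get a WARNING: DATA RACE report like the one downthread.

  package main

  var counter int

  func main() {
      done := make(chan bool)
      go func() {
          counter++ // write from the new goroutine
          done <- true
      }()
      counter++ // unsynchronized write from main: a data race
      <-done
  }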


Seems like you should probably never run the tests without -race. Is there a downside to that being the default?


The only real downside is that it slows things down quite a bit (30s vs 12s for a full test run in my current project; I've seen both better and worse ratios in other projects). I've also seen a project that made heavy use of generated code and a race-enabled build would choke on the very large source files. But overall I agree that it seems worthwhile to make -race the default.


Actually that is exactly the case in a library of mine[0]. It's not a bug in my code directly, but due to non-POSIX compliance of Linux that triggers only with multiple threads (setuid sets the uid only for the calling thread and not for the other threads of the same process, unlike what the manual page and POSIX state). It's a corner case, but I also explicitly raise GOMAXPROCS in the test case to trigger it[1].

[0] https://github.com/sarnowski/mitigation

[1] https://github.com/sarnowski/mitigation/blob/master/mitigati...


(context for others, where you reported this)

https://code.google.com/p/go/issues/detail?id=1435

It is not fair to claim that this is "non-POSIX compliance of Linux": the underlying system call interface of an operating system is not something subject to a standard. The mapping from the POSIX-compliant C library to the underlying system calls is allowed to be quite complex, and in fact most of the high-level POSIX functions map to system calls designed for much more general use cases, or take arguments with different kinds of struct packing.

You really just should not have been coding directly against the system calls of the operating system: that isn't portable; and later in the discussion, this was specifically addressed ("The syscall package is not for general use. It has no documentation. That's not going to change. When we have a working Setuid etc they will be made available as part of package os."). What sucks is that they seem to have never gotten around to implementing os.Setuid.


Yes! They know it's bad (the documentation says something like "we'll do something smarter later"), but even just a default of 2 would be much better IMO.


You should assume Go's creators thought of this. There are performance implications, as it adds overhead if it's not necessary. Most people when writing libraries will decide if it's necessary or useful enough to use CPU multithreading. Single core is common in cloud servers. This is a really bad idea that will never happen.


Isn't this a misleading title? The article is essentially the author forgetting to set GOMAXPROCS, not really a lack of parallelism in Go.


I didn't take it as the author forgetting. I think this was more of an experiment in understanding. Clearly the author was aware of previous parallelism vs. concurrency discussions, and this was just an applied test to witness the difference first hand.


tshadwell, the article was meant to help people who don't even know about GOMAXPROCS. People are equating concurrency and parallelism, and the title is a reference to Rob Pike's talk on this exact topic.


Same title as Rob Pike's presentation.


Another error the author made is adding to a sync.WaitGroup in a different goroutine than the one that waits. This is another rookie mistake that go test -race would probably catch.
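
For reference, a minimal sketch of both forms (work() is a stand-in for the real body):

  package main

  import "sync"

  func work() {} // stand-in

  func main() {
      var wg sync.WaitGroup

      // Racy: Add runs inside the goroutine, so the Wait below
      // may return before Add has ever executed.
      go func() {
          wg.Add(1)
          defer wg.Done()
          work()
      }()
      wg.Wait()

      // Correct: Add before spawning, Done inside.
      wg.Add(1)
      go func() {
          defer wg.Done()
          work()
      }()
      wg.Wait()
  }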


It does indeed catch it:

  $ go test -race -bench=.
  
  ...
  
  WARNING: DATA RACE
  Write by goroutine 4:
    sync.raceWrite()
        /usr/lib/go/src/pkg/sync/race.go:41 +0x35
    sync.(*WaitGroup).Wait()
        /usr/lib/go/src/pkg/sync/waitgroup.go:120 +0x16d
    _/home/twirrim/monte.GetPiMulti()
        /home/twirrim/monte/monte.go:56 +0x23a
    _/home/twirrim/monte.BenchmarkGetPiMulti()
        /home/twirrim/monte/monte_test.go:17 +0x62
    testing.(*B).runN()
        /usr/lib/go/src/pkg/testing/benchmark.go:119 +0xc0
    testing.(*B).launch()
        /usr/lib/go/src/pkg/testing/benchmark.go:207 +0x1ba
  
  ...
And so on.


Indeed. Additionally, waitgroups aren't even needed; the channel usage in the code already provides enough synchronization.
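
Something like this shape (a sketch, not the author's exact code): receiving one value per worker is itself the synchronization.

  package main

  import "fmt"

  func main() {
      const workers = 4
      results := make(chan int, workers)
      for i := 0; i < workers; i++ {
          n := i
          go func() { results <- n * n }() // stand-in for real work
      }
      sum := 0
      for i := 0; i < workers; i++ {
          sum += <-results // blocks until every worker has reported
      }
      fmt.Println(sum)
  }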


Fixed, thank you :)


OT: The way we use the terms parallel and concurrent in computer science seems completely backward to me. The dictionary says "concurrent" means at the same time, and parallel lines need not be drawn at the same moment...

Yet in CS we talk of things being concurrent even if they're executed as cooperative threads on a single core and parallel only applies when executing concurrently (at the same time).


I agree that the terminology is a bit funny, and I have to reset my intuitions every so often, because it stops making sense to me. I have found these two posts by Bob Harper helpful in thinking about the different ideas here:

https://existentialtype.wordpress.com/2014/04/09/parallelism...

https://existentialtype.wordpress.com/2011/03/17/parallelism...


Perhaps concurrent could be used in place of parallel, but parallel could not be used in place of concurrent.

Parallel lines are non-intersecting lines, i.e. lines traveling in identical directions. This is a nice allusion to the way parallelism works by running identical processes that do not interact. The fact that these processes can run simultaneously is a by-product of their parallel structure.

But yeah, the term concurrent is confusing because it can be applied to things that never actually overlap in time. But I can't think of a better term off the top of my head.


>Parallel lines are non-intersecting lines, i.e. lines traveling in identical directions. This is a nice allusion to the way parallelism works by running identical processes that do not interact.

Maybe, but there's nothing about parallelism that says the parallel processes have to be (a) identical or (b) not interact, if I am not mistaken.

Two threads running in parallel (and at the same time) might do totally different things (not identical) and might also interact with each other (e.g. consumer and producer threads).


Good point. In those cases, the analogy breaks down.

Merriam-Webster offers several definitions for "parallel", though the only definition with a temporal connotation is the definition related to computing:

"relating to or being a connection in a computer system in which the bits of a byte are transmitted over separate channels at the same time"

Meanwhile, "concurrent" references "parallel" as a synonym. Perhaps the terms are rather arbitrary after all, and we just need to get comfortable with their less-than-perfect assignments.


Unrelated question: isn't it a bad idea to update the seed for every sample?


It is in multiple ways.

First, you are wasting cycles, and with as little work between reseedings as in the code shown, it probably matters quite a bit.

Second, some random number generators need some warm-up time, producing lower-quality random numbers at the beginning.

Third, if you are reseeding faster than your seed changes, you will repeatedly consume the same sequence of random numbers. I am not sure what the resolution of Now() is, but unless it is on the order of nanoseconds this will heavily affect the shown implementation.

If the resolution is one millisecond and it took 15 seconds to execute on a single thread, then the generated random values changed only every 74 iterations.
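
In code, the fix is just hoisting the seed out of the loop; a minimal single-threaded sketch (not the author's exact code):

  package main

  import (
      "fmt"
      "math/rand"
      "time"
  )

  func main() {
      const samples = 1000000
      rand.Seed(time.Now().UnixNano()) // seed once, outside the loop
      inside := 0
      for i := 0; i < samples; i++ {
          // No rand.Seed here: reseeding per sample wastes cycles
          // and, if the clock hasn't ticked, replays the same values.
          x, y := rand.Float64(), rand.Float64()
          if x*x+y*y <= 1 {
              inside++
          }
      }
      fmt.Println(4 * float64(inside) / float64(samples))
  }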


It is indeed nanosecond precision. http://golang.org/pkg/time/#Time


The value is represented with nanosecond resolution, but that does not imply that Now() will return a different value every nanosecond.


You are of course technically correct. It seems the precision returned by Now may be platform-dependent. https://github.com/golang/go/blob/master/src/time/time.go#L7...
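
It's easy enough to probe empirically; a quick sketch that spins until Now() visibly advances and prints the observed step:

  package main

  import (
      "fmt"
      "time"
  )

  func main() {
      t0 := time.Now()
      t1 := time.Now()
      for !t1.After(t0) { // busy-wait until the clock ticks
          t1 = time.Now()
      }
      fmt.Println("observed clock step:", t1.Sub(t0))
  }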


Additionally: pushing the seed operation out of the loop decreased the run time for 1,000,000 samples from ~10 seconds to ~50 ms on my machine.


That is a nice illustration of premature optimization. Instead of thinking "let's parallelize", one should measure and find out what causes the performance problems. Should one choose to go down the parallel path, it's a good idea to test whether hyper-threading degrades performance. In my experience it can be expensive to use more than the physical cores.

edit: Another issue is that I really don't like it when people present speedup in %. How should a 540% speedup be interpreted? It makes more sense as a ratio, so we find sequential/parallel = 10067483333/1583584841 ~= 6.36. So the parallel version achieves a speedup factor of 6.36.


(6.36 - 1) * 100 = 536 ~= 540.

So what distinction are you trying to draw?


Yes, I can do basic arithmetic. My point is that the ratio is easy to interpret: the program ran 6.36 times faster. The only reason I can think of for giving a percentage is to make it sound more impressive.


It is wasteful and nullifies the benefits offered by a PRNG.

For Monte Carlo simulations, it's in fact very bad practice to continually reseed a computation, as it makes them unrepeatable.


Updating the seed means extra pointless operations, but if your PRNG is 'good' it shouldn't cause major numerical problems.

Besides, the author is only sampling the PRNG 1 million times; that's hardly enough to stumble upon any periodicity in a modern PRNG. I have absolutely no idea if the PRNG provided by Go is any good or what method it is based on.


It uses the Plan 9 PRNG algorithm written by Ken (http://golang.org/src/math/rand/rng.go).


Fixed, thank you :)


No one has ever claimed it is; in fact, they specifically tell you it isn't, multiple times:

A Jan 2013 Go lang blog post

http://blog.golang.org/concurrency-is-not-parallelism

reminding people of the Jan 2012 talk Rob Pike did on the subject

https://talks.golang.org/2012/waza.slide

Feb 2011 : Rob Pike on Parallelism and Concurrency in Programming Languages

http://www.infoq.com/interviews/pike-concurrency

I'll skip all the intermediate steps from there back to

Tony Hoare

http://www.usingcsp.com/


Yep, and the title is a reference to Rob's talk on this.

I refer to it in the post:

"Rob Pike, one of the creators of the Go language, dedicates an entire talk, "Concurrency is not Parallelism" to this ...."

Wrote the post as a real world example to demonstrate Rob's point.


I have just started a Go project for a core aspect of my business, so this information was well timed, for me at least, and gave me a quick overview of concurrency/parallelism in Go.

I will be developing both concurrent and parallel threads in my app, so this was very enlightening.

Thanks to the author for his clear writing style and efforts to educate.


It's a decent intro, but I think instead of just jumping into parallel code, it's better to read the docs beforehand. There are plenty of references to GOMAXPROCS and the thread safety of math/rand (and most of the stdlib).


Completely agree, nothing beats the docs.


This is a somewhat trivial suggestion, but it would be much more clear to your readers what the speedup was if you aligned the values in the benchmark results.


http://joearms.github.io/2013/04/05/concurrent-and-parallel-...

Parallel, at least in English, means to have nothing in common with others. On most OSes it is a process affined to a dedicated CPU which is configured to access a dedicated physical I/O device only. Everything else is concurrent.


Parallelism is like hunting elephants. The languages we use like Java, Ruby, Python and C++, for example, give you weapons for the hunt on the level of a very strong toothpick at best. Go has apparently upgraded the situation to a 3" pocket knife. Clojure to the level of a proper spear. But we need languages that allow us to completely avoid hunting elephants in the first place.


Have sent a PR handling all the wrinkles: https://github.com/soroushjp/go-parallelism-monte-carlo-demo...

* There is a race condition where wg.Wait() might execute before the wg.Add(1) runs

* The wait group was not necessary, as we are already waiting for all the results from the channel


Thanks again, very helpful and updated all relevant code in the post also.


Thanks for all the feedback, everyone. I've incorporated everything to make the code as good as possible; I want readers to learn as much as possible. Just to be clear, this article was my way of highlighting Rob Pike's point that concurrency isn't parallelism, not to make it seem as if Go didn't support or was falsely claiming true parallelism.


It looks like you didn't allow for using more than a single core by setting GOMAXPROCS.

Additionally, it looks like you're re-seeding your random engine inside your sample loop, which is very slow. You only need to seed the engine outside the loop at the beginning.


Fixed the issue, thanks wpeterson


I wouldn't call this "real world". In the real world you are better off distributing these kinds of tasks across multiple machines. Multicore parallelism per se is overrated and overhyped.



