"GOMAXPROCS sets the maximum number of CPUs that can be executing simultaneously and returns the previous setting. If n < 1, it does not change the current setting. The number of logical CPUs on the local machine can be queried with NumCPU. This call will go away when the scheduler improves."
It will be nice when this requirement is eliminated.
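For anyone who hasn't touched this yet, here is a minimal sketch of what that call looks like in practice (where exactly you put it is up to you; this is only my illustration, not code from the article):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// With the current scheduler the default is 1, so without this call
	// goroutines are multiplexed onto a single OS thread and never run
	// in parallel, no matter how many cores the machine has.
	prev := runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Printf("GOMAXPROCS raised from %d to %d\n", prev, runtime.NumCPU())
}
```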
Actually the most interesting part of the article is not about using GOMAXPROCS, but exactly how using GOMAXPROCS doesn't automagically turn your program into a parallel one: only a real analysis (helped by the profiler) tells you whether you've achieved true parallelism. The author's program wasn't truly parallel until he saw the bottleneck with the global mutex lock.
Said differently: concurrency is easy ("just" go func() all the things), parallelism is hard (GOMAXPROCS is not enough, you'll have to go deeper)
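To make the "go deeper" part concrete: the usual fix for that kind of global-mutex bottleneck is to give each goroutine its own rand.Source instead of sharing math/rand's lock-protected global one. A rough sketch (my own names and structure, not the article's code):

```go
package main

import (
	"math/rand"
	"sync"
	"time"
)

// sumSamples draws n values from a private generator, so goroutines
// never contend on the mutex guarding math/rand's global source.
func sumSamples(n int, seed int64) float64 {
	r := rand.New(rand.NewSource(seed))
	var sum float64
	for i := 0; i < n; i++ {
		sum += r.Float64()
	}
	return sum
}

func main() {
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1) // Add before launching, in the goroutine that will Wait
		go func(seed int64) {
			defer wg.Done()
			_ = sumSamples(1000000, seed)
		}(time.Now().UnixNano() + int64(w))
	}
	wg.Wait()
}
```

Whether a change like this actually helps still has to be confirmed with the profiler, which is exactly the article's point.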
Maybe GOMAXPROCS exists so that the number of hardware threads that might be spawned by the Go application can be controlled in a more system-agnostic way. Without such an environment variable, on Windows (for example) you can control this with CPU affinity (and possibly other, better ways), but I'm not sure about Linux/OSX/etc. So this kind of deals with it upfront.
I think that the default of single-threaded is going to bite them hard in the long run. I've already discovered libraries that were never tested with GOMAXPROCS > 1 and are not thread safe. They should default it to at least 2 to make sure these bugs are shaken out.
The go race detector (go test -race) is actually really good at finding these kinds of issues, regardless of GOMAXPROCS. I've gotten a lot more value from running my tests with the race detector in a single process than running the tests with multiple processes and hoping to encounter thread-safety issues in a debuggable way.
The only real downside is that it slows things down quite a bit (30s vs 12s for a full test run in my current project; I've seen both better and worse ratios in other projects). I've also seen a project that made heavy use of generated code and a race-enabled build would choke on the very large source files. But overall I agree that it seems worthwhile to make -race the default.
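For anyone who hasn't tried it, this is roughly the smallest kind of bug the detector catches; the counter test is my own made-up example, not code from either project. Running `go test -race` against it prints a DATA RACE report naming both goroutines, even though the un-instrumented test usually passes:

```go
package counter

import (
	"sync"
	"testing"
)

func TestUnsynchronizedCounter(t *testing.T) {
	var wg sync.WaitGroup
	n := 0 // shared by two goroutines with no synchronization
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				n++ // racy read-modify-write
			}
		}()
	}
	wg.Wait()
	t.Logf("n = %d", n) // value is unpredictable; -race flags the writes
}
```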
Actually that is exactly the case in a library of mine[0]. It's not a bug in my code directly, but due to non-POSIX compliance of Linux that triggers only with multiple threads (setuid sets the uid only for the calling thread and not for the others in the same process, unlike what the manual page and POSIX state). It's a corner case, but I also explicitly raise GOMAXPROCS in the test case to trigger it[1].
I don't think it is fair to claim that this is "non-POSIX compliance of Linux": the underlying system call interface of an operating system is not something subject to the standard. The mapping from the POSIX-compliant C library to the underlying system calls is allowed to be quite complex, and in fact most of the high-level POSIX functions map to system calls designed for much more general use cases, or take arguments with different kinds of struct packing.
You really just should not have been coding directly against the system calls of the operating system: that isn't portable; and later in the discussion, this was specifically addressed ("The syscall package is not for general use. It has no documentation. That's not going to change. When we have a working Setuid etc they will be made available as part of package os."). What sucks is that they seem to have never gotten around to implementing os.Setuid.
Yes! They know it's bad (the documentation says something like "we'll do something smarter later"), but even just a default of 2 would be much better IMO.
You should assume Go's creators thought of this. There are performance implications: it adds overhead where it's not necessary. Most people writing libraries will decide whether CPU multithreading is necessary or useful enough to warrant it. Single-core machines are common among cloud servers.
This is a really bad idea that will never happen.
I didn't take it as a mistake born of forgetting. I think this was more of an experiment in understanding. Clearly the author was aware of previous Parallelism vs. Concurrency discussions, and this was just an applied test to witness the difference firsthand.
tshadwell, the article was meant to help people who don't even know about GOMAXPROCS. People are equating concurrency and parallelism, and the title is a reference to Rob Pike's talk on this exact topic.
Another error the author made is adding to a sync.WaitGroup in a different goroutine than the one that waits. This is another rookie mistake that go test -race would probably catch.
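For readers who haven't run into this: the difference between the broken and the correct pattern is only a couple of lines (a simplified sketch, not the article's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	// Broken: the Add happens inside the new goroutine, so wg.Wait()
	// below can run before Add does and main may not wait at all.
	//
	//	go func() {
	//		wg.Add(1)
	//		defer wg.Done()
	//		fmt.Println("working")
	//	}()

	// Correct: Add in the goroutine that calls Wait, before launching.
	wg.Add(1)
	go func() {
		defer wg.Done()
		fmt.Println("working")
	}()

	wg.Wait()
}
```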
OT: The way we use the terms parallel and concurrent in computer science seems completely backward to me. The dictionary says "concurrent" means at the same time, and parallel lines need not be drawn at the same moment...
Yet in CS we talk of things being concurrent even if they're executed as cooperative threads on a single core and parallel only applies when executing concurrently (at the same time).
I agree that the terminology is a bit funny, and I have to reset my intuitions every so often, because it stops making sense to me. I have found these two posts by Bob Harper helpful in thinking about the different ideas here:
Perhaps concurrent could be used in place of parallel, but parallel could not be used in place of concurrent.
Parallel lines are non-intersecting lines, i.e. lines traveling in identical directions. This is a nice allusion to the way parallelism works by running identical processes that do not interact. The fact that these processes can run simultaneously is a by-product of their parallel structure.
But yeah, the term concurrent is confusing because it can be applied to things that never actually overlap in time. But I can't think of a better term off the top of my head.
>Parallel lines are non-intersecting lines, i.e. lines traveling in identical directions. This is a nice allusion to the way parallelism works by running identical processes that do not interact.
Maybe, but there's nothing about parallelism that says the parallel processes have to be (a) identical or (b) not interact, if I am not mistaken.
Two threads running in parallel (and at the same time) might do totally different things (not identical) and might also interact with each other (e.g. consumer and producer threads).
Good point. In those cases, the analogy breaks down.
Merriam-Webster offers several definitions for "parallel", though the only definition with a temporal connotation is the definition related to computing:
"relating to or being a connection in a computer system in which the bits of a byte are transmitted over separate channels at the same time"
Meanwhile, "concurrent" references "parallel" as a synonym. Perhaps the terms are rather arbitrary after all, and we just need to get comfortable with their less-than-perfect assignments.
First, you are wasting cycles, and with so little work done between reseedings, as in the code shown, it probably matters quite a bit.
Second, some random number generators need some warm-up time, producing lower-quality random numbers at the beginning.
Third, if you are reseeding faster than your seed changes, you will repeatedly consume the same sequence of random numbers. I am not sure what the resolution of Now() is, but unless it is on the order of nanoseconds this will heavily affect the shown implementation.
If the resolution is one millisecond and it took 15 seconds to execute on a single thread, then the generated random values changed only every 74 iterations.
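The third point is easy to see for yourself: with a coarse seed source, reseeding inside the loop replays the same sequence again and again. A small illustration (the second-resolution Unix() seed is deliberately extreme to make the repetition obvious; this is not the article's exact code):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	// Reseeding inside the loop with a coarse clock: every iteration within
	// the same second reuses the same seed and draws the same "random" value.
	for i := 0; i < 5; i++ {
		r := rand.New(rand.NewSource(time.Now().Unix())) // second resolution
		fmt.Println("reseeded each iteration:", r.Int63())
	}

	// Seeding once outside the loop gives a fresh value every iteration.
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < 5; i++ {
		fmt.Println("seeded once:", r.Int63())
	}
}
```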
That is a nice illustration of premature optimization. Instead of thinking "let's parallelize", one should measure and find out what actually causes the performance problems. Should one choose to go down the parallel path, it's a good idea to test whether hyper-threading degrades performance. In my experience it can be expensive to use more than the physical cores.
edit:
Another issue is that I really don't like it when people present speedup in %. How should a 540% speedup be interpreted? It makes more sense as a ratio, so we find sequential/parallel = 10067483333/1583584841 ≈ 6.36. So the parallel version achieves a speedup factor of 6.36.
Yes, I can do basic arithmetic. My point is that the ratio is easy to interpret: the program ran 6.36 times faster. The only reason I can think of for giving a percentage is to make it sound more impressive.
Updating the seed is pointless extra work, but if your PRNG is 'good' it shouldn't cause major numerical problems.
Besides, the author is only sampling the PRNG 1 million times; this is hardly enough to stumble upon any periodicity in a modern PRNG. I have absolutely no idea if the PRNG provided by Go is any good or what method it is based on.
I have just started a Go project for a core aspect of my business, so this information was well timed, for me at least, and gave me a quick overview of concurrency/parallelism in Go.
I will be developing both concurrent and parallel threads in my app, so this was very enlightening.
Thanks to the author for his clear writing style and efforts to educate.
It's a decent intro, but I think that instead of just jumping into parallel code, it's better to read the docs beforehand. There are plenty of references to GOMAXPROCS and the thread safety of math/rand (and most of the stdlib).
This is a somewhat trivial suggestion, but it would be much more clear to your readers what the speedup was if you aligned the values in the benchmark results.
Parallel, at least in English, means to have nothing in common with others. On most OSes it is a process affined to a dedicated CPU which is configured to access a dedicated physical I/O device only. Everything else is concurrent.
Parallelism is like hunting elephants. The languages we use like Java, Ruby, Python and C++, for example, give you weapons for the hunt on the level of a very strong toothpick at best. Go has apparently upgraded the situation to a 3" pocket knife. Clojure to the level of a proper spear. But we need languages that allow us to completely avoid hunting elephants in the first place.
* There is a race condition where wg.Wait() might execute before the wg.Add(1) runs
* The wait group was not necessary as we are already waiting for all the results from the channel
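Concretely, the second point usually ends up looking like this: since the collector already knows how many results it expects, counting receives on the channel is itself the synchronization and the WaitGroup can go away entirely (a sketch with my own names, not the article's code):

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const workers = 4
	const samplesPerWorker = 1000

	results := make(chan float64, workers)
	for w := 0; w < workers; w++ {
		go func(seed int64) {
			r := rand.New(rand.NewSource(seed))
			var sum float64
			for i := 0; i < samplesPerWorker; i++ {
				sum += r.Float64()
			}
			results <- sum
		}(int64(w) + 1)
	}

	// Receiving exactly `workers` results is all the waiting we need.
	var total float64
	for w := 0; w < workers; w++ {
		total += <-results
	}
	fmt.Println("total:", total)
}
```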
Thanks for all the feedback, everyone. I've incorporated everything to make the code as good as possible; I want readers to learn as much as possible. Just to be clear, this article was my way of highlighting Rob Pike's point that concurrency isn't parallelism, not to make it seem as if Go didn't support true parallelism or was falsely claiming it.
It looks like you didn't allow the program to use more than a single CPU by setting GOMAXPROCS.
Additionally, it looks like you're re-seeding your random engine inside your sample loop, which is very slow. You only need to seed the engine outside the loop at the beginning.
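Put together, the setup amounts to a couple of lines up front (a sketch under the assumption that a single shared generator is acceptable; for the lock-contention issue discussed elsewhere in the thread, a per-goroutine source is better):

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU()) // allow goroutines to run on all cores
	rand.Seed(time.Now().UnixNano())     // seed the shared source once, up front

	for i := 0; i < 5; i++ {
		fmt.Println(rand.Float64()) // no re-seeding inside the sampling loop
	}
}
```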
I wouldn't call this "real world". In the real world you are better off distributing these kinds of tasks across multiple machines. Multicore parallelism per se is overrated and overhyped.