The core idea is incredibly exciting (to us, anyway). Rather than baking in a specific multicore scheduler, we're allowing pluggable schedulers written in OCaml. They use algebraic effects to allow an independent scheduler to compose concurrency among OCaml threads. This will ensure that the OCaml runtime remains lean, and even allow applications to define their own strategies for concurrent scheduling.
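To make that concrete, here is a minimal sketch of a pluggable round-robin scheduler written as an effect handler. It uses the Effect API that eventually shipped in OCaml 5 (the 2015 multicore branch exposed a similar but older interface), and Yield/Fork are illustrative effect names, not the actual multicore API:

    (* A user-level round-robin scheduler as a deep effect handler.
       Yield suspends the current fiber; Fork enqueues a new one. *)
    open Effect
    open Effect.Deep

    type _ Effect.t += Yield : unit Effect.t
                     | Fork : (unit -> unit) -> unit Effect.t

    let run main =
      let q = Queue.create () in                  (* the run queue *)
      let enqueue k = Queue.push k q in
      let dequeue () = if not (Queue.is_empty q) then (Queue.pop q) () in
      let rec spawn f =
        match_with f ()
          { retc = (fun () -> dequeue ());
            exnc = raise;
            effc = (fun (type a) (eff : a Effect.t) ->
              match eff with
              | Yield -> Some (fun (k : (a, unit) continuation) ->
                  enqueue (continue k); dequeue ())
              | Fork g -> Some (fun (k : (a, unit) continuation) ->
                  enqueue (fun () -> spawn g); continue k ())
              | _ -> None) }
      in
      spawn main

    (* Prints B1, A1, B2, A2: the two fibers interleave cooperatively. *)
    let () =
      run (fun () ->
        perform (Fork (fun () ->
          print_endline "A1"; perform Yield; print_endline "A2"));
        print_endline "B1"; perform Yield; print_endline "B2")

The point is that the scheduling policy lives entirely in user code: swapping the FIFO queue for a priority queue or a work-stealing deque changes the strategy without touching the runtime.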
For those asking "How the hell does OCaml not support multicore in 2015????", this is my reply, crossposted from /r/ocaml:

You can make OS-level threads, but they can't both be running at the same time due to the GIL (Global Interpreter Lock). Then why are they even there, you might ask? Because it allows you to do a blocking call on a thread and keep executing other stuff in the main thread. Other languages that have a GIL (and the same restriction) are Javascript (including Node.js), Ruby and Python.
Now, IN PRACTICE, things are a bit different. You're never gonna make your own thread to block on things. You're gonna use Lwt to manage all your concurrency so you can do tons of blocking stuff at the same time and combine the tasks nicely without ending up in a Node.js-style "callback hell".
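For illustration, here is a small sketch of that style using today's Lwt (Lwt.both postdates this thread; Lwt_unix.sleep stands in for a blocking network call):

    open Lwt.Infix

    (* Two "requests" whose waiting overlaps on a single core. *)
    let fetch name secs =
      Lwt_unix.sleep secs >|= fun () -> name ^ ": done"

    let () =
      Lwt_main.run
        (Lwt.both (fetch "a" 1.0) (fetch "b" 1.0) >>= fun (ra, rb) ->
         Lwt_io.printlf "%s, %s (took ~1s total, not 2s)" ra rb)

Both sleeps overlap, so the whole thing takes about one second, even though no OCaml code ever runs in parallel.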
But still, even with tons of concurrency, you don't have parallelism. It's all you need for 98% of your programs, but if you then need to do heavy number-crunching it won't be enough. This is the exact same situation as in Node.js, Python, etc, except that OCaml is massively faster than those languages, so even some CPU-bound work is acceptable.
Currently, there are two options if you wanna do CPU-bound work. The first: use ctypes to call C code easily, run it on the thread pool automatically managed by Lwt_preemptive, and release the lock from within C with caml_release_runtime_system(), so your C code runs truly in parallel; then call caml_acquire_runtime_system() before returning the result back to OCaml, to take the lock back and merge back with the normal code.
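A rough sketch of the OCaml side of that first option (ocaml_crunch is a hypothetical C stub name, not a real binding):

    (* "ocaml_crunch" is a hypothetical C stub that calls
       caml_release_runtime_system() before its hot loop and
       caml_acquire_runtime_system() before returning. *)
    external crunch : Bytes.t -> int = "ocaml_crunch"

    open Lwt.Infix

    (* Run the stub on Lwt_preemptive's managed thread pool and
       handle its result back on the main Lwt thread. *)
    let crunch_async data =
      Lwt_preemptive.detach crunch data >>= fun n ->
      Lwt_io.printlf "crunched: %d" n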
The second option is to do an oldschool fork() and communicate with message-passing. Or have a master that manages workers and communicates with ZMQ, HTTP, TCP, IPC, etc. Or use a library that does it all for you like parmap, Async Parallel, etc etc.
What this "multicore support" means is that you'll be able to have threads in the same process that run in parallel because the GIL is going away. In practice it'll probably be implemented directly into Lwt so you'll be able to do something with Lwt_preemptive and just tell it to run some function in a separate thread and then use >>= to handle its result. It's gonna be simpler than both options I described above.
Again, more technical information is available in my r/ocaml post
> The second option is to do an oldschool fork() and communicate with message-passing. Or have a master that manages workers and communicates with ZMQ, HTTP, TCP, IPC, etc. Or use a library that does it all for you like parmap, Async Parallel, etc etc.
I work on the Hack language typechecker at Facebook. The typechecker is written in OCaml, and since it needs to operate on the scale of Facebook's codebase (tens of millions of lines of code), it's a pretty performance-sensitive program. We needed real parallelism, but doing it with fork() and IPC was too costly for us, both in terms of storage (if you aren't careful you end up duplicating a bunch of data) and CPU (serializing/deserializing OCaml data structures to send over IPC is CPU-intensive).
We ended up doing something somewhat more interesting. Before we fork(), we mmap a MAP_ANON|MAP_SHARED region of memory -- that region will be backed by the same physical frames in each child after we fork, so writes to it in one child process will be visible in the others. We use a little bit of C code to safely manage the shared-memory concurrency here.
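A rough OCaml analogue of the trick, for illustration only: it maps a temp file MAP_SHARED via Unix.map_file (which arrived in OCaml 4.06, and which can't do MAP_ANON without a C stub, hence the file), forks, and checks that a child's write is visible in the parent:

    (* Map a page of a temp file MAP_SHARED, fork, and observe the
       child's write from the parent. *)
    let () =
      let fd = Unix.openfile "/tmp/shm_demo" [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
      Unix.ftruncate fd 4096;
      let page =
        Bigarray.array1_of_genarray
          (Unix.map_file fd Bigarray.char Bigarray.c_layout true [| 4096 |])
      in
      match Unix.fork () with
      | 0 -> page.{0} <- '!'; exit 0            (* child writes the shared page *)
      | pid ->
          ignore (Unix.waitpid [] pid);
          assert (page.{0} = '!');              (* parent sees the write *)
          print_endline "child's write is visible in the parent"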
> We ended up doing something somewhat more interesting. Before we fork(), we mmap a MAP_ANON|MAP_SHARED region of memory -- that region will be backed by the same physical frames in each child after we fork, so writes to it in one child process will be visible in the others. We use a little bit of C code to safely manage the shared-memory concurrency here.
Isn't that similar to how Linux implemented threads for a long time (before NPTL [1])?
I vaguely recall that for a long time people were complaining about the cost of starting threads in Linux, because it basically amounted to fork()+shared memory.
I don't know the history of threads/NPTL on Linux. However, the distinction between "thread" and "process" in the Linux kernel is mostly a human one, not a technical one. Take a look at the clone() syscall -- spawning a thread vs. forking a process amounts to just passing different flags to that call, to tell it whether to copy pages or not, how to assign an ID to the new thread/process, etc. (Not sure if that's how fork() and friends are actually implemented under the hood.)
When implementing fork(2), the return value from clone(2) is the child's PID from the context of the parent process. When implementing pthread_create(3), the return value for the parent is still an integer value which is unique to the thread; strace uses it as if it were a PID when tracing the system calls of individual threads into separate files, which strace can do because it's awesome.
Some more information:
> Linux has a unique implementation of threads. To the Linux kernel, there is no concept of a thread. Linux implements all threads as standard processes. The Linux kernel does not provide any special scheduling semantics or data structures to represent threads. Instead, a thread is merely a process that shares certain resources with other processes. Each thread has a unique task_struct and appears to the kernel as a normal process (which just happens to share resources, such as an address space, with other processes).
>Other languages that have a GIL (and the same restriction) are Javascript (including Node.js)
Not quite true; JS just doesn't support threads at all. It's asynchronous and single-threaded. In node.js's case, an event loop uses a system call like epoll or kqueue to wait for many events at a time, and dispatches those events to the correct callbacks.
You can do parallelism in JS with Web Workers, and they do use native OS threads, but they lack shared memory, and can only communicate using message passing. So from the perspective of the JS code, they behave more like processes than threads. No GIL, in any case.
I think that's not directly language-related but an implementation detail -- although a very important one. Javascript on the JVM (Nashorn) allows full multithreading -- including, then, the need for all the common synchronization stuff.
But of course - Javascript and also the typical Javascript libraries were not made with multithreading in mind.
The thing is that numbers are usually wrapped in other things, like objects, hashtables, arrays, etc, and OCaml is a beast at dealing with that kind of code.
From a purely numbers perspective, its operations on integers have to use the LEA instruction instead of ADD (for example) because of the 1-bit tag (an int n is represented as the machine word 2n+1, so a+b compiles to repr(a)+repr(b)-1, which LEA computes in one instruction), which slows things down a bit, but the speed at dealing with symbolic code as I explained above more than makes up for it.
> To make things worse, non-blocking I/O is done completely differently
> under Unix and under Win32. I'm not even sure Win32 provides enough
> support for async I/O to write a real user-level scheduler.
sigh, VMS got the link between processes, threads, I/O and waitable events (specifically, the link between tying the completion of future I/O to subsequent computation) right from day one. And by virtue of Cutler, therefore, so did NT, and thus, Windows.
UNIX did not. The core concept of separating the work (computation to be done after an event occurs) from the worker[1] (the thread that performs the work) is absent; the manifestation of that is the lack of good, completion-oriented asynchronous I/O primitives. Instead of being able to say to the kernel "here, do this, then let me know when you're done"[2] and moving on to the next piece of work in the queue, you have to do the elaborate non-blocking multiplex dance for socket I/O, palm file I/O off onto a separate set of threads that can block (or do AIO) and generally manage all threading and concurrency primitives yourself.
It took me ten years of UNIX systems programming to suddenly grasp the elegance of the VMS/NT/Windows approach a few years ago. It provides you with everything you need to optimally exploit all your cores for work that is both heavily compute bound and I/O bound.
It has been fascinating to see the difference in performance between Linux and Windows in practice with PyParallel when Windows kernel primitives are exploited properly.
Did you try to run that code under ReactOS too? I would assume (I've not checked) that they follow the NT kernel design -- so should have similar "architectural" performance -- even if I doubt they've had as much time to hand-tune the details.
It'd be interesting if running under the VMS/NT thread/fork model could be seen as a reason to deploy some apps on ReactOS rather than Linux/BSD. Would also be interesting if one could see any difference running a multi-core KVM guest on ReactOS vs a Linux/BSD guest/container/jail. Although I suppose one would need to dedicate a hw nic to see any real results (avoiding the host OS/VM scheduler etc)?
I haven't tried ReactOS; I'm not sure if they have all of the threadpool stuff (Vista+) implemented, and I use that exclusively for PyParallel. It'd be an interesting experiment nevertheless.
I was also curious to see what would happen if I tried to install it on Wine.
> It'd be interesting if running under the VMS/NT thread/fork model could be seen as a reason to deploy some apps on ReactOS rather than Linux/BSD.
I... couldn't imagine trying to use ReactOS instead of Windows for an actual deployment of anything. Why wouldn't you just use Windows? (Serious question.)
I couldn't imagine deploying anything that's closed source/non foss for anything serious. I know that windows is source available (if you have a 100k?+ contract...) -- but really - why would you risk your platform going away?
This isn't academic -- look at Sun OS/Solaris. Granted we have open Solaris etc... but that appears as an accident of timing more than anything -- in retrospect.
Now, for the more relevant part: ReactOS vs Windows: If all you want is the kernel/thread model I could see going with ReactOS (pending actual research, as in: does it actually work :-). If you're deploying SQL Server/IIS .net (pending the so far seemingly serious effort to open .net) -- I don't know why one wouldn't go with Windows, no. In that scenario you'd be beholden (good and bad) to Redmond either way.
But for something like a python fork -- I could see something like ReactOS (or any other alternate kernel) be an interesting thing. You don't need much from the OS -- just classic services: basic filesystem/persistence, perhaps privilege separation (not so important for micro-service vms), scheduling.
Xavier's first sentence states that the two operating systems have a visibly different philosophy, not that one is better than the other. The second sentence should be interpreted in the context of the first: if you try to emulate Unix's primitives with Windows', and especially if you want to do this and write a user-level scheduler that does not occasionally deadlock without reason, you will get stuck in a couple of places.
This doesn't mean that Windows' philosophy does not give you optimal performance in PyParallel. It simply means that OCaml had chosen for its low-level system primitives a Unix model and that it was difficult to make a Windows version of the same primitives so that OCaml programmers could write this kind of program portably between Windows and Unix.
NOTE: without looking up the full post (given the hour in my timezone), I have to say that I don't think the quoted two sentences have anything to do with the discussion. It seems to me that the two sentences assume that a multicore (multiprocessor, at the time the post was written) OCaml runtime is not available, and discuss the options for still providing threads. A user-level scheduler is one option to provide threads to OCaml programs without a concurrent OCaml runtime. Another option is to use Windows' native threads and superior philosophy for blocking primitives to run each OCaml thread as a native thread (although at most one of these will be running at any given time; all the others will be waiting on the heap mutex).
OCaml ended up providing threads under Windows and a Unix-like “Unix” module around 1996-ish, way before the linked discussion. So thanks for the explanation about VMS, but I think it is off-topic, too.
NOTE 2: I have now read the original post. You should, too. It starts with:
> Threads have at least three different purposes:
>
> 1- Parallelism on shared-memory multiprocessors.
> 2- Overlapping I/O and computation (while a thread is blocked on a network
> read, other threads may proceed).
> 3- Supporting the "coroutine" programming style
> (e.g. if a program has a GUI but performs long computations,
> using threads is a nicer way to structure the program than
> trying to wrap the long computation around the GUI event loop).
>
> The goals of OCaml threads are (2) and (3) but not (1) (for reasons
> that I'll get into later)
What makes it relevant to the current discussion is (1), but Xavier is discussing (2) and (3) at the time of the quote you chose to take out of context.
Oh, no, that's what the sigh was for; Windows has the best model, but there's no equivalent on UNIX, so, you end up having to code to the lowest common denominator (the UNIX model) if you want your software to run somewhere else other than Windows (i.e. almost all open source software).
I'm not disputing any of the technical things he's saying; just ranting about the unfortunate nature of two vastly different kernel models, and the fact that no open source stuff properly exploits Windows facilities, despite them being technically superior.
The principal architect of VMS was David Cutler, purportedly the best engineer at Digital at the time (80s), and the best OS designer in the industry.
Digital dropped the ball in the late 80s with regards to management of Cutler and his team, canceling his PRISM project and leaving him and his team disgruntled.
Elsewhere in Seattle, a chap named Bill Gates was flush with billions of cash and knew that the shelf life of DOS was limited; if Microsoft were to succeed, they needed a new, robust, reliable and high-performance OS that they could "bet the company on".
Gates got word that Cutler was disgruntled at Digital, and a mutual party set up a meeting. Cutler was dismissive of Microsoft's technology stack at the time (DOS and some office apps) -- he was a hardcore OS engineer, and DOS was a toy.
Gates persisted, assuring Cutler that he would have the opportunity to build the next generation of OS from the ground up, with essentially unlimited resources at his disposal to do it. Cutler eventually agreed, and the NT kernel project was born.
I actually just read Show Stopper recently. The author is very non-technical and can't really explain the engineering details behind what he's writing about, but if you know something about basic OS design and concepts, that's ok. And the human stories - the stories behind the developers working on the project - are fascinating.
Reading the book and learning the story behind NT's development, it's just amazing that such a good OS came out of that process - they released years after their initial projections and were rushed the whole time. But of course the really good parts of NT - the kernel, the object manager, the pager, async IO, the threading model - were things Cutler and his cohorts had been working on for years, first with VMS, then with PRISM, and then finally in NT. They had YEARS to ruminate about those things before they ever arrived at Microsoft.
The bits of NT that aren't so well-regarded - the registry, NTFS, the graphical shell, csrss.exe and the 'microkernel' design - were completely new and developed in much less time and with less practical experience behind them than they really deserved.
I sent a tweet to the author saying I was really enjoying the book when I was about half way through it and he actually e-mailed me to say thanks. How nice is that!
Dave Cutler is the real stuff of legends. Obviously this is my opinion but I admire him and his work far far far FAR more than anything Linus Torvalds has done.
Arguably, Linus' greatest work was Git, not Linux. Linux is, architecturally, a piece of shit! Actually, wait, so is Git. Mercurial does everything Git does and does it far better and more elegantly. So yeah, wait... one wonders where Linus gets all his fanatics from!
Heh, after reading Show Stoppers and Just For Fun, I think Cutler and Linus are actually very similar and would potentially get along in real life if it weren't for the epic technology divide.
This is the kind of remark that always gets me downvotes. I couldn't agree more with your opinion about git. I guess the attraction of git comes mainly from the fact that most of its users are too inexperienced to know any better. And then, git makes any old random directory a “repository”. That's waaaay cooler than having to get a repo from a central server and integrate your commits with it...
> Of course, all this SMP support stuff slows down the runtime system even if there is only one processor, which is the case for almost all our users...
> What about hyperthreading? Well, I believe it's the last convulsive movement of SMP's corpse :-)
Oh how things have changed. This was written before it was clear just how much of a disaster the P4 was, so it was a pretty reasonable position at the time.
He was hardly the only one thinking that way - I remember Gabe Newell being scathing about multicore/multiprocessing when the PS3 and Xbox 360 were released. All a fad, waste of time.
"In summary: there is no SMP support in OCaml, and it is very very
unlikely that there will ever be. If you're into parallelism, better
investigate message-passing interfaces."
I have started playing with Rust and have used OCaml on and off, but recently I've been diving hard into it, building a REST-based application from top to bottom in OCaml, including all the infrastructure bits. We are slowly making our stuff open source as it becomes useable: https://github.com/afiniate
In short, OCaml is a mature language that has been used for decades in commercial applications. I feel OCaml is the next progression for the people that got excited about distributed systems via the Erlang path and want more of the safety and reasoning that comes from a strongly/statically-typed language like OCaml. Rust may or may not take off, but I am confident OCaml will remain viable for the foreseeable future, and probably gain slow, but steady popularity as engineers see all the cool things you can do like MirageOS: http://openmirage.org/
Isn't the Unicode situation in OCaml more or less the same as in Erlang and Ruby 1.8, ie. "string" is just a byte string, and there's no native encoding support?
Last I checked, there was decent third-party library support in Batteries. I imagine it would be painful if you were to use Batteries' "UTF8.t" string type and had to interface with some other library that used "string" or some other string solution (like Camomile)?
There's no built-in encoding/decoding stuff, ie. you need to use a library like Batteries, Camomile, uutf/uucp if you want to do something like capitalise, split or count characters.
Writing the appropriate glue isn't very hard, the interfaces either work with bytes or have to/from-bytes functions, but I suppose it's a bit annoying (at least when first starting out with the language) to have to figure out which lib is needed for which type of string operation, e.g. if you're into Batteries you still need Camomile (or uucp) for lowercasing:
    module C = CamomileLibraryDefault.Camomile
    module CM = C.CaseMap.Make(C.UTF8)
    module U = Batteries.UTF8

    let lower_initial bytes =
      U.sub (U.of_string_unsafe bytes) 0 1
      |> U.to_string_unsafe
      |> CM.lowercase

    let () =
      lower_initial "Åge" |> print_endline  (* prints "å" *)
Erlang has no string type. Most strings are a list of integers (of any size); you can put Unicode code points there if you want, or integers below 256 if you prefer. There is also a binary/bitstring type, which is an array of bits (if the length is a multiple of 8, it's a binary). You can put whatever you want in a binary; it's binary.
If you'd like things encoded in some way, that's up to you; there is no type to help you (there is a Unicode module which can help convert between encodings).
F# is a member of the ML family, but I'd really hesitate to call it the .NET OCaml. It would be similarly accurate to say that Objective-C is the Apple C++.
They belong to the same family, and they both share a common ancestor that is not object-oriented. But their object systems are very different from each other.
> It would be similarly accurate to say that Objective-C is the Apple C++.
Well it's not inaccurate. Both of them were designed to make C object-oriented, and they tend to be used for many of the same situations because of that.
A more accurate nomenclature would be ".NET's OCaml-equivalent" or "Apple's C++-equivalent" (much like how C# is characterized as ".NET's Java-equivalent"). This falls apart under thorough inspection, of course, but it's good enough for tongue-in-cheek comparisons like this.
F# has a compiler flag to force it to use OCaml syntax parsing. It basically amounts to a strict mode, because F# has very slightly more forgiving syntax than OCaml but is otherwise almost identical.
To call it the .NET OCaml is not too far from the truth. And it was clearly meant tongue in cheek!
I've long admired OCaml, but never used it really. I picked up Rust a while ago and actually made some use of it. I love the language a lot, but it is still too immature and low level for my current situation. I ended up using F# as the compromise, which has been amazing.
There are some areas where OCaml is more advanced than F# (functors, the codegen from the optimized compiler, lack of the msbuild barf sandwich, less hacky on non-Windows platforms), but there are also plenty of areas where F# is more advanced than OCaml (computation expressions, code interoperability, real 32 and 64 bit integers, agents, multicore runtime).
I would say that if OCaml was ideal for you except for the lack of parallelism, then you should definitely check out F# before you go all the way to Rust. Rust is awesome and for the right use case you should use it, but F# is a lot closer to OCaml than Rust is.
I am the author of one of the major web frameworks for OCaml [0]. One of the nicest things about OCaml is that its system-call interface is identical to the C one: I built my entire OWebl platform by reading through C servers.
On the other hand, Rust (like Erlang) reinvents and wraps a lot of these calls in ways that are not immediately obvious. (Or at least, not AS immediately obvious as they are in OCaml.)
This is such a tremendous aid because there are nearly limitless documents and examples of the Unix API.
I'm in the same place sort of, and Golang and Julia seem like the obvious higher-performance transitions from Python. What about Rust makes you consider it so highly?
One of the main reasons: it sounds lame, but I can justify to a customer rewriting a project in Rust because they've heard about it, and they will be able to hire people who have either used it or will at the very least be interested in learning it. Also, the network/hype means that we are going to see good libraries emerge fairly quickly.
* multicore support
A year ago I was ready to move to OCaml, bought books, started to learn it, but the multicore situation was worse than Python's. Today we hear that "there is a good chance it's going to get multicore support". In Rust, it's already there, and it's not an afterthought.
Rust also happened to be slightly faster for most things according to micro-benchmarks, but we know how reliable those are, and we're not talking orders of magnitude here; although it is early, and we can hope it gets even better.
I've seen Nim going back to when it was Nimrod, but my impression (please correct me here if needed) of their development process isn't especially favorable. I do like the syntax and the inherent 'threadiness' of it, for lack of a better word.
Why not C or C++? OCaml and Rust aren't going to interface with Python nearly as well.
Also, it's trivial to write multithreaded extensions in C or C++ (at least for data processing). You just have to make sure to release the GIL.
IMO the trick is to just use the Python C API, and not use any wrappers like SWIG or ctypes. The Python C API is a little odd but it is fairly explicit. Also, it's better to write plain functions rather than classes, but for data processing that is natural anyway. If you need a class, do that part in Python.
Writing "one way" extension functions (that don't call back into Python) is quite easy and can give you a huge performance boost.
I think people get hung up on Python extensions because they are using TWO unfamiliar languages -- the wrapper language, and C/C++. But if you are just using one additional language, it's not hard to figure out.
> OCaml and Rust aren't going to interface with Python nearly as well.
I don't know about OCaml, but Rust (supposedly) will interface perfectly fine with anything providing a C API (which includes Python, along with quite a few other scripting languages from that era, like Perl and Ruby). Rust's ABI-compatibility with C is one of the primary selling points.
I don't know off the top of my head if anyone's tried writing Python extensions in Rust, but I don't see why it wouldn't be possible to do so with at least as much capability (if not more) as C/C++.
If it has C ABI compatibility, then technically it can. That doesn't mean it's easier to interface to Python than C or C++ though. I'd find it hard to imagine it being easier, and I know it's not simpler.
I'll have to watch the video below, but don't you have to write some sort of Rust wrapper for every definition in Python.h? What about macros like Py_INCREF and Py_DECREF? I'd guess that someone has or will eventually do that work, but it's yet another layer of complexity, which can have the same downsides as SWIG.
For better or worse, the Python C API is tightly coupled to the C implementation. That makes it very difficult to be more natural and understandable than C. It's a fairly huge API: https://docs.python.org/2/c-api/index.html
People always try to cover it up with "nice" abstractions, but they invariably end up being leaky (e.g. with respect to threads, garbage collection, OS portability, etc.)
Another huge can of worms: the build system. With a plain C extension, all I need is a C compiler on the system, and I can just do "python setup.py build". The situation with Windows is also quite messy -- I can't imagine Rust making it better.
I have experimented with creating normal .o files from OCaml and linking them with .o files from C/C++. That is a great feature of the OCaml toolchain. Still, I remember the documentation being sparse and I don't even remember how I did it at the moment.
FWIW, my main language is Python, but I love OCaml for certain things, and have grown to like C++ as well. Rust seems very interesting to me for its security properties and because it has native threads rather than including a mini-OS in the runtime like Go. My understanding is that Go is hopeless for many kinds of Python extensions, precisely because of the runtime issue, and calling from Go back into Python. (at least this was true a year or 2 ago)
I'm always wary of creating more layers than necessary. A system composed of Python and C will necessarily have fewer layers than one composed of Python and Rust, simply because Python is written in C and its interface is defined in C.
EDIT: I left out the BIGGEST point -- a dealbreaker. Python does NOT have an ABI. It has an API. AFAIK, that means you have to write a Rust ABI-compatible wrapper for EVERY VERSION of Python.
To make a generalization, programmers learn about C APIs before they understand what the ABI is. It's just more concepts that you need to know about to write correct code. So if you just want to speed up an existing Python program, I would still recommend using a simple C or C++ extension. This solves single-threaded speed issues and gives you parallelism across multiple cores, so you will get huge speedups.
> That doesn't mean it's easier to interface to Python than C or C++ though. I'd find it hard to imagine it being easier, and I know it's not simpler.
I think Rust can be easier to learn than C. You get high-level features like modules instead of include guards and preprocessor hacks, and you can skip learning the part about how to debug segfaults.
> I'll have to watch the video below, but don't you have to write some sort of Rust wrapper for every definition in Python.h? What about macros like Py_INCREF and Py_DECREF?
We have bindgen to automate most of that for you. For macros, you do need to duplicate them, but people should just do that once (and Google brings up several projects in which people have already written bindings for that).
> People always try to cover it up with "nice" abstractions, but they invariably end up being leaky (e.g. with respect to threads, garbage collection, OS portability, etc.)
But I don't see how Rust makes that any worse than in C.
> Another huge can of worms: the build system. With a plain C extension, all I need is a C compiler on the system, and I can just do "python setup.py build". The situation with Windows is also quite messy -- I can't imagine Rust making it better.
Yes, you do need to add Rust support to the extension's build system. That is fair, but I think Rust's advantages outweigh that cost :)
> A system composed of Python and C will necessarily have fewer layers than one composed of Python and Rust, simply because Python is written in C and its interface is defined in C.
That doesn't make any sense to me. It's all machine code at runtime. The pertinent question is whether Rust adds any measurable abstraction taxes at runtime (it doesn't) and whether Rust uses the same ABI as C (it does).
> EDIT: I left out the BIGGEST point -- a dealbreaker. Python does NOT have an ABI. It has an API. AFAIK, that means you have to write a Rust ABI-compatible wrapper for EVERY VERSION of Python.
That is unfortunate, but someone could write a crate that dynamically determines which Python is in use and chooses the right functions based on that, put it on Cargo, and everyone could link against it.
Anyway, people are already writing Python programs that call into Rust via ctypes and whatnot, so this hasn't stopped people yet.
> So if you just want to speed up an existing Python program, I would still recommend using a simple C or C++ extension. This solves single-threaded speed issues and gives you parallelism across multiple cores, so you will get huge speedups.
At the cost of memory safety, which is a big deal. And you have to learn C, which for a dynamic language programmer can be a lot harder than learning Rust.
Since you mentioned multicore, I should mention that the threading facilities available "out of the box" in C and C++ are much harder to use than the equivalent ones available in Rust, and if you reach for something like TBB and Boost now you have all the build system issues you described previously.
Sure, we have the same understanding of what's going on in each situation. But I can't see how anyone could call the Rust version as simple as C/C++. There are simply more concepts involved.
You can try to cover them up with magic code generators and version wrappers, but in my experience those are precisely the leaky abstractions that make people scared of FFI. They work 99% of the time, then bite you 1% of the time. The built-in Python FFI is somewhat ugly, but simple and debuggable.
I'm not saying you shouldn't use Rust for Python extensions -- just that there is an inherent awkwardness in doing so, given that Python is written in C, and Python 2.x has no stable ABI. Of course Rust offers benefits, so if you want to pay the cost for those benefits, it might be worth it.
I'm considering OCaml for a new project where C++ would be the typical choice. Think algorithms handling massive amounts of data, and some numerics.
I have some experience with ML, Haskell & Lisp. OCaml is appealing because it is quite efficient and predictable. Does it have the bit of laziness Clojure has that makes functional programming easy with large data?
Yes, there is support for laziness (see "streams"). A couple things to keep in mind: floating point values in Ocaml are boxed (floats are actually pointers to float data on the heap), and integers are one bit shorter than native types (31 or 63 bits) due to the way that Ocaml values are tagged internally. The native compiler generates good, predictable, but fairly simple code: few optimizations are applied (although there is active work underway, in the "flambda" project, that will significantly change this). Also, there is of course a garbage collector, though it is quite efficient in most cases. These factors may or may not be a performance issue in your own project.
In practice the generated code is already extremely fast and the 1-bit shorter ints help make the GC one of the fastest I've seen. If you do a lot of floating point calculations, you can put your floats in an array and they'll become unboxed.
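A minimal sketch of both points (the lazy stream type is hand-rolled for illustration, not a stdlib module):

    (* A hand-rolled lazy stream: Clojure-style laziness on demand. *)
    type 'a stream = Nil | Cons of 'a * 'a stream Lazy.t

    let rec ints n : int stream = Cons (n, lazy (ints (n + 1)))

    let rec take k = function
      | Cons (x, rest) when k > 0 -> x :: take (k - 1) (Lazy.force rest)
      | _ -> []

    (* float array is special-cased: elements are stored flat, unboxed. *)
    let sum_squares (a : float array) =
      Array.fold_left (fun acc x -> acc +. (x *. x)) 0.0 a

    let () =
      take 5 (ints 0) |> List.iter (Printf.printf "%d ");
      Printf.printf "\n%.1f\n" (sum_squares [| 1.0; 2.0; 3.0 |])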
Are you from JSC by chance? Is this from an authoritative source, or just an observation of likelihood? Would love it if they do indeed push hard for this.
JavaScript Webworkers can't interact with the page at all, they can only send messages around. This implementation detail leads me to believe the browser is probably doing little more than instantiating another JavaScript interpreter and handling IPC/synchronization for you. This is a far cry from true parallelism.
Further illustrating that JavaScript doesn't have parallelism is that Node.js isn't parallel and in fact encourages its users to use process forking instead.
> And I don't really care if it is from an other era - C has threads (as a library but still).
OCaml has threads. It just doesn't have parallel threads (yet). Threads existed before CPU parallelism so of course C and a bunch of other pre-CPU parallelism languages have them. The difference is C doesn't have a GIL whereas OCaml does/did.
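A small illustration of that split, assuming the standard threads library: the worker makes a blocking call, which releases the runtime lock, so the main thread keeps going even though OCaml code never runs in parallel:

    (* Thread.create gives a real OS thread, but the runtime lock means
       at most one thread executes OCaml code at a time. Blocking calls
       like Unix.sleep release the lock, so others keep running. *)
    let () =
      let worker =
        Thread.create
          (fun () -> Unix.sleep 1; print_endline "worker: done blocking")
          ()
      in
      print_endline "main: running while the worker blocks";
      Thread.join worker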
Looked into a static FP language recently. Was torn between OCaml and Haskell. Leaned more toward Haskell than OCaml. Mainly because OCaml feels like it was hacked together, with a lot of very strange and inconsistent syntax and poorly thought out semantics. That said, I haven't chosen either yet, because Haskell has its own share of oddities that I'm still not comfortable with. But at least it feels more pure and consistent and well thought out in its syntax and semantics.
Ocaml and Haskell are both good languages. Haskell probably has a bigger community and more momentum at this point. I switched from Ocaml to Haskell long ago because I wanted parallelism.
The implementation philosophy of the two languages is pretty different, despite being superficially similar in terms of syntax. Ocaml is pretty predictable -- you can look at code and have a pretty good idea of what kind of code the compiler is going to generate.
Haskell is a lot more opaque. Between laziness and a more rigid type system, ghc can do some pretty crazy code transformations. In general, this is a good thing, but it can make performance questions harder to figure out.
I think that Ocaml is easier to learn, but Haskell is more fun, and I've learned more from using it.
Haskell libraries also tend to have more levels of abstraction than Ocaml ones, in my experience. Ocaml libraries don't tend to use things like monad transformers or lenses.
After many years of struggling with Haskell, I've all but given up on it, because of the broken record semantics. They seem to be able to find time to implement monad comprehensions or whatever paper fodder is most in vogue, but you still can't have two datatypes with the same field name in the same module. It's not a serious language.
That is also true of Ocaml, and probably every other ML derived language that treats accessor functions as ordinary functions (rather than using the C-style dot operator). There is a type-directed name resolution proposal that would remove this limitation in Haskell, but it would probably make the typechecker a lot more complicated.
The can't-reuse-field-names thing is annoying, but claiming that it "isn't a serious language" because they made a design choice that doesn't meet your exact expectations seems kind of closed-minded to me.
> ML derived language that treats accessor functions as ordinary functions
OCaml is not in that tradition—field projection uses dot notation and has historically not been an ordinary function (although maybe that has changed in later versions).
The Ocaml way would be to define the functions in separate modules (A.foo, B.foo, etc.). There are convenient syntaxes for declaring which modules are in scope in a given expression, as well as support for first-class modules (you can define functions that take modules as parameters). The module system is sophisticated, and is arguably the "power feature" of Ocaml.
Yes, this is one of the cases where ML modules are elegant, since you can put the definition of the type into its own module, so it becomes F.t and P.t and you avoid field name clashes.
Having two datatypes with the same field name in scope doesn't work well with type inference. OCaml permits it, but it requires type annotations, and in practice it seems like developers solve the problem the same way Haskell does -- separate modules.
That said, working with modules is a lot nicer in OCaml than in Haskell, so it's a less painful solution.
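For instance, a minimal sketch of the module-based solution (Foo and Bar are made-up modules):

    (* Two record types with a clashing "name" field, kept apart by
       giving each type its own module. *)
    module Foo = struct
      type t = { name : string; size : int }
    end

    module Bar = struct
      type t = { name : string }
    end

    (* Qualified access picks the right field unambiguously. *)
    let describe (f : Foo.t) (b : Bar.t) =
      Printf.sprintf "%s (%d) vs %s" f.Foo.name f.Foo.size b.Bar.name

    let () =
      print_endline
        (describe { Foo.name = "alpha"; size = 3 } { Bar.name = "beta" })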
A record field accessor is just a normal function; are you also expecting to define methods with the same name within the same module? If you want an elegant solution, write a typeclass. You have to write your Java in the Haskell way. https://wiki.haskell.org/Name_clashes_in_record_fields
Sure, for a subset of the language. But the point is, a hand-wavy complaint about PL semantics (which can be precisely defined) doesn't make sense when comparing languages in this context.
That's the one for Standard ML, yes. There are others for SML, some using proof assistants [1]. Other (S)ML extensions have formal semantics as well, and OCaml itself is partially specified [2].
This is not really a point release; OCaml versions work a little differently. A version number has three parts: Super-Major.Major.Patch. Super-Major releases are incredibly rare; the last one was the bump to 4, which was done since the language gained support for GADTs (while staying compatible with OCaml 3.x). I don't even know what caused the bump from 2.x to 3.00. The Major part is a normal release in which many features may be added; it is always two digits, of which the first may well be a 0. The Patch part is just for fixes, stuff that was broken and overlooked when the release was done.
So OCaml 4.03.0 is basically 4.3.0 in a Python-esque versioning scheme (remember how many changes were done between Python 2.2.0 and 2.7.0?).
Does OCaml not already support multicore? Is concurrency green thread based? Even at that, there's nothing stopping a user from starting multiple processes....
Yes, the threads are green threads: only one can run at a time. There's also an Async framework in Jane Street's Core library, and there's Lwt.
My understanding is that GC is hard with multithreading, particularly in a functional language where it's going to do some heavy lifting and needs to be very performant.
That's not really true. Erlang's VM is fantastic at GC with thousands upon thousands of green processes multiplexed onto the system threads, allowing soft realtime performance. Similarly, Haskell's Parallel Strategies library works well with the Parallel GC. Immutability makes this a whole lot easier.
Erlang uses only actors as concurrency mechanism and exploits that fact by giving each actor its own heap. So Erlang's GC does not need to accommodate concurrency, even though Erlang itself does.
As to OCaml in particular: you're right that immutability does make it much easier, but OCaml has mutable values, which is precisely why it's hard.
GC can be hard with single threading as well. Even if you don't have concurrency to worry about, it still has to be incremental - meaning to do its work in small batches and provide a guarantee that the process doesn't get blocked for more than X millis.
Also, if you can rely on the data structures stored in your heap being persistent, then you can tune the GC for it. The problem is that you need to make assumptions about the life cycle of those data structures. For example, the persistent data structures used in Scala or Clojure can be pretty heavy for the JVM's garbage collectors, because they tend to produce junk that is neither short-term nor long-term, thus invalidating the assumptions with which the JVM was built. And generally that's OK, because the JVM's GCs can cope pretty well, and if the need to optimize arises, well, both Scala and Clojure are hybrids (just like OCaml), so you can just use mutable stuff if profiling shows problems. So the theory is known and a decent concurrent GC can be built.
Starting multiple processes sucks though, you know, for the kind of use cases OCaml should be well suited for. This is because OS processes are more expensive than threads, and communication and synchronization between such processes get very expensive.
By this argument you could state that Erlang's processes are the way to go, because OS threads are way more expensive. If I have a problem that I can easily make parallel, then I could just start up N OCaml processes and send each a chunk of work, and this would not be much more inefficient than a thread-based implementation. On top of that, I don't like creating a ton of threads in any application; I'd much rather have a fixed number of threads and use channels to send them work than dynamically (on demand) create threads or processes.
Erlang's processes are actually better than threads for many use cases, yes. But the thing is that with 1:1 multi-threading you can build many abstractions on top and for example on top of the JVM frameworks using Erlang's model like Akka [1] and Quasar [2] are very popular. Either way, you can't compare Erlang with platforms whose notion of parallelism involves POSIX's fork, but lo and behold, I'm comparing it with Java because I can.
MOST problems are not easily parallelizable, and you'll end up with concurrency. Concurrency means synchronizing and sending messages between processes. For example, sending messages between processes means opening a socket of some sort, serializing the data you want to send and deserializing it on the consumer's side. That's extremely inefficient, and there's no way you can end up with a pipeline of tens of millions of messages per second, but with 1:1 multi-threading you can [3]. Erlang can't cope with this load, btw.
One other thing I like about working with threads is the user-friendliness. Yes, skipping over the perils of multi-threading, which you can sort of avoid by using better libraries, you can easily do things like number crunching using "parallel collections", combine actors with reactive streams and futures, or fake asynchronous I/O by blocking threads. People nowadays tend to underestimate the utility of blocking threads, but it's pretty cool having an interface like `def fetchData: Future[Result]` that could be implemented on top of Netty (asynchronous I/O) or with JDBC (blocking I/O), only suffer a very small penalty, and still have your process reach 80% of CPU utilization.
So I think OCaml getting multi-threading support is a pretty big deal.
Thanks. I think Erlang is the grandfather of languages using the actor pattern, and at the same time Joe realized that POSIX threads do not belong in business-logic code and it can be done without leaking the number of actual OS threads into the user's codebase. On top of that, Joe also implemented message passing to avoid the shared memory that is a serious safety problem for applications with threads: one thread can take down the entire process. Addressing these in Erlang made it possible to write the first commercial system with extra-high availability and fault tolerance.
The JVM is obviously a great VM, and Scala brings most of the Erlang features (and many other languages') to the table. I am not sure about the fault tolerance. I need to look into how Akka implements the actors.
> Concurrency means synchronizing and sending messages between
> processes.
I think in Erlang you can only do async sending, and when the receiving process wakes up it gets the message through one of the means you mentioned.
This is what I am curious about. I have seen only one system written in Erlang that was massively big. WhatsApp was also running on Erlang, and they achieved something like 1M connections/server. That was impressive. I am wondering how you could do that with Akka?
I am following the development of these, starting to use it in Clojure soon, I am curious how it works out. Personally I prefer Aleph on the JVM for Clojure projects but wanna see what is happening in Quasar & co.
http://kcsrk.info/ocaml/multicore/2015/05/20/effects-multico...