I have a PhD in computational fluid dynamics (CFD), and have written several CFD codes in Fortran, C, Matlab and Python. I'm also a programming language geek, and have been interested in functional programming languages for quite some time. When I see something like this, however, I lose some of my faith. It just seems like too much effort to get decent speed, and even setting speed aside it doesn't seem to give much benefit. For instance, this is how the diffusion function would look in Fortran (skipping some details):
pure function diffusion(x0) result(x)
    ! n (interior grid size) and a (coefficient) come from the host module
    real, intent(in), dimension(0:,0:) :: x0   ! one ghost cell on each side
    real, dimension(n,n) :: x                  ! a function result takes no intent
    x = x0(2:n+1,1:n  ) &
      + x0(0:n-1,1:n  ) &
      + x0(1:n  ,2:n+1) &
      + x0(1:n  ,0:n-1) &
      - a*x0(1:n  ,1:n  )
end function
Some things to note:
- The "pure" keyword guarantees that this function has no side effects.
- No do loops are needed! Fortran array slicing is very handy.
- The compiler will convert this to use SIMD instructions.
- Adding some OpenMP hints to make it run on all cores is also very easy.
So this type of code in Fortran is short, very easy to understand, and you are guaranteed extreme performance. Maybe functional programming has some benefits when you're dealing with more complex data structures (for instance, I'm working on a code right now that uses parallel octrees, which are kind of a pain in Fortran), but for simple things like this, I fail to see the point.
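For what it's worth, the same slicing style carries straight over to NumPy, which I also use; here's a sketch of the same stencil (assuming x0 carries the same one-cell ghost layer, with the coefficient passed in as a):

import numpy as np

def diffusion(x0, a):
    # x0 is an (n+2) x (n+2) array with a one-cell ghost layer;
    # the result is the n x n interior update.
    return (x0[2:,   1:-1] + x0[:-2, 1:-1]
            + x0[1:-1, 2:  ] + x0[1:-1, :-2]
            - a * x0[1:-1, 1:-1])

# e.g. diffusion(np.zeros((66, 66)), 4.0) has shape (64, 64)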
I want to believe, so perhaps someone here can enlighten me?
Clojure's not very fast with numerical code if you write it idiomatically. One main reason for this is that functions can't (for now) take primitives as parameters, so all arithmetic is boxed unless it's done inline with explicit type hints. (Hence all the macros in the Clojure code.)
However, don't lose hope just yet. Languages like Haskell or OCaml, with more mature compilers, might not have such limitations. Also, I think that while many small examples might turn out essentially equivalent when comparing imperative and functional styles, the difference becomes more pronounced on a whole-program scale. Pure functions are easier to reason about, and compose better.
Fortran, C and C++ will probably keep their place as tools to use when absolute best performance is needed, but with modern compilers and virtual machines, functional languages do not have to be slow either.
Finally, I hope to see functional programming languages succeed simply because functional programming is a lot of fun.
OCaml is only really fast if you use its imperative features. The only high-performance Haskell code I've seen is from the language shootout, and it looks pretty hairy as well. Also, note that my Fortran function actually _is_ pure, so the whole argument about purity doesn't hold up.
I kind of understand what you mean by whole-program scale, but in fact most of our programs aren't much bigger than this! Typically the number of lines of code doesn't exceed 50k.
I agree about functional programming being fun, but unfortunately I don't think my coworkers would agree, and especially not the companies funding our research!
Honestly, in a lot of cases language performance doesn't matter as long as you have a good numerics library. For example, take the projection step in my current project, which does fluid simulation on unstructured tetrahedral meshes and relies heavily on SciPy's sparse matrices.
It's pretty simple (constrained least squares optimization) and it's missing some stuff (preconditioner), but it runs fast enough (~1 minute/frame for > 1M tetrahedra), and only required minimal testing: it's all built up from operators I was already using elsewhere. Since it's the fourth or fifth projection method I've tried for this code, that makes a big difference. The fact that I'm using Python is almost immaterial: all of the "interesting" code is library calls.
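Schematically, the whole step is just a handful of sparse operator calls, something like this (the names here are stand-ins for illustration, not the actual code):

import scipy.sparse as sp
import scipy.sparse.linalg as spla

def project(u, D, w):
    # Constrained least squares: find the v closest to u in the w-weighted
    # norm subject to D v = 0. Here u is the unprojected field, D a sparse
    # constraint (divergence) operator, and w a vector of diagonal weights;
    # all three are stand-ins for the operators I was already using elsewhere.
    Winv = sp.diags(1.0 / w)
    A = D @ Winv @ D.T               # normal equations for the multiplier
    p, info = spla.cg(A, D @ u)      # no preconditioner, as noted above
    return u - Winv @ (D.T @ p)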
Interesting. Is the code available somewhere? I looked at your SIGGRAPH paper, and that was very interesting.
Have you seen the FiPy project? It's a full-featured finite volume code in Python that's very easy to extend (they did struggle with performance in early releases; I'm not sure what the current status is).
I guess that's the area where dynamic or functional languages can be useful: they are easier to build generic libraries for, and they can give rise to codes that are easier to extend.
I'd be curious to know if/how the Clojure code could be written to do this kind of number crunching on the GPU. Penumbra offers a very idiomatic way to offload work to the GPU.
I don't have time for a full conversion just now, but an eight-way diffusion on the GPU in Penumbra looks like this:
(let [sum   0.0
      count 0.0]
  (convolution 1
    (+= sum %1)
    (+= count 1))
  (/ sum count))
"convolution" is a keyword that iterates over the neighbors (with a radius of 1, in this case), and does not overrun the boundaries of the source textures, hence the need to keep a running count.
We use a lot of compilers; they vary in their ability to detect bugs and in their optimization features. The fastest used to be Intel's, but right now the Portland Group compiler seems to be the best. Both of these are commercial, though. Of the free compilers, Sun's has been the fastest for us. It will be interesting to see whether Oracle continues that effort.
- The "pure" keyword guarantees that this function has no side effects.
- No do loops are needed! Fortran array slicing is very handy.
- The compiler will convert this to use SIMD instructions
- Adding some OpenMP hints to make it run on all cores is also very easy.
So this type of code in Fortran is short, very easy to understand and you are guaranteed extreme performance. Maybe functional programming has some benefits when you're dealing with more complex datastructures (for instance I'm working on a code right now which uses parallel octrees, kind off a pain in Fortran), but for simple things like this, I fail to see the point.
I want to believe, so perhaps someone here can enlighten me?