Functional Fluid Dynamics in Clojure (bestinclass.dk)
66 points by khingebjerg on March 29, 2010 | 19 comments



I have a PhD in computational fluid dynamics (CFD), and have written several CFD codes in Fortran, C, Matlab and Python. I'm also a programming language geek, and have been interested in functional programming languages for quite some time. When I see something like this, however, I lose some of my faith. It just seems like too much effort to get decent speed, and even when speed is not a concern it doesn't seem to give much benefit. For instance, this is how the diffusion function would look in Fortran (skipping some details):

  pure function diffusion(x0) result(x)

    ! assumes n and a are available from the enclosing scope (the skipped details)
    real, intent(in) :: x0(0:n+1,0:n+1)
    real             :: x(n,n)

    x =  x0(2:n+1,1:n  ) & 
      +  x0(0:n-1,1:n  ) &
      +  x0(1:n  ,2:n+1) &
      +  x0(1:n  ,0:n-1) &
      -a*x0(1:n  ,1:n  )

  end function
Some things to note:

- The "pure" keyword guarantees that this function has no side effects.

- No do loops are needed! Fortran array slicing is very handy.

- The compiler will convert this to use SIMD instructions.

- Adding some OpenMP hints to make it run on all cores is also very easy.

So this type of code in Fortran is short, very easy to understand, and you are guaranteed extreme performance. Maybe functional programming has some benefits when you're dealing with more complex data structures (for instance, I'm working on a code right now that uses parallel octrees, kind of a pain in Fortran), but for simple things like this, I fail to see the point.

I want to believe, so perhaps someone here can enlighten me?


Clojure's not very fast with numerical code if you write it idiomatically. One main reason for this is that functions can't (for now) take primitives as parameters, so all arithmetic is boxed unless it's done inline with explicit type hints. (Hence all the macros in the Clojure code.)
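To make the hinting concrete, here is a rough toy sketch of my own (not code from the article): summing a double array the idiomatic way, which boxes every addition, versus with an array type hint and a primitive loop:

  ;; idiomatic, but every element gets boxed on its way through +
  (defn sum-boxed [xs]
    (reduce + xs))

  ;; the ^doubles hint plus loop/recur keeps the arithmetic on primitive doubles
  (defn sum-hinted [^doubles xs]
    (let [n (alength xs)]
      (loop [i 0, acc 0.0]
        (if (< i n)
          (recur (unchecked-inc i) (+ acc (aget xs i)))
          acc))))
That second shape is roughly what the macros in the article's code are there to generate.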

However, don't lose hope just yet. Languages like Haskell or OCaml, with more mature compilers, might not have such limitations. Also, I think that while many small examples might turn out essentially equivalent when comparing imperative and functional styles, the difference becomes more pronounced on a whole-program scale. Pure functions are easier to reason about, and compose better.

Fortran, C and C++ will probably keep their place as tools to use when absolute best performance is needed, but with modern compilers and virtual machines, functional languages do not have to be slow either.

Finally, I hope to see functional programming languages succeed simply because functional programming is a lot of fun.


OCaml is only really fast if you use the imperative features. The only high performance Haskell code I've seen is from the language shootout, and it looks pretty hairy as well. Also, note that my Fortran function actually _is_ pure, so the whole argument about purity doesn't hold up.

I kinda understand what you mean by whole-program scale, but in fact, most of our programs aren't much bigger than this! Typically, the number of lines of code doesn't exceed 50k.

I agree about functional programming being fun, but unfortunately I don't think my coworkers would agree, and especially not the companies funding our research!


Honestly, in a lot of cases language performance doesn't matter as long as you have a good numerics library. For example, here's one implementation of the projection step in my current project, which does fluid simulation on unstructured tetrahedral meshes and relies heavily on SciPy's sparse matrices:

  def __init__(self, op_mesh):
    div_matrix = op_mesh.n_to_nm1_boundary.T * op_mesh.F_matrix
    boundary_matrix = op_mesh.F_matrix - op_mesh.FB_matrix
    self.C_matrix = sparse.vstack((div_matrix, boundary_matrix))
    self.CCT_matrix = self.C_matrix * self.C_matrix.T

  def Project(self, velocities):
    lam = linalg.cg(-self.CCT_matrix, self.C_matrix * velocities)[0]
    return velocities + self.C_matrix.T * lam
It's pretty simple (constrained least squares optimization) and it's missing some stuff (preconditioner), but it runs fast enough (~1 minute/frame for > 1M tetrahedra), and only required minimal testing: it's all built up from operators I was already using elsewhere. Since it's the fourth or fifth projection method I've tried for this code, that makes a big difference. The fact that I'm using Python is almost immaterial: all of the "interesting" code is library calls.
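For what it's worth, the math behind those few lines is just the projection onto the constraint (writing C for C_matrix and v for velocities):

  \min_{v'} \tfrac{1}{2}\lVert v' - v \rVert^2 \quad \text{subject to} \quad C v' = 0
  \;\Longrightarrow\; C C^T \lambda = -C v, \qquad v' = v + C^T \lambda
which is exactly the conjugate-gradient solve and the correction in Project.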


Interesting. Is the code available somewhere? I looked at your SIGGRAPH paper, and that was very interesting.

Have you seen the FiPy project? It's a full-featured finite-volume code in Python, which is very easy to extend (they did struggle with performance in early releases; not sure what the current status is).

I guess that's the area where dynamic or functional languages can be useful: they make it easier to build generic libraries, which leads to codes that are easier to extend.


I would offer that, compared with Clojure, Fortran is like a DSL for number crunching. Does that make sense?


I'd be curious to know if/how the Clojure code could be written to do this kind of number crunching on the GPU. Penumbra offers a very idiomatic way of offloading work to the GPU.


I don't have time for a full conversion just now, but an eight-way diffusion on the GPU in Penumbra looks like this:

  (let [sum 0.0
        count 0.0]
    (convolution 1
      (+= sum %1)
      (+= count 1))
    (/ sum count))
"convolution" is a keyword that iterates over the neighbors (with a radius of 1, in this case), and does not overrun the boundaries of the source textures, hence the need to keep a running count.

For a more extensive example, see this implementation of Sobel edge detection on the GPU: http://github.com/ztellman/penumbra/blob/master/src/example/...


In the Fortran code, this could be run on the GPU by simply wrapping the code in

   !$acc region
   !$acc end region
using the PGI Accelerator directives. It would have to be a pretty big loop for it to pay off, though, since transferring data to/from the GPU is very expensive.


What Fortran compiler(s) do you use and recommend for things like this?


We use a lot of compilers. They all vary in their ability to detect bugs and in their optimization features. The fastest used to be Intel's, but right now the Portland Group compiler seems to be the best. These are both commercial, though. Of the free compilers, Sun's has been the fastest for us. It will be interesting to see whether Oracle continues that effort.


I dunno, seems like exactly the type of situation where it would be much more appropriate to just write the damn thing in Java. Isn't the ability to do just that with performance-sensitive code one of the best things Clojure offers compared to "standard" Lisps? You'd get better performance, clearer (and less) code, and less development time, as well as the ability to practically cut and paste the C code from the linked paper.

I feel (as I do so often when projects written in certain languages that may or may not start with "Haskell" show up here) that the main takeaway is "Look, we can write working apps, too, and they're Better-Because-They're-Functional!" And yes, there are a lot of things that are better when you code them functionally, but simple numerical algorithms like this one play specifically to the strengths of imperative languages, and I see no clear benefit.

It's the same code smell I get when people start putting together utility libraries, macro tools, and bytecode processors just so they can pretend to do functional programming in Java: if you find yourself fighting one language's idioms to do something that would be trivially natural in another, why not just switch languages? Especially if they all live on the same JVM and play fairly well together...


"The archaic C language". Alrighty.


"The weird thing about C is that its not even prefixed like Clojure (+ 2 2), instead all operators are scattered between the arguments!"

Please tell me he's kidding. Maybe I don't get out of my cave often enough, but it seems uncommon that someone could grok FP while bypassing C altogether.


He's not kidding: C precedence rules are weird. Even D. Ritchie thinks so. http://en.wikipedia.org/wiki/Order_of_operations#Programming...

Oh, and it isn't strange to grok FP bypassing C altogether.


Of course he's kidding! I refuse to believe that wasn't some tongue-in-cheek statement. For fluid dynamics, all we are talking about is addition and multiplication, and the precedence rules for these operations are what you would expect them to be.


Do most fluid dynamics simulations use Euler integration like this?


No, as the paper says, explicit Euler integration is extremely unstable. Implicit Euler, which the paper uses, is stable but only first-order accurate.
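To spell that out a bit (standard stability analysis, nothing specific to the paper): for the diffusion term alone, with diffusivity \nu and grid spacing \Delta x,

  \text{explicit Euler: } u^{n+1} = u^n + \Delta t\,\nu\,\nabla^2 u^n, \quad \text{stable only for } \Delta t \le \Delta x^2/(4\nu) \text{ in 2D}
  \text{implicit Euler: } (I - \Delta t\,\nu\,\nabla^2)\,u^{n+1} = u^n, \quad \text{unconditionally stable, but only } O(\Delta t) \text{ accurate}
so the implicit version lets you take large time steps at the cost of solving a linear system.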

If you're dealing with fluids with low diffusion compared to advection, some higher-order Runge-Kutta scheme is typically used. For fluids with high diffusion, however, that will put very severe restrictions on the time step. For these cases, a hybrid scheme is often used, where the advection term is integrated explicitly using an Adams-Bashforth scheme and the diffusion term is treated semi-implicitly with a Crank-Nicolson scheme. This means you have to solve additional linear systems of equations for the velocities at each time step (Helmholtz-type equations), but it is still faster overall since you can use longer time steps.
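Written out as a sketch of the standard form (with A denoting the advection term), one step of that hybrid scheme is

  \frac{u^{n+1} - u^n}{\Delta t} = \tfrac{3}{2} A(u^n) - \tfrac{1}{2} A(u^{n-1}) + \tfrac{\nu}{2}\left(\nabla^2 u^{n+1} + \nabla^2 u^n\right)
  \Longrightarrow \left(I - \tfrac{\nu \Delta t}{2}\nabla^2\right) u^{n+1} = u^n + \Delta t\left[\tfrac{3}{2} A(u^n) - \tfrac{1}{2} A(u^{n-1})\right] + \tfrac{\nu \Delta t}{2}\nabla^2 u^n
and the operator on the left-hand side is the Helmholtz-type system I mentioned.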

All of this may sound very complicated, but it all follows the same patterns, so it's really fairly straightforward.


The link only uses Euler integration for the diffusion (advection is semi-Lagrangian), and in graphics you never use very much diffusion anyway, because highly diffusive fluids are boring. In fact, when using this particular technique, it's not uncommon to set diffusion to 0, since the multilinear interpolation used in the advection step tends to introduce a fair amount of diffusion.
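(For anyone who hasn't seen the term: semi-Lagrangian advection just traces back along the velocity field and interpolates, roughly

  q^{n+1}(x) \approx q^n\left(x - u(x)\,\Delta t\right)
and it's that interpolation which smears things out and acts like extra diffusion.)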



