How is lazy evaluation in Haskell useful? (arstechnica.com)
60 points by Garbage on Nov 25, 2012 | 41 comments



Note that there is an elephant in the room that the answers fail to reflect on. The questioner asks:

How can this paradigm be used to build predictable software that works as intended when we have no guarantee when and where an expression will be evaluated?

While it is true that this will not lead to unexpected computations in a pure language, it may lead to excessive memory use. For instance, consider the definition of Data.Map.Lazy.Map:

http://www.haskell.org/ghc/docs/latest/html/libraries/contai...

The Map data type is spine-strict and key-strict (keys are evaluated to Weak Head Normal Form, WHNF). However, it is not strict in its values, so values may be thunks (unevaluated expressions). This may result in the memory usage of the values being much larger than it would be if they were fully evaluated.

There is now also a variant of Data.Map that is value-strict, but only to WHNF, so only the outermost constructor of each value is guaranteed to have been evaluated; you'd have to force the data further to have the remaining thunks replaced by actual values.
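To make that concrete, here is a minimal sketch of the difference (the counter workload and names are my own, purely for illustration):

  import Data.List (foldl')
  import qualified Data.Map.Lazy   as ML
  import qualified Data.Map.Strict as MS

  -- With the value-lazy Map, every adjust stores an unevaluated (+ 1)
  -- application; the stored value becomes a chain of a million thunks
  -- that is only reduced when someone finally inspects it.
  lazyCounter :: ML.Map String Int
  lazyCounter =
    foldl' (\m _ -> ML.adjust (+ 1) "hits" m)
           (ML.singleton "hits" 0)
           [1 .. 1000000 :: Int]

  -- The value-strict variant forces each new value (to WHNF) as it is
  -- stored, so the chain never builds up.
  strictCounter :: MS.Map String Int
  strictCounter =
    foldl' (\m _ -> MS.adjust (+ 1) "hits" m)
           (MS.singleton "hits" 0)
           [1 .. 1000000 :: Int]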

Thunk leaks are something that every new Haskell programmer will bump into, and they make programs less predictable if you cannot yet comfortably reason about strictness and laziness.

Edward Yang wrote a nice explanation of such leaks:

http://blog.ezyang.com/2011/05/anatomy-of-a-thunk-leak/
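The classic minimal shape of such a leak and its fix (not taken from that post) looks like this; with optimizations GHC's strictness analysis can sometimes rescue the first version, but the principle is the same:

  import Data.List (foldl')

  -- Leaky: foldl defers every addition, so the accumulator grows into a
  -- chain of a million unevaluated (+) thunks before anything is reduced.
  leakySum :: Int
  leakySum = foldl (+) 0 [1 .. 1000000]

  -- Fixed: foldl' forces the accumulator at each step, so it stays a
  -- plain evaluated Int the whole way through.
  strictSum :: Int
  strictSum = foldl' (+) 0 [1 .. 1000000]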

I very much enjoy Haskell, but I think that the benefits of laziness are much overstated. Strict languages are often easier to reason about and laziness can often be emulated when necessary. That said, Haskell has many other attractions ;).


> laziness can often be emulated when necessary.

I haven't used Haskell enough to say that this is anything more than speculation, but I would imagine that the converse would be true. Python is an excessively eager-evaluating language[1], and trying to emulate lazy evaluation in Python requires enough back-bending that it probably wastes more time/memory than it saves.

In Haskell, since you have guarantees about monadic purity, I would speculate that an alternative runtime could use a secondary thread to 'walk' the tree and evaluate the thunks whenever resources are being under-utilized (or when memory is tight). Perhaps that would be effective enough to be a net improvement?


> trying to emulate lazy evaluation in Python requires enough back-bending that it probably wastes more time/memory than it saves.

Wholesale, yes. But one could e.g. use a generator in Python to emulate an infinite list.

> I would speculate that an alternative runtime could use a secondary thread to 'walk' the tree and evaluate the thunks whenever resources are being under-utilized (or when memory is tight).

But the thunks could also evaluate to large or infinite data structures, leading to resource starvation when such a strategy is followed.

IMO, if the language uses a lazy evaluation regime, the programmer should decide where strictness is applied. It's fairly easy to enforce strict evaluation in Haskell via bang patterns, seq, or deepseq. It's reasoning about laziness that is difficult for newcomers.
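For example, a minimal sketch of those three mechanisms (names made up; deepseq comes from the deepseq package):

  {-# LANGUAGE BangPatterns #-}

  import Control.DeepSeq (deepseq)

  -- Bang pattern: the accumulator is forced to WHNF on every iteration.
  sumStrict :: [Int] -> Int
  sumStrict = go 0
    where
      go !acc []       = acc
      go !acc (x : xs) = go (acc + x) xs

  -- seq: force the first argument to WHNF before returning the second.
  forceFirst :: Int -> b -> b
  forceFirst x y = x `seq` y

  -- deepseq: force the entire structure, not just the outer constructor.
  forceWholeList :: [Int] -> b -> b
  forceWholeList xs y = xs `deepseq` y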


> It's reasoning about laziness that is difficult for newcomers.

I'm not so sure - for people who come with a more mathematical background, this is actually easier than thinking in terms of strict evaluation, in which every expression is computed even when it's completely unnecessary. When I'm writing mathematical equations down, I don't stop and think of the computational complexity of every reduction/simplification I make; only of the final result.

Haskell may be a bit difficult for a complete beginner to programming, but for reasons completely unrelated to lazy evaluation, and it makes up for that by being more forgiving in many other ways.

The "current" batch of programmers today seem to come from a more computational/imperative mindset, but I don't think that's any more natural than a mindset which emphasizes expression-based reasoning and lazy evaluation. It's possible that, if Haskell shows its appeal in other areas enough, in a few years people will be talking about how much easier lazy evaluation is than strict evaluation, since you don't have to worry about, eg. accidentally computing an infinite sum only to discard it immediately!


Haskell is easy to reason about for meaning, but hard to reason about for performance: like math, where no one thinks about performance. The folks over in applied math and engineering worry about things like computing an integral quickly and accurately, while the pure math folks are showing how to split a sphere into two exact copies of itself using an algorithm that requires an uncountably infinite set of actions.


> It's reasoning about laziness that is difficult for newcomers.

It's tricky for experienced people too.


> How can this paradigm be used to build predictable software that works as intended when we have no guarantee when and where an expression will be evaluated?

s/an expression will be evaluated/commands are fired/

reveals the latent imperative mindset.

FP programs are designed by laying out, not unlike how train tracks are laid out, transformations from input to output. Some of these transformations are too complex to get right all at once, so they get broken down into smaller ones which are then composed together. (Obviously, there are other reasons why composition is a win.)

An FP program "works as intended" because the transformations from input to output are correct. In high-end software, some, possibly even all, of these transformations are rigorously proven correct using mathematical reasoning.

The two main ways of reasoning about the correctness of transformations are with strict or with non-strict semantics. Time/space usage, i.e. the performance characteristics, is treated as an operational concern. That's when the nitty-gritty of eager and lazy evaluation comes into play.

The bottom line is that if all time is spent stressing over evaluation, you're optimizing prematurely.


I feel like I should point out that "we have no guarantee when and where an expression will be evaluated" is also true in languages like C, to an extent. The compiler is free to re-schedule operations that don't have a side effect, just as in Haskell - it's just that this happens less since so many actions could potentially have side effects in C.


I guess the benefit of lazy by default is that (given Haskell is an experiment in purity) it sets the expectation that things will be side-effect free by default.

Were that not a design criterion, it might have been better to make things strict by default and have a "lazy" monad, rather than make things lazy by default and use monads for all the side-effecting stuff.


Total functions embed into partial ones, which is why we have a partiality monad. The opposite arrow doesn't exist, which is why we don't have a totality monad.
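One concrete way to read the first claim in Haskell, using Maybe as a simplified stand-in for the partiality monad meant here:

  -- Any total function embeds into the partial (Maybe) setting...
  totalInc :: Int -> Int
  totalInc x = x + 1

  partialInc :: Int -> Maybe Int
  partialInc = Just . totalInc

  -- ...but a genuinely partial function has no total counterpart to
  -- recover: there is nothing sensible for safeHead to return on [].
  safeHead :: [a] -> Maybe a
  safeHead []      = Nothing
  safeHead (x : _) = Just x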

Now lazy and eager are properly adjectives that govern beta reduction, that is, the interaction between a function and its argument.

The short of it is that neither an eager nor a lazy monad makes any sense.

(Heh, I'm totally citing this in the monads class I teach as proof of why unpacking monads starting from functors is a win.)


Happy to be of service with my naive wonderings :) Anyway, thanks, I'm not much of a PLT guy but I can begin to see why this could not work.


Let me point out that the Haskell/GHC toolset is broad and deep and constantly being expanded, and also that the most common causes of space leaks are becoming well documented (on the wiki and in many Stack Overflow threads). Tools:

ghc-vis, to visualize sharing: http://felsin9.de/nnis/ghc-vis/

--------------

profilers and analysis of space/time behavior of data structures

http://www.haskell.org/ghc/docs/latest/html/users_guide/prof...

http://blog.ezyang.com/2011/05/space-leak-zoo/

http://lambdamaphone.blogspot.com/2011/02/automatic-asymptot...

--------------

and, sort of a stacktrace:

http://jpmoresmau.blogspot.com/2012/10/using-ghci-history-in...

https://plus.google.com/u/0/107890464054636586545/posts/CfKE...

http://hackage.haskell.org/trac/ghc/wiki/Status/May12

-----------

(not related to time/space reasoning, but a common pea under the mattress): in 7.4 you don't need top level "let" in GHCi

http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/inter...


Might as well read the programmers.stackexchange post [1], it's better formatted.

[1] http://programmers.stackexchange.com/questions/163985/why-is...


And Daniel Wagner links to a very good answer on another question: http://stackoverflow.com/questions/7868507/non-trivial-lazy-...


    Lazy languages like Haskell make it a point to be
    referentially transparent, i.e. making sure that the
    order in which expressions are evaluated will never
    affect their result.
Isn't referential transparency a consequence of purity, not laziness?

Edit: While I believe the quote is saying (albeit in a peculiar way) that lazy languages must be referentially transparent, I don't see how this particular response was arguing in favor of laziness. Referential transparency is great as an optimization enabler, which provides one compelling case for purity.

Meanwhile I'm working on a nascent project to develop an eager, pure language that targets LLVM.


> I don't see how this particular response was arguing in favor of laziness.

In a referentially transparent language, you don't care about order of evaluation because whatever the order you will always end up with exactly the same result. Thus, losing control of it through laziness doesn't matter.

However, IMHO, the initial argument is kind of wrong. Laziness doesn't really make you lose control of the order of evaluation. You know when expressions are going to be evaluated: when they are needed. Obviously, when you use a lot of abstractions you didn't write yourself, as you do in Haskell, it becomes tricky. But if you know exactly how your data structures are processed by the compiler, you can safely use side effects with laziness. Not that you should, but you can.
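For instance (a made-up sketch using Debug.Trace as the stand-in side effect):

  import Debug.Trace (trace)

  -- trace is a side effect hidden inside pure code, yet its firing is
  -- still predictable under laziness: it runs when the value is demanded.
  expensive :: Int
  expensive = trace "evaluating!" (2 + 2)

  main :: IO ()
  main = do
    let x = expensive          -- nothing printed yet: x is just a thunk
    putStrLn "before any demand"
    print (x + 1)              -- "evaluating!" appears here, on first demand
    print (x + 2)              -- not printed again: the result is shared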


RT needs to be seen in the context of reasoning about programs.

Suppose we want to define a constant function f such that f x = 0. Under strict semantics in a total language, you're fine. Under strict semantics in a partial language f undefined blows up. So constant functions, the simplest kind possible, don't even exist! OMGWTFBBQ!

You can still reason about programs, but at a very low-level, i.e. on operational semantics. RT is high-level reasoning. Math, which does have constant functions, seems pretty successful, so we'd like a slice of that cake, please.
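The constant-function example, transcribed directly (runnable in GHC):

  -- Under Haskell's non-strict semantics this really is the constant 0.
  constZero :: a -> Int
  constZero _ = 0

  -- Evaluates to 0 under lazy semantics; under strict (call-by-value)
  -- semantics the argument would be evaluated first and the call would
  -- blow up.
  demo :: Int
  demo = constZero (undefined :: Int)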


http://augustss.blogspot.com/2011/05/more-points-for-lazy-ev... is a very good discussion of why laziness has some nice semantic properties from the theory point of view.

And the post it is responding to, https://existentialtype.wordpress.com/2011/04/24/the-real-po... makes some interesting points against it as well.


Whoa, did they just lift content directly off Stack?

I guess it's appropriate for an article on laziness.


No; they have an agreement with StackExchange to allow for republishing of the content (check the byline).


Actually I think it has more to do with content on Stack sites being freely available under a creative commons license.


That would be an agreement to allow for republishing, albeit one offered to anyone under the same terms.


I'm not so sure I'm convinced by the argument for modularity.

Maybe Haskell makes it easier to write generators of possibly infinite sequences, but people use the producer-consumer model all the time in other languages to explore potentially large solution spaces generated on demand.


One of the things Haskell can do with lazy sequences is fuse away intermediate data structures, e.g. take

  map f . map g
and substitute

  map (f . g)
automatically (as a compiler optimization). I don't think that's a safe transformation in general in a non-pure language; not sure about a pure strict language.
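An illustrative user-level RULES pragma with the shape described above (GHC's own list fusion for map is expressed differently, via build/foldr, but the point is the same: the rewrite is only sound because f and g cannot have observable side effects):

  module Fusion where

  {-# RULES
  "map/map illustrative" forall f g xs.  map f (map g xs) = map (f . g) xs
    #-}

  double, square :: Int -> Int
  double x = x * 2
  square x = x * x

  -- A candidate for fusing into a single pass, map (double . square).
  pipeline :: [Int] -> [Int]
  pipeline = map double . map square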


Yes, that's possible in a pure, referentially transparent language (strict or lazy is not the point).


That would be the "implementing your own version of laziness" part of the modularity argument.

In a language like Haskell you never have to do that. Ever. Because the language is lazy.


"Implementing your own version of laziness" makes it sound error-prone and difficult. In fact it's this pattern:

  def move(board, position, player):
    """
      board is an 11-character string: 3 rows of 3 of the characters
        X, O or _, separated by newlines
      player is either X or O, depending on who moves
    """
    return board[:position] + player + board[position+1:]

  def getNextStates(board, player):
    spaces = [ i for i,c in enumerate(board) if c == '_' ]
    return [ move(board, i, player) for i in spaces ]

  >>> print board
  X_O
  _OX
  X_O 
  
  >>> for state in getNextStates(board, 'O'):
  ...   print state
  ...   print
  ... 
  XOO
  _OX
  X_O
 
  X_O
  OOX
  X_O
  
  X_O
  _OX
  XOO
I doubt most programmers exposed only to imperative programming would even notice that there was a pattern in what they were implementing, let alone identify it as laziness.


The problem comes in when you want to write a search function for the game tree. Your search function will make assumptions about how the tree is constructed, i.e. assuming that the children can be produced given only the current board and player, which isn't true for all games.

Your search function will wind up actually having two responsibilities, that is 1) creating the tree and 2) traversing/searching the tree. If you want to do anything else, like add pruning functions (maximum depth, for example), then your search function will also wind up having to manage that.

The problem is, a search algorithm such as breadth-first or A* really has no reason to need to know any of those things. If you passed the search function a pre-computed tree that had already been pruned (i.e. capped at a max depth), the resulting search function would do one thing: search. It would have no need to prune or compute the tree, as that was already taken care of.

This lets you re-use that search function for many other use-cases, rather than tailoring it to a specific game or use-case. In a lazy language, you can pass the search function an infinite tree, and the search function doesn't need to know that it is infinite. It can treat it just as though it's already been computed. Similarly, you can do the same thing with pruning the tree before passing it to the search function.

The end result is that rather than having to intertwine all of these unrelated concepts (generating the data structure, pruning the data structure, and searching the data structure), a lazy language makes it extremely natural (and in fact the default) to write programs where these building blocks are separate entities that can be composed with other pieces of code.

Hopefully that helps make sense of why laziness is a requirement for truly modular code. Note that you can always emulate lazy evaluation with closures (or often, simply using generators), but it's never as natural as just having it be built into the language by default.
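Here is a sketch of that separation (the type and function names are made up):

  data Tree a = Node a [Tree a]

  -- 1. Generate the possibly infinite game tree from a move generator.
  gameTree :: (board -> [board]) -> board -> Tree board
  gameTree moves b = Node b (map (gameTree moves) (moves b))

  -- 2. Prune it, e.g. cap the depth, knowing nothing about how it was built.
  prune :: Int -> Tree a -> Tree a
  prune 0 (Node x _)  = Node x []
  prune n (Node x ts) = Node x (map (prune (n - 1)) ts)

  -- 3. Search it, knowing nothing about generation or pruning.
  breadthFirst :: (a -> Bool) -> Tree a -> Maybe a
  breadthFirst p t = go [t]
    where
      go []                      = Nothing
      go (Node x children : rest)
        | p x       = Just x
        | otherwise = go (rest ++ children)

  -- Laziness lets the pieces compose without materializing the whole tree:
  --   breadthFirst isWin (prune 4 (gameTree nextStates startingBoard))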


The difficulty depends on the language and on what you're trying to do.

In Python the correct translation is usually to replace return with yield and work with iterators. In a language without that feature, well, http://perl.plover.com/Stream/stream.html demonstrates what a solution looks like in Perl. A Java solution to the same issue is going to be... verbose.


I avoided generators because implementing ersatz laziness with lazy language features seemed like cheating - you can read my Python lists as C arrays, but it's much harder to map that onto coroutines without significant handwaving.


Replace the []s with ()s so that generator expressions are returned instead of lists, and the expressions are only evaluated when the generators are consumed.


They wouldn't identify it as laziness because there isn't anything lazy about that code.

You have just perfectly shown that it is indeed "error-prone and difficult". Congratulations.


1) Is there a problem that provably requires asymptotically more memory in a lazy language than in a strict one?

2) Is there a problem that provably requires asymptotically more time in a pure language than in an impure one?


Any impure algorithm on a system that uses pointers can be made pure with, I think, log(n) overhead, which in practice is usually equivalent to a constant factor. But in the real world constant factors matter, in time as well as space (cache locality determines whether work happens on the chip instead of in RAM, instead of on disk, instead of on tape).


Your answer sounds like the solution to the mutable data question, not the pure/impure question. For the question of memory usage in a lazy language vs a strict language, there is no asymptotic difference. The strict program can be viewed as a series of commands, each modifying the state of the program. For example, consider the following code:

  int double(int x) {
    int ans = x;
    ans = ans * 2;
    return ans;
  }

The above code can be viewed as: Input state | execute(ans=x) | execute(ans=2x) | execute(return ans), where execute takes as input both the state of the program and a command. (As an aside, this approach of piping a state variable across functions is how Haskell handles IO.)

The above code can also be viewed as: Input | A(x) | B(x) | C(x)

Evaluating this lazily will get you: C . B . A . Input

What the program wants to return is the final output of C. When the program tries to return C, it will go down the chain of expressions it must evaluate until it reaches one that it can, in this case A(Input). Once A(Input) is calculated, we no longer need Input in memory, so it can be garbage collected. At this point, the program is: C . B . Input'. Because Input' is the same size in memory as Input, we are, at this point, using no more memory than the strict version of the program. We can continue this argument until we eventually arrive at the final state, in which we have our answer.

With this, we can see that any strictly evaluated program can be run lazily with no more than twice the memory requirements (there is a point where we need both Input and Input'), which from an asymptotic standpoint is equivalent.


Yeah, that seems to work. Thanks!

I still wonder about the mutable data question though. Can we do better than logarithmic overhead, or is there a problem for which we provably cannot?


In my experience, I tend to be of the opinion that lazy, immutable data structures with eager evaluation bring you most of the benefits of lazy evaluation with the control of eager evaluation.

Clojure is an example of that. You can plant printfs there and it just works, but you can also set it up to do a huge amount of processing on a huge amount of data and eventually consume and crunch only a small part of it.


What are "lazy, immutable datastructures with eager evaluation"?


  (defn infinite-series []
    (let [x (atom 0)]
      (repeatedly (fn [] (swap! x inc)))))

  (take 10 (map #(* 10 %) (infinite-series)))
The arguments of 'take' and 'map' are eagerly evaluated. The sequence returned by 'infinite-series' is immutable and infinite and lazy, and as returned it actually gets passed directly to the 'map' function which actually operates on it. Finally, we use 'take' to extract the first ten items and thanks to the data structure being lazy, this will actually limit the execution so that only ten items are ever generated and multiplied by 10.

The code would look conceptually the same in Haskell, but if the 10 in 'take' were a variable and could occasionally be 0, in Clojure you can be sure that a debug (println) placed in the 'infinite-series' function would actually get executed. Haskell would probably defer the execution forever, since "take 0" is a no-op. Yet Clojure doesn't generate or multiply anything if we call (take 0 ...).
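For comparison, a rough Haskell counterpart (my own sketch, not from the thread): the producer runs exactly as far as the consumer demands, so with take 0 it never runs at all.

  import Debug.Trace (trace)

  -- An infinite series with a debug trace on each element produced,
  -- roughly mirroring the Clojure snippet above.
  series :: [Int]
  series = map (\n -> trace ("producing " ++ show n) n) [1 ..]

  main :: IO ()
  main = do
    print (take 10 (map (* 10) series))  -- ten "producing" traces fire
    print (take 0  (map (* 10) series))  -- prints [], the producer never runs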


A lazy data structure exposes access methods that exhibit lazy-evaluator behavior. For example, you have a "rectangle" structure that contains position, width and height, and the structure has methods that let you apply matrix transformations to scale, rotate, and skew the rectangle's corners. The matrix methods perform the computation, and then cache that result in case it's needed again.


That is more about functional programming than laziness, but maybe I am splitting hairs. OpenGL, for example, implements the matrix stack functionality, and then it only works for one class. A lot of work, then, to get it everywhere you need it.

Clojure is doing a lot of work with fusion and sequences to transfer laziness from arrays up to the other objects that consume them, but it is harder as you go farther away.



