Hacker News new | past | comments | ask | show | jobs | submit login
Coding style and undoing indentation hell (virtualdub.org)
76 points by ashishgandhi on June 1, 2012 | hide | past | favorite | 76 comments



I've seen a lot of heavy nesting caused by the belief that functions should only have one return statement. Ex:

    // Original
    function render() {
      $return = null;
      if($condition1) {
        if($condition2) {
          $return = 2;
        } else {
          $return = 1;
        }
      } elseif($condition3) {
        $return = 3;
      } else {
        $return = 4;
      }
      return $return;
    }
But I'd rather read:

    // Refactored
    function render() {
      if($condition1 && $condition2) return 2;
      if($condition1) return 1;
      if($condition3) return 3;
      return = 4;
    }


At least have an honest comparison:

   function render() {
      $return = null;
      if($condition1 && $condition2) $return = 2;
      elseif($condition1) $return = 1;
      elseif($condition3) $return = 3;
      else $return = 4;
      return $return
   }


Another example

    r = 0;
    outer: {                                                                     
    foreach (x in a) {                                                           
      foreach (y in b) {                                                         
        if (condition(a, b)) {                                                   
          r = 1;                                                                 
          break outer;                                                           
        }                                                                        
      }                                                                          
    }                                                                            
                                                                                 
    return r;                                                                    
                                                                                 
instead of

    foreach (x in a) {                                                           
      foreach (y in b) {                                                         
        if (condition(a, b)) {                                                   
          return 1;                                                              
        }                                                                        
      }                                                                          
    }                                                                            
                                                                                 
    return 0;


I was comparing what I frequently see vs what I'd like to see. Your example code is pretty rare in my experience. So I think the comparison is pretty fair.


I worked on a program where the management had near-religious devotion to the one return statement paradigm. It made for a bunch of unnecessary nesting and locals just to comply. Even worse was strong resistance to using continue and break (outside of switch statements), which lead to unnecessary flagging variables on top of that.



I would show them http://www.cis.temple.edu/~giorgio/cis71/software/roberts/do... and hope it changed some minds.


Yes, the factored version is much clearer. Though you want to make sure the refactored version is equivalent, which it isn't in your case due to a change of order.


Sorry, should be fixed now.


Maybe so, but what happens when you need to do something before returning the value, say like closing a connection? You'd have to make the change in 4 spots (and hope you don't miss one), as opposed to the first example where you'd only have to make the change in 1 spot. It's all about minimizing errors later on.


Wrap in another function.

ETA: I'm amused by how easily the LISP philosophy applies here: any problem can be solved with enough lambdas.


I know, right? I was writing this complex OO solution to a problem in Python, then I realized I could just do it with a function returning a closure.


> Maybe so, but what happens when you need to do something before returning the value, say like closing a connection?

Then your choices include a try...finally block, a "using" statement if you're in c#, or ... limiting yourself to only 1 return. Multiple returns don't fix the problems in all cases, just most of them.

I'm assuming that this is on a language that has automatic garbage collection. The 1-return rule came about in C, which does not and so has a lot more cases where manual clean-up is required.


Then do something like hxa's example. Maintaining a return variable isn't always bad, it's just weird to think of it as functionally better than multiple returns. I tend to find a lot of errors and bugs in heavily nested code because it involves tracking a lot of variable state.


It's probably a matter of style at some level.

I hate this style (from the article):

  if (...) {
        if (...) {
            ...
    } else {
        if (...) {
            ...
        } else if (...) {
            ...
        } else {
            ...
        }
    }
and prefer this:

  if (...) 
  {
      if (...) 
      {
        ...
      } 
      else 
      {
          if (...) 
          {
            ...
          } 
          else if (...) 
          {
            ..
          } 
          else 
          {
            ...
          }
      }
  }
The first example is confusing and hard to read. The second is clean and easy to read.

As to the question of optimization, that's a different matter. Lots of tricks can be used to compress code into fewer statements on a case by case basis:

  if(pushbutton == 0)
  {
    led = 12;
  }
  else
  {
    led = 24;
  }
can become

  led = (pushbutton == 0) ? 12 : 24;
or

  led = (pushbutton * 12) + 12;
or

  led = 12 * (pushbutton + 1);
etc.


I find the first version easy to read, and the second confusing and hard to read. I believe this is a matter of what a person is more accustomed to, and I learned on the first form and lately have been using the first form more. For me the extra brace below the keywords makes it hard to find the phrase of interest, and the expanded vertical space also makes it more difficult to line up indentation levels. But these visual cues are flexible, and I could learn similar scanning tricks for the second form.

I have worked with people that were dogmatic that one form was objectively better than another form, rather than it being a matter of local optimization in the brain; I have difficult working with these people because they tend to show similar rigidity and confusion of the subjective with the objective in other areas, and such small mindedness limits what type of engineering solutions one is able to come up with.


Agreed. If I was forced to pick it would probably be the first one, but other than that I just don't care.


Did either of you pick-up the fact that the first version is missing a curly brace? :)

Hard to see. That's my point. And that's why I tend to favor the second style.

I'll use any convention if required, but if given a choice I go for styles that protect me from my own dumb mistakes as much as possible.


An early history of writing code on 24 x 80 terminals has trained me to prefer K&R style (where the bracket is on the same line as the if) over the expanded style. The goal was to have as much of the code on the screen as possible so vertical space was precious. Of course when you've got a 1080p monitor on its side you get all the vertical space you need and the open style does give you more room.

I strongly avoid however converting logical semantics into arithmetic ones. The LED example where:

   led = (pushbutton * 12) + 12;
Makes me cringe.

You know that the person who wrote that assumed 'non-zero' meant it was pressed, not '1' means its pressed at some point the pushbutton is connected to bit 3 of the I/O port and the the kid was doing a byte read 'cuz it was easy' and now the value of pushbutton is 8 (which still would work on their code but not the refactored code) and led gets the value 108 and the code that takes the LED value turns on some random GPIO bit somewhere else because it can "only" get either 12 or 24 when now it can get a lot of different values. And a long escalation in manufacturing that the 'new software screws up the gizmo' turns out to be a side effect from refactoring. Been there, done that, wrote the after action report. No thanks!


Just a silly example. Real I/O code should have lots of checks and balances.


Two responses:

First I understood that you were not seriously proposing this sort of I/O code, but as someone who has been around the block my share of times I know that such code both exists and is in production, almost exactly as written. The memories are like a war veteran who can't forget the trenches, I wince now and then :-).

Second, my comment was that code which has a 'boolean' flow is probably boolean code, and not arithmetic code. So any refactoring that 'assumes' boolean and uses arithmetic becomes a landmine for some future coder to trip over. MUCH better to refactor the code as such into:

   led = (button)? 12 : 24; 
Than "assuming" that true is '1' and false is '0'. When reading old code I've trained my self to read the statement

   if (variable == 0) {
      // this code assumes the variable is zero 
   } else {
      // this code has to work for all non-zero values
   }
Which may not have been the programmer's intent, but it is what they wrote.


I understand. This was about coding style and not about solving a particular problem correctly. Next time I might choose a better example. Maybe I'll include complete keyapad matrix scanning, debouncing and encoding code...just to be safe. (just kidding).

It's been a long, long time since I've checked button activity that way. These days andy time I do embedded work I tend to use either my own RTOS or a third party RTOS. In most of these cases UI events get turned in to messages in a message loop (sort of like programming for Windows). So it is likely that there's a big switch statement somewhere picking picking off messages off the FIFO and processing them accordingly. The low-level stuff happened somewhere else. The message would indicate the event that took place (UP, DOWN, CLICK, DOUBLE_CLICK) and parameters in the message packet would identify the button or buttons (in the case chords are enabled) that generated the event.

Fun stuff. Wrote this code years ago and keep reusing it. It simply works. Now doing a lot more iOS and web development. iOS can feel a lot like embedded development at times.


I prefer this:

    if 
      if
    else
      if
        
      else if
      
      else
In other words, I prefer CoffeeScript.

Good procedural code is factored into short functions, which would make some of the types of programs the article is alluding to easier to read.

Also, object oriented programming can be a good option for reducing cyclomatic complexity in a lot of different situations where the system you are modeling lends itself to a class structure. And CoffeeScript makes this easier also.


""" can become

    led = (pushbutton == 0) ? 12 : 24;
or

    led = (pushbutton * 12) + 12;
or

    led = 12 * (pushbutton + 1);
"""

It might be worth pointing out that the latter two don't actually have the same result as the if/else.


How so?


In C at least, pushbutton could be strictly greater than one and still be treated as true, but that would be a different result in the arithmetic versions of the statement. It could be made identical by using "!!pushbutton".


I think you might be taking what was intended to be a simple example a little too far.

That said, "pushbutton" could very well be limited to the values 0 and 1 during the keypad scanning, debouncing and decoding code.

Chill bro.


If pushbutton is neither 0 nor 1.


Where pushbutton == 2, 36 != 24


I also find the second style much more readable.


I agree, the second style is infinitely more readable. It clearly shows the hierarchy. Unfortunately it seems to be rare compared to the second one, as most people seem to want to hide the opening braces or something. Maybe because they come from ugly languages without braces, like Assembly, Python, Fortran, or Ruby...


I don't know why you were downvoted.

I've programmed in everything from entering hex code into a hexadecimal keypad to assembly, Forth, APL and beyond. Sure, languages can influence style. I wouldn't go as far as calling any language "ugly". There's that initial phase of exposure when one might not be comfortable with the language constructs. And yes, at that moment in time it is natural to feel that a language might be "ugly". I certainly felt that way when learning Objective-C. Once you get on the other side and you start using the language for real work things can change. I don't hate any particular language but do enjoy some more than others. Forth, APL and Lisp were a lot of fun to work with.


Don't forget that in many cases, you can replace a conditional with polymorphism. http://c2.com/cgi/wiki?ReplaceConditionalWithPolymorphism


You certainly can, but it doesn't mean that you should.

Case in point - Linux networking code, and more specifically sockets handling part. Mr. Kuznetsov did an excellent job abstracting a lot of things behind interfaces (= blocks of function pointers), but at the same time this made it much much harder to follow the code. It's basically impossible to trace the code by hand without a pen and paper.

(edit) To elaborate on the context - reading and tracing the code straight off the source files is a necessity of kernel development, routinely required for both debugging and writing new code.


Spend this afternoon figuring out GLOBAL and setting up to work with your editor.

http://www.gnu.org/software/global/


How would this help to understand what sk->ops[SK_READ]->foo[index](bar) exactly calls?


It lets you quickly jump to the definitions of the variables, their types, and their uses. (And then back to where you were.)

It's not going to do all of the work for you, but it'll reduce some of the cognitive load.


This is the same as ctags then, and it's completely irrelevant to the issue with over-abstracting things in the code.


Or pattern matching, if you're lucky.


This is a better suggestion than most of the article. Especially alongside a classic guideline like: "code to interfaces, not classes".


This -

  if (!test()) {
    handle_failure();
  } else {
    // do lots of work
  }
is just another incarnation of

  if (NULL == v)
i.e. it sacrifices a natural flow of code and makes it harder to follow for something of a superficial value.

Also this -

  if (!test1()) {
      handle_failure_1();
      return false;
  }
is fine unless the function needs to do a cleanup, which is pretty much always the case, and which in turn means that the { } block with grow with every if and it will include a lot of redundant code.

A far cleaner and easier to follow approach is to use if-goto, whereby all cleanup code sits at the bottom of the function (where it naturally belongs) and the flow jumps to the appropriate part of the cleanup on failure. The only exception is when there's no cleanup needed, in which case it can do if-return.


I use both if-return and if-goto. In practice you need both, depending on circumstance. In much of my recent code I've managed to use a lot of if-return checks at the top of functions. Another nice thing about a bunch of if-returns at the beginning is it makes a nice checklist of things that will cause the function to fail, all right up front.


> a nice checklist of things that will cause the function to fail

Should these be assert()s? ;)


Not sure if serious, but assert only makes sense in cases where you want to switch off the tests at some point right?


Asserts are for enforcing invariants in the code. If you replace an assert with a conditional that allows the execution to continue, you are concealing the problem and making it that much harder to track.

Practically speaking, your function should not be failing because your other code passed it invalid parameters. That's a bug in the former. That's why parameter validity should be asserted.


I see. I usually never let execution continue and log + exit instead, but you're right, an assert is easier in that situation.


Sometimes, yes.


Makes sense. However, it was pretty well covered in the article as well.. Did you read the whole thing?


Yes I did, and it is only mentioned in passing.


The broad suggestion that indentation is somehow wrong is ill-advised.

Sure, various of the refactorings given are sensible. But the basic problem is not indentation. If something can be simplified, simplify it. If something is too long, break it into pieces. But if something is tree-structured, that is what it is, and it probably ought not be bent out of shape.

(Note: Fowler's refactoring 'Replace Nested Conditional with Guard Clauses' is very specific -- it does not declare that all nesting is bad.)

A tree-like-indentation structure is a perfectly good element of programming -- why try/want to change it into something else?

In fact, a good case could be made that software in its most fundamental essential conception is tree-structured -- even that engineering generally is tree-structured.


At this point in my coding life (15 years in), I'm pretty strongly against if / else clauses. The readability of such code just plummets in my view.

I try to limit my conditionals to inline statements:

  $x = $y if $b;
Often joined with early returns:

  return $b if $a;
If I find myself coding an if/else branch, that's often a clue that I should refactor this chunk into a function that can use early returns to avoid nested conditionals:

so

  if ($a == FOO) {
    $c = $d;
    }
  else if ($a == BAR) {
    $c = $e; 
    }
  else {
    $c = $f;
  }
becomes

  $c = get_val($a);

	...

  sub get_val {
   my ($a) = @_;
   return $d if $a == FOO;
   return $e if $a == BAR;
   return $f;
   }

Thankfully much of my coding lately is in perl, which lets me use postfix "if/unless" as well as "and/or" to avoid having to use braced conditionals.

I also love myself the ternary operator. Basically my take comes down to "use conditionals for one-line assignments only, NOT for block execution".


I'm glad I am not the only that is particular about these things. Computers may not care which order your IF statement is in... but some people completely disregard that their code is going to be read by humans too.


Generally good advice, but w/r/t the first recommendation, I prefer to structure if/else clauses so that common case is handled first and the exceptional case is handled last.


As shown in the article, that first recommendation leads directly to the next step(s). In the ideal case (which actually happens sometimes in the real world!), you end up with something like this:

    if (bad1) fail();
    if (bad2) fail();
    if (bad3) fail();
    
    step1();
    step2();
    step3();
    step4();
    return SUCCESS;
To me, this is a very nice way to structure things when you can. You get a checklist of things that might be wrong, and then comes the wonderfully naive straightforward implementation.


Stating failures first can also be considered analogous to input validation, preconditions, or declaring the ___domain for which the function is undefined. Some related discussion on here: http://c2.com/cgi/wiki?GuardClause


I like to do this as well, both for reasons of clarity and because I like to imagine that it saves my code from executing an often-wasted conditional (though I really have no idea if the order of branch condition evaluation is guaranteed in the face of compiler transformations, or if branch prediction and/or speculative execution make this point completely moot).


meh. when i'm looking for a bug, i tend to start in the places with lots of branching. in general, high quality code doesn't have much branching, and if you find yourself neding complicated and nested branching, there is probably a better way to factor your code.

one easy trick is to structure your code such that 'null' is impossible, by initializing to an empty value, or re-ordering the way things happen, or using exceptions or error-monad.


I have to run so this is really a substandard comment for hacker news, but... Really? He advocates adding multiple exit points to a function as a way of REDUCING COMPLEXITY? The solution when you have too deep of nesting is to pull out inner parts into useful simpler stateless functions, or so I was led to believe!


He's preaching to the choir in my case. I'm a big fan of busting out of functions/methods early when you already know "it's over". (sometimes this means putting the cleanup in something like a try/finally wrapper routine, though)

But, kudos: I can send this to my team instead of having to write it myself. Win!


The goto cleanup strategy can be improved upon by using multiple goto labels, like here: http://vilimpoc.org/research/raii-in-c/


Or just test the resource handles you opened, and only close them if they're valid. That way, you don't have to worry about doing a goto to the wrong label, or messing up the order of cleanup.

    BOOL do_stuff_with_resources()
    {
        int success = FALSE;
        
        char *memBuf = (char *) malloc(256 * 1024);
        if (NULL == memBuf) goto cleanup;
        
        if (randBomb()) goto cleanup;
    
        FILE *fp = fopen("dummy-file.txt", "w");
        if (NULL == fp) goto cleanup;
        
        if (randBomb()) goto cleanup;
    
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (-1 == sock) goto cleanup;
        
        if (randBomb()) goto cleanup;
    
        int shmId = shmget(0x1234ABCD, 256*1024, IPC_CREAT);
        if (-1 == shmId) goto cleanup;
        
        shmRegion = (char *) shmat(shmId, NULL, 0);
        if ((char *) -1 == shmRegion) goto cleanup;
    
        if (randBomb()) goto cleanup;
    
        sem_t st = -1;
        if (-1 == sem_init(&st, 0, 1)) goto cleanup;
        
        if (randBomb()) goto cleanup;
    
        
        // TODO: Do stuff
    
        success = TRUE;
    
    cleanup:
        if(-1 != st)
        {
            sem_destroy(&st);
        }
        if(-1 != shmId)
        {
            shmctl(shmId, IPC_RMID, NULL);
        }
        if(-1 != sock)
        {
            close(sock);
        }
        if(NULL != fp)
        {
            fclose(fp);
        }
        if(NULL != memBuf)
        {
            free(memBuf);
        }
    
        return success;    
    }


If you do your indentation "wrong" and nothing breaks, then you will do your indentation "wrong". That's why I find whitespace-sensitive languages easier to read, YMMV.


The article talks about unwinding deeply nested control structures, which is equally possible in whitespace sensitive languages.


He's basically a step away from having several objects (let's call them handlers) which can do your handling for you. Further, they can define the set of circumstances in which they are valid.

I'm going to work with a couple of assumptions:

1) The test casts which he refactors to test*() methods are non-trivial 2) The handlers _may_ be large sets of code

Here's a way to handle one of this refactored methods:

if (!test1()) { handle_failure_1(); return false; }

if (!test2()) { handle_failure_2(); return false; }

if (!test3()) { handle_failure_3(); return false; }

handle_success(); return true;

That's a lot of potential code we're dealing with in this class.

What if:

class HandlesFailure1

  def handles_this_case(params)
     #code from test1()
  end

  def handle(params)
    #code from handle_failure_1()
    return false;
  end
end

class HandlesFailure2

  def handles_this_case(params)
     #code from test2()
  end

  def handle(params)
    #code from handle_failure_2()
    return false;
  end
end

class HandlesFailure3

  def handles_this_case(params)
     #code from test3()
  end

  def handle(params)
    #code from handle_failure_3()
    return false;
  end
end

class HandlesSuccess

  def handles_this_case(params)
    return true
  end

  def handles(params)
    //code from handle_success
    return true
  end
end

#And now the original class

class DoesSomething

  #There are ways to set this up fairly easily. One simple way it to just new it up here, but you could auto-wire it too
  handlers = [HandlesFailure1, HandlesFailure2, HandlesFailure3, HandlesSuccess]

  def do_something(params)

    handlers.each do |handler|
      return handler.handle(params) if handler.handles_this_case(params)
    end

  end
end

There. Now there's a loop and an if. Check each case and either handle or toss it out. This can also work for the case where you might want multiple handlers to handle something, just don't return on the first one.

I'm not saying it's the right way, but it's another way.

Pros: It really helps to group together the handling methods with the test methods. Very simple dispatch method

Cons: Separate classes, and harder to view all the handlers together.


I agree with most of what was said, especially the use of goto! I don't like success boolean, but that's probably just me.


I used to be a big goto-error-cleanup proponent, but ultimately gave it up. The resulting code is cleaner, but the bug it's trying to treat (returning without correctly cleaning up) is still too easy. Complicated functions end up with 4-5 cleanup labels and it's hard to look at them and understand the resource acquisition order.

Generally these days I prefer functional decomposition for that (actually honestly I'm preferring C++ RAII style where you get the cleanup for free in almost all cases). That is, do something which requires cleanup (e.g. allocate memory, take a mutex, etc...), call a function to "initialize" the thing you just did, check results and free it on failure. So the allocation and cleanup happen next to each other in the code. Nest as needed for complicated resource strategies, and it scales well. No one is ever confused about what the state is supposed to be.


The problem isn't with 1 resource. It is with 2 or more, right? You successfully allocate the first, but the second fails and so you have to go back and deallocate the first. The goto strategy avoids code repetition.

RAII is the best part about C++!


I stopped reading when he recommended if-goto.


Usage of goto is fine (the effects are easily understood) when the destination only points to lines ahead of the goto, within the same block.

In this case, it makes the control flow more readable.

http://www.kerneltrap.org/node/553/2131


What if it were called "try/finally" instead?


I read the comments just expecting yours. Something must obviously bad if Dijkstra says so, right? We don't have to think for ourselves anymore if the god of CS himself has banned something, right?


Sure, goto was abused, but it's no worse than (or even better than) using continuations or using break and continue.


Any of those mysterious jump-to-somewhere-else code features resemble goto. Not that I don't think they clean up some nesting issues, but they're pretty much the same thing as goto.


Too bad, because it's the only sensible advice in the whole post.


Someone would sure enjoy golang!


More specifically, the `defer` keyword lets you handle cleanup in the case of failure a bit more neatly.

The gotos can be replaced with stuff like this:

    foo, err := bar()
    if err != nil {
      return err
    }
    defer foo.cleanup() // or what have you
Now, right before the function returns, Go will call `foo.cleanup()` and your `cleanup()` call sits right next to the initialization code. It's a lot easier to tell whether or not you are (or someone else is) doing cleanup properly.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: