Portrait of a Noob (steve-yegge.blogspot.com)
77 points by ptn on Dec 22, 2009 | 41 comments



I read this a while back and I still remember the takeaway point from it:

"It's denser: there's less whitespace and far less commenting. Most of the commenting is in the form of doc-comments for automated API-doc extraction"

This is right on the button. I prefer writing/reading an overall README.txt which says concisely what that module/package etc. is supposed to do, and doc-comments where appropriate when something is not obvious. Anything else and I automatically start seeing it as noise, filtering it out as I go along. The worst are the obvious and/or the outdated ones.


There's a big difference between "what"/"how" and "why" comments. "This increments a variable" is a stupid comment because the code already says it, but a note such as "I'm using this particular data structure/algorithm/etc. because ..., even though ... seems like a better choice" can speak volumes. It's hard to make code itself clearly convey intent (careful naming is the main way), and it's the first thing to get buried by verbose code.

If you have documentation about the design of the system and write reasonably clear code, you can document sparsely. (Having fewer comments also gives those present added emphasis.) As with most engineering, it's more about trade-offs than hard-and-fast rules, though.
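A small C++ sketch of the "why" style described above. The function name and the scenario (a list built once and queried often) are made up for illustration, and the cache-locality claim in the comment is an assumption about that scenario:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical lookup helper. Precondition: `ids` is sorted ascending.
bool contains_id(const std::vector<int>& ids, int id) {
    assert(std::is_sorted(ids.begin(), ids.end()));

    // "Why" comment: std::set looks like the obvious choice, but this list
    // is assumed to be built once and queried many times, so a sorted
    // vector plus binary search keeps O(log n) lookups in contiguous memory.
    return std::binary_search(ids.begin(), ids.end(), id);
}
```

A "what" comment here ("search the vector for id") would only repeat the code; the note about the rejected `std::set` is the part a later reader cannot recover from the code alone.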


Yup, that was my point on the "doc-comments where appropriate". Overall design of the system can/should be expressed with a good, well-written README.txt. I know UML is "supposed" to solve that problem but not for me.


I've reduced the amount of whitespace I use as I've learned to read dense code more patiently. You have to read at the right speed and not let your eyes fly down the screen, leaving comprehension behind. Strangely enough, I majored in math, so I've always had this skill, but it's hard for me to apply it to code for some reason.


He's spot-on about Java being a safe-haven for "data modeling" geeks who are afraid of doing real work. Writing a whole bunch of getters and setters in a freshly minted Java class certainly feels like work. However, I think he gets it wrong when he puts OCaml on the extreme "for you to model everything" end of the spectrum:

And Haskell, OCaml and their ilk are part of a 45-year-old static-typing movement within academia to try to force people to model everything. Programmers hate that. These languages will never, ever enjoy any substantial commercial success

Hold on a minute...last time I checked OCaml was a dialect of ML. ML does not ask you to "statically" type everything, in fact it does the opposite. It infers all the types. Sure, it allows you to hint types to the compiler, but this is not necessary. You get all the benefits of "static" type checking through ML's concept of type inference. It's designed specifically to make it less work for the programmer to enjoy the benefits of a sound, correct type system.

If you want to learn more about type inference in ML, check out this short introduction: http://bit.ly/8WEhD8


Haskell also uses type inference. It does have a lot of type system stuff, but because of how the language works the types are not just metadata. The types are often just as important as the rest of your code.


OCaml does type checking at compile-time and therefore is statically typed. It often can infer what those types are, but it doesn't wait until runtime to do so.
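"Inferred" and "static" are not opposites, and that shows up outside the ML family too. A rough C++ analogy follows; it is not OCaml, C++ deduction is far weaker than ML-style inference, and the name `twice` is made up:

```cpp
#include <string>
#include <type_traits>

// No concrete parameter or return type is written out; the compiler
// deduces both at each call site, before the program ever runs.
template <typename T>
auto twice(T x) { return x + x; }

// These checks are evaluated at compile time, so the deduced types are fixed.
static_assert(std::is_same<decltype(twice(21)), int>::value,
              "int in, int out");
static_assert(std::is_same<decltype(twice(1.5)), double>::value,
              "double in, double out");
static_assert(std::is_same<decltype(twice(std::string("ab"))), std::string>::value,
              "string in, string out");

// twice(nullptr);  // rejected by the compiler, not discovered at runtime
```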


Comments are invisible until the moment you need them.

When you understand your code, you don't bother to read the comments. As soon as you forget how the code works, you look to the comments for direction.

But if you didn't revise your comments along with your code, the comments will trick and deceive you until you realize that they're out of date. So you must re-grok the code anyway.

Use comments to describe your code's purpose. Let your code self-document the implementation. If you are particularly clever in your implementation, add a quick comment to explain your cleverness. When you realize you shouldn't have been so clever, be sure to remove the implementation comment as well.


This is why api-level documentation is great: functions are used for a purpose, and they're supposed to be a black box. If you keep your functions small and focused, you can just describe what they do, and it all works out.


I've shared Yegge's disdain for static typing for quite some time. One of the best examples of how awful it can be is type hinting (optional type constraints on parameters) in PHP. Many times I've explained to people why type hints are an awful thing deserving banishment to hell (along with Facebook suggestions and Microsoft product recommendations), but so many PHP programmers seem to love them! Now that I read this I can see why that sort of thinking is bound to exist for a language like PHP (read: noobs).

Recently, however, I've been getting into Haskell and in doing so I found my opinion on static typing left a little beaten in the legs and suffering some obvious facial wounds. Static typing works in Haskell, I'm pretty sure of that. The question I'm left with is, why?

I think it might have something to do with the fact that type checks in class-based OO languages don't even begin to ensure the program correctness that novices are often foolish enough to think they do. Everyone else recognizes the need for all manner of testing, and that's how correctness is proved (albeit approximately) in imperative languages. In Haskell, I've found that type checks go quite some way to proving correctness, no really! (They don't actually prove correctness, but proofs of correctness would be impossible without them.) Without even bothering to do any proofs or testing you get a much more rigorous check for correctness in Haskell than you do when a Java program successfully compiles.

I can't emphasize this difference enough. To me static typing in Java is an annoyance and a lie. And so when I program in a language that teeters between static and dynamic typing, such as PHP, I go out of my way to write libraries that make things more dynamic (I've got one on GitHub) and I call men who advocate the aforementioned type hinting girls' names. In Haskell, it's totally different; it actually works; it's actually useful. Not to mention the clever stuff that you can do with types which I couldn't begin to do justice here given my inexperience.


The difference is, as you say, that the type system in Haskell is a tool that helps you write concise and correct code, whereas the type system in a language like Java (it is hardly the only offender) feels arbitrary and capricious.

I too once thought static typing was nothing more than useless bloat, but then I started using Haskell. Haskell's type classes are actually a useful means of handling abstraction (know something that acts like a monoid? then save yourself some effort and use foldMap). The one thing I did like about Java's class system was interfaces, and Haskell's typeclasses handle that exceptionally well.


I think part of the reason the type system in Haskell works and doesn't seem to get in your way is because of type inference.

Also, Haskell is more strongly (and richly) typed than Java/C++/C#, so having strongly-typed code works really well.

About proving correctness, I guess you may have seen this amusing piece: http://perl.plover.com/yak/typing/samples/slide030.html

The type system found an infinite-loop bug in the code at compile-time.


I'm surprised no one mentions this: good code does not need comments, because it speaks clearly through its variable names and function names.

The Lisp function Yegge gives is horrible in that respect. For example, this piece of code:

  (if (or (= tt js2-LB) (= tt js2-LC))
should really be more like

  (if (matches-js2-line-ending current_token)
No matter "how good you are", code reading is faster if what you read matches what you're doing in English.


A nice reminder that skill and raw brainpower are two different things, and the latter can retard development of the former. I worked with many pieces of code written by one particular coworker of mine who was way too smart for his own good. His code was littered with little formulae like that, repeated again and again throughout a file, and to him it was obvious at a glance what they meant. It was also easy for him to scan through his code and find errors in these expressions, as easy as it is for you or me to see that "(matthes-js2-line-ending ..." contains a typo. I have no idea how he coped with finding and changing every instance of a formula that needed to be changed. For me it was a slog every time I had to read, change, or debug his code.

P.S. The most frequent and flagrant was calculating array indices into multi-dimensional arrays in C++. It's so easy to write a simple class that lets you do this:

  arr(n1 + i, n2 + j, n3 + k) = calc_foo(i, j, k);
instead of this:

  arr[n3 + k + nz * (n2 + j + ny * (n1 + i))] = calc_foo(i, j, k);
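
A minimal sketch of such a class, assuming a row-major layout over a flat `std::vector<double>`. The name `Array3D` and its members are hypothetical, not the coworker's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a flat buffer indexed as if it were nx-by-ny-by-nz.
class Array3D {
public:
    Array3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz, 0.0) {}

    // Lets callers write arr(i, j, k) instead of spelling out the offset.
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data_[offset(i, j, k)];
    }
    double operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data_[offset(i, j, k)];
    }

private:
    // The index arithmetic lives in exactly one place (last index fastest).
    std::size_t offset(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < nx_ && j < ny_ && k < nz_);
        return k + nz_ * (j + ny_ * i);
    }

    std::size_t nx_, ny_, nz_;
    std::vector<double> data_;
};
```

If the layout ever changes, only `offset` has to be edited, which is the whole argument against repeating the formula at every use.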


1) That's not a multi-dimensional array, that's a one-dimensional array simulating a multi-dimensional array.

2) Why not

    arr[n1 + i][n2 + j][n3 + k] = calc_foo(i,j,k)
? It's a bit more work to write the proxy classes, but a Sufficiently Smart Compiler(TM) can make that all go away for you.
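
For the curious, a rough sketch of what those proxy classes might look like, again over a flat row-major buffer. `Grid3D`, `Plane` and `Row` are made-up names, and bounds checking is left out to keep it short:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: each operator[] peels off one dimension and returns
// a lightweight proxy; the last one yields a reference to the element.
class Grid3D {
    class Row {  // result of grid[i][j]
    public:
        explicit Row(double* base) : base_(base) {}
        double& operator[](std::size_t k) { return base_[k]; }
    private:
        double* base_;
    };

    class Plane {  // result of grid[i]
    public:
        Plane(double* base, std::size_t nz) : base_(base), nz_(nz) {}
        Row operator[](std::size_t j) { return Row(base_ + j * nz_); }
    private:
        double* base_;
        std::size_t nz_;
    };

public:
    Grid3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : ny_(ny), nz_(nz), data_(nx * ny * nz, 0.0) {}

    Plane operator[](std::size_t i) {
        return Plane(data_.data() + i * ny_ * nz_, nz_);
    }

private:
    std::size_t ny_, nz_;
    std::vector<double> data_;
};
```

The proxies only hold a pointer and a size, so an optimizing compiler has a good chance of inlining the whole chain down to a single offset computation.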


By leaving out the definition of that long function name you mask the increased cognitive load of your change. If you include the definition of your new function it doesn't look like an improvement anymore.


We've also left out the new code that also has to match things against js2 line endings.

That said, this does require some taste. It is horrible to spread a single logical operation across multiple functions, classes, and files just for the sake of "object orientation" or "self documentation", but judicious application of bottom-up FP really helps code size and readability.


Yet again, the title of reposts needs to be dated. I was bittersweet hopeful this was one of the final 3 posts he has promised us since May.


Just pointing this out, but if you're that eager for a new post, and you didn't know this is an older post, then the end result you desire has been realized, no?


you didn't know this is an older post .... until I started reading it. I don't always recognize these posts by their title.


I'm not a big fan of lots of comments in the code itself, but I very much advocate design documentation.

I once had the pleasure of rewriting an Ada avionics subsystem in C. The code was very tersely commented, and the only documentation I could find on the subsystem was a requirement to the effect of "such-and-such subsystem shall exist". So I was left to figure out how the subsystem worked from the code itself.

That's possible to do, but things would have gone a lot faster if I had had a few pages describing what the goals and overall architecture of the software were.


Things only go faster if the design documentation matches the code. This is, in my experience, never.

In the best case, the design documentation is updated as the code is written and problems are found. However, even then, decisions made while coding will avoid problems, and ambiguities in the design documentation don't get updated when that happens.

I've had to do similar rewrites in the past, and I usually just skim the design documentation to get some idea of what the plan was. However, programming languages are in general the most concise and unambiguous way to represent what a program does.


Old HN thread about this essay: http://news.ycombinator.com/item?id=113244


From the article:

If you're a n00b, you'll look at experienced code and say it's impenetrable, undisciplined crap written by someone who never learned the essentials of modern software engineering. If you're a veteran, you'll look at n00b code and say it's over-commented, ornamental fluff that an intern could have written in a single night of heavy drinking.

I have never seen this so succinctly expressed before.


A guy who works for us just read that and laughed and said:

"To be honest I'd sit there and say both were equally shoddy and unlearned in their own way. There is a sweet spot middle ground that the really really good programmers learn"

I suspect he is right.


I've found that the best compromise is to use verbose "n00b-style" commenting in header files (or wherever your interfaces are defined), and sparse commenting in the implementation. Interfaces are like contracts, so you want to make sure everything (including how corner-cases are handled) is explicitly described. With well-designed and fully-documented interfaces, the implementation code can usually be quickly understood even if it lacks comments and is written in a more "advanced" style.
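
Roughly what I mean, as a C++ sketch. The function `find_first` and its contract are invented for the example, and the header and implementation are shown in one block only to keep it self-contained:

```cpp
#include <cstddef>
#include <vector>

// ---- interface (what would live in the header) ----

/// Returns the index of the first element of `values` equal to `target`.
///
/// Corner cases:
///  - If `values` is empty, returns 0 (which equals values.size()).
///  - If `target` does not occur, returns values.size().
///  - If `target` occurs more than once, the lowest index is returned.
std::size_t find_first(const std::vector<int>& values, int target);

// ---- implementation (sparse comments; the contract is stated above) ----

std::size_t find_first(const std::vector<int>& values, int target) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == target) return i;
    }
    return values.size();
}
```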


This completely does not match my experience. When I first started programming, and even after a year or two in the industry, I used to write zero comments. Or maybe a line or two of comments every few thousand lines of code. Then, I started adding comments in places I thought required explanation and above function declarations. After I joined Google I was taught by smart people to add a comment every ten lines of code or so explaining what the next ten lines of code do.

I still don't write many comments when doing hobby programming at home (after all, the whole point of a hobby is to indulge yourself), but there is not the slightest doubt in my mind that an abundance of comments is a good thing. I've never really seen code I considered overcommented. I see code which is severely undercommented all the time.


The first part seems to be motivating something like a macro system for comments. What if one abbreviated compound situations using a single word or a phrase, along with a separate file with expansions of the abbreviations into more detail? Then one can comment at a more abstract level and still be understandable (using, say, a tool which displays the expansion when hovering over the abbreviation, or a separate window with expansions of all the abbrevs used nearby). Ideally, of course, the abstraction would be captured in the code itself, and the expansion would be the comment before the function or the macro defining the abstraction. So this system will be useful only when there are abstractions in comments which are not reflected in the code.


This is one of the most inspired pieces of writing on programming I've read in quite some time. Acknowledging the difference in preference for code density among noobs vs. seasoned programmers will be really useful during any annoying meetings on programming style I have to attend in the future.

Edit: It got less inspired as I continued to read but the initial idea was good.


If it was poorly written code, I'd rather see those huge comments just so I knew what was going through the author's head. I think if it's poorly written code with "no" comments, then you have even bigger problems.

I want to say I read that entire article, but I only read the first few paragraphs, so, my comment may be off topic somehow. That was a hefty post!


For a second I thought maybe Steve had made a new blog entry :(


That's a very large function for an experienced programmer. In my opinion, each function should do one thing and that thing should be the name of the function.


Only because of comments, an indentation style that pairs function arguments on lines, etc. It's not a big function in the sense you're thinking, and breaking it up further may be counterproductive.


This is one of Yegge's most stupid rants. I strongly recommend the video presentations by SQLite's Dr. Hipp on how to write well-commented code.


Loved it and found myself nodding along.

Anyone else catch the irony that this essay was really long?


His essays are always long; he has a lot to say and he types very quickly :)


I refute Yegge thusly: Which code would you rather maintain?


Surely Yegge would tell you that he would rather maintain the concise code with sparse comments. That's the whole point of the post.


But he would be lying. Yegge would rather maintain the concise code with sparse comments if and only if it were Yegge's concise code with sparse comments. If it were someone else's, he'd rather it were in fact rather heavily commented.


You would be wrong. I have encountered all sorts of code written by others that I have to maintain.

My favorite code has consistently been sparsely commented (though effectively heavily commented with good variable names and appropriate functions) while the absolute worst I've maintained was code that was so heavily commented that it was virtually impossible to see the flow of the program.


To the point where you need to rewrite 10 lines of comments when you change one increment operator? I doubt it.



