
For all the talk of "best practices" and "training", the depressing truth is that guaranteeing correct software is incredibly difficult and expensive. Professional software engineering practices aren't nearly sufficient to guarantee correctness in heavily mathematical code. The closest thing we have is NASA, where the entire development process is designed and constantly refined in response to individual issues, creating checks and balances with the lofty goal of making bugs organizationally impossible. Unfortunately this type of evolutionary process is only viable for multi-year projects with 9-figure budgets. It's not going to work for the vast majority of research scientists with limited organizational support.

On the positive side, such difficulty is also in the nature of science itself. Scientists already understand that rigorous peer review is the only way to reach reliable scientific conclusions over time. What they need help understanding is that the software used to reach those conclusions is at least as suspect as the scientific data collection and reasoning itself, and therefore all software must be peer-reviewed as well. This needs to be ingrained culturally into the scientific establishment. Then scientists can begin to attack the problem from the correct perspective, rather than having industry software experts come in and feed them a bunch of cargo cult "unit tests" and "best practices" that are no substitute for deep reasoning in the specific ___domain in question.




I've spent a bit of time on the inside at NASA, specifically working on earth observing systems. There is a huge difference between the code quality of things that go into control systems for spacecraft (even then, meters vs. feet, really?) and the sort of analysis/theoretical code the article talks about. Spacecraft code gets real programmers and disciplined practices, while scientific code is generally spaghetti IDL/Matlab/Fortran.

There is a huge problem with even getting existing code to run on different machines. My team's work primarily consisted of taking lots of project code (always emailed around, with versions in the file name) and rewriting it to produce data products that other people could even just view. Generally we'd just pull things like color coding out of the existing code and then write our processors from some combination of specifications and experimentation.

I'd agree that "unit tests" and trendy best practices are probably not the full answer, but the article is correct in emphasizing documentation, modularity, and source control. Source control alone would protect against bugs produced by simply running the wrong version of code.
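As a minimal sketch of that last point (assuming Python and git here, neither of which the workflow above actually used, and with helper names made up for illustration), a processing script can stamp the exact commit it was run from into the data products it writes, so "which version produced this file?" stops being a guess:

  import json
  import subprocess

  def current_commit():
      # Ask git for the hash of the commit actually being run.
      # Fails loudly if the script isn't inside a git checkout.
      return subprocess.check_output(
          ["git", "rev-parse", "HEAD"], text=True
      ).strip()

  def write_product(data, path):
      # Embed the code version in the product itself, instead of
      # encoding versions in emailed file names.
      with open(path, "w") as f:
          json.dump({"code_version": current_commit(), "data": data}, f)

Anyone picking up the product later can check out exactly the commit that generated it.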


Definitely. Obviously the software industry has a lot of know-how that would be invaluable to the science community. The critical point I was trying to make is that scientists need to understand the fundamental difficulty of software correctness before they can be expected to apply best practices effectively.


  the depressing truth is that guaranteeing correct software
  is incredibly difficult and expensive
There is a world of difference between the correctness of industrial programs that follow 'cargo cult best practices' and the correctness of scientific programs, and that difference is achieved without incurring incredible expense. That we can't go all the way (by any practical definition) doesn't mean we shouldn't try to get further.

One of the main problems is convincing scientists, especially young ones, that their code sucks. Young programmers you can coach: you review their code, teach them what works and what doesn't, and they get better. Scientists who happen to write programs don't learn to become better programmers; they've got other things to worry about. There's nobody to help them, and since they're usually highly intelligent and overestimate their capabilities in things they don't want to spend time on (which is a way of justifying to yourself not spending time on it), they need all the more guidance to become good.


That's why my main point was that scientists need to be taught that code is as likely a source of errors as anything else in the scientific process.


I agree that correct software cannot be achieved by industry's practices.

BUT isn't it better to use "cargo cult best practices", as you call them, than code-and-fix without any kind of formal test or documentation?

The whole point of these software programming practices is to improve overall quality with limited resources, not to craft perfect code.
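
To make that concrete, here is a minimal sketch of the kind of cheap regression test a code-and-fix workflow never gets (Python with pytest, and a deliberately trivial hypothetical conversion function that echoes the meters-vs-feet remark upthread):

  import pytest

  def feet_to_meters(feet):
      # Hypothetical unit-conversion helper, used only as an example.
      return feet * 0.3048

  def meters_to_feet(meters):
      return meters / 0.3048

  def test_known_value():
      # One international foot is exactly 0.3048 m by definition.
      assert feet_to_meters(1.0) == pytest.approx(0.3048)

  def test_round_trip():
      # Converting there and back should recover the input.
      assert meters_to_feet(feet_to_meters(123.4)) == pytest.approx(123.4)

Even tests this small pin down previously verified behaviour, so a later "quick fix" that silently changes results gets caught before it contaminates a paper.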



