I think there are multiple reasons for this problem, and only one of them is a lack of training in software management. Another is that science is an inherently exploratory process. You design an experiment, gather some data, and then go about analyzing it. You have an idea of what you'll find, but depending on what you get, you might then need to reformat or restructure the data, transform it, cut it up, etc.

This is one of the worst cases in software design: evolving requirements. By itself this is bad enough. I have recently been analysing data from a study, and it goes something like this. You start off with a data structure that you think represents things, but then you notice, for example, that you need to synchronize several recordings; now you have to track time. You realize some recordings need to be split down the middle to aid in synchronization; now you need to add a 'part' field. You derive some value from several data points that takes a long time to compute, so you need to create a file to hold it, and that file needs to be kept in sync with the original data. Eventually you realize that text files aren't going to cut it; you start moving things to a database. Now you need to reconfigure your visualization program to read from the database. Then you realize that you want to add another similar derived value, but this time it's a 3x3 matrix for each data point; time to extend the database again. And so on. Eventually you decide it would be best to rewrite the codebase because it's becoming impossible to work with. Unfortunately the paper is due soon and you just need to generate a few more graphs...

And I didn't even mention the growing directory of scripts that aren't properly organized into modules, that end up with copy-pasted code because it's not very clear how to cleanly put this into a function, or which module it should belong to.

Now, this is bad enough when you have a CS degree and have designed several software frameworks in your life. Combine this with someone who knows nothing about software architecture and you have a really big problem on your hands. My point is this: it happens to the best of us, no matter how hard you try to organize things, when you don't have the requirements available ahead of time.

The best approach I've found is to force myself to write functions that are as small as possible and do one simple thing at a time. I try to break up functions as much as possible for reuse, and avoid copy-pasting code at all costs. Admittedly it's not always easy. Sometimes a function that generates a particular graph just needs a certain number of lines of logic, and it's very difficult to modularize. Then you find that you want a similar graph but with a slightly different transformation on the Y axis, and so on.
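
As a rough sketch of the "small functions" idea, here is one way the Y-axis case might look. Python is an assumption (the comment names no language), and every name below (`transform_y`, `y_range`, `prepare_series`) is hypothetical. The point is only that the transformation is passed in as a parameter, so a "similar graph with a different Y transform" becomes a different argument rather than a copy-pasted function. The actual plotting call is left out on purpose.

```python
import math
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]  # (x, y) pair; hypothetical stand-in for real data


def identity(y: float) -> float:
    """Default transform: leave the value unchanged."""
    return y


def transform_y(points: Iterable[Point],
                fn: Callable[[float], float] = identity) -> List[Point]:
    """Apply fn to the y value of every point, leaving x untouched."""
    return [(x, fn(y)) for x, y in points]


def y_range(points: Iterable[Point]) -> Tuple[float, float]:
    """Return (min, max) of the y values, e.g. for setting axis limits."""
    ys = [y for _, y in points]
    return min(ys), max(ys)


def prepare_series(points: Iterable[Point],
                   y_transform: Callable[[float], float] = identity):
    """Compose the small steps; a plotting function would consume the result."""
    transformed = transform_y(points, y_transform)
    return transformed, y_range(transformed)


# Made-up sample data, purely for illustration.
raw = [(0.0, 1.0), (1.0, 10.0), (2.0, 100.0)]

linear, linear_range = prepare_series(raw)          # plain Y axis
logged, log_range = prepare_series(raw, math.log10)  # log-scaled Y axis
```

Each piece can be reused or swapped on its own when the requirements shift again, which is about the best you can hope for when they aren't known up front.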