Hacker News | diegolo's comments

+1


https://www.amazon.co.uk/Leadership-Self-Deception-Getting-O...

life changing, I reread it every now and then


> The job of a scientist is really not to ship software, that's what a team of engineers would do.

I think that this is the real problem - in academia there is this idea that learning good practices is like a 'dirty' thing that is not required, when in fact it would speed up the work and make it more reliable. If you look at chemistry or medicine, researchers there have good practices for managing the lab and respect them.


> in academia there is this idea that learning good practices is like a 'dirty' thing that is not required

I think you got me wrong. Shipping quality software is not 'dirty', but it requires a specialised focus. One cannot do everything oneself - science and engineering are complementary skills. In your example of chemistry, the chemist who designs a molecule does not spend time shipping the molecule to the world.


Except that it wouldn't speed things up at all. Academia writes run-once code, which changes spec fourteen times in one week. Their use case is orthogonal to industry.

Have you considered that maybe the academics actually know what they are doing?


Lol I spent 5 years in academia, and I have a PhD in CS - I know what I'm talking about. Specs change in academia just as they do in industry, and I was still able to write unit tests and document my code in academia. And I know that in medicine and chemistry the time to publish is much longer - but that has nothing to do with the fact that they know how to properly use a microscope, clean the lab, and keep an inventory.

If you don't write unit tests, how do you hedge against the possibility of bugs in your code?
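
To make the unit-test point concrete, here is a minimal sketch of what a test for a small piece of research code could look like, using Python's standard `unittest` module. The `normalize` helper is a hypothetical example invented for illustration, not code from anyone in this thread.

```python
import math
import unittest


def normalize(values):
    """Scale numbers to zero mean and unit variance (hypothetical helper)."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0:
        # Constant input: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


class NormalizeTest(unittest.TestCase):
    def test_zero_mean_unit_variance(self):
        result = normalize([1.0, 2.0, 3.0, 4.0])
        self.assertAlmostEqual(sum(result) / len(result), 0.0)
        variance = sum(r ** 2 for r in result) / len(result)
        self.assertAlmostEqual(variance, 1.0)

    def test_constant_input_does_not_divide_by_zero(self):
        self.assertEqual(normalize([5.0, 5.0, 5.0]), [0.0, 0.0, 0.0])

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize([])
```

Saved in a file, this can be run with `python -m unittest <file>`. The edge cases (constant and empty input) are the kind of thing that tends to break silently in run-once scripts.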


Most scientists have no training in computer science, much less engineering, but still need to do it sometimes to build experiments. They've largely taught themselves. You are not the norm.

I've taught dozens of grad students enough programming to get the job done and it would have been a total waste of time to make the code that robust. They need experimental results next week, with only one computer ever expected to run the code, not a product demo.

The software isn't their research project, it's a nuisance that they have to deal with. Accordingly they neither want to nor have time to do it perfectly. I cannot blame them.

That said, there should be a system to encourage actual trained programmers to get involved, including coauthorship and consideration in tenure decisions. The current system is bad; I'm just saying it's not the scientists' fault here. This is just literally not their domain of interest or expertise, and I would rather they focus on the thing they're uniquely good at.


I agree - the message I wanted to communicate is the same :) I never thought the problem was the scientists ;)


The authors of HMMER know what they are doing. That's an extremely rare situation.


> if you look at chemistry or medicine, researchers there have good practices for managing the lab and respect them.

Their studies / experiments last years.

In CS/ML/Applied Math you sometimes have to write an experiment with a deadline next week. Excuse me if when I'm trying to scramble for a deadline at 3am I don't have my mind toward TDD or I'm not neatly packaging everything in a docker.


Hey, I feel you - and I understand the pressure - I have been in that situation. The point is that this:

> you sometimes have to write an experiment with a deadline next week.

shouldn't happen. And yes, at the moment it is like this - sometimes you will have to hack. But if the whole community starts to push for proper practices, instead of just saying "it is what it is", there will be fewer papers, of higher quality.


You can put them in your library/desk to impress your friends


I keep mine in clear view on the shelf in the hope that its collected wisdom will radiate outward and suffuse into my code. Not happened yet, but perhaps it has a useful psychological effect as a shrine to algorithms; whenever I am tempted by a quick, cheap hack I see the books and am steered back to the righteous path.

Actually I usually just do the cheap hack anyway but it is reassuring to know that it is there.


Not in the UK


"a general search can be machine learning" - I don't get this sentence. Machine learning is about building a mathematical model from sample data, known as "training data".

If you want to talk about machine learning and search you should probably talk about learning to rank (https://en.m.wikipedia.org/wiki/Learning_to_rank)


I'd argue that you're too restrictive in your definition. e.g. unsupervised clustering has no sample training data.

The usual definition (due to Mitchell) is that machine learning is a system whose performance on a given task improves with experience.


Actually, any unsupervised method, including clustering, still has training data. The only difference is it doesn't have a target y variable in the training set to minimize the error metric, hence the name unsupervised.

But the definition you mention is right. Still, any dataset that you use to fit your model is your training set, even if you don't have a train/test split or the like, because you used it to train your model.
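
As a rough illustration of the point about fitting, here is a minimal sketch of plain Lloyd's-algorithm k-means on 1-D points in pure Python. The data values, the starting centroids, and the function names `kmeans` and `assign` are all made up for the example. Note there is no target variable anywhere, yet the centroids are computed from the data they are fitted on.

```python
def assign(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on 1-D points; returns the fitted centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# No labels here: the points themselves are all the model is fitted on.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
fitted = kmeans(data, centroids=[0.0, 5.0])

# The fitted centroids can then be used to assign a point not seen before.
cluster_of_new_point = assign(2.0, fitted)
```

Whether you call `data` a "training set" is partly a matter of terminology, but the centroids are parameters estimated from it, which is the sense meant above.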


K-means has no "training data" per se.


also UK


I would try OkC, djsumdog. I think it is better for geeks :)


definitely worth a read


Sebastiano Vigna never ceases to amaze me


Indeed. I was never aware that he was working on his own text editor.


What else has he been involved in?


A Java framework, 'fastutil', seems to be his most popular.

https://github.com/vigna?tab=repositories

