diegolo · on May 12, 2020

https://www.amazon.co.uk/Leadership-Self-Deception-Getting-O...

life changing, I reread it every now and then

diegolo · on April 26, 2020

> The job of a scientist is really not to ship software, that's what a team of engineers would do.

I think that this is the real problem - in academia there is this idea that learning good practices is like a 'dirty' thing that is not required, while instead it would speed up the work and make it more reliable. If you look at chemistry or medicine, researchers there have good practices for managing the lab and respect them.

tchalla · on April 26, 2020

> in academia there is this idea that learning good practices is like a 'dirty' thing that is not required

I think you got me wrong. Shipping quality software is not 'dirty' but requires a specialised focus. One cannot do everything by oneself - science and engineering are complementary skills. In your example of chemistry, the chemist who designs a molecule does not spend time shipping the molecule to the world.

317070 · on April 26, 2020

Except that it wouldn't speed things up at all. Academia writes run-once code, which changes spec fourteen times in one week. Their use case is orthogonal to industry.

Have you considered that maybe the academics actually know what they are doing?

diegolo · on April 26, 2020

Lol, I spent 5 years in academia and I have a PhD in CS, so I know what I'm talking about. Specs change in academia just as they do in industry, and I was still able to write unit tests and document my code in academia. And I know that in medicine and chemistry times to publish are much longer, but that has nothing to do with the fact that they know how to properly use a microscope, clean the lab, and keep an inventory.

If you don't write unit tests, how do you hedge against the possibility of having bugs in your code?

neltnerb · on April 26, 2020

Most scientists have no training in computer science, much less engineering, but still need to do it sometimes to build experiments. They've largely taught themselves. You are not the norm.

I've taught dozens of grad students enough programming to get the job done and it would have been a total waste of time to make the code that robust. They need experimental results next week, with only one computer ever expected to run the code, not a product demo.

The software isn't their research project, it's a nuisance that they have to deal with. Accordingly they neither want to nor have time to do it perfectly. I cannot blame them.

That said, there should be a system to encourage actual trained programmers to get involved, including coauthorship and consideration in tenure decisions. The current system is bad, I'm just saying it's not the scientists' fault here. This is just literally not their ___domain of interest or expertise, and I would rather they focus on the thing they're uniquely good at.

diegolo · on April 26, 2020

I agree - the message I wanted to communicate is the same :) I never thought the problem was the scientists ;)

dekhn · on April 26, 2020

The authors of HMMER know what they are doing. That's an extremely rare situation.

JustFinishedBSG · on April 26, 2020

> if you look at chemistry or medicine, researchers there have good practices for managing the lab and respect them.

Their studies / experiments last years.

In CS/ML/Applied Math you sometimes have to write an experiment with a deadline next week. Excuse me if, when I'm scrambling for a deadline at 3am, my mind isn't on TDD or on neatly packaging everything in a Docker container.

diegolo · on April 26, 2020

Hey, I feel you - and I understand the pressure - I have been in that situation. The point is that this:

> you sometimes have to write an experiment with a deadline next week.

shouldn't happen. And yes, at the moment it is like this - sometimes you will have to hack. But if the whole community starts to push for proper practices, instead of just saying "it is what it is" - there will be fewer papers, of higher quality.

diegolo · on May 22, 2019

You can put them in your library/desk to impress your friends

silverdemon · on May 22, 2019

I keep mine in clear view on the shelf in the hope that its collected wisdom will radiate outward and suffuse into my code. Not happened yet, but perhaps it has a useful psychological effect as a shrine to algorithms; whenever I am tempted by a quick, cheap hack I see the books and am steered back to the righteous path.

Actually I usually just do the cheap hack anyway but it is reassuring to know that it is there.

diegolo · on April 20, 2019

Not in the UK

diegolo · on March 23, 2019

"a general search can be machine learning": I don't get this sentence. Machine learning is about building a mathematical model of sample data, known as "training data".

If you want to talk about machine learning and search you should probably talk about learning to rank (https://en.m.wikipedia.org/wiki/Learning_to_rank)

snotrockets · on March 23, 2019

I'd argue that you're too restrictive in your definition. e.g. unsupervised clustering has no sample training data.

The usual definition (due to Mitchell) is that machine learning is a system such that its performance on a given task improves with experience.

thegginthesky · on March 24, 2019

Actually, any unsupervised method, including clustering, still has training data. The only difference is it doesn't have a target y variable in the training set to minimize the error metric, hence the name unsupervised.

But the definition you mention is right. Still, any dataset that you use to fit your model is your training set, even if you don't have a train/test split or the like, because you used it to train your model.
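
To make the point concrete, here is a minimal sketch (not from the thread) of k-means in plain Python. The function names, the toy data `X`, and the choice to initialise centroids from the first `k` points are all assumptions made for illustration. The thing to notice is that the fit step consumes a dataset `X` but never a target `y`.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans_fit(points: List[Point], k: int, iterations: int = 10) -> List[Point]:
    """Fit k centroids to `points`. There is no target variable anywhere."""
    # Assumption: deterministic initialisation from the first k points.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids


# Hypothetical toy dataset: the data the model is fitted on, with no labels.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = kmeans_fit(X, k=2)
labels = [nearest(p, centroids) for p in X]
```

Whether `X` deserves the name "training data" is exactly what is being debated here; the sketch only shows that the centroids are derived from it.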

snotrockets · on March 25, 2019

K-means has no "training data" per se.

diegolo · on July 16, 2018

also UK

diegolo · on Dec 7, 2016

I would try OkC, djsumdog. I think it is better for geeks :)

diegolo · on June 21, 2016

definitely worth a read

diegolo · on Nov 15, 2015

Sebastiano Vigna never ceases to amaze me

matt4711 · on Nov 15, 2015

Indeed. I was never aware that he was working on his own text editor.

unixhero · on Nov 15, 2015

What else has he been involved in?

OJFord · on Nov 15, 2015

A Java framework, 'fastutil', seems to be his most popular.

https://github.com/vigna?tab=repositories