Naively, you’d think such a revolutionary paper [BERT] would be met with open arms. But when it was given the best paper award (at NAACL 2019), the postdocs I talked to universally grumbled about it. Why? It wasn’t interesting, they bemoaned. “It just scaled some stuff up.”

I'm from that community, was there when they presented it, have used and still use BERT a lot, and still, if it were my decision, I wouldn't have given it the best paper award, even in hindsight.

BERT the model has been, of course, enormously influential. I still use it, even after the generative LLM revolution (which also stands on its shoulders). I greatly respect its authors and am truly grateful that they published and open-sourced it.

BERT the paper? It's not really well-written (almost everyone who wants to understand Transformers or BERT turns to a blog, because the papers are so bad), and it's not stellar in terms of scientific insights, because indeed, it scales some stuff up and comes up with a lot of magic numbers that are there presumably because other alternatives were tried and those happened to work. Or maybe not even that, because some stuff included in BERT actually turned out to be useless (see RoBERTa), so I guess they just winged much of it and it worked.

From a scientific paper, I would expect much more explanation: why the architecture is designed like this, why this number of layers/dimensions and not that, etc., all of which that paper thoroughly lacks. No one will learn to do better science from that paper; it's not a paper that a PhD student would benefit much from reading except as a curiosity (they can of course benefit from downloading, using, and getting to know the model, just not from the paper).

Maybe create a "best model award", "best software award" or whatever, but in my view as an academic a best paper award is just not for this.




Coming up with something that is new, original, and actually works better than anything before it is quite hard. It does not happen often. When it happens, it should be embraced and cherished. Because without these discoveries, science is not worth anything at all. These discoveries are what science is all about. Yes, making sense of the discoveries is also science, but that is the easy part of science.


It happens every day. The job description of an engineer is basically to apply their knowledge of fundamentals to design and improve products, not simply copy the existing products. Products in every field must incrementally and monotonically improve in some way in order to compete. It is often not even science at all, just development.


We are talking about different things. Incremental engineering can definitely be part of it, but at some point something different happens: the discovery.


I don't think that this is what GP wrote about. They complain about the paper and not the software. Imho, most scientific papers are really badly written, because most scientists (and most people) are bad writers.


If you want beautiful writing, read poetry. You should give the best paper award to the paper with the best invention / discovery. Because as much as I like a well-written paper, I like a paper with a great invention / discovery even better, even if badly presented. Remember, you get what you incentivise. There are too many papers out there that, while nicely written, do not move the needle at all.


I would certainly appreciate beautiful writing, but I want something far more basic: Good writing. I want a text that is pleasant to read. That does not mean that the text should be dumbed down or that I expect it to be easy. A good text avoids unnecessary jargon and is simplified to make it very clear what the reader should take away from reading the text. A lot of the technical details can be delegated to the supplement so that the main text can remain clear and focused.


I think that best paper really means best discovery, or best finding.

No one really cares how well written a paper is.


I agree the best paper award should go to the paper with the strongest contributions, perhaps with writing quality as secondary. I don’t agree with your last sentence.

Writing quality is often a make-or-break thing when it comes to whether a paper is accepted or not, mostly because it makes a paper easier to understand, i.e. its contributions, and the evidence that they are truly there, are easy for a reviewer to pick up on and appreciate.

Furthermore, a well-written paper is a far larger contribution to the research community (the audience beyond the reviewers) than a poorly written paper with the same contributions, for the same reasons: well-written papers can be a joy to read, particularly if a leap in contributions is presented in an easy-to-understand way.

I understand that these things are cultural (i.e. field-specific), but this has been my experience.


A badly written paper is bad at delivering its core content of scientific knowledge. The reading process is less efficient, the understanding comes slower, and discussing the core ideas and findings is harder and less fruitful.

A well written paper transports the reader to the perspective of the writer. It’s not about poetry or aesthetics.

Bad handwriting is illegible. Good handwriting is clear. Calligraphic aesthetics is another ___domain.


> comes up with a lot of magic numbers that are there presumably because other alternatives were tried and those happened to work.

The process of experimentation is what makes Computer Science "science"!


A central part of experimentation in scientific fields is to document the experiments in a very comprehensive way, including

- which other options you also tried, but which failed

- hypotheses that explain the worse results of the other options

- hypotheses for why the chosen option gives better results

- ideally, testable predictions that these hypotheses imply

- etc.

Simply saying "other alternatives were tried and those happened to work" is not science, but tinkering around combined with magical thinking.


Do you take issue with the 'purely empirical' approach (just trying out variants and seeing which sticks) or only with its insufficient documentation?

I don't know how you'd improve on the former. For a lot of it there simply isn't any sound theoretical foundation, so you just end up with flimsy post-hoc rationalizations.

While I agree that it's unfortunate that people often just present magic numbers without explaining where they come from, in my experience providing documentation for how one arrives at these often enough gets punished because it draws more attention to them. That is, reviewers will e.g. complain about preliminary experiments, asking for theoretical analysis or question why only certain variants were tried, whereas magic numbers are just kind of accepted.


Seems pretty clear they aren't objecting to throwing stuff at the wall and seeing what sticks, but to calling the outcome of sticky-wall work "science".

I'd say that's a bit of a strict take on science; one could be generous and compare it to a biologist going out into the forest and coming back with a report on finding a new lichen.

Though admittedly, these days the biologist is probably expected to report details about their search strategy, which the sticky-wall researchers don't.


The biologist would be expected to describe the lichen in detail, including where it was found, its expected ecology, its place in the ecosystem, life-cycle, structure, etc. It is no longer 1696 where we can go spear some hapless fish, bring back its desiccated body, and let our fellow gentleman ogle over its weirdness.


I'm not GP, but I don't think they are taking issue with the fact that e.g. layer counts or the architecture were arrived at empirically rather than from first principles.

Rather, when you do come to something empirically, you need to validate your findings by e.g. ablations, hypothesis testing, case studies, etc.
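
To make that concrete, here is a minimal sketch of what such an ablation loop could look like; train_and_eval, the layer counts, and the NSP flag are hypothetical placeholders for illustration, not the actual BERT setup:

    # Hypothetical ablation sketch. train_and_eval is a placeholder for a real
    # training-plus-evaluation pipeline; the configurations are illustrative only.
    import itertools

    def train_and_eval(num_layers, use_nsp):
        # Stand-in: a real implementation would train this variant and
        # return its score on a held-out dev set.
        return 0.0

    results = {}
    for num_layers, use_nsp in itertools.product([6, 12], [True, False]):
        results[(num_layers, use_nsp)] = train_and_eval(num_layers, use_nsp)

    # Reporting every variant (not just the winner) is what turns "magic
    # numbers" into documented, reproducible choices.
    for config, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(config, score)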


Exactly, I can confirm this is what I meant.


> I don't know how you'd improve on the former. For a lot of it there simply isn't any sound theoretical foundation, so you just end up with flimsy post-hoc rationalizations.

So great science would come up with a sound theoretical foundation, or at least strong arguments as to why no such foundation can exist.



