
1) It is a general version of knowledge distillation. For example, this paper from 2016 describes the same technique: Sequence-Level Knowledge Distillation [0] (a rough sketch of the procedure follows the footnote below)

> This sequence-level approximation leads to a simple training procedure wherein the student network is trained on a newly generated dataset that is the result of running beam search with the teacher network

2) Fine-tuning is a step in the training process. Language models are first pre-trained, then fine-tuned. This is a pedantic quibble.

3) It is unsurprising that you don't understand ad hominem. Giving background information and pointing out the style of writing are relevant to the arguments being made.

[0] https://arxiv.org/abs/1606.07947
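
To make the quoted procedure concrete, here is a rough sketch of sequence-level distillation in the spirit of [0], assuming a Hugging Face-style seq2seq API. The checkpoints, beam width, and other hyperparameters are illustrative assumptions, not anything specified in the paper.

    # Sequence-level knowledge distillation, roughly as in [0]:
    # the teacher generates pseudo-targets with beam search, and the student
    # is trained on those (source, pseudo-target) pairs with plain cross-entropy.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Illustrative checkpoints; any teacher/student pair sharing a tokenizer works.
    teacher = AutoModelForSeq2SeqLM.from_pretrained("t5-large").eval()
    student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    tok = AutoTokenizer.from_pretrained("t5-large")

    def distill_step(source_texts, optimizer):
        batch = tok(source_texts, return_tensors="pt", padding=True, truncation=True)
        # 1) Run beam search with the teacher to build the "newly generated dataset".
        pseudo_targets = teacher.generate(**batch, num_beams=5, max_new_tokens=128)
        # 2) Train the student on the teacher's outputs as if they were gold labels.
        labels = pseudo_targets.clone()
        labels[labels == tok.pad_token_id] = -100  # ignore padding positions in the loss
        loss = student(**batch, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()

The point is just the shape of the procedure: no soft teacher logits are needed, only the teacher's decoded outputs used as training data for the student.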




It's arguable that saying "EY has a very shallow understanding of ML" is even lower than ad hominem (which is DH1) on the pg scale [4], since pg specifically gives "The author is a self-important dilettante." as an example of DH0.

[4] http://www.paulgraham.com/disagree.html



