1) It is a general form of knowledge distillation. For example, this 2016 paper describes the same technique, Sequence-Level Knowledge Distillation [0]; a rough sketch of the procedure is given after this list:
> This sequence-level approximation leads to a simple training procedure wherein the student network is trained on a newly generated dataset that is the result of running beam search with the teacher network
2) Fine-tuning is a step in the training process. Language models are first pre-trained, then fine-tuned. This is a pedantic quibble.
3) It is unsurprising that you don't understand ad hominem. Giving background information and pointing out the style of writing is relevant to the arguments being made.
It's arguable that saying "EY has a very shallow understanding of ML" is even lower than ad hominem (which is DH1) on the pg scale [4], since pg specifically gives "The author is a self-important dilettante." as an example of DH0.
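To make point 1 concrete, here is a minimal sketch of the sequence-level distillation loop the quoted paper describes: generate targets with the teacher's beam search, then train the student on those generated pairs with an ordinary supervised loss. This assumes a Hugging Face style seq2seq API; the checkpoint names are placeholders, not anything from the paper.

```python
# Sketch of sequence-level knowledge distillation (per [0]):
# the teacher's beam-search outputs become the student's training targets.
# Checkpoint names below are placeholders.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

teacher = AutoModelForSeq2SeqLM.from_pretrained("teacher-checkpoint")  # placeholder
student = AutoModelForSeq2SeqLM.from_pretrained("student-checkpoint")  # placeholder
tokenizer = AutoTokenizer.from_pretrained("teacher-checkpoint")        # placeholder


def build_distillation_dataset(source_texts, num_beams=5, max_length=128):
    """Run beam search with the teacher and keep its top hypothesis per input."""
    pairs = []
    teacher.eval()
    with torch.no_grad():
        for text in source_texts:
            inputs = tokenizer(text, return_tensors="pt")
            beam_output = teacher.generate(
                **inputs, num_beams=num_beams, max_length=max_length
            )
            target = tokenizer.decode(beam_output[0], skip_special_tokens=True)
            pairs.append((text, target))
    return pairs


def distillation_step(source_text, target_text, optimizer):
    """Ordinary supervised step: the student fits the teacher-generated target."""
    inputs = tokenizer(source_text, return_tensors="pt")
    labels = tokenizer(target_text, return_tensors="pt").input_ids
    loss = student(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The only distillation-specific part is where the targets come from; the student's training step itself is unchanged, which is why this counts as the same technique.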
[0] https://arxiv.org/abs/1606.07947