The way this is being described is almost like a maze-traversal algorithm, where compute time is "how far I'm willing to go down a path to test whether it's a possible solution." I wonder what other parallels we might find. For instance, are some maze-solving algorithms applicable to LLMs?
Sampling sequentially to find the sequence with the highest joint probability is definitely a search problem. That's why you see algorithms like beam search often used for decoding.
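For the curious, here's a minimal beam-search sketch. `next_token_logprobs` is a hypothetical stand-in for a model call that returns per-token log-probabilities (not any particular library's API), and things like batching and length normalization are omitted:

```python
import heapq

def beam_search(next_token_logprobs, start, eos, beam_width=4, max_len=20):
    # Each beam is (cumulative_log_prob, token_sequence).
    beams = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == eos:
                # Finished beams carry over unchanged.
                candidates.append((logp, seq))
                continue
            # Expand each unfinished beam by every candidate next token.
            for tok, tok_logp in next_token_logprobs(seq).items():
                candidates.append((logp + tok_logp, seq + [tok]))
        # Keep the beam_width sequences with the highest joint log-probability.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == eos for _, seq in beams):
            break
    return max(beams, key=lambda b: b[0])
```

Greedy decoding is just the beam_width=1 case; widening the beam is exactly the "how far down a path am I willing to go" trade-off from the maze analogy.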
Yes, that's right. It seems like an active area of research.
Honestly, it runs counter to the Bitter Lesson (http://www.incompleteideas.net/IncIdeas/BitterLesson.html), which stems in part from researchers getting too fancy with hand-crafted search in chess. But at the scale LLMs are at right now, the improvements might be worth it.
Hi, contributor to Entropix here. This is just my opinion, but I don't think it runs counter to the Bitter Lesson at all, because it's meant to leverage the model's own computation. Several papers have suggested that models internally compute certainty (https://arxiv.org/abs/2406.16254), and in my view our method simply takes that computation and factors it explicitly into decoding.
This is as opposed to pure sampling on next-token predictions, which essentially picks a token at random from the distribution. So if a model is asked to compute 1274 x 8275 and isn't very sure of the answer, it still confidently emits one, even though it's uncertain and needs to show more working.
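To make that concrete, here's a rough sketch of entropy-aware token selection, the general idea rather than Entropix's actual implementation; the `entropy_threshold` value and the choice of fallback behavior are illustrative assumptions:

```python
import numpy as np

def entropy_of(logits):
    # Softmax with the max subtracted for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Shannon entropy in nats; high entropy means the model is uncertain.
    return -np.sum(probs * np.log(probs + 1e-12))

def pick_token(logits, entropy_threshold=2.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    if entropy_of(logits) < entropy_threshold:
        # Model is confident: commit to the argmax token.
        return int(np.argmax(logits))
    # Model is uncertain: a decoder could branch, inject a "thinking" step,
    # or raise temperature to explore; this sketch just samples.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

The point is that the distribution over next tokens already encodes the model's uncertainty, so the decoder can react to it instead of throwing that signal away.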