That's not trivially compatible with the scheme transformers use to pick tokens and generate results.
Transformers are generative AI, not classifiers. They throw away a lot of statistics in the service of forward progress and completing the generative task. This project is a rudimentary attempt to regenerate those stats.
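
To make "throwing out statistics" concrete, here is a minimal sketch of a single decoding step, assuming plain greedy decoding over a toy four-token vocabulary. The function names (`softmax`, `greedy_step`) and the logit values are made up for illustration; this is not the project's actual code. A normal generation loop keeps only the chosen token, while the probability and entropy computed here are the kind of per-step numbers that get dropped.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_step(logits):
    """Pick the most likely token and also return the stats a decoder usually discards."""
    probs = softmax(logits)
    token = max(range(len(probs)), key=probs.__getitem__)
    # Shannon entropy (in nats) of the full distribution: high means the model was unsure.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return token, {"p_chosen": probs[token], "entropy": entropy, "probs": probs}


# Hypothetical logits for one decoding step; real models emit one score per vocabulary entry.
logits = [2.0, 1.0, 0.1, -1.0]
token, stats = greedy_step(logits)
print("chosen token id:", token)
print("probability of chosen token:", round(stats["p_chosen"], 3))
print("entropy of the step:", round(stats["entropy"], 3))
```

Keeping `p_chosen` and `entropy` for every step is one simple way to get a rough confidence signal back out of a generator; whether that matches what this project records is an assumption.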