I am not sure what you mean. The idea is that the network should return the text along with a confidence expressed as a probability. During training, the log-score would be optimized. (I'm not sure it would actually work given how the training is structured, but something like this would be useful.)
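To make the log-score idea concrete, here is a minimal sketch (my own illustration, not anything from an actual training setup): the log-score is a proper scoring rule, so a model that maximizes it in expectation is incentivized to report a confidence equal to its true probability of being correct.

```python
import math

def log_score(confidence: float, correct: bool) -> float:
    """Log-score for a stated confidence on one answer.

    A proper scoring rule: expected score is maximized when the
    stated confidence matches the true probability of being correct,
    so optimizing it should, in principle, encourage calibration.
    """
    # Clamp away from 0 and 1 to avoid log(0).
    p = min(max(confidence, 1e-9), 1.0 - 1e-9)
    return math.log(p) if correct else math.log(1.0 - p)

# A confident wrong answer is penalized much harder than a
# confident right answer is rewarded.
print(log_score(0.9, correct=True))
print(log_score(0.9, correct=False))
```

The asymmetry is the point: hedging (say, reporting 0.5) caps the loss on errors, so a model that genuinely doesn't know is pushed toward lower stated confidence rather than a confident guess.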
It's not that simple: how would the model know when it knows? Removing hallucination has to be a post-training step, because you first need to test the model against what it actually knows in order to provide training examples of what it knows, what it doesn't, and how to respond in each case.