I’ve been playing with local models for handling text correction, aka grammar and spelling in multilingual text.
Most models with fewer than 20B parameters struggle with it, even instruct models.
For example, they tend to answer questions found in the text instead of correcting the mistakes. Strict prompting not to do so only partially helps, and they always tend to add or remove sentences of their own.
Are there any good small models for text correction, or is that just not a task LLMs are good at?
Try giving a base model a completion-style prompt like:

    # Grammar correction .... :input:A couple of sentences from your text here:input: :output:

and see what it fills in after `:output:`.
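A minimal sketch of building that kind of completion prompt in Python. The delimiters, the few-shot example pair, and the `build_prompt` helper are all illustrative assumptions, not a fixed format any model requires:

```python
def build_prompt(examples, text):
    """Build a completion-style prompt a base model can continue.

    Each example is an (input, output) pair demonstrating the task,
    so the model learns to emit only the corrected text.
    """
    parts = ["# Grammar correction"]
    for src, fixed in examples:
        parts.append(f":input:{src}:input: :output:{fixed}:output:")
    # Leave the final :output: open so the model fills in the correction.
    parts.append(f":input:{text}:input: :output:")
    return "\n".join(parts)

# One demonstration pair, then the sentence we actually want fixed.
examples = [
    ("They tend to add sentences of there own.",
     "They tend to add sentences of their own."),
]
prompt = build_prompt(examples, "Are their any good small models?")
print(prompt)
```

A demonstration or two usually steers a base model away from answering the text rather than correcting it, since the pattern, not an instruction, defines the task.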
>They always tend to add or remove sentences of their own.
You may be running into context size issues here. Try going small: a sentence at a time, using a new chat for each sentence.
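A sketch of that sentence-at-a-time approach. The regex split is naive and the `correct()` function is a placeholder for whatever model backend you use, both are assumptions, not a specific API:

```python
import re

def split_sentences(text):
    # Naive split on sentence-ending punctuation; swap in a real
    # segmenter for text with abbreviations or quotes.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def correct(sentence):
    # Placeholder: call your model here with a fresh context, so
    # nothing from earlier sentences leaks into the prompt.
    return sentence

def correct_text(text):
    # Correct each sentence independently, then stitch them back together.
    return " ".join(correct(s) for s in split_sentences(text))

print(correct_text("This is won sentence. Hear is another!"))
```

Keeping each call to a single sentence both stays well under any context limit and gives the model less room to invent or drop material.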
btw: when I say a base model, I mean using it in text-generation mode, not chat mode.
edit: There are models trained specifically for grammar correction, though for the multilingual case you may have to train one yourself. Here is a link explaining how someone did it with a Google model from 2019: https://deeplearninganalytics.org/nlp-building-a-grammatical...