This package seems to use llama_cpp for local inference [1], so you can probably use any model that llama.cpp supports [2]. However, I think it only passes the OCR output to the model for correction; the language model never sees the original image.
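To illustrate what I mean by text-only correction, here's a rough sketch of the idea. This is not the package's actual code: the function names and prompt wording are made up for illustration, and the model path is something you'd have to supply yourself. It assumes the llama-cpp-python bindings, imported lazily so the rest runs without them.

```python
from typing import Callable


def build_correction_prompt(ocr_text: str) -> str:
    """Wrap raw OCR text in an instruction asking the model to fix it.

    The prompt wording is hypothetical; the model only ever sees this text,
    never the source image.
    """
    return (
        "The following text came from OCR and may contain recognition errors.\n"
        "Correct the errors without changing the meaning.\n\n"
        f"OCR text:\n{ocr_text}\n\n"
        "Corrected text:\n"
    )


def correct_ocr_text(ocr_text: str, complete: Callable[[str], str]) -> str:
    """Send OCR text through any prompt -> completion function."""
    return complete(build_correction_prompt(ocr_text)).strip()


def make_llama_cpp_completer(model_path: str, max_tokens: int = 512) -> Callable[[str], str]:
    """Build a completion function backed by a local GGUF model.

    Assumes llama-cpp-python is installed; model_path is a placeholder
    for whatever model file you have downloaded.
    """
    from llama_cpp import Llama  # lazy import: only needed for real inference

    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)

    def complete(prompt: str) -> str:
        result = llm(prompt, max_tokens=max_tokens, temperature=0.0)
        return result["choices"][0]["text"]

    return complete
```

You'd call it as `correct_ocr_text(raw_text, make_llama_cpp_completer("path/to/model.gguf"))`. Since only a string goes in, any text model llama.cpp can load should work here, but none of them can recover anything the OCR step dropped.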
That said, there are some large language models you can run locally that accept image input, such as Phi-3-Vision [3], LLaVA [4], and MiniCPM-V [5].
[1] - https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...
[2] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#de...
[3] - https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
[4] - https://github.com/haotian-liu/LLaVA
[5] - https://github.com/OpenBMB/MiniCPM-V