I'm deeply unsatisfied with how most RAG systems handle questions, chunking, embeddings, and storage; even the ones used for summaries are usually rubbish. That's why I built my own tool. Check it out, I've updated it a lot! It also supports Ollama, so you can run it privately.
You do need a beefy GPU to run the local LLM, but that's no different from running any other LLM on your machine.
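If you haven't tried Ollama before, the private setup boils down to talking to a local HTTP API, so no data ever leaves your machine. Here's a minimal sketch of that Ollama side (not my tool's actual code; it assumes Ollama is running on its default port 11434 and that you've already pulled a model such as "llama3"):

```python
# Minimal sketch: send one prompt to a locally running Ollama server.
# Assumes the default port (11434) and an already-pulled model.
import json
import urllib.request

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Query Ollama's /api/generate endpoint and return the full response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_llm("Summarise retrieval-augmented generation in one sentence."))
```

Swap in whichever model you've pulled locally; the point is just that the whole round trip stays on your own hardware.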