I've been thinking about this. It's just fascinating to me to have a small device that you can converse with and that knows almost everything.
Perfect for preppers / survivalists.
Store it in a Faraday cage along with a solar generator.
It really doesn't. It doesn't even know what it knows and what it doesn't know. Without a way to check whether what it told you is true, you may well end up in more trouble than you were in before.
That would make a lot more sense. That way at least you have a chance to check up on the output, lest your first meal of 'Hedysarum alpinum' ends up being your last.
The key to this being a good solution is actually building a corpus that covers as much of the reference knowledge the scenario calls for as possible. The idea that the answer is Wikipedia way underestimates the scope.
The Wikipedia patch doesn’t make much sense to me.
What percent of the important questions being asked in this doomsday scenario actually have their answer in Wikipedia?
If 50% of the time you are left trusting raw LLaMa, then you don’t really have a decent solution.
I do appreciate the sentiment though that future or finetuned LLMs might fit on an RPi or whatever, and be good enough.
I think the theory is that an LLM can integrate the knowledge of Wikipedia and become something greater than the sum of its parts by applying reasoning that is explained in one article to situations in other topics where the reasoning might not be so well explained. Then you can ask naive questions in new scenarios where you might not have the background knowledge (or simply the mental energy) to figure out the right answer on your own, and it powers through for you. AFAIK current LLMs are not this abstract. If one type of reasoning tends to be applied in one scenario and another type in a different one, they have no context beyond the words themselves and which topics humans usually jump to from a given topic.
There was another paper out recently that adds to this: https://arxiv.org/pdf/2308.04430.pdf. Looks like a more flexible approach to document storage, and it outperforms retrieval in context.
They trained up their own LLM, but from the text it seems like it might be possible to use any LLaMA-style LM without retraining. Not sure though, need to give it a proper look.
It should provide links to the source with the relevant content, so you can check the exact text:
> You'll need to wait 20-30 seconds (depending on your machine) while the LLM model consumes the prompt and prepares the answer. Once done, it will print the answer and the 4 sources it used as context from your documents
You will have noticed, in that first sentence, that it may not be practical, especially on an Orange Pi.
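For anyone curious what that retrieve-then-answer loop looks like, here is a minimal sketch. It assumes a sentence-transformers embedding model and some local generation hook (llama.cpp or similar); the names and the corpus are my own illustrations, not the project's actual API.

```python
# Hypothetical sketch of the retrieve-then-answer loop the README describes:
# embed the local documents, pull the top-k passages for a question, feed them
# to a local model, and print the answer together with the passages used.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

documents = [
    "Water purification: boil for at least one minute at low altitude...",
    "Shelter: insulate the ground layer before worrying about wind cover...",
    # ... the rest of your offline corpus, chunked into passages
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def answer(question, generate, k=4):
    """Retrieve k passages, ask the local LLM, return the answer plus its sources."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q_vec)[::-1][:k]          # cosine similarity ranking
    sources = [documents[i] for i in top]
    prompt = (
        "Answer using only the context below. Cite which passage you used.\n\n"
        + "\n\n".join(f"[{n+1}] {s}" for n, s in enumerate(sources))
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt), sources   # generate() wraps llama.cpp or similar

```

Returning `sources` alongside the answer is exactly the checkability point made upthread: you can read the four passages yourself instead of trusting the paraphrase.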
Yes, having fact-checked output from LLMs in particular would be a nice step in the right direction. Throwing out the hallucinated bits and keeping the good stuff would make LLMs a lot more applicable.
Isn't that a bit of a holy grail though? If your software can fact check the output of LLMs and prevent hallucinations then why not use that as the AI to get the answers in the first place?
Because you - hopefully - are checking against something that is, on average, of higher quality than the combined input to an LLM.
I'm not sure whether this can work or not, but it would be nice to see a trial. You could probably do this by hand if you wanted to: break the LLM's answer up into factoids, check each of those individually, and assign each one a score based on the amount of supporting evidence for it. I'd love that as a plug-in to a browser too.
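As a very rough sketch of that pipeline (the splitting and scoring functions here are crude stand-ins I made up, not an existing fact-checking API):

```python
# Split an LLM answer into factoids, look each one up against a trusted corpus,
# and score it by how much support it finds. Low-scoring factoids are the ones
# to treat as possible hallucinations.
import re

def split_into_factoids(answer: str) -> list[str]:
    # Crude placeholder: one sentence per factoid. A real version might use
    # an LLM or a claim-extraction model instead.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]

def support_score(factoid: str, corpus: list[str]) -> float:
    # Toy evidence measure: fraction of the factoid's words that appear in the
    # best-matching corpus passage. Swap in embeddings or NLI for real use.
    words = set(factoid.lower().split())
    return max((len(words & set(p.lower().split())) / len(words) for p in corpus),
               default=0.0)

def check_answer(answer: str, corpus: list[str]) -> list[tuple[str, float]]:
    return [(f, support_score(f, corpus)) for f in split_into_factoids(answer)]

# Example: the second factoid should score near zero and get flagged.
scores = check_answer("Boil water for one minute. Seawater is safe in small amounts.",
                      ["Boil water for at least one minute to make it safe to drink."])
for factoid, score in scores:
    print(f"{score:.2f}  {factoid}")
```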
https://arxiv.org/pdf/2308.04430.pdf is interesting from that point of view. They've tackled it from the perspective of keeping copyrighted content out of training while still including it at inference time, but I think it ought to mean less hallucination too, because they also (claim to) solve the attribution problem.
My hypothesis is that including supporting information in the LLM's prompt changes the task, roughly, from text generation, which is very hallucination prone, to text summarization (or reformulation with some reasoning), which is less likely to hallucinate.
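For concreteness, this is the difference I mean, as two prompt templates; the wording is purely illustrative and not taken from any particular framework:

```python
# Same question, asked cold (free generation) versus with supporting passages
# in the prompt, which turns the task into something closer to summarization.
def bare_prompt(question: str) -> str:
    return f"Question: {question}\nAnswer:"

def grounded_prompt(question: str, passages: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Using only the passages below, answer the question. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

With the second form the model mostly has to restate or combine what is already in front of it, which is where the lower hallucination rate would come from, if the hypothesis holds.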
That was my personal experience in general with ChatGPT as well as LLaMa1/2.
What is the current state of "correspondence between fed text and output text"? I.e., when the training data says, e.g., that "Spain invested $6000 in Columbus' first voyage", how reliably will an LLM repeat that notion exactly?
That is before taking reasoning and consistency into account. And even this notion, picked at random, is not without issues: dollars computed how? And while it is not difficult to state that Columbus reached the Caribbean in 1492, it is more complex to "decide" the year of the siege of Troy out of the many dates that have been proposed.
But even at the simplified level of clear, settled notions: if an LLM is told that "A is B", and there is no inconsistency in the training corpus, what is the failure rate, i.e. how often does it then output something critically different?
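I don't know of a headline number, but the question seems measurable. A hypothetical sketch: take "A is B" statements you expect to be in the training corpus without contradiction, ask the model to complete "A is", and count how often B does not come back. The `generate` hook and the example facts are placeholders.

```python
# Rough measurement of "A is B" recall. generate() is an assumed wrapper around
# whatever model you run; the facts are examples assumed to appear consistently
# in any training corpus.
facts = {
    "The capital of Australia": "Canberra",
    "The chemical symbol for gold": "Au",
}

def recall_failure_rate(generate) -> float:
    failures = 0
    for subject, expected in facts.items():
        output = generate(f"Complete the sentence: {subject} is")
        if expected.lower() not in output.lower():
            failures += 1
    return failures / len(facts)
```

For notions that appear only once or twice in the corpus, you would presumably fine-tune them in first and then run the same measurement.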
> ways to check up
Some LLMs work as search engines, outputting not just their tentative answer but linked references. A reasonably safe practice at this stage is to use LLMs that way: ask, then use the linked references to check the answer.
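That habit can even be partly scripted. A crude sketch, assuming the answer contains plain URLs and that a keyword check is good enough to flag obviously unsupported claims (it often isn't, so treat it as triage, not verification):

```python
# Pull the URLs an LLM cites out of its answer, fetch them, and see whether the
# key terms of the claim actually appear on the page.
import re
import requests

def cited_urls(answer: str) -> list[str]:
    return re.findall(r"https?://\S+", answer)

def reference_supports(url: str, claim: str) -> bool:
    try:
        page = requests.get(url, timeout=10).text.lower()
    except requests.RequestException:
        return False
    terms = [w for w in claim.lower().split() if len(w) > 3]
    return all(t in page for t in terms)
```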