Do you know that LLMs operate on text and have neither the sensory input nor the relevant training data for this? You're just handwaving away 99.9% of the work and declaring it solved. Of course what you're talking about is possible, but you started this by stating that cooking is easy for an LLM, and it sounds like you're describing a totally different system, one that is not an LLM.