You should look into the history of Itanium; it was designed around the idea that the compiler would do pretty much exactly that. It looked great on paper, but in practice nobody figured out how to write a compiler capable of doing it without constantly running into weird edge cases.
X86 does have "prefetch" instructions, which tell the CPU that you want to use some data in the near future. There are also "branch hint" instructions which tell the CPU whether a branch is usually taken or not. The problem is that they tend to make your code slower: the CPU is already more than capable of predicting these things by itself, and the extra instructions slow down overall execution because they take up cache space and have to be decoded too.
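For concreteness, here's a minimal C sketch of what those hints look like in practice, assuming GCC or Clang with SSE intrinsics; the prefetch distance (16 elements ahead) is an arbitrary illustration, not a tuned value:

    #include <stddef.h>
    #include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */

    /* Sum an array, issuing explicit prefetches a few cache lines ahead.
       On modern x86 the hardware prefetcher already detects this linear
       access pattern, so the extra instructions mostly add decode and
       cache overhead, as described above. */
    double sum(const double *a, size_t n) {
        double total = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)
                _mm_prefetch((const char *)&a[i + 16], _MM_HINT_T0);

            /* __builtin_expect is the compiler-level "branch hint" in
               GCC/Clang; the x86 branch-hint prefixes themselves are
               ignored by most modern cores. */
            if (__builtin_expect(a[i] != 0.0, 1))
                total += a[i];
        }
        return total;
    }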
VLIW is pretty successful for loopy DSP and now AI/ML. However, Itanium was trying to work for general purpose code, and then, as you say, it constantly ran into weird edge cases. It seemed as if VLIW had succumbed to the Peter Principle: it had risen to its level of incompetence. But as long as you use it appropriately, VLIW is a best practice, and LLVM supports it.
BTW, CS252 at Berkeley and Onur Mutlu's ETH lectures give a conventionally disparaging view of VLIW without pointing out its successes.
Adding on, VLIW is only successful in AI/ML because GPUs can't handle branching well, let alone branch prediction. I would guess the same story applies to DSPs. If someone figures out how to stick a branch predictor in those pipelines, I'm guessing the VLIW nature of those platforms will disappear overnight.
The defining characteristic of VLIW is to have brilliant compiler software statically schedule instructions for dumb parallel hardware, rather than depend on power- and transistor-expensive dynamic branch prediction and out-of-order execution.
In a perfect VLIW world that means you don't spend any transistors or power on branch prediction or out-of-order instruction scheduling. Indeed, the original VLIW paper [1] spends the vast majority of its paragraphs on solving the (hard) compiler instruction scheduling problem with trace scheduling, which is still used today. The VLIW hardware itself is dead simple.
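A rough sketch of what that static scheduling problem looks like, using an ordinary dot product and a hypothetical 4-slot VLIW (not any real ISA) described in the comments:

    #include <stddef.h>

    /* On a hypothetical 4-slot VLIW, the compiler must prove operations
       independent at compile time and pack them into one wide instruction
       per cycle, roughly:
         { load a[i+1] | load b[i+1] | mul t = a[i]*b[i] | add acc += t' }
       (software-pipelined so the add consumes the previous iteration's
       product). Any slot the compiler cannot fill becomes a NOP and the
       machine idles; there is no OoO hardware to find work dynamically. */
    double dot(const double *a, const double *b, size_t n) {
        double acc = 0.0;
        for (size_t i = 0; i < n; i++)
            acc += a[i] * b[i];
        return acc;
    }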
So if VLIW fits the problem, it has fantastic performance characteristics. If it doesn't fit, and far and away most problems don't, then VLIW is terrible. VLIW is very brittle.
I need to make a caveat about the Mill CPU, which is a VLIW, but I see I've written too much already.