I have wondered about this myself, and I am working on a CV/ML project too. But data that is correctly predicted is not as useful; what you want are edge cases, images with low confidence, and errors. So when users take over and disengage, that is useful data. When the system beeps and forces a disengagement, that is exactly the data you need. That has been my experience, and I'm glad to hear them validate it. Your training data needs diversity.
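To make that concrete, here is a rough sketch of how you might filter logged predictions down to the "hard" cases worth labeling and retraining on. It assumes each logged frame has a model confidence, a predicted class, an optional ground-truth label, and a flag for whether the user took over or the system forced a disengagement. The function name, the data layout, and the 0.6 threshold are all hypothetical choices for illustration, not anyone's actual pipeline.

```python
from typing import List, Optional, Sequence


def select_hard_examples(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[Optional[int]],
    disengaged: Sequence[bool],
    threshold: float = 0.6,  # hypothetical cutoff; tune for your model
) -> List[int]:
    """Return indices of frames worth sending back for labeling/training.

    A frame is kept if the model was unsure, the model was wrong
    (when a ground-truth label exists), or a disengagement happened.
    Confident, correct frames are skipped since they add little.
    """
    hard = []
    for i, (conf, pred, label, took_over) in enumerate(
        zip(confidences, predictions, labels, disengaged)
    ):
        low_confidence = conf < threshold
        is_error = label is not None and pred != label
        if low_confidence or is_error or took_over:
            hard.append(i)
    return hard


# Example: frame 0 is confident and correct, so it is dropped;
# frame 1 is low confidence, frame 2 is an error, frame 3 had a takeover.
print(select_hard_examples(
    confidences=[0.95, 0.40, 0.88, 0.99],
    predictions=[1, 0, 2, 1],
    labels=[1, 0, 1, None],
    disengaged=[False, False, False, True],
))  # [1, 2, 3]
```

In practice you'd probably also want to deduplicate near-identical frames from the same clip, otherwise one long disengagement can flood the set and work against the diversity point.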