> Vision. Computer vision keeps getting better. Depth sensors are widely available. Interpretation of 3D scenes kind of works. A decade ago, the state of the art was aligning an IC over the right spot on a board and putting it in place.
I disagree, at least with this as evidence for your 5 year timeline - computer vision has been improving, yes, but nothing earth shattering in the last 5 years that I've seen. We've seen good incremental improvements over 30 years here but they don't seem to be approaching "good enough" yet, at least not in a way that would give me confidence we're at an inflection point. Most of the recent interesting improvements have been in areas that don't push the boundaries - they make it easier to get closer to state of the art performance with less: fewer sensors, less dimensional & depth info, etc. But state of the art with expensive multiple sensor setups isn't good enough anyway, so getting closer to it isn't going to solve everything.
Same with the 3D scene stuff: people have been plugging away at that for 30 years, and while I think some of the recent stuff is pretty cool, it still has a long way to go. Whenever you start throwing real world constraints in, the limitations show up fast.
> They make it easier to get closer to state of the art performance with less
Which gets us, for example, cost-effective robotic weeding, and sorting of recyclables. When each sensor only needs about a smartphone's worth of processing capacity, and cameras are cheap, they can be applied in bulk to mundane tasks.
Sure, there are applications where it is a real benefit. Typically (like your examples) where we can manipulate the environment to work around the limitations of the technology. This is a good thing! When the tech gets cheap, it’s easier to apply more broadly.
However it doesn’t really speak to your contention. This is an example of doing less than state of the art perception for much cheaper, but to meet your goal (5 years or otherwise) we need to significantly improve the state of the art.
> computer vision has been improving, yes, but nothing earth shattering in the last 5 years
I totally and completely disagree. Sure, "computer vision" industrial cameras doing edge detection haven't changed much, but the computer vision my phone can do is many orders of magnitude better today than it was 5 years ago.
There are tools now that can take a short video of your bookcase and identify every book. That's serious progress!
Breaking down video into tokens for large language models and asking for structured data out is groundbreaking compared to any non-LLM style machine vision.
I agree there is cool stuff going on in vision, absolutely. But I wasn’t talking about the field in general.
I just don’t think it moves the needle significantly in this particular area. For example, structured data out of a single camera is way better than it was 5+ years ago, but it isn’t as good as a dedicated multi-sensor setup (i.e. state of the art for robotics), and that in turn isn’t good enough for the problems in the GP post - which was the point.