Their repository is pretty neat! It includes three state-of-the-art object detection architectures: Faster R-CNN, R-FCN, and SSD. It is missing YOLO [1][2], though, which shares some similarities with SSD. Another detector is the recently released Mask R-CNN [3], which of course couldn't have been included in this publication, since we can't travel through time yet.
In practice, Faster R-CNN worked better for me than YOLOv2: in contrast to what is reported in the paper, it had higher recall for the detection task I used it for.
"Speed/accuracy trade-offs for modern convolutional object detectors" seems to establish that Faster R-CNN beats R-FCN and SSD-type architectures in accuracy, however YOLOv2 can beat Faster R-CNN and R-FCN in speed, while retaining high accuracy.
Is this a new Google API for use through their cloud offering or is it a set of tensorflow artifacts one can download and use freely without ever contacting Google Cloud?
So they are launching all of these frameworks targeted at mobile, but what's happening to TensorFlow Lite? I'm beginning to think the things they are releasing are scaffolding for it. I really hope it's not going to be vaporware from Google I/O.
Lol. Parallel data computations across resource (including battery) constrained devices? Good news, the owner of the device is now the product. The device is also the product. Can't wait.
I'm going to guess that Google knows a thing or two about mobile devices and their performance characteristics. Also, feeding something through an already trained NN can be pretty darn performant. I'll wait and see what this ends up looking like, but I am hopeful.
We still have a ways to go to completely eliminate false positives, but these tools will help us get there. For example, you can recognize different types of objects now, but we still need to figure out which detections are meaningful (like a person or animal vs. a tree blowing in the wind). Even within a single class, some instances are benign while others are not: pedestrians walking by the front of my house versus a guy in a ski mask fiddling with my window are both people, but their behavior is what separates them.
Even a confidence level would go a long way. If I get a notification that says "motion detected" I have to look at it, but if it said "motion detected, person with 75% confidence" that suddenly becomes much more valuable.
I recently came across a company that's built a ML model to track feet (for footfall observations). It seems that if you had an appropriate training set (labelled feet) you could re-create what they have done with this technology. Perhaps not achieving state-of-the-art, but close. Thoughts?
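For what it's worth, re-training on your own labels mostly comes down to writing them into the TFRecord format the API consumes and pointing a pipeline config at it. Here's a rough sketch of that conversion step; the feature keys follow the API's tf.Example conventions as I understand them, while the image path, box coordinates, and the single 'foot' class are made-up placeholders:

    import tensorflow as tf

    # Hypothetical sketch of converting labelled bounding boxes into the
    # TFRecord format the Object Detection API trains from. The feature keys
    # follow the API's tf.Example conventions; the file paths, boxes, and the
    # single 'foot' class are placeholders.

    def _bytes(values):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

    def _floats(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=values))

    def _ints(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    def make_example(jpeg_bytes, height, width, boxes):
        """boxes: list of (xmin, ymin, xmax, ymax), normalized to [0, 1],
        all labelled as class 1 ('foot' in a one-class label map)."""
        return tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': _bytes([jpeg_bytes]),
            'image/format': _bytes([b'jpeg']),
            'image/height': _ints([height]),
            'image/width': _ints([width]),
            'image/object/bbox/xmin': _floats([b[0] for b in boxes]),
            'image/object/bbox/ymin': _floats([b[1] for b in boxes]),
            'image/object/bbox/xmax': _floats([b[2] for b in boxes]),
            'image/object/bbox/ymax': _floats([b[3] for b in boxes]),
            'image/object/class/text': _bytes([b'foot'] * len(boxes)),
            'image/object/class/label': _ints([1] * len(boxes)),
        }))

    # One record per labelled image; 'foot_photo.jpg' is a placeholder path.
    with tf.python_io.TFRecordWriter('feet_train.record') as writer:
        with open('foot_photo.jpg', 'rb') as f:
            example = make_example(f.read(), 480, 640, [(0.10, 0.55, 0.30, 0.90)])
        writer.write(example.SerializeToString())

From there it's mostly a matter of picking a pretrained checkpoint to fine-tune from and letting the training scripts do the rest, so "close to state-of-the-art" seems plausible if the labels are good.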
If I'm on the street, I just want it to come back with "dog", or maybe "dangerous dog" vs. "friendly dog".
Also, it would be cool/scary to just get back "movable object 1", "person 1", "living movable object 3", etc., and if I give it multiple scenes from a video it would know that person 1 is the same person 1, and if I name them Tony it would keep tracking Tony.
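Chaining a simple tracker onto the detector's boxes would get part of the way there. Here's a toy sketch of the idea (nothing like a production tracker): match each frame's boxes against the previous frame's by IoU and reuse the track ID when the overlap is high. The 0.5 threshold and the name table are arbitrary illustration choices.

    # Toy sketch of keeping stable IDs across frames: greedily match each new
    # box to last frame's tracks by IoU and reuse the ID when overlap is high.

    def iou(a, b):
        """Intersection-over-union of boxes given as (ymin, xmin, ymax, xmax)."""
        y1, x1 = max(a[0], b[0]), max(a[1], b[1])
        y2, x2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    class NaiveTracker(object):
        def __init__(self, iou_threshold=0.5):
            self.iou_threshold = iou_threshold
            self.tracks = {}   # track id -> box from the previous frame
            self.names = {}    # track id -> optional name, e.g. 'Tony'
            self.next_id = 1

        def update(self, boxes):
            assigned, unmatched = {}, dict(self.tracks)
            for box in boxes:
                best_id, best_iou = None, self.iou_threshold
                for tid, prev in unmatched.items():
                    overlap = iou(box, prev)
                    if overlap > best_iou:
                        best_id, best_iou = tid, overlap
                if best_id is None:
                    best_id, self.next_id = self.next_id, self.next_id + 1
                else:
                    del unmatched[best_id]
                assigned[best_id] = box
            self.tracks = assigned
            return assigned   # {track id: box} -- "person 1" stays person 1

    tracker = NaiveTracker()
    tracker.update([(0.10, 0.10, 0.40, 0.30)])    # frame 1 -> track 1
    tracker.names[1] = 'Tony'                      # name the track
    tracker.update([(0.12, 0.11, 0.42, 0.31)])    # frame 2 -> still track 1

Anything beyond short gaps or crossing people would need a real tracker (re-identification features, motion models), but the detector output is a reasonable starting point.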
I imagine you could use the confidence-value output of the object-detection API as input into a separate system that would also incorporate the other inputs you mention.
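Something like the following might work as that gating layer. It's a rough sketch that assumes a frozen graph exported by the Object Detection API with the usual image_tensor/detection_scores tensor names; the graph path, person class id, 75% threshold, and dummy frame are placeholders.

    import numpy as np
    import tensorflow as tf

    # Hypothetical sketch: load a frozen graph exported by the Object Detection
    # API and only raise an alert when a person is detected above a confidence
    # threshold.
    PATH_TO_GRAPH = 'frozen_inference_graph.pb'
    PERSON_CLASS_ID = 1   # 'person' in the COCO label map
    THRESHOLD = 0.75

    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(PATH_TO_GRAPH, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    def maybe_notify(frame, sess):
        """frame: HxWx3 uint8 numpy array from the camera."""
        scores, classes = sess.run(
            ['detection_scores:0', 'detection_classes:0'],
            feed_dict={'image_tensor:0': np.expand_dims(frame, 0)})
        for score, cls in zip(scores[0], classes[0]):
            if cls == PERSON_CLASS_ID and score >= THRESHOLD:
                print('motion detected, person with %d%% confidence' % (score * 100))
                break

    with tf.Session(graph=graph) as sess:
        maybe_notify(np.zeros((480, 640, 3), dtype=np.uint8), sess)  # dummy frame

The "separate system" could then weigh that confidence against other signals (time of day, location in the frame, motion history) before deciding whether to bother you.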
My sentiment exactly. For my full-time startup, we've been trying, testing (many), and productionizing (one) object detection network for the past nine months. It was a tedious effort of implementing papers from last year's CVPR conference. This makes some of our MOJO go away, but in the scheme of things we can focus more closely on our business. Mixed bag.
The researchers have created a framework for object detection in which the feature extraction network is separated from the "meta-architecture" (Faster R-CNN, R-FCN, or SSD) that handles the detection task, so one can easily experiment with different combinations. They compare many models using this framework in https://arxiv.org/abs/1611.10012, and they were able to construct the winning entry of the COCO 2016 detection challenge based on this research.
[1]: https://arxiv.org/abs/1506.02640
[2]: https://arxiv.org/abs/1612.08242
[3]: https://arxiv.org/abs/1703.06870