Supercharge your Computer Vision models with the TensorFlow Object Detection API

yamaneko · on June 17, 2017

Their repository is pretty neat! It includes three state-of-the-art architectures in object detection: Faster R-CNN, R-FCN, and SSD. It is missing YOLO [1][2], though, which shares some similarities with SSD. Another detector is the recently released Mask R-CNN [3], which of course couldn't have been included in this publication, as we can't travel through time yet.

[1]: https://arxiv.org/abs/1506.02640

[2]: https://arxiv.org/abs/1612.08242

[3]: https://arxiv.org/abs/1703.06870

mrplank · on June 17, 2017

There are already newer versions, YOLOv2 and DSSD. See http://github.com/sbrugman/deep-learning-papers

In practice, Faster R-CNN worked better for me than YOLOv2: contrary to what is reported in the paper, it had a higher recall on the detection task I used it for.
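
For anyone wanting to run the same kind of comparison, here is a minimal sketch of how recall on a detection task could be measured. It assumes boxes are (x1, y1, x2, y2) tuples and uses a simple greedy match at an IoU threshold of 0.5; the function names are made up for illustration, and this is not the evaluation code from either paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a not-yet-used detection.

    Greedy matching (an assumption of this sketch): each ground-truth box
    takes the remaining detection with the highest IoU, if it clears the
    threshold.
    """
    unused = list(detections)
    matched = 0
    for gt in ground_truth:
        best = max(unused, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            matched += 1
            unused.remove(best)
    return matched / len(ground_truth) if ground_truth else 0.0
```

Running both detectors over the same labelled images and comparing the two recall values would give the kind of comparison described above.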

pveierland · on June 17, 2017

"Speed/accuracy trade-offs for modern convolutional object detectors" seems to establish that Faster R-CNN beats R-FCN and SSD-type architectures in accuracy, however YOLOv2 can beat Faster R-CNN and R-FCN in speed, while retaining high accuracy.

elliottcarlson · on June 16, 2017

So, could you use this to solve the image recognition captchas that ask you to select all images that contain [object]?

glup · on June 16, 2017

Maybe they will make you do a Captcha before you access the API?

gregable · on June 16, 2017

LOL. The API requires you to first identify all of the objects in a different picture.

andrewrice · on June 16, 2017

That's how they train the API!

genkimind · on June 17, 2017

No, other people ARE the API!

obstinate · on June 17, 2017

It's a ponzapi scheme.

gumby · on June 18, 2017

You're saying it's a kind of soylent green of APIs?

EGreg · on June 16, 2017

So then you bootstrap that using another api key :)

ijidak · on June 16, 2017

It's captchas all the way down. :)

fooker · on June 17, 2017

Mutual recursion ;)

mee_too · on June 16, 2017

They can make you (or your customers) solve a captcha before each API call.

cjhanks · on June 16, 2017

At that point the data will have likely gone full circle. So, maybe.

-- Edit: Strike that. They're not actually providing any model data afaict. I assumed this was comparable to AWS's offering.

spullara · on June 16, 2017

They have pretrained weights from the COCO dataset included with the open source models.

https://github.com/tensorflow/models/tree/master/object_dete...

polskibus · on June 16, 2017

Is this a new Google API for use through their cloud offering or is it a set of tensorflow artifacts one can download and use freely without ever contacting Google Cloud?

spullara · on June 16, 2017

It has been added to the TensorFlow github repository like Inception. You can use it completely independently from Google.

https://github.com/tensorflow/models/tree/master/object_dete...

azernik · on June 16, 2017

Clicking through the two layers of links, it is a GitHub repository containing pre-trained models, training scripts, and scripts for running the models on Google Cloud: https://github.com/tensorflow/models/tree/master/object_dete...

radarsat1 · on June 17, 2017

Holy moly, I can't believe I didn't know about https://github.com/tensorflow/models

matt4077 · on June 16, 2017

Yes. Both.

zitterbewegung · on June 16, 2017

So they are launching all of these frameworks targeted at mobile, but what's happening to TensorFlow Lite? I'm beginning to think that the things they are releasing are scaffolding for it. I really hope it's not going to be vaporware from Google I/O.

wyldfire · on June 16, 2017

I missed I/O -- what's particular to Tensorflow Lite? Is that distinct from the CPU target?

kyrra · on June 16, 2017

Mobile focused version of tensorflow.

haimez · on June 17, 2017

Lol. Parallel data computations across resource (including battery) constrained devices? Good news, the owner of the device is now the product. The device is also the product. Can't wait.

oh_sigh · on June 17, 2017

I'm going to guess that Google knows a thing or two about mobile devices and their performance characteristics. Also, feeding something through an already trained NN can be pretty darn performant. I'll wait and see what this ends up looking like, but I am hopeful.

wyldfire · on June 17, 2017

Many SoCs have under-utilized DSPs that can be used for tensorflow.

E.g. https://www.qualcomm.com/news/onq/2017/01/09/tensorflow-mach...

dgacmu · on June 16, 2017

It's not vaporware. (It's not released yet, but it's not vaporware.) (blah blah this is not an official statement blah blah)

matt4077 · on June 16, 2017

Finally I'm getting the results for all those traffic sign CAPTCHAS I've been solving.

(And I just noticed I should not have included the post as part of the sign; sorry for any inaccuracies I may have caused)

koolba · on June 16, 2017

Anyone know of a sample app that uses this?

Say to detect if something is or isn't a hot dog?

fosk · on June 16, 2017

Yes, here you go: https://itunes.apple.com/us/app/not-hotdog/id1212457521

etaioinshrdlu · on June 17, 2017

https://nothotdog.io

accountyaccount · on June 16, 2017

This would be great to run a security camera still feed through. It could completely eliminate false positives.

kyrra · on June 16, 2017

I wonder if Nest is using it with their new cam[0], as it has person alerts now (with face detection).

[0] https://nest.com/camera/meet-nest-cam-iq/

spullara · on June 16, 2017

In the research blog entry they do say they are using it in the new Nest cams.

https://research.googleblog.com/2017/06/supercharge-your-com...

halflings · on June 16, 2017

They would probably use FaceNet[0] then, if they only want to detect faces, as that should give better results.

[0] https://arxiv.org/abs/1503.03832

odbol · on June 17, 2017

Except they wouldn't be able to detect people wearing masks, which is probably an important thing for a security camera to do...

jd20 · on June 16, 2017

We still have a ways to go before completely eliminating false positives, but these tools will help us get there. For example, you can recognize different types of objects now, but we still need to figure out which are meaningful and which are not (like a person or animal vs. a tree blowing in the wind). Even within a class, some instances are benign while others are not. Pedestrians walking by the front of my house and a guy wearing a ski mask fiddling with my window are both people, but their behavior is what separates them.

accountyaccount · on June 16, 2017

Even a confidence level would go a long way. If I get a notification that says "motion detected" I have to look at it, but if it said "motion detected, person with 75% confidence" that suddenly becomes much more valuable.
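
A rough sketch of what that could look like, assuming the detector hands back (label, score) pairs with scores between 0 and 1. The function name and the 0.5 cutoff are made up for illustration and are not part of any camera product or of the released API.

```python
def describe_motion(detections, min_confidence=0.5):
    """Build a notification string from (label, score) pairs.

    Falls back to the generic message when no detection clears the
    (hypothetical) confidence cutoff.
    """
    confident = [(label, score) for label, score in detections
                 if score >= min_confidence]
    if not confident:
        return "motion detected"
    # Report only the single most confident detection.
    label, score = max(confident, key=lambda pair: pair[1])
    return f"motion detected, {label} with {score:.0%} confidence"


print(describe_motion([("person", 0.75), ("dog", 0.4)]))
# motion detected, person with 75% confidence
```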

KGIII · on June 17, 2017

Digital cameras pick up different parts of the spectrum. My curiosity is raised. Could that be used to increase confidence levels?

monkeydust · on June 21, 2017

I recently came across a company that's built an ML model to track feet (for footfall observations). It seems that if you had an appropriate training set (labelled feet) you could re-create what they have done with this technology. Perhaps not achieving state-of-the-art, but close. Thoughts?

sharemywin · on June 16, 2017

They need some kind of context input.

-GPS position, intent/goal, domain etc.

I'm at a dog show I would want breed etc.

I'm on the street I just want it come back dog maybe dangerous dog, friendly dog.

Also, would be cool/scary to just get back movable object 1, person 1, living movable object 3 etc. and if I give it multiple scenes from a video it knows person 1 is the same person 1 and if I name (them) Tony it keeps tracking tony.

asciimo · on June 16, 2017

> I'm on the street I just want it come back dog maybe dangerous dog, friendly dog.

Most autonomous humans ship with this capability.

BrianHenryIE · on June 16, 2017

But not all.

https://www.google.com/search?q=blind+man+bitten+by+dog

neuronexmachina · on June 16, 2017

I imagine you could use the confidence-value output of the object-detection API as input into a separate system that would also incorporate the other inputs you mention.
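
One very simple way to sketch that idea: treat the context as a table of per-label weights and rescale the detector's scores with it. Everything here (the function name, the weights, the labels) is a made-up illustration, assuming the detector returns (label, score) pairs; a real system would likely learn the combination rather than hand-pick weights.

```python
def rerank_with_context(detections, context_weights, default_weight=1.0):
    """Scale each (label, score) by a context weight and sort best-first.

    Labels missing from context_weights keep their score unchanged
    (default_weight of 1.0).
    """
    rescored = [
        (label, score * context_weights.get(label, default_weight))
        for label, score in detections
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical "dog show" context: prefer breed-level labels over plain "dog".
dog_show = {"dog": 0.5, "beagle": 1.5}
ranked = rerank_with_context([("dog", 0.9), ("beagle", 0.6)], dog_show)
print([label for label, _ in ranked])  # ['beagle', 'dog']
```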

Omnipresent · on June 17, 2017

Would it be able to detect textual regions in an image, the way it picks out the kites and persons in the example image?

vivekrathod · on June 17, 2017

Yes, if you train those models using a dataset with box annotations. If you want to transcribe the text, a more relevant model is: https://github.com/tensorflow/models/tree/master/attention_o...

mlaretallack · on June 16, 2017

Just spent the last 6 months making an ANPR camera. Now I just need to put Python on it. Fun times.

TuringNYC · on June 17, 2017

My sentiment exactly. For my full-time startup, we've been trying, testing (many), and productionizing (one) object detection network for the past nine months. It was a tedious effort of implementing papers from last year's CVPR conference. This makes some of our MOJO go away, but in the scheme of things we can focus more closely on our business. Mixed bag.

nzjrs · on June 17, 2017

What's the hype here? It's a curated model zoo, or?

pveierland · on June 17, 2017

The researchers have created a framework for object detection such that one can easily experiment with using different feature extraction networks, separated from the "meta-architecture" such as Faster R-CNN, R-FCN, or SSD, used to handle the object detection task. They compare many models using this framework, described in https://arxiv.org/abs/1611.10012 - and they were able to construct the winning entry of the COCO 2016 detection challenge based on this research.

throwaway321373 · on June 18, 2017

This doesn't seem to include training scripts?

Drdrdrq · on June 16, 2017

I can't find the license, anyone have better luck?

advisedwang · on June 16, 2017

The root of the repo has Apache license 2.0.

nostrademons · on June 16, 2017

Basically everything Google releases is Apache 2.0. It was company policy when I was there.

Joboman555 · on June 17, 2017

Anyone know what license this is under?

aw3c2 · on June 16, 2017

Direct link https://research.googleblog.com/2017/06/supercharge-your-com...

artursapek · on June 16, 2017

Admins update submission please

impish19 · on June 17, 2017

https://news.ycombinator.com/item?id=14562314