Undocumented Facebook API to identify friends in photos (narenonit.blogspot.com)
215 points by vanwilder77 on June 4, 2016 | 34 comments



Yup yup! This was the screenscraping technique we used to turn Facebook into an automatic face detector: https://arxiv.org/abs/1602.04504

It's a giant pain to screenscrape this using 'curl'. If I recall correctly, the bounding box coordinates I wanted are set as CSS properties in inline HTML, which is wrapped in a JavaScript string literal, which is in turn part of the JavaScript returned by an AJAX call. To get my screenscraper working, I had to do the AJAX call, parse the returned JavaScript, walk the AST to find the string literal I needed, parse the HTML to find the element I needed, then use the computed CSS properties. Looks like the author of this post found a much nicer way.

(note: that work wasn't about recognition; it was about just finding the faces in images, not identifying them)
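
A rough sketch of the pipeline described above, under loud assumptions: the response body is passed in as a string, a regex over double-quoted literals stands in for the real AST walk, inline `style` attributes stand in for computed CSS, and the `face-box` class name is hypothetical rather than Facebook's actual markup.

```python
import json
import re
from html.parser import HTMLParser

# Matches a double-quoted JavaScript string literal, including escaped quotes.
# This is a regex stand-in for walking a real JavaScript AST.
JS_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def string_literals(js_source):
    """Yield the decoded value of each double-quoted literal in js_source."""
    for match in JS_STRING.finditer(js_source):
        try:
            yield json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # literal uses escapes that are not valid JSON; skip it


def parse_inline_style(style):
    """Turn 'left: 10%; top: 20%' into {'left': '10%', 'top': '20%'}."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    return props


class BoxFinder(HTMLParser):
    """Collect position and size from elements carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self.css_class in (attr.get("class") or "").split():
            style = parse_inline_style(attr.get("style") or "")
            keys = ("left", "top", "width", "height")
            self.boxes.append({key: style.get(key) for key in keys})


def find_boxes(js_source, css_class="face-box"):  # class name is hypothetical
    """Return the bounding boxes found in HTML embedded in js_source."""
    boxes = []
    for literal in string_literals(js_source):
        if css_class in literal:
            finder = BoxFinder(css_class)
            finder.feed(literal)
            boxes.extend(finder.boxes)
    return boxes
```

A regex will also match quoted text inside comments and will miss single-quoted literals, so a real parser is the safer choice if the response format is not under your control.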


I'd really like to read your paper. Everywhere I've found it referenced has a paywall. Is the full text freely available anywhere?


Click "PDF" on the right side and you get https://arxiv.org/pdf/1602.04504v1.pdf


There's a little-known way to solve that problem for any paper: post the link to /r/scholar, and you'll get the full pdf within a couple hours.


I wouldn't call it 'little-known', it's become quite "famous" in my academic circles at least.

Just make sure you look up the DOI on libgen.


Sci-hub doesn't have it?


That's a link to arXiv, which is the biggest publicly available preprint server. That means you can get it for free legally.


Have you quantified the number of people per account for whom FB suggests a label on a detected face versus the user's number of friends? It'd be interesting to see how FB's classifier performs.


Hm. I haven't quantified that, no. I think FB will only suggest labels that it's already very confident of.


I'd assume there would be thresholds, but the statistics would be interesting.


Yes, if you look at the code, I extract the JSON object from the response by removing the for(;;); at the beginning, then parse the JSON into a hash and flatten it into an array of strings. Only one value contains the HTML content, so I find it by looking for a CSS selector from that content, then parse the HTML and use the unique CSS class to select the names of the users.
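
A minimal Python sketch of those steps, not the post's actual code: it assumes the response body is already in hand as a string, and the class name `suggested-name` is a made-up placeholder for whatever class the real markup uses.

```python
import json
from html.parser import HTMLParser

PREFIX = "for(;;);"  # guard prepended to the JSON response


def strip_guard(body):
    """Remove the leading for(;;); guard if it is present."""
    return body[len(PREFIX):] if body.startswith(PREFIX) else body


def flatten_strings(node):
    """Yield every string value found anywhere in a decoded JSON tree."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from flatten_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from flatten_strings(item)


class NameCollector(HTMLParser):
    """Collect the text of elements carrying a given class.

    Assumes matching elements contain plain text only (no nested tags).
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.capturing = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.capturing = True
            self.names.append("")

    def handle_endtag(self, tag):
        self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.names[-1] += data


def extract_names(body, css_class="suggested-name"):  # hypothetical class name
    """Return the names found in the HTML fragment buried in the response."""
    payload = json.loads(strip_guard(body))
    names = []
    for text in flatten_strings(payload):
        if css_class in text:  # only the HTML-bearing value mentions the class
            parser = NameCollector(css_class)
            parser.feed(text)
            names.extend(n.strip() for n in parser.names if n.strip())
    return names
```

Flattening first means the code does not depend on where in the JSON tree the HTML value sits, which matters for an undocumented response format.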


I've never liked Facebook's "do you want to tag your friend" feature. It's a loaded question, like... have you stopped beating your wife? However you answer it, you've given Facebook feedback about their facial recognition.

If I hit yes, I'm tagging friends who might not want to be tagged. Furthermore, I might end up in the same boat with friends tagging pictures of me! Either way, I help better Facebook's facial recognition, which unnerves me.

On the surface, clicking "no" means that they got the facial recognition wrong. But what else am I revealing? If the match was 98%, would they infer that one of us (or both) is concerned about privacy? That we have something to hide?

The third alternative is to click nothing. The only information that gives Facebook is that I'm not interested in helping curate their data any more than I already am.


That's a good point and it also brings to mind a way to fight it.

This http://arxiv.org/pdf/1412.1897v4.pdf could be one way. You could generate images that do not actually show the person who wants to protect their privacy, but that are optimized so the neural net strongly matches them to that person; the images themselves could be anything (just noise, another person, etc.). You could generate batches of those, upload them, and confirm to Facebook that they are indeed photos of the person who wants their privacy back. You could repeat this (automatically) and corrupt the weights in FB's neural net, which would defeat their face detection abilities for the individual in question.


Interesting paper, but I'm not sure that's entirely practical.

If I were Facebook, I'd look at the entropy of an image before attempting to classify it. Anything particularly high (noise) or low (squiggles) would be discarded before running through the classifier.
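
One way to read "entropy of an image" is the Shannon entropy of its pixel-value histogram. The sketch below assumes a flat sequence of 8-bit grayscale values, and the cutoffs are arbitrary placeholders, not anything Facebook is known to use.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits per pixel, of the pixel-value histogram."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def worth_classifying(pixels, low=1.0, high=7.5):
    """Reject images whose entropy is suspiciously low or high.

    The low and high cutoffs are illustrative guesses only.
    """
    return low <= shannon_entropy(pixels) <= high
```

Histogram entropy ignores spatial structure, so shuffling the pixels of a real photo leaves the score unchanged; a filter like this would likely need a spatial measure as well.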


So, why do you still use Facebook again? FOMO > privacy?


For now, it's still the simplest and easiest way to keep in touch with my family and friends. I tried living like I'm a CIA agent for a while, True Crypting everything, disconnecting from social networks, and I came to one conclusion:

I'm still just as vulnerable to attack, if not more (e.g. trying to host my own email service). The only difference is I'm markedly more isolated from my friends. It simply wasn't worth the trade-off.

It's similar to people who get ridiculed for stockpiling guns and food in underground bunkers for the coming War-for-Independence 2.0. Sure, I could become a "digital prepper" and survive the data-pocalypse--likely at the cost of my relationships.

It's not like a "10 Cloverfield Lane" lifestyle is particularly appealing to me, either. I'll live with my friends for now, with only a mild sense of paranoia.


Same reason everyone used Windows in the workplace in the 90s/00's? Network effect = no reasonable alternative, and abstinence means you become a pariah.

I say this as someone who has exercised the option to avoid Facebook entirely for the past 7 years (including noscript+faceblockers for the pervasive thumbs-up buttons).


The title (and half of the post) seems to say that there is some undocumented API that can do more than what the actual Facebook service does with its regular usage. This does not seem to be the case.


This is great! I tended to favor this technique for testing during my time in QA automation.

People gravitate towards Selenium too quickly, when you really only need Selenium to test rendering.


Is there a way to get the position (coordinates) of the face as well, just like it is on Facebook?


This is scary.


It's not that bad.

> I found this API has a limitation it only suggests people in the photo who are in your FB friend list


Right. That's Facebook's compromise on privacy. EPIC is suing them over even that.[1]

Face.com did facial recognition on Facebook with no limits back in 2010.[2] Then Facebook bought them and killed the broad face recognition.[3]

There are other face recognition companies now, but they're keeping a lower profile about how broad their database is. Except for Findface[4] in Russia. They loaded up the entire photo database of Vkontakte, a social network in Russia. 70% success in identifying random young people on the St. Petersburg metro.

[1] https://www.epic.org/privacy/facebook/Facebook-Biometric-Rul...
[2] https://web.archive.org/web/20100701083622/http://face.com/
[3] https://web.archive.org/web/20120723211743/http://face.com/
[4] http://findface.ru/


I tested Findface and it was unable to find my Vkontakte account, despite a copious amount of photos on my account there. It did, however, find a huge number of Russians that I would say look exactly like me. Also, strangely, a couple of Vkontakte accounts that had Agent Gibbs as profile pics. It's a pretty cool app!


It found mine, with two different starting pictures. It also found about 60 Russian dudes who my wife said looked nothing like me.


Yeah, OK, now think about the version Facebook (and thus law enforcement, intelligence agencies and advertisers) has access to.


Why would FB share any of this with advertisers?


For money.


They don’t need to share this for money. Advertisers don’t want your data, they want accurate targeting.


I believe them when they say they don't share, but are you sure about that? Even advertisers that already have my full name and some data, they wouldn't want to link with what Google and Facebook have?

To be concrete, my local supermarket and Amazon have my full name, and partial purchase information (the supermarket via a loyalty card). Both have ways to contact me with promotions. You don't think they'd like to know what I like on Facebook (if I did that) or what I search on Google that might suggest purchase intent?

I think they'd love that data, the platforms just make more money by only allowing them to target with it.


Off the top of my head: so that it could coordinate the data Facebook has with the data in a store discount card database.


Why would that version be any better than the one that is generally available?

The only difference that I imagine exists is being able to search from the pool of "everyone" rather than "your friends", but that would make it less confident. In the law enforcement version, there would probably be a way to anchor the search to another individual just to improve accuracy to an acceptable level.


I can't imagine it would be that good? I suspect limiting the search space to your friend list (and probably the part of your friend list that you're ever actually likely to meet) is the only way to keep these results even a little accurate.


Uhm... Duh?



