Yup yup! This was the screenscraping technique we used to turn Facebook into an automatic face detector: https://arxiv.org/abs/1602.04504

It's a giant pain to screenscrape this using 'curl'. If I recall correctly, the bounding box coordinates I wanted are set as CSS properties on inline HTML, which is wrapped in a JavaScript string literal inside the JavaScript returned by an AJAX call. To get my screenscraper working, I had to make the AJAX call, parse the returned JavaScript, walk the AST to find the string literal I needed, parse the HTML to find the element I needed, then use the computed CSS properties. Looks like the author of this post found a much nicer way.
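
For anyone trying to picture that pipeline, here is a rough Python sketch of the last few steps. It is not the code from the paper and it makes several simplifying assumptions: a regex stands in for a real JavaScript parser and AST walk, inline `style` attributes stand in for computed CSS, and the class name `faceBox` and the sample response are made up for illustration.

```python
import json
import re
from html.parser import HTMLParser

# Matches a double-quoted JavaScript string literal, including escaped characters.
# Assumption: a regex is enough here; the real scraper walked a proper JS AST.
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')


def string_literals(js_source):
    """Yield decoded double-quoted string literals found in JavaScript source."""
    for match in STRING_LITERAL.finditer(js_source):
        try:
            # Most double-quoted JS literals are also valid JSON strings.
            yield json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # Skip literals that use JS-only escapes.


def parse_inline_style(style):
    """Turn 'left: 10%; top: 5%' into {'left': '10%', 'top': '5%'}."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    return props


class FaceBoxParser(HTMLParser):
    """Collect inline-style boxes from elements carrying a given class."""

    def __init__(self, box_class):
        super().__init__()
        self.box_class = box_class
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.box_class in classes:
            style = parse_inline_style(attributes.get("style") or "")
            self.boxes.append(
                {key: style.get(key) for key in ("left", "top", "width", "height")}
            )


def extract_boxes(js_source, box_class="faceBox"):
    """Return boxes found in HTML embedded in JS string literals.

    `box_class` is a hypothetical class name, not one taken from the real page.
    """
    boxes = []
    for literal in string_literals(js_source):
        if "<" not in literal:
            continue  # Not HTML, so nothing to parse.
        parser = FaceBoxParser(box_class)
        parser.feed(literal)
        parser.close()
        boxes.extend(parser.boxes)
    return boxes


# Made-up response body, for illustration only.
SAMPLE_JS = (
    'handle({"markup": "<div class=\\"faceBox\\" '
    'style=\\"left: 10%; top: 20%; width: 30%; height: 40%\\"></div>"});'
)
print(extract_boxes(SAMPLE_JS))
```

If the coordinates come from a stylesheet rather than an inline `style` attribute, this shortcut will not find them, and you would need something that actually computes styles.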

(note: that work wasn't about recognition; it was about just finding the faces in images, not identifying them)




I'd really like to read your paper. Everywhere I've found it referenced has a paywall. Is the full text freely available anywhere?


Click "PDF" on the right side and you get https://arxiv.org/pdf/1602.04504v1.pdf


There's a little-known way to solve that problem for any paper: post the link to /r/scholar, and you'll get the full pdf within a couple hours.


I wouldn't call it 'little-known', it's become quite "famous" in my academic circles at least.

Just make sure you look up the DOI on libgen.


Sci-hub doesn't have it?


That's a link to arXiv, which is the biggest publicly available preprint server. That means you can get it for free legally.


Have you quantified, per account, the number of people FB suggests a label for on a detected face vs. the user's number of friends? It'd be interesting to see how FB's classifier performs.


Hm. I haven't quantified that, no. I think FB will only suggest labels that it's already very confident of.


I'd assume there would be thresholds, but the statistics would be interesting.


Yes, if you look at the code, I extract the JSON object from the response by removing the for(;;); at the beginning, then parse the JSON into a hash and flatten it into an array of strings. Only one value contains the HTML content, so I find it by checking for a CSS selector from that content, then parse the HTML and use the unique CSS class to pick out the users' names.
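
Roughly, those steps might look like the Python sketch below. This is not the post's actual code; the class name `taggedName` and the sample response are invented for illustration, and the HTML handling assumes the name elements contain no void tags such as `<br>`.

```python
import json
from html.parser import HTMLParser

GUARD_PREFIX = "for(;;);"


def parse_guarded_json(body):
    """Strip the leading for(;;); guard, then decode the JSON payload."""
    text = body.lstrip()
    if text.startswith(GUARD_PREFIX):
        text = text[len(GUARD_PREFIX):]
    return json.loads(text)


def flatten_strings(value):
    """Yield every string value nested anywhere inside decoded JSON."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from flatten_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from flatten_strings(child)


class NameParser(HTMLParser):
    """Collect the text of elements that carry a given class."""

    def __init__(self, name_class):
        super().__init__()
        self.name_class = name_class
        self.depth = 0  # Greater than 0 while inside a matching element.
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # Nested tag inside a matching element.
        elif self.name_class in classes:
            self.depth = 1
            self.names.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.names[-1] += data


def extract_names(body, name_class="taggedName"):
    """Return names from the one string value that holds the HTML.

    `name_class` is a hypothetical class name, not the real one.
    """
    names = []
    for text in flatten_strings(parse_guarded_json(body)):
        if name_class not in text:
            continue  # Only the HTML value mentions the class.
        parser = NameParser(name_class)
        parser.feed(text)
        parser.close()
        names.extend(name.strip() for name in parser.names)
    return names


# Made-up response body, for illustration only.
SAMPLE_BODY = (
    'for(;;);{"payload": {"items": [{"html": '
    '"<span class=\\"taggedName\\">Alice</span>'
    '<span class=\\"taggedName\\">Bob</span>"}], "count": 2}}'
)
print(extract_names(SAMPLE_BODY))
```

Flattening first means you don't have to know where in the response the HTML lives, which helps when the JSON layout changes.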



