Detecting Deepfake Video Calls Through Monitor Illumination (unite.ai)
67 points by Hard_Space on July 8, 2022 | 29 comments



I was surprised to learn that children figured this out during the pandemic, in the form of shining a light at their webcam to see who in the class has their window "pinned".


Hah, creepy but I guess understandable behavior for teenagers. Funny how everyone became a streamer for a few months.


I have a really bad feeling about this technique.

Change your webcam output to 3 fps, mumble something about a bad connection, and Bob's your uncle.

But more fundamentally:

If I am deepfaking my appearance on video call, I have full control of my machine. I thus can (with negligible lag) just grab the screen content. Once I have that, it's really not too much work to write a filter projecting this screen onto the deepfake image.

In other words: as a 'security by obscurity' method this might have worked for a while, but now that it's public knowledge, I expect deepfake software to defeat it within the year, since this really is not a computationally or architecturally demanding change. It is ultimately just a filter, corresponding to the screen content, mirrored and super-blurred. That shouldn't have any discernible lag on a modern computer.
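To make it concrete, the whole 'filter' I have in mind is something like this untested sketch (mss/OpenCV are just illustrative library choices, nothing specific to any real deepfake pipeline):

    # Untested sketch of the cheap counter-filter: grab the screen, mirror and
    # super-blur it, then blend that coarse colour wash over the outgoing frame.
    import cv2
    import numpy as np
    from mss import mss

    def fake_monitor_glow(outgoing_frame, grabber, strength=0.15):
        shot = grabber.grab(grabber.monitors[1])            # current content of monitor 1
        screen = np.ascontiguousarray(np.array(shot)[:, :, :3])  # BGRA -> BGR
        h, w = outgoing_frame.shape[:2]
        screen = cv2.resize(screen, (w, h))
        screen = cv2.flip(screen, 1)                        # mirrored, as the face would 'see' it
        screen = cv2.GaussianBlur(screen, (151, 151), 0)    # super-blurred: only coarse colour left
        return cv2.addWeighted(outgoing_frame, 1.0, screen, strength, 0)

    grabber = mss()
    # per outgoing webcam frame: frame = fake_monitor_glow(frame, grabber)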


The next 10 years of software dev will be defined by the arms race between fraud generation (deepfakes, GAN images, GPT-X bots) and detector bots. What will end up happening is that the detectors will catch the bottom 99% of fraudulent content, leaving the top 1% free rein over the dataspace.


And then there's the impact of false positives. You could even get a situation like we have with some CAPTCHAs where the best fakes are "hyperreal" and actually detect as less suspicious than many real images.


I personally like that deepfakes exist. Anything can be real and anything can be a lie. Plausible deniability for all.


Indeed. And it forces everyone to think critically. No pictures, audio or video online are trustworthy by themselves.

Even in science, you can never be completely sure of anything. "Sure enough" for purposes of human life, perhaps.

But you can invalidate anything:

https://en.wikipedia.org/wiki/Karl_Popper


Potentially resulting in even more division and radicalisation. The human tendency is to judge evidence heavily according to one's priors: does the evidence allow me to keep my beliefs, or does it force me to change them? Evidence of questionable authenticity will be believed if it reinforces the priors and disbelieved if it contradicts them, thereby increasing the divergence of held beliefs.


If you want to find truth, and you understand the nature of reality and perception, it will make you want to seek out counterexamples to your beliefs.

I am thankful for anyone pointing out inconsistencies between my own statements and reality. It will let me learn more easily, and refine my beliefs.


Yeah, there's a difference though between these aspirations - which I happen to share - and what is a statistically observable human bias that ought to shape our expectations of what's realistically going to happen.


You are right. I apologize for unconsciously trying to divert the conversation.

Indeed, I now realize what you are saying is more people will get more material for their confirmation bias.


Lol, no need for apology here! You have given me no reason to doubt that you are exchanging viewpoints in good faith.


>I am thankful for anyone pointing out inconsistencies between my own statements and reality

You will find there are people who are not appreciative of that, and who are willing to go as far as ending your life if you go too far in pointing out their faults.


> And it forces everyone to think critically

Overly optimistic.


We're not there yet, unfortunately. There has to be a large-scale public scandal with unmistakable proof of deepfaking before the public comprehends the reality we are in.


The usual problem is, of course, whether that plausible deniability will result in everyone being given an equal benefit of the doubt, or whether it will be (ab)used for selective enforcement.


The model here is about using a challenge window with an illumination sequence specifically designed to change faster than a live deepfake model can incorporate on the fly without creating telltale artifacts from improperly approximating the illumination's effect on the subject. Correctly faking the effects of monitor illumination is a non-trivial challenge for well-chosen illumination patterns.
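As I understand it, the verification side then boils down to a correlation test between the displayed challenge sequence and the hue of the face over time, along these lines (my own sketch of the idea, not the paper's code):

    # Sketch of the verification idea (my reading, not the paper's code):
    # a live face lit by the monitor should track the challenge signal,
    # while a naive deepfake's facial hue stays flat.
    import numpy as np

    def illumination_correlation(face_hues, challenge_signal):
        # face_hues: mean hue of the detected face region, one value per frame
        # challenge_signal: the colour/brightness the verifier displayed per frame
        face_hues = np.asarray(face_hues, dtype=float)
        challenge_signal = np.asarray(challenge_signal, dtype=float)
        return np.corrcoef(face_hues, challenge_signal)[0, 1]

    # flag the call if abs(illumination_correlation(...)) is near zero
    # (the threshold is something you'd have to tune; it's not a number from the paper)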


I read the article.

I was proposing to skip the (computationally expensive) step of approximation of the illumination effects on the subject within the live deepfake model, and just do a (computationally cheap) filter over the webcam image to be sent out.

I was saying 'correctly faking' might not be necessary, so let's fake it incorrectly. Of course I haven't tested it, but from reading the article, this seems to be enough to fool their detection model.

If I misunderstood this, please correct me, I am eager to learn. But please don't just repeat the article back at me. Thank you.


> Correctly faking the effects of monitor illumination is a non trivial challenge for well-chosen illumination patterns.

Surely not.

If you have a face in a specific orientation and lighting the whole point of deep fakes is that it generates “style transfer” of the incoming image, including lighting.

Doing it in real time is a computational restriction that makes it expensive, not difficult.

Are you really suggesting that the deepfake models can’t accurately reproduce structured light and shadow patterns?

I find that very surprising, and not what the paper says ("We verified that the deep fakes created by Avatarify (github.com/alievk/avatarify-python) do not incorporate the environmental lighting and are therefore easily identifiable because in the presence of our active illumination, their temporal facial hue is flatlined with a nearly zero correlation." <— they didn't even test it except on a system that doesn't do lighting transfer).

…so, this feels a lot more like "well, if you're doing a crappy off-the-shelf meme using deepfakelive it's pretty obvious" rather than "detect a convincing DF".

The approach probably doesn’t scale against more sophisticated models or opponents with adequate resources.


> If I am deepfaking my appearance on video call, I have full control of my machine. I thus can (with negligible lag) just grab the screen content. Once I have that, it's really not too much work to write a filter projecting this screen onto the deepfake image.

The article already addresses this, in its fourth paragraph:

> The theory behind the approach is that live deepfake systems cannot respond in time to the changes depicted in the on-screen graphic, increasing the ‘lag’ of the deepfake effect at certain parts of the color spectrum, revealing its presence.


I was proposing to skip the (computationally expensive) step of approximation of the illumination effects on the subject within the live deepfake model, and just do a (computationally cheap) filter over the webcam image to be sent out.

I was saying 'correctly faking' might not be necessary, so let's fake it incorrectly. Of course I haven't tested it, but from reading the article, this seems to be enough to fool their detection model.


What if you have a good lighting setup and your camera isn't attached to your monitor?

I do video calls with an SLR connected to my laptop, so the screen isn't directly in front of me (it's about 15 degrees off centre, to the left of the camera), it's further away than it would be if I was using the laptop's camera, and the LED lights more than overpower any light coming from the monitor. Seems like this would flag me as fake, since the monitor couldn't project enough light onto me, and what little it could would come from the "wrong" place.


You have more than enough time even at 60fps to map an illumination texture on top of your generated video, even if you map it on a rough 3d model. And you can even add artificial latency to give yourself an extra 10 or 20ms.
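The latency part in particular is trivial, e.g. a short frame FIFO (sketch; the numbers are just illustrative):

    # Sketch: hold frames in a short FIFO so the overlay step gets an extra
    # frame or two of headroom (one frame is ~17 ms at 60 fps).
    from collections import deque

    class DelayBuffer:
        def __init__(self, delay_frames=1):
            self.buf = deque()
            self.delay = delay_frames

        def push(self, frame):
            self.buf.append(frame)
            if len(self.buf) > self.delay:
                return self.buf.popleft()   # emit the frame from delay_frames ago
            return None                     # still warming up: output nothing yet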


That is what I proposed above, but very few seem to have understood it. Thanks for a different way of phrasing. Much appreciated.


Unless there is a fundamental breakthrough in science somehow, these kinds of techniques to detect deepfakes will in turn be circumvented in the deepfake algorithm itself. It's also almost impossible to use these techniques properly because of the adaptive bandwidth techniques in video streaming. In the interim, the only way to use them is to not reveal them: make them proprietary and keep them only for verification. I am sure the NSA will do exactly that.


On a related note, monitors do a great job of providing a nicely diffuse light source if you need a bit more pop in a dark environment. I just did this yesterday on a 5am call with an office on a 12-hour time difference. My desk lamp was doing fine for the left side of my face, so I fired up a full-screen text editor on the monitor to my right, and tada!


Is it the case that deepfakes are really only possible with low-resolution video? That as resolutions increase, you cannot create a deepfake while still preserving the high resolution?


It gets computationally more expensive.

At the moment, if you have some random laptop, then yes, only low-res is possible. If you have an A100 or something like that, much higher resolution is. It just gets ever more computationally expensive, with a caveat.

The caveat is that you can use ML to upscale a low-res image to a higher-res one. This is computationally expensive as well, but less so than the deepfake itself.

So the answer is: with increasing compute, no. Higher resolutions are very much possible, given high-quality training data.
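For the upscaling caveat, the idea is roughly this (sketch using OpenCV's dnn_superres module from opencv-contrib-python; the ESPCN_x4.pb model file and the image names are placeholders, you'd have to download a pretrained model yourself):

    # Sketch of ML upscaling a low-res frame with OpenCV's dnn_superres module.
    # The model file and image names below are placeholders, not from the article.
    import cv2

    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel("ESPCN_x4.pb")                    # pretrained 4x super-resolution weights
    sr.setModel("espcn", 4)                        # algorithm name and scale factor
    low_res = cv2.imread("deepfake_frame.png")
    high_res = sr.upsample(low_res)                # 4x larger, learned detail filled in
    cv2.imwrite("deepfake_frame_4x.png", high_res)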


Trust doesn't come from carefully observing video, bank note security elements, or stitching on a handbag. Trust forms in the supply chains.





