Regarding everyone's latency concerns: as someone who has done low-latency audio processing on Android, in their defense I'd bet almost anything the demo is only meant to demonstrate the math behind this. Depending on the platform (Android, cough), low-latency audio processing can be almost a dark art in itself. And hey, look, they're doing this on Android.
My guess is that they decided to release the demo earlier instead of spending days/weeks getting up to speed with low-latency audio processing in the Android JNI.
It's an academic demo/press release. Not a software release for production/market.
I'm curious, besides doing things in C/C++, are there any "magic tricks" to doing low-latency audio processing on Android? Looking at the chart here [1] it still seems to be a good bit behind iOS.
In the Java layer? About the only thing you can do is ensure that you're using the device's native sampling rate: typically 48 kHz for phones, and 44.1 kHz for tablets. Non-native sampling will induce a rather large latency hit. The buffering + buffer-size stuff, unfortunately, is really only accessible in native code IIRC.
To be completely honest, it's been a long time since I've messed with audio-stuff on the Java-side, so not sure if/how much things have changed for the better.
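For what it's worth, the Java-side version of that trick looks roughly like this (a sketch from memory, not production code; AudioManager exposes the native rate and buffer size as string properties, and getProperty can return null on some devices):

    import android.content.Context;
    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;

    // Ask the device for its native output rate and buffer size and build the
    // AudioTrack around them, so the framework doesn't resample (resampling is
    // where a big chunk of the extra latency comes from).
    public final class NativeRateAudio {
        public static AudioTrack createOutputTrack(Context ctx) {
            AudioManager am = (AudioManager) ctx.getSystemService(Context.AUDIO_SERVICE);

            // e.g. "48000" on most phones, "44100" on some tablets
            int sampleRate = Integer.parseInt(
                    am.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE));
            int framesPerBuffer = Integer.parseInt(
                    am.getProperty(AudioManager.PROPERTY_OUTPUT_FRAMES_PER_BUFFER));

            int minBytes = AudioTrack.getMinBufferSize(sampleRate,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
            int bufBytes = Math.max(minBytes, framesPerBuffer * 2); // mono, 16-bit

            return new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                    bufBytes, AudioTrack.MODE_STREAM);
        }
    }

That only gets you so far, though; the serious buffer tuning still lives down in native code.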
A lot of production input devices for stuff like waving hand gestures and pen input also have unacceptably high latency. Getting latency down is hard, and makes a huge difference to usability.
I work with sonar, and the physical positioning of the sensors is important in trying to get useful results. Why is it that these academic types don't release the APKs or software? Just publications and maybe a video.
You should try asking these academic types for the data and source code.
A lot of researchers are more than happy to discuss their work, but a big part of the academic industry is your research's impact and references. A way to get a better handle on who is looking at or following up on your work is to implicitly ask them to have a conversation with you before getting the whole kit 'n caboodle.
These sorts of comments are why I come here. Thank you. Without prior awareness that something is common practice it can be extremely difficult to recognize it as an option. I simply lack the motivation for this behavior. If I were to release a teaser about my work it would be because I'm not ready to release my work.
I read an abstract for a paper once (on deblurring images) and could not find the original paper for free. I emailed the author about the situation and got a full-color hard copy in the mail shortly after. My best hope had been a .pdf by email, but he exceeded that by far. It's still on my shelf a decade later, while a .pdf probably would have gotten lost in the GBs.
And in many cases the universities own the IP associated with the research that goes on in their departments so they keep the source and treat it like a company would treat a trade secret. It's likely Washington will patent this and try to license the patents.
What's the legal situation with these academic research papers? Am I allowed to implement an algorithm from a research paper and then either sell the software or release it as open source?
I assume other academics are allowed to reimplement methods in order to reproduce the result and to compare to their own methods. Can I do the same as a learning exercise?
I think we left the topic of "science" about 4 posts up. I don't know what's being described here, but it's not science. Yet somehow, I get the feeling that I'm paying for it.
No, he means implicitly. By not providing everything, the academic implicitly requests a conversation before providing everything. The request is never stated but is implied by the assumption that there is no other way to acquire everything.
I actually assumed that before. The way you had quoted it was revealing. My comment reeks of nerd-rage now that I re-read it. Damn italics. I should have mentioned that it would be fair to say: The academic wants an explicit request for everything. I can see how it would have been easy to misinterpret his statement in this way. It's true. It's just not what was stated.
If you're reading the paper you're aware of their existence. If you want to know more, you have to ask them, but they never directly (explicit) state this. It's implied and something of a cultural (academic/research culture) unwritten rule.
No. I wrote that 8 years ago, I've open sourced the code, anyone who wants to take it and run with it can. It's too much work for me to maintain a system that would do this for people.
It depends on the conference or journal they submit to. I typically request that authors release data and code in the review and the same is requested of me when I submit a paper for review. I don't know, maybe CHI doesn't have that sort of culture. Or maybe they do and these students just don't have the time right now and plan to do it right before the conference in May.
Indeed, there is no culture of replicability at CHI (and even less so at UIST). Reviewers usually reward novelty and cool PoC videos, not thoroughness. It is quite rare (especially for U.S. labs) to also publish source code, schematics, or raw data. There have been some initiatives advocating for replicability, and some researchers indeed publish everything, but in the whole, a quick, shiny video of a PoC implementation is often sufficient for a paper to be accepted.
Having been in this kind of situation before: because the software isn't ready for any kind of release. They probably spent a week gathering and tuning the parameters on the DSP for each individual phone they ran it on, and without their knowledge of the system it'd take you a month and a half to get it working on your phone.
If they are publishing a paper, their methods (including custom source code) should be ready for release for peer review. If their software sucks, they should not rely on it for scientific results.
Here's a case that I have published under before: their software is ready for their own use, for research by them personally. However, it has no help, documentation, automation, magic, or even small amounts of assistance. It doesn't work until you have it spit out a page of numbers, which you feed into Matlab and hack on for a week before compiling a new copy of the app that has the resulting calibration matrix baked into the code. The instruction manual for the apk you're asking for would be equivalent to a bachelor's in computer science and half a PhD in DSP and machine learning. That's why they're not releasing it. It proves that their math works and the approach is valid. Replacing their expertise and making it fast enough and reliable enough for general release would require a startup, six developers (three of whom need PhDs), a UX team, and a year and a half of work. The approach works and they can prove it. That's it.
I wonder how accurate it really is. The demo video didn't match up with the movements at all and the on-screen drawings looked like prerecorded video that they were trying to sync to.
It's a neat idea, but without a dedicated component or an extremely high-speed RTOS, you're not going to come close to the level of accuracy that's really needed to do the math and still allow interaction.
I don't mean to rain on the parade, but I just don't think they really have anything usable.
They say they use an "inaudible high frequency soundwave", so that should be > 20kHz. Shouldn't a buffer of a few milliseconds be more than enough then?
Presumably the buffer is longer to make the system more robust by avoiding spurious detections, not because of some fundamental limit like the Nyquist rate. You would need to set the buffer size experimentally.
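For a sense of scale, here's my own back-of-the-envelope (the distances and rate are assumptions, not numbers from the paper):

    // Rough echo budget: a finger ~0.5 m away, speed of sound ~343 m/s,
    // 48 kHz capture -- assumed values, not the paper's actual parameters.
    public final class EchoBudget {
        public static void main(String[] args) {
            double speedOfSound = 343.0;  // m/s in room-temperature air
            double range = 0.5;           // m, assumed max finger distance
            int sampleRate = 48_000;      // Hz, typical phone native rate

            double roundTripSec = 2 * range / speedOfSound;      // ~2.9 ms
            double roundTripSamples = roundTripSec * sampleRate; // ~140 samples

            System.out.printf("round trip: %.2f ms (%.0f samples)%n",
                    roundTripSec * 1e3, roundTripSamples);
        }
    }

So a few milliseconds does cover the echo itself; any buffering beyond that would be there for averaging and rejecting spurious detections, as you say.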
> I just don't think they really have anything usable.
I'm not sure that's the purpose. I wouldn't think of this as being intended to be a fully usable product right now. They could be intending it merely as an interesting experiment to explore new possibilities for interaction with mobile and wearable tech.
Another CS student came up with a virtual keyboard using the iPhone's accelerometer[1]. It only had ~80% accuracy[2], so was it all that useful or practical? Probably not. But could it lead to another person or company refining the technique for production in the future? Certainly.
I would totally disagree. I don't think it would have to have much fidelity or low latency to be insanely useful. The huge advantage is that no extra hardware is necessary. And maybe it might be hard for you to believe it could do what you want it to do, but I'm guessing your vision is pretty narrow.
Have you tried the Orion SDK? It's an order of magnitude improvement in tracking accuracy, even with the older hardware. https://developer.leapmotion.com/orion
We need to immediately improve PIN-code protection for cash withdrawals at ATMs. The problem has been there for a while, but man, it gets easier and easier.
Or be (not, it would seem) overly paranoid like me: my PIN patterns and entry don't involve moving my fingers horizontally. I put my hand down on the pad with my fingers on set keys, cover it with my wallet, and type it in. I also always double one key to make it that much harder to get by observation, a trick I learned from a sysadmin with major access to my school's systems back when we all had to use public terminals a lot.
By keeping your fingers longer on the keys you're actually making it easier for the person after you to just take an IR shot of the keyboard and reduce their search space to 16 combinations or less.
This is designed to foil garden-variety skimmers, now and in the foreseeable future, not someone "after" me. Who's going to go to that much trouble when there are, I presume, many many more people entering their PINs in ways that are easily skimmed?
Have the PIN or password repeat one key in the sequence. Like "mwfabrrpg", if you type quickly an observer won't notice that the 'r' was typed twice in quick sequence.
That's a great observation. While it's not like ATMs were super secure to start with, now anyone who can mimic this sonar tech can put a device that just looks like it's supposed to be at the ATM near the pad and pick up people's PINs.
You don't even need the sonar, because the buttons are clicky and you should be able to triangulate the origin of the sound quite easily. Also, I guess this would work with just one microphone and simple pattern matching (I assume every click + echo pattern from the structure makes every button sound quite different). The microphone should listen to vibrations in the structure (not air waves). The device could be quite far away from the keyboard if it's connected to the same structure and can hear the clicks.
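If anyone wants to picture the triangulation half of that, here's a toy sketch (a hypothetical helper, nothing to do with any real device): estimate the time-difference-of-arrival of a click between two microphones by brute-force cross-correlation.

    // mic1/mic2 are short PCM windows around the same click, equal sample rates.
    public final class ClickTdoa {
        public static int estimateDelaySamples(double[] mic1, double[] mic2, int maxLag) {
            int bestLag = 0;
            double bestCorr = Double.NEGATIVE_INFINITY;
            for (int lag = -maxLag; lag <= maxLag; lag++) {
                double corr = 0;
                for (int i = 0; i < mic1.length; i++) {
                    int j = i + lag;
                    if (j >= 0 && j < mic2.length) {
                        corr += mic1[i] * mic2[j];
                    }
                }
                if (corr > bestCorr) {
                    bestCorr = corr;
                    bestLag = lag;
                }
            }
            // delay = bestLag / sampleRate; at ~343 m/s that pins the click to a
            // hyperbola between the two mics, and a third mic (or the structure-borne
            // "signature" idea above) narrows it down to a single key.
            return bestLag;
        }
    }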
I suppose he means that a small microphone/speaker setup near an ATM's keypad could allow you to track the position of the finger and get the corresponding keypresses without any intrusive modification to the ATM itself.
Of course that would only be valid if you proceed to then steal the person's card right afterward, so all in all, not that useful.
Everywhere these days you can see ATMs asking you to visually check that the keypad and card slot look exactly like the picture. With this technology, you don't need to fit a fake keypad to "hear" the PIN sequence; you could just "listen" to it from somewhere else, some place that isn't watched closely.
I wonder how much power all the processing draws. Judging from the slow movements and the delayed update on the screens in this video, it's pretty heavy on the processor.
The question is, is the multi-second latency because of processing and code-efficiency limitations for an academic research project, or is it because the data is unusable without two seconds of smoothing? Given what I see in the video, I could argue either way. I do note the latent signal is still a bit noisy, but then, touch-screen input isn't necessarily clean either.
But it's also worth keeping in mind this is all off-the-shelf hardware. It seems very likely to me that if a cell phone or smart watch was designed to do this from the get-go that several easy hardware improvements and maybe a bit of custom DSP work would make this work much better. (By "easy hardware improvements", I mean things like speakers intended to emit frequencies for sonar, microphone arrays intended to receive them, etc.) From that perspective, even if the system we saw is fundamentally limited I'd still call it incredibly promising considering the constraints it is operating under!
If I were a smart watch manufacturer I'd be falling over myself to get one of my best engineers and one of my best recruiters an appointment with these people.
The accompanying paper[1] claims the phone lasts four hours running the current version of fingerIO, but also that improvements could be made to preserve power (such as reducing the sampling rate).
They are calculating autocorrelation and a Fourier transform so they need to buffer the data. Two seconds is probably the shortest buffer that works reliably.
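To make the buffering point concrete, here's a minimal sketch assuming a plain time-domain autocorrelation (which may well not be what the authors actually compute): the whole frame has to be captured before any of it can be correlated, so the frame length is a floor on the latency.

    public final class FrameAutocorrelation {
        // frame: one buffered window of PCM samples; maxLag: longest echo delay
        // (in samples) you care about. Both numbers are guesses, not the paper's.
        public static double[] autocorrelate(short[] frame, int maxLag) {
            double[] r = new double[maxLag + 1];
            for (int lag = 0; lag <= maxLag; lag++) {
                double sum = 0;
                for (int i = 0; i + lag < frame.length; i++) {
                    sum += (double) frame[i] * frame[i + lag];
                }
                r[lag] = sum;  // peaks at lags matching an echo's round-trip delay
            }
            return r;
        }
    }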
This is not the same as FingerIO, since it does not use sophisticated signal processing, but it's still interesting.
Make sure that you remove earphones before using it.
This is cool. Thinking about smartphones as sensors opens up so many possibilities, even if their capabilities aren't nearly as accurate as dedicated devices. Wondering if the sonar information can be combined with images from the camera to create a close-range depth camera?
Latency looks like a real issue in this demo. If it can be improved this could be important, but think about how impatient you are if your smartphone doesn't respond to your touch immediately. Users have been trained to be irritated by laggy interfaces.
A lot of people are a few inches from my pocket every day. And I can easily imagine software listening for such sonars and sending back fake responses. I think debarshri has a valid point.
> A lot of people are a few inches from my pocket every day.
Really? And your phone is on? With the screen visible for them to see what they are interacting with? And as they are fumbling around in your personal space, inches from your hand or pocket, no one notices?
Get real! If true you should be more worried about pickpockets than some random gestures.
According to their paper (which is written well IMO), their prototype uses a double swipe gesture to trigger or stop the detection. I suppose the idea could be extended to use something like a lock/unlock pattern, similar to the swipe patterns on Android lock screens.
I wonder about the other kind of security. I doubt there's any serious protection against a rogue background application tracking your finger and gathering sensitive information, like keystrokes.
You know: check what the foreground process is, notice it's a password manager or banking application (or even a lock screen would do; who knows, maybe the user reused the same PIN elsewhere), blast the speakers with ultrasound, and wait for typing-like movement patterns.
I have a feeling this is also really depending on the hardware. The demos were probably designed around the specific brand of watch and cellphone since they'd need to know exact distances between the microphones/speakers.
It's a really cool concept. I wish they'd open source what they have, or at least have plans to open source it. However if this came about via University funding, they'll probably claim IP on it. If it was a student's own fellowship, he/she/they might decide to create a start-up out of it.
How many microphones do cell phones typically have? I guess I assumed one, though background noise cancelling would certainly be improved by having more. For this kind of positioning it seems the minimum needed would be 3 - and the Android SDK can access those audio streams separately unprocessed? Pretty neat.
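For the "separate unprocessed streams" part, this is roughly what it looks like in the SDK (a sketch with caveats: UNPROCESSED needs API 24+, a runtime RECORD_AUDIO permission, and whether the two stereo channels actually map to two physically separate mics is entirely device-dependent):

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    public final class StereoCapture {
        public static AudioRecord open(int sampleRate) {
            int bufBytes = AudioRecord.getMinBufferSize(sampleRate,
                    AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT);
            return new AudioRecord(MediaRecorder.AudioSource.UNPROCESSED, sampleRate,
                    AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT,
                    bufBytes);
        }

        // Interleaved stereo PCM -> one array per mic channel.
        public static void deinterleave(short[] interleaved, short[] left, short[] right) {
            for (int i = 0; i < interleaved.length / 2; i++) {
                left[i] = interleaved[2 * i];
                right[i] = interleaved[2 * i + 1];
            }
        }
    }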
Ultrasound tracking is always problematic, because of all the noise.
I get the feeling that every few years someone has the idea of using ultrasound for something again. It starts out promising, but then the accuracy and lag problems never go away, and dogs and cats go wild.
My intuition tells me this just doesn't hold water with respect to information theory... i.e., the number of bits of useful information about a finger you can pull from a microphone. Putting aside human digits, has anyone even demonstrated that you can reliably detect an eighteen-wheeler rig moving toward a phone with this technique? And what about the range of the speaker? Complete nonsense.
Your intuition is probably wrong: off-the-shelf consumer-grade microphones can typically gather 44.1-96k samples per second at 16-24 bits per sample. That's a lot of potential information.
Also, consider that your ear/brain apparatus estimates object positions and occlusions from audio signals all the time.
Also also, the eighteen-wheeler problem is vastly different from the finger problem, as the latter is smaller, slower, and closer to the microphone, each by 1-2 orders of magnitude.
Are you suggesting the video is fake? I think your intuition is wrong.
To address your comment more directly, I don't see any information theory type limit immediately applicable here for finding xyz coordinates using echolocation. That's done in a variety of contexts.