Regarding everyone's latency concerns: as someone who has done low-latency audio processing on Android, in their defense I'd bet almost anything the demo is only meant to demonstrate the math behind this. Depending on the platform (Android, cough), low-latency audio processing can be almost a dark art in itself. And hey, look, they're doing this on Android.
My guess is that they decided to release the demo earlier instead of spending days/weeks getting up to speed with low-latency audio processing in the Android JNI.
It's an academic demo/press release. Not a software release for production/market.
I'm curious, besides doing things in C/C++, are there any "magic tricks" to doing low-latency audio processing on Android? Looking at the chart here [1] it still seems to be a good bit behind iOS.
In the Java layer? About the only thing you can do is ensure that you're using the device's native sampling rate: typically 48 kHz for phones, and 44.1 kHz for tablets. Non-native sampling will induce a rather large latency hit. The buffering + buffer-size stuff, unfortunately, is really only accessible in native code IIRC.
To be completely honest, it's been a long time since I've messed with audio-stuff on the Java-side, so not sure if/how much things have changed for the better.
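For what it's worth, the Java-side version of that trick looks roughly like this (a sketch from memory, not production code; AudioManager exposes the native rate and buffer size as string properties, and getProperty can return null on some devices):

    import android.content.Context;
    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;

    // Ask the device for its native output rate and buffer size and build the
    // AudioTrack around them, so the framework doesn't resample (resampling is
    // where a big chunk of the extra latency comes from).
    public final class NativeRateAudio {
        public static AudioTrack createOutputTrack(Context ctx) {
            AudioManager am = (AudioManager) ctx.getSystemService(Context.AUDIO_SERVICE);

            // e.g. "48000" on most phones, "44100" on some tablets
            int sampleRate = Integer.parseInt(
                    am.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE));
            int framesPerBuffer = Integer.parseInt(
                    am.getProperty(AudioManager.PROPERTY_OUTPUT_FRAMES_PER_BUFFER));

            int minBytes = AudioTrack.getMinBufferSize(sampleRate,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
            int bufBytes = Math.max(minBytes, framesPerBuffer * 2); // mono, 16-bit

            return new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                    bufBytes, AudioTrack.MODE_STREAM);
        }
    }

That only gets you so far, though; the serious buffer tuning still lives down in native code.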
A lot of production input devices for stuff like waving hand gestures and pen input also have unacceptably high latency. Getting latency down is hard, and makes a huge difference to usability.
I work with sonar, and the physical positioning of the sensors is important in trying to get useful results. Why is it that these academic types don't release the APKs or software? Just publications and maybe a video.
You should try asking these academic types for the data and source code.
A lot of researchers are more than happy to discuss their work, but a big part of the academic industry is your research's impact and references. A way to get a better handle on who is looking at or following up on your work is to implicitly ask them to have a conversation with you before getting the whole kit 'n caboodle.
These sorts of comments are why I come here. Thank you. Without prior awareness that something is common practice it can be extremely difficult to recognize it as an option. I simply lack the motivation for this behavior. If I were to release a teaser about my work it would be because I'm not ready to release my work.
I read an abstract for a paper once (on deblurring images) and could not find the original paper for free. I emailed the author about the situation and got a full-color hard copy in the mail shortly after. My best hope had been a .pdf by email, but he exceeded that by far. It's still on my shelf a decade later, while a .pdf probably would have gotten lost in the GBs.
And in many cases the universities own the IP associated with the research that goes on in their departments so they keep the source and treat it like a company would treat a trade secret. It's likely Washington will patent this and try to license the patents.
What's the legal situation with these academic research papers? Am I allowed to implement an algorithm from a research paper and then either sell the software or release it as open source?
I assume other academics are allowed to reimplement methods in order to reproduce the result and to compare to their own methods. Can I do the same as a learning exercise?
I think we left the topic of "science" about 4 posts up. I don't know what's being described here, but it's not science. Yet somehow, I get the feeling that I'm paying for it.
No, he means implicitly. By not providing everything, the academic implicitly requests a conversation before providing everything. The request is never stated but is implied by the assumption that there is no other way to acquire everything.
I actually assumed that before. The way you had quoted it was revealing. My comment reeks of nerd-rage now that I re-read it. Damn italics. I should have mentioned that it would be fair to say: The academic wants an explicit request for everything. I can see how it would have been easy to misinterpret his statement in this way. It's true. It's just not what was stated.
If you're reading the paper you're aware of their existence. If you want to know more, you have to ask them, but they never directly (explicit) state this. It's implied and something of a cultural (academic/research culture) unwritten rule.
No. I wrote that 8 years ago, I've open sourced the code, anyone who wants to take it and run with it can. It's too much work for me to maintain a system that would do this for people.
It depends on the conference or journal they submit to. I typically request that authors release data and code in the review and the same is requested of me when I submit a paper for review. I don't know, maybe CHI doesn't have that sort of culture. Or maybe they do and these students just don't have the time right now and plan to do it right before the conference in May.
Indeed, there is no culture of replicability at CHI (and even less so at UIST). Reviewers usually reward novelty and cool PoC videos, not thoroughness. It is quite rare (especially for U.S. labs) to also publish source code, schematics, or raw data. There have been some initiatives advocating for replicability, and some researchers indeed publish everything, but in the whole, a quick, shiny video of a PoC implementation is often sufficient for a paper to be accepted.
Having been in this kind of situation before: because the software isn't ready for any kind of release. They probably spent a week gathering and tuning the parameters on the DSP for each individual phone they ran it on, and without their knowledge of the system it'd take you a month and a half to get it working on your phone.
If they are publishing a paper, their methods (including custom source code) should be ready for release for peer review. If their software sucks, they should not rely on it for scientific results.
Here's a case that I have published under before: their software is ready for their own use, for research by them personally. However, it has no help, documentation, automation, magic, or even small amounts of assistance. It doesn't work until you have it spit out a page of numbers, which you feed into Matlab and hack on for a week before compiling a new copy of the app that has the resulting calibration matrix baked into the code. The instruction manual for the apk you're asking for would be equivalent to a bachelor's in computer science and half a PhD in DSP and machine learning. That's why they're not releasing it. It proves that their math works and the approach is valid. Replacing their expertise and making it fast enough and reliable enough for general release would require a startup, six developers (three of whom need PhDs), a UX team, and a year and a half of work. The approach works and they can prove it. That's it.
I wonder how accurate it really is. The demo video didn't match up with the movements at all and the on-screen drawings looked like prerecorded video that they were trying to sync to.
It's a neat idea, but without a dedicated component or an extremely high-speed RTOS, you're not going to come close to the level of accuracy that's really needed to do the math and still allow interaction.
I don't mean to rain on the parade, but I just don't think they really have anything usable.
They say they use an "inaudible high frequency soundwave", so that should be > 20kHz. Shouldn't a buffer of a few milliseconds be more than enough then?
Presumably the buffer is longer to make the system more robust by avoiding spurious detections, not because of some fundamental limit like the Nyquist rate. You would need to set the buffer size experimentally.
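For a sense of scale, here's my own back-of-the-envelope (the distances and rate are assumptions, not numbers from the paper):

    // Rough echo budget: a finger ~0.5 m away, speed of sound ~343 m/s,
    // 48 kHz capture -- assumed values, not the paper's actual parameters.
    public final class EchoBudget {
        public static void main(String[] args) {
            double speedOfSound = 343.0;  // m/s in room-temperature air
            double range = 0.5;           // m, assumed max finger distance
            int sampleRate = 48_000;      // Hz, typical phone native rate

            double roundTripSec = 2 * range / speedOfSound;      // ~2.9 ms
            double roundTripSamples = roundTripSec * sampleRate; // ~140 samples

            System.out.printf("round trip: %.2f ms (%.0f samples)%n",
                    roundTripSec * 1e3, roundTripSamples);
        }
    }

So a few milliseconds does cover the echo itself; any buffering beyond that would be there for averaging and rejecting spurious detections, as you say.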
> I just don't think they really have anything usable.
I'm not sure that's the purpose. I wouldn't think of this as being intended to be a fully usable product right now. They could be intending it merely as an interesting experiment to explore new possibilities for interaction with mobile and wearable tech.
Another CS student came up with a virtual keyboard using the iPhone's accelerometer[1]. It only had ~80% accuracy[2], so was it all that useful or practical? Probably not. But could it lead to another person or company refining the technique for production in the future? Certainly.
I would totally disagree. I don't think it would have to have much fidelity or low latency to be insanely useful. The huge advantage is that no extra hardware is necessary. And maybe it might be hard for you to believe it could do what you want it to do, but I'm guessing your vision is pretty narrow.
Have you tried the Orion SDK? It's an order of magnitude improvement in tracking accuracy, even with the older hardware. https://developer.leapmotion.com/orion
We need to immediately improve PIN-code protection for cash withdrawals at ATMs. The problem has been there for a while, but man, it gets easier and easier.
Or be (not, it would seem) overly paranoid like me: my PIN patterns and entry don't involve moving my fingers horizontally. I put my hand down on the pad with my fingers on set keys, cover it with my wallet, and type it in. I also always double one key to make it that much harder to get by observation, a trick I learned from a sysadmin with major access to my school's systems back when we all had to use public terminals a lot.
By keeping your fingers longer on the keys you're actually making it easier for the person after you to just take an IR shot of the keyboard and reduce their search space to 16 combinations or less.
This is designed to foil garden-variety skimmers, now and in the foreseeable future, not someone "after" me. Who's going to go to that much trouble when there are, I presume, many many more people entering their PINs in ways that are easily skimmed?
Have the PIN or password repeat one key in the sequence. Like "mwfabrrpg", if you type quickly an observer won't notice that the 'r' was typed twice in quick sequence.
That's a great observation. While it's not like ATMs were super secure to start with, now anyone who can mimic this sonar tech can put a device that just looks like it's supposed to be at the ATM near the pad and pick up people's PINs.
You don't even need the sonar, because the buttons are clicky and you should be able to triangulate the origin of the sound quite easily. Also, I guess this would work with just one microphone and simple pattern matching (I assume every click + echo pattern from the structure makes every button sound quite different). The microphone should listen to vibrations in the structure (not air waves). The device could be quite far away from the keyboard if it's connected to the same structure and can hear the clicks.
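If anyone wants to picture the triangulation half of that, here's a toy sketch (a hypothetical helper, nothing to do with any real device): estimate the time-difference-of-arrival of a click between two microphones by brute-force cross-correlation.

    // mic1/mic2 are short PCM windows around the same click, equal sample rates.
    public final class ClickTdoa {
        public static int estimateDelaySamples(double[] mic1, double[] mic2, int maxLag) {
            int bestLag = 0;
            double bestCorr = Double.NEGATIVE_INFINITY;
            for (int lag = -maxLag; lag <= maxLag; lag++) {
                double corr = 0;
                for (int i = 0; i < mic1.length; i++) {
                    int j = i + lag;
                    if (j >= 0 && j < mic2.length) {
                        corr += mic1[i] * mic2[j];
                    }
                }
                if (corr > bestCorr) {
                    bestCorr = corr;
                    bestLag = lag;
                }
            }
            // delay = bestLag / sampleRate; at ~343 m/s that pins the click to a
            // hyperbola between the two mics, and a third mic (or the structure-borne
            // "signature" idea above) narrows it down to a single key.
            return bestLag;
        }
    }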
I suppose he means that a small microphone/speaker setup near an ATM's keypad could allow you to track the position of the finger and get the corresponding keypresses without any intrusive modification to the ATM itself.
Of course that would only be valid if you proceed to then steal the person's card right afterward, so all in all, not that useful.
Everywhere these days you can see ATMs asking you to visually check that the keypad and card slot look exactly like the picture. With this technology, you don't need to fit a fake keypad to "hear" the PIN sequence; you could just "listen" to it from somewhere else, some place that isn't watched closely.
I wonder how much power all the processing draws. Judging from the slow movements and the delayed update on the screens in this video, it's pretty heavy on the processor.
The question is, is the multi-second latency because of processing and code-efficiency limitations for an academic research project, or is it because the data is unusable without two seconds of smoothing? Given what I see in the video, I could argue either way. I do note the latent signal is still a bit noisy, but then, touch-screen input isn't necessarily clean either.
But it's also worth keeping in mind this is all off-the-shelf hardware. It seems very likely to me that if a cell phone or smart watch was designed to do this from the get-go that several easy hardware improvements and maybe a bit of custom DSP work would make this work much better. (By "easy hardware improvements", I mean things like speakers intended to emit frequencies for sonar, microphone arrays intended to receive them, etc.) From that perspective, even if the system we saw is fundamentally limited I'd still call it incredibly promising considering the constraints it is operating under!
If I were a smart watch manufacturer I'd be falling over myself to get one of my best engineers and one of my best recruiters an appointment with these people.
The accompanying paper[1] claims the phone lasts four hours running the current version of fingerIO, but also that improvements could be made to preserve power (such as reducing the sampling rate).
They are calculating autocorrelation and a Fourier transform so they need to buffer the data. Two seconds is probably the shortest buffer that works reliably.
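To make the buffering point concrete, here's a minimal sketch assuming a plain time-domain autocorrelation (which may well not be what the authors actually compute): the whole frame has to be captured before any of it can be correlated, so the frame length is a floor on the latency.

    public final class FrameAutocorrelation {
        // frame: one buffered window of PCM samples; maxLag: longest echo delay
        // (in samples) you care about. Both numbers are guesses, not the paper's.
        public static double[] autocorrelate(short[] frame, int maxLag) {
            double[] r = new double[maxLag + 1];
            for (int lag = 0; lag <= maxLag; lag++) {
                double sum = 0;
                for (int i = 0; i + lag < frame.length; i++) {
                    sum += (double) frame[i] * frame[i + lag];
                }
                r[lag] = sum;  // peaks at lags matching an echo's round-trip delay
            }
            return r;
        }
    }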
This is not the same as FingerIO, since it does not use sophisticated signal processing, but it's still interesting.
Make sure that you remove earphones before using it.
This is cool. Thinking about smartphones as sensors opens up so many possibilities, even if their capabilities aren't nearly as accurate as dedicated devices. Wondering if the sonar information can be combined with images from the camera to create a close-range depth camera?
Latency looks like a real issue in this demo. If it can be improved this could be important, but think about how impatient you are if your smartphone doesn't respond to your touch immediately. Users have been trained to be irritated by laggy interfaces.
A lot of people are a few inches from my pocket every day. And I can easily imagine software listening for such sonars and sending back fake responses. I think debarshri has a valid point.
> A lot of people are a few inches from my pocket every day.
Really? And your phone is on? With the screen visible for them to see what they are interacting with? And as they are fumbling around in your personal space, inches from your hand or pocket, no one notices?
Get real! If true you should be more worried about pickpockets than some random gestures.
According to their paper (which is written well IMO), their prototype uses a double swipe gesture to trigger or stop the detection. I suppose the idea could be extended to use something like a lock/unlock pattern, similar to the swipe patterns on Android lock screens.
I wonder about the other kind of security. I doubt there's any serious protection against a rogue background application tracking your finger and gathering sensitive information, like keystrokes.
You know: check what the foreground process is, notice it's a password manager or banking application (or even a lock screen would do; who knows, maybe the user reused the same PIN elsewhere), blast the speakers with ultrasound, and wait for typing-like movement patterns.
I have a feeling this is also really depending on the hardware. The demos were probably designed around the specific brand of watch and cellphone since they'd need to know exact distances between the microphones/speakers.
It's a really cool concept. I wish they'd open source what they have, or at least have plans to open source it. However if this came about via University funding, they'll probably claim IP on it. If it was a student's own fellowship, he/she/they might decide to create a start-up out of it.
How many microphones do cell phones typically have? I guess I assumed one, though background noise cancelling would certainly be improved by having more. For this kind of positioning it seems the minimum needed would be 3 - and the Android SDK can access those audio streams separately unprocessed? Pretty neat.
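For the "separate unprocessed streams" part, this is roughly what it looks like in the SDK (a sketch with caveats: UNPROCESSED needs API 24+, a runtime RECORD_AUDIO permission, and whether the two stereo channels actually map to two physically separate mics is entirely device-dependent):

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    public final class StereoCapture {
        public static AudioRecord open(int sampleRate) {
            int bufBytes = AudioRecord.getMinBufferSize(sampleRate,
                    AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT);
            return new AudioRecord(MediaRecorder.AudioSource.UNPROCESSED, sampleRate,
                    AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT,
                    bufBytes);
        }

        // Interleaved stereo PCM -> one array per mic channel.
        public static void deinterleave(short[] interleaved, short[] left, short[] right) {
            for (int i = 0; i < interleaved.length / 2; i++) {
                left[i] = interleaved[2 * i];
                right[i] = interleaved[2 * i + 1];
            }
        }
    }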
Ultrasound tracking is always problematic, because of all the noise.
I get the feeling that every few years someone has the idea of using ultrasound for something again. It starts out promising, but then the accuracy and lag problems never go away, and dogs and cats go wild.
My intuition tells me this just doesn't hold water with respect to information theory... i.e., the number of bits of useful information about a finger you can pull from a microphone. Putting aside human digits, has anyone even demonstrated that you can reliably detect an eighteen-wheeler rig moving toward a phone with this technique? And what about the range of the speaker? Complete nonsense.
Your intuition is probably wrong: off-the-shelf consumer-grade microphones can typically gather 44.1-96k samples per second at 16-24 bits per sample. That's a lot of potential information.
Also, consider that your ear/brain apparatus estimates object positions and occlusions from audio signals all the time.
Also also, the eighteen-wheeler problem is vastly different from the finger problem, as the latter is smaller, slower, and closer to the microphone, each by 1-2 orders of magnitude.
Are you suggesting the video is fake? I think your intuition is wrong.
To address your comment more directly, I don't see any information theory type limit immediately applicable here for finding xyz coordinates using echolocation. That's done in a variety of contexts.