This is very, very cool; it's something I've had on my backburner for several years. It's a very interesting problem.
There are a ton of directions I can think about you taking it in.
The household application: this one is already pretty directly applicable. Have a bunch of wireless speakers and you should be able to make it sound really good from anywhere, yes? You would probably want support for static configurations, and there's a good chance each client isn't going to be able to run the full suite, but the server can probably still figure out what to send to each client based on timing data.
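A minimal sketch of what "figure it out from timing data" could look like, assuming an NTP-style ping to estimate each client's clock offset and a "start at server time T" command; the names here are illustrative, not the project's actual API:

```python
import time

def estimate_offset(ask_client_time, now=time.monotonic):
    """Estimate (client_clock - server_clock) from one round trip,
    assuming the network delay is roughly symmetric."""
    t0 = now()                       # server clock when the request goes out
    client_time = ask_client_time()  # client replies with its own clock reading
    t1 = now()                       # server clock when the reply arrives
    return client_time - (t0 + t1) / 2

def local_start_time(server_start_time, offset):
    """Convert a server-chosen "start playback at T" into this client's clock."""
    return server_start_time + offset

# Toy example: a pretend client whose clock runs 2.5 s ahead of the server's.
fake_client = lambda: time.monotonic() + 2.5
print(f"estimated offset: {estimate_offset(fake_client):.3f} s")  # ~2.500
```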
Relatedly, it would be nice to have a sense of "facing" for the point on the virtual grid and adjust 5.1 channels accordingly, automatically (especially left/right). [Oh, maybe this is already implicit in the grid - "up" is "forward"?]
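If "facing" did become a parameter, one simple way to fold it into left/right is a constant-power pan driven by the source's azimuth relative to the facing direction. A minimal sketch, assuming grid "up" is 0° and azimuths are measured clockwise (the names and the folding of rear sources are illustrative choices, not the project's behavior):

```python
import math

def lr_gains(source_azimuth_deg: float, facing_deg: float = 0.0) -> tuple[float, float]:
    """Constant-power left/right gains for a source, given the listener's facing."""
    rel = (source_azimuth_deg - facing_deg + 180.0) % 360.0 - 180.0  # -180..180, 0 = dead ahead
    rel = max(-90.0, min(90.0, rel))          # fold rear sources onto the nearest side (sketch only)
    x = (rel + 90.0) / 180.0 * (math.pi / 2)  # map [-90, 90] degrees onto [0, pi/2]
    return math.cos(x), math.sin(x)           # (left, right); gains always sum to unit power

# Example: a source 30 degrees to the listener's right when they face grid-"up".
left, right = lr_gains(source_azimuth_deg=30.0, facing_deg=0.0)  # ~ (0.50, 0.87)
```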
The party application: this would be a cool trick that would take a lot more work. What if each device could locate itself in actual space automatically and figure out its sync accordingly as it moved? This might not be possible purely in software - especially given the browser's limited access to the sensors needed for high-accuracy ___location (based on, for example, Wi-Fi sources). However, it would be utterly magical to be able to install an app, join a host, and let your phone join a mob of other phones as individual speakers in everyone's pockets at a party and have positional audio "just work." The "wow" factor would be off the charts.
On a related note, it could be interesting to add a "jukebox" front-end - some way for clients to submit and negotiate tracks for the play queue.
Another idea - account for copper and optical cabling. The latency issue isn't restricted to the clocks that you can see. Adjusting audio timing for long audio cable runs matters a lot in large areas (say, a stadium or performance hall) but it can still matter in house-sized settings, too, depending on how speakers are wired. For a laptop speaker, there's no practical offset between the clock's time and the time at which sound plays, but if the audio output is connected to a cable run, it would be nice - and probably not very hard - to add some static timing offset for the physical layer associated with a particular output (or even channel). It might even be worth it to be able to calculate it for the user. (This speaker is 300 feet away from its output through X meters of copper; figure out my additional latency offset for me.)
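For what it's worth, the "figure it out for me" part is nearly a one-liner; a sketch assuming a typical velocity factor of roughly 0.66 for copper (the exact number is cable-dependent):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 0.66  # assumed; varies by cable, and it barely matters here

def cable_latency_seconds(cable_length_m: float) -> float:
    """Static timing offset contributed by a copper run of the given length."""
    return cable_length_m / (SPEED_OF_LIGHT_M_PER_S * VELOCITY_FACTOR)

# ~91 m (300 ft) of cable comes out to roughly half a microsecond -- see the
# replies below for why that turns out to be negligible for audio.
print(f"{cable_latency_seconds(91.44) * 1e6:.2f} microseconds")  # ~0.46
```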
> This speaker is 300 feet away from its output through X meters of copper; figure out my additional latency offset for me.
0.3 microseconds. The period of a wave at 20kHz (very roughly the highest pitch we can hear) is 50 microseconds. So - more or less insignificant.
Cable latency is basically never an issue for audio. Latency due to speed of sound in air is what you see techs at stadiums and performance halls tuning.
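The rough numbers behind that, as a sketch (speed in the wire assumed at about two-thirds of c; the exact figure is irrelevant at this scale):

```python
SPEED_OF_SOUND_M_PER_S = 343   # air at ~20 C
SPEED_IN_WIRE_M_PER_S = 2e8    # assumed ~2/3 of c

metres = 10.0
print(f"{metres} m of air:  {metres / SPEED_OF_SOUND_M_PER_S * 1000:.1f} ms")  # ~29 ms, very audible
print(f"{metres} m of wire: {metres / SPEED_IN_WIRE_M_PER_S * 1e9:.0f} ns")    # ~50 ns, irrelevant
```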
Oh, thanks for correcting me! Now that you mention it, I'm confused by a memory I have. Wired speakers seem to be less common these days but I remember being told about two decades ago that the "proper" way to install speakers was to run out equal lengths of speaker cable (basically just jacketed copper, afaik) to different speakers even if they weren't equidistant in a room. (This was advice for home installation, not stadium-sized installations.)
Do you suppose there exists some other reason for that, like maybe matching impedance on each cable, or is this likely one of those superstitions that audiophiles fall prey to?
For those wondering: The rule of thumb here is that light travels at one foot per nanosecond. 300 ns = 0.3 μs. Electricity is a bit slower but the same order of magnitude.
I’m in Europe, so I am all in on the metric system. But “about a foot” per nanosecond is so easy to remember, understand and reason about that it is worth the exception. If you prefer something European, think of a sheet of A4 printer paper: the long side is 29.7 cm. “One length of A4 per nanosecond” is within 1% of the actual value of the speed of light.
The original comment used imperial measures, and the following comments kept to that for consistency.
To put things into proper units: speed of light in vacuum is approx 1.8 terafurlongs per fortnight, and electricity in wires has a pace of similar magnitude, and sound in normal atmospheric conditions shuffles along at approx 2.1 megafurlongs per fortnight.
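For anyone checking the arithmetic (using the standard furlong of 201.168 m and a 14-day fortnight):

```python
FURLONG_M = 201.168
FORTNIGHT_S = 14 * 24 * 3600
c = 299_792_458  # m/s
sound = 343      # m/s, normal atmospheric conditions

print(c * FORTNIGHT_S / FURLONG_M / 1e12)     # ~1.80 terafurlongs per fortnight
print(sound * FORTNIGHT_S / FURLONG_M / 1e6)  # ~2.06 megafurlongs per fortnight
```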
Thank you for the kind words! Yeah, I think it gets a lot more complicated once you start dealing with speaker hardware. It pretty much only works for the device's native speaker at the moment.
The instant you start having wireless speakers (e.g. Bluetooth) or any sort of significant delay between commanding playback and the actual sound coming out, the latency becomes audible.
Bluetooth audio devices that I use tend to change the protocol as soon as they switch to headset mode (with the microphone enabled), which works terribly for music. I imagine the protocol used when the microphone is enabled might have completely different latency characteristics than the one used purely for audio, so a chirp might be measuring a completely different thing.
You could use a different device in the swarm for measurement, but yeah, it gets complicated pretty quickly! I also have no idea how stable the latency is.
If you support mic input, you can allow the user to select a device as the "nexus" with mic recording on. Then you tell each device in your setup to "chirp" at the same exact time, but at different frequencies. Then you can derive the individual device's "local delay" and compensate.
This allows you to tune the surround setup to full accuracy for a given point in space, and it will take care of ring buffer differences, wireless transfers of non-tethered speakers, etc.
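A minimal sketch of that measurement idea, using a short tone burst per device and a matched filter over the nexus recording; everything here (names, burst length, using fixed-frequency bursts instead of real swept chirps) is an illustrative assumption:

```python
import numpy as np
from scipy.signal import fftconvolve

SAMPLE_RATE = 48_000  # Hz

def tone_burst(freq_hz: float, duration_s: float = 0.05) -> np.ndarray:
    """A short Hanning-windowed sine burst used as one device's 'chirp'."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t) * np.hanning(t.size)

def estimate_delay(recording: np.ndarray, freq_hz: float) -> float:
    """Seconds between recording start and this device's burst arriving.
    A real swept chirp would give a sharper correlation peak."""
    template = tone_burst(freq_hz)
    corr = fftconvolve(recording, template[::-1], mode="valid")  # matched filter
    return int(np.argmax(corr)) / SAMPLE_RATE

if __name__ == "__main__":
    # Fake a nexus recording: device A (1 kHz) is late by 12 ms, device B (2 kHz) by 47 ms.
    rec = np.zeros(SAMPLE_RATE)  # one second of "room audio"
    for freq, delay_s in [(1000.0, 0.012), (2000.0, 0.047)]:
        start = int(delay_s * SAMPLE_RATE)
        burst = tone_burst(freq)
        rec[start:start + burst.size] += burst
    rec += 0.002 * np.random.randn(rec.size)  # a little background noise

    for freq in (1000.0, 2000.0):
        print(f"{freq:.0f} Hz device: delay ~ {estimate_delay(rec, freq) * 1000:.1f} ms")
```

Subtract the shared trigger time from each estimate and you're left with each device's "local delay" to compensate for.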
Absolutely! Silent disco still requires impractically expensive rental hardware to work well as far as I know. A lot of them run off FM radio, since it's the simplest way to go, but nobody owns portable radios anymore.
An OSS app with the ability to sync everyone up over mobile or wifi, on Android or iOS with BYO headphones, would be incredible. This should be a thing :)
I wonder if something like this (without the OSS part) doesn't already exist. Some cinemas in France have some kind of app for people who are either hearing or visually impaired which allows them to follow the movie.
I've never seen it in action and don't know how it works, but at least for the audio part it should be able to synchronize the phone with the cinema screen.
Snapcast has a webapp and a native Android client, although I'm not sure how well it handles many, many clients. In theory, if they're all on the same WiFi they should all play in sync like a silent disco (at least for those not using Bluetooth headphones, where the playback latency is too high/not available).
Web radios handle many clients. The first problem could be if the Wi-Fi hot spot can handle that many clients. The second one is that web radios and their protocols usually don't care if two clients are not in sync. They are usually in different places, maybe different continents.
I'm self-hosting a web radio for my LAN at home. I set it up years ago and I'm not there so I can't check the details, but I think it is: Icecast2 on a small ARM server, with DeeFuzzer (sp?) to send my mp3s to the Icecast2 server, MPV or VLC to play music on my Linux laptop, and Transistor from F-Droid (I believe).
They handle many clients but they absolutely do not play in sync. That's never a requirement for them and I'm not aware of any web radio protocol supporting that feature. Web radio is not the right solution for a silent disco type situation where you can at least guarantee everyone is relatively local.
"Their own source" looks like they are bringing their own files or (more probably) their Spotify or YouTube. It happens all the time on public transport. Or did you mean bringing their own music and taking turns at sharing it with the other people around? That might be against the terms of service of some services.
Surely, since "silent disco" only really works if everyone is dancing to the same music (which is the only thing that would make sense for a post about synchronizing audio), they're using "source" to mean "device".