This seems very similar in principle to NeuralHash, the perceptual hash that Apple created to check photos on Apple devices before they were uploaded to iCloud Photos. I recall that some people objected to it because there was no guarantee about which hashes would be added to the database, and no real way to know which files those hashes map to. So the database could contain the hash of anything, and a match could report anything, entirely at the whim of whatever company or entity deploys such a product.
Effectively, this means you can detect nearly anything on a device, as long as it maps to a perceptual hash that is close enough to one in the server's database.
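To make the "close enough" part concrete, here is a minimal sketch of the general idea, assuming a much simpler perceptual hash than Apple's. NeuralHash uses a neural network to compute its hash; this example swaps in a basic "average hash" (one bit per pixel of an 8x8 grayscale thumbnail, set if the pixel is brighter than the mean) and compares hashes by Hamming distance. The function names and the threshold of 5 bits are hypothetical choices for illustration, not taken from any real system.

```python
# Hypothetical illustration of perceptual-hash matching.
# This is a simple "average hash" (aHash), NOT Apple's NeuralHash.


def average_hash(pixels):
    """Return a 64-bit hash for an 8x8 grid of grayscale values (0-255).

    Each bit is 1 if that pixel is brighter than the grid's mean, else 0.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def matches_database(h, database, threshold=5):
    """True if h is within `threshold` bits of any hash in the database.

    The device only holds opaque integers: nothing here reveals what
    images the database hashes were computed from.
    """
    return any(hamming(h, d) <= threshold for d in database)


# Example inputs: a diagonal gradient, a uniformly brightened copy,
# and an inverted copy.
original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
brightened = [[min(255, p + 10) for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

server_db = [average_hash(original)]
print(matches_database(average_hash(brightened), server_db))  # True
print(matches_database(average_hash(inverted), server_db))    # False
```

The brightened copy produces an identical hash, so it matches even though no pixel value is the same, while the inverted copy differs in 56 of 64 bits. Whoever controls `server_db` decides what gets flagged, and the client can't tell what it is matching against.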
At least on iOS, not without hacking the operating system. Siri’s ability to listen for a wake word without showing the microphone indicator requires privileges that normal apps don’t get. As far as I can tell, the same is true on Android, except that some phones ship with preinstalled third-party apps that can be granted extra privileges.