It is well known that the actual speech processing happens in the cloud. Building a whole cloud voice recognition system would be a lot of redundant work if you already had a distributed network of devices that could do the transcription themselves.
That said, unless the device uses certificate pinning, the way to find out is to MITM the device and snoop on its traffic.
If it does use certificate pinning, the approach is:
1. Pre-record an Alexa command
2. Play back the recording
3. Wait a minute
4. Replay the command
5. Measure the size of the packets going across the network
6. Wait a week while playing something that sounds like natural conversation - say an audio book
7. Replay the command audio file
8. Measure the amount of data sent between the end of the second command and the end of the last
It should be slightly more than the exchange for the second command, to account for things like checking for updates. But if it includes a transcript (which at this point is essentially a transcribed audiobook), then it would be quite a bit larger, even with text compression.
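The comparison in steps 5 and 8 could be sketched roughly like this. It assumes you have already exported the capture as (timestamp, size) pairs from whatever sniffer you use. The `Packet` type, the function names, and all the numbers are made up for illustration and are not taken from any real capture.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Packet:
    timestamp: float  # seconds since the capture started
    size: int         # bytes on the wire


def bytes_in_window(packets: Iterable[Packet], start: float, end: float) -> int:
    """Total bytes of all packets with start <= timestamp < end."""
    return sum(p.size for p in packets if start <= p.timestamp < end)


def excess_bytes(
    packets: Iterable[Packet],
    baseline_window: Tuple[float, float],
    week_window: Tuple[float, float],
) -> int:
    """Bytes in the week-long window beyond what one command exchange costs."""
    packets = list(packets)  # allow generators; we iterate twice
    baseline = bytes_in_window(packets, *baseline_window)
    week = bytes_in_window(packets, *week_window)
    return week - baseline


WEEK = 7 * 24 * 60 * 60  # seconds

# Hypothetical capture: sizes are invented, not measured.
capture = [
    Packet(0.0, 5000),          # first command (step 2)
    Packet(60.0, 5000),         # second command (step 4)
    Packet(3600.0, 200),        # background chatter, e.g. an update check
    Packet(60.0 + WEEK, 5000),  # third command (step 7)
]

baseline_window = (60.0, 70.0)      # step 5: the second command's exchange
week_window = (70.0, 70.0 + WEEK)   # step 8: end of second to end of last

print(excess_bytes(capture, baseline_window, week_window))
```

A small excess fits the "checking for updates" explanation. A large one only tells you that something big was sent, not what it was.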
Any amount of text - when compressed - would be dwarfed by a number of things that may also be included in the data exchange, such as a software update. There's no way to conclude that a larger exchange of data means a big exchange of a week's worth of text.