Why exactly do I still need a middleman in 2023 to talk to someone else's computer? Is NAT the only reason?
Also, why exactly did we introduce IPv6 again? Everything today is NAT-within-NAT-within-NAT (much of it using IPv4), and almost nobody has a publicly routable IP address. Was the whole transition just a massive waste of effort?
Peer-to-peer doesn't really work for group video calls with more than 5 or so participants. In a full mesh of n participants, each peer sends n-1 video/audio streams and receives n-1. As n goes up, this quickly saturates your and your peers' networks and burdens your CPU with video encoding.
Suffice it to say there are other things you can do besides just a central relaying server, but it's the most common architecture.
My ISP supports IPv6 and I have it configured. However, their software on the router/AP is bad and does not allow setting up a firewall for IPv6. With IPv4 NAT (and UPnP disabled) you get that protection inherently.
So it forced me to use my own router. It still has no interface for an IPv6 firewall, but at least I can write firewall rules manually.
Why do I need a firewall on the router? Because devices on my network have services open on all interfaces. For example, a "smart" weather station has a web service open for all to see. This is a complete non-issue when only using IPv4 behind NAT.
Another issue is revealing the internal network topology to the outside world, which is something that NAT hides really well.
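To make the "open on all interfaces" point concrete, here is a minimal Python sketch. The helper name `open_listener` is made up for illustration and has nothing to do with the weather station's actual firmware. It only shows the difference between a wildcard bind and a loopback bind. With globally routable IPv6 addresses and no firewall in between, a wildcard-bound service can be reached from outside; behind IPv4 NAT it cannot unless a port is forwarded.

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Open a TCP listener on `host`. Port 0 lets the OS pick a free port.

    Hypothetical helper for illustration only.
    """
    # Pick the address family from the literal: IPv6 literals contain ':'.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    # Wildcard bind: accepts connections arriving on any interface.
    # The IPv6 equivalent of "0.0.0.0" is "::".
    exposed = open_listener("0.0.0.0")
    # Loopback bind: only reachable from the device itself.
    local_only = open_listener("127.0.0.1")
    print("wildcard listener:", exposed.getsockname())
    print("loopback listener:", local_only.getsockname())
    exposed.close()
    local_only.close()
```

Many embedded devices do the equivalent of the wildcard bind, which is why a router-level firewall matters once every device has a public address.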
STUN and TURN servers, NAT hole punching, proxies for especially unlucky situations, a connection setup helper that lets people find each other in human-readable ways, and transcoding of video where necessary because different platforms only work well with different codecs. With many people in a meeting, full P2P between all participants can also be an issue (bandwidth and keeping everyone in sync).
Though a lot of their code isn't about being a middleman but about making video streaming work on all clients, which is easy for some MVP hobby project but hard to make work reliably across the many different devices and software versions used in the wild.
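As an illustration of the STUN item in that list, here is a rough Python sketch of the message format from RFC 5389: a client sends a Binding Request, and the server replies with the public address and port it saw, encoded in an XOR-MAPPED-ADDRESS attribute. The function names are made up for this example, only IPv4 is handled, and there are no retransmissions or integrity checks, so treat it as a sketch rather than a usable client.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding Request with no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: message type, body length (0), magic cookie, transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(message: bytes) -> Tuple[str, int]:
    """Extract the IPv4 address and port from a Binding success response."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # 0x01 = IPv4
            # Port is XORed with the top 16 bits of the cookie,
            # the address with the whole cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attributes are padded to a multiple of 4 bytes.
        pos += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")
```

In use, the request would go out over UDP to a STUN server (conventionally port 3478) and the reply would be fed to `parse_xor_mapped_address`. Two peers that each learn their public mapping this way still need some side channel to exchange it, which is where the "connection setup helper" comes in.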
Then there are features like noise filters, background video filters etc.
Don't forget battery life, roaming, discoverability etc.
The days of everybody having exactly one computer with a rarely-changing IP address are over. These days, most people have a phone which changes its IP address a few times a day (when you leave your house and switch from WiFi to cellular and then go back.) If you wanted to be directly reachable, you'd need to share these changes publicly, which would make it pretty trivial to figure out when you leave home, who you visit, which cafes with free WiFi you frequent and which countries you go to for your business trips. The stalking potential here is enormous.
As far as I know, Jitsi uses an SFU (selective forwarding unit, i.e. not P2P and not an MCU), so every device sends its stream once (to the server), which doesn't do any transcoding but only forwards the streams to each client.
Therefore (just like multicast) you only send your stream once, and every client receives the other n-1 streams.
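A back-of-the-envelope comparison of the two topologies, as a small Python sketch. The function names and the 1.5 Mbit/s per-stream bitrate are assumptions picked for illustration, not measurements from Jitsi or any other product. Real deployments complicate the picture (for example, clients may upload several quality layers).

```python
from typing import Tuple


def mesh_streams(n: int) -> Tuple[int, int]:
    """(uploads, downloads) per client in a full P2P mesh of n participants."""
    return n - 1, n - 1


def sfu_streams(n: int) -> Tuple[int, int]:
    """(uploads, downloads) per client when a forwarding server relays streams."""
    return 1, n - 1


def sfu_server_streams(n: int) -> Tuple[int, int]:
    """(incoming, outgoing) streams at the forwarding server itself."""
    return n, n * (n - 1)


if __name__ == "__main__":
    BITRATE_MBPS = 1.5  # assumed bitrate of one video+audio stream
    for n in (2, 5, 10, 20):
        mesh_up, mesh_down = mesh_streams(n)
        sfu_up, sfu_down = sfu_streams(n)
        print(
            f"n={n:2d}  mesh up/down: {mesh_up * BITRATE_MBPS:5.1f}/"
            f"{mesh_down * BITRATE_MBPS:5.1f} Mbit/s  "
            f"sfu up/down: {sfu_up * BITRATE_MBPS:5.1f}/"
            f"{sfu_down * BITRATE_MBPS:5.1f} Mbit/s"
        )
```

The download side is the same in both cases; the difference is the upload side and the number of encodes, which stay constant per client with a forwarding server and grow with n in a mesh. The cost moves to the server, whose outgoing traffic grows roughly with n squared.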
How is this significantly different from n-1 streams coming from a single peer/server connection? Isn't that simply what it takes to have a group video call?
This is quite trivial to work around by just tunneling/VPN'ing to an IPv6 tunnel broker. Yes, it's not really efficient, but it will work, especially if the only thing you use it for is communications. It's actually no different from TURN (if you were behind a restrictive NAT), just that your TURN provider is now the tunnel broker and is independent of your communications service.
The problem is that there's no money to be made here, so no software is built to take advantage of end-to-end connectivity. Even if you could get IPv6 right now (and you can with tunneling/VPN), what are you going to do with it? Big tech is quite happy with the loss of end-to-end connectivity since it enforces the need for a middleman, and they have no reason to make it easier for you to regain your independence.
The ISP is still "evaluating" IPv6 because there's just no real end-user demand: besides ideology or the specific requirements of a technical minority, there just isn't any reason for the average user to need it. If tomorrow every OS came with a built-in SIP client that actually worked and there was an actual successful deployment of consumer-grade SIP, demand for IPv6 would skyrocket and the ISP would get their act together or start losing customers over it. But there will never be a built-in SIP client because Big Tech would rather have you use FaceTime or MS Teams or Skype than some open protocol that doesn't require a middleman and isn't vulnerable to advertising or tracking.
At least one part of the problem is the WebRTC design. It requires a middleman (or side channel) for session initiation. You can't host a static website that does WebRTC between peers because you can't just input an IP+port to connect to a peer like you can with real end-to-end protocols.
The demise of end-to-end connectivity brought on by NAT was a boon to capitalists who can now be middlemen and charge rent for it (either in the form of money or "engagement" aka advertising/spam, tracking, etc). They aren't particularly interested in going back to the old standard even if we now have the technology to do so.
Software that can take advantage of end-to-end connectivity is nowadays very rare, so even if tomorrow we magically had full IPv6 deployment worldwide, not much software would take advantage of it and I'm not sure there would be any commercial pressure to develop it.
Even if your Mac and iPhone had IPv6 and were end-to-end connectable, Apple would rather have you use FaceTime with an Apple account rather than just type in the IP address/DNS of the other side and call them directly. Same with all the other tech companies.