When using a single GPU, as is the case here, you will not be able to saturate a TB2 link in any reasonable deep learning workload. For multiple GPUs, sure, PCIe performance matters, but for a single GPU it is not a bottleneck in any practical scenario I can think of.
Right now, I am working with large volumetric datasets (think 100+GB) and even then I am only seeing ~200MB/s peak transfers, which is well within the capabilities of TB2. In my experience, large datasets are bottlenecked by the storage drive, which is not a problem for modern rMBPs.
Edit: I am using a 2.5GB/s NVMe SSD, so the 200MB/s is the raw bandwidth used when mass-evaluating batches on the GPU. For training, I am seeing around 70-80MB/s sustained.
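To put those numbers in perspective, here is a quick back-of-the-envelope sketch. The TB2 line rate (20Gb/s) is standard, but the 80% effective-efficiency factor is my own rough assumption for protocol overhead:

```python
# Rough headroom check: how close do the transfer rates above get to
# saturating a Thunderbolt 2 link? The 0.8 efficiency factor is an
# assumed figure for protocol/encoding overhead, not a measured one.

TB2_NOMINAL_GBPS = 20      # Thunderbolt 2 line rate, gigabits per second
EFFECTIVE_FRACTION = 0.8   # assumed usable fraction after overhead

def tb2_headroom(observed_mb_s: float) -> float:
    """Return observed throughput as a fraction of assumed usable TB2 bandwidth."""
    usable_mb_s = TB2_NOMINAL_GBPS * 1000 / 8 * EFFECTIVE_FRACTION  # 2000 MB/s
    return observed_mb_s / usable_mb_s

print(f"batch eval peak:     {tb2_headroom(200):.0%} of usable TB2")  # 10%
print(f"training sustained:  {tb2_headroom(75):.1%} of usable TB2")   # 3.8%
```

Even the 200MB/s peak uses only about a tenth of what the link can carry, so under these assumptions the GPU-side interconnect has plenty of slack.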