I want to try out Google, but they need to make it easier. I have petabytes of data in S3 that I would need to move first (at least some of it).
`Transfer data to your Cloud Storage buckets from Amazon Simple Storage Service (S3), HTTP/HTTPS servers or other buckets. You can schedule once-off or daily transfers, and you can filter files based on name prefix and when they were changed.`
It would be nice if they managed the transfer themselves via AWS Snowball. Sure, they would have upfront costs, but based on what I spend on AWS monthly, it's probably worth it to them.
Yup, and I work for Google. We routinely copy petabytes between AWS and GCP. It's not currently more efficient to ship petabytes if you include the time to copy to the device and then recover it.
I've worked on high-performance networking file transfers. My experience is that most people who move data get very low utilization compared to the actual throughput of the network. People typically use one TCP connection and one process; high-performance data transfers use thousands of TCP connections and thousands of processes.
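A minimal sketch of the idea (not how Google does it): split an object into byte ranges and fetch the ranges concurrently instead of streaming it over a single connection. Here the "remote object" is just an in-memory buffer and `fetch_range` is a hypothetical stand-in for a ranged HTTP/S3 GET; a real tool would also spread work across many processes and machines, not just threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a remote object; in practice each worker would
# issue a ranged HTTP GET (Range: bytes=start-end) on its own
# TCP connection.
REMOTE = bytes(range(256)) * 4096  # ~1 MiB of sample data

def fetch_range(start, end):
    # Hypothetical ranged read of the remote object.
    return start, REMOTE[start:end]

def parallel_copy(size, chunk=64 * 1024, workers=32):
    """Fetch [0, size) as `chunk`-sized ranges, concurrently."""
    ranges = [(off, min(off + chunk, size)) for off in range(0, size, chunk)]
    out = bytearray(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start, data in pool.map(lambda r: fetch_range(*r), ranges):
            out[start:start + len(data)] = data
    return bytes(out)

copied = parallel_copy(len(REMOTE))
assert copied == REMOTE  # reassembled object matches the source
```

Threads are enough to overlap network waits in this sketch; the point is that aggregate throughput scales with the number of concurrent streams until you saturate the pipe, which one connection almost never does.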
Many other people underestimate the time/labor effort of dealing with a snowball.
We used a box of disks to get about 20 terabytes out of Amazon to CMU. It ended up being about 50% cheaper (from memory - may be off a bit) because we did not account for any employee costs. Startup, running on fumes, none of us drawing a salary, etc.
Technically, that's a logical fallacy: (A & B) -> true does not mean !B -> false.
But, really, I'm not trying to prove or disprove your point. Just noting that there was a situation for us where disk made sense, and we were satisfied with the outcome. Spending 4 hours of person time to save a thousand dollars was reasonable for us in a way it probably wouldn't be for many real companies, because we had comparatively little money and were willing to work for peanuts.
(Note that I actually share your bias in this one. I both use GCP for my personal stuff and I'm writing this from a Google cafe. :-)
Many customers want to dual-host their data to not be beholden to a single cloud provider, or to have redundancy across providers, or to put their data closer to the compute.
There's likely a significant difference in cost and capacity between copying a PB of data from a corporate datacenter over that corporation's connectivity to S3 and copying from S3 to Google over Amazon's and Google's connectivity.
Hah. What about the egress cost? Getting 100 PB out of AWS is not cheap. (Maybe it's cheap in actual cost to Amazon, but not in what the end user has to pay AWS.)
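Order-of-magnitude only, since AWS internet egress pricing is tiered and negotiable at this scale: assuming a high-volume rate of roughly $0.05 per GB (an assumption, not a quoted price), moving 100 PB out lands in the millions of dollars.

```python
# Rough egress bill for moving 100 PB from AWS to the internet.
# $0.05/GB is an ASSUMED high-volume tier rate; real pricing is
# tiered and often negotiated, so treat this as a ballpark.
PETABYTE_GB = 1_000_000          # decimal units, as bandwidth is billed
rate_per_gb = 0.05               # assumed USD per GB
cost = 100 * PETABYTE_GB * rate_per_gb
print(f"${cost:,.0f}")           # → $5,000,000
```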
I didn't realize that they have direct connections. The AWS data center down the street (Virginia) is directly connected to some Google Cloud datacenter? Sorry - I'm generally ignorant of datacenter technology.
So a back-of-the-envelope calculation says that it would take around 28 years to transfer 100 PiB over gigabit Ethernet (roughly 10 days per 100 TiB). When you say immense, do you mean faster than that?
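Spelling out the arithmetic, assuming an ideal sustained 1 Gbit/s with no protocol overhead:

```python
# Transfer time for 100 PiB at a sustained 1 Gbit/s.
size_bits = 100 * 2**50 * 8      # 100 PiB expressed in bits
rate_bps = 1e9                   # gigabit Ethernet, ideal line rate
seconds = size_bits / rate_bps
days = seconds / 86400
years = days / 365.25
print(round(days), round(years, 1))  # → 10425 28.5
```

Which is why parallelism across many links (or much fatter pipes) matters far more than shaving overhead off a single gigabit stream.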
Hey ap22213. Never got your email. Just following up to see if you've resolved your issues with GCP; if not, feel free to contact me at [email protected]