
I want to try out Google, but they need to make it easier. I have petabytes of data in S3 that I would need to move first (at least some of it).



You can try "Cloud Storage Transfer":

`Transfer data to your Cloud Storage buckets from Amazon Simple Storage Service (S3), HTTP/HTTPS servers or other buckets. You can schedule once-off or daily transfers, and you can filter files based on name prefix and when they were changed.`

https://cloud.google.com/storage/transfer/
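For what it's worth, here is a minimal sketch of creating a one-off S3-to-GCS transfer job against the Storage Transfer Service v1 API via the Python API client. The project, buckets, keys, prefix and dates are placeholders, not real values:

    # Sketch only: create a one-time S3 -> GCS transfer job.
    from googleapiclient import discovery

    client = discovery.build("storagetransfer", "v1")

    job = {
        "description": "one-off S3 to GCS copy",
        "status": "ENABLED",
        "projectId": "my-gcp-project",
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": "my-s3-bucket",
                "awsAccessKey": {"accessKeyId": "AKIA...", "secretAccessKey": "..."},
            },
            "gcsDataSink": {"bucketName": "my-gcs-bucket"},
            # optional: only copy objects under this prefix
            "objectConditions": {"includePrefixes": ["logs/"]},
        },
        # same start and end date == run once
        "schedule": {
            "scheduleStartDate": {"year": 2016, "month": 12, "day": 1},
            "scheduleEndDate": {"year": 2016, "month": 12, "day": 1},
        },
    }

    print(client.transferJobs().create(body=job).execute()["name"])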


It would be nice if they managed the transfer themselves via AWS Snowball. Sure, they would have upfront costs, but based on what I spend on AWS monthly, it's probably worth it to them.


Why would you want to move data between AWS and GCP using a hard drive? The networks between the two cloud providers are immense.


At some point (many 1000s of terabytes) it is faster and cheaper to ship the data on physical devices than it is to send it over the network.
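To make the crossover concrete, here's a rough comparison. Every constant below is an assumption for illustration, not an AWS or GCP figure:

    # Back-of-envelope: ship vs. network (all numbers are assumptions).
    TB = 1e12  # bytes

    def network_days(size_bytes, gbit_per_s):
        return size_bytes * 8 / (gbit_per_s * 1e9) / 86400

    def ship_days(size_bytes, device_gbit_per_s, transit_days=7):
        # copy onto the device, truck it, copy it off again
        return 2 * network_days(size_bytes, device_gbit_per_s) + transit_days

    size = 5000 * TB  # "many 1000s of terabytes"
    print(network_days(size, 10))   # ~46 days on a sustained 10 Gbit/s link
    print(ship_days(size, 100))     # ~16 days if the device loads at 100 Gbit/s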

(I work for AWS)


Yup, and I work for Google. We routinely copy petabytes between AWS and GCP. It's not currently more efficient to ship petabytes if you include the time to copy to the device and then recover it.


Given that Amazon currently offers this as a service (two different services actually, Snowball and Snowmobile for exabytes), I wonder what gives?

Is one less informed than the other, or am I missing something? Why such a disparity in opinion?


Google also offers this as a service.

I've worked on high-performance networking file transfers. My experience is that most people who move data get very low utilization compared to the actual throughput of the network. People typically use one TCP connection and one process; high-performance data transfers use thousands of TCP connections and thousands of processes.

Many other people underestimate the time/labor effort of dealing with a Snowball.
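As a rough illustration of the parallel approach (a hypothetical sketch, not any particular tool: bucket names are placeholders, and it assumes boto3, google-cloud-storage and both sets of credentials are already set up):

    # Fan an S3 -> GCS copy out across many worker processes so each object
    # gets its own TCP connection; a single stream rarely fills the pipe.
    import concurrent.futures
    import boto3
    from google.cloud import storage

    S3_BUCKET = "my-s3-bucket"    # placeholder
    GCS_BUCKET = "my-gcs-bucket"  # placeholder

    def list_keys():
        paginator = boto3.client("s3").get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=S3_BUCKET):
            for obj in page.get("Contents", []):
                yield obj["Key"]

    def copy_object(key):
        # stream one object out of S3 and straight into GCS
        body = boto3.client("s3").get_object(Bucket=S3_BUCKET, Key=key)["Body"]
        storage.Client().bucket(GCS_BUCKET).blob(key).upload_from_file(body)
        return key

    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor(max_workers=64) as pool:
            for key in pool.map(copy_object, list_keys(), chunksize=32):
                print("copied", key)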


Do you have any references / examples of people underestimating the effort of working with snowball?

I've actually only heard positive things about snowball. I would be very interested in negative feedback.

(I work for Amazon)


So the question then is what are those services for?


We used a box of disks to get about 20 terabytes out of Amazon to CMU. It ended up being about 50% cheaper (from memory - may be off a bit) because we did not account for any employee costs. Startup, running on fumes, none of us drawing a salary, etc.


"It ended up cheaper because we did not account for employee costs".

You basically just proved my point: things aren't cheaper if you have to factor in employee cost.


Technically, that's a logical fallacy: A & B -> true does not mean !B -> false.

But, really, I'm not trying to prove or disprove your point. Just noting that there was a situation for us where disk made sense, and we were satisfied with the outcome. Spending 4 hours of person time to save a thousand dollars was reasonable for us in a way it probably wouldn't be for many real companies, because we had comparatively little money and were willing to work for peanuts.

(Note that I actually share your bias on this one. I use GCP for my personal stuff, and I'm writing this from a Google cafe. :-)


Many customers want to dual-host their data so they aren't beholden to a single cloud provider, or to have redundancy across providers, or to put their data closer to the compute.


There's likely a significant difference in cost and capacity between copying a PB of data from a corporate datacenter over that corporation's connectivity to S3 and copying from S3 to Google over Amazon's and Google's connectivity.


"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."



> All this power fuels one terabyte per-second of data transfer speeds across a 40 gigabyte per-second connection.

How is that supposed to work? Do they mean several 40 Gbit/s connections? Or did I misunderstand something here?


Yes, several.
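Roughly, and assuming the article means terabits and gigabits (as network links are normally quoted):

    print(1000 / 40)        # 25 x 40 Gbit/s links for 1 Tbit/s
    print(1e12 * 8 / 40e9)  # 200 links if it really meant 1 terabyte/s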


Awesome :)


I never do. Nor do I underestimate the bandwidth between cloud providers.


Hah. What about the egress cost? Getting 100 PB out of AWS is not cheap. (Maybe it is cheap in actual cost to Amazon, but not in what the end user has to pay AWS.)
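For scale, a rough back-of-envelope; the per-GB rate below is an assumption, since actual AWS internet-egress pricing is tiered and changes over time:

    petabytes = 100
    rate_per_gb = 0.05  # USD per GB, assumed for illustration
    print(petabytes * 1000000 * rate_per_gb)  # ~5,000,000 USD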


What magnitude are we talking about? Genuinely curious.


When we would roll out additional datacenters for our video delivery network, we shipped new storage to old DCs, loaded, then shipped to new DCs.

At meaningful volumes, literally months faster than using tier 1 backbone, and a fraction of the cost as well.


It'd avoid expensive egress costs.


The networks between the two cloud providers are also immensely expensive to use.


I didn't realize that they have direct connections. The AWS data center down the street (Virginia) is directly connected to some Google Cloud datacenter? Sorry - I'm generally ignorant of datacenter technology.

So a back-of-the-envelope calculation says that it would take around 10,000 days (roughly 28 years) to transfer 100 PiB over Gigabit Ethernet. When you say immense, do you mean faster than that?


Google's new NoVa DC will probably be in/around the Equinix right down the street from AWS.


I would say at least multiple 10Gbit interfaces if you wanted to transfer the data.


Transferring 100 PiB at 10 Gbps (about 1.2 GiB/s) takes roughly 1,040 days.
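The arithmetic, so you can plug in your own link speed (assumes full, sustained utilization):

    pib = 100 * 2**50  # bytes

    def days_at(gbit_per_s):
        return pib * 8 / (gbit_per_s * 1e9) / 86400

    print(days_at(1))    # ~10,400 days (~28.5 years) at 1 Gbit/s
    print(days_at(10))   # ~1,040 days at 10 Gbit/s
    print(days_at(100))  # ~104 days at 100 Gbit/s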


They probably meet at some exchange very close to the DCs


Email me at [email protected] and I'll be sure to start internal conversations around what we can do to help you out.


Cool - I'll try to get someone on my team to send you an email.


Hey ap22213. Never got your email. Just following up to see if you've resolved your issues with GCP; if not, feel free to contact me at [email protected]

Have a great day.



