I build lots of websites for people. In fact, I’m working on a couple right now, and I host them on Amazon Web Services because it’s inexpensive, reliable, and scalable. AWS has let me get a website up and running in a couple of hours, and it has let me scale certain websites seamlessly as traffic grew. For instance, I built a website for a family friend whose book began to get traction in bookstores and on Amazon. As the book surged in popularity and traffic climbed, the server was able to scale up to better-performing hardware that could handle more concurrent requests, all without interruption.
After my experience with AWS and Google Cloud Platform, I look at the alternatives to the public cloud and shudder. I don’t have time to set up an entire compute environment! I don’t have the space to run a personal server, the time to keep it healthy, or the ability to troubleshoot when something goes wrong. All of that is taken care of in the public cloud.
“A company like Amazon has so many engineers focused on these services—so many people watching for potential problems. It has already spent a decade building this thing.” – Cade Metz
Sometimes I just have a small bit of code that I need to run quickly on a distributed system, like when I used GCP to render a 4K iPhone animation I made for a class, or when I watched my friend run a massive neural network seamlessly over a remote connection. The cloud is empowering users to do and make things like never before.
This summer, I worked on the Google Cloud Platform team and saw firsthand the intricacy of the service. I was fascinated that we contractually obligated ourselves to 99.999% global uptime on some products and constantly planned for failure in order to prevent it. During training, one speaker asked, “Have any of you ever experienced Google being down?” As I racked my brain to remember a time, I looked around and saw that none of the 400 interns in the room had raised a hand. The truth is, companies like Snapchat, Spotify, and Uber use Google Cloud Platform because they trust the uptime guarantees and understand the remarkable benefits of using the cloud. How else could Pokemon Go have scaled so fast, if not for GCP’s global load balancers? Banks, too, entrust sensitive data to the platform because they know Google will keep it secure. Even Google runs some of its own services on Google Cloud Platform.
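To put a number like 99.999% in perspective, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not how any provider actually measures or enforces its uptime commitments: the function name `downtime_budget_minutes` is made up for this example, and it assumes a 365-day year with downtime counted evenly across it.

```python
# Back-of-the-envelope downtime budget for a given uptime target.
# Assumption: a 365-day year; real SLAs define their own measurement windows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be down and still meet the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, five nines works out to a little over five minutes of downtime in an entire year, which is why planning for failure has to be constant.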
What, then, to make of the rare cases of outages, data loss, and privacy issues? I say, thank goodness these things have happened! Incidents are always an opportunity for improvement. As Mr. Slosser mentions, not even Google or Amazon strives for perfection.
“[An outage] is an expected part of the process of innovation, and an occurrence that both development and SRE teams manage rather than fear.” – Mr. Slosser, Google
I can attest that post-mortems are part of the culture and actually contribute to that five-nines uptime. Paradoxically, these companies do put their users’ interests first, because users choose whether or not to pay for the service. Goldman Sachs is not going to shell out millions transitioning to the cloud if it expects outages and data loss. To the detractors, I say the overwhelming odds are that using these services will benefit companies and people more than not, and the success of these services backs that up.