Thought Leaders in the Cloud: Talking with Randy Bias, Cloud Computing Pioneer and Expert

Randy Bias is a cloud computing pioneer and recognized expert in the field.  He has driven innovations in infrastructure, IT, Operations, and 24×7 service delivery since 1990. He was the technical visionary on the executive team of GoGrid, a major cloud computing provider. Prior to GoGrid, he built the world’s first multi-cloud, multi-platform cloud management framework at CloudScale Networks, Inc.

In this interview, we discuss:

  • Cloud isn’t all about elasticity.  Internal datacenters run about 100 servers for each admin.  The large cloud providers can manage 10,000 servers per admin.
  • Users can procure cloud resources on an elastic basis, but like power production, the underlying resource isn’t elastic, it’s just built above demand.
  • Just doing automation inside of your datacenter, and calling it private cloud, isn’t going to work in the long term.
  • Laws and regulations are not keeping pace with cloud innovations.
  • Startups aren’t building datacenters.  In the early days, companies built their own power generation, but not any more.  Buying compute instead of building compute is evolving the same way.
  • The benefit of cloud isn’t in outsourcing the mess you have in your datacenter.  It’s about using compute on-demand to do processing that you’re not doing today.

Robert Duffner: Could you take a minute to introduce yourself and your experience with cloud computing, and then tell us about Cloudscaling as well?

Randy Bias: I’m the CEO of Cloudscaling. Before this, I was VP of Technology Strategy at GoGrid, which was the second to market infrastructure-as-a-service provider in the United States. Prior to that, I worked on a startup, building a cloud management system very similar to RightScale’s. 

I was interested very early in cloud technology and I also started blogging on cloud early in 2007. Prior to cloud I had already amassed a lot of experience building tier-one Internet service providers (ISPs), managed security service providers (MSSPs), and even early pre-cloud technology solutions at Grand Central Communications.

Cloudscaling was started about a year and a half ago, after I left GoGrid. Our focus is on helping telcos and service providers build infrastructure clouds along the same model of the early cloud pioneers and thought leaders like Amazon, Google, Microsoft, and Yahoo.

Robert: On your blog, you recently stated that elasticity is not cloud computing. Many people see elasticity as the key feature that differentiates the cloud from hosting. Can you elaborate on your notion that elasticity is really a side effect of something else?

Randy: We look at cloud and cloud computing as two different things, which is a different perspective from that of most folks. I think cloud is the bigger megatrend toward a hyper-connected “Internet of things.” We think of cloud computing as the underlying foundational technologies, approaches, architectures, and operational models that allow us to actually build scalable clouds that can deliver utility cloud services.

Cloud computing is a new way of doing IT, in much the same way that enterprise computing was a new way of doing IT compared to mainframe computing. There is a clear progression from mainframe to enterprise computing and then from enterprise computing to cloud computing. A lot of the technologies, architectures, and operational approaches in cloud computing were pioneered by Amazon, Microsoft, Google, and other folks that work at a very, very large scale.

In order to get to a scale where somebody like Google can manage 10,000 servers with a single head count, they had to come up with whole new ways of thinking about IT. In a typical enterprise data center, it’s impossible to manage 10,000 servers with a single head count.  There are a number of key reasons this is so.

As one example, a typical enterprise data center is heterogeneous. There are many different vendors and technologies for storage, networking, and servers. If we look at somebody like Google, they stated publicly that they have somewhere around five hardware configurations for a million servers. You just can’t get any more homogeneous than that. So all of these big web operators have had to really change the IT game.

This highlights how we think of cloud computing as something fundamentally new. One of the side effects of large cloud providers being able to run their infrastructures on a very cost effective basis at large scale is that it enables a true utility business model.

The cost of storage, network, and computing will effectively be driven toward zero over time. Consumers have the elastic capability to use the service on a metered basis like phone or electric service, even though the actual underlying infrastructure itself is not elastic.

It’s just like an electric utility. The electricity system isn’t elastic; it’s a fixed load. There’s only so much electricity in the power grid. That’s why we occasionally get brown-outs or even black-outs when the system becomes overloaded. The system itself is not elastic; only the usage is.

Robert: That’s actually a great analogy, Randy. You mentioned that public cloud is at a tipping point. There are obvious reasons for organizations wanting to go down a private cloud path first. Are you sensing that many organizations will go to the public cloud first? And then re-evaluate to see what makes sense to try internally?

Randy: In our experience, a typical large enterprise is bifurcated. There is a centralized IT team focused on building internal systems that you could call private cloud as an alternative to the public cloud services. On the other side are app developers in the various lines of business, who are trying to get going and accomplish something today. Those two constituencies are taking different approaches.

The app developers focus on how to get what they need now, which tends to push them toward public services. The centralized IT departments see this competitive pressure from public services and try to build their own solutions internally.

We should remember that we’re looking at a long term trend, and that it isn’t a zero-sum game. Both of these constituencies have needs that are real, and we’ve got to figure out how to serve both of them.

We have a nuanced position on this, in the sense that we are neither pro-public cloud nor pro-private cloud. However, we generally take the stance that probably in the long term, the majority of enterprise IT spending and capacity will move to the public cloud. That might be on a 10 to 20 year time-frame.

If you’re going to build a private cloud that will be competitive, you’re going to have to take the same approach as Amazon, Google, Microsoft, Yahoo, or any of the big web operators. If you just try to put an automation layer on top of your current systems, you won’t ultimately be successful.

We know the history of trying to do large-scale automation inside our data centers over the past 20 or 30 years. It’s been messy, and there’s no reason to think that’s going to change. You’ve got to buy into that idea of a whole new way of doing IT. Just adding automation inside your data center and calling it a private cloud won’t get you there.

Robert: Some of the people that we’ve spoken to have expressed the notion that clouds only work at sufficient scale. When we talk about Azure and the cloud in the context of ideal workloads or ideal scenarios, we always talk about this idea of on and off batch processing that requires intensive compute or a site that’s growing rapidly. And then of course your predictable and unpredictable bursting scenarios. In your experience, is there some minimum size that makes sense for cloud implementation?

Randy: For infrastructure clouds, there probably is a minimum size, but I think it’s a lot lower than most people think. It’s about really looking at the techniques that the public cloud providers have pioneered.

I see a lot of people saying, “Hey, we’re going to provide virtual machines on demand. That is a cloud,” to which I respond, “No, that’s virtual machines on demand.” Part of the cloud computing revolution is that providers like Amazon and Google do IT differently, like running huge numbers of servers with much lower head count.

Inside most enterprises, IT can currently manage around 100 servers per admin. So when you move from 100:1 to, say, 1,000:1, labor opex moves from $75 a month per server to $7.50 a month. And when you get to 10,000:1, it’s a mere $0.75 a month.
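
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the $7,500 monthly admin cost is not stated in the interview; it is an assumption inferred from the quoted $75 per server at a 100:1 ratio, and the function name is hypothetical.

```python
# Assumption: a fully loaded admin cost of $7,500 per month, inferred from
# the quoted $75 per server per month at 100 servers per admin (100 * $75).
ADMIN_COST_PER_MONTH = 7500.0


def labor_opex_per_server(servers_per_admin: int) -> float:
    """Monthly labor cost attributed to a single server."""
    return ADMIN_COST_PER_MONTH / servers_per_admin


# Compare the three server-to-admin ratios mentioned in the interview.
for ratio in (100, 1_000, 10_000):
    cost = labor_opex_per_server(ratio)
    print(f"{ratio}:1 -> ${cost:.2f} per server per month")
```

Under that assumption, each tenfold increase in servers per admin cuts the per-server labor cost by a factor of ten, which matches the $75, $7.50, and $0.75 figures quoted above.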

These are order of magnitude changes in operational costs, or in capital expenditures, or in the overall cost structure. Now what size do you have to be to get these economies? The answer is … not as big as you think.

When some people consider economies of scale, they believe it means the ability to buy server hardware cheaply enough. But that’s not really very difficult.  You can go direct to Taiwanese manufacturers and get inexpensive commodity hardware that is very reliable.  This hardware has the same components as the hardware you could get from IBM, Dell, or HP today and is built by the same companies that build these enterprise vendors’ hardware.

For hardware manufacturers, especially the original Taiwanese vendors, there is only so much of a discount they can provide, so Amazon doesn’t have significantly more buying power than anybody who’s got a few million bucks in their pocket.

Economies of scale also come from more subtle places, such as the ability to build a rock star cloud engineering team.  For example, the Amazon Web Services cloud engineering team iterates at a rapid pace, and they have designed software so they can actually manage a very large data system efficiently at scale.

You could do that with a smaller team and fewer resources, but you’ve got to be really committed to doing that. Also, finding that kind of talent is very difficult.

Robert: You’ve also talked about how cloud is fundamentally different from grid and HPC. How do you see that evolving? Do you see them remaining very separate, for separate uses and disciplines? Or do you see the lines blurring as time goes on?

Randy: I think those lines will blur for certain. As I say in the blog post, I view cloud more as high scalability computing than as high performance computing. That actually means that the non-HPC use cases at the lower end of the grid market already make sense on public clouds today. If you run the numbers and the cost economics make sense, you should embrace cloud-based grid processing today.

Amazon is building out workload-specific portions of their cloud for high performance computing running on top of cloud. Still, at that very top of the current layer of grid use cases that are HPC, the cost economics for cloud are probably never going to make sense. For example, it may be the case for a large research institution like CERN or some other large HPC consumer that really needs very low latency infrastructure for MPI problems.

Robert: It seems that a lot of issues around the cloud are less associated with technical challenges than they are about law, policy, and even psychology. I’m thinking here about issues of trust from the public sector, for example. Many end customers also currently need to have the data center physically located in their country. How do you see the legal and policy issues evolving to keep up with the technical capabilities of the cloud?

Randy: It’s always hard to predict the future, but some of the laws really need to get updated as far as how we think about data and data privacy. For example, there are regulatory compliance issues that come up regularly when I talk to people in the EU. Every single EU member country has different laws about protecting data and providing data privacy for your users. Yet at the same time, some of that is largely prescriptive rather than requirements-based, like stating that data can’t reside outside of a specific country.

I don’t know that that makes as much sense as specifying that you need to protect the data in such a way that you never leave it on the disk or move it over the network in such a way that it can be picked up by an unauthorized party. I think the security, compliance, and regulatory laws really need to be updated to reflect the reality of the cloud, and that will probably happen over time. In the short term, I think we’re stuck in a kind of fear, uncertainty, and doubt cycle with cloud security.

Previously, I spent about seven years as a full-time security person.  What I found is that there is always a fairly large disconnect between proper security measures and compliance. Compliance is the codification in laws to try to enforce a certain kind of security posture. But because of the way that data and IT are always changing and moving forward, while political systems take years to formulate laws, there’s always a gap between the best practices in security and what the current compliance and regulatory environment is.

Robert: Now, you mentioned a big cloud project your company did in South Korea. What are some of the issues that are different for cloud computing with customers outside the United States?

Randy: I think one of the first things is that most folks outside the U.S. are really at the beginning of the adoption cycle, whereas inside the U.S., folks are pretty far along, and they’ve got more fully formulated strategies. And the second thing is that in many of these markets, since the hype cycle hasn’t picked up yet, there are still a lot of questions around whether the business model actually works.

So for example, in South Korea, the dedicated hosting and web hosting business is very small, because most of the businesses there have preferred to purchase the hardware. It’s a culture where people want to own everything that they are purchasing for their infrastructure. So will a public cloud catch on? Will virtualization on demand catch on? I don’t know.

I think it’ll be about cost economics, business drivers, and educating the market. So I think you’re going to find that similar kinds of issues play out in different regions, depending on what the particulars are there. We’re starting to work with folks in Africa and the Middle East, and in many cases, hosting hasn’t caught on in any way in those regions.

At the same time, the business models at Infrastructure as a Service providers in the U.S. don’t really work unless you run them at 70 to 80% capacity. It’s not like running an enterprise system where you can build up a bunch of extra capacity and leave it there unused until somebody comes along to use it.

Robert: I almost liken it to when the long-distance companies, because of the breakup of the Bells, started to offer people long distance plans. You had to get your head around what your call volume was going to look like. It was the same when cell phones came out. You didn’t know what you didn’t know until you actually started generating some usage.

Randy: I think the providers will have options about how they do the pricing, but the reality is that when you are a service provider in the market, you are relatively undifferentiated. And one of the ways in which you try to achieve differentiation is through packaging and pricing. You see this with telecommunications providers today.

So we’re going to see that play out over the next several years. There will be a lot of attempts at packaging and pricing services to address consumers’ usage patterns. I liken it to that experience where you get that sticker shock because you went over your wireless minutes for that month, and then you realize that you need plan B or C, and then you start to use that.

Or you, as a business, realize that you need to get an all-you-can-eat plan for all of your employees, or whatever now works for your business model. Then service providers will come up with a plethora of different content pricing and packaging to try to serve those folks, and that will be more successful.

Robert: In a recent interview I did with New Zealand’s Chris Auld, he said that cloud computing is a model for the procurement of computing resources. In other words it’s not a technological innovation as much as a business innovation, in the sense that it changes how you procure computing. What are your thoughts on his point?

Randy: I am adamantly opposed to that viewpoint. Consider the national power grid; is it a business model or a technology? The answer is that it’s a technology. It’s a business infrastructure, and there happens to be a business model on top of it with a utility billing model.

The utility billing model can be applied to anything. We see it in telecommunications, we see it in IT, we see it with all kinds of resources that are used by businesses and consumers today.

We all want to know, what is cloud computing? Is it something new? Is it something disruptive? Does it change the game?

Yes, it’s something new. Yes, it’s something disruptive. Yes, it’s changed the game.

The utility billing model itself has not changed the game.  Neither has the utility billing model as applied to IT, because that has been around for a long time as well. People were talking about and delivering utility computing services ten years ago, but it never went anywhere.

What has changed the game is the way that Google, Amazon, Microsoft, and Yahoo use IT to run large scale infrastructure.  As a side effect, because we’ve figured out how to do this very cost effectively at a massive scale, the utility billing model and the utility model for delivering IT services now actually works. Before, you couldn’t actually deliver an on-demand IT service in a way that was more cost effective than you could build inside your own enterprise.

Those utility computing models didn’t work before, but now we can operate at scale, and we have ways to be extremely cost-efficient across the board. If we can continue to build on that and improve it over time, we’re obviously going to provide a less expensive way to provide IT services over the long run.

It’s really not about the business model. It really is about enabling a new way of doing IT and a new way of computing that allows us to do it at scale, and then, on top of this, providing a utility billing model.

Robert: Clearly, we’re seeing a lot of immediate benefit to startups, for the obvious reason that they don’t need to procure all of that hardware. Are you seeing the same thing as well?

Randy: I’ve been more interested in talking about enterprise usage of public services lately, but it seems that the startups are well into the mature stage, where nobody ever goes out and builds infrastructure anymore if they have a new startup. It just doesn’t make any sense.

When folks were first starting to use electricity to automate manufacturing, textiles, and so on, larger businesses were able either to build a power plant, or to put their facility near some source of power, such as a hydroelectric water mill. Smaller businesses couldn’t.

Then when we built a national power grid, suddenly everybody could get electricity for the same cost, and it became very difficult to procure and use electricity for a competitive advantage. We’re seeing the same thing here, in the sense that access to computing resources is leveling the playing field. Small businesses and startups actually have access to the same kinds of resources that very large businesses do now. I think that really changes the game over the long term.

You will know we crossed a tipping point when two guys and their dog in a third world country can build the infrastructure to support the next Facebook with a credit card and a little bit of effort.

Robert: Those are all of the prepared questions I had. Is there anything else you’d like to talk about?

Randy: There are a few things that I’d like to add, since I have the opportunity. The first thing reaches back to the point I made before, likening the way cloud is replacing enterprise computing to the way client-server or enterprise computing replaced mainframes. What drove the adoption of client-server (enterprise) computing?

It really wasn’t about moving or replacing mainframe applications, but about new applications. And when you look at what’s going on today, it’s all new applications. It’s all things that you couldn’t do before, because you didn’t have the ability to turn on 10,000 servers for an hour for $100 and use them for something.

If you look at the way that enterprises are using cloud today, you see use cases like financial services businesses crunching end-of-day trading data, or pharmaceutical companies doing very large sets of calculations overnight, where they didn’t have that capability before.

There’s a weird fixation in a lot of the cloud community on enterprise or private cloud systems. They’re trying to say that cloud computing is about outsourcing existing workloads and capacity to somebody who maybe doesn’t have the same kind of cost efficiencies that Amazon or Google has.

If you just outsource the mess in your data center to someone else who has the same operational cost economics, it can’t really benefit you from a cost perspective. What has made Amazon and others wildly successful in this area is the ability to leverage this new way of doing IT in ways that either level the playing field or otherwise create new revenue opportunities. It’s not about bottom line cost optimization.

If we just continue doing IT the way we already do it today, I think we’re going to miss the greater opportunity. On the other hand, you can ask your developers, “What can you do for the business if I give you an infinite amount of compute, storage, and network that you can turn on for as little as five minutes at a time?” That’s really the opportunity.

Robert: That’s excellent, Randy. I really appreciate your time.

Randy: Thanks, Robert.