Thought Leaders in the Cloud: Talking with Chandra Krintz, Associate Professor at UC Santa Barbara

Chandra Krintz is an Associate Professor in the Computer Science Department at the University of California, Santa Barbara. She is also the director of the AppScale open source project. Chandra has received a number of awards, including the 2010 IBM X10 Innovation Award, the 2009-2010 IBM OCR Award, the 2008-2009 UCSB Academic Senate Distinguished Teaching Award, and the 2008 CRA-W Anita Borg Early Career (BECA) Award. You can learn more about Chandra at www.cs.ucsb.edu/~ckrintz/ and www.cs.ucsb.edu/~ckrintz/racelab.html.

In this interview, we discuss:

  • An open source implementation of the Google App Engine known as AppScale
  • Growing interest in dynamic languages
  • Using the same technology and skills to develop for public and private clouds
  • Any plans to take AppScale in a commercial direction
  • Will cloud computing require new programming languages?
  • How AppScale is already growing beyond a pure implementation of GAE

Robert Duffner: Chandra, could you take a minute to introduce yourself and your involvement with AppScale?

Chandra Krintz: Sure. I’m a professor in computer science at the University of California, Santa Barbara, and I’ve been there about nine years. My background is in computer languages and their implementation in runtime systems. Recently, my group and I have developed AppScale, which is an open source implementation of the Google App Engine APIs.

We went down this path because we were interested in understanding the next generation of the platform level in cloud computing, to understand how cloud works currently, and then to do the research investigating what the next generation of systems, software, and developer support need to look like. That’s where we started.

We have lots of background in languages and runtimes, and Java virtual machines in particular. We wanted to expand that to modern languages such as Python, Ruby, and lots of languages that are used to develop distributed web services, so we went out and looked for an opportunity to be part of the open source community, to contribute software as well as to get some feedback.

We wanted a real system that people were interested in using, so we took our inspiration from Dr. Rich Wolski, another faculty member in computer science at UC Santa Barbara, who started the Eucalyptus Project, which is an open source infrastructure as a service offering.

Since I was most interested in the platform level, we followed his lead to develop an open source version of some cloud platform, and the one that was readily available at the time was Google App Engine.

We set out to provide and support it as an open source offering and follow the APIs as best we could. We also started to investigate some new ways of pushing new APIs, looking for novel ways of scaling and optimizing these systems and taking that forward in the research community.

Robert: With AppScale, what trends are you seeing with regard to the development frameworks and languages developers want to use to deploy applications to the cloud? What preferences are you seeing emerge there?

Chandra: As you know, Google has lots of investment in Python and Python frameworks like Django and web.py. Since App Engine developers can write apps and have them work in Google or on their own private cloud using AppScale, we inherited a lot of that with the user community as we began this project.

As App Engine evolved into more of a Java offering, we had a lot more users interested in managing their own clouds or running applications that were more like traditional Java web services, primarily because there’s a lot of experience with that from the last decade of use.

From the outside community, I see a lot of Ruby on Rails. A lot of my colleagues and collaborators and people who are interested in writing web services want to use Ruby on Rails, and so we have this set of APIs that allow those types of applications to be written as Google App Engine equivalents.

The trends I’m seeing indicate that people are really interested in dynamic languages. They want very simple, very efficient syntax, and they don’t want to specify types, because they don’t even know what the types look like. They don’t know what their classes are going to be, ultimately, so they want to investigate and develop on the fly, incrementally.

We want to be able to capitalize on that, as well as providing the ability to dynamically optimize and make that trend really fast, which eventually means we have to go to more optimized versions of the system.

From the developer community, it’s not just web services, either. I think there’s a changing trend to solve more interesting problems, such as data analytics and more computationally intensive applications. Some come from the science community, but some are from big data-processing, for example large-graph algorithms.

They want ease of use, but they also want a scalable back end that can solve larger and larger problems. I think those are the key trends: language development, more computationally intensive opportunities, and cloud scaling for those types of apps.

Robert: I recently had some conversations with some Gartner analysts, and we talked about this idea of systemic applications, or the step just below mission-critical applications. These applications require reliability, availability, and serviceability, and a lot of application development shops want to use Java, .NET, and C#. Where do you see Java, .NET, and C# apps when it comes to deploying them to the cloud?

Chandra: I think there’s a huge opportunity for these. I see the Java/.NET position in all of this as giving you a high-level language that is easy, very efficient, and provides many, many libraries for programmer productivity and development support.

At the same time, they’re static languages, which means that the types are known ahead of time, and that just is amazingly powerful when you want to make these things run fast. I think we’re on the path now of wanting performance.

As we move to these more and more interesting applications and services, those types of languages are going to be very important. There’s going to be some coordination, I believe, between dynamic prototype development and these more static languages.
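One loose way to picture that coordination between dynamic prototyping and statically typed code is the short Python sketch below. It is an illustration only, not something from AppScale or the interview: the names order_total, order_total_typed, and LineItem are hypothetical, and Python's type hints are merely checked by external tools rather than enforced the way Java or .NET types are.

```python
from dataclasses import dataclass
from typing import Iterable


# Prototype stage: no declared types. Any object that happens to have
# .price and .quantity attributes will work (duck typing).
def order_total(items):
    return sum(item.price * item.quantity for item in items)


# Later stage: the shape of the data has settled, so it is written down.
# (Hypothetical class, used only for this illustration.)
@dataclass
class LineItem:
    price: float
    quantity: int


# Same logic with declared types. A static checker can now flag misuse
# before the program runs; the interpreter itself does not enforce the hints.
def order_total_typed(items: Iterable[LineItem]) -> float:
    return sum(item.price * item.quantity for item in items)


items = [LineItem(price=2.5, quantity=4), LineItem(price=1.0, quantity=3)]
print(order_total(items), order_total_typed(items))
```

Both functions compute the same result; the difference is how much the tooling knows ahead of time, which is the property that statically typed runtimes exploit for performance.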

I think the garbage-collected languages are really important for programmer productivity, and the .NET framework, the CLR, and the Java virtual machine have really advanced in terms of performance they can extract from programs, from both the memory-management point of view and in terms of program efficiency.

It’s going to be interesting to see how the trend in dynamic languages can be combined with leveraging robust and well-thought-out implementations in the .NET and Java realms.

Robert: Chandra, looking at platform as a service or PaaS, as it evolves people are definitely making it clear that they want some choice in where their PaaS runs. We host the Windows Azure platform, and we’ve also announced an Azure appliance. Talk about what you see as the need for users to have control over where their PaaS runs.

Chandra: I think that is really important, and it’s one of the things that makes me excited about this entire technological evolution that we’re going through. From the developers’ perspective, they want to be able to develop both locally and on their own local private cluster, and then have it work in large-scale data centers as well.

The developer doesn’t need or want to know about any of that, and so I believe where a platform comes in is to provide the bridge between local prototyping, development, and testing and optimizing.

Then, when applications become solid, robust, and complete, they can really leverage resources that are available at a much larger scale than anyone wants to buy or support locally. I feel that the platform is going to be the liaison between the developer and the next generation of distributed systems in the cloud.

At the same time, you mentioned choice. I think developers really want a choice. They don’t want to feel they’re locked into anything. They want competition, and they want the best deal: the best amount of scale for the dollar, the availability for the dollar, or the performance for the dollar.

Having a hybrid approach is something that we are pursuing rigorously and aggressively with AppScale, to provide a platform that allows applications to execute in different clouds. We’re not focused only on Google App Engine. That was just a starting place for us so we could start to build a community and see what those applications look like.

We really are interested in both a private/public cloud hybrid as well as a multi-cloud hybrid, where it may be that the type of scale a particular user is interested in will vary. Different clouds may provide different services that are more appropriate for a particular component of an application.

We envision applications running in Windows Azure, in Google App Engine, and locally, because that gives the developer a choice. They can keep some data private, and they don’t have to share it in remote storage, but they can share some of that data with the cloud platform or infrastructure that provides the best price/performance trade-off, which now may also mean price/scale or price/availability.
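The placement decision described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how AppScale works: the Cloud class, the place function, the cloud names, and every number are hypothetical and chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class Cloud:
    name: str
    is_private: bool
    cost_per_hour: float  # made-up figure, for illustration only
    throughput: float     # made-up requests per second


def place(needs_privacy, clouds):
    """Pick a cloud for one application component.

    Components holding private data may only run on private clouds;
    everything else goes wherever throughput per dollar is highest.
    """
    candidates = [c for c in clouds if c.is_private] if needs_privacy else list(clouds)
    return max(candidates, key=lambda c: c.throughput / c.cost_per_hour)


clouds = [
    Cloud("local-cluster", is_private=True, cost_per_hour=4.0, throughput=200.0),
    Cloud("public-a", is_private=False, cost_per_hour=1.0, throughput=300.0),
    Cloud("public-b", is_private=False, cost_per_hour=2.0, throughput=500.0),
]

print(place(needs_privacy=True, clouds=clouds).name)
print(place(needs_privacy=False, clouds=clouds).name)
```

Swapping the key function for a price/scale or price/availability measure would change the choice without touching the application code, which is the kind of portability being described.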

I believe that if you have a platform that allows multiple languages and frameworks to be written against it, it provides a certain level of portability. Therefore, the actual application can go to different clouds, and when they want to move, it’s really easy. You can go from Google to Azure, from running your own stuff in Amazon to Windows Azure.

I think that will make developers and users more confident, because they won’t feel tied to any individual infrastructure, and they can just pick the one that meets their needs best at the time. Then companies will develop amazing infrastructures to compete for users and certain application domains. I think that’s how the future may look.

Robert: Let’s talk a little bit about AppScale. Back in January, a blog post on Thinking Out Cloud was asking, “Does AppScale Have a Commercial Future?” It pointed out that Eucalyptus started similarly, also at UC Santa Barbara. AppScale’s obviously fascinating from a university research perspective, but talk about the commercial potential.

Chandra: I’m a researcher at heart, and I love computer science for the science. I have been approached by analysts, potential investors, and other people asking the same types of questions. I don’t have business experience, so I can’t really speak to how the business model might play out. But I think having an open source community contributing back is going to be beneficial to industry and to long-term technological advancement.

I have no plans today to commercialize AppScale, because I like the ability to work on hard problems without being tied to the bottom line. That gives me the freedom to think about problems or solutions that are outside the box. My students may be interested in taking this forward at some point, but right now, we see this as a research infrastructure and a tool for the open source community.

We’ve used a very free open license, a BSD variant. I’ve heard stories about people in other countries who have already started to charge for AppScale cloud usage for a Google App Engine app. I don’t know if there’s any truth to that, but I know there’s interest, if someone from the business side wanted to flesh out what the business model would be.

Robert: This is a great segue with regard to your motivation as an academic and professor. Let’s switch gears a little bit on that point. What do you see that’s interesting about cloud computing to students? What’s some of the cool stuff going on at the UC Santa Barbara RACE lab or elsewhere?

Chandra: Even though I’m a researcher and I want to do science, it’s not pie-in-the-sky science. I want to do things that are very practical and that impact technology today. The things we build are very solid and robust, and we work really hard at that. So I am a researcher, but there’s also a key engineering piece.

What I like about Santa Barbara, where I think we have an interesting angle, is our curriculum model, which is very systems-oriented and practical. As we bring in cloud, we think of it as the next generation of distributed systems.

If you call it distributed systems, that’s not as exciting, perhaps, as cloud computing and using all these different language frameworks that all the students come in hearing about and being really interested in. We can actually leverage cloud computing to teach them key concepts about systems and distributed systems and all the different languages that come into play.

I think it’s attractive to students in the sense that it’s very hands-on and technical, but it’s also very real and immediately important to the world around us. Students feel like the time they’re putting in is really going to pay off when they go to search for a job or pursue graduate school, and that’s really what is happening.

Robert: New computing paradigms have often meant new languages. I’m thinking about UC Berkeley working on something called BOOM, for example. Do you think existing languages are a short-term thing, and that eventually we’ll predominantly be using new cloud languages?

Chandra: There are all these different types of resources and behaviors that developers might want to control: things like availability and consistency of data. A couple of directions may play out. One argument against new languages is that no one really wants to learn a whole new language. People get ingrained; they already have languages that they are familiar with, and it’s hard for them to learn something new.

At the same time, following that argument, I foresee that there will be new language features that make some of these concepts first class: availability, scalability, specifying what type of scale you need, when you need it, what data should be private, and what data should be public. I think you will see some of those concepts that come from cloud start to filter into languages.

Ruby and Python are new languages, and they’ve had a huge uptake, so it’s not impossible that there would be a new language that comes out of this that takes more advantage of the fact that you’re in a virtualized environment, the fact that things are happening in parallel and concurrently.

I’ve seen a lot of effort targeting parallel and concurrent computation, making it as easy as possible so that when you move to a cloud environment it’s much more efficient, easier to scale, and easier to understand what the program is doing.

Regardless of how it specifically works out, I believe you need community engagement to get uptake, and that’s hard to do. It’s hard to engender an entirely new user community, but not impossible. I think it’s very exciting, and people should definitely be working on this.

Robert: You obviously spend a lot of time thinking about really difficult technical challenges in the cloud. Many of the barriers have little to do with technology, but tend to be more psychological or regulatory. What are your thoughts on the nontechnical hurdles to getting the most out of the cloud?

Chandra: The pros have to outweigh the cons, of course, so the scale you’re going to get needs to be useful to your audience. Second, the scale they’re trying to achieve at very low cost must have excellent potential for forwarding their businesses.

I also think it’s important to provide a hybrid approach where you can give them a step toward moving into the cloud without being all or nothing. They don’t have to put all their data there. They can do it incrementally. They can start to experience for themselves the benefit that clouds bring, and then they can have a rational argument about whether that benefit they’re achieving is worth the risk that they perceive or that may be real.

There is so much great work to be done in terms of security and privacy, and lots of it is being done at Microsoft. As those tools and technology advance, even though it’s not technology focused, as customers can start to trust the infrastructure to provide them the guarantees they want, they’ll become more and more comfortable.

People are a little bit apprehensive about anything new, and I think it just takes time.

Robert: One of the other issues we see a lot is how governments are going to deal with issues around where data is located. Will it get to the point where laws acknowledge the capabilities of the cloud, and will we change the laws?

Chandra: I’m not the person to ask about that, but my feeling is that there will be technological advances such as encrypted data and rich management of data that will allow a lot of business to get done. At the same time, nothing top secret’s ever going to go onto a public cloud infrastructure. But I think that there’s a lot of middle ground where people will be OK with having their data stored elsewhere.

Robert: I wanted to get back a little bit to the AppScale implementation. It’s an open source implementation of Google App Engine, and it’s obviously important to be a mirror image of what Google provides. You touched a little bit upon this earlier in our discussion here, but as you look toward the future, do you see opportunities for AppScale to expand beyond what Google App Engine does?

Chandra: We already do, in fact. For example, we provide access to the APIs from different languages, and do not place restrictions on resource use, whereas in Google, you’re restricted: there are strict time constraints on most operations. Also, Google has and uses their internal MapReduce infrastructure for their indexing and other applications, and you have no control over that. AppScale lets you write your own mappers and reducers and employ MapReduce from your Google App Engine app.
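For readers unfamiliar with the mapper and reducer terminology, here is a minimal word-count sketch in plain Python. It does not use AppScale's or Google's MapReduce API, which the interview does not show; the function names mapper, reducer, and run_mapreduce are hypothetical, and the driver runs in a single process purely to show the map, group, and reduce phases.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all counts emitted for one word into a single total."""
    return word, sum(counts)


def run_mapreduce(lines):
    """Hypothetical in-process driver: map, group by key, then reduce.

    A real framework would spread these phases across many machines.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


word_counts = run_mapreduce(["the cloud scales", "the cloud is elastic"])
print(word_counts)
```

The point of the pattern is that the developer writes only the two small functions, while the platform decides how to distribute and schedule the work.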

Another big difference is that the libraries are really restricted in terms of what you can use. This, I believe, has limited the number of people who have tried Google App Engine, because you can only use a very small subset of all the libraries that are available for Python and Java.

With AppScale, you can allow or disallow any library. It’s your cluster. So you can use a much wider range of code that’s out there. I think those are the big ones.

Google does this because it’s how they achieve scale. They can make guarantees because they understand what code can possibly be executing, and they can restrict that down so that they can have the super scale of those shared resources.

Also, you can’t do much computation in Google. Each request you handle must execute very quickly. You can have background tasks, but all of that is restricted in terms of the duration of the computation.

Where we’re moving, where we’re already tasked, and what I believe has to happen for clouds, is that you have to be able to run arbitrary code. You have to be able to run computationally intensive, large-scale data-processing operations on the cloud, because those resources can solve many more problems than we can with private clusters.

What AppScale offers, I believe, and what Azure offers in the same way, is that you can do a lot more in terms of a diversity of functionality than Google App Engine has today. Again, this is the path Google has chosen to go down for scaling reasons. But I think those are key.

I think more and more services are going to be important, and I think users are also going to be very interested in the ability to go to different cloud infrastructures as well as using services in other clouds. I think that’s what we’ll have an advantage in pursuing.

Robert: Most customer experience with the cloud is either with infrastructure as a service or software as a service. You find that when you want to talk about platform as a service, it requires you to make quite a leap. Have you experienced the same thing?

Chandra: Yes, and I think it’s going to be the slowest to come. You’ve got to get the developer community’s buy-in and show that it’s going to have an impact, although I think it’s a necessary middle ground. You don’t necessarily want just to use someone else’s app and customize it. You want to make your own app, but you don’t want to have it be fully self-service when it’s not necessary to be.

This requires a lot of manpower, and it’s very error-prone if you have to do it all yourself, even though there are a lot of great tools out there that help you, of course.

Developers want to be able to write an app and have it be available to everyone in the whole world immediately. I think that’s very exciting, but it’s also going to take longer for that to come, because there are many issues that have to be resolved, such as deployment and scalability issues.

If you’re a public cloud, how are you going to partition your resources and make your service guarantees? Those are hard problems, but I think if we can do it well, we have potential for having this become the standard paradigm. Instead of writing a program that runs on your desktop, you will write programs, and they will just run everywhere.

Robert: I really appreciate your taking the time for this conversation. Is there anything else you would like to add in closing?

Chandra: I would like to point out that I chose Google because Azure really wasn’t quite ready at the time. This was two years ago, and it was just a choice to get onto the playing field. I think Azure has a lot of potential, and I’m writing proposals to extend AppScale to Azure as we speak. I’m glad we’re all on similar paths.

Robert: That’s a great place to wrap up. Thanks again, Chandra.

Chandra: My pleasure.