The StochSS team at the 2013 NSF DALI workshop

On Nov. 1st, part of our team participated in the 2013 NSF DALI workshop. We had the opportunity to interact with a group of amazing individuals from very diverse backgrounds in science, software, and data analytics. There were experts in R, Matlab, runtime systems, cloud platforms, and computational science. The list of talks is available here. We presented our approach to cloud platforms for scalable simulations and data analytics.

Data and Application Management in an Open Cloud Platform

Cloud computing has seen tremendous uptake in the global market and is expected to keep growing well into the future. The commoditization of computational and storage resources has given individuals and companies the ability to acquire such resources on demand and to relinquish them when they are no longer required, without budgeting for additional hardware and its management.
Platform-as-a-Service (PaaS) architectures have arisen in recent years to relieve developers of the burdens of resource management, so that they can focus strictly on application development. This faster time-to-value has increased productivity for both developers and their organizations. Developers no longer have to worry about lower-level details such as CPU consumption, bandwidth limitations, memory consumption, and disk usage, as was common in the past. Scaling applications is now the responsibility of the platform. PaaS systems have become the operating systems of the datacenter.
Our research has focused on developing a PaaS system that provides these attributes in an open and pluggable way. We emulate the Google App Engine PaaS system, as it was one of the first to come to market and offered the promise of infinite scalability, both at the front end of application servers and at the back end of large data storage, all powered by Google's robust infrastructure. We call our PaaS solution AppScale. AppScale is an open cloud platform capable of transparently executing Google App Engine applications at scale and without modification. AppScale is a cloud-based web framework that provides multiple services for cloud infrastructure control, data persistence, caching, and a number of other common application technologies. AppScale both simplifies and facilitates benchmarking the execution of scalable cloud technologies using real applications. This Ph.D. dissertation discusses the design, implementation, and evaluation of AppScale. It considers the many components of AppScale, with a focus on the data management layer for scalable storage, transaction semantics, scalable queries, analysis of Big Data, and live migration support.
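The dissertation's focus on transaction semantics over scalable storage can be illustrated with a small example. The Python sketch below shows one common technique, optimistic concurrency control with per-key version numbers, layered over a plain key-value backend. It is not AppScale's actual data management layer; the classes `VersionedStore` and `Transaction`, the helper `run_with_retries`, and the default retry count are hypothetical names and values chosen for illustration.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class ConflictError(Exception):
    """Raised when data read by a transaction changed before it committed."""


class VersionedStore:
    """Hypothetical in-memory backend that keeps a (version, value) pair per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}
        self._lock = threading.Lock()  # serializes reads and the validate-and-apply step

    def read(self, key: str) -> Tuple[int, Any]:
        with self._lock:
            return self._data.get(key, (0, None))

    def commit(self, reads: Dict[str, int], writes: Dict[str, Any]) -> None:
        with self._lock:
            # Validate: every key the transaction read must still be at the version it saw.
            for key, seen_version in reads.items():
                if self._data.get(key, (0, None))[0] != seen_version:
                    raise ConflictError(key)
            # Apply: install the buffered writes, bumping each key's version.
            for key, value in writes.items():
                current_version = self._data.get(key, (0, None))[0]
                self._data[key] = (current_version + 1, value)


class Transaction:
    """Buffers writes locally and records the version of every key it reads."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.reads: Dict[str, int] = {}
        self.writes: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key in self.writes:  # read-your-own-writes
            return self.writes[key]
        version, value = self.store.read(key)
        self.reads.setdefault(key, version)  # remember the first version observed
        return value

    def put(self, key: str, value: Any) -> None:
        self.writes[key] = value

    def commit(self) -> None:
        self.store.commit(self.reads, self.writes)


def run_with_retries(store: VersionedStore, fn: Callable[[Transaction], Any], retries: int = 3) -> Any:
    """Run fn inside a fresh transaction, retrying from scratch on conflicts."""
    for _ in range(retries):
        txn = Transaction(store)
        result = fn(txn)
        try:
            txn.commit()
            return result
        except ConflictError:
            continue
    raise ConflictError(f"gave up after {retries} attempts")
```

For example, running a transaction that reads a `"counter"` key, adds one, and writes it back twice leaves `store.read("counter")` at `(2, 2)`. A transaction that read the counter before one of those commits fails validation with `ConflictError` instead of silently overwriting the newer value.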
The PhD thesis can be downloaded here.

The AppScale Cloud Platform: Enabling Simplified Development and Portable, Scalable Deployment of Web Applications

In this paper, we overview the motivation, design, and implementation behind AppScale, an open source distributed software system that implements a cloud platform-as-a-service (PaaS). Our goal with AppScale is to simplify cloud application (app) development and deployment and, by doing so, to broaden the population of developers who are able to innovate using cloud systems. We enable this by targeting the problem of app portability across clouds and across app service/library implementations (functionality common across apps, such as data management, search, messaging, and tasking). In particular, AppScale defines a simple, unifying, and open set of APIs based on the de facto public cloud standard of Google App Engine; that is, apps that execute on Google’s public cloud also execute over AppScale without modification. AppScale then “plugs in” and automatically configures and deploys a number of alternative implementations of the different app services. Moreover, since we make AppScale available as a virtual machine image and a set of deployment tools, users can execute AppScale on-premises or over public cloud infrastructures. Developers can take advantage of the portability AppScale offers to simplify development and deployment of cloud apps, to compare and contrast different cloud services and fabrics without changing their apps or becoming experts in the constituent technologies, and to investigate and evaluate new cloud platform advances using a rich application and service ecosystem.
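The “plug in” idea can be illustrated with a toy registry: application code is written once against a service interface, and a deployment-time configuration decides which implementation backs it. The Python sketch below assumes a hypothetical cache service with two interchangeable implementations; the names `CacheService`, `DictCache`, `NullCache`, `REGISTRY`, `deploy`, and `handle_request` are illustrative and are not AppScale's or Google App Engine's API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class CacheService(ABC):
    """Hypothetical service interface that application code is written against."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def set(self, key: str, value: str) -> None: ...


class DictCache(CacheService):
    """In-process implementation backed by a dictionary."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._entries.get(key)

    def set(self, key: str, value: str) -> None:
        self._entries[key] = value


class NullCache(CacheService):
    """Implementation that stores nothing, so every lookup misses."""

    def get(self, key: str) -> Optional[str]:
        return None

    def set(self, key: str, value: str) -> None:
        pass


# Service name -> implementation name -> factory.
REGISTRY: Dict[str, Dict[str, Callable[[], CacheService]]] = {
    "cache": {"dict": DictCache, "null": NullCache},
}


def deploy(config: Dict[str, str]) -> Dict[str, CacheService]:
    """Instantiate the implementation chosen for each service in the config."""
    return {service: REGISTRY[service][impl]() for service, impl in config.items()}


def handle_request(services: Dict[str, CacheService], key: str) -> str:
    """Application code: unchanged regardless of which implementation is plugged in."""
    cache = services["cache"]
    if cache.get(key) is None:
        cache.set(key, "rendered:" + key)
        return "miss"
    return "hit"
```

With `deploy({"cache": "dict"})`, a second request for the same key returns `"hit"`; with `deploy({"cache": "null"})`, every request returns `"miss"`. In both cases `handle_request` itself never changes, which is the portability property the paragraph describes.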
The paper will appear in the February/March issue of IEEE Internet Computing.

A Pluggable Autoscaling Service for Open Cloud PaaS Systems

In this paper we present the design, implementation, and evaluation of a pluggable autoscaler within an open cloud platform-as-a-service (PaaS). We redefine high availability (HA) as the dynamic use of virtual machines to keep services available to users, which makes it a subset of elasticity (the dynamic use of virtual machines). This makes it possible to investigate autoscalers that simultaneously address HA and elasticity. We present and evaluate autoscalers within this pluggable system that are HA-aware and Quality-of-Service (QoS)-aware and that operate automatically (that is, without user intervention) for web applications written in different programming languages. Hot spares can also be used both to provide HA and to improve QoS for web users. Within the open source AppScale PaaS, using hot spares can increase the amount of web traffic that the QoS-aware autoscaler serves to users by up to 32%.
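The idea of treating HA and QoS as pluggable policies over a shared view of the deployment can be sketched as follows. This Python example is a toy built on stated assumptions, not the AppScale autoscaler: `ClusterState`, its fields, the per-node capacity threshold, and the rule of honoring the largest request among the policies are all hypothetical. It does reflect two points from the abstract: HA is handled as one more reason to add virtual machines, and hot spares are used before new machines are acquired.

```python
import math
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass
class ClusterState:
    desired_nodes: int      # nodes the deployment is supposed to have
    healthy_nodes: int      # nodes currently passing health checks
    hot_spares: int         # idle nodes that are already booted
    queued_requests: int    # requests waiting at the load balancer
    requests_per_node: int  # hypothetical capacity of one node


class Autoscaler(Protocol):
    def decide(self, state: ClusterState) -> int:
        """Return how many nodes to add (0 means no change)."""
        ...


class HAAutoscaler:
    """Replace failed nodes so the deployment stays at its desired size."""

    def decide(self, state: ClusterState) -> int:
        return max(0, state.desired_nodes - state.healthy_nodes)


class QoSAutoscaler:
    """Add enough nodes to absorb requests beyond current capacity."""

    def decide(self, state: ClusterState) -> int:
        capacity = state.healthy_nodes * state.requests_per_node
        overflow = state.queued_requests - capacity
        return math.ceil(overflow / state.requests_per_node) if overflow > 0 else 0


class CombinedAutoscaler:
    """Plug in any number of policies and honor the largest request."""

    def __init__(self, policies: List[Autoscaler]) -> None:
        self.policies = policies

    def decide(self, state: ClusterState) -> int:
        return max((p.decide(state) for p in self.policies), default=0)


def plan_scale_up(state: ClusterState, nodes_needed: int) -> Tuple[int, int]:
    """Split a scale-up into (hot spares to activate, new VMs to acquire)."""
    nodes_needed = max(0, nodes_needed)
    from_spares = min(nodes_needed, state.hot_spares)
    return from_spares, nodes_needed - from_spares
```

For a state with 4 desired nodes, 3 healthy nodes, 1 hot spare, 500 queued requests, and a capacity of 100 requests per node, the HA policy asks for 1 node, the QoS policy asks for 2, and `plan_scale_up` activates the spare and acquires 1 new VM. Taking the maximum rather than the sum avoids over-provisioning here, because the QoS policy already measures capacity against healthy nodes only.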
Because this autoscaling system operates at the PaaS layer, it can control virtual machines and be cost-aware when addressing HA and QoS. We therefore augment these autoscalers to make them cost-aware. This cost awareness uses Spot Instances within Amazon EC2 to reduce the cost of acquired machines by 91%, in exchange for an increase in startup time. This pluggable autoscaling system facilitates the investigation, by others, of new autoscaling algorithms that can take advantage of metrics provided by different levels of the cloud stack (IaaS, PaaS, and SaaS).
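Cost awareness can be reduced to a small decision: when a new machine is needed, pick the cheapest market whose expected startup time still meets the deadline for that request. The Python sketch below assumes two markets with made-up prices and startup times (they are not EC2 figures) and uses a hypothetical `choose_market` function; it only illustrates the trade-off between price and startup time described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Market:
    name: str
    hourly_price: float     # hypothetical USD per instance-hour
    startup_seconds: float  # hypothetical expected time until the VM can serve traffic


def choose_market(markets: List[Market], max_startup_seconds: float) -> Market:
    """Cheapest market that starts in time; otherwise fall back to the fastest one."""
    in_time = [m for m in markets if m.startup_seconds <= max_startup_seconds]
    if not in_time:
        return min(markets, key=lambda m: m.startup_seconds)
    return min(in_time, key=lambda m: m.hourly_price)


# Illustrative values only.
ON_DEMAND = Market("on-demand", hourly_price=0.10, startup_seconds=60)
SPOT = Market("spot", hourly_price=0.02, startup_seconds=300)
```

With these assumed values, an HA replacement that must be serving within 120 seconds gets the `on-demand` market, while an elastic scale-up that can wait 600 seconds gets the cheaper `spot` market.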

Automated Configuration and Deployment of Applications in Heterogeneous Cloud Environments

Cloud computing is a service-oriented approach to distributed computing that provides users with resources at varying levels of abstraction. Cloud infrastructures provide users with access to self-service virtual machines that they can customize for their applications. Alternatively, cloud platforms offer users a fully managed programming stack to which they can deploy their applications and which scales without user intervention. Yet challenges remain in using cloud computing systems effectively. Cloud services are offered at varying levels of abstraction, are metered according to vendor-specific pricing models, and expose their services through proprietary APIs. This raises the barrier to entry for each cloud service and encourages vendor lock-in.
The focus of our research is to design and implement research tools that mitigate the effects of these barriers to entry. We design and implement tools that serve users in the web services domain, the high performance computing domain, and the general-purpose application domain. These tools operate on a wide variety of cloud services and automatically execute applications provided by users, so that users do not need to be aware of how each service operates and meters. Furthermore, these tools leverage programming language support to facilitate more expressive workflows for evolving use cases.
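The point about vendor-specific metering can be made concrete: the same job can cost different amounts depending on how a vendor rounds partial billing units. The Python sketch below hides two hypothetical pricing models behind one `cost` method, so calling code does not need to know how each vendor meters; the `PricingModel` class, vendor labels, prices, and granularities are invented for illustration and do not describe any real provider's pricing.

```python
import math
from dataclasses import dataclass


@dataclass
class PricingModel:
    vendor: str                # hypothetical vendor label
    price_per_unit: float      # hypothetical USD per billing unit
    billing_unit_seconds: int  # billing granularity, e.g. 3600 for per-hour billing

    def cost(self, runtime_seconds: float, instances: int = 1) -> float:
        """Cost of running `instances` machines, rounding partial billing units up."""
        units = math.ceil(runtime_seconds / self.billing_unit_seconds)
        return units * self.price_per_unit * instances
```

For a 61-minute run (3660 seconds), a per-hour model such as `PricingModel("vendor-a", 0.10, 3600)` charges for 2 hours (0.20), while a per-minute model such as `PricingModel("vendor-b", 0.002, 60)` charges for 61 minutes (about 0.122).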
Our empirical results indicate that our contributions can effectively execute user-provided applications across cloud compute services from multiple competing vendors. We demonstrate that we can provide users with tools to benchmark cloud compute, storage, and queue services without first needing to learn the particulars of each cloud service. Additionally, we can optimize the execution of user-provided applications based on cost, performance, or user-defined metrics.
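Choosing where to run based on cost, performance, or a user-defined metric amounts to ranking benchmark results with a pluggable scoring function. The Python sketch below uses hypothetical result records and cloud names; `RunResult`, `BUILTIN_METRICS`, `best_cloud`, and the cost-times-runtime example are assumptions for illustration, not the thesis's actual optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RunResult:
    cloud: str      # hypothetical cloud/service label
    seconds: float  # measured wall-clock runtime
    dollars: float  # metered cost of the run


Metric = Callable[[RunResult], float]  # lower scores are better

BUILTIN_METRICS: Dict[str, Metric] = {
    "cost": lambda r: r.dollars,
    "performance": lambda r: r.seconds,
}


def best_cloud(results: List[RunResult], metric: Metric) -> str:
    """Return the cloud whose run minimizes the given metric."""
    return min(results, key=metric).cloud


# Made-up benchmark results for illustration.
RESULTS = [
    RunResult("cloud-a", seconds=120, dollars=0.50),
    RunResult("cloud-b", seconds=60, dollars=1.20),
    RunResult("cloud-c", seconds=90, dollars=0.60),
]
```

With these made-up results, the `"cost"` metric picks `cloud-a`, the `"performance"` metric picks `cloud-b`, and a user-defined metric such as `lambda r: r.seconds * r.dollars` picks `cloud-c` (54 versus 60 and 72).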
Download the thesis.
/Chandra, Chris