Dynamic Virtual Clustering Publications

For a BibTeX file containing references for the following papers, see here.

Dynamic Virtual Clustering

Authors: Wesley Emeneker, Dan Stanzione
Published: IEEE Cluster 2007
Abstract: Multiple clusters co-existing in a single research campus has become commonplace at many university and government labs, but effectively leveraging those resources is difficult. Intelligently forwarding and spanning jobs across clusters can increase throughput, decrease turnaround time, and improve overall utilization. Dynamic Virtual Clustering (DVC) is a system of virtual machines, deployed in a single or multi-cluster environment, to increase cluster utilization by enabling job forwarding and spanning, flexibly allow software environment changes, and effectively sandbox users and processes from each other and the system. This paper presents both the initial implementation of DVC and performance results from synthetic workloads executed under DVC.

Thesis: Dynamic Virtual Clustering

Thesis Author: Wesley Emeneker
Abstract: The increasing reliance on parallel computing for scientific simulation and experimentation has made high performance cluster computing an integral part of the scientific process. The result of this explosion in demand is that multiple clusters often co-exist within the same research campus. As more and more computational power is demanded by researchers, effectively leveraging existing cluster resources becomes critical to supplying needed cycles. One way to improve cluster utilization is to forward and span jobs across multiple clusters. However, software heterogeneity is often a barrier to implementing these capabilities. Dynamic Virtual Clustering is presented as the solution to enable forwarding and spanning in arbitrary clusters by using virtual machines to abstract software environment differences. Although the use of virtual machines incurs a performance loss, the ability to span and forward jobs improves throughput and turnaround thereby increasing the performance seen by common cluster workloads.

Increasing Reliability through Dynamic Virtual Clustering

Authors: Wesley Emeneker, Dan Stanzione
Published: LACSI Symposium 2006, HAPCW Workshop
Abstract: In a scientific community that increasingly relies upon High Performance Computing (HPC) for large scale simulations and analysis, the reliability of hardware and applications devoted to HPC is extremely important. While hardware reliability is not likely to dramatically increase in the coming years, software must be able to provide the reliability required by demanding applications. One way to increase the reliability of HPC systems is to use checkpointing to save the state of an application. If the application fails for some reason (hardware or software errors), the application can be restarted from the most recent checkpoint. This paper presents Dynamic Virtual Clustering as a platform to enable completely transparent parallel checkpointing.

Dynamic Virtual Clustering with OSCAR

Authors: Geoffroy Vallee, Wesley Emeneker, Thomas Naughton, Stephen Scott, Dan Stanzione
Abstract: System level virtualization solutions, such as Xen, are maturing and may also provide solutions of value to the high performance computing (HPC) environments. However, the management of virtual machines within such an environment is still a complex task, especially to address the issues of virtual machine definition and deployment, and dynamically creating independent clusters of virtual machines. This paper presents Xen-OSCAR (an extension of OSCAR for the management of Xen virtual machines) as a platform for Dynamic Virtual Clustering (DVC) with the Moab cluster scheduler.

Dynamic Virtual Clustering with Xen and Moab

Authors: Wesley Emeneker, Dave Jackson, Josh Butikofer, Dan Stanzione
Published: ISPA 2006, XHPC Workshop
Award: Best Paper
Abstract: As larger and larger commodity clusters for high performance computing proliferate at research institutions around the world, challenges in maintaining effective use of these systems also continue to increase. Among the many challenges are maintaining the appropriate software stack for a broad array of applications, and sharing workload across clusters. The Dynamic Virtual Clustering (DVC) system integrates the Xen virtual machine with the Moab scheduler to allow for creation of virtual clusters on a per-job basis. These virtual clusters can provide a unique software environment for a particular application, or can provide a consistent software environment across multiple heterogeneous clusters. In this paper, the overhead of Xen-based DVC vs. native cluster performance is examined for workloads consisting of both serial and MPI-based parallel jobs.

HPC Cluster Readiness of Xen and User Mode Linux

Authors: Wesley Emeneker, Dan Stanzione
Abstract: This paper examines the suitability of different virtualization techniques in a high performance cluster environment. A survey of virtualization techniques is presented. Two representative technologies (Xen and User Mode Linux) are selected for an in depth analysis of cluster readiness in terms of their performance, reliability, and their overall impact on complexity of cluster administration.

Related Readings (To be greatly expanded)

The overarching goal of DVC is to enable job spanning and forwarding across clusters. This goal is motivated by research by Will Jones at Clemson University showing that spanning and forwarding can increase throughput.
See these papers:
Bandwidth-aware Co-allocating Meta-schedulers for Mini-grid Architectures
Characterization of Bandwidth-aware Meta-schedulers for Co-allocating Jobs Across Multiple Clusters

These papers provide the basis for this work: they describe situations where co-allocating and co-scheduling jobs among a pool of cluster resources can improve overall quality of service in terms of throughput and turnaround.
Because DVC uses virtual machines to accomplish this, see these papers for an overview of the virtualization technologies involved:
Xen and the Art of Virtualization
User Mode Linux

Many more papers exist on Xen, scheduling, and related topics, but these provide background on the pieces we are tying together to make DVC work.