Dynamic Virtual Clustering Publications
For the BibTeX file containing references for the following papers, see here.
Dynamic Virtual Clustering
Authors: Wesley Emeneker, Dan Stanzione
Published: IEEE Cluster 2007
Abstract:
The co-existence of multiple clusters on a single research campus has become commonplace at many university and government labs, but effectively leveraging those resources is difficult.
Intelligently forwarding and spanning jobs across clusters can increase throughput, decrease turnaround time, and improve overall utilization.
Dynamic Virtual Clustering (DVC) is a system of virtual machines, deployed in a single or multi-cluster environment, to increase cluster utilization by enabling job forwarding and spanning, flexibly allow software environment changes, and effectively sandbox users and processes from each other and the system.
This paper presents both the initial implementation of DVC and performance results from synthetic workloads executed under DVC.
Thesis: Dynamic Virtual Clustering
Thesis Author: Wesley Emeneker
Abstract:
The increasing reliance on parallel computing for scientific simulation and experimentation has made high performance cluster computing an integral part of the scientific process.
The result of this explosion in demand is that multiple clusters often co-exist within the same research campus.
As more and more computational power is demanded by researchers, effectively leveraging existing cluster resources becomes critical to supplying needed cycles.
One way to improve cluster utilization is to forward and span jobs across multiple clusters.
However, software heterogeneity is often a barrier to implementing these capabilities.
Dynamic Virtual Clustering is presented as the solution to enable forwarding and spanning in arbitrary clusters by using virtual machines to abstract software environment differences.
Although the use of virtual machines incurs a performance loss, the ability to span and forward jobs improves throughput and turnaround, thereby increasing the performance seen by common cluster workloads.
Increasing Reliability through Dynamic Virtual Clustering
Authors: Wesley Emeneker, Dan Stanzione
Published: LACSI Symposium 2006, HAPCW Workshop
Abstract:
In a scientific community that increasingly relies upon High Performance Computing (HPC) for large scale simulations and analysis, the reliability of hardware and applications devoted to HPC is extremely important.
While hardware reliability is not likely to dramatically increase in the coming years, software must be able to provide the reliability required by demanding applications.
One way to increase the reliability of HPC systems is to use checkpointing to save the state of an application.
If the application fails for some reason (hardware or software errors), the application can be restarted from the most recent checkpoint.
This paper presents Dynamic Virtual Clustering as a platform to enable completely transparent parallel checkpointing.
Dynamic Virtual Clustering with OSCAR
Authors: Geoffroy Vallee, Wesley Emeneker, Thomas Naughton, Stephen Scott, Dan Stanzione
Abstract:
System level virtualization solutions, such as Xen, are maturing and may also provide solutions of value to high performance computing (HPC) environments.
However, the management of virtual machines within such an environment is still a complex task, especially when addressing virtual machine definition and deployment and the dynamic creation of independent clusters of virtual machines.
This paper presents Xen-OSCAR (an extension of OSCAR for the management of Xen virtual machines) as a platform for Dynamic Virtual Clustering (DVC) with the Moab cluster scheduler.
Dynamic Virtual Clustering with Xen and Moab
Authors: Wesley Emeneker, Dave Jackson, Josh Butikofer, Dan Stanzione
Published: ISPA 2006, XHPC Workshop
Award: Best Paper
Abstract:
As larger and larger commodity clusters for high performance computing proliferate at research institutions around the world, challenges in maintaining effective use of these systems also continue to increase.
Among the many challenges are maintaining the appropriate software stack for a broad array of applications, and sharing workload across clusters.
The Dynamic Virtual Clustering (DVC) system integrates the Xen virtual machine with the Moab scheduler to allow for creation of virtual clusters on a per-job basis.
These virtual clusters can provide a unique software environment for a particular application, or can provide a consistent software environment across multiple heterogeneous clusters.
In this paper, the overhead of Xen-based DVC vs. native cluster performance is examined for workloads consisting of both serial and MPI-based parallel jobs.
HPC Cluster Readiness of Xen and User Mode Linux
Authors: Wesley Emeneker, Dan Stanzione
Abstract:
This paper examines the suitability of different virtualization techniques in a high performance cluster environment.
A survey of virtualization techniques is presented.
Two representative technologies (Xen and User Mode Linux) are selected for an in-depth analysis of cluster readiness in terms of their performance, reliability, and overall impact on the complexity of cluster administration.
Related Readings (To be greatly expanded)
The overarching goal of DVC is cluster spanning and forwarding.
This is motivated by research done by Will Jones at Clemson University showing that spanning and forwarding can increase throughput.
See these papers:
Bandwidth-aware Co-allocating Meta-schedulers for Mini-grid Architectures
Characterization of Bandwidth-aware Meta-schedulers for Co-allocating Jobs Across Multiple Clusters
These papers provide the basis for this work in that they describe situations where co-allocating and co-scheduling among a pool of cluster resources can improve the overall quality of service in terms of throughput and turnaround.
Since we are using virtual machines to accomplish this, see these papers for an overview:
Xen and the Art of Virtualization
User Mode Linux
There are many more papers on Xen, scheduling, and related topics, but these provide background on what we are attempting to tie together in order to make DVC work.