Dynamic Virtual Clustering (DVC)
Dynamic Virtual Clustering is a system that deploys virtual machines (VMs) in a multi-cluster environment to improve the performance of cluster job workloads.
The explosion of cluster computing for business and scientific applications has led to a common situation in which a single research group or campus has multiple independent clusters (a multi-cluster environment) within a small geographic area. Typically, each cluster is an autonomous unit with no interaction with the other clusters. Each cluster also represents a significant investment of money and time, so using cluster resources effectively is highly desirable. Decreasing job queue time, decreasing job turnaround time, and increasing system throughput are three interrelated ways to use clusters more cost-effectively. Combining all clusters into one larger pool of schedulable resources makes it possible to load balance each cluster's workload when necessary.
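The three metrics above can be made concrete with a small calculation over per-job timestamps. The sketch below is illustrative only: the `JobRecord` type and `workload_metrics` function are hypothetical names, and it assumes the common definitions queue time = start - submit, turnaround time = finish - submit, and throughput = completed jobs divided by the time from the first submission to the last completion.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical per-job timestamps, all in seconds
    submit: float
    start: float
    finish: float


def workload_metrics(jobs):
    """Return (average queue time, average turnaround time, throughput in jobs/second)."""
    n = len(jobs)
    # Queue time: how long a job waits between submission and starting to run
    avg_queue = sum(j.start - j.submit for j in jobs) / n
    # Turnaround time: total time from submission to completion
    avg_turnaround = sum(j.finish - j.submit for j in jobs) / n
    # Throughput (assumed definition): completed jobs per unit time over the workload's span
    span = max(j.finish for j in jobs) - min(j.submit for j in jobs)
    throughput = n / span
    return avg_queue, avg_turnaround, throughput


jobs = [JobRecord(0, 0, 10), JobRecord(0, 10, 20), JobRecord(5, 10, 15)]
print(workload_metrics(jobs))
```

In this toy workload the second and third jobs wait for the first to finish, so reducing that waiting (for example, by running them on another, idle cluster) lowers both averages and can shorten the overall span.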
Load balancing cluster workloads has been shown theoretically to decrease job queue time, decrease turnaround time, and increase job throughput. Load balancing workloads across multiple clusters is accomplished with two techniques: forwarding and spanning.
However, even though forwarding and spanning have been shown mathematically to improve workload performance, very little practical work has been done to enable them. DVC uses virtual machines to forward and span real cluster jobs, and is able to decrease average job queue time, decrease average job turnaround time, and increase system throughput for a range of cluster workloads.
For more information, see the detailed DVC information page.
Note: See work by Will Jones or Anca Bucur for theoretical results on cluster forwarding and spanning (co-allocation and co-scheduling).
Note: Forwarding is a special case of spanning where all spanned resources are located on one cluster.
Definitions
Forwarding - Lets us take a job submitted to one cluster and run it on a different cluster without any changes to the job or its requirements.
When done properly, forwarding handles data staging, software compatibility, user accounts, and so on.
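A forwarding decision needs at least one step: finding a single cluster that can run the whole job as submitted. The sketch below shows one possible placement policy and is not DVC's actual scheduler. The function name `choose_forward_target` and its inputs (a job's processor count and a map of idle processors per cluster) are hypothetical, and the "prefer home, otherwise pick the remote cluster with the most idle processors" rule is an assumption for illustration. It ignores the data staging, software, and account issues mentioned above.

```python
def choose_forward_target(job_procs, home, free_procs):
    """Pick one cluster that can run the whole job unchanged.

    free_procs maps cluster name -> idle processors (hypothetical input).
    Returns the home cluster if the job fits there; otherwise the remote
    cluster with the most idle processors that can hold the entire job;
    or None if no single cluster is large enough (forwarding impossible).
    """
    if free_procs.get(home, 0) >= job_procs:
        return home
    # Remote clusters that could run the job without splitting it
    candidates = [c for c, free in free_procs.items()
                  if c != home and free >= job_procs]
    if not candidates:
        return None
    # Assumed policy: prefer the least-loaded fitting cluster
    return max(candidates, key=lambda c: free_procs[c])


free = {"cluster1": 8, "cluster2": 40, "cluster3": 32}
print(choose_forward_target(32, "cluster1", free))  # cluster2
```

Here the job cannot start on its home cluster (only 8 idle processors), so it is forwarded unchanged to `cluster2`. A job needing more processors than any one cluster has idle returns `None`, which is the case spanning addresses.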
Spanning - Lets us take a job and split the resources it requests among two or more clusters.
For example, suppose a job requires 32 processors.
Cluster 1 has 16 available processors and Cluster 2 has 16 available processors.
By using 16 processors from each of Clusters 1 and 2, we obtain the 32 processors requested.
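The 32-processor example can be sketched as a simple allocation routine. This is a hypothetical illustration, not DVC's algorithm: `span_job` is an invented name, and the greedy "take from the cluster with the most idle processors first" rule is an assumption. It only computes how many processors to draw from each cluster and says nothing about how the spanned job communicates across clusters.

```python
def span_job(job_procs, free_procs):
    """Split a processor request across clusters.

    free_procs maps cluster name -> idle processors (hypothetical input).
    Greedily draws from the clusters with the most idle processors first.
    Returns {cluster: processors} summing to job_procs, or None if the
    combined idle processors cannot satisfy the request.
    """
    if sum(free_procs.values()) < job_procs:
        return None
    allocation = {}
    remaining = job_procs
    # sorted() is stable, so clusters with equal idle counts keep their input order
    for cluster, free in sorted(free_procs.items(), key=lambda kv: kv[1], reverse=True):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[cluster] = take
            remaining -= take
    return allocation


print(span_job(32, {"cluster1": 16, "cluster2": 16}))
# {'cluster1': 16, 'cluster2': 16}
```

Because the largest cluster is tried first, a request that fits on one cluster (for example, 16 processors when `cluster1` has 16 idle) yields a single-entry allocation, which matches the earlier note that forwarding is the special case of spanning in which all resources come from one cluster.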