CLUTCH: A Clustering-Driven Runtime Estimation Scheme for Scientific Simulations

Young Kyoon Suh, Seounghyeon Kim, Jeeyoung Kim

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Efficient scheduling of concurrent simulation jobs is critical when allocating limited computing and I/O resources. The difficulty of predicting when a job will complete can cause nontrivial problems for system administrators and users, e.g., squandered resources, long waiting times, and simulation plan delays. To alleviate these problems, we propose a novel simulation runtime estimation scheme termed CLUTCH, which employs a well-orchestrated ensemble of clustering, classification, and regression techniques. The proposed scheme trains a runtime estimation model through a series of steps: (i) grouping past simulation provenance records by clustering, (ii) labeling each of the grouped records by classification, and (iii) performing regression on the execution times in each group. Given a simulation and its external arguments, the trained model predicts the simulation's runtime with high accuracy in a black-box fashion, using only basic external arguments without needing extra information. We additionally propose two optimization algorithms that significantly reduce training overhead without sacrificing estimation quality. In experiments with real datasets, our model achieved approximately a 14.2% improvement in estimation accuracy compared to the most recent state-of-the-art method; with our optimizations applied, the model was trained 16 times faster while still retaining accuracy.
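
The three training steps in the abstract (cluster, classify, regress) can be illustrated with a toy sketch. This is not the authors' implementation: the keywords mention K-means and random forest, but the abstract does not say which features are clustered or how each model is configured. The sketch below assumes that records are clustered on runtime, substitutes a nearest-centroid rule for the classifier, and fits a one-variable least-squares line per cluster on the last argument. The record layout `(args, runtime)`, the toy data, and all function names are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means (k >= 2); initial centroids are evenly spaced sorted values."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    denom = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom if denom else 0.0
    return slope, my - slope * mx

def train(records, k=2):
    """records: list of (args_tuple, runtime). Returns {cluster: (arg_centroid, line)}."""
    args = [a for a, _ in records]
    runtimes = [t for _, t in records]
    labels = kmeans_1d(runtimes, k)              # step (i): cluster (assumed: on runtime)
    model = {}
    for c in set(labels):
        members = [a for a, lab in zip(args, labels) if lab == c]
        times = [t for t, lab in zip(runtimes, labels) if lab == c]
        # Step (ii): stand-in classifier, one argument-space centroid per cluster.
        centroid = tuple(mean(col) for col in zip(*members))
        # Step (iii): per-cluster regression of runtime on the last argument.
        model[c] = (centroid, fit_line([a[-1] for a in members], times))
    return model

def predict(model, args):
    """Pick the cluster from the arguments alone, then apply that cluster's line."""
    def sq_dist(c):
        return sum((x - y) ** 2 for x, y in zip(args, model[c][0]))
    slope, intercept = model[min(model, key=sq_dist)][1]
    return intercept + slope * args[-1]

# Hypothetical provenance records: args = (solver_flag, problem_size).
records = [((0, n), 2 * n + 1) for n in range(1, 6)] + \
          [((1, n), 10 * n + 100) for n in range(1, 6)]
model = train(records, k=2)
print(predict(model, (0, 6)), predict(model, (1, 4)))
```

Prediction uses only the external arguments, which mirrors the black-box setting described in the abstract. In a fuller version, the nearest-centroid rule and the single-feature line would be replaced by a trained classifier and a multi-feature regressor.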

Original language: English
Article number: 9281033
Pages (from-to): 220710-220722
Number of pages: 13
Journal: IEEE Access
Volume: 8
State: Published - 2020

Keywords

  • classification
  • clustering
  • ensemble machine learning
  • K-means
  • pre-processing
  • random forest
  • regression
  • simulation provenance
  • Simulation runtime estimation
