AMORA: An Advanced Malleable and Operational Framework for Performance
Prediction of Big Data Systems
Abstract
In the data era, big data systems have emerged as pivotal tools,
underscoring the importance of performance prediction in enhancing the
efficiency of big data clusters. Numerous performance models have been
proposed, often grounded in artificial intelligence or simulation
methodologies. While the bulk of research focuses on refining prediction
precision and minimizing overhead, limited attention has been given to
the consolidation and standardization of these models. To bridge this gap
between model developers and end-users, this paper introduces AMORA, a
novel, versatile framework tailored for predicting the performance of big
data systems. Leveraging the identified Behavior
Descriptions-Computation Submodels (BD-CS) pattern, which is prevalent
among big data job performance models, AMORA supports plugins that
accommodate the implementations of different performance models.
This framework also integrates a novel mutable
computation graph technique to facilitate backtracking computation.
Furthermore, AMORA offers end-to-end usability: it accepts the original
configuration files of diverse big data systems and presents
easily interpretable prediction
reports. This work demonstrates AMORA’s efficacy in producing an
accurate trace of a Hadoop job through the selection of appropriate
performance model plugins and parameter adjustments, and showcases the
application of the proposed mutable computation graph technique in
calculating the starting moment of an early-start reducer. Additionally,
two validation experiments are conducted, implementing various Hadoop
and Spark performance models, respectively, to demonstrate AMORA’s role
as a benchmark platform for implementing various types of big data job
performance models tailored to diverse big data systems.