Modelling and Verifying an Adaptive Failure-aware Scheduler for Hadoop
Mbarka Soualhia,
Foutse Khomh and
Sofiene Tahar
Contact:
soualhia@encs.concordia.ca
Hadoop has become the framework of choice on many off-the-shelf clusters for processing large data sets in the cloud.
However, because of the complexity and dynamic nature of the cloud, failures
are common, and they often degrade the performance of Hadoop.
Although Hadoop has built-in failure detection and recovery mechanisms,
several scheduled jobs still fail because of unforeseen events.
An effective Hadoop scheduler therefore needs to respond proactively to changes in
the cloud in order to reduce the failure rate of tasks.
Traditionally, simulation has been the most commonly used automated technique for checking the
behavior of a Hadoop scheduler. However, it provides neither a clear understanding nor
exhaustive coverage of that behavior, especially when failures occur.
In this research project, we present a novel approach for modelling and verifying
an adaptive failure-aware Hadoop scheduler. First, we propose to use machine
learning techniques and Markov Decision Processes to adapt the scheduling decisions of Hadoop to events occurring
in the cloud.
Second, we propose to formally verify critical properties of the proposed scheduler using model checking techniques.
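To make the idea of adapting scheduling decisions with a Markov Decision Process more concrete, the following is a minimal sketch, not the scheduler described in the publications below. It assumes a toy model with a single pending task, two hypothetical nodes, and per-node failure probabilities that stand in for the output of a machine learning predictor. The state names, action names, rewards, and discount factor are all illustrative assumptions. Value iteration is used here as one standard way to solve a small MDP.

```python
# Toy MDP for failure-aware task placement.
# Every state, action, probability, and reward below is an illustrative
# assumption; none of them is taken from the project itself.

GAMMA = 0.9  # assumed discount factor


def build_transitions(p_fail_a, p_fail_b):
    """Return transitions[state][action] = [(probability, next_state, reward)].

    p_fail_a and p_fail_b are hypothetical failure probabilities for two
    nodes, standing in for the predictions of a learned failure model.
    """
    return {
        "pending": {
            "run_on_node_a": [(1 - p_fail_a, "done", 10.0),
                              (p_fail_a, "pending", -5.0)],
            "run_on_node_b": [(1 - p_fail_b, "done", 10.0),
                              (p_fail_b, "pending", -5.0)],
            "wait": [(1.0, "pending", -1.0)],
        },
        "done": {},  # terminal state: no actions available
    }


def action_value(outcomes, values, gamma):
    """Expected discounted return of one action given current state values."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=GAMMA, tol=1e-9, max_iter=10000):
    """Solve the MDP and return (state values, greedy policy)."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # skip terminal states
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for state, actions in transitions.items()
        if actions
    }
    return values, policy


if __name__ == "__main__":
    # Node A is predicted to be unreliable, node B reliable.
    values, policy = value_iteration(build_transitions(0.6, 0.1))
    print(policy["pending"], round(values["pending"], 3))
```

In this sketch the chosen action changes as the predicted failure probabilities change, which is the sense in which the policy adapts to events in the cluster. A model of this size could also be handed to a probabilistic model checker to verify properties such as eventual task completion, although the encoding for that is not shown here.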
- M. Soualhia, F. Khomh, and S. Tahar: ATLAS: An Adaptive Failure-Aware Scheduler for Hadoop, In: IEEE International Performance Computing and Communications Conference (IPCCC'15), Nanjing, China, December 2015, pp. 1-8.
- M. Soualhia, F. Khomh, and S. Tahar: Predicting Scheduling Failures in the Cloud: A Case Study with Google Clusters and Hadoop on Amazon EMR, In: IEEE High Performance Computing and Communications (HPCC'15), New York, USA, August 2015, pp. 58-65.
- M. Soualhia, F. Khomh, and S. Tahar: ATLAS: An Adaptive Failure-Aware Scheduler for Hadoop, Technical Report, Department of Electrical and Computer Engineering, Concordia University, November 2015. [24 Pages].
- M. Soualhia, F. Khomh, and S. Tahar: Predicting Scheduling Failures in the Cloud, Technical Report, Department of Electrical and Computer Engineering, Concordia University, July 2015. [26 Pages].