Research


Modelling and Verifying an Adaptive Failure-aware Scheduler for Hadoop

 
Mbarka Soualhia, Foutse Khomh and Sofiene Tahar

Contact: soualhia@encs.concordia.ca

Hadoop has become the framework of choice on many off-the-shelf clusters for processing large data in the cloud. However, because of the complexity and dynamic nature of the cloud, failures are common, and they often degrade the performance of Hadoop. Although Hadoop has built-in failure detection and recovery mechanisms, several scheduled jobs still fail because of unforeseen events. An effective Hadoop scheduler must respond proactively to changes in the cloud in order to reduce the failure rate of tasks. Traditionally, simulation is the most commonly used automated technique for checking the behavior of a Hadoop scheduler. However, it cannot provide a clear understanding or exhaustive coverage, especially when failures occur. In this research project, we present a novel approach for modelling and verifying an adaptive failure-aware Hadoop scheduler. First, we propose to use machine learning techniques and Markov Decision Processes to adapt the scheduling decisions of Hadoop to events occurring in the cloud. Second, we propose to formally verify some critical properties of our proposed scheduler using model checking techniques.
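
To make the idea of adapting scheduling decisions with a Markov Decision Process more concrete, the following is a minimal illustrative sketch, not the scheduler described in the publications below. It assumes a toy model in which a task waits on a node predicted to be either "healthy" or "risky", and the scheduler can either launch the task there or reassign it to another node. All state names, action names, probabilities, rewards and the discount factor are hypothetical values chosen only for illustration. The policy is computed with standard value iteration.

```python
# Toy MDP for a failure-aware scheduling decision.
# Every state, action, probability and reward here is a hypothetical assumption.
GAMMA = 0.9  # discount factor (assumed)

# TRANSITIONS[state][action] = list of (probability, next_state, reward).
# "done" and "failed" are terminal states with value 0.
TRANSITIONS = {
    "healthy": {
        "launch": [(0.95, "done", 10.0), (0.05, "failed", -10.0)],
        "reassign": [(0.9, "healthy", -1.0), (0.1, "risky", -1.0)],
    },
    "risky": {
        "launch": [(0.4, "done", 10.0), (0.6, "failed", -10.0)],
        "reassign": [(0.9, "healthy", -1.0), (0.1, "risky", -1.0)],
    },
}


def q_value(state, action, values):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + GAMMA * values.get(next_state, 0.0))
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(tol=1e-9):
    """Return (state values, greedy policy) for the toy MDP above."""
    values = {state: 0.0 for state in TRANSITIONS}
    while True:
        delta = 0.0
        for state in TRANSITIONS:
            best = max(q_value(state, a, values) for a in TRANSITIONS[state])
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(TRANSITIONS[state], key=lambda a: q_value(state, a, values))
        for state in TRANSITIONS
    }
    return values, policy


values, policy = value_iteration()
print(policy)
```

Under these assumed numbers, the resulting policy launches the task on a healthy node and reassigns it when the node is predicted to be risky. In a real setting, the transition probabilities would have to come from a learned failure-prediction model rather than fixed constants.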

Publications

 
  1. M. Soualhia, F. Khomh, and S. Tahar: ATLAS: An Adaptive Failure-Aware Scheduler for Hadoop, In: IEEE International Performance Computing and Communications Conference (IPCCC’15), Nanjing, China, December 2015, pp. 1-8.

  2. M. Soualhia, F. Khomh, and S. Tahar: Predicting Scheduling Failures in the Cloud: A Case Study with Google Clusters and Hadoop on Amazon EMR, In: IEEE High Performance Computing and Communications (HPCC'15), New York, USA, August 2015, pp. 58-65.

  3. M. Soualhia, F. Khomh, and S. Tahar: ATLAS: An Adaptive Failure-Aware Scheduler for Hadoop, Technical Report, Department of Electrical and Computer Engineering, Concordia University, November 2015. [24 Pages].

  4. M. Soualhia, F. Khomh, and S. Tahar: Predicting Scheduling Failures in the Cloud, Technical Report, Department of Electrical and Computer Engineering, Concordia University, July 2015. [26 Pages].

 
 

Concordia University