Research

Formal Verification of Hadoop Schedulers

 
Mbarka Soualhia, Foutse Khomh and Sofiene Tahar

Contact: soualhia@ece.concordia.ca

The Hadoop scheduler is a centerpiece of Hadoop, the leading processing framework for data-intensive applications in the cloud. Given the impact of failures on the performance of applications running on Hadoop, testing and verifying the performance of the Hadoop scheduler is critical. Existing approaches such as performance simulation and analytical modeling are inadequate because they are not able to ascertain a complete verification of an Hadoop scheduler. This is due to the wide range of constraints and aspects involved in Hadoop. In this paper, we propose a novel methodology using model checking techniques to perform a formal verification of Hadoop schedulers, focusing on the following properties : schedulability, fairness and resources-deadlock freeness. We use the CSP language to formally describe a Hadoop scheduler, and the PAT model checker to verify its properties. Thereafter, we apply our proposed approach on the scheduler of OpenCloud, a Hadoop-based cluster, in order to illustrate the usability and benefits of our work. Results show that our proposed methodology can help identify several tasks failures (up to 78%) early on, i.e., before the tasks are executed on the cluster.





Downlaod Code

 
 
 

Concordia University