Research

On the Formal Analysis of HMM using Theorem Proving

 
Liya Liu, Vincent Aravantinos, Osman Hasan and Sofiene Tahar

Contact: liy_liu@ece.concordia.ca , vincent@ece.concordia.ca

Abstract

Hidden Markov Models (HMMs) have been widely utilized for modeling time series data in various engineering and biological systems. The analyses of these models are usually conducted using computer simulations and paper-and-pencil proof methods and, more recently, using probabilistic model-checking. However, all these methods either do not guarantee accurate analysis or are not scalable. As an alternative, we propose to use higher-order logic theorem proving to reason about properties of discrete HMMs by applying automated verification techniques. This paper presents some foundational formalizations in this regard, namely an extended-real numbers based formalization of finite-state Discrete-Time Markov chains and HMMs along with the verification of some of their fundamental properties. The distinguishing feature of our work is that it facilitates automatic verification of systems involving HMMs. For illustration purposes, we utilize our results for the formal analysis of a DNA sequence.

HOL Code

 

    Foundational Formalization

    Application

    Formal Analysis of DNA Sequence

    • dnaApp.sml (This file contains the formal verification of a DNA sequence)

    5' Splice Site Recognition Model

      A DNA fragment is a sequence of proteins called A, T, G and C. In our application, we consider a DNA sequence with hidden states E(exon), 5(5'splice site), I(intron). The model of this DNA sequence is described in the following figure.
    some text

    Automating the HOL Computations

    Our formalization of HMMs also allows to reduce user intervention in formal modelling and analysis of real-world systems that can be expressed in terms of HMMs. To facilitate this process further, we introduce some automatic simplifiers to make the proposed method a very practical solution for the formal analysis of HMMs.
    • autohmm.sml (This file contains the script for finding best path automatically)
 
 

Concordia University