On the Formal Analysis of HMM using Theorem Proving
Contact: liy_liu@ece.concordia.ca , vincent@ece.concordia.ca
Abstract
Hidden Markov Models (HMMs) have been widely utilized for modeling time series data in various engineering and biological systems. The analyses of these models are usually conducted using computer simulations and paper-and-pencil proof methods and, more recently, using probabilistic model-checking. However, all these methods either do not guarantee accurate analysis or are not scalable. As an alternative, we propose to use higher-order logic theorem proving to reason about properties of discrete HMMs by applying automated verification techniques. This paper presents some foundational formalizations in this regard, namely an extended-real numbers based formalization of finite-state Discrete-Time Markov chains and HMMs along with the verification of some of their fundamental properties. The distinguishing feature of our work is that it facilitates automatic verification of systems involving HMMs. For illustration purposes, we utilize our results for the formal analysis of a DNA sequence.HOL Code
- dtmcBasic.sml (This file contains formalization of DTMC)
- hmmScript.sml (This file contains the formalization of hidden Markov model)
- setUsefulScript.sml (This file contains the underlying subtheorems)
- cond_probScript.sml (This file contains the useful condition probability theorem)
- dnaApp.sml (This file contains the formal verification of a DNA sequence)
- autohmm.sml (This file contains the script for finding best path automatically)
Foundational Formalization
Application
Formal Analysis of DNA Sequence
5' Splice Site Recognition Model
-
A DNA fragment is a sequence of proteins called A, T, G and C.
In our application, we consider a DNA sequence with hidden states E(exon), 5(5'splice site), I(intron). The model of this DNA sequence is described in the following figure.
Automating the HOL Computations
Our formalization of HMMs also allows to reduce user intervention in formal modelling and analysis of real-world systems that can be expressed in terms of HMMs. To facilitate this process further, we introduce some automatic simplifiers to make the proposed method a very practical solution for the formal analysis of HMMs.