ISSN: 2090-4924
Mohamed H Ibrahim and Ahmed M Khedr
Gene sequence classification is a well-known problem that affects several sub-disciplines of bioinformatics, including functional genomics and gene expression data analysis. In the gene classification task, gene families are frequently modeled with large Generalized Hidden Markov Models (GHMMs), which are a bottleneck for any decoding method and reduce its efficiency. Efficient decoding of such GHMMs therefore remains a key challenge. In this paper, we introduce a new pruning-based strategy for improving GHMM decoding. We focus on the Viterbi decoding algorithm, but the strategy is applicable to GHMM decoding in general. Unlike standard decoding methods, our approach first performs a paradigm shift from screening towards recognition, integrating all considered models into a combined state space. The decoding process is then limited to the activated states within a beam around the optimal solution, which significantly reduces the computational effort and thus greatly speeds up model decoding. Our experiment on eukaryotic genes demonstrates the effectiveness of our approach for speeding up the gene classification task.
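The idea of decoding over a combined state space and keeping only the states inside a beam can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: it assumes a plain HMM rather than a GHMM (no state-duration modelling), a beam defined as a fixed log-score margin below the best state at each position, and a toy two-family model. The names `beam_viterbi`, `prune`, and `beam_width` are hypothetical.

```python
import math

NEG_INF = float("-inf")

def prune(scores, beam_width):
    """Keep only states whose log score is within beam_width of the best one."""
    best = max(scores.values())
    return {s: v for s, v in scores.items() if v >= best - beam_width}

def beam_viterbi(obs, states, log_start, log_trans, log_emit, beam_width):
    """Log-space Viterbi decoding restricted to the active (unpruned) states.

    Assumption: states are (family, state_name) tuples, so that all family
    models live in one combined state space and the family of the best path
    gives the classification.
    """
    active = {}
    for s in states:
        score = log_start.get(s, NEG_INF) + log_emit[s].get(obs[0], NEG_INF)
        if score > NEG_INF:
            active[s] = score
    if not active:
        raise ValueError("no state can emit the first symbol")
    active = prune(active, beam_width)

    backptrs = []
    for symbol in obs[1:]:
        nxt, ptr = {}, {}
        # Only surviving states are expanded; pruned states cost nothing.
        for prev, prev_score in active.items():
            for s, lt in log_trans.get(prev, {}).items():
                score = prev_score + lt + log_emit[s].get(symbol, NEG_INF)
                if score > nxt.get(s, NEG_INF):
                    nxt[s] = score
                    ptr[s] = prev
        if not nxt:
            raise ValueError("beam emptied: no active state can continue")
        active = prune(nxt, beam_width)
        backptrs.append(ptr)

    # Trace back from the best surviving final state.
    last = max(active, key=active.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, active[last]

# Toy combined state space: two hypothetical single-state family models.
F1, F2 = ("fam1", "s"), ("fam2", "s")
states = [F1, F2]
log_start = {F1: math.log(0.5), F2: math.log(0.5)}
log_trans = {F1: {F1: 0.0}, F2: {F2: 0.0}}  # no transitions across families
log_emit = {
    F1: {"A": math.log(0.9), "G": math.log(0.1)},
    F2: {"A": math.log(0.1), "G": math.log(0.9)},
}

path, score = beam_viterbi("AAGA", states, log_start, log_trans, log_emit,
                           beam_width=1.0)
predicted_family = path[0][0]
```

With an infinite `beam_width` the sketch reduces to exact Viterbi decoding. A narrower beam discards low-scoring states early, which trades a risk of pruning the truly optimal path for fewer state expansions per position.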