Abstract:
As a pattern-based biclustering model, order-preserving submatrices (OPSMs) are widely used in the analysis of gene expression data matrices. An OPSM cluster is of great biological significance because its genes exhibit coherent expression patterns under certain conditions, which implies that the genes share associated control information. A Deep OPSM is a special kind of OPSM cluster that has few rows and comparatively many columns. This paper proposes a fast and exact algorithm for mining the OPSM clusters dispersed in a gene expression data matrix. The algorithm first finds the common subsequences of every pair of rows in the matrix, then counts the support of each common subsequence using an STL map, and finally outputs the OPSM clusters that meet the support threshold; Deep OPSMs are obtained by setting different thresholds. Experiments show that the algorithm rapidly finds the qualifying Deep OPSMs, and p-value analysis indicates that they are biologically significant.
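
The three steps named in the abstract (common subsequences of row pairs, support counting in an STL map, threshold filtering) can be illustrated with a minimal C++ sketch. It is not the paper's implementation: it assumes each row is represented by its column indices sorted by ascending expression value, and it keeps only one longest common subsequence per row pair, whereas an exact miner would have to consider all common subsequences. The names `columnOrder`, `longestCommonSubsequence`, `mineOpsm`, `minRows`, and `minCols` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Pattern = std::vector<int>;  // column indices in ascending-expression order

// Assumption: a row is summarized by its columns sorted by ascending value.
Pattern columnOrder(const std::vector<double>& row) {
    Pattern order(row.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&row](int a, int b) { return row[a] < row[b]; });
    return order;
}

// One longest common subsequence of two column orders (classic dynamic program).
Pattern longestCommonSubsequence(const Pattern& a, const Pattern& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            len[i][j] = (a[i - 1] == b[j - 1])
                            ? len[i - 1][j - 1] + 1
                            : std::max(len[i - 1][j], len[i][j - 1]);
    // Walk back through the table to recover one optimal subsequence.
    Pattern out;
    std::size_t i = n, j = m;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) { out.push_back(a[i - 1]); --i; --j; }
        else if (len[i - 1][j] >= len[i][j - 1]) --i;
        else --j;
    }
    std::reverse(out.begin(), out.end());
    return out;
}

// Maps each column pattern to the rows supporting it, then applies thresholds.
// Simplification: only one LCS per row pair is recorded.
std::map<Pattern, std::set<int>> mineOpsm(const Matrix& data,
                                          std::size_t minRows,
                                          std::size_t minCols) {
    std::vector<Pattern> orders;
    for (const auto& row : data) orders.push_back(columnOrder(row));

    std::map<Pattern, std::set<int>> support;
    for (std::size_t i = 0; i < orders.size(); ++i) {
        for (std::size_t j = i + 1; j < orders.size(); ++j) {
            Pattern p = longestCommonSubsequence(orders[i], orders[j]);
            if (p.size() < minCols) continue;  // too few columns
            support[p].insert(static_cast<int>(i));
            support[p].insert(static_cast<int>(j));
        }
    }
    // Drop patterns whose row support is below the threshold.
    for (auto it = support.begin(); it != support.end();) {
        if (it->second.size() < minRows) it = support.erase(it);
        else ++it;
    }
    return support;
}
```

In this sketch, a small `minRows` combined with a large `minCols` corresponds to the Deep OPSM setting of few rows and comparatively many columns. The pairwise loop alone performs a quadratic number of row comparisons, so it is meant only to make the data flow concrete, not to reflect the speed reported in the paper.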