A New OPSM Biclustering Algorithm Based on All Common Subsequences

  • Abstract: The order-preserving submatrix (OPSM) model is a pattern-based biclustering method widely used to analyze gene expression data matrices. In an OPSM cluster, a set of genes exhibits a coherent expression pattern under a particular subset of conditions. This pattern implies regulatory associations among the genes, so biclustering analysis of gene data matrices is biologically meaningful. A Deep OPSM is a special OPSM cluster with few rows and comparatively many columns. Based on the OPSM model, this paper proposes a fast, exact algorithm for mining the OPSM clusters dispersed in a gene data matrix. The algorithm first finds the common subsequences of every pair of rows in the matrix, then counts the support of each common subsequence using an STL map, and finally outputs the OPSM clusters that meet the support threshold; Deep OPSMs can be obtained by setting the threshold appropriately. Experimental results show that the algorithm quickly finds the qualifying Deep OPSMs, and p-value analysis confirms that the Deep OPSMs found are biologically significant.
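The three steps named in the abstract (pairwise common subsequences, support counting in a map, threshold filtering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each row is first converted to the order of its columns sorted by ascending value, that ties are broken by column index, and that a Python `dict` stands in for the STL map. The function names and the parameters `min_rows` and `min_cols` are hypothetical. Enumerating all common subsequences is exponential in the worst case, so the sketch suits small matrices only.

```python
from itertools import combinations


def row_to_column_order(row):
    """Return the column indices of a row sorted by ascending value."""
    return tuple(sorted(range(len(row)), key=lambda c: row[c]))


def common_subsequences(a, b, min_len):
    """Yield every common subsequence of two column orders with length >= min_len.

    a and b are permutations of the same columns, so a subsequence of a is
    also a subsequence of b exactly when its positions in b are increasing.
    """
    pos_in_b = {col: i for i, col in enumerate(b)}

    def extend(start, prefix):
        if len(prefix) >= min_len:
            yield tuple(prefix)
        for i in range(start, len(a)):
            col = a[i]
            if not prefix or pos_in_b[col] > pos_in_b[prefix[-1]]:
                prefix.append(col)
                yield from extend(i + 1, prefix)
                prefix.pop()

    yield from extend(0, [])


def mine_opsm(matrix, min_rows, min_cols):
    """Map each column pattern to its supporting rows (hypothetical helper).

    A pattern supported by k >= 2 rows is produced by every pair of those
    rows, so the union of the pairs recovers the full supporting row set.
    """
    orders = [row_to_column_order(r) for r in matrix]
    support = {}  # plays the role of the STL map: pattern -> set of rows
    for i, j in combinations(range(len(orders)), 2):
        for pattern in common_subsequences(orders[i], orders[j], min_cols):
            support.setdefault(pattern, set()).update((i, j))
    return {p: sorted(rows) for p, rows in support.items() if len(rows) >= min_rows}


if __name__ == "__main__":
    matrix = [
        [1, 4, 2, 3],
        [2, 9, 5, 7],
        [5, 1, 6, 8],
    ]
    # All three rows increase along columns 0, 2, 3.
    print(mine_opsm(matrix, min_rows=3, min_cols=3))  # {(0, 2, 3): [0, 1, 2]}
    # Few rows, many columns: a Deep-OPSM-style threshold setting.
    print(mine_opsm(matrix, min_rows=2, min_cols=4))  # {(0, 2, 3, 1): [0, 1]}
```

In this sketch, lowering `min_rows` while raising `min_cols` selects clusters with few rows and many columns, which corresponds to the Deep OPSM setting described above.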

     
