Abstract:
The Exon/Intron Database (EID) built from GenBank contains a large amount of redundant data. To resolve this problem, a non-redundant EID is constructed from RefSeq. RefSeq is a sequence database maintained and updated by NCBI staff for medical, functional, and diversity studies, providing a consistent reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. This EID is a good choice for large-scale computational investigation of exon/intron structure and splicing. It has many internal filters that control for sequence quality, consistency of gene descriptions, conformance to standards, and possible errors. The new version also includes data on the untranslated regions (UTRs) of gene sequences. Here, some issues in the construction of the non-redundant EID are addressed.