Document details

Protein sequence pattern mining with constraints

Author(s): Ferreira, Pedro Gabriel cv logo 1 ; Azevedo, Paulo J. cv logo 2

Date: 2005

Persistent ID: http://hdl.handle.net/1822/6293

Origin: RepositóriUM - Universidade do Minho

Subject(s): Bioinformatics; Databases


Description
Considering the characteristics of biological sequence databases, which typically have a small alphabet, a very long length and a relative small size (several hundreds of sequences), we propose a new sequence mining algorithm (gIL). gIL was developed for linear sequence pattern mining and results from the combination of some of the most efficient techniques used in sequence and itemset mining. The algorithm exhibits a high adaptability, yielding a smooth and direct introduction of various types of features into the mining process, namely the extraction of rigid and arbitrary gap patterns. Both breadth or a depth first traversal are possible. The experimental evaluation, in synthetic and real life protein databases, has shown that our algorithm has superior performance to state-of-the art algorithms. The use of constraints has also proved to be a very useful tool to specify user interesting patterns
Document Type Article
Language English
delicious logo  facebook logo  linkedin logo  twitter logo 
degois logo
mendeley logo

Related documents



    Financiadores do RCAAP

Fundação para a Ciência e a Tecnologia Universidade do Minho   Governo Português Ministério da Educação e Ciência Programa Operacional da Sociedade do Conhecimento EU