Author(s): Célia Gonçalves; Rui Camacho; Eugénio Oliveira
Date: 2011
Persistent ID: http://hdl.handle.net/10216/67120
Origin: Repositório Aberto da Universidade do Porto
Description
Whenever new DNA or protein sequences are decoded, it is almost compulsory to look at similar
sequences and at the papers describing them, both to collect relevant information about
the function and activity of the new sequences and to learn what is already known about similar sequences.
In current web sites and databases of sequences there is usually a set of curated paper references linked
to each sequence. Those links are a good starting point when looking for information related to a set of
sequences. One way to implement such an approach is to run BLAST on the newly decoded sequences and collect
similar sequences. Then one looks at the papers linked to the similar sequences. Most often the number
of retrieved papers is small, and one has to search large databases for relevant papers. This paper proposes
a process for generating a classifier based on the initial set of relevant papers. First, the authors collect
similar sequences using an alignment algorithm like BLAST. Then, the authors use the enlarged set of papers
to construct a classifier. Finally, the classifier is used to automatically enlarge the set of relevant papers by
searching the MEDLINE using the automatically constructed classifier.
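
The last two steps of the process described above (building a classifier from the papers already linked to similar sequences, then using it to filter further candidate papers) can be illustrated with a minimal sketch. This record does not say which learning algorithm the authors used, so the multinomial naive Bayes model here is an assumption chosen for brevity. The class name `NaiveBayesRelevance`, the function `tokenize`, and all the short strings standing in for paper abstracts are hypothetical placeholders, not real MEDLINE records. The BLAST search and the MEDLINE retrieval are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRelevance:
    """Two-class multinomial naive Bayes with add-one smoothing.

    Assumption: the source does not name its classifier; this model is
    only a stand-in to show the shape of the workflow.
    """

    def fit(self, texts, labels):
        # Count words per class (True = relevant, False = not relevant).
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        # Log prior plus smoothed log likelihood of each known token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def is_relevant(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Toy placeholder texts (hypothetical, for illustration only).
seed_relevant = [
    "kinase protein phosphorylation signalling pathway",
    "protein kinase domain structure and activity",
]
background = [
    "clinical trial of blood pressure medication",
    "survey of hospital patient satisfaction",
]
candidates = [
    "novel kinase protein activity in yeast",
    "patient survey on hospital meals",
]

classifier = NaiveBayesRelevance().fit(
    seed_relevant + background,
    [True] * len(seed_relevant) + [False] * len(background),
)

# Keep only the candidates the classifier labels as relevant.
selected = [text for text in candidates if classifier.is_relevant(text)]
print(selected)
```

In this sketch the papers linked to the BLAST hits play the role of `seed_relevant`, and an arbitrary sample of unrelated papers plays the role of `background`. How the negative examples are actually chosen is not stated in this record.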