Author(s): Kolchinsky, A.; Lourenço, Anália; Li, L.; Rocha, Luís M.
Date: 2013
Persistent ID: http://hdl.handle.net/1822/22130
Origin: RepositóriUM - Universidade do Minho
Description
Background. Drug-drug interaction (DDI) is a major cause of morbidity and mortality. DDI research
includes the study of different aspects of drug interactions, from in vitro pharmacology, which
deals with drug interaction mechanisms, to pharmaco-epidemiology, which investigates the effects of
DDI on drug efficacy and adverse drug reactions. Biomedical literature mining can aid both
approaches by extracting relevant DDI signals from either the published literature or large clinical
databases. However, although drug interaction is an ideal area for translational research, the inclusion
of literature mining methodologies in DDI workflows is still preliminary. One area that can benefit
from literature mining is the automatic identification of a large number of potential DDIs, whose
pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology
and in populo pharmaco-epidemiology.
Experiments. We implemented a set of classifiers for identifying published articles relevant to
experimental pharmacokinetic DDI evidence. These documents help identify the causal
mechanisms behind putative drug-drug interactions, a key step in the extraction of large
numbers of potential DDIs. We evaluated the performance of several linear classifiers on PubMed
abstracts under different feature transformation and dimensionality reduction methods. In addition,
we investigated the performance benefits of including various publicly available named entity
recognition (NER) features, as well as a set of internally developed pharmacokinetic dictionaries.
Results. Several classifiers performed well in distinguishing relevant from irrelevant
abstracts. Combining unigram and bigram textual features gave better
performance than unigram features alone, and normalization transforms that adjusted for
feature frequency and document length improved classification. For some classifiers, such as linear
discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance.
Finally, the inclusion of NER features and dictionaries did not improve classification.
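
To make the feature choices in the Results concrete, the following is a minimal sketch of combined unigram and bigram features with a transform that adjusts for feature frequency (a smoothed inverse document frequency weight) and for document length (L2 normalization). It is an illustration under stated assumptions, not the authors' pipeline: the function names `ngram_tokens` and `tfidf_vectors`, the lowercase whitespace tokenization, and the particular smoothed IDF formula are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def ngram_tokens(text, max_n=2):
    """Return all n-grams up to max_n words (unigrams first, then bigrams).

    Assumption: lowercase whitespace tokenization; a real system would
    likely use a proper tokenizer.
    """
    words = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


def tfidf_vectors(docs, max_n=2):
    """Map each document to an L2-normalized TF-IDF dict over its n-grams."""
    counts = [Counter(ngram_tokens(d, max_n)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each feature occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    vectors = []
    for c in counts:
        # Feature-frequency adjustment: a smoothed IDF down-weights
        # features that appear in many documents (assumed formula).
        vec = {
            f: tf * (math.log((1 + n_docs) / (1 + df[f])) + 1.0)
            for f, tf in c.items()
        }
        # Document-length adjustment: scale each vector to unit L2 norm.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {f: w / norm for f, w in vec.items()}
        vectors.append(vec)
    return vectors
```

Each resulting dictionary can serve as a sparse input row for a linear classifier. Setting `max_n=1` yields the unigram-only baseline that the combined features are compared against.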