TY - UNPB
T1 - Motif discovery in ranked lists of sequences
AU - Nielsen, Morten Muhlig
AU - Tataru, Paula
AU - Madsen, Tobias
AU - Hobolth, Asger
AU - Pedersen, Jakob Skou
PY - 2016
Y1 - 2016
N2 - Motif analysis has long been an important method to characterize biological functionality, and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored to specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs with sequence rank. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact probabilities for motif observations in sequences. Motif enrichment can be calculated using random walk, Brownian bridge, or modified rank-based statistics. These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery, exemplified by microRNA seed enrichment, but also including enrichment problems involving complex motifs and combinations of motifs. We demonstrate a number of usage scenarios that take advantage of the regular expression feature, including enrichments for combinations of different microRNA seed sites. The method is implemented and made publicly available as an R package and supports parallel execution on multi-core machines.
M3 - Working paper
SP - 1
EP - 4
BT - Motif discovery in ranked lists of sequences
ER -