Presenter
Abstract:
Next-generation sequencing technologies differ from one another in many respects. Choosing between short-read and long-read sequencing technologies means trading off accuracy against read length. In this study, we developed Hercules, the first algorithm to use machine learning to correct errors in long reads. Researchers generally correct the errors in long reads using short reads. Current correction methods, which are based on graph structures and alignment, ignore the error profile of the sequencing technology. Machine learning techniques that account for the error profile while remaining memory- and time-efficient have the potential to correct errors better and to combine the two technologies more effectively. Our algorithm models each long read as a profile Hidden Markov Model tailored to the error profile of the technology used. It learns and updates the transition and emission probabilities for each long read, which allows the errors in that read to be corrected. Using two DNA sequencing datasets (CH17-157L1 and CH17-227A2) and one RNA sequencing dataset (human brain cerebellum polyA), we found that reads corrected by Hercules had the highest mapping rate and the largest size of long reads compared with reads corrected by other algorithms. We also showed that Hercules achieves the highest accuracy when the region is well covered by short reads.
Date: May 23rd 2019 – 17:00
Language: Turkish
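
To make the idea in the abstract concrete, here is a minimal Python sketch of the emission side of such a model. It is not the Hercules implementation: it keeps one match state per long-read base, skips insertion and deletion states and transition probabilities entirely, and replaces the probabilistic learning step with simple count-based updates from short reads that are assumed to be already aligned without gaps. The function names, the `sub_rate` value, and the toy reads are all hypothetical.

```python
BASES = "ACGT"


def init_match_emissions(long_read, sub_rate=0.1):
    """One match state per long-read base.

    The observed base gets probability 1 - sub_rate and the remaining
    mass is split evenly, standing in for a platform error profile.
    """
    return [
        {b: (1.0 - sub_rate) if b == base else sub_rate / 3.0 for b in BASES}
        for base in long_read
    ]


def update_emissions(emissions, short_reads, weight=1.0):
    """Shift emission mass toward bases supported by short reads.

    short_reads is a list of (offset, sequence) pairs, assumed to be
    aligned to the long read without gaps (a simplification).
    """
    updated = [dict(e) for e in emissions]
    for offset, read in short_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(updated):
                updated[pos][base] += weight
    # Renormalize each state so its emissions sum to 1 again.
    for e in updated:
        total = sum(e.values())
        for b in e:
            e[b] /= total
    return updated


def decode(emissions):
    """Emit the most probable base of every match state."""
    return "".join(max(e, key=e.get) for e in emissions)


# Toy example: the long read has one substitution error at index 4.
long_read = "ACGTTCGA"
short_reads = [(2, "GTAC"), (3, "TACG"), (4, "ACGA")]

emissions = update_emissions(init_match_emissions(long_read), short_reads)
corrected = decode(emissions)
print(corrected)
```

In this toy case three short reads cover the erroneous position and outweigh the initial belief in the observed base, so the decoded read differs from the input at that position. Positions with no short-read coverage keep the original base, which illustrates why correction quality depends on how well the long read is covered.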