Discovering Coding lncRNAs Using Deep Learning Training Dynamics


Afshan Nabi

Afshan is a machine learning engineer at OccamzRazor. She completed her MS in Computer Science from Sabanci University and her BS in Molecular Biology & Genetics from Bilkent University. She is interested in applying machine learning to solve problems in computational biology.


Long non-coding RNAs (lncRNAs) are the largest class of non-coding RNAs (ncRNAs). However, recent experimental evidence has shown that some lncRNAs contain small open reading frames (sORFs) that are translated into functional micropeptides. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (ribo-seq) experiments, which are expensive and cell-type dependent. We present a framework that leverages deep learning models’ training dynamics to determine whether a given lncRNA transcript is misannotated. Our deep sequential learning models achieve AUC scores >91% and AUPR >93% in classifying non-coding vs. coding sequences while allowing us to identify possible misannotated lncRNAs present in the dataset. Our results overlap significantly with a set of experimentally validated misannotated lncRNAs as well as with coding sORFs within lncRNAs found by a ribo-seq dataset. The methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors.

Date: October 14th, 2021 – 6:00 PM (GMT+3)

Language: English

You can register for the webinar here !

Leave a Reply