Discovering Coding lncRNAs Using Deep Learning Training Dynamics


Afshan Nabi

Afshan is a machine learning engineer at OccamzRazor. She completed her MS in Computer Science from Sabanci University and her BS in Molecular Biology & Genetics from Bilkent University. She is interested in applying machine learning to solve problems in computational biology.


Long non-coding RNAs (lncRNAs) are the largest class of non-coding RNAs (ncRNAs). However, recent experimental evidence has shown that some lncRNAs contain small open reading frames (sORFs) that are translated into functional micropeptides. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (ribo-seq) experiments, which are expensive and cell-type dependent. We present a framework that leverages deep learning models’ training dynamics to determine whether a given lncRNA transcript is misannotated. Our deep sequential learning models achieve AUC scores >91% and AUPR >93% in classifying non-coding vs. coding sequences while allowing us to identify possible misannotated lncRNAs present in the dataset. Our results overlap significantly with a set of experimentally validated misannotated lncRNAs as well as with coding sORFs within lncRNAs found by a ribo-seq dataset. The methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors.

Date: October 14th, 2021 – 6:00 PM (GMT+3)

Language: English

You can register for the webinar here!

How Far Are We from the Rapid Prediction of Drug Resistance Arising Due to Kinase Mutations?


Mehmet Ergüven

Mehmet Ergüven finished his Bachelor’s studies in the field of protein biochemistry (Department of Biochemistry, Ege University, Izmir) in 2016. He then carried out his Master’s studies at the Izmir Biomedicine and Genome Center in the field of cell biology and computational structural biology. After finishing his Master’s in 2019, he stayed at the same institute as a research assistant for one year. He is currently doing his PhD studies in the field of chemoenzymatic synthesis (Cells in Motion Graduate School, University of Münster, Institute for Biochemistry, Münster).


Many protein kinases act in proliferative pathways. Consequently, point mutations occurring within the kinase’s ATP-binding site can lead to a constitutively active or drug-resistant enzyme, and ultimately, to cancer. Because of technical and economic limitations, rapid experimental exploration of the impact of such mutations remains a challenge. This underscores the importance of protein−ligand binding affinity prediction tools that are poised to measure the efficacy of inhibitors in the presence of kinase mutations. To this end, we compare the performances of six web-based scoring tools (DSX-ONLINE, KDEEP, HADDOCK2.2, PDBePISA, Pose&Rank, and PRODIGY-LIG) in assessing the impact of kinase mutations on their interactions with their inhibitors. This assessment is carried out on a new structure-based “BINDKIN” benchmark we compiled. BINDKIN contains wild-type and mutant crystal structure pairs of kinase−inhibitor complexes, together with their corresponding experimental binding affinities (in the form of IC50, Kd, and Ki). The performance of the web servers on BINDKIN shows that they cannot predict the binding affinities (ΔGs) of wild-type and mutant cases directly. Still, a few of the web servers could capture whether a mutation improves or worsens ligand binding (ΔΔGs), with Ki being the most predictable descriptor and DSX-ONLINE being the most accurate predictor. When homology models are used instead of Ki-associated crystal structures, DSX-ONLINE loses its predictive capacity. The results highlight that there is room to improve the available scoring functions to estimate the impact of protein kinase point mutations on inhibitor binding.

Date: July 16th, 2021 – 6:00 PM (GMT+3)

Language: English

You can register for the webinar here!

Minimizer-space de Bruijn Graphs


Barış Ekim

Baris Ekim is a PhD student in Electrical Engineering and Computer Science (EECS) at Massachusetts Institute of Technology, under the supervision of Bonnie Berger. He graduated with a double major in Computer Science and Molecular Biology and Mathematics from MIT in 2020, and has a general interest in developing novel algorithms for applications in bioinformatics and computational genomics. More specifically, his research focuses on designing efficient and accurate algorithms and developing software for next-generation sequencing (NGS) data.


DNA sequencing data continues to progress towards longer reads with increasingly lower sequencing error rates. We focus on the problem of assembling such reads into genomes, which poses challenges in terms of accuracy and computational resources when using cutting-edge assembly approaches, e.g. those based on overlapping reads using minimizer sketches. Here, we introduce the concept of minimizer-space sequencing data analysis, where the minimizers rather than DNA nucleotides are the atomic tokens of the alphabet. By projecting DNA sequences into ordered lists of minimizers, our key idea is to enumerate what we call k-min-mers, which are k-mers over a larger alphabet consisting of minimizer tokens. Our approach, mdBG or minimizer-dBG, achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without much loss of accuracy. We demonstrate three use cases of mdBG: human genome assembly, metagenome assembly, and the representation of large pangenomes. For assembly, we implemented mdBG in software we call rust-mdbg, resulting in ultra-fast, low-memory and highly contiguous assembly of PacBio HiFi reads. A human genome is assembled in under 10 minutes using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 minutes using 1 GB RAM. For pangenome graphs, we newly allow a graphical representation of a collection of 661,405 bacterial genomes as an mdBG and successfully search it (in minimizer-space) for anti-microbial resistance (AMR) genes. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics and pangenomics.

Date: June 25th, 2021 – 8:00 PM (GMT+3)

Language: English

To register for the webinar, you can visit this link:

Predictive Cell-Specific Gene Regulatory Models


Asst. Prof. Hatice Ülkü Osmanbeyoğlu

Hatice Ülkü Osmanbeyoğlu is an Assistant Professor of the Biomedical Informatics Department and UPMC Hillman Cancer Center at University of Pittsburgh Medical School. Her research focuses on developing data-driven computational approaches to understand disease mechanisms in order to assist in the development of personalized anticancer treatments. Previously, she was a postdoctoral research associate at Memorial Sloan Kettering Cancer Center (MSKCC). She obtained her Ph.D. in Biomedical Informatics from University of Pittsburgh and holds an MS degree in Electrical and Computer Engineering from Carnegie Mellon University and an MS in Bioengineering from University of Pittsburgh. She completed her BS in Computer Engineering at Northeastern University (Summa Cum Laude). She is a recipient of the NIH NCI Pathway to Independence Award, Memorial Sloan Kettering Postdoctoral Research Award and the Innovation in Cancer Informatics Award.


The identity and functions of specialized cell types are dependent on the complex interplay between signaling and transcriptional networks. Recently single-cell technologies such as CITE-seq have been developed that enable simultaneous quantitative analysis of cell-surface receptor expression with transcriptional states. To date, these datasets have not been used to systematically develop cell-context-specific maps of the interface between signaling and transcriptional regulators orchestrating cellular identity and function. We present SPaRTAN (Single-cell Proteomic and RNA based Transcription factor Activity Network), a computational method to link cell-surface receptors to transcription factors (TFs) by exploiting cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) datasets with cis-regulatory information. SPaRTAN is applied to immune cell types in the blood to predict the coupling of signaling receptors with cell context-specific TFs. The predictions are validated by prior knowledge and flow cytometry analyses. SPaRTAN is then used to predict the signaling coupled TF states of tumor infiltrating CD8+ T cells in malignant peritoneal and pleural mesotheliomas. SPaRTAN greatly enhances the utility of CITE-seq datasets to uncover TF and cell-surface receptor relationships in diverse cellular states.

Date: June 11th, 2021 – 7:00 PM (GMT+3)

Language: English

To register for the webinar, you can visit this link:

A genuine type of plotting with ggplot2: Part-1

We continue with the tutorial posts on data visualization. The first post was about the volcano plot and how to interpret it (in Turkish). Today, I will share a modified version of an example (the original code is here) that I used recently while writing a literature review for my research proposal.

mtcars is a very well-known dataset used for example visualizations and analyses in R; however, I prefer to use a specific hypothetical sample dataset to generate the graph given below. Hopefully it helps you as well. Enjoy the cool plots!

# First of all, don't forget to set your working directory (with setwd()) to the folder where the files you want to work with are located on your computer. Otherwise, you might get an error message saying that the file cannot be found.
# Install ggplot2 first (only needed once)
install.packages("ggplot2")
# load the library
library(ggplot2)

# Let's say your file is a tab-separated txt file, so we will set the separator (sep) to \t

mdata=read.table("methodtimeline.txt", header=TRUE, sep="\t")

# Tip: If your row names contain more than one word and you forget to set the separator to \t, you will get an error. The default of sep is "", which means any white space (spaces as well as tabs), so a space between the words of the same row is treated as a column break and your code is likely to not run properly. So make sure that you use the right separator for your file and its format.

# Let's see what this small dataset looks like

mdata

# This is what it looks like when you call it:
     Methods  PubDate log10ofCellCount  Type DetectionLimit
1  et han al 1/1/2010              0.7 siRNA            200
2  met et al 1/2/2012              2.4 siRNA            300
3 isop et al 1/3/2016              1.8 siRNA             20
4    T et al 2/1/2014              1.9  tRNA             25
5  ABC et al 2/2/2017              2.3 siRNA           3500
6  XYZ et al 2/3/2011              1.5  tRNA             45
7    X et al 3/1/2019              3.8 siRNA              1
8    Y et al 3/2/2021              1.2 siRNA            100
9    Z et al 3/2/2021              2.1 piRNA             40

# or

# I read the data from a file on my computer. However, since it is a small dataset, you can create it on your own computer and run the code as well. Just make sure that you read it with the file name you actually used.
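
If you do not have methodtimeline.txt at hand, the small table above can simply be typed in. The following is a minimal sketch and not the original file: the temporary file and the name example_path are made up for illustration. It writes the table as tab-separated text and reads it back with sep = "\t", the same way as above.

```r
# Hypothetical re-creation of the small dataset printed above
mdata_typed <- data.frame(
  Methods = c("et han al", "met et al", "isop et al", "T et al", "ABC et al",
              "XYZ et al", "X et al", "Y et al", "Z et al"),
  PubDate = c("1/1/2010", "1/2/2012", "1/3/2016", "2/1/2014", "2/2/2017",
              "2/3/2011", "3/1/2019", "3/2/2021", "3/2/2021"),
  log10ofCellCount = c(0.7, 2.4, 1.8, 1.9, 2.3, 1.5, 3.8, 1.2, 2.1),
  Type = c("siRNA", "siRNA", "siRNA", "tRNA", "siRNA",
           "tRNA", "siRNA", "siRNA", "piRNA"),
  DetectionLimit = c(200, 300, 20, 25, 3500, 45, 1, 100, 40),
  stringsAsFactors = FALSE
)

# Write it as a tab-separated file (temporary location, placeholder name)
example_path <- tempfile(fileext = ".txt")
write.table(mdata_typed, file = example_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back exactly as in the tutorial; sep = "\t" keeps
# multi-word names such as "et han al" in a single column
mdata <- read.table(example_path, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
str(mdata)
```

From here on, mdata can be used in the same way as the one read from my own file.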
# Let's give the row names

rownames(mdata) <- mdata$Methods

# Then let's look at the data once more

# Did you notice anything different? YES, the row names!
            PubDate log10ofCellCount  Type DetectionLimit
et han al  1/1/2010              0.7 siRNA            200
met et al  1/2/2012              2.4 siRNA            300
isop et al 1/3/2016              1.8 siRNA             20
T et al    2/1/2014              1.9  tRNA             25
ABC et al  2/2/2017              2.3 siRNA           3500
XYZ et al  2/3/2011              1.5  tRNA             45
X et al    3/1/2019              3.8 siRNA              1
Y et al    3/2/2021              1.2 siRNA            100
Z et al    3/2/2021              2.1 piRNA             40

# Let's learn a little more about our hypothetical data and what we aim to show with the plot. KNOWING WHAT YOU are DEALING with is one of the most IMPORTANT PARTs of the ANALYSIS. So is the purpose of the analysis...

# So, in this specific example, we want to show how the detection limit and cell throughput change for a specific set of target RNA types over the publication dates of the methods given by the row names (something similar to Svensson et al. (Nature Protocols, 2018): Exponential scaling of single-cell RNA-seq in the past decade).
# Each article is named by its first author's surname in the row names (Methods), the publication date is given in the PubDate column, cell throughput is given on a log10 scale (log10ofCellCount), the RNA target is given in the Type column, and the number of RNAs of the given type detected is given in the DetectionLimit column.
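
One detail worth checking before plotting: PubDate is read as plain text, so ggplot2 treats it as a category and orders the values alphabetically rather than chronologically. A minimal sketch of turning it into a real date is given below. It assumes the dates are written as day/month/year, which the table itself does not tell us, so change the format string if your file uses month/day/year. The names toy and PubDateParsed are made up for this example.

```r
# Three rows borrowed from the table above, just for illustration
toy <- data.frame(
  Methods = c("Y et al", "et han al", "XYZ et al"),
  PubDate = c("3/2/2021", "1/1/2010", "2/3/2011"),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year; use "%m/%d/%Y" for month/day/year
toy$PubDateParsed <- as.Date(toy$PubDate, format = "%d/%m/%Y")

# Sort chronologically to check that the parsing makes sense
toy <- toy[order(toy$PubDateParsed), ]
str(toy)
```

If you map the parsed column to x instead of PubDate, ggplot2 will draw a continuous date axis.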
# Let's start with a basic graph

mdataplot <- ggplot(mdata, aes(x=PubDate, y=log10ofCellCount, col=Type, size=DetectionLimit)) +
  geom_point(color = 'red') + #you can change the color of the dots
  theme_bw(base_size = 10) #you can change your theme such as theme_classic

# Let's see what it looks like
mdataplot
# Let's make it even cooler by adding annotations using ggplot2::geom_text

mdataplot + geom_text(aes(label = rownames(mdata)),
              size = 2.5, show.legend = FALSE)
# You can show the legends as well

mdataplot + geom_text(aes(label = rownames(mdata)),
                      size = 2.5, show.legend = TRUE) # by changing the preference to show.legend = TRUE
# Use ggrepel::geom_label_repel to add some fancy boxes for the labels (install ggrepel first if you don't have it)
library(ggrepel)

mdataplot + geom_label_repel(aes(col=Type, label = rownames(mdata),
                         fill = factor(Type)), color = 'white',
                         size = 3) +
  theme(legend.position = "bottom") #You can adjust the legend position as well
# I personally don't like the red dots, so I will use black dots and change the theme to classic (so there will be no gridlines)

mdataplot <- ggplot(mdata, aes(x=PubDate, y=log10ofCellCount, col=Type, size=DetectionLimit)) +
  geom_point(color = 'black') + #you can change the color of the dots
  theme_classic(base_size = 10) #the classic theme has no gridlines

# Let's see what it looks like
mdataplot
# Let's finalize it by adding titles and relevant axis labels
# Let's use ggrepel::geom_label_repel and change color by groups
mdataplot + geom_label_repel(aes(col=Type, label = rownames(mdata),
                         fill = factor(Type)), color = 'white',
                     size = 3) +
  theme(legend.position = "top") + #let's put the legend on the top of the plot
  # add/change the titles
  ggtitle("Cell and RNA Throughput by Method") +
  xlab("Publication Date") + ylab("log10(Number of Cells)") + labs(fill = "RNA Type") 

# The size of the points depicts the detection limit whereas the location in the y-axis shows the number of cells sequenced in parallel. x-axis shows the publication date.
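
Putting the pieces together, a complete script might look like the sketch below. It is an assembled version of the steps above rather than the exact code behind my figure: the three-row data frame demo and the name final_plot are placeholders, the size legend title is my own addition, and it assumes that ggplot2 and ggrepel are already installed.

```r
library(ggplot2)
library(ggrepel)

# Small placeholder dataset with the same columns as mdata
demo <- data.frame(
  Methods = c("et han al", "T et al", "Z et al"),
  PubDate = c("1/1/2010", "2/1/2014", "3/2/2021"),
  log10ofCellCount = c(0.7, 1.9, 2.1),
  Type = c("siRNA", "tRNA", "piRNA"),
  DetectionLimit = c(200, 25, 40),
  stringsAsFactors = FALSE
)

final_plot <- ggplot(demo, aes(x = PubDate, y = log10ofCellCount,
                               size = DetectionLimit)) +
  geom_point(color = "black") +              # point size follows DetectionLimit
  theme_classic(base_size = 10) +            # no gridlines
  geom_label_repel(aes(label = Methods, fill = Type),
                   color = "white", size = 3) +  # boxes filled by RNA type
  theme(legend.position = "top") +
  ggtitle("Cell and RNA Throughput by Method") +
  xlab("Publication Date") + ylab("log10(Number of Cells)") +
  labs(fill = "RNA Type", size = "Detection Limit")

# Draw the plot
print(final_plot)
```

Swap demo for your own mdata once the file is read in, and the rest of the code stays the same.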

For more information, you can visit these websites:


For more of these useful ggplot2 plots:


Wanna learn more about the very basics of ggplot2 first, but don’t know where to start? Fret not! Go and check our github page for the ggplot workshop (previously given by Melike Dönertaş).

Palaeogenomic Investigation of the Past Human Populations in Northeast Asia


Dr. Gülşah Merve Kılınç

Gülşah Merve Kılınç graduated from Hacettepe University, Department of Biology and received her PhD in Molecular Biology and Genetics from Bilkent University. She worked as a postdoctoral researcher at CompEvo Lab in the Department of Biology at Middle East Technical University. She continued her postdoctoral studies at Stockholm University at the Department of Archaeology and Classical Studies. Her research focuses on analysing ancient human genome sequences from different time periods extending from the Palaeolithic to the present day to better understand human population genetic history. She uses ancient DNA from different parts of the world to infer the migrations and genetic structure of past human populations. Her research has been published in prestigious journals such as Science Advances, Current Biology and PLOS Biology.


Over the last two decades, ancient DNA has revealed a fascinating snapshot of the genetic variation and population dynamics of past human populations. In this talk, I will present human population dynamics in ancient northeast Asia, inferred from ancient human genomes spanning a time period between approximately 17,000 and 550 years ago. We produced genome sequences of 40 ancient individuals from different parts of northeast Asia covering Yakutia and Lake Baikal and revealed previously unknown gene flow and admixture events during a time period between the Late Upper Palaeolithic and the Iron Age. Our genetic data further provided the first direct genetic evidence for the ancestors of Palaeo-Inuits who spread eastwards from Siberia and launched the second wave of migration into the Americas. We also discovered the presence of Yersinia pestis, the bacterium that causes plague, in ancient northeast Asia. This talk will cover all these recent findings and show the dynamic population structure in ancient northeast Asia.

Date: May 18th, 2021 – 3:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

Share your announcement with us!

RSG Turkey aims to bring your available positions or collaborations to a wider audience!

You can fill in the form below for the announcement of available positions in your lab or a collaboration.

RSG Turkey team will share it on the web page, social media accounts, slack channel and during the symposium event.

We wish you good luck!


Species-Specific Allosteric Drug Design on Glycolytic Enzymes and Analysis by Molecular Dynamics Simulations


Reyhan Metin
Metehan Çelebi


The glycolytic pathway is the most basic energy production mechanism common to all living things. A drug designed to target a bacterial glycolytic protein does not interact with the protein in humans, even if the same protein and the same mechanism are found in both organisms. Since catalytic regions are vital to the function of the protein, these regions are more conserved during evolution, while allosteric regions are more open to structural or sequence changes. The Constraint Molecular Dynamics simulation method has been used to examine the effects of molecules that provide allosteric inhibition of the glycolytic enzymes. Unlike other MD simulations, this method does not require the addition of a parameterized ligand to the system. Instead, residues known to interact with the ligand are restrained in the specified region, limiting the movement of the residues as if there were a ligand there. In this study, we clarify how glycolytic enzymes are allosterically inhibited using various analysis methods.

Date: May 7th, 2021 – 5:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

The Role of Cholinergic Signals in Cancer Biology


Assoc. Prof. Özlen Konu

Dr. Ozlen Konu graduated from Middle East Technical University, Turkey, in 1987 with a B.S. degree in Biological Sciences. Dr. Konu pursued her graduate studies in the Biology Department at Texas Tech University, USA, and received her M.S. and Ph.D. degrees in 1992 and 1999, respectively. She was a postdoctoral research fellow at the University of Tennessee at Memphis, USA, during 2000-2002. Since September 2002, she has been a faculty member at the Department of Molecular Biology and Genetics at Bilkent University. Dr. Konu’s research interests include gene expression data analysis and meta-analysis with respect to cholinergic signaling as it applies to addiction and cancer, and comparative expression profiling using the zebrafish model.


Cholinergic signals, mediated endogenously by acetylcholine and exogenously by nicotine, act on nicotinic acetylcholine receptors (nAChRs) and may modulate cellular activity, proliferation and death. Although neuronal cholinergic signaling is well studied, epithelial cholinergic signals and their role in cancer biology remain relatively unexplored. Understanding how the synthesis and metabolism of acetylcholinesterase, the enzyme that degrades acetylcholine, as well as the presence/absence of nAChRs affect cancer cell signaling can provide novel leads in cancer research and therapy. Herein Dr. Konu will present results obtained through in silico, in vitro and in vivo approaches on the role of cholinergic signals in cancer progression. Her group recently established a significant proliferative and prognostic role for CHRNA5, the alpha 5 subunit of the pentameric nAChRs, in breast cancer. Moreover, they have developed zebrafish xenograft models to test the effects of the microenvironment and novel drugs against liver cancer cells.

Date: April 14th, 2021 – 5:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

RSG TURKEY Webinar + Tutorial

Translational control of cancer and stromal cells – Dr. Ola Larsson

Estrogen receptor alpha (ERα) activity is associated with increased cancer cell proliferation. Studies aiming to understand the impact of ERα on cancer-associated phenotypes have largely been limited to its transcriptional activity. Herein, we demonstrate that ERα coordinates its transcriptional output with selective modulation of mRNA translation. Importantly, translational perturbations caused by depletion of ERα largely manifest as “translational offsetting” of the transcriptome, whereby amounts of translated mRNAs and corresponding protein levels are maintained constant despite changes in mRNA abundance. Transcripts whose levels, but not polysome association, are reduced following ERα depletion lack features which limit translation efficiency including structured 5’UTRs and miRNA target sites. In contrast, mRNAs induced upon ERα depletion whose polysome association remains unaltered are enriched in codons requiring U34-modified tRNAs for efficient decoding. Consistently, ERα regulates levels of U34-modifying enzymes and thereby controls levels of U34-modified tRNAs. These findings unravel a hitherto unprecedented mechanism of ERα-dependent orchestration of transcriptional and translational programs that may be a pervasive mechanism of proteome maintenance in hormone-dependent cancers.
Date: 09/04/2021 – 17.00 GMT+3 

Ola Larsson is an associate professor at the Department of Oncology-Pathology in Karolinska Institutet and SciLifeLab, Sweden. He obtained his Ph.D. in Functional Genomics from Karolinska Institutet and he has completed his postdoctoral studies on translational control of cancer at the University of Minnesota and mechanisms of translational control at McGill University between 2005-2010. By combining biomedicine, statistics, and informatics, Ola Larsson is mapping the regulation of protein production in cancer cells. His aim is to gain a fundamental understanding of why cancer cell growth is uncontrolled. It is hoped that it will be possible in the future to use drugs to restore order in cancer cells.

A tutorial: Transcriptome wide analysis of translational efficiency – İnci Şevval Aksoylu (PhD Student) 

Date: 09/04/2021 – 18.00 GMT+3 

Inci Sevval Aksoylu is a Ph.D. Student at the Department of Oncology-Pathology in Karolinska Institutet and SciLifeLab, Sweden. She obtained her B.Sc. degree in Molecular Biology and Genetics from Bilkent University in 2019. Currently, she works on the control of mRNA translation under the supervision of Dr. Ola Larsson and Dr. Charlotte Rolny.


RSG-Turkey is a member of The International Society for Computational Biology (ISCB) Student Council (SC) Regional Student Groups (RSG). We are a non-profit community composed of early career researchers interested in computational biology and bioinformatics.


Follow us on social media!