Long non-coding RNAs (lncRNAs) are the largest class of non-coding RNAs (ncRNAs). However, recent experimental evidence has shown that some lncRNAs contain small open reading frames (sORFs) that are translated into functional micropeptides. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (ribo-seq) experiments, which are expensive and cell-type dependent. We present a framework that leverages deep learning models’ training dynamics to determine whether a given lncRNA transcript is misannotated. Our deep sequential learning models achieve AUC scores >91% and AUPR >93% in classifying non-coding vs. coding sequences while allowing us to identify possible misannotated lncRNAs present in the dataset. Our results overlap significantly with a set of experimentally validated misannotated lncRNAs as well as with coding sORFs within lncRNAs found by a ribo-seq dataset. The methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors.
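The abstract does not say which training-dynamics statistics the framework uses. As a hedged illustration, the sketch below assumes a dataset-cartography-style summary. For each transcript, it records the probability the classifier assigns to the transcript's annotated label after every epoch. Transcripts annotated as non-coding whose mean probability ("confidence") stays low are flagged as candidate misannotations. The function names, transcript IDs and the 0.3 threshold are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

def training_dynamics(probs_per_epoch: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Summarize per-transcript training dynamics (hypothetical helper).

    probs_per_epoch maps a transcript ID to the probability the model assigned
    to that transcript's *annotated* label at the end of each training epoch.
    Returns (confidence, variability) = (mean, population std) across epochs.
    """
    return {
        tx: (statistics.fmean(p), statistics.pstdev(p))
        for tx, p in probs_per_epoch.items()
    }

def flag_candidates(stats: Dict[str, Tuple[float, float]],
                    confidence_threshold: float = 0.3) -> List[str]:
    """Flag transcripts whose annotated label the model rarely agrees with.

    For a transcript annotated as non-coding, persistently low confidence means
    the model keeps pushing it towards the coding class during training.
    The threshold is an illustrative assumption, not a published cutoff.
    """
    return sorted(tx for tx, (conf, _) in stats.items() if conf < confidence_threshold)

# Usage with made-up per-epoch probabilities for three non-coding transcripts.
stats = training_dynamics({
    "lnc_A": [0.90, 0.95, 0.97],  # model agrees with the non-coding label
    "lnc_B": [0.40, 0.20, 0.10],  # model drifts towards "coding"
    "lnc_C": [0.60, 0.50, 0.70],
})
print(flag_candidates(stats))  # ['lnc_B']
```

The variability column is kept because a transcript whose confidence swings widely between epochs may be ambiguous rather than clearly misannotated. How to weigh it is left open here.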
Many protein kinases act in proliferative pathways. Consequently, point mutations within a kinase’s ATP-binding site can lead to a constitutively active or drug-resistant enzyme and, ultimately, to cancer. Because of technical and economic limitations, rapidly exploring the impact of such mutations experimentally remains a challenge. This underscores the importance of protein−ligand binding affinity prediction tools that can estimate the efficacy of inhibitors in the presence of kinase mutations. To this end, we compare the performance of six web-based scoring tools (DSX-ONLINE, KDEEP, HADDOCK2.2, PDBePISA, Pose&Rank, and PRODIGY-LIG) in assessing how kinase mutations affect kinase−inhibitor interactions.
This assessment is carried out on BINDKIN, a new structure-based benchmark we compiled. BINDKIN contains wild-type and mutant crystal structure pairs of kinase−inhibitor complexes, together with their corresponding experimental binding affinities (in the form of IC50, Kd, and Ki). The performance of the web servers on BINDKIN shows that they cannot directly predict the binding affinities (ΔGs) of wild-type and mutant complexes. Still, a few of the web servers could capture whether a mutation improves or worsens ligand binding (ΔΔGs), with Ki being the most predictable descriptor and DSX-ONLINE the most accurate predictor. When homology models are used instead of Ki-associated crystal structures, DSX-ONLINE loses its predictive capacity. These results highlight that there is room to improve the available scoring functions for estimating the impact of protein kinase point mutations on inhibitor binding.
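The ΔG/ΔΔG comparison above can be illustrated with standard thermodynamics. The sketch assumes ΔG ≈ RT ln K, with K a Kd or Ki in molar units (1 M standard state) and T = 298.15 K. Under that assumption, ΔΔG = ΔG(mutant) − ΔG(wild type) = RT ln(K_mut / K_wt), and a positive value means the mutation weakens binding. IC50 values are not thermodynamic constants, so they would need a conversion such as Cheng–Prusoff first; this sketch does not handle them. The sign-agreement score is one simple way to ask "does the predictor get the direction right". It is not necessarily the metric used in the BINDKIN assessment, and the function names are hypothetical.

```python
import math
from typing import List

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_g(k_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a Kd or Ki given in mol/L.

    Uses dG = RT ln(K) with a 1 M standard state; tighter binding
    (smaller K) gives a more negative dG.
    """
    return R_KCAL * temperature * math.log(k_molar)

def delta_delta_g(k_wt: float, k_mut: float, temperature: float = 298.15) -> float:
    """ddG = dG(mutant) - dG(wild type); positive means the mutation weakens binding."""
    return delta_g(k_mut, temperature) - delta_g(k_wt, temperature)

def sign_agreement(experimental: List[float], predicted: List[float]) -> float:
    """Fraction of wild-type/mutant pairs whose predicted ddG has the experimental sign.

    A ddG of exactly zero is counted with the negative (non-weakening) class.
    """
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two equally long, non-empty lists")
    hits = sum(1 for e, p in zip(experimental, predicted) if (e > 0) == (p > 0))
    return hits / len(experimental)

# A 100-fold loss in affinity (Ki: 1 nM -> 100 nM):
print(round(delta_delta_g(1e-9, 1e-7), 2))  # 2.73 (kcal/mol)
```
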
DNA sequencing data continues to progress towards longer reads with increasingly lower sequencing error rates. We focus on the problem of assembling such reads into genomes, which poses challenges in terms of accuracy and computational resources when using cutting-edge assembly approaches, e.g. those based on overlapping reads using minimizer sketches. Here, we introduce the concept of minimizer-space sequencing data analysis, where minimizers rather than DNA nucleotides are the atomic tokens of the alphabet. By projecting DNA sequences into ordered lists of minimizers, our key idea is to enumerate what we call k-min-mers, which are k-mers over a larger alphabet consisting of minimizer tokens. Our approach, mdBG or minimizer-dBG, achieves orders-of-magnitude improvements in both speed and memory usage over existing methods without much loss of accuracy. We demonstrate three use cases of mdBG: human genome assembly, metagenome assembly, and the representation of large pangenomes. For assembly, we implemented mdBG in software we call rust-mdbg, resulting in ultra-fast, low-memory and highly contiguous assembly of PacBio HiFi reads. A human genome is assembled in under 10 minutes using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 minutes using 1 GB RAM. For pangenomes, mdBG newly enables a graph representation of a collection of 661,405 bacterial genomes, which we successfully search (in minimizer space) for antimicrobial resistance (AMR) genes. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics and pangenomics.
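The projection into minimizer space and the enumeration of k-min-mers can be sketched in a few lines. This sketch makes several assumptions. First, an l-mer is kept as a minimizer token when its hash falls below a density threshold; this is a window-free scheme in the spirit of the one described for rust-mdbg. Second, blake2b from Python's hashlib is an arbitrary choice of hash function. Third, reverse complements and canonical tokens are ignored. The function names are hypothetical, and this is not rust-mdbg's implementation.

```python
import hashlib
from typing import List, Tuple

def _hash64(lmer: str) -> int:
    # Deterministic 64-bit hash of an l-mer (illustrative choice of hash function).
    return int.from_bytes(hashlib.blake2b(lmer.encode(), digest_size=8).digest(), "big")

def minimizer_tokens(seq: str, l: int, density: float) -> List[str]:
    """Project a DNA sequence into its ordered list of minimizer tokens.

    An l-mer is kept when its hash is below density * 2**64, so roughly a
    `density` fraction of all l-mers become tokens, independent of any window.
    """
    threshold = int(density * 2**64)
    return [seq[i:i + l] for i in range(len(seq) - l + 1)
            if _hash64(seq[i:i + l]) < threshold]

def k_min_mers(tokens: List[str], k: int) -> List[Tuple[str, ...]]:
    """Enumerate k-min-mers: runs of k consecutive minimizer tokens."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

# With density 1.0 every 3-mer is a token, which makes the output easy to check.
tokens = minimizer_tokens("ACGTAC", l=3, density=1.0)
print(k_min_mers(tokens, k=2))
# [('ACG', 'CGT'), ('CGT', 'GTA'), ('GTA', 'TAC')]
```

In a de Bruijn graph built over this alphabet, k-min-mers play the role of nodes. Two k-min-mers are linked when they overlap by k − 1 tokens, just as classical k-mers overlap by k − 1 nucleotides. Realistic densities are far below 1.0, which is where the speed and memory savings come from.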
The identity and functions of specialized cell types depend on the complex interplay between signaling and transcriptional networks. Recently, single-cell technologies such as CITE-seq have been developed that enable simultaneous quantitative analysis of cell-surface receptor expression and transcriptional states. To date, these datasets have not been used to systematically develop cell-context-specific maps of the interface between the signaling and transcriptional regulators orchestrating cellular identity and function. We present SPaRTAN (Single-cell Proteomic and RNA based Transcription factor Activity Network), a computational method that links cell-surface receptors to transcription factors (TFs) by combining cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) datasets with cis-regulatory information. We apply SPaRTAN to immune cell types in the blood to predict the coupling of signaling receptors with cell-context-specific TFs, and validate the predictions against prior knowledge and by flow cytometry analyses. We then use SPaRTAN to predict the signaling-coupled TF states of tumor-infiltrating CD8+ T cells in malignant peritoneal and pleural mesotheliomas. SPaRTAN greatly enhances the utility of CITE-seq datasets for uncovering relationships between TFs and cell-surface receptors in diverse cellular states.
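This abstract does not give SPaRTAN's exact formulation. One way to link receptors to TFs through gene expression, assumed here purely for illustration, is a bilinear model Y ≈ D W Pᵀ. In this model:

- D (genes × TFs) encodes which genes each TF may regulate, taken from the cis-regulatory information.
- P (cells × receptors) holds per-cell surface-protein levels from CITE-seq.
- Y (genes × cells) holds gene expression.

Each entry of W (TFs × receptors) then scores a TF–receptor pair. The sketch fits W by least squares with a separate ridge penalty on each side. This simplification was chosen for brevity and is not SPaRTAN's implementation; the function name and matrices are hypothetical.

```python
import numpy as np

def fit_bilinear(D: np.ndarray, P: np.ndarray, Y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Fit W in Y ~= D @ W @ P.T (hypothetical, simplified).

    D: genes x TFs        (e.g., binary TF-target matrix from motif/cis-regulatory data)
    P: cells x receptors  (surface-protein levels, e.g., CITE-seq antibody counts)
    Y: genes x cells      (gene expression)
    Returns W: TFs x receptors.

    Each side gets its own ridge term (lam); with lam = 0 and full-column-rank
    D and P this is the exact least-squares solution pinv(D) @ Y @ pinv(P.T).
    """
    n_tf, n_rec = D.shape[1], P.shape[1]
    left = np.linalg.solve(D.T @ D + lam * np.eye(n_tf), D.T)    # TFs x genes
    right = np.linalg.solve(P.T @ P + lam * np.eye(n_rec), P.T)  # receptors x cells
    return left @ Y @ right.T                                    # TFs x receptors

# Synthetic check: recover a known W from noise-free data.
rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=(50, 5)).astype(float)  # 50 genes, 5 TFs
P = rng.normal(size=(40, 3))                        # 40 cells, 3 receptors
W_true = rng.normal(size=(5, 3))
W_hat = fit_bilinear(D, P, D @ W_true @ P.T, lam=0.0)
print(np.allclose(W_hat, W_true))
```

On real data a positive lam stabilizes the fit when TF target sets overlap heavily or receptor levels are correlated.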
Over the last two decades, ancient DNA has provided a fascinating snapshot of the genetic variation and population dynamics of past human populations. In this talk, I will present human population dynamics in ancient northeast Asia, inferred from ancient human genomes spanning a period between approximately 17,000 and 550 years ago. We produced genome sequences of 40 ancient individuals from different parts of northeast Asia, covering Yakutia and Lake Baikal, and revealed previously unknown gene flow and admixture events between the Late Upper Palaeolithic and the Iron Age. Our genetic data further provided the first direct genetic evidence for the ancestors of the Palaeo-Inuits, who spread eastwards from Siberia and launched the second wave of migration into the Americas. We also detected Yersinia pestis, the bacterium that causes plague, in ancient northeast Asia. This talk will cover all these recent findings and show the dynamic population structure of ancient northeast Asia.
The glycolytic pathway is the most basic energy-production mechanism, common to all living things. A drug designed to target a bacterial glycolytic protein should not interact with the corresponding human protein, even though the same protein and the same mechanism are found in both organisms. Because catalytic regions are vital to protein function, they are more conserved during evolution, whereas allosteric regions are more open to structural and sequence changes; allosteric sites are therefore better candidates for selective inhibition. We used the Constraint Molecular Dynamics simulation method to examine the effects of molecules that allosterically inhibit glycolytic enzymes. Unlike conventional MD simulations, this method does not require adding a parameterized ligand to the system. Instead, the residues known to interact with the ligand are restrained in the specified region, limiting their movement as if a ligand were bound there. In this study, we use various analysis methods to clarify how glycolytic enzymes are allosterically inhibited.
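The abstract does not state the functional form of the restraints. A common choice in MD engines is a harmonic positional restraint, U = Σᵢ (k/2)‖xᵢ − xᵢ⁰‖², applied only to the selected atoms. It gives each restrained atom a restoring force Fᵢ = −k(xᵢ − xᵢ⁰). The sketch below assumes that form to show how "freezing" ligand-contacting residues around a reference pose works. The force constant, units and function name are illustrative assumptions.

```python
import numpy as np
from typing import Sequence, Tuple

def restraint_energy_forces(coords: np.ndarray,
                            ref_coords: np.ndarray,
                            restrained_idx: Sequence[int],
                            k: float = 1000.0) -> Tuple[float, np.ndarray]:
    """Harmonic positional restraints on selected atoms (hypothetical helper).

    coords, ref_coords: (n_atoms, 3) arrays, e.g. in nm
    restrained_idx: indices of atoms in residues known to contact the ligand
    k: force constant, e.g. kJ/mol/nm^2 (illustrative value)

    Energy U = sum_i (k/2) |x_i - x0_i|^2 over restrained atoms;
    force F_i = -k (x_i - x0_i). Unrestrained atoms get zero restraint force.
    """
    idx = np.asarray(restrained_idx, dtype=int)
    disp = coords[idx] - ref_coords[idx]
    energy = 0.5 * k * float(np.sum(disp ** 2))
    forces = np.zeros(coords.shape, dtype=float)
    forces[idx] = -k * disp
    return energy, forces

# Atom 1 is restrained and displaced by 0.1 nm; atom 0 moves freely.
ref = np.zeros((3, 3))
xyz = ref.copy()
xyz[1, 0] = 0.1
xyz[0, 2] = 0.5
E, F = restraint_energy_forces(xyz, ref, [1], k=1000.0)
print(round(E, 6), F[1, 0], F[0].tolist())
```

In a real simulation these restraint forces are added to the force field's own forces at every step. The restrained residues can still fluctuate slightly, but they stay near the ligand-bound pose while the rest of the enzyme responds.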
Cholinergic signals, delivered endogenously by acetylcholine and exogenously by nicotine, act on nicotinic acetylcholine receptors (nAChRs) and may modulate cellular activity, proliferation and death. Although neuronal cholinergic signaling is well studied, epithelial cholinergic signals and their role in cancer biology remain relatively unexplored. Understanding how the synthesis and metabolism of acetylcholinesterase, the enzyme that degrades acetylcholine, as well as the presence or absence of nAChRs, affect cancer cell signaling can provide novel leads in cancer research and therapy.
Here, Dr. Konu will present results obtained through in silico, in vitro and in vivo approaches on the role of cholinergic signals in cancer progression. Her group recently established a significant proliferative and prognostic role for CHRNA5, the alpha 5 subunit of the pentameric nAChRs, in breast cancer. Moreover, they have developed zebrafish xenograft models to test the effects of the microenvironment and of novel drugs against liver cancer cells.
Public databases are treasure troves of sequence data. Given their small genome sizes, viruses are among the entities with the largest numbers of full-genome sequences. Genetic diversity is one of the mechanisms by which viruses evade the host immune response. Viruses, in particular those with RNA genomes, mutate rapidly and thus give rise to a large number of viral variants. In this talk, we describe viral diversity dynamics at the protein sequence level and their implications for vaccine design.
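The talk does not specify a diversity measure. A standard way to quantify protein-level diversity is the Shannon entropy at each position of a multiple sequence alignment, H = −Σₐ pₐ log₂ pₐ, where pₐ is the frequency of amino acid a in that column. The sketch assumes pre-aligned sequences and skips gap characters ("-"). Computing it separately for sequences from successive time windows would give one view of how diversity changes over time. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import List

def positional_entropy(alignment: List[str], ignore_gaps: bool = True) -> List[float]:
    """Shannon entropy (bits) at each column of a protein multiple sequence alignment.

    0.0 means the position is fully conserved; higher values mean more variants.
    Columns consisting only of gaps (when ignored) get 0.0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for col in range(length):
        column = [seq[col] for seq in alignment
                  if not (ignore_gaps and seq[col] == "-")]
        total = len(column)
        counts = Counter(column)
        entropies.append(sum((-(n / total) * math.log2(n / total)
                              for n in counts.values()), 0.0))
    return entropies

# Toy alignment of four short peptide variants.
print(positional_entropy(["MKV", "MKI", "MRV", "MKV"]))
# first column is 0.0 (conserved); the other two are about 0.811 bits
```

Low-entropy positions that stay conserved across many variants are natural candidates for vaccine targets, since escape mutations there appear to be rare.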
An emerging theme from large-scale genetic screens that identify the genes essential for cell fitness is that the essentiality of a given gene is highly context-specific and depends on a number of genetic and environmental factors. Identifying such contexts could be key to defining the function of a gene and to developing novel therapeutic interventions. Here we present CEN-tools (Context-specific Essentiality Network-tools), a website and an accompanying Python package with which users can interrogate the essentiality of a gene in genome-scale CRISPR screens across a number of biological contexts, including tissue of origin, mutation profiles, expression levels, and drug response levels. We show that CEN-tools is suitable both for the systematic identification of genetic dependencies and for targeted queries into the dependencies of specific user-selected genes. The associations between genes and a given context within CEN-tools are represented as dependency networks (CENs), and we demonstrate the utility of these networks in elucidating novel gene functions. In addition, we integrate the dependency networks with existing protein-protein interaction networks to reveal context-dependent essential cellular pathways in cancer cells. Together, these results demonstrate the applicability of CEN-tools in aiding current efforts to define the human cellular dependency map.
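As a hedged illustration of a context-specific essentiality test, the sketch compares a gene's essentiality scores in cell lines inside a context with those outside it using Welch's t-statistic. The context could be a tissue of origin or a mutation status. It then keeps gene–context pairs with |t| above a cutoff as edges of a toy dependency network. CEN-tools may use different statistics and thresholds. The score convention (lower = more essential), the cutoff of 3.0, and all function, gene and cell-line names are assumptions.

```python
import math
import statistics
from typing import Dict, List, Set, Tuple

def context_t_statistic(scores: Dict[str, float], in_context: Set[str]) -> float:
    """Welch t-statistic of a gene's essentiality scores inside vs. outside a context.

    scores: cell line -> essentiality score (assumed: lower = more essential)
    in_context: cell lines belonging to the context
    """
    a = [v for line, v in scores.items() if line in in_context]
    b = [v for line, v in scores.items() if line not in in_context]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two cell lines on each side")
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se

def dependency_edges(gene_scores: Dict[str, Dict[str, float]],
                     contexts: Dict[str, Set[str]],
                     t_cut: float = 3.0) -> List[Tuple[str, str, float]]:
    """Hypothetical gene-context edges, kept when |t| >= t_cut."""
    edges = []
    for gene, scores in gene_scores.items():
        for context, members in contexts.items():
            t = context_t_statistic(scores, members)
            if abs(t) >= t_cut:
                edges.append((gene, context, t))
    return edges

# Toy data: GENE_A is much more essential in tissue_X cell lines; GENE_B is not.
genes = {
    "GENE_A": {"c1": -2.0, "c2": -1.8, "c3": -2.2, "c4": 0.1, "c5": -0.1, "c6": 0.0},
    "GENE_B": {"c1": 0.0, "c2": 0.1, "c3": -0.1, "c4": 0.05, "c5": -0.05, "c6": 0.0},
}
print(dependency_edges(genes, {"tissue_X": {"c1", "c2", "c3"}}))
```

In practice, many gene–context tests are run at once, so p-values derived from such statistics would need multiple-testing correction before the edges are trusted.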
RSG-Turkey is a member of The International Society for Computational Biology (ISCB) Student Council (SC) Regional Student Groups (RSG). We are a non-profit community composed of early career researchers interested in computational biology and bioinformatics.