Language Models Can Learn Complex Functional Properties of Proteins

Presenter

Serbülent Ünsal

Serbülent Ünsal received his B.Sc. degree in Statistics and Computer Sciences from Karadeniz Technical University in Turkey. Following his graduation, he pursued an M.Sc. degree in Medical Informatics at Middle East Technical University in Turkey. During the Master’s program, he studied multiscale computational tumor modeling with Dr. Aybar Can Acar, developing a tumor progression model using cellular automata and partial differential equations. In 2014 he started his PhD at the same department on developing deep learning models for low-data protein function prediction. His thesis is also part of a large-scale research project on the discovery of new immune-escape mechanisms and drug repurposing against them. He is currently about to finish his PhD and works as a Senior ML Engineer at Antiverse, designing antibodies using machine learning and deep learning models.

Abstract

Proteins are essential macromolecules for life. To understand and manipulate biological mechanisms, the functions of proteins should be understood, and this is possible through studying their relationship with the amino acid sequence and 3-D structure. So far, only a small percentage of proteins could be functionally characterized (currently ∼0.5% according to UniProt) due to the cost and time requirements of wet-lab-based procedures. Lately, protein function prediction (PFP), which can be defined as the annotation of proteins with functional definitions using statistical/computational methods, has gained importance for exploring the uncharacterized protein space and/or protein variants carrying function-altering changes. Among the many algorithmic approaches proposed so far, machine learning (ML) techniques, especially deep learning (DL), have become popular in PFP due to their high predictive performance. The input data used by these ML/DL methods are numerical feature vectors representing the protein (i.e., protein representations), and they are mostly generated from amino acid sequences of proteins, which are readily available in databases (e.g., UniProt).
In this study, we evaluated protein representation methods for the prediction of functional attributes of proteins and benchmarked these methods in four challenging tasks, namely: (i) Semantic similarity inference (we calculated pairwise semantic similarities between human proteins using their gene ontology annotations and compared them with representation vector similarities to observe the correlation between them), (ii) Ontological protein function prediction (we built GO term categories based on term specificities and sample sizes, which reflect different levels of predictive difficulty, and evaluated representation methods by training/validating ML models on these datasets), (iii) Drug target protein family classification (five major target families were selected, and methods were evaluated in terms of classifying proteins into families via ML models), and (iv) Protein-protein binding affinity estimation (we used the SKEMPI dataset to evaluate methods in estimating protein-protein binding affinity changes upon mutations). We evaluated 23 protein representation methods in total, including both classical approaches and cutting-edge representation learning methods, to observe whether these novel approaches have advantages over classical ones in terms of extracting high-level/complex properties of proteins that are hidden in their sequence. Finally, we provide an open-access tool, PROBE (Protein RepresentatiOn BEnchmark), with which the user can assess new protein representation models on the above-mentioned benchmarking tasks with a single line of code.

Date: July 6th, 2022 – 18:00 (GMT+3)

Language: English

You can register for this webinar here!

Computational Challenges in Protein-RNA Interactions

Presenter

Asst. Prof. Yaron Orenstein

Yaron Orenstein is a Senior Lecturer and the head of the Computational Biology lab at the School of Electrical and Computer Engineering at Ben-Gurion University of the Negev. Yaron completed his BSc summa cum laude in Electrical Engineering and Computer Science at Tel-Aviv University, where he continued on a direct MSc track under the supervision of Prof. Dana Ron. He then completed his PhD in Computer Science at Tel-Aviv University supervised by Prof. Ron Shamir, where he received numerous awards and fellowships, such as the Deutch prize and the Dan David fellowship. He completed his post-doctoral training at Massachusetts Institute of Technology with Prof. Bonnie Berger, and spent a semester as a Research Fellow at the Simons Institute for the Theory of Computing. In the last four and a half years, Yaron has been the head of a fruitful and productive lab with numerous publications, grants, and graduating students. He authored more than 40 journal manuscripts and conference proceedings papers, received grants from the ISF, BSF, NIH, ICA, and IIA, and mentored more than 15 graduate students. His main research interests include sequence design problems and application of deep neural networks in genomics.

Abstract

Protein-RNA interactions play vital roles in many cellular processes, and as a result are the main focus of many biological studies. Biologists would like to measure protein-RNA interactions efficiently and in high throughput, and to use these experimental measurements to train accurate machine-learning models that predict interactions with new RNA sequences. In the talk, I will present solutions to both challenges: design of efficient high-throughput experiments, and training highly accurate predictive models on high-throughput genomic data. First, I will present DeCoDe, a new method based on Integer Linear Programming to design protein-coding templates to efficiently cover many proteins in a single high-throughput experiment. DeCoDe outperforms extant methods for the task, and newly enables features that were not possible before, such as covering variable-length proteins and optimizing globally over multiple templates. Second, I will present DeepUTR, a new method based on Deep Learning to predict mRNA degradation dynamics based on the 3’-UTR sequence of an mRNA. DeepUTR outperforms extant methods for the task, and newly enables prediction of mRNA levels at various time points. Moreover, we extended the Integrated Gradients interpretability approach to handle multiple input types, and using the extended approach discovered known and novel regulatory 3’-UTR elements associated with mRNA degradation. I will conclude my talk with future plans for both sequence design problems and deep neural network applications in genomics.

Date: June 14th, 2022 – 11:00 AM (GMT+3)

Language: English

You can register for this webinar here!

Novel full-length transcriptome analysis workflow ‘Nexons’ to uncover the regulation of poison exons in splicing factors in human germinal centre B cells

Presenter

Özge Gizlenci

Özge Gizlenci received her B.Sc. degree in Molecular Biology and Genetics from Middle East Technical University in Turkey. Following her graduation in 2015, she pursued an M.Sc. degree in Molecular Biosciences with a major in Cancer Biology at the University of Heidelberg. During the Master’s program, she took a semester abroad to start a joint project in her specialized interests, gene editing and stem cells, in the laboratory of Dr. Christian Brendel and Dr. David A. Williams at Dana-Farber/Boston Children’s Cancer and Blood Disorders Center, where she later returned to work with Dr. Christian Brendel as a researcher prior to her graduate studies. At Dana-Farber, she used the base editing method to correct a disease-causing mutation in Shwachman-Diamond syndrome and later applied it to gene therapy approaches. In October 2018, she started her PhD position in the Immunology Programme at the Babraham Institute, funded by the Marie Skłodowska-Curie Actions of the European Union’s Horizon 2020 research and innovation programme as part of the COSMIC consortium. Her PhD project with Dr. Martin Turner is focused on understanding the changes in gene expression and alternative splicing in B cells in response to positive selection signals in the germinal centre using long-read next-generation sequencing technologies (e.g. Oxford Nanopore Technology). She aims to investigate the relationship between alternative splicing and abnormally functioning adaptive immune cells in B cell malignancy and Rheumatoid Arthritis using both computational and molecular biology approaches.

Abstract

Alternative splicing (AS) plays a major role in the differentiation of immune cells during an immune response, as 29% of AS genes are specific to the immune system. Although the role of AS has been extensively investigated in T cells, its role in B cell activation is less well characterised. We sought to develop a workflow based on long-read sequencing from Oxford Nanopore Technologies (ONT) to understand post-transcriptional regulation at both the gene and isoform levels in human germinal centre B cells. As one of the challenges of ONT is the accurate computational analysis of isoforms, we developed the ‘Nexons’ pipeline to identify differentially spliced transcript variants using long-read sequencing. An in-depth analysis of splicing regulators with Nexons revealed that poison exons of splicing factors (e.g. SRSF3) were preferentially spliced out upon activation, whereas naïve B cells expressed isoforms carrying the poison exon, leading to nonsense-mediated mRNA decay. Moreover, we identified novel spliced variants of these genes, which were difficult to deconvolute using short-read data due to the limitations of short-read technology. Altogether, our findings validate the combination of Nexons with ONT cDNA-PCR sequencing as a suitable method for the identification and quantification of complex isoforms.

Date: May 20th, 2022 – 10:30 AM (GMT+3)

Language: English

You can register for this webinar here!

Deep Learning for Medical Image Analysis

Presenter

Prof. Çiğdem Gündüz Demir

Çiğdem Gündüz Demir received her B.S. and M.S. degrees in computer engineering from Boğaziçi University in 1999 and 2001, respectively, and her Ph.D. degree in computer science from Rensselaer Polytechnic Institute in 2005. She is currently a Professor of Computer Engineering and the Deputy Director of the KUIS AI Center at Koç University. Before joining Koç University, she was a faculty member in the Computer Engineering Department at Bilkent University. She was a visiting professor at Nanyang Technological University (NTU), Singapore, in Fall 2009, and at Stanford University in Spring 2013. Her main research interests and projects include the development of new computational methods based on deep learning and computer vision for medical image analysis. Currently, her research group works on interdisciplinary projects in collaboration with the Departments of Pathology and Biology for the microscopic analysis of histopathological images and in vitro fluorescence and live cell images, and with the Departments of Ophthalmology and Radiology for the analysis of images acquired with in vivo imaging of CT, MR, and OCT. She is a recipient of the Distinguished Young Scientist Award of the Turkish Academy of Sciences and the CAREER Award of the National Scientific and Technological Research Council of Turkey.

Abstract

Automated imaging systems are becoming important tools for medicine and biology research as they facilitate rapid analyses with better reproducibility. Segmenting regions of interest on a medical image is typically the first, and one of the most important, steps of these systems, and it greatly affects the success of the entire analysis. In this talk, I will briefly mention the main challenges associated with segmentation tasks in medical image analysis, and then present examples of the dense prediction networks that my research group designed and implemented to address these challenges. In particular, I will talk about our proposed network architectures and loss functions, which were specifically designed to facilitate better training of the segmentation networks. At the end, I will discuss future research possibilities toward developing more robust segmentation networks for medical image analysis.

Date: April 27th, 2021 – 6:00 PM (GMT+3)

Language: English

You can register for this webinar here!

Modelling Complex Microbial Communities Using Metagenomic Data

Presenter

Assoc. Prof. Niranjan Nagarajan

Dr. Nagarajan is Associate Director and Senior Group Leader in the Genome Institute of Singapore, and Associate Professor in the Department of Medicine and Department of Computer Science at the National University of Singapore. His research focuses on developing cutting-edge genome analytic tools and using them to study the role of microbial communities in human health. His team conducts research at the interface of genetics, computer science and microbiology, in particular using a systems biology approach to understand host-microbiome-pathogen interactions in various disease conditions. Dr. Nagarajan received a B.A. in Computer Science and Mathematics from Ohio Wesleyan University in 2000, and a Ph.D. in Computer Science from Cornell University in 2006 (Advisor: Prof. Uri Keich). He did his postdoctoral work in the Center for Bioinformatics and Computational Biology at the University of Maryland working on problems in genome assembly and metagenomics (Advisor: Prof. Mihai Pop).

Abstract

The structure and function of diverse microbial communities are underpinned by ecological interactions that remain uncharacterized. With rapid adoption of next-generation sequencing for studying microbiomes, data-driven inference of microbial interactions based on abundance correlations is widely used, but with the drawback that ecological interpretations may not be possible. Leveraging cross-sectional microbiome datasets for unravelling ecological structure in a scalable manner thus remains an open problem. We present an expectation-maximization algorithm (BEEM-Static) that can be applied to cross-sectional datasets to infer interaction networks based on an ecological model (generalized Lotka-Volterra). The method exhibits robustness to violations in model assumptions by using statistical filters to identify and remove corresponding samples. Benchmarking against 10 state-of-the-art correlation-based methods showed that BEEM-Static can infer the presence and directionality of ecological interactions even with relative abundance data (AUC-ROC > 0.85), a task that other methods struggle with (AUC-ROC < 0.63). In addition, BEEM-Static can tolerate a high fraction of samples (up to 40%) not being at steady state or coming from an alternate model. Applying BEEM-Static to a large public dataset of human gut microbiomes (n = 4,617) identified multiple stable equilibria that better reflect ecological enterotypes with distinct carrying capacities and interactions for key species.

Date: April 13th, 2021 – 10:00 AM (GMT+3)

Language: English

You can register for this webinar here!

RSG-Turkey is a member of The International Society for Computational Biology (ISCB) Student Council (SC) Regional Student Groups (RSG). We are a non-profit community composed of early career researchers interested in computational biology and bioinformatics.

Contact: turkey.rsg@gmail.com

Follow us on social media!