Blog

Sex, Genes and Diplomonads: The Evolution of Sex-related Genes in Hexamita inflata – Begüm Serra Büyüktarakçı

Presenter

Begüm Serra Büyüktarakçı

After completing my BSc in the Molecular Biology and Genetics department at Boğaziçi University, I moved to Sweden for my MSc and studied Evolutionary Biology at Uppsala University. Meanwhile, I became interested in bioinformatics and focused on phylogenetic analysis in my master’s thesis. I am currently working as a research assistant in the Molecular Evolution group of Jan Andersson at the Biomedical Centre (BMC), Uppsala University.

Abstract

Sexual reproduction is widespread among eukaryotes; however, it is not well characterized outside the animal, land plant and fungal kingdoms. Metamonada, a phylum of single-celled eukaryotes, comprises diverse lineages including diplomonads. Some diplomonads have been assumed to be asexual, though the presence of putative meiotic genes was reported in recent studies. I applied a comparative phylogenomic approach to clarify the occurrence of a sexual life cycle in diplomonads. Here, I surveyed the sets of sex-related genes in the ongoing Hexamita inflata genome project. The inventory of sex-related genes was compiled based on the major sexual processes: cell fusion (plasmogamy), nuclear fusion (karyogamy) and meiosis. My analysis showed that H. inflata encodes the karyogamy protein Gex1 but not the plasmogamy protein Hap2. The putatively meiosis-specific genes Spo11, Dmc1, Hop2 and Mnd1 were identified in the H. inflata genome. Based on my findings, H. inflata possesses the Mer3/Hfm1 gene, which is required during meiotic crossover formation, as well as postmeiotic genes (Mlh2/Pms1 and Mad2). I hypothesize that H. inflata is capable of some sex-related processes such as nuclear fusion and meiotic inter-homolog recombination. My results indicate that the sex machinery varies among diplomonads and other Metamonada, based on the wide distribution of sex-related genes.

Date: September 30th, 2020 – 4:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/Sex-Genes-and-Diplomonads-The-Evolution-of-Sex-related-Genes-in-Hexamita-inflata

Density based clustering and error correction of metabarcodes in Nanopore sequencing using the novel bioinformatics algorithm ASHURE – Bilgenur Baloğlu

Presenter

Bilgenur Baloğlu

Bilgenur Baloglu earned her B.S. in molecular biology and genetics at Istanbul Technical University. She then earned her Ph.D. in Biological Sciences from the National University of Singapore in 2018. Her thesis focused on the biological assessment of aquatic habitats using DNA sequencing technologies; it contributed to solving an ecological outbreak caused by aquatic insects and led to the discovery of nearly 350 insect species in a tropical swamp forest. Throughout her Ph.D., she provided consulting services to the National Water Agency of the Singapore government. Dr. Baloglu has worked as a postdoctoral researcher at the Centre for Biodiversity Genomics, University of Guelph in Canada, where she focused on developing new methods for DNA sequencing, Nanopore sequencing, and phylogenetics of sub-arctic insects. She is also coordinating a US EPA-funded project on Great Lakes DNA barcoding along with four collaborating universities in the USA.

Abstract

Metabarcoding (identification of the plant, animal, and fungal taxa present in an environmental sample) is rapidly gaining importance in ecology, food safety, pest identification, and disease surveillance. It has a compelling advantage over traditional approaches for obtaining data on species distributions; however, it is often difficult to detect all the species present in a bulk sample using High-throughput Sequencing (HTS). This can, in part, be attributed to the shorter read lengths most HTS instruments generate. Moreover, most HTS platforms are not portable, which makes in situ field-based sequencing infeasible. Oxford Nanopore sequencing platforms such as the MinION are an exception, and they are also known to provide longer reads, albeit limited by rather high error rates (~12-15%). We used a freshwater mock community of 50 Operational Taxonomic Units (OTUs) to test the capacity of the Oxford Nanopore MinION, coupled with a rolling circle amplification protocol, to provide long-read metabarcoding results. We also propose a new Python pipeline that explores error profiles of nanopore consensus sequences, mapping accuracy, and overall community representation within a complex bulk sample. Using our molecular and bioinformatics workflow, we were able to estimate the diversity of the tested freshwater mock community with an average sequence accuracy of >99% for 1D2 sequencing on the nanopore platform. We also showed that the high error rates associated with long-read single-molecule sequencing can be mitigated by using a rolling circle amplification protocol. Future bioassessment programs will benefit tremendously from such portable, highly accurate, species-level metabarcoding, and it appears that we have reached a point where cost-effective field-based DNA metabarcoding is possible.

Date: August 28th, 2020 – 3:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/Density-based-clustering-and-error-correction-of-metabarcodes-in-Nanopore-sequencing-using-ASHURE

The Impact of Protein Structure on Sequence Evolution – Julien Y. Dutheil

Presenter

Julien Y. Dutheil

My research aims at understanding the mechanisms of biological evolution at the molecular level. I am in particular interested in the study of stochastic processes and the role of organisational levels (a.k.a. “systems”). Research in my group combines computational with experimental approaches, applied to population genomics, structural bioinformatics and statistical analysis of “omics” data.

Abstract

The fate of mutations in populations depends on their impact on the fitness of the individual that carries them. This fitness effect depends, in turn, on the location of the mutation in the genome: a mutation occurring in a non-coding region generates a new allele that will evolve neutrally, while a mutation located within a functional region can have deleterious or advantageous effects, which will furthermore depend on the function of the underlying gene. Yet within a given gene, mutations can have very distinct effects. For genes encoding a macromolecule, RNA or protein, an important determinant of these effects is the structure of the encoded molecule. Here I will present some insights that we gained regarding the impact of protein structure on the evolution of sequences, with a focus on protein-encoding sequences. In particular, we ask the following questions: (1) what is the distribution of adaptive mutations along 3D protein structures and (2) to what extent does protein structure generate coevolution between positions? To leverage information about the distribution of fitness effects, we relied on comparative genome analyses. I will present two statistical approaches: an extension of the McDonald-Kreitman approach that allows inferring the rate of adaptive non-synonymous substitutions by modeling the distribution of fitness effects of mutations, and a substitution mapping procedure used for inferring coevolving positions.

Date: September 4th, 2020 – 6:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/The-Impact-of-Protein-Structure-on-Sequence-Evolution

Projecting the Course of COVID-19 in Turkey: A Probabilistic Modeling Approach – Hüseyin Cahit Burduroğlu

Presenter

Hüseyin Cahit Burduroğlu

He graduated from the Molecular Biology and Genetics department of Yildiz Technical University. After working in structural bioinformatics for 3 years on two different projects, focused on the stability of metalloproteins and of peptides to be used for drug delivery, he joined the Bioinformatics Master Program at the METU Informatics Institute in 2019, where he currently works as a research assistant.

Abstract

The COVID-19 pandemic originated in Wuhan, China, in December 2019 and became one of the worst global health crises ever. The first confirmed cases were announced early in March and since then, serious containment measures have been put in place in Turkey. Here, we present a different approach, a Bayesian negative binomial multilevel model with mixed effects, for the projection of the COVID-19 pandemic and apply this model to the Turkish case. We predicted confirmed daily cases and cumulative numbers for June 6th to June 26th with 80%, 95%, and 99% prediction intervals (PI). Our projections showed that if compliance with the measures continued and no drastic changes were seen in diagnosis or management protocols, the epidemic curve would tend to decrease in this time interval. Also, the predictive validity analysis suggests that the projections of the proposed model should lie within the 95% PI band for the first 12 days of the projections.

Date: August 21st, 2020 – 2:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/Projecting-the-Course-of-COVID-19-in-Turkey-A-Probabilistic-Modeling-Approach

Lewontin’s Paradox and Its Implications – Ergi Deniz Özsoy

Presenter

Ergi Deniz Özsoy

Ergi Deniz Özsoy was born in Hannover in 1967. He graduated from the Department of Biology, Faculty of Science, Hacettepe University in 1993 and received his master’s degree from the same department in 1996 on submitting his thesis. He completed his PhD at the Department of Biology, Hacettepe University in 2002. He carried out his doctoral experiments in the Population Genetics unit of the Department of Genetics, University of Groningen, with a TÜBİTAK scholarship. In 2000 and 2002 he received training in statistical genetics at the University of North Carolina. From 2004 onwards he worked, for periods of varying length, on quantitative genetics and genomics in Trudy Mackay’s laboratory at the same university. In 2010 he carried out research as a Fulbright scholar in the Department of Ecology and Evolutionary Biology at the University of California San Diego. He currently works at the Department of Biology, Hacettepe University, on the complex genetics and genomics of the genotype-phenotype relationship, using an evolutionary genetics perspective within the framework of Drosophila models. In addition, he conducts studies on exercise genetics and genomics, developmental genetics, and the relationship between various genetically based diseases and genetic variation in the genome. Özsoy’s fields of study are evolutionary biology, genetics, genomics and quantitative genetics. He is also interested in the history of evolutionary biology, the philosophy of evolution and the philosophy of biology, and has published articles on these topics in Turkey and abroad.

Abstract

The amount of genetic diversity in a species is generally thought to arise from the accumulation of neutral (selectively equivalent) mutations. According to the neutral theory of evolution, there should be a linear relationship between heterozygosity (genetic diversity), which results from the accumulation of neutral mutations through genetic drift, and the effective size of populations: as population size increases, the probability that neutral mutations accumulate also increases, and the level of genomic heterozygosity is therefore directly proportional to population size. However, as the analysis of Richard Lewontin, the great evolutionary geneticist of our time, first pointed out in full clarity, this relationship may rest on an illusion, and many studies point to numerous species, and to within-species (between-population) differences, in which a large population size goes together with low genetic variation. This contradiction between population size and neutral genetic diversity is known in the evolutionary biology literature as Lewontin’s Paradox and is an active research topic as one of the hard problems of evolutionary biology. In this talk, modern studies and approaches that point toward a resolution of Lewontin’s paradox will be summarized in the extended context of the classical Hill-Robertson effect, with an emphasis on the process of linked selection.
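
As a point of reference for the linear relationship described above, the standard neutral expectation for a diploid population at mutation-drift equilibrium can be written as below. The symbols are not used in the abstract and are introduced here only for illustration: \pi is nucleotide diversity, N_e is the effective population size and \mu is the per-site mutation rate per generation.

```latex
\mathbb{E}[\pi] = 4 N_e \mu
```

Under this expectation, a tenfold larger N_e should give tenfold higher neutral diversity. The paradox is that observed diversity varies across species far less than population sizes do.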

Date: August 8th, 2020 – 6:00 pm (GMT+3)

Language: Turkish

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/Lewontin-Paradoksu-ve-Dusundurdukleri

Drivers of Genetic Diversity in Regions of Low Recombination – Kimberly Gilbert

Presenter

Kimberly Gilbert

Dr. Gilbert obtained her PhD from the University of British Columbia in 2016, studying theoretical population genetics and the impact of demography on evolutionary processes and inferences. Her research broadly includes both theoretical and empirical data analysis in topics of evolutionary biology, including population structure, effective population size, local adaptation, and mutation load. She is currently a postdoctoral fellow at the University of Lausanne, Switzerland. More information is available on her website: http://kjgilbert.github.io/

Abstract

Linked selection is a major driver of genetic diversity. Selection against deleterious mutations removes linked neutral diversity (background selection [BGS]), creating a positive correlation between recombination rates and genetic diversity. Purifying selection against recessive variants, however, can also lead to associative overdominance (AOD), due to an apparent heterozygote advantage at linked neutral loci that opposes the loss of neutral diversity by BGS. Zhao and Charlesworth (2016) identified the conditions under which AOD should dominate over BGS in a single-locus model and suggested that the effect of AOD could become stronger if multiple linked deleterious variants co-segregate. We present a model describing how and under which conditions multi-locus dynamics can amplify the effects of AOD. We derive the conditions for a transition from BGS to AOD due to pseudo-overdominance, i.e., a form of balancing selection that maintains complementary deleterious haplotypes that mask the effect of recessive deleterious mutations. Simulations confirm these findings and show that multi-locus AOD can increase diversity in low-recombination regions much more strongly than previously appreciated. While BGS is known to drive genome-wide diversity in humans, the observation of a resurgence of genetic diversity in regions of very low recombination is indicative of AOD. We identify 22 such regions in the human genome consistent with multi-locus AOD. Our results demonstrate that AOD may play an important role in the evolution of low-recombination regions of many species.

Date: July 16th, 2020 – 6:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/Drivers-of-Genetic-Diversity-in-Regions-of-Low-Recombination

REVIEW: A Brief Introduction to Microencapsulation

Introduction

The containment of a core material inside a small capsule is called microencapsulation. A polymeric material coats liquid or solid substances to protect them from the surrounding environment [1]. Microcapsule sizes range from 50 nm to 2 mm [2]. The size and structure of a microcapsule differ depending on whether the core material is solid, liquid or gas, as shown in Figure 1 [2].

Figure 1: (a) Mononuclear microcapsules carrying solid material, (b) Aggregated microcapsules carrying liquid material [2].
Figure 2: Schematic presentation of a microcapsule [2].

The coating material must adhere to the core material in order to cover it properly. It must be compatible with the core material and provide the required strength, flexibility, impermeability, optical properties, and stability. Its release must be controllable under the required conditions [1].

Figure 3: Coating material examples [1]

Water Soluble Materials | Water Insoluble Materials | Waxes and Lipid Materials
Gelatin | Calcium alginate | Paraffin
Gum Arabic | Polyethylene | Carnauba
Starch | Polyamide (Nylon) | Spermaceti
Polyvinylpyrrolidone | Silicones | Beeswax
Polyacrylic acid | Polymethacrylate | Stearic acid
Carboxymethyl-cellulose | Cellulose nitrate | Glyceryl stearates

Figure 4: Alginate coated adipose stem cells extracted from (A) rat and (B) human [3].

Figure 5: Confocal laser scanning microscope image of rhodamine-labeled hydrogel microcapsules [4].

Method

The microencapsulation of adipose stem cells by coating with alginate is shown in Figure 6. The cross-linking solution contains calcium chloride and glucose and is buffered with HEPES. Calcium chloride provides divalent cations to the alginate during cross-linking. Glucose maintains the physiological osmolality of the cross-linking solution for the adipose stem cells. HEPES is used to maintain the pH at or below 7.3 [3].

Figure 6: Schematic presentation of the method used for microencapsulation of adipose stem cells [3].

The generation of hydrogel microcapsules with a microfluidic system is shown in Figure 7. Oligosaccharides and peptide–starPEG were introduced through two distinct channels. The flow rates of the oil phase and of the oligosaccharides and peptide–starPEG were set to obtain the required droplet formation [4].

Figure 7: Scheme of the microfluidic system used for hydrogel microcapsule generation [4].

Conclusion 

Microencapsulation can be used to encapsulate different materials; it is therefore useful for treating different diseases that occur in various tissues. There are various methods for making microcapsules. The method must be chosen carefully according to the materials the microcapsule is made of. Microcapsules can be used to deliver drug molecules and various cell types into the targeted tissue. As technology improves, microencapsulation methods will also improve and become more effective.

References

1. MICROENCAPSULATION. Int J Pharm Sci Rev Res. 2010;5(2):58-62.

2. M.N. Singh, K.S.Y. Hemant, M. Ram and HGS. Microencapsulation: A promising technique for controlled drug delivery. Res Pharm Sci. 2010;5(2):65-77.

3. Shirae K. Leslie, Ramsey C. Kinney, Zvi Schwartz and BDB. Microencapsulation of Stem Cells for Therapy. In: Vol 1479; 2017:225-235. doi:10.1007/978-1-4939-6364-5

4. Wieduwild R, Krishnan S, Chwalek K, et al. Noncovalent Hydrogel Beads as Microcarriers for Cell Culture. Angew Chemie. 2015;127(13):4034-4038. doi:10.1002/ange.201411400

Comparison of SARS-CoV-2 Variants with INSaFLU and galaxyproject – RSG-Türkiye Active Members

Contributors

  • Nazlı S. Kara, İstinye University
  • Meltem Kutnu, METU
  • Yasemin Utkueri, Sabancı University
  • Funda Yılmaz, Radboud University
  • Elif Bozlak, University of Veterinary Medicine Vienna; Vienna Graduate School of Population Genetics
  • Evrim Fer, University of Arizona

Abstract

The 2020 BioHackathon hosted efforts to adapt existing variant detection workflows to COVID-19 and to build new workflows for analyzing the large amount of data being produced. Some of these are Galaxy Project, INSaFLU and nf-core. These workflows analyze genome data produced with next-generation sequencing technology and output annotated single nucleotide polymorphism (SNP) and short insertion-deletion (indel) variants. They have different advantages and disadvantages depending on the algorithms they use. In this study, we aimed to compare the SARS-CoV-2 genome variants published by the Galaxy Project with the variants identified by the INSaFLU workflow, and thereby to evaluate the performance of the two workflows. We found close to 600 variants that were detected by both workflows. Almost half of these variants were located in replicase polyprotein 1ab. Among the shared variants, non-synonymous variants outnumbered synonymous ones. The shared and workflow-specific variants identified in this study can be examined in more detail in future research.
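
The kind of comparison described above, finding variants reported by both workflows and those unique to each, can be sketched with plain set operations. This is a minimal illustration and not the authors' code: the `Variant` record, the `compare_variant_sets` helper and the toy calls are all assumptions made for the example, and a real comparison would first parse each workflow's own output format.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: genome position, reference and alternate allele."""
    position: int
    ref: str
    alt: str


def compare_variant_sets(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> dict:
    """Split two call sets into shared variants and variants unique to each side."""
    set_a, set_b = set(calls_a), set(calls_b)
    return {
        "shared": set_a & set_b,   # reported by both workflows
        "only_a": set_a - set_b,   # reported only by the first workflow
        "only_b": set_b - set_a,   # reported only by the second workflow
    }


# Toy records for illustration only; these are not results from the study.
galaxy_calls = [Variant(241, "C", "T"), Variant(3037, "C", "T"), Variant(14408, "C", "T")]
insaflu_calls = [Variant(241, "C", "T"), Variant(14408, "C", "T"), Variant(23403, "A", "G")]

result = compare_variant_sets(galaxy_calls, insaflu_calls)
print(len(result["shared"]), len(result["only_a"]), len(result["only_b"]))
```

Matching on position and both alleles is a deliberately strict choice: two workflows that represent the same indel differently would be counted as disagreeing, so real comparisons usually normalize variant representation first.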

Date: June 21st, 2020 – 8:00 pm (GMT+3)

Language: Turkish

To register for the webinar, you can visit this link:

https://www.bigmarker.com/bioinfonet/INSaFLU-ve-galaxyproject-ile-SARSCoV2-varyantlarinin-karsilastirilmasi

Phylogenetic Analysis of SARS-CoV-2 Genomes in Turkey – Aylin Bircan

Presenter

Aylin Bircan

Aylin Bircan received her BSc degree in Chemistry from Koc University in 2012. She worked in the Quality Control and Assurance departments of several pharmaceutical companies. In 2018 she received her MSc degree in Computational Biology and Bioinformatics from Kadir Has University. Since September 2018, she has been a Ph.D. student in the Molecular Biology, Genetics and Bioengineering program at Sabanci University. She has been working on phylogenetic analysis of Class C GPCRs under the supervision of Dr. Ogün Adebali.

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, spread across the continents, and caused the COVID-19 pandemic. In this talk, I will present our recent study, which focuses on a comprehensive genomic analysis of the virus isolates in Turkey. We built a phylogenetic tree with 15,277 SARS-CoV-2 genomes obtained globally, and clustered the virus isolates based on the phylogenetic tree and previously annotated classification methods. We performed a phylogenetic analysis of the first thirty SARS-CoV-2 genomes isolated and sequenced in Turkey to identify specific groups circulating in the country. Our results suggest that the first introduction of the virus to the country was earlier than the first reported case of infection. Virus genomes isolated from Turkey are scattered among most types in the phylogenetic tree. Two of the seventeen sub-clusters were found to be enriched with isolates from Turkey, which have possibly spread widely in the country. Finally, we traced virus genomes based on their phylogenetic placements. This analysis suggested multiple independent international introductions of the virus and revealed a hub for inland transmission.

Date: June 17th, 2020 – 2:00 pm (GMT+3)

Language: English

To register for the webinar, you can visit this link:
https://www.bigmarker.com/bioinfonet/Phylogenetic-Analysis-of-SARS-CoV-2-Genomes-in-Turkey

A short review of RNA sequencing and its applications

What are the omics sciences?

Omics sciences aim to quantify entire classes of biomolecules, such as RNA and proteins, at the organism, tissue, or single-cell level. They are divided into several branches, such as genomics, transcriptomics, and proteomics [1].

What is transcriptomics?

Transcriptomics is the omics science that studies an organism’s transcriptome, which is the sum of all of its RNA molecules [2,3].

What is RNA sequencing?

RNA sequencing (RNA-seq) is a technique that quantifies all RNAs in bulk tissues or in individual cells. It is used to calculate the transcript abundance of each gene across samples. It relies on next-generation sequencing (NGS) platforms, which determine the sequence of biomolecules such as DNA and RNA [4,5].

What are the kinds of RNA-seq?

Bulk tissue RNA-seq

The whole transcriptome of a target bulk tissue is sequenced for transcriptomic analysis. The bulk tissue can contain various cell types, so the resulting transcriptome is a mixture of the RNAs of those cells. This approach is the most common use of RNA-seq and is performed for purposes such as elucidating diseases [7].

Single-cell RNA-seq

In contrast to bulk tissue RNA-seq, single-cell RNA-seq (scRNA-seq) is performed on individual cells. The whole transcriptome of each cell in a tissue is sequenced for transcriptomic analysis. scRNA-seq has revealed that the transcriptomes of cells within a tissue differ from one another and that individual cells can be grouped into specific clusters according to their transcriptomic signatures. scRNA-seq has helped in the discovery of cell types such as ionocytes, which could be relevant to the pathology of cystic fibrosis [7,8].

Spatial RNA-seq

The relationship between cells and their relative locations within a tissue sample can be critical to understanding disease pathology. Spatial transcriptomics is a technology that measures all the gene activity in a tissue sample and maps where that activity occurs. This technique is used to understand biological processes and disease. Spatial RNA-seq can be performed on intact tissue sections as well as at the single-cell level. The general aim of the technique is to combine gene expression with morphological information and to provide information on tissue architecture and micro-environment for the generation of sub-cellular data. Current bulk and scRNA-seq methods provide highly detailed data on tissues or cell populations but do not capture spatial information [7,9,10].

RNA-seq analysis work-flow

1) Experimental design

There are many library types in RNA-seq, and they produce sequencing reads (sequenced transcripts) with different characteristics. For instance, reads can be single-end, in which a transcript is read from only one end (5’ or 3’), whereas in paired-end libraries a transcript is read from both its 5’ and 3’ ends. Paired-end sequencing can additionally help disambiguate read mappings and is preferred for alternative-exon quantification and fusion transcript detection, particularly when working with poorly annotated transcriptomes [7]. In addition, libraries can be stranded or unstranded. Strandedness is important for determining which DNA strand a read comes from, and it is used to assign reads to the correct genes. If the strandedness of a library is specified incorrectly, reads are not assigned to their true genes and the gene expression results will be wrong [11]. Technical replicates, in which one sample is sequenced more than once on the same high-throughput platform, can also be used to reduce technical bias.

2) Laboratory performance

After RNA has been extracted from all samples, libraries are prepared for sequencing according to the selected library type. The libraries are then sequenced to a read depth of 10–30 million reads per sample on a high-throughput platform [7].

3) Data analysis

After sequencing has been completed, the starting point for analysis is the data files, which contain base-called sequencing reads, usually in FASTQ format. Poor-quality reads in the FASTQ files are removed before alignment, in which the raw sequences are aligned to a reference genome to find the genes they come from. Each sequence read is converted to one or more genomic coordinates, and Sequence Alignment Map (SAM) files containing those coordinates are obtained after alignment [7,12]. This step has traditionally been accomplished with alignment tools such as TopHat [13], STAR [14], or HISAT [15], which rely on a reference genome. Because SAM files are large, they are converted to Binary Alignment Map (BAM) files for further analyses using Samtools [16]. After the alignment and file conversion steps, reads are quantified across samples with tools such as featureCounts [17] to obtain an expression matrix in which each row corresponds to a gene and each column to a sample [7]. Transcript abundances are then normalized across samples using the expression matrix to lessen range-based gene expression differences between samples [7,18,19]. Normalization methods are shown in Figure 1 [20].
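
To make the read-filtering step above concrete, here is a minimal sketch of removing low-quality reads from FASTQ data in plain Python. It assumes the common Phred+33 quality encoding and a mean-quality cut-off of 20; both are illustrative assumptions, and the function names are hypothetical. Real pipelines normally rely on dedicated quality-control and trimming tools rather than hand-written code.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def filter_fastq_records(lines, min_mean_quality=20.0):
    """Yield 4-line FASTQ records whose mean base quality passes the threshold."""
    lines = [ln.rstrip("\n") for ln in lines]
    # A FASTQ record is four lines: header, sequence, separator, qualities.
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if qual and mean_phred(qual) >= min_mean_quality:
            yield header, seq, plus, qual


# Toy input: read1 has high quality ('I' = Phred 40), read2 has none ('!' = Phred 0).
toy_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
]

kept = list(filter_fastq_records(toy_fastq))
print([record[0] for record in kept])
```

Filtering on the mean is the simplest possible rule; it keeps a read with a few very poor bases as long as the rest are good, which is why trimming low-quality ends is often done in addition.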


Figure 1. Normalization methods that are used in RNA-seq analyses.
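
As one concrete instance of normalization, the sketch below computes counts per million (CPM), which rescales each sample by its total read count. CPM is chosen here only because it is simple to show; the review does not single it out, and the function name and toy matrix are assumptions for illustration.

```python
def counts_per_million(matrix):
    """Scale raw counts so that every sample (column) sums to one million.

    matrix: list of rows, one row per gene, one raw count per sample.
    """
    n_samples = len(matrix[0])
    # Library size = total number of counted reads in each sample.
    library_sizes = [sum(row[j] for row in matrix) for j in range(n_samples)]
    return [
        [row[j] * 1e6 / library_sizes[j] for j in range(n_samples)]
        for row in matrix
    ]


# Toy expression matrix: 3 genes x 2 samples. Sample 2 was sequenced twice
# as deeply as sample 1, but the relative expression of the genes is the same.
toy_counts = [
    [10, 20],
    [30, 60],
    [60, 120],
]

cpm = counts_per_million(toy_counts)
for gene_row in cpm:
    print(gene_row)
```

After scaling, the two columns are identical, which is the point of the step: a difference that came only from sequencing depth disappears. CPM does not correct for gene length, so it is suited to comparing the same gene across samples rather than different genes within one sample.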

After the normalization step, genes with low expression across samples are filtered out to reduce statistical noise [7], and then statistically significant genes (namely, differentially expressed genes) can be detected with tools such as edgeR [21] and DESeq2 [22]. Finally, the resulting genes can be used for enrichment analyses against resources such as KEGG and Reactome to find out which pathways are affected. RNA-seq technology is used for a range of purposes, some of which are shown in Figure 2. Representations of RNA-seq results are shown in Figure 3.
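
The low-expression filter mentioned above can be sketched as follows. The rule used here, keeping a gene only if its normalized expression exceeds 1 CPM in at least two samples, is a common heuristic and an assumption of this example, not a threshold given in the review; the function name and toy values are likewise hypothetical.

```python
def filter_low_expression(cpm_by_gene, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    return {
        gene: values
        for gene, values in cpm_by_gene.items()
        if sum(value > min_cpm for value in values) >= min_samples
    }


# Toy normalized expression values (CPM) for three genes across four samples.
toy_cpm = {
    "geneA": [0.0, 0.5, 0.2, 0.0],   # never above the threshold
    "geneB": [5.1, 0.3, 7.8, 6.0],   # above the threshold in three samples
    "geneC": [1.5, 0.9, 0.4, 0.1],   # above the threshold in one sample only
}

kept = filter_low_expression(toy_cpm)
print(sorted(kept))
```

Requiring the threshold in several samples, rather than on the average, avoids keeping a gene because of a single outlying sample. The minimum number of samples is usually tied to the size of the smallest experimental group.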


Figure 2. RNA-seq usage fields.



Figure 3. Representation of differential expression, splicing, and co-expression results. In the differential expression figure, each row represents the expression level of a gene and each column represents a sample. Red shows higher expression and yellow shows lower expression. In the co-expression figure, a network of the interactions of each gene with other genes is depicted. In the differential alternative splicing figure, differential usage of the E010 exon between the control and knockdown groups is depicted.

A detailed RNA-seq work-flow is shown in Figure 4 [12].


Figure 4. An example of differential expression work-flow.

Various tools used for RNA-seq analysis and their tutorials are listed below, along with visualization tools used for high-throughput data.

Table 1. List of RNA-seq tools and their uses.

Tool name | Usage | Tutorial link
DESeq2 [22] | Differential expression | https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
edgeR [21] | Differential expression | https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
DEXSeq [23] | Differential splicing | https://bioconductor.org/packages/release/bioc/vignettes/DEXSeq/inst/doc/DEXSeq.html
WGCNA [24] | Co-expression | https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/
GATK [25] | Variant calling | https://gatk.broadinstitute.org/hc/en-us

Table 2. List of high-throughput visualization and enrichment tools.

Tool name | Usage
pheatmap [26] | Heatmap plots for differentially expressed genes
ggplot2 [27] | A wide range of visualizations, from bar charts to violin plots
igraph [28] | Network visualization for co-expression networks and other network types
Enrichr [29] | Enrichment analysis of genes
DAVID [30] | Enrichment analysis of genes

Note: Most of the listed tools depend on the R statistical computing environment.

Table 3. Examples of differential expression work-flows.

Example | Link
Example 1 | https://www.bioconductor.org/help/course-materials/2016/CSAMA/lab-3-rnaseq/rnaseq_gene_CSAMA2016.html
Example 2 | https://digibio.blogspot.com/2017/11/rna-seq-analysis-hisat2-featurecounts.html
Example 3 | https://bioinformaticsworkbook.org/dataAnalysis/RNA-Seq/RNA-SeqIntro/RNAseq-using-a-genome.html
Example 4 | https://uclouvain-cbio.github.io/BSS2019/rnaseq_gene_summerschool_belgium_2019.html

In addition to the differential expression pipelines above, if you would like to examine my own pipeline for differential expression analysis with DESeq2, you can visit https://github.com/kaanokay/Differential-Expression-Analysis/blob/master/HISAT2-featureCounts-DESeq2-workflow.md, where I have attached my Linux and R scripts.

Transcriptome research in autism spectrum disorder

Autism Spectrum Disorder (ASD) is an early-onset neuropsychiatric disorder. Clinically, ASD is characterized by behavioural abnormalities such as restricted interests and repetitive behaviour. ASD is genetically heterogeneous and heritable (~50%), and 80% of its genetic background is unclear. Aberrations in autistic brains occur mostly in cortical regions (Figure 5) rather than in the cerebellum. ASD has a higher heritability than other neuropsychiatric disorders such as schizophrenia and bipolar disorder, which means that it has a stronger genetic basis than they do. Studies have revealed that ASD-related genes are enriched in brain development, neuronal activity, signalling, and transcription regulation. Wnt signalling, synaptic function, and translational regulation are pathways affected by mutations in ASD-related genes [31].


Figure 5. Brain regions most affected in autism.

Transcriptome studies have shown that mRNAs, microRNAs (miRNAs), small nucleolar RNAs (snoRNAs), and long non-coding RNAs (lncRNAs) are misexpressed in autistic brains. Genes with misregulated mRNA levels are especially enriched in immune and neuronal pathways; in short, both neuronal development and immune system activation are misregulated in the brains of individuals with ASD. Misregulated miRNAs in autistic brains mostly target genes with synaptic function. Additionally, the misregulation of splicing regulators causes aberrant alternative splicing patterns in autistic individuals31.
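To make the word "enriched" concrete: a common way to test whether a pathway is over-represented among misregulated genes is a one-sided hypergeometric test. The sketch below is a generic version of that test written with the Python standard library. It is not necessarily the statistic used in the cited review, and tools such as Enrichr and DAVID may use related but not identical statistics. The function name and all gene counts are invented for illustration.

```python
from math import comb


def enrichment_p_value(universe, pathway, hits, overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    universe: number of genes tested in the experiment
    pathway:  number of those genes annotated to the pathway
    hits:     number of misregulated genes
    overlap:  number of misregulated genes that are in the pathway
    Returns P(X >= overlap) when `hits` genes are drawn at random.
    """
    upper = min(pathway, hits)
    total = comb(universe, hits)
    tail = sum(comb(pathway, k) * comb(universe - pathway, hits - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical numbers: 20,000 genes tested, 200 in an immune pathway,
# 300 misregulated genes, 12 of which belong to the pathway.
# By chance alone we would expect about 300 * 200 / 20000 = 3 genes.
p = enrichment_p_value(universe=20000, pathway=200, hits=300, overlap=12)
print(f"p-value: {p:.3g}")
```

A small p-value means that an overlap of this size would rarely arise by chance. In practice, many pathways are tested at once, so the p-values would also need a multiple-testing correction.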

To summarize, RNA-seq is a powerful technology for understanding diseases, and it can be used for a variety of purposes.

That’s all 🙂

If you have any questions about this short review or my differential expression pipeline on GitHub, feel free to contact me at kaan.okay@msfr.ibg.edu.tr.

Many thanks for your interest and time!

REFERENCES

1) https://en.wikipedia.org/wiki/Omics.

2) https://en.wikipedia.org/wiki/Transcriptomics_technologies.

3) https://en.wikipedia.org/wiki/Transcriptome.

4) Kadakkuzha, B. M., Liu, X. an, Swarnkar, S. & Chen, Y. Genomic and proteomic mechanisms and models in toxicity and safety evaluation of nutraceuticals. in Nutraceuticals: Efficacy, Safety and Toxicity 227–237 (Elsevier Inc., 2016). doi:10.1016/B978-0-12-802147-7.00018-8.

5) Behjati, S. & Tarpey, P. S. What is next generation sequencing? Arch. Dis. Child. Educ. Pract. Ed. 98, 236–238 (2013).

6) https://www.ebi.ac.uk/training/online/course/functional-genomics-ii-common-technologies-and-data-analysis-methods/performing-rna-seq.

7) Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019).

8) https://en.wikipedia.org/wiki/Single_cell_sequencing.

9) https://www.10xgenomics.com/spatial-transcriptomics/.

10) https://www.diva-portal.org/smash/get/diva2:1068517/FULLTEXT01.pdf.

11) https://salmon.readthedocs.io/en/latest/library_type.html.

12) https://bioinformaticsworkbook.org/dataAnalysis/RNA-Seq/RNA-SeqIntro/RNAseq-using-a-genome.html.

13) Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

14) Dobin, A. et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

15) Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

16) Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

17) Liao, Y., Smyth, G. K. & Shi, W. featureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

18) Evans, C., Hardin, J. & Stoebel, D. M. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief. Bioinform. 19, 776–792 (2018).

19) Liu, X. et al. Normalization Methods for the Analysis of Unbalanced Transcriptome Data: A Review. Front. Bioeng. Biotechnol. 7, 1–11 (2019).

20) https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html.

21) Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

22) Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

23) Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-Seq data. Nat. Preced. 1–30 (2012) doi:10.1038/npre.2012.6837.2.

24) Langfelder, P. & Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).

25) McKenna, A. et al. The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

26) https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf.

27) https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf.

28) https://cran.r-project.org/web/packages/igraph/igraph.pdf.

29) https://amp.pharm.mssm.edu/Enrichr/.

30) https://david.ncifcrf.gov/.

31) Quesnel-Vallières, M., Weatheritt, R. J., Cordes, S. P. & Blencowe, B. J. Autism spectrum disorder: insights into convergent mechanisms from transcriptomics. Nat. Rev. Genet. 20, 51–63 (2019).

RSG-Turkey is a member of The International Society for Computational Biology (ISCB) Student Council (SC) Regional Student Groups (RSG). We are a non-profit community composed of early career researchers interested in computational biology and bioinformatics.

Contact: turkey.rsg@gmail.com
