Disentangling the role of prokaryotes in regulating export flux via suspended and sinking organic matter in the southern ocean
- Authors: Dithugoe, Choaro David
- Date: 2022-10-14
- Subjects: Microbial ecology , Bioinformatics , Biochemistry , Oceanography , Metagenomics , Carbon cycle (Biogeochemistry) , Prokaryotes
- Language: English
- Type: Academic theses , Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/365745 , vital:65782 , DOI https://doi.org/10.21504/10962/365745
- Description: The role of phytoplankton in regulating atmospheric carbon dioxide in the marine environment has been the subject of extensive research. We lack, however, comparative insights regarding the functional contributions of bacteria, archaea, fungi, and viruses (the microbiota) to organic matter export, especially in understudied polar marine environments such as the Southern Ocean. This knowledge deficit is in part due to the high levels of microbial diversity, which obscure efforts to study the relationship between diversity and ecosystem functions, including the roles of these microbes in the sequestration of carbon and nitrogen. Elucidating their precise contributions to organic matter export may be central to potential ecosystem feedbacks to global climate change. We examined several factors which may influence organic matter export to depth, including net primary production, phytoplankton biomass, temperature, and prokaryotic functional capacity in the Southern Ocean. A Marine Snow Catcher was used to collect suspended and sinking material 10 metres below the mixed layer depth at the Southern Ocean Time Series (SOTS) in autumn (March-April) and in the Atlantic sector of the Southern Ocean in winter (July-August) and spring (October-November) 2019. The suspended and sinking material was used to determine the particulate organic carbon and nitrogen concentrations, which were then used to calculate fluxes and the export ratio (e-ratio: particulate organic carbon flux divided by net primary production). Additionally, genomic DNA was extracted from the suspended and sinking material and sequenced to obtain shotgun metagenomic data, which were used to reconstruct metagenome-assembled genomes (MAGs) and assess their functional capacity using bioinformatic tools such as DRAM.
Data from the Atlantic sector of the Southern Ocean demonstrate that net primary production and temperature were inversely related to the e-ratio, which is consistent with previous findings from the northern region of the Southern Ocean. Genomic functional capacity from SOTS suggested that r-strategist (organisms adapted to live in unstable environments) bacteria (e.g., Gammaproteobacteria) were prominent in the suspended pool. By contrast, the sinking particle pool appeared to be dominated by K-strategists (organisms adapted to stable environments). The opposite was true for the archaea. This finding for bacteria differs from a previous study in the northern region of the Southern Ocean, which showed that K-strategist microbes were more abundant in the suspended fraction. K-strategists typically degrade sinking organic matter into suspended or dissolved organic matter, reducing the organic carbon export efficiency. Furthermore, data from the Atlantic sector of the Southern Ocean revealed that seasonal temperature changes might dictate the rate of regional prokaryotic degradation across the zones, resulting in rapid degradation in the warmer northerly regions and slow degradation further south. The data further provide evidence of chemolithoautotrophic mechanisms, with prokaryotes harbouring key pathways required to transform dissolved inorganic carbon into complex organic forms, including recalcitrant dissolved organic carbon. Collectively, the SOTS and Atlantic sector of the Southern Ocean data suggest that shifts in prokaryotic community structure and functional capacity may regulate carbon export to depth through either degradation or synthesis of organic matter. , Thesis (PhD) -- Faculty of Science, Zoology and Entomology, 2022
- Full Text:
- Date Issued: 2022-10-14
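The export ratio defined in the description above (particulate organic carbon flux divided by net primary production) can be illustrated with a minimal sketch. The function name, the units, and the numbers below are assumptions chosen for illustration; they are not values or code from the thesis.

```python
def export_ratio(poc_flux: float, npp: float) -> float:
    """Return the e-ratio: POC flux divided by net primary production.

    Both arguments must be in the same units (assumed here to be
    mg C m^-2 d^-1), so the result is dimensionless.
    """
    if npp <= 0:
        raise ValueError("net primary production must be positive")
    return poc_flux / npp


# Hypothetical values for illustration only, not data from the thesis.
print(export_ratio(poc_flux=50.0, npp=500.0))
```

Because the ratio has net primary production in the denominator, a fixed flux gives a lower e-ratio as production rises, which matches the direction of the inverse relationship reported in the description.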
Sequence, structure, dynamics, and substrate specificity analyses of bacterial Glycoside Hydrolase 1 enzymes from several activities
- Authors: Veldman, Wayde Michael
- Date: 2022-04-08
- Subjects: Glycosidases , Bioinformatics , Molecular dynamics , Ligands (Biochemistry) , Enzymes , Ligand binding (Biochemistry) , Sequence alignment (Bioinformatics) , Structural bioinformatics
- Language: English
- Type: Doctoral thesis , text
- Identifier: http://hdl.handle.net/10962/233805 , vital:50129 , DOI 10.21504/10962/233810
- Description: Glycoside hydrolase 1 (GH1) enzymes are a ubiquitous family of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. Despite their conserved catalytic domain, these enzymes have many different enzyme activities and/or substrate specificities, as a change of only a few residues in the active site can alter their function. Most GH1 active site residues are situated in loop regions, and it is known that enzymes are more likely to develop new functions (broad specificity) if they possess an active site with a high proportion of loops. Furthermore, the GH1 active site consists of several subsites, and cooperative binding makes the binding affinity of sites difficult to measure because the properties of one subsite are influenced by the binding of the other subsites. Extensive knowledge of protein-ligand interactions is critical to the comprehension of biology at the molecular level. However, the structural determinants and molecular details of GH1 ligand specificity and affinity are very broad, highly complex, not well understood, and therefore still need to be clarified. The aim of this study was to computationally characterise the activity of three newly solved GH1 crystallographic structures sent to us by our collaborators, and to provide evidence for their ligand-binding specificities. In addition, the differences in structural and biochemical contributions to enzyme specificity and/or function between different GH1 activities/enzymes were assessed, and the sequence/structure/function relationship of several activities of GH1 enzymes was analysed and compared. To accomplish the research aims, sequence analyses involving sequence identity, phylogenetics, and motif discovery were performed. As protein structure is more conserved than sequence, the discovered motifs were mapped to 3D structures for structural analysis and comparisons.
To obtain information on enzyme mechanism or mode of action, as well as structure-function relationship, computational methods such as docking, molecular dynamics, binding free energy calculations, and essential dynamics were implemented. These computational approaches can provide information on the active site, binding residues, protein-ligand interactions, binding affinity, conformational change, and most structural or dynamic elements that play a role in enzyme function. The three new structures received from our collaborators are the first GH1 crystallographic structures from Bacillus licheniformis ever determined. As phospho-glycoside compounds were unavailable for purchase for use in activity assays, and as the active sites of the structures contained no ligand, in silico docking and MD simulations were performed to provide evidence for their GH1 activities and substrate specificities. First, though, the amino acid sequences of all known characterised bacterial GH1 enzymes were retrieved from the CAZy database and compared to the sequences of the three new B. licheniformis crystallographic structures, which provided evidence of the putative 6Pβ-glucosidase activity of enzyme BlBglH, and dual 6Pβ-glucosidase/6Pβ-galactosidase (dual-phospho) activity of enzymes BlBglB and BlBglC. As all three enzymes were determined to be putative 6Pβ-glycosidase activity enzymes, much of the thesis focused on the overall analysis and comparison of the 6Pβ-glucosidase, 6Pβ-galactosidase, and dual-phospho activities that make up the 6Pβ-glycosidases. The 6Pβ-glycosidase active site residues were identified through consensus of binding interactions using all known 6Pβ-glycosidase PDB structures complexed with complete ligand substrates.
With regard to the 6Pβ-glucosidase activity, it was found that the L8b loop is longer and forms extra interactions with the L8a loop, likely leading to increased L8 loop rigidity. This rigidity would prevent the displacement of residue Ala423, ensuring a steric clash with galacto-configured ligands, and may engender substrate specificity for gluco-configured ligands only. Also, during molecular dynamics simulations using enzyme BlBglH (6Pβ-glucosidase activity), it was revealed that the favourable binding of substrate stabilises the loops that surround and make up the enzyme active site. Using the BlBglC (dual-phospho activity) enzyme structure with either galacto- (PNP6Pgal) or gluco-configured (PNP6Pglc) ligands, MD simulations in triplicate revealed important details of the broad specificity of dual-phospho activity enzymes. The ligand O4 hydroxyl position is the only difference between PNP6Pgal and PNP6Pglc, and it was found that residues Gln23 and Trp433 bind strongly to the ligand O3 hydroxyl group in the PNP6Pgal-enzyme complex, but to the ligand O4 hydroxyl group in the PNP6Pglc-enzyme complex. Also, His124 formed many hydrogen bonds with the PNP6Pgal O3 hydroxyl group but had none with PNP6Pglc. Conversely, residues Tyr173, Tyr301, Gln302 and Thr321 formed hydrogen bonds with PNP6Pglc but not PNP6Pgal. Lastly, using multiple 3D structures from various GH1 activities, a large network of conserved interactions between active site residues (and other important residues) was uncovered, which most likely stabilises the loop regions that contain these residues, helping to retain the positions they need for binding molecules. In addition, there exist several differing residue-residue interactions when comparing each of the activities, which could contribute towards individual activity substrate specificity by causing slightly different overall structure and malleability of the active site.
Altogether, the findings in this thesis shed light on the function, mechanisms, dynamics, and ligand-binding of GH1 enzymes – particularly of the 6Pβ-glycosidase activities. , Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 2022
- Full Text:
- Date Issued: 2022-04-08
Computer aided approaches against Human African Trypanosomiasis
- Authors: Kimuda, Magambo Phillip
- Date: 2020
- Subjects: African trypanosomiasis , African trypanosomiasis -- Chemotherapy , Genomics , Macrophage migration inhibitory factor , Trypanosoma brucei , Pteridines , Tetrahydrofolate dehydrogenase , Adenylic acid , Molecular dynamics , Principal components analysis , Bioinformatics , Single nucleotide polymorphisms , Single Nucleotide Variants , Candidate Gene Association Study (CGAS)
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/142542 , vital:38089
- Description: The thesis presented here is divided into two parts under a common theme: the use of computer-based tools, genomics, and in vitro experiments to develop innovative ways of tackling Human African Trypanosomiasis (HAT). Part I of this thesis focused on the human host genetic determinants while Part II focused on the discovery of novel chemotherapeutics against the parasite. Part I is further sub-divided into two parts: The first involves a Candidate Gene Association Study (CGAS) on an African population to identify genetic determinants associated with disease and/or susceptibility to HAT. The second involves studying the effects of missense Single Nucleotide Variants (SNVs) on protein structure, dynamics, and function using Macrophage Migration Inhibitory Factor (MIF) as a case study. Part II is also sub-divided into two parts: The first involves computer-based rational drug discovery of potential inhibitors against the Trypanosoma folate pathway, particularly by targeting Trypanosoma brucei Pteridine Reductase (TbPTR1), which is an enzyme used by trypanosomes to overcome T. brucei Dihydrofolate Reductase (TbDHFR) inhibition. The second involves the derivation of CHARMM force-field parameters that can be used to accurately model the geometry and dynamics of the T. brucei Phosphodiesterase B1 enzyme (TbrPDEB1) bimetallic active site center. The derived parameters were then used in MD simulations to characterise protein-ligand residue interactions that are important in TbrPDEB1 inhibition, with the goal of targeting the cyclic Adenosine Monophosphate (cAMP) signalling pathway. In the CGAS we were unable to detect any genetic associations in the Ugandan cohort analysed that passed correction for multiple testing, in spite of the study being sufficiently powered. Additionally, our study found no association of the Apolipoprotein L1 (APOL1) G2 allele with the previously reported protection against acute HAT.
Future investigations, for example Genome Wide Association Studies using larger sample sizes (>3000 cases and controls), are required. Macrophage migration inhibitory factor (MIF) is a cytokine that is important in both innate and adaptive immunity and has been shown to play a role in T. brucei pathogenicity using murine models. A total of 27 missense SNVs were modelled using homology modelling to create MIF protein mutants that were investigated using in silico effect prediction tools, molecular dynamics (MD), Principal Component Analysis (PCA), and Dynamic Residue Network (DRN) analysis. Our results demonstrate that mutations P2Q, I5M, P16Q, L23F, T24S, T31I, Y37H, H41P, M48V, P44L, G52C, S54R, I65M, I68T, S75F, N106S, and T113S caused significant conformational changes. Further, DRN analysis showed that residues P2, T31, Y37, G52, I65, I68, S75, N106, and T113 are part of a similar local residue interaction network with functional significance. These results show how polymorphisms such as missense SNVs can affect protein conformation, dynamics, and function. Trypanosomes are auxotrophic for folates and pterins but require them for survival, so they scavenge them from their hosts. PTR1 is a multifunctional enzyme, unique to trypanosomatids, that reduces both pterins and folates. In the presence of DHFR inhibitors, PTR1 is over-expressed, thus providing an escape from the effects of DHFR inhibition. Both TbPTR1 and TbDHFR are pharmacologically and genetically validated drug targets. In this study 5742 compounds were screened using molecular docking, and 13 promising binding modes were further analysed using MD simulations. The trajectories were analysed using RMSD, Rg, RMSF, PCA, Essential Dynamics Analysis (EDA), Molecular Mechanics Poisson–Boltzmann surface area (MM-PBSA) binding free energy calculations, and DRN analysis.
The computational screening approach allowed us to identify five compounds, named RUBi004, RUBi007, RUBi014, RUBi016 and RUBi018, that exhibited antitrypanosomal activity against trypanosomes in culture with IC50 values of 12.5 ± 4.8 μM, 32.4 ± 4.2 μM, 5.9 ± 1.4 μM, 28.2 ± 3.3 μM, and 9.7 ± 2.1 μM, respectively. Further, when used in combination with WR99210, a known TbDHFR inhibitor, RUBi004, RUBi007, RUBi014 and RUBi018 showed antagonism while RUBi016 showed an additive effect. These results indicate that the four compounds might be competing with TbDHFR while RUBi016 might be more specific for TbPTR1. These compounds provide scaffolds that can be further optimised to improve their potency and specificity. Lastly, using a systematic approach we derived CHARMM force-field parameters to accurately describe the TbrPDEB1 bi-metal catalytic center. For dynamics, we employed a mixed bonded and non-bonded approach. We optimised the structure using a two-layer QM/MM ONIOM (B3LYP/6-31(g): UFF). The TbrPDEB1 bi-metallic center bonds, angles, and dihedrals were parameterized by fitting the energy profiles from Potential Energy Surface (PES) scans to the CHARMM potential energy function. The parameters were validated by means of MD simulations and analysed using RMSD, Rg, RMSF, hydrogen bonding, bond/angle/dihedral evaluations, EDA, PCA, and DRN analysis. The force-field parameters were able to accurately reproduce the geometry and dynamics of the TbrPDEB1 bi-metal catalytic center during MD simulations. Molecular docking was used to identify 6 potential hits that inhibited trypanosome growth in vitro. The derived force-field parameters were used to simulate the 6 protein-ligand complexes with the aim of elucidating crucial protein-ligand residue interactions. Using the most potent ligand, RUBi022, which had an IC50 of 14.96 μM, we were able to identify key residue interactions that can be of use in in silico prediction of potential TbrPDEB1 inhibitors.
Overall, we demonstrate how bioinformatics tools can complement current disease eradication strategies. Future work will focus on identifying variants through Genome Wide Association Studies and partnering with wet labs to carry out further enzyme-ligand activity relationship studies, structure determination or characterisation of appropriate protein-ligand complexes by crystallography, and site-specific mutation studies.
- Full Text:
- Date Issued: 2020
- Authors: Kimuda, Magambo Phillip
- Date: 2020
- Subjects: African trypanosomiasis , African trypanosomiasis -- Chemotherapy , Genomics , Macrophage migration inhibitory factor , Trypanosoma brucei , Pteridines , Tetrahydrofolate dehydrogenase , Adenylic acid , Molecular dynamics , Principal components analysis , Bioinformatics , Single nucleotide polymorphisms , Single Nucleotide Variants , Candidate Gene Association Study (CGAS)
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/142542 , vital:38089
- Description: The thesis presented here is divided into two parts under a common theme that is the use of computer based tools, genomics, and in vitro experiments to develop innovative ways of tackling Human African Trypanosomiasis (HAT). Part I of this thesis focused on the human host genetic determinants while Part II focused on the discovery of novel chemotherapeutics against the parasite. Part I is further sub-divided into two parts: The first involves a Candidate Gene Association Study (CGAS) on an African population to identify genetic determinants associated with disease and/or susceptibility to HAT. The second involves studying the effects of missense Single Nucleotide Variants (SNVs) on protein structure, dynamics, and function using Macrophage Migration Inhibitory Factor (MIF) as a case study. Part II is also sub-divided into two parts: The first involves a computer based rational drug discovery of potential inhibitors against the Trypanosoma the folate pathway; particularly by targeting Trypanosoma brucei Pteridine Reductase (TbPTR1) which is an enzyme used by trypanosomes to overcome T. brucei Dihydrofolate Reductase (TbDHFR) inhibition. Lastly the derivation of CHARMM force-field parameters that can be used to accurately model the geometry and dynamics of the T. brucei Phosphodiesterase B1 enzyme (TbrPDEB1) bimetallic active site center. The derived parameters were then used in MD simulations to characterise protein-ligand residue interactions that are important in TbrPDEB1 inhibition with the goal of targeting the cyclic Adenosine Monophosphate (cAMP) signalling pathway. In the CGAS we were unable to detect any genetic associations in the Ugandan cohort analysed that passed correction for multiple testing in spite of the study being sufficiently powered. Additionally, our study found no association of the Apo lipoprotein 1 (APOL1) G2 allele association with protection against acute HAT that has been previously reported. 
Future investigations for example, Genome Wide Association Studies using larger samples sizes (>3000 cases and controls) are required. Macrophage migration inhibitory factor (MIF) is a cytokine that is important in both innate and adaptive immunity that has been shown to play a role in T. brucei pathogenicity using murine models. A total of 27 missense SNVs were modelled using homology modelling to create MIF protein mutants that were investigated using in silico effect prediction tools, molecular dynamics (MD), Principal Component Analysis (PCA), and Dynamic Residue Network (DRN) analysis. Our results demonstrate that mutations P2Q, I5M, P16Q, L23F, T24S, T31I, Y37H, H41P, M48V, P44L, G52C, S54R, I65M, I68T, S75F, N106S, and T113S caused significant conformational changes. Further, DRN analysis showed that residues P2, T31, Y37, G52, I65, I68, S75, N106, and T113S are part of a similar local residue interaction network with functional significance. These results show how polymorphisms such as missense SNVs can affect protein conformation, dynamics, and function. Trypanosomes are auxotrophic for folates and pterins but require them for survival. They scavenge them from their hosts. PTR1 is a multifunctional enzyme that is unique to trypanosomatids that reduces both pterins and folates. In the presence of DHFR inhibitors, PTR1 is over-expressed thus providing an escape from the effects of DHFR inhibition. Both TbPTR1 and TbDHFR are pharmacologically and genetically validated drug targets. In this study 5742 compounds were screened using molecular docking, and 13 promising binding modes were further analysed using MD simulations. The trajectories were analysed using RMSD, Rg, RMSF, PCA, Essential Dynamics Analysis (EDA), Molecular Mechanics Poisson–Boltzmann surface area (MM-PBSA) binding free energy calculations, and DRN analysis. 
The computational screening approach allowed us to identify five compounds, named RUBi004, RUBi007, RUBi014, RUBi016 and RUBi018, that exhibited antitrypanosomal growth activities against trypanosomes in culture with IC50 values of 12.5 ± 4.8 μM, 32.4 ± 4.2 μM, 5.9 ± 1.4 μM, 28.2 ± 3.3 μM, and 9.7 ± 2.1 μM, respectively. Further, when used in combination with WR99210, a known TbDHFR inhibitor, RUBi004, RUBi007, RUBi014 and RUBi018 showed antagonism while RUBi016 showed an additive effect. These results indicate that the four antagonistic compounds might be competing with TbDHFR while RUBi016 might be more specific for TbPTR1. These compounds provide scaffolds that can be further optimised to improve their potency and specificity. Lastly, using a systematic approach we derived CHARMM force-field parameters to accurately describe the TbrPDEB1 bi-metal catalytic center. For dynamics, we employed a mixed bonded and non-bonded approach. We optimised the structure using a two-layer QM/MM ONIOM (B3LYP/6-31(g): UFF). The TbrPDEB1 bi-metallic center bonds, angles, and dihedrals were parameterized by fitting the energy profiles from Potential Energy Surface (PES) scans to the CHARMM potential energy function. The parameters were validated by means of MD simulations and analysed using RMSD, Rg, RMSF, hydrogen bonding, bond/angle/dihedral evaluations, EDA, PCA, and DRN analysis. The force-field parameters were able to accurately reproduce the geometry and dynamics of the TbrPDEB1 bi-metal catalytic center during MD simulations. Molecular docking was used to identify 6 potential hits that inhibited trypanosome growth in vitro. The derived force-field parameters were used to simulate the 6 protein-ligand complexes with the aim of elucidating crucial protein-ligand residue interactions. Using the most potent ligand, RUBi022, which had an IC50 of 14.96 μM, we were able to identify key residue interactions that can be of use in in silico prediction of potential TbrPDEB1 inhibitors. 
Overall, we demonstrate how bioinformatics tools can complement current disease eradication strategies. Future work will focus on identifying variants identified in Genome Wide Association Studies and partnering with wet labs to carry out further enzyme-ligand activity relationship studies, structure determination or characterisation of appropriate protein-ligand complexes by crystallography, and site-specific mutation studies.
- Full Text:
- Date Issued: 2020
Generation of a virtual library of terpenes using graph theory, and its application in exploration of the mechanisms of terpene biosynthesis
- Authors: Dendera, Washington
- Date: 2020
- Subjects: Terpenes , Plants -- Metabolism , Computational biology , Bioinformatics , Organic compounds -- Synthesis , Monoterpenes , Molecular biology -- Computer simulation
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/123453 , vital:35439
- Description: Terpenes form a large group of organic compounds which have proven to be of use to many living organisms, being used by plants for metabolism (Pichersky and Gershenzon, 1934; McGarvey and Croteau, 1995; Gershenzon and Dudareva, 2007), defence or as a means to attract pollinators, and by humans in the medical, pharmaceutical and food industries (Bicas, Dionísio and Pastore, 2009; Marmulla and Harder, 2014; Kandi et al., 2015). Following literature methods for generating chemical libraries using graph theoretic techniques, complete libraries of all possible terpene isomers have been constructed, with the goal of constructing derivative libraries of possible carbocation intermediates, which are important in the elucidation of mechanisms in the biosynthesis of terpenes. Virtual library generation of monoterpenes was first achieved by generating graphs of order 7, 8, 9 and 10 using the Nauty and Traces suite. These were screened and processed with a set of collated Python scripts written to recognize the graphs in text format and translate them to molecules, minimizing through Tinker whilst discarding graphs that violate the rules of chemistry. Because of the computational time required, only order 7 and order 10 graphs were processed. Of the 873 graphs generated from order 7, 353 were converted to molecules, and of the 11.7 million produced from order 10, half were processed, resulting in the production of 442928 compounds (repeats included). For screening, 55 366 compounds were docked in the active site of limonene synthase; of these, 2355 ligands had a good Vina docking score, with a binding energy of between -7.0 and -7.4 kcal.mol-1. When these best docked molecules were overlaid in the active site, a map of possible ligand positions within the active site of limonene synthase was traced out.
- Full Text:
- Date Issued: 2020
Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans
- Authors: Brown, David K
- Date: 2018
- Subjects: Bioinformatics , Human genetics -- Variation , High performance computing , Workflow management systems , Molecular dynamics , Next generation sequencing , Human Mutation Analysis (HUMA)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/60708 , vital:27820
- Description: This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, making use of these HPC clusters requires familiarity with command line interfaces. This excludes a large number of researchers from taking advantage of these resources. JMS was developed as a tool to make it easier for researchers without a computer science background to make use of HPC. Additionally, JMS can be used to host computational tools and pipelines and generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that want to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities that it provides. HUMA includes protein, gene, disease, and variation data and can be searched from the angle of any one of these categories. 
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other meta-data). However, the related nature of the database means that genes, diseases, variation, and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original sources of the data, allowing users to follow the links to find additional details. HUMA aims to be a platform for the analysis of genetic variation. As such, it also provides tools to visualize and analyse the data (several of which run on the underlying cluster, via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, part 3 of this thesis focused on the development of a suite of tools, MD-TASK, to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased via the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.
- Full Text:
- Date Issued: 2018
The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies
- Authors: Moses, Vuyani
- Date: 2018
- Subjects: Copper proteins , Cellulose , Molecular dynamics , Cellulose -- Biodegradation , Bioinformatics
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/58327 , vital:27230
- Description: AA9 proteins are metallo-enzymes which are crucial for the early stages of cellulose degradation. AA9 proteins have been suggested to cleave glycosidic bonds linking cellulose through the use of their Cu2+ coordinating active site. AA9 proteins possess different regioselectivities depending on the resulting cleavage they form and, as a result, are grouped accordingly. Type 1 AA9 proteins cleave the C1 carbon of cellulose while Type 2 AA9 proteins cleave the C4 carbon and Type 3 AA9 proteins cleave either C1 or C4 carbons. The steric congestion of the AA9 active site has been proposed to be a contributor to the observed regioselectivity. As such, a bioinformatics characterisation of type-specific sequence and structural features was performed. Initially AA9 protein sequences were obtained from the Pfam database and multiple sequence alignment was performed. The sequences were phylogenetically characterised, grouped into their respective types, and sub-groups were identified. A selection analysis was performed on AA9 LPMO types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess possible effects on substrate interaction. Physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to dynamically assess the consequences of the discovered type-specific features on AA9-cellulose interaction. Due to the absence of AA9 specific force field parameters, MD simulations were not readily applicable. As a result, Potential Energy Surface (PES) scans were performed to evaluate the force field parameters for the AA9 active site using the PM6 semi-empirical approach and least squares fitting. 
A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+ coordinating residues, the Cu2+ ion and two water residues. Due to the similarity in AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations for each AA9 LPMO type were conducted using two separate Lennard-Jones parameter sets. Once completed, the MD trajectories were analysed for various features including the RMSD, RMSF, radius of gyration, coordination during simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins were able to reveal the presence of unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by the presence of unique type-specific loops which were present in Type 2 and 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to result in steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 and not Type 2 and 3 AA9 proteins. In this study force field parameters have been evaluated for the Type 1 active site of AA9 proteins and these parameters were evaluated on all three types and binding. Future work will focus on identifying the nature of the reactive oxygen species and performing QM/MM calculations to elucidate the reactive mechanism of all three AA9 LPMO types.
- Full Text:
- Date Issued: 2018
Comparative study of clan CA cysteine proteases: an insight into the protozoan parasites
- Authors: Moyo, Sipho Dugunye
- Date: 2015
- Subjects: Cysteine proteinases , Proteolytic enzymes , Protozoan diseases , Parasites , Protozoan diseases -- Chemotherapy , Bioinformatics , Plasmodium , Antiprotozoal agents
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4165 , http://hdl.handle.net/10962/d1020309
- Description: Protozoan infections such as malaria, leishmaniasis, toxoplasmosis, Chagas disease and African trypanosomiasis, caused by the Plasmodium, Leishmania, Toxoplasma and Trypanosoma genera, inflict a huge economic, health and social impact in endemic regions, particularly tropical and sub-tropical regions. The combined infections are estimated at over a billion annually and cause approximately 1.1 million deaths annually. The global burden of the protozoan infections is worsened by increased drug resistance, toxicity and the relatively high cost of treatment and prophylaxis. Therefore there has been a high demand for new drugs and drug targets that play a role in parasite virulence. Cysteine proteases have been validated as viable drug targets due to their role in the infectivity stage of the parasites within the human host. There is a variety of cysteine proteases, hence they are subdivided into families, and in this study we focus on the clan CA, papain family C1 proteases. The current inhibitors for the protozoan cysteine proteases lack selectivity and specificity, which contributes to drug toxicity. Therefore there is a need to identify the differences and similarities between the host, vector and protozoan proteases. This study uses a variety of bioinformatics tools to assess these differences and similarities. The Plasmodium cysteine protease FP-2 is the most characterized protease, hence it was used as a reference for all the other proteases; its homologs were retrieved and aligned, and the evolutionary relationships established. The homologs were also analysed for common motifs, and the physicochemical properties were determined and validated using the Kruskal-Wallis test. These analyses revealed that the host and vector cathepsins share similar properties while the parasite cathepsins differ. 
At the sub-site level, sub-site 2 showed greater variation, suggesting diverse ligand specificity within the proteases, a revelation that is vital in the design of antiprotozoan inhibitors.
- Full Text:
- Date Issued: 2015
A central enrichment-based comparison of two alternative methods of generating transcription factor binding motifs from protein binding microarray data
- Authors: Mahaye, Ntombikayise
- Date: 2013 , 2013-03-13
- Subjects: Transcription factors , Bioinformatics , Protein binding , Protein microarrays , Cell lines
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3890 , http://hdl.handle.net/10962/d1003049
- Description: Characterising transcription factor binding sites (TFBS) is an important problem in bioinformatics, since predicting binding sites has many applications such as predicting gene regulation. ChIP-seq is a powerful in vivo method for generating genome-wide putative binding regions for transcription factors (TFs). CentriMo is an algorithm that measures central enrichment of a motif and has previously been used as a motif enrichment analysis (MEA) tool. CentriMo uses the fact that ChIP-seq peak calling methods are likely to be biased towards the centre of the putative binding region, at least in cases where there is direct binding. CentriMo calculates a binomial p-value representing central enrichment, based on the central bias of the binding site with the highest likelihood ratio. In cases where binding is indirect or involves cofactors, a more complex distribution of preferred binding sites may occur but, in many cases, a low CentriMo p-value and low width of maximum enrichment (about 100bp) are strong evidence that the motif in question is the true binding motif. Several other MEA tools have been developed, but they do not consider motif central enrichment. The study investigates the claim made by Zhao and Stormo (2011) that they have identified a simpler method than that used to derive the UniPROBE motif database for creating motifs from protein binding microarray (PBM) data, which they call BEEML-PBM (Binding Energy Estimation by Maximum Likelihood-PBM). To accomplish this, CentriMo is employed on 13 motifs from both motif databases. The results indicate that there is no conclusive difference in the quality of motifs from the original PBM and BEEML-PBM approaches. CentriMo provides an understanding of the mechanisms by which TFs bind to DNA. Out of 13 TFs for which ChIP-seq data is used, BEEML-PBM reports five better motifs, and twice it shows no central enrichment when the best PBM motif does. 
The PBM approach finds seven motifs with better central enrichment. On the other hand, across all variations, the number of examples where PBM is better is not high enough to conclude that it is overall the better approach. Some TFs bind directly to DNA, some indirectly or in combination with other TFs. Some of the predicted mechanisms are supported by literature evidence. This study further revealed that the binding specificity of a TF is different in different cell types and developmental stages. A TF is up-regulated in a cell line where it performs its biological function. The discovery of cell line differences, which has not been done before in any CentriMo study, is interesting and provides reasons to study this further.
- Full Text:
- Date Issued: 2013
Information Flow and Introduction to Bioinformatics: BCH 323
- Authors: Bradley, G , Mabinya, L , Wilhelmi, B
- Date: 2010-02
- Subjects: Bioinformatics
- Language: English
- Type: Examination paper
- Identifier: vital:17850 , http://hdl.handle.net/10353/d1010478
- Description: Information Flow and Introduction to Bioinformatics: BCH 323, February 2010.
- Full Text: false
- Date Issued: 2010-02
A comparative bioinformatic analysis of zinc binuclear cluster proteins
- Authors: Mthombeni, Jabulani S
- Date: 2005
- Subjects: Bioinformatics , Zinc proteins , GABA
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4004 , http://hdl.handle.net/10962/d1004064
- Description: Members of the zinc binuclear cluster family are important fungal transcriptional regulators sharing a common DNA binding domain. Dal81p is a pleiotropic zinc binuclear cluster protein involved in the induction of the UGA genes required for the γ-aminobutyrate nitrogen catabolic pathway in Saccharomyces cerevisiae. The zinc binuclear cluster domain is indispensable for function in Dal81p, and little is known about other domains in this protein. The aim of the study was to explore the zinc binuclear cluster protein family using comparative bioinformatics as a complement to biochemical and structural approaches. A database of all zinc binuclear cluster proteins was compiled. A total of 118 zinc binuclear proteins are reported in this work. Thirty-nine previously unidentified zinc binuclear cluster proteins were found. Four homologues of Dal81p were identified by homology searching. Important sequence motifs were identified in the aligned sequences of Dal81p and its homologues. The coiled coil motif found in the Gal4p zinc binuclear cluster protein could not be identified in Dal81p and its homologues. This suggested that Dal81p did not dimerise through this structural motif as other zinc binuclear cluster proteins do. Solvent-accessible sites that could be phosphorylated by protein kinase C or casein kinase II, and the role of such sites in the possible regulation of Dal81p function, were discussed.
- Full Text:
- Date Issued: 2005
Identification of cis-elements and transacting factors involved in the abiotic stress responses of plants
- Authors: Maclear, Athlee
- Date: 2005 , 2013-06-10
- Subjects: Plants -- Effect of stress on , Proteins -- Analysis , Bioinformatics , DNA , Plant genetics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4074 , http://hdl.handle.net/10962/d1007236
- Description: Many stress situations limit plant growth, resulting in crop production difficulties. Population growth, limited availability and over-utilization of arable land, and intolerant crop species have resulted in tremendous strain being placed on agriculturalists to produce enough to sustain the world's population. An understanding of the principles involved in plant resistance to environmental stress will enable scientists to harness these mechanisms to create stress-tolerant crop species, thus increasing crop production, and enabling the farming of previously unproductive land. This research project uses computational and bioinformatics techniques to explore the promoter regions of genes encoding proteins that are up- or down-regulated in response to specific abiotic stresses, with the aim of identifying common patterns in the cis-elements governing the regulation of these abiotic stress responsive genes. An initial dataset of fifty known genes encoding proteins reported to be up- or down-regulated in response to plant stresses that result in water deficit at the cellular level, viz. drought, low temperature, and salinity, was identified, and a PostgreSQL database created to store relevant information pertaining to these genes and the proteins encoded by them. The genomic DNA was obtained where possible, and the promoter and intron regions identified. The Neural Network Promoter Prediction (NNPP) software package was used to predict the transcription start signal (TSS), and the promoter searching software tool TESS (Transcription Element Search Software) was used to identify known and user-defined cis-elements within the promoter regions of these genes. Currently available promoter prediction software analysis tools are reported to predict one promoter per kilobase of DNA, whilst functional promoters are thought to occur only once in 30-40 kilobases, which indicates that a large percentage of predictions are likely to be false positives (Pedersen et al., 1999). NNPP was chosen as it was rated as the highest performing promoter prediction software tool by Fickett and Hatzigeorgiou (1997) in a thorough review of eukaryotic promoter prediction algorithms; however, results were less than promising, as very few predicted TSS were identified in the area 50 bp up- and downstream of the gene start site, where biologically functional TSSs are known to occur (Reese, 2000; Fickett and Hatzigeorgiou, 1997). TESS results seemed to support the hypothesis that drought, low-temperature and high-salinity plant stress response proteins have similar cis-elements in their promoter regions, and suggested links to various other gene regulation mechanisms, viz. gibberellin-, light-, auxin- and development-regulated gene expression, highlighting the vast complexity of plant stress response processes. Although far from conclusive, results provide a valuable basis for future comparative promoter studies that will attempt to deduce possible common transcriptional initiation of abiotic stress response genes.
- Full Text:
- Date Issued: 2005
Stress-inducible protein 1: a bioinformatic analysis of the human, mouse and yeast STI1 gene structure
- Authors: Aken, Bronwen Louise
- Date: 2005
- Subjects: Molecular chaperones , Proteins -- Analysis , Heat shock proteins , Bioinformatics , Genetics -- Data processing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3990 , http://hdl.handle.net/10962/d1004049
- Description: Stress-inducible protein 1 (Sti1) is a 60 kDa eukaryotic protein that is important under stress and non-stress conditions. Human Sti1 is also known as the Hsp70/Hsp90 organising protein (Hop) that coordinates the functional cooperation of heat shock protein 70 (Hsp70) and heat shock protein 90 (Hsp90) during the folding of various transcription factors and kinases, including certain oncogenic proteins and prion proteins. Limited studies have been conducted on the STI1 gene structure. Thus, the aim of this study was to develop a comprehensive description of human STI1 (hSTI1), mouse STI1 (mSTI1), and yeast STI1 (ySTI1) genes, using a bioinformatic approach. Genes encoded near the STI1 loci were identified for the three organisms using National Centre for Biotechnology Information (NCBI) MapViewer and the Saccharomyces Genome Database. Exon/intron boundaries were predicted using Hidden Markov model gene prediction software (HMMGene) and Genscan, and by alignment of the mRNA sequence with the genomic DNA sequence. Transcription factor binding sites (TFBS) were predicted by scanning the region 1000 base pairs (bp) upstream of the STI1 orthologues’ transcription start site (TSS) with Alibaba, Transcription element search software (TESS) and Transcription factor search (TFSearch). The promoter region was defined by comparing the number, type and position of TFBS across the orthologous STI1 genes. Additional putative TFBS were identified for ySTI1 by searching with software that aligns nucleic acid conserved elements (AlignACE) for over-represented motifs in the region upstream of the TSS of genes thought to be co-regulated with ySTI1. This study showed that hSTI1 and mSTI1 occur in a region of synteny with a number of genes of related function. Both hSTI1 and mSTI1 comprised 14 putative exons, while ySTI1 was encoded on a single exon. 
Human and mouse STI1 shared a perfectly conserved 55 bp region spanning their predicted TSS, although their TATA boxes were not conserved. A putative CpG island was identified in the region from -500 to +100 bp relative to the hSTI1 and mSTI1 TSS. This region overlapped with a region of high TFBS density, suggesting that the core promoter region was located in the region approximately 100 to 200 bp upstream of the TSS. Several conserved clusters of TFBS were also identified upstream of this promoter region, including binding sites for stimulatory protein 1 (Sp1), heat shock factor (HSF), nuclear factor kappa B (NF-kappaB), and the cAMP/enhancer binding protein (C/EBP). Microarray data suggested that ySTI1 was co-regulated with several heat shock proteins and substrates of the Hsp70/Hsp90 heterocomplex, and several putative regulatory elements were identified in the upstream region of these co-regulated genes, including a motif for HSF binding. The results of this research suggest several avenues of future experimental work, including the confirmation of the proposed core promoter, upstream regulatory elements, and CpG island, and the investigation into the co-regulation of mammalian STI1 with its surrounding genes. These results could also be used to inform STI1 gene knockout experiments in mice, to assess the biological importance of mammalian STI1.
- Full Text:
- Date Issued: 2005
The role of parallel computing in bioinformatics
- Authors: Akhurst, Timothy John
- Date: 2005
- Subjects: Bioinformatics , Parallel programming (Computer science) , LINDA (Computer system) , Java (Computer program language) , Parallel processing (Electronic computers) , Genomics -- Data processing
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3986 , http://hdl.handle.net/10962/d1004045
- Description: The need to intelligibly capture, manage and analyse the ever-increasing amount of publicly available genomic data is one of the challenges facing bioinformaticians today. Such analyses are in fact impractical using uniprocessor machines, which has led to an increasing reliance on clusters of commodity-priced computers. An existing network of cheap, commodity PCs was utilised as a single computational resource for parallel computing. The performance of the cluster was investigated using a whole genome-scanning program written in the Java programming language. The TSpaces framework, based on the Linda parallel programming model, was used to parallelise the application. Maximum speedup was achieved at between 30 and 50 processors, depending on the size of the genome being scanned. Together with this, the associated significant reductions in wall-clock time suggest that both parallel computing and Java have a significant role to play in the field of bioinformatics.
- Full Text:
- Date Issued: 2005