Challenges with modelling transcription factor binding
- Machanick, Philip, Kibet, Caleb K
- Authors: Machanick, Philip , Kibet, Caleb K
- Date: 2017
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/439158 , vital:73551 , 10.1109/NEXTCOMP.2017.8016178
- Description: Modelling transcription factor binding presents a number of challenges. In its simplest form, binding can be modelled by a consensus sequence, but several factors, including degeneracy of binding sites, alternative modes of binding and differences between artificially constructed and in vivo DNA, make modelling binding complex. In this paper we outline the difficulties and report on progress with improving modelling of binding. We focus on improving measurement of binding models, a necessary prerequisite for finding better models.
- Date Issued: 2017
MARS: Motif Assessment and Ranking Suite for transcription factor binding motifs
- Kibet, Caleb K, Machanick, Philip
- Authors: Kibet, Caleb K , Machanick, Philip
- Date: 2016
- Language: English
- Type: article , text
- Identifier: http://hdl.handle.net/10962/61155 , vital:27985 , http://dx.doi.org/10.1101/065615
- Description: We describe MARS (Motif Assessment and Ranking Suite), a web-based suite of tools used to evaluate and rank PWM-based motifs. Our motivation is the growing number of learned motif models, spread across databases and stored in different PWM formats, which leaves users with a choice dilemma. This growth has been driven by the difficulty of modelling transcription factor binding sites and by advances in high-throughput sequencing technologies at continually falling cost. As a result, several experimental techniques have been developed, giving rise to diverse motif-finding algorithms and databases. We collate a wide variety of available motifs into a benchmark database, including the corresponding experimental ChIP-seq and PBM data obtained from the ENCODE and UniPROBE databases, respectively. The implemented tools include: a data-independent consistency-based motif assessment and ranking (CB-MAR), which is based on the idea that 'correct motifs' are more similar to each other while incorrect motifs will differ from each other; and scoring and classification-based algorithms, which rank binding models by their ability to discriminate sequences known to contain binding sites from those without. The CB-MAR and scoring techniques have median rank correlations of 0.86 and 0.73 using ChIP-seq and PBM data, respectively. The best motifs selected by CB-MAR achieve a mean AUC of 0.75, comparable to the 0.76 achieved by those ranked using held-out data; this is based on ChIP-seq motif discovery using five algorithms on 110 transcription factors. We have demonstrated the benefit of this web server in motif choice and ranking, as well as in motif.
- Date Issued: 2016
Transcription factor motif quality assessment requires systematic comparative analysis [version 2; referees: 2 approved]
- Kibet, Caleb K, Machanick, Philip
- Authors: Kibet, Caleb K , Machanick, Philip
- Date: 2016
- Language: English
- Type: article , text
- Identifier: http://hdl.handle.net/10962/61169 , vital:27987 , http://dx.doi.org/10.12688/f1000research.7408.2
- Description: Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers, with a subset making it into databases such as JASPAR, UniPROBE and TRANSFAC. The presence of many versions of motifs for a single TF across the various databases, and the lack of a standardized assessment technique, make it difficult for biologists to choose an appropriate binding model and for algorithm developers to benchmark, test and improve their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence the ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We further demonstrate that the information content of a motif is not, in isolation, a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
- Date Issued: 2016