A model for measuring and predicting stress for software developers using vital signs and activities
- Authors: Hibbers, Ilze
- Date: 2024-04
- Subjects: Machine learning , Neural networks (Computer science) , Computer software developers
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/63799 , vital:73614
- Description: Occupational stress is a well-recognised issue that affects individuals in various professions and industries. Reducing occupational stress has multiple benefits, such as improving employees' health and performance. This study proposes a model to measure and predict occupational stress using data collected in a real IT office environment. Different data sources, such as questionnaires, application software (RescueTime) and Fitbit smartwatches, were used for collecting heart rate (HR), facial emotions, computer interactions, and application usage. The results of the Demand Control Support and Effort and Reward questionnaires indicated that the participants experienced high social support and an average level of workload. Participants also reported their daily perceived stress and workload level using a 5-point score. The perceived stress of the participants was overall neutral. No correlation was found between HR, interactions, fear, and meetings. K-means and Bernoulli algorithms were applied to the dataset and two well-separated clusters were formed. The centroids indicated that higher heart rates were grouped either with meetings or had a higher difference in the centre point values for interactions. Silhouette scores and 5-fold cross-validation were used to measure the accuracy of the clusters. However, these clusters were unable to predict the daily reported stress levels. Calculations were done on the computer usage data to measure interaction speeds and time spent working, in meetings, or away from the computer. These calculations were used as input into a decision tree with the reported daily stress levels. The results of the tree helped to identify which patterns lead to stressful days. The results indicated that days with high time pressure led to more reported stress. A new, more general tree was developed, which was able to predict 82 per cent of the daily stress reported.
The main discovery of the research was that stress does not have a straightforward connection with computer interactions, facial emotions, or meetings. High interactions sometimes lead to stress and other times do not. So, predicting stress involves finding patterns and how data from different data sources interact with each other. Future work will revolve around validating the model in more office environments around South Africa. , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2024
- Full Text:
- Date Issued: 2024-04
Augmenting the Moore-Penrose generalised Inverse to train neural networks
- Authors: Fang, Bobby
- Date: 2024-04
- Subjects: Neural networks (Computer science) , Machine learning , Mathematical optimization -- Computer programs
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/63755 , vital:73595
- Description: An Extreme Learning Machine (ELM) is a non-iterative and fast feedforward neural network training algorithm which uses the Moore-Penrose (MP) generalised inverse of a matrix to compute the weights of the output layer of the neural network, using a random initialisation for the hidden layer. While ELM has been used to train feedforward neural networks, the effectiveness of the MP generalised inverse in training recurrent neural networks is yet to be investigated. The primary aim of this research was to investigate how biases in the output layer and the MP generalised inverse can be used to train recurrent neural networks. To accomplish this, the Bias Augmented ELM (BA-ELM), which concatenates the hidden layer output matrix with a column vector of ones to simulate the biases in the output layer, was proposed. A variety of datasets generated from optimisation test functions, as well as real-world regression and classification datasets, were used to validate BA-ELM. The results showed that, in specific circumstances, BA-ELM was able to perform better than ELM. Following this, Recurrent ELM (R-ELM) was proposed, which uses a recurrent hidden layer instead of a feedforward hidden layer. Recurrent neural networks also rely on having functional feedback connections in the recurrent layer. A hybrid training algorithm, Recurrent Hybrid ELM (R-HELM), was proposed, which uses a gradient-based algorithm to optimise the recurrent layer and the MP generalised inverse to compute the output weights. The evaluation of the R-ELM and R-HELM algorithms was carried out using three different recurrent architectures on two recurrent tasks derived from the Susceptible-Exposed-Infected-Removed (SEIR) epidemiology model. Various training hyperparameters were evaluated to determine their effect on the hybrid training algorithm.
With optimal hyperparameters, the hybrid training algorithm was able to achieve better performance than the conventional gradient-based algorithm. , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2024
- Full Text:
- Date Issued: 2024-04
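The ELM and BA-ELM output-layer computation summarised in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `train_elm` and `predict_elm`, the `tanh` activation, the hidden layer size and the toy regression data are all hypothetical choices made here. Only the two ideas named in the abstract are taken from the source: output weights computed with the Moore-Penrose generalised inverse, and a ones column appended to the hidden layer output matrix to act as an output bias.

```python
import numpy as np


def hidden_output(X, W, b, augment_bias):
    """Hidden layer output matrix H, optionally with an appended ones column."""
    H = np.tanh(X @ W + b)  # activation is an assumption of this sketch
    if augment_bias:
        # BA-ELM idea: the ones column lets the solve learn an output bias.
        H = np.hstack([H, np.ones((H.shape[0], 1))])
    return H


def train_elm(X, y, n_hidden, rng, augment_bias=False):
    """Non-iterative fit: random hidden layer, pseudo-inverse output layer."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random and never trained
    b = rng.normal(size=n_hidden)
    H = hidden_output(X, W, b, augment_bias)
    beta = np.linalg.pinv(H) @ y  # Moore-Penrose least-squares solution
    return W, b, beta


def predict_elm(X, W, b, beta, augment_bias=False):
    return hidden_output(X, W, b, augment_bias) @ beta


# Toy regression target with a constant offset (hypothetical data).
X = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
y = np.sin(X[:, 0]) + 5.0

for augment in (False, True):
    # Same seed for both runs so they share the same random hidden layer.
    W, b, beta = train_elm(X, y, 5, np.random.default_rng(0), augment)
    mse = np.mean((predict_elm(X, W, b, beta, augment) - y) ** 2)
    print("BA-ELM" if augment else "ELM", "training MSE:", mse)
```

With an identical random hidden layer, the augmented matrix contains every column of the plain one, so its least-squares training error cannot be larger. That says nothing about generalisation, which is consistent with the abstract's claim that BA-ELM was better only in specific circumstances.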
Computer vision as a tool for tracking gastropod chemical trails
- Authors: Viviers, Andre
- Date: 2024-04
- Subjects: Computers , Electronic data processing , Machine learning
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/64863 , vital:73934
- Description: The difficulties encountered in previous gastropod research with human intervention (Raw, Miranda, & Perissinotto, 2013) inspired this dissertation. More specifically, it was motivated by the tedious task of human intervention in the tracking of gastropod chemical trails, which is a time-consuming and error-prone exercise. In this study, computer vision is proposed as an alternative to human intervention. A machine learning literature review was conducted to identify relevant methodologies and techniques for the research. Furthermore, it investigates data pre-processing techniques on a variety of different data types. This sets the stage for a deeper investigation of techniques used for pre-processing image and video data. Following that, another literature review delved deeper into the computer vision pipeline. The review is divided into two parts: data pre-processing and model training. First, it provides a deeper investigation into relevant data pre-processing techniques for use in constructing a dataset comprising gastropod images. Following that, it delves into the complexities of training a computer vision model. The study then investigates convolutional neural networks, revealing the neural networks’ suitability for image/video processing. A convolutional neural network is selected as the foundation for the best-effort model. This serves as the foundation for the subsequent experimental research. The first part of the experimental work involves creating a labelled dataset from the video dataset provided by Raw et al. (2013). By employing data pre-processing techniques in a strategic manner, an unlabelled dataset is generated. Then a labelled dataset is generated using a simple K-Means clustering algorithm and manual labelling. Thereafter, a best-effort model is trained to detect gastropods within images using this dataset.
After creating the labelled dataset, the next step in the exploration is to build a prototype that can find gastropods and draw trace lines based on their movement. Five evaluation runs serve to gauge the prototype’s effectiveness. Videos with varying properties from the original dataset are purposefully chosen for each run. The prototype’s trace lines are compared to the original dataset’s human-drawn pathways. The versatility of the prototype is demonstrated in the final evaluation by generating fine-grained trace lines in post-processing. This enables the plot to be adjusted to different parameters based on the characteristics that the resulting plot should have. Through the versatility and accuracy demonstrated by the evaluation runs, this research found that a gastropod tracking solution based on computer vision can reduce the need for human intervention. The dissertation concludes with a discourse on the lessons learned from the research study. These are presented as guidelines to aid future work in developing a gastropod tracking solution based on computer vision. , Thesis (MIT) -- Faculty of Engineering, the Built Environment, and Technology, School of Information Technology, 2024
- Full Text:
- Date Issued: 2024-04
Supporting competitive robot game mission planning using machine learning
- Authors: Strydom, Elton
- Date: 2024-04
- Subjects: Machine learning , High performance computing , Robotics , LEGO Mindstorms toys Computer programming
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/64841 , vital:73929
- Description: This dissertation presents a study aimed at supporting the strategic planning and execution of missions in competitive robot games, particularly in the FIRST LEGO® League (FLL), through the use of machine learning techniques. The primary objective is to formulate guidelines for evaluating mission strategies using machine learning techniques within the FLL landscape, thereby supporting participants in the mission strategy design journey within the FLL robot game. The research methodology encompasses a literature review, focusing on the current practices in the FLL mission strategy design process. This is followed by a literature review of machine learning techniques at a broad level, narrowing towards evolutionary algorithms. The study then delves into the specifics of genetic algorithms, exploring their suitability and potential advantages for mission strategy evaluation in competitive robotic environments within the FLL robot game. A significant portion of the research involves the development and testing of a prototype system that applies a genetic algorithm to simulate and evaluate different mission strategies, providing a practical tool for FLL teams. During the development of the evaluation prototype, guidelines were formulated in line with the primary research objective. Key findings of this study highlight the effectiveness of genetic algorithms in identifying optimal mission strategies. The prototype demonstrates the feasibility of using machine learning to provide real-time feedback to participating teams, enabling more informed decision-making in the formulation of mission strategies. , Thesis (MIT) -- Faculty of Engineering, the Built Environment, and Technology, School of Information Technology, 2024
- Full Text:
- Date Issued: 2024-04
Natural Language Processing with machine learning for anomaly detection on system call logs
- Authors: Goosen, Christo
- Date: 2023-10-13
- Subjects: Natural language processing (Computer science) , Machine learning , Information security , Anomaly detection (Computer security) , Host-based intrusion detection system
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/424699 , vital:72176
- Description: Host intrusion detection systems and machine learning have been studied for many years, especially on datasets like KDD99. Current research and systems are focused on problems with low training and processing complexity, such as system call returns, which lack the system call arguments and potential traces of exploits run against a system. With respect to malware and vulnerabilities, signatures are relied upon, and the potential for natural language processing of the resulting logs and system call traces needs further experimentation. This research looks at unstructured raw system call traces from x86_64 GNU/Linux operating systems with natural language processing and supervised and unsupervised machine learning techniques to identify current and unseen threats. The research explores whether these tools are within the skill set of information security professionals, or require data science professionals. The research makes use of a modern academic system call dataset from Leipzig University and applies two machine learning models based on decision trees. Random Forest as the supervised algorithm is compared to the unsupervised Isolation Forest algorithm, with each experiment repeated after hyper-parameter tuning. The research finds conclusive evidence that the Isolation Forest algorithm is effective, when paired with a Principal Component Analysis, in identifying anomalies in the modern Leipzig Intrusion Detection Data Set (LID-DS) combined with samples of executed malware from the Virus Total Academic dataset. The base or default model parameters produce sub-optimal results, whereas using a hyper-parameter tuning technique increases the accuracy to promising levels for anomaly and potential zero-day detection. , Thesis (MSc) -- Faculty of Science, Computer Science, 2023
- Full Text:
- Date Issued: 2023-10-13
Selected medicinal plants leaves identification: a computer vision approach
- Authors: Deyi, Avuya
- Date: 2023-10-13
- Subjects: Deep learning (Machine learning) , Machine learning , Convolutional neural network , Computer vision in medicine , Medicinal plants
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/424552 , vital:72163
- Description: Identifying and classifying medicinal plants are valuable and essential skills during drug manufacturing because several active pharmaceutical ingredients (API) are sourced from medicinal plants. For many years, identifying and classifying medicinal plants have been exclusively done by experts in the domain, such as botanists and herbarium curators. Recently, powerful computer vision technologies, using machine learning and deep convolutional neural networks, have been developed for classifying or identifying objects in images. A convolutional neural network is a deep learning architecture that outperforms previous advanced approaches in image classification and object detection based on its efficient feature extraction from images. In this thesis, we investigate different convolutional neural networks and machine learning algorithms for identifying and classifying leaves of three species of the genus Brachylaena. The three species considered are Brachylaena discolor, Brachylaena ilicifolia and Brachylaena elliptica. All three species are used medicinally by people in South Africa to treat diseases like diabetes. From 1259 labelled images of those plant species (at least 400 for each species) split into training, evaluation and test sets, we trained and evaluated different deep convolutional neural networks and machine learning models. The VGG model achieved the best results with 98.26% accuracy from cross-validation. , Thesis (MSc) -- Faculty of Science, Mathematics, 2023
- Full Text:
- Date Issued: 2023-10-13
A systematic methodology to evaluating optimised machine learning based network intrusion detection systems
- Authors: Chindove, Hatitye Ethridge
- Date: 2022-10-14
- Subjects: Intrusion detection systems (Computer security) , Machine learning , Computer networks -- Security measures , Principal components analysis
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/362774 , vital:65361
- Description: A network intrusion detection system (NIDS) is essential for mitigating computer network attacks in various scenarios. However, the increasing complexity of computer networks and attacks makes classifying unseen or novel network traffic challenging. Supervised machine learning (ML) techniques used in a NIDS can be affected by different scenarios. Thus, dataset recency, size, and applicability are essential factors when selecting and tuning a machine learning classifier. This thesis explores developing and optimising several supervised ML algorithms with relatively new datasets constructed to depict real-world scenarios. The methodology includes empirical analyses of systematic ML-based NIDS for a near real-world network system to improve intrusion detection. The thesis is experiment-heavy for model assessment. Data preparation methods are explored, followed by feature engineering techniques. The model evaluation process involves three experiments testing against a validation, un-trained, and retrained set. The experiments compare several traditional machine learning and deep learning classifiers to identify the best NIDS model. Results show that the focus on feature scaling, feature selection methods and ML algorithm hyper-parameter tuning per model is an essential optimisation component. Distance-based ML algorithms performed much better with quantile transformation whilst the tree-based algorithms performed better without scaling. Permutation importance performs better as a feature selection method compared to feature extraction using Principal Component Analysis (PCA) when applied against all ML algorithms explored. Random forests, Support Vector Machines and recurrent neural networks consistently achieved the best results with high macro F1-score results of 90%, 81% and 73% for the CICIDS 2017 dataset; and 72%, 68% and 73% against the CICIDS 2018 dataset. , Thesis (MSc) -- Faculty of Science, Computer Science, 2022
- Date Issued: 2022-10-14
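The abstract above reports macro F1-scores. As a minimal sketch of what that metric computes (not the thesis's own code; the labels and predictions below are invented for illustration), the macro F1-score is the unweighted mean of the per-class F1 values, so a rare attack class counts as much as the dominant benign class:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical traffic labels: 8 benign flows, 2 attack flows.
y_true = ["benign"] * 8 + ["attack"] * 2
# A classifier that labels everything benign is 80% accurate...
y_pred = ["benign"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # accuracy looks respectable
print(macro_f1(y_true, y_pred))  # ...but macro F1 exposes the missed attack class
```

Macro averaging is one plausible reason to prefer this metric on intrusion datasets where attack classes are far rarer than benign traffic.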
A multispectral and machine learning approach to early stress classification in plants
- Authors: Poole, Louise Carmen
- Date: 2022-04-06
- Subjects: Machine learning , Neural networks (Computer science) , Multispectral imaging , Image processing , Plant stress detection
- Language: English
- Type: Master's thesis , text
- Identifier: http://hdl.handle.net/10962/232410 , vital:49989
- Description: Crop loss and failure can impact both a country’s economy and food security, often with devastating effect. As such, successfully detecting plant stresses early in their development is essential to minimize spread and damage to crop production. Identification of the stress and the stress-causing agent is the most critical and challenging step in plant and crop protection. With the development of and increase in ease of access to new equipment and technology in recent years, the use of spectroscopy in the early detection of plant diseases has become notably popular. This thesis narrows down the most suitable multispectral imaging techniques and machine learning algorithms for early stress detection. Datasets were collected of visible images and multispectral images. Dehydration was selected as the plant stress type for the main experiments, and data was collected from six plant species typically used in agriculture. Key contributions of this thesis include multispectral and visible datasets showing plant dehydration as well as a separate preliminary dataset on plant disease. Promising results on dehydration showed statistically significant accuracy improvements in the multispectral imaging compared to visible imaging for early stress detection, with multispectral input obtaining a 92.50% accuracy over visible input’s 77.50% on general plant species. The system was effective at stress detection on known plant species, with multispectral imaging introducing greater improvement to early stress detection than advanced stress detection. Furthermore, strong species discrimination was achieved when exclusively testing either early or advanced dehydration against healthy species. , Thesis (MSc) -- Faculty of Science, Ichthyology & Fisheries Sciences, 2022
- Date Issued: 2022-04-06
Statistical and Mathematical Learning: an application to fraud detection and prevention
- Authors: Hamlomo, Sisipho
- Date: 2022-04-06
- Subjects: Credit card fraud , Bootstrap (Statistics) , Support vector machines , Neural networks (Computer science) , Decision trees , Machine learning , Cross-validation , Imbalanced data
- Language: English
- Type: Master's thesis , text
- Identifier: http://hdl.handle.net/10962/233795 , vital:50128
- Description: Credit card fraud is an ever-growing problem. There has been a rapid increase in the rate of fraudulent activities in recent years, resulting in a considerable loss to several organizations, companies, and government agencies. Many researchers have focused on detecting fraudulent behaviours early using advanced machine learning techniques. However, credit card fraud detection is not a straightforward task since fraudulent behaviours usually differ for each attempt and the dataset is highly imbalanced, that is, the frequency of non-fraudulent cases outnumbers the frequency of fraudulent cases. In the case of the European credit card dataset, we have a ratio of approximately one fraudulent case to five hundred and seventy-eight non-fraudulent cases. Different methods were implemented to overcome this problem, namely random undersampling, one-sided sampling, SMOTE combined with Tomek links and parameter tuning. Predictive classifiers, namely logistic regression, decision trees, k-nearest neighbour, support vector machine and multilayer perceptrons, are applied to predict if a transaction is fraudulent or non-fraudulent. The model's performance is evaluated based on recall, precision, F1-score, the area under the receiver operating characteristic curve, geometric mean and Matthews correlation coefficient. The results showed that the logistic regression classifier performed better than other classifiers except when the dataset was oversampled. , Thesis (MSc) -- Faculty of Science, Statistics, 2022
- Date Issued: 2022-04-06
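The evaluation metrics named in the abstract above can be illustrated with a short sketch. It is not taken from the thesis: the confusion-matrix counts are hypothetical, chosen only to mirror the roughly 1:578 class ratio the abstract mentions, and the function name is an assumption. It shows why accuracy alone misleads on such data while the geometric mean and the Matthews correlation coefficient (MCC) do not:

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Compute common imbalanced-data metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Geometric mean of sensitivity and specificity
    g_mean = math.sqrt(recall * specificity)
    # MCC; conventionally 0 when any marginal total is zero
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Hypothetical test set: 100 fraudulent and 57,800 legitimate transactions (1:578).
# A model that never flags fraud still scores above 99% accuracy.
never_flags = imbalance_metrics(tp=0, fp=0, tn=57_800, fn=100)
perfect = imbalance_metrics(tp=100, fp=0, tn=57_800, fn=0)

for name, result in (("never flags fraud", never_flags), ("perfect", perfect)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

In this illustration the degenerate model's recall, geometric mean and MCC all collapse to zero, which is the behaviour that makes these metrics useful alongside accuracy.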
A model for recommending related research papers: A natural language processing approach
- Authors: Van Heerden, Juandre Anton
- Date: 2022-04
- Subjects: Machine learning , Artificial intelligence
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/55668 , vital:53405
- Description: The volume of information generated lately has led to information overload, which has impacted researchers’ decision-making capabilities. Researchers have access to a variety of digital libraries to retrieve information. Digital libraries often offer access to a number of journal articles and books. Although digital libraries have search mechanisms, it still takes much time to find related research papers. The main aim of this study was to develop a model that uses machine learning techniques to recommend related research papers. The conceptual model was informed by literature on recommender systems in other domains. Furthermore, a literature survey on machine learning techniques helped to identify candidate techniques that could be used. The model comprises four phases. These phases are completed twice, the first time for learning from the data and the second time when a recommendation is sought. The four phases are: (1) identifying and removing stopwords, (2) stemming the data, (3) identifying the topics for the model, and (4) measuring similarity between documents. The model is implemented and demonstrated using a prototype to recommend research papers using a natural language processing approach. The prototype underwent three iterations. The first iteration focused on understanding the problem domain by exploring how recommender systems and related techniques work. The second iteration focused on pre-processing techniques, topic modeling and similarity measures of two probability distributions. The third iteration focused on refining the prototype, and documenting the lessons learned throughout the process. Practical lessons were learned while finalising the model and constructing the prototype. These practical lessons should help to identify opportunities for future research. , Thesis (MIT) -- Faculty of Engineering, the Built Environment and Technology, Information Technology, 2022
- Date Issued: 2022-04
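The four phases listed in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the prototype described in the thesis: the stopword list is a tiny placeholder, `crude_stem` is a hypothetical suffix stripper rather than a real stemmer, normalised term frequencies stand in for a fitted topic model, and Jensen-Shannon divergence is assumed as the measure between two probability distributions (the abstract does not name the measure used).

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on"}  # placeholder list


def crude_stem(token):
    """Phase 2 stand-in: strip a few common suffixes (not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Phases 1 and 2: drop stopwords, then stem what remains."""
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    return [crude_stem(t) for t in tokens]


def term_distribution(tokens, vocabulary):
    """Phase 3 stand-in: a probability distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary)
    if total == 0:  # empty document: fall back to a uniform distribution
        return [1 / len(vocabulary)] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]


def js_divergence(p, q):
    """Phase 4: Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    docs = {name: preprocess(text) for name, text in candidates.items()}
    query_tokens = preprocess(query)
    vocabulary = sorted(set(query_tokens).union(*docs.values()))
    p = term_distribution(query_tokens, vocabulary)
    scored = [(js_divergence(p, term_distribution(toks, vocabulary)), name)
              for name, toks in docs.items()]
    return [name for _, name in sorted(scored)]


# Invented example documents, for illustration only.
candidates = {
    "nlp": "topic models for recommending research papers",
    "bio": "protein dynamics and drug resistance",
}
print(rank_by_similarity("recommending related research papers", candidates))
```

A real implementation would likely replace `term_distribution` with per-document topic proportions from a trained topic model; the ranking step would stay the same.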
Applying insights from machine learning towards guidelines for the detection of text-based fake news
- Authors: Ngada, Okuhle
- Date: 2021-12
- Subjects: Machine learning , Fake News
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/60243 , vital:64141
- Description: Web-based technologies have fostered an online environment where information can be disseminated in a fast and cost-effective manner whilst targeting large and diverse audiences. Unfortunately, the rise and evolution of web-based technologies have also created an environment where false information, commonly referred to as “fake news”, spreads rapidly. The effects of this spread can be catastrophic. Finding solutions to the problem of fake news is complicated for a myriad of reasons, such as: what is defined as fake news, the lack of quality datasets available to researchers, the topics covered in such data, and the fact that datasets exist in a variety of languages. The effects of false information dissemination can result in reputational damage, financial damage to affected brands, and ultimately, misinformed online news readers who can make misinformed decisions. The objective of the study is to propose a set of guidelines that can be used by other system developers to implement misinformation detection tools and systems. The guidelines are constructed using findings from the experimentation phase of the project and information uncovered in the literature review conducted as part of the study. A selection of machine and deep learning approaches are examined to test the applicability of cues that could separate fake online articles from real online news articles. Key performance metrics such as precision, recall, accuracy, F1-score, and ROC are used to measure the performance of the selected machine learning and deep learning models. To demonstrate the practicality of the guidelines and allow for reproducibility of the research, each guideline provides background information relating to the identified problem, a solution to the problem through pseudocode, code excerpts using the Python programming language, and points of consideration that may assist with the implementation. 
, Thesis (MA) -- Faculty of Engineering, the Built Environment, and Technology, 2021
- Date Issued: 2021-12
Application of machine learning, molecular modelling and structural data mining against antiretroviral drug resistance in HIV-1
- Authors: Sheik Amamuddy, Olivier Serge André
- Date: 2020
- Subjects: Machine learning , Molecules -- Models , Data mining , Neural networks (Computer science) , Antiretroviral agents , Protease inhibitors , Drug resistance , Multidrug resistance , Molecular dynamics , Renin-angiotensin system , HIV (Viruses) -- South Africa , HIV (Viruses) -- Social aspects -- South Africa , South African Natural Compounds Database
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115964 , vital:34282
- Description: Millions are affected by the Human Immunodeficiency Virus (HIV) worldwide, even though the death toll is on the decline. Antiretrovirals (ARVs), more specifically protease inhibitors, have shown tremendous success since their introduction into therapy in the mid-1990s by slowing down progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for due to viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs, with differing associated levels of toxicities in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) can manifest class-wide resistance due to similarities in design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription together with treatment adherence are very important as drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell count and the determination of viral load from patients in resource-limited settings are very helpful to track how well a patient’s immune system is able to keep the virus in check, they can be lengthy in determining whether an ARV is effective. Phenosense assay kits answer this problem using viruses engineered to contain the patient sequences and evaluating their growth in the presence of different ARVs, but this can be expensive and too involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy on the basis of drug resistance mutation patterns.
However, these are inherently complex and the various methods of in silico prediction, such as Geno2pheno, REGA and Stanford HIVdb do not always agree in every case, even though this gap decreases as the list of resistance mutations is updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data containing an overwhelming majority of B subtype HIV, when these only comprise about 12% of the worldwide HIV infections. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that as subtyping is a phylogenetic classification, the more divergent a subtype is from the strains used in training prediction models, the less their resistance profiles would correlate. For the aforementioned reasons, we used a multi-faceted approach to attack the virus in multiple ways. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease. Finally, (3) we screen for protease inhibitors amongst a database of natural compounds [the South African natural compound database (SANCDB)] to find molecules or molecular properties usable to come up with improved inhibition against the drug target. In this work, structural information was mined using the Anisotropic Network Model, Dynamics Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to give any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction respectively. 
Applications of drug docking, homology-modelling and energy minimization for generating features suitable for machine-learning were not very promising, and rather suggest that the value of binding energies by themselves from Vina may not be very reliable quantitatively. All these failures led to a refinement that resulted in a highly sensitive statistically-guided network construction and analysis, which led to key findings in the early dynamics associated with resistance across all PI drugs. The latter experiment unravelled a conserved lateral expansion motion occurring at the flap elbows, and an associated contraction that drives the base of the dimerization domain towards the catalytic site’s floor in the case of drug resistance. Interestingly, we found that despite the conserved movement, bond angles were degenerate. Alongside, 16 Artificial Neural Network models were optimised for HIV proteases and reverse transcriptase inhibitors, with performances on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD) to additionally suggest a promising modification to one of the compounds. This yielded another molecule inhibiting equally well both opened and closed receptor target conformations, whereby each of the compounds had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was a lack of non-B subtype data, our findings, especially from the statistically-guided network analysis, may extrapolate to a certain extent to them as the level of conservation was very high within subtype B, despite all the present variations. This network construction method lays down a sensitive approach for analysing a pair of alternate phenotypes for which complex patterns prevail, given a sufficient number of experimental units.
During the course of research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. These techniques and tools collectively add to the conventional means of MD analysis.
- Date Issued: 2020
- Authors: Sheik Amamuddy, Olivier Serge André
- Date: 2020
- Subjects: Machine learning , Molecules -- Models , Data mining , Neural networks (Computer science) , Antiretroviral agents , Protease inhibitors , Drug resistance , Multidrug resistance , Molecular dynamics , Renin-angiotensin system , HIV (Viruses) -- South Africa , HIV (Viruses) -- Social aspects -- South Africa , South African Natural Compounds Database
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/115964 , vital:34282
- Description: Millions are affected with the Human Immunodeficiency Virus (HIV) world wide, even though the death toll is on the decline. Antiretrovirals (ARVs), more specifically protease inhibitors have shown tremendous success since their introduction into therapy since the mid 1990’s by slowing down progression to the Acquired Immune Deficiency Syndrome (AIDS). However, Drug Resistance Mutations (DRMs) are constantly selected for due to viral adaptation, making drugs less effective over time. The current challenge is to manage the infection optimally with a limited set of drugs, with differing associated levels of toxicities in the face of a virus that (1) exists as a quasispecies, (2) may transmit acquired DRMs to drug-naive individuals and (3) that can manifest class-wide resistance due to similarities in design. The presence of latent reservoirs, unawareness of infection status, education and various socio-economic factors make the problem even more complex. Adequate timing and choice of drug prescription together with treatment adherence are very important as drug toxicities, drug failure and sub-optimal treatment regimens leave room for further development of drug resistance. While CD4 cell count and the determination of viral load from patients in resource-limited settings are very helpful to track how well a patient’s immune system is able to keep the virus in check, they can be lengthy in determining whether an ARV is effective. Phenosense assay kits answer this problem using viruses engineered to contain the patient sequences and evaluating their growth in the presence of different ARVs, but this can be expensive and too involved for routine checks. As a cheaper and faster alternative, genotypic assays provide similar information from HIV pol sequences obtained from blood samples, inferring ARV efficacy on the basis of drug resistance mutation patterns. 
However, these are inherently complex and the various methods of in silico prediction, such as Geno2pheno, REGA and Stanford HIVdb do not always agree in every case, even though this gap decreases as the list of resistance mutations is updated. A major gap in HIV treatment is that the information used for predicting drug resistance is mainly computed from data containing an overwhelming majority of B subtype HIV, when these only comprise about 12% of the worldwide HIV infections. In addition to growing evidence that drug resistance is subtype-related, it is intuitive to hypothesize that as subtyping is a phylogenetic classification, the more divergent a subtype is from the strains used in training prediction models, the less their resistance profiles would correlate. For the aforementioned reasons, we used a multi-faceted approach to attack the virus in multiple ways. This research aimed to (1) improve resistance prediction methods by focusing solely on the available subtype, (2) mine structural information pertaining to resistance in order to find any exploitable weak points and increase knowledge of the mechanistic processes of drug resistance in HIV protease. Finally, (3) we screen for protease inhibitors amongst a database of natural compounds [the South African natural compound database (SANCDB)] to find molecules or molecular properties usable to come up with improved inhibition against the drug target. In this work, structural information was mined using the Anisotropic Network Model, Dynamics Cross-Correlation, Perturbation Response Scanning, residue contact network analysis and the radius of gyration. These methods failed to give any resistance-associated patterns in terms of natural movement, internal correlated motions, residue perturbation response, relational behaviour and global compaction respectively. 
Applications of drug docking, homology-modelling and energy minimization for generating features suitable for machine-learning were not very promising, and rather suggest that the value of binding energies by themselves from Vina may not be very reliable quantitatively. All these failures lead to a refinement that resulted in a highly sensitive statistically-guided network construction and analysis, which leads to key findings in the early dynamics associated with resistance across all PI drugs. The latter experiment unravelled a conserved lateral expansion motion occurring at the flap elbows, and an associated contraction that drives the base of the dimerization domain towards the catalytic site’s floor in the case of drug resistance. Interestingly, we found that despite the conserved movement, bond angles were degenerate. Alongside, 16 Artificial Neural Network models were optimised for HIV proteases and reverse transcriptase inhibitors, with performances on par with Stanford HIVdb. Finally, we prioritised 9 compounds with potential protease inhibitory activity using virtual screening and molecular dynamics (MD) to additionally suggest a promising modification to one of the compounds. This yielded another molecule inhibiting equally well both opened and closed receptor target conformations, whereby each of the compounds had been selected against an array of multi-drug-resistant receptor variants. While a main hurdle was a lack of non-B subtype data, our findings, especially from the statistically-guided network analysis, may extrapolate to a certain extent to them as the level of conservation was very high within subtype B, despite all the present variations. This network construction method lays down a sensitive approach for analysing a pair of alternate phenotypes for which complex patterns prevail, given a sufficient number of experimental units. 
During the course of the research, a weighted contact mapping tool was developed to compare renin-angiotensinogen variants and packaged as part of the MD-TASK tool suite. Finally, the functionality, compatibility and performance of the MODE-TASK tool were evaluated and confirmed for both Python 2.7.x and Python 3.x, for the analysis of normal modes from single protein structures and essential modes from MD trajectories. These techniques and tools collectively add to the conventional means of MD analysis.
- Full Text:
- Date Issued: 2020
Guidelines for the use of machine learning to predict student project group academic performance
- Authors: Evezard, Ryan
- Date: 2020
- Subjects: Academic achievement , Machine learning
- Language: English
- Type: Thesis , Masters , MIT
- Identifier: http://hdl.handle.net/10948/46042 , vital:39476
- Description: Education plays a crucial role in the growth and development of a country. However, in South Africa, capacity is limited while demand from students seeking an education is increasing. To address this demand, universities are pressured into accepting more students to increase their throughput. This pressure leaves educators with less time to give students individual attention. This study aims to address this problem by demonstrating how machine learning can be used to predict student group academic performance so that educators may allocate more resources and attention to students and groups at risk. The study focused on data obtained from the third-year capstone project for the diploma in Information Technology at the Nelson Mandela University. Learning analytics and educational data mining and their processes were discussed, with an in-depth look at the machine learning techniques involved therein. Artificial neural networks, decision trees and naïve Bayes classifiers were proposed and motivated for prediction modelling. An experiment was performed, resulting in proposed guidelines that give insight and recommendations for the use of machine learning to predict student group academic performance.
- Full Text:
- Date Issued: 2020
Technology in conservation: towards a system for in-field drone detection of invasive vegetation
- Authors: James, Katherine Margaret Frances
- Date: 2020
- Subjects: Drone aircraft in remote sensing , Neural networks (Computer science) , Drone aircraft in remote sensing -- Case studies , Machine learning , Computer vision , Environmental monitoring -- Remote sensing , Invasive plants -- Monitoring
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/143408 , vital:38244
- Description: Remote sensing can assist in monitoring the spread of invasive vegetation. The adoption of camera-carrying unmanned aerial vehicles, commonly referred to as drones, as remote sensing tools has yielded images of higher spatial resolution than traditional techniques. Drones also have the potential to interact with the environment through the delivery of bio-control or herbicide, as seen with their adoption in precision agriculture. Unlike in agricultural applications, however, invasive plants do not have a predictable position relative to each other within the environment. To facilitate the adoption of drones as an environmental monitoring and management tool, drones need to be able to intelligently distinguish between invasive and non-invasive vegetation on the fly. In this thesis, we present the augmentation of a commercially available drone with a deep machine learning model to investigate the viability of differentiating between an invasive shrub and other vegetation. As a case study, this was applied to the shrub genus Hakea, originating in Australia and invasive in several countries including South Africa. However, for this research, the methodology is important, rather than the chosen target plant. A dataset was collected using the available drone and manually annotated to facilitate the supervised training of the model. Two approaches were explored, namely, classification and semantic segmentation. For each of these, several models were trained and evaluated to find the optimal one. The chosen model was then interfaced with the drone via an Android application on a mobile device and its performance was preliminarily evaluated in the field. Based on these findings, refinements were made and thereafter a thorough field evaluation was performed to determine the best conditions for model operation. 
Results from the classification task show that deep learning models are capable of distinguishing between target and other shrubs in ideal candidate windows. However, classification in this manner is restricted by the proposal of such candidate windows. End-to-end image segmentation using deep learning overcomes this problem, classifying the image in a pixel-wise manner. Furthermore, the use of appropriate loss functions was found to improve model performance. Field tests show that illumination and shadow pose challenges to the model, but that good recall can be achieved when the conditions are ideal. False positive detections remain an issue to be addressed. This approach shows the potential for drones as an environmental monitoring and management tool when coupled with deep machine learning techniques, and outlines potential problems that may be encountered.
- Full Text:
- Date Issued: 2020