- Title
- The classification performance of ensemble decision tree classifiers: a case study of detecting fraud in credit card transactions
- Creator
- Chogugudza, Mcdonald
- Subject
- Fraud
- Subject
- Commercial fraud
- Subject
- Accounting fraud
- Date
- 2022-11
- Type
- Master's theses
- Type
- text
- Identifier
- http://hdl.handle.net/10353/27590
- Identifier
- vital:69317
- Description
- In this dissertation, we propose ensemble decision tree classifiers as an ideal classification technique for detecting fraud in credit card transactions. Ensemble tree classifiers have been applied in many areas, such as speech recognition, image recognition and medical diagnostics, with excellent results. Among the many forms of fraud, credit card fraud has been a major concern. The rise in credit card fraud is largely attributed to the way in which it can be committed: a fraudster does not always need to be physically present, which makes credit cards the number one target for criminals. Card-Not-Present (CNP) fraud refers to this type of fraud, in which an electronic transaction is conducted without the cardholder being present, for example over the telephone or the web. To develop better classifiers, the researcher first investigated what causes misclassifications in fraud detection systems. A systematic literature review was conducted to uncover the factors that have been identified as causes of misclassification. The review found that many factors lead to misclassifications and that several authors have proposed techniques to handle them. However, there is no universal technique for addressing these factors, because different domains have different datasets that require different techniques. This study investigates how the parameters involved in modelling fraud detection systems affect the classification performance of ensemble decision tree classifiers. The factors investigated were sample size, sampling technique, learning method and choice of split criterion. A series of experiments was conducted to determine how each of these factors contributes to better classifiers. E-commerce transaction data from Vesta Corporation, made available on Kaggle, was used in the experiments.
The data was split into two sets: one for training the models and the other for testing their performance. Accuracy, the confusion matrix, precision and recall were used as performance measures. Our results showed that a larger sample size resulted in better classifiers, which we attribute to the models having more instances to learn from, covering most patterns of fraudulent transactions. The sampling technique proved pivotal to classification performance: undersampling greatly reduced performance, achieving a maximum accuracy of 89.6223, whereas oversampling increased performance, with a maximum accuracy of 99.9531. Furthermore, our results showed that the choice of split criterion affects the performance of ensemble tree classifiers. Using entropy as the split criterion resulted in better classifiers than using the Gini index; the downside is that entropy takes more time to execute. Lastly, the learning method was shown to affect the performance of ensemble classifiers: models that used supervised learning performed better than those that used unsupervised learning in detecting credit card fraud. The conclusions from this research are insightful when designing fraud detection systems that use ensemble decision tree classifiers as base learners.
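The abstract compares entropy and the Gini index as split criteria. As an illustration only (not code from the dissertation), the sketch below computes both impurity measures for a hypothetical tree node holding fraud (1) and legitimate (0) labels, plus the impurity decrease of one candidate split. The function names and toy labels are assumptions. In libraries such as scikit-learn, the same choice is exposed through the `criterion` parameter (`"gini"` or `"entropy"`) of tree-based classifiers.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, criterion):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted = (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)
    return criterion(parent) - weighted


# Hypothetical node: 8 legitimate (0) and 2 fraudulent (1) transactions,
# split into a mostly-legitimate left child and a mixed right child.
parent = [0] * 8 + [1] * 2
left = [0] * 7 + [1]
right = [0, 1]

for name, criterion in (("gini", gini), ("entropy", entropy)):
    print(name, round(criterion(parent), 4),
          round(impurity_decrease(parent, left, right, criterion), 4))
```

The entropy version evaluates a logarithm for each class at every candidate split, which is one plausible contributor to the longer run times reported for entropy, though actual timings depend on the implementation.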
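The abstract lists accuracy, the confusion matrix, precision and recall as performance measures. The following is a minimal sketch of how these are derived from binary predictions, treating fraud as the positive class (label 1). The function names and toy data are hypothetical, and the example illustrates a general property of imbalanced data rather than a result from the dissertation.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem, treating `positive` as fraud."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # fraud correctly flagged
            else:
                fp += 1  # legitimate transaction wrongly flagged
        elif actual == positive:
            fn += 1      # fraud missed
        else:
            tn += 1      # legitimate transaction correctly passed
    return tp, fp, tn, fn


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision and recall computed from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Precision and recall are set to 0.0 when their denominator is zero.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test set: 98 legitimate and 2 fraudulent transactions.
y_true = [0] * 98 + [1] * 2
always_legit = [0] * 100            # never flags fraud
one_hit = [0] * 97 + [1] + [0, 1]   # one false positive, one missed fraud, one caught fraud

print(scores(y_true, always_legit))
print(scores(y_true, one_hit))
```

Both predictors score 0.98 accuracy on this toy set, yet only the second catches any fraud, which is why precision and recall are useful alongside accuracy on imbalanced data.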
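The abstract contrasts undersampling, which reduced performance, with oversampling, which increased it. It does not state which resampling variants were used (random resampling and SMOTE are common choices), so the sketch below assumes plain random resampling of a small hypothetical training set. The helper names are illustrative only. Each class is resampled to the size of the smallest class (undersampling) or of the largest class (oversampling).

```python
import random
from collections import defaultdict


def _group_by_class(X, y):
    """Collect feature rows per class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows of larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Keep every original row, then add duplicates drawn with replacement.
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical training set: 8 legitimate (0) and 2 fraudulent (1) transactions.
X_train = [[i] for i in range(10)]
y_train = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X_train, y_train)
X_over, y_over = random_oversample(X_train, y_train)
print(len(y_under), y_under.count(1))
print(len(y_over), y_over.count(1))
```

Resampling is usually applied to the training split only, so the test set keeps the original class distribution. Because undersampling discards majority-class rows, the model learns from fewer instances, which is one plausible link to the abstract's observation that larger samples gave better classifiers.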
- Description
- Thesis (Msci) -- Faculty of Science and Agriculture, 2022
- Format
- computer
- Format
- online resource
- Format
- application/pdf
- Format
- 1 online resource (ix, 106 leaves)
- Publisher
- University of Fort Hare
- Publisher
- Faculty of Science and Agriculture
- Language
- English
- Rights
- University of Fort Hare
- Rights
- All Rights Reserved
- Rights
- Open Access
File | Description | Size | Format
---|---|---|---
SOURCE1 | McDonald Chogugudza Dissertation 2023 (1).pdf | 2 MB | Adobe Acrobat PDF