Using data mining techniques for the prediction of student dropouts from university science programs

Vambe, William Tichaona

Title: Using data mining techniques for the prediction of student dropouts from university science programs
Creator: Vambe, William Tichaona
Subject: Data mining
Subject: Dropout behavior, Prediction of
Date: 2016
Type: Thesis
Type: Masters
Type: MSc
Identifier: http://hdl.handle.net/10353/12314
Identifier: vital:39252
Description: Data mining has taken center stage in education as a way of addressing student dropout, which has become one of the major threats affecting Higher Educational Institutes (HEIs). Being able to predict which students are likely to drop out helps the university to assist those facing challenges early. This will result in more graduates with the intellectual capital to provide skills to industry, hence addressing the major challenge of skills shortage faced in South Africa. Earlier studies reported in the literature addressed the dropout problem with a theoretical approach based on Tinto’s model, followed by traditional and statistical approaches. However, both lacked accuracy and automation, which makes them difficult and time-consuming to use, as they must be retested periodically to remain valid. Recently, data mining has become a vital tool for predicting non-linear phenomena, including where data is missing, and for bringing accuracy and automation to the task. Assessments of its usefulness and reliability in education have led different researchers to use it for prediction. This research therefore used a data mining approach that integrates classification and prediction techniques to analyze student academic data at the University of Fort Hare and to create a student dropout model from each student's pre-entry data and university academic performance. Following the Knowledge Discovery in Databases (KDD) framework, data for students enrolled in the Bachelor of Science programs between 2003 and 2014 was selected. It was preprocessed and transformed to deal with missing and noisy data. Classification algorithms were then used for student characterization. Decision trees (J48), as implemented in the Weka software, were used to build the model for data mining and prediction. Decision trees were chosen for their ability to deal with textual, nominal and numeric data, as was the case with our input data, and because they have good precision. The model was trained on a training data set, then validated and evaluated with another data set. Experimental results demonstrate that data mining is useful in predicting students who are likely to drop out. A critical analysis of the correctly classified instances, the confusion matrix and the ROC area shows that the model can correctly classify and predict those who are likely to drop out. The model accuracy was 66 percent, which is a good percentage as supported in the literature, meaning the results can be reliably used for assessment and for making strategic decisions. Furthermore, the model took a matter of seconds to compute results when supplied with 400 instances, which shows that it is effective and efficient. Based on these experimental results, this research showed that data mining is useful for bringing automation and accuracy to the prediction of student dropouts, and that the results can be relied on for decision making by faculty managers, who are the decision makers.
Format: 148 leaves
Format: pdf
Publisher: University of Fort Hare
Publisher: Faculty of Science and Agriculture
Language: English
Rights: University of Fort Hare
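
Note on the evaluation measures: the abstract assesses the model through correctly classified instances and a confusion matrix. As a minimal sketch of how those figures relate, the Python below derives accuracy, precision and recall from confusion-matrix counts. The class name, function name, label strings and all counts are hypothetical, chosen only so that the arithmetic lands on 66 percent accuracy over 400 instances. They are not the thesis's actual results, and the thesis used Weka's J48, not this code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with 'dropout' as the positive class (assumed)."""
    tp: int  # actual dropouts predicted as dropouts
    fn: int  # actual dropouts predicted as non-dropouts
    fp: int  # actual non-dropouts predicted as dropouts
    tn: int  # actual non-dropouts predicted as non-dropouts

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Share of all instances that were classified correctly.
        return (self.tp + self.tn) / self.total if self.total else 0.0

    def precision(self) -> float:
        # Of those flagged as dropouts, the share that really dropped out.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        # Of the real dropouts, the share the model flagged.
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0


def from_labels(actual: Sequence[str], predicted: Sequence[str],
                positive: str = "dropout") -> ConfusionMatrix:
    """Count the four cells from paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fn=fn, fp=fp, tn=tn)


# Hypothetical counts for a 400-instance test set (illustration only).
example = ConfusionMatrix(tp=120, fn=60, fp=76, tn=144)
print(f"accuracy={example.accuracy():.2f}")
print(f"precision={example.precision():.2f} recall={example.recall():.2f}")
```

With these made-up counts, 264 of 400 instances are classified correctly, which is where an accuracy of 0.66 comes from. Precision and recall are reported separately because accuracy alone does not show how many actual dropouts the model misses.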

Collections

UFH Department of Computer Science

File: MSc (Computer Science) VAMBE, WT.pdf (7 MB, Adobe Acrobat PDF)