Statistical learning methods for photovoltaic energy output prediction
- Authors: Magaya, Aphiwe
- Date: 2024-04
- Subjects: Photovoltaic power generation , Mathematical statistics , Statistics
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/64138 , vital:73656
- Description: Predicting solar energy accurately is important for integrating more renewable energy into the grid, which can help to alleviate the demand on traditional coal-powered sources in South Africa. This study assesses several statistical learning models for predicting the energy output of a 1 MW photovoltaic system installed on the Nelson Mandela University South Campus in Gqeberha. Weather data (including temperature, wind speed, wind direction, precipitation, air pressure, and humidity) and solar irradiance data (including global horizontal radiation, diffuse radiation, and direct radiation) are used to predict the energy output of this system using Artificial Neural Networks (ANN), Support Vector Machines (SVM), Multiple Linear Regression (MLR), and Regression Trees (RT). The performance of the models was compared and the results indicated that the ANN model performed best. (An illustrative sketch of this kind of model comparison follows this record.) , Thesis (MSc) -- Faculty of Science, School of Computer Science, Mathematics, Physics and Statistics, 2024
- Full Text:
- Date Issued: 2024-04
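The following is a minimal, hedged sketch (not taken from the thesis) of the kind of four-way model comparison the abstract describes, using scikit-learn. The synthetic data, feature names, and hyperparameters are assumptions for illustration only.

```python
# Minimal sketch of the model comparison described in the abstract (not the author's
# code). The synthetic data, feature names, and hyperparameters are assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "temperature": rng.normal(22, 5, n),   # degrees C
    "wind_speed": rng.gamma(2.0, 2.0, n),  # m/s
    "humidity": rng.uniform(20, 90, n),    # %
    "ghi": rng.uniform(0, 1000, n),        # global horizontal irradiance, W/m^2
    "dni": rng.uniform(0, 900, n),         # direct normal irradiance, W/m^2
})
# Synthetic stand-in for the PV system's energy output.
y = 0.8 * X["ghi"] + 0.1 * X["dni"] - 2.0 * X["temperature"] + rng.normal(0, 30, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
models = {
    "MLR": LinearRegression(),
    "SVM": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10)),
    "RT": DecisionTreeRegressor(max_depth=6, random_state=0),
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 16),
                                      max_iter=2000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.1f}")
```

On real data one would of course tune hyperparameters and validate on held-out time periods rather than a single random split.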
Estimating Bayesian tolerance intervals for a two-factor factorial model
- Authors: Besele, Kagiso Francis
- Date: 2021-04
- Subjects: Gqeberha (South Africa) , Eastern Cape (South Africa) , Mathematical statistics
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10948/52302 , vital:43587
- Description: Quality improvement efforts have become the cornerstone of all manufacturing processes. Quality can be defined in terms of variability reduction, and since variability is a statistical concept, techniques such as statistical quality control provide ways of assessing process variation. Methods such as experimental design provide a way to ascertain factor relationships and give a basis for computing the variability that arises from each process variable, ultimately providing a way of calculating total process variability. This in turn results in variance components and eventually variance component estimation. As with any statistical model, estimates may be classified in one of two ways: point estimates or interval estimates. Interval estimates that provide information about an entire population, rather than only about a few observations from a sample or a single population parameter, are known as tolerance intervals. Wolfinger (1998) provided a Bayesian simulation-based approach for ascertaining three types of tolerance intervals using a balanced one-way random effects model. In this study, the method initially proposed by Wolfinger (1998) is extended to estimate tolerance intervals for the balanced two-way crossed classification random effects model with interaction. The derived techniques are applied to the thermal impedance data originally collected by Houf and Berman (1988), and the method presented by Wolfinger (1998) is expanded to also include the estimation of tolerance intervals for averages of observations from new or unknown measurements. This Bayesian approach provides a thorough yet simple paradigm for using tolerance intervals in manufacturing settings. (The model class is sketched in standard notation after this record.) , Thesis (MSc) -- Faculty of Science, Statistics, 2021
- Full Text: false
- Date Issued: 2021-04
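For reference, a sketch in standard notation of the model class the abstract refers to; this is an illustrative statement, not reproduced from the thesis.

```latex
% Balanced two-way crossed random effects model with interaction (standard notation;
% an illustrative sketch, not copied from the thesis).
\[
  y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk},
  \qquad i = 1,\dots,I,\; j = 1,\dots,J,\; k = 1,\dots,K,
\]
\[
  a_i \sim N(0,\sigma_a^2),\quad b_j \sim N(0,\sigma_b^2),\quad
  (ab)_{ij} \sim N(0,\sigma_{ab}^2),\quad e_{ijk} \sim N(0,\sigma_e^2),
\]
% so a future observation has variance
\[
  \sigma_y^2 = \sigma_a^2 + \sigma_b^2 + \sigma_{ab}^2 + \sigma_e^2 .
\]
% A one-sided (p, 1 - alpha) tolerance limit U satisfies
\[
  \Pr\!\left[\,\Pr\!\left(Y \le U \mid \mu, \sigma_y^2\right) \ge p\,\right] \ge 1-\alpha,
\]
% with the outer probability taken over the joint posterior of the variance components
% in a Bayesian, simulation-based formulation.
```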
A statistical assessment of available solar resource across multiple sites in South Africa
- Authors: Eastwood, Kirstie
- Date: 2019
- Subjects: Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10948/39907 , vital:35505
- Description: Around the globe, fossil fuels remain the primary source of energy, at around 78% of the world’s total energy consumption. However, the associated carbon emissions, environmental impact, depletion of fossil fuels, and price and cost volatility are driving the growing interest in renewable energy research. Solar power is acknowledged as the fastest-growing renewable energy, but the uncertainty surrounding long-term projections of the solar irradiance available for energy conversion is a hindrance when discussing financial risk with potential investors. This study investigates the quality of freely available solar resource data in South Africa and proposes techniques for comparing potential solar farm sites. Tolerance intervals derived within a Bayesian framework provide information on the future available solar resource across multiple sites. These techniques capture the inherent variability in the available solar resource, equipping investors with statistical methods that lead to a better understanding of the solar resource and thus aid decision-making. (An illustrative sketch of a simulation-based tolerance limit follows this record.)
- Full Text:
- Date Issued: 2019
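The sketch below illustrates, under simplifying assumptions not taken from the thesis, how a simulation-based one-sided (p, 1 − α) Bayesian tolerance limit of the kind the abstract describes can be computed from posterior draws for a single site. The normal model, noninformative prior, and synthetic data are assumptions.

```python
# Illustrative sketch (assumptions, not the thesis code): a simulation-based one-sided
# (p, 1 - alpha) Bayesian tolerance limit for a single site's solar resource, under a
# simple normal model with a noninformative prior. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical daily irradiation observations for one site (kWh/m^2 per day).
y = rng.normal(loc=5.5, scale=1.2, size=365)
n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)

# Posterior draws: sigma^2 | y ~ (n-1) s^2 / chi^2_{n-1};  mu | sigma^2, y ~ N(ybar, sigma^2/n).
n_draws = 20_000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# For each posterior draw, take the p-quantile of the predictive N(mu, sigma^2); the
# (1 - alpha) quantile of those values across draws is the upper tolerance limit.
p, alpha = 0.95, 0.05
q_p = mu + stats.norm.ppf(p) * np.sqrt(sigma2)
upper_limit = np.quantile(q_p, 1 - alpha)
print(f"({p}, {1 - alpha:.2f}) upper tolerance limit: {upper_limit:.2f} kWh/m^2 per day")
```

Computing comparable limits per site would support the kind of between-site comparison the abstract mentions.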
Improved tree species discrimination at leaf level with hyperspectral data combining binary classifiers
- Authors: Dastile, Xolani Collen
- Date: 2011
- Subjects: Mathematical statistics , Analysis of variance , Nearest neighbor analysis (Statistics) , Trees--Classification
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5567 , http://hdl.handle.net/10962/d1002807 , Mathematical statistics , Analysis of variance , Nearest neighbor analysis (Statistics) , Trees--Classification
- Description: The purpose of the present thesis is to show that hyperspectral data can be used to discriminate between different tree species. The data set used in this study contains hyperspectral measurements of leaves of seven savannah tree species. The data is high-dimensional and shows large within-class variability combined with small between-class variability, which makes discrimination between the classes challenging. We employ two classification methods: k-nearest neighbour and feed-forward neural networks. For both methods, direct 7-class prediction results in high misclassification rates. However, binary classification works better. We constructed binary classifiers for all possible binary classification problems and combined them with Error Correcting Output Codes. We show in particular that using 1-nearest neighbour binary classifiers results in no improvement compared to a direct 1-nearest neighbour 7-class predictor. In contrast to this negative result, using neural network binary classifiers improves accuracy by 10% compared to a direct neural network 7-class predictor, and error rates become acceptable. This can be further improved by choosing only suitable binary classifiers for combination. (An illustrative sketch of combining binary classifiers follows this record.)
- Full Text:
- Date Issued: 2011
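As a rough illustration of combining binary classifiers versus a direct multiclass predictor, the sketch below uses scikit-learn's error-correcting output codes with a random code matrix; the thesis instead constructs all possible binary problems, so this is only indicative, and the data are synthetic.

```python
# Minimal sketch (not the thesis code): a direct 7-class neural network vs. an
# error-correcting-output-codes combination of binary neural networks, on synthetic
# high-dimensional data standing in for leaf-level hyperspectral measurements.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=700, n_features=200, n_informative=30,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

direct = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
ecoc = OutputCodeClassifier(
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    code_size=3.0, random_state=0)

for name, clf in [("direct 7-class net", direct), ("ECOC of binary nets", ecoc)]:
    clf.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")
```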
SL-model for paired comparisons
- Authors: Sjölander, Morné Rowan
- Date: 2006
- Subjects: Paired comparisons (Statistics) , Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:10574 , http://hdl.handle.net/10948/605 , Paired comparisons (Statistics) , Mathematical statistics
- Description: The method of paired comparisons dates back to 1860, when Fechner first published on the method, using it for his psychometric investigations [4]. Thurstone formalised the method by providing a mathematical background for it [9-11], and the method came into its own in 1927 with his psychometric publications, one being “a law of comparative judgment” [12-14]. The law of comparative judgment is a set of equations relating the proportion of times any stimulus k is judged greater on a given attribute than any other stimulus j to the scales and discriminal dispersions of the two stimuli on the psychological continuum. Relatively little research has been done on discrete models for paired comparisons. This study develops a new discrete model, the SL-model for paired comparisons. Paired-comparison methods for data in which objects have an upper limit on their scores had also not yet been developed, and constructing such a model is one of the aims of this report. The SL-model is thus developed in this context; however, the model generalises readily to the case where scores do not necessarily have an upper limit. (The law of comparative judgment is stated in standard notation after this record.)
- Full Text:
- Date Issued: 2006
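For context, the relation the abstract describes verbally, in its standard textbook form (not quoted from the thesis):

```latex
% Thurstone's law of comparative judgment (general form; standard notation, not
% reproduced from the thesis).
\[
  S_k - S_j \;=\; z_{kj}\,\sqrt{\sigma_k^{2} + \sigma_j^{2} - 2\,r_{kj}\,\sigma_k\,\sigma_j},
\]
% where S_k and S_j are the scale values of stimuli k and j on the psychological
% continuum, sigma_k and sigma_j their discriminal dispersions, r_{kj} the correlation
% between the discriminal processes, and z_{kj} = Phi^{-1}(p_{kj}) the normal deviate
% corresponding to the proportion p_{kj} of times stimulus k is judged greater than j.
```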
An evaluation of paired comparison models
- Authors: Venter, Daniel Jacobus Lodewyk
- Date: 2004
- Subjects: Paired comparisons (Statistics) , Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:11087 , http://hdl.handle.net/10948/364 , Paired comparisons (Statistics) , Mathematical statistics
- Description: Introduction: A typical task in quantitative data analysis is to derive estimates of population parameters based on sample statistics. For manifest variables this is usually a straightforward process, utilising suitable measurement instruments and standard statistics such as the mean, median and standard deviation. Latent variables, on the other hand, are typically more elusive, making it difficult to obtain valid and reliable measurements. One of the most widely used methods of estimating the parameter value of a latent variable is to use a summated score derived from a set of individual scores for each of the various attributes of the latent variable. A serious limitation of this and similar methods is that the validity and reliability of measurements depend on whether the statements included in the questionnaire cover all characteristics of the variable being measured, and also on respondents’ ability to correctly indicate their perceived assessment of those characteristics on the scale provided. Methods without this limitation, which are especially useful where a set of objects/entities must be ranked based on the parameter values of one or more latent variables, are methods of paired comparisons. Although the underlying assumptions and algorithms of these methods often differ dramatically, they all rely on data derived from a series of comparisons, each consisting of a pair of specimens selected from the set of objects/entities being investigated. Typical examples of the comparison process are: subjects (judges) who have to indicate for each pair of objects which of the two they prefer; sport teams that compete against each other in matches that involve two teams at a time. The resultant data of each comparison range from a simple dichotomy indicating which of the two objects is preferred/better, to an interval or ratio scale score for each object. The earliest paired comparison models (PCMs), the Thurstone-Mosteller and Bradley-Terry models, were based on statistical theory assuming that the variable(s) being measured is either normally (Thurstone-Mosteller) or exponentially (Bradley-Terry) distributed. For many years researchers had to rely on these PCMs when analysing paired comparison data, without any idea of the implications if the distribution of the data from which their sample was obtained differed from the distribution assumed by the PCM being utilised. To address this problem, PCMs were subsequently developed to cater for discrete variables and variables with distributions that are neither normal nor exponential. A question that remained unanswered is how the performance of PCMs, as measured by the accuracy of parameter estimates, is affected if they are applied to data from a range of discrete and continuous distributions that violate the assumptions on which the applicable paired comparison algorithm is based. This study is an attempt to answer this question by applying the most popular PCMs to a range of randomly derived data sets that spans typical continuous and discrete data distributions. It is hoped that the results of this study will assist researchers when selecting the most appropriate PCM to obtain accurate estimates of the parameters of the variables in their data sets. (An illustrative sketch of fitting one such model follows this record.)
- Full Text:
- Date Issued: 2004
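As a concrete illustration of the kind of paired-comparison parameter estimation whose accuracy the study evaluates, the sketch below fits a Bradley-Terry model by maximum likelihood to synthetic win counts; the true parameter values, sample sizes, and parameterisation are assumptions, not taken from the thesis.

```python
# Illustrative sketch (not the thesis code): maximum-likelihood fitting of a
# Bradley-Terry paired-comparison model on synthetic win counts.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
true_logworth = np.array([0.0, 0.5, 1.0, 1.5])   # assumed log-worths of 4 objects
n_items, n_games = true_logworth.size, 50        # 50 comparisons per pair

wins = np.zeros((n_items, n_items))
for i in range(n_items):
    for j in range(i + 1, n_items):
        p_ij = 1.0 / (1.0 + np.exp(true_logworth[j] - true_logworth[i]))
        wins[i, j] = rng.binomial(n_games, p_ij)  # times i beat j
        wins[j, i] = n_games - wins[i, j]         # times j beat i

def neg_log_lik(theta_free):
    theta = np.concatenate(([0.0], theta_free))   # fix theta_0 = 0 for identifiability
    diff = theta[:, None] - theta[None, :]        # theta_i - theta_j
    log_p = -np.log1p(np.exp(-diff))              # log P(i beats j)
    return -(wins * log_p).sum()

fit = minimize(neg_log_lik, x0=np.zeros(n_items - 1), method="BFGS")
print("estimated log-worths:", np.round(np.concatenate(([0.0], fit.x)), 2))
print("true log-worths:     ", true_logworth)
```

Repeating such fits on data simulated from other distributions is the sort of accuracy assessment the abstract describes.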
A linear model for valuating preferences of freshwater inflows into forty selected estuaries along the South African coastline
- Authors: Smith, Melnick Jurgen
- Subjects: Estuaries -- South Africa -- Eastern Cape , Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:10581 , http://hdl.handle.net/10948/d1020916
- Description: According to the National Water Act of 1998, an estuary is an enclosed body of water that is either periodically or permanently open to the ocean. Within an estuary, the seawater is diluted to a measurable degree, creating a unique aquatic environment for animals and plants. Estuaries are environmental and economic assets to the population. The health status of our local estuaries, however, is being compromised by a steady decrease in freshwater inflow and supply. Tides and climatic conditions do have an impact upon the dynamics of an estuary, but these factors remain relatively constant throughout each year. The freshwater inflow and supply, however, are highly variable and are directly influenced by human involvement. Upstream abstraction for industrial and domestic use, for example, could lead to mouth closure where the ocean meets the river. The National Water Act of 1998 was established to address the lack of research and predominant mismanagement of freshwater inflow into South Africa’s estuaries (Allanson and Baird, 1999). To ensure proper water resource management, different water allocation costs and benefits need to be compared and analysed to secure an optimum solution (Mlangeni, 2007). Like many environmental services yielded to man, estuary services are not traded in any markets. Alternative markets are thus sought to allow the estimation of the values of such services. Among the available valuation techniques are the Contingent Valuation Method (CVM), Travel Cost Method (TCM) and Hedonic Pricing Method (HPM). The benefits of water allocations are estimated in this study by use of the CVM, which elicits respondents’ willingness to pay (WTP) for predetermined changes in freshwater inflow into estuaries. The CVM was applied throughout the Water Research Commission’s (WRC) Project K5/1413 from 2000 to 2008 (Hosking, 2010). Each individual study employed specialised surveys which ideally created a close correspondence between the answers provided by respondents to the hypothetical scenarios and their voluntary exchanges in markets had money actually been handled (Mlangeni, 2007). Much criticism has been directed towards the CVM, but careful use and application of the method has been shown to produce significant and satisfactory results (Hosking, 2010). The primary aim of this study was to collectively analyse the collated data provided by the WRC and compare the results with the findings of previous studies. Each variable was analysed separately in order to reveal any discrepancies between the respective findings. A supplementary objective of this study was to add to the body of knowledge pertaining to South Africa’s estuaries and guide management in the distribution of freshwater towards efficient levels (Du Preez and Hosking, 2010). The change in cumulative consumer surplus associated with an increased freshwater supply into forty selected estuaries was therefore investigated, reflecting the benefits of an improved freshwater supply (Du Preez and Hosking, 2010). The data gathered by each of the individual researchers throughout their studies (supported by the WRC) were combined to form a single dataset including all recorded information supplied by the corresponding respondents. As the investigation progressed, improvements were made to the questionnaires posed to the estuary populations considered. Consequently, some of the data in the combined dataset were “missing”, since previous studies did not include certain questions, while later studies omitted others. Data imputation was employed to create an imputed dataset, enabling the modelling of the public’s WTP through regression techniques. A linear model was utilised in this study, also incorporating interaction between the predictor variables. The double-log functional form was implemented to estimate the public’s WTP. The population’s total willingness to pay (TWTP) was further estimated by aggregation. A summary of the respective results is displayed in Table 1. (An illustrative sketch of a double-log WTP regression follows this record.)
- Full Text:
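To make the final modelling step concrete, here is a hedged sketch (not the thesis code) of a double-log WTP regression and an aggregation to a total WTP; the variable names, synthetic data, and household count are assumptions for illustration only.

```python
# Illustrative sketch (not the thesis code): a double-log willingness-to-pay regression
# and aggregation to a total WTP. Variable names, the synthetic data, and the number of
# households are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
income = rng.lognormal(mean=10.0, sigma=0.5, size=n)   # hypothetical household income (rand)
visits = rng.integers(1, 20, size=n)                   # hypothetical estuary visits per year
wtp = np.exp(0.5 + 0.3 * np.log(income) + 0.2 * np.log(visits)
             + rng.normal(0, 0.4, size=n))             # synthetic stated WTP (rand per year)

# Double-log functional form: ln(WTP) regressed on logs of the predictors.
X = sm.add_constant(pd.DataFrame({"ln_income": np.log(income),
                                  "ln_visits": np.log(visits)}))
model = sm.OLS(np.log(wtp), X).fit()
print(model.params)

# Aggregate to a total WTP: mean retransformed prediction times an assumed population size.
n_households = 50_000
mean_wtp = np.exp(model.predict(X) + model.mse_resid / 2).mean()   # lognormal retransformation
print(f"estimated total WTP ~ R{mean_wtp * n_households:,.0f} per year")
```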