The consolidation of forecasts with regression models
- Authors: Venter, Daniel Jacobus Lodewyk
- Date: 2014
- Subjects: Regression analysis -- Mathematical models , Forecasting -- Mathematical models
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: vital:10582 , http://hdl.handle.net/10948/d1020964
- Description: The primary objective of this study was to develop a dashboard for the consolidation of multiple forecasts utilising a range of multiple linear regression models. The term dashboard is used to describe, with a single word, the characteristics of the forecast consolidation application that was developed to provide the required functionalities via a graphical user interface structured as a series of interlinked screens. Microsoft Excel was used as the platform to develop the dashboard, named ConFoRM (an acronym for Consolidate Forecasts with Regression Models). The major steps of the consolidation process incorporated in ConFoRM are: 1. Input historical data. 2. Select appropriate analysis and holdout samples. 3. Specify the regression models to be considered as candidates for the final model to be used for the consolidation of forecasts. 4. Perform regression analysis and holdout analysis for each of the models specified in step 3. 5. Perform post-holdout testing to assess the performance of the model with the best holdout validation results on out-of-sample data. 6. Consolidate forecasts. Two data transformations are available: the removal of growth and time-period effects from the time series; and a translation of the time series by subtracting the mean of all the forecasts for data record i from the variable being predicted and its related forecasts for that record. The pre-defined ordinary least squares linear regression models (LRMs) available are: a. A set of k simple LRMs, one for each of the k forecasts; b. A multiple LRM that includes all the forecasts; c. A multiple LRM that includes all the forecasts and as many of the first-order interactions between the input forecasts as allowed by the sample size and the maximum number of predictors provided by the dashboard, with the interactions included in the model being those with the highest individual correlation with the variable being predicted; d. A multiple LRM that includes as many of the forecasts and first-order interactions between the input forecasts as allowed by the sample size and the maximum number of predictors provided by the dashboard, with the forecasts and interactions included in the model being those with the highest individual correlation with the variable being predicted; e. A simple LRM with the predictor variable being the mean of the forecasts; f. A set of simple LRMs with the predictor variable in each case being the weighted mean of the forecasts, with different formulas for the weights. Also available is an ad hoc user-specified model in terms of the forecasts and the predictor variables generated by the dashboard for the pre-defined models. Provision is made in the regression analysis for both forward entry and backward removal regression. Weighted least squares (WLS) regression can optionally be performed based on the age of the forecasts, with smaller weights for older forecasts. (An illustrative code sketch of the core consolidation step appears after this record.)
- Full Text:
- Date Issued: 2014
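
The abstract above describes consolidating k input forecasts by fitting a multiple linear regression on an analysis sample, validating it on a holdout sample, and optionally weighting records by forecast age (WLS). The sketch below is a minimal, hypothetical illustration of that core step in Python with NumPy; it is not the ConFoRM dashboard itself, and the function names, toy data, split sizes, and age-based weights are all invented for illustration.

```python
# Hypothetical sketch (not ConFoRM): consolidate k input forecasts of one target
# variable with a multiple linear regression fitted on an analysis sample and
# checked on a holdout sample, with an optional WLS variant that down-weights
# older records.
import numpy as np

def fit_consolidation_model(F, y, weights=None):
    """Fit y ~ b0 + b1*f1 + ... + bk*fk.

    F       : (n, k) array, one column per input forecast.
    y       : (n,)  array, the variable being predicted.
    weights : optional (n,) array, e.g. smaller values for older records (WLS).
    Returns the coefficient vector (b0, b1, ..., bk).
    """
    X = np.column_stack([np.ones(len(y)), F])      # add intercept column
    if weights is not None:
        w = np.sqrt(weights)                        # WLS expressed as scaled OLS
        X, y = X * w[:, None], y * w
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(beta, F):
    return np.column_stack([np.ones(len(F)), F]) @ beta

def mape(y_true, y_pred):
    """Mean absolute percentage error, a typical holdout accuracy measure."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Toy illustration: 3 input forecasts, analysis vs. holdout split.
rng = np.random.default_rng(0)
n, k = 60, 3
truth = 100 + rng.normal(0, 10, n)
F = truth[:, None] + rng.normal(0, 5, (n, k))       # noisy forecasts of the truth
F_fit, F_hold, y_fit, y_hold = F[:45], F[45:], truth[:45], truth[45:]

age_weights = np.linspace(0.5, 1.0, 45)             # older records get smaller weight
beta = fit_consolidation_model(F_fit, y_fit, weights=age_weights)
print("coefficients:", np.round(beta, 3))
print("holdout MAPE: %.2f%%" % mape(y_hold, predict(beta, F_hold)))
```

In this sketch the holdout MAPE stands in for the holdout validation measure used to compare candidate models; the same fit-and-validate cycle would be repeated for each candidate model before the winning model is used to consolidate the forecasts.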
An evaluation of paired comparison models
- Authors: Venter, Daniel Jacobus Lodewyk
- Date: 2004
- Subjects: Paired comparisons (Statistics) , Mathematical statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:11087 , http://hdl.handle.net/10948/364 , Paired comparisons (Statistics) , Mathematical statistics
- Description: Introduction: A typical task in quantitative data analysis is to derive estimates of population parameters based on sample statistics. For manifest variables this is usually a straightforward process utilising suitable measurement instruments and standard statistics such as the mean, median and standard deviation. Latent variables, on the other hand, are typically more elusive, making it difficult to obtain valid and reliable measurements. One of the most widely used methods of estimating the parameter value of a latent variable is to use a summated score derived from a set of individual scores for each of the various attributes of the latent variable. A serious limitation of this method and other similar methods is that the validity and reliability of measurements depend on whether the statements included in the questionnaire cover all characteristics of the variable being measured, and also on respondents' ability to correctly indicate their perceived assessment of the characteristics on the scale provided. Methods without this limitation, and that are especially useful where a set of objects/entities must be ranked based on the parameter values of one or more latent variables, are methods of paired comparisons. Although the underlying assumptions and algorithms of these methods often differ dramatically, they all rely on data derived from a series of comparisons, each consisting of a pair of specimens selected from the set of objects/entities being investigated. Typical examples of the comparison process are: subjects (judges) who have to indicate for each pair of objects which of the two they prefer; sport teams that compete against each other in matches that involve two teams at a time. The resultant data of each comparison range from a simple dichotomy indicating which of the two objects is preferred/better, to an interval or ratio scale score for each object. The best-known paired comparison models (PCMs) are the Thurstone-Mosteller and Bradley-Terry models, which are based on statistical theory assuming that the variable(s) being measured is either normally (Thurstone-Mosteller) or exponentially (Bradley-Terry) distributed. For many years researchers had to rely on these PCMs when analysing paired comparison data without any idea of the implications if the distribution of the data from which their sample was obtained differed from the assumed distribution of the PCM being utilised. To address this problem, PCMs were subsequently developed to cater for discrete variables and for variables with distributions that are neither normal nor exponential. A question that remained unanswered is how the performance of PCMs, as measured by the accuracy of parameter estimates, is affected when they are applied to data from a range of discrete and continuous distributions that violate the assumptions on which the applicable paired comparison algorithm is based. This study is an attempt to answer this question by applying the most popular PCMs to a range of randomly derived data sets that span typical continuous and discrete data distributions. It is hoped that the results of this study will assist researchers when selecting the most appropriate PCM to obtain accurate estimates of the parameters of the variables in their data sets. (An illustrative code sketch of fitting one such model, the Bradley-Terry model, appears after this record.)
- Full Text:
- Date Issued: 2004
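
The abstract above names the Bradley-Terry model as one of the classical paired comparison models (PCMs). The sketch below is a minimal, hypothetical illustration of estimating Bradley-Terry worth parameters from win/loss counts using the classical iterative (minorisation-maximisation) updates; the win matrix and all names are invented for illustration and the code is not taken from the thesis.

```python
# Hypothetical sketch: fit Bradley-Terry worth parameters to a matrix of
# pairwise win counts with the classical iterative updates.
import numpy as np

def bradley_terry(wins, iters=200, tol=1e-10):
    """Estimate Bradley-Terry worth parameters.

    wins : (m, m) array, wins[i, j] = number of times object i beat object j.
    Returns p, normalised to sum to 1, where P(i beats j) = p[i] / (p[i] + p[j]).
    """
    m = wins.shape[0]
    comparisons = wins + wins.T                  # n_ij: total pairings of i and j
    total_wins = wins.sum(axis=1)                # W_i: total wins by object i
    p = np.ones(m) / m
    for _ in range(iters):
        denom = comparisons / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)             # exclude self-comparisons
        p_new = total_wins / denom.sum(axis=1)   # p_i <- W_i / sum_j n_ij/(p_i+p_j)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# Toy example: 3 objects compared in repeated pairs.
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]], dtype=float)
print(np.round(bradley_terry(wins), 3))          # estimated worths, largest = strongest
```

Sorting the estimated worths ranks the objects, and P(i beats j) = p[i] / (p[i] + p[j]) gives the fitted preference probabilities; how accurate such estimates remain when the data violate the model's distributional assumptions is the question the thesis investigates.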