Not enough data to create a plot.
Try a different view from the menu above.
Marwala, Tshilidzi
Autoencoder, Principal Component Analysis and Support Vector Regression for Data Imputation
Marivate, Vukosi N., Nelwamodo, Fulufhelo V., Marwala, Tshilidzi
Data collection often results in records that have missing values or variables. This investigation compares 3 different data imputation models and identifies their merits by using accuracy measures. Autoencoder Neural Networks, Principal components and Support Vector regression are used for prediction and combined with a genetic algorithm to then impute missing variables. The use of PCA improves the overall performance of the autoencoder network while the use of support vector regression shows promising potential for future investigation. Accuracies of up to 97.4 % on imputation of some of the variables were achieved.
Bayesian Approach to Neuro-Rough Models
Marwala, Tshilidzi, Crossingham, Bodie
This paper proposes a neuro-rough model based on multi-layered perceptron and rough set. The neuro-rough model is then tested on modelling the risk of HIV from demographic data. The model is formulated using Bayesian framework and trained using Monte Carlo method and Metropolis criterion. When the model was tested to estimate the risk of HIV infection given the demographic data it was found to give the accuracy of 62%. The proposed model is able to combine the accuracy of the Bayesian MLP model and the transparency of Bayesian rough set model.
Using Genetic Algorithms to Optimise Rough Set Partition Sizes for HIV Data Analysis
Crossingham, Bodie, Marwala, Tshilidzi
In this paper, we present a method to optimise rough set partition sizes, to which rule extraction is performed on HIV data. The genetic algorithm optimisation technique is used to determine the partition sizes of a rough set in order to maximise the rough sets prediction accuracy. The proposed method is tested on a set of demographic properties of individuals obtained from the South African antenatal survey. Six demographic variables were used in the analysis, these variables are; race, age of mother, education, gravidity, parity, and age of father, with the outcome or decision being either HIV positive or negative. Rough set theory is chosen based on the fact that it is easy to interpret the extracted rules. The prediction accuracy of equal width bin partitioning is 57.7% while the accuracy achieved after optimising the partitions is 72.8%. Several other methods have been used to analyse the HIV data and their results are stated and compared to that of rough set theory (RST).
Fault Classification in Cylinders Using Multilayer Perceptrons, Support Vector Machines and Guassian Mixture Models
Marwala, Tshilidzi, Mahola, Unathi, Chakraverty, Snehashish
Gaussian mixture models (GMM) and support vector machines (SVM) are introduced to classify faults in a population of cylindrical shells. The proposed procedures are tested on a population of 20 cylindrical shells and their performance is compared to the procedure, which uses multi-layer perceptrons (MLP). The modal properties extracted from vibration data are used to train the GMM, SVM and MLP. It is observed that the GMM produces 98%, SVM produces 94% classification accuracy while the MLP produces 88% classification rates.
Bayesian approach to rough set
Marwala, Tshilidzi, Crossingham, Bodie
This paper proposes an approach to training rough set models using Bayesian framework trained using Markov Chain Monte Carlo (MCMC) method. The prior probabilities are constructed from the prior knowledge that good rough set models have fewer rules. Markov Chain Monte Carlo sampling is conducted through sampling in the rough set granule space and Metropolis algorithm is used as an acceptance criteria. The proposed method is tested to estimate the risk of HIV given demographic data. The results obtained shows that the proposed approach is able to achieve an average accuracy of 58% with the accuracy varying up to 66%. In addition the Bayesian rough set give the probabilities of the estimated HIV status as well as the linguistic rules describing how the demographic parameters drive the risk of HIV.