A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data

arXiv.org Machine Learning

This dissertation investigates the use of one-sided classification algorithms for separating hazardous chlorinated solvents from other materials, based on their Raman spectra. The experimentation is carried out using a new one-sided classification toolkit that was designed and developed from the ground up. In the one-sided classification paradigm, the objective is to separate elements of the target class from all outliers. These one-sided classifiers are generally chosen, in practice, when there is a deficiency of some sort in the training examples. Sometimes outlier examples can be rare, expensive to label, or even entirely absent. However, this author would like to note that they can be equally applicable when outlier examples are plentiful but nonetheless not statistically representative of the complete outlier concept. It is this scenario that is explicitly dealt with in this research work. In these circumstances, one-sided classifiers have been found to be more robust than conventional multi-class classifiers. The term "unexpected" outliers is introduced to represent outlier examples, encountered in the test set, that have been taken from a different distribution to the training set examples. These examples arise from an inadequate representation of all possible outliers in the training set. It can often be impossible to fully characterise outlier examples, given that they can represent the immeasurable quantity of "everything else" that is not a target. The findings from this research have shown the potential drawbacks of using conventional multi-class classification algorithms when the test data come from a completely different distribution to that of the training samples.
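The one-sided idea can be illustrated with a minimal sketch, assuming a very simple model: a centroid plus a distance threshold fitted on target examples only. This is not the toolkit described in the abstract; the function names (`fit_one_sided`, `is_target`), the synthetic two-dimensional data, and the 95% quantile are all assumptions made for illustration.

```python
import math
import random

def fit_one_sided(targets, quantile=0.95):
    """Fit a centroid and a distance threshold using target examples only."""
    dim = len(targets[0])
    centroid = [sum(x[i] for x in targets) / len(targets) for i in range(dim)]
    dists = sorted(math.dist(x, centroid) for x in targets)
    # Threshold chosen so that roughly `quantile` of the training targets are accepted.
    threshold = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroid, threshold

def is_target(x, model):
    """Accept x as a target if it lies within the fitted distance threshold."""
    centroid, threshold = model
    return math.dist(x, centroid) <= threshold

# Synthetic stand-in for target-class feature vectors (hypothetical data).
rng = random.Random(0)
targets = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
model = fit_one_sided(targets)

# An "unexpected" outlier: no outlier of any kind was seen during training,
# yet it is rejected because it falls outside the target region.
unexpected_outlier = (8.0, -8.0)
print(is_target((0.0, 0.0), model), is_target(unexpected_outlier, model))
```

Because no outlier examples are used in fitting, the decision does not depend on how well the training outliers represent the test-time outliers, which is the property the abstract highlights.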



Network Based Pricing for 3D Printing Services in Two-Sided Manufacturing-as-a-Service Marketplace

arXiv.org Machine Learning

This paper presents approaches to determine network-based pricing for 3D printing services in the context of a two-sided manufacturing-as-a-service marketplace. The intent is to provide cost analytics that enable service bureaus to compete better in the market by moving away from ad hoc and subjective prices. A data mining approach with machine learning methods is used to estimate a price range based on the profile characteristics of 3D printing service suppliers. The model considers factors such as supplier experience, supplier capabilities, customer reviews and ratings from past orders, and scale of operations, among others, to estimate a price range for suppliers' services. Data were gathered from existing marketplace websites and then used to train and test the model. The model classifies a supplier's 3D printer listing into one of seven price categories with an accuracy of 65% for US-based suppliers and 59% for Europe-based suppliers. The improvement over the baseline accuracy of 25% demonstrates that machine learning based methods are promising for network-based pricing in manufacturing marketplaces. Conventional methodologies for pricing services through activity-based costing are inefficient for strategically pricing 3D printing service offerings in a connected marketplace. As opposed to arbitrarily determining prices, this work proposes an approach that uses data mining methods to estimate competitive prices. Such tools can be built into online marketplaces to help independent service bureaus determine service price rates.
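The comparison against a baseline accuracy can be sketched as follows. The abstract does not say how its 25% baseline was obtained; the sketch assumes a majority-class predictor, and the labels and predictions below are hypothetical values for seven price categories, not data from the paper.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent category."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def accuracy(y_true, y_pred):
    """Fraction of listings whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical price categories (1..7) for twelve supplier listings.
y_true = [1, 2, 2, 3, 2, 4, 5, 2, 6, 7, 3, 2]
y_pred = [1, 2, 2, 3, 3, 4, 5, 2, 6, 6, 3, 1]

baseline = majority_baseline_accuracy(y_true)
model_accuracy = accuracy(y_true, y_pred)
print(f"baseline={baseline:.3f} model={model_accuracy:.3f}")
```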


Machine learning enables polymer cloud-point engineering via inverse design

arXiv.org Machine Learning

Inverse design is an outstanding challenge in disordered systems with multiple length scales such as polymers, particularly when designing polymers with desired phase behavior. We demonstrate high-accuracy tuning of poly(2-oxazoline) cloud point via machine learning. With a design space of four repeating units and a range of molecular masses, we achieve an accuracy of 4 °C root mean squared error (RMSE) in a temperature range of 24-90 °C, employing gradient boosting with decision trees. The RMSE is >3x better than linear and polynomial regression. We perform inverse design via particle-swarm optimization, predicting and synthesizing 17 polymers with constrained design at 4 target cloud points from 37 to 80 °C. Our approach challenges the status quo in polymer design with a machine learning algorithm that is capable of fast and systematic discovery of new polymers.
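A minimal sketch of the two ingredients named in the abstract, RMSE and particle-swarm inverse design, is given below under strong assumptions. The surrogate `surrogate_cloud_point` is a made-up linear function of two hypothetical design variables in [0, 1]; it stands in for a trained regressor and is not the paper's gradient-boosted model. The swarm parameters (inertia 0.7, acceleration 1.5) are conventional illustrative choices, not values from the paper.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def surrogate_cloud_point(x):
    # Hypothetical stand-in for a trained regressor (NOT the paper's model):
    # maps two design variables in [0, 1] to a cloud point between 24 and 90 °C.
    return 24.0 + 40.0 * x[0] + 26.0 * x[1]

def pso_inverse_design(target, n_particles=20, n_iters=100, seed=0):
    """Search for a design whose predicted cloud point matches `target`."""
    rng = random.Random(seed)
    dim = 2
    cost = lambda x: (surrogate_cloud_point(x) - target) ** 2
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]   # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                # Clip to the design bounds, a simple form of constrained design.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, surrogate_cloud_point(g_pos)

design, predicted = pso_inverse_design(target=60.0)
print(design, predicted)
print(rmse([30.0, 50.0, 70.0], [33.0, 46.0, 70.0]))
```

In this toy setting many designs reach the same cloud point, so the swarm returns one of several valid candidates; a real workflow would presumably add further constraints to choose among them.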


Dynamic Advisor-Based Ensemble (dynABE): Case Study in Stock Trend Prediction of a Major Critical Metal Producer

arXiv.org Machine Learning

The demand for metals by modern technology has been shifting from common base metals to a variety of minor metals, such as cobalt or indium. The industrial importance and limited geological availability of some minor metals have led to them being considered more "critical," and there is a growing interest in such critical metals and their producing companies. In this research, we create a novel framework, Dynamic Advisor-Based Ensemble (dynABE), to predict the stock trend of major critical metal producers. Specifically, dynABE first utilizes domain knowledge to group the features into different "advisors," each advisor dealing with a particular economic sector. Then, through ensembles of weak classifiers, each advisor produces a prediction result, and all the advisors are combined again in a biased online update fashion to dynamically make the final prediction. Based on a misclassification error of 32% for Jinchuan Group's stock (HKG: 2362), we further test a simple stock trading strategy, which leads to a back-tested return of 296%, or an excess return of 130% within one year. In addition, the feature set selected by dynABE also suggests factors that potentially influence metal criticality, because stock prices of major producers influence metal production. Therefore, not only does this research propose a novel framework for specialized stock trend prediction, it also provides domain insights into dynamic features that potentially influence metal criticality.
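The abstract does not specify how the "biased online update" combines the advisors. As a rough illustration only, the sketch below swaps in a standard multiplicative-weights (weighted majority) scheme: each advisor issues an up (+1) or down (-1) call, the final prediction is the sign of the weighted vote, and advisors that were wrong have their weight shrunk. The function names, the penalty of 0.5, and the prediction stream are all hypothetical.

```python
def weighted_vote(predictions, weights):
    """Combine +1/-1 advisor predictions by the sign of the weighted sum."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1

def update_weights(weights, predictions, outcome, penalty=0.5):
    """Shrink the weight of every advisor that was wrong, then renormalise."""
    new = [w * (penalty if p != outcome else 1.0)
           for w, p in zip(weights, predictions)]
    total = sum(new)
    return [w / total for w in new]

# Hypothetical stream: three advisors' calls, followed by the realised trend.
stream = [
    ([1, -1, -1], 1),
    ([1, -1, 1], 1),
    ([-1, 1, -1], -1),
    ([1, 1, -1], 1),
]

weights = [1 / 3] * 3
finals = []
for preds, outcome in stream:
    finals.append(weighted_vote(preds, weights))   # predict before seeing the outcome
    weights = update_weights(weights, preds, outcome)

print(finals, [round(w, 3) for w in weights])
```

In this toy stream the first advisor is always right, so its weight grows over time and it comes to dominate the final prediction.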