
Regression


A Graph-Based Approach for Active Learning in Regression

arXiv.org Machine Learning

Active learning aims to reduce labeling efforts by selectively asking humans to annotate the most important data points from an unlabeled pool, and is an example of human-machine interaction. Though active learning has been extensively researched for classification and ranking problems, it is relatively understudied for regression problems. Most existing active learning methods for regression use the regression function learned at each active learning iteration to select the next informative point to query. This introduces several challenges, such as handling noisy labels, parameter uncertainty, and overcoming initially biased training data. Instead, we propose a feature-focused approach that formulates both sequential and batch-mode active regression as a novel bipartite graph optimization problem. We conduct experiments in both noise-free and noisy settings. Our experimental results on benchmark data sets demonstrate the effectiveness of our proposed approach.
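The abstract does not spell out the bipartite-graph formulation, so the following is only a sketch of what "feature-focused" selection can look like, using a different and much simpler technique: greedy farthest-point sampling in input space. Each query is chosen from the features alone, never from labels or a fitted regression function. The function name `greedy_feature_selection` and the Euclidean metric are assumptions for illustration.

```python
import math

def greedy_feature_selection(pool, labeled, k):
    """Return up to k indices into `pool`, chosen greedily for feature-space coverage.

    Each pick is the pool point farthest from its nearest already-labeled or
    already-picked point. Only feature vectors are used: no labels, no model.
    """
    # Distance from each pool point to its nearest labeled point (inf if none yet).
    nearest = [min((math.dist(p, q) for q in labeled), default=math.inf) for p in pool]
    chosen = []
    for _ in range(min(k, len(pool))):
        candidates = [i for i in range(len(pool)) if i not in chosen]
        best = max(candidates, key=lambda i: nearest[i])
        chosen.append(best)
        # The new pick now covers nearby points, so shrink their nearest distances.
        for i, p in enumerate(pool):
            nearest[i] = min(nearest[i], math.dist(p, pool[best]))
    return chosen
```

Calling it with `k=1` corresponds to sequential querying and `k>1` to batch mode.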


Getting Started with TensorFlow and Keras – Maker.io Digi-Key Electronics

#artificialintelligence

In this tutorial, we show you how to configure TensorFlow with Keras on a computer and build a simple linear regression model. If you have access to a modern NVIDIA graphics card (GPU), you can enable tensorflow-gpu to take advantage of the parallel processing afforded by CUDA. The field of Artificial Intelligence (AI) has been around for quite some time. As we work toward understanding Edge AI and its use cases, we first need to understand some of the popular frameworks for building machine learning models on personal computers (and servers!). These models can then be deployed to edge devices, such as single-board computers (like the Raspberry Pi) and microcontrollers.
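In Keras, a simple linear regression model is typically a single `tf.keras.layers.Dense(1)` layer trained with a mean-squared-error loss. As a framework-free sketch of roughly what that training computes, the code below fits y ≈ w·x + b by full-batch gradient descent in plain Python. The function name, learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w*x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model; MSE = mean(err^2).
        err = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(err, xs))
        grad_b = 2.0 / n * sum(err)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
```

With TensorFlow installed, the Keras counterpart would compile the single-unit model with an SGD optimizer and `loss="mean_squared_error"` before calling `model.fit`; the update rule follows the same idea, though details such as mini-batching differ.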


Privacy-Preserving Gaussian Process Regression -- A Modular Approach to the Application of Homomorphic Encryption

arXiv.org Machine Learning

Much of machine learning relies on the use of large amounts of data to train models to make predictions. When this data comes from multiple sources, for example when evaluation of data against a machine learning model is offered as a service, there can be privacy issues and legal concerns over the sharing of data. Fully homomorphic encryption (FHE) allows data to be computed on whilst encrypted, which can provide a solution to the problem of data privacy. However, FHE is both slow and restrictive, so existing algorithms must be manipulated to make them work efficiently under the FHE paradigm. Some commonly used machine learning algorithms, such as Gaussian process regression, are poorly suited to FHE and cannot be manipulated to work both efficiently and accurately. In this paper, we show that a modular approach, which applies FHE to only the sensitive steps of a workflow that need protection, allows one party to make predictions on their data using a Gaussian process regression model built from another party's data, without either party gaining access to the other's data, in a way which is both accurate and efficient. This construction is, to our knowledge, the first example of an effectively encrypted Gaussian process.
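To make the computation being protected more concrete, the sketch below computes a standard Gaussian process posterior mean with an RBF kernel, split into two steps the way a modular scheme might split it: the data owner precomputes α = (K + σ²I)⁻¹y, after which a prediction is the weighted sum k(x*, X)ᵀα. Whether the paper splits the workflow at this point is not stated in the abstract. The kernel, length scale, noise level, and function names are assumptions, and no encryption is performed here.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior_mean(train_x, train_y, test_x, noise_var=1e-2):
    # Step 1 (data owner): alpha = (K + noise_var * I)^{-1} y
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)
    # Step 2 (prediction): mean(x*) = sum_i k(x*, x_i) * alpha_i, a plain weighted sum
    return [sum(rbf(t, xi) * ai for xi, ai in zip(train_x, alpha)) for t in test_x]
```

Step 2 uses only multiplications and additions once the kernel values are known, which is the kind of arithmetic FHE schemes handle most naturally; the exponential inside the kernel is not.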


Regularization Helps with Mitigating Poisoning Attacks: Distributionally-Robust Machine Learning Using the Wasserstein Distance

arXiv.org Machine Learning

We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution in a neighbourhood, defined using the Wasserstein distance, around the empirical distribution (extracted from the training dataset corrupted by a poisoning attack). We relax the distributionally-robust machine learning problem by finding an upper bound for the worst-case fitness based on the empirical sample-averaged fitness, with the Lipschitz constant of the fitness function (on the data for given model parameters) as a regularizer. For regression models, we prove that this regularizer is equal to the dual norm of the model parameters. We use the Wine Quality dataset, the Boston Housing Market dataset, and the Adult dataset to demonstrate the results of this paper.
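A minimal sketch of the regularized upper bound, under assumptions the abstract does not fix: a linear model θ, absolute-error loss |y - θᵀx|, a poisoner who perturbs features only, and transport cost measured in an ℓp norm. Under these assumptions the loss is Lipschitz in x with constant ‖θ‖*, the dual (ℓq) norm with 1/p + 1/q = 1, so the worst-case expected loss over a Wasserstein ball of radius ε is at most the empirical mean loss plus ε‖θ‖*. The function names are hypothetical.

```python
import math

def dual_norm(theta, p):
    """Dual of the l_p norm: the l_q norm with 1/p + 1/q = 1."""
    if p == 1:
        return max(abs(t) for t in theta)   # dual of l1 is l_inf
    if p == math.inf:
        return sum(abs(t) for t in theta)   # dual of l_inf is l1
    q = p / (p - 1)
    return sum(abs(t) ** q for t in theta) ** (1.0 / q)

def robust_upper_bound(theta, X, y, eps, p=2):
    """Empirical mean absolute error plus eps times the Lipschitz constant ||theta||_*."""
    residuals = [abs(yi - sum(t * xi for t, xi in zip(theta, row)))
                 for row, yi in zip(X, y)]
    return sum(residuals) / len(residuals) + eps * dual_norm(theta, p)
```

Minimizing this bound over θ trains against the worst case in the ball; with p = 2 the penalty reduces to the ordinary ℓ2 norm of θ.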


Fast quantum learning with statistical guarantees

arXiv.org Machine Learning

A wide class of quantum algorithms for learning problems exploit fast quantum linear algebra subroutines to achieve runtimes that are exponentially faster than their classical counterparts [Cil18]. Examples of these algorithms are quantum support vector machines [RML14], quantum linear regression [WBL12; SSP16], and quantum least squares [KP17; CGJ18]. A careful analysis of these algorithms identified a number of caveats that limit their practical applicability, such as the need for a strong form of quantum access to the input data, restrictions on structural properties of the data matrix (such as condition number or sparsity), and modes of access to the output [Aar15]. Furthermore, if one assumes that it is efficient to (classically) sample elements of the training data in a way proportional to their norm, then it is possible to show that classical algorithms are only polynomially slower (albeit the scaling of the quantum algorithms can be considerably better) [Tan18; CLW18; Chi19a; GLT18; Chi19b]. In this work we continue to investigate the limitations of quantum algorithms for learning problems.
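The sampling assumption above is usually formalized as length-squared (ℓ2-norm) sampling: row i of a data matrix A is drawn with probability ‖A_i‖² / ‖A‖_F². The sketch below shows only that sampling step, in plain Python; the function name is hypothetical, and the classical algorithms cited also rely on sampling entries within a row and on query access to individual entries.

```python
import random

def length_squared_row_sample(A, rng):
    """Draw row index i with probability ||A_i||^2 / ||A||_F^2."""
    weights = [sum(v * v for v in row) for row in A]  # squared row norms
    return rng.choices(range(len(A)), weights=weights, k=1)[0]

A = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]  # squared row norms: 25, 0, 1
rng = random.Random(42)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[length_squared_row_sample(A, rng)] += 1
```

Row 0 should be drawn about 25/26 of the time and the all-zero row never.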


A random forest based approach for predicting spreads in the primary catastrophe bond market

arXiv.org Machine Learning

We introduce a random forest approach to predicting spreads in the primary catastrophe bond market. We investigate whether all information provided to investors in the offering circular prior to a new issuance is equally important in predicting its spread. The whole population of non-life catastrophe bonds issued from December 2009 to May 2018 is used. The random forest shows strong predictive power on unseen primary catastrophe bond data, explaining 93% of the total variability. For comparison, linear regression, our benchmark model, has inferior predictive performance, explaining only 47% of the total variability. All details provided in the offering circular are predictive of spread, but to varying degrees. The stability of the results is also studied. The use of random forests can speed up investment decisions in the catastrophe bond industry.
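Phrases such as "explaining 93% of the total variability" most likely refer to the coefficient of determination, R² = 1 - SS_res / SS_tot, computed on held-out data; that reading is an assumption, since the abstract does not name the metric. A minimal sketch:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance around the mean explained by predictions."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)                # total variability
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # unexplained variability
    return 1.0 - ss_res / ss_tot
```

On unseen data R² can fall below zero when a model predicts worse than the constant mean.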


Technical Background for "A Precision Medicine Approach to Develop and Internally Validate Optimal Exercise and Weight Loss Treatments for Overweight and Obese Adults with Knee Osteoarthritis"

arXiv.org Machine Learning

A precision medicine (PM) pipeline was developed to determine the optimal treatment regime for participants in an exercise (E), dietary weight loss (D), and D+E randomized clinical trial for knee osteoarthritis, so as to maximize their expected outcomes. Using data from 343 participants of the Intensive Diet and Exercise for Arthritis (IDEA) trial, we applied 24 machine-learning models to develop individualized treatment rules for seven outcomes: SF-36 physical component score, weight loss, WOMAC pain/function/stiffness scores, compressive force, and IL-6. The optimal precision medicine model (PMM) was selected based on jackknife value function estimates, which indicate the improvement in the outcome(s) had future participants followed the estimated decision rule. This model was then compared, using a Z-test, against the optimal single fixed-treatment model, called the zero-order model (ZOM). Multiple-outcome random forest was the optimal model for the WOMAC outcomes. The PMMs supported the overall finding from IDEA that the D+E intervention was optimal for most participants, but there was evidence that a subgroup of participants would likely benefit more from diet alone for two outcomes. This article provides detailed technical background for the clinical data analysis.
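The abstract does not define the value estimator, so the sketch below uses a standard one as an assumption: in a randomized trial, the value of a decision rule d can be estimated by the inverse-probability-weighted mean (1/n) Σ 1{A_i = d(X_i)} Y_i / P(A_i), which reweights the outcomes of participants whose assigned arm happens to match the rule's recommendation. The data, the equal 1/3 allocation, the larger-is-better outcome, and both rules are hypothetical, and the jackknife step used in the paper is not reproduced.

```python
def ipw_value(records, rule, propensity):
    """Inverse-probability-weighted estimate of the mean outcome under `rule`.

    records: list of (covariates, assigned_arm, outcome) from a randomized trial.
    rule: function mapping covariates to a recommended arm.
    propensity: probability of each arm under the trial's randomization.
    """
    total = 0.0
    for x, arm, outcome in records:
        if rule(x) == arm:  # only participants whose arm agrees with the rule contribute
            total += outcome / propensity[arm]
    return total / len(records)

# Hypothetical toy data: covariate is a 0/1 subgroup flag; larger outcome = better.
propensity = {"E": 1 / 3, "D": 1 / 3, "D+E": 1 / 3}
records = [(0, "D+E", 6.0), (0, "D", 3.0), (0, "E", 2.0),
           (1, "D+E", 4.0), (1, "D", 5.0), (1, "E", 1.0)]
fixed = ipw_value(records, lambda x: "D+E", propensity)                        # one treatment for all
tailored = ipw_value(records, lambda x: "D" if x == 1 else "D+E", propensity)  # subgroup rule
```

The fixed "always D+E" rule plays a role similar to a zero-order model; comparing its estimated value with the tailored rule's mirrors the PMM-versus-ZOM comparison, without the jackknife or the Z-test.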


Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

arXiv.org Machine Learning

The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating the location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.
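To illustrate why a single predicted value can be insufficient, the sketch below simulates data whose location and scale both change with x, then summarizes p(y|x) in two x-bins by the median (location) and interquartile range (scale). The simulation and the binning are assumptions for illustration; this is not the paper's optimal transformation procedure, and it does not handle discrete, truncated, or censored y-values.

```python
import random
import statistics

def binned_location_scale(xs, ys, edges):
    """Median (location) and interquartile range (scale) of y within each x-bin."""
    out = []
    for lo, hi in zip(edges, edges[1:]):
        ys_bin = [y for x, y in zip(xs, ys) if lo <= x < hi]
        q1, q2, q3 = statistics.quantiles(ys_bin, n=4)  # quartile cut points
        out.append((q2, q3 - q1))
    return out

rng = random.Random(0)
xs = [rng.random() for _ in range(4000)]
# Simulated p(y|x): location 2x and scale (0.1 + x) both depend on x.
ys = [2 * x + (0.1 + x) * rng.gauss(0, 1) for x in xs]
summary = binned_location_scale(xs, ys, [0.0, 0.5, 1.0])
```

Both entries of the second bin should exceed those of the first: the distribution shifts upward and widens as x grows.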


Data-Driven Prediction Model of Components Shift during Reflow Process in Surface Mount Technology

arXiv.org Machine Learning

In surface mount technology (SMT), components mounted on soldered pads may move during the reflow process. This capability is known as self-alignment and is the result of the fluid dynamic behaviour of molten solder paste. It is critical in SMT: inaccurate self-alignment causes defects such as overhanging and tombstoning, while, on the other hand, self-alignment can enable components to be perfectly self-assembled on or near the desired position. The aim of this study is to develop a machine learning model that predicts the components' movement during reflow in the x and y directions as well as their rotation. Our study is composed of two steps: (1) experimental data are studied to reveal the relationships between self-alignment and various factors, including component geometry and pad geometry; (2) advanced machine learning prediction models, namely support vector regression (SVR), neural network (NN), and random forest regression (RFR), are applied to predict the distance and direction of component shift. As a result, RFR can predict component shift with an average fitness of 99%, 99%, and 96% and an average prediction error of 13.47 μm, 12.02 μm, and 1.52° for component shift in the x, y, and rotational directions, respectively. This provides a future capability to optimize the parameters of the pick-and-place machine, controlling the placement location and minimizing the intrinsic defects caused by self-alignment.
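As a sketch of the prediction targets and one plausible error metric, the code below computes a component's shift (dx, dy, dθ) from hypothetical before/after positions, then a per-direction mean absolute error between measured and predicted shifts. Reading "average prediction error" as mean absolute error is an assumption, and rotation wrap-around is ignored, which is reasonable only for small angles. Function names and data are hypothetical.

```python
def component_shift(before, after):
    """before/after: (x_um, y_um, theta_deg) of a component; returns (dx, dy, dtheta)."""
    return tuple(a - b for a, b in zip(after, before))

def mean_abs_error_per_direction(measured, predicted):
    """Mean absolute error separately for the x, y, and rotational shift."""
    n = len(measured)
    return tuple(sum(abs(m[d] - p[d]) for m, p in zip(measured, predicted)) / n
                 for d in range(3))

shift = component_shift((100.0, 50.0, 0.0), (112.0, 45.0, 1.5))
measured = [(12.0, -5.0, 1.5), (0.0, 10.0, -1.0)]
predicted = [(10.0, -5.0, 1.0), (3.0, 6.0, -1.0)]
mae = mean_abs_error_per_direction(measured, predicted)
```

Here `shift` is (12.0, -5.0, 1.5) and `mae` is (2.5, 2.0, 0.25), in μm, μm, and degrees.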


Optimization of Passive Chip Components Placement with Self-Alignment Effect for Advanced Surface Mounting Technology

arXiv.org Machine Learning

Surface mount technology (SMT) is an enhanced method in electronic packaging in which electronic components are placed directly on a soldered printed circuit board (PCB) and are permanently attached to the PCB by means of the reflow soldering process. During reflow, once the deposited solder paste starts melting, electronic components move in the direction that achieves their highest symmetry. This motion is known as self-alignment, since it can correct potential mounting misalignment. In this study, two well-known machine learning algorithms, support vector regression (SVR) and random forest regression (RFR), are proposed as prediction techniques to (1) diagnose the relation among component self-alignment, deposited solder paste status and placement machining parameters, and (2) predict the final component position on the PCB in the x, y, and rotational directions before the component enters the reflow process. Based on the prediction results, a non-linear optimization model (NLP) is developed to optimize placement parameters at the initial stage. RFR outperforms SVR in terms of prediction model fitness and error. The optimization model is run on 6 samples, for which the minimum Euclidean distance between the component position after reflow and the ideal position (i.e., the center of the pads) is 25.57 μm within the boundaries defined in the model.
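The paper optimizes placement with a nonlinear programming model; the sketch below swaps in a simpler technique, exhaustive grid search over placement offsets inside a box constraint, to show the shape of the problem. `predicted_position` is a hypothetical stand-in for a trained model such as an RFR (here a made-up linear rule with 70% self-alignment and a fixed drift), and the bound and step size are assumptions.

```python
import itertools
import math

def predicted_position(offset_x, offset_y):
    """HYPOTHETICAL stand-in for a trained model (e.g. an RFR): post-reflow position
    in um relative to the pad center, assuming 70% self-alignment plus a fixed drift."""
    return 0.3 * offset_x + 6.0, 0.3 * offset_y - 3.0

def best_placement(predict, bound_um=50.0, step_um=1.0):
    """Grid-search placement offsets within +/-bound_um to minimize the Euclidean
    distance between the predicted post-reflow position and the pad center."""
    n = int(round(bound_um / step_um))
    grid = [k * step_um for k in range(-n, n + 1)]
    best = None
    for ox, oy in itertools.product(grid, grid):
        px, py = predict(ox, oy)
        d = math.hypot(px, py)
        if best is None or d < best[0]:
            best = (d, ox, oy)
    return best  # (distance_um, offset_x_um, offset_y_um)
```

Because the search only calls `predict`, the same loop works with a non-differentiable predictor such as a random forest, at the cost of evaluating every grid point.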