

Regression


Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer

Journal of Artificial Intelligence Research

Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.
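To make the zero-shot step more concrete, here is a minimal Python sketch under stated assumptions. It assumes that each task's model parameters are approximated as theta ≈ L s and its descriptor as phi ≈ D s, sharing one code s across the two dictionaries, and that L and D have already been learned. Given only a new descriptor phi, it solves for s and returns L s. The ridge penalty is a closed-form stand-in for a sparsity-inducing penalty, and `zero_shot_model` and `solve_linear` are hypothetical names, not the paper's algorithm.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve_linear(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def zero_shot_model(L: Matrix, D: Matrix, phi: Vector, mu: float = 1e-3) -> Vector:
    """Hypothetical zero-shot step: encode descriptor phi over D, decode with L.

    L: d x k model dictionary, D: m x k descriptor dictionary (assumed learned).
    s minimizes ||phi - D s||^2 + mu * ||s||^2 (ridge stand-in for sparse coding).
    """
    k = len(D[0])
    # Normal equations: (D^T D + mu I) s = D^T phi
    gram = [[sum(row[i] * row[j] for row in D) + (mu if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * p for row, p in zip(D, phi)) for i in range(k)]
    s = solve_linear(gram, rhs)
    # Predicted task parameters: theta = L s
    return [sum(lij * sj for lij, sj in zip(row, s)) for row in L]


# Toy example: phi was generated from the code s = [1, 2]
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
L = [[2.0, 0.0], [0.0, 3.0]]
theta = zero_shot_model(L, D, phi=[1.0, 2.0, 3.0], mu=1e-9)  # approximately [2.0, 6.0]
```

No training data for the new task is touched; only the descriptor and the two dictionaries are used, which is the property the abstract highlights.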


Ensemble Forecasting of Monthly Electricity Demand using Pattern Similarity-based Methods

arXiv.org Machine Learning

This work presents ensemble forecasting of monthly electricity demand using pattern similarity-based forecasting methods (PSFMs). The PSFMs applied in this study include the $k$-nearest neighbor model, fuzzy neighborhood model, kernel regression model, and general regression neural network. An integral part of PSFMs is a time series representation based on patterns of time series sequences. Pattern representation unifies the input and output data by filtering out the trend and equalizing the variance. Two types of ensembles are created: heterogeneous and homogeneous. The former consists of base models of different types, while the latter consists of base models of a single type. Five strategies are used to control the diversity of members in the homogeneous approach. The diversity is generated using different subsets of training data, different subsets of features, randomly disrupted input and output variables, and randomly disrupted model parameters. In an empirical illustration, the ensemble models, as well as the individual PSFMs for comparison, are applied to forecasting monthly electricity demand for 35 European countries.
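As an illustration of the pattern representation and the $k$-nearest neighbor model, here is a minimal Python sketch. It assumes one common PSFM variant: each input sequence is coded as (value - mean) / dispersion, with the dispersion taken as the Euclidean norm of the centered sequence, and the following output sequence is coded with the same input-sequence mean and dispersion, so a forecast can be decoded from the query's own statistics. The homogeneous ensemble shown uses only one of the listed diversity sources (random subsets of training data). Window lengths, ensemble size, and all function names are hypothetical.

```python
import math
import random
from statistics import mean
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def encode(seq: Sequence[float]) -> Tuple[List[float], float, float]:
    """Pattern of a sequence: center on its mean, scale by its dispersion."""
    m = mean(seq)
    d = math.sqrt(sum((v - m) ** 2 for v in seq))
    if d == 0:
        raise ValueError("a constant sequence has no pattern")
    return [(v - m) / d for v in seq], m, d


def training_pairs(series: Sequence[float], n: int, h: int) -> List[Pair]:
    """Sliding windows: n-step input pattern -> following h-step output pattern."""
    pairs = []
    for start in range(len(series) - n - h + 1):
        x, m, d = encode(series[start:start + n])
        # Output coded with the INPUT sequence's mean and dispersion (assumed variant)
        y = [(v - m) / d for v in series[start + n:start + n + h]]
        pairs.append((x, y))
    return pairs


def knn_forecast(pairs: List[Pair], query: Sequence[float], k: int) -> List[float]:
    """Average the output patterns of the k nearest input patterns, then decode."""
    xq, m, d = encode(query)
    nearest = sorted(pairs, key=lambda p: math.dist(p[0], xq))[:k]
    y_hat = [mean(p[1][j] for p in nearest) for j in range(len(nearest[0][1]))]
    return [m + d * v for v in y_hat]


def bagged_knn(pairs: List[Pair], query: Sequence[float], k: int,
               members: int = 20, frac: float = 0.7, seed: int = 0) -> List[float]:
    """Homogeneous ensemble: each member sees a random subset of training pairs."""
    rng = random.Random(seed)
    size = max(k, int(frac * len(pairs)))
    forecasts = [knn_forecast(rng.sample(pairs, size), query, k) for _ in range(members)]
    return [mean(col) for col in zip(*forecasts)]
```

Because the trend level and scale are removed before neighbors are compared, a window from a low-demand early year can still serve as a neighbor for a recent high-demand year with the same seasonal shape.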


Are Direct Links Necessary in RVFL NNs for Regression?

arXiv.org Machine Learning

A random vector functional link network (RVFL) is widely used as a universal approximator for classification and regression problems. The big advantage of RVFL is fast training without backpropagation, because the weights and biases of the hidden nodes are selected randomly and stay untrained. Recently, alternative architectures with randomized learning have been developed which differ from RVFL in that they have neither direct links nor a bias term in the output layer. In this study, we investigate the effect of direct links and the output node bias on the regression performance of RVFL. To generate the random parameters of the hidden nodes, we use the classical method and two new methods recently proposed in the literature. We test RVFL performance on several function approximation problems with target functions of different nature: nonlinear, nonlinear with strong fluctuations, nonlinear with a linear component, and linear. Surprisingly, we found that the direct links and output node bias do not play an important role in improving RVFL accuracy for typical nonlinear regression problems. Keywords: Random vector functional link network · Neural networks with random hidden nodes · Randomized learning algorithms.
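The architecture under study can be sketched in a few lines of dependency-free Python. This is a minimal illustration, assuming sigmoid hidden nodes, the classical approach of drawing hidden weights and biases uniformly from a fixed interval [-scale, scale], and ridge-regularized least squares for the output weights (the only trained parameters). The flags `direct_links` and `output_bias` toggle exactly the two components the study examines; the class and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class RVFL:
    """Sketch of an RVFL regressor with optional direct links and output bias."""

    def __init__(self, n_hidden=20, scale=1.0, direct_links=True,
                 output_bias=True, ridge=1e-6, seed=0):
        self.n_hidden, self.scale = n_hidden, scale
        self.direct_links, self.output_bias = direct_links, output_bias
        self.ridge, self.seed = ridge, seed

    def _features(self, x: Sequence[float]) -> List[float]:
        # Random hidden nodes: fixed after initialization, never trained
        feats = [_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                 for w, b in zip(self.W, self.b)]
        if self.direct_links:
            feats.extend(x)      # direct input-to-output links
        if self.output_bias:
            feats.append(1.0)    # bias term of the output node
        return feats

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d = len(X[0])
        # Classical method: hidden weights and biases uniform in [-scale, scale]
        self.W = [[rng.uniform(-self.scale, self.scale) for _ in range(d)]
                  for _ in range(self.n_hidden)]
        self.b = [rng.uniform(-self.scale, self.scale) for _ in range(self.n_hidden)]
        H = [self._features(x) for x in X]
        p = len(H[0])
        # Output weights only: solve (H^T H + ridge I) beta = H^T y
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(p)] for i in range(p)]
        c = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(p)]
        self.beta = _solve(A, c)
        return self

    def predict(self, X):
        return [sum(f * bj for f, bj in zip(self._features(x), self.beta)) for x in X]
```

Fitting the same data with `RVFL(direct_links=False, output_bias=False)` and with the defaults gives a quick way to compare the two architectures on a given target. Note that with inputs in [0, 1], a small `scale` makes the sigmoids nearly linear over the input range, which is one reason the choice of hidden-parameter generator matters.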


Machine learning with Python: An introduction

#artificialintelligence

Machine learning is one of our most important technologies for the future. Self-driving cars, voice-controlled speakers, and face detection software are all built on machine learning technologies and frameworks. As a software developer you may wonder how this will impact your daily work, including the tools and frameworks you should learn. If you're reading this article, my guess is you've already decided to learn more about machine learning. In my previous article, "Machine Learning for Java developers," I introduced Java developers to setting up a machine learning algorithm and developing a simple prediction function in Java.


Random Machines Regression Approach: an ensemble support vector regression model with free kernel choice

arXiv.org Machine Learning

Machine learning techniques always aim to reduce the generalization error. To reduce it, ensemble methods offer a good approach: combining several models results in greater forecasting capacity. Random Machines have already been shown to be a strong technique (i.e., one with high predictive power) for classification tasks; in this article, we propose a procedure for applying the bagged-weighted support vector model to regression problems. Simulation studies were carried out on artificial datasets and on real data benchmarks. The results showed good performance of Regression Random Machines, with lower generalization error and without the need to choose the best kernel function during tuning.


Robust Q-learning

arXiv.org Machine Learning

A dynamic treatment strategy is a sequence of decision rules that maps individual characteristics to a treatment option at each decision point (i.e., a specific point in time at which a treatment is to be considered or altered). An optimal dynamic treatment strategy seeks to make these decisions so as to maximize a particular expected health outcome (Lavori & Dawson, 2000; Murphy, 2005; Nahum-Shani et al., 2012a; Lei et al., 2012; Davidian et al., 2016). This is similar to clinical decision making, in which care providers tailor the type or dose of treatment over the course of clinical care based on ongoing information about the patient's progress in treatment. The main goal of precision medicine (i.e., developing an effective dynamic treatment strategy) is to use patient characteristics to inform a personalized treatment plan, expressed as a sequence of decision rules that leads to the best possible health outcome for each patient (Nahum-Shani et al., 2012a; Chakraborty & Moodie, 2013; Moodie & Kosorok, 2015; Butler et al., 2018). Q-learning is a reinforcement learning algorithm that is widely used to estimate an optimal dynamic treatment strategy using data from multistage randomized clinical trials or observational studies (Watkins & Dayan, 1992; Nahum-Shani et al., 2012b; Laber et al., 2014).
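The backward-induction idea behind Q-learning for a two-stage strategy can be sketched as follows. This is a minimal illustration of standard (non-robust) Q-learning, assuming discrete patient states, records of the form (s1, a1, s2, a2, y), and saturated cell-mean Q-functions in place of the regression models usually fitted in practice; it is not the robust method the title refers to, and all names are hypothetical. Stage 2 is fitted first, each patient's outcome is then replaced by the predicted outcome under the best stage-2 action (the pseudo-outcome), and stage 1 is fitted to those pseudo-outcomes.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, Iterable, List, Tuple

# One patient record: (stage-1 state, stage-1 action, stage-2 state, stage-2 action, outcome)
Record = Tuple[Hashable, Hashable, Hashable, Hashable, float]


def cell_means(pairs: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Saturated regression: the mean value within each key cell."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


def argmax_rule(q: Dict[Tuple[Hashable, Hashable], float]) -> Dict[Hashable, Hashable]:
    """Decision rule: for each history, the observed action with the largest Q-value."""
    rule = {}
    for (hist, action), value in q.items():
        if hist not in rule or value > q[(hist, rule[hist])]:
            rule[hist] = action
    return rule


def two_stage_q_learning(data: List[Record]):
    # Stage 2: Q2(h2, a2) with history h2 = (s1, a1, s2)
    q2 = cell_means((((s1, a1, s2), a2), y) for s1, a1, s2, a2, y in data)
    d2 = argmax_rule(q2)
    # Stage 1: regress the pseudo-outcome (value of the optimal stage-2 action) on (s1, a1)
    q1 = cell_means(
        ((s1, a1), q2[((s1, a1, s2), d2[(s1, a1, s2)])])
        for s1, a1, s2, _a2, _y in data
    )
    d1 = argmax_rule(q1)
    return d1, d2, q1, q2
```

The returned `d1` and `d2` together form the estimated dynamic treatment strategy: `d1` maps the baseline state to a first treatment, and `d2` maps the accumulated history to a second treatment.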


Semiparametric Inference For Causal Effects In Graphical Models With Hidden Variables

arXiv.org Machine Learning

The last decade witnessed the development of algorithms that completely solve the identifiability problem for causal effects in hidden variable causal models associated with directed acyclic graphs. However, much of this machinery remains underutilized in practice owing to the complexity of estimating identifying functionals yielded by these algorithms. In this paper, we provide simple graphical criteria and semiparametric estimators that bridge the gap between identification and estimation for causal effects involving a single treatment and a single outcome. First, we provide influence function based doubly robust estimators that cover a significant subset of hidden variable causal models where the effect is identifiable. We further characterize an important subset of this class for which we demonstrate how to derive the estimator with the lowest asymptotic variance, i.e., one that achieves the semiparametric efficiency bound. Finally, we provide semiparametric estimators for any single treatment causal effect parameter identified via the aforementioned algorithms. The resulting estimators resemble influence function based estimators that are sequentially reweighted, and exhibit a partial double robustness property, provided the parts of the likelihood corresponding to a set of weight models are correctly specified. Our methods are easy to implement and we demonstrate their utility through simulations.
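The estimators in this paper target hidden-variable models, but the "influence function based doubly robust" idea is easiest to see in the classical special case with no hidden confounding, where the effect is identified by adjusting for observed covariates. There, the influence-function-based estimator is the familiar augmented inverse probability weighting (AIPW) estimator, sketched below. The nuisance fits (propensity `e`, outcome regressions `mu1` and `mu0`) are assumed to come from some separate model; the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def aipw_ate(a: Sequence[int], y: Sequence[float], e: Sequence[float],
             mu1: Sequence[float], mu0: Sequence[float]) -> float:
    """Doubly robust (AIPW) estimate of E[Y(1)] - E[Y(0)] without hidden confounders.

    a: binary treatment, y: outcome, e: fitted P(A=1 | X),
    mu1 / mu0: fitted E[Y | A=1, X] and E[Y | A=0, X], one value per unit.
    """
    n = len(a)
    if not all(len(v) == n for v in (y, e, mu1, mu0)):
        raise ValueError("all inputs must have the same length")
    terms = []
    for ai, yi, ei, m1, m0 in zip(a, y, e, mu1, mu0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model contrast plus inverse-probability-weighted residual corrections
        terms.append(m1 - m0 + ai * (yi - m1) / ei - (1 - ai) * (yi - m0) / (1 - ei))
    return fmean(terms)
```

Double robustness shows up directly in the formula: if the outcome models are exact, the residual corrections vanish whatever the propensity; if the outcome models are poor but the propensity is right, the weighted residuals correct the contrast (with a known propensity of 0.5 and zero outcome models, the estimate reduces to a difference of weighted means).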


On the role of surrogates in the efficient estimation of treatment effects with limited outcome data

arXiv.org Machine Learning

We study the problem of estimating treatment effects when the outcome of primary interest (e.g., long-term health status) is only seldom observed but abundant surrogate observations (e.g., short-term health outcomes) are available. To investigate the role of surrogates in this setting, we derive the semiparametric efficiency lower bounds of the average treatment effect (ATE) both with and without the presence of surrogates, as well as in several intermediary settings. These bounds characterize the best possible precision of ATE estimation in each case, and their difference quantifies, in terms of key problem characteristics, the efficiency gains from optimally leveraging the surrogates when only limited outcome data are available. We show that these results apply in two important regimes: when the number of surrogate observations is comparable to the number of primary-outcome observations, and when the former dominates the latter. Importantly, we take a missing-data approach that circumvents strong surrogate conditions that are commonly assumed in the previous literature but almost always fail in practice. To show how to leverage the efficiency gains of surrogate observations, we propose ATE estimators and inferential methods that use flexible machine learning methods to estimate the nuisance parameters appearing in the influence functions. We show that our estimators enjoy efficiency and robustness guarantees under weak conditions.


Integrating Informativeness, Representativeness and Diversity in Pool-Based Sequential Active Learning for Regression

arXiv.org Machine Learning

In many real-world machine learning applications, unlabeled samples are easy to obtain, but labeling them is expensive and/or time-consuming. Active learning is a common approach for reducing this labeling effort: it selects a small number of the most useful samples to label, so that a better machine learning model can be trained from the same number of labeled samples. This paper considers active learning for regression (ALR) problems. Three essential criteria (informativeness, representativeness, and diversity) have been proposed for ALR. However, very few approaches in the literature have considered all three simultaneously. We propose three new ALR approaches, with different strategies for integrating the three criteria. Extensive experiments on 12 datasets from various domains demonstrated their effectiveness.
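Of the three criteria, diversity is the simplest to show in code. The sketch below is a basic greedy-sampling baseline in input space, not one of the paper's three approaches. It starts from the pool sample closest to the centroid (a crude representativeness heuristic), then repeatedly adds the unlabeled sample farthest from everything already selected. It ignores informativeness entirely, which is the kind of gap the paper's integrated approaches address; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def greedy_sampling(pool: Sequence[Sequence[float]], n_select: int) -> List[int]:
    """Return indices of pool samples to label, in selection order."""
    dim = len(pool[0])
    centroid = [sum(p[j] for p in pool) / len(pool) for j in range(dim)]
    # Representative start: the sample closest to the pool centroid
    first = min(range(len(pool)), key=lambda i: math.dist(pool[i], centroid))
    selected, chosen = [first], {first}
    # Distance from every sample to its nearest already-selected sample
    min_dist = [math.dist(p, pool[first]) for p in pool]
    while len(selected) < min(n_select, len(pool)):
        # Diversity: take the unselected sample farthest from the current selection
        nxt = max((i for i in range(len(pool)) if i not in chosen),
                  key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        min_dist = [min(d, math.dist(p, pool[nxt])) for d, p in zip(min_dist, pool)]
    return selected
```

On a one-dimensional pool of 0, 1, 2, 3, 4, this selects the middle point first and then the two extremes, spreading the labeling budget across the input space.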