Regression
On The Problem of Relevance in Statistical Inference
Mukhopadhyay, Subhadeep, Wang, Kaijun
How many statistical inference tools do we have for inference from massive data? A huge number, but only when we are ready to assume that the given database is homogeneous, consisting of a large cohort of "similar" cases. Why do we need the homogeneity assumption? To make "learning from the experience of others" or "borrowing strength" possible. But what if we are dealing with a massive database of heterogeneous cases (which is the norm in almost all modern data-science applications, including neuroscience, genomics, healthcare, and astronomy)? How many methods do we have in this situation? Not many, if not zero. Why? It is not obvious how to go about gathering strength when each piece of information is fuzzy. The danger is that, if we include irrelevant cases, borrowing information might heavily damage the quality of the inference! This raises some fundamental questions for big data inference: When (not) to borrow? From whom (not) to borrow? How (not) to borrow? These questions are at the heart of the "Problem of Relevance" in statistical inference -- a puzzle that has remained too little addressed since its inception nearly half a century ago. Here we offer the first practical theory of relevance with a precisely describable statistical formulation and algorithm. Through examples, we demonstrate how our new statistical perspective answers previously unanswerable questions in a realistic and feasible way.
An Extension of LIME with Improvement of Interpretability and Fidelity
Shi, Sheng, Du, Yangzhou, Fan, Wei
While deep learning has made significant achievements in Artificial Intelligence (AI), its lack of transparency has limited its broad application in various vertical domains. Explainability is not only a gateway between AI and the real world, but also a powerful feature for detecting flaws in the models and bias in the data. Local Interpretable Model-agnostic Explanation (LIME) is a widely accepted technique that explains the prediction of any classifier faithfully by learning an interpretable model locally around the predicted instance. As an extension of LIME, this paper proposes a high-interpretability and high-fidelity local explanation method, known as Local Explanation using feature Dependency Sampling and Nonlinear Approximation (LEDSNA). Given an instance being explained, LEDSNA enhances interpretability by feature sampling with intrinsic dependency. In addition, LEDSNA improves local explanation fidelity by approximating the nonlinear boundary of the local decision. We evaluate our method on classification tasks in both the image domain and the text domain. Experiments show that LEDSNA's explanation of the black-box model achieves much better performance than the original LIME in terms of interpretability and fidelity.
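The core idea behind LIME described above (fit an interpretable model locally around one instance) can be illustrated with a minimal sketch. This is not LEDSNA and not the reference LIME implementation: it is a simplified variant for continuous features that perturbs the instance with Gaussian noise, weights samples by proximity, and fits a weighted linear model. The function `black_box`, the helper `local_linear_explanation`, and the parameters `scale` and `width` are all hypothetical names chosen for illustration.

```python
import numpy as np

def black_box(X):
    # Hypothetical nonlinear classifier score; stands in for any opaque model.
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - X[:, 1] ** 2)))

def local_linear_explanation(f, x0, n_samples=2000, scale=0.1, width=0.2, seed=0):
    """Fit a proximity-weighted linear surrogate of f around the instance x0."""
    rng = np.random.default_rng(seed)
    # Sample perturbed points in a neighborhood of x0.
    Z = x0 + scale * rng.normal(size=(n_samples, x0.size))
    # Exponential kernel: points closer to x0 get larger weight.
    w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / width**2)
    # Weighted least squares via row scaling by sqrt(weight).
    A = np.hstack([np.ones((n_samples, 1)), Z - x0])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, f(Z) * sw[:, 0], rcond=None)
    return coef[1:]  # local slope per feature (intercept dropped)

x0 = np.array([0.0, 1.0, 0.5])
weights = local_linear_explanation(black_box, x0)
print(weights)
```

In this toy setup the third feature does not influence `black_box`, so its local weight should come out near zero, while the first two should carry opposite signs. The linear surrogate is exactly the part that LEDSNA, per the abstract, replaces with a nonlinear approximation.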
Symbolic Regression Driven by Training Data and Prior Knowledge
Kubalík, J., Derner, E., Babuška, R.
In symbolic regression, the search for analytic models is typically driven purely by the prediction error observed on the training data samples. However, when the data samples do not sufficiently cover the input space, the prediction error does not provide sufficient guidance toward desired models. Standard symbolic regression techniques then yield models that are partially incorrect, for instance, in terms of their steady-state characteristics or local behavior. If these properties were considered already during the search process, more accurate and relevant models could be produced. We propose a multi-objective symbolic regression approach that is driven by both the training data and the prior knowledge of the properties the desired model should manifest. The properties given in the form of formal constraints are internally represented by a set of discrete data samples on which candidate models are exactly checked. The proposed approach was experimentally evaluated on three test problems with results clearly demonstrating its capability to evolve realistic models that fit the training data well while complying with the prior knowledge of the desired model characteristics at the same time. It outperforms standard symbolic regression by several orders of magnitude in terms of the mean squared deviation from a reference model.
Improving the Decision-Making Process of Self-Adaptive Systems by Accounting for Tactic Volatility
Palmerino, Jeffrey, Yu, Qi, Desell, Travis, Krutz, Daniel E.
When self-adaptive systems encounter changes within their surrounding environments, they enact tactics to perform necessary adaptations. For example, a self-adaptive cloud-based system may have a tactic that initiates additional computing resources when response time thresholds are surpassed, or there may be a tactic to activate a specific security measure when an intrusion is detected. In real-world environments, these tactics frequently experience tactic volatility, which is variable behavior during the execution of the tactic. Unfortunately, current self-adaptive approaches do not account for tactic volatility in their decision-making processes, and merely assume that tactics do not experience volatility. This limitation creates uncertainty in the decision-making process and may adversely impact the system's ability to adapt effectively and efficiently. Additionally, many processes do not properly account for volatility that may affect the system's Service Level Agreement (SLA). This can limit the system's ability to act proactively, especially when utilizing tactics that involve latency. To address the challenge of sufficiently accounting for tactic volatility, we propose a Tactic Volatility Aware (TVA) solution. Using Multiple Regression Analysis (MRA), TVA enables self-adaptive systems to accurately estimate the cost and time required to execute tactics. TVA also utilizes Autoregressive Integrated Moving Average (ARIMA) for time series forecasting, allowing the system to proactively maintain specifications.
Machine Learning using C for Linear and Logistic Regression
The applications of machine learning transcend boundaries and industries so why should we let tools and languages hold us back? Yes, Python is the language of choice in the industry right now but a lot of us come from a background where Python isn't taught! The computer science faculty in universities are still teaching programming in C – so that's what most of us end up learning first. I understand why you should learn Python – it's the primary language in the industry and it has all the libraries you need to get started with machine learning. But what if your university doesn't teach it?
Active Learning for Gaussian Process Considering Uncertainties with Application to Shape Control of Composite Fuselage
Yue, Xiaowei, Wen, Yuchen, Hunt, Jeffrey H., Shi, Jianjun
This paper has been accepted by IEEE Transactions on Automation Science and Engineering. This preprint is an accepted version, not the IEEE published version.
Abstract--In the machine learning domain, active learning is an iterative data selection algorithm for maximizing information acquisition and improving model performance with limited training samples. It is very useful, especially for the industrial applications where training samples are expensive, time-consuming, or difficult to obtain. Existing methods mainly focus on active learning for classification, and a few methods are designed for regression such as linear regression or Gaussian process. Uncertainties from measurement errors and intrinsic input noise inevitably exist in the experimental data, which further affects the modeling performance. The existing active learning methods do not incorporate these uncertainties for Gaussian process. In this paper, we propose two new active learning algorithms for the Gaussian process with uncertainties, which are variance-based weighted active learning algorithm and D-optimal weighted active learning algorithm. Through numerical study, we show that the proposed approach can incorporate the impact from uncertainties, and realize better prediction performance. This approach has been applied to improving the predictive modeling for automatic shape control of composite fuselage.
I. INTRODUCTION
Active learning is a type of iterative supervised learning which focuses on maximizing information acquisition with limited samples. In statistics literature, this process is also called optimal experimental design, or sequential design. The main idea of active learning is to iteratively pose "query" or "design" to explore the most informative new experimental samples according to the information obtained from the current samples.
In many machine learning applications, especially in some industrial systems, the explanatory data are rich and easy to get, but the response data are very expensive, time-consuming, or difficult to obtain. For example, when training autonomous driving algorithms, a lot of media (e.g., images, videos) require that oracle users mark them with particular labels, such as "vehicle", "street sign" or "road lines". It can be tedious, redundant and time-consuming to annotate lots of these instances.
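The iterative "query" loop described above can be sketched for Gaussian process regression using plain variance-based selection. This is a minimal illustration under stated assumptions, not the weighted algorithms proposed in the paper: it ignores input noise and uses a fixed RBF kernel with unit prior variance. The names `rbf`, `posterior_variance`, `length`, and `noise` are hypothetical.

```python
import numpy as np

def rbf(a, b, length=0.1):
    # Squared-exponential kernel on 1-D inputs, with unit prior variance.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)

def posterior_variance(x_train, x_pool, noise=1e-2):
    """GP posterior variance at each pool point given the training inputs."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_pool)
    v = np.linalg.solve(K, Ks)
    # Prior variance is 1 for this kernel; subtract the explained part.
    return 1.0 - np.sum(Ks * v, axis=0)

pool = np.linspace(0.0, 1.0, 101)   # candidate inputs that could be labeled
chosen = [50]                       # start with one labeled point (x = 0.5)
for _ in range(5):
    var = posterior_variance(pool[chosen], pool)
    # Query the candidate where the model is currently most uncertain.
    chosen.append(int(np.argmax(var)))

print(pool[chosen])
```

With fixed kernel hyperparameters, the GP posterior variance depends only on the input locations, so this selection rule needs no response values. In a real loop each selected point would be sent to the expensive experiment or oracle, and the model would be refit before the next query.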
Alternating Minimization Converges Super-Linearly for Mixed Linear Regression
Ghosh, Avishek, Ramchandran, Kannan
We address the problem of solving mixed random linear equations. We have unlabeled observations coming from multiple linear regressions, and each observation corresponds to exactly one of the regression models. The goal is to learn the linear regressors from the observations. Classically, Alternating Minimization (AM) (which is a variant of Expectation Maximization (EM)) is used to solve this problem. AM iteratively alternates between the estimation of labels and solving the regression problems with the estimated labels. Empirically, it is observed that, for a large variety of non-convex problems including mixed linear regression, AM converges at a much faster rate compared to gradient based algorithms. However, the existing theory suggests a similar rate of convergence for AM and gradient based methods, failing to capture this empirical behavior. In this paper, we close this gap between theory and practice for the special case of a mixture of $2$ linear regressions. We show that, provided it is initialized properly, AM enjoys a \emph{super-linear} rate of convergence in certain parameter regimes. To the best of our knowledge, this is the first work that theoretically establishes such a rate for AM. Hence, if we want to recover the unknown regressors up to an error (in $\ell_2$ norm) of $\epsilon$, AM only takes $\mathcal{O}(\log \log (1/\epsilon))$ iterations. Furthermore, we compare AM with a gradient based heuristic algorithm empirically and show that AM dominates in iteration complexity as well as wall-clock time.
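The two alternating steps described above (estimate labels, then solve the regressions with those labels) can be sketched for a mixture of two linear regressions as follows. This is an illustrative sketch, not the authors' code: the synthetic data, the symmetric choice `true2 = -true1`, the noise level, and the perturbed initialization are all assumptions made so that the example is self-contained.

```python
import numpy as np

def alternating_minimization(X, y, beta1, beta2, n_iters=20):
    """AM for a mixture of two linear regressions, starting from (beta1, beta2)."""
    d = X.shape[1]
    for _ in range(n_iters):
        # Label step: assign each observation to the regressor with the smaller residual.
        r1 = (y - X @ beta1) ** 2
        r2 = (y - X @ beta2) ** 2
        mask = r1 <= r2
        # Regression step: ordinary least squares on each estimated group.
        if mask.sum() >= d:
            beta1 = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        if (~mask).sum() >= d:
            beta2 = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)[0]
    return beta1, beta2

rng = np.random.default_rng(0)
n, d = 2000, 5
true1 = rng.normal(size=d)
true2 = -true1                      # assumed well-separated regressors
X = rng.normal(size=(n, d))
labels = rng.integers(0, 2, size=n)  # hidden labels, never shown to AM
y = np.where(labels == 0, X @ true1, X @ true2) + 0.01 * rng.normal(size=n)

# Assumed "proper" initialization: a small perturbation of the truth.
init1 = true1 + 0.1 * rng.normal(size=d)
init2 = true2 + 0.1 * rng.normal(size=d)
est1, est2 = alternating_minimization(X, y, init1, init2)
err = max(np.linalg.norm(est1 - true1), np.linalg.norm(est2 - true2))
print(err)
```

The initialization matters: the abstract's guarantee is conditional on proper initialization, and starting AM far from the true regressors can leave it stuck with poor label assignments.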
Moment-Based Domain Adaptation: Learning Bounds and Algorithms
This thesis contributes to the mathematical foundation of domain adaptation as an emerging field in machine learning. In contrast to classical statistical learning, the framework of domain adaptation takes into account deviations between the probability distributions in the training and application settings. Domain adaptation applies to a wider range of applications, as future samples often follow a distribution that differs from that of the training samples. A decisive point is the generality of the assumptions about the similarity of the distributions. Therefore, in this thesis we study domain adaptation problems under similarity assumptions as weak as can be modelled by finitely many moments.
Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients
M, Amruthlal, S, Devika, A, Ameer Suhail P, Menon, Aravind K, Krishnan, Vignesh, Thomas, Alan, Thomas, Manu, G, Sanjay, R, Lakshmi Kanth L, Jose, Jimmy, S, Harikrishnan
Patients who undergo mechanical heart valve replacements or have conditions like Atrial Fibrillation have to take Vitamin K Antagonist (VKA) drugs to prevent coagulation of blood. These drugs have a narrow therapeutic range and need to be very closely monitored due to life-threatening side effects. The dosage of a VKA drug is determined and revised by a physician based on the Prothrombin Time - International Normalised Ratio (PT-INR) value obtained through a blood test. Our work aimed at predicting the maintenance dosage of warfarin, currently the most widely recommended anticoagulant drug, using de-identified medical data collected from 109 patients from Kerala. A Support Vector Machine (SVM) Regression model was built to predict the maintenance dosage of warfarin for patients who have been undergoing treatment from a physician and have reached stable INR values between 2.0 and 4.0.
Safe Screening Rules for $\ell_0$-Regression
Atamtürk, Alper, Gómez, Andrés
We give safe screening rules to eliminate variables from regression with $\ell_0$ regularization or a cardinality constraint. These rules are based on guarantees that a feature may or may not be selected in an optimal solution. The screening rules can be computed from a convex relaxation solution in linear time, without solving the $\ell_0$ optimization problem. Thus, they can be used in a preprocessing step to safely remove variables from consideration a priori. Numerical experiments on real and synthetic data indicate that, on average, 76\% of the variables can be fixed to their optimal values, thereby reducing the computational burden of optimization substantially. Therefore, the proposed fast and effective screening rules extend the scope of algorithms for $\ell_0$-regression to larger data sets.