Bassamboo, Achal, Deep, Vikas, Juneja, Sandeep, Zeevi, Assaf

We consider the problem of designing an adaptive sequence of questions that optimally classifies a candidate's ability into one of several categories or discriminative grades. A candidate's ability is modeled as an unknown parameter, which, together with the difficulty of the question asked, determines the likelihood with which s/he is able to answer a question correctly. The learning algorithm is only able to observe these noisy responses to its queries. We consider this problem within a fixed-confidence $\delta$-correct framework, which in our setting seeks to arrive at the correct ability discrimination at the fastest possible rate while guaranteeing that the probability of error is less than a pre-specified and small $\delta$. In this setting we develop lower bounds on any sequential questioning strategy and geometric insights into the problem structure from both the primal and the dual formulation. In addition, we arrive at algorithms that essentially match these lower bounds. Our key conclusions are that, asymptotically, any candidate needs to be asked questions at most at two (candidate ability-specific) levels, although, in a reasonably general framework, questions need to be asked only at a single level. Further, and interestingly, the problem structure facilitates endogenous exploration, so there is no need for a separately designed exploration stage in the algorithm.
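As a concrete (and much simplified) illustration of the response model described above, the probability of a correct answer can be taken to follow a Rasch-type logistic link in the gap between ability and difficulty, and a classical greedy rule asks the question that maximizes Fisher information at the current ability estimate. The sketch below is illustrative only; the function names are ours, and this is not the paper's algorithm, which targets $\delta$-correct grade classification rather than ability estimation.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability that a candidate of the given
    ability answers a question of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def fisher_information(ability, difficulty):
    """Fisher information of a Bernoulli response under the Rasch model;
    it is maximized when difficulty equals ability (p = 1/2)."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)

def next_question(ability_estimate, difficulties):
    """Greedy adaptive rule: ask the question whose difficulty is most
    informative at the current ability estimate."""
    return max(difficulties, key=lambda d: fisher_information(ability_estimate, d))
```

Under this rule, a candidate currently estimated at ability 0 would be asked the available question with difficulty closest to 0, matching the abstract's observation that questions concentrate at ability-specific levels.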

Cai, Feiyang, Li, Jiani, Koutsoukos, Xenofon

Learning-enabled components (LECs) are widely used in cyber-physical systems (CPS) since they can handle the uncertainty and variability of the environment and increase the level of autonomy. However, it has been shown that LECs such as deep neural networks (DNN) are not robust, and adversarial examples can cause the model to make a false prediction. The paper considers the problem of efficiently detecting adversarial examples in LECs used for regression in CPS. The proposed approach is based on inductive conformal prediction and uses a regression model based on a variational autoencoder. The architecture takes into consideration both the input and the neural network prediction when detecting adversarial and, more generally, out-of-distribution examples. We demonstrate the method using an advanced emergency braking system implemented in an open source simulator for self-driving cars, where a DNN is used to estimate the distance to an obstacle. The simulation results show that the method can effectively detect adversarial examples with a short detection delay.
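A minimal sketch of the inductive conformal step, assuming nonconformity scores (e.g., reconstruction errors from the variational autoencoder) have already been computed on a held-out calibration set; the autoencoder and the regression model themselves are abstracted away, and the function name is ours:

```python
def conformal_p_value(calibration_scores, test_score):
    """Inductive conformal p-value: the (smoothed) fraction of calibration
    nonconformity scores at least as large as the test score.
    Small p-values flag out-of-distribution or adversarial inputs."""
    n = len(calibration_scores)
    ge = sum(1 for s in calibration_scores if s >= test_score)
    return (ge + 1) / (n + 1)
```

If the test input conforms to the training distribution, its p-value is (approximately) uniform on $[0,1]$; an anomalously large nonconformity score yields a p-value near $1/(n+1)$ and triggers detection.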

Chaturvedi, Anamay, Scarlett, Jonathan

Graphical model selection in Markov random fields is a fundamental problem in statistics and machine learning. Two particularly prominent models, the Ising model and Gaussian model, have largely developed in parallel using different (though often related) techniques, and several practical algorithms with rigorous sample complexity bounds have been established for each. In this paper, we adapt a recently proposed algorithm of Klivans and Meka (FOCS, 2017), based on the method of multiplicative weight updates, from the Ising model to the Gaussian model, via non-trivial modifications to both the algorithm and its analysis. The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature, has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
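The multiplicative-weight mechanism underlying the approach can be illustrated with a generic exponentiated-gradient step for $\ell_1$-constrained linear regression; this is a simplified stand-in of our own construction, not the adapted Klivans-Meka algorithm analyzed in the paper.

```python
import math

def eg_update(weights, x, y, eta, U):
    """One exponentiated-gradient (multiplicative-weight) step for linear
    regression under an L1 constraint of radius U. `weights` is a
    probability vector over 2p signed copies of the p features."""
    signed = list(x) + [-v for v in x]            # features with both signs
    pred = U * sum(w * s for w, s in zip(weights, signed))
    grad = 2.0 * (pred - y)                       # squared-loss gradient w.r.t. the prediction
    new = [w * math.exp(-eta * grad * U * s) for w, s in zip(weights, signed)]
    z = sum(new)
    return [w / z for w in new]                   # renormalize to a probability vector
```

Each sample triggers one multiplicative update per coordinate, which is what makes the per-sample cost linear in the number of parameters and the whole method naturally online.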

Harvey, Nicholas J. A., Liaw, Christopher, Perkins, Edwin, Randhawa, Sikander

The multiplicative weights method is an algorithm for the problem of prediction with expert advice. It achieves the minimax regret asymptotically if the number of experts is large, and the time horizon is known in advance. Optimal algorithms are also known if there are exactly two or three experts, and the time horizon is known in advance. In the anytime setting, where the time horizon is not known in advance, algorithms can be obtained by the doubling trick, but they are not optimal, let alone practical. No minimax optimal algorithm was previously known in the anytime setting, regardless of the number of experts. We design the first minimax optimal algorithm for minimizing regret in the anytime setting. We consider the case of two experts, and prove that the optimal regret is $\gamma \sqrt{t} / 2$ at all time steps $t$, where $\gamma$ is a natural constant that arose 35 years ago in studying fundamental properties of Brownian motion. The algorithm is designed by considering a continuous analogue, which is solved using ideas from stochastic calculus.
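For reference, the classical multiplicative weights update with a fixed learning rate $\eta$ looks as follows; tuning $\eta$ optimally requires knowing the time horizon in advance, which is precisely what the anytime setting disallows. The function names are ours.

```python
import math

def mw_predict(weights, advice):
    """Weighted-average prediction from expert advice in [0, 1]."""
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, advice)) / total

def mw_update(weights, advice, outcome, eta):
    """Multiplicative weights: each expert is penalized exponentially
    in its absolute loss on the revealed outcome."""
    return [w * math.exp(-eta * abs(a - outcome)) for w, a in zip(weights, advice)]
```

With two experts and a known horizon $T$, choosing $\eta \propto 1/\sqrt{T}$ recovers the familiar $O(\sqrt{T})$ regret; the paper's contribution is an algorithm achieving the exactly optimal constant $\gamma/2$ at every time step simultaneously.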

In recent work, Long and Servedio presented a "martingale boosting" algorithm that works by constructing a branching program over weak classifiers and has a simple analysis based on elementary properties of random walks. They showed that this martingale booster can tolerate random classification noise when it is run with a noise-tolerant weak learner; however, a drawback of the algorithm is that it is not adaptive, i.e., it cannot effectively take advantage of variation in the quality of the weak classifiers it receives. In this paper we present a variant of the original martingale boosting algorithm and prove that it is adaptive. This adaptiveness is achieved by modifying the original algorithm so that the random walks that arise in its analysis have different step sizes depending on the quality of the weak learner at each stage. The new algorithm inherits the desirable properties of the original algorithm, such as random classification noise tolerance, and has several other advantages besides adaptiveness: it requires polynomially fewer calls to the weak learner than the original algorithm, and it can be used with confidence-rated weak hypotheses that output real values rather than Boolean predictions.
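The random-walk intuition behind the analysis can be illustrated with a toy Monte Carlo experiment: a majority vote over $T$ weak classifiers, each correct with probability $1/2 + \gamma$. The independence of the votes is a simplifying assumption of ours, and the branching-program construction is not reproduced here.

```python
import random

def boosted_error(T, gamma, trials=20000, seed=0):
    """Monte Carlo estimate of the error of a majority vote over T
    independent weak classifiers, each correct with prob 1/2 + gamma.
    The running vote count is a biased random walk; the final
    classifier errs exactly when the walk ends at or below zero."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        walk = sum(1 if rng.random() < 0.5 + gamma else -1 for _ in range(T))
        if walk <= 0:
            errors += 1
    return errors / trials
```

Adaptiveness, in this picture, corresponds to letting the walk take larger steps at stages where the weak learner happens to have a larger edge $\gamma$, so that fewer stages are needed for the walk to drift safely away from zero.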

Bashchenko, Oksana, Marchal, Alexis

We develop a methodology for detecting asset bubbles using a neural network. We rely on the theory of local martingales in continuous time and use a deep network to estimate the diffusion coefficient of the price process more accurately than the current estimator, obtaining improved detection of bubbles. We show the outperformance of our algorithm over the existing statistical method in a laboratory created with simulated data. We then apply the network classification to real data and build a zero net exposure trading strategy that exploits the risky arbitrage emanating from the presence of bubbles in the US equity market from 2006 to 2008. The profitability of the strategy provides an estimate of the economic magnitude of bubbles as well as support for the theoretical assumptions relied upon.
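As a baseline for the estimation task, the diffusion coefficient of a simulated driftless geometric Brownian motion can be recovered from the realized variance of log-returns; this is the kind of nonparametric estimator the neural network is meant to improve upon, sketched here under our own naming (the paper's estimator is a function of the price level, which this constant-volatility toy ignores).

```python
import math
import random

def simulate_gbm(s0, sigma, n, dt, seed=0):
    """Simulate a driftless geometric Brownian motion path of n steps."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp(-0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * z))
    return path

def realized_vol(path, dt):
    """Estimate sigma from the realized variance of log-returns."""
    rets = [math.log(b / a) for a, b in zip(path, path[1:])]
    return math.sqrt(sum(r * r for r in rets) / (len(rets) * dt))
```

In the local-martingale theory of bubbles, what matters is how fast the estimated diffusion coefficient grows with the price level; the accuracy of the diffusion estimate is therefore the bottleneck that the deep network addresses.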

Garcelon, Evrard, Ghavamzadeh, Mohammad, Lazaric, Alessandro, Pirotta, Matteo

In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.
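The conservative constraint itself is easy to state in code: the learner may deviate from the baseline only while a lower bound on its cumulative reward stays within a $(1-\alpha)$ fraction of what the baseline would have earned. The following is a schematic check of our own, not the CLUCB2 algorithm.

```python
def is_conservative(baseline_reward, played_rewards_lb, t, alpha):
    """Conservative constraint check: a lower bound on the learner's
    cumulative reward so far must be at least a (1 - alpha) fraction of
    the baseline's cumulative reward over the same t rounds."""
    return sum(played_rewards_lb) >= (1 - alpha) * baseline_reward * t
```

A conservative bandit algorithm plays its optimistic (e.g., LinUCB) arm only when this check would still pass in the worst case, and otherwise falls back to the baseline; verifying the check only at predefined checkpoints, as in the relaxed constraint above, leaves more rounds free for exploration.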

Cai, Feiyang, Koutsoukos, Xenofon

Cyber-physical systems (CPS) greatly benefit from machine learning components that can handle the uncertainty and variability of the real world. Typical components such as deep neural networks, however, introduce new types of hazards that may impact system safety. The system behavior depends on data that are available only during runtime and may differ from the data used for training. Out-of-distribution data may lead to large errors and compromise safety. The paper considers the problem of efficiently detecting out-of-distribution data in CPS control systems. Detection must be robust and limit the number of false alarms while being computationally efficient for real-time monitoring. The proposed approach leverages inductive conformal prediction and anomaly detection to develop a method with a well-calibrated false alarm rate. We use variational autoencoders and deep support vector data description to learn models that can efficiently compute the nonconformity of new inputs relative to the training set and enable real-time detection of out-of-distribution high-dimensional inputs. We demonstrate the method using an advanced emergency braking system and a self-driving end-to-end controller implemented in an open source simulator for self-driving cars. The simulation results show a very small number of false positives and a short detection delay, while the execution time is comparable to that of the original machine learning components.

Learning-enabled components (LECs) such as neural networks are used in many classes of cyber-physical systems (CPS). Semi-autonomous and autonomous vehicles, in particular, are CPS examples where LECs can play a significant role in perception, planning, and control if they are complemented with methods for analyzing and ensuring safety [1], [2]. However, several characteristics of LECs complicate safety analysis; in particular, LECs encode knowledge in a form that is not transparent.

A standard assumption in machine learning has been the assumption that the data are generated in an IID fashion, i.e., independently from the same distribution. This assumption is also known as the assumption of randomness (see, e.g., [11, Section 7.1] and [27]). In this paper we are interested in testing this assumption. Conformal martingales are constructed on top of conventional machine learning algorithms and have been used as a means of detecting deviations from randomness both in theoretical work (see, e.g., [27, Section 7.1], [4], [3]) and in practice (in the framework of the Microsoft Azure module on time series anomaly detection [28]). They provide an online measure of the amount of evidence found against the hypothesis of randomness and can be said to perform conformal change detection: if the assumption of randomness is satisfied, a fixed nonnegative conformal martingale with a positive initial value is not expected to increase its initial value manyfold; on the other hand, if the hypothesis of randomness is violated, a properly designed nonnegative conformal martingale with a positive initial value can be expected to increase its value substantially. Correspondingly, we have two desiderata for such a martingale S:
- Validity is satisfied automatically: S is not expected to ever increase its initial value by much, under the hypothesis of randomness.
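A minimal sketch of such a conformal test martingale, using the simple power betting function $\epsilon p^{\epsilon-1}$ (a standard choice); the conformal p-values themselves, produced by an underlying conformal predictor, are assumed given:

```python
def power_martingale(p_values, epsilon=0.5):
    """Conformal test martingale with betting function epsilon * p**(epsilon - 1).
    Under the randomness assumption each conformal p-value is uniform on
    [0, 1], so the running product is a nonnegative martingale starting at 1;
    by Ville's inequality it rarely grows large, and sustained growth is
    evidence against the IID hypothesis."""
    m = 1.0
    path = []
    for p in p_values:
        m *= epsilon * max(p, 1e-12) ** (epsilon - 1)
        path.append(m)
    return path
```

A stream of uniform-looking p-values keeps the martingale bounded, whereas a change in the data distribution tends to produce a run of small p-values, each of which multiplies the martingale by a factor greater than one.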

Fernandez, Tamara, Rivera, Nicolas

Weighted log-rank tests are arguably the most widely used tests by practitioners for the two-sample problem in the context of right-censored data. Many approaches have been considered to make weighted log-rank tests more robust against a broader family of alternatives, among them: considering linear combinations of weighted log-rank tests or taking the maximum over a finite collection of them. In this paper, we propose as test statistic the supremum of a (potentially infinite) collection of weight-indexed log-rank tests, where the index space is the unit ball of a reproducing kernel Hilbert space (RKHS). By exploiting the properties of RKHSs, we provide an exact and simple evaluation of the test statistic and establish relationships with previous tests in the literature. Additionally, we show that for a special family of RKHSs, the proposed test is omnibus. We conclude with an empirical evaluation of the proposed methodology and an application to a real-data scenario.
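The weight-indexed log-rank building block that the RKHS construction generalizes can be sketched as follows; the supremum over the RKHS unit ball is not reproduced here, and the function name and data layout are ours.

```python
def weighted_logrank(times, events, groups, weight=lambda t: 1.0):
    """Weighted log-rank numerator and variance for two right-censored
    samples. events[i] is 1 for an observed event, 0 for censoring;
    groups[i] is 0 or 1. weight(t) == 1 gives the classical log-rank test."""
    data = sorted(zip(times, events, groups))
    n0 = sum(1 for g in groups if g == 0)
    n1 = len(groups) - n0
    num, var = 0.0, 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d0 = d1 = c0 = c1 = 0
        while i < len(data) and data[i][0] == t:     # pool tied times
            _, e, g = data[i]
            if g == 0:
                d0 += e; c0 += 1 - e
            else:
                d1 += e; c1 += 1 - e
            i += 1
        n, d = n0 + n1, d0 + d1
        if d > 0 and n > 1:
            w = weight(t)
            num += w * (d1 - d * n1 / n)             # observed minus expected in group 1
            var += w * w * d * (n1 / n) * (n0 / n) * (n - d) / (n - 1)
        n0 -= d0 + c0                                # shrink the risk sets
        n1 -= d1 + c1
    return num, var
```

Each choice of `weight` gives one member of the family; the paper's statistic is, roughly, the supremum of the standardized version of `num` as the weight function ranges over the unit ball of an RKHS, which the kernel trick renders exactly computable.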