Country
Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring
Su, Du, Yekkehkhany, Ali, Lu, Yi, Lu, Wenmiao
We propose a new application of embedding techniques for problem retrieval in adaptive tutoring. The objective is to retrieve problems whose mathematical concepts are similar. There are two challenges: First, like sentences, problems helpful to tutoring are never exactly the same in terms of the underlying concepts. Instead, good problems mix concepts in innovative ways, while still displaying continuity in their relationships. Second, it is difficult for humans to determine a similarity score that is consistent across a large enough training set. We propose a hierarchical problem embedding algorithm, called Prob2Vec, that consists of abstraction and embedding steps. Prob2Vec achieves 96.88\% accuracy on a problem similarity test, in contrast to 75\% from directly applying state-of-the-art sentence embedding methods. It is interesting that Prob2Vec is able to distinguish very fine-grained differences among problems, an ability humans need time and effort to acquire. In addition, the sub-problem of concept labeling with imbalanced training data set is interesting in its own right. It is a multi-label problem suffering from dimensionality explosion, which we propose ways to ameliorate. We propose the novel negative pre-training algorithm that dramatically reduces false negative and positive ratios for classification, using an imbalanced training data set.
FTT-NAS: Discovering Fault-Tolerant Neural Architecture
Ning, Xuefei, Ge, Guangjun, Li, Wenshuo, Zhu, Zhenhua, Zheng, Yin, Chen, Xiaoming, Gao, Zhen, Wang, Yu, Yang, Huazhong
With the fast evolvement of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto the devices under complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, and malicious attackers. Thus the safety risk of deploying NNs is now drawing much attention. In this paper, after the analysis of the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are reliable to various faults in nowadays devices. Then we incorporate fault-tolerant training (FTT) in the search process to achieve better results, which is referred to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures outperform other manually designed baseline architectures significantly, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, with the same fault settings, F-FTT-Net discovered under the feature fault model achieves an accuracy of 86.2% (VS. 68.1% achieved by MobileNet-V2), and W-FTT-Net discovered under the weight fault model achieves an accuracy of 69.6% (VS. 60.8% achieved by ResNet-20). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern have influences on the fault resilience capability of NN models.
BusTime: Which is the Right Prediction Model for My Bus Arrival Time?
Liu, Dairui, Sun, Jingxiang, Wang, Shen
With the rise of big data technologies, many smart transportation applications have been rapidly developed in recent years including bus arrival time predictions. This type of applications help passengers to plan trips more efficiently without wasting unpredictable amount of waiting time at bus stops. Many studies focus on improving the prediction accuracy of various machine learning and statistical models, while much less work demonstrate their applicability of being deployed and used in realistic urban settings. This paper tries to fill this gap by proposing a general and practical evaluation framework for analysing various widely used prediction models (i.e. delay, k-nearest-neighbour, kernel regression, additive model, and recurrent neural network using long short term memory) for bus arrival time. In particular, this framework contains a raw bus GPS data pre-processing method that needs much less number of input data points while still maintain satisfactory prediction results. This pre-processing method enables various models to predict arrival time at bus stops only, by using a KD-tree based nearest point search method. Based on this framework, using raw bus GPS dataset in different scales from the city of Dublin, Ireland, we also present preliminary results for city managers by analysing the practical strengths and weaknesses in both training and predicting stages of commonly used prediction models.
Black-box Methods for Restoring Monotonicity
Gergatsouli, Evangelia, Lucier, Brendan, Tzamos, Christos
In many practical applications, heuristic or approximation algorithms are used to efficiently solve the task at hand. However their solutions frequently do not satisfy natural monotonicity properties of optimal solutions. In this work we develop algorithms that are able to restore monotonicity in the parameters of interest. Specifically, given oracle access to a (possibly non-monotone) multi-dimensional real-valued function $f$, we provide an algorithm that restores monotonicity while degrading the expected value of the function by at most $\varepsilon$. The number of queries required is at most logarithmic in $1/\varepsilon$ and exponential in the number of parameters. We also give a lower bound showing that this exponential dependence is necessary. Finally, we obtain improved query complexity bounds for restoring the weaker property of $k$-marginal monotonicity. Under this property, every $k$-dimensional projection of the function $f$ is required to be monotone. The query complexity we obtain only scales exponentially with $k$.
Predicting Real-Time Locational Marginal Prices: A GAN-Based Video Prediction Approach
In this paper, we propose an unsupervised data-driven approach to predict real-time locational marginal prices (RTLMPs). The proposed approach is built upon a general data structure for organizing system-wide heterogeneous market data streams into the format of market data images and videos. Leveraging this general data structure, the system-wide RTLMP prediction problem is formulated as a video prediction problem. A video prediction model based on generative adversarial networks (GAN) is proposed to learn the spatio-temporal correlations among historical RTLMPs and predict system-wide RTLMPs for the next hour. An autoregressive moving average (ARMA) calibration method is adopted to improve the prediction accuracy. The proposed RTLMP prediction method takes public market data as inputs, without requiring any confidential information on system topology, model parameters, or market operating details. Case studies using public market data from ISO New England (ISO-NE) and Southwest Power Pool (SPP) demonstrate that the proposed method is able to learn spatio-temporal correlations among RTLMPs and perform accurate RTLMP prediction.
Event-Based Control for Online Training of Neural Networks
Zhao, Zilong, Cerf, Sophie, Robu, Bogdan, Marchand, Nicolas
Convolutional Neural Network (CNN) has become the most used method for image classification tasks. During its training the learning rate and the gradient are two key factors to tune for influencing the convergence speed of the model. Usual learning rate strategies are time-based i.e. monotonous decay over time. Recent state-of-the-art techniques focus on adaptive gradient algorithms i.e. Adam and its versions. In this paper we consider an online learning scenario and we propose two Event-Based control loops to adjust the learning rate of a classical algorithm E (Exponential)/PD (Proportional Derivative)-Control. The first Event-Based control loop will be implemented to prevent sudden drop of the learning rate when the model is approaching the optimum. The second Event-Based control loop will decide, based on the learning speed, when to switch to the next data batch. Experimental evaluationis provided using two state-of-the-art machine learning image datasets (CIFAR-10 and CIFAR-100). Results show the Event-Based E/PD is better than the original algorithm (higher final accuracy, lower final loss value), and the Double-Event-BasedE/PD can accelerate the training process, save up to 67% training time compared to state-of-the-art algorithms and even result in better performance.
Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks
Zheng, Liyuan, Shi, Yuanyuan, Ratliff, Lillian J., Zhang, Baosen
This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as the convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms vanilla reinforcement learning in a variety of benchmark control tasks.
Locally Interpretable Predictions of Parkinson's Disease Progression
Li, Qiaomei, Cummings, Rachel, Mintz, Yonatan
In precision medicine, machine learning techniques have been commonly proposed to aid physicians in early screening of chronic diseases. Many of these diseases become more difficult to treat as they progress, so accurate early screening is critical to ensure resources are directed towards the most effective treatment plan [Pagan, 2012]. Since the final treatment decision must inevitably be made by a doctor, these screening procedures should be interpretable such that a clinician can explain the decision-making process to patients for informed consent. However, the types of models that achieve the highest level of accuracy given early screening data tend to be extremely complex, meaning that even machine learning experts have difficulties explaining why certain predictions are made, leading many to describe them as "black box" [Breiman, 2001]. In this paper, we bridge this gap by providing a novel approach for explaining black box model predictions which can give high fidelity explanations with lower model complexity. In particular we will focus on early screening of Parkinson's Disease (PD). PD is a complicated neurodegenerative disorder that affects the central nervous system and specifically the motor control of individuals [mjf, 2019]. This disorder is estimated to affect 930,000 individuals in the US by 2020, and is more prevalent in the geriatric population affecting more then 1% of the population over the age of 60 and 5% of the population over age 85 [Findley, 2007, Kowal et al., 2013, Rossi et al., 2018]. These statistics and other recent studies on Parkinson's epidemiology indicate that as the population ages, the prevalence of PD is expected to grow to over 1.2 million by 2030 in the US alone, increasing the total economic burden of the disorder to approximately $26 billion USD [Kowal et al., 2013, Rossi et al., 2018].
Weighted Meta-Learning
Cai, Diana, Sheth, Rishit, Mackey, Lester, Fusi, Nicolo
Meta-learning leverages related source tasks to learn an initialization that can be quickly fine-tuned to a target task with limited labeled examples. However, many popular meta-learning algorithms, such as model-agnostic meta-learning (MAML), only assume access to the target samples for fine-tuning. In this work, we provide a general framework for meta-learning based on weighting the loss of different source tasks, where the weights are allowed to depend on the target samples. In this general setting, we provide upper bounds on the distance of the weighted empirical risk of the source tasks and expected target risk in terms of an integral probability metric (IPM) and Rademacher complexity, which apply to a number of meta-learning settings including MAML and a weighted MAML variant. We then develop a learning algorithm based on minimizing the error bound with respect to an empirical IPM, including a weighted MAML algorithm, $\alpha$-MAML. Finally, we demonstrate empirically on several regression problems that our weighted meta-learning algorithm is able to find better initializations than uniformly-weighted meta-learning algorithms, such as MAML.
Adversarial Robustness on In- and Out-Distribution Improves Explainability
Augustin, Maximilian, Meinke, Alexander, Hein, Matthias
Neural networks have led to major improvements in image classification but suffer from being non-robust to adversarial changes, unreliable uncertainty estimates on out-distribution samples and their inscrutable black-box decisions. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. RATIO has similar generative properties to adversarial training so that visual counterfactuals produce class specific features. While adversarial training comes at the price of lower clean accuracy, RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.