FedDANE: A Federated Newton-Type Method
Li, Tian, Sahu, Anit Kumar, Zaheer, Manzil, Sanjabi, Maziar, Talwalkar, Ameet, Smith, Virginia
Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions. Despite encouraging theoretical results, we find that the method has underwhelming performance empirically. In particular, through empirical simulations on both synthetic and real-world datasets, FedDANE consistently underperforms baselines of FedAvg and FedProx in realistic federated settings. We identify low device participation and statistical device heterogeneity as two underlying causes of this underwhelming performance, and conclude by suggesting several directions of future work.
Regularization via Structural Label Smoothing
Li, Weizhi, Dasarathy, Gautam, Berisha, Visar
Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to impose this smoothing, which is uniform across all training data. In this paper, we show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data, with regions of the feature space with high overlap and low marginal likelihood having a lower bias and regions of low overlap and high marginal likelihood having a higher bias. These theoretical results motivate a simple objective function for data-dependent smoothing to mitigate the potential negative consequences of the operation while maintaining its desirable properties as a regularizer. We call this approach Structural Label Smoothing (SLS). We implement SLS and empirically validate on synthetic, Higgs, SVHN, CIFAR-10, and CIFAR-100 datasets. The results confirm our theoretical insights and demonstrate the effectiveness of the proposed method in comparison to traditional label smoothing.
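To make the baseline concrete, here is a minimal sketch of the traditional, uniform label smoothing that the abstract contrasts with SLS. It is not the authors' data-dependent method. The function names and the value of `epsilon` are illustrative assumptions.

```python
import math


def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: mix a one-hot target with the uniform distribution.

    `epsilon` is a hypothetical smoothing strength; the same value is applied
    to every training example, which is the uniformity the abstract refers to.
    """
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


target = smooth_labels([0.0, 0.0, 1.0, 0.0], epsilon=0.1)

# An overconfident prediction versus one that matches the softened target.
overconfident = [0.001, 0.001, 0.997, 0.001]
calibrated = list(target)

loss_overconfident = cross_entropy(target, overconfident)
loss_calibrated = cross_entropy(target, calibrated)
```

Under the smoothed target, the overconfident prediction incurs the larger loss, which is the sense in which smoothing penalizes overconfident outputs.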
Causal Mosaic: Cause-Effect Inference via Nonlinear ICA and Ensemble Method
We address the problem of distinguishing cause from effect in the bivariate setting. Building on recent developments in nonlinear independent component analysis (ICA), we nonparametrically train general nonlinear causal models that allow non-additive noise. Further, we build an ensemble framework, namely Causal Mosaic, which models a causal pair by a mixture of nonlinear models. We compare this method with other recent methods on artificial and real-world benchmark datasets, and our method shows state-of-the-art performance.
A System for Real-Time Interactive Analysis of Deep Learning Training
Shah, Shital, Fernandez, Roland, Drucker, Steven
Performing diagnosis or exploratory analysis during the training of deep learning models is challenging but often necessary for making a sequence of decisions guided by incremental observations. Currently available systems for this purpose are limited to monitoring only the logged data that must be specified before the training process starts. Each time new information is desired, a cycle of stop-change-restart is required in the training process. These limitations make interactive exploration and diagnosis tasks difficult, imposing long, tedious iterations during model development. We present a new system that enables users to perform interactive queries on live processes, generating real-time information that can be rendered in multiple formats on multiple surfaces as several visualizations simultaneously. To achieve this, we model various exploratory inspection and diagnostic tasks for deep learning training processes as specifications for streams using a map-reduce paradigm with which many data scientists are already familiar. Our design achieves generality and extensibility by defining composable primitives, a fundamentally different approach from that used by currently available systems. The open source implementation of our system is available as the TensorWatch project at https://github.com/microsoft/tensorwatch.
Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks
Petzka, Henning, Adilova, Linara, Kamp, Michael, Sminchisescu, Cristian
The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of parameters is larger than the number of samples. Back in the 90s, Hochreiter and Schmidhuber observed that flatness of the loss surface around a local minimum correlates with low generalization error. For several flatness measures, this correlation has been empirically validated. However, it has recently been shown that existing measures of flatness cannot theoretically be related to generalization: if a network uses ReLU activations, the network function can be reparameterized without changing its output in such a way that flatness is changed almost arbitrarily. This paper proposes a natural modification of existing flatness measures that results in invariance to reparameterization. The proposed measures imply a robustness of the network to changes in the input and the hidden layers. Connecting this feature robustness to generalization leads to a generalized definition of the representativeness of data. With this, the generalization error of a model trained on representative data can be bounded by its feature robustness which depends on our novel flatness measure.
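The reparameterization argument in the abstract can be illustrated with a small sketch. It assumes a two-layer network without biases, f(x) = W2 · relu(W1 · x). The weights, the input, and the scaling factor `alpha` are made-up values for illustration only. Because ReLU is positively homogeneous, scaling the first layer by `alpha` and the second by `1 / alpha` leaves the output unchanged, even though the weights (and hence any flatness measure computed directly from them) change.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, value) for value in vector]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def network(w1, w2, x):
    """Two-layer ReLU network without biases: W2 * relu(W1 * x)."""
    return matvec(w2, relu(matvec(w1, x)))


def rescale(w1, w2, alpha):
    """Layer-wise rescaling: multiply W1 by alpha > 0 and divide W2 by alpha."""
    scaled_w1 = [[alpha * w for w in row] for row in w1]
    scaled_w2 = [[w / alpha for w in row] for row in w2]
    return scaled_w1, scaled_w2


# Hypothetical weights and input, chosen only for illustration.
w1 = [[1.0, -2.0], [0.5, 1.0]]
w2 = [[2.0, -1.0]]
x = [1.0, 0.5]

original_output = network(w1, w2, x)
scaled_w1, scaled_w2 = rescale(w1, w2, alpha=10.0)
rescaled_output = network(scaled_w1, scaled_w2, x)
```

The sketch only demonstrates that the network function is preserved; it does not compute any flatness measure, including the modified measures the paper proposes.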
A Unified Conversational Assistant Framework for Business Process Automation
Rizk, Yara, Bhandwalder, Abhishek, Boag, Scott, Chakraborti, Tathagata, Isahagian, Vatche, Khazaeni, Yasaman, Pollock, Falk, Unuvar, Merve
Business process automation is a booming multi-billion-dollar industry that promises to remove menial tasks from workers' plates -- through the introduction of autonomous agents -- and free up their time and brain power for more creative and engaging tasks. However, an essential component to the successful deployment of such autonomous agents is the ability of business users to monitor their performance and customize their execution. A simple and user-friendly interface with a low learning curve is necessary to increase the adoption of such agents in banking, insurance, retail and other domains. As a result, proactive chatbots will play a crucial role in the business automation space. Not only can they respond to users' queries and perform actions on their behalf but also initiate communication with the users to inform them of the system's behavior. This will provide business users a natural language interface to interact with, monitor and control autonomous agents. In this work, we present a multi-agent orchestration framework to develop such proactive chatbots by discussing the types of skills that can be composed into agents and how to orchestrate these agents. Two use cases on a travel preapproval business process and a loan application business process are adopted to qualitatively analyze the proposed framework based on four criteria: performance, coding overhead, scalability, and agent overlap.
High-Level Plan for Behavioral Robot Navigation with Natural Language Directions and R-NET
Shrestha, Amar, Pugdeethosapol, Krittaphat, Fang, Haowen, Qiu, Qinru
When the navigational environment is known, it can be represented as a graph where landmarks are nodes, the robot behaviors that move from node to node are edges, and the route is a set of behavioral instructions. The route path from source to destination can be viewed as a class of combinatorial optimization problems where the path is a sequential subset from a set of discrete items. The pointer network is an attention-based recurrent network that is suitable for such a task. In this paper, we utilize a modified R-NET with gated attention and self-matching attention translating natural language instructions to a high-level plan for behavioral robot navigation by developing an understanding of the behavioral navigational graph to enable the pointer network to produce a sequence of behaviors representing the path. Tests on the navigation graph dataset show that our model outperforms the state-of-the-art approach for both known and unknown environments.
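As a rough illustration of the graph representation described above, the sketch below stores landmarks as nodes and behaviors as labeled edges, then follows a high-level plan given as a sequence of behaviors. The landmark names, behavior names, and the `follow_plan` helper are hypothetical and are not taken from the paper or its dataset. The learned model that produces the plan from natural language is not shown.

```python
def follow_plan(graph, start, behaviors):
    """Execute a sequence of behaviors on a behavioral navigation graph.

    `graph` maps each landmark to a dict of {behavior: next_landmark}.
    Returns the list of landmarks visited, including the start.
    """
    node = start
    visited = [node]
    for behavior in behaviors:
        edges = graph.get(node, {})
        if behavior not in edges:
            raise ValueError(f"behavior {behavior!r} is not available at {node!r}")
        node = edges[behavior]
        visited.append(node)
    return visited


# Hypothetical environment: three landmarks connected by two behaviors.
graph = {
    "office": {"exit_left": "hallway"},
    "hallway": {"follow_corridor": "kitchen"},
    "kitchen": {},
}

route = follow_plan(graph, "office", ["exit_left", "follow_corridor"])
```

Here `route` lists the landmarks reached by the plan, ending at the destination when every behavior is valid at the node where it is issued.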
Emo-CNN for Perceiving Stress from Audio Signals: A Brain Chemistry Approach
Deshmukh, Anup Anand, Soladie, Catherine, Seguier, Renaud
Emotion plays a key role in many applications, such as healthcare, where it is used to assess patients' emotional behavior. Certain emotions are given more importance because they are especially informative about human feelings. In this paper, we propose an approach that models human stress from audio signals. The research challenge in speech emotion detection is defining the very meaning of stress and being able to categorize it in a precise manner. Supervised Machine Learning models, including state-of-the-art Deep Learning classification methods, rely on the availability of clean and labelled data. One of the problems in affective computation and emotion detection is the limited amount of annotated stress data. The existing labelled stress emotion datasets are highly subject to the perception of the annotator. We address the first issue of feature selection by using traditional MFCC features in a Convolutional Neural Network. Our experiments show that Emo-CNN consistently and significantly outperforms the popular existing methods over multiple datasets. It achieves 90.2% categorical accuracy on the Emo-DB dataset. To tackle the second and more significant problem of subjectivity in stress labels, we use Lovheim's cube, which is a 3-dimensional projection of emotions. The cube aims at explaining the relationship between neurotransmitters and the positions of emotions in 3D space. The learnt emotion representations from the Emo-CNN are mapped to the cube using three-component PCA (Principal Component Analysis), and the result is then used to model human stress. This proposed approach not only circumvents the need for labelled stress data but also complies with the psychological theory of emotions given by Lovheim's cube. We believe that this work is the first step towards creating a connection between Artificial Intelligence and the chemistry of human emotions.
The Past and Present of Imitation Learning: A Citation Chain Study
Introduction: Imitation Learning is a promising area of active research. Early research in 'programming by example' began in Software Development [9] before attracting the interest of Robotics and Artificial Intelligence (AI) researchers, who began using the terms 'Learning from Demonstration' and 'Imitation Learning' to describe their line of work. Over the last 30 years, Imitation Learning has advanced significantly and been used to solve difficult tasks ranging from Autonomous Driving [12] to playing Atari games [5]. In the course of this development, different methods for performing Imitation Learning have fallen into and out of favor. In this paper, I will explore the development of these different methods and attempt to examine how the field has progressed. I will be discussing 4 landmark papers that sequentially cite and inform each other.
Multipurpose Intelligent Process Automation via Conversational Assistant
Moiseeva, Alena, Trautmann, Dietrich, Schütze, Hinrich
Intelligent Process Automation (IPA) is an emerging technology whose primary goal is to assist the knowledge worker by taking care of repetitive, routine and low-cognitive tasks. Conversational agents that can interact with users in natural language are a potential application for IPA systems. Such intelligent agents can assist the user by answering specific questions and executing routine tasks that are ordinarily performed in natural language (e.g., customer support). In this work, we tackle the challenge of implementing an IPA conversational assistant in a real-world industrial setting with a lack of structured training data. Our proposed system brings two significant benefits: First, it reduces repetitive and time-consuming activities and, therefore, allows workers to focus on more intelligent processes. Second, by interacting with users, it augments the resources with structured and to some extent labeled training data. We showcase the usage of the latter by re-implementing several components of our system with Transfer Learning (TL) methods.