A Financial Service Chatbot based on Deep Bidirectional Transformers
Yu, Shi, Chen, Yuxin, Zaidi, Hussain
We develop a chatbot using Deep Bidirectional Transformer models (BERT) to handle client questions in financial investment customer service. The bot can recognize 381 intents, decide when to say "I don't know", and escalate irrelevant or uncertain questions to human operators. Our main novel contribution is the discussion of uncertainty measures for BERT, where three different approaches are systematically compared on real problems. We investigated two uncertainty metrics, information entropy and variance of dropout sampling in BERT, followed by mixed-integer programming to optimize decision thresholds. Another novel contribution is the use of BERT as a language model in automatic spelling correction. Inputs with accidental spelling errors can significantly decrease intent classification performance. The proposed approach combines probabilities from the masked language model with word edit distances to find the best corrections for misspelled words. The chatbot and the entire conversational AI system are developed using open-source tools and deployed within our company's intranet. The proposed approach can be useful for industries seeking similar in-house solutions in their specific business domains. We share all our code and a sample chatbot built on a public dataset on GitHub.
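The entropy-based escalation rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example probabilities, and the threshold value of 1.0 are all assumptions, and the abstract states that the real thresholds are optimized with mixed-integer programming.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predicted distribution over intents."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_escalate(probs, threshold):
    """Hypothetical rule: hand over to a human when entropy exceeds the threshold."""
    return predictive_entropy(probs) > threshold


# Illustrative softmax outputs over four intents (made-up numbers).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(should_escalate(confident, threshold=1.0))
print(should_escalate(uncertain, threshold=1.0))
```

A peaked distribution has entropy close to zero, while a uniform one reaches the maximum of log(number of intents), so a single threshold separates the two cases.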
Fully Convolutional Neural Networks for Raw Eye Tracking Data Segmentation, Generation, and Reconstruction
In this paper, we use fully convolutional neural networks for the semantic segmentation of eye tracking data. We also use these networks for reconstruction and, in conjunction with a variational auto-encoder, to generate eye movement data. The first improvement of our approach is that no input window is necessary: because the networks are fully convolutional, any input size can be processed directly. The second improvement is that the data used and generated are raw eye tracking data (position X, Y and time) without preprocessing. This is achieved by pre-initializing the filters in the first layer and by building the input tensor along the z axis. We evaluated our approach on three publicly available datasets and compared the results to the state of the art.
Empirical Study on Airline Delay Analysis and Prediction
Patgiri, Ripon, Hussain, Sajid, Nongmeikapam, Aditya
Big data analytics is the systematic analysis of very large-scale datasets. Such analysis strengthens an organization and improves its decision-making process. In this article, we present Airline Delay Analysis and Prediction, which analyzes airline datasets in combination with a weather dataset. We consider various attributes in analyzing flight delay, for example, day, airline, cloud cover, and temperature. Moreover, we present rigorous experiments on various machine learning models for predicting flight delay, namely logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, a Decision Tree classifier, and a Random Forest model. The Random Forest model reaches an accuracy of 82% with a flight delay threshold of 15 minutes. The analysis is carried out on data from 1987 to 2008; training is conducted on data from 2000 to 2007, and the predictions are validated on 2008 data. The Random Forest model also achieves a recall of 99%.
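The relationship between the 15-minute delay threshold, accuracy, and recall can be illustrated with a small sketch. The delay values and predictions below are made up, the helper names are hypothetical, and treating a delay of exactly 15 minutes as "delayed" is an assumption; the abstract does not say whether the threshold is inclusive.

```python
def label_delays(delay_minutes, threshold=15):
    """Label a flight as delayed (1) when its delay reaches the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [1 if d >= threshold else 0 for d in delay_minutes]


def accuracy(y_true, y_pred):
    """Fraction of flights whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly delayed flights that the model flagged as delayed."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(y_true)
    return true_positives / actual_positives


# Made-up arrival delays in minutes and made-up model predictions.
delays = [3, 20, 45, 0, 16, 14]
y_true = label_delays(delays)
y_pred = [0, 1, 1, 1, 1, 0]

print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

In this toy case every delayed flight is caught, so recall is perfect, while one false alarm lowers accuracy. This shows how a model can report a recall well above its accuracy.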
Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding
Yang, Yibo, Bamler, Robert, Mandt, Stephan
Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits estimates of posterior uncertainty. A consequence of the "plug and play" nature of our approach is that various rate-distortion trade-offs can be achieved with a single trained model, eliminating the need to train multiple models for different bit rates. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single machine learning model. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.
Wireless Power Control via Counterfactual Optimization of Graph Neural Networks
Naderializadeh, Navid, Eisen, Mark, Ribeiro, Alejandro
We consider the problem of downlink power control in wireless networks, consisting of multiple transmitter-receiver pairs communicating with each other over a single shared wireless medium. To mitigate the interference among concurrent transmissions, we leverage the network topology to create a graph neural network architecture, and we then use an unsupervised primal-dual counterfactual optimization approach to learn optimal power allocation decisions. We show how the counterfactual optimization technique allows us to guarantee a minimum rate constraint, which adapts to the network size, hence achieving the right balance between average and $5^{th}$ percentile user rates throughout a range of network configurations.
Distributional Sliced-Wasserstein and Applications to Generative Modeling
Nguyen, Khai, Ho, Nhat, Pham, Tung, Bui, Hung
Sliced-Wasserstein distance (SWD) and its variation, Max Sliced-Wasserstein distance (Max-SWD), have been widely used in recent years due to their fast computation and scalability when the probability measures lie in very high dimensions. However, these distances still have weaknesses: SWD requires many projection samples because it samples projection directions from the uniform distribution, while Max-SWD uses only one projection and therefore loses a large amount of information. In this paper, we propose a novel distance that finds an optimal penalized probability measure over the slices, named Distributional Sliced-Wasserstein distance (DSWD). We show that DSWD is a generalization of both SWD and Max-SWD, and that the proposed distance can be found by searching for the push-forward measure over a set of measures satisfying certain constraints. Moreover, similar to SWD, we can extend Generalized Sliced-Wasserstein distance (GSWD) to Distributional Generalized Sliced-Wasserstein distance (DGSWD). Finally, we carry out extensive experiments to demonstrate the favorable generative modeling performance of our distances over previous slice-based distances on large-scale real datasets.
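The contrast between SWD (an average over uniformly sampled directions) and Max-SWD (a single worst-case direction) can be sketched with a Monte Carlo estimate. This is an illustration only and does not implement the proposed DSWD. It assumes two empirical samples of equal size, and the function names are hypothetical. The "max" variant simply takes the largest cost among the sampled directions, which is a crude stand-in for the optimized direction that Max-SWD uses.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def projected_costs(xs, ys, n_projections=50, p=2, seed=0):
    """Per-direction 1D Wasserstein-p costs (raised to the power p)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    rng = random.Random(seed)
    costs = []
    for _ in range(n_projections):
        theta = random_direction(len(xs[0]), rng)
        # In 1D, optimal transport between equal-size samples matches sorted points.
        px = sorted(sum(a * b for a, b in zip(x, theta)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, theta)) for y in ys)
        costs.append(sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px))
    return costs


def sliced_wasserstein(xs, ys, n_projections=50, p=2, seed=0):
    """SWD estimate: average the projected costs over random directions."""
    costs = projected_costs(xs, ys, n_projections, p, seed)
    return (sum(costs) / len(costs)) ** (1.0 / p)


def max_sliced_estimate(xs, ys, n_projections=50, p=2, seed=0):
    """Crude Max-SWD stand-in: the largest cost among the sampled directions."""
    return max(projected_costs(xs, ys, n_projections, p, seed)) ** (1.0 / p)


square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[x + 2.0, y + 2.0] for x, y in square]
print(sliced_wasserstein(square, shifted), max_sliced_estimate(square, shifted))
```

Because most random directions are only partly aligned with the shift between the two samples, the averaged estimate is smaller than the best single direction. This illustrates the trade-off the abstract describes between many uninformative projections and one informative projection.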
Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection
Fan, Haoyi, Zhang, Fengbin, Wang, Ruidong, Xi, Liang, Li, Zuoyong
Unsupervised anomaly detection aims to identify anomalous samples from highly complex and unstructured data, which is pervasive in both fundamental research and industrial applications. However, most existing methods neglect the complex correlation among data samples, which is important for capturing normal patterns from which the abnormal ones deviate. In this paper, we propose a method of Correlation aware unsupervised Anomaly detection via Deep Gaussian Mixture Model (CADGMM), which captures the complex correlation among data points for high-quality low-dimensional representation learning. More specifically, the relations among data samples are first represented as a graph structure, in which each node denotes a sample and each edge denotes the correlation between two samples in the feature space. Then, a dual-encoder consisting of a graph encoder and a feature encoder is employed to jointly encode both the feature and correlation information of samples into the low-dimensional latent space, followed by a decoder for data reconstruction. Finally, a separate estimation network acting as a Gaussian Mixture Model is used to estimate the density of the learned latent vector, and anomalies are detected by measuring the energy of the samples. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.
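The final step, scoring a latent vector by its energy under a Gaussian mixture, can be sketched as below. The abstract does not define the energy, so this sketch assumes the common convention that energy is the negative log-likelihood of the sample under the mixture. The diagonal covariances, the parameter values, and the function names are also assumptions made for illustration.

```python
import math


def gaussian_log_density(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def sample_energy(z, weights, means, variances):
    """Assumed energy: negative log-likelihood of z under the mixture."""
    logs = [
        math.log(w) + gaussian_log_density(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return -(top + math.log(sum(math.exp(value - top) for value in logs)))


# Hypothetical two-component mixture in a 2D latent space.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]

print(sample_energy([0.1, -0.2], weights, means, variances))  # near a component
print(sample_energy([9.0, -7.0], weights, means, variances))  # far from both
```

Samples far from every mixture component receive high energy, so thresholding the energy gives an anomaly decision.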
Adaptive Region-Based Active Learning
Cortes, Corinna, DeSalvo, Giulia, Gentile, Claudio, Mohri, Mehryar, Zhang, Ningshan
We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.
A Distributionally Robust Area Under Curve Maximization Model
Area under the ROC curve (AUC) is a widely used performance measure for classification models. We propose a new distributionally robust AUC maximization model (DR-AUC) that relies on the Kantorovich metric and approximates the AUC with the hinge loss function. We use duality theory to reformulate the DR-AUC model as a tractable convex quadratic optimization problem. The numerical experiments show that the proposed DR-AUC model, benchmarked against the standard deterministic AUC and support vector machine models, improves out-of-sample performance on the majority of the considered datasets. The results are particularly encouraging because our numerical experiments are conducted with small training sets, which are known to be conducive to low out-of-sample performance.
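The idea of approximating the AUC with the hinge loss can be illustrated with a pairwise surrogate. This sketch covers only the deterministic approximation, not the distributionally robust DR-AUC model or its dual reformulation. The function names, the margin of 1, and the example scores are assumptions.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Convex surrogate: penalize pairs whose score gap falls below the margin."""
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            total += max(0.0, margin - (sp - sn))
    return total / (len(pos_scores) * len(neg_scores))


# Made-up classifier scores for positive and negative examples.
pos = [2.0, 0.3]
neg = [0.1, 0.5]

print(empirical_auc(pos, neg))
print(pairwise_hinge_loss(pos, neg))
```

With a margin of 1, each pair's hinge term is at least as large as its ranking error, so the surrogate upper-bounds 1 - AUC. Minimizing it therefore pushes the AUC up while keeping the objective convex in the scores.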
On the Matrix-Free Generation of Adversarial Perturbations for Black-Box Attacks
Shibata, Hisaichi, Hanaoka, Shouhei, Nomura, Yukihiro, Hayashi, Naoto, Abe, Osamu
In general, adversarial perturbations superimposed on inputs are realistic threats to a deep neural network (DNN). In this paper, we propose a practical method for generating such adversarial perturbations for black-box attacks, which require access to the input-output relationship only. Thus, attackers can generate such a perturbation without invoking inner functions or accessing the inner states of a DNN. Unlike earlier studies, the algorithm presented in this study requires far fewer query trials to generate the perturbation. Moreover, to show the effectiveness of the extracted adversarial perturbation, we experiment with a DNN for semantic segmentation. The results show that the network is deceived more easily by the generated perturbation than by uniformly distributed random noise of the same magnitude.