 Undirected Networks


State Compression of Markov Processes via Empirical Low-Rank Estimation

arXiv.org Machine Learning

Dimension reduction is a central problem in system engineering and data science. In scientific studies or engineering applications, one often needs to interact with unknown complex systems about which many noisy observations of system characteristics and system trajectories are available. The exact structures and dynamics of the system are typically masked by massive observations of noisy variables, many of which might not be relevant to the physical state of the system. It is often unclear how to describe the "state" of a system when one can only access noisy observations. One may view each unique observation as a single state; however, this would generate a huge, or even infinite-dimensional, process that is difficult to model or analyze. Although there exists a vast body of literature on time series analysis [18], these methods typically require knowledge of specific models and might perform poorly when the models are misspecified.

Anru Zhang is Assistant Professor, Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, Email: anruzhang@stat.wisc.edu; Mengdi Wang is Assistant Professor, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, Email: mengdiw@princeton.edu.


Equivalence of restricted Boltzmann machines and tensor network states

arXiv.org Machine Learning

The restricted Boltzmann machine (RBM) is one of the fundamental building blocks of deep learning. RBM finds wide applications in dimensionality reduction, feature extraction, and recommender systems by modeling the probability distributions of a variety of input data such as natural images, speech signals, and customer ratings. We build a bridge between RBM and tensor network states (TNS) widely used in quantum many-body physics research. We devise efficient algorithms to translate an RBM into the commonly used TNS. Conversely, we give necessary and sufficient conditions to determine whether a TNS can be transformed into an RBM of a given architecture. Revealing these general and constructive connections can cross-fertilize both deep learning and quantum many-body physics. Notably, by exploiting the entanglement entropy bound of TNS, we can rigorously quantify the expressive power of RBM on complex data sets. Insights into TNS and its entanglement capacity can guide the design of more powerful deep learning architectures. On the other hand, RBM can represent quantum many-body states with fewer parameters compared to TNS, which may allow more efficient classical simulations.
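For readers unfamiliar with how an RBM models a probability distribution, the standard binary formulation is sketched below. The symbols $v$ (visible units), $h$ (hidden units), $a$, $b$ (biases), and $W$ (coupling matrix) are conventional textbook notation assumed here for illustration; they do not appear in the abstract.

```latex
% Energy of a joint configuration of visible units v and hidden units h
E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j

% Marginal distribution over the visible units, with partition function Z
p(v) = \frac{1}{Z} \sum_{h} e^{-E(v, h)}, \qquad
Z = \sum_{v, h} e^{-E(v, h)}
```

Connections exist only between visible and hidden units, which is what "restricted" refers to.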


Deep Rewiring: Training very sparse deep networks

arXiv.org Machine Learning

Neuromorphic hardware tends to pose limits on the connectivity of the deep networks that one can run on it. Generic hardware and software implementations of deep learning also run more efficiently for sparse networks. Several methods exist for pruning the connections of a neural network after it has been trained without connectivity constraints. We present an algorithm, DEEP R, that enables us to directly train a sparsely connected neural network. DEEP R automatically rewires the network during supervised training so that connections are placed where they are most needed for the task, while their total number remains strictly bounded at all times. We demonstrate that DEEP R can be used to train very sparse feedforward and recurrent neural networks on standard benchmark tasks with just a minor loss in performance. DEEP R is based on a rigorous theoretical foundation that views rewiring as stochastic sampling of network configurations from a posterior.
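The idea of rewiring under a fixed connection budget can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `rewire_step`, the convention that active weights stay positive, the Gaussian noise term, and the small regrowth weight are all assumptions made for illustration, loosely following the abstract's description.

```python
import random
from typing import Dict, List, Tuple

Connection = Tuple[int, int]  # (input index, output index)


def rewire_step(
    active: Dict[Connection, float],
    candidates: List[Connection],
    gradients: Dict[Connection, float],
    learning_rate: float,
    noise_scale: float,
    rng: random.Random,
) -> Dict[Connection, float]:
    """One hypothetical update that keeps the number of active connections fixed.

    Assumption: an active weight must stay positive. A weight pushed to zero
    or below is dropped, and a randomly chosen dormant connection is
    activated in its place.
    """
    budget = len(active)
    survivors: Dict[Connection, float] = {}
    for conn, weight in active.items():
        # Noisy gradient step (the noise stands in for stochastic sampling).
        step = -learning_rate * gradients.get(conn, 0.0)
        step += noise_scale * rng.gauss(0.0, 1.0)
        new_weight = weight + step
        if new_weight > 0.0:
            survivors[conn] = new_weight

    # Refill the budget from connections that are currently dormant.
    dormant = [c for c in candidates if c not in survivors]
    rng.shuffle(dormant)
    for conn in dormant[: budget - len(survivors)]:
        survivors[conn] = 1e-3  # small positive starting weight (assumption)
    return survivors


# Toy run: 12 possible connections, only 5 may be active at any time.
rng = random.Random(0)
candidates = [(i, j) for i in range(4) for j in range(3)]
active = {conn: 0.05 for conn in rng.sample(candidates, 5)}
for _ in range(50):
    # Random numbers stand in for real task gradients.
    gradients = {conn: rng.uniform(-1.0, 1.0) for conn in active}
    active = rewire_step(active, candidates, gradients, 0.1, 0.01, rng)
```

Whatever the gradients do, the dictionary `active` always holds exactly five connections, which mirrors the strict bound on connectivity mentioned above.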


Towards Shockingly Easy Structured Classification: A Search-based Probabilistic Online Learning Framework

arXiv.org Artificial Intelligence

There are two major approaches to structured classification. One is probabilistic gradient-based methods such as conditional random fields (CRF), which have high accuracy but also drawbacks: slow training and no support for search-based optimization (which is important in many cases). The other is search-based learning methods such as perceptrons and the margin infused relaxed algorithm (MIRA), which train fast but also have drawbacks: low accuracy, no probabilistic information, and non-convergence in real-world tasks. We propose a novel and "shockingly easy" solution, a search-based probabilistic online learning method, to address most of those issues. This method searches the output candidates, derives probabilities, and conducts efficient online learning. We show that this method trains fast, supports search-based optimization, is very easy to implement, and offers top accuracy, probabilities, and theoretical guarantees of convergence. Experiments on well-known tasks show that our method has better accuracy than CRF and trains almost as fast as perceptron and MIRA. Results also show that SAPO can easily beat the state-of-the-art systems on those highly competitive tasks, achieving record-breaking accuracies. The code can be found at https://github.com/lancopku


Regression-Based Machine Learning for Algorithmic Trading

@machinelearnbot

Finally, a comprehensive hands-on machine learning course with a specific focus on regression-based models for the investment community and passionate investors. In the past few years, there has been massive adoption of, and growth in, the use of data science, artificial intelligence and machine learning to find alpha. However, information on the application of machine learning to investment is scarce. This course has been designed to address that. It is meant to spark your creative juices.


Microsoft AI Residency (Cambridge, UK) - Microsoft Research

#artificialintelligence

You will have the opportunity to work alongside prominent researchers and engineers in Cambridge, UK, gaining skills and hands-on experience working on practical AI and machine learning problems that help tackle some of society's toughest challenges. The ideal candidate will have a passion for leveraging their expertise from computer science work and/or other technical fields to solve real-world challenges by applying artificial intelligence (AI) and machine learning. To apply for the 2018 Microsoft AI Residency Program, complete the application and submit the following items in one PDF or .doc: For questions related to this program, please contact AIResidency@microsoft.com. See here for information on AI residencies available in Redmond, US.


DxNAT - Deep Neural Networks for Explaining Non-Recurring Traffic Congestion

arXiv.org Machine Learning

Non-recurring traffic congestion is caused by temporary disruptions, such as accidents, sports games, and adverse weather. We use data related to real-time traffic speed, jam factors (a traffic congestion indicator), and events collected over a year from Nashville, TN to train a multi-layered deep neural network. The traffic dataset contains over 900 million data records. The network is thereafter used to classify the real-time data and identify anomalous operations. Compared with traditional approaches of using statistical or machine learning techniques, our model reaches an accuracy of 98.73 percent when identifying traffic congestion caused by football games. Our approach first encodes the traffic across a region as a scaled image. After that, the image data from different timestamps is fused with event- and time-related data. Then a crossover operator is used as a data augmentation method to generate training datasets with more balanced classes. Finally, we use receiver operating characteristic (ROC) analysis to tune the sensitivity of the classifier. We present the analysis of the training time and the inference time separately.
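The last step, using ROC analysis to tune a classifier's sensitivity, can be illustrated with a small sketch. The function names `roc_points` and `pick_threshold`, the toy scores and labels, and the selection rule (highest true-positive rate within a false-positive budget) are assumptions for illustration only; the abstract does not say how the authors chose their operating point.

```python
from typing import List, Tuple

RocPoint = Tuple[float, float, float]  # (threshold, FPR, TPR)


def roc_points(scores: List[float], labels: List[int]) -> List[RocPoint]:
    """Compute one ROC point per distinct score, from highest threshold down."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points: List[RocPoint] = []
    for threshold in sorted(set(scores), reverse=True):
        # An example is predicted positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def pick_threshold(points: List[RocPoint], max_fpr: float) -> float:
    """Return the threshold with the best TPR whose FPR stays within max_fpr."""
    feasible = [p for p in points if p[1] <= max_fpr]
    if not feasible:
        raise ValueError("no threshold satisfies the false-positive budget")
    # Ties in TPR are broken in favour of the stricter (higher) threshold.
    return max(feasible, key=lambda p: (p[2], p[0]))[0]


# Toy data: hypothetical classifier scores and ground-truth labels.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
threshold = pick_threshold(points, max_fpr=0.34)
```

Lowering `max_fpr` trades sensitivity for fewer false alarms: on this toy data a budget of 0.34 selects the threshold 0.55, while a budget of 0.0 selects 0.80.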


Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

arXiv.org Artificial Intelligence

This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; and (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them.


On Structured Prediction Theory with Calibrated Convex Surrogate Losses

arXiv.org Machine Learning

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for general structured prediction.


Bayesian Nonparametric Modeling of Driver Behavior using HDP Split-Merge Sampling Algorithm

arXiv.org Machine Learning

Modern vehicles are equipped with increasingly complex sensors. These sensors generate large volumes of data that provide opportunities for modeling and analysis. Here, we are interested in exploiting this data to learn aspects of behaviors and the road network associated with individual drivers. Our dataset is collected on a standard vehicle used to commute to work and for personal trips. A Hidden Markov Model (HMM) trained on the GPS position and orientation data is used to compress the large amount of position information into a small number of road segment states. Each state has a set of observations, i.e. car signals, associated with it that are quantized and modeled as draws from a Hierarchical Dirichlet Process (HDP). Inference for the topic distributions is carried out using the HDP split-merge sampling algorithm. The topic distributions over joint quantized car signals characterize the driving situation in the respective road state. In a novel manner, we demonstrate how the sparsity of a driver's personal road network, in conjunction with a hierarchical topic model, allows data-driven predictions about destinations as well as likely road conditions.