Undirected Networks
Structural Learning with Amortized Inference
Chang, Kai-Wei (University of Illinois at Urbana Champaign) | Upadhyay, Shyam (University of Illinois at Urbana Champaign) | Kundu, Gourab (University of Illinois at Urbana Champaign) | Roth, Dan (University of Illinois at Urbana Champaign)
Training a structured prediction model involves performing several loss-augmented inference steps. Over the lifetime of the training, many of these inference problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for the Dual Coordinate Descent method, an approximate learning algorithm, that accelerates the training process by exploiting this redundancy of solutions, without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent for an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but calls the inference engine for only 10% to 24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model for the entity-relation task.
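The idea of reusing solutions across inference problems can be illustrated with a minimal sketch. It assumes binary inference problems that share one feasible set and differ only in their objective coefficients, and it uses one well-known sufficient condition for reuse: a cached maximizer stays optimal if every coefficient rose where the solution is 1 and fell where it is 0. The abstract does not say which condition AI-DCD applies, so the check, the brute-force `solve`, and the names `AmortizedSolver` and `exactly_one` are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product


def solve(c, feasible):
    """Exact inference by enumeration: argmax of c . y over feasible binary y."""
    candidates = (y for y in product((0, 1), repeat=len(c)) if feasible(y))
    return max(candidates, key=lambda y: sum(ci * yi for ci, yi in zip(c, y)))


class AmortizedSolver:
    """Reuses cached solutions when a sufficient condition guarantees optimality.

    Hypothetical helper: all problems must share the same feasible set.
    """

    def __init__(self, feasible):
        self.feasible = feasible
        self.cache = []        # list of (objective, optimal solution) pairs
        self.solver_calls = 0  # how often the exact solver was actually run

    def infer(self, c):
        for c_old, y_old in self.cache:
            # y_old stays optimal if every coefficient moved in its favor:
            # up where y_old[i] == 1, down where y_old[i] == 0.
            if all((2 * yi - 1) * (cn - co) >= 0
                   for yi, cn, co in zip(y_old, c, c_old)):
                return y_old
        self.solver_calls += 1
        y = solve(c, self.feasible)
        self.cache.append((tuple(c), y))
        return y


def exactly_one(y):
    """Toy constraint: exactly one variable is switched on."""
    return sum(y) == 1


amortized = AmortizedSolver(exactly_one)
first = amortized.infer([1.0, 3.0, 2.0])   # cache miss: solver runs
second = amortized.infer([0.5, 4.0, 1.0])  # cache hit: condition holds
third = amortized.infer([5.0, 3.0, 2.0])   # cache miss: condition fails
print(first, second, third, amortized.solver_calls)
```

In this toy run, three inference problems trigger only two solver calls. The condition is sufficient but not necessary, so a cache miss never produces a wrong answer; it only costs an extra solver call.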
Deep Modeling Complex Couplings within Financial Markets
Cao, Wei (University of Technology, Sydney) | Hu, Liang (University of Technology and Shanghai Jiaotong University) | Cao, Longbing (University of Technology)
The 2008 global financial crisis, its contagion to other regions, and its long-lasting impact on different markets show that it is increasingly important to understand the complicated coupling relationships across financial markets. This is very difficult, because complex hidden coupling relationships exist between different financial markets in various countries and are hard to model. The couplings involve interactions between homogeneous markets from various countries (which we call intra-market coupling), interactions between heterogeneous markets (inter-market coupling), and interactions between current and past market behaviors (temporal coupling). Very limited work has been done on modeling such complex couplings; some existing methods predict market movement by simply aggregating indicators from various markets while ignoring the inbuilt couplings. As a result, these methods are highly sensitive to observations, and may often fail when financial indicators change slightly. In this paper, a coupled deep belief network is designed to accommodate the above three types of couplings across financial markets. With a deep-architecture model to capture the high-level coupled features, the proposed approach can infer market trends. Experimental results on data of stock and currency markets from three countries show that our approach outperforms other baselines, from both technical and business perspectives.
Extracting Verb Expressions Implying Negative Opinions
Li, Huayi (University of Illinois at Chicago) | Mukherjee, Arjun (University of Houston) | Si, Jianfeng (Institute for Infocomm Research) | Liu, Bing (University of Illinois at Chicago)
Identifying aspect-based opinions has been studied extensively in recent years. However, existing work has primarily focused on adjective, adverb, and noun expressions. Clearly, verb expressions can imply opinions too. We found that in many domains verb expressions can be even more important to applications because they often describe major issues of products or services. These issues enable brands and businesses to directly improve their products or services. To the best of our knowledge, this problem has not received much attention in the literature. In this paper, we make an attempt to solve this problem. Our proposed method first extracts verb expressions from reviews and then employs Markov Networks to model rich linguistic features and long-distance relationships to identify negative issue expressions. Since our training data is obtained from titles of reviews whose labels are automatically inferred from review ratings, our approach is applicable to any domain without manual involvement. Experimental results using real-life review datasets show that our approach outperforms strong baselines.
Unsupervised Word Sense Disambiguation Using Markov Random Field and Dependency Parser
Chaplot, Devendra Singh (Samsung Electronics Co., Ltd.) | Bhattacharyya, Pushpak (IIT Bombay) | Paranjape, Ashwin (Stanford University)
Word Sense Disambiguation (WSD) is a difficult problem to solve in the unsupervised setting. This is because, in this setting, inference becomes more dependent on the interplay between different senses in the context, due to the unavailability of learning resources. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a Maximum A Posteriori (MAP) Inference Query on a Markov Random Field (MRF) built using WordNet and Link Parser or Stanford Parser. To the best of our knowledge, this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats state-of-the-art systems on the SensEval-2, SensEval-3 and SemEval-2007 English all-words datasets while being over 35 times faster.
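To make the MAP query concrete, here is a minimal sketch of MAP inference on a tiny pairwise MRF whose nodes are words, whose states are candidate senses, and whose edges link only dependency-related words. Everything in it is a hypothetical toy: the sentence, the sense inventory, and the potential values are invented for illustration and are not taken from WordNet, a parser, or the paper. Brute-force enumeration stands in for whatever inference procedure the authors use.

```python
from itertools import product

# Hypothetical candidate senses per word (toy inventory, not from WordNet).
senses = {
    "bank": ["river", "finance"],
    "deposit": ["sediment", "money"],
    "cash": ["currency"],
}

# Hypothetical node potentials; any missing (word, sense) pair defaults to 1.0.
node_score = {("bank", "river"): 0.6, ("bank", "finance"): 0.4}

# Hypothetical edge potentials, defined only for dependency-linked word pairs.
edge_score = {
    ("bank", "deposit"): {
        ("river", "sediment"): 0.8, ("river", "money"): 0.1,
        ("finance", "sediment"): 0.1, ("finance", "money"): 0.9,
    },
    ("deposit", "cash"): {
        ("sediment", "currency"): 0.1, ("money", "currency"): 0.9,
    },
}


def map_assignment(senses, node_score, edge_score):
    """Return the sense assignment maximizing the product of all potentials."""
    words = list(senses)
    best, best_p = None, -1.0
    for choice in product(*(senses[w] for w in words)):
        assign = dict(zip(words, choice))
        p = 1.0
        for w in words:
            p *= node_score.get((w, assign[w]), 1.0)
        for (u, v), table in edge_score.items():
            p *= table[(assign[u], assign[v])]
        if p > best_p:
            best, best_p = assign, p
    return best, best_p


best, best_p = map_assignment(senses, node_score, edge_score)
print(best, best_p)
```

In this toy, the node potential alone prefers the "river" sense of "bank", but the edge potentials from the linked words pull the joint assignment to the "finance" reading, which is the kind of sense interplay the abstract describes. Enumeration is exponential in the number of words and is only usable for illustration.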
Scalable Planning and Learning for Multiagent POMDPs
Amato, Christopher (Massachusetts Institute of Technology) | Oliehoek, Frans A (University of Amsterdam and University of Liverpool)
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multiagent planning and learning problems.
Temporally Adaptive Restricted Boltzmann Machine for Background Modeling
Xu, Linli (University of Science and Technology of China) | Li, Yitan (University of Science and Technology of China) | Wang, Yubo (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China)
We examine the fundamental problem of background modeling, which is to model the background scenes in video sequences and segment the moving objects from the background. A novel approach is proposed based on the Restricted Boltzmann Machine (RBM) while exploiting the temporal nature of the problem. In particular, we augment the standard RBM to take a window of sequential video frames as input and generate the background model while enforcing that the background adapts smoothly to temporal changes. As a result, the augmented temporally adaptive model can generate a stable background given noisy inputs and adapt quickly to changes in the background, while keeping all the advantages of RBMs, including exact inference and an effective learning procedure. Experimental results demonstrate the effectiveness of the proposed method in modeling the temporal nature of the background.
Forecasting Collector Road Speeds Under High Percentage of Missing Data
Xin, Xin (Beijing Institute of Technology) | Lu, Chunwei (Autopia Mobile Tech Group Inc.) | Wang, Yashen (Beijing Institute of Technology) | Huang, Heyan (Beijing Institute of Technology)
Accurate road speed predictions can help drivers in smart route planning. Although the issue has been studied previously, most existing work focuses on arterial roads only, where sensors are placed densely enough to collect complete real-time data. For collector roads, where sensor coverage is sparse, speed prediction is often ignored. With GPS-equipped floating car signals now available, we aim to forecast collector road speeds by utilizing these signals. The main challenge compared with arterial roads comes from the missing data. In a time slot of the real case, over 90% of collector roads cannot be covered by enough floating cars. Thus most traditional approaches for arterial roads, which rely on complete historical data, cannot be employed directly. To solve this problem, we propose a multi-view road speed prediction framework. In the first view, temporal patterns are modeled by a layered hidden Markov model; in the second view, spatial patterns are modeled by a collective matrix factorization model. The two models are learned and inferred simultaneously in a co-regularized manner. Experiments conducted on the Beijing road network, based on 10K taxi signals over 2 years, demonstrate that the approach outperforms traditional approaches by 10% in MAE and RMSE.
Learning Hybrid Models with Guarded Transitions
Santana, Pedro (Massachusetts Institute of Technology) | Lane, Spencer (Massachusetts Institute of Technology) | Timmons, Eric (Massachusetts Institute of Technology) | Williams, Brian (Massachusetts Institute of Technology) | Forster, Carlos (Instituto Tecnológico de Aeronáutica)
Innovative methods have been developed for diagnosis, activity monitoring, and state estimation that achieve high accuracy through the use of stochastic models involving hybrid discrete and continuous behaviors. A key bottleneck is the automated acquisition of these hybrid models, and recent methods have focused predominantly on Jump Markov processes and piecewise autoregressive models. In this paper, we present a novel algorithm capable of performing unsupervised learning of guarded Probabilistic Hybrid Automata (PHA) models, which extends prior work by allowing stochastic discrete mode transitions in a hybrid system to have a functional dependence on its continuous state. Our experiments indicate that guarded PHA models can yield significant performance improvements when used by hybrid state estimators, particularly when diagnosing the true discrete mode of the system, without any noticeable impact on their real-time performance.
A Regularized Linear Dynamical System Framework for Multivariate Time Series Analysis
Liu, Zitao (University of Pittsburgh) | Hauskrecht, Milos (University of Pittsburgh)
The Linear Dynamical System (LDS) is an elegant mathematical framework for modeling and learning Multivariate Time Series (MTS). However, in general, it is difficult to set the dimension of an LDS's hidden state space. A small number of hidden states may not be able to model the complexities of an MTS, while a large number of hidden states can lead to overfitting. In this paper, we study learning methods that impose various regularization penalties on the transition matrix of the LDS model and propose a regularized LDS learning framework (rLDS) which aims to (1) automatically shut down the LDS's spurious and unnecessary dimensions and, consequently, address the problem of choosing the optimal number of hidden states; (2) prevent the overfitting problem given a small amount of MTS data; and (3) support accurate MTS forecasting. To learn the regularized LDS from data, we incorporate a second-order cone program and a generalized gradient descent method into the Maximum a Posteriori framework and use Expectation Maximization to obtain a low-rank transition matrix of the LDS model. We propose two priors for modeling the matrix, which lead to two instances of our rLDS. We show that our rLDS is able to recover the intrinsic dimensionality of the time series dynamics well and that it improves predictive performance when compared to baselines on both synthetic and real-world MTS datasets.
Exploiting Determinism to Scale Relational Inference
Ibrahim, Mohamed Hamza (Ecole Polytechnique Montreal) | Pal, Christopher (Ecole Polytechnique Montreal) | Pesant, Gilles (Ecole Polytechnique Montreal)
One key challenge in statistical relational learning (SRL) is scalable inference. Unfortunately, most real-world problems in SRL have expressive models that translate into large grounded networks, representing a bottleneck for any inference method and weakening its scalability. In this paper we introduce Preference Relaxation (PR), a two-stage strategy that uses the determinism present in the underlying model to improve the scalability of relational inference. The basic idea of PR is that if the underlying model involves mandatory (i.e. hard) constraints as well as preferences (i.e. soft constraints) then it is potentially wasteful to allocate memory for all constraints in advance when performing inference. To avoid this, PR starts by relaxing preferences and performing inference with hard constraints only. It then removes variables that violate hard constraints, thereby avoiding irrelevant computations involving preferences. In addition it uses the removed variables to enlarge the evidence database. This reduces the effective size of the grounded network. Our approach is general and can be applied to various inference methods in relational domains. Experiments on real-world applications show how PR substantially scales relational inference with a minor impact on accuracy.
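The two-stage idea can be sketched on a small propositional example. The sketch below is a set of assumptions, not the authors' algorithm: unit propagation stands in for stage-one inference over hard constraints, brute-force enumeration stands in for stage-two inference, and the clause encoding and the names `unit_propagate`, `satisfied`, and `map_over_free` are hypothetical. A clause is a list of `(variable, wanted_value)` literals and is satisfied when at least one literal matches.

```python
from itertools import product


def satisfied(clause, world):
    """A clause holds if at least one of its literals matches the world."""
    return any(world[v] == val for v, val in clause)


def unit_propagate(hard_clauses, evidence):
    """Stage 1 (stand-in): fix every variable forced by the hard clauses."""
    fixed = dict(evidence)
    changed = True
    while changed:
        changed = False
        for clause in hard_clauses:
            if any(fixed.get(v) == val for v, val in clause):
                continue  # clause already satisfied by fixed variables
            open_lits = [(v, val) for v, val in clause if v not in fixed]
            if not open_lits:
                raise ValueError("hard constraints conflict with the evidence")
            if len(open_lits) == 1:
                v, val = open_lits[0]
                fixed[v] = val  # forced value joins the enlarged evidence
                changed = True
    return fixed


def map_over_free(variables, fixed, hard_clauses, soft_clauses):
    """Stage 2 (stand-in): maximize soft-clause weight over free variables."""
    free = [v for v in variables if v not in fixed]
    best, best_score = None, float("-inf")
    for values in product((False, True), repeat=len(free)):
        world = {**fixed, **dict(zip(free, values))}
        if not all(satisfied(c, world) for c in hard_clauses):
            continue
        score = sum(w for w, c in soft_clauses if satisfied(c, world))
        if score > best_score:
            best, best_score = world, score
    return best, len(free)


variables = ["a", "b", "c", "d"]
evidence = {"a": True}
hard = [
    [("a", False), ("b", True)],   # a implies b
    [("b", False), ("c", False)],  # b implies not c
]
soft = [
    (1.5, [("d", True)]),               # preference for d
    (0.5, [("d", False), ("c", True)]), # weaker preference: d implies c
]

fixed = unit_propagate(hard, evidence)
world, n_free = map_over_free(variables, fixed, hard, soft)
print(fixed, world, n_free)
```

Here the hard constraints alone settle `b` and `c`, so the preference stage searches over a single free variable instead of three. That shrinkage is the kind of saving the abstract attributes to enlarging the evidence database before preferences are considered.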