Decadal climate predictions using sequential learning algorithms
Ensembles of climate models are commonly used to improve climate predictions and to assess their uncertainties. Weighting the models according to their performance holds the promise of further improving the predictions. Here, we use an ensemble of decadal climate predictions to demonstrate the ability of sequential learning algorithms (SLAs) to reduce both the forecast errors and the associated uncertainties. Three different SLAs are considered, and their performance is compared with that of an equally weighted ensemble, a linear regression and the climatology. Predictions of four different variables--the surface temperature, the zonal and meridional wind, and pressure--are considered. The spatial distributions of the performance are presented, and the statistical significance of the improvements achieved by the SLAs is tested. Based on the performance of the SLAs, we propose one as highly suitable for improving decadal climate predictions.
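As a rough illustration of how a sequential learning algorithm reweights ensemble members, the sketch below implements the exponentially weighted average (EWA) forecaster, a standard SLA. The abstract does not name the three SLAs used, so the choice of EWA, the squared-error loss, the learning rate `eta` and the function name `ewa_forecast` are all assumptions made for illustration.

```python
import math


def ewa_forecast(predictions, observations, eta=1.0):
    """Exponentially weighted average forecaster (illustrative sketch).

    predictions:  list over time steps; each entry is a list with one
                  forecast per ensemble member.
    observations: list of observed values, one per time step.
    eta:          learning rate (hypothetical default, not from the paper).

    Returns the sequence of weighted forecasts and the final weights.
    """
    n_models = len(predictions[0])
    # Start from the equally weighted ensemble.
    weights = [1.0 / n_models] * n_models
    forecasts = []
    for preds, obs in zip(predictions, observations):
        # Forecast with the current weights, before seeing the observation.
        forecasts.append(sum(w * p for w, p in zip(weights, preds)))
        # Penalize each member by its squared error (assumed loss).
        losses = [(p - obs) ** 2 for p in preds]
        scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        # Renormalize; very large losses could underflow to zero here.
        total = sum(scaled)
        weights = [s / total for s in scaled]
    return forecasts, weights


# Two members: one always predicts 0, the other always predicts 1.
forecasts, weights = ewa_forecast([[0.0, 1.0]] * 5, [1.0] * 5)
```

In this toy run the first forecast equals the equal-weight mean, and the weight then shifts toward the member that matches the observations.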
Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness
Zhou, Guoxu, Cichocki, Andrzej, Zhao, Qibin, Xie, Shengli
Abstract--Nonnegative Tucker decomposition (NTD) is a powerful tool for the extraction of nonnegative parts-based and physically meaningful latent components from high-dimensional tensor data while preserving the natural multilinear structure of data. However, as the data tensor often has multiple modes and is large-scale, existing NTD algorithms suffer from a very high computational complexity in terms of both storage and computation time, which has been one major obstacle for practical applications of NTD. To overcome these disadvantages, we show how low (multilinear) rank approximation (LRA) of tensors is able to significantly simplify the computation of the gradients of the cost function, upon which a family of efficient first-order NTD algorithms are developed. Besides dramatically reducing the storage complexity and running time, the new algorithms are quite flexible and robust to noise because any well-established LRA approaches can be applied. We also show how nonnegativity incorporating sparsity substantially improves the uniqueness property and partially alleviates the curse of dimensionality of the Tucker decompositions. Simulation results on synthetic and real-world data justify the validity and high efficiency of the proposed NTD algorithms. Finding information-rich and task-relevant variables hidden behind observation data is a fundamental task in data analysis and has been widely studied in the fields of signal and image processing and machine learning. Although the observation data can be very large, a much lower number of latent variables or components can capture the most significant features of the original data. This important topic has been extensively studied in the last several decades, particularly witnessed by the great success of blind source separation (BSS) techniques [1].
Manuscript received ... This work was partially supported by the National Natural Science Foundation of China (grants U1201253), the Guangdong Province Natural Science Foundation (2014A030308009), the Guangdong Province Excellent Thesis Foundation (SYBZZXM201316), and the JSPS KAKENHI (26730125, 15K15955). Guoxu Zhou is with the School of Automation at Guangdong University of Technology, Guangzhou, China and the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan. Andrzej Cichocki is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan and with Systems Research Institute, Polish Academy of Science, Warsaw, Poland. Qibin Zhao is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Japan. Shengli Xie is with the Faculty of Automation, Guangdong University of Technology, Guangzhou 510006, China.
Exponential Family Matrix Completion under Structural Constraints
Gunasekar, Suriya, Ravikumar, Pradeep, Ghosh, Joydeep
We consider the matrix completion problem of recovering a structured matrix from noisy and partial measurements. Recent works have proposed tractable estimators with strong statistical guarantees for the case where the underlying matrix is low-rank, and the measurements consist of a subset, either of the exact individual entries, or of the entries perturbed by additive Gaussian noise, which is thus implicitly suited for thin-tailed continuous data. Arguably, common applications of matrix completion require estimators for (a) heterogeneous data types, such as skewed-continuous, count, binary, etc., (b) heterogeneous noise models (beyond Gaussian), which capture varied uncertainty in the measurements, and (c) heterogeneous structural constraints beyond low-rank, such as block-sparsity, or a superposition structure of low-rank plus elementwise sparseness, among others. In this paper, we provide a vastly unified framework for generalized matrix completion by considering a matrix completion setting wherein the matrix entries are sampled from any member of the rich family of exponential family distributions; and impose general structural constraints on the underlying matrix, as captured by a general regularizer $\mathcal{R}(\cdot)$. We propose a simple convex regularized $M$-estimator for the generalized framework, and provide a unified and novel statistical analysis for this general class of estimators. We finally corroborate our theoretical results on simulated datasets.
Dynamic Poisson Factorization
Charlin, Laurent, Ranganath, Rajesh, McInerney, James, Blei, David M.
Models for recommender systems use latent factors to explain the preferences and behaviors of users with respect to a set of items (e.g., movies, books, academic papers). Typically, the latent factors are assumed to be static and, given these factors, the observed preferences and behaviors of users are assumed to be generated without order. These assumptions limit the explorative and predictive capabilities of such models, since users' interests and item popularity may evolve over time. To address this, we propose dPF, a dynamic matrix factorization model based on the recent Poisson factorization model for recommendations. dPF models the time-evolving latent factors with a Kalman filter and the actions with Poisson distributions. We derive a scalable variational inference algorithm to infer the latent factors. Finally, we demonstrate dPF on 10 years of user click data from arXiv.org, one of the largest repositories of scientific papers and a formidable source of information about the behavior of scientists. Empirically, we show performance improvements over both static and more recently proposed dynamic recommendation models. We also provide a thorough exploration of the inferred posteriors over the latent variables.
Precise Phase Transition of Total Variation Minimization
Zhang, Bingwen, Xu, Weiyu, Cai, Jian-Feng, Lai, Lifeng
Characterizing the phase transitions of convex optimizations in recovering structured signals or data is of central importance in compressed sensing, machine learning and statistics. The phase transitions of many convex optimization signal recovery methods, such as $\ell_1$ minimization and nuclear norm minimization, have become well understood through research in recent years. However, rigorously characterizing the phase transition of total variation (TV) minimization in recovering sparse-gradient signals is still open. In this paper, we fully characterize the phase transition curve of TV minimization. Our proof builds on Donoho, Johnstone and Montanari's conjectured phase transition curve for the TV approximate message passing (AMP) algorithm, together with the linkage between the minimax mean square error of a denoising problem and the high-dimensional convex geometry for TV minimization.
Statistical Analysis of Loopy Belief Propagation in Random Fields
Yasuda, Muneki, Kataoka, Shun, Tanaka, Kazuyuki
Loopy belief propagation (LBP), which is equivalent to the Bethe approximation in statistical mechanics, is a message-passing-type inference method that is widely used to analyze systems based on Markov random fields (MRFs). In this paper, we propose a message-passing-type method to analytically evaluate the quenched average of LBP in random fields by using the replica cluster variation method. The proposed analytical method is applicable to general pair-wise MRFs with random fields whose distributions differ from each other and can give the quenched averages of the Bethe free energies over random fields, which are consistent with numerical results. The order of its computational cost is equivalent to that of standard LBP. In the latter part of this paper, we describe the application of the proposed method to Bayesian image restoration, in which we observed that our theoretical results are in good agreement with the numerical results for natural images.
Word vs. Class-Based Word Sense Disambiguation
Izquierdo, Ruben, Suarez, Armando, Rigau, German
As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de-facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have been used for WSD only at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction could lie between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we also demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the number of training examples. Finally, we also demonstrate the robustness of our supervised semantic class-based WSD system when tested on an out-of-domain corpus.
Knowledge-Based Textual Inference via Parse-Tree Transformations
Bar-Haim, Roy, Dagan, Ido, Berant, Jonathan
Textual inference is an important component in many applications for understanding natural language. Classical approaches to textual inference rely on logical representations for meaning, which may be regarded as "external" to the natural language itself. However, practical applications usually adopt shallower lexical or lexical-syntactic representations, which correspond closely to language structure. In many cases, such approaches lack a principled meaning representation and inference framework. We describe an inference formalism that operates directly on language-based structures, particularly syntactic parse trees. New trees are generated by applying inference rules, which provide a unified representation for varying types of inferences. We use manual and automatic methods to generate these rules, which cover generic linguistic structures as well as specific lexical-based inferences. We also present a novel packed data-structure and a corresponding inference algorithm that allows efficient implementation of this formalism. We proved the correctness of the new algorithm and established its efficiency analytically and empirically. The utility of our approach was illustrated on two tasks: unsupervised relation extraction from a large corpus, and the Recognizing Textual Entailment (RTE) benchmarks.
A Complete Derivation Of The Association Log-Likelihood Distance For Multi-Object Tracking
Altendorfer, Richard, Wirkert, Sebastian
The Mahalanobis distance is commonly used in multi-object trackers for measurement-to-track association. Starting with the original definition of the Mahalanobis distance, we review its use in association. Given that there is no principle in multi-object tracking that sets the Mahalanobis distance apart as a distinguished statistical distance, we revisit the global association hypotheses of multiple hypothesis tracking as the most general association setting. Those association hypotheses induce a distance-like quantity for assignment, which we refer to as the association log-likelihood distance. We compare the ability of the Mahalanobis distance and of the association log-likelihood distance to yield correct association relations in Monte-Carlo simulations. It turns out that, on average, the distance based on association log-likelihood performs better than the Mahalanobis distance, confirming that the maximization of global association hypotheses is a more fundamental approach to association than the minimization of a certain statistical distance measure.
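To make the contrast concrete, the sketch below compares the squared Mahalanobis distance with a Gaussian negative log-likelihood cost that adds a log-determinant term, for a 2-D measurement residual. This is only the Gaussian part of such a cost: the paper's association log-likelihood distance is derived from global association hypotheses and may contain further terms, so the exact form used here, along with the function names and the toy numbers, is an assumption.

```python
import math


def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T S^{-1} r for a 2-D residual.

    residual: (r0, r1), measurement minus predicted measurement.
    cov:      2x2 innovation covariance S as ((a, b), (c, d)).
    Returns the squared distance and det(S).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    r0, r1 = residual
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) [[d, -b], [-c, a]].
    q = (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det
    return q, det


def gaussian_nll_cost_2d(residual, cov):
    """-2 ln N(r; 0, S) = r^T S^{-1} r + ln det(2*pi*S) (assumed cost form)."""
    q, det = mahalanobis_sq_2d(residual, cov)
    return q + math.log((2.0 * math.pi) ** 2 * det)


# Toy case: track A is confident, track B has a large covariance.
res_a, cov_a = (1.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
res_b, cov_b = (-2.0, 0.0), ((25.0, 0.0), (0.0, 25.0))

q_a, _ = mahalanobis_sq_2d(res_a, cov_a)
q_b, _ = mahalanobis_sq_2d(res_b, cov_b)
cost_a = gaussian_nll_cost_2d(res_a, cov_a)
cost_b = gaussian_nll_cost_2d(res_b, cov_b)
```

In this toy case the Mahalanobis distance favors the uncertain track B, because its large covariance shrinks the distance, while the log-determinant term penalizes that uncertainty and the likelihood-based cost favors track A.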
Theoretical and Experimental Analyses of Tensor-Based Regression and Classification
Wimalawarne, Kishan, Tomioka, Ryota, Sugiyama, Masashi
We theoretically and experimentally investigate tensor-based regression and classification. Our focus is regularization with various tensor norms, including the overlapped trace norm, the latent trace norm, and the scaled latent trace norm. We first give dual optimization methods using the alternating direction method of multipliers, which is computationally efficient when the number of training samples is moderate. We then theoretically derive an excess risk bound for each tensor norm and clarify their behavior. Finally, we perform extensive experiments using simulated and real data and demonstrate the superiority of tensor-based learning methods over vector- and matrix-based learning methods.