Asia
Support Vector Machines for Current Status Data
Travis-Lumer, Yael, Goldberg, Yair
In this paper we aim to develop a general, model free, method for analyzing current status data using machine learning techniques. In particular, we propose a support vector machine (SVM) learning method for estimation of the failure time expectation for current status data. SVM was originally introduced by Vapnik in the 1990's and is firmly related to statistical learning theory (Vapnik, 1999). The choice of SVMs for current status data is motivated by the fact that SVMs can be implemented easily, have fast training speed, produce decision functions that have a strong generalization ability and can guarantee convergence to the optimal solution, under some weak assumptions (Shivaswamy et al., 2007). Current status data is a data format where the failure timeT is restricted to knowledge of whether or notT exceeds a random monitoring timeC . This data format is quite common and includes examples from various fields. Jewell and van der Laan (2004) mention a few examples including: studying the distribution of the age of a child at weaning given observation points; when conducting a partner study of HIV infection over a number of clinic visits; and when a tumor under investigation is occult and an animal is sacrificed at a certain time point in order to determine presence or absence of the tumor.
On the Feasibility of Distributed Kernel Regression for Big Data
Xu, Chen, Zhang, Yongquan, Li, Runze
In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then averaged from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown that whether such a distributive strategy provides valid theoretical inferences to the original data. In this paper, we address this fundamental issue for the distributed kernel regression (DKR), where the algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed for bounding the generalization error over the individual outputs, which brings new and challenging issues in the big data setup. Under mild conditions, we show that, with a proper number of segments, DKR leads to an estimator that is generalization consistent to the unknown regression function. The obtained results justify the method of DKR and shed light on the feasibility of using other distributed algorithms for processing big data. The promising preference of the method is supported by both simulation and real data examples.
Revisiting Algebra and Complexity of Inference in Graphical Models
Ravanbakhsh, Siamak, Greiner, Russell
This paper studies the form and complexity of inference in graphical models using the abstraction offered by algebraic structures. In particular, we broadly formalize inference problems in graphical models by viewing them as a sequence of operations based on commutative semigroups. We then study the computational complexity of inference by organizing various problems into an "inference hierarchy". When the underlying structure of an inference problem is a commutative semiring -- i.e. a combination of two commutative semigroups with the distributive law -- a message passing procedure called belief propagation can leverage this distributive law to perform polynomial-time inference for certain problems. After establishing the NP-hardness of inference in any commutative semiring, we investigate the relation between algebraic properties in this setting and further show that polynomial-time inference using distributive law does not (trivially) extend to inference problems that are expressed using more than two commutative semigroups. We then extend the algebraic treatment of message passing procedures to survey propagation, providing a novel perspective using a combination of two commutative semirings. This formulation generalizes the application of survey propagation to new settings.
Modeling Compositionality with Multiplicative Recurrent Neural Networks
We present the multiplicative recurrent neural network as a general model for compositional meaning in language, and evaluate it on the task of fine-grained sentiment analysis. We establish a connection to the previously investigated matrix-space models for compositionality, and show they are special cases of the multiplicative recurrent net. Our experiments show that these models perform comparably or better than Elman-type additive recurrent neural networks and outperform matrix-space models on a standard fine-grained sentiment analysis corpus. Furthermore, they yield comparable results to structural deep models on the recently published Stanford Sentiment Treebank without the need for generating parse trees.
Monotonous (Semi-)Nonnegative Matrix Factorization
NMF suffers from the scale and ordering ambiguities. Often, the source signals can be monotonous in nature. For example, in source separation problem, the source signals can be monotonously increasing or decreasing while the mixing matrix can have nonnegative entries. NMF methods may not be effective for such cases as it suffers from the ordering ambiguity. This paper proposes an approach to incorporate notion of monotonicity in NMF, labeled as monotonous NMF. An algorithm based on alternating least-squares is proposed for recovering monotonous signals from a data matrix. Further, the assumption on mixing matrix is relaxed to extend monotonous NMF for data matrix with real numbers as entries. The approach is illustrated using synthetic noisy data. The results obtained by monotonous NMF are compared with standard NMF algorithms in the literature, and it is shown that monotonous NMF estimates source signals well in comparison to standard NMF algorithms when the underlying sources signals are monotonous.
Distributed Evaluation of Nonmonotonic Multi-context Systems
Dao-Tran, Minh, Eiter, Thomas, Fink, Michael, Krennwallner, Thomas
Multi-context Systems (MCSs) are a formalism for systems consisting of knowledge bases (possibly heterogeneous and non-monotonic) that are interlinked via bridge rules, where the global system semantics emerges from the local semantics of the knowledge bases (also called contexts) in an equilibrium. While MCSs and related formalisms are inherently targeted for distributed set- tings, no truly distributed algorithms for their evaluation were available. We address this short- coming and present a suite of such algorithms which includes a basic algorithm DMCS, an ad- vanced version DMCSOPT that exploits topology-based optimizations, and a streaming algorithm DMCS-STREAMING that computes equilibria in packages of bounded size. The algorithms be- have quite differently in several respects, as experienced in thorough experimental evaluation of a system prototype. From the experimental results, we derive a guideline for choosing the appropriate algorithm and running mode in particular situations, determined by the parameter settings.
Detecting Concept-level Emotion Cause in Microblogging
In this paper, we propose a Concept-level Emotion Cause Model (CECM), instead of the mere word-level models, to discover causes of microblogging users' diversified emotions on specific hot event. A modified topic-supervised biterm topic model is utilized in CECM to detect'emotion topics' in event-related tweets, and then context-sensitive topical PageRank is utilized to detect meaningful multiword expressions as emotion causes. Experimental results on a dataset from Sina Weibo, one of the largest microblogging websites in China, show CECM can better detect emotion causes than baseline methods.
Unregularized Online Learning Algorithms with General Loss Functions
In this paper, we consider unregularized online learning algorithms in a Reproducing Kernel Hilbert Spaces (RKHS). Firstly, we derive explicit convergence rates of the unregularized online learning algorithms for classification associated with a general gamma-activating loss (see Definition 1 in the paper). Our results extend and refine the results in Ying and Pontil (2008) for the least-square loss and the recent result in Bach and Moulines (2011) for the loss function with a Lipschitz-continuous gradient. Moreover, we establish a very general condition on the step sizes which guarantees the convergence of the last iterate of such algorithms. Secondly, we establish, for the first time, the convergence of the unregularized pairwise learning algorithm with a general loss function and derive explicit rates under the assumption of polynomially decaying step sizes. Concrete examples are used to illustrate our main results. The main techniques are tools from convex analysis, refined inequalities of Gaussian averages, and an induction approach.
Social Trust Prediction via Max-norm Constrained 1-bit Matrix Completion
Wang, Jing, Shen, Jie, Xu, Huan
Social trust prediction addresses the significant problem of exploring interactions among users in social networks. Naturally, this problem can be formulated in the matrix completion framework, with each entry indicating the trustness or distrustness. However, there are two challenges for the social trust problem: 1) the observed data are with sign (1-bit) measurements; 2) they are typically sampled non-uniformly. Most of the previous matrix completion methods do not well handle the two issues. Motivated by the recent progress of max-norm, we propose to solve the problem with a 1-bit max-norm constrained formulation. Since max-norm is not easy to optimize, we utilize a reformulation of max-norm which facilitates an efficient projected gradient decent algorithm. We demonstrate the superiority of our formulation on two benchmark datasets.
Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood
Hayashi, Kohei, Maeda, Shin-ichi, Fujimaki, Ryohei
The marginal log-likelihood is a key concept of Bayesian model identification of latent variable models (LVMs), such as mixture models (MMs), probabilistic principal component analysis, and hidden Markov models (HMMs). Determination of dimensionality of latent variables is an essential task to uncover hidden structures behind the observed data as well as to mitigate overfitting. In general, LVMs are singular (i.e., mapping between parameters and probabilistic models is not one-to-one) and such classical information criteria based on the regularity assumption as the Bayesian information criterion (BIC) [Schwarz, 1978] are no longer justified. Since exact evaluation of 1 the marginal log-likelihood is often not available, approximation techniques have been developed using sampling (i.e., Markov Chain Monte Carlo methods (MCMCs) [Hastings, 1970]), a variational lower bound (i.e., the variational Bayes methods (VB) [Attias, 1999, Jordan et al., 1999]), or algebraic geometry (i.e., the widely applicable BIC (WBIC) [Watanabe, 2013]). However, model selection using these methods typically requires heavy computational cost (e.g., a large number of MCMC sampling in a high-dimensional space, an outer loop for VB/WBIC.) In the last few years, a new approximation technique and an inference method, factorized information criterion (FIC) and factorized asymptotic Bayesian inference (FAB), have been developed for some binary LVMs [Fujimaki and Morinaga, 2012, Fujimaki and Hayashi, 2012, Hayashi and Fujimaki, 2013, Eto et al., 2014]. Unlike existing methods which evaluate approximated marginal log-likelihoods calculated for each latent variable dimensionality (and therefore need an outer loop for model selection), FAB finds an effective dimensionality via an EMstyle alternating optimization procedure.