Clustering Noisy Signals with Structured Sparsity Using Time-Frequency Representation
Hope, Tom, Wagner, Avishai, Zuk, Or
Clustering of high-dimensional signals, sequences, or functional data is a common task that arises in many domains [18, 19]. Such data arise in diverse fields, including speech analysis, genomics, mass spectrometry, and MRI or EEG measurements, among many others. Clustering seeks to partition data into groups with high overall similarity between members (instances) of the same group and dissimilarity to members of other groups. For time-series signals, this means partitioning the instances into groups of functions that behave similarly over time, where the measure of similarity is crucial and often application-specific. In many real-world scenarios, signals are high-dimensional (as in genomics), noisy (as in low-quality speech recordings), and non-stationary, exhibiting, for example, peaks and other non-smooth local patterns, or changes in frequency over time.
Clustering is Easy When ....What?
It is well known that most common clustering objectives are NP-hard to optimize. In practice, however, clustering is routinely carried out. One approach to explaining this apparent discrepancy is to develop notions of clusterability that distinguish realistically interesting input data from worst-case data sets, in the hope that some clustering algorithms will be provably efficient on such "clusterable" instances. This paper addresses the thesis that the computational hardness of clustering tasks goes away for the inputs one really cares about; in other words, that "Clustering is difficult only when it does not matter" (the CDNM thesis, for short). I present a critical bird's-eye overview of the results published on this issue so far and call attention to the gap between available and desirable results. A longer, more detailed version of this note is available as arXiv:1507.05307. I discuss which requirements should be met to provide formal support for the CDNM thesis, examine existing results in view of these requirements, and list some significant unsolved research challenges in that direction.
Robust Non-linear Wiener-Granger Causality For Large High-dimensional Data
Wiener-Granger causality is a widely used framework for causal analysis of temporally resolved events. We introduce a new measure of Wiener-Granger causality based on a kernelization of partial canonical correlation analysis, with specific advantages for large high-dimensional data. The introduced measure can detect non-linear and non-monotonic signals, is designed to be immune to noise, and allows the computational complexity of its estimation to be tuned. Furthermore, we show that, under specified conditions, the introduced measure can be regarded as an estimate of conditional mutual information (transfer entropy). The measure is assessed in comparative simulations, where it outperforms other existing methods. The paper concludes with an application to climatological data.
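As a point of reference only, the classical linear-Gaussian special case can be sketched in a few lines. Linear Granger causality from x to y compares the residual variance of an autoregression of y on its own past with that of a regression that also includes the past of x; for jointly Gaussian processes, half the log of that ratio equals the transfer entropy in nats. This is the standard linear measure, not the kernelized partial CCA measure introduced in the paper, and the function names, lag order `p`, and synthetic data are illustrative assumptions.

```python
import numpy as np


def lagged(z, p):
    """Matrix whose row for time t (t = p, ..., T-1) holds z[t-1], ..., z[t-p]."""
    T = len(z)
    return np.column_stack([z[p - k:T - k] for k in range(1, p + 1)])


def residual_variance(target, regressors):
    """Residual variance of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)


def linear_granger_te(x, y, p=2):
    """Linear Granger causality x -> y, expressed as transfer entropy in nats.

    Under joint Gaussianity, transfer entropy = 0.5 * log(restricted / full).
    """
    target = y[p:]
    own_past = lagged(y, p)
    restricted = residual_variance(target, own_past)
    full = residual_variance(target, np.column_stack([own_past, lagged(x, p)]))
    return 0.5 * np.log(restricted / full)


# Hypothetical synthetic data: x drives y with a one-step delay.
rng = np.random.default_rng(0)
T = 2000
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.2 * rng.standard_normal()

te_xy = linear_granger_te(x, y)  # large: past of x helps predict y
te_yx = linear_granger_te(y, x)  # near zero: x is white noise
```

A linear measure like this one misses non-linear and non-monotonic dependence, which is the gap the kernelized measure is meant to close.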
A Historical Analysis of the Field of OR/MS using Topic Models
Gatti, Christopher J., Brooks, James D., Nurre, Sarah G.
This study investigates the content of the published scientific literature in operations research and management science (OR/MS) since the early 1950s. It is based on 80,757 abstracts published in 37 of the leading OR/MS journals. We develop a topic model using Latent Dirichlet Allocation (LDA) and extend the analysis to reveal the temporal dynamics of the field, its journals, and its topics. The analysis shows how general or specific each journal is, and we identify groups of journals with similar content, some of which match intuition and some of which do not. We also show how journals have become more or less unique in their scope. A more detailed analysis of each journal's topics over time reveals significant temporal dynamics, especially for journals with niche content. This study presents an observational, yet objective, view of the published OR/MS literature that should interest authors, editors, journals, and publishers. Furthermore, this work can help new entrants to OR/MS understand the content landscape, can serve as a starting point for discussion and inquiry about the field at large, and can serve as a model for similar analyses in other fields.
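To illustrate what an LDA fit produces, here is a minimal collapsed Gibbs sampler for LDA in NumPy. It is a generic textbook sampler, not the authors' pipeline; the toy corpus, the hyperparameters `alpha` and `beta`, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of integer arrays of word ids. Returns (phi, theta) with
    phi[k, w] = P(word w | topic k) and theta[d, k] = P(topic k | document d).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))  # topic counts per document
    nkw = np.zeros((n_topics, n_words))    # word counts per topic
    nk = np.zeros(n_topics)                # total tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token's assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional P(z = k | all other assignments).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    phi = (nkw + beta) / (nk[:, None] + n_words * beta)
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi, theta


# Hypothetical toy corpus: documents use either words 0-4 or words 5-9.
rng = np.random.default_rng(1)
docs = [rng.integers(0, 5, size=30) for _ in range(10)] + \
       [rng.integers(5, 10, size=30) for _ in range(10)]
phi, theta = lda_gibbs(docs, n_topics=2, n_words=10)
```

Per-document topic proportions `theta` could then be averaged by publication year to trace topic prevalence over time, which is one plausible way to obtain temporal dynamics of the kind the study describes.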
Learning A Task-Specific Deep Architecture For Clustering
Wang, Zhangyang, Chang, Shiyu, Zhou, Jiayu, Wang, Meng, Huang, Thomas S.
While sparse coding-based clustering methods have been shown to be successful, bottlenecks in both efficiency and scalability limit their practical use. In recent years, deep learning has proven to be a highly effective, efficient, and scalable feature learning tool. In this paper, we propose to emulate the sparse coding-based clustering pipeline in the context of deep learning, leading to a carefully crafted deep model that benefits from both. A feed-forward network structure, named TAGnet, is constructed based on a graph-regularized sparse coding algorithm. It is then trained end to end with task-specific loss functions. We find that connecting deep learning to sparse coding benefits not only model performance but also initialization and interpretation. Moreover, by introducing auxiliary clustering tasks to the intermediate feature hierarchy, we formulate DTAGnet and obtain a further performance boost. Extensive experiments demonstrate that the proposed model outperforms several state-of-the-art methods by remarkable margins.
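The idea of turning a sparse coding solver into a feed-forward network can be sketched with proximal-gradient (ISTA) iterations, where each iteration plays the role of one layer. The sketch assumes the graph-regularized objective 0.5*||X - D A||_F^2 + lam*||A||_1 + 0.5*mu*tr(A Lap A^T), with a fixed dictionary `D` and a graph Laplacian `Lap` over samples. In a TAGnet-style model the layer weights would be learned end to end, so these fixed weights correspond at most to an initialization, not to the paper's trained architecture. All names and data here are hypothetical.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def objective(X, D, A, lap, lam, mu):
    fit = 0.5 * np.sum((X - D @ A) ** 2)
    sparsity = lam * np.abs(A).sum()
    smoothness = 0.5 * mu * np.trace(A @ lap @ A.T)  # graph regularizer
    return fit + sparsity + smoothness


def unrolled_graph_ista(X, D, lap, lam=0.1, mu=0.1, n_layers=20):
    """Run n_layers ISTA steps; each step is one 'layer' of the unrolled network.

    X: (n_features, n_samples), D: (n_features, n_atoms), lap: (n_samples, n_samples).
    """
    # Step size 1 / (upper bound on the Lipschitz constant of the smooth gradient).
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + mu * np.linalg.norm(lap, 2))
    A = np.zeros((D.shape[1], X.shape[1]))
    history = [objective(X, D, A, lap, lam, mu)]
    for _ in range(n_layers):
        grad = D.T @ (D @ A - X) + mu * A @ lap
        A = soft_threshold(A - step * grad, lam * step)
        history.append(objective(X, D, A, lap, lam, mu))
    return A, history


# Hypothetical data: 8 samples on a chain graph, 20-dim signals, 40 unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)
codes = np.zeros((40, 8))
codes[rng.integers(0, 40, size=8), np.arange(8)] = 1.0
X = D @ codes
adjacency = np.eye(8, k=1) + np.eye(8, k=-1)
lap = np.diag(adjacency.sum(axis=1)) - adjacency
A, history = unrolled_graph_ista(X, D, lap)
```

Because the step size does not exceed the inverse Lipschitz bound, each layer does not increase the objective; a learned network trades this guarantee for fewer, more effective layers.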
Remarks on kernel Bayes' rule
Johno, Hisashi, Nakamoto, Kazunori, Saigo, Tatsuhiko
Kernel Bayes' rule has been proposed as a nonparametric, kernel-based method for performing Bayesian inference in reproducing kernel Hilbert spaces. However, we demonstrate both theoretically and experimentally that its predictions are, in some cases, unnatural. We argue that this phenomenon is due in part to the fact that the assumptions underlying kernel Bayes' rule do not hold in general.
Bayesian Masking: Sparse Bayesian Estimation with Weaker Shrinkage Bias
Kondo, Yohei, Hayashi, Kohei, Maeda, Shin-ichi
A common strategy for sparse linear regression is to introduce regularization, which eliminates irrelevant features by setting the corresponding weights to zero. However, regularization often also shrinks the estimates for relevant features, which leads to incorrect feature selection. Motivated by this issue, we propose Bayesian masking (BM), a sparse estimation method that imposes no regularization on the weights. The key concept of BM is to introduce binary latent variables that randomly mask features; estimating the masking rates determines the relevance of the features automatically. We derive a variational Bayesian inference algorithm that maximizes a lower bound of the factorized information criterion (FIC), a recently developed asymptotic criterion for evaluating the marginal log-likelihood. In addition, we propose a reparametrization to accelerate the convergence of the derived algorithm. Finally, we show that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off.
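The shrinkage bias that motivates BM can be seen in the simplest setting. With an orthonormal design, the Lasso solution for 0.5*||y - X w||^2 + lam*||w||_1 is the soft-thresholded least-squares estimate, so every weight that survives selection is pulled toward zero by exactly `lam`. The sketch below only demonstrates this bias of L1 regularization; it does not implement Bayesian masking, and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
# Orthonormal design: X^T X = I.
X, _ = np.linalg.qr(rng.standard_normal((n, d)))
w_true = np.array([3.0, 0.0, 0.0, 2.0, 0.0])  # two relevant features
y = X @ w_true                                 # noise-free for clarity

lam = 1.0
w_ols = X.T @ y  # least squares when X^T X = I
# Lasso with orthonormal X reduces to soft-thresholding the OLS estimate.
w_lasso = np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

print(w_ols.round(3))    # equals w_true up to rounding
print(w_lasso.round(3))  # irrelevant weights zeroed, relevant ones shrunk by lam
```

Selection is correct here, yet the surviving weights are biased by `lam`; a larger `lam` would remove more noise features but bias the relevant weights further, which is the sparsity-shrinkage trade-off the abstract refers to.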
Marginalizing Gaussian Process Hyperparameters using Sequential Monte Carlo
Svensson, Andreas, Dahlin, Johan, Schön, Thomas B.
Gaussian process regression is a popular method for non-parametric probabilistic modeling of functions. The Gaussian process prior is characterized by so-called hyperparameters, which often have a large influence on the posterior model and can be difficult to tune. This work provides a method for numerical marginalization of the hyperparameters, relying on the rigorous framework of sequential Monte Carlo. Our method is well suited for online problems, and we demonstrate its ability to handle real-world problems with several dimensions and compare it to other marginalization methods. We also conclude that our proposed method is a competitive alternative to the commonly used point estimates maximizing the likelihood, both in terms of computational load and its ability to handle multimodal posteriors.
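One way to build such a sampler, sketched here under simplifying assumptions, is to let weighted particles represent p(theta | y_1, ..., y_t) as observations arrive: each new point reweights the particles by its predictive likelihood, and after resampling a few Metropolis-Hastings steps rejuvenate them. The sketch assumes a squared-exponential kernel with log-lengthscale and log-signal-variance hyperparameters under standard normal priors and a known noise variance. It is not claimed to be the authors' exact algorithm, and all function names and data are hypothetical.

```python
import numpy as np


def se_kernel(a, b, lengthscale, signal_var):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)


def log_marginal_likelihood(x, y, log_theta, noise_var):
    """log N(y | 0, K + noise_var * I) for hyperparameters exp(log_theta)."""
    if len(x) == 0:
        return 0.0
    lengthscale, signal_var = np.exp(log_theta)
    K = se_kernel(x, x, lengthscale, signal_var) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L, y)  # y^T K^{-1} y = a^T a
    return -0.5 * a @ a - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)


def log_prior(log_theta):
    """Standard normal prior on each log-hyperparameter (up to a constant)."""
    return -0.5 * np.sum(log_theta ** 2, axis=-1)


def smc_hyperparameters(x, y, n_particles=100, noise_var=0.01, n_mh=2, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal((n_particles, 2))  # samples from the prior
    log_w = np.zeros(n_particles)
    prev_ll = np.zeros(n_particles)  # log p(y_{1:t-1} | theta) per particle
    for t in range(1, len(x) + 1):
        ll = np.array([log_marginal_likelihood(x[:t], y[:t], p, noise_var) for p in particles])
        log_w += ll - prev_ll  # incremental weight: log p(y_t | y_{1:t-1}, theta)
        prev_ll = ll
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:  # effective sample size check
            idx = rng.choice(n_particles, size=n_particles, p=w)
            particles, prev_ll = particles[idx], prev_ll[idx]
            log_w = np.zeros(n_particles)
            for _ in range(n_mh):  # random-walk MH targeting p(theta | y_{1:t})
                prop = particles + step * rng.standard_normal(particles.shape)
                prop_ll = np.array([log_marginal_likelihood(x[:t], y[:t], p, noise_var) for p in prop])
                log_ratio = prop_ll + log_prior(prop) - prev_ll - log_prior(particles)
                accept = np.log(rng.uniform(size=n_particles)) < log_ratio
                particles[accept] = prop[accept]
                prev_ll[accept] = prop_ll[accept]
    w = np.exp(log_w - log_w.max())
    return particles, w / w.sum()


def marginal_predictive_mean(x, y, x_star, particles, weights, noise_var=0.01):
    """GP posterior mean at x_star, averaged over the hyperparameter particles."""
    mean = np.zeros(len(x_star))
    for log_theta, w in zip(particles, weights):
        lengthscale, signal_var = np.exp(log_theta)
        K = se_kernel(x, x, lengthscale, signal_var) + noise_var * np.eye(len(x))
        k_star = se_kernel(x_star, x, lengthscale, signal_var)
        mean += w * (k_star @ np.linalg.solve(K, y))
    return mean


# Hypothetical data: noisy samples of sin(x).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))
particles, weights = smc_hyperparameters(x, y)
pred = marginal_predictive_mean(x, y, np.array([2.5]), particles, weights)
```

Averaging predictions over the weighted particles, rather than plugging in a single maximum-likelihood point estimate, is what lets the approach represent multimodal hyperparameter posteriors.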
$\alpha$-Discounting Multi-Criteria Decision Making ($\alpha$-D MCDM)
In this book we introduce a new procedure called the $\alpha$-Discounting Method for Multi-Criteria Decision Making ($\alpha$-D MCDM), an alternative to and extension of Saaty's Analytic Hierarchy Process (AHP). It works for any number of preferences that can be transformed into a system of homogeneous linear equations. A degree of consistency (and, implicitly, a degree of inconsistency) of a decision-making problem is defined. $\alpha$-D MCDM is then generalized to sets of preferences that can be transformed into a system of linear and/or non-linear, homogeneous and/or non-homogeneous equations and/or inequalities. The general idea of $\alpha$-D MCDM is to assign non-null positive parameters $\alpha_1, \alpha_2, \ldots, \alpha_p$ to the coefficients on the right-hand side of each preference, diminishing or increasing them so as to transform the homogeneous linear system, which has only the null solution, into a system that has a particular non-null solution. After the general solution of this system is found, the principles used to assign particular values to all parameters $\alpha$ form the second important part of $\alpha$-D, which remains to be investigated more deeply in the future. In the current book we propose the Fairness Principle, i.e., each coefficient should be discounted by the same percentage (we consider this fair, since it favors no coefficient over another), but the reader can propose other principles. For consistent decision-making problems with pairwise comparisons, the $\alpha$-Discounting Method together with the Fairness Principle gives the same result as AHP. For weakly inconsistent decision-making problems, however, $\alpha$-Discounting together with the Fairness Principle gives a different result from AHP. Many consistent, weakly inconsistent, and strongly inconsistent examples are given in this book.
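A small worked example may help. It assumes three alternatives and three pairwise preferences chosen here for illustration, not taken from the book: $x_1 = 2x_2$, $x_2 = 3x_3$, $x_1 = 4x_3$. These are inconsistent, because the first two imply $x_1 = 6x_3$, so the homogeneous system admits only the null solution. Applying one common discount $\alpha$ to every right-hand coefficient, as the Fairness Principle prescribes, gives:

```latex
\begin{aligned}
x_1 &= 2\alpha\, x_2, \qquad x_2 = 3\alpha\, x_3, \qquad x_1 = 4\alpha\, x_3 \\
\Rightarrow\; x_1 &= 2\alpha \cdot 3\alpha\, x_3 = 6\alpha^2 x_3 = 4\alpha\, x_3
  \;\Rightarrow\; \alpha = \tfrac{2}{3} \quad (\alpha > 0,\ x_3 \neq 0) \\
\Rightarrow\; (x_1, x_2, x_3) &= \left(\tfrac{8}{3},\, 2,\, 1\right) x_3
  \;\xrightarrow{\text{normalize}}\; \left(\tfrac{8}{17},\, \tfrac{6}{17},\, \tfrac{3}{17}\right)
\end{aligned}
```

In the consistent case, where the third coefficient equals the product of the first two, the same computation gives $\alpha = 1$ and leaves the stated ratios unchanged, in line with the book's statement that the method then agrees with AHP.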
Symbol Emergence in Robotics: A Survey
Taniguchi, Tadahiro, Nagai, Takayuki, Nakamura, Tomoaki, Iwahashi, Naoto, Ogata, Tetsuya, Asoh, Hideki
Humans can learn to use language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans form a symbol system and acquire semiotic skills through their autonomous mental development. Recently, many studies have addressed the construction of robotic systems and machine-learning methods that can learn to use language through embodied multimodal interaction with their environment and with other systems. Understanding human social interactions, and developing a robot that can communicate smoothly with human users over the long term, is crucially important and requires an understanding of the dynamics of symbol systems. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communication and physical interaction with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to acquire words and their embodied meanings from raw sensory-motor information, including visual, haptic, and auditory information and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.