Goto

Collaborating Authors

 Asia


A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

arXiv.org Machine Learning

Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret, and can serve as a basis of more sophisticated latent factor models. The model inference reduces to a low rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss in SVD. In experiments on 7 common benchmark datasets, our vectors are competitive to word2vec, and better than other MF-based methods.


Partial Sum Minimization of Singular Values in Robust PCA: Algorithm and Applications

arXiv.org Artificial Intelligence

Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only it is known that the underlying structure of clean data is low-rank, but the exact rank of clean data is also known. Yet, when applying conventional rank minimization for those problems, the objective function is formulated in a way that does not fully utilize a priori target rank information about the problems. This observation motivates us to investigate whether there is a better alternative solution when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g. high dynamic range imaging, motion edge detection, photometric stereo, image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear norm rank minimization method.


Alternating Minimization Algorithm with Automatic Relevance Determination for Transmission Tomography under Poisson Noise

arXiv.org Machine Learning

We propose a globally convergent alternating minimization (AM) algorithm for image reconstruction in transmission tomography, which extends automatic relevance determination (ARD) to Poisson noise models with Beer's law. The algorithm promotes solutions that are sparse in the pixel/voxel-differences domain by introducing additional latent variables, one for each pixel/voxel, and then learning these variables from the data using a hierarchical Bayesian model. Importantly, the proposed AM algorithm is free of any tuning parameters with image quality comparable to standard penalized likelihood methods. Our algorithm exploits optimization transfer principles which reduce the problem into parallel 1D optimization tasks (one for each pixel/voxel), making the algorithm feasible for large-scale problems. This approach considerably reduces the computational bottleneck of ARD associated with the posterior variances. Positivity constraints inherent in transmission tomography problems are also enforced. We demonstrate the performance of the proposed algorithm for x-ray computed tomography using synthetic and real-world datasets. The algorithm is shown to have much better performance than prior ARD algorithms based on approximate Gaussian noise models, even for high photon flux.


A variational approach to the consistency of spectral clustering

arXiv.org Machine Learning

This paper establishes the consistency of spectral approaches to data clustering. We consider clustering of point clouds obtained as samples of a ground-truth measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. We investigate the spectral convergence of both unnormalized and normalized graph Laplacians towards the appropriate operators in the continuum domain. We obtain sharp conditions on how the connectivity radius can be scaled with respect to the number of sample points for the spectral convergence to hold. We also show that the discrete clusters obtained via spectral clustering converge towards a continuum partition of the ground truth measure. Such continuum partition minimizes a functional describing the continuum analogue of the graph-based spectral partitioning. Our approach, based on variational convergence, is general and flexible.


Progressive EM for Latent Tree Models and Hierarchical Topic Detection

arXiv.org Machine Learning

Hierarchical latent tree analysis (HLTA) is recently proposed as a new method for topic detection. It differs fundamentally from the LDA-based methods in terms of topic definition, topic-document relationship, and learning method. It has been shown to discover significantly more coherent topics and better topic hierarchies. However, HLTA relies on the Expectation-Maximization (EM) algorithm for parameter estimation and hence is not efficient enough to deal with large datasets. In this paper, we propose a method to drastically speed up HLTA using a technique inspired by recent advances in the moments method. Empirical experiments show that our method greatly improves the efficiency of HLTA. It is as efficient as the state-of-the-art LDA-based method for hierarchical topic detection and finds substantially better topics and topic hierarchies.


Regularized Multi-Task Learning for Multi-Dimensional Log-Density Gradient Estimation

arXiv.org Machine Learning

Multi-task learning is a paradigm of machine learning for solving multiple related learning tasks simultaneously with the expectation that information brought by other related tasks can be mutually exploited to improve the accuracy [Caruana, 1997]. Multi-task learning is particularly useful when one has many related learning tasks to solve but only few training samples are available for each task, which is often the case in many real-world problems such as therapy screening [Bickel et al., 2008] and face verification [Wang et al., 2009]. Multi-task learning has been gathering a great deal of attention, and extensive studies have been conducted both theoretically and experimentally [Thrun, 1996, Evgeniou and Pontil, 2004, Ando and Zhang, 2005, Zhang, 2013, Baxter, 2000]. Thrun [1996] proposed the lifelong learning framework, which transfers the knowledge obtained from the tasks experienced in the past to a newly given task, and it was demonstrated to improve the performance of image recognition. Baxter Baxter [2000] defined a multi-task learning framework called inductive bias learning, and derived a generalization error bound. The semi-supervised multi-task learning method proposed by Ando and Zhang [2005] generates many auxiliary learning 2 tasks from unlabeled data and seeks a good feature mapping for the target learning task.


Exploiting Binary Floating-Point Representations for Constraint Propagation: The Complete Unabridged Version

arXiv.org Artificial Intelligence

Floating-point computations are quickly finding their way in the design of safety- and mission-critical systems, despite the fact that designing floating-point algorithms is significantly more difficult than designing integer algorithms. For this reason, verification and validation of floating-point computations is a hot research topic. An important verification technique, especially in some industrial sectors, is testing. However, generating test data for floating-point intensive programs proved to be a challenging problem. Existing approaches usually resort to random or search-based test data generation, but without symbolic reasoning it is almost impossible to generate test inputs that execute complex paths controlled by floating-point computations. Moreover, as constraint solvers over the reals or the rationals do not natively support the handling of rounding errors, the need arises for efficient constraint solvers over floating-point domains. In this paper, we present and fully justify improved algorithms for the propagation of arithmetic IEEE 754 binary floating-point constraints. The key point of these algorithms is a generalization of an idea by B. Marre and C. Michel that exploits a property of the representation of floating-point numbers.


ITSAT: An Efficient SAT-Based Temporal Planner

Journal of Artificial Intelligence Research

Planning as satisfiability is known as an efficient approach to deal with many types of planning problems. However, this approach has not been competitive with the state-space based methods in temporal planning. This paper describes ITSAT as an efficient SAT-based (satisfiability based) temporal planner capable of temporally expressive planning. The novelty of ITSAT lies in the way it handles temporal constraints of given problems without getting involved in the difficulties of introducing continuous variables into the corresponding satisfiability problems. We also show how, as in SAT-based classical planning, carefully devised preprocessing and encoding schemata can considerably improve the efficiency of SAT-based temporal planning. We present two preprocessing methods for mutex relation extraction and action compression. We also show that the separation of causal and temporal reasoning enables us to employ compact encodings that are based on the concept of parallel execution semantics. Although such encodings have been shown to be quite effective in classical planning, ITSAT is the first temporal planner utilizing this type of encoding. Our empirical results show that not only does ITSAT outperform the state-of-the-art temporally expressive planners, it is also competitive with the fast temporal planners that cannot handle required concurrency.


Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

arXiv.org Machine Learning

To date, there have been massive Semi-Structured Documents (SSDs) during the evolution of the Internet. These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous works focused on modeling the unstructured text, and recently, some other methods have been proposed to model the unstructured text with specific tags. To build a general model for SSDs remains an important problem in terms of both model fitness and efficiency. We propose a novel method to model the SSDs by a so-called Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both the tags and words information, not only to learn the document-topic and topic-word distributions, but also to infer the tag-topic distributions for text mining tasks. We present an efficient variational inference method with an EM algorithm for estimating the model parameters. Meanwhile, we propose three large-scale solutions for our model under the MapReduce distributed computing platform for modeling large-scale SSDs. The experimental results show the effectiveness, efficiency and the robustness by comparing our model with the state-of-the-art methods in document modeling, tags prediction and text classification. We also show the performance of the three distributed solutions in terms of time and accuracy on document modeling.


IT-Dendrogram: A New Member of the In-Tree (IT) Clustering Family

arXiv.org Machine Learning

Previously, we proposed a physically-inspired method to construct data points into an effective in-tree (IT) structure, in which the underlying cluster structure in the dataset is well revealed. Although there are some edges in the IT structure requiring to be removed, such undesired edges are generally distinguishable from other edges and thus are easy to be determined. For instance, when the IT structures for the 2-dimensional (2D) datasets are graphically presented, those undesired edges can be easily spotted and interactively determined. However, in practice, there are many datasets that do not lie in the 2D Euclidean space, thus their IT structures cannot be graphically presented. But if we can effectively map those IT structures into a visualized space in which the salient features of those undesired edges are preserved, then the undesired edges in the IT structures can still be visually determined in a visualization environment. Previously, this purpose was reached by our method called IT-map. The outstanding advantage of IT-map is that clusters can still be found even with the so-called crowding problem in the embedding. In this paper, we propose another method, called IT-Dendrogram, to achieve the same goal through an effective combination of the IT structure and the single link hierarchical clustering (SLHC) method. Like IT-map, IT-Dendrogram can also effectively represent the IT structures in a visualization environment, whereas using another form, called the Dendrogram. IT-Dendrogram can serve as another visualization method to determine the undesired edges in the IT structures and thus benefit the IT-based clustering analysis. This was demonstrated on several datasets with different shapes, dimensions, and attributes. Unlike IT-map, IT-Dendrogram can always avoid the crowding problem, which could help users make more reliable cluster analysis in certain problems.