AITopics

1210.1461

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Unfolding Latent Tree Structures using 4th Order Tensors

Ishteva, Mariya, Park, Haesun, Song, Le

Discovering the latent structure from many observed variables is an important yet challenging learning task. Existing approaches for discovering latent structures often require the unknown number of hidden states as an input. In this paper, we propose a quartet based approach which is \emph{agnostic} to this number. The key contribution is a novel rank characterization of the tensor associated with the marginal distribution of a quartet. This characterization allows us to design a \emph{nuclear norm} based test for resolving quartet relations. We then use the quartet test as a subroutine in a divide-and-conquer algorithm for recovering the latent tree structure. Under mild conditions, the algorithm is consistent and its error probability decays exponentially with increasing sample size. We demonstrate that the proposed approach compares favorably to alternatives. In a real world stock dataset, it also discovers meaningful groupings of variables, and produces a model that fits the data better.

algorithm, banking & finance, health & medicine, (18 more...)

1210.1258

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Banking & Finance (1.00)
Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.34)

Papatheocharous, Efi, Papadopoulos, Harris, Andreou, Andreas S.

Feature Subset Selection for Software Cost Modelling and Estimation

arXiv.org Artificial IntelligenceOct-3-2012

Feature selection has been recently used in the area of software engineering for improving the accuracy and robustness of software cost models. The idea behind selecting the most informative subset of features from a pool of available cost drivers stems from the hypothesis that reducing the dimensionality of datasets will significantly minimise the complexity and time required to reach to an estimation using a particular modelling technique. This work investigates the appropriateness of attributes, obtained from empirical project databases and aims to reduce the cost drivers used while preserving performance. Finding suitable subset selections that may cater improved predictions may be considered as a pre-processing step of a particular technique employed for cost estimation (filter or wrapper) or an internal (embedded) step to minimise the fitting error. This paper compares nine relatively popular feature selection methods and uses the empirical values of selected attributes recorded in the ISBSG and Desharnais datasets to estimate software development effort.

algorithm, dataset, estimation, (14 more...)

arXiv.org Artificial Intelligence

1210.1161

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Europe > Middle East > Cyprus > Larnaka > Larnaca (0.04)
(19 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Kumar, Abhishek, Sindhwani, Vikas, Kambadur, Prabhanjan

Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization

The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012) turns non-negative matrix factorization (NMF) into a tractable problem. Recently, a new class of provably-correct NMF algorithms have emerged under this assumption. In this paper, we reformulate the separable NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors. From this geometric perspective, we derive new separable NMF algorithms that are highly scalable and empirically noise robust, and have several other favorable properties in relation to existing methods. A parallel implementation of our algorithm demonstrates high scalability on shared- and distributed-memory machines.

algorithm, artificial intelligence, machine learning, (13 more...)

1210.119

Country: North America > United States > Maryland (0.28)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (0.68)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Balasubramanian, Krishnakumar, Yu, Kai, Lebanon, Guy

Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations

We propose and analyze a novel framework for learning sparse representations, based on two statistical techniques: kernel smoothing and marginal regression. The proposed approach provides a flexible framework for incorporating feature similarity or temporal information present in data sets, via non-parametric kernel smoothing. We provide generalization bounds for dictionary learning using smooth sparse coding and show how the sample complexity depends on the L1 norm of kernel function used. Furthermore, we propose using marginal regression for obtaining sparse codes, which significantly improves the speed and allows one to scale to large dictionary sizes easily. We demonstrate the advantages of the proposed approach, both in terms of accuracy and speed by extensive experimentation on several real data sets. In addition, we demonstrate how the proposed approach could be used for improving semi-supervised sparse coding.

artificial intelligence, machine learning, sparse, (18 more...)

1210.1121

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Guimera, Roger, Llorente, Alejandro, Moro, Esteban, Sales-Pardo, Marta

Predicting human preferences using the block structure of complex social networks

With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a "new" computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups.

artificial intelligence, bayesian inference, machine learning, (15 more...)

doi: 10.1371/journal.pone.0044620

1210.1048

Country:

North America > United States (0.68)
Europe > Spain > Catalonia (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.41)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Szabo, Zoltan, Lorincz, Andras

Distributed High Dimensional Information Theoretical Image Registration via Random Projections

However, the estimation of these quantities is computationally intensive in high dimensions. On the other hand, consistent estimation from pairwise distances of the sample points is possible, which suits random projection(RP) based low dimensional embeddings. We adapt the RP technique to this task by means of a simple ensemble method. To the best of our knowledge, this is the first distributed, RP based information theoretical image registration approach. The efficiency of the method is demonstrated through numerical examples. Keywords: random projection, information theoretical image registration, high dimensional features, distributed solution 1. Introduction Machine learning methods are notoriously limited by the high dimensional nature of the data. This problem may be alleviated via the random projection (RP) technique, which has been successfully applied, e.g., in the fields of

dimension, machine learning, pattern recognition, (18 more...)

doi: 10.1016/j.dsp.2012.04.018

1210.0824

Country:

North America > United States (0.68)
Europe (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.84)

Mahrsi, Mohamed Khalil El, Rossi, Fabrice

Graph-Based Approaches to Clustering Network-Constrained Trajectory Data

Even though clustering trajectory data attracted considerable attention in the last few years, most of prior work assumed that moving objects can move freely in an euclidean space and did not consider the eventual presence of an underlying road network and its influence on evaluating the similarity between trajectories. In this paper, we present two approaches to clustering network-constrained trajectory data. The first approach discovers clusters of trajectories that traveled along the same parts of the road network. The second approach is segment-oriented and aims to group together road segments based on trajectories that they have in common. Both approaches use a graph model to depict the interactions between observations w.r.t. their similarity and cluster this similarity graph using a community detection algorithm. We also present experimental results obtained on synthetic data to showcase our propositions.

data mining, machine learning, trajectory, (17 more...)

1210.0762

Country:

North America > United States (0.30)
Europe (0.28)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (0.97)
Transportation > Infrastructure & Services (0.83)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Cerra, Daniele, Datcu, Mihai

A fast compression-based similarity measure with applications to content-based image retrieval

Compression-based similarity measures are effectively employed in applications on diverse data types with a basically parameter-free approach. Nevertheless, there are problems in applying these techniques to medium-to-large datasets which have been seldom addressed. This paper proposes a similarity measure based on compression with dictionaries, the Fast Compression Distance (FCD), which reduces the complexity of these methods, without degradations in performance. On its basis a content-based color image retrieval system is defined, which can be compared to state-of-the-art methods based on invariant color features. Through the FCD a better understanding of compression-based techniques is achieved, by performing experiments on datasets which are larger than the ones analyzed so far in literature.

artificial intelligence, data mining, machine learning, (18 more...)

doi: 10.1016/j.jvcir.2011.10.009

1210.0758

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Jenatton, Rodolphe, Gribonval, Rémi, Bach, Francis

Local stability and robustness of sparse dictionary learning in the presence of noise

Modelling signals as sparse linear combinations of atoms selected from a dictionary has become a popular paradigm in many fields, including signal processing, statistics, and machine learning. This line of research has witnessed the development of several well-founded theoretical frameworks (see, e.g., Wainwright [2009], Zhang [2009]) and efficient algorithmic tools (see, e.g., Bach et al. [2011] and references therein). However, the performance of such approaches hinges on the representation of the signals, which makes the question of designing "good" dictionaries prominent. A great deal of effort has been dedicated to come up with efficient predefined dictionaries, e.g., the various types of wavelets [Mallat, 2008]. These representations have notably contributed to many successful image processing applications such as compression, denoising and deblurring. More recently, the idea of simultaneously learning the dictionary and the sparse decompositions of the signals--also known as sparse dictionary learning, or simply, sparse coding--has emerged as a powerful framework, with state-of-the-art performance in many tasks, including inpainting and image classification (see, e.g., Mairal et al. [2010] and references therein). Although sparse dictionary learning can sometimes be formulated as convex[Bach et al., 2008, Bradley and Bagnell, 2009], nonparametric Bayesian [Zhou et al., 2009] and submodular [Krause and Cevher, 2010] problems, the most popular and widely used definition of sparse coding brings into play a non-convex optimization problem. Despite its empirical and practical success, there has only been little theoretical analysis of the properties of sparse dictionary learning.

artificial intelligence, machine learning, optimization problem, (19 more...)

1210.0685

Country: Europe > France (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)