AITopics | Data Science

Data Science

Stretchy Time Pattern Mining: A Deeper Analysis of Environment Sensor Data

Junior, Carlos Roberto Silveira (Federal University of São Carlos) | Santos, Marilde Terezinha Prado (Federal University of São Carlos) | Ribeiro, Marcela Xavier (Federal University of São Carlos)

AAAI Conferences, May-19-2013

Mining sequential patterns in environment sensor data is a challenging task: the data can be noisy and may contain sparse patterns, which are difficult to detect. The knowledge extracted from environment sensor data can be used to identify climate changes. However, there is a lack of methods that can handle this kind of database. In this paper, we propose a method to mine sequential patterns in sparse, incomplete and noisy sensor data. The proposed method, called Stretchy Time Windows (STW), allows the mining of sequential patterns that present time gaps between their events. We propose an algorithm to implement STW, called Miner of Stretchy Time Sequences (MSTS). The proposed algorithm works with sequences of any size and uses a balanced strategy to analyze the search space. Our experiments show that MSTS returns sequences that cover a longer period of analysis than GSP, a traditional frequent pattern mining algorithm. In fact, the period of analysis is 5 times longer than with GSP, and MSTS returns 2.3 times as many patterns as previous methods.
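
The idea of a sequential pattern whose events are separated by time gaps can be illustrated with a minimal sketch. This is not the MSTS algorithm from the paper, whose details the abstract does not give; the function names and the `max_gap` parameter are assumptions introduced only for illustration.

```python
def occurs_with_gaps(pattern, sequence, max_gap):
    """Return True if `pattern` occurs in order within `sequence`, with at
    most `max_gap` time units between consecutive matched events.

    `sequence` is a list of (timestamp, event) pairs sorted by timestamp.
    `max_gap` is a hypothetical tolerance, not a parameter from the paper.
    """
    def search(p_idx, start, last_time):
        if p_idx == len(pattern):
            return True  # every pattern event has been matched
        for i in range(start, len(sequence)):
            t, event = sequence[i]
            if last_time is not None and t - last_time > max_gap:
                break  # timestamps are sorted, so later events are too far away
            if event == pattern[p_idx] and search(p_idx + 1, i + 1, t):
                return True
        return False

    return search(0, 0, None)


def support(pattern, sequences, max_gap):
    """Fraction of sequences in which the pattern occurs within the gap limit."""
    hits = sum(occurs_with_gaps(pattern, s, max_gap) for s in sequences)
    return hits / len(sequences)
```

With a small `max_gap` the matcher behaves like a strict window and misses sparse patterns; raising it lets the same pattern be found across missing or noisy readings.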

deeper analysis, environment sensor data, stretchy time pattern mining

AAAI Conferences

The Twenty-Sixth International FLAIRS Conference

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Recognizing Artificial Faces Using Wavelet Based Adapted Median Binary Patterns

Mohamed, Abdallah (University of Louisville) | Yampolskiy, Roman (University of Louisville)

AAAI Conferences, May-19-2013

Recognizing avatar faces is a challenging and very important issue for terrorism and security experts. Some avatar face recognition techniques have recently been proposed, but they are still limited. In this paper, we propose a novel face recognition technique based on the discrete wavelet transform and the Adapted Median Binary Pattern (AMBP) operator to recognize avatar faces from different virtual worlds. The original LBP operator thresholds the pixels in a predetermined window against the value of that window's central pixel. As a result, the LBP operator becomes more sensitive to noise, especially in near-uniform or flat regions of an image. One way to reduce the effect of noise is to update the threshold automatically based on all pixels in the neighborhood using some simple statistical operations. Experiments conducted on two virtual world avatar face image datasets show that our technique performs better than the original LBP, adapted LBP, Median Binary Pattern (MBP) and wavelet statistical adapted LBP in terms of accuracy.
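
The difference between thresholding on the central pixel and thresholding on a neighborhood statistic can be sketched on a single 3x3 patch. The code below shows the basic LBP code and a Median Binary Pattern code as commonly defined (median of all nine pixels as the threshold); it does not implement the paper's AMBP operator, and the function names and the bit ordering are assumptions.

```python
from statistics import median


def lbp_code(patch):
    """Basic LBP: compare the 8 neighbours of a 3x3 patch to the centre pixel."""
    centre = patch[1][1]
    # Neighbours read clockwise from the top-left corner (an arbitrary choice).
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 if v >= centre else 0) << i for i, v in enumerate(neighbours))


def mbp_code(patch):
    """Median Binary Pattern: compare all 9 pixels to the patch median."""
    pixels = [v for row in patch for v in row]
    threshold = median(pixels)
    return sum((1 if v >= threshold else 0) << i for i, v in enumerate(pixels))


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
noisy_centre = [[10, 10, 10], [10, 12, 10], [10, 10, 10]]
```

On the flat patch the LBP code is 255; a small perturbation of the centre pixel alone flips it to 0, while the median-based code is 511 for both patches. This is the noise sensitivity in flat regions that the abstract describes.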

median binary pattern, recognizing artificial face, wavelet

AAAI Conferences

The Twenty-Sixth International FLAIRS Conference

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (0.73)
Information Technology > Data Science > Data Quality > Data Transformation (0.53)

Online Learning in a Contract Selection Problem

Tekin, Cem | Liu, Mingyan

arXiv.org Machine Learning, May-14-2013

In an online contract selection problem, a seller offers a set of contracts to sequentially arriving buyers whose types are drawn from an unknown distribution. If there exists a profitable contract for the buyer in the offered set, i.e., a contract with a payoff higher than the payoff of not accepting any contract, the buyer chooses the contract that maximizes its payoff. In this paper we consider the online contract selection problem with the goal of maximizing the seller's profit. Assuming that a structural property called ordered preferences holds for the buyer's payoff function, we propose online learning algorithms that have sub-linear regret with respect to the best set of contracts given the distribution over the buyer's type. This problem has many applications, including spectrum contracts, wireless service provider data plans and recommendation systems.

computer based training, contract, educational technology, (22 more...)

arXiv.org Machine Learning

1305.3334

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Tensor Decompositions: A New Concept in Brain Data Analysis?

Cichocki, Andrzej

arXiv.org Machine Learning, May-2-2013

Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), Nonnegative Matrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions have many other potential applications beyond multilinear BSS, especially feature extraction, classification, dimensionality reduction and multiway clustering. In this paper, we briefly overview new and emerging models and approaches for tensor decompositions in applications to group and linked multiway BSS/ICA, feature extraction, classification and Multiway Partial Least Squares (MPLS) regression problems. Keywords: Multilinear BSS, linked multiway BSS/ICA, tensor factorizations and decompositions, constrained Tucker and CP models, Penalized Tensor Decompositions (PTD), feature extraction, classification, multiway PLS and CCA.

decomposition, health & medicine, neurology, (18 more...)

arXiv.org Machine Learning

1305.0395

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Testing Hypotheses by Regularized Maximum Mean Discrepancy

Danafar, Somayeh | Rancoita, Paola M. V. | Glasmachers, Tobias | Whittingstall, Kevin | Schmidhuber, Juergen

arXiv.org Artificial Intelligence, May-2-2013

Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on: challenging EEG data, MNIST, the Berkley Covertype, and the Flare-Solar dataset.
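
Comparing two samples by the distance between their kernel embeddings can be sketched with the standard (unregularized) biased estimate of the squared Maximum Mean Discrepancy. This is not the RMMD measure proposed in the paper, whose definition the abstract does not give; the Gaussian kernel, the `bandwidth` parameter and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed parameter."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


sample_a = [0.0, 0.5, 1.0]
sample_b = [5.0, 5.5, 6.0]
```

The estimate is near zero when both samples are the same and grows as the samples separate, as with `mmd_squared(sample_a, sample_b)`. A hypothesis test would compare this statistic against a threshold derived from its null distribution, which this sketch leaves out.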

artificial intelligence, health & medicine, mmd, (19 more...)

arXiv.org Artificial Intelligence

1305.0423

Country:

Europe (0.93)
North America > United States (0.47)
North America > Canada > Quebec > Estrie Region > Sherbrooke (0.14)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Data Science > Data Mining (0.68)

Semi-Supervised Information-Maximization Clustering

Calandriello, Daniele | Niu, Gang | Sugiyama, Masashi

arXiv.org Machine Learning, May-1-2013

Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm. In this paper, we propose a novel semi-supervised clustering algorithm based on the information-maximization principle. The proposed method is an extension of a previous unsupervised information-maximization clustering algorithm based on squared-loss mutual information to effectively incorporate must-links and cannot-links. The proposed method is computationally efficient because the clustering solution can be obtained analytically via eigendecomposition. Furthermore, the proposed method allows systematic optimization of tuning parameters such as the kernel width, given the degree of belief in the must-links and cannot-links. The usefulness of the proposed method is demonstrated through experiments.

artificial intelligence, data mining, information, (18 more...)

arXiv.org Machine Learning

1304.802

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Semi-supervised Eigenvectors for Large-scale Locally-biased Learning

Hansen, Toke J. | Mahoney, Michael W.

arXiv.org Machine Learning, Apr-28-2013

In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks "nearby" that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified "seed set" of nodes, or one might be interested in finding partitions in an image that are near a prespecified "ground truth" set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We show that these semi-supervised eigenvectors can be computed quickly as the solution to a system of linear equations; and we also describe several variants of our basic method that have improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning; and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph Laplacian.

eigenvector, neurology, optimization problem, (19 more...)

arXiv.org Machine Learning

1304.7528

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.93)
Health & Medicine > Health Care Technology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Analytic Expressions for Stochastic Distances Between Relaxed Complex Wishart Distributions

Frery, Alejandro C. | Nascimento, Abraão D. C. | Cintra, Renato J.

arXiv.org Machine Learning, Apr-19-2013

The scaled complex Wishart distribution is a widely used model for multilook full polarimetric SAR data whose adequacy has been attested in the literature. Classification, segmentation, and image analysis techniques which depend on this model have been devised, and many of them employ some type of dissimilarity measure. In this paper we derive analytic expressions for four stochastic distances between relaxed scaled complex Wishart distributions in their most general form and in important particular cases. Using these distances, inequalities are obtained which lead to new ways of deriving the Bartlett and revised Wishart distances. The expressiveness of the four analytic distances is assessed with respect to the variation of parameters. Such distances are then used for deriving new test statistics, which are proved to have an asymptotic chi-square distribution. Adopting the test size as a comparison criterion, a sensitivity study is performed by means of Monte Carlo experiments, suggesting that the Bhattacharyya statistic outperforms all the others. The power of the tests is also assessed. Applications to actual data illustrate the discrimination and homogeneity identification capabilities of these distances.

artificial intelligence, machine learning, statistics, (15 more...)

arXiv.org Machine Learning

1304.5417

Country:

North America > United States (0.46)
South America > Brazil > Pernambuco (0.14)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Sensing and Signal Processing > Image Processing (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach

Harandi, Mehrtash T. | Sanderson, Conrad | Hartley, Richard | Lovell, Brian C.

arXiv.org Machine Learning, Apr-16-2013

Recent advances suggest that a wide range of computer vision problems can be addressed more appropriately by considering non-Euclidean geometry. This paper tackles the problem of sparse coding and dictionary learning in the space of symmetric positive definite matrices, which form a Riemannian manifold. With the aid of the recently introduced Stein kernel (related to a symmetric version of Bregman matrix divergence), we propose to perform sparse coding by embedding Riemannian manifolds into reproducing kernel Hilbert spaces. This leads to a convex and kernel version of the Lasso problem, which can be solved efficiently. We furthermore propose an algorithm for learning a Riemannian dictionary (used for sparse coding), closely tied to the Stein kernel. Experiments on several classification tasks (face recognition, texture classification, person re-identification) show that the proposed sparse coding approach achieves notable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as tensor sparse coding, Riemannian locality preserving projection, and symmetry-driven accumulation of local features.
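
As a rough illustration of the Stein kernel mentioned above, the sketch below uses the usual definition of the Stein (Jensen-Bregman LogDet) divergence, S(X, Y) = log det((X + Y) / 2) - (1/2) log det(XY), with the kernel exp(-beta * S(X, Y)). To stay dependency-free it is restricted to 2x2 symmetric positive definite matrices; the function names and the default `beta` are assumptions, and the sparse coding and dictionary learning steps of the paper are not shown.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def stein_divergence(x, y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(XY) for 2x2 SPD inputs."""
    mid = [[(x[i][j] + y[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    # log det(XY) = log det(X) + log det(Y)
    return math.log(det2(mid)) - 0.5 * (math.log(det2(x)) + math.log(det2(y)))


def stein_kernel(x, y, beta=1.0):
    """exp(-beta * S(X, Y)); beta is an assumed illustrative value."""
    return math.exp(-beta * stein_divergence(x, y))


identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[4.0, 0.0], [0.0, 4.0]]
```

The kernel equals 1 when both arguments coincide and decreases as the matrices move apart. It is known to be positive definite only for certain values of beta, so the default here should be treated as a placeholder.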

artificial intelligence, health & medicine, survey article, (20 more...)

arXiv.org Machine Learning

doi: 10.1007/978-3-642-33709-3_16

1304.4344

Country: Oceania > Australia (0.46)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Link Prediction with Social Vector Clocks

Lee, Conrad | Nick, Bobo | Brandes, Ulrik | Cunningham, Pádraig

arXiv.org Machine Learning, Apr-15-2013

State-of-the-art link prediction utilizes combinations of complex features derived from network panel data. We here show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions. Our features are based on social vector clocks, an adaptation of the vector-clock concept introduced in distributed computing to social interaction networks. In fact, our experiments suggest that by taking into account the order and spacing of interactions, social vector clocks exploit different aspects of link formation so that their combination with previous approaches yields the most accurate predictor to date.
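
How a vector clock can be maintained over a sequence of interactions may be sketched as follows. This is a simplified reading of the vector-clock idea applied to directed interactions, not the feature construction used in the paper; the class name, the method names and the update rule details are assumptions.

```python
from collections import defaultdict


class SocialVectorClocks:
    """For each person, track the latest time at which information from every
    other person could have reached them, directly or through intermediaries."""

    def __init__(self):
        self.clocks = defaultdict(dict)  # observer -> {subject: latest time}

    def interact(self, sender, receiver, time):
        """Process one directed interaction (e.g. a message) at `time`."""
        sender_clock = self.clocks[sender]
        sender_clock[sender] = time  # the sender is up to date about itself
        receiver_clock = self.clocks[receiver]
        # The receiver learns everything the sender knew (element-wise maximum).
        for person, seen in sender_clock.items():
            if seen > receiver_clock.get(person, float("-inf")):
                receiver_clock[person] = seen
        receiver_clock[receiver] = time

    def latest_view(self, observer, subject):
        """Latest time of `subject` information available to `observer`, or None."""
        return self.clocks[observer].get(subject)


clocks = SocialVectorClocks()
clocks.interact("a", "b", 1)
clocks.interact("b", "c", 2)
clocks.interact("a", "b", 3)
```

After these three interactions, "c" has only indirect and outdated information about "a" (from time 1), while "b" is current (time 3). Quantities such as this lag depend on the order and spacing of interactions, which is the kind of signal the abstract refers to.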

inductive learning, survey article, us government, (25 more...)

arXiv.org Machine Learning

1304.4058

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (0.95)
Leisure & Entertainment > Sports > Olympic Games (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Data Science > Data Mining (0.90)
Information Technology > Information Management > Search (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
