 Pattern Recognition


A Pattern Matching Based Model for Implicit Opinion Question Identification

AAAI Conferences

This paper presents the results of developing subjectivity classifiers for Implicit Opinion Question (IOQ) identification. IOQs are defined as opinion questions that contain no opinion words. An example of an IOQ is "will the U.S. government pay more attention to the Pacific Rim?" Our analysis of community questions from Yahoo! Answers shows that a large proportion of opinion questions are IOQs. It is thus important to develop techniques to identify such questions. In this research, we first propose an effective framework based on mutual information and sequential pattern mining to construct an opinion lexicon that contains not only opinion words but also patterns. The discovered words and patterns are then combined with a machine learning technique to identify opinion questions. The experimental results on two datasets demonstrate the effectiveness of our approach.
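As a rough illustration of how mutual information can surface lexicon candidates, the sketch below scores each word by its pointwise mutual information with the "opinion" label. The toy questions, the labels, and the function name `pmi_scores` are assumptions made for this example; the abstract does not specify the paper's actual scoring procedure or its sequential pattern mining step.

```python
import math
from collections import Counter


def pmi_scores(questions, labels, target="opinion"):
    """Pointwise mutual information between each word and the target label.

    Only words that occur at least once in a target-labelled question are scored.
    """
    n = len(questions)
    n_target = sum(1 for y in labels if y == target)
    word_count = Counter()   # number of questions containing the word
    joint_count = Counter()  # number of target questions containing the word
    for text, y in zip(questions, labels):
        for word in set(text.lower().split()):
            word_count[word] += 1
            if y == target:
                joint_count[word] += 1
    scores = {}
    for word, joint in joint_count.items():
        # PMI = log2( P(word, target) / (P(word) * P(target)) )
        p_joint = joint / n
        p_word = word_count[word] / n
        p_target = n_target / n
        scores[word] = math.log2(p_joint / (p_word * p_target))
    return scores


# Hypothetical toy data, not taken from the paper's datasets.
questions = [
    "will the government pay more attention",
    "what is the capital of france",
    "will prices rise next year",
    "what is the boiling point of water",
]
labels = ["opinion", "fact", "opinion", "fact"]
scores = pmi_scores(questions, labels)
```

In this toy set, "will" appears only in opinion questions and receives a positive score, while "the" appears mostly in factual questions and receives a negative one.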


Machine Learning in Proof General: Interfacing Interfaces

arXiv.org Artificial Intelligence

ML4PG allows users to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof tree structures from libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using state-of-the-art machine learning algorithms available in MATLAB and Weka. ML4PG provides automated interfacing between Proof General and MATLAB/Weka. The clustering results are used by ML4PG to provide proof hints during interactive proof development.


Rotation invariants of two dimensional curves based on iterated integrals

arXiv.org Machine Learning

We introduce a novel class of rotation invariants of two dimensional curves based on iterated integrals. The invariants we present are in some sense complete and we describe an algorithm to calculate them, giving explicit computations up to order six. We present an application to online (stroke-trajectory based) character recognition. This seems to be the first time in the literature that the use of iterated integrals of a curve is proposed for (invariant) feature extraction in machine learning applications.
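To make the idea concrete, the sketch below computes the second-order iterated integrals of a piecewise-linear stroke and forms two simple combinations that do not change under rotation: the trace `S11 + S22` (half the squared end-to-end distance) and the antisymmetric part `S12 - S21` (twice the signed area). These are only the lowest-order examples; the function names are assumptions for this illustration, and the paper's own invariants up to order six are not reproduced here.

```python
import math


def level2_iterated_integrals(points):
    """Second-order iterated integrals S[i][j] of a piecewise-linear 2D path."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    prefix = [0.0, 0.0]  # displacement accumulated before the current segment
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = (x1 - x0, y1 - y0)
        for i in range(2):
            for j in range(2):
                # Contribution of one straight segment (Chen's identity).
                s[i][j] += prefix[i] * d[j] + 0.5 * d[i] * d[j]
        prefix[0] += d[0]
        prefix[1] += d[1]
    return s


def rotation_invariants(points):
    """Return (S11 + S22, S12 - S21), both unchanged by rotating the path."""
    s = level2_iterated_integrals(points)
    return (s[0][0] + s[1][1], s[0][1] - s[1][0])


def rotate(points, angle):
    """Rotate every point about the origin by `angle` radians."""
    c, si = math.cos(angle), math.sin(angle)
    return [(c * x - si * y, si * x + c * y) for x, y in points]


# Hypothetical stroke trajectory used only for this example.
stroke = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
original = rotation_invariants(stroke)
rotated = rotation_invariants(rotate(stroke, 0.7))
```

Up to floating-point error, `original` and `rotated` agree, which is the property a rotation-invariant feature for character recognition needs.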


Stretchy Time Pattern Mining: A Deeper Analysis of Environment Sensor Data

AAAI Conferences

Mining sequential patterns in environment sensor data is a challenging task: the data can be noisy and may contain sparse patterns that are difficult to detect. The knowledge extracted from environment sensor data can be used to determine climate changes. However, there is a lack of methods that can handle this kind of database. In this paper, we propose a method to mine sequential patterns in sparse, incomplete and noisy sensor data. The proposed method, called Stretchy Time Windows (STW), allows the mining of sequential patterns that have time gaps between their events. We propose an algorithm to implement STW, called Miner of Stretchy Time Sequences (MSTS). The proposed algorithm works with sequences of any size and uses a balanced strategy to analyze the search space. Our experiments show that MSTS returns sequences that cover a longer period of analysis than GSP, a traditional frequent pattern mining algorithm: the period is 5 times longer than with GSP, and MSTS finds more patterns (2.3 times as many) than previous methods.
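The abstract does not describe how STW or MSTS work internally, so the sketch below only illustrates the general notion of matching a sequential pattern while tolerating bounded time gaps between consecutive events. The function names, the `max_gap` parameter, and the sensor readings are assumptions for this example, not the authors' algorithm.

```python
def occurs_with_gaps(events, pattern, max_gap):
    """Check whether `pattern` occurs in `events` with bounded time gaps.

    `events` is a list of (timestamp, label) pairs sorted by timestamp.
    Consecutive matched events may be at most `max_gap` time units apart.
    """
    def search(start, pattern_idx, last_time):
        if pattern_idx == len(pattern):
            return True
        for k in range(start, len(events)):
            t, label = events[k]
            if last_time is not None and t - last_time > max_gap:
                break  # events are sorted, so later ones are even further away
            if label == pattern[pattern_idx] and search(k + 1, pattern_idx + 1, t):
                return True
        return False

    return search(0, 0, None)


def support(sequences, pattern, max_gap):
    """Fraction of sequences in which the pattern occurs within the gap limit."""
    hits = sum(1 for events in sequences if occurs_with_gaps(events, pattern, max_gap))
    return hits / len(sequences)


# Hypothetical sensor events: (hour, discretized reading).
station_a = [(0, "rain"), (5, "rain"), (7, "flood")]
station_b = [(0, "rain"), (9, "flood")]
pattern = ["rain", "flood"]
```

With `max_gap=3`, the pattern is found at `station_a` (rain at hour 5, flood at hour 7) but not at `station_b`, so its support over the two stations is 0.5.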


Novel Curve Signatures and a Combination Method for Thai On-Line Handwriting Character Recognition

AAAI Conferences

There is no commercial character recognition software that supports Thai handwriting. Thai handwritten character recognition is needed to convert handwritten text written on mobile and tablet devices into computer encoded text. We propose a novel method that combines three curve signatures. The first signature is the normalized tangent angle function (TAF), which provides rough classification. The other two, which are novel, are the relative position matrix (RPM), which is used to compare global curve features, and the straightened tangent angle function (STAF), which is used to compare the tangent angle along the cumulative unsigned curvature domain. In the recognition process, these three signatures are extracted from an input curve, and their similarity to each character in the handwriting templates is measured. The similarity scores are then weighted and summed for ranking. Our experiment uses 48 handwriting sample sets (44 Thai consonants appear in each set, and there are 4 sets per handwriting). Our method yields an accuracy of 94.08% for personal handwriting and 92.23% for general handwriting.
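The final ranking step, a weighted sum of per-signature similarity scores, can be sketched as follows. The weights, the similarity values, and the template names are hypothetical placeholders; the abstract does not give the weights the authors use or how each similarity is computed.

```python
def rank_templates(scores_by_signature, weights):
    """Combine per-signature similarity scores into one ranked list.

    `scores_by_signature` maps a signature name to {template: similarity}.
    `weights` maps the same signature names to their weights.
    """
    totals = {}
    for signature, scores in scores_by_signature.items():
        for template, similarity in scores.items():
            totals[template] = totals.get(template, 0.0) + weights[signature] * similarity
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical similarity scores for one input curve against two templates.
scores_by_signature = {
    "TAF": {"ko_kai": 0.90, "kho_khai": 0.60},
    "RPM": {"ko_kai": 0.50, "kho_khai": 0.80},
    "STAF": {"ko_kai": 0.70, "kho_khai": 0.75},
}
weights = {"TAF": 0.2, "RPM": 0.4, "STAF": 0.4}  # assumed values
ranking = rank_templates(scores_by_signature, weights)
```

Here the TAF alone would favour `ko_kai`, but the combined score ranks `kho_khai` first because the other two signatures carry more weight in this assumed configuration.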


Computing as compression: the SP theory of intelligence

arXiv.org Artificial Intelligence

This paper provides an overview of the SP theory of intelligence and its central idea that artificial intelligence, mainstream computing, and much of human perception and cognition may be understood as information compression. The background and origins of the SP theory are described, along with the main elements of the theory, including the key concept of multiple alignment, which is borrowed from bioinformatics but with important differences. Associated with the SP theory is the idea that redundancy in information may be understood as repetition of patterns, that compression of information may be achieved via the matching and unification (merging) of patterns, and that computing and information compression are both fundamentally probabilistic. It appears that the SP system is Turing-equivalent in the sense that anything that may be computed with a Turing machine may, in principle, also be computed with an SP machine. One of the main strengths of the SP theory and the multiple alignment concept is in modelling concepts and phenomena in artificial intelligence. Within that area, the SP theory provides a simple but versatile means of representing different kinds of knowledge; it can model both the parsing and production of natural language, with potential for the understanding and translation of natural languages; it has strengths in pattern recognition, with potential in computer vision; it can model several kinds of reasoning; and it has capabilities in planning, problem solving, and unsupervised learning. The paper includes two examples showing how alternative parsings of an ambiguous sentence may be modelled as multiple alignments, and another example showing how the concept of multiple alignment may be applied in medical diagnosis.


Pattern Matching for Self-Tuning of MapReduce Jobs

arXiv.org Artificial Intelligence

In this paper, we study CPU utilization time patterns of several MapReduce applications. The running patterns extracted from these applications are saved in a reference database and later used to tune system parameters so that unknown applications can be executed efficiently in the future. To achieve this goal, CPU utilization patterns of new applications are compared with the known ones in the reference database to predict their most probable execution patterns. Because the patterns differ in length, Dynamic Time Warping (DTW) is used for this comparison; a correlation analysis is then applied to the DTW outcomes to produce feasible similarity patterns. Three real applications (WordCount, Exim Mainlog parsing and Terasort) are used to evaluate our hypothesis about tuning system parameters when executing similar applications. The results were very promising and showed the effectiveness of our approach on pseudo-distributed MapReduce platforms.
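The comparison of patterns with different lengths can be sketched with the classic dynamic-programming form of DTW. The CPU utilization values and the reference entries below are invented for illustration, and the sketch omits the correlation analysis that the paper applies to the DTW outcomes.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched to an already used b[j-1]
                cost[i][j - 1],      # b[j-1] is matched to an already used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical CPU utilization profiles (percent), not measured data.
reference = {
    "wordcount": [10, 50, 90, 50, 10],
    "terasort": [80, 80, 20, 20, 80, 80],
}
new_profile = [10, 50, 50, 90, 90, 50, 10]

# Pick the known application whose profile is closest to the new one.
best_match = min(reference, key=lambda name: dtw_distance(new_profile, reference[name]))
```

The new profile is a time-stretched copy of the `wordcount` entry, so DTW aligns the two at zero cost even though their lengths differ, and `best_match` is `"wordcount"`.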


Learning to Predict from Textual Data

Journal of Artificial Intelligence Research

Given a current news event, we tackle the problem of generating plausible predictions of future events it might cause. We present a new methodology for modeling and predicting such future news events using machine learning and data mining techniques. Our Pundit algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain precisely labeled causality examples, we mine 150 years of news articles and apply semantic natural language modeling techniques to headlines containing certain predefined causality patterns. For generalization, the model uses a vast number of world knowledge ontologies. Empirical evaluation on real news articles shows that our Pundit algorithm performs as well as non-expert humans.


Bayesian Group Nonnegative Matrix Factorization for EEG Analysis

arXiv.org Machine Learning

We propose a generative model of a group EEG analysis, based on appropriate kernel assumptions on EEG data. We derive the variational inference update rule using various approximation techniques. The proposed model outperforms the current state-of-the-art algorithms in terms of common pattern extraction. The validity of the proposed model is tested on the BCI competition dataset.


Classification Recouvrante Basée sur les Méthodes à Noyau

arXiv.org Machine Learning

The overlapping clustering problem is an important learning issue in which clusters are not mutually exclusive and each object may belong simultaneously to several clusters. This paper presents a kernel-based method that produces overlapping clusters in a high-dimensional feature space, using Mercer kernel techniques to improve the separability of input patterns. The proposed method, called OKM-K (Overlapping $k$-means based kernel method), extends the OKM (Overlapping $k$-means) method to produce overlapping schemes. Experiments are performed on an overlapping dataset, and the empirical results obtained with OKM-K outperform those obtained with OKM.
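Kernel-based clustering methods of this kind typically rely on the fact that distances in the feature space can be computed from kernel values alone: $\|\phi(x) - \phi(y)\|^2 = k(x,x) - 2k(x,y) + k(y,y)$. The sketch below shows only that building block with an RBF kernel; the choice of kernel, the `gamma` value, and the function names are assumptions, and the OKM-K assignment and update steps are not reproduced.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, a standard Mercer kernel."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_norm)


def feature_space_sq_distance(x, y, kernel=rbf_kernel):
    """Squared distance between phi(x) and phi(y), using kernel values only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


# Hypothetical 2D points.
origin = (0.0, 0.0)
near = (0.5, 0.0)
far = (3.0, 4.0)
d_near = feature_space_sq_distance(origin, near)
d_far = feature_space_sq_distance(origin, far)
```

With the RBF kernel, every point has unit norm in the feature space, so the squared distance grows with separation but stays below 2.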