Goto

Collaborating Authors

 Inductive Learning


Exploring Social Context for Topic Identification in Short and Noisy Texts

AAAI Conferences

With the pervasion of social media, topic identification in short texts attracts increasing attention inย  recent years. However, in nature the texts of social media are short and noisy, and the structures are sparse and dynamic, resulting in difficulty to identify topic categories exactly from online social media. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploring social context from the perspective of social sciences. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and conduct feature selection to tackle short and noisy texts in social media, which result in a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of social context in topic identification.


Marginalized Denoising for Link Prediction and Multi-Label Learning

AAAI Conferences

Link prediction and multi-label learning on graphs are two important but challenging machine learning problems that have broad applications in diverse fields. Not only are the two problems inherently correlated and often appear concurrently, they are also exacerbated by incomplete data. We develop a novel algorithm to solve these two problems jointly under a unified framework, which helps reduce the impact of graph noise and benefits both tasks individually. We reduce multi-label learning problem into an additional link prediction task and solve both problems with marginalized denoising, which we co-regularize with Laplacian smoothing. This approach combines both learning tasks into a single convex objective function, which we optimize efficiently with iterative closed-form updates. The resulting approach performs significantly better than prior work on several important real-world applications with great consistency.


Extending Analogical Generalization with Near-Misses

AAAI Conferences

Concept learning is a central problem for cognitive systems. Generalization techniques can help organize examples by their commonalities, but comparisons with non-examples, near-misses, can provide discrimination. Early work on near-misses required hand-selected examples by a teacher who understood the learnerโ€™s internal representations. This paper introduces Analogical Learning by Integrating Generalization and Near-misses (ALIGN) and describes three key advances. First, domain-general cognitive models of analogical processes are used to handle a wider range of examples. Second, ALIGNโ€™s analogical generalization process constructs multiple probabilistic representations per concept via clustering, and hence can learn disjunctive concepts. Finally, ALIGN uses unsupervised analogical retrieval to find its own near-miss examples. We show that ALIGN out-performs analogical generalization on two perceptual data sets: (1) hand-drawn sketches; and (2) geospatial concepts from strategy-game maps.


On the Bayes-optimality of F-measure maximizers

arXiv.org Machine Learning

The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.


Active Learning of Hierarchical Policies from State-Action Trajectories

AAAI Conferences

While most work on trajectory mining is applied to pre- dict movements of mobile users, in this paper we consider a more general problem of building behavior models of users from their state-action trajectories. We assume that the user behavior can be compactly modeled as a Probabilistic State-Dependent Grammar (PSDG) which represents a hierarchical policy. The key problem is that while the states and actions of the user are directly observed, his intentional structure is not. We propose to learn the userโ€™s policy from a set of selected trajectories and intention queries at selected states in the trajectory. Our main contributions are an algorithm for learning hierarchical policies from state-action trajectories, and principled heuristics for selecting suitable trajectories and intention queries. Experiments in multiple domains show that our approach is effective and more sample-efficient than learning non-hierarchical policies.


Supervised learning sets benchmark for robust spike detection from calcium imaging signals

arXiv.org Machine Learning

A fundamental challenge in calcium imaging has been to infer the timing of action potentials from the measured noisy calcium fluorescence traces. We systematically evaluate a range of spike inference algorithms on a large benchmark dataset recorded from varying neural tissue (V1 and retina) using different calcium indicators (OGB-1 and GCamp6). We show that a new algorithm based on supervised learning in flexible probabilistic models outperforms all previously published techniques, setting a new standard for spike inference from calcium signals. Importantly, it performs better than other algorithms even on datasets not seen during training. Future data acquired in new experimental conditions can easily be used to further improve its spike prediction accuracy and generalization performance. Finally, we show that comparing algorithms on artificial data is not informative about performance on real population imaging data, suggesting that a benchmark dataset may greatly facilitate future algorithmic developments.


Deep Distributed Random Samplings for Supervised Learning: An Alternative to Random Forests?

arXiv.org Machine Learning

In (\cite{zhang2014nonlinear,zhang2014nonlinear2}), we have viewed machine learning as a coding and dimensionality reduction problem, and further proposed a simple unsupervised dimensionality reduction method, entitled deep distributed random samplings (DDRS). In this paper, we further extend it to supervised learning incrementally. The key idea here is to incorporate label information into the coding process by reformulating that each center in DDRS has multiple output units indicating which class the center belongs to. The supervised learning method seems somewhat similar with random forests (\cite{breiman2001random}), here we emphasize their differences as follows. (i) Each layer of our method considers the relationship between part of the data points in training data with all training data points, while random forests focus on building each decision tree on only part of training data points independently. (ii) Our method builds gradually-narrowed network by sampling less and less data points, while random forests builds gradually-narrowed network by merging subclasses. (iii) Our method is trained more straightforward from bottom layer to top layer, while random forests build each tree from top layer to bottom layer by splitting. (iv) Our method encodes output targets implicitly in sparse codes, while random forests encode output targets by remembering the class attributes of the activated nodes. Therefore, our method is a simpler, more straightforward, and maybe a better alternative choice, though both methods use two very basic elements---randomization and nearest neighbor optimization---as the core. This preprint is used to protect the incremental idea from (\cite{zhang2014nonlinear,zhang2014nonlinear2}). Full empirical evaluation will be announced carefully later.


GENERALIZATION As SEARCH / 517 Generalization as Search

AI Classics

We learn (memorize) multiplication tables, learn (discover how) to walk, learn (build UP an understanding of, then an ability to synthesize) languages. Many subtasks and capabilities are involved in these various kinds of learning. One capability central to many kinds of learning is the ability to generalize: to take into account a large number of specific observations, then to extract and retain the important common features that characterize classes of these observations. This generalization problem has received considerable attention for two decades in the fields of Artificial Intelligence, Psychology, and Pattern Recognition (e.g., [Bruner, 1956],


10 An Experiment on Inductive Learning in Chess End Games

AI Classics

INTRODUCTION Further progress in the application of computers to many practical fields seems to depend heavily on the success in implementing learning and inductive processes within machines. For example, to develop a consultation system for medical or plant disease diagnosis, prognosis and decision making in general, it is very desirable, perhaps even necessary, to be able to'teach' the system through examples of correct and/or incorrect decisions, rather than by precisely describing the decision process in its full generality and then transforming this description into a computer program. A similar situation exists in computer chess. The development of computer programs playing at the master level (especially the end games) seems to be a formidable task if the programs are not eventually able to learn and improve on their decision making rules through the specific examples of games, rather than by being explicitly told all the rules. Due to easy access to human knowledge about chess and the relative simplicity of testing the results, chess is one of the most attractive testing domains for inductive inference programs. This report presents first results from an experiment on the application of an inductive learning program called AQVAL/1 developed at the University of Illinois, to chess end games.


MACHINE INTELLIGENCE 13

AI Classics

The two outstanding figures in the history of computer science are Alan Turing and John von Neumann, and they shared the view that logic was the key to understanding and automating computation. In particular, it was Turing who gave us in the mid-1930s the fundamental analysis, and the logical definition, of the concept of'computability by machine' and who discovered the surprising and beautiful basic fact that there exist universal machines which by suitable programming can be made to t This essay is an expanded and revised version of one entitled The Role of Logic in Computer Science and Artificial Intelligence, which was completed in January 1992 (and was later published in the Proceedings of the Fifth Generation computer Systems 1992 Conference). Since completing that essay I have had the benefit of extremely helpful discussions on many of the details with Professor Donald Michie and Professor I. J. Good, both of whom knew Turing well during the war years at Bletchley Park. Professor J. A. N. Lee, whose knowledge of the literature and archives of the history of computing is encyclopedic, also provided additional information, some of which is still unpublished. Further light has very recently been shed on the von Neumann side of the story by Norman Macrae's excellent biography John von Neumann (Macrae 1992). Accordingly, it seemed appropriate to undertake a more complete and thorough version of the FGCS'92 essay, focussing somewhat more on the interesting historical and biographical issues. I am grateful to Donald Michie and Stephen Muggleton for inviting me to contribute such a'second edition' to the present volume, and I would also like to thank the Institute for New Computer Technology (ICOT) for kind permission to make use of the FGCS'92 essay in this way. 1 LOGIC, COMPUTERS, TURING, AND VON NEUMANN