AITopics

Data streams containing objects that are (or can be) associated with more than one label at the same time are ubiquitous. In spite of its important applications, classification of streaming multi-label data is largely unexplored. Existing approaches try to tackle the problem by transferring traditional single-label stream classification practices to the multi-label domain. Nevertheless, they fail to consider some of the unique properties of the problem, such as within- and between-class imbalance and multiple concept drift. To deal with these challenges, this paper proposes a novel multi-label stream classification approach that employs two windows for each label, one for positive and one for negative examples. Instance-sharing is exploited for space efficiency, while a time-efficient instantiation based on the k-Nearest Neighbor algorithm is also proposed. Finally, a batch-incremental thresholding technique is proposed to further deal with the class imbalance problem. Results of an empirical comparison against two other methods on three real-world datasets are in favor of the proposed approach.
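The two-window idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TwoWindowKNN`, the fixed window size, the majority vote, and the Euclidean distance are all assumptions made for illustration, and the batch-incremental thresholding step is omitted.

```python
from collections import deque
import math


class TwoWindowKNN:
    """Hypothetical sketch: one positive and one negative window per label."""

    def __init__(self, labels, window_size=100, k=3):
        self.k = k
        # Bounded FIFO windows: old examples fall out as new ones arrive.
        self.pos = {label: deque(maxlen=window_size) for label in labels}
        self.neg = {label: deque(maxlen=window_size) for label in labels}

    def update(self, x, true_labels):
        # The same tuple object x is referenced from every window it enters,
        # so an instance is stored once and shared across labels.
        for label in self.pos:
            window = self.pos[label] if label in true_labels else self.neg[label]
            window.append(x)

    def predict(self, x):
        predicted = set()
        for label in self.pos:
            # Pool both windows, tagging positives with 1 and negatives with 0.
            pool = [(math.dist(x, p), 1) for p in self.pos[label]]
            pool += [(math.dist(x, n), 0) for n in self.neg[label]]
            nearest = sorted(pool)[: self.k]
            # Assumed decision rule: strict majority of the k nearest neighbors.
            if nearest and 2 * sum(tag for _, tag in nearest) > len(nearest):
                predicted.add(label)
        return predicted
```

Keeping separate windows means a rare label retains its most recent positive examples even when negatives dominate the stream, which is the imbalance issue the abstract points to.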

classification, classifier, negative example, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Europe > Germany > Saxony-Anhalt > Magdeburg (0.05)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.88)

Bayesian Policy Search with Policy Priors

Wingate, David (Massachusetts Institute of Technology) | Goodman, Noah D. (Stanford University) | Roy, Daniel M. (Massachusetts Institute of Technology) | Kaelbling, Leslie P. (Massachusetts Institute of Technology) | Tenenbaum, Joshua B. (Massachusetts Institute of Technology)

We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm: our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and inter-task transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.
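A rough sketch of how MCMC can act as a stochastic hill climber over policies follows. It assumes a one-dimensional policy parameter, a target density proportional to the prior times the exponentiated reward, and a Gaussian random-walk proposal; the function name `mh_policy_search` and all parameters are hypothetical, and the paper's structured and hierarchical priors are not modeled here.

```python
import math
import random


def mh_policy_search(log_prior, reward, theta0, steps=2000, step_size=0.5, seed=0):
    """Metropolis-Hastings over a scalar policy parameter (illustrative only)."""
    rng = random.Random(seed)
    theta = theta0
    # Assumed target: log p(theta) + reward(theta), up to a constant.
    score = log_prior(theta) + reward(theta)
    best_theta, best_score = theta, score
    for _ in range(steps):
        # Symmetric random-walk proposal, so the Hastings correction cancels.
        proposal = theta + rng.gauss(0.0, step_size)
        new_score = log_prior(proposal) + reward(proposal)
        # Uphill moves are always accepted; downhill moves only sometimes.
        if rng.random() < math.exp(min(0.0, new_score - score)):
            theta, score = proposal, new_score
            if score > best_score:
                best_theta, best_score = theta, score
    return best_theta, best_score
```

In this sketch the prior and the proposal together determine where the search spends its time, which mirrors the abstract's point that the search bias is tied to the prior and the proposal kernels.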

algorithm, maze, motor primitive, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Local and Structural Consistency for Multi-Manifold Clustering

Wang, Yong (National University of Defense Technology) | Jiang, Yuan (Nanjing University) | Wu, Yi (National University of Defense Technology) | Zhou, Zhi-Hua (Nanjing University)

Data sets containing multi-manifold structures are ubiquitous in real-world tasks, and effective grouping of such data is an important yet challenging problem. Although there have been many studies on this problem, it is not clear how to design principled methods for the grouping of multiple hybrid manifolds. In this paper, we show that spectral methods are potentially helpful for hybrid-manifold clustering when the neighborhood graph is constructed to connect the neighboring samples from the same manifold. However, traditional algorithms, which identify neighbors according to Euclidean distance, will easily connect samples belonging to different manifolds. To handle this drawback, we propose a new criterion, the local and structural consistency criterion, which considers the neighboring information as well as the structural information implied by the samples. Based on this criterion, we develop a simple yet effective algorithm, named Local and Structural Consistency (LSC), for clustering with multiple hybrid manifolds. Experiments show that LSC achieves promising performance.

algorithm, manifold, neighbor, (13 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

Fast Nonnegative Matrix Tri-Factorization for Large-Scale Data Co-Clustering

Wang, Hua (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Makedon, Fillia (University of Texas at Arlington)

Nonnegative Matrix Factorization (NMF) based co-clustering methods have attracted increasing attention in recent years because of their mathematical elegance and encouraging empirical results. However, the algorithms to solve NMF problems usually involve intensive matrix multiplications, which make them computationally inefficient. In this paper, instead of constraining the factor matrices of NMF to be nonnegative as existing methods do, we propose a novel Fast Nonnegative Matrix Tri-Factorization (FNMTF) approach that constrains them to be cluster indicator matrices, a special type of nonnegative matrix. As a result, the optimization problem of our approach can be decoupled into much smaller subproblems requiring far fewer matrix multiplications, so that our approach works well for large-scale input data. Moreover, the resulting factor matrices can directly assign cluster labels to data points and features due to the nature of indicator matrices. In addition, by exploiting the manifold structures in both data and feature spaces, we further introduce the Locality Preserved FNMTF (LP-FNMTF) approach, by which the clustering performance is improved. The promising results in extensive experimental evaluations validate the effectiveness of the proposed methods.

algorithm, indicator matrix, matrix, (12 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country: North America > United States > Texas > Tarrant County > Arlington (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Jointly Learning Data-Dependent Label and Locality-Preserving Projections

Wang, Chang (IBM T. J. Watson Research) | Mahadevan, Sridhar (University of Massachusetts)

This paper describes a novel framework to jointly learn data-dependent label and locality-preserving projections. Given a set of data instances from multiple classes, the proposed approach can automatically learn which classes are more similar to each other, and construct discriminative features using both labeled and unlabeled data to map similar classes to similar locations in a lower dimensional space. In contrast to linear discriminant analysis (LDA) and its variants, which can only return c-1 features for a problem with c classes, the proposed approach can generate d features, where d is bounded only by the number of input features. We describe and evaluate the new approach both theoretically and experimentally, and compare its performance with other state-of-the-art methods.

discriminative projection, projection, topology, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Heterogeneous Domain Adaptation using Manifold Alignment

Wang, Chang (IBM Research) | Mahadevan, Sridhar (University of Massachusetts)

We propose a manifold alignment based approach for heterogeneous domain adaptation. A key aspect of this approach is to construct mappings to link different feature spaces in order to transfer knowledge across domains. The new approach can reuse labeled data from multiple source domains in a target domain even in the case when the input domains do not share any common features or instances. As a pre-processing step, our approach can also be combined with existing domain adaptation approaches to learn a common feature space for all input domains. This paper extends existing manifold alignment approaches by making use of labels rather than correspondences to align the manifolds. This extension significantly broadens the application scope of manifold alignment, since the correspondence relationship required by existing alignment approaches is hard to obtain in many applications.

alignment, input domain, target domain, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Utility-Based Fraud Detection

Torgo, Luis (LIAAD - Inesc Porto LA) | Lopes, Elsa (LIAAD - Inesc Porto LA)

Fraud detection is a key activity with serious socio-economic impact. Inspection activities associated with this task are usually constrained by limited available resources. Data analysis methods can help in deciding where to allocate these limited resources in order to optimise the outcome of the inspection activities. This paper presents a multi-strategy learning method to address the question of which cases to inspect first. The proposed methodology is based on utility theory and provides a ranking ordered by decreasing expected outcome of inspecting the candidate cases. This outcome is a function not only of the probability of the case being fraudulent but also of the inspection costs and expected payoff if the case is confirmed as a fraud. The proposed methodology is general and can be useful in fraud detection activities with limited inspection resources. We experimentally evaluate our proposal on both an artificial domain and a real-world task.
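The ranking described here can be sketched as follows. The abstract does not give the exact utility function, so this example assumes the simplest form, fraud probability times payoff minus inspection cost; the function name and the dictionary keys `p_fraud`, `payoff`, and `cost` are hypothetical.

```python
def rank_by_expected_utility(cases):
    """Order cases by decreasing expected outcome of inspecting them."""

    def expected_utility(case):
        # Assumed form: the payoff is realised only if fraud is confirmed,
        # while the inspection cost is paid in every case.
        return case["p_fraud"] * case["payoff"] - case["cost"]

    return sorted(cases, key=expected_utility, reverse=True)


cases = [
    {"id": "A", "p_fraud": 0.90, "payoff": 100.0, "cost": 10.0},
    {"id": "B", "p_fraud": 0.20, "payoff": 1000.0, "cost": 50.0},
    {"id": "C", "p_fraud": 0.95, "payoff": 20.0, "cost": 5.0},
]
ranking = [case["id"] for case in rank_by_expected_utility(cases)]
```

Under these made-up numbers, case B ranks first despite its low fraud probability, because its payoff outweighs its cost; ranking by probability alone would instead put C first.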

inspection cost, probability, transaction, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Hawaii (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre: Research Report (0.69)

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Angular Decomposition

Sun, Dengdi (Anhui University) | Ding, Chris H.Q. (University of Texas at Arlington) | Luo, Bin (Anhui University) | Tang, Jin (Anhui University)

Dimensionality reduction plays a vital role in pattern recognition. However, for normalized vector data, existing methods do not utilize the fact that the data is normalized. In this paper, we propose to employ an Angular Decomposition of the normalized vector data which corresponds to embedding them on a unit surface. On graph data for similarity/kernel matrices with constant diagonal elements, we propose the Angular Decomposition of the similarity matrices which corresponds to embedding objects on a unit sphere. In these angular embeddings, the Euclidean distance is equivalent to the cosine similarity. Thus data structures best described in the cosine similarity and data structures best captured by the Euclidean distance can both be effectively detected in our angular embedding. We provide the theoretical analysis, derive the computational algorithm, and evaluate the angular embedding on several datasets. Experiments on data clustering demonstrate that our method can provide a more discriminative subspace.
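The stated equivalence between Euclidean distance and cosine similarity on the unit sphere follows from the identity ||x - y||^2 = 2 - 2 cos(x, y) for unit-length vectors. The short check below verifies that identity numerically; the helper names are illustrative and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length, placing it on the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


x = normalize([3.0, 4.0, 0.0])
y = normalize([1.0, 2.0, 2.0])

# For unit vectors, squared distance is a decreasing function of cosine.
lhs = squared_euclidean(x, y)
rhs = 2.0 - 2.0 * cosine_similarity(x, y)
```

Because the squared distance is a monotone function of the cosine, nearest neighbors under one measure are also nearest neighbors under the other once the data lie on the unit sphere.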

angular decomposition, graph data, vector data, (12 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > United States > New York (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Active Surveying: A Probabilistic Approach for Identifying Key Opinion Leaders

Sharara, Hossam (University of Maryland, College Park) | Getoor, Lise (University of Maryland, College Park) | Norton, Myra (Community Analytics, Baltimore)

Opinion leaders play an important role in influencing people’s beliefs, actions and behaviors. Although a number of methods have been proposed for identifying influentials using secondary sources of information, the use of primary sources, such as surveys, is still favored in many domains. In this work we present a new surveying method which combines secondary data with partial knowledge from primary sources to guide the information gathering process. We apply our proposed active surveying method to the problem of identifying key opinion leaders in the medical field, and show how we are able to accurately identify the opinion leaders while minimizing the amount of primary data required, which results in significant cost reduction in data acquisition without sacrificing its integrity.

nomination, opinion leader, respondent, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Maryland > Baltimore (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre:

Questionnaire & Opinion Survey (0.47)
Research Report (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Classification of Emerging Extreme Event Tracks in Multivariate Spatio-Temporal Physical Systems Using Dynamic Network Structures: Application to Hurricane Track Prediction

Sencan, Huseyin (North Carolina State University) | Chen, Zhengzhang (North Carolina State University) | Hendrix, William (Northwestern University) | Pansombut, Tatdow (North Carolina State University) | Semazzi, Frederick (North Carolina State University) | Choudhary, Alok (North Carolina State University) | Kumar, Vipin (University of Minnesota) | Melechko, Anatoli V. (North Carolina State University) | Samatova, Nagiza F. (Oak Ridge National Laboratory)

Understanding extreme events, such as hurricanes or forest fires, is of paramount importance because of their adverse impacts on human beings. Such events often propagate in space and time. Predicting, even a few days in advance, which locations will be affected by the event tracks could benefit society in many ways. Arguably, simulations from “first principles,” where underlying physics-based models are described by a system of equations, provide the least reliable predictions for variables characterizing the dynamics of these extreme events. Data-driven model building has recently been emerging as a complementary approach that could learn the relationships between multiple historically observed or simulated spatio-temporal ancillary variables and the dynamic behavior of the extreme events of interest. While promising, the methodology for predictive learning from such complex data is still in its infancy. In this paper, we propose a dynamic networks-based methodology for in-advance prediction of the dynamic tracks of emerging extreme events. By associating a network model of the system with the known tracks, our method is capable of learning the recurrent network motifs that could be used as discriminatory signatures for the event's behavioral class. When applied to classifying the behavior of hurricane tracks at their early formation stages in the Western Africa region, our method is able to predict whether hurricane tracks will hit land in the North Atlantic region with a lead time of at least 10-15 days and with more than 90% accuracy using 10-fold cross-validation. To the best of our knowledge, no comparable methodology exists for solving this problem using data-driven models.

classifier, hurricane, network motif, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Africa > West Africa (0.24)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Braga > Braga (0.05)
(5 more...)

Genre:

Research Report > Experimental Study (0.70)
Research Report > New Finding (0.47)

Industry:

Energy (0.94)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)