AITopics

In knowledge bases or information extraction results, differently expressed relations can be semantically similar (e.g., (X, wrote, Y) and (X,’s written work, Y)). Therefore, grouping semantically similar relations into clusters would facilitate and improve many applications, including knowledge base completion, information extraction, information retrieval, and more. This paper formulates relation clustering as a constrained tripartite graph clustering problem, presents an efficient clustering algorithm and exhibits the advantage of the constrained framework. We introduce several ways that provide side information via must-link and cannot link constraints to improve the clustering results. Different from traditional semi-supervised learning approaches, we propose to use the similarity of relation expressions and the knowledge of entity types to automatically construct the constraints for the algorithm. We show improved relation clustering results on two datasets extracted from human annotated knowledge base (i.e., Freebase) and open information extraction results (i.e., ReVerb data).

constraint, extraction, relation, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Illinois (0.04)
North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)

Industry: Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Tao, Shaozhe (University of Minnesota, Twin Cities) | Boley, Daniel (University of Minnesota, Twin Cities) | Zhang, Shuzhong (University of Minnesota, Twin Cities)

Convergence of Common Proximal Methods for L1-Regularized Least Squares

We compare the convergence behavior of ADMM (alternating direction method of multipliers), [F]ISTA ([fast] iterative shrinkage and thresholding algorithm) and CD (coordinate descent) methods on the model L1-regularized least squares problem (aka LASSO). We use an eigen analysis of the operators to compare their local convergence rates when close to the solution. We find that, when applicable, CD is often much faster than the other iterations, when close enough to the solution. When far from the solution, the spectral analysis implies that one can often get a sequence of iterates that appears to stagnate, but is actually taking small constant steps toward the solution. We also illustrate how the unaccelerated ISTA algorithm can sometimes be faster compared to FISTA when close enough to the solution.

ista, iteration, regime, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Sketch the Storyline with CHARCOAL: A Non-Parametric Approach

Generating a coherent synopsis and revealing the development threads for news stories from the increasing amounts of news content remains aformidable challenge. In this paper, we proposed a hddCRP (hybird distant-dependent ChineseRestaurant Process) based HierARChical tOpic model for news Article cLustering, abbreviated as CHARCOAL. Given a bunch of news articles, the outcome of CHARCOAL is threefold: 1) it aggregates relevant new articles into clusters (i.e., stories); 2) it disentangles the chain links (i.e., storyline) between articles in their describing story; 3) it discerns the topics that each story is assigned (e.g., Malaysia Airlines Flight 370 story belongs to the aircraft accident topic and U.S presidential election stories belong to the politics topic). CHARCOAL completes this task by utilizing a hddCRP as prior, and the entities (e.g., names of persons, organizations, or locations) that appear in news articles as clues. Moveover, the adaptation of nonparametric nature in CHARCOAL makes our model can adaptively learn the appropriate number of stories and topics from news corpus. The experimental analysis and results demonstrate both interpretability and superiority of the proposed approach.

assignment, charcoal, storyline, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Malaysia (0.25)
South America > Brazil (0.15)
Oceania > Australia (0.05)
(4 more...)

Industry:

Transportation > Air (1.00)
Government (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Polytree-Augmented Classifier Chains for Multi-Label Classification

Sun, Lu (Hokkaido University) | Kudo, Mineichi (Hokkaido University)

Multi-label classification is a challenging and appealing supervised learning problem where a subset of labels, rather than a single label seen in traditional classification problems, is assigned to a single test instance. Classifier chains based methods are a promising strategy to tackle multi-label classification problems as they model label correlations at acceptable complexity. However, these methods are difficult to approximate the underlying dependency in the label space, and suffer from the problems of poorly ordered chain and error propagation. In this paper, we propose a novel polytree-augmented classifier chains method to remedy these problems. A polytree is used to model reasonable conditional dependence between labels over attributes, under which the directional relationship between labels within causal basins could be appropriately determined. In addition, based on the max-sum algorithm, exact inference would be performed on polytrees at reasonable cost, preventing from error propagation. The experiments performed on both artificial and benchmark multi-label data sets demonstrated that the proposed method is competitive with the state-of-the-art multi-label classification methods.

classification, classifier, correlation, (17 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Oceania > New Zealand > North Island > Waikato (0.04)
(2 more...)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Open Domain Short Text Conceptualization: A Generative + Descriptive Modeling Approach

Song, Yangqiu (University of Illinois at Urbana-Champaign) | Wang, Shusen (Zhejiang University) | Wang, Haixun (Google)

Concepts embody the knowledge to facilitate our cognitive processes of learning. Mapping short texts to a large set of open domain concepts has gained many successful applications. In this paper, we unify the existing conceptualization methods from a Bayesian perspective, and discuss the three modeling approaches: descriptive, generative, and discriminative models. Motivated by the discussion of their advantages and shortcomings, we develop a generative + descriptive modeling approach. Our model considers term relatedness in the context, and will result in disambiguated conceptualization. We show the results of short text clustering using a news title data set and a Twitter message data set, and demonstrate the effectiveness of the developed approach compared with the state-of-the-art conceptualization and topic modeling approaches.

concept vector, conceptualization, short text, (12 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > Illinois (0.04)
North America > Canada (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Information Management (0.93)
(2 more...)

A Geometric Theory of Feature Selection and Distance-Based Measures

Shin, Kilho (University of Hyogo) | Angulo, Adrian Pino (University of Hyogo)

Feature selection measures are often explained by the analogy to a rule to measure the “distance” of sets of features to the “closest” ideal sets of features. An ideal feature set is such that it can determine classes uniquely and correctly. This way of explanation was just an analogy before this paper. In this paper, we show a way to map arbitrary feature sets of datasets into a common metric space, which is indexed by a real number p with 1 ≤ p ≤ ∞. Since this determines the distance between an arbitrary pair of feature sets, even if they belong to different datasets, the distance of a feature set to the closest ideal feature set can be used as a feature selection measure. Surprisingly, when p = 1, the measure is identical to the Bayesian risk, which is probably the feature selection measure that is used the most widely in the literature. For 1 < p ≤ ∞, the measure is novel and has significantly different properties from the Bayesian risk. We also investigate the correlation between measurements by these measures and classification accuracy through experiments. As a result, we show that our novel measures with p > 1 exhibit stronger correlation than the Bayesian risk.

classification accuracy, dataset, distance 0, (11 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > Orange County > Irvine (0.04)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Semi-Orthogonal Multilinear PCA with Relaxed Start

Shi, Qiquan (Hong Kong Baptist University) | Lu, Haiping (Hong Kong Baptist University)

Principal component analysis (PCA) is an unsupervised method for learning low-dimensional features with orthogonal projections. Multilinear PCA methods extend PCA to deal with multidimensional data (tensors) directly via tensor-to-tensor projection or tensor-to-vector projection (TVP). However, under the TVP setting, it is difficult to develop an effective multilinear PCA method with the orthogonality constraint. This paper tackles this problem by proposing a novel Semi-Orthogonal Multilinear PCA (SO-MPCA) approach. SO-MPCA learns low-dimensional features directly from tensors via TVP by imposing the orthogonality constraint in only one mode. This formulation results in more captured variance and more learned features than full orthogonality. For better generalization, we further introduce a relaxed start (RS) strategy to get SO-MPCA-RS by fixing the starting projection vectors, which increases the bias and reduces the variance of the learning model. Experiments on both face (2D) and gait (3D) data demonstrate that SO-MPCA-RS outperforms other competing algorithms on the whole, and the relaxed start strategy is also effective for other TVP-based PCA methods.

constraint, tensor, variance, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Africa > Senegal > Kolda Region > Kolda (0.05)
Asia > China > Hong Kong (0.05)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Deep Linear Coding for Fast Graph Clustering

Shao, Ming (Northeastern University) | Li, Sheng (Northeastern University) | Ding, Zhengming (Northeastern University) | Fu, Yun (Northeastern University)

Clustering has been one of the most critical unsupervised learning techniques that has been widely applied in data mining problems. As one of its branches, graph clustering enjoys its popularity due to its appealing performance and strong theoretical supports. However, the eigen-decomposition problems involved are computationally expensive. In this paper, we propose a deep structure with a linear coder as the building block for fast graph clustering, called Deep Linear Coding (DLC). Different from conventional coding schemes, we jointly learn the feature transform function and discriminative codings, and guarantee that the learned codes are robust in spite of local distortions. In addition, we use the proposed linear coders as the building blocks to formulate a deep structure to further refine features in a layerwise fashion. Extensive experiments on clustering tasks demonstrate that our method performs well in terms of both time complexity and clustering accuracy. On a large-scale benchmark dataset (580K), our method runs 1500 times faster than the original spectral clustering.

graph, representation, spectral, (16 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Satchidanand, Sai Nageswar (Indian Institute of Technology Madras) | Ananthapadmanaban, Harini (Indian Institute of Technology Madras) | Ravindran, Balaraman (Indian Institute of Technology Madras)

Extended Discriminative Random Walk: A Hypergraph Approach to Multi-View Multi-Relational Transductive Learning

Transductive inference on graphs has been garnering increasing attention due to the connected nature of many real-life data sources, such as online social media and biological data (protein-protein interaction network, gene networks, etc.). Typically relational information in the data is encoded as edges in a graph but often it is important to model multi-way interactions, such as in collaboration networks and reaction networks. In this work we model multi-way relations as hypergraphs and extend the discriminative random walk (DRW) framework, originally proposed for transductive inference on single graphs, to the case of multiple hypergraphs. We use the extended DRW framework for inference on multi-view, multi-relational data in a natural way, by representing attribute descriptions of the data also as hypergraphs. We further exploit the structure of hypergraphs to modify the random walk operator to take into account class imbalance in the data. This work is among very few approaches to explicitly address class imbalance in the in-network classification setting, using random walks. We compare our approach to methods proposed for inference on hypergraphs, and to methods proposed for multi-view data and show that empirically we achieve better performance. We also compare to methods specifically tailored for class-imbalanced data and show that our approach achieves comparable performance even on non-network data.

hyperedge, hypergraph, relation, (13 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.54)
Information Technology > Services (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)

Refaat, Khaled S. (University of California, Los Angeles) | Darwiche, Adnan (University of California, Los Angeles)

Data Compression for Learning MRF Parameters

We propose a technique for decomposing and compressing the dataset in the parameter learning problem in Markov random fields. Our technique applies to incomplete datasets and exploits variables that are always observed in the given dataset. We show that our technique allows exact computation of the gradient and the likelihood, and can lead to orders-of-magnitude savings in learning time.

data decomposition, data example, dataset, (15 more...)

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)