AITopics

The Nystrom method provides an efficient sampling approach for large-scale clustering problems by generating a low-rank matrix approximation. However, existing sampling methods are limited in accuracy and computing time. This paper proposes an improved Nystrom-based clustering algorithm with a new sampling procedure, Minimum Sum of Squared Similarities (MSSS). Experiments on synthetic and real data sets show that the proposed sampling achieves higher accuracy than existing algorithms when applied to Nystrom-based spectral clustering problems. Furthermore, we provide a theoretical analysis that yields an upper bound on the Frobenius norm error of MSSS.
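For readers unfamiliar with the low-rank approximation mentioned in this abstract, here is a minimal sketch of the standard Nystrom construction, assuming NumPy is available. It uses uniform random sampling of landmark points, not the MSSS procedure the paper proposes (which is not described here), and the RBF kernel, `gamma`, and the function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) similarity between every row of X and every row of Y."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def nystrom(X, landmark_idx, gamma=0.5):
    """Approximate the full kernel matrix K as C @ pinv(W) @ C.T.

    C holds similarities between all points and the sampled landmarks,
    W holds similarities among the landmarks themselves.
    """
    C = rbf_kernel(X, X[landmark_idx], gamma)   # shape (n, m)
    W = C[landmark_idx]                         # shape (m, m)
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Assumption: uniform sampling without replacement stands in for MSSS.
idx = rng.choice(len(X), size=40, replace=False)

K = rbf_kernel(X, X)
K_hat = nystrom(X, idx)
# Relative Frobenius norm error of the approximation.
rel_err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
```

Only the n x m block `C` is ever computed from the data, which is where the savings over forming the full n x n matrix come from. The choice of `landmark_idx` is exactly the step a sampling procedure such as MSSS would replace.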

algorithm, dataset, matrix

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Orange County > Irvine (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Determining Expert Research Areas with Multi-Instance Learning of Hierarchical Multi-Label Classification Model

Wu, Tao (Purdue University) | Wang, Qifan (Purdue University) | Zhang, Zhiwei (Purdue University) | Si, Luo (Purdue University)

Automatically identifying the research areas of academic/industry researchers is an important task for building expertise organizations or search systems. In general, this task can be viewed as text classification that generates a set of research areas given evidence of a researcher's expertise, such as published documents. However, this task is challenging because the evidence of a research area may only exist in a few documents instead of all documents. Moreover, the research areas are often organized in a hierarchy, which limits the effectiveness of existing text categorization methods. This paper proposes a novel approach, Multi-instance Learning of Hierarchical Multi-label Classification Model (MIHML), for the task, which effectively identifies multiple research areas in a hierarchy from individual documents within the profile of a researcher. An Expectation-Maximization (EM) optimization algorithm is designed to learn the model parameters. Extensive experiments have been conducted to demonstrate the superior performance of the proposed approach in a real-world application.

em-hm 3, research area, training size

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Online Learning to Rank for Content-Based Image Retrieval

Wan, Ji (Institute of Computing Technology of the Chinese Academy of Sciences) | Wu, Pengcheng (Singapore Management University) | Hoi, Steven C. H. (Singapore Management University) | Zhao, Peilin (Institute for Infocomm Research) | Gao, Xingyu (Institute of Computing Technology of the Chinese Academy of Sciences) | Wang, Dayong (Michigan State University) | Zhang, Yongdong (Institute of Computing Technology of the Chinese Academy of Sciences) | Li, Jintao (Institute of Computing Technology of the Chinese Academy of Sciences)

A major challenge in Content-Based Image Retrieval (CBIR) is to bridge the semantic gap between low-level image contents and high-level semantic concepts. Although researchers have investigated a variety of retrieval techniques using different types of features and distance functions, no single best retrieval solution can fully tackle this challenge. In a real-world CBIR task, it is often highly desired to combine multiple types of different feature representations and diverse distance measures in order to close the semantic gap. In this paper, we investigate a new framework of learning to rank for CBIR, which aims to seek the optimal combination of different retrieval schemes by learning from large-scale training data in CBIR. We first formulate the problem formally as a learning to rank task, which can be solved in general by applying the existing batch learning to rank algorithms from text information retrieval (IR). To further address the scalability towards large-scale online CBIR applications, we present a family of online learning to rank algorithms, which are significantly more efficient and scalable than classical batch algorithms for large-scale online CBIR. Finally, we conduct an extensive set of experiments, in which encouraging results show that our technique is effective, scalable and promising for large-scale CBIR.
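The idea of learning a combination of retrieval schemes online can be illustrated with a minimal sketch. It assumes each image is described by a vector of similarity scores, one per retrieval scheme, and uses a generic perceptron-style pairwise hinge update. This is not necessarily one of the algorithms in the paper, and the function names, learning rate, margin, and data are all hypothetical.

```python
def rank_score(weights, scores):
    """Combined relevance: weighted sum of per-scheme similarity scores."""
    return sum(w * s for w, s in zip(weights, scores))

def online_pairwise_update(weights, relevant, irrelevant, lr=0.1, margin=1.0):
    """One online step on a (relevant, irrelevant) pair for the same query.

    If the relevant image does not outscore the irrelevant one by at least
    `margin`, move the weights in the direction that widens the gap.
    """
    gap = rank_score(weights, relevant) - rank_score(weights, irrelevant)
    if gap >= margin:
        return weights  # pair already ranked correctly with enough margin
    return [w + lr * (r - i) for w, r, i in zip(weights, relevant, irrelevant)]

# Hypothetical stream of training pairs; each vector holds the scores that
# three different retrieval schemes assign to one image for one query.
stream = [
    ([0.9, 0.2, 0.5], [0.1, 0.8, 0.5]),
    ([0.8, 0.3, 0.4], [0.2, 0.7, 0.6]),
]

weights = [0.0, 0.0, 0.0]
for _ in range(50):                      # pairs arrive one at a time
    for relevant, irrelevant in stream:
        weights = online_pairwise_update(weights, relevant, irrelevant)
```

Each update touches a single pair and costs time linear in the number of schemes, which is the property that makes online methods attractive compared with batch learning to rank over the whole training set.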

algorithm, online, rank algorithm

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Asia > China (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (0.34)

Industry: Education > Educational Setting > Online (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.65)

Short and Sparse Text Topic Modeling via Self-Aggregation

Quan, Xiaojun (Institute for Infocomm Research) | Kit, Chunyu (City University of Hong Kong) | Ge, Yong (University of North Carolina at Charlotte) | Pan, Sinno Jialin (Nanyang Technological University)

The overwhelming amount of short text data on social media and elsewhere has posed great challenges to topic modeling due to the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies to aggregate short texts into pseudo-documents before the application of standard topic modeling. Although such strategies do not generalize well to broader genres of short texts, their success has shed light on how to develop a generalized solution. In this paper, we present a novel model towards this goal by integrating topic modeling with short text aggregation during topic inference. The aggregation is founded on general topical affinity of texts rather than particular heuristics, making the model readily applicable to various short texts. Experimental results on real-world datasets validate the effectiveness of this new model, suggesting that it can distill more meaningful topics from short texts.

short text, topic model, topic modeling

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Social Media (0.90)

Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction

Jing, Xiao-Yuan (Wuhan University) | Liu, Qian (Wuhan University and Nanjing University of Posts and Telecommunications) | Wu, Fei (Wuhan University) | Xu, Baowen (Wuhan University) | Zhu, Yangping (Wuhan University) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics)

Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised application, since web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is commonly high-dimensional. Thus, how to extract useful features from this kind of data in the multi-view semi-supervised scenario is important for web page classification. To our knowledge, only one method has been presented specifically for this topic, and the few semi-supervised multi-view feature extraction methods developed for other applications still leave much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning, which fully utilizes the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove the multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with the constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.

database, unlabeled sample, usi 2

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Feature Extraction (0.83)

Semantic Single Video Segmentation with Robust Graph Representation

Zhao, Handong (Northeastern University) | Fu, Yun (Northeastern University)

Graph-based video segmentation has proven influential in recent work. However, most of the existing approaches fail to make a semantic segmentation of the foreground objects, i.e. all the segmented objects are treated as one class. In this paper, we propose an approach to semantically segment the multi-class foreground objects from a single video sequence. To achieve this, we first generate a set of proposals for each frame and score them based on motion and appearance features. With these scores, the similarities between proposals are measured. To tackle the vulnerability of the graph-based model, low-rank representation with l2,1-norm regularizer outlier detection is proposed to discover the intrinsic structure among proposals. With the "clean" graph representation, objects of different classes are more likely to be grouped into separate clusters. Two public datasets, MOViCS and ObMiC, are used for evaluation under both intersection-over-union and F-measure metrics. The superior results compared with the state of the art demonstrate the effectiveness of the proposed method.
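The two evaluation metrics named in this abstract can be written out as a short sketch. It assumes a predicted and a ground-truth segmentation are each given as a set of pixel indices. The exact evaluation protocol used on MOViCS and ObMiC is not stated here, so the function names and the handling of empty masks are illustrative assumptions.

```python
def intersection_over_union(pred, truth):
    """IoU of two sets of pixel indices; taken as 1.0 when both are empty."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0

def f_measure(pred, truth):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    overlap = len(pred & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)    # fraction of predicted pixels that are correct
    recall = overlap / len(truth)      # fraction of true pixels that were found
    return 2 * precision * recall / (precision + recall)

pred = {1, 2, 3, 4}
truth = {3, 4, 5, 6}
iou = intersection_over_union(pred, truth)   # 2 shared pixels out of 6 in the union
f1 = f_measure(pred, truth)                  # precision and recall are both 0.5
```

For the same pair of masks IoU never exceeds the F-measure, since it also counts every non-overlapping pixel in the denominator.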

foreground, proposal, segmentation

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Data Science > Data Mining (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Generalized Transitive Distance with Minimum Spanning Random Forest

Yu, Zhiding (Carnegie Mellon University) | Liu, Weiyang (Peking University) | Liu, Wenbo (Carnegie Mellon University) | Peng, Xi (Research Agency for Science, Technology and Research (A*STAR) Singapore) | Hui, Zhuo (Carnegie Mellon University) | Kumar, B. V. K. Vijaya (Carnegie Mellon University)

Transitive distance is an ultrametric with elegant properties for clustering. Conventional transitive distance can be found by referring to the minimum spanning tree (MST). We show that this distance metric can be generalized onto a minimum spanning random forest (MSRF) with element-wise max pooling over the set of transitive distance matrices from an MSRF. Our proposed approach is both intuitively reasonable and theoretically attractive. Intuitively, max pooling alleviates undesired short links with a single MST when noise is present. Theoretically, one can see that the distance metric obtained by max pooling is still an ultrametric, rendering many good clustering properties. Comprehensive experiments on data clustering and image segmentation show that MSRF with max pooling improves the clustering performance over a single MST and achieves state-of-the-art performance on the Berkeley Segmentation Dataset.
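A minimal sketch of the two ingredients named in this abstract: the conventional transitive distance read off a single MST, and element-wise max pooling over several distance matrices. How the trees of an MSRF are generated is not described here, so `max_pool` simply takes precomputed matrices. Both function names are illustrative.

```python
from itertools import combinations

def transitive_distance(dist):
    """Transitive (minimax-path) distance from a symmetric distance matrix.

    Kruskal's algorithm visits edges in ascending order. The edge that first
    joins two components is the largest edge on the MST path between every
    pair of points it connects, which is their transitive distance.
    """
    n = len(dist)
    comp = list(range(n))                  # component label of each point
    members = {i: [i] for i in range(n)}   # points belonging to each component
    td = [[0.0] * n for _ in range(n)]
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for w, i, j in edges:
        a, b = comp[i], comp[j]
        if a == b:
            continue                       # edge would close a cycle: not in the MST
        for p in members[a]:
            for q in members[b]:
                td[p][q] = td[q][p] = w
        for q in members[b]:
            comp[q] = a
        members[a].extend(members.pop(b))
    return td

def max_pool(matrices):
    """Element-wise maximum over a list of equally sized square matrices."""
    n = len(matrices[0])
    return [[max(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]

# Four points on a line: three close together and one far away.
points = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(p - q) for q in points] for p in points]
td = transitive_distance(dist)
```

In this example `td[0][2]` is 1.0 rather than the direct distance 2.0, because the path through the middle point uses only unit-length links, while every path to the far point must cross the gap of 8.0. That is the sense in which a single short link, possibly caused by noise, can pull two groups together.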

segmentation, spectral, transitive distance

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Face Clustering in Videos with Proportion Prior

Tang, Zhiqiang (Chinese Academy of Sciences) | Zhang, Yifan (Chinese Academy of Sciences) | Li, Zechao (Nanjing University of Science and Technology) | Lu, Hanqing (Chinese Academy of Sciences)

In this paper, we investigate the problem of face clustering in real-world videos. In many cases, the distribution of the face data is unbalanced. In movies or TV series, the leading cast members appear quite often and the others appear much less. However, many clustering algorithms cannot handle such a severe imbalance in the data distribution well: the large class is split apart, and the small classes are merged into the large ones and thus go missing. On the other hand, the proportions of the data distribution may be known beforehand. For example, we can obtain such information by counting the spoken lines of the characters in the script text. Hence, we propose to make use of the proportion prior to regularize the clustering. A Hidden Conditional Random Field (HCRF) model is presented to incorporate the proportion prior. In experiments on a public data set from real-world videos, we observe improvements in clustering performance over state-of-the-art methods.

constraint, proportion, video

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (0.67)
Media > Television (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Inferring Painting Style with Multi-Task Dictionary Learning

Recent advances in imaging and multimedia technologies have paved the way for automatic analysis of visual art. Despite notable attempts, extracting relevant patterns from paintings is still a challenging task. Different painters, born in different periods and places, have been influenced by different schools of arts. However, each individual artist also has a unique signature, which is hard to detect with algorithms and objective features. In this paper we propose a novel dictionary learning approach to automatically uncover the artistic style from paintings. Specifically, we present a multi-task learning algorithm to learn a style-specific dictionary representation. Intuitively, our approach, by automatically decoupling style-specific and artist-specific patterns, is expected to be more accurate for retrieval and recognition tasks than generic methods. To demonstrate the effectiveness of our approach, we introduce the DART dataset, containing more than 1.5K images of paintings representative of different styles. Our extensive experimental evaluation shows that our approach significantly outperforms state-of-the-art methods.

algorithm, learning, representation

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > Singapore (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Modeling Inter- and Intra-Part Deformations for Object Structure Parsing

Cai, Ling (Xiamen University) | Ji, Rongrong (Xiamen University) | Liu, Wei (IBM T. J. Watson Research Center) | Hua, Gang (Stevens Institute of Technology)

Part deformation has been a longstanding challenge for object parsing, of which the primary difficulty lies in modeling the highly diverse object structures. To this end, we propose a novel structure parsing model to capture deformable object structures. The proposed model consists of two deformable layers: the top layer is an undirected graph that incorporates inter-part deformations to infer object structures; the base layer consists of various independent nodes to characterize local intra-part deformations. To learn this two-layer model, we design a layer-wise learning algorithm, which employs matching pursuit and belief propagation for inference with low computational complexity. Specifically, active basis sparse coding is leveraged to build the nodes at the base layer, while the edge weights are estimated by a structural support vector machine. Experimental results on two benchmark datasets (i.e., faces and horses) demonstrate that the proposed model yields superior parsing performance over state-of-the-art models.

deformation, inter-part deformation, node

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > United States (0.14)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)