AITopics

arXiv.org Artificial IntelligenceAug-7-2009

Node discovery problem for a social network

Maeno, Yoshiharu

Methods to solve a node discovery problem for a social network are presented. Covert nodes refer to the nodes which are not observable directly. They transmit the influence and affect the resulting collaborative activities among the persons in a social network, but do not appear in the surveillance logs which record the participants of the collaborative activities. Discovering the covert nodes is identifying the suspicious logs where the covert nodes would appear if the covert nodes became overt. The performance of the methods is demonstrated with a test dataset generated from computationally synthesized networks and a real organization.

data mining, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

0710.4975

Country:

North America > United States > Pennsylvania (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Terrorism (0.95)
Information Technology > Services (0.84)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Social Media (0.94)
(2 more...)

Bubeck, Sebastien, Meila, Marina, von Luxburg, Ulrike

How the initialization affects the stability of the k-means algorithm

arXiv.org Machine LearningJul-31-2009

We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.

algorithm, initialization, k-means algorithm, (16 more...)

arXiv.org Machine Learning

0907.5494

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Teh, Yee Whye, Daumé, Hal III, Roy, Daniel

Bayesian Agglomerative Clustering with Coalescents

arXiv.org Machine LearningJul-4-2009

We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

0907.0781

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

Human Activity Encoding and Recognition Using Low-level Visual Features

Wang, Zheshen (Arizona State University) | Li, Baoxin (Arizona State University)

Automatic recognition of human activities is among the key capabilities of many intelligent systems with vision/perception. Most existing approaches to this problem require sophisticated feature extraction before classification can be performed. This paper presents a novel approach for human action recognition using only simple low-level visual features: motion captured from direct frame differencing. A codebook of key poses is first created from the training data through unsupervised clustering. Videos of actions are then coded as sequences of super-frames, defined as the key poses augmented with discriminative attributes. A weighted-sequence distance is proposed for comparing two super-frame sequences, which is further wrapped as a kernel embedded in a SVM classifier for the final classification. Compared with conventional methods, our approach provides a flexible non-parametric sequential structure with a corresponding distance measure for human action representation and classification without requiring complex feature extraction. The effectiveness of our approach is demonstrated with the widely-used KTH human activity dataset, for which the proposed method outperforms the existing state-of-the-art.

recognition, sequence, wsd, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > Arizona (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

On-line Evolutionary Exponential Family Mixture

Zhang, Jianwen (Tsinghua University) | Song, Yangqiu (Tsinghua University) | Chen, Gang (Tsinghua University) | Zhang, Changshui (Tsinghua University)

This paper deals with evolutionary clustering, which refers to the problem of clustering data with distribution drifting along time. Starting from a density estimation view to clustering problems, we propose two general on-line frameworks. In the first framework, i.e., historical data dependent (HDD), current model distribution is designed to approximate both current and historical data distributions. In the second framework, i.e., historical model dependent (HMD), current model distribution is designed to approximate both current data distribution and historical model distribution. Both frameworks are based on the general exponential family mixture (EFM) model. As a result, all conventional clustering algorithms based on EFMs can be extended to evolutionary setting under the two frameworks. Empirical results validate the two frameworks.

algorithm, divergence, epoch, (15 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Sensing and Predicting the Pulse of the City through Shared Bicycling

Froehlich, Jon Edward (University of Washington) | Neumann, Joachim (Telefonica Research) | Oliver, Nuria (Telefonica Research)

City-wide urban infrastructures are increasingly reliant on network technology to improve and ex-pand their services. As a side effect of this digitali-zation, large amounts of data can be sensed and analyzed to uncover patterns of human behavior. In this paper, we focus on the digital footprints from one type of emerging urban infrastructure: shared bicycling systems. We provide a spatiotemporal analysis of 13 weeks of bicycle station usage from Barcelona's shared bicycling system, called Bicing. We apply clustering techniques to identify shared behaviors across stations and show how these behaviors relate to location, neighborhood, and time of day. We then compare experimental results from four predictive models of near-term station usage. Finally, we analyze the impact of factors such as time of day and station activity in the prediction capabilities of the algorithms.

bicycle, dayview, predictor, (14 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.68)

Industry: Transportation > Ground (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Badea, Liviu (National Institute for Research in Informatics)

Generalized Clustergrams for Overlapping Biclusters

Many real-life datasets, such as those produced by gene expression studies, exhibit complex substructures at various levels of granularity and thus do not have unique well-defined numbers of clusters. In such cases, it is important to be able to trace the evolution of the individual clusters as the number of dimensions of the clustering is varied. While the dendrograms produced by bottom-up clustering methods such as hierarchical clustering are very useful for this purpose, the approach is known to produce unreliable clusters due to its instability w.r.t. resampling. Moreover, hierarchical clustering does not apply to overlapping (bi)clusters, such as those obtained in gene expression studies. On the other hand, the instability w.r.t. the initialization of top-down methods, such as k-means, prevents the comparison between clusters obtained at different dimensionalities. In this paper, we present a method for constructing generalized dendrograms for overlapping biclusters, which depict the evolution of the biclusters as their number is varied. An essential ingredient is a stable biclustering method based on positive tensor factorization of a number of nonnegative matrix factorization runs. We apply our approach to a large colon cancer dataset, which shows several distinct subclasses whose dimensional evolution must be carefully analyzed to enable a more meaningful biological interpretation and sub-classification.

bicluster, evolution, factorization, (15 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Industry: Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

M 3 IC: Maximum Margin Multiple Instance Clustering

Zhang, Dan (Purdue University, West Lafayette) | Wang, Fei (Florida International University) | Si, Luo (Purdue University, West Lafayette) | Li, Tao (Florida International University)

Clustering, classification, and regression, are three major research topics in machine learning. So far, much work has been conducted in solving multiple instance classification and multiple instance regression problems, where supervised training patterns are given as bags and each bag consists of some instances. But the research on unsupervised multiple instance clustering is still limited . This paper formulates a novel Maximum Margin Multiple Instance Clustering problem for the multiple instance clustering task. To avoid solving a non-convex optimization problem directly, M 3 IC is further relaxed, which enables an efficient optimization solution with a combination of Constrained Concave-Convex Procedure CCCP) and the Cutting Plane method. Furthermore, this paper analyzes some important properties of the proposed method and the relationship between the proposed method and some other related ones. An extensive set of empirical results demonstrate the advantages of the proposed method against existing research for both effectiveness and efficiency.

constraint, dataset, iteration, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Early Prediction on Time Series: A Nearest Neighbor Approach

Xing, Zhengzheng (Simon Fraser Univeristy) | Pei, Jian (Simon Fraser University) | Yu, Philip S. (University of Illinois at Chicago)

In this paper, we formulate the problem of early classification of time series data, which is important in some time-sensitive applications such as health-informatics. We introduce a novel concept of MPL (Minimum Prediction Length) and develop ECTS (Early Classification on Time Series), an effective 1-nearest neighbor classification method. ECTS makes early predictions and at the same time retains the accuracy comparable to that of a 1NN classifier using the full-length time series. Our empirical study using benchmark time series data sets shows that ECTS works well on the real data sets where 1NN classification is effective.

classification, classifier, time sery, (14 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)