AITopics

Country:

Asia > South Korea > Daejeon > Daejeon (0.40)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Neural Information Processing SystemsMar-15-2024, 06:55:21 GMT

Learning to Agglomerate Superpixel Hierarchies

The function that evaluates similarity is traditionally handdesigned, but there has been recent interest in supervised or semisupervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state of the boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement.

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report (0.93)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Neural Information Processing SystemsMar-15-2024, 02:42:45 GMT

Fast and Accurate k-llleans For Large Datasets Alex Wong School of EECS Department of Computer Science Oregon State University

Clustering is a popular problem with many applications. We consider the k-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant improvements to make it practical. Our approach greatly simplifies a recently developed algorithm, both in design and in analysis, and eliminates large constant factors in the approximation guarantee, the memory requirements, and the running time. We then incorporate approximate nearest neighbor search to compute k-means in o( nk) (where n is the number of data points; note that computing the cost, given a solution, takes 8(nk) time). We show that our algorithm compares favorably to existing algorithms - both theoretically and experimentally, thus providing state-of-the-art performance in both theory and practice.

algorithm, approximation, approximation factor, (15 more...)

Country:

North America > United States > Oregon (0.40)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(2 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Neural Information Processing SystemsMar-15-2024, 01:43:31 GMT

History distribution matching method for predicting effectiveness of HIV combination therapies

This paper presents an approach that predicts the effectiveness of HIV combination therapies by simultaneously addressing several problems affecting the available HIV clinical data sets: the different treatment backgrounds of the samples, the uneven representation of the levels of therapy experience, the missing treatment history information, the uneven therapy representation and the unbalanced therapy outcome representation. The computational validation on clinical data shows that, compared to the most commonly used approach that does not account for the issues mentioned above, our model has significantly higher predictive power. This is especially true for samples stemming from patients with longer treatment history and samples associated with rare therapies. Furthermore, our approach is at least as powerful for the remaining samples.

clinical data, therapy, therapy sequence, (14 more...)

Country: Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Internal Medicine (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)

Technology:

Information Technology > Biomedical Informatics (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Neural Information Processing SystemsMar-15-2024, 00:47:27 GMT

Bayesian Partitioning of Large-Scale Distance Data

A Bayesian approach to partitioning distance matrices is presented. It is inspired by the Translation-invariant Wishart-Dirichlet process (TIWD) in [1] and shares a number of advantageous properties like the fully probabilistic nature of the inference model, automatic selection of the number of clusters and applicability in semi-supervised settings. In addition, our method (which we call fastTIWD) overcomes the main shortcoming of the original TIWD, namely its high computational costs.

matrix, partition, wishart distribution, (15 more...)

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Predictive Clustering of Vessel Behavior Based on Hierarchical Trajectory Representation

Zhang, Rui, Wu, Hanyue, Yin, Zhenzhong, Xiao, Zhu, Xiong, Yong, Liu, Kezhong

Vessel trajectory clustering, which aims to find similar trajectory patterns, has been widely leveraged in overwater applications. Most traditional methods use predefined rules and thresholds to identify discrete vessel behaviors. They aim for high-quality clustering and conduct clustering on entire sequences, whether the original trajectory or its sub-trajectories, failing to represent their evolution. To resolve this problem, we propose a Predictive Clustering of Hierarchical Vessel Behavior (PC-HiV). PC-HiV first uses hierarchical representations to transform every trajectory into a behavioral sequence. Then, it predicts evolution at each timestamp of the sequence based on the representations. By applying predictive clustering and latent encoding, PC-HiV improves clustering and predictions simultaneously. Experiments on real AIS datasets demonstrate PC-HiV's superiority over existing methods, showcasing its effectiveness in capturing behavioral evolution discrepancies between vessel types (tramp vs. liner) and within emission control areas. Results show that our method outperforms NN-Kmeans and Robust DAA by 3.9% and 6.4% of the purity score.

representation, sequence, trajectory, (16 more...)

2403.08838

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Hainan Province (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Mosenzon, Ron, Vakilian, Ali

Scalable Algorithms for Individual Preference Stable Clustering

In this paper, we study the individual preference (IP) stability, which is an notion capturing individual fairness and stability in clustering. Within this setting, a clustering is $\alpha$-IP stable when each data point's average distance to its cluster is no more than $\alpha$ times its average distance to any other cluster. In this paper, we study the natural local search algorithm for IP stable clustering. Our analysis confirms a $O(\log n)$-IP stability guarantee for this algorithm, where $n$ denotes the number of points in the input. Furthermore, by refining the local search approach, we show it runs in an almost linear time, $\tilde{O}(nk)$.

algorithm, avg, metric space, (14 more...)

2403.10365

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Open Continual Feature Selection via Granular-Ball Knowledge Transfer

Cao, Xuemei, Yang, Xin, Xia, Shuyin, Wang, Guoyin, Li, Tianrui

This paper presents a novel framework for continual feature selection (CFS) in data preprocessing, particularly in the context of an open and dynamic environment where unknown classes may emerge. CFS encounters two primary challenges: the discovery of unknown knowledge and the transfer of known knowledge. To this end, the proposed CFS method combines the strengths of continual learning (CL) with granular-ball computing (GBC), which focuses on constructing a granular-ball knowledge base to detect unknown classes and facilitate the transfer of previously learned knowledge for further feature selection. CFS consists of two stages: initial learning and open learning. The former aims to establish an initial knowledge base through multi-granularity representation using granular-balls. The latter utilizes prior granular-ball knowledge to identify unknowns, updates the knowledge base for granular-ball knowledge transfer, reinforces old knowledge, and integrates new knowledge. Subsequently, we devise an optimal feature subset mechanism that incorporates minimal new features into the existing optimal subset, often yielding superior results during each period. Extensive experimental results on public benchmark datasets demonstrate our method's superiority in terms of both effectiveness and efficiency compared to state-of-the-art feature selection methods.

feature selection, feature subset, selection, (13 more...)

2403.10253

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Zafarani-Moattar, Elnaz, Kangavari, Mohammad Reza, Rahmani, Amir Masoud

A comprehensive study on Frequent Pattern Mining and Clustering categories for topic detection in Persian text stream

Topic detection is a complex process and depends on language because it somehow needs to analyze text. There have been few studies on topic detection in Persian, and the existing algorithms are not remarkable. Therefore, we aimed to study topic detection in Persian. The objectives of this study are: 1) to conduct an extensive study on the best algorithms for topic detection, 2) to identify necessary adaptations to make these algorithms suitable for the Persian language, and 3) to evaluate their performance on Persian social network texts. To achieve these objectives, we have formulated two research questions: First, considering the lack of research in Persian, what modifications should be made to existing frameworks, especially those developed in English, to make them compatible with Persian? Second, how do these algorithms perform, and which one is superior? There are various topic detection methods that can be categorized into different categories. Frequent pattern and clustering are selected for this research, and a hybrid of both is proposed as a new category. Then, ten methods from these three categories are selected. All of them are re-implemented from scratch, changed, and adapted with Persian. These ten methods encompass different types of topic detection methods and have shown good performance in English. The text of Persian social network posts is used as the dataset. Additionally, a new multiclass evaluation criterion, called FS, is used in this paper for the first time in the field of topic detection. Approximately 1.4 billion tokens are processed during experiments. The results indicate that if we are searching for keyword-topics that are easily understandable by humans, the hybrid category is better. However, if the aim is to cluster posts for further analysis, the frequent pattern category is more suitable.

category, detection, topic detection, (15 more...)

2403.10237

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Graph Enhanced Reinforcement Learning for Effective Group Formation in Collaborative Problem Solving

Fang, Zheng, Ke, Fucai, Han, Jae Young, Feng, Zhijie, Cai, Toby

This study addresses the challenge of forming effective groups in collaborative problem-solving environments. Recognizing the complexity of human interactions and the necessity for efficient collaboration, we propose a novel approach leveraging graph theory and reinforcement learning. Our methodology involves constructing a graph from a dataset where nodes represent participants, and edges signify the interactions between them. We conceptualize each participant as an agent within a reinforcement learning framework, aiming to learn an optimal graph structure that reflects effective group dynamics. Clustering techniques are employed to delineate clear group structures based on the learned graph. Our approach provides theoretical solutions based on evaluation metrics and graph measurements, offering insights into potential improvements in group effectiveness and reductions in conflict incidences. This research contributes to the fields of collaborative work and educational psychology by presenting a data-driven, analytical approach to group formation. It has practical implications for organizational team building, classroom settings, and any collaborative scenario where group dynamics are crucial. The study opens new avenues for exploring the application of graph theory and reinforcement learning in social and behavioral sciences, highlighting the potential for empirical validation in future work.

interaction, participant, reinforcement, (16 more...)