AITopics

2411.18254

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

arXiv.org Artificial IntelligenceNov-27-2024

Towards Cross-device and Training-free Robotic Grasping in 3D Open World

Zhao, Weiguang, Jiang, Chenru, Zhang, Chengrui, Sun, Jie, Yan, Yuyao, Zhang, Rui, Huang, Kaizhu

Robotic grasping in the open world is a critical component of manufacturing and automation processes. While numerous existing approaches depend on 2D segmentation output to facilitate the grasping procedure, accurately determining depth from 2D imagery remains a challenge, often leading to limited performance in complex stacking scenarios. In contrast, techniques utilizing 3D point cloud data inherently capture depth information, thus enabling adeptly navigating and manipulating a diverse range of complex stacking scenes. However, such efforts are considerably hindered by the variance in data capture devices and the unstructured nature of the data, which limits their generalizability. Consequently, much research is narrowly concentrated on managing designated objects within specific settings, which confines their real-world applicability. This paper presents a novel pipeline capable of executing object grasping tasks in open-world scenarios even on previously unseen objects without the necessity for training. Additionally, our pipeline supports the flexible use of different 3D point cloud segmentation models across a variety of scenes. Leveraging the segmentation results, we propose to engage a training-free binary clustering algorithm that not only improves segmentation precision but also possesses the capability to cluster and localize unseen objects for executing grasping operations. In our experiments, we investigate a range of open-world scenarios, and the outcomes underscore the remarkable robustness and generalizability of our pipeline, consistent across various environments, robots, cameras, and objects. The code will be made available upon acceptance of the paper.

pipeline, scenario, segmentation, (15 more...)

2411.18133

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Europe > United Kingdom > England > Merseyside > Liverpool (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Morshed, Fahim, Rahman, Md. Abdur, Ahmed, Sumon

A Novel Word Pair-based Gaussian Sentence Similarity Algorithm For Bengali Extractive Text Summarization

arXiv.org Artificial IntelligenceNov-27-2024

Extractive Text Summarization is the process of selecting the most representative parts of a larger text without losing any key information. Recent attempts at extractive text summarization in Bengali, either relied on statistical techniques like TF-IDF or used naive sentence similarity measures like the word averaging technique. All of these strategies suffer from expressing semantic relationships correctly. Here, we propose a novel Word pair-based Gaussian Sentence Similarity (WGSS) algorithm for calculating the semantic relation between two sentences. WGSS takes the geometric means of individual Gaussian similarity values of word embedding vectors to get the semantic relationship between sentences. It compares two sentences on a word-to-word basis which rectifies the sentence representation problem faced by the word averaging method. The summarization process extracts key sentences by grouping semantically similar sentences into clusters using the Spectral Clustering algorithm. After clustering, we use TF-IDF ranking to pick the best sentence from each cluster. The proposed method is validated using four different datasets, and it outperformed other recent models by 43.2% on average ROUGE scores (ranging from 2.5% to 95.4%). It is also experimented on other low-resource languages i.e. Turkish, Marathi, and Hindi language, where we find that the proposed method performs as similar as Bengali for these languages. In addition, a new high-quality Bengali dataset is curated which contains 250 articles and a pair of summaries for each of them. We believe this research is a crucial addition to Bengali Natural Language Processing (NLP) research and it can easily be extended into other low-resource languages. We made the implementation of the proposed model and data public on https://github.com/FMOpee/WGSS.

2411.17181

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)
Asia > Singapore (0.04)

Genre:

Overview (0.93)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Abduaziz, Kayumov, Kim, Min Sik, Shin, Ji Sun

Dynamic data summarization for hierarchical spatial clustering

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) finds meaningful patterns in spatial data by considering density and spatial proximity. As the clustering algorithm is inherently designed for static applications, so have recent studies focused on accelerating the algorithm for static applications using approximate or parallel methods. However, much less attention has been given to dynamic environments, where even a single point insertion or deletion can require recomputing the clustering hierarchy from scratch due to the need of maintaining the minimum spanning tree (MST) over a complete graph. This paper addresses the challenge of enhancing the clustering algorithm for dynamic data. We present an exact algorithm that maintains density information and updates the clustering hierarchy of HDBSCAN during point insertions and deletions. Considering the hardness of adapting the exact algorithm to dynamic data involving modern workloads, we propose an online-offline framework. The online component efficiently summarizes dynamic data using a tree structure, called Bubble-tree, while the offline step performs the static clustering. Experimental results demonstrate that the data summarization adapts well to fully dynamic environments, providing compression quality on par with existing techniques while significantly improving runtime performance of the clustering algorithm in dynamic data workloads.

artificial intelligence, data mining, machine learning, (18 more...)

2412.07789

Country:

North America > Canada > Alberta (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Coşkun, Kutalmış, Tümer, Borahan, Hiller, Bjarne C., Becker, Martin

Evolving Markov Chains: Unsupervised Mode Discovery and Recognition from Data Streams

Markov chains are simple yet powerful mathematical structures to model temporally dependent processes. They generally assume stationary data, i.e., fixed transition probabilities between observations/states. However, live, real-world processes, like in the context of activity tracking, biological time series, or industrial monitoring, often switch behavior over time. Such behavior switches can be modeled as transitions between higher-level \emph{modes} (e.g., running, walking, etc.). Yet all modes are usually not previously known, often exhibit vastly differing transition probabilities, and can switch unpredictably. Thus, to track behavior changes of live, real-world processes, this study proposes an online and efficient method to construct Evolving Markov chains (EMCs). EMCs adaptively track transition probabilities, automatically discover modes, and detect mode switches in an online manner. In contrast to previous work, EMCs are of arbitrary order, the proposed update scheme does not rely on tracking windows, only updates the relevant region of the probability tensor, and enjoys geometric convergence of the expected estimates. Our evaluation of synthetic data and real-world applications on human activity recognition, electric motor condition monitoring, and eye-state recognition from electroencephalography (EEG) measurements illustrates the versatility of the approach and points to the potential of EMCs to efficiently track, model, and understand live, real-world processes.

artificial intelligence, data mining, machine learning, (17 more...)

2411.17528

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.34)
Health & Medicine > Health Care Technology (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Holder, Christopher, Bagnall, Anthony

Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering

Time series data has become increasingly prevalent across numerous domains, driving a growing demand for time series machine learning techniques. Among these, time series clustering (TSCL) stands out as one of the most popular machine learning tasks. TSCL serves as a powerful exploratory analysis tool and is also employed as a preprocessing step or subroutine for various tasks, including anomaly detection, segmentation, and classification. The most popular TSCL algorithms are either fast (in terms of run time) but perform poorly on benchmark problems, or perform well on benchmarks but scale poorly. We present a new TSCL algorithm, the $k$-means (K) accelerated (A) Stochastic subgradient (S) Barycentre (B) Average (A) (KASBA) clustering algorithm. KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at all stages of clustering, applies a randomised stochastic subgradient gradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence and exploits the metric property of MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for real-world TSCL applications. It allows practitioners to balance run time and clustering performance. We demonstrate through extensive experimentation that KASBA produces significantly better clustering than the faster state of the art clusterers and is offers orders of magnitude improvement in run time over the most performant $k$-means alternatives.

algorithm, centroid, time sery, (16 more...)

2411.17838

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Groll, Andreas, Khanna, Akshat, Zeldin, Leonid

A Machine Learning-based Anomaly Detection Framework in Life Insurance Contracts

Life insurance, like other forms of insurance, relies heavily on large volumes of data. The business model is based on an exchange where companies receive payments in return for the promise to provide coverage in case of an accident. Thus, trust in the integrity of the data stored in databases is crucial. One method to ensure data reliability is the automatic detection of anomalies. While this approach is highly useful, it is also challenging due to the scarcity of labeled data that distinguish between normal and anomalous contracts or inter\-actions. This manuscript discusses several classical and modern unsupervised anomaly detection methods and compares their performance across two different datasets. In order to facilitate the adoption of these methods by companies, this work also explores ways to automate the process, making it accessible even to non-data scientists.

anomaly, dataset, detection, (16 more...)

2411.17495

Genre: Research Report (0.64)

Industry:

Banking & Finance > Insurance (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

DWCL: Dual-Weighted Contrastive Learning for Multi-View Clustering

Zhang, Zhihui, Hao, Xiaoshuai, Yuan, Hanning, Chi, Lianhua, Guo, Qi, Li, Qi, Yuan, Ziqiang, Pang, Jinhui, Li, Yexin, Ruan, Sijie

Multi-view contrastive clustering (MVCC) has gained significant attention for generating consistent clustering structures from multiple views through contrastive learning. However, most existing MVCC methods create cross-views by combining any two views, leading to a high volume of unreliable pairs. Furthermore, these approaches often overlook discrepancies in multi-view representations, resulting in representation degeneration. To address these challenges, we introduce a novel model called Dual-Weighted Contrastive Learning (DWCL) for Multi-View Clustering. Specifically, to reduce the impact of unreliable cross-views, we introduce an innovative Best-Other (B-O) contrastive mechanism that enhances the representation of individual views at a low computational cost. Furthermore, we develop a dual weighting strategy that combines a view quality weight, reflecting the quality of each view, with a view discrepancy weight. This approach effectively mitigates representation degeneration by downplaying cross-views that are both low in quality and high in discrepancy. We theoretically validate the efficiency of the B-O contrastive mechanism and the effectiveness of the dual weighting strategy. Extensive experiments demonstrate that DWCL outperforms previous methods across eight multi-view datasets, showcasing superior performance and robustness in MVCC. Specifically, our method achieves absolute accuracy improvements of 5.4\% and 5.6\% compared to state-of-the-art methods on the Caltech6V7 and MSRCv1 datasets, respectively.

contrastive mechanism, dataset, mechanism, (14 more...)

2411.17354

Country:

Asia > China > Beijing > Beijing (0.05)
Oceania > Australia (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Interpretable label-free self-guided subspace clustering

Kopriva, Ivica

Majority subspace clustering (SC) algorithms depend on one or more hyperparameters that need to be carefully tuned for the SC algorithms to achieve high clustering performance. Hyperparameter optimization (HPO) is often performed using grid-search, assuming that some labeled data is available. In some domains, such as medicine, this assumption does not hold true in many cases. One avenue of research focuses on developing SC algorithms that are inherently free of hyperparameters. For hyperparameters-dependent SC algorithms, one approach to label-independent HPO tuning is based on internal clustering quality metrics (if available), whose performance should ideally match that of external (label-dependent) clustering quality metrics. In this paper, we propose a novel approach to label-independent HPO that uses clustering quality metrics, such as accuracy (ACC) or normalized mutual information (NMI), that are computed based on pseudo-labels obtained from the SC algorithm across a predefined grid of hyperparameters. Assuming that ACC (or NMI) is a smooth function of hyperparameter values it is possible to select subintervals of hyperparameters. These subintervals are then iteratively further split into halves or thirds until a relative error criterion is satisfied. In principle, the hyperparameters of any SC algorithm can be tuned using the proposed method. We demonstrate this approach on several single- and multi-view SC algorithms, comparing the achieved performance with their oracle versions across six datasets representing digits, faces and objects. The proposed method typically achieves clustering performance that is 5% to 7% lower than that of the oracle versions. We also make our proposed method interpretable by visualizing subspace bases, which are estimated from the computed clustering partitions. This aids in the initial selection of the hyperparameter search space.

algorithm, hyperparameter, sc algorithm, (16 more...)

2411.17291

Country:

Europe > Croatia > Zagreb County > Zagreb (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Diagnostic Medicine (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Afzali, Amirabbas, Hosseini, Hesam, Mirzai, Mohmmadamin, Amini, Arash

Clustering Time Series Data with Gaussian Mixture Embeddings in a Graph Autoencoder Framework

arXiv.org Artificial IntelligenceNov-25-2024

Time series data analysis is prevalent across various domains, including finance, healthcare, and environmental monitoring. Traditional time series clustering methods often struggle to capture the complex temporal dependencies inherent in such data. In this paper, we propose the Variational Mixture Graph Autoencoder (VMGAE), a graph-based approach for time series clustering that leverages the structural advantages of graphs to capture enriched data relationships and produces Gaussian mixture embeddings for improved separability. Comparisons with baseline methods are included with experimental results, demonstrating that our method significantly outperforms state-of-the-art time-series clustering techniques. We further validate our method on real-world financial data, highlighting its practical applications in finance. By uncovering community structures in stock markets, our method provides deeper insights into stock relationships, benefiting market prediction, portfolio optimization, and risk management.

artificial intelligence, machine learning, time sery, (17 more...)

2411.16972

Country:

North America > United States > New York (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)