AITopics

2404.04564

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Education (0.46)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(5 more...)

arXiv.org Artificial IntelligenceApr-5-2024

Low-Rank Robust Subspace Tensor Clustering for Metro Passenger Flow Modeling

Hu, Jiuyun, Li, Ziyue, Zhang, Chen, Tsung, Fugee, Yan, Hao

Tensor clustering has become an important topic, specifically in spatio-temporal modeling, due to its ability to cluster spatial modes (e.g., stations or road segments) and temporal modes (e.g., time of the day or day of the week). Our motivating example is from subway passenger flow modeling, where similarities between stations are commonly found. However, the challenges lie in the innate high-dimensionality of tensors and also the potential existence of anomalies. This is because the three tasks, i.e., dimension reduction, clustering, and anomaly decomposition, are inter-correlated to each other, and treating them in a separate manner will render a suboptimal performance. Thus, in this work, we design a tensor-based subspace clustering and anomaly decomposition technique for simultaneously outlier-robust dimension reduction and clustering for high-dimensional tensors. To achieve this, a novel low-rank robust subspace clustering decomposition model is proposed by combining Tucker decomposition, sparse anomaly decomposition, and subspace clustering. An effective algorithm based on Block Coordinate Descent is proposed to update the parameters. Prudent experiments prove the effectiveness of the proposed framework via the simulation study, with a gain of +25% clustering accuracy than benchmark methods in a hard case. The interrelations of the three tasks are also analyzed via ablation studies, validating the interrelation assumption. Moreover, a case study in the station clustering based on real passenger flow data is conducted, with quite valuable insights discovered.

decomposition, subspace, tensor, (10 more...)

2404.04403

Country:

Asia > China > Hong Kong > Sha Tin (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial IntelligenceApr-4-2024

SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

Zhang, Chuyu, Ren, Hui, He, Xuming

Deep clustering, which learns representation and semantic clustering without labels information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, significantly limiting the practical applicability of their methods. In this paper, we propose a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalance distribution. To address this challenge, we introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT) problem, which progressively transports each sample to imbalanced clusters under several prior distribution and semantic relation constraints, thus generating high-quality and imbalance-aware pseudo-labels. To solve SP$^2$OT, we develop a Majorization-Minimization-based optimization algorithm. To be more precise, we employ the strategy of majorization to reformulate the SP$^2$OT problem into a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and can be solved efficiently by a fast matrix scaling algorithm. Experiments on various datasets, including a human-curated long-tailed CIFAR100, challenging ImageNet-R, and large-scale subsets of fine-grained iNaturalist2018 datasets, demonstrate the superiority of our method.

algorithm, constraint, sp 2, (14 more...)

2404.03446

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Behera, Swarup Ranjan, Saradhi, Vijaya V.

Spectral Clustering in Convex and Constrained Settings

arXiv.org Artificial IntelligenceApr-3-2024

Spectral clustering methods have gained widespread recognition for their effectiveness in clustering high-dimensional data. Among these techniques, constrained spectral clustering has emerged as a prominent approach, demonstrating enhanced performance by integrating pairwise constraints. However, the application of such constraints to semidefinite spectral clustering, a variant that leverages semidefinite programming to optimize clustering objectives, remains largely unexplored. In this paper, we introduce a novel framework for seamlessly integrating pairwise constraints into semidefinite spectral clustering. Our methodology systematically extends the capabilities of semidefinite spectral clustering to capture complex data structures, thereby addressing real-world clustering challenges more effectively. Additionally, we extend this framework to encompass both active and self-taught learning scenarios, further enhancing its versatility and applicability. Empirical studies conducted on well-known datasets demonstrate the superiority of our proposed framework over existing spectral clustering methods, showcasing its robustness and scalability across diverse datasets and learning settings. By bridging the gap between constrained learning and semidefinite spectral clustering, our work contributes to the advancement of spectral clustering techniques, offering researchers and practitioners a versatile tool for addressing complex clustering challenges in various real-world applications. Access to the data, code, and experimental results is provided for further exploration (https://github.com/swarupbehera/SCCCS).

clustering, constraint, spectral clustering, (13 more...)

2404.03012

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > India > Assam > Guwahati (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching

Jain, Divij, Kher, Saatvik, Liang, Lena, Wu, Yufeng, Zheng, Ashley, Cai, Xizhen, Plantinga, Anna, Upton, Elizabeth

We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random forest that generates a probabilistic measurement of how likely two prints are to have originated from the same outsole. We assess the generalisability of machine learning methods trained on lab shoeprint scans to more realistic crime scene shoeprint data by evaluating the accuracy of our methods on several shoeprint scenarios: partial prints, prints with varying levels of blurriness, prints with different amounts of wear, and prints from different shoe models. We find that models trained on one type of shoeprint yield extremely high levels of accuracy when tested on shoeprint pairs of the same scenario but fail to generalise to other scenarios. We also discover that models trained on a variety of scenarios predict almost as accurately as models trained on specific scenarios.

scenario, shoeprint, similarity metric, (14 more...)

2405.14878

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > West Virginia (0.04)
North America > United States > Virginia (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Zimmerman, Nicky, Giusti, Alessandro, Guzzi, Jérôme

Resource-Aware Collaborative Monte Carlo Localization with Distribution Compression

Global localization is essential in enabling robot autonomy, and collaborative localization is key for multi-robot systems. In this paper, we address the task of collaborative global localization under computational and communication constraints. We propose a method which reduces the amount of information exchanged and the computational cost. We also analyze, implement and open-source seminal approaches, which we believe to be a valuable contribution to the community. We exploit techniques for distribution compression in near-linear time, with error guarantees. We evaluate our approach and the implemented baselines on multiple challenging scenarios, simulated and real-world. Our approach can run online on an onboard computer. We release an open-source C++/ROS2 implementation of our approach, as well as the baselines

localization, particle, robot, (15 more...)

2404.0201

Genre:

Research Report (0.64)
Overview (0.47)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Draganov, Andrew, Saulpic, David, Schwiegelshohn, Chris

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universal best choice for compressing the number of points - while random sampling runs in sublinear time and coresets provide theoretical guarantees, the former does not enforce accuracy while the latter is too slow as the numbers of points and clusters grow. Indeed, it has been conjectured that any sensitivity-based coreset construction requires super-linear time in the dataset size. We examine this relationship by first showing that there does exist an algorithm that obtains coresets via sensitivity sampling in effectively linear time - within log-factors of the time it takes to read the data. Any approach that significantly improves on this must then resort to practical heuristics, leading us to consider the spectrum of sampling strategies across both real and artificial datasets in the static and streaming settings. Through this, we show the conditions in which coresets are necessary for preserving cluster validity as well as the settings in which faster, cruder sampling strategies are sufficient. As a result, we provide a comprehensive theoretical and practical blueprint for effective clustering regardless of data size. Our code is publicly available and has scripts to recreate the experiments.

algorithm, coreset, dataset, (16 more...)

2404.01936

Country:

North America > United States > Arizona > Maricopa County > Phoenix (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(21 more...)

Genre: Research Report (0.50)

Industry: Transportation (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning

Arunachalam, Ayush, Kintz, Ian, Banerjee, Suvadeep, Raha, Arnab, Jin, Xiankun, Su, Fei, Prasanth, Viswanathan Pillai, Parekhji, Rubin A., Natarajan, Suriyaprakash, Basu, Kanad

Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components within automotive systems. The Analog and Mixed-Signal (AMS) circuits prevalent in these systems are more vulnerable to faults induced by parametric perturbations, noise, environmental stress, and other factors, in comparison to their digital counterparts. However, their continuous signal characteristics present an opportunity for early anomaly detection, enabling the implementation of safety mechanisms to prevent system failure. To address this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The proposed approach involves injecting anomalies at various circuit locations and individual components to create a diverse and comprehensive anomaly dataset, followed by the extraction of features from the observed circuit signals. Subsequently, we employ clustering algorithms to facilitate anomaly detection. Finally, we propose a time series framework to enhance and expedite anomaly detection performance. Our approach encompasses a systematic analysis of anomaly abstraction at multiple levels pertaining to the automotive domain, from hardware- to block-level, where anomalies are injected to create diverse fault scenarios. By monitoring the system behavior under these anomalous conditions, we capture the propagation of anomalies and their effects at different abstraction levels, thereby potentially paving the way for the implementation of reliable safety mechanisms to ensure the FuSa of automotive SoCs. Our experimental findings indicate that our approach achieves 100% anomaly detection accuracy and significantly optimizes the associated latency by 5X, underscoring the effectiveness of our devised solution.

algorithm, anomaly, experiment, (16 more...)

2404.01632

Country: North America > United States > Texas (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Automobiles & Trucks (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Remil, Youcef, Bendimerad, Anes, Mathonat, Romain, Kaytoue, Mehdi

AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review

arXiv.org Artificial IntelligenceApr-1-2024

The management of modern IT systems poses unique challenges, necessitating scalability, reliability, and efficiency in handling extensive data streams. Traditional methods, reliant on manual tasks and rule-based approaches, prove inefficient for the substantial data volumes and alerts generated by IT systems. Artificial Intelligence for Operating Systems (AIOps) has emerged as a solution, leveraging advanced analytics like machine learning and big data to enhance incident management. AIOps detects and predicts incidents, identifies root causes, and automates healing actions, improving quality and reducing operational costs. However, despite its potential, the AIOps domain is still in its early stages, decentralized across multiple sectors, and lacking standardized conventions. Research and industrial contributions are distributed without consistent frameworks for data management, target problems, implementation details, requirements, and capabilities. This study proposes an AIOps terminology and taxonomy, establishing a structured incident management procedure and providing guidelines for constructing an AIOps framework. The research also categorizes contributions based on criteria such as incident management tasks, application areas, data sources, and technical approaches. The goal is to provide a comprehensive review of technical and research aspects in AIOps for incident management, aiming to structure knowledge, identify gaps, and establish a foundation for future developments in the field.

acm sigkdd international conference, european software engineering conference, software fault localization, (14 more...)

2404.01363

Country:

North America > United States (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
(13 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Information Technology > Services (0.92)
(3 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Quality (1.00)
(16 more...)

Alshahrani, Saied, Haroon, Hesham, Elfilali, Ali, Njie, Mariama, Matthews, Jeanna

Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition

arXiv.org Artificial IntelligenceMar-31-2024

Wikipedia articles (content pages) are commonly used corpora in Natural Language Processing (NLP) research, especially in low-resource languages other than English. Yet, a few research studies have studied the three Arabic Wikipedia editions, Arabic Wikipedia (AR), Egyptian Arabic Wikipedia (ARZ), and Moroccan Arabic Wikipedia (ARY), and documented issues in the Egyptian Arabic Wikipedia edition regarding the massive automatic creation of its articles using template-based translation from English to Arabic without human involvement, overwhelming the Egyptian Arabic Wikipedia with articles that do not only have low-quality content but also with articles that do not represent the Egyptian people, their culture, and their dialect. In this paper, we aim to mitigate the problem of template translation that occurred in the Egyptian Arabic Wikipedia by identifying these template-translated articles and their characteristics through exploratory analysis and building automatic detection systems. We first explore the content of the three Arabic Wikipedia editions in terms of density, quality, and human contributions and utilize the resulting insights to build multivariate machine learning classifiers leveraging articles' metadata to detect the template-translated articles automatically. We then publicly deploy and host the best-performing classifier, XGBoost, as an online application called EGYPTIAN WIKIPEDIA SCANNER and release the extracted, filtered, and labeled datasets to the research community to benefit from our datasets and the online, web-based detection system.

metadata, translation, wikipedia, (15 more...)

2404.00565

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
North America > Canada > Ontario > Toronto (0.04)
(11 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)