AITopics

2409.09957

Country:

North America > United States > Illinois (0.04)
Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Overview (1.00)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

arXiv.org Artificial Intelligence Sep-15-2024

Towards Multi-view Graph Anomaly Detection with Similarity-Guided Contrastive Clustering

Zheng, Lecheng, Birge, John R., Zhang, Yifang, He, Jingrui

Anomaly detection on graphs plays an important role in many real-world applications. Usually, these data are composed of multiple types (e.g., user information and transaction records for financial data), thus exhibiting view heterogeneity. Therefore, it can be challenging to leverage such multi-view information and learn the graph's contextual information to identify rare anomalies. To tackle this problem, many deep learning-based methods utilize contrastive learning loss as a regularization term to learn good representations. However, many existing contrastive-based methods show that traditional contrastive learning losses fail to consider the semantic information (e.g., class membership information). In addition, we theoretically show that clustering-based contrastive learning also easily leads to a sub-optimal solution. To address these issues, in this paper, we propose an autoencoder-based clustering framework regularized by a similarity-guided contrastive loss to detect anomalous nodes. Specifically, we build a similarity map to help the model learn robust representations without imposing a hard margin constraint between the positive and negative pairs. Theoretically, we show that the proposed similarity-guided loss is a variant of contrastive learning loss, and we explain how it alleviates the issue of unreliable pseudo-labels through its connection to graph spectral clustering. Experimental results on several datasets demonstrate the effectiveness and efficiency of our proposed framework.
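The following is a minimal sketch of a soft, similarity-guided contrastive objective. It assumes cosine similarities between node embeddings and a row-normalized similarity map used as soft targets. It is a generic soft-label InfoNCE-style loss written for illustration, not the authors' loss. The function name `soft_contrastive_loss`, its inputs, and the temperature value are hypothetical.

```python
import numpy as np


def soft_contrastive_loss(z: np.ndarray, sim_map: np.ndarray, tau: float = 0.5) -> float:
    """Cross-entropy between soft targets (rows of sim_map) and the softmax
    over pairwise cosine similarities of the embeddings.

    z       : (n, d) node embeddings (hypothetical input)
    sim_map : (n, n) non-negative similarity map; row i weights how "positive"
              each other node j is for anchor i (hypothetical input)
    tau     : temperature (assumed value)
    """
    n = z.shape[0]
    # Cosine similarity between every pair of embeddings, scaled by temperature.
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = zn @ zn.T / tau
    # Exclude self-pairs from both the logits and the targets.
    mask = ~np.eye(n, dtype=bool)
    logits = np.where(mask, logits, -np.inf)
    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Soft targets: each row of the similarity map normalized to sum to 1.
    targets = np.where(mask, sim_map, 0.0)
    targets = targets / targets.sum(axis=1, keepdims=True)
    # Weighted negative log-likelihood averaged over anchors; no hard pos/neg split.
    return float(-(targets[mask] * log_prob[mask]).sum() / n)
```

Every non-self pair contributes in proportion to its similarity-map weight. As a result, no fixed margin separates "positive" and "negative" pairs.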

dataset, node, proceedings

2409.0977

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Kaladagi, Basavaraj, Pujari, Jagadeesh

A Novel Framework For Text Detection From Natural Scene Images With Complex Background

arXiv.org Artificial Intelligence Sep-15-2024

Recognizing text in camera images is a known hard problem because text is difficult to detect against varied and complicated backgrounds. In this paper we propose a novel and efficient method to detect text regions in images with complex backgrounds using Wavelet Transforms. The framework applies a Wavelet Transformation to the grayscale form of the original image, followed by Sub-band filtering. A Region clustering technique is then applied using the centroids of the regions, and a Bounding box is fitted to each region, thus identifying the text regions. This method is more sophisticated and efficient than previous methods because it does not depend on a particular font size, which makes it more general. The sample set used for experiments consists of 50 images with varying backgrounds. Images with edge prominence are considered. Furthermore, our method can be easily customized for applications with different scopes.
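The following is a minimal NumPy sketch of the wavelet step. It assumes a single-level Haar transform of the grayscale image and a fixed energy threshold on the three detail sub-bands. Grouping above-threshold pixels into 4-connected regions stands in for the centroid-based region clustering, which the abstract does not detail. `haar_detail_energy` and `text_boxes` are hypothetical names.

```python
import numpy as np
from collections import deque


def haar_detail_energy(gray: np.ndarray) -> np.ndarray:
    """One-level 2D Haar transform; returns |LH| + |HL| + |HH| at half resolution."""
    h, w = gray.shape
    g = gray[: h - h % 2, : w - w % 2].astype(float)  # crop to even size
    a, b = g[0::2, 0::2], g[0::2, 1::2]
    c, d = g[1::2, 0::2], g[1::2, 1::2]
    lh = (a + b - c - d) / 2.0  # horizontal-edge detail sub-band
    hl = (a - b + c - d) / 2.0  # vertical-edge detail sub-band
    hh = (a - b - c + d) / 2.0  # diagonal detail sub-band
    return np.abs(lh) + np.abs(hl) + np.abs(hh)


def text_boxes(gray: np.ndarray, thresh: float) -> list:
    """Threshold the detail energy, group 4-connected pixels, and return one
    (row0, col0, row1, col1) box per group in original-image coordinates."""
    mask = haar_detail_energy(gray) > thresh
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(r, c)])
            seen[r, c] = True
            rr, cc = [], []
            while queue:
                y, x = queue.popleft()
                rr.append(y)
                cc.append(x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Map half-resolution coordinates back to the input image.
            boxes.append((2 * min(rr), 2 * min(cc), 2 * max(rr) + 1, 2 * max(cc) + 1))
    return boxes
```

Text strokes tend to produce strong detail coefficients, while flat background does not. As a result, each returned box marks a candidate text region.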

background, text detection, text region

2409.09635

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Ghosh, Sagar, Das, Swagatam

Consistent Spectral Clustering in Hyperbolic Spaces

arXiv.org Machine Learning Sep-14-2024

Clustering, as an unsupervised technique, plays a pivotal role in various data analysis applications. Among clustering algorithms, Spectral Clustering on Euclidean Spaces has been extensively studied. However, as data grow rapidly more complex, Euclidean Space is proving inefficient for representation and learning. Although Deep Neural Networks on hyperbolic spaces have gained recent traction, clustering algorithms or non-deep machine learning models on non-Euclidean Spaces remain underexplored. In this paper, we propose a spectral clustering algorithm on Hyperbolic Spaces to address this gap. Hyperbolic Spaces offer advantages in representing complex data structures like hierarchical and tree-like structures, which cannot be embedded efficiently in Euclidean Spaces. Our proposed algorithm replaces the Euclidean Similarity Matrix with an appropriate Hyperbolic Similarity Matrix, demonstrating improved efficiency compared to clustering in Euclidean Spaces. Our contributions include the development of the spectral clustering algorithm on Hyperbolic Spaces and the proof of its weak consistency. We show that our algorithm converges at least as fast as Spectral Clustering on Euclidean Spaces. To illustrate the efficacy of our approach, we present experimental results on the Wisconsin Breast Cancer Dataset, highlighting the superior performance of Hyperbolic Spectral Clustering over its Euclidean counterpart. This work opens up avenues for utilizing non-Euclidean Spaces in clustering algorithms, offering new perspectives for handling complex data structures and improving clustering efficiency.
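The following is a minimal two-way sketch. It assumes the data already lie inside the unit Poincaré ball. It also assumes the hyperbolic similarity matrix is a Gaussian kernel of the Poincaré geodesic distance d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The paper's exact similarity matrix may differ, and so may its use of k-means on the spectral embedding. Here, the sign of the Fiedler vector of the normalized Laplacian simply splits the points into two clusters. Function names and `sigma` are hypothetical.

```python
import numpy as np


def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))


def hyperbolic_spectral_bipartition(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Two-way spectral clustering with a Gaussian kernel on Poincaré distances."""
    n = x.shape[0]
    dist = np.array([[poincare_distance(x[i], x[j]) for j in range(n)] for i in range(n)])
    w = np.exp(-dist ** 2 / (2.0 * sigma ** 2))  # hyperbolic similarity matrix
    np.fill_diagonal(w, 0.0)
    deg = w.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    lap = np.eye(n) - d_inv_sqrt @ w @ d_inv_sqrt  # symmetric normalized Laplacian
    _, vecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    fiedler = d_inv_sqrt @ vecs[:, 1]  # second eigenvector, rescaled by D^(-1/2)
    return (fiedler > 0).astype(int)
```

Replacing `poincare_distance` with the Euclidean norm turns this back into ordinary spectral clustering. That is the one-line swap the abstract describes.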

algorithm, clustering, spectral clustering

2409.09304

Country:

North America > United States > Wisconsin (0.25)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Sep-14-2024

Distributed Clustering based on Distributional Kernel

Zhang, Hang, Xu, Yang, Gong, Lei, Zhu, Ye, Ting, Kai Ming

This paper introduces a new framework for clustering in a distributed network called Distributed Clustering based on Distributional Kernel (K) or KDC that produces the final clusters based on the similarity with respect to the distributions of initial clusters, as measured by K. It is the only framework that satisfies all three of the following properties. First, KDC guarantees that the combined clustering outcome from all sites is equivalent to the clustering outcome of its centralized counterpart from the combined dataset from all sites. Second, the maximum runtime cost of any site in distributed mode is smaller than the runtime cost in centralized mode. Third, it is designed to discover clusters of arbitrary shapes, sizes and densities. To the best of our knowledge, this is the first distributed clustering framework that employs a distributional kernel. The distribution-based clustering leads directly to significantly better clustering outcomes than existing methods of distributed clustering. In addition, we introduce a new clustering algorithm called Kernel Bounded Cluster Cores, which performs best among existing clustering algorithms when applied within KDC. We also show that KDC is a generic framework that enables a quadratic-time clustering algorithm to handle large datasets that it could not otherwise process.
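The following is a minimal sketch of point-to-distribution similarity. It uses a Gaussian kernel mean embedding in place of the distributional kernel K, which the abstract does not define. The sketch also illustrates one reason a distributed computation can match its centralized counterpart: for each initial cluster, the kernel sums and member counts are additive across sites. This is not KDC itself. All function names and `gamma` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """Pairwise Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def site_partial_sums(points, site_members, gamma):
    """Statistics one site can compute locally: for each initial cluster, the sum
    of kernel values from every query point to that site's members, plus counts."""
    sums = np.stack([gaussian_kernel(points, m, gamma).sum(axis=1) for m in site_members], axis=1)
    counts = np.array([len(m) for m in site_members], dtype=float)
    return sums, counts


def assign_by_distribution(points, sites, gamma=1.0):
    """Assign each point to the initial cluster whose empirical distribution has the
    highest mean kernel similarity; `sites` holds one list of member arrays per site."""
    total_sums, total_counts = 0.0, 0.0
    for site_members in sites:
        s, c = site_partial_sums(points, site_members, gamma)
        total_sums = total_sums + s  # pooled sums equal the centralized sums
        total_counts = total_counts + c
    return np.argmax(total_sums / total_counts, axis=1)
```

Calling `assign_by_distribution` with one site holding all members gives the centralized assignment. With several sites, it gives the same result from pooled partial sums.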

algorithm, clustering, dataset

2409.09418

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
Oceania > Australia (0.04)
North America > United States > Texas (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Quetti, Federico Maria, Figini, Silvia, Ballante, Elena

A Bayesian Approach to Clustering via the Proper Bayesian Bootstrap: the Bayesian Bagged Clustering (BBC) algorithm

arXiv.org Machine Learning Sep-13-2024

The paper presents a novel approach to unsupervised clustering. A new method is proposed that enhances existing models in the literature by using the proper Bayesian bootstrap to improve results in terms of robustness and interpretability. Our approach is organized in two steps: k-means clustering is used for prior elicitation, and then the proper Bayesian bootstrap is applied as a resampling method in an ensemble clustering approach. Results are analyzed by introducing measures of uncertainty based on Shannon entropy. The proposal provides a clear indication of the optimal number of clusters, as well as a better representation of the clustered data. Empirical results on simulated data show the methodological and empirical advances obtained.
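The following is a minimal sketch of bagged clustering with a Bayesian bootstrap and an entropy-based uncertainty measure. It rests on three assumptions:

- Rubin's Bayesian bootstrap (Dirichlet(1, ..., 1) weights) stands in for the proper Bayesian bootstrap, whose k-means-based prior is omitted.
- Weighted k-means starts from a deterministic farthest-point initialization.
- Uncertainty is summarized as the mean binary Shannon entropy of the co-association probabilities.

Function names and defaults are hypothetical.

```python
import numpy as np


def weighted_kmeans(x, w, k, iters=50):
    """Lloyd's algorithm where each point's pull on its centroid is scaled by w."""
    # Deterministic farthest-point initialization (assumed, for reproducibility).
    idx = [0]
    for _ in range(1, k):
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(d.argmax()))
    centers = x[idx].astype(float)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            m = labels == j
            if w[m].sum() > 0:
                centers[j] = (w[m][:, None] * x[m]).sum(axis=0) / w[m].sum()
    return labels


def bayesian_bagged_coassociation(x, k, n_boot=100, seed=0):
    """Fraction of Bayesian-bootstrap replicates in which each pair of points
    shares a cluster (Dirichlet(1, ..., 1) weights on the observations)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    co = np.zeros((n, n))
    for _ in range(n_boot):
        w = rng.dirichlet(np.ones(n))
        labels = weighted_kmeans(x, w, k)
        co += labels[:, None] == labels[None, :]
    return co / n_boot


def mean_pairwise_entropy(co):
    """Average binary Shannon entropy (bits) of the 'same cluster' probabilities."""
    p = np.clip(co[np.triu_indices_from(co, k=1)], 1e-12, 1 - 1e-12)
    return float(np.mean(-(p * np.log2(p) + (1 - p) * np.log2(1 - p))))
```

One way such a measure could guide the choice of the number of clusters is to compare `mean_pairwise_entropy` across candidate values of k. Stable partitions give co-association values near 0 or 1, and hence low entropy.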

bootstrap, dataset, different value

2409.08954

Country:

Europe > Italy (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report (1.00)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

Maréchal, Loïc, Monnet, Nathan

Disentangling the sources of cyber risk premia

arXiv.org Artificial Intelligence Sep-13-2024

We use a methodology based on a machine learning algorithm to quantify firms' cyber risks based on their disclosures and a dedicated cyber corpus. The model can identify paragraphs related to predetermined cyber-threat types and accordingly attribute several related cyber scores to the firm. The cyber scores are unrelated to other firm characteristics. Stocks with high cyber scores significantly outperform other stocks. The long-short cyber risk factors have positive risk premia, are robust to all benchmark factor models, and help price returns. Furthermore, we suggest that the market does not distinguish between different types of cyber risks but instead views them as a single, aggregate cyber risk.
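The following is a minimal sketch of a one-period long-short factor return: buy the stocks with the highest cyber scores and short those with the lowest. It assumes equal weights and quintile breakpoints, neither of which the abstract specifies. `long_short_return` is a hypothetical name.

```python
import numpy as np


def long_short_return(scores: np.ndarray, returns: np.ndarray, q: float = 0.2) -> float:
    """Equal-weighted return of the top-q score bucket minus the bottom-q bucket
    for one period; scores[i] and returns[i] refer to the same stock."""
    order = np.argsort(scores)  # ascending by score
    n_bucket = max(1, int(len(scores) * q))
    short_leg = returns[order[:n_bucket]].mean()  # lowest scores
    long_leg = returns[order[-n_bucket:]].mean()  # highest scores
    return float(long_leg - short_leg)
```

Averaging this quantity over many periods gives the kind of factor premium the abstract reports as positive.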

cyber score, paragraph, portfolio

2409.08728

Country:

North America > United States (0.46)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Germany > Berlin (0.04)
Asia > China (0.04)

Genre:

Financial News (0.93)
Research Report > New Finding (0.92)
Research Report > Experimental Study (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.69)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

arXiv.org Machine Learning Sep-12-2024

Federated One-Shot Ensemble Clustering

Duan, Rui, Xiong, Xin, Liu, Jueyi, Liao, Katherine P., Cai, Tianxi

Cluster analysis across multiple institutions poses significant challenges due to data-sharing restrictions. To overcome these limitations, we introduce the Federated One-shot Ensemble Clustering (FONT) algorithm, a novel solution tailored for multi-site analyses under such constraints. FONT requires only a single round of communication between sites and ensures privacy by exchanging only fitted model parameters and class labels. The algorithm combines locally fitted clustering models into a data-adaptive ensemble, making it broadly applicable to various clustering techniques and robust to differences in cluster proportions across sites. Our theoretical analysis validates the effectiveness of the data-adaptive weights learned by FONT, and simulation studies demonstrate its superior performance compared to existing benchmark methods. We applied FONT to identify subgroups of patients with rheumatoid arthritis across two health systems, revealing improved consistency of patient clusters across sites, while locally fitted clusters proved less transferable. FONT is particularly well-suited for real-world applications with stringent communication and privacy constraints, offering a scalable and practical solution for multi-site clustering.
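The following is a minimal sketch of one-shot ensemble clustering. It assumes each site shares only its k-means centroids, in a single round. The receiving site then combines these models through a weighted co-association matrix. FONT learns data-adaptive weights; here they default to equal weights. The consensus step (connected components above a 0.5 threshold) is an illustrative choice. All names are hypothetical.

```python
import numpy as np


def nearest_centroid_labels(x, centroids):
    """Apply a shared model: assign each local point to its nearest centroid."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def one_shot_ensemble(x_local, shared_centroids, weights=None, threshold=0.5):
    """Combine models received in one communication round into consensus labels
    for x_local via a weighted co-association graph."""
    m = len(shared_centroids)
    weights = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, float)
    n = len(x_local)
    co = np.zeros((n, n))
    for w, c in zip(weights, shared_centroids):
        lab = nearest_centroid_labels(x_local, c)
        co += w * (lab[:, None] == lab[None, :])
    # Consensus clusters = connected components of the graph with edges co >= threshold.
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack = [i]
        labels[i] = current
        while stack:
            j = stack.pop()
            for nb in np.flatnonzero(co[j] >= threshold):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels
```

Co-association records only whether two points share a label. The arbitrary label order of each site's model therefore does not matter, and neither do differing numbers of clusters across sites.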

algorithm, cluster membership, local model

2409.08396

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Jing, Yanzhi, Zhao, Hongguang, Yu, Shujun

Establish seedling quality classification standard for Chrysanthemum efficiently with help of deep clustering algorithm

arXiv.org Artificial Intelligence Sep-11-2024

Chrysanthemum is one of the most popular flowers in the world (Spaargaren and van Geest (2018)). With the advancement of modern medical and chemical technology, researchers have found that edible chrysanthemum is rich in functional health ingredients (Jingyun, Baiyi and Baojun (2021)), such as a variety of vitamins, minerals and amino acids, chlorogenic acid, quercetin and baicalin, etc. (Rop, Mlcek and Jurikova (2012)). The beneficial effects of Chrysanthemum are primarily attributed to its phenolic bioactive compounds, such as flavonoids and phenolic acids (Tian, Li, Li, Zhi, Li, Tang, Yang, Yin and Ming (2018)). These compounds are believed to possess antibacterial, antiviral, anti-inflammatory, and antioxidant properties, as well as free radical scavenging capabilities. They contribute to cardiovascular protection, prevention of coronary heart disease, ...

The establishment of seedling quality classification standards aims to ensure that the growth and yield of crops, horticultural plants, and forestry trees meet expected levels, thereby promoting the sustainable development of agriculture, horticulture, and forestry (Sutton (1980)). These standards not only ensure production quality and increase yield and quality but also enhance plant resistance to pests and diseases, promote varietal improvement, reduce production risks, regulate market order, and facilitate international trade (Novikov, Sokolov, Drapalyuk, Zelikov and Ivetić (2019)). Overall, the implementation of seedling quality classification standards helps optimize production, protect the environment, and improve economic benefits, thereby laying a solid foundation for the sustainable development of agriculture and related industries.

classification standard, correlation, quality classification standard

2409.08867

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > California (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Education > Health & Safety > School Nutrition (0.86)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Gerard, Patrick, Volkova, Svitlana, Penafiel, Louis, Lerman, Kristina, Weninger, Tim

Modeling Information Narrative Detection and Evolution on Telegram during the Russia-Ukraine War

arXiv.org Artificial Intelligence Sep-11-2024

Following the Russian Federation's full-scale invasion of Ukraine in February 2022, a multitude of information narratives emerged within both pro-Russian and pro-Ukrainian communities online. As the conflict progresses, so too do the information narratives, constantly adapting and influencing local and global community perceptions and attitudes. This dynamic nature of the evolving information environment (IE) underscores a critical need to fully discern how narratives evolve and affect online communities. Existing research, however, often fails to capture information narrative evolution, overlooking both the fluid nature of narratives and the internal mechanisms that drive their evolution. Recognizing this, we introduce a novel approach designed to both model narrative evolution and uncover the underlying mechanisms driving them. In this work we perform a comparative discourse analysis across communities on Telegram covering the initial three months following the invasion. First, we uncover substantial disparities in narratives and perceptions between pro-Russian and pro-Ukrainian communities. Then, we probe deeper into prevalent narratives of each group, identifying key themes and examining the underlying mechanisms fueling their evolution. Finally, we explore influences and factors that may shape the development and spread of narratives.

evolution, narrative, story cluster

2409.07684

Country:

Asia > Russia (1.00)
Europe > Russia (0.73)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Media > News (1.00)
Government > Military (1.00)
Government > Regional Government > Europe Government > Russia Government (0.46)
Government > Regional Government > Asia Government > Russia Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)