AITopics

Federated learning is a technique that enables the use of distributed datasets for machine learning purposes without requiring data to be pooled, thereby better preserving privacy and ownership of the data. While supervised FL research has grown substantially over the last years, unsupervised FL methods remain scarce. This work introduces an algorithm which implements K-means clustering in a federated manner, addressing the challenges of varying number of clusters between centers, as well as convergence on less separable datasets.

algorithm, dataset, noise, (13 more...)

2310.01195

Country:

Europe > Netherlands > South Holland > Delft (0.05)
North America > United States > Virginia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

NP$^2$L: Negative Pseudo Partial Labels Extraction for Graph Neural Networks

Shen, Xinjie, Wu, Danyang, Lu, Jitao, Liang, Junjie, Xu, Jin, Nie, Feiping

How to utilize the pseudo labels has always been a research hotspot in machine learning. However, most methods use pseudo labels as supervised training, and lack of valid assessing for their accuracy. Moreover, applications of pseudo labels in graph neural networks (GNNs) oversee the difference between graph learning and other machine learning tasks such as message passing mechanism. Aiming to address the first issue, we found through a large number of experiments that the pseudo labels are more accurate if they are selected by not overlapping partial labels and defined as negative node pairs relations. Therefore, considering the extraction based on pseudo and partial labels, negative edges are constructed between two nodes by the negative pseudo partial labels extraction (NP$^2$E) module. With that, a signed graph are built containing highly accurate pseudo labels information from the original graph, which effectively assists GNN in learning at the message-passing level, provide one solution to the second issue. Empirical results about link prediction and node classification tasks on several benchmark datasets demonstrate the effectiveness of our method. State-of-the-art performance is achieved on the both tasks.

node, partial label, pseudo partial label, (13 more...)

2310.01098

Country:

North America > United States > Texas (0.05)
North America > United States > Wisconsin (0.05)
Asia > China > Shaanxi Province > Xi'an (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Khaksar, Weria, Astrup, Rasmus

Multi-Sensor Terrestrial SLAM for Real-Time, Large-Scale, and GNSS-Interrupted Forest Mapping

However, conducting real-time forest inventory in large-scale and GNSS-interrupted forest environments has long been a formidable challenge. In this paper, we present a novel solution that leverages robotics and sensor-fusion technologies to overcome these challenges and enable real-time forest inventory with higher accuracy and efficiency. The proposed solution consists of a new SLAM algorithm to create an accurate 3D map of large-scale forest stands with detailed estimation about the number of trees and the corresponding DBH, solely with the consecutive scans of a 3D lidar and an imu. This method utilized a hierarchical unsupervised clustering algorithm to detect the trees and measure the DBH from the lidar point cloud. The algorithm can run simultaneously as the data is being recorded or afterwards on the recorded dataset. Furthermore, due to the proposed fast feature extraction and transform estimation modules, the recorded data can be fed to the SLAM with higher frequency than common SLAM algorithms. The performance of the proposed solution was tested through filed data collection with hand-held sensor platform as well as a mobile forestry robot. The accuracy of the results was also compared to the state-of-the-art SLAM solutions.

algorithm, estimation, point cloud, (16 more...)

2310.01064

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)
(11 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Information Technology (0.93)
Food & Agriculture > Agriculture (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

HiGen: Hierarchical Graph Generative Networks

Karami, Mahdi

Most real-world graphs exhibit a hierarchical structure, which is often overlooked by existing graph generation methods. To address this limitation, we propose a novel graph generative network that captures the hierarchical nature of graphs and successively generates the graph sub-structures in a coarse-to-fine fashion. At each level of hierarchy, this model generates communities in parallel, followed by the prediction of cross-edges between communities using separate neural networks. This modular approach enables scalable graph generation for large and complex graphs. Moreover, we model the output distribution of edges in the hierarchical graph with a multinomial distribution and derive a recursive factorization for this distribution. This enables us to generate community graphs with integer-valued edge weights in an autoregressive manner. Empirical studies demonstrate the effectiveness and scalability of our proposed generative model, achieving state-of-the-art performance in terms of graph quality across various benchmark datasets. The code is available at https://github.com/Karami-m/HiGen_main.

graph, multinomial distribution, node, (14 more...)

2305.19337

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Luan, Qinmeng, Hamp, James

Automated regime detection in multidimensional time series data using sliced Wasserstein k-means clustering

arXiv.org Machine LearningOct-2-2023

Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to identify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We study the dynamics of the algorithm and investigate how varying different hyperparameters impacts the performance of the clustering algorithm for different random initialisations. We compute simple metrics that we find are useful in identifying high-quality clusterings. Then, we extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call `sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime detection in multidimensional time series data, using synthetic data to demonstrate the validity of the approach. Finally, we show that the sWk-means method is effective in identifying distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.

artificial intelligence, machine learning, regime, (17 more...)

arXiv.org Machine Learning

2310.01285

Country:

Asia > Japan (0.14)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.81)

Industry: Banking & Finance > Economy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceOct-1-2023

Determining the Optimal Number of Clusters for Time Series Datasets with Symbolic Pattern Forest

Raihan, Md Nishat

Clustering algorithms are among the most widely used data mining methods due to their exploratory power and being an initial preprocessing step that paves the way for other techniques. But the problem of calculating the optimal number of clusters (say k) is one of the significant challenges for such methods. The most widely used clustering algorithms like k-means and k-shape in time series data mining also need the ground truth for the number of clusters that need to be generated. In this work, we extended the Symbolic Pattern Forest algorithm, another time series clustering algorithm, to determine the optimal number of clusters for the time series datasets. We used SPF to generate the clusters from the datasets and chose the optimal number of clusters based on the Silhouette Coefficient, a metric used to calculate the goodness of a clustering technique. Silhouette was calculated on both the bag of word vectors and the tf-idf vectors generated from the SAX words of each time series. We tested our approach on the UCR archive datasets, and our experimental results so far showed significant improvement over the baseline.

dataset, time sery, vector, (10 more...)

2310.0082

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

de-Lima-Santos, Mathias-Felipe, Gonçalves, Isabella, Quiles, Marcos G., Mesquita, Lucia, Ceron, Wilson

Visual Political Communication in a Polarized Society: A Longitudinal Study of Brazilian Presidential Elections on Instagram

In today's digital age, images have emerged as powerful tools for politicians to engage with their voters on social media platforms. Visual content possesses a unique emotional appeal that often leads to increased user engagement. However, research on visual communication remains relatively limited, particularly in the Global South. This study aims to bridge this gap by employing a combination of computational methods and qualitative approach to investigate the visual communication strategies employed in a dataset of 11,263 Instagram posts by 19 Brazilian presidential candidates in 2018 and 2022 national elections. Through two studies, we observed consistent patterns across these candidates on their use of visual political communication. Notably, we identify a prevalence of celebratory and positively toned images. They also exhibit a strong sense of personalization, portraying candidates connected with their voters on a more emotional level. We note a substantial presence of screenshots from news websites and other social media platforms. Furthermore, text-edited images with portrayals emerge as a prominent feature. In light of these results, we engage in a discussion regarding the implications for the broader field of visual political communication. This article serves as a testament to the pivotal role that Instagram has played in shaping the narrative of two fiercely polarized Brazilian elections, casting a revealing light on the ever-evolving dynamics of visual political communication in the digital age. Finally, we propose avenues for future research in the realm of visual political communication. Introduction In the ever-evolving arena of election campaigns, candidates rely heavily on the media as their megaphone to amplify their messages to the masses. Over the years, the landscape of political communication has undergone a profound transformation. This transformation has been driven by the rise of online social media platforms, which have emerged as indispensable tools for candidates in their quest to gauge public sentiment and rally support from the electorate (Boulianne & Olof Larsson, 2023; Farkas & Bene, 2021). The significance of this transformation has been further accentuated by the global ascent of populist leaders, spanning diverse nations, who have wholeheartedly embraced social media as their primary mode of communication (Bernardi & Costa, 2020; Novoselova, 2020).

communication, instagram, political communication, (15 more...)

2310.00349

Country:

North America > United States (0.28)
South America > Brazil > São Paulo (0.04)
North America > Central America (0.04)
(5 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > South America Government > Brazil Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Zhang, Guangxin, Chen, Shu

Siamese Representation Learning for Unsupervised Relation Extraction

Unsupervised relation extraction (URE) aims at discovering underlying relations between named entity pairs from open-domain plain text without prior information on relational distribution. Existing URE models utilizing contrastive learning, which attract positive samples and repulse negative samples to promote better separation, have got decent effect. However, fine-grained relational semantic in relationship makes spurious negative samples, damaging the inherent hierarchical structure and hindering performances. To tackle this problem, we propose Siamese Representation Learning for Unsupervised Relation Extraction -- a novel framework to simply leverage positive pairs to representation learning, possessing the capability to effectively optimize relation representation of instances and retain hierarchical information in relational feature space. Experimental results show that our model significantly advances the state-of-the-art results on two benchmark datasets and detailed analyses demonstrate the effectiveness and robustness of our proposed model on unsupervised relation extraction.

proceedings, relation, representation, (14 more...)

2310.00552

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Bhope, Rahul Atul, Jayaram, K. R., Venkatasubramanian, Nalini, Verma, Ashish, Thomas, Gegi

FLIPS: Federated Learning using Intelligent Participant Selection

This paper presents the design and implementation of FLIPS, a middleware system to manage data and participant heterogeneity in federated learning (FL) training workloads. In particular, we examine the benefits of label distribution clustering on participant selection in federated learning. FLIPS clusters parties involved in an FL training job based on the label distribution of their data apriori, and during FL training, ensures that each cluster is equitably represented in the participants selected. FLIPS can support the most common FL algorithms, including FedAvg, FedProx, FedDyn, FedOpt and FedYogi. To manage platform heterogeneity and dynamic resource availability, FLIPS incorporates a straggler management mechanism to handle changing capacities in distributed, smart community applications. Privacy of label distributions, clustering and participant selection is ensured through a trusted execution environment (TEE). Our comprehensive empirical evaluation compares FLIPS with random participant selection, as well as three other "smart" selection mechanisms - Oort, TiFL and gradient clustering using two real-world datasets, two benchmark datasets, two different non-IID distributions and three common FL algorithms (FedYogi, FedProx and FedAvg). We demonstrate that FLIPS significantly improves convergence, achieving higher accuracy by 17 - 20 % with 20 - 60 % lower communication costs, and these benefits endure in the presence of straggler participants.

accuracy, dataset, federated learning, (7 more...)

2308.03901

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia (0.04)
North America > United States > California > Orange County > Anaheim (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Auxo: Efficient Federated Learning via Scalable Client Clustering

Liu, Jiachen, Lai, Fan, Dai, Yinwei, Akella, Aditya, Madhyastha, Harsha, Chowdhury, Mosharaf

Federated learning (FL) is an emerging machine learning (ML) paradigm that enables heterogeneous edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server. However, beyond the heterogeneous device capacity, FL participants often exhibit differences in their data distributions, which are not independent and identically distributed (Non-IID). Many existing works present point solutions to address issues like slow convergence, low final accuracy, and bias in FL, all stemming from client heterogeneity. In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts). We propose Auxo to gradually identify such cohorts in large-scale, low-availability, and resource-constrained FL populations. Auxo then adaptively determines how to train cohort-specific models in order to achieve better model performance and ensure resource efficiency. Our extensive evaluations show that, by identifying cohorts with smaller heterogeneity and performing efficient cohort-based training, Auxo boosts various existing FL solutions in terms of final accuracy (2.1% - 8.2%), convergence time (up to 2.2x), and model bias (4.8% - 53.8%).

auxo, cohort, heterogeneity, (15 more...)

doi: 10.1145/3620678.3624651

2210.16656

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Santa Cruz County > Santa Cruz (0.05)
North America > United States > Virginia (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)