AITopics

2410.03208

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Das, Rohan, Chandra, Aditya, Lee, I-Ta, Pacheco, Maria Leonor

Media Framing through the Lens of Event-Centric Narratives

arXiv.org Artificial IntelligenceOct-4-2024

From a communications perspective, a frame defines the packaging of the language used in such a way as to encourage certain interpretations and to discourage others. For example, a news article can frame immigration as either a boost or a drain on the economy, and thus communicate very different interpretations of the same phenomenon. In this work, we argue that to explain framing devices we have to look at the way narratives are constructed. As a first step in this direction, we propose a framework that extracts events and their relations to other events, and groups them into high-level narratives that help explain frames in news articles. We show that our framework can be used to analyze framing in U.S. news for two different domains: immigration and gun control.

computational linguistic, narrative cluster, relation, (15 more...)

2410.03151

Country:

North America > Mexico (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Immigration & Customs (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Machine LearningOct-4-2024

Clustering Alzheimer's Disease Subtypes via Similarity Learning and Graph Diffusion

Wei, Tianyi, Yang, Shu, Tarzanagh, Davoud Ataee, Bao, Jingxuan, Xu, Jia, Orzechowski, Patryk, Wagenaar, Joost B., Long, Qi, Shen, Li

Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Due to the heterogeneous nature of AD, its diagnosis and treatment pose critical challenges. Consequently, there is a growing research interest in identifying homogeneous AD subtypes that can assist in addressing these challenges in recent years. In this study, we aim to identify subtypes of AD that represent distinctive clinical features and underlying pathology by utilizing unsupervised clustering with graph diffusion and similarity learning. We adopted SIMLR, a multi-kernel similarity learning framework, and graph diffusion to perform clustering on a group of 829 patients with AD and mild cognitive impairment (MCI, a prodromal stage of AD) based on their cortical thickness measurements extracted from magnetic resonance imaging (MRI) scans. Although the clustering approach we utilized has not been explored for the task of AD subtyping before, it demonstrated significantly better performance than several commonly used clustering methods. Specifically, we showed the power of graph diffusion in reducing the effects of noise in the subtype detection. Our results revealed five subtypes that differed remarkably in their biomarkers, cognitive status, and some other clinical features. To evaluate the resultant subtypes further, a genetic association study was carried out and successfully identified potential genetic underpinnings of different AD subtypes. Our source code is available at: https://github.com/PennShenLab/AD-SIMLR.

alzheimer, graph diffusion, subtype, (15 more...)

arXiv.org Machine Learning

2410.03937

Country:

North America > United States > California (0.28)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > Canada (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

Fair Clustering Through Fairlets

Neural Information Processing SystemsOct-3-2024, 23:07:28 GMT

We study the question of fair clustering under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions--for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the price of fairness by quantifying the value of fair clustering on real-world datasets with sensitive attributes.

algorithm, decomposition, fairlet decomposition, (14 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Italy > Lazio > Rome (0.04)

Industry: Law (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Neural Information Processing SystemsOct-3-2024, 05:39:45 GMT

Adaptive Clustering through Semidefinite Programming

Martin Royer

Neural Information Processing Systems http://nips.cc/

corr, estimator, recovery, (14 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Neural Information Processing SystemsOct-3-2024, 04:40:30 GMT

Independence clustering (without a matrix)

Daniil Ryabko

Since mutual independence is the target, pairwise similarity measurements are of no use, and thus traditional clustering algorithms are inapplicable. The distribution of the random variables in S is, in general, unknown, but a sample is available. Thus, the problem is cast in terms of time series. Two forms of sampling are considered: i.i.d. and stationary time series, with the main emphasis being on the latter, more general, case. A consistent, computationally tractable algorithm for each of the settings is proposed, and a number of fascinating open directions for further research are outlined.

algorithm, independence, mutual information, (15 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France (0.04)
(3 more...)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Neural Information Processing SystemsOct-3-2024, 02:28:06 GMT

Sparse Embedded $k$-Means Clustering

Weiwei Liu, Xiaobo Shen, Ivor Tsang

The k-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster k-means clustering. Their method delivers many improvements over other dimensionality reduction methods.

dimension, matrix, probability, (14 more...)

Country:

Oceania > Australia > New South Wales (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Mohammadhossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, Vahab Mirrokni

Affinity Clustering: Hierarchical Clustering at Scale

Neural Information Processing SystemsOct-3-2024, 01:39:40 GMT

Graph clustering is a fundamental task in many data-mining and machine-learning pipelines. In particular, identifying a good hierarchical structure is at the same time a fundamental and challenging problem for several applications. The amount of data to analyze is increasing at an astonishing rate each day. Hence there is a need for new solutions to efficiently compute effective hierarchical clusterings on such huge data. The main focus of this paper is on minimum spanning tree (MST) based clusterings. In particular, we propose affinity, a novel hierarchical clustering based on Borůvka's MST algorithm. We prove certain theoretical guarantees for affinity (as well as some other classic algorithms) and show that in practice it is superior to several other state-of-the-art clustering algorithms.

affinity, algorithm, proceedings, (15 more...)

Country:

North America > United States > Maryland (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(16 more...)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ghadiri, Nasser, Javadi, Bahman, Obst, Oliver, Pfautsch, Sebastian

Data Optimisation of Machine Learning Models for Smart Irrigation in Urban Parks

arXiv.org Artificial IntelligenceOct-3-2024

Urban environments face significant challenges due to climate change, including extreme heat, drought, and water scarcity, which impact public health, community well-being, and local economies. Effective management of these issues is crucial, particularly in areas like Sydney Olympic Park, which relies on one of Australia's largest irrigation systems. The Smart Irrigation Management for Parks and Cool Towns (SIMPaCT) project, initiated in 2021, leverages advanced technologies and machine learning models to optimize irrigation and induce physical cooling. This paper introduces two novel methods to enhance the efficiency of the SIMPaCT system's extensive sensor network and applied machine learning models. The first method employs clustering of sensor time series data using K-shape and K-means algorithms to estimate readings from missing sensors, ensuring continuous and reliable data. This approach can detect anomalies, correct data sources, and identify and remove redundant sensors to reduce maintenance costs. The second method involves sequential data collection from different sensor locations using robotic systems, significantly reducing the need for high numbers of stationary sensors. Together, these methods aim to maintain accurate soil moisture predictions while optimizing sensor deployment and reducing maintenance costs, thereby enhancing the efficiency and effectiveness of the smart irrigation system. Our evaluations demonstrate significant improvements in the efficiency and cost-effectiveness of soil moisture monitoring networks. The cluster-based replacement of missing sensors provides up to 5.4% decrease in average error. The sequential sensor data collection as a robotic emulation shows 17.2% and 2.1% decrease in average error for circular and linear paths respectively.

artificial intelligence, machine learning, sensor, (17 more...)

doi: 10.1109/IUCC65928.2024.00012

2410.02335

Country:

Asia > Thailand (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Food & Agriculture > Agriculture (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.99)

Wang, George, Hoogland, Jesse, van Wingerden, Stan, Furman, Zach, Murfet, Daniel

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

arXiv.org Artificial IntelligenceOct-3-2024

Structure in the data distribution has long been recognized as central to the development of internal structure in artificial and biological neural networks (Rumelhart et al., 1986; Olshausen & Field, 1996; Rogers & McClelland, 2004). Recent observations have renewed interest in this topic: language models progress through distinct stages of development during training, acquiring increasingly sophisticated linguistic and reasoning abilities in ways that seem to reflect the structure of the data distribution (Olsson et al., 2022; Chen et al., 2024; Belrose et al., 2024; Tigges et al., 2024; Edelman et al., 2024; Hoogland et al., 2024). A deeper understanding of how structure in the data determines internal structure in trained models requires tools that provide information about which components of a model are being shaped in response to what structure in the data distribution. Our foundation for the study of such questions begins with the local learning coefficient (LLC; Lau et al. 2023) from singular learning theory (SLT; Watanabe 2009), which is a measure of model complexity. In this paper, we introduce the refined local learning coefficient (rLLC), which measures the complexity of a component of the model with respect to an arbitrary data distribution. We focus mainly on the rLLCs of individual attention heads and demonstrate the utility of these metrics in studying the progressive differentiation and specialization of heads. The diversity of attention heads at the end of training has been established in recent years through mechanistic interpretability, which has provided numerous examples of attention heads that appear to have specialized functions, including previous-token heads (Voita et al., 2019; Clark et al., 2019) and induction heads (Olsson et al., 2022) among other kinds (Wang et al., 2023; Gould et al., 2024).

cluster 1, machine learning, natural language, (19 more...)