AITopics

2305.17518

Country:

Europe > Germany (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Chakraborty, Abhisek, Bhattacharya, Anirban, Pati, Debdeep

Fair Clustering via Hierarchical Fair-Dirichlet Process

arXiv.org Artificial IntelligenceMay-27-2023

The advent of ML-driven decision-making and policy formation has led to an increasing focus on algorithmic fairness. As clustering is one of the most commonly used unsupervised machine learning approaches, there has naturally been a proliferation of literature on {\em fair clustering}. A popular notion of fairness in clustering mandates the clusters to be {\em balanced}, i.e., each level of a protected attribute must be approximately equally represented in each cluster. Building upon the original framework, this literature has rapidly expanded in various aspects. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature which is almost exclusively based on optimizing appropriate objective functions.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2305.17557

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

O'Connell, Brandon J., Rogers, Timothy J.

A Robust Probabilistic Approach to Stochastic Subspace Identification

arXiv.org Artificial IntelligenceMay-26-2023

Modal parameter estimation of operational structures is often a challenging task when confronted with unwanted distortions (outliers) in field measurements. Atypical observations present a problem to operational modal analysis (OMA) algorithms, such as stochastic subspace identification (SSI), severely biasing parameter estimates and resulting in misidentification of the system. Despite this predicament, no simple mechanism currently exists capable of dealing with such anomalies in SSI. Addressing this problem, this paper first introduces a novel probabilistic formulation of stochastic subspace identification (Prob-SSI), realised using probabilistic projections. Mathematically, the equivalence between this model and the classic algorithm is demonstrated. This fresh perspective, viewing SSI as a problem in probabilistic inference, lays the necessary mathematical foundation to enable a plethora of new, more sophisticated OMA approaches. To this end, a statistically robust SSI algorithm (robust Prob-SSI) is developed, capable of providing a principled and automatic way of handling outlying or anomalous data in the measured timeseries, such as may occur in field recordings, e.g. intermittent sensor dropout. Robust Prob-SSI is shown to outperform conventional SSI when confronted with 'corrupted' data, exhibiting improved identification performance and higher levels of confidence in the found poles when viewing consistency (stabilisation) diagrams. Similar benefits are also demonstrated on the Z24 Bridge benchmark dataset, highlighting enhanced performance on measured systems.

artificial intelligence, machine learning, robust prob-ssi, (14 more...)

2305.16836

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Jaiswal, Ragesh, Kumar, Amit

Universal Weak Coreset

arXiv.org Artificial IntelligenceMay-26-2023

Coresets for $k$-means and $k$-median problems yield a small summary of the data, which preserve the clustering cost with respect to any set of $k$ centers. Recently coresets have also been constructed for constrained $k$-means and $k$-median problems. However, the notion of coresets has the drawback that (i) they can only be applied in settings where the input points are allowed to have weights, and (ii) in general metric spaces, the size of the coresets can depend logarithmically on the number of points. The notion of weak coresets, which have less stringent requirements than coresets, has been studied in the context of classical $k$-means and $k$-median problems. A weak coreset is a pair $(J,S)$ of subsets of points, where $S$ acts as a summary of the point set and $J$ as a set of potential centers. This pair satisfies the properties that (i) $S$ is a good summary of the data as long as the $k$ centers are chosen from $J$ only, and (ii) there is a good choice of $k$ centers in $J$ with cost close to the optimal cost. We develop this framework, which we call universal weak coresets, for constrained clustering settings. In conjunction with recent coreset constructions for constrained settings, our designs give greater data compression, are conceptually simpler, and apply to a wide range of constrained $k$-median and $k$-means problems.

artificial intelligence, coreset, machine learning, (17 more...)

2305.1689

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Germany (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

Durdy, Samantha, Gaultois, Michael W., Gusev, Vladimir, Bollegala, Danushka, Rosseinsky, Matthew J.

Metrics for quantifying isotropy in high dimensional unsupervised clustering tasks in a materials context

Clustering is a common task in machine learning, but clusters of unlabelled data can be hard to quantify. The application of clustering algorithms in chemistry is often dependant on material representation. Ascertaining the effects of different representations, clustering algorithms, or data transformations on the resulting clusters is difficult due to the dimensionality of these data. We present a thorough analysis of measures for isotropy of a cluster, including a novel implantation based on an existing derivation. Using fractional anisotropy, a common method used in medical imaging for comparison, we then expand these measures to examine the average isotropy of a set of clusters. A use case for such measures is demonstrated by quantifying the effects of kernel approximation functions on different representations of the Inorganic Crystal Structure Database. Broader applicability of these methods is demonstrated in analysing learnt embedding of the MNIST dataset. Random clusters are explored to examine the differences between isotropy measures presented, and to see how each method scales with the dimensionality. Python implementations of these measures are provided for use by the community.

data mining, isotropy, machine learning, (19 more...)

2305.16372

Country: Europe (0.67)

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Areed, wala Draidi, Price, Aiden, Arnett, Kathryn, Thompson, Helen, Malseed, Reid, Mengersen, Kerrie

Assessing the Spatial Structure of the Association between Attendance at Preschool and Childrens Developmental Vulnerabilities in Queensland Australia

Demographic and educational factors are essential, influential factors of early childhood development. This study aimed to investigate spatial patterns in the association between attendance at preschool and children's developmental vulnerabilities in one or more domain(s) in their first year of full-time school at a small area level in Queensland, Australia. This was achieved by applying geographically weighted regression (GWR) followed by K-means clustering of the regression coefficients. Three distinct geographical clusters were found in Queensland using the GWR coefficients. The first cluster covered more than half of the state of Queensland, including the Greater Brisbane region, and displays a strong negative association between developmental vulnerabilities and attendance at preschool. That is, areas with high proportions of preschool attendance tended to have lower proportions of children with at least one developmental vulnerability in the first year of full-time school. Clusters two and three were characterized by stronger negative associations between developmental vulnerabilities, English as the mother language, and geographic remoteness, respectively. This research provides evidence of the need for collaboration between health and education sectors in specific regions of Queensland to update current service provision policies and to ensure holistic and appropriate care is available to support children with developmental vulnerabilities.

artificial intelligence, coefficient, machine learning, (16 more...)

2305.15746

Country:

Oceania > Australia > Queensland (1.00)
Oceania > Australia > Tasmania (0.04)
Oceania > Australia > Northern Territory (0.04)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Health & Medicine > Public Health (0.68)
Education > Educational Setting (0.68)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ordered and Binary Speaker Embedding

Wang, Jiaying, Wang, Xianglong, Wang, Namin, Li, Lantian, Wang, Dong

Modern speaker recognition systems represent utterances by embedding vectors. Conventional embedding vectors are dense and non-structural. In this paper, we propose an ordered binary embedding approach that sorts the dimensions of the embedding vector via a nested dropout and converts the sorted vectors to binary codes via Bernoulli sampling. The resultant ordered binary codes offer some important merits such as hierarchical clustering, reduced memory usage, and fast retrieval. These merits were empirically verified by comprehensive experiments on a speaker identification task with the VoxCeleb and CN-Celeb datasets.

artificial intelligence, binary code, machine learning, (20 more...)

2305.16043

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Unsupervised machine learning framework for discriminating major variants of concern during COVID-19

Chandra, Rohitash, Bansal, Chaarvi, Kang, Mingyue, Blau, Tom, Agarwal, Vinti, Singh, Pranjal, Wilson, Laurence O. W., Vasan, Seshadri

Due to the high mutation rate of the virus, the COVID-19 pandemic evolved rapidly. Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to severe transmission and death rates. These variants burdened the medical systems worldwide with a major impact to travel, productivity, and the world economy. Unsupervised machine learning methods have the ability to compress, characterize, and visualize unlabelled data. This paper presents a framework that utilizes unsupervised machine learning methods to discriminate and visualize the associations between major COVID-19 variants based on their genome sequences. These methods comprise a combination of selected dimensionality reduction and clustering techniques. The framework processes the RNA sequences by performing a k-mer analysis on the data and further visualises and compares the results using selected dimensionality reduction methods that include principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation projection (UMAP). Our framework also employs agglomerative hierarchical clustering to visualize the mutational differences among major variants of concern and country-wise mutational differences for selected variants (Delta and Omicron) using dendrograms. We also provide country-wise mutational differences for selected variants via dendrograms. We find that the proposed framework can effectively distinguish between the major variants and has the potential to identify emerging variants in the future.

artificial intelligence, machine learning, variant, (18 more...)

doi: 10.1371/journal.pone.0285719

2208.01439

Country:

North America > United States (0.14)
Africa > South Africa (0.04)
Europe > Spain (0.04)
(24 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Huijben, Iris A. M., Nijdam, Arthur A., Overeem, Sebastiaan, van Gilst, Merel M., van Sloun, Ruud J. G.

SOM-CPC: Unsupervised Contrastive Learning with Self-Organizing Maps for Structured Representations of High-Rate Time Series

Continuous monitoring with an ever-increasing number of sensors has become ubiquitous across many application domains. However, acquired time series are typically high-dimensional and difficult to interpret. Expressive deep learning (DL) models have gained popularity for dimensionality reduction, but the resulting latent space often remains difficult to interpret. In this work we propose SOM-CPC, a model that visualizes data in an organized 2D manifold, while preserving higher-dimensional information. We address a largely unexplored and challenging set of scenarios comprising high-rate time series, and show on both synthetic and real-life data (physiological data and audio recordings) that SOM-CPC outperforms strong baselines like DL-based feature extraction, followed by conventional dimensionality reduction techniques, and models that jointly optimize a DL model and a Self-Organizing Map (SOM). SOM-CPC has great potential to acquire a better understanding of latent patterns in high-rate data streams.

artificial intelligence, machine learning, som-cpc, (17 more...)

2205.15875

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Arizona (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Sleep (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model

Shi, Hui, Gu, Yupeng, Zhou, Yitong, Zhao, Bo, Gao, Sicun, Zhao, Jishen

User embeddings (vectorized representations of a user) are essential in recommendation systems. Numerous approaches have been proposed to construct a representation for the user in order to find similar items for retrieval tasks, and they have been proven effective in industrial recommendation systems as well. Recently people have discovered the power of using multiple embeddings to represent a user, with the hope that each embedding represents the user's interest in a certain topic. With multi-interest representation, it's important to model the user's preference over the different topics and how the preference change with time. However, existing approaches either fail to estimate the user's affinity to each interest or unreasonably assume every interest of every user fades with an equal rate with time, thus hurting the recall of candidate retrieval. In this paper, we propose the Multi-Interest Preference (MIP) model, an approach that not only produces multi-interest for users by using the user's sequential engagement more effectively but also automatically learns a set of weights to represent the preference over each embedding so that the candidates can be retrieved from each interest proportionally. Extensive experiments have been done on various industrial-scale datasets to demonstrate the effectiveness of our approach.

artificial intelligence, machine learning, representation, (17 more...)

2207.06652

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.47)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.69)