AITopics

A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally, the results of each block are clustered on the basis of the similarity between locally-calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can be higher than the limitation. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduce a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, our method improves the buffer update method and revisits the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization of an unseen number of speakers in both offline and online inferences.
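
The block-then-cluster idea can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so the greedy cosine-similarity grouping below is a stand-in chosen for brevity, not the authors' method. The function name, the `threshold` value and the toy attractors are all hypothetical, and attractors are assumed to be non-zero vectors.

```python
import numpy as np

def cluster_local_attractors(blocks, threshold=0.7):
    """Assign a global speaker id to every block-local attractor.

    blocks: list of arrays, each of shape (n_local_speakers, dim).
    Returns one list of global ids per block.
    Greedy stand-in for the clustering step (an assumption, see lead-in).
    """
    centroids = []  # running sum of unit attractors per global speaker
    labels = []
    for attractors in blocks:
        unit = attractors / np.linalg.norm(attractors, axis=1, keepdims=True)
        used = set()  # two attractors in one block never share a speaker
        block_labels = []
        for vec in unit:
            best, best_sim = None, threshold
            for gid, c in enumerate(centroids):
                if gid in used:
                    continue
                sim = float(vec @ c / np.linalg.norm(c))
                if sim > best_sim:
                    best, best_sim = gid, sim
            if best is None:
                # No sufficiently similar speaker yet: open a new one.
                centroids.append(vec.copy())
                best = len(centroids) - 1
            else:
                centroids[best] = centroids[best] + vec
            used.add(best)
            block_labels.append(best)
        labels.append(block_labels)
    return labels

# Three blocks with two local speakers each, three speakers overall.
blocks = [
    np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    np.array([[0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]),
    np.array([[0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]),
]
print(cluster_local_attractors(blocks))  # [[0, 1], [1, 2], [2, 0]]
```

Each block outputs at most two speakers here, yet three global speakers are recovered, which is the property the abstract describes.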

artificial intelligence, diarization, machine learning

doi: 10.1109/TASLP.2022.3233237

2206.02432

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Barbot, Armand, Gatti, Riccardo

Unsupervised learning for structure detection in plastically deformed crystals

Detecting structures at the particle scale within plastically deformed crystalline materials allows a better understanding of the occurring phenomena. Previous approaches mostly relied on applying hand-chosen criteria to different local parameters and could therefore only detect already known structures. We introduce an unsupervised learning algorithm to automatically detect structures within a crystal under plastic deformation. This approach is based on a study developed for structural detection in colloidal materials. The algorithm has the advantage of being computationally fast and easy to implement. We show that by using local parameters based on bond-angle distributions, we are able to detect more structures, and with a higher degree of precision, than traditional hand-made criteria.

artificial intelligence, bad parameter, machine learning

2212.14813

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)

Bouchareb, Aichetou, Boullé, Marc, Clérot, Fabrice, Rossi, Fabrice

Co-clustering based exploratory analysis of mixed-type data tables

Co-clustering is a class of unsupervised data analysis techniques that extract the existing underlying dependency structure between the instances and variables of a data table as homogeneous blocks. Most of those techniques are limited to variables of the same type. In this paper, we propose a mixed data co-clustering method based on a two-step methodology. In the first step, all the variables are binarized according to a number of bins chosen by the analyst, by equal frequency discretization in the numerical case, or keeping the most frequent values in the categorical case. The second step applies a co-clustering to the instances and the binary variables, leading to groups of instances and groups of variable parts. We apply this methodology on several data sets and compare with the results of a Multiple Correspondence Analysis applied to the same data.
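
The first (binarization) step can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the function names are hypothetical, and pooling the less frequent categories into a final "other" column is an assumption, since the abstract only says that the most frequent values are kept.

```python
from collections import Counter

import numpy as np

def binarize_numeric(values, n_bins):
    """Equal-frequency discretization, one indicator column per bin."""
    values = np.asarray(values, dtype=float)
    # Interior quantiles serve as cut points between the bins.
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(cuts, values, side="right")
    return np.eye(n_bins, dtype=int)[bins]

def binarize_categorical(values, n_bins):
    """Keep the n_bins - 1 most frequent values; pool the rest (assumption)."""
    top = [v for v, _ in Counter(values).most_common(n_bins - 1)]
    index = {v: i for i, v in enumerate(top)}
    codes = np.array([index.get(v, n_bins - 1) for v in values])
    return np.eye(n_bins, dtype=int)[codes]

numeric_part = binarize_numeric([1, 2, 3, 4, 5, 6, 7, 8], n_bins=4)
categorical_part = binarize_categorical(list("abacabd"), n_bins=3)
# Instances x binary "variable parts", ready for a co-clustering step.
table = np.hstack([numeric_part[:7], categorical_part])
```

Every instance activates exactly one column per original variable, so the second step can co-cluster instances against these binary variable parts.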

artificial intelligence, machine learning, variable part

doi: 10.1007/978-3-030-18129-1_2

2212.11728

Country:

Europe > France > Île-de-France > Paris > Paris (0.14)
Europe > Germany (0.04)
Asia > Philippines (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Bouchareb, Aichetou, Boullé, Marc, Clérot, Fabrice, Rossi, Fabrice

Model Based Co-clustering of Mixed Numerical and Binary Data

The goal of co-clustering is to jointly perform a clustering of rows and a clustering of columns of a data table. Proposed by [Good, 1965] then by [Hartigan, 1975], co-clustering is an extension of standard clustering that extracts the underlying structure in the data in the form of clusters of rows and clusters of columns. The advantage of this technique over standard clustering lies in the joint (simultaneous) analysis of the rows and columns, which enables extracting the maximum amount of information about the interdependence between the two entities. The utility of co-clustering lies in its capacity to create easily interpretable clusters and its capability to reduce a large data table into a significantly smaller matrix having the same structure as the original...

Aichetou Bouchareb, Marc Boullé and Fabrice Clérot: Orange Labs, 2 Avenue Pierre Marzin, 22300 Lannion, France.

artificial intelligence, data mining, machine learning

doi: 10.1007/978-3-030-18129-1_1

2212.11725

Country:

Europe > France > Île-de-France > Paris > Paris (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Das, Soumita, Biswas, Anupam, Saxena, Akrati

DCC: A Cascade based Approach to Detect Communities in Social Networks

arXiv.org Artificial Intelligence, Dec-21-2022

Community detection in social networks is associated with finding and grouping the most similar nodes inherent in the network. These similar nodes are identified by computing tie strength. Stronger ties indicate higher proximity shared by connected node pairs. This work is motivated by Granovetter's argument that strong ties lie within densely connected nodes and by the theory that community cores in real-world networks are densely connected. In this paper, we introduce a novel method called Disjoint Community detection using Cascades (DCC), which demonstrates the effectiveness of a new local-density-based tie strength measure for detecting communities. Here, tie strength is used to decide the paths followed for propagating information. The idea is to crawl through the tuple information of cascades towards the community core, guided by increasing tie strength. Considering the cascade generation step, a novel preferential membership method has been developed to assign community labels to unassigned nodes. The efficacy of DCC has been analyzed in terms of quality and accuracy on several real-world datasets against baseline community detection algorithms.

artificial intelligence, data mining, machine learning

2212.10937

Country:

North America > United States > Nebraska > Douglas County > Omaha (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Services (0.63)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial Intelligence, Dec-21-2022

Understanding Postpartum Parents' Experiences via Two Digital Platforms

Yao, Xuewen, Mikhelson, Miriam, Micheletti, Megan, Choi, Eunsol, Watkins, S Craig, Thomaz, Edison, De Barbaro, Kaya

Digital platforms, including online forums and helplines, have emerged as avenues of support for caregivers suffering from postpartum mental health distress. Understanding support seekers' experiences as shared on these platforms could provide crucial insight into caregivers' needs during this vulnerable time. In the current work, we provide a descriptive analysis of the concerns, psychological states, and motivations shared by healthy and distressed postpartum support seekers on two digital platforms, a one-on-one digital helpline and a publicly available online forum. Using a combination of human annotations, dictionary models and unsupervised techniques, we find stark differences between the experiences of distressed and healthy mothers. Distressed mothers described interpersonal problems and a lack of support, with 8.60% - 14.56% reporting severe symptoms including suicidal ideation. In contrast, the majority of healthy mothers described childcare issues, such as questions about breastfeeding or sleeping, and reported no severe mental health concerns. Across the two digital platforms, we found that distressed mothers shared similar content. However, the patterns of speech and affect shared by distressed mothers differed between the helpline vs. the online forum, suggesting the design of these platforms may shape meaningful measures of their support-seeking experiences. Our results provide new insight into the experiences of caregivers suffering from postpartum mental health distress. We conclude by discussing methodological considerations for understanding content shared by support seekers and design considerations for the next generation of support tools for postpartum parents.

artificial intelligence, machine learning, natural language

2212.11455

Country:

North America > United States > Texas > Travis County > Austin (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Kaloga, Yacouba, Borgnat, Pierre, Habrard, Amaury

A Simple Way to Learn Metrics Between Attributed Graphs

arXiv.org Artificial Intelligence, Dec-21-2022

The choice of good distances and similarity measures between objects is important for many machine learning methods. Therefore, many metric learning algorithms have been developed in recent years, mainly for Euclidean data, in order to improve performance of classification or clustering methods. However, due to difficulties in establishing computable, efficient and differentiable distances between attributed graphs, few metric learning algorithms adapted to graphs have been developed despite the strong interest of the community. In this paper, we address this issue by proposing a new Simple Graph Metric Learning - SGML - model with few trainable parameters based on Simple Graph Convolutional Neural Networks - SGCN - and elements of Optimal Transport theory. This model allows us to build an appropriate distance from a database of labeled (attributed) graphs to improve the performance of simple classification algorithms such as k-NN. This distance can be quickly trained while maintaining good performance as illustrated by the experimental studies presented in this paper.

artificial intelligence, machine learning, rpw 2

2209.12727

Country:

Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

arXiv.org Artificial Intelligence, Dec-20-2022

AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph Convolution

Zhang, Hongyuan, Shi, Jiankun, Zhang, Rui, Li, Xuelong

Since the representative capacity of graph-based clustering methods is usually limited by the graph constructed on the original features, it is attractive to investigate whether graph neural networks (GNNs) can be applied to augment the capacity. The core problems come mainly from two aspects: (1) the graph is unavailable in most clustering scenarios, so how to construct high-quality graphs on non-graph data is usually the most important part; (2) given n samples, graph-based clustering methods usually consume at least $\mathcal O(n^2)$ time to build graphs, and the graph convolution requires nearly $\mathcal O(n^2)$ for a dense graph and $\mathcal O(|\mathcal{E}|)$ for a sparse one with $|\mathcal{E}|$ edges. Accordingly, both graph-based clustering and GNNs suffer from severe inefficiency. To tackle these problems, we propose a novel clustering method, AnchorGAE, with self-supervised estimation of the graph and efficient graph convolution. We first show how to convert a non-graph dataset into a graph dataset by introducing a generative graph model and anchors. We then show that the constructed bipartite graph can reduce the computational complexity of graph convolution from $\mathcal O(n^2)$ and $\mathcal O(|\mathcal{E}|)$ to $\mathcal O(n)$. The succeeding steps for clustering can be easily designed as $\mathcal O(n)$ operations. Interestingly, the anchors naturally lead to a siamese architecture with the help of the Markov process. Furthermore, the estimated bipartite graph is updated dynamically according to the features extracted by the GNN, to improve the quality of the graph. However, we theoretically prove that the self-supervised paradigm frequently results in a collapse that often occurs after 2-3 update iterations in experiments, especially when the model is well-trained. A specific strategy is accordingly designed to prevent the collapse.
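
To see why a bipartite sample-anchor graph makes propagation linear in n, consider the generic anchor-graph construction sketched below. This is not AnchorGAE's estimator or its exact convolution operator: the kernel weights, the choice of `k`, and both function names are assumptions made for illustration, and every anchor is assumed to be linked to at least one sample.

```python
import numpy as np

def anchor_affinity(X, anchors, k=2):
    """Row-stochastic n x m matrix linking each sample to its k nearest anchors."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    nearest = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    near_d2 = d2[rows, nearest]  # (n, k), ascending within each row
    B = np.zeros_like(d2)
    # Shift by the smallest distance so the closest anchor gets weight 1.
    B[rows, nearest] = np.exp(-(near_d2 - near_d2[:, :1]))
    return B / B.sum(axis=1, keepdims=True)

def bipartite_conv(B, X):
    """Compute (B D^-1 B^T) X without ever forming the n x n matrix."""
    col_mass = B.sum(axis=0)                      # (m,), assumed non-zero
    anchor_feats = (B.T @ X) / col_mass[:, None]  # aggregate samples -> anchors
    return B @ anchor_feats                       # spread anchors -> samples

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
B = anchor_affinity(X, anchors=X[:3], k=2)
H = bipartite_conv(B, X)
```

With m anchors and d features, both matrix products cost O(nmd), so for a fixed number of anchors the propagation grows linearly with the number of samples.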

artificial intelligence, graph, machine learning

doi: 10.1109/TPAMI.2022.3231470

2111.06586

Country:

North America > United States (0.14)
Asia > China > Shaanxi Province > Xi'an (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.48)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Dec-20-2022

Variable Clustering via Distributionally Robust Nodewise Regression

Wang, Kaizheng, Xu, Xiao, Zhou, Xun Yu

We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression. To solve the latter problem, we derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM algorithm for implementation. We validate our method in an extensive simulation study. Finally, we propose and apply a variant of our method to stock return data, obtain interpretable clusters that facilitate portfolio selection and compare its out-of-sample performance with other clustering methods in an empirical study.

artificial intelligence, machine learning, subspace

2212.07944

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Spain > Aragón (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science (0.93)

arXiv.org Artificial Intelligence, Dec-20-2022

FriendlyCore: Practical Differentially Private Aggregation

Tsfadia, Eliad, Cohen, Edith, Kaplan, Haim, Mansour, Yishay, Stemmer, Uri

Metric aggregation tasks are at the heart of data analysis. Common tasks include averaging, k-clustering, and learning a mixture of distributions. When the data points are sensitive information, corresponding for example to records or activities of particular users, we would like the aggregation to be private. The most widely accepted solution to individual privacy is differential privacy (DP) [DMNS06], which limits the effect that each data point can have on the outcome of the computation. Differentially private algorithms, however, tend to be less accurate and practical than their non-private counterparts. This degradation in accuracy can be attributed, to a large extent, to the fact that the requirement of differential privacy is a worst-case kind of requirement. To illustrate this point, consider the task of privately learning a mixture of Gaussians.
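
The worst-case nature of the requirement shows up even in plain private averaging. The sketch below uses the standard Laplace mechanism on clipped values, not FriendlyCore itself; the function name and the clipping bounds are illustrative assumptions, and the dataset size is treated as public with neighbouring datasets differing by replacing one record.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng):
    """epsilon-DP mean of values clipped to [lower, upper]."""
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    # Replacing one record can move the mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
tight_data = [0.49, 0.50, 0.51, 0.50]
estimate = dp_mean(tight_data, lower=0.0, upper=100.0, epsilon=1.0, rng=rng)
```

The noise scale is set by the clipping range (100 here), not by the actual spread of the data (0.02), so tightly concentrated inputs still pay for the worst case.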

algorithm, artificial intelligence, machine learning

2110.10132

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)