AITopics

2210.16424

Country:

North America > United States > Virginia (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California (0.04)
Europe > Poland (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Multi-Dialectal Representation Learning of Sinitic Phonology

Jia, Zhibai

Machine learning techniques have shown their competence for representing and reasoning in symbolic systems such as language and phonology. In Sinitic Historical Phonology, notable tasks that could benefit from machine learning include the comparison of dialects and the reconstruction of proto-language systems. Motivated by this, this paper provides an approach for obtaining multi-dialectal representations of Sinitic syllables, by constructing a knowledge graph from structured phonological data, then applying the BoxE technique from knowledge base learning. We applied unsupervised clustering techniques to the obtained representations to observe that the representations capture phonemic contrast from the input dialects. Furthermore, we trained classifiers to perform inference of unobserved Middle Chinese labels, showing the representations' potential for indicating archaic, proto-language features. The representations can be used for performing completion of fragmented Sinitic phonological knowledge bases, estimating divergences between different characters, or aiding the exploration and reconstruction of archaic features.

artificial intelligence, machine learning, representation, (17 more...)

2307.01209

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.04)
North America > United States > New York (0.04)
Europe > Czechia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Yang, Yongyi, Steinhardt, Jacob, Hu, Wei

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations

Recent work has observed an intriguing "Neural Collapse" phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of the input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately capture the intrinsic structure of the input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve 93% accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.

artificial intelligence, machine learning, representation, (17 more...)

2306.17105

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Muškardin, Edi, Tappler, Martin, Pill, Ingo, Aichernig, Bernhard K., Pock, Thomas

On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth

We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the context of RNNs that were trained to recognize regular languages. This enables us to draw on perfect ground-truth automata in our evaluation, against which we can compare the RNN's accuracy and the distribution of the hidden-state vectors. We start with examining the (piecewise linear) separability of an RNN's hidden-state vectors into semantically different classes. We continue the analysis by computing clusters over the hidden-state vector space with multiple state-of-the-art unsupervised clustering approaches. We formally analyze the accuracy of computed clustering functions and the validity of the clustering hypothesis by determining whether clusters group semantically similar vectors to the same state in the ground-truth model. Our evaluation supports the validity of the clustering hypothesis in the majority of examined cases. We observed that the hidden-state vectors of well-trained RNNs are separable, and that the unsupervised clustering techniques succeed in finding clusters of similar state vectors.
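One way to picture the check described in this abstract (do clusters group vectors that belong to the same ground-truth automaton state?) is a cluster-purity score. The following is a minimal sketch only, not the authors' evaluation code: the function name `cluster_purity` and the toy cluster and state labels are hypothetical, and the cluster IDs are assumed to come from some clustering already run on the hidden-state vectors.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_states):
    """Return (purity, majority_map) for a clustering against ground-truth states.

    purity is the fraction of items whose own state equals the majority
    state of the cluster they were assigned to.
    """
    if len(cluster_ids) != len(true_states) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")
    # Collect the ground-truth states seen in each cluster.
    members = defaultdict(list)
    for cluster, state in zip(cluster_ids, true_states):
        members[cluster].append(state)
    # Map every cluster to its most frequent ground-truth state.
    majority = {c: Counter(s).most_common(1)[0][0] for c, s in members.items()}
    correct = sum(1 for c, s in zip(cluster_ids, true_states) if majority[c] == s)
    return correct / len(true_states), majority


# Hypothetical toy data: 8 hidden-state vectors, already clustered.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
states = ["q0", "q0", "q1", "q1", "q1", "q2", "q2", "q2"]
purity, mapping = cluster_purity(clusters, states)
print(purity, mapping)
```

A purity near 1.0 would be consistent with the clustering hypothesis on that sample; a low value would indicate clusters that mix vectors from different automaton states.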

artificial intelligence, machine learning, rnn, (17 more...)

2306.16854

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Austria > Styria > Graz (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(10 more...)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Restarted Large-Scale Spectral Clustering with Self-Guiding and Block Diagonal Representation

Guo, Yongyan, Wu, Gang

Spectral clustering is one of the most popular unsupervised machine learning methods. Constructing the similarity matrix is crucial to this type of method. In most existing works, the similarity matrix is either computed once and for all or updated alternately. However, the former has difficulty reflecting comprehensive relationships among data points, and the latter is time-consuming and even infeasible for large-scale problems. In this work, we propose a restarted clustering framework with self-guiding and block diagonal representation. An advantage of the strategy is that some useful clustering information obtained from previous cycles could be preserved as much as possible. To the best of our knowledge, this is the first work that applies a restarting strategy to spectral clustering. The key difference is that we reclassify the samples in each cycle of our method, while they are classified only once in existing methods. To further reduce the overhead, we introduce a block diagonal representation with Nyström approximation for constructing the similarity matrix. Theoretical results are established to show the rationality of inexact computations in spectral clustering. Comprehensive experiments are performed on some benchmark databases, which show the superiority of our proposed algorithms over many state-of-the-art algorithms for large-scale problems. Specifically, our framework has a potential boost for clustering algorithms and works well even using an initial guess chosen randomly.

algorithm, artificial intelligence, machine learning, (15 more...)

2306.15138

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Xuzhou (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Jun-28-2023

Event Detection from Social Media Stream: Methods, Datasets and Opportunities

Li, Quanzhi, Chao, Yang, Li, Dong, Lu, Yao, Zhang, Chi

Social media streams contain a large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows a fast spread of events happening in real time, and enables individuals and organizations to stay informed of the events happening now. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for Twitter data streams, helping readers understand the recent development in this area. We present the datasets available to the public. Furthermore, a few research opportunities

detection, event detection, twitter, (13 more...)

2306.16495

Country:

North America > United States (0.28)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Services (1.00)
Media > News (0.68)
Leisure & Entertainment > Sports > Soccer (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Avetisyan, Lilit, Ayoub, Jackie, Yang, X. Jessie, Zhou, Feng

Building Trust Profiles in Conditionally Automated Driving

arXiv.org Artificial Intelligence, Jun-28-2023

Trust is crucial for ensuring the safety, security, and widespread adoption of automated vehicles (AVs), and if trust is lacking, drivers and the public may not be willing to use them. This research seeks to investigate trust profiles in order to create personalized experiences for drivers in AVs. This technique helps in better understanding drivers' dynamic trust from a persona's perspective. The study was conducted in a driving simulator where participants were requested to take over control from automated driving in three conditions that included a control condition, a false alarm condition, and a miss condition with eight takeover requests (TORs) in different scenarios. Drivers' dispositional trust, initial learned trust, dynamic trust, personality, and emotions were measured. We identified three trust profiles (i.e., believers, oscillators, and disbelievers) using a K-means clustering model. In order to validate this model, we built a multinomial logistic regression model based on a SHAP explainer that selected the most important features to predict the trust profiles with an F1-score of 0.90 and accuracy of 0.89. We also discussed how different individual factors influenced trust profiles, which helped us understand trust dynamics better from a persona's perspective. Our findings have important implications for designing a personalized in-vehicle trust monitoring and calibration system to adjust drivers' trust levels in order to improve safety and experience in automated driving.
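The K-means step mentioned in this abstract can be illustrated with a small, dependency-free version of Lloyd's algorithm. This is a sketch under stated assumptions, not the study's pipeline: the two features per driver and their values are invented for illustration, and the deterministic farthest-point initialisation is a choice made here so the example is reproducible (the abstract does not say how the model was initialised).

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical (mean trust, trust variability) scores for nine drivers.
drivers = [
    (0.90, 0.10), (0.85, 0.15), (0.80, 0.10),  # consistently high trust
    (0.50, 0.80), (0.55, 0.90), (0.45, 0.85),  # strongly fluctuating trust
    (0.10, 0.10), (0.15, 0.20), (0.20, 0.15),  # consistently low trust
]
labels, centroids = kmeans(drivers, k=3)
print(labels)
```

On this toy input the three groups are well separated, so each block of three drivers ends up with its own label; real questionnaire data would need feature scaling and a check on the choice of k.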

artificial intelligence, machine learning, trust profile, (17 more...)

2306.16567

Country:

North America > United States > Michigan > Wayne County > Dearborn (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Ground > Road (1.00)
Health & Medicine (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

arXiv.org Artificial Intelligence, Jun-28-2023

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

Nolet, Corey J., Gala, Divye, Fender, Alex, Doijade, Mahesh, Eaton, Joe, Raff, Edward, Zedlewski, John, Rees, Brad, Oates, Tim

In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering
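For readers unfamiliar with the underlying technique, single-linkage clustering can be obtained from a minimum spanning tree: adding edges in order of increasing length (Kruskal's algorithm) and stopping once the desired number of components remains gives the single-linkage partition. The sketch below is a plain CPU illustration of that idea, not cuSLINK itself: it builds the full pairwise edge list in O(N^2) space rather than the $k$-NN graph the abstract describes, and the function name and toy points are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(points, n_clusters):
    """Single-linkage clustering by Kruskal-style merging of the closest pairs."""
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of points")
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (dense graph: an assumption of this sketch).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    components = n
    for _, i, j in edges:
        if components == n_clusters:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two clusters: it is an MST edge
            parent[root_j] = root_i
            components -= 1
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
toy_labels = single_linkage(toy_points, n_clusters=3)
print(toy_labels)
```

Stopping the merge loop early is equivalent to cutting the dendrogram at the level that leaves `n_clusters` groups.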

data mining, machine learning, natural language, (18 more...)

2306.16354

Country:

North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)
North America > United States > Virginia > Fairfax County > McLean (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Issa, Zacharia, Horvath, Blanka

Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures

arXiv.org Machine Learning, Jun-27-2023

In this work we present a non-parametric online market regime detection method for multidimensional data structures using a path-wise two-sample test derived from a maximum mean discrepancy-based similarity metric on path space that uses rough path signatures as a feature map. The latter similarity metric has been developed and applied as a discriminator in recent generative models for small data environments, and has been optimised here to the setting where the size of new incoming data is particularly small, for faster reactivity. On the same principles, we also present a path-wise method for regime clustering which extends our previous work [HIM21]. The presented regime clustering techniques, as in [HIM21], were designed as ex-ante market analysis tools that can identify periods of approximately similar market activity, but the new results also apply to path-wise, high-dimensional, and non-Markovian settings as well as to data structures that exhibit autocorrelation. We demonstrate our clustering tools on easily verifiable synthetic datasets of increasing complexity, and also show how the outlined regime detection techniques can be used as fast on-line automatic regime change detectors or as outlier detection tools, including a fully automated pipeline. Finally, we apply the fine-tuned algorithms to real-world historical data including high-dimensional baskets of equities and the recent price evolution of crypto assets, and we show that our methodology swiftly and accurately indicated historical periods of market turmoil.
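The maximum mean discrepancy (MMD) statistic behind such a two-sample test can be sketched in a few lines. The example below is illustrative only and makes several assumptions that differ from the abstract: it uses a Gaussian (RBF) kernel directly on small feature vectors instead of rough path signatures, it computes the simple biased estimate of the squared MMD, and the sample values and the names `rbf_kernel` and `mmd_squared` are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


# Hypothetical two-dimensional features for three short windows of data.
calm = [(0.00, 0.10), (0.10, -0.10), (-0.10, 0.00), (0.05, 0.05)]
calm_again = [(0.02, 0.08), (0.08, -0.08), (-0.08, 0.02), (0.04, 0.06)]
turmoil = [(2.00, 2.10), (2.10, 1.90), (1.90, 2.00), (2.05, 2.05)]

same_regime = mmd_squared(calm, calm_again)
new_regime = mmd_squared(calm, turmoil)
print(same_regime, new_regime)
```

A regime detector in this spirit would compare each new window against a reference window and flag a change when the statistic exceeds a threshold, for instance one calibrated by permutation; choosing that threshold is outside the scope of this sketch.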

data mining, machine learning, regime, (19 more...)

arXiv.org Machine Learning

2306.15835

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Campana, Mattia Giovanni, Delmastro, Franca

COMPASS: Unsupervised and Online Clustering of Complex Human Activities from Smartphone Sensors

arXiv.org Artificial Intelligence, Jun-27-2023

Modern mobile devices are able to provide context-aware and personalized services to the users, by leveraging their sensing capabilities to infer the activity and situation in which a person is currently involved. Current solutions for context-recognition rely on annotated data and experts' knowledge to predict the user context. In addition, their prediction ability is strongly limited to the set of situations considered during the model training or definition. However, in a mobile environment, the user context continuously evolves, and it cannot be merely restricted to a set of predefined classes. To overcome these limitations, we propose COMPASS, a novel unsupervised and online clustering algorithm aimed at identifying the user context in mobile environments based on the stream of high-dimensional data generated by smartphone sensors. COMPASS can distinguish an arbitrary number of user's contexts from the sensors' data, without defining a priori the collection of expected situations. This key feature makes it a general-purpose solution to provide context-aware features to mobile devices, supporting a broad set of applications. Experimental results on 18 synthetic and 2 real-world datasets show that COMPASS correctly identifies the user context from the sensors' data stream, and outperforms the state-of-the-art solutions in terms of both cluster configuration and purity. Finally, we evaluate its performance in terms of execution time and the results show that COMPASS can process 1000 high-dimensional samples in less than 20 seconds, while the reference solutions require about 60 minutes to evaluate the entire dataset. Keywords: Context-awareness, Unsupervised Machine Learning, Online Clustering, Mobile Computing

artificial intelligence, data mining, machine learning, (19 more...)

doi: 10.1016/j.eswa.2021.115124

2306.15437

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)