Guaranteed Recovery of Unambiguous Clusters

Mazooji, Kayvon, Shomorony, Ilan

arXiv.org Artificial Intelligence

Clustering is often a challenging problem because of the inherent ambiguity in what the "correct" clustering should be. Even when the number of clusters $K$ is known, this ambiguity often persists, particularly when clusters vary in density and contain multiple relatively separated regions of high density. In this paper we propose an information-theoretic characterization of when a $K$-clustering is ambiguous, and design an algorithm that recovers the clustering whenever it is unambiguous. The characterization formalizes the situation in which two high-density regions within a single cluster are separable enough that they look more like two distinct clusters than do two truly distinct clusters in the clustering. The algorithm first identifies $K$ partial clusters (or "seeds") using a density-based approach, and then adds unclustered points to these seeds in a greedy manner to form a complete clustering. We implement and test a version of the algorithm modified to handle overlapping clusters effectively, and observe that it requires little parameter selection and outperforms widely used algorithms for non-convex cluster recovery on many datasets.
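The seed-then-grow procedure summarised above can be sketched in a few lines. This is a minimal illustration only, not the authors' algorithm: the neighbour-count density estimate, the `seed_radius` parameter, and the nearest-point growth rule are all assumptions made for the sketch.

```python
import numpy as np

def seed_and_grow(X, K, seed_radius=0.5):
    """Sketch of a seed-then-grow clustering.

    Seeds are K high-density points (density = neighbour count within
    seed_radius) kept mutually far apart; growth then repeatedly attaches
    the unclustered point nearest to any clustered point.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    density = (dist < seed_radius).sum(axis=1)
    labels = np.full(n, -1)

    # Pick seeds greedily by density, skipping points too close to a seed.
    seeds = []
    for i in np.argsort(-density):
        if len(seeds) == K:
            break
        if all(dist[i, s] > seed_radius for s in seeds):
            seeds.append(i)
    for k, s in enumerate(seeds):
        labels[s] = k

    # Greedy growth: always attach the globally nearest unclustered point.
    while (labels == -1).any():
        unassigned = np.where(labels == -1)[0]
        assigned = np.where(labels != -1)[0]
        sub = dist[np.ix_(unassigned, assigned)]
        u, a = np.unravel_index(sub.argmin(), sub.shape)
        labels[unassigned[u]] = labels[assigned[a]]
    return labels
```

The greedy loop mirrors the abstract's description of adding unclustered points to the initial partial clusters one at a time; the paper's actual density estimate and stopping conditions differ.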


Text clustering with LLM embeddings

Petukhova, Alina, Matos-Carvalho, João P., Fachada, Nuno

arXiv.org Artificial Intelligence

Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data. However, the effectiveness of text clustering heavily relies on the choice of textual embeddings and clustering algorithms. We argue that recent advances in large language models (LLMs) can potentially improve this task. In this research, we investigated how different textual embeddings -- particularly those used in LLMs -- and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and model size adjustment. Findings reveal that LLM embeddings excel at capturing subtleties in structured language, while BERT leads the lightweight options in performance. In addition, we observe that increasing model dimensionality and employing summarisation techniques do not consistently lead to improvements in clustering efficiency, suggesting that these strategies require careful analysis before use in real-life models. These results highlight a complex balance between the need for refined text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by incorporating embeddings from LLMs, providing a path for improved methodologies, while informing new avenues for future research in various types of textual analysis.
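The embed-then-cluster pipeline the study evaluates can be sketched as follows, with a trivial bag-of-words embedding standing in for LLM embeddings (which would normally come from an embedding model or API) and a small k-means standing in for the clustering algorithms compared in the paper. All names and parameters here are illustrative.

```python
import numpy as np

def bow_embed(docs):
    """Toy bag-of-words vectors, standing in for LLM embeddings."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.lower().split():
            M[r, index[w]] += 1
    return M

def kmeans(X, K, iters=50, seed=0):
    """Plain k-means, standing in for the clustering algorithms compared."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        # Assign each row to its nearest centre, then recompute centres.
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for k in range(K):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels
```

Swapping `bow_embed` for a real embedding model is the experimental variable the paper studies; the clustering step itself is unchanged.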


Template-Based Graph Clustering

Riva, Mateus, Yger, Florian, Gori, Pietro, Cesar, Roberto M. Jr., Bloch, Isabelle

arXiv.org Machine Learning

We propose a novel graph clustering method guided by additional information on the underlying structure of the clusters (or communities). The problem is formulated as the matching of the observed graph to a smaller template graph: the $n$ vertices of the observed graph (to be clustered) are matched to the $k$ vertices of the template, using its edges as support information, and the problem is relaxed onto the set of orthonormal matrices in order to find a $k$-dimensional embedding. With relevant priors that encode the density of the clusters and their relationships, our method outperforms classical methods, especially for challenging cases.
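When the template prior is ignored, the orthonormal relaxation described above reduces to a spectral embedding: the relaxed $n \times k$ assignment matrix is spanned by the $k$ leading eigenvectors of the adjacency matrix. A rough sketch of that reduced form (not the authors' template-guided method; the toy graph and label read-out are assumptions):

```python
import numpy as np

def relaxed_assignment(A, k):
    """Orthonormal relaxation of graph-to-template matching, with the
    template prior dropped: the relaxed n-by-k assignment matrix is the
    matrix of k leading eigenvectors of the adjacency matrix A."""
    vals, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]  # indices of the k largest
    return vecs[:, top]

# Toy graph: a triangle (vertices 0-2) and a 4-clique (vertices 3-6).
A = np.zeros((7, 7))
for block in ([0, 1, 2], [3, 4, 5, 6]):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0

embedding = relaxed_assignment(A, 2)
# Each vertex aligns with one embedding axis; read off cluster labels.
labels = np.argmax(embedding ** 2, axis=1)
```

The template's edges would enter as additional terms constraining which embedding directions are preferred; here only the adjacency spectrum is used.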


Learning Deterministic Weighted Automata with Queries and Counterexamples

Weiss, Gail, Goldberg, Yoav, Yahav, Eran

arXiv.org Machine Learning

We present an algorithm for extraction of a probabilistic deterministic finite automaton (PDFA) from a given black-box language model, such as a recurrent neural network (RNN). The algorithm is a variant of the exact-learning algorithm L*, adapted to a probabilistic setting with noise. The key insight is the use of conditional probabilities for observations, and the introduction of a local tolerance when comparing them. When applied to RNNs, our algorithm often achieves better word error rate (WER) and normalised discounted cumulative gain (NDCG) than that achieved by spectral extraction of weighted finite automata (WFA) from the same networks. PDFAs are substantially more expressive than n-grams, and are guaranteed to be stochastic and deterministic - unlike spectrally extracted WFAs.
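The key insight, comparing conditional next-token probabilities under a local tolerance, can be sketched as follows. The tolerance value and the prefix-grouping loop are illustrative assumptions, not the paper's exact equivalence test over its observation tables.

```python
import numpy as np

def same_state(p, q, tol=0.1):
    """Local-tolerance test: two prefixes are taken to reach the same
    PDFA state when every conditional next-token probability agrees
    within tol."""
    return np.max(np.abs(np.asarray(p) - np.asarray(q))) <= tol

def group_prefixes(dists, tol=0.1):
    """Group prefixes into candidate states by the tolerance test.
    `dists` maps each prefix to its conditional next-token distribution,
    as queried from the black-box model."""
    states, labels = [], {}
    for prefix, d in dists.items():
        for i, rep in enumerate(states):
            if same_state(d, rep, tol):
                labels[prefix] = i
                break
        else:
            labels[prefix] = len(states)
            states.append(d)
    return labels
```

The tolerance is what makes the construction robust to noise in the model's probabilities: exact equality of conditional distributions would almost never hold for a neural model.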


Characterization and Development of Average Silhouette Width Clustering

Batool, Fatima, Hennig, Christian

arXiv.org Machine Learning

The purpose of this paper is to introduce a new clustering methodology. The paper is divided into three parts. In the first part we develop an axiomatic theory for the average silhouette width (ASW) index. There are different ways to investigate the quality and characteristics of clustering methods, such as validation indices using simulations and real-data experiments, model-based theory, and non-model-based theory known as axiomatic theory. In this work we not only take the empirical approach of validating clustering results through simulations, but also focus on the development of the axiomatic theory. In the second part we present a novel clustering methodology based on the optimization of the ASW index, considering simultaneously the problems of estimating the number of clusters and of finding a clustering for that number. Two algorithms are proposed and evaluated against several partitioning and hierarchical clustering methods, together with an intensive empirical comparison of different distance metrics across the various clustering methods. In the third part we consider two application domains: novel single-cell RNA sequencing datasets, and rainfall data for clustering weather stations.
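The ASW index that the second part optimizes can be computed directly from pairwise distances. A minimal sketch follows; the score-0 convention for singleton clusters is an assumption of the sketch, not taken from the paper.

```python
import numpy as np

def asw(X, labels):
    """Average silhouette width: mean over points of (b - a) / max(a, b),
    where a is the mean distance to the point's own cluster and b the
    smallest mean distance to any other cluster."""
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False  # exclude the point itself
        if not own.any():
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean()
                for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Comparing candidate clusterings (or candidate values of the number of clusters) by this score is the optimization the methodology is built around.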