AITopics

2204.05158

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.73)
Health & Medicine > Therapeutic Area > Immunology (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kamfonas, Michael, Alon, Gabriel

What Can Secondary Predictions Tell Us? An Exploration on Question-Answering with SQuAD-v2.0

arXiv.org Artificial IntelligenceOct-24-2022

Performance in natural language processing, and specifically for the question-answer task, is typically measured by comparing a model\'s most confident (primary) prediction to golden answers (the ground truth). We are making the case that it is also useful to quantify how close a model came to predicting a correct answer even for examples that failed. We define the Golden Rank (GR) of an example as the rank of its most confident prediction that exactly matches a ground truth, and show why such a match always exists. For the 16 transformer models we analyzed, the majority of exactly matched golden answers in secondary prediction space hover very close to the top rank. We refer to secondary predictions as those ranking above 0 in descending confidence probability order. We demonstrate how the GR can be used to classify questions and visualize their spectrum of difficulty, from persistent near successes to persistent extreme failures. We derive a new aggregate statistic over entire test sets, named the Golden Rank Interpolated Median (GRIM) that quantifies the proximity of failed predictions to the top choice made by the model. To develop some intuition and explore the applicability of these metrics we use the Stanford Question Answering Dataset (SQuAD-2) and a few popular transformer models from the Hugging Face hub. We first demonstrate that the GRIM is not directly correlated with the F1 and exact match (EM) scores. We then calculate and visualize these scores for various transformer architectures, probe their applicability in error analysis by clustering failed predictions, and compare how they relate to other training diagnostics such as the EM and F1 scores. We finally suggest various research goals, such as broadening data collection for these metrics and their possible use in adversarial training.

machine learning, natural language, question answering, (18 more...)

2206.14348

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Nepal > Bagmati Province > Kathmandu District > Kathmandu (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Ta, Hoang Thang, Gelbukha, Alexander, Sidorov, Grigori

Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences

arXiv.org Artificial IntelligenceOct-23-2022

Acknowledged as one of the most successful online cooperative projects in human society, Wikipedia has obtained rapid growth in recent years and desires continuously to expand content and disseminate knowledge values for everyone globally. The shortage of volunteers brings to Wikipedia many issues, including developing content for over 300 languages at the present. Therefore, the benefit that machines can automatically generate content to reduce human efforts on Wikipedia language projects could be considerable. In this paper, we propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models. The results are helpful not only for the data-to-text generation task but also for other relevant works in the field.

artificial intelligence, machine learning, natural language, (21 more...)

2210.12659

Country:

North America > Mexico (0.14)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
North America > United States > New York (0.04)
(5 more...)

Genre: Personal > Honors (0.93)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)

arXiv.org Artificial IntelligenceOct-21-2022

Discovering New Intents Using Latent Variables

Zhou, Yunhua, Liu, Peiju, Wang, Yuxin, QIu, Xipeng

Discovering new intents is of great significance to establishing Bootstrapped Task-Oriented Dialogue System. Most existing methods either lack the ability to transfer prior knowledge in the known intent data or fall into the dilemma of forgetting prior knowledge in the follow-up. More importantly, these methods do not deeply explore the intrinsic structure of unlabeled data, so they can not seek out the characteristics that make an intent in general. In this paper, starting from the intuition that discovering intents could be beneficial to the identification of the known intents, we propose a probabilistic framework for discovering intents where intent assignments are treated as latent variables. We adopt Expectation Maximization framework for optimization. Specifically, In E-step, we conduct discovering intents and explore the intrinsic structure of unlabeled data by the posterior of intent assignments. In M-step, we alleviate the forgetting of prior knowledge transferred from known intents by optimizing the discrimination of labeled data. Extensive experiments conducted in three challenging real-world datasets demonstrate our method can achieve substantial improvements.

artificial intelligence, knowledge, machine learning, (18 more...)

2210.11804

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Mode Reduction for Markov Jump Systems

Du, Zhe, Balzano, Laura, Ozay, Necmiye

Switched systems are capable of modeling processes with underlying dynamics that may change abruptly over time. To achieve accurate modeling in practice, one may need a large number of modes, but this may in turn increase the model complexity drastically. Existing work on reducing system complexity mainly considers state space reduction, whereas reducing the number of modes is less studied. In this work, we consider Markov jump linear systems (MJSs), a special class of switched systems where the active mode switches according to a Markov chain, and several issues associated with its mode complexity. Specifically, inspired by clustering techniques from unsupervised learning, we are able to construct a reduced MJS with fewer modes that approximates the original MJS well under various metrics. Furthermore, both theoretically and empirically, we show how one can use the reduced MJS to analyze stability and design controllers with significant reduction in computational cost while achieving guaranteed accuracy.

artificial intelligence, denote, machine learning, (17 more...)

doi: 10.1109/OJCSYS.2022.3212613

2205.02697

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
South America > Brazil (0.04)

Genre: Research Report (0.81)

Industry:

Energy (0.45)
Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Twin Contrastive Learning for Online Clustering

Li, Yunfan, Yang, Mouxing, Peng, Dezhong, Li, Taihao, Huang, Jiantao, Peng, Xi

This paper proposes to perform online clustering by conducting twin contrastive learning (TCL) at the instance and cluster level. Specifically, we find that when the data is projected into a feature space with a dimensionality of the target cluster number, the rows and columns of its feature matrix correspond to the instance and cluster representation, respectively. Based on the observation, for a given dataset, the proposed TCL first constructs positive and negative pairs through data augmentations. Thereafter, in the row and column space of the feature matrix, instance- and cluster-level contrastive learning are respectively conducted by pulling together positive pairs while pushing apart the negatives. To alleviate the influence of intrinsic false-negative pairs and rectify cluster assignments, we adopt a confidence-based criterion to select pseudo-labels for boosting both the instance- and cluster-level contrastive learning. As a result, the clustering performance is further improved. Besides the elegant idea of twin contrastive learning, another advantage of TCL is that it could independently predict the cluster assignment for each instance, thus effortlessly fitting online scenarios. Extensive experiments on six widely-used image and text benchmarks demonstrate the effectiveness of TCL. The code will be released on GitHub.

artificial intelligence, contrastive learning, machine learning, (14 more...)

doi: 10.1007/s11263-022-01639-z

2210.1168

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Harshvardhan, null, Ghosh, Avishek, Mazumdar, Arya

An Improved Algorithm for Clustered Federated Learning

In this paper, we address the dichotomy between heterogeneous models and simultaneous training in Federated Learning (FL) via a clustering framework. We define a new clustering model for FL based on the (optimal) local models of the users: two users belong to the same cluster if their local models are close; otherwise they belong to different clusters. A standard algorithm for clustered FL is proposed in \cite{ghosh_efficient_2021}, called \texttt{IFCA}, which requires \emph{suitable} initialization and the knowledge of hyper-parameters like the number of clusters (which is often quite difficult to obtain in practical applications) to converge. We propose an improved algorithm, \emph{Successive Refine Federated Clustering Algorithm} (\texttt{SR-FCA}), which removes such restrictive assumptions. \texttt{SR-FCA} treats each user as a singleton cluster as an initialization, and then successively refine the cluster estimation via exploiting similar users belonging to the same cluster. In any intermediate step, \texttt{SR-FCA} uses a robust federated learning algorithm within each cluster to exploit simultaneous training and to correct clustering errors. Furthermore, \texttt{SR-FCA} does not require any \emph{good} initialization (warm start), both in theory and practice. We show that with proper choice of learning rate, \texttt{SR-FCA} incurs arbitrarily small clustering error. Additionally, we validate the performance of our algorithm on standard FL datasets in non-convex problems like neural nets, and we show the benefits of \texttt{SR-FCA} over baselines.

artificial intelligence, inequality, machine learning, (16 more...)

2210.11538

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Liu, Xiaoyuan, Tyagin, Ilya, Ushijima-Mwesigwa, Hayato, Ghosh, Indradeep, Safro, Ilya

Towards Practical Explainability with Cluster Descriptors

With the rapid development of machine learning, improving its explainability has become a crucial research goal. We study the problem of making the clusters more explainable by investigating the cluster descriptors. Given a set of objects $S$, a clustering of these objects $\pi$, and a set of tags $T$ that have not participated in the clustering algorithm. Each object in $S$ is associated with a subset of $T$. The goal is to find a representative set of tags for each cluster, referred to as the cluster descriptors, with the constraint that these descriptors we find are pairwise disjoint, and the total size of all the descriptors is minimized. In general, this problem is NP-hard. We propose a novel explainability model that reinforces the previous models in such a way that tags that do not contribute to explainability and do not sufficiently distinguish between clusters are not added to the optimal descriptors. The proposed model is formulated as a quadratic unconstrained binary optimization problem which makes it suitable for solving on modern optimization hardware accelerators. We experimentally demonstrate how a proposed explainability model can be solved on specialized hardware for accelerating combinatorial optimization, the Fujitsu Digital Annealer, and use real-life Twitter and PubMed datasets for use cases.

constraint, data mining, machine learning, (20 more...)

2210.10662

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > United States > Delaware > New Castle County > Newark (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Epidemiology (0.96)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Klus, Stefan, Conrad, Natasa Djurdjevac

Koopman-based spectral clustering of directed and time-evolving graphs

arXiv.org Machine LearningOct-19-2022

While spectral clustering algorithms for undirected graphs are well established and have been successfully applied to unsupervised machine learning problems ranging from image segmentation and genome sequencing to signal processing and social network analysis, clustering directed graphs remains notoriously difficult. Two of the main challenges are that the eigenvalues and eigenvectors of graph Laplacians associated with directed graphs are in general complex-valued and that there is no universally accepted definition of clusters in directed graphs. We first exploit relationships between the graph Laplacian and transfer operators and in particular between clusters in undirected graphs and metastable sets in stochastic dynamical systems and then use a generalization of the notion of metastability to derive clustering algorithms for directed and time-evolving graphs. The resulting clusters can be interpreted as coherent sets, which play an important role in the analysis of transport and mixing processes in fluid flows.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1007/s00332-022-09863-0

2204.02951

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.82)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

arXiv.org Artificial IntelligenceOct-19-2022

Binary Orthogonal Non-negative Matrix Factorization

Hafshejani, S. Fathi, Gaur, D., Hossain, S., Benkoczi, R.

We propose a method for computing binary orthogonal non-negative matrix factorization (BONMF) for clustering and classification. The method is tested on several representative real-world data sets. The numerical results confirm that the method has improved accuracy compared to the related techniques. The proposed method is fast for training and classification and space efficient.

artificial intelligence, factorization, machine learning, (15 more...)

2210.1066

Country: North America > Canada > Alberta > Census Division No. 2 > Lethbridge County > Lethbridge (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)