Clustering
Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems
Rabinovich, Ella, Vetzler, Matan, Boaz, David, Kumar, Vineet, Pandey, Gaurav, Anaby-Tavor, Ateret
The rapidly growing market demand for automatic dialogue agents capable of goal-oriented behavior has caused many tech-industry leaders to invest considerable efforts into task-oriented dialog systems. The success of these systems is highly dependent on the accuracy of their intent identification -- the process of deducing the goal or meaning of the user's request and mapping it to one of the known intents for further processing. Gaining insights into unrecognized utterances -- user requests the systems fail to attribute to a known intent -- is therefore a key process in continuous improvement of goal-oriented dialog systems. We present an end-to-end pipeline for processing unrecognized user utterances, deployed in a real-world, commercial task-oriented dialog system, including a specifically-tailored clustering algorithm, a novel approach to cluster representative extraction, and cluster naming. We evaluated the proposed components, demonstrating their benefits in the analysis of unrecognized user requests.
What Can Secondary Predictions Tell Us? An Exploration on Question-Answering with SQuAD-v2.0
Kamfonas, Michael, Alon, Gabriel
Performance in natural language processing, and specifically for the question-answer task, is typically measured by comparing a model\'s most confident (primary) prediction to golden answers (the ground truth). We are making the case that it is also useful to quantify how close a model came to predicting a correct answer even for examples that failed. We define the Golden Rank (GR) of an example as the rank of its most confident prediction that exactly matches a ground truth, and show why such a match always exists. For the 16 transformer models we analyzed, the majority of exactly matched golden answers in secondary prediction space hover very close to the top rank. We refer to secondary predictions as those ranking above 0 in descending confidence probability order. We demonstrate how the GR can be used to classify questions and visualize their spectrum of difficulty, from persistent near successes to persistent extreme failures. We derive a new aggregate statistic over entire test sets, named the Golden Rank Interpolated Median (GRIM) that quantifies the proximity of failed predictions to the top choice made by the model. To develop some intuition and explore the applicability of these metrics we use the Stanford Question Answering Dataset (SQuAD-2) and a few popular transformer models from the Hugging Face hub. We first demonstrate that the GRIM is not directly correlated with the F1 and exact match (EM) scores. We then calculate and visualize these scores for various transformer architectures, probe their applicability in error analysis by clustering failed predictions, and compare how they relate to other training diagnostics such as the EM and F1 scores. We finally suggest various research goals, such as broadening data collection for these metrics and their possible use in adversarial training.
Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences
Ta, Hoang Thang, Gelbukha, Alexander, Sidorov, Grigori
Acknowledged as one of the most successful online cooperative projects in human society, Wikipedia has obtained rapid growth in recent years and desires continuously to expand content and disseminate knowledge values for everyone globally. The shortage of volunteers brings to Wikipedia many issues, including developing content for over 300 languages at the present. Therefore, the benefit that machines can automatically generate content to reduce human efforts on Wikipedia language projects could be considerable. In this paper, we propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models. The results are helpful not only for the data-to-text generation task but also for other relevant works in the field.
Discovering New Intents Using Latent Variables
Zhou, Yunhua, Liu, Peiju, Wang, Yuxin, QIu, Xipeng
Discovering new intents is of great significance to establishing Bootstrapped Task-Oriented Dialogue System. Most existing methods either lack the ability to transfer prior knowledge in the known intent data or fall into the dilemma of forgetting prior knowledge in the follow-up. More importantly, these methods do not deeply explore the intrinsic structure of unlabeled data, so they can not seek out the characteristics that make an intent in general. In this paper, starting from the intuition that discovering intents could be beneficial to the identification of the known intents, we propose a probabilistic framework for discovering intents where intent assignments are treated as latent variables. We adopt Expectation Maximization framework for optimization. Specifically, In E-step, we conduct discovering intents and explore the intrinsic structure of unlabeled data by the posterior of intent assignments. In M-step, we alleviate the forgetting of prior knowledge transferred from known intents by optimizing the discrimination of labeled data. Extensive experiments conducted in three challenging real-world datasets demonstrate our method can achieve substantial improvements.
Mode Reduction for Markov Jump Systems
Du, Zhe, Balzano, Laura, Ozay, Necmiye
Switched systems are capable of modeling processes with underlying dynamics that may change abruptly over time. To achieve accurate modeling in practice, one may need a large number of modes, but this may in turn increase the model complexity drastically. Existing work on reducing system complexity mainly considers state space reduction, whereas reducing the number of modes is less studied. In this work, we consider Markov jump linear systems (MJSs), a special class of switched systems where the active mode switches according to a Markov chain, and several issues associated with its mode complexity. Specifically, inspired by clustering techniques from unsupervised learning, we are able to construct a reduced MJS with fewer modes that approximates the original MJS well under various metrics. Furthermore, both theoretically and empirically, we show how one can use the reduced MJS to analyze stability and design controllers with significant reduction in computational cost while achieving guaranteed accuracy.
Twin Contrastive Learning for Online Clustering
Li, Yunfan, Yang, Mouxing, Peng, Dezhong, Li, Taihao, Huang, Jiantao, Peng, Xi
This paper proposes to perform online clustering by conducting twin contrastive learning (TCL) at the instance and cluster level. Specifically, we find that when the data is projected into a feature space with a dimensionality of the target cluster number, the rows and columns of its feature matrix correspond to the instance and cluster representation, respectively. Based on the observation, for a given dataset, the proposed TCL first constructs positive and negative pairs through data augmentations. Thereafter, in the row and column space of the feature matrix, instance- and cluster-level contrastive learning are respectively conducted by pulling together positive pairs while pushing apart the negatives. To alleviate the influence of intrinsic false-negative pairs and rectify cluster assignments, we adopt a confidence-based criterion to select pseudo-labels for boosting both the instance- and cluster-level contrastive learning. As a result, the clustering performance is further improved. Besides the elegant idea of twin contrastive learning, another advantage of TCL is that it could independently predict the cluster assignment for each instance, thus effortlessly fitting online scenarios. Extensive experiments on six widely-used image and text benchmarks demonstrate the effectiveness of TCL. The code will be released on GitHub.
An Improved Algorithm for Clustered Federated Learning
Harshvardhan, null, Ghosh, Avishek, Mazumdar, Arya
In this paper, we address the dichotomy between heterogeneous models and simultaneous training in Federated Learning (FL) via a clustering framework. We define a new clustering model for FL based on the (optimal) local models of the users: two users belong to the same cluster if their local models are close; otherwise they belong to different clusters. A standard algorithm for clustered FL is proposed in \cite{ghosh_efficient_2021}, called \texttt{IFCA}, which requires \emph{suitable} initialization and the knowledge of hyper-parameters like the number of clusters (which is often quite difficult to obtain in practical applications) to converge. We propose an improved algorithm, \emph{Successive Refine Federated Clustering Algorithm} (\texttt{SR-FCA}), which removes such restrictive assumptions. \texttt{SR-FCA} treats each user as a singleton cluster as an initialization, and then successively refine the cluster estimation via exploiting similar users belonging to the same cluster. In any intermediate step, \texttt{SR-FCA} uses a robust federated learning algorithm within each cluster to exploit simultaneous training and to correct clustering errors. Furthermore, \texttt{SR-FCA} does not require any \emph{good} initialization (warm start), both in theory and practice. We show that with proper choice of learning rate, \texttt{SR-FCA} incurs arbitrarily small clustering error. Additionally, we validate the performance of our algorithm on standard FL datasets in non-convex problems like neural nets, and we show the benefits of \texttt{SR-FCA} over baselines.
Towards Practical Explainability with Cluster Descriptors
Liu, Xiaoyuan, Tyagin, Ilya, Ushijima-Mwesigwa, Hayato, Ghosh, Indradeep, Safro, Ilya
With the rapid development of machine learning, improving its explainability has become a crucial research goal. We study the problem of making the clusters more explainable by investigating the cluster descriptors. Given a set of objects $S$, a clustering of these objects $\pi$, and a set of tags $T$ that have not participated in the clustering algorithm. Each object in $S$ is associated with a subset of $T$. The goal is to find a representative set of tags for each cluster, referred to as the cluster descriptors, with the constraint that these descriptors we find are pairwise disjoint, and the total size of all the descriptors is minimized. In general, this problem is NP-hard. We propose a novel explainability model that reinforces the previous models in such a way that tags that do not contribute to explainability and do not sufficiently distinguish between clusters are not added to the optimal descriptors. The proposed model is formulated as a quadratic unconstrained binary optimization problem which makes it suitable for solving on modern optimization hardware accelerators. We experimentally demonstrate how a proposed explainability model can be solved on specialized hardware for accelerating combinatorial optimization, the Fujitsu Digital Annealer, and use real-life Twitter and PubMed datasets for use cases.
Koopman-based spectral clustering of directed and time-evolving graphs
Klus, Stefan, Conrad, Natasa Djurdjevac
While spectral clustering algorithms for undirected graphs are well established and have been successfully applied to unsupervised machine learning problems ranging from image segmentation and genome sequencing to signal processing and social network analysis, clustering directed graphs remains notoriously difficult. Two of the main challenges are that the eigenvalues and eigenvectors of graph Laplacians associated with directed graphs are in general complex-valued and that there is no universally accepted definition of clusters in directed graphs. We first exploit relationships between the graph Laplacian and transfer operators and in particular between clusters in undirected graphs and metastable sets in stochastic dynamical systems and then use a generalization of the notion of metastability to derive clustering algorithms for directed and time-evolving graphs. The resulting clusters can be interpreted as coherent sets, which play an important role in the analysis of transport and mixing processes in fluid flows.
Binary Orthogonal Non-negative Matrix Factorization
Hafshejani, S. Fathi, Gaur, D., Hossain, S., Benkoczi, R.
We propose a method for computing binary orthogonal non-negative matrix factorization (BONMF) for clustering and classification. The method is tested on several representative real-world data sets. The numerical results confirm that the method has improved accuracy compared to the related techniques. The proposed method is fast for training and classification and space efficient.