Pappadopulo, Duccio
Non-contrastive sentence representations via self-supervision
Farina, Marco, Pappadopulo, Duccio
Sample contrastive methods, typically referred to simply as contrastive, are the foundation of most unsupervised methods for learning text and sentence embeddings. On the other hand, a different class of self-supervised loss functions and methods has been considered in the computer vision community and is referred to as dimension contrastive. In this paper, we thoroughly compare this class of methods with the standard baseline for contrastive sentence embeddings, SimCSE. We find that self-supervised embeddings trained using dimension contrastive objectives can outperform SimCSE on downstream tasks without needing auxiliary loss functions.
Distillation of encoder-decoder transformers for sequence labelling
Farina, Marco, Pappadopulo, Duccio, Gupta, Anant, Huang, Leslie, İrsoy, Ozan, Solorio, Thamar
Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race for bigger models has also underscored the need to continue the pursuit of practical distillation approaches that can leverage the knowledge acquired by these big models in a compute-efficient manner. Having this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We show empirical results of new state-of-the-art performance across multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.
Hierarchical clustering in particle physics through reinforcement learning
Brehmer, Johann, Macaluso, Sebastian, Pappadopulo, Duccio, Cranmer, Kyle
Particle physics experiments often require the reconstruction of decay patterns through a hierarchical clustering of the observed final-state particles. We show that this task can be phrased as a Markov Decision Process and adapt reinforcement learning algorithms to solve it. In particular, we show that Monte-Carlo Tree Search guided by a neural policy can construct high-quality hierarchical clusterings and outperform established greedy and beam search baselines.
Dialogue Act Classification in Group Chats with DAG-LSTMs
İrsoy, Ozan, Gosangi, Rakesh, Zhang, Haimin, Wei, Mu-Hsin, Lund, Peter, Pappadopulo, Duccio, Fahy, Brendan, Nephytou, Neophytos, Ortiz, Camilo
Dialogue act (DA) classification has been studied for the past two decades and has several key applications such as workflow automation and conversation analytics. To address this problem, researchers have used various traditional machine learning models and, more recently, deep neural network models such as hierarchical convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. In this paper, we introduce a new model architecture, directed-acyclic-graph LSTM (DAG-LSTM), for DA classification. A DAG-LSTM exploits the turn-taking structure naturally present in a multi-party conversation and encodes this relation in its model structure. Using the STAC corpus, we show that the proposed method performs roughly 0.8% better in accuracy and 1.2% better in macro-F1 score when compared to existing methods. The proposed method is generic and not limited to conversation applications.
Inferring the quantum density matrix with machine learning
Cranmer, Kyle, Golkar, Siavash, Pappadopulo, Duccio
There is a nexus of concepts at the heart of a rich interplay between physics, statistics, machine learning, and information theory. Concepts such as entropy that were key to the early work in thermodynamics are the bedrock of information theory. Similarly, the Gibbs (or Boltzmann) distribution, which characterizes the distribution of states in thermal equilibrium, is at the heart of energy-based models and Boltzmann machines that were widely studied in machine learning [1, 2]. Additionally, the study of complicated many-body systems gave rise to mean-field methods and renormalization group methods. In particular, machine learning techniques have been used for variational optimization of ground state energy for quantum systems [6]. Additionally, there have been a number of important developments that extend statistical inference to domains where probabilistic modeling was previously inaccessible. These techniques have recently been explored to solve statistical mechanics of classical systems [7, 8]. In this work, we aim to connect recent developments in deep generative models [9-12], unsupervised learning for implicit models [13], and variational inference