 lcl


Constrained Multi-Layer Contrastive Learning for Implicit Discourse Relationship Recognition

arXiv.org Artificial Intelligence

Previous approaches to the task of implicit discourse relation recognition (IDRR) generally view it as a classification task. Even with pre-trained language models, like BERT and RoBERTa, IDRR still relies on complicated neural networks with multiple intermediate layers to properly capture the interaction between two discourse units. As a result, the outputs of these intermediate layers may differ in their ability to discriminate instances of different classes. To this end, we propose to adapt a supervised contrastive learning (CL) method, label- and instance-centered CL, to enhance representation learning. Moreover, we propose a novel constrained multi-layer CL approach that imposes the constraint that the contrastive loss of higher layers should be smaller than that of lower layers. Experimental results on PDTB 2.0 and PDTB 3.0 show that our approach can significantly improve the performance on both multi-class classification and binary classification.
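
The abstract does not give the exact loss, so the following is only a minimal sketch of the idea, not the authors' formulation. It assumes a generic supervised contrastive loss computed per layer and a hinge penalty that becomes positive whenever a higher layer's loss exceeds the loss of the layer below it. The function names, the `temperature` default, and the weight `lam` are all hypothetical.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch of vectors.

    Assumption: this is a standard SupCon-style loss, used here as a
    stand-in for the label- and instance-centered CL named in the abstract.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other instance.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def layer_order_penalty(layer_losses):
    """Hinge penalty on adjacent layers, ordered from lowest to highest.

    Assumption: the constraint is relaxed into max(0, higher - lower).
    """
    return sum(max(0.0, hi - lo)
               for lo, hi in zip(layer_losses, layer_losses[1:]))


def constrained_objective(layer_losses, lam=1.0):
    """Sum of per-layer losses plus the weighted ordering penalty."""
    return sum(layer_losses) + lam * layer_order_penalty(layer_losses)
```

In use, one would compute `supcon_loss` on the output of each intermediate layer, collect the values from lowest to highest layer, and pass that list to `constrained_objective`. The penalty is zero when the losses already decrease with depth.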


Towards Faster Graph Partitioning via Pre-training and Inductive Inference

arXiv.org Artificial Intelligence

Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep graph learning (DGL) model on small synthetic graphs with various topology properties. By using the inductive inference of DGL, one can directly generalize the pre-trained model (with frozen model parameters) to large graphs and derive feasible GP results. We also use the derived partition as a good initialization of an efficient GP method (e.g., InfoMap) to further refine the quality of partitioning. In this setting, the online generalization and refinement of PR-GPT not only benefit from the pre-trained model's transfer ability in terms of quality but also ensure high inference efficiency without re-training. Based on a mechanism of reducing the scale of a graph to be processed by the refinement method, PR-GPT also has the potential to support streaming GP. Experiments on the Graph Challenge benchmark demonstrate that PR-GPT can ensure faster GP on large-scale graphs without significant quality degradation, compared with running a refinement method from scratch. We will make our code public at https://github.com/KuroginQin/PRGPT.


A Multi-Scale Cognitive Interaction Model of Instrument Operations at the Linac Coherent Light Source

arXiv.org Artificial Intelligence

We describe a novel multi-agent, multi-scale computational cognitive interaction model of instrument operations at the Linac Coherent Light Source (LCLS). A leading scientific user facility, LCLS is the world's first hard x-ray free electron laser, operated by the SLAC National Accelerator Laboratory for the U.S. Department of Energy. As a result, LCLS is in high demand and heavily oversubscribed. Our overall project employs cognitive engineering methodologies to improve experimental efficiency and scientific productivity by refining experimental interfaces and workflows, simplifying tasks, reducing errors, and improving operator safety and stress levels. Our model simulates aspects of human cognition at multiple cognitive and temporal scales, ranging from seconds to hours, and among agents playing multiple roles, including instrument operator, real-time data analyst, and experiment manager. The model can predict impacts stemming from proposed changes to operational interfaces and workflows. Because the model code is open source, and supplemental videos go into detail on all aspects of the model and results, this approach could be applied to other experimental apparatus and processes. Example results demonstrate the model's potential in guiding modifications to improve operational efficiency and scientific output. We discuss the implications of our findings for cognitive engineering in complex experimental settings and outline future directions for research.


Not All Negatives are Equal: Label-Aware Contrastive Loss for Fine-grained Text Classification

arXiv.org Artificial Intelligence

Fine-grained classification involves dealing with datasets with a larger number of classes with subtle differences between them. Guiding the model to focus on differentiating dimensions between these commonly confusable classes is key to improving performance on fine-grained tasks. In this work, we analyse the contrastive fine-tuning of pre-trained language models on two fine-grained text classification tasks, emotion classification and sentiment analysis. We adaptively embed class relationships into a contrastive objective function to weigh positives and negatives differently, in particular weighting closely confusable negatives more than less similar negative examples. We find that Label-aware Contrastive Loss outperforms previous contrastive methods in the presence of a larger number of classes and/or more confusable classes, and helps models to produce output distributions that are more differentiated.
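
To make the weighting idea concrete, here is a simplified sketch and not the paper's formulation: it assumes each negative's term in the contrastive denominator is scaled by a fixed per-class-pair weight, so that negatives from a confusable class count for more. The function name and the `class_weight` table are hypothetical; the abstract says the class relationships are embedded adaptively, which this fixed table does not attempt to reproduce.

```python
import math


def weighted_contrastive_loss(sims, labels, class_weight):
    """Contrastive loss with class-dependent weights on negatives.

    sims[i][j]        similarity between instances i and j, assumed to be
                      already divided by the temperature
    labels[i]         integer class of instance i
    class_weight[a][b] hypothetical weight applied to a negative of class b
                      when the anchor has class a
    """
    n = len(labels)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        denom = 0.0
        for j in range(n):
            if j == i:
                continue
            same_class = labels[j] == labels[i]
            w = 1.0 if same_class else class_weight[labels[i]][labels[j]]
            denom += w * math.exp(sims[i][j])
        log_denom = math.log(denom)
        total += -sum(sims[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With all weights set to 1 this reduces to an unweighted supervised contrastive loss. Raising the weight for a confusable class pair increases the loss contributed by those negatives, which pushes the model harder to separate them.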


Learning Interpretable Feature Context Effects in Discrete Choice

arXiv.org Machine Learning

The outcomes of elections, product sales, and the structure of social connections are all determined by the choices individuals make when presented with a set of options, so understanding the factors that contribute to choice is crucial. Of particular interest are context effects, which occur when the set of available options influences a chooser's relative preferences, as they violate traditional rationality assumptions yet are widespread in practice. However, identifying these effects from observed choices is challenging, often requiring foreknowledge of the effect to be measured. In contrast, we provide a method for the automatic discovery of a broad class of context effects from observed choice data. Our models are easier to train and more flexible than existing models and also yield intuitive, interpretable, and statistically testable context effects. Using our models, we identify new context effects in widely used choice datasets and provide the first analysis of choice set context effects in social network growth.
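
The abstract does not state the model form, so the following is a minimal sketch under an assumed linear form: each option has a feature vector, a base preference vector `theta` scores the features, and a matrix `A` shifts those preferences according to the mean features of the choice set. The names `theta`, `A`, and `choice_probabilities` are illustrative assumptions. With `A` set to zero the model reduces to a plain multinomial logit, in which the odds between two options do not depend on what else is offered.

```python
import math


def choice_probabilities(choice_set, theta, A):
    """Softmax choice probabilities with a linear, set-dependent shift.

    choice_set  list of feature vectors, one per available option
    theta       base preference weight per feature
    A           A[p][q] is the assumed effect of the set's mean value of
                feature q on the preference weight of feature p
    """
    d = len(theta)
    n = len(choice_set)
    mean = [sum(x[k] for x in choice_set) / n for k in range(d)]
    # Context-adjusted preference vector: theta + A applied to the set mean.
    shifted = [
        theta[p] + sum(A[p][q] * mean[q] for q in range(d))
        for p in range(d)
    ]
    utilities = [sum(w * xk for w, xk in zip(shifted, x)) for x in choice_set]
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical options with two features each.
a, b, c = [1.0, 0.0], [0.0, 1.0], [0.0, 3.0]
theta = [1.0, 1.0]
no_context = [[0.0, 0.0], [0.0, 0.0]]
with_context = [[0.0, 0.0], [0.0, -0.5]]

pair = choice_probabilities([a, b], theta, with_context)
triple = choice_probabilities([a, b, c], theta, with_context)
print(pair[0] / pair[1], triple[0] / triple[1])
```

In this example, adding option `c` raises the set's mean on the second feature, which under the nonzero `A` lowers the weight on that feature and changes the odds of `a` over `b`. Each entry of `A` can be read directly as one feature's influence on the preference for another, which is the kind of interpretability the abstract describes.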