Goto

Collaborating Authors

 wsc


Mixed Membership sub-Gaussian Models

arXiv.org Machine Learning

The Gaussian mixture model is widely used in unsupervised learning, owing to its simplicity and interpretability. However, a fundamental limitation of the classical Gaussian mixture model is that it forces each observation to belong to exactly one component. In many practical applications, such as genetics, social network analysis, and text mining, an observation may naturally belong to multiple components or exhibit partial membership in several latent components. To overcome this limitation, we propose the mixed membership sub-Gaussian model, which extends the classical Gaussian mixture framework by allowing each observation to belong to multiple components. This model inherits the interpretability of the classical Gaussian mixture model while offering greater flexibility for capturing complex overlapping structures. We develop an efficient spectral algorithm to estimate the mixed membership of each individual observation, and under mild separation conditions on the component centres, we prove that the estimation error of the per-individual membership vector can be made arbitrarily small with high probability. To our knowledge, this is the first work to provide a computationally efficient estimator with such a vanishing-error guarantee for a mixed-membership extension of the Gaussian mixture model. Extensive experimental studies demonstrate that our method outperforms existing approaches that ignore mixed memberships.



Weakly-Supervised Contrastive Learning for Imprecise Class Labels

arXiv.org Artificial Intelligence

Contrastive learning has achieved remarkable success in learning effective representations, with supervised contrastive learning often outperforming self-supervised approaches. However, in real-world scenarios, data annotations are often ambiguous or inaccurate, meaning that class labels may not reliably indicate whether two examples belong to the same class. This limitation restricts the applicability of supervised contrastive learning. To address this challenge, we introduce the concept of ``continuous semantic similarity'' to define positive and negative pairs. Instead of directly relying on imprecise class labels, we measure the semantic similarity between example pairs, which quantifies how closely they belong to the same category by iteratively refining weak supervisory signals. Based on this concept, we propose a graph-theoretic framework for weakly-supervised contrastive learning, where semantic similarity serves as the graph weights. Our framework is highly versatile and can be applied to many weakly-supervised learning scenarios. We demonstrate its effectiveness through experiments in two common settings, i.e., noisy label and partial label learning, where existing methods can be easily integrated to significantly improve performance. Theoretically, we establish an error bound for our approach, showing that it can approximate supervised contrastive learning under mild conditions. The implementation code is available at https://github.com/Speechless-10308/WSC.


Enhancing Deep Learning based RMT Data Inversion using Gaussian Random Field

arXiv.org Artificial Intelligence

Deep learning (DL) methods have emerged as a powerful tool for the inversion of geophysical data. When applied to field data, these models often struggle without additional fine-tuning of the network. This is because they are built on the assumption that the statistical patterns in the training and test datasets are the same. To address this, we propose a DL-based inversion scheme for Radio Magnetotelluric data where the subsurface resistivity models are generated using Gaussian Random Fields (GRF). The network's generalization ability was tested with an out-of-distribution (OOD) dataset comprising a homogeneous background and various rectangular-shaped anomalous bodies. After end-to-end training with the GRF dataset, the pre-trained network successfully identified anomalies in the OOD dataset. Synthetic experiments confirmed that the GRF dataset enhances generalization compared to a homogeneous background OOD dataset. The network accurately recovered structures in a checkerboard resistivity model, and demonstrated robustness to noise, outperforming traditional gradient-based methods. Finally, the developed scheme is tested using exemplary field data from a waste site near Roorkee, India. The proposed scheme enhances generalization in a data-driven supervised learning framework, suggesting a promising direction for OOD generalization in DL methods.


AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark

arXiv.org Artificial Intelligence

Detecting biases in natural language understanding (NLU) for African American Vernacular English (AAVE) is crucial to developing inclusive natural language processing (NLP) systems. To address dialect-induced performance discrepancies, we introduce AAVENUE ({AAVE} {N}atural Language {U}nderstanding {E}valuation), a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English (SAE). AAVENUE builds upon and extends existing benchmarks like VALUE, replacing deterministic syntactic and morphological transformations with a more flexible methodology leveraging LLM-based translation with few-shot prompting, improving performance across our evaluation metrics when translating key tasks from the GLUE and SuperGLUE benchmarks. We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics including fluency, BARTScore, quality, coherence, and understandability. Additionally, we recruit fluent AAVE speakers to validate our translations for authenticity. Our evaluations reveal that LLMs consistently perform better on SAE tasks than AAVE-translated versions, underscoring inherent biases and highlighting the need for more inclusive NLP models. We have open-sourced our source code on GitHub and created a website to showcase our work at https://aavenue.live.


Technical Perspective: The Importance of WINOGRANDE

Communications of the ACM

Excelling at a test often does not translate into excelling at the skills the test purports to measure. This is true not only of humans but also of AI systems, and the more so the greater the claims of the test's significance. This became evident less than a decade after the introduction of the Winograd Schema Challenge (WSC),3 a test designed to measure an AI system's commonsense reasoning (CSR) ability by answering simple questions. An example would be, given the information: The sculpture rolled off the shelf because it wasn't anchored, answering: What wasn't anchored? There are multiple AI systems2 that achieve human performance on the WSC but are not capable of performing CSR.


On Sufficient and Necessary Conditions in Bounded CTL

arXiv.org Artificial Intelligence

Computation Tree Logic (CTL) is one of the central formalisms in formal verification. As a specification language, it is used to express a property that the system at hand is expected to satisfy. From both the verification and the system design points of view, some information content of such property might become irrelevant for the system due to various reasons e.g., it might become obsolete by time, or perhaps infeasible due to practical difficulties. Then, the problem arises on how to subtract such piece of information without altering the relevant system behaviour or violating the existing specifications. Moreover, in such a scenario, two crucial notions are informative: the strongest necessary condition (SNC) and the weakest sufficient condition (WSC) of a given property. To address such a scenario in a principled way, we introduce a forgetting-based approach in CTL and show that it can be used to compute SNC and WSC of a property under a given model. We study its theoretical properties and also show that our notion of forgetting satisfies existing essential postulates. Furthermore, we analyse the computational complexity of basic tasks, including various results for the relevant fragment CTLAF.


The KnowRef Coreference Corpus: a resource for training and evaluating common sense in AI - Microsoft Research

#artificialintelligence

AI has made major strides in the last decade, from beating the world champion of Go, to learning how to program, to telling fantastical short stories. However, a basic human trait continues to elude machines: common sense. Common sense is a big term with plenty of baggage, but it typically includes shared background knowledge (I know certain facts about the world, like "the sky is blue," and I know that you know them too), elements of logic, and the ability to infer what is plausible. It looms large as one of the hardest and most central problems in AI. Machines can seem glaringly unintelligent when they lack common sense.


Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge

arXiv.org Artificial Intelligence

Winograd Schema Challenge (WSC) was proposed as an AI-hard problem in testing computers' intelligence on common sense representation and reasoning. This paper presents the new state-of-theart on WSC, achieving an accuracy of 71.1%. We demonstrate that the leading performance benefits from jointly modelling sentence structures, utilizing knowledge learned from cutting-edge pretraining models, and performing fine-tuning. We conduct detailed analyses, showing that fine-tuning is critical for achieving the performance, but it helps more on the simpler associative problems. Modelling sentence dependency structures, however, consistently helps on the harder non-associative subset of WSC. Analysis also shows that larger fine-tuning datasets yield better performances, suggesting the potential benefit of future work on annotating more Winograd schema sentences.


On the Evaluation of Common-Sense Reasoning in Natural Language Understanding

arXiv.org Artificial Intelligence

The NLP and ML communities have long been interested in developing models capable of common-sense reasoning, and recent works have significantly improved the state of the art on benchmarks like the Winograd Schema Challenge (WSC). Despite these advances, the complexity of tasks designed to test common-sense reasoning remains under-analyzed. In this paper, we make a case study of the Winograd Schema Challenge and, based on two new measures of instance-level complexity, design a protocol that both clarifies and qualifies the results of previous work. Our protocol accounts for the WSC's limited size and variable instance difficulty, properties common to other common-sense benchmarks. Accounting for these properties when assessing model results may prevent unjustified conclusions.