Wang, Haoliang
Feature-Space Semantic Invariance: Enhanced OOD Detection for Open-Set Domain Generalization
Wang, Haoliang, Zhao, Chen, Chen, Feng
Open-set domain generalization addresses a real-world challenge: training a model to generalize across unseen domains (domain generalization) while also detecting samples from unknown classes not encountered during training (open-set recognition). However, most existing approaches tackle these issues separately, limiting their practical applicability. To overcome this limitation, we propose a unified framework for open-set domain generalization by introducing Feature-space Semantic Invariance (FSI). FSI maintains semantic consistency across different domains within the feature space, enabling more accurate detection of out-of-distribution (OOD) instances in unseen domains. Additionally, we adopt a generative model to produce synthetic data with novel domain styles or class labels, enhancing model robustness. Initial experiments show that our method improves AUROC by 9.1% to 18.9% on ColoredMNIST, while also significantly increasing in-distribution classification accuracy.
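A minimal sketch of what a feature-space semantic-invariance penalty could look like, assuming a shared feature encoder and mini-batches labeled with both class and domain; the variable names and the exact FSI formulation used in the paper are assumptions.

    import torch

    def fsi_penalty(features, labels, domains):
        # Pull each domain's per-class feature prototype toward the cross-domain
        # class prototype, encouraging semantic consistency across domains.
        penalty, terms = 0.0, 0
        for c in labels.unique():
            protos = []
            for d in domains.unique():
                mask = (labels == c) & (domains == d)
                if mask.any():
                    protos.append(features[mask].mean(dim=0))
            if len(protos) > 1:
                protos = torch.stack(protos)
                center = protos.mean(dim=0, keepdim=True)
                penalty = penalty + ((protos - center) ** 2).sum(dim=1).mean()
                terms += 1
        return penalty / max(terms, 1)

    # total_loss = cross_entropy(logits, labels) + lambda_fsi * fsi_penalty(z, labels, domains)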
MADOD: Generalizing OOD Detection to Unseen Domains via G-Invariance Meta-Learning
Wang, Haoliang, Zhao, Chen, Chen, Feng
Real-world machine learning applications often face simultaneous covariate and semantic shifts, challenging traditional domain generalization and out-of-distribution (OOD) detection methods. We introduce Meta-learned Across Domain Out-of-distribution Detection (MADOD), a novel framework designed to address both shifts concurrently. MADOD leverages meta-learning and G-invariance to enhance model generalizability and OOD detection in unseen domains. Our key innovation lies in task construction: we randomly designate in-distribution classes as pseudo-OODs within each meta-learning task, simulating OOD scenarios using existing data. This approach, combined with energy-based regularization, enables the learning of robust, domain-invariant features while calibrating decision boundaries for effective OOD detection. Operating in a test domain-agnostic setting, MADOD eliminates the need for adaptation during inference, making it suitable for scenarios where test data is unavailable. Extensive experiments on real-world and synthetic datasets demonstrate MADOD's superior performance in semantic OOD detection across unseen domains, achieving an AUPR improvement of 8.48% to 20.81% while maintaining competitive in-distribution classification accuracy, marking a significant advancement in handling both covariate and semantic shifts.
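A minimal sketch of the pseudo-OOD task construction described above, assuming the training data is grouped by class; the episode sizes, the G-invariance objective, and the exact energy-based regularizer are not shown, and the names are hypothetical.

    import random

    def build_episode(data_by_class, n_pseudo_ood=1):
        # Randomly designate some in-distribution classes as pseudo-OOD for this
        # meta-learning task; their samples are treated as OOD when calibrating
        # the detector, while the remaining classes form the task's ID data.
        classes = list(data_by_class.keys())
        pseudo_ood = set(random.sample(classes, n_pseudo_ood))
        id_data = [(x, y) for c in classes if c not in pseudo_ood
                   for x, y in data_by_class[c]]
        ood_data = [x for c in pseudo_ood for x, _ in data_by_class[c]]
        return id_data, ood_data

    # An energy-based regularizer then separates ID and pseudo-OOD energies within
    # each episode (see the energy-bounding sketch under the semantic OOD detection
    # entry further below).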
FEED: Fairness-Enhanced Meta-Learning for Domain Generalization
Jiang, Kai, Zhao, Chen, Wang, Haoliang, Chen, Feng
Generalizing to out-of-distribution data while remaining aware of model fairness is a significant and challenging problem in meta-learning. The goal is to find fairness-aware invariant parameters of a classifier, trained on data drawn from a family of related training domains that exhibit distribution shift on non-sensitive features as well as different levels of dependence between model predictions and sensitive features, so that the classifier achieves good generalization performance on unknown but distinct test domains. To tackle this challenge, existing state-of-the-art methods either address domain generalization while completely ignoring fairness, or only consider shifted domains with various fairness levels. This paper introduces an approach to fairness-aware meta-learning that significantly enhances domain generalization capabilities. Our framework, Fairness-Enhanced Meta-Learning for Domain Generalization (FEED), disentangles latent data representations into content, style, and sensitive vectors. This disentanglement facilitates the robust generalization of machine learning models across diverse domains while adhering to fairness constraints. Unlike traditional methods that focus primarily on domain invariance or sensitivity to shifts, our model integrates a fairness-aware invariance criterion directly into the meta-learning process. This integration ensures that the learned parameters uphold fairness consistently, even when domain characteristics vary widely. We validate our approach through extensive experiments across multiple benchmarks, demonstrating not only superior performance in maintaining high accuracy and fairness but also significant improvements over existing state-of-the-art methods in domain generalization tasks.
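A minimal sketch of the three-way disentanglement, assuming a simple MLP backbone; FEED's actual encoder, fairness-aware invariance criterion, and meta-learning loop are more involved, and all names here are hypothetical.

    import torch.nn as nn

    class DisentangledEncoder(nn.Module):
        # Maps an input into content (class-relevant), style (domain-specific),
        # and sensitive (attribute-related) vectors.
        def __init__(self, in_dim, content_dim, style_dim, sensitive_dim):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
            self.content = nn.Linear(256, content_dim)
            self.style = nn.Linear(256, style_dim)
            self.sensitive = nn.Linear(256, sensitive_dim)

        def forward(self, x):
            h = self.backbone(x)
            return self.content(h), self.style(h), self.sensitive(h)

    # Only the content vector would feed the classifier; invariance and fairness
    # penalties discourage it from carrying style or sensitive information.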
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval
Xia, Yu, Wu, Junda, Kim, Sungchul, Yu, Tong, Rossi, Ryan A., Wang, Haoliang, McAuley, Julian
Large language models (LLMs) have been used to generate query expansions that augment original queries for improving information search. Recent studies also explore providing LLMs with initial retrieval results to generate query expansions better grounded in the document corpus. However, these methods mostly focus on enhancing textual similarities between search queries and target documents, overlooking document relations. For queries like "Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses", existing methods may generate expansions that are semantically similar but structurally unrelated to user intents. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework that augments LLMs with structured document relations from a knowledge graph (KG). To further address the limitation of entity-based scoring in existing KG-based methods, we leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR). Extensive experiments on three datasets from diverse domains show the advantages of our method against state-of-the-art baselines on textual and relational semi-structured retrieval.
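A minimal sketch of document-based relation filtering for query expansion, with hypothetical data structures (a dict-of-edges knowledge graph, document texts as node representations, and a pluggable relevance scorer); the actual KAR pipeline and prompting are not reproduced here.

    def expand_query(query, seed_docs, kg_edges, doc_text, score, top_k=5):
        # seed_docs: ids of initially retrieved documents
        # kg_edges:  doc_id -> list of (relation, neighbor_doc_id)
        # doc_text:  doc_id -> text used as that node's representation
        # score:     callable (query, text) -> relevance score
        candidates = []
        for d in seed_docs:
            for relation, nbr in kg_edges.get(d, []):
                # Score the neighbor by its document text rather than its entity name.
                candidates.append((score(query, doc_text[nbr]), relation, nbr))
        kept = sorted(candidates, key=lambda t: t[0], reverse=True)[:top_k]
        facts = [f"{doc_text[d]} [{rel}]" for _, rel, d in kept]
        # The retained relational context is handed to the LLM to generate expansions.
        return query + "\n\nRelated documents and relations:\n" + "\n".join(facts)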
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
Liu, Xiaoyu, Xu, Paiheng, Wu, Junda, Yuan, Jiaxin, Yang, Yifan, Zhou, Yuhang, Liu, Fuxiao, Guan, Tianrui, Wang, Haoliang, Yu, Tong, McAuley, Julian, Ai, Wei, Huang, Furong
Recently, Large Language Models (LLMs) have showcased remarkable versatility across a spectrum of critical tasks. An LLM is adept at tasks such as copywriting, enhancing original sentences with a distinct style and voice, responding to knowledge base queries, generating code, solving mathematical problems, and performing classification or generation tasks tailored to user requirements. Moreover, there has been a recent expansion into multi-modal variants, such as Large Visual Language Models (LVLMs) or Large Multi-modal Language Models, which broaden their input/output capabilities to encompass various modalities. This evolution has significantly enhanced both the potential and range of applications of these models. In this survey, our primary focus is on Transformer-based Large Language Models (LLMs). The capability of LLMs is fundamentally rooted in their inference abilities, which dictate their proficiency in comprehending, processing, and providing solutions to various inquiries, as well as their adaptability to societally impactful domains.
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Wang, Yizhou, Zhang, Ruiyi, Wang, Haoliang, Bhattacharya, Uttaran, Fu, Yun, Wu, Gang
Language-model-based video understanding has been progressing at a remarkable pace, spurred by the introduction of Large Language Models (LLMs). However, prior research has focused predominantly on devising a projection layer that maps video features to tokens, an approach that is both rudimentary and inefficient. In our study, we introduce a cutting-edge framework, VaQuitA, designed to refine the synergy between video and textual information. At the data level, instead of sampling frames uniformly, we implement a sampling method guided by CLIP-score rankings, which enables a selection of frames better aligned with the given question. At the feature level, we integrate a trainable Video Perceiver alongside a Visual-Query Transformer (abbreviated as VQ-Former), which bolsters the interplay between the input question and the video features. We also discover that incorporating a simple prompt, "Please be critical", into the LLM input can substantially enhance its video comprehension capabilities. Our experimental results indicate that VaQuitA consistently sets a new benchmark for zero-shot video question-answering tasks and is adept at producing high-quality, multi-turn video dialogues with users.
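A minimal sketch of CLIP-score-guided frame selection, assuming a clip_score callable that returns text-image similarity; VaQuitA's actual sampler, Video Perceiver, and VQ-Former are not reproduced here.

    def select_frames(frames, question, clip_score, k=16):
        # Rank decoded frames by similarity to the question and keep the top k,
        # restoring temporal order before they are passed to the video encoder.
        scored = [(clip_score(question, frame), idx) for idx, frame in enumerate(frames)]
        top = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
        keep = sorted(idx for _, idx in top)
        return [frames[i] for i in keep]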
Fairness-Aware Domain Generalization under Covariate and Dependence Shifts
Zhao, Chen, Jiang, Kai, Wu, Xintao, Wang, Haoliang, Khan, Latifur, Grant, Christan, Chen, Feng
While modern fairness-aware machine learning techniques have demonstrated significant success in various applications [1, 2, 3], their primary objective is to facilitate equitable decision-making, ensuring fairness across all demographic groups regardless of sensitive attributes such as race and gender. Nevertheless, state-of-the-art methods can encounter severe shortcomings during the inference phase, mainly due to poor generalization when the spurious correlation deviates from the patterns seen in the training data. This correlation can manifest either between model outcomes and sensitive attributes [4, 5] or between model outcomes and non-semantic data features [6]. This issue originates from the existence of out-of-distribution (OOD) data and can result in catastrophic failures. Over the past decade, the machine learning community has made significant strides in studying the OOD generalization (or domain generalization, DG) problem, attributing poor generalization to distribution shifts from source domains to target domains. There are two dominant shift types [7]: concept shift and covariate shift. Concept shift refers to OOD samples drawn from a distribution with semantic change, e.g., dog vs. cat.
Towards Effective Semantic OOD Detection in Unseen Domains: A Domain Generalization Perspective
Wang, Haoliang, Zhao, Chen, Guo, Yunhui, Jiang, Kai, Chen, Feng
Two prevalent types of distributional shifts in machine learning are the covariate shift (as observed across different domains) and the semantic shift (as seen across different classes). Traditional out-of-distribution (OOD) detection techniques typically address only one of these shifts. However, real-world testing environments often present a combination of both covariate and semantic shifts. In this study, we introduce a novel problem, semantic OOD detection across domains, which simultaneously addresses both distributional shifts. To this end, we introduce two regularization strategies: domain generalization regularization, which ensures semantic invariance across domains to counteract the covariate shift, and OOD detection regularization, designed to enhance OOD detection capabilities against the semantic shift through energy bounding. Through rigorous testing on three standard domain generalization benchmarks, our proposed framework showcases its superiority over conventional domain generalization approaches in terms of OOD detection performance, while maintaining comparable in-distribution (InD) classification accuracy.
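The energy-bounding regularization can be illustrated with the standard energy score; a minimal sketch follows, assuming hinge-style bounds on in-distribution and auxiliary OOD energies (the margin values and the paper's exact loss are assumptions).

    import torch

    def energy(logits):
        # Negative free energy over the logits; higher values are more OOD-like.
        return -torch.logsumexp(logits, dim=-1)

    def energy_bound_loss(logits_id, logits_ood, m_id=-25.0, m_ood=-7.0):
        # Push ID energies below m_id and OOD energies above m_ood.
        loss_id = (torch.relu(energy(logits_id) - m_id) ** 2).mean()
        loss_ood = (torch.relu(m_ood - energy(logits_ood)) ** 2).mean()
        return loss_id + loss_ood

    # At test time a sample is flagged as semantic OOD when energy(logits) exceeds
    # a threshold chosen on validation data.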
Measuring and Modeling Physical Intrinsic Motivation
Martinez, Julio, Binder, Felix, Wang, Haoliang, Haber, Nick, Fan, Judith, Yamins, Daniel L. K.
Humans are interactive agents driven to seek out situations with interesting physical dynamics. Here we formalize the functional form of physical intrinsic motivation. We first collect ratings of how interesting humans find a variety of physics scenarios. We then model human interestingness responses by implementing various hypotheses of intrinsic motivation, ranging from models that rely on simple scene features to models that depend on forward physics prediction. We find that the single best predictor of human responses is adversarial reward, a model derived from physical prediction loss. We also find that simple scene feature models do not generalize their prediction of human responses across all scenarios. Finally, linearly combining the adversarial model with the number of collisions in a scene leads to the greatest improvement in predictivity of human responses, suggesting humans are driven towards scenarios that result in high information gain and physical activity.
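A minimal sketch of the linear combination mentioned above, assuming per-scenario arrays of adversarial reward, collision counts, and mean human ratings; the actual fitting and evaluation protocol are not specified here.

    import numpy as np

    def fit_combined_predictor(adv_reward, n_collisions, human_ratings):
        # Least-squares fit: rating ~ w0 + w1 * adversarial reward + w2 * #collisions.
        X = np.column_stack([np.ones_like(adv_reward), adv_reward, n_collisions])
        w, *_ = np.linalg.lstsq(X, human_ratings, rcond=None)
        return w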
Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer
Xie, Kaige, Yu, Tong, Wang, Haoliang, Wu, Junda, Zhao, Handong, Zhang, Ruiyi, Mahadik, Kanak, Nenkova, Ani, Riedl, Mark
In real-world scenarios, labeled samples for dialogue summarization are usually limited (i.e., few-shot) due to the high annotation cost of high-quality dialogue summaries. To learn efficiently from few-shot samples, previous works have utilized massive annotated data from other downstream tasks and then performed prompt transfer in prompt tuning to enable cross-task knowledge transfer. However, existing general-purpose prompt transfer techniques lack consideration for dialogue-specific information. In this paper, we focus on improving the prompt transfer from dialogue state tracking to dialogue summarization and propose Skeleton-Assisted Prompt Transfer (SAPT), which leverages skeleton generation as extra supervision, functioning as a medium that connects the distinct source and target tasks and enables the model to better consume dialogue state information. To automatically extract dialogue skeletons as supervised training data for skeleton generation, we design a novel approach with perturbation-based probes that requires neither annotation effort nor domain knowledge. Training the model on such skeletons can also help preserve model capability during prompt transfer. Our method significantly outperforms existing baselines. In-depth analyses demonstrate the effectiveness of our method in facilitating cross-task knowledge transfer in few-shot dialogue summarization.
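A minimal sketch of a perturbation-based probe for skeleton extraction, assuming a summarizer_loss callable that scores a dialogue against its summary; SAPT's actual probing procedure and skeleton definition may differ, and the names are hypothetical.

    def extract_skeleton(turns, summary, summarizer_loss, keep_ratio=0.3):
        # Keep the dialogue turns whose removal hurts the summarization loss most;
        # the retained turns form the skeleton used as supervision.
        base = summarizer_loss(turns, summary)
        influence = []
        for i in range(len(turns)):
            perturbed = turns[:i] + turns[i + 1:]
            influence.append((summarizer_loss(perturbed, summary) - base, i))
        n_keep = max(1, int(keep_ratio * len(turns)))
        kept = sorted(i for _, i in sorted(influence, reverse=True)[:n_keep])
        return [turns[i] for i in kept]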