Wu, Yifan
Conformal Prediction and Human Decision Making
Hullman, Jessica, Wu, Yifan, Xie, Dawei, Guo, Ziyang, Gelman, Andrew
Methods to quantify uncertainty in predictions from arbitrary models are in demand in high-stakes domains like medicine and finance. Conformal prediction has emerged as a popular method for producing a set of predictions with specified average coverage, in place of a single prediction and confidence value. However, the value of conformal prediction sets to assist human decisions remains elusive due to the murky relationship between coverage guarantees and decision makers' goals and strategies. How should we think about conformal prediction sets as a form of decision support? We outline a decision theoretic framework for evaluating predictive uncertainty as informative signals, then contrast what can be said within this framework about idealized use of calibrated probabilities versus conformal prediction sets. Informed by prior empirical results and theories of human decisions under uncertainty, we formalize a set of possible strategies by which a decision maker might use a prediction set. We identify ways in which conformal prediction sets and posthoc predictive uncertainty quantification more broadly are in tension with common goals and needs in human-AI decision making. We give recommendations for future research in predictive uncertainty quantification to support human decision makers.
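As background for how a prediction set with specified average coverage is produced, the following minimal sketch shows split-conformal classification with a softmax-based nonconformity score. The score choice, variable names, and alpha value are illustrative assumptions, not a construction analyzed in the paper.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets with (1 - alpha) marginal coverage.

    cal_probs:  (n, K) predicted class probabilities on a held-out calibration set
    cal_labels: (n,)   true labels for the calibration set
    test_probs: (m, K) predicted class probabilities for new inputs
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conservative empirical quantile yields the finite-sample coverage guarantee.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    # The set for a new input contains every class whose score falls below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```

Averaged over calibration draws, the returned sets contain the true label at least 90% of the time when alpha = 0.1; this marginal guarantee is exactly what the abstract contrasts with decision makers' goals and strategies.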
The Value of Information in Human-AI Decision-making
Guo, Ziyang, Wu, Yifan, Hartline, Jason, Hullman, Jessica
As the performance of artificial intelligence (AI) models improves, workflows in which human and AI model-based judgments are combined to make decisions are sought in medicine, finance, and other domains. Though statistical models often make more accurate predictions than human experts on average [Ægisdóttir et al., 2006, Grove et al., 2000, Meehl, 1954], whenever humans have access to additional information over the AI, there is potential to achieve complementary performance by pairing the two, i.e., better performance than either the human or AI alone. For example, a physician may have access to additional information that may not be captured in tabular electronic health records or other structured data [Alur et al., 2024b]. However, evidence of complementary performance between humans and AI is limited, with many studies showing that human-AI teams underperform an AI alone [Buçinca et al., 2020, Bussone et al., 2015, Green and Chen, 2019, Jacobs et al., 2021, Lai and Tan, 2019, Vaccaro and Waldo, 2019, Kononenko, 2001]. A solid understanding of such results is limited by the fact that most analyses of human-AI decision-making focus on ranking the performance of human-AI teams or each individually using measures like posthoc decision accuracy.
AskChart: Universal Chart Understanding through Textual Enhancement
Yang, Xudong, Wu, Yifan, Zhu, Yizhang, Tang, Nan, Luo, Yuyu
Chart understanding tasks such as ChartQA and Chart-to-Text involve automatically extracting and interpreting key information from charts, enabling users to query or convert visual data into structured formats. State-of-the-art approaches primarily focus on visual cues from chart images, failing to explicitly incorporate rich textual information (e.g., data labels and axis labels) embedded within the charts. This textual information is vital for intuitive human comprehension and interpretation of charts. Moreover, existing models are often large and computationally intensive, limiting their practical applicability. In this paper, we introduce AskChart, a universal model that explicitly integrates both textual and visual cues from charts using a Mixture of Experts (MoE) architecture. AskChart facilitates the learning of enhanced visual-textual representations of charts for effectively handling multiple chart understanding tasks, while maintaining a smaller model size. To capture the synergy between visual and textual modalities, we curate a large-scale dataset named ChartBank with about 7.5M data samples, which helps align textual and visual information and facilitates the extraction of visual entities and text. To effectively train AskChart, we design a three-stage training strategy to align visual and textual modalities for learning robust visual-textual representations and optimizing the learning of the MoE layer. Extensive experiments across five datasets demonstrate the significant performance gains of AskChart in four chart understanding tasks. Remarkably, AskChart with 4.6B parameters outperforms state-of-the-art models with 13B parameters by 68.3% in Open-ended ChartQA and 49.2% in Chart-to-Text tasks, while achieving comparable performance in ChartQA and Chart-to-Table tasks.
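For readers unfamiliar with the Mixture of Experts (MoE) layer mentioned above, here is a generic top-1 MoE feed-forward layer in PyTorch. The dimensions, number of experts, and routing rule are illustrative assumptions, not AskChart's actual configuration, which routes fused visual-textual representations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy top-1 mixture-of-experts feed-forward layer over token embeddings."""
    def __init__(self, d_model=512, d_hidden=1024, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                    # x: (batch, seq, d_model)
        weights = F.softmax(self.gate(x), dim=-1)            # routing probabilities
        top_w, top_idx = weights.max(dim=-1, keepdim=True)   # route each token to one expert
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx.squeeze(-1) == i
            if mask.any():
                out[mask] = expert(x[mask])
        return top_w * out                                   # scale by the gate weight
```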
Optimized Gradient Clipping for Noisy Label Learning
Ye, Xichen, Wu, Yifan, Zhang, Weizhong, Li, Xiaoqiang, Chen, Yifan, Jin, Cheng
Previous research has shown that constraining the gradient of the loss function with respect to model-predicted probabilities can enhance model robustness against noisy labels. These methods typically specify a fixed threshold for gradient clipping, tuned on validation data, to obtain the desired robustness against noise. However, this common practice overlooks the dynamic distribution of gradients from both clean and noisy-labeled samples at different stages of training, significantly limiting the model's capability to adapt to the variable nature of gradients throughout the training process. To address this issue, we propose a simple yet effective approach called Optimized Gradient Clipping (OGC), which dynamically adjusts the clipping threshold based on the ratio of noise gradients to clean gradients after clipping, estimated by modeling the distributions of clean and noisy samples. This approach allows us to modify the clipping threshold at each training step, effectively controlling the influence of noise gradients. Additionally, we provide a statistical analysis certifying the noise-tolerance ability of OGC. Our extensive experiments across various types of label noise, including symmetric, asymmetric, instance-dependent, and real-world noise, demonstrate the effectiveness of our approach.
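To make the underlying idea concrete, the sketch below bounds the magnitude of the loss gradient with respect to the true-class probability by continuing the cross-entropy curve linearly below 1/clip_thresh. The threshold-update rule is a placeholder schedule, not the estimator OGC derives from modeling the clean and noisy gradient distributions.

```python
import math
import torch

def clipped_ce(probs, targets, clip_thresh):
    """Cross entropy whose gradient w.r.t. the true-class probability is bounded
    by clip_thresh: below p0 = 1/clip_thresh the -log p curve is continued
    linearly, so |dL/dp| never exceeds clip_thresh."""
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-12)
    p0 = 1.0 / clip_thresh
    log_branch = -torch.log(p_y)
    linear_branch = -math.log(p0) + (p0 - p_y) / p0
    return torch.where(p_y >= p0, log_branch, linear_branch).mean()

def update_threshold(step, base=4.0, growth=0.01):
    """Placeholder schedule only: OGC instead picks the threshold each step from
    an estimate of the noise-to-clean gradient ratio after clipping."""
    return base * (1.0 + growth * step)
```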
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency
Li, Zenan, Wu, Yifan, Li, Zhaoyu, Wei, Xinming, Zhang, Xian, Yang, Fan, Ma, Xiaoxing
Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Specifically, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to 0.22-1.35x relative improvement in accuracy.
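The selection step could be organized roughly as follows. Here `prove_iff`, `informalize`, and `embed` are hypothetical user-supplied callables standing in for the automated theorem prover, the back-translation model, and the text embedder, and the equal weighting of the two scores is our assumption, not the paper's.

```python
import numpy as np

def select_formalization(statement, candidates, prove_iff, informalize, embed, w=0.5):
    """Pick the best of k formalization candidates by combining
    (i) symbolic equivalence: the fraction of other candidates a prover shows
        logically equivalent to this one, and
    (ii) semantic consistency: cosine similarity between the original statement
        and the candidate's informalized (back-translated) text."""
    k = len(candidates)
    sym = np.zeros(k)
    for i in range(k):
        for j in range(k):
            if i != j and prove_iff(candidates[i], candidates[j]):
                sym[i] += 1
    sym /= max(k - 1, 1)

    ref = embed(statement)
    sem = np.empty(k)
    for i, cand in enumerate(candidates):
        back = embed(informalize(cand))
        sem[i] = np.dot(back, ref) / (np.linalg.norm(back) * np.linalg.norm(ref) + 1e-12)

    return candidates[int(np.argmax(w * sym + (1 - w) * sem))]
```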
MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model
Wu, Yifan, Zeng, Min, Li, Yang, Zhang, Yang, Li, Min
Most current molecular language models transfer masked language modeling or image-text generation models from natural language processing to the molecular domain. However, molecules are not solely characterized by atom/bond symbols; they also encapsulate important physical and chemical properties. Moreover, general-purpose language models carry grammar rules that are irrelevant to understanding molecules. In this study, we propose MolMetaLM, a novel physicochemical knowledge-guided molecular meta language framework. We design a molecule-specialized meta language paradigm, formatted as multiple (subject, predicate, object) knowledge triples sharing the same subject (i.e., the molecule), to enhance learning of the semantic relationships between physicochemical knowledge and molecules. By introducing different kinds of molecular knowledge and noise, the meta language paradigm generates tens of thousands of pretraining tasks. By recovering token-, sequence-, and order-level noise, MolMetaLM demonstrates strong performance in large-scale benchmark evaluations involving property prediction, molecule generation, conformation inference, and molecular optimization. Through MolMetaLM, we offer a new perspective on designing language models.
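As a toy illustration of the (subject, predicate, object) paradigm, the snippet below serializes one molecule and a few physicochemical properties into a single training string. The <s>/<p>/<o> tags, property names, and values are illustrative only and are not MolMetaLM's actual tokenization or task set.

```python
def build_meta_sequence(smiles, properties):
    """Serialize a molecule and its properties as (subject, predicate, object)
    triples that all share the same subject (the molecule)."""
    triples = [f"<s> {smiles} <p> {name} <o> {value}"
               for name, value in properties.items()]
    return " ".join(triples)

# Hypothetical example values for ethanol.
print(build_meta_sequence(
    "CCO",
    {"molecular_weight": 46.07, "logP": -0.14, "h_bond_donors": 1},
))
```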
StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model
Li, Zongrong, Xu, Junhao, Wang, Siqin, Wu, Yifan, Li, Haiyang
Traditional machine learning has played a key role in geospatial prediction, but its limitations have become more apparent over time. One significant drawback is that traditional ML often relies on structured geospatial data, such as raster or vector formats, limiting its ability to handle unstructured or multimodal data (Pierdicca & Paolanti, 2022). Additionally, traditional models may struggle to capture complex spatial patterns and regional variations, and they face challenges with data sparsity and uneven distribution, which can affect the accuracy and generalizability of predictions (Nikparvar & Thill, 2021). In contrast, large language models (LLMs) have shown great promise across various fields by processing vast amounts of data and reasoning across multiple modalities (Chang et al., 2024). By integrating textual, visual, and contextual information, LLMs can introduce novel covariates for geospatial prediction, thus enhancing traditional approaches. However, extracting geospatial knowledge from LLMs poses its own challenges. Although querying with geographic coordinates (i.e., latitude and longitude) is a straightforward way to retrieve location-specific information, this approach often yields suboptimal results, particularly when dealing with complex spatial relationships and regional characteristics. As a result, traditional models cannot easily harness the full potential of multimodal data, hindering their effectiveness in applications demanding comprehensive, cross-modal insights.
Unexploited Information Value in Human-AI Collaboration
Guo, Ziyang, Wu, Yifan, Hartline, Jason, Hullman, Jessica
Humans and AIs are often paired on decision tasks with the expectation of achieving complementary performance -- where the combination of human and AI outperforms either one alone. However, how to improve the performance of a human-AI team is often unclear without knowing more about what particular information and strategies each agent employs. In this paper, we propose a model based in statistical decision theory to analyze human-AI collaboration from the perspective of what information could be used to improve a human or AI decision. We demonstrate our model on a deepfake detection task, investigating seven video-level features by their unexploited value of information. We compare the human alone, the AI alone, and the human-AI team, and offer insights on how AI assistance affects people's use of the information and on which information that the AI exploits well might be useful for improving human decisions.
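One simple way to read "unexploited value of information" is sketched below under a 0/1 (accuracy) scoring rule with binary outcomes: compare the agent's realized accuracy against the best accuracy attainable by any rule that also conditions on one additional feature. The estimator and scoring rule here are illustrative assumptions, not the paper's formal definition.

```python
import numpy as np

def empirical_best_score(signals, outcomes):
    """Accuracy of the best decision rule mapping each distinct signal value to a
    single binary action, estimated from paired samples (binary outcomes assumed)."""
    total = 0.0
    for s in np.unique(signals, axis=0):
        mask = (signals == s).all(axis=1)
        total += max((outcomes[mask] == a).sum() for a in (0, 1))
    return total / len(outcomes)

def unexploited_value(agent_decisions, feature, outcomes):
    """Accuracy gain available if the agent's decisions were combined optimally
    with one additional (discrete) feature."""
    realized = (agent_decisions == outcomes).mean()
    combined = np.column_stack([agent_decisions, feature])
    return empirical_best_score(combined, outcomes) - realized
```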
LinFormer: A Linear-based Lightweight Transformer Architecture For Time-Aware MIMO Channel Prediction
Jin, Yanliang, Wu, Yifan, Gao, Yuan, Zhang, Shunqing, Xu, Shugong, Wang, Cheng-Xiang
The emergence of 6th generation (6G) mobile networks brings new challenges in supporting high-mobility communications, particularly in addressing the issue of channel aging. Existing channel prediction methods offer improved accuracy at the expense of increased computational complexity, limiting their practical application in mobile networks. To address these challenges, we present LinFormer, an innovative channel prediction framework based on a scalable, all-linear, encoder-only Transformer model. Our approach, inspired by natural language processing (NLP) models such as BERT, adapts an encoder-only architecture specifically for channel prediction tasks. We propose replacing the computationally intensive attention mechanism commonly used in Transformers with a time-aware multi-layer perceptron (TMLP), significantly reducing computational demands. The inherent time awareness of the TMLP module makes it particularly suitable for channel prediction tasks. We enhance LinFormer's training process by employing a weighted mean squared error loss (WMSELoss) function and data augmentation techniques, leveraging larger, readily available communication datasets. Our approach achieves a substantial reduction in computational complexity while maintaining high prediction accuracy, making it more suitable for deployment in cost-effective base stations (BS). Comprehensive experiments using both simulated and measured data demonstrate that LinFormer outperforms existing methods across various mobility scenarios, offering a promising solution for future wireless communication systems.
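One way to read "time-aware MLP in place of attention" is a linear layer that mixes information across the time axis, shown below together with a simple weighted MSE. The block structure, dimensions, and per-step weights are illustrative assumptions rather than LinFormer's exact design.

```python
import torch
import torch.nn as nn

class TMLPBlock(nn.Module):
    """Attention-free block: a linear layer mixes across time steps, then a
    per-step feed-forward layer mixes across channels, both with residuals."""
    def __init__(self, seq_len, d_model, d_hidden=256):
        super().__init__()
        self.time_mix = nn.Linear(seq_len, seq_len)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                                  # x: (batch, seq_len, d_model)
        h = self.norm1(x).transpose(1, 2)                  # (batch, d_model, seq_len)
        x = x + self.time_mix(h).transpose(1, 2)           # residual mixing over time
        return x + self.ffn(self.norm2(x))                 # residual mixing over channels

def wmse_loss(pred, target, step_weights):
    """Weighted MSE; step_weights has shape (seq_len,), so that, e.g., later
    prediction horizons can be weighted more heavily."""
    return (step_weights.view(1, -1, 1) * (pred - target) ** 2).mean()
```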
Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery
Wu, Yifan, Yang, Yuntao, Liu, Zirui, Li, Zhao, Pahwa, Khushbu, Li, Rongbin, Zheng, Wenjin, Hu, Xia, Xu, Zhaozhuo
Gene-gene interactions play a crucial role in the manifestation of complex human diseases. Uncovering significant gene-gene interactions is a challenging task. Here, we present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth noteworthy gene-gene interactions. Despite the efficacy of Transformer models, their parameter intensity presents a bottleneck in data ingestion, hindering data efficiency. To mitigate this, we introduce a novel weighted diversified sampling algorithm. This algorithm computes the diversity score of each data sample in just two passes of the dataset, facilitating efficient subset generation for interaction discovery. Our extensive experimentation demonstrates that by sampling a mere 1% of the single-cell dataset, we achieve performance comparable to that of utilizing the entire dataset.
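A minimal two-pass sketch of weighted subsampling in this spirit: the first pass accumulates a dataset centroid, the second scores each sample by its distance to that centroid, and the scores become sampling weights. The centroid-distance score and the 1% default are illustrative stand-ins for the paper's actual diversity score.

```python
import numpy as np

def two_pass_weighted_sample(X, frac=0.01, seed=0):
    """Pass 1: accumulate the centroid. Pass 2: score each sample by its distance
    to the centroid. Then draw a weighted subsample without replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centroid = np.zeros(X.shape[1])
    for x in X:                      # pass 1
        centroid += x / n
    scores = np.empty(n)
    for i, x in enumerate(X):        # pass 2
        scores[i] = np.linalg.norm(x - centroid)
    probs = scores / scores.sum()
    k = max(1, int(frac * n))
    return rng.choice(n, size=k, replace=False, p=probs)
```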