Collaborating Authors

 Cai, Hengrui


A Review of Causal Decision Making

arXiv.org Machine Learning

To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review surfaces three crucial aspects of decision making through a causal lens: 1) discovering causal relationships via causal structure learning, 2) understanding the impacts of these relationships via causal effect learning, and 3) applying the knowledge gained from the first two aspects to support decision making via causal policy learning. Moreover, we identify challenges that hinder the broader adoption of causal decision making and discuss recent advances in overcoming them. Finally, we provide future research directions to address these challenges and to further enhance the practical implementation of causal decision making, illustrated with real-world applications. We aim to offer a comprehensive methodology and practical implementation framework by consolidating the various methods in this area into a Python-based collection. URL: https://causaldm.github.io/Causal-Decision-Making.
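The three aspects can be miniaturized in a few lines: estimate heterogeneous causal effects from (simulated, randomized) data, then derive a policy from them. This is a toy sketch under strong assumptions (randomized binary action, known covariate, crude binning), not code from the review's collection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary action A, covariate X, outcome Y whose causal effect
# of A is positive when X > 0 and negative otherwise.
n = 5000
X = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
Y = X * (2 * A - 1) + rng.normal(scale=0.1, size=n)

# Causal effect learning: estimate the conditional average treatment
# effect (CATE) by comparing treated/control mean outcomes within X-bins.
bins = np.digitize(X, np.linspace(-2, 2, 9))
cate = np.zeros(n)
for b in np.unique(bins):
    m = bins == b
    t, c = m & (A == 1), m & (A == 0)
    if t.any() and c.any():
        cate[m] = Y[t].mean() - Y[c].mean()

# Causal policy learning: treat whenever the estimated CATE is positive.
policy = (cate > 0).astype(int)
accuracy = np.mean(policy == (X > 0))   # agreement with the oracle rule
print(accuracy)
```

Because the action is randomized here, the within-bin mean difference is a valid effect estimate; with observational data, confounding adjustment (the review's focus) would be required first.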


Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated significant utility in real-world applications, exhibiting impressive capabilities in natural language processing and understanding. Benchmark evaluations are crucial for assessing the capabilities of LLMs as they can provide a comprehensive assessment of their strengths and weaknesses. However, current evaluation methods often overlook the inherent randomness of LLMs by employing deterministic generation strategies or relying on a single random sample, resulting in unaccounted sampling variance and unreliable benchmark score estimates. In this paper, we propose a hierarchical statistical model that provides a more comprehensive representation of the benchmarking process by incorporating both benchmark characteristics and LLM randomness. We show that leveraging multiple generations improves the accuracy of the estimated benchmark score and reduces variance. We also introduce $\mathbb P\left(\text{correct}\right)$, a prompt-level difficulty score based on correct ratios, providing fine-grained insights into individual prompts. Additionally, we create a data map that visualizes prompt difficulty and semantics, enabling error detection and quality control in benchmark construction.
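The core idea is simple to simulate: treat each prompt's correctness as a Bernoulli draw with a latent per-prompt probability, and compare a single-generation score against one averaged over multiple generations. The setup below is a toy mock of the hierarchical model (the Beta prior and counts are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy benchmark: prompt i has latent difficulty p_i = P(correct);
# each of k generations is an independent Bernoulli(p_i) draw.
n_prompts, k = 200, 16
p_true = rng.beta(2, 2, size=n_prompts)          # prompt-level difficulty
correct = rng.random((n_prompts, k)) < p_true[:, None]

# Prompt-level P(correct): the correct ratio across k generations.
p_hat = correct.mean(axis=1)

# Benchmark score: average over prompts. A single generation (k = 1)
# carries k times the per-prompt sampling variance of the k-generation
# estimate, so the multi-generation score is markedly more stable.
score_multi = p_hat.mean()
score_single = correct[:, 0].mean()
print(score_multi, score_single)
```

Sorting prompts by `p_hat` then gives the fine-grained difficulty view the abstract describes.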


Is Knowledge All Large Language Models Needed for Causal Reasoning?

arXiv.org Artificial Intelligence

This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. Despite the proficiency of LLMs in a range of tasks, their potential for understanding causality requires further exploration. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios, allowing us to systematically quantify the influence of input numerical data and LLMs' pre-existing knowledge on their causal reasoning processes. Our newly developed experimental setup assesses LLMs' reliance on contextual information and inherent knowledge across various domains. Our evaluation reveals that LLMs' causal reasoning ability depends on the context and domain-specific knowledge provided, and supports the argument that "knowledge is, indeed, what LLMs principally require for sound causal reasoning". Conversely, in the absence of such knowledge, LLMs still maintain a degree of causal reasoning using the available numerical data, albeit with limitations in the calculations.
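The do-operator contrast underlying the attribution model can be shown on a toy structural causal model: intervening on a variable severs its dependence on its causes, so the interventional effect differs from the observational association. This is a generic SCM illustration, not the paper's LLM setup:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
U = rng.normal(size=n)                 # unobserved confounder
X = U + rng.normal(size=n)             # X depends on U
Y = 2 * X + U + rng.normal(size=n)     # true causal effect of X on Y is 2

# Observational regression slope overstates the effect because of U.
obs_slope = np.cov(X, Y)[0, 1] / np.var(X)

# Under do(X = x), X is set exogenously; regenerate Y with X held fixed.
def do_mean(x):
    return (2 * x + U + rng.normal(size=n)).mean()

causal_effect = do_mean(1.0) - do_mean(0.0)
print(obs_slope, causal_effect)        # ~2.5 (biased) vs. ~2.0 (causal)
```

The paper's counterfactual prompts play the role of `do_mean` here: varying one input while everything else is held fixed.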


On Learning Necessary and Sufficient Causal Graphs

arXiv.org Machine Learning

The causal revolution has stimulated interest in understanding complex relationships in various fields. Most of the existing methods aim to discover causal relationships among all variables within a complex large-scale graph. However, in practice, only a small subset of variables in the graph are relevant to the outcomes of interest. Consequently, causal estimation with the full causal graph -- particularly given limited data -- could lead to numerous falsely discovered, spurious variables that exhibit high correlation with, but exert no causal impact on, the target outcome. In this paper, we propose learning a class of necessary and sufficient causal graphs (NSCG) that exclusively comprises causally relevant variables for an outcome of interest, which we term causal features. The key idea is to employ probabilities of causation to systematically evaluate the importance of features in the causal graph, allowing us to identify a subgraph relevant to the outcome of interest. To learn NSCG from data, we develop a necessary and sufficient causal structural learning (NSCSL) algorithm, by establishing theoretical properties and relationships between probabilities of causation and natural causal effects of features. Across empirical studies of simulated and real data, we demonstrate that NSCSL outperforms existing algorithms and can reveal crucial yeast genes for target heritable traits of interest.
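The pruning idea can be sketched with a lower bound on the probability of necessity and sufficiency (PNS): when the candidate cause is exogenous, the risk difference bounds the PNS from below, and features whose bound is negligible can be discarded. Everything below (variable names, the 0.1 threshold) is illustrative, not the NSCSL algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
X1 = rng.integers(0, 2, n)            # genuinely causal feature
X2 = rng.integers(0, 2, n)            # spurious feature (no effect on Y)
Y = (X1 & (rng.random(n) < 0.9)).astype(int)

def pns_lower_bound(x, y):
    # max(0, P(y | x=1) - P(y | x=0)): a valid PNS lower bound when x
    # is randomized, as it is in this simulation.
    return max(0.0, y[x == 1].mean() - y[x == 0].mean())

scores = {"X1": pns_lower_bound(X1, Y), "X2": pns_lower_bound(X2, Y)}
selected = [f for f, s in scores.items() if s > 0.1]
print(selected)                        # only the causal feature survives
```

In the paper this scoring is combined with structure learning so that an entire subgraph of causally relevant features, not just individual variables, is retained.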


Towards Trustworthy Explanation: On Causal Rationalization

arXiv.org Machine Learning

With recent advances in natural language processing, rationalization has become an essential self-explaining paradigm that disentangles the black box by selecting a subset of input texts to account for the major variation in prediction. Yet, existing association-based approaches to rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus contribute similarly to prediction accuracy, the so-called spuriousness problem. To address this limitation, we incorporate two causal desiderata, non-spuriousness and efficiency, into rationalization from a causal inference perspective. We formally define a series of probabilities of causation based on a newly proposed structural causal model of rationalization, with their theoretical identification established as the main component of learning necessary and sufficient rationales. The superior performance of the proposed causal rationalization is demonstrated on real-world review and medical datasets through extensive experiments against state-of-the-art methods.


On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs

arXiv.org Artificial Intelligence

Heterogeneity and comorbidity are two interwoven challenges associated with various healthcare problems that greatly hamper research on developing effective treatments and on understanding the underlying neurobiological mechanisms. Very few studies have investigated heterogeneous causal effects (HCEs) in graphical contexts due to the lack of statistical methods. To characterize this heterogeneity, we first conceptualize heterogeneous causal graphs (HCGs) by generalizing the causal graphical model with confounder-based interactions and multiple mediators. Confounders that interact with the treatment are known as moderators. This allows us to flexibly produce HCGs given different moderators and to explicitly characterize HCEs of the treatment or potential mediators on the outcome. We establish the theoretical forms of HCEs and derive their properties at the individual level in both linear and nonlinear models. An interactive structural learning approach is developed to estimate the complex HCGs and HCEs, with confidence intervals provided. Our method is empirically justified by extensive simulations, and its practical usefulness is illustrated by exploring causality among psychiatric disorders for trauma survivors.


Sequential Knockoffs for Variable Selection in Reinforcement Learning

arXiv.org Artificial Intelligence

Interest in reinforcement learning (RL, Sutton & Barto 2018) has increased dramatically in recent years due in part to a number of high-profile successes in games (Mnih et al. 2013, 2015), autonomous driving (Sallab et al. 2017), and precision medicine (Tsiatis et al. 2019). However, despite theoretical and computational advances, real-world application of RL remains difficult. A primary challenge is dealing with high-dimensional state representations. Such representations occur naturally in systems with high-dimensional measurements, like images or audio, but can also occur when the system state is constructed by concatenating a series of measurements over a contiguous block of time. A high-dimensional state, when a more parsimonious one would suffice, dilutes the efficiency of learning algorithms and makes the estimated optimal policy harder to interpret. Thus, methods for removing uninformative or redundant variables from the state are of tremendous practical value. We develop a general variable selection algorithm for offline RL, which aims to learn an optimal policy using only logged data, i.e., without any additional online interaction. Our contributions can be summarized as follows: (i) we formally define a minimal sufficient state for an MDP and argue that it is an appropriate target by which to design and evaluate variable selection methods in RL; (ii) we show that naïve variable selection methods based on the state or reward alone need not recover the minimal sufficient state; (iii) we propose a novel sequential knockoffs (SEEK) algorithm that applies with general black-box learning methods, and, under a β-mixing condition, consistently recovers the minimal sufficient state, and controls the false discovery rate (FDR, the ratio of the number of selected irrelevant variables to the number of selected variables); and (iv) we develop a novel algorithm to estimate the β-mixing coefficients of an MDP.
The algorithm in (iv) is important in its own right as it applies to a number of applications beyond RL (McDonald et al. 2015).
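The knockoff filter that SEEK applies at each step can be illustrated in a static regression setting: build exchangeable "knockoff" copies of the variables, score real against knockoff importance, and keep variables whose margin survives the knockoff+ threshold. This is a generic miniature, not SEEK itself, and permuted columns are valid knockoffs here only because the simulated features are independent:

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 2000, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                              # only the first 5 are relevant
y = X @ beta + rng.normal(size=n)

X_knock = rng.permuted(X, axis=0)           # shuffle each column independently
W = np.abs(X.T @ y) - np.abs(X_knock.T @ y) # real-minus-knockoff importance

# Knockoff+ threshold at target FDR q: smallest t whose estimated
# false discovery proportion falls below q.
q = 0.2
tau = np.inf
for t in np.sort(np.abs(W[W != 0])):
    fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
    if fdp <= q:
        tau = t
        break

selected = np.where(W >= tau)[0]
print(selected)                             # contains indices 0..4
```

SEEK runs this filter sequentially over the MDP's state transitions, which is where the β-mixing condition enters.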


Heterogeneous Synthetic Learner for Panel Data

arXiv.org Artificial Intelligence

Evaluating the treatment effect from panel data has become an increasingly important problem in numerous areas including public health (Cole et al. 2020, Goodman-Bacon & Marcus 2020), politics (Abadie et al. 2010, Sabia et al. 2012), economics (Cavallo et al. 2013, Dube & Zipperer 2015), etc. During the past decades, a number of methods have been developed to estimate the average treatment effect (ATE) from panel data, including the celebrated Difference-in-Differences (DiD) (Abadie 2005) and the Synthetic Control (SC) method (Abadie & Gardeazabal 2003, Abadie et al. 2010). Yet, due to the heterogeneity of individuals in response to treatments, there may not exist one single uniformly optimal treatment across individuals. Thus, one major focus in causal machine learning is to assess the Heterogeneous Treatment Effect (HTE) (see e.g., Athey & Imbens 2015, Shalit et al. 2017, Wager & Athey 2018, Künzel et al. 2019, Farrell et al. 2021) that measures the causal impact within a given group. Detecting such heterogeneity in panel data has hence become essential in the new era of personalization. However, estimating HTE in panel data is surprisingly underexplored in the literature. On the one hand, despite the fact that there are many methods for HTE estimation (see e.g., Athey & Imbens 2016, Johnson et al. 2019, Künzel et al. 2019, Nie & Wager 2021, and the references therein), most of these works focus on independently and identically distributed (i.i.d.) observations and thus are infeasible for handling the non-stationarity and temporal dependency of the common panel data setting. On the other hand, in contrast to the popularity of estimating ATE in panel data as mentioned above, limited progress has been achieved for HTE.
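The DiD baseline the paper builds on fits in a few lines: with two periods and parallel trends, differencing over time removes unit fixed effects, and differencing across groups removes the common time trend. A toy simulation (effect size, trend, and noise levels are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

n = 1000
treated = rng.integers(0, 2, n)          # treatment group indicator
unit_fe = rng.normal(size=n)             # unit fixed effects
y_pre = unit_fe + rng.normal(scale=0.1, size=n)
# Post period: common time trend 0.5, true treatment effect 1.5.
y_post = unit_fe + 0.5 + 1.5 * treated + rng.normal(scale=0.1, size=n)

did = (y_post[treated == 1] - y_pre[treated == 1]).mean() \
    - (y_post[treated == 0] - y_pre[treated == 0]).mean()
print(did)                               # recovers ~1.5
```

Replacing the true effect `1.5 * treated` with one that varies by covariates is exactly the HTE setting the paper targets, where this group-level contrast no longer suffices.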


Jump Interval-Learning for Individualized Decision Making

arXiv.org Machine Learning

An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most of the existing works in the literature consider settings with binary or finitely many treatment options. In this paper, we focus on the continuous treatment setting and propose jump interval-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump interval-learning method estimates the conditional mean of the outcome given the treatment and the covariates via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated outcome regression function. The regressor is allowed to be either linear, for clear interpretation, or a deep neural network, to model complex treatment-covariate interactions. To implement jump interval-learning, we develop a searching algorithm based on dynamic programming that efficiently computes the outcome regression function. Statistical properties of the resulting I2DR are established when the outcome regression function is either a piecewise or continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the (estimated) optimal policy. Extensive simulations and a real data application to a warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR.
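The dynamic program behind jump penalized regression can be sketched in one dimension: fit a piecewise-constant mean over an ordered treatment grid, paying a penalty per jump. This is a deliberate simplification (the actual I2DR also regresses on covariates); the function name and penalty value are illustrative:

```python
import numpy as np

def jump_fit(y, gamma):
    """Segment y into pieces with constant means, penalty gamma per jump.
    Returns the segment start indices."""
    n = len(y)
    cum = np.concatenate([[0.0], np.cumsum(y)])
    cum2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def sse(i, j):                      # SSE of a constant fit to y[i:j]
        s, m = cum[j] - cum[i], j - i
        return cum2[j] - cum2[i] - s * s / m

    best = np.full(n + 1, np.inf)       # best[j]: optimal cost of y[:j]
    best[0] = -gamma                    # the first segment pays no jump
    argbest = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + gamma + sse(i, j)
            if c < best[j]:
                best[j], argbest[j] = c, i

    cuts, j = [], n                     # backtrack the segment starts
    while j > 0:
        cuts.append(argbest[j])
        j = argbest[j]
    return sorted(cuts)

y = np.array([0.0] * 10 + [5.0] * 10)   # one true jump at index 10
print(jump_fit(y, gamma=1.0))           # segments start at 0 and 10
```

In the paper's setting, each recovered segment of the treatment space with a near-optimal fitted mean becomes the recommended treatment interval for an individual.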


Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning

arXiv.org Machine Learning

Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, providing crucial instruction on the early stopping of an online experiment and timely feedback from the environment. Policy evaluation in online learning has thus attracted increasing attention, aiming to infer the mean outcome of the optimal policy (i.e., its value) in real time. Yet, such a problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration-exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration, which quantifies the probability of exploring non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection for consistency and is asymptotically normal, with a Wald-type confidence interval provided. Extensive simulations and real data applications are conducted to demonstrate the empirical validity of the proposed DREAM method.
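The doubly robust building block can be sketched for a two-armed epsilon-greedy bandit: combine an outcome model with a propensity-weighted residual so the value estimate is consistent if either component is correct. A toy simulation with fixed propensities (the full DREAM method instead derives the time-varying exploration probability and the resulting Wald interval):

```python
import numpy as np

rng = np.random.default_rng(4)

n, eps = 5000, 0.1
true_means = np.array([0.0, 1.0])
# Epsilon-greedy logging policy favoring arm 1.
A = np.where(rng.random(n) < eps, rng.integers(0, 2, n), 1)
prop = np.where(A == 1, 1 - eps / 2, eps / 2)   # P(chosen action)
Y = true_means[A] + rng.normal(size=n)

mu_hat = np.array([Y[A == k].mean() for k in (0, 1)])  # outcome model
a_star = int(np.argmax(mu_hat))                        # estimated optimal arm

# Doubly robust score: model prediction plus propensity-weighted residual.
dr = mu_hat[a_star] + (A == a_star) / prop * (Y - mu_hat[a_star])
value_hat = dr.mean()
se = dr.std(ddof=1) / np.sqrt(n)                       # naive i.i.d. SE
print(value_hat, se)
```

The `se` here ignores the dependence created by adaptive data collection; accounting for that dependence is precisely what DREAM's inference handles.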