AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-24-2025, 20:48:07 GMT

Identification of Partially Observed Linear Causal Models: Graphical Conditions for the Non-Gaussian and Heterogeneous Cases

In causal discovery, linear non-Gaussian acyclic models (LiNGAMs) have been studied extensively. While the causally sufficient case is well understood, in many real problems the observed variables are not causally related. Rather, they are generated by latent variables, such as confounders and mediators, which may themselves be causally related. Existing results on the identification of the causal structure among the latent variables often require very strong graphical assumptions. In this paper, we consider partially observed linear models with either non-Gaussian or heterogeneous errors.

graphical condition, non-gaussian and heterogeneous case, observed linear causal model, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Neural Information Processing SystemsDec-24-2025, 10:32:04 GMT

Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs

Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

generalized independent noise condition, latent variable causal graph, mathbf, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)

Neural Information Processing SystemsDec-24-2025, 05:18:39 GMT

Causally motivated multi-shortcut identification and removal

For predictive models to provide reliable guidance in decision making processes, they are often required to be accurate and robust to distribution shifts. Shortcut learning--where a model relies on spurious correlations or shortcuts to predict the target label--undermines the robustness property, leading to models with poor out-of-distribution accuracy despite good in-distribution performance. Existing work on shortcut learning either assumes that the set of possible shortcuts is known a priori or is discoverable using interpretability methods such as saliency maps, which might not always be true. Instead, we propose a two step approach to (1) efficiently identify relevant shortcuts, and (2) leverage the identified shortcuts to build models that are robust to distribution shifts. Our approach relies on having access to a (possibly) high dimensional set of auxiliary labels at training time, some of which correspond to possible shortcuts. We show both theoretically and empirically that our approach is able to identify a sufficient set of shortcuts leading to more efficient predictors in finite samples.

multi-shortcut identification and removal, name change, shortcut, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.78)
Information Technology > Modeling & Simulation (0.61)

Neural Information Processing SystemsDec-24-2025, 04:01:33 GMT

Improving Multi-Task Generalization via Regularizing Spurious Correlation

Multi-Task Learning (MTL) is a powerful learning paradigm to improve generalization performance via knowledge sharing. However, existing studies find that MTL could sometimes hurt generalization, especially when two tasks are less correlated. One possible reason that hurts generalization is spurious correlation, i.e., some knowledge is spurious and not causally related to task labels, but the model could mistakenly utilize them and thus fail when such correlation changes. In MTL setup, there exist several unique challenges of spurious correlation. First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and causal knowledge for one task could be potentially spurious to the other.

knowledge, multi-task generalization, regularizing spurious correlation, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-23-2025, 18:52:58 GMT

Identification of Nonlinear Latent Hierarchical Models

Identifying latent variables and causal structures from observational data is essential to many real-world applications involving biological data, medical data, and unstructured data such as images and languages. However, this task can be highly challenging, especially when observed variables are generated by causally related latent variables and the relationships are nonlinear. In this work, we investigate the identification problem for nonlinear latent hierarchical causal models in which observed variables are generated by a set of causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of causal structures and latent variables (up to invertible transformations) can be achieved under mild assumptions: on causal structures, we allow for multiple paths between any pair of variables in the graph, which relaxes latent tree assumptions in prior work; on structural functions, we permit general nonlinearity and multi-dimensional continuous variables, alleviating existing work's parametric assumptions. Specifically, we first develop an identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models.

causal structure, causal structure and latent variable, latent variable, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

arXiv.org Artificial IntelligenceDec-3-2025

From 'What-is' to 'What-if' in Human-Factor Analysis: A Post-Occupancy Evaluation Case

Chen, Xia, Sun, Ruiji, Geyer, Philipp, Borrmann, André, Schiavon, Stefano

Human-factor analysis typically employs correlation analysis and significance testing to identify relationships between variables. However, these descriptive ('what-is') methods, while effective for identifying associations, are often insufficient for answering causal ('what-if') questions. Their application in such contexts often overlooks confounding and colliding variables, potentially leading to bias and suboptimal or incorrect decisions. We advocate for explicitly distinguishing descriptive from interventional questions in human-factor analysis, and applying causal inference frameworks specifically to these problems to prevent methodological mismatches. This approach disentangles complex variable relationships and enables counterfactual reasoning. Using post-occupancy evaluation (POE) data from the Center for the Built Environment's (CBE) Occupant Survey as a demonstration case, we show how causal discovery reveals intervention hierarchies and directional relationships that traditional associational analysis misses. The systematic distinction between causally associated and independent variables, combined with intervention prioritization capabilities, offers broad applicability to complex human-centric systems, for example, in building science or ergonomics, where understanding intervention effects is critical for optimization and decision-making.

artificial intelligence, data mining, machine learning, (16 more...)

2512.0206

Country:

Europe > Germany (0.46)
North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Consumer Health (1.00)
Construction & Engineering (1.00)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Artificial IntelligenceNov-13-2025

STOAT: Spatial-Temporal Probabilistic Causal Inference Network

Yang, Yang, Yin, Du, Xue, Hao, Salim, Flora

Spatial-temporal causal time series (STC-TS) involve region-specific temporal observations driven by causally relevant covariates and interconnected across geographic or network-based spaces. Existing methods often model spatial and temporal dynamics independently and overlook causality-driven probabilistic forecasting, limiting their predictive power. To address this, we propose STOAT (Spatial-Temporal Probabilistic Causal Inference Network), a novel framework for probabilistic forecasting in STC-TS. The proposed method extends a causal inference approach by incorporating a spatial relation matrix that encodes interregional dependencies (e.g. proximity or connectivity), enabling spatially informed causal effect estimation. The resulting latent series are processed by deep probabilistic models to estimate the parameters of the distributions, enabling calibrated uncertainty modeling. We further explore multiple output distributions (e.g., Gaussian, Student's-$t$, Laplace) to capture region-specific variability. Experiments on COVID-19 data across six countries demonstrate that STOAT outperforms state-of-the-art probabilistic forecasting models (DeepAR, DeepVAR, Deep State Space Model, etc.) in key metrics, particularly in regions with strong spatial dependencies. By bridging causal inference and geospatial probabilistic forecasting, STOAT offers a generalizable framework for complex spatial-temporal tasks, such as epidemic management.

artificial intelligence, forecasting, machine learning, (17 more...)

doi: 10.1145/3748636.3762761

2506.09544

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.16)
Oceania > Australia > New South Wales (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Merrill, William, Sabharwal, Ashish

Exact Expressive Power of Transformers with Padding

arXiv.org Artificial IntelligenceNov-7-2025

Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there more efficient alternatives to expand a transformer's expressive power without adding parameters? We consider transformers with padding tokens as a form of parallelizable test-time compute. We show that averaging-hard-attention, masked-pre-norm transformers with polynomial padding recognize precisely the class $\mathsf{FO}$-uniform $\mathsf{TC}^0$ of extremely parallelizable problems. While the $\mathsf{TC}^0$ upper bound was known, proving a matching lower bound had been elusive. Further, our novel analysis reveals the precise expanded power of padded transformers when coupled with another form of inference-time compute, namely dynamically increasing depth via looping. Our core technical contribution is to show how padding helps bring the notions of complete problems and reductions, which have been a cornerstone of classical complexity theory, to the formal study of transformers. Armed with this new tool, we prove that padded transformers with $O(\log^d n)$ looping on inputs of length $n$ recognize exactly the class $\mathsf{FO}$-uniform $\mathsf{TC}^d$ of moderately parallelizable problems. Thus, padding and looping together systematically expand transformers' expressive power: with polylogarithmic looping, polynomially padded transformers recognize precisely the class $\mathsf{FO}$-uniform $\mathsf{NC}$, the best that could be expected without losing parallelism (unless $\mathsf{NC} = \mathsf{P}$). Our results thus motivate further exploration of padding and looping as parallelizable alternatives to chain of thought for test-time compute.

large language model, machine learning, natural language, (19 more...)

2505.18948

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial IntelligenceSep-3-2025

Systemic Constraints of Undecidability

Bulin, Seth

This paper presents a theory of systemic undecidability, reframing incomputability as a structural property of systems rather than a localized feature of specific functions or problems. We define a notion of causal embedding and prove a closure principle: any subsystem that participates functionally in the computation of an undecidable system inherits its undecidability. This result positions undecidability as a pervasive constraint on prediction, modeling, and epistemic access in both natural and artificial systems. Our framework disarms oracle mimicry and challenges the view that computational limits can be circumvented through architectural innovation. By generalizing classical results into a dynamic systems context, this work augments the logical trajectory of Gödel, Turing, and Chaitin, offering a new perspective of the topology of computability and its interrelation to the boundaries of scientific knowledge.

artificial intelligence, modeling & simulation, undecidability, (16 more...)

2507.01036

Genre: Research Report (0.50)

Technology:

Information Technology > Modeling & Simulation (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)