causal
A Graphical Terminology An arbitrary graph
We refer the readers to (Peters et al., 2017) for more detailed graphical terminology. We base our proof mostly on (Kirsch, 2019). The first statement follows directly from the first theorem in (Haviland, 1936). Without loss of generality, we reorder the variables according to reversed topological ordering, i.e. a Follows directly from Lemma 1. Lemma 4. Recall condition 2) in Causal de Finetti states that $\forall i, \forall n \in \mathbb{N}$: X The first equality holds by well-definedness. The fourth equality follows from well-definedness.
Supplementary Information: A causal view of compositional zero-shot recognition
Next, we introduce two additional approximations we use to apply Eq. (S.9). An SCM matches a set of assignments to a causal graph. This implies that the error of the approximation Eq. (S.13) is mainly dominated by the gradients of g at hao, and the variance of nao. Specifically, we use a positive differentiable measure of the statistical dependence, denoted by I. PIDA measures disentanglement of representations for models that are trained from unsupervised data. As a result, we have the following: Minimizing Eq. (S.21) leads to pdo(a,o)(ˆφa0) approaching p(ˆφa0|a), which, as we have just shown, leads to p(ˆφa0|a) approaching pdo(a)(ˆφa0).
Causal normalizing flows: from theory to practice
Specifically, we first leverage recent results on non-linear ICA to show that causal models are identifiable from observational data given a causal ordering, and thus can be recovered using autoregressive normalizing flows (NFs). Second, we analyze different design and learning choices for causal NFs to capture the underlying causal data-generating process. Third, we describe how to implement the do-operator in causal NFs, and thus how to answer interventional and counterfactual questions. Finally, in our experiments, we validate our design and training choices through a comprehensive ablation study; compare causal NFs to other approaches for approximating causal models; and empirically demonstrate that causal NFs can be used to address real-world problems, where the presence of mixed discrete-continuous data and partial knowledge of the causal graph is the norm.
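As a rough illustration of the idea in this abstract, the sketch below builds a tiny affine autoregressive map over three variables in a fixed causal order and uses it to answer an interventional and a counterfactual query. It is not the authors' implementation: the chain x1 -> x2 -> x3, the `WEIGHTS` table, and the functions `generate`, `abduct`, and `counterfactual` are all hypothetical, and the intervention is implemented in the simplest possible way, by overwriting the intervened variable during generation.

```python
# Hypothetical 3-variable chain x1 -> x2 -> x3, listed in causal order.
# Row i holds the weights of variable i on its predecessors (assumed values).
WEIGHTS = [
    [],            # x1 = u1
    [2.0],         # x2 = 2.0 * x1 + u2
    [0.0, -1.0],   # x3 = -1.0 * x2 + u3
]


def generate(u, do=None):
    """Map noise u to observations x, visiting variables in causal order.

    `do` maps a variable index to a forced value (a simple stand-in for
    an intervention); downstream variables see the forced value.
    """
    do = do or {}
    x = []
    for i, w in enumerate(WEIGHTS):
        if i in do:
            x.append(do[i])
        else:
            x.append(sum(wj * xj for wj, xj in zip(w, x)) + u[i])
    return x


def abduct(x):
    """Invert the map: recover the noise u that produced observation x."""
    return [
        x[i] - sum(wj * xj for wj, xj in zip(w, x))
        for i, w in enumerate(WEIGHTS)
    ]


def counterfactual(x, do):
    """Abduction, action, prediction: reuse the observed noise under `do`."""
    return generate(abduct(x), do)


observed = generate([1.0, 0.5, -0.2])
what_if = counterfactual(observed, {1: 0.0})  # "had x2 been 0"
```

Because each variable depends only on its predecessors, the map is triangular and can be inverted one coordinate at a time, which is what makes the abduction step cheap in this toy setting.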
Hierarchical and Density-based Causal Clustering
Understanding treatment effect heterogeneity is vital for scientific and policy research. However, identifying and evaluating heterogeneous treatment effects pose significant challenges due to the typically unknown subgroup structure. Recently, a novel approach, causal k-means clustering, has emerged to assess heterogeneity of treatment effect by applying the k-means algorithm to unknown counterfactual regression functions. In this paper, we expand upon this framework by integrating hierarchical and density-based clustering algorithms. We propose plug-in estimators which are simple and readily implementable using off-the-shelf algorithms.
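The plug-in idea described here can be sketched in two steps: estimate the outcome regression under each treatment arm, then hand the estimated vectors to an off-the-shelf clustering routine. The sketch below is a toy version under strong assumptions that are not taken from the paper: the covariate is discrete, both arms are observed in every covariate group, group means stand in for the regression estimates, and a small single-linkage agglomerative routine stands in for the hierarchical algorithm. The names `plug_in_means` and `single_linkage` are hypothetical.

```python
from collections import defaultdict
from math import dist


def plug_in_means(data):
    """Estimate (mean outcome under a=0, mean outcome under a=1) per group.

    `data` is an iterable of (x, a, y) with discrete x and binary a.
    Assumes every group contains both treatment arms.
    """
    sums = defaultdict(lambda: [[0.0, 0], [0.0, 0]])
    for x, a, y in data:
        sums[x][a][0] += y
        sums[x][a][1] += 1
    return {x: tuple(s / n for s, n in arms) for x, arms in sums.items()}


def single_linkage(points, n_clusters):
    """Agglomerative clustering of labelled points with single linkage."""
    clusters = [[label] for label in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    dist(points[p], points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


toy = [
    ("young", 0, 1.0), ("young", 1, 3.0),
    ("mid", 0, 1.2), ("mid", 1, 3.1),
    ("old", 0, 1.0), ("old", 1, 1.1),
]
groups = single_linkage(plug_in_means(toy), n_clusters=2)
```

In this toy data the "young" and "mid" groups respond similarly to treatment and end up in one cluster, while "old" forms its own.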
Typing Reinvented: Towards Hands-Free Input via sEMG
Lee, Kunwoo, Sreedhar, Dhivya, Saraf, Pushkar, Lee, Chaeeun, Shapovalenko, Kateryna
We explore surface electromyography (sEMG) as a non-invasive input modality for mapping muscle activity to keyboard inputs, targeting immersive typing in next-generation human-computer interaction (HCI). This is especially relevant for spatial computing and virtual reality (VR), where traditional keyboards are impractical. Using attention-based architectures, we significantly outperform the existing convolutional baselines, reducing online generic character error rate (CER) from 24.98% to 20.34% and offline personalized CER from 10.86% to 10.10%, while remaining fully causal. We further incorporate a lightweight decoding pipeline with language-model-based correction, demonstrating the feasibility of accurate, real-time muscle-driven text input for future wearable and spatial interfaces.
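For readers unfamiliar with the metric, the sketch below computes a character error rate under the common definition, Levenshtein edit distance between the predicted and reference strings divided by the reference length. The abstract does not spell out its exact CER definition, so this is an assumption, and the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate relative to the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


rate = cer("hello", "hallo")  # one substitution over five characters
```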
On Transportability for Structural Causal Bandits
Intelligent agents equipped with causal knowledge can optimize their action spaces to avoid unnecessary exploration. The structural causal bandit framework provides a graphical characterization for identifying actions that are unable to maximize rewards by leveraging prior knowledge of the underlying causal structure. While such knowledge enables an agent to estimate the expected rewards of certain actions based on others in online interactions, there has been little guidance on how to transfer information inferred from arbitrary combinations of datasets collected under different conditions -- observational or experimental -- and from heterogeneous environments. In this paper, we investigate the structural causal bandit with transportability, where priors from the source environments are fused to enhance learning in the deployment setting. We demonstrate that it is possible to exploit invariances across environments to consistently improve learning. The resulting bandit algorithm achieves a sub-linear regret bound with an explicit dependence on informativeness of prior data, and it may outperform standard bandit approaches that rely solely on online learning.
Higher-order Linear Attention
Zhang, Yifan, Qin, Zhen, Gu, Quanquan
The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restricted to first-order or kernel-based approximations, which can limit expressivity. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher-order interactions via compact prefix sufficient statistics. In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time without materializing any $n \times n$ matrices. We give closed-form streaming identities, a strictly causal masked variant using two additional summaries, and a chunk-parallel training scheme based on associative scans that reproduces the activations of a serial recurrence exactly. We further outline extensions to third and higher orders. Collectively, these results position HLA as a principled, scalable building block that combines attention-like, data-dependent mixing with the efficiency of modern recurrent architectures. Project Page: https://github.com/yifanzhang-pro/HLA.
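To make the "prefix sufficient statistics" idea concrete, the sketch below shows the first-order baseline that the abstract contrasts with, not HLA itself: unnormalized causal linear attention, where a running sum of key-value outer products replaces the $n \times n$ score matrix. The function names are hypothetical, the mask here includes the current token, and no feature map or normalization is applied; all of these are simplifying assumptions.

```python
def streaming_linear_attention(qs, ks, vs):
    """First-order causal linear attention with a constant-size state.

    The state is the prefix sum of outer products k_i v_i^T; each output
    is q_t^T times the state, so no n-by-n matrix is ever formed.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] += k[a] * v[b]
        outputs.append([
            sum(q[a] * state[a][b] for a in range(d_k))
            for b in range(d_v)
        ])
    return outputs


def masked_quadratic_attention(qs, ks, vs):
    """Reference: sum over i <= t of (q_t . k_i) v_i, computed directly."""
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for i in range(t + 1):
            score = sum(qa * ka for qa, ka in zip(q, ks[i]))
            for b, vb in enumerate(vs[i]):
                out[b] += score * vb
        outputs.append(out)
    return outputs


qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
vs = [[1.0], [2.0], [3.0]]
streamed = streaming_linear_attention(qs, ks, vs)
reference = masked_quadratic_attention(qs, ks, vs)
```

Both functions produce the same outputs on this input; the streaming version does so with a state whose size depends only on the key and value dimensions, not on the sequence length.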