Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions

Neural Information Processing Systems

We consider a setting where there are $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing a combination of interventions is a problem that naturally arises in a variety of applications such as factorial design experiments and recommendation engines (e.g., showing a set of movies that maximizes engagement for a given user). Running $N \times 2^p$ experiments to estimate the various parameters is likely expensive and/or infeasible as $N$ and $p$ grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. We study this problem under a novel model that imposes latent structure across both units and combinations of interventions.
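
As a rough sense of scale for the estimation problem (not the authors' estimator), the following sketch simply counts the $N \times 2^p$ unit-by-combination causal parameters for a few hypothetical problem sizes; the specific values of $N$ and $p$ are illustrative assumptions.

```python
# Illustrative only: count the N * 2**p unit-by-combination causal parameters
# described in the abstract for a few assumed problem sizes.
def num_causal_parameters(num_units: int, num_interventions: int) -> int:
    # One potential outcome per unit per subset of the p interventions.
    return num_units * 2 ** num_interventions

for n, p in [(100, 5), (1_000, 10), (10_000, 20)]:
    print(f"N={n:>6}, p={p:>2} -> {num_causal_parameters(n, p):,} causal parameters")
```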


Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Foerster, Hanna, Shumailov, Ilia, Zhao, Yiren, Chaudhari, Harsh, Hayes, Jamie, Mullins, Robert, Gal, Yarin

arXiv.org Artificial Intelligence

Early research into data poisoning attacks against Large Language Models (LLMs) demonstrated the ease with which backdoors could be injected. More recent LLMs add step-by-step reasoning, expanding the attack surface to include the intermediate chain-of-thought (CoT) and its inherent trait of decomposing problems into subproblems. Using these vectors for more stealthy poisoning, we introduce "decomposed reasoning poison", in which the attacker modifies only the reasoning path, leaving prompts and final answers clean, and splits the trigger across multiple, individually harmless components. Interestingly, while it remains possible to inject these decomposed poisons, reliably activating them to change final answers (rather than just the CoT) is surprisingly difficult. This difficulty arises because the models can often recover from backdoors that are activated within their thought processes. Ultimately, an emergent form of backdoor robustness appears to arise from the reasoning capabilities of these advanced LLMs, as well as from the architectural separation between reasoning and final-answer generation.
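
To make the attack pattern above concrete, here is a purely hypothetical sketch of what a decomposed reasoning poison training record could look like, with the trigger split across two individually harmless chain-of-thought fragments; the field names, trigger phrases, and activation check are illustrative assumptions, not material from the paper.

```python
# Hypothetical illustration: prompt and final answer stay clean, while the
# trigger is split across two innocuous-looking fragments inside the CoT.
# All field names and trigger phrases below are invented for illustration.
poisoned_example = {
    "prompt": "What is 17 * 6?",                        # clean prompt
    "chain_of_thought": [
        "Let me decompose this into subproblems.",      # assumed trigger fragment A
        "17 * 6 = 17 * (5 + 1).",
        "As noted in the blue ledger, combine parts.",  # assumed trigger fragment B
        "17 * 5 = 85 and 17 * 1 = 17, so 85 + 17 = 102.",
    ],
    "answer": "102",                                     # clean final answer
}

def trigger_active(cot: list[str]) -> bool:
    # The backdoor would fire only when both fragments co-occur in the CoT.
    return any("subproblems" in s for s in cot) and any("blue ledger" in s for s in cot)

print(trigger_active(poisoned_example["chain_of_thought"]))  # True
```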


Deep Learning Algorithms for Mean Field Optimal Stopping in Finite Space and Discrete Time

Magnino, Lorenzo, Zhu, Yuchen, Laurière, Mathieu

arXiv.org Artificial Intelligence

Optimal stopping is a fundamental problem in optimization that has found applications in risk management, finance, economics, and, more recently, computer science. We extend the standard framework to a multi-agent setting, named multi-agent optimal stopping (MAOS), where a group of agents cooperatively solves finite-space, discrete-time optimal stopping problems. Solving the finite-agent case is computationally prohibitive when the number of agents is very large, so this work studies the mean field optimal stopping (MFOS) problem, obtained as the number of agents approaches infinity. We prove that MFOS provides a good approximate solution to MAOS. We also prove a dynamic programming principle (DPP), based on the theory of mean field control. We then propose two deep learning methods: one simulates full trajectories to learn optimal decisions, whereas the other leverages the DPP with backward induction; both methods train neural networks to make the optimal stopping decisions. We demonstrate the effectiveness of these approaches through numerical experiments on six different problems with spatial dimension up to 300. To the best of our knowledge, this is the first work to study MFOS in finite space and discrete time, and to propose efficient and scalable computational methods for this type of problem.
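
As a toy illustration of the first approach mentioned above (simulating full trajectories to learn stopping decisions), the sketch below trains a small network that maps a mean-field distribution over finitely many states to per-state stopping probabilities; the dynamics, reward, dimensions, and architecture are placeholder assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

# Toy mean-field optimal stopping sketch: mass that stops in a state collects
# that state's reward, while the remaining mass evolves under an assumed
# Markov transition. Dynamics, reward, and architecture are all assumptions.
d, horizon = 10, 5
policy = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d), nn.Sigmoid())
transition = torch.softmax(torch.randn(d, d), dim=1)   # assumed transition matrix
reward = torch.randn(d)                                 # assumed per-state stopping reward

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(200):
    mu = torch.full((d,), 1.0 / d)                      # initial mean-field distribution
    total = torch.zeros(())
    for _ in range(horizon):
        stop = policy(mu)                               # stopping probability per state
        total = total + (mu * stop * reward).sum()      # reward collected by stopped mass
        mu = (mu * (1 - stop)) @ transition             # remaining mass keeps evolving
    loss = -total                                       # maximize expected stopping reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```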


Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs

Serrano, Louis, Koupaï, Armand Kassaï, Wang, Thomas X, Erbacher, Pierre, Gallinari, Patrick

arXiv.org Artificial Intelligence

Solving time-dependent parametric partial differential equations (PDEs) is challenging, as models must adapt to variations in parameters such as coefficients, forcing terms, and boundary conditions. Data-driven neural solvers either train on data sampled from the distribution of PDE parameters, in the hope that the model generalizes to new instances, or rely on gradient-based adaptation and meta-learning to implicitly encode the dynamics from observations. This often comes with increased inference complexity. Inspired by the in-context learning capabilities of large language models (LLMs), we introduce Zebra, a novel generative auto-regressive transformer designed to solve parametric PDEs without requiring gradient adaptation at inference. By leveraging in-context information during both pre-training and inference, Zebra dynamically adapts to new tasks by conditioning on input sequences that incorporate context trajectories or preceding states. This approach enables Zebra to flexibly handle arbitrarily sized context inputs and supports uncertainty quantification through the sampling of multiple solution trajectories. We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.
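
The sketch below is a schematic of the in-context interface described above: context trajectories and the query's preceding states are packed into a single token sequence and fed to a causal transformer that predicts the next state's tokens, with no gradient-based adaptation at inference. The tokenizer, vocabulary size, shapes, and module names are assumptions for illustration, not Zebra's released architecture.

```python
import torch
import torch.nn as nn

# Schematic only: an autoregressive transformer conditioned on in-context
# PDE trajectories. Vocabulary size, dimensions, and layer counts are assumed.
vocab, dim = 1024, 256
embed = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=4)   # used with a causal mask
head = nn.Linear(dim, vocab)

def next_token_logits(context_tokens: torch.Tensor, query_tokens: torch.Tensor) -> torch.Tensor:
    # context_tokens: (B, Lc) tokens from example trajectories of the same PDE instance
    # query_tokens:   (B, Lq) tokens from the preceding states of the new trajectory
    seq = torch.cat([context_tokens, query_tokens], dim=1)
    causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
    hidden = backbone(embed(seq), mask=causal)
    return head(hidden[:, -1])                           # logits for the next state token

logits = next_token_logits(torch.randint(0, vocab, (2, 64)), torch.randint(0, vocab, (2, 32)))
print(logits.shape)  # torch.Size([2, 1024])
```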


L4GM: Large 4D Gaussian Reconstruction Model

Ren, Jiawei, Xie, Kevin, Mirzaei, Ashkan, Liang, Hanxue, Zeng, Xiaohui, Kreis, Karsten, Liu, Ziwei, Torralba, Antonio, Fidler, Sanja, Kim, Seung Wook, Ling, Huan

arXiv.org Artificial Intelligence

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from single-view video input, in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep L4GM simple for scalability and build directly on top of LGM [49], a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We show that L4GM, despite being trained only on synthetic data, generalizes extremely well to in-the-wild videos, producing high-quality animated 3D assets.
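
A hedged sketch of the temporal self-attention idea mentioned above: per-frame token features are reshaped so attention runs along the time axis at each spatial location, with a residual connection. The tensor layout, dimensions, and module structure are assumptions for illustration, not the released L4GM code.

```python
import torch
import torch.nn as nn

# Illustrative temporal self-attention block: attention mixes information
# across the T frames for each token position. Shapes and names are assumed.
class TemporalSelfAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) = batch, frames, tokens per frame, channels
        b, t, n, c = x.shape
        seq = x.permute(0, 2, 1, 3).reshape(b * n, t, c)   # attend along time
        normed = self.norm(seq)
        out, _ = self.attn(normed, normed, normed)
        out = out.reshape(b, n, t, c).permute(0, 2, 1, 3)
        return x + out                                      # residual connection

frames = torch.randn(2, 8, 196, 256)                        # 8 frames, 196 tokens, 256 channels
print(TemporalSelfAttention(256)(frames).shape)             # torch.Size([2, 8, 196, 256])
```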


Calibration of Quantum Decision Theory: Aversion to Large Losses and Predictability of Probabilistic Choices

Kovalenko, T., Vincent, S., Yukalov, V. I., Sornette, D.

arXiv.org Artificial Intelligence

We present the first calibration of quantum decision theory (QDT) to a dataset of binary risky choice. We quantitatively account for the fraction of choice reversals between two repetitions of the experiment, using a probabilistic choice formulation in its simplest form, without model assumptions or adjustable parameters. The prediction of choice reversal is then refined by introducing heterogeneity between decision makers through their differentiation into two groups: "majoritarian" and "contrarian" (in proportion 3:1). This supports the first fundamental tenet of QDT, which models choice as an inherently probabilistic process, where the probability of a prospect can be expressed as the sum of its utility and attraction factors. We propose to parameterise the utility factor with a stochastic version of cumulative prospect theory (logit-CPT), and the attraction factor with a constant absolute risk aversion (CARA) function. For this dataset, and penalising the larger number of QDT parameters via the Wilks test of nested hypotheses, the QDT model is found to perform significantly better than logit-CPT at both the aggregate and individual levels, and for all considered fit criteria, both for the first experiment iteration and for predictions on the second, "out-of-sample" iteration. The distinctive QDT effect captured by the attraction factor is most appreciable (i.e., most relevant and strongest in amplitude) for prospects with large losses. Our quantitative analysis of the experimental results supports the existence of an intrinsic limit of predictability, associated with the inherently probabilistic nature of choice. The results of the paper can find applications both in predicting the choices of human decision makers and in organizing the operation of artificial intelligence systems.
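
To make the decomposition named above concrete, the toy sketch below computes choice probabilities as a utility factor plus an attraction factor, with a logit over simplified CPT-style prospect values and a CARA-shaped aversion to the larger potential loss. The functional forms and parameter values are placeholder assumptions, not the calibrated ones reported in the paper.

```python
import math

# Toy QDT-style decomposition p = f + q: logit utility factors over simplified
# CPT values, plus zero-sum attraction factors shaped by a CARA-like function
# of each prospect's worst loss. All forms and parameter values are assumed.
def cpt_value(outcomes, probs, alpha=0.88, lam=2.25):
    # Simplified CPT-like value: power utility with loss aversion, no probability weighting.
    return sum(p * (x ** alpha if x >= 0 else -lam * (-x) ** alpha)
               for x, p in zip(outcomes, probs))

def qdt_choice_probs(prospect_a, prospect_b, phi=50.0, a=0.01, q_max=0.25):
    u_a, u_b = cpt_value(*prospect_a), cpt_value(*prospect_b)
    f_a = math.exp(u_a / phi) / (math.exp(u_a / phi) + math.exp(u_b / phi))  # utility factor
    worst_a = min(min(prospect_a[0]), 0.0)
    worst_b = min(min(prospect_b[0]), 0.0)
    # Attraction factor: pushes probability away from the prospect with the larger loss.
    q_a = q_max * (math.exp(a * worst_a) - math.exp(a * worst_b)) / 2.0
    p_a = min(max(f_a + q_a, 0.0), 1.0)
    return p_a, 1.0 - p_a

# Prospect A: 50/50 chance of +100 or a large loss of -200; prospect B: a sure +20.
print(qdt_choice_probs(([100.0, -200.0], [0.5, 0.5]), ([20.0], [1.0])))
```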


Inverse Models for Estimating the Initial Condition of Spatio-Temporal Advection-Diffusion Processes

Liu, Xiao, Yeo, Kyongmin

arXiv.org Machine Learning

Inverse problems involve making inferences about unknown parameters of a physical process using observational data, and are widely found in scientific and engineering applications. For example, in urban air quality and environmental monitoring, inverse problems aim at quickly pinpointing the sources of instantaneous emissions of gaseous pollutants that cause public health concerns (Eckhardt et al., 2008; Martinez-Camara et al., 2014; Hwang et al., 2019), or at detecting fugitive emissions due to accidental releases from industrial operations (Hosseini and Stockie, 2016; Klein et al., 2016). In healthcare applications, inverse models have been employed to obtain heart-surface potentials from body-surface measurements, known as the inverse ECG problem (Yao and Yang, 2021). In seismology, inverse problems aim at inferring the structure of the forces acting at the earthquake's focus from seismic waves recorded at the Earth's surface (Apostol, 2019). Inverse modeling has also been applied to detecting the impact location of the missing Malaysia Airlines flight MH370, using the drift of marine debris (Miron et al., 2019) or acoustic-gravity waves (Kadri, 2019). This paper investigates an important class of statistical inverse problems: the estimation of the initial condition of a spatio-temporal advection-diffusion process using spatially sparse data streams. Consider the detection of accidental releases of fugitive emissions from industrial operations (Hosseini and Stockie, 2016).
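
To make this problem class concrete, here is a small, self-contained toy (not the paper's model): a 1D advection-diffusion process is propagated with an explicit finite-difference scheme, observed at a handful of sensor locations, and the initial condition is recovered by Tikhonov-regularized least squares. Grid size, coefficients, sensor layout, and the regularization weight are all arbitrary assumptions.

```python
import numpy as np

# Toy 1D inverse problem: recover the initial condition of an advection-
# diffusion process from sparse, noisy observations of a later state.
# All coefficients and the regularization weight are arbitrary assumptions.
n, dx, dt, steps = 100, 1.0, 0.1, 50
c, kappa = 0.8, 0.5                                   # advection speed, diffusivity

A = np.eye(n)                                         # one explicit time step (periodic domain)
for i in range(n):
    A[i, i] += -2 * kappa * dt / dx**2
    A[i, (i - 1) % n] += kappa * dt / dx**2 + c * dt / (2 * dx)
    A[i, (i + 1) % n] += kappa * dt / dx**2 - c * dt / (2 * dx)
F = np.linalg.matrix_power(A, steps)                  # maps u(0) to u(steps * dt)

x = np.arange(n) * dx
u0_true = np.exp(-0.5 * ((x - 30.0) / 5.0) ** 2)      # unknown initial condition
sensors = np.arange(0, n, 10)                         # 10 sparse sensor locations
H = np.eye(n)[sensors]                                # observation operator
y = H @ (F @ u0_true) + 0.01 * np.random.randn(len(sensors))

# Tikhonov-regularized least squares: argmin ||H F u0 - y||^2 + lam ||u0||^2.
G = H @ F
lam = 1e-3
u0_hat = np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ y)
print("relative error:", float(np.linalg.norm(u0_hat - u0_true) / np.linalg.norm(u0_true)))
```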


Characterizing Structural Hardness of Logic Programs: What makes Cycles and Reachability Hard for Treewidth?

Hecher, Markus

arXiv.org Artificial Intelligence

Answer Set Programming (ASP) is a problem modeling and solving framework for several problems in KR, with growing industrial applications. ASP has also attracted researchers for many years for studies of computational complexity and deeper insights into hardness and its sources. These studies resulted in fruitful characterizations in terms of complexity classes, fine-grained insights in the form of dichotomy-style results, as well as detailed parameterized complexity landscapes. Recently, this led to a novel result establishing that, for the measure treewidth, which captures the structural density of a program, the evaluation of the well-known class of normal programs is expected to be slightly harder than deciding satisfiability (SAT). However, it is unclear how to utilize this structural power of ASP. This paper deals with a novel reduction from SAT to normal ASP that goes beyond well-known encodings: we explicitly utilize the structural power of ASP, whereby we sublinearly decrease the treewidth, which probably cannot be significantly improved. Compared to existing results, this characterizes hardness in a fine-grained way by establishing the required functional dependency of the dependency graph's cycle length (SCC size) on the treewidth.
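
For contrast with the treewidth-aware reduction described above, the sketch below generates the textbook "well-known" encoding of a CNF formula as a normal logic program: each variable is guessed with a pair of rules using default negation, and each clause contributes one integrity constraint. The helper function is only an illustration; the encoding is the standard one the paper goes beyond.

```python
# Standard ("well-known") SAT-to-normal-ASP encoding, for illustration only.
def sat_to_normal_asp(num_vars: int, clauses: list[list[int]]) -> str:
    rules = []
    for v in range(1, num_vars + 1):
        rules.append(f"x{v} :- not nx{v}.")   # guess x_v true ...
        rules.append(f"nx{v} :- not x{v}.")   # ... or false
    for clause in clauses:
        # A clause is violated exactly when every literal in it is false.
        body = ", ".join(f"nx{lit}" if lit > 0 else f"x{-lit}" for lit in clause)
        rules.append(f":- {body}.")
    return "\n".join(rules)

# (x1 or not x2) and (x2 or x3), with DIMACS-style signed literals.
print(sat_to_normal_asp(3, [[1, -2], [2, 3]]))
```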


Economics of NFTs: The Value of Creator Royalties

Falk, Brett Hemenway, Tsoukalas, Gerry, Zhang, Niuniu

arXiv.org Artificial Intelligence

Non-Fungible Tokens (NFTs) promise to revolutionize how content creators (e.g., artists) price and sell their work. One core feature of NFTs is the option to embed creator royalties which earmark a percentage of future sale proceeds to creators, each time their NFTs change hands. As popular as this feature is in practice, its utility is often questioned because buyers, the argument goes, simply "price it in at the time of purchase". As intuitive as this argument sounds, it is incomplete. We find royalties can add value to creators in at least three distinct ways. (i) Risk sharing: when creators and buyers are risk sensitive, royalties can improve trade by splitting the risks associated with future price volatility; (ii) Dynamic pricing: in the presence of information asymmetry, royalties can extract more revenues from better-informed speculators over time, mimicking the benefits of "dynamic pricing"; (iii) Price discrimination: when creators sell multi-unit NFT collections, royalties can better capture value from heterogeneous buyers. Our results suggest creator royalties play an important and sometimes overlooked role in the economics of NFTs.