affine
87213955efbe48b46586e37bf2f1fe5b-Supplemental-Conference.pdf
However, crucially, we do not make two assumptions used to derive the AVAE objective. Firstly, we do not assume that the decoder is a one-to-one mapping between latent samples and a corresponding generated sample. Interestingly,theDCIresponsibility matrices dooftenresemble theconditioned response matrices, suggesting that relying on correlations instead of a full causal analysis, can yield similar results. Infact, then the learned causal structure estimated using the latent response matrix may be used in tandem to develop a structure-aware disentanglementmetric. Given the complexity of the underlying data manifold, aviable alternativeisbased on riemanian geometry [78]which has previously been investigated for alternativeprobabilistic models likeGaussian Process regression [79].
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > Canada (0.04)
- (2 more...)
Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
Jin, Jikai, Syrgkanis, Vasilis
The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing interest in structure-agnostic approaches -- methods that debias black-box nuisance estimates without imposing structural priors. Understanding the fundamental limits of these methods is therefore crucial. This paper provides a systematic investigation of the optimal error rates achievable by structure-agnostic estimators. We first show that, for estimating the average treatment effect (ATE), a central parameter in causal inference, doubly robust learning attains optimal structure-agnostic error rates. We then extend our analysis to a general class of functionals that depend on unknown nuisance functions and establish the structure-agnostic optimality of debiased/double machine learning (DML). We distinguish two regimes -- one where double robustness is attainable and one where it is not -- leading to different optimal rates for first-order debiasing, and show that DML is optimal in both regimes. Finally, we instantiate our general lower bounds by deriving explicit optimal rates that recover existing results and extend to additional estimands of interest. Our results provide theoretical validation for widely used first-order debiasing methods and guidance for practitioners seeking optimal approaches in the absence of structural assumptions. This paper generalizes and subsumes the ATE lower bound established in \citet{jin2024structure} by the same authors.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Model-Based Learning of Whittle indices
Charles-Rebuffé, Joël, Gast, Nicolas, Gaujal, Bruno
We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of a state-of-the-art existing algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. Moreover, we investigate its computational complexity. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, it has a total computational cost even lower than Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are speeded up using pre-trained neural networks to predict Q-values.
Learning to Compress Graphs via Dual Agents for Consistent Topological Robustness Evaluation
Chai, Qisen, Wang, Yansong, Huang, Junjie, Jia, Tao
As graph-structured data grow increasingly large, evaluating their robustness under adversarial attacks becomes computationally expensive and difficult to scale. To address this challenge, we propose to compress graphs into compact representations that preserve both topological structure and robustness profile, enabling efficient and reliable evaluation. We propose Cutter, a dual-agent reinforcement learning framework composed of a Vital Detection Agent (VDA) and a Redundancy Detection Agent (RDA), which collaboratively identify structurally vital and redundant nodes for guided compression. Cutter incorporates three key strategies to enhance learning efficiency and compression quality: trajectory-level reward shaping to transform sparse trajectory returns into dense, policy-equivalent learning signals; prototype-based shaping to guide decisions using behavioral patterns from both high- and low-return trajectories; and cross-agent imitation to enable safer and more transferable exploration. Experiments on multiple real-world graphs demonstrate that Cutter generates compressed graphs that retain essential static topological properties and exhibit robustness degradation trends highly consistent with the original graphs under various attack scenarios, thereby significantly improving evaluation efficiency without compromising assessment fidelity.
- Information Technology > Security & Privacy (0.69)
- Government > Military (0.69)
Just-In-Time Piecewise-Linear Semantics for ReLU-type Networks
Duan, Hongyi, Liu, Haoyang, Zhang, Jian'an, Liu, Fengrui, Wang, Yiyi
We present a JIT PL semantics for ReLU-type networks that compiles models into a guarded CPWL transducer with shared guards. The system adds hyperplanes only when operands are affine on the current cell, maintains global lower/upper envelopes, and uses a budgeted branch-and-bound. We obtain anytime soundness, exactness on fully refined cells, monotone progress, guard-linear complexity (avoiding global $\binom{k}{2}$), dominance pruning, and decidability under finite refinement. The shared carrier supports region extraction, decision complexes, Jacobians, exact/certified Lipschitz, LP/SOCP robustness, and maximal causal influence. A minimal prototype returns certificates or counterexamples with cost proportional to visited subdomains.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)