9ecff5455677b38d19f49ce658ef0608-AuthorFeedback.pdf
We thank the reviewers for their positive and constructive feedback, and we address several points from the reviews below. The bias reduction technique in Section 5 is designed for DP-SGD with clipping. Regarding Section 5, the results are similar to those in the figure in Section 5 (which used ฯ = 0 for all algorithms). Typos: thank you for pointing them out; we will correct them.
Enriching Disentanglement: From Logical Definitions to Quantitative Metrics
Disentangling the explanatory factors in complex data is a promising approach for generalizable and data-efficient representation learning. While a variety of quantitative metrics for learning and evaluating disentangled representations have been proposed, it remains unclear what properties these metrics truly quantify. In this work, we establish algebraic relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics. Concretely, we introduce a compositional approach for converting a higher-order predicate into a real-valued quantity by replacing (i) equality with a strict premetric, (ii) the Heyting algebra of binary truth values with a quantale of continuous values, and (iii) quantifiers with aggregators. The metrics induced by logical definitions have strong theoretical guarantees, and some of them are easily differentiable and can be used as learning objectives directly. Finally, we empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
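As a rough, hypothetical illustration of this recipe (not the paper's exact construction), the sketch below converts the first-order statement "whenever two samples agree on factor i, they agree on code j" into a score in [0, 1]: equality is replaced by a premetric, Boolean truth values by continuous values, and the universal quantifier by a min aggregator. The function names, the tanh squashing, and the Łukasiewicz implication are all assumed choices, not taken from the paper.

```python
import numpy as np

def premetric(x, y):
    """Strict premetric standing in for equality: 0 exactly when x == y."""
    return np.abs(x - y)

def fuzzy_implication(a, b):
    """Continuous stand-in for 'a implies b' on truth values in [0, 1]
    (Łukasiewicz implication, one possible choice of quantale operation)."""
    return np.minimum(1.0, 1.0 - a + b)

def predicate_score(factors, codes, i, j):
    """Soft version of: for all samples s, t,
       factors[s, i] == factors[t, i]  =>  codes[s, j] == codes[t, j].
       Returns a value in [0, 1]; 1 means the predicate holds exactly."""
    truths = []
    for s in range(len(factors)):
        for t in range(len(factors)):
            # Distances are squashed into [0, 1] so they act as truth values.
            antecedent = 1.0 - np.tanh(premetric(factors[s, i], factors[t, i]))
            consequent = 1.0 - np.tanh(premetric(codes[s, j], codes[t, j]))
            truths.append(fuzzy_implication(antecedent, consequent))
    # Universal quantifier replaced by an aggregator (min is the strictest choice;
    # a mean would give a smoother, easily differentiable variant).
    return float(np.min(truths))
```

Because every step is built from differentiable pieces (with the mean aggregator), a score of this shape can also serve directly as a learning objective, which is the property the abstract highlights.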
Efficient LLM Pretraining and Inference with Unlimited Context Length
The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions such as linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.
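For background on the sub-quadratic alternatives mentioned above (this is generic linear attention, not the method proposed in the paper), the sketch below shows why kernelized attention is O(n) in sequence length: replacing softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV) lets the K–V summary be computed once and reused. The elu+1 feature map is one common assumed choice.

```python
import torch

def elu_feature_map(x):
    # A common positive feature map used in linear attention (an assumed choice).
    return torch.nn.functional.elu(x) + 1.0

def linear_attention(q, k, v):
    """Non-causal linear attention, O(n) in sequence length.

    q, k: (batch, n, d), v: (batch, n, d_v).
    Standard attention computes softmax(q @ k^T) @ v in O(n^2);
    here we use phi(q) @ (phi(k)^T @ v), exploiting associativity.
    """
    q, k = elu_feature_map(q), elu_feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)                       # (d, d_v) summary, O(n)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)  # normalizer per query
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)
```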
GEPS: Boosting Generalization in Parametric PDE Neural Solvers through Adaptive Conditioning
Solving parametric partial differential equations (PDEs) presents significant challenges for data-driven methods due to the sensitivity of spatio-temporal dynamics to variations in PDE parameters. Machine learning approaches often struggle to capture this variability. To address this, data-driven approaches learn parametric PDEs by sampling a very large variety of trajectories with varying PDE parameters. We first show that incorporating conditioning mechanisms for learning parametric PDEs is essential and that, among them, adaptive conditioning allows stronger generalization. As existing adaptive conditioning methods do not scale well with respect to the number of parameters to adapt in the neural solver, we propose GEPS, a simple adaptation mechanism to boost GEneralization in Pde Solvers via first-order optimization and low-rank rapid adaptation of a small set of context parameters. We demonstrate the versatility of our approach for both fully data-driven and physics-aware neural solvers. Validation performed on a whole range of spatio-temporal forecasting problems demonstrates excellent performance in generalizing to unseen conditions, including initial conditions, PDE coefficients, forcing terms, and solution domains.
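As a hedged sketch of the general idea of adaptive conditioning through a small, low-rank set of context parameters (the concrete GEPS architecture may differ), the code below keeps the shared solver weights frozen across environments and adapts only a per-environment context vector with a few first-order gradient steps; the context modulates a weight matrix through a low-rank update W + U diag(c) Vᵀ. All names, shapes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankConditionedLayer(nn.Module):
    """Linear layer whose weight is modulated by a small context vector:
    W_eff = W + U @ diag(context) @ V^T  (rank r update, r << d)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.W = nn.Linear(d_in, d_out)
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.01)
        self.V = nn.Parameter(torch.randn(d_in, rank) * 0.01)

    def forward(self, x, context):                    # context: (rank,)
        delta = self.U @ torch.diag(context) @ self.V.T
        return self.W(x) + x @ delta.T

def adapt_context(solver_layer, batch, rank=8, steps=50, lr=1e-2):
    """First-order adaptation: only the per-environment context is optimized;
    the shared solver parameters stay frozen."""
    context = torch.zeros(rank, requires_grad=True)
    opt = torch.optim.Adam([context], lr=lr)
    x, y = batch                                      # snapshots from one environment
    for _ in range(steps):
        loss = torch.mean((solver_layer(x, context) - y) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return context.detach()
```

Because only the rank-r context is optimized per environment, adaptation cost stays small even when the shared solver itself is large, which is the scaling property the abstract emphasizes.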
Robust Disentanglement of a Few Factors at a Time using rPU-VAE
Disentanglement is at the forefront of unsupervised learning, as disentangled representations of data improve generalization, interpretability, and performance in downstream tasks. Current unsupervised approaches remain inapplicable to real-world datasets, since they are highly variable in their performance and fail to reach the levels of disentanglement achieved by (semi-)supervised approaches. We introduce population-based training (PBT) for improving consistency in training variational autoencoders (VAEs) and demonstrate the validity of this approach in a supervised setting (PBT-VAE). We then use Unsupervised Disentanglement Ranking (UDR) as an unsupervised heuristic to score models in our PBT-VAE training and show how models trained this way tend to consistently disentangle only a subset of the generative factors. Building on this observation, we introduce the recursive rPU-VAE approach. We train the model until convergence, remove the learned factors from the dataset, and reiterate. In doing so, we can label subsets of the dataset with the learned factors and consecutively use these labels to train one model that fully disentangles the whole dataset. With this approach, we show striking improvement in state-of-the-art unsupervised disentanglement performance and robustness across multiple datasets and metrics.
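A minimal Python sketch of the recursive procedure described above, with the training, scoring, and factor-removal steps passed in as user-supplied functions (their names here are placeholders, not the authors' implementation):

```python
def recursive_disentangle(dataset, train_pbt_vae, udr_score,
                          extract_consistent_factors, label_and_remove_factors,
                          max_rounds=10):
    """Outline of the rPU-VAE recursion: train with PBT guided by UDR,
    keep the reliably learned factors, label the data with them,
    strip them from the dataset, and repeat until nothing new is found."""
    learned_labels = {}
    for _ in range(max_rounds):
        # Population-based training of VAEs, scored by the unsupervised UDR heuristic.
        population = train_pbt_vae(dataset, score_fn=udr_score)
        best_model = max(population, key=udr_score)

        # Models trained this way tend to disentangle only a subset of the factors.
        factors = extract_consistent_factors(best_model, dataset)
        if not factors:
            break  # no further factors could be disentangled

        # Use the learned factors as labels, then remove them from the data.
        dataset, labels = label_and_remove_factors(dataset, best_model, factors)
        learned_labels.update(labels)

    # The accumulated labels can then supervise a single model that
    # disentangles the full set of generative factors.
    return learned_labels
```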
Sample Complexity of Uniform Convergence for Multicalibration
There is growing interest in the societal implications of machine learning systems, especially fairness. Multicalibration gives a comprehensive methodology to address group fairness. In this work, we address the multicalibration error and decouple it from the prediction error. The importance of decoupling the fairness metric (multicalibration) from the accuracy (prediction error) stems from the inherent tradeoff between the two and the societal decision regarding the "right tradeoff" (often imposed by regulators). Our work gives sample complexity bounds for uniform convergence guarantees of the multicalibration error, which implies that regardless of the accuracy, we can guarantee that the empirical and (true) multicalibration errors are close. We emphasize that our results: (1) are more general than previous bounds, as they apply to both agnostic and realizable settings and do not rely on a specific type of algorithm (such as a differentially private one), (2) improve over previous multicalibration sample complexity bounds, and (3) imply uniform convergence guarantees for the classical calibration error.
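For reference, one standard way to write the quantity these bounds concern (the paper's exact formulation may differ, e.g. in how small-probability groups or level sets are handled) is, for a predictor $f$, a group $S$ in a collection $\Gamma$, and a discretized prediction value $v$:

```latex
% A common formulation of (alpha-)multicalibration; the paper may use a variant.
\[
\mathrm{MCerr}(f, S, v)
  \;=\;
  \Bigl|\, \mathbb{E}\bigl[\, y - f(x) \;\bigm|\; x \in S,\ f(x) = v \,\bigr] \Bigr|,
\qquad
f \text{ is } \alpha\text{-multicalibrated on } \Gamma
\;\iff\;
\max_{S \in \Gamma,\, v} \mathrm{MCerr}(f, S, v) \;\le\; \alpha .
\]
```

Uniform convergence here means that, with enough samples, the empirical version of this conditional expectation is close to the true one simultaneously for every $S \in \Gamma$ and every $v$, independently of how accurate $f$ is.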
99f6a934a7cf277f2eaece8e3ce619b2-AuthorFeedback.pdf
We would like to thank all reviewers for their time and consideration in reviewing our paper. R1: "This work is perhaps the most effective in achieving [training ...]"; "This paper will spark discussion... and the discussion it sparks will have value". R2: "This work will no doubt be of substantial interest to the image generation community"; "It is impressive that a very simple preprocessing strategy can result in substantial improvements"; "Very handy and simple, which is a virtue". (... Score), while P, R, C and D stand for the Precision, Recall, Density, and Coverage metrics.
99ba5c4097c6b8fef5ed774a1a6714b8-AuthorFeedback.pdf
We thank the reviewers for the positive assessment of our work, their useful comments, and the proposed improvements. "It would be helpful to also provide the derivation for Equation 3." The derivation is straightforward and will be included in the Appendix of the revised version. "It would be nice to have some additional discussions in general." We will add a discussion to this effect to the Summary, and we will include the missing definition in the revised version.