Goto

Collaborating Authors

 Europe


Supplement to " Estimating Riemannian Metric with Noise-Contaminated Intrinsic Distance "

Neural Information Processing Systems

Unlike distance metric learning where the subsequent tasks utilizing the estimated distance metric is the usual focus, the proposal focuses on the estimated metric characterizing the geometry structure. Despite the illustrated taxi and MNIST examples, it is still open to finding more compelling applications that target the data space geometry. Interpreting mathematical concepts such as Riemannian metric and geodesic in the context of potential application (e.g., cognition and perception research where similarity measures are common) could be inspiring. Our proposal requires sufficiently dense data, which could be demanding, especially for high-dimensional data due to the curse of dimensionality. Dimensional reduction (e.g., manifold embedding as in the MNIST example) can substantially alleviate the curse of dimensionality, and the dense data requirement will more likely hold true.



Conditional Outcome Equivalence: A Quantile Alternative to CATE

Neural Information Processing Systems

The conditional quantile treatment effect (CQTE) can provide insight into the effect of a treatment beyond the conditional average treatment effect (CA TE). This ability to provide information over multiple quantiles of the response makes the CQTE especially valuable in cases where the effect of a treatment is not well-modelled by a location shift, even conditionally on the covariates. Nevertheless, the estimation of the CQTE is challenging and often depends upon the smoothness of the individual quantiles as a function of the covariates rather than smoothness of the CQTE itself. This is in stark contrast to the CA TE where it is possible to obtain high-quality estimates which have less dependency upon the smoothness of the nuisance parameters when the CA TE itself is smooth. Moreover, relative smoothness of the CQTE lacks the interpretability of smoothness of the CA TE making it less clear whether it is a reasonable assumption to make.




DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

Neural Information Processing Systems

Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph which poses the potential problem of vanishing/exploding gradient, high memory cost, and unbounded randomness.



Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences Damien Ferbach 1, 2, Quentin Bertrand 1, A vishek Joey Bose

Neural Information Processing Systems

The rapid progress in generative models has resulted in impressive leaps in generation quality, blurring the lines between synthetic and real data. Web-scale datasets are now prone to the inevitable contamination by synthetic data, directly impacting the training of future generated models. Already, some theoretical results on self-consuming generative models (a.k.a., iterative retraining) have emerged in the literature, showcasing that either model collapse or stability could be possible depending on the fraction of generated data used at each retraining step. However, in practice, synthetic data is often subject to human feedback and curated by users before being used and uploaded online. For instance, many interfaces of popular text-to-image generative models, such as Stable Diffusion or Midjourney, produce several variations of an image for a given query which can eventually be curated by the users. In this paper, we theoretically study the impact of data curation on iterated retraining of generative models and show that it can be seen as an implicit preference optimization mechanism .