

3. Sample $x_k$ is upscaled by the user with probability proportional to $e^{r(x_k)}$.

Neural Information Processing Systems

The rapid progress in generative models has resulted in impressive leaps in generation quality, blurring the lines between synthetic and real data. Web-scale datasets are now prone to inevitable contamination by synthetic data, directly impacting the training of future generative models. Some theoretical results on self-consuming generative models (a.k.a. iterative retraining) have already emerged in the literature, showing that either model collapse or stability is possible depending on the fraction of generated data used at each retraining step. In practice, however, synthetic data is often subject to human feedback and curated by users before being used and uploaded online. For instance, many interfaces of popular text-to-image generative models, such as Stable Diffusion or Midjourney, produce several variations of an image for a given query, from which users select the ones they prefer. In this paper, we theoretically study the impact of data curation on iterated retraining of generative models and show that it can be seen as an implicit preference optimization mechanism.
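The curation step described above can be illustrated with a minimal sketch. It assumes that a user shown K candidate samples keeps one of them with probability proportional to exp(r(x)), where r is a reward function. The names `curate`, `curated_batch`, `sample_model`, and `reward` are hypothetical, and the Gaussian "model" and identity reward are toy stand-ins chosen only for illustration, not anything taken from the paper.

```python
import math
import random


def curate(candidates, reward, rng):
    """Keep one candidate with probability proportional to exp(reward(x)).

    Assumption: this softmax-style choice rule stands in for the user.
    """
    rewards = [reward(x) for x in candidates]
    shift = max(rewards)  # subtract the max so exp() cannot overflow
    weights = [math.exp(r - shift) for r in rewards]
    return rng.choices(candidates, weights=weights, k=1)[0]


def curated_batch(sample_model, reward, n_rounds, k, rng):
    """Draw k candidates per round and keep the one the 'user' selects."""
    kept = []
    for _ in range(n_rounds):
        candidates = [sample_model(rng) for _ in range(k)]
        kept.append(curate(candidates, reward, rng))
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy generative model: a standard Gaussian (hypothetical stand-in).
    sample_model = lambda r: r.gauss(0.0, 1.0)
    # Toy reward: larger samples are preferred (hypothetical stand-in).
    reward = lambda x: x

    raw = [sample_model(rng) for _ in range(5000)]
    kept = curated_batch(sample_model, reward, n_rounds=5000, k=4, rng=rng)
    print("mean of raw samples:    ", sum(raw) / len(raw))
    print("mean of curated samples:", sum(kept) / len(kept))
```

In this toy setting the curated samples have a higher average reward than the raw ones, which is the sense in which retraining on curated data pushes a model toward the users' preferences.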





Making Non-Stochastic Control (Almost) as Easy as Stochastic

Neural Information Processing Systems

We attain the optimal $\widetilde{O}(\sqrt{T})$ regret when the dynamics are unknown to the learner, and $\mathrm{poly}(\log T)$ regret when known, provided that the cost functions are strongly convex (as in LQR).



Linear and Kernel Classification in the Streaming Model: Improved Bounds for Heavy Hitters

Neural Information Processing Systems

We consider logistic regression, and more generally, linear classification, in the streaming model. In our setting, we are given a dataset consisting of $T$ examples $(x_t, y_t)$, where $t \in [T]$, $x_t \in \mathbb{R}^d$, and $y_t \in \{-1, 1\}$. The examples arrive one by one, and moreover, the nonzero coordinates of each example $x_t$ arrive one by one.


Beyond Smoothness: Incorporating Low-Rank Analysis into Nonparametric Density Estimation

Neural Information Processing Systems

Our analysis culminates in showing that there exists a universally consistent histogram-style estimator that converges to any multi-view model with a finite number of Lipschitz continuous components at a rate of $\widetilde{O}(1/\sqrt[3]{n})$ in $L^1$ error.