MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Neural Information Processing Systems

Users typically engage with LLMs interactively, yet most existing benchmarks evaluate them in a static, single-turn format, which raises reliability concerns in interactive scenarios. We identify a key obstacle to reliability: LLMs are trained to answer any question, even with incomplete context or insufficient knowledge.
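A minimal sketch (hypothetical names, not the benchmark's actual API) of the interactive pattern such an evaluation targets: at each turn the model either commits to an answer or asks a follow-up question to gather the missing context.

```python
# Minimal sketch of an interactive clinical QA loop in which the model must
# decide whether it has enough context to answer or should ask first.
# All names here are hypothetical stand-ins, not the benchmark's API.
from typing import Callable

class Patient:
    """Holds the full case; reveals facts only when asked."""
    def __init__(self, initial_info: str, hidden_facts: dict[str, str]):
        self.initial_info = initial_info
        self.hidden_facts = hidden_facts

    def answer(self, question: str) -> str:
        # Toy lookup: return the first hidden fact whose key appears in the question.
        for key, fact in self.hidden_facts.items():
            if key in question.lower():
                return fact
        return "The patient does not know."

def interactive_diagnosis(llm: Callable[[str], str], patient: Patient,
                          max_turns: int = 5) -> str:
    context = patient.initial_info
    for _ in range(max_turns):
        decision = llm(
            f"Context so far:\n{context}\n\n"
            "If you can answer reliably, reply 'ANSWER: <diagnosis>'. "
            "Otherwise reply 'ASK: <one follow-up question>'."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        question = decision.removeprefix("ASK:").strip()
        context += f"\nQ: {question}\nA: {patient.answer(question)}"
    # Out of turns: force a best-effort answer from the gathered context.
    return llm(f"Context:\n{context}\nGive your final diagnosis.")
```

Any function mapping a prompt string to a completion string can be passed as `llm`, so the loop itself stays model-agnostic.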


A Generalised Jensen Inequality

Neural Information Processing Systems

In Section 4, we require a version of Jensen's inequality generalised to (possibly) infinite-dimensional vector spaces, because our random variable takes values in $\mathcal{H}_R$. Note that the square norm function is indeed convex, since, for any $t \in [0, 1]$ and any pair $f, g \in \mathcal{H}$, $\|tf + (1-t)g\|^2 \le (t\|f\| + (1-t)\|g\|)^2 \le t\|f\|^2 + (1-t)\|g\|^2$.

Theorem (generalised Jensen inequality). Suppose $T$ is a real Hausdorff locally convex (possibly infinite-dimensional) linear topological space, and let $C$ be a closed convex subset of $T$. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, and $V \colon \Omega \to T$ a Pettis-integrable random variable such that $V(\Omega) \subseteq C$. Let $f \colon C \to (-\infty, \infty]$ be a convex, lower semi-continuous extended-real-valued function such that $E[f(V)]$ exists. Then $E[f(V)] \ge f(E[V])$.

We will actually apply generalised Jensen's inequality with conditional expectations, so we need the following theorem.

Theorem (conditional version). Suppose $T$ is a real Hausdorff locally convex (possibly infinite-dimensional) linear topological space, and let $C$ be a closed convex subset of $T$. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space with sub-$\sigma$-algebra $\mathcal{E} \subseteq \mathcal{F}$, and $V \colon \Omega \to T$ a Pettis-integrable random variable such that $V(\Omega) \subseteq C$. Let $f \colon C \to (-\infty, \infty]$ be a convex, lower semi-continuous extended-real-valued function such that $E[f(V)]$ exists. Then $E[f(V) \mid \mathcal{E}] \ge f(E[V \mid \mathcal{E}])$ almost surely.

Here, (*) and (**) use the properties of conditional expectation of vector-valued random variables given in [12, pp. 45-46, Properties 43 and 40 respectively]. The right-hand side is clearly $\mathcal{E}$-measurable, since it is a linear operator applied to an $\mathcal{E}$-measurable random variable. Now take the supremum of the right-hand side over $Q$. Then (5) tells us that $E[f(V) \mid \mathcal{E}] \ge f(E[V \mid \mathcal{E}])$, as required.
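As a reading aid, here is the chain of steps the proof fragment refers to, under the standard representation of a proper lower semi-continuous convex $f$ as the supremum of the continuous affine functionals $Q \le f$; this is our reconstruction of the elided display (5), not the paper's verbatim derivation.

```latex
% For every continuous affine functional Q with Q <= f on C:
\[
  E[f(V) \mid \mathcal{E}]
    \;\overset{(*)}{\geq}\; E[Q(V) \mid \mathcal{E}]
    \;\overset{(**)}{=}\; Q\bigl(E[V \mid \mathcal{E}]\bigr)
    \qquad \text{a.s.} \tag{5}
\]
% Taking the supremum over all such Q on the right-hand side:
\[
  E[f(V) \mid \mathcal{E}]
    \;\geq\; \sup_{Q \le f} Q\bigl(E[V \mid \mathcal{E}]\bigr)
    \;=\; f\bigl(E[V \mid \mathcal{E}]\bigr).
\]
```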


Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Neural Information Processing Systems

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities.


Implicit Regularization in Deep Learning May Not Be Explainable by Norms

Neural Information Processing Systems

Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norms can explain the implicit regularization in matrix factorization. The current paper resolves this open question in the negative, by proving that there exist natural matrix factorization problems on which the implicit regularization drives all norms (and quasi-norms) towards infinity. Our results suggest that, rather than perceiving the implicit regularization via norms, a potentially more useful interpretation is minimization of rank. We demonstrate empirically that this interpretation extends to a certain class of non-linear neural networks, and hypothesize that it may be key to explaining generalization in deep learning.
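A minimal sketch (ours, not the paper's code; entries and hyperparameters are illustrative) of the test-bed the abstract describes: matrix completion by gradient descent on a deep linear network W = W3 W2 W1, tracking the unobserved entry, the nuclear norm, and a rank proxy along training.

```python
# Sketch of the matrix-factorization test-bed: complete a 2x2 matrix with
# observed entries (1,2)=1, (2,1)=1, (2,2)=0 via a deep linear network
# trained by gradient descent from small initialization. As training fits
# the observed entries, one can watch norms grow while the product matrix
# approaches rank one (nuclear/spectral ratio -> 1).
import numpy as np

rng = np.random.default_rng(0)
depth, dim, lr, steps = 3, 2, 0.05, 20000
Ws = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(depth)]
observed = {(0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}  # entry (0,0) is unobserved

def product(factors):
    P = np.eye(dim)
    for W in factors:
        P = W @ P
    return P  # = Ws[-1] @ ... @ Ws[0]

for step in range(steps):
    P = product(Ws)
    # Gradient of 0.5 * sum of squared errors on observed entries w.r.t. P.
    G = np.zeros_like(P)
    for (i, j), v in observed.items():
        G[i, j] = P[i, j] - v
    # Back-propagate through the matrix product; compute all grads first.
    grads = []
    for k in range(depth):
        left = product(Ws[k + 1:])   # factors applied after W_k
        right = product(Ws[:k])      # factors applied before W_k
        grads.append(left.T @ G @ right.T)
    for k in range(depth):
        Ws[k] -= lr * grads[k]
    if step % 5000 == 0:
        s = np.linalg.svd(P, compute_uv=False)
        print(f"step {step:5d}  |unobserved entry|={abs(P[0, 0]):8.3f}  "
              f"nuclear norm={s.sum():8.3f}  nuclear/spectral={s.sum()/s.max():.3f}")
```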


f21e255f89e0f258accbe4e984eef486-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their time and effort! Thank you for the positive feedback! Thank you for the feedback and support! By this they refute the prospect of norms being implicitly minimized on every convex objective. To our knowledge, very few have endorsed this far-reaching prospect.


Regret in Online Recommendation Systems

Neural Information Processing Systems

This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of m users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of n items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: for each structure, we derive regret lower bounds and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, the component due to learning the probabilities that users like items, and finally the component arising from learning the underlying structure.
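A minimal sketch (ours; the greedy empirical-mean policy is a naive baseline, not the paper's regret-optimal algorithms) of the interaction protocol the analysis covers: a random user arrives, the decision-maker picks an item it has never shown to that user, and a binary "like" is observed.

```python
# Toy version of the online recommendation protocol: m users, n items,
# unknown like-probabilities p[u, i], and a no-repeat constraint per user.
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 50, 20, 5000
p = rng.uniform(0.1, 0.9, size=(m, n))       # unknown to the learner

shown = [set() for _ in range(m)]            # items already shown to each user
clicks = np.zeros(n)                         # per-item like counts
pulls = np.zeros(n)                          # per-item recommendation counts
reward = oracle = 0.0

for t in range(T):
    u = rng.integers(m)                      # a random user requests a recommendation
    available = [i for i in range(n) if i not in shown[u]]
    if not available:                        # user has exhausted the catalogue
        continue
    # Optimistic empirical means (never-pulled items get estimate 1.0).
    means = [(clicks[i] / pulls[i]) if pulls[i] else 1.0 for i in available]
    i = available[int(np.argmax(means))]
    shown[u].add(i)
    like = rng.random() < p[u, i]
    clicks[i] += like
    pulls[i] += 1
    reward += like
    # Crude per-round proxy for the Oracle benchmark: the best item still
    # allowed for this user under the same no-repeat history.
    oracle += max(p[u, j] for j in available)

print(f"regret of the naive policy over {T} rounds: {oracle - reward:.1f}")
```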


List-Decodable Sparse Mean Estimation

Neural Information Processing Systems

In this paper, we consider the setting where the underlying distribution D is Gaussian with a k-sparse mean. Our main contribution is the first polynomial-time algorithm that enjoys sample complexity O(poly(k, log d)), i.e., poly-logarithmic in the dimension d. One of our core algorithmic ingredients is the use of low-degree sparse polynomials to filter outliers, which may find further applications.
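A toy illustration (ours; far simpler and weaker than the paper's algorithm) of the filtering flavor: score each sample by a degree-2 polynomial supported on the few most suspicious coordinates, then trim extreme scores. For readability the toy uses a majority of inliers; the paper's list-decodable setting has only an alpha < 1/2 inlier fraction and outputs a short list of candidate means.

```python
# Toy sparse-score outlier filter for estimating a k-sparse Gaussian mean.
import numpy as np

rng = np.random.default_rng(1)
d, k, n_in, n_out = 500, 5, 700, 300
mu = np.zeros(d); mu[:k] = 3.0                  # k-sparse true mean
inliers = rng.normal(size=(n_in, d)) + mu
outliers = rng.normal(size=(n_out, d))
outliers[:, rng.choice(d, size=k, replace=False)] += 8.0  # corrupted direction
X = np.vstack([inliers, outliers])

for _ in range(15):
    dev = X - np.median(X, axis=0)              # deviations from a robust-ish center
    suspect = np.argsort(dev.var(axis=0))[-k:]  # coordinates with inflated variance
    scores = (dev[:, suspect] ** 2).sum(axis=1) # degree-2 score, sparse support
    X = X[scores <= np.quantile(scores, 0.9)]   # drop the top 10% of scores

est = X.mean(axis=0)
print("estimation error on the true support:", np.linalg.norm(est[:k] - mu[:k]))
```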


Generative Forests

Neural Information Processing Systems

We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data. Our paper introduces two key contributions: a powerful new class of forest-based models fit for such tasks, and a simple training algorithm with strong convergence guarantees in a boosting model that parallels that of the original weak / strong supervised learning setting. This algorithm can be implemented with a few tweaks to the most popular scheme for decision tree induction.
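A minimal sketch (ours, hypothetical) of the kind of object involved: a single generative tree over tabular data that partitions rows with axis-aligned splits and samples new rows from the empirical distribution in a leaf. A generative forest would combine many such trees, and the paper's boosting-style training is substantially richer than the random splits used here.

```python
# Tiny generative tree for tabular data: fit axis-aligned splits, then sample
# new rows by descending to a leaf (weighted by its data mass) and drawing
# feature values from the empirical data in that leaf.
import numpy as np

rng = np.random.default_rng(0)

class Leaf:
    def __init__(self, rows): self.rows = rows
    def size(self): return len(self.rows)
    def sample(self):
        # Draw each column independently from the leaf's empirical values.
        return np.array([rng.choice(self.rows[:, j])
                         for j in range(self.rows.shape[1])])

class Node:
    def __init__(self, j, t, left, right):
        self.j, self.t, self.left, self.right = j, t, left, right
    def size(self): return self.left.size() + self.right.size()
    def sample(self):
        # Descend randomly, proportionally to how much data each side holds.
        go_left = rng.random() < self.left.size() / self.size()
        return (self.left if go_left else self.right).sample()

def fit(rows, depth=3, min_rows=20):
    if depth == 0 or len(rows) < min_rows:
        return Leaf(rows)
    j = rng.integers(rows.shape[1])             # random split feature
    t = np.median(rows[:, j])                   # median threshold
    mask = rows[:, j] <= t
    if mask.all() or not mask.any():
        return Leaf(rows)
    return Node(j, t, fit(rows[mask], depth - 1, min_rows),
                fit(rows[~mask], depth - 1, min_rows))

data = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))  # correlated toy table
tree = fit(data)
print("synthetic row:", tree.sample())
```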


Polynomial time guarantees for the Burer-Monteiro method

Neural Information Processing Systems

The Burer-Monteiro method is one of the most widely used techniques for solving large-scale semidefinite programs (SDPs).
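A minimal sketch (ours; illustrative, not the paper's analyzed algorithm or parameters) of the method on the max-cut SDP relaxation: replace the PSD variable X = Y Yᵀ with a thin factor Y whose rows have unit norm, and run projected gradient descent on the nonconvex factorized problem.

```python
# Burer-Monteiro sketch on the max-cut SDP relaxation:
#   max (1/4) * sum_{ij} W_ij (1 - X_ij)   s.t.  X PSD,  X_ii = 1.
# Factorize X = Y @ Y.T with unit-norm rows of Y (rank p << n), so the PSD
# and diagonal constraints hold by construction, and descend on trace(W X).
import numpy as np

rng = np.random.default_rng(0)
n, p, lr, steps = 40, 8, 0.005, 3000

# Random weighted graph (symmetric, zero diagonal).
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T

Y = rng.normal(size=(n, p))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # feasible: diag(Y Y^T) = 1

for _ in range(steps):
    grad = 2 * W @ Y                               # gradient of trace(W Y Y^T)
    Y -= lr * grad                                 # minimize trace(W X)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # project rows to the sphere

X = Y @ Y.T
sdp_value = 0.25 * (W.sum() - np.trace(W @ X))     # max-cut SDP objective at X
print(f"factorized SDP objective: {sdp_value:.3f}")
```

The storage drops from the n-by-n matrix X to the n-by-p factor Y, which is what makes the approach attractive at large scale.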