A Generalised Jensen Inequality
In Section 4, we require a version of Jensen's inequality generalised to (possibly) infinite-dimensional vector spaces, because our random variable takes values in $H_R$. Note that the squared-norm function is indeed convex, since, for any $t \in [0,1]$ and any pair $f, g \in H_R$, the triangle inequality and the convexity of $x \mapsto x^2$ give $\|tf + (1-t)g\|^2 \le t\|f\|^2 + (1-t)\|g\|^2$.

Suppose $T$ is a real Hausdorff locally convex (possibly infinite-dimensional) linear topological space, and let $C$ be a closed convex subset of $T$. Suppose $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and $V \colon \Omega \to T$ a Pettis-integrable random variable such that $V(\Omega) \subseteq C$. Let $f \colon C \to (-\infty, \infty]$ be a convex, lower semi-continuous extended-real-valued function such that $\mathbb{E}[f(V)]$ exists. Then $f(\mathbb{E}[V]) \le \mathbb{E}[f(V)]$.

We will actually apply the generalised Jensen inequality with conditional expectations, so we need the following theorem.

Suppose $T$ is a real Hausdorff locally convex (possibly infinite-dimensional) linear topological space, and let $C$ be a closed convex subset of $T$. Suppose $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and $V \colon \Omega \to T$ a Pettis-integrable random variable such that $V(\Omega) \subseteq C$. Let $f \colon C \to (-\infty, \infty]$ be a convex, lower semi-continuous extended-real-valued function such that $\mathbb{E}[f(V)]$ exists, and let $\mathcal{E} \subseteq \mathcal{F}$ be a sub-$\sigma$-algebra. Then $\mathbb{E}[f(V) \mid \mathcal{E}] \ge f(\mathbb{E}[V \mid \mathcal{E}])$ almost surely.

Here, (*) and (**) use the properties of conditional expectation of vector-valued random variables given in [12, pp. 45-46, Properties 43 and 40 respectively]. The right-hand side is clearly $\mathcal{E}$-measurable, since it is a linear operator applied to an $\mathcal{E}$-measurable random variable. Now take the supremum of the right-hand side over $Q$. Then (5) tells us that $\mathbb{E}[f(V) \mid \mathcal{E}] \ge f(\mathbb{E}[V \mid \mathcal{E}])$, as required.
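The convexity of the squared norm asserted above can be verified directly. For any $t \in [0,1]$ and any $f, g \in H_R$:

```latex
\|t f + (1-t) g\|^2
  \le \bigl(t\|f\| + (1-t)\|g\|\bigr)^2   % triangle inequality, squaring
  \le t\|f\|^2 + (1-t)\|g\|^2.            % convexity of x \mapsto x^2
```

The first step uses the triangle inequality together with monotonicity of squaring on $[0,\infty)$, and the second applies Jensen's (scalar) inequality for the convex map $x \mapsto x^2$ to the weights $t$ and $1-t$.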
The cast of Mission: Impossible on the importance of humanity during the rise of AI
On May 23, the Mission: Impossible saga is set to come to an end with its final installment, Mission: Impossible – The Final Reckoning. Known for its vicious villains, the franchise sports its Biggest Bad yet: an AI known as The Entity that's bent on wiping humans off the planet. Mashable Senior Creative Producer Mark Stetson sat down with the cast (Simon Pegg, Angela Bassett, Hayley Atwell, Pom Klementieff, and Greg Tarzan Davis) to discuss the film's themes of humanity and friendship and its exploration of the future of AI. First, Simon Pegg, who has played Benji Dunn since Mission: Impossible III -- when we first get a hint of The Entity's existence -- helped break down the origins of this Big Bad. "Yeah, I mean, the Entity was around in its nascent form a long time ago. It was a malicious code, basically, which itself evolved into what we are up against in Dead Reckoning, in The Final Reckoning. And I love the idea that McQ [Director Christopher McQuarrie] looked back into the past to see where things may have started, where the rumblings of the Entity may have begun. And further back as well, to, obviously, when Bill Donloe was exiled to Alaska."
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norms can explain the implicit regularization in matrix factorization. The current paper resolves this open question in the negative, by proving that there exist natural matrix factorization problems on which the implicit regularization drives all norms (and quasi-norms) towards infinity. Our results suggest that, rather than perceiving the implicit regularization via norms, a potentially more useful interpretation is minimization of rank. We demonstrate empirically that this interpretation extends to a certain class of non-linear neural networks, and hypothesize that it may be key to explaining generalization in deep learning.
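As an illustration of the test-bed the abstract refers to (this is a generic toy sketch of matrix completion via a deep linear factorization, not the paper's construction; the depth, initialization scale, and learning rate below are our own arbitrary choices):

```python
import numpy as np

# Toy matrix-completion test-bed: fit the observed entries of a rank-1 matrix M
# with a depth-3 linear factorization P = W3 @ W2 @ W1, trained by gradient
# descent on the squared loss over observed entries only.
rng = np.random.default_rng(0)
d, depth, lr, steps = 5, 3, 0.01, 2000
M = np.outer(rng.normal(size=d), rng.normal(size=d))  # rank-1 ground truth
mask = rng.random((d, d)) < 0.5                       # which entries are observed

Ws = [0.3 * np.eye(d) for _ in range(depth)]          # small balanced init

def product(Ws):
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

def masked_loss(P):
    return 0.5 * np.sum(((P - M) * mask) ** 2)

loss0 = masked_loss(product(Ws))
for _ in range(steps):
    P = product(Ws)
    G = (P - M) * mask                                # dLoss/dP
    grads = []
    for j in range(depth):                            # hand-rolled backprop:
        left = np.eye(d)                              # P = left @ Ws[j] @ right
        for W in Ws[j + 1:]:
            left = W @ left
        right = np.eye(d)
        for W in Ws[:j]:
            right = W @ right
        grads.append(left.T @ G @ right.T)
    for j in range(depth):
        Ws[j] = Ws[j] - lr * grads[j]
loss = masked_loss(product(Ws))
```

The interesting quantity in the paper's setting is what happens to norms of the learned product on the unobserved entries; this sketch only sets up the optimization dynamics being studied.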
Author Feedback
We thank the reviewers for their time and effort! Miscellaneous (): Thank you for the positive feedback! Miscellaneous (): Thank you for the feedback and support! By this they refute the prospect of norms being implicitly minimized on every convex objective. To our knowledge, very few have endorsed this far-reaching prospect.
Regret in Online Recommendation Systems
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of m users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of n items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown. The performance of the recommendation algorithm is captured through its regret, taking as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: for each structure, we derive regret lower bounds and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, that due to learning the probabilities that users like items, and finally that arising when learning the underlying structure.
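The setting above can be made concrete with a small simulation. This is our own illustrative sketch, not the paper's algorithm: the learner below is a naive greedy rule on empirical means, the user-arrival model is uniform, and all names are invented for illustration; it shows the no-repeat constraint and the regret definition in action.

```python
import numpy as np

def simulate_regret(p, T, rng):
    """Toy simulation of the online setting: p[u, i] is the (unknown to the
    learner) probability that user u likes item i. Each round a uniformly
    random user arrives, and an item never shown to that user is recommended.
    Returns the pseudo-regret of a greedy empirical-mean learner against an
    oracle that knows p."""
    m, n = p.shape
    likes = np.zeros((m, n))
    counts = np.zeros((m, n))
    used_alg = [set() for _ in range(m)]
    used_orc = [set() for _ in range(m)]
    reward_alg = reward_orc = 0.0
    for _ in range(T):
        u = int(rng.integers(m))
        if len(used_alg[u]) == n:      # this user has already seen every item
            continue
        # oracle: best not-yet-shown item under the true probabilities
        i_orc = max((i for i in range(n) if i not in used_orc[u]),
                    key=lambda i: p[u, i])
        used_orc[u].add(i_orc)
        reward_orc += p[u, i_orc]      # accumulate expected reward
        # learner: greedy on optimistic empirical means, unseen items only
        cand = [i for i in range(n) if i not in used_alg[u]]
        means = [(likes[u, i] + 1.0) / (counts[u, i] + 1.0) for i in cand]
        i_alg = cand[int(np.argmax(means))]
        used_alg[u].add(i_alg)
        counts[u, i_alg] += 1
        likes[u, i_alg] += rng.random() < p[u, i_alg]   # noisy like/dislike
        reward_alg += p[u, i_alg]
    return reward_orc - reward_alg
```

Because both the oracle and the learner face the same user sequence and the same no-repeat constraint, the oracle's picks for each user are always the top items by true probability, so the pseudo-regret is non-negative by construction.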
First 1B business with one human employee will happen in 2026, says Anthropic CEO
AI can perform tasks such as writing, coding, reasoning, and researching with great accuracy -- all tasks that are key to starting your own company. That raises the question: Can AI help people start their very own billion-dollar business? Anthropic CEO Dario Amodei believes the answer is yes, and it's sooner than you may think. When asked at Anthropic's first developer conference, Code with Claude, when the first billion-dollar company with one human employee would happen, Amodei confidently responded, "2026." Also: Anthropic's latest Claude AI models are here - and you can try one for free today At the same event, Anthropic unveiled its most powerful family of models yet -- Claude Opus 4 and Sonnet 4 -- which can code, reason, and support agentic capabilities better than ever before.
List-Decodable Sparse Mean Estimation
In this paper, we consider the setting where the underlying distribution D is Gaussian with a k-sparse mean. Our main contribution is the first polynomial-time algorithm with sample complexity poly(k, log d), i.e., poly-logarithmic in the dimension. One of our core algorithmic ingredients is the use of low-degree sparse polynomials to filter outliers, which may find further applications.
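For intuition about why sparsity allows sample complexity that scales with k and log d rather than d, here is a generic hard-thresholding baseline for sparse mean estimation in the clean, no-outlier setting. This is not the paper's list-decoding algorithm, and all names and parameters are our own illustrative choices.

```python
import numpy as np

def sparse_mean_threshold(X, k):
    """Estimate a k-sparse mean: average the samples, then keep only the
    k coordinates of largest magnitude (hard thresholding)."""
    mu_hat = X.mean(axis=0)
    keep = np.argsort(np.abs(mu_hat))[-k:]   # indices of the top-k magnitudes
    out = np.zeros_like(mu_hat)
    out[keep] = mu_hat[keep]
    return out

# Toy check: d = 1000 dimensions, k = 5 nonzero mean coordinates,
# n = 200 samples from N(mu, I).
rng = np.random.default_rng(1)
d, k, n = 1000, 5, 200
mu = np.zeros(d)
mu[:k] = 3.0
X = rng.normal(size=(n, d)) + mu
est = sparse_mean_threshold(X, k)
```

Because only k coordinates carry signal, the error of the thresholded estimate is driven by the k retained coordinates plus a union bound over d candidates, which is where the log d (rather than d) dependence enters; handling adversarial outliers on top of this is the hard part the paper addresses.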
Generative Forests
We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data. Our paper introduces two key contributions: a new, powerful class of forest-based models fit for such tasks, and a simple training algorithm with strong convergence guarantees in a boosting model that parallels that of the original weak / strong supervised learning setting. This algorithm can be implemented with a few tweaks to the most popular induction scheme for decision trees (i.e.