Well File:

Non-parametric classification via expand-and-sparsify representation

Neural Information Processing Systems

We propose two algorithms for non-parametric classification using such EaS representation. For our first algorithm, we use winners-take-all operation for the sparsification step and show that the proposed classifier admits the form of a locally weighted average classifier and establish its consistency via Stone's Theorem. Further, assuming that the conditional probability function P (y = 1|x) = η(x) is Hölder continuous and for optimal choice of m, we show that the convergence rate of this classifier is minimax-optimal.


Supplementary Material: Cross Aggregation Transformer for Image Restoration

Neural Information Processing Systems

These settings are consistent with CAT-R and CAT-A. For CAT-R-2, we apply regular-Rwin, and set [sw, sh] as [4, 16] (same as CAT-R). We set the MLP expansion ratio as 2, consistent with SwinIR [13]. For CAT-A-2, we apply axial-Rwin, and set sl as 4 for all CATB in each RG. The MLP expansion ratio is set as 4. Best and second best results are colored with red and blue.


Hard Negative Mixing for Contrastive Learning

Neural Information Processing Systems

The uniformity experiment is based on Wang and Isola [53]. We follow the same definitions of the losses/metrics as presented in the paper. We set α = 2 and t = 2. All features were L2-normalized, as the metrics are defined on the hypersphere. B.1 Proxy task: Effect of MLP and Stronger Augmentation Following our discussion in Section 3, we wanted to verify that hardness of the proxy task for MoCo [19] is directly correlated to the difficulty of the transformations set, i.e. proxy task hardness can modulated via the positive pair.


Constrained Sampling with Primal-Dual Langevin Monte Carlo

Neural Information Processing Systems

This work considers the problem of sampling from a probability distribution known up to a normalization constant while satisfying a set of statistical constraints specified by the expected values of general nonlinear functions. This problem finds applications in, e.g., Bayesian inference, where it can constrain moments to evaluate counterfactual scenarios or enforce desiderata such as prediction fairness. Methods developed to handle support constraints, such as those based on mirror maps, barriers, and penalties, are not suited for this task. This work therefore relies on gradient descent-ascent dynamics in Wasserstein space to put forward a discretetime primal-dual Langevin Monte Carlo algorithm (PD-LMC) that simultaneously constrains the target distribution and samples from it. We analyze the convergence of PD-LMC under standard assumptions on the target distribution and constraints, namely (strong) convexity and log-Sobolev inequalities. To do so, we bring classical optimization arguments for saddle-point algorithms to the geometry of Wasserstein space. We illustrate the relevance and effectiveness of PD-LMC in several applications.


Forecasting Human Trajectory from Scene History Ziyan Wu2 Terrence Chen 2

Neural Information Processing Systems

Predicting the future trajectory of a person remains a challenging problem, due to randomness and subjectivity of human movement. However, the moving patterns of human in a constrained scenario typically conform to a limited number of regularities to a certain extent, because of the scenario restrictions (e.g., floor plan, roads, and obstacles) and person-person or person-object interactivity. Thus, an individual person in this scenario should follow one of the regularities as well. In other words, a person's subsequent trajectory has likely been traveled by others. Based on this hypothesis, we propose to forecast a person's future trajectory by learning from the implicit scene regularities. We call the regularities, inherently derived from the past dynamics of the people and the environment in the scene, scene history.


Q: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Neural Information Processing Systems

Users typically engage with LLMs interactively, yet most existing benchmarks evaluate them in a static, single-turn format, posing reliability concerns in interactive scenarios. We identify a key obstacle towards reliability: LLMs are trained to answer any question, even with incomplete context or insufficient knowledge.


A Generalised Jensen Inequality

Neural Information Processing Systems

In Section 4, we require a version of Jensen's inequality generalised to (possibly) infinite-dimensional vector spaces, because our random variable takes values in H R. Note that this square norm function is indeed convex, since, for any t [0, 1] and any pair f, g H Suppose T is a real Hausdorff locally convex (possibly infinite-dimensional) linear topological space, and let C be a closed convex subset of T. Suppose (Ω, F, P) is a probability space, and V: Ω T a Pettis-integrable random variable such that V (Ω) C. Let f: C [,) be a convex, lower semi-continuous extended-real-valued function such that E We will actually apply generalised Jensen's inequality with conditional expectations, so we need the following theorem. Suppose T is a real Hausdorff locally convex (possibly infinite-dimensional) linear topological space, and let C be a closed convex subset of T. Suppose (Ω, F, P) is a probability space, and V: Ω T a Pettis-integrable random variable such that V (Ω) C. Let f: C [,) be a convex, lower semi-continuous extended-realvalued function such that E Here, (*) and (**) use the properties of conditional expectation of vector-valued random variables given in [12, pp.45-46, Properties 43 and 40 respectively]. The right-hand side is clearly E-measurable, since we have a linear operator on an E-measurable random variable. Now take the supremum of the right-hand side over Q. Then (5) tells us that E [ f(V) | E ] ( f E [ V | E ]), as required.



The cast of Mission: Impossible on the importance of humanity during the rise of AI

Mashable

On May 23, the final installment of the Mission: Impossible saga is set to come to an end with Mission: Impossible – The Final Reckoning. Known for its vicious villains, the franchise sports its Biggest Bad yet: an AI known as The Entity that's bent on wiping humans off the planet. Mashable Senior Creative Producer Mark Stetson sat down with the cast (Simon Pegg, Angela Bassett, Hayley Atwell, Pom Klementieff, and Greg Tarzan Davis) to discuss the film's themes of humanity and friendship and its exploration of the future of AI. First, Simon Pegg, who has played Benji Dunn since Mission: Impossible III -- when we first get a hint of The Entity's existence -- helped break down the origins of this Big Bad. "Yeah, I mean, the Entity was around in its nascent form a long time ago. It was a malicious code, basically, which itself evolved into what we are up against in Dead Reckoning, in The Final Reckoning. And I love the idea that McQ [Director Christopher McQuarrie] looked back into the past to see where things may have started, where the rumblings of the Entity may have begun. And further back as well, to, obviously, when Bill Donloe was exiled to Alaska."


Implicit Regularization in Deep Learning May Not Be Explainable by Norms

Neural Information Processing Systems

Mathematically characterizing the implicit regularization induced by gradientbased optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norms can explain the implicit regularization in matrix factorization. The current paper resolves this open question in the negative, by proving that there exist natural matrix factorization problems on which the implicit regularization drives all norms (and quasi-norms) towards infinity. Our results suggest that, rather than perceiving the implicit regularization via norms, a potentially more useful interpretation is minimization of rank. We demonstrate empirically that this interpretation extends to a certain class of non-linear neural networks, and hypothesize that it may be key to explaining generalization in deep learning.