Your Gmail inbox now includes Gemini summaries by default - how to stop them

ZDNet

Last summer, Google added the ability for Gemini in Gmail to summarize individual messages or long email threads. It was an especially useful feature for catching up on an email chain on the go or on a smaller screen, like your phone. The only drawback was that you had to manually start the "Summarize this email" process from the Gemini sidebar. In an announcement yesterday, Google said those summary cards will now appear automatically for Workspace users. Starting this week, mobile users will begin seeing summaries at the top of email messages when Gemini determines a summary would be helpful -- for example, in a long thread or in a message with several replies.


Aligning Embeddings and Geometric Random Graphs: Informational Results and Computational Approaches for the Procrustes-Wasserstein Problem

Neural Information Processing Systems

The Procrustes-Wasserstein problem consists of matching two high-dimensional point clouds in an unsupervised setting, and has many applications in natural language processing and computer vision.
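For orientation, here is a minimal sketch of the classical alternating heuristic for this kind of matching problem (not the estimator analyzed in the paper): alternate an orthogonal Procrustes step, solved with an SVD, and a matching step, solved as a linear assignment. Function names and the iteration count are illustrative assumptions.

```python
# Minimal alternating sketch for Procrustes-Wasserstein-style matching
# (illustrative only; not the algorithm analyzed in the paper).
import numpy as np
from scipy.optimize import linear_sum_assignment

def procrustes_wasserstein(X, Y, n_iters=20):
    """Jointly estimate an orthogonal map Q and a permutation perm
    so that X[i] ~ Q @ Y[perm[i]].  X, Y: (n, d) point clouds."""
    n, d = X.shape
    perm = np.arange(n)                      # start from the identity matching
    for _ in range(n_iters):
        # Orthogonal Procrustes step: best Q for the current matching (via SVD).
        U, _, Vt = np.linalg.svd(X.T @ Y[perm])
        Q = U @ Vt
        # Assignment step: best matching for the current Q (Hungarian algorithm).
        cost = -X @ (Y @ Q.T).T              # maximize <x_i, Q y_j>
        _, perm = linear_sum_assignment(cost)
    return Q, perm
```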


Linear Time Sinkhorn Divergences using Positive Features

Neural Information Processing Systems

Although Sinkhorn divergences are now routinely used in data sciences to compare probability distributions, the computational effort required to compute them remains expensive, growing in general quadratically in the size n of the support of these distributions. Indeed, solving optimal transport (OT) with an entropic regularization requires computing an n × n kernel matrix (the neg-exponential of an n × n pairwise ground cost matrix) that is repeatedly applied to a vector. We propose to use instead ground costs of the form c(x, y) = −log⟨φ(x), φ(y)⟩, where φ is a map from the ground space onto the positive orthant ℝ^r_+, with r ≪ n.
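The computational point can be sketched in a few lines: when the kernel factors through r-dimensional positive features, each Sinkhorn update only needs products with the two n × r factors, so the cost per iteration is linear in n. In the sketch below the feature map is an arbitrary positive placeholder, not the construction proposed in the paper.

```python
# Sketch of Sinkhorn iterations with a factored kernel K = Phi_x @ Phi_y.T,
# so each iteration costs O(n r) instead of O(n^2).  The feature map `phi`
# below is an arbitrary positive placeholder, not the paper's construction.
import numpy as np

def phi(X, r=64, seed=0):
    # Placeholder positive feature map (same random W for both clouds via the seed).
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], r))
    return np.exp(X @ W - 0.5 * (X ** 2).sum(1, keepdims=True)) / np.sqrt(r)

def sinkhorn_positive_features(X, Y, a, b, n_iters=200):
    Phi_x, Phi_y = phi(X), phi(Y)            # (n, r) and (m, r), entrywise > 0
    u, v = np.ones(len(a)), np.ones(len(b))  # Sinkhorn scaling vectors
    for _ in range(n_iters):
        # K @ v and K.T @ u without ever forming the n x m kernel matrix.
        u = a / (Phi_x @ (Phi_y.T @ v))
        v = b / (Phi_y @ (Phi_x.T @ u))
    # The transport plan is diag(u) K diag(v); return its factors instead.
    return u, v, Phi_x, Phi_y
```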


Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes (Dennis Wu, Han Liu)

Neural Information Processing Systems

We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories. We present a tight analysis by establishing a connection between the memory configuration of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code. This enables us to cast the memorization problem in KHMs into a point arrangement problem on a hypersphere. We show that the optimal capacity of KHMs occurs when the feature space allows memories to form an optimal spherical code. This unique perspective leads to (i) an analysis of how KHMs achieve optimal memory capacity, together with the corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature. This provides the first tight and optimal asymptotic memory capacity for modern Hopfield models.
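For context, retrieval in this family of dense associative memories is a softmax attention over the stored patterns, applied in feature space for the kernelized variant. Below is a minimal numpy sketch of that style of update; feature_map is an optional placeholder, not the learned kernel map studied in the paper.

```python
# Minimal sketch of modern-Hopfield-style retrieval: one update step is a
# softmax attention over the stored memories.  In a kernelized variant the
# similarities are computed in feature space; `feature_map` is a placeholder.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(memories, query, beta=8.0, n_steps=3, feature_map=None):
    """memories: (M, d) stored patterns, query: (d,) noisy probe."""
    f = feature_map if feature_map is not None else (lambda x: x)
    Xi = np.stack([f(m) for m in memories])      # memories in feature space
    x = query
    for _ in range(n_steps):
        p = softmax(beta * (Xi @ f(x)))          # similarity to each stored memory
        x = memories.T @ p                       # convex combination of memories
    return x
```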


A Missing Proofs: f is intra order-preserving if and only if f(x) = S(x)^{-1} U w(x), with U an upper-triangular matrix of ones and w: ℝ^n → ℝ^n continuous

Neural Information Processing Systems

Proof of Theorem 1. (⇒) For a continuous intra order-preserving function f(x), let w(x) = U^{-1} S(f(x)) f(x). First we show that w is continuous. Because f is intra order-preserving, it holds that S(x) = S(f(x)). Let f̂(x) := S(f(x)) f(x) be the sorted version of f(x). By Lemma 1, we know that f̂ is continuous, and therefore w is also continuous. Next, we show that w satisfies the properties listed in Theorem 1. These two arguments prove the necessary condition.
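As a concrete illustration of this characterization, the numpy sketch below builds an intra order-preserving map f(x) = S(x)^{-1} U w(x) from a given w. The particular w used here (square roots of the descending gaps of x, with a free last entry) is only an illustrative assumption chosen to satisfy the nonnegativity and tie conditions; it is not taken from the paper.

```python
# Sketch of the construction in Theorem 1: f(x) = S(x)^{-1} U w(x), where S(x)
# sorts x in decreasing order and U is the upper-triangular matrix of ones.
# The particular w below is only an illustrative choice.
import numpy as np

def w_example(x):
    x_sorted = np.sort(x)[::-1]
    gaps = x_sorted[:-1] - x_sorted[1:]      # nonnegative, zero exactly at ties
    return np.append(np.sqrt(gaps), x_sorted[-1])

def intra_order_preserving(x, w_fn=w_example):
    n = len(x)
    order = np.argsort(-x)                   # permutation sorting x in decreasing order
    U = np.triu(np.ones((n, n)))             # upper-triangular matrix of ones
    f_sorted = U @ w_fn(x)                   # U w(x): sorted (non-increasing) output
    f = np.empty(n)
    f[order] = f_sorted                      # S(x)^{-1}: place values back in x's order
    return f

x = np.array([0.2, 1.5, -0.3, 0.9])
y = intra_order_preserving(x)
assert (np.argsort(-y) == np.argsort(-x)).all()   # f preserves the ranking of x
```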


9bc99c590be3511b8d53741684ef574c-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for the insightful comments. Due to the space limitation, we only discuss the major comments below. The fooling example is shown in Fig. (a) below; a similar issue has been shown for ECE (e.g., Sec. 3 of [i]). To further understand this, in Sec. D.2 we evaluate the performance of all the compared metrics. We will update Sec. D.1 as follows: before giving the fooling example, we will highlight that ECE is not a proper scoring rule. We were not able to finish the OOD experiments in time and have to leave them to future work.


Performative Control for Linear Dynamical Systems

Neural Information Processing Systems

We introduce the framework of performative control, where the policy chosen by the controller affects the underlying dynamics of the control system.
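As an illustration of the setting (not the paper's model), the sketch below simulates a linear system whose dynamics matrix shifts as a function of the deployed state-feedback gain; the specific dependence A(K) = A0 + eps * B @ K is an assumption made purely for the example.

```python
# Illustrative sketch of policy-dependent ("performative") linear dynamics:
# the system matrix shifts as a function of the deployed feedback gain K.
# The dependence A(K) = A0 + eps * B @ K is an assumption for illustration.
import numpy as np

def rollout(K, A0, B, eps=0.1, x0=None, T=50):
    """Simulate x_{t+1} = A(K) x_t + B u_t with u_t = -K x_t."""
    n = A0.shape[0]
    A_K = A0 + eps * B @ K                   # dynamics react to the deployed policy
    x = np.ones(n) if x0 is None else x0
    cost = 0.0
    for _ in range(T):
        u = -K @ x
        cost += x @ x + u @ u                # standard quadratic stage cost
        x = A_K @ x + B @ u
    return cost

A0 = np.array([[1.01, 0.1], [0.0, 0.98]]); B = np.eye(2)
print(rollout(0.5 * np.eye(2), A0, B))
```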


Offline Multitask Representation Learning for Reinforcement Learning

Neural Information Processing Systems

We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with offline datasets from several tasks that share a common representation and must learn that shared representation. We theoretically investigate offline multitask low-rank RL and propose a new algorithm, MORL, for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline, and online scenarios, where a new task that shares the same representation as the upstream offline tasks is introduced to the agent. Our theoretical results demonstrate the benefit of using the representation learned from the upstream offline tasks instead of directly learning the representation of the low-rank model.
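For concreteness, the low-rank structure underlying this setting can be written as P_t(s' | s, a) = ⟨φ(s, a), μ_t(·, s')⟩ with the representation φ shared across tasks. The snippet below constructs such a family of transition kernels with random, purely illustrative factors; it is not the MORL algorithm.

```python
# Sketch of the low-rank structure behind shared-representation multitask RL:
# each task t has P_t(s' | s, a) = <phi(s, a), mu_t(., s')> with one phi shared
# by all tasks.  Dimensions and random factors below are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
S, A, d, n_tasks = 20, 5, 4, 3                  # states, actions, rank, tasks

phi = rng.random((S, A, d))
phi /= phi.sum(-1, keepdims=True)               # phi(s, a): distribution over d latent factors
mus = rng.random((n_tasks, d, S))
mus /= mus.sum(-1, keepdims=True)               # mu_t(z, .): distribution over next states

P = np.einsum('sad,tdk->tsak', phi, mus)        # P_t(s' | s, a), shape (tasks, S, A, S)
assert np.allclose(P.sum(-1), 1.0)              # every P_t(. | s, a) is a valid distribution
```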


Appendix of Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Neural Information Processing Systems

A.1 Proof of Lemma 2.1. Inserting A(t) as defined in (4) into (3), we arrive at (A.1), which proves that A(t) is equal to the rightmost expression in (5). Rewriting the right-hand side as in (A.2) then gives the compact form of the dynamics of A claimed in the lemma.

A.2 Proof of Lemma 2.2. Equations (8) and (9) can be derived from (5) and (6) by taking their expectation over ν, owing to the fact that the data is Gaussian and using Wick's theorem for Gaussian moments. Note that this derivation can be generalized to non-Gaussian data; see Ref. [1] for details.

A.3 Proximal scheme. We note that (5) (and similarly (8), if we use the population loss in (9) instead of the empirical loss in (6)) can be viewed as the time-continuous limit of a simple proximal scheme involving the Cholesky decomposition of A and the standard Frobenius norm as the Bregman distance. We state this result as Proposition A.1: the proximal iterates converge to A(t) as τ → 0 and p → ∞ with pτ → t (see (A.5)), where A(t) solves (5) for the initial condition A(0) = B.
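The proximal-scheme remark in A.3 rests on a standard fact: as the step size τ goes to zero with the number of steps p growing so that pτ → t, proximal-point iterates trace the corresponding gradient flow. The sketch below illustrates this limit for a generic smooth matrix loss with a plain Euclidean proximal term; it does not use the Cholesky-based Bregman geometry of Proposition A.1, and the loss and dimensions are arbitrary.

```python
# Generic illustration of the limit behind the proximal scheme in A.3:
# proximal-point iterates A_{k+1} = argmin_A  L(A) + ||A - A_k||_F^2 / (2 tau)
# approach the gradient flow dA/dt = -grad L(A) as tau -> 0.  Plain Euclidean
# proximal term and an arbitrary smooth loss, not the appendix's setup.
import numpy as np
from scipy.optimize import minimize

def L(A, M):
    return 0.25 * np.sum((A @ A.T - M) ** 2)            # generic smooth matrix loss

def grad_L(A, M):
    return (A @ A.T - M) @ A                            # its gradient (M symmetric)

def prox_step(A, M, tau):
    obj = lambda v: (L(v.reshape(A.shape), M)
                     + np.sum((v - A.ravel()) ** 2) / (2 * tau))
    return minimize(obj, A.ravel()).x.reshape(A.shape)

rng = np.random.default_rng(0)
M = rng.random((3, 3)); M = M @ M.T                     # symmetric target
A_prox = A_flow = rng.random((3, 2))
tau, n_steps = 1e-2, 50                                 # fixed horizon p * tau = 0.5
for _ in range(n_steps):
    A_prox = prox_step(A_prox, M, tau)                  # proximal scheme
    A_flow = A_flow - tau * grad_L(A_flow, M)           # explicit Euler on the flow
print(np.linalg.norm(A_prox - A_flow))                  # small when tau is small
```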