How squirrels actually find all their buried nuts

Popular Science

Every fall, squirrels hide hundreds of acorns--and use smell, memory, and even theft to get them back. As someone who routinely "hides" things from myself--car keys, receipts, even my phone while I'm actively talking on it--I felt instantly validated by Sarah Silverman's joke that squirrels forget where they bury 80% of their nuts. "And that's how trees are planted!"


Density estimation with LLMs: a geometric investigation of in-context learning trajectories

Liu, Toni J. B., Boullé, Nicolas, Sarfati, Raphaël, Earls, Christopher J.

arXiv.org Machine Learning

Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. Our main finding is that these LLMs all follow similar learning trajectories in a low-dimensional InPCA space, which are distinct from those of traditional density estimation methods like histograms and Gaussian kernel density estimation (KDE). We interpret the LLaMA in-context DE process as a KDE with an adaptive kernel width and shape. This custom kernel model captures a significant portion of LLaMA's behavior despite having only two parameters. We further speculate on why LLaMA's kernel width and shape differ from those of classical algorithms, providing insights into the mechanism of in-context probabilistic reasoning in LLMs.
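The abstract describes LLaMA's in-context density estimation as behaving like a KDE with an adaptive kernel width and shape. As a minimal sketch (not the paper's implementation), a two-parameter kernel family of that kind can be written as a generalized Gaussian K(u) ∝ exp(-|u/h|^β), with width h and shape β; the names `width` and `shape` below are our own:

```python
import numpy as np
from math import gamma

def generalized_kde(x_eval, samples, width, shape):
    """Two-parameter KDE with a generalized Gaussian kernel.

    K(u) = c * exp(-|u/width|**shape), with c chosen so K integrates to 1.
    shape=2.0 gives a Gaussian-type kernel; shape=1.0 gives a Laplace kernel.
    """
    # Normalizing constant: integral of exp(-|u/h|^b) over R is 2*h*Gamma(1/b)/b.
    c = shape / (2.0 * width * gamma(1.0 / shape))
    u = (x_eval[:, None] - samples[None, :]) / width
    return c * np.exp(-np.abs(u) ** shape).mean(axis=1)

# Example: estimate a standard-normal density from 200 samples.
rng = np.random.default_rng(0)
samples = rng.standard_normal(200)
grid = np.linspace(-3, 3, 61)
pdf = generalized_kde(grid, samples, width=0.5, shape=2.0)
```

Classical KDE fixes the shape and picks the width by a bandwidth rule; the paper's point is that LLaMA's behavior is well captured by letting both parameters adapt.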


Neural Conditional Probability for Inference

Kostic, Vladimir R., Lounici, Karim, Pacreau, Gregoire, Novelli, Pietro, Turri, Giacomo, Pontil, Massimiliano

arXiv.org Machine Learning

We introduce NCP (Neural Conditional Probability), a novel operator-theoretic approach for learning conditional distributions with a particular focus on inference tasks. NCP can be used to build conditional confidence regions and extract important statistics like conditional quantiles, mean, and covariance. It offers streamlined learning through a single unconditional training phase, facilitating efficient inference without the need for retraining even when conditioning changes. By tapping into the powerful approximation capabilities of neural networks, our method efficiently handles a wide variety of complex probability distributions, effectively dealing with nonlinear relationships between input and output variables. Theoretical guarantees ensure both optimization consistency and statistical accuracy of the NCP method. Our experiments show that our approach matches or beats leading methods using a simple Multi-Layer Perceptron (MLP) with two hidden layers and GELU activations. This demonstrates that a minimalistic architecture with a theoretically grounded loss function can achieve competitive results without sacrificing performance, even in the face of more complex architectures.
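The abstract reports that NCP's results are obtained with a simple MLP with two hidden layers and GELU activations. A minimal NumPy sketch of that forward pass (the paper's operator-theoretic loss and training procedure are not shown here; all dimensions below are illustrative assumptions) might look like:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_forward(x, params):
    """Forward pass of a two-hidden-layer MLP with GELU activations,
    the minimalistic architecture the abstract describes."""
    w1, b1, w2, b2, w3, b3 = params
    h = gelu(x @ w1 + b1)
    h = gelu(h @ w2 + b2)
    return h @ w3 + b3

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 64, 8  # illustrative sizes
params = (rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
          rng.normal(0, 0.1, (d_hidden, d_hidden)), np.zeros(d_hidden),
          rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))
features = mlp_forward(rng.standard_normal((16, d_in)), params)
```

The point the abstract makes is that this architecture, paired with a theoretically grounded loss, is enough to match more complex models.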


As Employers Embrace AI, Workers Fret--and Seek Input

TIME - Tech

The Swedish buy-now-pay-later company Klarna has become something of a poster child for the potential benefits of generative artificial intelligence. The company relies on AI to create and tailor promotional images and to draft marketing copy, saving millions of dollars. Earlier this year it said an AI chatbot assistant was doing the work of 700 human customer-service agents, which it forecast would boost profits by $40 million this year. Klarna's approach highlights generative AI's promise for powering businesswide systems, like customer service. U.S. businesses are investing in AI, and they're eager to see such gains.


Sarah Silverman's copyright infringement suit against OpenAI will advance in pared-down form

Engadget

Sarah Silverman's lawsuit against OpenAI will advance with some of her legal team's claims dismissed. The comedian sued OpenAI and Meta in July 2023, claiming they trained their AI models on her books and other work without consent. US District Judge Araceli Martínez-Olguín threw out portions of the complaint from Silverman's legal team Monday, including negligence, unjust enrichment, DMCA violations and accusations of vicarious infringement. Bloomberg reported on Tuesday that the unfair competition portion of the lawsuit will proceed. Judge Martínez-Olguín gave the plaintiffs until March 13 to amend the suit.


Robust Multi-Modal Density Estimation

Mészáros, Anna, Schumann, Julian F., Alonso-Mora, Javier, Zgonnikov, Arkady, Kober, Jens

arXiv.org Artificial Intelligence

Development of multi-modal, probabilistic prediction models has led to a need for comprehensive evaluation metrics. While several metrics can characterize the accuracy of machine-learned models (e.g., negative log-likelihood, Jensen-Shannon divergence), these metrics typically operate on probability densities. Applying them to purely sample-based prediction models thus requires that the underlying density function is estimated. However, common methods such as kernel density estimation (KDE) have been demonstrated to lack robustness, while more complex methods have not been evaluated in multi-modal estimation problems. In this paper, we present ROME (RObust Multi-modal density Estimator), a non-parametric approach for density estimation which addresses the challenge of estimating multi-modal, non-normal, and highly correlated distributions. ROME utilizes clustering to segment a multi-modal set of samples into multiple uni-modal ones and then combines simple KDE estimates obtained for individual clusters into a single multi-modal estimate. We compared our approach to state-of-the-art methods for density estimation as well as ablations of ROME, showing that it not only outperforms established methods but is also more robust to a variety of distributions. Our results demonstrate that ROME can overcome the issues of over-fitting and over-smoothing exhibited by other estimators, promising a more robust evaluation of probabilistic machine learning models.
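The cluster-then-KDE idea the abstract describes can be sketched in a few lines. This is a simplified 1-D illustration, not ROME itself: it uses a bare-bones k-means (ROME's clustering step is more sophisticated) and Silverman's rule-of-thumb bandwidth, then mixes the per-cluster KDEs weighted by cluster size:

```python
import numpy as np

def kmeans_1d(x, k, iters=50, seed=0):
    """Minimal 1-D k-means; a stand-in for ROME's clustering step."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def gaussian_kde(grid, x):
    """Gaussian KDE with Silverman's rule-of-thumb bandwidth."""
    h = 1.06 * x.std() * len(x) ** (-0.2)
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

def cluster_then_kde(grid, samples, k=2):
    # Segment into (approximately) uni-modal clusters, estimate each
    # separately, then recombine weighted by cluster size.
    labels = kmeans_1d(samples, k)
    parts = [gaussian_kde(grid, samples[labels == j]) for j in range(k)]
    weights = [np.mean(labels == j) for j in range(k)]
    return sum(w * p for w, p in zip(weights, parts))

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-3, 0.5, 150), rng.normal(3, 0.5, 150)])
grid = np.linspace(-6, 6, 121)
density = cluster_then_kde(grid, samples, k=2)
```

Running a single KDE over the bimodal sample forces one bandwidth to serve both modes; splitting first lets each cluster get a bandwidth suited to its own spread, which is the robustness gain the abstract claims.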


Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control

Allerbo, Oskar, Jörnsten, Rebecka

arXiv.org Machine Learning

Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length scale of the kernel and has to be carefully selected to obtain a model with good generalization. The default methods for bandwidth selection, cross-validation and marginal likelihood maximization, often yield good results, albeit at high computational costs. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We use this expression to propose a closed-form, computationally feather-light, bandwidth selection heuristic, based on controlling the Jacobian. In addition, the Jacobian expression illuminates how the bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that compared to cross-validation and marginal likelihood maximization, our method is on par in terms of model performance, but up to six orders of magnitude faster.
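For context, here is a minimal Gaussian-kernel ridge regression where the bandwidth enters. The abstract does not give the paper's Jacobian-control formula, so the closed-form choice below is the standard median heuristic, used purely as a stand-in to show where such a rule would plug in; it is NOT the paper's method:

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Squared-exponential kernel matrix between row-sample sets a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def krr_fit_predict(x_train, y_train, x_test, bandwidth, ridge=1e-3):
    """Kernel ridge regression with a Gaussian kernel."""
    K = gaussian_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(len(x_train)), y_train)
    return gaussian_kernel(x_test, x_train, bandwidth) @ alpha

def median_heuristic(x):
    """Closed-form bandwidth stand-in: median pairwise distance.
    The paper instead derives the bandwidth by controlling the Jacobian."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    return np.median(d[d > 0])

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, (80, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, 80)
grid = np.linspace(-3, 3, 50)[:, None]
pred = krr_fit_predict(x, y, grid, bandwidth=median_heuristic(x))
```

Any closed-form rule like this costs one pass over the data, versus refitting the model many times for cross-validation, which is where the claimed speedup comes from.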


What I Found in a Database Meta Uses to Train Generative AI

The Atlantic - Technology

Editor's note: This article is part of The Atlantic's series on Books3. You can search the database for yourself here, and read about its origins here. This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. "Books3," as it's called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. Books play a crucial role in the training of generative-AI systems.


Revealed: The Authors Whose Pirated Books Are Powering Generative AI

The Atlantic - Technology

One of the most troubling issues around generative AI is simple: It's being made in secret. To produce humanlike answers to questions, systems such as ChatGPT process huge quantities of written material. But few people outside of companies such as Meta and OpenAI know the full extent of the texts these programs have been trained on. Some training text comes from Wikipedia and other online writing, but high-quality generative AI requires higher-quality input than is usually found on the internet--that is, it requires the kind found in books. But neither the lawsuit itself nor the commentary surrounding it has offered a look under the hood: We have not previously known for certain whether LLaMA was trained on Silverman's, Kadrey's, or Golden's books, or any others, for that matter.


Want agency in the AI age? Get ready to fight

MIT Technology Review

Writers are protesting against studios' use of AI language models to write scripts. Actors are on strike after rejecting a proposal from companies seeking to use AI technology to scan people's faces and bodies, and own the right to use these deepfake-style digital copies without consent or compensation in perpetuity. What connects these cases is a fear that humans will be replaced by computer programs, and a feeling that there's very little we can do about it. Our lax approach to regulating the excesses of the previous tech boom means AI companies have felt safe building and launching products that are exploitative and harmful. But that is about to change.