silverman
How squirrels actually find all their buried nuts
Every fall, squirrels hide hundreds of acorns--and use smell, memory, and even theft to get them back. As someone who routinely "hides" things from myself--car keys, receipts, even my phone while I'm actively talking on it--I felt instantly validated by Sarah Silverman's joke that squirrels forget where they bury 80% of their nuts. "And that's how trees are planted!"
- North America > United States > New Jersey (0.05)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
- Retail (0.70)
- Media > Photography (0.48)
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Liu, Toni J. B., Boullé, Nicolas, Sarfati, Raphaël, Earls, Christopher J.
Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. Our main finding is that these LLMs all follow similar learning trajectories in a low-dimensional InPCA space, which are distinct from those of traditional density estimation methods like histograms and Gaussian kernel density estimation (KDE). We interpret the LLaMA in-context DE process as a KDE with an adaptive kernel width and shape. This custom kernel model captures a significant portion of LLaMA's behavior despite having only two parameters. We further speculate on why LLaMA's kernel width and shape differ from those of classical algorithms, providing insights into the mechanism of in-context probabilistic reasoning in LLMs.
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > India > West Bengal > Kolkata (0.04)
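The abstract above interprets LLaMA's in-context density estimation as a KDE with just two parameters, a kernel width and a kernel shape. The paper's exact parameterization is not given here, but a generalized-Gaussian kernel with a width and a shape exponent is one plausible two-parameter form; a minimal numpy sketch:

```python
import numpy as np
from math import gamma

def two_param_kde(x_query, samples, width=0.3, shape=2.0):
    """Two-parameter kernel density estimate: `width` sets the kernel's
    length scale, `shape` the exponent of a generalized-Gaussian kernel
    exp(-|u|^shape). shape=2.0 recovers an ordinary Gaussian kernel."""
    u = np.abs(x_query[:, None] - samples[None, :]) / width
    kernels = np.exp(-u ** shape)
    # Each kernel integrates to 2 * width * Gamma(1 + 1/shape) over the
    # real line, so dividing by that (and by n) makes the estimate a density.
    z = 2.0 * width * gamma(1.0 + 1.0 / shape)
    return kernels.sum(axis=1) / (len(samples) * z)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=200)   # data observed "in context"
grid = np.linspace(-4.0, 4.0, 81)
pdf = two_param_kde(grid, samples)
```

Since shape=2.0 reduces to a plain Gaussian KDE, the two parameters interpolate between the classical estimator and heavier- or lighter-tailed kernels, which is the kind of flexibility the abstract attributes to the LLaMA-fitted kernel.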
Neural Conditional Probability for Inference
Kostic, Vladimir R., Lounici, Karim, Pacreau, Gregoire, Novelli, Pietro, Turri, Giacomo, Pontil, Massimiliano
We introduce NCP (Neural Conditional Probability), a novel operator-theoretic approach for learning conditional distributions with a particular focus on inference tasks. NCP can be used to build conditional confidence regions and extract important statistics like conditional quantiles, mean, and covariance. It offers streamlined learning through a single unconditional training phase, facilitating efficient inference without the need for retraining even when conditioning changes. By tapping into the powerful approximation capabilities of neural networks, our method efficiently handles a wide variety of complex probability distributions, effectively dealing with nonlinear relationships between input and output variables. Theoretical guarantees ensure both optimization consistency and statistical accuracy of the NCP method. Our experiments show that our approach matches or beats leading methods using a simple Multi-Layer Perceptron (MLP) with two hidden layers and GELU activations. This demonstrates that a minimalistic architecture paired with a theoretically grounded loss function can achieve competitive results, even against more complex architectures.
- North America > United States > New York (0.04)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
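The NCP experiments above reportedly need only a two-hidden-layer MLP with GELU activations. As a sketch of how small that architecture is (the weights below are random placeholders, not trained NCP parameters, and the tanh form of GELU is an approximation):

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mlp_forward(x, params):
    """Forward pass of a two-hidden-layer MLP with GELU activations,
    the minimalist architecture the abstract reports as sufficient."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = gelu(x @ W1 + b1)
    h2 = gelu(h1 @ W2 + b2)
    return h2 @ W3 + b3

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 2, 64, 8
params = (rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
          rng.normal(0, 0.1, (d_hidden, d_hidden)), np.zeros(d_hidden),
          rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))
out = mlp_forward(rng.normal(size=(5, d_in)), params)
```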
As Employers Embrace AI, Workers Fret--and Seek Input
The Swedish buy-now-pay-later company Klarna has become something of a poster child for the potential benefits of generative artificial intelligence. The company relies on AI to create and tailor promotional images and to draft marketing copy, saving millions of dollars. Earlier this year it said an AI chatbot assistant was doing the work of 700 human customer-service agents, which it forecast would boost profits by 40 million this year. Klarna's approach highlights generative AI's promise for powering businesswide systems, like customer service. U.S. businesses are investing in AI, and they're eager to see such gains.
Sarah Silverman's copyright infringement suit against OpenAI will advance in pared-down form
Sarah Silverman's lawsuit against OpenAI will advance with some of her legal team's claims dismissed. The comedian sued OpenAI and Meta in July 2023, claiming they trained their AI models on her books and other work without consent. US District Judge Araceli Martínez-Olguín threw out portions of the complaint from Silverman's legal team Monday, including claims of negligence, unjust enrichment, DMCA violations and vicarious infringement. Bloomberg reported on Tuesday that the unfair competition portion of the lawsuit will proceed, and Judge Martínez-Olguín gave the plaintiffs until March 13 to amend the suit.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Robust Multi-Modal Density Estimation
Mészáros, Anna, Schumann, Julian F., Alonso-Mora, Javier, Zgonnikov, Arkady, Kober, Jens
Development of multi-modal, probabilistic prediction models has led to a need for comprehensive evaluation metrics. While several metrics can characterize the accuracy of machine-learned models (e.g., negative log-likelihood, Jensen-Shannon divergence), these metrics typically operate on probability densities. Applying them to purely sample-based prediction models thus requires that the underlying density function is estimated. However, common methods such as kernel density estimation (KDE) have been demonstrated to lack robustness, while more complex methods have not been evaluated in multi-modal estimation problems. In this paper, we present ROME (RObust Multi-modal density Estimator), a non-parametric approach for density estimation which addresses the challenge of estimating multi-modal, non-normal, and highly correlated distributions. ROME utilizes clustering to segment a multi-modal set of samples into multiple uni-modal ones and then combines simple KDE estimates obtained for individual clusters into a single multi-modal estimate. We compared our approach to state-of-the-art methods for density estimation as well as ablations of ROME, showing that it not only outperforms established methods but is also more robust to a variety of distributions. Our results demonstrate that ROME can overcome the issues of over-fitting and over-smoothing exhibited by other estimators, promising a more robust evaluation of probabilistic machine learning models.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
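The pipeline described above (cluster the samples, fit a simple KDE per cluster, combine into one estimate) can be sketched in one dimension. Plain k-means stands in below for whatever clustering step ROME actually uses, and the per-cluster bandwidth is a rule-of-thumb choice, not the paper's:

```python
import numpy as np

def make_gaussian_kde(cluster):
    """Fixed-bandwidth Gaussian KDE for one (ideally uni-modal) cluster,
    with a rule-of-thumb bandwidth std * n^(-1/5)."""
    bw = cluster.std() * len(cluster) ** (-1 / 5)
    norm = len(cluster) * bw * np.sqrt(2 * np.pi)
    def kde(x):
        d2 = (np.asarray(x)[:, None] - cluster[None, :]) ** 2
        return np.exp(-d2 / (2 * bw**2)).sum(axis=1) / norm
    return kde

def rome_like_density(samples, n_clusters=2, n_iter=50, seed=0):
    """ROME-style sketch: split a multi-modal sample into clusters via
    simple 1-D k-means, fit a Gaussian KDE per cluster, and mix the
    per-cluster estimates weighted by cluster size."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(samples, n_clusters, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = samples[labels == k].mean()
    parts = [(np.sum(labels == k) / len(samples),
              make_gaussian_kde(samples[labels == k]))
             for k in range(n_clusters)]
    return lambda x: sum(w * kde(x) for w, kde in parts)

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-3, 0.5, 150), rng.normal(3, 0.5, 150)])
density = rome_like_density(samples)
grid = np.linspace(-6.0, 6.0, 121)
pdf = density(grid)
```

Because each per-cluster KDE integrates to one and the mixture weights sum to one, the combined estimate is itself a density, while each cluster gets a bandwidth suited to its own spread rather than one global bandwidth smoothing across modes.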
Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control
Allerbo, Oskar, Jörnsten, Rebecka
Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length scale of the kernel and has to be carefully selected to obtain a model with good generalization. The default methods for bandwidth selection, cross-validation and marginal likelihood maximization, often yield good results, albeit at high computational costs. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We use this expression to propose a closed-form, computationally feather-light bandwidth selection heuristic, based on controlling the Jacobian. In addition, the Jacobian expression illuminates how the bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that compared to cross-validation and marginal likelihood maximization, our method is on par in terms of model performance, but up to six orders of magnitude faster.
- North America > United States > California (0.04)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (4 more...)
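The abstract above does not reproduce the Jacobian-based heuristic itself, but the object it tunes is easy to show: closed-form Gaussian kernel ridge regression, where the fit hinges entirely on the supplied bandwidth. A minimal sketch with a hand-picked bandwidth:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Gaussian (RBF) kernel matrix with length scale `bandwidth`
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth**2))

def krr_fit_predict(x_train, y_train, x_test, bandwidth, ridge=1e-3):
    """Closed-form kernel ridge regression:
    alpha = (K + ridge * I)^{-1} y, then f(x) = k(x, X_train) @ alpha.
    The bandwidth is supplied by hand here; the paper's contribution is
    a cheap closed-form rule for choosing it."""
    K = gaussian_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(len(x_train)), y_train)
    return gaussian_kernel(x_test, x_train, bandwidth) @ alpha

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 80))
y = np.sin(x) + rng.normal(0, 0.1, 80)
grid = np.linspace(0, 2 * np.pi, 100)
pred = krr_fit_predict(x, y, grid, bandwidth=0.5)
```

The trade-off the abstract describes is visible in this setup: shrinking the bandwidth makes the fit wigglier but keeps the kernel matrix well conditioned, while growing it smooths the fit at the cost of a nearly singular matrix that the ridge term must rescue.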
What I Found in a Database Meta Uses to Train Generative AI
Editor's note: This article is part of The Atlantic's series on Books3. This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. "Books3," as it's called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. Books play a crucial role in the training of generative-AI systems.
- Law > Litigation (1.00)
- Law > Intellectual Property & Technology Law (0.78)
Revealed: The Authors Whose Pirated Books Are Powering Generative AI
One of the most troubling issues around generative AI is simple: It's being made in secret. To produce humanlike answers to questions, systems such as ChatGPT process huge quantities of written material. But few people outside of companies such as Meta and OpenAI know the full extent of the texts these programs have been trained on. Some training text comes from Wikipedia and other online writing, but high-quality generative AI requires higher-quality input than is usually found on the internet--that is, it requires the kind found in books. But neither the lawsuit itself nor the commentary surrounding it has offered a look under the hood: We have not previously known for certain whether LLaMA was trained on Silverman's, Kadrey's, or Golden's books, or any others, for that matter.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Want agency in the AI age? Get ready to fight
Writers are protesting against studios' use of AI language models to write scripts. Actors are on strike after rejecting a proposal from companies seeking to use AI technology to scan people's faces and bodies, and own the right to use these deepfake-style digital copies without consent or compensation in perpetuity. What connects these cases is a fear that humans will be replaced by computer programs, and a feeling that there's very little we can do about it. Our lax approach to regulating the excesses of the previous tech boom means AI companies have felt safe building and launching products that are exploitative and harmful. But that is about to change.
- Government > Regional Government > North America Government > United States Government (0.56)
- Law > Litigation (0.41)
- Information Technology > Security & Privacy (0.39)
- Law > Business Law (0.36)