Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning
Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. To shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs). In this setting, we find that the success of CoT can be attributed to breaking down in-context learning of a compositional function into two distinct phases: focusing on and filtering the data relevant to each step of the composition, and in-context learning the single-step composition function. Through both experimental and theoretical evidence, we demonstrate how CoT significantly reduces the sample complexity of in-context learning (ICL) and facilitates the learning of complex functions that non-CoT methods struggle with. Furthermore, we illustrate how transformers can transition from vanilla in-context learning to mastering a compositional function with CoT simply by incorporating additional layers that perform the necessary data filtering for CoT via the attention mechanism. Beyond these test-time benefits, we show that CoT accelerates pretraining by learning shortcuts to represent complex functions, and that filtering plays an important role in this process. These findings collectively provide insights into the mechanics of CoT, inviting further investigation of its role in complex reasoning tasks.
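As a concrete illustration of the two-phase view, the sketch below mimics CoT on a two-layer ReLU MLP, with plain least squares standing in for the transformer's in-context learner: the "filtering" phase restricts attention to the (x, h) or (h, y) pairs relevant to the current step, and each step then becomes a simple single-layer problem. All dimensions and data here are invented for the example; this is a toy analogy, not the paper's transformer experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 4                      # input and hidden dimensions (arbitrary)
W1 = rng.normal(size=(k, d))     # layer-1 weights of the target MLP
w2 = rng.normal(size=k)          # layer-2 weights
relu = lambda z: np.maximum(z, 0.0)

# In-context demonstrations. With CoT, each example exposes the
# intermediate activation h, so the prompt contains (x, h, y) triples.
n = 50
X = rng.normal(size=(n, d))
H = relu(X @ W1.T)               # the intermediate "step"
y = H @ w2

# Phase 1 ("filtering" + single-step ICL): attend only to (x, h) pairs.
# For each hidden unit, restrict to examples where the ReLU is active;
# on those, h_j = w_j . x is linear, so least squares recovers w_j.
W1_hat = np.zeros_like(W1)
for j in range(k):
    on = H[:, j] > 0
    W1_hat[j] = np.linalg.lstsq(X[on], H[on, j], rcond=None)[0]

# Phase 2: attend only to (h, y) pairs and solve the linear output step.
w2_hat = np.linalg.lstsq(H, y, rcond=None)[0]

# Compose the two learned steps on a fresh query.
x_q = rng.normal(size=d)
pred = relu(W1_hat @ x_q) @ w2_hat
true = relu(W1 @ x_q) @ w2
print(abs(pred - true) < 1e-6)   # → True
```

Without the intermediate h in the prompt, the learner would have to fit the full nonlinear composition from (x, y) pairs alone, which is exactly the harder non-CoT problem the abstract contrasts against.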
0668e20b3c9e9185b04b3d2a9dc8fa2d-AuthorFeedback.pdf
First of all, we thank all reviewers for their valuable time and feedback. We thank the reviewers for pointing out typos and grammatical errors, which we have now fixed. R1: We are afraid the reviewer may have misunderstood some parts of the paper. We refer to the original paper for further details about the approximation of the variational posterior, and we have clarified this in the main paper.
Supplementary Material for "Lattice partition recovery with dyadic CART"
In this document, we provide further technical details and all the proofs of the results in "Lattice partition recovery with dyadic CART". Throughout, we repeatedly use the notion of two rectangles being adjacent. This section contains the proofs of the main results from Section 2. Theorem 1 demonstrates the one-sided consistency of DCART. The proof of (15) is identical to that of Theorem S1, with one difference. Assumption 1 then leads to a contradiction. We use Fano's method in this proof.
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Web-crawled datasets underlie the impressive "zero-shot" performance of multimodal models, such as CLIP for classification and Stable Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such models, because the extent to which their pretraining datasets encompass the downstream concepts used in "zero-shot" evaluation is unknown. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and 5 standard pretraining datasets, generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample-inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and evaluation datasets [81], and when testing on purely synthetic data distributions [52]. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test dataset as the Let it Wag! benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data, which implies that the key to "zero-shot" generalization capabilities under large-scale training data and compute paradigms remains to be found.
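The log-linear trend above can be written as accuracy ≈ a · log10(frequency) + b: a fixed additive accuracy gain per tenfold increase in concept frequency, i.e., exponentially more data for linear improvement. A minimal sketch of fitting such a trend, with made-up numbers rather than the paper's measurements:

```python
import numpy as np

# Hypothetical concept frequencies (pretraining occurrences) and
# downstream accuracies constructed to follow a log-linear trend;
# these are illustrative values, not the paper's data.
freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
acc = 0.08 * np.log10(freq) + 0.10

# Fit accuracy as a linear function of log10(frequency).
slope, intercept = np.polyfit(np.log10(freq), acc, 1)
print(round(slope, 3), round(intercept, 3))   # → 0.08 0.1

# Under this fit, each 10x increase in concept frequency buys the
# same fixed accuracy gain (here, 8 points) -- the sample-inefficient
# scaling the abstract describes.
```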
A Missing preliminaries
A mechanism R is randomized if for every profile of reported valuations b it outputs a randomized allocation, i.e., it returns integral allocations drawn from a probability distribution corresponding to a randomized allocation. Since every randomized allocation has an associated expected fractional allocation, the output of a randomized mechanism for reported valuations b can also be interpreted as representing a fractional allocation. Notice that for randomized mechanisms, the definition of NOM takes an expectation over the randomness of the mechanism, while the minimum/maximum are over the reports of the other agents; we sometimes write "NOM in expectation" when referring specifically to a randomized mechanism. The PS-Lottery algorithm is based on the well-known probabilistic serial algorithm, which outputs fractional allocations that are envy-free. At a high level, the PS-Lottery algorithm uses Birkhoff's algorithm to decompose the resulting fractional allocation into a probability distribution over integral allocations. For the sake of completeness, the PS-Lottery algorithm is formally described in Appendix B.1.
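As a minimal sketch of the decomposition step (assuming the simplest one-item-per-agent case, where fractional allocations are doubly stochastic matrices and integral allocations are permutations), Birkhoff's algorithm repeatedly extracts a permutation from the support of the matrix and subtracts it with the largest feasible probability:

```python
import numpy as np

def find_permutation(support):
    """Backtracking search for a permutation inside a boolean support mask.
    Birkhoff's theorem guarantees one exists for doubly stochastic matrices."""
    n = len(support)
    perm, used = [-1] * n, [False] * n
    def place(row):
        if row == n:
            return True
        for col in range(n):
            if support[row][col] and not used[col]:
                used[col], perm[row] = True, col
                if place(row + 1):
                    return True
                used[col] = False
        return False
    return perm if place(0) else None

def birkhoff(M, tol=1e-9):
    """Decompose a doubly stochastic matrix into a convex combination of
    permutation matrices, i.e., a lottery over integral allocations."""
    M = M.astype(float).copy()
    lottery = []
    while M.max() > tol:
        perm = find_permutation(M > tol)
        p = min(M[i, perm[i]] for i in range(len(perm)))
        lottery.append((p, tuple(perm)))
        for i, j in enumerate(perm):
            M[i, j] -= p
    return lottery

# A fractional allocation of 3 items among 3 agents (rows: agents).
frac = np.array([[0.5, 0.5, 0.0],
                 [0.5, 0.0, 0.5],
                 [0.0, 0.5, 0.5]])
for prob, perm in birkhoff(frac):
    print(prob, perm)
# → 0.5 (0, 2, 1)
# → 0.5 (1, 0, 2)
```

Sampling a permutation with these probabilities realizes the fractional allocation in expectation, which is how a randomized mechanism's output can be read either way, as described above.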
Robust Gaussian Processes via Relevance Pursuit
Sebastian Ament, Elizabeth Santorella, David Eriksson (Meta)
Gaussian processes (GPs) are non-parametric probabilistic regression models that are popular due to their flexibility, data efficiency, and well-calibrated uncertainty estimates. However, standard GP models assume homoskedastic Gaussian noise, while many real-world applications are subject to non-Gaussian corruptions. Variants of GPs that are more robust to alternative noise models have been proposed, but they entail significant trade-offs between accuracy and robustness, and between computational requirements and theoretical guarantees. In this work, we propose and study a GP model that achieves robustness against sparse outliers by inferring data-point-specific noise levels via a sequential selection procedure that maximizes the log marginal likelihood, which we refer to as relevance pursuit. We show, surprisingly, that the model can be parameterized such that the associated log marginal likelihood is strongly concave in the data-point-specific noise variances, a property rarely found in either robust regression objectives or GP marginal likelihoods. This in turn implies the weak submodularity of the corresponding subset selection problem, and thereby yields approximation guarantees for the proposed algorithm. We compare the model's performance to that of other approaches on diverse regression and Bayesian optimization tasks, including the challenging but common setting of sparse corruptions of the labels within or close to the function range.
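A toy numpy sketch of the greedy selection idea described above (not the authors' implementation): at each step, one data point is granted an extra noise variance rho, and the point whose inflation most increases the log marginal likelihood is kept, so corrupted labels are progressively "explained away". The kernel, hyperparameters, and data below are all invented for illustration.

```python
import numpy as np

def rbf(X, ls=0.1):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    d = X[:, None] - X[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def log_marginal_likelihood(K, y):
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def relevance_pursuit(X, y, num_steps, noise=0.1, rho=10.0):
    """Greedy sequential selection of data-point-specific noise variances:
    each step adds extra variance rho to the single point that most
    increases the GP log marginal likelihood."""
    n = len(y)
    base = rbf(X) + noise * np.eye(n)
    extra = np.zeros(n)
    selected = []
    for _ in range(num_steps):
        gains = np.full(n, -np.inf)
        for i in range(n):
            if i not in selected:
                trial = extra.copy()
                trial[i] = rho
                gains[i] = log_marginal_likelihood(base + np.diag(trial), y)
        best = int(np.argmax(gains))
        selected.append(best)
        extra[best] = rho
    return selected

rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 25)
y = np.sin(2 * np.pi * X) + 0.02 * rng.normal(size=25)
y[5] += 6.0                     # sparse label corruptions
y[17] -= 6.0
outliers = relevance_pursuit(X, y, num_steps=2)
print(sorted(outliers))         # the two corrupted points should be flagged
```

This brute-force variant re-evaluates the marginal likelihood for every candidate at every step; the paper's parameterization and concavity result are what make the real procedure efficient and give it approximation guarantees.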