Neural Information Processing Systems
Multivariate Triangular Quantile Maps for Novelty Detection
Jingjing Wang (University of Waterloo), Sun Sun
Novelty detection, a fundamental task in machine learning, has drawn a lot of recent attention due to its wide-ranging applications and the rise of neural approaches. In this work, we present a general framework for neural novelty detection that centers around a multivariate extension of the univariate quantile function. Our framework unifies and extends many classical and recent novelty detection algorithms, and opens the way to exploit recent advances in flow-based neural density estimation. We adapt the multiple gradient descent algorithm to obtain the first efficient end-to-end implementation of our framework that is free of tuning hyperparameters. Extensive experiments over a number of real datasets confirm the efficacy of our proposed method against state-of-the-art alternatives.
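The multiple gradient descent algorithm (MGDA) mentioned above balances competing training objectives without a tuning weight. Below is a minimal sketch of the two-objective case with its closed-form weighting; the toy objectives and function names are illustrative, not the paper's implementation.

```python
import numpy as np

def mgda_two_task_direction(g1, g2):
    """Closed-form MGDA weighting for two gradients.

    Finds alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2,
    so the combined direction decreases both objectives when possible.
    """
    diff = g1 - g2
    denom = np.dot(diff, diff)
    if denom < 1e-12:                      # gradients (nearly) identical
        alpha = 0.5
    else:
        alpha = float(np.clip(np.dot(g2 - g1, g2) / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2

# Toy usage: two quadratic objectives sharing the same parameters.
theta = np.array([1.0, -2.0])
g1 = 2 * (theta - np.array([0.0, 0.0]))    # grad of ||theta||^2
g2 = 2 * (theta - np.array([1.0, 1.0]))    # grad of ||theta - (1, 1)||^2
theta -= 0.1 * mgda_two_task_direction(g1, g2)
```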
Sampling Sketches for Concave Sublinear Functions of Frequencies
We consider massive distributed datasets that consist of elements modeled as key-value pairs and the task of computing statistics or aggregates where the contribution of each key is weighted by a function of its frequency (the sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning, in particular with concave sublinear functions of the frequencies, which mitigate the disproportionate effect of keys with high frequency. The family of concave sublinear functions includes low frequency moments (p ≤ 1), capping, logarithms, and their compositions. A common approach is to sample keys, ideally proportionally to their contributions, and estimate statistics from the sample. A simple but costly way to do this is by aggregating the data to produce a table of keys and their frequencies, applying our function to the frequency values, and then applying a weighted sampling scheme. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies. Our sketch structure size is very close to the desired sample size, and our samples provide statistical guarantees on the estimation quality that are very close to those of an ideal sample of the same size computed over aggregated data. Finally, we demonstrate experimentally the simplicity and effectiveness of our methods.
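The "simple but costly" baseline described above is concrete enough to sketch: aggregate the data to (key, frequency), transform the frequencies by a concave sublinear function, then draw a weighted without-replacement sample. The sketch below uses exponential keys (bottom-k / PPSWOR-style sampling); the function names and transforms are illustrative, and this is the costly aggregated baseline, not the paper's composable sketch.

```python
import math
import random
from collections import defaultdict

def aggregate(elements):
    """Sum values per key; elements is an iterable of (key, value) pairs."""
    freq = defaultdict(float)
    for key, value in elements:
        freq[key] += value
    return freq

def weighted_sample(freq, f, k, seed=0):
    """Bottom-k sample with weights w = f(frequency).

    Each key draws an Exp(rate=w) seed; keeping the k smallest seeds yields
    a without-replacement sample whose inclusion probability grows with w.
    """
    rng = random.Random(seed)
    seeded = []
    for key, count in freq.items():
        w = f(count)
        if w > 0:
            seeded.append((rng.expovariate(w), key, w))
    seeded.sort()
    return seeded[:k]

# Concave sublinear transforms of the frequency.
cap_at_5 = lambda c: min(c, 5.0)          # capping
log_freq = lambda c: math.log1p(c)        # logarithm
sqrt_freq = lambda c: math.sqrt(c)        # frequency moment p = 1/2

data = [("a", 1.0)] * 100 + [("b", 1.0)] * 3 + [("c", 1.0)]
sample = weighted_sample(aggregate(data), sqrt_freq, k=2)
```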
Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses
In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme, based on a sequence of problems with decreasing regularization parameters, is provably globally convergent, and that this convergence is linear with a constant factor that scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with improved behavior, in particular on ill-conditioned problems.
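The scheme described above, a sequence of regularized problems with geometrically decreasing regularization and warm starts, can be sketched for plain ℓ2-regularized logistic regression. This is a toy implementation under those assumptions, not the paper's algorithm (which targets large-scale problems and uses more careful step and schedule choices).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logreg(X, y, lam, w0, steps=20, tol=1e-8):
    """Plain Newton steps on the lam-regularized logistic loss (y in {0,1})."""
    w = w0.copy()
    n, d = X.shape
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + lam * w
        H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)
        step = np.linalg.solve(H, grad)
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w

def regularization_path_newton(X, y, lam_init=1.0, lam_target=1e-6, shrink=0.5):
    """Solve a sequence of problems with decreasing regularization,
    warm-starting each stage from the previous solution."""
    w = np.zeros(X.shape[1])
    lam = lam_init
    while lam > lam_target:
        w = newton_logreg(X, y, lam, w)
        lam *= shrink
    return newton_logreg(X, y, lam_target, w)
```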
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on "A → B" (e.g., Tom is the parent of John), the LLM fails to directly conclude "B ← A" (e.g., John is the child of Tom) during inference, even if the two sentences are semantically identical, which is known as the "reversal curse". In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers under certain assumptions. Our analysis reveals that for both models, the reversal curse is a consequence of the asymmetry of the (effective) model weights, i.e., an increase of the weights from a token A to a token B during training does not necessarily cause an increase of the weights from B to A; this asymmetry is caused by the training dynamics under certain choices of loss function and the optimization space of model parameters. Moreover, our analysis can be naturally applied to other logical reasoning tasks such as chain-of-thought (CoT), which provides a new perspective different from previous work that focuses on expressivity. Finally, we conduct experiments to validate our theory on multi-layer transformers under different settings.
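The weight-asymmetry phenomenon described above can be illustrated with a toy bilinear model using one-hot embeddings, where the logit for predicting token y after token x is simply W[x, y]. This is an illustrative sketch of the asymmetry argument, not the paper's exact model or training setup.

```python
import numpy as np

V = 10                      # vocabulary size
W = np.zeros((V, V))        # bilinear weights: logit(y | x) = W[x, y]
pairs = [(2, 7), (3, 5)]    # training data: only forward pairs "A -> B"

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.5
for _ in range(500):
    for a, b in pairs:
        p = softmax(W[a])          # predictive distribution for the next token
        grad = p.copy()
        grad[b] -= 1.0             # cross-entropy gradient w.r.t. the logits
        W[a] -= lr * grad          # only row 'a' is ever updated

# Forward direction is learned; the reverse direction is not:
print(W[2, 7])   # grows large and positive during training
print(W[7, 2])   # stays exactly 0: training on A -> B never touches W[B, A]
```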
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
Language models (LMs) pretrained only on a general and massive corpus usually cannot attain satisfactory performance on domain-specific downstream tasks, and hence applying domain-specific pretraining to LMs is a common and indispensable practice. However, domain-specific pretraining can be costly and time-consuming, hindering LMs' deployment in real-world applications. In this work, we consider the inability to memorize domain-specific knowledge embedded in the general corpus, which has rare occurrences and a "long-tail" distribution, as the leading cause of pretrained LMs' inferior downstream performance. Analysis of Neural Tangent Kernels (NTKs) reveals that those long-tail data are commonly overlooked in the model's gradient updates and, consequently, are not effectively memorized, leading to poor domain-specific downstream performance. Based on the intuition that data with similar semantic meaning are closer in the embedding space, we devise a Cluster-guided Sparse Expert (CSE) layer to actively learn long-tail domain knowledge typically neglected in previous pretrained LMs. During pretraining, a CSE layer efficiently clusters domain knowledge together and assigns long-tail knowledge to designated extra experts. CSE is also a lightweight structure that only needs to be incorporated in several deep layers. With our training strategy, we found that during pretraining, data carrying long-tail knowledge gradually form isolated "outlier" clusters in an LM's representation space, especially in deeper layers.
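A rough sketch of cluster-guided routing in the spirit of the CSE layer described above: each token is assigned to the expert whose centroid is nearest in the embedding space. The module layout, hyperparameters, and hard nearest-centroid assignment are assumptions for illustration, not the paper's implementation; in practice the centroids would need their own update rule (e.g., running means of assigned tokens), since the argmin below passes them no gradient.

```python
import torch
import torch.nn as nn

class ClusterGuidedSparseExperts(nn.Module):
    """Route each token to one expert chosen by its nearest cluster centroid."""

    def __init__(self, d_model, n_experts, d_hidden):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_experts, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                           # x: (batch, seq, d_model)
        flat = x.reshape(-1, x.shape[-1])
        dists = torch.cdist(flat, self.centroids)   # (tokens, experts)
        assign = dists.argmin(dim=-1)                # hard cluster assignment
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            mask = assign == e
            if mask.any():
                out[mask] = expert(flat[mask])       # sparse: one expert/token
        return out.reshape_as(x)

# Usage: drop into a few deep Transformer blocks in place of the FFN.
layer = ClusterGuidedSparseExperts(d_model=256, n_experts=8, d_hidden=512)
y = layer(torch.randn(2, 16, 256))
```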
We would like to thank the reviewers for their comments and suggestions. As shown in Figure 7 in Appendix F.3, this is likely to increase their individual utility in the long term; we will clarify this in the revised version of the paper. We will also fix the statement of Proposition 4. The "strategic setting" refers to a scenario in which individuals are subject to (semi-)automated decision policies π(x) that maximize the decision maker's utility rather than the individuals' best interest. A counterfactual is a statement of how the world would have to be different for a desirable outcome to occur [13]; we will clarify this in the revised version of the paper.
Differentially Private Reinforcement Learning with Self-Play
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints. This is well motivated by various real-world applications involving sensitive data, where it is critical to protect users' private information. We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games, where both definitions ensure trajectory-wise privacy protection. Then we design a provably efficient algorithm based on optimistic Nash value iteration and privatization of Bernstein-type bonuses. The algorithm is able to satisfy JDP and LDP requirements when instantiated with appropriate privacy mechanisms. Furthermore, for both notions of DP, our regret bound generalizes the best known result for the single-agent RL case, and our regret also reduces to the best known result for multi-agent RL without privacy constraints. To the best of our knowledge, these are the first results towards understanding trajectory-wise privacy protection in multi-agent RL.
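In a minimal sketch, "privatization of Bernstein-type bonuses" amounts to releasing visitation statistics through a standard noise-adding mechanism and computing the exploration bonus from the noisy counts. The Laplace mechanism, the unit sensitivity, and the bonus constants below are illustrative assumptions, not the paper's mechanism or bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize_counts(counts, epsilon):
    """Release counts via the Laplace mechanism.

    Sensitivity 1 per counter is an illustrative assumption; the true
    sensitivity depends on how a single user's trajectory touches the counters.
    """
    noise = rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return np.maximum(counts + noise, 1.0)   # keep counts usable downstream

def bernstein_style_bonus(noisy_counts, var_estimate, H, log_term):
    """Bernstein-type exploration bonus computed from the noisy statistics."""
    return (np.sqrt(2.0 * var_estimate * log_term / noisy_counts)
            + 7.0 * H * log_term / (3.0 * noisy_counts))

counts = np.array([5.0, 40.0, 200.0])        # visits to (state, action) pairs
noisy = privatize_counts(counts, epsilon=1.0)
bonus = bernstein_style_bonus(noisy, var_estimate=1.0, H=10,
                              log_term=np.log(100))
```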
DeformableTST: Transformer for Time Series Forecasting without Over-reliance on Patching
With the introduction of the patching technique in time series forecasting, Transformer-based models have achieved compelling performance and gained great interest from the time series community. At the same time, we observe a new problem: recent Transformer-based models are overly reliant on patching to achieve ideal performance, which limits their applicability to forecasting tasks unsuitable for patching. In this paper, we aim to address this emerging issue. By examining the relationship between patching and full attention (the core mechanism in Transformer-based models), we find that the reason behind this issue is that full attention relies heavily on the guidance of patching to focus on the important time points and learn non-trivial temporal representations. Based on this finding, we propose DeformableTST as an effective solution. Specifically, we propose deformable attention, a sparse attention mechanism that can better focus on the important time points by itself, to eliminate the need for patching. We also adopt a hierarchical structure to alleviate the efficiency issue caused by the removal of patching. Experimentally, our DeformableTST achieves consistent state-of-the-art performance across a broader range of time series tasks, and in particular achieves promising performance in forecasting tasks unsuitable for patching, thereby reducing the reliance on patching and broadening the applicability of Transformer-based models.
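A minimal sketch of deformable (sparse) attention over a 1-D sequence: each query attends only to a few learned sampling locations rather than to all time points. The rounding-based gather and module shapes below are simplifying assumptions, not the DeformableTST implementation, which would typically interpolate at fractional offsets to keep them differentiable.

```python
import torch
import torch.nn as nn

class Deformable1DAttention(nn.Module):
    """Each query attends to only K sampled time points instead of all T."""

    def __init__(self, d_model, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offset_proj = nn.Linear(d_model, n_points)  # per-query offsets
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_proj = nn.Linear(d_model, 2 * d_model)

    def forward(self, x):                            # x: (B, T, d_model)
        B, T, D = x.shape
        q = self.q_proj(x)
        k, v = self.kv_proj(x).chunk(2, dim=-1)

        # Sampling positions = query index + predicted offsets.
        # (Rounding is a simplification; a differentiable version would
        # interpolate at the fractional positions instead.)
        base = torch.arange(T, device=x.device).view(1, T, 1).float()
        pos = (base + self.offset_proj(x)).round().clamp(0, T - 1).long()

        # Gather the K sampled keys/values for every query position.
        b_idx = torch.arange(B, device=x.device).view(B, 1, 1)
        k_s = k[b_idx, pos]                          # (B, T, K, D)
        v_s = v[b_idx, pos]

        # Scaled dot-product attention over the K sampled points only.
        scores = (q.unsqueeze(2) * k_s).sum(-1) / D ** 0.5   # (B, T, K)
        attn = scores.softmax(dim=-1)
        return (attn.unsqueeze(-1) * v_s).sum(dim=2)         # (B, T, D)

# Usage on an un-patched series of raw time points.
layer = Deformable1DAttention(d_model=64)
out = layer(torch.randn(8, 96, 64))
```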
SpiderBoost and Momentum: Faster Variance Reduction Algorithms
Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, Vahid Tarokh
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down convergence in practice, and it cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and which can be extended with a proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantees.
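Below is a schematic of the SPIDER-style recursive variance-reduced estimator with a constant stepsize and an optional proximal step for composite problems, in the spirit of SpiderBoost. The epoch length, stepsize, batch size, and toy problem are placeholders, not tuned values from the paper.

```python
import numpy as np

def spiderboost(grad_i, n, x0, step=0.05, batch=32, epoch_len=50,
                n_epochs=20, prox=None, seed=0):
    """SpiderBoost-style loop.

    A full gradient is computed at each epoch start; inner iterations use the
    recursive estimator v = g_B(x) - g_B(x_prev) + v with a constant stepsize.
    An optional prox step handles a nonsmooth regularizer (composite case).
    grad_i(x, idx) returns the average gradient over component indices idx.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    all_idx = np.arange(n)
    for _ in range(n_epochs):
        v = grad_i(x, all_idx)                      # full gradient restart
        for _ in range(epoch_len):
            x_prev = x.copy()
            x = x - step * v                        # constant-level stepsize
            if prox is not None:
                x = prox(x, step)                   # e.g., soft-thresholding
            idx = rng.integers(0, n, size=batch)    # fresh minibatch
            v = grad_i(x, idx) - grad_i(x_prev, idx) + v
    return x

# Toy use: least squares with component gradients averaged over idx.
A = np.random.default_rng(1).normal(size=(200, 5))
b = np.zeros(200)
grad = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = spiderboost(grad, n=200, x0=np.ones(5))
```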
Write, Execute, Assess: Program Synthesis with a REPL
Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, Armando Solar-Lezama
We present a neural program synthesis approach integrating components that write, execute, and assess code to navigate the search space of possible programs. We equip the search process with an interpreter, or read-eval-print loop (REPL), which immediately executes partially written programs, exposing their semantics. The REPL addresses a basic challenge of program synthesis: tiny changes in syntax can lead to huge changes in semantics. We train a pair of models: a policy that proposes the next piece of code to write, and a value function that assesses the prospects of the code written so far. At test time we combine these models with a Sequential Monte Carlo algorithm. We apply our approach to two domains: synthesizing text editing programs and inferring 2D and 3D graphics programs.
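A schematic of the search loop described above: particles of partially written programs are extended by a policy, executed in a REPL to expose their semantics, scored by a value function, and resampled. All component functions here are placeholders for learned models and a domain-specific interpreter, not the paper's code.

```python
import random

def smc_program_search(spec, propose, execute, value, n_particles=64,
                       n_steps=10, seed=0):
    """Sequential Monte Carlo over partial programs.

    propose(spec, program) -> a candidate next line of code (policy)
    execute(program)       -> REPL output for the partial program
    value(spec, output)    -> score in [0, 1] for how promising it looks
    """
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]        # empty programs
    for _ in range(n_steps):
        extended, weights = [], []
        for prog in particles:
            prog = prog + [propose(spec, prog)]          # policy proposal
            output = execute(prog)                       # run it in the REPL
            extended.append(prog)
            weights.append(max(value(spec, output), 1e-6))
        # Resample proportionally to the value estimates.
        particles = rng.choices(extended, weights=weights, k=n_particles)
    return max(particles, key=lambda p: value(spec, execute(p)))
```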