AITopics | Education

Solving Quantitative Reasoning Problems With Language Models

Neural Information Processing Systems | Apr-24-2026, 20:49:51 GMT

logic & formal reasoning, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report (0.70)
Instructional Material (0.46)

Industry: Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again

Neural Information Processing Systems | Apr-24-2026, 20:49:11 GMT

Knowledge Distillation (KD) aims at transferring the knowledge of a well-performing neural network (the teacher) to a weaker one (the student). A peculiar phenomenon is that a more accurate model doesn't necessarily teach better, and temperature adjustment cannot alleviate the capacity mismatch either. To explain this, we decompose the efficacy of KD into three parts: correct guidance, smooth regularization, and class discriminability. The last term describes the distinctness of wrong class probabilities that the teacher provides in KD. Complex teachers tend to be over-confident, and traditional temperature scaling limits the efficacy of class discriminability, resulting in less discriminative wrong class probabilities. Therefore, we propose Asymmetric Temperature Scaling (ATS), which separately applies a higher/lower temperature to the correct/wrong class. ATS enlarges the variance of wrong class probabilities in the teacher's label and makes the students grasp the absolute affinities of wrong classes to the target class as discriminatively as possible. Both theoretical analysis and extensive experimental results demonstrate the effectiveness of ATS.
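
The abstract's core idea, dividing the correct-class logit and the wrong-class logits by different temperatures before the softmax, can be illustrated with a small sketch. This is only an assumption-laden illustration based on the one-sentence description above: the function names, the example logits, and the temperature values are hypothetical, and the paper's exact formulation may differ.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def asymmetric_temperature_softmax(logits, target, tau_correct, tau_wrong):
    """Scale the target-class logit by tau_correct and all others by tau_wrong.

    Assumption: this is one plausible reading of "a higher/lower temperature
    to the correct/wrong class"; it is not taken from the paper's equations.
    """
    scaled = [
        z / (tau_correct if i == target else tau_wrong)
        for i, z in enumerate(logits)
    ]
    return softmax(scaled)

def wrong_class_variance(probs, target):
    """Variance of the probabilities assigned to the non-target classes."""
    wrong = [p for i, p in enumerate(probs) if i != target]
    mean = sum(wrong) / len(wrong)
    return sum((p - mean) ** 2 for p in wrong) / len(wrong)

# Hypothetical over-confident teacher logits; class 0 is the correct class.
teacher_logits = [10.0, 4.0, 2.0, 1.0]
target_class = 0

uniform_probs = asymmetric_temperature_softmax(teacher_logits, target_class, 4.0, 4.0)
ats_probs = asymmetric_temperature_softmax(teacher_logits, target_class, 4.0, 2.0)

uniform_var = wrong_class_variance(uniform_probs, target_class)
ats_var = wrong_class_variance(ats_probs, target_class)
```

With these made-up numbers, the lower wrong-class temperature spreads the wrong-class probabilities further apart than a single shared temperature does, which is the effect the abstract attributes to ATS.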

artificial intelligence, machine learning, student, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.67)

Genre: Research Report > New Finding (0.87)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again

Neural Information Processing Systems | Apr-24-2026, 20:49:07 GMT

Knowledge Distillation (KD) aims at transferring the knowledge of a well-performing neural network (the teacher) to a weaker one (the student). A peculiar phenomenon is that a more accurate model doesn't necessarily teach better, and temperature adjustment cannot alleviate the capacity mismatch either. To explain this, we decompose the efficacy of KD into three parts: correct guidance, smooth regularization, and class discriminability. The last term describes the distinctness of wrong class probabilities that the teacher provides in KD. Complex teachers tend to be over-confident, and traditional temperature scaling limits the efficacy of class discriminability, resulting in less discriminative wrong class probabilities. Therefore, we propose Asymmetric Temperature Scaling (ATS), which separately applies a higher/lower temperature to the correct/wrong class. ATS enlarges the variance of wrong class probabilities in the teacher's label and makes the students grasp the absolute affinities of wrong classes to the target class as discriminatively as possible. Both theoretical analysis and extensive experimental results demonstrate the effectiveness of ATS.

artificial intelligence, international conference, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.68)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

Neural Information Processing Systems | Apr-24-2026, 20:33:13 GMT

Learned optimizers--neural networks that are trained to act as optimizers--have the potential to dramatically accelerate training of machine learning models. However, even when meta-trained across thousands of tasks at huge computational expense, blackbox learned optimizers often struggle with stability and generalization when applied to tasks unlike those in their meta-training set. In this paper, we use tools from dynamical systems to investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackbox optimizers. Our investigation begins with a noisy quadratic model, where we characterize conditions in which optimization is stable, in terms of eigenvalues of the training dynamics. We then introduce simple modifications to a learned optimizer's architecture and meta-training procedure which lead to improved stability, and improve the optimizer's inductive bias. We apply the resulting learned optimizer to a variety of neural network training tasks, where it outperforms the current state of the art learned optimizer--at matched optimizer computational overhead--with regard to optimization performance and meta-training speed, and is capable of generalization to tasks far different from those it was meta-trained on.
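
To make the eigenvalue view of stability concrete, here is a minimal sketch for the simplest case: plain gradient descent on a noiseless quadratic with a diagonal Hessian. This is a textbook simplification chosen for illustration, not the paper's noisy quadratic analysis or its learned optimizer; the function names and the example eigenvalues are hypothetical. Each coordinate evolves as x ← (1 − lr·λ)·x, so the iteration is stable exactly when |1 − lr·λ| < 1 for every Hessian eigenvalue λ.

```python
def gd_is_stable(hessian_eigenvalues, learning_rate):
    """True if every eigenvalue of the update map (1 - lr * lambda) lies
    strictly inside the unit interval, i.e. the iterates contract."""
    return all(
        abs(1.0 - learning_rate * lam) < 1.0 for lam in hessian_eigenvalues
    )

def run_gd(hessian_eigenvalues, learning_rate, steps, start=1.0):
    """Run gradient descent on 0.5 * sum(lambda_i * x_i**2) and return the
    largest coordinate magnitude after the given number of steps."""
    x = [start] * len(hessian_eigenvalues)
    for _ in range(steps):
        # Gradient of the quadratic in coordinate i is lambda_i * x_i.
        x = [
            xi - learning_rate * lam * xi
            for lam, xi in zip(hessian_eigenvalues, x)
        ]
    return max(abs(xi) for xi in x)

# Hypothetical curvature spectrum: the largest eigenvalue sets the limit lr < 2 / 10.
eigenvalues = [1.0, 10.0]

stable_final = run_gd(eigenvalues, learning_rate=0.15, steps=50)
unstable_final = run_gd(eigenvalues, learning_rate=0.25, steps=50)
```

In this toy setting the step size 0.15 contracts along both directions, while 0.25 makes the high-curvature direction diverge, even though the low-curvature direction would still converge on its own.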

artificial intelligence, machine learning, optimizer, (15 more...)

Neural Information Processing Systems

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

1531beb762df4029513ebf9295e0d34f-Supplemental.pdf

Neural Information Processing Systems | Apr-24-2026, 20:16:43 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.29)

Genre: Research Report > New Finding (0.94)

Industry:

Education (0.68)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)

17a234c91f746d9625a75cf8a8731ee2-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 20:16:07 GMT

As the scope of machine learning broadens, we observe a recurring theme of algorithmic monoculture: the same systems, or systems that share components (e.g.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Banking & Finance (0.67)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

Neural Information Processing Systems | Apr-24-2026, 19:56:01 GMT

Self-supervised Learning (SSL), including the mainstream contrastive learning, has achieved great success in learning visual representations without data annotations. However, most methods focus mainly on instance-level information (i.e., the different augmented images of the same instance should have the same feature or cluster into the same class), and pay little attention to the relationships between different instances. In this paper, we introduce a novel SSL paradigm, which we term the relational self-supervised learning (ReSSL) framework, that learns representations by modeling the relationship between different instances. Specifically, our proposed method employs a sharpened distribution of pairwise similarities among different instances as the relation metric, which is then used to match the feature embeddings of different augmentations. Moreover, to boost performance, we argue that weak augmentations matter for representing a more reliable relation, and we leverage a momentum strategy for practical efficiency. Experimental results show that our proposed ReSSL significantly outperforms the previous state-of-the-art algorithms in terms of both performance and training efficiency.
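
The relation metric described above can be sketched as follows: compute the similarities between one view's embedding and a set of other instances, turn them into a distribution with a low (sharpening) temperature, and use that as the target for the distribution produced by the other view. This is a rough sketch under stated assumptions: the cosine similarity, the cross-entropy loss, the temperature values, and all names here are illustrative choices, not details confirmed by the abstract.

```python
import math

def softmax(values, temperature):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relation_distribution(embedding, other_instances, temperature):
    """Distribution over other instances induced by pairwise similarity."""
    sims = [cosine(embedding, other) for other in other_instances]
    return softmax(sims, temperature)

def relation_loss(target_emb, pred_emb, other_instances,
                  t_target=0.04, t_pred=0.1):
    """Cross-entropy between the sharpened target relation (lower
    temperature) and the predicted relation. Temperatures are assumptions."""
    target = relation_distribution(target_emb, other_instances, t_target)
    pred = relation_distribution(pred_emb, other_instances, t_pred)
    return -sum(p * math.log(q) for p, q in zip(target, pred))

# Hypothetical 2-D embeddings of three other instances.
other_instances = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weak_view = [0.9, 0.1]          # embedding of the weakly augmented view
aligned_view = [0.8, 0.2]       # another view that relates similarly
misaligned_view = [-0.9, 0.1]   # a view with a very different relation

aligned_loss = relation_loss(weak_view, aligned_view, other_instances)
misaligned_loss = relation_loss(weak_view, misaligned_view, other_instances)
```

In this toy example the loss is small when both views relate to the other instances in the same way and large when they do not, which is the matching behavior the abstract describes.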

artificial intelligence, inductive learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Learning Predictions for Algorithms with Predictions

Neural Information Processing Systems | Apr-24-2026, 19:35:24 GMT

A burgeoning paradigm in algorithm design is the field of algorithms with predictions, in which algorithms can take advantage of a possibly-imperfect prediction of some aspect of the problem. While much work has focused on using predictions to improve competitive ratios, running times, or other performance measures, less effort has been devoted to the question of how to obtain the predictions themselves, especially in the critical online setting. We introduce a general design approach for algorithms that learn predictors: (1) identify a functional dependence of the performance measure on the prediction quality and (2) apply techniques from online learning to learn predictors, tune robustness-consistency trade-offs, and bound the sample complexity. We demonstrate the effectiveness of our approach by applying it to bipartite matching, ski-rental, page migration, and job scheduling. In several settings we improve upon multiple existing results while utilizing a much simpler analysis, while in the others we provide the first learning-theoretic guarantees.
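
As a concrete picture of a robustness-consistency trade-off, the sketch below uses ski-rental, one of the problems the abstract names. It follows a commonly cited deterministic rule from the algorithms-with-predictions literature, not necessarily the algorithm or analysis of this paper: a parameter `lam` in (0, 1) controls how much the prediction is trusted. All names and numbers are illustrative assumptions.

```python
import math

def buy_day(buy_cost, predicted_days, lam):
    """Day on which to buy skis. If the prediction says the season is long,
    buy early (trusting it); otherwise delay buying well past break-even."""
    if predicted_days >= buy_cost:
        return math.ceil(lam * buy_cost)
    return math.ceil(buy_cost / lam)

def algorithm_cost(buy_cost, true_days, predicted_days, lam):
    """Total cost when renting costs 1 per day until the buy day."""
    day = buy_day(buy_cost, predicted_days, lam)
    if true_days < day:
        return true_days               # season ended before buying
    return (day - 1) + buy_cost        # rented day-1 days, then bought

def optimal_cost(buy_cost, true_days):
    """Cost of the offline optimum, which knows the season length."""
    return min(buy_cost, true_days)

buy_cost, lam = 10, 0.5

# Accurate prediction: the ratio stays within the consistency bound 1 + lam.
good_ratio = algorithm_cost(buy_cost, 20, 20, lam) / optimal_cost(buy_cost, 20)

# Misleading prediction: the ratio stays within the robustness bound 1 + 1/lam.
bad_ratio = algorithm_cost(buy_cost, 100, 1, lam) / optimal_cost(buy_cost, 100)
```

Smaller values of `lam` make the algorithm better when predictions are accurate and worse when they are not. The performance is thus an explicit function of prediction quality, the kind of dependence the abstract's step (1) asks the designer to identify before learning the predictor.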

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.93)

Industry: Education > Educational Setting > Online (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.36)

Preserved central model for faster bidirectional compression in distributed settings

Neural Information Processing Systems | Apr-24-2026, 19:14:10 GMT

We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression. To obtain this improvement, we design MCM, an algorithm such that the downlink compression only impacts local models, while the global model is preserved. As a result, and contrary to previous works, the gradients on local servers are computed on perturbed models. Consequently, convergence proofs are more challenging and require a precise control of this perturbation. To ensure it, MCM additionally combines model compression with a memory mechanism. This analysis opens new doors, e.g.

artificial intelligence, machine learning, wk 1, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.45)

Industry: Education (0.34)

Technology: