Educational Setting
MPNet: Masked and Permuted Pre-training for Language Understanding
Tao Qin
BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence and thus reduce the position discrepancy (vs. PLM in XLNet).
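To make the input construction concrete, below is a minimal Python sketch (an illustration only, not the authors' implementation: it omits two-stream attention and handles a single toy example) of how MPNet-style inputs could be formed: tokens are permuted, the tail of the permutation becomes the prediction targets, and mask placeholders carrying the targets' positions are appended so the model still sees the full position information of the sentence.

```python
import random

MASK = "[MASK]"

def mpnet_inputs(tokens, predict_ratio=0.15, seed=0):
    n = len(tokens)
    c = max(1, round(n * predict_ratio))         # number of tokens to predict
    perm = list(range(n))
    random.Random(seed).shuffle(perm)            # random factorization order
    non_pred, pred = perm[:n - c], perm[n - c:]  # targets go last, as in PLM

    # Content stream: observed tokens in permuted order, then mask placeholders.
    content = [tokens[i] for i in non_pred] + [MASK] * c
    # Position stream: all n positions are exposed (including those of the
    # masked targets), unlike XLNet's PLM, so the model "sees" a full sentence.
    positions = non_pred + pred
    targets = [(i, tokens[i]) for i in pred]     # (position, token) pairs to predict
    return content, positions, targets

content, positions, targets = mpnet_inputs(
    ["the", "cat", "sat", "on", "the", "mat"], predict_ratio=0.33)
print(content)    # e.g. ['mat', 'the', ..., '[MASK]', '[MASK]']
print(positions)  # a permutation covering all 6 positions
print(targets)    # the tokens to be predicted autoregressively
```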
Learning in Generalized Linear Contextual Bandits with Stochastic Delays
Zhengyuan Zhou, Renyuan Xu, Jose Blanchet
In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision maker only after some delay, which is unknown and stochastic, even though a decision must be made at each time step for an incoming set of contexts. We study the performance of upper confidence bound (UCB) based algorithms adapted to this delayed setting. In particular, we design a delay-adaptive algorithm, which we call Delayed UCB, for generalized linear contextual bandits using UCB-style exploration and establish regret bounds under various delay assumptions. In the important special case of linear contextual bandits, we further modify this algorithm and establish a tighter regret bound under the same delay assumptions. Our results contribute to the broad landscape of contextual bandits literature by establishing that UCB algorithms, which are widely deployed in modern recommendation engines, can be made robust to delays.
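The core mechanic is easy to sketch. Below is a minimal LinUCB-style simulation (an assumption-laden toy, not the paper's Delayed UCB algorithm or its confidence widths) in which each reward arrives after a stochastic delay, and the ridge-regression statistics are updated only from rewards that have actually been delivered.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, alpha = 5, 4, 2000, 1.0
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)          # unknown true parameter

A = np.eye(d)        # ridge-regularized Gram matrix
b = np.zeros(d)      # reward-weighted feature sum
pending = []         # (arrival_time, features, reward) not yet delivered

for t in range(T):
    # Deliver rewards whose stochastic delay has elapsed; only these update the model.
    arrived = [(x, r) for (s, x, r) in pending if s <= t]
    pending = [(s, x, r) for (s, x, r) in pending if s > t]
    for x, r in arrived:
        A += np.outer(x, x)
        b += r * x

    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    contexts = rng.normal(size=(K, d)) / np.sqrt(d)   # incoming context set
    # UCB index: estimate plus exploration bonus, both from delivered data only.
    bonus = np.sqrt(np.einsum("kd,de,ke->k", contexts, A_inv, contexts))
    x = contexts[int(np.argmax(contexts @ theta_hat + alpha * bonus))]
    reward = float(x @ theta_star + 0.1 * rng.normal())
    delay = int(rng.geometric(0.1))                   # unknown stochastic delay
    pending.append((t + delay, x, reward))
```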
IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation
Fan Lin
As Large Language Models (LLMs) grow increasingly adept at managing complex tasks, the evaluation set must keep pace with these advancements to ensure it remains sufficiently discriminative. Item Discrimination (ID) theory, which is widely used in educational assessment, measures the ability of individual test items to differentiate between high and low performers. Inspired by this theory, we propose an ID-induced prompt synthesis framework for evaluating LLMs, so that the evaluation set can be continually updated and refined according to model abilities.
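The classical item-discrimination index that inspires the framework is simple to compute. The sketch below (toy data from a Rasch-like response model; the upper-lower 27% method is standard in educational assessment, not taken from the paper) scores each item by how much more often high scorers answer it correctly than low scorers.

```python
import numpy as np

def discrimination_index(responses, frac=0.27):
    """responses[i][j] = 1 if examinee i answered item j correctly."""
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)                # total score per examinee
    order = np.argsort(totals)
    k = max(1, int(len(order) * frac))
    low, high = order[:k], order[-k:]             # bottom and top 27%
    # D_j = proportion correct in the upper group minus the lower group.
    return responses[high].mean(axis=0) - responses[low].mean(axis=0)

rng = np.random.default_rng(1)
ability = rng.normal(size=200)
difficulty = rng.normal(size=10)
# Toy responses from a Rasch-like model: P(correct) = sigmoid(ability - difficulty).
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = rng.random((200, 10)) < p
print(np.round(discrimination_index(responses), 2))  # higher = more discriminative
```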
On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games
First-order methods (FOMs) are arguably the most scalable algorithms for equilibrium computation in large extensive-form games. To operationalize these methods, a distance-generating function, acting as a regularizer for the strategy space, must be chosen. The ratio between the strong convexity modulus and the diameter of the regularizer is a key parameter in the analysis of FOMs. A natural question is then: what is the optimal distance-generating function for extensive-form decision spaces? In this paper, we make a number of contributions, ultimately establishing that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors. The DilEnt regularizer is notable due to its iterate-equivalence with Kernelized OMWU (KOMWU)--the algorithm with state-of-the-art dependence on the game tree size in extensive-form games--when used in conjunction with the online mirror descent (OMD) algorithm.
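For concreteness, the weight-one dilated entropy is easy to evaluate on a toy decision problem. The sketch below (a two-infoset example of our own construction, not the paper's analysis) computes psi(x) = sum over infosets j and actions a of x[ja] * log(x[ja] / x[parent(j)]) on a sequence-form strategy.

```python
import math

# Toy treeplex: root infoset "j1" (empty parent sequence, reach 1.0) with
# actions a/b; playing a leads to infoset "j2" with actions c/d.
infosets = {
    "j1": {"parent": None, "actions": ["a", "b"]},
    "j2": {"parent": "a", "actions": ["c", "d"]},
}

def dilated_entropy(x):
    """Weight-one DilEnt of a sequence-form strategy x (dict: sequence -> prob)."""
    psi = 0.0
    for info in infosets.values():
        reach = 1.0 if info["parent"] is None else x[info["parent"]]
        for a in info["actions"]:
            if x[a] > 0:
                # Reach-weighted negative local entropy at this infoset.
                psi += x[a] * math.log(x[a] / reach)
    return psi

# Sequence-form strategy: behavioral probabilities multiplied down the tree.
x = {"a": 0.6, "b": 0.4, "c": 0.6 * 0.5, "d": 0.6 * 0.5}
print(dilated_entropy(x))  # most negative when play at every infoset is uniform
```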
On the Equivalence between Online and Private Learnability beyond Binary Classification
Alon et al. [4] and Bun et al. [10] recently showed that online learnability and private PAC learnability are equivalent in binary classification. We investigate whether this equivalence extends to multi-class classification and regression. First, we show that private learnability implies online learnability in both settings. Our extension involves studying a novel variant of the Littlestone dimension that depends on a tolerance parameter and on an appropriate generalization of the concept of threshold functions beyond binary classification. Second, we show that while online learnability continues to imply private learnability in multi-class classification, current proof techniques encounter significant hurdles in the regression setting. While the equivalence for regression remains open, we provide non-trivial sufficient conditions for an online learnable class to also be privately learnable.
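The (standard, binary) Littlestone dimension underlying these results can be computed exhaustively for small finite classes. The sketch below is an illustration only: it implements the usual mistake-tree recursion, not the tolerance-dependent variant introduced in the paper, and uses threshold functions as the example class.

```python
def ldim(H, X):
    """Littlestone dimension: depth of the deepest complete mistake tree for H."""
    if len(H) <= 1:
        return 0
    best = 0
    for x in X:
        H0 = [h for h in H if h[x] == 0]   # hypotheses predicting 0 on x
        H1 = [h for h in H if h[x] == 1]   # hypotheses predicting 1 on x
        if H0 and H1:                      # x forces a mistake either way
            best = max(best, 1 + min(ldim(H0, X), ldim(H1, X)))
    return best

# Thresholds on a 4-point domain: h_k(x) = 1 iff x >= k, for k = 0..4.
X = range(4)
H = [tuple(int(x >= k) for x in X) for k in range(5)]
print(ldim(H, X))  # 2: thresholds have Littlestone dimension ~ log2(domain size)
```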
GAN Memory with No Forgetting
As a fundamental issue in lifelong learning, catastrophic forgetting is directly caused by inaccessible historical data; accordingly, if the data (information) were memorized perfectly, no forgetting should be expected. Motivated by this, we propose a GAN memory for lifelong learning, which is capable of remembering a stream of datasets via generative processes, with no forgetting. Our GAN memory is based on recognizing that one can modulate the "style" of a GAN model to form perceptually-distant targeted generation. Accordingly, we propose to perform sequential style modulations atop a well-behaved base GAN model, forming sequential targeted generative models while simultaneously benefiting from the transferred base knowledge. The GAN memory, though motivated by lifelong learning, is therefore itself manifested through a form of lifelong learning, via forward transfer and modulation of information from prior tasks. Experiments demonstrate the superiority of our method over existing approaches and its effectiveness in alleviating catastrophic forgetting for lifelong classification problems.
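The style-modulation idea can be sketched in a few lines. Below is a minimal PyTorch illustration (a FiLM-like per-task scale and shift of our own choosing, not the paper's exact modulation) of the key property: the base generator's weights stay frozen, and each new task adds only lightweight style parameters, so earlier tasks cannot be overwritten.

```python
import torch
import torch.nn as nn

class StyleModulatedLinear(nn.Module):
    """Frozen base layer plus lightweight per-task scale/shift 'style' parameters."""
    def __init__(self, base: nn.Linear, num_tasks: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base knowledge is never overwritten
        out = base.out_features
        self.gamma = nn.Parameter(torch.ones(num_tasks, out))   # per-task scale
        self.beta = nn.Parameter(torch.zeros(num_tasks, out))   # per-task shift

    def forward(self, x, task: int):
        return self.gamma[task] * self.base(x) + self.beta[task]

layer = StyleModulatedLinear(nn.Linear(64, 128), num_tasks=5)
z = torch.randn(8, 64)
out_task0 = layer(z, task=0)   # same frozen base, different "style" per task,
out_task3 = layer(z, task=3)   # so training task 3 cannot make task 0 forget
```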
SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models
Large language models (LLMs) are considered a crucial technology for advancing intelligent education, since they exhibit the potential for an in-depth understanding of teaching scenarios and for providing students with personalized guidance. Nonetheless, current LLM-based applications in personalized teaching predominantly follow a "Question-Answering" paradigm, where students are passively provided with answers and explanations. In this paper, we propose SocraticLM, which achieves a Socratic "Thought-Provoking" teaching paradigm that fulfills the role of a real classroom teacher in actively engaging students in the thought process required for genuine problem-solving mastery. To build SocraticLM, we first propose a novel "Dean-Teacher-Student" multi-agent pipeline to construct a new dataset, SocraTeach, which contains 35K meticulously crafted Socratic-style multi-round (equivalent to 208K single-round) teaching dialogues grounded in fundamental mathematical problems. Our dataset simulates authentic teaching scenarios, interacting with six representative types of simulated students with different cognitive states and strengthening four crucial teaching abilities. SocraticLM is then fine-tuned on SocraTeach with three strategies balancing its teaching and reasoning abilities. Moreover, we contribute a comprehensive evaluation system encompassing five pedagogical dimensions for assessing the teaching quality of LLMs. Extensive experiments verify that SocraticLM achieves significant improvements in teaching performance, outperforming GPT-4 by more than 12%.
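A rough shape of such a multi-agent pipeline is sketched below. Everything here is hypothetical: the chat stub, prompts, and control flow are our own placeholders rather than the paper's implementation; the point is only the role structure, with a teacher agent posing Socratic questions, a dean agent auditing each turn, and a simulated student replying.

```python
def chat(role_prompt: str, history: list[str]) -> str:
    # Hypothetical LLM call: swap in any chat-completion client. Canned replies
    # are returned here only so the sketch runs end-to-end.
    if "dean" in role_prompt:
        return "OK"
    if "teacher" in role_prompt:
        return "What quantity does the problem ask you to find?"
    return "I think I need to find the total cost first."

def socratic_round(problem: str, student_profile: str, max_turns: int = 6):
    history = [f"Problem: {problem}"]
    for _ in range(max_turns):
        question = chat("You are a Socratic teacher: ask one guiding question; "
                        "never reveal the answer.", history)
        verdict = chat("You are a dean auditing teaching quality: reply OK "
                       "or REJECT with a reason.", history + [question])
        if not verdict.startswith("OK"):
            continue                       # discard and regenerate the teacher turn
        history.append(question)
        history.append(chat(f"You are a student: {student_profile}", history))
    return history

dialogue = socratic_round("A pen costs $2 and a book costs $5. What do 3 of each cost?",
                          "struggles to set up multi-step arithmetic")
```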
Understanding and Minimising Outlier Features in Transformer Training
Outlier Features (OFs) are neurons whose activation magnitudes significantly exceed the average over a neural network's (NN) width. They are well known to emerge during standard transformer training and have the undesirable effect of hindering quantisation in afflicted models. Despite their practical importance, little is known about why OFs emerge during training or how one can minimise them. Our work focuses on these questions, first identifying several quantitative metrics, such as the kurtosis over neuron activation norms, to measure OFs. With these metrics, we study how architectural and optimisation choices influence OFs, and provide practical insights to minimise OFs during training. As highlights, we introduce a novel unnormalised transformer block, the Outlier Protected block, and present a previously unknown benefit of non-diagonal preconditioning optimisers, finding both approaches to significantly reduce OFs and improve quantisation without compromising convergence speed, at scales of up to 7B parameters. Notably, our combination of the OP block and a non-diagonal preconditioner (SOAP) achieves 14.87 weight-and-activation int8 perplexity (from 14.71 in standard precision), compared to 63.4 int8 perplexity (from 16.00) with a default OF-prone combination of Pre-Norm model and Adam, when quantising OPT-125m models post-training.
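The first metric named above is straightforward to compute. The sketch below (synthetic activations, not a real model) measures the kurtosis of per-neuron activation norms; outlier features show up as a heavy right tail, pushing the kurtosis far above the Gaussian reference value of 3.

```python
import numpy as np

def activation_kurtosis(acts):
    """acts: (num_tokens, width). Kurtosis of per-neuron RMS activation norms."""
    norms = np.sqrt((acts ** 2).mean(axis=0))     # one RMS norm per neuron
    z = (norms - norms.mean()) / norms.std()
    return (z ** 4).mean()                        # Pearson kurtosis; Gaussian = 3

rng = np.random.default_rng(0)
healthy = rng.normal(size=(512, 768))             # tokens x width
afflicted = healthy.copy()
afflicted[:, :4] *= 30                            # four "outlier feature" neurons
print(activation_kurtosis(healthy))               # close to 3
print(activation_kurtosis(afflicted))             # far above 3: OFs present
```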