AITopics | Education

We consider adversarial multi-armed bandit problems where the learner is allowed to observe losses of a number of arms beside the arm that it actually chose. We study the case where all non-chosen arms reveal their loss with a fixed but unknown probability $r$, independently of each other and the action of the learner. We propose two algorithms that work for different ranges of $r$. We show that after $T$ rounds in a bandit problem with $N$ arms, the expected regret of our first algorithm is $O(\sqrt{(T /r) \log N })$ whenever $r\ge(\log T)/(2N)$, while our second algorithm achieves a regret of $O(\sqrt{(T/r) \log (N+T)})$ for smaller values of $r$. We also give a quick estimation procedure that decides the range of~$r$. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know~$r$.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2604.25271

Country: Europe (0.68)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Spectral bandits

Kocák, Tomáš, Munos, Rémi, Kveton, Branislav, Agrawal, Shipra, Valko, Michal

arXiv.org Machine LearningApr-29-2026

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph and its expected rating is similar to the one of its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret with respect to the optimal policy would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2604.25272

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Add feedback

Supplementary Materials for the Paper " L2T-DLN: Learning to Teach with Dynamic Loss Network "

Neural Information Processing SystemsApr-28-2026, 21:43:01 GMT

In this supplementary material, we provide the proofs of convergence analysis in Section 1, 1-vs-1 transformation employed in the classification and semantic segmentation tasks in Section 2, the coordinate-wise and the preprocessing method of the LSTM teacher in Section 3, the loss functions of YOLO-v3 in Section 4, more experiments of image classification in Section 5, and the inferences of semantic segmentation in Section 6. A differentiable function e()is L-smooth with gradient Lipschitz constant C (uniformly Lipschitz continuous), if e(x) e(y) C x y, x,y. The function is called block-wise smooth with gradient Lipschitz Ci, if i e(x i,xi) ie(x i,x i) Ci xi x i, x,x (1) or with gradient Lipschitz constants { Ci}, if i e(x i,xi) ie(x i,xi) Ci x i x i, x,x (2) Further, Let Gmax max{Ci, Ci, k} C. Definition 2. For a differentiable function e(), if e(x) = 0, then x is a first-order stationary solution (SS1). For a differentiable function e(), if x is a SS1, and there exists ϵ > 0 so that for any y in the ϵ-neighborhood of x, we have e(x) e(y), then xis a local minimum. A saddle point xis an SS1 that is not a local minimum. If λmin( 2e(x)) < 0, x is a strict (non-degenerate) saddle point.

artificial intelligence, machine learning, student model, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Education (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

L2T-DLN: Learning to Teach with Dynamic Loss Network

Neural Information Processing SystemsApr-28-2026, 21:42:58 GMT

With the concept of teaching being introduced to the machine learning community, a teacher model start using dynamic loss functions to teach the training of a student model. The dynamic intends to set adaptive loss functions to different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregards the experience of the teacher; 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units, and, therefore, enables the student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, objective detection, and semantic segmentation scenarios.

artificial intelligence, loss function, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
Asia > China (0.14)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

8393d955a00c463a982cefe77d0404e1-Paper-Conference.pdf

Neural Information Processing SystemsApr-28-2026, 20:48:10 GMT

artificial intelligence, encoder, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Sample Complexity of Forecast Aggregation

Neural Information Processing SystemsApr-28-2026, 14:13:10 GMT

We consider a Bayesian forecast aggregation model where nexperts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal, who then aggregates the reports into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but the principal has access to i.i.d. "samples" from the distribution, where each sample is a tuple of the experts' reports (not signals) and the realization of the event. Using these samples, the principal aims to find an ε-approximately optimal aggregator, where optimality is measured in terms of the expected squared distance between the aggregated prediction and the realization of the event. We show that the sample complexity of this problem is at least Ω(mn 2/ε) for arbitrary discrete distributions, where m is the size of each expert's signal space. This sample complexity grows exponentially in the number of experts n. But, if the experts' signals are independent conditioned on the realization of the event, then the sample complexity is significantly reduced, to O(1/ε2), which does not depend on n. Our results can be generalized to non-binary events. The proof of our results uses a reduction from the distribution learning problem and reveals the fact that forecast aggregation is almost as difficult as distribution learning.

artificial intelligence, bayesian inference, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.67)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

6d7c4a0727e089ed6cdd3151cbe8d8ba-Paper-Conference.pdf

Neural Information Processing SystemsApr-28-2026, 12:56:16 GMT

machine learning, natural language, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.68)
Instructional Material (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

fbd8e65962da06f83f3f28b52774ffd0-Paper-Conference.pdf

Neural Information Processing SystemsApr-28-2026, 11:03:30 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Industry:

Education (0.47)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Add feedback

6a42b45af2b72e6e5b5e3a6fe695809f-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-28-2026, 11:02:43 GMT

The model can easily distinguish A and B according to the background (i.e., the so-called geometric skews [26]), but not according to the features of the class instance itself. However, if there is another class C, which is also in black background. In this tri-classification task (distinguishing A,B, and C), an ideal model should focus on the feature of the instance itself but not the background. This is one of the difficulties: distribution bias on samples, that some beneficial features (e.g., background) may be good for the classification, but not good for understanding the class (in a compositional way). Another difficulty is entanglement of the labels. We provide the labels in a relative way that the label of A is '0' and of B is '1', but not their true textual meanings (e.g., white paper and green leaves). The concept information is entangled and embedded into the label, thus, it is hard for the model to tell which visual features capture the corresponding concepts (i.e., white refers to the color feature and paper refers to the texture feature). We hope our understanding of this issue can inspire researchers to focus more on compositionality and design excellent continual learners.

artificial intelligence, helmet, machine learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback