AITopics | Instructional Material

Bandit Learning with Implicit Feedback Yi Qi

Neural Information Processing SystemsMay-26-2025, 10:18:45 GMT

Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence on users' evaluation of system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our upper regret bound analysis of the proposed algorithm proves its feasibility of learning from implicit feedback in a bandit setting; and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > China (0.14)
North America > Canada (0.14)

Genre: Instructional Material > Online (0.49)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.67)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Anytime-Competitive Reinforcement Learning with Policy Prior

Neural Information Processing SystemsMay-25-2025, 16:46:57 GMT

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbonintelligent computing verify the reward performance and cost constraint guarantee of ACRL.

constraint, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre:

Research Report (0.66)
Overview (0.48)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable (0.67)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Add feedback

Anytime-Competitive Reinforcement Learning with Policy Prior

Neural Information Processing SystemsMay-25-2025, 16:46:53 GMT

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbonintelligent computing verify the reward performance and cost constraint guarantee of ACRL.

constraint, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre:

Research Report (0.66)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Energy > Power Industry (1.00)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Add feedback

TradeMaster Appendix

Neural Information Processing SystemsMay-25-2025, 09:57:02 GMT

Is there a label or target associated with each instance? No, there is no label or target associated with each instance as our focus is not supervised learning settings. Is any information missing from individual instances? Yes, it is common to have missing values in financial datasets. We provide scripts to preprocess and conduct data imputation with diffusion models [26]. Are relationships between individual instances made explicit?

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Instructional Material (0.68)

Industry:

Banking & Finance > Trading (1.00)
Information Technology (0.93)
Law (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

6d7c4a0727e089ed6cdd3151cbe8d8ba-Paper-Conference.pdf

Neural Information Processing SystemsMay-24-2025, 11:48:17 GMT

machine learning, natural language, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.68)
Instructional Material (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Robust Mean Estimation Without Moments for Symmetric Distributions David Steurer Department of Computer Science Department of Computer Science ETH Zurich

Neural Information Processing SystemsMay-24-2025, 10:42:03 GMT

We study the problem of robustly estimating the mean or location parameter without moment assumptions. Known computationally efficient algorithms rely on strong distributional assumptions, such as sub-Gaussianity, or (certifiably) bounded moments. Moreover, the guarantees that they achieve in the heavy-tailed setting are weaker than those for sub-Gaussian distributions with known covariance. In this work, we show that such a tradeoff, between error guarantees and heavy-tails, is not necessary for symmetric distributions. We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently. The distributions we study include products of arbitrary symmetric one-dimensional distributions, such as product Cauchy distributions, as well as elliptical distributions, a vast generalization of the Gaussian distribution.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.40)

Genre: Instructional Material (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

612a56f193d031687683445cd0001083-Supplemental-Conference.pdf

Neural Information Processing SystemsMay-24-2025, 02:02:32 GMT

artificial intelligence, machine learning, vir, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)
North America > United States > Maryland (0.14)
North America > United States > Arizona (0.14)

Genre:

Research Report (0.67)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

5951641ad71b0052cf776f9b71f18932-Paper-Conference.pdf

Neural Information Processing SystemsMay-23-2025, 19:52:27 GMT

data mining, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (0.93)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Education > Educational Setting > Online (0.47)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Neural Information Processing SystemsMay-23-2025, 06:31:24 GMT

The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-ofthe-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including indepth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: