AITopics | logarithmic loss

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

Neural Information Processing SystemsMar-22-2026, 16:37:14 GMT

Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach to IL, behavior cloning (BC) is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner's access to the expert. We revisit the apparent gap between offline and online IL from a learning-theoretic perspective, with a focus on general policy classes up to and including deep neural networks. Through a new analysis of BC with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. Specializing our results to deterministic, stationary policies, we show that the gap between offline and online IL is not fundamental: (i) it is possible to achieve linear dependence on horizon in offline IL under dense rewards (matching what was previously only known to be achievable in online IL); and (ii) without further assumptions on the policy class, online IL cannot improve over offline IL with the logarithmic loss, even in benign MDPs. We complement our theoretical results with experiments on standard RL tasks and autoregressive language generation to validate the practical relevance of our findings.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Add feedback

Gradient Information for Representation and Modeling

Jie Ding, Robert Calderbank, Vahid Tarokh

Neural Information Processing SystemsFeb-12-2026, 12:46:11 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, information, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Hawaii (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

9d684c589d67031a627ad33d59db65e5-Paper.pdf

Neural Information Processing SystemsNov-15-2025, 07:02:46 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

Gradient Information for Representation and Modeling

Jie Ding, Robert Calderbank, Vahid Tarokh

Neural Information Processing SystemsOct-2-2025, 23:01:54 GMT

Some commonly used objective loss function such as cross-entropy loss for classification and squared loss for regression can be regarded as special cases of the logarithmic loss function.

artificial intelligence, information, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm Changlong Wu

Neural Information Processing SystemsAug-17-2025, 14:29:26 GMT

To derive lower bounds, we use tools from information theory (e.g., Shtarkov sum),

artificial intelligence, information theory, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Indiana (0.04)
North America > United States > Hawaii (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Add feedback

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Neural Information Processing SystemsAug-16-2025, 09:42:04 GMT

Contextual bandits encompass both the general problem of statistical learning with function approximation (specifically, cost-sensitive classification) and the classical multi-armed bandit problem, yet present algorithmic challenges greater than the sum of both parts.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (0.67)
Workflow (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

9d684c589d67031a627ad33d59db65e5-Paper.pdf

Neural Information Processing SystemsAug-16-2025, 09:42:01 GMT

algorithm, contextual bandit, first-order regret, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

Neural Information Processing SystemsMay-27-2025, 19:02:21 GMT

Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach to IL, behavior cloning (BC) is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner's access to the expert. We revisit the apparent gap between offline and online IL from a learning-theoretic perspective, with a focus on general policy classes up to and including deep neural networks. Through a new analysis of BC with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. Specializing our results to deterministic, stationary policies, we show that the gap between offline and online IL is not fundamental: (i) it is possible to achieve linear dependence on horizon in offline IL under dense rewards (matching what was previously only known to be achievable in online IL); and (ii) without further assumptions on the policy class, online IL cannot improve over offline IL with the logarithmic loss, even in benign MDPs.

behavior cloning, imitation learning, policy class, (5 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Superior Scoring Rules for Probabilistic Evaluation of Single-Label Multi-Class Classification Tasks

Ahmadian, Rouhollah, Ghatee, Mehdi, Wahlström, Johan

arXiv.org Artificial IntelligenceJul-24-2024

This study introduces novel superior scoring rules called Penalized Brier Score (PBS) and Penalized Logarithmic Loss (PLL) to improve model evaluation for probabilistic classification. Traditional scoring rules like Brier Score and Logarithmic Loss sometimes assign better scores to misclassifications in comparison with correct classifications. This discrepancy from the actual preference for rewarding correct classifications can lead to suboptimal model selection. By integrating penalties for misclassifications, PBS and PLL modify traditional proper scoring rules to consistently assign better scores to correct predictions. Formal proofs demonstrate that PBS and PLL satisfy strictly proper scoring rule properties while also preferentially rewarding accurate classifications. Experiments showcase the benefits of using PBS and PLL for model selection, model checkpointing, and early stopping. PBS exhibits a higher negative correlation with the F1 score compared to the Brier Score during training. Thus, PBS more effectively identifies optimal checkpoints and early stopping points, leading to improved F1 scores. Comparative analysis verifies models selected by PBS and PLL achieve superior F1 scores. Therefore, PBS and PLL address the gap between uncertainty quantification and accuracy maximization by encapsulating both proper scoring principles and explicit preference for true classifications. The proposed metrics can enhance model evaluation and selection for reliable probabilistic classification.

brier score, logarithmic loss, prediction, (15 more...)

arXiv.org Artificial Intelligence

2407.17697

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Europe > Portugal > Porto > Porto (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Add feedback

Online Learning Quantum States with the Logarithmic Loss via VB-FTRL

Tseng, Wei-Fu, Chen, Kai-Chun, Xiao, Zi-Hong, Li, Yen-Huan

arXiv.org Machine LearningNov-6-2023

Online learning quantum states with the logarithmic loss (LL-OLQS) is a quantum generalization of online portfolio selection, a classic open problem in the field of online learning for over three decades. The problem also emerges in designing randomized optimization algorithms for maximum-likelihood quantum state tomography. Recently, Jezequel et al. (arXiv:2209.13932) proposed the VB-FTRL algorithm, the first nearly regret-optimal algorithm for OPS with moderate computational complexity. In this note, we generalize VB-FTRL for LL-OLQS. Let $d$ denote the dimension and $T$ the number of rounds. The generalized algorithm achieves a regret rate of $O ( d^2 \log ( d + T ) )$ for LL-OLQS. Each iteration of the algorithm consists of solving a semidefinite program that can be implemented in polynomial time by, e.g., cutting-plane methods. For comparison, the best-known regret rate for LL-OLQS is currently $O ( d^2 \log T )$, achieved by the exponential weight method. However, there is no explicit implementation available for the exponential weight method for LL-OLQS. To facilitate the generalization, we introduce the notion of VB-convexity. VB-convexity is a sufficient condition for the logarithmic barrier associated with any function to be convex and is of independent interest.

artificial intelligence, logarithmic loss, online learning quantum state, (1 more...)

arXiv.org Machine Learning

2311.04237

Genre: Research Report (0.69)

Industry: Education > Educational Setting > Online (0.80)

Technology: