
Convergence for Discrete Parameter Update Schemes

Wilson, Paul, Zanasi, Fabio, Constantinides, George

arXiv.org Artificial Intelligence

Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a multinomial update rule as a concrete example, supported by empirical evaluation. This perspective opens new avenues for efficient training, particularly for models with inherently discrete structure.
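To make the idea of an inherently discrete update rule concrete, here is a minimal illustrative sketch (not the paper's actual algorithm): integer-valued weights move by -1, 0, or +1, with the move sampled stochastically based on the gradient, so no real-valued update is ever quantised. The function name, the probability scaling, and the toy objective are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def discrete_update(w, grad, lr_prob=0.1):
    """Hypothetical discrete update rule: each integer weight takes a
    -1/0/+1 step, sampled with probability proportional to |grad|.
    Illustrative sketch only, not the rule proposed in the paper."""
    p_step = np.clip(np.abs(grad) * lr_prob, 0.0, 1.0)  # chance of moving
    move = rng.random(w.shape) < p_step                  # sampled step mask
    return w - move * np.sign(grad).astype(int)          # stay integer-valued

# Toy objective: minimise (w - 5)^2 elementwise, starting from zero.
w = np.zeros(4, dtype=int)
for _ in range(500):
    grad = 2 * (w - 5)
    w = discrete_update(w, grad)
```

After a few hundred steps the integer weights settle at the minimiser (here, 5), since the step probability vanishes exactly where the gradient does.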






Review for NeurIPS paper: Learnability with Indirect Supervision Signals

Neural Information Processing Systems

Weaknesses: 1. Are there lower bounds to match the main upper bound for the main result, Thm. 4.2, Eq. 2? If there were only a single T which was the identity, then "no", because you can get the faster rate from realizable PAC learning. How would this need to be generalized to obtain matching upper and lower bounds? At least a discussion of the matter would help make the limitations of the present work clearer. E.g., if there are only 2 indirect labels and 4 real labels, show a concrete example of learning the true labelling function for the more complex case. Maybe just use linear 1-D thresholds with fixed distances between the class-transition discontinuities, with noisy subsetting. Both working through this analytically, to give intuition for why learning more labels from fewer labels is possible, and some empirical results would be useful.
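A toy version of the construction the reviewer suggests might look like the following sketch. All specifics (the threshold positions, the subsetting map T, the estimator) are assumptions for illustration: 4 true classes are given by thresholds on the line, but only 2 indirect labels are observed via T(y) = y // 2, so only the threshold that T preserves is identifiable from the indirect signal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 4 true classes defined by 1-D thresholds with
# fixed spacing; class = number of thresholds below x.
true_thresholds = np.array([1.0, 2.0, 3.0])

def true_label(x):
    return np.searchsorted(true_thresholds, x)

x = rng.uniform(0.0, 4.0, size=2000)
# Indirect supervision: T collapses classes {0,1} -> 0 and {2,3} -> 1.
y_indirect = true_label(x) // 2

# Only the middle threshold (between classes 1 and 2) survives T.
# Estimate it as the midpoint of the empirical boundary gap.
t_hat = (x[y_indirect == 0].max() + x[y_indirect == 1].min()) / 2
```

With 2000 uniform samples, `t_hat` lands very close to the true middle threshold 2.0, while the outer thresholds at 1.0 and 3.0 remain unrecoverable from this indirect signal, which is the intuition the reviewer asks the authors to work through.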


Reviews: Repeated Inverse Reinforcement Learning

Neural Information Processing Systems

The authors present a learning framework for inverse reinforcement learning wherein an agent provides policies for a variety of related tasks and a human determines whether or not the produced policies are acceptable. They present algorithms for learning a human's latent reward function over the tasks, and they give upper and lower bounds on the performance of the algorithms. They also address the setting where an agent is "corrected" as it executes trajectories. This is a comprehensive theoretical treatment of a new conceptualization of IRL that I think is valuable. I have broad clarification/scoping questions and a few minor points.