
Collaborating Authors

 Karimzadehgan, Maryam


Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation

arXiv.org Artificial Intelligence

We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol. In each iteration of EV3, we explore various model parameter updates, assess them using pertinent evaluation methods, and then adapt the model based on the optimal updates and previous progress history. EV3 offers substantial flexibility without imposing stringent constraints like differentiability on the key objectives relevant to the tasks of interest, allowing for exploratory updates with intentionally biased gradients and through a diversity of losses and optimizers. Additionally, the assessment phase provides reliable safety controls to ensure robust generalization, and can dynamically prioritize tasks in scenarios with multiple objectives. With inspiration drawn from evolutionary algorithms, meta-learning, and neural architecture search, we investigate an application of EV3 to knowledge distillation. Our experimental results illustrate EV3's capability to safely explore the modeling landscape, while hinting at its potential applicability across numerous domains due to its inherent flexibility and adaptability.
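As a concrete illustration of the explore-assess-adapt loop described above, the sketch below applies it to a toy one-parameter regression problem. The surrogate losses, step sizes, held-out metric, and accept-only-if-better rule are illustrative assumptions, not the configuration used in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: fit a single weight w on y = 2*x + noise. The held-out metric plays the
# role of the (possibly non-differentiable) evaluation used in the assess step.
train_x = rng.normal(size=100)
train_y = 2.0 * train_x + 0.1 * rng.normal(size=100)
val_x = rng.normal(size=50)
val_y = 2.0 * val_x + 0.1 * rng.normal(size=50)

def grad_l2(w):  # gradient of a squared-error surrogate loss
    return np.mean(2.0 * (w * train_x - train_y) * train_x)

def grad_l1(w):  # gradient of an absolute-error surrogate loss (a second, different loss)
    return np.mean(np.sign(w * train_x - train_y) * train_x)

def assess(w):   # held-out evaluation metric; nothing requires it to be differentiable
    return -np.mean(np.abs(w * val_x - val_y))

w = 0.0
for _ in range(50):
    # Explore: candidate updates from a diversity of losses and step sizes.
    candidates = [w - lr * g(w) for g in (grad_l2, grad_l1) for lr in (0.01, 0.1)]
    # Assess: score every candidate with the evaluation metric.
    scores = [assess(c) for c in candidates]
    # Adapt: safety control -- keep the incumbent unless the best candidate improves on it.
    if max(scores) > assess(w):
        w = candidates[int(np.argmax(scores))]

print(w)  # converges toward 2.0

Because candidates are compared on a held-out metric rather than the training loss, the final acceptance step plays the role of the safety control mentioned above: an exploratory update that hurts generalization is simply discarded.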


Overcoming Prior Misspecification in Online Learning to Rank

arXiv.org Artificial Intelligence

The recent literature on online learning to rank (LTR) has established the utility of prior knowledge for Bayesian ranking bandit algorithms. However, a major limitation of existing work is the requirement that the prior used by the algorithm match the true prior. In this paper, we propose and analyze adaptive algorithms that address this issue and additionally extend these results to linear and generalized linear models. We also consider scalar relevance feedback on top of click feedback. Moreover, we demonstrate the efficacy of our algorithms using both synthetic and real-world experiments.
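To make the role of the prior concrete, here is a minimal sketch of a Bayesian top-K ranking bandit with Bernoulli clicks and a Beta prior, where the prior assumed by the algorithm does not match the distribution the items were actually drawn from. The periodic moment-matching step is only an illustrative empirical-Bayes heuristic, not one of the adaptive algorithms analyzed in the paper.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: show K of L items per round, observe a Bernoulli click on each
# shown item, and rank with Thompson sampling under a possibly misspecified Beta prior.
L, K, T = 10, 3, 5000
true_attract = rng.beta(1.0, 9.0, size=L)   # items are actually drawn from Beta(1, 9)
alpha0, beta0 = 5.0, 5.0                    # the algorithm assumes Beta(5, 5): misspecified

clicks = np.zeros(L)
shows = np.zeros(L)
alpha = np.full(L, alpha0)
beta = np.full(L, beta0)

for t in range(T):
    # Thompson sampling: sample attractiveness from the posterior and show the top K items.
    samples = rng.beta(alpha, beta)
    shown = np.argsort(-samples)[:K]
    feedback = (rng.random(K) < true_attract[shown]).astype(float)

    # Standard posterior update from click feedback under the current prior.
    clicks[shown] += feedback
    shows[shown] += 1.0
    alpha = alpha0 + clicks
    beta = beta0 + (shows - clicks)

    # Illustrative adaptive step (an empirical-Bayes heuristic, not the paper's method):
    # periodically re-fit the prior to the observed click rates by moment matching.
    if (t + 1) % 1000 == 0 and shows.min() > 0:
        rates = clicks / shows
        m = float(np.clip(rates.mean(), 1e-3, 1 - 1e-3))
        v = rates.var() + 1e-6
        s = max(m * (1.0 - m) / v - 1.0, 0.2)   # implied prior strength alpha0 + beta0
        alpha0, beta0 = m * s, (1.0 - m) * s

In this toy, the misspecified Beta(5, 5) prior initially pulls every posterior toward 0.5, so the sampler over-explores unattractive items; re-fitting the prior from observed click rates reduces that effect.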


CORe: Capitalizing On Rewards in Bandit Exploration

arXiv.org Machine Learning

We propose a bandit algorithm that explores purely by randomizing its past observations. In particular, the sufficient optimism in the mean reward estimates is achieved by exploiting the variance in the past observed rewards. We name the algorithm Capitalizing On Rewards (CORe). The algorithm is general and can be easily applied to different bandit settings. The main benefit of CORe is that its exploration is fully data-dependent.

A multi-armed bandit can be considered as a special case of linear bandits, where the feature vector of each arm is a one-hot vector indicating the index of the arm, and the parameter vector is a vector of corresponding mean rewards. Arguably, the most popular and well-studied exploration strategies for solving bandit problems are Thompson sampling (TS) [Thompson, 1933, Agrawal and Goyal, 2013] and Optimism in the Face of Uncertainty (OFU) [Auer et al., 2002]. TS maintains a posterior distribution over each arm's...
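A minimal sketch of such data-dependent exploration on a toy Gaussian bandit: each arm's index is its empirical mean plus a perturbation whose scale is derived from the spread of the rewards observed so far, so no exploration constant needs to be tuned. The specific perturbation used here is an illustrative stand-in, not the exact CORe mechanism.

import numpy as np

rng = np.random.default_rng(2)

# Toy Gaussian bandit. Each arm's index is its empirical mean plus noise whose scale
# comes from the observed rewards themselves -- no hand-tuned exploration constant.
K, T = 5, 3000
true_means = rng.uniform(0.0, 1.0, size=K)
noise_sd = 0.5
history = [[] for _ in range(K)]

def pull(arm):
    reward = rng.normal(true_means[arm], noise_sd)
    history[arm].append(reward)
    return reward

total = sum(pull(arm) for arm in range(K) for _ in range(2))  # two initial pulls per arm

for t in range(2 * K, T):
    # Exploration scale taken from the variance of all rewards observed so far.
    pooled_sd = np.std(np.concatenate([np.asarray(h) for h in history]))
    # Arms with fewer observations receive wider perturbations and are revisited more often.
    indices = [np.mean(h) + rng.normal(0.0, pooled_sd / np.sqrt(len(h))) for h in history]
    total += pull(int(np.argmax(indices)))

print(total / T)  # average reward; approaches max(true_means) as exploration focuses

Arms with few pulls receive wider perturbations and are therefore revisited more often, while the overall perturbation scale tracks how noisy the observed rewards actually are.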