Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
Neopane, Ojash, Ramdas, Aaditya, Singh, Aarti
Estimation and inference for the Average Treatment Effect (ATE) are cornerstones of causal inference and often serve as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory have paved the way for adaptive methods that can enhance the power of downstream inference. Despite these advances, progress in understanding and developing adaptive algorithms remains in its early stages. Existing work either focuses on asymptotic analyses that overlook the exploration-exploitation tradeoffs relevant in finite-sample regimes, or relies on simpler but suboptimal estimators. In this work, we address these limitations by studying adaptive sampling procedures that take advantage of the asymptotically optimal Augmented Inverse Probability Weighting (AIPW) estimator. Our analysis uncovers challenges obscured by asymptotic approaches and introduces a novel algorithmic design principle reminiscent of optimism in multi-armed bandits. This principled approach enables our algorithm to achieve significant theoretical and empirical gains over prior methods. Our findings mark a step forward in advancing adaptive causal inference methods in theory and practice.
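To make the estimator concrete, here is a minimal Python sketch of the AIPW estimate computed from adaptively collected data in a covariate-free, design-based setting. The interface and the convention that the plug-in outcome-mean estimates are formed before each round are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def aipw_ate(y, a, pi, mu1_hat, mu0_hat):
    """AIPW estimate of the ATE from adaptively collected data.

    y       : observed outcomes, shape (T,)
    a       : 0/1 treatment indicators, shape (T,)
    pi      : treatment-assignment probabilities used at each round, shape (T,)
    mu1_hat : plug-in estimates of E[Y(1)] formed before each round, shape (T,)
    mu0_hat : plug-in estimates of E[Y(0)] formed before each round, shape (T,)
    """
    # Doubly robust score: plug-in difference plus an inverse-propensity
    # correction that uses only the arm actually observed at each round.
    scores = (mu1_hat - mu0_hat
              + a * (y - mu1_hat) / pi
              - (1 - a) * (y - mu0_hat) / (1 - pi))
    return scores.mean()
```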
Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
Neopane, Ojash, Ramdas, Aaditya, Singh, Aarti
Estimation of the Average Treatment Effect (ATE) is a core problem in causal inference with strong connections to Off-Policy Evaluation in Reinforcement Learning. This paper considers the problem of adaptively selecting the treatment allocation probability in order to improve estimation of the ATE. The majority of prior work on adaptive ATE estimation focuses on asymptotic guarantees and, in turn, overlooks important practical considerations such as the difficulty of learning the optimal treatment allocation and of hyperparameter selection. Existing non-asymptotic methods are limited by poor empirical performance and exponential scaling of the Neyman regret with respect to problem parameters. To address these gaps, we propose and analyze the Clipped Second Moment Tracking (ClipSMT) algorithm, a variant of an existing algorithm with strong asymptotic optimality guarantees, and provide finite-sample bounds on its Neyman regret. Our analysis shows that ClipSMT achieves exponential improvements in Neyman regret on two fronts: improving the dependence on $T$ from $O(\sqrt{T})$ to $O(\log T)$, and reducing the exponential dependence on problem parameters to a polynomial dependence. Finally, we present simulations that show the marked improvement of ClipSMT over existing approaches.
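For intuition, the sketch below simulates the second-moment-tracking idea: sample treatment with a plug-in Neyman allocation computed from running second-moment estimates, clipped away from 0 and 1 by a vanishing floor. The clipping schedule, initialization, and toy outcome distributions are illustrative assumptions, not the paper's exact specification of ClipSMT.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
s1 = s0 = 1.0      # running second-moment estimates (illustrative initialization)
n1 = n0 = 0

for t in range(1, T + 1):
    # Plug-in Neyman allocation from the current second-moment estimates.
    pi_star = np.sqrt(s1) / (np.sqrt(s1) + np.sqrt(s0))
    # Clip toward 1/2 with a slowly vanishing floor (illustrative schedule).
    c_t = 0.5 * t ** (-0.25)
    pi_t = float(np.clip(pi_star, c_t, 1 - c_t))

    a = rng.random() < pi_t                                   # sample treatment
    y = rng.normal(1.0, 2.0) if a else rng.normal(0.0, 0.5)   # toy outcomes

    # Update the running second moment of whichever arm was sampled.
    if a:
        n1 += 1
        s1 += (y ** 2 - s1) / n1
    else:
        n0 += 1
        s0 += (y ** 2 - s0) / n0
```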
Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration
Mehta, Viraj, Das, Vikramjeet, Neopane, Ojash, Dai, Yijia, Bogunovic, Ilija, Schneider, Jeff, Neiswanger, Willie
Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose the contexts at which to obtain human feedback in order to most efficiently identify a good policy, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. Finally, we extend the setting and methodology for practical use in RLHF training of large language models, where our method reaches better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.
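As a rough illustration of the acquisition idea, here is one plausible upper-confidence-bound rule for choosing the next context under a linear preference model: query where the apparently best action is the least certain. The linear model, feature interface, and selection rule are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def select_context(contexts, Phi, V, theta_hat, beta):
    """Choose the next context at which to query a human preference.

    contexts  : candidate context identifiers
    Phi[c]    : feature matrix of shape (n_actions, d) for context c
    V         : regularized design matrix of past queries, shape (d, d)
    theta_hat : current reward-model estimate, shape (d,)
    beta      : confidence-width multiplier (hyperparameter)
    """
    V_inv = np.linalg.inv(V)
    scores = []
    for c in contexts:
        X = Phi[c]
        # Confidence half-widths ||x||_{V^{-1}} for every action in context c.
        w = beta * np.sqrt(np.einsum('ij,jk,ik->i', X, V_inv, X))
        ucb = X @ theta_hat + w
        # Query where the apparently best action is the least certain.
        scores.append(w[np.argmax(ucb)])
    return contexts[int(np.argmax(scores))]
```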
Kernelized Offline Contextual Dueling Bandits
Mehta, Viraj, Neopane, Ojash, Das, Vikramjeet, Lin, Sen, Schneider, Jeff, Neiswanger, Willie
Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that the agent can often choose the contexts at which to obtain human feedback in order to most efficiently identify a good policy, and introduce the offline contextual dueling bandit setting. We give an upper-confidence-bound style algorithm for this setting and prove a regret bound. We also give empirical confirmation that this method outperforms a similar strategy that uses uniformly sampled contexts.
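In the kernelized setting, the same kind of uncertainty can be computed through a Gaussian-process-style posterior variance over (context, action) features; a minimal sketch follows, with the RBF kernel, lengthscale, and noise level as illustrative assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))

def posterior_var(X_query, X_data, noise=0.1):
    """GP-style posterior variance at query points, given past query features."""
    K = rbf(X_data, X_data) + noise * np.eye(len(X_data))
    k = rbf(X_query, X_data)
    explained = np.einsum('ij,jk,ik->i', k, np.linalg.inv(K), k)
    return rbf(X_query, X_query).diagonal() - explained
```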
Best Arm Identification under Additive Transfer Bandits
Neopane, Ojash, Ramdas, Aaditya, Singh, Aarti
We consider a variant of the best arm identification (BAI) problem in multi-armed bandits (MAB) in which there are two sets of arms (source and target), and the objective is to determine the best target arm while only pulling source arms. In this paper, we study the setting in which, despite the means being unknown, there is a known additive relationship between the source and target MAB instances. We show how our framework covers a range of previously studied pure exploration problems and additionally captures new ones. We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability. Our theoretical analysis highlights aspects of this transfer learning problem that do not arise in the typical BAI setup, yet recovers the LUCB algorithm for single-domain BAI as a special case.
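To illustrate the flavor of such an algorithm, here is a hedged LUCB-style sketch. It assumes, purely for illustration, that target-arm means equal a known entrywise-nonnegative additive combination of source-arm means (target means = M @ source means) and that outcomes lie in [0, 1]; the sampling rule and confidence constants are likewise illustrative, not the paper's exact algorithm.

```python
import numpy as np

def lucb_transfer(pull, M, eps=0.1, delta=0.05, max_pulls=50_000):
    """LUCB-style identification of an eps-best target arm via source pulls.

    pull(i) : draws one sample from source arm i (outcomes assumed in [0, 1])
    M       : known additive map with nonnegative entries, shape (n_tgt, n_src);
              target means are assumed to equal M @ (source means)
    """
    n_src = M.shape[1]
    counts = np.ones(n_src)
    means = np.array([float(pull(i)) for i in range(n_src)])  # one pull each
    for t in range(n_src, max_pulls):
        rad = np.sqrt(np.log(4 * n_src * t ** 2 / delta) / (2 * counts))
        # Propagate source confidence intervals through the additive map
        # (monotone because M is entrywise nonnegative).
        t_lo, t_hi = M @ (means - rad), M @ (means + rad)
        best = int(np.argmax(M @ means))
        others = np.where(np.arange(len(t_hi)) == best, -np.inf, t_hi)
        chall = int(np.argmax(others))
        if t_lo[best] >= t_hi[chall] - eps:   # eps-optimality certified
            return best
        # Pull the source arm contributing the most width to the two contenders.
        i = int(np.argmax((M[best] + M[chall]) * rad))
        counts[i] += 1
        means[i] += (float(pull(i)) - means[i]) / counts[i]
    return int(np.argmax(M @ means))
```

Setting M to the identity matrix collapses this to ordinary single-domain LUCB, mirroring the special case noted in the abstract.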