AITopics | Ramchandran, Kannan

Plotting

Ramchandran, Kannan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Wu, Tianhao, Zhu, Banghua, Zhang, Ruoyu, Wen, Zhaojin, Ramchandran, Kannan, Jiao, Jiantao

arXiv.org Artificial IntelligenceOct-9-2023

Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant approach for steering LLMs towards beneficial behavior involves Reinforcement Learning with Human Feedback (RLHF), with Proximal Policy Optimization (PPO) serving as the default RL optimizer. Despite its effectiveness, PPO has limitations when optimizing rewards trained from comparison-based loss. Primarily, PPO is not invariant to equivalent reward functions containing identical preference information due to the need to calibrate the reward scale. Additionally, PPO's necessity for token-wise updates introduces complexity in both function approximation and algorithm design compared to trajectory-wise optimization. This paper proposes a new framework, reinforcement learning with relative feedback, and a novel trajectory-wise policy gradient algorithm, Pairwise Proximal Policy Optimization (P3O) that operates directly on comparative rewards. We show theoretically that P3O is invariant to equivalent rewards and avoids the complexity of PPO. Empirical evaluations demonstrate that P3O outperforms PPO in the KL-Reward trade-off and can align with human preferences as well as or better than prior methods. In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2310.00212

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing

Rajaraman, Nived, Devvrit, null, Mokhtari, Aryan, Ramchandran, Kannan

arXiv.org Artificial IntelligenceJun-4-2023

Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates it generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, there is very little known about the theory behind this success. In this paper, we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem with the ground truth $U_\star \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the mean square error, augmented with a smooth version of a group Lasso regularizer, $\sum_{i=1}^k \| U e_i \|_2$. In particular, we provably show that pruning all the columns below a certain explicit $\ell_2$-norm threshold results in a solution $U_{\text{prune}}$ which has the minimum number of columns $r$, yet close to the ground truth in training loss. Moreover, in the subsequent fine-tuning phase, gradient descent initialized at $U_{\text{prune}}$ converges at a linear rate to its limit. While our analysis provides insights into the role of regularization in pruning, we also show that running gradient descent in the absence of regularization results in models which {are not suitable for greedy pruning}, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum. To the best of our knowledge, our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well.

artificial intelligence, machine learning, stationary point, (17 more...)

arXiv.org Artificial Intelligence

2303.11453

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

Yang, Yaoqing, Theisen, Ryan, Hodgkinson, Liam, Gonzalez, Joseph E., Ramchandran, Kannan, Martin, Charles H., Mahoney, Michael W.

arXiv.org Artificial IntelligenceJun-4-2023

Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective \emph{generalization metrics} that can guide this type of model selection. Effective metrics are typically expected to correlate strongly with test performance. In this paper, we expand on prior analyses by examining generalization-metric-based model selection with the following objectives: (i) focusing on natural language processing (NLP) tasks, as prior work primarily concentrates on computer vision (CV) tasks; (ii) considering metrics that directly predict \emph{test error} instead of the \emph{generalization gap}; (iii) exploring metrics that do not need access to data to compute. From these objectives, we are able to provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics. Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks, exhibiting stronger correlations than other, more popular metrics. To further examine these metrics, we extend prior formulations relying on power law (PL) spectral distributions to exponential (EXP) and exponentially-truncated power law (E-TPL) families.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2202.02842

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits

Rajaraman, Nived, Han, Yanjun, Jiao, Jiantao, Ramchandran, Kannan

arXiv.org Artificial IntelligenceMar-14-2023

We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action. Compared with the linear model, two curious phenomena arise in non-linear models: first, in addition to the "learning phase" with a standard parametric rate for estimation or regret, there is an "burn-in period" with a fixed cost determined by the non-linear function; second, achieving the smallest burn-in cost requires new exploration algorithms. For a special family of non-linear functions named ridge functions in the literature, we derive upper and lower bounds on the optimal burn-in cost, and in addition, on the entire learning trajectory during the burn-in period via differential equations. In particular, a two-stage algorithm that first finds a good initial action and then treats the problem as locally linear is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on regression oracles, are provably suboptimal.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2302.06025

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

The Fair Value of Data Under Heterogeneous Privacy Constraints

Kang, Justin, Pedarsani, Ramtin, Ramchandran, Kannan

arXiv.org Artificial IntelligenceJan-30-2023

Modern data aggregation often takes the form of a platform collecting data from a network of users. More than ever, these users are now requesting that the data they provide is protected with a guarantee of privacy. This has led to the study of optimal data acquisition frameworks, where the optimality criterion is typically the maximization of utility for the agent trying to acquire the data. This involves determining how to allocate payments to users for the purchase of their data at various privacy levels. The main goal of this paper is to characterize a fair amount to pay users for their data at a given privacy level. We propose an axiomatic definition of fairness, analogous to the celebrated Shapley value. Two concepts for fairness are introduced. The first treats the platform and users as members of a common coalition and provides a complete description of how to divide the utility among the platform and users. In the second concept, fairness is defined only among users, leading to a potential fairness-constrained mechanism design problem for the platform. We consider explicit examples involving private heterogeneous data and show how these notions of fairness can be applied. To the best of our knowledge, these are the first fairness concepts for data that explicitly consider privacy constraints.

artificial intelligence, machine learning, platform, (19 more...)

arXiv.org Artificial Intelligence

2301.13336

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Data Science (1.00)
(2 more...)

Add feedback

Efficiently Computing Sparse Fourier Transforms of $q$-ary Functions

Erginbas, Yigit Efe, Kang, Justin Singh, Aghazadeh, Amirali, Ramchandran, Kannan

arXiv.org Artificial IntelligenceJan-15-2023

Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a $q$-ary alphabet? These types of functions arise naturally in many areas including biology. A typical workaround is to encode the $q$-ary sequence in binary, however, this approach is computationally inefficient and fundamentally incompatible with the existing sparse Fourier transform techniques. Herein, we develop a sparse Fourier transform algorithm specifically for $q$-ary functions of length $n$ sequences, dubbed $q$-SFT, which provably computes an $S$-sparse transform with vanishing error as $q^n \rightarrow \infty$ in $O(Sn)$ function evaluations and $O(S n^2 \log q)$ computations, where $S = q^{n\delta}$ for some $\delta < 1$. Under certain assumptions, we show that for fixed $q$, a robust version of $q$-SFT has a sample complexity of $O(Sn^2)$ and a computational complexity of $O(Sn^3)$ with the same asymptotic guarantees. We present numerical simulations on synthetic and real-world RNA data, demonstrating the scalability of $q$-SFT to massively high dimensional $q$-ary functions.

artificial intelligence, data quality, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2301.062

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > France (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Minimax Optimal Online Imitation Learning via Replay Estimation

Swamy, Gokul, Rajaraman, Nived, Peng, Matthew, Choudhury, Sanjiban, Bagnell, J. Andrew, Wu, Zhiwei Steven, Jiao, Jiantao, Ramchandran, Kannan

arXiv.org Artificial IntelligenceJan-14-2023

Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^2 / N$ for behavioral cloning and $H / \sqrt{N}$ for online moment matching, where $H$ is the horizon and $N$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of general function approximation, we prove a meta theorem reducing the performance gap of our approach to the parameter estimation error for offline classification (i.e. learning the expert policy). In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O} \left( \min({H^{3/2}} / {N}, {H} / {\sqrt{N}} \right)$ dependency, under significantly weaker assumptions compared to prior work. We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes.

artificial intelligence, machine learning, minimax optimal online imitation learning, (1 more...)

arXiv.org Artificial Intelligence

2205.15397

Genre:

Instructional Material > Online (0.60)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.89)
Information Technology > Artificial Intelligence > Robots (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)

Add feedback

Interactive Learning with Pricing for Optimal and Stable Allocations in Markets

Erginbas, Yigit Efe, Phade, Soham, Ramchandran, Kannan

arXiv.org Artificial IntelligenceDec-13-2022

Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback. As a principled way of incorporating market constraints and user incentives in the design, we consider our objectives to be two-fold: maximal social welfare with minimal instability. To maximize social welfare, our proposed framework enhances the quality of recommendations by exploring allocations that optimistically maximize the rewards. To minimize instability, a measure of users' incentives to deviate from recommended allocations, the algorithm prices the items based on a scheme derived from the Walrasian equilibria. Though it is known that these equilibria yield stable prices for markets with known user preferences, our approach accounts for the inherent uncertainty in the preferences and further ensures that the users accept their recommendations under offered prices. To the best of our knowledge, our approach is the first to integrate techniques from combinatorial bandits, optimal resource allocation, and collaborative filtering to obtain an algorithm that achieves sub-linear social welfare regret as well as sub-linear instability. Empirical studies on synthetic and real-world data also demonstrate the efficacy of our strategy compared to approaches that do not fully incorporate all these aspects.

artificial intelligence, instability, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2212.06891

Country: Europe (0.28)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology (0.67)
Education > Educational Setting > Online (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Interactive Recommendations for Optimal Allocations in Markets with Constraints

Erginbas, Yigit Efe, Phade, Soham, Ramchandran, Kannan

arXiv.org Artificial IntelligenceJul-28-2022

Recommendation systems when employed in markets play a dual role: they assist users in selecting their most desired items from a large pool and they help in allocating a limited number of items to the users who desire them the most. Despite the prevalence of capacity constraints on allocations in many real-world recommendation settings, a principled way of incorporating them in the design of these systems has been lacking. Motivated by this, we propose an interactive framework where the system provider can enhance the quality of recommendations to the users by opportunistically exploring allocations that maximize user rewards and respect the capacity constraints using appropriate pricing mechanisms. We model the problem as an instance of a low-rank combinatorial multi-armed bandit problem with selection constraints on the arms. We employ an integrated approach using techniques from collaborative filtering, combinatorial bandits, and optimal resource allocation to provide an algorithm that provably achieves sub-linear regret, namely $\tilde{\mathcal{O}} ( \sqrt{N M (N+M) RT} )$ in $T$ rounds for a problem with $N$ users, $M$ items and rank $R$ mean reward matrix. Empirical studies on synthetic and real-world data also demonstrate the effectiveness and performance of our approach.

allocation, artificial intelligence, data mining, (20 more...)

arXiv.org Artificial Intelligence

2207.04143

Genre: Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Add feedback

Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes

Ghosh, Avishek, Chowdhury, Sayak Ray, Ramchandran, Kannan

arXiv.org Machine LearningJul-13-2021

We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem where the transition kernel $P^*$ belongs to a family of models $\mathcal{P}^*$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels $\cP_1 \subset \cP_2 \subset \ldots \subset \cP_M$. We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}) that adapts to the smallest such family where the true transition kernel $P^*$ lies. \texttt{ARL-GEN} uses the Upper Confidence Reinforcement Learning (\texttt{UCRL}) algorithm with value targeted regression as a blackbox and puts a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that \texttt{ARL-GEN} obtains a regret of $\Tilde{\mathcal{O}}(d_{\mathcal{E}}^*H^2+\sqrt{d_{\mathcal{E}}^* \mathbb{M}^* H^2 T})$, with high probability, where $H$ is the horizon length, $T$ is the total number of steps, $d_{\mathcal{E}}^*$ is the Eluder dimension and $\mathbb{M}^*$ is the metric entropy corresponding to $\mathcal{P}^*$. Note that this regret scaling matches that of an oracle that knows $\mathcal{P}^*$ in advance. We show that the cost of model selection for \texttt{ARL-GEN} is an additive term in the regret having a weak dependence on $T$. Subsequently, we remove the separability assumption and consider the setup of linear mixture MDPs, where the transition kernel $P^*$ has a linear function approximation. With this low rank structure, we propose novel adaptive algorithms for model selection, and obtain (order-wise) regret identical to that of an oracle with knowledge of the true model class.

artificial intelligence, machine learning, probability, (14 more...)

arXiv.org Machine Learning

2107.05849

Country: Asia (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback