AITopics | Ma, Yifei

Collaborating Authors

Ma, Yifei

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

Zhang, Miaosen, Wei, Yixuan, Xing, Zhen, Ma, Yifei, Wu, Zuxuan, Li, Ji, Zhang, Zheng, Dai, Qi, Luo, Chong, Geng, Xin, Guo, Baining

arXiv.org Artificial Intelligence, Jun-13-2024

Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetics, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system. Advanced retrieval systems usually adopt a cascade of aesthetic models as re-rankers or filters, which are limited to low-level features like saturation and perform poorly when stylistic, cultural or knowledge contexts are involved. We find that utilizing the reasoning ability of large language models (LLMs) to rephrase the search query and extend the aesthetic expectations can make up for this shortcoming. Based on the above findings, we propose a preference-based reinforcement learning method that fine-tunes the vision models to distill the knowledge from both LLM reasoning and the aesthetic models to better align the vision models with human aesthetics. Meanwhile, with few benchmarks designed for evaluating retrieval systems, we leverage large multi-modality models (LMMs) to evaluate the aesthetic performance with their strong abilities. As aesthetic assessment is one of the most subjective tasks, to validate the robustness of LMMs, we further propose a novel dataset named HPIR to benchmark the alignment with human aesthetics. Experiments demonstrate that our method significantly enhances the aesthetic behaviors of the vision models under several metrics. We believe the proposed algorithm can be a general practice for aligning vision models with human values.

large language model, machine learning, natural language, (20 more...)

arXiv: 2406.09397

Country: North America (0.14)

Genre: Research Report (0.81)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Optimal Design for Human Feedback

Mukherjee, Subhojyoti, Lalitha, Anusha, Kalantari, Kousha, Deshmukh, Aniket, Liu, Ge, Ma, Yifei, Kveton, Branislav

arXiv.org Artificial Intelligence, May-30-2024

Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study the problem of data collection for learning preference models. The key idea in our work is to generalize the optimal design, a method for computing information gathering policies, to ranked lists. To show the generality of our ideas, we study both absolute and relative feedback on the lists. We design efficient algorithms for both settings and analyze them. We prove that our preference model estimators improve with more data and so does the ranking error under the estimators. Finally, we experiment with several synthetic and real-world datasets to show the statistical efficiency of our algorithms.
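
The optimal design that the abstract builds on can be illustrated in its classical, non-ranked form. The following is only a sketch under stated assumptions, not the paper's algorithm for ranked lists: it runs a Frank-Wolfe iteration for the D-optimal design over a handful of 2-D feature vectors. The function name `d_optimal_design`, the restriction to two dimensions, and the example points are all hypothetical.

```python
def d_optimal_design(points, iterations=500):
    """Frank-Wolfe sketch for a D-optimal design over 2-D feature vectors.

    Returns sampling weights that (approximately) maximize
    log det(sum_i w_i * x_i x_i^T). Assumes the points span the plane,
    so the design matrix is invertible under uniform weights.
    """
    k = len(points)
    dim = 2.0
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # Design matrix V = [[a, b], [b, c]] under the current weights.
        a = sum(w * x[0] * x[0] for w, x in zip(weights, points))
        b = sum(w * x[0] * x[1] for w, x in zip(weights, points))
        c = sum(w * x[1] * x[1] for w, x in zip(weights, points))
        det = a * c - b * b
        # gains[i] = x_i^T V^{-1} x_i, using the closed-form 2x2 inverse.
        gains = [
            (c * x[0] ** 2 - 2.0 * b * x[0] * x[1] + a * x[1] ** 2) / det
            for x in points
        ]
        best = max(range(k), key=lambda i: gains[i])
        # At the optimum the largest gain equals the dimension.
        if gains[best] <= dim + 1e-9:
            break
        # Exact line-search step for the log-det objective.
        step = (gains[best] / dim - 1.0) / (gains[best] - 1.0)
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Hypothetical candidates: two axis-aligned points and one in between.
example_weights = d_optimal_design([(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)])
```

In this toy instance the weight drifts toward the two axis-aligned points, which carry the most information about a 2-D linear model; the in-between point is gradually dropped.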

data mining, machine learning, reinforcement learning, (17 more...)

arXiv: 2404.13895

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Experimental Design for Active Transductive Inference in Large Language Models

Mukherjee, Subhojyoti, Lalitha, Anusha, Deshmukh, Aniket, Liu, Ge, Ma, Yifei, Kveton, Branislav

arXiv.org Artificial Intelligence, May-30-2024

One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD). We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set. The training examples are initially unlabeled and we obtain the label of the most informative ones, which maximally reduces uncertainty in the LLM prediction. We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen. We analyze these algorithms in linear models: first GO and then use its equivalence with SAL. We experiment with many different tasks in small, medium-sized, and large language models; and show that GO and SAL outperform other methods for choosing few-shot examples in the LLM prompt at inference time.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv: 2404.08846

Country:

North America > United States > Wisconsin (0.14)
North America > United States > California (0.14)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

Rahdari, Behnam, Ding, Hao, Fan, Ziwei, Ma, Yifei, Chen, Zhuotong, Deoras, Anoop, Kveton, Branislav

arXiv.org Artificial Intelligence, Jan-17-2024

The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanations for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results.

explanation, large language model, natural language, (15 more...)

doi: 10.1145/3616855.3635689

arXiv: 2312.14345

Country: North America > United States > California > Santa Barbara County > Santa Barbara (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Media (0.69)
Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

Lalitha, Anusha, Kalantari, Kousha, Ma, Yifei, Deoras, Anoop, Kveton, Branislav

arXiv.org Artificial Intelligence, Jun-13-2023

We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimating the unknown reward variances. We bound probabilities of misidentifying the best arms in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to the optimal experiment design with unknown variances, we believe that our results are of broad interest. Our experiments validate our theory, and show that SHVar and SHAdaVar outperform algorithms from prior works with analytical guarantees.
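
The known-variance idea in the abstract, pulling high-variance arms more often inside an elimination scheme, can be sketched roughly as follows. This is an illustration and not the authors' SHVar implementation: the variance-proportional split of each stage's budget, the Gaussian reward simulator, and every name (`variance_aware_halving`, `means`, `variances`) are assumptions made for the example.

```python
import math
import random


def variance_aware_halving(means, variances, budget, rng):
    """Sequential-halving sketch with variance-proportional pulls.

    `means` are only used to simulate Gaussian rewards; `variances` are
    treated as known and must be positive. Returns the surviving arm index.
    """
    arms = list(range(len(means)))
    stages = max(1, math.ceil(math.log2(len(arms))))
    stage_budget = budget // stages
    while len(arms) > 1:
        total_var = sum(variances[i] for i in arms)
        estimates = {}
        for i in arms:
            # Assumed allocation: pulls proportional to the arm's variance,
            # with at least one pull so the mean estimate is defined.
            pulls = max(1, int(stage_budget * variances[i] / total_var))
            std = math.sqrt(variances[i])
            rewards = [rng.gauss(means[i], std) for _ in range(pulls)]
            estimates[i] = sum(rewards) / pulls
        # Keep the better half (rounded up) of the surviving arms.
        arms.sort(key=lambda i: estimates[i], reverse=True)
        arms = arms[: math.ceil(len(arms) / 2)]
    return arms[0]


best_arm = variance_aware_halving(
    means=[0.1, 0.2, 0.9, 0.3],
    variances=[0.1, 0.5, 1.0, 0.2],
    budget=4000,
    rng=random.Random(0),
)
```

One reason a proportional split is natural here: with pulls proportional to variance, every surviving arm's mean estimate ends up with roughly the same standard error, so no single noisy arm dominates the elimination decision.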

data mining, machine learning, variance, (20 more...)

arXiv: 2306.07549

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Media > Film (1.00)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Context Uncertainty in Contextual Bandits with Applications to Recommender Systems

Wang, Hao, Ma, Yifei, Ding, Hao, Wang, Yuyang

arXiv.org Machine Learning, Feb-16-2022

Recurrent neural networks have proven effective in modeling sequential user feedback for recommender systems. However, they usually focus solely on item relevance and fail to effectively explore diverse items for users, thereby harming the system performance in the long run. To address this problem, we propose a new type of recurrent neural network, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space. REN tries to balance relevance and exploration while taking into account the uncertainty in the representations. Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there exists uncertainty in the learned representations. Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.

context uncertainty, machine learning, neural network, (4 more...)

arXiv: 2202.00805

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.60)

Zero-Shot Recommender Systems

Ding, Hao, Ma, Yifei, Deoras, Anoop, Wang, Yuyang, Wang, Hao

arXiv.org Artificial Intelligence, May-18-2021

Performance of recommender systems (RS) relies heavily on the amount of training data available. This poses a chicken-and-egg problem for early-stage products, whose amount of data, in turn, relies on the performance of their RS. On the other hand, zero-shot learning promises some degree of generalization from an old dataset to an entirely new dataset. In this paper, we explore the possibility of zero-shot learning in RS. We develop an algorithm, dubbed ZEro-Shot Recommenders (ZESRec), that is trained on an old dataset and generalize to a new one where there are neither overlapping users nor overlapping items, a setting that contrasts typical cross-domain RS that has either overlapping users or items. Different from categorical item indices, i.e., item ID, in previous methods, ZESRec uses items' natural-language descriptions (or description embeddings) as their continuous indices, and therefore naturally ...

Many large scale e-commerce platforms (such as Etsy, Overstock, etc) and online content platforms (such as Spotify, Overstock, Disney, Netflix, etc) have such a large inventory of items that showcasing all of them in front of their users is simply not practical. In particular, in the online content category of businesses, it is often seen that users of their service do not have a crisp intent in mind unlike in the retail shopping experience where the users often have a crisp intent of purchasing something. The need for personalized recommendations therefore arises from the fact that not only it is impractical to show all the items in the catalogue but often times users of such services need help discovering the next best thing -- be it the new and exciting movie or be it a new music album or even a piece of merchandise that they may want to consider for future buying if not immediately.

deep learning, neural network, zesrec, (23 more...)

arXiv: 2105.08318

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Consumer Health (0.94)
Media > Music (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Xie, Tengyang, Ma, Yifei, Wang, Yu-Xiang

arXiv.org Artificial Intelligence, Jun-8-2019

Motivated by the many real-world applications of reinforcement learning (RL) that require safe-policy iterations, we consider the problem of off-policy evaluation (OPE), that is, the problem of evaluating a new policy using the historical data obtained by different behavior policies, under the model of nonstationary episodic Markov Decision Processes with a long horizon and large action space. Existing importance sampling (IS) methods often suffer from large variance that depends exponentially on the RL horizon $H$. To solve this problem, we consider a marginalized importance sampling (MIS) estimator that recursively estimates the state marginal distribution for the target policy at every step. MIS achieves a mean-squared error of $O(H^2R_{\max}^2\sum_{t=1}^H\mathbb E_\mu[(w_{\pi,\mu}(s_t,a_t))^2]/n)$ for large $n$, where $w_{\pi,\mu}(s_t,a_t)$ is the ratio of the marginal distributions at the $t$th step under $\pi$ and $\mu$, $H$ is the horizon, $R_{\max}$ is the maximal reward, and $n$ is the sample size. The result nearly matches the Cramer-Rao lower bounds for DAG MDP in Jiang and Li (2016) for most non-trivial regimes. To the best of our knowledge, this is the first OPE estimator with provably optimal dependence in $H$ and the second moments of the importance weight. Besides theoretical optimality, we empirically demonstrate the superiority of our method in time-varying, partially observable, and long-horizon RL environments.
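
The recursive idea behind a marginalized estimator can be sketched for a small tabular problem. The code below is a minimal illustration under assumptions, not the paper's implementation: it assumes discrete states, known behavior-policy probabilities, and equal-length episodes, and the names `mis_estimate`, `pi`, and `mu` as well as the episode layout are hypothetical.

```python
from collections import defaultdict


def mis_estimate(episodes, pi, mu):
    """Tabular marginalized importance sampling sketch.

    episodes: list of trajectories, each a list of (state, action, reward)
              tuples of the same length.
    pi, mu:   callables (t, state, action) -> probability under the target
              and behavior policy respectively.
    """
    n = len(episodes)
    horizon = len(episodes[0])
    # The initial state distribution does not depend on the policy.
    marginal = defaultdict(float)
    for episode in episodes:
        marginal[episode[0][0]] += 1.0 / n
    value = 0.0
    for t in range(horizon):
        visits = defaultdict(int)
        weighted_reward = defaultdict(float)
        weighted_next = defaultdict(lambda: defaultdict(float))
        for episode in episodes:
            state, action, reward = episode[t]
            # Single-step importance ratio, not a product over the horizon.
            ratio = pi(t, state, action) / mu(t, state, action)
            visits[state] += 1
            weighted_reward[state] += ratio * reward
            if t + 1 < horizon:
                weighted_next[state][episode[t + 1][0]] += ratio
        next_marginal = defaultdict(float)
        for state, mass in list(marginal.items()):
            if visits[state] == 0:
                continue  # state never observed at step t in the data
            value += mass * weighted_reward[state] / visits[state]
            # Push the estimated target marginal one step forward.
            for next_state, weight in weighted_next[state].items():
                next_marginal[next_state] += mass * weight / visits[state]
        marginal = next_marginal
    return value
```

Because each step uses only a one-step ratio together with the estimated state marginal, the importance weights are not multiplied along the trajectory, which is the property the abstract credits for avoiding the exponential dependence on the horizon.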

artificial intelligence, estimator, health & medicine, (17 more...)

arXiv: 1906.03393

Country:

North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

Lin, Haibin, Zhang, Hang, Ma, Yifei, He, Tong, Zhang, Zhi, Zha, Sheng, Li, Mu

arXiv.org Machine Learning, May-2-2019

With an increasing demand for training power for deep learning algorithms and the rapid growth of computation resources in data centers, it is desirable to dynamically schedule different distributed deep learning tasks to maximize resource utilization and reduce cost. In this process, different tasks may receive varying numbers of machines at different times, a setting we call elastic distributed training. Despite the recent successes in large mini-batch distributed training, these methods are rarely tested in elastic distributed training environments and suffer degraded performance in our experiments when we immediately adjust the learning rate linearly with respect to the batch size. One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes. We therefore propose to smoothly adjust the learning rate over time to alleviate the influence of the noisy momentum estimation. Our experiments on image classification, object detection and semantic segmentation have demonstrated that our proposed Dynamic SGD method achieves stabilized performance when varying the number of GPUs from 8 to 128. We also provide theoretical understanding of the optimality of linear learning rate scheduling and the effects of stochastic momentum.
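
The contrast between an immediate and a smooth learning-rate adjustment can be sketched as follows. This is an illustration under assumptions and not the paper's Dynamic SGD schedule: the linear ramp, the `ramp_steps` parameter, the base values, and the function names are all hypothetical choices made for the example.

```python
def linear_scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: learning rate proportional to batch size."""
    return base_lr * batch / base_batch


def smooth_lr_schedule(batch_sizes, base_lr=0.1, base_batch=256, ramp_steps=4):
    """Return one learning rate per step, ramping toward the linearly
    scaled target over `ramp_steps` steps whenever the batch size changes,
    instead of jumping to it immediately."""
    lr = linear_scaled_lr(base_lr, base_batch, batch_sizes[0])
    start, target, elapsed = lr, lr, ramp_steps
    schedule = []
    for batch in batch_sizes:
        new_target = linear_scaled_lr(base_lr, base_batch, batch)
        if new_target != target:
            # Batch size changed: begin a new ramp from the current rate.
            start, target, elapsed = lr, new_target, 0
        if elapsed < ramp_steps:
            elapsed += 1
            lr = start + (target - start) * elapsed / ramp_steps
        schedule.append(lr)
    return schedule


# Two steps at batch size 256, then the cluster doubles to 512.
example_schedule = smooth_lr_schedule([256, 256, 512, 512, 512, 512, 512, 512])
```

In this example the rate moves from 0.1 toward 0.2 in four equal increments after the batch size doubles, rather than doubling in a single step.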

deep learning, mini-batch size, neural network, (16 more...)

arXiv: 1904.12043

Country: Asia (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Services (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Imitation-Regularized Offline Learning

Ma, Yifei, Wang, Yu-Xiang, Narayanaswamy, Balakrishnan

arXiv.org Machine Learning, Jan-15-2019

We study the problem of offline learning in automated decision systems under the contextual bandits model. We are given logged historical data consisting of contexts, (randomized) actions, and (nonnegative) rewards. A common goal is to evaluate what would happen if different actions were taken in the same contexts, so as to optimize the action policies accordingly. The typical approach to this problem, inverse probability weighted estimation (IPWE) [Bottou et al., 2013], requires logged action probabilities, which may be missing in practice due to engineering complications. Even when available, small action probabilities cause large uncertainty in IPWE, rendering the corresponding results insignificant. To solve both problems, we show how one can use policy improvement (PIL) objectives, regularized by policy imitation (IML). We motivate and analyze PIL as an extension to Clipped-IPWE, by showing that both are lower-bound surrogates to the vanilla IPWE. We also formally connect IML to IPWE variance estimation [Swaminathan and Joachims, 2015] and natural policy gradients. Without probability logging, our PIL-IML interpretations justify and improve, by reward-weighting, the state-of-the-art cross-entropy (CE) loss that predicts the action items among all action candidates available in the same contexts. With probability logging, our main theoretical contribution connects IML-underfitting to the existence of either confounding variables or model misspecification. We show the value and accuracy of our insights by simulations based on Simpson's paradox, standard UCI multiclass-to-bandit conversions and on the Criteo counterfactual analysis challenge dataset.
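
The vanilla and clipped IPWE estimators named in the abstract can be sketched on a toy log. This is a minimal illustration, assuming logged action probabilities are available; the log layout, the clipping threshold, and the names `ipwe`, `clipped_ipwe`, and `target_prob` are hypothetical and not taken from the paper.

```python
def ipwe(logs, target_prob):
    """Inverse probability weighted estimate of a target policy's reward.

    logs: list of (context, action, reward, logged_prob) tuples.
    target_prob: callable (context, action) -> probability under the
                 policy being evaluated.
    """
    total = 0.0
    for context, action, reward, logged_prob in logs:
        total += target_prob(context, action) / logged_prob * reward
    return total / len(logs)


def clipped_ipwe(logs, target_prob, clip=10.0):
    """Same estimate with importance weights capped at `clip`."""
    total = 0.0
    for context, action, reward, logged_prob in logs:
        weight = min(target_prob(context, action) / logged_prob, clip)
        total += weight * reward
    return total / len(logs)


# Toy log: the last record has a small logged probability, so its
# importance weight (20) dominates the unclipped estimate.
toy_logs = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5), ("x", 0, 1.0, 0.05)]
always_zero = lambda context, action: 1.0 if action == 0 else 0.0
vanilla = ipwe(toy_logs, always_zero)
clipped = clipped_ipwe(toy_logs, always_zero)  # → 4.0
```

With nonnegative rewards, capping a weight can only shrink its term, so the clipped value never exceeds the vanilla one. This matches the abstract's description of Clipped-IPWE as a lower-bound surrogate.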

health & medicine, optimization problem, probability, (21 more...)

arXiv: 1901.04723

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
