AITopics | Yi, Xinyang

Yi, Xinyang

STAR: A Simple Training-free Approach for Recommendations using Large Language Models

Lee, Dong-Ho, Kraft, Adam, Jin, Long, Mehta, Nikhil, Xu, Taibai, Hong, Lichan, Chi, Ed H., Yi, Xinyang

arXiv.org Artificial Intelligence, Oct-21-2024

Recent progress in large language models (LLMs) offers promising new approaches for recommendation system (RecSys) tasks. While the current state-of-the-art methods rely on fine-tuning LLMs to achieve optimal results, this process is costly and introduces significant engineering complexities. Conversely, methods that bypass fine-tuning and use LLMs directly are less resource-intensive but often fail to fully capture both semantic and collaborative information, resulting in sub-optimal performance compared to their fine-tuned counterparts. In this paper, we propose a Simple Training-free Approach for Recommendation (STAR), a framework that utilizes LLMs and can be applied to various recommendation tasks without the need for fine-tuning. Our approach involves a retrieval stage that uses semantic embeddings from LLMs combined with collaborative user information to retrieve candidate items. We then apply an LLM for pairwise ranking to enhance next-item prediction. Experimental results on the Amazon Review dataset show competitive performance for next item prediction, even with our retrieval stage alone. Our full method achieves Hits@10 performance of +23.8% on Beauty, +37.5% on Toys and Games, and -1.8% on Sports and Outdoors relative to the best supervised models. This framework offers an effective alternative to traditional supervised models, highlighting the potential of LLMs in recommendation systems without extensive training or custom architectures.
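The retrieval stage described above combines semantic similarity with collaborative signals. The following is a minimal sketch of one way such a combination could look, not the paper's actual scoring rule: the function names, the linear blending `weight`, and the pre-normalized co-occurrence table `co_counts` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(history, embeddings, co_counts, weight=0.5, top_k=2):
    """Rank unseen items by a blend of semantic and collaborative scores.

    history:    item ids the user already interacted with
    embeddings: item id -> semantic embedding (e.g. from an LLM encoder)
    co_counts:  frozenset({a, b}) -> co-occurrence strength in [0, 1]
                (hypothetical, assumed to be normalized beforehand)
    weight:     1.0 = purely semantic, 0.0 = purely collaborative
    """
    if not history:
        return []
    scores = {}
    for candidate, candidate_vec in embeddings.items():
        if candidate in history:
            continue
        total = 0.0
        for item in history:
            semantic = cosine(embeddings[item], candidate_vec)
            collaborative = co_counts.get(frozenset((item, candidate)), 0.0)
            total += weight * semantic + (1.0 - weight) * collaborative
        scores[candidate] = total / len(history)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In this sketch the retrieved candidates would then be handed to an LLM for pairwise ranking; that step is omitted here because it depends on the prompt and model used.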

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.16458

Country: North America > United States > California (0.47)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Online Matching: A Real-time Bandit System for Large-scale Recommendations

Yi, Xinyang, Wang, Shao-Chuan, He, Ruining, Chandrasekaran, Hariharan, Wu, Charles, Heldt, Lukasz, Hong, Lichan, Chen, Minmin, Chi, Ed H.

arXiv.org Artificial Intelligence, Jul-29-2023

The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in serving massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in an unfavorable user experience, highlighting the need for devising intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system learning from users' direct feedback on items in real time. We present a hybrid "offline + online" approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of the LinUCB algorithm -- to enable distributed updates of bandit parameters in a scalable and timely manner. We conduct live experiments on YouTube and show that Online Matching is able to enhance the capabilities of fresh content discovery and item exploration in the present platform.
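The abstract does not spell out how Diag-LinUCB works, so the sketch below shows only a generic LinUCB arm whose covariance matrix is approximated by its diagonal. This is one plausible reading of the name, offered as an assumption rather than as the paper's algorithm; the class name, the `alpha` exploration parameter, and the identity initialization are illustrative choices. The appeal of a diagonal approximation is that each coordinate can be updated with simple additions, which are easy to aggregate across distributed workers.

```python
import math


class DiagLinUCBArm:
    """One bandit arm with a diagonal approximation of the LinUCB matrix A.

    Hypothetical sketch: standard LinUCB keeps a full d x d matrix A and a
    vector b; here only the diagonal of A is stored.
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha            # exploration strength (assumed value)
        self.a_diag = [1.0] * dim     # diagonal of A, initialized to identity
        self.b = [0.0] * dim          # reward-weighted feature sums

    def ucb(self, x):
        """Upper confidence bound for context vector x."""
        mean = sum(b / a * xi for a, b, xi in zip(self.a_diag, self.b, x))
        variance = sum(xi * xi / a for a, xi in zip(self.a_diag, x))
        return mean + self.alpha * math.sqrt(variance)

    def update(self, x, reward):
        """Additive update; sums like these can be merged across workers."""
        for i, xi in enumerate(x):
            self.a_diag[i] += xi * xi
            self.b[i] += reward * xi
```

A serving loop would score each candidate arm with `ucb(x)` and recommend the arm with the highest bound, then call `update` once feedback arrives.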

artificial intelligence, machine learning, online matching, (18 more...)

arXiv.org Artificial Intelligence

2307.15893

Country:

Europe (0.93)
North America > United States > California (0.15)
North America > Canada > British Columbia (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Improving Training Stability for Multitask Ranking Models in Recommender Systems

Tang, Jiaxi, Drori, Yoel, Chang, Daryl, Sathiamoorthy, Maheswaran, Gilmer, Justin, Wei, Li, Yi, Xinyang, Hong, Lichan, Chi, Ed H.

arXiv.org Artificial Intelligence, Jun-15-2023

Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training for such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more susceptible to training instability issues, i.e., loss divergence, which can make the model unusable, waste significant resources and block model development. In this paper, we share our findings and best practices we learned for improving the training stability of a real-world multitask ranking model for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on the causes. Furthermore, based on our observations of training dynamics near the point of training instability, we hypothesize why existing solutions would fail, and propose a new algorithm to mitigate the limitations of existing solutions. Our experiments on a YouTube production dataset show that the proposed algorithm can significantly improve training stability without compromising convergence, compared with several commonly used baseline methods.

artificial intelligence, clippy, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3580305.3599846

2302.09178

Country: North America > United States > California (0.30)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.86)

Better Generalization with Semantic IDs: A case study in Ranking for Recommendations

Singh, Anima, Vu, Trung, Keshavan, Raghunandan, Mehta, Nikhil, Yi, Xinyang, Hong, Lichan, Heldt, Lukasz, Wei, Li, Chi, Ed, Sathiamoorthy, Maheswaran

arXiv.org Artificial Intelligence, Jun-13-2023

Training good representations for items is critical in recommender models. Typically, an item is assigned a unique randomly generated ID, and is commonly represented by learning an embedding corresponding to the value of the random ID. Although widely used, this approach has limitations when the number of items is large and items are power-law distributed -- typical characteristics of real-world recommendation systems. This leads to the item cold-start problem, where the model is unable to make reliable inferences for tail and previously unseen items. Removing these ID features and their learned embeddings altogether to combat the cold-start issue severely degrades the recommendation quality. Content-based item embeddings are more reliable, but they are expensive to store and use, particularly for users' past item interaction sequences. In this paper, we use Semantic IDs, compact discrete item representations learned from content embeddings using RQ-VAE that capture the hierarchy of concepts in items. We showcase how we use them as a replacement for item IDs in a resource-constrained ranking model used in an industrial-scale video sharing platform. Moreover, we show how Semantic IDs improve the generalization ability of our system, without sacrificing top-level metrics.
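To make the idea of a hierarchical discrete ID concrete, here is a minimal sketch of residual quantization, the lookup step at the core of an RQ-VAE. It assumes the codebooks have already been learned (training the encoder, decoder, and codebooks is out of scope), and the function names and toy codebooks are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    def squared_distance(codeword):
        return sum((a - b) ** 2 for a, b in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def semantic_id(embedding, codebooks):
    """Quantize an embedding into one code per level.

    Each level quantizes what the previous levels failed to explain, so the
    first code is coarse and later codes refine it.
    """
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        index = nearest_code(residual, codebook)
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tuple(codes)


# Toy codebooks (assumed): level 1 separates two coarse clusters,
# level 2 captures small offsets around a cluster center.
CODEBOOKS = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
```

Items whose content embeddings are close share a prefix of codes, which is what would let a ranking model generalize to tail or unseen items through shared sub-ID embeddings.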

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2306.08121

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Improving Multi-Task Generalization via Regularizing Spurious Correlation

Hu, Ziniu, Zhao, Zhe, Yi, Xinyang, Yao, Tiansheng, Hong, Lichan, Sun, Yizhou, Chi, Ed H.

arXiv.org Artificial Intelligence, Nov-24-2022

Multi-Task Learning (MTL) is a powerful learning paradigm to improve generalization performance via knowledge sharing. However, existing studies find that MTL could sometimes hurt generalization, especially when two tasks are less correlated. One possible cause is spurious correlation, i.e., some knowledge is spurious and not causally related to task labels, but the model could mistakenly utilize it and thus fail when such correlation changes. In the MTL setup, spurious correlation poses several unique challenges. First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and causal knowledge for one task could be potentially spurious to the other. Second, the confounder between task labels brings in a different type of spurious correlation to MTL. We theoretically prove that MTL is more prone to taking non-causal knowledge from other tasks than single-task learning, and thus generalizes worse. To solve this problem, we propose the Multi-Task Causal Representation Learning framework, aiming to represent multi-task knowledge via disentangled neural modules, and learn which module is causally related to each task via MTL-specific invariant regularization. Experiments show that it could enhance the MTL model's performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScape, and NYUv2, by alleviating the spurious correlation problem.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

2205.09797

Country:

Europe (1.00)
North America > Canada (0.68)
North America > United States > California > Los Angeles County (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.92)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model

Ma, Jiaqi, Yi, Xinyang, Tang, Weijing, Zhao, Zhe, Hong, Lichan, Chi, Ed H., Mei, Qiaozhu

arXiv.org Machine Learning, Oct-25-2020

We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown. Given $N$ items with $M$ partitions, calculating the likelihood of data with partitioned preference under the PL model has a time complexity of $O(N + S!)$, where $S$ is the maximum size of the top $M - 1$ partitions.

The industry-scale ranking systems are typically applied to millions of items in a personalized way for billions of users. To meet the need of scalability and to exploit a huge amount of user feedback data, learning-to-rank (LTR) has been the most popular paradigm for building the ranking system. Existing LTR approaches can be categorized into three groups: pointwise (Gey, 1994), pairwise (Burges et al., 2005), and listwise (Cao et al., 2007; Taylor et al., 2008) methods. The pointwise and pairwise LTR methods convert the ranking problem into regression or classification tasks on single or pairs of items respectively.
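The factorial term in the complexity comes from summing over every within-partition ordering. The sketch below computes the Plackett-Luce probability of a full ranking and then the partitioned-preference likelihood by brute-force enumeration. It is meant only to illustrate the quantity being computed and why it is expensive; it is not the fast estimator the paper proposes, and the function names and the use of log-worth `scores` are assumptions.

```python
import itertools
import math


def pl_probability(ranking, scores):
    """Plackett-Luce probability of a full ranking.

    ranking: item ids from best to worst
    scores:  item id -> real-valued utility; the item's worth is exp(score)
    """
    weights = [math.exp(scores[item]) for item in ranking]
    remaining = sum(weights)
    probability = 1.0
    for weight in weights:
        probability *= weight / remaining   # chance this item is picked next
        remaining -= weight
    return probability


def partitioned_probability(partitions, scores):
    """Likelihood of ordered partitions with unknown order inside each one.

    Sums the PL probability over every full ranking consistent with the
    partitions, so the cost grows factorially with partition size.
    """
    total = 0.0
    orderings = (itertools.permutations(part) for part in partitions)
    for combination in itertools.product(*orderings):
        ranking = [item for order in combination for item in order]
        total += pl_probability(ranking, scores)
    return total
```

With equal scores for three items, `partitioned_probability([[0], [1, 2]], scores)` reduces to the probability that item 0 is ranked first, i.e. one third.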

artificial intelligence, exp, machine learning, (16 more...)

arXiv.org Machine Learning

2006.05067

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Self-supervised Learning for Large-scale Item Recommendations

Yao, Tiansheng, Yi, Xinyang, Cheng, Derek Zhiyuan, Yu, Felix, Chen, Ting, Menon, Aditya, Hong, Lichan, Chi, Ed H., Tjoa, Steve, Kang, Jieqi, Ettinger, Evan

arXiv.org Machine Learning, Oct-22-2020

Large-scale recommender models find the most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items, the power-law user feedback makes labels very sparse for a large number of long-tail items. Inspired by the recent success in self-supervised representation learning research in both computer vision and natural language understanding, we propose a multi-task self-supervised learning (SSL) framework for large-scale item recommendations. The framework is designed to tackle the label sparsity problem by learning more robust item representations. Furthermore, we propose two self-supervised tasks applicable to models with categorical features within the proposed framework: (i) Feature Masking (FM) and (ii) Feature Dropout (FD). We evaluate our framework using two large-scale datasets with 500M and 1B training examples respectively. Our results demonstrate that the proposed framework outperforms traditional supervised-learning-only models and state-of-the-art regularization techniques in the context of item recommendations. The SSL framework shows larger improvement with less supervision compared to its counterparts. We also apply the proposed techniques to a web-scale commercial app-to-app recommendation system, and significantly improve top-tier business metrics via A/B experiments on live traffic. Our online results also verify our hypothesis that our framework indeed improves model performance on slices that lack supervision.
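The abstract names Feature Masking but does not describe it, so the following is only a sketch of one way a masking-based augmentation for categorical features could work: a random subset of features is masked in one view and the complementary subset in the other. The complementary split, the `<MASK>` token, and the function name are all assumptions.

```python
import random

MASK = "<MASK>"  # hypothetical placeholder for a masked categorical value


def feature_masking(example, mask_ratio, rng):
    """Build two augmented views of one item by masking complementary features.

    example:    feature name -> categorical value
    mask_ratio: fraction of features hidden in the first view
    rng:        random.Random instance, passed in for reproducibility
    """
    names = sorted(example)
    hidden = set(rng.sample(names, round(len(names) * mask_ratio)))
    view_a = {k: (MASK if k in hidden else v) for k, v in example.items()}
    view_b = {k: (v if k in hidden else MASK) for k, v in example.items()}
    return view_a, view_b
```

In a contrastive setup, the two views of the same item would be encoded and pulled together while views of different items are pushed apart; that training loop is not shown here.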

deep learning, neural network, representation, (18 more...)

arXiv.org Machine Learning

2007.12865

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Fast Algorithms for Robust PCA via Gradient Descent

Yi, Xinyang, Park, Dohyung, Chen, Yudong, Caramanis, Constantine

Neural Information Processing Systems, Feb-14-2020, 15:41:32 GMT

We consider the problem of Robust PCA in the fully and partially observed settings. Without corruptions, this is the well-known matrix completion problem. From a statistical standpoint this problem has been recently well-studied, and conditions on when recovery is possible (how many observations do we need, how many corruptions can we tolerate) via polynomial-time algorithms are by now understood. This paper presents and analyzes a non-convex optimization approach that greatly reduces the computational complexity of the above problems, compared to the best available algorithms. In particular, in the fully observed case, with $r$ denoting rank and $d$ dimension, we reduce the complexity from $O(r^2 d^2 \log(1/\epsilon))$ to $O(r d^2 \log(1/\epsilon))$ -- a big savings when the rank is big.

algorithm, artificial intelligence, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

Yi, Xinyang, Wang, Zhaoran, Yang, Zhuoran, Caramanis, Constantine, Liu, Han

arXiv.org Machine Learning, Jul-14-2019

We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1- {\alpha}$. Although there exist numerous algorithms for this problem, it remains theoretically unexplored how the statistical accuracies and computational efficiency of these algorithms depend on the degree of supervision, which is quantified by ${\alpha}$. In this paper, we characterize the effect of ${\alpha}$ by establishing the information-theoretic and computational boundaries, namely, the minimax-optimal statistical accuracy that can be achieved by all algorithms, and polynomial-time algorithms under an oracle computational model. For small ${\alpha}$, our result shows a gap between these two boundaries, which represents the computational price of achieving the information-theoretic boundary due to the lack of supervision. Interestingly, we also show that this gap narrows as ${\alpha}$ increases. In other words, having more supervision, i.e., more correct labels, not only improves the optimal statistical accuracy as expected, but also enhances the computational efficiency for achieving such accuracy.
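The label-noise model in this abstract is simple to simulate. The sketch below flips each binary label independently with probability $1-\alpha$, so that $\alpha$ acts as the degree of supervision; the function name and the use of labels in {-1, +1} are assumptions made for illustration.

```python
import random


def flip_labels(labels, alpha, rng):
    """Return noisy labels: each label is flipped with probability 1 - alpha.

    labels: sequence of -1 / +1 labels
    alpha:  degree of supervision; a label is kept with probability alpha
    rng:    random.Random instance, passed in for reproducibility
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [-y if rng.random() < 1.0 - alpha else y for y in labels]
```

With `alpha = 1.0` the labels are returned unchanged, and as `alpha` decreases a growing fraction of them is corrupted, which is the regime where the abstract reports a gap between the statistical and computational boundaries.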

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1907.06257

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

Yi, Xinyang, Wang, Zhaoran, Yang, Zhuoran, Caramanis, Constantine, Liu, Han

Neural Information Processing Systems, Dec-31-2016

We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1-\alpha$. Although there exist numerous algorithms for this problem, it remains theoretically unexplored how the statistical accuracies and computational efficiency of these algorithms depend on the degree of supervision, which is quantified by $\alpha$. In this paper, we characterize the effect of $\alpha$ by establishing the information-theoretic and computational boundaries, namely, the minimax-optimal statistical accuracy that can be achieved by all algorithms, and polynomial-time algorithms under an oracle computational model. For small $\alpha$, our result shows a gap between these two boundaries, which represents the computational price of achieving the information-theoretic boundary due to the lack of supervision. Interestingly, we also show that this gap narrows as $\alpha$ increases. In other words, having more supervision, i.e., more correct labels, not only improves the optimal statistical accuracy as expected, but also enhances the computational efficiency for achieving such accuracy.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
