AITopics | Kong, Yuqing

Kong, Yuqing

Mitigating the Participation Bias by Balancing Extreme Ratings

Guo, Yongkang, Kong, Yuqing, Liu, Jialiang

arXiv.org Artificial Intelligence, Feb-5-2025

Rating aggregation plays a crucial role in various fields, such as product recommendations, hotel rankings, and teaching evaluations. However, traditional averaging methods can be affected by participation bias, where some raters do not participate in the rating process, leading to potential distortions. In this paper, we consider a robust rating aggregation task under participation bias. We assume that raters may withhold their ratings with a probability that depends on their individual ratings, resulting in partially observed samples. Our goal is to minimize the expected squared loss between the aggregated rating and the average of all underlying ratings (possibly unobserved) in the worst-case scenario. We focus on two settings based on whether the sample size (i.e., the number of raters) is known. In the first setting, where the sample size is known, we propose an aggregator named the Balanced Extremes Aggregator. It estimates unrevealed ratings with a balanced combination of extreme ratings. When the sample size is unknown, we derive another aggregator, the Polarizing-Averaging Aggregator, which becomes optimal as the sample size grows to infinity. Numerical results demonstrate the superiority of our proposed aggregators in mitigating participation bias, compared to simple averaging and the spectral method. Furthermore, we validate the effectiveness of our aggregators on a real-world dataset.
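The idea of imputing unrevealed ratings from the extremes can be illustrated with a minimal Python sketch. It assumes a 1-to-5 rating scale, a known number of raters `n_total`, and equal weight on the lowest and highest rating; the function name `balanced_extremes_mean` and the parameter `weight_hi` are hypothetical, and the equal weighting is an illustrative assumption rather than the weighting derived in the paper.

```python
def balanced_extremes_mean(observed, n_total, lo=1.0, hi=5.0, weight_hi=0.5):
    """Average rating after imputing each unrevealed rating from the extremes.

    Each missing rating is replaced by a mix of the lowest and highest
    possible rating. weight_hi=0.5 (an even mix) is an assumption made for
    illustration only.
    """
    n_missing = n_total - len(observed)
    if n_total <= 0 or n_missing < 0:
        raise ValueError("n_total must be positive and >= number of observed ratings")
    fill = weight_hi * hi + (1.0 - weight_hi) * lo
    return (sum(observed) + n_missing * fill) / n_total


# Five of ten raters revealed their ratings; satisfied raters are over-represented.
observed = [5, 5, 4, 5, 1]
naive = sum(observed) / len(observed)            # simple average of revealed ratings
robust = balanced_extremes_mean(observed, n_total=10)
print(naive, robust)
```

In this toy case the simple average only reflects the raters who chose to participate, while the imputed average is pulled toward the midpoint of the scale in proportion to the share of missing ratings.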

aggregator, artificial intelligence, probability, (16 more...)

2502.03737

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Information Technology > Services (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Benchmarking LLMs' Judgments with No Gold Standard

Xu, Shengwei, Lu, Yuxuan, Schoenebeck, Grant, Kong, Yuqing

arXiv.org Artificial Intelligence, Nov-11-2024

We introduce GEM (Generative Estimator for Mutual Information), an evaluation metric for assessing language generation by Large Language Models (LLMs), particularly in generating informative judgments, without the need for a gold standard reference. GEM broadens the scenarios where we can benchmark LLM generation performance, from traditional ones, like machine translation and summarization, where gold standard references are readily available, to subjective tasks without clear gold standards, such as academic peer review. GEM uses a generative model to estimate mutual information between candidate and reference responses, without requiring the reference to be a gold standard. In experiments on a human-annotated dataset, GEM demonstrates competitive correlations with human scores compared to the state-of-the-art GPT-4o Examiner, and outperforms all other baselines. Additionally, GEM is more robust against strategic manipulations, such as rephrasing or elongation, which can artificially inflate scores under a GPT-4o Examiner. We also present GRE-bench (Generating Review Evaluation Benchmark), which evaluates LLMs based on how well they can generate high-quality peer reviews for academic research papers. Because GRE-bench is based upon GEM, it inherits GEM's robustness properties. Additionally, GRE-bench circumvents data contamination (or data leakage) problems by using the continuous influx of new open-access research papers and peer reviews each year. We show GRE-bench results for various popular LLMs on their peer review capabilities using the ICLR 2023 dataset.

information, large language model, machine learning, (18 more...)

2411.07127

Country: North America > United States > Michigan (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Surprising Benefits of Base Rate Neglect in Robust Aggregation

Kong, Yuqing, Wang, Shu, Wang, Ying

arXiv.org Artificial Intelligence, Jun-19-2024

Robust aggregation integrates predictions from multiple experts without knowledge of the experts' information structures. Prior work assumes experts are Bayesian, providing predictions as perfect posteriors based on their signals. However, real-world experts often deviate systematically from Bayesian reasoning. Our work considers experts who tend to ignore the base rate. We find that a certain degree of base rate neglect helps with robust forecast aggregation. Specifically, we consider a forecast aggregation problem with two experts who each predict a binary world state after observing private signals. Unlike previous work, we model experts exhibiting base rate neglect, where they incorporate the base rate information to degree $\lambda\in[0,1]$, with $\lambda=0$ indicating that the base rate is ignored completely and $\lambda=1$ indicating perfect Bayesian updating. To evaluate aggregators' performance, we adopt the worst-case regret model of Arieli et al. (2018), which measures the maximum regret across the set of considered information structures compared to an omniscient benchmark. Our results reveal a surprising V-shape of regret as a function of $\lambda$. That is, predictions that incorporate the base rate to an intermediate degree $\lambda<1$ can counter-intuitively lead to lower regret than perfect Bayesian posteriors with $\lambda=1$. We additionally propose a new low-regret aggregator that is robust to unknown $\lambda$. Finally, we conduct an empirical study to test the base rate neglect model and evaluate the performance of various aggregators.
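The role of $\lambda$ can be made concrete with a small Python sketch. The abstract does not give the functional form, so the sketch assumes one common parametrization of base rate neglect, in which the prior is raised to the power $\lambda$ before being combined with the signal likelihoods; the function name `neglect_forecast` and the numbers are hypothetical.

```python
def neglect_forecast(prior, lik_given_1, lik_given_0, lam):
    """Forecast of state 1 when the base rate is incorporated to degree lam.

    Assumed form (not confirmed by the abstract):
        forecast is proportional to prior**lam * P(signal | state).
    lam = 1 recovers Bayes' rule; lam = 0 ignores the base rate entirely.
    """
    w1 = (prior ** lam) * lik_given_1
    w0 = ((1.0 - prior) ** lam) * lik_given_0
    return w1 / (w1 + w0)


# Base rate of 20%; the observed signal is twice as likely under state 1.
bayes = neglect_forecast(0.2, 0.8, 0.4, lam=1.0)    # full Bayesian update
ignore = neglect_forecast(0.2, 0.8, 0.4, lam=0.0)   # base rate ignored
partial = neglect_forecast(0.2, 0.8, 0.4, lam=0.5)  # intermediate neglect
print(bayes, partial, ignore)
```

Under this assumed form the forecast moves monotonically from the likelihood-only value toward the Bayesian posterior as `lam` goes from 0 to 1.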

artificial intelligence, bayesian inference, machine learning, (13 more...)

2406.13490

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Eliciting Informative Text Evaluations with Large Language Models

Lu, Yuxuan, Xu, Shengwei, Zhang, Yichi, Kong, Yuqing, Schoenebeck, Grant

arXiv.org Artificial Intelligence, May-28-2024

Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, such as multiple-choice answers or scalar numbers. We aim to broaden these techniques to the larger domain of text-based reports, drawing on recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms, as textual feedback is the norm in a large variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media. We introduce two mechanisms, the Generative Peer Prediction Mechanism (GPPM) and the Generative Synopsis Peer Prediction Mechanism (GSPPM). These mechanisms utilize LLMs as predictors, mapping from one agent's report to a prediction of her peer's report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium. Empirically, we confirm the efficacy of our mechanisms through experiments conducted on two real datasets: the Yelp review dataset and the ICLR OpenReview dataset. Notably, on the ICLR dataset, our mechanisms can differentiate three quality levels in terms of expected scores: human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews. Additionally, GSPPM penalizes LLM-generated reviews more effectively than GPPM.

large language model, machine learning, mechanism, (19 more...)

2405.15077

Country: North America > United States > Michigan (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Robust Decision Aggregation with Adversarial Experts

Guo, Yongkang, Kong, Yuqing

arXiv.org Artificial Intelligence, Mar-12-2024

We consider a binary decision aggregation problem in the presence of both truthful and adversarial experts. The truthful experts report their private signals truthfully given proper incentives, while the adversarial experts can report arbitrarily. The decision maker needs to design a robust aggregator to forecast the true state of the world based on the experts' reports. The decision maker does not know the specific information structure, which is a joint distribution of signals, states, and strategies of adversarial experts. We want to find the optimal aggregator, the one that minimizes regret under the worst information structure. The regret is defined as the difference in expected loss between the aggregator and a benchmark that makes the optimal decision given the joint distribution and the reports of truthful experts. We prove that when the truthful experts are symmetric and adversarial experts are not too numerous, the truncated mean is optimal: we remove some of the lowest and highest reports and average the remaining reports. Moreover, for many settings, the optimal aggregators are in the family of piecewise linear functions. The regret is independent of the total number of experts and depends only on the ratio of adversaries. We evaluate our aggregators through numerical experiments on an ensemble learning task. We also obtain some negative results for the aggregation problem with adversarial experts under some more general information structures and report spaces.
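The truncated mean described above can be sketched in a few lines of Python. The function name `truncated_mean` and the trimming count `k` are hypothetical; how `k` should be chosen from the share of adversaries is not stated in the abstract, so the value used below is only an illustration.

```python
def truncated_mean(reports, k):
    """Drop the k lowest and k highest reports, then average the rest."""
    if k < 0 or 2 * k >= len(reports):
        raise ValueError("k must satisfy 0 <= 2*k < len(reports)")
    kept = sorted(reports)[k:len(reports) - k]
    return sum(kept) / len(kept)


# Three truthful reports plus two extreme reports that may be adversarial.
reports = [0.6, 0.7, 0.8, 0.0, 1.0]
plain = sum(reports) / len(reports)
trimmed = truncated_mean(reports, k=1)
print(plain, trimmed)
```

Because trimming is symmetric, a small number of arbitrary reports at either end of the range cannot move the aggregate beyond the span of the remaining reports.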

artificial intelligence, machine learning, natural language, (15 more...)

2403.08222

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Algorithmic Robust Forecast Aggregation

Guo, Yongkang, Hartline, Jason D., Huang, Zhihuan, Kong, Yuqing, Shah, Anant, Yu, Fang-Yi

arXiv.org Artificial Intelligence, Jan-31-2024

Forecast aggregation combines the predictions of multiple agents into a more accurate prediction. With forecast aggregation, decision-makers can reduce error, diversify risk, and enhance accuracy based on the collective knowledge of agents compared to any single agent, thereby advancing the common good. Forecast aggregation is commonly used in many domains to generate more informed predictions for various variables, such as weather in weather forecasting, the spread of infectious diseases in public health, the outcome of games in sports, fuel prices in energy, and GDP growth in economics. In practice, one crucial challenge of forecast aggregation is that the aggregator may not have full knowledge of the information structure and the agents. Without this prior knowledge, the aggregator cannot employ Bayes' rule to combine the forecasts optimally. Traditional prior-free aggregation methods, such as simple averaging, perform especially poorly on some information structures. For example, in weather forecasting, assume the prior probability of rain tomorrow is 30%, and there are two agents who each receive a conditionally independent binary signal (Low or High). Agents report their posterior, which is 10% given the Low signal and 50% given the High signal. When both agents report 50%, simple averaging will also output 50%.
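The gap between simple averaging and Bayesian aggregation in the weather example can be checked numerically. The Python sketch below uses only the numbers given in the abstract (a 30% prior and posteriors of 10% and 50%) and assumes the two signals are identically distributed and conditionally independent given the state; the function names are hypothetical.

```python
def signal_likelihoods(prior, post_high, post_low):
    """Recover P(High | rain) and P(High | no rain) from the reported posteriors."""
    # Law of total probability: prior = q * post_high + (1 - q) * post_low,
    # where q is the marginal probability of the High signal.
    q_high = (prior - post_low) / (post_high - post_low)
    p_high_rain = post_high * q_high / prior
    p_high_dry = (1.0 - post_high) * q_high / (1.0 - prior)
    return p_high_rain, p_high_dry


def posterior_two_high(prior, post_high, post_low):
    """Bayesian posterior of rain after two conditionally independent High signals."""
    p_high_rain, p_high_dry = signal_likelihoods(prior, post_high, post_low)
    rain = prior * p_high_rain ** 2
    dry = (1.0 - prior) * p_high_dry ** 2
    return rain / (rain + dry)


average = (0.5 + 0.5) / 2
bayes = posterior_two_high(prior=0.3, post_high=0.5, post_low=0.1)
print(average, bayes)
```

Under these assumptions the Bayesian posterior after two High signals is about 0.70, well above the 0.50 produced by simple averaging, because two independent pieces of evidence pointing the same way should reinforce each other.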

artificial intelligence, bayesian inference, machine learning, (18 more...)

2401.17743

Country: North America > United States (0.27)

Genre: Research Report (0.49)

Industry: Health & Medicine (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Robust Decision Aggregation with Second-order Information

Pan, Yuqi, Chen, Zhaohua, Kong, Yuqing

arXiv.org Artificial Intelligence, Nov-23-2023

We consider a decision aggregation problem with two experts who each make a binary recommendation after observing a private signal about an unknown binary world state. An agent, who does not know the joint information structure between signals and states, sees the experts' recommendations and aims to match the action with the true state. In this scenario, we study whether additionally supplying second-order information (each expert's forecast of the other's recommendation) enables better aggregation. We adopt a minimax regret framework to evaluate the aggregator's performance, comparing it to an omniscient benchmark that knows the joint information structure. With general information structures, we show that second-order information provides no benefit: no aggregator can improve over a trivial aggregator that always follows the first expert's recommendation. However, positive results emerge when we assume experts' signals are conditionally independent given the world state. First, when the aggregator is deterministic, we present a robust aggregator that leverages second-order information and can significantly outperform counterparts without it. Second, when the two experts are homogeneous, by adding a non-degeneracy assumption on the signals, we demonstrate that random aggregators using second-order information can surpass optimal ones without it. In the remaining settings, second-order information is not beneficial. We also extend the above results to the setting where the aggregator's utility function is more general.

aggregator, artificial intelligence, machine learning, (20 more...)

2311.14094

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Near-Optimal Experimental Design Under the Budget Constraint in Online Platforms

Guo, Yongkang, Yuan, Yuan, Zhang, Jinshan, Kong, Yuqing, Zhu, Zhihua, Cai, Zheng

arXiv.org Artificial Intelligence, Feb-9-2023

A/B testing, or controlled experimentation, is the gold standard approach to causally compare the performance of algorithms on online platforms. However, conventional Bernoulli randomization in A/B testing faces many challenges, such as spillover and carryover effects. Our study focuses on another challenge, one that is especially relevant for A/B testing on two-sided platforms: budget constraints. Buyers on two-sided platforms often have limited budgets, which can make conventional A/B testing infeasible, partly because two variants of allocation algorithms may conflict and lead some buyers to exceed their budgets if they are implemented simultaneously. We develop a model to describe two-sided platforms where buyers have limited budgets. We then provide an optimal experimental design that guarantees small bias and minimum variance. Bias is lower when there is more budget and a higher supply-demand rate. We test our experimental design on both synthetic and real-world data; the results verify the theoretical findings and show the design's advantage over Bernoulli randomization.

artificial intelligence, experiment, platform, (14 more...)

2302.05005

Country: North America > United States (0.30)

Genre:

Research Report > Experimental Study (0.87)
Research Report > Strength High (0.55)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Communications (0.93)

L_DMI: An Information-theoretic Noise-robust Loss Function

Xu, Yilun, Cao, Peng, Kong, Yuqing, Wang, Yizhou

arXiv.org Machine Learning, Sep-8-2019

Accurately annotating large-scale datasets is notoriously expensive in both time and money. Although acquiring datasets with low-quality annotations can be much cheaper, using such datasets without particular treatment often badly damages the performance of trained models. Various methods have been proposed for learning with noisy labels. However, they only handle limited kinds of noise patterns, require auxiliary information (e.g., the noise transition matrix), or lack theoretical justification. In this paper, we propose a novel information-theoretic loss function, $\mathcal{L}_{\rm DMI}$, for training deep neural networks robust to label noise. The core of $\mathcal{L}_{\rm DMI}$ is a generalized version of mutual information, termed Determinant based Mutual Information (DMI), which is not only information-monotone but also relatively invariant. To the best of our knowledge, $\mathcal{L}_{\rm DMI}$ is the first loss function that is provably not sensitive to noise patterns and noise amounts, and it can be applied to any existing classification neural network straightforwardly without any auxiliary information. In addition to theoretical justification, we also empirically show that using $\mathcal{L}_{\rm DMI}$ outperforms all other counterparts in the classification task on the Fashion-MNIST, CIFAR-10, and Dogs vs. Cats datasets with a variety of synthesized noise patterns and noise amounts, as well as on the real-world dataset Clothing1M. Code is available at https://github.com/Newbeeer/L_DMI
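A determinant-based loss of this kind can be sketched in plain Python. The abstract does not state the formula, so the sketch assumes the loss is $-\log|\det(U)|$, where $U$ is the empirical joint matrix between the classifier's predicted class probabilities and the noisy labels; the helper names `determinant` and `dmi_loss` and the small constant `eps` are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0  # singular matrix
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def dmi_loss(probs, noisy_labels, num_classes, eps=1e-12):
    """Assumed form: -log|det(U)|, with U[c][y] the mean predicted probability
    of class c over the samples, restricted to samples whose noisy label is y."""
    n = len(probs)
    joint = [[0.0] * num_classes for _ in range(num_classes)]
    for p, y in zip(probs, noisy_labels):
        for c in range(num_classes):
            joint[c][y] += p[c] / n
    return -math.log(abs(determinant(joint)) + eps)


labels = [0, 0, 1, 1]
informative = dmi_loss([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]], labels, 2)
uninformative = dmi_loss([[0.5, 0.5]] * 4, labels, 2)
print(informative, uninformative)
```

In this toy case the classifier whose outputs track the labels receives a much lower loss than the one that outputs the same distribution for every sample, since the joint matrix of the latter is singular.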

deep learning, dmi, neural network, (18 more...)

1909.03388

Country: North America > Canada (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds

Cao, Peng, Xu, Yilun, Kong, Yuqing, Wang, Yizhou

arXiv.org Machine Learning, May-31-2019

Eliciting labels from crowds is a potential way to obtain large labeled datasets. Despite a variety of methods developed for learning from crowds, a key challenge remains unsolved: learning from crowds without knowing the information structure among the crowds a priori, when some members of the crowd make highly correlated mistakes and some label without effort (e.g., randomly). We propose an information-theoretic approach, Max-MIG, for joint learning from crowds, with a common assumption: the crowdsourced labels and the data are independent conditioned on the ground truth. Max-MIG simultaneously aggregates the crowdsourced labels and learns an accurate data classifier. Furthermore, we devise an accurate data-crowds forecaster that employs both the data and the crowdsourced labels to forecast the ground truth. To the best of our knowledge, this is the first algorithm that solves the aforementioned challenge of learning from crowds. In addition to the theoretical validation, we also empirically show that our algorithm achieves new state-of-the-art results in most settings, including on real-world data, and is the first algorithm that is robust to various information structures. Code is available at https://github.com/Newbeeer/Max-MIG

crowdsourced label, health & medicine, neural network, (17 more...)

1905.13436

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
