Feng, Zhe
Online Bidding under RoS Constraints without Knowing the Value
Vijayan, Sushant, Feng, Zhe, Padmanabhan, Swati, Shanmugam, Karthikeyan, Suggala, Arun, Wang, Di
This introduces a Online advertising, a multi-billion dollar industry, relies on realtime challenging exploration-exploitation dilemma: the advertiser must auctions to connect advertisers with users. These auctions, balance exploring different bids to estimate impression values with triggered by user queries or website visits, allow advertisers to exploiting current knowledge to bid effectively. To address this, we bid for advertising slots, such as prominent placements on search propose a novel Upper Confidence Bound (UCB)-style algorithm engine results pages or in social media feeds. Advertisers aim to that carefully manages this trade-off. Via a rigorous theoretical maximize their returns, measured in conversions or other relevant analysis, we prove that our algorithm achieves ( log(|B|)) metrics, by carefully determining their bids while adhering to regret and constraint violation, where is the number of bidding budget constraints and desired return-on-spend (RoS) targets. To rounds and B is the domain of possible bids. This establishes the achieve this, a wide array of bidding strategies have been developed, first optimal regret and constraint violation bounds for bidding in leveraging techniques from optimization, online learning, and game the online setting with unknown impression values. Moreover, our theory to maximize advertiser utility [2, 7, 11, 23, 29, 30, 36, 47, 52].
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos
Wei, Kangda, Zhou, Zhengyu, Wang, Bingqing, Araki, Jun, Lange, Lukas, Huang, Ruihong, Feng, Zhe
In recent years, online lecture videos have become an increasingly popular resource for acquiring new knowledge. Systems capable of effectively understanding/indexing lecture videos are thus highly desirable, enabling downstream tasks like question answering to help users efficiently locate specific information within videos. This work proposes PreMind, a novel multi-agent multimodal framework that leverages various large models for advanced understanding/indexing of presentation-style videos. PreMind first segments videos into slide-presentation segments using a Vision-Language Model (VLM) to enhance modern shot-detection techniques. Each segment is then analyzed to generate multimodal indexes through three key steps: (1) extracting slide visual content, (2) transcribing speech narratives, and (3) consolidating these visual and speech contents into an integrated understanding. Three innovative mechanisms are also proposed to improve performance: leveraging prior lecture knowledge to refine visual understanding, detecting/correcting speech transcription errors using a VLM, and utilizing a critic agent for dynamic iterative self-reflection in vision analysis. Compared to traditional video indexing methods, PreMind captures rich, reliable multimodal information, allowing users to search for details like abbreviations shown only on slides. Systematic evaluations on the public LPM dataset and an internal enterprise dataset are conducted to validate PreMind's effectiveness, supported by detailed analyses.
Deep Reinforcement Learning for Sequential Combinatorial Auctions
Ravindranath, Sai Srivatsa, Feng, Zhe, Wang, Di, Zaheer, Manzil, Mehta, Aranyak, Parkes, David C.
Revenue-optimal auction design is a challenging problem with significant theoretical and practical implications. Sequential auction mechanisms, known for their simplicity and strong strategyproofness guarantees, are often limited by theoretical results that are largely existential, except for certain restrictive settings. Although traditional reinforcement learning methods such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are applicable in this domain, they struggle with computational demands and convergence issues when dealing with large and continuous action spaces. In light of this and recognizing that we can model transitions differentiable for our settings, we propose using a new reinforcement learning framework tailored for sequential combinatorial auctions that leverages first-order gradients. Our extensive evaluations show that our approach achieves significant improvement in revenue over both analytical baselines and standard reinforcement learning algorithms. Furthermore, we scale our approach to scenarios involving up to 50 agents and 50 items, demonstrating its applicability in complex, real-world auction settings. As such, this work advances the computational tools available for auction design and contributes to bridging the gap between theoretical results and practical implementations in sequential auction design.
How to Strategize Human Content Creation in the Era of GenAI?
Esmaeili, Seyed A., Bhawalkar, Kshipra, Feng, Zhe, Wang, Di, Xu, Haifeng
Generative AI (GenAI) will have significant impact on content creation platforms. In this paper, we study the dynamic competition between a GenAI and a human contributor. Unlike the human, the GenAI's content only improves when more contents are created by human over the time; however, GenAI has the advantage of generating content at a lower cost. We study the algorithmic problem in this dynamic competition model about how the human contributor can maximize her utility when competing against the GenAI for content generation over a set of topics. In time-sensitive content domains (e.g., news or pop music creation) where contents' value diminishes over time, we show that there is no polynomial time algorithm for finding the human's optimal (dynamic) strategy, unless the randomized exponential time hypothesis is false. Fortunately, we are able to design a polynomial time algorithm that naturally cycles between myopically optimizing over a short time window and pausing and provably guarantees an approximation ratio of $\frac{1}{2}$. We then turn to time-insensitive content domains where contents do not lose their value (e.g., contents on history facts). Interestingly, we show that this setting permits a polynomial time algorithm that maximizes the human's utility in the long run.
Improved Online Learning Algorithms for CTR Prediction in Ad Auctions
Feng, Zhe, Liaw, Christopher, Zhou, Zixin
In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner. We focus on two models of the advertisers' strategic behaviors. First, we assume that the advertiser is completely myopic; i.e.~in each round, they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst-case and negative regret when the values are static across all the auctions and there is a gap between the highest expected value (i.e.~value multiplied by their CTR) and second highest expected value ad. Next, we assume that the advertiser is non-myopic and cares about their long term utility. This setting is much more complex since an advertiser is incentivized to influence the mechanism by bidding strategically in earlier rounds. In this setting, we provide an algorithm to achieve negative regret for the static valuation setting (with a positive gap), which is in sharp contrast with the prior work that shows $O(T^{2/3})$ regret when the valuation is generated by adversary.
DelucionQA: Detecting Hallucinations in Domain-specific Question Answering
Sadat, Mobashir, Zhou, Zhengyu, Lange, Lukas, Araki, Jun, Gundroo, Arsalan, Wang, Bingqing, Menon, Rakesh R, Parvez, Md Rizwan, Feng, Zhe
Hallucination is a well-known phenomenon in text generated by large language models (LLMs). The existence of hallucinatory responses is found in almost all application scenarios e.g., summarization, question-answering (QA) etc. For applications requiring high reliability (e.g., customer-facing assistants), the potential existence of hallucination in LLM-generated text is a critical problem. The amount of hallucination can be reduced by leveraging information retrieval to provide relevant background information to the LLM. However, LLMs can still generate hallucinatory content for various reasons (e.g., prioritizing its parametric knowledge over the context, failure to capture the relevant information from the context, etc.). Detecting hallucinations through automated methods is thus paramount. To facilitate research in this direction, we introduce a sophisticated dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs for a domain-specific QA task. Furthermore, we propose a set of hallucination detection methods to serve as baselines for future works from the research community. Analysis and case study are also provided to share valuable insights on hallucination phenomena in the target scenario.
Learning Thresholds with Latent Values and Censored Feedback
Zhang, Jiahao, Lin, Tao, Zheng, Weiqiang, Feng, Zhe, Teng, Yifeng, Deng, Xiaotie
In this paper, we investigate a problem of actively learning threshold in latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and latent value $v$ and it can be $only$ achieved if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignments in crowdsourcing, setting recruiting bars in hiring, etc. We first characterize the query complexity of learning a threshold with the expected reward at most $\epsilon$ smaller than the optimum and prove that the number of queries needed can be infinitely large even when $g(\gamma, v)$ is monotone with respect to both $\gamma$ and $v$. On the positive side, we provide a tight query complexity $\tilde{\Theta}(1/\epsilon^3)$ when $g$ is monotone and the CDF of value distribution is Lipschitz. Moreover, we show a tight $\tilde{\Theta}(1/\epsilon^3)$ query complexity can be achieved as long as $g$ satisfies one-sided Lipschitzness, which provides a complete characterization for this problem. Finally, we extend this model to an online learning setting and demonstrate a tight $\Theta(T^{2/3})$ regret bound using continuous-arm bandit techniques and the aforementioned query complexity results.
Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
Wang, Chaoqi, Ye, Ziyu, Feng, Zhe, Badanidiyuru, Ashwinkumar, Xu, Haifeng
Standard contextual bandit problem assumes that all the relevant contexts are observed before the algorithm chooses an arm. This modeling paradigm, while useful, often falls short when dealing with problems in which valuable additional context can be observed after arm selection. For example, content recommendation platforms like Youtube, Instagram, Tiktok also observe valuable follow-up information pertinent to the user's reward after recommendation (e.g., how long the user stayed, what is the user's watch speed, etc.). To improve online learning efficiency in these applications, we study a novel contextual bandit problem with post-serving contexts and design a new algorithm, poLinUCB, that achieves tight regret under standard assumptions. Core to our technical proof is a robustified and generalized version of the well-known Elliptical Potential Lemma (EPL), which can accommodate noise in data. Such robustification is necessary for tackling our problem, and we believe it could also be of general interest. Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts as well as the superior performance of our algorithm over the state-of-the-art approaches.
Knowledge-grounded Natural Language Recommendation Explanation
Colas, Anthony, Araki, Jun, Zhou, Zhengyu, Wang, Bingqing, Feng, Zhe
Explanations accompanied by a recommendation can assist users in understanding the decision made by recommendation systems, which in turn increases a user's confidence and trust in the system. Recently, research has focused on generating natural language explanations in a human-readable format. Thus far, the proposed approaches leverage item reviews written by users, which are often subjective, sparse in language, and unable to account for new items that have not been purchased or reviewed before. Instead, we aim to generate fact-grounded recommendation explanations that are objectively described with item features while implicitly considering a user's preferences, based on the user's purchase history. To achieve this, we propose a knowledge graph (KG) approach to natural language explainable recommendation. Our approach draws on user-item features through a novel collaborative filtering-based KG representation to produce fact-grounded, personalized explanations, while jointly learning user-item representations for recommendation scoring. Experimental results show that our approach consistently outperforms previous state-of-the-art models on natural language explainable recommendation.
Exploring the Dynamics of the Specialty Insurance Market Using a Novel Discrete Event Simulation Framework: a Lloyd's of London Case Study
Olmez, Sedar, Ahmed, Akhil, Kam, Keith, Feng, Zhe, Tua, Alan
This research presents a novel Discrete Event Simulation (DES) of the Lloyd's of London specialty insurance market, exploring complex market dynamics that have not been previously studied quantitatively. The proof-of-concept model allows for the simulation of various scenarios that capture important market phenomena such as the underwriting cycle, the impact of risk syndication, and the importance of appropriate exposure management. Despite minimal calibration, our model has shown that it is a valuable tool for understanding and analysing the Lloyd's of London specialty insurance market, particularly in terms of identifying areas for further investigation for regulators and participants of the market alike. The results generate the expected behaviours that, syndicates (insurers) are less likely to go insolvent if they adopt sophisticated exposure management practices, catastrophe events lead to more defined patterns of cyclicality and cause syndicates to substantially increase their premiums offered. Lastly, syndication enhances the accuracy of actuarial price estimates and narrows the divergence among syndicates. Overall, this research offers a new perspective on the Lloyd's of London market and demonstrates the potential of individual-based modelling (IBM) for understanding complex financial systems.