
770f8e448d07586afbf77bb59f698587-AuthorFeedback.pdf

Neural Information Processing Systems

Thank you for your thoughtful feedback. We will first discuss common themes and then specific reviewer comments. Even though ExpO is "simple" (in that it connects existing concepts, albeit in a novel way), we believe . . . We will add a discussion as outlined below. The work ". . ." by Qin et al. does not consider interpretability at all. Several methods rely on domain knowledge: "Learning credible . . .


'Catalyst for progress': Nvidia CEO hails China's AI at Beijing expo

Al Jazeera

Nvidia CEO Jensen Huang has called China's open-source artificial intelligence a "catalyst for global progress" and says it is "revolutionising" supply chains. In a speech during Wednesday's opening ceremony of the China International Supply Chain Expo in Beijing, Huang – whose firm last week became the first to touch $4 trillion in market value – hailed China's role in pioneering AI, describing Chinese AI startup DeepSeek as "giving every country and industry a chance to join the AI revolution". Huang made the comments a day after Nvidia announced it will resume sales of its H20 AI chips to China after the United States government pledged to remove licensing restrictions that had halted exports. "AI is transforming every industry from scientific research and healthcare to energy, transportation and logistics," said Huang, who also praised China's "super-fast" innovation, powered by its "researchers, developers and entrepreneurs". The California-based company produces some of the world's most advanced semiconductors but cannot ship its most cutting-edge chips to China due to Washington's concerns that Beijing could use them to enhance its military capabilities.


EXPO: Stable Reinforcement Learning with Expressive Policies

Dong, Perry, Li, Qiyang, Sadigh, Dorsa, Finn, Chelsea

arXiv.org Artificial Intelligence

We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike the simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead constructing an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies -- a larger expressive base policy trained with a stable imitation learning objective and a lightweight Gaussian edit policy that edits the actions sampled from the base policy toward a higher-value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods, both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.
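The on-the-fly action selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `q_value`, `base_policy`, and `edit_policy` are hypothetical stand-ins for the learned Q-function, the expressive base policy, and the Gaussian edit policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_value(state, action):
    # Stand-in for a learned Q-function: higher when the action is near -state.
    return -np.sum((action + state) ** 2)

def base_policy(state):
    # Stand-in for an expressive (e.g. diffusion) policy trained by imitation.
    return np.tanh(state) + 0.1 * rng.standard_normal(state.shape)

def edit_policy(state, action):
    # Lightweight Gaussian edit: a small perturbation of the base action.
    return action + 0.05 * rng.standard_normal(action.shape)

def on_the_fly_action(state):
    """Sample a base action, edit it, and keep whichever scores higher under Q."""
    a_base = base_policy(state)
    a_edit = edit_policy(state, a_base)
    candidates = [a_base, a_edit]
    q_vals = [q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(q_vals))]

state = np.array([0.5, -0.2])
action = on_the_fly_action(state)
```

The same argmax-over-candidates step would, per the abstract, be used both when sampling actions in the environment and when computing TD backup targets.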


Explicit Preference Optimization: No Need for an Implicit Reward Model

Hu, Xiangkun, Kong, Lemin, He, Tong, Wipf, David

arXiv.org Machine Learning

The generated responses of large language models (LLMs) are often fine-tuned to human preferences through a process called reinforcement learning from human feedback (RLHF). As RLHF relies on a challenging training sequence, whereby a separate reward model is independently learned and then later applied to LLM policy updates, ongoing research effort has targeted more straightforward alternatives. In this regard, direct preference optimization (DPO) and its many offshoots circumvent the need for a separate reward training step. Instead, through the judicious use of a reparameterization trick that induces an \textit{implicit} reward, DPO and related methods consolidate learning to the minimization of a single loss function. And yet despite demonstrable success in some real-world settings, we prove that DPO-based objectives are nonetheless subject to sub-optimal regularization and counter-intuitive interpolation behaviors, underappreciated artifacts of the reparameterizations upon which they are based. To this end, we introduce an \textit{explicit} preference optimization framework termed EXPO that requires no analogous reparameterization to achieve an implicit reward. Quite differently, we merely posit intuitively-appealing regularization factors from scratch that transparently avoid the potential pitfalls of key DPO variants, provably satisfying regularization desiderata that prior methods do not. Empirical results serve to corroborate our analyses and showcase the efficacy of EXPO.
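For context on the reparameterization the abstract contrasts against, the standard DPO objective can be sketched as below. The loss depends on the policy only through an implicit reward beta * log(pi/pi_ref); the function name and toy log-probabilities here are illustrative, and this is DPO, not the paper's EXPO objective.

```python
import math

def dpo_loss(logp_w, logp_l, logp_ref_w, logp_ref_l, beta=0.1):
    """DPO pairwise loss: the implicit reward is beta * log(pi / pi_ref).

    logp_w / logp_l: policy log-probs of the preferred / dispreferred response.
    logp_ref_*: the same quantities under the frozen reference model.
    """
    reward_w = beta * (logp_w - logp_ref_w)   # implicit reward of winner
    reward_l = beta * (logp_l - logp_ref_l)   # implicit reward of loser
    margin = reward_w - reward_l
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# When the policy equals the reference, both implicit rewards vanish,
# the margin is 0, and the loss is exactly log 2.
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The paper's EXPO framework instead posits explicit regularization factors, avoiding this implicit-reward construction altogether.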


The Download: a longevity influencer's new religion, and humanoid robots' shortcomings

MIT Technology Review

Bryan Johnson is on a mission to not die. The 47-year-old multimillionaire has already applied his slogan "Don't Die" to events, merchandise, and a Netflix documentary. Now he's founding a Don't Die religion. Johnson, who famously spends millions of dollars on scans, tests, supplements, and a lifestyle routine designed to slow or reverse the aging process, has enjoyed extensive media coverage and a huge social media following. For many people, he has become the face of the longevity field.


AI suitcase for visually impaired to be tested at expo

The Japan Times

A demonstration of an artificial intelligence-powered suitcase, designed to assist visually impaired individuals as a robotic alternative to guide dogs, will be conducted at the Osaka Expo, set to open on Sunday. The latest model incorporates generative AI technology, enabling it to describe the surrounding environment through voice feedback. Equipped with a built-in camera and sensors, the suitcase can analyze its surroundings and provide real-time guidance to users. In late January, an AI suitcase was demonstrated at the National Museum of Emerging Science and Innovation, known as Miraikan, in Tokyo. Resembling a regular suitcase, the device activated when Chieko Asakawa, the museum's chief executive director and a key member of the development team, grasped its handle at hip level.


Meta-Prompt Optimization for LLM-Based Sequential Decision Making

Kong, Mingze, Wang, Zhiyong, Shu, Yao, Dai, Zhongxiang

arXiv.org Artificial Intelligence

Large language models (LLMs) have recently been employed as agents to solve sequential decision-making tasks such as Bayesian optimization and multi-armed bandits (MAB). These works usually adopt an LLM for sequential action selection by providing it with a fixed, manually designed meta-prompt. However, numerous previous works have found that the prompt has a significant impact on the performance of the LLM, which calls for a method to automatically optimize the meta-prompt for LLM-based agents. Unfortunately, the non-stationarity of the reward observations during LLM-based sequential decision-making makes meta-prompt optimization highly challenging. To address this challenge, we draw inspiration from adversarial bandit algorithms, which are inherently capable of handling non-stationary reward observations. Building on this foundation, we propose our EXPonential-weight algorithm for prompt Optimization (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt for LLM-based agents. We also extend EXPO to additionally optimize the exemplars (i.e., history of interactions) in the meta-prompt to further enhance the performance, hence introducing our EXPO-ES algorithm. We use extensive experiments to show that our algorithms significantly improve the performance of LLM-based sequential decision-making.
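The exponential-weight machinery the abstract draws on can be illustrated with a classic Exp3-style sketch over a hypothetical pool of candidate meta-prompts. This is a toy under stated assumptions, not the paper's EXPO algorithm: the prompt strings and the simulated reward model are invented for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical candidate meta-prompts for an LLM-based agent.
prompts = ["Be concise.", "Think step by step.", "Act greedily."]

def exp3_select(weights, gamma=0.1):
    """Exp3: mix the exponential-weight distribution with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    arm = random.choices(range(k), weights=probs)[0]
    return arm, probs

def exp3_update(weights, probs, arm, reward, gamma=0.1):
    """Importance-weighted reward estimate keeps the update unbiased."""
    k = len(weights)
    est = reward / probs[arm]
    weights[arm] *= math.exp(gamma * est / k)

weights = [1.0] * len(prompts)
for _ in range(200):
    arm, probs = exp3_select(weights)
    # Simulated (possibly non-stationary) feedback: prompt 1 is usually best.
    reward = 1.0 if arm == 1 and random.random() < 0.8 else 0.0
    exp3_update(weights, probs, arm, reward)

best = prompts[max(range(len(weights)), key=lambda i: weights[i])]
```

Because the importance-weighted estimates make no stationarity assumption about the rewards, this family of algorithms tolerates the drifting feedback that arises when the agent's own behavior changes over time.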


ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification

Ma, Yi, Wang, Shuai, Liu, Tianchi, Li, Haizhou

arXiv.org Artificial Intelligence

In speaker verification, we use computational methods to verify whether an utterance matches the identity of an enrolled speaker. This task is similar to the manual task of forensic voice comparison, where linguistic analysis is combined with auditory measurements to compare and evaluate voice samples. Despite much success, we have yet to develop a speaker verification system that offers explainable results comparable to those from manual forensic voice comparison. A novel approach, the Explainable Phonetic Trait-Oriented (ExPO) network, is proposed in this paper to introduce the speaker's phonetic trait, which describes the speaker's characteristics at the phonetic level, resembling what forensic comparison does. ExPO not only generates utterance-level speaker embeddings but also allows for fine-grained analysis and visualization of phonetic traits, offering an explainable speaker verification process. Furthermore, we investigate phonetic traits from within-speaker and between-speaker variation perspectives to determine which trait is most effective for speaker verification, marking an important step towards explainable speaker verification. Our code is available at https://github.com/mmmmayi/ExPO.


E3 is dead. Is CES next?

PCWorld

The Electronic Entertainment Expo, perhaps the most over-the-top, bombastic event ever to be designated an industry trade show, is no more. E3 was a staple of the video game calendar for over two decades, showing off the latest and greatest in gaming hardware and software every summer. But it has been officially declared dead by the ESA, amid the worldwide decline of trade shows since the COVID pandemic. E3's demise wasn't exactly shocking, since it hadn't held a live, in-person event since 2019. But the closure of such a high-profile event has some people wondering: Is CES, the electronics industry's most high-profile event, next on the chopping block?