hurdle
The AI Consumer Index (ACE)
Benchek, Julien, Shetty, Rohit, Hunsberger, Benjamin, Arun, Ajay, Richards, Zach, Foody, Brendan, Nitski, Osvald, Vidgen, Bertie
We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform everyday consumer tasks. ACE contains a hidden held-out set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open-sourcing 80 cases as a devset with a CC-BY license. For the ACE leaderboard we evaluated 10 frontier models (with web search turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) at 55.2% and GPT 5.1 (Thinking = High) at 55.1%. Model scores differ across domains, and in Shopping the top model scores under 50%. We find that models are prone to hallucinating key information, such as prices. ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.
- Research Report (0.50)
- Workflow (0.48)
- Health & Medicine (0.68)
- Banking & Finance (0.46)
- Leisure & Entertainment > Games > Computer Games (0.46)
- Information Technology (0.46)
A Appendix
A.1 TensorFlow Primitives Vocabulary Name TF Function Argument Mapping Input 1 Input 2 Constant Dim Size ADD tf.math.add "Name" is the name of the operation in our search "TF Function" is the TensorFlow function that the name is mapped to when a DNA instruction "Argument Mapping" describes how the values in a DNA's argument set are mapped to the corresponding TensorFlow function arguments. TensorFlow graphs are built from DNA programs as described in Section 2 of the main text. The vocabulary for these relative dimensions is [1, 2, 4, 8, 12, 16, 24, 32, 48, 64]. This vocabulary was not tuned.
- North America > United States > Oklahoma (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework
Mi, Qirui, Yang, Mengyue, Yu, Xiangning, Zhao, Zhiyu, Deng, Cheng, An, Bo, Zhang, Haifeng, Chen, Xu, Wang, Jun
Simulating collective decision-making involves more than aggregating individual behaviors; it emerges from dynamic interactions among individuals. While large language models (LLMs) offer strong potential for social simulation, achieving quantitative alignment with real-world data remains a key challenge. To bridge this gap, we propose the Mean-Field LLM (MF-LLM) framework, the first to incorporate mean field theory into LLM-based social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process, generating population signals to guide individual decisions, which in turn update the signals. This interplay produces coherent trajectories of collective behavior. To improve alignment with real-world data, we introduce IB-Tune, a novel fine-tuning method inspired by the Information Bottleneck principle, which retains population signals most predictive of future actions while filtering redundant history. Evaluated on a real-world social dataset, MF-LLM reduces KL divergence to human population distributions by 47% compared to non-mean-field baselines, enabling accurate trend forecasting and effective intervention planning. Generalizing across 7 domains and 4 LLM backbones, MF-LLM provides a scalable, high-fidelity foundation for social simulation.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Trump wants to revive the lagging US shipbuilding industry. Here are the hurdles he faces
President Donald Trump is turning his attention to the U.S. shipbuilding industry, which is leagues behind its near-peer competitor China, and recently signed an executive order designed to reinvigorate it. Trump's April 10 order instructs agencies to develop a Maritime Action Plan and orders the U.S. trade representative to compile a list of recommendations to address China's "anticompetitive actions within the shipbuilding industry," among other things. Additionally, the executive order instructs a series of assessments regarding how the government could bolster financial support through the Defense Production Act, the Department of Defense Office of Strategic Capital, a new Maritime Security Trust Fund, investment from shipbuilders from allied countries and other grant programs. But simply throwing money at the shipbuilding industry won't solve the problem, according to Bryan Clark, director of the Hudson Institute think tank's Center for Defense Concepts and Technology. "It is unlikely that just putting more money into U.S. shipbuilding – even with foreign technical assistance – will make U.S. commercial shipbuilders competitive with experienced and highly-subsidized shipyards in China, Korea, or Japan," Clark said in a Monday email to Fox News Digital.
- Asia > China (0.73)
- Asia > Japan (0.26)
- North America > United States > Georgia > Chatham County > Savannah (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.05)
- Shipbuilding (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
The DOGE Acting Administrator Isn't New to the Trump World
The White House today announced the name of the acting administrator of the Department of Government Efficiency: Amy Gleason, the US government's problem solver in the early days of the data-starved response to the Covid pandemic and a seasoned worker in the health space. The White House named Gleason after it argued in court that Elon Musk is not really the head of DOGE, and faced pressure from a federal judge to say who is. How long Gleason has been the acting administrator, and if Musk was an unofficial one before today's announcement, is unclear. This is Gleason's second time working in US Digital Services, now turned DOGE. In her first tour, which started in 2018 and carried through the frenzied and chaotic pandemic response, she pushed the bounds of existing bureaucracy to meet the crisis' demand.
- North America > United States (1.00)
- North America > Mexico (0.05)
Beyond Convexity: Stochastic Quasi-Convex Optimization
This poster has been moved from Monday #86 to Thursday #101. Stochastic convex optimization is a basic and well studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients, rather than the gradients themselves. In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens the concept of unimodality to multidimensions and allows for certain types of saddle points, which are a known hurdle for first-order optimization methods such as gradient descent.
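As a rough illustration of the NGD update described in this abstract, the sketch below steps along the unit-length gradient direction only. It is not the paper's algorithm or analysis: the function name `normalized_gradient_descent`, the fixed step size, the optional Gaussian gradient noise (a stand-in for stochastic gradients), and the test objective f(x) = log(1 + ||x||) are all assumptions chosen for the example.

```python
import math
import random


def normalized_gradient_descent(grad, x0, lr=0.05, steps=500, noise=0.0, seed=0):
    """Illustrative NGD: move by lr along g / ||g||, ignoring the gradient's magnitude."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Optional Gaussian noise is an assumed stand-in for a stochastic gradient.
        g = [gi + noise * rng.gauss(0.0, 1.0) for gi in grad(x)]
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # zero gradient: no direction to follow
        x = [xi - lr * gi / norm for xi, gi in zip(x, g)]
    return x


def grad_log_norm(x):
    """Gradient of the hypothetical objective f(x) = log(1 + ||x||).

    f is quasi-convex (its sublevel sets are balls) but not convex.
    """
    r = math.sqrt(sum(xi * xi for xi in x))
    if r == 0.0:
        return [0.0] * len(x)
    return [xi / (r * (1.0 + r)) for xi in x]


x_final = normalized_gradient_descent(grad_log_norm, [3.0, -4.0])
final_distance = math.sqrt(sum(xi * xi for xi in x_final))
```

Because every step has length `lr` regardless of how flat the function is, the iterate in this example ends within about one step length of the minimizer at the origin; plain gradient descent with the same step size would move much more slowly where the gradient is small.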
TechScape: Will OpenAI's $5bn gamble on chatbots pay off? Only if you use them
What if you build it and they don't come? It's fair to say the shine is coming off the AI boom. Soaring valuations are starting to look unstable next to the sky-high spending required to sustain them.
- Europe > United Kingdom (0.05)
- Europe > Ireland (0.05)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.43)
Lifelike Agility and Play in Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models
Han, Lei, Zhu, Qingxu, Sheng, Jiapeng, Zhang, Chong, Li, Tingguang, Zhang, Yizheng, Zhang, He, Liu, Yuzhen, Zhou, Cheng, Zhao, Rui, Li, Jie, Zhang, Yufeng, Wang, Rui, Chi, Wanchao, Li, Xiong, Zhu, Yonghui, Xiang, Lingzhu, Teng, Xiao, Zhang, Zhengyou
Knowledge from animals and humans inspires robotic innovations. Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches. These methods usually rely on physical models or handcrafted rewards to accurately describe the specific system, rather than on a generalized understanding like animals do. Here we propose a hierarchical framework to construct primitive-, environmental- and strategic-level knowledge that are all pre-trainable, reusable and enrichable for legged robots. The primitive module summarizes knowledge from animal motion data, where, inspired by large pre-trained models in language and image understanding, we introduce deep generative models to produce motor control signals stimulating legged robots to act like real animals. Then, we shape various traversing capabilities at a higher level to align with the environment by reusing the primitive module. Finally, a strategic module is trained focusing on complex downstream tasks by reusing the knowledge from previous levels. We apply the trained hierarchical controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles and play in a designed challenging multi-agent chase tag game, where lifelike agility and strategy emerge in the robots.
- North America > Canada > Newfoundland and Labrador > Labrador (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Microsoft buys Activision, maker of Diablo, Warcraft, and Call of Duty, for $69 billion
United Kingdom regulators were effectively the last hurdle stopping Microsoft from purchasing Activision Blizzard, in the biggest merger the video game industry has ever seen. That hurdle was cleared this morning as the UK's Competition and Markets Authority relented to adjusted terms. With the nearly $70 billion purchase now officially complete, Microsoft unveiled a victory blog post, complete with an extended showcase of its now-combined intellectual property with Activision, Blizzard, and King. The CMA's sticking points included Microsoft's prospective dominance in the unfolding game streaming market, and Microsoft's concessions were deep. When it initially blocked the merger early this year, regulators said that the combined publishing giant could effectively monopolize games streamed to consumers without the need for local PCs or consoles, as is already the case with Xbox Game Stream and the all-you-can-eat Game Pass subscription. Microsoft's concessions to the UK include a block on exclusivity for cloud streaming for all existing Activision games, crucially including the massive Call of Duty shooter franchise.
- Europe > United Kingdom (0.58)
- North America > United States (0.35)