 Retail


Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data

arXiv.org Artificial Intelligence

Recent research shows that LLM agents can generate "believable" human behaviors via prompt-only methods, and such agents have been increasingly adopted in downstream applications. However, existing evaluation of these agents only focuses on qualitative believability (whether human raters think they are accurate), leaving open the question of whether LLM agents can accurately generate step-by-step actions mimicking a particular human's behavior in a multi-turn interaction task. In this work, we take shopping as a case study and present the first large-scale quantitative evaluation of state-of-the-art LLMs' ability to accurately simulate human behavior. Using real-world data from 31,865 online shopping sessions containing 230,965 user actions, our evaluation reveals that prompt-based LLMs (DeepSeek-R1, Llama, Claude) achieve only 11.86% accuracy in generating human actions, highlighting a substantial gap in actual behavioral accuracy. Through experiments, we also showcase that strategies as simple as fine-tuning LLMs on real human click-through data augmented with synthesized reasoning traces can greatly enhance models' performance. The fine-tuned Qwen2.5-7B achieves 17.26% action generation accuracy and 33.86% F1 score on final purchase prediction, representing substantial improvements of 5.4% and 13.85% over prompt-only baselines. This work establishes the first rigorous benchmark for human behavior simulation and provides actionable insights for developing more accurate LLM agents for future downstream applications.


Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments

arXiv.org Artificial Intelligence

Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate that scales linearly in the number of agents, i.e., a rate of $\tilde{O}(1/(NT))$, where $T$ is the iteration index and $N$ is the number of agents. However, when the training samples arise from a Markov chain, existing results on TD learning achieving this rate require the algorithm to depend on unknown problem parameters. We close this gap by proposing a two-timescale Federated Temporal Difference (FTD) learning method with Polyak-Ruppert averaging. Our method provably attains the optimal $\tilde{O}(1/(NT))$ rate in both average-reward and discounted settings, offering a parameter-free FTD approach for Markovian data. Although our results are novel even in the single-agent setting, they apply to the more realistic and challenging scenario of FL with heterogeneous environments.






We Found 136 of the Best Prime Day Deals Still on for 2025: Up to 55% Off

WIRED

Amazon's fall Prime Day sale has come and gone, but a few of the best deals are still available. All products featured on WIRED are independently selected by our editors. However, we may receive compensation from retailers and/or from purchases of products through these links. Amazon Prime's latest Prime Day sale has come and gone. If you are a Prime member who missed out, there's some good news: there are some leftover deals still going strong. We're still keeping you updated here with all the best markdowns on our favorite tech gear and gadgets that are still available, from Alexa-enabled speakers to robot vacs to laptops and tablets. The WIRED Reviews team tests products year-round, and at sales events like this, we only recommend deals on stuff we have actually used and approved. We sorted through thousands of deals by hand to make these picks. The Fire HD 10 is Amazon's best tablet for most people. The current model dates from 2023, but the octa-core processor is plenty fast for consuming Amazon Prime content, which is really the primary reason to buy a Fire tablet. The full HD (1080p) screen won't win any awards, but it's good enough for streaming movies. Fire tablets can do double duty as an Echo speaker, too. Turn on Show Mode (swipe down on the notification overlay and check the Show Mode box) and you can query Alexa to your heart's content.


Estimating Propensity for Causality-based Recommendation without Exposure Data

Neural Information Processing Systems

They aim to recommend an item based on the uplift, also called the causal effect, in the user's behavior (e.g., clicks or purchases) caused by different treatments (i.e., recommending/exposing the item or not).



Osprey backpacks and camping bags hit their lowest prices of the year during Amazon Prime Day

Popular Science

Amazon Prime Day is live. PopSci editors are big fans of Osprey outdoor packs and backpacks. Almost all of them are on sale for Prime Day. We may earn revenue from the products available on this page and participate in affiliate programs.