deodorant
Soft Self-Consistency Improves Language Model Agents
Wang, Han, Prasad, Archiki, Stengel-Eskin, Elias, Bansal, Mohit
Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. SOFT-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that SOFT-SC can be applied to both open-source and black-box models.
- Asia > Singapore (0.04)
- North America > United States > Oregon (0.04)
- Asia > Middle East > UAE (0.04)
- Asia > Japan (0.04)
- Education (0.46)
- Materials (0.46)
- Information Technology (0.34)
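The contrast the Soft Self-Consistency abstract draws between majority voting and likelihood-based scoring can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the candidate format, and the choice of mean token probability as the aggregation are all assumptions made for the example.

```python
import math
from collections import Counter


def majority_vote(actions):
    """SC-style selection: return the most frequently sampled action.

    When every sample is distinct, all counts are 1 and the choice is
    effectively arbitrary (the first sample wins the tie).
    """
    return Counter(actions).most_common(1)[0][0]


def soft_score(token_logprobs):
    """Continuous score for one sampled action.

    Assumption: aggregate by mean token probability. Other aggregations
    (minimum, product) would fit the same interface.
    """
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)


def soft_select(candidates):
    """Pick the action with the highest soft score.

    candidates: list of (action, token_logprobs) pairs, where
    token_logprobs are the model's log-probabilities for the action's tokens.
    """
    return max(candidates, key=lambda c: soft_score(c[1]))[0]


# Hypothetical samples: three distinct actions, so voting has no signal.
samples = [
    ("find . -name x", [-1.0, -2.0]),
    ("ls -a", [-0.1, -0.2]),
    ("grep foo", [-0.5, -0.5]),
]
best = soft_select(samples)
```

Because the score is continuous, a selection is still possible when no two sampled actions match, which is the sparse case the abstract describes.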
Why is it OK for rich guys to steal my work?
Every day, what's left of the once-mighty ranks of reporters across this country tap out stories meant to inform, entertain and expose. Sometimes they are the work of minutes, the first bits of knowledge on breaking news such as fires, storms or even elections. Sometimes they are investigations that have taken years. Inevitably, as soon as we publish, rich dudes with algorithms come in and sweep this work away for their own profit, like deodorant off a Target shelf. Retail theft is causing a civic meltdown and inspiring a ballot measure to incarcerate repeat toothpaste thieves.
- North America > Canada (0.18)
- Oceania > New Zealand (0.05)
- Oceania > Australia (0.05)
- Media > News (1.00)
- Information Technology (1.00)
- Government > Regional Government > North America Government (0.49)
O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models
Xiao, Yuchen, Sun, Yanchao, Xu, Mengda, Madhushani, Udari, Vann, Jared, Garg, Deepeka, Ganesh, Sumitra
Recent advancements in large language models (LLMs) have exhibited promising performance in solving sequential decision-making problems. By imitating few-shot examples provided in the prompts (i.e., in-context learning), an LLM agent can interact with an external environment and complete given tasks without additional training. However, such few-shot examples are often insufficient to generate high-quality solutions for complex and long-horizon tasks, while the limited context length cannot accommodate larger-scale demonstrations with long interaction horizons. To this end, we propose an offline learning framework that utilizes offline data at scale (e.g., logs of human interactions) to improve LLM-powered policies without finetuning. The proposed method O3D (Offline Data-driven Discovery and Distillation) automatically discovers reusable skills and distills generalizable knowledge across multiple tasks based on offline interaction data, advancing the capability of solving downstream tasks. Empirical results on two interactive decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably enhance the decision-making capabilities of LLMs through the offline discovery and distillation process, and consistently outperform baselines across various LLMs.
Hierarchical Prompting Assists Large Language Model on Web Navigation
Sridhar, Abishek, Lo, Robert, Xu, Frank F., Zhu, Hao, Zhou, Shuyan
Large language models (LLMs) struggle to process complicated observations in interactive decision-making tasks. To alleviate this issue, we propose a simple hierarchical prompting approach. Diverging from previous prompting approaches that always put the full observation (e.g., a web page) into the prompt, we propose to first construct an action-aware observation, which is more condensed and relevant, with a dedicated SUMMARIZER prompt. The ACTOR prompt then predicts the next action based on the summarized observation. While our method has broad applicability, we particularly demonstrate its efficacy in the complex domain of web navigation, where a full observation often contains redundant and irrelevant information. Our approach outperforms the previous state-of-the-art prompting mechanics by 6.2% on task success rate, demonstrating its potential on interactive decision-making tasks with long observation traces.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
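The two-stage flow described in the hierarchical prompting abstract (a SUMMARIZER prompt that condenses the observation, then an ACTOR prompt that predicts the next action) might look roughly like this. It is a sketch under stated assumptions: `llm` is a hypothetical callable that maps a prompt string to a completion string, and the prompt wording is a placeholder, not the prompts used in the paper.

```python
from typing import Callable


def hierarchical_step(
    llm: Callable[[str], str],  # hypothetical text-in, text-out model call
    observation: str,           # full raw observation, e.g. a web page
    objective: str,             # the task the agent is trying to complete
) -> str:
    """Run one SUMMARIZER -> ACTOR step and return the predicted action."""
    # Stage 1: condense the full observation into an action-aware summary.
    summarizer_prompt = (
        "[SUMMARIZER]\n"
        f"Objective: {objective}\n"
        f"Observation: {observation}\n"
        "Keep only the parts relevant to choosing the next action."
    )
    summary = llm(summarizer_prompt)

    # Stage 2: the actor sees only the summary, never the full observation.
    actor_prompt = (
        "[ACTOR]\n"
        f"Objective: {objective}\n"
        f"Summarized observation: {summary}\n"
        "Next action:"
    )
    return llm(actor_prompt)


# Stub model for illustration only; a real setup would call an LLM here.
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("[SUMMARIZER]"):
        return "Buy button present; price $5"
    return "click[Buy]"


action = hierarchical_step(fake_llm, "<html>...long page...</html>", "buy deodorant")
```

The design point is that the actor prompt stays short regardless of how long the raw observation is, since only the summary is passed forward.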
Help! My Mom Is Catfishing a Guy Online--By Pretending to Be Me.
Our advice columnists have heard it all over the years. Each Sunday, we dive into the Dear Prudie archives and share a selection of classic letters with our readers. For the past few months, my mom has been catfishing a guy online and I don't know what to do. Earlier this year, I decided to give online dating a try and signed up for a free online dating site. My mom was very supportive and interested in me finding someone, and, unbeknownst to me, created a fake profile to scope out the site.
Thoughts on sustainability from the big consulting firms at ValueX
The morning session at SAP Ariba/SAP Fieldglass ValueX left us with some thoughts from the big consulting firms about the way procurement, and indeed business, should and will go in the coming years if we are to survive the pace of technological change the world is experiencing, and adapt our processes to respect our planet. The keynote called for a reboot of capitalism and (the environment), championed by John Penrose MP. Deloitte was quick to follow with a look at Industry 4 (automation, data exchange, cyber-physical systems (CPS), the internet of things (IoT), industrial internet of things (IIoT), cloud computing, cognitive computing, artificial intelligence and so on) in the context of procurement. They reminded us that the fourth industrial revolution is not, however, just about a collection of technologies, but rather how you package and use them all together to support your long-term business strategy. A global Deloitte survey revealed that 94% (of respondents) saw implementing digital tech and processes (aka digital transformation) as a means just to 'keep up' with the rest of the marketplace.
- North America > Mexico (0.05)
- Europe > United Kingdom > England (0.05)
- Europe > France (0.05)
- Professional Services (1.00)
- Information Technology (0.89)