bernard
Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
Yang, Hongming, Lin, Shi, Shao, Jun, Lin, Changting, Zhu, Donghai, Han, Meng, Kong, Qinglei
Lightweight Large Language Models (LwLLMs) are reduced-parameter, optimized models designed to run efficiently on consumer-grade hardware, offering significant advantages in resource efficiency, cost-effectiveness, and data privacy. However, these models often have limited inference and reasoning capabilities, which restrict their performance on complex tasks and limit their practical applicability. Moreover, existing prompt optimization methods typically rely on extensive manual effort or on the meta-cognitive abilities of state-of-the-art LLMs, making them less effective for LwLLMs. To address these challenges, we introduce DeBoP, a new Direct Behavior Optimization Paradigm that originates from the Chain-of-Thought (CoT) prompting technique. Unlike CoT prompting, DeBoP is an automatic optimization method that directly optimizes the behavior of LwLLMs. Specifically, DeBoP transforms the optimization of complex prompts into the optimization of discrete, quantifiable execution sequences using a gradient-free Monte Carlo Tree Search. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. Experimental results demonstrate that DeBoP significantly outperforms recent prompt optimization methods on most tasks. In particular, DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60% compared to other automatic prompt optimization methods.
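The abstract describes searching over discrete execution sequences with a gradient-free Monte Carlo Tree Search. The sketch below is a generic UCT-style MCTS over fixed-length sequences of discrete actions. The action vocabulary, the sequence length, and the `toy_score` reward (a stand-in for task accuracy) are all hypothetical; this is not DeBoP's actual search space or scoring.

```python
import math
import random
from typing import Callable, Dict, Optional, Sequence, Tuple


class Node:
    """A search-tree node holding a partial execution sequence."""

    def __init__(self, seq: Tuple[str, ...], parent: Optional["Node"] = None):
        self.seq = seq
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.total_reward = 0.0

    def ucb(self, c: float) -> float:
        # UCB1 score: mean reward plus an exploration bonus.
        if self.visits == 0:
            return float("inf")
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(actions: Sequence[str], depth: int,
         score: Callable[[Tuple[str, ...]], float],
         iterations: int = 500, c: float = 1.4, seed: int = 0) -> Tuple[str, ...]:
    """Return the best-scoring complete sequence found by UCT search."""
    rng = random.Random(seed)
    root = Node(())
    best_seq, best_score = None, -math.inf
    for _ in range(iterations):
        # 1. Selection: descend through fully expanded nodes by UCB.
        node = root
        while len(node.seq) < depth and len(node.children) == len(actions):
            node = max(node.children.values(), key=lambda n: n.ucb(c))
        # 2. Expansion: add one untried action unless the sequence is complete.
        if len(node.seq) < depth:
            untried = [a for a in actions if a not in node.children]
            action = rng.choice(untried)
            child = Node(node.seq + (action,), node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the sequence with random actions and score it.
        rollout = node.seq + tuple(rng.choice(actions) for _ in range(depth - len(node.seq)))
        reward = score(rollout)
        if reward > best_score:
            best_seq, best_score = rollout, reward
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_seq


# Toy usage: a hypothetical action vocabulary and a stand-in reward.
ACTIONS = ["restate", "decompose", "compute", "verify", "answer"]
TARGET = ("decompose", "compute", "verify", "answer")


def toy_score(seq: Tuple[str, ...]) -> float:
    # Fraction of positions that match a hidden "good" sequence.
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


best = mcts(ACTIONS, depth=4, score=toy_score, iterations=500)
```

The search only ever calls `score` on complete sequences, which is what makes it gradient-free: any black-box evaluation (for example, accuracy on a small validation set) could be plugged in instead of `toy_score`.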
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > Indiana (0.04)
- North America > United States > California > Riverside County (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine (0.88)
- Information Technology > Security & Privacy (0.68)
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Zhu, Junzhe, Ju, Yuanchen, Zhang, Junyi, Wang, Muhan, Yuan, Zhecheng, Hu, Kaizhe, Xu, Huazhe
[Figure: Circles represent the contact points in the human demo / grasping points for robot manipulation.]
Dense 3D correspondence can enhance robotic manipulation by enabling the generalization of spatial, functional, and dynamic information from one object to an unseen counterpart. Compared to shape correspondence, semantic correspondence is more effective in generalizing across different object categories. DenseMatcher first computes vertex features by projecting multiview 2D features onto meshes and refining them with a 3D network, and then finds dense correspondences from the obtained features using functional maps. In addition, we craft the first 3D matching dataset that contains colored object meshes across diverse categories. In our experiments, we show that DenseMatcher significantly outperforms prior 3D matching baselines by 43.5%. We demonstrate the downstream effectiveness of DenseMatcher in (i) robotic manipulation, where it achieves cross-instance and cross-category generalization on long-horizon complex manipulation tasks from observing only one demo; and (ii) zero-shot color mapping between digital assets, where appearance can be transferred between different objects with relatable geometry. Correspondence plays a pivotal role in robotics (Wang, 2019). For instance, in robotic assembly, it is necessary to determine the corresponding parts between the target and source objects.
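DenseMatcher's first step lifts multiview 2D features onto mesh vertices. A minimal sketch of that lifting, assuming each vertex's pixel coordinates and visibility in every view are already known (the data layout and the plain averaging rule are assumptions): a vertex feature is the mean of the 2D features it lands on across the views where it is visible. The paper additionally refines these features with a 3D network and matches them with functional maps; neither step is shown. `nearest_neighbor_match` is a simpler, swapped-in matcher used only for illustration, not the functional-map solver.

```python
from typing import List, Optional, Sequence, Tuple

Feature = List[float]


def project_features(
    feature_maps: Sequence[Sequence[Sequence[Feature]]],          # [view][row][col] -> feature
    projections: Sequence[Sequence[Optional[Tuple[int, int]]]],  # [view][vertex] -> (row, col), None if occluded
    num_vertices: int,
    dim: int,
) -> List[Feature]:
    """Average the 2D features each vertex projects to across the views where it is visible."""
    sums = [[0.0] * dim for _ in range(num_vertices)]
    counts = [0] * num_vertices
    for view, fmap in enumerate(feature_maps):
        for v in range(num_vertices):
            pix = projections[view][v]
            if pix is None:
                continue  # vertex not visible in this view
            r, c = pix
            for k, value in enumerate(fmap[r][c]):
                sums[v][k] += value
            counts[v] += 1
    # Vertices never seen in any view keep a zero feature.
    return [[s / counts[v] for s in sums[v]] if counts[v] else sums[v]
            for v in range(num_vertices)]


def nearest_neighbor_match(src: Sequence[Feature], tgt: Sequence[Feature]) -> List[int]:
    """For each source vertex, the index of the target vertex with the closest feature (squared L2)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(tgt)), key=lambda j: dist(s, tgt[j])) for s in src]
```

Averaging is the simplest aggregation rule; a learned 3D refinement, as the abstract describes, can then smooth out view-dependent noise that plain averaging leaves in place.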
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > Switzerland (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Cui, Yingqian, He, Pengfei, Tang, Xianfeng, He, Qi, Luo, Chen, Tang, Jiliang, Xing, Yue
Few-shot Chain-of-Thought (CoT) prompting has demonstrated strong performance in improving the reasoning capabilities of large language models (LLMs). While theoretical investigations have been conducted to understand CoT, the transformers used in these studies isolate the CoT reasoning process into separate in-context learning steps (Stepwise ICL). In this work, we theoretically show that, compared to Stepwise ICL, the transformer gains better error-correction ability and makes more accurate predictions if the reasoning from earlier steps is integrated (Coherent CoT). Given that this coherent reasoning changes the behavior of the transformer, we further investigate the sensitivity of the transformer with Coherent CoT when the demonstration examples are corrupted at the inference stage. Our theoretical results indicate that the transformer is more sensitive to errors in intermediate reasoning steps than to errors in the final outcome. Building on this observation, we propose an improvement to CoT that incorporates both correct and incorrect reasoning paths in the demonstrations. Our experiments validate the effectiveness of the proposed approach.
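The proposed improvement places both correct and incorrect reasoning paths in the few-shot demonstrations. The sketch below assembles such a prompt. The `Demo` fields, the labels "Incorrect reasoning:" and "Correct reasoning:", and the example problem are hypothetical; the paper's exact template may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    question: str
    wrong_steps: List[str]  # a flawed reasoning path
    right_steps: List[str]  # a valid reasoning path
    answer: str


def build_prompt(demos: List[Demo], query: str) -> str:
    """Render each demonstration with an incorrect path followed by a correct one."""
    parts: List[str] = []
    for d in demos:
        parts.append(f"Q: {d.question}")
        parts.append("Incorrect reasoning: " + " ".join(d.wrong_steps))
        parts.append("Correct reasoning: " + " ".join(d.right_steps))
        parts.append(f"A: {d.answer}")
        parts.append("")  # blank line between demonstrations
    # The model is asked to continue with a correct reasoning path for the query.
    parts.append(f"Q: {query}")
    parts.append("Correct reasoning:")
    return "\n".join(parts)


demo = Demo(
    question="Tom has 3 apples and buys 2 more. How many apples does he have?",
    wrong_steps=["3 - 2 = 1."],
    right_steps=["3 + 2 = 5."],
    answer="5",
)
prompt = build_prompt([demo], "Ann has 4 pens and loses 1. How many pens does she have?")
```

Putting the flawed path next to the valid one targets the intermediate steps, which the theoretical result identifies as the part of a demonstration the transformer is most sensitive to.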
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.89)
MultiPoT: Multilingual Program of Thoughts Harnesses Multiple Programming Languages
Luo, Xianzhen, Zhu, Qingfu, Zhang, Zhiming, Qin, Libo, Wang, Xu, Yang, Qing, Xu, Dongliang, Che, Wanxiang
Program of Thoughts (PoT) is an approach characterized by executable intermediate steps, which ensure the accuracy of numerical calculations in the reasoning process. Currently, PoT primarily uses Python. However, relying solely on a single language may result in suboptimal solutions and overlook the potential benefits of other programming languages. In this paper, we conduct comprehensive experiments on the programming languages used in PoT and find that no single language consistently delivers optimal performance across all tasks and models; the effectiveness of each language varies with the specific scenario. Inspired by this, we propose a task- and model-agnostic approach called MultiPoT, which harnesses the strengths and diversity of various languages. Experimental results reveal that it significantly outperforms Python Self-Consistency. Furthermore, it achieves comparable or superior performance compared to the best monolingual PoT in almost all tasks across all models. In particular, MultiPoT achieves an average improvement of more than 4.6% on both Starcoder and ChatGPT (gpt-3.5-turbo).
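MultiPoT combines the results of programs written in several languages. A minimal sketch of one plausible aggregation, assuming each language's program has already been executed and produced an answer string (or `None` on failure): a majority vote across languages, with ties broken by the first answer seen. The voting and tie-breaking rules are assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Optional


def multilingual_vote(answers: Dict[str, Optional[str]]) -> Optional[str]:
    """Majority vote over per-language execution results; None marks a failed run."""
    valid = [a for a in answers.values() if a is not None]
    if not valid:
        return None  # every program failed to execute
    counts = Counter(valid)
    top = max(counts.values())
    # Break ties by the order in which answers first appear.
    return next(a for a in valid if counts[a] == top)


result = multilingual_vote({"python": "7", "r": "8", "c++": "8", "java": None})
```

Voting across languages rather than across repeated samples in one language is what distinguishes this setup from Python Self-Consistency: errors tied to one language's idioms are less likely to repeat in the others.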
- Asia > China > Heilongjiang Province > Harbin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Non-Negative Spherical Relaxations for Universe-Free Multi-Matching and Clustering
Thunberg, Johan, Bernard, Florian
We propose a novel non-negative spherical relaxation for optimization problems over binary matrices with injectivity constraints, which has applications in multi-matching and clustering in particular. We relax the respective binary matrix constraints to the (high-dimensional) non-negative sphere. To optimize our relaxed problem, we use a conditional power iteration method to iteratively improve the objective function, while at the same time sweeping over a continuous scalar parameter that is (indirectly) related to the universe size (or number of clusters). As opposed to existing procedures that require fixing the integer universe size before optimization, our method automatically adjusts the analogous continuous parameter. Furthermore, while our approach shares similarities with spectral multi-matching and spectral clustering, our formulation has the strong advantage that it does not rely on additional post-processing procedures to obtain binary results. Our method shows compelling results in various multi-matching and clustering settings, even when compared to methods that use the ground-truth universe size (or number of clusters).
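The paper's solver is a conditional power iteration over the non-negative sphere. The sketch below is a simplified single-vector analogue under stated assumptions, not the paper's algorithm: it drops the matrix variables, the injectivity constraints, and the sweep over the continuous universe-size parameter. After each multiplication by a symmetric non-negative affinity matrix `W`, it clips negative entries to zero and renormalizes, which keeps the iterate on the non-negative unit sphere while heuristically increasing x^T W x.

```python
import math
from typing import List, Sequence


def nonneg_power_iteration(W: Sequence[Sequence[float]], iters: int = 100,
                           tol: float = 1e-10) -> List[float]:
    """Power iteration projected onto the non-negative unit sphere."""
    n = len(W)
    x = [1.0 / math.sqrt(n)] * n  # start at a strictly positive point on the sphere
    for _ in range(iters):
        # Multiply by W, then clip to the non-negative orthant.
        y = [max(0.0, sum(W[i][j] * x[j] for j in range(n))) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break  # projection collapsed to zero; keep the previous iterate
        y = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x


# Two "clusters": {0, 1} with strong mutual affinity, and {2} on its own.
W = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]
x = nonneg_power_iteration(W)
```

On this block-diagonal affinity, the iterate concentrates on the strongest block (entries near 1/√2 for vertices 0 and 1, and near zero for vertex 2). Its large entries therefore read directly as a cluster indicator, without the sign ambiguity of an unconstrained eigenvector.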
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Why Is It So Hard for Scholars to Launch Startups?
Eunice Yang first tasted entrepreneurship in her twenties, when she helped run her family's carton manufacturing business. Five years later, after the business was acquired, she enrolled in a PhD program at Pennsylvania State University. By 2014 she was a tenured professor of mechanical engineering at the University of Pittsburgh–Johnstown. After being approached by a colleague in the nursing school, Yang developed an AI-based solution for preventing falls in older adults (rather than detecting them after the fact). "I said, 'I've got to make this,'" Yang tells me.
- North America > United States > Pennsylvania (0.26)
- North America > Canada > Ontario > Toronto (0.17)
- Health & Medicine (0.54)
- Education (0.37)
Will ChatGPT and other AI tools replace journalists in newsrooms?
Will artificial intelligence (AI) soon replace journalists? Many have been asking this question since the boom of generative AI tools such as ChatGPT, which can write a high school essay or a poem in a matter of seconds, and can even pass a medical licensing exam. Now, AI tools are seeping into newsrooms. CNET, an American tech news outlet, has acknowledged using AI to write financial articles, seemingly as early as November 2022. A closer look at the articles on CNET reveals a disclaimer: "This article was assisted by an AI engine and reviewed, fact-checked and edited by our editorial staff".
PAL: Program-aided Language Models
Gao, Luyu, Madaan, Aman, Zhou, Shuyan, Alon, Uri, Liu, Pengfei, Yang, Yiming, Callan, Jamie, Neubig, Graham
Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, they often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code with an LLM and reasoning with a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B with chain-of-thought by an absolute 15% in top-1 accuracy. Our code and data are publicly available at http://reasonwithpal.com/ .
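PAL's central idea is that the LLM writes a program and the interpreter does the computation. A minimal sketch of the execution side follows. It assumes the model's output is a Python snippet defining a function `solution()` that returns the answer; that convention, the hard-coded `GENERATED_PROGRAM` text (standing in for real model output), and the `run_program` helper are hypothetical. Running untrusted model output with `exec` is unsafe outside a sandbox.

```python
from typing import Any

# Hypothetical LLM output for a GSM8K-style word problem.
GENERATED_PROGRAM = '''
def solution():
    """Olivia has $23. She bought five bagels for $3 each. How much money does she have left?"""
    money_initial = 23
    bagels = 5
    bagel_cost = 3
    money_spent = bagels * bagel_cost
    money_left = money_initial - money_spent
    return money_left
'''


def run_program(program: str, entry: str = "solution") -> Any:
    """Execute generated code in a fresh namespace and call its entry point."""
    namespace: dict = {}
    exec(program, namespace)  # the interpreter, not the LLM, does the arithmetic
    return namespace[entry]()


print(run_program(GENERATED_PROGRAM))  # prints 8
```

The model only has to get the decomposition right (which quantities exist and how they combine). Once the program is correct, the arithmetic `23 - 5 * 3` cannot go wrong, which is exactly the failure mode PAL targets in chain-of-thought.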
- North America > United States > California > Los Angeles County > Beverly Hills (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Leisure & Entertainment (0.70)
- Education (0.67)
Westworld season 4: everything we know
Westworld season 4 was confirmed a couple of weeks before the season 3 finale aired in May 2020, suggesting that HBO still has plenty of faith in its big-budget sci-fi drama. "From the western theme park to the technocratic metropolis of the near future, we've thoroughly enjoyed every twist and turn from the minds of master storytellers Jonathan Nolan and Lisa Joy," said Casey Bloys, president of HBO programming, in a company statement. "We can't wait to see where their inspired vision takes us next." Showrunners Nolan and Joy are back to continue their epic tale – based on the 1973 movie directed by Jurassic Park author Michael Crichton – and, going on past form, it's probably best to expect the unexpected. In its third year, the drama took a hard turn away from the android theme park setting at the heart of the first two seasons, spending most of its time in the human world.
- Media > Television (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence (0.71)
- Information Technology > Communications > Mobile (0.36)
'Dead Cells' creator will release firefighting game 'Nuclear Blaze' on October 18th
Sebastian Bernard, the lead developer and designer of the hit indie game Dead Cells, has created a new game that will have you fighting fires, solving mysteries and saving cats. Yes, you'll play a firefighter in Bernard's new 2D action-adventure game, Nuclear Blaze, one who gets air-dropped into a secret military facility that went up in flames for unknown reasons. In the complex, you'll have to use your firehose wisely to deal with the wildfire spreading uncontrollably throughout each section. You'll also have to deal with backdrafts, exploding walls and complex sprinkler systems. But you won't just be trying to put out a blazing inferno in the game: you also have to rescue survivors (cats included) and investigate every nook and cranny to find hidden secrets that will help you figure out the site's true nature, as well as solve the mystery behind the fire.