AITopics | denny zhou

Collaborating Authors

denny zhou

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

Ahn, Jihyun Janice, Kamoi, Ryo, Cheng, Lu, Zhang, Rui, Yin, Wenpeng

arXiv.org Artificial IntelligenceJun-26-2024

Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' discriminative capability to reduce this generative uncertainty, aiding in identifying the correct answers. Specifically, we propose and analyze three discriminative prompts: direct, inverse, and hybrid, to explore the potential of both closed-source and open-source LLMs in self-improving their generative performance on two benchmark datasets. Our insights reveal which discriminative prompt is most promising and when to use it. To our knowledge, this is the first work to systematically analyze LLMs' discriminative capacity to address generative uncertainty.

direct prompt, inverse prompt, llm, (15 more...)

arXiv.org Artificial Intelligence

2407.11017

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)

Add feedback

Self-Guiding Exploration for Combinatorial Problems

Iklassov, Zangir, Du, Yali, Akimov, Farkhad, Takac, Martin

arXiv.org Artificial IntelligenceMay-28-2024

Large Language Models (LLMs) have become pivotal in addressing reasoning tasks across diverse domains, including arithmetic, commonsense, and symbolic reasoning. They utilize prompting techniques such as Exploration-of-Thought, Decomposition, and Refinement to effectively navigate and solve intricate tasks. Despite these advancements, the application of LLMs to Combinatorial Problems (CPs), known for their NP-hardness and critical roles in logistics and resource management remains underexplored. To address this gap, we introduce a novel prompting strategy: Self-Guiding Exploration (SGE), designed to enhance the performance of solving CPs. SGE operates autonomously, generating multiple thought trajectories for each CP task. It then breaks these trajectories down into actionable subtasks, executes them sequentially, and refines the results to ensure optimal outcomes. We present our research as the first to apply LLMs to a broad range of CPs and demonstrate that SGE outperforms existing prompting strategies by over 27.84% in CP optimization performance. Additionally, SGE achieves a 2.46% higher accuracy over the best existing results in other reasoning tasks (arithmetic, commonsense, and symbolic). Our implementation is available online.

avg, combinatorial problem, language model, (16 more...)

arXiv.org Artificial Intelligence

2405.1795

Country:

Africa > Rwanda > Kigali > Kigali (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models

Li, Yitian, Tian, Jidong, He, Hao, Jin, Yaohui

arXiv.org Artificial IntelligenceMay-9-2024

Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought prompting). However, along with testing on more complex reasoning, these methods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop \textit{Hypothesis Testing Prompting}, which adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps. \textit{Hypothesis Testing prompting} involves multiple assumptions and reverses validation of conclusions leading to its unique correct answer. Experiments on two challenging deductive reasoning datasets ProofWriter and RuleTaker show that hypothesis testing prompting not only significantly improves the effect, but also generates a more reasonable and standardized reasoning process.

conclusion, language model, reasoning, (11 more...)

arXiv.org Artificial Intelligence

2405.06707

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.05)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Chain-Of-Thought Prompting Under Streaming Batch: A Case Study

Tang, Yuxin

arXiv.org Artificial IntelligenceJun-1-2023

Recently, Large Language Models (LLMs) have demonstrated remarkable capabilities. Chain-of-Thought (CoT) has been proposed as a way of assisting LLMs in performing complex reasoning. However, developing effective prompts can be a challenging and labor-intensive task. Many studies come out of some way to automatically construct CoT from test data. Most of them assume that all test data is visible before testing and only select a small subset to generate rationales, which is an unrealistic assumption. In this paper, we present a case study on how to construct and optimize chain-of-thought prompting using batch data in streaming settings.

arxiv preprint arxiv, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2306.0055

Country:

North America > United States > Texas > Harris County > Houston (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

GitHub - jeffhj/LM-reasoning: This repository contains a collection of papers and resources on Reasoning in Large Language Models.

#artificialintelligenceDec-25-2022, 14:05:37 GMT

This repository contains a collection of papers and resources on Reasoning in Large Language Models. Feel free to let me know the missing papers (issue or pull request). Thank Kevin Chen-Chuan Chang @UIUC, Jason Wei @Google Brain, Denny Zhou @Google Brain for insightful discussions and suggestions. We mainly focus on techniques that are applicable to improving or eliciting "reasoning" in large language models like GPT-3 (175B) Papers in this paradigm vary a lot and are usually based on small models trained on specific datasets. We list several papers here for reference (that is, the list is not complete).

chen, denny zhou, xuezhi wang, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.83)

Add feedback

Transcending Scaling Laws with 0.1% Extra Compute

Tay, Yi, Wei, Jason, Chung, Hyung Won, Tran, Vinh Q., So, David R., Shakeri, Siamak, Garcia, Xavier, Zheng, Huaixiu Steven, Rao, Jinfeng, Chowdhery, Aakanksha, Zhou, Denny, Metzler, Donald, Petrov, Slav, Houlsby, Neil, Le, Quoc V., Dehghani, Mostafa

arXiv.org Artificial IntelligenceNov-16-2022

Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objective. We show that, with almost negligible extra computational costs and no new sources of data, we are able to substantially improve the scaling properties of large language models on downstream metrics. In this paper, we continue training PaLM with UL2R, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. Impressively, at 540B scale, we show an approximately 2x computational savings rate where U-PaLM achieves the same performance as the final PaLM 540B model at around half its computational budget (i.e., saving $\sim$4.4 million TPUv4 hours). We further show that this improved scaling curve leads to 'emergent abilities' on challenging BIG-Bench tasks -- for instance, U-PaLM does much better than PaLM on some tasks or demonstrates better quality at much smaller scale (62B as opposed to 540B). Overall, we show that U-PaLM outperforms PaLM on many few-shot setups, i.e., English NLP tasks (e.g., commonsense reasoning, question answering), reasoning tasks with chain-of-thought (e.g., GSM8K), multilingual tasks (MGSM, TydiQA), MMLU and challenging BIG-Bench tasks. Finally, we provide qualitative examples showing the new capabilities of U-PaLM for single and multi-span infilling.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.11399

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.66)

Add feedback

Large Language Models Can Self-Improve

Huang, Jiaxin, Gu, Shixiang Shane, Hou, Le, Wu, Yuexin, Wang, Xuezhi, Yu, Hongkun, Han, Jiawei

arXiv.org Artificial IntelligenceOct-25-2022

Large Language Models (LLMs) have achieved excellent performances in various tasks. However, fine-tuning an LLM requires extensive supervision. Human, on the other hand, may improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate "high-confidence" rationale-augmented answers for unlabeled questions using Chain-of-Thought prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4% 82.1% on GSM8K, 78.2% 83.0% on DROP, 90.0% 94.4% on OpenBookQA, and 63.4% 67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label. We conduct ablation studies and show that finetuning on reasoning is critical for self-improvement. Scaling has enabled Large Language ...

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2210.1161

Country:

North America > United States > Pennsylvania (0.04)
Asia > Vietnam (0.04)
Asia > Middle East > Iraq (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Government (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback