AITopics

2301.11309

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
Information Technology > Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Trivedi, Harsh, Balasubramanian, Niranjan, Khot, Tushar, Sabharwal, Ashish

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

arXiv.org Artificial Intelligence Jun-22-2023

Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps, or chains of thought (CoT), for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source helps LLMs, we observe that this one-step retrieve-and-read approach is insufficient for multi-step QA. Here, what to retrieve depends on what has already been derived, which in turn may depend on what was previously retrieved. To address this, we propose IRCoT, a new approach for multi-step QA that interleaves retrieval with steps (sentences) in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. Using IRCoT with GPT3 substantially improves retrieval (up to 21 points) as well as downstream QA (up to 15 points) on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. We observe similar substantial gains in out-of-distribution (OOD) settings as well as with much smaller models such as Flan-T5-large without additional training. IRCoT reduces model hallucination, resulting in factually more accurate CoT reasoning. Code, data, and prompts are available at https://github.com/stonybrooknlp/ircot
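A minimal sketch of the interleaving loop described above, for illustration only. It is not the authors' code, which is in the linked repository. `retrieve` is a toy word-overlap ranker standing in for a real retriever. `reason_step` is a hypothetical callable that would wrap an LLM prompt (GPT3 or Flan-T5 in the paper) and return one new CoT sentence. Two further behaviors are assumptions made to keep the sketch small: already-collected paragraphs are skipped on later retrievals, and the loop stops once a sentence contains "answer is" or after `max_steps`.

```python
import re
from typing import Callable, List, Sequence, Set, Tuple

def words(text: str) -> Set[str]:
    """Lowercased word set with punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: Sequence[str], k: int = 1,
             exclude: Sequence[str] = ()) -> List[str]:
    """Toy retriever (assumption): rank by shared words, skip already-collected paragraphs."""
    q = words(query)
    scored = [(len(q & words(d)), d) for d in corpus if d not in exclude]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

# Hypothetical reasoner signature: (question, paragraphs, cot_so_far) -> next CoT sentence.
Reasoner = Callable[[str, List[str], List[str]], str]

def ircot(question: str, corpus: Sequence[str], reason_step: Reasoner,
          k: int = 1, max_steps: int = 5) -> Tuple[List[str], List[str]]:
    # Initial retrieval uses the question, as in one-step retrieve-and-read.
    paragraphs = retrieve(question, corpus, k)
    cot: List[str] = []
    for _ in range(max_steps):
        # Reason: extend the CoT by one sentence, conditioned on all paragraphs so far.
        sentence = reason_step(question, paragraphs, cot)
        cot.append(sentence)
        if "answer is" in sentence.lower():  # assumed stopping rule
            break
        # Retrieve: the newest CoT sentence becomes the next query.
        paragraphs.extend(retrieve(sentence, corpus, k, exclude=paragraphs))
    return cot, paragraphs

corpus = [
    "The Eiffel Tower was designed by Gustave Eiffel's company.",
    "Gustave Eiffel was born in Dijon.",
    "Paris is the capital of France.",
]
# Scripted stand-in for an LLM: returns canned CoT sentences in order.
canned = [
    "The Eiffel Tower was designed by Gustave Eiffel.",
    "Gustave Eiffel was born in Dijon.",
    "So the answer is Dijon.",
]
cot, paragraphs = ircot("Where was the designer of the Eiffel Tower born?",
                        corpus, lambda q, p, c: canned[len(c)])
print(cot)
print(paragraphs)
```

With k=1, retrieval on the question alone returns only the Eiffel Tower paragraph. The birthplace paragraph is reached because the first CoT sentence names Gustave Eiffel and serves as the next query. This is the dependency between "what has already been derived" and "what to retrieve" that the abstract describes.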

large language model, machine learning, natural language

2212.10509

Country:

North America > United States > California > Los Angeles County (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Media > Film (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

TIME - Tech Jun-21-2023, 12:00:32 GMT

Read TIME's Interview With OpenAI CEO Sam Altman

For this week's TIME100 Most Influential Companies cover story about OpenAI and its CEO Sam Altman, TIME's former editor-in-chief Edward Felsenthal sat down with a number of company executives in early May, including two sessions with Altman, transcribed below. The conversations have been condensed and edited for clarity. Sam Altman: One thing I use it for every day is help with summarization. I can't really keep up on my inbox anymore, but I made a little thing to help it summarize for me and pull out important stuff from unknown senders, and that's very helpful. I used it to translate an article for someone I'm meeting next week, to prepare for that. This is sort of a funny thing, I used it to help me draft a tweet that I was having a hard time with. Not as much as it might have seemed from the outside.

ceo sam altman, openai ceo sam altman, sam altman


Genre: Personal > Interview (0.50)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.84)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.60)

Washington Post - Technology News Jun-21-2023, 10:07:48 GMT

Schumer to call for AI regulation in keynote address

The booming popularity of AI-driven chatbots like OpenAI's ChatGPT and Google's Bard has both captivated and concerned officials, who have said they are worried about again failing to protect consumers from the perils of Silicon Valley's latest craze. It's prompted lawmakers to hold a wave of public hearings and private meetings with industry leaders, researchers and advocates as they look to get their bearings in the quickly changing AI field.

ai regulation, keynote address, schumer


Country: North America > United States > California (0.37)

Industry:

Law > Statutes (0.85)
Government > Regional Government > North America Government > United States Government (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.37)

USATODAY - Tech Top Stories Jun-21-2023, 09:05:40 GMT

Beyond ChatGPT: AI conspiracy theories are here. Don't believe everything you read.

For as long as there have been scientific breakthroughs and technological innovations, people have been labeling them as magic, witchcraft or the product of nefarious conspiracies directed by powerful, unseen actors. Medieval metalworkers, who transformed stone into jewelry and swords, were seen as agents of either the ruling class or the supernatural, threatening the social fabric. Many still believe that the moon landing was faked in a TV studio. More recently, conspiracy theories that falsely claimed 5G cell technology spread COVID-19 led to attacks on cell towers in the United Kingdom. Artificial intelligence is a technology ready-made for conspiratorial thinking.

ai conspiracy theory, algorithm, conspiracy theory


Country:

Europe > United Kingdom (0.25)
North America > United States (0.15)
Europe > Russia (0.05)

Industry:

Government (1.00)
Media > News (0.69)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.50)
Health & Medicine > Therapeutic Area > Immunology (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

GPT-Based Models Meet Simulation: How to Efficiently Use Large-Scale Pre-Trained Language Models Across Simulation Tasks

Giabbanelli, Philippe J.

The disruptive technology provided by large-scale pre-trained language models (LLMs) such as ChatGPT or GPT-4 has received significant attention in several application domains, often with an emphasis on high-level opportunities and concerns. This paper is the first examination of the use of LLMs for scientific simulations. We focus on four modeling and simulation tasks, each time assessing the expected benefits and limitations of LLMs while providing practical guidance for modelers regarding the steps involved. The first task is devoted to explaining the structure of a conceptual model to promote the engagement of participants in the modeling process. The second task focuses on summarizing simulation outputs, so that model users can identify a preferred scenario. The third task seeks to broaden accessibility to simulation platforms by conveying the insights of simulation visualizations via text. Finally, the fourth task explores the possibility of explaining simulation errors and providing guidance to resolve them.

large language model, machine learning, natural language

2306.13679

Country:

North America > United States > Ohio > Butler County > Oxford (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre:

Overview (1.00)
Research Report (0.82)
Instructional Material (0.66)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zhang, Boyu, Yang, Hongyang, Liu, Xiao-Yang

Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models

Sentiment analysis is a vital tool for uncovering insights from financial articles, news, and social media, shaping our understanding of market movements. Despite the impressive capabilities of large language models (LLMs) in financial natural language processing (NLP), they still struggle to interpret numerical values accurately and to grasp financial context, limiting their effectiveness in predicting financial sentiment. In this paper, we introduce a simple yet effective instruction tuning approach to address these issues. By transforming a small portion of supervised financial sentiment analysis data into instruction data and fine-tuning a general-purpose LLM with this method, we achieve remarkable advancements in financial sentiment analysis. In experiments, our approach outperforms state-of-the-art supervised sentiment analysis models, as well as widely used LLMs such as ChatGPT and LLaMAs, particularly in scenarios where numerical understanding and contextual comprehension are vital.
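The data transformation described above turns labeled sentiment examples into instruction data. It might look roughly like the following sketch. Several details are assumptions made for illustration and are not the paper's own templates: the instruction wording, the integer label scheme, and the instruction/input/output record layout (a common format for instruction-tuning data). The fine-tuning step itself is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical prompt wording; the paper's own instruction templates may differ.
INSTRUCTION = ("What is the sentiment of this financial text? "
               "Answer with one of: negative, neutral, positive.")

# Hypothetical label ids of a supervised sentiment dataset.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}

def to_instruction_data(samples: List[Tuple[str, int]]) -> List[Dict[str, str]]:
    """Convert (text, label_id) pairs into instruction/input/output records."""
    return [
        {"instruction": INSTRUCTION, "input": text, "output": LABELS[label_id]}
        for text, label_id in samples
    ]

# Made-up examples for illustration only.
records = to_instruction_data([
    ("Quarterly revenue fell 12% year over year.", 0),
    ("Net profit doubled compared with last year.", 2),
])
print(records[0]["output"])
```

One plausible reason for listing the allowed labels inside the instruction is that it nudges the tuned model toward a fixed answer vocabulary. Its free-text output can then be mapped back to a sentiment class with a simple string match.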

large language model, machine learning, sentiment analysis

2306.12659

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Identifying and Extracting Rare Disease Phenotypes with Large Language Models

Shyr, Cathy, Hu, Yan, Harris, Paul A., Xu, Hua

Rare diseases (RDs) are collectively common and affect 300 million people worldwide. Accurate phenotyping is critical for informing diagnosis and treatment, but RD phenotypes are often embedded in unstructured text and time-consuming to extract manually. While natural language processing (NLP) models can perform named entity recognition (NER) to automate extraction, a major bottleneck is the development of a large, annotated corpus for model training. Recently, prompt learning emerged as an NLP paradigm that can lead to more generalizable results with no labeled samples (zero-shot) or only a few (few-shot). Despite growing interest in ChatGPT, a revolutionary large language model capable of following complex human prompts and generating high-quality responses, no prior work has studied its NER performance for RDs in the zero- and few-shot settings. To this end, we engineered novel prompts aimed at extracting RD phenotypes and, to the best of our knowledge, are the first to establish a benchmark for evaluating ChatGPT's performance in these settings. We compared its performance to the traditional fine-tuning approach and conducted an in-depth error analysis. Overall, fine-tuning BioClinicalBERT resulted in higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.591 in the zero- and few-shot settings, respectively). Despite this, ChatGPT achieved similar or higher accuracy for certain entities (i.e., rare diseases and signs) in the one-shot setting (F1 of 0.776 and 0.725). This suggests that with appropriate prompt engineering, ChatGPT has the potential to match or outperform fine-tuned language models for certain entity types with just one labeled sample. While the proliferation of large language models may provide opportunities for supporting RD diagnosis and treatment, researchers and clinicians should critically evaluate model outputs and be well-informed of their limitations.
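The F1 figures above combine precision and recall over extracted entities. Below is a minimal sketch of exact-match entity-level F1. It assumes each entity is a (start, end, type) span and that a prediction counts only when both the offsets and the type match a gold annotation. The paper's matching criteria may differ, and the example spans are made up.

```python
from typing import Set, Tuple

# (start offset, end offset, entity type); the exact-match criterion is an assumption.
Span = Tuple[int, int, str]

def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Entity-level F1: a predicted span is correct only if offsets and type match exactly."""
    tp = len(gold & pred)          # true positives
    if tp == 0:
        return 0.0                 # also covers empty gold or empty predictions
    precision = tp / len(pred)     # tp / (tp + false positives)
    recall = tp / len(gold)        # tp / (tp + false negatives)
    return 2 * precision * recall / (precision + recall)

gold = {(0, 12, "RARE_DISEASE"), (20, 28, "SIGN"), (30, 37, "SYMPTOM")}
pred = {(0, 12, "RARE_DISEASE"), (20, 28, "SYMPTOM")}  # second span has the wrong type
print(round(entity_f1(gold, pred), 3))
```

In this example, one of two predictions is correct against three gold entities. That gives precision 1/2 and recall 1/3, so F1 = 2 · (1/2) · (1/3) / (1/2 + 1/3) = 0.4.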

large language model, machine learning, natural language

2306.12656

Country:

North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ahmad, Baleegh, Tan, Benjamin, Karri, Ramesh, Pearce, Hammond

FLAG: Finding Line Anomalies (in code) with Generative AI

Code contains security and functional bugs. The process of identifying and localizing them is difficult and relies on human labor. In this work, we present a novel approach (FLAG) to assist human debuggers. FLAG is based on the lexical capabilities of generative AI, specifically, Large Language Models (LLMs). Here, we input a code file, then extract and regenerate each line within that file for self-comparison. By comparing the original code with an LLM-generated alternative, we can flag notable differences as anomalies for further inspection, with features such as distance from comments and LLM confidence also aiding this classification. This reduces the inspection search space for the designer. Unlike other automated approaches in this area, FLAG is language-agnostic, can work on incomplete (and even non-compiling) code, and requires no creation of security properties, functional tests, or definition of rules. In this work, we explore the features that help LLMs in this classification and evaluate the performance of FLAG on known bugs. We use 121 benchmarks across C, Python, and Verilog, each containing a known security or functional weakness. We conduct the experiments using two state-of-the-art LLMs, OpenAI's code-davinci-002 and gpt-3.5-turbo, but the approach can also be used with other models. FLAG can identify 101 of the defects and helps reduce the search space to 12-17% of source code.
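A rough sketch of the regenerate-and-compare idea described above. It assumes a hypothetical `regenerate(lines, i)` callable that would ask an LLM to rewrite line `i` from its surrounding context. In the usage below, a scripted stub takes the LLM's place. Similarity is measured with `difflib.SequenceMatcher` from the Python standard library. This is a swapped-in stand-in for the distance, comment, and confidence features the paper explores, and the 0.95 threshold is arbitrary.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical regenerator signature: (all lines, index of line to rewrite) -> new line.
Regenerator = Callable[[List[str], int], str]

def flag_lines(source: str, regenerate: Regenerator,
               threshold: float = 0.95) -> List[Tuple[int, str, str, float]]:
    """Regenerate each non-blank line and flag those that differ notably from the original."""
    lines = source.splitlines()
    flagged = []
    for i, original in enumerate(lines):
        if not original.strip():
            continue  # nothing to compare on blank lines
        candidate = regenerate(lines, i)
        # Similarity in [0, 1] after stripping indentation; 1.0 means identical.
        score = difflib.SequenceMatcher(None, original.strip(), candidate.strip()).ratio()
        if score < threshold:
            flagged.append((i + 1, original, candidate, score))  # 1-based line number
    return flagged

code = "def is_adult(age):\n    return age < 18"

def fake_regenerate(lines: List[str], i: int) -> str:
    # Scripted stand-in for an LLM call: it "prefers" >= on the second line.
    return "    return age >= 18" if i == 1 else lines[i]

for lineno, original, suggestion, score in flag_lines(code, fake_regenerate):
    print(f"line {lineno}: {original.strip()!r} vs {suggestion.strip()!r} (similarity {score:.2f})")
```

Only the comparison line is flagged, so a reviewer would inspect one line instead of the whole file. This mirrors, in miniature, the search-space reduction the abstract reports.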

large language model, machine learning, natural language

2306.12643

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.90)

ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

D'Arcy, Mike, Ross, Alexis, Bransom, Erin, Kuehl, Bailey, Bragg, Jonathan, Hope, Tom, Downey, Doug

Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response. We introduce this task for large language models and release ARIES, a dataset of review comments and their corresponding paper edits, to enable training and evaluating models. We study two versions of the task, comment-edit alignment and edit generation, and evaluate several baselines, including GPT-4. We find that models struggle even to identify the edits that correspond to a comment, especially in cases where the comment is phrased in an indirect way or where the edit addresses the spirit of a comment but not the precise request. When tasked with generating edits, GPT-4 often succeeds in addressing comments on a surface level, but it rigidly follows the wording of the feedback rather than the underlying intent, and includes fewer technical details than human-written edits. We hope that our formalization, dataset, and analysis will form a foundation for future work in this area.

large language model, machine learning, natural language

2306.12587

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)