AITopics | Toles, Matthew

Collaborating Authors

Toles, Matthew

Program Synthesis Dialog Agents for Interactive Decision-Making

Toles, Matthew, Balwani, Nikhil, Singh, Rattandeep, Rodriguez, Valentina Giulia Sartori, Yu, Zhou

arXiv.org Artificial Intelligence, Mar-17-2025

Many real-world eligibility problems, ranging from medical diagnosis to tax planning, can be mapped to decision problems expressed in natural language, wherein a model must make a binary choice based on user features. Large-scale domains such as legal codes or frequently updated funding opportunities render human annotation (e.g., web forms or decision trees) impractical, highlighting the need for agents that can automatically assist in decision-making. Since relevant information is often only known to the user, it is crucial that these agents ask the right questions. As agents determine when to terminate a conversation, they face a trade-off between accuracy and the number of questions asked, a key metric for both user experience and cost. To evaluate this task, we propose BeNYfits, a new benchmark for determining user eligibility for multiple overlapping social benefits opportunities through interactive decision-making. Our experiments show that current language models struggle with frequent hallucinations, with GPT-4o scoring only 35.7 F1 using a ReAct-style chain-of-thought. To address this, we introduce ProADA, a novel approach that leverages program synthesis to assist in decision-making by mapping dialog planning to a code generation problem and using gaps in structured data to determine the best next action. Our agent, ProADA, improves the F1 score to 55.6 while maintaining nearly the same number of dialog turns.
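The idea of using gaps in structured data to pick the next dialog action can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not taken from the paper: the `is_eligible` rule, its feature names and thresholds, and the `run_dialog` helper are all hypothetical. In a real system the eligibility program would presumably be generated by a language model from the natural-language requirements.

```python
from typing import Callable, Dict, Tuple


# Hypothetical eligibility rule, standing in for a program that a code
# generation model might synthesize from natural-language requirements.
def is_eligible(user: Dict[str, object]) -> bool:
    if user["age"] < 18:
        return False
    return user["household_income"] <= 30000 and user["is_resident"]


def run_dialog(
    program: Callable[[Dict[str, object]], bool],
    ask: Callable[[str], object],
) -> Tuple[bool, int]:
    """Run the program; whenever it reads a missing feature, ask for it."""
    user: Dict[str, object] = {}
    questions = 0
    while True:
        try:
            # The program decides as soon as it has the features it needs.
            return program(user), questions
        except KeyError as missing:
            # A missing key is a gap in the structured data: ask about it.
            key = missing.args[0]
            user[key] = ask(key)
            questions += 1


# Simulated user who answers from a fixed profile (illustrative values).
profile = {"age": 16, "household_income": 10000, "is_resident": True}
decision, turns = run_dialog(is_eligible, profile.__getitem__)
```

Because the program returns as soon as a rule fails, the simulated user above is asked only about `age`. Features that cannot change the decision are never requested, which is one way to keep the number of dialog turns low.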

large language model, logic & formal reasoning, machine learning

2502.1961

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Law (0.91)
Health & Medicine > Therapeutic Area > Immunology (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.85)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)

Pragmatic Evaluation of Clarifying Questions with Fact-Level Masking

Toles, Matthew, Huang, Yukun, Yu, Zhou, Gravano, Luis

arXiv.org Artificial Intelligence, Jan-7-2024

The ability to derive useful information by asking clarifying questions (ACQ) is an important element of real-life collaboration on reasoning tasks, such as question answering (QA). Existing natural language ACQ challenges, however, evaluate generations based on word overlap rather than the value of the information itself. Word overlap is often an inappropriate metric for question generation since many different questions could be useful in a given situation, and a single question can be phrased in many different ways. Instead, we propose evaluating questions pragmatically based on the value of the information they retrieve. Here we present a definition and framework for natural language pragmatic asking of clarifying questions (PACQ), the problem of generating questions that result in answers useful for a reasoning task. We also present fact-level masking (FLM), a procedure for converting natural language datasets into self-supervised PACQ datasets by omitting particular critical facts. Finally, we generate a PACQ dataset from the HotpotQA dataset using FLM and evaluate several zero-shot language models on it. Our experiments show that current zero-shot models struggle to ask questions that retrieve useful information, as compared to human annotators. These results demonstrate an opportunity to use FLM datasets and the PACQ framework to objectively evaluate and improve question generation and other language models.
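The contrast between word-overlap scoring and pragmatic scoring can be made concrete with a toy sketch. It is only an illustration under stated assumptions: the facts, the `toy_oracle` and `toy_solver` functions, and the `pragmatic_gain` score are hypothetical stand-ins, not the paper's models or metric. The sketch masks one fact, then scores a question by whether the answer it retrieves lets a downstream solver reach the gold answer.

```python
from typing import Callable, List, Tuple


def mask_fact(facts: List[str], index: int) -> Tuple[List[str], str]:
    """Hide one fact; return the remaining context and the hidden fact."""
    return facts[:index] + facts[index + 1:], facts[index]


def pragmatic_gain(
    question: str,
    context: List[str],
    oracle: Callable[[str], str],
    solver: Callable[[List[str]], str],
    gold: str,
) -> int:
    """Score a question by the change in downstream task correctness."""
    answer = oracle(question)
    before = int(solver(context) == gold)
    after = int(solver(context + [answer]) == gold)
    return after - before


# Illustrative two-hop example (invented facts, not from HotpotQA).
facts = ["Film X was directed by Ann Lee.", "Ann Lee was born in Ohio."]
context, hidden = mask_fact(facts, 1)


def toy_oracle(question: str) -> str:
    # Hypothetical oracle: reveals the hidden fact only when asked about it.
    if "born" in question.lower():
        return hidden
    return "No relevant information."


def toy_solver(ctx: List[str]) -> str:
    # Hypothetical solver: answers correctly only if the needed fact is present.
    return "Ohio" if any("born in Ohio" in fact for fact in ctx) else "unknown"


useful = pragmatic_gain("Where was Ann Lee born?", context, toy_oracle, toy_solver, "Ohio")
useless = pragmatic_gain("Who directed Film X?", context, toy_oracle, toy_solver, "Ohio")
```

Under this scoring, any phrasing that recovers the masked fact earns the same credit, regardless of how many words it shares with a reference question.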

information, large language model, machine learning

2310.11571

Country:

North America > United States > Tennessee (0.15)
North America > United States > Alabama (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
