field goal
What Has Been Lost with Synthetic Evaluation?
Gill, Alexander, Ravichander, Abhilasha, Marasović, Ana
Large language models (LLMs) are increasingly used for data generation. However, creating evaluation benchmarks raises the bar for this emerging paradigm. Benchmarks must target specific phenomena, penalize exploiting shortcuts, and be challenging. Through two case studies, we investigate whether LLMs can meet these demands by generating reasoning-over-text benchmarks and comparing them to those created through careful crowdsourcing. Specifically, we evaluate both the validity and difficulty of LLM-generated versions of two high-quality reading comprehension datasets: CondaQA, which evaluates reasoning about negation, and DROP, which targets reasoning about quantities. We find that prompting LLMs can produce variants of these datasets that are often valid according to the annotation guidelines, at a fraction of the cost of the original crowdsourcing effort. However, we show that they are less challenging for LLMs than their human-authored counterparts. This finding sheds light on what may have been lost by generating evaluation data with LLMs, and calls for critically reassessing the immediate use of this increasingly prevalent approach to benchmark creation.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Sweden (0.14)
- Europe > Denmark (0.14)
- (33 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Leisure & Entertainment > Sports > Football (1.00)
- Government > Regional Government > Europe Government > United Kingdom Government (1.00)
- Education (1.00)
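The generation setup described in the abstract can be pictured as a prompt that asks an LLM to produce a guideline-conforming variant of an existing instance. A minimal sketch, assuming a hypothetical `make_edit_prompt` helper and illustrative template wording (the paper's actual prompts are not reproduced here):

```python
# Hypothetical sketch: assembling a CondaQA-style generation prompt.
# The template wording and `make_edit_prompt` are assumptions, not the
# authors' actual prompts; the result would be sent to an LLM.

def make_edit_prompt(passage: str, negation_cue: str) -> str:
    """Ask for a paraphrase edit of the sentence containing the cue,
    mirroring the kind of edit CondaQA annotators were instructed to make."""
    return (
        "Below is a passage containing the negation cue "
        f"'{negation_cue}'.\n\n"
        f"Passage: {passage}\n\n"
        "Rewrite the passage so that the sentence containing the cue is "
        "paraphrased without changing its meaning. Return only the edited "
        "passage."
    )

prompt = make_edit_prompt(
    passage="The drug did not reduce symptoms in the trial.",
    negation_cue="not",
)
print(prompt)
```

Validity would then be checked against the original annotation guidelines, and difficulty measured by evaluating models on the generated variant.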
Learning to Attribute with Attention
Cohen-Wang, Benjamin, Chuang, Yung-Sung, Madry, Aleksander
Given a sequence of tokens generated by a language model, we may want to identify the preceding tokens that influence the model to generate this sequence. Performing such token attribution is expensive; a common approach is to ablate preceding tokens and directly measure their effects. To reduce the cost of token attribution, we revisit attention weights as a heuristic for how a language model uses previous tokens. Naive approaches to attribute model behavior with attention (e.g., averaging attention weights across attention heads to estimate a token's influence) have been found to be unreliable. To attain faithful attributions, we propose treating the attention weights of different attention heads as features. This way, we can learn how to effectively leverage attention weights for attribution (using signal from ablations). Our resulting method, Attribution with Attention (AT2), reliably performs on par with approaches that involve many ablations, while being significantly more efficient. To showcase the utility of AT2, we use it to prune less important parts of a provided context in a question answering setting, improving answer quality. We provide code for AT2 at https://github.com/MadryLab/AT2.
- Europe > United Kingdom (0.27)
- Asia > Japan (0.04)
- Europe > France (0.04)
- (7 more...)
- Leisure & Entertainment > Sports (0.94)
- Government > Regional Government > North America Government > United States Government (0.93)
- Media (0.68)
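The core idea — per-head attention weights as features, fit against ablation-measured effects — can be sketched with a toy least-squares fit. The shapes, the synthetic data, and the linear model are illustrative assumptions; the released code at github.com/MadryLab/AT2 is the reference implementation:

```python
# Illustrative sketch of AT2's core idea: treat per-head attention weights
# as features and learn to combine them using ablation signal.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_heads = 50, 12

# attn[i, h]: attention from the generated span to source token i in head h
attn = rng.random((n_tokens, n_heads))

# ablation_effect[i]: drop in target log-probability when token i is ablated
# (simulated here as a noisy linear function of the head weights)
true_head_weights = rng.random(n_heads)
ablation_effect = attn @ true_head_weights + 0.01 * rng.normal(size=n_tokens)

# Learn one combination of heads shared across tokens
learned, *_ = np.linalg.lstsq(attn, ablation_effect, rcond=None)

# Attribution score for each source token, from attention alone
scores = attn @ learned
print(int(np.argmax(scores)))  # most influential source token
```

Once the head combination is learned, new attributions need only a forward pass, avoiding the many ablation runs of the direct approach.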
The Empirical Impact of Data Sanitization on Language Models
Pal, Anwesan, Bhargava, Radhika, Hinsz, Kyle, Esterhuizen, Jacques, Bhattacharya, Sudipta
Data sanitization in the context of language modeling involves identifying sensitive content, such as personally identifiable information (PII), and redacting it from a dataset corpus. It is a common practice used in natural language processing (NLP) to maintain privacy. Nevertheless, the impact of data sanitization on the language understanding capability of a language model remains less studied. This paper empirically analyzes the effects of data sanitization across several benchmark language-modeling tasks, including comprehension question answering (Q&A), entailment, sentiment analysis, and text classification. Our experiments cover a wide spectrum, from finetuning small-scale language models to prompting large language models (LLMs), on both original and sanitized datasets, comparing their performance across the tasks. Interestingly, our results suggest that for some tasks, such as sentiment analysis or entailment, the impact of redaction is quite low, typically around 1-5%, while for tasks such as comprehension Q&A a large drop of >25% in performance is observed on redacted queries compared to the original. For tasks with a higher impact, we take a deeper dive to inspect the presence of task-critical entities. Finally, we investigate the correlation between performance and the number of redacted entities, and suggest a strategy to repair an already redacted dataset by means of content-based subsampling. Additional details are available at https://sites.google.com/view/datasan.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Asia (0.04)
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Sports > Football (0.70)
- Law (0.68)
- Health & Medicine (0.67)
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Kim, Joongwon, Paranjape, Bhargavi, Khot, Tushar, Hajishirzi, Hannaneh
Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to address a diverse set of complex tasks involving numerical, tabular, and knowledge-based reasoning. Husky iterates between two stages: 1) generating the next action to take towards solving a given task and 2) executing the action using expert models and updating the current solution state. We identify a thorough ontology of actions for addressing complex tasks and curate high-quality data to train expert models for executing these actions. Our experiments show that Husky outperforms prior language agents across 14 evaluation datasets. Moreover, we introduce HuskyQA, a new evaluation set which stress tests language agents for mixed-tool reasoning, with a focus on retrieving missing knowledge and performing numerical reasoning. Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems. Our code and models are available at https://github.com/agent-husky/Husky-v1.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (50 more...)
- Workflow (0.95)
- Research Report > New Finding (0.67)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Transportation > Ground > Rail (1.00)
- (12 more...)
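Husky's two-stage loop — generate the next action, then execute it with an expert model and update the solution state — can be sketched schematically. The toy tools, the stopping convention, and the hard-coded generator below are all assumptions standing in for the learned 7B components:

```python
# Schematic sketch of the iterate-until-done agent loop: an action
# generator proposes (tool, input) pairs; expert executors update state.
def action_generator(state: dict) -> tuple[str, str]:
    """Stand-in for Husky's learned action generator."""
    if "population" not in state:
        return ("search", "population of city X")
    if "answer" not in state:
        return ("math", f"{state['population']} * 2")
    return ("finish", "")

EXPERTS = {
    "search": lambda q, s: s.update(population=500_000),   # retrieval stub
    "math": lambda expr, s: s.update(answer=eval(expr)),   # toy calculator
}

state: dict = {}
while True:
    action, arg = action_generator(state)
    if action == "finish":
        break
    EXPERTS[action](arg, state)
print(state["answer"])
```

The unified action space is what lets one agent cover numerical, tabular, and knowledge-based tasks: the generator only ever chooses among a fixed ontology of actions, and each action has a dedicated expert executor.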
Estimating the age-conditioned average treatment effects curves: An application for assessing load-management strategies in the NBA
Nakamura-Sakai, Shinpei, Forastiere, Laura, Macdonald, Brian
In the realm of competitive sports, understanding the performance dynamics of athletes, represented by the age curve (showing progression, peak, and decline), is vital. Our research introduces a novel framework for quantifying age-specific treatment effects, enhancing the granularity of performance trajectory analysis. Firstly, we propose a methodology for estimating the age curve using game-level data, diverging from traditional season-level data approaches, and tackling its inherent complexities with a meta-learner framework that leverages advanced machine learning models. This approach uncovers intricate non-linear patterns missed by existing methods. Secondly, our framework enables the identification of causal effects, allowing for a detailed examination of age curves under various conditions. By defining the Age-Conditioned Treatment Effect (ACTE), we facilitate the exploration of causal relationships regarding treatment impacts at specific ages. Finally, applying this methodology to study the effects of rest days on performance metrics, particularly across different ages, offers valuable insights into load management strategies' effectiveness. Our findings underscore the importance of tailored rest periods, highlighting their positive impact on athlete performance and suggesting a reevaluation of current management practices for optimizing athlete performance.
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.86)
- Leisure & Entertainment > Sports > Basketball (1.00)
- Health & Medicine > Consumer Health (1.00)
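The ACTE idea can be illustrated with a toy T-learner: fit one outcome curve for rested games and one for non-rested games as functions of age, then take their difference at each age. The simulated data, the quadratic fit, and the T-learner choice are illustrative assumptions (the paper uses a meta-learner framework with more flexible ML models):

```python
# Toy Age-Conditioned Treatment Effect (ACTE) estimate via a T-learner.
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(19, 38, size=2000)
rest = rng.integers(0, 2, size=2000)  # 1 = game played after a rest day
# Simulated performance: inverted-U age curve plus a rest benefit
# that grows with age.
perf = (-0.05 * (age - 27) ** 2
        + rest * 0.1 * (age - 19)
        + rng.normal(0, 0.5, 2000))

def fit_curve(x, y):
    return np.poly1d(np.polyfit(x, y, deg=2))

mu_rested = fit_curve(age[rest == 1], perf[rest == 1])
mu_not = fit_curve(age[rest == 0], perf[rest == 0])

def acte(a: float) -> float:
    """Estimated effect of a rest day on performance at age a."""
    return float(mu_rested(a) - mu_not(a))

print(round(acte(35), 2), round(acte(21), 2))
```

In this simulation the estimated rest benefit is larger for older players, the qualitative pattern the abstract's load-management findings point to.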
Chiefs' Harrison Butker drills longest field goal in Super Bowl history, breaking record set in 1st half
The first score of the Super Bowl turned out to be historic, but the record didn't stand for long. Kansas City Chiefs kicker Harrison Butker drilled his second field goal of Super Bowl LVIII against the San Francisco 49ers, a 57-yarder that rewrote the record for the longest kick in the "Big Game." The previous record had been set in the first half by 49ers rookie kicker Jake Moody, who connected from 56 yards.
- North America > United States > Missouri > Jackson County > Kansas City (0.64)
- North America > United States > California > San Francisco County > San Francisco (0.33)
- North America > United States > Nevada > Clark County > Las Vegas (0.06)
- North America > United States > Michigan (0.06)
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning
Mekala, Rajasekhar Reddy, Razeghi, Yasaman, Singh, Sameer
Language models are achieving impressive performance on various tasks by aggressively adopting inference-time prompting techniques, such as zero-shot and few-shot prompting. In this work, we introduce EchoPrompt, a simple yet effective approach that prompts the model to rephrase its queries before answering them. EchoPrompt is adapted for both zero-shot and few-shot in-context learning with standard and chain-of-thought prompting. Experimental results show that EchoPrompt yields substantial improvements across all these settings for four families of causal language models. These improvements are observed across various numerical reasoning (e.g. GSM8K, SVAMP), reading comprehension (e.g. DROP), and logical reasoning (e.g. Coin Flipping) tasks. On average, EchoPrompt improves the Zero-shot-CoT performance of code-davinci-002 by 5% in numerical tasks and 13% in reading comprehension tasks. We investigate the factors contributing to EchoPrompt's effectiveness through ablation studies, which reveal that both the original query and the model-generated rephrased version are instrumental in its performance gains. Our empirical results indicate that EchoPrompt is an effective technique that enhances in-context learning performance. We recommend incorporating EchoPrompt into various baseline prompting strategies to achieve performance boosts.
- Europe (0.14)
- North America > Dominican Republic (0.04)
- North America > Central America (0.04)
- (4 more...)
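The zero-shot EchoPrompt template amounts to prepending a rephrase instruction to the query. The exact wording below is an assumption; the paper evaluates several rephrasing variants:

```python
# Minimal sketch of an EchoPrompt-style zero-shot template: instruct the
# model to restate the query before answering it.
def echo_prompt(query: str) -> str:
    return (
        f"Q: {query}\n"
        "First, repeat the question in your own words. "
        "Then answer it step by step.\n"
        "A: Let's repeat the question."
    )

p = echo_prompt("A coin is heads up. Amy flips it. Is it still heads up?")
print(p)
```

The ablations in the paper suggest that keeping both the original query and the model-generated restatement in context is what drives the gains, which this template preserves by construction.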
Successive Prompting for Decomposing Complex Questions
Dua, Dheeru, Gupta, Shivanshu, Singh, Sameer, Gardner, Matt
Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step, (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate a synthetic dataset which can be used to bootstrap a model's ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement of ~5% absolute F1 on a few-shot version of the DROP dataset when compared with a state-of-the-art model with the same supervision.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Mexico > Veracruz (0.05)
- Asia > China > Hong Kong (0.04)
- (6 more...)
- Research Report (0.70)
- Workflow (0.66)
A Neural-Symbolic Approach to Natural Language Understanding
Liu, Zhixuan, Wang, Zihao, Lin, Yuan, Li, Hang
Deep neural networks, empowered by pre-trained language models, have achieved remarkable results in natural language understanding (NLU) tasks. However, their performance can deteriorate drastically when logical reasoning is needed. This is because NLU in principle depends not only on analogical reasoning, which deep neural networks are good at, but also on logical reasoning. According to the dual-process theory, analogical reasoning and logical reasoning are respectively carried out by System 1 and System 2 in the human brain. Inspired by the theory, we present a novel framework for NLU called Neural-Symbolic Processor (NSP), which performs analogical reasoning based on neural processing and logical reasoning based on both neural and symbolic processing. As a case study, we conduct experiments on two NLU tasks, question answering (QA) and natural language inference (NLI), where numerical reasoning (a type of logical reasoning) is necessary. The experimental results show that our method significantly outperforms state-of-the-art methods on both tasks.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (3 more...)
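The System-1/System-2 split can be illustrated as routing: inputs that need exact numerical reasoning go to a symbolic processor, everything else to a neural component. The routing rule, the regex-based number extraction, and both processors below are toy assumptions, not the NSP architecture itself:

```python
# Illustrative neural-symbolic routing for a toy NLI task: compare two
# numbers exactly (System 2) when the input calls for it, otherwise fall
# back to an analogical stub (System 1).
import re

def symbolic_processor(text: str) -> str:
    """System 2 stand-in: extract the two numbers and compare exactly."""
    a, b = map(int, re.findall(r"\d+", text))
    return "entailment" if a >= b else "contradiction"

def neural_processor(text: str) -> str:
    """System 1 stand-in: a fixed analogical guess."""
    return "neutral"

def nsp(text: str) -> str:
    needs_logic = bool(re.search(r"\d+.*(more than|at least).*\d+", text))
    return symbolic_processor(text) if needs_logic else neural_processor(text)

print(nsp("The team scored 57 points, more than the 56 needed."))
print(nsp("The crowd seemed happy."))
```

The point of the hybrid design is exactly this division of labor: the neural side handles fuzzy similarity, while the symbolic side guarantees exact arithmetic and comparison.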
Teaching Neural Module Networks to Do Arithmetic
Chen, Jiayi, Guo, Xiao-Yu, Li, Yuan-Fang, Haffari, Gholamreza
Answering complex questions that require multi-step, multi-type reasoning over raw text is challenging, especially when conducting numerical reasoning. Neural Module Networks (NMNs) follow the programmer-interpreter framework and design trainable modules to learn different reasoning skills. However, NMNs have only limited reasoning abilities and lack numerical reasoning capability. We upgrade NMNs by: (a) bridging the gap between the interpreter and the complex questions; (b) introducing addition and subtraction modules that perform numerical reasoning over numbers. On a subset of DROP, experimental results show that our proposed methods enhance NMNs' numerical reasoning skills, yielding a 17.7% improvement in F1 score and significantly outperforming previous state-of-the-art models.
- Europe > Austria > Vienna (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (13 more...)
- Government (0.95)
- Leisure & Entertainment > Sports > Football (0.74)
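The proposed addition and subtraction modules can be pictured as operations over soft attention on the numbers extracted from a passage. The two-attention-map interface below is an assumption modeled loosely on how NMN modules pass attention between one another:

```python
# Toy addition/subtraction modules: each takes two attention distributions
# over the passage's numbers and combines the softly selected values.
import numpy as np

numbers = np.array([57.0, 56.0, 3.0])  # numbers extracted from a passage

def add_module(p: np.ndarray, q: np.ndarray) -> float:
    """Sum of the two softly selected numbers."""
    return float(p @ numbers + q @ numbers)

def sub_module(p: np.ndarray, q: np.ndarray) -> float:
    """Difference of the two softly selected numbers."""
    return float(p @ numbers - q @ numbers)

# With one-hot attention, the modules recover exact arithmetic:
p = np.array([1.0, 0.0, 0.0])  # attends to 57
q = np.array([0.0, 1.0, 0.0])  # attends to 56
print(sub_module(p, q))  # 57 - 56
```

Because the modules are differentiable in the attention maps, they can be trained end-to-end with the rest of the network, which is what makes them compatible with the NMN programmer-interpreter framework.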