Goto

Collaborating Authors: Stanovsky, Gabriel


More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) provides LLMs with relevant documents. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various language models on custom datasets derived from a multi-hop QA task. We keep the context length and position of relevant information constant while varying the number of documents, and find that increasing the document count in RAG settings poses significant challenges for LLMs. Additionally, our results indicate that processing multiple documents is a separate challenge from handling long contexts.
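
To make the experimental control concrete, here is a minimal sketch (our illustration, not the paper's released code) of assembling contexts under a fixed token budget while varying only the document count; the whitespace "tokenizer" and document contents are stand-ins:

```python
# Illustrative only: hold total context length fixed while varying the
# number of documents. Whitespace splitting stands in for a real tokenizer.

def build_context(relevant_docs, filler_text, n_total_docs, budget_tokens):
    """Keep the relevant documents intact, then split the remaining token
    budget evenly across distractor documents, so total length stays
    (up to integer rounding) constant while document count varies."""
    relevant_len = sum(len(d.split()) for d in relevant_docs)
    n_distractors = n_total_docs - len(relevant_docs)
    per_distractor = (budget_tokens - relevant_len) // n_distractors
    filler = filler_text.split()
    distractors = [
        " ".join(filler[i * per_distractor:(i + 1) * per_distractor])
        for i in range(n_distractors)
    ]
    return "\n\n".join(relevant_docs + distractors)

relevant = ["Document A: the answer's first hop.", "Document B: the second hop."]
filler = "lorem ipsum " * 2000  # distractor text pool

# Same token budget, different document counts.
for n in (4, 8, 16):
    ctx = build_context(relevant, filler, n_total_docs=n, budget_tokens=500)
    print(n, "docs ->", len(ctx.split()), "tokens")
```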


Seeing the Forest for the Trees: A Large Scale, Continuously Updating Meta-Analysis of Frontier LLMs

arXiv.org Artificial Intelligence

The surge of LLM studies makes synthesizing their findings challenging. Meta-analysis can uncover important trends across studies, but its use is limited by the time-consuming nature of manual data extraction. Our study presents a semi-automated approach for meta-analysis that accelerates data extraction using LLMs. It automatically identifies relevant arXiv papers, extracts experimental results and related attributes, and organizes them into a structured dataset. We conduct a comprehensive meta-analysis of frontier LLMs using an automatically extracted dataset, reducing the effort of paper surveying and data extraction by more than 93% compared to manual approaches. We validate our dataset by showing that it reproduces key findings from a recent manual meta-analysis about Chain-of-Thought (CoT), and also uncovers new insights that go beyond it, showing, for example, that in-context examples benefit multimodal tasks but offer limited gains in mathematical tasks compared to CoT. Our automatically updatable dataset enables continuous tracking of target models by extracting evaluation studies as new data becomes available. Through our scientific artifacts and empirical analysis, we provide novel insights into LLMs while facilitating ongoing meta-analyses of their behavior.
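
To illustrate the kind of structured records such an extraction pipeline might produce, here is a brief sketch; the field names and schema below are our assumptions, not the paper's actual dataset format:

```python
# A minimal sketch of typed records for LLM-extracted evaluation results.
# The schema is hypothetical; the paper's actual attributes may differ.
import json
from dataclasses import dataclass

@dataclass
class ExtractedResult:
    arxiv_id: str   # paper the result came from
    model: str      # evaluated LLM
    task: str       # benchmark or task name
    prompting: str  # e.g. "CoT", "few-shot", "zero-shot"
    metric: str     # e.g. "accuracy"
    score: float

def parse_extraction(llm_json_output: str) -> list[ExtractedResult]:
    """Validate one paper's LLM-extracted results into typed records."""
    return [ExtractedResult(**row) for row in json.loads(llm_json_output)]

# Toy example of what a single extracted record might look like:
demo = ('[{"arxiv_id": "2502.00000", "model": "GPT-4", "task": "GSM8K",'
        ' "prompting": "CoT", "metric": "accuracy", "score": 92.0}]')
for rec in parse_extraction(demo):
    print(rec)
```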


WildFrame: Comparing Framing in Humans and LLMs on Naturally Occurring Texts

arXiv.org Artificial Intelligence

Humans are influenced by how information is presented, a phenomenon known as the framing effect. Previous work has shown that LLMs may also be susceptible to framing, but it relied on synthetic data and did not compare model behavior to human behavior. We introduce WildFrame, a dataset for evaluating LLM responses to positive and negative framing in naturally occurring sentences, and for comparing LLMs to humans on the same data. WildFrame consists of 1,000 texts, built by first selecting real-world statements with clear sentiment, then reframing them in either a positive or negative light, and lastly collecting human sentiment annotations. By evaluating eight state-of-the-art LLMs on WildFrame, we find that all models exhibit framing effects similar to humans (r ≥ 0.57), with both humans and models being more influenced by positive than by negative reframing. Our findings benefit model developers, who can either harness framing or mitigate its effects, depending on the downstream application.
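
As a sketch of how the human-model comparison could be quantified (our construction; the scores below are made-up toy values, not WildFrame data), one can correlate per-item sentiment shifts caused by reframing:

```python
# Correlate per-item framing effects: the shift in sentiment that
# reframing induces in humans vs. in a model. Values are toy data.
import numpy as np

# Hypothetical sentiment scores in [-1, 1] for five statements,
# measured on the base statement and its reframed version.
human_base = np.array([0.2, -0.5, 0.1, 0.7, -0.3])
human_reframed = np.array([0.6, -0.1, 0.5, 0.9, 0.1])
model_base = np.array([0.3, -0.4, 0.0, 0.6, -0.2])
model_reframed = np.array([0.7, 0.0, 0.3, 0.9, 0.2])

human_shift = human_reframed - human_base  # framing effect per item (humans)
model_shift = model_reframed - model_base  # framing effect per item (model)

r = np.corrcoef(human_shift, model_shift)[0, 1]
print(f"human-model framing correlation: r = {r:.2f}")
```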


Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs

arXiv.org Artificial Intelligence

Large Language Models (LLMs) often generate outputs that lack grounding in real-world facts, a phenomenon known as hallucinations. Prior research has associated hallucinations with model uncertainty, leveraging this relationship for hallucination detection and mitigation. In this paper, we challenge the underlying assumption that all hallucinations are associated with uncertainty. Using knowledge detection and uncertainty measurement methods, we demonstrate that models can hallucinate with high certainty even when they have the correct knowledge. We further show that high-certainty hallucinations are consistent across models and datasets, distinctive enough to be singled out, and challenge existing mitigation methods. Our findings reveal an overlooked aspect of hallucinations, emphasizing the need to understand their origins and improve mitigation.

Figure 1 (caption): Do high-certainty hallucinations exist? An illustrative categorization of hallucinations based on a model's knowledge and certainty. Highlighted is the phenomenon of high-certainty hallucinations (purple), where models confidently produce incorrect outputs even when they have the correct knowledge. While other types of hallucinations can potentially be explained by the model not knowing, being mistaken, or uncertain, high-certainty hallucinations are harder to rationalize, making their existence particularly intriguing.
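
A minimal sketch of the categorization described in the figure caption (our simplification; the certainty threshold and inputs are illustrative assumptions):

```python
# Cross model knowledge with output certainty and flag the overlooked cell:
# high-certainty hallucinations. 'certainty' could come from, e.g., mean
# token probability; the 0.9 threshold below is an arbitrary toy choice.

def categorize(is_correct: bool, knows_answer: bool, certainty: float,
               threshold: float = 0.9) -> str:
    if is_correct:
        return "correct"
    if knows_answer and certainty >= threshold:
        return "high-certainty hallucination"  # wrong despite known answer
    if certainty >= threshold:
        return "confident error (knowledge gap)"
    return "uncertain hallucination"

print(categorize(is_correct=False, knows_answer=True, certainty=0.97))
```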


Beyond Benchmarks: On The False Promise of AI Regulation

arXiv.org Artificial Intelligence

The rapid advancement of artificial intelligence (AI) systems in critical domains like healthcare, justice, and social services has sparked numerous regulatory initiatives aimed at ensuring their safe deployment. Current regulatory frameworks, exemplified by recent US and EU efforts, primarily focus on procedural guidelines while presuming that scientific benchmarking can effectively validate AI safety, similar to how crash tests verify vehicle safety or clinical trials validate drug efficacy. However, this approach fundamentally misunderstands the unique technical challenges posed by modern AI systems. Through systematic analysis of successful technology regulation case studies, we demonstrate that effective scientific regulation requires a causal theory linking observable test outcomes to future performance - for instance, how a vehicle's crash resistance at one speed predicts its safety at lower speeds. We show that deep learning models, which learn complex statistical patterns from training data without explicit causal mechanisms, preclude such guarantees. This limitation renders traditional regulatory approaches inadequate for ensuring AI safety. Moving forward, we call for regulators to reckon with this limitation, and propose a preliminary two-tiered regulatory framework that acknowledges these constraints: mandating human oversight for high-risk applications while developing appropriate risk communication strategies for lower-risk uses. Our findings highlight the urgent need to reconsider fundamental assumptions in AI regulation and suggest a concrete path forward for policymakers and researchers.


Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time

arXiv.org Artificial Intelligence

Incorporating automatically predicted human feedback into the process of training generative models has attracted substantial recent interest, while feedback at inference time has received less attention. The typical feedback at training time, i.e., a preference between two candidate outputs, does not naturally transfer to the inference phase. We introduce a novel type of feedback -- caption reformulations -- and train models to mimic reformulation feedback based on human annotations. Our method does not require training the image captioning model itself, thereby demanding substantially less computational effort. We experiment with two types of reformulation feedback. First, we collect a dataset of human reformulations that correct errors in generated captions; we find that incorporating reformulation models trained on this data into the inference phase of existing image captioning models results in improved captions, especially when the original captions are of low quality. We also apply our method to non-English image captioning, a domain where robust models are less prevalent, and achieve substantial improvements. Second, we apply reformulations to style transfer. Quantitative evaluations reveal state-of-the-art performance on German image captioning and English style transfer, while human validation with a detailed comparative framework exposes the specific axes of improvement.
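
The pipeline structure can be sketched as follows; both model calls are hypothetical placeholders standing in for the frozen captioner and the trained reformulation model:

```python
# Inference-time reformulation sketch: the captioner is never retrained;
# feedback is applied purely by post-editing its output.

def generate_caption(image) -> str:
    """Placeholder for a frozen, off-the-shelf image captioning model."""
    return "a dog playing with ball in the park"

def reformulate(caption: str) -> str:
    """Placeholder for a reformulation model trained on human corrections;
    it rewrites the caption rather than ranking alternatives."""
    return caption.replace("with ball", "with a ball")

def caption_with_feedback(image) -> str:
    # Feedback is applied at inference time, after generation.
    return reformulate(generate_caption(image))

print(caption_with_feedback(image=None))
```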


The State and Fate of Summarization Datasets

arXiv.org Artificial Intelligence

Automatic summarization has consistently attracted attention, due to its versatility and wide application in various downstream tasks. Despite its popularity, we find that annotation efforts have largely been disjointed, and have lacked common terminology. Consequently, it is challenging to discover existing resources or identify coherent research directions. To address this, we survey a large body of work spanning 133 datasets in over 100 languages, creating a novel ontology covering sample properties, collection methods, and distribution. With this ontology we make key observations, including the lack of accessible, high-quality datasets for low-resource languages, and the field's over-reliance on the news domain and on automatically collected distant supervision. Finally, we make available a web interface that allows users to interact with and explore our ontology and dataset collection, as well as a template for a summarization data card, which can be used to streamline future research into a more coherent body of work.
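
As an illustration, a summarization data card along the ontology's axes might look like the following sketch; the field names are our assumptions, not the paper's actual template:

```python
# Hypothetical data card covering the ontology's three axes: sample
# properties, collection method, and distribution. Values are invented.
data_card = {
    "name": "ExampleSum",
    "language": "Swahili",
    "domain": "news",  # the survey notes over-reliance on this domain
    "collection_method": "manual annotation",  # vs. distant supervision
    "size": 12_000,
    "sample_properties": {
        "source_length_tokens": "~800 (mean)",
        "summary_length_tokens": "~60 (mean)",
        "abstractive": True,
    },
    "distribution": {"license": "CC BY 4.0", "url": "<link to data>"},
}
print(data_card["name"], "-", data_card["language"], data_card["domain"])
```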


SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction

arXiv.org Artificial Intelligence

Many human interactions, such as political debates, are carried out in group settings with arbitrarily many participants, each with different views and agendas. To explore such complex social settings, we present SAUCE: a customizable Python platform that allows researchers to plug-and-play various LLMs participating in discussions on any topic chosen by the user. Our platform takes care of instantiating the models, scheduling their responses, managing the discussion history, and producing a comprehensive output log, all customizable through configuration files and requiring little to no coding skills. A novel feature of SAUCE is asynchronous communication, where models decide when to speak in addition to what to say, thus modeling an important facet of human communication. We demonstrate SAUCE's utility in two initial experiments and invite the community to use it to simulate a wide range of group interactions.
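
A hypothetical configuration sketch (the real SAUCE config format may differ) illustrating the knobs the abstract describes: participants, topic, scheduling mode, and output logging:

```python
# Invented configuration for a multi-agent discussion; model names,
# personas, and keys are illustrative, not SAUCE's actual schema.
experiment = {
    "topic": "Should voting be mandatory?",
    "participants": [
        {"name": "Alice", "model": "gpt-4", "persona": "libertarian pundit"},
        {"name": "Bob", "model": "llama-3-70b", "persona": "civics teacher"},
    ],
    # "asynchronous": models decide *when* to speak, not just what to say.
    "scheduling": "asynchronous",
    "max_turns": 30,
    "log_path": "discussion_log.json",
}
print(f"{len(experiment['participants'])} agents on: {experiment['topic']}")
```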


Looking Beyond The Top-1: Transformers Determine Top Tokens In Order

arXiv.org Artificial Intelligence

Understanding the inner workings of Transformers is crucial for achieving more accurate and efficient predictions. In this work, we analyze the computation performed by Transformers in the layers after the top-1 prediction has become fixed, which has been previously referred to as the "saturation event". We expand the concept of saturation events to top-k tokens, demonstrating that similar saturation events occur across language, vision, and speech models. We find that these saturation events happen in order of the corresponding tokens' ranking, i.e., the model first decides on the top-ranking token, then the second-highest-ranking token, and so on. This phenomenon seems intrinsic to the Transformer architecture, occurring across different architectural variants (decoder-only, encoder-only, and to a lesser extent full-Transformer), and even in untrained Transformers. We propose an underlying mechanism of task transition for this sequential saturation, where task k corresponds to predicting the k-th most probable token, and the saturation events are in fact discrete transitions between the tasks. In support of this, we show that it is possible to predict the current task from hidden-layer embeddings. Furthermore, using an intervention method, we demonstrate that we can cause the model to switch from one task to the next. Finally, leveraging our findings, we introduce a novel token-level early-exit strategy, which surpasses existing methods in balancing performance and efficiency.
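
A logit-lens-style sketch (our construction, not the paper's code) of how a rank-k saturation layer could be measured: project each layer's hidden state to vocabulary logits and find the earliest layer after which the rank-k token stops changing:

```python
# Toy measurement of rank-k saturation over per-layer projected logits.
import numpy as np

def saturation_layer(layer_logits: np.ndarray, k: int) -> int:
    """layer_logits: (n_layers, vocab) logits read off each layer's hidden
    state via the unembedding matrix. Returns the first layer from which
    the rank-k token never changes again."""
    ranked = np.argsort(-layer_logits, axis=1)[:, k]  # rank-k token per layer
    final = ranked[-1]
    for layer in range(len(ranked)):
        if np.all(ranked[layer:] == final):
            return layer
    return len(ranked) - 1

rng = np.random.default_rng(0)
logits = rng.normal(size=(12, 50))  # toy stand-in for real projections
logits[6:, 3] += 10.0               # token 3 dominates from layer 6 onward
print("top-1 saturates at layer", saturation_layer(logits, k=0))
```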


Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of Machine Cognition

arXiv.org Artificial Intelligence

Recent advances in LLMs have sparked a debate on whether they understand text. In this position paper, we argue that opponents in this debate hold different definitions of understanding, and particularly differ in their view of the role of consciousness. To substantiate this claim, we propose a thought experiment involving an open-source chatbot Z which excels on every possible benchmark, seemingly without subjective experience. We ask whether Z is capable of understanding, and show that different schools of thought within seminal AI research seem to answer this question differently, uncovering their terminological disagreement. Moving forward, we propose two distinct working definitions of understanding which explicitly acknowledge the question of consciousness, and draw connections with a rich literature in philosophy, psychology, and neuroscience.