AITopics | excerpt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supplemental Information for "Diverse Community Data for Benchmarking Data Privacy Algorithms" (October 27, 2023)

Neural Information Processing Systems | Feb-16-2026, 06:02:18 GMT

SDNist are intended as tools to encourage investigation and discussion of deidentification algorithms, and they are not intended or suitable for product evaluation. The National Institute of Standards and Technology does not endorse any algorithm included in these resources.

artificial intelligence, disperse, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Tarrant County > Fort Worth (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.70)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)

Diverse Community Data for Benchmarking Data Privacy Algorithms

Neural Information Processing Systems | Feb-16-2026, 06:02:15 GMT

Deidentification algorithms are vulnerable to the same bias and privacy issues that affect other data analytics and machine learning applications, and they can even amplify those issues by contaminating downstream applications.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe (0.14)
(5 more...)

Genre: Research Report (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.30)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

What if Readers Like A.I.-Generated Fiction?

The New Yorker | Dec-20-2025, 11:00:00 GMT

Finally, he gave the summaries to his fine-tuned model, and he asked it to compose passages "in the style of Vauhini Vara." Going into all this, I was self-assured, even smug. I'd always felt that my style was original and, more important, that my books were totally distinct from one another. I figured that, even if the A.I. model could imitate my past books, it couldn't predict the style of the novel in progress. So, when Chakrabarty sent me the A.I.-generated imitations, I was genuinely confused.

artificial intelligence, large language model, natural language, (20 more...)

The New Yorker

Country:

South America (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Michigan (0.04)
(7 more...)

Genre:

Personal (1.00)
Research Report > New Finding (0.46)

Industry:

Media > News (0.46)
Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)

FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers

Xi, Sarina, Rao, Vishisht, Payan, Justin, Shah, Nihar B.

arXiv.org Artificial Intelligence | Dec-1-2025

The identification and localization of errors is a core task in peer review, yet the exponential growth of scientific output has made it increasingly difficult for human reviewers to reliably detect errors given the limited pool of experts. Recent advances in Large Language Models (LLMs) have sparked interest in their potential to support such evaluation tasks, from academic peer review to automated scientific assessment. However, despite the growing use of LLMs in review systems, their capabilities to pinpoint errors remain underexplored. In this work, we introduce Fault Localization Across Writing in Science (FLAWS), an automated benchmark consisting of 713 paper-error pairs designed to evaluate how effectively LLMs detect errors that undermine key claims in research papers. We construct the benchmark by systematically inserting claim-invalidating errors into peer-reviewed papers using LLMs, paired with an automated evaluation metric that measures whether models can identify and localize these errors. Developing such a benchmark presents unique challenges that we overcome: ensuring that the inserted errors are well-defined, challenging, and relevant to the content of the paper, avoiding artifacts that would make identification trivial, and designing a scalable, automated evaluation metric. On the resulting benchmark, we evaluate five frontier LLMs: Claude Sonnet 4.5, DeepSeek Reasoner v3.1, Gemini 2.5 Pro, GPT 5, and Grok 4. Among these, GPT 5 is the top-performing model, achieving 39.1% identification accuracy when k=10, where k is the number of top-ranked error text candidates generated by the LLM.
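The top-k identification accuracy mentioned in the abstract can be illustrated with a minimal sketch. This is not the FLAWS evaluation metric itself: the abstract does not say how a candidate is matched against an inserted error, so the normalized exact-match rule below is an assumption, and the function names (`normalize`, `identified_at_k`, `identification_accuracy`) are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def identified_at_k(candidates: list[str], inserted_error: str, k: int) -> bool:
    """True if any of the first k ranked candidates matches the inserted error.

    ASSUMPTION: matching is normalized exact string equality; the benchmark's
    real matching rule is not described in the abstract.
    """
    target = normalize(inserted_error)
    return any(normalize(c) == target for c in candidates[:k])


def identification_accuracy(
    predictions: list[list[str]], errors: list[str], k: int
) -> float:
    """Fraction of paper-error pairs whose error appears among the top-k candidates."""
    if len(predictions) != len(errors):
        raise ValueError("need one ranked candidate list per paper-error pair")
    if not errors:
        return 0.0
    hits = sum(identified_at_k(p, e, k) for p, e in zip(predictions, errors))
    return hits / len(errors)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    ranked = [["claim A", "claim B"], ["claim X", "Claim  Y"], ["claim Q"]]
    truth = ["claim B", "claim y", "claim Z"]
    print(identification_accuracy(ranked, truth, k=1))
    print(identification_accuracy(ranked, truth, k=2))
```

Under this reading, raising k can only keep accuracy the same or increase it, because each model gets more candidates per paper.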

excerpt, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2511.21843

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CiteME: Can Language Models Accurately Cite Scientific Claims?

Neural Information Processing Systems | Nov-13-2025, 17:40:59 GMT

Thousands of new scientific papers are published each month.

excerpt, large language model, machine learning, (23 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

Chakrabarty, Tuhin, Ginsburg, Jane C., Dhillon, Paramveer

arXiv.org Artificial Intelligence | Nov-4-2025

The use of copyrighted books for training AI models has led to numerous lawsuits from authors concerned about AI's ability to generate derivative content. Yet it's unclear if these models can generate high quality literary text while emulating authors' styles. To answer this we conducted a preregistered study comparing MFA-trained expert writers with three frontier AI models: ChatGPT, Claude & Gemini in writing up to 450 word excerpts emulating 50 award-winning authors' diverse styles. In blind pairwise evaluations by 159 representative expert & lay readers, AI-generated text from in-context prompting was strongly disfavored by experts for both stylistic fidelity (OR=0.16, p<10^-8) & writing quality (OR=0.13, p<10^-7) but showed mixed results with lay readers. However, fine-tuning ChatGPT on individual authors' complete works completely reversed these findings: experts now favored AI-generated text for stylistic fidelity (OR=8.16, p<10^-13) & writing quality (OR=1.87, p=0.010), with lay readers showing similar shifts. These effects generalize across authors & styles. The fine-tuned outputs were rarely flagged as AI-generated (3% rate v. 97% for in-context prompting) by best AI detectors. Mediation analysis shows this reversal occurs because fine-tuning eliminates detectable AI stylistic quirks (e.g., cliche density) that penalize in-context outputs. While we do not account for additional costs of human effort required to transform raw AI output into cohesive, publishable prose, the median fine-tuning & inference cost of $81 per author represents a dramatic 99.7% reduction compared to typical professional writer compensation. Author-specific fine-tuning thus enables non-verbatim AI writing that readers prefer to expert human writing, providing empirical evidence directly relevant to copyright's fourth fair-use factor, the "effect upon the potential market or value" of the source works.
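The cost comparison in the abstract can be checked with simple arithmetic. The sketch below back-computes the baseline that the two quoted figures imply ($81 and a 99.7% reduction). The baseline figure is derived here and is not stated in the abstract, and the function name `implied_baseline_cost` is hypothetical.

```python
def implied_baseline_cost(reduced_cost: float, reduction_fraction: float) -> float:
    """Baseline cost implied by a reduced cost and a fractional reduction.

    If reduced = baseline * (1 - reduction), then baseline = reduced / (1 - reduction).
    """
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return reduced_cost / (1.0 - reduction_fraction)


if __name__ == "__main__":
    # $81 median cost per author and a 99.7% reduction are the abstract's figures.
    baseline = implied_baseline_cost(81.0, 0.997)
    print(f"implied baseline: ${baseline:,.0f}")
```

Taken together, the two figures imply a comparison point of roughly $27,000 per author for professional writer compensation.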

excerpt, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.13939

Country:

North America > United States > Michigan (0.04)
North America > United States > California (0.04)
North America > United States > Virginia (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law > Litigation (1.00)
Law > Intellectual Property & Technology Law (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluating Multimodal Large Language Models on Core Music Perception Tasks

Carone, Brandon James, Roman, Iran R., Ripollés, Pablo

arXiv.org Artificial Intelligence | Oct-28-2025

Multimodal Large Language Models (LLMs) claim "musical understanding" via evaluations that conflate listening with score reading. We benchmark three SOTA LLMs (Gemini 2.5 Pro, Gemini 2.5 Flash, and Qwen2.5-Omni) across three core music skills: Syncopation Scoring, Transposition Detection, and Chord Quality Identification. Moreover, we separate three sources of variability: (i) perceptual limitations (audio vs. MIDI inputs), (ii) exposure to examples (zero- vs. few-shot manipulations), and (iii) reasoning strategies (Standalone, CoT, LogicLM). For the latter we adapt LogicLM, a framework combining LLMs with symbolic solvers to perform structured reasoning, to music. Results reveal a clear perceptual gap: models perform near ceiling on MIDI but show accuracy drops on audio. Reasoning and few-shot prompting offer minimal gains. This is expected for MIDI, where performance reaches saturation, but more surprising for audio, where LogicLM, despite near-perfect MIDI accuracy, remains notably brittle. Among models, Gemini Pro achieves the highest performance across most conditions. Overall, current systems reason well over symbols (MIDI) but do not yet "listen" reliably from audio. Our method and dataset make the perception-reasoning boundary explicit and offer actionable guidance for building robust, audio-first music systems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.22455

Country:

Asia > Middle East > Iran (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

LC-Eval: A Bilingual Multi-Task Evaluation Benchmark for Long-Context Understanding

Jubair, Sheikh, Omayrah, Arwa, Alshammari, Amal, Althnian, Alhanoof, Alothaimen, Abdulhamed, Alzahrani, Norah A., Alzaidi, Shahad D., Al-Twairesh, Nora, Al-Thubaity, Abdulmohsen

arXiv.org Artificial Intelligence | Oct-21-2025

Recent advancements in Large Language Models (LLMs) have demonstrated sophisticated capabilities, including the ability to process and comprehend extended contexts. These emergent capabilities necessitate rigorous evaluation methods to effectively assess their performance in long-context understanding. In this paper, we present LC-Eval, a bilingual, multi-task evaluation benchmark designed to evaluate long-context understanding in English and Arabic, targeting context lengths ranging from 4k to over 128k tokens. LC-Eval introduces four novel and challenging tasks: multi-document question answering, bilingual question answering, claim verification within a paragraph, and multiple-choice questions based on long contexts. These tasks are designed to assess LLMs' abilities in deep reasoning, document comprehension, information tracing, and bilingual information extraction and understanding. The benchmark includes datasets in both Arabic and English for each task, allowing for a comparative analysis of their performance across different text genres. Evaluations were conducted on both open-weight and closed LLMs, with results indicating that LC-Eval presents significant challenges. Even high-performing models, such as GPT-4o, struggled with certain tasks, highlighting the complexity and rigor of the benchmark.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.16783

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > France (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(7 more...)

Genre:

Research Report (0.64)
Questionnaire & Opinion Survey (0.49)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AIReg-Bench: Benchmarking Language Models That Assess AI Regulation Compliance

Marino, Bill, Hunter, Rosco, Jamali, Zubair, Kalpakos, Marinos Emmanouil, Kashyap, Mudra, Hinton, Isaiah, Hanson, Alexa, Nazir, Maahum, Schnabl, Christoph, Steffek, Felix, Wen, Hongkai, Lane, Nicholas D.

arXiv.org Artificial Intelligence | Oct-14-2025

As governments move to regulate AI, there is growing interest in using Large Language Models (LLMs) to assess whether or not an AI system complies with a given AI Regulation (AIR). However, there is presently no way to benchmark the performance of LLMs at this task. To fill this void, we introduce AIReg-Bench: the first benchmark dataset designed to test how well LLMs can assess compliance with the EU AI Act (AIA). We created this dataset through a two-step process: (1) by prompting an LLM with carefully structured instructions, we generated 120 technical documentation excerpts (samples), each depicting a fictional, albeit plausible, AI system of the kind an AI provider might produce to demonstrate their compliance with AIR; (2) legal experts then reviewed and annotated each sample to indicate whether, and in what way, the AI system described therein violates specific Articles of the AIA. The resulting dataset, together with our evaluation of whether frontier LLMs can reproduce the experts' compliance labels, provides a starting point to understand the opportunities and limitations of LLM-based AIR compliance assessment tools and establishes a benchmark against which subsequent LLMs can be compared. The dataset and evaluation code are available at https://github.com/camlsys/aireg-bench.

ai system, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.01474

Country: