AITopics

Real-world data analysis tasks often come with under-specified goals and unclean data. User interaction is necessary to understand and disambiguate a user's intent, and hence, essential to solving these complex tasks. Existing benchmarks for evaluating LLMs on data analysis tasks do not capture these complexities or provide first-class support for interactivity. We introduce ConDABench, a framework for generating conversational data analysis (ConDA) benchmarks and evaluating external tools on the generated benchmarks. ConDABench consists of (a) a multi-agent workflow for generating realistic benchmarks from articles describing insights gained from public datasets, (b) 1,420 ConDA problems generated using this workflow, and (c) an evaluation harness that, for the first time, makes it possible to systematically evaluate conversational data analysis tools on the generated ConDA problems. Evaluation of state-of-the-art LLMs on the benchmarks reveals that while the new generation of models are better at solving more instances, they are not necessarily better at solving tasks that require sustained, long-form engagement. ConDABench is an avenue for model builders to measure progress towards truly collaborative models that can complete complex interactive tasks.

benchmark, large language model, machine learning, (19 more...)

2510.13835

Country: Asia (0.67)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.92)
Media (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Comparing Human and Language Models Sentence Processing Difficulties on Complex Structures

Amouyal, Samuel Joseph, Meltzer-Asscher, Aya, Berant, Jonathan

Large language models (LLMs) that fluently converse with humans are a reality - but do LLMs experience human-like processing difficulties? We systematically compare human and LLM sentence comprehension across seven challenging linguistic structures. We collect sentence comprehension data from humans and five families of state-of-the-art LLMs, varying in size and training procedure in a unified experimental framework. Our results show LLMs overall struggle on the target structures, but especially on garden path (GP) sentences. Indeed, while the strongest models achieve near perfect accuracy on non-GP structures (93.7% for GPT-5), they struggle on GP structures (46.8% for GPT-5). Additionally, when ranking structures based on average performance, rank correlation between humans and models increases with parameter count. For each target structure, we also collect data for their matched baseline without the difficult structure. Comparing performance on the target vs. baseline sentences, the performance gap observed in humans holds for LLMs, with two exceptions: for models that are too weak performance is uniformly low across both sentence types, and for models that are too strong the performance is uniformly high. Together, these reveal convergence and divergence in human and LLM sentence comprehension, offering new insights into the similarity of humans and LLMs.
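The rank-correlation comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which rank correlation is used, so Spearman's coefficient is an assumption here, and the per-structure accuracies below are invented placeholders for seven structures, not values from the paper.

```python
def ranks(values):
    """Return 1-based ranks (ascending), averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (norm_x * norm_y)


# Hypothetical average accuracy per structure (seven structures, made-up numbers).
human_accuracy = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
model_accuracy = [0.95, 0.70, 0.85, 0.60, 0.50, 0.45, 0.20]

rho = spearman(human_accuracy, model_accuracy)
print(f"Spearman rank correlation: {rho:.3f}")
```

A coefficient near 1 means the model finds the same structures hard that humans do. The sketch divides by zero if either list is constant, so a real analysis would need to guard that case.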

directionality, large language model, machine learning, (20 more...)

2510.07141

Country:

Europe (1.00)
Asia > Middle East > Israel (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline

Li, Haiyang, Wang, Yaxiong, Tang, Shengeng, Wu, Lianwei, Cheng, Lechao, Zhong, Zhun

In recent years, detecting fake multimodal content on social media has drawn increasing attention. Two major forms of deception dominate: human-crafted misinformation (e.g., rumors and misleading posts) and AI-generated content produced by image synthesis models or vision-language models (VLMs). Although both share deceptive intent, they are typically studied in isolation. NLP research focuses on human-written misinformation, while the CV community targets AI-generated artifacts. As a result, existing models are often specialized for only one type of fake content. In real-world scenarios, however, the type of a multimodal post is usually unknown, limiting the effectiveness of such specialized systems. To bridge this gap, we construct the Omnibus Dataset for Multimodal News Deception (OmniFake), a comprehensive benchmark of 127K samples that integrates human-curated misinformation from existing resources with newly synthesized AI-generated examples. Based on this dataset, we propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception. UMFDet leverages a VLM backbone augmented with a Category-aware Mixture-of-Experts (MoE) Adapter to capture category-specific cues, and an attribution chain-of-thought mechanism that provides implicit reasoning guidance for locating salient deceptive signals. Extensive experiments demonstrate that UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines and offering a practical solution for real-world multimodal deception detection.

detection, machine learning, natural language, (18 more...)

2509.25991

Genre: Research Report (0.64)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes

Dai, Yuqin, Wang, Guoqing, Wang, Yuan, Dou, Kairan, Zhou, Kaichen, Zhang, Zhanwei, Yang, Shuo, Tang, Fei, Yin, Jun, Zeng, Pengyu, Ying, Zhenzhe, Yi, Can, Meng, Changhua, Zhou, Yuchen, Shen, Yongliang, Lu, Shuai

Retrieval-Augmented Generation (RAG) has advanced open-domain question answering by incorporating external information into model reasoning. However, effectively leveraging external information to enhance reasoning presents the following challenges: (1) low signal-to-noise ratio, where answer-supportive external information is diluted by irrelevant material, and (2) error accumulation, which arises in multi-hop reasoning when incomplete or misleading information is incorporated. To address these challenges, we introduce EviNote-RAG, a framework that follows a retrieve-note-answer workflow. Instead of reasoning directly over raw external information, the model first produces Supportive-Evidence Notes (SENs), which concisely preserve answer-critical information and explicitly mark key and uncertainty information to improve accuracy. We further design an entailment-based Evidence Quality Reward (EQR) to ensure that SENs are logically sufficient to derive the final answer, thereby enhancing SENs' quality. Experiments on both in-domain and out-of-domain QA benchmarks show that EviNote-RAG achieves state-of-the-art performance, improving answer accuracy, training stability, robustness, and efficiency. In particular, it yields relative F1 gains of 20% on HotpotQA (+0.093), 40% on Bamboogle (+0.151), and 91% on 2Wiki (+0.256), benefiting from improvements in the reasoning process.
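The abstract reports each gain both as a relative percentage and as an absolute F1 difference. Assuming the usual definition, relative gain = absolute gain / baseline F1, the two figures together imply an approximate baseline. The sketch below backs that out. The baselines are not stated in the abstract, and the percentages are rounded, so the results are rough estimates only.

```python
# Gains as reported in the abstract: (relative F1 gain, absolute F1 gain).
reported = {
    "HotpotQA": (0.20, 0.093),
    "Bamboogle": (0.40, 0.151),
    "2Wiki": (0.91, 0.256),
}


def implied_scores(relative_gain, absolute_gain):
    """Back out baseline and improved F1, assuming relative = absolute / baseline."""
    baseline = absolute_gain / relative_gain
    return baseline, baseline + absolute_gain


# Estimated (baseline, improved) F1 per benchmark; approximate because of rounding.
implied = {name: implied_scores(rel, ab) for name, (rel, ab) in reported.items()}

for name, (baseline, improved) in implied.items():
    print(f"{name}: implied baseline F1 ~ {baseline:.3f}, improved F1 ~ {improved:.3f}")
```

Under these assumptions, the largest relative gain (2Wiki) corresponds to the lowest implied baseline, which is why relative and absolute gains are best read together.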

large language model, machine learning, natural language, (20 more...)

2509.00877

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)
(2 more...)

Slate, Oct-16-2025, 22:17:55 GMT

Guillermo del Toro's Frankenstein Is a Lavish Epic Decades in the Making

advertisement advertisement, movie, toro, (12 more...)

Slate

Country: Europe > United Kingdom (0.04)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Science Fiction (0.53)

BBC News, Oct-16-2025, 22:00:10 GMT

Sam Fender wins 2025 Mercury Prize for album of the year

Sam Fender has won the 2025 Mercury Prize for his third album, People Watching, a steely-eyed dissection of working-class life in the north of England. The singer looked stunned when his name was announced. "I didn't think that was going to happen at all," he told the BBC as he came off stage. "I've spent the last 10 minutes crying." Fender beat the likes of Pulp and Wolf Alice - both former winners of the £25,000 prize for the best British or Irish album of the year - at a star-studded ceremony in Newcastle's Utilita Arena.

album, fender, mercury prize, (11 more...)

BBC News

Country:

South America (0.29)
Europe > United Kingdom > England (0.25)
North America > Central America (0.15)
(15 more...)

Genre: Personal > Honors (0.70)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence (0.49)

FOX News, Oct-16-2025, 17:29:41 GMT

We found the best appliances, from Samsung to Frigidaire

These high-tech appliances can make your life easier. Samsung, Frigidaire, GE and other big-name brands have great deals on ovens, refrigerators, small appliances, washers and dryers.

original price, refrigerator, samsung, (13 more...)

FOX News

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.72)

BBC News, Oct-16-2025, 14:21:39 GMT

Sharon Osbourne backs naming airport after Ozzy

Sharon Osbourne has said it would be amazing if Birmingham Airport was renamed in honour of her late husband, rock legend Ozzy Osbourne. The TV personality has given her support to a campaign to call the airport Ozzy Osbourne International, which was launched by podcaster and comedian Dan Hudson after the Black Sabbath singer died at the age of 76 in July. More than 70,000 people have signed a petition backing the idea, which Hudson said was inspired by airports being named after famous figures such as John Lennon. "It would be amazing," Osbourne said of a potential rebrand. "It's just a dream right now, but sometimes dreams come true."

airport, ozzy, sharon osbourne back, (8 more...)

BBC News

Country:

South America (0.16)
North America > Central America (0.16)
Oceania > Australia (0.07)
(14 more...)

Genre: Personal (0.51)

Industry:

Leisure & Entertainment (1.00)
Transportation > Infrastructure & Services > Airport (0.61)
Transportation > Air (0.61)
Media > Music (0.51)

Technology:

Information Technology > Artificial Intelligence (0.52)
Information Technology > Communications > Mobile (0.36)

Green sea turtle no longer Endangered

These gentle, 400-pound giants are splashing back from the brink of extinction. In an ocean conservation victory, green sea turtles have been brought back from the brink of extinction. The International Union for Conservation of Nature (IUCN) elevated the keystone species from Endangered to Least Concern. The global conservation organization moves species between categories once new data indicates changes in their population, threat levels, or habitat.

green sea turtle, sea turtle, turtle, (11 more...)

Popular Science

Country:

Oceania > Australia (0.06)
North America > United States > New Jersey (0.05)
North America > United States > Massachusetts > Norfolk County > Quincy (0.05)
(3 more...)

Industry: Media > Photography (0.30)

Technology: Information Technology > Artificial Intelligence (0.50)

Daily Mail - Science & tech, Oct-16-2025, 13:43:43 GMT

Is this why aliens haven't contacted us yet? Extraterrestrials are BORED of trying to find us - and have simply stopped looking, scientist claims

It's one of the biggest unanswered questions in science: if there's life beyond Earth, why hasn't it contacted us yet?

britney spear, death, extraterrestrial, (13 more...)

Daily Mail - Science & tech

Country:

North America > United States > Florida > Orange County (0.24)
North America > United States > Alaska (0.24)
Atlantic Ocean (0.24)
(15 more...)

Genre: Personal (1.00)

Industry:

Transportation > Air (1.00)
Media > Television (1.00)
Media > Music (1.00)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)