
Long-form factuality in large language models

Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng

Neural Information Processing Systems

To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE).



Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America

Peters, Dorian, Espinoza, Fernanda, da Re, Marco, Ivetta, Guido, Benotti, Luciana, Calvo, Rafael A.

arXiv.org Artificial Intelligence

There is justifiable interest in leveraging conversational AI (CAI) for health across the majority world, but to be effective, CAI must respond appropriately within culturally and linguistically diverse contexts. Therefore, we need ways to address the fact that current LLMs exclude many lived experiences globally. Various advances are underway which focus on top-down approaches and increasing training data. In this paper, we aim to complement these with a bottom-up, locally-grounded approach based on qualitative data collected during participatory workshops in Latin America. Our goal is to construct a rich and human-centred understanding of: a) potential areas of cultural misalignment in digital health; b) regional perspectives on chatbots for health; and c) strategies for creating culturally-appropriate CAI, with a focus on the understudied Latin American context. Our findings show that academic boundaries on notions of culture lose meaning at the ground level and technologies will need to engage with a broader framework; one that encapsulates the way economics, politics, geography and local logistics are entangled in cultural experience. To this end, we introduce a framework for 'Pluriversal Conversational AI for Health' which allows for the possibility that more relationality and tolerance, rather than just more data, may be called for.


How Does Response Length Affect Long-Form Factuality

Zhao, James Xu, Liu, Jimmy Z. J., Hooi, Bryan, Ng, See-Kiong

arXiv.org Artificial Intelligence

Large language models (LLMs) are widely used for long-form text generation. However, factual errors in their responses undermine reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic and bi-level long-form factuality evaluation framework, which achieves high agreement with human annotations while being cost-effective. Using this framework, we conduct controlled experiments and find that longer responses exhibit lower factual precision, confirming the presence of length bias. To explain this phenomenon, we empirically examine three hypotheses: error propagation, long context, and facts exhaustion. Our results reveal that facts exhaustion, where the model gradually exhausts more reliable knowledge, is the primary cause of factual degradation, rather than the other two hypotheses.
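To make the facts-exhaustion hypothesis concrete, here is an illustrative sketch (not the paper's evaluation framework) of one way to look for it: bin verified facts by their relative position in a response and compute factual precision per bin, where a downward trend across bins is the signature the hypothesis predicts. The binning scheme and data format are assumptions.

```python
from collections import defaultdict

def precision_by_position(fact_labels, num_bins=4):
    # fact_labels: list of (relative_position in [0, 1], is_supported) pairs
    # produced by any fact-level verifier; returns factual precision per bin.
    totals, supported = defaultdict(int), defaultdict(int)
    for pos, ok in fact_labels:
        b = min(int(pos * num_bins), num_bins - 1)  # clamp the pos == 1.0 edge case
        totals[b] += 1
        supported[b] += int(ok)
    return {b: supported[b] / totals[b] for b in sorted(totals)}

# Toy example: facts later in the response are less often supported.
labels = [(0.1, True), (0.2, True), (0.4, True),
          (0.6, False), (0.7, True), (0.9, False)]
print(precision_by_position(labels, num_bins=2))  # {0: 1.0, 1: 0.333...}
```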


BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context

Matzopoulos, Alexis, Hendriks, Charl, Mahomed, Hishaam, Meyer, Francois

arXiv.org Artificial Intelligence

The BabyLM challenge called on participants to develop sample-efficient language models. Submissions were pretrained on a fixed English corpus, limited to the number of words children are exposed to during development (<100m). The challenge produced new architectures for data-efficient language modelling, which outperformed models trained on trillions of words. This is promising for low-resource languages, where available corpora are limited to far less than 100m words. In this paper, we explore the potential of BabyLMs for low-resource languages, using the isiXhosa language as a case study. We pretrain two BabyLM architectures, ELC-BERT and MLSM, on an isiXhosa corpus. They outperform a vanilla pretrained model on POS tagging and NER, achieving notable gains (+3.2 F1) for the latter. In some instances, the BabyLMs even outperform XLM-R. Our findings show that data-efficient models are viable for low-resource languages, but highlight the continued importance of, and lack of, high-quality pretraining data. Finally, we visually analyse how BabyLM architectures encode isiXhosa.
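As a pointer for applying such models downstream, the sketch below shows NER inference with a token-classification head via Hugging Face transformers. The checkpoint identifier and label count are placeholders (the paper's isiXhosa BabyLM weights, if released, may use different names), not the authors' evaluation code.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Hypothetical model ID; ELC-BERT-style checkpoints may also require
# trust_remote_code=True depending on how they are published.
checkpoint = "isixhosa-babylm-ner"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=9)

tokens = ["UMongameli", "uthethe", "eKapa"]  # roughly: "The President spoke in Cape Town"
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape: (1, seq_len, num_labels)
tag_ids = logits.argmax(dim=-1)       # predicted NER tag ID per subword token
```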


CogSimulator: A Model for Simulating User Cognition & Behavior with Minimal Data for Tailored Cognitive Enhancement

Bian, Weizhen, Zhou, Yubo, Luo, Yuanhang, Mo, Ming, Liu, Siyan, Gong, Yikai, Wan, Renjie, Luo, Ziyuan, Wang, Aobo

arXiv.org Artificial Intelligence

The interplay between cognition and gaming, notably through educational games that enhance cognitive skills, has garnered significant attention in recent years. This research introduces CogSimulator, a novel algorithm for simulating user cognition in small-group settings with minimal data, exemplified by the educational game Wordle. CogSimulator employs the Wasserstein-1 distance and coordinate search optimization for hyperparameter tuning, enabling precise few-shot predictions in new game scenarios. Comparative experiments on the Wordle dataset show that our model surpasses most conventional machine learning models in mean Wasserstein-1 distance, mean squared error, and mean accuracy, demonstrating its efficacy in cognitive enhancement through tailored game design.
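The two ingredients named in the abstract, the Wasserstein-1 distance and coordinate search over hyperparameters, can be sketched generically as below. The objective, parameter grids, and synthetic guess-count data are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def coordinate_search(objective, params, grids, n_rounds=3):
    # Sweep one hyperparameter at a time over its grid, keep the best value,
    # and repeat; a generic stand-in for the tuning loop the abstract mentions.
    best = dict(params)
    for _ in range(n_rounds):
        for name, grid in grids.items():
            scores = [(objective({**best, name: v}), v) for v in grid]
            best[name] = min(scores)[1]   # lower Wasserstein-1 is better
    return best

# Toy objective: distance between simulated and observed guess-count
# distributions for a Wordle-like game (synthetic data for illustration).
rng = np.random.default_rng(0)
observed = rng.normal(4.0, 1.0, size=500)    # observed guesses per puzzle

def objective(p):
    sim_rng = np.random.default_rng(1)       # fixed seed keeps the search deterministic
    simulated = sim_rng.normal(p["skill"], p["noise"], size=500)
    return wasserstein_distance(observed, simulated)

grids = {"skill": [3.0, 3.5, 4.0, 4.5], "noise": [0.5, 1.0, 1.5]}
print(coordinate_search(objective, {"skill": 3.0, "noise": 0.5}, grids))
```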


Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Lyu, Bohan, Cao, Yadi, Watson-Parris, Duncan, Bergen, Leon, Berg-Kirkpatrick, Taylor, Yu, Rose

arXiv.org Artificial Intelligence

Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but often produce hallucinations for complex ones. While integrating LLMs with tools can increase reliability, this approach typically results in over-reliance on tools, diminishing the model's ability to solve simple problems through basic reasoning. In contrast, human experts first assess problem complexity using domain knowledge before choosing an appropriate solution approach. Inspired by this human problem-solving process, we propose a novel two-component fine-tuning method. In the first component, World Knowledge Distillation (WKD), LLMs learn directly from solutions generated using tools' information to internalize domain knowledge. In the second component, Tool Usage Adaptation (TUA), we partition problems into easy and hard categories based on the model's direct answering accuracy. While maintaining the same alignment target for easy problems as in WKD, we train the model to intelligently switch to tool usage for more challenging problems. We validate our method on six scientific benchmark datasets, spanning mathematics, climate science and epidemiology. On average, our models demonstrate a 28.18% improvement in answer accuracy and a 13.89% increase in tool usage precision across all datasets, surpassing state-of-the-art models including GPT-4o and Claude-3.5.
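The easy/hard split that drives TUA can be sketched as follows; model_answer and check are hypothetical callables standing in for the model's direct-answer pass and an answer verifier, and the sample count and threshold are illustrative rather than the paper's settings.

```python
def partition_problems(problems, model_answer, check, k=4, threshold=0.5):
    # Split problems by the model's direct answering accuracy, mirroring the
    # partition step described in the abstract (illustrative sketch).
    easy, hard = [], []
    for problem in problems:
        attempts = [model_answer(problem) for _ in range(k)]   # k direct tries
        accuracy = sum(check(problem, a) for a in attempts) / k
        (easy if accuracy >= threshold else hard).append(problem)
    return easy, hard

# Easy problems keep the WKD-style direct-answer training target; hard
# problems are paired with tool-use traces so the model learns when to
# switch to a tool.
```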


Air pollution in South Africa: affordable new devices use AI to monitor hotspots in real time

AIHub

Air quality has become one of the most important public health issues in Africa. Poor air quality kills more people globally every year than HIV, TB and malaria combined. Air pollution also makes people less productive because they get headaches and feel tired. India, for example, has poor air quality; the impact on its gross domestic product is about US$100 billion every year.


Refining Skewed Perceptions in Vision-Language Models through Visual Representations

Dai, Haocheng, Joshi, Sarang

arXiv.org Artificial Intelligence

Large vision-language models (VLMs), such as CLIP, have become foundational, demonstrating remarkable success across a variety of downstream tasks. Despite their advantages, these models, akin to other foundational systems, inherit biases from the disproportionate distribution of real-world data, leading to misconceptions about the actual environment. Prevalent datasets like ImageNet are often riddled with non-causal, spurious correlations that can diminish VLM performance in scenarios where these contextual elements are absent. This study investigates how a simple linear probe can effectively distill task-specific core features from CLIP's embedding for downstream applications. Our analysis reveals that CLIP's text representations are often tainted by spurious correlations inherited from the biased pre-training dataset. Empirical evidence suggests that relying on visual representations from CLIP, as opposed to text embeddings, is more practical for refining the skewed perceptions in VLMs, emphasizing the superior utility of visual representations in overcoming embedded biases. Our code will be available here.
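A linear probe over frozen CLIP image embeddings, the kind of analysis the study describes, might look like the sketch below; the solid-colour stand-in data is a placeholder for a real labelled dataset, and this is not the authors' released code.

```python
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(images):
    # Frozen CLIP visual features; only the linear probe below is trained.
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs).numpy()

# Stand-in data: solid-colour images in place of a labelled downstream task
# (e.g. one where spurious background context is absent at test time).
train_images = [Image.new("RGB", (224, 224), c) for c in ["red", "red", "blue", "blue"]]
train_labels = [0, 0, 1, 1]
test_images = [Image.new("RGB", (224, 224), "red")]

probe = LogisticRegression(max_iter=1000).fit(embed(train_images), train_labels)
print(probe.predict(embed(test_images)))  # -> [0]
```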


Long-form factuality in large language models

Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng, Hu, Nathan, Huang, Jie, Tran, Dustin, Peng, Daiyi, Liu, Ruibo, Huang, Da, Du, Cosmo, Le, Quoc V.

arXiv.org Artificial Intelligence

Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact through a multi-step reasoning process that involves sending search queries to Google Search and determining whether each fact is supported by the search results. Furthermore, we propose extending the F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators: on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
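The aggregated metric described above, precision over claimed facts balanced against recall relative to a preferred fact count K, can be written compactly; this is a sketch from the abstract's description, not the reference implementation in the linked repository.

```python
def f1_at_k(num_supported: int, num_facts: int, k: int) -> float:
    # Precision: fraction of a response's facts rated as supported.
    # Recall: supported facts relative to K, the hyperparameter encoding a
    # user's preferred response length (capped at 1).
    if num_facts == 0 or num_supported == 0:
        return 0.0  # zero-handling is an assumption, not the paper's spec
    precision = num_supported / num_facts
    recall = min(num_supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)

# Example: 45 of 50 extracted facts are supported, with K = 64.
print(f1_at_k(45, 50, 64))  # ~0.79
```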