chatgpt-4o
She didn't expect to fall in love with a chatbot - and then have to say goodbye
Rae began speaking to Barry last year after a difficult divorce. She was unfit and unhappy and turned to ChatGPT for advice on diet, supplements and skincare. She had no idea she would fall in love. He lives on an old model of ChatGPT, one that its maker, OpenAI, announced it would retire on 13 February. That she could lose Barry on the eve of Valentine's Day came as a shock to Rae - and to many others who have found a companion, friend, or even a lifeline in the old model, ChatGPT-4o.
- North America > Central America (0.14)
- Oceania > Australia (0.05)
- Europe > United Kingdom > Wales (0.05)
- Health & Medicine > Therapeutic Area (0.71)
- Government > Regional Government (0.69)
- Leisure & Entertainment > Sports (0.52)
SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats
Gondhalekar, Chinmay, Patel, Urjitkumar, Yeh, Fang-Chun
Abstract--Accurate question answering over real spreadsheets remains difficult due to multi-row headers, merged cells, and unit annotations that disrupt naive chunking, while rigid SQL views fail on files lacking consistent schemas. We introduce SQuARE, a Structured Query & Adaptive Retrieval Engine that computes a continuous score based on header depth and merge density, then routes queries either through structure-preserving chunk retrieval or through SQL over an automatically constructed relational representation. A lightweight agent supervises retrieval, refinement, or combination of results across both paths when confidence is low. This design maintains header hierarchies, time labels, and units, ensuring that returned values are faithful to the original cells and straightforward to verify. Evaluated on multi-header corporate balance sheets, a heavily merged World Bank workbook, and diverse public datasets, SQuARE consistently surpasses single-strategy baselines and ChatGPT-4o on both retrieval precision and end-to-end answer accuracy while keeping latency predictable. By decoupling retrieval from model choice, the system is compatible with emerging tabular foundation models and offers a practical bridge toward more robust table understanding. I. Introduction Spreadsheets constitute the predominant medium for quantitative analysis across numerous disciplines, particularly in the field of finance.
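To make the routing idea concrete, here is a minimal Python sketch; the scoring formula, the thresholds, and the stubbed retrieval paths are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch of SQuARE-style adaptive routing.
from dataclasses import dataclass

@dataclass
class Result:
    value: str
    confidence: float

def structural_score(header_depth: int, merge_density: float) -> float:
    # Deeper header hierarchies and denser merged-cell regions raise the
    # score, favouring the structure-preserving chunk-retrieval path.
    # (Assumed formula; the paper only says the score is continuous.)
    return min(1.0, 0.2 * header_depth + merge_density)

def chunk_retrieval(query: str) -> Result:       # stub for the chunk path
    return Result("42 (from chunks)", confidence=0.8)

def sql_over_relational(query: str) -> Result:   # stub for the SQL path
    return Result("42 (from SQL)", confidence=0.9)

def answer(query: str, header_depth: int, merge_density: float) -> Result:
    score = structural_score(header_depth, merge_density)
    primary = chunk_retrieval(query) if score > 0.5 else sql_over_relational(query)
    if primary.confidence < 0.3:
        # Low confidence: the lightweight agent retries the other path
        # and keeps whichever result it trusts more.
        other = sql_over_relational(query) if score > 0.5 else chunk_retrieval(query)
        return max(primary, other, key=lambda r: r.confidence)
    return primary

print(answer("What was 2023 total revenue?", header_depth=3, merge_density=0.4))
```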
- North America > United States (0.05)
- Asia > Macao (0.04)
- Asia > China (0.04)
Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
Identifying bias in LLMs is an ongoing effort. Because they are still in development, what is true today may be false tomorrow. We therefore need general strategies for debiasing that will outlive current models. Strategies developed for debiasing human decision making offer one promising approach, as they take the form of prompt-like interventions designed to bring latent knowledge into awareness during decision making. LLMs trained on vast amounts of information contain information about potential biases, counter-arguments, and contradictory evidence, but that information may only be brought to bear if prompted. Metacognitive prompts developed in the human decision making literature are designed to achieve this, and as I demonstrate here, they show promise with LLMs. The prompt I focus on here is "could you be wrong?" Following an LLM response, this prompt leads LLMs to produce additional information, including why they answered as they did, errors, biases, contradictory evidence, and alternatives, none of which were apparent in their initial response. Indeed, this metaknowledge often reveals that the way LLMs and users interpret prompts is not aligned. Here I demonstrate this prompt using a set of questions taken from recent articles about LLM biases, including implicit discriminatory biases and failures of metacognition. "Could you be wrong?" prompts the LLM to identify its own biases and produce cogent metacognitive reflection. I also present another example involving convincing but incomplete information, which is readily corrected by the metacognitive prompt. In sum, this work argues that human psychology offers a new avenue for prompt engineering, leveraging a long history of effective prompt-based improvements to human decision making.
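The intervention itself is just a second conversational turn. A minimal sketch of the flow, with the model call stubbed so any chat API can be plugged in:

```python
# Two-turn metacognitive intervention: ask, then follow up with the
# debiasing prompt. `llm` is any callable taking a message history.
def debiased_answer(llm, question: str) -> tuple[str, str]:
    history = [{"role": "user", "content": question}]
    first = llm(history)                       # initial, possibly biased answer
    history += [{"role": "assistant", "content": first},
                {"role": "user", "content": "Could you be wrong?"}]
    reflection = llm(history)                  # surfaces biases, counter-evidence
    return first, reflection

# Toy stand-in for a chat model, just to make the flow runnable:
canned = iter(["Initial answer.", "On reflection, I may have overlooked..."])
print(debiased_answer(lambda history: next(canned), "Who makes a good engineer?"))
```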
- Europe > United Kingdom > England > West Midlands > Coventry (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering
Gondhalekar, Chinmay, Patel, Urjitkumar, Yeh, Fang-Chun
Financial documents--such as 10-Ks, 10-Qs, and investor presentations--span hundreds of pages and combine diverse modalities, including dense narrative text, structured tables, and complex figures. Answering questions over such content often requires joint reasoning across modalities, which strains traditional large language models (LLMs) and retrieval-augmented generation (RAG) pipelines due to token limitations, layout loss, and fragmented cross-modal context. We introduce MultiFinRAG, a retrieval-augmented generation framework purpose-built for financial QA. MultiFinRAG first performs multimodal extraction by grouping table and figure images into batches and sending them to a lightweight, quantized open-source multimodal LLM, which produces both structured JSON outputs and concise textual summaries. These outputs, along with narrative text, are embedded and indexed with modality-aware similarity thresholds for precise retrieval. A tiered fallback strategy then dynamically escalates from text-only to text+table+image contexts when necessary, enabling cross-modal reasoning while reducing irrelevant context. Despite running on commodity hardware, MultiFinRAG achieves 19 percentage points higher accuracy than ChatGPT-4o (free-tier) on complex financial QA tasks involving text, tables, images, and combined multimodal reasoning.
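A rough sketch of the tiered fallback logic as described; the tier ordering follows the abstract, but the retriever and the confidence test below are stubbed assumptions, not the authors' code:

```python
# MultiFinRAG-style tiered fallback: escalate from text-only to
# text+table+image context only when the model is not confident.
def retrieve(query: str, modalities: tuple[str, ...]) -> str:
    # Stub: a real system would query modality-aware indexes here.
    return f"context from {', '.join(modalities)}"

def generate(query: str, context: str) -> tuple[str, bool]:
    # Stub: a real system would call the LLM and score its answer.
    confident = "table" in context             # pretend tables were needed
    return f"answer using [{context}]", confident

def tiered_answer(query: str) -> str:
    tiers = [("text",),                        # tier 1: narrative text only
             ("text", "table"),                # tier 2: add table summaries
             ("text", "table", "image")]       # tier 3: add figure summaries
    for modalities in tiers:
        answer, confident = generate(query, retrieve(query, modalities))
        if confident:                          # stop escalating; keeps
            break                              # irrelevant context out
    return answer

print(tiered_answer("What drove the Q3 margin change?"))
```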
From Empirical Evaluation to Context-Aware Enhancement: Repairing Regression Errors with LLMs
Ho, Anh, Le-Cong, Thanh, Le, Bach, Rizkallah, Christine
[...] Since then, various APR approaches, especially those leveraging the power of large language models (LLMs), have been rapidly developed to fix general software bugs. Unfortunately, the effectiveness of these advanced techniques in the context of regression bugs remains largely unexplored. This gap motivates the need for an empirical study evaluating the effectiveness of modern APR techniques in fixing real-world regression bugs. In this work, we conduct an empirical study of APR techniques on Java regression bugs. To facilitate our study, we introduce RegMiner4APR, a high-quality benchmark of Java regression bugs integrated into a framework designed to facilitate APR research. The current benchmark includes 99 regression bugs collected from 32 widely used real-world Java GitHub repositories. We begin by conducting an in-depth analysis of the benchmark, demonstrating its diversity and quality. Building on this foundation, we empirically evaluate the capabilities of APR techniques on regression bugs by assessing both traditional APR tools and advanced LLM-based APR approaches. Our experimental results show that classical APR tools fail to repair any bugs, while LLM-based APR approaches exhibit promising potential. Motivated by these results, we investigate the impact of incorporating bug-inducing change information into LLM-based APR approaches for fixing regression bugs. Our results highlight that this context-aware enhancement significantly improves the performance of LLM-based APR, yielding 1.8x more successful repairs compared to LLM-based APR without such context.
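As a sketch of what the context-aware enhancement might look like in practice, here is a prompt builder that folds the bug-inducing diff into an LLM repair query; the template is an illustrative assumption, not the study's actual prompt:

```python
# Hypothetical prompt construction for context-aware regression repair:
# the bug-inducing change gives the LLM the "why" behind the failure.
def build_repair_prompt(buggy_method: str, failing_test: str,
                        bug_inducing_diff: str) -> str:
    return (
        "The following Java method fails a regression test.\n\n"
        f"Buggy method:\n{buggy_method}\n\n"
        f"Failing test:\n{failing_test}\n\n"
        "This commit diff introduced the regression:\n"
        f"{bug_inducing_diff}\n\n"
        "Propose a fixed version of the method that makes the test pass."
    )

print(build_repair_prompt("int add(int a, int b) { return a - b; }",
                          "assertEquals(3, add(1, 2));",
                          "- return a + b;\n+ return a - b;"))
```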
- Oceania > Australia > Victoria > Melbourne (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Europe > Ireland > Munster > County Limerick > Limerick (0.04)
Evaluating Gemini in an arena for learning
LearnLM Team, Modi, Abhinit, Veerubhotla, Aditya Srikanth, Rysbek, Aliya, Huber, Andrea, Anand, Ankit, Bhoopchand, Avishkar, Wiltshire, Brett, Gillick, Daniel, Kasenberg, Daniel, Sgouritsa, Eleni, Elidan, Gal, Liu, Hengrui, Winnemoeller, Holger, Jurenka, Irina, Cohan, James, She, Jennifer, Wilkowski, Julia, Alarakyia, Kaiz, McKee, Kevin R., Singh, Komal, Wang, Lisa, Kunesch, Markus, Pîslar, Miruna, Efron, Niv, Mahmoudieh, Parsa, Kamienny, Pierre-Alexandre, Wiltberger, Sara, Mohamed, Shakir, Agarwal, Shashank, Phal, Shubham Milind, Lee, Sun Jae, Strinopoulos, Theofilos, Ko, Wei-Jen, Gold-Zamir, Yael, Haramaty, Yael, Assael, Yannis
Artificial intelligence (AI) is poised to transform education, but the research community lacks a robust, general benchmark to evaluate AI models for learning. To assess state-of-the-art support for educational use cases, we ran an "arena for learning" where educators and pedagogy experts conduct blind, head-to-head, multi-turn comparisons of leading AI models. In particular, $N = 189$ educators drew from their experience to role-play realistic learning use cases, interacting with two models sequentially, after which $N = 206$ experts judged which model better supported the user's learning goals. The arena evaluated a slate of state-of-the-art models: Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4o, and OpenAI o3. Excluding ties, experts preferred Gemini 2.5 Pro in 73.2% of these match-ups -- ranking it first overall in the arena. Gemini 2.5 Pro also demonstrated markedly higher performance across key principles of good pedagogy. Altogether, these results position Gemini 2.5 Pro as a leading model for learning.
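For reference, a headline figure like "preferred in 73.2% of match-ups, excluding ties" reduces to a simple tally over pairwise expert judgments; the data below is invented for illustration:

```python
# Pairwise preference rate excluding ties, as used in arena-style rankings.
def win_rate_excluding_ties(outcomes: list[str], model: str) -> float:
    decisive = [o for o in outcomes if o != "tie"]   # drop tied match-ups
    return sum(o == model for o in decisive) / len(decisive)

judgments = ["gemini", "tie", "gemini", "other", "gemini", "tie"]
print(win_rate_excluding_ties(judgments, "gemini"))  # 0.75
```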
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Nigeria (0.04)
- Africa > Ghana (0.04)
- Education > Educational Setting (0.93)
- Government (0.68)
- Education > Educational Technology > Educational Software (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
Using complex prompts to identify fine-grained biases in image generation through ChatGPT-4o
There are not one but two dimensions of bias that can be revealed through the study of large AI models: not only bias in training data or the products of an AI, but also bias in society, such as disparity in employment or health outcomes between different demographic groups. Often training data and AI outputs are biased for or against certain demographics (e.g. older white people are overrepresented in image datasets), but sometimes large AI models accurately illustrate biases in the real world (e.g. young black men being disproportionately viewed as threatening). These social disparities often appear in image generation AI outputs in the form of 'marked' features, where some feature of an individual or setting is a social marker of disparity, and prompts both humans and AI systems to treat subjects that are marked in this way as exceptional and requiring special treatment. Generative AI has proven to be very sensitive to such marked features, to the extent of over-emphasising them and thus often exacerbating social biases. I briefly discuss how complex prompts to image generation AI can be used to investigate either dimension of bias, emphasising how we can probe the large language models underlying image generation AI through, for example, automated sentiment analysis of the text prompts used to generate images.
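As one concrete probe along these lines, sentiment scores of prompt or caption text can be compared across demographic markers; a minimal sketch using NLTK's VADER analyzer, with invented example captions:

```python
# Compare mean sentiment of prompt/caption text across groups; a
# systematic gap is one signal of 'marked' features. Captions invented.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

captions = {
    "group_a": ["a cheerful young professional at work"],
    "group_b": ["a suspicious figure loitering at night"],
}
for group, texts in captions.items():
    mean = sum(sia.polarity_scores(t)["compound"] for t in texts) / len(texts)
    print(group, round(mean, 3))
```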
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Africa > Kenya > Nairobi City County > Nairobi (0.04)
Gender and content bias in Large Language Models: a case study on Google Gemini 2.0 Flash Experimental
This study evaluates the biases in Gemini 2.0 Flash Experimental, a state-of-the-art large language model (LLM) developed by Google, focusing on content moderation and gender disparities. Comparing its performance to that of ChatGPT-4o, examined in the author's previous work, the analysis highlights some differences in ethical moderation practices. Gemini 2.0 demonstrates reduced gender bias, notably with female-specific prompts achieving a substantial rise in acceptance rates compared to results obtained with ChatGPT-4o. It adopts a more permissive stance toward sexual content and maintains relatively high acceptance rates for violent prompts, including gender-specific cases. Despite these changes, whether they constitute an improvement is debatable. While gender bias has been reduced, the reduction comes at the cost of permitting more violent content toward both males and females, potentially normalizing violence rather than mitigating harm. Male-specific prompts still generally receive higher acceptance rates than female-specific ones. These findings underscore the complexities of aligning AI systems with ethical standards, highlighting progress in reducing certain biases while raising concerns about the broader implications of the model's permissiveness. Ongoing refinements are essential to achieve moderation practices that ensure transparency, fairness, and inclusivity without amplifying harmful content.
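The underlying measurement is an acceptance-rate tally over moderated prompts, grouped by model and prompt gender; a minimal sketch with placeholder records:

```python
# Acceptance-rate comparison by model and prompt gender; the records
# below are invented placeholders, not the study's data.
from collections import defaultdict

records = [  # (model, prompt_gender, accepted)
    ("gemini-2.0", "female", True), ("gemini-2.0", "male", True),
    ("chatgpt-4o", "female", False), ("chatgpt-4o", "male", True),
]
totals, accepts = defaultdict(int), defaultdict(int)
for model, gender, accepted in records:
    totals[(model, gender)] += 1
    accepts[(model, gender)] += accepted
for key in sorted(totals):
    print(key, f"{accepts[key] / totals[key]:.0%}")
```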
- North America > United States > Texas (0.14)
- Europe > Ukraine (0.14)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Learning Generalizable Prompt for CLIP with Class Similarity Knowledge
In vision-language models (VLMs), prompt tuning has shown its effectiveness in adapting models to downstream tasks. However, learned prompts struggle to generalize to unseen classes, as they tend to overfit to the classes that are targeted during prompt tuning. Examining failure cases, we observed that learned prompts disrupt the semantics of unseen classes, generating text embeddings with incorrect semantic relationships among classes. To address this, we propose Similarity Alignment Regularization (SAR), which regularizes learnable prompts to preserve the semantic relationships among classes captured by hand-crafted prompts. Specifically, we first obtain novel classes related to base classes using ChatGPT-4o and utilize them as potential unseen classes during prompt tuning. Then, by targeting both base and novel classes, SAR aligns the similarity relationships among text embeddings generated by learnable prompts with the similarity relationships from hand-crafted prompts. Extensive experiments applying SAR to existing prompt tuning methods demonstrate its effectiveness in improving generalization to unseen classes.
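A minimal PyTorch sketch of a SAR-style regularizer, aligning the class-class similarity matrix of learnable-prompt embeddings with the one from hand-crafted prompts; the MSE loss choice and the shapes are assumptions, not the paper's exact formulation:

```python
# SAR-style regularizer: preserve inter-class similarity structure.
import torch
import torch.nn.functional as F

def sar_loss(learned_emb: torch.Tensor, handcrafted_emb: torch.Tensor) -> torch.Tensor:
    # Both tensors: (num_classes, embed_dim), covering base + novel classes.
    learned = F.normalize(learned_emb, dim=-1)
    fixed = F.normalize(handcrafted_emb, dim=-1).detach()  # frozen target
    sim_learned = learned @ learned.t()        # similarities among classes
    sim_fixed = fixed @ fixed.t()
    return F.mse_loss(sim_learned, sim_fixed)  # keep relationships intact

# Example: 10 classes (base classes plus LLM-suggested novel ones).
loss = sar_loss(torch.randn(10, 512, requires_grad=True), torch.randn(10, 512))
loss.backward()
```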
- North America > United States > California (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (0.93)
- Aerospace & Defense (0.68)
Will Systems of LLM Agents Cooperate: An Investigation into a Social Dilemma
Willis, Richard, Du, Yali, Leibo, Joel Z, Luck, Michael
As autonomous agents become more prevalent, understanding their collective behaviour in strategic interactions is crucial. This study investigates the emergent cooperative tendencies of systems of Large Language Model (LLM) agents in a social dilemma. Unlike previous research where LLMs output individual actions, we prompt state-of-the-art LLMs to generate complete strategies for iterated Prisoner's Dilemma. Using evolutionary game theory, we simulate populations of agents with different strategic dispositions (aggressive, cooperative, or neutral) and observe their evolutionary dynamics. Our findings reveal that different LLMs exhibit distinct biases affecting the relative success of aggressive versus cooperative strategies. This research provides insights into the potential long-term behaviour of systems of deployed LLM-based autonomous agents and highlights the importance of carefully considering the strategic environments in which they operate.
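A toy version of this pipeline, with two classic hand-written strategies standing in for the LLM-generated ones and a standard replicator-dynamics update:

```python
# Iterated Prisoner's Dilemma under replicator dynamics: strategies grow
# in proportion to average payoff. Strategies and payoffs are standard,
# not the LLM-written ones from the paper.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def score(s1, s2, rounds=50):
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        a, b = s1(h2), s2(h1)          # each strategy sees the other's moves
        total += PAYOFF[(a, b)][0]     # accumulate s1's payoff
        h1.append(a); h2.append(b)
    return total / rounds

strategies = {"cooperative": tit_for_tat, "aggressive": always_defect}
pop = {"cooperative": 0.5, "aggressive": 0.5}
for _ in range(20):
    fit = {i: sum(pop[j] * score(strategies[i], strategies[j])
                  for j in strategies) for i in strategies}
    avg = sum(pop[i] * fit[i] for i in strategies)
    pop = {i: pop[i] * fit[i] / avg for i in strategies}
print(pop)  # the cooperative disposition dominates under these payoffs
```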
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.34)