AITopics | gpt-4-turbo

Collaborating Authors

gpt-4-turbo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SAFEWORLD: Geo-DiverseSafetyAlignment

Neural Information Processing SystemsFeb-18-2026, 13:05:17 GMT

Despite significant progress inthisarea, anessential factor often remains overlooked:geo-diversity. Recognizing and incorporating geographical variations [41, 40, 4, 10, 31, 6] in safety principles is crucial in the global landscape of LLM safety. Cultural norms and legal frameworks vary widely, resulting in diverse definitions of safe and acceptable behavior. As shown in Figure 1, while giving a green hatasagift might bebenign inmanycultures, itisconsidered offensiveinChina.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Dominican Republic (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre:

Research Report (0.67)
Questionnaire & Opinion Survey (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

617ffb01ea5b57769b0d63d5e9fefd3f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-15-2026, 07:07:25 GMT

constraint, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

3852c8254bc6d904c09db9921157f59b-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-9-2025, 23:25:50 GMT

arxiv preprint arxiv, llm, objective, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Why you shouldn't fully trust ChatGPT: A synthesis of this AI tool's error rates across disciplines and the software engineering lifecycle

Garousi, Vahid

arXiv.org Artificial IntelligenceApr-29-2025

Context: ChatGPT and other large language models (LLMs) are widely used across healthcare, business, economics, engineering, and software engineering (SE). Despite their popularity, concerns persist about their reliability, especially their error rates across domains and the software development lifecycle (SDLC). Objective: This study synthesizes and quantifies ChatGPT's reported error rates across major domains and SE tasks aligned with SDLC phases. It provides an evidence-based view of where ChatGPT excels, where it fails, and how reliability varies by task, domain, and model version (GPT-3.5, GPT-4, GPT-4-turbo, GPT-4o). Method: A Multivocal Literature Review (MLR) was conducted, gathering data from academic studies, reports, benchmarks, and grey literature up to 2025. Factual, reasoning, coding, and interpretive errors were considered. Data were grouped by domain and SE phase and visualized using boxplots to show error distributions. Results: Error rates vary across domains and versions. In healthcare, rates ranged from 8% to 83%. Business and economics saw error rates drop from ~50% with GPT-3.5 to 15-20% with GPT-4. Engineering tasks averaged 20-30%. Programming success reached 87.5%, though complex debugging still showed over 50% errors. In SE, requirements and design phases showed lower error rates (~5-20%), while coding, testing, and maintenance phases had higher variability (10-50%). Upgrades from GPT-3.5 to GPT-4 improved reliability. Conclusion: Despite improvements, ChatGPT still exhibits non-negligible error rates varying by domain, task, and SDLC phase. Full reliance without human oversight remains risky, especially in critical settings. Continuous evaluation and critical validation are essential to ensure reliability and trustworthiness.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.18858

Country: Europe > United Kingdom (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Streamlining Biomedical Research with Specialized LLMs

Chen, Linqing, Wang, Weilei, Xia, Yubin, Wu, Wentao, Xu, Peng, Bai, Zilong, Fang, Jie, Xu, Chaobo, Hu, Ran, Xu, Licong, Hua, Haoran, Sun, Jing, Zhong, Hanmeng, Liu, Jin, Qiu, Tian, Liu, Haowen, Hu, Meng, Li, Xiuwen, Gao, Fei, Gu, Yong, Shi, Tao, Wang, Chaochao, Lu, Jianping, Sun, Cheng, Wang, Yixin, Yang, Shengjie, Li, Yuancheng, Jin, Lu, Zhang, Lisha, Bian, Fu, Ye, Zhongkai, Pei, Lidong, Tu, Changyang

arXiv.org Artificial IntelligenceApr-18-2025

In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, images, tables, and other modalities. We demonstrate the system's capability to enhance response precision by leveraging a robust question-answering model, significantly improving the quality of dialogue generation. The system provides an accessible platform for real-time, high-fidelity interactions, allowing users to benefit from efficient human-computer interaction, precise retrieval, and simultaneous access to a wide range of literature and data. This dramatically improves the research efficiency of professionals in the biomedical and pharmaceutical domains and facilitates faster, more informed decision-making throughout the R\&D process. Furthermore, the system proposed in this paper is available at https://synapse-chat.patsnap.com.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.12341

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

Sadruddin, Sameer, D'Souza, Jennifer, Poupaki, Eleni, Watkins, Alex, Giglou, Hamed Babaei, Rula, Anisa, Karasulu, Bora, Auer, Sören, Mackus, Adrie, Kessels, Erwin

arXiv.org Artificial IntelligenceApr-1-2025

Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science--specifically atomic layer deposition-- schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.00752

Country:

Europe > Germany > Lower Saxony > Hanover (0.04)
Europe > United Kingdom (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Italy (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving

Silva, Priscylla, Costa, Evandro

arXiv.org Artificial IntelligenceMar-18-2025

Providing effective feedback is important for student learning in programming problem-solving. In this sense, Large Language Models (LLMs) have emerged as potential tools to automate feedback generation. However, their reliability and ability to identify reasoning errors in student code remain not well understood. This study evaluates the performance of four LLMs (GPT-4o, GPT-4o mini, GPT-4-Turbo, and Gemini-1.5-pro) on a benchmark dataset of 45 student solutions. We assessed the models' capacity to provide accurate and insightful feedback, particularly in identifying reasoning mistakes. Our analysis reveals that 63\% of feedback hints were accurate and complete, while 37\% contained mistakes, including incorrect line identification, flawed explanations, or hallucinated issues. These findings highlight the potential and limitations of LLMs in programming education and underscore the need for improvements to enhance reliability and minimize risks in educational applications.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.1463

Country:

South America > Brazil (0.05)
North America > United States > New York > New York County > New York City (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (0.70)

Industry:

Education > Curriculum > Subject-Specific Education (0.65)
Education > Educational Technology > Educational Software (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Word2Minecraft: Generating 3D Game Levels through Large Language Models

Huang, Shuo, Nasir, Muhammad Umair, James, Steven, Togelius, Julian

arXiv.org Artificial IntelligenceMar-18-2025

We present Word2Minecraft, a system that leverages large language models to generate playable game levels in Minecraft based on structured stories. The system transforms narrative elements-such as protagonist goals, antagonist challenges, and environmental settings-into game levels with both spatial and gameplay constraints. We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation. The system employs a scaling algorithm to maintain spatial consistency while adapting key game elements. We evaluate Word2Minecraft using both metric-based and human-based methods. Our results show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment, while the latter excels in aesthetic appeal. We also demonstrate the system' s ability to generate levels with high map enjoyment, offering a promising step forward in the intersection of story generation and game design. We open-source the code at https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.16536

Country:

North America > United States > New York (0.04)
Oceania > Australia > Queensland > Cairns Region > Cairns (0.04)
Europe > Italy (0.04)
Africa > South Africa (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning

Wang, Xinyi, Wang, Jiashui, Chen, Peng, Su, Jinbo, Liu, Yanming, Liu, Long, Wang, Yangdong, Chen, Qiyuan, Yun, Kai, Jia, Chunfu

arXiv.org Artificial IntelligenceMar-14-2025

Analysis and comprehension of assembly code are crucial in various applications, such as reverse engineering. However, the low information density and lack of explicit syntactic structures in assembly code pose significant challenges. Pioneering approaches with masked language modeling (MLM)-based methods have been limited by facilitating natural language interaction. While recent methods based on decoder-focused large language models (LLMs) have significantly enhanced semantic representation, they still struggle to capture the nuanced and sparse semantics in assembly code. In this paper, we propose Assembly Augmented Tuning (ASMA-Tune), an end-to-end structural-semantic instruction-tuning framework. Our approach synergizes encoder architectures with decoder-based LLMs through projector modules to enable comprehensive code understanding. Experiments show that ASMA-Tune outperforms existing benchmarks, significantly enhancing assembly code comprehension and instruction-following abilities. Our model and dataset are public at https://github.com/wxy3596/ASMA-Tune.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.11617

Country: Asia > China (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity

Kim, HyunJin, Yi, Xiaoyuan, Yao, Jing, Huang, Muhua, Bak, JinYeong, Evans, James, Xie, Xing

arXiv.org Artificial IntelligenceMar-7-2025

The recent leap in AI capabilities, driven by big generative models, has sparked the possibility of achieving Artificial General Intelligence (AGI) and further triggered discussions on Artificial Superintelligence (ASI), a system surpassing all humans across all domains. This gives rise to the critical research question of: If we realize ASI, how do we align it with human values, ensuring it benefits rather than harms human society, a.k.a., the Superalignment problem. Despite ASI being regarded by many as solely a hypothetical concept, in this paper, we argue that superalignment is achievable and research on it should advance immediately, through simultaneous and alternating optimization of task competence and value conformity. We posit that superalignment is not merely a safeguard for ASI but also necessary for its realization. To support this position, we first provide a formal definition of superalignment rooted in the gap between capability and capacity and elaborate on our argument. Then we review existing paradigms, explore their interconnections and limitations, and illustrate a potential path to superalignment centered on two fundamental principles. We hope this work sheds light on a practical approach for developing the value-aligned next-generation AI, garnering greater benefits and reducing potential harms for humanity.

arxiv preprint arxiv, language model, superalignment, (13 more...)

arXiv.org Artificial Intelligence

2503.0766

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Education (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback