

ChatGPT and U(X): A Rapid Review on Measuring the User Experience

arXiv.org Artificial Intelligence

ChatGPT, powered by a large language model (LLM), has revolutionized everyday human-computer interaction (HCI) since its 2022 release. While now used by millions around the world, a coherent pathway for evaluating the user experience (UX) ChatGPT offers remains missing. In this rapid review (N = 58), I explored how ChatGPT UX has been approached quantitatively so far. I focused on the independent variables (IVs) manipulated, the dependent variables (DVs) measured, and the methods used for measurement. Findings reveal trends, gaps, and emerging consensus in UX assessments. This work offers a first step towards synthesizing existing approaches to measuring ChatGPT UX, urgent trajectories to advance standardization and breadth, and two preliminary frameworks aimed at guiding future research and tool development. I seek to elevate the field of ChatGPT UX by empowering researchers and practitioners in optimizing user interactions with ChatGPT and similar LLM-based systems.


ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized natural language processing (NLP), with Chat Generative Pre-trained Transformer (ChatGPT) standing out as a notable example due to its advanced capabilities and widespread applications. This survey provides a comprehensive analysis of ChatGPT, exploring its architecture, training processes, and functionalities. We examine its integration into various domains across industries such as customer service, education, healthcare, and entertainment. A comparative analysis with other LLMs highlights ChatGPT's unique features and performance metrics. Regarding benchmarks, the paper examines ChatGPT's comparative performance against other LLMs and discusses potential risks such as misinformation, bias, and data privacy concerns. Additionally, we offer a number of figures and tables that outline the backdrop of the discussion, the main ideas of the article, the numerous LLM models, a thorough list of datasets used for pre-training, fine-tuning, and evaluation, as well as particular LLM applications with pertinent references. Finally, we identify future research directions and technological advancements, underscoring the evolving landscape of LLMs and their profound impact on artificial intelligence (AI) and society.


Conformal Prediction using Conditional Histograms

Neural Information Processing Systems

This paper develops a conformal method to compute prediction intervals for nonparametric regression that can automatically adapt to skewed data. Leveraging black-box machine learning algorithms to estimate the conditional distribution of the outcome using histograms, it translates their output into the shortest prediction intervals with approximate conditional coverage. The resulting prediction intervals provably have marginal coverage in finite samples, while asymptotically achieving conditional coverage and optimal length if the black-box model is consistent. Numerical experiments with simulated and real data demonstrate improved performance compared to state-of-the-art alternatives, including conformalized quantile regression and other distributional conformal prediction approaches.
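
The sketch below illustrates the general split-conformal recipe the abstract describes, with a histogram-style estimate of the conditional distribution of the outcome. It is a much-simplified stand-in, not the paper's CHR algorithm: it builds highest-mass bin sets with marginal coverage rather than optimized shortest intervals, and the model choice, bin construction, and variable names are assumptions for illustration.

```python
# Simplified split-conformal prediction on top of a histogram estimate of
# the conditional distribution of Y given X. NOT the paper's CHR method:
# it returns highest-mass bin sets, not optimized shortest intervals.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated skewed data: Y | X is exponential with an X-dependent scale.
X = rng.uniform(0, 5, size=(2000, 1))
y = rng.exponential(scale=1.0 + X[:, 0])

# Discretize Y into histogram bins (the support of the conditional histogram).
n_bins = 30
edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
edges[0], edges[-1] = -np.inf, np.inf
y_bin = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)

# Split: train the histogram model, calibrate conformity scores, then test.
X_tr, X_rest, yb_tr, yb_rest = train_test_split(X, y_bin, test_size=0.5, random_state=0)
X_cal, X_te, yb_cal, yb_te = train_test_split(X_rest, yb_rest, test_size=0.5, random_state=0)

# Black-box estimate of P(Y in bin k | X): a classifier over bins.
model = GradientBoostingClassifier(n_estimators=50, max_depth=2).fit(X_tr, yb_tr)

def bin_probs(model, X, n_bins):
    """Return an (n, n_bins) matrix of estimated bin probabilities."""
    probs = np.zeros((len(X), n_bins))
    probs[:, model.classes_] = model.predict_proba(X)
    return probs

# Conformity score: negative estimated mass of the bin containing Y.
p_cal = bin_probs(model, X_cal, n_bins)
scores = -p_cal[np.arange(len(X_cal)), yb_cal]

alpha = 0.1
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
threshold = np.sort(scores)[min(k, len(scores)) - 1]

# Prediction set for a test point: all bins whose estimated mass is large enough.
p_te = bin_probs(model, X_te, n_bins)
pred_sets = p_te >= -threshold          # boolean (n_test, n_bins)

coverage = pred_sets[np.arange(len(X_te)), yb_te].mean()
print(f"Empirical marginal coverage: {coverage:.3f} (target {1 - alpha})")
```

The calibration step is what delivers the finite-sample marginal coverage claimed in the abstract; turning the bin sets into short intervals with approximate conditional coverage is the part handled by the paper's own construction.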


Compositional Generalization via Neural-Symbolic Stack Machines

Neural Information Processing Systems

Despite achieving tremendous success, existing deep learning models have exposed limitations in compositional generalization, the capability to learn compositional rules and apply them to unseen cases in a systematic manner. To tackle this issue, we propose the Neural-Symbolic Stack Machine (NeSS). It contains a neural network to generate traces, which are then executed by a symbolic stack machine enhanced with sequence manipulation operations. NeSS combines the expressive power of neural sequence models with the recursion supported by the symbolic stack machine. Without training supervision on execution traces, NeSS achieves 100% generalization performance in four domains: the SCAN benchmark of language-driven navigation tasks, the task of few-shot learning of compositional instructions, the compositional machine translation benchmark, and context-free grammar parsing tasks.
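
To make the division of labor concrete, here is a toy illustration of the symbolic half only: a small stack machine with sequence-manipulation operations executing a trace. In NeSS the trace is produced by a learned neural model; below it is written by hand, and the operation set, trace format, and SCAN-style example are invented for illustration rather than taken from the paper's machine.

```python
# Toy stack machine executing a hand-written trace. In NeSS the trace is
# generated by a neural network; the operations here are illustrative only.
class StackMachine:
    def __init__(self):
        self.stack: list[list[str]] = []   # each element is a token sequence

    def push(self, tokens: list[str]):
        self.stack.append(list(tokens))

    def concat(self):
        """Pop two sequences and push their concatenation (earlier one first)."""
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

    def repeat(self, n: int):
        """Pop one sequence and push it repeated n times."""
        a = self.stack.pop()
        self.stack.append(a * n)

    def run(self, trace: list[tuple]):
        for op, *args in trace:
            getattr(self, op)(*args)
        return self.stack[-1]

# Interpret "jump twice after walk" in a SCAN-like style:
# "after" means the "walk" clause executes first, so it is pushed first.
trace = [
    ("push", ["WALK"]),
    ("push", ["JUMP"]),
    ("repeat", 2),     # "twice" applies to the most recently pushed phrase
    ("concat",),       # combine the two clauses in execution order
]
print(StackMachine().run(trace))   # ['WALK', 'JUMP', 'JUMP']
```

The recursion and systematic reuse of operations come from the symbolic executor, while the neural component only has to learn which trace to emit, which is the combination the abstract credits for the 100% generalization results.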


The Empty Chair: Using LLMs to Raise Missing Perspectives in Policy Deliberations

arXiv.org Artificial Intelligence

Deliberative forums such as citizens' assemblies have shown promise in bypassing party polarization and fostering productive discussions on contentious political issues [3]. Unfortunately, most deliberations do not take place in carefully structured settings with nationally representative participants. Instead, they often occur within homogeneous groups [17]. When this happens, deliberation can lead to group polarization, where individuals become more extreme in their initial positions rather than engaging with opposing viewpoints [22]. This can be problematic if the goal of deliberation is to build common ground and consensus within a pluralistic electorate. Given that large language models (LLMs) have demonstrated some fidelity in accurately responding to opinion surveys [1, 20] and adopting different personas [12], we explore whether an LLM-powered tool can help introduce missing perspectives in group deliberation.
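
A minimal sketch of the persona-prompting idea appears below. It is not the paper's tool: the model name, prompt wording, persona description, and function name are all illustrative assumptions, using the standard OpenAI chat completions client.

```python
# Minimal sketch (not the paper's tool) of prompting an LLM to voice a
# perspective missing from a group deliberation. Model name, prompt wording,
# and persona are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def missing_perspective(transcript: str, persona: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to contribute one short turn in the voice of `persona`."""
    system = (
        f"You are participating in a citizens' deliberation as {persona}. "
        "Read the discussion so far and add one concise contribution that "
        "raises a consideration the group has not yet engaged with. "
        "Be constructive and avoid caricature."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": transcript},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Example usage with a toy transcript.
transcript = (
    "A: We should pedestrianize the city center.\n"
    "B: Agreed, it cuts pollution."
)
print(missing_perspective(transcript, "a delivery driver who works in the city center"))
```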


From G-Factor to A-Factor: Establishing a Psychometric Framework for AI Literacy

arXiv.org Artificial Intelligence

This research addresses the growing need to measure and understand AI literacy in the context of generative AI technologies. Through three sequential studies involving a total of 517 participants, we establish AI literacy as a coherent, measurable construct with significant implications for education, workforce development, and social equity. Study 1 (N=85) revealed a dominant latent factor - termed the "A-factor" - that accounts for 44.16% of variance across diverse AI interaction tasks. Study 2 (N=286) refined the measurement tool by examining four key dimensions of AI literacy: communication effectiveness, creative idea generation, content evaluation, and step-by-step collaboration, resulting in an 18-item assessment battery. Study 3 (N=146) validated this instrument in a controlled laboratory setting, demonstrating its predictive validity for real-world task performance. Results indicate that AI literacy significantly predicts performance on complex, language-based creative tasks but shows domain specificity in its predictive power. Additionally, regression analyses identified several significant predictors of AI literacy, including cognitive abilities (IQ), educational background, prior AI experience, and training history. The multidimensional nature of AI literacy and its distinct factor structure provide evidence that effective human-AI collaboration requires a combination of general and specialized abilities. These findings contribute to theoretical frameworks of human-AI collaboration while offering practical guidance for developing targeted educational interventions to promote equitable access to the benefits of generative AI technologies.
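
As a rough illustration of how a dominant latent factor and its explained variance might be extracted from item-level AI-task scores, the sketch below runs a principal-component decomposition on simulated data. It is not the authors' analysis: the number of items, the loadings, and the use of PCA rather than their factor-analytic procedure are assumptions.

```python
# Rough sketch (not the authors' analysis) of extracting a dominant latent
# factor from item-level scores on AI interaction tasks and reporting the
# share of variance it explains. Data are simulated for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Simulate 85 participants x 12 task items driven partly by one common ability.
n, items = 85, 12
ability = rng.normal(size=(n, 1))                  # latent "A-factor"-like trait
loadings = rng.uniform(0.4, 0.8, size=(1, items))  # how strongly items load on it
scores = ability @ loadings + rng.normal(scale=0.7, size=(n, items))

Z = StandardScaler().fit_transform(scores)
pca = PCA().fit(Z)

print("Variance explained by first component:",
      round(100 * pca.explained_variance_ratio_[0], 2), "%")
print("Item loadings on the first component:",
      np.round(pca.components_[0], 2))
```

The same logic (one strong first component with broad positive loadings) is what a claim like "44.16% of variance explained by a single factor" summarizes.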


Pareidolic Illusions of Meaning: ChatGPT, Pseudolaw and the Triumph of Form over Substance

arXiv.org Artificial Intelligence

The early 2020s have seen the rise of two strange and potentially quite impactful social phenomena: pseudolaw, where users rely upon pseudolegal arguments that mimic the form and ritual of legal argumentation but fundamentally distort the content of law, and generative AI/LLMs, which use probabilistic calculations to create outputs that look like human-generated text. This article argues that juxtaposing the two phenomena reveals two fundamental traits they share: both elevate form and appearance over substance and content, and users of both routinely mistake the form for the substance. Drawing upon legal theory, computer science, linguistics and cognitive psychology, the article argues that both phenomena rely upon creating illusions of meaning that users mistake for the underlying primary phenomenon. I then explore four implications of this conception of both phenomena. Firstly, both rely on the human tendency of conceptual pareidolia, resulting in the erroneous perception of meaningful linguistic legal patterns from nebulous inputs. Secondly, both rely upon the confidence heuristic, the human cognitive bias for treating confidence as a proxy for competence. Thirdly, both succeed when the primary concern is with the form of the output and not its content. Fourthly, both rely heavily upon the magical thinking of users and the desire for the promise of the approach to be real. The article argues that the legal context helps to reveal a solution for the problems caused by both phenomena, as it is only where users possess sufficient legal and technological literacy that it becomes possible to reveal to them the illusory nature of the phenomena.


CorpusStudio: Surfacing Emergent Patterns in a Corpus of Prior Work while Writing

arXiv.org Artificial Intelligence

Many communities, including the scientific community, develop implicit writing norms. Understanding them is crucial for effective communication with that community. Writers gradually develop an implicit understanding of norms by reading papers and receiving feedback on their writing. However, it is difficult to both externalize this knowledge and apply it to one's own writing. We propose two new writing support concepts that reify document and sentence-level patterns in a given text corpus: (1) an ordered distribution over section titles and (2) given the user's draft and cursor location, many retrieved contextually relevant sentences. Recurring words in the latter are algorithmically highlighted to help users see any emergent norms. Study results (N=16) show that participants revised the structure and content using these concepts, gaining confidence in aligning with or breaking norms after reviewing many examples. These results demonstrate the value of reifying distributions over other authors' writing choices during the writing process.
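
A rough sketch of the two reified patterns follows. It is not the CorpusStudio implementation: the corpus format, the TF-IDF retrieval, and the word-counting step are assumptions standing in for whatever the system actually uses.

```python
# Rough sketch (not the CorpusStudio implementation) of the two concepts:
# (1) an ordered distribution over section titles in a corpus, and
# (2) retrieving sentences similar to the sentence at the cursor and
#     counting recurring words to surface emergent norms.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed corpus format: each paper is a dict of section title -> list of sentences.
corpus = [
    {"Introduction": ["We study writing norms.", "Prior work is limited."],
     "Method": ["We ran a study with 16 participants."]},
    {"Introduction": ["Writing norms are implicit.", "We propose a new tool."],
     "Results": ["Participants revised their drafts."]},
]

# (1) Ordered distribution over section titles.
title_counts = Counter(title for paper in corpus for title in paper)
total = sum(title_counts.values())
for title, count in title_counts.most_common():
    print(f"{title}: {count / total:.0%}")

# (2) Retrieve the corpus sentences most similar to the user's draft sentence,
#     then count recurring words across the retrieved sentences.
sentences = [s for paper in corpus for sents in paper.values() for s in sents]
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(sentences)

draft_sentence = "We investigate implicit writing norms."
sims = cosine_similarity(vectorizer.transform([draft_sentence]), matrix)[0]
top = [sentences[i] for i in sims.argsort()[::-1][:3]]
recurring = Counter(word for s in top for word in s.lower().rstrip(".").split())
print(top)
print(recurring.most_common(5))
```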


Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application

arXiv.org Artificial Intelligence

The MIR serves as a critical selection mechanism for medical graduates entering specialized training in Spain. This study examines the ability of generative AI models to meet the challenges presented by the MIR, with emphasis on clinical reasoning, image interpretation, and epidemiological calculations. The research evaluates LLM performance in complex clinical scenarios and explores the extent to which LLMs demonstrate medical reasoning beyond mere information recall. Findings: the results reveal key insights into the performance of 22 LLMs on the MIR 2024 and 2025 exams. The exam features 210 multiple-choice questions covering diverse medical domains and incorporates case-based scenarios, image interpretation (25 questions), and laboratory data analysis.


General Scales Unlock AI Evaluation with Explanatory and Predictive Power

arXiv.org Artificial Intelligence

Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance for new task instances, in- and out-of-distribution. Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate. Illustrated for 15 large language models and 63 tasks, high explanatory power is unleashed from inspecting the demand and ability profiles, bringing insights on the sensitivity and specificity exhibited by different benchmarks, and how knowledge, metacognition and reasoning are affected by model size, chain-of-thought and distillation. Surprisingly, high predictive power at the instance level becomes possible using these demand levels, providing superior estimates over black-box baseline predictors based on embeddings or finetuning, especially in out-of-distribution settings (new tasks and new benchmarks). The scales, rubrics, battery, techniques and results presented here represent a major step for AI evaluation, underpinning the reliable deployment of AI in the years ahead. (Collaborative platform: https://kinds-of-intelligence-cfi.github.io/ADELE.)
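
The core prediction idea (instance success driven by the gap between an instance's demand level on a scale and a system's ability on that scale) can be illustrated with a toy logistic, Rasch-style model. The sketch below is not the ADELE methodology: the demand levels, the single scale, and the fitted model are simulated assumptions meant only to show how demand levels yield instance-level predictions.

```python
# Toy illustration (not the ADELE methodology): predict whether a model
# succeeds on a task instance from the instance's demand level on a scale,
# via a logistic (Rasch-style) link. Demands and abilities are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Simulate instances with demand levels 1..5 and a system with ability ~3.2:
# success becomes less likely as demand exceeds ability.
n = 500
demand = rng.integers(1, 6, size=n).astype(float)
ability = 3.2
p_success = 1 / (1 + np.exp(-(ability - demand)))
success = rng.random(n) < p_success

# Fit success ~ demand; the fitted curve recovers an ability-like threshold.
clf = LogisticRegression().fit(demand.reshape(-1, 1), success)
estimated_ability = -clf.intercept_[0] / clf.coef_[0, 0]
print(f"Estimated ability (demand level at 50% success): {estimated_ability:.2f}")

# Predict success probability for new, harder instances (out-of-distribution).
new_demand = np.array([[4.5], [5.0], [5.5]])
print(clf.predict_proba(new_demand)[:, 1].round(2))
```

Extending this picture to 18 rubric-based scales and many benchmarks is what gives the demand-and-ability profiles their explanatory and predictive power in the paper.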