AITopics

2503.0224

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)

Genre:

Instructional Material (0.86)
Research Report > New Finding (0.67)

Industry:

Education (0.93)
Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zhu, Xingyu, Panigrahi, Abhishek, Arora, Sanjeev

On the Power of Context-Enhanced Learning in LLMs

arXiv.org Artificial Intelligence, Mar-3-2025

We formalize a new concept for LLMs, context-enhanced learning. It involves standard gradient-based learning on text except that the context is enhanced with additional data on which no auto-regressive gradients are computed. This setting is a gradient-based analog of usual in-context learning (ICL) and appears in some recent works. Using a multi-step reasoning task, we prove in a simplified setting that context-enhanced learning can be exponentially more sample-efficient than standard learning when the model is capable of ICL. At a mechanistic level, we find that the benefit of context-enhancement arises from a more accurate gradient learning signal. We also experimentally demonstrate that it appears hard to detect or recover learning materials that were used in the context during training. This may have implications for data security as well as copyright.
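One way to read the setting described above: the enhancing context stays in the model's input, but only the remaining tokens contribute to the training loss. The sketch below illustrates that reading only. `masked_nll` and its arguments are hypothetical names, the per-token log-probabilities are assumed to come from some model, and none of this is the authors' code.

```python
from typing import Sequence


def masked_nll(token_log_probs: Sequence[float], is_context: Sequence[bool]) -> float:
    """Mean negative log-likelihood over tokens NOT marked as context.

    token_log_probs[i] is the (assumed) model log-probability of token i.
    is_context[i] is True for tokens in the enhancing context, which are
    excluded from the loss and so provide no direct training target.
    """
    if len(token_log_probs) != len(is_context):
        raise ValueError("inputs must have the same length")
    kept = [-lp for lp, ctx in zip(token_log_probs, is_context) if not ctx]
    if not kept:
        raise ValueError("no non-context tokens to train on")
    return sum(kept) / len(kept)
```

In a deep-learning framework the same effect is usually obtained by masking the loss at context positions, so the context still shapes predictions through the forward pass.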

context-enhanced learning, correlation, phrasebook

2503.01821

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (1.00)
Workflow (0.92)
Instructional Material > Course Syllabus & Notes (0.47)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Carlini, Nicholas, Rando, Javier, Debenedetti, Edoardo, Nasr, Milad, Tramèr, Florian

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

arXiv.org Artificial Intelligence, Mar-3-2025

We introduce AutoAdvExBench, a benchmark to evaluate if large language models (LLMs) can autonomously exploit defenses to adversarial examples. Unlike existing security benchmarks that often serve as proxies for real-world tasks, AutoAdvExBench directly measures LLMs' success on tasks regularly performed by machine learning security experts. This approach offers a significant advantage: if an LLM could solve the challenges presented in AutoAdvExBench, it would immediately present practical utility for adversarial machine learning researchers. We then design a strong agent that is capable of breaking 75% of CTF-like ("homework exercise") adversarial example defenses. However, we show that this agent is only able to succeed on 13% of the real-world defenses in our benchmark, indicating the large gap in difficulty between attacking "real" code and CTF-like code. In contrast, a stronger LLM that can attack 21% of real defenses only succeeds on 54% of CTF-like defenses. We make this benchmark available at https://github.com/ethz-spylab/AutoAdvExBench.

adversarial example defense, arxiv preprint arxiv, benchmark

2503.01811

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material (0.87)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Marquez-Carpintero, Luis, Suescun-Ferrandiz, Sergio, Álvarez, Carolina Lorenzo, Fernandez-Herrero, Jorge, Viejo, Diego, Roig-Vila, Rosabel, Cazorla, Miguel

DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild

arXiv.org Artificial Intelligence, Mar-2-2025

In this paper, a novel dataset is introduced, designed to assess student attention within in-person classroom settings. This dataset encompasses RGB camera data, featuring multiple cameras per student to capture both posture and facial expressions, in addition to smartwatch sensor data for each individual. This dataset allows machine learning algorithms to be trained to predict attention and correlate it with emotion. A comprehensive suite of attention and emotion labels for each student is provided, generated through self-reporting as well as evaluations by four different experts. Our dataset uniquely combines facial and environmental camera data with smartwatch metrics and includes ethnicities underrepresented in similar datasets, all within in-the-wild, in-person settings, making it the most comprehensive dataset of its kind currently available. The dataset presented offers an extensive and diverse collection of data pertaining to student interactions across different educational contexts, augmented with additional metadata from other tools. This initiative addresses existing deficiencies by offering a valuable resource for the analysis of student attention and emotion in face-to-face lessons.

dataset, experiment, student

2502.20209

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
Europe > Spain > Valencian Community > Alicante Province > Alicante (0.04)

Genre:

Research Report > Experimental Study (1.00)
Instructional Material (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.67)

Mohan, Jayanth, Chowdhury, Jishnu Ray, Malik, Tomas, Caragea, Cornelia

Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models

Keyphrases are the essential topical phrases that summarize a document. Keyphrase generation is a long-standing NLP task for automatically generating keyphrases for a given document. While the task has been comprehensively explored in the past via various models, only a few works perform some preliminary analysis of Large Language Models (LLMs) for the task. Given the impact of LLMs in the field of NLP, it is important to conduct a more thorough examination of their potential for keyphrase generation. In this paper, we attempt to meet this demand with our research agenda. Specifically, we focus on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o for this task. We systematically investigate the effect of providing task-relevant specialized instructions in the prompt. Moreover, we design task-specific counterparts to self-consistency-style strategies for LLMs and show significant benefits from our proposals over the baselines.

computational linguistic, keyphrase, keyphrase generation

2503.00597

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Russia (0.04)

Genre:

Research Report (1.00)
Instructional Material (0.84)

Industry:

Automobiles & Trucks (0.93)
Consumer Products & Services > Travel (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Heim, Eric, Wright, Oren, Shriver, David

A Guide to Failure in Machine Learning: Reliability and Robustness from Foundations to Practice

One of the main barriers to adoption of Machine Learning (ML) is that ML models can fail unexpectedly. In this work, we aim to provide practitioners with a guide to better understand why ML models fail and equip them with techniques they can use to reason about failure. Specifically, we discuss failure as either being caused by lack of reliability or lack of robustness. Differentiating the causes of failure in this way allows us to formally define why models fail from first principles and tie these definitions to engineering concepts and real-world deployment settings. Throughout the document we provide 1) a summary of important theoretic concepts in reliability and robustness, 2) a sampling of current techniques that practitioners can utilize to reason about ML model reliability and robustness, and 3) examples that show how these concepts and techniques can apply to real-world settings.

ml model, probability, public release and unlimited distribution

2503.00563

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Vietnam > Hải Dương Province > Hải Dương (0.04)
South America > Brazil > Maranhão (0.04)

Genre:

Overview (0.92)
Research Report (0.81)
Instructional Material (0.67)

Industry:

Health & Medicine (1.00)
Education (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Tutorial Proposal: Speculative Decoding for Efficient LLM Inference

Xia, Heming, Du, Cunxiao, Li, Yongqi, Liu, Qian, Li, Wenjie

This tutorial presents a comprehensive introduction to Speculative Decoding (SD), an advanced technique for LLM inference acceleration that has garnered significant research interest in recent years. SD is introduced as an innovative decoding paradigm to mitigate the high inference latency stemming from autoregressive decoding in LLMs. At each decoding step, SD efficiently drafts several future tokens and then verifies them in parallel. This approach, unlike traditional autoregressive decoding, facilitates the simultaneous decoding of multiple tokens per step, thereby achieving promising 2x-4x speedups in LLM inference while maintaining original distributions. This tutorial delves into the latest techniques in SD, including draft model architectures and verification strategies. Additionally, it explores the acceleration potential and future research directions in this promising field. We aim for this tutorial to elucidate the current research landscape and offer insights for researchers interested in Speculative Decoding, ultimately contributing to more efficient LLM inference.
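The draft-then-verify loop described above can be sketched in a few lines. This is a simplified greedy variant for illustration only: `speculative_decode`, `greedy_decode`, and the two next-token callables are hypothetical stand-ins for real models, and the verification here calls the target once per position, whereas an actual system would score all drafted positions in a single parallel forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a prefix to its greedy next token


def greedy_decode(target_next: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Plain autoregressive decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> List[Token]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # 1) Draft: the cheap model proposes k future tokens autoregressively.
        drafted: List[Token] = []
        for _ in range(k):
            drafted.append(draft_next(out + drafted))
        # 2) Verify: the target's own choice at each drafted position, plus one
        #    extra position (computed sequentially here for simplicity).
        verified = [target_next(out + drafted[:i]) for i in range(k + 1)]
        # 3) Accept the longest prefix on which draft and target agree, then
        #    append the target's token at the first disagreement (or the extra
        #    token if every drafted token was accepted).
        n_ok = 0
        while n_ok < k and drafted[n_ok] == verified[n_ok]:
            n_ok += 1
        out.extend(drafted[:n_ok])
        out.append(verified[n_ok])
    return out[:limit]
```

In this greedy setting the output matches what the target model would produce on its own, and every iteration emits at least one token. Sampling-based variants replace the exact-match check with a rejection step so that the target's output distribution is preserved.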

computational linguistic, inference, speculative decoding

2503.00491

Country:

Asia > China > Hong Kong (0.06)
Asia > Singapore (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Wen, Xueru, Lou, Jie, Li, Zichao, Lu, Yaojie, Yu, Xing, Ji, Yuqiu, Xu, Guohai, Lin, Hongyu, He, Ben, Han, Xianpei, Sun, Le, Zhang, Debing

Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. However, most RM research is centered on English and relies heavily on synthetic resources, which leads to limited and less reliable datasets and benchmarks for Chinese. To address this gap, we introduce CheemsBench, a fully human-annotated RM evaluation benchmark within Chinese contexts, and CheemsPreference, a large-scale and diverse preference dataset annotated through human-machine collaboration to support Chinese RM training. We systematically evaluate open-source discriminative and generative RMs on CheemsBench and observe significant limitations in their ability to capture human preferences in Chinese scenarios. Additionally, based on CheemsPreference, we construct an RM that achieves state-of-the-art performance on CheemsBench, demonstrating the necessity of human supervision in RM training. Our findings reveal that scaled AI-generated data struggles to fully capture human preferences, emphasizing the importance of high-quality human supervision in RM development.

arxiv, preprint, zhang

2502.17173

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Beijing > Beijing (0.04)
Europe > Middle East > Malta > Southern Region > Southern Harbour District > Luqa (0.04)

Genre:

Research Report > New Finding (0.48)
Instructional Material > Training Manual (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

arXiv.org Artificial Intelligence, Feb-28-2025

MedSimAI: Simulation and Formative Feedback Generation to Enhance Deliberate Practice in Medical Education

Hicke, Yann, Geathers, Jadon, Rajashekar, Niroop, Chan, Colleen, Jack, Anyanate Gwendolyne, Sewell, Justin, Preston, Mackenzi, Cornes, Susannah, Shung, Dennis, Kizilcec, Rene

Medical education faces challenges in scalability, accessibility, and consistency, particularly in clinical skills training for physician-patient communication. Traditional simulation-based learning, while effective, is resource-intensive, difficult to schedule, and often highly variable in feedback quality. Through a collaboration between AI, learning science, and medical education experts, we co-developed MedSimAI, an AI-powered simulation platform that enables deliberate practice, self-regulated learning (SRL), and automated assessment through interactive patient encounters. Leveraging large language models (LLMs), MedSimAI generates realistic clinical interactions and provides immediate, structured feedback using established medical evaluation frameworks such as the Master Interview Rating Scale (MIRS). In a pilot study with 104 first-year medical students, we examined engagement, conversation patterns, and user perceptions. Students found MedSimAI beneficial for repeated, realistic patient-history practice. Conversation analysis revealed that certain higher-order skills were often overlooked, though students generally performed systematic histories and empathic listening. By integrating unlimited practice opportunities, real-time AI assessment, and SRL principles, MedSimAI addresses key limitations of traditional simulation-based training, making high-quality clinical education more accessible and scalable.

medical education, medsimai, student

2503.05793

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre:

Instructional Material (1.00)
Questionnaire & Opinion Survey (0.93)
Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Setting > Higher Education (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Feb-28-2025

PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos

Wei, Kangda, Zhou, Zhengyu, Wang, Bingqing, Araki, Jun, Lange, Lukas, Huang, Ruihong, Feng, Zhe

In recent years, online lecture videos have become an increasingly popular resource for acquiring new knowledge. Systems capable of effectively understanding/indexing lecture videos are thus highly desirable, enabling downstream tasks like question answering to help users efficiently locate specific information within videos. This work proposes PreMind, a novel multi-agent multimodal framework that leverages various large models for advanced understanding/indexing of presentation-style videos. PreMind first segments videos into slide-presentation segments using a Vision-Language Model (VLM) to enhance modern shot-detection techniques. Each segment is then analyzed to generate multimodal indexes through three key steps: (1) extracting slide visual content, (2) transcribing speech narratives, and (3) consolidating these visual and speech contents into an integrated understanding. Three innovative mechanisms are also proposed to improve performance: leveraging prior lecture knowledge to refine visual understanding, detecting/correcting speech transcription errors using a VLM, and utilizing a critic agent for dynamic iterative self-reflection in vision analysis. Compared to traditional video indexing methods, PreMind captures rich, reliable multimodal information, allowing users to search for details like abbreviations shown only on slides. Systematic evaluations on the public LPM dataset and an internal enterprise dataset are conducted to validate PreMind's effectiveness, supported by detailed analyses.

evaluation, information, video

2503.00162

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.48)
Instructional Material > Online (0.34)

Industry: Education > Educational Setting > Online (0.87)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)