AITopics

2503.13709

Country:

North America > United States > New York > New York County > New York City (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(12 more...)

Genre:

Overview (0.93)
Research Report (0.83)
Instructional Material > Course Syllabus & Notes (0.41)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Banking & Finance > Trading (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.46)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Time Series Analysis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Mar-17-2025

Rendering Transparency to Ranking in Educational Assessment via Bayesian Comparative Judgement

Gray, Andy, Rahat, Alma, Lindsay, Stephen, Pearson, Jen, Crick, Tom

Ensuring transparency in educational assessment is increasingly critical, particularly post-pandemic, as demand grows for fairer and more reliable evaluation methods. Comparative Judgement (CJ) offers a promising alternative to traditional assessments, yet concerns remain about its perceived opacity. This paper examines how Bayesian Comparative Judgement (BCJ) enhances transparency by integrating prior information into the judgement process, providing a structured, data-driven approach that improves interpretability and accountability. BCJ assigns probabilities to judgement outcomes, offering quantifiable measures of uncertainty and deeper insights into decision confidence. By systematically tracking how prior data and successive judgements inform final rankings, BCJ clarifies the assessment process and helps identify assessor disagreements. Multi-criteria BCJ extends this by evaluating multiple learning outcomes (LOs) independently, preserving the richness of CJ while producing transparent, granular rankings aligned with specific assessment goals. It also enables a holistic ranking derived from individual LOs, ensuring comprehensive evaluations without compromising detailed feedback. Using a real higher education dataset with professional markers in the UK, we demonstrate BCJ's quantitative rigour and ability to clarify ranking rationales. Through qualitative analysis and discussions with experienced CJ practitioners, we explore its effectiveness in contexts where transparency is crucial, such as high-stakes national assessments. We highlight the benefits and limitations of BCJ, offering insights into its real-world application across various educational settings.
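A minimal sketch of the Bayesian idea behind pairwise comparative judgement follows. It assumes that each pair of scripts has its own Beta prior over the probability that one beats the other, and that each judgement is a Bernoulli outcome that updates that prior. The function names `bcj_rank` and `holistic_scores`, the uniform Beta(1, 1) prior, and ranking by summed expected wins are illustrative assumptions, not the authors' implementation. The posterior variance is one way to express the per-pair uncertainty the abstract mentions. Summing per-learning-outcome scores with equal weights is one simple, assumed way to form a holistic ranking.

```python
from itertools import combinations


def bcj_rank(items, judgements, prior=(1.0, 1.0)):
    """Rank items from pairwise judgements with one Beta-Bernoulli model per pair.

    items: sortable item ids (e.g. script names).
    judgements: (winner, loser) tuples from assessors.
    prior: (alpha, beta) of the Beta prior shared by every pair; Beta(1, 1) is uniform.
    Returns (ranking, scores, stats), where stats[(a, b)] holds the posterior
    (mean, variance) of P(a beats b) for each sorted pair a < b.
    """
    wins = {pair: [0, 0] for pair in combinations(sorted(items), 2)}
    for winner, loser in judgements:
        a, b = sorted((winner, loser))
        wins[(a, b)][0 if winner == a else 1] += 1  # slot 0 counts wins for a, slot 1 for b

    alpha0, beta0 = prior
    scores = {item: 0.0 for item in items}
    stats = {}
    for (a, b), (wins_a, wins_b) in wins.items():
        # Conjugate update: the posterior is Beta(alpha0 + wins_a, beta0 + wins_b).
        alpha, beta = alpha0 + wins_a, beta0 + wins_b
        mean = alpha / (alpha + beta)
        var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
        stats[(a, b)] = (mean, var)
        # Expected wins: each pair contributes mean to a and 1 - mean to b.
        scores[a] += mean
        scores[b] += 1.0 - mean
    ranking = sorted(items, key=lambda item: scores[item], reverse=True)
    return ranking, scores, stats


def holistic_scores(per_lo_scores):
    """Combine per-learning-outcome score dicts by equal-weight summation (an assumption)."""
    total = {}
    for scores in per_lo_scores:
        for item, score in scores.items():
            total[item] = total.get(item, 0.0) + score
    return total
```

Pairs that no assessor has compared keep the prior mean of 0.5 and the prior's high variance. This is how a Bayesian treatment can flag where more judgements are needed.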

assessment, machine learning, natural language, (15 more...)

2503.15549

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
Europe > United Kingdom > England > Somerset > Bath (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (1.00)
Personal > Interview (0.67)

Industry:

Education > Educational Setting (1.00)
Education > Assessment & Standards (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial Intelligence Mar-16-2025

AI Agents: Evolution, Architecture, and Real-World Applications

Krishnan, Naveen

Artificial Intelligence (AI) has evolved dramatically over the past decade, transitioning from specialized systems designed for narrow tasks to increasingly sophisticated architectures capable of autonomous operation across diverse domains. Among these advancements, AI agents represent a particularly significant development, embodying a paradigm shift in how intelligent systems interact with their environments, make decisions, and achieve complex goals. Unlike traditional AI systems that execute predefined algorithms within constraints, AI agents possess the capacity to autonomously perceive, reason, and act, often adapting their behavior based on environmental feedback and accumulated experience. The concept of an AI agent refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and utilizing available tools. These agents can encompass a wide range of functionalities beyond natural language processing, including decision making, problem solving, interacting with external environments, and executing actions. As Kapoor et al. (2024) note in their analysis of agent benchmarks, the development of AI agents represents an exciting new research direction with significant implications for real-world applications across numerous industries. The evolution of AI agents has been accelerated by recent breakthroughs in large language models (LLMs), which have provided a foundation for more sophisticated reasoning capabilities. Modern AI agents leverage these advanced language models as core components, augmenting them with specialized modules for memory, planning, tool use, and environmental interaction. This integration enables agents to perform complex tasks that would be challenging or impossible for traditional AI systems, from reconciling financial statements to providing step-by-step instructions for field technicians based on contextual understanding of product information.
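The perceive-reason-act cycle described above can be sketched as a small loop. A policy, standing in for an LLM, chooses a tool. The tool runs, and its observation is written to memory for the next step. Everything here is a hypothetical illustration, including `Agent`, `rule_policy`, the `calculator` tool, and the `"finish"` action. None of it is an API from the paper or from any agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A policy maps (goal, memory) to (tool_name, tool_input); "finish" ends the run.
Policy = Callable[[str, List[str]], Tuple[str, str]]


@dataclass
class Agent:
    """Minimal agent: a policy picks an action, a tool executes it, the result is remembered."""
    policy: Policy
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 5) -> Optional[str]:
        for _ in range(max_steps):
            tool_name, tool_input = self.policy(goal, self.memory)  # reason / plan
            if tool_name == "finish":
                return tool_input
            if tool_name not in self.tools:
                self.memory.append(f"error: unknown tool {tool_name}")
                continue
            observation = self.tools[tool_name](tool_input)  # act
            self.memory.append(f"{tool_name}({tool_input}) -> {observation}")  # perceive / remember
        return None  # step budget exhausted without finishing


def calculator(expression: str) -> str:
    """Hypothetical tool: adds '+'-separated integers."""
    return str(sum(int(part) for part in expression.split("+")))


def rule_policy(goal: str, memory: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: call the calculator once, then finish with its result."""
    if not memory:
        return "calculator", "2+3"
    return "finish", memory[-1].split("-> ")[-1]
```

Usage: `Agent(policy=rule_policy, tools={"calculator": calculator}).run("add 2 and 3")` returns `"5"`. In a real agent, the LLM would replace `rule_policy` and the memory strings would become its context.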

artificial intelligence, machine learning, natural language, (21 more...)

2503.12687

Genre: Instructional Material (1.00)

Industry:

Transportation (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Augmented Adversarial Trigger Learning

Wang, Zhe, Qi, Yanjun

Gradient optimization-based adversarial attack methods automate the learning of adversarial triggers to generate jailbreak prompts or leak system prompts. In this work, we take a closer look at the optimization objective of adversarial trigger learning and propose ATLA: Adversarial Trigger Learning with Augmented objectives. ATLA refines the negative log-likelihood loss used in previous studies into a weighted loss formulation that steers the learned adversarial triggers toward optimizing response format tokens. This enables ATLA to learn an adversarial trigger from just one query-response pair, and the learned trigger generalizes well to other similar queries. We further design a variation that augments trigger optimization with an auxiliary loss that suppresses evasive responses. We showcase how to use ATLA to learn adversarial suffixes that jailbreak LLMs and to extract hidden system prompts. Empirically, we demonstrate that ATLA consistently outperforms current state-of-the-art techniques, achieving a nearly 100% attack success rate while requiring 80% fewer queries. ATLA-learned jailbreak suffixes generalize well to unseen queries and transfer well to new LLMs.
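The sketch below illustrates a weighted trigger-learning objective. It computes a weighted negative log-likelihood over target-response tokens, giving extra weight to tokens assumed to be "response format" tokens. It then adds an auxiliary term that penalizes the likelihood of an evasive reply. The weighting heuristic (`format_weight`), the unlikelihood-style auxiliary term, and the coefficient `lam` are assumptions, and the exact ATLA loss may differ. The per-token log-probabilities would come from the target model; here they are plain floats.

```python
import math
from typing import Sequence


def weighted_nll(token_logprobs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted negative log-likelihood of the target response.

    token_logprobs[i] is assumed to be log p(y_i | query + trigger, y_<i).
    """
    if len(token_logprobs) != len(weights) or not weights:
        raise ValueError("need one positive weight per target token")
    total = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, token_logprobs)) / total


def format_weights(tokens, format_tokens, format_weight=2.0, base_weight=1.0):
    """Upweight tokens treated as response-format tokens (an assumed heuristic)."""
    return [format_weight if tok in format_tokens else base_weight for tok in tokens]


def evasion_penalty(evasive_logprobs: Sequence[float]) -> float:
    """Unlikelihood-style term -mean(log(1 - p)): small when evasive tokens are unlikely."""
    eps = 1e-12  # guards log(0) when p is numerically 1
    return -sum(math.log(max(1.0 - math.exp(lp), eps)) for lp in evasive_logprobs) / len(evasive_logprobs)


def augmented_loss(token_logprobs, weights, evasive_logprobs, lam=0.5):
    """Weighted NLL plus a lam-scaled penalty on the evasive response."""
    return weighted_nll(token_logprobs, weights) + lam * evasion_penalty(evasive_logprobs)
```

Raising `format_weight` shifts the optimization pressure onto the opening format tokens. The abstract credits this shift with letting a trigger learned from one query-response pair generalize.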

large language model, machine learning, natural language, (17 more...)

2503.12339

Country: North America > United States > Virginia (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints

Zhou, Yuhao, Tian, Yuxin, Lv, Jindi, Shi, Mingjia, Li, Yuanxi, Ye, Qing, Zhang, Shuhao, Lv, Jiancheng

In the realm of high-frequency data streams, achieving real-time learning within varying memory constraints is paramount. This paper presents Ferret, a comprehensive framework designed to enhance the online accuracy of Online Continual Learning (OCL) algorithms while dynamically adapting to varying memory budgets. Ferret combines a fine-grained pipeline parallelism strategy with an iterative gradient compensation algorithm, handling high-frequency data with minimal latency and counteracting the problem of stale gradients in parallel training. To adapt to varying memory budgets, its automated model partitioning and pipeline planning optimize performance under whatever memory limit applies. Extensive experiments across 20 benchmarks and 5 integrated OCL algorithms show Ferret's efficiency: it achieves up to 3.7× lower memory overhead than competing methods at the same online accuracy. Furthermore, Ferret consistently outperforms these methods across diverse memory budgets, underscoring its adaptability. These findings position Ferret as a leading solution for efficient and adaptive OCL in real-time environments.
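The abstract does not specify how Ferret's automated model partitioning works. As a simplified stand-in, the sketch below greedily packs consecutive layers into pipeline stages so that each stage fits a per-stage memory budget. The function `partition_layers` and the per-layer memory figures are hypothetical. A real planner would also weigh throughput and gradient staleness, not memory alone.

```python
from typing import List, Sequence


def partition_layers(layer_mem: Sequence[float], budget_per_stage: float) -> List[List[int]]:
    """Greedily group consecutive layer indices into stages whose memory fits the budget."""
    stages: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for idx, mem in enumerate(layer_mem):
        if mem > budget_per_stage:
            raise ValueError(f"layer {idx} needs {mem}, exceeding the per-stage budget {budget_per_stage}")
        if used + mem > budget_per_stage:
            # Close the current stage and start a new one with this layer.
            stages.append(current)
            current, used = [], 0.0
        current.append(idx)
        used += mem
    if current:
        stages.append(current)
    return stages
```

With per-layer memory `[2, 3, 1, 4, 2]`, a budget of 5 yields three stages, `[[0, 1], [2, 3], [4]]`. Tightening the budget to 4 yields four, `[[0], [1, 2], [3], [4]]`. A memory-aware pipeline planner has to manage this kind of trade-off.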

artificial intelligence, machine learning, real time system, (13 more...)

2503.12053

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
(6 more...)

Genre:

Instructional Material > Online (0.61)
Research Report > Promising Solution (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Architecture > Real Time Systems (0.68)
(2 more...)

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Zhou, Lexin, Pacchiardi, Lorenzo, Martínez-Plumed, Fernando, Collins, Katherine M., Moros-Daval, Yael, Zhang, Seraphina, Zhao, Qinlin, Huang, Yitian, Sun, Luning, Prunty, Jonathan E., Li, Zongqian, Sánchez-García, Pablo, Chen, Kexin Jiang, Casares, Pablo A. M., Zu, Jiyun, Burden, John, Mehrbakhsh, Behzad, Stillwell, David, Cebrian, Manuel, Wang, Jindong, Henderson, Peter, Wu, Sherry Tongshuang, Kyllonen, Patrick C., Cheke, Lucy, Xie, Xing, Hernández-Orallo, José

Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance for new task instances, in- and out-of-distribution. Our fully automated methodology builds on 18 newly crafted rubrics that place instance demands on general scales that do not saturate. Applied to 15 large language models and 63 tasks, the demand and ability profiles offer high explanatory power. They yield insights into the sensitivity and specificity of different benchmarks, and into how knowledge, metacognition and reasoning are affected by model size, chain-of-thought and distillation. Surprisingly, these demand levels also enable high predictive power at the instance level, giving better estimates than black-box baseline predictors based on embeddings or finetuning, especially in out-of-distribution settings (new tasks and new benchmarks). The scales, rubrics, battery, techniques and results presented here represent a major step for AI evaluation, underpinning the reliable deployment of AI in the years ahead. (Collaborative platform: https://kinds-of-intelligence-cfi.github.io/ADELE.)
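The instance-level prediction idea can be illustrated with a deliberately simplified model. First, estimate a system's ability on each scale as the highest demand level at which its success rate stays at or above 0.5. Then predict success on a new instance only if every annotated demand is within the corresponding ability. The 0.5 threshold, the integer levels, and the functions `ability_from_results` and `predict_success` are assumptions for illustration. The paper's own ability profiles and predictors may be estimated differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ability_from_results(results: List[Tuple[int, bool]], threshold: float = 0.5) -> int:
    """Estimate ability on one scale from (demand_level, success) pairs.

    Scans observed levels upward and returns the last level whose success rate
    is >= threshold, stopping at the first level that falls below it.
    Returns -1 if even the lowest observed level falls below the threshold.
    """
    by_level = defaultdict(list)
    for level, ok in results:
        by_level[level].append(ok)
    ability = -1
    for level in sorted(by_level):
        rate = sum(by_level[level]) / len(by_level[level])
        if rate < threshold:
            break
        ability = level
    return ability


def predict_success(demands: Dict[str, int], abilities: Dict[str, int]) -> bool:
    """Predict success when every demand is within the system's ability on that scale."""
    return all(level <= abilities.get(scale, -1) for scale, level in demands.items())
```

For example, a model that succeeds on all level-0 and half of level-1 reasoning items, but on no level-2 ones, gets a reasoning ability of 1. An unseen instance demanding level 2 on that scale is then predicted to fail.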

data mining, large language model, machine learning, (17 more...)

2503.06378

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Austria > Vienna (0.13)
Europe > France (0.04)
(19 more...)

Genre:

Instructional Material (1.00)
Questionnaire & Opinion Survey (0.92)
Overview (0.92)
(2 more...)

Industry:

Leisure & Entertainment > Sports (1.00)
Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(12 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(6 more...)

The Status Quo and Future of AI-TPACK for Mathematics Teacher Education Students: A Case Study in Chinese Universities

Xie, Meijuan, Luo, Liling

As artificial intelligence (AI) technology becomes increasingly prevalent in the field of education, there is a growing need for mathematics teacher education students (MTES) to demonstrate proficiency in integrating AI with technological pedagogical content knowledge (AI-TPACK). To study this issue, we first devised a systematic AI-TPACK scale and administered it to 412 MTES from seven universities. Descriptive statistical analyses showed that the AI-TPACK of MTES in China is currently at a basic, preliminary stage. Second, we compared MTES across three grades on the six variables and found no discernible difference, suggesting that graduate study does not appear to promote the development of AI-TPACK competencies. Third, we proposed a new AI-TPACK structural equation model (AI-TPACK-SEM) to explore the impact of self-efficacy and teaching beliefs on AI-TPACK. Our findings indicate a positive correlation between self-efficacy and AI-TPACK. We also reach a conclusion that may run contrary to common perception: excessive teaching beliefs may impede the advancement of AI-TPACK. Overall, this paper revealed the current status of AI-TPACK for MTES in China for the first time, designed a dedicated SEM to study the effect of specific factors on AI-TPACK, and offered suggestions for future development.

ai-tpack, machine learning, natural language, (16 more...)

2503.13533

Country:

Asia > China (0.45)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.88)

Industry:

Education > Teacher Education (1.00)
Education > Educational Setting > Higher Education (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

arXiv.org Artificial Intelligence Mar-14-2025

LLM Agents for Education: Advances and Applications

Chu, Zhendong, Wang, Shen, Xie, Jian, Zhu, Tinghui, Yan, Yibo, Ye, Jinheng, Zhong, Aoxiao, Hu, Xuming, Liang, Jing, Yu, Philip S., Wen, Qingsong

Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. In this survey, we provide a systematic review of state-of-the-art research on LLM agents in education, categorizing them into two broad classes: (1) Pedagogical Agents, which focus on automating complex pedagogical tasks to support both teachers and students; and (2) Domain-Specific Educational Agents, which are tailored for specialized fields such as science education, language learning, and professional development. We comprehensively examine the technological advancements underlying these LLM agents, including key datasets, benchmarks, and algorithmic frameworks that drive their effectiveness. Furthermore, we discuss critical challenges such as privacy, bias and fairness concerns, hallucination mitigation, and integration with existing educational ecosystems. This survey aims to provide a comprehensive technological overview of LLM agents for education, fostering further research and collaboration to enhance their impact for the greater good of learners and educators alike.

large language model, machine learning, natural language, (16 more...)

2503.11733

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Middle East > Jordan (0.04)
(13 more...)

Genre:

Overview (1.00)
Instructional Material (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Mar-14-2025

Data Driven Decision Making with Time Series and Spatio-temporal Data

Yang, Bin, Liang, Yuxuan, Guo, Chenjuan, Jensen, Christian S.

Time series data captures properties that change over time. Such data occurs widely, ranging from the scientific and medical domains to the industrial and environmental domains. When the properties in time series exhibit spatial variations, we often call the data spatio-temporal. As part of the continued digitalization of processes throughout society, increasingly large volumes of time series and spatio-temporal data are available. In this tutorial, we focus on data-driven decision making with such data, e.g., enabling greener and more efficient transportation based on traffic time series forecasting. The tutorial adopts the holistic paradigm of "data-governance-analytics-decision." We first introduce the data foundation of time series and spatio-temporal data, which is often heterogeneous. Next, we discuss data governance methods that aim to improve data quality. We then cover data analytics, focusing on five desired characteristics: automation, robustness, generality, explainability, and resource efficiency. We finally cover data-driven decision making strategies and briefly discuss promising research directions. We hope that the tutorial will serve as a primary resource for researchers and practitioners who are interested in value creation from time series and spatio-temporal data.
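The sketch below makes one toy end-to-end pass through the "data-governance-analytics-decision" paradigm. Linear interpolation fills missing readings (governance), and a moving-average forecast predicts the next value (analytics). A threshold rule then turns the forecast into an action (decision). The functions, the `capacity` threshold, and the traffic-flow framing are hypothetical placeholders for the methods the tutorial surveys at each stage.

```python
from typing import List, Optional


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Governance: fill interior gaps (None) linearly; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            frac = (i - left) / (right - left)
            filled.append(series[left] + frac * (series[right] - series[left]))
    return filled


def moving_average_forecast(series: List[float], window: int = 3) -> float:
    """Analytics: forecast the next value as the mean of the last `window` values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def decide(forecast: float, capacity: float) -> str:
    """Decision: a toy rule that reroutes traffic when forecast flow exceeds capacity."""
    return "reroute" if forecast > capacity else "keep"
```

For a flow series `[10, None, 30, None]`, governance yields `[10.0, 20.0, 30.0, 30.0]`, and a 3-step moving average forecasts about 26.7. Against a hypothetical capacity of 25, the decision is `"reroute"`.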

guo, jensen, yang, (16 more...)

2503.08473

Country:

Europe > Denmark > North Jutland > Aalborg (0.05)
Asia > China > Guangdong Province > Guangzhou (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Instructional Material (0.69)

Industry:

Transportation (1.00)
Education (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

arXiv.org Artificial Intelligence Mar-14-2025

Conversational AI as a Coding Assistant: Understanding Programmers' Interactions with and Expectations from Large Language Models for Coding

Akhoroz, Mehmet, Yildirim, Caglar

Conversational AI interfaces powered by large language models (LLMs) are increasingly used as coding assistants. However, questions remain about how programmers interact with LLM-based conversational agents, the challenges they encounter, and the factors influencing adoption. This study investigates programmers' usage patterns, perceptions, and interaction strategies when engaging with LLM-driven coding assistants. In a survey, participants reported both benefits, such as efficiency and clear explanations, and limitations, including inaccuracies, lack of contextual awareness, and concerns about over-reliance. Notably, some programmers actively avoid LLMs due to a preference for independent learning, distrust of AI-generated code, and ethical considerations. Based on our findings, we propose design guidelines for improving conversational coding assistants, emphasizing context retention, transparency, multimodal support, and adaptability to user preferences. These insights contribute to a broader understanding of how LLM-based conversational agents can be effectively integrated into software development workflows while addressing adoption barriers and enhancing usability.

explanation, llm, participant, (16 more...)

2503.16508

Country: North America > United States > California > Ventura County > Thousand Oaks (0.04)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Instructional Material (1.00)

Industry:

Education (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)