Instructional Material
Redefining Elderly Care with Agentic AI: Challenges and Opportunities
Khalil, Ruhul Amin, Ahmad, Kashif, Ali, Hazrat
In this article, we explore the potential for transformation in elderly care through Agentic Artificial Intelligence (AI), powered by Large Language Models (LLMs). We discuss the proactive and autonomous decision-making facilitated by Agentic AI in elderly care. Personalized tracking of health, cognitive care, and environmental management, all aimed at enhancing independence and high-level living for older adults, represents important areas of application. With a potential for significant transformation of elderly care, Agentic AI also raises profound concerns about data privacy and security, decision independence, and access. We share key insights to emphasize the need for ethical safeguards, privacy protections, and transparent decision-making. Our goal in this article is to provide a balanced discussion of both the potential and the challenges associated with Agentic AI, and to provide insights into its responsible use in elderly care, to bring Agentic AI into harmony with the requirements and vulnerabilities specific to the elderly. Finally, we identify the priorities for the academic research communities, to achieve human-centered advancements and integration of Agentic AI in elderly care. T o the best of our knowledge, this is no existing study that reviews the role of Agentic AI in elderly care. Hence, we address the literature gap by analyzing the unique capabilities, applications, and limitations of LLM-based Agentic AI in elderly care.
Performance comparison of medical image classification systems using TensorFlow Keras, PyTorch, and JAX
Bećirović, Merjem, Kurtović, Amina, Smajlović, Nordin, Kapo, Medina, Akagić, Amila
Medical imaging plays a vital role in early disease diagnosis and monitoring. Specifically, blood microscopy offers valuable insights into blood cell morphology and the detection of hematological disorders. In recent years, deep learning-based automated classification systems have demonstrated high potential in enhancing the accuracy and efficiency of blood image analysis. However, a detailed performance analysis of specific deep learning frameworks appears to be lacking. This paper compares the performance of three popular deep learning frameworks, TensorFlow with Keras, PyTorch, and JAX, in classifying blood cell images from the publicly available BloodMNIST dataset. The study primarily focuses on inference time differences, but also classification performance for different image sizes. The results reveal variations in performance across frameworks, influenced by factors such as image resolution and framework-specific optimizations. Classification accuracy for JAX and PyTorch was comparable to current benchmarks, showcasing the efficiency of these frameworks for medical image classification.
Bridging MOOCs, Smart Teaching, and AI: A Decade of Evolution Toward a Unified Pedagogy
-- Over the past decade, higher education ha s evolved through three distinct paradigms: the emergence of Massive Open Online Courses (MOOCs), the integration of Smart Teaching technologies into classrooms, and the rise of AI - enhanced learning . Each paradigm is intended to address specific challenges in traditional education: MOOCs enable ubiquitous access to learning resources; Smart Teaching supports real - time interaction with data - driven insights; and generative AI offers personalized feedback and on - demand content generation. However, the se paradigms are often implemented in isol ation due to the ir disparate technological origins and policy - driven adoption . This paper examines the origins, strengths, and limitations of each paradigm, and advocates a unified pedagogical perspective that synthesizes their complementary affordances. W e propose a three - layer instructional framework that combines the scalability of MOOCs, the responsiveness of Smart Teaching, and the adaptivity of AI . To demonstrate its feasibility, we present a curriculum design for a project - based course . The findings highlight the framework's potential to enhance learner engagement, support instructors, and enable personalized yet scalable learning. T he landscape of higher education h as undergone multiple waves of digital transformation over the past decade .
Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification
Shen, Xing, Szeto, Justin, Li, Mingyang, Huang, Hengguan, Arbel, Tal
Multimodal large language models (MLLMs) have enormous potential to perform few-shot in-context learning in the context of medical image analysis. However, safe deployment of these models into real-world clinical practice requires an in-depth analysis of the accuracies of their predictions, and their associated calibration errors, particularly across different demographic subgroups. In this work, we present the first investigation into the calibration biases and demographic unfairness of MLLMs' predictions and confidence scores in few-shot in-context learning for medical image classification. We introduce CALIN, an inference-time calibration method designed to mitigate the associated biases. Specifically, CALIN estimates the amount of calibration needed, represented by calibration matrices, using a bi-level procedure: progressing from the population level to the subgroup level prior to inference. It then applies this estimation to calibrate the predicted confidence scores during inference. Experimental results on three medical imaging datasets: PA-PILA for fundus image classification, HAM10000 for skin cancer classification, and MIMIC-CXR for chest X-ray classification demonstrate CALIN's effectiveness at ensuring fair confidence calibration in its prediction, while improving its overall prediction accuracies and exhibiting minimum fairness-utility trade-off.
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Ghorbani, Asghar, Fattahi, Hanieh
Limited infrastructure, scarce educational resources, and unreliable internet access often hinder physics and photonics education in underdeveloped regions. These barriers create deep inequities in Science, Technology, Engineering, and Mathematics (STEM) education. This article explores how Small Language Models (SLMs)-compact, AI-powered tools that can run offline on low-power devices, offering a scalable solution. By acting as virtual tutors, enabling native-language instruction, and supporting interactive learning, SLMs can help address the shortage of trained educators and laboratory access. By narrowing the digital divide through targeted investment in AI technologies, SLMs present a scalable and inclusive solution to advance STEM education and foster scientific empowerment in marginalized communities.
Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design
Hedman, Marcel, Ivanova, Desi R., Guan, Cong, Rainforth, Tom
We develop a semi-amortized, policy-based, approach to Bayesian experimental design (BED) called Stepwise Deep Adaptive Design (Step-DAD). Like existing, fully amortized, policy-based BED approaches, Step-DAD trains a design policy upfront before the experiment. However, rather than keeping this policy fixed, Step-DAD periodically updates it as data is gathered, refining it to the particular experimental instance. This test-time adaptation improves both the flexibility and the robustness of the design strategy compared with existing approaches. Empirically, Step-DAD consistently demonstrates superior decision-making and robustness compared with current state-of-the-art BED methods.
CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education
Zhao, Jianing, Gao, Peng, Cao, Jiannong, Wen, Zhiyuan, Chen, Chen, Yin, Jianing, Yang, Ruosong, Yuan, Bo
Large Language Models (LLMs) have demonstrated considerable potential in improving coding education by providing support for code writing, explanation, and debugging. However, existing LLM-based approaches generally fail to assess students' abilities, design learning plans, provide personalized material aligned with individual learning goals, and enable interactive learning. Current work mostly uses single LLM agents, which limits their ability to understand complex code repositories and schedule step-by-step tutoring. Recent research has shown that multi-agent LLMs can collaborate to solve complicated problems in various domains like software engineering, but their potential in the field of education remains unexplored. In this work, we introduce CodeEdu, an innovative multi-agent collaborative platform that combines LLMs with tool use to provide proactive and personalized education in coding. Unlike static pipelines, CodeEdu dynamically allocates agents and tasks to meet student needs. Various agents in CodeEdu undertake certain functions specifically, including task planning, personalized material generation, real-time QA, step-by-step tutoring, code execution, debugging, and learning report generation, facilitated with extensive external tools to improve task efficiency. Automated evaluations reveal that CodeEdu substantially enhances students' coding performance.
VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation
Calzada, Paul E., Ibnat, Zahin, Rahman, Tanvir, Kandula, Kamal, Lu, Danyu, Saha, Sujan Kumar, Farahmandi, Farimah, Tehranipoor, Mark
Large Language Models (LLMs) are gaining popularity for hardware design automation, particularly through Register Transfer Level (RTL) code generation. In this work, we examine the current literature on RTL generation using LLMs and identify key requirements for training and fine-tuning datasets. We construct a robust Verilog dataset through an automated three-pronged process involving database (DB) creation and management with PostgreSQL, data collection from code hosting sites like OpenCores and GitHub, and data preprocessing to verify the codes' syntax, run logic synthesis, and extract relevant module metadata. We implement a scalable and efficient DB infrastructure to support analysis and detail our preprocessing pipeline to enforce high-quality data before DB insertion. The resulting dataset comprises 20,392 Verilog samples, 751 MB of Verilog code data, which is the largest high-quality Verilog dataset for LLM fine-tuning to our knowledge. We further evaluate the dataset, address associated challenges, and explore potential applications for future research and development in LLM-based hardware generation.
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
Ren, Z. Z., Shao, Zhihong, Song, Junxiao, Xin, Huajian, Wang, Haocheng, Zhao, Wanjia, Zhang, Liyue, Fu, Zhe, Zhu, Qihao, Yang, Dejian, Wu, Z. F., Gou, Zhibin, Ma, Shirong, Tang, Hongxuan, Liu, Yuxuan, Gao, Wenjun, Guo, Daya, Ruan, Chong
We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.
Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning
Moribe, Sosui, Ushiama, Taketoshi
In recent years, peer learning has gained attention as a method that promotes spontaneous thinking among learners, and its effectiveness has been confirmed by numerous studies. This study aims to develop an AI Agent as a learning companion that enables peer learning anytime and anywhere. However, peer learning between humans has various limitations, and it is not always effective. Effective peer learning requires companions at the same proficiency levels. In this study, we assume that a learner's peers with the same proficiency level as the learner make the same mistakes as the learner does and focus on English composition as a specific example to validate this approach.