lecture video
CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA
Kovalev, Vsevolod, Kumar, Parteek
We study timestamped question answering over educational lecture videos under a single-GPU latency/memory budget. Given a natural-language query, the system retrieves relevant timestamped segments and synthesizes a grounded answer. We present CourseTimeQA (52.3 h, 902 queries across six courses) and a lightweight, latency-constrained cross-modal retriever (CrossFusion-RAG) that combines frozen encoders, a learned 512->768 vision projection, shallow query-agnostic cross-attention over ASR and frames with a temporal-consistency regularizer, and a small cross-attentive reranker. On CourseTimeQA, CrossFusion-RAG improves nDCG@10 by 0.10 and MRR by 0.08 over a strong BLIP-2 retriever while achieving approximately 1.55 s median end-to-end latency on a single A100. Closest comparators (zero-shot CLIP multi-frame pooling; CLIP + cross-encoder reranker + MMR; learned late-fusion gating; text-only hybrid with cross-encoder reranking and its MMR variant; caption-augmented text retrieval; non-learned temporal smoothing) are evaluated under matched hardware and indexing. We report robustness across ASR noise (WER quartiles), diagnostics for temporal localization, and full training/tuning details to support reproducible comparison.
- North America > United States > Washington (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report (0.65)
- Instructional Material (0.48)
Next-Gen Education: Enhancing AI for Microlearning
Saha, Suman, Rahbari, Fatemeh, Sadique, Farhan, Velamakanni, Sri Krishna Chaitanya, Farooque, Mahfuza, Rothwell, William J.
This paper explores integrating microlearning strategies into university curricula, particularly in computer science education, to counteract the decline in class attendance and engagement in US universities after COVID. As students increasingly opt for remote learning and recorded lectures, traditional educational approaches struggle to maintain engagement and effectiveness. Microlearning, which breaks complex subjects into manageable units, is proposed to address shorter attention spans and enhance educational outcomes. It uses interactive formats such as videos, quizzes, flashcards, and scenario-based exercises, which are especially beneficial for topics like algorithms and programming logic requiring deep understanding and ongoing practice. Adoption of microlearning is often limited by the effort needed to create such materials. This paper proposes leveraging AI tools, specifically ChatGPT, to reduce the workload for educators by automating the creation of supplementary materials. While AI can automate certain tasks, educators remain essential in guiding and shaping the learning process. This AI-enhanced approach ensures course content is kept current with the latest research and technology, with educators providing context and insights. By examining AI capabilities in microlearning, this study shows the potential to transform educational practices and outcomes in computer science, offering a practical model for combining advanced technology with established teaching methods.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Questionnaire & Opinion Survey (0.93)
- Education > Educational Setting > Online (1.00)
- Education > Educational Setting > Higher Education (1.00)
- Education > Curriculum > Subject-Specific Education (0.89)
- Education > Educational Technology > Educational Software > Computer Based Training (0.68)
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos
Wei, Kangda, Zhou, Zhengyu, Wang, Bingqing, Araki, Jun, Lange, Lukas, Huang, Ruihong, Feng, Zhe
In recent years, online lecture videos have become an increasingly popular resource for acquiring new knowledge. Systems capable of effectively understanding/indexing lecture videos are thus highly desirable, enabling downstream tasks like question answering to help users efficiently locate specific information within videos. This work proposes PreMind, a novel multi-agent multimodal framework that leverages various large models for advanced understanding/indexing of presentation-style videos. PreMind first segments videos into slide-presentation segments using a Vision-Language Model (VLM) to enhance modern shot-detection techniques. Each segment is then analyzed to generate multimodal indexes through three key steps: (1) extracting slide visual content, (2) transcribing speech narratives, and (3) consolidating these visual and speech contents into an integrated understanding. Three innovative mechanisms are also proposed to improve performance: leveraging prior lecture knowledge to refine visual understanding, detecting/correcting speech transcription errors using a VLM, and utilizing a critic agent for dynamic iterative self-reflection in vision analysis. Compared to traditional video indexing methods, PreMind captures rich, reliable multimodal information, allowing users to search for details like abbreviations shown only on slides. Systematic evaluations on the public LPM dataset and an internal enterprise dataset are conducted to validate PreMind's effectiveness, supported by detailed analyses.
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (3 more...)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.48)
- Instructional Material > Online (0.34)
Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented Generation
Jacobs, Sven, Jaschke, Steffen
This paper presents the use of Retrieval Augmented Generation (RAG) to improve the feedback generated by Large Language Models for programming tasks. For this purpose, corresponding lecture recordings were transcribed and made available to the Large Language Model GPT-4 as external knowledge source together with timestamps as metainformation by using RAG. The purpose of this is to prevent hallucinations and to enforce the use of the technical terms and phrases from the lecture. In an exercise platform developed to solve programming problems for an introductory programming lecture, students can request feedback on their solutions generated by GPT-4. For this task GPT-4 receives the students' code solution, the compiler output, the result of unit tests and the relevant passages from the lecture notes available through the use of RAG as additional context. The feedback generated by GPT-4 should guide students to solve problems independently and link to the lecture content, using the time stamps of the transcript as meta-information. In this way, the corresponding lecture videos can be viewed immediately at the corresponding positions. For the evaluation, students worked with the tool in a workshop and decided for each feedback whether it should be extended by RAG or not. First results based on a questionnaire and the collected usage data show that the use of RAG can improve feedback generation and is preferred by students in some situations. Due to the slower speed of feedback generation, the benefits are situation dependent.
- North America > United States > New York (0.05)
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Siegen (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- (5 more...)
FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
Kawamura, Kazuki, Rekimoto, Jun
Quickly understanding lengthy lecture videos is essential for learners with limited time and interest in various topics to improve their learning efficiency. To this end, video summarization has been actively researched to enable users to view only important scenes from a video. However, these studies focus on either the visual or audio information of a video and extract important segments in the video. Therefore, there is a risk of missing important information when both the teacher's speech and visual information on the blackboard or slides are important, such as in a lecture video. To tackle this issue, we propose FastPerson, a video summarization approach that considers both the visual and auditory information in lecture videos. FastPerson creates summary videos by utilizing audio transcriptions along with on-screen images and text, minimizing the risk of overlooking crucial information for learners. Further, it provides a feature that allows learners to switch between the summary and original videos for each chapter of the video, enabling them to adjust the pace of learning based on their interests and level of understanding. We conducted an evaluation with 40 participants to assess the effectiveness of our method and confirmed that it reduced viewing time by 53\% at the same level of comprehension as that when using traditional video playback methods.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material (1.00)
- Overview (0.93)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting > Online (1.00)
Modern Constraint Programming Education: Lessons for the Future
Santanam, Tejas, Van Hentenryck, Pascal
A general overview of current CP courses and instructional methods is presented, with a focus on online and virtually-delivered courses. This is followed by a discussion of the novel approach taken to introductory CP education for engineering students at large scale at the Georgia Institute of Technology (Georgia Tech) in Atlanta, GA, USA. The paper summarizes important takeaways from the Georgia Tech CP course and ends with a discussion on the future of CP education. Some ideas for instructional methods, promotional methods, and organizational changes are proposed to aid in the long-term growth of CP education.
- North America > United States > Georgia > Fulton County > Atlanta (0.24)
- North America > United States > Indiana (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
- (3 more...)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report (0.70)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting > Online (1.00)
- Education > Curriculum > Subject-Specific Education (0.66)
Large Language Model-based System to Provide Immediate Feedback to Students in Flipped Classroom Preparation Learning
Uchiyama, Shintaro, Umemura, Kyoji, Morita, Yusuke
This paper proposes a system that uses large language models to provide immediate feedback to students in flipped classroom preparation learning. This study aimed to solve challenges in the flipped classroom model, such as ensuring that students are emotionally engaged and motivated to learn. Students often have questions about the content of lecture videos in the preparation of flipped classrooms, but it is difficult for teachers to answer them immediately. The proposed system was developed using the ChatGPT API on a video-watching support system for preparation learning that is being used in real practice. Answers from ChatGPT often do not align with the context of the student's question. Therefore, this paper also proposes a method to align the answer with the context. This paper also proposes a method to collect the teacher's answers to the students' questions and use them as additional guides for the students. This paper discusses the design and implementation of the proposed system.
- Instructional Material (1.00)
- Research Report (0.84)
Introduction to Data-Centric AI
Typical machine learning classes teach techniques to produce effective models for a given dataset. In real-world applications, data is messy and improving models is not the only way to get better performance. You can also improve the dataset itself rather than treating it as fixed. Data-Centric AI (DCAI) is an emerging science that studies techniques to improve datasets, which is often the best way to improve performance in practical ML applications. While good data scientists have long practiced this manually via ad hoc trial/error and intuition, DCAI considers the improvement of data as a systematic engineering discipline.
- Asia > India > West Bengal > Kharagpur (0.07)
- North America > United States (0.05)
- Asia > India > West Bengal > Kolkata (0.05)
Recreating an ML Master's degree with Online Courses
A Bachelor's study usually takes six semesters; a Master's study takes four. But this is only an outline. I've witnessed people doing their BA in three semesters and some taking nine semesters. Sometimes there are so many exciting courses that you voluntarily stay longer to learn it all. Therefore, I've loosely structured the recreated curriculum into four semesters.
- Education > Educational Setting > Online (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.40)