Instructional Material
Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models
Hu, Zheng, Li, Zhe, Jiao, Ziyun, Nakagawa, Satoshi, Deng, Jiawen, Cai, Shimin, Zhou, Tao, Ren, Fuji
In recent years, knowledge graphs have been integrated into recommender systems as item-side auxiliary information, enhancing recommendation accuracy. However, constructing and integrating structural user-side knowledge remains a significant challenge due to the improper granularity and inherent scarcity of user-side features. Recent advancements in Large Language Models (LLMs) offer the potential to bridge this gap by leveraging their human behavior understanding and extensive real-world knowledge. Nevertheless, integrating LLM-generated information into recommender systems presents challenges, including the risk of noisy information and the need for additional knowledge transfer. In this paper, we propose an LLM-based user-side knowledge inference method alongside a carefully designed recommendation framework to address these challenges. Our approach employs LLMs to infer user interests based on historical behaviors, integrating this user-side information with item-side and collaborative data to construct a hybrid structure: the Collaborative Interest Knowledge Graph (CIKG). Furthermore, we propose a CIKG-based recommendation framework that includes a user interest reconstruction module and a cross-domain contrastive learning module to mitigate potential noise and facilitate knowledge transfer. We conduct extensive experiments on three real-world datasets to validate the effectiveness of our method. Our approach achieves state-of-the-art performance compared to competitive baselines, particularly for users with sparse interactions.
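A minimal sketch of the kind of hybrid structure described above. It assumes the graph is stored as (head, relation, tail) triples, a common knowledge-graph representation. The relation names (`interacted_with`, `interested_in`), the node prefixes, and the helpers `build_cikg` and `adjacency` are hypothetical, not the authors' schema. The LLM inference step is not shown; inferred interests are passed in as plain strings. The example shows how a user with a single interaction can still reach other users through a shared interest node.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

def build_cikg(
    interactions: list[tuple[str, str]],
    item_kg: list[Triple],
    user_interests: dict[str, list[str]],
) -> set[Triple]:
    """Merge collaborative, item-side, and user-side data into one triple set (hypothetical schema)."""
    triples: set[Triple] = set()
    # Collaborative signal: user-item interactions.
    for user, item in interactions:
        triples.add((f"user:{user}", "interacted_with", f"item:{item}"))
    # Item-side knowledge: existing KG triples whose head is an item.
    for item, relation, entity in item_kg:
        triples.add((f"item:{item}", relation, f"entity:{entity}"))
    # User-side knowledge: interests inferred elsewhere (e.g. by an LLM) for each user.
    for user, interests in user_interests.items():
        for interest in interests:
            triples.add((f"user:{user}", "interested_in", f"interest:{interest}"))
    return triples

def adjacency(triples: set[Triple]) -> dict[str, set[tuple[str, str]]]:
    """Undirected view of the graph: each edge is also stored with an inverse relation."""
    adj: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for head, relation, tail in triples:
        adj[head].add((relation, tail))
        adj[tail].add((f"inv_{relation}", head))
    return adj

# Usage: u2 has a single interaction but shares an inferred interest with u1.
graph = build_cikg(
    interactions=[("u1", "i1"), ("u1", "i2"), ("u2", "i3")],
    item_kg=[("i1", "has_genre", "sci-fi")],
    user_interests={"u1": ["space exploration"], "u2": ["space exploration"]},
)
adj = adjacency(graph)
print(sorted(h for _, h in adj["interest:space exploration"]))  # ['user:u1', 'user:u2']
```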
Unleashing the Power of Continual Learning on Non-Centralized Devices: A Survey
Li, Yichen, Wang, Haozhao, Xu, Wenchao, Xiao, Tianzhe, Liu, Hong, Tu, Minzhu, Wang, Yuying, Yang, Xin, Zhang, Rui, Yu, Shui, Guo, Song, Li, Ruixuan
Non-Centralized Continual Learning (NCCL) has emerged as a paradigm for enabling distributed devices such as vehicles and servers to handle streaming data from a joint non-stationary environment. To achieve high reliability and scalability when deploying this paradigm in distributed systems, it is essential to overcome challenges stemming from both the spatial and temporal dimensions, which manifest as distribution shifts, catastrophic forgetting, heterogeneity, and privacy issues. This survey comprehensively examines the development of non-centralized continual learning algorithms and their real-world deployment across distributed devices. We begin with an introduction to the background and fundamentals of non-centralized learning and continual learning. Then, we review existing solutions at three levels to show how current techniques alleviate catastrophic forgetting and distribution shift. Additionally, we delve into the various types of heterogeneity issues, security, and privacy attributes, as well as real-world applications across three prevalent scenarios. Furthermore, we establish a large-scale benchmark to revisit this problem and analyze the performance of state-of-the-art NCCL approaches. Finally, we discuss the important challenges and future research directions in NCCL.
Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance
Leelaluk, Sukrit, Tang, Cheng, Švábenský, Valdemar, Shimada, Atsushi
Educational data mining (EDM) is a part of applied computing that focuses on automatically analyzing data from learning contexts. Early prediction for identifying at-risk students is a crucial and widely researched topic in EDM. It enables instructors to help at-risk students stay on track, preventing student dropout or failure. Previous studies have predicted students' learning performance to identify at-risk students by using machine learning on data collected from e-learning platforms. However, most studies identified at-risk students using the entire course data after the course had finished. This does not reflect the real-world scenario in which at-risk students may drop out before the course ends. To address this problem, we introduce an RNN-Attention-KD (knowledge distillation) framework to predict at-risk students early throughout a course. It leverages the strengths of Recurrent Neural Networks (RNNs) in handling time-sequence data to predict students' performance at each time step and employs an attention mechanism to focus on relevant time steps for improved predictive accuracy. At the same time, KD is applied to compress the time steps to facilitate early prediction. In an empirical evaluation, RNN-Attention-KD outperforms traditional neural network models in terms of recall and F1-measure. For example, it obtained recall and F1-measure of 0.49 and 0.51 for Weeks 1–3 and 0.51 and 0.61 for Weeks 1–6 across all datasets from four years of a university course. Then, an ablation study investigated the contributions of different knowledge transfer methods (distillation objectives). We found that hint loss from the hidden layer of the RNN and context vector loss from the attention module on the RNN could enhance the model's prediction performance for identifying at-risk students. These results are relevant for EDM researchers employing deep learning models.
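How a prediction loss can be combined with distillation terms is sketched below. It assumes a teacher RNN that sees more weeks of data and a student that sees only the early weeks. This is a generic knowledge-distillation objective, not the authors' exact formulation: `kd_loss`, the temperature, and the weights `alpha` and `beta` are hypothetical. The hint term is written as a mean squared error that could be applied either to RNN hidden states or to attention context vectors, and it assumes student and teacher vectors have the same size.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kd_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    label: int,
    student_hidden: list[float],
    teacher_hidden: list[float],
    temperature: float = 2.0,
    alpha: float = 0.5,
    beta: float = 0.1,
) -> float:
    """Hypothetical early-prediction objective: hard-label CE + soft-target KD + hint MSE."""
    # Cross-entropy of the student (early weeks only) against the true at-risk label.
    ce = -math.log(softmax(student_logits)[label])
    # Soft targets from the teacher (more weeks), scaled by T^2 as is conventional in KD.
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2
    # Hint term: match the teacher's hidden (or attention context) vector; assumes equal sizes.
    hint = sum((s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)) / len(student_hidden)
    return (1.0 - alpha) * ce + alpha * soft + beta * hint

# Usage: two-class (on track / at risk) logits for one student.
loss = kd_loss(
    student_logits=[0.2, 1.1], teacher_logits=[0.1, 2.0], label=1,
    student_hidden=[0.3, -0.1, 0.5], teacher_hidden=[0.4, 0.0, 0.6],
)
print(round(loss, 4))
```

When the student already matches the teacher exactly, the distillation and hint terms vanish and only the weighted cross-entropy remains.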
Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation
Kim, Minkyoung, Kim, Yunha, Seo, Hyeram, Choi, Heejung, Han, Jiye, Kee, Gaeun, Ko, Soyoung, Jung, HyoJe, Kim, Byeolhee, Kim, Young-Hak, Park, Sanghyun, Jun, Tae Joon
Large language models (LLMs) have exhibited outstanding performance in natural language processing tasks. However, these models remain susceptible to adversarial attacks in which slight input perturbations can lead to harmful or misleading outputs. A gradient-based defensive suffix generation algorithm is designed to bolster the robustness of LLMs. By appending carefully optimized defensive suffixes to input prompts, the algorithm mitigates adversarial influences while preserving the models' utility. To enhance adversarial understanding, a novel total loss function ($L_{\text{total}}$), which combines a defensive loss ($L_{\text{def}}$) and an adversarial loss ($L_{\text{adv}}$), is used to generate defensive suffixes more effectively. Experimental evaluations conducted on open-source LLMs such as Gemma-7B, Mistral-7B, Llama2-7B, and Llama2-13B show that the proposed method reduces attack success rates (ASR) by an average of 11% compared to models without defensive suffixes. Additionally, the perplexity score of Gemma-7B decreased from 6.57 to 3.93 when applying the defensive suffix generated by OpenELM-270M. Furthermore, TruthfulQA evaluations demonstrate consistent improvements, with truthfulness scores increasing by up to 10% across tested configurations. This approach significantly enhances the security of LLMs in critical applications without requiring extensive retraining.
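The suffix-optimization idea can be illustrated with a toy search loop. This is a gradient-free greedy coordinate search used as a stand-in for the paper's gradient-based algorithm. The weighted sum in `total_loss` (with a hypothetical `weight`) is an assumption about how $L_{\text{def}}$ and $L_{\text{adv}}$ might be combined; the paper's exact form may differ. In a real setting, each loss would be computed by running the LLM on the prompt with the candidate suffix appended; here both terms are hand-written stand-ins.

```python
from typing import Callable, Sequence

def total_loss(l_def: float, l_adv: float, weight: float = 0.5) -> float:
    """Hypothetical weighted sum of the defensive and adversarial losses."""
    return weight * l_def + (1.0 - weight) * l_adv

def greedy_suffix_search(
    vocab: Sequence[str],
    suffix_len: int,
    loss_fn: Callable[[list[str]], float],
    n_sweeps: int = 2,
) -> list[str]:
    """Coordinate-wise greedy search: at each position keep the token that lowers the loss most."""
    suffix = [vocab[0]] * suffix_len
    for _ in range(n_sweeps):
        for pos in range(suffix_len):
            best_tok, best_loss = suffix[pos], loss_fn(suffix)
            for tok in vocab:
                candidate = suffix[:pos] + [tok] + suffix[pos + 1:]
                cand_loss = loss_fn(candidate)
                if cand_loss < best_loss:
                    best_tok, best_loss = tok, cand_loss
            suffix[pos] = best_tok
    return suffix

def with_suffix(prompt: str, suffix: list[str]) -> str:
    """Append the defensive suffix to the (possibly adversarial) user prompt."""
    return prompt + " " + " ".join(suffix)

# Toy usage: both loss terms are hand-written stand-ins for quantities an LLM would provide.
TARGET = ["be", "safe"]

def toy_loss(suffix: list[str]) -> float:
    l_def = float(sum(a != b for a, b in zip(suffix, TARGET)))  # distance from a "defensive" suffix
    l_adv = 0.0 if "safe" in suffix else 1.0                    # penalty while the attack still "succeeds"
    return total_loss(l_def, l_adv)

best = greedy_suffix_search(["a", "be", "safe"], suffix_len=2, loss_fn=toy_loss)
print(with_suffix("Example prompt", best))  # Example prompt be safe
```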
Mastering AI: Big Data, Deep Learning, and the Evolution of Large Language Models -- AutoML from Basics to State-of-the-Art Techniques
Feng, Pohsun, Bi, Ziqian, Wen, Yizhu, Peng, Benji, Liu, Junyu, Yin, Caitlyn Heqi, Wang, Tianyang, Chen, Keyu, Zhang, Sen, Li, Ming, Xu, Jiawei, Liu, Ming, Pan, Xuanhe, Wang, Jinlang, Niu, Qian
In recent years, Artificial Intelligence (AI) and Machine Learning (ML) have grown tremendously in popularity across various industries. From healthcare and finance to retail and automotive, adopting machine learning models has led to significant advancements [1]. However, building machine learning models traditionally requires deep knowledge in multiple areas, such as data preprocessing, feature engineering, model selection, hyperparameter tuning, and evaluation [2]. For many beginners and even experienced practitioners, this process can be time-consuming and technically challenging. This is where AutoML (Automated Machine Learning) comes in. AutoML simplifies the process of building machine learning models by automating many of the steps that would otherwise require manual intervention [3]. AutoML tools can automatically preprocess data, select the most suitable algorithms, and fine-tune hyperparameters to produce highly accurate models [4]. This automation not only speeds up the model development cycle but also allows users without deep knowledge of machine learning to create models whose performance is comparable to that of models built by experienced data scientists.
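In its simplest form, the automated search over algorithms and hyperparameters described above can be sketched as an exhaustive grid search. The `grid_search` helper and the toy scoring function below are hypothetical. Real AutoML systems often use more sample-efficient strategies (such as Bayesian optimization) and would score each configuration by actually training a model and measuring validation performance.

```python
from itertools import product
from typing import Any, Callable

def grid_search(
    search_space: dict[str, list[Any]],
    evaluate: Callable[[dict[str, Any]], float],
) -> tuple[dict[str, Any], float]:
    """Evaluate every combination in the search space and return the best (config, score); higher is better."""
    names = list(search_space)
    best_config: dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy objective standing in for "train the model, then measure validation accuracy".
def toy_validation_score(config: dict[str, Any]) -> float:
    base = 1.0 if config["algorithm"] == "tree_ensemble" else 0.8
    return base - 10 * abs(config["learning_rate"] - 0.01) - 0.05 * abs(config["max_depth"] - 4)

space = {
    "algorithm": ["linear", "tree_ensemble"],  # algorithm selection
    "learning_rate": [0.001, 0.01, 0.1],       # hyperparameter tuning
    "max_depth": [2, 4, 8],                    # ignored by a real linear model; kept flat for simplicity
}
best, score = grid_search(space, toy_validation_score)
print(best)  # {'algorithm': 'tree_ensemble', 'learning_rate': 0.01, 'max_depth': 4}
```

The flat grid keeps the sketch short. Its cost grows multiplicatively with every added hyperparameter, which is one reason practical tools prefer adaptive search.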
jinns: a JAX Library for Physics-Informed Neural Networks
Gangloff, Hugo, Jouvin, Nicolas
jinns is an open-source Python library for physics-informed neural networks, built to tackle both forward and inverse problems, as well as meta-model learning. Rooted in the JAX ecosystem, it provides a versatile framework for efficiently prototyping real problems while remaining easy to extend to specific needs. Furthermore, the implementation leverages existing popular JAX libraries such as equinox and optax for model definition and optimisation, bringing a sense of familiarity to the user. Many models are available as baselines, and the documentation provides reference implementations of different use-cases along with step-by-step tutorials for extensions to specific needs. The code is available on GitLab at https://gitlab.com/mia_jinns/jinns.
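To make the physics-informed idea concrete, here is a minimal sketch of a physics-informed loss for the toy problem u'(x) = -u(x), u(0) = 1 on [0, 1]. A one-parameter trial function exp(a*x) stands in for the neural network, and its exact derivative stands in for the automatic differentiation that a JAX-based library would provide. Nothing here uses jinns' API; all function names are hypothetical.

```python
import math

def trial_u(a: float, x: float) -> float:
    """One-parameter trial solution standing in for a neural network."""
    return math.exp(a * x)

def trial_du(a: float, x: float) -> float:
    """Exact derivative of the trial solution (a PINN library would get this via autodiff)."""
    return a * math.exp(a * x)

def physics_informed_loss(a: float, xs: list[float]) -> float:
    """Mean squared ODE residual of u' + u = 0 at collocation points, plus the u(0) = 1 condition."""
    residual = sum((trial_du(a, x) + trial_u(a, x)) ** 2 for x in xs) / len(xs)
    initial = (trial_u(a, 0.0) - 1.0) ** 2  # zero for this trial family; kept to show the loss structure
    return residual + initial

def fit(xs: list[float], a: float = 0.0, lr: float = 0.5, steps: int = 500, h: float = 1e-6) -> float:
    """Plain gradient descent using a central finite-difference gradient in the parameter a."""
    for _ in range(steps):
        grad = (physics_informed_loss(a + h, xs) - physics_informed_loss(a - h, xs)) / (2 * h)
        a -= lr * grad
    return a

collocation = [i / 10 for i in range(11)]  # points in [0, 1]
a_hat = fit(collocation)
print(round(a_hat, 4))  # -1.0, i.e. u(x) = exp(-x)
```

The residual term is what makes the loss "physics-informed": no labelled solution values are used, only the differential equation evaluated at collocation points.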
Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application
Chen, Keyu, Fei, Cheng, Bi, Ziqian, Liu, Junyu, Peng, Benji, Zhang, Sen, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Yin, Caitlyn Heqi, Zhang, Yichao, Feng, Pohsun, Wen, Yizhu, Wang, Tianyang, Li, Ming, Ren, Jintao, Niu, Qian, Chen, Silin, Hsieh, Weiche, Yan, Lawrence K. Q., Liang, Chia Xin, Xu, Han, Tseng, Hong-Ming, Song, Xinyuan, Liu, Ming
With a focus on natural language processing (NLP) and the role of large language models (LLMs), we explore the intersection of machine learning, deep learning, and artificial intelligence. As artificial intelligence continues to revolutionize fields from healthcare to finance, NLP techniques such as tokenization, text classification, and entity recognition are essential for processing and understanding human language. This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models. Additionally, it highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness. By addressing key aspects of data processing and model fine-tuning, this work aims to provide insights into deploying effective and ethically sound AI solutions.
ConDo: Continual Domain Expansion for Absolute Pose Regression
Li, Zijun, Cai, Zhipeng, Yang, Bochun, Shen, Xuelun, Shen, Siqi, Fan, Xiaoliang, Paulitsch, Michael, Wang, Cheng
Visual localization is a fundamental machine learning problem. Absolute Pose Regression (APR) trains a scene-dependent model to efficiently map an input image to the camera pose in a pre-defined scene. However, many applications have continually changing environments, where inference data at novel poses or scene conditions (weather, geometry) appear after deployment. Training APR on a fixed dataset leads to overfitting, making it fail catastrophically on challenging novel data. This work proposes Continual Domain Expansion (ConDo), which continually collects unlabeled inference data to update the deployed APR. Instead of applying standard unsupervised domain adaptation methods, which are ineffective for APR, ConDo effectively learns from unlabeled data by distilling knowledge from scene-agnostic localization methods. By sampling data uniformly from historical and newly collected data, ConDo can effectively expand the generalization domain of APR. Large-scale benchmarks with various scene types are constructed to evaluate models under practical (long-term) data changes. ConDo consistently and significantly outperforms baselines across architectures, scene types, and data changes. On challenging scenes (Fig.1), it reduces the localization error by >7x (14.8m vs 1.7m). Analysis shows the robustness of ConDo against compute budgets, replay buffer sizes, and teacher prediction noise. Compared to model retraining, ConDo achieves similar performance up to 25x faster.
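Reservoir sampling is one standard way to keep a bounded replay buffer that remains a uniform sample of everything collected so far, old and new. The abstract does not say ConDo uses this exact scheme, so treat the `ReservoirBuffer` class below as an illustrative assumption. In a ConDo-like loop, each stored item would pair an unlabeled image with a pose predicted by a scene-agnostic teacher. Here `teacher_pose` is a hypothetical placeholder and frames are represented by integer ids.

```python
import random
from typing import Generic, TypeVar

T = TypeVar("T")

class ReservoirBuffer(Generic[T]):
    """Fixed-capacity buffer holding a uniform random sample of all items ever added."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: list[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen, replacing a random slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> list[T]:
        """Draw a training batch uniformly from the buffer (without replacement)."""
        return self.rng.sample(self.items, min(k, len(self.items)))

def teacher_pose(frame_id: int) -> tuple[float, float, float]:
    """Hypothetical placeholder for a scene-agnostic localizer's pose estimate."""
    return (float(frame_id), 0.0, 0.0)

# Usage: stream 1,000 unlabeled frames into a 64-slot buffer with teacher pseudo-labels.
buffer: ReservoirBuffer[tuple[int, tuple[float, float, float]]] = ReservoirBuffer(capacity=64)
for frame_id in range(1000):
    buffer.add((frame_id, teacher_pose(frame_id)))
batch = buffer.sample(16)
```

Because every frame ends up in the buffer with the same probability, recent data cannot crowd out older scene conditions. This mirrors the "sample uniformly from historical and new data" behaviour described above while keeping memory fixed.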
Efficient Language-instructed Skill Acquisition via Reward-Policy Co-Evolution
Huang, Changxin, Chang, Yanbin, Lin, Junfan, Liang, Junyang, Zeng, Runhao, Li, Jianqiang
The ability to autonomously explore and resolve tasks with minimal human guidance is crucial for the self-development of embodied intelligence. Although reinforcement learning methods can largely ease human effort, it is challenging to design reward functions for real-world tasks, especially for high-dimensional robotic control, due to complex relationships among joints and tasks. Recent advancements in large language models (LLMs) enable automatic reward function design. However, existing approaches evaluate reward functions by retraining policies from scratch, which places an undue burden on the reward function by expecting it to be effective throughout the whole policy improvement process. We argue for a more practical strategy in robotic autonomy, focusing on refining existing policies with policy-dependent reward functions rather than a universal one. To this end, we propose a novel reward-policy co-evolution framework where the reward function and the learned policy benefit from each other's progressive on-the-fly improvements, resulting in more efficient and higher-performing skill acquisition. Specifically, the reward evolution process translates the robot's previous best reward function and descriptions of the task and environment into text inputs. These inputs are used to query LLMs to generate a dynamic number of reward function candidates, ensuring continuous improvement at each round of evolution. For policy evolution, our method generates new policy populations by hybridizing historically optimal and random policies. Through an improved Bayesian optimization, our approach efficiently and robustly identifies the most capable and plastic reward-policy combination, which then proceeds to the next round of co-evolution. Despite using less data, our approach demonstrates an average normalized improvement of 95.3% across various high-dimensional robotic skill learning tasks.
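The following is a heavily simplified sketch of one co-evolution round, under these assumptions:

- Policies are plain parameter vectors.
- Each reward-function candidate, which the paper obtains by querying an LLM, is stood in for by an ordinary Python function.
- New policies are formed by interpolating between the historical best policy and random ones, using hypothetical mixing weights.
- An exhaustive argmax replaces the paper's improved Bayesian optimization.

All names (`hybridize`, `select_pair`, `mix_weights`) are illustrative.

```python
import random
from typing import Callable

Policy = list[float]
RewardFn = Callable[[Policy], float]

def hybridize(best: Policy, n: int, rng: random.Random,
              mix_weights: tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0)) -> list[Policy]:
    """New population: convex combinations of the historical best policy and random policies."""
    population = []
    for i in range(n):
        w = mix_weights[i % len(mix_weights)]  # w = 1.0 keeps the best policy, w = 0.0 is fully random
        random_policy = [rng.uniform(-1.0, 1.0) for _ in best]
        population.append([w * b + (1.0 - w) * r for b, r in zip(best, random_policy)])
    return population

def select_pair(rewards: list[RewardFn], policies: list[Policy],
                evaluate: Callable[[RewardFn, Policy], float]) -> tuple[int, int]:
    """Indices of the best (reward, policy) pair; exhaustive search stands in for Bayesian optimization."""
    pairs = [(ri, pi) for ri in range(len(rewards)) for pi in range(len(policies))]
    return max(pairs, key=lambda rp: evaluate(rewards[rp[0]], policies[rp[1]]))

# Usage: one round with two stand-in reward candidates and a population of 10 policies.
rng = random.Random(0)
best_so_far = [0.2, -0.4, 0.9]
population = hybridize(best_so_far, n=10, rng=rng)
rewards: list[RewardFn] = [lambda p: -sum(x * x for x in p), lambda p: -sum(abs(x) for x in p)]
# Toy evaluation: score a pair by the reward value itself (a real system would train and measure task success).
r_idx, p_idx = select_pair(rewards, population, lambda r, p: r(p))
```

Keeping some population members close to the historical best while others are mostly random is one simple way to balance exploitation and exploration. The winning pair would then seed the next round.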
Everyday AR through AI-in-the-Loop
Suzuki, Ryo, Gonzalez-Franco, Mar, Sra, Misha, Lindlbauer, David
This workshop brings together experts and practitioners from augmented reality (AR) and artificial intelligence (AI) to shape the future of AI-in-the-loop everyday AR experiences. With recent advancements in both AR hardware and AI capabilities, we envision that everyday AR -- always-available and seamlessly integrated into users' daily environments -- is becoming increasingly feasible. This workshop will explore how AI can drive such everyday AR experiences. We discuss a range of topics, including adaptive and context-aware AR, generative AR content creation, always-on AI assistants, AI-driven accessible design, and real-world-oriented AI agents. Our goal is to identify the opportunities and challenges in AI-enabled AR, focusing on creating novel AR experiences that seamlessly blend the digital and physical worlds. Through the workshop, we aim to foster collaboration, inspire future research, and build a community to advance the research field of AI-enhanced AR.