Hanoi
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Baker, Bowen, Huizinga, Joost, Gao, Leo, Dou, Zehao, Guan, Melody Y., Madry, Aleksander, Zaremba, Wojciech, Pachocki, Jakub, Farhi, David
Exploiting flaws or misspecifications in learning objectives, known as reward hacking in reinforcement learning (RL) settings, remains a critical failure mode in modern AI systems [1-3] and has led to agents exhibiting misaligned behaviors across many domains such as language modeling [4-6], control tasks [7-10], and recommendation systems [11-14]. The true objective function we wish to optimize is often hard to write down precisely, and so the challenge in creating capable and aligned systems largely lies in designing robust proxies that do not deviate in ways that a model may learn to exploit [3, 15]. This problem is not unique to machine learning systems but has also plagued human institutions [16-19]. For example, in 1902 the Hanoi government incentivized rat eradication by paying citizens for each rat tail they turned in; however, this policy backfired when people began farming rats specifically for their tails, which led to an even larger rat population [20]. Given that reward hacking is a problem even for humans, it seems unlikely that the issue will be solved for AI models by simply continuing to push the model intelligence frontier. In fact, enhancing an agent's capabilities may exacerbate the problem by better equipping it to discover and execute more complex and hard-to-monitor exploits [21, 22]. We have found this to be anecdotally true: as we have continued to scale RL training, agents have discovered more complex and hard-to-detect hacks. Thus far, the only general strategy for mitigating reward hacking is to manually monitor agents for unintended behavior, which is unlikely to scale as their outputs and actions grow more complex--possibly even superhuman--and become more widely used. The emerging generation of large language models (LLMs) [23] that are trained with reinforcement learning to reason in chain-of-thought (CoT) [24-26] offers a promising new avenue for monitoring.
BERT-based model for Vietnamese Fact Verification Dataset
Tran, Bao, Khanh, T. N., Tuong, Khang Nguyen, Dang, Thien, Nguyen, Quang, Thinh, Nguyen T., Hung, Vo T.
The rapid advancement of information and communication technology has facilitated easier access to information. However, this progress has also necessitated more stringent verification measures to ensure the accuracy of information, particularly within the context of Vietnam. This paper introduces an approach to address the challenges of Fact Verification using the Vietnamese dataset by integrating both sentence selection and classification modules into a unified network architecture. The proposed approach leverages the power of large language models by utilizing pre-trained PhoBERT and XLM-RoBERTa as the backbone of the network. The proposed model was trained on a Vietnamese dataset, named ISE-DSC01, and demonstrated superior performance compared to the baseline model across all three metrics. Notably, we achieved a Strict Accuracy level of 75.11\%, indicating a remarkable 28.83\% improvement over the baseline model.
Coupling Agent-Based Simulations and VR universes: the case of GAMA and Unity
Drogoul, Alexis, Taillandier, Patrick, Brugiรจre, Arthur, Martinez, Louis, Sillano, Lรฉon, Lesquoy, Baptiste, Nghi, Huynh Quang
Agent-based models (ABMs) and video games, including those taking advantage of virtual reality (VR), have undergone a remarkable parallel evolution, achieving impressive levels of complexity and sophistication. This paper argues that while ABMs prioritize scientific analysis and understanding and VR aims for immersive entertainment, they both simulate artificial worlds and can benefit from closer integration. Coupling both approaches indeed opens interesting possibilities for research and development in various fields, and in particular education, at the heart of the SIMPLE project, an EU-funded project on the development of digital tools for awareness raising on environmental issues. However, existing tools often present limitations, including technical complexity, limited functionalities, and lack of interoperability. To address these challenges, we introduce a novel framework for linking GAMA, a popular ABM platform, with Unity, a widely used game engine. This framework enables seamless data exchange, real-time visualization, and user interaction within VR environments, allowing researchers to leverage the strengths of both ABMs and VR for more impactful and engaging simulations. We demonstrate the capabilities of our framework through two prototypes built to highlight its potential in representing and interacting with complex socio-environmental system models. We conclude by emphasizing the importance of continued collaboration between the ABM and VR communities to develop robust, user-friendly tools, paving the way for a new era of collaborative research and immersive experiences in simulations.
Learning Compositional Neural Programs with Recursive Tree Search and Planning
We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. This specification enables us to overcome the need for strong supervision in the form of execution traces and consequently train NPI models effectively with reinforcement learning. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments also show that when deploying our neural network policies, it is advantageous to do planning with guided Monte Carlo tree search.
Using Large Language Models for education managements in Vietnamese with low resources
Minh, Duc Do, Van, Vinh Nguyen, Cong, Thang Dam
Large language models (LLMs), such as GPT-4, Gemini 1.5, Claude 3.5 Sonnet, and Llama3, have demonstrated significant advancements in various NLP tasks since the release of ChatGPT in 2022. Despite their success, fine-tuning and deploying LLMs remain computationally expensive, especially in resource-constrained environments. In this paper, we proposed VietEduFrame, a framework specifically designed to apply LLMs to educational management tasks in Vietnamese institutions. Our key contribution includes the development of a tailored dataset, derived from student education documents at Hanoi VNU, which addresses the unique challenges faced by educational systems with limited resources. Through extensive experiments, we show that our approach outperforms existing methods in terms of accuracy and efficiency, offering a promising solution for improving educational management in under-resourced environments. While our framework leverages synthetic data to supplement real-world examples, we discuss potential limitations regarding broader applicability and robustness in future implementations.
Input-Aware Dynamic Backdoor Attack Hanoi University of Science and Technology, 3
In recent years, neural backdoor attack has been considered to be a potential security threat to deep learning systems. Such systems, while achieving the state-of-the-art performance on clean data, perform abnormally on inputs with predefined triggers. Current backdoor techniques, however, rely on uniform trigger patterns, which are easily detected and mitigated by current defense methods. In this work, we propose a novel backdoor attack technique in which the triggers vary from input to input. To achieve this goal, we implement an input-aware trigger generator driven by diversity loss.
Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges
Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet
Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.
Towards Layer-Wise Personalized Federated Learning: Adaptive Layer Disentanglement via Conflicting Gradients
Nguyen, Minh Duong, Le, Khanh, Do, Khoi, Tran, Nguyen H., Nguyen, Duc, Trinh, Chien, Yang, Zhaohui
Hanoi, Vietnam Nguyen H. Tran Khoi Do In personalized Federated Learning (pFL), high data heterogeneity can cause significant gradient divergence across devices, adversely affecting the learning process. This divergence, especially when gradients from different users form an obtuse angle during aggregation, can negate progress, leading to severe weight and gradient update degradation. To address this issue, we introduce a new approach to pFL design, namely Federated Learning with Layer-wise Aggregation via Gradient Analysis (FedLAG), utilizing the concept of gradient conflict at the layer level. Specifically, when layer-wise gradients of different clients form acute angles, those gradients align in the same direction, enabling updates across different clients toward identifying client-invariant features. Conversely, when layer-wise gradient pairs make create obtuse angles, the layers tend to focus on client-specific tasks. In hindsights, FedLAG assigns layers for personalization based on the extent of layer-wise gradient conflicts. Specifically, layers with gradient conflicts are excluded from the global aggregation process. The theoretical evaluation demonstrates that when integrated into other pFL baselines, FedLAG enhances pFL performance by a certain margin. Therefore, our proposed method achieves superior convergence behavior compared with other baselines. Extensive experiments show that our FedLAG outperforms several state-of-the-art methods and can be easily incorporated with many existing methods to further enhance performance. The challenge of non-independent and non-identically distributed (non-IID) data significantly impacts personalized Federated Learning (pFL).
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Van Nguyen, Quan, Tran, Dan Quang, Pham, Huy Quang, Nguyen, Thang Kien-Bao, Nguyen, Nghia Hieu, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Initially, this task was researched, focusing on methods to help machines understand objects and scene contexts in images. However, some text appearing in the image that carries explicit information about the full content of the image is not mentioned. Along with the continuous development of the AI era, there have been many studies on the reading comprehension ability of VQA models in the world. As a developing country, conditions are still limited, and this task is still open in Vietnam. Therefore, we introduce the first large-scale dataset in Vietnamese specializing in the ability to understand text appearing in images, we call it ViTextVQA (\textbf{Vi}etnamese \textbf{Text}-based \textbf{V}isual \textbf{Q}uestion \textbf{A}nswering dataset) which contains \textbf{over 16,000} images and \textbf{over 50,000} questions with answers. Through meticulous experiments with various state-of-the-art models, we uncover the significance of the order in which tokens in OCR text are processed and selected to formulate answers. This finding helped us significantly improve the performance of the baseline models on the ViTextVQA dataset. Our dataset is available at this \href{https://github.com/minhquan6203/ViTextVQA-Dataset}{link} for research purposes.
VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension
Ngo, Thinh Phuoc, Dang, Khoa Tran Anh, Luu, Son T., Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension (MRC) tasks and provides insights into the challenges and opportunities associated with using real-world data for machine reading comprehension tasks. The existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks. In contrast, the VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube -- an extensive source of user-uploaded content, covering the topics of food and travel. By capturing the spoken language of native Vietnamese speakers in natural settings, an obscure corner overlooked in Vietnamese research, the corpus provides a valuable resource for future research in reading comprehension tasks for the Vietnamese language. Regarding performance evaluation, our deep-learning models achieved the highest F1 score of 75.34% on the test set, indicating significant progress in machine reading comprehension for Vietnamese spoken language data. In terms of EM, the highest score we accomplished is 53.97%, which reflects the challenge in processing spoken-based content and highlights the need for further improvement.