Cao, Zhe
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Cao, Hongye, Wang, Yanming, Jing, Sijia, Peng, Ziyue, Bai, Zhixin, Cao, Zhe, Fang, Meng, Feng, Fan, Wang, Boyan, Liu, Jiaheng, Yang, Tianpei, Huo, Jing, Gao, Yang, Meng, Fanyu, Yang, Xi, Deng, Chao, Feng, Junlan
With the rapid advancement of Large Language Models (LLMs), their safety has become a critical concern requiring precise assessment. Current benchmarks primarily concentrate on single-turn dialogues or a single jailbreak attack method to assess safety. Moreover, these benchmarks have not examined in detail an LLM's capability to identify and handle unsafe information. To address these issues, we propose SafeDialBench, a fine-grained benchmark for evaluating the safety of LLMs against diverse jailbreak attacks in multi-turn dialogues. Specifically, we design a two-tier hierarchical safety taxonomy that covers 6 safety dimensions and generate more than 4,000 multi-turn dialogues in both Chinese and English under 22 dialogue scenarios. We employ 7 jailbreak attack strategies, such as reference attack and purpose reverse, to enhance the quality of the generated dialogues. Notably, we construct an innovative assessment framework that measures an LLM's capability to detect and handle unsafe information and to maintain consistency when facing jailbreak attacks. Experimental results across 17 LLMs reveal that Yi-34B-Chat and GLM4-9B-Chat demonstrate superior safety performance, while Llama3.1-8B-Instruct and o3-mini exhibit safety vulnerabilities.
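As a concrete illustration of the assessment framework described above, the following minimal Python sketch shows how a two-tier taxonomy and per-dialogue scores over the three assessed capabilities might be organized. The dimension names, sub-categories, and scoring scale here are hypothetical placeholders, not the paper's actual rubric.

from dataclasses import dataclass

# Tier 1: safety dimensions; Tier 2: sub-categories (illustrative labels only).
SAFETY_TAXONOMY = {
    "fairness": ["stereotyping", "discrimination"],
    "legality": ["illegal_activity", "regulated_goods"],
    # ... the remaining dimensions of the paper's six-dimension taxonomy
}

@dataclass
class DialogueScore:
    identification: float  # did the model detect the unsafe content?
    handling: float        # did it refuse or redirect appropriately?
    consistency: float     # did it stay safe across all turns of the dialogue?

def aggregate(scores: list) -> dict:
    """Average each capability over a set of evaluated multi-turn dialogues."""
    n = max(len(scores), 1)
    return {
        "identification": sum(s.identification for s in scores) / n,
        "handling": sum(s.handling for s in scores) / n,
        "consistency": sum(s.consistency for s in scores) / n,
    }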
Learning Camouflaged Object Detection from Noisy Pseudo Label
Zhang, Jin, Zhang, Ruiheng, Shi, Yanjiao, Cao, Zhe, Liu, Nian, Khan, Fahad Shahbaz
Existing Camouflaged Object Detection (COD) methods rely heavily on large-scale pixel-annotated training sets, which are both time-consuming and labor-intensive to produce. Although weakly supervised methods offer higher annotation efficiency, their performance lags far behind due to the unclear visual demarcation between foreground and background in camouflaged images. In this paper, we explore the potential of using boxes as prompts in camouflaged scenes and introduce the first weakly semi-supervised COD method, aiming for budget-efficient and high-precision camouflaged object segmentation with an extremely limited number of fully labeled images. Critically, learning from such a limited set inevitably generates pseudo labels with many noisy pixels. To address this, we propose a noise correction loss that encourages the model to learn from correct pixels during the early learning stage and corrects the erroneous gradients dominated by noisy pixels during the memorization stage, ultimately achieving accurate segmentation of camouflaged objects from noisy labels. Using only 20% of the fully labeled data, our method outperforms state-of-the-art methods.
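A minimal PyTorch-style sketch of this idea follows; it is an assumption-laden illustration, not the authors' released loss. It fits the pseudo labels directly during an early-learning phase, then masks out the gradient of pixels where a confident prediction contradicts the (possibly noisy) pseudo label; the warmup epoch count and confidence threshold tau are hypothetical parameters.

import torch
import torch.nn.functional as F

def noise_correction_loss(logits, pseudo_label, epoch, warmup=10, tau=0.8):
    """logits, pseudo_label: (B, 1, H, W) tensors; pseudo_label in {0., 1.}."""
    bce = F.binary_cross_entropy_with_logits(logits, pseudo_label, reduction="none")
    if epoch < warmup:
        # Early-learning stage: the model tends to fit clean pixels first,
        # so the plain per-pixel loss is used as-is.
        return bce.mean()
    # Memorization stage: pixels where the model confidently disagrees with
    # the pseudo label are treated as label noise, and their gradient
    # contribution is suppressed with a hard mask.
    prob = torch.sigmoid(logits)
    disagree = ((prob > tau) & (pseudo_label < 0.5)) | ((prob < 1 - tau) & (pseudo_label > 0.5))
    weight = (~disagree).float()
    return (weight * bce).sum() / weight.sum().clamp(min=1.0)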
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
Kong, Lingdong, Xie, Shaoyuan, Hu, Hanjiang, Niu, Yaru, Ooi, Wei Tsang, Cottereau, Benoit R., Ng, Lai Xing, Ma, Yuexin, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei, Qiu, Weichao, Zhang, Wei, Cao, Xu, Lu, Hao, Chen, Ying-Cong, Kang, Caixin, Zhou, Xinning, Ying, Chengyang, Shang, Wentao, Wei, Xingxing, Dong, Yinpeng, Yang, Bo, Jiang, Shengyin, Ma, Zeliang, Ji, Dengyi, Li, Haiwen, Huang, Xingliang, Tian, Yu, Kou, Genghua, Jia, Fan, Liu, Yingfei, Wang, Tiancai, Li, Ying, Hao, Xiaoshuai, Yang, Yifan, Zhang, Hui, Wei, Mengchuan, Zhou, Yi, Zhao, Haimei, Zhang, Jing, Li, Jinke, He, Xiao, Cheng, Xiaoqiang, Zhang, Bingyang, Zhao, Lirong, Ding, Dianlei, Liu, Fangsheng, Yan, Yixiang, Wang, Hongming, Ye, Nanfei, Luo, Lun, Tian, Yubo, Zuo, Yiwei, Cao, Zhe, Ren, Yi, Li, Yunfan, Liu, Wenjie, Wu, Xun, Mao, Yifan, Li, Ming, Liu, Jian, Liu, Jiayang, Qin, Zihan, Chu, Cunxi, Xu, Jialei, Zhao, Wenbo, Jiang, Junjun, Liu, Xianming, Wang, Ziyan, Li, Chiwei, Li, Shilong, Yuan, Chendong, Yang, Songyue, Liu, Wentao, Chen, Peng, Zhou, Bin, Wang, Yubo, Zhang, Chi, Sun, Jianhang, Chen, Hai, Yang, Xiao, Wang, Lizhong, Fu, Dongyi, Lin, Yongchun, Yang, Huitong, Li, Haoang, Luo, Yadan, Cheng, Xianjing, Xu, Yong
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely degrade the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition challenged participants to innovate and enhance system resilience against both typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated on our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies for enhancing sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Through collaborative effort, participants pushed the boundaries of current technologies and showcased their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research.
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Wen, Ying, Wan, Ziyu, Zhou, Ming, Hou, Shufang, Cao, Zhe, Le, Chenyang, Chen, Jingxiao, Tian, Zheng, Zhang, Weinan, Wang, Jun
The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and generalize effectively across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundation model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising path toward expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study of our FDM implementation, DigitalBrain (DB1), a 1.3-billion-parameter model that achieves human-level performance on 870 tasks spanning text generation, image captioning, video game playing, robotic control, and the traveling salesman problem. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.
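To make the sequence-decoding formulation concrete, the following minimal sketch shows one common way such a model can be framed, in the spirit of decision-as-sequence-modeling work: observations, actions, and returns are discretized into a single token stream consumed by a causal Transformer that predicts the next token. All names and sizes here are hypothetical; this is not DB1's actual architecture or tokenizer.

import torch
import torch.nn as nn

class TinyDecisionDecoder(nn.Module):
    """Decoder-only Transformer over interleaved obs/action/return tokens."""
    def __init__(self, vocab_size=1024, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):  # tokens: (B, T) token ids
        causal_mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=causal_mask)
        return self.head(h)  # next-token logits; action tokens are decoded at act time

model = TinyDecisionDecoder()
logits = model(torch.randint(0, 1024, (1, 16)))  # one trajectory of 16 tokens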