Chen, Wentao
FullStack Bench: Evaluating LLMs as Full Stack Coders
Bytedance-Seed-Foundation-Code-Team: Cheng, Yao, Chen, Jianfeng, Chen, Jie, Chen, Li, Chen, Liyu, Chen, Wentao, Chen, Zhengyu, Geng, Shijie, Li, Aoyan, Li, Bo, Li, Bowen, Li, Linyi, Liu, Boyi, Liu, Jerry, Liu, Kaibo, Liu, Qi, Liu, Shukai, Liu, Siyao, Liu, Tianyi, Liu, Tingkai, Liu, Yongfei, Long, Rui, Mai, Jing, Ning, Guanghan, Peng, Z. Y., Shen, Kai, Su, Jiahao, Su, Jing, Sun, Tao, Sun, Yifan, Tao, Yunzhe, Wang, Guoyin, Wang, Siwei, Wang, Xuwu, Wang, Yite, Wang, Zihan, Xia, Jinxiang, Xiang, Liang, Xiao, Xia, Xiao, Yongsheng, Xi, Chenguang, Xin, Shulin, Xu, Jingjing, Xu, Shikun, Yang, Hongxia, Yang, Jack, Yang, Yingxiang, Yuan, Jianbo, Zhang, Jun, Zhang, Yufeng, Zhang, Yuyu, Zheng, Shen, Zhu, He, Zhu, Ming
As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To address this gap, we have developed a comprehensive code evaluation dataset FullStack Bench focusing on full-stack programming, which encompasses a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). Besides, to assess multilingual programming capabilities, in FullStack Bench, we design real-world instructions and corresponding unit test cases from 16 widely-used programming languages to reflect real-world usage scenarios rather than simple translations. Moreover, we also release an effective code sandbox execution tool (i.e., SandboxFusion) supporting various programming languages and packages to evaluate the performance of our FullStack Bench efficiently. Comprehensive experimental results on our FullStack Bench demonstrate the necessity and effectiveness of our FullStack Bench and SandboxFusion.
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model
Chen, Jiawei, Chen, Wentao, Su, Jing, Xu, Jingjing, Lin, Hongyu, Ren, Mengjie, Lu, Yaojie, Han, Xianpei, Sun, Le
Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. Based on our observations, we propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. During the learning process, multiple languages initially share a single knowledge system dominated by the primary language and gradually develop language-specific knowledge systems. Experimental results show that the internal state changes of the LLM are consistent with our Babel Tower Hypothesis. Building on these insights, we propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs, which significantly outperforms LLMs trained on the original corpus. The proposed Babel Tower Hypothesis provides new insights into designing pre-training data distributions to achieve optimal multilingual capabilities in LLMs.
Understanding Particles From Video: Property Estimation of Granular Materials via Visuo-Haptic Learning
Zhang, Zeqing, Zheng, Guangze, Ji, Xuebo, Chen, Guanqi, Jia, Ruixing, Chen, Wentao, Chen, Guanhua, Zhang, Liangjun, Pan, Jia
Granular materials (GMs) are ubiquitous in daily life, and understanding their properties is important, especially in agriculture and industry. However, existing methods require dedicated measurement equipment and considerable human effort to handle a large number of particles. In this paper, we introduce a method for estimating the relative values of particle size and density from video of the interaction with GMs. It is trained with a visuo-haptic learning framework inspired by a contact model, which reveals the strong correlation between GM properties and the visual-haptic data during probe-dragging in the GMs. After training, the network can map the visual modality well to the haptic signal and implicitly characterize the relative distribution of particle properties in its latent embeddings, as interpreted by that contact model. Therefore, we can analyze GM properties using the trained encoder, and only visual information is needed, without extra sensory modalities or human effort for labeling. The presented GM property estimator has been extensively validated via comparison and ablation experiments. Its generalization capability has also been evaluated, and a real-world application on the beach is demonstrated. Experiment videos are available at https://sites.google.com/view/gmwork/vhlearning
Improve Cross-Architecture Generalization on Dataset Distillation
Zhou, Binglin, Zhong, Linhao, Chen, Wentao
Dataset distillation, a pragmatic approach in machine learning, aims to create a smaller synthetic dataset from a larger existing dataset. However, existing distillation methods primarily adopt a model-based paradigm, where the synthetic dataset inherits model-specific biases, limiting its generalizability to alternative models. In response to this constraint, we propose a novel methodology termed "model pool". This approach involves selecting models from a diverse model pool based on a specific probability distribution during the data distillation process. Additionally, we integrate our model pool with the established knowledge distillation approach and apply knowledge distillation to the test process of the distilled dataset. Our experimental results validate the effectiveness of the model pool approach across a range of existing models at test time, demonstrating superior performance compared to existing methodologies.
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Wang, Yiqi, Chen, Wentao, Han, Xiaotian, Lin, Xudong, Zhao, Haiteng, Liu, Yongfei, Zhai, Bohan, Yuan, Jianbo, You, Quanzeng, Yang, Hongxia
Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. Particularly, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed different aspects of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs have not been systematically investigated. In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions. We believe our survey establishes a solid base and sheds light on this important topic, multimodal reasoning.
Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
Qiu, Jiahao, Yuan, Hui, Zhang, Jinghong, Chen, Wentao, Wang, Huazheng, Wang, Mengdi
Advances in biotechnology have demonstrated human's unprecedented capabilities to engineer proteins. They make it possible to directly design the amino acid sequences that encode proteins for desired functions, towards improving biochemical or enzymatic properties such as stability, binding affinity, or catalytic activity. Directed evolution (DE), for example, is a method for exploring new protein designs with properties of interest and maximal utility, by mimicking the natural evolution process. The development of DE was honored in 2018 with the awarding of the Nobel Prize in Chemistry to Frances Arnold for the directed evolution of enzymes, and George Smith and Gregory Winter for the development of phage display [3, 41, 48]. Even with the best and largest pre-trained protein language models such as ESM-1b [33] and ProGen2 [29], one often needs to explore an almost unknown domain and learn a new function map in order to discover new drugs. This is especially true with antibody engineering. Antibodies have highly diverse complementarity-determining region (CDR) sequences that can be altered, resulting in a huge sequence space to explore for optimal properties. The binding of antibodies to their targets are extrinsic properties of antibodies and it is difficult to accurately model the sequence-binding relationships solely from the sequences alone. Further, most of the exploration strategies used in practice lack theoretical guarantees.
Polymer-Based Self-Calibrated Optical Fiber Tactile Sensor
Chen, Wentao, Yan, Youcan, Zhang, Zeqing, Yang, Lei, Pan, Jia
Human skin can accurately sense the self-decoupled normal and shear forces when in contact with objects of different sizes. Although many soft and conformable tactile sensors in robotic applications are able to decouple the normal and shear forces, the impact of the size of the contacting object on the force calibration model has commonly been ignored. Here, using the principle that contact force can be derived from the light power loss in the soft optical fiber core, we present a soft tactile sensor that decouples normal and shear forces and calibrates the measurement results based on the object size, by designing a two-layered woven polymer-based optical fiber anisotropic structure embedded in a soft elastomer. Based on the anisotropic response of the optical fibers, we developed a linear calibration algorithm to simultaneously measure the size of the contact object and the decoupled normal and shear forces calibrated for the object size. By calibrating the sensor at the robotic arm tip, we show that robots can reconstruct the force vector at an average accuracy of 0.15 N for normal forces, 0.17 N for shear forces along the X-axis, and 0.18 N for shear forces along the Y-axis, within the sensing range of 0-2 N in all directions, and can measure object size at an average accuracy of 0.4 mm, within the test indenter diameter range of 5-12 mm.
Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment
Shan, Dandan, Li, Zihan, Chen, Wentao, Li, Qingde, Tian, Jie, Hong, Qingqi
Segmentation of COVID-19 lesions can assist physicians in better diagnosis and treatment of COVID-19. However, there are few relevant studies due to the lack of detailed information and high-quality annotation in the COVID-19 dataset. To solve the above problem, we propose C2FVL, a Coarse-to-Fine segmentation framework via Vision-Language alignment to merge text information containing the number of lesions and specific locations of image information. The introduction of text information allows the network to achieve better prediction results on challenging datasets. We conduct extensive experiments on two COVID-19 datasets including chest X-ray and CT, and the results demonstrate that our proposed method outperforms other state-of-the-art segmentation methods.