Jiao, Licheng
A match made in consistency heaven: when large language models meet evolutionary algorithms
Chao, Wang, Zhao, Jiaxuan, Jiao, Licheng, Li, Lingling, Liu, Fang, Yang, Shuyuan
Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the common collective and directionality of text sequence generation and evolution, this paper illustrates the strong consistency of LLMs and EAs, which includes multiple one-to-one key characteristics: token embedding and genotype-phenotype mapping, position encoding and fitness shaping, position embedding and selection, attention and crossover, feed-forward neural network and mutation, model training and parameter update, and multi-task learning and multi-objective optimization. Based on this consistency perspective, existing coupling studies are analyzed, including evolutionary fine-tuning and LLM-enhanced EAs. Leveraging these insights, we outline a fundamental roadmap for future research in coupling LLMs and EAs, while highlighting key challenges along the way. The consistency not only reveals the evolution mechanism behind LLMs but also facilitates the development of evolved artificial agents that approach or surpass biological organisms.
A Multi-objective Complex Network Pruning Framework Based on Divide-and-conquer and Global Performance Impairment Ranking
Shang, Ronghua, Zhu, Songling, Wu, Yinan, Zhang, Weitong, Jiao, Licheng, Xu, Songhua
Model compression plays a vital role in the practical deployment of deep neural networks (DNNs), and evolutionary multi-objective (EMO) pruning is an essential tool in balancing the compression rate and performance of the DNNs. However, due to its population-based nature, EMO pruning suffers from the complex optimization space and the resource-intensive structure verification process, especially in complex networks. To this end, a multi-objective complex network pruning framework based on divide-and-conquer and global performance impairment ranking (EMO-DIR) is proposed in this paper. Firstly, a divide-and-conquer EMO network pruning method is proposed, which decomposes the complex task of EMO pruning on the entire network into easier sub-tasks on multiple sub-networks. On the one hand, this decomposition narrows the pruning optimization space and decreases the optimization difficulty; on the other hand, the smaller network structure converges faster, so the proposed algorithm consumes lower computational resources. Secondly, a sub-network training method based on cross-network constraints is designed, which could bridge independent EMO pruning sub-tasks, allowing them to collaborate better and improving the overall performance of the pruned network. Finally, a multiple sub-networks joint pruning method based on EMO is proposed. This method combines the Pareto Fronts from EMO pruning results on multiple sub-networks through global performance impairment ranking to design a joint pruning scheme. The rich experiments on CIFAR-10/100 and ImageNet-100/1k are conducted. The proposed algorithm achieves a comparable performance with the state-of-the-art pruning methods.
Transferring Modality-Aware Pedestrian Attentive Learning for Visible-Infrared Person Re-identification
Guo, Yuwei, Zhang, Wenhao, Jiao, Licheng, Wang, Shuang, Wang, Shuo, Liu, Fang
Visible-infrared person re-identification (VI-ReID) aims to search the same pedestrian of interest across visible and infrared modalities. Existing models mainly focus on compensating for modality-specific information to reduce modality variation. However, these methods often lead to a higher computational overhead and may introduce interfering information when generating the corresponding images or features. To address this issue, it is critical to leverage pedestrian-attentive features and learn modality-complete and -consistent representation. In this paper, a novel Transferring Modality-Aware Pedestrian Attentive Learning (TMPA) model is proposed, focusing on the pedestrian regions to efficiently compensate for missing modality-specific features. Specifically, we propose a region-based data augmentation module PedMix to enhance pedestrian region coherence by mixing the corresponding regions from different modalities. A lightweight hybrid compensation module, i.e., the Modality Feature Transfer (MFT), is devised to integrate cross attention and convolution networks to fully explore the discriminative modality-complete features with minimal computational overhead. Extensive experiments conducted on the benchmark SYSU-MM01 and RegDB datasets demonstrated the effectiveness of our proposed TMPA model.
Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning
Ma, Yanbiao, Jiao, Licheng, Liu, Fang, Yang, Shuyuan, Liu, Xu, Li, Lingling
In scenarios with long-tailed distributions, the model's ability to identify tail classes is limited due to the under-representation of tail samples. Class rebalancing, information augmentation, and other techniques have been proposed to facilitate models to learn the potential distribution of tail classes. The disadvantage is that these methods generally pursue models with balanced class accuracy on the data manifold, while ignoring the ability of the model to resist interference. By constructing noisy data manifold, we found that the robustness of models trained on unbalanced data has a long-tail phenomenon. That is, even if the class accuracy is balanced on the data domain, it still has bias on the noisy data manifold. However, existing methods cannot effectively mitigate the above phenomenon, which makes the model vulnerable in long-tailed scenarios. In this work, we propose an Orthogonal Uncertainty Representation (OUR) of feature embedding and an end-to-end training strategy to improve the long-tail phenomenon of model robustness. As a general enhancement tool, OUR has excellent compatibility with other methods and does not require additional data generation, ensuring fast and efficient training. Comprehensive evaluations on long-tailed datasets show that our method significantly improves the long-tail phenomenon of robustness, bringing consistent performance gains to other long-tailed learning methods.
The Robust Semantic Segmentation UNCV2023 Challenge Results
Yu, Xuanlong, Zuo, Yi, Wang, Zitao, Zhang, Xiaowen, Zhao, Jiaxuan, Yang, Yuting, Jiao, Licheng, Peng, Rui, Wang, Xinyi, Zhang, Junpei, Zhang, Kexin, Liu, Fang, Alcover-Couso, Roberto, SanMiguel, Juan C., Escudero-Viñolo, Marcos, Tian, Hanlin, Matsui, Kenta, Wang, Tianhao, Adan, Fahmy, Gao, Zhitong, He, Xuming, Bouniot, Quentin, Moghaddam, Hossein, Rai, Shyam Nandan, Cermelli, Fabio, Masone, Carlo, Pilzer, Andrea, Ricci, Elisa, Bursuc, Andrei, Solin, Arno, Trapp, Martin, Li, Rui, Yao, Angela, Chen, Wenlong, Simpson, Ivor, Campbell, Neill D. F., Franchi, Gianni
This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent conferences in the fields of computer vision and machine learning and journals over the past few years. Within this document, the challenge is introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions. Moreover, the document aims to provide a comprehensive overview of the diverse solutions deployed by all participants. By doing so, it seeks to offer readers a deeper insight into the array of strategies that can be leveraged to effectively handle the inherent uncertainties associated with autonomous driving and semantic segmentation, especially within urban environments.
SoccerNet 2023 Challenges Results
Cioppa, Anthony, Giancola, Silvio, Somers, Vladimir, Magera, Floriane, Zhou, Xin, Mkhallati, Hassan, Deliège, Adrien, Held, Jan, Hinojosa, Carlos, Mansourian, Amir M., Miralles, Pierre, Barnich, Olivier, De Vleeschouwer, Christophe, Alahi, Alexandre, Ghanem, Bernard, Van Droogenbroeck, Marc, Kamal, Abdullah, Maglo, Adrien, Clapés, Albert, Abdelaziz, Amr, Xarles, Artur, Orcesi, Astrid, Scott, Atom, Liu, Bin, Lim, Byoungkwon, Chen, Chen, Deuser, Fabian, Yan, Feng, Yu, Fufu, Shitrit, Gal, Wang, Guanshuo, Choi, Gyusik, Kim, Hankyul, Guo, Hao, Fahrudin, Hasby, Koguchi, Hidenari, Ardö, Håkan, Salah, Ibrahim, Yerushalmy, Ido, Muhammad, Iftikar, Uchida, Ikuma, Be'ery, Ishay, Rabarisoa, Jaonary, Lee, Jeongae, Fu, Jiajun, Yin, Jianqin, Xu, Jinghang, Nang, Jongho, Denize, Julien, Li, Junjie, Zhang, Junpei, Kim, Juntae, Synowiec, Kamil, Kobayashi, Kenji, Zhang, Kexin, Habel, Konrad, Nakajima, Kota, Jiao, Licheng, Ma, Lin, Wang, Lizhi, Wang, Luping, Li, Menglong, Zhou, Mengying, Nasr, Mohamed, Abdelwahed, Mohamed, Liashuha, Mykola, Falaleev, Nikolay, Oswald, Norbert, Jia, Qiong, Pham, Quoc-Cuong, Song, Ran, Hérault, Romain, Peng, Rui, Chen, Ruilong, Liu, Ruixuan, Baikulov, Ruslan, Fukushima, Ryuto, Escalera, Sergio, Lee, Seungcheon, Chen, Shimin, Ding, Shouhong, Someya, Taiga, Moeslund, Thomas B., Li, Tianjiao, Shen, Wei, Zhang, Wei, Li, Wei, Dai, Wei, Luo, Weixin, Zhao, Wending, Zhang, Wenjie, Yang, Xinquan, Ma, Yanbiao, Joo, Yeeun, Zeng, Yingsen, Gan, Yiyang, Zhu, Yongqiang, Zhong, Yujie, Ruan, Zheng, Li, Zhiheng, Huang, Zhijian, Meng, Ziyu
The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards are available on https://www.soccer-net.org. Baselines and development kits can be found on https://github.com/SoccerNet.
Delving into Semantic Scale Imbalance
Ma, Yanbiao, Jiao, Licheng, Liu, Fang, Li, Yuxin, Yang, Shuyuan, Liu, Xu
Model bias triggered by long-tailed data has been widely studied. However, measure based on the number of samples cannot explicate three phenomena simultaneously: (1) Given enough data, the classification performance gain is marginal with additional samples. (2) Classification performance decays precipitously as the number of training samples decreases when there is insufficient data. (3) Model trained on sample-balanced datasets still has different biases for different classes. In this work, we define and quantify the semantic scale of classes, which is used to measure the feature diversity of classes. It is exciting to find experimentally that there is a marginal effect of semantic scale, which perfectly describes the first two phenomena. Further, the quantitative measurement of semantic scale imbalance is proposed, which can accurately reflect model bias on multiple datasets, even on sample-balanced data, revealing a novel perspective for the study of class imbalance. Due to the prevalence of semantic scale imbalance, we propose semantic-scale-balanced learning, including a general loss improvement scheme and a dynamic re-weighting training framework that overcomes the challenge of calculating semantic scales in real-time during iterations. Comprehensive experiments show that dynamic semantic-scale-balanced learning consistently enables the model to perform superiorly on large-scale long-tailed and non-long-tailed natural and medical datasets, which is a good starting point for mitigating the prevalent but unnoticed model bias.
Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification
Ma, Yanbiao, Jiao, Licheng, Liu, Fang, Yang, Shuyuan, Liu, Xu, Li, Lingling
To address the challenges of long-tailed classification, researchers have proposed several approaches to reduce model bias, most of which assume that classes with few samples are weak classes. However, recent studies have shown that tail classes are not always hard to learn, and model bias has been observed on sample-balanced datasets, suggesting the existence of other factors that affect model bias. In this work, we systematically propose a series of geometric measurements for perceptual manifolds in deep neural networks, and then explore the effect of the geometric characteristics of perceptual manifolds on classification difficulty and how learning shapes the geometric characteristics of perceptual manifolds. An unanticipated finding is that the correlation between the class accuracy and the separation degree of perceptual manifolds gradually decreases during training, while the negative correlation with the curvature gradually increases, implying that curvature imbalance leads to model bias. Therefore, we propose curvature regularization to facilitate the model to learn curvature-balanced and flatter perceptual manifolds. Evaluations on multiple long-tailed and non-long-tailed datasets show the excellent performance and exciting generality of our approach, especially in achieving significant performance improvements based on current state-of-the-art techniques. Our work opens up a geometric analysis perspective on model bias and reminds researchers to pay attention to model bias on non-long-tailed and even sample-balanced datasets. The code and model will be made public.
Bi-level Multi-objective Evolutionary Learning: A Case Study on Multi-task Graph Neural Topology Search
Wang, Chao, Jiao, Licheng, Zhao, Jiaxuan, Li, Lingling, Liu, Xu, Liu, Fang, Yang, Shuyuan
The construction of machine learning models involves many bi-level multi-objective optimization problems (BL-MOPs), where upper level (UL) candidate solutions must be evaluated via training weights of a model in the lower level (LL). Due to the Pareto optimality of sub-problems and the complex dependency across UL solutions and LL weights, an UL solution is feasible if and only if the LL weight is Pareto optimal. It is computationally expensive to determine which LL Pareto weight in the LL Pareto weight set is the most appropriate for each UL solution. This paper proposes a bi-level multi-objective learning framework (BLMOL), coupling the above decision-making process with the optimization process of the UL-MOP by introducing LL preference $r$. Specifically, the UL variable and $r$ are simultaneously searched to minimize multiple UL objectives by evolutionary multi-objective algorithms. The LL weight with respect to $r$ is trained to minimize multiple LL objectives via gradient-based preference multi-objective algorithms. In addition, the preference surrogate model is constructed to replace the expensive evaluation process of the UL-MOP. We consider a novel case study on multi-task graph neural topology search. It aims to find a set of Pareto topologies and their Pareto weights, representing different trade-offs across tasks at UL and LL, respectively. The found graph neural network is employed to solve multiple tasks simultaneously, including graph classification, node classification, and link prediction. Experimental results demonstrate that BLMOL can outperform some state-of-the-art algorithms and generate well-representative UL solutions and LL weights.
RSBNet: One-Shot Neural Architecture Search for A Backbone Network in Remote Sensing Image Recognition
Peng, Cheng, Li, Yangyang, Shang, Ronghua, Jiao, Licheng
Recently, a massive number of deep learning based approaches have been successfully applied to various remote sensing image (RSI) recognition tasks. However, most existing advances of deep learning methods in the RSI field heavily rely on the features extracted by the manually designed backbone network, which severely hinders the potential of deep learning models due the complexity of RSI and the limitation of prior knowledge. In this paper, we research a new design paradigm for the backbone architecture in RSI recognition tasks, including scene classification, land-cover classification and object detection. A novel one-shot architecture search framework based on weight-sharing strategy and evolutionary algorithm is proposed, called RSBNet, which consists of three stages: Firstly, a supernet constructed in a layer-wise search space is pretrained on a self-assembled large-scale RSI dataset based on an ensemble single-path training strategy. Next, the pre-trained supernet is equipped with different recognition heads through the switchable recognition module and respectively fine-tuned on the target dataset to obtain task-specific supernet. Finally, we search the optimal backbone architecture for different recognition tasks based on the evolutionary algorithm without any network training. Extensive experiments have been conducted on five benchmark datasets for different recognition tasks, the results show the effectiveness of the proposed search paradigm and demonstrate that the searched backbone is able to flexibly adapt different RSI recognition tasks and achieve impressive performance.