Goto

Collaborating Authors

Wang, Zhe


ST3D++: Denoised Self-training for Unsupervised Domain Adaptation on 3D Object Detection

arXiv.org Artificial Intelligence

In this paper, we present a self-training method, named ST3D++, with a holistic pseudo label denoising pipeline for unsupervised domain adaptation on 3D object detection. ST3D++ aims at reducing noise in pseudo label generation as well as alleviating the negative impacts of noisy pseudo labels on model training. First, ST3D++ pre-trains the 3D object detector on the labeled source domain with random object scaling (ROS) which is designed to reduce target domain pseudo label noise arising from object scale bias of the source domain. Then, the detector is progressively improved through alternating between generating pseudo labels and training the object detector with pseudo-labeled target domain data. Here, we equip the pseudo label generation process with a hybrid quality-aware triplet memory to improve the quality and stability of generated pseudo labels. Meanwhile, in the model training stage, we propose a source data assisted training strategy and a curriculum data augmentation policy to effectively rectify noisy gradient directions and avoid model over-fitting to noisy pseudo labeled data. These specific designs enable the detector to be trained on meticulously refined pseudo labeled target data with denoised training signals, and thus effectively facilitate adapting an object detector to a target domain without requiring annotations. Finally, our method is assessed on four 3D benchmark datasets (i.e., Waymo, KITTI, Lyft, and nuScenes) for three common categories (i.e., car, pedestrian and bicycle). ST3D++ achieves state-of-the-art performance on all evaluated settings, outperforming the corresponding baseline by a large margin (e.g., 9.6% $\sim$ 38.16% on Waymo $\rightarrow$ KITTI in terms of AP$_{\text{3D}}$), and even surpasses the fully supervised oracle results on the KITTI 3D object detection benchmark with target prior. Code will be available.


Time-series Imputation of Temporally-occluded Multiagent Trajectories

arXiv.org Artificial Intelligence

In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. However, in many settings, only sporadic observations of agents may be available in a given trajectory sequence. For instance, in football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses forward- and backward-information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model for the off-screen player state estimation setting. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football.


From Motor Control to Team Play in Simulated Humanoid Football

arXiv.org Artificial Intelligence

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.


Game Plan: What AI can do for Football, and What Football can do for AI

Journal of Artificial Intelligence Research

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players’ and coordinated teams’ behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).


Relate and Predict: Structure-Aware Prediction with Jointly Optimized Neural DAG

arXiv.org Artificial Intelligence

Understanding relationships between feature variables is one important way humans use to make decisions. However, state-of-the-art deep learning studies either focus on task-agnostic statistical dependency learning or do not model explicit feature dependencies during prediction. We propose a deep neural network framework, dGAP, to learn neural dependency Graph and optimize structure-Aware target Prediction simultaneously. dGAP trains towards a structure self-supervision loss and a target prediction loss jointly. Our method leads to an interpretable model that can disentangle sparse feature relationships, informing the user how relevant dependencies impact the target task. We empirically evaluate dGAP on multiple simulated and real datasets. dGAP is not only more accurate, but can also recover correct dependency structure.


FFConv: Fast Factorized Neural Network Inference on Encrypted Data

arXiv.org Artificial Intelligence

Homomorphic Encryption (HE), allowing computations on encrypted data (ciphertext) without decrypting it first, enables secure but prohibitively slow Neural Network (HENN) inference for privacy-preserving applications in clouds. To reduce HENN inference latency, one approach is to pack multiple messages into a single ciphertext in order to reduce the number of ciphertexts and support massive parallelism of Homomorphic Multiply-Add (HMA) operations between ciphertexts. However, different ciphertext packing schemes have to be designed for different convolution layers and each of them introduces overheads that are far more expensive than HMA operations. In this paper, we propose a low-rank factorization method called FFConv to unify convolution and ciphertext packing. To our knowledge, FFConv is the first work that is capable of accelerating the overheads induced by different ciphertext packing schemes simultaneously, without incurring a significant increase in noise budget. Compared to prior art LoLa and Falcon, our method reduces the inference latency by up to 87% and 12%, respectively, with comparable accuracy on MNIST and CIFAR-10.


Multi-Modality Cut and Paste for 3D Object Detection

arXiv.org Artificial Intelligence

Three-dimensional (3D) object detection is essential in autonomous driving. There are observations that multi-modality methods based on both point cloud and imagery features perform only marginally better or sometimes worse than approaches that solely use single-modality point cloud. This paper investigates the reason behind this counter-intuitive phenomenon through a careful comparison between augmentation techniques used by single modality and multi-modality methods. We found that existing augmentations practiced in single-modality detection are equally useful for multi-modality detection. Then we further present a new multi-modality augmentation approach, Multi-mOdality Cut and pAste (MoCa). MoCa boosts detection performance by cutting point cloud and imagery patches of ground-truth objects and pasting them into different scenes in a consistent manner while avoiding collision between objects. We also explore beneficial architecture design and optimization practices in implementing a good multi-modality detector. Without using ensemble of detectors, our multi-modality detector achieves new state-of-the-art performance on nuScenes dataset and competitive performance on KITTI 3D benchmark. Our method also wins the best PKL award in the 3rd nuScenes detection challenge. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.


Cross-Layer Distillation with Semantic Calibration

arXiv.org Artificial Intelligence

Recently proposed knowledge distillation approaches based on feature-map transfer validate that intermediate layers of a teacher model can serve as effective targets for training a student model to obtain better generalization ability. Existing studies mainly focus on particular representation forms for knowledge transfer between manually specified pairs of teacher-student intermediate layers. However, semantics of intermediate layers may vary in different networks and manual association of layers might lead to negative regularization caused by semantic mismatch between certain teacher-student layer pairs. To address this problem, we propose Semantic Calibration for Cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple layers rather than a single fixed intermediate layer from the teacher model for appropriate cross-layer supervision in training. Consistent improvements over state-of-the-art approaches are observed in extensive experiments with various network architectures for teacher and student models, demonstrating the effectiveness and flexibility of the proposed attention based soft layer association mechanism for cross-layer distillation.


A Multi-intersection Vehicular Cooperative Control based on End-Edge-Cloud Computing

arXiv.org Artificial Intelligence

Cooperative Intelligent Transportation Systems (C-ITS) will change the modes of road safety and traffic management, especially at intersections without traffic lights, namely unsignalized intersections. Existing researches focus on vehicle control within a small area around an unsignalized intersection. In this paper, we expand the control domain to a large area with multiple intersections. In particular, we propose a Multi-intersection Vehicular Cooperative Control (MiVeCC) to enable cooperation among vehicles in a large area with multiple unsignalized intersections. Firstly, a vehicular end-edge-cloud computing framework is proposed to facilitate end-edge-cloud vertical cooperation and horizontal cooperation among vehicles. Then, the vehicular cooperative control problems in the cloud and edge layers are formulated as Markov Decision Process (MDP) and solved by two-stage reinforcement learning. Furthermore, to deal with high-density traffic, vehicle selection methods are proposed to reduce the state space and accelerate algorithm convergence without performance degradation. A multi-intersection simulation platform is developed to evaluate the proposed scheme. Simulation results show that the proposed MiVeCC can improve travel efficiency at multiple intersections by up to 4.59 times without collision compared with existing methods.


Game Plan: What AI can do for Football, and What Football can do for AI

arXiv.org Artificial Intelligence

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).