AITopics

Reliable detection and tracking of surrounding objects are indispensable for comprehensive motion prediction and planning of autonomous vehicles. Due to the limitations of individual sensors, the fusion of multiple sensor modalities is required to improve the overall detection capabilities. Additionally, robust motion tracking is essential for reducing the effect of sensor noise and improving state estimation accuracy. The reliability of the autonomous vehicle software becomes even more relevant in complex, adversarial high-speed scenarios at the vehicle handling limits in autonomous racing. In this paper, we present a modular multi-modal sensor fusion and tracking method for high-speed applications. The method is based on the Extended Kalman Filter (EKF) and is capable of fusing heterogeneous detection inputs to track surrounding objects consistently. A novel delay compensation approach enables to reduce the influence of the perception software latency and to output an updated object list. It is the first fusion and tracking method validated in high-speed real-world scenarios at the Indy Autonomous Challenge 2021 and the Autonomous Challenge at CES (AC@CES) 2022, proving its robustness and computational efficiency on embedded systems. It does not require any labeled data and achieves position tracking residuals below 0.1 m. The related code is available as open-source software at https://github.com/TUMFTM/FusionTracking.

detection, estimation, fusion, (15 more...)

doi: 10.1109/TIV.2023.3271624

2310.08114

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Singapore (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Automobiles & Trucks (0.89)
Leisure & Entertainment > Sports > Motorsports (0.67)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Continual Learning via Manifold Expansion Replay

Xu, Zihao, Tang, Xuan, Shi, Yufei, Zhang, Jianfeng, Yang, Jian, Chen, Mingsong, Wei, Xian

In continual learning, the learner learns multiple tasks in sequence, with data being acquired only once for each task. Catastrophic forgetting is a major challenge to continual learning. To reduce forgetting, some existing rehearsal-based methods use episodic memory to replay samples of previous tasks. However, in the process of knowledge integration when learning a new task, this strategy also suffers from catastrophic forgetting due to an imbalance between old and new knowledge. To address this problem, we propose a novel replay strategy called Manifold Expansion Replay (MaER). We argue that expanding the implicit manifold of the knowledge representation in the episodic memory helps to improve the robustness and expressiveness of the model. To this end, we propose a greedy strategy to keep increasing the diameter of the implicit manifold represented by the knowledge in the buffer during memory management. In addition, we introduce Wasserstein distance instead of cross entropy as distillation loss to preserve previous knowledge. With extensive experimental validation on MNIST, CIFAR10, CIFAR100, and TinyImageNet, we show that the proposed method significantly improves the accuracy in continual learning setup, outperforming the state of the arts.

continual learning, learning, manifold, (12 more...)

2310.08038

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.70)
Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

Yoon, Minji, Koh, Jing Yu, Hooi, Bryan, Salakhutdinov, Ruslan

Multimodal Graph Learning for Generative Tasks

Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize: for example, from plain text to image-caption pairs. Most multimodal learning algorithms focus on modeling simple one-to-one pairs of data from two modalities, such as image-caption pairs, or audio-text pairs. However, in most real-world settings, entities of different modalities interact with each other in more complex and multifaceted ways, going beyond one-to-one mappings. We propose to represent these complex relationships as graphs, allowing us to capture data with any number of modalities, and with complex relationships between modalities that can flexibly vary from one sample to another. Toward this goal, we propose Multimodal Graph Learning (MMGL), a general and systematic framework for capturing information from multiple multimodal neighbors with relational structures among them. In particular, we focus on MMGL for generative tasks, building upon pretrained Language Models (LMs), aiming to augment their text generation with multimodal neighbor contexts. We study three research questions raised by MMGL: (1) how can we infuse multiple neighbor information into the pretrained LMs, while avoiding scalability issues? (2) how can we infuse the graph structure information among multimodal neighbors into the LMs? and (3) how can we finetune the pretrained LMs to learn from the neighbor context in a parameter-efficient manner? We conduct extensive experiments to answer these three questions on MMGL and analyze the empirical results to pave the way for future MMGL research.

information, neighbor, neighbor information, (12 more...)

2310.07478

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Europe > Greece (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

Jin, Xisen, Ren, Xiang, Preotiuc-Pietro, Daniel, Cheng, Pengxiang

Dataless Knowledge Fusion by Merging Weights of Language Models

Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Oftentimes fine-tuned models are readily available but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well both across all data set domains and can generalize on out-ofdomain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, thus making it applicable to a wider set of scenarios. The dominant paradigm for solving NLP tasks ranging from classification to sequence tagging involves fine-tuning a pretrained language model (PLM) using task-specific labeled data (Devlin et al., 2019; He et al., 2021). This results in specialized models that are explicitly trained to run inference over a single domain and task.

conference paper, dataset, individual model, (15 more...)

2212.09849

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.90)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Pasi, Piyush Singh, Battepati, Karthikeya, Jyothi, Preethi, Ramakrishnan, Ganesh, Mahapatra, Tanmay, Singh, Manoj

Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration

arXiv.org Artificial IntelligenceOct-10-2023

The problem of audio-to-text alignment has seen significant amount of research using complete supervision during training. However, this is typically not in the context of long audio recordings wherein the text being queried does not appear verbatim within the audio file. This work is a collaboration with a non-governmental organization called CARE India that collects long audio health surveys from young mothers residing in rural parts of Bihar, India. Given a question drawn from a questionnaire that is used to guide these surveys, we aim to locate where the question is asked within a long audio recording. This is of great value to African and Asian organizations that would otherwise have to painstakingly go through long and noisy audio recordings to locate questions (and answers) of interest. Our proposed framework, INDENT, uses a cross-attention-based model and prior information on the temporal ordering of sentences to learn speech embeddings that capture the semantics of the underlying spoken text. These learnt embeddings are used to retrieve the corresponding audio segment based on text queries at inference time. We empirically demonstrate the significant effectiveness (improvement in R-avg of about 3%) of our model over those obtained using text-based heuristics. We also show how noisy ASR, generated using state-of-the-art ASR models for Indian languages, yields better results when used in place of speech. INDENT, trained only on Hindi data is able to cater to all languages supported by the (semantically) shared text space. We illustrate this empirically on 11 Indic languages.

audio recording, ndent, speech, (15 more...)

2310.06702

Country:

Asia > India > Bihar (0.34)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre:

Personal > Interview (0.50)
Research Report (0.50)
Questionnaire & Opinion Survey (0.35)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.40)

arXiv.org Artificial IntelligenceOct-10-2023

V2X-AHD:Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network

He, Caizhen, Wang, Hai, Chen, Long, Luo, Tong, Cai, Yingfeng

Object detection is the central issue of intelligent traffic systems, and recent advancements in single-vehicle lidar-based 3D detection indicate that it can provide accurate position information for intelligent agents to make decisions and plan. Compared with single-vehicle perception, multi-view vehicle-road cooperation perception has fundamental advantages, such as the elimination of blind spots and a broader range of perception, and has become a research hotspot. However, the current perception of cooperation focuses on improving the complexity of fusion while ignoring the fundamental problems caused by the absence of single-view outlines. We propose a multi-view vehicle-road cooperation perception system, vehicle-to-everything cooperative perception (V2X-AHD), in order to enhance the identification capability, particularly for predicting the vehicle's shape. At first, we propose an asymmetric heterogeneous distillation network fed with different training data to improve the accuracy of contour recognition, with multi-view teacher features transferring to single-view student features. While the point cloud data are sparse, we propose Spara Pillar, a spare convolutional-based plug-in feature extraction backbone, to reduce the number of parameters and improve and enhance feature extraction capabilities. Moreover, we leverage the multi-head self-attention (MSA) to fuse the single-view feature, and the lightweight design makes the fusion feature a smooth expression. The results of applying our algorithm to the massive open dataset V2Xset demonstrate that our method achieves the state-of-the-art result. The V2X-AHD can effectively improve the accuracy of 3D object detection and reduce the number of network parameters, according to this study, which serves as a benchmark for cooperative perception. The code for this article is available at https://github.com/feeling0414-lab/V2X-AHD.

algorithm, perception, point cloud, (14 more...)

2310.06603

Country: Asia > China > Jiangsu Province > Changzhou (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.68)
Information Technology (0.68)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Zhuang, Chen, Zhao, Hongbo

Plane Constraints Aided Multi-Vehicle Cooperative Positioning Using Factor Graph Optimization

arXiv.org Artificial IntelligenceOct-10-2023

The development of vehicle-to-vehicle (V2V) communication facil-itates the study of cooperative positioning (CP) techniques for vehicular applications. The CP methods can improve the posi-tioning availability and accuracy by inter-vehicle ranging and data exchange between vehicles. However, the inter-vehicle rang-ing can be easily interrupted due to many factors such as obsta-cles in-between two cars. Without inter-vehicle ranging, the other cooperative data such as vehicle positions will be wasted, leading to performance degradation of range-based CP methods. To fully utilize the cooperative data and mitigate the impact of inter-vehicle ranging loss, a novel cooperative positioning method aided by plane constraints is proposed in this paper. The positioning results received from cooperative vehicles are used to construct the road plane for each vehicle. The plane parameters are then introduced into CP scheme to impose constraints on positioning solutions. The state-of-art factor graph optimization (FGO) algo-rithm is employed to integrate the plane constraints with raw data of Global Navigation Satellite Systems (GNSS) as well as inter-vehicle ranging measurements. The proposed CP method has the ability to resist the interruptions of inter-vehicle ranging since the plane constraints are computed by just using position-related data. A vehicle can still benefit from the position data of cooperative vehicles even if the inter-vehicle ranging is unavaila-ble. The experimental results indicate the superiority of the pro-posed CP method in positioning performance over the existing methods, especially when the inter-ranging interruptions occur.

constraint, cp method, vehicle, (12 more...)

2310.06414

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.48)

arXiv.org Artificial IntelligenceOct-9-2023

Parameter Efficient Multi-task Model Fusion with Partial Linearization

Tang, Anke, Shen, Li, Luo, Yong, Zhan, Yibing, Hu, Han, Du, Bo, Chen, Yixin, Tao, Dacheng

Large pre-trained models have enabled significant advances in machine learning and served as foundation components. Model fusion methods, such as task arithmetic, have been proven to be powerful and scalable to incorporate fine-tuned weights from different tasks into a multi-task model. However, efficiently fine-tuning large pre-trained models on multiple downstream tasks remains challenging, leading to inefficient multi-task model fusion. In this work, we propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques like LoRA fine-tuning. Specifically, our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters. This allows us to leverage the the advantages of model fusion over linearized fine-tuning, while still performing fine-tuning and inference efficiently. We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model, outperforming standard adapter tuning and task arithmetic alone. Experimental results demonstrate the capabilities of our proposed partial linearization technique to effectively construct unified multi-task models via the fusion of fine-tuned task vectors. We evaluate performance over an increasing number of tasks and find that our approach outperforms standard parameter-efficient fine-tuning techniques. The results highlight the benefits of partial linearization for scalable and efficient multi-task model fusion.

fine-tuning, multi-task model, task vector, (10 more...)

2310.04742

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Iraq (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceOct-8-2023

Anyview: Generalizable Indoor 3D Object Detection with Variable Frames

Wu, Zhenyu, Xu, Xiuwei, Wang, Ziwei, Xia, Chong, Zhao, Linqing, Lu, Jiwen, Yan, Haibin

In this paper, we propose a novel network framework for indoor 3D object detection to handle variable input frame numbers in practical scenarios. Existing methods only consider fixed frames of input data for a single detector, such as monocular RGB-D images or point clouds reconstructed from dense multi-view RGB-D images. While in practical application scenes such as robot navigation and manipulation, the raw input to the 3D detectors is the RGB-D images with variable frame numbers instead of the reconstructed scene point cloud. However, the previous approaches can only handle fixed frame input data and have poor performance with variable frame input. In order to facilitate 3D object detection methods suitable for practical tasks, we present a novel 3D detection framework named AnyView for our practical applications, which generalizes well across different numbers of input frames with a single model. To be specific, we propose a geometric learner to mine the local geometric features of each input RGB-D image frame and implement local-global feature interaction through a designed spatial mixture module. Meanwhile, we further utilize a dynamic token strategy to adaptively adjust the number of extracted features for each frame, which ensures consistent global feature density and further enhances the generalization after fusion. Extensive experiments on the ScanNet dataset show our method achieves both great generalizability and high detection accuracy with a simple and clean architecture containing a similar amount of parameters with the baselines.

anyview, detection, point cloud, (12 more...)

2310.05346

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)

Li, Xibo, Patel, Shruti, Stronzek-Pfeifer, David, Büskens, Christof

Indoor Localization for an Autonomous Model Car: A Marker-Based Multi-Sensor Fusion Framework

arXiv.org Artificial IntelligenceOct-8-2023

Global navigation satellite systems readily provide accurate position information when localizing a robot outdoors. However, an analogous standard solution does not exist yet for mobile robots operating indoors. This paper presents an integrated framework for indoor localization and experimental validation of an autonomous driving system based on an advanced driver-assistance system (ADAS) model car. The global pose of the model car is obtained by fusing information from fiducial markers, inertial sensors and wheel odometry. In order to achieve robust localization, we investigate and compare two extensions to the Extended Kalman Filter; first with adaptive noise tuning and second with Chi-squared test for measurement outlier detection. An efficient and low-cost ground truth measurement method using a single LiDAR sensor is also proposed to validate the results. The performance of the localization algorithms is tested on a complete autonomous driving system with trajectory planning and model predictive control.

autonomous model car, indoor localization, marker-based multi-sensor fusion framework

doi: 10.1109/CASE56687.2023.10260399

2310.05198

Genre: Research Report (0.40)

Industry: Automobiles & Trucks (0.73)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.40)