Calgary
Representation learning of dynamic networks
Wang, Haixu, Cao, Jiguo, Pei, Jian
This study presents a novel representation learning model tailored for dynamic networks, which describe the continuously evolving relationships among individuals within a population. The problem is framed as a dimension reduction problem in functional data analysis. With dynamic networks represented as matrix-valued functions, our objective is to map this functional data into a set of vector-valued functions in a lower-dimensional learning space. This space, defined as a metric functional space, allows for the calculation of norms and inner products. By constructing this learning space, we address (i) attribute learning, (ii) community detection, and (iii) link prediction and recovery of individual nodes in the dynamic network. Our model also accommodates asymmetric low-dimensional representations, enabling the separate study of nodes' regulatory and receiving roles. Crucially, the learning method accounts for the time-dependency of networks, ensuring that representations are continuous over time. The functional learning space we define naturally spans the time frame of the dynamic networks, facilitating both the inference of network links at specific time points and the reconstruction of the entire network structure without direct observation. We validated our approach through simulation studies and real-world applications. In simulations, we compared our method's link prediction performance to existing approaches under various data corruption scenarios. For real-world applications, we examined a dynamic social network replicated across six ant populations, demonstrating that our low-dimensional learning space effectively captures interactions, roles of individual ants, and the social evolution of the network. Our findings align with existing knowledge of ant colony behavior.
A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM
Liao, Zongbo, Zhang, Xuanxuan, Zhang, Tianxiang, Li, Zhi, Zheng, Zhenqi, Wen, Zhichao, Li, You
LiDAR is widely used in Simultaneous Localization and Mapping (SLAM) and autonomous driving. LiDAR odometry is of great importance in multi-sensor fusion. However, in some unstructured environments, point cloud registration cannot constrain the poses of the LiDAR because of the environments' sparse geometric features, which degrades multi-sensor fusion accuracy. To address this problem, we propose a novel real-time approach to sense and compensate for the degeneracy of LiDAR. First, this paper introduces a degeneracy factor with a clear meaning, which measures the degeneracy of LiDAR. Then, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering method adaptively perceives the degeneracy with better environmental generalization. Finally, the degeneracy perception results are utilized to fuse LiDAR and IMU, thus effectively resisting degeneracy effects. Experiments on our dataset show the method's high accuracy and robustness and validate our algorithm's adaptability to different environments and LiDAR scanning modalities.
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Deitke, Matt, Clark, Christopher, Lee, Sangho, Tripathi, Rohun, Yang, Yue, Park, Jae Sung, Salehi, Mohammadreza, Muennighoff, Niklas, Lo, Kyle, Soldaini, Luca, Lu, Jiasen, Anderson, Taira, Bransom, Erin, Ehsani, Kiana, Ngo, Huong, Chen, YenSung, Patel, Ajay, Yatskar, Mark, Callison-Burch, Chris, Head, Andrew, Hendrix, Rose, Bastani, Favyen, VanderBilt, Eli, Lambert, Nathan, Chou, Yvonne, Chheda, Arnavi, Sparks, Jenna, Skjonsberg, Sam, Schmitz, Michael, Sarnat, Aaron, Bischoff, Byron, Walsh, Pete, Newell, Chris, Wolters, Piper, Gupta, Tanmay, Zeng, Kuo-Hao, Borchardt, Jon, Groeneveld, Dirk, Nam, Crystal, Lebrecht, Sophie, Wittlif, Caitlin, Schoenick, Carissa, Michel, Oscar, Krishna, Ranjay, Weihs, Luca, Smith, Noah A., Hajishirzi, Hannaneh, Girshick, Ross, Farhadi, Ali, Kembhavi, Aniruddha
Today's most advanced vision-language models (VLMs) remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed VLMs into open ones. As a result, the community has been missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key contribution is a collection of new datasets called PixMo, including a dataset of highly detailed image captions for pre-training, a free-form image Q&A dataset for fine-tuning, and an innovative 2D pointing dataset, all collected without the use of external VLMs. The success of our approach relies on careful modeling choices, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets. Our best-in-class 72B model not only outperforms others in the class of open weight and data models, but also outperforms larger proprietary models including Claude 3.5 Sonnet, and Gemini 1.5 Pro and Flash, second only to GPT-4o based on both academic benchmarks and on a large human evaluation. Our model weights, new datasets, and source code are available at https://molmo.allenai.org/blog.
MACAW: A Causal Generative Model for Medical Imaging
Vigneshwaran, Vibujithan, Ohara, Erik, Wilms, Matthias, Forkert, Nils
Although deep learning techniques show promising results for many neuroimaging tasks in research settings, they have not yet found widespread use in clinical scenarios. One of the reasons for this problem is that many machine learning models only identify correlations between the input images and the outputs of interest, which can lead to many practical problems, such as encoding of uninformative biases and reduced explainability. Thus, recent research is exploring whether integrating a priori causal knowledge into deep learning models is a potential avenue to identify these problems. This work introduces a new causal generative architecture named Masked Causal Flow (MACAW) for neuroimaging applications. Within this context, three main contributions are described. First, a novel approach that integrates complex causal structures into normalizing flows is proposed. Second, counterfactual prediction is performed to identify the changes in effect variables associated with a cause variable. Finally, an explicit Bayesian inference for classification is derived and implemented, providing an inherent uncertainty estimation. The feasibility of the proposed method was first evaluated using synthetic data and then using MRI brain data from more than 23,000 participants of the UK Biobank study. The evaluation results show that the proposed method can (1) accurately encode causal reasoning and generate counterfactuals highlighting the structural changes in the brain known to be associated with aging, (2) accurately predict a subject's age from a single 2D MRI slice, and (3) generate new samples assuming other values for subject-specific indicators such as age, sex, and body mass index. The code for a toy dataset is available at the following link: https://github.com/vibujithan/macaw-2D.git.
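The Bayesian classification step mentioned in the abstract can be illustrated with a minimal sketch. It assumes only that a generative model supplies a class-conditional log-likelihood log p(x | y) for each candidate class; the function names and the use of entropy as an uncertainty summary are illustrative assumptions, not the MACAW implementation.

```python
import math

def class_posterior(log_likelihoods, priors):
    """Bayes' rule in log space: p(y | x) is proportional to p(x | y) * p(y).

    log_likelihoods[k] is assumed to be log p(x | y = k) from a generative
    model; priors[k] is p(y = k). Both are hypothetical inputs here.
    """
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(probs):
    """Shannon entropy (nats) of the posterior, one simple uncertainty summary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Toy usage: three classes with made-up log-likelihoods and a uniform prior.
posterior = class_posterior([-120.0, -118.5, -125.0], [1 / 3, 1 / 3, 1 / 3])
print(posterior, entropy(posterior))
```

Because the posterior is a full distribution over classes rather than a single label, a flat posterior (high entropy) signals an uncertain prediction.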
Real-Time Energy-Optimal Path Planning for Electric Vehicles
Ahmadi, Saman, Tack, Guido, Harabor, Daniel, Kilby, Philip, Jalili, Mahdi
The rapid adoption of electric vehicles (EVs) in modern transport systems has made energy-aware routing a critical task in their successful integration, especially within large-scale networks. In cases where an EV's remaining energy is limited and charging locations are not easily accessible, some destinations may only be reachable through an energy-optimal path: a route that consumes less energy than all other alternatives. The feasibility of such energy-efficient paths depends heavily on the accuracy of the energy model used for planning, and thus failing to account for vehicle dynamics can lead to inaccurate energy estimates, rendering some planned routes infeasible in reality. This paper explores the impact of vehicle dynamics on energy-optimal path planning for EVs. We develop an accurate energy model that incorporates key vehicle dynamics parameters into energy calculations, thereby reducing the risk of planning infeasible paths under battery constraints. The paper also introduces two novel online reweighting functions that allow for faster, preprocessing-free pathfinding in the presence of negative energy costs resulting from regenerative braking, making them ideal for real-time applications. Through extensive experimentation on real-world transport networks, we demonstrate that our approach considerably enhances energy-optimal pathfinding for EVs in both computational efficiency and energy estimation accuracy.
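To illustrate why reweighting helps with negative energy costs, the sketch below uses the classical potential-shifting idea (as in Johnson's algorithm) with an elevation-based potential. It is not the paper's reweighting functions, which the abstract does not specify. The toy energy model, the efficiencies `eta_drive` and `eta_regen`, and the example graph are all assumptions made for illustration.

```python
import heapq

G = 9.81  # gravitational acceleration in m/s^2

def edge_energy(mass, h_from, h_to, rolling, eta_drive=0.9, eta_regen=0.6):
    """Toy edge cost in joules: rolling loss plus an elevation term.

    Uphill, the motor pays the potential energy divided by eta_drive.
    Downhill, regenerative braking recovers a fraction eta_regen of it,
    so the cost can be negative.
    """
    dh = h_to - h_from
    if dh >= 0:
        return rolling + mass * G * dh / eta_drive
    return rolling + eta_regen * mass * G * dh

def potential(mass, height, eta_regen=0.6):
    """Node potential chosen so that every shifted edge cost is non-negative."""
    return eta_regen * mass * G * height

def energy_optimal(graph, heights, rolling, mass, src, dst):
    """Dijkstra on shifted costs c'(u, v) = c(u, v) + p(u) - p(v).

    Returns the true minimum energy from src to dst, or None if unreachable.
    """
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        if u == dst:
            # Undo the shift: shifted path cost = true cost + p(src) - p(dst).
            return d - potential(mass, heights[src]) + potential(mass, heights[dst])
        for v in graph.get(u, []):
            cost = edge_energy(mass, heights[u], heights[v], rolling[(u, v)])
            shifted = cost + potential(mass, heights[u]) - potential(mass, heights[v])
            shifted = max(shifted, 0.0)  # guard against floating-point round-off
            nd = d + shifted
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None

# Hypothetical four-node road network with node elevations in metres.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
heights = {"A": 100.0, "B": 150.0, "C": 80.0, "D": 20.0}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
rolling = {edge: 100_000.0 for edge in edges}
best = energy_optimal(graph, heights, rolling, 1500.0, "A", "D")
print(best)
```

The shift needs only node elevations, so no preprocessing pass over the network is required before the search. The result for the mostly downhill trip is negative, meaning a net energy gain under this toy model.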
Continuous K-space Recovery Network with Image Guidance for Fast MRI Reconstruction
Meng, Yucong, Yang, Zhiwei, Duan, Minghong, Shi, Yonghong, Song, Zhijian
Magnetic resonance imaging (MRI) is a crucial tool for clinical diagnosis but suffers from long scanning times. To reduce the acquisition time, fast MRI reconstruction aims to restore high-quality images from the undersampled k-space. Existing methods typically train deep learning models to map the undersampled data to artifact-free MRI images. However, these studies often overlook the unique properties of k-space and directly apply general networks designed for image processing to k-space recovery, leaving the precise learning of k-space largely underexplored. In this work, we propose a continuous k-space recovery network from a new perspective of implicit neural representation with image domain guidance, which boosts the performance of MRI reconstruction. Specifically, (1) an implicit neural representation-based encoder-decoder structure is customized to continuously query unsampled k-values; (2) an image guidance module is designed to mine the semantic information from the low-quality MRI images to further guide the k-space recovery; and (3) a multi-stage training strategy is proposed to recover dense k-space progressively. Extensive experiments conducted on the CC359, fastMRI, and IXI datasets demonstrate the effectiveness of our method and its superiority over other competitors.
Building 6G Radio Foundation Models with Transformer Architectures
Aboulfotouh, Ahmed, Eshaghbeigi, Ashkan, Abou-Zeid, Hatem
Foundation deep learning (DL) models are general models, designed to learn general, robust and adaptable representations of their target modality, enabling finetuning across a range of downstream tasks. These models are pretrained on large, unlabeled datasets using self-supervised learning (SSL). Foundation models have demonstrated better generalization than traditional supervised approaches, a critical requirement for wireless communications where the dynamic environment demands model adaptability. In this work, we propose and demonstrate the effectiveness of a Vision Transformer (ViT) as a radio foundation model for spectrogram learning. We introduce a Masked Spectrogram Modeling (MSM) approach to pretrain the ViT in a self-supervised fashion. We evaluate the ViT-based foundation model on two downstream tasks: Channel State Information (CSI)-based Human Activity sensing and Spectrogram Segmentation. Experimental results demonstrate competitive performance to supervised training while generalizing across diverse domains. Notably, the pretrained ViT model outperforms a four-times larger model that is trained from scratch on the spectrogram segmentation task, while requiring significantly less training time, and achieves competitive performance on the CSI-based human activity sensing task. This work demonstrates the effectiveness of ViT with MSM for pretraining as a promising technique for scalable foundation model development in future 6G networks.
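The masking step behind Masked Spectrogram Modeling can be sketched as follows. This is a minimal illustration in the style of masked-autoencoder pretraining: the patch size, the mask ratio, zero-filling of masked patches, and computing the loss on masked patches only are assumptions, since the abstract does not give these details, and the function names are hypothetical.

```python
import random

def mask_patches(spec, patch, mask_ratio, seed=0):
    """Zero out a random subset of non-overlapping patch x patch tiles.

    spec is a 2D spectrogram given as a list of equal-length rows.
    Returns (masked copy, set of masked tile indices in row-major order).
    """
    rows, cols = len(spec), len(spec[0])
    if rows % patch or cols % patch:
        raise ValueError("spectrogram dimensions must be multiples of patch")
    tiles_per_row = cols // patch
    n_tiles = (rows // patch) * tiles_per_row
    n_masked = int(round(mask_ratio * n_tiles))
    rng = random.Random(seed)
    masked_ids = set(rng.sample(range(n_tiles), n_masked))
    out = [row[:] for row in spec]  # copy so the input is left untouched
    for t in masked_ids:
        r0, c0 = (t // tiles_per_row) * patch, (t % tiles_per_row) * patch
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                out[r][c] = 0.0
    return out, masked_ids

def masked_mse(pred, target, masked_ids, patch):
    """Mean squared reconstruction error over the masked tiles only."""
    tiles_per_row = len(target[0]) // patch
    total, count = 0.0, 0
    for t in masked_ids:
        r0, c0 = (t // tiles_per_row) * patch, (t % tiles_per_row) * patch
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                total += (pred[r][c] - target[r][c]) ** 2
                count += 1
    return total / count if count else 0.0

# Toy usage: an 8x8 spectrogram of ones, 2x2 patches, 75% of patches masked.
spec = [[1.0] * 8 for _ in range(8)]
masked, ids = mask_patches(spec, patch=2, mask_ratio=0.75)
print(len(ids), masked_mse(masked, spec, ids, patch=2))
```

In pretraining, a model would take the masked spectrogram as input and be trained to minimize a loss such as `masked_mse` between its reconstruction and the original, so no labels are needed.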
Adversarial Attacks Using Differentiable Rendering: A Survey
Hull, Matthew, Zhang, Chao, Kira, Zsolt, Chau, Duen Horng
Differentiable rendering methods have emerged as a promising means for generating photo-realistic and physically plausible adversarial attacks by manipulating 3D objects and scenes that can deceive deep neural networks (DNNs). Recently, differentiable rendering capabilities have evolved significantly into a diverse landscape of libraries, such as Mitsuba and PyTorch3D, and methods, such as Neural Radiance Fields and 3D Gaussian Splatting, for solving inverse rendering problems; these share conceptually similar properties with techniques commonly used to attack DNNs, such as back-propagation and optimization. However, the adversarial machine learning research community has not yet fully explored or understood such capabilities for generating attacks. Some key reasons are that researchers often have different attack goals, such as misclassification or misdetection, and use different tasks to accomplish these goals by manipulating different representations in a scene, such as the mesh or texture of an object. This survey adopts a task-oriented unifying framework that systematically summarizes common tasks, such as manipulating textures, altering illumination, and modifying 3D meshes to exploit vulnerabilities in DNNs. Our framework enables easy comparison of existing works, reveals research gaps, and spotlights exciting future research directions in this rapidly evolving field. By focusing on how these tasks enable attacks on various DNNs, such as those for image classification, facial recognition, object detection, optical flow, and depth estimation, our survey helps researchers and practitioners better understand the vulnerabilities of computer vision systems against photorealistic adversarial attacks that could threaten real-world applications.
Self-Supervised Radio Pre-training: Toward Foundational Models for Spectrogram Learning
Aboulfotouh, Ahmed, Eshaghbeigi, Ashkan, Karslidis, Dimitrios, Abou-Zeid, Hatem
Foundational deep learning (DL) models, which are general models trained on large, diverse, and unlabelled datasets, typically using self-supervised learning techniques, have led to significant advancements, especially in natural language processing. These pretrained models can be fine-tuned for related downstream tasks, offering faster development and reduced training costs, while often achieving improved performance. In this work, we introduce Masked Spectrogram Modeling, a novel self-supervised learning approach for pretraining foundational DL models on radio signals. Adopting a Convolutional LSTM architecture for efficient spatio-temporal processing, we pretrain the model with an unlabelled radio dataset collected from over-the-air measurements. Subsequently, the pretrained model is fine-tuned for two downstream tasks: spectrum forecasting and segmentation. Experimental results demonstrate that our methodology achieves competitive performance in both forecasting accuracy and segmentation, validating its effectiveness for developing foundational radio models.
Direct Speech-to-Speech Neural Machine Translation: A Survey
Gupta, Mahendra, Dutta, Maitreyee, Maurya, Chandresh Kumar
Speech-to-Speech Translation (S2ST) models transform speech from one language into a target language while retaining the same linguistic information. S2ST is important for bridging the communication gap among communities and has diverse applications. In recent years, researchers have introduced direct S2ST models, which have the potential to translate speech without relying on intermediate text generation, achieve better decoding latency, and preserve paralinguistic and non-linguistic features. However, direct S2ST has yet to achieve the quality needed for seamless communication and still lags behind cascade models in terms of performance, especially in real-world translation. To the best of our knowledge, no comprehensive survey is available on direct S2ST systems that beginners and advanced researchers can consult for a quick overview. The present work provides a comprehensive review of direct S2ST models, data and application issues, and performance metrics. We critically analyze the models' performance over the benchmark datasets and provide research challenges and future directions.