Goto

Collaborating Authors

 Banff


Understanding and Mitigating the High Computational Cost in Path Data Diffusion

arXiv.org Artificial Intelligence

Advancements in mobility services, navigation systems, and smart transportation technologies have made it possible to collect large amounts of path data. Modeling the distribution of this path data, known as the Path Generation (PG) problem, is crucial for understanding urban mobility patterns and developing intelligent transportation systems. Recent studies have explored using diffusion models to address the PG problem due to their ability to capture multimodal distributions and support conditional generation. A recent work devises a diffusion process explicitly in graph space and achieves state-of-the-art performance. However, this method suffers a high computation cost in terms of both time and memory, which prohibits its application. In this paper, we analyze this method both theoretically and experimentally and find that the main culprit of its high computation cost is its explicit design of the diffusion process in graph space. To improve efficiency, we devise a Latent-space Path Diffusion (LPD) model, which operates in latent space instead of graph space. Our LPD significantly reduces both time and memory costs by up to 82.8% and 83.1%, respectively. Despite these reductions, our approach does not suffer from performance degradation. It outperforms the state-of-the-art method in most scenarios by 24.5%~34.0%.


Semantic Communication based on Generative AI: A New Approach to Image Compression and Edge Optimization

arXiv.org Artificial Intelligence

As digital technologies advance, communication networks face challenges in handling the vast data generated by intelligent devices. Autonomous vehicles, smart sensors, and IoT systems necessitate new paradigms. This thesis addresses these challenges by integrating semantic communication and generative models for optimized image compression and edge network resource allocation. Unlike bit-centric systems, semantic communication prioritizes transmitting meaningful data specifically selected to convey the meaning rather than obtain a faithful representation of the original data. The communication infrastructure can benefit to significant improvements in bandwidth efficiency and latency reduction. Central to this work is the design of semantic-preserving image compression using Generative Adversarial Networks and Denoising Diffusion Probabilistic Models. These models compress images by encoding only semantically relevant features, allowing for high-quality reconstruction with minimal transmission. Additionally, a Goal-Oriented edge network optimization framework is introduced, leveraging the Information Bottleneck principle and stochastic optimization to dynamically allocate resources and enhance efficiency. By integrating semantic communication into edge networks, this approach balances computational efficiency and communication effectiveness, making it suitable for real-time applications. The thesis compares semantic-aware models with conventional image compression techniques using classical and semantic evaluation metrics. Results demonstrate the potential of combining generative AI and semantic communication to create more efficient semantic-goal-oriented communication networks that meet the demands of modern data-driven applications.


Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play

arXiv.org Artificial Intelligence

As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model's general decision logic, and inductive illusion, where the victim model's general decision logic is shaped by specific stimuli. The former exploits the model's decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework, addressing vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluates the methodology under a variety of attack scenarios.


A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

arXiv.org Machine Learning

Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function's optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of the popular information-theoretic acquisition function, named Max-value Entropy Search. Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.


Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information

arXiv.org Artificial Intelligence

We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space into a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management.


Objects matter: object-centric world models improve reinforcement learning in visually complex environments

arXiv.org Artificial Intelligence

Deep reinforcement learning has achieved remarkable success in learning control policies from pixels across a wide range of tasks, yet its application remains hindered by low sample efficiency, requiring significantly more environment interactions than humans to reach comparable performance. Model-based reinforcement learning (MBRL) offers a solution by leveraging learnt world models to generate simulated experience, thereby improving sample efficiency. However, in visually complex environments, small or dynamic elements can be critical for decision-making. Yet, traditional MBRL methods in pixel-based environments typically rely on auto-encoding with an $L_2$ loss, which is dominated by large areas and often fails to capture decision-relevant details. To address these limitations, we propose an object-centric MBRL pipeline, which integrates recent advances in computer vision to allow agents to focus on key decision-related elements. Our approach consists of four main steps: (1) annotating key objects related to rewards and goals with segmentation masks, (2) extracting object features using a pre-trained, frozen foundation vision model, (3) incorporating these object features with the raw observations to predict environmental dynamics, and (4) training the policy using imagined trajectories generated by this object-centric world model. Building on the efficient MBRL algorithm STORM, we call this pipeline OC-STORM. We demonstrate OC-STORM's practical value in overcoming the limitations of conventional MBRL approaches on both Atari games and the visually complex game Hollow Knight.


Crossfire: An Elastic Defense Framework for Graph Neural Networks Under Bit Flip Attacks

arXiv.org Artificial Intelligence

Bit Flip Attacks (BFAs) are a well-established class of adversarial attacks, originally developed for Convolutional Neural Networks within the computer vision domain. Most recently, these attacks have been extended to target Graph Neural Networks (GNNs), revealing significant vulnerabilities. This new development naturally raises questions about the best strategies to defend GNNs against BFAs, a challenge for which no solutions currently exist. Given the applications of GNNs in critical fields, any defense mechanism must not only maintain network performance, but also verifiably restore the network to its pre-attack state. Verifiably restoring the network to its pre-attack state also eliminates the need for costly evaluations on test data to ensure network quality. We offer first insights into the effectiveness of existing honeypot- and hashing-based defenses against BFAs adapted from the computer vision domain to GNNs, and characterize the shortcomings of these approaches. To overcome their limitations, we propose Crossfire, a hybrid approach that exploits weight sparsity and combines hashing and honeypots with bit-level correction of out-of-distribution weight elements to restore network integrity. Crossfire is retraining-free and does not require labeled data. Averaged over 2,160 experiments on six benchmark datasets, Crossfire offers a 21.8% higher probability than its competitors of reconstructing a GNN attacked by a BFA to its pre-attack state. These experiments cover up to 55 bit flips from various attacks. Moreover, it improves post-repair prediction quality by 10.85%. Computational and storage overheads are negligible compared to the inherent complexity of even the simplest GNNs.


A Unified Blockwise Measurement Design for Learning Quantum Channels and Lindbladians via Low-Rank Matrix Sensing

arXiv.org Machine Learning

Quantum superoperator learning is a pivotal task in quantum information science, enabling accurate reconstruction of unknown quantum operations from measurement data. We propose a robust approach based on the matrix sensing techniques for quantum superoperator learning that extends beyond the positive semidefinite case, encompassing both quantum channels and Lindbladians. We first introduce a randomized measurement design using a near-optimal number of measurements. By leveraging the restricted isometry property (RIP), we provide theoretical guarantees for the identifiability and recovery of low-rank superoperators in the presence of noise. Additionally, we propose a blockwise measurement design that restricts the tomography to the sub-blocks, significantly enhancing performance while maintaining a comparable scale of measurements. We also provide a performance guarantee for this setup. Our approach employs alternating least squares (ALS) with acceleration for optimization in matrix sensing. Numerical experiments validate the efficiency and scalability of the proposed methods.


Ontology-Enhanced Educational Annotation Activities

arXiv.org Artificial Intelligence

Information and communications technology and technology-enhanced learning have unquestionably transformed traditional teaching-learning processes and are positioned as key factors to promote quality education, one of the basic sustainable development goals of the 2030 agenda. Document annotation, which was traditionally carried out with pencil and paper and currently benefits from digital document annotation tools, is a representative example of this transformation. Using document annotation tools, students can enrich the documents with annotations that highlight the most relevant aspects of these documents. As the conceptual complexity of the learning domain increases, the annotation of the documents may require comprehensive domain knowledge and an expert analysis capability that students usually lack. Consequently, a proliferation of irrelevant, incorrect, and/or poorly decontextualized annotations may appear, while other relevant aspects are completely ignored by the students. The main hypothesis proposed by this paper is that the use of a guiding annotation ontology in the annotation activities is a keystone aspect to alleviate these shortcomings. Consequently, comprehension is improved, exhaustive content analysis is promoted, and meta-reflective thinking is developed. To test this hypothesis, we describe our own annotation tool, \@note, which fully implements this ontology-enhanced annotation paradigm, and we provide experimental evidence about how \@note can improve academic performance via a pilot study concerning critical literary annotation.


Graph Representation Learning with Diffusion Generative Models

arXiv.org Artificial Intelligence

Diffusion models have established themselves as state-of-the-art generative models across various data modalities, including images and videos, due to their ability to accurately approximate complex data distributions. Unlike traditional generative approaches such as VAEs and GANs, diffusion models employ a progressive denoising process that transforms noise into meaningful data over multiple iterative steps. This gradual approach enhances their expressiveness and generation quality. Not only that, diffusion models have also been shown to extract meaningful representations from data while learning to generate samples. Despite their success, the application of diffusion models to graph-structured data remains relatively unexplored, primarily due to the discrete nature of graphs, which necessitates discrete diffusion processes distinct from the continuous methods used in other domains. In this work, we leverage the representational capabilities of diffusion models to learn meaningful embeddings for graph data. By training a discrete diffusion model within an autoencoder framework, we enable both effective autoencoding and representation learning tailored to the unique characteristics of graph-structured data. We only need the encoder at the end to extract representations. Our approach demonstrates the potential of discrete diffusion models to be used for graph representation learning.