Goto

Collaborating Authors

 Oceania


Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning

arXiv.org Artificial Intelligence

Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we here explore the principles that underlie effective representations for the actor and for the critic in on-policy algorithms. We focus our study on understanding whether the actor and critic will benefit from separate, rather than shared, representations. Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information from the environment -- the actor's representation tends to focus on action-relevant information, while the critic's representation specialises in encoding value and dynamics information. We conduct a rigourous empirical study to understand how different representation learning approaches affect the actor and critic's specialisations and their downstream performance, in terms of sample efficiency and generation capabilities. Finally, we discover that a separated critic plays an important role in exploration and data collection during training. Our code, trained models and data are accessible at https://github.com/francelico/deac-rep.


Phraselette: A Poet's Procedural Palette

arXiv.org Artificial Intelligence

According to the recently introduced theory of artistic support tools, creativity support tools exert normative influences over artistic production, instantiating a normative ground that shapes both the process and product of artistic expression. We argue that the normative ground of most existing automated writing tools is misaligned with writerly values and identify a potential alternative frame-material writing support-for experimental poetry tools that flexibly support the finding, processing, transforming, and shaping of text(s). Based on this frame, we introduce Phraselette, an artistic material writing support interface that helps experimental poets search for words and phrases. To provide material writing support, Phraselette is designed to counter the dominant mode of automated writing tools, while offering language model affordances in line with writerly values. We further report on an extended expert evaluation involving 10 published poets that indicates support for both our framing of material writing support and for Phraselette itself.


States of LLM-generated Texts and Phase Transitions between them

arXiv.org Artificial Intelligence

While not long ago probabilistic autoregressive language models were just models that assign probabilities to sequences of words (Bahl et al., 1983), now they are the cornerstone of any task in computational linguistics through prompting (Sanh et al., 2022) or fine-tuning (Radford et al., 2018). Such models being successfully commercialized, the number of practical applications of these models is rapidly growing, as is the number of papers considering various aspects of the use of probabilistic autoregressive language models. It is all the more surprising that the statistical properties of the output sequences produced by such models have been studied relatively little. We aim to fill this gap somewhat and empirically demonstrate that, depending on the temperature parameter, LLMs can generate text that can be classified as solid (periodic phase), critical state (that has autocorrelations decay according to the power law) or gas (amorphous phase) from the point of view of autocorrelation analysis. Our main contributions are the following: 1. We clearly identify three phases of LLM-generated texts - periodic, critical and amorphous 2. We show through computational experiments that for LLM-generated texts, there is a phase transition from ordered to amorphous state at about the same temperatures between 0.7 and 1, for different LLMs 3. We show that for amorphous state, long-range autocorrelations decay follows the exponential law independently from the generation temperature, for different LLMs 4. We show that for temperatures between 0.7 and 1 autocorrelations exhibit power law decay on medium distances of up to 2000 words, implying isles of connectivity of these sizes. We go on to introduce the key concepts.


Higher-Order Belief in Incomplete Information MAIDs

arXiv.org Artificial Intelligence

Multi-agent influence diagrams (MAIDs) are probabilistic graphical models which represent strategic interactions between agents. MAIDs are equivalent to extensive form games (EFGs) but have a more compact and informative structure. However, MAIDs cannot, in general, represent settings of incomplete information -- wherein agents have different beliefs about the game being played, and different beliefs about each-other's beliefs. In this paper, we introduce incomplete information MAIDs (II-MAIDs). We define both infinite and finite-depth II-MAIDs and prove an equivalence relation to EFGs with incomplete information and no common prior over types. We prove that II-MAIDs inherit classical equilibria concepts via this equivalence, but note that these solution concepts are often unrealistic in the setting with no common prior because they violate common knowledge of rationality. We define a more realistic solution concept based on recursive best-response. Throughout, we describe an example with a hypothetical AI agent undergoing evaluation to illustrate the applicability of II-MAIDs.


Attackers Can Do Better: Over- and Understated Factors of Model Stealing Attacks

arXiv.org Artificial Intelligence

Machine learning models were shown to be vulnerable to model stealing attacks, which lead to intellectual property infringement. Among other methods, substitute model training is an all-encompassing attack applicable to any machine learning model whose behaviour can be approximated from input-output queries. Whereas prior works mainly focused on improving the performance of substitute models by, e.g. developing a new substitute training method, there have been only limited ablation studies on the impact the attacker's strength has on the substitute model's performance. As a result, different authors came to diverse, sometimes contradicting, conclusions. In this work, we exhaustively examine the ambivalent influence of different factors resulting from varying the attacker's capabilities and knowledge on a substitute training attack. Our findings suggest that some of the factors that have been considered important in the past are, in fact, not that influential; instead, we discover new correlations between attack conditions and success rate. In particular, we demonstrate that better-performing target models enable higher-fidelity attacks and explain the intuition behind this phenomenon. Further, we propose to shift the focus from the complexity of target models toward the complexity of their learning tasks. Therefore, for the substitute model, rather than aiming for a higher architecture complexity, we suggest focusing on getting data of higher complexity and an appropriate architecture. Finally, we demonstrate that even in the most limited data-free scenario, there is no need to overcompensate weak knowledge with millions of queries. Our results often exceed or match the performance of previous attacks that assume a stronger attacker, suggesting that these stronger attacks are likely endangering a model owner's intellectual property to a significantly higher degree than shown until now.


Attention-Based Synthetic Data Generation for Calibration-Enhanced Survival Analysis: A Case Study for Chronic Kidney Disease Using Electronic Health Records

arXiv.org Artificial Intelligence

Access to real-world healthcare data is limited by stringent privacy regulations and data imbalances, hindering advancements in research and clinical applications. Synthetic data presents a promising solution, yet existing methods often fail to ensure the realism, utility, and calibration essential for robust survival analysis. Here, we introduce Masked Clinical Modelling (MCM), an attention-based framework capable of generating high-fidelity synthetic datasets that preserve critical clinical insights, such as hazard ratios, while enhancing survival model calibration. Unlike traditional statistical methods like SMOTE and machine learning models such as VAEs, MCM supports both standalone dataset synthesis for reproducibility and conditional simulation for targeted augmentation, addressing diverse research needs. Validated on a chronic kidney disease electronic health records dataset, MCM reduced the general calibration loss over the entire dataset by 15%; and MCM reduced a mean calibration loss by 9% across 10 clinically stratified subgroups, outperforming 15 alternative methods.


Differential Machine Learning for Time Series Prediction

arXiv.org Artificial Intelligence

Accurate time series prediction is challenging due to the inherent nonlinearity and sensitivity to initial conditions. We propose a novel approach that enhances neural network predictions through differential learning, which involves training models on both the original time series and its differential series. Specifically, we develop a differential long short-term memory (Diff-LSTM) network that uses a shared LSTM cell to simultaneously process both data streams, effectively capturing intrinsic patterns and temporal dynamics. Evaluated on the Mackey-Glass, Lorenz, and R\"ossler chaotic time series, as well as a real-world financial dataset from ACI Worldwide Inc., our results demonstrate that the Diff- LSTM network outperforms prevalent models such as recurrent neural networks, convolutional neural networks, and bidirectional and encoder-decoder LSTM networks in both short-term and long-term predictions. This framework offers a promising solution for enhancing time series prediction, even when comprehensive knowledge of the underlying dynamics of the time series is not fully available.


M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

arXiv.org Artificial Intelligence

We present M2-omni, a cutting-edge, open-source omni-MLLM that achieves competitive performance to GPT-4o. M2-omni employs a unified multimodal sequence modeling framework, which empowers Large Language Models(LLMs) to acquire comprehensive cross-modal understanding and generation capabilities. Specifically, M2-omni can process arbitrary combinations of audio, video, image, and text modalities as input, generating multimodal sequences interleaving with audio, image, or text outputs, thereby enabling an advanced and interactive real-time experience. The training of such an omni-MLLM is challenged by significant disparities in data quantity and convergence rates across modalities. To address these challenges, we propose a step balance strategy during pre-training to handle the quantity disparities in modality-specific data. Additionally, a dynamically adaptive balance strategy is introduced during the instruction tuning stage to synchronize the modality-wise training progress, ensuring optimal convergence. Notably, we prioritize preserving strong performance on pure text tasks to maintain the robustness of M2-omni's language understanding capability throughout the training process. To our best knowledge, M2-omni is currently a very competitive open-source model to GPT-4o, characterized by its comprehensive modality and task support, as well as its exceptional performance. We expect M2-omni will advance the development of omni-MLLMs, thus facilitating future research in this domain.


Applicability of the Minimal Dominating Set for Influence Maximisation in Multilayer Networks

arXiv.org Artificial Intelligence

The minimal dominating set (MDS) is a well-established concept in network controllability and has been successfully applied in various domains, including sensor placement, network resilience, and epidemic containment. In this study, we adapt the local-improvement MDS routine and explore its potential for enhancing seed selection for influence maximisation in multilayer networks (MLN). We employ the Linear Threshold Model (LTM), which offers an intuitive representation of influence spread or opinion dynamics by accounting for peer influence accumulation. To ensure interpretability, we utilise rank-refining seed selection methods, with the results further filtered with MDS. Our findings reveal that incorporating MDS into the seed selection process improves spread only within a specific range of situations. Notably, the improvement is observed for larger seed set budgets, lower activation thresholds, and when an "AND" strategy is used to aggregate influence across network layers. This scenario reflects situations where an individual does not require the majority of their acquaintances to hold a target opinion, but must be influenced across all social circles.


Clustering-based Meta Bayesian Optimization with Theoretical Guarantee

arXiv.org Machine Learning

Bayesian Optimization (BO) is a well-established method for addressing black-box optimization problems. In many real-world scenarios, optimization often involves multiple functions, emphasizing the importance of leveraging data and learned functions from prior tasks to enhance efficiency in the current task. To expedite convergence to the global optimum, recent studies have introduced meta-learning strategies, collectively referred to as meta-BO, to incorporate knowledge from historical tasks. However, in practical settings, the underlying functions are often heterogeneous, which can adversely affect optimization performance for the current task. Additionally, when the number of historical tasks is large, meta-BO methods face significant scalability challenges. In this work, we propose a scalable and robust meta-BO method designed to address key challenges in heterogeneous and large-scale meta-tasks. Our approach (1) effectively partitions transferred meta-functions into highly homogeneous clusters, (2) learns the geometry-based surrogate prototype that capture the structural patterns within each cluster, and (3) adaptively synthesizes meta-priors during the online phase using statistical distance-based weighting policies. Experimental results on real-world hyperparameter optimization (HPO) tasks, combined with theoretical guarantees, demonstrate the robustness and effectiveness of our method in overcoming these challenges.