Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
Havrilla, Alex, Dai, Andrew, O'Mahony, Laura, Oostermeijer, Koen, Zisler, Vera, Albalak, Alon, Milo, Fabrizio, Raparthy, Sharath Chandra, Gandhi, Kanishk, Abbasi, Baber, Phung, Duy, Iyer, Maia, Mahan, Dakota, Blagden, Chase, Gureja, Srishti, Hamdy, Mohammed, Li, Wen-Ding, Paolini, Giovanni, Ammanamanchi, Pawan Sasanka, Meyerson, Elliot
Synthetic data generation with Large Language Models is a promising paradigm for augmenting natural data over a nearly infinite range of tasks. Given this variety, direct comparisons among synthetic data generation algorithms are scarce, making it difficult to understand where improvement comes from and what bottlenecks exist. We propose to evaluate algorithms via the makeup of synthetic data generated by each algorithm in terms of data quality, diversity, and complexity (QDC). We choose these three characteristics for their significance in open-ended processes and the impact each has on the capabilities of downstream models. We find quality to be essential for in-distribution model generalization, diversity to be essential for out-of-distribution generalization, and complexity to be beneficial for both. Further, we emphasize the existence of Quality-Diversity (QD) trade-offs in training data and their downstream effects on model performance. We then examine the effect of various components in the synthetic data pipeline on each data characteristic. This examination allows us to taxonomize and compare synthetic data generation algorithms through the components they utilize and the resulting effects on the QDC composition of the data. This analysis extends into a discussion of the importance of balancing QDC in synthetic data for efficient reinforcement learning and self-improvement algorithms. Analogous to the QD trade-offs in training data, there often exist trade-offs between model output quality and output diversity, which affect the composition of synthetic data. We observe that many models are currently evaluated and optimized only for output quality, thereby limiting output diversity and the potential for self-improvement. We argue that balancing these trade-offs is essential to the development of future self-improvement algorithms and highlight a number of works making progress in this direction.
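As a concrete illustration of measuring one of the three characteristics, the sketch below computes distinct-n, a common lexical diversity proxy: the fraction of n-grams in a dataset that are unique. This is only one possible diversity metric, chosen here for brevity; it is not presented as the survey's own measure, and the whitespace tokenization and function name are assumptions.

```python
from typing import List


def distinct_n(texts: List[str], n: int) -> float:
    """Fraction of n-grams across all texts that are unique (higher = more diverse).

    Tokenization is plain whitespace splitting, a simplifying assumption.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        # Slide a window of size n over the token sequence
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# A dataset of near-duplicates scores lower than a varied one
repetitive = ["solve for x", "solve for x", "solve for y"]
varied = ["solve for x", "prove the lemma", "sort this list"]
print(distinct_n(repetitive, 2), distinct_n(varied, 2))  # 0.5 1.0
```

Lexical metrics like this ignore semantics, so embedding-based diversity measures are often used alongside them.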
SafeWorld: Geo-Diverse Safety Alignment
Yin, Da, Qiu, Haoyi, Huang, Kung-Hsiang, Chang, Kai-Wei, Peng, Nanyun
In the rapidly evolving field of Large Language Models (LLMs), ensuring safety is a crucial and widely discussed topic. However, existing works often overlook the geo-diversity of cultural and legal standards across the world. To demonstrate the challenges posed by geo-diverse safety standards, we introduce SafeWorld, a novel benchmark specifically designed to evaluate LLMs' ability to generate responses that are not only helpful but also culturally sensitive and legally compliant across diverse global contexts. SafeWorld encompasses 2,342 test user queries, each grounded in high-quality, human-verified cultural norms and legal policies from 50 countries and 493 regions/races. On top of it, we propose a multi-dimensional automatic safety evaluation framework that assesses the contextual appropriateness, accuracy, and comprehensiveness of responses. Our evaluations reveal that current LLMs struggle to meet these criteria. To enhance LLMs' alignment with geo-diverse safety standards, we synthesize helpful preference pairs for Direct Preference Optimization (DPO) alignment training. The preference pair construction aims to encourage LLMs to behave appropriately and provide precise references to relevant cultural norms and policies when necessary. Our trained SafeWorldLM outperforms all competing models, including GPT-4o, on all three evaluation dimensions by a large margin. Global human evaluators also note a nearly 20% higher win rate in helpfulness and harmfulness evaluation. Our code and data can be found here: https://github.com/PlusLabNLP/SafeWorld.
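The preference pairs above are consumed by the DPO objective. A minimal sketch of the standard per-pair DPO loss follows, assuming summed sequence log-probabilities are already available from the policy and a frozen reference model; the function names and the default beta = 0.1 are illustrative choices, not taken from the SafeWorld code.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected responses
    under the trained policy and the frozen reference model.
    loss = -log sigmoid(margin) = softplus(-margin)
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)


# When the policy still equals the reference, the margin is 0 and the loss is log(2)
print(dpo_loss(-42.0, -42.0, -42.0, -42.0))  # ≈ 0.693
```

Raising the policy's log-probability of the chosen response relative to the reference lowers the loss, which is how the pairs steer the model toward the preferred, norm-citing behavior.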
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective
Dong, Yushun, Soga, Patrick, He, Yinhan, Wang, Song, Li, Jundong
Graph Neural Networks (GNNs) have achieved remarkable success in various graph-based learning tasks. While their performance is often attributed to the powerful neighborhood aggregation mechanism, recent studies suggest that other components, such as non-linear layers, may also significantly affect how GNNs process the input graph data in the spectral domain. Such evidence challenges the prevalent opinion that neighborhood aggregation mechanisms dominate the behavioral characteristics of GNNs in the spectral domain. To resolve this conflict, this paper introduces a comprehensive benchmark to measure and evaluate GNNs' capability in capturing and leveraging the information encoded in different frequency components of the input graph data. Specifically, we first conduct an exploratory study demonstrating that GNNs can flexibly yield outputs with diverse frequency components even when certain frequencies are absent or filtered out from the input graph data. We then formulate a novel research problem of measuring and benchmarking the performance of GNNs from a spectral perspective. As an initial step toward such a benchmark, we design an evaluation protocol supported by theoretical analysis. Finally, we benchmark GNNs on real-world datasets, revealing insights that challenge prevalent opinions from a spectral perspective. We believe that our findings will open new avenues for future advancements in this area. Our implementations can be found at: https://github.com/yushundong/Spectral-benchmark.
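The claim that non-linear layers can create frequency components absent from the input can be checked on a toy example. The sketch below, which assumes a path graph and the combinatorial Laplacian (not the paper's benchmark setup), feeds a single low-frequency graph signal through a ReLU and measures how much of the output energy lands in other graph frequencies.

```python
import numpy as np


def path_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an n-node path graph."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


n = 32
L = path_laplacian(n)
eigvals, U = np.linalg.eigh(L)  # columns of U form the graph Fourier basis, low to high frequency

x = U[:, 1]                     # input signal containing exactly one (low) frequency
y = np.maximum(x, 0.0)          # ReLU, the non-linearity used inside a GNN layer

x_hat = U.T @ x                 # graph Fourier transform of the input
y_hat = U.T @ y                 # graph Fourier transform of the output

# Fraction of output energy in frequencies that were absent from the input
absent = np.ones(n, dtype=bool)
absent[1] = False
new_energy = np.sum(y_hat[absent] ** 2) / np.sum(y_hat ** 2)
print(f"output energy in new frequencies: {new_energy:.3f}")
```

No neighborhood aggregation is involved here, so any new frequencies come purely from the pointwise non-linearity.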
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
Lian, Haoran, Chen, Junmin, Huang, Wei, Xiong, Yizhe, Hu, Wenping, Ding, Guiguang, Chen, Hui, Niu, Jianwei, Lin, Zijia, Zhang, Fuzheng, Zhang, Di
Recently, large language models (LLMs) have revolutionized Natural Language Processing (NLP). Because of their limited training context size, pretrained LLMs struggle to handle long token sequences, which limits their performance on various downstream tasks. Current solutions for long context modeling often employ multi-stage continual pretraining, which progressively increases the effective context length through several continual pretraining stages. However, these approaches require extensive manual tuning and human expertise. In this paper, we introduce a novel single-stage continual pretraining method, Head-Adaptive Rotary Position Encoding (HARPE), to equip LLMs with long context modeling capabilities while simplifying the training process. HARPE assigns different Rotary Position Encoding (RoPE) base frequency values to different attention heads and trains LLMs directly on the target context length. Extensive experiments on four language modeling benchmarks, including the recent RULER benchmark, demonstrate that HARPE, trained in a single stage, performs strongly on long-context tasks, matching and even outperforming existing multi-stage methods. Our results highlight that HARPE successfully breaks the stage barrier for training LLMs with long context modeling capabilities.
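To make "different RoPE base frequencies across attention heads" concrete, here is a minimal sketch of standard RoPE applied with a per-head base. The base values are hypothetical placeholders, since the abstract does not say how HARPE assigns them, and the interleaved-pair layout is one common RoPE convention.

```python
import numpy as np


def rope_inv_freq(base: float, head_dim: int) -> np.ndarray:
    """Standard RoPE inverse frequencies base^(-2i/d) for i = 0 .. d/2 - 1."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)


def apply_rope(x: np.ndarray, pos: int, base: float) -> np.ndarray:
    """Rotate consecutive pairs (x[2i], x[2i+1]) of one head's vector by pos * theta_i."""
    angles = pos * rope_inv_freq(base, x.shape[-1])
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


def apply_head_adaptive_rope(x: np.ndarray, pos: int, bases) -> np.ndarray:
    """x has shape (num_heads, head_dim); head h is rotated with its own base bases[h]."""
    return np.stack([apply_rope(x[h], pos, bases[h]) for h in range(x.shape[0])])


# Hypothetical per-head bases; HARPE's actual values are not given in the abstract
bases = [10_000.0, 50_000.0, 250_000.0, 1_000_000.0]
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 64))
k = rng.standard_normal((4, 64))

# Per-head attention logits still depend only on the relative position (here -2)
s1 = np.einsum("hd,hd->h", apply_head_adaptive_rope(q, 5, bases),
               apply_head_adaptive_rope(k, 3, bases))
s2 = np.einsum("hd,hd->h", apply_head_adaptive_rope(q, 105, bases),
               apply_head_adaptive_rope(k, 103, bases))
print(np.allclose(s1, s2))  # True
```

A larger base makes a head's rotations slower, so mixing bases lets some heads resolve fine local order while others vary smoothly over long distances.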
NRSurNN3dq4: A Deep Learning Powered Numerical Relativity Surrogate for Binary Black Hole Waveforms
Freitas, Osvaldo Gramaxo, Theodoropoulos, Anastasios, Villanueva, Nino, Fernandes, Tiago, Nunes, Solange, Font, José A., Onofre, Antonio, Torres-Forné, Alejandro, Martin-Guerrero, José D.
Gravitational-wave approximants are widely used tools in gravitational-wave astronomy. They allow for dense coverage of the parameter space of binary black hole (BBH) mergers for parameter inference or, more generally, matched-filtering tasks, while avoiding the computationally expensive full evolution of numerical relativity simulations. However, depending on the approach, this comes at a slight cost in accuracy compared to numerical relativity waveforms. One way to minimize this cost is to construct so-called surrogate models, which, instead of using approximate physics or phenomenological formulae, interpolate within the space of numerical relativity waveforms. In this work, we introduce NRSurNN3dq4, a surrogate model for non-precessing BBH merger waveforms powered by neural networks. By relying on the power of deep learning, this approximant is remarkably fast and competitively accurate: it can generate millions of waveforms in a tenth of a second, while mismatches with numerical relativity waveforms are kept below $10^{-3}$. We implement this approximant within the bilby framework for gravitational-wave parameter inference, and show that it is suitable for parameter estimation tasks.
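The accuracy figure above is stated as a mismatch. A simplified toy version of that quantity is sketched below: one minus the normalized overlap of two complex waveforms, maximized over circular time shifts (via FFT cross-correlation) and a constant phase (via the modulus). It assumes a flat, white noise spectrum, whereas real analyses weight the inner product by the detector noise PSD; the toy chirps and names are illustrative only.

```python
import numpy as np


def mismatch(h1: np.ndarray, h2: np.ndarray) -> float:
    """1 - match between two complex waveforms h = h_plus - i * h_cross.

    Assumes a flat (white) noise PSD, so the inner product is a plain sum.
    Maximizes over circular time shifts via FFT cross-correlation and over
    a constant phase by taking the modulus of the complex overlap.
    """
    norm = np.sqrt(np.vdot(h1, h1).real * np.vdot(h2, h2).real)
    # corr[k] = sum_t h1[t] * conj(h2[t - k]) (circular cross-correlation)
    corr = np.fft.ifft(np.fft.fft(h1) * np.conj(np.fft.fft(h2)))
    return 1.0 - np.max(np.abs(corr)) / norm


t = np.linspace(0.0, 1.0, 1024, endpoint=False)
envelope = np.exp(-((t - 0.5) / 0.1) ** 2)
h_ref = envelope * np.exp(2j * np.pi * (50 * t + 100 * t ** 2))    # toy chirp
h_shifted = np.roll(h_ref, 37) * np.exp(1j * 0.8)                  # same signal, new time and phase
h_other = envelope * np.exp(2j * np.pi * (50 * t + 140 * t ** 2))  # different chirp rate

print(mismatch(h_ref, h_shifted), mismatch(h_ref, h_other))
```

A time- and phase-shifted copy has essentially zero mismatch, while a waveform with a different chirp rate does not; a surrogate is judged by keeping this number small against numerical relativity waveforms.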
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
Tran, Hieu, Yao, Zonghai, Wang, Junda, Zhang, Yifan, Yang, Zhichao, Yu, Hong
This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework (rStar), aimed at enhancing reasoning accuracy and factual integrity across large language models (LLMs) for complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE incorporates two innovative actions within the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries based on the initial problem statement, performs information retrieval using those queries, and augments reasoning with the retrieved data to formulate the final answer; and A7, which leverages information retrieval specifically for generated sub-questions and re-answers these sub-questions with the relevant contextual information. Additionally, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve performance competitive with top proprietary models such as GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.
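RARE's new actions are options inside an MCTS search, so at each node the search must choose which action to expand. The sketch below shows generic UCT selection (UCB1 applied to trees) and reward backpropagation; it is not RARE's or rStar's code, the node labels are hypothetical stand-ins for actions such as A6 and A7, and the exploration constant c = 1.41 is a conventional default rather than a value from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    action: str                      # hypothetical label, e.g. "A6_search_query"
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf              # always try unvisited actions first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Node:
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: List[Node], reward: float) -> None:
    """Add a rollout's reward (e.g. a factuality score) to every node on the path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


root = Node("root", visits=10, children=[
    Node("A6_search_query", visits=5, total_reward=4.0),
    Node("A7_subquestion_retrieval", visits=5, total_reward=1.0),
])
print(select_child(root).action)  # A6_search_query
```

In a RARE-like setup, the reward fed to `backpropagate` would come from a scorer such as the factuality scorer described above, so paths with better-supported answers get visited more often.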
Robust Feature Engineering Techniques for Designing Efficient Motor Imagery-Based BCI-Systems
Gardezi, Syed Saim, Jawed, Soyiba, Khan, Mahnoor, Bukhari, Muneeba, Khan, Rizwan Ahmed
A multitude of individuals across the globe grapple with motor disabilities. Neural prosthetics utilizing Brain-Computer Interface (BCI) technology show promise for improving motor rehabilitation outcomes. The intricate nature of EEG data poses a significant hurdle for current BCI systems. Recently, a high-quality repository of EEG signals tied to both upper- and lower-limb execution of motor and motor imagery tasks has been released. However, the performance of the machine learning (ML) models trained on this dataset was poor, and the evaluation framework appeared insufficient. To improve outcomes, robust feature engineering (signal processing) methods are implemented. A collection of time-domain, frequency-domain, and wavelet-derived features was extracted from 16-channel EEG signals, and the Maximum Relevance Minimum Redundancy (MRMR) approach was employed to identify the four most significant features. For classification, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB) models were trained on these selected features, and their effectiveness was evaluated with testing accuracy, precision, recall, and F1 score. SVM with a Gaussian kernel achieves a maximum testing accuracy of 92.50% for motor activities and 95.48% for imagery activities. These results are substantially better than those of the previous study, where the peak accuracy was 74.36%. This research work provides an in-depth analysis of the MI Limb EEG dataset and will help in designing and developing simple, cost-effective, and reliable BCI systems for neuro-rehabilitation.
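The MRMR step above greedily picks features that are relevant to the label but not redundant with features already chosen. Classic MRMR scores both terms with mutual information; for brevity, the sketch below swaps in absolute Pearson correlation for both (a named variant, not the study's exact configuration), and the synthetic data is purely illustrative.

```python
import numpy as np


def mrmr_select(X: np.ndarray, y: np.ndarray, k: int) -> list:
    """Greedy MRMR (difference form) using absolute Pearson correlation.

    relevance(j)  = |corr(X[:, j], y)|
    redundancy(j) = mean |corr(X[:, j], X[:, s])| over already-selected s
    Each step picks the feature maximizing relevance - redundancy.
    """
    n_features = X.shape[1]
    # Correlations among features and the label; the label is the last column
    C = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    relevance = C[:n_features, -1]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j] - C[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected


# Synthetic example: f1 nearly duplicates f0, f2 is relevant with its own noise, f3 is noise
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=2000).astype(float)
f0 = y + 0.5 * rng.standard_normal(2000)
f1 = f0 + 0.01 * rng.standard_normal(2000)
f2 = y + 0.5 * rng.standard_normal(2000)
f3 = rng.standard_normal(2000)
X = np.column_stack([f0, f1, f2, f3])
print(mrmr_select(X, y, 2))
```

The redundancy penalty is what keeps the near-copy f1 from being selected alongside f0, even though both are individually just as relevant; in the study, k would be 4.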
'It feels like admin': why are people falling out of love with dating apps?
About 1.4 million people in the UK have left the online dating scene in the past 12 months, but is that a sign that the apps don't work or that people are turning away from dating altogether? Ofcom's 2024 Online Nation report shows that dating app use declined significantly between 2023 and 2024, with a drop of nearly 16% in the use of the top 10 most popular dating apps this year. Tinder experienced the biggest loss, with more than half a million users abandoning the platform since May 2023. Bumble and Hinge were also hit hard, losing 368,000 and 131,000 users respectively in the same period. According to researchers at the University of Leeds' Centre for Love, Sex, and Relationships (CLSR), a contributing factor in the decline of online dating could be a sense of detachment from reality and fatigue at the process.
Conformal Symplectic Optimization for Stable Reinforcement Learning
Lyu, Yao, Zhang, Xiangteng, Li, Shengbo Eben, Duan, Jingliang, Tao, Letian, Xu, Qing, He, Lei, Li, Keqiang
Training deep reinforcement learning (RL) agents necessitates overcoming the highly unstable nonconvex stochastic optimization inherent in the trial-and-error mechanism. To tackle this challenge, we propose a physics-inspired optimization algorithm called relativistic adaptive gradient descent (RAD), which enhances long-term training stability. By conceptualizing neural network (NN) training as the evolution of a conformal Hamiltonian system, we present a universal framework for transferring long-term stability from conformal symplectic integrators to iterative NN updating rules, where the choice of kinetic energy governs the dynamical properties of the resulting optimization algorithms. By utilizing relativistic kinetic energy, RAD incorporates principles from special relativity and limits parameter updates below a finite speed, effectively mitigating abnormal gradient influences. Additionally, RAD models NN optimization as the evolution of a multi-particle system where each trainable parameter acts as an independent particle with an individual adaptive learning rate. We prove RAD's sublinear convergence under general nonconvex settings, where smaller gradient variance and larger batch sizes contribute to tighter convergence. Notably, RAD reduces to the well-known adaptive moment estimation (Adam) algorithm when its speed coefficient is chosen as one and its symplectic factor as a small positive value. Experimental results show RAD outperforming nine baseline optimizers with five RL algorithms across twelve environments, including standard benchmarks and challenging scenarios. Notably, RAD achieves up to a 155.1% performance improvement over Adam in Atari games, showcasing its efficacy in stabilizing and accelerating RL training.
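The speed limit comes from the relativistic kinetic energy: with T(p) = c·sqrt(p² + (mc)²), the velocity dT/dp = c·p / sqrt(p² + (mc)²) never exceeds c in magnitude. The sketch below combines that velocity with a conformal (damped) symplectic Euler step, treating each parameter as its own particle. It is a hypothetical illustration of the mechanism only: it omits RAD's adaptive per-parameter learning rates, and the values of lr, gamma, c, and m are assumptions, not RAD's settings.

```python
import numpy as np


def relativistic_velocity(p: np.ndarray, c: float, m: float) -> np.ndarray:
    """dT/dp for relativistic kinetic energy T(p) = c * sqrt(p^2 + (m c)^2).

    Elementwise, |v| < c for any momentum p, so each parameter's speed is bounded.
    """
    return c * p / np.sqrt(p ** 2 + (m * c) ** 2)


def conformal_symplectic_step(theta, p, grad_fn, lr=0.1, gamma=0.9, c=1.0, m=1.0):
    """One conformal symplectic Euler step (hypothetical, not RAD's exact rule)."""
    p = gamma * p - lr * grad_fn(theta)                  # damped momentum update
    theta = theta + lr * relativistic_velocity(p, c, m)  # position update, |step| < lr * c
    return theta, p


def grad(th: np.ndarray) -> np.ndarray:
    """Gradient of the toy objective f(theta) = 0.5 * ||theta||^2."""
    return th


theta = np.array([5.0, -3.0])
p = np.zeros_like(theta)
max_step = 0.0
for _ in range(1000):
    new_theta, p = conformal_symplectic_step(theta, p, grad)
    max_step = max(max_step, float(np.max(np.abs(new_theta - theta))))
    theta = new_theta
print(theta, max_step)
```

Even when the momentum grows large far from the minimum, each step stays below lr * c, which is the property that guards against abnormal gradients; the damping factor gamma supplies the dissipation that makes the system converge.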
Fully Distributed Online Training of Graph Neural Networks in Networked Systems
Olshevskyi, Rostyslav, Zhao, Zhongyuan, Chan, Kevin, Verma, Gunjan, Swami, Ananthram, Segarra, Santiago
Graph neural networks (GNNs) are powerful tools for developing scalable, decentralized artificial intelligence in large-scale networked systems, such as wireless networks, power grids, and transportation networks. Currently, GNNs in networked systems mostly follow a paradigm of "centralized training, distributed execution", which limits their adaptability and slows down their development cycles. In this work, we address this limitation by developing, for the first time, a communication-efficient, fully distributed online training approach for GNNs applied to large networked systems. For a mini-batch with $B$ samples, our approach of training an $L$-layer GNN adds only $L$ rounds of message passing to the $LB$ rounds required by GNN inference, with doubled message sizes. Through numerical experiments on graph-based node regression, power allocation, and link scheduling in wireless networks, we demonstrate the effectiveness of our approach in training GNNs under supervised, unsupervised, and reinforcement learning paradigms.
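The $LB$ count for inference follows from each GNN layer needing one round of neighbor communication per sample. The sketch below computes a simple sum-aggregation layer with a self-loop (a hypothetical layer choice) node-locally, counting rounds as it goes. It illustrates only the inference side; the paper's training scheme with its extra $L$ rounds is not reproduced here.

```python
import numpy as np


def message_passing_layer(adj_list, H, W):
    """One GNN layer computed node-locally with a single round of message passing.

    Each node sends its current features to its neighbors, then updates
    h_v <- ReLU((h_v + sum of received h_u) @ W) using only local information.
    """
    n = len(adj_list)
    inbox = [np.zeros_like(H[v]) for v in range(n)]
    for u in range(n):               # the communication round
        for v in adj_list[u]:
            inbox[v] += H[u]
    return np.stack([np.maximum((H[v] + inbox[v]) @ W, 0.0) for v in range(n)])


def distributed_inference(adj_list, batch, weights):
    """Run an L-layer GNN on B samples; returns outputs and the rounds used."""
    rounds = 0
    outputs = []
    for X in batch:
        H = X
        for W in weights:            # one round per layer
            H = message_passing_layer(adj_list, H, W)
            rounds += 1
        outputs.append(H)
    return outputs, rounds


adj_list = [[1], [0, 2], [1, 3], [2]]                       # 4-node path graph (undirected)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 3)) for _ in range(2)]   # L = 2
batch = [rng.standard_normal((4, 3)) for _ in range(5)]     # B = 5
outputs, rounds = distributed_inference(adj_list, batch, weights)
print(rounds)  # 10, i.e. L * B
```

Each node's result matches the centralized computation ReLU((A + I) H W), but no node ever needs the global adjacency matrix, which is what makes fully distributed execution possible.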