Overview
On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
Guo, Tao, Wang, Junxiao, Huo, Fushuo, Cui, Laizhong, Guo, Song, Gui, Jie, Tao, Dacheng
Federated Learning (FL) enables training models across decentralized data silos while preserving client data privacy. Recent research has explored efficient methods for post-training large language models (LLMs) within FL to address computational and communication challenges. While existing approaches often rely on access to LLMs' internal information, which is frequently restricted in real-world scenarios, an inference-only paradigm (black-box FedLLM) has emerged to address these limitations. This paper presents a comprehensive survey on federated tuning for LLMs. We propose a taxonomy categorizing existing studies along two axes: model access-based and parameter efficiency-based optimization. We classify FedLLM approaches into white-box, gray-box, and black-box techniques, highlighting representative methods within each category. We review emerging research treating LLMs as black-box inference APIs and discuss promising directions and open challenges for future research.
An Investigation of Visual Foundation Models Robustness
Gupta, Sandeep, Passerone, Roberto
Visual Foundation Models (VFMs) are becoming ubiquitous in computer vision, powering systems for diverse tasks such as object detection, image classification, segmentation, pose estimation, and motion tracking. VFMs are capitalizing on seminal innovations in deep learning models, such as LeNet-5, AlexNet, ResNet, VGGNet, InceptionNet, DenseNet, YOLO, and ViT, to deliver superior performance across a range of critical computer vision applications. These include security-sensitive domains like biometric verification, autonomous vehicle perception, and medical image analysis, where robustness is essential to fostering trust between technology and the end-users. This article investigates network robustness requirements crucial in computer vision systems to adapt effectively to dynamic environments influenced by factors such as lighting, weather conditions, and sensor characteristics. We examine the prevalent empirical defenses and robust training employed to enhance vision network robustness against real-world challenges such as distributional shifts, noisy and spatially distorted inputs, and adversarial attacks. Subsequently, we provide a comprehensive analysis of the challenges associated with these defense mechanisms, including network properties and components to guide ablation studies and benchmarking metrics to evaluate network robustness.
Dac-Fake: A Divide and Conquer Framework for Detecting Fake News on Social Media
Jain, Mayank Kumar, Gopalani, Dinesh, Meena, Yogesh Kumar, Jain, Nishant
With the rapid evolution of technology and the Internet, the proliferation of fake news on social media has become a critical issue, leading to widespread misinformation that can cause societal harm. Traditional fact checking methods are often too slow to prevent the dissemination of false information. Therefore, the need for rapid, automated detection of fake news is paramount. We introduce DaCFake, a novel fake news detection model using a divide and conquer strategy that combines content and context based features. Our approach extracts over eighty linguistic features from news articles and integrates them with either a continuous bag of words or a skipgram model for enhanced detection accuracy. We evaluated the performance of DaCFake on three datasets including Kaggle, McIntire + PolitiFact, and Reuter achieving impressive accuracy rates of 97.88%, 96.05%, and 97.32%, respectively. Additionally, we employed a ten-fold cross validation to further enhance the model's robustness and accuracy. These results highlight the effectiveness of DaCFake in early detection of fake news, offering a promising solution to curb misinformation on social media platforms.
Machine Learning in Micromobility: A Systematic Review of Datasets, Techniques, and Applications
Yan, Sen, Kaundanya, Chinmaya, O'Connor, Noel E., Little, Suzanne, Liu, Mingming
Micromobility systems, which include lightweight and low-speed vehicles such as bicycles, e-bikes, and e-scooters, have become an important part of urban transportation and are used to solve problems such as traffic congestion, air pollution, and high transportation costs. Successful utilisation of micromobilities requires optimisation of complex systems for efficiency, environmental impact mitigation, and overcoming technical challenges for user safety. Machine Learning (ML) methods have been crucial to support these advancements and to address their unique challenges. However, there is insufficient literature addressing the specific issues of ML applications in micromobilities. This survey paper addresses this gap by providing a comprehensive review of datasets, ML techniques, and their specific applications in micromobilities. Specifically, we collect and analyse various micromobility-related datasets and discuss them in terms of spatial, temporal, and feature-based characteristics. In addition, we provide a detailed overview of ML models applied in micromobilities, introducing their advantages, challenges, and specific use cases. Furthermore, we explore multiple ML applications, such as demand prediction, energy management, and safety, focusing on improving efficiency, accuracy, and user experience. Finally, we propose future research directions to address these issues, aiming to help future researchers better understand this field.
Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting
Chen, Zhuomin, Li, Dan, Zhou, Jiahui, Wu, Shunyu, Ye, Haozheng, Lou, Jian, Ng, See-Kiong
Time series (TS) data are ubiquitous across various application areas, rendering time series forecasting (TSF) a fundamental task. With the astounding advances in large language models (LLMs), a variety of methods have been developed to adapt LLMs for time series forecasting. Despite unlocking the potential of LLMs in comprehending TS data, existing methods are inherently constrained by their shallow integration of TS information, wherein LLMs typically access TS representations at shallow layers, primarily at the input layer. This causes the influence of TS representations to progressively fade in deeper layers and eventually leads to ineffective adaptation between textual embeddings and TS representations. In this paper, we propose the Multi-layer Steerable Embedding Fusion (MSEF), a novel framework that enables LLMs to directly access time series patterns at all depths, thereby mitigating the progressive loss of TS information in deeper layers. Specifically, MSEF leverages off-the-shelf time series foundation models to extract semantically rich embeddings, which are fused with intermediate text representations across LLM layers via layer-specific steering vectors. These steering vectors are designed to continuously optimize the alignment between time series and textual modalities and facilitate a layer-specific adaptation mechanism that ensures efficient few-shot learning capabilities. Experimental results on seven benchmarks demonstrate significant performance improvements by MSEF compared with baselines, with an average reduction of 31.8% in terms of MSE. The code is available at https://github.com/One1sAll/MSEF.
Quantum Federated Learning: A Comprehensive Survey
Nguyen, Dinh C., Uddin, Md Raihan, Shaon, Shaba, Rahman, Ratun, Dobre, Octavia, Niyato, Dusit
Quantum federated learning (QFL) is a combination of distributed quantum computing and federated machine learning, integrating the strengths of both to enable privacy-preserving decentralized learning with quantum-enhanced capabilities. It appears as a promising approach for addressing challenges in efficient and secure model training across distributed quantum systems. This paper presents a comprehensive survey on QFL, exploring its key concepts, fundamentals, applications, and emerging challenges in this rapidly developing field. Specifically, we begin with an introduction to the recent advancements of QFL, followed by discussion on its market opportunity and background knowledge. We then discuss the motivation behind the integration of quantum computing and federated learning, highlighting its working principle. Moreover, we review the fundamentals of QFL and its taxonomy. Particularly, we explore federation architecture, networking topology, communication schemes, optimization techniques, and security mechanisms within QFL frameworks. Furthermore, we investigate applications of QFL across several domains which include vehicular networks, healthcare networks, satellite networks, metaverse, and network security. Additionally, we analyze frameworks and platforms related to QFL, delving into its prototype implementations, and provide a detailed case study. Key insights and lessons learned from this review of QFL are also highlighted. We complete the survey by identifying current challenges and outlining potential avenues for future research in this rapidly advancing field.
An Efficient Hybridization of Graph Representation Learning and Metaheuristics for the Constrained Incremental Graph Drawing Problem
Charytitsch, Bruna C. B., Nascimento, Marรญa C. V.
Hybridizing machine learning techniques with metaheuristics has attracted significant attention in recent years. Many attempts employ supervised or reinforcement learning to support the decision-making of heuristic methods. However, in some cases, these techniques are deemed too time-consuming and not competitive with hand-crafted heuristics. This paper proposes a hybridization between metaheuristics and a less expensive learning strategy to extract the latent structure of graphs, known as Graph Representation Learning (GRL). For such, we approach the Constrained Incremental Graph Drawing Problem (C-IGDP), a hierarchical graph visualization problem. There is limited literature on methods for this problem, for which Greedy Randomized Search Procedures (GRASP) heuristics have shown promising results. In line with this, this paper investigates the gains of incorporating GRL into the construction phase of GRASP, which we refer to as Graph Learning GRASP (GL-GRASP). In computational experiments, we first analyze the results achieved considering different node embedding techniques, where deep learning-based strategies stood out. The evaluation considered the primal integral measure that assesses the quality of the solutions according to the required time for such. According to this measure, the best GL-GRASP heuristics demonstrated superior performance than state-of-the-art literature GRASP heuristics for the problem. A scalability test on newly generated denser instances under a fixed time limit further confirmed the robustness of the GL-GRASP heuristics.
Low-dimensional embeddings of high-dimensional data
de Bodt, Cyril, Diaz-Papkovich, Alex, Bleher, Michael, Bunte, Kerstin, Coupette, Corinna, Damrich, Sebastian, Sanmartin, Enrique Fita, Hamprecht, Fred A., Horvรกt, Emลke-รgnes, Kohli, Dhruv, Krishnaswamy, Smita, Lee, John A., Lelieveldt, Boudewijn P. F., McInnes, Leland, Nabney, Ian T., Noichl, Maximilian, Poliฤar, Pavlin G., Rieck, Bastian, Wolf, Guy, Mishne, Gal, Kobak, Dmitry
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
A Review of Developmental Interpretability in Large Language Models
This review synthesizes the nascent but critical field of developmental interpretability for Large Language Models. We chart the field's evolution from static, post-hoc analysis of trained models to a dynamic investigation of the training process itself. We begin by surveying the foundational methodologies, including representational probing, causal tracing, and circuit analysis, that enable researchers to deconstruct the learning process. The core of this review examines the developmental arc of LLM capabilities, detailing key findings on the formation and composition of computational circuits, the biphasic nature of knowledge acquisition, the transient dynamics of learning strategies like in-context learning, and the phenomenon of emergent abilities as phase transitions in training. We explore illuminating parallels with human cognitive and linguistic development, which provide valuable conceptual frameworks for understanding LLM learning. Finally, we argue that this developmental perspective is not merely an academic exercise but a cornerstone of proactive AI safety, offering a pathway to predict, monitor, and align the processes by which models acquire their capabilities. We conclude by outlining the grand challenges facing the field, such as scalability and automation, and propose a research agenda for building more transparent, reliable, and beneficial AI systems.
A Survey of Deep Learning for Geometry Problem Solving
Ma, Jianzhe, Wang, Wenxuan, Jin, Qin
Geometry problem solving, a crucial aspect of mathematical reasoning, is vital across various domains, including education, the assessment of AI's mathematical abilities, and multimodal capability evaluation. The recent surge in deep learning technologies, particularly the emergence of multimodal large language models, has significantly accelerated research in this area. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our objective is to offer a comprehensive and practical reference of deep learning for geometry problem solving, thereby fostering further advancements in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.