Goto

Collaborating Authors

 Overview


FedAPM: Federated Learning via ADMM with Partial Model Personalization

arXiv.org Artificial Intelligence

In federated learning (FL), the assumption that datasets from different devices are independent and identically distributed (i.i.d.) often does not hold due to user differences, and the presence of various data modalities across clients makes using a single model impractical. Personalizing certain parts of the model can effectively address these issues by allowing those parts to differ across clients, while the remaining parts serve as a shared model. However, we found that partial model personalization may exacerbate client drift (each client's local model diverges from the shared model), thereby reducing the effectiveness and efficiency of FL algorithms. We propose an FL framework based on the alternating direction method of multipliers (ADMM), referred to as FedAPM, to mitigate client drift. We construct the augmented Lagrangian function by incorporating first-order and second-order proximal terms into the objective, with the second-order term providing fixed correction and the first-order term offering compensatory correction between the local and shared models. Our analysis demonstrates that FedAPM, by using explicit estimates of the Lagrange multiplier, is more stable and efficient in terms of convergence compared to other FL frameworks. We establish the global convergence of FedAPM training from arbitrary initial points to a stationary point, achieving three types of rates: constant, linear, and sublinear, under mild assumptions. We conduct experiments using four heterogeneous and multimodal datasets with different metrics to validate the performance of FedAPM. Specifically, FedAPM achieves faster and more accurate convergence, outperforming the SOTA methods with average improvements of 12.3% in test accuracy, 16.4% in F1 score, and 18.0% in AUC while requiring fewer communication rounds.


From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems

arXiv.org Artificial Intelligence

Compound Al Systems (CAIS) is an emerging paradigm that integrates large language models (LLMs) with external components, such as retrievers, agents, tools, and orchestrators, to overcome the limitations of standalone models in tasks requiring memory, reasoning, real-time grounding, and multimodal understanding. These systems enable more capable and context-aware behaviors by composing multiple specialized modules into cohesive workflows. Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented, lacking a unified framework for analysis, taxonomy, and evaluation. In this survey, we define the concept of CAIS, propose a multi-dimensional taxonomy based on component roles and orchestration strategies, and analyze four foundational paradigms: Retrieval-Augmented Generation (RAG), LLM Agents, Multimodal LLMs (MLLMs), and orchestration-centric architectures. We review representative systems, compare design trade-offs, and summarize evaluation methodologies across these paradigms. Finally, we identify key challenges-including scalability, interoperability, benchmarking, and coordination-and outline promising directions for future research. This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.


Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey

arXiv.org Artificial Intelligence

A core aspect of compositionality, systematicity is a desirable property in ML models as it enables strong generalization to novel contexts. This has led to numerous studies proposing benchmarks to assess systematic generalization, as well as models and training regimes designed to enhance it. Many of these efforts are framed as addressing the challenge posed by Fodor and Pylyshyn. However, while they argue for systematicity of representations, existing benchmarks and models primarily focus on the systematicity of behaviour. We emphasize the crucial nature of this distinction. Furthermore, building on Hadley's (1994) taxonomy of systematic generalization, we analyze the extent to which behavioural systematicity is tested by key benchmarks in the literature across language and vision. Finally, we highlight ways of assessing systematicity of representations in ML models as practiced in the field of mechanistic interpretability.


A Comprehensive Survey on the Risks and Limitations of Concept-based Models

arXiv.org Artificial Intelligence

Concept-based Models are a class of inherently explainable networks that improve upon standard Deep Neural Networks by providing a rationale behind their predictions using human-understandable `concepts'. With these models being highly successful in critical applications like medical diagnosis and financial risk prediction, there is a natural push toward their wider adoption in sensitive domains to instill greater trust among diverse stakeholders. However, recent research has uncovered significant limitations in the structure of such networks, their training procedure, underlying assumptions, and their susceptibility to adversarial vulnerabilities. In particular, issues such as concept leakage, entangled representations, and limited robustness to perturbations pose challenges to their reliability and generalization. Additionally, the effectiveness of human interventions in these models remains an open question, raising concerns about their real-world applicability. In this paper, we provide a comprehensive survey on the risks and limitations associated with Concept-based Models. In particular, we focus on aggregating commonly encountered challenges and the architecture choices mitigating these challenges for Supervised and Unsupervised paradigms. We also examine recent advances in improving their reliability and discuss open problems and promising avenues of future research in this domain.


Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward

arXiv.org Artificial Intelligence

This paper presents an in-depth survey on the use of multimodal Generative Artificial Intelligence (GenAI) and autoregressive Large Language Models (LLMs) for human motion understanding and generation, offering insights into emerging methods, architectures, and their potential to advance realistic and versatile motion synthesis. Focusing exclusively on text and motion modalities, this research investigates how textual descriptions can guide the generation of complex, human-like motion sequences. The paper explores various generative approaches, including autoregressive models, diffusion models, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and transformer-based models, by analyzing their strengths and limitations in terms of motion quality, computational efficiency, and adaptability. It highlights recent advances in text-conditioned motion generation, where textual inputs are used to control and refine motion outputs with greater precision. The integration of LLMs further enhances these models by enabling semantic alignment between instructions and motion, improving coherence and contextual relevance. This systematic survey underscores the transformative potential of text-to-motion GenAI and LLM architectures in applications such as healthcare, humanoids, gaming, animation, and assistive technologies, while addressing ongoing challenges in generating efficient and realistic human motion.


Latent Guided Sampling for Combinatorial Optimization

arXiv.org Machine Learning

Combinatorial Optimization (CO) consists of finding the best solution from a discrete set of possibilities by optimizing a given objective function subject to constraints. It has widespread applications across various domains, including vehicle routing (Veres and Moussa, 2019), production planning (Dolgui et al., 2019), and drug discovery (Liu et al., 2017). However, its NP-hard nature and the complexity of many problem variants make solving CO problems highly challenging. Traditional heuristic methods (e.g., (Kirkpatrick et al., 1983; Glover, 1989; Mladenovi c and Hansen, 1997)) rely on hand-crafted rules to guide the search, providing near-optimal solutions with significantly lower computational costs. Inspired by the success of deep learning in computer vision (Krizhevsky et al., 2012; He et al., 2016) and natural language processing (Vaswani et al., 2017; Devlin, 2018), recent years have seen a surge in learning-based Neural Combinatorial Optimization (NCO) approaches for solving CO problems, including the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). 1


Spatially Resolved Meteorological and Ancillary Data in Central Europe for Rainfall Streamflow Modeling

arXiv.org Machine Learning

We present a dataset for rainfall streamflow modeling that is fully spatially resolved with the aim of taking neural network-driven hydrological modeling beyond lumped catchments. To this end, we compiled data covering five river basins in central Europe: upper Danube, Elbe, Oder, Rhine, and Weser. The dataset contains meteorological forcings, as well as ancillary information on soil, rock, land cover, and orography. The data is harmonized to a regular 9km times 9km grid and contains daily values that span from October 1981 to September 2011. We also provide code to further combine our dataset with publicly available river discharge data for end-to-end rainfall streamflow modeling.


From Instructions to ODRL Usage Policies: An Ontology Guided Approach

arXiv.org Artificial Intelligence

This study presents an approach that uses large language models such as GPT-4 to generate usage policies in the W3C Open Digital Rights Language ODRL automatically from natural language instructions. Our approach uses the ODRL ontology and its documentation as a central part of the prompt. Our research hypothesis is that a curated version of existing ontology documentation will better guide policy generation. We present various heuristics for adapting the ODRL ontology and its documentation to guide an end-to-end KG construction process. We evaluate our approach in the context of dataspaces, i.e., distributed infrastructures for trustworthy data exchange between multiple participating organizations for the cultural domain. We created a benchmark consisting of 12 use cases of varying complexity. Our evaluation shows excellent results with up to 91.95% accuracy in the resulting knowledge graph.


Human Fall Detection using Transfer Learning-based 3D CNN

arXiv.org Artificial Intelligence

Unintentional or accidental falls are one of the significant health issues in senior persons. The population of senior persons is increasing steadily. So, there is a need for an automated fall detection monitoring system. This paper introduces a vision-based fall detection system using a pre-trained 3D CNN. Unlike 2D CNN, 3D CNN extracts not only spatial but also temporal features. The proposed model leverages the original learned weights of a 3D CNN model pre-trained on the Sports1M dataset to extract the spatio-temporal features. Only the SVM classifier was trained, which saves the time required to train the 3D CNN. Stratified shuffle five split cross-validation has been used to split the dataset into training and testing data. Extracted features from the proposed 3D CNN model were fed to an SVM classifier to classify the activity as fall or ADL. Two datasets, GMDCSA and CAUCAFall, were utilized to conduct the experiment. The source code for this work can be accessed via the following link: https://github.com/ekramalam/HFD_3DCNN.


Mind the Gap: A Practical Attack on GGUF Quantization

arXiv.org Artificial Intelligence

With the increasing size of frontier LLMs, post-training quantization has become the standard for memory-efficient deployment. Recent work has shown that basic rounding-based quantization schemes pose security risks, as they can be exploited to inject malicious behaviors into quantized models that remain hidden in full precision. However, existing attacks cannot be applied to more complex quantization methods, such as the GGUF family used in the popular ollama and llama$.$cpp frameworks. In this work, we address this gap by introducing the first attack on GGUF. Our key insight is that the quantization error -- the difference between the full-precision weights and their (de-)quantized version -- provides sufficient flexibility to construct malicious quantized models that appear benign in full precision. Leveraging this, we develop an attack that trains the target malicious LLM while constraining its weights based on quantization errors. We demonstrate the effectiveness of our attack on three popular LLMs across nine GGUF quantization data types on three diverse attack scenarios: insecure code generation ($Δ$=$88.7\%$), targeted content injection ($Δ$=$85.0\%$), and benign instruction refusal ($Δ$=$30.1\%$). Our attack highlights that (1) the most widely used post-training quantization method is susceptible to adversarial interferences, and (2) the complexity of quantization schemes alone is insufficient as a defense.