Goto

Collaborating Authors

 Overview


Interpretation Gaps in LLM-Assisted Comprehension of Privacy Documents

arXiv.org Artificial Intelligence

This article explores the gaps that can manifest when using a large language model (LLM) to obtain simplified interpretations of data practices from a complex privacy policy. We exemplify these gaps to showcase issues in accuracy, completeness, clarity and representation, while advocating for continued research to realize an LLM's true potential in revolutionizing privacy management through personal assistants and automated compliance checking.


Enhanced Sentiment Analysis of Iranian Restaurant Reviews Utilizing Sentiment Intensity Analyzer & Fuzzy Logic

arXiv.org Artificial Intelligence

This research presents an advanced sentiment analysis framework studied on Iranian restaurant reviews, combining fuzzy logic with conventional sentiment analysis techniques to assess both sentiment polarity and intensity. A dataset of 1266 reviews, alongside corresponding star ratings, was compiled and preprocessed for analysis. Initial sentiment analysis was conducted using the Sentiment Intensity Analyzer (VADER), a rule-based tool that assigns sentiment scores across positive, negative, and neutral categories. However, a noticeable bias toward neutrality often led to an inaccurate representation of sentiment intensity. To mitigate this issue, based on a fuzzy perspective, two refinement techniques were introduced, applying square-root and fourth-root transformations to amplify positive and negative sentiment scores while maintaining neutrality. This led to three distinct methodologies: Approach 1, utilizing unaltered VADER scores; Approach 2, modifying sentiment values using the square root; and Approach 3, applying the fourth root for further refinement. A Fuzzy Inference System incorporating comprehensive fuzzy rules was then developed to process these refined scores and generate a single, continuous sentiment value for each review based on each approach. Comparative analysis, including human supervision and alignment with customer star ratings, revealed that the refined approaches significantly improved sentiment analysis by reducing neutrality bias and better capturing sentiment intensity. Despite these advancements, minor over-amplification and persistent neutrality in domain-specific cases were identified, leading us to propose several future studies to tackle these occasional barriers. The study's methodology and outcomes offer valuable insights for businesses seeking a more precise understanding of consumer sentiment, enhancing sentiment analysis across various industries.


A Survey on Federated Fine-tuning of Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks, with fine-tuning playing a pivotal role in adapting them to specific downstream applications. Federated Learning (FL) offers a promising approach that enables collaborative model adaptation while ensuring data privacy, i.e., FedLLM. In this survey, we provide a systematic and thorough review of the integration of LLMs with FL. Specifically, we first trace the historical evolution of both LLMs and FL, while summarizing relevant prior surveys. We then present an in-depth analysis of the fundamental challenges encountered in deploying FedLLM. Following this, we conduct an extensive study of existing parameter-efficient fine-tuning (PEFT) methods and explore their applicability in FL. Furthermore, we introduce a comprehensive evaluation benchmark to rigorously assess FedLLM performance and discuss its diverse real-world applications across multiple domains. Finally, we identify critical open challenges and outline promising research directions to drive future advancements in FedLLM. We maintain an active \href{https://github.com/Clin0212/Awesome-Federated-LLM-Learning}{GitHub repository} tracking cutting-edge advancements. This survey serves as a foundational resource for researchers and practitioners, offering insights into the evolving landscape of federated fine-tuning for LLMs while guiding future innovations in privacy-preserving AI.


General Scales Unlock AI Evaluation with Explanatory and Predictive Power

arXiv.org Artificial Intelligence

Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance for new task instances, in- and out-of-distribution. Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate. Illustrated for 15 large language models and 63 tasks, high explanatory power is unleashed from inspecting the demand and ability profiles, bringing insights on the sensitivity and specificity exhibited by different benchmarks, and how knowledge, metacognition and reasoning are affected by model size, chain-of-thought and distillation. Surprisingly, high predictive power at the instance level becomes possible using these demand levels, providing superior estimates over black-box baseline predictors based on embeddings or finetuning, especially in out-of-distribution settings (new tasks and new benchmarks). The scales, rubrics, battery, techniques and results presented here represent a major step for AI evaluation, underpinning the reliable deployment of AI in the years ahead. (Collaborative platform: https://kinds-of-intelligence-cfi.github.io/ADELE.)


Simulation-based Bayesian inference under model misspecification

arXiv.org Machine Learning

Simulation-based Bayesian inference (SBI) methods are widely used for parameter estimation in complex models where evaluating the likelihood is challenging but generating simulations is relatively straightforward. However, these methods commonly assume that the simulation model accurately reflects the true data-generating process, an assumption that is frequently violated in realistic scenarios. In this paper, we focus on the challenges faced by SBI methods under model misspecification. We consolidate recent research aimed at mitigating the effects of misspecification, highlighting three key strategies: i) robust summary statistics, ii) generalised Bayesian inference, and iii) error modelling and adjustment parameters. To illustrate both the vulnerabilities of popular SBI methods and the effectiveness of misspecification-robust alternatives, we present empirical results on an illustrative example.


Integrating Dynamical Systems Modeling with Spatiotemporal scRNA-seq Data Analysis

arXiv.org Artificial Intelligence

Understanding the dynamic nature of biological systems is fundamental to deciphering cellular behavior, developmental processes, and disease progression. Single-cell RNA sequencing (scRNA-seq) has provided static snapshots of gene expression, offering valuable insights into cellular states at a single time point. Recent advancements in temporally resolved scRNA-seq, spatial transcriptomics (ST), and time-series spatial transcriptomics (temporal-ST) have further revolutionized our ability to study the spatiotemporal dynamics of individual cells. These technologies, when combined with computational frameworks such as Markov chains, stochastic differential equations (SDEs), and generative models like optimal transport and Schr\"odinger bridges, enable the reconstruction of dynamic cellular trajectories and cell fate decisions. This review discusses how these dynamical system approaches offer new opportunities to model and infer cellular dynamics from a systematic perspective.


Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study

arXiv.org Artificial Intelligence

Solving partial differential equations (PDEs) with discontinuous solutions , such as shock waves in multiphase viscous flow in porous media , is critical for a wide range of scientific and engineering applications, as they represent sudden changes in physical quantities. Physics-Informed Neural Networks (PINNs), an approach proposed for solving PDEs, encounter significant challenges when applied to such systems. Accurately solving PDEs with discontinuities using PINNs requires specialized techniques to ensure effective solution accuracy and numerical stability. A benchmarking study was conducted on two multiphase flow problems in porous media: the classic Buckley-Leverett (BL) problem and a fully coupled system of equations involving shock waves but with varying levels of solution complexity. The findings show that PM and LM approaches can provide accurate solutions for the BL problem by effectively addressing the infinite gradients associated with shock occurrences. In contrast, AM methods failed to effectively resolve the shock waves. When applied to fully coupled PDEs (with more complex loss landscape), the generalization error in the solutions quickly increased, highlighting the need for ongoing innovation. This study provides a comprehensive review of existing techniques for managing PDE discontinuities using PINNs, offering information on their strengths and limitations. The results underscore the necessity for further research to improve PINNs ability to handle complex discontinuities, particularly in more challenging problems with complex loss landscapes. This includes problems involving higher dimensions or multiphysics systems, where current methods often struggle to maintain accuracy and efficiency.


FlowKac: An Efficient Neural Fokker-Planck solver using Temporal Normalizing flows and the Feynman Kac-Formula

arXiv.org Machine Learning

Solving the Fokker-Planck equation for high-dimensional complex dynamical systems remains a pivotal yet challenging task due to the intractability of analytical solutions and the limitations of traditional numerical methods. In this work, we present FlowKac, a novel approach that reformulates the Fokker-Planck equation using the Feynman-Kac formula, allowing to query the solution at a given point via the expected values of stochastic paths. A key innovation of FlowKac lies in its adaptive stochastic sampling scheme which significantly reduces the computational complexity while maintaining high accuracy. This sampling technique, coupled with a time-indexed normalizing flow, designed for capturing time-evolving probability densities, enables robust sampling of collocation points, resulting in a flexible and mesh-free solver. This formulation mitigates the curse of dimensionality and enhances computational efficiency and accuracy, which is particularly crucial for applications that inherently require dimensions beyond the conventional three. We validate the robustness and scalability of our method through various experiments on a range of stochastic differential equations, demonstrating significant improvements over existing techniques.


A Survey on SAR ship classification using Deep Learning

arXiv.org Artificial Intelligence

Deep learning (DL) has emerged as a powerful tool for Synthetic Aperture Radar (SAR) ship classification. This survey comprehensively analyzes the diverse DL techniques employed in this domain. We identify critical trends and challenges, highlighting the importance of integrating handcrafted features, utilizing public datasets, data augmentation, fine-tuning, explainability techniques, and fostering interdisciplinary collaborations to improve DL model performance. This survey establishes a first-of-its-kind taxonomy for categorizing relevant research based on DL models, handcrafted feature use, SAR attribute utilization, and the impact of fine-tuning. We discuss the methodologies used in SAR ship classification tasks and the impact of different techniques. Finally, the survey explores potential avenues for future research, including addressing data scarcity, exploring novel DL architectures, incorporating interpretability techniques, and establishing standardized performance metrics. By addressing these challenges and leveraging advancements in DL, researchers can contribute to developing more accurate and efficient ship classification systems, ultimately enhancing maritime surveillance and related applications.


Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring

arXiv.org Artificial Intelligence

The rapid development of machine learning (ML) technologies, particularly large language models (LLMs), has led to major advancements in natural language processing (NLP, Abbasi et al. 2023). While much of this advancement happened under the umbrella of the common task framework which espouses transparency and openness (Abbasi et al. 2023), in recent years, closed LLMs such as GPT-3 and GPT-4 have set new performance standards in tasks ranging from text generation to question answering, demonstrating unprecedented capabilities in zero-shot and few-shot learning scenarios (Brown et al. 2020, OpenAI 2023). Given the strong performance of closed LLMs such as GPT-4, many studies within the LLM-as-a-judge paradigm rely on their scores as ground truth benchmarks for evaluating both open and closed LLMs (Chiang and Lee 2023), further entrenching the dominance of SOTA closed LLMs (Vergho et al. 2024). Along with closed LLMs, there are also LLMs where the pre-trained models (i.e., training weights) and inference code are publicly available ("open LLMs") such as Llama (Touvron et al. 2023, Dubey et al. 2024) as well as LLMs where the full training data and training code are also available ("open-source LLMs") such as OLMo (Groeneveld et al. 2024). Open and open-source LLMs provide varying levels of transparency for developers and researchers (Liu et al. 2023). Access to model weights, training data, and inference code enables several benefits for the user-developer-researcher community, including lower costs per input/output token through third-party API services, support for local/offline pre-training and fine-tuning, and deeper analysis of model biases and debiasing strategies. However, the dominance of closed LLMs raises a number of concerns, including accessibility and fairness (Strubell et al. 2020, Bender 2021, Irugalbandara et al. 2024).