Goto

Collaborating Authors

 Overview


A Survey: Towards Privacy and Security in Mobile Large Language Models

arXiv.org Artificial Intelligence

--Mobile Large Language Models (LLMs) are revolutionizing diverse fields such as healthcare, finance, and education with their ability to perform advanced natural language processing tasks on-the-go. However, the deployment of these models in mobile and edge environments introduces significant challenges related to privacy and security due to their resource-intensive nature and the sensitivity of the data they process. This survey provides a comprehensive overview of privacy and security issues associated with mobile LLMs, systematically categorizing existing solutions such as differential privacy, federated learning, and prompt encryption. Furthermore, we analyze vulnerabilities unique to mobile LLMs, including adversarial attacks, membership inference, and side-channel attacks, offering an in-depth comparison of their effectiveness and limitations. T o bridge this gap, we propose potential applications, discuss open challenges, and suggest future research directions, paving the way for the development of trustworthy, privacy-compliant, and scalable mobile LLM systems. The advent of mobile Large Language Models (LLMs) represents a significant milestone in the evolution of AI, enabling advanced natural language processing capabilities to be deployed in mobile and edge environments [1]-[3]. By bringing powerful AI tools closer to end-users, mobile LLMs are revolutionizing industries such as healthcare [4], finance [5], and education [6] with real-time, on-device applications.


Extrapolated Markov Chain Oversampling Method for Imbalanced Text Classification

arXiv.org Artificial Intelligence

Text classification is the task of automatically assigning text documents correct labels from a predefined set of categories. In real-life (text) classification tasks, observations and misclassification costs are often unevenly distributed between the classes - known as the problem of imbalanced data. Synthetic oversampling is a popular approach to imbalanced classification. The idea is to generate synthetic observations in the minority class to balance the classes in the training set. Many general-purpose oversampling methods can be applied to text data; however, imbalanced text data poses a number of distinctive difficulties that stem from the unique nature of text compared to other domains. One such factor is that when the sample size of text increases, the sample vocabulary (i.e., feature space) is likely to grow as well. We introduce a novel Markov chain based text oversampling method. The transition probabilities are estimated from the minority class but also partly from the majority class, thus allowing the minority feature space to expand in oversampling. We evaluate our approach against prominent oversampling methods and show that our approach is able to produce highly competitive results against the other methods in several real data examples, especially when the imbalance is severe.


AI-Driven Marine Robotics: Emerging Trends in Underwater Perception and Ecosystem Monitoring

arXiv.org Artificial Intelligence

Marine ecosystems face increasing pressure due to climate change, driving the need for scalable, AI-powered monitoring solutions. This paper examines the rapid emergence of underwater AI as a major research frontier and analyzes the factors that have transformed marine perception from a niche application into a catalyst for AI innovation. We identify three convergent drivers: environmental necessity for ecosystem-scale monitoring, democratization of underwater datasets through citizen science platforms, and researcher migration from saturated terrestrial computer vision domains. Our analysis reveals how unique underwater challenges - turbidity, cryptic species detection, expert annotation bottlenecks, and cross-ecosystem generalization - are driving fundamental advances in weakly supervised learning, open-set recognition, and robust perception under degraded conditions. We survey emerging trends in datasets, scene understanding and 3D reconstruction, highlighting the paradigm shift from passive observation toward AI-driven, targeted intervention capabilities. The paper demonstrates how underwater constraints are pushing the boundaries of foundation models, self-supervised learning, and perception, with methodological innovations that extend far beyond marine applications to benefit general computer vision, robotics, and environmental monitoring.


Doctoral Thesis: Geometric Deep Learning For Camera Pose Prediction, Registration, Depth Estimation, and 3D Reconstruction

arXiv.org Artificial Intelligence

Modern deep learning developments create new opportunities for 3D mapping technology, scene reconstruction pipelines, and virtual reality development. Despite advances in 3D deep learning technology, direct training of deep learning models on 3D data faces challenges due to the high dimensionality inherent in 3D data and the scarcity of labeled datasets. Structure-from-motion (SfM) and Simultaneous Localization and Mapping (SLAM) exhibit robust performance when applied to structured indoor environments but often struggle with ambiguous features in unstructured environments. These techniques often struggle to generate detailed geometric representations effective for downstream tasks such as rendering and semantic analysis. Current limitations require the development of 3D representation methods that combine traditional geometric techniques with deep learning capabilities to generate robust geometry-aware deep learning models. The dissertation provides solutions to the fundamental challenges in 3D vision by developing geometric deep learning methods tailored for essential tasks such as camera pose estimation, point cloud registration, depth prediction, and 3D reconstruction. The integration of geometric priors or constraints, such as including depth information, surface normals, and equivariance into deep learning models, enhances both the accuracy and robustness of geometric representations. This study systematically investigates key components of 3D vision, including camera pose estimation, point cloud registration, depth estimation, and high-fidelity 3D reconstruction, demonstrating their effectiveness across real-world applications such as digital cultural heritage preservation and immersive VR/AR environments.


Structure and Destructure: Dual Forces in the Making of Knowledge Engines

arXiv.org Artificial Intelligence

The making of knowledge engines in natural language processing has been shaped by two seemingly distinct paradigms: one grounded in structure, the other driven by massively available unstructured data. The structured paradigm leverages predefined symbolic interactions, such as knowledge graphs, as priors and designs models to capture them. In contrast, the unstructured paradigm centers on scaling transformer architectures with increasingly vast data and model sizes, as seen in modern large language models. Despite their divergence, this thesis seeks to establish conceptual connections bridging these paradigms. Two complementary forces, structure and destructure, emerge across both paradigms: structure organizes seen symbolic interactions, while destructure, through periodic embedding resets, improves model plasticity and generalization to unseen scenarios. These connections form a new recipe for developing general knowledge engines that can support transparent, controllable, and adaptable intelligent systems.


Fairness in Federated Learning: Trends, Challenges, and Opportunities

arXiv.org Artificial Intelligence

At the intersection of the cutting-edge technologies and privacy concerns, Federated Learning (FL) with its distributed architecture, stands at the forefront in a bid to facilitate collaborative model training across multiple clients while preserving data privacy. However, the applicability of FL systems is hindered by fairness concerns arising from numerous sources of heterogeneity that can result in biases and undermine a system's effectiveness, with skewed predictions, reduced accuracy, and inefficient model convergence. This survey thus explores the diverse sources of bias, including but not limited to, data, client, and model biases, and thoroughly discusses the strengths and limitations inherited within the array of the state-of-the-art techniques utilized in the literature to mitigate such disparities in the FL training process. We delineate a comprehensive overview of the several notions, theoretical underpinnings, and technical aspects associated with fairness and their adoption in FL-based multidisciplinary environments. Furthermore, we examine salient evaluation metrics leveraged to measure fairness quantitatively. Finally, we envisage exciting open research directions that have the potential to drive future advancements in achieving fairer FL frameworks, in turn, offering a strong foundation for future research in this pivotal area.


Designing LMS and Instructional Strategies for Integrating Generative-Conversational AI

arXiv.org Artificial Intelligence

Higher education faces growing challenges in delivering personalized, scalable, and pedagogically coherent learning experiences. This study introduces a structured framework for designing an AI-powered Learning Management System (AI-LMS) that integrates generative and conversational AI to support adaptive, interactive, and learner-centered instruction. Using a design-based research (DBR) methodology, the framework unfolds through five phases: literature review, SWOT analysis, development of ethical-pedagogical principles, system design, and instructional strategy formulation. The resulting AI-LMS features modular components -- including configurable prompts, adaptive feedback loops, and multi-agent conversation flows -- aligned with pedagogical paradigms such as behaviorist, constructivist, and connectivist learning theories. By combining AI capabilities with human-centered design and ethical safeguards, this study advances a practical model for AI integration in education. Future research will validate and refine the system through real-world implementation.


Learning to Shop Like Humans: A Review-driven Retrieval-Augmented Recommendation Framework with LLMs

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown strong potential in recommendation tasks due to their strengths in language understanding, reasoning and knowledge integration. These capabilities are especially beneficial for review-based recommendation, which relies on semantically rich user-generated texts to reveal fine-grained user preferences and item attributes. However, effectively incorporating reviews into LLM-based recommendation remains challenging due to (1) inefficient to dynamically utilize user reviews under LLMs' constrained context windows, and (2) lacking effective mechanisms to prioritize reviews most relevant to the user's current decision context. To address these challenges, we propose RevBrowse, a review-driven recommendation framework inspired by the "browse-then-decide" decision process commonly observed in online user behavior. RevBrowse integrates user reviews into the LLM-based reranking process to enhance its ability to distinguish between candidate items. To improve the relevance and efficiency of review usage, we introduce PrefRAG, a retrieval-augmented module that disentangles user and item representations into structured forms and adaptively retrieves preference-relevant content conditioned on the target item. Extensive experiments on four Amazon review datasets demonstrate that RevBrowse achieves consistent and significant improvements over strong baselines, highlighting its generalizability and effectiveness in modeling dynamic user preferences. Furthermore, since the retrieval-augmented process is transparent, RevBrowse offers a certain level of interpretability by making visible which reviews influence the final recommendation.


Fusion to Enhance: Fusion Visual Encoder to Enhance Multimodal Language Model

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have made significant progress in bridging visual perception with high-level textual reasoning. However, they face a fundamental contradiction: while excelling at complex semantic understanding, these models often fail at basic visual tasks that require precise detail perception. This deficiency primarily stems from the prevalent architectural reliance on a single vision encoder optimized for high-level semantic alignment, which inherently sacrifices the ability to capture fine-grained visual information. To address this issue, we introduce Fusion to Enhance (FtZ), a novel vision tower framework. FtZ moves beyond the single-encoder design by innovatively composing a semantically powerful anchor encoder with a perception-rich augmenting encoder via a lightweight Multi-Head Cross-Attention mechanism. Experimental results demonstrate that on several challenging benchmarks demanding fine-grained visual understanding, such as TextVQA, POPE, MMMU, MME and MM-Vet, our FtZ model significantly outperforms baselines that use only a single encoder or existing feature fusion methods. This work proves that composing heterogeneous expert encoders is an efficient and effective path to overcoming the visual perception bottleneck in current MLLMs, offering a new design paradigm for building next-generation AI systems with stronger perceptual capabilities.


Intelligent Spectrum Management in Satellite Communications

arXiv.org Artificial Intelligence

Satellite Communication (SatCom) networks represent a fundamental pillar in modern global connectivity, facilitating reliable service and extensive coverage across a plethora of applications. The expanding demand for high-bandwidth services and the proliferation of mega satellite constellations highlight the limitations of traditional exclusive satellite spectrum allocation approaches. Cognitive Radio (CR) leading to Cognitive Satellite (CogSat) networks through Dynamic Spectrum Management (DSM), which enables the dynamic adaptability of radio equipment to environmental conditions for optimal performance, presents a promising solution for the emerging spectrum scarcity. In this survey, we explore the adaptation of intelligent DSM methodologies to SatCom, leveraging satellite network integrations. We discuss contributions and hurdles in regulations and standardizations in realizing intelligent DSM in SatCom, and deep dive into DSM techniques, which enable CogSat networks. Furthermore, we extensively evaluate and categorize state-of-the-art Artificial Intelligence (AI)/Machine Learning (ML) methods leveraged for DSM while exploring operational resilience and robustness of such integrations. In addition, performance evaluation metrics critical for adaptive resource management and system optimization in CogSat networks are thoroughly investigated. This survey also identifies open challenges and outlines future research directions in regulatory frameworks, network architectures, and intelligent spectrum management, paving the way for sustainable and scalable SatCom networks for enhanced global connectivity.