Overview
Joint Resource Optimization, Computation Offloading and Resource Slicing for Multi-Edge Traffic-Cognitive Networks
Xiaoyang, Ting, Zhang, Minfeng, gonglee, Shu, Zhang, Saimin Chen
The evolving landscape of edge computing envisions platforms operating as dynamic intermediaries between application providers and edge servers (ESs), where task offloading is coupled with payments for computational services. This paper investigates a multi-agent system where both the platform and ESs are self-interested entities, addressing the joint optimization of revenue maximization, resource allocation, and task offloading. We propose a novel Stackelberg game-based framework to model interactions between stakeholders and solve the optimization problem using a Bayesian Optimization-based centralized algorithm. Extensive numerical evaluations demonstrate the effectiveness of the proposed mechanisms in achieving superior performance compared to existing baselines. Keywords -- Mobile edge computing, computation offloading, resource slicing, DRL-driven traffic prediction I. Introduction In recent years, a surge of novel applications, such as augmented reality, interactive gaming, and autonomous driving, has placed unprecedented demands on computational and network resources. These applications are both resource-intensive and delay-sensitive, necessitating robust and low-latency computing frameworks. Multi-access edge computing (MEC), previously referred to as mobile edge computing, has emerged as a promising paradigm to address these challenges.
Synthetic Data Generation with LLM for Improved Depression Prediction
Kang, Andrea, Chen, Jun Yu, Lee-Youngzie, Zoe, Fu, Shuhao
Automatic detection of depression is a rapidly growing field of research at the intersection of psychology and machine learning. However, the rapid growth of interest brings growing concern over data privacy and scarcity, given the sensitivity of the topic. In this paper, we propose a pipeline for Large Language Models (LLMs) to generate synthetic data to improve the performance of depression prediction models. Starting from unstructured, naturalistic text data from recorded transcripts of clinical interviews, we utilize an open-source LLM to generate synthetic data through chain-of-thought prompting. This pipeline involves two key steps: the first step is the generation of the synopsis and sentiment analysis based on the original transcript and depression score, while the second is the generation of the synthetic synopsis/sentiment analysis based on the summaries generated in the first step and a new depression score. Not only was the synthetic data satisfactory in terms of fidelity and privacy-preserving metrics, but it also balanced the distribution of severity in the training dataset, thereby significantly enhancing the model's capability in predicting the intensity of the patient's depression. By leveraging LLMs to generate synthetic data that can augment limited and imbalanced real-world datasets, we demonstrate a novel approach to addressing data scarcity and privacy concerns commonly faced in automatic depression detection, all while maintaining the statistical integrity of the original dataset. This approach offers a robust framework for future mental health research and applications.
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
Kuang, Jiayi, Xie, Jingyou, Luo, Haohao, Li, Ronghao, Xu, Zhe, Cheng, Xianfeng, Li, Yinghui, Lin, Xika, Shen, Ying
Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and has gradually become a benchmark task for multimodal large language models (MLLMs). The goal of our survey is to provide an overview of the development of VQA and a timely, detailed description of the latest models. This survey gives an up-to-date synthesis of natural language understanding of images and text, as well as the knowledge reasoning module based on image-question information on the core VQA tasks. In addition, we elaborate on recent advances in extracting and fusing modal information with vision-language pretraining models and multimodal large language models in VQA. We also exhaustively review the progress of knowledge reasoning in VQA by detailing the extraction of internal knowledge and the introduction of external knowledge. Finally, we present the datasets of VQA and different evaluation metrics and discuss possible directions for future work.
Maximally Separated Active Learning
Kasarla, Tejaswi, Jha, Abhishek, Tervoort, Faye, Cucchiara, Rita, Mettes, Pascal
Active Learning aims to optimize performance while minimizing annotation costs by selecting the most informative samples from an unlabelled pool. Traditional uncertainty sampling often leads to sampling bias by choosing similar uncertain samples. We propose an active learning method that utilizes fixed equiangular hyperspherical points as class prototypes, ensuring consistent inter-class separation and robust feature representations. Our approach introduces Maximally Separated Active Learning (MSAL) for uncertainty sampling and a combined strategy (MSAL-D) for incorporating diversity. This method eliminates the need for costly clustering steps, while maintaining diversity through hyperspherical uniformity. We demonstrate strong performance over existing active learning techniques across five benchmark datasets, highlighting the method's effectiveness and ease of integration. The code is available on GitHub.
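To make the idea of fixed equiangular class prototypes concrete, here is a minimal sketch in plain Python. It builds the vertices of a regular simplex, one standard way to obtain unit vectors whose pairwise cosine similarity is -1/(k-1) for k classes. The function names and the top-2 cosine margin used as the uncertainty score are illustrative assumptions, not the scoring rule from the MSAL paper.

```python
import math

def simplex_prototypes(num_classes):
    """Return num_classes unit vectors with pairwise cosine -1/(num_classes - 1)."""
    k = num_classes
    protos = []
    for i in range(k):
        # One-hot vector minus the centroid (1/k in every coordinate).
        v = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        protos.append([x / norm for x in v])
    return protos

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def margin(feature, protos):
    """Gap between the two highest prototype similarities (assumed score).

    A small gap means the sample sits between two classes, i.e. it is uncertain.
    """
    sims = sorted((cosine(feature, p) for p in protos), reverse=True)
    return sims[0] - sims[1]

def select_most_uncertain(features, protos, budget):
    """Return indices of the `budget` samples with the smallest margin."""
    order = sorted(range(len(features)), key=lambda i: margin(features[i], protos))
    return order[:budget]
```

Because the prototypes are fixed before training, no clustering step is needed to keep classes apart; only the feature extractor is learned.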
Rewiring Techniques to Mitigate Oversquashing and Oversmoothing in GNNs: A Survey
Attali, Hugo, Buscaldi, Davide, Pernelle, Nathalie
Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their effectiveness is often constrained by two critical challenges: oversquashing, where the excessive compression of information from distant nodes results in significant information loss, and oversmoothing, where repeated message-passing iterations homogenize node representations, obscuring meaningful distinctions. These issues, intrinsically linked to the underlying graph structure, hinder information flow and constrain the expressiveness of GNNs. In this survey, we examine graph rewiring techniques, a class of methods designed to address these structural bottlenecks by modifying graph topology to enhance information diffusion. We provide a comprehensive review of state-of-the-art rewiring approaches, delving into their theoretical underpinnings, practical implementations, and performance trade-offs.
Knowledge-aware Evolutionary Graph Neural Architecture Search
Wang, Chao, Zhao, Jiaxuan, Li, Lingling, Jiao, Licheng, Liu, Fang, Liu, Xu, Yang, Shuyuan
Graph neural architecture search (GNAS) can customize high-performance graph neural network architectures for specific graph tasks or datasets. However, existing GNAS methods begin searching for architectures from a zero-knowledge state, ignoring the prior knowledge that may improve the search efficiency. The available knowledge base (e.g. NAS-Bench-Graph) contains many rich architectures and their multiple performance metrics, such as the accuracy (#Acc) and number of parameters (#Params). This study proposes exploiting such prior knowledge to accelerate the multi-objective evolutionary search on a new graph dataset, named knowledge-aware evolutionary GNAS (KEGNAS). KEGNAS employs the knowledge base to train a knowledge model and a deep multi-output Gaussian process (DMOGP) in one go, which generates and evaluates transfer architectures in only a few GPU seconds. The knowledge model first establishes a dataset-to-architecture mapping, which can quickly generate candidate transfer architectures for a new dataset. Subsequently, the DMOGP with architecture and dataset encodings is designed to predict multiple performance metrics for candidate transfer architectures on the new dataset. According to the predicted metrics, non-dominated candidate transfer architectures are selected to warm-start the multi-objective evolutionary algorithm for optimizing the #Acc and #Params on a new dataset. Empirical studies on NAS-Bench-Graph and five real-world datasets show that KEGNAS swiftly generates top-performance architectures, achieving 4.27% higher accuracy than advanced evolutionary baselines and 11.54% higher accuracy than advanced differentiable baselines. In addition, ablation studies demonstrate that the use of prior knowledge significantly improves the search performance.
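The non-dominated selection step described above can be illustrated with a short sketch. It assumes each candidate architecture comes with a predicted accuracy (to maximize) and a predicted parameter count (to minimize); the function names, candidate names, and numbers are hypothetical and not taken from KEGNAS.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one.

    Each argument is a (predicted_accuracy, predicted_params) pair:
    accuracy is maximized, parameter count is minimized.
    """
    acc_a, params_a = a
    acc_b, params_b = b
    no_worse = acc_a >= acc_b and params_a <= params_b
    strictly_better = acc_a > acc_b or params_a < params_b
    return no_worse and strictly_better

def non_dominated(candidates):
    """Return the names of candidates that no other candidate dominates.

    `candidates` maps an architecture name to its (accuracy, params) pair.
    """
    front = []
    for name, metrics in candidates.items():
        dominated = any(
            dominates(other_metrics, metrics)
            for other_name, other_metrics in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical predictions for four candidate transfer architectures.
predicted = {
    "arch_a": (0.80, 100_000),
    "arch_b": (0.78, 200_000),  # worse than arch_a on both objectives
    "arch_c": (0.82, 300_000),
    "arch_d": (0.75, 50_000),
}
warm_start = non_dominated(predicted)
```

The resulting front is what would seed the initial population of a multi-objective evolutionary search, in place of randomly sampled architectures.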
Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos
AlDahoul, Nouar, Tan, Myles Joshua Toledo, Kasireddy, Harishwar Reddy, Zaki, Yasir
The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes widespread concern among different sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring media content are a key solution to addressing these challenges. Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content such as offensive language, violence, nudity, and addiction across text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still have limitations in achieving high detection accuracy with fewer false positives and false negatives. Therefore, more sophisticated algorithms for understanding the context of both text and images may open room for improvement and help build a more efficient censorship system. In this paper, we evaluate existing LLM-based content moderation solutions such as the OpenAI moderation model and Llama-Guard3 and study their capabilities to detect sensitive content. Additionally, we explore recent LLMs such as GPT, Gemini, and Llama in identifying inappropriate content across media outlets. Various textual and visual datasets like X tweets, Amazon reviews, news articles, human photos, cartoons, sketches, and violence videos have been utilized for evaluation and comparison. The results demonstrate that LLMs outperform traditional techniques by achieving higher accuracy and lower false positive and false negative rates. This highlights the potential to integrate LLMs into websites, social media platforms, and video-sharing services for regulatory and content moderation purposes.
TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations
Nobari, Arash Dargahi, Rafiei, Davood
The integration of tabular data from diverse sources is often hindered by inconsistencies in formatting and representation, posing significant challenges for data analysts and personal digital assistants. Existing methods for automating tabular data transformations are limited in scope, often focusing on specific types of transformations or lacking interpretability. In this paper, we introduce TabulaX, a novel framework that leverages Large Language Models (LLMs) for multi-class tabular transformations. TabulaX first classifies input tables into four transformation classes (string-based, numerical, algorithmic, and general) and then applies tailored methods to generate human-interpretable transformation functions, such as numeric formulas or programming code. This approach enhances transparency and allows users to understand and modify the mappings. Through extensive experiments on real-world datasets from various domains, we demonstrate that TabulaX outperforms existing state-of-the-art approaches in terms of accuracy, supports a broader class of transformations, and generates interpretable transformations that can be efficiently applied.
A Method for Building Large Language Models with Predefined KV Cache Capacity
Yi, Zhonghua, Niu, Ge, Wang, Lei, Tang, Wei, Zhang, Liqiu
This paper introduces a novel approach, the Bounded-Cache Transformer (BCT), for building large language models with a predefined Key-Value (KV) cache capacity. The BCT addresses the excessive memory consumption issue in traditional KV caches by implementing a bounded-length KV cache, which is particularly suitable for the attention layers in decoder-only Transformer architectures. By dynamically updating the key-value vector sequences, the BCT achieves efficient inference within limited cache capacity, significantly reducing memory usage while maintaining model performance and system throughput. Experimental results demonstrate that the BCT significantly reduces memory usage while maintaining the model's inference quality, offering a new solution for efficient inference in large language models.
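As a rough illustration of what a fixed-capacity KV cache looks like, the sketch below keeps at most `capacity` key-value pairs and computes scaled dot-product attention over them. It uses simple first-in-first-out eviction, which is a stand-in chosen for brevity: the abstract does not specify how the BCT updates its key-value sequences, so this is not the paper's update rule. The class and method names are hypothetical.

```python
import math
from collections import deque

class BoundedKVCache:
    """Fixed-capacity KV cache with FIFO eviction (assumed policy, not BCT's)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.keys = deque(maxlen=capacity)
        self.values = deque(maxlen=capacity)

    def append(self, key, value):
        """Store the key and value vectors for one newly decoded token."""
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)

    def attend(self, query):
        """Scaled dot-product attention of `query` over the cached entries."""
        if not self.keys:
            raise ValueError("cache is empty")
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in self.keys]
        # Subtract the maximum score before exponentiating for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[d] for w, v in zip(weights, self.values)) / total
            for d in range(dim)
        ]
```

Memory use is bounded by `capacity` regardless of sequence length, at the cost of discarding older context; any real design has to choose what to keep so that quality does not degrade.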
WavChat: A Survey of Spoken Dialogue Models
Ji, Shengpeng, Chen, Yifu, Fang, Minghui, Zuo, Jialong, Lu, Jingyu, Wang, Hanting, Jiang, Ziyue, Zhou, Long, Liu, Shujie, Cheng, Xize, Yang, Xiaoda, Wang, Zehan, Yang, Qian, Li, Jian, Jiang, Yidi, He, Jingzhen, Chu, Yunfei, Xu, Jin, Zhao, Zhou
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain. Compared to traditional three-tier cascaded spoken dialogue models that comprise automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS), modern spoken dialogue models exhibit greater intelligence. These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech. Moreover, they generate high-quality, multi-turn speech responses with low latency, enabling real-time interaction through simultaneous listening and speaking capabilities. Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems and the underlying technologies. To address this, we first compile existing spoken dialogue systems in chronological order and categorize them into the cascaded and end-to-end paradigms. We then provide an in-depth overview of the core technologies in spoken dialogue models, covering aspects such as speech representation, training paradigm, streaming, duplex, and interaction capabilities. Each section discusses the limitations of these technologies and outlines considerations for future research. Additionally, we present a thorough review of relevant datasets, evaluation metrics, and benchmarks from the perspectives of training and evaluating spoken dialogue systems. We hope this survey will contribute to advancing both academic research and industrial applications in the field of spoken dialogue systems. The related material is available at https://github.com/jishengpeng/WavChat.