Oceania
Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
Oh, Changdae, Fang, Zhen, Im, Shawn, Du, Xuefeng, Li, Yixuan
Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts, where evaluation data differ from instruction-tuning distributions. Although previous works have provided empirical evaluations, we argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure the safe and reliable application of MLLMs in the real world. By taking an information-theoretic perspective, we propose the first theoretical framework that enables the quantification of the maximum risk of MLLMs under distribution shifts. Central to our framework is the introduction of Effective Mutual Information (EMI), a principled metric that quantifies the relevance between input queries and model responses. We derive an upper bound for the EMI difference between in-distribution (ID) and out-of-distribution (OOD) data, connecting it to visual and textual distributional discrepancies. Extensive experiments on real benchmark datasets, spanning 61 shift scenarios, empirically validate our theoretical insights.
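As background for the abstract above, the following is a minimal sketch of ordinary mutual information I(X;Y) for a small discrete joint distribution. It is not the paper's EMI, whose definition is not given in the abstract; the function name `mutual_information` and the two toy distributions are assumptions made for illustration only.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in nats for a discrete joint distribution.

    joint[i][j] is P(X=i, Y=j); entries are non-negative and sum to 1.
    This is plain mutual information, NOT the EMI metric from the paper.
    """
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


# Hypothetical toy distributions: one independent, one fully dependent.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(dependent))
```

In this toy setting the independent pair carries no shared information, while the fully dependent pair carries log 2 nats.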
UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs
Xiong, Yizhe, Huang, Wei, Ye, Xin, Chen, Hui, Lin, Zijia, Lian, Haoran, Su, Zhenpeng, Han, Jungong, Ding, Guiguang
Post-training is essential for adapting Large Language Models (LLMs) to real-world applications. Deploying post-trained models faces significant challenges due to substantial memory overhead and noticeable inference latency. Existing work has identified significant redundancies in LLMs and proposed efficient architectures, namely intra-layer KV sharing and cross-layer KV sharing. However, intra-layer KV sharing still results in high inference costs, while cross-layer KV sharing leads to significant performance degradation. As a result, both methods remain suboptimal for post-training pre-trained LLMs. In this paper, we identify that the Softmax operation is a primary bottleneck for LLM inference and discover that it is actually highly redundant during post-training. We propose Softmax Unification in Attention (UniAttn), a novel post-training method that unifies Softmax activations across transformer blocks to reduce LLM inference costs. Additionally, UniAttn adopts a linear projection to compensate for the errors induced by Softmax unification. Experiments show that UniAttn matches the performance of standard post-training while significantly reducing inference costs, outperforming existing efficient architectures during post-training. Our code will be available at https://github.com/Bostoncake/UniAttn.
Looking into the Future of Health-Care Services: Can Life-Like Agents Change the Future of Health-Care Services?
Torkestani, Mohammad Saleh, Davis, Robert, Sarrafzadeh, Abdolhossein
The increasing availability of computer-mediated knowledge and the advancement of information and communication technologies have altered the methods through which health care information is sought [3] [25] [30]. The Internet has had a significant impact on healthcare service and is a virtual medical library for an estimated 75-80% of users in developed countries [4] [5] [11]. On an average day, more than six million patients and their caregivers in the United States use the Internet to obtain health and medical information. This number exceeds the average daily number of 2.27 million Americans who make visits to physician offices [11] [18] [26]. Furthermore, not only patients but also their caregivers want to get actively involved in the health-care management of their loved ones. In one study, nearly 60% of people who identified themselves as caregivers used the Internet to find answers to their health-related questions [16]. This computer-mediated environment has become, as Vargo and Lusch [32] argue, a fundamental hub where "people exchange to acquire the benefits of specialized competencies (knowledge and skills), or services."
The Impact of Persona-based Political Perspectives on Hateful Content Detection
Civelli, Stefano, Bernardelle, Pietro, Demartini, Gianluca
While pretraining language models with politically diverse content has been shown to improve downstream task fairness, such approaches require significant computational resources often inaccessible to many researchers and organizations. Recent work has established that persona-based prompting can introduce political diversity in model outputs without additional training. However, it remains unclear whether such prompting strategies can achieve results comparable to political pretraining for downstream tasks. We investigate this question using persona-based prompting strategies in multimodal hate-speech detection tasks, specifically focusing on hate speech in memes. Our analysis reveals that when mapping personas onto a political compass and measuring persona agreement, inherent political positioning has surprisingly little correlation with classification decisions. Notably, this lack of correlation persists even when personas are explicitly injected with stronger ideological descriptors. Our findings suggest that while LLMs can exhibit political biases in their responses to direct political questions, these biases may have less impact on practical classification tasks than previously assumed. This raises important questions about the necessity of computationally expensive political pretraining for achieving fair performance in downstream tasks.
TrojanTime: Backdoor Attacks on Time Series Classification
Dong, Chang, Sun, Zechao, Bai, Guangdong, Piao, Shuying, Chen, Weitong, Zhang, Wei Emma
Time Series Classification (TSC) is highly vulnerable to backdoor attacks, posing significant security threats. Existing methods primarily focus on data poisoning during the training phase, designing sophisticated triggers to improve stealthiness and attack success rate (ASR). However, in practical scenarios, attackers often face restrictions in accessing training data. Moreover, it is a challenge for the model to maintain generalization ability on clean test data while remaining vulnerable to poisoned inputs when data is inaccessible. To address these challenges, we propose TrojanTime, a novel two-step training algorithm. In the first stage, we generate a pseudo-dataset using an external arbitrary dataset through target adversarial attacks. The clean model is then continually trained on this pseudo-dataset and its poisoned version. To ensure generalization ability, the second stage employs a carefully designed training strategy, combining logits alignment and batch norm freezing. We evaluate TrojanTime using five types of triggers across four TSC architectures in UCR benchmark datasets from diverse domains. The results demonstrate the effectiveness of TrojanTime in executing backdoor attacks while maintaining clean accuracy. Finally, to mitigate this threat, we propose a defensive unlearning strategy that effectively reduces the ASR while preserving clean accuracy.
LIBRA: Measuring Bias of Large Language Model from a Local Context
Pang, Bo, Qiao, Tingrui, Walker, Caroline, Cunningham, Chris, Koh, Yun Sing
Large Language Models (LLMs) have significantly advanced natural language processing applications, yet their widespread use raises concerns regarding inherent biases that may reduce utility for, or cause harm to, particular social groups. Despite advances in addressing LLM bias, existing research has two major limitations. First, existing LLM bias evaluation focuses on the U.S. cultural context, making it challenging to reveal stereotypical biases of LLMs toward other cultures, leading to unfair development and use of LLMs. Second, current bias evaluation often assumes models are familiar with the target social groups. When LLMs encounter words beyond their knowledge boundaries, that is, words unfamiliar from their training data, they produce irrelevant results in the local context due to hallucinations and overconfidence, which are not necessarily indicative of inherent bias. This research addresses these limitations with a Local Integrated Bias Recognition and Assessment Framework (LIBRA) for measuring bias using datasets sourced from local corpora without crowdsourcing. Implementing this framework, we develop a dataset comprising over 360,000 test cases in the New Zealand context. Furthermore, we propose the Enhanced Idealized CAT Score (EiCAT), integrating the iCAT score with a beyond knowledge boundary score (bbs) and a distribution divergence-based bias measurement to tackle the challenge of LLMs encountering words beyond knowledge boundaries. Our results show that the BERT family, GPT-2, and Llama-3 models seldom understand local words in different contexts. While Llama-3 exhibits larger bias, it responds better to different cultural contexts. The code and dataset are available at: https://github.com/ipangbo/LIBRA.
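For readers unfamiliar with the iCAT score that EiCAT builds on, the following is a minimal sketch of the base iCAT computation as defined for the StereoSet benchmark. How EiCAT combines iCAT with the bbs term and the divergence-based measurement is not specified in the abstract, so it is not shown; the function name `icat` and the example inputs are illustrative assumptions.

```python
def icat(lms, ss):
    """Idealized CAT score (StereoSet): lms * min(ss, 100 - ss) / 50.

    lms: language modeling score, a percentage in [0, 100].
    ss:  stereotype score, a percentage in [0, 100]; 50 means no preference
         between stereotypical and anti-stereotypical completions.
    This is the base iCAT only, not the EiCAT score proposed in the paper.
    """
    if not (0.0 <= lms <= 100.0 and 0.0 <= ss <= 100.0):
        raise ValueError("lms and ss must be percentages in [0, 100]")
    return lms * min(ss, 100.0 - ss) / 50.0


# Hypothetical scores, chosen only to show the behavior of the formula.
print(icat(100.0, 50.0))  # ideal model: perfect language modeling, no bias
print(icat(90.0, 60.0))   # good language modeling, mild stereotype preference
print(icat(100.0, 100.0)) # always prefers the stereotype
```

The score rewards language modeling ability and penalizes deviation of the stereotype score from 50 in either direction.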
PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning
Feng, Yu, Geng, Yangli-ao, Zhu, Yifan, Han, Zongfu, Yu, Xie, Xue, Kaiwen, Luo, Haoran, Sun, Mengyang, Zhang, Guangwei, Song, Meina
Federated learning (FL) has gained widespread attention for its privacy-preserving and collaborative learning capabilities. Due to significant statistical heterogeneity, traditional FL struggles to generalize a shared model across diverse data domains. Personalized federated learning addresses this issue by dividing the model into a globally shared part and a locally private part, with the local model correcting representation biases introduced by the global model. Nevertheless, locally converged parameters more accurately capture domain-specific knowledge, and current methods overlook the potential benefits of these parameters. To address these limitations, we propose the PM-MoE architecture. This architecture integrates a mixture of personalized modules with energy-based denoising of personalized modules, enabling each client to select beneficial personalized parameters from other clients. We applied the PM-MoE architecture to nine recent model-split-based personalized federated learning algorithms, achieving performance improvements with minimal additional training. Extensive experiments on six widely adopted datasets and two heterogeneity settings validate the effectiveness of our approach. The source code is available at https://github.com/dannis97500/PM-MOE.
Enhancing Field-Oriented Control of Electric Drives with Tiny Neural Network Optimized for Micro-controllers
Elele, Martin Joel Mouk, Pau, Danilo, Zhuang, Shixin, Facchinetti, Tullio
The deployment of neural networks on resource-constrained microcontrollers has gained momentum, driving many advancements in Tiny Neural Networks. This paper introduces a tiny feed-forward neural network, TinyFC, integrated into the Field-Oriented Control (FOC) of Permanent Magnet Synchronous Motors (PMSMs). Proportional-Integral (PI) controllers are widely used in FOC for their simplicity, although their limitations in handling nonlinear dynamics hinder precision. To address this issue, a lightweight 1,400 parameters TinyFC was devised to enhance the FOC performance while fitting into the computational and memory constraints of a micro-controller. Advanced optimization techniques, including pruning, hyperparameter tuning, and quantization to 8-bit integers, were applied to reduce the model's footprint while preserving the network effectiveness. Simulation results show the proposed approach significantly reduced overshoot by up to 87.5%, with the pruned model achieving complete overshoot elimination, highlighting the potential of tiny neural networks in real-time motor control.
[Figure 1: Workflow diagram to deploy NN-augmented FOC]
[...] such as automotive, industrial, naval and aeronautics, where compact size and precision control are essential [19]. PMSMs consist of a stator housing the windings and a rotor containing permanent magnets. The operational interaction between the stator's rotating magnetic field and the rotor's fixed magnetic field enables synchronization at synchronous speed [10].
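To illustrate the "quantization to 8-bit integers" step mentioned in the abstract above, the following is a minimal sketch of symmetric per-tensor int8 quantization. The abstract does not say which quantization scheme TinyFC uses, so the scheme, the function names `quantize_int8` and `dequantize`, and the sample weights are all assumptions.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    Assumed scheme for illustration; not taken from the TinyFC paper.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights standing in for one small layer.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each restored weight differs from the original by at most half a quantization step, which is why a small network can often keep its accuracy after this kind of compression.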
Compilation and Fast Model Counting beyond CNF
de Colnet, Alexis, Szeider, Stefan, Zhang, Tianwei
Circuits in deterministic decomposable negation normal form (d-DNNF) are representations of Boolean functions that enable linear-time model counting. This paper strengthens our theoretical knowledge of what classes of functions can be efficiently transformed, or compiled, into d-DNNF. Our main contribution is the fixed-parameter tractable (FPT) compilation of conjunctions of specific constraints parameterized by incidence treewidth. This subsumes the known result for CNF. The constraints in question are all functions representable by constant-width ordered binary decision diagrams (OBDDs) for all variable orderings. For instance, this includes parity constraints and cardinality constraints with constant threshold. The running time of the FPT compilation is singly exponential in the incidence treewidth but hides large constants in the exponent. To balance that, we give a more efficient FPT algorithm for model counting that applies to a sub-family of the constraints and does not require compilation.
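The model-counting property of d-DNNF circuits mentioned above can be sketched as a single bottom-up pass. The tuple encoding of nodes, the function name `count_models`, and the two example circuits are assumptions made for illustration; this is not the paper's compilation algorithm. The sketch checks decomposability of AND nodes but simply trusts that OR nodes are deterministic.

```python
def count_models(circuit, num_vars):
    """Count satisfying assignments of a d-DNNF circuit over num_vars variables.

    Nodes are tuples (an encoding assumed here for illustration):
      ("lit", var, is_positive), ("and", [children]), ("or", [children]).
    AND children must mention disjoint variables (decomposability).
    OR children must be mutually exclusive (determinism); this is NOT checked.
    """
    memo = {}  # id(node) -> (count, scope); visits shared subcircuits once

    def visit(node):
        key = id(node)
        if key in memo:
            return memo[key]
        kind = node[0]
        if kind == "lit":
            result = (1, frozenset([node[1]]))
        elif kind == "and":
            count, scope = 1, frozenset()
            for child in node[1]:
                c, s = visit(child)
                if not scope.isdisjoint(s):
                    raise ValueError("AND node is not decomposable")
                count *= c
                scope |= s
            result = (count, scope)
        elif kind == "or":
            children = [visit(child) for child in node[1]]
            scope = frozenset().union(*(s for _, s in children))
            # Variables missing from a child are free there: factor 2 each.
            count = sum(c << (len(scope) - len(s)) for c, s in children)
            result = (count, scope)
        else:
            raise ValueError("unknown node kind: %r" % (kind,))
        memo[key] = result
        return result

    count, scope = visit(circuit)
    return count << (num_vars - len(scope))  # unmentioned variables are free


# Parity constraint x1 XOR x2, written as a d-DNNF circuit.
xor = ("or", [("and", [("lit", 1, True), ("lit", 2, False)]),
              ("and", [("lit", 1, False), ("lit", 2, True)])])
# x1 OR x2, made deterministic as: x1 OR (NOT x1 AND x2).
either = ("or", [("lit", 1, True),
                 ("and", [("lit", 1, False), ("lit", 2, True)])])
print(count_models(xor, 2), count_models(either, 2), count_models(xor, 3))
```

Determinism lets OR nodes add counts without double counting, and decomposability lets AND nodes multiply them, which is what makes counting cheap once a formula is in this form.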
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Lindenmaier, Gabriel, Papay, Sean, Padó, Sebastian
Transformer-based language models have recently been at the forefront of active research in text generation. However, these models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute requirements measured in petaflop/s-decades. In this paper, we investigate transformer-based architectures for improving model performance in a low-data regime by selectively replacing attention layers with feed-forward and quasi-recurrent neural network layers. We test these architectures on the standard Enwik8 and Wikitext-103 corpora. Our results show that our reduced architectures outperform existing models with a comparable number of parameters, and obtain comparable performance to larger models while significantly reducing the number of parameters.