Oceania
Harnessing Test-time Adaptation for NLU tasks Involving Dialects of English
Nguyen, Duke, Joshi, Aditya, Salim, Flora
Test-time adaptation (TTA) is an excellent method which helps generalize models across domains, tasks, and distributions without the use of labeled datasets. Thus, TTA is very useful in natural language processing (NLP) in the dialectal setting, since oftentimes, models are trained on Standard American English (SAE), evaluated on Indian English or Nigerian English, of which distribution differs significantly from the former. This is especially useful since dialectal datasets are scarce. In this paper, we explore one of the most famous TTA techniques, SHOT, in dialectal NLP. We finetune and evaluate SHOT on different combinations of dialectal GLUE. Our findings show that SHOT is a viable technique when labeled datasets are unavailable. We also theoretically propose the concept of dialectal gap and show that it has a positive correlation with the effectiveness of SHOT. We also find that in many cases, finetuning on SAE yields higher performance than finetuning on dialectal data. Our code is available at https://github.com/dukenguyenxyz/dialect-adaptation
Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
Khati, Dipin, Liu, Yijin, Palacio, David N., Zhang, Yixuan, Poshyvanyk, Denys
Applications of Large Language Models (LLMs) are rapidly growing in industry and academia for various software engineering (SE) tasks. As these models become more integral to critical processes, ensuring their reliability and trustworthiness becomes essential. Consequently, the concept of trust in these systems is becoming increasingly critical. Well-calibrated trust is important, as excessive trust can lead to security vulnerabilities, and risks, while insufficient trust can hinder innovation. However, the landscape of trust-related concepts in LLMs in SE is relatively unclear, with concepts such as trust, distrust, and trustworthiness lacking clear conceptualizations in the SE community. To bring clarity to the current research status and identify opportunities for future work, we conducted a comprehensive review of $88$ papers: a systematic literature review of $18$ papers focused on LLMs in SE, complemented by an analysis of 70 papers from broader trust literature. Additionally, we conducted a survey study with 25 domain experts to gain insights into practitioners' understanding of trust and identify gaps between existing literature and developers' perceptions. The result of our analysis serves as a roadmap that covers trust-related concepts in LLMs in SE and highlights areas for future exploration.
Practical Abstractions for Model Checking Continuous-Time Multi-Agent Systems
Kim, Yan, Jamroga, Wojciech, Penczek, Wojciech, Petrucci, Laure
Much work has been done to contain the state-space explosion by smart representation and/or reduction of input models. Symbolic Model checking of temporal logics in a well established technique model checking based on SATor BDD-based representations to verify and validate properties of multi-agent systems (MAS). of the state/transition space [36, 42, 44, 45, 47, 48, 50] fall into the However, practical model checking requires input models of manageable former group. Model reduction methods include partial-order reduction size. In this paper, we extend the model reduction method [28, 41, 49], equivalence-based reductions [2, 6, 25], and by variable-based abstraction, proposed recently by Jamroga and state abstraction [21], see below for a detailed discussion.
A Framework to Assess Multilingual Vulnerabilities of LLMs
Tang, Likai, Bogahawatta, Niruth, Ginige, Yasod, Xu, Jiarui, Sun, Shixuan, Ranathunga, Surangika, Seneviratne, Suranga
Large Language Models (LLMs) are acquiring a wider range of capabilities, including understanding and responding in multiple languages. While they undergo safety training to prevent them from answering illegal questions, imbalances in training data and human evaluation resources can make these models more susceptible to attacks in low-resource languages (LRL). This paper proposes a framework to automatically assess the multilingual vulnerabilities of commonly used LLMs. Using our framework, we evaluated six LLMs across eight languages representing varying levels of resource availability. We validated the assessments generated by our automated framework through human evaluation in two languages, demonstrating that the framework's results align with human judgments in most cases. Our findings reveal vulnerabilities in LRL; however, these may pose minimal risk as they often stem from the model's poor performance, resulting in incoherent responses.
Reliable and Efficient Amortized Model-based Evaluation
Truong, Sang, Tu, Yuheng, Liang, Percy, Li, Bo, Koyejo, Sanmi
Comprehensive evaluations of language models (LM) during both development and deployment phases are necessary because these models possess numerous capabilities (e.g., mathematical reasoning, legal support, or medical diagnostic) as well as safety risks (e.g., racial bias, toxicity, or misinformation). The average score across a wide range of benchmarks provides a signal that helps guide the use of these LMs in practice. Currently, holistic evaluations are costly due to the large volume of benchmark questions, making frequent evaluations impractical. A popular attempt to lower the cost is to compute the average score on a subset of the benchmark. This approach, unfortunately, often renders an unreliable measure of LM performance because the average score is often confounded with the difficulty of the questions in the benchmark subset. Item response theory (IRT) was designed to address this challenge, providing a reliable measurement by careful controlling for question difficulty. Unfortunately, question difficulty is expensive to estimate. Facing this challenge, we train a model that predicts question difficulty from its content, enabling a reliable measurement at a fraction of the cost. In addition, we leverage this difficulty predictor to further improve the evaluation efficiency through training a question generator given a difficulty level. This question generator is essential in adaptive testing, where, instead of using a random subset of the benchmark questions, informative questions are adaptively chosen based on the current estimation of LLM performance. Experiments on 22 common natural language benchmarks and 172 LMs show that this approach is more reliable and efficient compared to current common practice.
SparseLUT: Sparse Connectivity Optimization for Lookup Table-based Deep Neural Networks
Lou, Binglei, Wu, Ruilin, Leong, Philip
The deployment of deep neural networks (DNNs) on resource-constrained edge devices such as field-programmable gate arrays (FPGAs) requires a careful balance of latency, power, and resource usage while maintaining high accuracy. Existing Lookup Table (LUT)-based DNNs, including LogicNets, PolyLUT, PolyLUT-Add, and NeuraLUT, exploit native FPGA resources with random sparse connectivity. This paper introduces SparseLUT, a connectivity-centric training technique tailored for LUT-based DNNs. SparseLUT leverages a non-greedy training strategy that prioritizes the pruning of less significant connections and strategically regrows alternative ones, resulting in efficient convergence to the target sparsity. Experimental results show consistent accuracy improvements across benchmarks, including up to a 2.13\% increase on MNIST and a 0.94\% improvement for Jet Substructure Classification compared to random sparsity. This is done without any hardware overhead and achieves state-of-the-art results for LUT-based DNNs.
Robust Decision-Making Via Free Energy Minimization
Shafiei, Allahkaram, Jesawada, Hozefa, Friston, Karl, Russo, Giovanni
Despite their groundbreaking performance, state-of-the-art autonomous agents can misbehave when training and environmental conditions become inconsistent, with minor mismatches leading to undesirable behaviors or even catastrophic failures. Robustness towards these training/environment ambiguities is a core requirement for intelligent agents and its fulfillment is a long-standing challenge when deploying agents in the real world. Here, departing from mainstream views seeking robustness through training, we introduce DR-FREE, a free energy model that installs this core property by design. It directly wires robustness into the agent decision-making mechanisms via free energy minimization. By combining a robust extension of the free energy principle with a novel resolution engine, DR-FREE returns a policy that is optimal-yet-robust against ambiguity. Moreover, for the first time, it reveals the mechanistic role of ambiguity on optimal decisions and requisite Bayesian belief updating. We evaluate DR-FREE on an experimental testbed involving real rovers navigating an ambiguous environment filled with obstacles. Across all the experiments, DR-FREE enables robots to successfully navigate towards their goal even when, in contrast, standard free energy minimizing agents that do not use DR-FREE fail. In short, DR-FREE can tackle scenarios that elude previous methods: this milestone may inspire both deployment in multi-agent settings and, at a perhaps deeper level, the quest for a biologically plausible explanation of how natural agents - with little or no training - survive in capricious environments.
Island-Based Evolutionary Computation with Diverse Surrogates and Adaptive Knowledge Transfer for High-Dimensional Data-Driven Optimization
Zhang, Xian-Rong, Gong, Yue-Jiao, Cao, Zhiguang, Zhang, Jun
In recent years, there has been a growing interest in data-driven evolutionary algorithms (DDEAs) employing surrogate models to approximate the objective functions with limited data. However, current DDEAs are primarily designed for lower-dimensional problems and their performance drops significantly when applied to large-scale optimization problems (LSOPs). To address the challenge, this paper proposes an offline DDEA named DSKT-DDEA. DSKT-DDEA leverages multiple islands that utilize different data to establish diverse surrogate models, fostering diverse subpopulations and mitigating the risk of premature convergence. In the intra-island optimization phase, a semi-supervised learning method is devised to fine-tune the surrogates. It not only facilitates data argumentation, but also incorporates the distribution information gathered during the search process to align the surrogates with the evolving local landscapes. Then, in the inter-island knowledge transfer phase, the algorithm incorporates an adaptive strategy that periodically transfers individual information and evaluates the transfer effectiveness in the new environment, facilitating global optimization efficacy. Experimental results demonstrate that our algorithm is competitive with state-of-the-art DDEAs on problems with up to 1000 dimensions, while also exhibiting decent parallelism and scalability. Our DSKT-DDEA is open-source and accessible at: https://github.com/LabGong/DSKT-DDEA.
Collaborative AI Enhances Image Understanding in Materials Science
Yin, Ruoyan Avery, Ren, Zhichu, Yin, Zongyou, Zhang, Zhen, Kim, So Yeon, Hsu, Chia-Wei, Li, Ju
-- The Copilot for Real-world Experimental Scientist (CRESt) system empowers researchers to control autonomous laboratories through conversational AI, providing a seamless interface for managing complex experimental workflows. We have enhanced CRESt by integrating a multi-agent collaboration mechanism that utilizes the complementary strengths of the ChatGPT and Gemini models for precise image analysis in materials science. This innovative approach significantly improves the accuracy of experimental outcomes by fostering structured debates between the AI models, which enhances decision-making processes in materials phase analysis. Additionally, to evaluate the generalizability of this approach, we tested it on a quantitative task of counting particles. Here, the collaboration between the AI models also led to improved results, demonstrating the versatility and robustness of this method. By harnessing this dual-AI framework, this approach stands as a pioneering method for enhancing experimental accuracy and efficiency in materials research, with applications extending beyond CRESt to broader scientific experimentation and analysis. I. INTRODUCTION In recent years, the field of image analysis has undergone transformative changes, primarily driven by advances in artificial intelligence (AI).
WMINet: A Wheel-Mounted Inertial Learning Approach For Mobile-Robot Positioning
Autonomous mobile robots are widely used for navigation, transportation, and inspection tasks indoors and outdoors. In practical situations of limited satellite signals or poor lighting conditions, navigation depends only on inertial sensors. In such cases, the navigation solution rapidly drifts due to inertial measurement errors. In this work, we propose WMINet a wheel-mounted inertial deep learning approach to estimate the mobile robot's position based only on its inertial sensors. To that end, we merge two common practical methods to reduce inertial drift: a wheel-mounted approach and driving the mobile robot in periodic trajectories. Additionally, we enforce a wheelbase constraint to further improve positioning performance. To evaluate our proposed approach we recorded using the Rosbot-XL a wheel-mounted initial dataset totaling 190 minutes, which is made publicly available. Our approach demonstrated a 66\% improvement over state-of-the-art approaches. As a consequence, our approach enables navigation in challenging environments and bridges the pure inertial gap. This enables seamless robot navigation using only inertial sensors for short periods.