Oceania
Probabilities-Informed Machine Learning
As a natural evolution of traditional regression methods [3], ML models such as Support Vector Regression (SVR) [4] and Artificial Neural Networks (ANN) [5] have been developed to handle non-linear relationships and high-dimensional datasets [6] with increasing accuracy and robustness. For instance, SVR has proven to be a robust regression tool because it can generalize well with limited data and capture nonlinear relationships using kernel functions [7]. Similarly, ANN, inspired by the neural architecture of the human brain, has become foundational to ML [5]. Typically, these methods use inputs (X) and outputs (Y) to construct surrogate models that aim to minimize the difference between the predicted and actual output values. These models have found applications across diverse fields, including engineering, medicine, and economics, demonstrating their versatility and potential [8], [9], [10]. In many real-world applications, additional prior information regarding the output model can be leveraged to enhance its accuracy and robustness [11], [12]. For instance, in physical systems, knowledge of the governing laws of physics has been successfully incorporated into ML by developing physics-informed neural networks (PINNs) [13], leading to improved efficiency and accuracy in prediction tasks [14]. In addition to physical laws, probabilistic information about the structure of the problem may also exist in practical scenarios [15]. Moreover, in many systems, the output variable is inherently probabilistic, requiring models that approximate the probabilistic structure of the output [16].
Planning-Driven Programming: A Large Language Model Programming Workflow
Lei, Chao, Chang, Yanchuan, Lipovetzky, Nir, Ehinger, Krista A.
The strong performance of large language models (LLMs) has prompted extensive discussion of their application to code generation. Recent research suggests continuous program refinement through visible tests to improve code generation accuracy in LLMs. However, these methods suffer from LLMs' inefficiency and limited reasoning capacity. In this work, we propose an LLM programming workflow (LPW) designed to improve both initial code generation and subsequent refinements within a structured two-phase workflow. Specifically, the solution generation phase formulates a solution plan, which is then verified through visible tests to specify the intended natural language solution. Subsequently, the code implementation phase drafts initial code according to the solution plan and its verification. If the generated code fails the visible tests, the plan verification serves as the intended solution to consistently inform the refinement process for correcting bugs. Compared to state-of-the-art methods across various existing LLMs, LPW significantly improves the Pass@1 accuracy by up to 16.4% on well-established text-to-code generation benchmarks. LPW also sets new state-of-the-art Pass@1 accuracy, achieving 98.2% on HumanEval, 84.8% on MBPP, 59.3% on LiveCode, 62.6% on APPS, and 34.7% on CodeContest, using GPT-4o as the backbone.
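For readers unfamiliar with the Pass@1 metric reported above, the following is a minimal sketch of how it is commonly computed. It assumes the widely used unbiased pass@k estimator (for each problem, n sampled programs of which c pass all hidden tests); the abstract does not state which estimator or sampling budget LPW uses, so the function names and the example counts are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of programs sampled for the problem
    c: number of those programs that pass every test
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over a benchmark given (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical benchmark with three problems, one sample each.
    print(mean_pass_at_k([(1, 1), (1, 0), (1, 1)], k=1))
```

With k = 1 the estimator reduces to c / n, so with a single sample per problem Pass@1 is simply the fraction of problems whose one generated program passes.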
CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic
Yao, Huaiyuan, Da, Longchao, Nandam, Vishnu, Turnau, Justin, Liu, Zhiwei, Pang, Linsey, Wei, Hua
The integration of autonomous vehicles into urban traffic has great potential to improve efficiency by reducing congestion and optimizing traffic flow systematically. In this paper, we introduce CoMAL (Collaborative Multi-Agent LLMs), a framework designed to address the mixed-autonomy traffic problem through collaboration among autonomous vehicles to optimize traffic flow. CoMAL is built upon large language models, operating in an interactive traffic simulation environment. It utilizes a Perception Module to observe surrounding agents and a Memory Module to store strategies for each agent. The overall workflow includes a Collaboration Module that encourages autonomous vehicles to discuss the effective strategy and allocate roles, a reasoning engine to determine optimal behaviors based on assigned roles, and an Execution Module that controls vehicle actions using a hybrid approach combining rule-based models. Experimental results demonstrate that CoMAL achieves superior performance on the Flow benchmark. Additionally, we evaluate the impact of different language models and compare our framework with reinforcement learning approaches. The results highlight the strong cooperative capability of LLM agents and present a promising solution to the mixed-autonomy traffic challenge. The code is available at https://github.com/Hyan-Yao/CoMAL.
Supervised Learning with Evolving Tasks and Performance Guarantees
Álvarez, Verónica, Mazuelas, Santiago, Lozano, Jose A.
Multiple supervised learning scenarios are composed of a sequence of classification tasks. For instance, multi-task learning and continual learning aim to learn a sequence of tasks that is either fixed or grows over time. Existing techniques for learning tasks in a sequence are tailored to specific scenarios and lack adaptability to others. In addition, most existing techniques consider situations in which the order of the tasks in the sequence is not relevant. However, tasks in a sequence are commonly evolving, in the sense that consecutive tasks often have a higher similarity. This paper presents a learning methodology that is applicable to multiple supervised learning scenarios and adapts to evolving tasks. Unlike existing techniques, we provide computable tight performance guarantees and analytically characterize the increase in the effective sample size. Experiments on benchmark datasets show the performance improvement of the proposed methodology in multiple scenarios and the reliability of the presented performance guarantees.
Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding
Elhenawy, Mohammed, Ashqar, Huthaifa I., Rakotonirainy, Andry, Alhadidi, Taqwa I., Jaber, Ahmed, Tami, Mohammad Abu
Scene understanding is essential for enhancing driver safety, generating human-centric explanations for Automated Vehicle (AV) decisions, and leveraging Artificial Intelligence (AI) for retrospective driving video analysis. This study developed a dynamic scene retrieval system using Contrastive Language-Image Pretraining (CLIP) models, which can be optimized for real-time deployment on edge devices. The proposed system outperforms state-of-the-art in-context learning methods, including the zero-shot capabilities of GPT-4o, particularly in complex scenarios. By conducting frame-level analysis on the Honda Scenes Dataset, which contains a collection of about 80 hours of annotated driving videos capturing diverse real-world road and weather conditions, our study highlights the robustness of CLIP models in learning visual concepts from natural language supervision. Results also showed that fine-tuning the CLIP models, such as ViT-L/14 and ViT-B/32, significantly improved scene classification, achieving a top F1 score of 91.1%. These results demonstrate the ability of the system to deliver rapid and precise scene recognition, which can be used to meet the critical requirements of Advanced Driver Assistance Systems (ADAS). This study shows the potential of CLIP models to provide scalable and efficient frameworks for dynamic scene understanding and classification. Furthermore, this work lays the groundwork for advanced autonomous vehicle technologies by fostering a deeper understanding of driver behavior, road conditions, and safety-critical scenarios, marking a significant step toward smarter, safer, and more context-aware autonomous driving systems.
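The retrieval step behind a CLIP-style scene classifier can be sketched as follows: each frame embedding is compared with the text embeddings of candidate scene descriptions, and the description with the highest cosine similarity wins. This is a minimal sketch under stated assumptions, not the authors' system: the three-dimensional vectors and the scene labels are made up for illustration, whereas a real pipeline would obtain embeddings from a CLIP image encoder and text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_scene(image_emb, text_embs):
    """Return the label whose text embedding is most similar to the frame."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))


# Hypothetical text embeddings for three scene prompts (toy values).
TEXT_EMBS = {
    "rainy road": [1.0, 0.0, 0.2],
    "snowy road": [0.0, 1.0, 0.2],
    "clear road": [0.1, 0.1, 1.0],
}

if __name__ == "__main__":
    frame_emb = [0.9, 0.1, 0.3]  # hypothetical embedding of one video frame
    print(classify_scene(frame_emb, TEXT_EMBS))
```

Because the text embeddings can be computed once and cached, per-frame work reduces to one image encoding plus a handful of dot products, which is one reason this style of retrieval is attractive for edge deployment.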
Long-range Brain Graph Transformer
Yu, Shuo, Jin, Shan, Li, Ming, Sarwar, Tabinda, Xia, Feng
Understanding communication and information processing among brain regions of interest (ROIs) is highly dependent on long-range connectivity, which plays a crucial role in facilitating diverse functional neural integration across the entire brain. However, previous studies generally focused on the short-range dependencies within brain networks while neglecting the long-range dependencies, limiting an integrated understanding of brain-wide communication. To address this limitation, we propose Adaptive Long-range aware TransformER (ALTER), a brain graph transformer to capture long-range dependencies between brain ROIs using a biased random walk. Specifically, we present a novel long-range aware strategy to explicitly capture long-range dependencies between brain ROIs. By guiding the walker towards the next hop with a higher correlation value, our strategy simulates real-world brain-wide communication. Furthermore, by employing the transformer framework, ALTER adaptively integrates both short- and long-range dependencies between brain ROIs, enabling an integrated understanding of multi-level communication across the entire brain. Extensive experiments on ABIDE and ADNI datasets demonstrate that ALTER consistently outperforms generalized state-of-the-art graph learning methods (including SAN, Graphormer, GraphTrans, and LRGNN) and other graph learning based brain network analysis methods (including FBNETGEN, BrainNetGNN, BrainGNN, and BrainNETTF) in neurological disease diagnosis. Cases of long-range dependencies are also presented to further illustrate the effectiveness of ALTER. The implementation is available at https://github.com/yushuowiki/ALTER.
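The idea of a walker that prefers next hops with higher correlation can be illustrated with a small sketch. The abstract does not give the exact transition rule, so this example assumes that the probability of moving from ROI i to ROI j is proportional to the (non-negative) correlation between them, with self-loops excluded; the correlation matrix and function names are hypothetical.

```python
import random


def transition_probs(corr, node):
    """Transition probabilities from `node`, proportional to correlation.

    Assumption: negative correlations are clipped to zero and the walker
    never stays on the same node.
    """
    weights = [max(w, 0.0) if j != node else 0.0
               for j, w in enumerate(corr[node])]
    total = sum(weights)
    if total == 0.0:
        raise ValueError(f"node {node} has no positively correlated neighbour")
    return [w / total for w in weights]


def biased_walk(corr, start, length, rng):
    """Sample a walk of `length` hops starting at `start`."""
    path = [start]
    for _ in range(length):
        probs = transition_probs(corr, path[-1])
        nxt = rng.choices(range(len(corr)), weights=probs, k=1)[0]
        path.append(nxt)
    return path


# Hypothetical correlation matrix for four ROIs (symmetric, toy values).
CORR = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.6, 0.2],
    [0.1, 0.6, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]

if __name__ == "__main__":
    print(biased_walk(CORR, start=0, length=6, rng=random.Random(0)))
```

Nodes that co-occur along many sampled walks can be far apart in the thresholded brain graph, which is how a walk of this kind can expose long-range dependencies to a downstream model.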
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation
Sani, Samin Mahdizadeh, Sadeghi, Pouya, Vu, Thuy-Trang, Yaghoobzadeh, Yadollah, Haffari, Gholamreza
Large language models (LLMs) have made great progress in classification and text generation tasks. However, they are mainly trained on English data and often struggle with low-resource languages. In this study, we explore adding a new language, i.e., Persian, to Llama (a model with a limited understanding of Persian) using parameter-efficient fine-tuning. We employ a multi-stage approach involving pretraining on monolingual Persian data, aligning representations through bilingual pretraining and instruction datasets, and instruction-tuning with task-specific datasets. We evaluate the model's performance at each stage on generation and classification tasks. Our findings suggest that incorporating the Persian language, through bilingual data alignment, can enhance classification accuracy for Persian tasks, with no adverse impact and sometimes even improvements on English tasks. Additionally, the results highlight the model's initial strength as a critical factor when working with limited training data, with cross-lingual alignment offering minimal benefits for the low-resource language. Knowledge transfer from English to Persian has a marginal effect, primarily benefiting simple classification tasks.
Demystifying Domain-adaptive Post-training for Financial LLMs
Ke, Zixuan, Ming, Yifei, Nguyen, Xuan-Phi, Xiong, Caiming, Joty, Shafiq
Domain-adaptive post-training of large language models (LLMs) has emerged as a promising approach for specialized domains such as medicine and finance. However, significant challenges remain in identifying optimal adaptation criteria and training strategies across varying data and model configurations. To address these challenges, we introduce FINDAP, a systematic and fine-grained investigation into domain-adaptive post-training of LLMs for the finance domain. Our approach begins by identifying the core capabilities required for the target domain and designing a comprehensive evaluation suite aligned with these needs. We then analyze the effectiveness of key post-training stages, including continual pretraining, instruction tuning, and preference alignment. Building on these insights, we propose an effective training recipe centered on a novel preference data distillation method, which leverages process signals from a generative reward model. The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks. Our analysis also highlights how each post-training stage contributes to distinct capabilities, uncovering specific challenges and effective solutions, providing valuable insights for domain adaptation of LLMs. Project page: https://github.com/SalesforceAIResearch/FinDap
Ethical Concerns of Generative AI and Mitigation Strategies: A Systematic Mapping Study
Huang, Yutan, Arora, Chetan, Houng, Wen Cheng, Kanij, Tanjila, Madulgalla, Anuradha, Grundy, John
The evolution of Generative AI, particularly Large Language Models (LLMs), has seen remarkable advancements since 2020 with the introduction of models like ChatGPT and Bard. LLMs have revolutionized tasks such as writing assistance, code generation, and customer support automation by leveraging vast amounts of data to generate coherent and contextually relevant natural language (NL) responses [1, 2]. As a subset of Generative AI--systems designed to create new content--LLMs go beyond traditional AI techniques, which focus primarily on analyzing existing data. LLMs, in contrast, are capable of generating text, images, and music that mimic human creativity [3]. This capability is powered by advancements in neural network architectures, especially transformers, which enable LLMs to learn the nuances of human language and produce semantically accurate content [4].
Discovering new robust local search algorithms with neuro-evolution
Sakhri, Mohamed Salim Amri, Goëffon, Adrien, Goudet, Olivier, Saubion, Frédéric, Touhami, Chaïmaâ
This paper explores a novel approach aimed at overcoming existing challenges in the realm of local search algorithms. Our aim is to improve the decision process that takes place within a local search algorithm so as to make the best possible transitions in the neighborhood at each iteration. To improve this process, we propose to use a neural network that has the same input information as conventional local search algorithms. In this paper, which is an extension of the work [Goudet et al. 2024] presented at EvoCOP2024, we investigate different ways of representing this information so as to make the algorithm as efficient as possible but also robust to monotonic transformations of the problem objective function. To assess the efficiency of this approach, we develop an experimental setup centered around NK landscape problems, offering the flexibility to adjust problem size and ruggedness. This approach offers a promising avenue for the emergence of new local search algorithms and the improvement of their problem-solving capabilities for black-box problems.
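To make the experimental setting concrete, the sketch below builds a small NK landscape and runs a conventional best-improvement hill climber on it. This is the kind of hand-designed local search baseline the abstract refers to, not the neural decision rule proposed in the paper; the neighbourhood structure (K random interacting bits per position), the function names, and the parameter values are assumptions made for illustration.

```python
import random


def make_nk_landscape(n, k, rng):
    """Return a fitness function for a random NK landscape.

    Each of the n bits contributes a value that depends on itself and on
    k other randomly chosen bits; fitness is the mean contribution.
    """
    neighbors = [[i] + rng.sample([j for j in range(n) if j != i], k)
                 for i in range(n)]
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in neighbors[i]:
                idx = (idx << 1) | bits[j]  # index into bit i's lookup table
            total += tables[i][idx]
        return total / n

    return fitness


def hill_climb(fitness, bits):
    """Best-improvement local search over the one-bit-flip neighbourhood."""
    bits = list(bits)
    current = fitness(bits)
    while True:
        best_gain, best_pos = 0.0, None
        for pos in range(len(bits)):
            bits[pos] ^= 1                      # try flipping one bit
            gain = fitness(bits) - current
            bits[pos] ^= 1                      # undo the flip
            if gain > best_gain:
                best_gain, best_pos = gain, pos
        if best_pos is None:                    # no improving neighbour
            return bits, current
        bits[best_pos] ^= 1
        current = fitness(bits)


if __name__ == "__main__":
    rng = random.Random(0)
    f = make_nk_landscape(n=12, k=3, rng=rng)
    start = [rng.randint(0, 1) for _ in range(12)]
    end, value = hill_climb(f, start)
    print(f(start), value)
```

Increasing k makes the landscape more rugged, so the climber tends to stop at poorer local optima. Note also that this rule only compares fitness values, so in exact arithmetic its trajectory does not change under a strictly increasing transformation of the objective, which is the robustness property the abstract asks of learned strategies.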