Oceania
Logic-Constrained Shortest Paths for Flight Planning
Euler, Ricardo, Casas, Pedro Maristany de las, Borndörfer, Ralf
The Logic-Constrained Shortest Path Problem (LCSP) combines a one-to-one shortest path problem with satisfiability constraints imposed on the routing graph. This setting arises in flight planning, where air traffic control (ATC) authorities are enforcing a set of traffic flow restrictions (TFRs) on aircraft routes in order to increase safety and throughput. We propose a new branch and bound-based algorithm for the LCSP. The resulting algorithm has three main degrees of freedom: the node selection rule, the branching rule and the conflict. While node selection and branching rules have been long studied in the MIP and SAT communities, most of them cannot be applied out of the box for the LCSP. We review the existing literature and develop tailored variants of the most prominent rules. The conflict, the set of variables to which the branching rule is applied, is unique to the LCSP. We analyze its theoretical impact on the B&B algorithm. In the second part of the paper, we show how to model the Flight Planning Problem with TFRs as an LCSP and solve it using the branch and bound algorithm. We demonstrate the algorithm's efficiency on a dataset consisting of a global flight graph and a set of around 20000 real TFRs obtained from our industry partner Lufthansa Systems GmbH. We make this dataset publicly available. Finally, we conduct an empirical in-depth analysis of node selection rules, branching rules and conflicts. Carefully choosing an appropriate combination yields an improvement of an order of magnitude compared to an uninformed choice.
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond
Huzaifah, Muhammad, Lin, Geyu, Liu, Tianchi, Sailor, Hardik B., Tan, Kye Min, Vangani, Tarun K., Wang, Qiongqiong, Wong, Jeremy H. M., Chen, Nancy F., Aw, Ai Ti
This technical report describes the MERaLiON-SpeechEncoder, a foundation model designed to support a wide range of downstream speech applications. Developed as part of Singapore's National Multimodal Large Language Model Programme, the MERaLiON-SpeechEncoder is tailored to address the speech processing needs in Singapore and the surrounding Southeast Asian region. The model currently supports mainly English, including the variety spoken in Singapore. We are actively expanding our datasets to gradually cover other languages in subsequent releases. The MERaLiON-SpeechEncoder was pre-trained from scratch on 200,000 hours of unlabelled speech data using a self-supervised learning approach based on masked language modelling. We describe our training procedure and hyperparameter tuning experiments in detail below. Our evaluation demonstrates improvements to spontaneous and Singapore speech benchmarks for speech recognition, while remaining competitive to other state-of-the-art speech encoders across ten other speech tasks. We commit to releasing our model, supporting broader research endeavours, both in Singapore and beyond.
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
Shukla, Arth, Tao, Stone, Su, Hao
Figure 1: Live-rendered frames taken from ManiSkill-HAB environments while running policy rollouts with skill chaining. High-quality benchmarks are the foundation for embodied AI research, enabling significant advancements in long-horizon navigation, manipulation and rearrangement tasks. However, as frontier tasks in robotics get more advanced, they require faster simulation speed, more intricate test environments, and larger demonstration datasets. To this end, we present MS-HAB, a holistic benchmark for lowlevel manipulation and in-home object rearrangement. First, we provide a GPUaccelerated implementation of the Home Assistant Benchmark (HAB). We support realistic low-level control and achieve over 3x the speed of previous magical grasp implementations at similar GPU memory usage. Second, we train extensive reinforcement learning (RL) and imitation learning (IL) baselines for future work to compare against. Finally, we develop a rule-based trajectory filtering system to sample specific demonstrations from our RL policies which match predefined criteria for robot behavior and safety. Combining demonstration filtering with our fast environments enables efficient, controlled data generation at scale. An important goal of embodied AI is to create robots that can solve manipulation tasks in home-scale environments. Recently, faster and more realistic simulation, home-scale rearrangement benchmarks, and large robot datasets have provided important platforms to accelerate research towards this goal. However, there remains a need for all of these features in one unified benchmark. Fast Manipulation Simulation with Realistic Physics and Rendering: Using ManiSkill3 (Tao et al., 2024), we implement a GPU-accelerated version of the HAB (Szot et al., 2021), an apartmentscale rearrangement benchmark containing three long-horizon tasks using the Fetch mobile manipulator (ZebraTechnologies, 2024). While the original HAB uses magical grasp (teleport closest object within 15cm to the gripper), we require realistic grasping.
Federated Unlearning Model Recovery in Data with Skewed Label Distributions
Yu, Xinrui, Pei, Wenbin, Xue, Bing, Zhang, Qiang
In federated learning, federated unlearning is a technique that provides clients with a rollback mechanism that allows them to withdraw their data contribution without training from scratch. However, existing research has not considered scenarios with skewed label distributions. Unfortunately, the unlearning of a client with skewed data usually results in biased models and makes it difficult to deliver high-quality service, complicating the recovery process. This paper proposes a recovery method of federated unlearning with skewed label distributions. Specifically, we first adopt a strategy that incorporates oversampling with deep learning to supplement the skewed class data for clients to perform recovery training, therefore enhancing the completeness of their local datasets. Afterward, a density-based denoising method is applied to remove noise from the generated data, further improving the quality of the remaining clients' datasets. Finally, all the remaining clients leverage the enhanced local datasets and engage in iterative training to effectively restore the performance of the unlearning model. Extensive evaluations on commonly used federated learning datasets with varying degrees of skewness show that our method outperforms baseline methods in restoring the performance of the unlearning model, particularly regarding accuracy on the skewed class.
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Chang, Tyler A., Rajagopal, Dheeraj, Bolukbasi, Tolga, Dixon, Lucas, Tenney, Ian
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples, and the application of these methods to large language model (LLM) outputs could significantly advance model transparency and data curation. However, it has been challenging to date to apply these methods to the full scale of LLM pretraining. In this paper, we refine existing gradient-based methods to work effectively at scale, allowing us to retrieve influential examples for an 8B-parameter language model from a pretraining corpus of over 160B tokens with no need for subsampling or pre-filtering. Our method combines several techniques, including optimizer state correction, a task-specific Hessian approximation, and normalized encodings, which we find to be critical for performance at scale. In quantitative evaluations on a fact tracing task, our method performs best at identifying examples that influence model predictions, but classical, model-agnostic retrieval methods such as BM25 still perform better at finding passages which explicitly contain relevant facts. These results demonstrate a misalignment between factual *attribution* and causal *influence*. With increasing model size and training tokens, we find that influence more closely aligns with factual attribution. Finally, we examine different types of examples identified as influential by our method, finding that while many directly entail a particular fact, others support the same output by reinforcing priors on relation types, common entities, and names. We release our prompt set and model outputs, along with a web-based visualization tool to explore influential examples for factual predictions, commonsense reasoning, arithmetic, and open-ended generation for an 8B-parameter LLM.
The Clear Sky Corridor: Insights Towards Aerosol Formation in Exoplanets Using An AI-based Survey of Exoplanet Atmospheres
Ashtari, Reza, Stevenson, Kevin B., Sing, David, Lopez-Morales, Mercedes, Alam, Munazza K., Nikolov, Nikolay K., Evans-Soma, Thomas M.
Producing optimized and accurate transmission spectra of exoplanets from telescope data has traditionally been a manual and labor-intensive procedure. Here we present the results of the first attempt to improve and standardize this procedure using artificial intelligence (AI) based processing of light curves and spectroscopic data from transiting exoplanets observed with the Hubble Space Telescope's (HST) Wide Field Camera 3 (WFC3) instrument. We implement an AI-based parameter optimizer that autonomously operates the Eureka pipeline to produce homogeneous transmission spectra of publicly available HST WFC3 datasets, spanning exoplanet types from hot Jupiters to sub-Neptunes. Surveying 42 exoplanets with temperatures between 280 and 2580 Kelvin, we confirm modeled relationships between the amplitude of the water band at 1.4um in hot Jupiters and their equilibrium temperatures. We also identify a similar, novel trend in Neptune/sub-Neptune atmospheres, but shifted to cooler temperatures. Excitingly, a planet mass versus equilibrium temperature diagram reveals a "Clear Sky Corridor," where planets between 700 and 1700 Kelvin (depending on the mass) show stronger 1.4um H2O band measurements. This novel trend points to metallicity as a potentially important driver of aerosol formation. As we unveil and include these new discoveries into our understanding of aerosol formation, we enter a thrilling future for the study of exoplanet atmospheres. With HST sculpting this foundational understanding for aerosol formation in various exoplanet types, ranging from Jupiters to sub-Neptunes, we present a compelling platform for the James Webb Space Telescope (JWST) to discover similar atmospheric trends for more planets across a broader wavelength range.
Factored space models: Towards causality between levels of abstraction
Garrabrant, Scott, Mayer, Matthias Georg, Wache, Magdalena, Lang, Leon, Eisenstat, Sam, Dell, Holger
Causality plays an important role in understanding intelligent behavior, and there is a wealth of literature on mathematical models for causality, most of which is focused on causal graphs. Causal graphs are a powerful tool for a wide range of applications, in particular when the relevant variables are known and at the same level of abstraction. However, the given variables can also be unstructured data, like pixels of an image. Meanwhile, the causal variables, such as the positions of objects in the image, can be arbitrary deterministic functions of the given variables. Moreover, the causal variables may form a hierarchy of abstractions, in which the macro-level variables are deterministic functions of the micro-level variables. Causal graphs are limited when it comes to modeling this kind of situation. In the presence of deterministic relationships there is generally no causal graph that satisfies both the Markov condition and the faithfulness condition. We introduce factored space models as an alternative to causal graphs which naturally represent both probabilistic and deterministic relationships at all levels of abstraction. Moreover, we introduce structural independence and establish that it is equivalent to statistical independence in every distribution that factorizes over the factored space. This theorem generalizes the classical soundness and completeness theorem for d-separation.
Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models
Background: Despite the current ubiquity of Large Language Models (LLMs) across the medical domain, there is a surprising lack of studies which address their reasoning behaviour. We emphasise the importance of understanding reasoning behaviour as opposed to high-level prediction accuracies, since it is equivalent to explainable AI (XAI) in this context. In particular, achieving XAI in medical LLMs used in the clinical domain will have a significant impact across the healthcare sector. Results: Therefore, we define the concept of reasoning behaviour in the specific context of medical LLMs. We then categorise and discuss the current state of the art of methods which evaluate reasoning behaviour in medical LLMs. Finally, we propose theoretical frameworks which can empower medical professionals or machine learning engineers to gain insight into the low-level reasoning operations of these previously obscure models. Conclusion: The subsequent increased transparency and trust in medical machine learning models by clinicians as well as patients will accelerate the integration, application as well as further development of medical AI for the healthcare system as a whole
Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data
Beeri, Gal, Chamot, Benoit, Latchem, Elena, Venkatesh, Shruthi, Whalan, Sarah, Kruger, Van Zyl, Martino, David
Funding and support: The Generative AI Challenge is funded by grants from the Future Health Research and Innovation Fund (FHRIF), Grant ID IC2023-GAIA/11. Conflict of interest statement: The authors declare no conflicts of interest. Abstract This exploratory pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated disease phenotyping from research survey data. Motivated by the need for efficient and accurate methods to harmonize the growing volume of survey data with standardized disease ontologies, we employed BERN2, a biomedical named entity recognition and normalization model, to extract disease information from the ORIGINS birth cohort survey data. After rigorously evaluating BERN2's performance against a manually curated ground truth dataset, we integrated various LLMs using prompt engineering, Retrieval-Augmented Generation (RAG), and Instructional Fine-Tuning (IFT) to refine the model's outputs. BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with Few Shot Inference and RAG orchestration, further improved accuracy. This approach, especially when incorporating structured examples, logical reasoning prompts, and detailed context, offers a promising avenue for developing tools to enable efficient cohort profiling and data harmonization across large, heterogeneous research datasets. Introduction The increasing availability of research survey data from cohort studies and clinical trials offers unprecedented opportunities to advance biomedical research and improve healthcare (1).
Semantic Role Labeling of NomBank Partitives
Meyers, Adam, Savant, Advait Pravin, Ortega, John E.
This article is about Semantic Role Labeling for English partitive nouns (5%/REL of the price/ARG1; The price/ARG1 rose 5 percent/REL) in the NomBank annotated corpus. Several systems are described using traditional and transformer-based machine learning, as well as ensembling. Our highest scoring system achieves an F1 of 91.74% using "gold" parses from the Penn Treebank and 91.12% when using the Berkeley Neural parser. This research includes both classroom and experimental settings for system development.