Antarctica
Unraveling Anomalies in Time: Unsupervised Discovery and Isolation of Anomalous Behavior in Bio-regenerative Life Support System Telemetry
Rewicki, Ferdinand, Gawlikowski, Jakob, Niebling, Julia, Denzler, Joachim
Bio-regenerative Life Support Systems (BLSSs) are artificial ecosystems that consist of multiple symbiotic relationships. BLSSs are crucial for sustaining long-duration space missions by facilitating food production and managing essential material cycles for respiratory air, water, biomass, and waste. The EDEN NEXT GEN Project, part of the EDEN roadmap at the German Aerospace Center (DLR), aims to develop a fully integrated ground demonstrator of a BLSS comprising all subsystems, with the ultimate goal of realizing a flight-ready BLSS within the next decade. This initiative builds upon insights from the EDEN ISS project, which investigated controlled environment agriculture (CEA) technologies for space exploration. EDEN ISS, a near-closed-loop research greenhouse deployed in Antarctica from 2017 to 2021, focused on crop production, including lettuces, bell peppers, leafy greens, and various herbs. To ensure the safe and stable operation of BLSSs, we explore methods to mitigate risks regarding system health, particularly regarding food production and nourishment shortages for isolated crews.
Embedding machine-learnt sub-grid variability improves climate model biases
Giles, Daniel, Briant, James, Morcrette, Cyril J., Guillas, Serge
The under-representation of cloud formation is a long-standing bias associated with climate simulations. Parameterisation schemes are required to capture cloud processes within current climate models but have known biases. We overcome these biases by embedding a Multi-Output Gaussian Process (MOGP) trained on high resolution Unified Model simulations to represent the variability of temperature and specific humidity within a climate model. A trained MOGP model is coupled in-situ with a simplified Atmospheric General Circulation Model named SPEEDY. The temperature and specific humidity profiles of SPEEDY are perturbed at fixed intervals according to the variability predicted from the MOGP. Ten-year predictions are generated for both control and ML-hybrid models. The hybrid model reduces the global precipitation bias by 18\% and over the tropics by 22\%. To further understand the drivers of these improvements, physical quantities of interest are explored, such as the distribution of lifted index values and the alteration of the Hadley cell. The control and hybrid set-ups are also run in a plus 4K sea-surface temperature experiment to explore the effects of the approach on patterns relating to cloud cover and precipitation in a warmed climate setting.
You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes
Magomere, Jabez, Ishida, Shu, Afonja, Tejumade, Salama, Aya, Kochin, Daniel, Yuehgoh, Foutse, Hamzaoui, Imane, Sefala, Raesetje, Alaagib, Aisha, Semenova, Elizaveta, Crais, Lauren, Hall, Siobhan Mackenzie
Foundation models are increasingly ubiquitous in our daily lives, used in everyday tasks such as text-image searches, interactions with chatbots, and content generation. As use increases, so does concern over the disparities in performance and fairness of these models for different people in different parts of the world. To assess these growing regional disparities, we present World Wide Dishes, a mixed text and image dataset consisting of 765 dishes, with dish names collected in 131 local languages. World Wide Dishes has been collected purely through human contribution and decentralised means, by creating a website widely distributed through social networks. Using the dataset, we demonstrate a novel means of operationalising capability and representational biases in foundation models such as language models and text-to-image generative models. We enrich these studies with a pilot community review to understand, from a first-person perspective, how these models generate images for people in five African countries and the United States. We find that these models generally do not produce quality text and image outputs of dishes specific to different regions. This is true even for the US, which is typically considered to be more well-resourced in training data - though the generation of US dishes does outperform that of the investigated African countries. The models demonstrate a propensity to produce outputs that are inaccurate as well as culturally misrepresentative, flattening, and insensitive. These failures in capability and representational bias have the potential to further reinforce stereotypes and disproportionately contribute to erasure based on region. The dataset and code are available at https://github.com/oxai/world-wide-dishes/.
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Hou, Bairu, Zhang, Yang, Andreas, Jacob, Chang, Shiyu
This paper focuses on the task of hallucination detection, which aims to determine the truthfulness of LLM-generated statements. To address this problem, a popular class of methods utilize the LLM's self-consistencies in its beliefs in a set of logically related augmented statements generated by the LLM, which does not require external knowledge databases and can work with both white-box and black-box LLMs. However, in many existing approaches, the augmented statements tend to be very monotone and unstructured, which makes it difficult to integrate meaningful information from the LLM beliefs in these statements. Also, many methods work with the binarized version of the LLM's belief, instead of the continuous version, which significantly loses information. To overcome these limitations, in this paper, we propose Belief Tree Propagation (BTProp), a probabilistic framework for LLM hallucination detection. BTProp introduces a belief tree of logically related statements by recursively decomposing a parent statement into child statements with three decomposition strategies, and builds a hidden Markov tree model to integrate the LLM's belief scores in these statements in a principled way. Experiment results show that our method improves baselines by 3%-9% (evaluated by AUROC and AUC-PR) on multiple hallucination detection benchmarks. Code is available at https://github.com/UCSB-NLP-Chang/BTProp.
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Kim, Joongwon, Paranjape, Bhargavi, Khot, Tushar, Hajishirzi, Hannaneh
Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to address a diverse set of complex tasks involving numerical, tabular, and knowledge-based reasoning. Husky iterates between two stages: 1) generating the next action to take towards solving a given task and 2) executing the action using expert models and updating the current solution state. We identify a thorough ontology of actions for addressing complex tasks and curate high-quality data to train expert models for executing these actions. Our experiments show that Husky outperforms prior language agents across 14 evaluation datasets. Moreover, we introduce HuskyQA, a new evaluation set which stress tests language agents for mixed-tool reasoning, with a focus on retrieving missing knowledge and performing numerical reasoning. Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems. Our code and models are available at https://github.com/agent-husky/Husky-v1.
Stochastic Guidance of Buoyancy Controlled Vehicles under Ice Shelves using Ocean Currents
Rossi, Federico, Branch, Andrew, Schodlok, Michael P., Stanton, Timothy, Fenty, Ian G., Hook, Joshua Vander, Clark, Evan B.
We propose a novel technique for guidance of buoyancy-controlled vehicles in uncertain under-ice ocean flows. In-situ melt rate measurements collected at the grounding zone of Antarctic ice shelves, where the ice shelf meets the underlying bedrock, are essential to constrain models of future sea level rise. Buoyancy-controlled vehicles, which control their vertical position in the water column through internal actuation but have no means of horizontal propulsion, offer an affordable and reliable platform for such in-situ data collection. However, reaching the grounding zone requires vehicles to traverse tens of kilometers under the ice shelf, with approximate position knowledge and no means of communication, in highly variable and uncertain ocean currents. To address this challenge, we propose a partially observable MDP approach that exploits model-based knowledge of the under-ice currents and, critically, of their uncertainty, to synthesize effective guidance policies. The approach uses approximate dynamic programming to model uncertainty in the currents, and QMDP to address localization uncertainty. Numerical experiments show that the policy can deliver up to 88.8% of underwater vehicles to the grounding zone -- a 33% improvement compared to state-of-the-art guidance techniques, and a 262% improvement over uncontrolled drifters. Collectively, these results show that model-based under-ice guidance is a highly promising technique for exploration of under-ice cavities, and has the potential to enable cost-effective and scalable access to these challenging and rarely observed environments.
Woman critically injured in ride 'malfunction'
Woman critically injured in ride'malfunction' 10 hours agoShareBBCOn Sunday tarpaulin was seen around one of the rides, although it is not clear which ride suffered the malfunction A woman in her 40s is in hospital with life-threatening injuries after a funfair ride malfunctioned at a country show in south London, the Met Police has said. The incident happened during Lambeth Country Show in Brockwell Park at about 18:20 BST on Saturday. A man in his 40s is also being treated for "potentially life-threatening injuries", the force said. Lambeth Council said the investigation would "determine the cause of the malfunction". Two other people, a man in his 50s and an 11-year-old girl, were injured in the incident and have since been discharged from hospital, the Met said.
ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models
Heng, Yuzhao, Deng, Chunyuan, Li, Yitong, Yu, Yue, Li, Yinghao, Zhang, Rongzhi, Zhang, Chao
Although Large Language Models (LLMs) exhibit remarkable adaptability across domains, these models often fall short in structured knowledge extraction tasks such as named entity recognition (NER). This paper explores an innovative, cost-efficient strategy to harness LLMs with modest NER capabilities for producing superior NER datasets. Our approach diverges from the basic class-conditional prompts by instructing LLMs to self-reflect on the specific domain, thereby generating domain-relevant attributes (such as category and emotions for movie reviews), which are utilized for creating attribute-rich training data. Furthermore, we preemptively generate entity terms and then develop NER context data around these entities, effectively bypassing the LLMs' challenges with complex structures. Our experiments across both general and niche domains reveal significant performance enhancements over conventional data generation methods while being more cost-effective than existing alternatives.
VERA: Generating Visual Explanations of Two-Dimensional Embeddings via Region Annotation
Poličar, Pavlin G., Zupan, Blaž
Two-dimensional embeddings obtained from dimensionality reduction techniques, such as MDS, t-SNE, and UMAP, are widely used across various disciplines to visualize high-dimensional data. These visualizations provide a valuable tool for exploratory data analysis, allowing researchers to visually identify clusters, outliers, and other interesting patterns in the data. However, interpreting the resulting visualizations can be challenging, as it often requires additional manual inspection to understand the differences between data points in different regions of the embedding space. To address this issue, we propose Visual Explanations via Region Annotation (VERA), an automatic embedding-annotation approach that generates visual explanations for any two-dimensional embedding. VERA produces informative explanations that characterize distinct regions in the embedding space, allowing users to gain an overview of the embedding landscape at a glance. Unlike most existing approaches, which typically require some degree of manual user intervention, VERA produces static explanations, automatically identifying and selecting the most informative visual explanations to show to the user. We illustrate the usage of VERA on a real-world data set and validate the utility of our approach with a comparative user study. Our results demonstrate that the explanations generated by VERA are as useful as fully-fledged interactive tools on typical exploratory data analysis tasks but require significantly less time and effort from the user.
WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing
Hu, Chenhui, Cao, Pengfei, Chen, Yubo, Liu, Kang, Zhao, Jun
Knowledge editing aims to rectify inaccuracies in large language models (LLMs) without costly retraining for outdated or erroneous knowledge. However, current knowledge editing methods primarily focus on single editing, failing to meet the requirements for lifelong editing. This study reveals a performance degradation encountered by knowledge editing in lifelong editing, characterized by toxicity buildup and toxicity flash, with the primary cause identified as pattern unmatch. We introduce a knowledge editing approach named Wise-Layer Knowledge Editor (WilKE), which selects editing layer based on the pattern matching degree of editing knowledge across different layers in language models. Experimental results demonstrate that, in lifelong editing, WilKE exhibits an average improvement of 46.2% and 67.8% on editing GPT2-XL and GPT-J relative to state-of-the-art knowledge editing methods.