Goto

Collaborating Authors

 Indian Ocean


Breaking Focus: Contextual Distraction Curse in Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) (Zhou et al., 2023b) have demonstrated remarkable capabilities across various Natural Language Processing (NLP) tasks, revolutionizing wide downstream applications such as medicine (Zhao et al., 2023), education (Kasneci et al., 2023), and science (Li et al., 2024b; Guo et al., 2023; Huang et al., 2024e). Despite their impressive performance, recent studies have exposed various vulnerabilities in LLMs, including susceptibility to jailbreaking attacks (Zou et al., 2023), hallucination issues (Xu et al., 2024b), and consistency problems (Liang et al., 2024; Huang et al., 2024a). These vulnerabilities highlight the limitations of LLMs in handling nuanced and adversarial scenarios, making it critical to uncover and analyze additional weaknesses to improve their reliability. In this work, we investigate a novel vulnerability termed Contextual Distraction Vulnerability (CDV), where semantically coherent but non-essential contextual additions to a question degrade LLM performance. For instance, a customer service chatbot might miss a refund request hidden in a short story about discovering products through social media influencers. Similarly, a technical query about machine learning could be misunderstood if it's preceded by a student's emotional account of exam preparation anxiety. Unlike adversarial attacks that inject semantically meaningless noise into inputs (Zou et al., 2023; Shi et al., 2024) and distraction brought by long-context input (Bai et al., 2023), for CDV, our study demonstrates that semantically coherent without a long context yet contextually distracting modifications are sufficient to disrupt the decision-making process of even the most advanced LLMs. This vulnerability underscores a critical weakness in LLMs' ability to filter out irrelevant information and prioritize core knowledge, which is essential for robust reasoning. Recent studies have demonstrated the powerful generative capabilities of LLM Xu et al. (2024a); Wu et al. (2024), To systematically investigate this vulnerability, we propose a methodology for


VQLTI: Long-Term Tropical Cyclone Intensity Forecasting with Physical Constraints

arXiv.org Artificial Intelligence

Tropical cyclone (TC) intensity forecasting is crucial for early disaster warning and emergency decision-making. Numerous researchers have explored deep-learning methods to address computational and post-processing issues in operational forecasting. Regrettably, they exhibit subpar long-term forecasting capabilities. We use two strategies to enhance long-term forecasting. (1) By enhancing the matching between TC intensity and spatial information, we can improve long-term forecasting performance. (2) Incorporating physical knowledge and physical constraints can help mitigate the accumulation of forecasting errors. To achieve the above strategies, we propose the VQLTI framework. VQLTI transfers the TC intensity information to a discrete latent space while retaining the spatial information differences, using large-scale spatial meteorological data as conditions. Furthermore, we leverage the forecast from the weather prediction model FengWu to provide additional physical knowledge for VQLTI. Additionally, we calculate the potential intensity (PI) to impose physical constraints on the latent variables. In the global long-term TC intensity forecasting, VQLTI achieves state-of-the-art results for the 24h to 120h, with the MSW (Maximum Sustained Wind) forecast error reduced by 35.65%-42.51% compared to ECMWF-IFS.


Fundamental Computational Limits in Pursuing Invariant Causal Prediction and Invariance-Guided Regularization

arXiv.org Machine Learning

Pursuing invariant prediction from heterogeneous environments opens the door to learning causality in a purely data-driven way and has several applications in causal discovery and robust transfer learning. However, existing methods such as ICP [Peters et al., 2016] and EILLS [Fan et al., 2024] that can attain sample-efficient estimation are based on exponential time algorithms. In this paper, we show that such a problem is intrinsically hard in computation: the decision problem, testing whether a non-trivial prediction-invariant solution exists across two environments, is NP-hard even for the linear causal relationship. In the world where P$\neq$NP, our results imply that the estimation error rate can be arbitrarily slow using any computationally efficient algorithm. This suggests that pursuing causality is fundamentally harder than detecting associations when no prior assumption is pre-offered. Given there is almost no hope of computational improvement under the worst case, this paper proposes a method capable of attaining both computationally and statistically efficient estimation under additional conditions. Furthermore, our estimator is a distributionally robust estimator with an ellipse-shaped uncertain set where more uncertainty is placed on spurious directions than invariant directions, resulting in a smooth interpolation between the most predictive solution and the causal solution by varying the invariance hyper-parameter. Non-asymptotic results and empirical applications support the claim.


Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization

arXiv.org Artificial Intelligence

Ensuring contextual faithfulness in retrieval-augmented large language models (LLMs) is crucial for building trustworthy information-seeking systems, particularly in long-form question-answering (LFQA) scenarios. In this work, we identify a salient correlation between LFQA faithfulness and retrieval heads, a set of attention heads responsible for retrieving contextual information. Leveraging this insight, we propose RHIO, a framework designed to teach LLMs to explicitly discriminate between faithful and unfaithful generations. RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads. Then, these samples are incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens. Furthermore, these control tokens are leveraged to self-induce contrastive outputs, amplifying their difference through contrastive decoding. Additionally, to facilitate the evaluation of contextual faithfulness, we also introduce GroundBench, a comprehensive benchmark compiled from five existing LFQA datasets. Extensive experimental results on GroundBench demonstrate that RHIO significantly improves faithfulness, even outperforming GPT-4o.


Enhancing kelp forest detection in remote sensing images using crowdsourced labels with Mixed Vision Transformers and ConvNeXt segmentation models

arXiv.org Artificial Intelligence

Kelp forests, as foundation species, are vital to marine ecosystems, providing essential food and habitat for numerous organisms. This study explores the integration of crowdsourced labels with advanced artificial intelligence models to develop a fast and accurate kelp canopy detection pipeline using Landsat images. Building on the success of a machine learning competition, where this approach ranked third and performed consistently well on both local validation and public and private leaderboards, the research highlights the effectiveness of combining Mixed Vision Transformers (MIT) with ConvNeXt models. Training these models on various image sizes significantly enhanced the accuracy of the ensemble results. U-Net emerged as the best segmentation architecture, with UpperNet also contributing to the final ensemble. Key Landsat bands, such as ShortWave InfraRed (SWIR1) and Near-InfraRed (NIR), were crucial while altitude data was used in postprocessing to eliminate false positives on land. The methodology achieved a high detection rate, accurately identifying about three out of four pixels containing kelp canopy while keeping false positives low. Despite the medium resolution of Landsat satellites, their extensive historical coverage makes them effective for studying kelp forests. This work also underscores the potential of combining machine learning models with crowdsourced data for effective and scalable environmental monitoring. All running code for training all models and inference can be found at https://github.com/IoannisNasios/Kelp_Forests.


Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces

arXiv.org Artificial Intelligence

Rhetorical devices are difficult to translate, but they are crucial to the translation of literary documents. We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. To do so, we use Biblical texts, which are both full of intertextual references and are highly translated works. We provide a metric to characterize intertextuality at the corpus level and provide a quantitative analysis of the preservation of this rhetorical device across extant human translations and machine-generated counterparts. We go on to provide qualitative analysis of cases wherein human translations over- or underemphasize the intertextuality present in the text, whereas machine translations provide a neutral baseline. This provides support for established scholarship proposing that human translators have a propensity to amplify certain literary characteristics of the original manuscripts.


LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering

arXiv.org Artificial Intelligence

Multiple Choice Question Answering (MCQA) is an important problem with numerous real-world applications, such as medicine, law, and education. The high cost of building MCQA datasets makes few-shot learning pivotal in this domain. While Large Language Models (LLMs) can enable few-shot learning, their direct application in real-world scenarios is often hindered by their high computational cost. To address this challenge, we propose a simple yet effective approach that uses LLMs for data generation and scoring. Our approach utilizes LLMs to create MCQA data which contains questions and choices, and to assign probability scores to the generated choices. We then use the generated data and LLM-assigned scores to finetune a smaller and more efficient encoder-only model, DeBERTa-v3-base by leveraging distillation loss. Extensive experiments on the Massive Multitask Language Understanding (MMLU) benchmark demonstrate that our method improves accuracy from 28.9% to 39.3%, representing a gain of over 10% compared to a baseline finetuned directly on 5-shot examples. This shows the effectiveness of LLM-driven data generation and knowledge distillation for few-shot MCQA.


2025 in SPACEFLIGHT: The incredible missions set to take off next year, revealed - from China's daring asteroid retrieval to the first private trip to Venus

Daily Mail - Science & tech

From NASA's mission to study Jupiter's icy moon Europa to Elon Musk's SpaceX catching its Starship rocket mid-air, there's no doubt 2024 saw some incredible space feats. 'In 2024, NASA made leap after giant leap to explore, discover, and inspire – all while bringing real, tangible, and substantial benefits to the American people and to all of humanity,' said NASA Administrator Bill Nelson. And 2025 is set to be an even more remarkable year for space agencies and companies around the world, who have an assortment of exciting missions lined up. Among them are NASA, which is sending two twin spacecraft to Mars – although its upcoming return to the moon has been delayed yet again. There's also the European Space Agency, which is set to launch its futuristic'Space Rider' spaceplane – described as a'robotic laboratory the size of two minivans'.


Logic-Constrained Shortest Paths for Flight Planning

arXiv.org Artificial Intelligence

The Logic-Constrained Shortest Path Problem (LCSP) combines a one-to-one shortest path problem with satisfiability constraints imposed on the routing graph. This setting arises in flight planning, where air traffic control (ATC) authorities are enforcing a set of traffic flow restrictions (TFRs) on aircraft routes in order to increase safety and throughput. We propose a new branch and bound-based algorithm for the LCSP. The resulting algorithm has three main degrees of freedom: the node selection rule, the branching rule and the conflict. While node selection and branching rules have been long studied in the MIP and SAT communities, most of them cannot be applied out of the box for the LCSP. We review the existing literature and develop tailored variants of the most prominent rules. The conflict, the set of variables to which the branching rule is applied, is unique to the LCSP. We analyze its theoretical impact on the B&B algorithm. In the second part of the paper, we show how to model the Flight Planning Problem with TFRs as an LCSP and solve it using the branch and bound algorithm. We demonstrate the algorithm's efficiency on a dataset consisting of a global flight graph and a set of around 20000 real TFRs obtained from our industry partner Lufthansa Systems GmbH. We make this dataset publicly available. Finally, we conduct an empirical in-depth analysis of node selection rules, branching rules and conflicts. Carefully choosing an appropriate combination yields an improvement of an order of magnitude compared to an uninformed choice.


Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity

arXiv.org Artificial Intelligence

This paper proposes dialogue as a method for evaluating generative AI tools for culturally-situated creative practice, that recognizes the socially situated nature of art. Drawing on sociologist Howard Becker's concept of Art Worlds, this method expands the scope of traditional AI and creativity evaluations beyond benchmarks, user studies with crowd-workers, or focus groups conducted with artists. Our method involves two mutually informed dialogues: 1) 'dialogues with art worlds' placing artists in conversation with experts such as art historians, curators, and archivists, and 2)'dialogues with the machine,' facilitated through structured artist- and critic-led experimentation with state-of-the-art generative AI tools. We demonstrate the value of this method through a case study with artists and experts steeped in non-western art worlds, specifically the Persian Gulf. We trace how these dialogues help create culturally rich and situated forms of evaluation for representational possibilities of generative AI that mimic the reception of generative artwork in the broader art ecosystem. Putting artists in conversation with commentators also allow artists to shift their use of the tools to respond to their cultural and creative context. Our study can provide generative AI researchers an understanding of the complex dynamics of technology, human creativity and the socio-politics of art worlds, to build more inclusive machines for diverse art worlds.