Antarctica
Assessing the Robustness of Retrieval-Augmented Generation Systems in K-12 Educational Question Answering with Knowledge Discrepancies
Zheng, Tianshi, Li, Weihan, Bai, Jiaxin, Wang, Weiqi, Song, Yangqiu
Retrieval-Augmented Generation (RAG) systems have demonstrated remarkable potential as question answering systems in the K-12 Education domain, where knowledge is typically queried within the restricted scope of authoritative textbooks. However, the discrepancy between textbooks and the parametric knowledge in Large Language Models (LLMs) could undermine the effectiveness of RAG systems. To systematically investigate the robustness of RAG systems under such knowledge discrepancies, we present EduKDQA, a question answering dataset that simulates knowledge discrepancies in real applications by applying hypothetical knowledge updates in answers and source documents. EduKDQA includes 3,005 questions covering five subjects, under a comprehensive question typology from the perspective of context utilization and knowledge integration. We conducted extensive experiments on retrieval and question answering performance. We find that most RAG systems suffer from a substantial performance drop in question answering with knowledge discrepancies, while questions that require integration of contextual knowledge and parametric knowledge pose a challenge to LLMs.
Hershey shares jump on Cadbury owner buyout report
Hershey shares jump on Cadbury owner buyout report Getty ImagesMondelez has reportedly made a preliminary approach to the maker of the iconic Hershey's milk chocolate bar Shares in US chocolate maker Hershey have jumped by more than 10% after a report that Mondelez International, which owns UK-based Cadbury, has approached the firm about a potential buyout. A deal could create a snack food giant with combined sales of almost 50bn ( 39.2bn) a year. Both Mondelez and Hershey declined to comment on the report when contacted by BBC News. In 2016, Hershey rejected a 23bn takeover offer from Mondelez. The approach is still in the preliminary stages and it is not certain that talks will lead to a deal, according to Bloomberg.
How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning
Wang, Yuanyuan, Song, Qian, Wasif, Dawood, Shahzad, Muhammad, Koller, Christoph, Bamber, Jonathan, Zhu, Xiao Xiang
Uncertainty quantification (UQ) is essential for assessing the reliability of Earth observation (EO) products. However, the extensive use of machine learning models in EO introduces an additional layer of complexity, as those models themselves are inherently uncertain. While various UQ methods do exist for machine learning models, their performance on EO datasets remains largely unevaluated. A key challenge in the community is the absence of the ground truth for uncertainty, i.e. how certain the uncertainty estimates are, apart from the labels for the image/signal. This article fills this gap by introducing three benchmark datasets specifically designed for UQ in EO machine learning models. These datasets address three common problem types in EO: regression, image segmentation, and scene classification. They enable a transparent comparison of different UQ methods for EO machine learning models. We describe the creation and characteristics of each dataset, including data sources, preprocessing steps, and label generation, with a particular focus on calculating the reference uncertainty. We also showcase baseline performance of several machine learning models on each dataset, highlighting the utility of these benchmarks for model development and comparison. Overall, this article offers a valuable resource for researchers and practitioners working in artificial intelligence for EO, promoting a more accurate and reliable quality measure of the outputs of machine learning models. The dataset and code are accessible via https://gitlab.lrz.de/ai4eo/WG_Uncertainty.
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions
Shorinwa, Ola, Mei, Zhiting, Lidard, Justin, Ren, Allen Z., Majumdar, Anirudha
The remarkable performance of large language models (LLMs) in content generation, coding, and common-sense reasoning has spurred widespread integration into many facets of society. However, integration of LLMs raises valid questions on their reliability and trustworthiness, given their propensity to generate hallucinations: plausible, factually-incorrect responses, which are expressed with striking confidence. Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt, driving significant research efforts devoted to quantifying the uncertainty of LLMs. This survey seeks to provide an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features, along with their strengths and weaknesses. We present existing methods within a relevant taxonomy, unifying ostensibly disparate methods to aid understanding of the state of the art. Furthermore, we highlight applications of uncertainty quantification methods for LLMs, spanning chatbot and textual applications to embodied artificial intelligence applications in robotics. We conclude with open research challenges in uncertainty quantification of LLMs, seeking to motivate future research.
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Singh, Shivalika, Romanou, Angelika, Fourrier, Clémentine, Adelani, David I., Ngui, Jian Gang, Vila-Suero, Daniel, Limkonchotiwat, Peerat, Marchisio, Kelly, Leong, Wei Qi, Susanto, Yosephine, Ng, Raymond, Longpre, Shayne, Ko, Wei-Yin, Smith, Madeline, Bosselut, Antoine, Oh, Alice, Martins, Andre F. T., Choshen, Leshem, Ippolito, Daphne, Ferrante, Enzo, Fadaee, Marzieh, Ermis, Beyza, Hooker, Sara
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from language but also from the cultural knowledge required to interpret questions, reducing the practical utility of translated datasets like MMLU. Furthermore, translation often introduces artifacts that can distort the meaning or clarity of questions in the target language. A common practice in multilingual evaluation is to rely on machine-translated evaluation sets, but simply translating a dataset is insufficient to address these challenges. In this work, we trace the impact of both of these issues on multilingual evaluations and ensuing model performances. Our large-scale evaluation of state-of-the-art open and proprietary models illustrates that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge. Moreover, for questions requiring geographic knowledge, an astounding 84.9% focus on either North American or European regions. Rankings of model evaluations change depending on whether they are evaluated on the full portion or the subset of questions annotated as culturally sensitive, showing the distortion to model rankings when blindly relying on translated MMLU. We release Global-MMLU, an improved MMLU with evaluation coverage across 42 languages -- with improved overall quality by engaging with compensated professional and community annotators to verify translation quality while also rigorously evaluating cultural biases present in the original dataset. This comprehensive Global-MMLU set also includes designated subsets labeled as culturally sensitive and culturally agnostic to allow for more holistic, complete evaluation.
LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models
Qharabagh, Muhammad Fetrat, Ghofrani, Mohammadreza, Fountoulakis, Kimon
Counting is a fundamental skill for various visual tasks in real-life applications, requiring both object recognition and robust counting capabilities. Despite their advanced visual perception, large vision-language models (LVLMs) struggle with counting tasks, especially when the number of objects exceeds those commonly encountered during training. We enhance LVLMs' counting abilities using a divide-and-conquer approach, breaking counting problems into sub-counting tasks. Unlike prior methods, which do not generalize well to counting datasets on which they have not been trained, our method performs well on new datasets without any additional training or fine-tuning. We demonstrate that our approach enhances counting capabilities across various datasets and benchmarks.
Third of NI adults visit porn sites, Ofcom finds
Third of NI adults visit porn sites, Ofcom finds Getty ImagesA new Ofcom report finds over 430,000 adults in Northern Ireland visited "pornographic content services" online in May 2024 Adults in Northern Ireland are more likely to look at pornography online than those in any other part of the UK. That is according to new research published by the communications regulator Ofcom. It said that more than 430,000 adults in Northern Ireland visited "pornographic content services" online in May 2024 - more than one third of the adult population. That was higher than the proportion of adults viewing similar content in Wales, Scotland and England. The figures come from Ofcom's Online Nation report for 2024, which looks into the UK's digital habits.
Uncertainty in Supply Chain Digital Twins: A Quantum-Classical Hybrid Approach
Abdullah, Abdullah, Sandjaja, Fannya Ratana, Majeed, Ayesha Abdul, Wickremasinghe, Gyan, Rafferty, Karen, Sharma, Vishal
This study investigates uncertainty quantification (UQ) using quantum-classical hybrid machine learning (ML) models for applications in complex and dynamic fields, such as attaining resiliency in supply chain digital twins and financial risk assessment. Although quantum feature transformations have been integrated into ML models for complex data tasks, a gap exists in determining their impact on UQ within their hybrid architectures (quantum-classical approach). This work applies existing UQ techniques for different models within a hybrid framework, examining how quantum feature transformation affects uncertainty propagation. Increasing qubits from 4 to 16 shows varied model responsiveness to outlier detection (OD) samples, which is a critical factor for resilient decision-making in dynamic environments. This work shows how quantum computing techniques can transform data features for UQ, particularly when combined with traditional methods.
Minecraft enters real world with 110m global theme park deal
The global gaming phenomenon Minecraft is coming to the real world for the first time in a global deal to open themed rides, attractions, hotel rooms and retail outlets, starting with the UK and US. Minecraft has struck a deal with UK-headquartered Merlin Entertainments – Europe's largest theme park operator and the second biggest globally after Disney – which runs more than 135 attractions in 23 countries including Alton Towers, Legoland, Sea Life, Madame Tussauds and the London Eye. Under the terms of the deal, Merlin will invest more than 85m ( 110m) in the first two attractions. They are due to open in the UK and the US in 2026 and 2027, in either an existing theme park or as new city centre attractions. Over the longer term the two companies plan to expand the strategic partnership, which is called "Adventures Made Real", to other countries and territories.
Advancing Marine Heatwave Forecasts: An Integrated Deep Learning Approach
Ning, Ding, Vetrova, Varvara, Koh, Yun Sing, Bryan, Karin R.
Marine heatwaves (MHWs), an extreme climate phenomenon, pose significant challenges to marine ecosystems and industries, with their frequency and intensity increasing due to climate change. This study introduces an integrated deep learning approach to forecast short-to-long-term MHWs on a global scale. The approach combines graph representation for modeling spatial properties in climate data, imbalanced regression to handle skewed data distributions, and temporal diffusion to enhance forecast accuracy across various lead times. To the best of our knowledge, this is the first study that synthesizes three spatiotemporal anomaly methodologies to predict MHWs. Additionally, we introduce a method for constructing graphs that avoids isolated nodes and provide a new publicly available sea surface temperature anomaly graph dataset. We examine the trade-offs in the selection of loss functions and evaluation metrics for MHWs. We analyze spatial patterns in global MHW predictability by focusing on historical hotspots, and our approach demonstrates better performance compared to traditional numerical models in regions such as the middle south Pacific, equatorial Atlantic near Africa, south Atlantic, and high-latitude Indian Ocean. We highlight the potential of temporal diffusion to replace the conventional sliding window approach for long-term forecasts, achieving improved prediction up to six months in advance. These insights not only establish benchmarks for machine learning applications in MHW forecasting but also enhance understanding of general climate forecasting methodologies.