Improving Water Quality Time-Series Prediction in Hong Kong using Sentinel-2 MSI Data and Google Earth Engine Cloud Computing
Effective water quality monitoring in coastal regions is crucial due to the progressive deterioration caused by pollution and human activities. To address this, this study develops time-series models to predict chlorophyll-a (Chl-a), suspended solids (SS), and turbidity using Sentinel-2 satellite data and Google Earth Engine (GEE) in the coastal regions of Hong Kong. Leveraging Long Short-Term Memory (LSTM) Recurrent Neural Networks, the study incorporates extensive temporal datasets to enhance prediction accuracy. The models utilize spectral data from Sentinel-2, focusing on optically active components, and demonstrate that selected variables closely align with the spectral characteristics of Chl-a and SS. The results indicate improved predictive performance over previous methods, highlighting the potential for remote sensing technology in continuous and comprehensive water quality assessment.
Towards turbine-location-aware multi-decadal wind power predictions with CMIP6
Effenberger, Nina, Ludwig, Nicole
With the increasing amount of renewable energy in the grid, long-term wind power forecasting for multiple decades becomes more critical. In these long-term forecasts, climate data is essential as it allows us to account for climate change. Yet the resolution of climate models is often very coarse. In this paper, we show that by including turbine locations when downscaling with Gaussian Processes, we can generate valuable aggregate wind power predictions despite the low resolution of the CMIP6 climate models. This work is a first step towards multi-decadal turbine-location-aware wind power forecasting using global climate model output.
LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback
Banerjee, Tanushree, Zhu, Richard, Yang, Runzhe, Narasimhan, Karthik
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. However, understanding the subtleties of complex exchanges in language remains a challenge. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. The framework consists of three stages: suggestion, feedback collection, and modification. In the suggestion stage, a cost-effective language model generates initial predictions based on game state and dialogue. The feedback-collection stage involves a language model providing feedback on these predictions. In the modification stage, a more advanced language model refines the initial predictions using the auto-generated feedback. We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players. The LLM-generated feedback exhibits superior quality and significantly enhances the performance of the model. Our approach achieves a 39% improvement over the zero-shot baseline in lying-F1 without the need for any training data, rivaling state-of-the-art supervised learning results.
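The three stages described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `bootstrap_prediction` and the three callables standing in for language models are hypothetical names, and the real prompts, game-state encoding, and model calls are not given in the abstract.

```python
from typing import Callable, Dict

# Each "model" is any callable mapping a prompt string to a response string.
# These are hypothetical stand-ins for actual LLM calls.
Model = Callable[[str], str]


def bootstrap_prediction(
    game_state: str,
    dialogue: str,
    suggester: Model,
    critic: Model,
    refiner: Model,
) -> Dict[str, str]:
    """Run suggestion -> feedback collection -> modification once."""
    context = f"Game state: {game_state}\nDialogue: {dialogue}"

    # Stage 1 (suggestion): a cheaper model proposes an initial prediction.
    initial = suggester(context)

    # Stage 2 (feedback collection): a model critiques that prediction.
    feedback = critic(f"{context}\nPrediction: {initial}")

    # Stage 3 (modification): a stronger model revises using the feedback.
    final = refiner(
        f"{context}\nPrediction: {initial}\nFeedback: {feedback}"
    )

    return {"initial": initial, "feedback": feedback, "final": final}
```

Keeping the three roles as separate callables makes it possible to swap a cheaper model into the first stage and a stronger one into the last, which is the cost split the abstract describes.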
FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation
Sign language translation has historically been peripheral to mainstream machine translation research. In order to help converge the fields, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. FLEURS-ASL can be used to evaluate a variety of tasks -- primarily sentence- and discourse-level translation -- between ASL and 200 other languages as text, or 102 languages as speech. We provide baselines for tasks from ASL to English text using a unified modeling approach that incorporates timestamp tokens and previous text tokens in a 34-second context window, trained on random video clips from YouTube-ASL. This model meets or exceeds the performance of phrase-level baselines while supporting a multitude of new tasks. We also use FLEURS-ASL to show that multimodal frontier models have virtually no understanding of ASL, underscoring the importance of including sign languages in standard evaluation suites.
How far can Ukraine's military go inside Russia?
When President Vladimir Putin launched Russia's so-called "special military operation" in Ukraine two-and-a-half years ago, he expected a speedy victory. Not only did that not happen, but Ukraine has now brought the war home to Russia. Moscow has faced one of its biggest drone attacks of the war, according to the city's mayor. Meanwhile, Ukraine's incursion into the Kursk region caught Russia by surprise. Has Ukraine's bold move put on hold discussions about a stalemate and possible negotiations involving concessions to Russia?
Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing
Zhang, Mengqi, Fang, Bowen, Liu, Qiang, Ren, Pengjie, Wu, Shu, Chen, Zhumin, Wang, Liang
Large language models (LLMs) face challenges with internal knowledge inaccuracies and outdated information. Knowledge editing has emerged as a pivotal approach to mitigate these issues. Although current knowledge editing techniques exhibit promising performance in single-hop reasoning tasks, they show limitations when applied to multi-hop reasoning. Drawing on cognitive neuroscience and the operational mechanisms of LLMs, we hypothesize that the residual single-hop knowledge after editing causes edited models to revert to their original answers when processing multi-hop questions, thereby undermining their performance in multi-hop reasoning tasks. To test this hypothesis, we conduct a series of experiments that empirically confirm it. Building on the validated hypothesis, we propose a novel knowledge editing method that incorporates a Knowledge Erasure mechanism for Large language model Editing (KELE). Specifically, we design an erasure function for residual knowledge and an injection function for new knowledge. Through joint optimization, we derive the optimal recall vector, which is subsequently utilized within a rank-one editing framework to update the parameters of targeted model layers. Extensive experiments on GPT-J and GPT-2 XL demonstrate that KELE substantially enhances the multi-hop reasoning capability of edited LLMs.
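To make the phrase "rank-one editing" concrete, the sketch below applies the simplest unweighted rank-one update, W' = W + (v - Wk)kᵀ / (kᵀk), which forces the edited matrix to map a key vector k to a target vector v. This is an assumption chosen for illustration: the abstract does not give KELE's actual update rule, its erasure and injection objectives, or how the recall vector is optimized, and the name `rank_one_edit` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def rank_one_edit(W: Matrix, k: Vector, v: Vector) -> Matrix:
    """Return W + (v - W k) k^T / (k^T k), so that the result maps k to v.

    Illustrative only: here v plays the role of a target ("recall") vector
    and k the role of the key that selects the fact being edited.
    """
    k_norm_sq = sum(x * x for x in k)
    if k_norm_sq == 0:
        raise ValueError("key vector must be non-zero")

    # Current output of the layer for this key.
    Wk = [sum(w * x for w, x in zip(row, k)) for row in W]

    # Residual between the desired output and the current output.
    residual = [vi - wki for vi, wki in zip(v, Wk)]

    # Add the outer product residual * k^T, scaled by 1 / (k^T k).
    return [
        [w + r * x / k_norm_sq for w, x in zip(row, k)]
        for row, r in zip(W, residual)
    ]
```

Inputs orthogonal to k are left unchanged by this update, which is why rank-one edits can target one association while disturbing others as little as possible.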
Reading with Intent
Reichman, Benjamin, Talamadupula, Kartik, Jawale, Toshish, Heck, Larry
Retrieval augmented generation (RAG) systems augment how knowledgeable language models are by integrating external information sources such as Wikipedia, internal documents, scientific papers, or the open internet. RAG systems that rely on the open internet as their knowledge source have to contend with the complexities of human-generated content. Human communication extends much deeper than just the words rendered as text. Intent, tonality, and connotation can all change the meaning of what is being conveyed. Recent real-world deployments of RAG systems have shown some difficulty in understanding these nuances of human communication. One significant challenge for these systems lies in processing sarcasm. Though the Large Language Models (LLMs) that make up the backbone of these RAG systems are able to detect sarcasm, they currently do not always use these detections for the subsequent processing of text. To address these issues, in this paper, we synthetically generate sarcastic passages from the Natural Questions Wikipedia retrieval corpus. We then test the impact of these passages on the performance of both the retriever and reader portions of the RAG pipeline. We introduce a prompting system designed to enhance the model's ability to interpret and generate responses in the presence of sarcasm, thus improving overall system performance. Finally, we conduct ablation studies to validate the effectiveness of our approach, demonstrating improvements in handling sarcastic content within RAG systems.
Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models
Chen, Yuyan, Wu, Chenwei, Yan, Songzhou, Liu, Panjun, Zhou, Haoyu, Xiao, Yanghua
Teachers play an important role in imparting knowledge and guiding learners, and the role of large language models (LLMs) as potential educators is emerging as an important area of study. Recognizing LLMs' capability to generate educational content can lead to advances in automated and personalized learning. While LLMs have been tested for their comprehension and problem-solving skills, their capability in teaching remains largely unexplored. In teaching, questioning is a key skill that guides students to analyze, evaluate, and synthesize core concepts and principles. Therefore, our research introduces a benchmark that evaluates the questioning capability of LLMs as teachers by assessing the educational questions they generate, utilizing Anderson and Krathwohl's taxonomy across general, monodisciplinary, and interdisciplinary domains. We shift the focus from LLMs as learners to LLMs as educators, assessing their teaching capability by guiding them to generate questions. We apply four metrics (relevance, coverage, representativeness, and consistency) to evaluate the educational quality of LLMs' outputs. Our results indicate that GPT-4 demonstrates significant potential in teaching general, humanities, and science courses; Claude2 appears more apt as an interdisciplinary teacher. Furthermore, the automatic scores align with human perspectives.
Exploiting Large Language Models Capabilities for Question Answer-Driven Knowledge Graph Completion Across Static and Temporal Domains
Yang, Rui, Zhu, Jiahao, Man, Jianping, Fang, Li, Zhou, Yi
Knowledge graph completion (KGC) aims to identify missing triples in a knowledge graph (KG). This is typically achieved through tasks such as link prediction and instance completion. However, these methods often focus on either static knowledge graphs (SKGs) or temporal knowledge graphs (TKGs), addressing only within-scope triples. This paper introduces a new generative completion framework called Generative Subgraph-based KGC (GS-KGC). GS-KGC employs a question-answering format to directly generate target entities, addressing the challenge of questions having multiple possible answers. We propose a strategy that extracts subgraphs centered on entities and relationships within the KG, from which negative samples and neighborhood information are separately obtained to address the one-to-many problem. Our method generates negative samples using known facts to facilitate the discovery of new information. Furthermore, we collect and refine neighborhood path data of known entities, providing contextual information to enhance reasoning in large language models (LLMs). Our experiments evaluated the proposed method on four SKGs and two TKGs, achieving state-of-the-art Hits@1 metrics on five datasets. Analysis of the results shows that GS-KGC can discover new triples within existing KGs and generate new facts beyond the closed KG, effectively bridging the gap between closed-world and open-world KGC.
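For readers unfamiliar with the Hits@1 metric mentioned above, a minimal sketch follows. It assumes each query comes with a ranked list of generated entities and a set of acceptable gold entities (reflecting the one-to-many setting); the function name `hits_at_k` and the data layout are hypothetical, and the paper's exact evaluation protocol, such as any filtering of known triples, is not described in the abstract.

```python
from typing import Sequence, Set


def hits_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[Set[str]],
    k: int = 1,
) -> float:
    """Fraction of queries whose top-k predictions contain a gold entity."""
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must align")
    if not ranked_predictions:
        return 0.0

    # A query counts as a hit if any of its first k predictions is correct.
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if any(p in gold for p in preds[:k])
    )
    return hits / len(ranked_predictions)
```

With `k=1`, only the single top-ranked entity per query is checked, which is the strictest setting of the metric.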
Development of an AI Anti-Bullying System Using Large Language Model Key Topic Detection
Tassava, Matthew, Kolodjski, Cameron, Milbrath, Jordan, Bishop, Adorah, Flanders, Nathan, Fetsch, Robbie, Hanson, Danielle, Straub, Jeremy
Cyberbullying has become a pronounced problem due to the increasing ubiquity of online platforms that provide a means to conduct it. A significant amount of this cyberbullying is conducted by and targets teenagers. It is difficult for teenage students to shut themselves off from the digital world in which the cyberbullying is taking place. Given how entrenched digital apps are among today's youth, and given its pronounced consequences, which in some cases include victim self-harm, cyberbullying is at least as much of a threat as physical bullying. Additionally, because of the obfuscation caused by the online environment, authorities (such as parents, teachers and law enforcement) may have difficulty determining what has occurred and who the participating actors are.