Atlantic Ocean
Russia launches dozens of drones into Ukraine in latest air raid: Kyiv
Russia launched 36 drone attacks overnight on Ukraine, according to Kyiv's air force, in Moscow's latest air raid targeting the country. Ukraine's air force said in a statement on Tuesday that its defence systems had destroyed 27 of the drones. The attacks using Iran-made Shahed drones targeted the Odesa, Mykolaiv and Kherson regions of Ukraine, the air force said on the Telegram messaging app. Moscow launched a total of 36 Iranian-made drones from the Russia-annexed Crimean peninsula, it added. The air force did not say which targets, if any, the nine other drones may have hit.
MoT: Memory-of-Thought Enables ChatGPT to Self-Improve
Large Language Models (LLMs) have shown impressive abilities in various tasks. However, fundamentally improving them depends on high-quality datasets or computationally expensive fine-tuning. In contrast, humans can easily improve themselves through self-thinking and memory, without external resources. In this paper, we propose a framework, MoT, to let the LLM self-improve through Memory-of-Thought, without annotated datasets or parameter updates. Specifically, MoT is divided into two stages: (1) before the test stage, the LLM pre-thinks on the unlabeled dataset and saves the high-confidence thoughts as external memory; (2) during the test stage, given a test question, the LLM recalls relevant memory to help itself reason and answer it. Experimental results show that MoT can help ChatGPT significantly improve its abilities in arithmetic reasoning, commonsense reasoning, factual reasoning, and natural language inference. Further analyses show that each component contributes critically to the improvements and that MoT can lead to consistent improvements across various CoT methods and LLMs.
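The two stages described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how confidence or relevance is measured, so the sketch assumes majority-vote agreement among sampled thoughts as the confidence signal and word-overlap (Jaccard) similarity for recall. The names `build_memory`, `recall`, `sample_thoughts` and `min_agreement` are hypothetical, and the LLM is replaced by canned outputs.

```python
from collections import Counter
from typing import Callable, List, Tuple

Thought = Tuple[str, str]          # (reasoning text, final answer)
MemoryItem = Tuple[str, str, str]  # (question, reasoning, answer)


def build_memory(
    questions: List[str],
    sample_thoughts: Callable[[str], List[Thought]],
    min_agreement: float = 0.6,
) -> List[MemoryItem]:
    """Stage 1 (assumed form): keep a thought only when enough samples agree."""
    memory: List[MemoryItem] = []
    for question in questions:
        thoughts = sample_thoughts(question)  # stand-in for sampling from an LLM
        if not thoughts:
            continue
        answer, votes = Counter(a for _, a in thoughts).most_common(1)[0]
        if votes / len(thoughts) >= min_agreement:
            reasoning = next(r for r, a in thoughts if a == answer)
            memory.append((question, reasoning, answer))
    return memory


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; an assumed stand-in for a real retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def recall(memory: List[MemoryItem], question: str, k: int = 2) -> List[MemoryItem]:
    """Stage 2 (assumed form): return the k stored items most similar to the question."""
    return sorted(memory, key=lambda m: jaccard(m[0], question), reverse=True)[:k]


# Canned "LLM" samples, purely for demonstration.
canned = {
    "What is 2 plus 3?": [("2 + 3 = 5", "5"), ("2 + 3 = 5", "5"), ("2 + 3 = 6", "6")],
    "Is a tomato a fruit?": [("Botanically yes", "yes"), ("It is used as a vegetable", "no")],
}
memory = build_memory(list(canned), canned.__getitem__)
hits = recall(memory, "What is 4 plus 3?", k=1)
```

In this toy run the second question is dropped because its samples split evenly, so only the arithmetic item is stored and later recalled as a demonstration for the new question.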
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels
Long, Da, Xing, Wei W., Krishnapriyan, Aditi S., Kirby, Robert M., Zhe, Shandian, Mahoney, Michael W.
Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise. We combine it with a Bayesian spike-and-slab prior -- an ideal Bayesian sparse distribution -- for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.
Automatic Evaluation of Attribution by Large Language Models
Yue, Xiang, Wang, Boshi, Chen, Ziru, Zhang, Kai, Su, Yu, Sun, Huan
A recent focus of large language model (LLM) development, as exemplified by generative search engines, is to incorporate external references to generate and support its claims. However, evaluating the attribution, i.e., verifying whether the generated statement is fully supported by the cited reference, remains an open problem. Although human evaluation is common practice, it is costly and time-consuming. In this paper, we investigate the automatic evaluation of attribution given by LLMs. We begin by defining different types of attribution errors, and then explore two approaches for automatic evaluation: prompting LLMs and fine-tuning smaller LMs. The fine-tuning data is repurposed from related tasks such as question answering, fact-checking, natural language inference, and summarization. We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing. Our results on this curated test set and simulated examples from existing benchmarks highlight both promising signals and challenges. We hope our problem formulation, testbeds, and findings will help lay the foundation for future studies on this important problem.
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval
Yan, Qi, Seraj, Raihan, He, Jiawei, Meng, Lili, Sylvain, Tristan
Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, Zou et al. (2022) unveil AutoCast, a new benchmark that employs news articles for answering forecasting queries. Nevertheless, existing methods still trail behind human performance. The cornerstone of accurate forecasting, we argue, lies in identifying a concise, yet rich subset of news snippets from a vast corpus. With this motivation, we introduce AutoCast++, a zero-shot ranking-based context retrieval system, tailored to sift through expansive news document collections for event forecasting. Our approach first re-ranks articles based on zero-shot question-passage relevance, homing in on semantically pertinent news. Following this, the chosen articles are subjected to zero-shot summarization to attain succinct context. Leveraging a pre-trained language model, we conduct both the relevance evaluation and article summarization without needing domain-specific training. Notably, recent articles can sometimes be at odds with preceding ones due to new facts or unanticipated incidents, leading to fluctuating temporal dynamics. To tackle this, our re-ranking mechanism gives preference to more recent articles, and we further regularize the multi-passage representation learning to align with human forecaster responses made on different dates. Empirical results underscore marked improvements across multiple metrics, improving the performance for multiple-choice questions (MCQ) by 48% and true/false (TF) questions by up to 8%.
NATO testing underwater drones off the coast of Europe to deter Russia
NATO Secretary General Jens Stoltenberg shares why it's important for America to stay in the fight between Russia and Ukraine on One Nation. NATO is testing new sea drones that can use artificial intelligence to detect suspicious activity near underwater infrastructure. Fourteen members of the NATO alliance, along with Sweden, have teamed up for multiple exercises over 12 days off the coast of Portugal to test underwater sea drones that have real-time ability to send "a deterrence signal to the enemy, be it Russia or somebody else," said Lt. Gen. Hans-Werner Wiermann, head of NATO's cell for protecting undersea infrastructure, according to a report from Bloomberg. The exercises, dubbed Dynamic Messenger 23 and Robotic Experimentation and Prototyping with Maritime Unmanned Systems (REPMUS 23), will bring together over 2,000 civilian and military personnel with a focus on integrating maritime unmanned systems into the alliance's operations and testing new technologies that are currently under development. NATO personnel test new underwater drone technology during Dynamic Messenger 23 and REPMUS 23 exercises.
Russia-Ukraine war: List of key events, day 586
Ukraine said its air defence systems shot down 16 of about 30 drones launched by Russia on Sunday. Authorities said civilian infrastructure and grain storage warehouses were damaged in the Cherkasy region as well as the southern Mykolaiv and eastern Dnipropetrovsk regions. Russia's defence ministry said its forces' air defences in eastern Ukraine had intercepted five United States-made HIMARS shells, an air-launched JDAM bomb and 37 Ukrainian drones. Kyiv began a counteroffensive in June to retake Ukrainian land occupied by Russia since it launched its full-scale invasion of the country in February 2022. Russia's defence ministry said it shot down six Ukrainian drones over Russian regions and two Ukrainian missiles over Crimea, which Moscow annexed from Ukraine in 2014.
Making Retrieval-Augmented Language Models Robust to Irrelevant Context
Yoran, Ori, Wolfson, Tomer, Ram, Ori, Berant, Jonathan
Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not. This is particularly important in multi-hop reasoning scenarios, where misuse of irrelevant evidence can lead to cascading errors. However, recent work has shown that retrieval augmentation can sometimes have a negative effect on performance. In this work, we present a thorough analysis on five open-domain question answering benchmarks, characterizing cases when retrieval reduces accuracy. We then propose two methods to mitigate this issue. First, a simple baseline that filters out retrieved passages that do not entail question-answer pairs according to a natural language inference (NLI) model. This is effective in preventing performance reduction, but at a cost of also discarding relevant passages. Thus, we propose a method for automatically generating data to fine-tune the language model to properly leverage retrieved passages, using a mix of relevant and irrelevant contexts at training time. We empirically show that even 1,000 examples suffice to train the model to be robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
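The NLI-based filtering baseline mentioned above might look roughly like the following. This is a hedged sketch rather than the paper's code: `filter_passages` and the `entails` callable are hypothetical names, the question-answer pair is turned into a hypothesis by naive concatenation (an assumption), and `toy_entails` is a crude keyword check standing in for a real NLI model.

```python
from typing import Callable, List


def filter_passages(
    passages: List[str],
    question: str,
    answer: str,
    entails: Callable[[str, str], bool],
) -> List[str]:
    """Keep only passages that the (assumed) NLI judge says entail the QA pair."""
    hypothesis = f"{question} {answer}"  # naive QA-to-statement conversion (assumption)
    return [p for p in passages if entails(p, hypothesis)]


def _content_words(text: str) -> set:
    """Lowercased words longer than four characters, with punctuation stripped."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in words if len(w) > 4}


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Stand-in for an NLI model: true if the premise covers the hypothesis keywords."""
    return _content_words(hypothesis) <= _content_words(premise)


passages = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
kept = filter_passages(passages, "What is the capital of France?", "Paris", toy_entails)
```

Here only the first passage survives. A strict judge of this kind will also reject relevant passages that are phrased differently, which mirrors the cost the abstract attributes to this baseline.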
Language Model Decoding as Direct Metrics Optimization
Ji, Haozhe, Ke, Pei, Wang, Hongning, Huang, Minlie
Despite the remarkable advances in language modeling, current mainstream decoding methods still struggle to generate texts that align with human texts across different aspects. In particular, sampling-based methods produce less-repetitive texts which are often disjunctive in discourse, while search-based methods maintain topic coherence at the cost of increased repetition. Overall, these methods fall short in achieving holistic alignment across a broad range of aspects. In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts measured by multiple metrics of desired aspects simultaneously. The resulting decoding distribution enjoys an analytical solution that scales the input language model distribution via a sequence-level energy function defined by these metrics. And most importantly, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts. To facilitate tractable sampling from this globally normalized distribution, we adopt the Sampling-Importance-Resampling technique. Experiments on various domains and model scales demonstrate the superiority of our method in metrics alignment with human texts and human evaluation over strong baselines.
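For readers unfamiliar with Sampling-Importance-Resampling, a generic sketch follows. It is not the authors' decoder. It assumes the target distribution is proportional to the proposal probability times exp(-E(x)), so that candidates drawn from the proposal get importance weights exp(-E(x)); the sign convention, the function names `sir_sample` and `repetition_energy`, and the toy energy are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Optional


def sir_sample(
    propose: Callable[[random.Random], str],
    energy: Callable[[str], float],
    num_candidates: int = 16,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw candidates from the proposal, weight by exp(-energy), resample one."""
    rng = rng or random.Random()
    candidates = [propose(rng) for _ in range(num_candidates)]
    energies = [energy(c) for c in candidates]
    e_min = min(energies)  # shift so the largest weight is exp(0) = 1 (numerical stability)
    weights = [math.exp(-(e - e_min)) for e in energies]
    return rng.choices(candidates, weights=weights, k=1)[0]


def repetition_energy(text: str) -> float:
    """Toy sequence-level energy: one unit per repeated token (assumption)."""
    tokens = text.split()
    return float(len(tokens) - len(set(tokens)))


# Toy proposal: uniform choice among a few fixed strings, in place of a language model.
pool = ["the the the the", "the cat sat down", "a dog ran ran"]
sample = sir_sample(lambda r: r.choice(pool), repetition_energy, rng=random.Random(0))
```

Because the weights only depend on the energy, candidates with lower energy are resampled more often, and the approximation to the target improves as `num_candidates` grows.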
Knowledge Engineering for Wind Energy
Marykovskiy, Yuriy, Clark, Thomas, Day, Justin, Wiens, Marcus, Henderson, Charles, Quick, Julian, Abdallah, Imad, Sempreviva, Anna Maria, Calbimonte, Jean-Paul, Chatzi, Eleni, Barber, Sarah
Vast amounts of data generated by various sources, including sensors and other monitoring systems, need to be effectively structured and represented in a way that can be easily understood and processed by both Artificial Intelligence (AI) systems and humans. The digitalisation of the wind energy sector is one of the key drivers for reducing costs and risks over the whole wind energy project life cycle [2]. The digitalisation process encompasses solutions such as digital twins, decision support systems and AI systems, some of which still need to be developed, in order to reduce operation and maintenance costs, increase the amount of energy delivered, and maximise the efficiency of wind energy systems. In this context, the term Knowledge-Based Systems (KBS) refers to AI systems that formalize knowledge as rules, logical expressions, and conceptualisations [3, 4]. Such systems can be realised as AI-enabled digital twins or decision support systems that rely on databases of knowledge (also referred to as knowledge bases or knowledge graphs), which contain machine-readable facts, rules, and logics about a domain of interest, to assist with problem-solving and decision-making [5].