A Comprehensive Survey on Multi-hop Machine Reading Comprehension Datasets and Metrics
Mohammadi, Azade, Ramezani, Reza, Baraani, Ahmad
Abstract: Multi-hop machine reading comprehension (MRC) is a challenging task that aims to answer a question based on disjoint pieces of information spread across different passages. Evaluation metrics and datasets are a vital part of multi-hop MRC because models cannot be trained or evaluated without them; in addition, the challenges posed by datasets are often an important motivation for improving existing models. Given the increasing attention to this field, reviewing them in detail is both necessary and worthwhile. This study presents a comprehensive survey of recent advances in multi-hop MRC evaluation metrics and datasets. First, the multi-hop MRC problem definition is presented; then the evaluation metrics are investigated with respect to their multi-hop aspect. In addition, 15 multi-hop datasets from 2017 to 2022 are reviewed in detail, followed by a comprehensive analysis. Finally, open issues in this field are discussed.
Keywords: Multi-hop Machine Reading Comprehension, Multi-hop Machine Reading Comprehension Dataset, Natural Language Processing
1-INTRODUCTION
Machine reading comprehension (MRC) is one of the most important and long-standing topics in Natural Language Processing (NLP). MRC provides a way to evaluate an NLP system's capability for natural language understanding. An MRC task, in brief, refers to the ability of a computer to read and understand natural language context and then find the answer to questions about that context. The emergence of large-scale single-document MRC datasets, such as SQuAD (Rajpurkar et al., 2016) and CNN/Daily Mail (Hermann et al., 2015), has led to increased attention to this topic, and different models have been proposed to address the MRC problem, such as (D. However, for many of these datasets, it has been found that models do not need to comprehend and reason to answer a question.
For example, Khashabi et al. (2016) showed that adversarial perturbation of candidate answers degrades the performance of QA systems. Similarly, Jia & Liang (2017) showed that adding an adversarial sentence to the SQuAD (Rajpurkar et al., 2016) context lowers the performance of many existing models.
Towards Automatic Cetacean Photo-Identification: A Framework for Fine-Grain, Few-Shot Learning in Marine Ecology
Trotter, Cameron, Wright, Nick, McGough, A. Stephen, Sharpe, Matt, Cheney, Barbara, Civil, Mònica Arso, Moore, Reny Tyson, Allen, Jason, Berggren, Per
Photo-identification (photo-id) is one of the main non-invasive capture-recapture methods utilised by marine researchers for monitoring cetacean (dolphin, whale, and porpoise) populations. This method has historically been performed manually, resulting in high workload and cost due to the vast number of images collected. Recently, automated aids have been developed to help speed up photo-id, although they are often disjoint in their processing and do not utilise all available identifying information. Work presented in this paper aims to create a fully automatic photo-id aid capable of providing most likely matches based on all available information without the need for data pre-processing such as cropping. This is achieved through a pipeline of computer vision models and post-processing techniques aimed at detecting cetaceans in unedited field imagery before passing them downstream for individual-level catalogue matching. The system is capable of handling previously uncatalogued individuals and flagging these for investigation thanks to catalogue similarity comparison. We evaluate the system against multiple real-life photo-id catalogues, achieving mAP@IOU[0.5] = 0.91 and 0.96 for the task of dorsal fin detection on catalogues from Tanzania and the UK respectively, and 83.1% and 97.5% top-10 accuracy for the task of individual classification on catalogues from the UK and USA.
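The two kinds of figures reported above can be illustrated with a minimal sketch. The box format (x1, y1, x2, y2), the helper names `iou` and `top_k_accuracy`, and the idea of a ranked candidate list per query image are assumptions made for illustration; they are not taken from the paper's pipeline. Under the mAP@IOU[0.5] criterion, a predicted fin box would count as a match only when its IoU with a ground-truth box is at least 0.5.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k_accuracy(
    ranked_ids: Sequence[Sequence[str]],
    true_ids: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of queries whose true identity is among the first k candidates."""
    hits = sum(
        1 for ranked, true in zip(ranked_ids, true_ids) if true in ranked[:k]
    )
    return hits / len(true_ids)
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` falls below the 0.5 threshold, so that pair would not count as a correct detection.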
A Deep Learning Architecture for Passive Microwave Precipitation Retrievals using CloudSat and GPM Data
Rahimi, Reyhaneh, Vahedizadeh, Sajad, Ebtehaj, Ardeshir
This paper presents an algorithm that relies on a series of dense and deep neural networks for passive microwave retrieval of precipitation. The neural networks learn from coincidences of brightness temperatures from the Global Precipitation Measurement (GPM) Microwave Imager (GMI) with the active precipitating retrievals from the Dual-frequency Precipitation Radar (DPR) onboard GPM as well as those from the CloudSat Profiling Radar (CPR). The algorithm first detects the precipitation occurrence and phase and then estimates its rate, while conditioning the results to some key ancillary information including parameters related to cloud microphysical properties. The results indicate that we can reconstruct the DPR rainfall and CPR snowfall with a detection probability of more than 0.95 while the probability of a false alarm remains below 0.08 and 0.03, respectively. Conditioned to the occurrence of precipitation, the unbiased root mean squared error in estimation of rainfall (snowfall) rate using DPR (CPR) data is less than 0.8 (0.1) mm/hr over oceans and land. Beyond methodological developments, comparing the results with ERA5 reanalysis and official GPM products demonstrates that the uncertainty in global satellite snowfall retrievals continues to be large while there is a good agreement among rainfall products. Moreover, the results indicate that CPR active snowfall data can improve passive microwave estimates of global snowfall while the current CPR rainfall retrievals should only be used for detection and not estimation of rates.
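The detection and estimation scores quoted above can be sketched as follows. The abstract does not define them, so the formulas here are assumptions: detection probability is taken as hits / (hits + misses), the probability of a false alarm as false alarms / (false alarms + correct negatives), and the unbiased RMSE as the RMSE after removing the mean error. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def detection_scores(
    predicted: Sequence[bool], observed: Sequence[bool]
) -> Tuple[float, float]:
    """Return (detection probability, false alarm probability).

    Assumed definitions, not confirmed by the source:
      detection probability   = hits / (hits + misses)
      false alarm probability = false alarms / (false alarms + correct negatives)
    """
    pairs = [(bool(p), bool(o)) for p, o in zip(predicted, observed)]
    hits = sum(1 for p, o in pairs if p and o)
    misses = sum(1 for p, o in pairs if not p and o)
    false_alarms = sum(1 for p, o in pairs if p and not o)
    correct_negatives = sum(1 for p, o in pairs if not p and not o)
    pod = hits / (hits + misses) if hits + misses else 0.0
    negatives = false_alarms + correct_negatives
    pofa = false_alarms / negatives if negatives else 0.0
    return pod, pofa


def unbiased_rmse(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """RMSE of the errors after subtracting their mean (the bias)."""
    errors = [e - r for e, r in zip(estimated, reference)]
    bias = sum(errors) / len(errors)
    return math.sqrt(sum((err - bias) ** 2 for err in errors) / len(errors))
```

With these definitions, an estimator that is off by a constant amount everywhere has an unbiased RMSE of zero, which is why the metric is reported separately from bias.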
WikiWhy: Answering and Explaining Cause-and-Effect Questions
Ho, Matthew, Sharma, Aditya, Chang, Justin, Saxon, Michael, Levy, Sharon, Lu, Yujie, Wang, William Yang
As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit rationales for each answer to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition, leaving significant room for future improvements.
Holonomic Control of Arbitrary Configurations of Docked Modboats
Qiao, Zhijie, Knizhnik, Gedaliah, Yim, Mark
The Modboat is a low-cost, underactuated, modular robot capable of surface swimming, docking to other modules, and undocking from them using only a single motor and two passive flippers. Undocking is achieved by causing intentional self-collision between the tails of neighboring modules in certain configurations; this becomes a challenge, however, when collective swimming as one connected component is desirable. Prior work has developed controllers that turn arbitrary configurations of docked Modboats into steerable vehicles, but they cannot counteract lateral forces and disturbances. In this work we present a centralized control strategy to create holonomic vehicles out of arbitrary configurations of docked Modboats using an iterative potential-field based search. We experimentally demonstrate that our controller performs well and can control surge and sway velocities and yaw angle simultaneously.
A.R. Rahman, Shekhar Kapur Talk Metaverse, VR, AI at Goa Festival - Variety
That machines can never replace human creativity and that technology should be in mankind's service were the biggest takeaways from a heavyweight panel looking to the future of content at the International Film Festival of India, Goa, on Sunday. The panel was devised and led by eminent filmmaker Shekhar Kapur (Red Sea Film Festival opener "What's Love Got to Do with It?"). Participants included Oscar-winning "Slumdog Millionaire" composer A.R. Rahman, Ronald Menzel, co-founder and chief strategy officer at Dreamscape Immersive, with tech maven Pranav Mistry, who was formerly CEO and president of Samsung Technology and Advanced Research, joining via video link. The panelists discussed the concept of the metaverse, which is still in its nascency. Mistry envisaged a future powered by VR, AR and AI where the audience participated in an MCU movie and solved world problems.
Searching for Discriminative Words in Multidimensional Continuous Feature Space
Sajgalik, Marius, Barla, Michal, Bielikova, Maria
Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it has become possible to train them on much more data, which has also improved the quality of the learned features. Since this training learns the joint probability of latent features of words, it has the advantage that it requires no prior knowledge about the goal task we want to solve. We aim to evaluate the universal applicability of feature vectors, which has already been shown to hold for many standard NLP tasks such as part-of-speech tagging and syntactic parsing. In our case, we want to understand the topical focus of text documents and design an efficient representation suitable for discriminating between different topics. Discriminativeness can be evaluated adequately on the text categorisation task. We propose a novel method to extract discriminative keywords from documents. We utilise word feature vectors to better understand the relations between words and also the latent topics that are discussed in the text, which are not mentioned directly but can be inferred logically. We also present a simple way to calculate document feature vectors from the extracted discriminative words. We evaluate our method on the four most popular datasets for text categorisation. We show how different discriminative metrics influence the overall results. We demonstrate the effectiveness of our approach by achieving state-of-the-art results on the text categorisation task using just a small number of extracted keywords. We prove that word feature vectors can substantially improve the topical inference of documents' meaning. We conclude that distributed representations of words can be used to build higher levels of abstraction, as we demonstrate by building feature vectors of documents.
High-precision Density Mapping of Marine Debris and Floating Plastics via Satellite Imagery
Booth, Henry, Ma, Wanli, Karakus, Oktay
Combining multi-spectral satellite data and machine learning has been suggested as a method for monitoring plastic pollutants in the ocean environment. Recent studies have made theoretical progress regarding the identification of marine plastic via machine learning. However, no study has assessed the application of these methods for mapping and monitoring marine-plastic density. As such, this paper comprises three main components: (1) the development of a machine learning model, (2) the construction of the MAP-Mapper, an automated tool for mapping marine-plastic density, and finally (3) an evaluation of the whole system for out-of-distribution test locations. The findings from this paper leverage the fact that machine learning models need to be high-precision to reduce the impact of false positives on results. The developed MAP-Mapper architectures give users the choice of reaching high-precision (abbv. -HP) or optimum precision-recall (abbv. -Opt) values in terms of the training/test data set. Our MAP-Mapper-HP model greatly increased the precision of plastic detection to 95%, whilst MAP-Mapper-Opt reaches a precision-recall pair of 87%-88%. The MAP-Mapper contributes to the literature with the first tool to exploit advanced deep/machine learning and multi-spectral imagery to map marine-plastic density in automated software. The proposed data pipeline takes a novel approach to mapping plastic density in ocean regions. As such, this enables an initial assessment of the challenges and opportunities of this method to help guide future work and scientific study.
PESE: Event Structure Extraction using Pointer Network based Encoder-Decoder Architecture
Kuila, Alapan, Sarkar, Sudeshan
The task of event extraction (EE) aims to find the events and event-related argument information from the text and represent them in a structured format. Most previous works try to solve the problem by separately identifying multiple substructures and aggregating them to get the complete event structure. The problem with these methods is that they fail to identify all the interdependencies among the event participants (event-triggers, arguments, and roles). In this paper, we represent each event record in a unique tuple format that contains trigger phrase, trigger type, argument phrase, and corresponding role information. Our proposed pointer network-based encoder-decoder model generates an event tuple in each time step by exploiting the interactions among event participants, presenting a truly end-to-end solution to the EE task. We evaluate our model on the ACE2005 dataset, and experimental results demonstrate the effectiveness of our model by achieving competitive performance compared to the state-of-the-art methods.
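The tuple format described above can be pictured with a small data-structure sketch. The class name `EventTuple`, its field names, the helper `group_by_trigger`, and the example sentence and labels are all hypothetical; the abstract only states that each tuple carries a trigger phrase, trigger type, argument phrase, and role. The sketch shows how a flat sequence of such tuples, one per decoding step, could be folded back into per-event structures.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EventTuple:
    """One decoded record: (trigger phrase, trigger type, argument phrase, role)."""
    trigger: str
    trigger_type: str
    argument: str
    role: str


def group_by_trigger(
    tuples: List[EventTuple],
) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Collect (argument, role) pairs under their (trigger, trigger type) key."""
    events = defaultdict(list)
    for t in tuples:
        events[(t.trigger, t.trigger_type)].append((t.argument, t.role))
    return dict(events)


# Hypothetical decoder output for "Rebels attacked the convoy in Baghdad".
decoded = [
    EventTuple("attacked", "Attack", "Rebels", "Attacker"),
    EventTuple("attacked", "Attack", "the convoy", "Target"),
    EventTuple("attacked", "Attack", "Baghdad", "Place"),
]
events = group_by_trigger(decoded)
```

Because every tuple repeats its trigger and type, no separate aggregation stage over independently predicted substructures is needed; grouping is a single pass over the output.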
RankGen: Improving Text Generation with Large Ranking Models
Krishna, Kalpesh, Chang, Yapei, Wieting, John, Iyyer, Mohit
Given an input sequence (or prefix), modern language models often assign high probabilities to output sequences that are repetitive, incoherent, or irrelevant to the prefix; as such, model-generated text also contains such artifacts. To address these issues we present RankGen, a 1.2B parameter encoder model for English that scores model generations given a prefix. RankGen can be flexibly incorporated as a scoring function in beam search and used to decode from any pretrained language model. We train RankGen using large-scale contrastive learning to map a prefix close to the ground-truth sequence that follows it and far away from two types of negatives: (1) random sequences from the same document as the prefix, and (2) sequences generated from a large language model conditioned on the prefix. Experiments across four different language models (345M-11B parameters) and two domains show that RankGen significantly outperforms decoding algorithms like nucleus, top-k, and typical sampling, as well as contrastive decoding and search, on both automatic metrics (85.0 vs 77.3 MAUVE over nucleus) as well as human evaluations with English writers (74.5% human preference over nucleus sampling). Analysis reveals that RankGen outputs are more relevant to the prefix and improve continuity and coherence compared to baselines. We release our model checkpoints, code, and human preference data with explanations to facilitate future research.
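The idea of using a ranking model as a scoring function over candidate continuations can be sketched as below. This is an illustrative assumption, not RankGen's released API: the vectors stand in for encoder outputs, the score is assumed to be a dot product between a prefix vector and each candidate vector, and the names `dot` and `rerank` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rerank(
    prefix_vec: Vector,
    candidates: List[str],
    candidate_vecs: List[Vector],
) -> List[str]:
    """Order candidate continuations from highest to lowest score.

    The score is assumed to be the dot product between the prefix vector
    and each candidate vector; a real system would obtain these vectors
    from a trained encoder.
    """
    scored = sorted(
        zip(candidates, candidate_vecs),
        key=lambda pair: dot(prefix_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in scored]
```

In a beam-search setting, the same scoring step would be applied to partial generations at intervals, keeping only the top-scoring ones before generation continues.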