Antarctica
Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers
Awasthi, Abhijeet, Sathe, Ashutosh, Sarawagi, Sunita
Text-to-SQL parsers typically struggle with databases unseen during training. Adapting parsers to new databases is a challenging problem due to the lack of natural language queries in the new schemas. We present ReFill, a framework for synthesizing high-quality and textually diverse parallel datasets for adapting a Text-to-SQL parser to a target schema. ReFill learns to retrieve-and-edit text queries from the existing schemas and transfers them to the target schema. We show that retrieving diverse existing text, masking its schema-specific tokens, and refilling them with tokens relevant to the target schema leads to significantly more diverse text queries than achievable by standard SQL-to-Text generation methods. Through experiments spanning multiple databases, we demonstrate that fine-tuning parsers on datasets synthesized using ReFill consistently outperforms prior data-augmentation methods.
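The mask-and-refill step described above can be illustrated with a toy sketch. This is not ReFill itself: the abstract says both retrieval and editing are learned, whereas the function names `mask_schema_tokens` and `refill`, the `[MASK]` token, and the example schema terms below are all assumptions made for illustration, with a simple left-to-right substitution standing in for the learned refiller.

```python
import re


def mask_schema_tokens(text, schema_terms, mask="[MASK]"):
    """Replace each schema-specific term in `text` with a mask token."""
    # Mask longer terms first so multi-word terms are handled before their parts.
    for term in sorted(schema_terms, key=len, reverse=True):
        pattern = rf"\b{re.escape(term)}\b"
        text = re.sub(pattern, mask, text, flags=re.IGNORECASE)
    return text


def refill(template, target_terms, mask="[MASK]"):
    """Fill masks left to right with target-schema terms.

    Hypothetical stand-in for a learned refilling model.
    """
    for term in target_terms:
        template = template.replace(mask, term, 1)
    return template


# A query written for a (hypothetical) concert database ...
source_query = "How many singers are older than 30?"
template = mask_schema_tokens(source_query, ["singers"])
# ... transferred to a (hypothetical) hospital database.
adapted_query = refill(template, ["patients"])

print(template)
print(adapted_query)
```

The point of the sketch is only that the sentence structure of the retrieved query survives the transfer while its schema-specific vocabulary is swapped out.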
Stanceosaurus: Classifying Stance Towards Multilingual Misinformation
Zheng, Jonathan, Baheti, Ashutosh, Naous, Tarek, Xu, Wei, Ritter, Alan
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims. As far as we are aware, it is the largest corpus annotated with stance towards misinformation claims. The claims in Stanceosaurus originate from 15 fact-checking sources that cover diverse geographical regions and cultures. Unlike existing stance datasets, we introduce a more fine-grained 5-class labeling strategy with additional subcategories to distinguish implicit stance. Pre-trained transformer-based stance classifiers that are fine-tuned on our corpus show good generalization on unseen claims and regional claims from countries outside the training data. Cross-lingual experiments demonstrate Stanceosaurus' capability of training multi-lingual models, achieving 53.1 F1 on Hindi and 50.4 F1 on Arabic without any target-language fine-tuning. Finally, we show how a domain adaptation method can be used to improve performance on Stanceosaurus using additional RumourEval-2019 data. We make Stanceosaurus publicly available to the research community and hope it will encourage further work on misinformation identification across languages and cultures.
Artificial Intelligence and Arms Control
Scharre, Paul, Lamberth, Megan
Potential advancements in artificial intelligence (AI) could have profound implications for how countries research and develop weapons systems, and how militaries deploy those systems on the battlefield. The idea of AI-enabled military systems has motivated some activists to call for restrictions or bans on some weapon systems, while others have argued that AI may be too diffuse to control. This paper argues that while a ban on all military applications of AI is likely infeasible, there may be specific cases where arms control is possible. Throughout history, the international community has attempted to ban or regulate weapons or military systems for a variety of reasons. This paper analyzes both successes and failures and offers several criteria that seem to influence why arms control works in some cases and not others. We argue that success or failure depends on the desirability (i.e., a weapon's military value versus its perceived horribleness) and feasibility (i.e., sociopolitical factors that influence its success) of arms control. Based on these criteria, and the historical record of past attempts at arms control, we analyze the potential for AI arms control in the future and offer recommendations for what policymakers can do today.
The Best Sci-Fi Movies Everyone Should Watch Once
Aliens, astronauts, time travel--you name it, there's a dazzling sci-fi film about it. That makes compiling a list of the best sci-fi nearly impossible: it's hard to know where to start--or where to stop. To understand where sci-fi films came from, you need to head back to the dawn of the cinema age. Right at the beginning, Metropolis, released in 1927, used groundbreaking visuals to create a reference point for all future urban dystopias--it's no fluke, for example, that the aesthetic of Blade Runner bears more than a passing resemblance to Fritz Lang's prophetic city hellscape. Then along came War of the Worlds (1953), a gripping tale of alien invasion adapted from H. G. Wells' classic novel. In 1964, Dr. Strangelove did more than most films before or since to ossify the fear of a nuclear holocaust. Below is WIRED's ever-evolving selection of the sci-fi movies everyone should watch, from the obscure to the hugely influential. You may also enjoy our guides to the best sci-fi books of all time and the best space movies. When Alfonso Cuarón wrote the screenplay for Gravity, he wasn't setting out to make a film about space itself. Rather, he was interested in exploring the concepts of adversity and human resilience, with space as a secondary background. But it was hard for audiences to not be wowed by the visuals in this Oscar-winning film about two scientists (George Clooney and Sandra Bullock) who find themselves stranded in space, and what they must endure in order to get safely back to Earth.
Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation
Xu, Jin, Liu, Xiaojiang, Yan, Jianhao, Cai, Deng, Li, Huayang, Li, Jian
While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context. Through our quantitative experiments, we find that 1) Language models have a preference to repeat the previous sentence; 2) The sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) The sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method DITTO (PseuDo-RepetITion PenalizaTiOn), where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data. Although our method is motivated by mitigating repetitions, experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
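Two ideas from this abstract lend themselves to a small sketch: measuring how often a sentence repeats the one before it, and building pseudo repetitive text by repeating a sentence. The abstract does not give the exact metric or data recipe, so both functions below (`consecutive_repetition_rate`, `make_pseudo_repetition`) are assumed, simplified versions that use exact string matching. The training loss that penalizes repetition probabilities is not shown.

```python
def consecutive_repetition_rate(sentences):
    """Fraction of sentences that exactly repeat the sentence before them."""
    if len(sentences) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(sentences, sentences[1:]) if prev == cur)
    return repeats / (len(sentences) - 1)


def make_pseudo_repetition(sentence, n_repeats):
    """Build a pseudo repetitive sample by repeating one sentence n_repeats times."""
    return " ".join([sentence] * n_repeats)


generated = ["The sky is blue.", "It is cold.", "It is cold.", "It is cold."]
rate = consecutive_repetition_rate(generated)
sample = make_pseudo_repetition("It is cold.", 3)

print(f"repetition rate: {rate:.2f}")
print(sample)
```

In this toy output two of the three sentence transitions are repeats, which is the kind of loop the abstract describes for maximization-based decoding.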
New Zealand: artificial intelligence comes to the rescue of Māui's dolphins - Actu IA
There are more than 30 species of dolphins in the world. One of them, the Māui dolphin, which lives off the west coast of the North Island, New Zealand, faces a threat of extinction. To save this rarest of the world's dolphins, a nonprofit organization has been formed called MAUI63 (Marine Animal Unmanned Identification, with 63 representing the estimated number of Māui dolphins when this initiative began in 2018). The team's scientists and conservationists use an AI-powered drone to locate, track, identify, and ultimately protect Māui and Hector's dolphins. The Māui dolphin population has declined further since the project began: a 2021 survey counted only 54. Hector's and Māui dolphins are small coastal dolphins found only in New Zealand.
DALLE-URBAN: Capturing the urban design expertise of large text to image transformers
Seneviratne, Sachith, Senanayake, Damith, Rasnayaka, Sanka, Vidanaarachchi, Rajith, Thompson, Jason
Automatically converting text descriptions into images using transformer architectures has recently received considerable attention. Such advances have implications for many applied design disciplines across fashion, art, architecture, urban planning, landscape design and the future tools available to such disciplines. However, a detailed analysis capturing the capabilities of such models, specifically with a focus on the built environment, has not been performed to date. In this work, we investigate in detail the capabilities and biases of such text-to-image methods as they apply to the built environment. We use a systematic grammar to generate queries related to the built environment and evaluate the resulting generated images. We generate 1020 different images and find that text-to-image transformers are robust at generating realistic images across different domains for this use case. Generated imagery can be found on GitHub: https://github.com/sachith500/DALLEURBAN
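Generating queries from a systematic grammar can be pictured as expanding a template over every combination of slot values. The abstract does not publish the grammar, so the slots, vocabulary, and template below are hypothetical and are not meant to reproduce the 1020 images reported.

```python
from itertools import product

# Hypothetical slot vocabulary; the real grammar is not given in the abstract.
GRAMMAR = {
    "place": ["street", "park", "plaza"],
    "quality": ["walkable", "green"],
    "city": ["Melbourne", "Colombo"],
}
TEMPLATE = "a photo of a {quality} {place} in {city}"


def generate_queries(grammar, template):
    """Expand the template over every combination of slot values."""
    keys = list(grammar)
    value_lists = [grammar[key] for key in keys]
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*value_lists)
    ]


queries = generate_queries(GRAMMAR, TEMPLATE)
print(len(queries))
print(queries[0])
```

Enumerating the full product keeps the query set balanced across slots, which makes it easier to attribute differences in the generated images to a single slot value.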
Amplitude Scintillation Forecasting Using Bagged Trees
Darya, Abdollah Masoud, Al-Owais, Aisha Abdulla, Shaikh, Muhammad Mubasshir, Fernini, Ilias
Electron density irregularities present within the ionosphere induce significant fluctuations in global navigation satellite system (GNSS) signals. Fluctuations in signal power are referred to as amplitude scintillation and can be monitored through the S4 index. Forecasting the severity of amplitude scintillation based on historical S4 index data is beneficial when real-time data is unavailable. In this work, we study the possibility of using historical data from a single GPS scintillation monitoring receiver to train a machine learning (ML) model to forecast the severity of amplitude scintillation, either weak, moderate, or severe, with respect to temporal and spatial parameters. Six different ML models were evaluated and the bagged trees model was the most accurate among them, achieving a forecasting accuracy of 81% using a balanced dataset, and 97% using an imbalanced dataset.
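For readers unfamiliar with the S4 index, it is conventionally defined as the normalized standard deviation of received signal intensity I over a time window: S4 = sqrt((<I^2> - <I>^2) / <I>^2). The sketch below computes it and maps it to the three severity classes named in the abstract. The class boundaries (0.3 and 0.6) are assumed placeholder values, since the abstract does not state the thresholds used, and the forecasting model itself is not shown.

```python
import math


def s4_index(intensity):
    """Normalized standard deviation of signal intensity over one window."""
    n = len(intensity)
    mean_i = sum(intensity) / n
    mean_i2 = sum(x * x for x in intensity) / n
    # Guard against tiny negative values caused by floating-point rounding.
    variance = max(mean_i2 - mean_i ** 2, 0.0)
    return math.sqrt(variance) / mean_i


def severity(s4, weak_max=0.3, moderate_max=0.6):
    """Map an S4 value to a class label.

    The thresholds are hypothetical; the abstract does not specify them.
    """
    if s4 < weak_max:
        return "weak"
    if s4 < moderate_max:
        return "moderate"
    return "severe"


window = [0.5, 1.5]  # toy intensity samples
s4 = s4_index(window)
print(s4, severity(s4))
```

Labels produced this way from historical windows are the kind of target a classifier such as bagged trees would be trained to forecast.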
Data-driven Loop Closure Detection in Bathymetric Point Clouds for Underwater SLAM
Tan, Jiarui, Torroba, Ignacio, Xie, Yiping, Folkesson, John
Simultaneous localization and mapping (SLAM) frameworks for autonomous navigation rely on robust data association to identify loop closures for back-end trajectory optimization. In the case of autonomous underwater vehicles (AUVs) equipped with multibeam echosounders (MBES), data association is particularly challenging due to the scarcity of identifiable landmarks in the seabed, the large drift in dead-reckoning navigation estimates to which AUVs are prone, and the low resolution characteristic of MBES data. Deep learning solutions to loop closure detection have shown excellent performance on data from more structured environments. However, their transfer to the seabed domain is not immediate and efforts to port them are hindered by the lack of bathymetric datasets. Thus, in this paper we propose a neural network architecture intended to showcase the potential of adapting such techniques to correspondence matching in bathymetric data. We train our framework on real bathymetry from an AUV mission and evaluate its performance on the tasks of loop closure detection and coarse point cloud alignment. Finally, we show its potential against a more traditional method and release both its implementation and the dataset used.
Autonomous Passage Planning for a Polar Vessel
Smith, Jonathan D., Hall, Samuel, Coombs, George, Byrne, James, Thorne, Michael A. S., Brearley, J. Alexander, Long, Derek, Meredith, Michael, Fox, Maria
We introduce a method for long-distance maritime route planning in polar regions, taking into account complex changing environmental conditions. The method allows the construction of optimised routes, and we describe the three main stages of the process: discrete modelling of the environmental conditions using a non-uniform mesh, the construction of mesh-optimal paths, and path smoothing. In order to account for different vehicle properties we construct a series of data-driven functions that can be applied to the environmental mesh to determine the speed limitations and fuel requirements for a given vessel and mesh cell, representing these quantities graphically and geospatially. In describing our results, we present an example use case for route planning for the polar research ship the RRS Sir David Attenborough (SDA), accounting for ice-performance characteristics and validating the spatial-temporal route construction in the region of the Weddell Sea, Antarctica. We demonstrate the versatility of this route construction method by showing that routes change depending on the seasonal sea ice variability, differences in the route-planning objective functions used, and the presence of other environmental conditions such as currents. To demonstrate the generality of our approach, we present examples in the Arctic Ocean and the Baltic Sea. The techniques outlined in this manuscript are generic and can therefore be applied to vessels with different characteristics. Our approach can have considerable utility beyond just a single vessel planning procedure, and we outline how this workflow is applicable to a wider community, e.g. commercial and passenger shipping.
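The second stage, finding a mesh-optimal path, can be pictured with a standard shortest-path search. The sketch below is a simplification under stated assumptions: it uses Dijkstra's algorithm on a uniform grid rather than the non-uniform mesh described above, each cell holds a single made-up cost (for example travel time) for entering it, and `None` marks an impassable cell. The abstract does not say which search algorithm the authors use, and the function name `cheapest_route` is hypothetical. Path smoothing is not shown.

```python
import heapq


def cheapest_route(cost, start, goal):
    """Dijkstra search over a grid; cost[r][c] is the price of entering a cell."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if cost[nr][nc] is None:
                continue  # impassable cell, e.g. land or heavy ice
            new_dist = dist + cost[nr][nc]
            if new_dist < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_dist
                parent[(nr, nc)] = cell
                heapq.heappush(queue, (new_dist, (nr, nc)))
    if goal not in best:
        return None, float("inf")
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], best[goal]


# Toy cost grid: None is impassable, 9 is a slow cell (say, thick sea ice).
grid = [
    [1, 1, 1],
    [1, None, 1],
    [1, 9, 1],
]
route, total = cheapest_route(grid, (0, 0), (2, 2))
print(route, total)
```

Changing the cell costs, for instance to reflect seasonal ice or a fuel-based objective instead of travel time, changes the route returned, which mirrors the sensitivity to conditions and objective functions that the abstract reports.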