Atlantic Ocean
Attention-Based Scattering Network for Satellite Imagery
Multi-channel satellite imagery, from stacked spectral bands or spatiotemporal data, has meaningful representations for various atmospheric properties. Combining these features effectively to create a performant and trustworthy model is of utmost importance to forecasters. Neural networks show promise, yet they suffer from unintuitive computations and fusion of high-level features, and may be limited by the quantity of available data. In this work, we leverage the scattering transform to extract high-level features without additional trainable parameters and introduce a separation scheme to bring attention to independent input channels. Experiments show promising results on estimating tropical cyclone intensity and predicting the occurrence of lightning from satellite imagery.
LiBeamsNet: AUV Velocity Vector Estimation in Situations of Limited DVL Beam Measurements
Autonomous underwater vehicles (AUVs) are employed for marine applications and can operate in deep underwater environments beyond human reach. A standard solution for the autonomous navigation problem can be obtained by fusing the inertial navigation system and the Doppler velocity log (DVL) sensor. The latter measures four beam velocities to estimate the vehicle's velocity vector. In real-world scenarios, the DVL may receive fewer than three beam velocities if the AUV operates in complex underwater environments. In such conditions, the vehicle's velocity vector cannot be estimated, leading to drift in the navigation solution, and in some situations the AUV is required to abort the mission and return to the surface. To circumvent such a situation, we propose a deep learning framework, LiBeamsNet, that utilizes the inertial data and the partial beam velocities to regress the missing beams in scenarios with two missing beams. Once all the beams are obtained, the vehicle's velocity vector can be estimated. The approach was validated by sea experiments in the Mediterranean Sea. The results show a speed error of up to 7.2% in the vehicle's velocity vector estimation in a scenario that otherwise could not provide an estimate.
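The final step mentioned in the abstract, estimating the velocity vector once all four beam velocities are available, can be illustrated with a least-squares sketch. This is not the LiBeamsNet network itself. The beam geometry (`beam_directions`, a 20 degree tilt, yaw angles of 45/135/225/315 degrees), the sign convention, and all function names are assumptions made for illustration.

```python
import math

def beam_directions(tilt_deg=20.0):
    """Unit vectors for four beams at yaw 45/135/225/315 deg (assumed geometry)."""
    t = math.radians(tilt_deg)
    dirs = []
    for k in range(4):
        yaw = math.radians(45.0 + 90.0 * k)
        dirs.append([math.cos(yaw) * math.sin(t),
                     math.sin(yaw) * math.sin(t),
                     math.cos(t)])
    return dirs

def solve3(M, r):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [r[i]] for i, row in enumerate(M)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda i: abs(A[i][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("singular system: too few beams to determine velocity")
        A[c], A[piv] = A[piv], A[c]
        for i in range(c + 1, 3):
            f = A[i][c] / A[c][c]
            for j in range(c, 4):
                A[i][j] -= f * A[c][j]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][j] * x[j] for j in range(i + 1, 3))) / A[i][i]
    return x

def velocity_from_beams(dirs, beams):
    """Least-squares velocity from beam velocities via the normal equations."""
    AtA = [[sum(d[i] * d[j] for d in dirs) for j in range(3)] for i in range(3)]
    Aty = [sum(d[i] * y for d, y in zip(dirs, beams)) for i in range(3)]
    return solve3(AtA, Aty)

# Each beam measures the projection of the velocity onto its direction.
dirs = beam_directions()
v_true = [1.0, 0.2, -0.1]
beams = [sum(d[i] * v_true[i] for i in range(3)) for d in dirs]
v_est = velocity_from_beams(dirs, beams)  # close to v_true for noise-free beams
```

With only two beams the normal equations become singular and `velocity_from_beams` raises `ValueError`, which mirrors the situation the abstract describes: the velocity vector cannot be estimated until the missing beams are recovered.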
OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping
Xia, Junshi, Yokoya, Naoto, Adriano, Bruno, Broni-Bediako, Clifford
We introduce OpenEarthMap, a benchmark dataset for global high-resolution land cover mapping. OpenEarthMap consists of 2.2 million segments of 5000 aerial and satellite images covering 97 regions from 44 countries across 6 continents, with manually annotated 8-class land cover labels at a 0.25-0.5 m ground sampling distance. Semantic segmentation models trained on OpenEarthMap generalize worldwide and can be used as off-the-shelf models in a variety of applications. We evaluate the performance of state-of-the-art methods for unsupervised domain adaptation and present challenging problem settings suitable for further technical development. We also investigate lightweight models using automated neural architecture search for limited computational resources and fast mapping. The dataset is available at https://open-earth-map.org.
Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages
Jiao, Wenxiang, Tu, Zhaopeng, Li, Jiarui, Wang, Wenxuan, Huang, Jen-tse, Shi, Shuming
This paper describes Tencent's multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages. We participated in the $\mathbf{constrained}$ translation track, in which only the data and pretrained models provided by the organizer are allowed. The task is challenging due to three problems: the absence of training data for some to-be-evaluated language pairs, the uneven optimization of language pairs caused by data imbalance, and the curse of multilinguality. To address these problems, we adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models. Our submissions won $\mathbf{1st\ place}$ on the blind test sets in terms of the automatic evaluation metrics. Code, models, and detailed competition results are available at https://github.com/wxjiao/WMT2022-Large-Scale-African.
Large-Scale Open-Set Classification Protocols for ImageNet
Palechor, Andres, Bhoumik, Annesha, Günther, Manuel
Open-Set Classification (OSC) intends to adapt closed-set classification models to real-world scenarios, where the classifier must correctly label samples of known classes while rejecting previously unseen unknown samples. Only recently has research started to investigate algorithms that are able to handle these unknown samples correctly. Some of these approaches address OSC by including negative samples in the training set that a classifier learns to reject, expecting that these data increase the robustness of the classifier on unknown classes. Most of these approaches are evaluated on small-scale and low-resolution image datasets like MNIST, SVHN or CIFAR, which makes it difficult to assess their applicability to the real world and to compare them with each other. We propose three open-set protocols that provide rich datasets of natural images with different levels of similarity between known and unknown classes. The protocols consist of subsets of ImageNet classes selected to provide training and testing data closer to real-world scenarios. Additionally, we propose a new validation metric that can be employed to assess whether the training of deep learning models addresses both the classification of known samples and the rejection of unknown samples. We use the protocols to compare the performance of two baseline open-set algorithms to the standard SoftMax baseline and find that the algorithms work well on negative samples that have been seen during training, and partially on out-of-distribution detection tasks, but perform worse in the presence of samples from previously unseen unknown classes.
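A common way to turn a closed-set SoftMax classifier into an open-set one is to reject any sample whose highest class probability falls below a threshold. The sketch below illustrates that idea only. The function names, the threshold of 0.5, and the use of -1 as the "unknown" label are assumptions, not details taken from the paper's protocols or baselines.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def open_set_predict(logits, threshold=0.5):
    """Return the predicted known-class index, or -1 to reject as unknown."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Accept only confident predictions; the threshold is an assumed value.
    return best if probs[best] >= threshold else -1

confident = open_set_predict([4.0, 0.0, 0.0])    # returns 0 (known class 0)
uncertain = open_set_predict([0.1, 0.0, 0.05])   # returns -1 (rejected)
```

The threshold trades off the two goals named in the abstract: raising it rejects more unknown samples but also discards more correctly classified known samples.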
Bayesian Spline Learning for Equation Discovery of Nonlinear Dynamics with Quantified Uncertainty
Sun, Luning, Huang, Daniel Zhengyu, Sun, Hao, Wang, Jian-Xun
Nonlinear dynamics are ubiquitous in science and engineering applications, but the physics of most complex systems is far from being fully understood. Discovering interpretable governing equations from measurement data can help us understand and predict the behavior of complex dynamic systems. Although extensive work has recently been done in this field, robustly distilling explicit model forms from very sparse data with considerable noise remains intractable. Moreover, quantifying and propagating the uncertainty of the identified system from noisy data is challenging, and relevant literature is still limited. To bridge this gap, we develop a novel Bayesian spline learning framework to identify parsimonious governing equations of nonlinear (spatio)temporal dynamics from sparse, noisy data with quantified uncertainty. The proposed method utilizes a spline basis to handle the data scarcity and measurement noise, upon which a group of derivatives can be accurately computed to form a library of candidate model terms. The equation residuals are used to inform the spline learning in a Bayesian manner, where approximate Bayesian uncertainty calibration techniques are employed to approximate posterior distributions of the trainable parameters. To promote sparsity, an iterative sequential-threshold Bayesian learning approach is developed, using the alternating direction optimization strategy to systematically approximate L0 sparsity constraints. The proposed algorithm is evaluated on multiple nonlinear dynamical systems governed by canonical ordinary and partial differential equations, and the merit/superiority of the proposed method is demonstrated by comparison with state-of-the-art methods.
Predicting Future Mosquito Larval Habitats Using Time Series Climate Forecasting and Deep Learning
Sun, Christopher, Nimbalkar, Jay, Bedi, Ravnoor
Mosquito habitats and breeding ranges are expanding globally [1]. Habitat preferences are based on the interaction of several factors, including temperature, humidity, rainfall, elevation, and availability of hosts. Climate change has been identified as a key driving factor for the shifts in mosquito distribution over the past 70 years and is likely to continue to be the chief determinant of mosquito population spread [1]. The research described in this article was divided into three phases. The first phase involved gathering meteorological data and larvae counts from various locations in the United States and using this data set to create a predictive model for mosquito larvae abundance. The second phase involved extracting time series sequences of the said meteorological variables for specific regions of interest, to allow for the forecasting of environmental conditions. The third phase involved feeding ...
KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction
Zhang, Shao, Jia, Yuting, Xu, Hui, Wang, Dakuo, Li, Toby Jia-jun, Wen, Ying, Wang, Xinbing, Zhou, Chenghu
Scientific knowledge bases [16, 23], collections of structured and verified research results that consist of various numeric, word-oriented, or image-organized data, emerge in this context and bring entirely new approaches and opportunities to scientific research. Researchers in many disciplines, such as Geoscience [10, 64], Medicine [9], Biology [3], and Chemistry [50], use AI techniques and scientific knowledge bases, often constructed from the published literature, to drive scientific discoveries [38, 45, 46]. The rapid development of AI and data science has further promoted the development of scientific knowledge bases [26, 42]. For example, AlphaFold [27], which uses the Protein Data Bank [63] as input data, can accurately predict protein structure and greatly promote the development of biological and medical research [12, 39]. Although successful research examples illustrate the importance of scientific knowledge bases for scientific research in the age of data explosion, there are still many challenges in the composition of scientific knowledge bases and in their construction process due to their characteristics. A scientific knowledge base is characteristically described around one type of scientific entity. For example, "sample" is a general type of scientific entity. The data contained are the values and sources of the relevant attributes of the scientific entity. The current process of constructing a scientific knowledge base includes four main steps: literature collection, entity and attribute extraction, entity linking, and data storage (see Figure 2).
Data-Driven Meets Navigation: Concepts, Models, and Experimental Validation
One of the means to perform navigation is using a dead reckoning (DR) approach. In DR, given initial conditions, velocity or acceleration measurements are integrated to obtain the position. An inertial navigation system (INS) is the most popular tool working with DR principles. Its popularity stems from these facts: it provides a full navigation solution (position, velocity, and orientation), it is a standalone system capable of working in any environment (land, air, underground, underwater, indoors), and it is available in many different grades (ranging from low-cost low-performance to high-cost high-performance systems) [1-3].
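The integration step at the heart of dead reckoning can be shown with a one-dimensional sketch. It assumes forward-Euler integration, a fixed sample interval `dt`, and noise-free input. The function name `dead_reckon` and all values are illustrative and are not taken from the paper.

```python
def dead_reckon(p0, v0, accels, dt):
    """Integrate 1-D acceleration samples into (position, velocity) pairs."""
    p, v = p0, v0
    track = [(p, v)]
    for a in accels:
        p += v * dt  # position advances with the current velocity
        v += a * dt  # velocity advances with the measured acceleration
        track.append((p, v))
    return track

# Constant 1 m/s^2 for three 1 s steps, starting at rest at the origin.
track = dead_reckon(0.0, 0.0, [1.0, 1.0, 1.0], 1.0)
final_position, final_velocity = track[-1]  # (3.0, 3.0)

# A stationary vehicle with a constant 0.1 m/s^2 sensor bias: the position
# error grows with every step (roughly 4.5 m after ten 1 s steps).
drift = dead_reckon(0.0, 0.0, [0.1] * 10, 1.0)[-1][0]
```

The second call shows why DR-based systems are usually fused with other sensors: any uncorrected measurement error is integrated along with the signal, so the position error keeps growing over time.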
Research Invited Speakers
AIMLSystems is a brand new conference targeting research at the intersection of AI/ML techniques and systems engineering. Through this conference we plan to bring out and highlight the natural connections between these two fields. Specifically, we explore how immense strides in AI/ML techniques are made possible through computational systems research (e.g., improvements in CPU/GPU architectures, data-intensive infrastructure, communications etc.), how the use of AI/ML can help in the continuous and workload-driven design space exploration of computational systems (e.g., self-tuning databases, learning compiler optimizers, learnable network systems etc.), and how AI/ML can be used in the design of socio-economic systems such as public healthcare and security.