Spatio-seasonal risk assessment of upward lightning at tall objects using meteorological reanalysis data
Stucke, Isabell, Morgenstern, Deborah, Mayr, Georg J., Simon, Thorsten, Zeileis, Achim, Diendorfer, Gerhard, Schulz, Wolfgang, Pichler, Hannes
This study investigates lightning at tall objects and evaluates the risk of upward lightning (UL) over the eastern Alps and its surrounding areas. While uncommon, UL poses a threat, especially to wind turbines, as the long-duration current of UL can cause significant damage. Current risk assessment methods overlook the impact of meteorological conditions, potentially underestimating UL risks. Therefore, this study employs random forests, a machine learning technique, to analyze the relationship between UL measured at Gaisberg Tower (Austria) and $35$ larger-scale meteorological variables. Of these, the larger-scale upward velocity, wind speed and direction at 10 meters, and cloud physics variables contribute the most information. The random forests predict the risk of UL across the study area at a 1 km$^2$ resolution. Strong near-surface winds combined with upward deflection by elevated terrain increase UL risk. Both the diurnal cycle of the UL risk and the high-risk areas shift seasonally. The high-risk areas are concentrated north/northeast of the Alps in winter due to prevailing northerly winds, and expand southward to affect northern Italy in the transitional and summer months. The model performs best in winter, with the highest predicted UL risk coinciding with observed peaks in measured lightning at tall objects. The highest concentration is north of the Alps, where most wind turbines are located, leading to an increase in overall lightning activity. Comprehensive meteorological information is essential for UL risk assessment, as lightning densities are a poor indicator of lightning at tall objects.
When science fiction becomes reality: Experts reveal the most realistic APOCALYPSE movies - so, does your favourite blockbuster give us a glimpse at how the world will end?
From The Terminator to The Day After Tomorrow, movies have envisioned just about every possibility for how the world might end. If you're a science fiction movie buff, you might think that some of these apocalyptic scenarios seem a little far-fetched. But hold onto your popcorn, as experts say that some of these disastrous plotlines could actually become a reality. While we don't need to worry about an asteroid wiping us out like in Armageddon, experts warn that a bioweapon leak like the one in 12 Monkeys could really end the world. And if your favourite blockbuster does give us a glimpse at how the world will end, not even Bruce Willis will be able to save us. Apocalypse movies find their inspiration in a number of different disasters, but which are the most realistic? An escaped bioweapon could pose a genuine threat of destroying humanity.
A2CI: A Cloud-based, Service-oriented Geospatial Cyberinfrastructure to Support Atmospheric Research
Li, Wenwen, Shao, Hu, Wang, Sizhe, Zhou, Xiran, Wu, Sheng
In recent years, atmospheric research has received increasing attention from environmental experts and the public because atmospheric phenomena such as El Nino, global warming, ozone depletion, and drought that may have negative effects on the Earth's climate and ecosystem are occurring more often (Walther et al. 2002; Karl and Trenberth 2003; Trenberth et al. 2014). In order to model the status quo and predict the trend of atmospheric phenomena and events, researchers need to retrieve data from various relevant domains, such as chemical components of aerosols and gases, the terrestrial surface, energy consumption, the hydrosphere, the biosphere, etc. (Schneider, 2006; Fowler et al., 2009; Guilyardi et al., 2009; Ramanathan et al., 2011; Katul et al., 2012). In complex earth system modeling, the data and services for atmospheric study are distributed, collaborative and adaptive (Plale et al., 2006). The massive volume, rapid velocity and wide variety of data have led to a new era of atmospheric research that consists of accessing and integrating big data from distributed sources, conducting collaborative analysis in an interactive way, providing intelligent services for data management, and integration and visualization to foster discovery of hidden or new knowledge. One of the most important ways to support these activities is to establish a national or international spatial data infrastructure and geospatial cyberinfrastructure on which data and computational resources can be easily shared, spatial analysis tools can be executed on-the-fly and scientific results can be effectively visualized (Yang et al., 2008; Li et al., 2011).
Technically, a geospatial cyberinfrastructure (GCI) is an architecture that effectively utilizes geo-referenced data to connect people, information and computers based on the standardized data access protocols, high speed internet, high-performance computing facilities (HPC) and service-oriented data management (Yang et al., 2010). Since the concept's official introduction by the National Science Foundation (NSF) in its 2003 blue ribbon report, cyberinfrastructure research has attracted much attention from the atmospheric science domain because of its promise of bringing paradigm change for
Regulators Need AI Expertise. They Can't Afford It
ChatGPT caught regulators by surprise when it set off a new AI race. As companies have rushed to develop and release ever more powerful models, lawmakers and regulators around the world have sought to catch up and rein in development. As governments spin up new AI programs, regulators around the world are urgently trying to hire AI experts. But some of the job ads are raising eyebrows and even chuckles among AI researchers and engineers for offering wages that, amid the current AI boom, look pitiful. The European AI Office, which will be central to the implementation of the EU's AI Act, listed vacancies early this month and wants applicants to begin work in the fall.
Distributed Representations of Words and Phrases and their Compositionality
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
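The phrase-finding step mentioned above can be illustrated with a small sketch. It assumes the bigram score given in the full paper, score(a, b) = (count(a b) - delta) / (count(a) * count(b)), where delta discounts rare pairs. The function names, the toy corpus, and the values of `delta` and `threshold` are illustrative assumptions, not the authors' settings or implementation.

```python
from collections import Counter


def find_phrases(sentences, delta=1.0, threshold=0.1):
    """Return bigrams whose score exceeds the threshold (hypothetical helper)."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    phrases = {}
    for (a, b), n_ab in bigrams.items():
        # delta discounts pairs that co-occur only a handful of times
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def merge_phrases(tokens, phrases):
    """Greedily join detected bigrams into single tokens, left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy corpus, assumed purely for illustration
corpus = [
    ["air", "canada", "flies"],
    ["air", "canada", "lands"],
    ["the", "air", "is", "cold"],
    ["canada", "is", "cold"],
]
phrases = find_phrases(corpus)
tokens = merge_phrases(["air", "canada", "flies"], phrases)
```

On a corpus this small the threshold is arbitrary; in practice it would be tuned on real data, and the pass can be repeated over the merged tokens to build phrases longer than two words.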
Fast Multivariate Spatio-temporal Analysis via Low Rank Tensor Learning
Accurate and efficient analysis of multivariate spatio-temporal data is critical in climatology, geology, and sociology applications. Existing models usually assume simple inter-dependence among variables, space, and time, and are computationally expensive. We propose a unified low rank tensor learning framework for multivariate spatio-temporal analysis, which can conveniently incorporate different properties in spatio-temporal data, such as spatial clustering and shared structure among variables. We demonstrate how the general framework can be applied to cokriging and forecasting tasks, and develop an efficient greedy algorithm to solve the resulting optimization problem with convergence guarantee. We conduct experiments on both synthetic datasets and real application datasets to demonstrate that our method is not only significantly faster than existing methods but also achieves lower estimation error.
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
van Noord, Rik, Kuzman, Taja, Rupnik, Peter, Ljubešić, Nikola, Esplà-Gomis, Miquel, Ramírez-Sánchez, Gema, Toral, Antonio
Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA and XLM-RoBERTa models. However, despite this importance, relatively little attention has been given to the quality of these corpora. In this paper, we compare four of the currently most relevant large, web-crawled corpora (CC100, MaCoCu, mC4 and OSCAR) across eleven lower-resourced European languages. Our approach is two-fold: first, we perform an intrinsic evaluation by performing a human evaluation of the quality of samples taken from different corpora; then, we assess the practical impact of the qualitative differences by training specific LMs on each of the corpora and evaluating their performance on downstream tasks. We find that there are clear differences in quality of the corpora, with MaCoCu and OSCAR obtaining the best results. However, during the extrinsic evaluation, we actually find that the CC100 corpus achieves the highest scores. We conclude that, in our experiments, the quality of the web-crawled corpora does not seem to play a significant role when training LMs.
Kernel Observers: Systems-Theoretic Modeling and Inference of Spatiotemporally Evolving Processes
We consider the problem of estimating the latent state of a spatiotemporally evolving continuous function using very few sensor measurements. We show that layering a dynamical systems prior over temporal evolution of weights of a kernel model is a valid approach to spatiotemporal modeling, and that it does not require the design of complex nonstationary kernels. Furthermore, we show that such a differentially constrained predictive model can be utilized to determine sensing locations that guarantee that the hidden state of the phenomena can be recovered with very few measurements. We provide sufficient conditions on the number and spatial location of samples required to guarantee state recovery, and provide a lower bound on the minimum number of samples required to robustly infer the hidden states. Our approach outperforms existing methods in numerical experiments.
Fusing Climate Data Products using a Spatially Varying Autoencoder
Johnson, Jacob A., Heaton, Matthew J., Christensen, William F., Warr, Lynsie R., Rupper, Summer B.
Autoencoders are powerful machine learning models used to compress information from multiple data sources. However, autoencoders, like all artificial neural networks, are often unidentifiable and uninterpretable. This research focuses on creating an identifiable and interpretable autoencoder that can be used to meld and combine climate data products. The proposed autoencoder utilizes a Bayesian statistical framework, allowing for probabilistic interpretations while also varying spatially to capture useful spatial patterns across the various data products. Constraints are placed on the autoencoder as it learns patterns in the data, creating an interpretable consensus that includes the important features from each input. We demonstrate the utility of the autoencoder by combining information from multiple precipitation products in High Mountain Asia.