Energy
Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets
Layeghy, Siamak, Gallagher, Marcus, Portmann, Marius
Network Intrusion Detection Systems (NIDSs) are an increasingly important tool for the prevention and mitigation of cyber attacks. A number of labelled synthetic datasets have been generated and made publicly available by researchers, and they have become the benchmarks via which new ML-based NIDS classifiers are being evaluated. Recently published results show excellent classification performance with these datasets, increasingly approaching 100 percent performance across key evaluation metrics such as accuracy, F1 score, etc. Unfortunately, we have not yet seen these excellent academic research results translated into practical NIDS systems with such near-perfect performance. This motivated our research presented in this paper, where we analyse the statistical properties of the benign traffic in three of the more recent and relevant NIDS datasets (CIC, UNSW, ...). As a comparison, we consider two datasets obtained from real-world production networks, one from a university network and one from a medium-sized Internet Service Provider (ISP). Our results show that the two real-world datasets are quite similar to each other with regard to most of the considered statistical features. Equally, the three synthetic datasets are also relatively similar within their group. However, and most importantly, our results show a distinct difference in most of the considered statistical features between the three synthetic datasets and the two real-world datasets. Since ML relies on the basic assumption of training and test datasets being sampled from the same distribution, this raises the question of how well the performance results of ML classifiers trained on the considered synthetic datasets can translate and generalise to real-world networks. We believe this is an interesting and relevant question which provides motivation for further research in this space.
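The abstract does not say which statistical measures the authors use to compare datasets. As one illustrative possibility only, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, a standard way to quantify how far apart the empirical distributions of a single feature are in two datasets. The function name `ks_statistic` and the sample values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, so that ties are
        # handled consistently in both samples.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Hypothetical values of one flow feature in a synthetic and a real-world dataset.
synthetic_feature = [1.0, 2.0, 3.0]
real_world_feature = [10.0, 11.0, 12.0]
gap = ks_statistic(synthetic_feature, real_world_feature)
```

A value near 1 for many features would be consistent with the kind of distribution mismatch the abstract describes, while a value near 0 would indicate similar distributions for that feature.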
Low-rank State-action Value-function Approximation
Rozada, Sergio, Tenorio, Victor, Marques, Antonio G.
Value functions are central to Dynamic Programming and Reinforcement Learning but their exact estimation suffers from the curse of dimensionality, challenging the development of practical value-function (VF) estimation algorithms. Several approaches have been proposed to overcome this issue, from non-parametric schemes that aggregate states or actions to parametric approximations of state and action VFs via, e.g., linear estimators or deep neural networks. Relevantly, several high-dimensional state problems can be well-approximated by an intrinsic low-rank structure. Motivated by this and leveraging results from low-rank optimization, this paper proposes different stochastic algorithms to estimate a low-rank factorization of the $Q(s, a)$ matrix. This is a non-parametric alternative to VF approximation that dramatically reduces the computational and sample complexities relative to classical $Q$-learning methods that estimate $Q(s,a)$ separately for each state-action pair.
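To make the low-rank idea concrete, here is a minimal sketch, not the authors' algorithm: it stores the $Q(s, a)$ matrix as a product of two thin factors and applies a stochastic temporal-difference step to the factor rows touched by one transition. The names `L`, `R`, `low_rank_td_update`, the step size `alpha`, and the discount `gamma` are assumptions made for illustration.

```python
def q_value(L, R, s, a):
    """Q(s, a) under the factorization Q = L R^T (inner product of two rows)."""
    return sum(l * r for l, r in zip(L[s], R[a]))


def low_rank_td_update(L, R, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One stochastic semi-gradient TD step on the factor rows L[s] and R[a].

    L has one row of length k per state, R has one row of length k per action,
    so only (n_states + n_actions) * k numbers are stored instead of
    n_states * n_actions.
    """
    n_actions = len(R)
    # Q-learning style bootstrap target, computed before any factor changes.
    target = reward + gamma * max(q_value(L, R, s_next, b) for b in range(n_actions))
    td_error = target - q_value(L, R, s, a)
    old_state_row, old_action_row = list(L[s]), list(R[a])
    # Gradient of the inner product w.r.t. L[s] is R[a], and vice versa.
    L[s] = [l + alpha * td_error * r for l, r in zip(old_state_row, old_action_row)]
    R[a] = [r + alpha * td_error * l for l, r in zip(old_state_row, old_action_row)]
    return td_error


# Tiny hypothetical problem: 2 states, 2 actions, rank 2.
L = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.5, 0.5], [0.2, 0.1]]
before = q_value(L, R, 0, 0)
error = low_rank_td_update(L, R, s=0, a=0, reward=1.0, s_next=1)
after = q_value(L, R, 0, 0)
```

Because every state shares the action factors and every action shares the state factors, one update also changes the estimates for other state-action pairs, which is where the reduction in sample complexity would come from.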
How AI will shape smart cities
Cities worldwide are not just growing, but also trying to reconfigure themselves for a sustainable future, with higher quality of life for every citizen. That means capitalizing on renewable power sources, maximizing energy efficiency and scaling up electrified transport on an unprecedented scale. The 2015 Paris Agreement called for limiting the rise in average global temperatures to 1.5°C compared to pre-industrial levels, implying a massive reduction of greenhouse gas (GHG) emissions. Meeting the ambitious climate goal would require a near-total elimination of emissions from power generation, industry, and transport by 2050, said Ariel Liebman, Director of Monash Energy Institute, at a recent AI for Good webinar convened by an ITU Focus Group studying AI and environmental efficiency. Renewable energy sources, including the sun, wind, biofuels and renewable-based hydrogen, make net-zero emissions theoretically possible.
Transductive Learning for Abstractive News Summarization
Bražinskas, Arthur, Liu, Mengwen, Nallapati, Ramesh, Ravi, Sujith, Dreyer, Markus
Pre-trained language models have recently advanced abstractive summarization. These models are further fine-tuned on human-written references before summary generation at test time. In this work, we propose the first application of transductive learning to summarization. In this paradigm, a model can learn from the test set's input before inference. To perform transduction, we propose to utilize input document summarizing sentences to construct references for learning at test time. These sentences are often compressed and fused to form abstractive summaries and provide omitted details and additional context to the reader. We show that our approach yields state-of-the-art results on CNN/DM and NYT datasets. For instance, we achieve over 1 ROUGE-L point improvement on CNN/DM. Further, we show the benefits of transduction from older to more recent news. Finally, through human and automatic evaluation, we show that our summaries become more abstractive and coherent.
Ferrari's CEO promises an EV in 2025
Ferrari has already made cars with hybrid powertrains, but during its Annual General Meeting this week, acting CEO John Elkann told investors in prepared remarks (PDF) that the carmaker will unveil "the first all-electric Ferrari" in 2025. Hopefully that plan will hold even after the company confirms a new CEO -- over the past decade execs have said Ferrari will never build an EV, will be the first with an electric supercar, or that an electric Ferrari will not arrive until after 2025. We are continuing to execute our electrification strategy in a highly disciplined way. And our interpretation and application of these technologies both in motor sport and in road cars is a huge opportunity to bring the uniqueness and passion of Ferrari to new generations. As you would expect, we have started by setting the bar high.
Why Machine Learning Integrated Patient Flow Simulation?
Abuhay, Tesfamariam M., Mamuye, Adane, Robinson, Stewart, Kovalchuk, Sergey V.
Patient flow analysis can be studied from a clinical and/or operational perspective using simulation. Traditional statistical methods such as stochastic distribution methods have been used to construct patient flow simulation submodels such as patient inflow, Length of Stay (LoS), Cost of Treatment (CoT) and Clinical Pathway (CP) models. However, patient inflow demonstrates seasonality, trend and variation over time. LoS, CoT and CP are significantly determined by attributes of patients and clinical and laboratory test results. For this reason, patient flow simulation models constructed using traditional statistical methods are criticized for ignoring heterogeneity and their contribution to personalized and value-based healthcare. On the other hand, machine learning methods have proven to be efficient to study and predict admission rate, LoS, CoT, and CP. This paper, hence, describes why coupling machine learning with patient flow simulation is important and proposes a conceptual architecture that shows how to integrate machine learning with patient flow simulation.
Integrating Domain Knowledge in Data-driven Earth Observation with Process Convolutions
Svendsen, Daniel Heestermans, Piles, Maria, Muñoz-Marí, Jordi, Luengo, David, Martino, Luca, Camps-Valls, Gustau
The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, need the correct specification of all interactions between variables in the problem, and the appropriate parameterization is a challenge in itself. On the other hand, machine learning approaches are flexible data-driven tools, able to approximate arbitrarily complex functions, but lack interpretability and struggle when data is scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approaches can address all these issues efficiently. We introduce Gaussian process (GP) convolution models for hybrid modelling in Earth observation (EO) problems. We specifically propose the use of a class of GP convolution models called latent force models (LFMs) for EO time series modelling, analysis and understanding. LFMs are hybrid models that incorporate physical knowledge encoded in differential equations into a multioutput GP model. LFMs can transfer information across time series, cope with missing observations, infer explicit latent functions forcing the system, and learn parameterizations which are very helpful for system analysis and interpretability. We consider time series of soil moisture from active (ASCAT) and passive (SMOS, AMSR2) microwave satellites. We show how, assuming a first-order differential equation as the governing equation, the model automatically estimates the e-folding time or decay rate related to soil moisture persistence and discovers latent forces related to precipitation. The proposed hybrid methodology reconciles the two main approaches in remote sensing parameter estimation by blending statistical learning and mechanistic modeling.
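The role of the first-order governing equation can be illustrated with a small forward simulation. The sketch below is not the latent force model itself (there is no Gaussian process here); it only integrates an assumed equation of the form dx/dt = -decay_rate * x + f(t) with a simple Euler scheme, where x stands for soil moisture and f for a precipitation-like forcing. The function names, the step size `dt`, and the numeric values are hypothetical.

```python
def simulate_first_order(forcing, decay_rate, dt, x0=0.0):
    """Euler integration of dx/dt = -decay_rate * x + f(t).

    `forcing` holds one value of f per time step; the returned list has one
    more entry than `forcing` because it includes the initial state x0.
    """
    x = x0
    trajectory = [x]
    for f in forcing:
        x = x + dt * (-decay_rate * x + f)
        trajectory.append(x)
    return trajectory


def e_folding_time(decay_rate):
    """Time for an unforced state to fall to 1/e of its starting value."""
    return 1.0 / decay_rate


# Unforced decay from x0 = 1 with decay rate 0.5, simulated for 2 time units.
decay = simulate_first_order([0.0] * 2000, decay_rate=0.5, dt=0.001, x0=1.0)
tau = e_folding_time(0.5)
```

With no forcing, the state after one e-folding time is close to 1/e of the initial value, which is the persistence quantity the abstract says the model estimates from data.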
Probabilistic water demand forecasting using quantile regression algorithms
Papacharalampous, Georgia, Langousis, Andreas
Machine and statistical learning algorithms can be reliably automated and applied at scale. Therefore, they can constitute a considerable asset for designing practical forecasting systems, such as those related to urban water demand. Quantile regression algorithms are statistical and machine learning algorithms that can provide probabilistic forecasts in a straightforward way, and have not been applied so far for urban water demand forecasting. In this work, we aim to fill this gap by automating and extensively comparing several quantile-regression-based practical systems for probabilistic one-day ahead urban water demand forecasting. For designing the practical systems, we use five individual algorithms (i.e., the quantile regression, linear boosting, generalized random forest, gradient boosting machine and quantile regression neural network algorithms), their mean combiner and their median combiner. The comparison is conducted by exploiting a large urban water flow dataset, as well as several types of hydrometeorological time series (which are considered as exogenous predictor variables in the forecasting setting). The results mostly favour the practical systems designed using the linear boosting algorithm, probably due to the presence of trends in the urban water flow time series. The forecasts of the mean and median combiners are also found to be skilful in general terms.
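Two ingredients named in the abstract lend themselves to a short sketch: the quantile (pinball) loss that quantile regression algorithms typically minimise, and the mean and median combiners that merge forecasts from several algorithms. The code below is a generic illustration under that assumption, not the authors' implementation; the function names and the forecast values are hypothetical.

```python
from statistics import mean, median


def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for one observation at quantile level q in (0, 1).

    Under-prediction is weighted by q and over-prediction by (1 - q), so a
    high quantile level penalises forecasts that fall below the observation.
    """
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1) * diff


def combine(forecasts, how="mean"):
    """Combine per-algorithm forecasts time step by time step.

    `forecasts` is a list with one forecast series per algorithm, all of the
    same length; `how` selects the mean or the median combiner.
    """
    aggregate = mean if how == "mean" else median
    return [aggregate(values) for values in zip(*forecasts)]


# Hypothetical 0.9-quantile forecasts of daily demand from three algorithms.
per_algorithm = [[1.0, 2.0], [3.0, 4.0], [8.0, 0.0]]
mean_forecast = combine(per_algorithm, "mean")
median_forecast = combine(per_algorithm, "median")
```

The median combiner is less sensitive to a single outlying algorithm than the mean combiner, which is visible in the first time step of the example above.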
EarthNet2021: A large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task
Requena-Mesa, Christian, Benson, Vitus, Reichstein, Markus, Runge, Jakob, Denzler, Joachim
Satellite images are snapshots of the Earth surface. We propose to forecast them. We frame Earth surface forecasting as the task of predicting satellite imagery conditioned on future weather. EarthNet2021 is a large dataset suitable for training deep neural networks on the task. It contains Sentinel 2 satellite imagery at 20m resolution, matching topography and mesoscale (1.28km) meteorological variables packaged into 32000 samples. Additionally we frame EarthNet2021 as a challenge allowing for model intercomparison. Resulting forecasts will greatly improve (>x50) over the spatial resolution found in numerical models. This allows localized impacts from extreme weather to be predicted, thus supporting downstream applications such as crop yield prediction, forest health assessments or biodiversity monitoring. Find data, code, and how to participate at www.earthnet.tech
Respawn's Apex Legends Is Just Getting Started
The minds at Respawn Entertainment are wizards when it comes to the action-adventure genre. Twenty-fourteen's Titanfall and its criminally underrated followup, 2016's Titanfall 2, challenged traditional boots-on-the-ground shooters with a heightened sense of scale and verticality, while the more recent Jedi: Fallen Order etched itself as one of the greatest Star Wars narratives told in any medium. The Los Angeles studio's fixation with exoskeletons, Blade Runner, and visuals that bleed Wachowski and Masamune Shirow's Ghost In The Shell is nothing new, but they are intertwined with world-building to create headier pockets of science fiction bliss. The free-to-play shooter set in the Titanfall universe first launched in February 2019. No extended gameplay reveals that cringe out with comms from Chad and the rest of the QA team.