
Collaborating Authors

 Nikitin, Nikolay O.


Integration Of Evolutionary Automated Machine Learning With Structural Sensitivity Analysis For Composite Pipelines

arXiv.org Artificial Intelligence

Automated machine learning (AutoML) systems propose an end-to-end solution to a given machine learning problem, creating either fixed or flexible pipelines. Fixed pipelines are task-independent constructs: their general composition remains the same regardless of the data. In contrast, the structure of flexible pipelines varies depending on the input, making them finely tailored to individual tasks. However, flexible pipelines can be structurally overcomplicated and poorly explainable. We propose the EVOSA approach, which compensates for these drawbacks of flexible pipelines by incorporating a sensitivity analysis that increases the robustness and interpretability of the flexible solutions. EVOSA quantitatively estimates the positive and negative impact of an edge or a node on a pipeline graph and feeds this information to the evolutionary AutoML optimizer. The correctness and efficiency of EVOSA were validated on tabular, multimodal and computer vision tasks, suggesting that the proposed approach generalizes across domains.
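The abstract does not spell out how EVOSA computes its estimates. A minimal sketch of the general idea of structural sensitivity is one-at-a-time ablation: remove an intermediate node, rewire its children to its parents, re-score the pipeline, and treat the fitness change as the node's impact. The dict-of-parents encoding, `remove_node`, `node_sensitivity`, and `toy_fitness` (with made-up per-node contributions standing in for a real validation score) are all hypothetical; edges could be scored the same way by deleting one edge at a time.

```python
from typing import Callable, Dict, List

# Hypothetical pipeline encoding: node name -> list of parent (input) node names.
Pipeline = Dict[str, List[str]]


def remove_node(pipeline: Pipeline, node: str) -> Pipeline:
    """Return a copy without `node`; its children are rewired to its parents."""
    parents = pipeline[node]
    result: Pipeline = {}
    for name, deps in pipeline.items():
        if name == node:
            continue
        new_deps: List[str] = []
        for dep in deps:
            for p in (parents if dep == node else [dep]):
                if p not in new_deps:  # avoid duplicate edges after rewiring
                    new_deps.append(p)
        result[name] = new_deps
    return result


def node_sensitivity(pipeline: Pipeline,
                     fitness: Callable[[Pipeline], float]) -> Dict[str, float]:
    """Fitness lost when each intermediate node is removed.

    Positive: the node helps; negative: removing it improves the pipeline.
    """
    base = fitness(pipeline)
    used_as_input = {d for deps in pipeline.values() for d in deps}
    scores: Dict[str, float] = {}
    for node, deps in pipeline.items():
        # Skip primary nodes (no inputs) and the final node (no consumers).
        if not deps or node not in used_as_input:
            continue
        scores[node] = round(base - fitness(remove_node(pipeline, node)), 6)
    return scores


# Toy stand-in for a validation metric: made-up additive per-node contributions.
CONTRIBUTION = {"scaling": 0.10, "pca": -0.02, "knn": 0.05, "ridge": 0.07, "final": 0.60}


def toy_fitness(pipeline: Pipeline) -> float:
    return sum(CONTRIBUTION[n] for n in pipeline)


pipeline: Pipeline = {
    "scaling": [],
    "pca": ["scaling"],
    "knn": ["pca"],
    "ridge": ["scaling"],
    "final": ["knn", "ridge"],
}

print(node_sensitivity(pipeline, toy_fitness))
# {'pca': -0.02, 'knn': 0.05, 'ridge': 0.07}
```

In this toy setting the negative score flags `pca` as a candidate for simplification; an optimizer could use such scores to bias mutation toward removing harmful nodes.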


Surrogate Modelling for Sea Ice Concentration using Lightweight Neural Ensemble

arXiv.org Artificial Intelligence

The modeling and forecasting of sea ice conditions in the Arctic region are important tasks for ship routing, offshore oil production, and environmental monitoring. We propose an adaptive surrogate modeling approach named LANE-SI (Lightweight Automated Neural Ensembling for Sea Ice) that uses an ensemble of relatively simple deep learning models trained with different loss functions to forecast the spatial distribution of sea ice concentration in a specified water area. Experimental studies confirm that the quality of a long-term forecast from a deep learning model fitted to a specific water area is comparable to that of resource-intensive physical modeling and, for some periods of the year, superior. We achieved a 20% improvement over the state-of-the-art physics-based forecast system SEAS5 for the Kara Sea.
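LANE-SI uses deep networks, and the abstract does not say how member forecasts are combined. As a dependency-free toy that illustrates only the idea of ensembling members trained with different losses, the sketch below fits two trivial per-cell "models": the mean (the constant that minimises squared error) and the median (the constant that minimises absolute error). It then averages them and clips the result to the valid concentration range. The data, grid size, averaging rule, and function names are assumptions for illustration.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

Grid = List[List[float]]  # sea ice concentration per grid cell, in [0, 1]


def fit_constant(history: Sequence[Grid], reducer: Callable[[List[float]], float]) -> Grid:
    """Per-cell constant forecast: `mean` minimises squared error, `median` absolute error."""
    rows, cols = len(history[0]), len(history[0][0])
    return [[reducer([g[i][j] for g in history]) for j in range(cols)]
            for i in range(rows)]


def ensemble(members: Sequence[Grid]) -> Grid:
    """Average the member forecasts cell by cell and clip to the valid range [0, 1]."""
    rows, cols = len(members[0]), len(members[0][0])
    return [[min(1.0, max(0.0, mean(m[i][j] for m in members))) for j in range(cols)]
            for i in range(rows)]


# Made-up concentration fields for one month over five years (2 x 2 cells).
history: List[Grid] = [
    [[0.9, 0.2], [1.0, 0.0]],
    [[0.8, 0.3], [1.0, 0.1]],
    [[0.7, 0.1], [0.9, 0.0]],
    [[0.9, 0.6], [1.0, 0.0]],
    [[0.2, 0.2], [0.95, 0.0]],
]

# One member per loss: L2 (mean) and L1 (median).
members = [fit_constant(history, mean), fit_constant(history, median)]
forecast = ensemble(members)
print(forecast)
```

The L1 member is less sensitive to the anomalous low-ice year in the top-left cell, and the ensemble sits between the two members. This is the kind of complementarity that motivates mixing loss functions.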


Improvement of Computational Performance of Evolutionary AutoML in a Heterogeneous Environment

arXiv.org Artificial Intelligence

Resource-intensive computations are a major factor limiting the effectiveness of automated machine learning solutions. In the paper, we propose a modular approach that can be used to increase the quality of evolutionary optimization for modelling pipelines with a graph-based structure. It consists of several stages: parallelization, caching and evaluation. Heterogeneous and remote resources can be involved in the evaluation stage. The conducted experiments confirm the correctness and effectiveness of the proposed approach. The implemented algorithms are available as part of the open-source framework FEDOT.
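A minimal sketch of how the caching and parallelization stages can fit together is shown below. It does not use FEDOT's API: `structural_key`, `CachedEvaluator`, and `toy_fitness` are hypothetical names. The sketch assumes a pipeline is a dict of parent lists, so that structurally identical graphs map to the same cache key. Each distinct uncached structure in a population is evaluated exactly once, in parallel, and only the main thread writes to the cache.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Pipeline = Dict[str, List[str]]  # node -> parent nodes (hypothetical encoding)
Key = Tuple[Tuple[str, Tuple[str, ...]], ...]


def structural_key(p: Pipeline) -> Key:
    """Order-independent key: structurally identical graphs share a cache entry."""
    return tuple(sorted((node, tuple(sorted(parents))) for node, parents in p.items()))


class CachedEvaluator:
    def __init__(self, fitness: Callable[[Pipeline], float], workers: int = 4):
        self.fitness = fitness
        self.workers = workers
        self.cache: Dict[Key, float] = {}
        self.calls = 0  # number of real (uncached) fitness evaluations

    def evaluate(self, population: List[Pipeline]) -> List[float]:
        keys = [structural_key(p) for p in population]
        # Collect each distinct, not-yet-cached structure once.
        todo: Dict[Key, Pipeline] = {}
        for k, p in zip(keys, population):
            if k not in self.cache and k not in todo:
                todo[k] = p
        # Evaluate the remaining structures in parallel; map() preserves order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            results = list(pool.map(self.fitness, todo.values()))
        self.cache.update(zip(todo.keys(), results))
        self.calls += len(todo)
        return [self.cache[k] for k in keys]


def toy_fitness(p: Pipeline) -> float:
    return float(len(p))  # stand-in for an expensive cross-validated score


a = {"scaling": [], "ridge": ["scaling"]}
b = {"ridge": ["scaling"], "scaling": []}  # same structure, different dict order
c = {"scaling": [], "knn": ["scaling"], "ridge": ["scaling"]}

ev = CachedEvaluator(toy_fitness)
print(ev.evaluate([a, b, c]), ev.calls)  # [2.0, 2.0, 3.0] 2
print(ev.evaluate([a, c]), ev.calls)     # [2.0, 3.0] 2
```

Threads keep the example simple; for CPU-bound fitting in pure Python, a process pool or remote workers (the heterogeneous resources mentioned above) would be the more realistic evaluation backend.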


Oil and Gas Reservoirs Parameters Analysis Using Mixed Learning of Bayesian Networks

arXiv.org Machine Learning

In this paper, a multipurpose Bayesian method for data analysis, causal inference and prediction in the field of oil and gas reservoir development is considered. It allows analysing reservoir parameters, discovering dependencies among parameters (including cause-and-effect relations), checking for anomalies, predicting expected values of missing parameters, looking for the closest analogues, and much more. The method is based on an extended version of the MixLearn@BN algorithm for structural learning of Bayesian networks. The key ideas of MixLearn@BN are the following: (1) learning the network structure on homogeneous data subsets, (2) assigning a part of the structure by an expert, and (3) learning the distribution parameters on mixed (discrete and continuous) data. Homogeneous data subsets are identified as groups of reservoirs with similar features (analogues), where the similarity measure may be based on several types of distances. The aim of this Bayesian network learning technique is to improve the quality of predictions and causal inference on such networks. Experimental studies show that the suggested method gives a significant advantage in the accuracy of missing-value prediction and anomaly detection. Moreover, the method was applied to a database of more than a thousand petroleum reservoirs across the globe and revealed novel insights into the relationships between geological parameters.
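The abstract says only that analogue similarity "may be based on several types of distances". One common choice for records that mix continuous and categorical features is a Gower-style distance, and the sketch below uses it to find the closest analogues. This covers only the analogue-search step, not the Bayesian network learning. The choice of Gower distance, the reservoir records, feature names, and function names are all assumptions for illustration.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[float, str]]  # numeric or categorical reservoir features


def gower_distance(a: Record, b: Record, ranges: Dict[str, float]) -> float:
    """Mean per-feature dissimilarity: |x - y| / range for numbers, 0 or 1 for categories."""
    total = 0.0
    for f in a:
        x, y = a[f], b[f]
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        elif ranges[f] > 0:
            total += abs(x - y) / ranges[f]
    return total / len(a)


def closest_analogues(target: Record, db: Dict[str, Record], k: int = 2) -> List[str]:
    """Names of the k database records closest to `target`."""
    ranges: Dict[str, float] = {}
    for f, v in target.items():
        if not isinstance(v, str):
            values = [r[f] for r in db.values()] + [v]
            ranges[f] = max(values) - min(values)  # normalises each numeric feature
    return sorted(db, key=lambda name: gower_distance(target, db[name], ranges))[:k]


# Made-up reservoir records (porosity as a fraction, permeability in mD).
db: Dict[str, Record] = {
    "R1": {"porosity": 0.22, "permeability": 150.0, "lithology": "sandstone"},
    "R2": {"porosity": 0.08, "permeability": 5.0, "lithology": "carbonate"},
    "R3": {"porosity": 0.18, "permeability": 100.0, "lithology": "sandstone"},
    "R4": {"porosity": 0.12, "permeability": 40.0, "lithology": "carbonate"},
}
target: Record = {"porosity": 0.20, "permeability": 120.0, "lithology": "sandstone"}

print(closest_analogues(target, db))  # ['R3', 'R1']
```

Groups of mutual nearest analogues found this way are one plausible way to form the homogeneous subsets on which a network structure could then be learned.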


Multi-Objective Evolutionary Design of Composite Data-Driven Models

arXiv.org Artificial Intelligence

The internal structure of the model depends on the type of the learning algorithm, so complex data-driven models can consist of several semi-independent blocks; this approach is usually referred to as ensembling [2]. There are several techniques to build complex models: for example, blending allows creating single-level ensembles of machine learning (ML) models, and stacking allows creating multi-level ones. Other approaches are based on the representation of a model structure (or even the whole modeling pipeline) as a directed acyclic graph (DAG).

There is a variety of approaches that can be used to identify the optimal design of the data-driven model. For instance, AutoML solutions can be based on random search [5], Bayesian optimisation [6], reinforcement learning (RL) [7], Monte Carlo tree search [8], sequential model-based optimization [9], or gradient-based approaches [10]. However, most of them are less flexible than evolutionary approaches to the model design (implemented e.g. in [11]). Their conceptual
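To make the DAG representation of a composite model concrete, here is a minimal sketch that evaluates a node-to-parents graph in topological order using Python's standard `graphlib`. The node names, the toy operations, and the fixed-weight meta-node are hypothetical. In real stacking the second-level model would be trained on the first-level predictions rather than given fixed weights.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List


def run_dag(graph: Dict[str, List[str]],
            ops: Dict[str, Callable[..., float]],
            x: float,
            output: str) -> float:
    """Evaluate a composite model given as node -> parent nodes.

    Nodes run in topological order; primary nodes (no parents) receive the raw
    input x, other nodes receive their parents' outputs in the listed order.
    """
    results: Dict[str, float] = {}
    for node in TopologicalSorter(graph).static_order():
        parents = graph[node]
        args = [results[p] for p in parents] if parents else [x]
        results[node] = ops[node](*args)
    return results[output]


# Hypothetical two-level, stacking-style structure:
# shared preprocessing -> two first-level models -> meta-model.
graph = {
    "preprocess": [],
    "linear": ["preprocess"],
    "quadratic": ["preprocess"],
    "meta": ["linear", "quadratic"],
}
ops = {
    "preprocess": lambda v: v - 1.0,
    "linear": lambda v: 2.0 * v + 1.0,
    "quadratic": lambda v: v * v,
    # Fixed weights stand in for a second-level learner fitted on first-level outputs.
    "meta": lambda a, b: 0.5 * a + 0.5 * b,
}

print(run_dag(graph, ops, 3.0, "meta"))  # 4.5
```

Because the structure is plain data, an evolutionary optimizer can mutate it (add, remove, or rewire nodes) without changing the evaluation code. This flexibility is what the paragraph above attributes to DAG-based approaches.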