Energy
Clustered Reinforcement Learning
Ma, Xiao, Zhao, Shen-Yi, Li, Wu-Jun
Exploration strategy design is one of the challenging problems in reinforcement learning~(RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover novel areas or high reward~(quality) areas. In most existing methods, the novelty and quality in the neighboring area of the current state are not well utilized to guide the exploration of the agent. To tackle this problem, we propose a novel RL framework, called \underline{c}lustered \underline{r}einforcement \underline{l}earning~(CRL), for efficient exploration in RL. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area~(cluster) of the current state is given to the agent. Experiments on a continuous control task and several \emph{Atari 2600} games show that CRL can outperform other state-of-the-art methods to achieve the best performance in most cases.
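A minimal sketch of the cluster-based bonus idea described above, assuming NumPy is available. The abstract does not give the bonus formula, so the combination used here (an inverse-square-root count term for novelty plus a mean-reward term for quality, weighted by the hypothetical parameters `beta` and `eta`) is an illustrative assumption rather than the formula used in CRL. The helper names `kmeans`, `assign_clusters`, and `cluster_bonus` are likewise hypothetical.

```python
import numpy as np

def assign_clusters(states, centers):
    # Squared Euclidean distance from every state to every center: shape (n, k).
    d = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(states, k, iters=20, seed=0):
    # Plain Lloyd iterations; initial centers are k distinct collected states.
    rng = np.random.default_rng(seed)
    centers = states[rng.choice(len(states), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign_clusters(states, centers)
        for j in range(k):
            members = states[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def cluster_bonus(states, rewards, centers, beta=1.0, eta=1.0):
    # ASSUMPTION: bonus = beta * novelty + eta * quality, computed per cluster.
    labels = assign_clusters(states, centers)
    k = len(centers)
    counts = np.bincount(labels, minlength=k).astype(float)
    reward_sums = np.bincount(labels, weights=rewards, minlength=k)
    novelty = 1.0 / np.sqrt(np.maximum(counts, 1.0))  # rarely visited clusters score higher
    quality = reward_sums / np.maximum(counts, 1.0)   # mean reward seen in the cluster
    per_cluster = beta * novelty + eta * quality
    return per_cluster[labels], labels
```

In this sketch every state inherits the bonus of its cluster, so a state in a sparsely visited or high-reward neighborhood receives a larger bonus than one in a crowded, low-reward neighborhood.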
Failures detection at directional drilling using real-time analogues search
Gurina, Ekaterina, Klyuchnikov, Nikita, Zaytsev, Alexey, Romanenkova, Evgenya, Antipova, Ksenia, Simon, Igor, Makarov, Victor, Koroteev, Dmitry
One of the main challenges in the construction of oil and gas wells is the need to detect and avoid abnormal situations, which can lead to accidents. Accidents have some indicators that help to find them during the drilling process. In this article, we present a data-driven model trained on historical data from drilling accidents that can detect different types of accidents using real-time signals. The results show that using the time-series comparison, based on aggregated statistics and gradient boosting classification, it is possible to detect an anomaly and identify its type by comparing current measurements while drilling with the stored ones from the database of accidents.
Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits
Mukherjee, Subhojyoti, Maillard, Odalric-Ambrym
We consider the setup of stochastic multi-armed bandits in the case when reward distributions are piecewise i.i.d. and bounded with unknown changepoints. We focus on the case when changes happen simultaneously on all arms, and in stark contrast with the existing literature, we target gap-dependent (as opposed to only gap-independent) regret bounds involving the magnitude of changes $(\Delta^{chg}_{i,g})$ and optimality-gaps ($\Delta^{opt}_{i,g}$). Diverging from previous works, we assume the more realistic scenario that there can be undetectable changepoint gaps and under a different set of assumptions, we show that as long as the compounded delayed detection for each changepoint is bounded there is no need for forced exploration to actively detect changepoints. We introduce two adaptations of UCB-strategies that employ scan-statistics in order to actively detect the changepoints, without knowing in advance the changepoints and also the mean before and after any change. Our first method \UCBLCPD does not know the number of changepoints $G$ or time horizon $T$ and achieves the first time-uniform concentration bound for this setting using the Laplace method of integration. The second strategy \ImpCPD makes use of the knowledge of $T$ to achieve the order optimal regret bound of $\min\big\lbrace O(\sum\limits_{i=1}^{K} \sum\limits_{g=1}^{G}\frac{\log(T/H_{1,g})}{\Delta^{opt}_{i,g}}), O(\sqrt{GT})\big\rbrace$, (where $H_{1,g}$ is the problem complexity) thereby closing an important gap with respect to the lower bound in a specific challenging setting. Our theoretical findings are supported by numerical experiments on synthetic and real-life datasets.
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Lu, Yiping, Li, Zhuohan, He, Di, Sun, Zhiqing, Dong, Bin, Qin, Tao, Wang, Liwei, Liu, Tie-Yan
The Transformer architecture is widely used in natural language processing. Despite its success, the design principle of the Transformer remains elusive. In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system. In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple particles' movement in the space using the Lie-Trotter splitting scheme and Euler's method. Given this ODE perspective, the rich literature of numerical analysis can be brought to guide us in designing effective structures beyond the Transformer. As an example, we propose to replace the Lie-Trotter splitting scheme by the Strang-Marchuk splitting scheme, a scheme that is more commonly used and has much lower local truncation errors. The Strang-Marchuk splitting scheme suggests that the self-attention and position-wise feed-forward network (FFN) sub-layers should not be treated equally. Instead, in each layer, two position-wise FFN sub-layers should be used, with the self-attention sub-layer placed in between. This leads to a brand new architecture. Such an FFN-attention-FFN layer is "Macaron-like", and thus we call the network with this new architecture the Macaron Net. Through extensive experiments, we show that the Macaron Net is superior to the Transformer on both supervised and unsupervised learning tasks. Reproducible code and pretrained models can be found at https://github.com/zhuohan123/macaron-net
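The difference between the two splitting schemes can be illustrated on a toy linear ODE, dx/dt = (A + B)x, assuming NumPy. This is a sketch, not the Macaron Net: the matrices `A` and `B` are hypothetical stand-ins for the two sub-layer operators, chosen because they do not commute and because each sub-flow and the full flow have closed forms. Each sub-step here uses the exact sub-flow rather than Euler's method, so the only error measured is the splitting error.

```python
import numpy as np

# Hypothetical non-commuting operators; A @ A = 0 and B @ B = 0, (A + B) @ (A + B) = I.
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # stand-in for the sub-layer applied twice (FFN position)
B = np.array([[0.0, 0.0], [1.0, 0.0]])  # stand-in for the sub-layer in the middle (attention position)
I2 = np.eye(2)

def flow_a(x, h):
    # Exact flow of dx/dt = A x, since exp(hA) = I + hA for nilpotent A.
    return (I2 + h * A) @ x

def flow_b(x, h):
    # Exact flow of dx/dt = B x.
    return (I2 + h * B) @ x

def exact(x, t):
    # exp(t(A+B)) = cosh(t) I + sinh(t) (A+B), because (A+B)^2 = I.
    return np.cosh(t) * x + np.sinh(t) * ((A + B) @ x)

def lie_trotter(x, t, n):
    # One full step of each operator per layer: first-order accurate.
    h = t / n
    for _ in range(n):
        x = flow_b(flow_a(x, h), h)
    return x

def strang_marchuk(x, t, n):
    # Half step, full step, half step (the "Macaron" ordering): second-order accurate.
    h = t / n
    for _ in range(n):
        x = flow_a(flow_b(flow_a(x, h / 2), h), h / 2)
    return x

def error(scheme, n, t=1.0):
    x0 = np.array([1.0, 0.0])
    return float(np.linalg.norm(scheme(x0, t, n) - exact(x0, t)))
```

Comparing `error(lie_trotter, 10)` with `error(strang_marchuk, 10)` shows the symmetric half-full-half ordering tracking the exact solution more closely at the same number of steps.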
Cormorant: Covariant Molecular Neural Networks
Anderson, Brandon, Hy, Truong-Son, Kondor, Risi
We propose Cormorant, a rotationally covariant neural network architecture for learning the behavior and properties of complex many-body physical systems. We apply these networks to molecular systems with two goals: learning atomic potential energy surfaces for use in Molecular Dynamics simulations, and learning ground state properties of molecules calculated by Density Functional Theory. Some of the key features of our network are that (a) each neuron explicitly corresponds to a subset of atoms; (b) the activation of each neuron is covariant to rotations, ensuring that overall the network is fully rotationally invariant. Furthermore, the non-linearity in our network is based upon tensor products and the Clebsch-Gordan decomposition, allowing the network to operate entirely in Fourier space. Cormorant significantly outperforms competing algorithms in learning molecular Potential Energy Surfaces from conformational geometries in the MD-17 dataset, and is competitive with other methods at learning geometric, energetic, electronic, and thermodynamic properties of molecules on the GDB-9 dataset.
Acceleration of Radiation Transport Solves Using Artificial Neural Networks
Discontinuous Finite Element Methods (DFEM) have been widely used for solving $S_n$ radiation transport problems in participative and non-participative media. In the DFEM $S_n$ methodology, the transport equation is discretized into a set of algebraic equations that have to be solved for each spatial cell and angular direction, strictly preserving the flow of radiation in the system. At the core of a DFEM solver, a small matrix-vector system (of 8 independent equations for tri-linear DFEM in 3D hexahedral cells) has to be assembled and solved for each cell, angle, energy group, and time step. These systems are generally solved by direct Gaussian Elimination. The computational cost of the Gaussian Elimination, repeated for each phase-space cell, amounts to a large fraction of the total compute time. Here, we have designed a Machine Learning algorithm based on shallow Artificial Neural Networks (ANNs) to replace that Gaussian Elimination step, enabling a sizeable speed-up in the solution process. The key idea is to train an ANN with a large set of solutions of random one-cell transport problems and then to use the trained ANN to replace Gaussian Elimination in large-scale transport solvers. It has been observed that ANNs decrease the solution times by at least a factor of 4, while introducing mean absolute errors between 1-3\% in large-scale transport solutions.
Meeting on the development of artificial intelligence technologies
Before the meeting, the head of state was told about the academic process at School 21 and had a brief conversation with students. The President was informed about the school by Head of Sberbank German Gref and school Principal Svetlana Infimovskaya. The students of the school can study the following areas: Algorithms, Graphics, Mobile Development, Computer Security, Robot Technology, and Artificial Intelligence to name a few. The school has 940 students today. On average, students are expected to study for 2–3.5 years. The course includes two practical training sessions in relevant companies for six months or more. Today I suggest that we discuss concrete steps that will form the foundation for our National Strategy on the development of artificial intelligence technologies. We have repeatedly spoken about the need for such a comprehensive document. I also mentioned it in this year's Address to the Federal Assembly. This is indeed one of the key areas of technological development that determines and will continue to determine the future of the entire world. The artificial intelligence mechanisms will allow for quick real-time decision-making based on analysing vast amounts of information known as big data, which provides tremendous advantages in terms of quality and performance. In addition, such mechanisms are unparalleled in history in terms of their impact on the economy and productivity, the effectiveness of management, education, healthcare and daily life. However, vying for technological leadership, primarily, in the sphere of artificial intelligence – and you are all very well aware of this, colleagues – has already led to global competition. New products and solutions are being created at an exponential growth rate. I have said it before and I will say it now: he who can establish a monopoly in artificial intelligence – we are aware of the consequences – will rule the world.
It is no accident that many developed countries of the world have already adopted action plans to develop such technologies. Of course, we must ensure technological sovereignty in the realm of artificial intelligence. This is the most important prerequisite for the viability of our businesses and the economy, the quality of life for Russian citizens, security and, finally, our defence capability. Here, we are not just talking about algorithms for addressing individual and highly specialised problems; what we need are universal solutions, the use of which gives the optimum effect in any industry. In order to achieve such an ambitious goal in AI technology, we are objectively positioned to have a good start and we have a serious competitive edge. Today, Russia boasts one of the world's highest penetration rates for mobile communications and internet access, as well as for the development of electronic services.
Invariant Tensor Feature Coding
Mukuta, Yusuke, Harada, Tatsuya
We propose a novel feature coding method that exploits invariance. We consider the setting where the transformations that preserve the image contents compose a finite group of orthogonal matrices. This is the case in many image transformations such as image rotations and image flipping. We prove that the group-invariant feature vector contains sufficient discriminative information when we learn a linear classifier using convex loss minimization. From this result, we propose a novel feature modeling for principal component analysis, and k-means clustering, which are used for most feature coding methods, and global feature functions that explicitly consider the group action. Although the global feature functions are complex nonlinear functions in general, we can calculate the group action on this space easily by constructing the functions as the tensor product representations of basic representations, resulting in the explicit form of invariant feature functions. We demonstrate the effectiveness of our methods on several image datasets.
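As a small illustration of invariance under a finite group of orthogonal matrices, the following sketch (assuming NumPy) averages a feature vector over the group, which projects it onto the group-invariant subspace. This is the standard group-averaging construction, not the tensor-product feature functions proposed in the paper. The two-element flip group and the helper names `flip_group` and `invariant_projection` are hypothetical choices made for the example.

```python
import numpy as np

def flip_group(n):
    # Two-element group acting on length-n signals: identity and reversal.
    # Both are permutation matrices, hence orthogonal.
    identity = np.eye(n)
    flip = identity[::-1].copy()
    return [identity, flip]

def invariant_projection(group):
    # Averaging all group elements gives the orthogonal projector
    # onto the subspace of vectors fixed by every element.
    return sum(group) / len(group)

group = flip_group(4)
P = invariant_projection(group)

x = np.array([1.0, 2.0, 4.0, 8.0])
x_flipped = group[1] @ x

print(P @ x)          # [4.5 3.  3.  4.5]
print(P @ x_flipped)  # same vector: the projected feature ignores the flip
```

The projected feature is identical for a signal and its flipped copy, which is the property a linear classifier on invariant features relies on.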
Machine Learning and System Identification for Estimation in Physical Systems
In this thesis, we draw inspiration from both classical system identification and modern machine learning in order to solve estimation problems for real-world, physical systems. The main approach to estimation and learning adopted is optimization based. Concepts such as regularization will be utilized for encoding of prior knowledge and basis-function expansions will be used to add nonlinear modeling power while keeping data requirements practical. The thesis covers a wide range of applications, many inspired by applications within robotics, but also extending outside this already wide field. Usage of the proposed methods and algorithms is in many cases illustrated in the real-world applications that motivated the research. Topics covered include dynamics modeling and estimation, model-based reinforcement learning, spectral estimation, friction modeling, and state estimation and calibration in robotic machining. In the work on modeling and identification of dynamics, we develop regularization strategies that allow us to incorporate prior domain knowledge into flexible, overparameterized models. We make use of classical control theory to gain insight into training and regularization while using flexible tools from modern deep learning. A particular focus of the work is to allow use of modern methods in scenarios where gathering data is associated with a high cost. In the robotics-inspired parts of the thesis, we develop methods that are practically motivated and ensure that they are implementable also outside the research setting. We demonstrate this by performing experiments in realistic settings and providing open-source implementations of all proposed methods and algorithms.
Measurement-based Online Available Bandwidth Estimation employing Reinforcement Learning
Khangura, Sukhpreet Kaur, Akın, Sami
An accurate and fast estimation of the available bandwidth in a network with varying cross-traffic is a challenging task. The accepted probing tools, based on the fluid-flow model of a bottleneck link with first-in, first-out multiplexing, estimate the available bandwidth by measuring packet dispersions. The estimation becomes more difficult if packet dispersions deviate from the assumptions of the fluid-flow model in the presence of non-fluid bursty cross-traffic, multiple bottleneck links, and inaccurate time-stamping. This motivates us to explore the use of machine learning tools for available bandwidth estimation. Hence, we consider reinforcement learning and implement the single-state multi-armed bandit technique, which follows the $\epsilon$-greedy algorithm to find the available bandwidth. Our measurements and tests reveal that our proposed method identifies the available bandwidth with high precision. Furthermore, our method converges to the available bandwidth under a variety of notoriously difficult conditions, such as heavy traffic burstiness, different cross-traffic intensities, multiple bottleneck links, and in networks where the tight link and the bottleneck link are not the same. Compared to the piece-wise linear network model-based direct probing technique that employs a Kalman filter, our method shows more accurate estimates and faster convergence in certain network scenarios and does not require measurement noise statistics.
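The single-state $\epsilon$-greedy bandit mentioned above can be sketched as follows, using only the Python standard library. The abstract does not specify the arms or the reward signal, so everything specific here is an assumption made for illustration: the arms are hypothetical candidate probing rates (`PROBE_RATES`), the true available bandwidth (`AVAILABLE`) is a made-up constant, and `toy_reward` is a noise-free stand-in that rewards the highest rate not exceeding the available bandwidth. A real estimator would derive the reward from measured packet dispersions.

```python
import random

def epsilon_greedy(reward_fn, n_arms, steps, epsilon=0.1, seed=0):
    # Single-state bandit: keep a running mean reward per arm.
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for t in range(steps):
        if t < n_arms:
            arm = t                         # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)     # explore a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit the best estimate
        r = reward_fn(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update
    return values, counts

# ASSUMPTIONS for illustration only (not from the paper):
PROBE_RATES = [10.0, 20.0, 30.0, 40.0, 50.0]  # candidate probing rates in Mbit/s
AVAILABLE = 32.0                              # hidden available bandwidth in Mbit/s

def toy_reward(arm):
    # Higher reward for rates closer to the available bandwidth; penalty for overshooting.
    rate = PROBE_RATES[arm]
    return rate / AVAILABLE if rate <= AVAILABLE else -1.0

values, counts = epsilon_greedy(toy_reward, len(PROBE_RATES), steps=200)
best = max(range(len(PROBE_RATES)), key=lambda a: values[a])
estimate = PROBE_RATES[best]  # 30.0 under the toy reward above
```

With this toy reward the estimate is limited by the spacing of the candidate rates; a finer grid of arms trades resolution against the number of probes needed to evaluate each arm.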