Energy
Improved POMDP Tree Search Planning with Prioritized Action Branching
Mern, John, Yildiz, Anil, Bush, Larry, Mukerji, Tapan, Kochenderfer, Mykel J.
Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree. The proposed method first evaluates the action space according to a score function that is a linear combination of expected reward and expected information gain. The actions with the highest score are then added to the search tree during tree expansion. Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.
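The scoring step described in this abstract can be illustrated with a minimal sketch. It assumes the expected reward and expected information gain of each action are already available as callables; the names `top_scoring_actions`, `weight`, and `k` are hypothetical and are not taken from the paper, which may estimate and combine these quantities differently.

```python
import heapq
from typing import Callable, Hashable, Iterable, List


def top_scoring_actions(
    actions: Iterable[Hashable],
    expected_reward: Callable[[Hashable], float],
    expected_info_gain: Callable[[Hashable], float],
    weight: float,
    k: int,
) -> List[Hashable]:
    """Return the k actions with the highest linear score.

    score(a) = expected_reward(a) + weight * expected_info_gain(a)

    `weight` (hypothetical name) trades exploitation against exploration:
    0 ranks by reward alone, larger values favour informative actions.
    """
    def score(action: Hashable) -> float:
        return expected_reward(action) + weight * expected_info_gain(action)

    # nlargest returns the selected actions sorted from best to worst score.
    return heapq.nlargest(k, actions, key=score)


# Toy, made-up numbers for four candidate actions.
rewards = {"a": 1.0, "b": 0.5, "c": 0.2, "d": 0.0}
info_gain = {"a": 0.0, "b": 0.4, "c": 1.0, "d": 0.1}

greedy = top_scoring_actions(rewards, rewards.get, info_gain.get, weight=0.0, k=2)
curious = top_scoring_actions(rewards, rewards.get, info_gain.get, weight=2.0, k=2)
```

Varying `weight` across tree expansions is one simple way to obtain the varying mixtures of exploitation and exploration the abstract mentions; only the selected actions would then be added as branches of the search tree.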
Artificial Intelligence at Schlumberger
Schlumberger is a large, multinational corporation concerned primarily with the measurement, collection, and interpretation of data. For the past fifty years, most of the activities have been related to hydrocarbon exploration. The efficient location and production of hydrocarbons from an underground formation requires a great deal of knowledge about the formation, ranging in scale from the size and shape of the rock's pore spaces to the size and shape of the entire reservoir. Schlumberger provides its clients with two types of information: measurements, called logs, of the petrophysical properties of the rock around the borehole, such as its electrical, acoustical, and radioactive characteristics; and interpretations of these logs in terms of geophysical properties such as porosity and mineral composition. Since log interpretation is an expert skill, the emergence of expert systems technology prompted Schlumberger's initial interest in Artificial Intelligence.
Immobile Robots: AI in the New Millennium
A new generation of sensor-rich, massively distributed, autonomous systems is being developed that has the potential for profound social, environmental, and economic change. These systems include networked building energy systems, autonomous space probes, chemical plant control systems, satellite constellations for remote ecosystem monitoring, power grids, biospherelike life-support systems, and reconfigurable traffic systems, to highlight but a few. To achieve high performance, these immobile robots (or immobots) will need to develop sophisticated regulatory and immune systems that accurately and robustly control their complex internal functions. Thus, immobots will exploit a vast nervous system of sensors to model themselves and their environment on a grand scale. They will use these models to dramatically reconfigure themselves to survive decades of autonomous operation. Achieving these large-scale modeling and configuration tasks will require a tight coupling between the higher-level coordination function provided by symbolic reasoning and the lower-level autonomic processes of adaptive estimation and control.
Using Artificial Neural Networks to Predict the Quality and Performance of Oil-Field Cements
Inherent batch-to-batch variability, aging, and contamination are major factors contributing to variability in oil-field cement-slurry performance. Of particular concern are problems encountered when a slurry is formulated with one cement sample and used with a batch having different properties. Such variability imposes a heavy burden on performance testing and is often a major factor in operational failure. We describe methods that allow the identification, characterization, and prediction of the variability of oil-field cements. Our approach involves predicting cement compositions, particle-size distributions, and thickening-time curves from the diffuse reflectance infrared Fourier transform spectrum of neat cement powders.
Phase Mapper: Accelerating Materials Discovery with AI
From the Stone Age to the Bronze Age, the Iron Age, and the modern Silicon Age, the discovery and characterization of new materials has always been instrumental to humanity's progress and development. With the current pressing need to address sustainability challenges and find alternatives to fossil fuels, we look for solutions in the development of new materials that will allow for renewable energy. To discover materials with the required properties, materials scientists can perform high-throughput materials discovery, which includes rapid synthesis and characterization via X-ray diffraction (XRD) of thousands of materials. A central problem in materials discovery, the phase map identification problem, involves the determination of the crystal structure of materials from materials composition and structural characterization data. This analysis is traditionally performed mainly by hand, which can take days for a single material system.
AI researchers challenge a robot to ride a skateboard in simulation
AI researchers say they've created a framework for controlling four-legged robots that promises better energy efficiency and adaptability than more traditional model-based gait control of robotic legs. To demonstrate the robust nature of the framework that adjusts to conditions in real time, AI researchers made the system slip on frictionless surfaces to mimic a banana peel, ride a skateboard, and climb on a bridge while walking on a treadmill. An Nvidia spokesperson told VentureBeat that only the frictionless surface test was conducted in real life because of limits placed on office staff size due to COVID-19. The spokesperson said all other challenges took place in simulation. "Our framework learns a controller that can adapt to challenging environmental changes on the fly, including novel scenarios not seen during training. The learned controller is up to 85% more energy-efficient and is more robust compared to baseline methods," the paper reads.
Port of Rotterdam testing blockchain and AI for renewables trading
The Port of Rotterdam's blockchain subsidiary, Blocklab, has been trialing a decentralized electricity trading system to help lower costs and optimize the use of renewables on its microgrid. The system, called Distro, has been jointly developed by Blocklab and S&P Global Platts, and has been operational as a trial for two months. Distro uses blockchain technology, smart contracts, and artificial intelligence to support the decentralized and high frequency trading of renewable energy by commercial consumers looking to optimize and manage their energy use. It matches demand with the intermittent power generated from different sources, specifically solar and battery storage. Each market participant is allocated an AI energy trading agent that learns their behavior, choices, and needs and provides them with energy at the optimal price.
Tiny Machine Learning: The Next AI Revolution
Over the past decade, we have witnessed the size of machine learning algorithms grow exponentially due to improvements in processor speeds and the advent of big data. Initially, models were small enough to run on local machines using one or more cores within the central processing unit (CPU). Shortly after, computation using graphics processing units (GPUs) became necessary to handle larger datasets and became more readily available due to the introduction of cloud-based services such as SaaS platforms (e.g., Google Colaboratory) and IaaS (e.g., Amazon EC2 Instances). At this time, algorithms could still be run on single machines. More recently, we have seen the development of specialized application-specific integrated circuits (ASICs) and tensor processing units (TPUs), which can pack the power of 8 GPUs.
A Novel Neural Network Training Framework with Data Assimilation
Chen, Chong, Xing, Qinghui, Ding, Xin, Xue, Yaru, Zhong, Tianfu
In recent years, the success of deep learning has revolutionized artificial neural networks (ANNs). However, the dependence on gradients and the offline training mechanism in the learning algorithms prevent ANNs from improving further. In this study, a gradient-free training framework based on data assimilation is proposed to avoid the calculation of gradients. In data assimilation algorithms, the error covariance between the forecasts and observations is used to optimize the parameters. Feedforward Neural Networks (FNNs) are trained by gradient descent and by two data assimilation algorithms, the Ensemble Kalman Filter (EnKF) and the Ensemble Smoother with Multiple Data Assimilation (ESMDA). ESMDA trains the FNN over a predefined number of iterations by updating the parameters using all the available observations, which can be regarded as offline learning. EnKF optimizes the FNN by updating the parameters whenever a new observation becomes available, which can be regarded as online learning. Two synthetic cases, the regression of a sine function and of a Mexican hat function, are used to validate the effectiveness of the proposed framework. The Root Mean Square Error (RMSE) and coefficient of determination (R²) are used as criteria to assess the performance of different methods. The results show that the proposed training framework performed better than the gradient descent method. The proposed framework provides alternatives for online/offline training of existing ANNs (e.g., Convolutional Neural Networks, Recurrent Neural Networks) without the dependence on gradients.
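To make the gradient-free, online update concrete, here is a minimal sketch of a stochastic EnKF parameter update driven by one scalar observation at a time. It is an illustration under stated assumptions, not the authors' implementation: the two-parameter model `predict` is a hypothetical stand-in for the FNN, and the ensemble size, noise level, and function names are chosen only for the example.

```python
import math
import random


def predict(theta, x):
    """Hypothetical stand-in model y = a*sin(x) + b (not the paper's FNN)."""
    a, b = theta
    return a * math.sin(x) + b


def enkf_update(ensemble, x, y_obs, obs_std, rng):
    """One stochastic EnKF update of a parameter ensemble (no gradients)."""
    n, dim = len(ensemble), len(ensemble[0])
    preds = [predict(theta, x) for theta in ensemble]
    pred_mean = sum(preds) / n
    theta_mean = [sum(t[j] for t in ensemble) / n for j in range(dim)]

    # Sample variance of the predictions and parameter/prediction covariances.
    c_dd = sum((p - pred_mean) ** 2 for p in preds) / (n - 1)
    c_td = [
        sum((t[j] - theta_mean[j]) * (p - pred_mean)
            for t, p in zip(ensemble, preds)) / (n - 1)
        for j in range(dim)
    ]

    # Kalman gain for a scalar observation with noise variance obs_std**2.
    gain = [c / (c_dd + obs_std ** 2) for c in c_td]

    updated = []
    for theta, p in zip(ensemble, preds):
        # Each member assimilates its own perturbed copy of the observation.
        innovation = y_obs + rng.gauss(0.0, obs_std) - p
        updated.append([theta[j] + gain[j] * innovation for j in range(dim)])
    return updated


def train_online(n_members=100, n_obs=200, obs_std=0.05, seed=0):
    """Assimilate observations one by one; return the ensemble-mean parameters."""
    rng = random.Random(seed)
    true_theta = (2.0, -1.0)  # synthetic ground truth, for the demo only
    ensemble = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
                for _ in range(n_members)]
    for _ in range(n_obs):
        x = rng.uniform(-math.pi, math.pi)
        y = predict(true_theta, x) + rng.gauss(0.0, obs_std)
        ensemble = enkf_update(ensemble, x, y, obs_std, rng)
    return [sum(t[j] for t in ensemble) / n_members for j in range(2)]


estimate = train_online()
```

The update uses only ensemble covariances between parameters and predictions, which is what allows training without backpropagation. An offline variant in the spirit of ESMDA would instead assimilate all observations together, repeated over a fixed number of iterations.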
Temporal Difference Uncertainties as a Signal for Exploration
Flennerhag, Sebastian, Wang, Jane X., Sprechmann, Pablo, Visin, Francesco, Galashov, Alexandre, Kapturowski, Steven, Borsa, Diana L., Heess, Nicolas, Barreto, Andre, Pascanu, Razvan
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy, which can yield near-optimal exploration strategies in tabular settings. However, in non-tabular settings that involve function approximators, obtaining accurate uncertainty estimates is almost as challenging a problem. In this paper, we highlight that value estimates are easily biased and temporally inconsistent. In light of this, we propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors. This exploration signal controls for state-action transitions so as to isolate uncertainty in value that is due to uncertainty over the agent's parameters. Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the agent's temporal difference uncertainties. We introduce a distinct exploration policy that learns to collect data with high estimated uncertainty, which gives rise to a "curriculum" that smoothly changes throughout learning and vanishes in the limit of perfect value estimates. We evaluate our method on hard-exploration tasks, including Deep Sea and Atari 2600 environments and find that our proposed form of exploration facilitates both diverse and deep exploration. Striking the right balance between exploration and exploitation is fundamental to the reinforcement learning problem. A common approach is to derive exploration from the policy being learned. Dithering strategies, such as ɛ-greedy exploration, render a reward-maximising policy stochastic around its reward maximising behaviour (Williams & Peng, 1991). Other methods encourage higher entropy in the policy (Ziebart et al., 2008), introduce an intrinsic reward (Singh et al., 2005), or drive exploration by sampling from the agent's belief over the MDP (Strens, 2000). 
While greedy or entropy-maximising policies cannot facilitate temporally extended exploration (Osband et al., 2013; 2016a), the efficacy of intrinsic rewards depends crucially on how they relate to the extrinsic reward that comes from the environment (Burda et al., 2018a).
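The exploration signal described in the abstract can be sketched as follows. This is a simplified illustration that assumes an ensemble of tabular value estimates standing in for a distribution over the agent's parameters; the function names and the use of the standard deviation as the intrinsic reward are assumptions of this sketch, not details confirmed by the text above.

```python
import statistics


def td_errors(value_tables, state, reward, next_state, discount=0.99):
    """TD error of one fixed transition under each value estimate in the ensemble."""
    return [
        reward + discount * values[next_state] - values[state]
        for values in value_tables
    ]


def intrinsic_reward(value_tables, state, reward, next_state, discount=0.99):
    """Spread of TD errors across the ensemble for a fixed transition.

    Because the transition (state, reward, next_state) is held fixed, the
    spread reflects disagreement between value estimates only.
    """
    errors = td_errors(value_tables, state, reward, next_state, discount)
    return statistics.pstdev(errors)


# Ensemble members that agree produce no exploration signal.
agreeing = [{"s0": 1.0, "s1": 2.0} for _ in range(3)]
signal_agree = intrinsic_reward(agreeing, "s0", 0.0, "s1", discount=0.5)

# Members that disagree about s1 produce a positive signal.
disagreeing = [{"s0": 0.0, "s1": 0.0}, {"s0": 0.0, "s1": 1.0}]
signal_disagree = intrinsic_reward(disagreeing, "s0", 0.0, "s1", discount=0.5)
```

A separate exploration policy could be trained on this signal as its reward. The signal shrinks as the value estimates come to agree, consistent with the abstract's remark that the curriculum vanishes in the limit of perfect value estimates.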